
Trellix · 4 months ago

Site Reliability Engineer

Skyhigh Security is a leader in the security industry, dedicated to protecting the world’s data through innovative cloud solutions. The Site Reliability Engineer will be responsible for monitoring and resolving operational issues in a high-availability production environment, acting as a bridge between the Operations, Engineering, and Product Management teams.

Cyber Security, Information Technology, Network Security
No H1B, U.S. Citizen Only

Responsibilities

Perform Incident Management and Change Management to maintain the continuous availability of all Cloud Infrastructure services
Ensure all SRE and operating procedures are maintained and executed
Maintain a 24x7 production environment with a high level of service availability, perform quality reviews, and manage operational issues
Perform root cause analysis for major incidents and drive the process by involving required stakeholders
Perform problem management by analyzing metrics, alarms, and dashboards to troubleshoot problem areas, and report issues to assist in performance tuning and fault finding
Implement proactive monitoring, alerting, trend analysis, and self-healing solutions
Explore and introduce new technologies, features, and tools to improve the platform and automate operational tasks using Bash, Python, or any other programming language
Manage and maintain Runbooks and Standard Operating Procedures
Manage, coordinate, and document all types of maintenance activities and outages
Perform patching and upgrades for vulnerability management
Work closely with the teams to develop new ideas into internal tools
Understand the existing architecture and work with various Engineering teams to develop and execute strategies to provide a high-quality production service
Work a flexible schedule in a 24x7 environment with rotational shifts

Qualifications

Site Reliability Engineering, Cloud Technologies, Monitoring Tools, Scripting/Programming, Linux Administration, Containerization, Network Administration, Configuration Management, Database Management, Analytical Skills, Communication Skills, Problem-Solving Skills

Required

Bachelor's degree in computer science, electrical engineering or a related area, with 7+ years of SRE experience in a large enterprise organization
System administration experience in Linux environments
Experience with end-to-end monitoring setup for infrastructure and applications
Experience with Prometheus, Grafana, ELK, OpenSearch, CloudWatch, PagerDuty, and other monitoring tools
Solid experience with Cloud Technologies such as AWS and OCI
Good experience with tools for containerized workloads, such as Kubernetes
Network knowledge (TCP/IP, UDP, DNS, load balancing) and prior network administration experience are required
Experience with BGP, NAT, TCP/IP, iBGP, proxies, and cross-connects
Experience with L2/L3 switching, knowledge of Juniper and Cisco routing devices
Experience understanding and managing web servers (Apache, Tomcat, Nginx)
Ability to script/program in one or more high-level languages, such as Python or Go
Experience with configuration management tools such as Salt, Puppet, Ansible, or similar
Experience with source control tools such as GitHub and SVN
Experience with deployment tools such as Jenkins and Harness
Experience with SQL and NoSQL databases like Redis, Crate, and Elasticsearch
Experience in performing and writing Root Cause Analysis documents
Strong communication and analytical/problem-solving skills
Systematic approach and the ability to drive problems to resolution
Capable of working a flexible work schedule in a 24 x 7 environment with rotational shifts

Preferred

Experience with or knowledge of GCP or Azure is good to have
Experience in the security domain will be an added advantage
Experience with open-source technologies like Kafka, Hadoop, HBase, Zookeeper, Oozie will be an added advantage

Benefits

Retirement Plans
Medical, Dental and Vision Coverage
Paid Time Off
Paid Parental Leave
Support for Community Involvement

Company

Trellix

Trellix is a software and services provider for easy-to-create Web sites for consumers and small businesses.

Funding

Current Stage
Late Stage

Leadership Team

Vishal Rao
Chief Executive Officer
Martin Holste
CTO, Cloud
Company data provided by Crunchbase