This job has closed.

Granicus India · 2 months ago

Site Reliability Engineer 3

United States

Full-time

Remote

Senior Level

5+ years exp

Granicus is a company focused on transforming the GovTech industry by connecting governments with their constituents. It is seeking a Senior Site Reliability Engineer to ensure the reliability, scalability, and performance of its services while leading efforts to build robust infrastructure and automate processes.

Computer Software

Responsibilities

Provide production support in shifts according to the team's on-call roster

When not on call for production support, work on tickets raised by customers and by internal engineering/implementation teams. For example, a client may request a correction to data on the database server that cannot be made through the web interface

Work on items in the SRE team's backlog

Continuously monitor the health and performance of our services, systems, and infrastructure. Respond to alerts and incidents promptly to ensure high availability

Develop and maintain automation scripts and tools to streamline operations and reduce manual intervention

Assist in troubleshooting and resolving incidents, performing root cause analysis, and implementing long-term fixes to prevent recurrence

Participate in designing and implementing system improvements to enhance reliability, scalability, and performance

Work closely with software engineers to understand application requirements, provide feedback on design and architecture, and support deployment and release processes

Create and maintain documentation for processes, procedures, and troubleshooting guides to ensure knowledge sharing within the team

Assist in capacity planning activities to anticipate future needs and ensure that our infrastructure can handle growth

Implement and adhere to security best practices to protect our systems and data

Qualifications

Linux/Unix systems, Cloud services, Scripting languages, Monitoring tools, Containerization, Database management, Configuration management, CI/CD pipelines, Certifications, Problem-Solving, Communication, Leadership

Required

Good understanding of Linux/Unix systems, networking, and cloud services (AWS, Azure, or Google Cloud)

Experience with scripting languages such as Python, Bash, or Ruby

Bachelor's or Master's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience

5+ years of experience in site reliability engineering, system administration, or a similar role, with a proven track record of managing large-scale, high-availability systems

Expertise in Linux/Unix systems, networking, and cloud services (AWS, Azure, or Google Cloud)

Proficiency in scripting languages (Python, Bash, Ruby) and programming languages (Go, Java, C++)

Advanced knowledge of monitoring and logging tools (Prometheus, Grafana, Splunk), configuration management (Ansible, Chef, Puppet), and CI/CD pipelines

Strong analytical and problem-solving skills with the ability to diagnose and resolve complex issues efficiently

Excellent verbal and written communication skills, with the ability to convey complex technical concepts to non-technical stakeholders

Demonstrated ability to lead and mentor a team, drive projects to completion, and manage cross-functional initiatives

5+ years of experience in an SRE, DevOps, or Software Engineering role

In-depth understanding of containerization (Docker, Kubernetes) and infrastructure as code (Terraform, CloudFormation)

Experience with database management (SQL, NoSQL), load balancing, and distributed systems

Preferred

Relevant certifications such as AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer, or similar

Company

Granicus India

1001-5000 employees

https://granicus.com/india/

Funding

Current Stage

Late Stage

Company data provided by Crunchbase