Senior Site Reliability Engineer (AWS, AI/ML, & APM) jobs in United States

Granicus · 4 days ago

Senior Site Reliability Engineer (AWS, AI/ML, & APM)

Granicus is a company focused on transforming the GovTech industry by connecting governments with their constituents. It is seeking a Senior Site Reliability Engineer to ensure the reliability, scalability, and performance of its services while leading efforts to build robust infrastructure and automate processes.

Cloud Computing · Collaboration · Enterprise Software · GovTech · Software · Video Streaming
H1B Sponsor Likely

Responsibilities

On-call Production Support: Provide production support in shifts according to the team's on-call roster
Work on tickets raised by customers and by internal engineering/implementation teams when not on call for production support. For example, a client may ask for data on the database server to be corrected when this cannot be done through the web interface
Work on SRE backlog items
Monitor and Maintain Systems: Continuously monitor the health and performance of our services, systems, and infrastructure. Respond to alerts and incidents promptly to ensure high availability
Automate Processes: Develop and maintain automation scripts and tools to streamline operations and reduce manual intervention
Incident Management: Assist in troubleshooting and resolving incidents, performing root cause analysis, and implementing long-term fixes to prevent recurrence
System Improvements: Participate in designing and implementing system improvements to enhance reliability, scalability, and performance
Collaboration: Work closely with software engineers to understand application requirements, provide feedback on design and architecture, and support deployment and release processes
Documentation: Create and maintain documentation for processes, procedures, and troubleshooting guides to ensure knowledge sharing within the team
Capacity Planning: Assist in capacity planning activities to anticipate future needs and ensure that our infrastructure can handle growth
Security: Implement and adhere to security best practices to protect our systems and data

Qualifications

Site Reliability Engineering · AWS · AI/ML Infrastructure · Linux/Unix Systems · Scripting Languages · Configuration Management · ELK Stack · AI/ML Toolchains · Incident Management · Capacity Planning · Security Best Practices · Root Cause Analysis · Automation · Programming Languages · Certifications · Documentation · Collaboration

Required

5+ years in site reliability engineering, system administration, or a similar role, with a proven track record of managing large-scale, high-availability systems
Expertise in Linux/Unix systems and cloud platforms (AWS, Azure, or Google Cloud)
Strong proficiency in scripting languages (Python, Bash, Ruby) and programming languages (Go, Java, C++)
Familiarity with AI/ML operations, including model lifecycle management, vector databases, and inference performance tuning
Experience with the ELK Stack (Elasticsearch, Logstash, Kibana) for centralized logging, monitoring, and observability
Experience with configuration management tools (Ansible, Chef, Puppet)
Exposure to AI/ML toolchains, including AWS Bedrock, SageMaker, and LLMOps frameworks
Responsible for Granicus information security by appropriately preserving the Confidentiality, Integrity, and Availability (CIA) of Granicus information assets in accordance with the company's information security program
Responsible for ensuring the privacy of our employees, our customers, and their data, and for completing all required privacy training in a timely manner, in accordance with company policies

Preferred

Experience supporting AI/ML infrastructure, including model deployment, inference optimization, and integration with services like AWS Bedrock, is highly desirable
Certifications: Relevant certifications such as AWS Certified DevOps Engineer, AWS Certified Machine Learning – Specialty, Google Cloud Professional DevOps Engineer, or similar are a plus

Benefits

Flexible Time Off – Take the time you need to rest, recharge, and live your life.
Company-Wide Wellbeing Days – Paid days off to unplug and focus on your mental health.
Work From Home Reimbursement – Support a productive home office environment.
Multiple Health Plan Options – Including a 100% employer-paid plan.
Employer HSA Contributions – When enrolled in a High-Deductible Health Plan.
Fitness Reimbursement Program – Stay active, your way.
On-Demand Mental Health Support – Access to Headspace and other wellness tools.
Paid Parental Leave – For both birthing and non-birthing parents.
Traditional & Roth 401(k) – With a generous company match.
Life & AD&D Insurance – 100% employer-paid coverage for peace of mind.
Online Learning Platforms – Fuel your professional development.
Competitive Salary & Bonuses – Your contributions are valued and rewarded.

Company

Granicus

Granicus provides technology that empowers government organizations to create better lives for the people they serve.

H1B Sponsorship

Granicus has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data Powered by US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships
2025 (3)
2023 (2)
2022 (3)
2020 (3)

Funding

Current Stage: Late Stage
Total Funding: $10.3M
Key Investors: JMI Equity
2020-12-17: Private Equity
2016-08-19: Acquired
2014-09-01: Private Equity

Leadership Team

Mark Hynes, CEO
Jordan Copland, Chief Financial Officer
Company data provided by Crunchbase