
Articul8 AI · 1 month ago

Senior Site Reliability Engineer (SRE) - (Dublin, CA)

Articul8 AI is at the forefront of Generative AI innovation, delivering cutting-edge SaaS products that transform how businesses operate. We are seeking an experienced Site Reliability Engineer (SRE) to join our team and help ensure the reliability, performance, and scalability of our GenAI SaaS platform.

Artificial Intelligence (AI) · Enterprise Software · Generative AI · Software
H1B Sponsor Likely

Responsibilities

Architect and maintain scalable, highly available infrastructure for our GenAI platform
Design and implement robust monitoring, alerting, and observability solutions to proactively ensure system health and performance
Automate deployment, scaling, and management of our cloud-native infrastructure, reducing toil and improving efficiency
Define, measure, and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to deliver outstanding service quality
Participate in on-call rotations and provide rapid response to production incidents, minimizing downtime and user impact
Collaborate closely with development teams to build reliable, scalable, and efficient systems for complex AI workloads
Lead incident response efforts, conduct thorough post-mortems, and champion continuous improvement initiatives
Optimize infrastructure for performance, scalability, and cost-effectiveness, especially for high-demand AI workloads
Implement and enforce security best practices across all systems and environments
Create and maintain comprehensive documentation, including runbooks and knowledge base articles, to foster a culture of shared knowledge

Qualifications

Cloud platforms: AWS, GCP, Azure
Infrastructure as code: Terraform, CloudFormation
Containerization: Docker, Kubernetes
Monitoring tools: Prometheus, Grafana, ELK
Programming/scripting: Python, Go, Bash
CI/CD pipelines
Troubleshooting skills
AI/ML systems support
GPU infrastructure management
Distributed systems knowledge
Database systems: SQL, NoSQL
Cloud certifications
Chaos engineering
Security best practices

Required

Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience
8+ years of experience in DevOps, SRE, or similar roles
Strong experience with cloud platforms (AWS, GCP, or Azure)
Proficiency in at least one programming/scripting language (Python, Go, Bash, etc.)
Hands-on experience with infrastructure as code tools (Terraform, CloudFormation, etc.)
Solid background in containerization technologies (Docker, Kubernetes)
Proven experience with monitoring and observability tools (Prometheus, Grafana, ELK stack, etc.)
Strong understanding of CI/CD pipelines and automation
Exceptional troubleshooting and problem-solving skills, including the ability to diagnose issues in complex systems

Preferred

Experience supporting AI/ML systems in production
Knowledge of GPU infrastructure management and optimization
Familiarity with distributed systems and high-performance computing
Experience with database systems (SQL and NoSQL)
Certifications in cloud platforms (AWS, GCP, Azure)
Experience with chaos engineering and resilience testing
Knowledge of security best practices and compliance requirements

Company

Articul8 AI

Articul8 AI is a technology company whose products transform enterprise data and expertise into powerful engines of growth, value and impact.

H1B Sponsorship

Articul8 AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships: 2025 (2)

Funding

Current Stage: Growth Stage
Total Funding: $75M
Key Investors: Adara Ventures, Amazon Web Services, DigitalBridge
2026-01-07 · Series B · $35M
2025-11-12 · Non Equity Assistance
2024-01-03 · Series A · $40M

Leadership Team

Arun Subramaniyan
Founder & CEO
Company data provided by Crunchbase