Articul8 AI · 1 month ago
Senior Site Reliability Engineer (SRE) - (Dublin, CA)
Articul8 AI is at the forefront of Generative AI innovation, delivering cutting-edge SaaS products that transform how businesses operate. We are seeking an experienced Site Reliability Engineer (SRE) to join our team and help ensure the reliability, performance, and scalability of our GenAI SaaS platform.
Artificial Intelligence (AI) · Enterprise Software · Generative AI · Software
Responsibilities
Architect and maintain scalable, highly available infrastructure for our GenAI platform
Design and implement robust monitoring, alerting, and observability solutions to proactively ensure system health and performance
Automate deployment, scaling, and management of our cloud-native infrastructure, reducing toil and improving efficiency
Define, measure, and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to deliver outstanding service quality
Participate in on-call rotations and provide rapid response to production incidents, minimizing downtime and user impact
Collaborate closely with development teams to build reliable, scalable, and efficient systems for complex AI workloads
Lead incident response efforts, conduct thorough post-mortems, and champion continuous improvement initiatives
Optimize infrastructure for performance, scalability, and cost-effectiveness—especially for high-demand AI workloads
Implement and enforce security best practices across all systems and environments
Create and maintain comprehensive documentation, including runbooks and knowledge base articles, to foster a culture of shared knowledge
Qualifications
Required
Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience
8+ years of experience in DevOps, SRE, or similar roles
Strong experience with cloud platforms (AWS, GCP, or Azure)
Proficiency in at least one programming/scripting language (Python, Go, Bash, etc.)
Hands-on experience with infrastructure as code tools (Terraform, CloudFormation, etc.)
Solid background in containerization technologies (Docker, Kubernetes)
Proven experience with monitoring and observability tools (Prometheus, Grafana, ELK stack, etc.)
Strong understanding of CI/CD pipelines and automation
Exceptional troubleshooting and problem-solving skills, including the ability to diagnose issues in complex systems
Preferred
Experience supporting AI/ML systems in production
Knowledge of GPU infrastructure management and optimization
Familiarity with distributed systems and high-performance computing
Experience with database systems (SQL and NoSQL)
Certifications in cloud platforms (AWS, GCP, Azure)
Experience with chaos engineering and resilience testing
Knowledge of security best practices and compliance requirements
Company
Articul8 AI
Articul8 AI is a technology company whose products transform enterprise data and expertise into powerful engines of growth, value and impact.
H1B Sponsorship
Articul8 AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart; the highlighted category represents job fields similar to this job)
Trends of Total Sponsorships: 2025 (2)
Funding
Current Stage: Growth Stage
Total Funding: $75M
Key Investors: Adara Ventures, Amazon Web Services, DigitalBridge
2026-01-07 · Series B · $35M
2025-11-12 · Non Equity Assistance
2024-01-03 · Series A · $40M
Company data provided by Crunchbase