Peraton · 17 hours ago

Senior AWS Cloud Site Reliability Engineer (SRE)

Peraton is a next-generation national security company that drives missions of consequence. They are seeking a Senior AWS Cloud Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of their cloud infrastructure on Amazon Web Services (AWS). This role involves collaborating with cross-functional teams to automate infrastructure, monitor systems, and improve release processes.
Robotics · Information Technology
No H1B · Security Clearance Required · U.S. Citizen Only

Responsibilities

Infrastructure Automation: Design, implement, and manage infrastructure as code (IaC) solutions using tools like AWS CloudFormation, Terraform or Helm Charts to automate continuous database deployment and scaling processes. Collaborate with development teams to integrate continuous deployment practices and ensure the reliability of applications and databases
Monitoring and Alerting: Implement robust monitoring and alerting systems to proactively identify and address potential issues before they impact system performance. Analyze system metrics, logs, and alerts to troubleshoot and resolve issues promptly
Performance Optimization: Conduct performance analysis and optimization of AWS infrastructure components to enhance system efficiency and reduce latency. Identify and implement improvements to enhance system reliability and resilience
Incident Response: Participate in on-call rotations to respond to and resolve incidents promptly. Conduct post-incident reviews to identify root causes and implement preventive measures
Security and Compliance: Work closely with security teams to implement and enforce best practices for securing AWS environments. Ensure compliance with industry standards and regulations related to cloud infrastructure
Communication: Facilitate clear communication across teams, providing updates on release status, known issues, and any potential impact on stakeholders. Coordinate communication of release schedules and changes to all relevant parties
Release Planning and Coordination: Collaborate with development, QA, and operations teams to plan and coordinate database schema releases. Define release scope, schedule, and dependencies to ensure timely and smooth deployments. Create and submit change records as required for process and audit compliance. Participate in Technical Change Advisory and Review boards as required
Release Automation: Develop and maintain automated deployment pipelines using industry-standard tools such as GitLab CI/CD, Liquibase, or similar. Automate and streamline release processes to improve efficiency and reduce manual errors
Continuous Improvement: Proactively identify areas for process improvement within the release management lifecycle. Implement feedback loops to capture lessons learned from each release and apply improvements iteratively. Stay up to date with industry best practices, emerging technologies, and trends related to database automation
Quality Assurance: Collaborate with QA teams to establish and execute release validation procedures. Ensure releases are thoroughly tested and meet quality standards before deployment. Drive continuous improvement by analyzing release management trends, identifying recurring issues, and working with teams to implement solutions

Qualifications

AWS services · Infrastructure as Code · Relational databases · CI/CD tools · Programming languages · Containerization tools · Monitoring tools · Agile methodologies · Problem-solving skills · Communication skills · Collaboration skills · Attention to detail

Required

Bachelor's Degree and 8 years of experience or 12 years of experience and a HS Degree/Diploma
Proven experience as a Site Reliability Engineer or in a similar role, with a strong emphasis on relational databases
In-depth knowledge of AWS services like RDS and DynamoDB and expertise in managing cloud infrastructure
Advanced level programming and/or scripting in 3 or more of the following languages: Python, Java, Chef, Helm, Playwright, Bash, JavaScript, Terraform
Strong understanding of DevOps principles and continuous integration/continuous deployment (CI/CD) pipelines
Proficiency in CI/CD tools such as GitLab CI/CD, Liquibase, or others
Familiarity with infrastructure as code (IaC) tools like CloudFormation, Terraform, Helm Charts, or similar technologies
Hands-on experience with version control systems (GitLab, GitHub, AWS CodeCommit) and branching strategies
Experience with containerization and orchestration tools (e.g., Amazon Elastic Container Service (ECS), Amazon Elastic Kubernetes Service (EKS), Docker, Kubernetes)
Familiarity with monitoring tools (e.g., CloudWatch, Prometheus, Grafana, Datadog) and log analysis
Attention to detail, with a focus on maintaining high-quality software releases
Solid understanding of Agile methodologies and their application in release management
Excellent problem-solving and troubleshooting skills
Strong communication and collaboration skills
Must be a US Citizen
Must be able to obtain and maintain the required agency clearance (6C Public Trust)

Preferred

Relevant certifications in DevOps or related fields are a plus
High Risk Public Trust or Secret Clearance preferred
3 or more years in SRE or Platform Engineering group for high availability/critical platforms/applications
2 or more years managing relational databases

Benefits

Medical
Dental
Vision
Life
Health savings account
Short/long term disability
EAP
Parental leave
401(k)
Paid time off (PTO) for vacation
Company paid holidays

Company

Peraton: Fearlessly solving the toughest national security challenges.

Funding

Current Stage
Late Stage

Leadership Team

Thomas Terjesen
Chief Information Officer
Company data provided by crunchbase