Site Reliability Engineer (SRE) @ BRAMKAS INC | Jobright.ai
Site Reliability Engineer (SRE) jobs in Virginia, United States
48 applicants · Posted by Agency

BRAMKAS INC · 2 days ago

Site Reliability Engineer (SRE)

Analytics · Cloud Computing

Responsibilities

Work with DevOps teams to build, release, monitor, and run services to improve service reliability.
Write software to automate API-driven tasks at scale and contribute to the product codebase in Java, JS, React, Node, Go, and Python.
Write automation to reduce toil and eliminate repeatable manual tasks.
Work with Ansible, Puppet, Chef, Terraform, or another configuration management / orchestration suite; know where it is broken, work toward fixing it, and explore new alternatives.
Maintain services once they are live by measuring and monitoring availability, latency, and overall system reliability.
Handle cross-team performance issues, from identifying the cause and determining areas of improvement to driving those actions to closure.
Baseline the performance and maturity of DevOps processes, tool maturity and coverage, metrics, technology, and engineering practices.
Define, measure, and improve reliability metrics (SLO/SLI), observability (monitoring, logging, and tracing solutions), and Ops processes (incident and problem management); streamline and automate release management.
Build dashboards to provide visibility into application performance.
Understand the current process and system setup, and propose the process and technology improvements needed for the application to exceed the desired Service Level Objective.
Be a strong believer in automation as a driver of sustained continuous improvement: automate toil and runbooks, and improve the applications' ability to auto-heal, leading to improved reliability.

Qualifications

Development Operations · SRE · Coding · Python · Golang · Java · Bash · Observability · Chaos Engineering · APM · New Relic · AWS · Google Cloud Platform · Configuration Management · Ansible · SaltStack · Terraform · CloudFormation · PagerDuty · SLIs · SLOs · Incident Management · Agile · Lean · DevOps · Infrastructure Management · Service Ownership · Stakeholder Management · Problem-Solving · Communication

Required

5+ years of Development and Operations experience building and running production applications with uptime over 99%
3-5 years of experience as an SRE handling web-scale applications
Strong hands-on coding experience in one or more programming languages such as Python, Golang, Java, Bash, etc.
Good understanding of observability (monitoring, logging, tracing, metrics) and chaos engineering concepts
Proficiency in using the Application Performance Monitoring (APM) tool New Relic for monitoring, logging, and tracing
Expert-level hands-on knowledge of the public cloud platforms AWS and/or Google Cloud Platform
Must have hands-on experience in using configuration management systems such as Ansible or SaltStack and infrastructure automation tools like Terraform or CloudFormation
Should have used alerting systems such as PagerDuty
Should have implemented solutions around Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for services
Should have supported Production Incidents (PIs) on a company's critical applications
Troubleshoot, debug, and diagnose operational issues and drive them to closure
Understanding of software delivery life cycles, particularly Agile/Lean & DevOps
Proven experience in handling large-scale and growing infrastructure across data centers and heterogeneous cloud platforms
Experience as a service owner managing a large, geographically diverse set of stakeholders
Ability to work with a creative, fast-growing engineering team and motivate them to deliver their best work
History of driving innovation
Bachelor’s/Master’s Degrees

Preferred

Professional-level certification on one of the public clouds is highly desirable
Familiarity with: containerization (Kubernetes, Docker, Rancher, etc.); Kafka, YARN, Elasticsearch, etc.; source code management; and implementation of security best practices
Networking knowledge
Contribution to open source community
Tech stack: Python, Falcon, Elasticsearch, MongoDB, AWS (SQS, S3), MapReduce

Company

BRAMKAS INC

Funding

Current Stage: Early Stage
Company data provided by Crunchbase