Blu Omega · 1 week ago
SWE Site Reliability Engineer
Cloud Infrastructure, Consulting
Responsibilities
Design, write and deliver software and automation to dramatically improve the availability, scalability, latency, and efficiency of infrastructure
Improve system design and architecture to ensure high stability and performance of services across multiple global data centers
Manage operations of data services and real-time/batch data pipelines, including SLA management, system deployment, performance tuning, on-call duty, and troubleshooting
Perform lifecycle management of production systems including change management, service deployment, operations and emergency response.
Provide strong support during major events to ensure the system can handle large volumes of Internet traffic
Manage infrastructure services, with responsibilities including but not limited to deployment, operation, and troubleshooting
Work with the team to establish service level objectives (SLOs) and monitor systems to ensure the objectives are met
Continually improve cloud operations automation and tooling to monitor and maintain enterprise cloud-based infrastructure
Execute automation for known cloud-operations tasks, and create new automation for new situations or issues you encounter; automate everything
Facilitate blame-free root cause analysis meetings after a production incident so that the team can learn from mistakes and improve our systems and runbooks
Be vigilant about security and adhere to best practices to secure our cloud infrastructure and real-time platform
Qualifications
Required
Ability to code in Python
Linux Admin (System Administration & Network Configuration)
Debugging & Troubleshooting of production performance issues (Application and Infrastructure)
Knowledge of message queues (MQ), e.g., Kafka, RabbitMQ
Kubernetes Administration
CI/CD Tooling & DevOps Automation
Preferred
Shell Scripting
Knowledge of Containers
Exposure to distributed systems, e.g., Consul, ZooKeeper, MongoDB
Knowledge of SaltStack
Experience with monitoring tools such as Grafana and Prometheus
Mandarin (strong preference, but not a requirement)
Benefits
Health Insurance
401K w/ match
Paid Time Off