DH ACCOUNTING SERVICES LTD · 6 hours ago
Site Reliability Engineer
Accounting
Responsibilities
Ensuring system reliability: SREs are responsible for ensuring the reliability, scalability, and performance of the organization's systems. They monitor the systems to ensure that they are running smoothly and intervene when issues arise.
Automating processes: SREs develop and maintain automation tools and processes to reduce manual work and improve system efficiency.
Managing infrastructure: SREs manage the infrastructure, including servers, networks, databases, and storage systems. They ensure that the infrastructure is reliable, scalable, and secure.
Incident management: SREs are responsible for responding to and resolving incidents, such as system outages or performance issues.
Capacity planning: SREs analyze system usage patterns and develop capacity plans to ensure that the organization's systems can handle expected traffic and usage.
Continuous improvement: SREs work to continually improve system reliability, scalability, and performance. They analyze data to identify areas for improvement and implement changes to address any issues.
Collaboration: SREs work closely with development teams, operations teams, and other stakeholders to ensure that the organization's systems are reliable and meet business needs.
Qualifications
Required
Understanding of most Azure resources and related services
Understanding of core AWS resources
Kubernetes
Infrastructure-as-Code using ARM Templates and Bicep
Grafana
Elasticsearch
C# / .NET
Python
Networking
DevOps / GitHub
Azure CLI
SQL
MongoDB
Bicep
Bash/Shell Scripting
Preferred
Understanding of relational and NoSQL databases, including replication, scaling, and backup strategies.
Knowledge of designing and managing virtual networks, subnets, and network security groups.
Configuring and managing load balancing and traffic routing.
Expertise in infrastructure-as-code (IaC) for provisioning resources. Bicep/ARM a plus.
Ability to manage and scale containerized applications using Kubernetes.
Experience monitoring the performance, availability, and health of applications and infrastructure.
Collect and analyze log data from various resources for troubleshooting and insights.
Integrating and managing application performance monitoring for distributed applications.
Familiarity with Prometheus and Grafana for enhanced monitoring, alerting and observability.
Experience in infrastructure as code (IaC) for provisioning and managing resources.
Understanding of secret management, key encryption, and certificate management.
Understanding of cloud infrastructure and application performance, including optimizing SQL queries, storage, and compute resources.
Experience in backup strategies and disaster recovery solutions.
Ability to diagnose incidents and identify root causes for system failures or performance degradation.
Ability to respond to critical incidents and participate in on-call rotations to ensure system availability.
Automate resource management and day-to-day tasks using scripting tools such as Python, PowerShell, and the Azure CLI.
Write infrastructure-as-code to deploy and manage resources consistently and reliably.
Expertise in managing containers and container orchestration with Kubernetes.
Benefits
Generous health, dental, and vision benefits
401(k) plan that vests immediately
Company
DH ACCOUNTING SERVICES LTD
Funding
Current Stage
Early Stage
Company data provided by Crunchbase