Randstad Digital Americas · 17 hours ago
Site Reliability Engineer
Randstad Digital Americas is seeking a Site Reliability Engineer to manage and support highly distributed multi-tiered systems. The role involves performing root cause analysis, chaos testing, and automating day-to-day activities using various tools and programming languages.
Information Technology & Services
Responsibilities
Ability to triage, complete root cause analysis, and be decisive under pressure
Experience managing and interpreting large datasets using query languages and visualization tools
Strong communication skills, with the ability to reach both technical and non-technical audiences
Proven experience performing chaos testing to build confidence in the system's capability to withstand turbulent conditions in production
Strong understanding of API testing tools (SoapUI, Postman)
Experience managing systems using infrastructure-as-code tools (IAM, ARM, Terraform, Chef)
Manage a large fleet of on-premises servers, including security and patching oversight
Manage hundreds of SSL certificates across all in-scope applications
Use Ansible and Python to automate day-to-day activities; web development with Django and JavaScript
Qualification
Required
Bachelor's degree or higher in a technology-related field (e.g., Engineering, Computer Science), or equivalent experience, required; Master's degree a plus
5-8+ years of hands-on experience deploying and/or supporting highly distributed multi-tiered systems at scale
Hands-on experience with public cloud environments, preferably AWS and Azure; certifications a plus
Exposure to basic OS-level scripting languages such as Korn shell, Bash, and JScript
Experience with container orchestration, preferably with Kubernetes
Experience operating and implementing distributed, highly concurrent service-based systems
Ability to troubleshoot application issues on Unix/Linux involving J2EE, WebSphere, Tomcat, and SQL
Familiarity with ITIL processes such as incident, change, and problem management
Ability to balance planned delivery with ad hoc workloads and re-evaluate priorities as needed
Solid understanding of cloud computing and DevOps concepts, including CI/CD pipelines
Hands-on experience with one or more observability tools (Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, etc.)
Use Datadog, Catchpoint, Splunk, and Grafana for application observability and for monitoring applications and infrastructure
Experience with instrumentation, and with building and operating monitoring, logging, and alerting services for distributed systems at scale
Proven experience maintaining the scalability and resiliency of complex environments
Proven experience in implementing advanced observability practices and techniques at scale
Provide enterprise cloud and platform engineering support for production environments, and participate in an on-call rotation to resolve issues
Experience with cloud development and migration (AWS and Azure), and with building and operating highly resilient platforms in public cloud environments
Benefits
Medical
Prescription
Dental
Vision
AD&D
Life insurance offerings
Short-term disability
401K plan
Company
Randstad Digital Americas
Randstad Digital is a trusted digital enablement partner that facilitates accelerated transformation for businesses by providing global talent, capacity, and solutions across specialized domains.