Innova Solutions · 21 hours ago
Site Reliability Engineer - Production Support
Innova Solutions is a global technology and business transformation solutions provider, and it is seeking a Site Reliability Engineer to provide end-to-end support of production applications. The role involves ensuring the stability, reliability, and performance of applications while collaborating with various teams to implement fixes and enhancements.
Artificial Intelligence (AI) · Business Information Systems · Cloud Data Services · Consulting · Cyber Security · Information Technology · Infrastructure · IT Management · Machine Learning · Software
Responsibilities
Provide end-to-end support of production applications, ensuring their stability, reliability, and performance
Conduct in-depth problem analysis of applications and troubleshoot system errors and performance issues
Perform proactive application stability analysis to investigate performance concerns, system errors, and improvement opportunities
Take ownership of problem root cause analysis and implement appropriate remediation
Lead chronic issue investigations to minimize business impact and maintain system health
Drive proactive monitoring review and implementation strategies
Collaborate with Engineering, Application Development, and Infrastructure teams to implement break fixes, code updates, configuration changes, and production enhancements
Handle application management, business continuity, server patching coordination, vulnerability remediation, and Splunk monitor setup
Lead production incident triage calls
Provide production on-call support, including weekend rotation on a round-robin basis
Execute disaster recovery procedures and strategies
Respond to and resolve production tickets promptly to meet SLA requirements
Qualifications
Required
Expertise in Unix systems
Strong experience with OpenShift containers and Apache Kafka
Strong understanding of CI/CD pipelines
Proficiency in scripting languages such as Python or Bash for automation
Experience with monitoring tools like Dynatrace and Splunk to maintain platform reliability
Demonstrated ability to contribute to automation efforts for improved operational efficiency
Bachelor's degree in Computer Science, Information Technology, or a related field
Proven experience in a production support or similar role, preferably in a CI/CD platform environment
Availability for on-call support, including weekend rotations, on a round-robin basis
7+ years of experience in production operations
Familiarity with database platforms such as Oracle or PostgreSQL
Experience in end-to-end production support and automation using these specific tools: Red Hat Ansible Automation Platform for configuration management, CloudBees Jenkins and XL Release for CI/CD orchestration, and Bitbucket for version control
Hands-on experience with JFrog Artifactory for artifact repository management, including vulnerability scanning and compliance enforcement, is a must
The environment is built on Unix/Linux systems, requiring strong command-line expertise as well as experience with patching, server ownership, and scripting in Python or Bash
Monitoring and observability are driven by Splunk and Dynatrace to ensure platform reliability
OpenShift containers are used for deployment and Apache Kafka for distributed messaging; experience with both is needed
Experience with Oracle and PostgreSQL databases, F5 Load Balancer, GTM, and high availability/disaster recovery architectures is essential
Benefits
Medical & pharmacy coverage
Dental/vision insurance
401(k)
Health savings account (HSA)
Flexible spending account (FSA)
Life insurance
Pet insurance
Short-term and long-term disability
Accident & critical illness coverage
Pre-paid legal & ID theft protection
Sick time
Other types of paid leave (as required by law)
Employee Assistance Program (EAP)
Company
Innova Solutions
Innova Solutions is an IT company that specializes in data analytics, cloud computing, cyber security, and machine learning.
H-1B Sponsorship
Innova Solutions has a track record of offering H-1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart; highlights the job field similar to this job)
Trends of Total Sponsorships
2025 (452)
2024 (399)
2023 (290)
2022 (78)
2021 (36)
2020 (98)
Funding
Current Stage: Late Stage
Total Funding: unknown
2012-01-01: Private Equity
Company data provided by Crunchbase