Site Reliability Engineer @ Team Remotely Inc | Jobright.ai
Site Reliability Engineer jobs in Wilmington, DE
Fewer than 25 applicants. Posted by agency.
This job has closed.

Team Remotely Inc · 3 days ago

Site Reliability Engineer

Staffing and Recruiting

Responsibilities

Apply SRE principles to maintain the reliability, availability, and performance of software systems.
Automate deployment processes, configuration management, and CI/CD pipelines to streamline software development and delivery.
Plan and assist with the migration of Windows and Linux-based machines to containerized machines.
Plan and assist with the overall Disaster Recovery (DR) of the infrastructure and operations (InfraOps).
Manage and maintain software infrastructure, ensuring proper configuration, security, and scalability.
Perform system administration tasks, monitor system performance, troubleshoot issues, and apply necessary fixes.
Act as a versatile problem solver, filling gaps in team knowledge and expertise to ensure smooth and efficient software operations.
Facilitate smooth team and project transitions, providing guidance, training, and support for development teams to manage their infrastructure independently.
Develop a reliability rating system to assess team and project performance, collecting and analyzing metrics to evaluate adherence to best practices.
Respond quickly and effectively to critical incidents, conducting post-incident reviews to identify root causes and implement preventive measures.
Develop and maintain automation tools and scripts to improve operational efficiency.
Identify performance bottlenecks and implement optimizations to enhance system response times and resource utilization.
Stay up to date with the latest industry trends, technologies, and best practices related to SRE, DevOps, and infrastructure management.
Collaborate effectively with cross-functional teams and communicate technical concepts and recommendations clearly to both technical and non-technical stakeholders.
Implement a reliability-based release management process, allowing teams with higher reliability scores to perform quick and frequent releases.
Proactively identify potential issues and implement preventive measures to reduce incidents and outages.
Implement observability practices to detect abnormal behaviors in the software and collect information for effective problem resolution.
Set and monitor critical metrics to gain insights into system reliability, including latency, traffic, errors, and saturation levels.
Establish Service-Level Objectives (SLOs) and measure Service-Level Indicators (SLIs) to assess the quality and reliability of service delivery.
Plan, participate in, and manage on-call rotations to ensure prompt response to reported software issues.
Utilize incident response tools to categorize the severity of reported cases and handle them promptly.
Implement configuration management tools to automate software workflows and enhance team productivity.
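
As an illustration of how the SLO/SLI and reliability-based release items above could fit together, here is a minimal Python sketch. It is not taken from the posting: the `ServiceWindow` shape, the function names, the 99.9% target, and the cadence thresholds are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ServiceWindow:
    """Request counts for one service over a measurement window (hypothetical shape)."""
    total_requests: int
    failed_requests: int


def availability_sli(window: ServiceWindow) -> float:
    """SLI: fraction of requests that succeeded in the window."""
    if window.total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return 1 - window.failed_requests / window.total_requests


def error_budget_remaining(window: ServiceWindow, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed_failures = (1 - slo_target) * window.total_requests
    if allowed_failures == 0:
        # No budget to spend: any failure overspends it.
        return 1.0 if window.failed_requests == 0 else float("-inf")
    return 1 - window.failed_requests / allowed_failures


def release_cadence(budget_remaining: float) -> str:
    """Map remaining budget to a release policy; thresholds are illustrative only."""
    if budget_remaining >= 0.5:
        return "frequent"
    if budget_remaining > 0:
        return "normal"
    return "frozen"


if __name__ == "__main__":
    window = ServiceWindow(total_requests=1_000_000, failed_requests=400)
    budget = error_budget_remaining(window, slo_target=0.999)  # assumed 99.9% SLO
    print(availability_sli(window), budget, release_cadence(budget))
```

Under these assumptions, a team that has spent less than half of its error budget would be allowed frequent releases, while a team that has overspent it would be frozen until reliability recovers.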

Qualifications

Site Reliability Engineering, Troubleshooting, Automation, Deployment Processes, Configuration Management, CI/CD Pipelines, Migration, Disaster Recovery, Infrastructure Management, System Administration, Monitoring, Scripting, Performance Optimization, DevOps, Release Management, Incident Response, Metrics Monitoring, Service-Level Objectives, On-Call Rotations, Configuration Management Tools, Problem-Solving, Communication, Collaboration, Observability Practices, Cloud Infrastructure Design, Incident Response Design, Automation Tools Development, System Performance Analysis, Resource Optimization, Service Quality Assessment

Required

1 year of experience as a Site Reliability Engineer
Experience troubleshooting and resolving technical issues
Experience with automation of deployment processes, configuration management, and CI/CD pipelines
Experience with migration of Windows and Linux-based machines to containerized machines
Experience with Disaster Recovery (DR) of infrastructure and operations (InfraOps)
Experience in managing and maintaining software infrastructure with proper configuration, security, and scalability
Experience in system administration, monitoring system performance, troubleshooting issues, and applying fixes
Experience in developing and maintaining automation tools and scripts for operational efficiency
Knowledge of performance optimization and system response time enhancements
Knowledge of industry trends, technologies, and best practices related to SRE, DevOps, and infrastructure management
Ability to collaborate effectively with cross-functional teams and communicate technical concepts to both technical and non-technical stakeholders
Experience in implementing reliability-based release management processes
Experience in incident response, post-incident reviews, and preventive measures implementation
Experience in setting and monitoring critical metrics for system reliability
Experience in establishing Service-Level Objectives (SLOs) and measuring Service-Level Indicators (SLIs)
Experience in managing on-call rotations and utilizing incident response tools
Experience in implementing configuration management tools for software workflows

Preferred

Experience in implementing observability practices for software issue detection and resolution
Experience in designing and maintaining reliable and scalable cloud infrastructure
Experience in designing incident response procedures and post-incident review processes
Experience in developing automation tools to improve team productivity
Experience in analyzing system performance metrics and optimizing resources
Experience in defining SLOs and SLIs to assess service quality and reliability
Experience in configuring and maintaining software workflows using configuration management tools

Company

Team Remotely Inc

Looking for a job at Team Remotely? Visit teamremotely.com & apply! Redefine Your Hiring Strategy.

Funding

Current Stage
Early Stage
Company data provided by crunchbase