Sr. Site Reliability Engineer (SRE) (Remote) jobs in United States

Talent 360 ME · 1 month ago

Sr. Site Reliability Engineer (SRE) (Remote)

Talent 360 ME is a rapidly growing B2B FinTech company transforming expense management for businesses in Saudi Arabia. They are seeking a Senior Site Reliability Engineer (SRE) to take ownership of the reliability, performance, and scalability of their production systems, focusing on designing, automating, and operating mission-critical environments.

Consulting · Human Resources · Staffing Agency · Training

Responsibilities

Maintain and evolve multi-region cloud infrastructure using Terraform-based Infrastructure as Code (IaC)
Operate and optimize Kubernetes (OKE) clusters running microservices, data pipelines, and workflow orchestration
Manage SQL Server backup/restore pipelines, DR testing, and performance optimization
Ensure high availability for .NET and Python applications hosted behind load balancers and WAF
Design and maintain cross-network connectivity (DRGs, LPGs, VCNs, subnets, and NSGs)
Build and maintain a centralized orchestration platform integrated with alerting and notification systems
Develop self-healing, monitoring, and auto-remediation scripts for infrastructure and databases
Implement logging, metrics, and tracing pipelines
Automate recurring operational tasks using Python, Bash, and PowerShell to reduce manual effort and improve reliability
Manage GitHub Actions and Octopus Deploy pipelines for backend and data services
Apply strong security principles — least privilege, network segmentation, secure credentials, and encrypted communications
Promote GitOps and Infrastructure-as-Code practices to ensure repeatable and traceable deployments
Collaborate with developers to embed reliability and resilience into every release
Lead incident response, run blameless post-mortems, and turn findings into lasting improvements
Partner closely with engineering teams to drive design and code-level reliability improvements
Conduct capacity planning, cost optimization, and system tuning for performance and scalability
Mentor engineers in automation, observability, and root-cause analysis best practices

Qualifications

Site Reliability Engineering · DevOps · Infrastructure Engineering · Oracle Cloud Infrastructure · Infrastructure as Code · Active Directory · Linux Administration · Kubernetes · CI/CD Pipelines · Microsoft SQL Server · PostgreSQL · Networking Fundamentals · Python · Bash · PowerShell · Troubleshooting · Communication

Required

5+ years of experience in Site Reliability, DevOps, or Infrastructure Engineering
Experience working in regulated environments (FinTech, banking, compliance-driven systems) is a MUST
Experience managing Active Directory / Domain Controllers, including authentication (SSO/MFA), policies, integrations, and troubleshooting, is a MUST
Strong hands-on experience with Oracle Cloud Infrastructure (OCI) is a MUST
Strong Linux system administration skills (performance tuning, storage, networking, systems, troubleshooting)
Solid networking fundamentals (TCP/IP, DNS, routing, load balancing)
Hands-on experience with databases, specifically Microsoft SQL Server and PostgreSQL
Experience administering and troubleshooting Kubernetes workloads (pods, jobs, networking, secrets)
Strong experience with Infrastructure as Code, preferably Terraform
Experience building and maintaining CI/CD pipelines and automation workflows
Comfortable supporting production systems, including incident response and troubleshooting under pressure

Company

Talent 360 ME

Talent 360 is a people management solutions company operating across the MENA region, with offices in Egypt and Saudi Arabia.

Funding

Current Stage: Growth Stage
Total Funding: Unknown
Key Investors: C.Star Venture Studio
2025-01-21 · Series Unknown
Company data provided by Crunchbase