DYNE
Senior Java SRE
DYNE is seeking a Senior Java Site Reliability Engineer (SRE) to design, build, and operate highly resilient, low-latency, enterprise-scale systems supporting core banking, payments, and trading platforms. This role requires deep expertise in Java microservices, Kubernetes, and AWS cloud infrastructure, with a focus on ensuring reliability, scalability, and production excellence.
Information Technology & Services
Responsibilities
Design, implement, and operate highly available, fault-tolerant, and scalable systems for mission-critical financial platforms
Lead SRE practices including SLIs, SLOs, error budgets, and reliability-driven engineering decisions
Provide L3/L4 production support, including incident management, root cause analysis (RCA), and post-incident remediation
Drive continuous improvement through blameless postmortems and operational excellence initiatives
Support and optimize Java-based microservices, including JVM internals, GC tuning, and performance optimization
Operate and scale workloads on Kubernetes (EKS) across multi-cluster environments
Implement and manage AWS services including EC2, EKS, IAM, VPC, RDS, DynamoDB, S3, and CloudWatch
Design and maintain zero-downtime deployment strategies and robust disaster recovery (DR) architectures
Build and manage infrastructure using Terraform and infrastructure-as-code best practices
Automate operational workflows using Python, Go, Bash, and cloud-native tooling
Architect and maintain enterprise-grade CI/CD pipelines using GitLab CI/CD, Jenkins, and Kubernetes-native integrations
Manage Kubernetes networking, storage, and ingress using Nginx Controller, Seesaw, and advanced networking patterns
Implement and operate service mesh solutions including Istio and Anthos Service Mesh
Design and manage Kubernetes storage solutions using Portworx
Support multi-cluster Kubernetes environments, including federation and cross-cluster communication
Implement monitoring, logging, and alerting using Prometheus, Datadog, Splunk, Kiali, and custom dashboards
Utilize eBPF for deep kernel-level observability, performance analysis, and system tuning
Optimize latency, throughput, and scalability under high-frequency transaction loads
Support real-time data platforms using Kafka, Kafka Streams, ksqlDB, and Spark Streaming
Ensure reliability and performance of streaming pipelines in high-volume, low-latency environments
Enforce banking-grade security controls, IAM policies, secrets management, and least-privilege access
Support platforms aligned with SOC 2, PCI-DSS, SOX, and internal banking security standards
Participate in regulatory audits, risk assessments, and compliance reviews
Participate in 24×7 on-call rotations, including nights and weekends, supporting U.S. time zones
Act as a senior escalation point during major incidents and platform outages
Qualifications
Required
15+ years of experience
Deep expertise across Java microservices, Kubernetes, AWS cloud infrastructure, and SRE best practices
Hands-on responsibility for reliability, scalability, and production excellence in high-transaction environments
Operate at the L3/L4 production support level
Lead reliability engineering initiatives
Work closely with platform, application, and security teams to ensure zero-downtime, compliance-aligned operations
Java: JVM internals, GC tuning, microservices architecture
Cloud: AWS (EKS, EC2, IAM, VPC, RDS, CloudWatch)
Container Orchestration: Kubernetes (CKA/CKS-level depth), Docker
Infrastructure as Code: Terraform
CI/CD: GitLab CI/CD, Jenkins
Streaming Platforms: Kafka, ksqlDB, Kafka Streams, Spark Streaming
Service Mesh: Istio, Anthos Service Mesh
Observability: Prometheus, Datadog, Splunk, Kiali
OS & Scripting: Linux/Unix, Bash
Programming: Python and/or Go
Virtualization: VMware
Networking & Performance: Nginx Controller, Seesaw, eBPF
Experience supporting core banking systems, payment gateways, or trading platforms
Exposure to high-frequency, high-volume transaction environments
Proven experience with zero-downtime deployments, high availability, and disaster recovery
Strong understanding of regulatory audits and financial compliance controls
AWS Certified Solutions Architect – Professional or AWS Certified DevOps Engineer – Professional
Certified Kubernetes Administrator (CKA) or Certified Kubernetes Security Specialist (CKS)
Company
DYNE
As its name suggests, DYNE stands for Delivering Your Needs Efficiently. DYNE is an Information Technology and Services company providing IT consulting, IT resourcing, business analysis, project and program management, test automation, QA testing, data center maintenance, bespoke software development, and cloud software services. DYNE is a fast-growing, trusted brand in the IT world, known for delivering quality and added value to clients and candidates.
Funding
Current Stage: Growth Stage
Company data provided by Crunchbase