Site Reliability Engineer, Cloud Platform @ Qualys | Jobright.ai
Site Reliability Engineer, Cloud Platform jobs in Raleigh, NC

Qualys · 4 days ago

Site Reliability Engineer, Cloud Platform


Responsibilities

Co-develop and participate in the full lifecycle of cloud platform services, from inception and design through deployment, operation and improvement, by applying scientific principles.
Increase the effectiveness, reliability and performance of cloud platform technologies by identifying and measuring key indicators, making changes to the production systems in an automated way and evaluating the results.
Support the cloud platform team before technologies are released to production through activities such as system design, capacity planning, automation of key deployments, building a strategy for production monitoring and alerting, and participating in the testing/verification process.
Ensure that the cloud platform technologies are maintained properly by measuring and monitoring availability, latency, performance and system health.
Advise the cloud platform team on improving the reliability of the systems in production and scaling them based on need.
Participate in the development process by supporting new features, services and releases, and hold an ownership mindset for the cloud platform technologies.
Develop tools and automate processes for large scale provisioning and deployment of cloud platform technologies.
Participate in the on-call rotation for cloud platform technologies. During incidents, lead the incident response and contribute to detailed postmortem reports that are brutally honest and blameless.
Propose improvements and drive efficiencies in systems and processes related to capacity planning, configuration management, scaling services, performance tuning, monitoring, alerting and root cause analysis.

Qualifications


Required

4+ years of relevant experience in running distributed systems at scale in production.
Expertise in one of the following programming languages: Java, Python or Go.
Proficient in writing bash scripts
Good understanding of SQL and NoSQL systems
Good understanding of systems programming (network stack, file system, OS services)
Understanding of network elements such as firewalls, load balancers, DNS, NAT, TLS/SSL, VLANs etc.
Skilled in identifying performance bottlenecks, identifying anomalous system behavior, and determining the root cause of incidents.
Knowledge of JVM concepts like garbage collection, heap, stack, profiling, class loading, etc.
Knowledge of best practices related to security, performance, high-availability, and disaster recovery.
Proven record of handling production issues, planning escalation procedures, and conducting post-mortems, impact analysis, risk assessments and other related procedures.
Able to drive results and set priorities independently
BS/MS degree in Computer Science, Applied Math or related field

Preferred

Experience with managing large scale deployments of search engines like Elasticsearch
Experience with managing large scale deployments of message-oriented middleware such as Kafka
Experience with managing large scale deployments of RDBMS systems such as Oracle
Experience with managing large scale deployments of NoSQL databases such as Cassandra
Experience with managing large scale deployments of in-memory caches such as Redis, Memcached, etc.
Experience with container and orchestration technologies such as Docker, Kubernetes, etc.
Experience with monitoring tools such as Graphite, Grafana and Prometheus
Experience with HashiCorp technologies such as Consul, Vault, Terraform and Vagrant
Experience with configuration management tools such as Chef, Puppet or Ansible
In-depth experience with continuous integration and continuous deployment pipelines
Exposure to Maven, Ant or Gradle for builds

Company

Qualys is the pioneer and leading provider of information security and compliance cloud solutions.

Funding

Current Stage
Public Company
Total Funding
$34M
2012-10-05 · IPO · NASDAQ:QLYS
2004-11-22 · Series C · $5.6M
2003-11-12 · Series B · Undisclosed

Leadership Team

Philippe Courtot
CEO
Sumedh Thakar
President & CEO
Company data provided by Crunchbase