This job has closed.

griddable.io · 11 hours ago

Software Engineer (Tooling - Principal Level)

San Francisco, CA

Full-time

Onsite

Lead/Staff

10+ years exp

Griddable.io is seeking a Software Engineer in the SWEET Team at MuleSoft, focused on architecting and scaling infrastructure tools for distributed systems. The role involves writing production-grade code, enhancing system reliability, and developing observability and automation solutions.

Analytics · Big Data · Cloud Data Services · Data Integration · Information Technology · SaaS · Software

No H1B

Security Clearance Required

U.S. Citizen Only

Responsibilities

Design and develop systems, libraries, and tools that strengthen the resiliency and reliability of distributed services running on the MuleSoft Anypoint Platform

Develop and extend monitoring, logging, and alerting capabilities using industry-standard observability platforms (e.g., metrics, tracing, and log aggregation tools) to ensure issues are detected and diagnosed before they impact customers

Write production-grade code in Python, Go, or similar languages to automate operational tasks, scale deployment pipelines, and implement self-healing systems

Participate in on-call rotations, drive root cause analysis, and deliver software-based solutions that prevent recurrence and reduce mean time to recovery (MTTR)

Build internal platforms, shared APIs, and systems that enhance developer velocity while improving overall system resilience and operability

Optimize and evolve our CI/CD pipelines using Jenkins, Spinnaker, and infrastructure-as-code tools such as Terraform and Kubernetes to enable safe and frequent delivery

Develop and maintain automated solutions to meet FedRAMP, Protected B, and other regulatory requirements—integrating security and compliance directly into deployment workflows

Work closely with product engineers, platform teams, and security stakeholders to influence architectural decisions and bake reliability into all layers of the stack

Create and maintain high-quality documentation for systems, processes, and playbooks to promote operational excellence and team scalability

Qualifications

Python · Go · Terraform · Kubernetes · AWS · Java · CI/CD · Distributed Systems · Observability · Bash · Reliability Engineering · Compliance · Documentation · Collaboration

Required

10+ years of experience in Software Engineering, with a particular focus on developing production-quality, maintainable, and testable code for infrastructure and platform automation

Proven proficiency in coding with Java, Python, Go, and Bash

Hands-on experience with infrastructure as code, CI/CD pipelines, and deployment automation using tools like Terraform, Jenkins, and Spinnaker

Proven experience architecting, developing, and operating systems in cloud-native environments (AWS) and managing containerized workloads with Kubernetes

Strong understanding of observability engineering, including instrumentation, metrics, logging, and distributed tracing—experience with OpenTelemetry, Grafana, Splunk, Sumo Logic, or similar platforms

Solid knowledge of distributed systems, network protocols (TCP/IP, DNS, HTTP, TLS), and API design standards (REST, RAML, OAS)

Demonstrated ability to diagnose complex system issues, design for fault tolerance and high availability, and continuously improve reliability through software

Familiarity with compliance-bound environments, including FedRAMP, Protected B, or similar, and experience incorporating security and compliance into engineering workflows

A passion for engineering reliability through software—you drive automation, eliminate toil, and foster a culture of operational excellence

A related technical degree required

The candidate must be a U.S. citizen (U.S.-born or naturalized) who operates on U.S. soil, does not hold dual citizenship, and is able to meet the customer and government screening standards applicable to this role, including a Criminal Justice Information Services screening with fingerprint scan

You agree to complete a Minimum Background Investigation (MBI) for a Moderate Public Trust position with the U.S. federal government and gain other clearances as deemed appropriate for the role

Preferred

Experience with chaos engineering, fault injection, or reliability gamedays to proactively validate system resilience and recovery readiness

Background in platform-as-a-service (PaaS), internal developer tooling, or building self-service infrastructure that accelerates engineering productivity

Prior experience operating in hybrid or multi-cloud environments, with a focus on portability, automation, and infrastructure standardization

Company

griddable.io

Griddable.io is a San Jose, CA-based SaaS startup that closed Series A funding in 2017 from August Capital, Artiman Ventures, and Carsten Thoma, founding CEO of Hybris (acquired by SAP).

Founded in 2016

San Jose, California, USA

11-50 employees

https://griddable.io

Funding

Current Stage

Early Stage

Total Funding

$8M

2019-01-28 · Acquired

2018-02-28 · Series A · $8M

Leadership Team

Burton Hipp

VP of Engineering/Founder

Company data provided by crunchbase