Ad Hoc LLC · 1 day ago
Staff Software Engineer - Full Stack / SRE (Remote)
Ad Hoc LLC is a technology company that empowers organizations to deliver scalable, impactful digital services. The Staff Software Engineer - Full Stack / SRE will lead and monitor project delivery, improve software engineering processes, and contribute to the reliability and performance of the va.gov platform, ultimately enhancing the experience of Veterans accessing critical services.
Computer · Data Management · Software
Responsibilities
Plans and executes on roadmaps for new projects without explicit guidance and direction from technical supervisors
Actively participates in conversations and planning sessions with partners and key stakeholders
Periodically travels to work with and present to clients, partners, and stakeholders
Elaborates on and evolves complex and ambiguous products to uncover constraints and new opportunities
Reduces ambiguity in the systems they work with, including adding documentation, refactoring, and automated testing
Communicates effectively about existing systems, design decisions, past performance, and the major history of the projects they’ve been part of, in support of bid-writing, tech demos, and other potentially client-facing communications
Participates in technical depth interviews with new candidates
Presents on technical topics effectively, articulating implementation complexity and other costs to inform business decisions
Troubleshoot and Resolve Production Issues: Diagnose and fix performance bottlenecks, errors, and other issues within the va.gov application (primarily a Ruby on Rails monolith, including Sidekiq background jobs, but familiarity with similar frameworks is valuable)
Observability & Monitoring: Utilize DataDog (and potentially Dynatrace) to monitor application performance, identify anomalies, and proactively address potential problems. Develop and maintain relevant dashboards and alerts
Incident Response and On-Call Rotation ("The Watch"): Participate in our on-call rotation approximately once per month. Unlike traditional pager-driven on-call, "The Watch" involves reviewing the previous day's alerts and ensuring no silent failures occurred (such as background jobs exhausting without an alternate submission path). During your on-call week, expect to work 2-4 hours each day on the weekend to maintain system reliability
Code Contributions: Write and review code to improve observability, fix bugs, and implement improvements (Ruby on Rails), and maintain internal tools (JavaScript/SvelteKit and Python)
Consulting & Collaboration: Work closely with other engineering teams to provide guidance on best practices for observability, reliability, and performance. Communicate technical issues clearly to both technical and non-technical audiences
Process Improvement: Identify and implement improvements to our monitoring, alerting, and incident response processes. Contribute to documentation and runbooks
Maintain Internal Tools: Contribute to the development and maintenance of a small SvelteKit application used for tracking team metrics and success
Qualifications
Required
Bachelor's Degree and 9+ years of relevant experience
5+ years of experience as a Software Engineer or Site Reliability Engineer
3+ years of experience with backend web application development in a production environment. Strong preference for Ruby on Rails experience, but candidates with demonstrable experience in other dynamic languages (e.g., Python/Django/Flask, Node.js/Express, PHP/Laravel) or compiled languages with web frameworks (e.g., Java/Spring, C#/.NET) will be considered
Experience with Sidekiq or other background job processing framework. If not Sidekiq, experience must be with a comparable system in their chosen language/framework (e.g., Celery for Python)
Proven experience with application performance monitoring (APM) tools, specifically DataDog and/or Dynatrace. Ability to interpret metrics and identify root causes of performance issues
Demonstrated experience in incident response and troubleshooting complex production issues
Experience with at least one modern JavaScript framework (React, Angular, Vue, Svelte, etc.)
Excellent communication, collaboration, and consulting skills
Ability to work effectively in a fast-paced, dynamic environment
Experience working within an Agile environment
Preferred
Experience with vets-api
Prior experience working within the VA/OCTO environment or any large government software deployment that integrates with multiple legacy services
Experience with Python for scripting, API interactions, and ETL/data engineering tasks
General understanding of DevOps concepts (containerization, virtualization, networking)
Familiarity with GitHub Actions
Experience with the U.S. Web Design System (USWDS)
Benefits
Company-subsidized health, dental, and vision insurance
Flexible PTO
401K with employer match
Paid parental leave after one year of service
Employee Assistance Program