Tandym Group · 1 week ago
Datadog Principal Engineer (Remote)
Tandym Group is a recognized services organization in Virginia seeking an experienced Datadog Principal Engineer to lead and scale its monitoring, analytics, and reliability capabilities. The role involves designing and architecting enterprise observability solutions using Datadog and partnering with DevOps, SRE, Cloud, Application, and Security teams to embed observability into daily operations.
Employment, Recruiting, Staffing Agency
Responsibilities
Design, architect, and scale enterprise observability solutions using Datadog across applications, infrastructure, cloud, and security platforms
Own and drive enterprise-level observability strategy and architecture
Serve as a final decision-maker for observability standards, tooling approaches, and long-term platform direction
Architect dashboards, monitors, and alerting frameworks aligned to business, operational, and reliability requirements
Define and implement best practices for metrics, logs, traces, and anomaly detection
Lead deployment and configuration of Datadog agents, APIs, integrations, and automation across complex, multi-cloud environments
Integrate Datadog with CI/CD pipelines, logging platforms, and collaboration tools (e.g., GitLab, ServiceNow, Jira, Slack)
Identify observability gaps and drive improvements in signal quality, reliability, and incident response
Optimize Datadog usage and licensing costs while maintaining strong coverage and actionable insights
Partner closely with DevOps, SRE, Cloud, Application, and Security teams to embed observability into daily operations
Produce clear documentation and contribute to knowledge sharing across teams
Qualifications
Required
8+ years of engineering experience
Hands-on experience designing and owning Datadog observability solutions (not execution-only roles)
Proven experience as a technical decision-maker in a modern software development environment
Strong experience with Datadog Monitoring, APM, Distributed Tracing, and alerting
Strong experience with cloud platforms and DevOps/SRE practices
Strong experience with CI/CD integrations and automation
Scripting and configuration skills (Python, Bash, PowerShell, YAML, etc.)
Strong communication skills with the ability to collaborate across engineering, product, and business stakeholders
Demonstrated ability to articulate measurable impact and outcomes (e.g., MTTR reduction, reliability improvements, cost optimization)
Preferred
Datadog certifications (Fundamentals, APM, Distributed Tracing, or equivalent)
Experience optimizing observability cost and usage at enterprise scale
Experience mentoring teams and influencing technical standards
Background working in large, complex, or regulated enterprise environments
Company
Tandym Group
Tandym Group is a provider of full-service recruitment, temporary staffing, and workforce management solutions in the Northeast and Florida.
Funding
Current Stage: Growth Stage
Total Funding: unknown
Key Investors: Mill Rock Capital, New Heritage Capital
2021-04-06: Private Equity
2016-10-06: Series Unknown
Company data provided by Crunchbase