Engineering Manager, Observability (TLM) jobs in United States

Anyscale · 3 days ago

Engineering Manager, Observability (TLM)

Anyscale is on a mission to democratize distributed computing and make it accessible to software developers of all skill levels. The Engineering Manager for Observability will lead a team focused on building user-facing features for the Anyscale AI platform, ensuring robust monitoring tools that enhance the development lifecycle for AI applications.

Artificial Intelligence (AI) · Developer Platform · Information Technology · Machine Learning · Open Source
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Interacting with users, understanding their requirements, designing and implementing features, and finally maintaining and improving these features over time
The Ray Dashboard observability tool, which gives users insight into their Ray application, including what code is running on which machine, how much data is being moved between machines, and the hardware utilization of each machine
Library-specific observability tools, like the Ray Train dashboard or Ray Serve dashboard, which accelerate our users' ability to develop distributed training or model serving applications
Unified log viewer, a tool that ingests logs across a Ray cluster and lets users query those logs in meaningful ways, such as by function name, log level, timestamp, or machine
Anomaly detection: the ability of the Anyscale platform to automatically detect performance bottlenecks or bugs in our users' workloads and to suggest fixes or apply them automatically
Work with a team of leading distributed systems and machine learning experts
Communicate your work to a broader audience through talks, tutorials, and blog posts
Help us build and shape a world-class company

Qualification

Backend development · Python · AI concepts · Observability tools · Distributed systems · Architecture design · Problem-solving · Collaborative mindset

Required

Proficiency in backend or full stack development, including experience with web API frameworks and databases
Proficiency in Python or an ability to quickly learn new programming languages
Good understanding of AI and machine learning concepts
Experience with observability tools and monitoring solutions (e.g., Datadog, Splunk, AWS CloudWatch)
Familiarity with Ray or similar distributed systems frameworks
Solid background in debugging, architecture design, and coding
Excellent problem-solving skills and a collaborative mindset
Passion for building tools that enhance user experience and optimize workflows

Benefits

Stock Options
Healthcare plans, with premiums covered by Anyscale at 99% for both employees and dependents
401k Retirement Plan
Education & Wellbeing Stipend
Paid Parental Leave
Fertility Benefits
Paid Time Off
Commute reimbursement
100% of in-office meals covered

Company

Anyscale

Anyscale accelerates the development and productionization of any AI app, on any cloud, at any scale.

H1B Sponsorship

Anyscale has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data from the US Department of Labor)
Trends of Total Sponsorships
2025 (33)
2024 (14)
2023 (10)
2022 (10)
2021 (4)
2020 (1)

Funding

Current Stage: Growth Stage
Total Funding: $259M
Key Investors: New Enterprise Associates, Andreessen Horowitz
2022-08-23 · Series C · $99M
2021-12-07 · Series C · $100M
2020-10-21 · Series B · $40M

Leadership Team

Keerti Melkote, Chief Executive Officer
Ion Stoica, Co-Founder & Executive Chairman
Company data provided by Crunchbase