d-Matrix · 3 weeks ago
AI Infrastructure Solution Architect, Principal
d-Matrix is focused on unleashing the potential of generative AI and is at the forefront of software and hardware innovation. They are seeking a Solution Architect to develop comprehensive reference solutions for scalable AI inference workloads, collaborating closely with customers and ecosystem partners.
Artificial Intelligence (AI) · Cloud Infrastructure · Data Center · Semiconductor
Responsibilities
Develop end-to-end AI infrastructure reference solutions optimized for d-Matrix servers including compute, networking, storage, and orchestration layers, in collaboration with various internal teams
Create reference blueprints that integrate smoothly into cloud-native and on-prem environments
Develop infrastructure-as-code templates and examples using Ansible, Terraform, and Helm for provisioning d-Matrix-based nodes and clusters
Integrate with Kubernetes-based systems to enable model deployment, auto-scaling, and fault-tolerant execution
Design and deploy telemetry and monitoring frameworks to support real-time visibility into d-Matrix cluster health, job status, and system performance
Integrate with industry-standard observability stacks (e.g., Prometheus, Grafana, OpenTelemetry) for data collection, visualization, and alerting
Develop dashboards, health check systems, and metric pipelines that track performance, availability, and operational KPIs
Collaborate with performance and software teams to validate infrastructure using real-world workloads and benchmarks
Incorporate telemetry hooks for benchmark reporting and feedback-driven tuning
Create and publish detailed infrastructure deployment guides, monitoring configuration templates, and operational best practices
Collaborate with customers and the OEM/ISV ecosystem, enabling them to adopt and customize reference solutions for their specific datacenter environments and/or software stacks
Qualifications
Required
Bachelor's or Master's degree in Computer Science or a related technical field
10+ years of experience in infrastructure solution architecture, systems management, DevOps, or platform engineering roles
Experience working with GPUs, custom AI accelerators, or heterogeneous compute environments
Proven expertise in building, managing, and monitoring full-stack AI infrastructure at scale
Strong scripting/automation skills: Python, Bash, Ansible, Terraform, Helm, Docker/Kubernetes
Deep understanding of orchestration technologies (e.g., Kubernetes, Ray, KServe), containerization, server clusters, and multi-tenant serving
Experience with observability stacks (Prometheus, Grafana, OpenTelemetry, etc.)
Strong system debugging and incident response skills
Outstanding collaboration and communication skills
Preferred
Familiarity with model serving and orchestration platforms (e.g., Triton Inference Server, Ray Serve, Kubeflow)
Company
d-Matrix
d-Matrix is a platform that enables data centers to handle large-scale generative AI inference with high throughput and low latency.
H1B Sponsorship
d-Matrix has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (20)
2024 (15)
2023 (8)
2022 (7)
Funding
Current Stage: Growth Stage
Total Funding: $429M
Key Investors: Temasek Holdings, TSVC
2025-11-12 · Series C · $275M
2023-09-06 · Series B · $110M
2022-04-20 · Series A · $44M
Company data provided by crunchbase