AI Instinct System Management Architect jobs in United States

AMD · 2 weeks ago

AI Instinct System Management Architect

AMD is a company dedicated to building innovative products for next-generation computing experiences. The AI Instinct System Management Architect will lead the architecture for system management and observability across AMD’s AI datacenter platforms, focusing on developing scalable and secure infrastructure.

AI Infrastructure, Artificial Intelligence (AI), Cloud Computing, Computer, Embedded Systems, GPU, Hardware, Semiconductor
Growth Opportunities
No H1B

Responsibilities

Architect Scalable System Management: Develop a unified architecture for rack-scale and pod-scale management, ensuring seamless integration from device-level firmware to orchestration layers
Industry Leadership: Benchmark against competitors, track emerging standards, and propose innovative features to position AMD as a leader in AI datacenter manageability
Integration & Interoperability: Ensure compatibility with industry standards (Redfish, DMTF profiles) and customer monitoring stacks for observability and analytics
Define and deliver manageability solutions (BMC/BSP, rack/pod controllers, APIs) ensuring coherent architecture from device to orchestration
Develop standards-based interfaces and telemetry frameworks (Redfish, DMTF) for compute, storage, networking, and accelerators at scale
Rack-Scale Lifecycle Management: Enable discovery, provisioning, firmware upgrades, and decommissioning workflows across racks and pods
Collaborate with customers and partners to align with DCIM/ITSM environments, validate designs, and lead proof-of-concepts
Produce architectural collateral and documentation, including reference designs, integration guides, and telemetry baselines for customer readiness
Influence product strategy with customer-backed roadmaps optimized for AI workloads

Qualifications

System management architecture, BMC firmware stacks, DMTF standards, PCIe, CXL, NVMe, Programming in C/C++, Programming in Python/Go, Telemetry stacks, Datacenter infrastructure knowledge, Customer engagement, Technical leadership

Required

Expert background in systems or platform software architecture with focus on system management and server manageability
Deep expertise in BMC firmware stacks, telemetry, inventory, alerting, and management protocols
Strong knowledge of DMTF standards (MCTP, PLDM, SPDM, Redfish), platform security, and management networking
Experience with PCIe, CXL, and NVMe interconnects and with cluster schedulers (Kubernetes, Slurm)
Proven ability to combine technical leadership with customer engagement for scalable AI datacenter deployments

Preferred

Experience with GPU/accelerator platforms
Familiarity with telemetry stacks (Prometheus, Loki, ELK, OpenTelemetry)
Knowledge of datacenter infrastructure components (racks, PDUs, power/thermal systems, fabric networking)
Contributions to open standards or open-source projects related to manageability or observability
Strong programming skills in C/C++ and Python/Go; solid Linux systems experience

Benefits

AMD benefits at a glance.

Company

Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.

Funding

Current Stage
Public Company
Total Funding
unknown
Key Investors
OpenAI, Daniel Loeb
2025-10-06: Post-IPO Equity
2023-03-02: Post-IPO Equity
2021-06-29: Post-IPO Equity

Leadership Team

Lisa Su
Chair & CEO
Mark Papermaster
CTO and EVP
Company data provided by Crunchbase