AMD · 2 weeks ago
AI Instinct System Management Architect
AMD is a company dedicated to building innovative products for next-generation computing experiences. The AI Instinct System Management Architect will lead the architecture for system management and observability across AMD’s AI datacenter platforms, focusing on developing scalable and secure infrastructure.
AI Infrastructure, Artificial Intelligence (AI), Cloud Computing, Computer, Embedded Systems, GPU, Hardware, Semiconductor
Responsibilities
Architect Scalable System Management: Develop a unified architecture for rack-scale and pod-scale management, ensuring seamless integration from device-level firmware to orchestration layers
Industry Leadership: Benchmark against competitors, track emerging standards, and propose innovative features to position AMD as a leader in AI datacenter manageability
Integration & Interoperability: Ensure compatibility with industry standards (Redfish, DMTF profiles) and customer monitoring stacks for observability and analytics
Define and deliver manageability solutions (BMC/BSP, rack/pod controllers, APIs) ensuring coherent architecture from device to orchestration
Develop standards-based interfaces and telemetry frameworks (Redfish, DMTF) for compute, storage, networking, and accelerators at scale
Rack-Scale Lifecycle Management: Enable discovery, provisioning, firmware upgrades, and decommissioning workflows across racks and pods
Collaborate with customers and partners to align with DCIM/ITSM environments, validate designs, and lead proof-of-concepts
Produce architectural collateral and documentation, including reference designs, integration guides, and telemetry baselines for customer readiness
Influence product strategy with customer-backed roadmaps optimized for AI workloads
Qualifications
Required
Expert background in systems or platform software architecture with focus on system management and server manageability
Deep expertise in BMC firmware stacks, telemetry, inventory, alerting, and management protocols
Strong knowledge of DMTF standards (MCTP, PLDM, SPDM, Redfish), platform security, and management networking
Experience with PCIe, CXL, and NVMe interconnects and with cluster schedulers (Kubernetes, Slurm)
Proven ability to combine technical leadership with customer engagement for scalable AI datacenter deployments
Preferred
Experience with GPU/accelerator platforms
Familiarity with telemetry stacks (Prometheus, Loki, ELK, OpenTelemetry)
Knowledge of datacenter infrastructure components (racks, PDUs, power/thermal systems, fabric networking)
Contributions to open standards or open-source projects related to manageability or observability
Strong programming skills in C/C++ and Python/Go; solid Linux systems experience
Benefits
AMD benefits at a glance.
Company
AMD
Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.
Funding
Current Stage: Public Company
Total Funding: unknown
Key Investors: OpenAI, Daniel Loeb
2025-10-06: Post-IPO Equity
2023-03-02: Post-IPO Equity
2021-06-29: Post-IPO Equity
Company data provided by Crunchbase