GPU Pipeline Microarchitect & RTL Designer jobs in United States

Oxmiq Labs · 2 days ago

GPU Pipeline Microarchitect & RTL Designer

Oxmiq Labs focuses on high-throughput GPU pipeline blocks and is seeking a GPU Pipeline Microarchitect & RTL Designer to own microarchitecture and RTL design. The role involves defining pipeline stages, collaborating with the architecture team, and driving performance, power, and area optimizations.

Computer Software

Responsibilities

Microarchitecture: Define pipeline stages, flow control, queues/buffers, and interfaces; write concise design specs and lead reviews
RTL Design & PPA: Implement clean, synthesizable SystemVerilog; drive performance/power/area optimizations (datapaths, arbitration, backpressure, gating)
Architecture collaboration: Work day-to-day with the architecture team to refine requirements, align on performance targets, and iterate on uArch choices with data
Verification Partnership: Build unit tests, create coverage plans, and author SVA; collaborate with UVM/formal to close corner cases
Quality & Sign-off: Run lint/CDC/RDC; support synthesis/STA and timing convergence; engage with PD/DFT for constraints and test
Bring-up & Debug: Support emulation/FPGA and silicon; instrument counters, analyze traces, and root-cause issues end-to-end
Communication & teamwork: Communicate trade-offs clearly across architecture, software, and PD; mentor peers and contribute to cross-IP integration

Qualifications

RTL design (SystemVerilog)
PPA optimization
GPU architecture understanding
Unit testing
Coverage-based verification
Shader Core design
Floating-Point Unit design
Instruction Scheduler design
L1/L2 Cache Design
Tensor Core Design
Communication
Team player

Required

5+ years industry experience on desktop, mobile, or data center GPUs with real, shipped project ownership
Proficient in RTL design (SystemVerilog) and PPA optimization across performance, power, and area
Team player with strong understanding of overall GPU architecture and micro-architecture (SIMT/SIMD execution, scheduling and flow control, memory hierarchy)
Hands-on first: Able to build unit tests, drive coverage-based verification (functional/code), and write robust SVA
Depth in at least one of the following domains: Shader Core (execution pipelines, hazards, replay), Floating-Point Unit (IEEE-754, exceptions, denormals), Instruction Scheduler (warp/wavefront issuing, fairness, QoS), Job Scheduler / Command Submission, L1/L2 Cache Design (coherency, miss handling, prefetch), Command Processor (front-end, MMIO, context management), Tensor Core Design (matrix/tensor datapaths, mixed precision), Tensor DMA (high-BW engines, tiling, compression)

Preferred

Experience with ray tracing blocks, texture/sampler, ROP/blend, or MMU/TLB
Performance modeling, perf counter design, and trace analysis
EDA fluency: VCS/Questa, Verdi, Jasper/IFV, DC/Genus, PrimeTime/Tempus; emulation (Palladium/Veloce) or FPGA prototypes
Collaboration with compiler/LLVM and driver/runtime teams

Company

Oxmiq Labs

OXMIQ delivers a comprehensive GPU hardware IP stack scalable from edge AI devices.

Funding

Current Stage: Early Stage
Total Funding: $20M
Key Investors: MediaTek
2025-08-04 · Seed · $20M
Company data provided by Crunchbase