
ArrowCore Group · 1 day ago

Site Reliability Engineer (SRE) - Hardware Specialist

ArrowCore Group is focused on enhancing hardware reliability within datacenter operations. As an SRE - Hardware Specialist, you will analyze firmware and hardware specifications, manage vendor relations, and proactively resolve hardware issues to ensure optimal performance and reliability.

Consulting, Information Technology, Management Consulting
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Analyze firmware packages and hardware specifications for upcoming releases to ensure compatibility, performance, and reliability in our datacenter environment
Investigate and diagnose hardware failures, including "grey failures" (ambiguous or intermittent issues), proving them as true hardware defects through rigorous testing and data analysis
Manage vendor relationships, including initiating RMA (Return Merchandise Authorization) claims, negotiating beyond standard processes when necessary, and holding vendors accountable for resolutions
Collaborate with Datacenter Operations Technicians to troubleshoot, repair, and optimize hardware systems in real time
Research and evaluate unreleased next-generation hardware technologies, providing insights and recommendations to inform our infrastructure roadmap
Develop and implement monitoring tools, scripts, and processes to detect hardware anomalies early and minimize downtime
Document failure modes, RMA outcomes, and hardware evaluations to build a knowledge base for the team
Participate in on-call rotations and incident response for hardware-related issues in the Memphis datacenter

Qualifications

Hardware reliability engineering, Firmware analysis, Vendor negotiations, Diagnostic tools, Scripting languages, Datacenter hardware knowledge, Failure analysis, Emerging technologies, Certifications in hardware engineering, Problem-solving skills, Collaborative skills

Required

Bachelor's degree in Systems Engineering, Electrical Engineering, Computer Science, or a related field (or equivalent experience)
5+ years of experience in hardware reliability engineering, preferably in high-performance computing or datacenter environments
Proven expertise in firmware analysis, hardware specifications review, and release validation
Strong experience with RMA processes, including filing claims, vendor negotiations, and pushing for resolutions outside standard protocols
Demonstrated ability to diagnose and prove complex hardware failures, including grey or intermittent issues, using tools like oscilloscopes, logic analyzers, or diagnostic software
Familiarity with datacenter hardware components (e.g., servers, GPUs, networking equipment) and emerging technologies
Proficiency in scripting languages (e.g., Python, Bash) for automation and analysis
Excellent problem-solving skills with a data-driven approach to reliability engineering
Ability to work collaboratively with cross-functional teams, including operations technicians

Preferred

Experience in AI/ML infrastructure or supercomputing environments
Knowledge of vendor ecosystems (e.g., NVIDIA, Dell, HP, Supermicro) and supply chain management
Certifications in hardware engineering or reliability (e.g., CRE, CompTIA Server+)
Prior work at a fast-paced startup or tech company

Company

ArrowCore Group

ArrowCore Group is a consulting firm that offers management consulting, technology innovation, and business process management services.

H1B Sponsorship

ArrowCore Group has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship; the highlighted field represents jobs similar to this one]
Trends of Total Sponsorships: 2022 (1)

Funding

Current Stage: Growth Stage

Leadership Team

Brent Bassett, CEO
Rohit Pandey, Managing Director, COO

Company data provided by Crunchbase