Dev Ops Engineer jobs in United States

Berkeley Lab · 4 months ago

Dev Ops Engineer

Berkeley Lab is seeking a Dev Ops Engineer to join its NERSC Division. In this role, you will help architect, deploy, configure, and operate large-scale high-performance computing systems, collaborating with teams to develop innovative solutions that enhance scientific discovery.

Research
No H1B · U.S. Citizen Only

Responsibilities

Participate in a team-oriented agile development and management process for HPC systems, using languages such as Go, Rust and Python
Develop and maintain APIs to securely expose system functionality to end users
Automate common tasks and processes to continuously improve HPC systems management
Analyze system issues and develop solutions to improve end-user experience
Be part of a team that installs, tests, maintains and manages HPC systems
Assist with technology evaluation of systems and system architecture
Work with vendors to prioritize, develop and enhance their technologies in order to better meet the needs of our users
Be part of team providing on-call rotation for 24x7 HPC system support
Work on and resolve complex issues where analysis of situations or data requires an in-depth evaluation of variable factors
Exercise judgment in selecting methods, techniques and evaluation criteria for obtaining results
Determine methods and procedures on new assignments and may coordinate activities of other personnel
Network with key contacts outside own area of expertise
Provide leadership and technical guidance to group members, and members of other groups at NERSC
Recommend and lead implementation and deployment efforts for system improvements that enhance reliability, stability, usability, performance and security
Identify and evaluate emerging HPC technologies and explore new features that would create new capabilities and enhance system performance and usability
Participate in working/user/advocacy groups and represent NERSC and its interests to the broader HPC community
Work at a higher level of independence while carrying out work assignments
Work on and solve significant and unique issues where analysis of situations or data requires an in-depth evaluation of variable factors

Qualifications

Linux systems programming
High-performance computing (HPC)
Python programming
Kubernetes operations
Infrastructure as code
GitLab
GitHub CI
Agile process
Communication
Team collaboration
Problem-solving

Required

Typically requires a minimum of 8 years of related experience with a Bachelor's degree; or 6 years and a Master's degree; or equivalent experience
Minimum of 2 years of experience with systems programming in a Linux environment or management of large-scale Linux-based systems in a high-performance computing, cloud computing, or hyperscale environment
Experience with the C, Bourne shell, and Python 3 programming languages
Typically requires a minimum of 12 years of related experience with a Bachelor's degree; or 8 years and a Master's degree; or equivalent experience
Demonstrated excellent systems programming skills and strong knowledge of Linux internals
Demonstrated ability to work independently as well as collaboratively in large projects, and contribute to an active and respectful intellectual environment
Excellent oral and written communication skills
Ability to resolve complex issues in creative and effective ways and derive technical solutions in a collaborative environment to meet end user requirements or needs
Ability to network and collaborate with key contacts outside own area of expertise
Ability to work on and resolve significant and unique issues where analysis of situations or data requires an evaluation of intangibles
Ability to exercise independent judgment in methods, techniques and evaluation criteria for obtaining results

Preferred

Development of Kubernetes microservices using deployment technologies such as Helm or Loftsman
Operation of Kubernetes and etcd
Infrastructure-as-code solutions such as Argo, Terraform, Ansible, Puppet, or Salt
Rust or Go programming language
GitLab or GitHub continuous integration and project management
Agile processes, Scrum
Linux kernel interfaces, cgroups, eBPF
Installation, configuration, monitoring, and tuning of workload management systems such as Slurm, PBSPro, or GridEngine
Monitoring solutions such as Grafana, Prometheus, or LDMS
HPC systems administration
HPC applications analysis, MPI
Specialized networking (InfiniBand, Slingshot or other high-speed networks)
Lustre, SpectrumScale (GPFS) or other parallel file systems

Company

Berkeley Lab

Berkeley Lab is a national laboratory that creates advanced new tools for scientific discovery.

Funding

Current Stage
Late Stage

Leadership Team

Mary Barnum, MBA
Business Manager, COO Office
Rebecca Rishell
Deputy Chief Operating Officer
Company data provided by Crunchbase