Machine Learning Infrastructure Engineer @ Arcee.ai | Jobright.ai
Machine Learning Infrastructure Engineer jobs in United States
200+ applicants

Arcee.ai · 1 day ago

Machine Learning Infrastructure Engineer

Artificial Intelligence (AI), Generative AI

Responsibilities

Design and implement scalable, efficient, and reliable machine learning infrastructure (e.g., containerization, orchestration, and cloud services).
Develop and maintain infrastructure as code (IaC) using tools like Terraform, AWS CloudFormation, or Google Cloud Deployment Manager.
Design and implement model serving platforms (e.g., TensorFlow Serving, AWS SageMaker, or Azure Machine Learning) for efficient model deployment and management.
Develop and maintain automated model deployment pipelines using tools like Jenkins, GitLab CI/CD, or CircleCI.
Collaborate with data engineers to design and implement data pipelines that feed machine learning models.
Ensure data quality, integrity, and security throughout the data lifecycle.
Develop and implement monitoring and logging solutions (e.g., Prometheus, Grafana, or ELK Stack) to track model performance, latency, and system health.
Optimize infrastructure resources and model performance using techniques like hyperparameter tuning, model pruning, and knowledge distillation.
Work closely with data scientists, engineers, and researchers to identify infrastructure needs and develop solutions.
Communicate technical information effectively to both technical and non-technical stakeholders.
Stay current with industry trends, emerging technologies, and best practices in machine learning infrastructure.
Participate in conferences, meetups, and online forums to expand knowledge and network with peers.

Qualifications

AWS, Azure, GCP, Kubernetes, Terraform, Python, CloudFormation, TensorFlow Serving, AWS SageMaker, PyTorch, Jenkins, Prometheus, Grafana, RESTful API, Deep learning frameworks, C++, Shell scripting, Model quantization, Model pruning, ML model lifecycle management, Distributed inference, GPU acceleration, Machine learning certifications, Data privacy

Required

Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
3+ years of experience in machine learning infrastructure, DevOps, or a related field.
Experience with cloud providers (e.g., AWS, GCP, or Azure) and containerization (e.g., Docker).
Proficiency in programming languages like Python, Java, or C++.
Experience with machine learning frameworks like TensorFlow, PyTorch, or Scikit-learn.
Familiarity with infrastructure as code (IaC) tools like Terraform or CloudFormation.
Knowledge of container orchestration tools like Kubernetes or Docker Swarm.
Excellent communication, collaboration, and problem-solving skills.
Ability to work in a fast-paced environment and prioritize tasks effectively.

Preferred

Cloud provider certifications (e.g., AWS Certified DevOps Engineer or GCP Professional Cloud Developer).
Machine learning certifications (e.g., TensorFlow Certified Developer or PyTorch Certified Engineer).
Experience with model serving platforms like TensorFlow Serving or AWS SageMaker.
Experience building automated model deployment pipelines using tools like Jenkins or GitLab CI/CD.
Experience with monitoring and logging solutions like Prometheus or ELK Stack.
Familiarity with model explainability and interpretability techniques.
Knowledge of data privacy and security best practices.

Benefits

Health, dental, and vision insurance
401(k)
Opportunities for growth, training, and conference attendance
A dynamic, diverse team that values innovation and open communication

Company

Arcee.ai

Arcee.ai develops context-adapted LLMs through its domain-adapted language model system (DALM).

Funding

Current Stage: Early Stage
Total Funding: $29.5M
Key Investors: Emergence
2024-07-18 · Series A · $24M
2024-01-24 · Seed · $5.5M

Leadership Team

Mark McQuade: Co-founder & CEO
Jacob Solawetz: Co-founder, CTO
Company data provided by Crunchbase