Company

Arcee.ai · 1 day ago

Machine Learning Infrastructure Engineer

United States

Full-time

Remote

Mid Level

3+ years exp

Artificial Intelligence (AI) · Generative AI

Responsibilities

Design and implement scalable, efficient, and reliable machine learning infrastructure (e.g., containerization, orchestration, and cloud services).

Develop and maintain infrastructure as code (IaC) using tools like Terraform, AWS CloudFormation, or Google Cloud Deployment Manager.

Design and implement model serving platforms (e.g., TensorFlow Serving, AWS SageMaker, or Azure Machine Learning) for efficient model deployment and management.

Develop and maintain automated model deployment pipelines using tools like Jenkins, GitLab CI/CD, or CircleCI.

Collaborate with data engineers to design and implement data pipelines that feed machine learning models.

Ensure data quality, integrity, and security throughout the data lifecycle.

Develop and implement monitoring and logging solutions (e.g., Prometheus, Grafana, or ELK Stack) to track model performance, latency, and system health.

Optimize infrastructure resources and model performance using techniques like hyperparameter tuning, model pruning, and knowledge distillation.

Work closely with data scientists, engineers, and researchers to identify infrastructure needs and develop solutions.

Communicate technical information effectively to both technical and non-technical stakeholders.

Stay current with industry trends, emerging technologies, and best practices in machine learning infrastructure.

Participate in conferences, meetups, and online forums to expand knowledge and network with peers.

Qualifications

AWS, Azure, GCP, Kubernetes, Terraform, Python, CloudFormation, TensorFlow Serving, AWS SageMaker, PyTorch, Jenkins, Prometheus, Grafana, RESTful API, Deep learning frameworks, C++, Shell scripting, Model quantization, Model pruning, ML model lifecycle management, Distributed inference, GPU acceleration, Machine learning certifications, Data privacy

Required

Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

3+ years of experience in machine learning infrastructure, DevOps, or a related field.

Experience with cloud providers (e.g., AWS, GCP, or Azure) and containerization (e.g., Docker).

Proficiency in programming languages like Python, Java, or C++.

Experience with machine learning frameworks like TensorFlow, PyTorch, or Scikit-learn.

Familiarity with infrastructure as code (IaC) tools like Terraform or CloudFormation.

Knowledge of container orchestration tools like Kubernetes or Docker Swarm.

Excellent communication, collaboration, and problem-solving skills.

Ability to work in a fast-paced environment and prioritize tasks effectively.

Preferred

Cloud provider certifications (e.g., AWS Certified DevOps Engineer or GCP Professional Cloud Developer).

Machine learning certifications (e.g., TensorFlow Certified Developer or PyTorch Certified Engineer).

Experience with model serving platforms like TensorFlow Serving or AWS SageMaker.

Experience with automated model deployment pipelines using tools like Jenkins or GitLab CI/CD.

Experience with monitoring and logging solutions like Prometheus or ELK Stack.

Familiarity with model explainability and interpretability techniques.

Knowledge of data privacy and security best practices.

Benefits

Health, dental, and vision insurance

401(k)

Opportunities for growth, training, and conference attendance

A dynamic, diverse team that values innovation and open communication

Company

Arcee.ai

Arcee.ai develops context-adapted LLMs through its domain-adapted language model system (DALM).

Founded in 2023

Miami, Florida, USA

2-10 employees

https://arcee.ai

Funding

Current Stage

Early Stage

Total Funding

$29.5M

Key Investors

Emergence

2024-07-18 · Series A · $24M

2024-01-24 · Seed · $5.5M

Leadership Team

Mark McQuade

Co-founder & CEO

Jacob Solawetz

Co-founder, CTO

Company data provided by Crunchbase