Who We’re Looking For:
We’re looking for a Machine Learning Engineer to own and evolve our distributed training pipeline for large language models. You’ll work hands-on in our GPU cluster, helping researchers train and scale foundation models with frameworks such as Hugging Face Transformers, Accelerate, DeepSpeed, and FSDP. Your focus will be distributed training: designing sharding strategies, orchestrating multi-node runs, optimizing throughput, and managing checkpoints at scale.
This is not a research role: it’s about building and scaling the systems that let researchers move fast and models grow big. You’ll work closely with MLOps, infra, and model developers to make our training runs efficient, resilient, and reproducible.
If you’ve trained LLMs before, or helped others do it better, this role is for you. You don’t need to check every box: if you’re confident working with distributed compute and real-world LLM workloads, we want to hear from you.