Founding Machine Learning Engineer - Post Training, RL


New York
Permanent
USD 220,000 - USD 300,000
Research and Development
PR/557394_1757286620

A stealth-stage venture backed by Lux Capital (including backers of DeepMind and OpenAI) is on a mission to transform drug development with frontier-scale AI. Its goal is to make large language models and multimodal AI systems practical for real-world biomedical applications, accelerating discovery and saving billions in R&D costs.

As a Founding Engineer, you'll work end-to-end (data engine → training recipe → evaluation → deployment) to make cutting-edge models useful for drug development. You'll own post-training pipelines (SFT, DPO, RLHF), reward modeling, and evaluation systems, while collaborating closely with product and research teams.


Core Responsibilities

  • Build and optimize post-training workflows for large-scale LLMs and multimodal models.
  • Architect scalable data processing and filtering pipelines for proprietary biomedical datasets.
  • Design and implement distributed training systems for foundation models.
  • Rapidly iterate on prototypes and ship production-ready systems in a fast-paced, collaborative environment.


Skills

  • Strong software engineering skills and experience building and deploying AI/ML systems at scale.
  • Deep understanding of LLM training and post-training techniques (RLHF, instruction tuning, reward modeling).
  • Proficiency in Python and modern ML frameworks (PyTorch, JAX).
  • Familiarity with distributed training, multi-cloud infrastructure, and data pipeline design.
  • Bonus: prior startup experience, a background in life sciences, or experience shipping frontier models end-to-end.


Why This Role Is Unique

  • Frontier-Scale Modeling: Architect and train a multimodal biomedical foundation model on a dataset comparable in scale to the one used to train GPT-4.
  • Applied LLMs for Science: Build systems that reason over heterogeneous biomedical data to accelerate decision-making in drug development.
  • Massive-Scale Data Infrastructure: Design pipelines for ingesting and processing terabytes of structured and unstructured data across modalities.
  • Founding-Level Impact: Own the core AI stack, shape model architecture, and define scaling laws for applied life sciences AI.
