Poolside · 16 hours ago

Member of Engineering (Pre-training / Synthetic Data)

United States

Full-time

Remote

Mid, Senior Level

Poolside is a company focused on building a world where AI drives economically valuable work and scientific progress. They are seeking a Member of Engineering to work on their data team, improving the quality of pretraining datasets and generating synthetic data at scale. The role involves collaboration with various teams to define data needs and ensure high-quality datasets for training large models.

AI Infrastructure · Artificial Intelligence (AI) · Developer Platform · Foundational AI · Information Technology · Software

H1B Sponsor Likely

Responsibilities

Follow the latest research on LLMs, and on synthetic data generation in particular. Be familiar with the most relevant open-source datasets and models.

Design and implement complex pipelines that can generate large amounts of data while maintaining high diversity and optimizing the resources available

Work closely with other teams such as Pretraining, Post-training, Evals and Product to ensure alignment on the quality of the models delivered

Continuously measure and refine the quality of the datasets being generated while validating the final data strategy through quantitative data ablation experiments

Qualifications

Machine Learning · Large Language Models · Data Pipeline Engineering · Python Programming · Synthetic Data Generation · Data Quality Optimization · Distributed Data Pipelines · Prompt Engineering · Research Experience · Collaboration Skills

Required

Strong machine learning and engineering background

Experience with Large Language Models (LLMs), including an understanding of how LLMs learn, data ablations and scaling laws, post-training techniques, and training reasoning and agentic models

Experience implementing cost-efficient, complex pipelines that generate synthetic datasets at scale while optimizing for data quality, correctness, diversity, etc.

Experience with evals tracking model capabilities (general knowledge, reasoning, math, coding, long-context, etc)

Experience in building trillion-scale pretraining datasets, and familiarity with concepts like data curation, deduplication, data mixing, tokenization, curriculum, impact of data repetition, etc

Excellent programming skills in Python

Strong prompt engineering skills

Experience working with large-scale GPU clusters and distributed data pipelines

Strong obsession with data quality

Preferred

Authorship of scientific papers on topics such as applied deep learning, LLMs, or source code generation is a nice-to-have

Can freely discuss the latest papers and go into fine detail

Is reasonably opinionated

Benefits

Fully remote work & flexible hours

37 days/year of vacation & holidays

Health insurance allowance for you and dependents

Company-provided equipment

Wellbeing, always-be-learning and home office allowances

Frequent team get-togethers

Great diverse & inclusive people-first culture

Company

Poolside

Poolside is an artificial intelligence platform that offers foundation concepts and infrastructure for writing software code.

Founded in 2023

San Francisco, California, USA

51-200 employees

http://www.poolside.ai

H1B Sponsorship

Poolside has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)

Distribution of Different Job Fields Receiving Sponsorship (chart not shown; legend: represents job field similar to this job)

Trends of Total Sponsorships: 2025 (1)

Funding

Current Stage

Growth Stage

Total Funding

$626M

Key Investors

Bain Capital Ventures · Redpoint

2024-10-02 · Series B · $500M

2023-08-24 · Series A · $100M

2023-05-14 · Seed · $26M

Leadership Team

Eiso Kant

Co-CEO & Co-Founder

Jason Warner

Co-CEO & Co-Founder

Recent News

Globes

Poolside AI targets Israel’s defense sector

2025-12-17

Crunchbase News

Cursor’s $2.3B Financing Reminds Us: Coding Automation Is Still Ultra-Hot

2025-11-13

Investing.com

JPMorgan cuts CoreWeave to Neutral on escalating supply chain pressures

2025-11-11

Company data provided by Crunchbase