Protege · 2 days ago
Data Annotation Associate
Protege is a company focused on solving the challenges of accessing the right training data for AI. They are seeking a Data Annotation Associate to support their data operations by preparing sensitive healthcare documents for AI training workflows, specifically by redacting Protected Health Information (PHI).
Analytics · Artificial Intelligence (AI) · Data Management
Responsibilities
De-identify high volumes of healthcare PDFs by accurately redacting PHI identifiers (names, locations, dates, ages, IDs, and other identifiers) in accordance with established guidelines (hhs.gov)
Follow a redaction/annotation playbook closely, including how to handle edge cases and when to escalate questions
Complete light QA on your own work (spot checks, verify redactions applied correctly, ensure no PHI remains visible/searchable)
Track daily throughput and communicate status clearly (what’s done, what’s blocked, what needs review)
Maintain organized file handling and versioning so work is easy to audit and review
Operate within strict security policies for PHI handling (confidentiality, access controls, and device hygiene)
Qualifications
Required
Authorized to work in the U.S. and able to work as a W2 employee based anywhere in the United States (required for PHI access)
Comfort handling sensitive information and following strict privacy/security rules
Experience with detail-oriented work (administrative operations, document review, medical records handling, QA, compliance support, or data labeling/annotation)
Comfort working with PDFs and basic productivity tools (Google Workspace / Microsoft Office)
Strong written communication and reliable follow-through
Ability to maintain speed and accuracy across large volumes of similar documents
Preferred
Prior experience in HIPAA-regulated environments or working with healthcare documents (hhs.gov)
Experience redacting or reviewing documents (legal, healthcare, insurance, or compliance contexts)
Experience in data annotation/labeling workflows
Comfort tracking work in spreadsheets and following simple metrics (throughput, error rate)
Company
Protege
Protege is the AI training data platform enabling seamless and compliant data exchange.
Funding
Current Stage: Early Stage
Total Funding: $65M
Key Investors: Andreessen Horowitz, Footwork, CRV
2026-01-07 · Series A · $30M
2025-08-13 · Series A · $25M
2024-09-10 · Seed · $10M
Recent News
2026-02-03
Sourcery
2026-01-15
alleywatch.com
2026-01-14
Company data provided by Crunchbase