Data Scientist, Knowledge Graphs jobs in United States

Mithrl · 1 month ago

Data Scientist, Knowledge Graphs

Mithrl is a fast-growing tech bio company building the world's first commercially available AI Co-Scientist. The Data Scientist, Knowledge Graphs will focus on ingesting and harmonizing biological data to curate relationships that enable reasoning across various biological datasets and enhance the AI Co-Scientist's capabilities.

Artificial Intelligence (AI) · Data Center Automation · Life Science · Medical · Software

Responsibilities

Ingest, harmonize, and version high-value public biological datasets such as CellxGene, Gemma, ARCHS4, ENCODE, GTEx, and TCGA
Ingest well-maintained, peer-reviewed knowledgebases including OpenTargets, HPA, and similar resources
Build automated pipelines to curate and expand relationships inside the knowledge graph
Define and evolve schemas for node types, relationships, metadata rules, and ontology alignment
Harmonize variable IDs and metadata fields across all imported sources to create a unified knowledge layer
Build and maintain versioning, change tracking, and provenance systems for all data and relationships
Develop the framework that allows users to build custom knowledge graphs from the analyses they run inside Mithrl
Build features that allow users to explore, query, and interact with their graphs
Work closely with ML engineers, bioinformatics teams, and discovery application teams to ensure the knowledge graph supports downstream reasoning and analysis
Validate the correctness, completeness, and integrity of the knowledge graph across releases

Qualifications

Data science · Bioinformatics · Graph data structures · Python · Knowledge graph concepts · Harmonizing datasets · Metadata standards · Communication · Collaboration skills

Required

Strong experience in data science, bioinformatics, computational biology, or a related field
Experience working with biological knowledgebases, public datasets, or ontology-driven systems
Familiarity with graph data structures, relationship modeling, and knowledge graph concepts
Experience harmonizing heterogeneous biological datasets and mapping variable IDs across sources
Proficiency in Python and scientific computing libraries
Ability to build ingestion pipelines for structured or semi-structured biological data
Strong understanding of metadata standards, biological ontologies, and domain logic
Ability to translate complex biological information into structured, machine-readable representations
Excellent communication skills and comfort collaborating across engineering and scientific teams

Preferred

Experience with graph databases or graph query languages
Experience with knowledge graph (KG) curation, link prediction, relationship extraction, or graph-based ML
Familiarity with multimodal data integration
Previous work on biological or chemical knowledge graphs
Experience with public consortia and resources such as ENCODE, GTEx, TCGA, or ChEMBL
Prior experience in a tech bio startup or scientific software environment

Benefits

Comprehensive PPO health coverage through Anthem (medical, dental, and vision)
401(k) with top-tier plans

Company

Mithrl

Mithrl is a software development company that builds custom, on-demand workflows for NGS data.

Funding

Current Stage
Early Stage
Total Funding
$4M
Key Investors
Bonfire Ventures
2024-11-14 · Seed · $4M

Leadership Team

Shara Balakrishnan, Ph.D.
Chief Technology Officer
Company data provided by Crunchbase