On Thursday, the Chan Zuckerberg Initiative launched rBio, the first AI model trained to reason about cellular biology using virtual simulations rather than expensive laboratory experiments. This breakthrough could dramatically accelerate biomedical research by allowing scientists to test biological hypotheses computationally before committing resources to costly lab work, potentially flipping the traditional paradigm where 90% of biology research happens experimentally.
The big picture: rBio addresses a fundamental challenge in applying AI to biological research by creating the first conversational AI system that can answer complex biological questions in plain English while being grounded in rigorous scientific data.
- Traditional biological foundation models work with complex molecular data that scientists struggle to query naturally, requiring “complicated ways to prompt them,” according to Ana-Maria Istrate, a senior research scientist at CZI.
- The model distills knowledge from CZI’s TranscriptFormer—trained on 112 million cells from 12 species spanning 1.5 billion years of evolution—into a user-friendly interface researchers can query conversationally.
How it works: rBio uses a novel “soft verification” training approach that teaches the AI to think in probabilities rather than binary yes-or-no answers.
- Instead of traditional reinforcement learning with simple correct/incorrect rewards, rBio receives rewards proportional to the likelihood that its biological predictions align with reality, as determined by virtual cell simulations.
- Scientists can ask complex questions like “Would suppressing the actions of gene A result in an increase in activity of gene B?” and receive scientifically grounded responses about cellular changes.
- The system applies knowledge about gene co-expression patterns to make accurate predictions about gene perturbation effects, even though co-expression and perturbation prediction are distinct biological tasks.
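The difference between a binary reward and a probability-proportional reward can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CZI's training code: the function names and the idea of a single `verifier_prob_yes` value (the probability a virtual cell simulation assigns to a "yes" answer) are hypothetical stand-ins for whatever rBio uses internally.

```python
def binary_reward(predicted_yes: bool, verifier_prob_yes: float,
                  threshold: float = 0.5) -> float:
    """Hard reward: 1.0 if the answer matches the thresholded verifier, else 0.0."""
    verifier_yes = verifier_prob_yes >= threshold
    return 1.0 if predicted_yes == verifier_yes else 0.0


def soft_reward(predicted_yes: bool, verifier_prob_yes: float) -> float:
    """Soft reward: the probability the verifier assigns to the model's answer.

    verifier_prob_yes is a hypothetical score in [0, 1] from a virtual cell model.
    """
    if not 0.0 <= verifier_prob_yes <= 1.0:
        raise ValueError("verifier_prob_yes must be in [0, 1]")
    return verifier_prob_yes if predicted_yes else 1.0 - verifier_prob_yes
```

With a verifier score of 0.55, the binary reward treats a "yes" answer as fully correct, while the soft reward of 0.55 preserves the fact that the simulation was close to undecided.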
Performance benchmarks: rBio demonstrated competitive performance against models trained on real experimental data, with enhanced capabilities when using advanced prompting techniques.
- On the PerturbQA benchmark for gene perturbation prediction, rBio matched specialized biological models and outperformed baseline large language models.
- When enhanced with chain-of-thought prompting that encourages step-by-step reasoning, rBio achieved state-of-the-art performance, surpassing the previous leading model SUMMER.
- The model showed strong “transfer learning” capabilities, successfully generalizing to out-of-distribution cell lines without needing cell-line-specific experimental data.
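Chain-of-thought prompting of the kind described above amounts to asking for intermediate reasoning before the final answer. A minimal sketch follows; the prompt wording, the function name `build_prompt`, and the gene and cell-line names in the usage note are all hypothetical and are not taken from rBio or PerturbQA.

```python
def build_prompt(gene_a: str, gene_b: str, cell_line: str,
                 chain_of_thought: bool = False) -> str:
    """Build a yes/no perturbation question, optionally asking for step-by-step reasoning."""
    question = (
        f"In {cell_line} cells, would suppressing gene {gene_a} "
        f"result in an increase in activity of gene {gene_b}?"
    )
    if chain_of_thought:
        # Hypothetical instruction text; real prompts may differ substantially.
        return question + " Reason step by step, then answer yes or no."
    return question + " Answer yes or no."
```

Calling `build_prompt("GENE_A", "GENE_B", "CELL_LINE_X", chain_of_thought=True)` yields the same question as the plain version, with only the closing instruction changed.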
Why this matters: The development could fundamentally transform drug discovery and biomedical research by dramatically reducing the time and cost traditionally required for biological experimentation.
- Drug discovery typically takes decades and costs billions of dollars, but rBio’s ability to predict cellular responses could accelerate early-stage research significantly.
- The model’s predictions about gene interactions could prove particularly valuable for understanding neurodegenerative diseases like Alzheimer’s, potentially leading to “earlier intervention, perhaps halting these diseases altogether someday.”
Open source advantage: CZI’s commitment to making rBio freely available distinguishes it from commercial competitors and could democratize access to sophisticated biological AI tools.
- All CZI models are available through the organization’s Virtual Cell Platform with tutorials that run on free Google Colab notebooks.
- This approach benefits smaller research institutions and startups that lack resources to develop such models independently, while creating network effects that could accelerate scientific progress.
- “One of the main goals for our work is to accelerate science. So everything we do is we want to make it open source for that purpose only,” Istrate explained.
Data quality focus: CZI’s approach benefits from years of careful data curation through its CZ CELLxGENE repository, one of the largest single-cell biological data collections.
- The organization’s data undergoes rigorous quality control and was “generated with diversity in mind to minimize bias in terms of cell types, ancestry, tissues, and donors.”
- This attention to data quality becomes crucial for AI models that could influence medical decisions, unlike some commercial efforts that rely on potentially biased public datasets.
Future vision: rBio represents the first step toward CZI’s goal of creating “universal virtual cell models” that integrate knowledge from multiple biological domains.
- Currently, researchers work with separate models for transcriptomics, proteomics, and imaging data without easy integration methods.
- The team demonstrated improved performance when combining multiple verification sources—TranscriptFormer, specialized neural networks, and knowledge databases like Gene Ontology.
- This integration capability could eventually allow scientists to ask questions spanning different types of biological data within a single conversational interface.
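One simple way to combine several verification sources into a single reward is a weighted average of their scores. The sketch below assumes each source returns a probability that the answer is "yes"; the article does not say how rBio merges its verifiers, so the averaging rule, the function name, and the source names and scores in the usage note are assumptions for illustration only.

```python
from typing import Dict, Optional


def combined_reward(predicted_yes: bool,
                    verifier_probs: Dict[str, float],
                    weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of per-source 'yes' probabilities, scored against the answer."""
    if not verifier_probs:
        raise ValueError("at least one verifier score is required")
    if weights is None:
        # Assumption: equal trust in every source unless told otherwise.
        weights = {name: 1.0 for name in verifier_probs}
    total = sum(weights[name] for name in verifier_probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    p_yes = sum(weights[name] * p for name, p in verifier_probs.items()) / total
    return p_yes if predicted_yes else 1.0 - p_yes
```

For example, hypothetical scores of 0.9, 0.6, and 0.3 from three sources average to a "yes" probability of about 0.6, so a "yes" answer earns roughly 0.6 and a "no" answer roughly 0.4.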
Strategic context: The launch comes as CZI has refocused from broad philanthropic missions to targeted scientific research, concentrating resources on the intersection of AI and biology.
- The $6 billion initiative under Priscilla Chan, a pediatrician, and Mark Zuckerberg, Meta’s CEO, aims to “cure, prevent, and manage all disease by the end of this century.”
- CZI’s continued investment in biological AI infrastructure could help maintain research momentum during potential cuts to National Institutes of Health funding under the current administration.