Exploring the 2023 update to one of the world's most comprehensive open-source bioactivity databases
Imagine if every chemist and biologist could access the collective knowledge of decades of drug research with just a few clicks—no more repeating failed experiments, no more stumbling in the dark when designing new medications. This isn't science fiction; it's the reality enabled by ChEMBL, one of the world's most comprehensive open-source bioactivity databases. In our ongoing quest to develop better medicines faster, this digital resource has become an indispensable ally in the global fight against disease.
Managed by the European Bioinformatics Institute (EMBL-EBI), ChEMBL has undergone remarkable transformations since its inception. The 2023 update represents a significant milestone in its evolution—transitioning from a literature curation tool to a multifaceted drug discovery platform that now contains slightly more bioactivity data from direct deposits than from published literature 2 . This shift reflects its growing role as a central hub for the global research community.
ChEMBL's story began in 2009 with a straightforward but ambitious goal: to systematically organize the scattered wealth of bioactivity data published in scientific journals 3 . Before resources like ChEMBL, this valuable information was trapped in unstructured formats within PDF files, with compound structures often depicted as images that machines could not read and target proteins referred to by various names and abbreviations 7 .
Initial launch with literature-derived data from 12 key medicinal chemistry journals 3 .
First incorporation of PubChem bioassay data, expanding beyond literature sources 3 .
Introduction of pChEMBL value for standardized potency comparisons 3 .
Addition of bioactivity data for understudied targets through IDG collaboration 3 .
Special COVID-19 release with drug repurposing data 3 .
Deposited data surpasses literature-extracted data for the first time 2 .
ChEMBL began incorporating direct data depositions from research groups and pharmaceutical companies, particularly in neglected tropical diseases 3 .
The creation of the pChEMBL value in 2013 allowed researchers to compare different potency measurements on a standardized negative logarithmic scale 3 .
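To make the negative logarithmic scale concrete, here is a minimal Python sketch. It assumes the usual definition of a negative base-10 logarithm of a potency value expressed in molar units; the function name `pchembl` and the unit table are illustrative choices, not part of any ChEMBL software. In ChEMBL itself the value is assigned only to certain measurement types, which this sketch does not check.

```python
import math

# Multipliers that convert common concentration units to molar (mol/L).
# This table is an illustrative assumption, not a ChEMBL data structure.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def pchembl(value, unit="nM"):
    """Return -log10 of a potency value after converting it to molar units."""
    if value <= 0:
        raise ValueError("potency must be a positive number")
    molar = value * UNIT_TO_MOLAR[unit]
    return round(-math.log10(molar), 2)


# A 10 nM IC50 and a 1 uM Ki land on the same scale and can be compared directly.
print(pchembl(10, "nM"))
print(pchembl(1, "uM"))
```

On this scale a larger number means a more potent compound, and each unit corresponds to a tenfold change in concentration.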
The 2023 update to ChEMBL represents more than just additional data—it introduces fundamental shifts in content sourcing and new capabilities that enhance its utility across drug discovery applications.
For the first time, bioactivity data from deposited datasets slightly exceeds that extracted from the literature 2 . This reflects a growing embrace of open data practices.
Through collaborations with initiatives like the Illuminating the Druggable Genome (IDG) project, ChEMBL has incorporated bioactivity data for less-studied targets 3 .
Chemical structures standardized and curated
Experimental procedures categorized
Quantitative measurements standardized
When the COVID-19 pandemic emerged, the scientific community faced an urgent challenge: quickly identifying existing drugs that could be repurposed to combat SARS-CoV-2. Traditional drug development takes years, but with millions of lives at stake, researchers needed to accelerate this process dramatically. ChEMBL's comprehensive collection of drug activity data positioned it as an ideal resource for this critical task.
Multiple research consortia took a systematic approach to this search, and their results were documented in ChEMBL.
ChEMBL Release 27 made curated data available from eight large-scale drug repurposing screens, providing a comprehensive resource for COVID-19 therapeutic development 2 3 .
| Compound Name | Original Indication | Anti-SARS-CoV-2 Activity | Stage of COVID-19 Development |
|---|---|---|---|
| Remdesivir | Ebola virus infection | EC50 ~ 0.77 μM | Approved for emergency use |
| Dexamethasone | Inflammation | Reduced mortality in severe cases | Recommended for severe COVID-19 |
| Hydroxychloroquine | Malaria, autoimmune diseases | Inactive in controlled trials | Development discontinued |
| Ivermectin | Parasitic infections | Conflicting results | Not recommended outside trials |
This case study exemplifies how ChEMBL serves as a central repository for crisis-relevant bioactivity data, enabling coordinated global research efforts during public health emergencies.
ChEMBL provides multiple access points tailored to different research needs and technical expertise levels. Whether you're a bench scientist looking for a quick answer or a bioinformatician conducting large-scale analyses, ChEMBL offers appropriate tools for the task.
User-friendly website with search and filtering capabilities 5 . Ideal for quick compound/target searches and report card browsing.
RESTful API for programmatic data access 5 . Perfect for application development and automated data retrieval.
Complete database dumps in MySQL format 5 . Suitable for large-scale analyses and custom database implementations.
Example workflows using REST nodes 5 . Excellent for visual workflow design and reproducible data analysis.
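As a rough illustration of programmatic access through the RESTful API, the Python sketch below builds a query URL and defines a small fetch helper using only the standard library. The base URL, the `activity` resource, and the filter names (`target_chembl_id`, `pchembl_value__gte`, `limit`) follow the public ChEMBL web services as we understand them and should be checked against the current API documentation. The helper names `build_url` and `fetch` are hypothetical, and `CHEMBL240` is just an example target identifier.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base address of the ChEMBL web services; verify against the official docs.
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def build_url(resource, **filters):
    """Build a JSON query URL for a resource such as 'molecule' or 'activity'."""
    query = urlencode(sorted(filters.items()))  # sorted for a stable, readable URL
    url = f"{BASE_URL}/{resource}.json"
    return f"{url}?{query}" if query else url


def fetch(url, timeout=30):
    """Download a URL and parse the JSON response (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Ask for a handful of activities against one target with pChEMBL >= 6.
url = build_url("activity", target_chembl_id="CHEMBL240", pchembl_value__gte=6, limit=5)
print(url)
# data = fetch(url)  # uncomment to run the query against the live service
```

Keeping URL construction separate from the network call makes the query easy to inspect and log before any data is downloaded, which helps when a workflow needs to be reproducible.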
As we look beyond the 2023 update, ChEMBL continues to evolve in response to the changing landscape of drug discovery. The database now stands as a FAIR (Findable, Accessible, Interoperable, Reusable) and Global Core Biodata Resource, reflecting its fundamental importance to the life sciences community 2 . With celebrations of its 15th anniversary in 2024, ChEMBL has firmly established itself as Europe's most impactful open-access drug discovery database 3 .
Greater emphasis on direct data deposition as pre-competitive collaboration increases across pharmaceutical and academic sectors.
Comprehensive, standardized datasets becoming vital for training and validating predictive models in drug discovery.
Perhaps most importantly, ChEMBL exemplifies how open science practices can accelerate therapeutic development. By making vast amounts of bioactivity data freely available, it helps reduce redundant research, enables the identification of promising compounds that might otherwise be overlooked, and provides critical information about potential safety concerns early in the drug development process.
As drug discovery grows increasingly data-driven, ChEMBL's role as a centralized, curated, and open knowledge resource becomes ever more essential—proving that in the complex journey from chemical compound to effective medicine, shared knowledge may be the most valuable catalyst of all.