ChEMBL: The Digital Library Powering Modern Drug Discovery

Exploring the 2023 update to one of the world's most comprehensive open-access bioactivity databases


The Treasure Trove of Medicinal Chemistry

Imagine if every chemist and biologist could access the collective knowledge of decades of drug research with just a few clicks: no more repeating failed experiments, no more stumbling in the dark when designing new medications. This isn't science fiction; it's the reality enabled by ChEMBL, one of the world's most comprehensive open-access bioactivity databases. In our ongoing quest to develop better medicines faster, this digital resource has become an indispensable ally in the global fight against disease.

Managed by the European Bioinformatics Institute (EMBL-EBI), ChEMBL has changed considerably since its inception. The 2023 update marks a significant milestone in its evolution: a transition from a literature curation tool to a multifaceted drug discovery platform that now contains slightly more bioactivity data from direct deposits than from published literature [2]. This shift reflects its growing role as a central hub for the global research community.

15+ years of development
Millions of bioactivity measurements
200+ journals covered

From Humble Beginnings to a Global Resource

ChEMBL's story began in 2009 with a straightforward but ambitious goal: to systematically organize the scattered wealth of bioactivity data published in scientific journals [3]. Before resources like ChEMBL, this valuable information was trapped in unstructured formats within PDF files, with compound structures often depicted as images that are not machine-readable and target proteins referred to by various names and abbreviations [7].

2009 - ChEMBL 01

Initial launch with literature-derived data from 12 key medicinal chemistry journals [3].

2011 - ChEMBL 10

First incorporation of PubChem bioassay data, expanding beyond literature sources [3].

2013 - ChEMBL 16

Introduction of the pChEMBL value for standardized potency comparisons [3].

2017 - ChEMBL 23

Addition of bioactivity data for understudied targets through the IDG collaboration [3].

2020 - ChEMBL 27

Special COVID-19 release with drug repurposing data [3].

2023 - ChEMBL 32

Deposited data surpasses literature-extracted data for the first time [2].

[Figure: ChEMBL data growth over time]

Data Integration

ChEMBL began incorporating direct data depositions from research groups and pharmaceutical companies, particularly in neglected tropical diseases [3].

Key Metrics

The creation of the pChEMBL value in 2013 allowed researchers to compare different potency measurements (such as IC50, EC50, and Ki) on a single standardized negative logarithmic scale [3].
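
As a rough illustration of that scale, the sketch below computes a pChEMBL-style value as the negative base-10 logarithm of a potency expressed in molar units. The function name `pchembl`, the unit table, and the rounding to two decimals are illustrative assumptions rather than ChEMBL's own code, and ChEMBL applies additional eligibility rules when assigning the value that this sketch ignores.

```python
import math

# Conversion factors from common concentration units to molar (M).
# This table is an illustrative assumption, not a ChEMBL data structure.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def pchembl(value: float, unit: str = "nM") -> float:
    """Return -log10 of a potency expressed in molar, rounded to 2 decimals."""
    if value <= 0:
        raise ValueError("potency must be a positive concentration")
    molar = value * UNIT_TO_MOLAR[unit]
    return round(-math.log10(molar), 2)


# An IC50 of 10 nM and a Ki of 0.01 uM are the same concentration,
# so both land on the same point of the scale.
print(pchembl(10, "nM"))
print(pchembl(0.01, "uM"))
```

Because the scale is logarithmic, each whole unit corresponds to a tenfold change in potency, and higher values mean more potent compounds.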

What's New in 2023: A More Powerful Platform

The 2023 update to ChEMBL represents more than just additional data—it introduces fundamental shifts in content sourcing and new capabilities that enhance its utility across drug discovery applications.

[Figure: Data sources in ChEMBL 2023]

Deposited Data Dominance

For the first time, bioactivity data from deposited datasets slightly exceeds that extracted from literature [2]. This reflects a growing embrace of open data practices.

Enhanced Annotations

New features include a Natural Product likeness score, updated flags for Natural Products, and a new flag for Chemical Probes [2, 8].

Understudied Targets

Through collaborations with initiatives like the Illuminating the Druggable Genome (IDG) project, ChEMBL has incorporated bioactivity data for less-studied targets [3].

ChEMBL Data Structure
Compounds

Chemical structures standardized and curated

Assays

Experimental procedures categorized

Activities

Quantitative measurements standardized
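
To make the relationship between these three record types concrete, here is a minimal sketch that models them as linked tables in an in-memory SQLite database. The table names, column names, identifiers, and measurement values are all hypothetical, chosen only to mirror the headings above; the real ChEMBL schema is considerably larger and uses different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, hypothetical schema: an activity links one compound to one assay.
conn.executescript("""
CREATE TABLE compounds (compound_id TEXT PRIMARY KEY, smiles TEXT);
CREATE TABLE assays (assay_id TEXT PRIMARY KEY, description TEXT);
CREATE TABLE activities (
    activity_id    INTEGER PRIMARY KEY,
    compound_id    TEXT REFERENCES compounds(compound_id),
    assay_id       TEXT REFERENCES assays(assay_id),
    standard_type  TEXT,
    standard_value REAL,
    standard_units TEXT
);
""")

# Made-up rows for illustration only; these are not ChEMBL records.
conn.executemany("INSERT INTO compounds VALUES (?, ?)",
                 [("CPD-1", "CCO"), ("CPD-2", "c1ccccc1")])
conn.execute("INSERT INTO assays VALUES (?, ?)",
             ("ASSAY-1", "Hypothetical enzyme inhibition assay"))
conn.executemany(
    "INSERT INTO activities (compound_id, assay_id, standard_type, "
    "standard_value, standard_units) VALUES (?, ?, ?, ?, ?)",
    [("CPD-1", "ASSAY-1", "IC50", 250.0, "nM"),
     ("CPD-2", "ASSAY-1", "IC50", 12.0, "nM")],
)

# List every measurement in one assay, most potent (lowest IC50) first.
rows = conn.execute("""
    SELECT c.compound_id, a.standard_type, a.standard_value, a.standard_units
    FROM activities AS a
    JOIN compounds AS c ON c.compound_id = a.compound_id
    WHERE a.assay_id = ?
    ORDER BY a.standard_value
""", ("ASSAY-1",)).fetchall()

for row in rows:
    print(row)
```

Storing values with a standardized type and unit is what allows a single query like this to rank compounds that were measured in different experiments.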

ChEMBL in Action: Powering COVID-19 Drug Repurposing

The Challenge of Rapid Response

When the COVID-19 pandemic emerged, the scientific community faced an urgent challenge: quickly identifying existing drugs that could be repurposed to combat SARS-CoV-2. Traditional drug development takes years, but with millions of lives at stake, researchers needed to accelerate this process dramatically. ChEMBL's comprehensive collection of drug activity data positioned it as an ideal resource for this critical task.

Methodology: From Large-Scale Screening to Data Integration

The approach taken by multiple research consortia and documented in ChEMBL followed a systematic process:

  1. Large-scale screening of approved drugs and clinical candidates against SARS-CoV-2 in cell-based assays [2]
  2. Dose-response testing of promising compounds to quantify their potency [2]
  3. Integration of screening results with existing knowledge about safety profiles
  4. Curation and deposition of the resulting bioactivity data into ChEMBL [2]

[Figure: COVID-19 drug repurposing workflow]

Results and Impact: Accelerating Therapeutic Development

ChEMBL Release 27 made curated data available from eight large-scale drug repurposing screens, providing a comprehensive resource for COVID-19 therapeutic development [2, 3].

Compound name | Original indication | Anti-SARS-CoV-2 activity | Stage of COVID-19 development
Remdesivir | Ebola virus infection | EC50 ~ 0.77 μM | Approved for emergency use
Dexamethasone | Inflammation | Reduced mortality in severe cases | Recommended for severe COVID-19
Hydroxychloroquine | Malaria, autoimmune diseases | Inactive in controlled trials | Development discontinued
Ivermectin | Parasitic infections | Conflicting results | Not recommended outside trials

This case study exemplifies how ChEMBL serves as a central repository for crisis-relevant bioactivity data, enabling coordinated global research efforts during public health emergencies.

The Scientist's Toolkit: Navigating ChEMBL

ChEMBL provides multiple access points tailored to different research needs and technical expertise levels. Whether you're a bench scientist looking for a quick answer or a bioinformatician conducting large-scale analyses, ChEMBL offers appropriate tools for the task.

Web Interface

User-friendly website with search and filtering capabilities [5]. Ideal for quick compound/target searches and report card browsing.

Web Services

RESTful API for programmatic data access [5]. Well suited to application development and automated data retrieval.
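
A minimal sketch of what programmatic access might look like, using only the Python standard library. The base URL, the `activity` resource, and the filter names (`target_chembl_id`, `pchembl_value__gte`, `limit`) are assumptions recalled from the ChEMBL web services documentation and should be checked against the current version; `CHEMBL203` is used purely as an example identifier, and `build_url` and `fetch` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the ChEMBL data web services; verify before relying on it.
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def build_url(resource: str, **filters) -> str:
    """Build a JSON query URL for a resource, with filters as query parameters."""
    query = urlencode(filters)
    return f"{BASE_URL}/{resource}.json" + (f"?{query}" if query else "")


def fetch(resource: str, **filters) -> dict:
    """Perform the HTTP GET and decode the JSON body (requires network access)."""
    with urlopen(build_url(resource, **filters), timeout=30) as response:
        return json.load(response)


# Build (but do not send) a request for activities against one target,
# restricted to reasonably potent measurements.
url = build_url("activity", target_chembl_id="CHEMBL203",
                pchembl_value__gte=6, limit=5)
print(url)
```

Calling `fetch("activity", target_chembl_id="CHEMBL203", limit=5)` would send the request; keeping URL construction separate from the network call makes the query easy to inspect and reuse in automated pipelines.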

Data Downloads

Complete database dumps in MySQL and other common database formats [5]. Suitable for large-scale analyses and custom database implementations.

KNIME Integration

Example workflows using REST nodes [5]. Well suited to visual workflow design and reproducible data analysis.

[Figure: ChEMBL by the numbers (2023 release)]

Beyond 2023: The Future of Digital Drug Discovery

As we look beyond the 2023 update, ChEMBL continues to evolve in response to the changing landscape of drug discovery. The database is now recognized as a FAIR (Findable, Accessible, Interoperable, Reusable) resource and a Global Core Biodata Resource, reflecting its fundamental importance to the life sciences community [2]. Celebrating its 15th anniversary in 2024, ChEMBL has firmly established itself as Europe's most impactful open-access drug discovery database [3].

Direct Data Deposition

Greater emphasis on direct data deposition as pre-competitive collaboration increases across pharmaceutical and academic sectors.

Chemical Probes

Inclusion of new data types such as chemical probe information through collaborations with the EUbOPEN consortium [2, 3].

AI and ML Integration

Comprehensive, standardized datasets are becoming vital for training and validating predictive models in drug discovery.

Perhaps most importantly, ChEMBL exemplifies how open science practices can accelerate therapeutic development. By making vast amounts of bioactivity data freely available, it helps reduce redundant research, enables the identification of promising compounds that might otherwise be overlooked, and provides critical information about potential safety concerns early in the drug development process.

As drug discovery grows increasingly data-driven, ChEMBL's role as a centralized, curated, and open knowledge resource becomes ever more essential—proving that in the complex journey from chemical compound to effective medicine, shared knowledge may be the most valuable catalyst of all.

References