The 2011 Molecular Data Gold Rush

How Biologists Learned to Organize Life's Code

The 2011 Nucleic Acids Research Database Issue marked a pivotal moment in science, documenting how researchers transformed raw genetic data into organized, searchable resources that could fuel discoveries about health, disease, and the very building blocks of life ¹ .

More Than Just 'A' for Apple

Imagine trying to understand every conversation in a bustling city where people speak hundreds of different languages simultaneously. This was the challenge facing biologists in the early 21st century, as DNA sequencing technologies began generating millions of genetic sequences daily.

The flood of data was so immense that simply storing it became a monumental task, let alone making sense of it all.

Enter the 2011 Nucleic Acids Research Database Issue, a specialized annual publication that served as a curated field guide to this explosion of biological information.

The Database Revolution: From Data Hoarding to Biological Discovery

What Made the 2011 Edition Special?

The 2011 Database Issue wasn't just another academic publication—it represented a growing recognition that data curation required community standards and specialized resources. This edition featured descriptions of 96 new databases and updates on 83 previously established ones, bringing the total number of databases in the accompanying online Molecular Biology Database Collection to 1,330 carefully selected resources ¹ .

COMBREX

An ambitious project aimed at working out the functions of 'conserved hypothetical' proteins: proteins found across many species whose functions nevertheless remained unknown ¹ .

BioDBcore

A community effort to establish a 'minimal information about a biological database'—essentially a standard label for databases that would make them easier to find, use, and compare ¹ .
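
To make the idea of a "standard label" concrete, here is a minimal sketch of a database descriptor. It can be checked for completeness and searched by data type. The class name, the field names and the choice of required fields are hypothetical simplifications for illustration. They are not the actual BioDBcore checklist.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified descriptor inspired by the idea of a
# "minimal information about a biological database" label.
# Field names are illustrative, NOT the official BioDBcore checklist.
@dataclass
class DatabaseDescriptor:
    name: str
    url: str
    description: str
    data_types: list[str] = field(default_factory=list)
    organisms: list[str] = field(default_factory=list)

    # Class-level constant (no annotation, so dataclass does not treat it as a field).
    REQUIRED = ("name", "url", "description")

    def missing_fields(self) -> list[str]:
        """Return the required fields that are empty or whitespace-only."""
        return [f for f in self.REQUIRED if not getattr(self, f).strip()]

    def declares(self, data_type: str) -> bool:
        """Case-insensitive check for a declared data type."""
        return data_type.lower() in (t.lower() for t in self.data_types)


# Example descriptors; the descriptions are informal summaries.
catalog = [
    DatabaseDescriptor("FlyBase", "https://flybase.org",
                       "Genetic and genomic data for fruit flies",
                       ["genes", "genomes"], ["Drosophila melanogaster"]),
    DatabaseDescriptor("GEO", "https://www.ncbi.nlm.nih.gov/geo/",
                       "Archive of gene expression data",
                       ["gene expression"]),
    DatabaseDescriptor("Unnamed draft", "", ""),
]

# Because every entry uses the same fields, searching and auditing are uniform.
print([db.name for db in catalog if db.declares("Gene Expression")])  # ['GEO']
print(catalog[2].missing_fields())  # ['url', 'description']
```

Keeping the required fields in one place lets a catalog flag incomplete entries the same way every time. A real community standard would define its fields by agreement among database providers rather than in code.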

Dr. Daniel J. Rigden, who later served as one of the collection's curators, explained that emphasis was placed on including "databases where new value is added to the underlying data by virtue of curation, new data connections, or other innovative approaches" ³ . This philosophy transformed raw data into genuine biological insight.

The Expanding Universe of Biological Data

Growth of the Molecular Biology Database Collection

Year	Number of Databases	Notable Highlights
2000	Not given	Initial collection established; focus on major sequence repositories and model organisms ²
2001	281 databases	55 new entries added; early emphasis on gene expression and genomics ⁸
2009	1,170 databases	95 new databases described in that year's issue
2011	1,330 databases	Introduction of COMBREX and BioDBcore initiatives ¹
2022	1,645 databases	Continued expansion with specialized resources for COVID-19, protein structures, and more ³
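
The counts in the growth table support some quick arithmetic. The sketch below computes the net change between editions and the average net change per year. These are net figures only, because the counts alone do not show how many databases were added versus removed in a given period. The 2000 row is left out because it gives no count.

```python
# Collection sizes from the growth table (the 2000 row has no count).
counts = {2001: 281, 2009: 1170, 2011: 1330, 2022: 1645}


def net_change(start: int, end: int) -> int:
    """Net number of databases gained between two editions."""
    return counts[end] - counts[start]


def mean_annual_change(start: int, end: int) -> float:
    """Average net gain per year between two editions."""
    return net_change(start, end) / (end - start)


for start, end in [(2001, 2009), (2009, 2011), (2011, 2022)]:
    print(f"{start}-{end}: +{net_change(start, end)} databases, "
          f"about {mean_annual_change(start, end):.1f} per year")
# 2001-2009: +889 databases, about 111.1 per year
# 2009-2011: +160 databases, about 80.0 per year
# 2011-2022: +315 databases, about 28.6 per year
```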

A Closer Look: The International Nucleotide Sequence Database Collaboration

One of the most crucial resources highlighted in the 2011 issue was the International Nucleotide Sequence Database Collaboration (INSDC)—a perfect example of how scientific cooperation enabled biological discovery on a global scale ¹ .

Member Database	Region
GenBank	United States
EMBL Nucleotide Sequence Database	Europe
DNA Data Bank of Japan (DDBJ)	Japan

The INSDC comprised three major databases that worked in concert. These organizations established data exchange protocols that allowed researchers worldwide to submit DNA sequences to any one database while knowing the information would be shared across all three. This eliminated duplication of effort and created a comprehensive, unified resource that has become the foundation of modern biological research ¹ .
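
The "submit once, shared everywhere" arrangement can be illustrated with a toy model. In it, a submission to any one member is copied to all members. Everything here is a simplification:

- The class names and the accession format are hypothetical.
- The copy happens instantly.

The real partners exchange data on a regular schedule and follow their own identifier conventions.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    name: str
    # accession -> sequence
    records: dict[str, str] = field(default_factory=dict)


class ExchangeNetwork:
    """Toy model: a submission to any member is mirrored to every member."""

    def __init__(self, names: list[str]) -> None:
        self.members = {n: Repository(n) for n in names}
        self._next_id = 1

    def submit(self, member: str, sequence: str) -> str:
        # Any member can be the entry point; it only needs to exist.
        if member not in self.members:
            raise KeyError(f"unknown member: {member}")
        # Hypothetical accession format, not a real INSDC accession scheme.
        accession = f"X{self._next_id:05d}"
        self._next_id += 1
        # Copy the record to every member so all hold identical data.
        for repo in self.members.values():
            repo.records[accession] = sequence
        return accession


net = ExchangeNetwork(["GenBank", "EMBL", "DDBJ"])
acc = net.submit("DDBJ", "ACGTTGCA")
print(acc)  # X00001
print({name: repo.records.get(acc) for name, repo in net.members.items()})
# {'GenBank': 'ACGTTGCA', 'EMBL': 'ACGTTGCA', 'DDBJ': 'ACGTTGCA'}
```

Because the record is identical everywhere, a researcher can query any member and get the same answer. A single submission also yields a single accession, so the same sequence is not deposited three times.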

Sequence Read Archive

The 2011 issue also described the Sequence Read Archive (SRA), which addresses the challenge of storing the massive datasets generated by new sequencing technologies ¹ . By preserving raw sequencing reads, the archive allows data to be reanalyzed as methods and scientific understanding advance.

The Scientist's Toolkit: Key Database Categories

Biological databases specialize in different types of information, much like libraries have sections for reference, periodicals, and special collections. The 2011 Database Issue highlighted several crucial categories:

Essential Database Categories from the 2011 Collection

Category	Purpose	Example Databases
Sequence Repositories	Store fundamental DNA and protein sequence data	GenBank, EMBL, DDBJ ¹
Protein Structure	Catalog experimentally determined 3D protein structures and classify protein domains	Protein Data Bank (PDB), CATH, SUPERFAMILY ¹
Gene Expression	Document when and where genes are active	GEO, ArrayExpress ¹
Specialized Genomics	Focus on specific organisms or biological systems	FlyBase (fruit flies), SGD (yeast) ¹
Literature Resources	Provide access to the published research literature	UK PubMed Central ¹

Key Tools and Resources for Database Curation

Tool/Resource	Function in Database Curation
BioDBcore Standards	Provide a consistent framework for describing databases, making them easier to find and use ¹
Validation Datasets	Standardized data used to test and confirm database search functions and accuracy ⁴
Curation Interfaces	Specialized software tools that help human curators extract and organize information from scientific literature ³
Automated Annotation Pipelines	Computational systems that add preliminary labels to new genetic sequences before expert review ¹
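
The annotation-pipeline row describes a two-stage workflow: software proposes a label, and a curator confirms it. Below is a minimal sketch of the first stage. It assumes a toy reference set and uses shared k-mers (Jaccard similarity) as a simple stand-in for the more sophisticated similarity searches used in practice. All sequences, labels and the 0.5 threshold are hypothetical.

```python
# Toy reference set mapping known sequences to curated labels (hypothetical).
REFERENCE = {
    "ATGGCTAGCTAGGCT": "kinase (toy label)",
    "ATGCCCGGGTTTAAA": "transporter (toy label)",
}


def kmers(seq: str, k: int = 4) -> set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared k-mers divided by total distinct k-mers (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def preliminary_annotation(seq: str, threshold: float = 0.5) -> dict:
    """Propose a label from the most similar reference; never final."""
    query = kmers(seq)
    best_ref, best_score = max(
        ((ref, jaccard(query, kmers(ref))) for ref in REFERENCE),
        key=lambda pair: pair[1],
    )
    label = REFERENCE[best_ref] if best_score >= threshold else "unassigned"
    # Every automated call is queued for a human curator to confirm or correct.
    return {"label": label, "score": round(best_score, 2),
            "status": "pending expert review"}


print(preliminary_annotation("ATGGCTAGCTAGGCT"))
# {'label': 'kinase (toy label)', 'score': 1.0, 'status': 'pending expert review'}
print(preliminary_annotation("GGGGGGGGGGGG")["label"])  # unassigned
```

The key design choice is that the automated stage never marks a label as final. Every result carries a review status, which mirrors the "before expert review" step in the table.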

The Human Side of Data: Beyond Ones and Zeroes

The year 2011 also brought very human challenges for the scientific community. NAR's editors published a special note acknowledging the impact of the March 2011 tsunami in Japan, which devastated the country's northeast coast and triggered a nuclear catastrophe at the Fukushima Dai-ichi power plant ⁴ .

Impact on Japanese Researchers

The disaster caused significant difficulties for Japanese researchers, including power blackouts and network disruptions that forced several authors to arrange alternative web locations for their databases.

The scientific community rallied, with the NAR editors expressing admiration for "their fortitude and resiliency in the face of this overwhelming tragedy" ⁴ .

This reminder that databases are built and maintained by people facing real-world challenges—from natural disasters to the daily grind of curation—highlighted the human infrastructure underlying our digital biological knowledge.

Conclusion: A Legacy of Organized Knowledge

The 2011 Nucleic Acids Research Database Issue captured biology at a crossroads—transitioning from a discipline limited by data scarcity to one challenged by data abundance.

The solutions pioneered in this era, from international collaborations like INSDC to standardization efforts like BioDBcore, created the foundation for today's biological research.

Continued Evolution

These resources continue to evolve, with the 2022 edition of the collection listing 1,645 databases ³ . What began as a response to a data crisis has become an enduring testament to science's ability to organize knowledge—proving that in biology, as in life, finding the right information is just as important as having the information itself.

As one of the early visionaries behind these efforts noted back in 2000, databases needed to be more than just "storehouses for thousands of bases or amino acids"—they needed to "make logical connections to other types of information that are available" to allow for true biological discovery ² . The 2011 Database Issue showed just how far the scientific community had come in achieving that vision.