Genomic Reference Libraries: Unlocking the Secrets of Deep-Sea Life
By Pedro Peres
The deep sea is Earth’s largest and least explored habitat, home to glowing fish, mysterious shrimps, strange cephalopods, and other amazing critters. As scientists work to understand this ecosystem, one thing becomes clear: to study the deep sea, we need to know who lives there.
That might sound simple, but it is not. Correctly identifying deep-sea species is one of the major challenges in marine science. Many species look similar, live thousands of meters below the surface, and are known from only a handful of specimens. Knowing which species live there is essential for everything from biodiversity monitoring and ecosystem conservation to understanding evolution in these environments.
In recent years, environmental DNA (eDNA) has emerged as an exciting method to detect deep-sea life. By collecting water samples and sequencing the DNA they contain, researchers can detect species without needing to see or catch them. But it is not as simple as it sounds: if we do not have a good reference database to match the DNA against, we can’t tell what species it came from. It is like finding a fingerprint with no database to check it against. That’s where genomic reference libraries come in.
A genomic reference library is a collection of DNA sequences that are tied to correctly identified organisms. Ideally, each DNA record is linked to a physical specimen (voucher) that’s been preserved in a museum and identified by a taxonomic expert.

Figure 1. Preserving deep-pelagic samples collected during DSB2 for future DNA sequencing.
Without that level of care, small mistakes can quickly become big problems. A single misidentified species in a reference library can lead to years of flawed data and incorrect interpretations.
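To make the fingerprint analogy concrete, here is a minimal Python sketch of matching an eDNA sequence against a small vouchered library. Everything in it is an illustrative assumption: the `ReferenceRecord` class, the `identity` and `best_match` functions, the species labels, the voucher IDs, and the short made-up sequences. Real studies rely on alignment tools and much longer sequences; this toy version only compares equal-length strings position by position.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ReferenceRecord:
    """One library entry: a sequence tied to an identified, vouchered specimen."""
    species: str   # placeholder label, not a real taxon
    voucher: str   # placeholder museum catalog number
    sequence: str  # made-up toy sequence


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_match(query: str, library: List[ReferenceRecord],
               min_identity: float) -> Optional[ReferenceRecord]:
    """Return the closest reference, or None if nothing is similar enough."""
    best = max(library, key=lambda r: identity(query, r.sequence), default=None)
    if best is None or identity(query, best.sequence) < min_identity:
        return None
    return best


# Toy library with invented entries (assumptions for illustration only).
library = [
    ReferenceRecord("Species A", "VOUCHER-001", "ACGTACGTACGTACGTACGT"),
    ReferenceRecord("Species B", "VOUCHER-002", "TTGACCGATAGGCTAACCGT"),
]

known_query = "ACGTACGTACGTACGTACGA"    # differs from Species A at one position
unknown_query = "GGGGGGGGGGCCCCCCCCCC"  # resembles nothing in the library

hit = best_match(known_query, library, min_identity=0.9)
miss = best_match(unknown_query, library, min_identity=0.9)
print(hit.species if hit else "no match")
print(miss.species if miss else "no match")
```

The second query stays unidentified because no reference exists for it, which is the fingerprint-without-a-database situation. The lookup also returns whatever label the reference carries, so one mislabeled record would pass its error on to every sequence that matches it.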
To address this, DEEPEND|RESTORE and DSB are now building carefully curated and vouchered genomic libraries for deep-sea species, focusing on fish, crustaceans, and mollusks. For each of these specimens, we are sequencing the full mitochondrial genome. But why the mitochondrial genome?
Mitochondrial DNA is present in many copies per cell, making it easier to recover from degraded or low-quality samples. A few regions of the mitochondrial genome are commonly used in eDNA studies or in direct sequencing for fast species identification (barcoding). For example, the 12S and 16S ribosomal RNA genes and the cytochrome c oxidase subunit I (COI) gene are common choices in many biodiversity studies. These studies usually sequence one region at a time, which means that we need references for each gene. By sequencing the full mitochondrial genome, we obtain all of these target regions from the same specimen at the same time, which makes the reference libraries more useful and more reliable. These genomic libraries are laying the foundation for a future where biodiversity monitoring is faster, more accurate, and more accessible, even in the most remote parts of the ocean.
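The idea that one complete mitochondrial genome yields several barcode references at once can be sketched in a few lines of Python. This is a simplified illustration, not our actual pipeline: the `extract_markers` function, the toy genome string, and the region coordinates are all invented for the example, and real coordinates would come from a genome's annotation.

```python
from typing import Dict, Tuple


def extract_markers(genome: str,
                    annotations: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    """Slice each annotated region out of a circular genome.

    Coordinates are 0-based with an exclusive end. Because the genome is
    circular, a region whose start is greater than its end is treated as
    wrapping around the origin.
    """
    markers = {}
    for name, (start, end) in annotations.items():
        if start <= end:
            markers[name] = genome[start:end]
        else:
            markers[name] = genome[start:] + genome[:end]
    return markers


# Toy 20-base "genome" and made-up coordinates (assumptions, not real data).
toy_genome = "AAAAACCCCCGGGGGTTTTT"
toy_annotations = {
    "12S": (0, 5),
    "16S": (5, 10),
    "COI": (18, 2),  # wraps around the origin of the circular sequence
}

markers = extract_markers(toy_genome, toy_annotations)
for name, seq in markers.items():
    print(name, seq)
```

Because all three regions come from the same voucher specimen, a study that targets 12S and another that targets COI can both be matched back to the same identified animal.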

Figure 2. Full circular mitochondrial genome of the lanternfish Diaphus dumerilii (voucher specimen: FIFC 115).
This is a work in progress, but we already have results, and our goal is to sequence hundreds of species. Later, all of these data will be deposited in public genetic databases (e.g., NCBI) and become available for anyone to use, or to match their own DNA “fingerprints” against.
Each new genome added to the reference library brings us closer to unlocking the secrets of deep-sea life.

