MIDORI2: A collection of quality controlled, preformatted, and regularly updated reference databases for taxonomic assignment of eukaryotic mitochondrial sequences

Abstract

Analysis of environmental DNA is increasingly used to characterize ecological communities, but the effectiveness of this approach depends on the accuracy of taxonomic reference databases. The MIDORI databases, first released in 2017, were built to improve accuracy for mitochondrial metazoan (animal) sequences. MIDORI has now been significantly improved and renamed MIDORI2 (available at http://www.reference-midori.info). Like MIDORI, MIDORI2 is built from GenBank and contains curated sequences of thirteen protein-coding and two ribosomal RNA mitochondrial genes. Coverage has been substantially expanded to span all eukaryotes, including fungi, green algae and land plants, other multicellular algal groups, and diverse protist lineages. MIDORI2 now includes not only species with full binomials but also taxa identified only to genus, with the species left unspecified (“sp.”). The databases are now updated approximately every two months, with version numbers corresponding to each new GenBank release. Additional sequences with potentially erroneous annotations have also been removed. Finally, data files can now be exported for BLAST+, in addition to the preformatted exports for five taxonomic assignment programs that were already available, and databases of amino acid sequences are provided for protein-coding genes. As a technical validation, we conducted a preliminary comparison of the performance of MIDORI2 with five taxonomic assignment programs. Results suggest that BLAST+ top hits performed better for assigning CO1 sequences than alignment-free methods based on compositional features. Comparing MIDORI2 with two other commonly used curated databases of mitochondrial sequences, CO-ARBitrator and BOLD, we show that MIDORI2 includes sequences from a broader range of metazoan and non-metazoan taxa.
Overall, in many contexts, MIDORI2 offers clear advantages: a higher diversity of taxa than other databases, a variety of user-friendly features, and regular updates. MIDORI2 is particularly well-suited for environmental DNA studies that target mitochondrial genes with broad primers.


Matthieu Leray
Staff Scientist

Drivers, functions, and evolution of marine biodiversity.