The StrainSelect and Greengenes Databases

A variety of downloadable files are available at the links below. Since each has different terms of use, please read the documentation carefully before using them for your research.

  • The StrainSelect database organizes genome assemblies, contigs, and 16S rRNA genes according to the strain from which each was derived. This is important because RefSeq contains many hidden duplicates. StrainSelect allows lookup of strains available at various bioresource centers (BRCs, also known as culture collections). In addition, all strain identifiers are mapped to a unified taxonomic reference useful for both 16S rRNA and shotgun metagenomics. The files may be helpful for academic, non-commercial research, as described in the terms of use.
  • The Greengenes database comprises 16S rRNA genes organized in formats for use in pipelines and is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Note that these publicly available versions of the Greengenes database use taxonomic terms derived from phylogenetic methods applied between 2012 and 2013. Since then, a variety of novel phylogenetic methods have been proposed for Archaea and Bacteria. For recent examples of these methods, see Yokono et al. 2018 or Parks et al. 2022.
  • Selected experimental datasets created with the PhyloChip 16S rRNA microarray are available in MIAME format.
  • Special collections of sequences posted by the NIH's Dr. Conlan are also available.