Discovery of novel prokaryotic taxa: A systematic approach

Exploring novel, commercially viable prokaryotes has gained emphasis in recent years, as anthropogenic activities drive the exploitation and extinction of biodiversity. Here I outline a systematic approach to identifying novel prokaryotes, with special reference to bacteria. Bacterial diversity is estimated to be huge, at about 3 × 10⁶ species, of which only approximately 18,000 have been identified to date.
Polyphasic approach
Identifying novel prokaryotes, for instance bacteria, relies on polyphasic taxonomy, which combines phenotypic (morphological, biochemical and chemotaxonomic) and genotypic (16S rDNA sequence similarity and phylogeny, DNA-DNA hybridization and DNA G+C content) characterisation. To overcome some limitations of the polyphasic approach, a taxono-genomic strategy was introduced that adds genome sequence similarity and genomic characterisation to the polyphasic methods above. In this strategy, the whole genome sequence of the bacterium under study, carrying its complete genetic information, is used for taxonomy.
Studies have shown that strains of the same microbial species share >95 % Average Amino Acid Identity (AAI) and Average Nucleotide Identity (ANI), >95 % identity based on multiple alignment of genes, <10 in the Karlin genomic signature, and >70 % in silico Genome-to-Genome Hybridization similarity (GGDH) [1].
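The criteria above can be expressed as a simple checklist. This is a minimal sketch: the `GenomeComparison` container, the function names and the rule that all five criteria must hold are hypothetical, the similarity values are assumed to come from external tools, and the Karlin signature difference is assumed to be in the same units as the quoted cut-off of 10.

```python
from dataclasses import dataclass


@dataclass
class GenomeComparison:
    """Hypothetical container for genome similarity values between two strains."""
    ani: float            # Average Nucleotide Identity (%)
    aai: float            # Average Amino Acid Identity (%)
    mlsa_identity: float  # identity from a multiple alignment of genes (%)
    karlin_delta: float   # Karlin genomic signature difference (same units as the <10 cut-off)
    ggdh: float           # in silico genome-to-genome hybridization similarity (%)


def same_species_criteria(c: GenomeComparison) -> dict[str, bool]:
    """Check each same-species criterion quoted in the text."""
    return {
        "ANI > 95 %": c.ani > 95.0,
        "AAI > 95 %": c.aai > 95.0,
        "gene alignment identity > 95 %": c.mlsa_identity > 95.0,
        "Karlin signature < 10": c.karlin_delta < 10.0,
        "GGDH > 70 %": c.ggdh > 70.0,
    }


def likely_same_species(c: GenomeComparison) -> bool:
    # Assumed decision rule: conspecific only if every criterion is met.
    return all(same_species_criteria(c).values())


if __name__ == "__main__":
    # Hypothetical values for an isolate versus its closest type strain.
    pair = GenomeComparison(ani=93.2, aai=94.0, mlsa_identity=93.5,
                            karlin_delta=14.0, ggdh=48.0)
    print(same_species_criteria(pair))
    print("Same species?", likely_same_species(pair))  # False: candidate novel species
```

Requiring every criterion is a deliberately conservative reading; in practice a borderline value in one metric would usually prompt closer inspection rather than an automatic verdict.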
Where to begin?
The 16S rRNA gene is highly conserved and can be used reliably as a taxonomic marker to start the identification of a novel bacterium. The entire 16S rRNA gene is approximately 1,500 nucleotides long, and even a partial sequence is sufficient for comparison with reference libraries to find the neighbouring species with the highest similarity. The sequence is compared with previously deposited sequences using databases such as GenBank, EzTaxon and BIBI.
I personally prefer the EzBioCloud 16S database, as it compares sequences against type strains of prokaryotic species with validly published names. Phylogenetic neighbours can also be found using the EzTaxon server, a web-based tool for the identification of prokaryotes based on 16S rRNA gene sequences that is freely accessible over the internet at
Based on the results obtained from these libraries, the taxonomic level of the bacterium under study can be assigned with the help of the CLSI interpretive guidelines [2].
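Database searches of this kind ultimately report a pairwise percent identity between the query and each reference sequence. The sketch below assumes the two 16S rRNA sequences have already been aligned (gaps written as `-`); it skips columns where both sequences have a gap and counts a gap opposite a base as a mismatch. This is only one common convention, and the function name is hypothetical; BLAST, EzBioCloud and other tools may define identity differently, so their values will not match this exactly.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned DNA sequences ('-' = gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue          # column empty in both sequences: ignore it
        compared += 1
        if x == y:            # identical base (double gaps were skipped above)
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment: 7 identical columns out of 9 compared.
    print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 2))  # 77.78
```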
Average Nucleotide Identity: The EzBioCloud database can also calculate the ANI between the genomes of the isolate and the closely related type strain. The ANI cut-off for species delineation lies between 95 % and 96 %; an ANI below this range suggests a new species.
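How an ANI value is obtained can be sketched as follows. This simplified example loosely follows the BLAST-based ANIb scheme, in which the query genome is cut into 1,020-nt fragments, each fragment is searched against the reference genome, and only hits above 30 % identity covering at least 70 % of the fragment are averaged. The fragment hits are assumed to be precomputed, identity is taken over the aligned region only (a simplification), and `FragmentHit` and the function names are hypothetical. EzBioCloud's own calculator may use a different algorithm (OrthoANI), so its values can differ slightly.

```python
from dataclasses import dataclass


@dataclass
class FragmentHit:
    """Best hit of one query-genome fragment against the reference genome."""
    identity: float       # percent identity over the aligned region
    aligned_length: int   # aligned nucleotides
    fragment_length: int  # fragment length, e.g. 1020 nt


def average_nucleotide_identity(hits: list[FragmentHit],
                                min_identity: float = 30.0,
                                min_coverage: float = 0.70) -> float:
    """Mean identity of fragment hits passing the identity and coverage filters."""
    kept = [h.identity for h in hits
            if h.identity > min_identity
            and h.aligned_length / h.fragment_length >= min_coverage]
    if not kept:
        raise ValueError("no fragment hit passed the filters")
    return sum(kept) / len(kept)


def below_species_cutoff(ani: float, cutoff: float = 95.0) -> bool:
    """True when ANI is below the lower end of the 95-96 % species range."""
    return ani < cutoff


if __name__ == "__main__":
    hits = [FragmentHit(93.0, 1020, 1020),
            FragmentHit(91.0, 900, 1020),   # 88 % coverage: kept
            FragmentHit(40.0, 300, 1020)]   # 29 % coverage: discarded
    ani = average_nucleotide_identity(hits)
    print(ani, below_species_cutoff(ani))  # 92.0 True
```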
Percentage of sequence identity: The 16S rRNA gene sequence identity cut-offs that discriminate between taxonomic levels have been correlated with DDH results, and the following percentages are recommended as thresholds for proceeding with the characterisation of a novel taxon.
Table. Sequence identity cut-off percentages for determining a novel taxon
16S rRNA gene sequence identity (%)    Novel taxon level
98.7 to 99.0                           Species
97.0 to 99.0                           Genus
95 to 97

If the top hit shows about 98.7 % similarity to a good-quality sequence, the isolate is probably a new species; at about 97 %, it is probably a new genus, and so on. Even when the results show around 99 % similarity, there is still a good chance that the isolate is a novel species. To confirm this, the closest genetic neighbour(s) of the bacterium under study may be purchased from culture collection centres and subjected to genetic characterisation.
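The thresholds in the table and the paragraph above can be turned into a rough triage function. This is a sketch only: the function name is hypothetical, the boundaries of 97.0 % and 98.7 % are taken from the text and treated as inclusive (an assumption), and any value above 98.7 % is flagged for confirmation by DDH or ANI rather than classified.

```python
def triage_16s_identity(identity: float) -> str:
    """Rough reading of 16S rRNA identity (%) to the closest type strain.

    Boundaries (97.0 and 98.7 %) follow the text; inclusive '<=' comparisons are an assumption.
    """
    if not 0.0 <= identity <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity <= 97.0:
        return "candidate novel genus (or higher taxon)"
    if identity <= 98.7:
        return "candidate novel species"
    # 16S data alone cannot settle novelty at this level: compare genomes or run DDH.
    return "inconclusive: confirm with DDH/ANI against the closest type strain(s)"


if __name__ == "__main__":
    for value in (96.2, 98.1, 99.3):
        print(value, "->", triage_16s_identity(value))
```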
However, 16S rDNA analysis is only a preliminary step in discovering a novel bacterial strain, because many sequences in the libraries have been shown to be of poor quality or to carry incorrect information. Clayton et al. (1995) reported that 26 % of a set of 16S rRNA gene sequences from the same species in GenBank had >1 % random sequencing errors [3].
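A rough worked example shows why errors of this size matter. Assuming a full-length gene of about 1,500 nt and, in the worst case, that every sequencing error creates an extra mismatch against the closest type strain:

```latex
1500 \times 0.01 = 15 \ \text{erroneous bases},
\qquad
99.0\,\% - \frac{15}{1500} \times 100\,\% = 98.0\,\%
```

A pair of sequences that truly share 99.0 % identity could therefore appear to share only about 98.0 %, below the 98.7 % threshold in the table.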
Genetic characterisation
DNA-DNA hybridization (DDH) has been the gold-standard technique used by taxonomists for the classification of prokaryotes since the 1960s. The recommended cut-off for defining a new species is a DDH similarity below 70 %, which corresponds to an ANI of approximately 94 %. The DNA G+C content (mol %) should be determined for both the strain under study and the related reference strain(s). The DDH and G+C content results are then compared; if the DDH value falls below 70 %, the strain can be considered a novel species.
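Both checks in this paragraph are simple to express in code. The sketch below assumes an in silico estimate is acceptable, computing mol % G+C directly from a genome or sequence string (laboratory values are usually measured, for example by HPLC), and ignores ambiguous bases such as `N`. The DDH check treats values strictly below 70 % as indicating a new species. Both function names are hypothetical.

```python
def gc_content_mol_percent(sequence: str) -> float:
    """In silico G+C content (mol %) of a DNA sequence; non-ACGT symbols are ignored."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / (gc + at)


def ddh_suggests_new_species(ddh_percent: float, cutoff: float = 70.0) -> bool:
    """True when DNA-DNA relatedness falls below the 70 % species boundary."""
    return ddh_percent < cutoff


if __name__ == "__main__":
    print(round(gc_content_mol_percent("GGCCATNN"), 2))  # 66.67
    print(ddh_suggests_new_species(45.0))                # True
```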
Other data required to report a novel bacterium
To report the identified novel strain as a type strain, the following data should also be obtained: general morphological characteristics such as colony and cell morphology, Gram staining, motility and spore morphology; biochemical characteristics; and chemotaxonomic characteristics such as cellular fatty acids, polar lipids, quinones and cell wall composition.
It is better to sequence the whole genome of the new bacterium; otherwise, this gap will have to be filled by another researcher. Indeed, IJSEM (International Journal of Systematic and Evolutionary Microbiology) has now made genome sequence information mandatory for the classification of any new type strain.
[1] Thompson CC, Chimetto L, Edwards RA, Swings J, Stackebrandt E, Thompson FL. Microbial genomic taxonomy. BMC Genomics. 2013 Dec 23; 14: 913.
[2] Petti CA, Bosshard PP, Brandt ME, Clarridge JE, Feldblyum TV, Foxall P, Furtado MR, Pace N, Procop G. Interpretive criteria for identification of bacteria and fungi by DNA target sequencing; approved guideline. Clinical and Laboratory Standards Institute. 2008; 28(12).
[3] Clayton RA, Sutton G, Hinkle PS Jr, Bult C, Fields C. Intraspecific variation in small-subunit rRNA sequences in GenBank: why single sequences may not adequately represent prokaryotic taxa. Int J Syst Bacteriol. 1995 Jul; 45(3): 595-599.