Respiratory Microbial Gene Catalogue

Data

1. Sequence data for gene catalogs:

Non-redundant gene catalog (nucleotide sequences, fasta)
Non-redundant gene catalog (amino acid sequences, fasta)

2. Gene annotation table:

GeneAnnotationAndSummaryTable.xls.gz

3. Public data used, including:

Genes of 1,384 genomes of 66 respiratory tract-related bacteria in the Integrated Microbial Genomes (IMG)
Gene set of respiratory tracts from the Human Microbiome Project (HMP)
Genes of 73 respiratory tract-related bacteria in the Pathosystems Resource Integration Center (PATRIC)
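
The catalog sequences under item 1 are distributed as FASTA files. A minimal sketch of reading such a file in Python is given here, using only the standard library. The record names and sequences in the demo are hypothetical and are not taken from the catalog; the function name read_fasta is likewise illustrative.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical records for illustration only.
demo = io.StringIO(">gene_1\nATGGCC\nTTAA\n>gene_2\nATGTGA\n")
lengths = {name: len(seq) for name, seq in read_fasta(demo)}
print(lengths)
```

For a gzip-compressed FASTA file, the same function can be given a handle opened with gzip.open(path, "rt") from the standard library.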

Table format
Gene ID	Unique ID
Gene Length	Length of nucleotide sequence
Taxonomic Annotation(Phylum Level)	Annotated phylum for a gene
Taxonomic Annotation(Genus Level)	Annotated genus for a gene
Taxonomic Annotation(Species Level)	Annotated species for a gene
eggNOG Annotation	Annotated eggNOG(s) for a gene
eggNOG Functional Categories	eggNOG functional category(ies) of the annotated eggNOG(s)
KEGG Annotation	Annotated KO(s) for a gene
KEGG Functional Categories	KEGG functional category(ies) of the annotated KO(s)
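
As an illustration of how these columns might be used, the sketch below reads an annotation table and counts genes per phylum. It assumes that GeneAnnotationAndSummaryTable.xls.gz is gzip-compressed, tab-delimited text with a header row carrying the column names listed above; this is an assumption about the file and should be checked against the actual download. The two demo rows are hypothetical.

```python
import csv
import gzip
import io
from collections import Counter


def read_annotation_table(handle):
    """Yield one dict per gene from a tab-delimited table with a header row."""
    for row in csv.DictReader(handle, delimiter="\t"):
        row["Gene Length"] = int(row["Gene Length"])  # length in nucleotides
        yield row


# Hypothetical two-row table, compressed in memory to mimic a .gz file.
# Column names follow the table format above; the values are made up.
header = ["Gene ID", "Gene Length",
          "Taxonomic Annotation(Phylum Level)", "KEGG Annotation"]
rows = [["gene_1", "900", "Firmicutes", "K00001"],
        ["gene_2", "450", "Proteobacteria", "K00002"]]
text = "\n".join("\t".join(r) for r in [header] + rows) + "\n"
buf = io.BytesIO(gzip.compress(text.encode("utf-8")))

with gzip.open(buf, "rt", encoding="utf-8", newline="") as handle:
    genes = list(read_annotation_table(handle))

phylum_counts = Counter(g["Taxonomic Annotation(Phylum Level)"] for g in genes)
print(phylum_counts)
```

To read the real file, the in-memory buffer would be replaced by the path to the downloaded table. Reading row by row keeps memory use low, which matters if the table is large.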

Tools

Gene catalog construction

SOAPdenovo(v2.04)

SOAPdenovo is a short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble Illumina GA short reads.

Website: http://soap.genomics.org.cn/soapdenovo.html

MetaGeneMark(v3.26)

MetaGeneMark is a program designed to predict genes in metagenomes.

Website: http://exon.gatech.edu/GeneMark/index.html

CD-HIT

CD-HIT is a widely used program for clustering and comparing protein or nucleotide sequences. CD-HIT significantly reduces the computational and manual effort in many sequence analysis tasks and aids in understanding the data structure and correcting the bias within a dataset.

Website: http://weizhong-lab.ucsd.edu/cd-hit/

SOAPaligner/soap2(v2.21)

SOAPaligner/soap2 is a program for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology.

Website: http://soap.genomics.org.cn/soapaligner.html

Gene annotation

BLAST

BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches.

Website: https://blast.ncbi.nlm.nih.gov/Blast.cgi

KEGG(Kyoto Encyclopedia of Genes and Genomes)

KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.

Website: http://www.genome.jp/kegg/

eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups)

eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups) is a database of orthologous groups of genes. The orthologous groups are annotated with functional description lines (derived by identifying a common denominator for the genes based on their various annotations) and with functional categories (i.e., derived from the original COG/KOG categories).

Website: http://eggnogdb.embl.de/#/app/home
