Badapanda C
High-throughput genome sequencing technologies are revolutionizing bacterial genomics, but they have also led to an accumulation of genes labeled 'unknown', 'hypothetical', or 'conserved hypothetical'. Approximately 40-50% of the genes within a genome often carry such labels because their function has not yet been established, which calls for functional annotation of these 'unknown genes'. Mycobacterium tuberculosis H37Rv has 3,924 protein-coding genes, of which 606 encode proteins classified as 'unknown proteins'. Here we predict functional annotations for these proteins by integrating several bioinformatics annotation tools: sequential BLAST homology searches, InterProScan searches, Gene Ontology (GO) mapping, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. This approach established a putative function, with at least some functional annotation, for 522 proteins. The pathways identified from the 'unknown proteins' are mapped to well-known pathways and could provide many putative targets for the rational design of more effective anti-mycobacterial agents. We have also computationally defined T cell epitopes of M. tuberculosis H37Rv proteins to help in the design of a vaccine with haplotype specificity for a target population. The M. tuberculosis H37Rv peptides predicted to bind different class I HLA (Human Leukocyte Antigen) molecules show no similarity to peptides of the human proteome. Some of the nonameric peptides are promiscuous, associating with multiple alleles, and are considered for vaccine design because they could provide wider coverage of the human population. Altogether, functional annotation by integrative bioinformatics approaches should considerably enhance the interpretation of the unknown proteins of this medically important organism.
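The epitope-selection idea described above (nonameric peptides, no similarity to the human proteome, binding to multiple HLA class I alleles) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the `min_alleles` threshold, and the use of exact 9-mer matching as a stand-in for a similarity search are all assumptions made for illustration, and the (peptide, allele) pairs are assumed to come from an external binding predictor that is not shown.

```python
from collections import defaultdict


def nonamers(seq, k=9):
    """Yield every overlapping k-mer (default: 9 residues) of a protein sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def promiscuous_peptides(predicted_binders, human_proteins, min_alleles=2):
    """Select peptides absent from the human 9-mer set that bind several alleles.

    predicted_binders: iterable of (peptide, allele) pairs, assumed to be the
        output of some external HLA class I binding predictor (hypothetical input).
    human_proteins: iterable of human protein sequences.
    min_alleles: illustrative threshold for calling a peptide promiscuous.
    """
    # Assumption: exact 9-mer identity approximates "similarity to human peptides".
    human_kmers = {p for seq in human_proteins for p in nonamers(seq)}

    alleles_by_peptide = defaultdict(set)
    for peptide, allele in predicted_binders:
        if peptide not in human_kmers:
            alleles_by_peptide[peptide].add(allele)

    # Keep only peptides predicted to bind at least min_alleles distinct alleles.
    return {
        peptide: sorted(alleles)
        for peptide, alleles in alleles_by_peptide.items()
        if len(alleles) >= min_alleles
    }
```

With toy input in which one peptide also occurs in a made-up human sequence and another binds only a single allele, only the peptide that is both non-human and bound by two or more alleles is returned. A real analysis would replace the exact-match filter with a proper homology search and use thresholds appropriate to the chosen predictor.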