Rice Genomics and Bioinformatics
Rice is one of the world’s most important crop plants: half of the world’s population depends on it as a food source. Rice is also a member of the grass family (Poaceae) and is considered a model species for the cereals, as it has a near-complete, finished genome sequence and a wealth of resources for functional genomics studies. One important component of a genome sequence is accurate, uniform annotation of genes, gene models, and gene function. Using a suite of automated, semi-automated, and manual computational methods, my group has annotated the rice genome. We have generated pseudomolecules representing the 12 rice chromosomes, modeled genes, incorporated experimental evidence into the gene models, generated deep, rich functional annotation of the genome, and identified related sequences in other plant species (Yuan et al., Plant Physiol. 2005; Ouyang et al. 2007). We have also deployed a genome browser for rice that displays more than 60 tracks of annotation.

We have identified nearly 42,000 non-transposable-element-related genes in the rice genome and have begun analyzing the genome and its predicted proteome to gain insights into the biology of this model species. The large number of genes in rice is attributable to substantial segmental duplication involving nearly half of the genome (Lin et al. 2006). This duplication has generated large gene families and provided new genes that contribute to diversity; indeed, nearly half of the predicted rice proteome falls into paralogous families. Our analysis of alternative splicing in rice showed not only that alternative splicing is widespread, but also that a surprising number of alternative splice forms result in a significant change in coding sequence, suggesting a potential pathway for nonsense-mediated decay of mRNAs in rice (Campbell et al. 2006). Rice also harbors immense genetic diversity, which provides a resource for improving germplasm.
In collaboration with multiple scientists, we are using the Perlegen hybridization-based resequencing technology to identify single nucleotide polymorphisms (SNPs) and generate a “hapmap” for rice (http://irfgc.irri.org/; McNalley et al. 2006). With access to sequence data from 184 plant species, we have examined the conservation of predicted rice genes throughout the Plant Kingdom (Zhu and Buell 2007). These analyses have allowed us to identify core sets of genes conserved not just across the Poaceae but also across the angiosperms and the Plant Kingdom as a whole.
