Plant genomes encode a complex and evolutionarily diverse regulatory grammar that forms the basis for most life on earth. A wealth of regulome and epigenome data has been generated in various plant species, but no common, standardized resource has so far been available to biologists. Here, we present ChIP-Hub, an integrative web-based platform that follows ENCODE standards and bundles >10,000 publicly available datasets reanalyzed from >40 plant species, allowing visualization and meta-analysis. We manually curate the datasets by assessing ~540 original publications and comprehensively evaluate their data quality. As a proof of concept, we extensively survey the co-association of different regulators and construct a hierarchical regulatory network under a broad developmental context. Furthermore, we show how our annotation allows investigation of the dynamic activity of tissue-specific regulatory elements (promoters and enhancers) and their underlying sequence grammar. Finally, we analyze the function and conservation of tissue-specific promoters, enhancers and chromatin states using comparative genomics approaches. Taken together, the ChIP-Hub platform and the analysis results provide rich resources for deep exploration of plant ENCODE. ChIP-Hub is available at https://biobigdata.nju.edu.cn/ChIPHub/.
doi: 10.1038/s41467-022-30770-1 Google Scholar
Enhancers are critical for developmental stage-specific gene expression, but their dynamic regulation in plants remains poorly understood. Here we compare genome-wide localization of H3K27ac, chromatin accessibility and transcriptomic changes during flower development in Arabidopsis. H3K27ac prevalently marks promoter-proximal regions, suggesting that H3K27ac is not a hallmark for enhancers in Arabidopsis. We provide computational and experimental evidence to confirm that distal DNase I hypersensitive sites are predictive of enhancers. The predicted enhancers are highly stage-specific across flower development, significantly associated with SNPs for flowering-related phenotypes, and conserved across crucifer species. Through the integration of genome-wide transcription factor (TF) binding datasets, we find that floral master regulators and stage-specific TFs are largely enriched at developmentally dynamic enhancers. Finally, we show that enhancer clusters and intronic enhancers significantly associate with stage-specific gene regulation by floral master TFs. Our study provides insights into the functional flexibility of enhancers during plant development, as well as hints to annotate plant enhancers.
doi: 10.1038/s41467-019-09513-2 PubMed: 30979870 Google Scholar
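The distinction between promoter-proximal and distal hypersensitive sites in the abstract above can be illustrated with a minimal sketch. It assumes sites and transcription start sites (TSSs) are given as integer coordinates on a single chromosome. The function names and the 1,000 bp cutoff are hypothetical choices for illustration and are not taken from the study.

```python
from bisect import bisect_left

def nearest_tss_distance(position, sorted_tss):
    """Distance from a site midpoint to the closest TSS (sorted_tss must be non-empty)."""
    i = bisect_left(sorted_tss, position)
    candidates = []
    if i < len(sorted_tss):
        candidates.append(sorted_tss[i] - position)      # nearest TSS at or to the right
    if i > 0:
        candidates.append(position - sorted_tss[i - 1])  # nearest TSS to the left
    return min(candidates)

def classify_sites(site_midpoints, tss_positions, distal_cutoff=1000):
    """Label each site 'distal' if it lies farther than distal_cutoff from any TSS."""
    sorted_tss = sorted(tss_positions)
    labels = {}
    for mid in site_midpoints:
        distance = nearest_tss_distance(mid, sorted_tss)
        labels[mid] = "distal" if distance > distal_cutoff else "proximal"
    return labels

# Toy coordinates, invented for illustration only.
toy_tss = [5000, 20000]
toy_sites = [5200, 9000, 19500, 30000]
labels = classify_sites(toy_sites, toy_tss)
print(labels)
```

A real analysis would also have to handle multiple chromosomes, strand, and peak widths; the sketch only shows the distance-based labeling step.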
Floral homeotic transcription factors (TFs) act in a combinatorial manner to specify the organ identities in the flower. However, the architecture and the function of the gene regulatory network (GRN) controlling floral organ specification are still poorly understood. In particular, the interconnections of homeotic TFs, microRNAs (miRNAs) and other factors controlling organ initiation and growth have not been studied systematically so far. Here, using a combination of genome-wide TF binding, mRNA and miRNA expression data, we reconstruct the dynamic GRN controlling floral meristem development and organ differentiation. We identify prevalent feed-forward loops (FFLs) mediated by floral homeotic TFs and miRNAs that regulate common targets. Experimental validation of a coherent FFL shows that petal size is controlled by the SEPALLATA3-regulated miR319/TCP4 module. We further show that combinatorial DNA-binding of homeotic factors and selected other TFs is predictive of organ-specific patterns of gene expression. Our results provide a valuable resource for studying molecular regulatory processes underlying floral organ specification in plants.
doi: 10.1038/s41467-018-06772-3 PubMed: 30382087 Google Scholar
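The feed-forward loop (FFL) motif named in the abstract above can be sketched in a few lines. This is a minimal illustration, assuming the network is available as a list of directed regulator-to-target edges. The `find_ffls` helper is hypothetical, and the toy edge list only borrows gene names from the abstract; its edges are assumptions for demonstration, not results from the paper.

```python
from collections import defaultdict

def find_ffls(edges):
    """Return all triples (A, B, C) with edges A->B, B->C and A->C."""
    targets = defaultdict(set)
    for src, dst in edges:
        targets[src].add(dst)
    ffls = []
    for a in sorted(targets):
        for b in sorted(targets[a]):
            if b == a:
                continue
            # .get avoids creating empty entries for nodes without targets
            for c in sorted(targets.get(b, ())):
                if c != a and c != b and c in targets[a]:
                    ffls.append((a, b, c))
    return ffls

# Toy network; the edges are illustrative assumptions.
toy_edges = [
    ("SEP3", "miR319"),
    ("miR319", "TCP4"),
    ("SEP3", "TCP4"),
    ("SEP3", "AP1"),
]
ffls = find_ffls(toy_edges)
print(ffls)
```

The sketch ignores edge signs, so it cannot tell coherent from incoherent loops; that would require annotating each edge as activating or repressing.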
Significantly improved crop varieties are urgently needed to feed the rapidly growing human population under changing climates. While genome sequence information and excellent genomic tools are in place for major crop species, the systematic quantification of phenotypic traits or components thereof in a high-throughput fashion remains an enormous challenge. In order to help bridge the genotype to phenotype gap, we developed a comprehensive framework for high-throughput phenotype data analysis in plants, which enables the extraction of an extensive list of phenotypic traits from nondestructive plant imaging over time. As a proof of concept, we investigated the phenotypic components of the drought responses of 18 different barley (Hordeum vulgare) cultivars during vegetative growth. We analyzed dynamic properties of trait expression over growth time based on 54 representative phenotypic features. The data are highly valuable to understand plant development and to further quantify growth and crop performance features. We tested various growth models to predict plant biomass accumulation and identified several relevant parameters that support biological interpretation of plant growth and stress tolerance. These image-based traits and model-derived parameters are promising for subsequent genetic mapping to uncover the genetic basis of complex agronomic traits. Taken together, we anticipate that the analytical framework and analysis results presented here will be useful to advance our views of phenotypic trait components underlying plant development and their responses to environmental cues.
doi: 10.1105/tpc.114.129601 PubMed: 25501589 Google Scholar
Our knowledge of the role of higher-order chromatin structures in transcription of microRNA genes (MIRs) is evolving rapidly. Here we investigate the effect of 3D architecture of chromatin on the transcriptional regulation of MIRs. We demonstrate that MIRs have transcriptional features that are similar to protein-coding genes. RNA polymerase II-associated ChIA-PET data reveal that many groups of MIRs and protein-coding genes are organized into functionally compartmentalized chromatin communities and undergo coordinated expression when their genomic loci are spatially colocated. We observe that MIRs display widespread communication in those transcriptionally active communities. Moreover, miRNA-target interactions are significantly enriched among communities with functional homogeneity while depleted from the same community from which they originated, suggesting that MIRs coordinate function-related pathways at the posttranscriptional level. Further investigation demonstrates the existence of spatial MIR-MIR chromatin interacting networks. We show that groups of spatially coordinated MIRs are frequently from the same family and involved in the same disease category. The spatial interaction network possesses both common and cell-specific subnetwork modules that result from the spatial organization of chromatin within different cell types. Together, our study unveils an entirely unexplored layer of MIR regulation throughout the human genome that links the spatial coordination of MIRs to their co-expression and function.
doi: 10.1093/nar/gkt1294 PubMed: 24357409 Google Scholar
Non-human primates are attractive laboratory animal models that accurately reflect both developmental and pathological features of humans. Here we present a compendium of cell types across multiple organs in cynomolgus monkeys (Macaca fascicularis) using both single-cell chromatin accessibility and RNA sequencing data. The integrated cell map enables in-depth dissection and comparison of molecular dynamics, cell-type compositions and cellular heterogeneity across multiple tissues and organs. Using single-cell transcriptomic data, we infer pseudotime cell trajectories and cell-cell communications to uncover key molecular signatures underlying their cellular processes. Furthermore, we identify various cell-specific cis-regulatory elements and construct organ-specific gene regulatory networks at the single-cell level. Finally, we perform comparative analyses of single-cell landscapes among mouse, monkey and human. We show that the cynomolgus monkey is strikingly more similar to human than mouse is in terms of immune-associated gene expression patterns and cellular communications. Taken together, our study provides a valuable resource for non-human primate cell biology.
Plant genomes contain a large fraction of non-coding sequences. Discovery and annotation of conserved non-coding sequences (CNSs) in plants is an ongoing challenge. Here we report the application of comparative genomics to systematically identify CNSs in 50 well-annotated Gramineae genomes using rice (Oryza sativa) as the reference. We conduct multiple-way whole genome alignments to the rice genome. The rice genome is annotated as 20 conservation states (CSs) at single nucleotide resolution using a multivariate hidden Markov model (ConsHMM) based on the multiple-genome alignments. Different states show distinct enrichments for various genomic features and the conservation scores of CSs are highly correlated with the level of associated chromatin accessibility. We find that at least 33.5% of the rice genome is under strong selection, with more than 70% of the sequence lying outside of coding regions. A catalog of 855,366 regulatory CNSs is generated, and these significantly overlap with putative active regulatory elements such as promoters, enhancers, and transcription factor binding sites. Collectively, our study provides a resource for studying functional non-coding regions of the rice genome and an evolutionary aspect of regulatory sequences in higher plants.
doi: 10.1016/j.jgg.2022.04.003 PubMed: 35470092 Google Scholar
Hibiscus hamabo is a semi-mangrove species with strong tolerance to salt and waterlogging stress. However, the molecular basis and mechanisms that underlie this strong adaptability to harsh environments remain poorly understood. Here, we assembled a high-quality, chromosome-level genome of this semi-mangrove plant and analyzed its transcriptome under different stress treatments to reveal regulatory responses and mechanisms. Our analyses suggested that H. hamabo has undergone two recent successive polyploidy events, a whole-genome duplication followed by a whole-genome triplication, resulting in an unusually large gene number (107 309 genes). Comparison of the H. hamabo genome with that of its close relative Hibiscus cannabinus, which has not experienced a recent WGT, indicated that genes associated with high stress resistance have been preferentially preserved in the H. hamabo genome, suggesting an underlying association between polyploidy and stronger stress resistance. Transcriptomic data indicated that genes in the roots and leaves responded differently to stress. In roots, genes that regulate ion channels involved in biosynthetic and metabolic processes responded quickly to adjust the ion concentration and provide metabolic products to protect root cells, whereas no such rapid response was observed from genes in leaves. Using co-expression networks, potential stress resistance genes were identified for use in future functional investigations. The genome sequence, along with several transcriptome datasets, provide insights into genome evolution and the mechanism of salt and waterlogging tolerance in H. hamabo, suggesting the importance of polyploidization for environmental adaptation.
doi: 10.1093/hr/uhac067 PubMed: 35480957 Google Scholar
OBJECTIVE: Protein tyrosine kinases regulate osteoarthritis (OA) progression by activating a series of signal transduction pathways. However, the roles of protein tyrosine phosphatases (PTPs) in OA remain obscure. This study was undertaken to identify specific PTPs involved in OA and investigate their underlying mechanisms. METHODS: The expression of 107 PTP genes in human OA cartilage was analyzed based on a single-cell sequencing data set. The enzyme activity of the PTP SH2 domain-containing phosphatase 2 (SHP-2) was detected in primary chondrocytes after interleukin-1β (IL-1β) treatment and in human OA cartilage. Mice subjected to destabilization of the medial meniscus (DMM) and IL-1β-stimulated mouse primary chondrocytes were treated with an SHP-2 inhibitor or celecoxib (a drug used for the clinical treatment of OA). The function of SHP-2 in OA pathogenesis was further verified in Aggrecan-CreERT;SHP2flox/flox mice. The downstream protein expression profile and dephosphorylated substrate of SHP-2 were examined by tandem mass tag labeling-based global proteomic analysis and stable isotope labeling with amino acids in cell culture-labeled tyrosine phosphoproteomic analysis, respectively. RESULTS: SHP-2 enzyme activity significantly increased in human OA samples with serious articular cartilage injury and in IL-1β-stimulated mouse chondrocytes. Pharmacologic inhibition or genetic deletion of SHP-2 ameliorated OA progression. SHP-2 inhibitors dramatically reduced the expression of cartilage degradation-related genes and simultaneously promoted the expression of cartilage synthesis-related genes. Mechanistically, SHP-2 inhibition suppressed the dephosphorylation of docking protein 1 and subsequently reduced the expression of uridine phosphorylase 1 and increased the uridine level, thereby contributing to the homeostasis of cartilage metabolism. CONCLUSION: SHP-2 is a novel accelerator of the imbalance in cartilage homeostasis. Specific inhibition of SHP-2 may ameliorate OA by maintaining the anabolic-catabolic balance.
doi: 10.1002/art.41988 PubMed: 34569725 Google Scholar
Psoriasis is a complex chronic inflammatory skin disease with unclear molecular mechanisms. We found that the Src homology-2 domain-containing protein tyrosine phosphatase-2 (SHP2) was highly expressed in both psoriatic patients and imiquimod (IMQ)-induced psoriasis-like mice. Also, the SHP2 allosteric inhibitor SHP099 reduced pro-inflammatory cytokine expression in PBMCs taken from psoriatic patients. Consistently, SHP099 significantly ameliorated IMQ-triggered skin inflammation in mice. Single-cell RNA sequencing of murine skin demonstrated that SHP2 inhibition impaired skin inflammation in myeloid cells, especially macrophages. Furthermore, IMQ-induced psoriasis-like skin inflammation was significantly alleviated in mice with conditional SHP2 knockout in myeloid cells (monocytes, mature macrophages, and granulocytes), but not in dendritic cells. Mechanistically, SHP2 promoted the trafficking of toll-like receptor 7 (TLR7) from the Golgi to the endosome in macrophages by dephosphorylating TLR7 at Tyr1024, boosting the ubiquitination of TLR7 and NF-κB-mediated skin inflammation. Importantly, Tlr7 point-mutant knock-in mice showed an attenuated psoriasis-like phenotype compared to wild-type littermates following IMQ treatment. Collectively, our findings identify SHP2 as a novel regulator of psoriasis and suggest that SHP2 inhibition may be a promising therapeutic approach for psoriatic patients.
doi: 10.15252/emmm.202114455 PubMed: 34936223 Google Scholar
With advances in high-throughput sequencing technologies, quantitative genetics approaches have provided insights into genetic basis of many complex diseases. Emerging in-depth multi-omics profiling technologies have created exciting opportunities for systematically investigating intricate interaction networks with different layers of biological molecules underlying disease etiology. Herein, we summarized two main categories of biological networks: evidence-based and statistically inferred. These different types of molecular networks complement each other at both bulk and single-cell levels. We also review three main strategies to incorporate quantitative genetics results with multi-omics data by network analysis: (a) network propagation, (b) functional module-based methods, (c) comparative/dynamic networks. These strategies not only aid in elucidating molecular mechanisms of complex diseases but can guide the search for therapeutic targets.
doi: 10.1016/j.cbpa.2021.102101 PubMed: 34861483 Google Scholar
doi: 10.1111/pbi.13675 PubMed: 34310056 Google Scholar
Colorectal cancer (CRC), a malignant tumor found worldwide, comprises microsatellite instability (MSI) and microsatellite stable (MSS) phenotypes. Although SHP2 is a promising target for cancer therapy, its relationship with innate immunosuppression remains elusive. To address this, single-cell RNA sequencing was performed to explore the role of SHP2 in all cell types of the tumor microenvironment (TME) from murine MC38 xenografts. Intratumoral cells were found to be functionally heterogeneous and responded significantly to SHP099, a SHP2 allosteric inhibitor. The malignant evolution of tumor cells was remarkably arrested by SHP099. Mechanistically, STING-TBK1-IRF3-mediated type I interferon signaling was highly activated by SHP099 in infiltrated myeloid cells. Notably, CRC patients with MSS phenotype exhibited greater macrophage infiltration and more potent SHP2 phosphorylation in CD68+ macrophages than MSI-high phenotypes, suggesting the potential role of macrophagic SHP2 in the TME. Collectively, our data reveal a mechanism of innate immunosuppression mediated by SHP2, suggesting that SHP2 is a promising target for colon cancer immunotherapy.
doi: 10.1016/j.apsb.2021.08.006 PubMed: 35127377 Google Scholar
Nucleotide-binding leucine-rich-repeat (NLR) genes comprise the largest family of plant disease-resistance genes. Angiosperm NLR genes are phylogenetically divided into the TNL, CNL, and RNL subclasses. NLR copy numbers and subclass composition vary tremendously across angiosperm genomes. However, the evolutionary associations between genomic NLR content and ecological adaptation, or between NLR content and signal transduction components, are poorly characterized because of limited genome availability. In this study, we established an angiosperm NLR atlas (ANNA, https://biobigdata.nju.edu.cn/ANNA/) that includes NLR genes from over 300 angiosperm genomes. Using ANNA, we revealed that NLR copy numbers differ up to 66-fold among closely related species owing to rapid gene loss and gain. Interestingly, NLR contraction was associated with adaptations to aquatic, parasitic, and carnivorous lifestyles. The convergent NLR reduction in aquatic plants resembles the lack of NLR expansion during the long-term evolution of green algae before the colonization of land. A co-evolutionary pattern between NLR subclasses and plant immune pathway components was also identified, suggesting that immune pathway deficiencies may drive TNL loss. Finally, we identified a conserved TNL lineage that may function independently of the EDS1-SAG101-NRG1 module. Collectively, these findings provide new insights into the evolution of NLR genes in the context of ecological adaptation and genome content variation.
doi: 10.1016/j.molp.2021.08.001 PubMed: 34364002 Google Scholar
doi: 10.1007/s11427-020-1717-9 PubMed: 32394245 Google Scholar
Lamin proteins in animals are implicated in important nuclear functions, including chromatin organization, signal transduction, gene regulation and cell differentiation. Nuclear Matrix Constituent Proteins (NMCPs) are lamin analogues in plants, but their regulatory functions remain largely unknown. We report that OsNMCP1 is localized at the nuclear periphery in rice (Oryza sativa) and induced by drought stress. OsNMCP1 overexpression resulted in a deeper and thicker root system, and enhanced drought resistance compared to the wild-type control. An assay for transposase accessible chromatin with sequencing (ATAC-seq) analysis revealed that OsNMCP1-overexpression altered chromatin accessibility in hundreds of genes related to drought resistance and root growth, including OsNAC10, OsERF48, OsSGL, SNAC1 and OsbZIP23. OsNMCP1 can interact with SWITCH/SUCROSE NONFERMENTING (SWI/SNF) chromatin remodelling complex subunit OsSWI3C. The reported drought resistance or root growth-related genes that were positively regulated by OsNMCP1 were negatively regulated by OsSWI3C under drought stress conditions, and OsSWI3C overexpression led to decreased drought resistance. We propose that the interaction between OsNMCP1 and OsSWI3C under drought stress conditions may lead to the release of OsSWI3C from the SWI/SNF gene silencing complex, thus changing chromatin accessibility in the genes related to root growth and drought resistance.
doi: 10.1111/nph.16518 PubMed: 32129897 Google Scholar
With ongoing climate change, drought events are becoming more frequent and will affect biomass formation when occurring during pre-flowering stages. We explored growth over time under such a drought scenario, via non-invasive imaging and revealed the underlying key genetic factors in spring barley. By comparing with well-watered conditions investigated in an earlier study and including information on timing, QTL could be classified as constitutive, drought or recovery-adaptive. Drought-adaptive QTL were found in the vicinity of genes involved in dehydration tolerance such as dehydrins (Dhn4, Dhn7, Dhn8, and Dhn9) and aquaporins (e.g. HvPIP1;5, HvPIP2;7, and HvTIP2;1). The influence of phenology on biomass formation increased under drought. Accordingly, the main QTL during recovery was the region of HvPPD-H1. The most important constitutive QTL for late biomass was located in the vicinity of HvDIM, while the main locus for seedling biomass was the HvWAXY region. The disappearance of QTL marked the genetic architecture of tiller number. The most important constitutive QTL was located on 6HS in the region of 1-FEH. Stage and tolerance specific QTL might provide opportunities for genetic manipulation to stabilize biomass and tiller number under drought conditions and thereby also grain yield.
doi: 10.3389/fpls.2019.01307 PubMed: 31708943 Google Scholar
BACKGROUND: Adaptation to drought-prone environments requires robust root architecture. Genotypes with a more vigorous root system have the potential to better adapt to soils with limited moisture content. However, root architecture is complex at both the phenotypic and genetic levels. Customized mapping panels in combination with efficient screening methods can resolve the underlying genetic factors of root traits. RESULTS: A mapping panel of 233 spring barley genotypes was evaluated for root and shoot architecture traits under non-stress and osmotic stress. A genome-wide association study elucidated 65 involved genomic regions. Among them were 34 root-specific loci, eleven hotspots with associations to up to eight traits and twelve stress-specific loci. A list of candidate genes was established based on an educated guess. Selected genes were tested for associated polymorphisms. By this, 14 genes were identified as promising candidates, ten remained suggestive and 15 were rejected. The data support the important role of flowering time genes, including HvPpd-H1, HvCry2, HvCO4 and HvPRR73. Moreover, seven root-related genes, HERK2, HvARF04, HvEXPB1, PIN5, PIN7, PME5 and WOX5, are confirmed as promising candidates. For the QTL with the highest allelic effect for root thickness and plant biomass, a homologue of the Arabidopsis Trx-m3 was revealed as the most promising candidate. CONCLUSIONS: This study provides a catalogue of hotspots for seedling growth, root and stress-specific genomic regions along with candidate genes for future potential incorporation in breeding attempts for enhanced yield potential, particularly in drought-prone environments. Root architecture is under polygenic control. The co-localization of well-known major genes for barley development and flowering time with QTL hotspots highlights their importance for seedling growth. Association analysis revealed the involvement of HvPpd-H1 in the development of the root system. 
The co-localization of root QTL with HERK2, HvARF04, HvEXPB1, PIN5, PIN7, PME5 and WOX5 represents a starting point to explore the roles of these genes in barley. Accordingly, the genes HvHOX2, HsfA2b, HvHAK2, and Dhn9, known to be involved in abiotic stress response, were located within stress-specific QTL regions and await future validation.
doi: 10.1186/s12870-019-1828-5 PubMed: 31122195 Google Scholar
The wave of high-throughput technologies in genomics and phenomics is enabling data to be generated on an unprecedented scale and at a reasonable cost. Exploring the large-scale data sets generated by these technologies to derive biological insights requires efficient bioinformatic tools. Here we introduce an interactive, open-source web application (HTPmod) for high-throughput biological data modeling and visualization. HTPmod is implemented with the Shiny framework by integrating the computational power and professional visualization of R and including various machine-learning approaches. We demonstrate that HTPmod can be used for modeling and visualizing large-scale, high-dimensional data sets (such as multiple omics data) under a broad context. By reinvestigating example data sets from recent studies, we find not only that HTPmod can reproduce results from the original studies in a straightforward fashion and within a reasonable time, but also that novel insights may be gained from fast reinvestigation of existing data by HTPmod.
doi: 10.1038/s42003-018-0091-x PubMed: 30271970 Google Scholar
In wheat (Triticum spp.), modifying inflorescence (spike) morphology can increase grain number and size and thus improve yield. Here, we demonstrated the potential for manipulating and predicting spike morphology, based on 44 traits. In 12 wheat cultivars, we observed that detillering (removal of branches), which alters photosynthate distribution, changed spike morphology. Our genome-wide association study detected close associations between carbon partitioning (e.g. tiller number, main shoot dry weight) and spike morphology (e.g. spike length, spikelet density) traits in 210 cultivars. Most carbon-partitioning traits (e.g. tiller dry weight, harvest index) demonstrated high prediction abilities (>0.5). For spike morphology, some traits (e.g. total and fertile spikelet number, spike length) displayed high prediction abilities (0.3-0.5), but others (e.g. spikelet fertility, spikelet density) exhibited low prediction abilities (<0.2). Grain size traits were closely correlated in field and greenhouse experiments. Stepwise regression analysis suggests that significantly associated traits in the greenhouse explain 35.35% of the variation in grain yield and 67.63% of the variation in thousand-kernel weight in the field. Therefore, the traits identified in this study affect spike morphology; these traits can be used to predict and improve plant architecture and thus increase yield.
doi: 10.1038/s41598-018-31977-3 PubMed: 30258057 Google Scholar
Targeted changes in chromatin state at thousands of genes are central to eukaryotic development. RELATIVE OF EARLY FLOWERING 6 (REF6) is a Jumonji-type histone demethylase that counteracts Polycomb repressive complex 2 (PRC2)-mediated gene silencing in plants and was reported to select its binding sites in a direct, sequence-specific manner (refs. 1-3). Here we show that REF6 and its two close paralogues determine spatial 'boundaries' of the repressive histone H3K27me3 mark in the genome and control the tissue-specific release from PRC2-mediated gene repression. Targeted mutagenesis revealed that these histone demethylases display pleiotropic, redundant functions in plant development, several of which depend on trans factor-mediated recruitment. Thus, Jumonji-type histone demethylases restrict repressive chromatin domains and contribute to tissue-specific gene activation via complementary targeting mechanisms.
doi: 10.1038/s41477-018-0219-5 PubMed: 30104650 Google Scholar
Deoxyribonuclease I (DNase I)-hypersensitive site sequencing (DNase-seq) has been widely used to determine chromatin accessibility and its underlying regulatory lexicon. However, exploring DNase-seq data requires sophisticated downstream bioinformatics analyses. In this study, we first review computational methods for all of the major steps in DNase-seq data analysis, including experimental design, quality control, read alignment, peak calling, annotation of cis-regulatory elements, genomic footprinting and visualization. The challenges associated with each step are highlighted. Next, we provide a practical guideline and a computational pipeline for DNase-seq data analysis by integrating some of these tools. We also discuss the competing techniques and the potential applications of this pipeline for the analysis of analogous experimental data. Finally, we discuss the integration of DNase-seq with other functional genomics techniques.
doi: 10.1093/bib/bby057 PubMed: 30010713 Google Scholar
Flowering time is an important factor affecting grain yield in wheat. In this study, we divided reproductive spike development into eight sub-phases. These sub-phases have the potential to be delicately manipulated to increase grain yield. We measured 36 traits with regard to sub-phase durations, determined three grain yield-related traits in eight field environments and mapped 15 696 single nucleotide polymorphism (SNP, based on 90k Infinium chip and 35k Affymetrix chip) markers in 210 wheat genotypes. Phenotypic and genetic associations between grain yield traits and sub-phase durations showed significant consistency (Mantel test; r = 0.5377, P < 0.001). The shared quantitative trait loci (QTLs) revealed by the genome-wide association study suggested a close association between grain yield and sub-phase duration, which may be attributed to effects on spikelet initiation/spikelet number (double ridge to terminal spikelet stage, DR-TS) and assimilate accumulation (green anther to anthesis stage, GA-AN). Moreover, we observed that the photoperiod-sensitivity allele at the Ppd-D1 locus on chromosome 2D markedly extended all sub-phase durations, which may contribute to its positive effects on grain yield traits. The dwarfing allele at the Rht-D1 (chromosome 4D) locus altered the sub-phase duration and displayed positive effects on grain yield traits. Data for 30 selected genotypes (from among the original 210 genotypes) in the field displayed a close association with that from the greenhouse. Most importantly, this study demonstrated specific connections to grain yield in narrower time windows (i.e. the eight sub-phases), rather than the entire stem elongation phase as a whole.
doi: 10.1111/tpj.13998 PubMed: 29906301 Google Scholar
Floret development is critical for grain setting in wheat (Triticum aestivum), but more than 50% of grain yield potential (based on the maximum number of floret primordia) is lost during the stem elongation phase (SEP, from the terminal spikelet stage to anthesis). Dynamic plant (e.g., leaf area, plant height) and floret (e.g., anther and ovary size) growth and its connection with grain yield traits (e.g., grain number and width) are not clearly understood. In this study, for the first time, we dissected the SEP into seven stages to investigate plant (first experiment) and floret (second experiment) growth in greenhouse- and field-grown wheat. In the first experiment, the values of various plant growth trait indices at different stages were generally consistent between field and greenhouse and were independent of the environment. However, at specific stages, some traits significantly differed between the two environments. In the second experiment, phenotypic and genotypic similarity analysis revealed that grain number and size corresponded closely to ovary size at anthesis, suggesting that ovary size is strongly associated with grain number and size. Moreover, principal component analysis (PCA) showed that the top six principal components (PCs) explained 99.13, 98.61, 98.41, 98.35, and 97.93% of the total phenotypic variation at the green anther, yellow anther, tipping, heading, and anthesis stages, respectively. The variance explained by the first PC decreased with floret growth, with the highest value detected at the green anther stage (88.8%) and the lowest at anthesis (50.09%). Finally, ovary size at anthesis was greater in wheat accessions with early release years than in accessions with late release years, and anther/ovary size shared closer connections with grain number/size traits at the late vs. early stages of floral development. 
Our findings shed light on the dynamic changes in plant and floret growth-related traits in wheat and the effects of the environment on these traits.
doi: 10.3389/fpls.2018.00330 PubMed: 29599792 Google Scholar
Background: Image-based high-throughput phenotyping technologies have been rapidly developed in plant science recently, and they provide a great potential to gain more valuable information than traditionally destructive methods. Predicting plant biomass is regarded as a key purpose for plant breeders and ecologists. However, it is a great challenge to find a predictive biomass model across experiments. Results: In the present study, we constructed 4 predictive models to examine the quantitative relationship between image-based features and plant biomass accumulation. Our methodology has been applied to 3 consecutive barley (Hordeum vulgare) experiments with control and stress treatments. The results proved that plant biomass can be accurately predicted from image-based parameters using a random forest model. The high prediction accuracy based on this model will contribute to relieving the phenotyping bottleneck in biomass measurement in breeding applications. The prediction performance is still relatively high across experiments under similar conditions. The relative contribution of individual features for predicting biomass was further quantified, revealing new insights into the phenotypic determinants of the plant biomass outcome. Furthermore, methods could also be used to determine the most important image-based features related to plant biomass accumulation, which would be promising for subsequent genetic mapping to uncover the genetic basis of biomass. Conclusions: We have developed quantitative models to accurately predict plant biomass accumulation from image data. We anticipate that the analysis results will be useful to advance our views of the phenotypic determinants of plant biomass outcome, and the statistical methods can be broadly used for other plant species.
doi: 10.1093/gigascience/giy001 PubMed: 29346559 Google Scholar
Floral organ identities in plants are specified by the combinatorial action of homeotic master regulatory transcription factors. However, how these factors achieve their regulatory specificities is still largely unclear. Genome-wide in vivo DNA binding data show that homeotic MADS domain proteins recognize partly distinct genomic regions, suggesting that DNA binding specificity contributes to functional differences of homeotic protein complexes. We used in vitro systematic evolution of ligands by exponential enrichment followed by high-throughput DNA sequencing (SELEX-seq) on several floral MADS domain protein homo- and heterodimers to measure their DNA binding specificities. We show that specification of reproductive organs is associated with distinct binding preferences of a complex formed by SEPALLATA3 and AGAMOUS. Binding specificity is further modulated by different binding site spacing preferences. Combination of SELEX-seq and genome-wide DNA binding data allows differentiation between targets in specification of reproductive versus perianth organs in the flower. We validate the importance of DNA binding specificity for organ-specific gene regulation by modulating promoter activity through targeted mutagenesis. Our study shows that intrafamily protein interactions affect DNA binding specificity of floral MADS domain proteins. Differential DNA binding of MADS domain protein complexes plays a role in the specificity of target gene regulation.
doi: 10.1105/tpc.17.00145 PubMed: 28733422 Google Scholar
Increasing grain yield is still the main target of wheat breeding; yet today's wheat plants utilize less than half of their yield potential. Owing to the difficulty of determining grain yield potential in a large population, few genetic factors regulating floret fertility (i.e. the difference between grain yield potential and grain number) have been reported to date. In this study, we conducted a genome-wide association study (GWAS) by quantifying 54 traits (16 floret fertility traits and 38 traits for assimilate partitioning and spike morphology) in 210 European winter wheat accessions. The results of this GWAS experiment suggested potential associations between floret fertility, assimilate partitioning and spike morphology revealed by shared quantitative trait loci (QTLs). Several candidate genes involved in carbohydrate metabolism, phytohormones or floral development colocalized with such QTLs, thereby providing potential targets for selection. Based on our GWAS results we propose a genetic network underlying floret fertility and related traits, nominating determinants for improved yield performance.
doi: 10.1111/nph.14342 PubMed: 27918076 Google Scholar
BACKGROUND: Plant phenotypic data shrouds a wealth of information which, when accurately analysed and linked to other data types, brings to light the knowledge about the mechanisms of life. As phenotyping is a field of research comprising manifold, diverse and time-consuming experiments, the findings can be fostered by reusing and combining existing datasets. Their correct interpretation, and thus replicability, comparability and interoperability, is possible provided that the collected observations are equipped with an adequate set of metadata. So far there have been no common standards governing phenotypic data description, which hampered data exchange and reuse. RESULTS: In this paper we propose the guidelines for proper handling of the information about plant phenotyping experiments, in terms of both the recommended content of the description and its formatting. We provide a document called "Minimum Information About a Plant Phenotyping Experiment", which specifies what information about each experiment should be given, and a Phenotyping Configuration for the ISA-Tab format, which allows to practically organise this information within a dataset. We provide examples of ISA-Tab-formatted phenotypic data, and a general description of a few systems where the recommendations have been implemented. CONCLUSIONS: Acceptance of the rules described in this paper by the plant phenotyping community will help to achieve findable, accessible, interoperable and reusable data.
doi: 10.1186/s13007-016-0144-4 PubMed: 27843484 Google Scholar
BACKGROUND: The efficiency of multiplex editing in plants by the RNA-guided Cas9 system is limited by efficient introduction of its components into the genome and by their activity. The possibility of introducing large fragment deletions with the RNA-guided Cas9 tool provides the potential to study the function of any DNA region of interest in its 'endogenous' environment. RESULTS: Here, an RNA-guided Cas9 system was optimized to enable efficient multiplex editing in Arabidopsis thaliana. We demonstrate the flexibility of our system for knocking out multiple genes and for generating heritable large-fragment deletions in the genome. As a proof of concept, the function of part of the second intron of the flower development gene AGAMOUS in Arabidopsis was studied by generating a Cas9-free mutant plant line in which part of this intron was removed from the genome. Further analysis revealed that deletion of this intron fragment results in a 40% decrease of AGAMOUS gene expression without changing the splicing of the gene, which indicates that this regulatory region functions as an activator of AGAMOUS gene expression. CONCLUSIONS: Our modified RNA-guided Cas9 system offers a versatile tool for the functional dissection of coding and non-coding DNA sequences in plants.
doi: 10.1186/s13007-016-0125-7 PubMed: 27118985 Google Scholar
Flower development is a model system to understand organ specification in plants. The identities of different types of floral organs are specified by homeotic MADS transcription factors that interact in a combinatorial fashion. Systematic identification of DNA-binding sites and target genes of these key regulators shows that they have shared and unique sets of target genes. DNA binding by MADS proteins is not based on 'simple' recognition of a specific DNA sequence, but depends on DNA structure and combinatorial interactions. Homeotic MADS proteins regulate gene expression via alternative mechanisms, one of which may be to modulate chromatin structure and accessibility in their target gene promoters.
doi: 10.1016/j.pbi.2015.12.004 PubMed: 26802807 Google Scholar
Due to an increase in the consumption of food, feed, fuel and to meet global food security needs for the rapidly growing human population, there is a necessity to breed high yielding crops that can adapt to the future climate changes, particularly in developing countries. To solve these global challenges, novel approaches are required to identify quantitative phenotypes and to explain the genetic basis of agriculturally important traits. These advances will facilitate the screening of germplasm with high performance characteristics in resource-limited environments. Recently, plant phenomics has offered and integrated a suite of new technologies, and we are on a path to improve the description of complex plant phenotypes. High-throughput phenotyping platforms have also been developed that capture phenotype data from plants in a non-destructive manner. In this review, we discuss recent developments of high-throughput plant phenotyping infrastructure including imaging techniques and corresponding principles for phenotype data analysis.
doi: 10.3389/fpls.2015.00619 PubMed: 26322060 Google Scholar
Recent methodological developments in plant phenotyping, as well as the growing importance of its applications in plant science and breeding, are resulting in a fast accumulation of multidimensional data. There is great potential for expediting both discovery and application if these data are made publicly available for analysis. However, collection and storage of phenotypic observations is not yet sufficiently governed by standards that would ensure interoperability among data providers and precisely link specific phenotypes and associated genomic sequence information. This lack of standards is mainly a result of a large variability of phenotyping protocols, the multitude of phenotypic traits that are measured, and the dependence of these traits on the environment. This paper discusses the current situation of standardization in the area of phenomics, points out the problems and shortages, and presents the areas that would benefit from improvement in this field. In addition, the foundations of the work that could revise the situation are proposed, and practical solutions developed by the authors are introduced.
doi: 10.1093/jxb/erv271 PubMed: 26044092 Google Scholar
Natural antisense transcripts (NATs) are endogenous transcripts that can form double-stranded RNA structures. Many protein-coding genes (PCs) and non-protein-coding genes (NPCs) tend to form cis-NATs and trans-NATs, respectively. In this work, we identified 4,080 cis-NATs and 2,491 trans-NATs genome-wide in Arabidopsis. Of these, 5,385 NAT-siRNAs were detected from the small RNA sequencing data. NAT-siRNAs are typically 21 nt, and are processed by Dicer-like 1 (DCL1)/DCL2 and RDR6 and function in epigenetically activated situations, or 24 nt, suggesting these are processed by DCL3 and RDR2 and function in environmental stress. NAT-siRNAs are significantly derived from PC/PC pairs of trans-NATs and NPC/NPC pairs of cis-NATs. Furthermore, NAT pair genes typically have a similar pattern of epigenetic status. Cis-NATs tend to be marked by euchromatic modifications, whereas trans-NATs tend to be marked by heterochromatic modifications.
doi: 10.1093/dnares/dsv008 PubMed: 25922535 Google Scholar
Anther and ovary development play an important role in grain setting, a crucial factor determining wheat (Triticum aestivum L.) yield. One aim of this study was to determine the heritability of anther and ovary size at different positions within a spikelet at seven floral developmental stages and conduct a variance components analysis. Relationships between anther and ovary size and other traits were also assessed. The thirty central European winter wheat genotypes used in this study were based on reduced height (Rht) and photoperiod sensitivity (Ppd) genes with variable genetic backgrounds. Identical experimental designs were conducted in a greenhouse and field simultaneously. Heritability of anther and ovary size indicated strong genetic control. Variance components analysis revealed that anther and ovary sizes of floret 3 (i.e. F3, the third floret from the spikelet base) and floret 4 (F4) were more sensitive to the environment compared with those in floret 1 (F1). Good correlations were found between spike dry weight and anther and ovary size in both greenhouse and field, suggesting that anther and ovary size are good predictors of each other, as well as spike dry weight in both conditions. Relationships between spike dry weight and anther and ovary size at F3/4 positions were stronger than at F1, suggesting that F3/4 anther and ovary size are better predictors of spike dry weight. Generally, ovary size showed a closer relationship with spike dry weight than anther size, suggesting that ovary size is a more reliable predictor of spike dry weight.
doi: 10.1093/jxb/erv117 PubMed: 25821074 Google Scholar
Phenotyping large numbers of genotypes still represents the rate-limiting step in many plant genetic experiments and in breeding. To address this issue, novel automated phenotyping technologies have been developed. We investigated for a core set of barley cultivars if high-throughput image analysis can help to dissect vegetative biomass accumulation in response to two different watering regimes under semi-controlled greenhouse conditions. We found that experiments, treatments, genotypes and genotype by environment interaction (G × E) can be characterized at any time point by certain digital traits. Biomass accumulation under control and stress conditions was highly heritable. Growth model-derived maximum vegetative biomass (K max), inflection point (I) and regrowth rate (k) were identified as promising candidate traits for genome-wide association studies. Drought stress symptoms can be visualized, dissected and modelled. Especially the highly heritable regrowth rate, which had the biggest influence on biomass accumulation in stress treatment, seems promising for future studies to improve drought tolerance in different crop species. A proof of concept study revealed potential correlations between digital traits obtained from pot experiments under greenhouse conditions and agronomic traits from field experiments. Overall, non-invasive, imaging-based phenotyping platforms under greenhouse conditions offer excellent possibilities for trait discovery, trait development and industrial applications.
doi: 10.1111/pce.12516 PubMed: 25689277 Google Scholar
BACKGROUND: In plants, microRNAs (miRNAs) regulate gene expression mainly at the post-transcriptional level. Previous studies have demonstrated that miRNA-mediated gene silencing pathways play vital roles in plant development. Here, we used a high-throughput sequencing approach to characterize the miRNAs and their targeted transcripts in the leaf, flower and fruit of sweet orange. RESULTS: A total of 183 known miRNAs and 38 novel miRNAs were identified. An in-house script was used to identify all potential secondary siRNAs derived from miRNA-targeted transcripts using sRNA and degradome sequencing data. Genome mapping revealed that these miRNAs were evenly distributed across the genome with several small clusters, and 69 pre-miRNAs were co-localized with simple sequence repeats (SSRs). Noticeably, the loop size of pre-miR396c was influenced by the repeat number of the CUU unit. The expression patterns of miRNAs among different tissues and developmental stages were further investigated by both qRT-PCR and RNA gel blotting. Interestingly, Csi-miR164 was highly expressed at the fruit ripening stage, and was validated to target a NAC transcription factor. This study depicts a global picture of miRNAs and their target genes in the genome of sweet orange, and focuses on the comparison among leaf, flower and fruit tissues. CONCLUSIONS: This study provides a global view of miRNAs and their target genes in different tissues of sweet orange, and focuses on the identification of miRNAs involved in the regulation of fruit ripening. The results of this study lay a foundation for unraveling key regulators of orange fruit development and ripening at the post-transcriptional level.
doi: 10.1186/1471-2164-15-695 PubMed: 25142253 Google Scholar
BACKGROUND: Sweet orange (Citrus sinensis) is one of the most important fruits world-wide. Because it is a woody plant with a long growth cycle, genetic studies of sweet orange are lagging behind those of other species. RESULTS: In this analysis, we employed ortholog identification and domain combination methods to predict the protein-protein interaction (PPI) network for sweet orange. The K-nearest neighbors (KNN) classification method was used to verify and filter the network. The final predicted PPI network, CitrusNet, contained 8,195 proteins with 124,491 interactions. The quality of CitrusNet was evaluated using gene ontology (GO) and Mapman annotations, which confirmed the reliability of the network. In addition, we calculated the expression difference of interacting genes (EDI) in CitrusNet using RNA-seq data from four sweet orange tissues, and also analyzed the EDI distribution and variation in different sub-networks. CONCLUSIONS: Gene expression in CitrusNet has significant modular features. Target of rapamycin (TOR) protein served as the central node of the hormone-signaling sub-network. All evidence supported the idea that TOR can integrate various hormone signals and affect plant growth. CitrusNet provides valuable resources for the study of biological functions in sweet orange.
doi: 10.1186/s12870-014-0213-7 PubMed: 25091279 Google Scholar
High-throughput phenotyping is emerging as an important technology to dissect phenotypic components in plants. Efficient image processing and feature extraction are prerequisites to quantify plant growth and performance based on phenotypic traits. Issues include data management, image analysis, and result visualization of large-scale phenotypic data sets. Here, we present Integrated Analysis Platform (IAP), an open-source framework for high-throughput plant phenotyping. IAP provides user-friendly interfaces, and its core functions are highly adaptable. Our system supports image data transfer from different acquisition environments and large-scale image analysis for different plant species based on real-time imaging data obtained from different spectra. Due to the huge amount of data to manage, we utilized a common data structure for efficient storage and organization of both input data and result data. We implemented a block-based method for automated image processing to extract a representative list of plant phenotypic traits. We also provide tools for built-in data plotting and result export. For validation of IAP, we performed an example experiment that contains 33 maize (Zea mays 'Fernandez') plants, which were grown for 9 weeks in an automated greenhouse with nondestructive imaging. Subsequently, the image data were subjected to automated analysis with the maize pipeline implemented in our system. We found that the computed digital volume and number of leaves correlate with our manually measured data with high accuracy, up to 0.98 and 0.95, respectively. In summary, IAP provides a broad set of functionalities for import/export, management, and automated analysis of high-throughput plant phenotyping data, and its analysis results are highly reliable.
doi: 10.1104/pp.113.233932 PubMed: 24760818 Google Scholar
Citrus is one of the most important and widely grown fruit crops, with global production ranking first among all the fruit crops in the world. Sweet orange accounts for more than half of Citrus production in both fresh fruit and processed juice. We have sequenced the draft genome of a double-haploid sweet orange (C. sinensis cv. Valencia), and constructed the Citrus sinensis annotation project (CAP) to store and visualize the sequenced genomic and transcriptome data. CAP provides GBrowse-based organization of sweet orange genomic data, which integrates ab initio gene prediction, EST, RNA-seq and RNA-paired end tag (RNA-PET) evidence-based gene annotation. Furthermore, we provide a user-friendly web interface to show the predicted protein-protein interactions (PPIs) and metabolic pathways in sweet orange. CAP provides comprehensive information beneficial to researchers of sweet orange and other woody plants, and is freely available at http://citrus.hzau.edu.cn/.
doi: 10.1371/journal.pone.0087723 PubMed: 24489955 Google Scholar
Pandemic H1N1/2009 viruses have been stabilized in swine herds, and some strains display higher pathogenicity than the human-origin isolates. In this study, high-throughput RNA sequencing (RNA-seq) is applied to explore the systemic transcriptome responses of mouse lungs infected by swine (Jia6/10) and human (LN/09) H1N1/2009 viruses. The transcriptome data show that Jia6/10 activates stronger virus-sensing signals, such as toll-like receptor, RIG-I like receptor and NOD-like receptor signaling, as well as stronger NF-κB and JAK-STAT signals, which play significant roles in inducing innate immunity. Most cytokines and interferon-stimulated genes show higher expression levels in Jia6/10-infected groups. Meanwhile, virus Jia6/10 activates stronger production of reactive oxygen species, which might further promote a higher mutation rate of the virus genome. Collectively, our data reveal that the swine-origin pandemic H1N1/2009 virus elicits a stronger innate immune reaction and pro-oxidation stimulation, which might relate closely to the increasing pathogenicity.
doi: 10.1038/srep01601 PubMed: 23549303 Google Scholar
Transcriptome analysis of early-developing maize (Zea mays) seed was conducted using Illumina sequencing. We mapped 11,074,508 and 11,495,788 paired-end reads from endosperm and embryo, respectively, at 9 d after pollination to define gene structure and alternative splicing events as well as transcriptional regulators of gene expression to quantify transcript abundance in both embryo and endosperm. We identified a large number of novel transcribed regions that did not fall within maize annotated regions, and many of the novel transcribed regions were tissue-specifically expressed. We found that 50.7% (8,556 of 16,878) of multiexonic genes were alternatively spliced, and some transcript isoforms were specifically expressed either in endosperm or in embryo. In addition, a total of 46 trans-splicing events, with nine intrachromosomal events and 37 interchromosomal events, were found in our data set. Many metabolic activities were specifically assigned to endosperm and embryo, such as starch biosynthesis in endosperm and lipid biosynthesis in embryo. Finally, a number of transcription factors and imprinting genes were found to be specifically expressed in embryo or endosperm. This data set will aid in understanding how embryo/endosperm development in maize is differentially regulated.
doi: 10.1104/pp.113.214874 PubMed: 23478895 Google Scholar
Oranges are an important nutritional source for human health and have immense economic value. Here we present a comprehensive analysis of the draft genome of sweet orange (Citrus sinensis). The assembled sequence covers 87.3% of the estimated orange genome, which is relatively compact, as 20% is composed of repetitive elements. We predicted 29,445 protein-coding genes, half of which are in the heterozygous state. With additional sequencing of two more citrus species and comparative analyses of seven citrus genomes, we present evidence to suggest that sweet orange originated from a backcross hybrid between pummelo and mandarin. Focused analysis on genes involved in vitamin C metabolism showed that GalUR, encoding the rate-limiting enzyme of the galacturonate pathway, is significantly upregulated in orange fruit, and the recent expansion of this gene family may provide a genomic basis. This draft genome represents a valuable resource for understanding and improving many important citrus traits in the future.
doi: 10.1038/ng.2472 PubMed: 23179022 Google Scholar
Pseudorabies virus (PRV) belongs to the Alphaherpesvirinae subfamily and causes huge economic losses in the pig industry worldwide. It has been recently demonstrated that many herpesviruses encode microRNAs (miRNAs), which play crucial roles in the viral life cycle. However, knowledge about PRV-encoded miRNAs is still limited. Here, we report a comprehensive analysis of both viral and host miRNA expression profiles in a PRV-infected porcine epithelial cell line (PK-15). Deep sequencing data showed that the ∼4.6 kb intron of the large latency transcript (LLT) functions as a primary microRNA precursor (pri-miRNA) that encodes a cluster of 11 distinct miRNAs in the PRV genome, and 209 known and 39 novel porcine miRNAs were detected. Viral miRNAs were further confirmed by stem-loop RT-PCR and northern blot analysis. Intriguingly, all of these viral miRNAs exhibited terminal heterogeneity at both the 5' and 3' ends. Seven miRNA genes produced mature miRNAs from both arms, and two of the viral miRNA genes partially overlapped in their precursor regions. Unexpectedly, a terminal loop-derived small RNA with high abundance and one special miRNA offset RNA (moRNA) were processed from the same viral miRNA precursor. The polymorphisms of viral miRNAs shed light on the complexity of the host miRNA-processing machinery and the viral miRNA-regulatory mechanism. The swine genes and PRV genes were collected for target prediction of the viral miRNAs, revealing a complex network formed by both host and viral genes. GO enrichment analysis of host target genes suggests that PRV miRNAs are involved in complex cellular pathways including cell death, immune system process and metabolic pathway, indicating that these miRNAs play significant roles in the virus-cell interaction of PRV and its hosts. Collectively, these data suggest that the PRV-infected epithelial cell line generates a diverse set of host miRNAs and a special cluster of viral miRNAs, which might facilitate PRV replication in cells.
doi: 10.1371/journal.pone.0030988 PubMed: 22292087 Google Scholar
Natural antisense transcripts (NATs), as one type of regulatory RNAs, occur prevalently in plant genomes and play significant roles in physiological and pathological processes. Although their important biological functions have been reported widely, a comprehensive database is lacking up to now. Consequently, we constructed a plant NAT database (PlantNATsDB) involving approximately 2 million NAT pairs in 69 plant species. GO annotation and high-throughput small RNA sequencing data currently available were integrated to investigate the biological function of NATs. PlantNATsDB provides various user-friendly web interfaces to facilitate the presentation of NATs and an integrated, graphical network browser to display the complex networks formed by different NATs. Moreover, a 'Gene Set Analysis' module based on GO annotation was designed to identify the statistically significantly overrepresented GO categories from the specific NAT network. PlantNATsDB is currently the most comprehensive resource of NATs in the plant kingdom, which can serve as a reference database to investigate the regulatory function of NATs. The PlantNATsDB is freely available at http://bis.zju.edu.cn/pnatdb/.
doi: 10.1093/nar/gkr823 PubMed: 22058132 Google Scholar
SUMMARY: MyBioNet is a web-based application for biological network analysis, which provides user-friendly web interfaces to visualize, edit and merge biological networks. In addition, MyBioNet integrated KEGG metabolic network data from 1366 organisms and allows users to search and navigate interesting networks. AVAILABILITY AND IMPLEMENTATION: All KEGG metabolic network data are organized and stored in the MySQL database. MyBioNet is implemented in Flex/Actionscript and PHP languages and deployed on an Apache web server. MyBioNet is accessible through all the Flash-embedded browsers at http://bis.zju.edu.cn/mybionet/. CONTACT: firstname.lastname@example.org.
doi: 10.1093/bioinformatics/btr557 PubMed: 21984760 Google Scholar
Small RNAs (sRNAs), largely known as microRNAs (miRNAs) and short interfering RNAs (siRNAs), emerged as the critical components of genetic and epigenetic regulation in eukaryotic genomes. In animals, a sizable portion of miRNAs reside within the introns of protein-coding genes, designated as mirtron genes. Recently, high-throughput sequencing (HTS) revealed a huge amount of sRNAs that derived from introns in plants, such as the monocot rice (Oryza sativa). However, the biogenesis and the biological functions of this kind of sRNAs remain elusive. Here, we performed a genome-scale survey of intron-derived sRNAs in rice based on HTS data. Several introns were found to have great potential to form internal hairpin structures, and the short hairpins could generate miRNAs while the larger ones could produce siRNAs. Furthermore, 22 introns, termed "sirtrons," were identified from the rice protein-coding genes. The single-stranded sirtrons produced a diverse set of siRNAs from long hairpin structures. These sirtron-derived siRNAs are dominantly 21 nt, 22 nt, and 24 nt in length, whose production relied on DCL4, DCL2, and DCL3, respectively. We also observed a strong tendency for the sirtron-derived siRNAs to be coexpressed with their host genes. Finally, the 24-nt siRNAs incorporated with Argonaute 4 (AGO4) could direct DNA methylation on their host genes. In this regard, homeostatic self-regulation between intron-derived siRNAs and their host genes was proposed.
doi: 10.1261/rna.2589011 PubMed: 21518803 Google Scholar
Rice (Oryza sativa) feeds over half of the global population. A web-based integrated platform for rice microarray annotation and data analysis in various biological contexts is presented, which provides a convenient query for comprehensive annotation compared with similar databases. Coupled with existing rice microarray data, it provides online analysis methods from the perspective of bioinformatics. This comprehensive bioinformatics analysis platform is composed of five modules, including data retrieval, microarray annotation, sequence analysis, results visualization and data analysis. The BioChip module facilitates the retrieval of microarray data information via identifiers of "Probe Set ID", "Locus ID" and "Analysis Name". The BioAnno module is used to annotate the gene or probe set based on the gene function, the domain information, the KEGG biochemical and regulatory pathways and the potential microRNA which regulates the genes. The BioSeq module lists all of the related sequence information by a microarray probe set. The BioView module provides various visual results for the microarray data. The BioAnaly module is used to analyze the rice microarray's data set.
doi: 10.1007/s11427-010-4101-6 PubMed: 21181349 Google Scholar
BACKGROUND: RNA editing is a transcript-based layer of gene regulation. To date, no systemic study on RNA editing of plant nuclear genes has been reported. Here, a transcriptome-wide search for editing sites in nuclear transcripts of Arabidopsis (Arabidopsis thaliana) was performed. RESULTS: MPSS (massively parallel signature sequencing) and PARE (parallel analysis of RNA ends) data retrieved from public databases were utilized, focusing on one-base-conversion editing. Besides cytidine (C)-to-uridine (U) editing in mitochondrial transcripts, many nuclear transcripts were found to be diversely edited. Interestingly, a sizable portion of these nuclear genes are involved in chloroplast- or mitochondrion-related functions, and many editing events are tissue-specific. Some editing sites, such as adenosine (A)-to-U editing loci, were found to be surrounded by peculiar elements. The editing events of some nuclear transcripts are highly enriched surrounding the borders between coding sequences (CDSs) and 3' untranslated regions (UTRs), suggesting site-specific editing. Furthermore, RNA editing is potentially implicated in new start or stop codon generation, and may affect alternative splicing of certain protein-coding transcripts. RNA editing in the precursor microRNAs (pre-miRNAs) of ath-miR854 family, resulting in secondary structure transformation, implies its potential role in microRNA (miRNA) maturation. CONCLUSIONS: To our knowledge, the results provide the first global view of RNA editing in plant nuclear transcripts.
doi: 10.1186/1471-2164-11-S4-S12 PubMed: 21143795 Google Scholar
The currently developed next-generation sequencing (NGS) technology has significantly enhanced our capacity for small RNA (sRNA) exploration. Several ambitious sRNA projects have been established based on the technical support of NGS. Thanks to the high-throughput feature of NGS, huge amounts of sRNA sequencing data have been generated. However, much more research effort is needed to further exploit these valuable data. In this study, we carried out functional analyses of sRNAs from 26 angiosperms by utilizing public sRNA NGS data. We proposed that the endogenous sRNAs largely represented by the 24-nt ones had a potential role in transposable element (TE) control in both Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa), based on the comparison of sRNA locus densities within the TEs to the non-TE genes. Functional analysis was performed for the predicted targets of the conserved sRNAs that were classified into eudicot-specific, monocot-specific, and angiosperm-conserved ones. Moreover, several miRNA families were found to be highly conserved, and miR396 was suggested to be the most conserved miRNA family between the eudicots and the monocots, indicating its essential role in angiosperm development. Finally, we demonstrated that it was a great challenge for researchers to fully exploit these huge sRNA NGS datasets and numerous sRNA species remained to be uncovered and functionally characterized.
doi: 10.1016/j.compbiolchem.2010.10.001 PubMed: 21030312 Google Scholar
MicroRNAs (miRNAs), one type of small RNAs (sRNAs) in plants, play an essential role in gene regulation. Several miRNA databases were established; however, successively generated new datasets need to be collected, organized and analyzed. To this end, we have constructed a plant miRNA knowledge base (PmiRKB) that provides four major functional modules. In the 'SNP' module, single nucleotide polymorphism (SNP) data of seven Arabidopsis (Arabidopsis thaliana) accessions and 21 rice (Oryza sativa) subspecies were collected to inspect the SNPs within pre-miRNAs (precursor microRNAs) and miRNA-target RNA duplexes. Depending on their locations, SNPs can affect the secondary structures of pre-miRNAs, or interactions between miRNAs and their targets. A second module, 'Pri-miR', can be used to investigate the tissue-specific, transcriptional contexts of pre- and pri-miRNAs (primary microRNAs), based on massively parallel signature sequencing data. The third module, 'MiR-Tar', was designed to validate thousands of miRNA-target pairs by using parallel analysis of RNA end (PARE) data. Correspondingly, the fourth module, 'Self-reg', also used PARE data to investigate the metabolism of miRNA precursors, including precursor processing and miRNA- or miRNA*-mediated self-regulation effects on their host precursors. PmiRKB can be freely accessed at http://bis.zju.edu.cn/pmirkb/.
doi: 10.1093/nar/gkq721 PubMed: 20719744 Google Scholar
The approximately 21 nucleotide microRNAs (miRNAs) are one type of well-defined small RNA species, and they play critical roles in various biological processes in organisms. In plants, most miRNAs exert repressive regulation on their targets through cleavage, and a number of miRNA-target pairs have been validated either by modified 5' RACE (rapid amplification of cDNA ends), or by newly developed high-throughput strategies. All these data have greatly advanced our understanding of the regulatory roles of plant miRNAs. On the other hand, deep insights into miRNA precursor processing, and miRNA- or miRNA*-mediated self-regulation of their host precursors could be gained from high-throughput degradome sequencing data, based on the general framework of miRNA generation in plants. Here, the focus is on the recent research progress on this issue, and several interesting points were raised.
doi: 10.1093/jxb/erq209 PubMed: 20643809 Google Scholar
Since the beginning of this century, microRNAs (miRNAs), which are tiny RNA molecules, have become one of the major research topics on gene expression regulation in both animals and plants. The major task of miRNA study is to elucidate how the miRNAs are expressed in vivo, how they exert regulatory effects on their targets, and how they can be qualitatively or quantitatively cloned. For these purposes, the methodology of miRNA study has been developed and significantly improved in recent years. The focus here is on a number of powerful methods for plant miRNA research including bioinformatics tools and experimental approaches being used for upstream or downstream analysis of miRNAs or miRNA cloning. Some discrepancies exist in the miRNA research methodology between plants and animals, for example, 5' modified RACE (Rapid Amplification of cDNA Ends) can be used for cleavage target validation only in plants. However, numerous common methods are shared by these two miRNA research areas. Thus, this review will enhance our understanding of miRNA research methodology in organisms.
doi: 10.1093/jxb/erq087 PubMed: 20388745 Google Scholar
High-throughput sequencing (HTS) has opened up a new era for small RNA (sRNA) exploration. Using HTS data for a global survey of sRNAs in 26 angiosperms, elevated GC contents were detected in the monocots, whereas the 5'-terminal compositions were quite uniform among the angiosperms. Chromosome-wide distribution patterns of sRNAs were investigated by using scrolling-window analysis. We performed de novo natural antisense transcript (NAT) prediction, and found that the overlapping regions of trans-NATs, but not cis-NATs, were hotspots for sRNA generation. One cis-NAT generates phased natural antisense short interfering RNAs (nat-siRNAs) specifically from flowers in Arabidopsis, while one in rice produces phased nat-siRNAs from grains, suggesting their organ-specific regulatory roles.
doi: 10.1093/bioinformatics/btq150 PubMed: 20378553 Google Scholar
MicroRNA (miRNA), recently recognized as a critical post-transcriptional modulator of gene expression, is involved in numerous biological processes in both animals and plants. Although eudicots and monocots, such as the model plants Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa), possess distinct root systems, several homologous miRNA families are reported to be involved in root growth control in both plants. Consistent with the recent notion that numerous signaling pathways are implicated in root development, these miRNAs are implicated in auxin signaling, nutrition metabolism, or stress response and have a potential role in mediating the signal interactions. However, a recapitulative representation of these results is especially desired. This review provides a global view of the involvement of miRNAs in root development focusing on the two plants, Arabidopsis and rice. Based on current research advances, several innovative mechanisms of miRNA transcription, feedback regulatory circuit between miRNAs and transcription factors (TFs), and miRNA-mediated signal interactions are also discussed.
doi: 10.1016/j.bbrc.2010.01.129 PubMed: 20138828 Google Scholar
Auxin, known as the central hormone, plays essential roles in plant growth and development. In auxin signaling pathways, the tiny RNA molecules, i.e., microRNAs (miRNAs), show their strong potential in modulating the auxin signal transduction. Recently, we isolated a novel auxin resistant rice mutant osaxr (Oryza sativa auxin resistant) that exhibited plethoric root defects. Microarray experiments were carried out to investigate the expression patterns of both the miRNAs and the protein-coding genes in osaxr. A number of miRNAs showed reduced auxin sensitivity in osaxr compared with the wild type (WT), which may contribute to the auxin-resistant phenotype of the mutant. Auxin response elements (AuxREs) were demonstrated to be more frequently present in the promoters of auxin-related miRNAs. In our previous report, a comparative analysis of miRNA and protein-coding gene expression datasets uncovered a number of reciprocally expressed miRNA-target pairs. A feedback circuit between miRNA and auxin response factor (ARF) was then proposed. Here, we will discuss in-depth some points raised in the previous report, in particular, the organ-specific expression patterns of miR164, the feedback regulatory model between miR167 and certain ARFs, and the potential signal interactions between auxin and nutrition or stress that are mediated by miRNAs in rice roots.
doi: 10.4161/psb.5.3.10549 PubMed: 20023405 Google Scholar
Auxin is one of the central hormones in plants, and auxin response factor (ARF) is a key regulator in the early auxin response. MicroRNAs (miRNAs) play an essential role in auxin signal transduction, but knowledge remains limited about the regulatory network between miRNAs and protein-coding genes (e.g. ARFs) involved in auxin signalling. In this study, we used a novel auxin-resistant rice mutant with plethoric root defects to investigate the miRNA expression patterns using microarray analysis. A number of miRNAs showed reduced auxin sensitivity in the mutant compared with the wild type, consistent with the auxin-resistant phenotype of the mutant. Four miRNAs with significantly altered expression patterns in the mutant were further confirmed by Northern blot, which supported our microarray data. Clustering analysis revealed some novel auxin-sensitive miRNAs in roots. Analysis of miRNA duplication and expression patterns suggested the evolutionary conservation between miRNAs and protein-coding genes. MiRNA promoter analysis suggested the possibility that most plant miRNAs might share similar transcriptional mechanisms with other non-plant eukaryotic genes transcribed by RNA polymerase II. Auxin response elements were shown to be more frequently present in auxin-related miRNA promoters. Comparative analysis of miRNA and protein-coding gene expression datasets uncovered many reciprocally expressed miRNA-target pairs, which could provide some hints for miRNA downstream analysis. Based on these findings, we also proposed a feedback circuit between miRNA(s) and ARF(s). The results presented here could serve as the basis for further in-depth studies of plant miRNAs involved in auxin signalling.
doi: 10.1007/s00425-009-0994-3 PubMed: 19655164 Google Scholar
Transcription factors (TFs) are key nodes of gene regulatory networks that specify plant morphogenesis and control specific pathways such as stress responses. TFs directly interact with the genome by recognizing specific DNA sequences, forming a complex system that fine-tunes spatiotemporal gene expression. The combinatorial interaction among TFs determines regulatory specificity and defines the set of target genes to orchestrate their expression during developmental switches. In this chapter, we provide a catalog of plant-specific TFs and a comprehensive assessment of whether genome-wide analyses have so far been used for identifying potential direct target genes for each TF. We further construct comprehensive TF-associated regulatory networks in the model plant Arabidopsis thaliana using genome-wide datasets from our ChIP-Hub database (http://www.chiphub.org/). We discuss how to dissect the network structure to identify potentially important cross-regulatory loops in the control of developmental switches in plants.
doi: 10.1007/978-981-16-6795-4 PubMed: 31560650 Google Scholar
Key transcription factors (TFs) controlling the morphogenesis of flowers and leaves have been identified in the model plant Arabidopsis thaliana. Recent genome-wide approaches based on chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) enable systematic identification of genome-wide TF binding sites (TFBSs) of these regulators. Here, we describe a computational pipeline for analyzing ChIP-seq data to identify TFBSs and to characterize gene regulatory networks (GRNs) with applications to the regulatory studies of flower development. In particular, we provide step-by-step instructions on how to download, analyze, visualize, and integrate genome-wide data in order to construct GRNs for beginners of bioinformatics. The practical guide presented here is ready to apply to other similar ChIP-seq datasets to characterize GRNs of interest.
doi: 10.1007/978-1-4939-7125-1_16 PubMed: 28623590 Google Scholar
Plants, like other eukaryotes, have evolved complex mechanisms to coordinate gene expression during development, environmental response, and cellular homeostasis. Transcription factors (TFs), accompanied by basic cofactors and posttranscriptional regulators, are key players in gene-regulatory networks (GRNs). The coordinated control of gene activity is achieved by the interplay of these factors and by physical interactions between TFs and DNA. Here, we will briefly outline recent technological progress made to elucidate GRNs in plants. We will focus on techniques that allow us to characterize physical interactions in GRNs in plants and to analyze their regulatory consequences. Targeted manipulation allows us to test the relevance of specific gene-regulatory interactions. The combination of genome-wide experimental approaches with mathematical modeling allows us to get deeper insights into key-regulatory interactions and combinatorial control of important processes in plants.
doi: 10.1007/978-1-4939-7125-1_1 PubMed: 28623575 Google Scholar
Genomics and phenomics are two fundamentally important branches of biological sciences, and they stand at opposite ends of the multiple “omics” families. A central goal of current biology is to establish complete functional links between the genome and phenome, the so-called genotype–phenotype map. Recent advances in high-throughput and high-dimensional genotyping and phenotyping technologies enable us to uncover the causal networks inside the “black box” that lies between genotypes and phenotypes using the principles of genome-wide association studies (GWAS). Application of GWAS and analogous methodologies and incorporation of multiple omics data begin to unravel the contribution of genetic variation to phenotypic diversity. Integrating “omics” data at broad levels by using the systems-biology approach is paramount to further bridging the gaps between genomics and phenomics and eventually making accurate predictions of phenotypes based on genetic contribution.
doi: 10.1007/978-3-642-41281-3_11 Google Scholar
School of Life Sciences, Nanjing University
Nanjing 210023, China