Chromosome-level genome assembly of the diploid oat species Avena longiglumis – Scientific Data

  • NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP375311 (2022).

  • NCBI RNA Sequencing Data https://identifiers.org/ncbi/insdc.sra:SRP433645 (2023).

  • NCBI Assembly https://identifiers.org/ncbi/insdc.gca:GCA_030063025.1 (2023).