Detection of DNA methylation signatures through the lens of genomic imprinting – Scientific Reports

Animals and samples

The study included 10 pigs: 8 were bred at the INRAE experimental farm (https://doi.org/10.15454/1.5572415481185847E12) and 2 came from breeding organizations, in accordance with French and European legislation on animal welfare. All animals belong to the same family, except for one Large White (LW) animal. Animals were produced in a reciprocal cross design between the Large White and Meishan pig breeds.

Ten biological samples were used in the experiment. Nine were blood samples collected on EDTA and stored frozen at −20 °C for nine months. One was a sperm sample taken from an artificial insemination dose and stored at −20 °C for two years. Samples were collected at the adult stage for all parents of the reciprocal cross design (n = 5) and 1 day after birth for all offspring (n = 5).

Genomic DNA was extracted from blood using either the Genomic-tip 100 DNA kit (Qiagen, 10243) or the MagAttract HMW DNA kit (Qiagen, 67563), following the manufacturer’s instructions. Genomic DNA was extracted from sperm using a standard phenol/chloroform method. DNA purity was determined using the Nanodrop 8000 spectrophotometer (Thermo Fisher Scientific). DNA concentration was determined using the dsDNA Broad Range Assay kit (Invitrogen, ThermoFisher Scientific, Q32850) and measured with the Qubit3 fluorometer (Invitrogen, ThermoFisher Scientific).

All procedures and guidelines for animal care were approved by the local ethical committee in animal experimentation (Poitou–Charentes) and the French Ministry of Higher Education and Scientific Research (authorizations n°2018021912005794 and n°11789-2017101117033530). All animal and sample information is available at the European Nucleotide Archive (ENA) under accession number PRJEB58558.

Panel design

Candidate regions for GI in the pig (Sus scrofa) were selected based on various publications available in humans and mice1,13 and on two databases (https://www.geneimprint.com and https://corpapp.otago.ac.nz/gene-catalogue). Sequences not annotated in the pig genome were subjected to BLAST searches against the Sscrofa11.1 reference. A total of 165 regions ranging from 458 bp to 2.3 Mb, distributed across the 18 autosomes, the X chromosome and 4 scaffolds of the pig reference, were selected. These genomic regions, targeting a total of 23 Mb, were submitted to the two commercial platforms, TB and AG. Each platform used its own confidential algorithm for panel design. The sizes of custom panels from TB and AG were 20.5 Mb and 19.7 Mb, respectively, with all the 165 candidate regions for GI represented.

Library preparation

The final optimized protocol has been deposited in the Protocol Exchange open repository (https://doi.org/10.21203/rs.3.pex-2159/v1). Two types of libraries were generated, using either AG or TB technology, the latter involving two experiments (TB1 and TB2). The AG and TB1 experiments were performed at the GeT-PlaGe core facility at INRAE Toulouse (https://doi.org/10.15454/1.5572370921303193E12). The TB2 experiment was performed by Twist Bioscience (USA).

Library preparation and target enrichment with Agilent SureSelect Custom DNA Target Enrichment Probes

Eight library preparations were carried out using the SureSelect Methyl-Seq Target Enrichment kit (Agilent, G9651) following the manufacturer’s protocol (User guide: SureSelect, Agilent Technologies, version E0, April 2018). Genomic DNA (1 µg) was first fragmented using a Covaris M220 focused ultrasonicator in a micro-TUBE 50 AFA Fiber screw cap (Covaris, 520166) for a target insert size of 200 bp under the following conditions: peak power 75 W, duty factor 10%, 200 cycles/burst, 375 s, 8 °C. An additional 0.8X AMPure bead purification step was performed to eliminate adaptor dimers.

Library preparation and target enrichment with Twist Bioscience NGS methylation detection system

Sixteen library preparations were carried out using an in-house combination of two protocols, NEBNext Enzymatic Methyl-seq Library Preparation and Twist Bioscience Targeted Methylation Sequencing, with a custom methylation panel. The detailed, optimized protocol has been deposited in the Protocol Exchange open repository (https://doi.org/10.21203/rs.3.pex-2159/v1). Briefly, eight of these library preparations were carried out with an earlier development version of the protocol (TB1), in which some adjustments had not yet been made. Differences between protocol TB1 and protocol TB2 are listed in the procedure deposited in Protocol Exchange. All library quantifications were performed on a Qubit 3.0 fluorometer with the High Sensitivity DNA Quantitation Assay kit according to the manufacturer’s recommendations (Invitrogen, ThermoFisher Scientific, Q32851). All library validations were performed on a 2100 Bioanalyzer with the High Sensitivity DNA kit according to the manufacturer’s recommendations (Agilent Technologies, 5067-4626).

Sequencing

All libraries were quantified by qPCR on a QuantStudio 6 device (Applied Biosystems, ThermoFisher Scientific), using the Kapa Library Quantification Kit (Roche, KK4824). Agilent libraries and experiment TB1 libraries were each sequenced on one lane of an Illumina SP NovaSeq 6000 flow cell, using the SP Reagent kit v1.5 300 cycles (Illumina, 20028400), according to the manufacturer’s recommendations. The loading concentration was 2 nM with 25% PhiX. Experiment TB2 libraries were sequenced on an Illumina P2 NextSeq 2000 flow cell, using the SP Reagent kit v3 300 cycles (Illumina, 20046813), according to the manufacturer’s recommendations. The loading concentration was 1000 pM with 5% PhiX. All sequences are available at ENA under study accession PRJEB58558.

Methyl-seq data analysis

Analyses were performed using the Genotoul bioinformatics platform Toulouse Occitanie (Bioinfo Genotoul, https://doi.org/10.15454/1.5572369328961167E12). Methyl-seq reads were processed with the nf-core/methylseq (v1.5) pipeline24,25 (https://nf-co.re/methylseq), using the Sscrofa11.1 pig reference and the Bismark26 workflow with standard parameters. Sequencing quality was analysed with custom Python scripts to compare the AG and TB experiments. CpG calls from the TB2 experiment with depth ≥ 20X were further processed with CGmapTools27 and built-in Linux commands. Cytosines with methylation levels < 0.3 or > 0.7 were classified as hypo-methylated or hyper-methylated, respectively. Cytosines with methylation levels between 0.4 and 0.6, indicating potential PofO methylation, were classified as hemi-methylated. This subset of hemi-methylated CpGs was scanned using a sliding window approach with a custom R function to identify hemi-methylated regions potentially compatible with GI. The occurrence of ≥ 5 hemi-methylated CpGs within 100 bp was labelled as hemiR100. A subset of hemiR100, defined by the occurrence of ≥ 5 consecutive hemi-methylated CpGs, was distinguished and labelled as hemiR5. These cutoffs on CpG-related parameters (depth, methylation level and density) aim to define hemi-methylated regions using some of the most stringent criteria for targeting epigenetic signatures of GI from reference imprintome studies7,28. Neighbouring hemi-methylated regions separated by less than their initial definition criterion (i.e., 100 bp for hemiR100 and 5 bp for hemiR5) were merged into a single larger region. Top hemi-methylated regions were visually inspected using the Integrative Genomics Viewer29, identifying when possible the parental origin of methylation in the progeny of the reciprocal cross. A complete list of software versions used in this study is provided in the next section.
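As an illustration of the thresholds described above, the following minimal Python sketch classifies CpGs by methylation level, keeps hemi-methylated CpGs with depth ≥ 20X, detects hemiR100-like windows and merges neighbouring regions. It is not the custom R function used in the study. The function names, the input layout (position, methylation level, depth, for a single chromosome), the inclusive treatment of the 0.4 and 0.6 bounds and the interpretation of "within 100 bp" as a span strictly smaller than 100 bp are all assumptions made for the example. The hemiR5 subset is not reproduced here.

```python
from typing import Iterable, List, Tuple

Region = Tuple[int, int]          # (start, end), both inclusive
CpG = Tuple[int, float, int]      # (position, methylation level, depth)


def classify(level: float) -> str:
    """Assign a methylation class using the cutoffs quoted in the text.

    Assumption: the 0.4 and 0.6 bounds are inclusive; levels falling in
    [0.3, 0.4) or (0.6, 0.7] are left unclassified.
    """
    if level < 0.3:
        return "hypo"
    if level > 0.7:
        return "hyper"
    if 0.4 <= level <= 0.6:
        return "hemi"
    return "unclassified"


def hemi_positions(cpgs: Iterable[CpG], min_depth: int = 20) -> List[int]:
    """Sorted positions of hemi-methylated CpGs with sufficient depth."""
    return sorted(
        pos for pos, level, depth in cpgs
        if depth >= min_depth and classify(level) == "hemi"
    )


def hemi_windows(positions: List[int], min_cpgs: int = 5,
                 window: int = 100) -> List[Region]:
    """Slide over sorted hemi CpG positions and report every run of
    `min_cpgs` CpGs whose first-to-last span is below `window` bp."""
    regions = []
    for i in range(len(positions) - min_cpgs + 1):
        start, end = positions[i], positions[i + min_cpgs - 1]
        if end - start < window:
            regions.append((start, end))
    return regions


def merge_regions(regions: List[Region], max_gap: int) -> List[Region]:
    """Merge overlapping regions and regions separated by < max_gap bp."""
    merged: List[Region] = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] < max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def hemi_r100(cpgs: Iterable[CpG]) -> List[Region]:
    """hemiR100-like regions for one chromosome (illustrative only)."""
    return merge_regions(hemi_windows(hemi_positions(cpgs)), max_gap=100)
```

In this sketch, overlapping windows produced by the sliding step are collapsed by the same merge step that joins neighbouring regions, so each reported region is the union of all qualifying windows separated by less than 100 bp.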

Software used

BEDtools (v2.27.1)30

Bismark (v0.22.3)26

CGmapTools (v0.1.2)27

Cutadapt (v2.9)31

nf-core/methylseq (v1.5)24,25

Nextflow (v20.01.0)32

FastQC (v0.11.9, https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

Integrative Genomics Viewer (v2.8.13)29

MultiQC (v1.8)33

Qualimap (v2.2.2-dev)34

Preseq (v2.0.3)35

R base (v4.1.1) with dplyr (v1.0.9), ggplot2 (v3.3.6), RIdeogram (v0.2.2), scales (v1.2.1) and tidyr (v1.2) packages (https://cran.r-project.org/)

Samtools (v1.9)36

Trim Galore! (v0.6.4_dev, https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/)

HISAT2 (v2.2.0)37