A single-cell transcriptomic dataset of pluripotent stem cell-derived astrocytes via NFIB/SOX9 overexpression – Scientific Data

Cell lines and culture

The DYR0100 cell line (“iPSC1” hereafter) was kindly provided by the Stem Cell Bank of the Chinese Academy of Sciences (cat. no. SCSP-1301, CSTR: 19375.09.3101HUMSCSP1301). The BIONi037-A line (“iPSC2” hereafter) was obtained from Sigma (ECACC cat. no. 66540580, RRID: CVCL_II80). To exclude effects of heterogeneity in the starting cell line on the resulting differentiated cells, the Monoclonal iPSC1 line was generated as previously described18: a single cell was isolated from parental iPSC1 cells by limiting dilution and expanded as the Monoclonal iPSC1 line. Briefly, iPSCs were cultured in Essential 8 medium (Thermo Fisher Scientific, A1517001) on 6-well plates coated with Matrigel (Corning, 354277); the medium was changed daily. When the culture reached ~80% confluence, the cells were dissociated using Accutase (Thermo Fisher Scientific, A1110501) or 0.5 mM EDTA and re-plated in Essential 8 medium supplemented with 10 µM ROCK inhibitor (Selleck, S1049) for the first day. All cells described in this study were incubated at 37 °C, 5% CO₂, and 90% humidity.

Plasmids and lentivirus production

The full-length cDNA of the mouse Nfib gene was amplified from the tetO-Nfib-Hygro plasmid (Addgene #117271). To generate Nfib-GSG-P2A, a short DNA sequence encoding GSG-P2A was added to the 3′ end of Nfib by PCR amplification. Similarly, the full-length cDNA of the mouse Sox9 gene followed by the puromycin selection gene was amplified from the tetO-Sox9-Puro plasmid (Addgene #117269) to create Sox9-T2A-Puro. Specific restriction sites were incorporated to allow cloning of Nfib-GSG-P2A and Sox9-T2A-Puro into the tetO-Nfib-Hygro lentiviral vector (Addgene #117271) at the EcoRI/PacI restriction sites, yielding the tetO-NfibSox9-Puro plasmid.

Lentiviruses were produced using a second-generation packaging system in HEK293T cells and titrated by Vigene Biosciences (Shandong, China). The FUdeltaGW-rtTA plasmid (Addgene #19780) and tetO-NfibSox9-Puro plasmid were used to produce lentivirus to overexpress rtTA (reverse tetracycline-controlled transactivator) and Nfib/Sox9, respectively.

Generation of astrocytes from iPSCs

We generated astrocytes from iPSCs as previously described13 with minor modifications (Fig. 1a). First, iPSCs were passaged with Accutase and replated in a Matrigel-coated 6-well plate with Essential 8 medium containing 10 μM ROCK inhibitor. Lentivirus overexpressing rtTA and Nfib/Sox9 was added to each well at a multiplicity of infection of 10 on the same day. The medium was replaced daily with Essential 8 medium. When the iPSCs reached ~80% confluence (i.e., Day 0), the medium was replaced with 2 mL fresh Essential 8 medium containing 1 μg/mL doxycycline (Sigma-Aldrich, D9891). On Days 1 and 2, the medium was exchanged with 2 mL expansion medium. On Day 3, the medium was replaced with 2 mL of a mixture of expansion medium and FGF medium (3:1 v/v). On Day 4, the medium was replaced with 2 mL of a mixture of expansion medium and FGF medium (1:1 v/v). On Day 5, the cells were washed with DPBS (without Ca²⁺/Mg²⁺) (Thermo Fisher Scientific, 14190250) and dissociated using 500 μL Accutase for 30 min at 37 °C. Cell detachment was monitored continuously under a microscope. Cell suspensions were collected and centrifuged at 300 × g for 5 min, and as much of the Accutase-containing supernatant as possible was removed without disturbing the cell pellet. Then, cells were evenly replated at 3–4 × 10⁵ cells per well in a Matrigel-coated 6-well plate in 2 mL of a mixture of expansion medium and FGF medium (1:3 v/v). On Days 6 and 7, the medium was exchanged daily with 2 mL FGF medium. On Day 8, the medium was replaced with 4 mL freshly prepared maturation medium. From Day 10 onward, half of the medium was replaced every other day with 2 mL maturation medium. Day 21 served as the endpoint for one round of astrocyte differentiation. Doxycycline (1 μg/mL) and puromycin (Thermo Fisher Scientific, A1113803), at a concentration optimized for each iPSC line (2–5 μg/mL), were maintained in the medium throughout the experiments. Puromycin was used to select cells that were transduced with the constructs expressing rtTA and Nfib/Sox9.
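The stepwise transition from expansion medium to FGF medium between Day 1 and Day 7 can be written down as a small lookup table. The following Python sketch is only an illustration of the schedule stated above; the names MIXES and medium_volumes are hypothetical and are not part of the published protocol or code.

```python
# Day -> (expansion fraction, FGF fraction, total volume in mL), transcribed
# from the protocol text. Maturation-medium days (Day 8 onward) are not covered.
MIXES = {
    1: (1.0, 0.0, 2.0),
    2: (1.0, 0.0, 2.0),
    3: (0.75, 0.25, 2.0),  # expansion:FGF = 3:1 (v/v)
    4: (0.5, 0.5, 2.0),    # expansion:FGF = 1:1 (v/v)
    5: (0.25, 0.75, 2.0),  # expansion:FGF = 1:3 (v/v), used at replating
    6: (0.0, 1.0, 2.0),
    7: (0.0, 1.0, 2.0),
}


def medium_volumes(day):
    """Return (expansion_mL, fgf_mL) per well for a day in the Day 1-7 window."""
    if day not in MIXES:
        raise ValueError("schedule sketch only covers Day 1 to Day 7")
    expansion_fraction, fgf_fraction, total_ml = MIXES[day]
    return expansion_fraction * total_ml, fgf_fraction * total_ml


if __name__ == "__main__":
    for day in sorted(MIXES):
        expansion_ml, fgf_ml = medium_volumes(day)
        print(f"Day {day}: {expansion_ml:.1f} mL expansion + {fgf_ml:.1f} mL FGF")
```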

Three media were used during astrocyte induction: expansion medium, FGF medium, and maturation medium. These media were prepared as follows: (1) expansion medium comprised DMEM/F12 (Thermo Fisher Scientific, 10565018), 10% FBS (Thermo Fisher Scientific, 10091148), 1% N2 (Thermo Fisher Scientific, 17502048), and 1% P/S (Thermo Fisher Scientific, 15140122); (2) FGF medium comprised Neurobasal (Thermo Fisher Scientific, 21103049), 1% FBS, 2% B27 (Thermo Fisher Scientific, 17504044), 1% NEAA (Thermo Fisher Scientific, 11140050), 1% Glutamax (Thermo Fisher Scientific, 35050061), 8 ng/mL bFGF (Peprotech, 100-18B), 5 ng/mL CNTF (Peprotech, 450-13), 10 ng/mL BMP4 (Peprotech, 120-05ET), and 1% P/S; (3) maturation medium comprised DMEM/F12 and Neurobasal (1:1 v/v), 1% N2, 1% Glutamax, 1% sodium pyruvate (Thermo Fisher Scientific, 11360070), 10 ng/mL CNTF, 10 ng/mL BMP4, 5 ng/mL heparin-binding EGF-like growth factor (hbEGF) (Peprotech, 100-47), 5 μg/mL N-acetyl-cysteine (Sigma-Aldrich, A8199), 500 μg/mL dbcAMP (Sigma-Aldrich, D0627), and 1% P/S.

Immunostaining of iPSC-derived astrocytes

On Day 21, the cells were dissociated with Accutase and re-plated on 12-mm-diameter glass coverslips (SPL Life Sciences, 20012) coated with poly-D-lysine (Sigma-Aldrich, P0899) and Matrigel. The cells were washed once with DPBS and fixed in 4% paraformaldehyde (Sigma-Aldrich, 158127) for 15 min at room temperature. After washing in DPBS, the cells were permeabilized for 10 min with 0.1% Triton X-100 (Sigma-Aldrich, 93443) diluted in DPBS. The cells were blocked in 5% goat serum (Gibco, 16210064) diluted with 0.1% PBST for 30 min at room temperature. The cells were subsequently immunostained with primary rabbit anti-S100B antibody (Abcam, ab52642) overnight at 4 °C. The cells were then washed 3 times with 0.1% PBST for 20 min each and incubated with goat anti-rabbit AF568 secondary antibody (Life Technologies, A11011) for 1 h at room temperature. The nuclei were simultaneously counterstained with DAPI (Sigma-Aldrich, D9542). The cells were then washed 3 times with 0.1% PBST for 20 min each. The coverslips were mounted on slides with ProLong Diamond Antifade Mountant (Thermo Fisher Scientific, P36961) and stored at 4 °C before imaging. Images were taken using an LSM900 confocal microscope (Zeiss) and processed with ZEN software (version 3.91.0).

Collection of cells for single-cell RNA sequencing

To establish the astrocyte differentiation path, Monoclonal iPSC1 cells were subjected to time-course profiling. Cells were collected for scRNA-seq on Days 0, 1, 3, 8, 14, and 21 of differentiation. The parental iPSC1 line and the iPSC2 line were used to evaluate the consistency of astrocyte generation, and Day-21 cells derived from these lines were collected for scRNA-seq. On the collection day, the cultured cells were prewashed with DPBS and dissociated with 200 µL Accutase in a 6-well plate for 10 min at 37 °C. Cell detachment was monitored continuously under a microscope. Cell suspensions were collected and centrifuged for 5 min at 300 × g, and as much Accutase as possible was removed without disturbing the cell pellet. The cells were resuspended in DPBS with 0.04% filtered BSA and 60 U/mL RNasin Plus Ribonuclease Inhibitor (Promega, N2615). After quantification with a Countess Automated Cell Counter (Invitrogen, C10281), cell suspensions were diluted on ice to 700–1,200 cells per microliter for subsequent scRNA-seq library preparation.
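The final dilution step is simple arithmetic: the volume of buffer to add follows from the counted concentration and the desired loading concentration. The Python sketch below assumes concentrations are expressed in cells per microliter and picks 1,000 cells/µL as a default target within the stated 700–1,200 range; the function name and default are hypothetical.

```python
def diluent_volume_ul(counted_cells_per_ul, suspension_ul, target_cells_per_ul=1000.0):
    """Volume of buffer (uL) to add so the suspension reaches the target density.

    Uses C1 * V1 = C2 * (V1 + V_added), solved for V_added.
    """
    if not 700.0 <= target_cells_per_ul <= 1200.0:
        raise ValueError("target must lie in the 700-1,200 cells/uL window")
    if counted_cells_per_ul < target_cells_per_ul:
        raise ValueError("suspension is already below target; concentrate instead")
    return suspension_ul * (counted_cells_per_ul / target_cells_per_ul - 1.0)


if __name__ == "__main__":
    # Example: 100 uL counted at 2,500 cells/uL needs 150 uL of buffer.
    print(diluent_volume_ul(2500.0, 100.0))
```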

Library preparation and single-cell RNA sequencing

The scRNA-seq workflow is summarized in Fig. 1b. A Chromium Next GEM Single Cell 3′ Reagent Kit v3.1 and Gel Beads Kit (10x Genomics) were used according to the manufacturer’s instructions. Briefly, single-cell suspensions, gel beads, and partitioning oil were loaded onto the 10x Genomics Chromium Chip (Next GEM chip G) and run on the 10x Chromium Controller, which encapsulated single cells within individual gel beads-in-emulsion (GEMs). The targeted number of cells in each sample was 10,000. Captured cells were lysed, and the transcripts inside the individual GEMs were barcoded through reverse transcription. Constructed 10x libraries were quantified with a Qubit 4 Fluorometer (Invitrogen) and a Qubit 1x dsDNA HS Assay Kit (Invitrogen). Quality control for the 10x libraries was performed using a Fragment Analyzer 5200 (Agilent) with a DNF-474 HS NGS Fragment Kit (Agilent). Library sequencing was performed on an Illumina NovaSeq 6000 sequencing platform (Novogene), with a paired-end read length of 150 bp and 100 GB raw data per sample.

Preprocessing of raw sequencing data

The workflow for bioinformatics analysis is summarized in Fig. 1c. In the two lentiviral constructs overexpressing rtTA and NfibSox9-Puro, an exogenous WPRE-LTR sequence was included downstream of rtTA and Puro, respectively. First, a Homo sapiens transcriptome reference (GRCh38) containing the exogenous WPRE-LTR sequence was constructed using the Cell Ranger mkref pipeline. The exogenous WPRE-LTR sequence in the reference was used to identify lentivirally transduced cells. To obtain transcript count tables, the sequencing data were processed using Cell Ranger software (version 7.0.0, 10x Genomics). The library-specific FASTQ files were aligned to the reference by the Cell Ranger count pipeline with the default settings. Cell-free mRNA contamination was removed with SoupX (version 1.6.2) using the output files (i.e., “raw_feature_bc_matrix” and “filtered_feature_bc_matrix”) obtained from Cell Ranger. The contamination fraction (i.e., contFrac) was set to 0.2 according to the recommended guidelines. The SoupX-modified count matrix was used for all downstream analyses.
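To give an intuition for what a fixed contamination fraction of 0.2 means, the Python sketch below removes the expected ambient ("soup") counts from a single cell. It is a deliberately simplified illustration and not the SoupX algorithm: it assumes the soup profile (per-gene fractions of ambient RNA, in SoupX estimated from empty droplets) is already known, and the function name remove_soup and the toy gene values are hypothetical.

```python
def remove_soup(cell_counts, soup_profile, rho=0.2):
    """Subtract expected ambient counts from one cell's gene counts.

    cell_counts : dict gene -> observed UMI count in the cell
    soup_profile: dict gene -> fraction of ambient RNA from that gene (sums to 1)
    rho         : assumed fraction of the cell's UMIs that come from the soup
    """
    total_umis = sum(cell_counts.values())
    corrected = {}
    for gene, observed in cell_counts.items():
        expected_soup = rho * total_umis * soup_profile.get(gene, 0.0)
        # Counts cannot go negative; clip at zero.
        corrected[gene] = max(observed - expected_soup, 0.0)
    return corrected


if __name__ == "__main__":
    cell = {"GFAP": 80, "HBB": 20}        # toy counts, not real data
    soup = {"GFAP": 0.5, "HBB": 0.5}      # toy ambient profile
    print(remove_soup(cell, soup))        # removes 10 expected soup counts per gene
```

In this simplification, a gene that is abundant in the ambient profile but weakly expressed in the cell loses proportionally more of its counts, which is the behavior the correction is meant to achieve.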

Bioinformatics analysis of scRNA-seq data

Seurat (version 5.0.3) was used for further quality control. The criteria for cell exclusion were determined individually for each sample, guided by the Barcode Rank Plots generated by Cell Ranger. Cells were removed if they had UMI counts at or below a sample-specific lower bound (2,000–5,500), detected features at or below a sample-specific lower bound (1,000–2,500), or a mitochondrial gene percentage ≥ 10% (upper bound). In addition, cells with complexity (i.e., log10-transformed genes per count) below a sample-specific threshold of 0.8–0.85 were filtered out. The lentivirally transduced cells were identified and isolated using the subset function (i.e., based on WPRE-LTR expression > 0) for further analysis. Genes were excluded if they were expressed in fewer than 10 cells. The unique molecular identifier (UMI) count matrices were log-normalized, and variable features for each sample were identified using the FindVariableFeatures function (variable.features.n = 3000). Principal component analysis (PCA) was performed using RunPCA with all genes present in the scaled data, excluding immediate early genes (IEGs)19. Clusters within individual samples were then identified using the FindNeighbors and FindClusters functions (resolution = 0.1–0.3). For visualization, the individual samples were subjected to dimensionality reduction by uniform manifold approximation and projection (UMAP). Doublets were removed by DoubletFinder (version 2.0.3), and the doublet rates were set individually according to the recommendations of 10x Genomics.
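The per-cell exclusion rules can be summarized in a short predicate. The Python sketch below is an illustration only: the default thresholds are the lower ends of the ranges quoted above, complexity is assumed to be log10(features) / log10(counts) (a common definition that the text does not spell out), and the function name passes_qc is hypothetical. The actual analysis was done in Seurat.

```python
import math


def passes_qc(n_counts, n_features, percent_mt,
              min_counts=2000, min_features=1000,
              max_percent_mt=10.0, min_complexity=0.8):
    """Return True if a cell survives the count, feature, mito, and complexity filters."""
    # Cells at or below the lower bounds are removed.
    if n_counts <= min_counts or n_features <= min_features:
        return False
    # Cells at or above the mitochondrial upper bound are removed.
    if percent_mt >= max_percent_mt:
        return False
    # Assumed definition of complexity: log10 genes per log10 UMI.
    complexity = math.log10(n_features) / math.log10(n_counts)
    return complexity >= min_complexity


if __name__ == "__main__":
    print(passes_qc(10000, 3000, 5.0))    # typical good cell
    print(passes_qc(100000, 1500, 2.0))   # many UMIs from few genes: low complexity
```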

To provide an overview of the timepoint samples, the scRNA-seq data of Monoclonal iPSC1 samples at Days 0, 1, 3, 8, 14, and 21 were merged using the Seurat merge function. To identify transcriptionally linked cell clusters, the timepoint samples were integrated using the cluster similarity spectrum (CSS). CSS was calculated using the cluster_sim_spectrum function in simspec (version 0.0.0.9000), and all dimensions in the raw cluster similarity spectrum were selected for graph-based clustering (resolution = 0.1). The CSS-integrated data were visualized by UMAP. Transcriptionally linked cell clusters between timepoint samples were identified using the FindNeighbors and FindClusters functions (resolution = 0.1). Genes specific to these clusters were identified by the FindAllMarkers function (only.pos = T, min.pct = 0.1, logfc.threshold = 0.25). For pseudotime analysis, we followed the workflow described for Monocle 3 (version 1.3.4). A cell dataset (cds) object was generated from the CSS-integrated Seurat object. The cds object was then processed with the preprocess_cds function (num_dims = 100, norm_method = ‘none’). To visualize the cells, we reduced the dimensionality of the cds object with the reduce_dimension function and projected the original Seurat cell embeddings onto the cds object. The cells were then clustered using the cluster_cells function (resolution = 1e−4), and a trajectory graph was generated using the learn_graph function (use_partition = F, close_loop = F, learn_graph_control = list(ncenter = 80)). To order the cells, we specified the trajectory-graph nodes occupied by Day-0 cells as the root nodes using the order_cells function. To visualize enriched genes for each timepoint sample, the 10 genes with the highest avg_log2FC values (cutoff > 1) were selected, and a heatmap was generated using the pheatmap package (version 1.0.12).
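The selection of heatmap genes (top 10 per group by avg_log2FC, with a cutoff of > 1) can be sketched as follows. This Python example assumes a marker table with "cluster", "gene", and "avg_log2FC" fields, mirroring the columns reported by FindAllMarkers; the function name top_markers and the toy rows are hypothetical, and the published analysis was performed in R.

```python
from collections import defaultdict


def top_markers(rows, n=10, min_log2fc=1.0):
    """Return {group: [genes]} with up to n genes per group, ranked by avg_log2FC.

    Only genes with avg_log2FC strictly above min_log2fc are considered.
    """
    by_group = defaultdict(list)
    for row in rows:
        if row["avg_log2FC"] > min_log2fc:
            by_group[row["cluster"]].append(row)
    selected = {}
    for group, group_rows in by_group.items():
        ranked = sorted(group_rows, key=lambda r: r["avg_log2FC"], reverse=True)
        selected[group] = [r["gene"] for r in ranked[:n]]
    return selected


if __name__ == "__main__":
    toy = [  # illustrative values only, not results from the dataset
        {"cluster": "Day0", "gene": "geneA", "avg_log2FC": 3.2},
        {"cluster": "Day0", "gene": "geneB", "avg_log2FC": 0.8},
        {"cluster": "Day21", "gene": "geneC", "avg_log2FC": 2.1},
        {"cluster": "Day21", "gene": "geneD", "avg_log2FC": 4.0},
    ]
    print(top_markers(toy, n=1))
```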

To assess the consistency of astrocyte differentiation, the average expression of genes in each cell line was determined using the AggregateExpression function in Seurat (normalization.method = LogNormalize and return.seurat = T). Pearson correlation was calculated using the cor function. The scRNA-seq data of Monoclonal iPSC1, iPSC1, and iPSC2 samples on Day 21 were integrated by the IntegrateLayers function (method = HarmonyIntegration). The Harmony-integrated data were visualized by UMAP. The detailed analytical procedures used to generate all the figures in this study are available on our GitHub repository (https://github.com/ShuaiC-CYLab/iPSC-derived-astrocytes_scRNASeq).
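For readers unfamiliar with the comparison, the Pearson correlation between two cell lines is computed over paired per-gene average expression values. The following Python sketch implements the standard formula with the standard library only; it stands in for the R cor function for illustration, and the function name pearson and the toy vectors are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    line_a = [0.1, 2.3, 1.7, 4.0]   # toy per-gene averages, not real data
    line_b = [0.2, 2.1, 1.9, 3.8]
    print(round(pearson(line_a, line_b), 3))
```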