
Ethical approvals
The animal study protocols were approved by the Committee on the Use of Live Animals in Teaching and Research (CULATR Ref. No.: 22-276) at the University of Hong Kong and were conducted in compliance with the Animals (Control of Experiments) Ordinance of Hong Kong.
Generation of chimeric mouse from choanoflagellate Sox iPSCs
OG2MEF cells were derived from day 13.5 embryos of OG2 mice (Jackson Laboratory, no. 004654), following the preparation protocol described previously in ref. 53. Clonal iPSCs derived from OG2MEF (derived from the C57BL/6J strain with black coat color) and reprogrammed by full-length Salhel-Sox-I were cultured in 2i/LIF medium on gelatin-coated plates for 7–10 days before harvesting for injection, with a medium change every other day. CD-1 (ICR strain, white coat color) female mice were euthanized, and morulae or blastocysts were isolated using a 27 G blunt-end needle. These embryos were then collected and incubated in KSOM medium (Sigma-Aldrich, #MR-101-D) until the injection of iPSCs was performed. On the day of injection, 1 million cells were prepared, from which cells were selected for microinjection into blastocysts. The injection took place on a 6 cm plate. Using a laser objective (XYRCOS, Hamilton Thorne), a slit was created in the zona pellucida, after which 8 to 10 selected iPSCs were transferred into each blastocyst using a microneedle. Following microinjection, approximately 25 blastocysts containing the clonal iPSCs were implanted into pseudopregnant CD-1 mice, where they were allowed to develop. This fostering stage is required to establish chimeric mice derived from the clonal iPSCs for further experimental investigation.
Sequence search, phylogenetics and ancestral sequence reconstruction
We used the human Sox2 protein as query for a BLASTP search against the predicted proteomes of unicellular holozoans, including 22 choanoflagellates (20 transcriptomes and 2 genomes), 4 filastereans, 7 ichthyosporeans and Corallochytrium limacisporum, as well as 130 non-holozoan eukaryotes. Top BLAST hits had e-values below 1e-19 and bit scores above 90, which were stronger hits than the best putative hits previously identified in Monosiga brevicollis (e-value 1e-11, bit score 70.9)77. To establish the phylogenetic framework of HMG evolution and minimize distant outgroups (e.g. non-sequence specific HMG boxes), we performed an HMMER3 search with the HMG-box domain (PF00505, e-value < 0.0001) on a select group of species (Homo sapiens, Drosophila melanogaster, Nematostella vectensis, Amphimedon queenslandica, Mnemiopsis leidyi, Trichoplax adhaerens, Salpingoeca rosetta, Capsaspora owczarzaki) using the --cut_ga threshold (HMMER3 and PFAM database). We used MAFFT L-INS-i for multi-sequence alignment78, trimAl with the -gappyout parameter for alignment trimming79, and then used IQTREE to build a maximum likelihood phylogeny allowing for model fitting80. We selected the TCF/LEF, Maelstrom, Capicua (CIC), BobbySox (BBX) and HBP1 clades as closer outgroups of Sox, discarding more distantly related HMG-box families.
Then, using BLASTP, we searched human Sox2 against the proteomes of all the unicellular holozoans and selected all the hits with an e-value below 1e-10. We also included Sox hits from additional sponge genomes (Oopsacas minuta, Tethya wilhelma, Oscarella carmela, Sycon ciliatum) and an extra placozoan (Hoilungia hongkongensis) to maximize the resolution of early metazoan branching events. The substitution model selected by ModelFinder81, implemented in IQTREE, was LG + G4, with amino acid frequencies taken from the model, according to the corrected Akaike information criterion. We constructed a phylogenetic tree as described above, performing 1000 replicates to obtain SH-like approximate likelihood ratio test (SH-aLRT)82 and ultrafast bootstrap83 nodal supports. The perturbation strength (-pers option) was set to 0.2 and the number of unsuccessful iterations to stop (-nstop option) was set to 500. The tree search was repeated 10 times with different seeds, and the tree with the highest likelihood is shown in Supplementary Fig. 1. Transfer bootstrap values were computed with the Booster84 software using 500 bootstrap trees computed with RAxML-NG85. For the constrained tree search, we generated a guide tree with the Sox/Sox-like genes following the holozoan species topology: (Filasteria, (Choanoflagellata, Metazoa)). Polytomies were applied to all internal branches and to those of the outgroup sequences. All branch lengths were set to 1.0. The constrained tree search was repeated 10 times using the same guide tree for all tree searches and the same parameters as for the ML tree search. The ML tree and the five constrained trees with the highest log likelihood values were subjected to an AU-test59 (implemented in IQ-TREE) using the original multiple-sequence alignment, the substitution model LG + G4 and 10,000 replicates. Trees with p-AU values > 0.05 were included in the confidence set of plausible trees that cannot be rejected by the data, which was the case for all five tested constrained trees.
POU was searched using a similar strategy: the human POU5F1 sequence was searched against the BLASTP database including unicellular holozoans and other eukaryotes, identifying a reliable hit only in Mylnosiga fluctuans (e-value 1.14e-07). Subsequent reassembly of the Salpingoeca helianthica transcriptome (see below) identified another POU5F1 hit in that species. Protein domain composition, comprising Homeobox (PF00046) and POU (PF00157) domains, was validated using the Pfam database86. To place the choanoflagellate POU within the homeobox phylogeny, we used an HMMER3 search (e-value < 0.0001) to extract all homeobox domain-containing proteins present in Homo sapiens, Drosophila melanogaster, Nematostella vectensis, Amphimedon queenslandica, Trichoplax adhaerens and Mnemiopsis leidyi, as well as all holozoan sequences described above. The resulting 756 sequences were aligned using MAFFT and trimmed using trimAl, and IQTREE was used to build the phylogeny. Another tree was built using the same procedure, but focusing only on POU members, using the trimAl -automated1 parameter, and including new sequences from sponges, placozoans and ctenophores. This tree spanned both the homeobox and POU-specific domains. Additionally, the same alignment including homeobox outgroups was tested to evaluate the topology of the focused POU phylogeny, using ONECUT, LHX and SIX homeodomains as outgroups, following previous reports suggesting their relative proximity to POU and the presence of a structurally analogous domain N-terminal of the homeodomain.
For ancestral sequence reconstruction of Sox HMG domains, ancestral gaps were assigned using PastML87, and ancestral sequences were inferred via IQTREE using the LG + G4 substitution model with amino acid frequencies taken from the model. The inferred sequences contain the state with the highest posterior probability at each site.
Sequence re-annotation
Before cloning the HMG-box and POU domains, we validated the annotations using two strategies. For the Pigoraptor chileana sequence, we extracted the genomic Sox locus and used Augustus with various species models to predict the gene ab initio and validate the published genome annotation88. For choanoflagellate sequences, we downloaded the raw reads from NCBI for Salpingoeca helianthica (SRR6344974) and Mylnosiga fluctuans (SRR6344975)44, performed adapter trimming using fastp89, and assembled the transcriptome using Trinity v2.8.590. TBLASTN was used to search for the transcripts encoding the POU and Sox identified above, and Open Reading Frame curation was performed using ORFfinder in NCBI.
Sequence feature analysis and visualization
The fasta files of the DNA binding domains of each protein were obtained from https://www.uniprot.org/ and combined with sequences of the unicellular Sox or POU. The M-Coffee option of the T-Coffee Multiple Sequence Alignment server (https://tcoffee.crg.eu/)91 was used to align all sequences. Then, Jalview (https://www.jalview.org/)92 was used to color the alignment using Clustal colors and to annotate conservation and sequence logos. The structure of Sox17 (PDB: 3F27) was used to annotate the Sox alignment, while the POU alignment was annotated manually following ref. 60.
Recombinant DNA
For protein purification, pET28a-mOct4 POU, pETG20a-Sox2 and pETG20a-Sox17 from ref. 93 were used. The Salhel Sox-I HMG, Pchi Sox HMG, Salro HMG, Monbr HMG and Cic HMG domains were cloned into the plasmid pETG20a with an N-terminal His6-thioredoxin tag and tobacco etch virus (TEV) cleavage site. The Salhel POU was cloned into the pET28a vector with an N-terminal His6 tag and thrombin cleavage site. For reprogramming tests, pHAGE2-TetO-Oct4 (Addgene, #136611), pHAGE2-TetO-Sox2 (Addgene, #136612), pHAGE2-TetO-Klf4 (Addgene, #136613), pHAGE2-TetO-cMyc (Addgene, #136614), pHAGE2-TetO-mSox17 (Addgene, #206367) and pHAGE2-TetO-mCherry (Addgene, #136615) were used. All sequences of other factors were synthesized by Guangzhou IGE Biotechnology and cloned into the pHAGE2-TetO vector. For EMSAs with full-length proteins, the sequences were cloned into the pLVTHM-3xflag vector. The detailed sequences are all included in Supplementary Data 2. All plasmid reagents will be made available via Addgene or upon request after publication.
Protein expression
Proteins encoded on pETG20a-Salhel-HMG, -Pchi-HMG, -Salro-HMG, -Monbr-HMG and -Cic-HMG and pET28a-Salhel POU were expressed and purified as previously described for Oct4, Sox2 and Sox1793,94. Constructs were transformed into Rosetta 2(DE3) chemically competent cells (Sigma, #71397) and grown overnight (O/N) at 37 °C in 20 mL Fisher BioReagents™ Miller’s LB Broth (Fisher Scientific, #BP97235) with 30 μg/mL chloramphenicol and 100 μg/mL of either ampicillin (Amp, for proteins in pETG20a) or kanamycin (Kan, for proteins in pET28a). The next day, a 10 mL subculture was grown in 1 L of LB supplemented with 0.1% glucose and 100 μg/mL of Amp or Kan for pETG20a or pET28a proteins, respectively. When the OD600 reached 0.6–0.8 (2.5–4 h), protein expression was induced with 0.25–0.5 mM isopropyl-β-thiogalactoside (IPTG) at 18 °C for 18–20 h.
Sox protein purification
pETG20a-Salhel-HMG, -Pchi-HMG, -Salro-HMG, -Monbr-HMG and -Cic-HMG were purified in a similar manner. Cells were harvested, resuspended in cold His buffer A (20 mM Tris–HCl pH 8.0, 500 mM NaCl, 30 mM imidazole) and disrupted by ultrasonication on ice (4 s on/8 s off) for a total on-time of 5–8 min. The lysate was cleared by centrifugation and passed through a 0.22 μm filter. The following steps were all done at 4 °C. The His6-Thx fusion proteins were captured from the supernatant using a HisTrap HP 5 mL column (Cytiva, #17524801) pre-equilibrated with His buffer A and eluted using His buffer B (20 mM Tris–HCl pH 8.0, 500 mM NaCl, 300 mM imidazole). The buffer was changed to SP buffer A (20 mM Tris–HCl pH 8.0, 100 mM NaCl) using a HiPrep 26/10 Desalting column (Cytiva, #17508701). The fusion tag and Sox-HMG were separated by TEV digestion using a substrate:enzyme ratio of 15:1 (w:w) at 4 °C O/N. The Sox-HMG was purified by ion-exchange chromatography using a 1 mL HiTrap SP FF column (Cytiva, #17505401) pre-equilibrated with SP buffer A and eluted with a salt gradient (up to 1 M NaCl). Finally, size-exclusion chromatography was performed using a HiLoad Superdex-75 16/600 column (Cytiva, #28989333) in storage buffer (20 mM Tris–HCl pH 8.0, 250 mM NaCl). Fractions with the desired protein were pooled, aliquoted, flash-frozen and stored at −80 °C.
Salhel POU protein purification
Cells were harvested, resuspended in lysis buffer (100 mM HEPES pH 7.0; 500 mM NaCl; 10 mM Imidazole, 10% Glycerol + [0.5 mM TCEP, 0.4 mM PMSF, 50 U/mL Benzonase® Nuclease added fresh from stock]) and incubated for 30 min on ice. The sample was disrupted by ultrasonication on ice (4 s on/8 s off) for a total on-time of 5–8 min. The lysate was cleared by centrifugation and the supernatant was discarded. The cell pellet was resuspended in denaturing His Buffer A [20 mM HEPES; 500 mM NaCl; 10 mM Imidazole; 10% Glycerol; 6 M Urea; pH 7.0] and incubated at RT with spinning O/N. All steps with 6 M Urea were done at RT. The mixture was centrifuged, and the supernatant was collected. The His-tagged proteins in the supernatant were captured using a HisTrap HP 5 mL column (Cytiva, #17524801) pre-equilibrated with denaturing His Buffer A and eluted using denaturing His Buffer B [20 mM HEPES; 500 mM NaCl; 300 mM Imidazole; 10% Glycerol; 6 M Urea; pH 7.0]. The protein was then refolded via stepwise dialysis. The eluate was concentrated to 10 mL and dialyzed with a Slide-A-Lyzer Dialysis Cassette (7 K MWCO, Thermo Scientific, #66710) in 1 L of storage buffer [10 mM HEPES; 100 mM NaCl; 10% glycerol; 0.5 mM TCEP; pH 7.0] with 4 M Urea at RT for 2 h. The sample was then dialyzed at 4 °C against storage buffer with 2 M Urea and twice against storage buffer without Urea (2 h first and then O/N). The final protein was concentrated using centrifugal units, flash-frozen, and stored at −80 °C.
Electrophoretic mobility shift assay
For purified DNA binding domains
DNA probes (Supplementary Data 1) with 5′-Cy5 or 5′-FAM dyes on the forward strand and an unlabeled reverse strand were mixed in annealing buffer (20 mM Tris/HCl, 50 mM MgCl2, 50 mM KCl, pH 8.0), heated to 95 °C and gradually cooled in a thermocycler to make stocks of double-stranded DNA probes. Protein samples and fluorescently labeled DNA were mixed in EMSA buffer (10 mM Tris/HCl pH 8.0, 0.1 mg/mL BSA, 50 µM ZnCl2, 100 mM KCl, 10% glycerol, 0.10% Igepal CA630, 2 mM β-mercaptoethanol) and incubated for 1–2 h on ice in the dark. 10 µL of each binding reaction was electrophoresed using the Mini-PROTEAN Tetra cell (BioRad) for 30–40 min at 200 V in the cold room (4 °C) using 12% native PAGE mini-gels pre-run at 200 V for 30 min in 1 x TG buffer (25 mM Tris, 192 mM glycine, pH 8.0). Images were captured using an Amersham Typhoon 5 Biomolecular Imager and quantified using ImageQuantTL 7.0. Apparent dissociation constants (Kd) were calculated as described in ref. 31. Cooperativity calculations were performed using established procedures49,50,95 (Supplementary Fig. 6a, b). Binding affinity was plotted as Gibbs free energy, calculated as \(\Delta G^\circ = RT\ln(\text{apparent } K_d)\), where R = 0.008314 kJ mol−1 K−1 and T = 277.15 K. Statistics to compare the cooperativity of the Sox proteins tested were calculated with R. First, the Bartlett test of homogeneity of variances (bartlett.test) was used to decide between parametric and non-parametric tests. Since significant differences in variances were found between samples, non-parametric tests were used. The Kruskal-Wallis test (kruskal.test) and the Games-Howell test (games_howell_test) were used to calculate adjusted p-values. The functions are from the stats and rstatix R packages.
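The free-energy conversion above can be illustrated with a minimal Python sketch. The function name `delta_g` and the example Kd are hypothetical, and the sketch assumes the apparent Kd is expressed in mol/L (the text does not state the unit used); it is not the authors' analysis script.

```python
import math

R_KJ_PER_MOL_K = 0.008314  # gas constant in kJ mol^-1 K^-1 (value from the text)
T_KELVIN = 277.15          # 4 °C, the temperature given in the text


def delta_g(apparent_kd_molar: float, temperature: float = T_KELVIN) -> float:
    """Return dG = R * T * ln(Kd) in kJ/mol.

    Assumption: Kd is given in mol/L; the source does not state the unit.
    """
    if apparent_kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KJ_PER_MOL_K * temperature * math.log(apparent_kd_molar)


if __name__ == "__main__":
    hypothetical_kd = 10e-9  # 10 nM, illustrative only
    print(f"dG = {delta_g(hypothetical_kd):.2f} kJ/mol")
```

A tighter binder (smaller Kd) gives a more negative value, so lower bars on a free-energy plot correspond to higher affinity.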
For full length proteins from mammalian cell extracts
Cell extract EMSAs were performed as described in refs. 31,53. In brief, HEK293T cells were used to overexpress the full-length proteins of interest from a pLVTHM-3xflag plasmid. Cells were dissociated with 0.05% trypsin-EDTA and washed twice with DPBS after 72 h of overexpression. Cell pellets were lysed in lysis buffer (20 mM Hepes-KOH pH 7.8, 150 mM NaCl, 0.2 mM EDTA pH 8.0, 25% glycerol, freshly added 1 mM DTT and cOmplete™ protease inhibitor cocktail (Roche, #11836145001)) using 4x freeze-thaw cycles with liquid nitrogen. The lysate was centrifuged at 14,000 x g at 4 °C for 10 min, and the supernatant, which contains the protein of interest, was kept. 10 μg of total protein was subjected to SDS-PAGE and Western blot analysis using anti-Flag antibody. Protein levels were adjusted based on the quantification of bands in the blot using cell lysates without exogenous protein. Reactions with the DNA probe and protein were incubated on ice in binding buffer (25 mM Hepes-KOH pH 8.0, 50 mM NaCl, 0.5 mM EDTA, 0.07% Triton X-100, 4 mg/ml BSA, 7 mM DTT, 10% glycerol). 10 µL of each binding reaction was electrophoresed using the PROTEAN® II xi cell (BioRad) for 2.5 h at 300 V in the cold room (4 °C) using a 6% 18.5 × 20 cm native PAGE gel pre-run at 300 V for 1.5 h in 1 x TG buffer. Images were acquired with a GE Typhoon 5 Biomolecular Imager.
Specificity by sequencing (Spec-seq)
The experiment was performed essentially as described in refs. 45,93. DNA libraries (44 bp) were designed by flanking the degenerate sequences of the Octamer motif (ATGCNNNN, ATNNNNAT, NNNNTAAT) or the Sox motif (CATNNNN and NNNNGTT) with the 5′ flanking sequence GAGTCGTCTCGTCAGCAC and the 3′ flanking sequence CCGTAGAGCACTCAGGTC for downstream processing. The resulting libraries were then made into double-stranded DNA (dsDNA) using DreamTaq Green PCR Master Mix (Thermo Scientific: K1081) with the reverse complement primer (GACCTGAGTGCTCTACGG). To remove single-stranded DNA (ssDNA), 1 µL of Exonuclease I (New England Biolabs: M0293S) was added to the reaction mix for 30 min. The dsDNA products were then purified using homemade Qiagen PNI binding buffer and PCR purification columns (Tiangen), and eluted in ultrapure water. The three Octamer libraries and the two Sox libraries were each combined in equimolar amounts. Binding reactions were prepared using different concentrations of protein and 250 nM dsDNA library in 1x NEB Cutsmart buffer supplemented with 10% glycerol. The reactions were incubated for 1 h at 4 °C, and then EMSA was performed. After the EMSA, the gels were stained with 3x GelRed® stain (Biotium: 41003) for 15 min and visualized. Each band was excised and the DNA in the gel was extracted in 150 µL PAGE diffusion buffer [500 mM ammonium acetate; 10 mM magnesium acetate; 1 mM EDTA; 0.1% sodium dodecyl sulfate (SDS), pH 8.0] and purified similarly to the dsDNA libraries. Six cycles of PCR were performed using primers containing a unique molecular identifier (UMI) to account for PCR bias. A second round of PCR (28 cycles) was performed using primers compatible with Illumina adapters and containing different indexing barcodes (primers are in Supplementary Data 1). The resulting PCR products were combined, gel purified twice and then sent for sequencing using an Illumina NovaSeq 6000 PE150 (Novogene).
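As an illustration of the library design described above, the following Python sketch expands a degenerate core motif into every concrete sequence and adds the two flanks quoted in the text. The helper names are hypothetical; the same enumeration could plausibly serve as the list of theoretically possible sequences used as an alignment template, but that use is an assumption here.

```python
from itertools import product

# Flanking sequences quoted in the text.
FLANK_5 = "GAGTCGTCTCGTCAGCAC"
FLANK_3 = "CCGTAGAGCACTCAGGTC"


def expand_degenerate(core: str) -> list[str]:
    """Replace every N in `core` with A, C, G and T; return all combinations."""
    choices = ["ACGT" if base == "N" else base for base in core]
    return ["".join(combo) for combo in product(*choices)]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def build_library(core: str) -> list[str]:
    """Return every library member: 5' flank + expanded core + 3' flank."""
    return [FLANK_5 + variant + FLANK_3 for variant in expand_degenerate(core)]


if __name__ == "__main__":
    octamer_library = build_library("ATGCNNNN")
    print(len(octamer_library), "sequences of length", len(octamer_library[0]))
    # The second-strand primer is the reverse complement of the 3' flank.
    print(reverse_complement(FLANK_3))
```

Each four-N core yields 4^4 = 256 members, so the three Octamer cores and two Sox cores give small, fully enumerable sequence spaces.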
Spec-seq data analysis
R packages (QuasR, Biostrings, tidyr, data.table, stringi) were used to process the paired-end sequencing data. The reads were trimmed for adapter sequences using the preprocessReads function. A library file (pseudogenome) with all the theoretically possible sequences in each experiment was created. This file was used as an alignment template for the sequences from the Spec-seq libraries with the qAlign function (Rhisat2 as aligner), resulting in a count matrix where rows are the sequence elements and columns are samples. The relative binding energy of each sequence was calculated as described in refs. 45,93 and plotted as a scatter plot using ggpubr::ggscatter in R, where \(\#S_i\) is the number of reads per sequence, using the formula:
$$\ln\frac{\#S_{i,\mathrm{bound}}}{\#S_{i,\mathrm{unbound}}}-\left(\ln\frac{\#S_{i,\mathrm{bound}}}{\#S_{i,\mathrm{unbound}}}\right)_{\text{consensus motif}}$$
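The relative binding energy defined above can be sketched in Python as follows. The read counts are invented for illustration, the function name is hypothetical, and skipping sequences with zero counts is an assumption of this sketch rather than a step described in the text.

```python
import math


def relative_binding_energy(
    bound: dict[str, int], unbound: dict[str, int], consensus: str
) -> dict[str, float]:
    """Return ln(bound/unbound) per sequence minus the same term for the consensus.

    Sequences with a zero count in either fraction are skipped (an assumption
    of this sketch, since the log ratio is undefined for them).
    """
    reference = math.log(bound[consensus] / unbound[consensus])
    energies = {}
    for seq, n_bound in bound.items():
        n_unbound = unbound.get(seq, 0)
        if n_bound > 0 and n_unbound > 0:
            energies[seq] = math.log(n_bound / n_unbound) - reference
    return energies


if __name__ == "__main__":
    # Hypothetical read counts for two Sox-library members.
    bound_counts = {"CATTGTT": 400, "CATTGTA": 100}
    unbound_counts = {"CATTGTT": 100, "CATTGTA": 100}
    for seq, value in relative_binding_energy(
        bound_counts, unbound_counts, "CATTGTT"
    ).items():
        print(seq, round(value, 3))
```

By construction the consensus scores zero, and sequences less enriched in the bound fraction than the consensus score below zero under this sign convention.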
To plot energy logos, a subset of the sequence space was used, corresponding to the consensus sequences (CATTGTT or ATGCTAAT) with all possible single-base mutations (N × 3 + 1 sequences, where N is the length of the sequence and 3 is the number of possible mutations at each of the N positions). The binding energies (\(\ln\frac{\#S_{i,\mathrm{unbound}}}{\#S_{i,\mathrm{bound}}}\)) of this subset of sequences were used to generate an energy matrix using the motif_mlr.pl script available from the Gary Stormo Lab, and the logo was plotted using plotEnergyLogo from TFcookbook96 (https://github.com/zeropin/TFCookbook).
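The size of the single-mutant subset can be checked with a short Python sketch that enumerates the consensus plus every single-base substitution. The function name `single_mutants` is hypothetical.

```python
def single_mutants(consensus: str) -> list[str]:
    """Return the consensus followed by every single-base substitution of it."""
    variants = [consensus]
    for position, reference_base in enumerate(consensus):
        for alternative in "ACGT":
            if alternative != reference_base:
                variants.append(
                    consensus[:position] + alternative + consensus[position + 1:]
                )
    return variants


if __name__ == "__main__":
    for motif in ("CATTGTT", "ATGCTAAT"):
        subset = single_mutants(motif)
        # Expect len(motif) * 3 + 1 sequences.
        print(motif, len(subset))
```

For the 7 bp Sox consensus this gives 22 sequences and for the 8 bp Octamer consensus 25, matching N × 3 + 1.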
High throughput systematic evolution of ligands by exponential enrichment (HT-SELEX)
The HT-SELEX experiment followed the protocol outlined in ref. 61. Selection ligands were designed with an 8 bp barcode flanking a 40 bp randomized region. In summary, 50–100 ng of barcoded DNA fragments were introduced into wells containing the target proteins along with 25 μL of binding buffer supplemented with 5 μg/ml poly-dIdC-oligonucleotide (Sigma P4929-25UN) as a competitor. The plate was gently rotated at room temperature for 30 min. Following incubation, 150 μL of binding buffer, along with 10 μL of Ni Sepharose® beads previously equilibrated with poly-dIdC, were introduced into the reaction mixture. This mixture was then incubated on a rotor at room temperature for 60 min. Unbound oligomers were subsequently removed from the bound beads using a Biotek 405TS washer. Post washing, the bound DNA was eluted in 50 μL of ddH2O. For PCR amplification, 10 μL of bead suspension was utilized. The resulting PCR products served as input oligomers for the subsequent cycle or were purified for sequencing with the NovaSeq 6000 S4 PE150. This experiment involved a total of four cycles.
Data obtained were sorted based on barcodes for each sample. After discarding low-quality reads, the remaining sequences underwent trimming to eliminate adapter sequences. The resultant 40-nt region was subjected to further analysis. PWM models were generated using initial seeds identified through Autoseed, which were subsequently refined through expert analysis, in accordance with the approach outlined in ref. 61. All motif seqlogos were generated using the R package ggseqlogo97.
Modeling of Oct4-unicellular Sox complexes
Oct4 complexed with Salhel and Pchi HMG were homology-modeled bound to the canonical and compressed SoxOct DNA motifs using MODELLER 10.498,99. The structural template for the Canonical motif is a model of ternary complex Oct4 and Sox2 on the Nanog locus (5’CAGGGTCCACCATGGACATTGTAATGCAAAAGAAGCTGTAAGGTGACCC3’)24. For the compressed motif, a model of the Oct4-Sox17 complex on an idealized sequence (5’ CGGCATTGTATGCAAATCGGCGGC3’) was used24. Motifs are in bold, and the additional base of the canonical motif is in red.
We aligned individual sequences of Salhel HMG and Pchi HMG with Sox2 or Sox17, respectively. Similarly, Salhel POU was modeled using the Sox2-Oct4-Nanog locus ternary complex as a template. Salhel POU was aligned with Oct4. For all models, the automodel function was used to generate 500 models with the DNA set as a rigid body and the model refinement level set at fast. The model with the lowest discrete optimized protein energy (DOPE) score100 was selected. ChimeraX was used to analyze and visualize clashes/contacts of the complexes modeled101,102.
Cell culture
Mouse embryonic fibroblasts (MEFs) were obtained from E13.5 embryos of OG2 mice carrying transgenic Oct4-GFP (Jackson Laboratory, no. 004654) and of Sox2-GFP mice (Mutant Mouse Resource & Research Centers, no. 037525-UNC) with a GFP reporter driven by endogenous Sox2, maintained at the Centre for Comparative Medicine Research (CCMR) at The University of Hong Kong (CULATR no. 4855-18). MEFs were cultured in MEF medium [DMEM (Gibco, #12100046) supplemented with 10% fetal bovine serum (FBS, Gibco, #10270106), 1x Glutamax (Gibco, #35050061), 1x nonessential amino acids (NEAA, Gibco, #11140050) and 1x penicillin/streptomycin (Gibco, #10378016)]. HEK293T cells for lentivirus packaging were cultured in DMEM supplemented with 10% FBS. Mouse ESC medium (mES medium) is composed of DMEM with 15% FBS, 1x GlutaMax, 1x NEAA, 1 mM sodium pyruvate (Gibco, #11360070), 0.005 mM β-mercaptoethanol (Gibco, #31350010), 50 μg/mL Vitamin C (Sigma-Aldrich, #49752), 0.5x penicillin/streptomycin and 10 ng/mL leukemia inhibitory factor (LIF, produced in house). 2i/LIF medium is mES medium supplemented with 3 μM CHIR99021 (Selleck, #S2924-25mg) and 1 μM PD0325901 (Selleck, #S1036-25 mg)103. Pluripotent stem cells were maintained on either feeder layers (ICR MEFs mitotically inactivated with mitomycin-C) or on 0.2% gelatin-coated plates (Sigma-Aldrich, #G1393) and cultured in mES medium or 2i/LIF medium. Medium was replaced with fresh medium every other day, and cells were passaged at a 1:10 split ratio when reaching around 70% confluency. All cells were cultured in incubators at 37 °C with 0.5% CO2 and normoxic conditions.
Lentivirus production and reprogramming
HEK293T cells were seeded at 8 million cells per 10 cm plate. On the next day, 10 μg lentiviral vector and 40 μg linear polyethyleneimine (Polysciences, #23966) dissolved in 1 mL DMEM were added. The medium was replaced after 10–15 h, and virus-containing supernatants were collected at 48 h, 72 h, and 96 h post-transfection and filtered through a 0.45 µm filter (Millipore). The virus medium was supplemented with 8 μg/mL polybrene (Sigma-Aldrich, #40804ES76) before transduction. MEFs were seeded at a density of 7 × 10^3 cells per well of a 24-well plate one day before transduction. The virus-containing medium was replaced after 24 h with mES medium. This day was defined as reprogramming day 0, and the medium was replaced daily on subsequent days. Whole-well scans were taken using the GE Amersham Typhoon™ 5 Biomolecular Imager. To establish clonal iPSC lines, iPSC colonies were picked at day 14 using a syringe, dissociated into a single-cell suspension by pipetting up and down in 30 μL of 0.05% Trypsin-EDTA (Thermo Fisher Scientific, #25300062) and incubating at 37 °C for 5 min, and seeded into a 48-well plate pre-coated with ICR feeders. The cells were cultured for 5–7 days until sizeable iPSC colonies developed, and two more rounds of picking were conducted to obtain pure clonal lines.
Genotyping of iPSC lines
To genotype the iPSCs reprogrammed with unicellular Sox, 500,000 cells of the clonal iPSC lines were harvested for genomic DNA isolation using Quick-DNA Microprep Kit (Zymo, #D3021). The isolated genomic DNA was used to examine the integrated transgene by PCR with different specific primers (Supplementary Data 1) and verified by Sanger sequencing.
Quantitative RT-PCR analysis
Total RNA was extracted using TRIzol (Thermo Fisher Scientific, #15596026) and 2 μg was used to synthesize cDNA with ReverTra Ace® qPCR RT Master Mix (Toyobo, FSQ-201S). Quantitative PCR was performed using iTaq universal SYBR Green Supermix (Bio-Rad, #1725124) with primers listed in Supplementary Data 1. β-actin was used for normalization to calculate the relative gene expression. The R package ggplot2 was used to plot the results (https://ggplot2.tidyverse.org).
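The text states only that β-actin was used for normalization. As a minimal sketch, assuming the widely used 2^-ΔΔCt calculation (an assumption, not confirmed by the source) and hypothetical Ct values, relative expression could be computed like this:

```python
def relative_expression(
    ct_target_sample: float,
    ct_actin_sample: float,
    ct_target_reference: float,
    ct_actin_reference: float,
) -> float:
    """Return fold change by the 2^-ddCt method, normalized to beta-actin.

    Assumption: ~100% primer efficiency, so each cycle is a twofold change.
    """
    delta_sample = ct_target_sample - ct_actin_sample
    delta_reference = ct_target_reference - ct_actin_reference
    return 2.0 ** -(delta_sample - delta_reference)


if __name__ == "__main__":
    # Hypothetical Ct values: target gene in a test line versus a reference line.
    fold = relative_expression(22.0, 18.0, 24.0, 18.0)
    print(f"fold change = {fold:.1f}")
```

A target that reaches threshold two cycles earlier than in the reference sample, at equal β-actin Ct, corresponds to a fourfold higher relative expression under these assumptions.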
Immunocytochemistry
Cells were washed three times with PBS and fixed in 4% paraformaldehyde in PBS at room temperature for 20 min, followed by permeabilization with 0.1% Triton X-100 and blocking with 5% BSA in PBS at room temperature for 1 h. Fixed cells were washed three times with PBS and incubated with primary antibodies against Nanog (Novus Biologicals, #NB100-58842, 1:500 dilution) and Sox2 (Santa Cruz Biotechnology, #sc-365823, 1:400 dilution) at 4 °C overnight. The cells were then washed three times for 5 min with PBST (PBS with 0.1% Tween-20) and incubated with fluorescent dye-conjugated secondary antibodies (Invitrogen, Alexa Fluor 488 dye: #A21203/A21207, 1:1000 dilution) at room temperature for 1–2 h. The cells were then washed with PBST three times for 5 min. For nuclei counterstaining, NucBlue™ Fixed Cell ReadyProbes™ Reagent (DAPI) (#R37606) was used, following the kit instructions. Images were captured with an inverted fluorescence microscope (Olympus CKX53).
Spontaneous differentiation of iPSCs into endoderm, mesoderm, and ectoderm
Clonal iPSCs were dissociated using 0.05% Trypsin-EDTA and seeded into 96-well low-attachment plates (1 × 10^3 cells/well) in mES medium without LIF for 7 days to generate embryoid bodies (EBs). EBs were then seeded on gelatin-coated 12-well plates (20 EBs per well) and cultured in the differentiation medium (DMEM/F12 + 20% FBS + 1% Glutamax) for another 10 days. The differentiation medium was exchanged every other day. After 10 days of differentiation on gelatin-coated plates, spontaneously differentiated EBs were further analysed by immunocytochemistry. To evaluate tri-lineage differentiation potential, cells were stained with primary antibodies against markers of the three germ layers (FoxA2, #8186 T Cell Signaling, 1:500 dilution; TUJ1, #PA5-85639 Invitrogen, 1:500 dilution; α-SMA, #A2547 Sigma-Aldrich, 1:500 dilution) at 4 °C overnight, followed by incubation with the corresponding fluorescent dye-conjugated secondary antibodies (Invitrogen, Alexa Fluor 594 dye: #A11055/A1100/A21202, 1:1000 dilution) at room temperature for 1–2 h. The steps of fixation, permeabilization, blocking, washing, DAPI staining, and imaging are the same as described above for immunocytochemistry.
Statistics & reproducibility
No statistical method was used to predetermine sample size. To ensure accurate cooperativity calculations from heterodimer EMSAs, lanes containing a band with a fractional contribution below 0.03 were excluded. This ensures that only lanes representing all four microstates at equilibrium are included. The experiments were not randomized. For the experimental setup of reprogramming tests, technical replicates involve using the same batch of reagents and cells within an experiment, while biological replicates involve using distinct batches of MEF cells sourced from different mouse embryos. No data were excluded from quantification. All the key reagents and equipment used in the study are listed in Supplementary Data 3.
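The lane-exclusion rule for heterodimer EMSAs can be sketched in Python as below. The band labels and intensities are hypothetical. The cooperativity factor shown, (free × ternary) / (Sox-bound × Oct-bound), is an assumed form commonly used for two-factor EMSAs; the text defers to the cited procedures and does not spell out the formula.

```python
MIN_FRACTION = 0.03  # exclusion threshold stated in the text


def lane_fractions(intensities: dict[str, float]) -> dict[str, float]:
    """Convert raw band intensities of one lane into fractional contributions."""
    total = sum(intensities.values())
    return {band: value / total for band, value in intensities.items()}


def lane_is_usable(
    intensities: dict[str, float], min_fraction: float = MIN_FRACTION
) -> bool:
    """Keep a lane only if every microstate band reaches the minimum fraction."""
    return all(f >= min_fraction for f in lane_fractions(intensities).values())


def cooperativity_factor(intensities: dict[str, float]) -> float:
    """Assumed form: omega = (free * ternary) / (sox_bound * oct_bound)."""
    f = lane_fractions(intensities)
    return (f["free"] * f["ternary"]) / (f["sox"] * f["oct"])


if __name__ == "__main__":
    # Hypothetical band intensities for two lanes.
    lanes = [
        {"free": 40.0, "sox": 20.0, "oct": 20.0, "ternary": 20.0},
        {"free": 70.0, "sox": 15.0, "oct": 14.0, "ternary": 1.0},
    ]
    for lane in lanes:
        if lane_is_usable(lane):
            print("omega =", round(cooperativity_factor(lane), 2))
        else:
            print("lane excluded: a band falls below", MIN_FRACTION)
```

Filtering before computing the ratio avoids dividing by, or multiplying with, band fractions that are too faint to quantify reliably.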
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
- Source: https://www.nature.com/articles/s41467-024-54152-x