Chromatin landscape dynamics during reprogramming towards human naïve and primed pluripotency reveals the divergent function of PRDM1 isoforms

Chromatin dynamics during human iPSCs reprogramming

To investigate the chromatin dynamics during somatic cell reprogramming toward different states of pluripotency, we employed the 2° human induced pluripotent stem cell (hiPSC) reprogramming system that we previously developed [9]. This system involves the transfection of somatic cells with gene cassettes containing doxycycline (Dox)-inducible Yamanaka factors, facilitating the generation of clonal iPSCs. These iPSCs can subsequently be differentiated back into somatic cells, which possess the potential for a second reprogramming upon Dox reinduction. By introducing constitutive expression of human telomerase reverse transcriptase (TERT) into our 2°-inducible fibroblasts system (2° hiF-T), we successfully overcame the challenges of inefficiency and heterogeneity encountered in the primary reprogramming system. We directed these 2° hiF-T cells towards naïve and primed states of pluripotency (Fig. S1A). We harvested CD326+ cells, the putative pluripotent intermediates, at days 6, 8, 14, 20, and 24, alongside the initial fibroblasts (hiF-T) and the final iPSCs, which were subjected to ATAC-seq and RNA-seq analyses (Fig. 1A and Supplementary Table 1).

Fig. 1: ATAC-seq and mRNA-seq revealed highly dynamic chromatin changes during human naïve and primed reprogramming.

A Schematic experimental design of human 2° hiF-T (immortalized fibroblasts with inducible OCT4, SOX2, KLF4, C-MYC cassette) reprogramming; Fan diagram within nuclei represents proportions of pluripotency states; MACS: magnetic-activated cell sorting. B Principal component analysis (PCA) on the ATAC-seq data collected from cells at different stages of naïve and primed reprogramming. C Open chromatin regions were categorized into three types: consistently accessible throughout reprogramming as permanently open (PO), regions transitioning from closed in fibroblasts to open as the reprogramming proceeded as closed to open (CO), and the converse as open to closed (OC). Log10-transformed FPKM values are used to represent the degree of chromatin accessibility at a given region. D Numbers of PO, CO, and OC regions at each stage of naïve and primed reprogramming. E Enrichment of naïve reprogramming ATAC-seq peaks at different genomic features. F Box plots depicting the normalized expression levels of genes within 10 kb of PO, CO, and OC regions in naïve reprogramming. The green dashed line indicates the median gene expression level corresponding to the hiF-T stage. Mann–Whitney U test versus the levels in hiF-T, *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001. G Gene ontology (GO) analysis on all genes within 10 kb of the PO, CO, and OC regions in naïve reprogramming. H Representative gene expression and chromatin accessibility in naïve reprogramming; pluripotency genes, DPPA3, KLF4, POU5F1; somatic genes, AMOTL1, RUNX2.

Additionally, to interrogate the effects of sustained exogenous Yamanaka factor expression after day 20, we collected samples with continued Dox supplementation until day 24 (Fig. 1A). ATAC-seq signals revealed a strong correlation among day 24 replicates (Fig. S1B). Moreover, samples with or without Dox from day 20 to day 24 were also strongly correlated (Fig. S1B), indicating that by day 20 of reprogramming, the emerging iPSCs had attained a relatively stable chromatin state, irrespective of continued Yamanaka factor expression.

Chromatin accessibility dynamics appeared congruent in both naïve and primed reprogramming paths, with day 6 marking a significant juncture coinciding with medium changes (Fig. 1B). Despite this similarity, transcriptomic landscapes diverged significantly. A pronounced shift in transcriptome profiles was not noted until day 14 for naïve reprogramming, whereas the corresponding change for primed reprogramming occurred around day 8 (Fig. S1C). By day 20 and day 24, the transcriptomic signatures of reprogrammed cells displayed minimal divergence from those of stable, cultured iPSCs (Fig. S1C). Some differences remained between naïve day 20/24 cells and naïve iPSCs (Fig. S1C), which we believe are primarily due to poor genomic stability during prolonged culture in 5iLAF naïve medium [10]. Using ATAC-seq data, peaks in open chromatin regions were identified (Fig. S1D). The temporal tracking of the same genomic loci revealed highly dynamic chromatin accessibility throughout both naïve and primed reprogramming trajectories (Fig. S2A). We categorized regions consistently accessible throughout reprogramming as permanently open (PO), regions transitioning from closed in fibroblasts to open as the reprogramming proceeded as closed to open (CO), and the converse as open to closed (OC) (Fig. 1C). Regarding the abundance of ATAC-seq peaks, the OC regions consistently outnumbered the CO regions until day 20 during naïve reprogramming, a trend echoed in primed reprogramming (Fig. 1D). The number of CO regions also increased progressively over time during both naïve and primed reprogramming, reaching its peak at the iPSC stage (Fig. 1D). Genomic feature annotations exhibited a comparable distribution of peaks in all clusters (Fig. 1E). We also observed a positive correlation between chromatin accessibility and gene expression levels (Fig. S1E), as previously reported [11], reinforcing the interplay between the epigenetic landscape and transcriptional activity during reprogramming.
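The PO/CO/OC definitions above can be illustrated with a minimal sketch. It assumes that each region already has a binary open/closed call per stage (for example, whether a peak was called), which is a simplification and not necessarily how the calls were made in this study. The stage labels and the function name `classify_region` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical stage order: fibroblasts first, iPSCs last.
STAGES = ["hiF-T", "D6", "D8", "D14", "D20", "D24", "iPSC"]


def classify_region(is_open: List[bool]) -> Tuple[str, Optional[int]]:
    """Classify one region from its open/closed call at each ordered stage.

    Returns (category, index of the stage at which the state switched).
    The switch index is None for PO and for regions without a single,
    persistent switch.
    """
    if all(is_open):
        return "PO", None            # accessible at every stage
    if not any(is_open):
        return "never open", None    # not a peak at any stage
    # Positions where the state differs from the previous stage.
    switches = [i for i in range(1, len(is_open)) if is_open[i] != is_open[i - 1]]
    if len(switches) == 1:
        i = switches[0]
        # One persistent switch: closed -> open is CO, open -> closed is OC.
        return ("CO" if is_open[i] else "OC"), i
    # More than one switch: transiently opened or closed region.
    return "non-pattern", None


category, stage_index = classify_region([False, False, True, True, True, True, True])
print(category, STAGES[stage_index])
```

Under these assumptions, a region with more than one switch is neither PO, CO, nor OC, and is reported separately as a transient region.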

We further annotated the defined PO, CO, and OC regions to the nearest genes within a 10-kilobase (kb) distance and quantified the associated gene expression levels (Fig. 1F and S2B). During naïve reprogramming, we observed a consistent and significant upregulation of gene expression in the CO regions at each examined time point compared to the initial hiF-T cells. In contrast, genes associated with the OC regions demonstrated a marked decrease in expression starting from day 8, while genes in the PO regions exhibited negligible changes in expression levels (Fig. 1F and S2C). During primed reprogramming, the gene expression patterns in the CO regions mirrored those observed during naïve reprogramming (Fig. S2B). However, gene expression within the OC regions was slightly up-regulated, while PO region-associated gene expression was significantly elevated (Fig. S2B).
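As a rough illustration of the annotation step, the sketch below assigns a region to its nearest gene when that gene lies within 10 kb. It assumes that distance is measured from the region's edges to a single TSS coordinate per gene; the actual annotation tool and distance convention are not stated here, and the gene names and coordinates are placeholders.

```python
from typing import Dict, List, Optional, Tuple

Region = Tuple[str, int, int]   # (chromosome, start, end)
Tss = Tuple[str, int]           # (gene name, TSS position)


def nearest_gene(region: Region,
                 tss_by_chrom: Dict[str, List[Tss]],
                 max_dist: int = 10_000) -> Optional[str]:
    """Return the gene whose TSS is closest to the region, if within max_dist."""
    chrom, start, end = region
    best: Optional[Tuple[str, int]] = None
    for gene, pos in tss_by_chrom.get(chrom, []):
        if start <= pos <= end:
            dist = 0                                   # TSS lies inside the region
        else:
            dist = min(abs(pos - start), abs(pos - end))
        if dist <= max_dist and (best is None or dist < best[1]):
            best = (gene, dist)
    return best[0] if best else None


# Placeholder TSS table (not real coordinates).
tss_table = {"chr1": [("GENE_A", 1_000), ("GENE_B", 30_000)]}
print(nearest_gene(("chr1", 5_000, 5_500), tss_table))
```

Expression values of the genes returned for each PO, CO, or OC region can then be compared between a given stage and hiF-T.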

Gene Ontology (GO) analysis for the genes in ATAC peaks revealed distinct enrichment (Fig. 1G and S2D). In the CO regions, the associated genes are primarily involved in pluripotency or early embryonic development processes. These genes drive successful reprogramming, as evidenced by GO terms such as “cell fate commitment”, “regulation of stem cell proliferation”, and “regulation of embryonic development”, which are well-aligned with established knowledge. Conversely, in the OC regions, the corresponding genes are associated with somatic cell lineages and biological activities that reflect a differentiated state. GO terms enriched in these regions, such as “positive regulation of neuron differentiation”, “T cell activation”, and “fibroblast migration” (Fig. 1G and S2D), further support this differentiated signature. For the PO regions, the enriched terms, including “response to transforming growth factor beta”, “extrinsic apoptotic signaling pathway”, and “cell growth”, align with the involvement of these genes in fundamental biological activities and key signaling pathways. Specifically, pluripotency-specific genes such as DPPA3, KLF4, and POU5F1, along with developmental patterning genes like LEFTY1 and FOXA2, gradually acquired an open chromatin state throughout reprogramming. Conversely, the chromatin surrounding somatic genes, such as AMOTL1 and RUNX2, progressively condensed, indicating a gradual shutdown of their transcriptional activity (Fig. 1H and S2E). We also validated these changes at the protein level (Fig. S2F, G). These results illustrate the complexity and diversity of the human somatic reprogramming process.
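GO term enrichment of the kind described above is commonly scored with a one-sided hypergeometric test. The sketch below shows that calculation under the assumption that this test applies; the tool and statistics actually used for Fig. 1G are not specified in this section, and the function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background: int, n_term: int,
                           n_selected: int, n_overlap: int) -> float:
    """P(X >= n_overlap) when drawing n_selected genes without replacement.

    n_background: all annotated genes considered
    n_term:       background genes carrying the GO term
    n_selected:   genes near the regions of interest (e.g. CO regions)
    n_overlap:    selected genes that carry the GO term
    """
    upper = min(n_term, n_selected)
    # Sum the integer counts of favourable draws first, then divide once.
    favourable = sum(
        comb(n_term, i) * comb(n_background - n_term, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return favourable / comb(n_background, n_selected)


# Toy numbers: all 5 selected genes carry a term held by 5 of 10 genes.
print(hypergeom_enrichment_p(10, 5, 5, 5))
```

In practice the resulting p-values would also be corrected for testing many GO terms at once.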

Distinct trajectories between naïve and primed hiPSCs reprogramming

To concisely elucidate the divergent paths of naïve and primed reprogramming, RNA-seq and ATAC-seq data from both reprogramming strategies were integrated using a three-dimensional principal component analysis (PCA) (Fig. 2A). The analysis revealed distinct trajectories, underscoring the disparate nature of these two reprogramming systems (Fig. 2A). Notably, chromatin accessibility began to differ on day 8, post-medium alteration, which preceded the dramatic transcriptome discrepancies that emerged around day 14 (Fig. 2A). This temporal shift indicates that alterations in chromatin accessibility are an early indicator of the ensuing transcriptional changes. The positive correlation between ATAC-seq signals and gene expression levels (Fig. S1E, S3A) further indicated the functional significance of the chromatin accessibility dynamics that we observed. Intriguingly, our findings suggested that passing through a primed state is not a prerequisite for achieving naïve pluripotency, which corresponds to an earlier developmental stage than the primed state.

Fig. 2: Chromatin accessibility, transcriptome, and TF motif analysis revealed distinct trajectories between naïve and primed hiPSCs reprogramming.

A Three-dimensional PCA of the integrated ATAC-seq (left) and RNA-seq (right) datasets from cells at different stages of naïve and primed reprogramming. B Venn diagram showing the overlap between naïve and primed reprogramming in PO, OC, and CO regions from all time points. C GO analysis on all genes within the shared closed regions of naïve and primed reprogramming. D Heatmap illustrating the corresponding temporal relationship of chromatin changes in the shared OC regions (left) and CO regions (right) between naïve and primed reprogramming, derived from the overlapping peaks in (B). Color intensity indicates the number of peaks simultaneously open and closed at corresponding time point in naïve and primed conditions. E The bubble chart depicting the enrichment of transcription factor motifs at the CO regions in naïve and primed reprogramming. F Unsupervised clustering analysis of transiently opened or closed chromatin regions shown in Fig S2A. These clusters are divided into 6 categories: shared transient (C1–C4), shared loss (C5), shared up (C6), naïve transient (C7, C8), naïve up (C9), and primed up (C10). Solid lines and ribbons represent the mean of standardized ATAC-seq signals across clusters ± s.d. G The bar plot showing the count of each cluster given in Fig. 2F. H Heatmap displaying the gene expression levels corresponding to the most significantly enriched TF motifs in each cluster as depicted in (F).

We also compared the naïve and primed reprogramming processes in terms of their open chromatin regions. The PO and OC regions in naïve reprogramming each overlapped with more than two-thirds of the corresponding primed regions, indicating a shared chromatin landscape (Fig. 2B). However, the CO regions in the naïve reprogramming system exhibited minimal overlap with the primed CO regions (~11%), mirroring the distinctive cellular identities achieved by each reprogramming approach (Fig. 2B). The number of genes within these regions also displayed a similar trend (Fig. S3B). Under the force of the OSKM reprogramming factors, chromatin regions associated with somatic identity were rapidly closed in both naïve and primed reprogramming (Fig. 2D). In contrast, within the shared CO regions, relatively few regions were co-opened in naïve and primed reprogramming until the iPSC stage (Fig. 2D), which is consistent with the number of open chromatin regions at different stages shown in Fig. 1D. This pattern indicated that gene interactions and regulatory mechanisms within CO regions were pivotal in dictating the reprogramming processes.
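The overlap percentages quoted above can be thought of as the fraction of regions in one set that intersect at least one region in the other. A minimal sketch follows, assuming BED-style half-open coordinates and counting any shared base as an overlap; the overlap criterion actually used for Fig. 2B is not stated here, and the coordinates are placeholders.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]   # (chromosome, start, end), half-open [start, end)


def overlaps(a: Region, b: Region) -> bool:
    """True if the two regions share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(query: List[Region], reference: List[Region]) -> float:
    """Fraction of query regions that overlap any reference region."""
    if not query:
        return 0.0
    hits = sum(1 for q in query if any(overlaps(q, r) for r in reference))
    return hits / len(query)


# Placeholder regions standing in for naive and primed CO peaks.
naive_co = [("chr1", 100, 200), ("chr1", 500, 600),
            ("chr2", 100, 200), ("chr1", 900, 950)]
primed_co = [("chr1", 150, 300), ("chr2", 300, 400)]
print(fraction_overlapping(naive_co, primed_co))
```

This quadratic scan is fine for a toy example; genome-scale peak sets would normally be handled with a sorted sweep or an interval tree.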

Next, we performed transcription factor (TF) motif enrichment analysis within the identified dynamic chromatin regions. We observed several types of TFs enriched in the CO or OC regions at different time points during naïve and primed reprogramming, with clearly different TF family preferences between CO and OC regions (Fig. 2E and S3C). TFs like TWIST2, AP-1, FOSL2, JUNB, and RUNX were mainly enriched in the OC regions. Notably, NRF1 demonstrated a unique enrichment pattern, being specifically abundant in the PO regions (Fig. S3C). Moreover, in the OC and PO regions, the enrichment scores of each TF showed no significant difference between naïve and primed reprogramming (Fig. S3C). As for the CO regions, the degree of enrichment significance for some TF families, including AP-2, SOX2, POU5F1, GATA3, and KLF4, varied between naïve and primed reprogramming, suggesting differential regulatory impacts on the two reprogramming processes (Fig. 2E).

In our analysis, we classified genomic regions into three categories (CO, OC, PO) based on their persistent accessibility status (Fig. 1C), meaning that once altered, these regions remained consistently open or closed until the end of reprogramming. We refer to this classification as the pattern style. In contrast, we also observed highly dynamic chromatin accessibility, with numerous regions transiently altering their status (Fig. S2A) (termed non-pattern style henceforth). Through unsupervised clustering of these non-pattern regions, we identified 10 clusters reflecting chromatin accessibility trends, which were further categorized into 6 groups: shared transient (C1–C4), shared loss (C5), shared up (C6), naïve transient (C7, C8), naïve up (C9), and primed up (C10) (Fig. 2F, G). Correspondingly, the expression of genes adjacent to these non-pattern regions followed the dynamics of chromatin accessibility (Fig. 2F and S3D).
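To make the clustering step concrete, the sketch below standardizes each region's signal across stages (z-score) and groups regions by profile shape with a small k-means. The choice of k-means, the deterministic initialization, and the toy profiles are all assumptions made for illustration; the clustering algorithm used for Fig. 2F is not named in this section.

```python
import math
from typing import List


def zscore(row: List[float]) -> List[float]:
    """Standardize one region's signal across stages (mean 0, s.d. 1)."""
    mean = sum(row) / len(row)
    sd = math.sqrt(sum((x - mean) ** 2 for x in row) / len(row))
    return [0.0] * len(row) if sd == 0 else [(x - mean) / sd for x in row]


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows: List[List[float]], k: int, n_iter: int = 20) -> List[int]:
    """Tiny k-means; the first k rows serve as initial centers (an assumption)."""
    centers = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: nearest center for every row.
        labels = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy signals over three stages: two transient-up and two decreasing regions.
signals = [[1, 5, 1], [10, 2, 1], [2, 10, 2], [20, 4, 2]]
labels = kmeans([zscore(s) for s in signals], k=2)
print(labels)
```

Standardizing first means that regions are grouped by the shape of their trajectory and not by absolute signal strength.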

Motif analysis was also performed across these non-pattern regions. Clusters C1–C4 (shared transient) were significantly enriched for both pluripotent TFs, such as POU5F1 and SOX2, and somatic TFs, including TCF21 and TWIST2 (Fig. S3E). This observation was consistent with a previously established model proposing that during the initial phase of reprogramming, somatic TFs are redistributed by core pluripotent TFs from their original somatic enhancer loci to transiently opened loci to repress the somatic program [12]. The expression changes of the TFs exhibited different patterns between naïve and primed reprogramming, suggesting that the same TF may play distinct roles in the two reprogramming processes (Fig. 2H). Using somatic reprogramming as an example, these findings highlight the significant differences in developmental and differentiation trajectories between the naïve and primed states of pluripotency.

Identifying epigenetic factors responsible for human naïve reprogramming

Our analysis revealed significant distinctions in the CO regions between naïve and primed reprogramming (Fig. 2B, D). Because most human reprogramming studies have focused on the primed state rather than the naïve state, we chose to focus on the naïve reprogramming process. As the complex network of gene interactions and regulatory mechanisms within these CO regions may play a pivotal role in dictating the success of reprogramming, we sought to explore the involvement of epigenetic factors in modulating chromatin dynamics during naïve reprogramming. To this end, CO peaks (26246 regions, Supplementary Table 2) across various stages of reprogramming (day 2, day 6, day 8, and day 24) from ATAC-seq data, including the transient open peaks identified in C8 (Fig. 2F and S3C), were collected. By focusing on genes with their transcription start site (TSS) located within 10 kb of these 26246 CO regions, we narrowed our search to genes up-regulated during naïve reprogramming (Supplementary Table 3). Further refinement using an annotated library of epigenetic factors [13] led us to identify 41 candidate genes likely influencing chromatin accessibility during human naïve reprogramming (Fig. 3A).
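The selection strategy described above amounts to a chain of set filters. The sketch below assumes that "up-regulated" means at least a two-fold increase between the first and last stage after adding a pseudocount; that threshold is an illustrative assumption and not the criterion reported in the study, and all gene names and values are placeholders.

```python
from typing import Dict, List, Set


def select_candidates(genes_near_co: Set[str],
                      expression: Dict[str, List[float]],
                      epigenetic_factors: Set[str],
                      min_fold: float = 2.0,
                      pseudocount: float = 1.0) -> List[str]:
    """Keep genes near CO regions that are up-regulated and are epigenetic factors."""
    upregulated = {
        g for g in genes_near_co
        if g in expression
        and (expression[g][-1] + pseudocount) / (expression[g][0] + pseudocount) >= min_fold
    }
    return sorted(upregulated & epigenetic_factors)


# Placeholder inputs: expression lists run from the first to the last stage.
near_co = {"GENE_A", "GENE_B", "GENE_C"}
expr = {"GENE_A": [1.0, 9.0], "GENE_B": [5.0, 5.5], "GENE_C": [0.0, 7.0]}
epi_db = {"GENE_A", "GENE_B", "GENE_D"}
print(select_candidates(near_co, expr, epi_db))
```

Each filter only removes genes, so the order of the up-regulation and database steps does not change the final candidate list.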

Fig. 3: Two PRDM1 isoforms exhibit distinct effects during naïve reprogramming.

A Schematic of the strategy used to identify PRDM1. The CO and C8 peaks were annotated to 5837 TSSs within 10 kb, of which 1127 genes were upregulated during naïve reprogramming. After the intersection with the epigenetic factors database, 41 candidate genes were selected as potential regulators. B Schematic experimental design of knocking down selected candidate genes with short hairpin RNA (shRNA) during reprogramming. C Bright field and fluorescent images of reprogramming intermediates in n14d and p14d upon shPRDM1; empty vector was set as Ctrl. 3-5 pictures were taken. The experiment was repeated three times. Scale bar, 100 μm. D Relative expression of PRDM1, NANOG and STELLA upon shPRDM1 via qPCR in naïve reprogramming. n = 3; Two-way ANOVA, ****, adjusted p-value < 0.0001. E Gene expression of PRDM1α and PRDM1β (bar plot), and the fold difference (blue line) between them during reprogramming. F The snapshots of the browser view showing chromatin accessibility dynamics near PRDM1α and PRDM1β during reprogramming. G Clone numbers under shPRDM1α and shPRDM1β treatments in naïve reprogramming. The experiment was repeated three times; one-way ANOVA, ** adjusted p-value = 0.0013. H Relative expression of PRDM1α, PRDM1β, POU5F1, NANOG, and STELLA under shPRDM1α and shPRDM1β treatments in naïve reprogramming. n = 3; Two-way ANOVA, *, adjusted p-value = 0.0478; ****, adjusted p-value < 0.0001.

These candidate genes were categorized into 4 groups according to their expression patterns during naïve reprogramming (Fig. S4A). Group 1 genes, including ARRB1, RNF2, PAXIP1, BRWD3, PCGF6, and PRDM1, exhibited a progressive up-regulation along with the reprogramming process. Conversely, group 2 genes remained consistently expressed, showing negligible expression changes. Group 3 genes were characterized by a sudden surge in expression at the later stages of reprogramming, with notable members such as DPPA3, DNMT3B, and TET1 reported for their critical roles in human naïve pluripotency. Group 4 genes displayed low and stable expression levels. Given these insights, we focused on group 1 genes for further investigation. Moreover, TFAP2A and TFAP2C were identified to be potential TFs that could regulate reprogramming chromatin remodeling (Fig. 2E), which were also included in our candidate regulators for further analyses. These identified candidate genes provide strong material for further investigating the critical role of key epigenetic factors in reprogramming.

TFAP2C was required for naïve reprogramming

For functional validation, we designed specific short hairpin RNAs (shRNAs) to knock down these candidate genes during naïve reprogramming (Fig. 3B). Primed reprogramming was also performed as another control condition. Compared to the control, knocking down TFAP2A or TFAP2C affected naïve reprogramming but had no noticeable impact on primed reprogramming (Fig. S4B), consistent with previous studies showing that TFAP2A and TFAP2C can regulate pluripotency programs [14,15,16]. TFAP2C has also been reported to be crucial for maintaining human naïve pluripotency [17]. In our analysis, the promoter and enhancer regions of the TFAP2C locus gradually acquired chromatin accessibility (Fig. S4C), and the expression level of TFAP2C increased along with the naïve reprogramming process (Fig. S4D). TFAP2C knockdown during naïve reprogramming led to fewer iPSC colonies (Fig. S4E). Detailed inspection showed that the knockdown of TFAP2C impaired the cellular phenotype of naïve iPSCs but had a negligible impact on that of primed iPSCs (Fig. S4F, upper panel). Consistently, its silencing during naïve reprogramming led to decreased expression of naïve pluripotency-specific genes (Fig. S4G). These results confirmed the significance of TFAP2C in establishing naïve pluripotency and further substantiated the efficacy of our selection strategy.

PRDM1 exhibits dual characters during naïve reprogramming

Compared to the non-targeting control shRNAs, knocking down PRDM1 or PAXIP1 significantly affected naïve reprogramming, with the latter resulting in almost no colonies (Fig. S4B). A similar but more modest phenotype was observed upon PAXIP1 knockdown during primed reprogramming (Fig. S4B). Unlike with PAXIP1 knockdown, we could maintain and harvest iPSC colonies upon PRDM1 silencing for further characterization.

Beyond the colony formation capability, the morphology of these colonies provided additional insights. Notably, silencing PRDM1 in naïve reprogramming significantly reduced the number of iPSC colonies, although the remaining colonies displayed improved morphology (Fig. 3C, D). In contrast, PRDM1 knockdown during primed reprogramming did not elicit a similar phenotype (Fig. 3C). Also known as B lymphocyte-induced maturation protein-1 (BLIMP-1), PRDM1 plays a critical role in the differentiation and maturation of various immune and germline cells [18,19,20,21]. It modulates gene expression as a transcriptional regulator by engaging with histone-modifying enzymes, typically leading to transcriptional silencing [22,23,24,25,26]. Despite these well-established roles, the involvement of PRDM1 in naïve and primed reprogramming remains unexplored.

Intriguingly, a detailed analysis of the PRDM1 gene locus revealed the presence of two splicing variants, PRDM1α and PRDM1β, with the latter being driven by an alternative promoter located in the third exon of PRDM1α (Fig. S4H). The expression of the two isoforms during somatic reprogramming was inversely correlated, with PRDM1α expression gradually decreasing along with primed reprogramming and PRDM1β expression being consistently low (Fig. 3E, F). Chromatin accessibility within the gene loci displayed a similar trend (Fig. 3F). Conversely, during naïve reprogramming, a reciprocal expression pattern emerged: an increase in PRDM1α expression was accompanied by a decrease in PRDM1β expression, indicative of a mutually exclusive expression relationship (Fig. 3E, F). As the shRNA sequences designed initially targeted exons common to both PRDM1α and PRDM1β transcripts, leading to the simultaneous knockdown of both isoforms, the phenotype observed in naïve reprogramming, with fewer but higher-quality colonies, could be attributed to disparate roles of the PRDM1 isoforms.

To test this hypothesis, shRNA sequences specific to the unique regions of PRDM1α and PRDM1β were designed and utilized for isoform-specific knockdowns to dissect their contributions during naïve reprogramming. Suppression of PRDM1α did not substantially alter the quantity of obtained iPSC colonies relative to the control group (Fig. 3G and S4F). Yet, improvements in colony morphology were evident alongside a modest upregulation of the pluripotent genes POU5F1 and STELLA (Fig. 3H and S4F). Conversely, targeting PRDM1β alone resulted in a significant decrease in the quantity of iPSC colonies (Fig. 3G and S4F) and concomitantly reduced the expression of pluripotency markers NANOG and STELLA (Fig. 3H). These results suggest that a single PRDM1 gene can have multiple distinct roles in the reprogramming process, broadening the scope of gene function research.

Neither PRDM1α nor PRDM1β was indispensable for maintaining naïve pluripotency

Next, we investigated whether the two isoforms of PRDM1 function in maintaining naïve pluripotency. Knockdown of PRDM1α and PRDM1β separately or collectively in established naïve human embryonic stem cell (hESC) lines did not result in significant morphological changes (Fig. 4A). RNA-seq analysis revealed a minimal impact on gene expression, with no substantial changes in the expression levels of critical naïve pluripotency-associated genes such as DPPA3, KLF4, and NANOG (Fig. 4B, C).

Fig. 4: Knockdown or overexpression of PRDM1α and PRDM1β showed negligible effects on the maintenance of naïve pluripotency.

A Bright field and fluorescent images of naïve hESCs in Ctrl (empty vector), shPRDM1α, shPRDM1β, and shPRDM1(α + β). Scale bar, 200μm. B Expression levels of pluripotent genes upon PRDM1 knockdown. T-test versus the levels in shCtrl, *p < 0.05, and empty means non-significant. C Scatter plot of differential expression genes between shPRDM1α, shPRDM1β and shPRDM1(α + β) versus shCtrl. Differential expression genes were defined with fold-change > 2 and q value < 0.05 using the R package Ballgown. D Immunofluorescence images of Ctrl (empty vector), OE PRDM1α, OE PRDM1β in naïve hESCs. Scale bar, 50μm. E Expression levels of PRDM1α and PRDM1β upon their overexpression. F Expression levels of pluripotent genes upon the overexpression of PRDM1α or PRDM1β. All the genes showed no significant expression change versus the control group using the T test. G PRDM1α and PRDM1β specific proteome in naïve hESCs. H Protein structures of PRDM1α and PRDM1β. I PR domain structure prediction of PRDM1β and structure compared with PRDM1α.

Meanwhile, efforts to establish naïve hESC lines with inducible overexpression systems for PRDM1α and PRDM1β faced challenges. Despite utilizing standard single-cell cloning steps, the obtained cell lines showed substantial heterogeneity in protein overexpression levels, especially for PRDM1β (Fig. 4D). Achieving PRDM1β overexpression proved to be especially difficult, resulting in only a slight increase in protein expression (Fig. S5A, B, Supplementary Original Blots). This suggests the existence of a regulatory mechanism limiting PRDM1β protein overexpression. However, this restriction does not extend to PRDM1α and PRDM1β mRNAs (Fig. 4E). RNA-seq analysis for these inducible overexpression lines showed minimal differential gene expression (Fig. S5C, D), with no effect on the expression of naïve pluripotency-associated markers such as DPPA3, KLF4 and NANOG or shared pluripotency genes POU5F1 and SOX2 (Fig. 4F). Moreover, collective comparisons of transcriptomic profiles from knockdown and overexpression lines with published data on naïve hESCs revealed no significant discrepancies (Fig. S5E). Stemness analysis suggested that genes up-regulated following PRDM1α knockdown were more associated with stem cell signatures than other groups (Fig. S5F, G).

We also performed immunoprecipitation-mass spectrometry (IP-MS) on the PRDM1α and PRDM1β overexpression cell lines. After eliminating the IgG background data, we identified 355 and 323 interacting proteins for PRDM1α and PRDM1β, respectively, with 43 proteins shared between them (Fig. S5H and Supplementary Table 4). GO analysis of these interactions revealed that the proteins specifically interacting with PRDM1α and PRDM1β were involved in similar biological processes, notably in translation, gene expression, and ribonucleoprotein complex biogenesis (Fig. S5I, J), underscoring their crucial roles in these processes. PRMT5, a known PRDM1 cofactor, featured prominently among the interacting proteins for both PRDM1 isoforms, suggesting that this interaction does not rely on the intact PR domain (Fig. 4G, H). Structural predictions for PRDM1β using AlphaFold2, compared to the PR domain structure of PRDM1α (PRDM1, Protein Data Bank (PDB): 3DAL), revealed the absence of 5 β-sheets in PRDM1β’s PR domain (Fig. 4I), highlighting structural variances that merit further investigation. In conclusion, our findings indicate that neither PRDM1α nor PRDM1β is essential for maintaining naïve pluripotency.

PRDM1α and PRDM1β targeted different genomic loci

To investigate the binding patterns of PRDM1α and PRDM1β during reprogramming, we employed the recently developed Cleavage Under Targets & Tagmentation (CUT&Tag) technique [27] to map the binding sites of PRDM1α and PRDM1β throughout reprogramming. Given the sequence similarity between PRDM1α and PRDM1β, commercially available antibodies cannot distinguish them well. We constructed HA-tagged overexpression vectors to overcome this limitation and introduced them into hiF-T cells for naïve reprogramming. However, we observed massive death of cells at the initial and intermediate stages, preventing the acquisition of sufficient iPSC colonies by the end of reprogramming.

We noted distinct chromatin accessibility profiles and significant transcriptomic changes at day 6 and day 14 of reprogramming, respectively (Fig. 1B and S1C). These observations led us to focus the CUT&Tag analysis between day 6 and day 14 to mitigate potential cellular toxicity while capturing differential binding behaviors. Lentiviral vectors carrying HA-PRDM1α/PRDM1β-EGFP were used to infect naïve reprogramming intermediates collected from day 8, with subsequent CUT&Tag analysis conducted after additional reprogramming until day 10 (Fig. 5A). This analysis identified 3,343 binding sites for PRDM1α and 471 for PRDM1β, with only 53 overlapping sites (Fig. S6A). Detailed examination showed that both PRDM1α and PRDM1β predominantly occupied promoter and 5’ UTR regions (Fig. 5B), reflecting classical TF binding patterns. PRDM1β exhibited more enrichment in promoter and 5’ UTR regions compared to PRDM1α (Fig. 5B). The binding signals displayed distinct binding patterns between PRDM1α and PRDM1β (Fig. 5C and S6B), indicating that PRDM1β may function as an auxiliary member to broaden the PRDM1 regulation network.

Fig. 5: CUT&Tag revealed distinct binding profiles of PRDM1α and PRDM1β.

A Schematic experimental design of CUT&Tag analysis during naïve reprogramming. B Enrichment of PRDM1α and PRDM1β binding sites at different genomic features. C Heatmap showing the binding signals of PRDM1α, PRDM1β, and EGFP (empty vector) on PRDM1α and PRDM1β binding sites. D Venn diagram of annotated genes within PRDM1α and PRDM1β binding sites. E Percentage of negative or positive gene expression correlation between PRDM1α- and PRDM1β-target genes and themselves. F The snapshots of the browser view showing chromatin accessibility dynamics and PRDM1 binding signals near PRDM1α- and PRDM1β-target genes, SPRED2 and DDAH1 (upper panel). The expression levels of SPRED2 (bar plot) and PRDM1α (red line), DDAH1 (bar plot), and PRDM1β (blue line) (bottom panel). G Proposed auto-regulation model of the study.

We then focused on genes closely located to these binding sites, excluding those with low expression (FPKM < 1), and obtained 2108 genes targeted by PRDM1α and 382 by PRDM1β (Fig. 5D, Supplementary Table 5). Further analysis showed that 80% of PRDM1α-targeted genes positively correlated with PRDM1α expression, whereas PRDM1β-targeted genes mainly exhibited negative correlation with PRDM1β expression (Fig. 5E and S6C). This led us to hypothesize that the functional targets of PRDM1α and PRDM1β could be identified based on their expression correlation patterns. Based on the previously observed reprogramming phenotypes during the knockdown of PRDM1α and PRDM1β, we intersected the target genes of PRDM1α with the gene set “epithelial-mesenchymal transition” (GO:0001837) and the target genes of PRDM1β with the gene set “negative regulation of cell population proliferation” (GO:0008285). This resulted in only one gene, SPRED2, for the former, and 6 genes, ATOH8, CITED2, DDAH1, HGS, MAP2K1, and SHOC2, for the latter.
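The correlation analysis above can be sketched as follows: compute a correlation coefficient between the isoform's expression across stages and each target gene, then report the share of targets with a positive coefficient. Pearson correlation is used here as an assumption, since the correlation measure is not named in this section; the expression values and gene names are placeholders.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def fraction_positive(regulator: List[float],
                      targets: Dict[str, List[float]]) -> float:
    """Share of target genes whose expression correlates positively with the regulator."""
    if not targets:
        return 0.0
    positive = sum(1 for values in targets.values() if pearson(regulator, values) > 0)
    return positive / len(targets)


# Placeholder expression across four stages.
isoform = [1.0, 2.0, 3.0, 4.0]
target_expr = {
    "TARGET_1": [2.0, 4.0, 6.0, 8.0],
    "TARGET_2": [8.0, 6.0, 4.0, 2.0],
    "TARGET_3": [1.0, 3.0, 2.0, 5.0],
    "TARGET_4": [1.0, 1.0, 2.0, 3.0],
}
print(fraction_positive(isoform, target_expr))
```

With only a handful of time points, individual coefficients are noisy, so the aggregate fraction is more informative than any single gene's value.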

We then highlighted SPRED2 and DDAH1 as downstream targets of PRDM1α and PRDM1β, respectively (Fig. 5F). SPRED2, a member of the SPRED protein family, is important in the epithelial-mesenchymal transition (EMT) process [28,29,30]. In mouse ESCs, Spred2 enhances self-renewal and proliferation [31]. DDAH1 functions as a cysteine hydrolase enzyme that metabolizes endogenous inhibitors of nitric oxide synthase (NOS) [32]. A deficiency in DDAH1 significantly impairs endothelial cell proliferation [33]. Additionally, DDAH1 or SPRED2 knockdown significantly impaired naïve reprogramming, as assessed at day 20 (Fig. S6D, E). These results suggest that different isoforms of the PRDM1 gene can target distinct genomic loci, pointing to a complex and diverse regulatory mechanism.