
Scientists leverage machine learning to decode gene regulation in the developing human brain

SAN FRANCISCO—May 24, 2024—In a scientific feat that broadens our knowledge of genetic changes that shape brain development or lead to psychiatric disorders, a team of researchers combined high-throughput experiments and machine learning to analyze more than 100,000 sequences in human brain cells—and identify over 150 variants that likely cause disease.  

The study, from scientists at Gladstone Institutes and the University of California, San Francisco (UCSF), establishes a comprehensive catalog of genetic sequences involved in brain development and opens the door to new diagnostics or treatments for psychiatric and neurodevelopmental conditions such as schizophrenia and autism spectrum disorder. The findings appear in the journal Science.

“We collected a massive amount of data from sequences in noncoding regions of DNA that were already suspected to play a big role in brain development or disease,” says Senior Investigator Katie Pollard, PhD, who also serves as director of the Gladstone Institute for Data Science and Biotechnology. “We were able to functionally test more than 100,000 of them to find out whether they affect gene activity, and then pinpoint sequence changes that could alter their activity in disease.”

Pollard co-led the sweeping study with Nadav Ahituv, PhD, professor in the Department of Bioengineering and Therapeutic Sciences at UCSF and director of the UCSF Institute for Human Genetics. Much of the experimental work on brain tissue was led by Tomasz Nowakowski, PhD, associate professor of neurological surgery in the UCSF Department of Medicine.

In all, the team found 164 variants associated with psychiatric disorders and 46,802 sequences with enhancer activity in developing neurons, meaning they can boost the expression of a given gene.

These “enhancers” could be leveraged to treat psychiatric diseases in which one copy of a gene is not fully functional, Ahituv says: “Hundreds of diseases result from one gene not working properly, and it may be possible to take advantage of these enhancers to make them do more.”

Organoids and Machine Learning Take the Spotlight

Beyond identifying enhancers and disease-linked sequences, the study holds significance in two other key areas.

First, the scientists repeated parts of their experiment using a brain organoid developed from human stem cells and found that the organoid was an effective stand-in for the real thing. Notably, most of the effects of genetic variants detected in the human brain tissue were reproduced in the cerebral organoid.

“Our organoid compared very well against the human brain,” Ahituv says. “As we expand our work to test more sequences for other neurodevelopmental diseases, we now know that the organoid is a good model for understanding gene regulatory activity.”

Second, by feeding massive amounts of DNA sequence data and measured gene regulatory activity into a machine learning model, the team trained the computer to successfully predict the activity of a given sequence. This type of program enables “in silico” experiments, which let researchers predict the outcomes of experiments before running them in the lab. The strategy helps scientists make discoveries faster and with fewer resources, especially when large quantities of biological data are involved.

Sean Whalen, PhD, a senior research scientist in the Pollard Lab at Gladstone and a co-first author of the study, says the team tested the machine learning model using sequences held out from model training to see if it could predict the results already gathered on gene expression activity.

“The model had never seen this data before and was able to make predictions with great accuracy, showing it had learned the general principles for how genes are impacted by noncoding regions of DNA in developing brain cells,” Whalen says. “You can imagine how this could open up a lot of new possibilities in research, even predicting how combinations of variants might function together.”
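To make the train-then-test idea concrete, here is a minimal Python sketch. It is not the study's model: the team used deep learning, while this example swaps in a much simpler ridge regression on k-mer counts, and it trains on synthetic sequences whose "activity" is set by a hypothetical motif (`TAA`). All names (`simulate`, `fit_ridge`, `predict`, `MOTIF`) are illustrative assumptions. The model is fit on 1,500 sequences, scored on 500 it never saw, and then used for a small in silico experiment: comparing the predicted activity of a reference sequence with that of a single-base variant.

```python
import random
from itertools import product

K = 3          # k-mer length (a hypothetical choice for this sketch)
MOTIF = "TAA"  # hypothetical "activating" motif, used only to simulate data
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def featurize(seq):
    """Intercept term followed by the count of every k-mer in seq."""
    x = [0.0] * (len(KMERS) + 1)
    x[0] = 1.0
    for i in range(len(seq) - K + 1):
        x[1 + INDEX[seq[i:i + K]]] += 1.0
    return x


def simulate(n, length=50, seed=0):
    """Synthetic data: activity = number of MOTIF hits plus Gaussian noise."""
    rng = random.Random(seed)
    seqs, y = [], []
    for _ in range(n):
        s = "".join(rng.choice("ACGT") for _ in range(length))
        hits = sum(s[i:i + K] == MOTIF for i in range(length - K + 1))
        seqs.append(s)
        y.append(hits + rng.gauss(0.0, 0.1))
    return seqs, y


def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def fit_ridge(seqs, y, lam=1.0):
    """Closed-form ridge regression: (X^T X + lam*I) w = X^T y."""
    d = len(KMERS) + 1
    xtx = [[0.0] * d for _ in range(d)]
    xty = [0.0] * d
    for seq, target in zip(seqs, y):
        x = featurize(seq)
        nz = [j for j, v in enumerate(x) if v]  # k-mer counts are sparse
        for i in nz:
            xty[i] += x[i] * target
            for j in nz:
                xtx[i][j] += x[i] * x[j]
    for i in range(d):
        xtx[i][i] += lam
    return solve(xtx, xty)


def predict(w, seq):
    return sum(wi * xi for wi, xi in zip(w, featurize(seq)))


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5


seqs, y = simulate(2000)
w = fit_ridge(seqs[:1500], y[:1500])       # train on 1,500 sequences
held_out = seqs[1500:]                     # 500 sequences never used in fitting
r = pearson([predict(w, s) for s in held_out], y[1500:])
print(f"held-out Pearson r = {r:.3f}")

# In silico experiment: score a single-base variant without a wet-lab assay.
ref = "ACGT" * 12 + "AC"
alt = ref[:5] + "A" + ref[6:]              # C->A at index 5 creates one "TAA"
effect = predict(w, alt) - predict(w, ref)
print(f"predicted variant effect = {effect:+.2f}")
```

Because a variant's effect here is simply the difference between two predictions, a trained model can score many candidate variants, or combinations of them, before any are tested in the lab. The held-out check comes first for a reason: those scores are only as trustworthy as the model's accuracy on sequences it has not seen.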

A New Chapter for Brain Discoveries

The study was completed as part of the PsychENCODE Consortium, which brings together multidisciplinary teams to generate large-scale gene expression and regulatory data from human brains across several major psychiatric disorders and stages of brain development.

Through the publication of multiple studies, the consortium seeks to shed light on poorly understood psychiatric conditions, from autism to bipolar disorder, and ultimately jumpstart new treatment approaches.

“Our study contributes to this growing body of knowledge, showing the utility of using human cells, organoids, functional screening methods, and deep learning to investigate regulatory elements and variants involved in human brain development,” says Chengyu Deng, PhD, a postdoctoral researcher at UCSF and a co-first author of the study. 

About the Study

The study, “Massively Parallel Characterization of Regulatory Elements in the Developing Human Cortex,” appears in the May 24, 2024 issue of Science. Authors include: Chengyu Deng, Sean Whalen, Marilyn Steyert, Ryan Ziffra, Pawel Przytycki, Fumitaka Inoue, Daniela Pereira, Davide Capauto, Scott Norton, Flora Vaccarino, PsychENCODE Consortium, Alex Pollen, Tomasz Nowakowski, Nadav Ahituv, and Katherine Pollard.

The work was funded in part by the National Institute of Mental Health, the New York Stem Cell Foundation, the National Human Genome Research Institute, and the Coordination for the Improvement of Higher Education Personnel. The data generated were part of the PsychENCODE Consortium.

About Gladstone Institutes

Gladstone Institutes is an independent, nonprofit life science research organization that uses visionary science and technology to overcome disease. Established in 1979, it is located in the epicenter of biomedical and technological innovation, in the Mission Bay neighborhood of San Francisco. Gladstone has created a research model that disrupts how science is done, funds big ideas, and attracts the brightest minds.