June 2023
Learn how faculty in UCLA Life Sciences are leading the big data revolution in the latest UCLA College Magazine article, “Faces of Big Data.”
The full article illustrates how big data are being used and studied across multiple disciplines within the UCLA College of Letters and Science.
Below is an excerpt that highlights how UCLA Life Sciences is leading the use of big data to advance the fields of biology and medicine, and how it trains the next generation of life scientists.
BENCH SCIENCE AND BEYOND
In the world of medical and biological research, likewise, there has been an overwhelming transformation of laboratory practices thanks to the advent of big data collection and sharing. Increasingly, biology research is done by analyzing publicly available data sets measured and deposited by any of the thousands of laboratories worldwide, says Alexander Hoffmann, professor of microbiology and immunology and founding director of the Institute for Quantitative and Computational Biosciences at UCLA.
“There is an increasing number of biologists who have never trained to hold a pipette, grow cells or stand at the lab bench,” he adds. “But they are trained in computational algorithms and workflows, and they have biological knowledge. That’s a huge shift in life sciences — we now have dry-lab scientists, in addition to the traditional wet-lab scientists.”
Dry-lab science can indeed have a huge impact in the real world. Jingyi Jessica Li, professor of statistics, biostatistics, human genetics and computational medicine at UCLA, does work illustrative of the complex path of such lifesaving medical research in the era of big data. Li’s expertise lies neither in gathering data from experiments nor in applying findings to medicines and therapies, but in the in-between step of deciding which algorithms are best suited to parsing which of the enormous data sets now available to researchers.
“My role in this whole long process is to ensure that the analysis is rigorous,” says Li, “so we can give a proper confidence level to the findings we observe so that we are not overly optimistic, or we don’t miss important findings.”
Recently, Li published a study that may revolutionize the way differential gene expression is examined. When scientists want to determine which genes are expressed differently by healthy and sick patients in the case of, for example, liver disease, they need a statistical algorithm to help them flag gene expressions worthy of further study.
Until recently, they’ve relied heavily on a statistical measure known as the p-value, whose calculation, however, can be mysterious, dubious and error-prone for non-statisticians. Li’s research shows that methods relying on ill-posed p-values are often deeply flawed: they turn up false discoveries or miss relevant genes. Li has designed a statistical framework, known as “Clipper,” that allows users to find differentially expressed genes using a new concept called contrast scores, which can be flexibly constructed from properly set up negative control data (experimental or in silico), without relying on p-values.
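To make the idea of contrast scores concrete, here is a minimal Python sketch. It is an illustration only, not the Clipper software or its interface. It assumes one measurement per gene under the condition of interest and one from a negative control, takes their difference as the contrast score, and chooses a cutoff with a Barber-Candès-style rule: genes with strongly negative scores stand in for false discoveries, so no p-values are computed. The function names, the toy numbers and the 10 percent false discovery target are all assumptions made for this example.

```python
def contrast_scores(target, control):
    """One score per gene: measurement of interest minus negative control.

    A large positive score suggests a real signal; scores for uninteresting
    genes are assumed to scatter symmetrically around zero.
    """
    return [t - c for t, c in zip(target, control)]


def bc_threshold(scores, q):
    """Smallest cutoff t whose estimated false discovery proportion is <= q.

    The estimate is (1 + #{score <= -t}) / max(#{score >= t}, 1): the count
    of strongly negative scores approximates how many null genes would
    exceed t by chance. Returns None if no cutoff qualifies.
    """
    candidates = sorted({abs(s) for s in scores if s != 0})
    for t in candidates:
        negatives = sum(1 for s in scores if s <= -t)
        positives = sum(1 for s in scores if s >= t)
        if (1 + negatives) / max(positives, 1) <= q:
            return t
    return None


def discoveries(scores, q):
    """Indices of genes whose contrast score reaches the chosen cutoff."""
    t = bc_threshold(scores, q)
    if t is None:
        return []
    return [j for j, s in enumerate(scores) if s >= t]


# Hypothetical toy data: 10 genes with a strong signal, 10 with noise only.
noise = [0.5, -0.4, 0.3, -0.2, 0.1, -0.6, 0.7, -0.8, 0.9, -1.0]
target = [10.0] * 10 + noise
control = [0.0] * 20

scores = contrast_scores(target, control)
print(discoveries(scores, q=0.1))
```

The only statistical ingredient here is the symmetry of the noise scores around zero, which is why the quality of the negative control data matters so much.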
Navigating these complexities in ways that are mutually intelligible to researchers working separately in dry labs around the globe is the path to achieving real medical breakthroughs in the big data era.
“How to distinguish signals from noise is the grand challenge in my field,” Li says. “Statistical modeling offers a way to make data analysis more transparent and interpretable.”
TRAINING THE NEXT GENERATION
Li, with her research team, makes use of UCLA’s leadership in the field of next-generation sequencing, a big data method for determining the sequences of DNA and RNA, often for research into genetic conditions and diseases. According to Hoffmann, UCLA has become a beacon for NGS research in part because the university excels in training young scientists in the analysis method. This is thanks largely to two projects Hoffmann oversees: the Collaboratory and the Bruins-In-Genomics (B.I.G.) Summer Research Program.
Led by molecular, cell and developmental biology professor Matteo Pellegrini and housed in UCLA’s Institute for Quantitative and Computational Biosciences, the QCBio Collaboratory is a postdoctoral training program but also so much more.
“A broad UCLA community of scientists learn from the postdocs how to handle big data, how to analyze it, what the computational workflows are that are state of the art,” Hoffmann says. “And when they are done taking the workshops and they apply their newfound skills to their data, they can engage the postdocs in a collaborative way for expert consulting.”
This commitment to training has paid off in a big way for researchers at UCLA.
“The Collaboratory was initiated over 10 years ago, and it’s had a tremendous impact,” Hoffmann adds. “It’s a key reason why UCLA has adopted NGS and other big data measurement approaches very, very rapidly, whereas many researchers in the field at other institutions have these data sets lying around that nobody knows what to do with. The Collaboratory has really been phenomenal in removing the bottleneck for analysis.”
For UCLA, however, leadership in this field so crucial to future medical breakthroughs isn’t about leaving other institutions in the dust. It’s about sharing knowledge and skills to empower a diverse rising generation of scientists. That’s the idea behind B.I.G. Summer, an eight-week summer institute in quantitative and computational biosciences that is open to applicants from UCLA and other institutions, often from underrepresented backgrounds. Successful applicants receive free tuition and a living stipend to spend their summer learning and working on bioscience data sets.
“While we are pretty advanced at UCLA,” Hoffmann says, “there are also lots of students, lots of talent, in other institutions that are still in the process of making that transformation.”
When the next great breakthrough in curing genetic disease occurs, it will surprise no one if it happens at UCLA. But it also may well happen thanks to non-UCLA scientists who trained here for a postdoctoral year, or for a summer after graduating from a college in their home city. It may happen thanks to an applied biologist who used Li’s algorithms — perhaps without Li or her colleagues even knowing it.
Such is the bold new universe of collaborative knowledge creation available thanks to big data, which is transforming so many aspects of scientific progress and of our lives. Will big data turn out to be too much for us to handle? Not at the UCLA College, where, when the data gets big, so do the solutions.
Read the full article here: UCLA College Magazine “Faces of Big Data”