Unraveling Genetic Mysteries

Unraveling the Mysteries of Genetics with Technology and Teamwork

Today, barely a decade after the first sequencing of the human genome, Einstein researchers are unlocking new knowledge about how genes affect human health. Much of that work relies on state-of-the-art scientific and technical resources, including the installation of the first segment of an easily expandable supercomputer that broadens the ability of researchers at the College of Medicine to organize and analyze complex genetic information.

Leo, the supercomputer
The supercomputer – nicknamed "Leo" after Leo Szilard, a close friend and colleague of Albert Einstein – will provide the department of genetics with 4 terabytes (4 trillion bytes) of shareable random-access memory (RAM) for computation, as well as new graphics-processing capabilities. Because Leo is highly modular, fellow investigators can seamlessly add their own memory or CPU units. The result: faster analysis of research results and the development of software to explore scientific questions. But Leo is just one resource Einstein is using to drive genetic research and to harness the increasingly vast amounts of data that research generates.

Since 2008, when it was reorganized under the leadership of chair Dr. Jan Vijg, Einstein's department of genetics has advanced knowledge through its divisions dedicated to molecular, translational, and computational genetics. The department is also affiliated with a center of excellence focused on epigenomics, the emerging science of how changes to the genome other than those in the DNA sequence itself can affect gene expression and lead to disease. The Einstein Center for Epigenomics gives investigators end-to-end support for this research.

The center includes a shared facility for conducting molecular assays, and it also performs the computation and analysis of the data these experiments generate. In its first three years, the center has supported nearly $33 million in epigenomics-focused grant awards to Einstein investigators.

John Greally, M.D., Ph.D.
Dr. John Greally serves as director of the center and is also chief of the department's division of computational genetics. He calls this dual role "synergistic," since it exemplifies the integration between computer science and genetic research. "What we're doing with the machines is basically taking very disparate types of measurements – clinical, genetic, epigenetic – and trying to pull them in, synthesize them, in order to find patterns that link the data that's generated at the bench to the patient," said Dr. Greally, a physician practicing clinical genetics who also holds a Ph.D. in microbiology.

To accomplish this synthesis, Einstein also "synthesizes" diverse areas of expertise. With the help of new associate professor Aaron Golden, who earned his doctorate in astrophysics from the National University of Ireland, the center is able to process information rapidly. Dr. Golden's experience with the enormous quantities of high-throughput data in astrophysics helped prepare him for his current work in computational biology. He noted, "The challenges of representing and analyzing data about billions of stars and galaxies are similar to the challenges of representing and analyzing data about billions of epigenomic changes. These types of problems in astrophysics are really a proxy for similar problems currently facing genetics."

Another member of the team and recent addition to the Einstein community is Joseph Hargitai, who directs Einstein's high-performance computing (HPC) infrastructure. Mr. Hargitai came to Einstein from New York University, where he held a similar position.

Aaron Golden, Ph.D. and Joseph Hargitai prepare Leo, the supercomputer, to run some computations
"Einstein's computational genomics group was an early adopter of large-memory computation," Mr. Hargitai said, "and the acquisition [of Leo] takes us to this new plateau."

In addition to computation, the HPC group at Einstein also manages 800 terabytes of data storage – an amount that is expected to grow soon to 1,000 terabytes (a petabyte) of capacity.

"A particular component that may distinguish our HPC offering is the uniquely large set of internal and external applications that we have," added Mr. Hargitai. "I attribute this richness to the inquisitive and prolific nature of our investigators. Leo, like all computational instruments, is just a tool. It is up to the ingenuity and imagination of Einstein researchers to bring its capabilities alive."

Thanks in part to this strength in computation and data analysis, genetic research at Einstein encompasses a wide variety of disease categories. For example, the epigenomics center supports studies in areas such as cancer, diabetes, aging, neuroscience, and infectious disease.

In a multidisciplinary collaboration, Dr. Francine Einstein, an associate professor in obstetrics & gynecology and women's health, successfully competed for a National Institutes of Health Roadmap grant to investigate the relationship between abnormal fetal growth and epigenetic changes. The study, which involves infants born in the Bronx who have abnormally high and low birth weights, may someday identify those infants at increased risk for age-related diseases, such as type 2 diabetes, later in life.

"The computational component can be a real bottleneck for this type of research. We are very fortunate to have a user-friendly interface to a system that performs the primary analysis of raw data and provides the results in an interpretable form. This [bottleneck] is a huge barrier that investigators at other institutions face. The Center for Epigenomics has a futuristic vision in striving to help all investigators use new technologies, even those without epigenetics expertise," she explained.

Francine Einstein, M.D.
Collaborators from outside Einstein also recognize the center's unique structure and formidable resources. Dr. Bradley Bernstein, a senior associate member of the Broad Institute and associate professor at Harvard Medical School, noted, "The computational infrastructure is an important asset of the center, which is augmented by innovative computational solutions for integrating genomic information."

He continued, "And the center bridges complementary areas of excellence at Einstein, in particular computation, epigenomics, and human genetics. Given the shift in biomedical science towards quantitative, multi-disciplinary approaches, this combination is a winning formula."

Dr. Eric Nestler, professor and chair of neuroscience at Mount Sinai School of Medicine, agreed, noting, "Einstein's Center for Epigenomics is well-recognized nationally, especially for the computational and analytic capabilities it's bringing to the field, and for employing those capabilities in innovative and collaborative ways to answer key scientific questions. As we learn more about the epigenomic roots of disease, Einstein's strengths are increasingly important to the next generation of medicine."

Einstein's computational capabilities are further empowering genetic research through the center's Wiki-based Automated Sequence Processor (WASP), a web-based system that captures sequencing information and automatically runs the appropriate analytical programs based on the sample submission, the type of assay, and other data, with no manual work required of the end user. The results are then presented to the user in a visual, intuitive format. To promote collaborative studies, Dr. Greally and his team plan to share the WASP software with other research institutions.

"We recognize that the bigger challenges in genetics have to be solved collectively," said Dr. Greally. "That emphasis on collaboration is one of the hallmarks of research at Einstein."

Eventually, this integrated, automated, easily understood approach to genetic analysis may extend to clinicians. Dr. Greally envisions WASP someday serving as the foundation for a system that integrates molecular and clinical information and presents the results to clinicians in a visually intuitive way. For that interface, his group is taking inspiration from the Bloomberg terminal for financial data and from the Virtual Observatory used by the astrophysics community. Such a system would represent the ultimate form of "personalized medicine," a term popularized in the wake of the first genome sequencing.

"Our vision is that clinicians with stethoscopes around their necks will be able to analyze very complex, high-throughput data," he explained. "They can then explore what there is that makes their patient special or translate a gut feeling about a patient into the molecular data that will provide the answers they need for identifying better medical treatment."

Posted on: Wednesday, February 29, 2012