David Haussler

David Haussler’s research lies at the interface of mathematics, computer science, and molecular biology. He develops new statistical and algorithmic methods to explore the molecular function and evolution of the human genome, integrating cross-species comparative and high-throughput genomics data to study gene structure, function, and regulation. He is credited with pioneering the use of hidden Markov models (HMMs), stochastic context-free grammars, and the discriminative kernel method for analyzing DNA, RNA, and protein sequences. He was the first to apply the latter methods to the genome-wide search for gene expression biomarkers in cancer, now a major effort of his laboratory.

As a collaborator on the international Human Genome Project, his team posted the first publicly available computational assembly of the human genome sequence on the Internet on July 7, 2000. Following this, his team developed the UCSC Genome Browser, a web-based tool that is used extensively in biomedical research and serves as the platform for several large-scale genomics projects, including NHGRI’s ENCODE project to use omics methods to explore the function of every base in the human genome, NIH’s Mammalian Gene Collection, NHGRI’s 1000 genomes project to explore human genetic variation, and NCI’s Cancer Genome Atlas (TCGA) project to explore the genomic changes in cancer.

His group’s informatics work on cancer genomics, including the UCSC Cancer Genomics Browser, provides a complete analysis pipeline from raw DNA reads through the detection and interpretation of mutations and altered gene expression in tumor samples. His group collaborates with researchers at medical centers nationally, including members of the Stand Up To Cancer “Dream Teams” and the Cancer Genome Atlas, to discover molecular causes of cancer and pioneer a new personalized, genomics-based approach to cancer treatment.

The UCSC Cancer Genomics Hub (CGHub), a product of the Haussler lab, is a secure repository for storing, cataloging, and accessing cancer genome sequences, alignments, and mutation information for 25 cancer types from TCGA, the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) project, and other related projects. The current planned capacity of this data center is five petabytes. The CGHub will serve as a platform to aggregate other large-scale cancer genomics information, growing to provide the statistical power to attack the complexity of cancer.

He co-founded the Genome 10K Project to assemble a genomic zoo—a collection of DNA sequences representing the genomes of 10,000 vertebrate species—to capture genetic diversity as a resource for the life sciences and for worldwide conservation efforts.

Haussler co-founded and co-chairs the Data Working Group of the Global Alliance for Genomics and Health, through which research, health care, and disease advocacy organizations that have taken the first steps to standardize and enable secure sharing of genomic and clinical data.

Haussler received his PhD in computer science from the University of Colorado at Boulder. He is a member of the National Academy of Sciences and the American Academy of Arts and Sciences and a fellow of AAAS and AAAI. He has won a number of awards, including the 2015 Dan David Prize, the 2011 Weldon Memorial Prize from University of Oxford, the 2009 ASHG Curt Stern Award in Human Genetics, the 2008 Senior Scientist Accomplishment Award from the International Society for Computational Biology, the 2005 Dickson Prize for Science from Carnegie Mellon University, and the 2003 ACM/AAAI Allen Newell Award in Artificial Intelligence.