Bansal, Vikas

Computational methods for analyzing human genetic variation

2008

Abstract

In the post-genomic era, several large-scale studies that set out to characterize genetic diversity in human populations have significantly changed our understanding of the nature and extent of human genetic variation. The International HapMap Project has genotyped over 3 million single nucleotide polymorphisms (SNPs) in 270 individuals from four populations. Several individual genomes have recently been sequenced, and thousands of genomes will be available in the near future. In this dissertation, we describe computational methods that use these datasets to further enhance our knowledge of the fine-scale structure of human genetic variation. These methods employ a variety of computational techniques and are applicable to organisms other than humans.

Meiotic recombination is a fundamental mechanism for generating genetic diversity by shuffling chromosomes. There is great interest in understanding the non-random distribution of recombination events across the human genome. We describe combinatorial methods for counting historical recombination events using population data. We demonstrate that regions with an increased density of recombination events correspond to regions identified as recombination hotspots using experimental techniques.

In recent years, large-scale structural variants such as deletions, insertions, duplications and inversions of DNA segments have been revealed to be much more frequent than previously thought. High-throughput genome-scanning techniques have enabled the discovery of hundreds of such variants but are unable to detect balanced structural changes such as inversions. We describe a statistical method to detect large inversions using whole-genome SNP population data. Using the HapMap data, we identify several known and putative inversion polymorphisms.

In the final part of this thesis, we tackle the haplotype assembly problem. High-throughput genotyping methods probe SNPs individually and are unable to provide information about haplotypes: the combination of alleles at SNPs on a single chromosome. We describe Markov chain Monte Carlo (MCMC) and combinatorial algorithms for reconstructing the two haplotypes of an individual using whole-genome sequence data. These algorithms are based on computing cuts in graphs derived from the sequenced reads. We analyze the convergence properties of the Markov chain underlying our MCMC algorithm. We apply these methods to assemble highly accurate haplotypes for a recently sequenced human

UC San Diego
