A pathbreaking genomic project supports worldwide research and conservation efforts.
A visit to the Houston Zoo is never complete without seeing the Asian elephants. Methai, Tupelo, Shanti, Tess, Thai, Tucker and Duncan are all members of the zoo’s herd. Between the sunbathing, rolling around in water and mealtime antics, the elephants add a touch of whimsy to an afternoon at the zoo.
Of course, Asian elephants are also members of an endangered species. They used to be found in forested regions of India and Southeast Asia. Now, due to the loss of their native habitat, conflicts with humans and a devastating virus, their population is in steep decline. One-third of Asian elephants live in captivity, while the rest live in fragmented pockets of forest. Zoos play an essential role in conservation efforts, both by preserving members of threatened and endangered species and by raising public awareness. In the coming years, as the world grapples with an extinction crisis on an unprecedented scale, zoos will play an even more critical role.
Just down the road from the Houston Zoo, a very different kind of “zoo” is rapidly growing. Housed in a laboratory high in the towers of the Texas Medical Center, the DNA Zoo is bringing a group of scientists together to offer genomic tools to aid in research and conservation efforts. Instead of a zoo where endangered species make their physical homes, this one is made up of test tubes, DNA samples, lab equipment and algorithms, housed in a laboratory with wall-to-wall whiteboards crammed full of diagrams, notes and equations, and staffed with scientists who are dedicated to sharing their research with anyone who needs it. Their goals are not modest.
“The DNA Zoo has a goal of putting together high-quality genomes for all existing critters,” said Olga Dudchenko, a researcher at Rice University and Baylor College of Medicine’s Center for Genome Architecture. Having a sequenced genome available for a species lays the foundation for researchers and conservationists to do their jobs. Examples include identifying possible genes involved in disease, monitoring for inbreeding within small populations and tracking individuals in the wild.
“Today, the barrier to entry for doing serious biological work on a species is the need to have a reference genome,” said Erez Lieberman Aiden, director of the DNA Zoo and the Center for Genome Architecture, and assistant professor of computer science and computational and applied mathematics at Rice. “What this [project] means is the barrier for entry just disappears.” After only one year, researchers have already put together the genomes for more than 100 species, with the results and techniques made freely available to the entire conservation and research communities.
According to a recent United Nations report by the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services, 1 million species are headed toward extinction, many within the next few decades. This extinction crisis is driven by several factors, such as the effects of climate change and sea level rise, the loss of habitat due to urbanization and agricultural development, and the increase in pollutants.
“If biodiversity fails, we are all going to fail,” said Parwinder Kaur, an associate professor of genomics and computational biology at the University of Western Australia, and the co-director of DNA Zoo Australia. In order to preserve the biodiversity we have, and to ensure the continued survival of different species, scientists and conservationists will need a variety of tools at their disposal, one of which is having a sequenced genome available for a particular species.
A genome is the sequence of base pairs that acts as an organism’s blueprint. The human genome has more than 3 billion base pairs that are found on 23 pairs of chromosomes. Bases come in two paired sets: adenine (A) pairs with thymine (T), and cytosine (C) pairs with guanine (G). The sequence of these bases (the C’s, G’s, A’s and T’s) forms the code for everything needed to make our bodies work, much as a string of zeros and ones contains all of the information computers need to run and carry out complex tasks.
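The pairing rule and the zeros-and-ones comparison can be made concrete with a few lines of Python. This is only an illustrative sketch: the function names and the particular two-bit code assigned to each base are assumptions chosen for the example, not anything used by the DNA Zoo.

```python
# Watson-Crick pairing: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hypothetical 2-bit code per base; the assignment itself is arbitrary.
TWO_BIT = {"A": "00", "C": "01", "G": "10", "T": "11"}


def complement(strand: str) -> str:
    """Return the base that pairs with each base in the strand."""
    return "".join(PAIRS[base] for base in strand)


def to_bits(strand: str) -> str:
    """Encode a strand as a string of zeros and ones, two bits per base."""
    return "".join(TWO_BIT[base] for base in strand)


if __name__ == "__main__":
    strand = "GATTACA"
    print(complement(strand))
    print(to_bits(strand))
```

Because there are only four bases, two bits per base are enough to store any DNA sequence.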
“Genome assembly is similar to a jigsaw puzzle,” said Dudchenko, “where you examine the individual bits and pieces and look for the overlap, the sequence similarity at the edges, that allow you to stitch them together.”
Putting together a genome works by chopping up DNA into small pieces called reads, figuring out the sequence of bases for each of these reads, then stitching all of the reads together by looking at how the pieces overlap. Dudchenko is adept at using metaphors to describe the challenges inherent in genome assembly. “Imagine you have a region that is all blue sky, where everything looks the same. This is the problem of ‘repeats.’”
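The overlap idea described above can be sketched as a toy greedy assembler: repeatedly find the two reads whose ends overlap the most and merge them. This is a simplified illustration under stated assumptions (error-free reads, a single strand, hypothetical function names), not the method used by the DNA Zoo or any production assembler.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair of reads until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return reads  # nothing left to stitch together
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTACG", "TACGGAT"]
    print(greedy_assemble(reads))
```

When a stretch of sequence repeats, many reads look alike and several merges score equally well, so an approach based on overlaps alone can join pieces in the wrong order. That is the "blue sky" problem in miniature.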
Along with Aiden and other collaborators, she has come up with a solution to this puzzle based on Hi-C, a technology originally proposed to study how genomes fold inside the nucleus. Hi-C determines the spatial position of DNA sequences relative to one another in 3D and is usually used to analyze gene regulation. Dudchenko and her team developed algorithms that instead use Hi-C to trace the sequence of chromosomes. In a way, the spatial metadata from Hi-C works like identifying the corner piece of the blue sky section to help assemble the jigsaw puzzle.
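One way to picture how contact data can order pieces of a chromosome: stretches of DNA that sit next to each other along a chromosome tend to touch more often than distant ones, so the pieces with the most contacts are likely neighbors. The sketch below is a toy greedy ordering under that assumption. The function name and the contact counts are made up for illustration, and this is not the team's published algorithm.

```python
def order_by_contacts(contacts: list[list[int]]) -> list[int]:
    """Order pieces so that strongly contacting pieces end up adjacent.

    `contacts[i][j]` is a (hypothetical) symmetric count of contacts
    between piece i and piece j. Greedy heuristic: seed the chain with the
    strongest pair, then repeatedly attach the remaining piece that has
    the most contacts with either end of the chain.
    """
    n = len(contacts)
    if n < 2:
        return list(range(n))
    # Seed with the pair of pieces that touch most often.
    i, j = max(((a, b) for a in range(n) for b in range(a + 1, n)),
               key=lambda pair: contacts[pair[0]][pair[1]])
    path = [i, j]
    remaining = set(range(n)) - {i, j}
    while remaining:
        # Each candidate is (contact count, which end of the chain, piece).
        _, side, piece = max(
            (contacts[end][c], side, c)
            for side, end in ((0, path[0]), (1, path[-1]))
            for c in remaining
        )
        if side == 0:
            path.insert(0, piece)
        else:
            path.append(piece)
        remaining.remove(piece)
    return path


if __name__ == "__main__":
    # Made-up counts for four pieces whose true order is 2, 0, 3, 1.
    contacts = [
        [0, 30, 95, 90],
        [30, 0, 9, 85],
        [95, 9, 0, 28],
        [90, 85, 28, 0],
    ]
    print(order_by_contacts(contacts))
```

The recovered order may come out reversed, because contact counts alone do not say which end of the chain is the start.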
An important part of the assembly infrastructure created at the Center for Genome Architecture is the tool called Juicebox Assembly Tools, or JBAT. JBAT works by creating a visualization map that gives information about points of contact between sections of DNA, through which an uninterrupted sequence can be assembled. Watching this in action is like watching a game of Tetris, where all of the different pieces get assembled into a coherent block of text. “I believe in searching for simple solutions,” Dudchenko said. “Humans are visual creatures. Our approach to solving this problem is extremely visual.”
The ability to use the Hi-C method to quickly assemble a genome was demonstrated, to remarkable effect, in 2017 with a Science paper published by the Aiden lab with Dudchenko as the principal author. At the time, the Zika epidemic was in full swing. Researchers were struggling with the lack of a good reference genome for the Aedes aegypti mosquito, which transmits the Zika virus. A draft of the genome had been published, but due to the large number of repeats, the pieces hadn’t been assembled, which was like having a manual in which all of the pages were unbound and unordered.
Using the JBAT algorithm, Dudchenko and collaborators were able to take these sequences and put them in order for the modest price of $10,000, a price point that, with additional refinements, has since dropped to $1,000.
“This is something that cost millions of dollars just a handful of years ago,” Aiden said.
Becoming DNA Zookeepers
As they were developing these tools, Aiden, Dudchenko and the rest of the research group were laying the groundwork for the DNA Zoo, including building up a repository of DNA samples for a wide range of species that could later be sequenced. One of the first organizations to step up was the Houston Zoo. Even before the researchers had developed a low-cost technique, the Houston Zoo was there, willing to give whatever samples it had available to help future conservation efforts.
“We try to hold the long view,” said Joe Flanagan, a senior veterinarian at the Houston Zoo and one of the early DNA Zoo collaborators. “We want to see wildlife thrive for the long run.”
For years, every time veterinarians at the Houston Zoo took blood samples, usually during the course of periodic checkups or diagnostic tests, the little bit that was left over went to the DNA Zoo, where it was stored, ready for the day when it would be sequenced.
“Our idea was to do this as unobtrusively and as opportunistically as possible,” Dudchenko said.
Since then, more than 50 institutions have joined forces with the DNA Zoo, offering DNA samples and expertise. Collaborators include the Texas State Aquarium, the Duke Lemur Center, the Smithsonian Conservation Biology Institute, the Wildlife Conservation Society and many others. Together, they have been able to curate a collection that contains the DNA of more than 1,000 species.
Given the need for genomic data, as well as the need to respond rapidly to issues such as the outbreak of new viruses, DNA Zoo collaborators are “democratizing” the tools of genome assembly by making their data open source. “We share the data associated without any restrictions to help move the scientific and conservation communities forward as fast as possible,” Dudchenko said. “We wanted to launch an effort that is fully transparent.”
A Visit to the DNA Zoo
The ongoing work by the DNA Zoo team will aid in biomedical research and conservation efforts for a broad range of species. The species whose genomes have been assembled by the DNA Zoo include Asian elephants, California condors and right whales, which are facing extinction; Aedes aegypti mosquitoes and Madagascar rousettes, which act as reservoirs for new and emerging viruses; and Chinese hamsters, which play a large role in biomedical research.
One of the major threats to Asian elephants, after loss of habitat, is a virus. Elephant endotheliotropic herpesvirus (EEHV) is a highly fatal virus that is responsible for half of the deaths among young Asian elephants in zoos. Using samples collected from the elephants at the Houston Zoo, the DNA Zoo released an assembled genome in January 2020. This information will help researchers tease out the different factors that make elephants so susceptible to EEHV, as well as come up with possible cures or vaccines.
In 1987, the California condor, whose range once extended across much of North America, was declared extinct in the wild with only a few dozen surviving individuals found in captivity. These iconic birds, which have wingspans of up to 10 feet and soar at heights of up to 15,000 feet, are vulnerable to a number of threats, including power lines, environmental toxins and habitat destruction. In the years since, a concerted conservation effort has led to the reintroduction of the California condor into the wild and an increase in population to a few hundred individuals, although they remain highly vulnerable to environmental factors.
For decades, cells derived from the ovaries of Chinese hamsters have played an important role in the pharmaceutical industry. These cells, which are easy to grow, can be manipulated to produce specific proteins. Such proteins range from anti-cancer drugs to hormone therapies, with Chinese hamster cells acting as the silent partner that makes this production possible. Until recently, researchers were using a draft genome for the Chinese hamster. Thanks to the DNA Zoo, a fully assembled genome is now available.
With only 400 individuals left, the North Atlantic right whale (Eubalaena glacialis) is one of the most endangered whale species. So named because they were the “right” whale for whalers to harvest, right whales have been left teetering on the edge of extinction by hundreds of years of whaling. Right whales are huge, reaching up to 52 feet in length and 70 tons in weight, but the combination of their ocean habitat and migratory habits makes them hard to study. Almost nothing is known of their social structure or mating habits, which makes coming up with effective conservation strategies especially tough. Previously, the strategy for identifying individuals was to look at the tail, which served as a sort of fingerprint. This approach is laborious and requires getting a good glimpse of the tail, which can be tricky. Now that the DNA Zoo has put together a high-quality genome for the right whale, identifying individuals can become a lot simpler. All it would take is a small sample of DNA, which could be obtained from the water blown out of a whale’s blowhole, similar to the way humans might spit into a test tube for a genetic test. Having this tool will help researchers piece together more information about these creatures.
In the wake of the novel coronavirus (SARS-CoV-2, which causes the disease COVID-19), the DNA Zoo was able to assemble and release a genome for the Madagascar rousette. The Madagascar rousette is a small fruit bat, part of the Pteropodidae family, whose members are natural hosts for many types of viruses. As DNA Zoo team members Aviva Presser Aiden, Cara Brook and Dudchenko wrote in their blog post announcing its release: “[W]e believe that the reference genome for a species in this genus — even one endemic to Madagascar rather than China — could be relevant to studies of the coronavirus and its reservoir.” Studying the original host species as well as related species can offer insight into disease mechanisms.