Genome Sequencing for Pathogen Discovery – Joseph DeRisi (UCSF, HHMI)
September 9, 2019
Hi. My name is Joseph DeRisi. I’m a professor at UCSF in the Department of Biochemistry and Biophysics, and I’m a Howard Hughes Medical Institute Investigator. The subject of today’s talk is Genome Sequencing for Pathogen Discovery. So, I’d like to start with a little mystery. This is obviously a snake, but it’s tied in a knot. You usually don’t see snakes tied in a knot. And when a snake appears tied in a knot like this, it means that there’s really something very wrong with this snake. Now, to add to the mystery, I’ll tell you one more piece of information. This phenomenon, where this snake is doing inappropriate things, is actually transmissible. That is, one snake can infect another snake and it’ll exhibit the same kind of odd behaviors. And it’s more than just tying itself into knots. This snake is actually suffering from a wasting disease and it will ultimately be fatal. Now, the question is, why would you care about a snake disease? Who cares if snakes tie themselves in knots and die? Well, it turns out that diseases and pathogens are intimately linked across species, and one pathogen may be transmitted from species to species. Think about SARS or HIV or Ebola. They’re all so-called zoonotic infections originally. That is, they came from a nonhuman species before they were in people. And so we ignore veterinary medicine at our peril. And actually, by studying strange diseases like this, we stand to learn about viruses or pathogens or other dangers that may impinge upon human health. Now, in this case, the disease this particular snake is suffering from is called inclusion body disease. For a while, it was a very mysterious disease. It was typified by these large intracytoplasmic inclusions. That’s these large pink dots inside the cells. Those shouldn’t be there. They’re basically masses of protein of unknown origin. And what happens to these snakes is they develop a lot of neurological abnormalities. They start behaving very strangely.
They’ll stop eating. They waste away. It’s almost always fatal. And more importantly, it’s transmissible. So, here is a medical mystery: a transmissible neurological disease of unknown cause that produces strange behaviors. And the question is, what causes such a disease? And would it have any relevance to human health at all? So, let me just show you what a snake infected with inclusion body disease looks like. Here’s a video that a veterinarian posted on YouTube of a snake with inclusion body disease. And what you’ll see here is the snake exhibiting signs of what’s called stargazing. That is, the snake is basically making uncontrolled head movements, sometimes locking up in positions of rigidity or on its back for long periods of time. Sometimes they’ll roll onto their backs and not be able to roll over. This is called “failure to recover from dorsal recumbency” in the business. And so, when I go about looking for new projects to research, I look for diseases like this: things that haven’t been solved before, that are potentially very interesting, that could reveal some new infectious agent. And so the question becomes, very quickly: if you want to find a new pathogen and you don’t know exactly what you’re looking for, and all you have is maybe a snake that ties itself in knots, how exactly would you go about doing that? Well, traditionally we thought of species as different things. You know, there’s the human, and it could be infected by a virus. There are bacteriophages that infect bacteria. There’s bacteria. There’s fungus. All sorts of different things. And one way to go about finding a pathogen is to try to culture one of these things in a dish or grow it on cells, or use some sort of specialized reagent like an antibody. But all those techniques require bias. That is, you have some guess of what might be in there.
Maybe you have an antibody to this and you’re gonna ask, is this thing here? Or maybe you have a reagent where you could grow this. But if you don’t know what’s there, you’re just guessing. And you might guess wrong. Well, recently, there have been a lot of technological advances in the area of DNA sequencing. And these advances have actually resulted in our ability to change how we view this picture. So, instead of viewing it as different species, where you look at each thing sort of uniquely, we actually look at everything kind of like the matrix, where everything is sequence. That is, don’t distinguish between what’s in a given sample or try to purify out one thing or another and try to grow it. Instead, just read the DNA or RNA sequence of everything in a particular sample. Because virtually every pathogen has RNA or DNA associated with it. It has its own genome. And so, with these recent advances in technology that I’ll tell you about in a minute, you can literally take a sample and read everything that’s in there without preconceived biases or assumptions about what you might find. So, what is this new technology? Some people refer to it as next-generation sequencing or ultra-deep sequencing. I’m going to show you how one flavor of this technology works and how it’s changing how we do medical diagnostics and, in my case, how we look for pathogens that might cause disease. Alright. I’m gonna call it ultra-deep sequencing. I’m going to show you one variant of this technology that’s sold by Illumina. There are many different flavors of it. We’ll just go through one. So, the main technology is the ability to sequence millions of pieces of DNA at once, rather than just one at a time. For a long time, we would sequence one at a time, maybe 8 at a time. If you had a very expensive machine, you could sequence 96 at a time. But these new machines are actually capable of sequencing literally millions of sequences all at once. How does that exactly work?
Let me walk you through it. Okay. It would start off with the DNA or RNA in question. We don’t know what it is. We don’t know how big it is. We don’t know what’s in it. It doesn’t matter. The next step of the technology is just to shatter that DNA into millions of pieces, randomly. You can use a sonicator, one could use an enzyme, it doesn’t really matter. The way it works is you just shatter it into a million pieces. The next part of the technology is to just add little bits of DNA called adapters on the ends. That allows us to capture these puzzle pieces, these fragments of the DNA. The next step after that is to actually flow these little pieces of DNA randomly across a glass surface. So, they land virtually anywhere. And so, if we looked at it from the side, you’d see a piece of DNA stuck to a piece of glass by virtue of these little adapter sequences. Now, I’m glossing over some of the more nuanced details of this technology just to give you the big flavor of it. Now, once these fragments have bound to the glass of the flow cell, the next step is to actually begin the sequencing. And the way that works is reversible dye terminators. That is, the nucleotides of DNA (A, C, G, and T) are put into the reaction, but these are special versions of the bases A, C, G, and T. Instead of being ordinary bases, these carry fluorescent dyes, one color for each base. The red might be an A, or the blue might be a C, and so on. Also a primer is put in. What’s a primer? The primer is the part that actually binds to the adapter and begins the sequencing reaction. It’s a synthetic piece of DNA. So, after the primer and the enzyme, the DNA polymerase, are put in, one of these nucleotides is added to each strand after the primer, one nucleotide at a time. And because they are so-called dye terminators, the strand cannot be extended by more than one base per cycle. Only one nucleotide is put on. And then what happens is the flow cell is imaged.
So, you get a picture of where all these little fluorescent dyes are sitting on the glass surface. Now, this is important. After imaging that, we go for another cycle and put in another base, because the dye terminators are reversible, allowing us to put in additional dye terminators at every cycle. So, now we have cycle 2, showing the image of where those dyes were put in on the glass surface. Reverse the dye terminators, put in cycle 3, and now we have another image. So, what happens is we have a series of images, each representing one base of extension on a DNA template. And if you look at the colored dots, you can actually read the sequence. It goes blue, orange, red, blue. Since each color corresponds to one base of the DNA, we can read the sequence, CGACTA, and so on. And that happens for as long as the enzymes and nucleotides will work in that sample. Usually these sequences are anywhere from 60 bases to 250 bases long at a time, during sequencing. Now, the way I showed it here was just a sparse set of several dots on a glass surface. That’s the cartoon version. What does the real data look like? How many dots are being sequenced at a time, each dot representing a unique DNA template? Well, it’s millions. This is just one fragment of the total data set. And, as you can see here, the dots are really crammed in there. And by virtue of really advanced computer imaging technology and image recognition, one can actually follow the sequence of the colors of each of these dots and recognize them as different templates. So, this is really a quantum leap in how we sequence. Instead of 1 or 96 at a time, you can sequence a hundred million at a time, a billion at a time. The sky’s the limit. And actually, the technology keeps getting better every single year. Alright. So, this major quantum shift in the way that we sequence DNA allows us to get back to our problem of what tied that snake into a knot. So, let’s go back to that. How would we go about this?
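As a rough illustration of the cycle-by-cycle readout described above, here is a minimal Python sketch that turns a stack of per-cycle "images" into reads. It is a toy model under stated assumptions, not how any real instrument's base caller works: the spot coordinates, the `call_bases` function, and the green = T assignment are hypothetical. The talk gives only red = A and blue = C as examples, and orange = G is inferred from its "blue, orange, red, blue" reading as C, G, A, C.

```python
# Assumed color-to-base mapping. Only red = A and blue = C are given as
# examples in the talk; orange = G is inferred and green = T is hypothetical.
COLOR_TO_BASE = {"red": "A", "blue": "C", "orange": "G", "green": "T"}


def call_bases(cycle_images):
    """Build one read per spot from a list of per-cycle images.

    Each image is a dict mapping a spot position (x, y) to the color
    observed at that spot in that cycle. One cycle adds one base.
    """
    reads = {}
    for image in cycle_images:
        for spot, color in image.items():
            reads[spot] = reads.get(spot, "") + COLOR_TO_BASE[color]
    return reads


# Four cycles, two spots (hypothetical coordinates on the glass surface).
cycles = [
    {(10, 4): "blue", (52, 9): "red"},
    {(10, 4): "orange", (52, 9): "green"},
    {(10, 4): "red", (52, 9): "green"},
    {(10, 4): "blue", (52, 9): "blue"},
]

reads = call_bases(cycles)
for spot, read in sorted(reads.items()):
    print(spot, read)
```

Every spot is followed independently across cycles, which is why millions of templates can be read in parallel from the same series of images.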
I’m not gonna have any preconceptions or biases as to what is making this snake ill. So, the way it would work is we would take tissues from snakes that were sick. We’d also take tissues from snakes that were not sick; that’s an important control. We would isolate the RNA from these. You could isolate DNA too, but I’d note that there are many pathogens that only have RNA. And any DNA pathogen makes RNA. So, it’s actually more fruitful for us to isolate RNA. We can convert that into a library, put it on one of these fancy new next-generation sequencers, and then read out the result. Now, for this particular experiment I’m gonna tell you about, we did about 200 million reads of 100 nucleotides each. That equates to about 6 million sequences for each of the tissues of the snake. Okay. Now that we’ve done that, there’s a serious problem here. And the problem is, how do you know what’s a pathogen and how do you know what’s the snake? Well, the way that we do this normally, if we did it in a human sample, since we know what the human genome is, is that we would compare all the sequences to the human genome and remove those that are human. And then, what we’re left with is the stuff that might be pathogen, because it’s not human. And so, depicted in graphical form, we might have 100% of our sequences to start off with and then, as we begin removing human sequence and things that match different human RNA transcripts, low-complexity sequence, sequence that fails quality control, and so on, we end up with a very small amount of sequence that’s not human at the very end. And that’s the stuff we can look at closer. Well, when we started this project, we actually didn’t have the boa constrictor genome to compare against. Luckily, our friends at the California Academy of Sciences helped out. We were able to take a blood sample from this very nice snake, here, named Balthazar.
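The subtraction idea described above (compare every read to the host genome and keep only what does not match) can be sketched in a few lines of Python. This is a toy illustration, not the pipeline used in the talk: the function names, the k-mer size, the threshold, and the short sequences are all assumptions made for the example. A real workflow would use a dedicated read aligner and would also handle reverse complements, low-complexity sequence, and quality filtering.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def subtract_host(reads, host_genome, k=8, max_host_fraction=0.5):
    """Keep reads whose k-mers mostly do NOT occur in the host genome.

    Toy criterion: a read is called 'host' if at least max_host_fraction
    of its k-mers are found in the host. Reverse complements are ignored.
    """
    host_index = kmers(host_genome, k)
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k, nothing to compare
        shared = len(read_kmers & host_index) / len(read_kmers)
        if shared < max_host_fraction:
            kept.append(read)
    return kept


# Hypothetical toy sequences, far shorter than real genomes or reads.
host = "ACGTACGTTAGCCGATAGGCTTAACCGGTT"
host_read = host[5:20]                 # a read that comes from the host
foreign_read = "TTTTGGGGCCCCAAAATTTGG"  # a read that does not

non_host = subtract_host([host_read, foreign_read], host)
print(non_host)
```

Whatever survives the filter is only "not host"; it still has to be assembled and identified before anything can be called a pathogen.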
And, via a collaboration with Illumina and the Assemblathon 2 Consortium, we were able to sequence the genome of this particular snake. And then we used that as a reference to remove all the snake sequence, leaving us with a pool of sequences that are potentially from the pathogen. So, that’s not the end of the story because there’s still a pretty major problem. And the major problem is, you’ve got a lot of these little sequence fragments, maybe a hundred million of these or less, and you don’t know how they go together. How do you know what’s a pathogen and what’s not, even if you know it’s not snake, or host genome sequence? And the goal, here, is that you’ve got to put them together. So, like a jigsaw puzzle, one can compare the DNA sequences and actually assemble them. And so, if you had a sequence of DNA like the one I’m showing you here, and then you had another DNA sequence like shown on these lower rows down here, you could overlap them to see where the sequences match. And by doing so, you can tile across, thus forming a larger piece of sequence. In the business, we call these contigs. And so this is called sequence assembly. Now, if you did this by hand, you’d go crazy, and it would take you a hundred years to do it or longer, because there are millions of these sequences. Luckily, there are a lot of computer algorithms for doing this. And, in fact, one of the software packages for doing this, an algorithm called PRICE, was written in my lab by a talented postdoc, Graham Ruby. And what he did is actually just write a computer program that takes one of the pieces of DNA and starts comparing the ends. And then it starts building out from one sequence, outwards, by comparing the ends and assembling on little pieces. And then, finally, because the sequences actually have two end sequences (every template gets both ends sequenced), you can use the other end of the sequence to figure out what other pieces in the data it matches.
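To make the jigsaw-puzzle idea of overlapping reads into contigs concrete, here is a minimal greedy overlap assembler in Python. It is a generic textbook-style sketch, not the PRICE algorithm: it ignores paired ends, sequencing errors, and reverse complements, and the function names, the `min_len` parameter, and the toy sequence are assumptions made for illustration.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=4):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # nothing left to join
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (i, j)]
        contigs.append(merged)


# Hypothetical toy 'genome' and three overlapping reads taken from it.
genome = "ATGGCGTACGTTAGCAAGCTTGA"
toy_reads = [genome[12:23], genome[0:10], genome[6:16]]

contigs = greedy_assemble(toy_reads)
print(contigs)
```

The all-against-all comparison in each round is quadratic in the number of sequences, which is fine for three toy reads but is exactly why real assemblers use cleverer indexing when there are millions of reads.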
And by doing this, you iteratively join sequences together to make longer pieces. It’s sort of a clever “divide-and-conquer” way of doing assembly. And there are many other algorithms for doing this too. This one’s particularly well-suited for sequencing viruses, in this particular case where there’s a lot of sequence we don’t care about. Okay, so we’ve got the software tools, we’ve got the technology, we’ve got the tissues of diseased snakes, we’ve put them on the sequencer, and after we remove the host genome, we assemble what’s left. The question is, what did we find? Is there anything there? And, in the case of these snakes that tie themselves in knots and wave their heads in the sky and ultimately die of this terrible disease, we did find something. And what we found was two RNA fragments that encode just four genes that are totally diagnostic for arenavirus. That is, we found an arenavirus genome, which consists of two parts, in these snakes. Now, I’ll tell you more about the arenavirus genome in a minute. This is a diagram of it. It’s just four genes. What are arenaviruses? Here’s a cartoon diagram of what one of those viruses might look like. They’re negative strand RNA viruses. They have a rodent reservoir. So, all known arenaviruses, prior to this discovery, were in rodents. And they can cause severe disease when transmitted to humans. They cause cytoplasmic inclusions; those are the big pink dots we saw inside the cells. And they include the largest family of human hemorrhagic fever viruses that we know of. Lassa fever, for example, kills about 5,000 people a year in West Africa, and is spread by rodent feces. So, this is interesting. And I’ll tell you more about arenaviruses in a second. Just a little detail on the genome, first. Like I said, it’s four genes. What are those four genes? The L gene, the large gene, is the polymerase. That’s the red dot inside the capsid, there.
That’s the protein that makes copies of the genome and replicates it for the next cycle of infection. The Z gene, or matrix, forms sort of the inside shell structure of the virus, here, those sort of orangish-brown particles. The N gene, or nucleocapsid, is thought to protect the RNA from degradation or attack from the host. That’s shown here as the green bits coating the genome. And then there’s the glycoprotein, which actually is cleaved into two parts. That’s the outside, those little spikes on the outside of the virus, embedded in this lipid membrane that the virus steals from the host. Alright. So, this is a very simple virus with only four genes. And the question is, what’s going on with this particular virus with respect to snakes? Now, the known family of arenaviruses is really divided into two: Old World arenaviruses, like those in Africa; and New World arenaviruses, like those in North America and South America. And, like I said, all of these are rodent viruses. So, what’s the deal here? Is this just a rodent virus that a snake ate? Because, clearly, snakes eat rodents. That’s part of their diet. And one can easily imagine that a snake just consumed the virus from a rodent. And this thing that we found might just be an Old World or a New World arenavirus. One can look at that by comparing the sequence to all known Old and New World arenaviruses by doing a phylogenetic comparison. That is, how different is this genome versus the other genomes? That’s shown here for the two proteins L and NP, the polymerase and the nucleoprotein. The two red ones are our snake viruses, Golden Gate virus and CASV virus. And here are the Old World and the New World. These groups of lines tell you how related they are. The smaller the bars on this dendrogram, the more related two sequences are. The longer the bars, like down here, show that they’re more distant from each other.
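The notion that bar length on a dendrogram reflects how different two sequences are can be illustrated with the simplest possible distance measure: the fraction of aligned positions that differ. The Python sketch below is only an illustration under stated assumptions. The three short sequences are made-up toy strings, not real arenavirus proteins, the names are hypothetical, and real phylogenetic comparisons start from a proper multiple alignment and use evolutionary models rather than a raw mismatch fraction.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of positions that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


# Hypothetical, already-aligned toy sequences (NOT real viral proteins).
aligned = {
    "rodent_virus_1": "MKTLLAGDVE",
    "rodent_virus_2": "MKTLLSGDVE",
    "snake_virus": "MRSIVAPEIQ",
}

distances = {
    (a, b): p_distance(aligned[a], aligned[b])
    for a, b in combinations(aligned, 2)
}

for pair, d in distances.items():
    print(pair, round(d, 2))
```

In this toy example the two "rodent" sequences differ at one position in ten, while the "snake" sequence differs from both at most positions, which is the kind of pattern that shows up as short bars within a group and a long bar leading to an outlier.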
And what this diagram is showing, in graphical terms, is that the snake viruses are extremely different from the Old World or the New World arenaviruses. They’re their own separate family. And so, immediately, we can make the hypothesis that these are indeed viruses that belong to snakes and have nothing to do with the known rodent viruses. It wasn’t the simple case that the snake ate a rodent that happened to have one of the viruses we already know about. So, that’s interesting, because this would mean that arenaviruses have a whole other host range beyond the rodents that we previously appreciated. Okay. Now, there’s one more little bit of information. What about the glycoprotein? Now, that’s those spikes on the outside of the virus. Those help determine which cells the virus may infect. And so we can compare it to the Old or New World, but it doesn’t even compare at all. There’s no comparison. It’s not even related at all. And, in fact, what we found out is that our snake viruses, shown in red in here, are actually more closely related to the filoviruses. Now, filoviruses are famous because Ebola is a member of the filoviruses. This is particularly interesting because it was thought that, maybe, Ebola and arenavirus had a common ancestor in ancient history that they evolved out of. And here, in reptiles, an ancient set of animals, we find evidence for arenavirus and Ebola-like sequences in a single virus. And so this speaks to, perhaps, a more interesting evolutionary history of these viruses than anyone had suspected. Now, I should also mention that my colleague, Jonathan Lai, and his colleagues actually crystallized the glycoprotein from these snake viruses, and were able to show that, indeed, they are structurally more similar to the filoviruses, including Ebola, than any of the arenaviruses. So, this is truly something different than what’s been seen before. Alright.
So, one of the aspects that I’ve left out, though, is, how do you know that the thing you found actually caused the disease you said it did? Now, that is a tricky issue. And there are these so-called Koch’s postulates, which say that, basically, if you want to prove that something caused a disease, you actually have to isolate it in pure culture and then infect a healthy animal with that virus. Now, that’s a challenge. Because the first thing is, how do you grow a virus that’s new, when you don’t know what it is? That could actually be a project in and of itself. And, of course, what we’d like to do is show that this virus is making those inclusions, the hallmark of the disease that’s diagnostic for it. And then, finally, if we’re able to do all those things, we’d like to investigate the possibility of an experimental challenge: we actually infect a healthy snake with the virus to see if it recapitulates the disease. That’s the kind of proof that’s required in this business. Alright. So, if we want to grow the virus, we’re gonna need snake cells. Where do you get snake cells? Luckily, we have collaborations with veterinarians, and here’s Dr. Chris Sanders about to do surgery on a snake named Juliet. Now, we were able to take cells out of Juliet and put them in culture. And, much to our surprise, these cells from her kidney were able to grow in a dish with media like it was no problem. And this is great, because that means we can make lots of cells and passage them over and over again, and create a large supply of cells that we can use for infection experiments. Now, do these cells support replication of the virus? In fact, they do. So, we were able to put the virus on these cells and show that the virus is able to rapidly replicate in this boa constrictor cell line. And this is the work of Mark Stenglein, a postdoc in my lab.
And what he showed, very clearly, is not only does the virus replicate in the cells, but if we then make an antibody to the virus (that’s a protein reagent that allows us to, you know, see the virus in microscopy), we are able to localize the virus to the cells of infected tissues from these snakes with the disease. And it localizes to the inclusion bodies, those big orange globules that were inside the cell. So, we’ve established that the virus is in the snakes that were sick; we’ve never found the virus in snakes that were healthy, that lacked the inclusions; and the virus localizes to the inclusions. Then, finally, what we’d like to know is, would it create disease in a healthy animal? And so that’s an experimental challenge. And to do that, what we basically need to do is set up an isolation chamber, so that the animal in question would have no contact with any other biologic, and then we’d take pure virus from isolated culture and inject it into a snake. And so, in collaboration with the UC Davis veterinary school, we are embarking on such experiments. And they’re necessary to be able to prove whether the virus does indeed cause the disease we say it does. All the evidence points in that direction, but this would be the ultimate proof of that. And it would allow us to basically erase any doubt of whether the virus is just a passerby or an innocuous finding in these snakes. Alright. So, what does this mean? What can we learn from the work we’ve done so far? So, first of all, I mentioned the evolutionary history. That is, these snake viruses have a protein in them that is more similar to that of a different family, filoviruses, the Ebola viruses. And the question is, in evolution, how did those two viruses evolve away from each other? Ebola virus is in the news right now, there’s an outbreak in Africa. And the question is, are there reservoirs of these viruses, or of the genetic material that comprises these viruses, that were hidden?
And I would say that, in the case of arenavirus, that is certainly the case. We thought all arenaviruses were in rodents, when, in fact, there’s a large genetic diversity of viruses in the reptile population belonging to this important family that causes hemorrhagic disease in humans. We can also learn about the disease pathology. Since we can grow the virus and we can put it into animals, one can learn about how the virus is acting on the cells. How does it subvert the host defenses? How does it create disease in these animals? Why do they exhibit neurological symptoms? What is happening to break down the central nervous system in these infected animals? And what is the course of the disease? How is the disease spread from one snake to another? And what can that tell us about the arenaviruses that are in rodents, that ultimately can be transmitted to humans? The other aspect of this that could be unique is that this virus may have interesting features not seen in other viruses. For example, these viruses appear to have the ability to reassort and recombine at high frequency. That’s work that we’re currently doing and we’d like to understand how these viruses are generating such diversity. The other aspect of this is it could be a model for arenaviruses. Since the snake virus does not infect humans, we can handle it safely in the lab. And, by doing so, we can use that as a model to screen for therapeutic drugs and compounds, or to understand how the arenaviruses are subverting the immune system to escape detection. All of these are new avenues of research that wouldn’t have been available if we didn’t know what was causing this disease in the first place. Alright. So, I want to go back to my original premise, where, in the past, we viewed organisms as discrete entities.
And, in the case of disease-causing organisms, what I’d like to convince you of is that, by viewing everything as a continuum of sequence, and having an unbiased approach without assumptions, one can actually dig into diseases that were previously mysterious and understand them at a level that was literally unattainable even ten or twenty years ago. And by using technology, we should be able to make diagnostics faster, cheaper, and better, and to erase the scourge of unknown infectious diseases. Every disease that is transmissible should have a cause and should be knowable by us using these kinds of technologies. And with that, I’d like to thank you.