Brian Naughton, Expert: 23andMe Sample Processing – Tales from the Genome


Hi, I’m Brian Naughton. I’m a scientist at 23andMe and I’m from Dublin, Ireland.
>>So after dropping the DNA sample in the mail, we don’t see anything until we get our results. What happens to that sample once we drop it in the mail? Where does it go? Where does it end up?
>>So your sample actually gets shipped to our lab in Los Angeles, where it gets processed. There are four major components to that. It starts with DNA extraction, then amplification of the DNA, then hybridization, and finally analysis of the data to make sure it’s good.
>>Okay, great. About how long does that process take, to do all those steps?
>>So if you took all of the steps and did them end to end, it would probably take three days or so.
>>Mm-hm.
>>A lot of the process is just waiting for the equipment to become available because of volume.
>>I see. Of those four processes, which is the most time-intensive?
>>The most time-intensive is the hybridization, which happens overnight. Amplification also happens overnight, though. So there are two major parts where we have to leave the DNA for a while and come back to it, so it has time to...
>>I’m curious: what’s the most expensive part of the process?
>>So the chip and the reagents are going to be the expensive part of the process. When you buy the chip, you also get the reagents, which are the chemicals that you need to run the chip. The other expensive component is probably labor, obviously, because we actually need to hire what’s called high-complexity staff to run our process. It’s CLS staff. And then the rest is actually fairly inexpensive.
>>Are there any international restrictions? Obviously, I’m guessing the lab time only takes, like you said, three days, but longer if you’re processing multiple samples.
It should take a lot longer for any international people who want to have this service done?
>>So the lab process doesn’t take longer. Obviously the shipping is an issue, and shipping can be expensive if you’re coming from Asia to California. That’s part of the reason why we want to add another lab, actually, just to help with that.
>>Uh-huh.
>>There are also issues with exporting DNA from certain countries. China and India specifically have laws around what DNA you are allowed to export out of the country. So that, plus other regulations regarding biological specimens, means that it’s actually quite difficult to add another country to our list of countries.
>>Tell us a bit more about the DNA extraction step. What actually happens? You get the saliva sample in the lab; what do you do to get the DNA out?
>>DNA extraction is a very normal thing that happens in labs all over the world. There’s nothing special about how we do our DNA extraction. But the process is really trying to separate the DNA in the nucleus of the cell from the proteins and other components of the cell that would otherwise be in the way.
>>I see, and how much DNA do you get out of the saliva sample that someone sends in?
>>It’s actually very variable. Depending on the person, we can get anywhere from ten micrograms to over a hundred micrograms of DNA.
>>And does most of that variation happen just based on the number of cells that were in that saliva sample?
>>We think so, but we actually don’t have very good information on that. We know that some people’s saliva is more watery than other people’s, and we know that, for example, infants do not often give us good data, and we think that’s partially because of the wateriness of that saliva.
>>Well, for those people who might be wondering, what kinds of cells are you actually getting in my spit?
What are the actual cells that [...]
>>[...] and white blood cells are the other major component.
>>Now, I do floss and brush my teeth, but I’m sure I have some kind of bacteria or germs in there. Is that a concern at all, having other non-me DNA in my spit sample?
>>Again, it’s not something we worry about too much. We know that people have a lot of bacteria in their mouths, and we know that from person to person there’s huge variability in the composition of those bacteria. But for amplification, we are happy to amplify up all of your human and bacterial DNA, and then let the sequencing or genotyping process sort out the human from the bacterial.
>>I see. So as long as it’s human, it’s good. But does that mean I shouldn’t go kiss anybody right before I give my DNA sample?
>>For the 23andMe test, we actually collect two milliliters of saliva from you, which doesn’t sound like a lot, but it can take five to ten minutes to fill that. That’s a lot of saliva. You’d have to be kissing a lot to make a difference.
>>[LAUGH] So when you finish the first step, the DNA extraction, you have essentially taken our sample and, like you said, purified the DNA from [INAUDIBLE], gotten rid of all the other stuff: the proteins, the other molecules that might be part of cells, and other kinds of contaminants. And you pretty much have just a purified DNA product.
>>Exactly.
>>Now, is that enough already to do what you want to do with it, or what do you have to do next?
>>Not really. It’s common in many situations like this to want to amplify the DNA. You always want to give yourself the best opportunity to sequence what’s in there. And the easiest way to do that is to take the DNA you got and amplify it, or copy it, as many times as you can, so that you have enough signal to make it easy on yourself.
>>I see.
So when you say you’re amplifying, or making copies, are you controlling how many copies you make, or are you just making ten more, 20 more, 100 more? How does that work?
>>It’s not tightly controlled, but in essence, at the end of the process, you end up with a thousandfold enrichment of the DNA. So you really end up with a lot more DNA.
>>So do you end up with DNA left over? After you’re done with the procedure, do you actually have leftover DNA for people?
>>Sometimes we do, but usually we store your DNA in the saliva form, so that we’re going back to the original copy. One issue with amplification is that every time you copy, just like in DNA replication in humans, there is a possibility for error. So those errors could compound if you keep copying the copy. Just like Xeroxing [INAUDIBLE]. [INAUDIBLE].
>>Now we have our extracted DNA sample that’s been amplified, so we have enough of our sample to do something with. Are we ready to go sequence it now? Is that what we’re going to do?
>>So the process that your DNA undergoes at 23andMe is a little different from sequencing. It’s called chip hybridization, or genotyping.
>>And so how does this differ from sequencing? What’s the difference between sequencing and what you mentioned as chip hybridization for genotyping?
>>So you can think of sequencing as just reading the genome from beginning to end, almost. You want to know every single letter and what it is. With genotyping, we actually decide beforehand which parts of the genome, which letters, are interesting to us, and then we specifically look at those letters, with the knowledge that these are the letters that maybe cause diseases or are associated with some trait that we have.
>>I see. So you have to actually put a lot of thought into it beforehand, to know exactly which positions, which bases, which letters you’re interested in.
How many letters, or how many bases, do you look at in an individual sample?
>>So we look at about a million on the current [UNKNOWN] chip. That means that when you get your results back from us, you can download a million different calls, or base pairs, from us.
>>Sequencing sounds like it gives you a lot more information. What are the prohibitive aspects of sequencing? Why doesn’t 23andMe do sequencing?
>>Yeah, sequencing is definitely the future of this field, and we’re definitely interested in sequencing when it gets cheap enough. But the main advantage of genotyping is cost. We know that to get the same results that we get with the 23andMe chip, we would have to charge at least ten times more with a sequencing product.
>>Wow.
>>Probably a lot more.
>>Oh, I see.
>>Genotyping has other advantages as well. Because you genotype using the process of hybridization, you actually get a lot of different shots on goal, if you like. You get to hybridize different molecules many times.
>>Mm-hm.
>>And you get the sum of all of those, or the average of all of those reads. You get a lot of information compiled into one data point. With sequencing, you might actually miss the part of the genome you were interested in, and that could be a problem.
>>I see. So in a way, you have higher confidence, even though you have maybe more limited information in the end. You have a higher confidence about the actual letter or base result for that position.
>>Exactly. And some people describe genotyping as almost like single-base sequencing, where we’re sequencing hundreds of thousands of molecules on each part of the chip. So there’s an advantage there that’s hard to replicate in sequencing.
>>Tell us a bit more about how this chip hybridization works, for people who are unfamiliar. I hear chip, and, well, sometimes I think potato chip.
But I know in this case we’re probably talking about something more like a computer chip. How does this work exactly?
>>Yeah, so the chip is really a glass slide that contains an array of DNA. The DNA is bound to glass beads that are scattered in a regular pattern on the chip. That DNA is designed by us to bind to the parts of the genome we’re interested in (it’s the process I talked about a minute ago), and it comes together through the process of base pairing, just like double-helix base pairing. And then you have either, let’s say, an A at the position we’re interested in, or a G. And depending on which of those two you have, it will either light up green or light up red at a specific position on the chip.
>>I see, so you essentially have a discrete, or quantum, unit. For every little DNA molecule that you’re testing, you get a little signal that says, oh, you have an A, or you have a G here.
>>Exactly. And because humans are diploid, you’ll also get situations where you have A and G, one from mother, one from father. So then what we see at that position on the chip is red and green, or kind of a yellow [INAUDIBLE].
>>Oh, that’s really clever. So it allows you to easily distinguish any given position that you’re looking at.
>>Yeah, so when we get the data at 23andMe, we can actually plot it on a simple xy plot with red on the y axis and green on the, sorry, on the x axis. And then we’ll see three clusters of people who are AA, AG, and GG, because those are the three possible situations for this position in your genome. Then, through the process called clustering, we can say: if you’re in this cluster, you’re AG, and if you’re in this cluster, then you’re AA.
So there’s kind of a simple analysis problem at the end of the output there.
>>I see. It sounds like you’re actually using the information from a pooled set of individuals. So even though for an individual call it may be difficult to know where on that graph you’re placed, if you use clustering, like you said, you can more readily identify: oh, you’re part of this group, so you’re AA, not AG, kind of thing.
>>That’s exactly right. And there are also situations you can imagine where we get even more information. You could imagine a situation where you have a deletion of the part of the genome we’re interested in. Then what you will see is a new cluster, another cluster of [UNKNOWN] three, with lower intensity, because there is less signal there, because there are fewer DNA molecules. So then a new cluster could emerge, and it could be as infrequent as one in a thousand people or one in a hundred thousand people. But we will see a nucleus of samples there.
>>Now, in lesson two of our course, we talked for the first time about the idea of base pairing. We learned about the base pairing rules and how things fit together. What I find really interesting is the idea that in double-stranded DNA, you have all these letters matching with each other. Is what you’re telling us that a single nucleotide variant is just one letter change here? That if one letter doesn’t match, the strand can’t actually hybridize completely, it can’t base pair completely, and that’s why you don’t see the signal for one, but you do for another? Is that essentially what’s going on?
>>It’s something very close to that. What we expect is that all of the letters leading up to the letter we’re interested in will bind.
And like you say, if there is an error there, one difference, then it won’t bind, or it will bind weakly, so the signal will be weak. But then for the specific letter we’re interested in, which comes one more base out, there’s a pool of A’s, G’s, C’s and T’s floating around that can bind there.
>>Mm-hm.
>>Or not. They’ll only bind if it’s the correct base pair.
>>And it’s unoccupied.
>>And it’s unoccupied. Then we’ll see the extension happen, the reaction that leads to the color that we can see in our imaging system.
>>That’s great. And so essentially, you said the chip is a slide, and you’re looking at a million bases in this one little chip, this one little space?
>>Yeah. So there’s even more than that. There are more than a million beads per chip, because it’s better to read the same variant many, many times.
>>Mm-hm.
>>Just to get a good average signal.
>>Mm-hm.
>>The reason for that is that if you’re in the corner, maybe you’ll get slightly different optics, because the camera’s looking at the center of the chip, or something like that. So you just want some redundancy.
>>I see.
>>So it’s more than ten million beads per chip. And that process has been improving over the years; the density of the chips has been increasing a lot.
>>Have you always used the same chip? How do you make changes if there’s a new trait, or a new SNV that you’ve become interested in? Are you able to change your chips so you can look at that?
>>Yes. We update the chip approximately every 18 months. We’re on the third iteration right now, and we’re constantly looking for new variants to add to the chip to improve its performance. So you can imagine, if there’s a paper published that says this mutation in this gene causes a disease, and it’s very interesting to go and find that out, we can design what we call a probe for that.
Basically, the sequence of DNA is attached to the glass bead, and we make sure it’s on the next iteration of the chip.
>>You mentioned that the hybridization process is the longest part. You said it happens overnight?
>>Yeah.
>>We’ve learned conceptually about base pairing, but do you have any technical insight? Why is this an overnight process?
>>Essentially, it just takes time for the DNA in the extracted saliva sample to find the right bead. You want to give it time to search around in the solution, find the right bead, and bind there. And that takes time.
>>I see.
>>It takes time just based on how quickly the DNA moves around and how tightly it binds.
>>And what do you use to measure the results? Are you just snapping a picture of the chip? What’s actually happening?
>>Yes, it’s very close to that. In essence, you get what’s called a fluorophore attached to the bead because of the presence of the A or the G at that position. That’s the red or green signal. You shoot a laser at it to excite the fluorophore, and then you take a photograph of it with a digital camera. So a lot of it is very much digital-camera based. It’s just interesting that that technology allowed for our technology.
>>Yeah. And I know how big my digital camera picture files are. That sounds like a lot of data, a lot of information that you end up with.
>>Yeah, so all of that data from this giant image file gets processed by software, that’s actually done by our labs in Los Angeles, and translated into the xy plot I mentioned before. Because for each bead on the chip, really all we’re interested in is how much red is in there, and how much green.
The photograph itself is not terribly useful after you’ve extracted that data.
>>Well, it still sounds like there is a lot that goes on from the time that you get the raw data and do your quality control, your QC check, to when I see it on my twenty three [UNKNOWN] profile page. Is all of that just computational biology at that point? Who’s looking at that data? Who’s organizing it? What’s going on there?
>>Yeah, it really becomes a computational biology problem once we get the data from the lab, and that’s really where 23andMe concentrates. Most of our staff here are computational biologists or software engineers, so that’s really where we’re best. The next major component there is called calling. That’s turning the data that we get from the lab into the AAs, AGs, or GGs that you can download in your raw data, or that produce the reports you see on our website. That’s kind of an involved process, but in essence it relies on looking at these xy plots and making sure that everything is where we expect it to be.
>>And the final results that we get back look so curated and so official. How much of the computational side of this is deriving what each letter means? Are you reviewing all the literature and selecting things? How are those decisions made in general?
>>So we actually have a pretty big content team at 23andMe. They’re the ones who scour the literature, look at what’s going on in the current science of genetics, and find out, oh, this mutation or this variant causes this disease, and it’s a new discovery. Then they write up what they think about that and try to put it in language that makes sense to our users. And if it meets our scientific criteria, it goes out to the website.
It is a little automated, and it can take time to just process the data through to the website. But the actual process of taking your data, calling all the variants, and all that is quite quick. You can think of the website as being in two parts. There’s the part where we just look at the calls that you have for disease, and the report just links that call to the results. So for example, we could say you are a carrier of cystic fibrosis because you have this variant. So that’s a scientific challenge, to figure out what’s going on in the literature, but it’s not an algorithmic challenge. The other part of the website includes things like our ancestry composition feature and our relative finder feature. Those are difficult algorithmic problems. You want to take the whole genotype sample, all million variants, and try to figure out, based on that, what this person’s ancestral composition is.
>>Mm-hm.
>>So you can imagine that that’s actually a more complex task and requires a lot of code, taking it from, you know, ABC through to the end.
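
[Editor’s sketch] The calling step described in the interview (plot each sample’s red and green intensity for one position, then assign it to the AA, AG, or GG cluster) can be illustrated with a small example. This is only a sketch under stated assumptions, not 23andMe’s actual software: the function name `call_genotypes`, the sample intensities, and the mapping of A to green and G to red are all hypothetical (the interview does not say which color goes with which letter), and a plain one-dimensional k-means on the plot angle stands in for whatever clustering method is really used.

```python
import math


def call_genotypes(intensities, iterations=20):
    """Cluster (green, red) intensities for one position into AA / AG / GG.

    Assumption for illustration only: allele A fluoresces green and
    allele G fluoresces red.
    """
    # Reduce each sample to its angle on the green (x) / red (y) plot:
    # near 0 means mostly green, near pi/2 means mostly red.
    angles = [math.atan2(red, green) for green, red in intensities]

    # Start the three cluster centres at the idealised positions for
    # pure green, an even mix, and pure red.
    centres = [0.0, math.pi / 4, math.pi / 2]
    labels = ["AA", "AG", "GG"]

    def nearest(angle):
        # Index of the cluster centre closest to this angle.
        return min(range(3), key=lambda k: abs(angle - centres[k]))

    for _ in range(iterations):
        # Assignment step: each sample joins its nearest centre.
        assigned = [nearest(a) for a in angles]
        # Update step: move each centre to the mean angle of its members.
        for k in range(3):
            members = [a for a, c in zip(angles, assigned) if c == k]
            if members:
                centres[k] = sum(members) / len(members)

    # Final call for each sample, using the settled centres.
    return [labels[nearest(a)] for a in angles]


# Hypothetical (green, red) intensities for six people at one position.
samples = [(950, 40), (1010, 65), (520, 480), (470, 530), (55, 980), (30, 1020)]
print(call_genotypes(samples))
# → ['AA', 'AA', 'AG', 'AG', 'GG', 'GG']
```

The sketch uses only the angle of each point, so it ignores overall brightness. The lower-intensity cluster that the interview attributes to a deletion would need the distance from the origin as a second feature.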
