MyHeritage DNA 101

– So this session is MyHeritage DNA 101. I’ll go over the basics. Who here has taken a DNA test? Alright, so we’ll go over the basics. Some of you have already taken a DNA test, and some of you have already viewed the results. We’ll explain a little bit about what is going on behind the scenes: how it works, how we do that matching magic, and how the ethnicity estimate is calculated. It will be fun, so join me.
Alright, the basics: our DNA. We have 22 pairs of autosomal chromosomes and an additional pair of sex chromosomes, which determines our sex: an X and a Y for males and two Xs for females. Basically, all the information is hidden right in these, in each one of us. This is what we look at when you take the DNA test and send it to the lab, and I’ll show you what the result eventually looks like. This is where all those secrets, all those matches and ethnicities, are hidden. So that’s over 3 billion base pairs in those 23 pairs of chromosomes, but we don’t need to look at all of them. It would take us a lifetime to calculate matches for every one of those SNPs, those genetic markers that exist in our chromosomes.
I like this example and I use it quite often, so we’ll do it again. In this picture you see two genomes, one on the top and one on the bottom. They look much the same, but the one at the top is of a human while the one at the bottom is of a chimpanzee. So you can see that there’s a lot of resemblance; a lot of things look quite alike. And actually we’re about 98% similar: about 98% of our DNA is the same as a chimpanzee’s. When it comes to human beings, the number goes up. We’re 99.9% similar across those 3 billion pairs. Almost identical. What we’re left with is that 0.1% that distinguishes us one from the other and makes each of us unique, as at least I like to think that I am.
So, the ethnicity estimate. The definition that we go with is the architecture of genome variation between populations. Basically, it’s our attempt to distinguish and say you’re from this ethnicity, you’re from that one, or you’re a combination of a couple of these. Let’s see how it works. We won’t use specific ethnicities for this example; we’ll go with colors. So this is Mike, and Mike has a maternal side that looks something like this. Let’s say that his great-grandparents on the maternal side are purple; they come from the purple population, my favorite one. If all of these great-grandparents are purple, so are his grandfather and his grandmother, and so is his mom, and therefore Mike will be 50% purple. That’s the easy side of the family, because on the paternal side things are a bit different. Here we have two great-grandparents who are orange, one who is red, and one who is blue. As we go down the generations, each person inherits about 50% from each parent; the exact amount depends on which segments were inherited. So on the paternal side, his grandmother is orange because both of her parents are orange, but on the other side you can see that his grandfather is already half blue and half red, because he inherited approximately half from each of his parents. When it comes to his father, it’s already 50% orange and 25% each of blue and red. Eventually, when we get to our dear Mike, he ends up with 50% purple, 25% orange, and 12.5% each of blue and red. This, in a nutshell, is how we do the ethnicity estimate: we try to determine those different ethnicities, those different populations, not giving them colors but giving them names, and then we look at your DNA results and try to say what you have inherited.
We’ve conducted a project at MyHeritage called the MyHeritage Founder Populations Project. We started by looking at trees. We have millions and millions of trees at MyHeritage.
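As a rough sketch of the color example with Mike, the halving at each generation can be written out in a few lines of Python. The helper names `pure` and `blend` are made up for this illustration, and the exact 50/50 split is an idealization: as the example itself notes, the real amount depends on which segments were inherited.

```python
from collections import defaultdict
from fractions import Fraction


def pure(color):
    """An ancestor whose whole ancestry comes from one (color-coded) population."""
    return {color: Fraction(1)}


def blend(mother, father):
    """Idealized inheritance: the child gets exactly half of each parent's breakdown."""
    child = defaultdict(Fraction)
    for parent in (mother, father):
        for population, share in parent.items():
            child[population] += share / 2
    return dict(child)


# Maternal side: every great-grandparent is purple, so Mike's mom is purple.
mom = blend(blend(pure("purple"), pure("purple")),
            blend(pure("purple"), pure("purple")))

# Paternal side: two orange great-grandparents, one blue, one red.
paternal_grandma = blend(pure("orange"), pure("orange"))
paternal_grandpa = blend(pure("blue"), pure("red"))
dad = blend(paternal_grandma, paternal_grandpa)

mike = blend(mom, dad)
for population, share in mike.items():
    print(f"{population}: {float(share) * 100:.1f}%")
```

Exact fractions are used so that the halves, quarters, and eighths from the example come out without rounding noise.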
We were looking for people who have long chains of ancestors coming from the same geographical location, for example, or bearing the same surname for a long chain of ancestors. We spotted them, we marked them, and we said: these have good potential to be a good representation of an ethnicity; let’s call it, for now, Scandinavian. I think you’re familiar with that. We started with that, and we started sending sample kits, more than 5,000 of them, to different participants of the Founder Populations Project. We started sampling them. When we looked at their DNA, we were looking for signals, something that repeats itself. If we have ten people tested, we look at those ten people and try to see whether something stands out, something that is repeated for every one of them, so that we can say: this looks like a Scandinavian person, or this looks like an Iberian person.
These pictures were taken by us at MyHeritage. We have a wonderful project that I recommend you look at. It’s called Tribal Quest. We go all around the world, helping tribes in remote places document their family history. We build family trees for them, we sit with them, we meet with them. That’s kind of a side note, and I recommend you all look at it, but we’re also taking DNA tests and trying to preserve that unique heritage. These are not people who travel much around the world; they usually stay in the same place for a very long time. So we can take that DNA test and we can tell people from various places around the world: right there on the right, for example, is the Nenets tribe in Siberia, so you’ve got a little bit of Nenets in you, or Namibia, or the other places we went to or continue to go to. Alright, so we’ve collected all that information, and the Founder Populations Project came to an end. There’s no real end here.
It keeps on growing; we keep on learning, and we’re able to improve our algorithms, collect more data, and build more focused groups. So, we’re done, and now you are taking a DNA test. This is actually what’s being done. After we collected the founder population data, we built those models, those signals. We say, okay, in chromosome seven we see a pattern that usually resembles a person from Scandinavia. So when you take your DNA test, we compare your results against those models, and we try to break down your own ethnicity to give you an ethnicity estimate result. We aggregate it and compare it, and eventually we give you your ethnicity estimate report, which consists of those ethnicities that we have. As of today we have 42 regions at MyHeritage. They’re listed right here. And some surprises are expected.
Alright, so now let’s talk about DNA matching, and my example of Frank and Molly. We have 22 autosomal chromosomes, and in this example I’m showing you what happens to the DNA as generations go by. For reference, think of those two lines that you see at the top as chromosome number five. This is what Frank’s chromosome number five looks like, and this is Molly’s. His is purple, hers is orange; again we’re using colors, which are easier to understand. Remember, they are pairs. For each chromosome, each one of us gets one copy of the pair from our father and one from our mother, and that’s exactly what we see in Ben and Lisa’s case. You can see that Frank has the dark purple and the light purple, and in the middle of Ben’s chromosome number five you can see that the color changes. These are the recombination points, where the two copies of the pair recombine, switching between one and the other. That can happen more than once.
With Lisa it’s the same thing: she had two recombination points, so one copy of her chromosome five is a mixture of both copies from her father, Frank, and the same thing goes for the mother’s side. This is eventually what Ben’s and Lisa’s chromosome five look like. When we compare, we look at those places on the chromosome which are identical. We look for long segments that repeat themselves. So we can see on the right that Ben and Lisa share this amount of DNA on chromosome five. Then later on, Ben and Lisa each get married and have their own kids, and new colors come into play. So for Erik we have the new blue color, which came from Ben’s spouse. Same thing for Lisa with her daughter Julia. When we compare Erik and Julia, we see that things are a little bit different: there’s even less shared DNA between them. That’s why, as Elab said in his presentation, the amount of shared DNA between people with common ancestors goes down as generations go by. Erik and Julia’s common ancestors are Frank and Molly, so as first cousins they will share roughly 12.5%. But Ben and Lisa had approximately 50%, and that’s why it’s so important to test the more elderly relatives; we’ll get to that in a second.
So let’s look at the reports, because when you go online and look at your DNA matches you don’t see Ben and Lisa. We haven’t added them to the product yet. You see your DNA matches, the people with whom you share some amount of DNA, and we tell you how many shared segments you have with them and what their length is. And a percentage; that’s what most users look at: how much DNA you share with one person or another. When it comes to 50%, that probably means a parent, a child, or a sibling. 25% takes you one hop away; we’re talking about grandparents or grandchildren.
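The percentages in this part of the talk follow a simple rule of thumb: every generation step halves the expected shared DNA, and each shared ancestor adds one such path. The sketch below is only an illustration of that arithmetic. The relationship table, the function names, and the 5-point tolerance are assumptions made for the example, and real shared percentages scatter around these averages because recombination is random.

```python
# (generation steps between the two people, number of shared ancestors)
RELATIONSHIPS = {
    "parent or child": (1, 1),
    "full sibling": (2, 2),
    "grandparent or grandchild": (2, 1),
    "first cousin": (4, 2),
}


def expected_shared_percent(steps, shared_ancestors):
    """Average shared DNA: each shared ancestor contributes (1/2) ** steps."""
    return 100 * shared_ancestors * 0.5 ** steps


def candidates(observed_percent, tolerance=5.0):
    """Relationships whose expected value lies within `tolerance` points of the observation."""
    return sorted(
        name
        for name, (steps, ancestors) in RELATIONSHIPS.items()
        if abs(expected_shared_percent(steps, ancestors) - observed_percent) <= tolerance
    )


for percent in (50, 25, 12.5):
    print(percent, "->", candidates(percent))
```

A single percentage often fits more than one relationship (50% fits both a parent and a sibling), which is why further clues are needed to narrow a match down.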
At MyHeritage we use different algorithms to rule out some options. For example, if you get a 25% match with someone, that person’s sex is also something we take into consideration. Let’s assume we have a 25% match with a male: he can be either your grandfather or your grandson. But we’re also looking at age, so if someone signed up for MyHeritage and entered his year of birth, and we can see that he’s older than you by approximately 50 years, it probably means that he’s your grandfather; there’s no point in even suggesting a grandchild. So 25%, 12.5%, and so on and so forth: as the percentage of shared DNA decreases, we’re getting a little bit farther away. Our most recent common ancestors are a little bit farther away from us.
That’s exactly what happens in our case. This is a tiny bit of my family tree on MyHeritage. Right next to me is my beautiful wife Yulia and the youngest genealogist on earth to date, my daughter Mai; she’s three years old. I tested her DNA at the age of 2, approximately. It cost me a box of candies, but it went fine. It comes back to a question that was asked before: why are you testing your daughter if you have your own DNA? I tested my DNA way before I tested hers, but her mother did not want to test at that point. So by testing my daughter I can get my wife’s DNA matches. Don’t worry, she has tested since then; she’s very happy, she’s excited with the results. But when Mai actually gets a DNA match with a Jane Doe and there is a segment, an IBD segment, which means identical by descent, there’s a common ancestor. It sits right there at the top. We need to work hard to find that common ancestor in many of the cases, and I’ll talk about it in this presentation but mostly in the next one. The problem is that up there, there’s a common ancestor, and right here, over there, are the blue one and the purple.
They are both her ancestors, the common ancestors. The DNA got diluted as the generations passed, from the grandfathers to the father and eventually to my daughter. And the same thing happened with Jane.
Alright, so now let’s take it a step further. You’ve taken the DNA test. It goes to the lab, and it takes time; you wait, I know, for two, three, sometimes four weeks. And finally the results are ready. It’s great: the lab is done and we’re ready to present you with your results. It looks like that. That’s what comes out when you take a DNA test. No more nice pictures of the people from the Founder Populations Project, right? So this is your DNA; this is basically what comes back: 700 thousand SNPs or so that we’re testing, and then we look into those. What you can see there is the chromosome. It’s a very long list, 700 thousand lines or so. For each genetic marker we look at the result: what did we find in your DNA test? There are four letters, A, C, G, and T, and one more symbol that represents a no-call, where we weren’t able to find it. This is what we have to work with, and from that we need to find you some DNA matches and give you an ethnicity estimate. That’s not easy. So we’re doing a couple of things. I’ll focus on the first two, phasing and imputation. I hope this helps clarify some of the points I spoke about previously.
So what is phasing? As you saw in the file, there are two letters. The reason there are two letters is that you got one from your mom and one from your dad. To do DNA matching, we need to separate them. We need to try to understand which one came from the mother and which one came from the father. That will help us find DNA matches for you, because we’re looking at a single line.
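To make the idea of separating the two letters concrete, here is a minimal sketch of one classic approach, phasing a child against both parents’ results. This is an assumption chosen for illustration, not a description of the algorithms MyHeritage runs; the function names and the use of `-` for a no-call are likewise hypothetical.

```python
def phase_site(child, mother, father):
    """Return (maternal_letter, paternal_letter) for one marker.

    Genotypes are two-letter strings such as "AG". Returns None when the
    parents' results alone cannot settle which letter came from whom.
    """
    if "-" in child:              # no-call in the child: nothing to phase
        return None
    a, b = child
    if a == b:                    # homozygous: both copies carry the same letter
        return (a, b)
    # Heterozygous: keep the assignments that both parents could have passed on.
    options = [(m, p) for m, p in ((a, b), (b, a)) if m in mother and p in father]
    return options[0] if len(options) == 1 else None


def phase(child, mother, father):
    """Phase every marker of a child against the two parents."""
    return [phase_site(c, m, f) for c, m, f in zip(child, mother, father)]


child = ["AG", "CC", "AG", "--"]
mother = ["AA", "CT", "AG", "AA"]
father = ["GG", "CC", "AG", "AG"]
print(phase(child, mother, father))
```

The third marker stays unresolved because all three people carry both letters there. Cases like that are where statistical methods have to take over.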
Phasing is our attempt to break that apart. If I’m the child in this case and you see BB and AA and AA, we try to figure out how to split those, and we do that with smart algorithms that help us break them down and understand better which letter came from the mother and which came from the father. There are some cases in which we can’t tell, and we use statistical algorithms to figure those out. Alright, so we’ve narrowed it down; we’re done with the phasing. We have the first of the pair and the second.
At MyHeritage we allow uploads from other vendors as well, so if you’ve taken a test with someone else, we encourage you to upload your results. This is free up until December 1, when things are going to change. We welcome you all to come and upload your results if you’ve already taken a DNA test. The thing is that different companies look at different SNPs, different genetic markers. That makes our life hard because it’s not apples to apples: we cannot compare one DNA result to another and find good DNA matches, because there are some differences. So we do something called imputation. Here’s an example where you can see that the first SNP exists in both samples, right? But the second one is missing in one of them, and vice versa. We need to fill in those blanks; we need to understand what the result was there so we can give you good matches. The best way to think about how we do that is as fill-in-the-blanks. How many of you can read the sentence that appears at the top of the screen? Good, but some letters are missing; you don’t see them. That’s exactly how we do statistical imputation. Your brain can perform amazing statistical calculations and fill in the blanks even though some letters are missing. That’s what we’re doing for DNA. We look at the surroundings, those SNPs that we were able to see results for, and we use statistical algorithms to fill in the blanks.
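The fill-in-the-blanks idea can be sketched with a toy example: take a small panel of known sequences, find the ones that agree best with the markers that were actually read, and let them vote on the missing ones. Everything here is an assumption made for illustration, including the function name `impute`, the `?` placeholder, and the tiny panel; production imputation relies on far larger reference panels and more sophisticated statistical models.

```python
from collections import Counter


def impute(sample, reference_panel, missing="?"):
    """Fill missing markers by majority vote among the best-matching references."""

    def agreement(reference):
        # Count the observed markers on which sample and reference agree.
        return sum(1 for s, r in zip(sample, reference) if s != missing and s == r)

    best = max(agreement(reference) for reference in reference_panel)
    closest = [reference for reference in reference_panel if agreement(reference) == best]

    filled = []
    for i, letter in enumerate(sample):
        if letter != missing:
            filled.append(letter)
        else:
            votes = Counter(reference[i] for reference in closest)
            filled.append(votes.most_common(1)[0][0])
    return "".join(filled)


panel = ["AGCTCTCT", "AGCTCTCT", "AGGTCACA", "TGCTCTCA"]
print(impute("AGCTCTC?", panel))
print(impute("AG?TC?CA", panel))
```

The surrounding markers decide which references get a vote, so the same blank can be filled differently depending on its context.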
If there was a sequence of AGCTCTC, then usually the next one will be a T. That’s how we fill in the blanks, so we can do more matching between different vendors, and that’s something that’s unique to us. We’re looking for those sequences that repeat.
Alright, we’re starting to run out of time. These are the IBD segments. Once we’re done with phasing and with imputation, we look again at these results. We take the database, we have nearly 9 million people tested on MyHeritage, and every time a new kit comes in we match it against this database to see if we can find segments long enough that we can say there’s a potential good match here, a potential common ancestor for these two individuals. Once we find them, this is what we add to the application. This is what we show as your DNA matches. I think that’s it for now.
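The matching step described at the end of the talk, looking for long enough identical stretches between a new kit and everyone in the database, can be sketched as follows. This is a toy illustration under stated assumptions: each person is reduced to a single phased string of letters, segment length is counted in markers, and the names `shared_segments` and `find_matches` are hypothetical. A real pipeline would also have to tolerate genotyping errors.

```python
def shared_segments(hap_a, hap_b, min_markers=5):
    """Return (start, end) index ranges where the two sequences are identical
    for at least `min_markers` consecutive markers (end is exclusive)."""
    segments = []
    start = None
    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if a == b:
            if start is None:
                start = i             # a new identical run begins here
        else:
            if start is not None and i - start >= min_markers:
                segments.append((start, i))
            start = None
    end = min(len(hap_a), len(hap_b))
    if start is not None and end - start >= min_markers:
        segments.append((start, end))  # run that reaches the end of the data
    return segments


def find_matches(new_kit, database, min_markers=5):
    """Compare a new kit against every stored kit; keep those with shared segments."""
    matches = {}
    for name, haplotype in database.items():
        segments = shared_segments(new_kit, haplotype, min_markers)
        if segments:
            matches[name] = segments
    return matches


database = {"Jane": "AGCTCTGTAGGA", "Other": "TCAGAGACTCCT"}
new_kit = "AGCTCTCTAGGA"
print(find_matches(new_kit, database))
```

Raising `min_markers` trades fewer, more trustworthy matches against missing more distant relatives, which mirrors the "long enough segments" condition in the talk.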
