What is DNA Overlap on GEDmatch? – A Segment of DNA

if you haven’t logged on to Gedmatch
recently then one thing you might see when you’re looking at your match list
is overlap and what is overlap why is it important and why is there so much
confusion we’re gonna try to tackle that today howdy I’m Andy Lee with Family
History Fanatics and this is a segment of DNA be sure to subscribe to our
Channel and click on the bell if you’d like to be notified about upcoming
episodes now overlap has caused a lot of
confusion among people and it just started to be included in the match list
when the Gedmatch Genesis program came out now since that time Genesis has been
merged over to the regular Gedmatch website and so overlap is here to stay
and understanding it might help you understand why you’re seeing some of
your matches and why maybe you’re not seeing some of your matches so let’s get
into it to begin DNA companies test your DNA and
they look at somewhere between 500,000 and 900,000 individual locations out of
3 billion now between different companies they each select which ones
they want to choose and because of this some of those sites are the same sites
and this is what we call overlap so for instance Family Tree DNA might overlap
with 23andme in about two hundred thousand of those locations so it might
be that 23andme has 300,000 locations that are different from anything that
family tree DNA has and Family Tree DNA might have 300,000 locations that are
different from anything that 23andme has now because there are thresholds in
matching you have to match a certain number of centimorgans but you also have
to match a certain number of SNP’s that’s those individual locations
that are being tested the amount of overlap is important because while both
23andme and Family Tree DNA may each test about
500,000 locations it’s only the same locations that they can look at it’s
only that overlap that can be looked at so for instance instead of 500,000
locations that are spread along those there’s really only 200,000
where they both test the exact same SNP’s, in other words, the overlap that
they test that you’re comparing what this means for segments is as you get
into smaller and smaller segments there’s less and less SNP’s and at some
point you may not have enough SNP’s to meet that threshold so you may lose a
segment that really is a match but just because of where the different people
tested you’re not seeing that let me put up a little chart here to
help explain it more and this is a list of the different testing companies and
their different chip versions each time they change the chip version they change
which of those SNP’s they’re including on each one of those tests and now I’m
just using ten because the screen is not big enough for 700,000 so we’re just
gonna be looking at ten and you can see hey ancestry with their version one they
tested these ones of the SNP’s and then their version two tested a little bit
different same with 23andme same with my heritage Family Tree DNA and living DNA
so next let me highlight a couple of tests let’s look at the ancestry test
and the 23andme test their most recent versions you can see that ancestry they
test on two three five six nine and ten 23andme they test on two four five seven
nine and that’s it and so the overlap between these two is
really only positions two positions five and positions nine now if we look at a
different test if we look at Family Tree DNA in comparison to that ancestry test
we see that Family Tree DNA they test on two three four seven eight and ten
but the overlap with ancestry is only two three and ten so because of that
there’s a lot of these gaps in the information and if those gaps get big
enough then you might start to lose matches that
really should be there but aren’t there because of the thresholds
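
The ten-position chart can be written out as a rough sketch using Python sets. The chip layouts below are the ones read out from the chart. The names and the cutoff of four shared positions are assumptions made up for this toy example and are not a real Gedmatch setting.

```python
# Toy chip layouts from the ten-position chart (most recent versions only).
ANCESTRY = {2, 3, 5, 6, 9, 10}
TWENTY_THREE_AND_ME = {2, 4, 5, 7, 9}
FAMILY_TREE_DNA = {2, 3, 4, 7, 8, 10}

# Hypothetical cutoff for this toy example; not a real Gedmatch threshold.
MIN_SHARED_SNPS = 4


def overlap(chip_a, chip_b):
    """Return the sorted positions that both chips test."""
    return sorted(chip_a & chip_b)


def enough_snps(chip_a, chip_b, minimum=MIN_SHARED_SNPS):
    """True when the two chips share at least `minimum` positions."""
    return len(overlap(chip_a, chip_b)) >= minimum


print(overlap(ANCESTRY, TWENTY_THREE_AND_ME))      # [2, 5, 9]
print(overlap(ANCESTRY, FAMILY_TREE_DNA))          # [2, 3, 10]
print(enough_snps(ANCESTRY, TWENTY_THREE_AND_ME))  # False, only 3 shared
print(enough_snps(ANCESTRY, ANCESTRY))             # True, all 6 shared
```

With only three shared positions the cross-company comparison falls under the made-up cutoff, while two kits from the same chip pass it easily, which is the effect the thresholds have on small segments.
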
what does the overlap table then look like if we are comparing each one of the
different tests now the ISOGG wiki shows you how much overlap there is
between the tests and you can see on this table and you can see in the link
below how the different tests overlap and in most cases they overlap by at
least a couple hundred thousand and sometimes as much as you know six
hundred thousand SNP’s but the Gedmatch overlap is a little bit different so
based on the test that I’ve taken with all the different companies at different
times I created my own table of overlap and you can quickly see that the overlap
in the Gedmatch database is not the same as the overlap that’s listed on
ISOGG now why is this well Gedmatch doesn’t look at every single SNP that
is tested Gedmatch started back in 2010 and at that time Family Tree DNA
and 23andme were both offering tests it was the
23andme version 3 chip now the 23andme version 3 chip had just over 900,000
SNP’s on it and you can see based on the overlap that Gedmatch is actually
looking at the vast majority of those almost 850,000 of those with the Family
Tree DNA chip they had about 700 thousand SNP’s on it and Gedmatch
decided to look at the majority of those also about six hundred and twenty
thousand and most of those overlap with ones from 23andme so there’s really only
a few thousand of the Family Tree DNA SNP’s extra that they added in now why
they didn’t go with the entire group of 23andme and Family Tree DNA SNP’s I
don’t know but they still got the majority of them now as time went on
23andme changed their chips ancestry added their chip MyHeritage added a DNA
test but theirs was using the Family Tree DNA chip so there really wasn’t
much change there and then we also have living DNA that added a chip as well as
some other companies now because of the way that Gedmatch is storing their data
my guess is that they didn’t want to necessarily expand the overall SNP’s that
they were looking at because that might then invalidate some
of the past results and so they kept with this same base of about eight
hundred and seventy thousand SNP’s I know that that’s really what the limit
is because I’ve created a super kit a Gedmatch super kit using their tool
and using their tool it comes up with eight hundred and seventy thousand one
hundred and ninety-two SNP’s and that is using all of the tests that I’ve taken
from all the different companies now I’ve created my own super kit I showed
you in a video before and that kit has over 1.5 million SNP’s in it and those
are all individual SNP’s so there is you know six hundred thousand of those SNP’s
that Gedmatch is just ignoring so they’ve got a specific selection of
those SNP’s and it’s really based on that version 3 of the 23andme chip and
the original Family Tree DNA chip now what this means when it comes to
comparisons is that if you have a 23andme version 3 chip or a Family Tree DNA chip
or a MyHeritage chip then you are going to get the best matching results when
you’re matching with other 23andme Family Tree DNA and MyHeritage chips now
the results with the ancestry chip are still pretty good because it still
covers about 400,000 of the common SNP’s between them but once you get into the
23andme version five chip the latest version they have and if you’re using
the living DNA chip then what you’re gonna see is we’re starting to get into
much less DNA that’s being compared and that’s where this overlap might be a
problem now Gedmatch points this out when you’re looking at the columns on your
match list if the overlap column is highlighted in various shades of red or
pink that just means that the overlap is really small and so the results may not
be very reliable in that case the major downside is that you’re going to miss
out on some matches that you wouldn’t have otherwise so one reason that I
encourage everybody to test everywhere is because the information you get is
different now when you’re uploading that information to Gedmatch then there
really is a priority as far as what the best information to upload is
so for instance the 23andme version three chip that’s the best information that
you can upload the my heritage or Family Tree DNA chip that is the second best
information that you can upload then there is the ancestry DNA chips
that’s the third best information to upload and finally you have the other
23andme chips and the living DNA chip so if you haven’t tested at ancestry Family
Tree DNA or my heritage I encourage you to test at one of those places and make
sure you upload that data to Gedmatch and use that to help in your matching results so
next I wanted to show you this table and this is a table that basically combines
that information from the other table I know that there is a maximum threshold
of around eight hundred seventy thousand which is the maximum number of SNP’s
based on my super kit that Gedmatch is looking at and this table is showing the
percentage of what each one of those chip combinations has and how much
you’re actually going to be matching with so for instance if you are matching
a 23andme version four kit to a 23andme version five kit you’re only using
eleven percent of overall SNP’s that Gedmatch looks at and so there’s gonna be
a lot of information that you miss on the other hand if you have a 23andme
version four kit and you’re comparing it to an ancestry version two kit you get
about 33 percent of the overall SNP’s it’s three times the amount of
information that you’re looking at between those two kits now like I said
the best kit is going to be the 23andme version 3 kit which if you’re comparing
it to another 23andme version 3 kit 97% of all the SNP’s that they’re looking at
are included in there and so there’s very few matches that you’re going to be
missing out on because you’re not missing out on very much
information now if there was a way for Gedmatch to retrofit their website and
be able to bring in all this other SNP data from these new chips then comparing
a living DNA chip to a 23andme chip would be as accurate as comparing an
ancestry chip to an ancestry chip you’d still have some issues when you’re
comparing you know a 23andme chip to an ancestry chip
but those issues would be minimized because we’d be looking at a whole lot
more SNP’s overall there is nothing that you can do to overcome the problem of low
overlap the only thing that can be done is for people to test with other
companies and upload the results to Gedmatch so when you’re looking at your
match list if you’re comparing it to one of the match lists on one of the other
websites and you’re not seeing some of the same people who you know have uploaded
to Gedmatch it might just be that there’s not enough overlap in the SNP’s
to be able to actually identify that match of course it’s always important to
remember that each website has their own algorithms that they use in order to
determine what a match is and so there might be some cases where there’s plenty
of overlap but one site just their algorithm doesn’t call that a match but
another site does ideally you’re gonna find multiple matches that go back to a
common ancestor and in this case overlap’s not such a big deal because you
have multiple matches that are all leading back in the same line so that’s
what you should focus on if you’re dealing with just one match make sure
you’re paying attention to whether or not you have good overlap whether that
box is highlighted in red or not and if you have any questions about overlap
that you would like me to answer put it in the comments below if you like this
video give it a thumbs up and make sure you share it with all your friends
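
As a rough companion to the percentage table described in the episode, the quoted percentages can be turned back into approximate SNP counts. The sketch below assumes the super kit total of 870,192 is the ceiling that the percentages are measured against. The function and variable names are made up for illustration, and the counts are only estimates because the quoted percentages are rounded.

```python
# Ceiling reported by the super kit tool in the episode.
GEDMATCH_SNP_CEILING = 870_192

# Percentages quoted from the table in the episode.
QUOTED_PERCENT = {
    ("23andMe v4", "23andMe v5"): 11,
    ("23andMe v4", "AncestryDNA v2"): 33,
    ("23andMe v3", "23andMe v3"): 97,
}


def approx_shared_snps(percent, ceiling=GEDMATCH_SNP_CEILING):
    """Estimate how many SNPs a quoted percentage of the ceiling represents."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return percent * ceiling // 100


for (kit_a, kit_b), percent in QUOTED_PERCENT.items():
    count = approx_shared_snps(percent)
    print(f"{kit_a} vs {kit_b}: {percent}% is roughly {count:,} SNPs")
```

The 97 percent figure works out to roughly 844,000 SNPs, which lines up with the "almost 850,000" mentioned for the 23andme version 3 chip.
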

