The Intersection of Genomic Ancestry & Health: Cutting Edge Tools for Conquering Health Disparities

EMI CASAS: Hi, everyone. So we’ll go ahead and get started. Welcome to the ENRICH Forum of the National
Cancer Institute. My name is Emi Casas and I’m from NCI’s
Biorepositories and Biospecimen Branch. We started the ENRICH Forum in 2013 to promote
dialogue and education on ethical and regulatory issues affecting cancer research. On behalf of our ENRICH Forum team I would
like to thank you for joining today’s presentation by Dr. Tim O’Connor. We have attendees joining us both in person
and online, so for those of you present in the conference room, we would appreciate
if you would please silence your cell phones. And if we can proceed to the logistics slide. For our online participants your lines will
be muted upon entry and will remain muted for the duration of the webinar. If you experience technical difficulties,
please use the chat feature on the right side of your screen and contact the host of the
webinar to assist you. We encourage you to submit questions by using
the question and answer feature. Please type in your question and select all
panelists before hitting submit. Feel free to submit your question at any time
and we’ll try to ask them during the question and answer portion of the presentation. And during the presentation if you want to
write down questions we can also go over those at the end. Unfortunately we will not be able to get to
all of the questions sent in due to time constraints, but we’ll do our best. If you require closed captioning, please refer
to the media viewer panel. You’ll be asked to enter your name. With that I’d like to please welcome Dr.
O’Connor. Dr. O’Connor joins us from the University
of Maryland, School of Medicine. He’s an expert in population genetics of
large scale sequence data. He received his PhD from the University of
Cambridge as a Gates Cambridge Scholar in 2011 and subsequently worked with Dr. Joshua
Akey at the University of Washington, Department of Genome Sciences. While there he joined the national team of
scientists working on the NHLBI Grand Opportunity Exome Sequencing Project. He joined the faculty of the Institute for
Genome Sciences and Department of Medicine at the University of Maryland, School of Medicine
in 2013 as Assistant Professor. There his team works on rare variation population
genetics, data analysis of next-gen sequencing and New World populations, such as Latino
Americans, African Americans and Old Order Amish communities. Dr. O’Connor has graciously agreed to participate
in a round table discussion on this topic after this formal talk and we invite those
present here to join us. That part will unfortunately not be broadcast. With that we’ll go ahead and welcome Dr.
O’Connor and move on to his presentation.

DR. O’CONNOR: So I’m very grateful for this
opportunity to speak with you all. And this is a topic that I get very, very
excited about. And so I’m going to tell you about some of
the work that we’ve been doing on this intersection between ancestry and genomic health. So one of the things I want to kind of really
emphasize in today’s talk is that we’re going to be talking a lot about tools and
we’re going to be talking about … so, for instance, I have no cancer data in my
talk. Sorry. There’s no cancer data. But there is a lot of other fun data that
we can start to look at. And kind of the key motivation behind this
talk I hope is that you’ll think about some of these tools in your own research programs
and how these might be appropriate for dealing with things like health disparities. So one of the issues that we have in the genomics
community, and in research in general, is that a good portion of our research is
in European Americans. And while that’s important and we’ve been
able to do a lot of really cool, answer a lot of really cool questions, what it means
is that some of these tools and some of these resources that we’ve developed around European
Americans are not actually very applicable to non-European individuals. And there’s a lot more non-European individuals
in the world than there are European individuals. And so from a personalized medicine standpoint
it’s important that we kind of extend some of these resources to other populations. This was really well put in an editorial from
Carlos Bustamante, Esteban Burchard and Francisco De La Vega. And these guys are really at the forefront
of bringing kind of genomic medicine to non-European populations. And so some of the stuff I want to talk about
today was created by them and others in the population genetics community, but also I’m
hoping to focus in on some of the examples from my own research in terms of how we might
think about and extract additional information from ancestry in terms of genomic health. So to give kind of a background, I know
that this is a diverse community that I’m hopefully speaking to, and so I want to give
a couple of slides of background. If this bores you, you can just go check your
email for a few minutes and then come back in, but I want to give some background to
make sure everybody’s on the same page. And I’m going to stand up just because I get
all antsy. So one of the things that we know from population
genetic theory and application of genetic data is that we’ve been able to illustrate
something called the Out of Africa Model. And what this is, this is about 200,000 years
ago, actually there’s a new fossil up here that just kind of came out recently, that’s
about 300,000 years ago, but basically modern humans evolved in Africa and then about 50
to 100 thousand years ago there was a large bottleneck. And what I mean by bottleneck is a small group
or a small subset left. And that changed the genetic variation of
those individuals. And as they left and went into different parts
of the globe, so about 40 to 50 thousand years ago they went into Europe and then they
went into Asia, and most recently they’ve gone into the New World, so about 15 to 16
thousand years ago they entered the New World. And they also really recently started entering
into different parts of Oceania, so like the Samoan Islands. The oldest fossils they have in Samoa are
from about three thousand years ago. And so over the course of time we’ve spread
out, we’ve been exposed to all sorts of different environments than what we originally
evolved in Africa. And as we’re exposed to those things, those
things start to change our genome. As we start to go through these dynamics of
subsetting our population multiple times, the genetic variation again is impacted. And so then most recently we’ve had kind
of this new pattern of migration. So what we’ve had in the last 500 years
is something called the Columbian Exchange. And as a result of that there was a lot of
admixture of genetic backgrounds that just hadn’t been in contact before. I’m going to show this slide a couple of times,
one in this context and one in the context of the project from which it originates. But basically what I’m wanting you to pull
from this slide is that all of us in the New World tend to be a mix of different
ancestral backgrounds. And what I mean by admixture is basically
it’s this migration, this combination of European haplotypes or variation and African genetic
variation and Native American genetic variation and others that come in varying quantities
in different populations of the New World. So as a basic understanding of kind of what
the underlying mechanisms of these tools are, evolution from a genetic standpoint is a change
in allele frequency over time. That’s kind of our simplest dogmatic statement
of what evolution is. And so what that means is if we take this
nice little population of gingerbread men, I’m sure you guys are all familiar with evolution,
but just to cover our basics, if we take our nice little population of gingerbread men,
this is our population, then they’re going to consist of some genetic variation. In this case at one site in the genome they
have either an A or a G nucleotide. So then what is evolution for this gingerbread
population? Well, in the first generation we might see
that the A is more prevalent than the G. But then over time through a number of different
mechanisms the chief of which would be something called genetic drift, which is basically that
since we don’t have an infinite number of courthouse, we’re flipping coins, but we’re
only flipping it once. So if you have a heads and a tails and you
flip the coin once, you’re only going to see a head or a tail. And so the next generation there’s going
to be some movement. The other kinds of mechanisms are things like
migration and selection and mutation; new variants arising in the population. And so over time these genetic variants are
going to change. So in the end this final population of gingerbread
men now have a higher frequency of this G allele than the A allele. Okay? But if this were to happen again in a different
gingerbread population, maybe it would be the reverse. And then we start to see that, you know, these
two populations of gingerbread men might be different than one another and we can start
to identify where they’re the same and where they’re different. So one of the key tools that we have in our
toolbox is the ability to estimate variation or to estimate ancestry genome-wide. And so we use genome-wide variants. We use variants, you know, from all across the genome,
and we can use some different algorithms to start estimating what the ancestry is of
every individual. So in this case each line represents an individual. It’s a lot of lines. The top box represents about 950 individuals. So each line is one person. And if you go down here, this red color means
that this individual is 100 percent European. You move over here, this individual is 100
percent African. But if you move into some of these kind of
intermediate individuals, in this African American cohort they can range all the way
from 20 percent African to 100 percent African. And so when we’re thinking about how we’re
going to stratify a research study or we’re going to think about disease risk, we need
to realize that it’s not just Black or White. That our African American communities and
individuals, each individual has a varying amount of ancestry coming from these different
components. And this could be that there was a very recent
move, say, you know, President Obama is always a good example of this, you know, his father
came in and was from East Africa and his mother, from America, was
European, so he’s probably got about 50/50. His ancestry would come out about 50/50. But he’s still an African American and we
have to think about that. But if someone had been an African American,
say his wife, she may be coming from a different part of the distribution because her
ancestors’ European contribution may be much older and may be much smaller. And so we have to really think about the individual
when we’re starting to think about ancestry proportions. Now why does this matter? Well, another reason this matters is we talked
a little bit about this bottleneck. One of the results of a bottleneck is that
you actually lose variation. So Europeans on average have less variation
in their genome than an African individual would. And so if you have an admixed individual. What you see here on the x axis is that same,
it’s basically that blue value. So it’s the percentage of African ancestry
that they have. And on the y axis it’s the number of variants
that you find that are different from the reference in their genome. Okay? And so we’ll return to this concept a little
bit, but basically if you’re thinking of this as kind of the baseline pool where there might
be a disease variant in here, it’s going to depend on how much European versus African ancestry
that individual has; how many variants you have to look through. Okay? Yes?

MS: How do you define the (unint.)?

DR. O’CONNOR: The reference?

MS: Calculate the (unint.)?

DR. O’CONNOR: This is the human reference. We’re just talking about the difference
between the …

MS: Is that for European?

DR. O’CONNOR: It is. But it actually wouldn’t matter. There’s another way of calculating this,
what’s called heterozygosity, which is just the number of places where that individual
has a difference. So they have one version of either. And that’s almost perfectly correlated
with your distance from East Africa. It’s really pretty cool. But, yeah. And it can be done a number of different ways. But in general if you run a targeted sequencing
or do any kind of genetics you’re going to get more variation in an African ancestry
individual than a European ancestry individual. And in African Americans you’re going to
see kind of the spread or the gradation of that. The other thing we can do though is we can
start to kind of take it one step further. Okay? On the right here is what we were just talking
about. This is this genome-wide ancestry. But what we can do is if we take one African
American individual we can actually look within their genome and find the mosaic pattern of
inheritance that they’ve had. So, for instance, in this case this individual
on chromosome one, the first part of it they only have African representation. But if you move a little bit further in they
have one haplotype coming from Africa and one haplotype coming from Europe. So in this case it’s actually quite interesting
because two individuals that have the same amount of African ancestry, the distribution
in their genome might be different. And that might confer differences in risk. So an example I like to use would be something
like cystic fibrosis. So this is a monogenic disease that comes,
it comes from CFTR. Most of the high frequency mutations that cause cystic
fibrosis come from a Western European ancestor. So if this African American walks in the door
and we knew his genetic profile or their genetic profile and we found out that actually they
have two copies of CFTR that are coming from Europe, the risk for having cystic fibrosis
might be different than if they have two copies coming from Africa. Okay? Thinking about it another way is if there’s
some sort of risk associated, ancestry-based risk associated with cancer you could start
to again profile them in this much more specific way. So they may be homozygous or heterozygous
for African ancestry at P53, or they may look like they’re European at P53. And in that case we might think about their
risk in a subtly different way. So today I’m going to outline three projects
that I want to go over. The first is looking at a large cohort of
individuals of predominantly African and European ancestry. Then we’re going to move into something that
is a very fun project that we were able to do in our lab, which is a Peruvian genome
project. And then finally I want to look a little bit,
kind of return back to this idea of variant number and look at clinical variation in
the context of African ancestry. Now again the goal of today is not necessarily
to give you some new insight into cancer biology, but hopefully to give you some insights into
some of the tools that might be available and applicable for things like health disparities. So this first project comes from NHLBI’s
Trans-Omics for Precision Medicine initiative or TOPMed. And as part of this project, this particular
project is based on a freeze of 18,000 whole genomes. The data set currently lives at about 65,000
whole genomes. And at the end of the project the goal is
to be at about 120,000. And with that a couple of things like RNA-seq
and other omics kinds of technologies. All of these individuals are kind of either
cases or controls for heart, lung and blood disorders or they’re from family studies
like the Amish that was referenced before that comes out of the University of Maryland
study. But one of the things that we can do with
this is we can start to look at genetic variation across a wide group of individuals. Now what you’re seeing here is we’ve broken
up the genome into kind of one-megabase chunks and we’ve looked at the amount of
variation that lands in those different chunks. Above this line here, this is the amount of
common variation or things that are seen in at least five percent of the individuals and
below here is the amount of rare variants that are seen, and by rare I mean less than half
of a percent. So these are things that are only seen just
a few times in the whole population. And what you’ll notice is that there’s
just this vast amount of rare genetic variation compared to the common variation. And what we’ve seen in other studies is
that most of this variation is also very recent; it arose in the last five to ten thousand
years. And so that can start to get into things like
you have one European individual walk in your door, they may look subtly different than
another European individual walking in your door. Okay?

MS: What does functionality mean there? Low functionality? High functionality?

DR. O’CONNOR: So we used what’s called an
in silico predictor of function. And so this uses things like evolutionary
conservation over the course of all of mammals or all of vertebrates and says how often has
this particular site changed? That’s one of the metrics. In this case it was slightly different. And so what I didn’t break down into, which
is a good point, is this is coding regions here in the black and the gray. There’s obviously a lot more variation
in the non-coding regions. And then this pink region here are those things
that are basically functional elements predicted out of the non-coding that seem to have even
less variation than the coding regions.

MS: Enhancers and stuff like that.

DR. O’CONNOR: Potentially, yeah. I mean, we don’t know exactly. We didn’t break it down into that type of
a bin, but, yes, it could be some sort of function or it could be a microRNA or an RNA
that’s just been functionally conserved over millions of years. Alright, so in this initial freeze we have
about 15 cohorts that come from across the country, each with hundreds of samples in
them. We’ve got some interesting different cohorts,
mostly African, Latino and European. But we also have Samoan and Amish cohorts
in there. And like I said they’re well characterized
for heart, lung and blood. And one of the things that we want to look
at is the relationship between these different cohorts. You can ignore most of this what’s called
a circos plot and I want you to look at this inner track here and the lines in between
them. And what this is is this is the amount of
times that in 10,000 individuals or 20,000 chromosomes this variant is seen either two
to 100 times. So these are very rare variants. And this is the amount of variation that is
shared amongst the different cohorts. Over here we’ve got our African cohorts,
that kind of form a strong cluster and over here we have some different European cohorts. Now looked at it a slightly differ-ent. So this is that same data. What we can start to see is that there’s
basically three main clusters that come out. This first cluster up here is our European
cohorts. This cluster down here is our African cohorts. And then the one in the middle is our Latino
cohorts. Now this is all kind of expected. These are pretty diverged populations from
one another. We kind of expect this. But what we didn’t expect, at least in our
lab, maybe others would have predicted it better than we did, but we didn’t expect
to have this much higher intensity of African sharing than between different European cohorts. And I should say that this has all been corrected
for things like heterozygosity and sample size. So the underlying genetic variation is not
what’s explaining this. And so it kind of raised the question of well,
is there some level of diversity that’s going on within our European cohorts that’s
not really seen in our African cohorts? And so we used a tool called fineSTRUCTURE. This is similar to the admixture that we kind
of talked about with the genome-wide estimates of admixture. But here we take kind of a much more focused
analysis; we know kind of what our source populations are, we’re learning from them
what their characteristics are and then we apply it to a population as a whole. And so what you can see… So this right here I’ve just pulled out
four different cohorts, two European cohorts, an African American cohort from San Francisco
and an Afro Barbadian cohort from Barbados. And what we can see is that in this case,
this African American cohort from San Francisco has a little bit of this Han Chinese ancestry,
which is different than all of the other African American cohorts, at least in this study. What turns out is it’s actually consistent
with this forced migration or this use of kind of Chinese immigrants to help build the
West, right, to build the railroads and all of these things. And as a result of that there’s been a subtle
mixture between this African American community and this Han Chinese community. Very small portion. But very interesting. All of the other African American cohorts
don’t have that, they don’t have this piece but they’d otherwise look very similar
to this population. Now our Afro Barbadians they’re also quite
distinct. They have a little bit in two ways, one is
they have a little bit more of this African component from this group called the Yoruba
that live in Nigeria and surrounding areas. And so they have a little bit more there. And then they also have a little bit of this
population here, which is a Spanish population, okay, so they have a little bit of a component
of Spanish ancestry that’s also not seen in other African American populations. In general the European ancestry in African Americans comes from France
and Great Britain and kind of Western Europe. So there’s some subtle difference here. But then when we start looking at our Europeans
we also start to see some of the same things. So a lot of them are the same, but in this
particular case this cohort had a lot more Eastern European than this other cohort. So they had, even though it was coming from
Europe there’s this subtle diversity difference that we might see amongst these different
individuals. And you can kind of, we can kind of imagine
that, right, you know, you might expect a little bit more Irish if you’re coming from
Boston, if you’re coming from Pennsylvania you might expect a little bit more Dutch. As different parts of the country were populated
by different European groups those ancestral components still remain today. But they’re very subtle and we have to look
at them in these kind of large data sets. So what we did to kind of pull this together
is we actually looked at the sharing, so on the x axis is basically how similar they are
in that fineSTRUCTURE and on the y axis is how many rare variants they’re sharing. And you can see that there’s kind of this
negative correlation between the amount of rare variants they’re sharing and how different
they were in that fineSTRUCTURE. This again is that Barbadian cohort. And so here it’s kind of an interesting
outlier in that they share a lot of the rare variants, but again they have kind of a unique
fineSTRUCTURE, and so we’re able to pull that out. The other piece to this that starts to get
really interesting is when we start to think about functional variation. So in this case what I mean by functional
is things like stop codons and missense variants, okay, things that are
going to functionally change the protein or potentially change the protein. And so that’s, when we start looking at
that and how those variants are distributed, compared to something that’s found in the
random part of the genome or intergenically we want to know, well, is one going to be
more likely to be a cohort specific variant than another? And what we’ve
measured by this delta is the amount of cohort specificity of these different classes of
variants. And so what we find is that these things that
are far more likely to be functional, something that you would want to follow up in the lab,
create a mouse model behind, they’re also more likely to be cohort specific. And so if you’re designing an epidemiological
study and you have a limited amount of money, you say, well, look, I’ve got all of my
hundreds of cases of prostate cancer and they’re all coming from Boston, I’ll use my controls
from Chicago, because Europeans are all the same, but what this is saying is that those variants
that you’re most likely to want to follow up on are also the ones that are most likely
to be just due to ancestry. And so we need to think carefully about how
we design these kinds of studies, because if that’s the case there’s going to be
a lot of things that we’re going to follow up on that are just, they’re not going to
be biologically, at least in that way, functional towards whatever disease we’re looking at,
they’re just not going to be that interesting. Alright, so in summary the different European
and African cohorts have distinct fine-scale ancestry contributions. This is really apparent when we start looking
at rare variations. And finally these functionally important variants
that we’re most likely to follow up on they seem opportunity be cohort specific. They seem to be kind of really ancestry specific,
even correcting for things like allele frequency. Alright, the second project that also is,
this one just gets me really excited, because it’s so much fun, is looking at the Peruvian
Genome Project. And this was done in collaboration with Dr.
Heinner Guio, from the Peruvian INS or Instituto Nacional de Salud or National Institute of
Health. So this is their version of the National Institutes
of Health. They do a lot of really cool studies. And one of which we were able to connect
with them on is they’ve actually started sampling a lot of their native communities
and trying to create a bio bank and to ethically sample these individuals in a way that we
can do things and preserve the genetic variation that’s going to be important for understanding
their genomic health in the future. And so we’ve sequenced 150 individuals from
these populations and they represent different parts of the country. In general we’ve broken them down into kind
of an Amazonian region, the High Andes and the Coast. And then we genotyped some additional ones
as well. So in essence we have about 215 samples that
we’re going to look at today. So one of the things that’s really cool
about Peru is that it’s diverse. So this is kind of a cross sectional look
at Peru. And so we have communities that live down
on the coast. These are some of my collaborators that work
in Lima. These are some individuals that live in the
High Andes. And when I speak about high altitude, if you’ve
ever been to Denver it’s twice what it is in Denver. It’s very hard to breathe up there. And this is actually a community that we went
to and visited where one of my friends who’s a neurogeneticist offered to take some consults
and immediately you had individuals that wanted to partake in that. So there’s a lot of isolated communities
that don’t have equal access to health care. And so it’s important for that reason as
well. And then even more remotely we have some populations
from the Amazon. In this case Dr. Guio actually had to travel
for multiple days by plane and boat and walking to get to these populations that are just
incredibly remote and as a result of that isolation they have a high propensity for
kind of genetic diseases. And so it’s actually important
that we start to sample these populations as well. Can’t talk about Peru without putting a
picture of Machu Picchu up there. But I actually put this up for a couple of
different reasons. One, so Machu Picchu is part of this kind
of Native American history that they have. And what we’re going to find throughout
this study is that policies from things like the Incan Empire have actually affected the
genetics of modern day Peruvians, and in different ways. And so we’ve been able to start teasing
those things apart. But it’s interesting that our history including
very recent history, can really dramatically impact what kind of genetic variation we’re
going to find. So you guys are all now familiar with this
kind of a plot. This is another admixture plot. Colors are different a little bit. Here our African cohort is represented by
this kind of bright green and our darker green is our European cohort, our European representatives. This top line is coming from a project called
the 1,000 Genomes Project that many of you I’m sure are aware of. And they did a really good job of sampling
globally. But where they weren’t able to do as well
or where we were hoping to be able to improve a little bit is within their American cohorts. So these are their four American cohorts. And you can see that most of their cohorts
are predominately European. So this is a Mexican cohort, a Puerto Rican
cohort, a Colombian cohort. These purplish colors, purple and pink colors,
this is amounts of different areas of Native American ancestry. So as we start to see there’s, this Colombian
cohort has more than this Puerto Rican cohort in terms of Native American ancestry. Now this final one here, this is a group that
they sampled from Lima. And what you can see is that they have, out
of all of the cohorts in 1,000 Genomes, they have the most Native American ancestry. This is sequence data, I should say. This is based on a genome-wide array and these
are a bunch of Native American populations. And so you can see there’s a lot more Native
American ancestry amongst these different populations. But mostly it’s this kind of lighter pink
and this purplish color. So this is basically Central America. These are some isolated Amazonian resilient
populations. But what we really didn’t have is kind of
a combination of the two, right. So we have array data with a lot of Native
Americans. We have sequence data with very few Native
American haplotypes. And what we were able to do as part of the
Peruvian Genome Project is combine the two. So for most of our individuals, over
90 percent of their genome is of Native American ancestry. And it’s high quality, whole genome sequence
data. And so our hope is that this will be a good
reference panel for those that are working in both Latino and Native American communities
in terms of identifying the variation that’s found in those healthy populations. I should also specify that this paper, it’s
a preprint that we have on bioRxiv and this link here will take you to that preprint. So going back to this idea of the amount of
variation that we can find in any given individual’s genome. Now we’re looking at heterozygosity and
so as we talked about before it’s kind of a surrogate for the amount of variation you’re
going to find in their genome. And up here we’ve got our Africans that
have the most, our Europeans have less, Asians have even less and our Mestizos or Latino
communities, which are here in this kind of bluish colors and then this red one from 1,000
Genomes, these ones have about the same as kind of our Asian cohort. But when we start to look at our Native American
cohorts down here in these purples and pinks they actually have a lot less variation. Now partly this is due to things like inbreeding,
but it’s also due to just the fact that these guys are so far removed and so isolated
from the original kind of source population in Africa that they’ve just lost a lot of
genetic variation. One of the things from kind of a more anthropology
type of question that we wanted to ask was when did these different groups kind of form
and how did they split off from one another? And so we do what’s called a phylogeny or
a tree based approach. And what we can find is that we have a cluster
up here of Amazonian populations, we have a cluster down here, just a couple of coastal
populations, and then in the middle here we have some Andean populations. And using some different types of approaches
we actually were able to estimate the divergence of these three groups at about 11 to 12,000
years ago. So if we return back to that Out of Africa
model, I told you that people came into the New World about 16,000 years ago. And what’s so interesting about Native American
communities is that essentially between 16,000 and 14,000 years ago (there’s an archeological
site in Chile from 14,000 years ago) they populated both North and South America. It was incredibly rapid. And as a result of that rapidness each of
these populations are quite diverged from one another, right? So a Native American coming from the Navajo
is going to look 12,000 years diverged from an individual coming from Mexico or South
America. So the Native American components are going
to be these kind of very strict, kind of discrete, it’s not really discrete, but it’s going
to be much more discrete than say a British individual compared to a German individual
or even a Finnish individual where they’ve had a severe bottleneck. So there’s these quite old and deep divergences
amongst the different Native communities. And so again from a personalized genomic standpoint
it’s important that we start to access and evaluate these kinds of genetic variations
in a lot of diverse Native American populations so that we have access and know what is healthy
variation versus what might be disease causing. One of the other things we can look at is
migration patterns. Now in this case we’re looking at subtlety
old migration patterns. And what we can see here on the right is in
the bluish colors are regions where there was high migration. The brownish regions are regions of low
migration. And what you can see is that the Andes Mountains
have essentially created this block to gene flow, which is pretty interesting as it’s
kind of confirmatory to what we were expecting. What we’ve done with some other studies
is we actually show that there seems to be more migration down the mountains than there
is up the mountains. And that can be for both cultural and biological
reasons. But those are still things that we’re trying
to suss out. On the left here, what you can see is the
diversity. So regions with kind of these purplish colors,
so like along the coast are more diverse, whereas some of these isolated populations
out here in the Amazon are actually, they’re just not very diverse. And that goes along with the amount of heterozygosity
that we were looking at before. And so we need to think about basically how this
impacts the distribution of variation across Peru. So now to use a separate tool, so this is
something called principal component analysis. We use this in a wide range of different
types of studies. I have colleagues that work on proteomics
and they use principal component analysis. It’s a great tool that allows you to take
millions of variables or genetic variants and summarize them into kind of key axes of
variation. In population genetics or with genetic data
this tends to get dominated by kind of biogeography or where these individuals came from in the
globe. And so we can see here our African cluster
comes out here, so this creates kind of an axis of African variation. Our European cohort is down here, so that’s
kind of, not the opposite of, that’s not the right word, but it’s another extreme
of variation. And then up here we have our Peruvians with
a different kind of sampling and ancestry component. And then what we find is that Latinos and
Mestizos are some combination of these three. So if African Americans came out here, these
are what’s going on here with our Latino or Mestizos populations. So what we can do is we can take this idea
of local ancestry or breaking down each individual’s genomes into its constituent parts of ancestry
and we can combine it with a principal component analysis and what we can do is kind of a targeted
look at their ancestry. So in the case here if we extract just the
European segments of their genome, only looking at the European component we can see how they
cluster amongst other European populations. And the same with African and Native Americans. And what this does is this allows us to kind
of get an admixture-free view of these individuals. So they’re part European, part African,
part Native American, but where is their Native American coming from? Where is their African coming from? Where is their European coming from? We can start to get into a little bit more
detail. And so where we start off here is amongst
our samples in Europe. So we’ve now extracted those European components
of their genome and looked amongst other European populations. This is the same PCA on both sides. Here we colored those individuals that come
from Europe. And you can kind of see a little bit of a
map. So this is northern Europe, this is the Iberian
Peninsula, this is into southern Europe and so forth. These are the Basques, which are
kind of an isolated population. And then over here, in the same analysis, the samples that
were grey here we’ve now colored. And you can see that for the most part they
come out in the Iberian Peninsula. And that’s kind of what we expect from history. Similarly, down in Africa, most of our samples
are coming out with West African ancestry. And what we know from the slave trade is that’s
where most of our African ancestry individuals in the New World came from. They came from Nigeria and kind of Sierra
Leone, and all those areas on the west coast of Africa. When we do this in our Native American context
what we actually discover is we’re able to recapitulate essentially a map of Peru. And so what we see here on the top is kind
of northern Peru, we see the Amazonian populations out here, we see some of our coastal and Andean
populations here, and the further south you go the more south you’re going to be with this
PCA. And so these work as a very good understanding
of kind of biogeography. Now what’s interesting about these tools
is there’s a few things that will break them from this assumption of biogeography. One of which is migration, right? So we’ve tried to eliminate that a little
bit. But if you were to put me on a PCA of Europe, you’d expect by my
last name that I’d come out by Ireland, that I
might come more towards that, but I have ancestry from other parts of Europe as well. And so that mixture in me makes me kind of
come out in a different place. And so it’s cool that we can start to think
about the individual’s ancestry but we can also think about it in terms of where they
might not fit in terms of these kind of biogeographic signals. But what this did do for us is confirm that
a lot of our samples are kind of georeferenced correctly, that they’re coming from the places
where we sampled them from. Another important methodology that we’ve
kind of developed over the course of, and by developed I mean others have developed
and I’m glad I get to use, is something called identity by descent. And so what this does is it looks genome-wide
at regions of the genome that two individuals might share in common. So in this case we have a pedigree, right,
we have a grandparent up here that has this red chromosome and they passed it on in part
to both of their kids. And then those cousins now share this small
little rectangle of ancestry from that one grandparent. Okay? And so what we know is
that recombination will slowly break these segments down. But what’s nice about that is that it means
that the length of these segments is going to be inversely correlated
with how long ago their common ancestor lived. And so we can actually start to tease apart
how these individuals are related to each other over time, which is really pretty cool. Sorry, this gets me all giddy because it’s
super fun. But anyway, so what we can see here is we’ve
done that, we’ve broken it down into varying segment lengths and we can see that the relationship
amongst our different Peruvian individuals suddenly changes over time. In this group we see that there’s kind of,
you know, it’s just a general cluster. Here we start to see that the Andean region
is towards the center. Here the center of influence tends to move
away from that. And then here in the most recent timeframe
we actually see that our Native American communities become more and more isolated. Now what’s even cooler is we can use some
of these approximations and based on the length of these segments to actually put dates on
things. So this is before the Incan Empire, this is
during the Incan Empire, this is during Spanish Colonial rule and this is after Peruvian independence. And so there are a few different things and
some alternative methodologies that we’ve used to kind of confirm this, but what we
find here is that this kind of structure is consistent with a process called mitma.
So when the Incan Empire ruled and they had
these groups of individuals that were causing problems, they would move them into an area
that was not causing problems. And if they had a group of individuals that
was not causing problems then they would move them into an area that was causing problems. And in that way they would be able to kind
of suddenly tamp down any kind of struggles that they might have from the people that
they were conquering. Later, in the Spanish rule, they did something
similar, but different. And in this case it’s called reducciones.
And what they did is they said, well, everybody
needs to live like we do in Europe, they need to have a town, they need to have a post office,
they need to have kind of a central square and so we’re going to take these kind of
isolated villages and shove them all together and make a big town. And in addition to that the center of rule
moved from Cusco, or the high Andes, into Lima. And so we start to see that subtle pattern
as well. And then here, after independence these communities
started to separate off and they started to basically become isolated. And what you’ll notice, the blue, which
are our Latinos, they tend to become less connected to these other populations. And so our Latino communities were essentially
formed early on with these different policies of mitma and reducciones that eventually
led into modern day Latino populations. So as kind of a summary of the Peruvian Genome
Project we talked about how the three areas were kind of colonized about 12,000 years
ago. In the course of even before the Inca and
during the Inca there seemed to be this Mestizos, I’m going to call them Mestizos at this point,
because Latino doesn’t make sense, but there was this mixing amongst Native American components
that they had a mix of different parts, Native Americans from different parts of the country. What’s really cool was that in a subsequent analysis
we show that the European components that they all inherited most recently didn’t
actually happen until after Peruvian independence or at least predominantly the bulk of it didn’t
happen until after Peruvian independence. And it happened with these groups that were
already cosmopolitan. So these groups that basically had been pulled
from their native communities and kind of meant to assimilate under the Incan, under
the Spanish, later also assimilated and received kind of gene flow from the Spanish. So now we’re returning back to this picture
that we showed earlier on. So this comes from a consortium, looking at
asthma. The goal of this project was a couple, there
were a couple of goals. One of which was to develop a genome array
that was specific to non-Europeans, since this is the African ancestry individuals. And, of course, to look for asthma. But as the population genetics I said I can
do something with this data, and so I did something slightly different. And so what we did is, well, first off we
looked at things like fineStructure. So this is again another IBD network. And I brought this up because I wanted to
kind of reiterate this idea that just because we have a label doesn’t mean it’s homogeneous. And what you can see here is actually two
different cohorts. One from San Francisco and one from Baltimore. And if they’re linked together it means
that they share a certain amount of IBD together. And what you can see is that there’s actually
these clusters that even in the course of the last 300 or 400 years African Americans
have started to diverge amongst the different populations across the U.S. And so we see this cluster here from individuals
of San Francisco. We see this cluster here of individuals from
Baltimore. And again another cluster here in Baltimore
and D.C. And so your grandparents matter, where your
grandparents lived matters and will affect kind of the distribution of genetic variation
that you have. So the other thing that we wanted to look
at is what’s called a variant prioritization problem. So if you’re given kind of this big pool
of genetic variation and you’re trying to figure out what’s causing the disease,
right. You want to know in the case of this, this
was inspired by a project where we were looking at a NICU. So we were getting sick babies that were coming
in and we wanted to sort through all of the variation that they had to see if we could
find the variant that might explain what’s causing their problems. Okay? And so everybody, this was kind of a trope,
but I like it, so it works, but it’s this idea of looking for this really small needle
in this really large haystack. Now if we layer on top of it the complexities
that we’ve been talking about today in terms of ances-try it starts to suddenly change
how we might approach this problem. This is just kind of a figure to just kind
of look at, if we have this European cohort here, they might have this many variants that
we have to sort through that are kind of clinically of interest. We may have done some filtering and other things. And that gives us, you know, this amount of
stuff that we have to look through. But as we’ve been talking about African
ancestry individuals have more variation than European individuals. So they’re starting off with this larger
haystack. But the second thing that’s
going on is how do we define clinically important variation. We do it based on our prior knowledge. We do it based on things that we’ve found
previously. And it’s something like 80 or 90 percent
of all GWAS have been done in Europeans. Almost all clinical studies have been done
in Europeans. And so all of these kind of known clinical
variants that we have in things like ClinVar, which is a great database, they’re going to be biased against variation
we might find in an African individual. There are things that are going to be explaining
the disease in a European family, but they’re going to be rare, and so they’re probably
going to be likely to be unique to that population, whereas the Africans, they might have just
as much disease or less disease or more disease or whatever, but the specific causal variant
is also likely to be rare and population specific, so we probably don’t know about it yet. And so what we found is we wanted to look
at how these kinds of biases exist over time or might play a part in how we might sort
through clinical variation in non-European populations. So we took this cohort of about 950 African
ancestry individuals and we compared it against what ClinVar is calling pathogenic. This study ended about 2015, but what you
can see here on the x axis is the monthly freezes that ClinVar has made. And so they’ve made a lot of progress over
the years and are increasing dramatically in the amount of variation and I’m sure the
number has continued to rise. Okay? When we start to look at, let’s take it
a half a step back, and we do something like this, where we correlate the amount of African
ancestry and the amount of ClinVar variants in this case that these individuals have,
we can get a correlation. So in this case the correlation’s really
tight, it’s .99. What we’ve done here is we’ve mapped that
correlation over time. And what you can see is that they’re kind
of correlated up here, then there’s this dramatic statistically significant in both
directions drop in the course of a month and then there’s been a subtle climb kind of
back. But this seemed a little bit off to us. And so what we did is we started to filter
things. We started to use these kind of in silico
predictors of function. We started to approach this more like
Baylor does when they apply these metrics to kind of a clinical genome. And from that we started to basically remove
a lot of this rare variation that’s out there and we can see there’s this strong
negative correlation with African ancestry over time. Now there seems to be this kind of uptick
here in this same month. Now this is kind of rumored, so it’s word
of mouth. My understanding is that in the course of
this month’s change they dumped a whole bunch of African ancestry BRCA mutations into
ClinVar. And that has subsequently been kind of curated
and now it’s made a much smaller contribution to modern ClinVar. And so what the composition of our genetic
databases look like is greatly going to affect the outcome of our ability to look at variation
in non-European individuals, right? So the takeaways that we kind of get from
this are that, one, Africans are going to have a larger haystack to begin with, but
they’re going to have a biased sampling if we start to restrict it to what we’ve
learned before, which means it’s going to cost more and our ability to diagnose is
going to be less efficient, which is just not okay. Now that said I think as a scientific community
we are starting to address these issues. For example, TOPMed, where more than
50 percent of the samples are of non-European origin. I know in clinics and in other sequencing
studies and in genome-wide association studies, people are starting to include these non-European
cohorts. And what I’m arguing and I think others in
the population genetics field are arguing for is a broader inclusion of multi-ethnic
and multi-ancestry individuals. MS: So is it at all possible to design a clinical
trial using a cohort of say Europeans and make predictions from the ancestral genes
in that cohort as to the outcomes in a non-European cohort population. I mean that would be what you really want
to do, because these are really hard. Those goals are almost unachievable. DR. O’CONNOR: I disagree that they’re unachievable. I do think they’re lofty, but I disagree
that they’re unachievable. Now one of the things that we’re kind of
glossing over a little bit is if you find a gene that’s causing a disease, right,
and it got knocked out for whatever reason in a European cohort, it’s likely to have
the same function if it’s knocked out in an African cohort. But the variation that’s knocking it out
will probably be different. And so we need to think about basically the
translatability of some of these things over to different populations. But you’re absolutely right. MS: Then you have to start thinking about
selection pressure and then
it’s like a whole different variable you have to layer on top of that. DR. O’CONNOR: That’s true. Their environmental history or whatever and
that gets really complicated. MS: The other assumptions that can be missed. If you’re talking about drug metabolism
or something like this, it’s not a single mutation or a single area. DR. O’CONNOR: Actually in drug metabolism it
tends to be more single variants, single gene. MS: But that’s proven not necessarily (unint.). DR. O’CONNOR: Let me give you an example of
I guess what I’m arguing for here. So one example of a drug is clopidogrel,
which is Plavix, I believe it is. MS: Yeah. Good example. DR. O’CONNOR: And there the variant is very
high impact, because it’s a prodrug. It has to be metabolized within the body to
be able to turn into something that can be helpful to the individual to stop … it’s
an anticoagulant. In Europeans that varies, in Africans it
varies. In Oceanic populations they are almost near
fixed for not being able to metabolize this drug. And so when the company that was behind this
required it to be the Medicaid drug of choice Hawaii had some problems, because now they
had people being treated with the drug that was essentially being washed out of their
system. And so we do need to, I guess that’s what
I’m arguing for. Now if they had a different variant that caused
the same variation in an Oceanic population absolutely. I don’t think the biology is fundamentally
different. One of the things, I’m highlighting differences
in this, but it is a little bit of a misnomer in that we’re 99.99 percent the same. We differ at three to five million sites in a three
billion position genome. It’s just going to be … it’s very subtle,
but that subtlety arises from biology. And that biology can be different. And so what I guess I’m arguing for is the
broader inclusion of these other populations to see if, hey, maybe it is the same. But it’s also very possible that it’s
different. And that kind of brings me back to kind of
what I’m saying is we can’t use these homogeneous labels. We can’t say Black or White or European
or African. There is subtle variation within them, both
on the European and the African American and especially on the Native American side. There are these subtle variants that are going
to be different and population specific and we need to think about those when we’re
trying to find the biology behind these connections with disease. These population genetic tools help us to
start delineating some of that and help us start to zoom in on that. And then finally we need to think about how
we might expand our current clinical evaluations and cohorts like we’ve been talking about
so that they’re less Eurocentric. Just some quick acknowledgements. Most of this work has been done by my students
who are phenomenal. But it’s also been done in the context of
large collaborations. And these are some of the kind of key names. And this is from this CAAPA cohort, this
African ancestry cohort. And again these are a broad group of individuals
that have been, it’s been really fun to work with. And then finally this is just a list of the
PIs for TOPMed. This cohort represents something like 100
or 200 investigators. And it’s been a phenomenal group to work
with them as well, to learn from and to interact with them. So with that I’m happy to take any questions
you guys might have. FS: So right now we’ll open it up to questions
from the audience here in person. For those of you in person if you could please
state your name before asking a question. And for our online audience, you can submit
your questions using the question and answer feature on the right side of the screen. Just type in your question, select the option,
to send it to all panelists and hit submit. FS: Great presentation. Thank you very much. Very clear. You helped understand… DR. O’CONNOR: Good. FS: There have been a couple of hints throughout
the presentation about the challenges of enrolling diverse participants in studies. And when you were talking about the Peruvian
Project you alluded to this being an approach that would enroll those participants ethically. And I think it’s a unique case study. I’d be interested if you’d be willing
to share a little more about the ethical review process, what you had to do to get approval
from not only the Peruvians and the NIH IRB (and you should probably know that there’s a representative
in the room from that group as well), but also engagement with the indigenous communities
as well. DR. O’CONNOR: So I am not the force behind that. That was very much a lot of work by Hiner
(ph.), who’s the contact at NIH. But in essence my understanding was that he
went not only to the individuals for consent, he also went to the tribes and kind of the
varying levels of structure above that to make sure that it was okay to work with those
communities. And he did so through a process of including
linguists and other individuals to try and include, to make sure that the … there’s
been a lot of problems with this and especially in the United States in terms of inclusion
of Native American populations. And so one of my hopes in working with them
is that we’ve done our due diligence to try and include individuals at multiple levels
to be part of the process. FS: Do you know if that has been described
anywhere? DR. O’CONNOR: He is working on describing it,
yeah. I agree. I think it would be fantastic. So I know he’s planning on writing it up
and trying to publish it. FS: Thank you. DR. O’CONNOR: Yeah? FS: Question, what is in terms of the array
used? So you showed by, some kind of arrays and
sequencing. And I got the impression you’ve got some
more detailed information from this sequencing. But if you can elaborate? We are the investigators getting what advice
to use. Of course the genome-wide array gives the
regulator some information, but … DR. O’CONNOR: You can do local ancestry calls
without genome-wide or without sequence data. You can do it with array data. What ends up happening though is … FS: (unint.) DR. O’CONNOR: For local ancestry you’re usually
okay. Now if you wanted to do a GWAS I would say that you have to use an ethnically
(unint.). So as one of the results or one of the things
that came out of this CAAPA consortium is they were showing if they run the GWAS with kind
of traditional GWAS panels of variation, they miss a lot in African ancestry individuals. It really is pretty dominant in African ancestry
individuals. The way that these arrays are designed,
they are trying to summarize genome-wide genetic variation. But they don’t do a good job of summarizing
it in African ancestry individuals. That said, I mean, it’s a first pass, it’s
okay. But one of the cool things that came out of,
so Kathleen Barnes and Rasika Mathias were the PIs of this CAAPA consortium. And one of the cool things that they did in
conjunction with Illumina, what’s the other … the other consortium is the PAGE
consortium and they worked with Eimear Kenny there to develop a new array that works
in non-European populations and captures a lot more variation in non-European populations. So they worked with Illumina. It’s now called the MEGA chip, which I’m
going to get wrong, but I believe it stands for Multi-Ethnic Genomic Array. And I believe Illumina has also taken
it a step further and kind of reduced the variation even further into something they’re
calling like an international array or a world array or something like that. And so I would recommend if you’re doing
a GWAS in non-European populations I would recommend using one of those platforms. In terms of local ancestry pretty much any
genome-wide estimate is going to be able to give you some pretty good accuracy, because you’re trying to capture something
that’s much larger. So you don’t need as fine a scale. FS: (unint.) DR. O’CONNOR: So there’s a few different things. We wouldn’t have been able to evaluate clinical
variation in these individuals if we had looked only at non-sequenced data. Certain kinds of the demographic estimates
require sequenced data. Some of the tools work with both, some of
the tools only work with sequenced data. The sequenced data can be used in both. But array data can only be used for some
of them, and so from that perspective, and I think the third is just independent of what
we actually accomplished is now we have access to 150 predominantly Native American individuals
with their whole genome sequence that we can compare to any cohort now. This should serve other researchers as well and should
provide a good resource to them for understanding natural variation. FS: Will you include that (unint.)? DR. O’CONNOR: I could talk to Gonzalo. I like Gonzalo. We get along well. If you notice he’s one of the people in
TOPMed. We’re still kind of early stages. This is small beans for Gonzalo. This is a small …
FS: Except that if it’s specific to that population as a resource it wouldn’t be
small beans. DR. O’CONNOR: Yeah, that’s a good point. So, yeah, that is something that we should
consider doing. I agree. FS: I mean, we thought about trying to follow
up with him and figure out sources of ethnically diverse sequence data that could be included
in his panel. DR. O’CONNOR: Right. I guess the reason why I’m saying it’s small
beans is because the imputation server is now encompassing TOPMed. It’s starting to encompass the HRC? FS: Yeah. (unint.) DR. O’CONNOR: Which is also a very large consortium.
know, 100,000 individuals, including 100 Native Americans and they might get washed out in
that large of a thing. But it might be really, you’re right, it
might be really critical for some individuals looking at Latino communities. FS: Yeah. DR. O’CONNOR: Yeah? FS: (unint.) DR. O’CONNOR: Do I know where they are? FS: Yes. Where they are. And
the second question is (unint.)? DR. O’CONNOR: So all of the samples are at the
Peruvian National Institute of Health, other than they sent me one (unint.) of DNA that
I got sequenced and now no longer exists. All of the other samples, the blood, everything
is at the Peruvian National Institute of Health. As far as I know they know it’s a resource
that’s available. It’s something that the Peruvian National
Institute has made very public and they’re very excited about having this research and
they are trying to make it available to a wide range of investigators who have other
different questions, including this sequenced data. We’re in the process of transferring the
sequence data back down to the Peruvian NIH as well. So it is going to be a broad resource and
as far as I know that’s part of the consenting process. FS: So we have a couple of questions from
the remote participants. First, can you suggest any public databases
where we can find exome data on Latino populations? DR. O’CONNOR: Where you can just download the
raw data it becomes a lot more difficult. But in general if you want to go look at genetic
variation, there’s the ExAC database or its new version, which I believe is called gnomAD. These are large aggregation consortiums that
are run by Daniel MacArthur out of the Broad. And they have as a subset of that individuals
they call, I think they call them admixed Americans, which … they’re Latinos. And so in terms of allele frequencies and
what genetic variation is found in the exome that is the source to go to. In terms of having something you can combine
with other data into one, where you have individual-level data, that I know less about. FS: And one more question. Are commercially available ancestry prediction
kits accurate, like 23andMe? DR. O’CONNOR: I was wondering if I was going
to get this question. If you had asked me six months ago or a year
ago I would have said no. I think they’re greatly improved. FS: Is that a result of the FDA? DR. O’CONNOR: So the FDA … well, okay, that’s
a lot more complicated piece. Thanks for making me talk about that. So 23andMe was basically told to cease and
desist in terms of giving risk predictions, because they were not working in a CLIA
facility. FS: Right. DR. O’CONNOR: Their ancestry stuff never stopped,
because the ancestry stuff doesn’t tell you about … FS: Health issues. DR. O’CONNOR: Yeah. FS: I just wanted to know if the increased
oversight made them more … DR. O’CONNOR: I don’t know. The other one that’s the big one in the
market is AncestryDNA. And they’ve never had any sign, as far as
I know they’ve not had the same level of interaction with the FDA. I don’t work for them. But in both cases, at least I can speak mostly
from AncestryDNA, just because I’ve had the most experience with their platform,
they’re starting to incorporate things like error bars, which, to me,
starts to alleviate some of my concerns, because some of the big things that they were doing,
I felt like they were overselling. They were saying oh, you’re this percent
from France, you’re this percent from Great Britain. And the models that they were using did not
allow you to do that. They were overfitting their data. And their accuracy, according to their
own presentations, was about 50 percent. So they were getting it quite wrong. But now what they’ve done is they’ve started
to incorporate some error models and I think if you’re a scientist, if you’re smart
about it, you can start to look at it. I still have a little bit of concern about someone
like my mom, who is not a scientist, and how she might interpret it. She might get the summary
figure and walk away and say oh, this is the end. And so, you know, you have to kind of be a
little bit cautious in terms of how you do it. If you start to see things like oh, look,
this says I’m two percent Ashkenazi Jewish or this says I’m two percent African, the error bar is going to overlap with zero. So it’s more likely that you actually have
zero if you didn’t … you know what I mean? You have to be very cautious about it. And so I would just say that as scientists
and as people are starting to use these kinds of things and as the general public starts
to become more informed, we just need to be cautiously optimistic about how we use their
results. MS: As far as what percentage and so on, how
certain are you? DR. O’CONNOR: So the difference is that we’ve
… the model that we use … okay, this is hard to explain without going into the details
of it. The model that we’re using is a supervised one; that particular example is something called
a supervised analysis. The model that they’ve generally been using
in AncestryDNA is an unsupervised one, which means that they run it and fix the number
of clusters or ancestries that they’re going to find. Now that they’ve transitioned over to a more
supervised one, where they say here’s our cluster, we know these individuals are from
Spain, we know these individuals are from Great Britain, and so on, you’re going
to start to see some more accuracy. But I think the other thing you’ll have
to go back and notice is how I tried to discuss it: I tried to lump
some of those clusters together, right? So like the fact that we’re separating Great
Britain and France, I would call that Western Europe. That’s how I would interpret it in the paper. I would not say that they’re this much more
British and this much more French. But when it’s your own genome you kind of
care if it’s French or German. You want to know which genealogical records
you should be going through. So I think it’s all about how the data is
presented. But it’s a good point. Maybe I should change my slides. FS: So we’re going to go ahead and close
this portion of the event and move on to the discussion afterwards. So we’ll go ahead and turn off the WebEx. And for those of you who want to stay here,
you’re more than welcome to stay. We’ll be here until three. DR. O’CONNOR: But before you turn off the WebEx,
my email is on there, or if you look for me, I’m happy to take questions by email. FS: Thank you very much. That was a great talk.
