You might have noticed that, until recently, MyVariant.info had not had a data update for quite a while. No, we did not go on vacation :-). Over the past few months, we have been busy refactoring our backend data aggregation and updating our code base. If you're interested, check out the more than one hundred commits we've made in that time to improve the efficiency and automation of our code.
The first data release built on this refactored backend went live last Thursday (03/23/2017). Many data sources have been updated to their latest versions for both GRCh37/hg19 and GRCh38/hg38 variants. No new data sources were added in this release, and all changes are backwards-compatible.
Data Sources Updated
Three popular data sources, ClinVar, dbSNP and dbNSFP, were updated to their latest versions (the same version for both the hg19 and hg38 assemblies):
Some numbers for GRCh37/hg19 variants:
Data source | Last release | New release | # of variants in last release | # of variants in new release |
ClinVar | 2016-11 | 2017-03 | 166,681 | 262,061 |
dbSNP | 147 | 149 | 153,037,251 | 153,968,878 |
dbNSFP | 3.2a | 3.3a | 82,366,649 | 82,366,649 |
Similarly, some numbers for GRCh38/hg38 variants:
Data source | Last release | New release | # of variants in last release | # of variants in new release |
ClinVar | 2016-11 | 2017-03 | 166,889 | 262,254 |
dbSNP | 147 | 149 | 152,728,552 | 153,745,925 |
dbNSFP | 3.2a | 3.3a | 82,443,934 | 82,443,934 |
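To make the size of each update explicit, here is a small Python calculation of the per-source change, using only the counts from the two tables. For example, ClinVar grew by 262,061 - 166,681 = 95,380 variants on hg19, while the dbNSFP counts are identical for 3.2a and 3.3a. The dictionary names and the `deltas` helper are illustrative, not part of MyVariant.info.

```python
# Variant counts copied from the two tables: (last release, new release)
HG19 = {
    "ClinVar": (166_681, 262_061),
    "dbSNP": (153_037_251, 153_968_878),
    "dbNSFP": (82_366_649, 82_366_649),
}
HG38 = {
    "ClinVar": (166_889, 262_254),
    "dbSNP": (152_728_552, 153_745_925),
    "dbNSFP": (82_443_934, 82_443_934),
}


def deltas(counts):
    """Return {source: (absolute change, percent change)} between the two releases."""
    return {
        name: (new - old, 100.0 * (new - old) / old)
        for name, (old, new) in counts.items()
    }


if __name__ == "__main__":
    for assembly, counts in (("hg19", HG19), ("hg38", HG38)):
        for name, (diff, pct) in deltas(counts).items():
            print(f"{assembly} {name}: {diff:+,} ({pct:+.1f}%)")
```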
ClinVar, dbSNP and dbNSFP annotations are available under the “clinvar”, “dbsnp” and “dbnsfp” subfields, respectively, for each annotated variant. MyVariant.info aggregates annotations from ClinVar, dbSNP, dbNSFP and 11 other sources for each variant, so you can access them all in one request.
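A minimal sketch of such a single request in Python, using only the standard library. It assumes a /v1/variant/<id> endpoint, the `fields` and `assembly` query parameters, and the example HGVS-style ID chr7:g.140453134T>C; none of these appear in this post, so please check them against our documentation. The helper names (`variant_url`, `pick_sources`, `fetch_variant`) are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://myvariant.info/v1"  # API root, as in the query URLs in this post


def variant_url(hgvs_id, fields=("clinvar", "dbsnp", "dbnsfp"), assembly="hg19"):
    """Build a URL for one variant, limited to the given annotation subfields.

    Assumption: the /variant/<id> path and the `fields`/`assembly` parameters.
    """
    # Keep ':' readable in the path; '>' is percent-encoded as %3E.
    path = urllib.parse.quote(hgvs_id, safe=":")
    params = urllib.parse.urlencode({"fields": ",".join(fields), "assembly": assembly})
    return f"{BASE_URL}/variant/{path}?{params}"


def pick_sources(doc, sources=("clinvar", "dbsnp", "dbnsfp")):
    """Return only the requested top-level subfields present in a variant document."""
    return {name: doc[name] for name in sources if name in doc}


def fetch_variant(hgvs_id, **kwargs):
    """Fetch and decode one variant document (requires network access)."""
    with urllib.request.urlopen(variant_url(hgvs_id, **kwargs)) as resp:
        return json.load(resp)
```

Usage: `pick_sources(fetch_variant("chr7:g.140453134T>C", assembly="hg38"))` would return whichever of the three subfields exist for that variant. Passing `fields` keeps the response small, and `pick_sources` simply tolerates variants that lack one of the sources.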
The total number of unique variants is now over 341M (341,289,677), up from over 340M (340,102,225) previously. More details about the variant data MyVariant.info provides are always available in our documentation, and the same information can be retrieved programmatically from our metadata endpoint.
New Field Added for Flagging Observed Variants: observed
To provide maximum variant coverage, MyVariant.info includes annotations for both observed and theoretical variants. Theoretical variants come from data sources like dbNSFP and CADD, which compute predicted impacts for theoretically possible variants across the human genome. In contrast, most other resources, such as dbSNP, ClinVar and ExAC, provide annotations only for “real” (observed) variants.
We have now added a new “observed” field to all observed variants, with the boolean value true ({"observed": true}). Theoretical variants do not get an “observed” field at all. Thus, you can easily filter for only observed variants:
http://myvariant.info/v1/query?q=observed:true
http://myvariant.info/v1/query?q=_exists_:observed (equivalent to the first query)
or combine with other query terms:
http://myvariant.info/v1/query?q=cadd.polyphen.cat:possibly_damaging AND _exists_:observed
http://myvariant.info/v1/query?q=dbnsfp.sift.pred:d AND observed:true
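When you build these queries in code, the spaces and colons in `q` should be URL-encoded. Below is a minimal Python sketch, standard library only. The helper names are hypothetical, and reading results from a "hits" list in the JSON response is an assumption about the response shape. Because theoretical variants simply lack the field, a client-side check for `observed` being true mirrors `_exists_:observed` for documents you have already fetched without that filter.

```python
import json
import urllib.parse
import urllib.request

QUERY_URL = "http://myvariant.info/v1/query"


def query_url(q, **params):
    """Build a query URL with `q` (and any extra parameters) URL-encoded."""
    return QUERY_URL + "?" + urllib.parse.urlencode({"q": q, **params})


def observed_only(hits):
    """Keep hits flagged as observed; theoretical variants have no "observed" key."""
    return [hit for hit in hits if hit.get("observed") is True]


def run_query(q, **params):
    """Fetch one page of results (network access required).

    Assumption: matching documents are returned under the "hits" key.
    """
    with urllib.request.urlopen(query_url(q, **params)) as resp:
        return json.load(resp).get("hits", [])
```

For example, `run_query("dbnsfp.sift.pred:d AND observed:true")` sends the last query above. Filtering server-side with `observed:true` is usually preferable, since only matching variants are transferred; `observed_only` is mainly useful for results obtained without that filter.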
That’s all! As always, feel free to reach us at help@myvariant.info or @myvariantinfo if you have any questions or feedback.