Another fresh data release for MyVariant.info is out! As we mentioned in our last data release, a refactored backend for data aggregation and updating is now in place, which streamlines our process of keeping variant annotations up to date. In this data release, we have updated the data from ClinVar, dbSNP and dbNSFP to their latest versions and added variant annotations from UniProtKB. Here are more details.
Data Sources Updated
Three popular data sources, ClinVar, dbSNP and dbNSFP, were updated to their latest versions (the same version for both the hg19 and hg38 assemblies):
Some numbers for GRCh37/hg19 variants:
| Data source | Last release | New release | # of variants in last release | # of variants in new release |
|---|---|---|---|---|
| ClinVar | 2017-03 | 2017-04 | 262,061 | 282,772 |
| dbSNP | 149 | 150 | 153,968,878 | 238,894,687 |
| dbNSFP | 3.3a | 3.4a | 82,366,649 | 82,366,524 |
Similarly, some numbers for GRCh38/hg38 variants:
| Data source | Last release | New release | # of variants in last release | # of variants in new release |
|---|---|---|---|---|
| ClinVar | 2016-11 | 2017-04 | 262,254 | 282,956 |
| dbSNP | 149 | 150 | 153,745,925 | 335,499,682 |
| dbNSFP | 3.3a | 3.4a | 82,443,934 | 82,443,748 |
ClinVar, dbSNP and dbNSFP annotations are available under the “clinvar”, “dbsnp” and “dbnsfp” fields, respectively, for each annotated variant. MyVariant.info aggregates annotations from ClinVar, dbSNP, dbNSFP and 12 other sources for each variant, so you can access them all in one request.
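Because each source is just a field in the same variant document, several sources can be requested together by passing a comma-separated list to the fields parameter. The helper below is a minimal sketch, not an official client: the function name myvariant_variant_url is hypothetical, and it only percent-encodes ">" (as %3E), which is the one character the curl examples in this post encode.

```shell
# Minimal sketch (hypothetical helper, not part of any official client):
# print a MyVariant.info variant URL requesting one or more fields.
# Only ">" is percent-encoded (to %3E); all other characters pass through as-is.
myvariant_variant_url() {
    # $1: HGVS-style variant ID, e.g. chr5:g.97171338A>G
    # $2: comma-separated field list, e.g. clinvar,dbsnp,dbnsfp
    # $3: assembly, hg19 or hg38 (optional; omitted means the API default, hg19)
    mv_id=$(printf '%s' "$1" | sed 's/>/%3E/g')
    mv_url="http://myvariant.info/v1/variant/${mv_id}?fields=$2"
    if [ -n "$3" ]; then
        mv_url="${mv_url}&assembly=$3"
    fi
    printf '%s\n' "$mv_url"
}

# Usage: curl "$(myvariant_variant_url 'chr5:g.97171338A>G' clinvar,dbsnp,dbnsfp hg38)"
```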
A notable change in this data release is that the number of variants from dbSNP increased significantly: for GRCh38/hg38, it went from about 154M in build 149 to about 335M in build 150, more than doubling. Most of the new variants come from the TopMed and HLI projects. You can see the full announcement here.
The total number of unique variants is now about 424.5M (424,515,266), compared to about 341M (341,289,677) previously. More details about the variant data MyVariant.info provides are always available from our documentation. Programmatic access to this information is available from our metadata endpoint (and the hg38 metadata endpoint).
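As an illustration, the helper below prints a metadata URL for either assembly. It is a sketch under assumptions: the function name myvariant_metadata_url is hypothetical, and the exact paths (/v1/metadata, plus assembly=hg38 for hg38) are our assumption about how the two metadata endpoints are exposed, not something stated in this post.

```shell
# Minimal sketch (hypothetical helper): print the metadata URL for an assembly.
# ASSUMPTION: hg19 metadata lives at /v1/metadata and hg38 metadata is selected
# with the assembly=hg38 query parameter.
myvariant_metadata_url() {
    # $1: assembly, hg19 or hg38; anything other than hg38 falls back to hg19
    if [ "$1" = "hg38" ]; then
        printf '%s\n' "http://myvariant.info/v1/metadata?assembly=hg38"
    else
        printf '%s\n' "http://myvariant.info/v1/metadata"
    fi
}

# Usage: curl -s "$(myvariant_metadata_url hg38)" | python3 -m json.tool
```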
New Data Sources Added
In this data release, we added variant annotations from UniProtKB, including “humsavar.txt”, an index of manually curated human polymorphisms and disease mutations from UniProtKB/Swiss-Prot. You can access the data under the “uniprot” field; note that this field is only available for hg38 variants. Here are a few query examples:
curl 'http://myvariant.info/v1/variant/chr5:g.97171338A%3EG?fields=uniprot&assembly=hg38'
curl 'http://myvariant.info/v1/variant/chr4:g.88121708T%3EC?fields=uniprot&assembly=hg38'
curl 'http://myvariant.info/v1/variant/chr15:g.51464769A%3EC?fields=uniprot&assembly=hg38'
You can also reference a variant using UniProt’s VAR ID (also called FTId):
curl 'http://myvariant.info/v1/variant/VAR_042351?fields=uniprot&assembly=hg38'
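To fetch the UniProt annotations for several HGVS-based variants in one call, MyVariant.info's variant endpoint also accepts batch queries sent with POST. The sketch below only builds the form body; the function name myvariant_batch_body is hypothetical, and the ids/fields/assembly form parameters for a POST to http://myvariant.info/v1/variant are an assumption based on the API's batch-query convention, so please check the documentation before relying on it.

```shell
# Minimal sketch (hypothetical helper): build a POST form body that requests
# the "uniprot" field for several variants at once.
# ASSUMPTION: the batch endpoint takes comma-separated IDs in an "ids" parameter.
myvariant_batch_body() {
    # $1: assembly (hg38 for "uniprot"); remaining arguments: HGVS-style variant IDs
    mv_assembly=$1
    shift
    # Join the IDs with commas, drop the trailing comma, and encode ">" as %3E
    mv_ids=$(printf '%s,' "$@" | sed 's/,$//; s/>/%3E/g')
    printf '%s\n' "ids=${mv_ids}&fields=uniprot&assembly=${mv_assembly}"
}

# Usage (assumed endpoint):
# curl -d "$(myvariant_batch_body hg38 'chr5:g.97171338A>G' 'chr4:g.88121708T>C')" http://myvariant.info/v1/variant
```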
That’s all! As always, feel free to reach us at help@myvariant.info or @myvariantinfo if you have any questions or feedback.