By now, you’ve probably seen the announcement that our renewal grant application has been approved. In addition to funding the improvement of MyGene.info and the extension of lessons learned from MyVariant.info, the renewal grant will fund the development of a BioThings Software Development Kit (SDK). To our knowledge, the BioThings SDK will be the first open-source bioinformatics SDK for building high-performance web services. We WANT people to be able to build high-performance web services for accessing important biological data, so that everyone can get the most out of existing data.
Building the BioThings SDK
The development of the BioThings SDK will have three phases:
Phase 1: Abstraction. In this phase, we will extract the common codebase from MyGene and MyVariant to form the three core components (“databuild”, “web-API” and “cloud-deployment”) of the BioThings SDK.
Phase 2: Customization Tool Creation. In this phase, we will build customization mechanisms into the SDK. These mechanisms will include a project-specific configuration system, a scheduler (for harvesting data from data-source-specific parsers), and some generic interfaces for adding customization to each of the three core components (“databuild”, “web-API” and “cloud-deployment”).
Phase 3: Test and Improve. In this phase, we will test that the BioThings SDK works by converting the existing MyGene and MyVariant codebases to use it. In converting the codebases, we will identify issues with the SDK and areas for improvement. Once the SDK is in place, we may iteratively improve it either by converting the codebases of other BioThings APIs (if they’re created prior to the completion of the SDK) or by creating new BioThings APIs for chemicals or diseases.
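To give a rough feel for what Phase 2 could look like, here is a minimal Python sketch of a project-specific configuration combined with pluggable, data-source-specific parsers feeding a “databuild” step. This is an illustration only, not the actual SDK design: ProjectConfig, register_parser, build, the source names, and the toy records are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Document = Dict[str, object]
Parser = Callable[[], Iterable[Document]]


@dataclass
class ProjectConfig:
    """Hypothetical project-specific settings (not the real SDK config)."""
    name: str
    id_field: str = "_id"
    sources: List[str] = field(default_factory=list)


# Registry mapping a data source name to its parser function.
PARSERS: Dict[str, Parser] = {}


def register_parser(source_name: str):
    """Decorator that plugs a data-source-specific parser into the registry."""
    def decorator(func: Parser) -> Parser:
        PARSERS[source_name] = func
        return func
    return decorator


@register_parser("source_a")
def parse_source_a():
    # Toy records standing in for a real data source.
    yield {"_id": "g1", "symbol": "GENE_A"}
    yield {"_id": "g2", "symbol": "GENE_B"}


@register_parser("source_b")
def parse_source_b():
    yield {"_id": "g1", "taxid": 9606}


def build(config: ProjectConfig) -> Dict[str, Document]:
    """Merge documents from every configured source, keyed by the id field."""
    merged: Dict[str, Document] = {}
    for source in config.sources:
        for doc in PARSERS[source]():
            key = str(doc[config.id_field])
            merged.setdefault(key, {}).update(doc)
    return merged


config = ProjectConfig(name="demo_project", sources=["source_a", "source_b"])
merged_docs = build(config)
```

The point of the sketch is that only the configuration and the parsers are project-specific, while the merge logic is generic. A scheduler would simply decide when build gets called for each source.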
Improving Data Provenance of BioThings APIs
One of the advantages of using BioThings APIs like MyGene or MyVariant is that they’re always up to date. The BioThings team takes care of the data parsing and munging issues that can arise when individual resources are updated, and does so smoothly enough that users are sometimes caught off guard when a data update changes their results.
To ensure that BioThings APIs handle data provenance well, we will build methods for data discrepancy detection and quality control, as well as for recording and reporting data update logs, directly into the BioThings SDK. Data provenance is important for reproducibility, especially when working with continuously updated data resources.
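As a rough illustration of the update-log idea, the Python sketch below compares two releases of a data source and records what was added, removed, or changed. It is a sketch under stated assumptions rather than the SDK’s actual method: diff_releases, make_log_entry, the field names, and the toy records are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List

Release = Dict[str, Dict[str, Any]]


def diff_releases(old: Release, new: Release) -> Dict[str, List[str]]:
    """Return the ids that were added, removed, or changed between releases."""
    old_ids, new_ids = set(old), set(new)
    shared = old_ids & new_ids
    return {
        "added": sorted(new_ids - old_ids),
        "removed": sorted(old_ids - new_ids),
        "changed": sorted(i for i in shared if old[i] != new[i]),
    }


def make_log_entry(source: str, version: str,
                   diff: Dict[str, List[str]]) -> Dict[str, Any]:
    """Build a hypothetical update-log record that could be stored or reported."""
    return {
        "source": source,
        "version": version,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "counts": {kind: len(ids) for kind, ids in diff.items()},
        "ids": diff,
    }


# Toy releases standing in for two versions of one data source.
old_release = {
    "g1": {"symbol": "GENE_A"},
    "g2": {"symbol": "GENE_B"},
    "g3": {"symbol": "GENE_C"},
}
new_release = {
    "g1": {"symbol": "GENE_A"},
    "g2": {"symbol": "GENE_B2"},
    "g4": {"symbol": "GENE_D"},
}

update_diff = diff_releases(old_release, new_release)
log_entry = make_log_entry("source_a", "demo-v2", update_diff)
```

A log like this, kept for every update, would let a user trace a changed result back to the specific release that caused it.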
Improving Utility of BioThings APIs
By building the BioThings SDK, we invite others to create APIs like MyGene and MyVariant, which make it easier for developers to build tools that use the annotation data they serve. This could mean a growing number of such APIs, which creates a new and interesting quandary: how do you know when a data resource update affects something of interest to you? To address this quandary, the BioThings team proposes developing a tool, tentatively called BioReel, which will be discussed in our next post.
Sneak Peek at Changes Coming to BioThings.io
In addition to overhauling the MyGene and MyVariant websites, the BioThings website will be getting a cosmetic upgrade as well. Here’s a little taste of what’s to come!