Prompted by the recent publication of the BioGPS paper in Genome Biology, the folks at BioTechniques wrote a profile of BioGPS. It’s a very nice overview of BioGPS and its goals.
For those who are interested in more of the gory details, I’m pasting the entire email “interview” below. Enjoy…
Why did the research team decide to create BioGPS? Was it in support of any research or specific needs of Novartis?
BioGPS grew out of a need facing many organizations, both academic and commercial, who are doing genome-scale science. Say you do some sort of profiling experiment (microarray, next gen sequencing, etc.) and find that there are 10 candidate genes highlighted in your study. How do you learn what’s known about these genes? There are hundreds of public, gene-centric resources available, and many organizations also have internal databases with proprietary gene annotation information. The problem that BioGPS addresses is how to aggregate data from all these separate online databases into one application and interface.
Could you briefly describe what makes this technology different from other genomic organization platforms out there?
Development of BioGPS has focused on two relatively unique features. First, we emphasize the idea of “community extensibility”. BioGPS utilizes a very simple plugin interface that allows most gene-centric external databases to be easily included in our plugin library. Currently we have over 250 plugins registered by over 40 unique users, and we allow any registered user to add new plugins. Second, we emphasize the idea of “user customizability”. Most gene-centric databases tell the user what they think the user should know about their gene of interest. In contrast, BioGPS allows users to individually combine and arrange plugins into “layouts”, enabling each user to define for themselves what content they find most useful.
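To make the plugin and layout ideas concrete, here is a minimal sketch in Python. It is not the actual BioGPS implementation: it assumes a plugin can be modeled as a URL template with a gene identifier slot, and the class name, the `{gene_id}` placeholder, and the example.org URLs are all hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class Plugin:
    """A registered gene-centric resource (hypothetical structure)."""
    name: str
    # Assumption: "{gene_id}" marks where the gene identifier is substituted.
    url_template: str

    def url_for(self, gene_id: str) -> str:
        """Return the resource's page URL for one gene."""
        return self.url_template.replace("{gene_id}", quote(gene_id, safe=""))


def render_layout(layout: list[Plugin], gene_id: str) -> list[tuple[str, str]]:
    """Resolve every plugin in a user's layout to a (name, URL) pair, in order."""
    return [(plugin.name, plugin.url_for(gene_id)) for plugin in layout]


# A tiny plugin library; every entry here is made up for illustration.
library = [
    Plugin("Example gene summary", "https://genes.example.org/summary?id={gene_id}"),
    Plugin("Example SNP browser", "https://snps.example.org/gene/{gene_id}"),
    Plugin("Example pathway viewer", "https://pathways.example.org/view/{gene_id}"),
]

# A "layout" is just the user's own selection and ordering of plugins.
my_layout = [library[1], library[0]]

pages = render_layout(my_layout, "1017")
for name, url in pages:
    print(f"{name}: {url}")
```

In this sketch, registering a new resource only means adding one more template to the library, and two users can build different layouts from the same library without affecting each other.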
Why should researchers switch to your technology?
We won’t go so far as to say BioGPS is a replacement for any other tool that’s out there. Rather, we view BioGPS as a new tool that is complementary to existing resources. But we definitely feel that the design principles behind BioGPS and the emphasis on “community intelligence” are unique among gene annotation databases.
Do you think in the future such databases will follow a model similar to yours? Why? What is inadequate about the other technology currently in use?
The real difference between BioGPS and other mechanisms of aggregating data is in the degree of data structure. For example, approaches like Semantic Web and Distributed Annotation System (DAS) are based on a very structured data exchange format — every piece of data that’s transmitted is “typed”. The advantage is that consumers of those data can do very powerful analyses based on combining data from multiple sources. However, relatively speaking, it requires quite a bit of programming sophistication to become a DAS or Semantic Web data provider.
In contrast, BioGPS utilizes a completely unstructured data format based on HTML, the language of the web. The advantage is that it’s very easy to become a data provider because it’s very easy to create web pages. Again, as evidence, we have over 250 plugins registered by over 40 unique users and spanning over 100 unique domain names, and the developers behind those plugins range from the big annotation authorities (NCBI, Ensembl, UniProt, etc.) all the way down to the single postdoc or grad student developing a simple web site using Perl or Python. Although we’re somewhat more limited in how we can integrate data from multiple sources, the BioGPS model focuses on attracting as large a developer community as possible to contribute to a single gene annotation platform.
Does Novartis have any other genetic-based technology in development related to organization or communication capabilities?
My group has also been involved in developing the Gene Wiki, the goal of which is to collaboratively annotate the function of all human genes. In short, BioGPS applies the principle of community intelligence to the development of a gene annotation database, while the Gene Wiki applies community intelligence to direct annotation of gene function.
How does BioGPS address community intelligence?
The defining characteristic of community intelligence is that each user benefits from the aggregated usage patterns of the entire community of users. Let’s take a concrete example. Suppose you want to find all resources that have information on SNPs for your favorite gene. In the pre-BioGPS days, you would probably start with a web search and spend quite a bit of time clicking on links and browsing candidate web sites to see which had that information. Undoubtedly, that process has been repeated before and will be repeated again by other scientists elsewhere asking that same question.
Using BioGPS, you would instead simply search the plugin library for the keyword “SNP” to find all the relevant gene-centric resources that other BioGPS users have registered. Searching on a gene in BioGPS will take you directly to the relevant page for each plugin. Moreover, the list of SNP plugins is ranked by popularity, so you can see which specific SNP plugin is most used by other users in the BioGPS community.
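The keyword search and popularity ranking described above can be sketched roughly as follows. This is an illustration only, not how BioGPS itself is built: the `PluginRecord` fields, the `usage_count` popularity measure, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginRecord:
    """One entry in a plugin library (hypothetical fields)."""
    name: str
    description: str
    usage_count: int  # Assumption: popularity is measured by how often a plugin is used.


def search_plugins(library, keyword):
    """Return plugins whose name or description mentions the keyword, most used first."""
    needle = keyword.lower()
    hits = [
        plugin
        for plugin in library
        if needle in plugin.name.lower() or needle in plugin.description.lower()
    ]
    return sorted(hits, key=lambda plugin: plugin.usage_count, reverse=True)


# Made-up library entries and usage counts, for illustration only.
library = [
    PluginRecord("Variant viewer", "Lists known SNPs for a gene", 120),
    PluginRecord("Expression atlas", "Tissue expression profiles", 900),
    PluginRecord("SNP frequency table", "Population allele frequencies", 340),
]

results = search_plugins(library, "SNP")
for rank, plugin in enumerate(results, start=1):
    print(rank, plugin.name, plugin.usage_count)
```

The point of the ranking step is that the usage counts come from everyone else's activity, so a new user sees the community's preferred SNP resource first without repeating the same web search.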
What inspired you to name the technology BioGPS?
The meaning of the name has changed several times. Currently, we’re telling people that BioGPS is a tool to “navigate the landscape of gene annotation resources”. Even though it wasn’t the original meaning, I think it’s a pretty good analogy for why people use BioGPS.