The Gene Wiki team has been hard at work filling Wikidata with useful content about genes, diseases, and drugs using the new and improved ProteinBoxBot. Now we are starting to see the fruits of this labor in the context of Wikipedia.
Infobox for ARF6, rendered entirely from content in Wikidata
The Gene Wiki project has programmatically created and maintained the infoboxes that appear to the right of all the Wikipedia articles about human genes since about 2008 [Huss 2008]. This process has entailed building a unique template containing all of the relevant data for each gene. For example, here is the code for the template for the ARF6 gene. Because Wikipedia previously had no database, the templates themselves were where the data was stored. Altering that content programmatically therefore involves parsing each template as a string. It’s ugly (sorry Jon), and there are more than 11,000 of these templates to maintain (one per gene in Wikipedia).
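To make the string-parsing problem concrete, here is a minimal sketch of reading and rewriting one parameter in a per-gene template. It assumes one `| name = value` pair per line; the template name `Example_gene_box` and its parameter names are hypothetical stand-ins, not copied from the real ARF6 template.

```python
import re
from typing import Optional

# Hypothetical excerpt of a per-gene template. The template name and the
# parameter names are illustrative, not taken from the real ARF6 template.
TEMPLATE_TEXT = """{{Example_gene_box
 | Name = ADP-ribosylation factor 6
 | Symbol = ARF6
}}"""


def _param_pattern(name: str) -> "re.Pattern[str]":
    # Matches one "| name = value" line; group 1 is the prefix, group 2 the value.
    return re.compile(
        r"^([ \t]*\|[ \t]*" + re.escape(name) + r"[ \t]*=[ \t]*)(.*?)[ \t]*$",
        re.MULTILINE,
    )


def get_param(wikitext: str, name: str) -> Optional[str]:
    """Return the value of a template parameter, or None if it is absent."""
    match = _param_pattern(name).search(wikitext)
    return match.group(2) if match else None


def set_param(wikitext: str, name: str, value: str) -> str:
    """Replace the value of an existing parameter; other text is left untouched."""
    # A replacement function avoids backslash processing in `value`.
    return _param_pattern(name).sub(lambda m: m.group(1) + value, wikitext, count=1)
```

Even this toy version breaks as soon as two parameters share a line or a value contains a nested template, which is the kind of fragility that multiplies across 11,000+ templates.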
Now the same data can be represented in Wikidata, a queryable, open graph of claims about the world, backed by references and specified by qualifiers [Vrandečić 2014]. With all of the content needed to render the infobox now in place, we can replace 11,000+ complex templates, each of which requires string parsing to maintain, with a single, reusable template that serves every gene.
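As a rough illustration of "claims backed by references and specified by qualifiers", the sketch below models a few claims as plain Python objects and filters them. It is a simplified model, not Wikidata's actual data format: real Wikidata uses P and Q identifiers rather than readable labels, and the `genomic start` values here are placeholders rather than real coordinates.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Reference:
    stated_in: str  # where the value came from (illustrative label)


@dataclass
class Claim:
    subject: str
    prop: str
    value: str
    qualifiers: Dict[str, str] = field(default_factory=dict)
    references: List[Reference] = field(default_factory=list)


# Illustrative claims; labels stand in for Wikidata identifiers, and the
# coordinate values are placeholders, not real data.
CLAIMS = [
    Claim("ARF6", "found in taxon", "Homo sapiens",
          references=[Reference("example source")]),
    Claim("ARF6", "genomic start", "START_ON_GRCH38",
          qualifiers={"genomic assembly": "GRCh38"},
          references=[Reference("example source")]),
    Claim("ARF6", "genomic start", "START_ON_GRCH37",
          qualifiers={"genomic assembly": "GRCh37"}),  # no reference attached
]


def values(claims: List[Claim], subject: str, prop: str,
           qualifiers: Optional[Dict[str, str]] = None,
           require_reference: bool = True) -> List[str]:
    """Return values of `prop` for `subject`, filtered by qualifiers and references."""
    wanted = qualifiers or {}
    result = []
    for claim in claims:
        if claim.subject != subject or claim.prop != prop:
            continue
        if require_reference and not claim.references:
            continue  # skip unsourced claims
        if any(claim.qualifiers.get(k) != v for k, v in wanted.items()):
            continue  # qualifier mismatch, e.g. a different genome assembly
        result.append(claim.value)
    return result
```

The qualifier lets two claims for the same property coexist (one per assembly), and the reference check lets a consumer prefer sourced statements, neither of which a flat `| name = value` template can express.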
The first cut at the new template is {{infobox gene}}. If you put it on any article about a human gene, you ought to get the complete infobox for that article without any further ado. Poof! You can see it in action on this revision of ARF6. We haven’t rolled out the new template across all of the articles yet, but we hope to see that happen in the coming months. Remaining issues include better error handling in the template code, better ways for users to edit the associated data in Wikidata, and updates to all of the code that produces Gene Wiki articles. If you want to help, chime in on the Module:Wikidata thread.
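The real {{infobox gene}} runs on Wikipedia's own template and module machinery, not Python. The sketch below only illustrates the idea of one shared template that works on any gene article, assuming it finds data through the article's link to a Wikidata item, and shows one simple way to handle missing data. `SITELINKS`, `ITEM_DATA`, `FIELDS`, and the item IDs are all hypothetical.

```python
from typing import Dict

# Hypothetical stand-ins for Wikidata lookups: which item each article is
# linked to, and a few values stored on that item. The IDs are made up.
SITELINKS: Dict[str, str] = {"ARF6": "ITEM_ARF6"}
ITEM_DATA: Dict[str, Dict[str, str]] = {
    "ITEM_ARF6": {"symbol": "ARF6", "taxon": "Homo sapiens"},
}

# One field list shared by every gene article: the "single template".
FIELDS = [("symbol", "Symbol"), ("taxon", "Organism"), ("entrez", "Entrez Gene ID")]


def render_infobox(page_title: str) -> str:
    """Render the shared infobox layout for whatever article it is placed on."""
    item_id = SITELINKS.get(page_title)
    if item_id is None:
        # The article has no linked item, so say so instead of rendering a broken box.
        return f"No Wikidata item is linked to '{page_title}'."
    data = ITEM_DATA.get(item_id, {})
    lines = [f"Infobox for {page_title}"]
    for key, label in FIELDS:
        # A missing claim becomes an explicit placeholder instead of an error.
        lines.append(f"{label}: {data.get(key, 'not available')}")
    return "\n".join(lines)
```

Because the layout lives in one place, a fix or a new row applies to every gene at once, which is the maintenance gain over one hand-built template per gene.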