As we hinted in our earlier post, we have now officially moved our gene query engine from Oracle to CouchDB. We quietly completed the migration last Friday, and so far everything has run pretty smoothly. Nothing significant changed visually in the user interface, so the only difference you may have noticed is significantly improved performance.
For those interested in some technical details, CouchDB is a document-oriented database. In contrast to relational databases like Oracle and MySQL, CouchDB stores data in a key-to-document fashion. In our case, the “key” is the gene ID and the “document” is the entire gene annotation object. There are plenty of discussions online comparing document-oriented and relational databases, so I won’t repeat them here. We think CouchDB’s document model is better suited to BioGPS because gene annotation data are very heterogeneous in structure. Moreover, CouchDB allows us to optimize run-time performance by running a one-time, expensive indexing step at load time. For a read-only database like the one behind BioGPS, this tradeoff is one we will gladly make.
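To make the key-to-document idea concrete, here is a minimal in-memory sketch in Python. It is not CouchDB code and not our actual implementation: the field names, the sample documents, and the helpers `build_symbol_index`, `get_gene`, and `query_by_symbol` are all made up for illustration. It only shows the general pattern of storing one heterogeneous document per gene ID and paying for an index once at load time so that queries become cheap lookups.

```python
from collections import defaultdict

# Hypothetical gene annotation documents keyed by gene ID.
# Fields and values are illustrative only, not the BioGPS schema.
# Note that documents need not share the same set of fields.
GENE_DOCS = {
    "1017": {"symbol": "CDK2", "name": "cyclin dependent kinase 2"},
    "1018": {"symbol": "CDK3", "name": "cyclin dependent kinase 3"},
    "7157": {
        "symbol": "TP53",
        "name": "tumor protein p53",
        "aliases": ["p53", "LFS1"],
        "pathways": ["p53 signaling"],
    },
}


def build_symbol_index(docs):
    """One-time, load-time step: map each lowercased symbol or alias to gene IDs."""
    index = defaultdict(set)
    for gene_id, doc in docs.items():
        # Optional fields are handled with .get(), since documents are heterogeneous.
        terms = [doc["symbol"]] + doc.get("aliases", [])
        for term in terms:
            index[term.lower()].add(gene_id)
    return index


# Built once when the data are loaded; never updated afterwards (read-only store).
SYMBOL_INDEX = build_symbol_index(GENE_DOCS)


def get_gene(gene_id):
    """Key-to-document lookup: return the whole annotation object, or None."""
    return GENE_DOCS.get(gene_id)


def query_by_symbol(term):
    """Run-time query: a dictionary lookup against the prebuilt index."""
    gene_ids = sorted(SYMBOL_INDEX.get(term.lower(), ()))
    return [GENE_DOCS[gene_id] for gene_id in gene_ids]
```

With this sketch, `get_gene("7157")` returns the full annotation object in one lookup, and `query_by_symbol("p53")` finds the same document through its alias without scanning every gene. The cost of scanning was paid once, when the index was built.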
Although we deliberately made this migration without adding new features at the same time, the new architecture opens up many possibilities for new BioGPS functionality. So, stay tuned…
Ed note: Kudos to Chris Petersen of Assay Depot for first turning us on to CouchDB probably two years ago. We didn’t quite get it at the time, but the seed was planted…