NoSQL in the Wild

Mozilla Test Pilot Programme

With Firefox 4 entering Release Candidate, Mozilla Labs embarked on a huge user testing programme for the new browser. Data is collected continuously from participants and consists of surveys, direct feedback and automatic monitoring data. The site gives daily sentiment reports and also shows trends by factors such as operating system.

Mozilla Labs wrote up the details of their evaluation process and performance testing. Their final choice was Riak, and this is probably the highest-profile deployment of Riak to date.

Afghan War Logs

The leaking of the Afghan combat reports represents a significant milestone in crowdsourced data analysis (although the professional media services managed to provide better analysis and background).

The original data is like a paper form, with consistent sections but variable-length content. The match to document databases is immediately apparent, and it is likely that the US government originally stored the reports in some kind of document store. It was interesting to note that no-one seemed to use a relational store to publish or analyse the reports.
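To make the "paper form" point concrete, here is a minimal Python sketch of how such a report might look as a document. The section names (header, summary, details) and all the values are invented placeholders for illustration; they are not taken from the actual logs or from any of the published versions.

```python
import json

# Hypothetical section names, invented for illustration only.
SECTIONS = ("header", "summary", "details")


def make_report(header, summary, details):
    """Build one report as a plain document: fixed sections, free-form content."""
    return {"header": header, "summary": summary, "details": list(details)}


# Placeholder content: one report with no detail entries, one with several.
short_report = make_report({"id": "example-1"}, "Short placeholder summary.", [])
long_report = make_report(
    {"id": "example-2"},
    "Longer placeholder summary.",
    ["first placeholder entry", "second placeholder entry", "third placeholder entry"],
)

# Both documents share the same sections even though their content differs in length.
assert set(short_report) == set(long_report) == set(SECTIONS)

# Each report serialises to a self-contained JSON document, with no table schema needed.
encoded = [json.dumps(report, sort_keys=True) for report in (short_report, long_report)]
```

A relational design would probably need a separate table for the variable-length entries, whereas a document store can take each report as it is.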

The Guardian used Google App Engine, which sits on Bigtable. If you click through to the event log you can see the blocks of the underlying sections.

An independent CouchDB version has also been created.

One interesting point is that it was written with CouchApp, a framework that allows you to write, store and serve the app in CouchDB itself, which makes the store an application server as well. Oracle eat your heart out! This unusual arrangement is possible because CouchDB is based on HTTP and can therefore treat HTML and JavaScript files as it would any other document and serve them to a client. If that client is a web browser, it can display the page and then make subsequent calls to retrieve other JSON documents.
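As a rough sketch of that arrangement, the Python below builds a design document that carries an HTML page as an inline attachment, along with the URLs a browser would use to fetch the page and then a JSON document. It only illustrates the storage layout and is not CouchApp's own code. The helper names, the database name "reports", the design document name "viewer" and the document id are all assumptions made for the example.

```python
import base64
from urllib.parse import quote


def design_doc(name, files):
    """Build a design document holding static files as inline attachments.

    `files` maps an attachment path to a (content_type, text_body) pair.
    """
    return {
        "_id": f"_design/{name}",
        "_attachments": {
            path: {
                "content_type": content_type,
                # Inline attachment bodies are sent base64-encoded.
                "data": base64.b64encode(body.encode("utf-8")).decode("ascii"),
            }
            for path, (content_type, body) in files.items()
        },
    }


def attachment_url(base, db, name, path):
    """URL at which the server hands a design document attachment to a browser."""
    return f"{base}/{quote(db, safe='')}/_design/{quote(name, safe='')}/{quote(path)}"


def document_url(base, db, doc_id):
    """URL of an ordinary JSON document in the same database."""
    return f"{base}/{quote(db, safe='')}/{quote(doc_id, safe='')}"


# Hypothetical names throughout; 5984 is CouchDB's default port.
BASE = "http://localhost:5984"
ddoc = design_doc("viewer", {"index.html": ("text/html", "<h1>Reports</h1>")})
page = attachment_url(BASE, "reports", "viewer", "index.html")
data = document_url(BASE, "reports", "example-1")
```

The browser would load the page from the first URL, and script on that page could then request the second URL from the same server to get a report as JSON.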