Getting started with DataCube on HBase
This tutorial provides a quick introduction to DataCube, a Java-based OLAP cube library with a pluggable storage engine, open sourced by Urban Airship. In this tutorial, we use the built-in HBase storage engine.

In a small database, calculating metrics such as counts and totals is trivial using aggregate functions (SUM(), COUNT(), etc.). As the volume grows, one often precalculates these metrics, which brings its own set of consistency challenges. Once one outgrows a database, as GBIF has, new mechanisms are needed to manage these metrics. The features of DataCube that make it attractive to us are:

- A manageable process to modify the cube structure
- A higher-level API to develop against
- The ability to rebuild the cube with a single pass over the source data

For this tutorial we will consider the source data to be classical DarwinCore occurrence records, where each record represents the metadata associated with a species observation event, e.g.: ID, Kingdom, ScientificName, Country, IsoC...