Posts

Showing posts from February, 2012

Performance Evaluation of HBase

Update: See also the follow-up posts: part 2 and part 3. In the last post Lars talked about setting up Ganglia for monitoring our Hadoop and HBase installations. That was in preparation for giving HBase a solid testing run to assess its suitability for hosting our index of occurrence records. One of the important features in our new Data Portal will be the "Download" function, which lets people download occurrences matching some search criteria. Currently that process is very manual and labour-intensive, so automating it will be a big help to us. Using HBase, it would be implemented as a full table scan, and that's why I've spent some time testing our scan performance. Anyone who has been down this road will probably have encountered the myriad opinions on what will improve performance (some of them conflicting), along with the seemingly endless parameters that can be tuned in a given cluster. The overall result of that kind of research is: "Yo...

Monitoring Hadoop and HBase

We're getting serious in our Hadoop adoption. The first process (our so-called "rollover") is now in production, and it uses Hadoop, Hive, Oozie and various other parts of the Hadoop ecosystem. Our next step is evaluating HBase and its performance on our (small and aging) cluster. To do that properly, and to fix a rather embarrassing situation, we first had to get proper monitoring up and running for our cluster. So far we've only had Cacti stats for OS-level things (CPU, I/O, etc.), but we were missing actual Hadoop statistics. So we've now set up Ganglia at GBIF, and the best news is that it's public and uses the very latest Ganglia 3.3, which was released only a few days ago in February 2012. The setup was relatively painless; Ganglia was just nice to work with. To get monitoring of HBase working we had to apply HBASE-4854 because it's not included in our Hadoop distribution (CDH3u2). Thanks to Lars George for the hint. So we can happily report that Ganglia 3...