Performance Evaluation of HBase
Update: see also the follow-up posts: part 2 and part 3.

In the last post Lars talked about setting up Ganglia to monitor our Hadoop and HBase installations. That was in preparation for giving HBase a solid testing run to assess its suitability for hosting our index of occurrence records. One of the important features in our new Data Portal will be the "Download" function, which lets people download occurrences matching some search criteria. That process is currently very manual and labour-intensive, so automating it will be a big help to us. In HBase it would be implemented as a full table scan, which is why I've spent some time testing our scan performance.

Anyone who has been down this road will probably have encountered the myriad opinions on what will improve performance (some of them conflicting), along with the seemingly endless parameters that can be tuned in a given cluster. The overall result of that kind of research is: "Yo...
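For reference, a full table scan with the HBase Java client looks roughly like the sketch below. The table name `occurrence` is our use case, but the class name is invented for illustration, and it naturally needs a running HBase cluster to do anything; scanner caching is shown because it is one of the commonly tuned parameters:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class OccurrenceScan {
    public static void main(String[] args) throws IOException {
        // Reads hbase-site.xml from the classpath for cluster connection details
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("occurrence"))) {
            Scan scan = new Scan();
            // Number of rows fetched per RPC to the region server --
            // one of the knobs that most affects raw scan throughput
            scan.setCaching(1000);
            long count = 0;
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    count++; // in a real download job, write the row out here
                }
            }
            System.out.println("Scanned " + count + " rows");
        }
    }
}
```

A production download would of course stream matching rows to a file rather than just count them, and would typically run as a MapReduce job over the table rather than a single client-side scanner.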