Posts

Profiling memory usage of various String collections

To compare the memory footprint and performance of different Java options for keeping simple string lookups in memory, I profiled several java.util collection classes filled with the same list of strings to see how much their memory usage differs. I then loaded the same data into two Lucene in-memory indices using Lucene's RAMDirectory. Finally, I evaluated an embedded H2 database, both file-based and in-memory. The data loaded consists of 1,573,345 scientific name strings, the longest being about 150 characters. The original uncompressed text file is 31.6MB (8.1MB zipped). To also test ID lookups with a java.util.Map or the KVP (key-value pair) Lucene index, the row number of each name was used as its ID. The test machine was an 8-core 3GHz Mac Pro with 5GB RAM, running Java 6 with a 2GB heap (-Xmx2g) on 64-bit Mac OS X. Here are the shortened results, using System.currentTimeMillis() and JProfiler inspecting deep object copies in heap dumps (a seriously memory intensive thing too in s...
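As a rough illustration of this kind of measurement, here is a minimal sketch that fills two java.util collections with the same strings and times the fill and the lookups with System.currentTimeMillis(). It is not the harness used in the post: the class name, the helper methods and the generated sample names are all hypothetical, and the heap figure comes from Runtime rather than from JProfiler heap dumps, so it is only a coarse estimate.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.TreeSet;

class CollectionFootprint {

    /** Coarse used-heap estimate; System.gc() is only a hint, so the value is approximate. */
    public static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Hypothetical stand-in for the real list of scientific names. */
    public static List<String> sampleNames(int n) {
        List<String> names = new ArrayList<String>(n);
        for (int i = 0; i < n; i++) {
            names.add("Genus species subsp. " + i);
        }
        return names;
    }

    /**
     * Fills the target collection, then looks every name up again.
     * Returns {heapDeltaBytes, fillMillis, lookupMillis, hits}.
     */
    public static long[] profile(Collection<String> target, List<String> names) {
        long heapBefore = usedHeap();
        long t0 = System.currentTimeMillis();
        target.addAll(names);
        long t1 = System.currentTimeMillis();
        long hits = 0;
        for (String name : names) {
            if (target.contains(name)) {
                hits++;
            }
        }
        long t2 = System.currentTimeMillis();
        long heapAfter = usedHeap();
        return new long[] {heapAfter - heapBefore, t1 - t0, t2 - t1, hits};
    }

    public static void main(String[] args) {
        List<String> names = sampleNames(200000);
        long[] hash = profile(new HashSet<String>(), names);
        long[] tree = profile(new TreeSet<String>(), names);
        System.out.println("HashSet: heap delta=" + hash[0] + "B, fill=" + hash[1]
                + "ms, lookup=" + hash[2] + "ms");
        System.out.println("TreeSet: heap delta=" + tree[0] + "B, fill=" + tree[1]
                + "ms, lookup=" + tree[2] + "ms");
    }
}
```

The heap delta can be noisy or even negative on small inputs because garbage collection is not deterministic; a profiler heap dump, as used for the results in the post, gives more trustworthy sizes.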

Deploying the portal web application

Building the web application: The steps for building and deploying the portal web application are as follows: 1) Download the source code at: http://code.google.com/p/gbif-dataportal/source/checkout The modules needed are: portal-core, portal-index, portal-service and portal-web. For instructions on how to check out these modules from SVN to your machine, please see http://code.google.com/p/gbif-dataportal/source/checkout . 2) Once the modules have been saved to your machine, you need to build them. The portal-web module contains a script that automatically builds the whole project and downloads all the dependencies (libraries) from the repositories. Script location: portal-web/first-build-all.sh Building the database: 1) In the portal-core project, the file db/portal.ddl builds the initial structure of the index DB for the portal. mysql> create database portal; mysql -u [username] -p [database] 2) For populating the database with the minimum data required,...

GBIF Maven Repository

GBIF uses Maven to build projects, manage dependencies and generate online Javadocs as part of a Maven site. I would like to take the chance to introduce some basic Maven features that we use at GBIF. Repository & Sites: We host a Maven repository that we use for keeping external libraries that are not yet mavenized and for deploying our own developments. All projects can deploy a Maven site with Javadocs, test coverage, dependencies and the regular Maven things. The subfolder will be named after the artifactId of the project, so make sure it's unique within GBIF! Apache has more information on customizing a Maven site per project. GBIF Parent POM: All GBIF Maven projects should make use of our shared parent POM, which defines the repository and site URLs, the Apache 2 licensing, other popular Maven repositories and basic build rules. One of the most important settings in this mother POM is the groupId=org.gbif. We would like all GBIF projects to share the same groupId, which mea...