Posts

Showing posts from October, 2011

Integration tests with DBUnit

Database driven JUnit tests

As part of our migration to a solid, general testing framework, we are now using DbUnit with JUnit for integration tests of our database service layer (on top of Liquibase for the DDL).

Creating a DbUnit test file

Because maintaining a relational test dataset with many tables can be painful, I decided to dump a small, existing Postgres database into DbUnit's XML structure, namely FlatXML. It turned out to be less simple than I had hoped. First I wrote a simple exporter script in Java that dumps the entire DB into XML. Simple. The first problem I stumbled across was a column named "order", which caused an SQL exception. It turns out DbUnit needs to be configured for specific databases, so I ended up using three configurations to both dump and read the files: use Postgres-specific types; double-quote column and table names; enable case-sensitive table & column names (now that we use quoted names, Postgres be...
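To illustrate why a column named "order" trips up generated SQL, here is a minimal sketch of identifier quoting in plain Java. It does not use DbUnit itself; it only mimics the idea of an escape pattern in which `?` stands for the table or column name, which is how DbUnit's escape pattern setting is described in its documentation. The class name, the method names, and the `item` table are assumptions made up for this example.

```java
public class IdentifierQuotingSketch {

    /** Expands an escape pattern in which '?' stands for the identifier. */
    public static String escape(String pattern, String identifier) {
        return pattern.replace("?", identifier);
    }

    /** Builds a SELECT statement, escaping every column and the table name. */
    public static String selectAll(String pattern, String table, String... columns) {
        StringBuilder sql = new StringBuilder("SELECT ");
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append(escape(pattern, columns[i]));
        }
        return sql.append(" FROM ").append(escape(pattern, table)).toString();
    }

    public static void main(String[] args) {
        // Hypothetical table "item" with a column called "order".
        // Unquoted: ORDER is a reserved word, so the database rejects this statement.
        System.out.println(selectAll("?", "item", "id", "order"));
        // Quoted: every identifier is wrapped in double quotes and parses fine.
        System.out.println(selectAll("\"?\"", "item", "id", "order"));
    }
}
```

Quoting solves the reserved-word problem, but quoted identifiers in Postgres are matched case-sensitively, so the names in the dataset have to match the schema exactly.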

GBIF Portal: Geographic interpretations

The new portal processing is about to go into production, and during testing I was drawing some metrics on the revised geographic interpretation. It is a simple issue, but many records have coordinates that contradict the country that the record claims to be in. Some illustrations of this were previously shared by Oliver. The challenge of this is twofold. Firstly, we see many variations in the country name, which need to be interpreted. Some examples for Argentina are given (there are 100s of variations per country): Argent., Argentina, Argentiana, N Argentina, N. Argentina, ARGENTINA, ARGENTINIA, ARGENTINNIA, "ARGENTINIA", ""ARGENTINIA"", etc. We have abstracted the parsing code into a separate Java library which makes use of basic algorithms and dictionary files to help interpret the results. This library might be useful for other tools requiring similar interpretation, or data cleaning efforts, and will be maintained over time as it will be in use in ...
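The combination of basic normalization and a dictionary lookup mentioned above could be sketched roughly as follows. This is not the API of the GBIF parsing library; the class name, the normalization rules, and the tiny in-memory dictionary are all assumptions for illustration, and a real dictionary would be loaded from files with hundreds of variants per country.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class CountryNameSketch {

    // Hypothetical dictionary: normalized variant -> ISO 3166-1 alpha-2 code.
    private static final Map<String, String> DICTIONARY = Map.of(
            "ARGENTINA", "AR",
            "ARGENT", "AR",
            "ARGENTIANA", "AR",
            "ARGENTINIA", "AR",
            "ARGENTINNIA", "AR");

    /** Upper-cases the input and strips quotes, punctuation and a leading compass letter. */
    public static String normalize(String raw) {
        String s = raw.toUpperCase(Locale.ROOT);
        // Replace everything that is not a letter or whitespace (quotes, dots) with a space.
        s = s.replaceAll("[^\\p{L}\\s]", " ");
        // Drop a leading single compass letter such as "N" in "N. Argentina".
        // Assumption: no real country name starts with a lone N, S, E or W.
        s = s.replaceAll("^\\s*[NSEW]\\s+", "");
        // Trim and collapse runs of whitespace.
        return s.trim().replaceAll("\\s+", " ");
    }

    /** Returns the country code for a verbatim name, or empty if it is not in the dictionary. */
    public static Optional<String> interpret(String raw) {
        return Optional.ofNullable(DICTIONARY.get(normalize(raw)));
    }

    public static void main(String[] args) {
        String[] samples = {"Argent.", "N. Argentina", "\"\"ARGENTINIA\"\"", "Atlantis"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + interpret(sample).orElse("(unknown)"));
        }
    }
}
```

Normalizing before the lookup keeps the dictionary small: quoting, casing and punctuation variants collapse onto one key, so only genuine misspellings such as ARGENTINIA need their own entries.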

Group synergy

During the last few weeks we have been intensively designing and implementing what will become the new data portal. Oliver nicely described the new stage our team has entered in his last blog post, Portal v2 - There will be cake. In my opinion, this has truly been a group experience, as we have decided to change our way of working. Normally each of us would have worked on a different component and later tried to integrate everything, but now we have taken the approach of all focusing on one subcomponent and driving our efforts into it. From my point of view, the main advantage is that we reduce the Bus Factor risk that we, as a small group of developers, are quite exposed to. Communication within our team has increased, as we are all on the same page now. As a general overview, the portal v2 will consist of different subcomponents (or sub-projects) that need to interact with each other to come up with a consolidated "view" for...