
Showing posts from July, 2011

Customizing the IPT

One of my responsibilities as the Biodiversity Informatics Manager for Canadensys is to develop a data portal giving access to all the biodiversity information published by the participants of our network. A large part of this task can now be done with version 2 of the GBIF Integrated Publishing Toolkit (IPT). The IPT lets us host biodiversity resources, manage their data and metadata, and register them with GBIF so they appear on the GBIF data portal, all of which are goals we want to achieve. Best of all, most of the management can be done by the collection managers themselves. I have tested the IPT thoroughly and I am convinced the GBIF development team has done an excellent job of creating a stable tool I can trust. This post explains how I have customized our IPT installation to integrate it with our other Canadensys websites.

Background

Our Canadensys community portal is powered by WordPress (MySQL, PHP), while our data portal - which before the IPT installation only consisted of...

Working with Scientific Names

Dealing with scientific names is an important and regular part of our work at GBIF. Scientific names are highly structured strings with a syntax governed by a nomenclatural code. Unfortunately there are different codes for botany, zoology, bacteria, viruses and even cultivar names. When dealing with names we often do not know which code or classification a name belongs to, so we need a representation that is as code-agnostic as possible. GBIF came up with a structured representation that is a compromise focusing on the most common names, primarily botanical and zoological names, which are quite similar in their basic form.

The ParsedName class

Our ParsedName class provides us with the following core properties: genusOrAbove, infraGeneric, specificEpithet, rankMarker, infraSpecificEpithet, authorship, year, bracketAuthorship and bracketYear. These allow us to represent regular names properly. For example Agalinis purpurea var. borealis (Berg.) Peterson 1987 is represented as genusOrAbove=Aga...
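To make the property list concrete, here is a minimal Python sketch of a name record with the nine core properties, plus a helper that reassembles the full name string. It is an illustration only: the class mirrors the property names from the post, but the formatting rules (zoological-style parentheses for infraGeneric, bracket authorship in parentheses, author followed by year) and the mapping of the Agalinis example onto the fields are our own assumptions, not GBIF's actual Java ParsedName API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedName:
    """Hypothetical Python mirror of the core ParsedName properties.

    Field names follow the post (camelCase kept on purpose); the formatting
    methods are illustrative assumptions, not GBIF's implementation.
    """
    genusOrAbove: Optional[str] = None
    infraGeneric: Optional[str] = None
    specificEpithet: Optional[str] = None
    rankMarker: Optional[str] = None
    infraSpecificEpithet: Optional[str] = None
    authorship: Optional[str] = None
    year: Optional[str] = None
    bracketAuthorship: Optional[str] = None
    bracketYear: Optional[str] = None

    @staticmethod
    def _author_year(author: Optional[str], year: Optional[str]) -> str:
        # Join author and year with a space, skipping missing parts.
        return " ".join(p for p in (author, year) if p)

    def canonical_name(self) -> str:
        """Name without any authorship."""
        parts = [self.genusOrAbove]
        if self.infraGeneric:
            # Assumption: zoological style, subgenus written in parentheses.
            parts.append(f"({self.infraGeneric})")
        parts += [self.specificEpithet, self.rankMarker, self.infraSpecificEpithet]
        return " ".join(p for p in parts if p)

    def full_name(self) -> str:
        """Canonical name, then bracket authorship, then regular authorship."""
        parts = [self.canonical_name()]
        bracket = self._author_year(self.bracketAuthorship, self.bracketYear)
        if bracket:
            parts.append(f"({bracket})")
        authors = self._author_year(self.authorship, self.year)
        if authors:
            parts.append(authors)
        return " ".join(parts)


# Usage: one plausible field assignment for the example name in the text.
name = ParsedName(
    genusOrAbove="Agalinis",
    specificEpithet="purpurea",
    rankMarker="var.",
    infraSpecificEpithet="borealis",
    authorship="Peterson",
    year="1987",
    bracketAuthorship="Berg.",
)
print(name.canonical_name())  # Agalinis purpurea var. borealis
print(name.full_name())       # Agalinis purpurea var. borealis (Berg.) Peterson 1987
```

Keeping each part in its own field is what makes the representation code-agnostic: rendering rules can differ between botany and zoology while the stored structure stays the same.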

Are you the keymaster?

As I mentioned previously, I'm starting work on evaluating HBase for our occurrence record needs. Lately that has meant coming up with a key structure and/or schema that optimizes reads for one major use case of the GBIF data portal: a user request to download an entire record set, including both raw and interpreted records. The most common form of this request looks like "Give me all records for ", e.g. "Give me all records for Family Felidae". So far I'm concentrating on lookup and retrieval rather than on optimizing writes or data storage, so the schema I'm using has two column families, one for verbatim columns and one for interpreted columns (about 70 columns in total). What we need to figure out is which key to use for HTable's single indexed column, the row key. For all these examples we assume we know the backbone taxonomy id of the taxon concept in question (i.e. Family Felidae is id 123456).

Option 1

Key: nativ...
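Because HBase only indexes the row key and keeps rows sorted by their raw bytes, "all records for taxon X" is cheap only if those rows are contiguous, so that a single scan with a start row and an exclusive stop row returns them. The Python sketch below emulates that with a sorted list and `bisect`. The key layout (zero-padded family id, a `:` separator, then a zero-padded record id) is a hypothetical example chosen for illustration, not one of the options evaluated in the post. Note that it only serves family-level requests directly.

```python
import bisect


def row_key(family_id: int, record_id: int) -> bytes:
    """Hypothetical composite row key: fixed-width family id, then record id.

    Row keys sort as raw bytes, so ids are zero-padded to a fixed width;
    unpadded, b"12" would sort after b"100".
    """
    return f"{family_id:010d}:{record_id:012d}".encode("ascii")


def prefix_scan(sorted_keys: list[bytes], family_id: int) -> list[bytes]:
    """Emulate a scan from start row (inclusive) to stop row (exclusive)."""
    start = f"{family_id:010d}:".encode("ascii")
    # ';' (0x3B) is the byte right after ':' (0x3A), so every key with the
    # prefix sorts before this stop row.
    stop = f"{family_id:010d};".encode("ascii")
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


# Usage: 123456 stands in for Family Felidae, as in the text.
keys = sorted(row_key(f, r) for f, r in [(123456, 1), (123456, 42), (7, 5), (999999, 3)])
for key in prefix_scan(keys, 123456):
    print(key.decode("ascii"))
```

The trade-off this illustrates is that whatever leads the row key determines which queries become a single contiguous read. Any other access pattern needs a full scan or a secondary structure.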