Posts

Portal v2 - There will be cake

The current GBIF data portal was started in 2007 to provide access to the network's biodiversity data - at the time that meant a federated search across 220 providers and 76 million occurrence records. While that approach has served us well over the years, many features that have been requested for the portal can't be addressed in the current architecture. Combined with the fact that we're now well over 300 million occurrence records, with millions of new taxonomic records to boot, it becomes clear that a new portal is needed. After a long consultation process with the wider community, the initial requirements of a new portal have been determined, and I'm pleased to report that work has officially started on its design and development. For the last 6 months or so the development team has been working on our rollover process, registry improvements, IPT development, and other disparate tasks. The new portal marks an important milestone in ou...

VertNet and the GBIF Integrated Publishing Toolkit

(A guest post from our friends at VertNet, cross-posted from the VertNet blog) This week we’d like to discuss the current and future roles of the GBIF Integrated Publishing Toolkit (IPT) in VertNet. IPT is a Java-based web application that allows a user to publish and share biodiversity data sets from a server. Here are some of the things IPT can do: Create Darwin Core Archives. In our post about data publishing last week, we wrote about Darwin Core being the “language of choice” for VertNet. IPT allows publishers to create Darwin Core data records from either files or databases and to export them in zipped archive files that contain exactly what is needed by VertNet for uploading. Make data available for efficient indexing by GBIF. VertNet has an agreement with its data publishers that, by participating, they will also publish data through GBIF. GBIF keeps our registry of data providers and uses this registry to find and update data periodically from the original so...

Darwin Core Archives for Species Checklists

GBIF has long had an ambition to support the sharing of annotated species checklists through the network. Realising this ambition has been frustrated by the lack of a data exchange standard of sufficient scope and simplicity to promote publication of this type of resource. In 2009, the Darwin Core standard data set was formally ratified by Biodiversity Information Standards (TDWG). The addition of new terms, and a means of expressing these terms in a simplified and extensible text-based format, paved the way for the development of a data exchange profile for species checklists known as the Global Names Architecture (GNA) Profile. Species checklists published in this format can be zipped into single, portable 'archive' files. Here I introduce two example archives that illustrate the flexible scope of the format. The first represents a very simple species checklist while the second is a more richly documented taxonomic catalogue. The contents ...

Configuring Drupal and some modules for ticketing emails

We at the Secretariat receive enquiries via helpdesk[at]gbif[dot]org, portal[at]gbif[dot]org and info[at]gbif[dot]org every day, or I would say, almost every hour. Some of them are provider-specific questions that need special attention from staff, while others are FAQs. We have been thinking about how to manage questions and issues better, so that by adding a little bit of structure to the collaborative workflow, we can: 1. Make sure questions are answered satisfactorily; 2. Estimate how many man-hours have been spent, or evaluate performance; 3. Improve the efficiency of helpdesk activities. To achieve these, we need software that meets these requirements: 1. Case management for incoming emails; 2. A Q&A cycle that can be completed solely by email. Web forms are good but not necessary in the beginning; 3. Easily configured knowledge base essays; 4. Graphical reports that show helpdesk performance; 5. Automatic escalation of case status. We looked for options from Open Source Help Desk List....

Using C3P0 with MyBatis

The problem: In our rollover process, which turns our raw harvested data into the interpreted occurrences you can see on our portal, we now have a step that calls a Web Service to turn geographical coordinates into country names. We use this to enrich and validate the incoming data. This step in our process usually took about three to four hours, but last week it stopped working altogether without any changes to the Web Service or the input data. We've spent a lot of time trying to find the problem, and while we still can't say for sure what the exact problem is or was, we've found a fix that works for us and also allows us to make some assumptions about the cause of the failure. The Web Service is a project called geocode-ws, a very simple project that uses MyBatis to call a PostgreSQL (8.4.2) & PostGIS (1.4.0) database, which does the GISy work of finding matches. Our process started out fine. The first few million calls to the Web Service wer...

Indexing occurrences data - using Lucene technology

The GBIF Occurrence Index collects, stores and parses data gathered from different sources to provide fast and accurate access to biodiversity occurrence data. The purpose of having a GBIF Index is to optimize the speed, relevance and performance of the search functionality that will be implemented by the new GBIF portal architecture. Currently, GBIF provides search functionality in its Data Portal, supported by a semi-denormalized relational index database design, which allows users to find occurrence information by specifying filters to refine the results. That design was envisioned to support the use cases of the current GBIF Data Portal (a Web application). For the next generation of the GBIF platform, a new set of requirements must be met, and it is possible that the current index will not be able to support them. The most relevant of those requirements are: scheduling of batch exports, full text search, realtime faceted search and probably new schemas of data sharing with other ...

Customizing the IPT

One of my responsibilities as the Biodiversity Informatics Manager for Canadensys is to develop a data portal giving access to all the biodiversity information published by the participants of our network. A huge portion of this task can now be done with version 2 of the GBIF Integrated Publishing Toolkit, or IPT. The IPT allows us to host biodiversity resources, manage their data and metadata, and register them with GBIF so they can appear on the GBIF data portal, which are all targets we want to achieve. Best of all, most management can be done by the collection managers themselves. I have tested the IPT thoroughly and I am convinced the GBIF development team has done an excellent job creating a stable tool I can trust. This post explains how I have customized our IPT installation to integrate it with our other Canadensys websites. Background: Our Canadensys community portal is powered by WordPress (MySQL, PHP), while our data portal - which before the IPT installation only consisted of...