Posts

Showing posts from August, 2011

Darwin Core Archives for Species Checklists

GBIF has long had an ambition to support the sharing of annotated species checklists through the network. Realising this ambition has been frustrated by the lack of a data exchange standard of sufficient scope and simplicity to promote publication of this type of resource. In 2009, the Darwin Core standard data set was formally ratified by Biodiversity Information Standards (TDWG). The addition of new terms, and a means of expressing these terms in a simplified and extensible text-based format, paved the way for the development of a data exchange profile for species checklists known as the Global Names Architecture (GNA) Profile. Species checklists published in this format can be zipped into single, portable 'archive' files. Here I introduce two example archives that illustrate the flexible scope of the format. The first represents a very simple species checklist, while the second is a more richly documented taxonomic catalogue. The contents ...

Configuring Drupal and some modules for ticketing emails

We at the Secretariat receive enquiries via helpdesk[at]gbif[dot]org, portal[at]gbif[dot]org and info[at]gbif[dot]org every day, or I would say, almost every hour. Some of them are provider-specific questions that need special attention from staff, while others are FAQs. We have been thinking about how to manage questions and issues better, so that by adding a little structure to the collaborative workflow, we can: 1. Make sure questions are answered satisfactorily; 2. Estimate how many man-hours have been spent, or evaluate performance; 3. Improve the efficiency of helpdesk activities. To achieve these, we need software that meets these requirements: 1. Case management for incoming emails; 2. A Q&A cycle that can be completed solely by email. Web forms are good but not necessary in the beginning; 3. Easily configured knowledge base essays; 4. Graphical reports that show helpdesk performance; 5. Automatic escalation of case status. We looked for options from Open Source Help Desk List....

Using C3P0 with MyBatis

The problem: In our rollover process, which turns our raw harvested data into the interpreted occurrences you can see on our portal, we now have a step that calls a Web Service to turn geographical coordinates into country names. We use this to enrich and validate the incoming data. This step in our process usually took about three to four hours, but last week it stopped working altogether without any changes to the Web Service or the input data. We've spent a lot of time trying to find the problem, and while we still can't say for sure what the exact problem is or was, we've found a fix that works for us and that also allows us to make some assumptions about the cause of the failure. The Web Service is a project called geocode-ws, a very simple project that uses MyBatis to call a PostgreSQL (8.4.2) & PostGIS (1.4.0) database, which does the GISy work of finding matches. Our process started out fine. The first few million calls to the Web Service wer...

Indexing occurrences data - using Lucene technology

The GBIF Occurrence Index collects, stores and parses data gathered from different sources to provide fast and accurate access to biodiversity occurrence data. The purpose of having a GBIF Index is to optimize the speed, relevance and performance of the search functionalities that will be implemented by the new GBIF portal architecture. Currently, GBIF provides search functionalities in its Data Portal supported by a semi-denormalized relational index database design, which allows users to find occurrence information by specifying filters to refine the expected results. That design was envisioned to support the use cases of the current GBIF Data Portal (a Web application). For the next generation of the GBIF platform, a new set of requirements must be met, and it is possible that the current index will not be able to support them. The most relevant of those requirements are: scheduling of batch exports, full text search, real-time faceted search and probably new schemas of data sharing with other ...