Posts

Occurrence Downloads

Occurrences at GBIF are often downloaded through the web interface or through the API (via rgbif etc.). Users can place various filters on the data in order to limit the number of records returned. As the occurrence index is currently a 447 GB CSV, most users want to use a filter. Total monthly downloads. Here I plot the total monthly downloads for various popular filters. For the past few years, GBIF has been averaging around 10k downloads per month. Two peaks in total downloads stand out: Mar 2014 and Sep 2016. The Sep 2016 peak seems to be explained by high DATASET_KEY downloads. Both the Mar 2014 and Sep 2016 peaks are well explained by the top users. "Top users" in this graph covers all the downloads generated by the 3 most active users on GBIF. These users generate downloads in the 1000s, which are most likely automated downloads generated internally. One interesting detail is that while No Filter Used is not used very often it accounts for more than 500 billion occurrence r...
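To make the idea of a download filter concrete, here is a minimal sketch of how a request limited by DATASET_KEY might be assembled in Python. The field names (creator, notificationAddresses, format, predicate) follow my understanding of the GBIF occurrence download API and should be checked against the current documentation; the user name, email address and dataset key are placeholders, and nothing is sent over the network.

```python
import json


def build_download_request(creator, email, dataset_key, fmt="SIMPLE_CSV"):
    """Build the JSON body for an occurrence download limited to one dataset.

    The field names are an assumption based on the GBIF download API as I
    understand it; verify them against the official docs before use.
    """
    return {
        "creator": creator,
        "notificationAddresses": [email],
        "format": fmt,
        # A single "equals" predicate acts as the filter on the download.
        "predicate": {
            "type": "equals",
            "key": "DATASET_KEY",
            "value": dataset_key,
        },
    }


# Placeholder values, purely for illustration.
body = build_download_request(
    creator="example_user",
    email="user@example.org",
    dataset_key="00000000-0000-0000-0000-000000000000",
)
print(json.dumps(body, indent=2))
```

A body like this would then be submitted, with authentication, to the occurrence download service. A request without any predicate corresponds to the "No Filter Used" case discussed above.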

GBIF Name Parser

The GBIF name parser has been a fundamental library for GBIF to parse a scientific name string into a structured representation of a name. It has been refined over many years based on actual name strings encountered in the GBIF occurrence and checklist indices. Over the years the major design goals have not changed much and can be summarised as follows: extract canonical, code-relevant name parts; populate only the ParsedName class of the GBIF API; ignore any superfluous name parts irrelevant to the code, e.g. species authorships in infraspecific names, infrageneric placements of species or superfluous infraspecific parts in quadrinomials; deal with a wide variety of names that the ParsedName class can represent (cultivar names, bacterial strains & candidate names, virus names, named hybrids, taxon concept references, sensu lato/stricto or aggregates, legacy ranks); extract notes often found in names: nomenclatural remarks, determination notes like aff., partially determined species, e.g. ...
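As a rough illustration of what "parsing a name string into a structured representation" means, here is a toy Python sketch. It is not the GBIF name parser and does not use its ParsedName class: the SimpleParsedName type, the regular expression and the parse_name function are all hypothetical, and they only cope with plain binomials and trinomials followed by an authorship. Hybrids, cultivars, viruses, species authorships inside infraspecific names and the other cases listed above are deliberately out of scope.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleParsedName:
    """Hypothetical stand-in for a structured name; not GBIF's ParsedName."""
    genus: str
    specific_epithet: Optional[str] = None
    rank_marker: Optional[str] = None
    infraspecific_epithet: Optional[str] = None
    authorship: Optional[str] = None

    def canonical(self) -> str:
        """Join the name parts that are present, leaving out the authorship."""
        parts = [self.genus, self.specific_epithet,
                 self.rank_marker, self.infraspecific_epithet]
        return " ".join(p for p in parts if p)


# Genus, optional epithet, optional "rank marker + epithet", optional authorship.
NAME_RE = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)"
    r"(?:\s+(?P<species>[a-z][a-z-]+))?"
    r"(?:\s+(?P<rank>subsp\.|var\.|f\.)\s+(?P<infra>[a-z][a-z-]+))?"
    r"(?:\s+(?P<author>\(?[A-Z].*))?$"
)


def parse_name(name: str) -> Optional[SimpleParsedName]:
    """Return a SimpleParsedName, or None if the string does not fit the toy pattern."""
    match = NAME_RE.match(name.strip())
    if match is None:
        return None
    return SimpleParsedName(
        genus=match.group("genus"),
        specific_epithet=match.group("species"),
        rank_marker=match.group("rank"),
        infraspecific_epithet=match.group("infra"),
        authorship=match.group("author"),
    )


parsed = parse_name("Poa pratensis subsp. angustifolia (L.) Dumort.")
print(parsed.canonical())
print(parsed.authorship)
```

A single regular expression like this breaks down quickly on real-world name strings, which is why a parser refined against actual occurrence and checklist data is so valuable.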

GBIF Backbone - February 2017 Update

We are happy to announce that a new GBIF Backbone just went live, available also as an improved Darwin Core Archive for download. Here are some facts highlighting the important changes. New source datasets. Apart from continuously updated sources like the Catalog of Life or WoRMS, here are the new datasets we used as a source to build the backbone. New Type specimen checklist listing all distinct names of type specimens found in GBIF occurrences, contributing 252,410 new species and 57,410 infraspecific names. ZooBank joined GBIF and was added as a nomenclator with 175,775 names, contributing 3,460 new generic and 39,695 new species names. Added phylum Myzozoa with 136 families under kingdom Chromista to the GBIF Algae Classification to fill the classification gap for Dinoflagellates. Tiny new dataset listing species named after famous people, which are often found in the news. The 43 sources used in this backbone build. Code changes. Merging of duplicate taxa across kingdoms, especially ...

Sampling-event standard takes flight on the wings of butterflies

Data collected from systematic monitoring schemes is highly valuable. That's because harvesting species data from a given set of sites repeatedly over time using a well-defined sampling effort opens the door to key ecological analyses including phenology, population trends, changes in community structure and other metrics related to a range of Essential Biodiversity Variables (EBVs). A couple of years ago there was no faithful way to universally standardize data from systematic monitoring schemes. This meant that researchers using this kind of data would need to spend a lot of time deciphering it first. Their job would get even more complicated when trying to integrate data from various heterogeneous sources, each storing their data in different formats, units, etc. Today, the situation looks much better thanks to a massive collaboration between GBIF, EU BON partners and the wider biodiversity community whose aim was to enable sharing of "sampling-event datasets". I...

IPT v2.3.3 - Your repository for standardized biodiversity data

GBIF is pleased to announce the release of IPT v2.3.3, now available for download from the IPT website. This version looks and feels the same as 2.3.2 but is much more robust and secure. I'd like to recommend that all existing IPT installations be upgraded as soon as possible following the instructions listed in the release notes. Additionally, a couple of new strategic features have been added to the tool to enhance its potential. A description of these new features follows below. Improved dataset homepage. Compared with general-purpose repositories such as Dryad or Figshare, the IPT ensures that uploaded biodiversity data gets disseminated in a standardized format (Darwin Core Archive - DwC-A), facilitating wider reuse and enabling the data to be indexed by aggregators such as GBIF.org. Interoperability comes at a small cost though, as depositors choosing to use the IPT must overcome a learning curve in understanding how to map their data to the Darwin Core standard. To make this...

GBIF Backbone - August 2016 Update

GBIF has just put a new backbone taxonomy into production! Since our last update of the GBIF Backbone we have received various feedback and gained insight into potential code improvements. Here is a quick summary of what has changed in this August 2016 version. Important code changes: much less eager basionym detection, resulting in fewer algorithmically assigned synonyms and removing many false synonyms, especially in plants; detection and merging of orthographic variants of species by doing gender stemming, allowing double consonant characters, dealing with author transliterations and merging hybrid names. All fixed issues in the source code that generates a new backbone can be found here, each of which often leads back to actual reported user feedback: http://dev.gbif.org/issues/browse/POR-3029 New sources. The following new sources have been incorporated into the August backbone: major new version of The Paleobiology Database contributing 2,315 new families, 11,390 genera and 131,958 species names to the...

Probably Turboveg's best-kept secret

Turboveg is one of the most widely used software programs for managing vegetation data. Probably its best-kept secret is that it can export vegetation data in Darwin Core Archive (DwC-A) format, which is a standard format that enables its quick and easy integration with other resources on GBIF.org. Turboveg v2 converts vegetation data into species occurrence data packaged as a DwC-A. Now, thanks to an 8-month-long collaboration between GBIF and Stephan Hennekens (Turboveg's developer), v3 will convert vegetation data into sampling event data packaged as a DwC-A - a much more faithful and useful representation of the data. (Figure: screenshot of Turboveg v3 prototype.) Turboveg is an easy-to-install and easy-to-use Windows program for storing, managing, visualizing and exporting vegetation data (relevés). A relevé is a list of the plants in a delimited plot of vegetation, with information on species cover and on substrate and other abiotic features in order to make as complete as...
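To give a feel for the difference between plain occurrence data and sampling event data, here is a small Python sketch that turns one relevé into an event record plus linked occurrence records. The input dictionary layout and the releve_to_sampling_event function are hypothetical and are not Turboveg's actual export logic. The output keys (eventID, samplingProtocol, sampleSizeValue, sampleSizeUnit, organismQuantity, organismQuantityType) are Darwin Core terms, but the values chosen for them are only illustrative.

```python
def releve_to_sampling_event(releve):
    """Split a relevé into one event record and one occurrence per species.

    `releve` uses a made-up layout for illustration:
    {"id": ..., "date": ..., "plot_area_m2": ..., "species_cover": {name: percent}}
    """
    event = {
        "eventID": releve["id"],
        "eventDate": releve["date"],
        "samplingProtocol": "vegetation plot (relevé)",
        # The plot size documents the sampling effort behind the event.
        "sampleSizeValue": releve["plot_area_m2"],
        "sampleSizeUnit": "square metre",
    }
    occurrences = [
        {
            "occurrenceID": f"{releve['id']}:{index}",
            # The shared eventID ties each occurrence back to its plot.
            "eventID": releve["id"],
            "scientificName": name,
            "organismQuantity": cover,
            "organismQuantityType": "% cover",
        }
        for index, (name, cover) in enumerate(releve["species_cover"].items(), start=1)
    ]
    return event, occurrences


# Invented example relevé, not real data.
example = {
    "id": "releve-001",
    "date": "2016-06-15",
    "plot_area_m2": 4,
    "species_cover": {"Festuca rubra": 40, "Plantago lanceolata": 5},
}
event, occurrences = releve_to_sampling_event(example)
print(event)
for occurrence in occurrences:
    print(occurrence)
```

Keeping the plot-level information in the event record, rather than repeating it on every occurrence, is what preserves the link between species, cover and sampling effort.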