Decoupling components
Recent blog posts have introduced some of the registry and portal processing work under development at GBIF. Here I'd like to introduce some of the research underway to improve the overall processing workflows by identifying well-defined components and decoupling unnecessary dependencies. The target is to improve the robustness, reliability and throughput of the data indexing performed for the portal.

Key to the GBIF portal is the crawling, processing and indexing of the content shared through the GBIF network, which is currently performed by the Harvesting and Indexing Toolkit (HIT). Today the HIT operates largely as follows:

1. Synchronise with the registry to discover the technical endpoints
2. Allow the administrator to schedule the harvesting and processing of an endpoint, as follows:
   a. Initiate a metadata request to discover the datasets at the endpoint
   b. For each resource, initiate a request for the inventory of distinct scientific names
   c. Process ...
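To make the flow above concrete, here is a minimal sketch of such a crawl loop. This is not the actual HIT code; every function, endpoint and data structure here is a hypothetical stand-in for the real registry synchronisation and wire-protocol requests.

```python
# Illustrative sketch of a HIT-style crawl, not the real implementation.
# All helpers below are hypothetical stand-ins for registry and protocol calls.

def sync_with_registry():
    """Stand-in for a registry sync: return the known technical endpoints."""
    return [{"id": "endpoint-1", "url": "http://example.org/provider"}]

def discover_datasets(endpoint):
    """Stand-in for the metadata request listing the datasets at an endpoint."""
    return [{"id": endpoint["id"] + "-dataset-1"}]

def inventory_scientific_names(dataset):
    """Stand-in for the inventory request for distinct scientific names."""
    return ["Aus bus", "Cus dus"]

def crawl(scheduled_endpoints):
    """Walk each scheduled endpoint: metadata request, then name inventory."""
    inventories = {}
    for endpoint in scheduled_endpoints:
        for dataset in discover_datasets(endpoint):
            inventories[dataset["id"]] = inventory_scientific_names(dataset)
    return inventories

# An administrator scheduling everything the registry knows about:
print(crawl(sync_with_registry()))
```

In the real system the scheduling step sits between registry synchronisation and crawling, under administrator control, rather than running everything unconditionally as this sketch does.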