About

The Biodiversity Literature Repository (BLR) has grown from a community on Zenodo into a service dedicated to liberating the data hidden in the hundreds of millions of pages of scholarly publications and making them open access and FAIR (findable, accessible, interoperable and reusable).

It is built on top of Zenodo, a digital repository hosted at CERN that provides a sustainable and robust infrastructure for long-tail research data, which can consist of small datasets that would otherwise be lost.

Originally a collaboration between Zenodo, Plazi and Pensoft, BLR began as a repository for taxonomic publications that lacked Digital Object Identifiers (DOIs) and thus were effectively orphaned from the network of online citations. As it grew, its scope expanded to include the illustrations and taxonomic treatments contained in publications, and it morphed into a highly interlinked repository in which all these content types are linked to one another and enhanced with rich metadata.

The source data for BLR are scholarly publications, most often in PDF or HTML format but sometimes in XML formats whose structured data facilitate automated data extraction.

The largest data users are the Global Biodiversity Information Facility (GBIF) and the United States’ National Center for Biotechnology Information (NCBI).

Support for BLR comes from the Arcadia Fund and the three partner institutions: Zenodo, Plazi and Pensoft.

Liberating Data

Content is prepared for deposit in BLR via two workflows. Most often, data mining processes extract content from PDF or HTML, identifying and labelling relevant data elements: either named entities, such as DNA accession codes or geographic localities, or larger textual segments, such as material citations and entire treatments. This can be automated by developing templates for each journal. During the upload of the data to BLR, each deposited object (article, treatment, or image) is assigned a DOI, which is cited by each related object. In the best-case scenario, this process is completely automated. The second, more advanced workflow is based on publications that already contain structured data based on standard vocabularies that machines can understand.
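The labelling step of the first workflow can be pictured with a small sketch. It is an illustration only, not the production pipeline: the regular expressions, the label names and the function `label_entities` are assumptions made for this example. The accession pattern assumes the common GenBank nucleotide layout (one letter plus five digits, or two letters plus six digits), and localities are assumed to be written as decimal degrees with a hemisphere letter.

```python
import re
from typing import List, NamedTuple


class Entity(NamedTuple):
    label: str  # hypothetical label name, e.g. "accession"
    text: str   # the matched text
    start: int  # character offset where the match begins
    end: int    # character offset where the match ends


# Illustrative patterns only; a real template would be tuned per journal.
PATTERNS = {
    "accession": re.compile(r"\b(?:[A-Z]\d{5}|[A-Z]{2}\d{6})\b"),
    "coordinates": re.compile(r"\b\d{1,2}\.\d+°[NS],?\s+\d{1,3}\.\d+°[EW]\b"),
}


def label_entities(text: str) -> List[Entity]:
    """Return every pattern match in the text, ordered by position."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append(Entity(label, match.group(), match.start(), match.end()))
    return sorted(found, key=lambda entity: entity.start)


if __name__ == "__main__":
    sample = "Switzerland, Bern, 46.948°N, 7.447°E, GenBank MK123456."
    for entity in label_entities(sample):
        print(entity.label, entity.text)
```

Larger segments such as material citations or whole treatments generally need layout and context cues rather than a single pattern, which is why journal-specific templates are used.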

After each article deposit in BLR, GBIF is notified that a new data set derived from the content of the publication is available. GBIF subsequently downloads the data and integrates them into its service.

The entire process from PDF via BLR to GBIF can take just a couple of minutes.

Liberation also means dissemination. Collaboration with global research infrastructures such as GBIF promotes usage and also promotes the improvement of data structure and quality.

APIs

How BLR Works

BLR works by providing access to parts of publications that are cited within the corpus of biodiversity literature. A taxonomic name usage implicitly cites a clearly delimited section of a scientific publication, called a taxonomic treatment. A taxonomic treatment contains further citations: to other treatments and thus to publications, to figures, to specimens in a collection, or even to DNA sequences.
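One way to picture this chain of citations is as a directed graph in which every cited object is a node. The sketch below is a toy model under that assumption: the identifiers, the `CITES` table and the function `reachable` are invented for illustration and do not reflect how BLR stores its links.

```python
from collections import deque

# Hypothetical identifiers; each key cites the objects in its list.
CITES = {
    "name-usage:1": ["treatment:A"],
    "treatment:A": ["treatment:B", "figure:1", "specimen:X"],
    "treatment:B": ["publication:P", "sequence:S"],
}


def reachable(start, cites):
    """Breadth-first walk: every object cited directly or indirectly from start."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        for target in cites.get(node, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


if __name__ == "__main__":
    for obj in reachable("name-usage:1", CITES):
        print(obj)
```

Starting from a single name usage, the walk reaches the treatment it cites and then everything that treatment cites in turn, which is the sense in which treatments and figures act as building blocks.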

The corpus of biodiversity literature includes tens of millions of figures and taxonomic treatments, which are the fundamental building blocks on which knowledge of the world’s biological diversity is based. To make them more openly accessible, the BLR team designed upload types for the deposit of figures and articles with enhanced metadata. For taxonomic treatments, Zenodo created a new resource subtype of the resource type “publication”, which maps to DataCite’s general resource type “text”. This makes it possible to mint a Digital Object Identifier (DOI) for every figure and taxonomic treatment.
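As a rough sketch of what the metadata for a treatment deposit might look like: the field names below (`upload_type`, `publication_type`, `related_identifiers`) and the value `taxonomictreatment` are assumed from Zenodo's legacy deposit metadata and should be checked against the current Zenodo API documentation. The DOIs, the title and both helper functions are placeholders invented for this example, and the choice of relation types is likewise an assumption.

```python
import json


def treatment_metadata(title, article_doi, figure_dois):
    """Build deposit metadata for one treatment (field names are assumptions)."""
    # Link the treatment to its source article and to the figures it cites.
    related = [{"identifier": article_doi, "relation": "isPartOf"}]
    related += [{"identifier": doi, "relation": "cites"} for doi in figure_dois]
    return {
        "upload_type": "publication",
        "publication_type": "taxonomictreatment",
        "title": title,
        "related_identifiers": related,
    }


def datacite_general_type(metadata):
    """Map the upload type to a DataCite general resource type."""
    # "publication" -> "Text" follows the description above;
    # the "image" entry and the "Other" fallback are assumptions.
    mapping = {"publication": "Text", "image": "Image"}
    return mapping.get(metadata["upload_type"], "Other")


if __name__ == "__main__":
    record = treatment_metadata(
        "Aus bus Smith, 2020",          # placeholder treatment title
        "10.0000/example-article",      # placeholder DOI, not a real record
        ["10.0000/example-figure"],     # placeholder DOI, not a real record
    )
    print(json.dumps(record, indent=2))
    print(datacite_general_type(record))
```

Because the treatment is a subtype of “publication”, it inherits the DataCite general type of its parent, so no new top-level type is needed to give it a DOI.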

All publications after 1999 are accessible according to the licenses assigned by their publishers. Publications prior to 2000 are open access. Taxonomic treatments and figures, being scientific data, are not copyrightable; since they are extracted from legally accessible sources, they are made openly accessible.

The development and maintenance of BLR are organized during biannual sprints at CERN in which all partners participate.

Use of the data is open, and depositing new research-related data is free.

Contribute

You can contribute by making your publications open access, by learning how to arrange your scientific data and text so that automated processing can be applied, by publishing in semantically enhanced journals, by learning how to convert articles, by including BLR services in research grants and by supporting the activity financially.

Blog

Articles about BLR are available in blogs of its three partners: