07 Feb 2024

Landmark Report Shows Scholarly Content at Risk

Martin Eve of CrossRef has some sobering news for researchers. A recent analysis suggests that around a quarter of academic publications are not being preserved for the future. 

Eve’s study assessed around 7.5 million of the e-books and articles for which CrossRef provides a fixed identifier, known as a Digital Object Identifier (DOI). These identifiers prevent ‘link rot’ so that, even if a URL changes, an article or book can still be cited and retrieved. But they’re not enough on their own, because they only preserve the link, not the destination. So, unless there’s a preservation service guaranteeing the content, the DOI will stop working if a publisher goes out of business or simply removes a title. Scholarly content needs to be preserved before it is deleted, not after.

The bad news is that, for c. 2 million of the works in his study, he could find no evidence of preservation. So, based on this sample, more than a quarter of the electronic scholarly record is immediately at risk.

The good news is that 4.3 million of the works he studied were preserved in at least one place. “That's not utterly terrible, and this under-counts preservation, because we haven’t got data from every archive everywhere. I’m also not looking at green archives, although there’s still debate about whether such platforms can constitute adequate preservation. It is true that simple hosting in an institutional repository is not the same as triplicate redundancy preservation in dark archives.”

Librarians have an important role to play in encouraging, indeed requiring, proper preservation. A toolkit to support you, including model license language, is available here: https://liblicense.crl.edu/resources/digital-preservation/

Alicia Wise of CLOCKSS observed: ‘This is a wake-up call. Agencies like CLOCKSS and libraries like the British Library have very advanced understanding of how to preserve content and have made amazing progress with one of the major challenges of our generation. But we urgently need to accelerate the preservation of our intellectual heritage content if we want to secure the huge percentage of scholarship which remains unprotected.’

William Kilbride of the Digital Preservation Coalition welcomed the report: ‘Martin’s findings are incredibly important. Publishers and libraries have been at the leading edge of digital preservation. We’ve been arguing for years for urgent investment to ensure research remains viable against the fluctuating fortunes of the publishing industry. It’s pleasing to see progress, but telling how much more there is to do.’

In November 2023 the DPC’s ‘Bit List’ classified research papers among ‘vulnerable’ content types, meaning that the application of ‘proven tools and techniques’ is required to improve the likelihood of preservation. ‘What happens in sectors which haven’t invested in preservation the way libraries and publishers have? Martin’s report is significant for publishers who deal with well-known data types in a well-developed sector. It hints at the crisis in other sectors which have not been so pro-active in the preservation of digital content.’

You can read more about Martin Eve’s sobering discoveries here: https://clockss.org/martin-eve-crossref-the-digital-preservation-of-7-5-million-items/