Guest Blog: Digital Lifecycles and the Costs of Curation by Paul Wheatley - 4C Project


Although digital preservation costing isn’t what I’d exactly call a passion of mine, I’ve been involved in either working on projects to develop costing tools or commenting from around the periphery of this niche field for quite some time. I may not be too thrilled by economic matters, but I realise it’s an important topic. Having a good understanding of the costs of digital preservation has to be crucial for anyone working in this field. Digital preservation is a solved problem. No really, it is. IF you’ve got the cash. It only starts getting difficult when you have to do it on a budget. Of course, when the target of a particular activity is the long term, and you’re working in a world that has difficulty looking beyond next year’s financial results and/or budget, it’s always going to be a financial stretch. There are some critical considerations in light of this observation. The efficiency of your activities and the targeting of resources have to be near the top of that list. Ensuring your digital work is efficient and focused in the right areas requires, however, a detailed understanding of the digital lifecycle, of your data and of your costs. That’s where digital preservation cost modelling comes in.

So if the argument is there, what’s the problem? Let’s get on and solve it, shall we? Well, as I’ve observed before in one of my more outspoken blog-rants (see “Digital Preservation Cost Modelling: Where did it all go wrong?”), we (as a community) have spent a lot of effort on modelling digital preservation costs but haven’t got very far with it. It’s just really difficult! I’ve already talked at length in that blog post about what I think we’ve got wrong in the past. For this post, having attempted to learn the lessons, I’m going to make a couple of suggestions on how I think we (and in particular the 4C Project) could make a really useful contribution. And I should emphasise that these are my views only, and not those of the project team.

The strong message that came out of my aforementioned costing blog-rant was, I hope, one of a need for more collaboration. We need to be working more effectively as a community in order to maximize the value of our work. 4C, as an EU-funded project with lots of partners from across Europe, is in an ideal position to bring some more collaboration to this area and tie together existing work into a more coherent whole. A suggestion I made at the Nordbib/Knowledge Exchange workshop on costing last year (that was discussed there and picked up on in the event report) was to attempt to develop a common lifecycle model. Standardisation is never easy, but the benefits of getting at least close are pretty clear: a foundation would be established on which much more compatible costing tools and costing data could be built. Published data on actual costs, such as “We did x, y, z and it cost a, b and c”, is pretty rare, but is in high demand from those working in this field.

Unfortunately, whenever some costing data is actually published, it’s hard to work with, as each bit of capture work tends to categorize or structure the costs in a different way. A standardized lifecycle model would eliminate this problem, and provide a basis for others working with costing data. Develop a new costing model based on the standard lifecycle model, and you can compare its results with other work far more easily. How do you quickly build a standard and make sure it will be used? Get buy-in from the key people working with costing models. Get them in a room for a two-day workshop and encourage them to compromise until agreement can be reached. Reduce detail in the standard model or make some elements optional until those key people are happy. I was hoping Knowledge Exchange might take this forward, but their last workshop took a very different direction. 4C is, however, in a great position to fill this void. Many of those who have led costing initiatives in the past are part of 4C, and others I’m sure will be interested. It felt like there was a lot of good will at the Knowledge Exchange workshop to work more closely together, and this would be a great way to build those links and kick off the kind of collaboration which I believe 4C wants to achieve.
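To make the idea of a shared lifecycle model a little more concrete, here is a minimal Python sketch of how cost figures captured under different local categories might be re-expressed against one agreed set of stages. Everything in it is an assumption for illustration: the stage names, the organisation names, the category mappings and the function `normalise` are all hypothetical, and no real lifecycle standard or published costing data is implied.

```python
from collections import defaultdict

# Hypothetical shared lifecycle stages. The agreed list would come out of
# the kind of workshop described above; these names are placeholders.
SHARED_STAGES = ("acquisition", "ingest", "storage", "preservation", "access")

# Hypothetical mappings from each organisation's own cost categories
# to the shared stages.
MAPPINGS = {
    "org_a": {
        "selection": "acquisition",
        "cataloguing": "ingest",
        "disk and tape": "storage",
    },
    "org_b": {
        "deposit handling": "ingest",
        "bit storage": "storage",
        "format migration": "preservation",
    },
}


def normalise(org, local_costs):
    """Re-express one organisation's costs against the shared stages.

    Raises KeyError for any local category that has no agreed mapping,
    so that gaps in the shared model are noticed rather than hidden.
    """
    totals = defaultdict(float)
    for category, amount in local_costs.items():
        stage = MAPPINGS[org].get(category)
        if stage is None:
            raise KeyError(f"{org}: no mapping for cost category {category!r}")
        totals[stage] += amount
    # Report every shared stage, including those with no recorded cost.
    return {stage: totals[stage] for stage in SHARED_STAGES}
```

Once two organisations' figures have been passed through `normalise`, they share the same keys and can be compared stage by stage. The hard part, as argued above, is agreeing the stages and mappings, not writing the code.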

A standard costing model on its own achieves nothing, and 4C will I’m sure want to go further. But where? Over the last couple of years I’ve had the pleasure of working with digital preservation practitioners from many different organizations, which has given me a reasonable insight into the priorities many have for digital preservation. I’ve also been looking at justifying digital preservation and making the case to fund it effectively, as part of the Jisc-funded SPRUCE Project. With thoughts from these activities regularly swirling around in my head, I keep finding myself coming back to the idea of demonstrating the efficiency you can achieve by doing just the right amount of digital preservation early in the digital lifecycle. I’ve come across plenty of anecdotal evidence where that little intervention up front (e.g. choosing the right storage approach, checksumming your data, or properly documenting it) can save loads of money a few years downstream (e.g. recovering data from broken media, patching up metadata, picking through data to find the stuff you want to preserve). If 4C could collect convincing evidence in a number of different organizational contexts, it would be of huge benefit to, and have a significant impact on, the digital preservation field as a whole. So it’s not really data to justify full-on digital preservation in a comprehensive manner. It’s just to show that with some real thought and a little bit of effort across the lifecycle, there are real financial wins for organisations in the medium (not even long) term.
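As an illustration of how small one of those up-front interventions can be, here is a minimal sketch of checksumming a folder of files using only the Python standard library. It is one possible approach under stated assumptions, not a recommended tool: the function names `sha256_of`, `build_manifest` and `changed_files` are my own hypothetical helpers, and SHA-256 is simply an assumed choice of algorithm.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root, manifest):
    """List files from an earlier manifest that are now missing or altered."""
    current = build_manifest(root)
    return sorted(
        name for name in manifest if current.get(name) != manifest[name]
    )
```

The idea is to run `build_manifest` once when the data arrives, store the result somewhere safe, and run `changed_files` against it periodically. Any name it returns is a file that has been altered or lost since the manifest was made, which is far cheaper to discover early than after the media has failed.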

I was unable to make it to the recent Screening The Future event in London, but a recurring theme in the tweets I was following was what they called COI, or Cost Of Inaction. In their case, the focus seemed to be primarily on the costs resulting from not performing media migration early on. But I like this terminology for making a case to preserve properly up front. We’re not going as far as the scare tactics we’ve sometimes heard along the lines of “preserve or there will be a digital dark age”; we’re just pointing out that if you don’t do it now, it’s ultimately going to come back and bite you in the wallet.

So as 4C gets up and running, and launches with the excellent idea of a consultation, I’d encourage the project to focus on some very realistic and practical activities as well as some of the more ambitious targets that the project has, I’m sure, committed to the EU to deliver (as all projects of this kind must). Well-chosen practical activities will build the foundation for those more complex tasks that are harder to achieve. Nailing a shared lifecycle model is a great kick-off activity, and gathering evidence that a small investment in digital preservation now means big savings later would deliver a fantastic outcome for 4C.

I’ll be following the project closely and hope to be able to contribute when the opportunity arises.

Paul Wheatley, University of Leeds

Paul manages the Jisc-funded SPRUCE Project at the University of Leeds and has worked extensively on cost models for digital preservation.