Author: Government recordkeeping

Written by Robin Friend, Digital Records Technical Specialist, and David Fowler, Manager Digital Transfer, for World Digital Preservation Day 2022 on 3 November.
 

This year Public Record Office Victoria (PROV) took the plunge into website preservation after a Victorian Government agency approached us for advice on preserving two public websites that were due to be decommissioned. After an unsuccessful attempt by a third-party vendor to capture the sites in our preferred preservation format (.warc), we decided to have a go ourselves, as a learning experience if nothing else. The results were surprising and successful.

The sites were dynamic and interactive, relying heavily on JavaScript and user input to display content. Additionally, much of the content was in the form of embedded YouTube videos. This led to difficulties when trying to capture the sites using traditional web crawlers, which failed to preserve the bulk of the content. Initial investigation by the vendor suggested it would not be feasible to capture an interactive facsimile of the website. The best we could hope for was a ‘static’ capture, and capture of the embedded media separately. 
 

Server-side web archiving

A static capture is obviously not the most desirable outcome. Having little background in web archiving, we decided to look for a better solution that more faithfully represented the website as it appeared to the public. This research led us to an article by Eoin O’Donohoe from the Netherlands Institute for Sound and Vision, titled Server-Side Web Archiving. The article detailed their attempts to archive two dynamic websites using existing tools, one of which was Conifer (previously known as WebRecorder). Conifer is an open-source web archiving tool that allows for the capture of dynamic websites, including content relying on JavaScript and embedded media. The process is straightforward: you create a free account, enter the URL where the capture should begin, and then navigate the chosen website in a virtual browser, just as a user would, ensuring that all pages are visited. Conifer captures the pages that are visited, and these can then be exported as a .warc file.

 

Conifer capture

As the websites were relatively small (in terms of number of pages), we decided to try to capture them using Conifer. This was a relatively slow and involved process; however, it was very worthwhile, as the captures proved to be faithful to the original user experience.

We now want to gain additional expertise in this space. The capture required the archivist to navigate through each page to be captured, and in the absence of an accurate site map, content could be accidentally overlooked, so exploring more advanced tools such as Browsertrix could be worthwhile. Ultimately, we’d also like to provide access via our online catalogue, perhaps by embedding a tool such as replayweb.page.
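One way to reduce the risk of overlooked pages is to compare the URLs recorded in the exported .warc file against a list of pages you expect to have captured. The following is a minimal sketch of that idea, not the process PROV used. It assumes the export is a standard WARC file (optionally gzip-compressed), and the function names and the example URLs are hypothetical. It compares URLs as exact strings, so differences such as a trailing slash would be reported as missing.

```python
import gzip


def open_warc(path):
    """Open a .warc or .warc.gz file for binary reading."""
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


def iter_warc_headers(stream):
    """Yield a dict of WARC headers (lower-cased names) for each record."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.startswith(b"WARC/"):
            continue  # blank separator lines between records
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Skip the record's content block without parsing it.
        stream.read(int(headers.get("content-length", 0)))
        yield headers


def captured_urls(stream):
    """Return the set of target URLs that have a 'response' record."""
    urls = set()
    for headers in iter_warc_headers(stream):
        uri = headers.get("warc-target-uri")
        if uri and headers.get("warc-type") == "response":
            urls.add(uri.strip("<>"))  # some writers wrap the URI in <>
    return urls


def missing_pages(expected_urls, stream):
    """Return expected URLs that have no response record in the WARC."""
    return sorted(set(expected_urls) - captured_urls(stream))
```

With a hand-made list of expected pages (here hypothetical), usage might look like `missing_pages(["https://example.org/", "https://example.org/about"], open_warc("capture.warc.gz"))`. Any URLs returned would be candidates for a further capture session.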

Overall, we discovered that while digital preservation can seem daunting, particularly for those without technical expertise, it is worthwhile investigating the tools that may be available. Conifer is both free and easy to use, and the relatively small investment of time has led to a better outcome for the government agency, PROV and future researchers.
 
