
Webrecorder

Webrecorder is a tool for capturing individual pages with high fidelity. It was originally developed for preserving digital art but has since been adopted by many web archiving programmes around the world. Unlike web crawlers such as HTTrack and Heritrix, Webrecorder requires someone to click through a website's pages manually, and it archives everything they view. Because Webrecorder uses a real web browser, though, it can capture some kinds of dynamic content that web crawlers cannot.

The NLA runs a copy of Webrecorder connected to the Australian Web Archive for use by PANDORA partners.

Getting Started

  1. Open Webrecorder from the PANDAS sidebar menu.
  2. Click 'New collection' to create a collection for the site or set of documents you're trying to archive (ignore the 'Make public' option).
  3. Click the 'New session' button.
  4. Enter the URL you want to capture in the box.
  5. After the page loads, interact with it to ensure all the content is loaded. For example, to collect audio files, make sure to click 'Play' so the browser loads the file.
  6. Follow links to any other pages you wish to capture.
  7. When you are finished, hover over the red 'Capturing' label in the top left and click 'Stop'.
  8. The files recorded in your session will be archived automatically. You can view the results of your session within Webrecorder itself. After about an hour, they will be made available in the Australian Web Archive.
  9. Once the collection has been archived (you can check this in Bamboo), you can delete the collection from Webrecorder to free up space.

See Webrecorder's documentation for more advanced options.

Known issues

The integration between Webrecorder and the Australian Web Archive is currently fairly primitive. There is not yet a way to add a link to Webrecorder content from PANDAS collections; we are working on this.

Webrecorder's public/private setting, lists and notes features have no effect on Trove.

The hourly sync process is far from ideal, and we're looking into implementing a different archiving mechanism.

Integration diagram