Pandora : Australia's Web Archive National Library of Australia and Partners
Frequently asked questions about PANDORA

Questions relating to organisation

Questions relating to collection

Questions relating to description and resource discovery

Questions relating to access

Technical questions

Questions about preservation

What is PANDORA?

PANDORA, Australia's Web Archive, is the name the National Library gave to its selective web archiving program. The PANDORA web archiving program is a collaborative activity with nine other partner agencies.

The initialism 'PANDORA' stood for 'Preserving and Accessing Networked DOcumentary Resources of Australia'. The National Library no longer promotes this initialism but retains 'PANDORA' as the branding for the selective web archiving component of its web archiving program, which also includes some bulk harvesting of websites (in particular government websites) and annual harvests of the entire .au Australian web domain.

Where is PANDORA based?

PANDORA is managed and maintained at the National Library of Australia, Canberra, Australia.

Both the infrastructure for the web archive and the archived content are located and stored at the Library.

When did work on PANDORA begin?

Work began on scoping and defining selection guidelines in late 1995 and PANDORA was formally established on 1 April 1996. The first titles were archived in October 1996.

Who builds the PANDORA Archive?

Content for the Archive is selected and curated by the National Library and the nine other partner agencies.

Within the National Library, the Web Archiving Section is responsible for administering the Archive, supporting partner agencies, and providing input to ongoing infrastructure development, as well as the operational tasks of selecting, scoping, collecting, quality assurance and cataloguing content.

How is the Archive funded?

All of the participants fund their contributions to the PANDORA Archive from their ongoing operational budgets. This includes the National Library, which bears the expense of administering the Archive, storing and preserving the archived content, and developing and maintaining the technical infrastructure.

No additional funding has been received from government to develop, build or support the activity.

How do participants contribute to the Archive when they are remote from the National Library?

All partner agencies undertake the curatorial tasks necessary to archive web content in their jurisdiction or area of responsibility. This is achieved using the web-based workflow management system (PANDAS) developed and centrally maintained by the National Library.

What is the scope of the Archive? Do you have selection guidelines?

Each of the PANDORA partner agencies selects content according to its collection development policy or selection guidelines, which provide direction on the jurisdiction or subject areas for which it takes responsibility.

The National Library is responsible for archiving websites and documents of national relevance, while the state and territory library partners are responsible for websites that are relevant to their respective jurisdictions. The Australian War Memorial is responsible for selecting websites relating to Australian military history; the National Gallery of Australia for websites relating to Australian fine arts; and the Australian Institute of Aboriginal and Torres Strait Islander Studies for websites and documents relating to Australian Indigenous peoples.

The PANDORA Archive represents the highly selective component of the National Library's web archiving program, which also includes bulk seed-list and domain harvesting. PANDORA content makes up around 7 or 8 percent of the entire web archive maintained by the Library.

Content collected for PANDORA is not restricted to websites on the .au top level country domain. Content may be located on either an Australian or an overseas server and is selected on the basis of relevance to Australia and Australians in accordance with the statutory functions of the National Library (as set out in the National Library Act, 1960) and legal deposit coverage as set out in the Copyright Act, 1968 or the legal warrants of participating agencies including state legal deposit legislation and government directives.

How frequently do you capture web sites?

Curators determine an appropriate schedule for reharvesting content, which can range from daily to annual or beyond. It is also possible to schedule harvesting on specific days, although not at specific times of the day.

In setting harvesting schedules, curators consider the stability of the site (whether content is frequently removed), how critical timely harvesting is, technical necessity, and the pragmatic need to make best use of resources.

The objective in setting harvesting schedules is long-term preservation and comprehensive collection of content. The Archive does not operate as a mirror site and there is no attempt to archive and document every change to a website.
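As a rough illustration of the scheduling described above, the sketch below generates harvest dates at day granularity for a chosen frequency. It is not how PANDAS works internally; the frequency names, the interval lengths and the function `next_harvest_dates` are all assumptions made for this example.

```python
from datetime import date, timedelta

# Hypothetical frequency labels mapped to an interval in days (assumed values,
# not taken from PANDAS).
INTERVALS = {"daily": 1, "weekly": 7, "monthly": 30, "annual": 365}


def next_harvest_dates(start: date, frequency: str, count: int) -> list[date]:
    """Return `count` harvest dates beginning at `start`.

    Dates carry no time of day, mirroring the point that harvests can be
    scheduled for specific days but not specific times.
    """
    step = timedelta(days=INTERVALS[frequency])
    return [start + step * i for i in range(count)]


if __name__ == "__main__":
    # Example: three weekly harvests starting on 1 January 2020.
    for d in next_harvest_dates(date(2020, 1, 1), "weekly", 3):
        print(d.isoformat())
```

A real schedule would also need to reflect the curatorial factors listed above, such as site stability and available resources, which this sketch leaves out.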

Do you attempt to collect all levels of each web site?

Content selected for PANDORA ranges from entire websites to single documents. Because PANDORA is a selective archive, it is more common to target specific content rather than entire websites. For example, for the daily harvesting of news sites, only the main page and one link will be collected because the harvesting schedule is frequent. Because PANDORA content becomes part of the much larger Australian Web Archive, which includes content collected through domain and bulk harvesting methods, content that is not collected through targeted selective harvesting may still link to its broader context in the larger archive.

Do you archive external links?

Targeted selective harvesting means that most of the linked content is not collected. However, when PANDORA content is added to the much larger Australian Web Archive, external links to content that has been harvested through other bulk and domain harvesting methods will work. Linked content that is generally out of scope, such as non-Australian content, will not be collected.

Do you receive publications directly from publishers? If so, how?

As a general rule deposited content cannot be received from publishers and creators and added to the web archive.

The most efficient and practical means of copying websites for the web archive involve using crawl harvesters. PANDORA uses the crawler robots HTTrack and Heritrix and the headless browser harvester Webrecorder for patching. This system is not designed to accept content management systems, databases or other content unless it is in the form of WARC packages.

Publishers wishing to deposit publications are able to do so in PDF, epub or mobi format through the National eDeposit portal.

How large is the Archive?

Check the Statistics page for up-to-date figures on the size of the PANDORA component of the Australian Web Archive.

Are publications in the Archive subject to legal deposit?

Yes. In February 2016 the legal deposit provisions of the Copyright Act, 1968 were extended to include all online publications including websites, web pages and documents. For more information see the Library's legal deposit information.

What kind of resources does the Archive contain?

PANDORA contains a wide range of online publications and websites. Priority is given to collections of material that reflect the range of Australian society and culture, especially in respect to events of national concern such as elections and national disasters (e.g. bushfires, COVID-19) and commemorations. These collections can be browsed through the Trove Australian Web Archive portal. In addition, there are many other types of resources that are individually collected to provide a representation of Australian online publishing. The following are just a few examples in a few categories:

Cultural activity

Community concerns

Scientific standards and research

Politics and government

Indigenous peoples

Sport

People

There are sites of well-known as well as 'ordinary' Australians:

How are resources in PANDORA described (resource discovery metadata)?

Many of the titles selected for inclusion in the Archive receive MARC cataloguing records which can be searched through the National Library's online catalogue, Libraries Australia and Trove. Adding the phrase "pandora electronic collection" to a search will help to find catalogued PANDORA content.

The archived titles that are catalogued represent only a very small part of the content in the Archive. All content is full-text and URL indexed and can be searched through the Trove Australian Web Archive discovery service.

How do I find and access the PANDORA Archive?

Access to the contents of the PANDORA Archive is freely available by searching the Trove discovery service. This provides access to the entire Australian Web Archive of which PANDORA is only a small part. PANDORA curated collections may be browsed from the same Trove discovery service.

What is the technical infrastructure that supports PANDORA?

This section to be updated.

Is there a long-term preservation policy for the contents of the PANDORA Archive?

Yes. The National Library of Australia is strongly committed to ensuring long-term access to all its digital collections, including the PANDORA Archive. To this end it: