Frequently Asked Questions about WARP

General Information about the Web Archiving Project (WARP)

What kinds of websites are archived by WARP?

We archive the websites of the national government and of local governments (prefectures, designated cities, cities, and towns), as well as those of municipal merger committees, independent administrative corporations, special public corporations or agencies, universities, events, online periodicals, and similar sites.

How often do you archive these websites?

National institution websites are archived once per month, and the websites of other public institutions four times per year. Websites of private institutions are generally archived once per year, and event websites at intervals matching the frequency of the event. Online periodicals are archived as new issues are published, taking into account whether past issues remain available, so that no issues are missed.

When are archived websites made available?

After archiving a website, we need to check whether there were any problems with the archiving process and set its access scope (public internet access or restricted to library use only). We therefore generally make collections available for viewing by the end of the month following collection. For example, websites collected in April will be made available in late May.

For how long do you archive data?

Our goal is to preserve and to make available the data we archive for as long as technically possible.

Our institution would like to link our website to WARP. Are there any procedures we should follow?

We welcome and encourage other websites to link to pages of WARP. No permission is necessary to do so, but please be sure that links are labelled clearly to indicate that WARP is the destination site.
When linking to individual content archived in WARP, please use the URL obtained by clicking the "Copy URL" button on the banner at the top of the archived web page as the link destination address.

Links | Site Policy

What applications do you use for crawler and content replay?

We use the following open-source software:

Crawler: Heritrix https://github.com/internetarchive/heritrix3
Content replay: pywb https://github.com/webrecorder/pywb
Search: Elasticsearch https://www.elastic.co/jp/elasticsearch

We also use the WARC format to store collected websites. For an overview of the WARC format, please see the following page:

Archival File Format WARC (in Japanese)
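As a rough illustration of the format, each WARC record consists of a version line, a set of named header fields, a blank line, and a content block, followed by two line breaks. The Python sketch below builds and re-parses a minimal `resource` record using only the standard library. The function names, target URI, and date are hypothetical; the sketch skips most of the WARC specification (ISO 28500) and does not reflect how Heritrix or pywb actually write or read WARC files.

```python
import uuid


def build_warc_record(target_uri: str, date: str, payload: bytes) -> bytes:
    """Build a minimal WARC/1.1 'resource' record (illustrative only)."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", date),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.1\r\n" + "".join(f"{name}: {value}\r\n" for name, value in headers)
    # A blank line separates the header fields from the content block,
    # and every record is terminated by two CRLF sequences.
    return head.encode("utf-8") + b"\r\n" + payload + b"\r\n\r\n"


def parse_warc_headers(record: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a single record into its version line, header fields, and content block."""
    head, _, rest = record.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    version = lines[0]
    fields = dict(line.split(": ", 1) for line in lines[1:])
    length = int(fields["Content-Length"])
    return version, fields, rest[:length]


rec = build_warc_record("https://www.example.go.jp/", "2024-04-01T00:00:00Z", b"hello")
version, fields, body = parse_warc_headers(rec)
print(version, fields["WARC-Type"], body)  # WARC/1.1 resource b'hello'
```

Real archives hold many such records per file (usually gzip-compressed record by record), which is why dedicated WARC libraries are normally used instead of hand-written parsing.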

How WARP Archives Websites

By what means does WARP archive websites?

We use an automated program, known as a web crawler, to archive websites.

3. How to Harvest Websites | Mechanism of Web Archiving (in Japanese)

Do web crawlers cause congestion at the servers of the websites to be archived?

No, they do not. Our web crawler is designed to avoid causing congestion. For example, it always waits at least one second between downloads.
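The one-second rule can be pictured as a per-host rate limiter. The sketch below is a minimal illustration, assuming a fixed one-second minimum interval for each host; the class `PoliteScheduler` is hypothetical and does not represent Heritrix's own politeness settings, which are configured differently.

```python
import time
from typing import Callable
from urllib.parse import urlsplit


class PoliteScheduler:
    """Enforce a minimum delay between requests to the same host (illustrative)."""

    def __init__(
        self,
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last_fetch: dict[str, float] = {}  # host -> time of last download

    def wait_for(self, url: str) -> float:
        """Block until the URL's host may be fetched again; return the time waited."""
        host = urlsplit(url).netloc
        now = self.clock()
        last = self.last_fetch.get(host)
        waited = 0.0
        if last is not None:
            waited = max(0.0, last + self.min_interval - now)
            if waited > 0:
                self.sleep(waited)
        self.last_fetch[host] = self.clock()
        return waited
```

A crawler loop would call `wait_for(url)` immediately before each download. The clock and sleep functions are injectable so the behavior can be checked without real delays.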

Do you also archive data intended only for internal use?

No, we do not. We only archive data that has been made available to the public via the internet.

Can you archive all kinds of files?

No, we cannot. Because of technical limitations, some files cannot be archived, including the types listed below.

Files that are stored in a database
Files that can be streamed or played back as they are downloaded
Files that are set to exclude bots
Files for which links are generated dynamically with JavaScript
Style sheets and JavaScript files
Files with character encoding recognition issues
Large files (approximately 500MB or more)

If materials are preserved through WARP, is it unnecessary to send paper copies?

The purpose of WARP is to preserve information on the internet for future generations, which differs from the purpose of preserving paper materials. While WARP may archive materials posted on websites that are identical in content to paper materials, this does not exempt the paper materials from legal deposit. We kindly ask for your continued cooperation in fulfilling legal deposit obligations.

Legal Deposit System | National Diet Library, Japan

About Searching Archived Content

Keywords within the content are not being found.

Due to system specifications, only the first 5000 characters of the content are searchable. Therefore, text beyond the 5000th character will not be found even if searched.
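To illustrate the effect, here is a minimal sketch assuming the limit is a simple character count applied to the extracted text; exactly how WARP counts characters (for example, after removing markup) is not stated, and the function names are hypothetical.

```python
SEARCHABLE_CHARS = 5000  # limit stated in this FAQ


def indexed_text(content: str, limit: int = SEARCHABLE_CHARS) -> str:
    """Return the portion of the content that would be indexed."""
    return content[:limit]


def is_findable(content: str, keyword: str, limit: int = SEARCHABLE_CHARS) -> bool:
    """A keyword is found only if it lies entirely within the indexed portion."""
    return keyword in indexed_text(content, limit)
```

Under this assumption, a keyword that starts before the 5000th character but ends after it is not found either.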

There are no search results for the URL I am looking for.

Try searching without index.html or index.htm at the end.
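The adjustment amounts to cutting the trailing file name so the URL ends at its directory. A minimal sketch, with a hypothetical function name and example URL, that ignores query strings and upper-case variants such as INDEX.HTML:

```python
def strip_index_page(url: str) -> str:
    """Drop a trailing index.html or index.htm so the URL ends at the directory."""
    for suffix in ("index.html", "index.htm"):
        if url.endswith(suffix):
            return url[: -len(suffix)]
    return url
```

For example, `strip_index_page("https://www.example.go.jp/about/index.html")` returns `"https://www.example.go.jp/about/"`, which is the form to try in the search box.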

What does the "Relevance" order mean in search results?

Search results are sorted by relevance according to the frequency with which keywords appear in the content. Webpages with the highest frequency appear at the top of the search results. In addition to frequency, document size and other factors affect the order of relevance.
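WARP's search uses Elasticsearch, whose default relevance model is BM25: repeated query terms raise the score with diminishing returns, and longer documents are discounted, which matches the role of frequency and document size described above. Whether WARP uses the default scoring settings is not stated, so the following is only a minimal sketch of classic BM25 with a Lucene-style IDF and the common defaults k1 = 1.2 and b = 0.75, assuming the documents are already tokenized.

```python
import math
from collections import Counter


def bm25_scores(
    docs: list[list[str]], query: list[str], k1: float = 1.2, b: float = 0.75
) -> list[float]:
    """Score tokenized documents against a query with classic BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each query term.
    df = {t: sum(1 for d in docs if t in d) for t in set(query)}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Longer-than-average documents get a larger denominator.
            norm = k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores
```

Sorting document indices by these scores in descending order gives a relevance ranking in which more frequent matches rank higher and, for equal frequency, shorter documents rank higher.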

What's the difference between "Grouping search results" and not grouping them?

Search results can be grouped by URL, website, or publisher. When grouping by URL, only one result is shown per unique URL, so every displayed content URL is different. Without grouping by URL, the same URL tends to appear several times with different archive dates, which is why grouping by URL is the default setting. When grouping by website or publisher, only one result is shown per website or publisher, which makes it easier to see which websites or publishers the matching content belongs to.
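Grouping can be thought of as keeping one representative hit per group key. The sketch below assumes the representative is the highest-scoring hit, which this FAQ does not specify; the `Hit` fields and function name are hypothetical. Elasticsearch offers a comparable feature called field collapsing, though whether WARP uses it is not stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    url: str
    website: str
    publisher: str
    archived: str  # archive date, e.g. "20240401"
    score: float


def group_hits(hits: list[Hit], key: str) -> list[Hit]:
    """Keep only the highest-scoring hit per value of `key` ("url", "website", or "publisher")."""
    best: dict[str, Hit] = {}
    for hit in hits:
        k = getattr(hit, key)
        if k not in best or hit.score > best[k].score:
            best[k] = hit
    # Present the surviving representatives in relevance order.
    return sorted(best.values(), key=lambda h: h.score, reverse=True)
```

Grouping by `"url"` removes repeated archive dates of the same page, while grouping by `"website"` or `"publisher"` collapses many pages into one line per site or publisher.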

The displayed content type may differ from the actual content type.

Since we register the MIME type specified by the source server at the time of crawl, the content type displayed in search results may differ from the actual content type.
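One way to spot such a mismatch is to compare the MIME type the server declared with the type usually implied by the file extension. The sketch below uses Python's standard `mimetypes` module; the function name and URLs are hypothetical, and extension-based guessing is only a heuristic, not how WARP determines content types.

```python
import mimetypes


def declared_type_differs(url: str, declared: str) -> bool:
    """Return True if the server-declared MIME type disagrees with the extension's usual type."""
    guessed, _encoding = mimetypes.guess_type(url)
    # Ignore parameters such as "; charset=Shift_JIS" when comparing.
    declared_base = declared.split(";", 1)[0].strip().lower()
    return guessed is not None and guessed != declared_base
```

For example, a PDF that a server delivered as `application/octet-stream` would be listed with that type, even though its extension suggests `application/pdf`.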

Browsing Archived Websites

What is the purpose of the banner displayed at the top of archived webpages?

Websites archived by WARP are displayed with a banner at the top of each webpage that shows when the page was archived. The banner helps ensure that users are aware that the content they are browsing is an archive and that its information may well be out of date.
*The banner may not display properly for certain files, such as images and Office files.

5. Content Browsing Screen | Help

What does it mean when an item is labelled as available only at the NDL?

All websites archived by WARP are available for browsing on the premises at the NDL. In cases where the copyright holder has granted permission to do so, the NDL also makes archived content available via the internet. There are some websites, however, for which the copyright holder has not granted such permission, and these are available for browsing only on the premises at the NDL.

I can't get some areas to display.

There are technical limitations on the types of files that we can archive. Some files, including the types listed below, cannot be archived, or, even if collected, may be displayed with a distorted layout.

Files that are stored in a database
Files that can be streamed or played back as they are downloaded
Files that are set to exclude bots
Some files for which links are generated dynamically with JavaScript
Style sheet files and JavaScript files
Files with character encoding recognition issues
Large files (approximately 500MB or more)

Please note that some links will direct you to the live website rather than an archived page. You can verify whether the webpage you are viewing has been archived by WARP or is part of a live website by checking the URL in the address bar of your browser.

The PDF file is not displayed.

When viewing PDF files in archived websites on a smartphone or tablet, you may be unable to view the file or may see only part of it. Please download the PDF file using the “Download PDF” button on the banner at the top of the screen.
Additionally, depending on your OS or browser, a content blocker screen may appear, preventing the “Download PDF” button from showing on the banner. In such cases, return to the source page and right-click the PDF link on a computer or long-press it on a smartphone. Then, open the PDF file in a new tab or download it to view.

The text is garbled.

Text may be garbled on pages whose character encoding could not be recognized. This happens because pages archived with an unrecognized character encoding are uniformly processed as UTF-8. Please try changing the character encoding with a browser extension that lets you select the encoding manually.
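The effect can be reproduced with Python's standard codecs. In the sketch below, text stored as Shift_JIS (an assumed example; EUC-JP and other legacy encodings behave similarly) becomes unreadable when forced through UTF-8, but decodes correctly once the right encoding is chosen, which is what a manual encoding switch in the browser does.

```python
text = "国立国会図書館"
raw = text.encode("shift_jis")  # bytes as a hypothetical Shift_JIS page would store them

# Forcing UTF-8: invalid byte sequences become U+FFFD replacement characters.
garbled = raw.decode("utf-8", errors="replace")

# Choosing the correct encoding recovers the original text.
restored = raw.decode("shift_jis")

print(restored)  # 国立国会図書館
```

Because the original bytes are preserved in the archive, the text is not lost; only the choice of decoder is wrong.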

Only part of the page can be printed.

Because archived content is displayed within a frame, you must select the area you want to print before printing when viewing online.
To print the entire page, select "Select All" before printing. On Windows, use the shortcut keys Select All (Ctrl+A) and Print (Ctrl+P). On macOS, use Select All (Command+A) and Print (Command+P).
*If selecting all also selects the banner at the top of the screen, click once on a blank area within the content display area before executing "Select All".
Alternatively, if using a browser that supports printing frames (like Firefox), use the right-click menu: "This Frame" > "Print Frame".

The banner at the top of the screen is not displayed.

If the URL starts with "https://warp.ndl.go.jp/yyyymmdd/yyyymmddhhmmss/~", the banner will not appear. To display the banner, access the page by changing the "/yyyymmdd/" portion of the URL to "/web/".
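The rewrite is a simple substitution of the first path segment. A minimal sketch, assuming the URL follows exactly the pattern above with an 8-digit date and a 14-digit timestamp; the function name and example dates are hypothetical.

```python
import re

# https://warp.ndl.go.jp/<yyyymmdd>/<yyyymmddhhmmss>/<original URL>
_DATED = re.compile(r"^(https://warp\.ndl\.go\.jp)/\d{8}/(\d{14}/.*)$")


def to_banner_url(url: str) -> str:
    """Replace the leading /yyyymmdd/ segment with /web/; other URLs are returned unchanged."""
    return _DATED.sub(r"\1/web/\2", url)
```

For example, `to_banner_url("https://warp.ndl.go.jp/20240401/20240401123456/www.example.go.jp/")` returns `"https://warp.ndl.go.jp/web/20240401123456/www.example.go.jp/"`, and a URL that already contains `/web/` is left as it is.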
Additionally, the banner will not appear if you perform the following actions:

Right-click a URL within the page to copy it, then access the copied URL
Open a URL within the page in a new tab

When linking to WARP content from external sources, please use the URL copied via the "Copy URL" button on the top banner.

5-3. URL | 5. Content Browsing Screen | Help

The page does not refresh even after clicking the link.

WARP uses JavaScript to display archived content. Please enable JavaScript in your browser settings. Issues with your network may also cause a delay when following links. If you have trouble loading a particular webpage, wait a while and then try to load it again.

The time and date displayed on the archived webpage doesn't match the date shown in the WARP banner.

WARP archives information from the internet by copying it to the NDL server and then providing access to users. Thus, webpages that display the time and date dynamically might not always display this information correctly.

Are archived websites still protected by copyright?

The copyright to archived websites is retained by the original copyright holder. When using archived materials, please respect the copyright and limit your reuse of such material to the extent permitted by copyright law. You are responsible for obtaining permission from the original copyright holder when you intend to reprint images, documents, articles, data, or other content from archived websites.

Can users browse archived data at the NDL?

In general, users can browse all archived data at all three premises of the NDL: the Tokyo Main Library, the Kansai-kan of the NDL, and the International Library of Children's Literature.

Is archived data available to the general public via the internet?

The NDL makes available to the general public via the internet any archived webpages for which it has obtained permission from the copyright holder. Pages for which permission has not been obtained are available for browsing by the general public only at the Tokyo Main Library, the Kansai-kan of the NDL, and the International Library of Children's Literature.

Can I request photoduplication services at the NDL for archived webpages?

The NDL accepts requests for photoduplication of archived webpages from websites for which it has already obtained permission from the copyright holder. However, we cannot make copies of webpages from websites for which such permission has not been obtained.
The NDL does not provide remote photoduplication services for archived webpages.

Can I download archived data at the NDL?

Data cannot be downloaded and taken out of the NDL.

There are some archived PDF files for which printing has been disabled. Can I still request photoduplication services for these files?

The NDL is unable to provide photoduplication services for PDF files that are print disabled.

Is the NDL willing to modify or remove archived material for which a claim of copyright violation is made after the material is archived?

The NDL will consider usage restrictions and other measures based on its rules and regulations. Please contact the NDL directly via email to initiate discussion of this kind of issue.

I would like to download a large amount of data in bulk. Is there a function for this?

Because WARP is designed primarily for viewing in a web browser, we do not provide bulk download functions or APIs for data retrieval. Please refrain from large-scale automated access, as it places a load on the system. Access that the NDL deems detrimental to the system may be blocked.

Details Pages of Archived Websites

What types of pages are included in "Online Periodicals, etc. Included Within This Archived Website"?

We register pages from continuously published online periodicals and similar content under "Online Periodicals, etc. Included Within This Archived Website". We are reviewing archived websites and registering these pages progressively, but not all online periodical titles within each website have been covered yet.

What types of pages are included in the "Pages on Specific Topics Included Within This Archived Website"?

We register pages on socially prominent themes and events, as well as pages for facilities owned by the archived institutions.

What is the details page without thumbnails or sections like "Changes in the Number of New Archived URLs"?

This is the details page for pages registered as "Online Periodicals, etc. Included Within This Archived Website" or "Pages on Specific Topics Included Within This Archived Website". It displays metadata assigned to specific URLs from the archived website and does not show thumbnails or sections like "Changes in the Number of New Archived URLs".

Websites of Official Institutions

What is meant by the phrase "collection under the provisions of the National Diet Library Law"?

The National Diet Library Law allows the NDL to archive the websites of official institutions that are available to the general public via the internet.

The Collection of Internet Materials based on the National Diet Library Law (PDF: 425KB) (in Japanese)

Does this include the websites of all official institutions?

Yes. Every website that is published by an official institution and is available to the general public via the internet is subject to collection under the provisions of the National Diet Library Law.

Is any action required on our part for our websites to be archived?

If you have enabled a "Robot Exclusion" setting that blocks crawlers (for example, in robots.txt), please disable it.

The Collection of Internet Materials based on the National Diet Library Law (PDF: 425KB) (in Japanese)
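Website administrators can check whether a robots.txt file would block a crawler using Python's standard `urllib.robotparser`. In the sketch below the function name and the user-agent string `ExampleArchiveBot` are hypothetical; this FAQ does not give the WARP crawler's actual user-agent name.

```python
from urllib import robotparser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets `user_agent` fetch `url`."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocking = "User-agent: *\nDisallow: /\n"  # blocks every crawler from the whole site
open_site = "User-agent: *\nDisallow:\n"   # an empty Disallow permits everything

print(crawler_allowed(blocking, "ExampleArchiveBot", "https://www.example.go.jp/"))   # False
print(crawler_allowed(open_site, "ExampleArchiveBot", "https://www.example.go.jp/"))  # True
```

A site-wide `Disallow: /` under `User-agent: *` is the typical setting that would prevent archiving.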

If we operate multiple domains, do we need to notify the NDL?

Notification is not mandatory, since the NDL investigates the relevant domains itself, but we would appreciate it if you could let us know so that no domains are missed during crawling.

Do we need to notify you if there are website updates or URL changes?

No notification is required.

If these websites contain links to the websites of non-government organizations, are the targets of those links archived as well?

No, they are not.

How is content from institutional repositories archived?

When content from institutional repositories is available to the general public via the internet, it is considered to be continually available to the public for an extended period of time. Since this availability is not subject to change except under special circumstances, there is no need for the NDL to archive such content. Therefore, content on the National Institute of Informatics Current Institutional Repositories (IR) List is not archived by the NDL.

National Institute of Informatics (NII) Institutional Repositories Program Current IRs

Why can our website be browsed only on the premises of the NDL?

When we receive a response indicating that internet publication is not permitted, or when we receive no response, we restrict access to the premises of the NDL. We also restrict access to the NDL premises when permission is granted for only a very limited portion of the content.

Can I change my organization's website from internal access only to public internet access?

Public internet access is provided based on permission from each organization. Please contact us via email.

I would like to change certain pages from public internet access to internal access only.

Please review the page "To Website Administrators of Public Institutions" (in Japanese) and submit your request. Requests from third parties (such as the page creator, individuals depicted in photos or mentioned in information on the page, etc.) cannot be accepted. Please consult the website administrator of the institution that published those pages.

To Website Administrators of Public Institutions (in Japanese)

Privately Operated Websites

What is the process for archiving content from privately operated websites?

We archive the content of privately operated websites from which we have obtained permission to do so. Our main focus is on archiving content from the websites of non-profit incorporated associations and foundations, private universities, political parties, and international or cultural events, as well as websites pertaining to the Great East Japan Earthquake, online periodicals, and others. The process is as follows.

The NDL selects candidates.
The NDL sends the website operator a request for permission to archive.
Interested website operators respond to the request.
The NDL configures a web crawler based on information contained in the response and begins to archive the website's content. In the event of technical issues, archiving might be cancelled.
Webpages become available to the general public via WARP once we have confirmed that there are no technical issues related to archiving.

Online Periodicals

What is an online periodical?

At the National Diet Library (NDL), the term "online periodicals" is used to refer to any digital information that is published periodically on a network under a single title and on an ongoing basis with successive volume numbers and dates. Of these, the NDL preserves those that are available free of charge on the internet. We are currently not collecting new online periodicals. Online periodicals contained within archived websites are registered as "Online Periodicals, etc. Included Within This Archived Website" in Details Pages.

What types of pages are included in "Online Periodicals, etc. Included Within This Archived Website"? | FAQ

The journal title has been changed.

If you change the journal title, we would appreciate it if you could notify us. We will create a new metadata record under the new title, separate from that of the old title, and will harvest, preserve, and provide access to it under the same conditions as the old title.

Is it acceptable for an online periodical to use the same ISSN as its print counterpart?

No. ISSNs are assigned separately for different media formats (print, microform, electronic, etc.), even if the content is identical, so an online periodical needs its own ISSN. While registration is not mandatory, we kindly request your cooperation in registering an ISSN for the online version.

Japanese National Centre for ISSN | National Diet Library, Japan

Could you archive the online periodical we published?

Online periodicals from public institutions are archived by including them within the public institution's website. For online periodicals from private institutions, details such as submission methods are provided on the following page.

E-Legal Deposit System of Online Publications | National Diet Library, Japan
