Commit ae872cd

docs and README: add wacz format doc, tweak links, tweak README
1 parent 53bb291 commit ae872cd

6 files changed, with 42 additions and 12 deletions

Gemfile.lock

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ GEM
 eventmachine (>= 0.12.9)
 http_parser.rb (~> 0.6.0)
 eventmachine (1.2.7)
-ffi (1.13.0)
+ffi (1.13.1)
 forwardable-extended (2.6.0)
 http_parser.rb (0.6.0)
 i18n (0.9.5)

README.md

Lines changed: 10 additions & 6 deletions
@@ -5,14 +5,18 @@
 ## Serverless Web Archive Replay
 
 ReplayWeb.page provides a full web archive replay system running directly in the browser,
-available at: https://replayweb.page/
+available at: [https://replayweb.page/](https://replayweb.page)
 
-For full user docs, see: https://replayweb.page/docs
+For full user docs, see: [https://replayweb.page/docs](https://replayweb.page/docs)
 
-The ReplayWeb.page App can be downloaded from: https://replayweb.page/releases
+The ReplayWeb.page App can be downloaded from the [Releases](https://replayweb.page/releases) page.
 
+### Embedding Guide
 
-## Architecture / What's in this repo
+See the [Embedding Guide](https://replayweb.page/docs/embedding) for more info on embedding web archives in other sites.
+
+
+## What's in this repo
 
 ReplayWeb.page is a static web site / offline web app + Electron app.
 
@@ -59,12 +63,12 @@ For service workers to work, they must be served from either localhost or an HTT
 
 See the [user docs](https://replayweb.page/docs/) for additional info about using ReplayWeb.page
 
+
+
 ## LICENSE
 
 ReplayWeb.page is made available under the AGPLv3 License.
 
-[Embedding ReplayWeb.page](https://replayweb.page/docs/embedding) from published releases is encouraged.
-
 If you would like to use it under a different license or have a question, please reach out as that may be a possibility.
 
 

docs/exploring.md

Lines changed: 2 additions & 2 deletions
@@ -24,12 +24,12 @@ The archive view presents several tabs:
 - **Story** - This Story view presents lists of curated pages, as developed by the creator of the web archive.
 This option is only shown if there is a curated story. As curated lists are not a standard part of WARC, only WARCs exported from Webrecorder.io/Conifer can have this option.
 
-The new [Web Archive Collection (WACZ)](web-archive-collection-format) can also include curated lists.
+The new [Web Archive Collection (WACZ)](wacz-format) can also include curated lists.
 
 - **Pages** - The Pages view presents all pages in the web archive. As pages are not a standard part of WARC format,
 generally only WARCs from Webrecorder.io/Conifer will have pages.
 
-The new [Web Archive Collection (WACZ)](web-archive-collection-format) can also store pages.
+The new [Web Archive Collection (WACZ)](wacz-format) can also store pages.
 
 
 - **Page Resources** - This view allows searching the archive by URLs, as well as by common MIME type.

docs/formats.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ parent: Reference
 ReplayWeb.Page supports the archive formats listed below.
 Format is currently determined based on the file extension.
 
-The `.wacz` refers to the newly proposed [Web Archive Collection Zip Format](web-archive-collection).
+The `.wacz` refers to the newly proposed [Web Archive Collection Zip Format](wacz-format).
 
 
 | Format | Extensions | Status |

docs/loading.md

Lines changed: 2 additions & 2 deletions
@@ -44,7 +44,7 @@ To load a remote archive, simply enter the URL of the archive and click `Load`.
 {: .fs-3 .pad .bg-grey-lt-100}
 See [Supported Locations](locations) for details on where archives can be loaded from.
 
-The archive will be downloaded, either fully or [only as needed (if possible)](streaming-archives.md) and presented on the archive page.
+The archive will be downloaded, either fully or on-demand (if possible) and presented on the archive page.
 
 The system supports WARC files, as well as several other formats
 
@@ -74,7 +74,7 @@ Due to the nature of the WARC format, the entire file must be read on first use
 For WARC files **>25MB**, only the index is initially stored in the browser, and the actual content is loaded 'on-demand',
 when the content is first accessed. This leads to faster loading and saves memory when dealing with large archives.
 
-[Web Archive Collection (WACZ)](web-archive-collection-format) are always loaded on-demand, as no indexing is required.
+[Web Archive Collection (WACZ)](wacz-format) are always loaded on-demand, as no indexing is required.
 The initial archive view should load almost instantly as a result.
 
 If an archive could not be loaded, an error will be displayed instead of the progress.

docs/wacz-format.md

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+---
+layout: default
+title: 'Web Archive Collection Zipped (WACZ) Format'
+nav_order: 1
+permalink: /docs/wacz-format
+parent: Reference
+---
+
+## Web Archive Collection Format Specification
+
+ReplayWeb.page supports a new format for bundling raw web archive data (usually WARC files), indices,
+page lists and other metadata into a single ZIP file.
+
+The full spec for this format is available at: [https://github.com/webrecorder/web-archive-collection-format/blob/master/README.md](https://github.com/webrecorder/web-archive-collection-format/blob/master/README.md)
+
+Files bundled into this format can use the .wacz (web archive collection zipped) file extension.
+
+ReplayWeb.page will recognize this extension (as well as regular .zip) and will also load it from Google Drive when the
+[Google Drive Integration](https://gsuite.google.com/u/2/marketplace/app/replaywebpage/160798412227) is installed.
+
+The key benefit of this format is that large web archive collections can be loaded very quickly, to show the page list
+and other key metadata, by downloading only parts of the WACZ file, as [outlined here](https://github.com/webrecorder/web-archive-collection-format/blob/master/README.md#appendix-a-use-case-random-access-to-web-archives-in-zip)
+
+The actual raw content is loaded on-demand when the user requests each page.
+
+With a WARC file, the entire contents must be loaded or indexed to determine the contents of the web archive collection.
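
Note on the new docs/wacz-format.md: the "downloading only parts of the WACZ file" idea can be illustrated with a minimal Python sketch. This is not ReplayWeb.page's implementation, which runs in the browser. The member names `archive/data.warc` and `pages/pages.jsonl`, the `CountingReader` class, and both helper functions are hypothetical and chosen only for illustration. The sketch assumes an uncompressed (stored) ZIP held in memory, and counts bytes read to show that listing the archive and reading the small page list touches only a fraction of the file.

```python
import io
import zipfile


class CountingReader(io.BytesIO):
    """In-memory file that counts the bytes actually read from it."""

    def __init__(self, data: bytes) -> None:
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_demo_archive() -> bytes:
    """Build a ZIP with one large raw-data member and one small page list.

    The member names are illustrative assumptions, not taken from the spec.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("archive/data.warc", b"x" * 1_000_000)
        zf.writestr("pages/pages.jsonl", '{"url": "https://example.com/"}\n')
    return buf.getvalue()


def list_and_read_pages(data: bytes):
    """List members and read only the page list, tracking bytes read."""
    reader = CountingReader(data)
    with zipfile.ZipFile(reader) as zf:
        # The member list comes from the central directory at the end of the file.
        names = zf.namelist()
        # Reading one member seeks to its offset; other members are not read.
        pages = zf.read("pages/pages.jsonl").decode("utf-8")
    return names, pages, reader.bytes_read


if __name__ == "__main__":
    archive = build_demo_archive()
    names, pages, bytes_read = list_and_read_pages(archive)
    print(names)
    print(f"read {bytes_read} of {len(archive)} bytes")
```

Over a network, the same seek-and-read pattern would presumably map onto HTTP range requests, so the large raw-data member is fetched only when a page is actually requested.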
