
Commit b9a4089

committed
license / docs pass:
- add new docs for loading, exploring, sharing, embedding, and offline usage
- add license file / license banner

electron app:
- for now, don't bundle docs, open in native browser

embeds:
- support 'embed' option, 'replayonly' or 'full' (default if no url)

ui:
- display package.json version in info menu

package.json:
- update scripts to match README
- move electron entrypoint into extraMetadata
- add scripts for packaging as module

README: minor edits
1 parent 948b232 commit b9a4089

24 files changed

+556
-126
lines changed

Gemfile.lock

Lines changed: 3 additions & 3 deletions
@@ -8,12 +8,12 @@ GEM
88
eventmachine (>= 0.12.9)
99
http_parser.rb (~> 0.6.0)
1010
eventmachine (1.2.7)
11-
ffi (1.12.2)
11+
ffi (1.13.0)
1212
forwardable-extended (2.6.0)
1313
http_parser.rb (0.6.0)
1414
i18n (0.9.5)
1515
concurrent-ruby (~> 1.0)
16-
jekyll (3.8.6)
16+
jekyll (3.8.7)
1717
addressable (~> 2.4)
1818
colorator (~> 1.0)
1919
em-websocket (~> 0.5)
@@ -53,7 +53,7 @@ GEM
5353
rb-fsevent (0.10.4)
5454
rb-inotify (0.10.1)
5555
ffi (~> 1.0)
56-
rouge (3.17.0)
56+
rouge (3.19.0)
5757
rubyzip (2.3.0)
5858
safe_yaml (1.0.5)
5959
sass (3.7.4)

LICENSE

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
ReplayWeb.page
2+
Copyright (C) 2020 Webrecorder Software
3+
4+
This program is free software: you can redistribute it and/or modify
5+
it under the terms of the GNU Affero General Public License as published by
6+
the Free Software Foundation, either version 3 of the License, or
7+
(at your option) any later version.
8+
9+
This program is distributed in the hope that it will be useful,
10+
but WITHOUT ANY WARRANTY; without even the implied warranty of
11+
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
12+
GNU Affero General Public License for more details.
13+
14+
You should have received a copy of the GNU Affero General Public License
15+
along with this program. If not, see <http://www.gnu.org/licenses/>.

README.md

Lines changed: 7 additions & 2 deletions
@@ -23,11 +23,12 @@ The frontend is loaded from `ui.js`, while the backend service/web worker is loa
2323

2424
This repository contains:
2525
- The built assets for the site hosted at https://replayweb.page/ via GitHub Pages
26+
- The package for npm module: https://www.npmjs.com/package/replaywebpage
2627
- A build system for https://replayweb.page and ReplayWeb.page App.
2728
- Docs hosted at: https://replayweb.page/docs
2829
- App releases at: https://github.com/webrecorder/replayweb.page/releases
2930

30-
## How to Use
31+
## How to Use This Repo
3132

3233
ReplayWeb.page is built as a Node package and can be installed using yarn:
3334

@@ -56,11 +57,15 @@ The static assets are placed in the root `index.html`, `sw.js` and `ui.js`, and
5657

5758
For service workers to work, they must be served from either localhost or an HTTPS endpoint.
5859

60+
See the [user docs](https://replayweb.page/docs/) for additional info about using ReplayWeb.page
61+
5962
## LICENSE
6063

6164
ReplayWeb.page is made available under the AGPLv3 License.
6265

63-
If you would like to use it under a different license, please reach out as that may be a possibility.
66+
[Embedding ReplayWeb.page](https://replayweb.page/docs/embedding) from published releases is encouraged.
67+
68+
If you would like to use it under a different license or have a question, please reach out as that may be a possibility.
6469

6570

6671
## Contributing and Bug Reports

_sass/custom/custom.scss

Lines changed: 10 additions & 0 deletions
@@ -35,3 +35,13 @@ $nav-width-md: $nav-width;
3535
.rwp-blue {
3636
color: $rwp-blue;
3737
}
38+
39+
.pad {
40+
padding: 0.5rem;
41+
}
42+
43+
.cap-header {
44+
padding: 0 0 0 10px;
45+
margin: 0;
46+
}
47+

docs/browsing.md

Lines changed: 0 additions & 79 deletions
This file was deleted.

docs/embedding.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
1+
---
2+
layout: default
3+
title: Embedding ReplayWeb.page
4+
nav_order: 2
5+
permalink: /docs/embedding
6+
---
7+
8+
## Embedding Web Archives with ReplayWeb.page
9+
10+
A key goal of ReplayWeb.page is to make embedding web archives into other sites as easy as possible,
11+
as easy as it is to embed PDFs, for example.
12+
13+
ReplayWeb.page provides the `<replay-web-page>` HTML tag to support embedding.
14+
15+
Embedding requires just two parts: loading the frontend UI and loading the backend service worker.
16+
17+
For example, to embed a web archive stored on s3 at `s3://webrecorder-builds/warcs/netpreserve-twitter.warc`
18+
(shorthand for: `https://webrecorder-builds.s3.amazonaws.com/warcs/netpreserve-twitter.warc`), you can first add
19+
the following snippet to your HTML page:
20+
21+
22+
{: .bg-blue-000 .text-grey-lt-000 .cap-header}
23+
my-web-archive-embed.html
24+
25+
```html
26+
<script src="https://unpkg.com/replaywebpage@1.0.0/ui.js"></script>
27+
<replay-web-page source="s3://webrecorder-builds/warcs/netpreserve-twitter.warc"
28+
url="https://twitter.com/netpreserve"></replay-web-page>
29+
```
30+
31+
This will load the frontend UI.
32+
33+
Since ReplayWeb.page requires a service worker, it is necessary to add a service worker path
34+
from where the web archive will be served. Create a subdirectory (eg. `replay/`) and place the following
35+
one-line script in it:
36+
37+
38+
{: .bg-blue-000 .text-grey-lt-000 .cap-header}
39+
./replay/sw.js
40+
41+
```javascript
42+
importScripts("https://unpkg.com/replaywebpage@1.0.0/sw.js");
43+
```
44+
45+
Thus, if the HTML snippet was added to `https://my-site.example.com/path/my-web-archive-embed.html`
46+
then the sw.js should be added such that it is at: `https://my-site.example.com/path/replay/sw.js`.
47+
48+
That's it! Loading `https://my-site.example.com/path/my-web-archive-embed.html` should now load the web archive.
49+
50+
(Be sure to add sizes to the `<replay-web-page>` tag as needed to size the embed).
51+
52+
You can replace `s3://webrecorder-builds/warcs/netpreserve-twitter.warc` with any web archive hosted on your site,
53+
eg. `https://my-site.example.com/warcs/my-warc-file.warc`.
54+
55+
{: .fs-3 .pad .bg-grey-lt-100}
56+
If the file is loaded from a different origin, your site must have CORS access to download the web archive.
57+
58+
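The placement rule described above (the `sw.js` file sits in a `replay/` subdirectory next to the embedding page) can be checked with a small sketch. The helper name `serviceWorkerUrl` and its `replayDir` parameter are hypothetical and not part of ReplayWeb.page; the sketch only uses the standard `URL` class to resolve the relative path.

```javascript
// Hypothetical helper (not a ReplayWeb.page API): given the URL of the page
// that contains the <replay-web-page> tag, compute where sw.js is expected.
// Assumes the subdirectory is named "replay/", as in the example above.
function serviceWorkerUrl(embedPageUrl, replayDir = "replay/") {
  // Resolve "replay/sw.js" relative to the embedding page's URL.
  return new URL(replayDir + "sw.js", embedPageUrl).href;
}

console.log(
  serviceWorkerUrl("https://my-site.example.com/path/my-web-archive-embed.html")
);
```

For the example page used in the text, this resolves to `https://my-site.example.com/path/replay/sw.js`.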
59+
### Versioning
60+
61+
Note that the above example uses the following paths:
62+
63+
- `https://unpkg.com/replaywebpage@1.0.0/ui.js`
64+
- `https://unpkg.com/replaywebpage@1.0.0/sw.js`
65+
66+
Another alternative would be:
67+
68+
- `https://cdn.jsdelivr.net/npm/replaywebpage@1.0.0/ui.js`
69+
- `https://cdn.jsdelivr.net/npm/replaywebpage@1.0.0/sw.js`
70+
71+
These URLs point to a specific version of ReplayWeb.page software released on NPM, eg. `1.0.0`, meaning that your replay should stay stable, even if ReplayWeb.page is updated.
72+
73+
You can choose another version of ReplayWeb.page (or even try different versions) to ensure that you have the best available replay.
74+
75+
This addresses the potential issue of older sites breaking when web archive replay software is updated.
76+
77+
For production use, avoid linking to the latest version, eg. `https://replayweb.page/ui.js`
78+
and `https://replayweb.page/sw.js` as these will be updated frequently and may break your embed.
79+
80+
81+
82+
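As an illustration of the version pinning discussed in the Versioning section, here is a minimal sketch that builds the pinned `ui.js` and `sw.js` URLs for either CDN. The function name `cdnUrls` and the strict `x.y.z` check are assumptions made for this example; the package name `replaywebpage` is taken from the npm link in the README and `1.0.0` from the example version in the text.

```javascript
// Hypothetical helper (not part of ReplayWeb.page): build version-pinned
// CDN URLs for the frontend (ui.js) and service worker (sw.js).
const CDN_BASES = new Map([
  ["unpkg", "https://unpkg.com/"],
  ["jsdelivr", "https://cdn.jsdelivr.net/npm/"],
]);

function cdnUrls(version, cdn = "unpkg") {
  const base = CDN_BASES.get(cdn);
  if (!base) {
    throw new Error("unknown CDN: " + cdn);
  }
  // Assumption for this sketch: only exact x.y.z versions count as pinned,
  // so floating values such as "latest" are rejected.
  if (!/^\d+\.\d+\.\d+$/.test(version)) {
    throw new Error("expected an exact version such as 1.0.0, got: " + version);
  }
  const root = `${base}replaywebpage@${version}`;
  return { ui: `${root}/ui.js`, sw: `${root}/sw.js` };
}

console.log(cdnUrls("1.0.0"));
console.log(cdnUrls("1.0.0", "jsdelivr"));
```

Using the same version for both files keeps the frontend and the service worker in step, which is the stability property the section describes.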

docs/exploring.md

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
1+
---
2+
layout: default
3+
title: Exploring Web Archives
4+
nav_order: 2
5+
parent: Usage
6+
permalink: /docs/exploring
7+
---
8+
9+
## Exploring Archives (Browse, Search and Replay)
10+
11+
12+
The ReplayWeb.page homepage lists an index of all loaded archives. You can also search by archive title or source
13+
and filter by date loaded or title.
14+
15+
Once loaded, the archive remains cached in the browser for quick access. This view will be unique to your browser.
16+
To remove the archive from your browser, click on the 'X'. (Of course, this does not delete the archive from its original location, only your local copy of it.)
17+
18+
### Archive Views
19+
20+
Although primarily designed for replay, ReplayWeb.page also offers several ways to interact with web archives.
21+
22+
The archive view presents several tabs:
23+
24+
- **Story** - The Story view presents lists of curated pages, as developed by the creator of the web archive.
25+
This option is only shown if there is a curated story. As curated lists are not a standard part of WARC, only WARCs exported from Webrecorder.io/Conifer can have this option.
26+
27+
The new [Web Archive Collection (WACZ)](web-archive-collection-format) can also include curated lists.
28+
29+
- **Pages** - The Pages view presents all pages in the web archive. As pages are not a standard part of WARC format,
30+
generally only WARCs from Webrecorder.io/Conifer will have pages.
31+
32+
The new [Web Archive Collection (WACZ)](web-archive-collection-format) can also store pages.
33+
34+
35+
- **Page Resources** - This view allows searching the archive by URLs, as well as by common MIME type.
36+
37+
For many archives with no page or curatorial metadata, this is a way to explore the archive data in more detail.
38+
39+
This view is available for all archives, including those that only store raw data.
40+
41+
- **Replay** - This view presents a replay of the archived web content in a mini-browser directly in your browser. The view allows entering a URL directly. Clicking on links in any of the other views will switch to the **Replay** view.
42+
43+
### Search
44+
45+
Both the Pages and Page Resources views provide ways to search the archive directly.
46+
47+
48+
#### Searching Page Resources
49+
Page Resources allows searching by URL only, with additional sorting options.
50+
Searches can be done by exact URL, by URL prefix, or by any string contained in the URL.
51+
52+
The URL Prefix option is best for searching large archives that require on-demand loading.
53+
The contains option will not find any URLs that have not yet been loaded.
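The three search modes described above can be sketched over an in-memory list of URLs. This is only an illustration of the matching semantics, not ReplayWeb.page's implementation: the function name `searchUrls`, the mode names, and the sample URLs are all hypothetical. Like the contains option described in the text, the sketch can only match URLs that are already in the list it is given.

```javascript
// Hypothetical illustration of exact / prefix / contains URL matching.
// `urls` stands in for the URLs that have already been loaded.
function searchUrls(urls, query, mode) {
  switch (mode) {
    case "exact":
      return urls.filter((url) => url === query);
    case "prefix":
      return urls.filter((url) => url.startsWith(query));
    case "contains":
      return urls.filter((url) => url.includes(query));
    default:
      throw new Error("unknown search mode: " + mode);
  }
}

// Sample data invented for the example.
const loadedUrls = [
  "https://example.com/",
  "https://example.com/about",
  "https://cdn.example.com/static/app.js",
];

console.log(searchUrls(loadedUrls, "https://example.com/", "exact"));
console.log(searchUrls(loadedUrls, "https://example.com/", "prefix"));
console.log(searchUrls(loadedUrls, "static", "contains"));
```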
54+
55+
56+
#### Searching Pages
57+
58+
The Pages view search covers page titles, URLs and, if available, the full text of pages.
59+
60+
ReplayWeb.page will currently generate full-text search data from WARC pages automatically.
61+
62+
ReplayWeb.page will soon load existing extracted full-text data as well.
63+
64+
65+
<hr>
66+
Next: [Sharing Links to Archived Pages](sharing.md)
Lines changed: 7 additions & 5 deletions
@@ -1,26 +1,28 @@
11
---
22
layout: default
33
title: 'Supported Formats'
4-
nav_order: 2
4+
nav_order: 3
55
description: 'Supported Formats'
66
permalink: /docs/formats
77
parent: Reference
88
---
99

1010
## Supported Formats
1111

12-
ReplayWeb.Page supports the following archive formats.
13-
Format is currently determined based on the extension, though more extensive detection may be added later.
12+
ReplayWeb.Page supports the archive formats listed below.
13+
Format is currently determined based on the file extension.
14+
15+
The `.wacz` extension refers to the newly proposed [Web Archive Collection Zip Format](web-archive-collection).
1416

1517

1618
| Format | Extensions | Status |
1719
|:--------|:--------------------|:--------------|
1820
| WARC | `.warc`, `.warc.gz` | <span class="d-inline-block p-2 mr-1 v-align-middle bg-green-000"> Supported |
19-
| HAR | `.har` | <span class="d-inline-block p-2 mr-1 v-align-middle bg-green-000"> Supported |
21+
| HAR | `.har` | <span class="d-inline-block p-2 mr-1 v-align-middle bg-yellow-000"> In Progress |
2022
| WBN | `.wbn` | <span class="d-inline-block p-2 mr-1 v-align-middle bg-yellow-000"> Experimental |
2123
| ARC | `.arc` | <span class="d-inline-block p-2 mr-1 v-align-middle bg-red-000"> Not Supported |
2224
| CDX | `.cdx`, `.cdxj` | <span class="d-inline-block p-2 mr-1 v-align-middle bg-green-000"> Supported |
23-
| ZIP | `.zip`, `.waz` | <span class="d-inline-block p-2 mr-1 v-align-middle bg-yellow-000"> In Progress |
25+
| WACZ | `.wacz` | <span class="d-inline-block p-2 mr-1 v-align-middle bg-yellow-000"> In Progress |
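Since the format is determined from the file extension, the table (in its updated form) can be read as a simple lookup. The sketch below is an illustration only: the `FORMATS` constant and `detectFormat` function are hypothetical names, and matching extensions case-insensitively is an assumption of this example rather than documented behavior.

```javascript
// Hypothetical lookup mirroring the updated table rows (not ReplayWeb.page code).
const FORMATS = [
  { format: "WARC", extensions: [".warc", ".warc.gz"], status: "Supported" },
  { format: "HAR", extensions: [".har"], status: "In Progress" },
  { format: "WBN", extensions: [".wbn"], status: "Experimental" },
  { format: "ARC", extensions: [".arc"], status: "Not Supported" },
  { format: "CDX", extensions: [".cdx", ".cdxj"], status: "Supported" },
  { format: "WACZ", extensions: [".wacz"], status: "In Progress" },
];

// Returns the matching table entry, or null if the extension is not listed.
function detectFormat(filename) {
  // Assumption: extensions are compared case-insensitively.
  const lower = filename.toLowerCase();
  for (const entry of FORMATS) {
    if (entry.extensions.some((ext) => lower.endsWith(ext))) {
      return entry;
    }
  }
  return null;
}

console.log(detectFormat("netpreserve-twitter.warc"));
console.log(detectFormat("collection.wacz"));
console.log(detectFormat("notes.txt"));
```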
2426

2527

2628

docs/linking.md

Lines changed: 0 additions & 8 deletions
This file was deleted.
