JustAnotherArchivist | cbebafe588 | Colourise sha1sum output | 3 years ago
JustAnotherArchivist | 18a3305e79 | Fix handling of filenames with spaces and ampersands | 3 years ago
JustAnotherArchivist | 788b25707d | Handle more domains and case variations | 3 years ago
JustAnotherArchivist | 36aa2e8259 | Add archivebot-log-extract-ignores | 3 years ago
JustAnotherArchivist | 5b731fbde1 | Fix compatibility with wpull 2.x | 3 years ago
JustAnotherArchivist | 743e0582ba | Fix confusing error message when lxml is not installed | 3 years ago
JustAnotherArchivist | 491a80a04b | Add warc-tiny scrape command for parsing HTTP responses using wpull and extracting links | 3 years ago
JustAnotherArchivist | fd2728f1b8 | Add archivebot-irccloud-paste | 3 years ago
JustAnotherArchivist | 4eff3c3eb3 | Refactor, strip query/fragment | 3 years ago
JustAnotherArchivist | 821cacf626 | Add --help | 3 years ago
JustAnotherArchivist | caffebab2e | Add parent-urls | 3 years ago
JustAnotherArchivist | 77ec76bc04 | Add --urls and --nodl options | 3 years ago
JustAnotherArchivist | 06cf71f73d | Fix gofile.io download: getServer is not used by the website anymore, and getUpload no longer returns the MD5 | 3 years ago
JustAnotherArchivist | bff1490871 | Add github-list-repos | 3 years ago
JustAnotherArchivist | bf695d63a3 | Fix channel URLs | 3 years ago
JustAnotherArchivist | dde4464555 | Cover two more rare URLs | 3 years ago
JustAnotherArchivist | bbf2d2c315 | Be more lenient regarding slashes to catch things with collapsed URLs in paths etc. | 3 years ago
JustAnotherArchivist | 362f66eb26 | Handle youtube-nocookie.com and fix removenonyt mode not recognising CC domains | 3 years ago
JustAnotherArchivist | 81e2b4b999 | Refine patterns | 3 years ago
JustAnotherArchivist | 9974d4613c | Stop trying to rewrite patterns for percent encoding | 3 years ago
JustAnotherArchivist | 0ee83bc0f2 | Refactor | 3 years ago
JustAnotherArchivist | b66260ca94 | Add youtube-extract | 3 years ago
JustAnotherArchivist | d82dff8b71 | Add ETA column | 3 years ago
JustAnotherArchivist | 01274e461a | Prevent constantly moving bytes around for better performance on large chunked records | 3 years ago
JustAnotherArchivist | 77d9f61de0 | Colourise output | 3 years ago
JustAnotherArchivist | 6512669cfd | Refactor and compare file list as well | 3 years ago
JustAnotherArchivist | 8e0cb30d0a | Add atdash mode | 3 years ago
JustAnotherArchivist | 5fe595d71c | Record wrapper script in meta WARC as well | 3 years ago
JustAnotherArchivist | c1def0e7a8 | Fix S3_WITH_LIST_URLS being defined (but empty) when --with-list-urls is not used | 3 years ago
JustAnotherArchivist | 398cbfdcda | Add s3-bucket-list-qwarc, rewritten s3-bucket-list on top of qwarc | 3 years ago
JustAnotherArchivist | 80084e0d35 | Another alternative and performance/memory comparison | 3 years ago
JustAnotherArchivist | 6a288a6338 | Use grep instead, which is faster but uses more memory | 3 years ago
JustAnotherArchivist | 4d274e64e0 | Add dedupe | 3 years ago
JustAnotherArchivist | a4af8e6ca6 | Add IE6 UA | 3 years ago
JustAnotherArchivist | ac277437a3 | Add Googlebot UA | 3 years ago
JustAnotherArchivist | 0181e53f01 | Treat NXDOMAIN and no A/AAAA record errors as ok | 3 years ago
JustAnotherArchivist | 41c2a9d2d4 | Add support for alternative xmlns; used on Google's storage under https://storage.googleapis.com/bucket/ | 3 years ago
JustAnotherArchivist | 830e9dbc43 | Treat redirects as successful retrievals | 3 years ago
JustAnotherArchivist | 7a999c9b0a | Ignore redirects | 3 years ago
JustAnotherArchivist | 579d589853 | Add a script to extract errors from wpull 2.x logs | 3 years ago
JustAnotherArchivist | d60948e90f | Verbosity | 3 years ago
JustAnotherArchivist | a9a4792854 | Fix server validation | 3 years ago
JustAnotherArchivist | 57e2e26d80 | Support multi-file uploads | 3 years ago
JustAnotherArchivist | 02c967f608 | Add gofile.io download script | 3 years ago
JustAnotherArchivist | a83d28d08e | Add WARC/1.1 support | 3 years ago
JustAnotherArchivist | ba2f7db380 | Merge warc-peek repository into little-things | 3 years ago
JustAnotherArchivist | 79fc113467 | Merge kill-wpull-connections repository into little-things | 3 years ago
JustAnotherArchivist | b4bb9babac | Switch to HTTPS | 3 years ago
JustAnotherArchivist | 9f3c7b3ca8 | Support negative filter values for date columns as relative to the current datetime | 3 years ago
JustAnotherArchivist | c7151efc3e | Add script for checking whether a file on transfer.notkiska.pw was archived correctly with AB | 4 years ago