JustAnotherArchivist
4798154e98
Fix URLs without a path
1 year ago
JustAnotherArchivist
1830d67283
Add ia-cdx-search-subdomains
1 year ago
JustAnotherArchivist
565be7bf1b
Fix
2 years ago
JustAnotherArchivist
e2085e6c81
Add cloudflare-email-decode
2 years ago
JustAnotherArchivist
73f35f5591
Fix infinite loop when file ends with something that is not a WARC record
2 years ago
JustAnotherArchivist
06d60a798c
Bump read size
2 years ago
JustAnotherArchivist
3e0b70be6b
Handle processes with too many open connections
2 years ago
JustAnotherArchivist
df7b25c2db
Error on unknown options
2 years ago
JustAnotherArchivist
4bd4f5a30c
Fix 'Argument list too long' error when using --urls-from-stdin with many URLs
2 years ago
JustAnotherArchivist
e20d35a553
Fix crash on 429
2 years ago
JustAnotherArchivist
cef61434a0
Add --urls-from-stdin
2 years ago
JustAnotherArchivist
b5cf04947b
Add Wasabi
2 years ago
JustAnotherArchivist
d2afd1309d
Add s3-bucket-find-direct-url
2 years ago
JustAnotherArchivist
95988466ec
Make S3 response pattern matching more flexible (so it also works on Scaleway)
2 years ago
JustAnotherArchivist
a9a03d3a00
Add urlsort
2 years ago
JustAnotherArchivist
9798cc1188
Typo
2 years ago
JustAnotherArchivist
d193637e5e
Add kill-connections
2 years ago
JustAnotherArchivist
6cfe8e51ba
Make job a global variable in --pyfilter expressions so it can be used in genexps
2 years ago
JustAnotherArchivist
a4627fa1c6
Queue derives with `ia tasks` instead of this manual curl rubbish
2 years ago
JustAnotherArchivist
c68b310afc
Always print the parts value if there is an upload ID
Previously, parts wouldn't be printed if it was an empty list. This made resuming uploads that crashed in the first part harder than necessary.
2 years ago
JustAnotherArchivist
fdc3c3d69e
Support float values for --partsize with M or G suffix
2 years ago
JustAnotherArchivist
002c1eb7ae
Wait until item exists
IA doesn't immediately create the item on CreateMultipartUpload, so if it didn't already exist, UploadPart would fail for a while and we'd waste bandwidth.
2 years ago
JustAnotherArchivist
142a5a9c49
Get rid of asyncio
No point in using it when it only delegates to a ThreadPoolExecutor anyway.
2 years ago
JustAnotherArchivist
b6663ae731
Add concurrency
2 years ago
JustAnotherArchivist
22f2e68356
Add JSONL output option for S3 listing
2 years ago
JustAnotherArchivist
bfebe9a2a5
Fix only sending partial file contents on retries
2 years ago
JustAnotherArchivist
39b3b7793a
Add support for IA_CONFIG_FILE environment variable
2 years ago
JustAnotherArchivist
7ed2906dd2
Add progress bar
2 years ago
JustAnotherArchivist
58f0f0f8d0
Fix being unable to resume an upload that crashed in the first part
2 years ago
JustAnotherArchivist
74485c399b
Require decompressed WARCs with warc-tiny
2 years ago
JustAnotherArchivist
e24790132e
Add at-tracker-sample-user-item-size
2 years ago
JustAnotherArchivist
a14939b069
Add base64url
2 years ago
JustAnotherArchivist
5c2ce7ec10
Add cdx-chunk
2 years ago
JustAnotherArchivist
fe0b020352
Add support for reading from stdin
2 years ago
JustAnotherArchivist
1010769c3c
Handle connection errors
2 years ago
JustAnotherArchivist
1acdc88c81
Add ia-upload-stream
2 years ago
JustAnotherArchivist
360c4d9371
Add youtube-extract-rapid
2 years ago
JustAnotherArchivist
d07b5a7d09
Remove debugging prints
2 years ago
JustAnotherArchivist
bf5e065a0f
Add URL/percent decoding tool
urldecode.c is entirely written by OrIdow6 except for one bug fix (char → uint8_t in the mallocs) and whitespace changes. The test suite is by JAA.
Co-authored-by: OrIdow6 <68304414+OrIdow6@users.noreply.github.com>
2 years ago
JustAnotherArchivist
11485d9404
Add infrastructure for simple C-based tools
2 years ago
JustAnotherArchivist
c50a8fd796
Fix 'Dictionary mismatch' error when very small dicts are used because the temporary file isn't written to disk before zstdcat gets executed
2 years ago
JustAnotherArchivist
5bc3d4b020
Fix crash on an empty response
This check was a leftover from the resumeKey pagination, where empty responses are supposed to be impossible. With the page pagination, they are possible.
2 years ago
JustAnotherArchivist
7f25c092d1
Catch other connection errors
2 years ago
JustAnotherArchivist
f8352809f3
Handle ConnectionResetError
2 years ago
JustAnotherArchivist
0b34268210
Catch socket.timeout, which is a separate exception class from TimeoutError before Python 3.10
2 years ago
JustAnotherArchivist
0f7a2b32a3
Log number of results on a page
2 years ago
JustAnotherArchivist
628aeb052f
Handle rate limiting
2 years ago
JustAnotherArchivist
d3ea3ce8a0
Switch from urllib to http.client to reuse connections
2 years ago
JustAnotherArchivist
8f7619ff3a
Add retries
2 years ago
JustAnotherArchivist
f98fdd5f01
Fix printing HTTP response line to stdout instead of stderr
2 years ago
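
Note on commit 0b34268210 in the list above: before Python 3.10, `socket.timeout` was a subclass of `OSError` unrelated to the built-in `TimeoutError`, so a handler catching only `TimeoutError` would miss it. From 3.10 onward it is an alias of `TimeoutError`. The following is a minimal sketch of a version-independent check, not the repository's actual code; the helper name `is_timeout` is hypothetical.

```python
import socket


def is_timeout(exc):
    """Return True if exc is a timeout on any supported Python version.

    Hypothetical helper for illustration only. Listing both classes is
    redundant but harmless on Python 3.10+, where socket.timeout is an
    alias of TimeoutError.
    """
    return isinstance(exc, (TimeoutError, socket.timeout))
```

In an `except` clause the same idea is written as `except (TimeoutError, socket.timeout):`.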