JustAnotherArchivist
cef61434a0
Add --urls-from-stdin
2 years ago
JustAnotherArchivist
b5cf04947b
Add Wasabi
2 years ago
JustAnotherArchivist
d2afd1309d
Add s3-bucket-find-direct-url
2 years ago
JustAnotherArchivist
95988466ec
Make S3 response pattern matching more flexible (so it also works on Scaleway)
2 years ago
JustAnotherArchivist
a9a03d3a00
Add urlsort
2 years ago
JustAnotherArchivist
9798cc1188
Typo
2 years ago
JustAnotherArchivist
d193637e5e
Add kill-connections
2 years ago
JustAnotherArchivist
6cfe8e51ba
Make job a global variable in --pyfilter expressions so it can be used in genexps
2 years ago
JustAnotherArchivist
a4627fa1c6
Queue derives with `ia tasks` instead of this manual curl rubbish
2 years ago
JustAnotherArchivist
c68b310afc
Always print the parts value if there is an upload ID
Previously, parts wouldn't be printed if it was an empty list. This made resuming uploads that crashed in the first part harder than necessary.
2 years ago
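A minimal sketch of the kind of truthiness pitfall this commit message describes, assuming the resume information is built by a helper like the one below. The name `format_resume_info` and the output format are hypothetical and not taken from the tool itself.

```python
import json


def format_resume_info(upload_id, parts):
    """Return the text to print so that an interrupted upload can be resumed."""
    if upload_id is None:
        return ""
    # Testing `if parts:` here would hide an empty list, which is exactly
    # the state after a crash during the first part.
    return f"upload ID: {upload_id}\nparts: {json.dumps(parts)}"
```

Keying the output on the upload ID rather than on the truthiness of `parts` means an empty list is still shown.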
JustAnotherArchivist
fdc3c3d69e
Support float values for --partsize with M or G suffix
2 years ago
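One way such a `--partsize` value could be parsed is sketched below. The function name `parse_part_size` and the use of binary multiples (1 M = 1024² bytes) are assumptions for illustration; the actual tool may use different units or validation.

```python
import re

# Assumption: suffixes denote binary multiples; the real tool may differ.
_MULTIPLIERS = {"": 1, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_part_size(value: str) -> int:
    """Convert strings such as '100M' or '1.5G' to a whole number of bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([MG]?)", value)
    if match is None:
        raise ValueError(f"invalid part size: {value!r}")
    number, suffix = match.groups()
    if suffix == "" and "." in number:
        # A fractional number of bytes makes no sense without a unit.
        raise ValueError("fractional sizes need an M or G suffix")
    return int(float(number) * _MULTIPLIERS[suffix])
```

With these assumptions, `parse_part_size("1.5M")` yields 1572864 bytes, and a bare `"1.5"` is rejected.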
JustAnotherArchivist
002c1eb7ae
Wait until item exists
IA doesn't immediately create the item on CreateMultipartUpload, so if it didn't already exist, UploadPart would fail for a while and we'd waste bandwidth.
2 years ago
JustAnotherArchivist
142a5a9c49
Get rid of asyncio
No point in using it when it only delegates to a ThreadPoolExecutor anyway.
2 years ago
JustAnotherArchivist
b6663ae731
Add concurrency
2 years ago
JustAnotherArchivist
22f2e68356
Add JSONL output option for S3 listing
2 years ago
JustAnotherArchivist
bfebe9a2a5
Fix only sending partial file contents on retries
2 years ago
JustAnotherArchivist
39b3b7793a
Add support for IA_CONFIG_FILE environment variable
2 years ago
JustAnotherArchivist
7ed2906dd2
Add progress bar
2 years ago
JustAnotherArchivist
58f0f0f8d0
Fix being unable to resume an upload that crashed in the first part
2 years ago
JustAnotherArchivist
74485c399b
Require decompressed WARCs with warc-tiny
2 years ago
JustAnotherArchivist
e24790132e
Add at-tracker-sample-user-item-size
2 years ago
JustAnotherArchivist
a14939b069
Add base64url
2 years ago
JustAnotherArchivist
5c2ce7ec10
Add cdx-chunk
2 years ago
JustAnotherArchivist
fe0b020352
Add support for reading from stdin
2 years ago
JustAnotherArchivist
1010769c3c
Handle connection errors
2 years ago
JustAnotherArchivist
1acdc88c81
Add ia-upload-stream
2 years ago
JustAnotherArchivist
360c4d9371
Add youtube-extract-rapid
2 years ago
JustAnotherArchivist
d07b5a7d09
Remove debugging prints
2 years ago
JustAnotherArchivist
bf5e065a0f
Add URL/percent decoding tool
urldecode.c is entirely written by OrIdow6 except for one bug fix (char → uint8_t in the mallocs) and whitespace changes. The test suite is by JAA.
Co-authored-by: OrIdow6 <68304414+OrIdow6@users.noreply.github.com>
2 years ago
JustAnotherArchivist
11485d9404
Add infrastructure for simple C-based tools
2 years ago
JustAnotherArchivist
c50a8fd796
Fix 'Dictionary mismatch' error when very small dicts are used because the temporary file isn't written to disk before zstdcat gets executed
2 years ago
JustAnotherArchivist
5bc3d4b020
Fix crash on an empty response
This check was a leftover from the resumeKey pagination, where empty responses are supposed to be impossible. With the page pagination, they are possible.
2 years ago
JustAnotherArchivist
7f25c092d1
Catch other connection errors
2 years ago
JustAnotherArchivist
f8352809f3
Handle ConnectionResetError
2 years ago
JustAnotherArchivist
0b34268210
Catch socket.timeout, which is a separate exception class from TimeoutError before Python 3.10
2 years ago
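The version difference mentioned in this commit can be handled by naming both exception classes. The helper `is_timeout` below is a hypothetical illustration, not code from the repository.

```python
import socket


def is_timeout(exc: BaseException) -> bool:
    """Report whether an exception is a timeout on any Python 3 version."""
    # On Python 3.10+, socket.timeout is an alias of TimeoutError, so listing
    # both is redundant but harmless. On older versions they are distinct.
    return isinstance(exc, (TimeoutError, socket.timeout))
```

The same tuple works directly in an `except (TimeoutError, socket.timeout):` clause around a network call.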
JustAnotherArchivist
0f7a2b32a3
Log number of results on a page
2 years ago
JustAnotherArchivist
628aeb052f
Handle rate limiting
2 years ago
JustAnotherArchivist
d3ea3ce8a0
Switch from urllib to http.client to reuse connections
2 years ago
JustAnotherArchivist
8f7619ff3a
Add retries
2 years ago
JustAnotherArchivist
f98fdd5f01
Fix printing HTTP response line to stdout instead of stderr
2 years ago
JustAnotherArchivist
c9400ac46f
Fix recognition of command without optional parts
2 years ago
JustAnotherArchivist
5ca15a7c94
Add concurrency support
The proper way to do that (with asyncio) is of course aiohttp. A major drawback of the implemented approach is that running tasks can't be cancelled in case of an error. However, it works with just the standard library, and that advantage outweighs the awkward error handling for now.
2 years ago
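A minimal sketch of standard-library-only concurrency in the spirit of this commit, assuming blocking requests are handed to a thread pool from asyncio. `fetch_page` is a hypothetical stand-in for a blocking HTTP request, and `fetch_all` is likewise an illustrative name.

```python
import asyncio
import concurrent.futures
import time


def fetch_page(page: int) -> str:
    """Stand-in for a blocking HTTP request (hypothetical helper)."""
    time.sleep(0.01)
    return f"page {page}"


async def fetch_all(pages, concurrency: int = 4):
    loop = asyncio.get_running_loop()
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [loop.run_in_executor(pool, fetch_page, p) for p in pages]
        # gather() preserves input order. If one call raises, the others that
        # are already running in threads keep going; they cannot be cancelled.
        return await asyncio.gather(*futures)


results = asyncio.run(fetch_all(range(5)))
```

The comment in `fetch_all` marks the drawback named in the commit message: a thread that has started its work runs to completion even after another task has failed.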
JustAnotherArchivist
191948cf9d
Print number of modified records on requeueing
2 years ago
JustAnotherArchivist
5121524f83
Log retrieval of showNumPages
2 years ago
JustAnotherArchivist
aba7a1b0b8
Replace resumeKey pagination with page number pagination
resumeKey pagination is horribly broken. It may return incomplete results or loop infinitely.
2 years ago
JustAnotherArchivist
d57324a26c
Add --where for arbitrary conditions
2 years ago
JustAnotherArchivist
fed64387bd
Invert count/write logic
Previously, write was the actual default action, and in some forms of the command, the action value isn't actually checked against the possible values, so on a typo, it would write instead of count.
2 years ago
JustAnotherArchivist
f914b6afbe
Also reset the status_code on requeueing
2 years ago
JustAnotherArchivist
303bb69c37
Add ia-cdx-search
2 years ago
JustAnotherArchivist
0b45f7b2ba
Swap syntaxes
2 years ago