JustAnotherArchivist
73f35f5591
Fix infinite loop when file ends with something that is not a WARC record
2 years ago
JustAnotherArchivist
06d60a798c
Bump read size
2 years ago
JustAnotherArchivist
3e0b70be6b
Handle processes with too many open connections
2 years ago
JustAnotherArchivist
df7b25c2db
Error on unknown options
2 years ago
JustAnotherArchivist
4bd4f5a30c
Fix 'Argument list too long' error when using --urls-from-stdin with many URLs
2 years ago
JustAnotherArchivist
e20d35a553
Fix crash on 429
2 years ago
JustAnotherArchivist
cef61434a0
Add --urls-from-stdin
2 years ago
JustAnotherArchivist
b5cf04947b
Add Wasabi
2 years ago
JustAnotherArchivist
d2afd1309d
Add s3-bucket-find-direct-url
2 years ago
JustAnotherArchivist
95988466ec
Make S3 response pattern matching more flexible (so it also works on Scaleway)
2 years ago
JustAnotherArchivist
a9a03d3a00
Add urlsort
2 years ago
JustAnotherArchivist
9798cc1188
Typo
2 years ago
JustAnotherArchivist
d193637e5e
Add kill-connections
2 years ago
JustAnotherArchivist
6cfe8e51ba
Make job a global variable in --pyfilter expressions so it can be used in genexps
2 years ago
JustAnotherArchivist
a4627fa1c6
Queue derives with `ia tasks` instead of this manual curl rubbish
2 years ago
JustAnotherArchivist
c68b310afc
Always print the parts value if there is an upload ID
Previously, parts wouldn't be printed if it was an empty list. This made resuming uploads that crashed in the first part harder than necessary.
2 years ago
JustAnotherArchivist
fdc3c3d69e
Support float values for --partsize with M or G suffix
2 years ago
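The suffix handling described in the commit above could be parsed along these lines. This is a minimal sketch, not the tool's actual code; `parse_partsize` is an invented helper name, and M/G are assumed to mean mebibytes and gibibytes:

```python
def parse_partsize(value: str) -> int:
    # Accept a plain byte count, or a float with an M (mebibyte)
    # or G (gibibyte) suffix, e.g. '100M' or '1.5G'.
    multipliers = {'M': 1024 ** 2, 'G': 1024 ** 3}
    suffix = value[-1].upper() if value else ''
    if suffix in multipliers:
        return int(float(value[:-1]) * multipliers[suffix])
    return int(value)
```

Going through `float` first is what allows values like '1.5G' while still returning an integral byte count.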
JustAnotherArchivist
002c1eb7ae
Wait until item exists
IA doesn't immediately create the item on CreateMultipartUpload, so if it didn't already exist, UploadPart would fail for a while and we'd waste bandwidth.
2 years ago
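The wait described above amounts to polling until the item appears before sending the first part. A minimal sketch of that pattern, with the existence check injected as a callable (in practice it would be an HTTP request against the item; `wait_for_item` and its parameters are invented names, not the tool's API):

```python
import time

def wait_for_item(item_exists, timeout: float = 300.0, interval: float = 5.0) -> None:
    # Poll `item_exists` (a zero-argument callable returning a bool) until it
    # succeeds, so the first UploadPart doesn't repeatedly fail and waste
    # bandwidth while IA is still creating the item.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if item_exists():
            return
        time.sleep(interval)
    raise TimeoutError('item did not appear in time')
```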
JustAnotherArchivist
142a5a9c49
Get rid of asyncio
No point in using it when it only delegates to a ThreadPoolExecutor anyway.
2 years ago
JustAnotherArchivist
b6663ae731
Add concurrency
2 years ago
JustAnotherArchivist
22f2e68356
Add JSONL output option for S3 listing
2 years ago
JustAnotherArchivist
bfebe9a2a5
Fix only sending partial file contents on retries
2 years ago
JustAnotherArchivist
39b3b7793a
Add support for IA_CONFIG_FILE environment variable
2 years ago
JustAnotherArchivist
7ed2906dd2
Add progress bar
2 years ago
JustAnotherArchivist
58f0f0f8d0
Fix being unable to resume an upload that crashed in the first part
2 years ago
JustAnotherArchivist
74485c399b
Require decompressed WARCs with warc-tiny
2 years ago
JustAnotherArchivist
e24790132e
Add at-tracker-sample-user-item-size
2 years ago
JustAnotherArchivist
a14939b069
Add base64url
2 years ago
JustAnotherArchivist
5c2ce7ec10
Add cdx-chunk
2 years ago
JustAnotherArchivist
fe0b020352
Add support for reading from stdin
2 years ago
JustAnotherArchivist
1010769c3c
Handle connection errors
2 years ago
JustAnotherArchivist
1acdc88c81
Add ia-upload-stream
2 years ago
JustAnotherArchivist
360c4d9371
Add youtube-extract-rapid
2 years ago
JustAnotherArchivist
d07b5a7d09
Remove debugging prints
2 years ago
JustAnotherArchivist
bf5e065a0f
Add URL/percent decoding tool
urldecode.c is entirely written by OrIdow6 except for one bug fix (char → uint8_t in the mallocs) and whitespace changes. The test suite is by JAA.
Co-authored-by: OrIdow6 <68304414+OrIdow6@users.noreply.github.com>
2 years ago
JustAnotherArchivist
11485d9404
Add infrastructure for simple C-based tools
2 years ago
JustAnotherArchivist
c50a8fd796
Fix 'Dictionary mismatch' error when very small dicts are used because the temporary file isn't written to disk before zstdcat gets executed
2 years ago
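The bug above is the classic buffered-write pitfall: a small payload can still sit in Python's stdio buffer when a child process opens the file. A hypothetical sketch of the fix (generic helper, invented name; with zstdcat the flushed file would be the dictionary passed via `-D`):

```python
import subprocess
import tempfile

def run_on_tempfile(cmd_prefix: list, data: bytes) -> bytes:
    # Write `data` to a temporary file and flush it *before* spawning the
    # child process; without the flush, very small payloads may not yet be
    # visible to the child when it opens the file.
    with tempfile.NamedTemporaryFile() as f:
        f.write(data)
        f.flush()  # ensure the child sees the full contents
        return subprocess.run(
            cmd_prefix + [f.name], capture_output=True, check=True
        ).stdout
```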
JustAnotherArchivist
5bc3d4b020
Fix crash on an empty response
This check was a leftover from the resumeKey pagination, where empty responses are supposed to be impossible. With the page pagination, they are possible.
2 years ago
JustAnotherArchivist
7f25c092d1
Catch other connection errors
2 years ago
JustAnotherArchivist
f8352809f3
Handle ConnectionResetError
2 years ago
JustAnotherArchivist
0b34268210
Catch socket.timeout, which is a separate exception class from TimeoutError before Python 3.10
2 years ago
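The distinction above matters because `socket.timeout` only became an alias of `TimeoutError` in Python 3.10. A version-agnostic handler catches both, which is harmless on newer versions (`TIMEOUT_EXCEPTIONS` and `call_with_timeout_guard` are illustrative names, not the tool's code):

```python
import socket

# On Python < 3.10, socket.timeout is a separate class; on 3.10+ it is an
# alias of TimeoutError, so listing both works everywhere.
TIMEOUT_EXCEPTIONS = (TimeoutError, socket.timeout)

def call_with_timeout_guard(fn):
    # Run `fn`, returning None on a timeout instead of propagating it.
    try:
        return fn()
    except TIMEOUT_EXCEPTIONS:
        return None
```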
JustAnotherArchivist
0f7a2b32a3
Log number of results on a page
2 years ago
JustAnotherArchivist
628aeb052f
Handle rate limiting
2 years ago
JustAnotherArchivist
d3ea3ce8a0
Switch from urllib to http.client to reuse connections
2 years ago
JustAnotherArchivist
8f7619ff3a
Add retries
2 years ago
JustAnotherArchivist
f98fdd5f01
Fix printing HTTP response line to stdout instead of stderr
2 years ago
JustAnotherArchivist
c9400ac46f
Fix recognition of command without optional parts
2 years ago
JustAnotherArchivist
5ca15a7c94
Add concurrency support
The proper way to do that (with asyncio) is of course aiohttp. A major drawback of the implemented approach is that running tasks can't be cancelled in case of an error. However, it works with just the standard library, and that advantage outweighs the awkward error handling for now.
2 years ago
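The standard-library-only approach described above is typically a `concurrent.futures.ThreadPoolExecutor`. A minimal sketch of the pattern and its trade-off (`fetch_all` is an invented name, not the tool's API): a task that raises surfaces its exception from `.result()`, but tasks already running at that point keep running, which is the awkward error handling the commit mentions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_all(urls, fetch, concurrency: int = 10) -> dict:
    # Submit each URL to a bounded thread pool and collect results as they
    # complete. No third-party dependency (unlike aiohttp), but in-flight
    # tasks cannot be cancelled if one of them fails.
    results = {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```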
JustAnotherArchivist
191948cf9d
Print number of modified records on requeueing
2 years ago
JustAnotherArchivist
5121524f83
Log retrieval of showNumPages
2 years ago