little-things

The little things give you away... A collection of various small helper stuff

JustAnotherArchivist f914b6afbe Also reset the status_code on requeueing		2 years ago
LICENSE	Initial commit	5 years ago
README.md	Initial commit	5 years ago
alphabetseq	Swap syntaxes	2 years ago
archivebot-blogspot	Fix HTTPS handling	5 years ago
archivebot-high-memory	Support python3 in any directory instead of just /usr/bin	4 years ago
archivebot-irccloud-paste	Add archivebot-irccloud-paste	3 years ago
archivebot-jobid-calculation	More snscrape helper tools	5 years ago
archivebot-jobs	Pass through datetime, math, re, and time to --pyfilter	3 years ago
archivebot-list-stuck-requests	Fix line endings	5 years ago
archivebot-log-extract-ignores	Add archivebot-log-extract-ignores	3 years ago
archivebot-monitor-job-queue	First set of little things	5 years ago
archivebot-youtube	Add helper for AB/chromebot-ing YouTube channels and users	5 years ago
azure-storage-list	Add --jsonl option	2 years ago
b64grep	Add b64grep	2 years ago
bing-scrape	Add Bing, Reddit/Pushshift, and FoolFuuka scrapers	5 years ago
bugzilla-url-list	Add Bugzilla URL list generator	2 years ago
combine-by-prefix	Add combine-by-prefix	2 years ago
curl-ua	Add IE6 UA	3 years ago
deb-repo-urls	Fix deb file URLs	4 years ago
dedupe	Another alternative and performance/memory comparison	3 years ago
europarl-meps-collect	Add script for scraping MEP links from europarl.europa.eu	5 years ago
foolfuuka-search	Better workaround for the 5000 results limit; works for FoolFuuka 2.0.1 and up	5 years ago
format-size	Split out size formatting	5 years ago
fos-ftp-upload	First set of little things	5 years ago
get-crx4chrome-urls	First set of little things	5 years ago
github-list-repos	Fix org repo listing on new design/site structure	2 years ago
gitlab-list-repos	Add support for other instances and full-instance listing	2 years ago
gofile.io-dl	Add support for password-protected folders	2 years ago
ia-cdx-search	Add ia-cdx-search	2 years ago
ia-derive	Add script to queue derive on IA	5 years ago
ia-files-xml-to-jsonl	Guarantee stable output order	3 years ago
ia-upload-progress	Proper script for tracking size of uploaded data	5 years ago
ia-verify-file	Add a timeout to prevent potentially indefinite blocking	2 years ago
ia-wait-item-tasks	Add ia-wait-item-tasks	2 years ago
iasha1check	Colourise sha1sum output	3 years ago
ix.io-upload	Allow overriding the "remote filename"	5 years ago
kill-wpull-connections	Merge kill-wpull-connections repository into little-things	3 years ago
killcx-all-https	First set of little things	5 years ago
mastodon-enumerate-users	Enumerate users on a Mastodon instance	5 years ago
mastodon-outdated	Finding outdated Mastodon instances	5 years ago
parent-urls	Refactor, strip query/fragment	3 years ago
pipelines-launch-in-tmux-windows	First set of little things	5 years ago
pipelines-monitor-tmux-wget-outcomes	Monitor how a pipeline's wget processes are faring	5 years ago
pipelines-stop-gracefully	First set of little things	5 years ago
reddit-pushshift-search	Add Bing, Reddit/Pushshift, and FoolFuuka scrapers	5 years ago
run-every-five-minutes	First set of little things	5 years ago
s3-bucket-list	Ignore TLS issues	3 years ago
s3-bucket-list-qwarc	Record wrapper script in meta WARC as well	3 years ago
snscrape-extract	Add support for Twitter hashtag extraction	5 years ago
snscrape-facebook-user	Silence by default	5 years ago
snscrape-instagram-user	Silence by default	5 years ago
snscrape-prepare-commands	Add support for Twitter hashtag extraction	5 years ago
snscrape-tmux	Update tmux session commands	5 years ago
snscrape-twitter-filter	Filter Twitter hashtag scrapes based on account scrapes	5 years ago
snscrape-twitter-hashtag	Extract external links from Twitter	5 years ago
snscrape-twitter-user	Extract external links from Twitter	5 years ago
snscrape-upload	Print Instagram ignore immediately after upload instead of at the end	5 years ago
snscrape-vk-user	Silence by default	5 years ago
snscrape-wiki-transfer-merge	Helper tools for snscrape and the wiki pages	5 years ago
social-media-extract-profile-link	Fix decoding of links on Facebook profiles	4 years ago
sum-sizes	Add sum-sizes	3 years ago
tar-many-files-progress	First set of little things	5 years ago
tcp-closer	Add tcp-closer command	5 years ago
transfer.archivete.am-upload	Handle HTTP/2 lowercase headers	3 years ago
transfer.notkiska.pw-check-ia	Switch to HTTPS	3 years ago
uniqify	Add uniqify	5 years ago
url-normalise	Normalise domain name to lower-case before further processing	4 years ago
warc-peek	Add WARC/1.1 support	3 years ago
warc-size	Split out size formatting	5 years ago
warc-tiny	Fix compatibility with wpull 2.x	3 years ago
website-extract-social-media	Add support for Facebook /pages/category/Category/Name-ID URLs	4 years ago
wget-spider-estimate-size	First set of little things	5 years ago
wiki-list-to-main	Add ArchiveBot wiki list helper	5 years ago
wiki-recursive-extract-normalise	Fix deduplication within each section processing	4 years ago
wiki-sections-sort	Add wiki-sections-sort	4 years ago
wiki-website-extract-social-media	Add script for automatic social media discovery	4 years ago
wpull1-parallel-progress-monitor	First set of little things	5 years ago
wpull1-progress-monitor	First set of little things	5 years ago
wpull2-extract-remaining	Clean up wpull DB commands	3 years ago
wpull2-log-extract-errors	Treat NXDOMAIN and no A/AAAA record errors as ok	3 years ago
wpull2-requeue	Also reset the status_code on requeueing	2 years ago
wpull2-url-origin	Clean up wpull DB commands	3 years ago
youtube-channel-list.py	Add YouTube channel listing script	2 years ago
youtube-extract	Handle ancient /?v= URLs	2 years ago
youtube-filter-autogen-channels	Add youtube-filter-autogen-channels	4 years ago
zstdwarccat	Fix piping when reads return less data than expected	2 years ago

README.md

Over the past few years, I’ve written and accumulated a number of useful little things to help with archival-related tasks. This repository collects them. I hope someone finds some of them useful.

License (applies to all programs in this repository)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.