The little things give you away... A collection of various small helper stuff
JustAnotherArchivist c50a8fd796 Fix 'Dictionary mismatch' error when very small dicts are used because the temporary file isn't written to disk before zstdcat gets executed 2 years ago
LICENSE Initial commit 5 years ago
README.md Initial commit 5 years ago
alphabetseq Swap syntaxes 2 years ago
archivebot-blogspot Fix HTTPS handling 5 years ago
archivebot-high-memory Support python3 in any directory instead of just /usr/bin 4 years ago
archivebot-irccloud-paste Add archivebot-irccloud-paste 3 years ago
archivebot-jobid-calculation More snscrape helper tools 5 years ago
archivebot-jobs Pass through datetime, math, re, and time to --pyfilter 3 years ago
archivebot-list-stuck-requests Fix line endings 5 years ago
archivebot-log-extract-ignores Add archivebot-log-extract-ignores 3 years ago
archivebot-monitor-job-queue First set of little things 5 years ago
archivebot-youtube Add helper for AB/chromebot-ing YouTube channels and users 5 years ago
azure-storage-list Add --jsonl option 2 years ago
b64grep Add b64grep 2 years ago
bing-scrape Add Bing, Reddit/Pushshift, and FoolFuuka scrapers 5 years ago
bugzilla-url-list Add Bugzilla URL list generator 2 years ago
combine-by-prefix Add combine-by-prefix 2 years ago
curl-ua Add IE6 UA 3 years ago
deb-repo-urls Fix deb file URLs 3 years ago
dedupe Another alternative and performance/memory comparison 3 years ago
europarl-meps-collect Add script for scraping MEP links from europarl.europa.eu 5 years ago
foolfuuka-search Better workaround for the 5000 results limit; works for FoolFuuka 2.0.1 and up 5 years ago
format-size Split out size formatting 5 years ago
fos-ftp-upload First set of little things 5 years ago
get-crx4chrome-urls First set of little things 5 years ago
github-list-repos Fix org repo listing on new design/site structure 2 years ago
gitlab-list-repos Add support for other instances and full-instance listing 2 years ago
gofile.io-dl Add support for password-protected folders 2 years ago
ia-cdx-search Fix crash on an empty response 2 years ago
ia-derive Add script to queue derive on IA 5 years ago
ia-files-xml-to-jsonl Guarantee stable output order 3 years ago
ia-upload-progress Proper script for tracking size of uploaded data 5 years ago
ia-verify-file Add a timeout to prevent potentially indefinite blocking 2 years ago
ia-wait-item-tasks Add ia-wait-item-tasks 2 years ago
iasha1check Colourise sha1sum output 3 years ago
ix.io-upload Allow overriding the "remote filename" 5 years ago
kill-wpull-connections Merge kill-wpull-connections repository into little-things 3 years ago
killcx-all-https First set of little things 5 years ago
mastodon-enumerate-users Enumerate users on a Mastodon instance 5 years ago
mastodon-outdated Finding outdated Mastodon instances 5 years ago
parent-urls Refactor, strip query/fragment 3 years ago
pipelines-launch-in-tmux-windows First set of little things 5 years ago
pipelines-monitor-tmux-wget-outcomes Monitor how a pipeline's wget processes are faring 5 years ago
pipelines-stop-gracefully First set of little things 5 years ago
reddit-pushshift-search Add Bing, Reddit/Pushshift, and FoolFuuka scrapers 5 years ago
run-every-five-minutes First set of little things 5 years ago
s3-bucket-list Ignore TLS issues 3 years ago
s3-bucket-list-qwarc Record wrapper script in meta WARC as well 3 years ago
snscrape-extract Add support for Twitter hashtag extraction 4 years ago
snscrape-facebook-user Silence by default 5 years ago
snscrape-instagram-user Silence by default 5 years ago
snscrape-prepare-commands Add support for Twitter hashtag extraction 4 years ago
snscrape-tmux Update tmux session commands 4 years ago
snscrape-twitter-filter Filter Twitter hashtag scrapes based on account scrapes 5 years ago
snscrape-twitter-hashtag Extract external links from Twitter 5 years ago
snscrape-twitter-user Extract external links from Twitter 5 years ago
snscrape-upload Print Instagram ignore immediately after upload instead of at the end 5 years ago
snscrape-vk-user Silence by default 5 years ago
snscrape-wiki-transfer-merge Helper tools for snscrape and the wiki pages 5 years ago
social-media-extract-profile-link Fix decoding of links on Facebook profiles 4 years ago
sum-sizes Add sum-sizes 2 years ago
tar-many-files-progress First set of little things 5 years ago
tcp-closer Add tcp-closer command 5 years ago
transfer.archivete.am-upload Handle HTTP/2 lowercase headers 3 years ago
transfer.notkiska.pw-check-ia Switch to HTTPS 3 years ago
uniqify Add uniqify 5 years ago
url-normalise Normalise domain name to lower-case before further processing 4 years ago
warc-peek Add WARC/1.1 support 3 years ago
warc-size Split out size formatting 5 years ago
warc-tiny Fix compatibility with wpull 2.x 3 years ago
website-extract-social-media Add support for Facebook /pages/category/Category/Name-ID URLs 4 years ago
wget-spider-estimate-size First set of little things 5 years ago
wiki-list-to-main Add ArchiveBot wiki list helper 5 years ago
wiki-recursive-extract-normalise Fix deduplication within each section processing 4 years ago
wiki-sections-sort Add wiki-sections-sort 4 years ago
wiki-website-extract-social-media Add script for automatic social media discovery 4 years ago
wpull1-parallel-progress-monitor First set of little things 5 years ago
wpull1-progress-monitor First set of little things 5 years ago
wpull2-extract-remaining Clean up wpull DB commands 3 years ago
wpull2-log-extract-errors Treat NXDOMAIN and no A/AAAA record errors as ok 3 years ago
wpull2-requeue Print number of modified records on requeueing 2 years ago
wpull2-url-origin Clean up wpull DB commands 3 years ago
youtube-channel-list.py Add YouTube channel listing script 2 years ago
youtube-extract Handle ancient /?v= URLs 2 years ago
youtube-filter-autogen-channels Add youtube-filter-autogen-channels 4 years ago
zstdwarccat Fix 'Dictionary mismatch' error when very small dicts are used because the temporary file isn't written to disk before zstdcat gets executed 2 years ago

README.md

Over the past few years, I’ve written and accumulated a number of little things to help with archival-related tasks. This repository collects them. I hope someone finds some of them useful.

License (applies to all programs in this repository)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.