LICENSE
Initial commit
5 years ago
README.md
Initial commit
5 years ago
archivebot-blogspot
Fix HTTPS handling
5 years ago
archivebot-high-memory
Support python3 in any directory instead of just /usr/bin
4 years ago
archivebot-irccloud-paste
Add archivebot-irccloud-paste
3 years ago
archivebot-jobid-calculation
More snscrape helper tools
5 years ago
archivebot-jobs
Pass through datetime, math, re, and time to --pyfilter
3 years ago
archivebot-list-stuck-requests
Fix line endings
5 years ago
archivebot-log-extract-ignores
Add archivebot-log-extract-ignores
3 years ago
archivebot-monitor-job-queue
First set of little things
5 years ago
archivebot-youtube
Add helper for AB/chromebot-ing YouTube channels and users
5 years ago
b64grep
Add b64grep
2 years ago
bing-scrape
Add Bing, Reddit/Pushshift, and FoolFuuka scrapers
5 years ago
bugzilla-url-list
Add Bugzilla URL list generator
2 years ago
combine-by-prefix
Add combine-by-prefix
2 years ago
curl-ua
Add IE6 UA
3 years ago
deb-repo-urls
Fix deb file URLs
3 years ago
dedupe
Another alternative and performance/memory comparison
3 years ago
europarl-meps-collect
Add script for scraping MEP links from europarl.europa.eu
5 years ago
foolfuuka-search
Better workaround for the 5000 results limit; works for FoolFuuka 2.0.1 and up
5 years ago
format-size
Split out size formatting
5 years ago
fos-ftp-upload
First set of little things
5 years ago
get-crx4chrome-urls
First set of little things
5 years ago
github-list-repos
Add --git-urls and --gitgud-complete-items
3 years ago
gitlab.com-list-repos
Replicate the GitHub script interface for convenience
3 years ago
gofile.io-dl
Add --urls and --nodl options
3 years ago
ia-derive
Add script to queue derive on IA
5 years ago
ia-files-xml-to-jsonl
Guarantee stable output order
2 years ago
ia-upload-progress
Proper script for tracking size of uploaded data
5 years ago
ia-verify-file
Add ia-verify-file
2 years ago
iasha1check
Colourise sha1sum output
3 years ago
ix.io-upload
Allow overriding the "remote filename"
5 years ago
kill-wpull-connections
Merge kill-wpull-connections repository into little-things
3 years ago
killcx-all-https
First set of little things
5 years ago
mastodon-enumerate-users
Enumerate users on a Mastodon instance
5 years ago
mastodon-outdated
Finding outdated Mastodon instances
5 years ago
parent-urls
Refactor, strip query/fragment
3 years ago
pipelines-launch-in-tmux-windows
First set of little things
5 years ago
pipelines-monitor-tmux-wget-outcomes
Monitor how a pipeline's wget processes are faring
5 years ago
pipelines-stop-gracefully
First set of little things
5 years ago
reddit-pushshift-search
Add Bing, Reddit/Pushshift, and FoolFuuka scrapers
5 years ago
run-every-five-minutes
First set of little things
5 years ago
s3-bucket-list
Ignore TLS issues
3 years ago
s3-bucket-list-qwarc
Record wrapper script in meta WARC as well
3 years ago
snscrape-extract
Add support for Twitter hashtag extraction
4 years ago
snscrape-facebook-user
Silence by default
5 years ago
snscrape-instagram-user
Silence by default
5 years ago
snscrape-prepare-commands
Add support for Twitter hashtag extraction
4 years ago
snscrape-tmux
Update tmux session commands
4 years ago
snscrape-twitter-filter
Filter Twitter hashtag scrapes based on account scrapes
5 years ago
snscrape-twitter-hashtag
Extract external links from Twitter
5 years ago
snscrape-twitter-user
Extract external links from Twitter
5 years ago
snscrape-upload
Print Instagram ignore immediately after upload instead of at the end
5 years ago
snscrape-vk-user
Silence by default
5 years ago
snscrape-wiki-transfer-merge
Helper tools for snscrape and the wiki pages
5 years ago
social-media-extract-profile-link
Fix decoding of links on Facebook profiles
4 years ago
sum-sizes
Add sum-sizes
2 years ago
tar-many-files-progress
First set of little things
5 years ago
tcp-closer
Add tcp-closer command
5 years ago
transfer.archivete.am-upload
Handle HTTP/2 lowercase headers
3 years ago
transfer.notkiska.pw-check-ia
Switch to HTTPS
3 years ago
uniqify
Add uniqify
5 years ago
url-normalise
Normalise domain name to lower-case before further processing
4 years ago
warc-peek
Add WARC/1.1 support
3 years ago
warc-size
Split out size formatting
5 years ago
warc-tiny
Fix compatibility with wpull 2.x
3 years ago
website-extract-social-media
Add support for Facebook /pages/category/Category/Name-ID URLs
4 years ago
wget-spider-estimate-size
First set of little things
5 years ago
wiki-list-to-main
Add ArchiveBot wiki list helper
5 years ago
wiki-recursive-extract-normalise
Fix deduplication within each section processing
4 years ago
wiki-sections-sort
Add wiki-sections-sort
4 years ago
wiki-website-extract-social-media
Add script for automatic social media discovery
4 years ago
wpull1-parallel-progress-monitor
First set of little things
5 years ago
wpull1-progress-monitor
First set of little things
5 years ago
wpull2-extract-remaining
Clean up wpull DB commands
3 years ago
wpull2-log-extract-errors
Treat NXDOMAIN and no A/AAAA record errors as ok
3 years ago
wpull2-requeue
Add script for requeueing skipped URLs due to too many failed attempts on wpull crawls
3 years ago
wpull2-url-origin
Clean up wpull DB commands
3 years ago
youtube-extract
Always decode stdin with surrogateescape to avoid breaking on binary input
2 years ago
youtube-filter-autogen-channels
Add youtube-filter-autogen-channels
4 years ago
zstdwarccat
Fix piping when reads return less data than expected
2 years ago