.gitignore
Add infrastructure for simple C-based tools
2 years ago
.make-and-exec
Add infrastructure for simple C-based tools
2 years ago
.make-and-exec-Makefile
Add infrastructure for simple C-based tools
2 years ago
.urldecode-test
Remove debugging prints
2 years ago
.youtube-extract-rapid-test
Add youtube-extract-rapid
2 years ago
LICENSE
Initial commit
5 years ago
README.md
Initial commit
5 years ago
alphabetseq
Swap syntaxes
2 years ago
archivebot-blogspot
Fix HTTPS handling
5 years ago
archivebot-high-memory
Support python3 in any directory instead of just /usr/bin
4 years ago
archivebot-irccloud-paste
Add archivebot-irccloud-paste
3 years ago
archivebot-jobid-calculation
More snscrape helper tools
5 years ago
archivebot-jobs
Pass through datetime, math, re, and time to --pyfilter
3 years ago
archivebot-list-stuck-requests
Fix line endings
5 years ago
archivebot-log-extract-ignores
Add archivebot-log-extract-ignores
3 years ago
archivebot-monitor-job-queue
First set of little things
5 years ago
archivebot-youtube
Add helper for AB/chromebot-ing YouTube channels and users
5 years ago
at-tracker-sample-user-item-size
Add at-tracker-sample-user-item-size
2 years ago
azure-storage-list
Add --jsonl option
2 years ago
b64grep
Add b64grep
2 years ago
base64url
Add base64url
2 years ago
bing-scrape
Add Bing, Reddit/Pushshift, and FoolFuuka scrapers
5 years ago
bugzilla-url-list
Add Bugzilla URL list generator
2 years ago
cdx-chunk
Add cdx-chunk
2 years ago
combine-by-prefix
Add combine-by-prefix
2 years ago
curl-ua
Add IE6 UA
3 years ago
deb-repo-urls
Fix deb file URLs
3 years ago
dedupe
Another alternative and performance/memory comparison
3 years ago
europarl-meps-collect
Add script for scraping MEP links from europarl.europa.eu
5 years ago
foolfuuka-search
Better workaround for the 5000 results limit; works for FoolFuuka 2.0.1 and up
5 years ago
format-size
Split out size formatting
5 years ago
fos-ftp-upload
First set of little things
5 years ago
get-crx4chrome-urls
First set of little things
5 years ago
github-list-repos
Fix org repo listing on new design/site structure
2 years ago
gitlab-list-repos
Add support for other instances and full-instance listing
2 years ago
gofile.io-dl
Add support for password-protected folders
2 years ago
ia-cdx-search
Fix crash on an empty response
2 years ago
ia-derive
Add script to queue derive on IA
5 years ago
ia-files-xml-to-jsonl
Guarantee stable output order
2 years ago
ia-upload-progress
Proper script for tracking size of uploaded data
5 years ago
ia-upload-stream
Handle connection errors
2 years ago
ia-verify-file
Add a timeout to prevent potentially indefinite blocking
2 years ago
ia-wait-item-tasks
Add ia-wait-item-tasks
2 years ago
iasha1check
Colourise sha1sum output
3 years ago
ix.io-upload
Allow overriding the "remote filename"
5 years ago
kill-wpull-connections
Merge kill-wpull-connections repository into little-things
3 years ago
killcx-all-https
First set of little things
5 years ago
mastodon-enumerate-users
Enumerate users on a Mastodon instance
5 years ago
mastodon-outdated
Finding outdated Mastodon instances
5 years ago
parent-urls
Refactor, strip query/fragment
3 years ago
pipelines-launch-in-tmux-windows
First set of little things
5 years ago
pipelines-monitor-tmux-wget-outcomes
Monitor how a pipeline's wget processes are faring
5 years ago
pipelines-stop-gracefully
First set of little things
5 years ago
reddit-pushshift-search
Add Bing, Reddit/Pushshift, and FoolFuuka scrapers
5 years ago
run-every-five-minutes
First set of little things
5 years ago
s3-bucket-list
Ignore TLS issues
3 years ago
s3-bucket-list-qwarc
Record wrapper script in meta WARC as well
3 years ago
snscrape-extract
Add support for Twitter hashtag extraction
4 years ago
snscrape-facebook-user
Silence by default
5 years ago
snscrape-instagram-user
Silence by default
5 years ago
snscrape-prepare-commands
Add support for Twitter hashtag extraction
4 years ago
snscrape-tmux
Update tmux session commands
4 years ago
snscrape-twitter-filter
Filter Twitter hashtag scrapes based on account scrapes
5 years ago
snscrape-twitter-hashtag
Extract external links from Twitter
5 years ago
snscrape-twitter-user
Extract external links from Twitter
5 years ago
snscrape-upload
Print Instagram ignore immediately after upload instead of at the end
5 years ago
snscrape-vk-user
Silence by default
5 years ago
snscrape-wiki-transfer-merge
Helper tools for snscrape and the wiki pages
5 years ago
social-media-extract-profile-link
Fix decoding of links on Facebook profiles
4 years ago
sum-sizes
Add sum-sizes
2 years ago
tar-many-files-progress
First set of little things
5 years ago
tcp-closer
Add tcp-closer command
5 years ago
transfer.archivete.am-upload
Handle HTTP/2 lowercase headers
3 years ago
transfer.notkiska.pw-check-ia
Switch to HTTPS
3 years ago
uniqify
Add uniqify
5 years ago
url-normalise
Normalise domain name to lower-case before further processing
4 years ago
urldecode
Add URL/percent decoding tool
2 years ago
urldecode.c
Add URL/percent decoding tool
2 years ago
warc-peek
Add WARC/1.1 support
3 years ago
warc-size
Split out size formatting
5 years ago
warc-tiny
Add support for reading from stdin
2 years ago
website-extract-social-media
Add support for Facebook /pages/category/Category/Name-ID URLs
4 years ago
wget-spider-estimate-size
First set of little things
5 years ago
wiki-list-to-main
Add ArchiveBot wiki list helper
5 years ago
wiki-recursive-extract-normalise
Fix deduplication within each section processing
4 years ago
wiki-sections-sort
Add wiki-sections-sort
4 years ago
wiki-website-extract-social-media
Add script for automatic social media discovery
4 years ago
wpull1-parallel-progress-monitor
First set of little things
5 years ago
wpull1-progress-monitor
First set of little things
5 years ago
wpull2-extract-remaining
Clean up wpull DB commands
3 years ago
wpull2-log-extract-errors
Treat NXDOMAIN and no A/AAAA record errors as ok
3 years ago
wpull2-requeue
Print number of modified records on requeueing
2 years ago
wpull2-url-origin
Clean up wpull DB commands
3 years ago
youtube-channel-list.py
Add YouTube channel listing script
2 years ago
youtube-extract
Handle ancient /?v= URLs
2 years ago
youtube-extract-rapid
Add youtube-extract-rapid
2 years ago
youtube-extract-rapid.c
Add youtube-extract-rapid
2 years ago
youtube-filter-autogen-channels
Add youtube-filter-autogen-channels
4 years ago
zstdwarccat
Fix 'Dictionary mismatch' error when very small dicts are used because the temporary file isn't written to disk before zstdcat gets executed
2 years ago