.gitignore
Add infrastructure for simple C-based tools
3 years ago
.make-and-exec
Warnings are bad, mmkay?
1 year ago
.urldecode-test
Get rid of Makefile for more control; add proper debug build support
1 year ago
.warc-dump-responses-test
Add test for warc-dump-responses
1 year ago
.youtube-extract-rapid-test
Get rid of Makefile for more control; add proper debug build support
1 year ago
LICENSE
Initial commit
5 years ago
README.md
Initial commit
5 years ago
alphabetseq
Add common alphabets as names
3 months ago
archivebot-blogspot
Fix HTTPS handling
5 years ago
archivebot-compress-db
Add archivebot-compress-db
1 year ago
archivebot-db-edit
Add --code filter (and completely overhaul arguments parsing to accommodate it)
1 month ago
archivebot-fix-queue-counters
In-progress URLs are not counted as part of the queue
2 months ago
archivebot-high-resources
Replace archivebot-high-memory with more capable archivebot-high-resources
1 year ago
archivebot-irccloud-paste
Add archivebot-irccloud-paste
4 years ago
archivebot-jobid-calculation
More snscrape helper tools
5 years ago
archivebot-jobs
Fix con-d-commands mode
1 month ago
archivebot-list-stuck-requests
Fix line endings
5 years ago
archivebot-log-extract-ignores
Add archivebot-log-extract-ignores
3 years ago
archivebot-monitor-job-queue
First set of little things
5 years ago
archivebot-pipelines-count-jobs
Add free slots column
2 months ago
archivebot-youtube
Add helper for AB/chromebot-ing YouTube channels and users
5 years ago
at-tracker-sample-user-item-size
Add at-tracker-sample-user-item-size
3 years ago
azure-storage-list
Add --jsonl option
3 years ago
b64grep
Add b64grep
3 years ago
base64url
Add base64url
3 years ago
bencode2json
Add bencode2json
2 years ago
bing-scrape
Fix extraction of search results
1 year ago
bugzilla-url-list
Add Bugzilla URL list generator
3 years ago
cdx-chunk
Add cdx-chunk
3 years ago
cloudflare-email-decode
Add cloudflare-email-decode
2 years ago
combine-by-prefix
Add combine-by-prefix
3 years ago
curl-ia
Add IA_S3_{ACCESS,SECRET} support to curl-ia in header mode
4 months ago
curl-ua
Add IE6 UA
4 years ago
deb-repo-urls
Fix deb file URLs
4 years ago
dedupe
Another alternative and performance/memory comparison
4 years ago
dir-to-ia
Add warning that the script is still experimental
1 month ago
europarl-meps-collect
Add script for scraping MEP links from europarl.europa.eu
5 years ago
extract-urls-for-archiveteam-projects
Add wpull2-extract-ignored-offsite and extract-urls-for-archiveteam-projects
1 year ago
foolfuuka-search
Better workaround for the 5000 results limit; works for FoolFuuka 2.0.1 and up
5 years ago
format-size
Split out size formatting
5 years ago
fos-ftp-upload
First set of little things
5 years ago
get-crx4chrome-urls
First set of little things
5 years ago
gitea-list-repos
Add gitea-list-repos
3 months ago
github-list-repos
Fix extraction on org pages
23 hours ago
gitlab-list-repos
Add comment about HTML nonsense
5 months ago
gofile.io-dl
Add support for password-protected folders
3 years ago
html-extract-stupid
Handle and
1 year ago
http-response-bodies
Add http-response-bodies
2 years ago
http-response-bodies.c
Fix extra LF between chunks
1 year ago
ia-cdx-search
Work around CDX API bugs
4 months ago
ia-cdx-search-subdomains
Fix URLs without a path
2 years ago
ia-derive
Queue derives with `ia tasks` instead of this manual curl rubbish
2 years ago
ia-files-xml-to-jsonl
Guarantee stable output order
3 years ago
ia-s3-auth
Add ia-s3-auth
3 weeks ago
ia-upload-progress
Proper script for tracking size of uploaded data
5 years ago
ia-upload-stream
Allow parts to finish even if an earlier part hasn't done so yet
2 weeks ago
ia-verify-file
Add a timeout to prevent potentially indefinite blocking
3 years ago
ia-wait-item-tasks
Rewrite ia-wait-item-tasks to directly call the tasks API instead of using the ia CLI
6 months ago
iasha1check
Fix output sometimes appearing after prompt
2 years ago
ix.io-upload
Allow overriding the "remote filename"
5 years ago
kill-connections
Handle processes with too many open connections
2 years ago
kill-wpull-connections
Merge kill-wpull-connections repository into little-things
4 years ago
killcx-all-https
First set of little things
5 years ago
mastodon-enumerate-users
Enumerate users on a Mastodon instance
5 years ago
mastodon-outdated
Finding outdated Mastodon instances
5 years ago
moinmoin-url-list
Add moinmoin-url-list
1 year ago
parent-urls
Refactor, strip query/fragment
4 years ago
pipelines-launch-in-tmux-windows
First set of little things
5 years ago
pipelines-monitor-tmux-wget-outcomes
Monitor how a pipeline's wget processes are faring
5 years ago
pipelines-stop-gracefully
First set of little things
5 years ago
reddit-pushshift-search
Add Bing, Reddit/Pushshift, and FoolFuuka scrapers
5 years ago
run-every-five-minutes
First set of little things
5 years ago
s3-bucket-find-direct-url
Add support for PermanentRedirect error responses
1 year ago
s3-bucket-list
Fix for DigitalOcean Spaces (which puts <Marker> after the <Contents>)
5 months ago
s3-bucket-list-qwarc
Add JSONL output option for S3 listing
3 years ago
snscrape-extract
Add support for Twitter hashtag extraction
5 years ago
snscrape-facebook-user
Silence by default
5 years ago
snscrape-instagram-user
Silence by default
5 years ago
snscrape-prepare-commands
Add support for Twitter hashtag extraction
5 years ago
snscrape-tmux
Update tmux session commands
5 years ago
snscrape-twitter-filter
Filter Twitter hashtag scrapes based on account scrapes
5 years ago
snscrape-twitter-hashtag
Extract external links from Twitter
5 years ago
snscrape-twitter-user
Extract external links from Twitter
5 years ago
snscrape-upload
Print Instagram ignore immediately after upload instead of at the end
5 years ago
snscrape-vk-user
Silence by default
5 years ago
snscrape-wiki-transfer-merge
Helper tools for snscrape and the wiki pages
5 years ago
social-media-extract-profile-link
Fix decoding of links on Facebook profiles
4 years ago
sum-sizes
Avoid float roundtrip for integer values
1 year ago
tar-many-files-progress
First set of little things
5 years ago
tcp-closer
Add tcp-closer command
5 years ago
torrent-tiny
Fix negative ints
1 year ago
transfer.archivete.am-upload
Handle HTTP/2 lowercase headers
3 years ago
transfer.notkiska.pw-check-ia
Switch to HTTPS
4 years ago
uniqify
Add uniqify
5 years ago
uniqify-recent
Add uniqify-recent
11 months ago
url-normalise
Normalise domain name to lower-case before further processing
4 years ago
urldecode
Add URL/percent decoding tool
3 years ago
urldecode.c
Fix unused argc and argv error
1 year ago
urlsort
Add urlsort
2 years ago
warc-dump-responses
Add warc-dump-responses
2 years ago
warc-dump-responses.c
Fix error when the terminating CRLFCRLF of a record is truncated
1 year ago
warc-peek
Allow negative offsets to peek near the end of the file
2 years ago
warc-size
Split out size formatting
5 years ago
warc-tiny
Fix empty files being considered valid WARCs
1 year ago
website-extract-social-media
Add support for Facebook /pages/category/Category/Name-ID URLs
4 years ago
wget-spider-estimate-size
First set of little things
5 years ago
wiki-list-to-main
Add ArchiveBot wiki list helper
5 years ago
wiki-recursive-extract-normalise
Fix deduplication within each section processing
5 years ago
wiki-sections-sort
Add wiki-sections-sort
5 years ago
wiki-website-extract-social-media
Add script for automatic social media discovery
5 years ago
wpull1-parallel-progress-monitor
First set of little things
5 years ago
wpull1-progress-monitor
First set of little things
5 years ago
wpull2-children
Add wpull2-children
3 months ago
wpull2-db-edit
Add --code filter (and completely overhaul arguments parsing to accommodate it)
1 month ago
wpull2-extract-ignored
Remove filtering of onsite URLs because it's unreliable
1 year ago
wpull2-extract-remaining
Clean up wpull DB commands
3 years ago
wpull2-log-colourise
Add wpull2-log-colourise
2 years ago
wpull2-log-extract-errors
Treat NXDOMAIN and no A/AAAA record errors as ok
4 years ago
wpull2-requeue
Add wpull2-db-edit, a more flexible and powerful replacement of wpull2-requeue
2 months ago
wpull2-url-origin
Clean up wpull DB commands
3 years ago
youtube-channel-list.py
Use _type instead of key check hack
2 years ago
youtube-extract
Exclude backslashes in channel patterns
2 years ago
youtube-extract-rapid
Add youtube-extract-rapid
3 years ago
youtube-extract-rapid.c
Add youtube-extract-rapid
3 years ago
youtube-filter-autogen-channels
Add youtube-filter-autogen-channels
5 years ago
zstdwarccat
Fix 'Dictionary mismatch' error when very small dicts are used because the temporary file isn't written to disk before zstdcat gets executed
3 years ago