JustAnotherArchivist | 869ade27eb | Separate names in stderr annotations for the various url-normalise processes | 4 years ago
JustAnotherArchivist | 79f0bd4332 | Normalise URLs everywhere to reduce duplicates | 4 years ago
JustAnotherArchivist | dc4efcfbfb | One URL normalisation script to rule them all. Consolidate social media profile, YouTube, and (new) generic web page URL normalisation into one script | 4 years ago
JustAnotherArchivist | 0f13a1fadd | Add verbosity options, and annotate stderr on wiki-recursive-extract | 4 years ago
JustAnotherArchivist | 3ec816cd04 | Add script for link extraction from social media profiles | 4 years ago
JustAnotherArchivist | 5285c406d9 | Add script for recursive website and social media discovery | 4 years ago
JustAnotherArchivist | 2be9ca922e | Ignore more useless Facebook links | 4 years ago
JustAnotherArchivist | c3b0e5543e | Add support for facebook.com/pg/something | 4 years ago
JustAnotherArchivist | 7c389f1fef | Add support for hashbang fragments on Twitter links | 4 years ago
JustAnotherArchivist | c56736bc4a | Ignore /intent on Twitter | 4 years ago
JustAnotherArchivist | 4f34753788 | Add support for Instagram posts and ignore spurious links from the CDN | 4 years ago
JustAnotherArchivist | ad030f5d21 | Add support for Facebook pages and groups | 4 years ago
JustAnotherArchivist | cd0b3f6214 | Ignore /vi/* on YouTube (video thumbnails) | 4 years ago
JustAnotherArchivist | 6f1cca73ad | Support hashtags | 4 years ago
JustAnotherArchivist | c61efa03f0 | Make social media normalisation script snscrape-independent | 4 years ago
JustAnotherArchivist | e6008eb971 | Add script for automatic social media discovery | 4 years ago
JustAnotherArchivist | fed66542fa | Support python3 in any directory instead of just /usr/bin | 4 years ago
JustAnotherArchivist | 5982e131a4 | Stop gracefully when encountering a SIGPIPE | 4 years ago
JustAnotherArchivist | c13a1150df | Add support for WARC/1.1 | 4 years ago
JustAnotherArchivist | 376cde7b8c | Fix broken block digest calculation on malformed HTTP responses | 4 years ago
JustAnotherArchivist | b121cbd958 | Write all log messages to stderr | 4 years ago
JustAnotherArchivist | ed1270d988 | Add support for upper-cased chunk lengths | 4 years ago
JustAnotherArchivist | d4826abde2 | Add record ID to log messages | 4 years ago
JustAnotherArchivist | 4925a912c0 | Add youtube-filter-autogen-channels | 5 years ago
JustAnotherArchivist | 9b8f223776 | Add wiki-sections-sort | 5 years ago
JustAnotherArchivist | 552a4147c2 | Fix not returning complete body for non-chunked responses. Leftover from debugging | 5 years ago
JustAnotherArchivist | 0dc0de6b50 | Add support for lists | 5 years ago
JustAnotherArchivist | 9d344df8c6 | +x | 5 years ago
JustAnotherArchivist | f6a7cbfc70 | Fix --with-list-urls help message | 5 years ago
JustAnotherArchivist | 9743aa7c35 | Add s3-bucket-list | 5 years ago
JustAnotherArchivist | 91adce786f | Add YouTube normalisation script | 5 years ago
JustAnotherArchivist | 5ca90c3b7d | Update tmux session commands | 5 years ago
JustAnotherArchivist | 679923d37d | Add support for Twitter hashtag extraction | 5 years ago
JustAnotherArchivist | 663383830c | Add support for lists | 5 years ago
JustAnotherArchivist | d85d142def | Handle parameters on Twitter URLs | 5 years ago
JustAnotherArchivist | 5984565417 | Handle Twitter URLs with trailing slash | 5 years ago
JustAnotherArchivist | 8647ccaa8f | Support subdomain-less Facebook URLs | 5 years ago
JustAnotherArchivist | 66ec0c93c4 | Handle more Facebook URLs | 5 years ago
JustAnotherArchivist | baa8a566bd | Add script for scraping MEP links from europarl.europa.eu | 5 years ago
JustAnotherArchivist | c2413b2c4f | Add ArchiveBot wiki list helper | 5 years ago
JustAnotherArchivist | 72818019bc | Extract external links from Twitter | 5 years ago
JustAnotherArchivist | b262d893da | Silence by default | 5 years ago
JustAnotherArchivist | 6fb9587a2b | More flexible normalisation | 5 years ago
JustAnotherArchivist | 06be216f4c | Print Instagram ignore immediately after upload instead of at the end | 5 years ago
JustAnotherArchivist | 1be4ed829b | Add helper for AB/chromebot-ing YouTube channels and users | 5 years ago
JustAnotherArchivist | 2a7a4ea6dc | Fix HTTPS handling | 5 years ago
JustAnotherArchivist | a812cb5fc2 | More snscrape helper tools | 5 years ago
JustAnotherArchivist | 3ee3ffc340 | Generate commands for Blogspot | 5 years ago
JustAnotherArchivist | 5090a8ad02 | Enumerate users on a Mastodon instance | 5 years ago
JustAnotherArchivist | 0000d8ffd9 | Add script to queue derive on IA | 5 years ago