JustAnotherArchivist
321067819c
Proper script for tracking size of uploaded data
5 years ago
JustAnotherArchivist
5c654cb16b
Split out size formatting
5 years ago
JustAnotherArchivist
de2cdc0aae
curl with ArchiveBot UA
5 years ago
JustAnotherArchivist
89ccd68b59
Helper tools for snscrape and the wiki pages
5 years ago
JustAnotherArchivist
f2e836d2e9
Add support for differently formatted digests
5 years ago
JustAnotherArchivist
94c4f76570
Fix crash when a digest is missing from a record
5 years ago
JustAnotherArchivist
ef78a3318c
Colour only the header field names but not the values
5 years ago
JustAnotherArchivist
9ce4653094
Document colouring and usage
5 years ago
JustAnotherArchivist
e7c5d82254
Coloured WARCs?!
5 years ago
JustAnotherArchivist
70b413f5c1
Better events: include raw WARC header data and separate HTTP requests into headers and body
5 years ago
JustAnotherArchivist
641bc7a207
Fix infinite loop at end of WARC
5 years ago
JustAnotherArchivist
a700e8e2fe
Add tcp-closer command
5 years ago
JustAnotherArchivist
859c75a591
Add tool for WARC verification and extraction
5 years ago
JustAnotherArchivist
e867a2327f
Replace urlencoded @ symbol
The fix for https://github.com/dutchcoders/transfer.sh/issues/215 led to @ being encoded as %40 in filenames in the URL returned, which is awkward when working with social media scrapes since ArchiveBot normalises it to @ again.
5 years ago
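A minimal sketch of the replacement described in e867a2327f, assuming it amounts to a plain substring substitution on the URL returned by the upload. The function name and the example URL are hypothetical and not taken from the repository.

```python
def restore_at_symbol(url):
    """Turn a percent-encoded @ (%40) back into a literal @."""
    # Assumption: only the exact sequence %40 needs handling; all other
    # percent-escapes are deliberately left untouched.
    return url.replace("%40", "@")


# Hypothetical URL of the shape an upload might return.
print(restore_at_symbol("https://transfer.sh/abc12/twitter-%40example"))
```

Leaving every other escape alone keeps the rest of the URL byte-for-byte identical, so only the filename part that ArchiveBot would normalise anyway is changed.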
JustAnotherArchivist
cbd952024b
Workaround for hash no longer needed with current transfer.sh code
5 years ago
JustAnotherArchivist
61431c2054
Add VK scraping helper
5 years ago
JustAnotherArchivist
d6ff566c4d
Instagram always uses lower-case usernames
5 years ago
JustAnotherArchivist
138c2a2d39
Get rid of post-processing now that snscrape (dev version) has clean URLs
Keep the dirty URLs on Instagram because they're not that dirty and are linked from the profile pages. I usually throw it into ArchiveBot anyway such that it grabs the non-"taken-by" URLs as well.
5 years ago
JustAnotherArchivist
27b0d2da75
Better username capitalisation extraction method
5 years ago
JustAnotherArchivist
3aa828a0ac
transfer.kiska.pw -> transfer.notkiska.pw
5 years ago
JustAnotherArchivist
63f4a8b3d3
transfer.sh -> transfer.kiska.pw
5 years ago
JustAnotherArchivist
0168d50f62
Automatically fix capitalisation of Facebook and Twitter usernames
5 years ago
JustAnotherArchivist
db0104b3c8
Get correct capitalisation for a Facebook username
5 years ago
JustAnotherArchivist
4a1a9a10e0
Allow overriding the "remote filename"
5 years ago
JustAnotherArchivist
769f95808e
Add ix.io upload script
5 years ago
JustAnotherArchivist
c79721337b
+x
5 years ago
JustAnotherArchivist
c30dcf5985
Finding outdated Mastodon instances
5 years ago
JustAnotherArchivist
1748a6b607
Better workaround for the 5000 results limit; works for FoolFuuka 2.0.1 and up
5 years ago
JustAnotherArchivist
fd680551df
Add Bing, Reddit/Pushshift, and FoolFuuka scrapers
5 years ago
JustAnotherArchivist
ede77ad142
Filter Twitter hashtag scrapes based on account scrapes
5 years ago
JustAnotherArchivist
57ef544c6c
Fix line endings
5 years ago
JustAnotherArchivist
07c3e7baaa
Add snscrape helpers
5 years ago
JustAnotherArchivist
b7e3a703d8
Monitor how a pipeline's wget processes are faring
5 years ago
JustAnotherArchivist
168f61b39a
Quote filename so it works with any weird characters in the paths
(Last reconstructed commit from text file full of different versions)
5 years ago
JustAnotherArchivist
8f77c8c72a
xargs -r flag to not run the second find if the first produces no results (GNU extension)
5 years ago
JustAnotherArchivist
9d7a4096f9
Pipe into second find directly
5 years ago
JustAnotherArchivist
e3a4bf6a47
Replace slow lsof with procfs access
5 years ago
JustAnotherArchivist
4a83a54616
Print host for each stuck request
5 years ago
JustAnotherArchivist
2b2c65f034
Print PID
5 years ago
JustAnotherArchivist
fadb70e297
Fixed version which handles multiple roots correctly
5 years ago
JustAnotherArchivist
d10a1d3675
First set of little things
5 years ago
JustAnotherArchivist
a00607f28e
Initial commit
5 years ago
JustAnotherArchivist
2a41f169c5
Add -c option to cast the return value of shutdown(2) to int explicitly on broken machines
6 years ago
JustAnotherArchivist
8ffb48fb1b
Remove set -e/errexit, which causes the script to silently fail when no process is found with -j
6 years ago
JustAnotherArchivist
632fbcb4d0
Replace kill with ps in process existence check
kill returns the same status whether a process doesn't exist or the current user doesn't have permission to kill, so the script returned a confusing error message in the latter case.
6 years ago
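The ambiguity described in 632fbcb4d0 can be illustrated with a short sketch. It is not the script's method (the commit switched the shell check from kill to ps); it only shows, assuming a POSIX system, that the underlying kill(2) call does distinguish the two cases through its error code even though the kill command's exit status does not. The function name and the returned labels are hypothetical.

```python
import os


def process_state(pid):
    """Classify a PID as 'signalable', 'missing' or 'no-permission'."""
    try:
        # Signal 0 performs only the existence and permission checks;
        # no signal is actually delivered to the process.
        os.kill(pid, 0)
    except ProcessLookupError:
        # ESRCH: no process with this PID exists.
        return "missing"
    except PermissionError:
        # EPERM: the process exists but belongs to another user.
        return "no-permission"
    return "signalable"


# The current process always exists and may signal itself.
print(process_state(os.getpid()))
```

A shell script only sees the exit status of kill, which is why checking for existence with ps gives a clearer error message there.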
JustAnotherArchivist
4f3cfc6e56
Add check for ptrace scope
6 years ago
JustAnotherArchivist
96a329578e
Refactor
6 years ago
JustAnotherArchivist
1e7ec4a56e
Executable bit
6 years ago
JustAnotherArchivist
73877ecb96
Initial commit
6 years ago
JustAnotherArchivist
10715f1d3a
Rewrite GDB command to stop on the first error, e.g. if lsof is broken.
The use of call("echo 'string'") instead of print('string') or sys.stdout.write('string') is due to the latter two not reliably reporting back whether they were successful or not: print doesn't return anything (and actually can't be chained like this), and the return value of sys.stdout.write depends on the Python version (None on Python 2, number of bytes written on Python 3).
6 years ago
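The reasoning in 10715f1d3a can be illustrated outside GDB. This is a minimal sketch, assuming that `call` behaves like `subprocess.call` (the commit message does not say which `call` is meant) and that it runs under plain Python 3. The helper `run_steps` and the child commands are hypothetical. The point is that an exit status is a uniform success signal, whereas the return values of `print` and `write` are not.

```python
import io
import subprocess
import sys


def run_steps(commands):
    """Run commands in order and stop at the first non-zero exit status.

    Returns None if every command succeeded, otherwise a tuple of
    (index of the failing command, its exit status).
    """
    for index, command in enumerate(commands):
        status = subprocess.call(command)
        if status != 0:
            return index, status
    return None


# For comparison: neither of these reports success in a uniform way.
buffer = io.StringIO()
print_result = print("string", file=buffer)  # print() always returns None
write_result = buffer.write("string")        # Python 3 text streams return a count

# Hypothetical steps: the second one fails, so the third never runs.
ok = [sys.executable, "-c", "import sys; sys.exit(0)"]
fail = [sys.executable, "-c", "import sys; sys.exit(3)"]
result = run_steps([ok, fail, ok])
```

On Python 3, the count returned by a text stream's `write` is in characters, which equals the number of bytes only for ASCII text such as the example here.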