JustAnotherArchivist
171ca4252b
Disable truncation when stdout is not a terminal
4 years ago
JustAnotherArchivist
9763370976
Truncate URLs by default to fit the terminal width
4 years ago
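A minimal sketch of the behaviour the two commits above describe: URLs are cut to the terminal width by default, and truncation is skipped when stdout is not a terminal. The function names `truncate_url` and `output_width` and the ellipsis marker are assumptions for illustration, not taken from the repository.

```python
import shutil
import sys


def truncate_url(url, width):
    """Shorten url to at most width characters, marking the cut with an ellipsis.

    A width of None means "do not truncate" (hypothetical convention).
    """
    if width is None or len(url) <= width:
        return url
    if width <= 1:
        return url[:width]
    return url[: width - 1] + "…"


def output_width(stream=sys.stdout):
    """Return the terminal width, or None when stream is not a terminal."""
    if not stream.isatty():
        return None  # piped or redirected output: leave URLs untouched
    return shutil.get_terminal_size().columns
```

With this split, `truncate_url(url, output_width())` leaves piped output complete for tools such as `grep`, while interactive output stays within one line per row.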
JustAnotherArchivist
1bc1487ecc
Add script for extracting remaining wpull 2 queue
4 years ago
JustAnotherArchivist
d3c00353da
Make con-d-commands mode an alias of the corresponding format
4 years ago
JustAnotherArchivist
b4fe6dd754
Reorder arguments to make more sense
4 years ago
JustAnotherArchivist
c547fc6c6b
Add format mode
4 years ago
JustAnotherArchivist
7fde199151
Add --mode con-d-commands, replace --dashboard-regex with --mode dashboard-regex
4 years ago
JustAnotherArchivist
0a6f83b1b8
Add --dashboard-regex
4 years ago
JustAnotherArchivist
05ed1e004b
Add more columns
4 years ago
JustAnotherArchivist
3f7d84ab12
Refactor the Bash/Python abomination into a pure Python script so I get to keep my sanity while editing
4 years ago
JustAnotherArchivist
cf879c86c9
Refactor into something more flexible to the addition of new columns
4 years ago
JustAnotherArchivist
d7bd8de09d
Add --dates option
4 years ago
JustAnotherArchivist
236278f0b4
Fix decoding of links on Facebook profiles
4 years ago
JustAnotherArchivist
d7a07d1d99
Normalise domain name to lower-case before further processing
4 years ago
JustAnotherArchivist
e655080e20
Add support for Facebook /pages/category/Category/Name-ID URLs
4 years ago
JustAnotherArchivist
daa1a95792
Proper URL decoding
4 years ago
JustAnotherArchivist
1bee1cdcc7
Add support for Facebook /people/Name/ID URLs
4 years ago
JustAnotherArchivist
00107c0ef0
Add support for YouTube /c/X URLs
4 years ago
JustAnotherArchivist
b59b82041c
Add support for wiki list entries with options
4 years ago
JustAnotherArchivist
d5953ca95c
Use old Opera UA for Twitter to force the old design
4 years ago
JustAnotherArchivist
1fa57d41a3
Fix extraction on Wix sites from JSON inside a data attribute
Example: https://www.martinedocourt.ch/
4 years ago
JustAnotherArchivist
4a742162d0
Suppress output if there are no matched jobs
4 years ago
JustAnotherArchivist
fe72d57d7e
Add filtering based on substrings anywhere in the string and on regex
4 years ago
JustAnotherArchivist
cf30a53f82
Add case-insensitive filtering
4 years ago
JustAnotherArchivist
711e444e8e
Highlight jobs that have been inactive for over 6 hours
4 years ago
JustAnotherArchivist
b2919030ab
Fix sorting on numerical columns
4 years ago
JustAnotherArchivist
257b578fbe
Add descending sort
4 years ago
JustAnotherArchivist
6e7449d137
Support column names in any capitalisation
4 years ago
JustAnotherArchivist
e5e7bdf8af
Add more filtering options
4 years ago
JustAnotherArchivist
c611420be9
Remove options from usage line
4 years ago
JustAnotherArchivist
824eb5e353
Add script for getting an AB job overview table
4 years ago
JustAnotherArchivist
34c1a58034
Fix detection of multiple transfer encodings
4 years ago
JustAnotherArchivist
195df08cd5
Fix marker loop on some filenames due to lacking HTML entity processing
E.g. https://audio-market-dev.s3.amazonaws.com/?marker=media/23/Hard%20Style%20Producer
4 years ago
JustAnotherArchivist
3cc3a1ed38
Fix nested tags
E.g. the <Owner> tag, which contains <ID> and <DisplayName>; see https://appengage-video.s3.amazonaws.com/
4 years ago
JustAnotherArchivist
5c907488e1
Handle broken pipe on stdout
4 years ago
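One common way to handle a broken pipe on stdout, sketched in Python. The commit does not say which language the script uses or how it handles the error, so this is only an illustration; `write_lines` is a hypothetical helper.

```python
def write_lines(lines, stream):
    """Write lines to stream; return False if the reader closed the pipe.

    Catching BrokenPipeError lets the caller stop quietly (for example when
    the output is piped into `head`) instead of printing a traceback.
    """
    try:
        for line in lines:
            stream.write(line + "\n")
        stream.flush()
    except BrokenPipeError:
        return False
    return True
```

A caller would typically stop producing output and exit as soon as `write_lines` returns False.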
JustAnotherArchivist
b38349e91f
Fix duplicate slashes
4 years ago
JustAnotherArchivist
f23e4cc71e
Retry on internal errors
4 years ago
JustAnotherArchivist
bfe5f59e25
Add marker loop detection
4 years ago
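Marker loop detection for a paginated bucket listing might look roughly like the sketch below. It assumes a hypothetical `fetch_page` callable that takes a marker (None for the first page) and returns `(keys, next_marker)`, with `next_marker` set to None on the last page; the actual script's structure is not shown in this log.

```python
def list_keys(fetch_page):
    """Yield keys from a paginated listing, stopping on a repeated marker.

    fetch_page is a hypothetical callable: marker -> (keys, next_marker).
    """
    seen_markers = set()
    marker = None
    while True:
        keys, next_marker = fetch_page(marker)
        yield from keys
        if next_marker is None:
            return  # last page reached
        if next_marker in seen_markers:
            # The server sent us back to a marker we already requested,
            # so continuing would repeat the same pages forever.
            raise RuntimeError(f"marker loop detected at {next_marker!r}")
        seen_markers.add(next_marker)
        marker = next_marker
```

Tracking every marker seen so far, rather than only the previous one, also catches loops that span more than one page.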
JustAnotherArchivist
66bdef3247
Take a bucket URL argument instead of hostname + bucketname
4 years ago
JustAnotherArchivist
e385c1d302
Limit curl to 10 seconds
4 years ago
JustAnotherArchivist
74162445aa
Replace curl-archivebot-ua with a more general curl-ua script that supports different UAs selected by aliases
4 years ago
JustAnotherArchivist
9d712d64d7
Ignore certain URLs on Twitter and Instagram entirely
4 years ago
JustAnotherArchivist
87826d4844
Use line variable instead of prefix+url
4 years ago
JustAnotherArchivist
163aacf13c
Print deletion URL on stderr
4 years ago
JustAnotherArchivist
486a593f15
Add support for more weird Facebook URLs
4 years ago
JustAnotherArchivist
256a94443e
Fix deduplication within each section processing
4 years ago
JustAnotherArchivist
98d77ecc96
Deduplicate output
This uses mawk's extensions `-W interactive` and `delete array`; it will probably work with certain other AWK implementations as well, but for now it depends on mawk explicitly.
4 years ago
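For comparison, the same idea of order-preserving, streaming deduplication can be sketched in Python. This is not the mawk implementation the commit refers to; `deduplicate` is a hypothetical name used only for illustration.

```python
def deduplicate(lines):
    """Yield each line the first time it appears, preserving input order."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line
```

Because it is a generator, each new line is emitted as soon as it is read instead of after the whole input has been consumed.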
JustAnotherArchivist
6ce64baf87
Remove redundant url-normalise after the extraction
Since all input is run through url-normalise before processing and all output of website and social media extraction is also normalised, it's not necessary to normalise again at the end.
4 years ago
JustAnotherArchivist
318183148e
Fix URL extraction from Facebook profile overview pages
4 years ago
JustAnotherArchivist
869ade27eb
Separate names in stderr annotations for the various url-normalise processes
4 years ago