Author | Commit | Message | Date
JustAnotherArchivist | e093211496 | Set content type for resource records | 4 years ago
JustAnotherArchivist | ae46b53401 | Always write a WARC-Warcinfo-ID header | 4 years ago
JustAnotherArchivist | 23fcdd4026 | Write microsecond dates for request and response records | 4 years ago
JustAnotherArchivist | 3030ad10ab | Mark private API accordingly | 4 years ago
JustAnotherArchivist | e0b4104d21 | Remove log handler before writing log record since that requires closing the stream | 4 years ago
JustAnotherArchivist | 6cfd352f68 | Write WARC/1.1 files | 4 years ago
JustAnotherArchivist | e1ad5c232e | Write warcinfo and resource records in meta WARC on firing up qwarc rather than at the end | 4 years ago
JustAnotherArchivist | f038cf91db | Fix unfound distribution handling | 4 years ago
JustAnotherArchivist | a5dfd5c805 | Write spec file + its dependencies and command line to meta WARC | 4 years ago
JustAnotherArchivist | e99e2304c9 | Write meta WARC with log file | 4 years ago
JustAnotherArchivist | d751844626 | Fix starting another item before stopping on STOP file or memory limit exceedance | 4 years ago
JustAnotherArchivist | 2b0778f9b5 | Remove leftovers from initial code rewrite | 4 years ago
JustAnotherArchivist | 85d78cee13 | Add warcinfo record with version information on Python, system, and dependencies | 4 years ago
JustAnotherArchivist | 9cff6bd5c1 | Only open a WARC file when necessary to avoid producing empty WARCs at the end | 4 years ago
JustAnotherArchivist | 21cf784102 | Use setuptools_scm for versioning | 4 years ago
JustAnotherArchivist | ab22966fef | Add to log which item a message is coming from | 5 years ago
JustAnotherArchivist | 6fafd32685 | Error when the retries are exceeded | 5 years ago
JustAnotherArchivist | 8647d6b396 | Use f-strings instead of str.format | 5 years ago
JustAnotherArchivist | 5008e6e8cd | Deduplicate items | 5 years ago
JustAnotherArchivist | 46c95e2157 | Disable decoding the response content. chardet can be very slow (https://github.com/chardet/chardet/issues/29 https://github.com/psf/requests/issues/2359) and the decoding may be unnecessary if it's binary content. | 5 years ago
JustAnotherArchivist | 85f6f7bd82 | Make qwarc.utils.handle_response_limit_error_retries more useful by passing the deferring handler as an argument | 5 years ago
JustAnotherArchivist | ad22a2327a | Support adding headers to individual requests | 5 years ago
JustAnotherArchivist | 67076f964c | Add support for POST requests | 5 years ago
JustAnotherArchivist | 2d52e78d85 | Fix reference to aiohttp.CientError | 5 years ago
JustAnotherArchivist | c1574a06c9 | Fix sleep task type | 5 years ago
JustAnotherArchivist | e0ca88c807 | Fix reference to get_rss | 5 years ago
JustAnotherArchivist | 984d28ede0 | Fix type of --memorylimit, --disklimit, and --warcsplit values | 5 years ago
JustAnotherArchivist | 8a8935810d | Fix references to memory and disk space check methods | 5 years ago
JustAnotherArchivist | be5673cfbf | Add record deduplication within a process | 5 years ago
JustAnotherArchivist | e892a6b6a7 | Initial commit | 5 years ago