JustAnotherArchivist
518541eb81
Fix metadata fields list caching for subclasses
Because the _allFieldsCache attribute is inherited, once it is set on a class, all of its subclasses also see that list rather than their own, potentially different list. To fix this, use a global dict keyed by the metadata class instead.
10 months ago
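The inheritance pitfall described in 518541eb81 can be reproduced with a minimal sketch. Only the attribute name `_allFieldsCache` comes from the commit message; the class names, the `fields` attribute, `collect_fields`, and `_FIELDS_CACHE` are hypothetical and not taken from the actual code.

```python
def collect_fields(cls):
    # Gather the fields declared directly on each class in the MRO, base classes first.
    result = []
    for klass in reversed(cls.__mro__):
        result.extend(klass.__dict__.get('fields', ()))
    return tuple(result)


class BuggyBase:
    fields = ('id',)
    _allFieldsCache = None

    @classmethod
    def all_fields(cls):
        # The lookup is resolved through inheritance, so a subclass finds
        # the value cached on its parent and never computes its own list.
        if cls._allFieldsCache is None:
            cls._allFieldsCache = collect_fields(cls)
        return cls._allFieldsCache


class BuggySub(BuggyBase):
    fields = ('extra',)


# Assumed shape of the fix: a module-level dict keyed by the class itself.
_FIELDS_CACHE = {}


class FixedBase:
    fields = ('id',)

    @classmethod
    def all_fields(cls):
        if cls not in _FIELDS_CACHE:
            _FIELDS_CACHE[cls] = collect_fields(cls)
        return _FIELDS_CACHE[cls]


class FixedSub(FixedBase):
    fields = ('extra',)


print(BuggyBase.all_fields())  # base class populates the inherited attribute
print(BuggySub.all_fields())   # subclass sees the base class's cached list
print(FixedBase.all_fields())
print(FixedSub.all_fields())   # subclass gets its own entry in the dict
```

The order of calls matters in the buggy variant: the subclass only returns the wrong list if the base class was queried first.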
JustAnotherArchivist
47fe0a4e70
Handle URLs with queries and fragments
1 year ago
JustAnotherArchivist
9de50bebdb
Fix metadata parsing on field values containing a colon
1 year ago
JustAnotherArchivist
7eb175fb63
Document how inheritance on Metadata classes works
1 year ago
JustAnotherArchivist
a361fe54e5
Add a metadata version field
1 year ago
JustAnotherArchivist
fb8af13c15
Return all metadata validation errors at the same time
1 year ago
JustAnotherArchivist
811e119835
Add retrieval start/end time metadata fields
1 year ago
JustAnotherArchivist
fa4b60225c
Index → Metadata
'Index' was a misnomer from the start since it contains critical information for the operation that can't be reconstructed (e.g. existing refs).
1 year ago
JustAnotherArchivist
4259d34ec8
Set default ID
1 year ago
JustAnotherArchivist
d5891c795c
More metadata
1 year ago
JustAnotherArchivist
25792d9006
Fix missing inheritance from abc.ABCMeta
1 year ago
JustAnotherArchivist
a910d4851c
Add support for inheritance of index fields; change type of field list to a tuple to lessen the risk of modification
1 year ago
JustAnotherArchivist
8e83c9b7b4
Support incremental Git bundles
Also fix a small discrepancy between the commit list and bundle due to --reflog vs --all
1 year ago
JustAnotherArchivist
ed69ba16c9
logger → _logger
1 year ago
JustAnotherArchivist
2257305872
Disallow underscores in module names
Using the preferred file naming scheme of {moduleName}_{someInputURLDerivative}_{date}*, this allows mapping files to modules without ambiguity.
1 year ago
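As a rough illustration of why 2257305872 removes the ambiguity: if module names never contain an underscore, the module name is everything before the first underscore of a file name that follows the scheme. The function names and the example file name below are hypothetical, not taken from the actual code.

```python
def validate_module_name(name: str) -> None:
    # Reject names that would make the file name prefix ambiguous.
    if not name or '_' in name:
        raise ValueError(f'invalid module name: {name!r}')


def module_name_from_filename(filename: str) -> str:
    # With underscore-free module names, the first underscore always
    # separates the module name from the rest of the file name.
    name, sep, _rest = filename.partition('_')
    if not sep or not name:
        raise ValueError(f'file name does not follow the naming scheme: {filename!r}')
    return name


validate_module_name('git')
print(module_name_from_filename('git_example.org_20230101.bundle'))
```

If underscores were allowed, a hypothetical file `my_module_example.org_...` could belong to either `my` or `my_module`.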
JustAnotherArchivist
4dcac08585
Fix import order
1 year ago
JustAnotherArchivist
0f1f5abc64
Add indices for files
1 year ago
JustAnotherArchivist
e3da8c7736
Use generic alias types
This requires at least Python 3.9.
1 year ago
JustAnotherArchivist
2a2c9373d0
Documentation of the core
3 years ago
JustAnotherArchivist
1b73693b37
Keep track of and handle errors in modules via metaclass
3 years ago
JustAnotherArchivist
922900ac4e
Add support for selecting a module explicitly using `name+` URL prefix
E.g. `git+https://example.org/`
3 years ago
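One way the `name+` prefix from 922900ac4e could be separated from the rest of the URL is sketched below. This is an assumption about the parsing, not the actual implementation, and `split_module_prefix` is a hypothetical name.

```python
def split_module_prefix(url: str):
    # Only treat a '+' as a module selector if it appears in the scheme part,
    # i.e. before '://', so that a '+' in the path or query is left alone.
    scheme, sep, _rest = url.partition('://')
    if sep and '+' in scheme:
        name = scheme.partition('+')[0]
        return name, url[len(name) + 1:]
    return None, url


print(split_module_prefix('git+https://example.org/'))
print(split_module_prefix('https://example.org/a+b'))
```

The first call yields the module name `git` and the plain URL; the second has no prefix, so the module name is `None` and the URL is returned unchanged.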
JustAnotherArchivist
22c707c04f
Add Module.name attribute
3 years ago
JustAnotherArchivist
90e0af88b9
Fix return type of get_module_{class,instance}
No need to quote the class name since the methods are not inside the class (anymore)
3 years ago
JustAnotherArchivist
5f9547d600
Get rid of inheritance-level-based module selection and instead raise an exception if there are no or multiple matching modules
3 years ago
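The selection rule from 5f9547d600 (exactly one matching module, otherwise an exception) might look roughly like the following. The `matches` method, the example module classes, and the exception type are assumptions made for illustration only.

```python
class GitModule:
    name = 'git'

    @staticmethod
    def matches(url: str) -> bool:
        return url.endswith('.git')


class ExampleModule:
    name = 'example'

    @staticmethod
    def matches(url: str) -> bool:
        return 'example.org' in url


def select_module(url: str, modules):
    # Collect every module that claims the URL and insist on exactly one.
    matching = [m for m in modules if m.matches(url)]
    if not matching:
        raise LookupError(f'no module matches {url!r}')
    if len(matching) > 1:
        names = ', '.join(m.name for m in matching)
        raise LookupError(f'multiple modules match {url!r}: {names}')
    return matching[0]


modules = [GitModule, ExampleModule]
print(select_module('https://example.com/repo.git', modules).name)
```

A URL such as `https://example.org/repo.git` matches both hypothetical modules and therefore raises; an explicit `name+` prefix is the way to resolve such a case.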
JustAnotherArchivist
7e8958b063
Allow overriding the archive ID
3 years ago
JustAnotherArchivist
90f80e41a9
Add __repr__ methods
3 years ago
JustAnotherArchivist
9f6e5a9f48
Move InputURL handling to base Module.__init__ and extract URL string for convenience
3 years ago
JustAnotherArchivist
ca68893a59
Run submodules directly within the modules and return results from there instead of processing that externally
3 years ago
JustAnotherArchivist
74a6fc7641
Use dataclass instead of namedtuple for module results
3 years ago
JustAnotherArchivist
07dc1927cf
Initial commit
A significant part of this code (e.g. the module loading, HTTP retrieval, CLI) was mostly or entirely copied from snscrape.
4 years ago