JustAnotherArchivist
7861036624
Add submodule check
#13
1 year ago
JustAnotherArchivist
9a61800758
Refactor Git bundling to allow for verification of the bundle contents
This verifies that all objects from the current clone are in either the dependency bundles or the current bundle. This guarantees that the repo as it was cloned at the time of retrieval can be reconstructed exactly from the bundles.
As a side-effect, if a non-standard Git server were to include objects in a clone pack that are not discoverable from refs, this will fail any attempt to archive such a clone. This could in the future be resolved by adding custom refs for those extra objects.
This also fixes a bug where prior bundles could be included as a dependency even though they contain no relevant data due to their refs (as refs are always listed in the bundle metadata). Instead, dependency detection now operates directly on commit and tag objects, which can only be present in one bundle.
1 year ago
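The verification idea in 9a61800758 boils down to a set check: every object in the clone must appear in the dependency bundles or the current bundle. A minimal sketch of that check, assuming the object IDs have already been collected into sets; `missing_objects` and its parameters are hypothetical names, not the project's API:

```python
from typing import Iterable


def missing_objects(clone_objects: Iterable[str], bundles: Iterable[Iterable[str]]) -> set[str]:
    """Return the object IDs in the clone that no bundle contains.

    Hypothetical helper: `clone_objects` are the IDs found in the fresh clone,
    `bundles` are the ID collections of the dependency bundles plus the new bundle.
    """
    covered: set[str] = set()
    for bundle_objects in bundles:
        covered |= set(bundle_objects)
    return set(clone_objects) - covered


# An empty result means the clone can be reconstructed from the bundles.
ok = missing_objects({"a1", "b2", "c3"}, [{"a1"}, {"b2", "c3"}])
bad = missing_objects({"a1", "d4"}, [{"a1"}])
```

A non-empty result corresponds to the failure case the commit message describes, for example objects in the clone pack that are not reachable from any ref.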
JustAnotherArchivist
f5fe0496f5
Add support for supplying a file-like object as stdin
1 year ago
JustAnotherArchivist
adafd6bd01
Fix race condition in subprocess runner
stdin, stdout, and stderr being closed does not necessarily imply that the process has exited, although it usually does. Still need to explicitly wait for it to terminate after the I/O loop. This matches what the stdlib `subprocess.Popen._communicate` does as well.
1 year ago
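The fix in adafd6bd01 can be illustrated with a small runner that reads the pipes until EOF and then still waits for the process. This is a sketch under assumptions, not the project's subprocess wrapper: `run_and_capture` is a hypothetical name, stdin is not used, and the `selectors` loop over pipes assumes a POSIX system.

```python
import os
import selectors
import subprocess
import sys


def run_and_capture(args: list[str]) -> tuple[int, bytes, bytes]:
    """Hypothetical helper: run a command and return (returncode, stdout, stderr)."""
    proc = subprocess.Popen(
        args,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    chunks: dict[str, list[bytes]] = {"stdout": [], "stderr": []}
    sel = selectors.DefaultSelector()
    sel.register(proc.stdout, selectors.EVENT_READ, "stdout")
    sel.register(proc.stderr, selectors.EVENT_READ, "stderr")
    # I/O loop: runs until both pipes have reported EOF.
    while sel.get_map():
        for key, _events in sel.select():
            data = os.read(key.fd, 65536)
            if data:
                chunks[key.data].append(data)
            else:
                sel.unregister(key.fileobj)
                key.fileobj.close()
    sel.close()
    # Closed pipes do not imply that the process has exited, so wait explicitly.
    returncode = proc.wait()
    return returncode, b"".join(chunks["stdout"]), b"".join(chunks["stderr"])


if __name__ == "__main__":
    child = "import sys; print('hi'); sys.stderr.write('oops'); sys.exit(3)"
    print(run_and_capture([sys.executable, "-c", child]))
```

Without the final `proc.wait()`, `proc.returncode` could still be `None` when the caller inspects it, since a child may close its output streams before it terminates.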
JustAnotherArchivist
9a0c8398de
Document minimum Git version
1 year ago
JustAnotherArchivist
3ca99d8839
Require `Storage.search_metadata` to return files in lexicographical order to minimise dependencies between bundles
1 year ago
JustAnotherArchivist
cc7bdbb3f4
Fix tag objects not getting deduplicated
1 year ago
JustAnotherArchivist
f1edf4b752
Fix TypeError due to lack of `glob.glob`'s `root_dir` option on Python 3.9
1 year ago
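For context on f1edf4b752: `glob.glob` only accepts `root_dir` from Python 3.10 onwards, so passing it on 3.9 raises a `TypeError`. One possible 3.9-compatible equivalent is sketched below; `glob_in_dir` is a hypothetical helper, and the actual fix in the commit may look different.

```python
import glob
import os
import tempfile


def glob_in_dir(pattern: str, root_dir: str) -> list[str]:
    """Hypothetical helper: match `pattern` inside `root_dir`, returning relative paths."""
    # Escape the directory part so special characters in it are not treated as wildcards.
    full_pattern = os.path.join(glob.escape(root_dir), pattern)
    return sorted(os.path.relpath(path, root_dir) for path in glob.glob(full_pattern))


# Usage with a throwaway directory and made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt", "c.log"):
        open(os.path.join(tmp, name), "w").close()
    matches = glob_in_dir("*.txt", tmp)
```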
JustAnotherArchivist
4d6a423fb5
Replace hacky module importing (taken from snscrape commit aa7d7d3d)
1 year ago
JustAnotherArchivist
7eb175fb63
Document how inheritance on Metadata classes works
1 year ago
JustAnotherArchivist
a361fe54e5
Add a metadata version field
1 year ago
JustAnotherArchivist
fb8af13c15
Return all metadata validation errors at the same time
1 year ago
JustAnotherArchivist
811e119835
Add retrieval start/end time metadata fields
1 year ago
JustAnotherArchivist
b0505f94fe
Fix typo in package name
1 year ago
JustAnotherArchivist
eab6db9f27
Better storage metadata search now that the module name is recorded there anyway
1 year ago
JustAnotherArchivist
fa4b60225c
Index → Metadata
'Index' was a misnomer from the start since it contains critical information for the operation that can't be reconstructed (e.g. existing refs).
1 year ago
JustAnotherArchivist
4259d34ec8
Set default ID
1 year ago
JustAnotherArchivist
d5891c795c
More metadata
1 year ago
JustAnotherArchivist
25792d9006
Fix missing inheritance from abc.ABCMeta
1 year ago
JustAnotherArchivist
a910d4851c
Add support for inheritance of index fields; change type of field list to a tuple to lessen the risk of modification
1 year ago
JustAnotherArchivist
2779148a1b
Add .gitignore
1 year ago
JustAnotherArchivist
d5a7d39f74
setup.py → pyproject.toml
1 year ago
JustAnotherArchivist
80995bccde
Add comment about FETCH_HEAD
1 year ago
JustAnotherArchivist
2a9ff2ee15
Support empty incremental bundles
1 year ago
JustAnotherArchivist
0e7b17d3fd
Capture and return stderr
1 year ago
JustAnotherArchivist
a6e256c58f
Fix invalid usage of codearchiver.subprocess
Introduced by 240dcceb
1 year ago
JustAnotherArchivist
8e83c9b7b4
Support incremental Git bundles
Also fix a small discrepancy between the commit list and bundle due to --reflog vs --all
1 year ago
JustAnotherArchivist
021b26973b
Fix handling empty input
1 year ago
JustAnotherArchivist
ed69ba16c9
logger → _logger
1 year ago
JustAnotherArchivist
6f7a95d289
Add --progress option to cloning for more details
1 year ago
JustAnotherArchivist
42e420ad0d
Disable prompts on password-protected repos
1 year ago
JustAnotherArchivist
a9e838adde
Raise exception if file already exists in DirectoryStorage target
1 year ago
JustAnotherArchivist
6af07cb51c
Raise exceptions on fatal errors
1 year ago
JustAnotherArchivist
2257305872
Disallow underscores in module names
Using the preferred file naming scheme of {moduleName}_{someInputURLDerivative}_{date}*, this allows mapping files to modules without ambiguity.
1 year ago
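The reasoning in 2257305872 is that, with underscore-free module names, the module is simply everything before the first underscore of a file name. A minimal sketch of that mapping; the function name and the example file name are hypothetical:

```python
def module_name_from_filename(filename: str) -> str:
    """Hypothetical helper: extract the module name from a
    `{moduleName}_{someInputURLDerivative}_{date}*` style file name."""
    name, separator, _rest = filename.partition("_")
    if not separator or not name:
        raise ValueError(f"file name does not follow the naming scheme: {filename!r}")
    return name


# The later parts may contain underscores themselves without causing ambiguity.
example = module_name_from_filename("git_example.org_some_repo_20230101T000000Z.bundle")
```

If module names could contain underscores, a file such as `foo_bar_...` might belong to either a `foo` or a `foo_bar` module, which is the ambiguity the restriction avoids.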
JustAnotherArchivist
4dcac08585
Fix import order
1 year ago
JustAnotherArchivist
0f1f5abc64
Add indices for files
1 year ago
JustAnotherArchivist
e3da8c7736
Use generic alias types
This requires at least Python 3.9.
1 year ago
JustAnotherArchivist
f2d2df9428
Simplify storage design; there is no need for the queue
1 year ago
JustAnotherArchivist
550afa8644
Add storage abstraction
1 year ago
JustAnotherArchivist
06daea162f
Remove GitHub module as it is not ready for use yet
1 year ago
JustAnotherArchivist
240dcceb10
Add subprocess wrapper for logging stderr
1 year ago
JustAnotherArchivist
6fb0ac4e5e
Initial GitHub module only retrieving the actual repository
4 years ago
JustAnotherArchivist
2a2c9373d0
Documentation of the core
4 years ago
JustAnotherArchivist
715420e298
Fix imports in CLI: core and modules aren't needed in the argument parser
4 years ago
JustAnotherArchivist
1b73693b37
Keep track of and handle errors in modules via metaclass
4 years ago
JustAnotherArchivist
922900ac4e
Add support for selecting a module explicitly using `name+` URL prefix
E.g. `git+https://example.org/`
4 years ago
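The `name+` prefix from 922900ac4e could be parsed roughly as follows. This is a sketch only: `split_module_prefix` is a hypothetical name, and the set of characters allowed in a module name (lowercase letters, digits, hyphens) is an assumption rather than the project's rule.

```python
import re
from typing import Optional

# Assumed module name syntax; the real project may allow a different character set.
_PREFIX_PATTERN = re.compile(r"^([a-z][a-z0-9-]*)\+(.*)$")


def split_module_prefix(url: str) -> tuple[Optional[str], str]:
    """Hypothetical helper: split `name+URL` into (name, URL).

    Returns (None, url) when no explicit module prefix is present.
    """
    match = _PREFIX_PATTERN.match(url)
    if match is None:
        return None, url
    return match.group(1), match.group(2)


explicit = split_module_prefix("git+https://example.org/")
implicit = split_module_prefix("https://example.org/a+b")
```

Because the pattern is anchored at the start and excludes `:` from the name, a `+` later in an ordinary URL is not mistaken for a module prefix.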
JustAnotherArchivist
22c707c04f
Add Module.name attribute
4 years ago
JustAnotherArchivist
90e0af88b9
Fix return type of get_module_{class,instance}
No need to quote the class name since the methods are not inside the class (anymore)
4 years ago
JustAnotherArchivist
5f9547d600
Get rid of inheritance-level-based module selection and instead raise an exception if there are no or multiple matching modules
4 years ago
JustAnotherArchivist
7e8958b063
Allow overriding the archive ID
4 years ago