JustAnotherArchivist
1355db6235
Reduce memory usage by deleting potentially big objects when they're no longer needed
1 year ago
JustAnotherArchivist
c02859987f
Skip temporary metadata dependency resolution if there are no dependencies
1 year ago
JustAnotherArchivist
66666a1538
Workaround for incremental bundles with deltified objects
1 year ago
JustAnotherArchivist
d3c701daa9
Support parallel runs against the same storage
Closes #15
1 year ago
JustAnotherArchivist
d42ee45bb2
Module puts to storage directly
1 year ago
JustAnotherArchivist
543c6b0595
Use temporary directory for Git clone directory
The previous approach was flawed and broke on URLs ending with a slash.
1 year ago
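The commit above (543c6b0595) does not say how the previous directory name was chosen. As a sketch only, assuming it was derived from the URL, the following shows one way such a name can collapse to an empty string on a trailing slash, and how a temporary directory avoids the problem. The helper names `url_derived_name` and `make_clone_dir` and the `tmp.clone.` prefix are hypothetical, not taken from the repository.

```python
import os
import posixpath
import tempfile


def url_derived_name(url):
    # Hypothetical naive approach: use the last path component of the URL.
    return posixpath.basename(url)


def make_clone_dir(parent=None):
    # Let tempfile pick a unique name that does not depend on the URL.
    # The 'tmp.clone.' prefix is an assumption for illustration.
    return tempfile.TemporaryDirectory(prefix='tmp.clone.', dir=parent)


# A trailing slash leaves nothing after the final '/'.
print(repr(url_derived_name('https://example.org/user/repo/')))

clone_dir = make_clone_dir()
print(os.path.isdir(clone_dir.name))
clone_dir.cleanup()  # removes the directory and its contents
```

The temporary directory is independent of the URL's shape, so no special casing of slashes is needed.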
JustAnotherArchivist
e08919d89f
Fix crash on incremental bundling with warnings
For example, if the HEAD is excluded:
warning: ref 'refs/heads/master' is excluded by the rev-list options
warning: ref 'HEAD' is excluded by the rev-list options
fatal: Refusing to create empty bundle.
The fatal message always appears last (though that's of course undocumented).
1 year ago
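The observation in e08919d89f that the fatal message comes last can be turned into a small check. This is a minimal sketch, not the project's actual code: the function name `is_empty_bundle_error` is hypothetical, and it relies on the same undocumented ordering the commit message mentions.

```python
EMPTY_BUNDLE_MESSAGE = 'fatal: Refusing to create empty bundle.'


def is_empty_bundle_error(stderr):
    """Return True if the last non-empty stderr line is the empty-bundle refusal.

    Warning lines before it are ignored, since they may or may not be present.
    """
    lines = [line for line in stderr.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == EMPTY_BUNDLE_MESSAGE


# The stderr sample quoted in the commit message.
sample = (
    "warning: ref 'refs/heads/master' is excluded by the rev-list options\n"
    "warning: ref 'HEAD' is excluded by the rev-list options\n"
    "fatal: Refusing to create empty bundle.\n"
)
print(is_empty_bundle_error(sample))
```

Checking only the final line keeps the test independent of how many warnings precede it.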
JustAnotherArchivist
0da610744c
Keep a record of what HEAD points at
1 year ago
JustAnotherArchivist
e39548c50b
Fix file extensions
1 year ago
JustAnotherArchivist
7861036624
Add submodule check
#13
1 year ago
JustAnotherArchivist
9a61800758
Refactor Git bundling to allow for verification of the bundle contents
This verifies that all objects from the current clone are in either the dependency bundles or the current bundle. This guarantees that the repo as it was cloned at the time of retrieval can be reconstructed exactly from the bundles.
As a side-effect, if a non-standard Git server were to include objects in a clone pack that are not discoverable from refs, this will fail any attempt to archive such a clone. This could in the future be resolved by adding custom refs for those extra objects.
This also fixes a bug where prior bundles could be included as a dependency even though they contain no relevant data due to their refs (as refs are always listed in the bundle metadata). Instead, dependency detection now operates directly on commit and tag objects, which can only be present in one bundle.
1 year ago
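The verification described in 9a61800758 amounts to a set-coverage check: every object in the clone must appear in the new bundle or in one of the dependency bundles. The sketch below works on plain sets of object IDs. The function name `find_missing_objects` and the short IDs are hypothetical, and how the real code enumerates objects is not shown in the commit message.

```python
def find_missing_objects(clone_objects, dependency_bundles, new_bundle):
    """Return the objects of the clone that no bundle contains."""
    covered = set(new_bundle)
    for bundle_objects in dependency_bundles:
        covered.update(bundle_objects)
    return set(clone_objects) - covered


# Made-up object IDs for illustration only.
clone = {'c1', 'c2', 't1', 'b1'}
old_bundles = [{'c1', 'b1'}]
new_bundle = {'c2', 't1'}

missing = find_missing_objects(clone, old_bundles, new_bundle)
if missing:
    raise RuntimeError(f'objects not covered by any bundle: {sorted(missing)}')
```

An empty result means the clone can be reconstructed from the bundles. A non-empty one corresponds to the failure case the commit message describes for objects that are not discoverable from refs.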
JustAnotherArchivist
cc7bdbb3f4
Fix tag objects not getting deduplicated
1 year ago
JustAnotherArchivist
a361fe54e5
Add a metadata version field
1 year ago
JustAnotherArchivist
811e119835
Add retrieval start/end time metadata fields
1 year ago
JustAnotherArchivist
eab6db9f27
Better storage metadata search now that the module name is recorded there anyway
1 year ago
JustAnotherArchivist
fa4b60225c
Index → Metadata
'Index' was a misnomer from the start since it contains critical information for the operation that can't be reconstructed (e.g. existing refs).
1 year ago
JustAnotherArchivist
4259d34ec8
Set default ID
1 year ago
JustAnotherArchivist
d5891c795c
More metadata
1 year ago
JustAnotherArchivist
a910d4851c
Add support for inheritance of index fields; change type of field list to a tuple to lessen the risk of modification
1 year ago
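The idea in a910d4851c, inherited field lists stored as tuples, could look roughly like the following. This is a sketch under assumptions: the class names, the `fields` attribute, and the `all_fields` method are hypothetical and do not reflect the project's real API.

```python
class BaseIndex:
    # A tuple cannot be appended to in place, unlike a list.
    fields = ('id', 'date')

    @classmethod
    def all_fields(cls):
        """Collect fields from all base classes, base-most first, without duplicates."""
        collected = []
        for klass in reversed(cls.__mro__):
            for field in klass.__dict__.get('fields', ()):
                if field not in collected:
                    collected.append(field)
        return tuple(collected)


class GitIndex(BaseIndex):
    # Only the fields this subclass adds; the base fields are inherited.
    fields = ('ref', 'commit')


print(GitIndex.all_fields())
```

Reading `klass.__dict__` rather than using `getattr` ensures each class contributes only the fields it declares itself.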
JustAnotherArchivist
80995bccde
Add comment about FETCH_HEAD
1 year ago
JustAnotherArchivist
2a9ff2ee15
Support empty incremental bundles
1 year ago
JustAnotherArchivist
0e7b17d3fd
Capture and return stderr
1 year ago
JustAnotherArchivist
a6e256c58f
Fix invalid usage of codearchiver.subprocess
Introduced by 240dcceb
1 year ago
JustAnotherArchivist
8e83c9b7b4
Support incremental Git bundles
Also fix a small discrepancy between the commit list and bundle due to --reflog vs --all
1 year ago
JustAnotherArchivist
ed69ba16c9
logger → _logger
1 year ago
JustAnotherArchivist
6f7a95d289
Add --progress option to cloning for more details
1 year ago
JustAnotherArchivist
42e420ad0d
Disable prompts on password-protected repos
1 year ago
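Commit 42e420ad0d does not state which mechanism it uses to disable prompts. One common way, shown here purely as an assumption, is Git's `GIT_TERMINAL_PROMPT` environment variable, which makes Git fail rather than ask for credentials on the terminal. The helper name `git_env_without_prompts` is hypothetical.

```python
import os


def git_env_without_prompts(base_env=None):
    """Return a copy of the environment with Git's terminal prompts disabled."""
    env = dict(os.environ if base_env is None else base_env)
    # With this set to '0', Git fails instead of prompting for credentials.
    env['GIT_TERMINAL_PROMPT'] = '0'
    return env


env = git_env_without_prompts({'PATH': '/usr/bin'})
print(env['GIT_TERMINAL_PROMPT'])
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when invoking `git clone`.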
JustAnotherArchivist
6af07cb51c
Raise exceptions on fatal errors
1 year ago
JustAnotherArchivist
0f1f5abc64
Add indices for files
1 year ago
JustAnotherArchivist
240dcceb10
Add subprocess wrapper for logging stderr
1 year ago
JustAnotherArchivist
22c707c04f
Add Module.name attribute
4 years ago
JustAnotherArchivist
7e8958b063
Allow overriding the archive ID
4 years ago
JustAnotherArchivist
90f80e41a9
Add __repr__ methods
4 years ago
JustAnotherArchivist
9f6e5a9f48
Move InputURL handling to base Module.__init__ and extract URL string for convenience
4 years ago
JustAnotherArchivist
07dc1927cf
Initial commit
A significant part of this code (e.g. the module loading, HTTP retrieval, CLI) was mostly or entirely copied from snscrape.
4 years ago