#13 Git submodule support

Open
Opened 1 year ago by JustAnotherArchivist · 1 comment

Submodules are declared in the `.gitmodules` file, which contains a URL for each submodule. That URL may be relative to the superproject repository. Cf. `gitmodules(5)` for more details.
The submodule's commit ID is stored directly in the repo that includes it: the entry has a special file mode and contains something like `Subproject commit <commit-id>`.

Ideally, Git repo archives should recursively descend into every submodule that has ever been present in the repo. The commit IDs referenced in the repo should be used as `extraBranches` so that those commits are fetched even if they are no longer reachable via a ref in the subproject repo.
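A minimal sketch of that recursive descent, assuming a hypothetical callback `find_gitlinks(url, extra_commits)` (not an existing API) that yields `(submodule_url, commit_id)` pairs for every gitlink in a repository's history. It builds a map from each reachable repository URL to the commit IDs that would be requested as `extraBranches`:

```python
from collections import deque
from typing import Callable, Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical callback (not an existing API): given a repository URL and the
# extra commit IDs requested for it, yield (submodule_url, commit_id) for every
# gitlink found anywhere in that repository's history.
FindGitlinks = Callable[[str, FrozenSet[str]], Iterable[Tuple[str, str]]]


def plan_submodule_archives(root_url: str, find_gitlinks: FindGitlinks) -> Dict[str, Set[str]]:
    """Map every reachable repository URL to the commit IDs to pass as extraBranches."""
    plan: Dict[str, Set[str]] = {root_url: set()}
    queue = deque([root_url])
    queued = {root_url}
    while queue:
        url = queue.popleft()
        queued.discard(url)
        for sub_url, commit_id in find_gitlinks(url, frozenset(plan[url])):
            commits = plan.setdefault(sub_url, set())
            if commit_id in commits:
                continue  # already known; nothing new to explore
            commits.add(commit_id)
            # Newly required commits may reference further (nested) submodules,
            # so (re)inspect the submodule repository unless it is already queued.
            if sub_url not in queued:
                queued.add(sub_url)
                queue.append(sub_url)
    return plan
```

A repository is re-inspected whenever its set of required commits grows, because a commit referenced only by a later superproject may introduce a further nested submodule. Since the sets only ever grow and are finite, the loop terminates even when submodules reference each other in a cycle.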

JustAnotherArchivist added the enhancement label 1 year ago
JustAnotherArchivist added the module:git label 1 year ago
JustAnotherArchivist commented 1 year ago · Owner

This turns out to be surprisingly tricky.

Firstly, a slight correction: the tree object records a submodule by its commit ID alone (where a normal directory or file would be recorded by a tree or blob object ID). The `Subproject commit <hex-commit-id>` format is produced by `git diff`. Also, the special file mode is `160000`.
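To see this representation, `git ls-tree -r <commit>` prints gitlink entries as `160000 commit <commit-id>` followed by a tab and the path. A minimal sketch that picks them out; the helper names are hypothetical, and paths containing unusual characters (which Git C-quotes unless `-z` is given) are not handled:

```python
import subprocess
from typing import List, Tuple

GITLINK_MODE = "160000"  # tree-entry mode Git uses for submodule commits (gitlinks)


def gitlinks_from_ls_tree(output: str) -> List[Tuple[str, str]]:
    """Return (path, commit_id) for each gitlink entry in `git ls-tree` output."""
    entries = []
    for line in output.splitlines():
        if not line:
            continue
        # Each line is "<mode> SP <type> SP <object> TAB <path>".
        meta, path = line.split("\t", 1)
        mode, _obj_type, object_id = meta.split(" ")
        if mode == GITLINK_MODE:
            entries.append((path, object_id))
    return entries


def gitlinks_at(repo: str, commit_id: str) -> List[Tuple[str, str]]:
    """List the submodule commits recorded in one commit's tree (one git call per commit)."""
    out = subprocess.run(["git", "ls-tree", "-r", commit_id], cwd=repo,
                         capture_output=True, text=True, check=True).stdout
    return gitlinks_from_ls_tree(out)
```

Calling `gitlinks_at` for every commit in the history is exactly the "walk the trees of all commits" approach, which is why it gets expensive on large repositories.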

Collecting the repositories themselves is easy enough. `git log --format=format:%H --diff-filter=d --all -- .gitmodules` returns all commits in which the `.gitmodules` file was changed in some way (other than being deleted). Then `git cat-file blob ${commitId}:.gitmodules` retrieves its contents, and `git config --file - --get-regexp '\.url$'` extracts the URLs.
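The three commands above can be chained as follows. This is a sketch: the function names are hypothetical, relative URLs are returned unresolved, and the only Git behaviour relied on beyond the commands themselves is that `git config --get-regexp` prints `key value` lines and exits with status 1 when nothing matches.

```python
import subprocess
from typing import Dict, Optional, Set


def parse_url_entries(config_output: str) -> Dict[str, str]:
    """Parse `git config --get-regexp '\\.url$'` output into {submodule name: url}."""
    urls = {}
    for line in config_output.splitlines():
        if not line:
            continue
        key, _, value = line.partition(" ")
        # Keys look like submodule.<name>.url; <name> may itself contain dots.
        if key.startswith("submodule.") and key.endswith(".url"):
            urls[key[len("submodule."):-len(".url")]] = value
    return urls


def collect_submodule_urls(repo: str) -> Set[str]:
    """Return every submodule URL found in any non-deleting change to .gitmodules."""
    def git(*args: str, stdin: Optional[str] = None) -> subprocess.CompletedProcess:
        return subprocess.run(["git", *args], cwd=repo, input=stdin,
                              capture_output=True, text=True)

    log = git("log", "--format=format:%H", "--diff-filter=d", "--all", "--", ".gitmodules")
    log.check_returncode()
    found: Set[str] = set()
    for commit_id in log.stdout.split():
        blob = git("cat-file", "blob", f"{commit_id}:.gitmodules")
        blob.check_returncode()
        config = git("config", "--file", "-", "--get-regexp", r"\.url$", stdin=blob.stdout)
        # Exit status 1 just means "no matching key"; anything else is an error.
        if config.returncode not in (0, 1):
            config.check_returncode()
        found.update(parse_url_entries(config.stdout).values())
    return found
```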

Collecting the commits is a whole different beast. My first thought was to walk the trees of all commits, but that is very inefficient. It doesn't appear to be possible to filter `git log` by file mode, and the file mode is only included in diff/patch output; attempting to parse that is a horrible idea. One alternative would be an external diff tool (`GIT_EXTERNAL_DIFF=...` or `-c diff.external=...`) that emits only the relevant file mode details. However, that still doesn't fully solve the parsing problem (the commit message would still be shown), and it doesn't scale to large repositories, since it requires one process per modified file in each commit. It might be feasible to modify `git log`/`diff.c` to change its output based on the presence of an environment variable (i.e. omit the commit message and skip running the actual diff; the relevant code for the latter is the `builtin_diff` function).
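For the external-diff alternative, `git(1)` documents that the `GIT_EXTERNAL_DIFF` program is called once per changed path with seven arguments (`path old-file old-hex old-mode new-file new-hex new-mode`), or with only the path for unmerged entries. A hypothetical helper along those lines (the output format is made up, and `git log` may need `--ext-diff` before it runs external helpers) could look like this:

```python
#!/usr/bin/env python3
"""Hypothetical GIT_EXTERNAL_DIFF program that reports only gitlink (submodule) changes."""
import sys
from typing import List, Optional

GITLINK_MODE = "160000"


def gitlink_report(args: List[str]) -> Optional[str]:
    """Given the external-diff arguments, return a line for gitlink changes, else None."""
    if len(args) != 7:
        return None  # unmerged paths are passed as a single argument; ignore them
    path, _old_file, old_hex, old_mode, _new_file, new_hex, new_mode = args
    if GITLINK_MODE not in (old_mode, new_mode):
        return None
    # For an added or deleted entry, Git passes "." as the missing hex and mode.
    return f"{path} {old_mode}:{old_hex} {new_mode}:{new_hex}"


if __name__ == "__main__":
    line = gitlink_report(sys.argv[1:])
    if line is not None:
        print(line)
```

Every changed file in every commit spawns one interpreter process for this script, which illustrates the scaling problem described above.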

Also, submodule repository URLs can be relative, and they are evaluated relative to *within* the parent repo (cf. `man git-submodule`).
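`git-submodule(1)` notes that a URL starting with `./` or `../` is resolved like a relative directory against the superproject's remote, so a repository next to the superproject needs `../foo.git`, not `./foo.git`. A simplified sketch of that rule for URLs with a scheme (the function name is hypothetical; scp-like `user@host:path` remotes and local paths have extra rules in Git that it does not reproduce):

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def resolve_submodule_url(superproject_url: str, submodule_url: str) -> str:
    """Resolve a .gitmodules URL against the superproject URL (simplified sketch)."""
    # Only ./ and ../ prefixes are relative; anything else is used as-is.
    if not submodule_url.startswith(("./", "../")):
        return submodule_url
    parts = urlsplit(superproject_url)
    # Treat the superproject path as a directory, so ./x lands *inside* it.
    path = posixpath.normpath(posixpath.join(parts.path.rstrip("/"), submodule_url))
    return urlunsplit(parts._replace(path=path))
```

This works on the URL path with `posixpath` rather than `urllib.parse.urljoin`, because `urljoin` ignores relative references for schemes such as `ssh` and `git`.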

Random examples from the wild:

* Relative URLs: https://github.com/gitextensions/gitextensions
* Nested submodules (e.g. qemu also uses submodules): https://github.com/riscv-collab/riscv-gnu-toolchain
JustAnotherArchivist referenced this issue from a commit 1 year ago
JustAnotherArchivist changed the title from Support for Git submodules to Git submodule support 1 year ago