#13 Git submodule support

Open
Opened 1 year ago by JustAnotherArchivist · 1 comment

Submodules are declared in the `.gitmodules` file. Each entry contains a URL, which may be relative to the superproject repository. Cf. `gitmodules(5)` for more details.
The submodule's commit ID is stored directly in the repo where it's included; it's a special file mode and contains something like `Subproject commit <commit-id>`.

Ideally, Git repo archives should recursively descend into all submodules that have ever been present in the repo. The commit IDs referenced in the repo should be used as `extraBranches` to ensure those are fetched even if they're no longer reachable via a ref in the subproject repo.

JustAnotherArchivist added the enhancement label 1 year ago
JustAnotherArchivist added the module:git label 1 year ago
JustAnotherArchivist commented 1 year ago
Owner

This turns out to be surprisingly tricky.

Firstly, a slight correction: the tree object stores the submodule's commit ID directly in the entry (where normal dirs/files have a tree or blob object ID). The `Subproject commit <hex-commit-id>` format is produced by `git diff`. Also, the special file mode is `160000`.
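To make the correction concrete, here is a minimal sketch of pulling gitlink entries out of one raw tree object, e.g. the bytes printed by `git cat-file tree <tree-id>`. The function names and the `hash_len` parameter are made up for illustration. It assumes the usual raw tree layout of `<octal mode> <name>\0<binary object ID>`, with 20-byte IDs (SHA-1 repositories).

```python
from typing import Dict, Iterator, Tuple

GITLINK_MODE = 0o160000  # the special file mode used for submodule entries


def iter_tree_entries(raw: bytes, hash_len: int = 20) -> Iterator[Tuple[int, bytes, str]]:
    """Yield (mode, name, hex object ID) for each entry of a raw tree object."""
    pos = 0
    while pos < len(raw):
        space = raw.index(b" ", pos)        # end of the octal mode
        nul = raw.index(b"\0", space + 1)   # end of the entry name
        mode = int(raw[pos:space], 8)
        name = raw[space + 1:nul]
        object_id = raw[nul + 1:nul + 1 + hash_len].hex()
        yield mode, name, object_id
        pos = nul + 1 + hash_len


def gitlinks(raw: bytes, hash_len: int = 20) -> Dict[bytes, str]:
    """Map entry name -> submodule commit ID for all gitlink entries in one tree."""
    return {
        name: object_id
        for mode, name, object_id in iter_tree_entries(raw, hash_len)
        if mode == GITLINK_MODE
    }
```

This only looks at a single tree. Finding gitlinks anywhere in a commit would still mean recursing into the subtree entries (mode `40000`).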

Collecting the repositories themselves is easy enough. `git log --format=format:%H --diff-filter=d --all -- .gitmodules` returns all commits where the `.gitmodules` file was altered in some way (and not deleted). `git cat-file blob ${commitId}:.gitmodules` can then be used to retrieve the contents, and `git config --file - --get-regexp '\.url$'` to extract the URLs.
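The three commands above could be chained roughly as follows. This is only a sketch: the function names are hypothetical, error handling is minimal, and it adds `-z` to the `git config` call (an assumption beyond the commands above) so that submodule names containing spaces can't confuse the parser.

```python
import subprocess
from typing import List, Set, Tuple


def parse_config_z(output: bytes) -> List[Tuple[str, str]]:
    """Parse `git config -z --get-regexp` output.

    With -z, each record ends in NUL and the key is separated from the
    value by a newline.
    """
    pairs = []
    for record in output.split(b"\0"):
        if not record:
            continue
        key, _, value = record.partition(b"\n")
        pairs.append((key.decode(), value.decode()))
    return pairs


def submodule_urls(repo: str) -> Set[str]:
    """Collect every submodule URL that ever appeared in .gitmodules."""
    commits = subprocess.run(
        ["git", "-C", repo, "log", "--format=format:%H", "--diff-filter=d",
         "--all", "--", ".gitmodules"],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    urls = set()
    for commit in commits:
        blob = subprocess.run(
            ["git", "-C", repo, "cat-file", "blob", f"{commit}:.gitmodules"],
            check=True, capture_output=True,
        ).stdout
        result = subprocess.run(
            ["git", "config", "--file", "-", "-z", "--get-regexp", r"\.url$"],
            input=blob, capture_output=True,
        )
        if result.returncode not in (0, 1):  # 1 means no key matched
            raise RuntimeError(result.stderr.decode(errors="replace"))
        urls.update(value for _key, value in parse_config_z(result.stdout))
    return urls
```

It spawns two processes per commit that touched `.gitmodules`, which should be acceptable because that file rarely changes.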

Collecting the commits is a whole different beast. My first thought was to walk the trees of all commits, but that is very inefficient. It doesn't appear to be possible to filter `git log` by file mode, and the file mode is only included in diff/patch output. Attempting to parse that is a horrible idea. One alternative would be an external diff tool (`GIT_EXTERNAL_DIFF=...` or `-c diff.external=...`) which only emits the relevant file mode details, but that still doesn't fully solve the parsing problem (the commit message would still be shown), and it doesn't scale to large repositories as it requires one process per modified file for each commit. It might be feasible to modify `git log`/`diff.c` to change its output based on the presence of an environment variable (i.e. omit the commit message and skip running the actual diff; the relevant code for the latter is the `builtin_diff` function).

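Also, submodule repository URLs can be relative, and they're evaluated relative to *within* the parent repo (cf. `man git-submodule`). In other words, the parent repo's URL is treated like a directory, so `../foo.git` refers to a repository right next to the parent.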
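A rough sketch of that resolution rule for URL-style remotes is below. The function name is hypothetical, and scp-like remotes (`user@host:path`) and local paths are deliberately not handled. The key step is to treat the superproject URL as a directory by appending a trailing slash before joining.

```python
from urllib.parse import urljoin


def resolve_submodule_url(superproject_url: str, submodule_url: str) -> str:
    """Resolve a ./ or ../ submodule URL against the superproject's remote URL.

    Assumes a URL-style remote (https://, ssh://, git://); anything that
    does not start with ./ or ../ is returned unchanged.
    """
    if not submodule_url.startswith(("./", "../")):
        return submodule_url
    # Treat the superproject URL as a directory, not as a file.
    base = superproject_url.rstrip("/") + "/"
    return urljoin(base, submodule_url)
```

With a superproject at `https://example.org/org/super.git`, `../lib.git` resolves to `https://example.org/org/lib.git`. A plain `urljoin` without the trailing slash would instead give `https://example.org/lib.git`.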

Random examples from the wild:

* Relative URLs: https://github.com/gitextensions/gitextensions
* Nested submodules (e.g. qemu also uses submodules): https://github.com/riscv-collab/riscv-gnu-toolchain
JustAnotherArchivist referenced this issue from a commit 1 year ago
JustAnotherArchivist changed the title from Support for Git submodules to Git submodule support 1 year ago