#13 Git submodule support

Open
opened 1 year ago by JustAnotherArchivist · 1 comment

Submodules are stored in the `.gitmodules` file. It contains a URL, which may be relative to the superproject repository. Cf. `gitmodules(5)` for more details.
The submodule’s commit ID is stored directly in the repo where it’s included; it’s a special file mode and contains something like `Subproject commit <commit-id>`.

Ideally, Git repo archives should recursively descend to all submodules that have ever been present in the repo. The commit IDs referenced in the repo should be used as `extraBranches` to ensure those are fetched even if they’re no longer reachable via a ref in the subproject repo.

JustAnotherArchivist added the enhancement label 1 year ago
JustAnotherArchivist added the module:git label 1 year ago
JustAnotherArchivist commented 1 year ago
Owner

This turns out to be surprisingly tricky.

Firstly, a slight correction: the commit ID is stored in the tree object using just the commit ID (rather than a tree or blob object ID as for normal dirs/files). The `Subproject commit <hex-commit-id>` format is produced by `git diff`. Also, the special file mode is `160000`.

Collecting the repositories themselves is easy enough. `git log --format=format:%H --diff-filter=d --all -- .gitmodules` returns all commits where the `.gitmodules` file was altered in some way (and not deleted), and then `git cat-file blob ${commitId}:.gitmodules` can be used to retrieve the contents and `git config --file - --get-regexp '\.url$'` to extract the URLs.
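The three commands above could be chained roughly as follows. This is only a sketch: the function names are made up for illustration, and `-z` is added to the `git config` call (an addition, not part of the commands above) so that submodule names containing spaces still parse cleanly.

```python
import subprocess


def parse_submodule_urls(config_output: str) -> dict[str, str]:
    """Map submodule names to URLs from `git config -z --get-regexp` output."""
    # With -z, every record ends in NUL and the key is separated from the
    # value by a newline, e.g. "submodule.foo.url\nhttps://example.org/foo\0".
    urls = {}
    for record in config_output.split("\0"):
        key, sep, value = record.partition("\n")
        if sep and key.startswith("submodule.") and key.endswith(".url"):
            urls[key[len("submodule."):-len(".url")]] = value
    return urls


def collect_submodule_urls(repo: str) -> set[str]:
    """Collect every URL that any historical .gitmodules version mentions."""
    # Commits in which .gitmodules was changed but not deleted.
    log = subprocess.run(
        ["git", "-C", repo, "log", "--format=format:%H", "--diff-filter=d",
         "--all", "--", ".gitmodules"],
        check=True, capture_output=True, text=True,
    )
    urls = set()
    for commit_id in log.stdout.split():
        blob = subprocess.run(
            ["git", "-C", repo, "cat-file", "blob", f"{commit_id}:.gitmodules"],
            check=True, capture_output=True,
        )
        # Feed that version of the file to git config via stdin.
        config = subprocess.run(
            ["git", "config", "--file", "-", "-z", "--get-regexp", r"\.url$"],
            input=blob.stdout, capture_output=True,
        )
        # A non-zero exit status (no matching key, unparsable file) is
        # silently skipped in this sketch.
        if config.returncode == 0:
            text = config.stdout.decode("utf-8", "replace")
            urls.update(parse_submodule_urls(text).values())
    return urls
```

Calling `collect_submodule_urls("/path/to/clone")` would return the raw URLs as written in `.gitmodules`, so relative ones still need resolving afterwards.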

Collecting the commits is a whole different beast. My first thought was to walk the trees of all commits, but that is very inefficient. It doesn’t appear to be possible to filter the `git log` by file mode, and the file mode is only included in diff/patch output. Attempting to parse that is a horrible idea. One alternative would be an external diff tool (`GIT_EXTERNAL_DIFF=...` or `-c diff.external=...`) which only emits the relevant file mode details, but that still doesn’t fully solve the parsing problem (the commit message would still be shown), and it doesn’t scale to large repositories as it requires one process per modified file for each commit. It might be feasible to modify `git log`/`diff.c` to change its output based on the presence of an environment variable (i.e. omit the commit message and skip running the actual diff; the relevant code for the latter is the `builtin_diff` function).
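For reference, the naive tree walk (the inefficient first thought, not a proposed solution) might look something like the sketch below. It assumes `git rev-list --all` and `git ls-tree -r -z` as the building blocks, which are not named above, and the function names are hypothetical. It spawns one `git ls-tree` per commit and rereads unchanged trees every time, which is exactly why it scales badly.

```python
import subprocess

GITLINK_MODE = "160000"  # file mode of a submodule entry in a tree object


def parse_gitlinks(ls_tree_output: str) -> dict[str, str]:
    """Extract {path: commit ID} for submodule entries from `git ls-tree -r -z`."""
    # Each NUL-terminated entry looks like "<mode> <type> <object>\t<path>".
    links = {}
    for entry in ls_tree_output.split("\0"):
        if not entry:
            continue
        meta, _, path = entry.partition("\t")
        mode, obj_type, obj_id = meta.split(" ")
        if mode == GITLINK_MODE and obj_type == "commit":
            links[path] = obj_id
    return links


def walk_all_commits(repo: str) -> set[tuple[str, str]]:
    """Return every (path, submodule commit ID) pair seen in any reachable commit."""
    revs = subprocess.run(
        ["git", "-C", repo, "rev-list", "--all"],
        check=True, capture_output=True, text=True,
    )
    found = set()
    for commit_id in revs.stdout.split():
        tree = subprocess.run(
            ["git", "-C", repo, "ls-tree", "-r", "-z", commit_id],
            check=True, capture_output=True,
        )
        # Paths are not guaranteed to be UTF-8, hence surrogateescape.
        text = tree.stdout.decode("utf-8", "surrogateescape")
        found.update(parse_gitlinks(text).items())
    return found
```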

Also, submodule repository URLs can be relative, and they’re evaluated relative to *within* the parent repo (cf. `man git-submodule`).
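To illustrate what "relative to within" means: a URL starting with `./` or `../` is resolved against the superproject's own remote URL, so `../` strips the last component of that URL rather than appending to a parent directory. The helper below is a simplified approximation for `scheme://` style URLs only. The function name and the example.org URLs are hypothetical, and scp-like `host:path` remotes and other edge cases that Git itself handles are ignored.

```python
def resolve_submodule_url(superproject_url: str, submodule_url: str) -> str:
    """Approximate how a relative submodule URL maps onto the superproject URL."""
    if not submodule_url.startswith(("./", "../")):
        return submodule_url  # absolute URLs are used as-is
    base = superproject_url.rstrip("/")
    rest = submodule_url
    while True:
        if rest.startswith("./"):
            rest = rest[2:]
        elif rest.startswith("../"):
            rest = rest[3:]
            # Drop the last path component of the superproject URL.
            # No guard against climbing past the host in this sketch.
            base = base.rsplit("/", 1)[0]
        else:
            break
    return f"{base}/{rest}"


print(resolve_submodule_url("https://example.org/owner/super.git", "../sub.git"))
# https://example.org/owner/sub.git
```

Note that `./sub` ends up *below* the superproject URL (`https://example.org/owner/super.git/sub`), which is the "within" part.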

Random examples from the wild:

* Relative URLs: https://github.com/gitextensions/gitextensions
* Nested submodules (e.g. qemu also uses submodules): https://github.com/riscv-collab/riscv-gnu-toolchain
JustAnotherArchivist referenced this issue from a commit 1 year ago
JustAnotherArchivist changed the title from Support for Git submodules to Git submodule support 1 year ago