#13 Git submodule support

Open
JustAnotherArchivist opened 1 year ago · 1 comment

Submodules are declared in the `.gitmodules` file. It contains a URL for each submodule, which may be relative to the superproject repository. See `gitmodules(5)` for more details.
The submodule's commit ID is stored directly in the repo where it's included; it's a special file mode and contains something like `Subproject commit <commit-id>`.

Ideally, Git repo archives should recursively descend into all submodules that have ever been present in the repo. The commit IDs referenced in the repo should be used as `extraBranches` to ensure those are fetched even if they're no longer reachable via a ref in the subproject repo.

JustAnotherArchivist added the enhancement label 1 year ago
JustAnotherArchivist added the module:git label 1 year ago
JustAnotherArchivist (Owner) commented 1 year ago

This turns out to be surprisingly tricky.

Firstly, a slight correction: the tree object entry for a submodule holds the commit ID directly (rather than a tree or blob object ID as for normal dirs/files). The `Subproject commit <hex-commit-id>` format is merely what `git diff` produces. Also, the special file mode is `160000`.
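To illustrate the correction, here is a minimal sketch that pulls the submodule entries out of a raw tree object body (as printed by `git cat-file tree <tree-id>`). It assumes a SHA-1 repository (20-byte object IDs); the function names `iter_tree_entries` and `gitlinks` are made up for this example.

```python
from typing import Dict, Iterator, Tuple

GITLINK_MODE = b"160000"  # file mode Git uses for submodule entries


def iter_tree_entries(raw: bytes, hash_len: int = 20) -> Iterator[Tuple[bytes, bytes, str]]:
    """Yield (mode, name, hex object ID) for each entry of a raw tree body.

    Each entry is laid out as: <ASCII mode> SP <name> NUL <binary object ID>.
    hash_len=20 assumes SHA-1; a SHA-256 repository would need 32.
    """
    pos = 0
    while pos < len(raw):
        space = raw.index(b" ", pos)
        nul = raw.index(b"\0", space + 1)
        mode = raw[pos:space]
        name = raw[space + 1:nul]
        oid = raw[nul + 1:nul + 1 + hash_len]
        yield mode, name, oid.hex()
        pos = nul + 1 + hash_len


def gitlinks(raw: bytes) -> Dict[bytes, str]:
    """Map entry name -> submodule commit ID for all mode-160000 entries."""
    return {name: oid for mode, name, oid in iter_tree_entries(raw) if mode == GITLINK_MODE}
```

This only looks at a single tree; nested directories are separate tree objects and would have to be visited individually.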

Collecting the repositories themselves is easy enough. `git log --format=format:%H --diff-filter=d --all -- .gitmodules` lists all commits in which the `.gitmodules` file was altered in some way (and not deleted). For each of those, `git cat-file blob ${commitId}:.gitmodules` retrieves the contents, and piping them into `git config --file - --get-regexp '\.url$'` extracts the URLs.
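The three commands can be chained roughly as follows. This is only a sketch: the helper names are hypothetical, and it adds `-z` to the `git config` call (not part of the command above) so that submodule names or URLs containing spaces can be split unambiguously.

```python
import subprocess
from typing import Optional, Set, Tuple


def git(repo: str, *args: str, stdin: Optional[bytes] = None, ok: Tuple[int, ...] = (0,)) -> bytes:
    """Run git inside `repo` and return stdout; raise on unexpected exit codes."""
    proc = subprocess.run(["git", "-C", repo, *args], input=stdin, stdout=subprocess.PIPE)
    if proc.returncode not in ok:
        raise subprocess.CalledProcessError(proc.returncode, proc.args)
    return proc.stdout


def parse_urls(config_output: bytes) -> Set[str]:
    """Parse `git config -z --get-regexp` output: records of '<key>\\n<value>\\0'."""
    urls = set()
    for record in config_output.split(b"\0"):
        key, sep, value = record.partition(b"\n")
        if sep and key.endswith(b".url"):
            urls.add(value.decode("utf-8", "replace"))
    return urls


def submodule_urls(repo: str) -> Set[str]:
    """Collect every URL that ever appeared in .gitmodules on any ref."""
    commits = git(
        repo, "log", "--format=format:%H", "--diff-filter=d", "--all", "--", ".gitmodules"
    ).split()
    urls: Set[str] = set()
    for commit in commits:
        blob = git(repo, "cat-file", "blob", commit.decode() + ":.gitmodules")
        # Exit code 1 just means no key matched, e.g. an empty .gitmodules.
        out = git(
            repo, "config", "--file", "-", "-z", "--get-regexp", r"\.url$",
            stdin=blob, ok=(0, 1),
        )
        urls |= parse_urls(out)
    return urls
```

The returned URLs are exactly as written in `.gitmodules`, so relative ones still need to be resolved against the superproject URL.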

Collecting the commits is a whole different beast. My first thought was to walk the trees of all commits, but that is very inefficient. It doesn't appear to be possible to filter `git log` by file mode, and the file mode is only included in diff/patch output. Attempting to parse that is a horrible idea. One alternative would be an external diff tool (`GIT_EXTERNAL_DIFF=...` or `-c diff.external=...`) which only emits the relevant file mode details, but that still doesn't fully solve the parsing problem (the commit message would still be shown), and it doesn't scale to large repositories as it requires one process per modified file for each commit. It might be feasible to modify `git log`/`diff.c` to change its output based on the presence of an environment variable (i.e. omit the commit message and skip running the actual diff; the relevant code for the latter is the `builtin_diff` function).
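For reference, the naive tree walk could look something like the sketch below. It uses `git rev-list --all` and `git ls-tree -r -z`, which are my choice of commands for illustration, and the function names are hypothetical. It spawns one `ls-tree` process per commit, which is precisely why it is impractical for large repositories.

```python
import subprocess
from typing import Dict, Set, Tuple


def parse_ls_tree(output: bytes) -> Dict[bytes, str]:
    """Map path -> commit ID for submodule entries in `git ls-tree -r -z` output.

    Each record has the form: <mode> SP <type> SP <object> TAB <path> NUL.
    """
    links = {}
    for record in output.split(b"\0"):
        meta, sep, path = record.partition(b"\t")
        if sep and meta.startswith(b"160000 commit "):
            links[path] = meta.split(b" ")[2].decode()
    return links


def submodule_commits(repo: str) -> Set[Tuple[bytes, str]]:
    """Walk every commit reachable from any ref and collect (path, commit ID) pairs."""

    def git(*args: str) -> bytes:
        return subprocess.run(
            ["git", "-C", repo, *args], stdout=subprocess.PIPE, check=True
        ).stdout

    seen: Set[Tuple[bytes, str]] = set()
    for commit in git("rev-list", "--all").split():
        # One process per commit: correct, but very slow on large histories.
        seen.update(parse_ls_tree(git("ls-tree", "-r", "-z", commit.decode())).items())
    return seen
```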

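Also, submodule repository URLs can be relative, and they're evaluated relative to *within* the parent repo, i.e. the parent URL is treated like a directory, so a leading `../` is needed to reach a sibling repository (cf. `man git-submodule`).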
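A simplified sketch of that resolution, assuming `parent_url` is the URL of the superproject being archived and uses `/` separators throughout. Git itself handles more cases (e.g. scp-style `user@host:path` URLs and relative parent URLs), which this deliberately ignores; `resolve_submodule_url` is a name made up for this example.

```python
def resolve_submodule_url(parent_url: str, sub_url: str) -> str:
    """Resolve a .gitmodules URL against the superproject URL.

    Only URLs starting with './' or '../' are treated as relative;
    anything else is returned unchanged.
    """
    if not sub_url.startswith(("./", "../")):
        return sub_url
    base = parent_url.rstrip("/")
    rest = sub_url
    while True:
        if rest.startswith("../"):
            # Each '../' drops the last path component of the parent URL.
            base = base.rsplit("/", 1)[0]
            rest = rest[3:]
        elif rest.startswith("./"):
            rest = rest[2:]
        else:
            break
    return f"{base}/{rest}"
```

With a hypothetical parent of `https://example.org/group/super.git`, `../sub.git` resolves to `https://example.org/group/sub.git`, whereas `./sub.git` ends up *inside* the parent at `https://example.org/group/super.git/sub.git`.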

Random examples from the wild:

* Relative URLs: https://github.com/gitextensions/gitextensions
* Nested submodules (e.g. qemu also uses submodules): https://github.com/riscv-collab/riscv-gnu-toolchain
JustAnotherArchivist referenced this issue from a commit 1 year ago
JustAnotherArchivist changed the title from "Support for Git submodules" to "Git submodule support" 1 year ago