This repository contains scripts for collecting metadata on Docker Hub images. It further contains the metadata itself for the ArchiveTeam-related Docker Hub profiles on the `data` branch.
Docker Hub only exposes the latest build for each tag. Even though the data for the previous builds still exists and can be `docker pull`ed using the digest (`docker pull namespace/name@sha256:DIGEST`), it is impossible to discover that digest. The only option is therefore to keep a record of those digests while they are displayed on the web interface.
Rather than only targeting the digests, this project attempts to collect all relevant, publicly available metadata for Docker Hub repositories.
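As a rough illustration of what "keeping a record of digests" can mean, here is a minimal sketch. It is not taken from this repository's scripts: the function names `record_digest` and `pull_ref`, the log format, and the assumption that the digest has already been obtained elsewhere are all hypothetical.

```shell
# Hypothetical helpers, NOT part of this repository's scripts.

# Appends "<UTC timestamp> <tag> <digest>" to a log file, but only if the
# digest differs from the one most recently recorded for that tag.
record_digest() {
    local logfile="$1" tag="$2" digest="$3" last=''
    # Reject anything that is not a sha256 digest (64 lowercase hex characters).
    if ! printf '%s\n' "$digest" | grep -Eq '^sha256:[0-9a-f]{64}$'; then
        echo "invalid digest: $digest" >&2
        return 1
    fi
    if [ -f "$logfile" ]; then
        # Field 2 is the tag, field 3 the digest; keep the last match.
        last=$(awk -v tag="$tag" '$2 == tag { d = $3 } END { print d }' "$logfile")
    fi
    if [ "$last" != "$digest" ]; then
        printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$tag" "$digest" >>"$logfile"
    fi
}

# Prints the image reference to pass to `docker pull` for a recorded digest.
pull_ref() {
    printf '%s@%s\n' "$1" "$2"
}
```

With such a log, an older build could later be fetched with something like `docker pull "$(pull_ref namespace/name "$digest")"`, where `$digest` is a value read back from the log.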
The `master` branch contains the collection code. The `data` branch contains the metadata with its history.
On the `data` branch, you will find the following structure:

- `dockerhub-metadata.profiles` contains a list of profiles whose repositories are being monitored.
- A directory for each profile, containing a `repositories` directory, which in turn has the repository and tag metadata for each repository.
- `dockerhub-metadata.retrieve.log` is the log of the last run.

The script requires the above-mentioned structure in the directory where it’s executed to safeguard against accidental execution in the wrong path. The main reason for this is that the script deletes everything in the current directory to replace it with the new version. Yes, it’s hacky. No, I don’t care to change it; doing it properly would require diffing the list of targeted Docker Hub profiles and deleting directories and files as appropriate. Ultimately, some deletions have to take place, and they’ll never be completely foolproof. Just run it in the right path, and everything is fine.
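The exact checks the script performs are not spelled out here. As a sketch of what such a guard could look like, assuming only the structure described above, the hypothetical function `check_structure` below refuses a directory unless both marker files exist and every listed profile has a directory.

```shell
# Hypothetical guard, NOT the repository's actual implementation.
# Succeeds only if the given directory (default: current directory) looks
# like a data checkout: both marker files exist and every profile listed in
# dockerhub-metadata.profiles has a directory of the same name.
check_structure() {
    local dir="${1:-.}" profile
    [ -f "$dir/dockerhub-metadata.profiles" ] || return 1
    [ -f "$dir/dockerhub-metadata.retrieve.log" ] || return 1
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r profile || [ -n "$profile" ]; do
        [ -n "$profile" ] || continue   # skip blank lines
        [ -d "$dir/$profile" ] || return 1
    done <"$dir/dockerhub-metadata.profiles"
    return 0
}
```

A destructive step would then only run behind it, for example `check_structure . || { echo 'wrong directory?' >&2; exit 1; }`.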
However, this safeguard implies that setting up the collection initially and changing the list of profiles is slightly annoying.
1. […] `git push` in the script).
2. Create `dockerhub-metadata.profiles`, listing one profile to be covered per line. Commit.
3. Create an empty `dockerhub-metadata.retrieve.log` (e.g. with `touch`) and an empty directory for each profile (e.g. with Bash: `readarray -t profiles <dockerhub-metadata.profiles; mkdir "${profiles[@]}"`).
4. Run `/path/to/this/directory/run` (without changing the directory).

To add a profile to the list to be covered, add it to the `dockerhub-metadata.profiles` file and create the corresponding directory.
To remove a profile from the list, remove it from that file and delete its directory.
It is recommended to commit any changes to the profiles list manually. The next run will include it anyway, but it makes the intent clearer. It is also recommended to leave the data directory changes uncommitted and let the script handle that part.
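The add and remove steps can be wrapped in two small helpers. This is only a sketch: `add_profile` and `remove_profile` are hypothetical names, they are not provided by this repository, and they assume they are run from the root of the data checkout.

```shell
# Hypothetical helpers, NOT part of this repository's scripts.
# Both must be run from the directory containing dockerhub-metadata.profiles.

# Rejects empty names, names containing a slash, and names starting with a
# dot, so that the rm -rf below can never leave the current directory.
_valid_profile() {
    case "$1" in
        ''|*/*|.*) return 1 ;;
    esac
}

add_profile() {
    local profile="$1"
    _valid_profile "$profile" || return 1
    [ -f dockerhub-metadata.profiles ] || return 1
    # Append only if the profile is not already listed (exact line match).
    grep -qxF -- "$profile" dockerhub-metadata.profiles ||
        printf '%s\n' "$profile" >>dockerhub-metadata.profiles
    mkdir -p -- "$profile"
}

remove_profile() {
    local profile="$1" tmp
    _valid_profile "$profile" || return 1
    [ -f dockerhub-metadata.profiles ] || return 1
    tmp=$(mktemp) || return 1
    # Keep every line except the exact match. grep exits 1 when no lines
    # remain, which is not an error here.
    grep -vxF -- "$profile" dockerhub-metadata.profiles >"$tmp" || true
    cat "$tmp" >dockerhub-metadata.profiles
    rm -f -- "$tmp"
    rm -rf -- "./$profile"
}
```

After either call, committing only `dockerhub-metadata.profiles` by hand matches the recommendation above, while the directory changes are left for the next run.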
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.