Metadata for the ArchiveTeam Docker Hub repositories
JustAnotherArchivist 4d6c737b5d Never add temporary directories to repo 3 years ago

This repository contains scripts for collecting metadata on Docker Hub images. Its data branch additionally holds the collected metadata for the ArchiveTeam-related Docker Hub profiles.

Background

Docker Hub only exposes the latest build for each tag. The data for previous builds still exists and can be pulled by digest (docker pull namespace/name@sha256:DIGEST), but there is no way to discover that digest after the fact. The only option is therefore to keep a record of those digests while they are displayed on the web interface.
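
To illustrate the record-keeping idea, here is a minimal Python sketch. It is not the code in this repository: the names pull_reference and record_digest and the in-memory dictionary are hypothetical, chosen only for illustration. It remembers every digest observed for a tag and builds a reference of the form namespace/name@sha256:DIGEST.

```python
import re

# A sha256 digest is 64 lowercase hexadecimal characters.
_DIGEST_RE = re.compile(r"^[0-9a-f]{64}$")


def pull_reference(namespace, name, digest):
    """Build the reference used to pull an image by digest (hypothetical helper)."""
    # Accept the digest with or without its "sha256:" prefix.
    prefix = "sha256:"
    hex_part = digest[len(prefix):] if digest.startswith(prefix) else digest
    if not _DIGEST_RE.match(hex_part):
        raise ValueError("not a sha256 digest: {!r}".format(digest))
    return "{}/{}@sha256:{}".format(namespace, name, hex_part)


def record_digest(history, tag, digest):
    """Remember a digest seen for a tag; return True if it was not known yet."""
    seen = history.setdefault(tag, [])
    if digest in seen:
        return False
    seen.append(digest)
    return True
```

Because older digests stay in the list after a tag moves on, each of them can still be turned into a pull reference later.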

Rather than only targeting the digests, this project attempts to collect all relevant, publicly available metadata for Docker Hub repositories.

Structure

The master branch contains the collection code. The data branch contains the metadata with its history.

On the data branch, you will find the following structure:

  • dockerhub-metadata.profiles contains a list of profiles whose repositories are being monitored.
  • For each profile, there is a directory with the same name. It contains profile-wide metadata and a repositories directory, which in turn has the repository and tag metadata for each repository.
  • dockerhub-metadata.retrieve.log is the log of the last run.
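
The layout above can be explored with a short Python sketch. The function names read_profiles and list_repositories are hypothetical, and the format of the entries inside each repositories directory is not described here, so the sketch only lists their names.

```python
from pathlib import Path


def read_profiles(root):
    """Return the profile names listed in dockerhub-metadata.profiles."""
    text = (Path(root) / "dockerhub-metadata.profiles").read_text()
    # One profile per line; blank lines are ignored.
    return [line.strip() for line in text.splitlines() if line.strip()]


def list_repositories(root):
    """Map each profile to the sorted entry names of its repositories directory."""
    result = {}
    for profile in read_profiles(root):
        repos_dir = Path(root) / profile / "repositories"
        # A profile that has not been collected yet may lack this directory.
        if repos_dir.is_dir():
            result[profile] = sorted(p.name for p in repos_dir.iterdir())
        else:
            result[profile] = []
    return result
```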

Requirements

  • Bash 4 or higher
  • Python 3.6 or higher
  • Python Requests

Usage

The script requires the structure described above in the directory where it’s executed, as a safeguard against accidental execution in the wrong path. This matters because the script deletes everything in the current directory and replaces it with the new version. Yes, it’s hacky. No, I don’t care to change it; doing it properly would require diffing the list of targeted Docker Hub profiles and deleting directories and files as appropriate. Ultimately, some deletions have to take place, and they’ll never be completely foolproof. Just run it in the right path, and everything is fine.
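
A check of this kind could look roughly like the following Python sketch. It is an assumption about the shape of such a safeguard, not the logic of the actual run script, and looks_like_data_directory is a hypothetical name. It accepts a directory only when the profiles file, the log file, and one directory per listed profile are all present.

```python
from pathlib import Path


def looks_like_data_directory(path):
    """Return True only if path has the expected data-branch layout."""
    root = Path(path)
    profiles_file = root / "dockerhub-metadata.profiles"
    log_file = root / "dockerhub-metadata.retrieve.log"
    if not (profiles_file.is_file() and log_file.is_file()):
        return False
    profiles = [
        line.strip()
        for line in profiles_file.read_text().splitlines()
        if line.strip()
    ]
    # Design choice of this sketch: an empty profile list is rejected, so a
    # directory holding just two empty files does not pass the check.
    return bool(profiles) and all((root / p).is_dir() for p in profiles)
```

Any destructive step would then be gated on this function returning True for the current directory.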

However, this safeguard means that setting up the collection initially and changing the list of profiles are slightly annoying.

Initial setup

  1. In an empty directory, initialise a git repository (or clone an existing repo, create a new orphan branch, and delete the leftover files from the default branch). Set up a remote branch (or remove the git push in the script).
  2. Create a file dockerhub-metadata.profiles listing one profile to be covered per line. Commit.
  3. Create an empty file dockerhub-metadata.retrieve.log (e.g. with touch) and an empty directory for each profile (e.g. with Bash: readarray -t profiles <dockerhub-metadata.profiles; mkdir "${profiles[@]}").
  4. Run /path/to/this/directory/run from within the new directory (without changing the directory).
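
Steps 2 and 3 can also be sketched in Python instead of Bash. The function prepare_data_directory is a hypothetical helper, not part of this repository; it leaves the git steps (initialising, committing, setting up the remote) to be done by hand.

```python
from pathlib import Path


def prepare_data_directory(path, profiles):
    """Create the profiles file, an empty log, and one directory per profile."""
    root = Path(path)
    # Step 2: list one profile per line.
    (root / "dockerhub-metadata.profiles").write_text(
        "".join(profile + "\n" for profile in profiles)
    )
    # Step 3: create an empty log file (like touch) ...
    (root / "dockerhub-metadata.retrieve.log").touch()
    # ... and an empty directory for each profile.
    for profile in profiles:
        (root / profile).mkdir(exist_ok=True)
```

For example, prepare_data_directory(".", ["someprofile"]) prepares the current directory for a single placeholder profile named someprofile.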

Changes

To add a profile to the list to be covered, add it to the dockerhub-metadata.profiles file and create the corresponding directory.

To remove a profile from the list, remove it from that file and delete its directory.

It is recommended to commit any changes to the profiles list manually. The next run will include them anyway, but a separate commit makes the intent clearer. It is also recommended to leave the data directory changes uncommitted and let the script handle that part.
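
Adding and removing a profile always touches both the list and the directory, so the two steps can be bundled. The following Python sketch does that; add_profile and remove_profile are hypothetical helpers, not part of this repository, and neither of them commits anything.

```python
import shutil
from pathlib import Path

PROFILES_FILE = "dockerhub-metadata.profiles"


def _read_profiles(root):
    text = (root / PROFILES_FILE).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def _write_profiles(root, profiles):
    (root / PROFILES_FILE).write_text("".join(p + "\n" for p in profiles))


def add_profile(path, profile):
    """Append the profile to the list (if missing) and create its directory."""
    root = Path(path)
    profiles = _read_profiles(root)
    if profile not in profiles:
        profiles.append(profile)
        _write_profiles(root, profiles)
    (root / profile).mkdir(exist_ok=True)


def remove_profile(path, profile):
    """Drop the profile from the list and delete its directory, if present."""
    root = Path(path)
    _write_profiles(root, [p for p in _read_profiles(root) if p != profile])
    shutil.rmtree(str(root / profile), ignore_errors=True)
```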

License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.