How to verify file integrity with git hashes?

by Martin Monperrus

Git identifies every file, directory and commit it manages by a SHA-1 hash. This hash is computed from the content of the file, or from the state of the repository at the time of the commit, so recomputing it and comparing it with a trusted value verifies integrity.

what is the tree hash in a git commit?

The tree hash in a git commit is a SHA-1 hash that uniquely identifies the root directory object (the root tree) of that commit. It is computed by hashing the entries of the directory: the name, mode, and SHA-1 hash of each file and subdirectory. The tree hash is stored in the commit object.

To obtain the tree hash for a specific commit, you can use the git cat-file command: git cat-file -p HEAD

The first line contains the tree hash.

$ git cat-file -p HEAD | grep tree
tree 8a45f00d6492a12b84b8d3ae678fd6f6c622bd97
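The same lookup can be sketched in Python, assuming the text printed by git cat-file -p HEAD has already been captured as a string. The function name tree_hash_of_commit and the author and committer lines in the sample are made up for illustration; only the tree line comes from the output above. Unlike a plain text search, the sketch stops at the blank line that ends the headers, so a commit message that mentions "tree" cannot be mistaken for the header.

```python
def tree_hash_of_commit(commit_text: str) -> str:
    """Return the tree hash from the output of `git cat-file -p <commit>`."""
    for line in commit_text.splitlines():
        if line == "":
            break  # blank line: headers are over, the commit message follows
        key, _, value = line.partition(" ")
        if key == "tree":
            return value
    raise ValueError("no tree header found")


# Sample commit text; the author/committer lines are hypothetical.
sample = (
    "tree 8a45f00d6492a12b84b8d3ae678fd6f6c622bd97\n"
    "author A U Thor <author@example.com> 1700000000 +0000\n"
    "committer A U Thor <author@example.com> 1700000000 +0000\n"
    "\n"
    "Prune the tree of unused files\n"
)
print(tree_hash_of_commit(sample))  # prints 8a45f00d6492a12b84b8d3ae678fd6f6c622bd97
```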

how does git compute the hash of a file?

Git computes the hash of a file with the SHA-1 algorithm (Secure Hash Algorithm 1). Here’s a simplified breakdown of the process:

  1. Git takes the content of the file as input. It does not consider the file name or other metadata, only the content itself.

  2. It prepends a header to the content. This header consists of the type of the object (“blob” in the case of a file), a space, and the size of the content in bytes. The header and the content are separated by a null byte.

  3. The combined header and content are then passed through the SHA-1 algorithm. This generates a 160-bit digest, written as 40 hexadecimal characters, which is the hash of the file.

In command line: git hash-object <a file>
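The three steps can be reproduced without Git. Below is a minimal Python sketch; the function names git_blob_hash and git_file_hash are made up for illustration, and it assumes the file is stored as-is, with no line-ending conversion or other filter applied by Git.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 that Git assigns to a blob holding `content`."""
    # Header: object type, a space, the size in bytes, then a null byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def git_file_hash(path: str) -> str:
    """Hash the bytes of a file; the file name itself is not hashed."""
    with open(path, "rb") as f:
        return git_blob_hash(f.read())


print(git_blob_hash(b"hello world\n"))  # prints 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Under that assumption, the result should match what git hash-object prints for the same file, which gives a way to check a file against a known hash on a machine without Git.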

how does git compute the hash of a directory?

Git represents a directory with a so-called tree object. The hash of a directory therefore covers the names, modes, and contents of all the files and subdirectories it contains.

A tree object in Git is essentially a list of entries, each holding a name, a mode (the file type and executable bit), and the hash of the corresponding object. The hash of a directory in Git is actually the hash of the tree object that represents that directory.

Here’s a simplified sequence of steps Git follows:

  1. For each file in the directory, Git calculates the hash of its content only. This creates a blob object in Git. Metadata such as the file mode is not part of the blob.
  2. Git then creates a tree object for the directory. For each file, the tree object records the file name, the mode (for example 100644 for a regular file, 100755 for an executable), and the hash of the blob object.
  3. The hash of the tree object is then calculated. This is what we often refer to as the “hash of the directory”.

It’s important to note that this process is recursive. If a directory contains other directories, a tree object will be created for each subdirectory first, and its hash will be included in the tree object of the parent directory.
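The computation for one directory level can be sketched in Python as follows. The function name git_tree_hash and the file names in the example are made up for illustration. The sketch assumes SHA-1 object names and takes the entries as (mode, name, hash) tuples, so the blob and subtree hashes must have been computed beforehand, which mirrors the recursion described above.

```python
import hashlib


def git_tree_hash(entries):
    """Hash a tree from (mode, name, hex_sha) tuples, e.g. ("100644", "README", "<40 hex chars>")."""

    def sort_key(entry):
        mode, name, _ = entry
        # Git sorts entries by name bytes, treating a directory as if its name ended with "/".
        return name.encode("utf-8") + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, hex_sha in sorted(entries, key=sort_key):
        # Each entry: mode, space, name, null byte, then the hash as 20 raw bytes.
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0" + bytes.fromhex(hex_sha)
    header = b"tree " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Bottom-up: the subdirectory first, then the parent that references it.
blob = "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"  # blob hash of b"hello world\n"
docs = git_tree_hash([("100644", "hello.txt", blob)])
root = git_tree_hash([("40000", "docs", docs), ("100644", "hello.txt", blob)])
print(root)
```

Because the parent tree embeds the hash of each subtree, changing a single byte in any nested file changes every tree hash up to the root.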

In command line, for a directory tracked in the latest commit: git rev-parse HEAD:<a directory>

how can one get the tree hash of the latest commit on github?

It is available in the GitHub API:

curl https://api.github.com/repos/:org/:repo/branches/master | jq .commit.commit.tree.sha

example

curl https://api.github.com/repos/INRIA/spoon/branches/master | jq .commit.commit.tree.sha
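The same request can be sketched in Python with only the standard library. The function names extract_tree_sha and fetch_tree_sha are made up for illustration, and the sketch assumes the branch is called master and that the unauthenticated rate limit of the API has not been exceeded.

```python
import json
import urllib.request


def extract_tree_sha(branch_payload: dict) -> str:
    """Same path as the jq filter: .commit.commit.tree.sha"""
    return branch_payload["commit"]["commit"]["tree"]["sha"]


def fetch_tree_sha(org: str, repo: str, branch: str = "master") -> str:
    """Ask the GitHub API for the tree hash of the latest commit on a branch."""
    url = f"https://api.github.com/repos/{org}/{repo}/branches/{branch}"
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request) as response:
        return extract_tree_sha(json.load(response))


# Example (needs network access):
# print(fetch_tree_sha("INRIA", "spoon"))
```

To verify a local checkout, compare the returned value with the output of git rev-parse HEAD^{tree} run in a clone that is at the same commit; the two hashes are equal only if every tracked file and directory matches.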
