
Fetches fail when bazel specifies digests with anything other than sha256 #54

Open
tomgr opened this issue Feb 18, 2025 · 1 comment · May be fixed by #57

Comments

tomgr (Contributor) commented Feb 18, 2025

If you have a bazel project that uses bb-remote-asset and performs a download where the hash is specified using a digest function other than sha256, the download fails with errors from bazel such as:

   Traceback (most recent call last):
	File "/private/var/tmp/_bazel_tom/a5669a63620062ea12185a4a3806ccf2/external/aspect_bazel_lib+/lib/private/yq_toolchain.bzl", line 428, column 18, in _yq_platform_repo_impl
		rctx.download(
Error in download: java.io.IOException: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Hash has length 64, while 96 characters were expected

We ran across this in the context of rules_js and that ecosystem: lots of the hashes are sha512.

This is reproducible with a very basic bb-remote-asset config:

{
  assetCache : {
    actionCache : {
      grpc : {
        address : "dns:///localhost:8980"
      }
    }    
  },
  contentAddressableStorage : {
    grpc : {
      address : "dns:///localhost:8980"
    },
  },
  fetcher: {
    http: {},
  },
  grpcServers: [{
    listenAddresses: [':8981'],
    authenticationPolicy: { allow: {} },
  }],
  maximumMessageSizeBytes: 16 * 1024 * 1024 * 1024,
  fetchAuthorizer: { allow: {} },
  pushAuthorizer: { allow: {} },
}

The AC/CAS is a bb-storage instance configured with the default settings as per its README, except that the putAuthorizer was changed to always allow.

The following MODULE.bazel should be suitable to reproduce this:

http_archive = use_repo_rule("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "googletest",
    integrity = "sha512-vsja0qWrvqjp5fDO7djJ29uJOen3R4VHawlI8h9dtZAQGBV+eDh+EGxnFzJlWNZkL8Djk3nGKvV78SBanfihiw==",
    urls = [
        "https://github.com/google/googletest/releases/download/v1.16.0/googletest-1.16.0.tar.gz",
    ],
    build_file_content = "",
)

I was running bazel 8.1.0 via:

bazel fetch --repository_cache= --experimental_remote_downloader=grpc://localhost:8981 --remote_cache=grpc://localhost:8980 @googletest//...

Looking through the code, I think the issue lies between the fetcher and the cachingFetcher: httpFetcher.FetchBlob returns the digest provided by the client, if one was available. The cachingFetcher then uses this as the digest passed to the asset store. However, the backend asset store always uses a specific digest function; in the case of bb-storage that is always sha256, but presumably this could differ per backend in general, hence the mixup in digests.

I'm not quite sure how this should be fixed. You can patch around it by making the http_fetcher always compute the digest itself, ignoring the checksumSri here:

expectedDigest, digestFunctionEnum, err := getChecksumSri(req.Qualifiers)
if err != nil {

That causes it to select sha256 as the digest function and to calculate the blob's hash in a way bb-storage accepts. However, this feels like a symptom of a larger problem: I think bb-remote-asset needs to be aware of the digest function the storage backend is using, and needs to always compute a matching digest.

As a side note, I don't think bb-remote-asset validates the hash of the fetched file even when an expected digest was provided.

tomcoldrick-ct (Collaborator) commented:

Thanks for raising this. I think you are correct that this is a wider issue with how we're using digest functions throughout the codebase. To sort this out, I think it's best to rewire a lot of the code so that digest functions are handled correctly everywhere. Hopefully this will also remove a load of messy boilerplate we have.

@tomcoldrick-ct tomcoldrick-ct linked a pull request Mar 4, 2025 that will close this issue