Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use batch prover from miden-base #659

Merged
merged 14 commits into from
Feb 4, 2025
Merged

Conversation

PhilippGackstatter
Copy link

@PhilippGackstatter PhilippGackstatter commented Jan 31, 2025

Removes TransactionBatch and related parts of the existing batch builder and replaces it with the parts added in miden-base.

Batch Inputs

When fetching inputs for a transaction batch, we need the blocks referenced by all transactions in the batch. This is (essentially) a Vec<ProvenTransaction>. Each ProvenTransaction has a block_ref: Digest, which is the block it references. So we end up with a Vec<Digest> that we need to fetch from the DB.

There is the block_headers table:

CREATE TABLE
block_headers
(
block_num INTEGER NOT NULL,
block_header BLOB NOT NULL,
PRIMARY KEY (block_num),
CONSTRAINT block_header_block_num_is_u32 CHECK (block_num BETWEEN 0 AND 0xFFFFFFFF)
) STRICT;

However, it does not expose the block_hash, so its pretty hard to query the blocks by it. The table assumes that blocks are queried by their block number, but we only have the block hashes/digests.

Option 1: Add block num to ProvenTransaction

I ended up going with this approach, which is add the block number to ProvenTransaction. The block number is not needed for verification of the execution proof, so it is unnecessary in that sense. We likely should check in a submit_proven_transaction request that the block exists, but we haven't done that so far and just assumed that the tx is valid as long as its proof is valid. So I think miden_node_block_producer::server::BlockProducerRpcServer::submit_proven_transaction or miden_node_block_producer::store::StoreClient::get_tx_inputs (which is called by the former) should check that the referenced block exists.

Overall, it seems fine to add the block number to the required data, as it doesn't compromise privacy and adds just 4 bytes. So if/when we implement this "reference block existence check", we can make sure that the block we got from the provided block number has the hash the transaction references.

Option 2: Query by hash

The other option I can think of is to change the database model to support querying by block hash, so basically:

CREATE TABLE
    block_headers
(
    block_num    INTEGER NOT NULL,
    block_hash   BLOB    NOT NULL,
    block_header BLOB    NOT NULL,

    PRIMARY KEY (block_num, block_hash),
    CONSTRAINT block_header_block_num_is_u32 CHECK (block_num BETWEEN 0 AND 0xFFFFFFFF)
) STRICT;

It seems we should try to keep these tables as small as possible to minimize the required state, which is why I went with the above appproach.

RPC APIs

A new get_batch_inputs RPC api was added. Please have a look whether this matches how we usually do this, as it feels like this covers a lot of parts.

Should we try to add an RPC representation of ChainMmr and its contained types or is it fine to use a serialized representation?

Misc

  • The transaction batch tests that were removed are all covered by tests that are now in miden-base.
  • Should we break up miden_node_store::state::State::get_batch_inputs into smaller pieces for better reuse somehow or leave it as-is for now?
  • BatchGraph::insert needs to take a transactions: Vec<(TransactionId, AccountId)>, now because the BatchId is computed from this tuple, whereas previously it was just computed from the transaction IDs.
  • A couple of small APIs in the corresponding PR in miden-base were added to make things at all possible, more efficient or easy here.

closes #26

Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! I reviewed most non-test code and left a few small comments inline.

I ended up going with this approach, which is add the block number to ProvenTransaction.

Yes, agreed that this is the right approach. We may want to include block number into the transaction kernel input as well to make sure the proof attests to the validity of the entire ProvenTransaction struct. I'll leave a comment about this in miden-base.

Should we try to add an RPC representation of ChainMmr and its contained types or is it fine to use a serialized representation?

I think using a serialized representation is fine. One counter-argument is that it makes keeping backward-compatibility a bit more difficult, but we are using serialized representations in other places as well and so using serialized representation here wouldn't make a material difference.

Should we break up miden_node_store::state::State::get_batch_inputs into smaller pieces for better reuse somehow or leave it as-is for now?

I'd probably keep it as is for now. We may come back later to do a comprehensive refactoring to see how we can organize the state better - but that would be for another PR.

crates/store/src/state.rs Outdated Show resolved Hide resolved
crates/store/src/state.rs Outdated Show resolved Hide resolved
crates/proto/src/domain/batch.rs Outdated Show resolved Hide resolved
@@ -97,12 +100,18 @@ impl BatchGraph {
/// - any parent transactions are _not_ in the graph
pub fn insert(
&mut self,
transactions: Vec<TransactionId>,
transactions: Vec<(TransactionId, AccountId)>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a wrapper struct fro (TransactionId, AccountId) tuple? Not sure what to call it though. If we do introduce it, it should probably go into miden-base.

CHANGELOG.md Outdated Show resolved Hide resolved
proto/store.proto Outdated Show resolved Hide resolved
crates/block-producer/src/mempool/batch_graph.rs Outdated Show resolved Hide resolved
Comment on lines +15 to +19
/// This builds a mocked version of a proven batch for testing purposes which can be useful if
/// the batch's details don't need to be correct (e.g. if something else is under test but
/// requires a transaction batch). If you need an actual valid [`ProvenBatch`], build a
/// [`ProposedBatch`](miden_objects::batch::ProposedBatch) first and convert (without proving)
/// or prove it into a [`ProvenBatch`].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not remove the mocked object in favor of this? Or is ProposedBatch not very builder friendly?

I really dislike mocked objects 😓

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MMR-related inputs to ProposedBatch are not easy to build, so this was the quickest way to get the existing tests to work, which don't need a valid ProvenBatch, afaict.

It would require using the MockChain from miden-base to create the proper MMR inputs, e.g:
https://github.com/0xPolygonMiden/miden-base/blob/3a58866a5ea7e45f72778fe6b5953ecd6a4eb028/crates/miden-tx-batch-prover/src/tests/proposed_batch.rs#L44-L83

Even there, we're not building proper transactions because that would be a lot of test setup code with the existing tooling. For the tests that are supposed to pass, it would definitely be nice if we could make it all valid.

One the other hand, with mocked things it's easy to create invalid inputs as well, which is important to test for error conditions. So if we have tooling that creates strictly valid inputs, then we still need some other set of tools to test for errors.

Doing both in the same tool might be quite complex and could more easily produce unintended output. In an ideal world I guess we'd have one set of tools for creating valid input (e.g. MockChain) and one set for invalid input (like MockProvenTxBuilder), but haven't thought it all the way through.

Copy link
Contributor

@Mirko-von-Leipzig Mirko-von-Leipzig Feb 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My 2c without knowing the inner workings of the base types very well.

One the other hand, with mocked things it's easy to create invalid inputs as well, which is important to test for error conditions. So if we have tooling that creates strictly valid inputs, then we still need some other set of tools to test for errors.

I think the best way to write such tests is to generate correct data and then make the data bad. However this doesn't really jive with our codebases which is unfortunate:

fn incorrect_block_hash_is_rejected() {
    let mut correct_header = BlockHeader::random(...);
    let bad_header = correct_header.with_hash(BlockHash::random(..));

    assert!(uut.do_thing(bad_header).is_err());
}

Most of our objects are (a) fully private internals, and (b) don't make incorrect manipulation possible at all. Which I think is a mistake :) (similarly a pain by hiding these extra features under testing flag but that's a separate argument). My preference would actually be that everything is pub unless specified otherwise but we default to private (feels very C++). We already have mut for control so we shouldn't be scared of pub.

In general we should focus on generating random data, and then making constructing correct data off it much easier.

Sounds like we aren't there yet for batch data so this is okay for now. I just know from writing the graph stuff that eventually I made it generic because generating transaction IDs was impossible. And that's a bad reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we create a separate issue for discuss this further? I actually like the fact that don't make the fields pub, but also would be good to find a way to get rid of this mock builder.

crates/store/src/server/api.rs Outdated Show resolved Hide resolved
@PhilippGackstatter
Copy link
Author

@Mirko-von-Leipzig What would be a good way to test the changes? The get batch inputs function in the store which I added is not under test at all right now, as it turns out. (I added a panic there and all tests still pass).

@Mirko-von-Leipzig
Copy link
Contributor

@Mirko-von-Leipzig What would be a good way to test the changes? The get batch inputs function in the store which I added is not under test at all right now, as it turns out. (I added a panic there and all tests still pass).

Honestly the best test is running the client integration tests agains the node. Which is unfortunately not part of CI yet so one has to do it manually. Essentially:

  1. Run a node locally, then
  2. Run make integration-test from client repo

Default settings for all things should just work, otherwise one has to tweak the client's config file.

This isn't conclusive proof. Better would be add unit tests for the store side i.e. insert chain data and ensure you get the correct stuff back. I haven't done this before though so I can't advise how hard/easy this is. I suspect you would need to generate correct data first 😅

@PhilippGackstatter
Copy link
Author

Thanks! I ran make docker-build-node and make docker-build-node in the node repo and then make integration-test in the client repo and they were all green.

@Mirko-von-Leipzig
Copy link
Contributor

Thanks! I ran make docker-build-node and make docker-build-node in the node repo and then make integration-test in the client repo and they were all green.

Then at least its as good as it was before :)

Copy link
Contributor

@Mirko-von-Leipzig Mirko-von-Leipzig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thank you!

Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! I left a couple of comments inline, one is not related to this PR.

crates/store/src/state.rs Show resolved Hide resolved
crates/store/src/state.rs Show resolved Hide resolved
@PhilippGackstatter PhilippGackstatter merged commit 87b4a70 into next Feb 4, 2025
11 checks passed
@PhilippGackstatter PhilippGackstatter deleted the pgackst-batch-prover branch February 4, 2025 09:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants