Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: store seed command in stress test CLI #657

Draft
wants to merge 44 commits into
base: next
Choose a base branch
from

Conversation

SantiagoPittella
Copy link
Collaborator

@SantiagoPittella SantiagoPittella commented Jan 29, 2025

@bobbinth i'm applying your suggestion of a new structure (your comment here). The only thing I changed is that I create all the accounts before hand.

I added some queries to check the size of each table, and the avg size of each entry in every table. This is the current output of the binary:

For example, this is the current output for 100k accounts:

Creating new faucet account...
Generated 100000 accounts in 19.377 seconds
Creating notes...
Created notes: inserted 393 blocks with avg insertion time 641 ms
Store file size every 50 blocks:
Block 0: 4096 bytes
Block 50: 2645048 bytes
Block 100: 72965512 bytes
Block 150: 137780152 bytes
Block 200: 201362360 bytes
Block 250: 264911800 bytes
Block 300: 328420280 bytes
Block 350: 392027064 bytes
Block 400: 455551928 bytes
Average growth rate: 1159154.7888040713 bytes per blocks
Total time: 274.344 seconds
DB Stats for account_deltas: 4096 bytes, 4096.0 bytes/entry
DB Stats for block_headers: 122880 bytes, 311.088607594937 bytes/entry
DB Stats for account_fungible_asset_deltas: 4096 bytes,  bytes/entry
DB Stats for notes: 460816384 bytes, 4608.16384 bytes/entry
DB Stats for account_non_fungible_asset_updates: 4096 bytes,  bytes/entry
DB Stats for nullifiers: 4370432 bytes, 43.7218087234894 bytes/entry
DB Stats for account_storage_map_updates: 4096 bytes,  bytes/entry
DB Stats for settings: 4096 bytes, 2048.0 bytes/entry
DB Stats for account_storage_slot_updates: 4096 bytes, 1365.33333333333 bytes/entry
DB Stats for transactions: 5644288 bytes, 56.244337488665 bytes/entry
DB Stats for accounts: 6062080 bytes, 60.6438446609712 bytes/entry

This PR was built on top of #621 .

@SantiagoPittella
Copy link
Collaborator Author

I added a couple more of metrics, and re run a couple times with different number of accounts.

I'm consistently getting 4550~ish bytes/account

@SantiagoPittella
Copy link
Collaborator Author

I'm consistently getting 4550~ish bytes/account

It is worth mention that this number is the result of doing total_db_size / total_number_account so it includes blocks and any other stuff that we store in the DB.

I ran the following query in the DB and got this results:

SELECT
     name,
     SUM(pgsize) AS size_bytes,
     (SUM(pgsize) * 1.0) / (SELECT COUNT(*) FROM accounts) AS bytes_per_row
 FROM dbstat
 WHERE name = 'accounts';

And got this results:
accounts| 6025216 bytes | 60.2750645245193 bytes per account

@bobbinth
Copy link
Contributor

I added a couple more of metrics, and re run a couple times with different number of accounts.

I'm consistently getting 4550~ish bytes/account

4.5KB is better than 5MB - but still looks pretty high. Part of this is nullifiers and notes - but I don't see how these contribute more than 1KB (about 40 bytes for nullifiers + 80 bytes for notes + 500 bytes for note authentication paths). But it is also possible I'm missing something.

Another possibility is that SQLite doesn't do compaction, and maybe there is a lot of "slack" in the file.

@igamigo
Copy link
Collaborator

igamigo commented Jan 30, 2025

There's also the overhead of block headers and indices as well, but those are also probably too small to consider.
One test that you can do is run a VACUUM command to see if the file sizes change considerably after the initial seeding finishes.

@SantiagoPittella
Copy link
Collaborator Author

I added some queries to check the size of each table, and the avg size of each entry in every table. This is the current output of the binary:

Creating new faucet account...
Generated 100000 accounts in 19.377 seconds
Creating notes...
Created notes: inserted 393 blocks with avg insertion time 641 ms
Store file size every 50 blocks:
Block 0: 4096 bytes
Block 50: 2645048 bytes
Block 100: 72965512 bytes
Block 150: 137780152 bytes
Block 200: 201362360 bytes
Block 250: 264911800 bytes
Block 300: 328420280 bytes
Block 350: 392027064 bytes
Block 400: 455551928 bytes
Average growth rate: 1159154.7888040713 bytes per blocks
Average space used per account: 4545.705054133613 bytes
Total time: 274.344 seconds
DB Stats for account_deltas: 4096 bytes, 4096.0 bytes/entry
DB Stats for block_headers: 122880 bytes, 311.088607594937 bytes/entry
DB Stats for account_fungible_asset_deltas: 4096 bytes,  bytes/entry
DB Stats for notes: 460816384 bytes, 4608.16384 bytes/entry
DB Stats for account_non_fungible_asset_updates: 4096 bytes,  bytes/entry
DB Stats for nullifiers: 4370432 bytes, 43.7218087234894 bytes/entry
DB Stats for account_storage_map_updates: 4096 bytes,  bytes/entry
DB Stats for settings: 4096 bytes, 2048.0 bytes/entry
DB Stats for account_storage_slot_updates: 4096 bytes, 1365.33333333333 bytes/entry
DB Stats for transactions: 5644288 bytes, 56.244337488665 bytes/entry
DB Stats for accounts: 6062080 bytes, 60.6438446609712 bytes/entry

@SantiagoPittella SantiagoPittella force-pushed the santiagopittella-stress-testing-copy branch from 58cac73 to 348b54c Compare January 31, 2025 15:00
@SantiagoPittella SantiagoPittella changed the base branch from tomasarrachea-stress-test to next January 31, 2025 15:01
@SantiagoPittella SantiagoPittella marked this pull request as ready for review January 31, 2025 15:18
@SantiagoPittella SantiagoPittella changed the title draft: store seed refactor feat: store seed command in stress test CLI Jan 31, 2025
@SantiagoPittella
Copy link
Collaborator Author

I could not reduce the time that takes for each block. It is currently at 1.6 blocks/sec

@bobbinth
Copy link
Contributor

I could not reduce the time that takes for each block. It is currently at 1.6 blocks/sec

Do you have the breakdown of different components of this? Specifically, how much, out of 1.6 seconds is taken up by block construction vs. block insertion?

@bobbinth
Copy link
Contributor

DB Stats for account_deltas: 4096 bytes, 4096.0 bytes/entry
DB Stats for block_headers: 122880 bytes, 311.088607594937 bytes/entry
DB Stats for account_fungible_asset_deltas: 4096 bytes,  bytes/entry
DB Stats for notes: 460816384 bytes, 4608.16384 bytes/entry
DB Stats for account_non_fungible_asset_updates: 4096 bytes,  bytes/entry
DB Stats for nullifiers: 4370432 bytes, 43.7218087234894 bytes/entry
DB Stats for account_storage_map_updates: 4096 bytes,  bytes/entry
DB Stats for settings: 4096 bytes, 2048.0 bytes/entry
DB Stats for account_storage_slot_updates: 4096 bytes, 1365.33333333333 bytes/entry
DB Stats for transactions: 5644288 bytes, 56.244337488665 bytes/entry
DB Stats for accounts: 6062080 bytes, 60.6438446609712 bytes/entry

Interesting! It seems like 97% of the database is in the notes table (we are adding 4.5KB of data per note). Let's create an issue to optimize note storage. Let's create an issue to optimize note storage. I think we should be able to reduce the size by about 10x.

@SantiagoPittella SantiagoPittella mentioned this pull request Jan 31, 2025
5 tasks
@SantiagoPittella
Copy link
Collaborator Author

@bobbinth I created #662 to address the notes issue.

Do you have the breakdown of different components of this? Specifically, how much, out of 1.6 seconds is taken up by block construction vs. block insertion?

According to the flamegraph, a big part of this process is in miden_processor::Process::execute_mast_node. Though I'm making some changes to get more insights of this.

@bobbinth
Copy link
Contributor

Do you have the breakdown of different components of this? Specifically, how much, out of 1.6 seconds is taken up by block construction vs. block insertion?

According to the flamegraph, a big part of this process is in miden_processor::Process::execute_mast_node. Though I'm making some changes to get more insights of this.

Looking at the code, I think what we are measuring as insertion time is actually BlockBuilder::build_block() time. This will actually covers much more than just inserting block into the store (e.g., it also executes the block kernel to build the block). So, this will be much longer than what we want. What I really mean by insertion time is the time it takes the store to process apply_block() request. My hope is that this is less than 100ms - but we'll need to confirm.

What may make sense to do is build blocks manually (and probably the same for batches as well), without using block and batch builders. This is somewhat difficult now, but should become much easier after #659 and subsequent work is done.

So, I guess we have 3 paths forward:

  1. Review & merge this work roughly as is and then re-factor it once manual block/batch building becomes easier.
  2. Try to implement manual block/batch building now, and then update it with the new structure later.
  3. Put this on hold and then re-work once manual block/batch building becomes easier.

Assuming this is pretty close to being done, path 1 probably makes the most sense - but let me know what you think.

@@ -15,35 +15,37 @@ version.workspace = true
workspace = true

[features]
testing = ["dep:miden-air", "dep:miden-node-test-macro", "dep:rand_chacha", "dep:winterfell"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a more manual approach to batch/block building, we probably wouldn't need to introduce the testing feature. So, maybe that's an argument for putting the work on hold for now.

@SantiagoPittella
Copy link
Collaborator Author

@bobbinth I think that 1) and 3) are the best options.

In a way I think that having this command anyways, building blocks like it is currently doing, is fine for now. We are not testing block insertion time but we accomplish the goal of seeding the DB to further benchmark the endpoints.

I agree with about 3) though, I don't really like re-introducing the testing feature, and if we can choose to replace it in the near future, I would go for that.

@bobbinth
Copy link
Contributor

bobbinth commented Feb 4, 2025

I agree with about 3) though, I don't really like re-introducing the testing feature, and if we can choose to replace it in the near future, I would go for that.

Let's go with option (3) for now so that we don't have to introduce and then again remove the testing feature. Batch building should be simplified even today, and block building may be in the same state by the end of the week.

@SantiagoPittella
Copy link
Collaborator Author

Ok, sounds good! I will put this and the StateSync benchmark one as drafts then.

@TomasArrachea
Copy link
Collaborator

Updated this PR to use the new ProvenBatch, pausing this until the new block building is implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants