I am trying to create a large zarr array by sequentially appending to the array like below:
import zarr
import numpy as np

n_samples = 100000
n_records = 1  # eventually this will be 20000-30000
record = np.random.rand(n_samples, 6).reshape(n_samples, 1, 6)
z = zarr.create_array("myarray.zarr", shape=(n_samples, 1, 6), chunks=(10000, n_records, 6), shards=(10000, n_records, 6), dtype='float16')

def add_records(n_records):
    for i in range(n_records):
        z.append(record, axis=0)
I wish to optimize this for both writing and reading. Eventually, I will mostly be querying along the first (sample) axis. I have tried several chunk sizes but haven't been able to reach the target performance of <0.2 sec/record with minimal memory usage for both writing and reading.
Any parallel multi-CPU option is not feasible. Only a single record is available at a time so individual appends or similar might be necessary.
I would really appreciate any suggestions, or thoughts on whether this is even feasible. Thank you.
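For anyone comparing chunk sizes here, a rough back-of-the-envelope sketch may help frame the trade-off. It uses only the standard library and does not touch zarr. It assumes the final array ends up with shape (n_samples, n_records, 6), with records along the second axis, which is a guess at the intended layout. The helper `chunk_stats` and the second layout are hypothetical, for illustration only.

```python
import math

def chunk_stats(shape, chunks, dtype_bytes):
    """Return (bytes per uncompressed chunk, chunk grid per axis, total chunk count)."""
    chunk_bytes = math.prod(chunks) * dtype_bytes
    grid = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
    return chunk_bytes, grid, math.prod(grid)

n_samples = 100_000
n_records = 20_000   # lower end of the 20000-30000 range from the post
FLOAT16_BYTES = 2    # float16 uses 2 bytes per element

# Assumed final shape: records along axis 1 (an assumption, not confirmed by the post)
final_shape = (n_samples, n_records, 6)

# Layout from the snippet: one record per chunk
layout_a = chunk_stats(final_shape, (10_000, 1, 6), FLOAT16_BYTES)

# Hypothetical alternative: 100 records per chunk, fewer samples per chunk
layout_b = chunk_stats(final_shape, (1_000, 100, 6), FLOAT16_BYTES)

for name, (nbytes, grid, total) in (("a", layout_a), ("b", layout_b)):
    print(f"layout {name}: {nbytes} bytes/chunk, grid {grid}, {total} chunks")
```

Under these assumptions the layout in the snippet gives small chunks (about 120 KB uncompressed) but a very large number of them, which tends to make per-chunk overhead matter on both the write and the read side. Larger chunks along the record axis reduce the count, but a single-record write then has to touch chunks that also hold other records.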