
Speed-up read_vpfiles #57

Open · Rafnuss opened this issue Dec 13, 2017 · 5 comments

@Rafnuss (Collaborator) commented Dec 13, 2017

I was just wondering if we could make readvp.list and/or readvp faster.

Replacing

profile=as.data.frame(lapply(groups,function(x) readOdimProfileData(filename,x)))

with

x=h5read(filename,'/')
profile = data.frame(matrix(unlist(x$dataset1),nrow=length(x$dataset1$data1$data)))

can reduce the running time by about a third in my quick test. But I'm not sure whether it is really valid for all radar files. Also, do we really need to read all the attributes when using readvp.list?
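
A minimal sketch of the single-read idea, assuming the rhdf5 package and an ODIM vertical profile in which every /dataset1/dataN group holds a "data" vector of equal length; the helper name read_profile_fast and the attribute path used for column names are assumptions based on the ODIM convention, not existing bioRad code.

  library(rhdf5)

  read_profile_fast <- function(filename) {
    x <- rhdf5::h5read(filename, "dataset1")
    # keep only the dataN groups, each of which holds one profile quantity
    data_groups <- x[grepl("^data[0-9]+$", names(x))]
    profile <- as.data.frame(lapply(data_groups, function(g) as.vector(g$data)))
    # label columns with the ODIM quantity stored in each group's what attributes
    names(profile) <- vapply(names(data_groups), function(nm) {
      as.character(
        rhdf5::h5readAttributes(filename, paste0("dataset1/", nm, "/what"))$quantity
      )
    }, character(1))
    profile
  }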

@adokter (Owner) commented Dec 13, 2017

Thanks, a faster read of the data (and at least the option to skip many of the attributes) has indeed been on my wish list; I will look into this.

@adokter added this to the 0.5.0 milestone on Sep 27, 2018
@adokter changed the title from "Speed-up readvp" to "Speed-up read_vpfile" on Jul 23, 2019
@adokter changed the title from "Speed-up read_vpfile" to "Speed-up read_vpfiles" on Jul 23, 2019
@adokter modified the milestones: 0.5.0, 1.0.0 on Jul 23, 2019
@baptischmi (Collaborator) commented Apr 21, 2021

How about using foreach with a PSOCK backend? The speed-up depends on the number of workers and the available RAM, but it ran 3x faster on my machine.

  # create and register the cluster
  library(foreach)
  n_cores <- parallel::detectCores() - 1 # number of workers to use
  my.cluster <- parallel::makeCluster(
    n_cores,
    type = "PSOCK"
  )
  doParallel::registerDoParallel(cl = my.cluster)
  # read the vp h5 files in parallel, one file per task
  vpl <- foreach(i = seq_along(files)) %dopar% {
    i.file <- files[i]
    bioRad::read_vpfiles(i.file)
  }
  parallel::stopCluster(cl = my.cluster)

@bart1 (Collaborator) commented Apr 21, 2021

It would at least require good cross-platform testing, as the parallel environments differ between operating systems (I think "PSOCK" is not the fastest on Linux). It might also conflict with cases where parallelization is done at a higher level. Maybe this issue becomes somewhat obsolete once vpts-like formats become more common. I think @peterdesmet is working on that; I'm not sure about the progress.
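
For reference, a hedged sketch of a fork-based alternative that avoids PSOCK cluster setup on Linux/macOS (forking is not available on Windows); the files vector and the worker count are assumptions mirroring the snippet above.

  library(parallel)

  # fork-based workers; falls back to a single core if only one is available
  n_workers <- max(1L, parallel::detectCores() - 1L)
  vpl <- parallel::mclapply(files, function(f) bioRad::read_vpfiles(f), mc.cores = n_workers)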

@peterdesmet (Collaborator)

I'll discuss the format with Adriaan next week. The idea is to keep it close to the vol2bird output, but with delimited rather than fixed-width (length-based) columns. This tabular data should be able to capture any timespan for a single radar and have an additional file with metadata.
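
Illustrative only: a sketch of how such a delimited table might be read with base R. The file name is a placeholder and the exact columns are not yet specified; the companion metadata file mentioned above is not read here.

  # hypothetical file name; the real column layout is still to be agreed
  vpts_tab <- read.csv("example_radar_vpts.csv", stringsAsFactors = FALSE)
  str(vpts_tab)  # one row per (datetime, height) combination for a single radar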

@peterdesmet (Collaborator)

With the VPTS CSV file format, users will be able to read data much faster using read_vpts() #551. But improvements could still be made to speed up reading hdf5 files too.
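
A minimal usage sketch once the CSV support of #551 is available; the file path is a placeholder.

  # returns a vpts (vertical profile time series) object
  my_vpts <- bioRad::read_vpts("example_radar_vpts.csv")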
