
Speed-up read_vpfiles #57

Open · Rafnuss opened this issue Dec 13, 2017 · 5 comments

@Rafnuss (Collaborator) commented Dec 13, 2017

I was just wondering if we could make readvp.list and/or readvp faster.

Replacing

profile=as.data.frame(lapply(groups,function(x) readOdimProfileData(filename,x)))

with

x=h5read(filename,'/')
profile = data.frame(matrix(unlist(x$dataset1),nrow=length(x$dataset1$data1$data)))

can reduce the running time by about a third in my quick test. But I'm not sure whether it is really valid for all radar files. Also, do we really need to read all the attributes when using readvp.list?
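
A minimal sketch of the single-read idea, assuming the rhdf5 package and an ODIM vertical profile in which every /dataset1/dataN group holds a "data" vector of equal length; the helper name read_profile_fast and the attribute path used for column names are assumptions based on the ODIM convention, not existing bioRad code.

  library(rhdf5)

  read_profile_fast <- function(filename) {
    x <- rhdf5::h5read(filename, "dataset1")
    # keep only the dataN groups, each of which holds one profile quantity
    data_groups <- x[grepl("^data[0-9]+$", names(x))]
    profile <- as.data.frame(lapply(data_groups, function(g) as.vector(g$data)))
    # label columns with the ODIM quantity stored in each group's what attributes
    names(profile) <- vapply(names(data_groups), function(nm) {
      as.character(
        rhdf5::h5readAttributes(filename, paste0("dataset1/", nm, "/what"))$quantity
      )
    }, character(1))
    profile
  }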

@adokter (Owner) commented Dec 13, 2017

Thanks, a faster read of the data (and at least the option to skip many of the attributes) has indeed been on my wish list; I will look into this.

@adokter added this to the 0.5.0 milestone on Sep 27, 2018
@adokter changed the title from "Speed-up readvp" to "Speed-up read_vpfile" on Jul 23, 2019
@adokter changed the title from "Speed-up read_vpfile" to "Speed-up read_vpfiles" on Jul 23, 2019
@adokter modified the milestones: 0.5.0, 1.0.0 on Jul 23, 2019
@baptischmi (Collaborator) commented Apr 21, 2021

How about using foreach with a PSOCK backend? The speed-up depends on the number of workers and the available RAM, but it ran 3x faster on my machine.

  # create and register the cluster
  library(foreach)
  n_cores <- parallel::detectCores() - 1 # number of workers to use
  my.cluster <- parallel::makeCluster(
    n_cores,
    type = "PSOCK"
  )
  doParallel::registerDoParallel(cl = my.cluster)
  # read the vp h5 files in parallel, one file per task
  vpl <- foreach(i = seq_along(files)) %dopar% {
    i.file <- files[i]
    bioRad::read_vpfiles(i.file)
  }
  parallel::stopCluster(cl = my.cluster)

@bart1 (Collaborator) commented Apr 21, 2021

It would at least require good cross-platform testing, as the parallel environments differ between operating systems (I think "PSOCK" is not the fastest on Linux). It might also conflict with cases where parallelization is done at a higher level. Maybe this issue becomes somewhat obsolete once vpts-like formats become more common. I think @peterdesmet is working on that; I'm not sure about the progress.
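
For reference, a hedged sketch of a fork-based alternative that avoids PSOCK cluster setup on Linux/macOS (forking is not available on Windows); the files vector and the worker count are assumptions mirroring the snippet above.

  library(parallel)

  # fork-based workers; falls back to a single core if only one is available
  n_workers <- max(1L, parallel::detectCores() - 1L)
  vpl <- parallel::mclapply(files, function(f) bioRad::read_vpfiles(f), mc.cores = n_workers)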

@peterdesmet (Collaborator)

I'll discuss the format with Adriaan next week. The idea is to keep it close to the vol2bird output, but with delimited rather than fixed-width (length-based) columns. This tabular data should be able to capture any timespan for a single radar and have an additional file with metadata.
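
Illustrative only: a sketch of how such a delimited table might be read with base R. The file name is a placeholder and the exact columns are not yet specified; the companion metadata file mentioned above is not read here.

  # hypothetical file name; the real column layout is still to be agreed
  vpts_tab <- read.csv("example_radar_vpts.csv", stringsAsFactors = FALSE)
  str(vpts_tab)  # one row per (datetime, height) combination for a single radar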

@peterdesmet (Collaborator)

With the VPTS CSV file format, users will be able to read data much faster using read_vpts() #551. But improvements could still be made to speed up reading hdf5 files too.
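
A minimal usage sketch once the CSV support of #551 is available; the file path is a placeholder.

  # returns a vpts (vertical profile time series) object
  my_vpts <- bioRad::read_vpts("example_radar_vpts.csv")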
