tslearn needs uniformly sampled and same sized time series #3

Closed
MaxBenChrist opened this issue Aug 10, 2017 · 5 comments

Comments

@MaxBenChrist

MaxBenChrist commented Aug 10, 2017

After a first look at the package, I was wondering whether tslearn needs uniformly sampled, equal-length time series?

@rtavenar
Member

Hi,

tslearn accepts time series of different lengths in a dataset (see here for example), but the documentation (and the checks done in the code) clearly fail to state whether each algorithm expects equal-length time series or not.
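
To illustrate how series of different lengths can sit in one dataset, here is a minimal sketch, assuming NaN padding up to the longest series. The helpers `pad_with_nan` and `true_length` are hypothetical names chosen for illustration; they are not tslearn functions and may differ from how tslearn stores such data.

```python
import math

def pad_with_nan(series_list):
    """Pad univariate series with NaN so that all rows share the longest length."""
    max_len = max(len(s) for s in series_list)
    return [list(s) + [math.nan] * (max_len - len(s)) for s in series_list]

def true_length(padded_row):
    """Count the leading non-NaN values, i.e. the original series length."""
    n = 0
    for value in padded_row:
        if math.isnan(value):
            break
        n += 1
    return n

# Two series of lengths 3 and 2 become a 2 x 3 matrix-like structure.
dataset = pad_with_nan([[1.0, 2.0, 3.0], [4.0, 5.0]])
lengths = [true_length(row) for row in dataset]
```

An algorithm that only accepts equal-length series could compare `lengths` up front and raise a clear error, which is the kind of check mentioned above. Note that this sketch cannot distinguish padding from a genuine missing value at the end of a series.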

Concerning sampling, you are right: at the moment there is a uniform sampling assumption, mainly because the algorithms implemented so far do not take sampling into account, if I am correct. This is also something that should be discussed at some point: if you have experience with this, I would be glad to hear from you.
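
As a rough sketch of what checking the uniform sampling assumption could look like, the snippet below tests whether consecutive timestamps are equally spaced. The function name `is_uniformly_sampled` and the tolerance parameter are assumptions made for this example, not part of tslearn.

```python
import math

def is_uniformly_sampled(timestamps, rel_tol=1e-9):
    """Return True if all gaps between consecutive timestamps are (nearly) equal."""
    if len(timestamps) < 3:
        # Zero or one gap: nothing to compare against.
        return True
    steps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return all(math.isclose(step, steps[0], rel_tol=rel_tol) for step in steps)

regular = is_uniformly_sampled([0.0, 0.5, 1.0, 1.5])
irregular = is_uniformly_sampled([0.0, 1.0, 3.0])
```

Series that fail such a check would typically need resampling or interpolation before being passed to an algorithm that ignores timestamps.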

@rtavenar
Member

In the end, this issue seems related to Issue #2. So I might close this one as soon as the time series format is properly documented and (most importantly) the associated machine learning algorithms get flags in the docs indicating whether:

  1. they don't care about the time series format (i.e. are supposed to run whatever the format, which should be tested);
  2. they accept only time series of equal length.

@MaxBenChrist
Author

MaxBenChrist commented Aug 31, 2017

There are four main questions that arise when working with time series data:

  1. Can the time series have different lengths?
  2. Are the time series uniformly sampled?
  3. Are the time series allowed to have missing values, NaNs etc.?
  4. Are multivariate time series allowed?

Depending on the answers, matrix-based formats cannot be used; one has to use stacked formats instead.
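
To make the difference concrete, here is a minimal sketch of a stacked (long) format, assuming each observation is stored as one `(series_id, timestamp, value)` row. The helper names `to_stacked` and `from_stacked` are hypothetical and only serve this example.

```python
def to_stacked(series_by_id):
    """Flatten {id: [(timestamp, value), ...]} into (id, timestamp, value) rows."""
    rows = []
    for series_id, observations in series_by_id.items():
        for timestamp, value in observations:
            rows.append((series_id, timestamp, value))
    return rows

def from_stacked(rows):
    """Group stacked rows back into one list of (timestamp, value) per id."""
    series_by_id = {}
    for series_id, timestamp, value in rows:
        series_by_id.setdefault(series_id, []).append((timestamp, value))
    return series_by_id

# Series "a" has three irregularly spaced points, series "b" only one.
ragged = {"a": [(0.0, 1.0), (0.4, 2.0), (1.5, 3.0)], "b": [(0.0, 5.0)]}
stacked = to_stacked(ragged)
restored = from_stacked(stacked)
```

Because every row carries its own identifier and timestamp, this layout handles different lengths and non-uniform sampling without padding, and missing observations are simply absent rows. A matrix needs one fixed time axis shared by all series.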

I think the best place to have a discussion on this topic would be my repository with the list of time series Python packages. I will add some documents there and then link to them here.

@rtavenar
Member

A note has been added to the docs of all methods that do not accept time series of different lengths. So if nothing is specified, the algorithm should run even if the dataset contains time series of different lengths.
