Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Persistence_matrix] move bar type outside #1083

Merged
merged 9 commits into from
Jul 10, 2024

Conversation

hschreiber
Copy link
Collaborator

To be able to use it also for Zigzag persistence (PR #917) more easily.

I also added the possibility to use std::get and std::tuple_size for the bars for the compatibility with other modules. std::tuple_size just doesn't work for structure inheriting from a Persistence_interval.

Copy link
Member

@mglisse mglisse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we have different stylistic tastes. I tend to favor short:

  • Persistence_interval(dimension_type dim, event_value_type birth, event_value_type death) doesn't need any description (it should still appear in the doc), the declaration is self-documenting.
  • A comment at the beginning that Persistence_interval is usable like a tuple that stores birth, death and dim in this order with std::tuple_element, std::tuple_size, std::get and structured binding, may be clearer than explicitly and redundantly documenting each specialization and overload.

But we don't have to follow my tastes 😉

*
* @brief Overload of `std::get` for @ref Gudhi::persistence_matrix::Persistence_interval.
*
* @tparam I Index of the value to return: 0 for the birth value, 1 for the death value and 2 for the dimension.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems confusing that the constructor uses the order dim-birth-death while the access is in order birth-death-dim. Can't we pick one consistent order? Or are you trying to be consistent with 2 existing, incompatible types, and this is some compromise?

Copy link
Collaborator Author

@hschreiber hschreiber Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially did all the matrix, zigzag etc. with the order dim-birth-death in mind. But when I was looking at making it consistent with the output of cohomology when adding the std::get overloads, I saw that the tuples are in order birth-death-coeff. So I wanted the std::get to stick closer to that. I thought that it won't be much of a problem as the users using the constructors them-selves would probably not use the std::get to access the members and therefore not be confused. But I agree, it was mostly laziness on my side, as I didn't wanted to correct all the calls to the bars in the matrix to change the order... But I could do that if you find it really too confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I like the new consistent order better (I don't mind that printing is in a different order because the [] around the dimension don't make it look like a tuple).
A drawback is that we are probably less likely to be able to use the short versions of the constructor with 1 or 2 elements 🤷

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I used the short version in zigzag and will have to use the long version now. But it is not the end of the world. Or I could go back to have three different constructors.

@hschreiber
Copy link
Collaborator Author

I guess we have different stylistic tastes. I tend to favor short:

* `Persistence_interval(dimension_type dim, event_value_type birth, event_value_type death)` doesn't need any description (it should still appear in the doc), the declaration is self-documenting.

* A comment at the beginning that Persistence_interval is usable like a tuple that stores birth, death and dim in this order with std::tuple_element, std::tuple_size, std::get and structured binding, may be clearer than explicitly and redundantly documenting each specialization and overload.

But we don't have to follow my tastes 😉

By experience I realized that the more redundant the documentation is, the better the people are understanding it... Even though I personally also find it often too much. So, I will try to make a mix of the two approaches by adding what you mentioned in the main description.

@@ -0,0 +1,215 @@
/* This file is part of the Gudhi Library - https://gudhi.inria.fr/ - which is released under MIT.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File should be named Persistence_interval.h, cf. code convention

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not the only file which doesn't follow this convention in the module. Is it fine if I correct them all in this PR or should I do another one? Also the example_* are not respected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't mix mass renaming and actual changes in the same branch, it makes it super hard to review.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but as the changes of this PR are mostly in just one file, I thought there could be an exception. But no problem in doing another PR for the renamings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as the changes of this PR are mostly in just one file

If you rename things all over the place in other files, it will take a reviewer a long time just to make sure that this file is the only one with non-trivial changes.
(I didn't look at what the renaming was about)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but I don't think there will be other reviewesr than you two, no? But, like I said, it makes sense to make another PR out of it, it was more of a "you never know" question. The new PR is here: #1106.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VincentRouvreau I am not sure anymore if I should really rename persistence_interval.h to Persistence_interval. The convention says:

- If a single class or function is provided in a file, its name (with the same letter case) should be used for the file name.
- If a file does not contain a single class, its name should not begin with a capital letter.

The file does not contain only the Persistence_interval class, but also different global methods (the get etc.) and global structures (tuple_element). So, the second point of the convention should apply, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I never read the convention and don't really care)
I would argue that those are part of the interface of this class. Whether they are free functions, static members, etc is a detail.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, that is why they are in the same file. But I am not sure that is how I interpret the convention. I saw the two namings a way for the developer to search more easily: "if there is a upper case, don't need to look here for this one method related to it, it has to be somewhere else". Otherwise, I am not sure to see the utility to have two different naming convention for files...

Comment on lines 68 to 70
inline friend std::ostream &operator<<(std::ostream &stream, const Persistence_interval &interval) {
stream << "[" << interval.dim << "] ";
stream << interval.birth << ", " << interval.death;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inf values should be detected and print something else.

  using Dimension_type = typename Matrix::dimension_type;
  using Pos_index = typename Matrix::pos_index;
  using Interval = Gudhi::persistence_matrix::Persistence_interval<Dimension_type, Pos_index>;

  Interval bar_default;
  std::cout << bar_default << std::endl;
  // prints: [-1] 4294967295, 4294967295

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the same remark than Marc, but:

  Interval bar(0,1,2);
  std::cout << bar << std::endl;
  // print: [0] 1, 2
  std::cout << std::get<0>(bar) << ", " << std::get<1>(bar) << ", " << std::get<2>(bar) << std::endl;
  // print: 1, 2, 0

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure to understand which remark of Marc you are referring to?

Copy link
Member

@mglisse mglisse Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the one about having a consistent order on the elements of the tuple between the constructor and get (and operator<<).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I already explained why it was like that, I was waiting for an answer. So, should I change the order in the constructor to match the std::gets? But for the print, I like it better to have the dimension first. [0] 1, 2 is much easier to read than 1, 2 [0] in particular when there are a lot of bars.

@mglisse
Copy link
Member

mglisse commented Jul 9, 2024

Side remark: I am surprised your intervals always come with a dimension. Can you comment on that?

@hschreiber
Copy link
Collaborator Author

Side remark: I am surprised your intervals always come with a dimension. Can you comment on that

Until now, all use cases I had for barcodes needed the dimension somehow (to plot them separately for example), so I thought that adding the dimension is a natural thing to do. In particular when getting the dimension from the barcode using the matrix is not always possible (eg. when MatIdx != PosIdx). But it could be an option if you think that there are enough cases where the dimension is not needed and will therefore just take up useless space in the barcode.

@mglisse mglisse merged commit 272c442 into GUDHI:master Jul 10, 2024
7 checks passed
@mglisse
Copy link
Member

mglisse commented Jul 10, 2024

Until now, all use cases I had for barcodes needed the dimension somehow (to plot them separately for example), so I thought that adding the dimension is a natural thing to do. In particular when getting the dimension from the barcode using the matrix is not always possible (eg. when MatIdx != PosIdx). But it could be an option if you think that there are enough cases where the dimension is not needed and will therefore just take up useless space in the barcode.

When computing usual persistence, the matrix doesn't need to know anything about dimensions. We can have a single matrix for all dimensions, or a separate matrix for each dimension.
When storing a diagram, if we care about dimensions, instead of a list of (birth, death, dim), we could store for each dimension a list of (birth, death), that's essentially the new format we are using in Python, and it is more compact.
(I am just explaining why I was surprised, not telling you to go change everything right now)

@hschreiber
Copy link
Collaborator Author

Until now, all use cases I had for barcodes needed the dimension somehow (to plot them separately for example), so I thought that adding the dimension is a natural thing to do. In particular when getting the dimension from the barcode using the matrix is not always possible (eg. when MatIdx != PosIdx). But it could be an option if you think that there are enough cases where the dimension is not needed and will therefore just take up useless space in the barcode.

When computing usual persistence, the matrix doesn't need to know anything about dimensions. We can have a single matrix for all dimensions, or a separate matrix for each dimension. When storing a diagram, if we care about dimensions, instead of a list of (birth, death, dim), we could store for each dimension a list of (birth, death), that's essentially the new format we are using in Python, and it is more compact. (I am just explaining why I was surprised, not telling you to go change everything right now)

True, for computing the persistence the dimensions are not needed per se, but it can be useful for some optimization during reduction. Also with it, the matrix can be used a bit like a simplex tree by storing commonly needed information about your filtration the user don't have to manage anymore. But I agree that we could simplify the structure by removing completely the notion of dimension from it (or make it completely optional to keep the optimization of $\partial \circ \partial = 0$).

@VincentRouvreau VincentRouvreau added the 3.11.0 GUDHI version 3.11.0 label Jul 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11.0 GUDHI version 3.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants