Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Simplex tree] adds possibility to different insertion strategies #1176

Open
wants to merge 46 commits into
base: master
Choose a base branch
from

Conversation

hschreiber
Copy link
Collaborator

Adds different strategies for insert_simplex_and_subfaces to handle the filtration values of the simplices inserted.

Includes PR #1122 so it should be easier to review once the latter is merged.

Necessary for the multi-simplex tree from #976.

hschreiber and others added 30 commits August 14, 2024 18:53
…b.com:hschreiber/gudhi-devel into simplex_tree_filtration_value_generalization
…version of initialize_filtration + change is_floating_point to has_quit_NaN for more generality + additional unit tests
Co-authored-by: Marc Glisse <marc.glisse@inria.fr>
Comment on lines +1155 to +1156
* - if \f$ \sigma \f$ existed already, then it and all of its potentially already inserted cofaces are pushed to the
* intersection of their old value and \f$ f \f$.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that useful?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can imagine an algorithm changing its mind and wanting to insert a simplex at higher value than before. Which means that all its cofaces potentially also have to be pushed up, for the filtration to remain valid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can make sense, but in my mind that's not an insertion anymore. If anything, it is closer to a removal (and the code and complexity are probably more similar to a removal as well).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then, even with other strategies, inserting an existing simplex is not really an insertion. The old version also updated the filtration values of everyone, just the other way around.

But yes, the complexity is going towards a removal, as it needs to iterate through the cofaces (just not always to the very end, as it can stop whenever it is not necessary anymore). But that is to expect if you want this type of strategy, I don't think the user would be surprised.

But if you really don't like it, I can remove it. I don't have a use for it for now, it just didn't hurt to add it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe to explain my perspective: if you think of a filtered simplicial complex as a set of simplicial complexes (indexed by the filtration value, and with an inclusion when filtration values are comparable), lowering the filtration value of a simplex means inserting it in some of the complexes where it was not yet present, while increasing the filtration value means removing it from some of the complexes.
I am perfectly happy with such functionality being added, but I'd much rather give it a different name than insert.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From an algebraic point of view, I agree that it is a removal and not an insertion. But from a data structural point of view, both strategies do the same, just on different nodes. As a compromise, I would propose to use Update_strategy instead of Insertion_strategy. What do you think? Because I am not convinced that it is a good idea to add a new method for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should ask for more opinions. It seems strange to me to repurpose a function called insert_simplex_and_subfaces to do something that

  1. corresponds to a removal
  2. increases the filtration value of a simplex and its cofaces

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is still a matter of point of view: I see the simplex tree as a structure which stores the simplices of a complex and when store_filtration is enabled, it additionally stores the appearance time of the simplices when the complex is considered filtered. From this point of view, insert is already wrong when a simplex is already inserted, as all the method is doing in that case, is updating the stored appearance time, not inserting it in the simplex tree. But if you allow the method to do so, updating the appearance time of cofaces seems fine too, as the idea the same. And I am pretty sure that a user with the same point of view will not be confused by it.
Things are different when you look at it as an algebraic object of course. But it is difficult to say which point of view is the more intuitive here.
But well, from the discussion below, this will probably not be a problem anymore anyway.

Comment on lines 1153 to 1154
* - if \f$ \sigma \f$ was not inserted yet, then \f$ \sigma \f$ and all its non existing faces are inserted at the
* first possible filtration value with minimum filtration value \f$ f \f$.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a bit strange that it still inserts missing faces: this means that a missing face and a face with filtration value +infinity have very different consequences.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure to understand the comment? If you want to insert a simplex and some of its faces are missing, should insert_simplex_and_subfaces not insert them? I thought that was what the purpose of the method (contrary to insert_simplex). The strategies only apply on that method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a function called insert_*_subfaces should insert subfaces, but the different behavior between missing faces and faces with a high filtration value feels inconsistent (I would understand better if the filtration value was increased to +inf when a face is missing, with the idea that a missing face is the same as a face with infinitely large filtration value).
How does David use the function exactly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If one face is at +inf, then all non existing cofaces of that face (which are faces of the given simplex) will be inserted at +inf, the given filtration value is just a minimum in that case (that is what I meant with "first possible filtration value with minimum filtration value $f$". It makes no sense to have sigma at a finite filtration value when some of its faces are at +inf. I am still struggling to understand if you misunderstood something (which means I have to better explain it in the description), or if I still don't understand your question...

Copy link
Collaborator Author

@hschreiber hschreiber Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a simple example: say you have a triangle {0,1,2}. The vertices {0}, {1} and {2} are already inserted at filtration value 0. Same for edge {0,1} that is inserted at value 5. Now, we use insert_simplex_and_subfaces({0,1,2}, 4, HIGHEST). This will insert edge {0,2} and {1,2} at filtration value 4, but the triangle {0,1,2} at filtration value 5.

Before, insert_simplex_and_subfaces({0,1,2}, 4) (equivalent now to insert_simplex_and_subfaces({0,1,2}, 4, LOWEST)) would have done the same for {0,2} and {1,2}, but would have pushed down {0,1} to 4 et inserted {0,1,2} also at 4.

And finally, insert_simplex_and_subfaces({0,1,2}, 4, FIRST_POSSIBLE) is inserting {0,2} and {1,2} at value 0 and {0,1,2} at value 5.

If you have now "missing simplices" in the sense that they are included but at value +inf, then HIGHEST will insert all their future cofaces also at infinity, while LOWEST will force their appearances by changing their filtration value to the given one while inserting the cofaces.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a simple example: say you have a triangle {0,1,2}. The vertices {0}, {1} and {2} are already inserted at filtration value 0. Same for edge {0,1} that is inserted at value 5. Now, we use insert_simplex_and_subfaces({0,1,2}, 4, HIGHEST). This will insert edge {0,2} and {1,2} at filtration value 4, but the triangle {0,1,2} at filtration value 5.

Would a 4th strategy that inserts 02, 12 and 012 all at filtration value 5 make sense?

Before, insert_simplex_and_subfaces({0,1,2}, 4) (equivalent now to insert_simplex_and_subfaces({0,1,2}, 4, LOWEST)) would have done the same for {0,2} and {1,2}, but would have pushed down {0,1} to 4 et inserted {0,1,2} also at 4.

And finally, insert_simplex_and_subfaces({0,1,2}, 4, FIRST_POSSIBLE) is inserting {0,2} and {1,2} at value 0 and {0,1,2} at value 5.

Is insert(x,*,first_possible) equivalent to insert(x,-inf,lowest)?

I don't know if the names highest/lowest properly convey the main difference: lowest may modify the filtration value of existing simplices (so it can really insert it at the time requested), while the others all keep them fixed. Ah, but that property only exists in my head, since you have cases were insert increases the filtration value of existing cofaces...

If you have now "missing simplices" in the sense that they are included but at value +inf, then HIGHEST will insert all their future cofaces also at infinity, while LOWEST will force their appearances by changing their filtration value to the given one while inserting the cofaces.

I find it a bit confusing that, without 02, I get 012 at time 5, while with 02 at time +inf (supposedly equivalent to not having 02?), I get 012 at time +inf. But maybe that's still useful. Or maybe it could yield a 5th strategy.

I'd still be interested in a reminder of what cases David really cared about when he came up with a different strategy.

Copy link
Collaborator Author

@hschreiber hschreiber Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a simple example: say you have a triangle {0,1,2}. The vertices {0}, {1} and {2} are already inserted at filtration value 0. Same for edge {0,1} that is inserted at value 5. Now, we use insert_simplex_and_subfaces({0,1,2}, 4, HIGHEST). This will insert edge {0,2} and {1,2} at filtration value 4, but the triangle {0,1,2} at filtration value 5.

Would a 4th strategy that inserts 02, 12 and 012 all at filtration value 5 make sense?

It would make sense as it is not wrong. It would just even more disregard the input value than before. But as an additional strategy, it is valid. Complexity wise, I don't know if it would be better, I would have to verify again how the traversal in the recursion is done.

Before, insert_simplex_and_subfaces({0,1,2}, 4) (equivalent now to insert_simplex_and_subfaces({0,1,2}, 4, LOWEST)) would have done the same for {0,2} and {1,2}, but would have pushed down {0,1} to 4 et inserted {0,1,2} also at 4.
And finally, insert_simplex_and_subfaces({0,1,2}, 4, FIRST_POSSIBLE) is inserting {0,2} and {1,2} at value 0 and {0,1,2} at value 5.

Is insert(x,*,first_possible) equivalent to insert(x,-inf,lowest)?

No, it is equivalent to insert(x,-inf,highest). But now that I see it, I could factorise both highest and first_possible.

I don't know if the names highest/lowest properly convey the main difference: lowest may modify the filtration value of existing simplices (so it can really insert it at the time requested), while the others all keep them fixed. Ah, but that property only exists in my head, since you have cases were insert increases the filtration value of existing cofaces...

The idea of highest/lowest was that one strategy inserts at highest choice (between given and potentially already existing value) and the other at lowest choice and then updates the filtration such that it remains valid. It also highlight the duality between both strategies, as one is a bit the "co" version of the other. But I have nothing against a name change.

If you have now "missing simplices" in the sense that they are included but at value +inf, then HIGHEST will insert all their future cofaces also at infinity, while LOWEST will force their appearances by changing their filtration value to the given one while inserting the cofaces.

I find it a bit confusing that, without 02, I get 012 at time 5, while with 02 at time +inf (supposedly equivalent to not having 02?), I get 012 at time +inf. But maybe that's still useful. Or maybe it could yield a 5th strategy.

I don't see "not yet inserted faces" and "missing faces" (at +inf) as something equivalent. A missing face is purposely missing, while a not existing face is a face where you still have to indicate where it should go (at +inf for example).

Perhaps, it would be a good idea to describe me the strategies you would like to have and then I see, if I can replace one of my proposition with yours or if I can add this strategy to the existing ones?

I'd still be interested in a reminder of what cases David really cared about when he came up with a different strategy.

What mattered to David, is that if he constructed a filtration in the simplex tree and later wanted to add a simplex, but without really caring about where, he didn't wanted insert_simplex_and_subfaces(s) to flat partially the filtration down to 0, but to insert the simplex at the first position the already existing filtration permits it. Then, in python, he would potentially modify some filtration values by hand. So, basically the FIRST_POSSIBLE strategy. The idea behind HIGHEST was just to give the possibility to give a lower bound on where to insert the simplex. The subcase of HIGHEST where the simplex already exists was just a natural extension, as this case has to be covered (even though not useful for multipers).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a 4th strategy that inserts 02, 12 and 012 all at filtration value 5 make sense?

It would make sense as it is not wrong. It would just even more disregard the input value than before.

I think it just does things in a slightly different order. It first asks: what is the earliest time I can insert 012? And then does insert_simplex_and_subfaces(simplex_012, that_time). Whereas your version recursively inserts the subfaces with the same strategy.
I mostly suggested it because I have trouble figuring out what strategies are likely to be useful.

Complexity wise, I don't know if it would be better,

Mine should be a bit slower, the simplest way I can see to implement it is to follow yours, inserting all the new Simplex_handle in a vector, and updating their filtration value at the end once we know the final value for the main simplex.

I don't see "not yet inserted faces" and "missing faces" (at +inf) as something equivalent. A missing face is purposely missing, while a not existing face is a face where you still have to indicate where it should go (at +inf for example).

A philosophical difference of opinions...

Perhaps, it would be a good idea to describe me the strategies you would like to have and then I see, if I can replace one of my proposition with yours or if I can add this strategy to the existing ones?

I don't have particular strategies I want. We could easily add a dozen strategies. I am just afraid that at most a couple of them would be useful and we would confuse users into picking the wrong one, or at least waste time trying to figure out what to use. But maybe that's still what we should do.

What mattered to David, is that if he constructed a filtration in the simplex tree and later wanted to add a simplex, but without really caring about where, he didn't wanted insert_simplex_and_subfaces(s) to flat partially the filtration down to 0,

It is true that, when we don't explicitly specify a filtration value, this behavior can be annoying. I guess it wouldn't have been a problem if we had used +inf as the default filtration value (although David favors -inf...).

but to insert the simplex at the first position the already existing filtration permits it.

I guess that makes sense, although it isn't completely obvious in this case if we want each new subface to appear also as early as possible (equivalent to calling the function on each missing subface first) or if it is better to have everything appear at the same time (it is one insertion, turning it into multiple insertions at multiple times could be unexpected).

Then, in python, he would potentially modify some filtration values by hand.

I am unsure how that part works. One possibility, if you are going to overwrite the filtration values anyway, would be to insert at +inf. Then at least you can identify the new subfaces as those with +inf filtration value. Or you could be satisfied with the values that were computed for the subfaces and not modify them. But I suspect that maybe he was only using this function when all the subfaces were already present?

The idea behind HIGHEST was just to give the possibility to give a lower bound on where to insert the simplex.

Makes sense.

The subcase of HIGHEST where the simplex already exists was just a natural extension, as this case has to be covered (even though not useful for multipers).

I see... The way HIGHEST is defined in the normal case, if we try to see what should happen if the simplex is already there with a lower filtration value, the options seem to be

  1. do nothing (this violates the idea that the value should be a lower bound, but we could easily say that this lower bound only applies to new simplices, we don't increase the filtration value of faces that are already there after all)
  2. increase the filtration value of the simplex (and that of the cofaces if necessary)
  3. refuse to do it (throw an exception or something)

Copy link
Collaborator Author

@hschreiber hschreiber Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have particular strategies I want. We could easily add a dozen strategies. I am just afraid that at most a couple of them would be useful and we would confuse users into picking the wrong one, or at least waste time trying to figure out what to use. But maybe that's still what we should do.

I agree, we shouldn't try to write all possible strategies here. We should choose among the strategies we think are most commonly useful. That is what I tried to do with those three options, as I now that they are useful for multipers (except for the modifications of the cofaces, as this method is never used on already existing simplices there). So let me reformulate: what do you think would be useful strategies to replace or complement the ones I suggested?

I guess that makes sense, although it isn't completely obvious in this case if we want each new subface to appear also as early as possible (equivalent to calling the function on each missing subface first) or if it is better to have everything appear at the same time (it is one insertion, turning it into multiple insertions at multiple times could be unexpected).

I know that David needed the first version, but I agree, the second one could also be interesting. I cannot say what would be better: implement both or chose one.

I am unsure how that part works. One possibility, if you are going to overwrite the filtration values anyway, would be to insert at +inf. Then at least you can identify the new subfaces as those with +inf filtration value. Or you could be satisfied with the values that were computed for the subfaces and not modify them. But I suspect that maybe he was only using this function when all the subfaces were already present?

I admit, I don't really remember the details, in particular because he changed his strategy several times. I just remember that for a while, he modified the method such that it just ignored the filtration values and then he overwrote everything in Python. Later, he asked me to modify the insertion method so that he can avoid having to do that in most cases.

I see... The way HIGHEST is defined in the normal case, if we try to see what should happen if the simplex is already there with a lower filtration value, the options seem to be

1. do nothing (this violates the idea that the value should be a lower bound, but we could easily say that this lower bound only applies to **new** simplices, we don't increase the filtration value of faces that are already there after all)

2. increase the filtration value of the simplex (and that of the cofaces if necessary)

3. refuse to do it (throw an exception or something)

Yes. I went for 2, but it does not have to be that way. If you think that this solution violates the idea of insertion, we could just go for 1, as the simplex would already be considered inserted in the filtered complex at filtration value "lower bound". Throwing would somehow go against the current strategy, where it does not throw when a simplex is already inserted in a lower complex than given by the argument filtration value (which would also be a removal otherwise). It just does nothing.

Comment on lines 1153 to 1154
* - if \f$ \sigma \f$ was not inserted yet, then \f$ \sigma \f$ and all its non existing faces are inserted at the
* first possible filtration value with minimum filtration value \f$ f \f$.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a 4th strategy that inserts 02, 12 and 012 all at filtration value 5 make sense?

It would make sense as it is not wrong. It would just even more disregard the input value than before.

I think it just does things in a slightly different order. It first asks: what is the earliest time I can insert 012? And then does insert_simplex_and_subfaces(simplex_012, that_time). Whereas your version recursively inserts the subfaces with the same strategy.
I mostly suggested it because I have trouble figuring out what strategies are likely to be useful.

Complexity wise, I don't know if it would be better,

Mine should be a bit slower, the simplest way I can see to implement it is to follow yours, inserting all the new Simplex_handle in a vector, and updating their filtration value at the end once we know the final value for the main simplex.

I don't see "not yet inserted faces" and "missing faces" (at +inf) as something equivalent. A missing face is purposely missing, while a not existing face is a face where you still have to indicate where it should go (at +inf for example).

A philosophical difference of opinions...

Perhaps, it would be a good idea to describe me the strategies you would like to have and then I see, if I can replace one of my proposition with yours or if I can add this strategy to the existing ones?

I don't have particular strategies I want. We could easily add a dozen strategies. I am just afraid that at most a couple of them would be useful and we would confuse users into picking the wrong one, or at least waste time trying to figure out what to use. But maybe that's still what we should do.

What mattered to David, is that if he constructed a filtration in the simplex tree and later wanted to add a simplex, but without really caring about where, he didn't wanted insert_simplex_and_subfaces(s) to flat partially the filtration down to 0,

It is true that, when we don't explicitly specify a filtration value, this behavior can be annoying. I guess it wouldn't have been a problem if we had used +inf as the default filtration value (although David favors -inf...).

but to insert the simplex at the first position the already existing filtration permits it.

I guess that makes sense, although it isn't completely obvious in this case if we want each new subface to appear also as early as possible (equivalent to calling the function on each missing subface first) or if it is better to have everything appear at the same time (it is one insertion, turning it into multiple insertions at multiple times could be unexpected).

Then, in python, he would potentially modify some filtration values by hand.

I am unsure how that part works. One possibility, if you are going to overwrite the filtration values anyway, would be to insert at +inf. Then at least you can identify the new subfaces as those with +inf filtration value. Or you could be satisfied with the values that were computed for the subfaces and not modify them. But I suspect that maybe he was only using this function when all the subfaces were already present?

The idea behind HIGHEST was just to give the possibility to give a lower bound on where to insert the simplex.

Makes sense.

The subcase of HIGHEST where the simplex already exists was just a natural extension, as this case has to be covered (even though not useful for multipers).

I see... The way HIGHEST is defined in the normal case, if we try to see what should happen if the simplex is already there with a lower filtration value, the options seem to be

  1. do nothing (this violates the idea that the value should be a lower bound, but we could easily say that this lower bound only applies to new simplices, we don't increase the filtration value of faces that are already there after all)
  2. increase the filtration value of the simplex (and that of the cofaces if necessary)
  3. refuse to do it (throw an exception or something)

Co-authored-by: Marc Glisse <marc.glisse@inria.fr>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants