Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't panic on empty ranges. #78

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

xfbs
Copy link
Contributor

@xfbs xfbs commented Jan 30, 2024

Currently, on insertion there are assertions to check for invalid ranges:

// RangeMap::insert
assert!(range.start < range.end);
// RangeInclusiveMap::insert
assert!(
    range.start() <= range.end(),
    "Range start can not be after range end"
);

According to the standard library, a range is empty if start >= end (or start > end for inclusive ranges). Therefore, from the point of view of the standard library, ranges such as 0..0 or 15..=14 are perfectly valid (but meaningless) and therefore I would argue that this this library should not panic when they are used. I have the opinion that libraries should never panic, if there is an invalid use that needs to be reported then a Result can be used, because having code that might panic in production systems is quite scary.

Consumers of this library that want to catch these insertions and wish to treat these as errors can always do that themselves quite easily, by inserting assertions at the call site:

assert(range.start < range.end);
map.insert(range, value);

However, it is not possible to go the opposite way and remove an assertion from a consumer of the library. There are some valid use-cases where having empty ranges might be useful, for example to represent missing data or as the output of some algorithms.

Therefore I think it is best to remove these assertions and simply do nothing (return early) when empty ranges are inserted or removed. This allows for the least amount of surprises for users of the library. Ideally, I feel that rangemap should never cause panics for valid input, even if the inputs don't make sense.

According to the standard library, a range is empty if start >= end. An
inclusive range is empty if start > end. While it makes no sense to
insert such a range, it should not cause a panic as these ranges are not
explicitly disallowed by the standard library.
@jeffparsons
Copy link
Owner

IIRC my original reasoning for this is that if you're inserting an empty range, that has no meaning, and so I assume it is a program error and therefore suitable to panic. I don't feel very strongly about this, though.

As for changing it now, I think that would be a breaking change; being more permissive with types is not breaking, but this would change the observable behaviour of existing programs.

It's extremely unlikely that anybody is relying on the current behaviour in practice, but I need to have a think about whether or not this is okay to change.

@xfbs
Copy link
Contributor Author

xfbs commented Jan 31, 2024

It is up to you what you think makes sense. I have only two reasons why I think this might make sense from a user's perspective (from what I have experienced in practise):

  • For example, if you use proptest to generate arbitrary ranges, it will generate empty ranges (so 140..140 or 140..=141 for the inclusive ones). Which means that when I use proptest, I always have to filter out empty ranges or risk getting crashes.
  • For some code, it is very convenient when you can express the idea of an empty range. For example, I have some code paths where I generate the intersection between two ranges at work. Some ranges have no intersection at all. I have to express that as an Option<Range<T>>, because even though the standard library supports the idea of an empty range, the moment I try to insert that into a RangeMap or RangeSet I get a crash, which is obviously not good. Obviously, using an Option is not a real issue here and I would even keep it this way, but others may think that if the standard library supports empty ranges they might as well use them.

So I think the reason I think this might be valuable to change is that people might have semi-good reasons for having empty ranges somewhere in their data pipelines, and if we have three possible behaviours:

  • crash
  • do nothing
  • return error

Then doing "crash" is perhaps the least good option (I think libraries should ideally never panic). Returning an error might be an option but is quite invasive. Doing nothing in my opinion might be the best option here.

It's extremely unlikely that anybody is relying on the current behaviour in practice, but I need to have a think about whether or not this is okay to change.

Fair :) it is possible to rely on this by catching panics, however I think that is quite unlikely outside of unit testing. But it would be nice to document a guarantee that "no matter how you use this API, we will never panic". I think that is a requirement for at least some users.

@ripytide
Copy link

I feel this sort of thing comes down heavily to a users use-cases, for me I like having empty ranges panic because in my use-case empty ranges should be impossible so it's nice that this is always being asserted. On the other hand if your use-case often has empty ranges then having empty ranges being ignored could be nicer too.

Perhaps this sort of thing could be enabled/disabled by a feature? Something like panic_on_empty_range

@xfbs
Copy link
Contributor Author

xfbs commented Jan 31, 2024

I quite like the idea of using a feature here.

Although, some consideration would need to be made, as Cargo performs feature unification. It would be dangerous if there was some code that relied on there being assertions, and then those assertions being disabled because another unrelated crate in the same project enables that feature (or the other way around).

@ripytide
Copy link

An alternative to using features would be define a trait for every method of that takes a range as input, then define two different types PanicOnEmptyRangeMap and IgnoreEmptyRangeMap that implement those trait methods and panic/ignore respectively. This wouldn't suffer from cargo's feature unification but would make the library code a lot more complicated and possibly more difficult to maintain. Although thinking about it PanicOnEmptyRangeMap could simply wrap IgnoreEmptyRangeMap adding an assertion beforehand.

The other option you mentioned of returning a result from each of these methods could work but then you'd have to unwrap() every call for the panic behavior and match it ignoring the EmptyRange variant for the ignore behavior which sounds a bit verbose for me.

@xfbs
Copy link
Contributor Author

xfbs commented Jan 31, 2024

It almost feels like the idea of returning a result gives the most flexibility, because then the caller can decide whether to handle that. Something like:

#[derive(thiserror::Error, Debug)]
pub enum InsertError {
    #[error("empty range inserted")]
    EmptyRange,
}

impl RangeSet<T> {
    pub fn insert(range: Range<T>) -> Result<(), InsertError> {
        if empty(&range) {
            return Err(InsertError::EmptyRange);
        }

        // ...
    }
}

And then you can preserve the existing behaviour by calling set.insert(range).expect("range cannot be empty") or you can explicitly ignore the result.

I generally love overengineering things with the type system, but it feels a bit overkill to achieve this when a result can do the trick. Although, I would still lean a little bit on just doing nothing, just because semantically you can also append an empty vector to a Vec and it works, just does nothing (and an empty range is the same concept, kind of):

let mut data = vec![];
// append empty vec, does nothing.
data.append(&mut vec![]);

@ripytide
Copy link

Yeah that seems like the cleanest solution. I think I might implement that for nodit as well.

@ripytide
Copy link

ripytide commented Jan 31, 2024

Although for consistency shouldn't all the other methods that take ranges also now return results? Like overlapping(), overlaps(), remove() and gaps()? In which case you might not be able to call the error InsertError, you could either make an error enum for each method or a single global EmptyRange error struct that gets returned by all those methods.

@xfbs
Copy link
Contributor Author

xfbs commented Jan 31, 2024

Ah, I had not seen that other methods also have assertions. In that case, I don't think returning errors everywhere is a good way to go, because it is too broad of a change and maybe not entirely useful to most people.

I think the two courses of action then depend on what we define empty ranges as:

  • Treat empty ranges as empty ranges, meaning that on insertion it does nothing (as they are empty, there is nothing meaningful to do), when removing do nothing (there is nothing to remove) and for the iterators simply return nothing (because nothing overlaps.. nothing). So no errors are needed and no assertions either, just checks and early returns.
  • Treat empty ranges as invalid and panic() via an assertion. This is the current behaviour. However, when doing so the fact that the crate can panic on (what the standard library considers empty) ranges should be documented with a warning to prevent people from running into issues with their code crashing in deployments.

The reason for this is that I don't think it is a good idea nor feasible to make all methods return errors. The feature approach could still work, but since it changes the semantics of the range maps and sets, it is not all that easy. I think it would really be simpler to use approach (1). However, if the feature approach is taken one suggestion would be to call it something like debug-panic-empty-range, to clearly mark that this is a feature that is intended to be enabled for debugging, and not in production, which might avoid the feature unification issue.

@schneiderfelipe
Copy link
Contributor

  • Treat empty ranges as empty ranges, meaning that on insertion it does nothing (as they are empty, there is nothing meaningful to do), when removing do nothing (there is nothing to remove) and for the iterators simply return nothing (because nothing overlaps.. nothing). So no errors are needed and no assertions either, just checks and early returns.

I think the idea of doing nothing on empty ranges is the correct one, but instead of getting rid of all assertions, why not make them all debug_assert!s?

@xfbs
Copy link
Contributor Author

xfbs commented Nov 27, 2024

I think the issue with that is that it means there is still an inconsistency. Then in release mode, empty ranges are fine, but in debug mode it is an error. In my opinion, adding an empty range should be like appending an empty list to a list: do nothing. Not panic, even in debug mode.

I think it is easier for end-users to add assertions for this case, and for the library to (strongly) guarantee that it does not panic for any input. Maybe long-term, return an error that can be handled, if needed.

Another option is to have a separate API, for example insert() will do nothing if an empty range is given, and insert_checked() will return an error. Then users have a choice, again.

I think that panic() in general is not such a great user-experience. Especially when there could be valid use-cases for empty range, like if you have some iterator over ranges (some of which might be empty) and you want to .collect(), you shouldn't have to .filter() them to make your application not crash.

@schneiderfelipe
Copy link
Contributor

I think the issue with that is that it means there is still an inconsistency. Then in release mode, empty ranges are fine, but in debug mode it is an error. In my opinion, adding an empty range should be like appending an empty list to a list: do nothing. Not panic, even in debug mode.

Fair enough!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants