Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Make use of pattern matching on singleton type instead. #1026

Merged
merged 1 commit into from
Jan 27, 2024

Conversation

He-Pin
Copy link
Member

@He-Pin He-Pin commented Jan 23, 2024

@mdedetrich
Copy link
Contributor

Build is failing

@He-Pin He-Pin marked this pull request as draft January 23, 2024 04:20
@He-Pin

This comment was marked as outdated.

@som-snytt
Copy link

I just tried it. I forgot about scala/bug#12569

which I think it's just necessary to @nowarn. Alternatively, -Wconf something something.

That is some kind of pattern matching bug, but it's "just a warning".

@som-snytt
Copy link

som-snytt commented Jan 23, 2024

Aliasing it and hiding the type seems to work without nowarn.

val fallback: AnyRef = NotApplied.asInstanceOf[AnyRef]
          pf.applyOrElse(grab(in), NotApplied) match {
            case _: fallback.type => completeStage()

The stdlib examples all "cache" the canonical fallback; I don't think this is why, but it couldn't hurt.

@mdedetrich
Copy link
Contributor

Hmm, with all of the workarounds not sure if this is that much better than whats currently there

case _ => throw new IllegalStateException() // won't happen, compiler exhaustiveness check pleaser
case _: NotApplied.type => super.onUpstreamFailure(ex)
case t: Throwable => super.onUpstreamFailure(t)
case _ => throw new IllegalStateException() // won't happen, compiler exhaustiveness check pleaser
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better which cover the error case. @mdedetrich

@He-Pin
Copy link
Member Author

He-Pin commented Jan 23, 2024

val fallback: AnyRef = NotApplied.asInstanceOf[AnyRef]
pf.applyOrElse(grab(in), NotApplied) match {
case _: fallback.type => completeStage()

@som-snytt what about

          pf.applyOrElse(grab(in), NotApplied: AnyRef) match {
            case _: NotApplied.type => completeStage()

@He-Pin
Copy link
Member Author

He-Pin commented Jan 23, 2024

I just run the bench
image

@mdedetrich @som-snytt cc, performance is the same as eq.

BTW, the sbt-jmh issue on Windows been verified fixed, thanks.

And I will run this bench again when I find time.

@He-Pin He-Pin marked this pull request as ready for review January 23, 2024 06:37
@He-Pin
Copy link
Member Author

He-Pin commented Jan 23, 2024

Jmh/run -i 10 -wi 10 -f1 -t1 .CollectBenchmark.

[info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
[info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
[info] experiments, perform baseline and negative tests that provide experimental control, make sure
[info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark                          Mode  Cnt         Score        Error  Units
[info] CollectBenchmark.benchNewCollect  thrpt   10  15246702.956 ± 127702.345  ops/s
[info] CollectBenchmark.benchOldCollect  thrpt   10  14828734.816 ± 112590.257  ops/s
[success] Total time: 773 s (12:53), completed 2024年1月23日 下午10:41:06

@He-Pin
Copy link
Member Author

He-Pin commented Jan 23, 2024

@mdedetrich @pjfanning ping, what's your option on this, I tested seems the performance is the same as my previous commit.

Copy link
Member

@Roiocam Roiocam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[info] Benchmark                            Mode  Cnt        Score        Error  Units
[info] CollectBenchmark.benchNewCollect    thrpt    3  9147229.404 ± 255711.579  ops/s
[info] CollectBenchmark.benchOldCollect    thrpt    3  8889505.583 ± 985315.585  ops/s
[info] CollectBenchmark.oldIFACMPNE        thrpt    3  8899066.666 ± 271187.167  ops/s

Just a quick bench, looks great.

But the next benchmark was totally difference:

[info] Benchmark                            Mode  Cnt        Score        Error  Units
[info] CollectBenchmark.benchNewCollect    thrpt    3  9174780.179 ± 828160.581  ops/s
[info] CollectBenchmark.benchOldCollect    thrpt    3  9178338.944 ± 320020.649  ops/s
[info] CollectBenchmark.oldIFACMPNE        thrpt    3  9120740.869 ± 101541.057  ops/s

@@ -43,18 +44,17 @@ private[pekko] final class CollectWhile[In, Out](pf: PartialFunction[In, Out]) e
private lazy val decider = inheritedAttributes.mandatoryAttribute[SupervisionStrategy].decider
import Collect.NotApplied

@nowarn("msg=Any")
override final def onPush(): Unit =
try {
// 1. `applyOrElse` is faster than (`pf.isDefinedAt` and then `pf.apply`)
// 2. using reference comparing here instead of pattern matching can generate less and quicker bytecode,
// eg: just a simple `IF_ACMPNE`, and you can find the same trick in `Collect` operator.
// If you interest, you can check the associated PR for this change and the
// current implementation of `scala.collection.IterableOnceOps.collectFirst`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will these comments be outdated after this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the it's still apply, maybe drop the :

// If you interest, you can check the associated PR for this change and the current implementation of scala.collection.IterableOnceOps.collectFirst.

} else {
push(out, result.asInstanceOf[Out])
pf.applyOrElse(grab(in), NotApplied) match {
case _: NotApplied.type => completeStage()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not match on the singleton instance instead? In any case, it seems uncommon to match on the singleton type.

It's not even easy to say if matching on the type or the reference is supposed to be faster in general, and maybe the Scala compiler should make the right choice. (On one hand, reference comparison is as fast as it gets, on the other hand, getting the singleton instance reference is at least a static field lookup which might not be faster than comparing types, would depend on the JIT compiler).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not match on the singleton instance instead? In any case, it seems uncommon to match on the singleton type.

I am under the impression it only works if you match on the type and not on the instance, @som-snytt can you confirm?

By only works I mean producing the optimal bytecode which was the point of the change in the first place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  def collect[A](a: A, pf: PartialFunction[A, A]): A = {
    pf.applyOrElse(a, NotApplied) match {
      case NotApplied => a
      case _          => a
    }
  }

  def collect2[A](a: A, pf: PartialFunction[A, A]): A = {
    pf.applyOrElse(a, NotApplied) match {
      case _: NotApplied.type => a
      case _                  => a
    }
  }

ByteCode:
image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mdedetrich @jrudolph I think just as @som-snytt said , the current version should generate less bytecode.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clunky pattern is just to avoid a clunky eq.

You might use a local val to avoid "cost of obtaining the value".

This is like pattern match on List with Nil case, where match on :: first, then just case nil. It's nice if you have a type to check.

An example where just "obtaining the value" is expensive is NoSymbol in the compiler, because of the "cake". There are edits trying to choose between eq and == but that is swamped by the fetch. It is outweighed by style concerns.

If you can measure a difference reliably, then of course you are equipped to choose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL, thanks for the explanations. I see how matching on the instance needs to call equals, especially if the scrutinee is of Any type (though, this could be optimized away, if you know that the singleton does not override equals?. Somewhat surprising to see that for this type match the reference comparison is preferred over a checkcast (though, of course, I don't think any of this matters).

@He-Pin
Copy link
Member Author

He-Pin commented Jan 27, 2024

final results:
Jmh/run -i 10 -wi 10 -f1 -t1 .CollectBenchmark.

[info] Benchmark                                      Mode  Cnt                  Score                 Error  Units
[info] CollectBenchmark.benchCollectOnSingleton      thrpt   10  17608004768421276.000 ± 192500018073910.500  ops/s
[info] CollectBenchmark.benchCollectOnSingletonType  thrpt   10  17912827936113308.000 ± 239976332956371.340  ops/s
[info] CollectBenchmark.benchNewCollect              thrpt   10           15283510.041 ±          460993.700  ops/s
[info] CollectBenchmark.benchOldCollect              thrpt   10           14845265.443 ±          178628.150  ops/s

@He-Pin
Copy link
Member Author

He-Pin commented Jan 27, 2024

After carefully bench and diff the bytecode, I think this one should be ready to go.

Copy link
Contributor

@mdedetrich mdedetrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm from my side, would also wait for another review from someone else.

Copy link
Contributor

@pjfanning pjfanning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@He-Pin He-Pin merged commit db4f573 into apache:main Jan 27, 2024
18 checks passed
@He-Pin He-Pin deleted the collect2 branch January 27, 2024 15:33
@He-Pin He-Pin added the t:stream Pekko Streams label Jan 27, 2024
@He-Pin He-Pin added this to the 1.1.0-M1 milestone Jan 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t:stream Pekko Streams
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants