Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add QrackCircuit to compiler options #88

Closed
wants to merge 17 commits into from

Conversation

WrathfulSpatula
Copy link

I love what you've done with ucc! PyQrack offers QrackCircuit for optimizing compilation, which has no proper "transpilation" capability, with mapper and gate translation, relying on Qiskit optimization_level=0 to translate between gate basis sets, but focusing QrackCircuit solely on the task of gate reduction in any potential (physically valid) vernacular gate set of Qrack, or specifically the dialect of single-target-qubit multiplexer gates with arbitrary numbers of control qubits.

UCC is faster than Qrack, it looks like! I hope that Unitary Foundation can consider using QrackCircuit as a "homegrown" compilation pass, though!

@WrathfulSpatula
Copy link
Author

Simply so people don't have to pull my branch for the sake of quick reference, here are the benchmark results in graphical form, from screen shots.
Screenshot from 2024-11-22 13-10-15
Screenshot from 2024-11-22 13-10-26
Screenshot from 2024-11-22 13-10-46
Screenshot from 2024-11-22 13-11-16
Screenshot from 2024-11-22 13-11-25
Screenshot from 2024-11-22 13-11-35
Screenshot from 2024-11-22 13-11-49

@jordandsullivan
Copy link
Collaborator

jordandsullivan commented Nov 27, 2024

Thanks for trying out UCC and benchmarking against Qrack! Very nice results on the gate reductions.
I think I would need to dive more deeply into what Qrack is doing to understand it better.
I think for now, given that the runtime looks to be ~10x slower than UCC on average, we won't merge this into main (plus it would add a Qrack dependency), but we can definitely keep it on the back burner.

@jordandsullivan
Copy link
Collaborator

jordandsullivan commented Nov 27, 2024

So I just took a look at the benchmark notebook where you added the Qrack compile function:

def qrack_compile(qiskit_circuit):
    """Compile given qiskit.QuantumCircuit object and return the compiled circuit and total number of raw and compiled 2 qubit gates"""
    qrack_circuit = QrackCircuit.in_from_qiskit_circuit(qiskit_circuit)
    qiskit_circuit = qrack_circuit.to_qiskit_circuit()
    qiskit_circuit = qiskit_transpile(qiskit_circuit, optimization_level=3, basis_gates=['rz', 'rx', 'ry', 'h', 'cx'])

@WrathfulSpatula I thought you said above that you only used Qiskit with optimization_level=0, in order to translate into the right basis set, but here it looks like you are using optimization level 3?

@WrathfulSpatula
Copy link
Author

WrathfulSpatula commented Dec 1, 2024

@jordandsullivan I tried level 0, but I settled on 3 due to improved results. (To memory, I thought I said that I used 3, but apologies if that wasn't clear.) Note that if we're using QrackCircuit as originally intended, for optimizing compilation specifically for the Qrack simulator, it doesn't need Qiskit at all. However, Qrack operates perfectly happily on a "native" back end gate set of just the multiplexer gates that come directly out of QrackCircuit. Qiskit understands this gate set and can convert it to other dialects, and it should be almost "trivial" to add a method to directly act multiplexer gates on any state vector simulator API, but QrackCircuit is designed to (unapologetically) target simulator back ends, rather than genuine quantum hardware, for this reason.

@WrathfulSpatula
Copy link
Author

WrathfulSpatula commented Dec 1, 2024

I have a few more thoughts about this: it's true that the compilation time might be roughly 10x slower, when making a round trip from Qiskit, to QrackCircuit, to Qiskit, but, even in this case, I don't think it matters nearly as much as the potential fidelity improvement from the fact that QrackCircuit leads over all other options for optimized gate count. Yes, it's comparatively slower, but the compilation times are roughly one to several seconds. Meanwhile, queuing plus execution on any cloud-based quantum computer service could take hours to days. Unless any of these candidates took longer than a few minutes to compile a job at realistic NISQ scales, I don't think compilation time is even a practically relevant metric, as opposed to the comparative fidelity difference that could be expected based on optimized gate count.

Also, I mean to call attention to a detail about the benchmark notebook: it requires QrackCircuit to make a round trip from Qiskit, to QrackCircuit, back to Qiskit, where these format/API conversions might be a large portion of the wall-clock time for QrackCircuit in this specific case. Before I forget why this layer exists at all in Qrack, realize that C++ Qrack uses the native QrackCircuit interface on-the-fly, JIT, to compile as Qrack simulates. Qrack's default optimal simulation mode passes all gates through QrackCircuit before dispatching to any Schrödinger-method back end, because it's empirically observed that overall simulation is faster this way: in native C++, the time it takes QrackCircuit to optimize is significantly less than the differential of overall simulation time when comparing optimized circuit outputs to non-optimized circuit inputs.

As for the matter of QrackCircuit being combined with Qiskit optimization level 3 rather than 0, for comparison, at the bottom of this comment, here's QrackCircuit compiling just to its native multiplexer basis without Qiskit changing the output gate basis (though still making a round trip from a Qiskit circuit, to a QrackCircuit, to a Qiskit circuit), so we see that QrackCircuit is still capable of making the best overall reduction in compiled gate count, albeit now in a mismatched comparison between gate basis sets.

I completely understand not including QrackCircuit in the current milestone, but, long-term, I think it would be short-sighted to deem QrackCircuit uninteresting or unfit for consideration in the UCC project, whether as a component of UCC or as another optimizing compiler against which to compare. It's worth remembering, QrackCircuit is literally just roughly 1,000 lines of code in total. Yes, the approach is "very different," but, considering it leads against all other candidates in overall gate reduction, in reasonable amounts of compilation time, that's why I think it's critically important to consider it in these comparisons of figures-of-merit and as a novel theoretical approach.

Screenshot from 2024-12-01 14-03-11
Screenshot from 2024-12-01 14-03-39
Screenshot from 2024-12-01 14-04-03
Screenshot from 2024-12-01 14-05-13
Screenshot from 2024-12-01 14-04-28
Screenshot from 2024-12-01 14-04-39
Screenshot from 2024-12-01 14-04-52
Screenshot from 2024-12-01 14-05-31

@jordandsullivan
Copy link
Collaborator

Thanks Dan. I was mainly asking to find out how much of the reduction was coming from Qiskit vs. Qrack. I certainly don't think Qrack is uninteresting or not worth including in a future version of UCC. I would just want to understand the structure better to see how we could most effectively integrate it.

@WrathfulSpatula
Copy link
Author

WrathfulSpatula commented Dec 2, 2024

@jordandsullivan Thank you, and sorry if that came across as a bit confrontational. It's not UF's fault, but I have to get personal for a second: many times in Qrack's seven-year history, we've forwarded our software and sometimes our best attempt at comparative (simulation) benchmarks that seemed fair according to the specification and intent of other researchers we worked with who were running comparative benchmarks on simulators, only for them to pretend like we didn't exist or to be paid lip-service that our benchmarks would enter future iterations of results, never to see it ultimately happen, by years later. Now, recently, a couple of unrelated but relatively visible reports have portrayed Qrack simulation results as low-middling performance with no sufficient account of the Qrack settings they used to produce their data (and even getting basic factual details wrong, in peer-reviewed literature, about features like multi-GPU simulation that have been in use for years by the Qrack hobbyist community and authors). In other words, as lead author of Qrack, these reports simply lack sufficient details for me to even try in good faith to reproduce them, to understand what happened, while the Qrack coauthors are left saying to ourselves, "Please RTFM or at least talk to us before you do this," with no one else to hear us.

Thank you, again. I realize that software development as part of a team can't shift roadmaps on a dime, and I know full well that there might be very real reasons not to directly use QrackCircuit as part of the UCC stack, maybe to take a lesson or two from it in the abstract instead to re-implement in a new way in UCC, or turning out to be redundant with other approaches. That's fine, but, for the scientific principles of the matter, QrackCircuit needs to be on these comparative benchmarks, if we're leading in gate reduction, in reasonable compilation times.

I'd love to discuss this further whenever it actually fits in the UCC workflow and roadmap. Sorry if that came across surly, but, again, thank you, because it's not UF's fault, nor yours.

@jordandsullivan
Copy link
Collaborator

I heard you Dan. If you like, we can leave this PR open and circle back when we have more development bandwidth (say in the new year). I just don't want to merge it before launch on Dec 12 because if we add a Qrack dependency to run the default benchmarks, and it's performing better than default UCC in terms of gate reduction, this raises the question of why we haven't just integrated it, but as discussed above, there are nuances there which need to be figured out.

Like could we do better by combining UCC and Qrack? Should we reimplement the Qrack Circuit optimization logic using the TranspilerPass classes we imported from Qiskit, or should we rethink the design philosophy? All good questions, and if you're interested in exploring them, I welcome it!

My take is that we should wait until after public launch to integrate these changes into main.

@WrathfulSpatula
Copy link
Author

Jordan, from a product ownership and management perspective, I understand the desire and the request. However, scientifically, isn't this selective reporting? Doesn't this mean the benchmarks we intend to publicize, as Unitary Foundation, withhold essential information that's tucked back in an open PR on this repository, but publicly available through the Qrack repository for the functionality itself, for a long time, when anyone tries to argue that any one of these compilers leads in fair and complete comparison of a literature survey, not just ucc?

@Misty-W
Copy link
Collaborator

Misty-W commented Dec 2, 2024

However, scientifically, isn't this selective reporting? Doesn't this mean the benchmarks we intend to publicize, as Unitary Foundation, withhold essential information that's tucked back in an open PR on this repository, but publicly available through the Qrack repository for the functionality itself, for a long time, when anyone tries to argue that any one of these compilers leads in fair and complete comparison of a literature survey, not just ucc?

hi @WrathfulSpatula! My understanding from our previous discussion is that QrackCircuit is optimized for a simulator framework such as Qrack, so I disagree with the assessment that it is selective reporting- rather it is trying to maintain only apples-to-apples comparison. While it is true that Qiskit conversion provides a workaround, it is no longer purely the Qrack compiler being benchmarked.

That said, I'm game to investigate the multiplexing methods further, to see whether we can leverage them for an enhanced version of UCC. Please do be patient with our limited bandwidth to assess and respond given the tight deadline approaching.

@WrathfulSpatula
Copy link
Author

Misty, I'm sorry. Challenge my statement, in scientific honesty: QrackCircuit plus Qiskit level optimization 3, or Qrack in its native gate set (in an "unfair" comparison, not really) leads for compiled gate reduction, significantly, over all other candidates. This takes up to several seconds (or even minutes, what if it did) of compilation, before dispatching to a quantum hardware queue and execution, for hours or even days. This leading gate reduction is likely to improve overall NISQ hardware fidelity best, on average. (By the way, Qrack is likely already in wide use for this purpose.)

What do our benchmarks mean, as Unitary Foundation, whether for ucc, if we're not talking about an elephant in the room that, I'm sorry it's there, I guess, but I can't make it go away, at this point? What is the argument that Qrack does not lead any reasonable expectation for application-oriented performance, in general, on simulator and on hardware, at least within the confines of this benchmark suite?

@WrathfulSpatula
Copy link
Author

WrathfulSpatula commented Dec 5, 2024

I think I have an implementation of a compromise that serves and benefits the performance of both optimization pass sets: ucc has a mode option, and, alongside mode='ucc', I've added mode='ucc-qrack'. Qrack still needs the ucc optimization passes on top of its own, where those were previously provided by default Qiskit, but they're better coming ucc. Because the BasisTranslation pass doesn't cover multiplexer in the gate set, seemingly, Qrack relies on optimization_level=0 default Qiskit transpiler basis translation for the initial conversion to ucc gate basis. Then, I notice that you might run the set of ucc passes more than once, particularly if you have a coupling map, for a general abstract-connectivity optimization pass followed by another optimization pass after mapping and layout. However, you translate basis at the end of local passes, so you don't also need to translate basis at the beginning, unless it's the first pass. The update I have for Qrack fixes this redundancy, in general.

I'm running the benchmarks before committing and pushing to the branch, and I notice already that Qrack might be roughly 10 seconds faster on qv_N100_12345_basis_rz_rx_ry_cx.qasm with ucc instead of default Qiskit transpiler, like 60 seconds vs. 70. Good work!

I also think it could behoove both projects to brand together rather than as distinct alternatives. ucc-qrack is exactly what I want to be using in general, personally, but mode='ucc' is a great alternative that manages to walk a line between amazing speed and great gate reduction, while mode='ucc-qrack' sacrifices speed for the best gate reduction of any candidate (if a couple of minutes of compile time is acceptable before throwing the job into queue for a few hours, which it typically will be). It gives your users a sense of control, too: they have a meaningful option with a tradeoff to consider and decide by case.

@jordandsullivan
Copy link
Collaborator

I know you were working on additional improvements today, just give me a ping when you're ready for final review :)

@WrathfulSpatula
Copy link
Author

@jordandsullivan @Misty-W after a round of debugging, the tests pass for 32-bit floating-point precision tolerance, but the compiled gate reduction is very modest. However, it's still there! Personally, I would still mostly use mode='ucc-qrack' for circuits to be run on hardware or on simulator, but there's a meaningful choice between the modes that is left up to the user.

This is ready for final review. I understand if consideration for inclusion is influenced by the reduction in the compiled gate differential, but, again, personally, I'd use ucc-qrack anyway, for even a modest improvement. Thank you so much for your efforts in reviewing this and helping to get it into ucc!

@jordandsullivan
Copy link
Collaborator

So it looks like the final numbers are that PyQrack takes 10x the runtime and is 2% more efficient at gate reduction than UCC, right? WIth those numbers, I'm not really convinced it's worth adding it in at this time.
Screenshot 2024-12-09 at 7 41 55 PM
Screenshot 2024-12-09 at 7 42 01 PM

@jordandsullivan
Copy link
Collaborator

jordandsullivan commented Dec 10, 2024

Thanks for all your work on this Dan. I'm going to close it as not planned for now for the reasons above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants