Add EIP: Precompile for NTT operations #9374
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,217 @@ | ||||||
--- | ||||||
eip: 7885 | ||||||
title: Precompile for NTT operations | ||||||
description: Proposal to add a precompiled contract that performs the number theoretic transform (NTT) and its inverse (InvNTT). | ||||||
author: Renaud Dubois (@rdubois-crypto), Simon Masson (@simonmasson) | ||||||
discussions-to: https://ethereum-magicians.org/t/eip-9374-precompile-for-ntt-operations/22895 | ||||||
status: Draft | ||||||
type: Standards Track | ||||||
category: Core | ||||||
created: 2025-02-12 | ||||||
--- | ||||||
|
||||||
|
||||||
## Abstract | ||||||
|
||||||
This proposal creates a precompiled contract that performs NTT and inverse NTT transformations. This enables efficient polynomial multiplication for post-quantum and STARK applications. | ||||||
|
||||||
## Motivation | ||||||
|
||||||
With the release of the Willow chip, concern about the quantum threat to Ethereum has accelerated. Today, ECDSA is the signature algorithm for EOAs, and it is vulnerable to quantum computers. Efficient replacement algorithms use polynomial multiplication as their core operation. Once NTT and inverse NTT are available, the remainder of the verification algorithm is straightforward. Integrating NTT and InvNTT rather than a specific algorithm provides agility: DILITHIUM, FALCON, or any equivalent scheme can be implemented at modest cost on top of these operators. NTT is also of interest for speeding up STARK verifiers. This single operator would thus benefit both Ethereum scaling and post-quantum threat mitigation. | ||||||
|
||||||
|
||||||
|
||||||
## Specification | ||||||
|
||||||
### Constants | ||||||
|
||||||
| Name | Value | Comment | | ||||||
|---------------------|-------|--------------------| | ||||||
| NTT_FW | 0x0f | precompile address | | ||||||
| NTT_INV | 0x10 | precompile address | | ||||||
| NTT_VECMULMOD | 0x11 | precompile address | | ||||||
| NTT_VECADDMOD | 0x12 | precompile address | | ||||||
|
||||||
We introduce *four* separate precompiles to perform the following operations: | ||||||
|
||||||
- NTT_FW - to perform the forward NTT transformation (Negative wrap convolution) with a gas cost of `600` gas, | ||||||
|
||||||
- NTT_INV - to perform the inverse NTT transformation (Negative wrap convolution) with a gas cost of `600` gas, | ||||||
|
||||||
- NTT_VECMULMOD - to perform vectorized modular multiplication with a gas cost formula defined in the corresponding section, | ||||||
|
||||||
- NTT_VECADDMOD - to perform vectorized modular addition with a gas cost formula defined in the corresponding section. | ||||||
|
||||||
|
||||||
### Field parameters | ||||||
|
||||||
The NTT_FW and NTT_INV are fully defined by the following set of parameters. | ||||||
|
||||||
Let $R$ be a cyclotomic ring of the form $R=\mathbb F_q[X]/(X^n+1)$. In these notations, | ||||||
|
||||||
- $n$ is the degree and is a power of 2, | ||||||
|
||||||
- $\mathbb F_q$ is the prime field where $q \equiv 1 \mod 2n$, | ||||||
|
||||||
- $\omega$ is a primitive $n$-th root of unity in $\mathbb F_q$, | ||||||
|
||||||
- $\psi$ is a primitive $2n$-th root of unity in $\mathbb F_q$. | ||||||
|
||||||
Any element $a \in R$ is a polynomial of degree at most $n-1$ with coefficients in $\mathbb F_q$, written | ||||||
as $a=\sum_{i=0}^{n-1} a_iX^i$. | ||||||
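
As an illustration of the parameters above, the following Python sketch searches for a primitive $2n$-th root of unity $\psi$ and derives $\omega = \psi^2$. The helper name `find_psi` is hypothetical and not part of this EIP, and returning the first root found is an assumption: the text above does not say which root an implementation must use.

```python
def find_psi(q: int, n: int) -> int:
    """Return a primitive 2n-th root of unity modulo the prime q.

    Assumes n is a power of two and q ≡ 1 (mod 2n). Hypothetical helper,
    not part of the EIP: it returns the first root found by linear search.
    """
    if (q - 1) % (2 * n) != 0:
        raise ValueError("q must satisfy q ≡ 1 (mod 2n)")
    exponent = (q - 1) // (2 * n)
    for g in range(2, q):
        candidate = pow(g, exponent, q)
        # 2n is a power of two, so the order of candidate is exactly 2n
        # if and only if candidate^n ≡ -1 (mod q).
        if pow(candidate, n, q) == q - 1:
            return candidate
    raise ValueError("no primitive 2n-th root of unity found")


# Falcon-style parameters (q = 12289, n = 512), used here only as an example.
q, n = 12289, 512
psi = find_psi(q, n)
omega = psi * psi % q  # psi^2 is then a primitive n-th root of unity
```

Because $\psi$ has order exactly $2n$, its square has order exactly $n$, which is why a single search is enough to obtain both parameters.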
|
||||||
|
||||||
### NTT_FW | ||||||
|
||||||
The NTT transformation is described by the following algorithm. | ||||||
|
||||||
**Input:** A vector $a = (a[0], a[1], \dots, a[n-1]) \in \mathbb F_q^n$ in standard order, where $q$ is a prime such that $q \equiv 1 \mod 2n$ and $n$ is a power of two, and a precomputed table $\Psi_\text{rev} \in \mathbb F_q^n$ storing powers of $\psi$ in bit-reversed order. | ||||||
|
||||||
**Output:** $a \leftarrow \text{NTT\_FW}(a)$ in bit-reversed order. | ||||||
|
||||||
```plaintext | ||||||
t ← n | ||||||
for (m = 1; m < n; m ← 2m) do | ||||||
    t ← t / 2 | ||||||
    for i = 0 to m-1 do | ||||||
        j1 ← 2 ⋅ i ⋅ t | ||||||
        j2 ← j1 + t - 1 | ||||||
        S ← Ψrev[m + i] | ||||||
        for j = j1 to j2 do | ||||||
            U ← a[j] | ||||||
            V ← a[j + t] ⋅ S | ||||||
            a[j] ← (U + V) mod q | ||||||
            a[j + t] ← (U - V) mod q | ||||||
        end for | ||||||
    end for | ||||||
end for | ||||||
return a | ||||||
``` | ||||||
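
The pseudocode above can be transcribed into Python roughly as follows. This is a minimal sketch, not the reference code shipped in the assets: the function names are hypothetical, and the table layout $\Psi_\text{rev}[k] = \psi^{\text{brv}(k)}$ (with $\text{brv}$ the bit reversal on $\log_2 n$ bits) is an assumption taken from the usual convention for this algorithm.

```python
def bit_reverse(k: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def psi_rev_table(psi: int, n: int, q: int) -> list:
    """Powers of psi stored in bit-reversed order (assumed table layout)."""
    bits = n.bit_length() - 1
    return [pow(psi, bit_reverse(k, bits), q) for k in range(n)]


def ntt_fw(a: list, psi_rev: list, q: int) -> list:
    """Forward negacyclic NTT: standard order in, bit-reversed order out."""
    a = list(a)  # work on a copy
    n = len(a)
    t = n
    m = 1
    while m < n:
        t //= 2
        for i in range(m):
            j1 = 2 * i * t
            s = psi_rev[m + i]
            for j in range(j1, j1 + t):
                u = a[j]
                v = a[j + t] * s % q
                a[j] = (u + v) % q
                a[j + t] = (u - v) % q
        m *= 2
    return a


# Toy parameters: q = 17, n = 4, psi = 2 (2 has order 8 modulo 17).
Q, N, PSI = 17, 4, 2
result = ntt_fw([1, 2, 3, 4], psi_rev_table(PSI, N, Q), Q)
```

With these toy parameters, entry `p` of `result` equals the input polynomial evaluated at $\psi^{2\,\text{brv}(p)+1}$, which is one way to cross-check an implementation against a direct evaluation.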
|
||||||
### NTT_INV | ||||||
|
||||||
The Inverse NTT is described by the following algorithm. | ||||||
|
||||||
**Input:** A vector $a = (a[0], a[1], \dots, a[n-1]) \in \mathbb F_q^n$ in bit-reversed order, where $q$ is a prime such that $q \equiv 1 \mod 2n$ and $n$ is a power of two, and a precomputed table $\Psi^{-1}_\text{rev} \in \mathbb F_q^n$ storing powers of $\psi^{-1}$ in bit-reversed order. | ||||||
|
||||||
**Output:** $a \leftarrow \text{NTT\_INV}(a)$ in standard order. | ||||||
|
||||||
```plaintext | ||||||
t ← 1 | ||||||
for (m = n; m > 1; m ← m/2) do | ||||||
    j1 ← 0 | ||||||
    h ← m / 2 | ||||||
    for i = 0 to h-1 do | ||||||
        j2 ← j1 + t - 1 | ||||||
        S ← Ψ⁻¹rev[h + i] | ||||||
        for j = j1 to j2 do | ||||||
            U ← a[j] | ||||||
            V ← a[j + t] | ||||||
            a[j] ← (U + V) mod q | ||||||
            a[j + t] ← (U - V) ⋅ S mod q | ||||||
        end for | ||||||
        j1 ← j1 + 2t | ||||||
    end for | ||||||
    t ← 2t | ||||||
end for | ||||||
for j = 0 to n-1 do | ||||||
    a[j] ← a[j] ⋅ n⁻¹ mod q | ||||||
end for | ||||||
return a | ||||||
``` | ||||||
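
A Python transcription of the inverse algorithm could look like the sketch below. As before, the names are hypothetical and the table layout $\Psi^{-1}_\text{rev}[k] = \psi^{-\text{brv}(k)}$ is an assumption; inverses are computed with Fermat's little theorem, which relies on $q$ being prime.

```python
def bit_reverse(k: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def psi_inv_rev_table(psi: int, n: int, q: int) -> list:
    """Powers of psi^-1 stored in bit-reversed order (assumed table layout)."""
    bits = n.bit_length() - 1
    psi_inv = pow(psi, q - 2, q)  # modular inverse, q prime
    return [pow(psi_inv, bit_reverse(k, bits), q) for k in range(n)]


def ntt_inv(a: list, psi_inv_rev: list, q: int) -> list:
    """Inverse negacyclic NTT: bit-reversed order in, standard order out."""
    a = list(a)  # work on a copy
    n = len(a)
    t = 1
    m = n
    while m > 1:
        j1 = 0
        h = m // 2
        for i in range(h):
            s = psi_inv_rev[h + i]
            for j in range(j1, j1 + t):
                u = a[j]
                v = a[j + t]
                a[j] = (u + v) % q
                a[j + t] = (u - v) * s % q
            j1 += 2 * t
        t *= 2
        m //= 2
    n_inv = pow(n, q - 2, q)  # final scaling by n^-1
    return [x * n_inv % q for x in a]


# Toy parameters: q = 17, n = 4, psi = 2. The input is the forward transform
# of (1, 2, 3, 4) under the same parameters, in bit-reversed order.
Q, N, PSI = 17, 4, 2
recovered = ntt_inv([15, 11, 13, 16], psi_inv_rev_table(PSI, N, Q), Q)
```

The final loop of the pseudocode (scaling by $n^{-1}$) is kept as a separate pass here; an implementation could also fold it into the last butterfly stage.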
|
||||||
|
||||||
### NTT_VECMULMOD | ||||||
|
||||||
NTT_VECMULMOD works like a SIMD operation, but on larger inputs and outputs. | ||||||
|
||||||
|
||||||
**Input:** Two vectors $a = (a[0], a[1], \dots, a[n-1]), b=(b[0], b[1], \dots, b[n-1]) \in \mathbb F_q^n$ where $n$ and $q$ are defined above. | ||||||
|
||||||
**Output:** The element-wise product $(a[0]\cdot b[0] \mod q, a[1]\cdot b[1]\mod q, \dots, a[n-1]\cdot b[n-1] \mod q)$. | ||||||
|
||||||
**Gas cost:** Denoting by $k$ the smallest power of $2$ larger than $\log_2(q)$, the gas cost of this operation is $k\log_2(n) / 8$. | ||||||
|
||||||
|
||||||
|
||||||
### NTT_VECADDMOD | ||||||
|
||||||
NTT_VECADDMOD works like a SIMD operation, but on larger inputs and outputs. | ||||||
|
||||||
**Input:** Two vectors $a = (a[0], a[1], \dots, a[n-1]), b=(b[0], b[1], \dots, b[n-1]) \in \mathbb F_q^n$ where $n$ and $q$ are defined above. | ||||||
|
||||||
**Output:** The element-wise addition $(a[0]+ b[0] \mod q, a[1]+ b[1]\mod q, \dots, a[n-1]+ b[n-1] \mod q)$. | ||||||
|
||||||
**Gas cost:** Denoting by $k$ the smallest power of $2$ larger than $\log_2(q)$, the gas cost of this operation is $k\log_2(n) /32$. | ||||||
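
To make the two element-wise operations and their gas formulas concrete, here is a small Python sketch. The function names are hypothetical. The text does not say how a fractional result should be rounded, so the sketch returns exact fractions rather than guessing; it also assumes $q$ is an odd prime, so that $\lceil\log_2 q\rceil$ equals `q.bit_length()`.

```python
from fractions import Fraction


def vecmulmod(a: list, b: list, q: int) -> list:
    """Element-wise product modulo q."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x * y % q for x, y in zip(a, b)]


def vecaddmod(a: list, b: list, q: int) -> list:
    """Element-wise sum modulo q."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [(x + y) % q for x, y in zip(a, b)]


def word_size(q: int) -> int:
    """Smallest power of two larger than log2(q), for an odd prime q."""
    k = 1
    while k < q.bit_length():
        k *= 2
    return k


def vecmulmod_gas(q: int, n: int) -> Fraction:
    """k * log2(n) / 8, kept exact (rounding is not specified above)."""
    return Fraction(word_size(q) * (n.bit_length() - 1), 8)


def vecaddmod_gas(q: int, n: int) -> Fraction:
    """k * log2(n) / 32, kept exact (rounding is not specified above)."""
    return Fraction(word_size(q) * (n.bit_length() - 1), 32)


# Example with Falcon-style parameters: q = 12289 gives k = 16, n = 512 gives log2(n) = 9.
falcon_mul_gas = vecmulmod_gas(12289, 512)
falcon_add_gas = vecaddmod_gas(12289, 512)
```

Under these assumptions the Falcon-style parameters give a non-integer value for the addition formula, which suggests the rounding rule is worth stating explicitly.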
|
||||||
## Rationale | ||||||
|
||||||
If $f$ and $g$ are two polynomials of $R$, then | ||||||
$f\times g= \text{NTT\_INV}(\text{NTT\_VECMULMOD}( | ||||||
\text{NTT\_FW}(f), \text{NTT\_FW}(g)))$ is equal to the product of $f$ and $g$ in $R$. The algorithm has a complexity of $O(n \log_2 n)$, compared with $O(n^2)$ for the classical schoolbook multiplication algorithm. | ||||||
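
The identity can be checked numerically on toy parameters. The sketch below deliberately does not use the fast butterflies: it evaluates the transform by its definition (evaluation at the odd powers of $\psi$, in natural order) in $O(n^2)$, multiplies point-wise, interpolates back, and compares with schoolbook multiplication modulo $X^n+1$. All names are hypothetical, and the natural ordering is a simplification relative to the bit-reversed ordering used by NTT_FW.

```python
def negacyclic_schoolbook(f: list, g: list, q: int) -> list:
    """Schoolbook product of f and g in F_q[X]/(X^n + 1)."""
    n = len(f)
    c = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            k = i + j
            if k < n:
                c[k] = (c[k] + fi * gj) % q
            else:
                c[k - n] = (c[k - n] - fi * gj) % q  # X^n = -1 in R
    return c


def transform(a: list, psi: int, q: int) -> list:
    """Evaluate a at psi^(2i+1) for i = 0..n-1 (natural order, O(n^2))."""
    n = len(a)
    return [sum(a[j] * pow(psi, (2 * i + 1) * j, q) for j in range(n)) % q
            for i in range(n)]


def inverse_transform(a_hat: list, psi: int, q: int) -> list:
    """Interpolate back: a_k = n^-1 * sum_i a_hat[i] * psi^(-(2i+1)k)."""
    n = len(a_hat)
    psi_inv = pow(psi, q - 2, q)
    n_inv = pow(n, q - 2, q)
    return [n_inv * sum(a_hat[i] * pow(psi_inv, (2 * i + 1) * k, q)
                        for i in range(n)) % q
            for k in range(n)]


def negacyclic_via_transform(f: list, g: list, psi: int, q: int) -> list:
    """Transform, multiply point-wise, transform back."""
    f_hat = transform(f, psi, q)
    g_hat = transform(g, psi, q)
    product_hat = [x * y % q for x, y in zip(f_hat, g_hat)]
    return inverse_transform(product_hat, psi, q)


# Toy parameters: q = 17, n = 4, psi = 2 (a primitive 8th root of unity mod 17).
f, g = [1, 2, 3, 4], [5, 6, 7, 8]
same = negacyclic_via_transform(f, g, 2, 17) == negacyclic_schoolbook(f, g, 17)
```

The check works because every evaluation point $x = \psi^{2i+1}$ satisfies $x^n = -1$, so reducing modulo $X^n+1$ does not change the value of the product at those points.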
|
||||||
### Fields of interest | ||||||
|
||||||
The implementation applies for many fields of interest for cryptography. In particular, the design applies for: | ||||||
|
||||||
- FALCON: $q=3\cdot 2^{12}+1$ (one of the NIST winners for post-quantum signature scheme), | ||||||
|
||||||
|
||||||
- DILITHIUM: $q=2^{23}-2^{13}+1$ (one of the NIST winners for post-quantum signature scheme), | ||||||
|
||||||
- KYBER: $q=13\cdot 2^8+1$ (one of the NIST winners for post-quantum key encapsulation mechanism), | ||||||
|
||||||
|
||||||
- Babybear: $q=15\cdot 2^{27}+1$ (Risc0), | ||||||
|
||||||
- Goldilocks: $q=2^{64}-2^{32}+1$ (Polygon's Plonky2), | ||||||
|
||||||
- M31: $q=2^{31}-1$ (Circle STARKS, STwo, Plonky3), | ||||||
|
||||||
- StarkCurve: $q=2^{251}+17\cdot 2^{192}+1$ | ||||||
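
The condition $q \equiv 1 \mod 2n$ from the field parameters bounds the degree $n$ usable with each modulus. The Python sketch below computes, for the moduli listed above, the exponent of the largest power of two dividing $q-1$ and the largest power-of-two $n$ compatible with the condition. The helper names and the dictionary are hypothetical and only restate the values from the list.

```python
def two_adicity(q: int) -> int:
    """Exponent of the largest power of two dividing q - 1."""
    v, m = 0, q - 1
    while m % 2 == 0:
        m //= 2
        v += 1
    return v


def max_degree(q: int) -> int:
    """Largest power-of-two n with q ≡ 1 (mod 2n), for an odd prime q."""
    return 2 ** (two_adicity(q) - 1)


# Moduli copied from the list above.
FIELDS = {
    "FALCON": 3 * 2**12 + 1,
    "DILITHIUM": 2**23 - 2**13 + 1,
    "KYBER": 13 * 2**8 + 1,
    "Babybear": 15 * 2**27 + 1,
    "Goldilocks": 2**64 - 2**32 + 1,
    "M31": 2**31 - 1,
    "StarkCurve": 2**251 + 17 * 2**192 + 1,
}

for name, q in FIELDS.items():
    print(f"{name}: two-adicity {two_adicity(q)}, max n = {max_degree(q)}")
```

A modulus whose $q-1$ has few factors of two only satisfies the congruence for small $n$, so for such fields the parameters would need to be chosen with care.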
|
||||||
|
||||||
### Benchmarks | ||||||
|
||||||
#### Pure solidity | ||||||
|
||||||
To illustrate the benefit of the precompile, the assets provide the measured gas cost of a single NTT and extrapolate the minimal gas cost from the required number of NTT_FW and NTT_INV calls. The provided assets use pure Yul optimizations, with memory access hacks. It is unlikely that more than one order of magnitude could be saved on such minimal code. | ||||||
|
||||||
|Use case| Parameters | single NTT gas cost | Required NTT(FW/INV) | Estimated NTT/Full cost | | ||||||
|--|------------------------|---------------------|---------------------|---| | ||||||
|Falcon| $q=12289, n=512$ | 1.8 M | 1 NTTFW+1 NTTINV |3.6 M| | ||||||
|Dilithium| $q=2^{23}-2^{13}+1, n=256$| 460K | 4 NTTFW +1 NTTINV|2.3M| | ||||||
|
||||||
The Falcon cost has been measured over a full implementation and is consistent with the estimate. Dilithium cost is evaluated assuming | ||||||
|
||||||
This demonstrates that pure Solidity already enables cheap L2s to experiment with FALCON, but is too expensive for L1. | ||||||
|
||||||
With this EIP adopted, the signature verification of Falcon can be reduced to **1500** gas, and a similar result is expected for Dilithium. | ||||||
Adopting the hash function as a separate EIP would enable a verification cost of 2000 gas. | ||||||
This is in line with the ratios observed in SUPERCOP implementations. | ||||||
Reviewer comment: Fix all spacing errors.
||||||
|
||||||
|
||||||
|
||||||
|
||||||
## Backwards Compatibility | ||||||
|
||||||
There are no backward compatibility questions. | ||||||
|
||||||
## Test Cases | ||||||
|
||||||
There are no edge cases in the considered operations. | ||||||
|
||||||
|
||||||
## Reference Implementation | ||||||
|
||||||
|
||||||
There are two fully spec-compatible implementations at the time of writing: | ||||||
|
||||||
- a Python reference code provided in the assets of this EIP | ||||||
|
||||||
- a Solidity reference code provided in the assets of this EIP | ||||||
|
||||||
Both codes have been validated over a large base of reference vectors, and both implement the FALCON and DILITHIUM algorithms as a demonstration of the usefulness of the precompile. | ||||||
|
||||||
|
||||||
## Security Considerations | ||||||
|
||||||
Needs discussion. | ||||||
|
||||||
## Copyright | ||||||
|
||||||
Copyright and related rights waived via [CC0](../LICENSE.md). |
@@ -0,0 +1,97 @@ | ||||||
# NTT-EIP as a building block for FALCON, DILITHIUM and Stark verifiers | ||||||
|
||||||
|
||||||
This repository contains the EIP for the NTT transform, along with a Python reference code and a Solidity implementation. | ||||||
|
||||||
## Context | ||||||
|
||||||
### The threat | ||||||
With the release of the Willow chip, concern about the quantum threat to Ethereum seems to be accelerating. Posts by [Asanso](https://ethresear.ch/t/so-you-wanna-post-quantum-ethereum-transaction-signature/21291) and [PMiller](https://ethresear.ch/t/tidbits-of-post-quantum-eth/21296) summarize the stakes and possible solutions. Those solutions include lattice-based signatures such as Dilithium or FALCON (the latter being more optimized for on-chain constraints), STARKs and FHE. There is a consensus in the cryptographic research community around lattices as the future of asymmetric protocols, and STARKs won the race for ZKEVM implementations (as used by Scroll, Starknet and ZKsync). | ||||||
Reviewer comment: "Willow chip"
||||||
|
||||||
What those protocols have in common is that they require fast polynomial multiplication over prime fields and use the NTT (a special [FFT](https://vitalik.eth.limo/general/2019/05/12/fft.html) adapted to prime fields). While in the past Montgomery multipliers over elliptic curve fields were the critical target of optimizations (both hardware and software), NTT optimization is the key to a performant PQ implementation. | ||||||
|
||||||
### Discussion | ||||||
|
||||||
In the past, Ethereum chose specificity by picking secp256k1 as its sole candidate for signature. Later, after dedicated hardware and proving systems working on other hardware were released, a zoo of EIPs flourished to propose alternative curves. There were attempts to have higher-level EIPs to enable all of those at once, such as EWASM, SIMD, EVMMAX, or RIP7696 (in decreasing order of genericity and complexity). | ||||||
|
||||||
Picking NTT as an EIP instead of a given scheme would provide massive gas cost reduction for all schemes relying on it. | ||||||
- **pros**: massive gas reduction for all cited protocols, more agility for evolutions. | ||||||
- **cons**: must be wrapped into implementations, not optimal for a given target compared to a dedicated EIP, not stateless. | ||||||
|
||||||
## Overview | ||||||
|
||||||
The NTT operates on sequences of numbers (often coefficients of polynomials) in a modular arithmetic system. It maps these sequences into a different domain where convolution operations (e.g., polynomial multiplication) become simpler and faster, akin to how the FFT simplifies signal convolution. Whereas the FFT uses roots of unity in the complex plane, the NTT uses roots of unity in a finite field or ring. | ||||||
|
||||||
The NTT is based on the Discrete Fourier Transform (DFT), defined as: | ||||||
$$ | ||||||
X[k] = \sum_{j=0}^{N-1} x[j] \cdot \omega^{j \cdot k} \mod q | ||||||
$$ | ||||||
|
||||||
Where: | ||||||
- $x[j]$: Input sequence of length $N$, | ||||||
- $X[k]$: Transformed sequence, | ||||||
- $q$: A prime modulus, | ||||||
- $\omega$: A primitive N-th root of unity modulo $q$, with | ||||||
$\omega^N \equiv 1 \mod q \quad \text{and} \quad \omega^k \not\equiv 1 \mod q \; \forall \; 0 < k < N$ | ||||||
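
The definition above translates directly into a quadratic-time Python sketch, which can serve as a reference when testing faster implementations. The function names are hypothetical, and the inverse shown is the usual one (same sum with $\omega^{-1}$, scaled by $N^{-1}$), computed here with Fermat's little theorem on the assumption that $q$ is prime.

```python
def is_primitive_root_of_unity(omega: int, N: int, q: int) -> bool:
    """Check omega^N ≡ 1 and omega^k ≢ 1 for 0 < k < N (mod q)."""
    return pow(omega, N, q) == 1 and all(
        pow(omega, k, q) != 1 for k in range(1, N)
    )


def naive_ntt(x: list, omega: int, q: int) -> list:
    """X[k] = sum_j x[j] * omega^(j*k) mod q, computed in O(N^2)."""
    N = len(x)
    return [sum(x[j] * pow(omega, j * k, q) for j in range(N)) % q
            for k in range(N)]


def naive_intt(X: list, omega: int, q: int) -> list:
    """x[j] = N^-1 * sum_k X[k] * omega^(-j*k) mod q, computed in O(N^2)."""
    N = len(X)
    omega_inv = pow(omega, q - 2, q)  # modular inverse, q prime
    N_inv = pow(N, q - 2, q)
    return [N_inv * sum(X[k] * pow(omega_inv, j * k, q) for k in range(N)) % q
            for j in range(N)]


# Toy parameters: q = 17, N = 4, omega = 4 (4 has order 4 modulo 17).
q, N, omega = 17, 4, 4
x = [1, 2, 3, 4]
X = naive_ntt(x, omega, q)
roundtrip = naive_intt(X, omega, q)
```

Note that this plain transform corresponds to cyclic convolution (modulo $X^N - 1$); the negative wrap variant additionally involves a $2N$-th root of unity.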
|
||||||
NTT computation uses an approach similar to the Cooley-Tukey algorithm to provide $O(N \log N)$ complexity. The NTT algorithm transforms a sequence $(x[j])$ to $(X[k])$ using modular arithmetic. It is invertible, allowing reconstruction of the original sequence via the Inverse NTT (INTT). The inverse process is similar but requires dividing by $N$ (mod $q$) and using $\omega^{-1}$ (the modular inverse of $\omega$). The following algorithm is extracted from | ||||||
[[LN16]](https://eprint.iacr.org/2016/504.pdf), and describes how to compute the NTT when $R_q= \mathbb{Z}_q[X]/(X^n+1)$ (Negative Wrap Convolution). | ||||||
|
||||||
 | ||||||
|
||||||
The Inverse NTT is computed through the following algorithm: | ||||||
|
||||||
 | ||||||
|
||||||
## Benchmarks | ||||||
|
||||||
### Python | ||||||
|
||||||
| Field | $n$ | Recursive NTT (Tetration) | Iterative NTT (ZKNox) | Iterative InvNTT (ZKNox)| | ||||||
|-|-|-|-|-| | ||||||
|Falcon | 512 | 761 μs | 528 μs | 561 μs | | ||||||
|Falcon | 1024 | 1642 μs | 1076 μs | 1199 μs | | ||||||
|Dilithium| 128 | 165 μs | 114 μs | 113 μs | | ||||||
|Dilithium| 256 | 371 μs | 258 μs | 260 μs | | ||||||
|BabyBear | 256 | 531 μs | 389 μs | 404 μs | | ||||||
|
||||||
The recursive inverse NTT is very costly because of the required inversions. For Falcon, the field is small enough that field inversions can be precomputed, but the cost is still higher than that of the iterative inverse NTT. | ||||||
The field arithmetic has not been optimized. In the case of BabyBear, this overhead becomes significant, so the comparison is not really meaningful. | ||||||
|
||||||
### Solidity | ||||||
|
||||||
|
||||||
| Function | Description | gas cost | Tests Status | | ||||||
|------------------------|---------------------|---------------------|---------------------| | ||||||
| NTT recursive | original gas cost from [falcon-solidity](https://github.com/Tetration-Lab/falcon-solidity/blob/main/src/Falcon.sol) | 6.9M | OK| | ||||||
| InvNTT recursive | original gas cost from [falcon-solidity](https://github.com/Tetration-Lab/falcon-solidity/blob/main/src/Falcon.sol) | 7.8M | OK| | ||||||
| Full Falcon verification | original gas cost from [falcon-solidity](https://github.com/Tetration-Lab/falcon-solidity/blob/main/src/Falcon.sol) | 24 M| OK| | ||||||
| NTT iterative | ZKNOX | 4M | OK| | ||||||
| InvNTT iterative | ZKNOX | 4.2M | OK| | ||||||
| Full Falcon verification | ZKNOX | 8.5 M| OK| | ||||||
|
||||||
|
||||||
### Yul | ||||||
|
||||||
|
||||||
Further optimizations are achieved by using Yul for critical sections and using the CODECOPY and EXTCODECOPY trick detailed in [[RD23]](https://eprint.iacr.org/2023/939.pdf) (section 3.3, "Hacking EVM memory access cost"). | ||||||
|
||||||
|
||||||
| Function | Description | gas cost | Tests Status | | ||||||
|------------------------|---------------------|---------------------|---------------------| | ||||||
| ntt.NTTFW | ZKNOX_NTTFW, iterative yuled | 1.9M | OK| | ||||||
| falcon.verify_opt | Full falcon verification with precomputed pubkey | 3.6M | OK| | ||||||
|
||||||
### Go Ethereum (WIP) | ||||||
|
||||||
ZKNOX is planning a node client implementation of the considered EIP. | ||||||
|
||||||
## Conclusion | ||||||
|
||||||
We provided an optimized version of FALCON, using an optimized version of NTT. This code can be used to speed up STARK verification as well as other lattice primitives (Dilithium, Kyber, etc.). While it seems achievable to use FALCON as a progressive precompile, the cost remains very high. Using a client implementation with NTT-EIP (in a Geth fork for example), Ethereum could become a PQ-friendly and ZK-friendly chain. This work is supported by the Ethereum Foundation. | ||||||
|
||||||
|
||||||
## References | ||||||
|
||||||
- [[LN16]](https://eprint.iacr.org/2016/504.pdf) Speeding up the Number Theoretic Transform for Faster Ideal Lattice-Based Cryptography. Patrick Longa, Michael Naehrig. | ||||||
- [[EIP616]](https://eips.ethereum.org/EIPS/eip-616) EIP-616: SIMD Operations for the EVM. Greg Colvin. | ||||||
- [[RD23]](https://eprint.iacr.org/2023/939.pdf) Speeding up elliptic computations for Ethereum Account Abstraction. Renaud Dubois. | ||||||
- [[DILITHIUM]](https://eprint.iacr.org/2017/633.pdf) CRYSTALS-Dilithium: A Lattice-Based Digital Signature Scheme. Léo Ducas, Eike Kiltz, Tancrède Lepoint, Vadim Lyubashevsky, Peter Schwabe, Gregor Seiler and Damien Stehlé. |
@@ -0,0 +1,28 @@ | ||
# NTT | ||
Generic implementation of the Number Theoretic Transform in the context of cryptography applications. | ||
|
||
We provide tests for various NTT-friendly rings, including Falcon's ring with `q = 12*1024+1` and the defining polynomial `x¹⁰²⁴+1`. | ||
|
||
The implementation requires the file `ntt_constants.py`, generated using `python generate_constants.py`. | ||
|
||
## Install | ||
``` | ||
make install | ||
``` | ||
|
||
## Tests | ||
For running all tests: | ||
``` | ||
make test | ||
``` | ||
For running a specific test, use: | ||
``` | ||
make test TEST=test_ntt_recursive.TestNTTRecursive.test_ntt_intt | ||
``` | ||
|
||
## Benchmarks | ||
For running the benchmarks: | ||
``` | ||
make bench | ||
``` | ||
Note that the field arithmetic is not optimized. For example, Montgomery multiplication is not implemented here. |
@@ -0,0 +1,28 @@ | ||
PY = python3 | ||
VENV = myenv | ||
PIP = $(VENV)/bin/pip | ||
PYTHON = $(VENV)/bin/python | ||
AUX = *.pyc *.cprof */*.pyc | ||
|
||
install: | ||
	$(PY) -m venv $(VENV) | ||
	$(PIP) install pycryptodome | ||
|
||
generate_ntt_constants: | ||
	$(PYTHON) -m polyntt.scripts.generate_ntt_constants | ||
|
||
generate_test_vectors: | ||
	$(PYTHON) -m polyntt.scripts.generate_test_vectors | ||
	$(PYTHON) -m polyntt.scripts.generate_test_vectors_solidity | ||
|
||
test: generate_test_vectors | ||
	$(PYTHON) -m unittest $(if $(TEST),polyntt.tests.$(TEST),discover -s polyntt.tests) -v | ||
|
||
bench: | ||
	$(PYTHON) -m polyntt.bench_iterative_recursive | ||
|
||
clean: | ||
	rm -f $(AUX) | ||
	rm -rf __pycache__ */__pycache__ | ||
	rm -rf scripts/*.sage.py | ||
	@echo "Clean done" |