Add EIP: Precompile for NTT operations #9374

Open
wants to merge 12 commits into
base: master
from
217 changes: 217 additions & 0 deletions EIPS/eip-7885.md
@@ -0,0 +1,217 @@
---
eip: 7885
title: Precompile for NTT operations
description: Proposal to add a precompiled contract that performs the number theoretic transform (NTT) and its inverse (InvNTT).
author: Renaud Dubois (@rdubois-crypto), Simon Masson (@simonmasson)
discussions-to: https://ethereum-magicians.org/t/eip-9374-precompile-for-ntt-operations/22895
status: Draft
type: Standards Track
category: Core
created: 2025-02-12
---


## Abstract

This proposal creates a precompiled contract that performs NTT and Inverse NTT transformations. This provides a way to have efficient and fast polynomial multiplication for Post Quantum and Starks applications.

Suggested change
This proposal creates a precompiled contract that performs NTT and Inverse NTT transformations. This provides a way to have efficient and fast polynomial multiplication for Post Quantum and Starks applications.
This proposal creates a precompiled contract that performs NTT and Inverse NTT transformations. This provides a way to efficiently perform fast polynomial multiplication for post-quantum and STARK cryptography.


## Motivation

With the release of Willow cheap, the concern for quantum threat against Ethereum accelerated. Today ECDSA is the EOA signature algorithms, which is prone to quantum computing. Efficient replacement algorithms use polynomial multiplication as the core operation. Once NTT and Inverse NTT are available, the remaining of the verification algorithm is trivial. Choosing to integrate NTT and InvNTT instead of a specific algorithm provides agility, as DILITHIUM or FALCON or any equivalent can be implemented with a modest cost from those operators. NTT is also of interest to speed-up STARK verifiers. This single operator would thus benefit to both the Ethereum scaling and Post Quantum threat mitigation.

Suggested change
With the release of Willow cheap, the concern for quantum threat against Ethereum accelerated. Today ECDSA is the EOA signature algorithms, which is prone to quantum computing. Efficient replacement algorithms use polynomial multiplication as the core operation. Once NTT and Inverse NTT are available, the remaining of the verification algorithm is trivial. Choosing to integrate NTT and InvNTT instead of a specific algorithm provides agility, as DILITHIUM or FALCON or any equivalent can be implemented with a modest cost from those operators. NTT is also of interest to speed-up STARK verifiers. This single operator would thus benefit to both the Ethereum scaling and Post Quantum threat mitigation.
With the recent advances in quantum computing, there are increased concerns for the quantum threat against Ethereum. Today ECDSA is the EOA signature algorithms, which is vulnerable to attacks by quantum computers. Efficient replacement algorithms use polynomial multiplication as the core operation. Once NTT and Inverse NTT are available, the remaining of the verification algorithm is trivial. Choosing to integrate NTT and InvNTT instead of a specific algorithm provides agility, as DILITHIUM or FALCON or any equivalent can be implemented with a modest cost from those operators. NTT is also of interest to speed-up STARK verifiers. This single operator would thus benefit to both the Ethereum scaling and post-quantum threat mitigation.



## Specification

### Constants

| Name | Value | Comment |
|---------------------|-------|--------------------|
| NTT_FW | 0x0f | precompile address |
| NTT_INV | 0x10 | precompile address |
| NTT_VECMULMOD | 0x11 | precompile address |
| NTT_VECADDMOD | 0x12 | precompile address |

We introduce *four* separate precompiles to perform the following operations:

- NTT_FW - to perform the forward NTT transformation (Negative wrap convolution) with a gas cost of `600` gas,

- NTT_INV - to perform the inverse NTT transformation (Negative wrap convolution) with a gas cost of `600` gas,

- NTT_VECMULMOD - to perform vectorized modular multiplication with a gas cost formula defined in the corresponding section,

- NTT_VECADDMOD - to perform vectorized modular addition with a gas cost formula defined in the corresponding section.


### Field parameters

The NTT_FW and NTT_INV are fully defined by the following set of parameters.

Suggested change
The NTT_FW and NTT_INV are fully defined by the following set of parameters.
The NTT_FW and NTT_INV are fully defined by the following set of parameters:

Let $R$ be a cyclotomic ring of the form $R=\mathbb F_q[X]/(X^n+1)$. In these notations,

- $n$ is the degree and is a power of 2,

- $\mathbb F_q$ is the prime field where $q \equiv 1 \mod 2n$,

- $\omega$ is a primitive $n$-th root of unity in $\mathbb F_q$,

- $\psi$ is a primitive $2n$-th root of unity in $\mathbb F_q$, with $\psi^2 = \omega$.

Any element $a \in R$ is a polynomial of degree at most $n-1$ with coefficients in $\mathbb F_q$, written as $a=\sum_{i=0}^{n-1} a_iX^i$.
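
As an illustration of the parameter constraints above, the following Python sketch searches for a primitive $2n$-th root of unity $\psi$ for the FALCON parameters and derives $\omega = \psi^2$. The helper name `find_psi` is hypothetical and is not part of this EIP or its assets; the sketch assumes $q$ is prime and does not check it.

```python
def find_psi(q: int, n: int) -> int:
    """Return a primitive 2n-th root of unity modulo q (q assumed prime, n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if (q - 1) % (2 * n):
        raise ValueError("q must satisfy q = 1 mod 2n")
    exponent = (q - 1) // (2 * n)
    for g in range(2, q):
        psi = pow(g, exponent, q)
        # psi^(2n) = 1 always holds here; since 2n is a power of two,
        # psi^n = -1 forces the order of psi to be exactly 2n.
        if pow(psi, n, q) == q - 1:
            return psi
    raise ValueError("no primitive 2n-th root of unity found")


q, n = 12289, 512        # FALCON parameters listed in this EIP
psi = find_psi(q, n)
omega = psi * psi % q    # omega = psi^2 is a primitive n-th root of unity
```

The check `pow(psi, n, q) == q - 1` is also what makes $X^n + 1$ split over $\mathbb F_q$: every odd power of $\psi$ is a root of $X^n + 1$.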


### NTT_FW

The NTT transformation is described by the following algorithm.

**Input:** A vector $a = (a[0], a[1], \dots, a[n-1]) \in \mathbb F_q^n$ in standard order, where $q$ is a prime such that $q \equiv 1 \mod 2n$ and $n$ is a power of two, and a precomputed table $\Psi_\text{rev} \in \mathbb F_q^n$ storing powers of $\psi$ in bit-reversed order.

**Output:** $a \leftarrow \text{NTT\_FW}(a)$ in bit-reversed order.

```plaintext
t ← n
for (m = 1; m < n; m ← 2m) do
    t ← t / 2
    for i = 0 to m-1 do
        j1 ← 2 ⋅ i ⋅ t
        j2 ← j1 + t - 1
        S ← Ψrev[m + i]
        for j = j1 to j2 do
            U ← a[j]
            V ← a[j + t] ⋅ S
            a[j] ← (U + V) mod q
            a[j + t] ← (U - V) mod q
        end for
    end for
end for
return a
```
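
The forward transform above can be sketched in Python as follows. This is an illustrative transcription, not the reference code shipped in the assets: the helper names are hypothetical, and the table layout `psi_rev[k] = psi^bit_reverse(k)` is an assumption taken from the usual negative-wrapped convolution formulation.

```python
def bit_reverse(k: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def psi_rev_table(psi: int, n: int, q: int) -> list[int]:
    """Powers of psi in bit-reversed order (assumed layout of the precomputed table)."""
    bits = n.bit_length() - 1
    return [pow(psi, bit_reverse(k, bits), q) for k in range(n)]


def ntt_fw(a: list[int], psi_rev: list[int], q: int) -> list[int]:
    """Forward NTT: standard-order input, bit-reversed-order output."""
    a = list(a)              # work on a copy
    n = len(a)
    t = n
    m = 1
    while m < n:             # m = 1, 2, 4, ..., n/2
        t //= 2
        for i in range(m):
            j1 = 2 * i * t
            s = psi_rev[m + i]
            for j in range(j1, j1 + t):
                u = a[j]
                v = a[j + t] * s % q
                a[j] = (u + v) % q
                a[j + t] = (u - v) % q
        m *= 2
    return a


# Toy parameters: q = 17, n = 4, psi = 2 (2^4 = -1 mod 17).
print(ntt_fw([1, 2, 3, 4], psi_rev_table(2, 4, 17), 17))  # → [15, 11, 13, 16]
```

With this table layout, output slot $k$ holds the evaluation of the input polynomial at $\psi^{2\,\text{brv}(k)+1}$, where $\text{brv}$ denotes bit reversal on $\log_2 n$ bits.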

### NTT_INV

The Inverse NTT is described by the following algorithm.

**Input:** A vector $a = (a[0], a[1], \dots, a[n-1]) \in \mathbb F_q^n$ in bit-reversed order, where $q$ is a prime such that $q \equiv 1 \mod 2n$ and $n$ is a power of two, and a precomputed table $\Psi^{-1}_\text{rev} \in \mathbb F_q^n$ storing powers of $\psi^{-1}$ in bit-reversed order.

**Output:** $a \leftarrow \text{NTT\_INV}(a)$ in standard order.

```plaintext
t ← 1
for (m = n; m > 1; m ← m/2) do
    j1 ← 0
    h ← m / 2
    for i = 0 to h-1 do
        j2 ← j1 + t - 1
        S ← Ψ⁻¹rev[h + i]
        for j = j1 to j2 do
            U ← a[j]
            V ← a[j + t]
            a[j] ← (U + V) mod q
            a[j + t] ← (U - V) ⋅ S mod q
        end for
        j1 ← j1 + 2t
    end for
    t ← 2t
end for
for j = 0 to n-1 do
    a[j] ← a[j] ⋅ n⁻¹ mod q
end for
return a
```
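
A matching Python sketch of the inverse transform is given below, under the same assumptions as for the forward direction: hypothetical helper names, and a table laid out as `psi_inv_rev[k] = psi^(-bit_reverse(k))`. It relies on `pow(x, -1, q)` for modular inverses, which requires Python 3.8 or later.

```python
def bit_reverse(k: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def psi_inv_rev_table(psi: int, n: int, q: int) -> list[int]:
    """Powers of psi^-1 in bit-reversed order (assumed layout of the precomputed table)."""
    bits = n.bit_length() - 1
    psi_inv = pow(psi, -1, q)
    return [pow(psi_inv, bit_reverse(k, bits), q) for k in range(n)]


def ntt_inv(a: list[int], psi_inv_rev: list[int], q: int) -> list[int]:
    """Inverse NTT: bit-reversed-order input, standard-order output."""
    a = list(a)              # work on a copy
    n = len(a)
    t = 1
    m = n
    while m > 1:             # m = n, n/2, ..., 2
        j1 = 0
        h = m // 2
        for i in range(h):
            s = psi_inv_rev[h + i]
            for j in range(j1, j1 + t):
                u = a[j]
                v = a[j + t]
                a[j] = (u + v) % q
                a[j + t] = (u - v) * s % q
            j1 += 2 * t
        t *= 2
        m //= 2
    n_inv = pow(n, -1, q)    # final scaling by n^-1 mod q
    return [x * n_inv % q for x in a]


# Toy parameters: q = 17, n = 4, psi = 2. The input is the forward NTT of [1, 2, 3, 4].
print(ntt_inv([15, 11, 13, 16], psi_inv_rev_table(2, 4, 17), 17))  # → [1, 2, 3, 4]
```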


### NTT_VECMULMOD

The NTT_VECMULMOD is similar to SIMD in the functioning, but operates with larger sizes in input and output.

Suggested change
The NTT_VECMULMOD is similar to SIMD in the functioning, but operates with larger sizes in input and output.
The NTT_VECMULMOD functions similarly to SIMD, but operates with larger input and output sizes.


**Input:** Two vectors $a = (a[0], a[1], \dots, a[n-1]), b=(b[0], b[1], \dots, b[n-1]) \in \mathbb F_q^n$ where $n$ and $q$ are defined above.

**Output:** The element-wise product $(a[0]\cdot b[0] \mod q, a[1]\cdot b[1]\mod q, \dots, a[n-1]\cdot b[n-1] \mod q)$.

**Gas cost:** Denotoing $k$ to be the smallest power of $2$ larger than $\log_2(q)$, the gas cost of this operation is $k\log_2(n) / 8$.

Suggested change
**Gas cost:** Denotoing $k$ to be the smallest power of $2$ larger than $\log_2(q)$, the gas cost of this operation is $k\log_2(n) / 8$.
**Gas cost:** Denoting $k$ to be the smallest power of $2$ larger than $\log_2(q)$, the gas cost of this operation is $k\log_2(n) / 8$.



### NTT_VECADDMOD

The NTT_VECADDMOD is similar to SIMD in its functioning, but operates with larger input and output sizes.

**Input:** Two vectors $a = (a[0], a[1], \dots, a[n-1]), b=(b[0], b[1], \dots, b[n-1]) \in \mathbb F_q^n$ where $n$ and $q$ are defined above.

**Output:** The element-wise addition $(a[0]+ b[0] \mod q, a[1]+ b[1]\mod q, \dots, a[n-1]+ b[n-1] \mod q)$.

**Gas cost:** Denoting $k$ to be the smallest power of $2$ larger than $\log_2(q)$, the gas cost of this operation is $k\log_2(n) / 32$.
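
To make the two gas formulas concrete, here is a small Python sketch that evaluates them for the FALCON and DILITHIUM parameters. The function names are hypothetical, and the text does not say how a non-integer result is rounded, so the sketch returns exact fractions rather than guessing.

```python
from fractions import Fraction


def word_size(q: int) -> int:
    """Smallest power of two strictly larger than log2(q).

    For q >= 2 this is the smallest power of two that is >= q.bit_length().
    """
    k = 1
    while k < q.bit_length():
        k *= 2
    return k


def vec_gas(q: int, n: int, divisor: int) -> Fraction:
    """k * log2(n) / divisor, kept exact because rounding is unspecified."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    log2_n = n.bit_length() - 1
    return Fraction(word_size(q) * log2_n, divisor)


def vecmulmod_gas(q: int, n: int) -> Fraction:
    return vec_gas(q, n, 8)


def vecaddmod_gas(q: int, n: int) -> Fraction:
    return vec_gas(q, n, 32)


print(vecmulmod_gas(12289, 512), vecaddmod_gas(12289, 512))                   # FALCON: 18 9/2
print(vecmulmod_gas(2**23 - 2**13 + 1, 256), vecaddmod_gas(2**23 - 2**13 + 1, 256))  # DILITHIUM: 32 8
```

The FALCON addition cost comes out as $16 \cdot 9 / 32 = 4.5$, which shows that the formula as written can produce non-integer values.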

## Rationale

If $f$ and $g$ are two polynomials of $R$, then their product in $R$ is
$f\times g= \text{NTT\_INV}(\text{NTT\_VECMULMOD}(
\text{NTT\_FW}(f), \text{NTT\_FW}(g)))$. This computation has a complexity of $n \log_2n$, rather than the $n^2$ of the classical schoolbook multiplication algorithm.
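
The product in $R$ that the transform pipeline must reproduce can be written as a schoolbook multiplication with negative wrap-around ($X^n = -1$). The Python sketch below gives such a reference oracle and checks, on toy parameters, the property that makes the pipeline work: evaluating at the odd powers of $\psi$ turns the product in $R$ into an element-wise product. The function names are hypothetical, and the evaluations are listed in standard order rather than the bit-reversed order used by NTT_FW.

```python
def negacyclic_mul(f: list[int], g: list[int], q: int) -> list[int]:
    """Schoolbook product of f and g in F_q[X]/(X^n + 1), in O(n^2) operations."""
    n = len(f)
    if len(g) != n:
        raise ValueError("f and g must have the same length")
    c = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            k = i + j
            if k < n:
                c[k] = (c[k] + fi * gj) % q
            else:
                # X^n = -1 in R, so the term wraps around with a sign flip.
                c[k - n] = (c[k - n] - fi * gj) % q
    return c


def evaluate_at_odd_powers(a: list[int], psi: int, q: int) -> list[int]:
    """Naive evaluation of a at psi^1, psi^3, ..., psi^(2n-1), in standard order."""
    n = len(a)
    return [sum(c * pow(psi, (2 * i + 1) * j, q) for j, c in enumerate(a)) % q
            for i in range(n)]


q, n, psi = 17, 4, 2                     # toy parameters: 2^4 = -1 mod 17
f, g = [1, 2, 3, 4], [5, 6, 7, 8]
product = negacyclic_mul(f, g, q)        # → [12, 15, 2, 9]

pointwise = [x * y % q for x, y in zip(evaluate_at_odd_powers(f, psi, q),
                                       evaluate_at_odd_powers(g, psi, q))]
assert pointwise == evaluate_at_odd_powers(product, psi, q)
```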

### Fields of interest

The implementation applies to many fields of interest for cryptography. In particular, the design applies to:

- FALCON: $q=3.2^{12}+1$ (one of the NIST winners for post-quantum signature scheme),

Suggested change
- FALCON: $q=3.2^{12}+1$ (one of the NIST winners for post-quantum signature scheme),
- FALCON: $q=3.2^{12}+1$ (one of the NIST winners for post-quantum signature schemes),


- DILITHIUM: $q=2^{23}-2^{13}+1$ (one of the NIST winners for post-quantum signature schemes),

- KYBER: $q=13.2^8+1$ (one of the NIST winners for post-quantum key encapsulation mechanism),

Suggested change
- KYBER: $q=13.2^8+1$ (one of the NIST winners for post-quantum key encapsulation mechanism),
- KYBER: $q=13.2^8+1$ (one of the NIST winners for post-quantum key encapsulation mechanism),


- Babybear: $q=15\cdot 2^{27}+1$ (Risc0),

- Goldilocks: $q=2^{64}-2^{32}+1$ (Polygon's Plonky2),

- M31: $q=2^{31}-1$ (Circle STARKS, STwo, Plonky3),

- StarkCurve: $q=2^{251}+17\cdot 2^{192}+1$


### Benchmarks

#### Pure solidity

To illustrate the interest of the precompile, the assets provide the measured gas cost for a single NTT and extrapolate the minimal gas cost, taking into account the required number of NTT_FW and NTT_INV. The provided assets use pure Yul optimizations, with memory access hacks. It is unlikely that more than one order of magnitude could be saved on such minimal code.

|Use case| Parameters | single NTT gas cost | Required NTT(FW/INV) | Estimated NTT/Full cost |
|--|------------------------|---------------------|---------------------|---|
|Falcon| $q=12289, n=512$ | 1.8 M | 1 NTTFW+1 NTTINV |3.6 M|
|Dilithium| $q=2^{23}-2^{13}+1, n=256$| 460K | 4 NTTFW +1 NTTINV|2.3M|

Falcon cost has been measured over a full implementation and is consistent with the estimate. Dilithium cost is evaluated assuming

This demonstrates that using pure solidity enables cheap L2s to experiment with FALCON from now, but is to expensive for L1.

Suggested change
This demonstrates that using pure solidity enables cheap L2s to experiment with FALCON from now, but is to expensive for L1.
This demonstrates that using pure solidity enables L2s with low gas fees to experiment with FALCON in the short term, whereas it is too expensive to do so on L1.

Adopting this EIP, the signature verification of Falcon can be reduced to **1500** gas, and a similar result is expected for Dilithium.
Adopting the hash function as a separate EIP would enable a gas verification cost of 2000 gas.
This is in line with the ratio looking at SUPERCOP implementations.

Suggested change
This is in line with the ratio looking at SUPERCOP implementations.
This is in line with the ratio looking at SUPERCOP implementations.

Fix all spacing errors





## Backwards Compatibility

There are no backward compatibility questions.

## Test Cases

There are no edge cases in the considered operations.


## Reference Implementation


There are two fully spec-compatible implementations at the time of writing:

- a Python reference implementation provided in the assets of this EIP

- a Solidity reference implementation provided in the assets of this EIP

Both have been validated against a large set of reference vectors, and have been used to implement both FALCON and DILITHIUM as a demonstration of the usefulness of the precompile.


## Security Considerations

Needs discussion.

## Copyright

Copyright and related rights waived via [CC0](../LICENSE.md).
97 changes: 97 additions & 0 deletions assets/eip-9374/README.md
@@ -0,0 +1,97 @@
# NTT-EIP as a building block for FALCON, DILITHIUM and Stark verifiers

Suggested change
# NTT-EIP as a building block for FALCON, DILITHIUM and Stark verifiers
# NTT-EIP as a building block for FALCON, DILITHIUM and STARK verifiers


This repository contains the EIP for the NTT transform, along with a Python reference implementation and a Solidity implementation.

## Context

### The threat
With the release of the Willow chip, concern about the quantum threat against Ethereum seems to be accelerating. Posts by [Asanso](https://ethresear.ch/t/so-you-wanna-post-quantum-ethereum-transaction-signature/21291) and [PMiller](https://ethresear.ch/t/tidbits-of-post-quantum-eth/21296) summarize the stakes and possible solutions. These solutions include lattice-based signatures such as Dilithium or FALCON (the latter being better optimized for on-chain constraints), STARKs and FHE. There is a consensus in the cryptographic research community around lattices as the future of asymmetric protocols, and STARKs won the race for ZKEVM implementations (as used by Scroll, Starknet and ZKsync).
Review comment: "Willow chip"


These protocols all require fast polynomial multiplication over prime fields, and use the NTT (a special [FFT](https://vitalik.eth.limo/general/2019/05/12/fft.html) adapted to prime fields). While in the past Montgomery multipliers over elliptic curve fields were the critical target of optimizations (both hardware and software), NTT optimization is the key to a performant PQ implementation.

### Discussion

In the past, Ethereum chose specificity by picking secp256k1 as its sole candidate for signatures. Later, after dedicated hardware and proving systems working on other hardware were released, a zoo of EIPs flourished to propose alternative curves. There were attempts to have higher-level EIPs enable all of those at once, such as EWASM, SIMD, EVMMAX, or RIP7696 (in decreasing order of genericity and complexity).

Picking NTT as the EIP rather than a given scheme would provide a massive gas cost reduction for all schemes relying on it.
- **pros**: massive cost reduction for all cited protocols, more agility for future evolutions.
- **cons**: needs to be wrapped into implementations, not optimal for a given target compared to a dedicated EIP, not stateless.

## Overview

The NTT operates on sequences of numbers (often coefficients of polynomials) in a modular arithmetic system. It maps these sequences into a different domain where convolution operations (e.g., polynomial multiplication) become simpler and faster, akin to how the FFT simplifies signal convolution. Whereas the FFT uses roots of unity in the complex plane, the NTT uses roots of unity in a finite field or ring.

The NTT is based on the Discrete Fourier Transform (DFT), defined as:
$$
X[k] = \sum_{j=0}^{N-1} x[j] \cdot \omega^{j \cdot k} \mod q$$

Where:
- $x[j]$: Input sequence of length N.
- $X[k]$: Transformed sequence,
- $q$: A prime modulus,
- $\omega$: A primitive N-th root of unity modulo $q$, with
$\omega^N \equiv 1 \mod q \quad \text{and} \quad \omega^k \not\equiv 1 \mod q \; \forall \; 0 < k < N$
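
The definition above can be transcribed directly into a quadratic-time Python sketch, which is handy as a reference when testing faster implementations. The function names are hypothetical and the parameters are toy values chosen for readability.

```python
def is_primitive_root_of_unity(omega: int, n: int, q: int) -> bool:
    """Check omega^n = 1 and omega^k != 1 for 0 < k < n (all modulo q)."""
    return pow(omega, n, q) == 1 and all(pow(omega, k, q) != 1 for k in range(1, n))


def naive_ntt(x: list[int], omega: int, q: int) -> list[int]:
    """X[k] = sum_j x[j] * omega^(j*k) mod q, computed in O(N^2) operations."""
    n = len(x)
    return [sum(x[j] * pow(omega, j * k, q) for j in range(n)) % q for k in range(n)]


q, omega = 17, 4                     # 4 has order exactly 4 modulo 17
assert is_primitive_root_of_unity(omega, 4, q)
print(naive_ntt([1, 2, 3, 4], omega, q))   # → [10, 7, 15, 6]
```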

NTT computation uses an approach similar to the Cooley-Tukey algorithm to achieve $O(N \log N)$ complexity. The NTT algorithm transforms a sequence $(x[j])$ to $(X[k])$ using modular arithmetic. It is invertible, allowing reconstruction of the original sequence via the Inverse NTT (INTT). The inverse process is similar but requires dividing by $N$ (mod $q$) and using $\omega^{-1}$ (the modular inverse of $\omega$). The following algorithm is extracted from
[[LN16]](https://eprint.iacr.org/2016/504.pdf), and describes how to compute the NTT when $R_q= \mathbb{Z}_q[X]/(X^n+1)$ (Negative Wrap Convolution).

![alt text](image.png)

The Inverse NTT is computed through the following algorithm:

![alt text](image-1.png)

## Benchmarks

### Python

| Field | $n$ | Recursive NTT (Tetration) | Iterative NTT (ZKNox) | Iterative InvNTT (ZKNox)|
|-|-|-|-|-|
|Falcon | 512 | 761 μs | 528 μs | 561 μs |
|Falcon | 1024 | 1642 μs | 1076 μs | 1199 μs |
|Dilithium| 128 | 165 μs | 114 μs | 113 μs |
|Dilithium| 256 | 371 μs | 258 μs | 260 μs |
|BabyBear | 256 | 531 μs | 389 μs | 404 μs |

The recursive inverse NTT is very costly because of the required inversions. For Falcon, the field is small enough that field inversions can be precomputed, but the cost is still higher than the iterative inverse NTT.
The field arithmetic has not been optimized. In the case of BabyBear, this overhead becomes significant, so the comparison is not really meaningful.

### Solidity


| Function | Description | gas cost | Tests Status |
|------------------------|---------------------|---------------------|---------------------|
| NTT recursive | original gas cost from [falcon-solidity](https://github.com/Tetration-Lab/falcon-solidity/blob/main/src/Falcon.sol) | 6.9M | OK|
| InvNTT recursive | original gas cost from [falcon-solidity](https://github.com/Tetration-Lab/falcon-solidity/blob/main/src/Falcon.sol) | 7.8M | OK|
| Full Falcon verification | original gas cost from [falcon-solidity](https://github.com/Tetration-Lab/falcon-solidity/blob/main/src/Falcon.sol) | 24 M| OK|
| NTT iterative | ZKNOX | 4M | OK|
| InvNTT iterative | ZKNOX | 4.2M | OK|
| Full Falcon verification | ZKNOX | 8.5 M| OK|


### Yul


Further optimizations are achieved by using Yul for critical sections and using the CODECOPY and EXTCODECOPY trick detailed in [[RD23]](https://eprint.iacr.org/2023/939.pdf) (section 3.3, "Hacking EVM memory access cost").


| Function | Description | gas cost | Tests Status |
|------------------------|---------------------|---------------------|---------------------|
| ntt.NTTFW | ZKNOX_NTTFW, iterative yuled | 1.9M | OK|
| falcon.verify_opt | Full falcon verification with precomputed pubkey | 3.6M | OK|

### Go Ethereum (WIP)

ZKNOX is planning a node client implementation of the considered EIP.

## Conclusion

We provided an optimized version of FALCON, using an optimized version of NTT. This code can be used to speed up STARK verification as well as other lattice primitives (Dilithium, Kyber, etc.). While it seems achievable to use FALCON as a progressive precompile, the cost remains very high. Using a client implementation with NTT-EIP (in a Geth fork for example), Ethereum could become a PQ-friendly and ZK-friendly chain. This work is supported by the Ethereum Foundation.


## References

- [[LN16]](https://eprint.iacr.org/2016/504.pdf) Speeding up the Number Theoretic Transform for Faster Ideal Lattice-Based Cryptography. Patrick Longa, Michael Naehrig.
- [[EIP616]](https://eips.ethereum.org/EIPS/eip-616) EIP-616: SIMD Operations for the EVM. Greg Colvin.
- [[RD23]](https://eprint.iacr.org/2023/939.pdf) Speeding up elliptic computations for Ethereum Account Abstraction. Renaud Dubois.
- [[DILITHIUM]](https://eprint.iacr.org/2017/633.pdf) CRYSTALS-Dilithium: A Lattice-Based Digital Signature Scheme. Léo Ducas, Eike Kiltz, Tancrède Lepoint, Vadim Lyubashevsky, Peter Schwabe, Gregor Seiler and Damien Stehlé.
Binary file added assets/eip-9374/image-1.png
Binary file added assets/eip-9374/image.png
28 changes: 28 additions & 0 deletions assets/eip-9374/pythonref/README.md
@@ -0,0 +1,28 @@
# NTT
Generic implementation of the Number Theoretic Transform in the context of cryptography applications.

We provide tests for various NTT-friendly rings, including Falcon's ring with `q = 12*1024+1` and the defining polynomial `x¹⁰²⁴+1`.

The implementation requires the file `ntt_constants.py`, generated using `python generate_constants.py`.

## Install
```
make install
```

## Tests
For running all tests:
```
make test
```
For running a specific test, use:
```
make test TEST=test_ntt_recursive.TestNTTRecursive.test_ntt_intt
```

## Benchmarks
For running the benchmarks:
```
make bench
```
Note that the field arithmetic is not optimized. For example, Montgomery multiplication is not implemented here.
28 changes: 28 additions & 0 deletions assets/eip-9374/pythonref/makefile
@@ -0,0 +1,28 @@
PY = python3
VENV = myenv
PIP = $(VENV)/bin/pip
PYTHON = $(VENV)/bin/python
AUX = *.pyc *.cprof */*.pyc

install:
	$(PY) -m venv $(VENV)
	$(PIP) install pycryptodome

generate_ntt_constants:
	$(PYTHON) -m polyntt.scripts.generate_ntt_constants

generate_test_vectors:
	$(PYTHON) -m polyntt.scripts.generate_test_vectors
	$(PYTHON) -m polyntt.scripts.generate_test_vectors_solidity

test: generate_test_vectors
	$(PYTHON) -m unittest $(if $(TEST),polyntt.tests.$(TEST),discover -s polyntt.tests) -v

bench:
	$(PYTHON) -m polyntt.bench_iterative_recursive

clean:
	rm -f $(AUX)
	rm -rf __pycache__ */__pycache__
	rm -rf scripts/*.sage.py
	@echo "Clean done"