A program to efficiently determine the total stopping time of the Collatz sequence of any 128-bit integer starting
value, and output all starting values
For the purposes of this document, the following mathematical symbols will represent the corresponding concepts.
-
$\mathbb{Z}$ represents the set of all integers. -
$\mathbb{Z}^+$ represents the set of all positive integers. -
$\mathbb{Z}^{0+}$ represents the set of all non-negative integers. -
$f(n)$ represents the Collatz function applied to$n$ . -
$f^k(n)$ represents the Collatz function applied recursively$k$ times to$n$ . -
$s(n)$ represents the total stopping time of the starting value$n$ .
The Collatz Conjecture is a famous unsolved mathematical problem. It
regards the Collatz function, a mathematical function which takes a positive integer input
The Collatz function can be applied recursively, meaning given an initial input
By applying the Collatz function recursively, the sequence of successive inputs and outputs will form a Collatz
sequence. If a Collatz sequence includes the value
The Collatz Conjecture states that for all positive integer starting values
Collatz Conjecture Simulator aims to find the Collatz sequences with the greatest total stopping times. That is,
positive integer values
Total stopping times are calculated by iterating through Collatz sequences and counting each step (application of the
Collatz function) until a value of
The general environment and system requirements that must be met for Collatz Conjecture Simulator to build and run correctly. The full requirements of the GPU are given in device_requirements.md.
- CMake 3.23
- pthreads
- glslang
- SPIR-V Tools
spirv-link
spirv-opt
spirv-dis
- C17
_Atomic
(Optional C11 feature)__int128
(GNU C extension)
- Vulkan 1.1
synchronization2
timelineSemaphore
Collatz Conjecture Simulator is built via CMake. Comprehensive documentation regarding usage of CMake can be found here. To generate the build system, navigate the terminal to the project directory and execute the following command.
cmake -S . -B build
Several options can be optionally specified to customise the build system by appending -D OPTION=CONFIG
to the above
command.
CMAKE_BUILD_TYPE
specifies the build variant and can be set to Debug, Release, MinSizeRel, or RelWithDebInfo. If not set, it defaults to Debug.EXCESS_WARNINGS
specifies whether to compile the program with a potentially excessive amount of warnings, and defaults to OFF.STATIC_ANALYSIS
specifies whether to statically analyse the program during compilation if compiling with GCC, and defaults to OFF.DEBUG_SHADERS
specifies whether to include debug information in generated SPIR-V, and defaults to OFF.OPTIMISE_SHADERS
specifies whether to optimise generated SPIR-V usingspirv-opt
, and defaults to ON.DISASSEMBLE_SHADERS
specifies whether to disassemble generated SPIR-V usingspirv-dis
, and defaults to OFF.
Once the above command has finished, a build
directory will have been created containing the build system. To now
build Collatz Conjecture Simulator, execute the following command.
cmake --build build
By default, only the executable will be built. To instead build the SPIR-V, add --target Spirv
. To build both, also
add --target CollatzSim
. To specify the build configuration, add --config CONFIG
(only applies for multi-config
generators).
The above command will create a bin
directory containing the SPIR-V and executable. If built in debug, the executable
will be named CollatzSimDebug
. Otherwise, it will be named CollatzSim
. The executable must be run from within the
bin
directory, else it will be unable to locate the generated SPIR-V.
To facilitate this use of the GPU, inout-buffers are used. Inout-buffers are ranges of GPU memory within VkBuffer
objects and consist of an in-buffer and out-buffer. In-buffers are shader storage buffer objects (SSBOs) and contain
an array of 128-bit unsigned integer starting values. Out-buffers are also SSBOs and contain an array of 16-bit unsigned
integer total stopping times (step counts).
The main loop consists of the CPU writing starting values to in-buffers; the GPU reading starting values from
in-buffers, iterating through Collatz sequences, and writing step counts to out-buffers; and the CPU reading step
counts from out-buffers. The number of inout-buffers is dependent on the system's specifications. There are one or more
inout-buffers per VkBuffer
object, one VkBuffer
object per VkDeviceMemory
object, and two or more VkDeviceMemory
objects.
Collatz Conjecture Simulator attempts to minimise the time spent idle by the CPU and GPU due to one waiting for the
other to complete execution. Such as the GPU waiting for starting values, or the CPU waiting for step counts. This is
done by having an even number of VkDeviceMemory
objects, where half contain memory close to the GPU (device local
memory), and half contain memory visible to both the CPU and GPU (host visible memory). There are therefore four types
of memory ranges: host visible in-buffers (HV-in), host visible out-buffers (HV-out), device local in-buffers (DL-in),
and device local out-buffers (DL-out).
Rather than the CPU and GPU taking turns executing, both processors spend time running in parallel. The CPU reads and writes host visible inout-buffers, and the GPU reads and writes device local inout-buffers, simultaneously. Starting values are written to HV-in, copied from HV-in to DL-in, and read from DL-in. Step counts are written to DL-out, copied from DL-out to HV-out, and read from HV-out.
flowchart LR
CPU-->In-buffer
In-buffer-->GPU
GPU-->Out-buffer
Out-buffer-->CPU
subgraph In-buffer
direction LR
HV-in-->DL-in
end
subgraph Out-buffer
direction BT
DL-out-->HV-out
end
It can be mathematically demonstrated that particular sets of starting values will always generate Collatz sequences that contain a smaller value. The step counts of these starting values can thus be expressed as the sum of the step count of the smaller value, and the distance between the starting value and smaller value in the Collatz sequence.
For example, the set of even starting values.
The Collatz sequences of all even starting values
Another set of interest is the set of starting values congruent to 1 modulo 4.
These two patterns remove the requirement to iterate through the Collatz sequence of every 3 in 4 starting values, leaving only those congruent to 3 modulo 4. The step counts of all other starting values can be determined with the previously calculated step counts of lesser starting values.
Hence, Collatz Conjecture Simulator only iterates through the Collatz sequences of starting values
The author of Collatz Conjecture Simulator is not a lawyer, but strongly believes the usage of GPLv3-licensed works in the training and development of proprietary AI is necessarily violating of said licence. However, in the event the GPLv3 does not in itself encompass the above restriction, the following shall apply.
Collatz Conjecture Simulator includes in its terms and conditions regarding copying, distribution, and modification, in addition to those provided by version 3 of the GNU General Public Licence, the strict prohibition of its usage by artificial intelligence software not licensed in their entirety, as a whole, under version 3 or later of the GNU General Public Licence, including but not limited to the training, prompting, or generation of such artificial intelligence models or algorithms.