cuckooget

                      __                                      __      
                     /\ \                                    /\ \__   
  ___   __  __    ___\ \ \/'\     ___     ___      __      __\ \ ,_\  
 /'___\/\ \/\ \  /'___\ \ , <    / __`\  / __`\  /'_ `\  /'__`\ \ \/  
/\ \__/\ \ \_\ \/\ \__/\ \ \\`\ /\ \L\ \/\ \L\ \/\ \L\ \/\  __/\ \ \_ 
\ \____\\ \____/\ \____\\ \_\ \_\ \____/\ \____/\ \____ \ \____\\ \__\
 \/____/ \/___/  \/____/ \/_/\/_/\/___/  \/___/  \/___L\ \/____/ \/__/
                                                   /\____/            
                                                   \_/__/             

What

A very fast website mirroring script built on a cuckoo hash table, xxHash, and a DAG. It still has many problems. Disappearing websites make me sad, and I am looking for ways to save them even faster.

Websites are our memories.
Let everyone rise up and preserve disappearing historical websites, leaving them for the future.
For all geeks and for those who love the internet. If you find an interesting website, please contact me.

With the -w option, you can give higher priority to pages based on their URL. As far as I know, no other website mirroring software has this feature.
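To illustrate the idea of URL-based priority, here is a minimal sketch of a weighted crawl queue. It is not cuckooget's actual implementation: the `WeightedFrontier` class, its method names, and the rule that a weight is a plain substring matched against the URL are all assumptions made for this example.

```python
import heapq
import itertools


class WeightedFrontier:
    """URL queue that pops URLs matching a weight keyword before the rest."""

    def __init__(self, weights):
        # Hypothetical interpretation: each weight is a substring to favour.
        self.weights = list(weights)
        self._heap = []
        # Monotonic counter used as a tie-breaker so equal priorities stay FIFO.
        self._order = itertools.count()

    def _priority(self, url):
        # heapq is a min-heap, so a lower number is popped sooner.
        return 0 if any(w in url for w in self.weights) else 1

    def push(self, url):
        heapq.heappush(self._heap, (self._priority(url), next(self._order), url))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

With `WeightedFrontier(["/blog/"])`, any queued URL containing `/blog/` would be fetched before the others, while URLs of equal priority keep their insertion order.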

Hash values are generated by the ultra-fast xxHash, and collisions are handled by the cuckoo hash table, which uses xxh32 and xxh64 as its two different hash functions.
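The following is a rough Python sketch of how a cuckoo hash table with two hash functions handles collisions. The real table in this project is written in Rust, so treat this only as an illustration. To stay dependency-free, `_h1` and `_h2` use truncated BLAKE2b digests from `hashlib` as stand-ins for xxh32 and xxh64; the `CuckooSet` class and all of its names are hypothetical.

```python
import hashlib


def _h1(key: str) -> int:
    # Stand-in for xxh32: a 4-byte BLAKE2b digest (NOT the real xxHash).
    d = hashlib.blake2b(key.encode(), digest_size=4, person=b"h1").digest()
    return int.from_bytes(d, "big")


def _h2(key: str) -> int:
    # Stand-in for xxh64: an 8-byte BLAKE2b digest (NOT the real xxHash).
    d = hashlib.blake2b(key.encode(), digest_size=8, person=b"h2").digest()
    return int.from_bytes(d, "big")


class CuckooSet:
    """Two-table cuckoo hash set: each key has exactly two candidate slots."""

    def __init__(self, capacity: int = 16, max_kicks: int = 32) -> None:
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.tables = [[None] * capacity, [None] * capacity]

    def _slot(self, which: int, key: str) -> int:
        h = _h1(key) if which == 0 else _h2(key)
        return h % self.capacity

    def __contains__(self, key: str) -> bool:
        # A lookup inspects at most two slots, one per table.
        return any(self.tables[t][self._slot(t, key)] == key for t in (0, 1))

    def add(self, key: str) -> bool:
        """Insert key; return False if it was already present."""
        if key in self:
            return False
        which = 0
        for _ in range(self.max_kicks):
            i = self._slot(which, key)
            # Place the key and pick up whatever occupied the slot before.
            key, self.tables[which][i] = self.tables[which][i], key
            if key is None:
                return True
            # The evicted key moves to its slot in the other table.
            which = 1 - which
        # Too many evictions: double the tables and reinsert everything.
        self._grow(key)
        return True

    def _grow(self, pending: str) -> None:
        old = [k for table in self.tables for k in table if k is not None]
        self.capacity *= 2
        self.tables = [[None] * self.capacity, [None] * self.capacity]
        for k in old + [pending]:
            self.add(k)
```

A crawler could use such a set to remember which URLs it has already seen: `add(url)` returns `False` for a duplicate, and membership checks touch only two slots no matter how many URLs are stored.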

Install

Dependencies

pip install maturin
pip install -r requirements.txt

Build the CuckooHashtables module, which is implemented in Rust, with maturin and install the resulting wheel with pip so that you can call it from Python. If you prefer not to install it globally, install it inside a virtual environment instead.

maturin build
pip install target/wheels/your_package_name.whl

chmod +x main.py

or, if an earlier build is already installed:

pip install target/wheels/your_package_name.whl --force-reinstall

chmod +x main.py

Usage

python3 ./main.py

usage: main.py [-h] [-c CONNECTIONS] [-w WEIGHTS [WEIGHTS ...]]
               [-v EXCLUDE [EXCLUDE ...]]
               url output_dir