Frozen collections
Martin Taillefer committed Dec 7, 2024
0 parents commit 037b55d
Showing 136 changed files with 18,816 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
text eol=crlf
11 changes: 11 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,11 @@
version: 2
updates:
  - package-ecosystem: "cargo"
    directory: "/"
    schedule:
      interval: "monthly"
    open-pull-requests-limit: 10
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "monthly"
57 changes: 57 additions & 0 deletions .github/workflows/main.yml
@@ -0,0 +1,57 @@
name: main

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

env:
  CARGO_TERM_COLOR: always

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        toolchain:
          - 1.82.0
          - beta
          - nightly

    steps:
      - uses: actions/checkout@v4
      - run: rustup update ${{ matrix.toolchain }} && rustup default ${{ matrix.toolchain }}
      - run: rustup component add clippy
      - run: rustup component add rustfmt
      - name: Build
        run: cargo build --verbose --all-targets
      - name: Check
        run: cargo check --verbose --all-targets
      - name: Clippy
        run: cargo clippy --verbose --all-targets
      - name: Format
        run: cargo fmt -- --check
      - name: Tests
        run: cargo test --verbose
      - name: Doc Tests
        run: cargo test --doc --verbose

  coverage:
    runs-on: ubuntu-latest
    env:
      CARGO_TERM_COLOR: always
    steps:
      - uses: actions/checkout@v4
      - name: Install Rust
        run: rustup update stable
      - name: Install cargo-llvm-cov
        uses: taiki-e/install-action@cargo-llvm-cov
      - name: Generate code coverage
        run: cargo llvm-cov --all-features --workspace --lcov --output-path lcov.info
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v5
        with:
          token: ${{ secrets.CODECOV_TOKEN }} # not required for public repos
          files: lcov.info
          fail_ci_if_error: true
3 changes: 3 additions & 0 deletions .gitignore
@@ -0,0 +1,3 @@
/target
/Cargo.lock
/.idea
46 changes: 46 additions & 0 deletions Cargo.toml
@@ -0,0 +1,46 @@
[workspace]
resolver = "2"
members = [
    "frozen-collections",
    "frozen-collections-core",
    "frozen-collections-macros",
    "benches",
    "codegen",
]

[workspace.package]
version = "0.1.0"
edition = "2021"
categories = ["data-structures", "no-std", "collections"]
keywords = ["map", "set", "collection"]
repository = "https://github.com/geeknoid/frozen-collections"
license = "MIT"
readme = "README.md"
authors = ["Martin Taillefer <martin@taillefer.org>"]
rust-version = "1.82.0"

[workspace.lints.clippy]
pedantic = { level = "warn", priority = -1 }
correctness = { level = "warn", priority = -1 }
complexity = { level = "warn", priority = -1 }
perf = { level = "warn", priority = -1 }
cargo = { level = "warn", priority = -1 }
nursery = { level = "warn", priority = -1 }
wildcard_imports = "allow"
too_many_lines = "allow"
multiple_crate_versions = "allow"
from-iter-instead-of-collect = "allow"
into_iter_without_iter = "allow"
inline_always = "allow"
unnecessary_wraps = "allow"
cognitive_complexity = "allow"

[profile.bench]
codegen-units = 1
lto = "fat"

[profile.release] # Modify profile settings via config.
codegen-units = 1
lto = "fat"
debug = true # Include debug info.
strip = "none" # Keep symbols and debuginfo (no stripping).
25 changes: 25 additions & 0 deletions LICENSE
@@ -0,0 +1,25 @@
Copyright (c) 2024 Martin Taillefer

Permission is hereby granted, free of charge, to any
person obtaining a copy of this software and associated
documentation files (the "Software"), to deal in the
Software without restriction, including without
limitation the rights to use, copy, modify, merge,
publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software
is furnished to do so, subject to the following
conditions:

The above copyright notice and this permission notice
shall be included in all copies or substantial portions
of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF
ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED
TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR
IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
240 changes: 240 additions & 0 deletions README.md
@@ -0,0 +1,240 @@
# Frozen Collections - Fast Partially Immutable Collections

[![crate.io](https://img.shields.io/crates/v/frozen-collections.svg)](https://crates.io/crates/frozen-collections)
[![docs.rs](https://docs.rs/frozen-collections/badge.svg)](https://docs.rs/frozen-collections)
[![CI](https://github.com/geeknoid/frozen-collections/workflows/main/badge.svg)](https://github.com/geeknoid/frozen-collections/actions)
[![Coverage](https://codecov.io/gh/geeknoid/frozen-collections/graph/badge.svg?token=FCUG0EL5TI)](https://codecov.io/gh/geeknoid/frozen-collections)
[![Minimum Supported Rust Version 1.82](https://img.shields.io/badge/MSRV-1.82-blue.svg)]()
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)

* [Summary](#summary)
* [Creation](#creation)
    * [Short Form](#short-form)
    * [Long Form](#long-form)
* [Partially Immutable](#partially-immutable)
* [Performance Considerations](#performance-considerations)
* [Analysis and Optimizations](#analysis-and-optimizations)
* [Cargo Features](#cargo-features)
## Summary

Frozen collections are designed to deliver improved
read performance relative to the standard [`HashMap`](https://doc.rust-lang.org/std/collections/hash/map/struct.HashMap.html) and
[`HashSet`](https://doc.rust-lang.org/std/collections/hash/set/struct.HashSet.html) types. They are ideal for use with long-lasting collections
which get initialized when an application starts and remain unchanged
permanently, or at least for extended periods of time. This is a common
pattern in service applications.

As part of creating a frozen collection, analysis is performed over the data that the collection
will hold to determine the best layout and algorithm to use to deliver optimal performance.
Depending on the situation, sometimes the analysis is done at compile-time whereas in
other cases it is done at runtime when the collection is initialized.
This analysis can take some time, but the value in spending this time up front
is that the collections provide faster read-time performance.

## Creation

Frozen collections are created with one of eight macros:
[`fz_hash_map!`](https://docs.rs/frozen-collections/macro.fz_hash_map.html),
[`fz_ordered_map!`](https://docs.rs/frozen-collections/macro.fz_ordered_map.html),
[`fz_scalar_map!`](https://docs.rs/frozen-collections/macro.fz_scalar_map.html),
[`fz_string_map!`](https://docs.rs/frozen-collections/macro.fz_string_map.html),
[`fz_hash_set!`](https://docs.rs/frozen-collections/macro.fz_hash_set.html),
[`fz_ordered_set!`](https://docs.rs/frozen-collections/macro.fz_ordered_set.html),
[`fz_scalar_set!`](https://docs.rs/frozen-collections/macro.fz_scalar_set.html), or
[`fz_string_set!`](https://docs.rs/frozen-collections/macro.fz_string_set.html).
These macros analyze the data you provide
and return a custom implementation type that's optimized for the data. All the
possible implementations implement the [`Map`] or [`Set`] traits.

The macros exist in a short form and a long form, described below.

### Short Form

With the short form, you supply the data that
goes into the collection and get in return an initialized collection of an unnamed
type. For example:

```rust
use frozen_collections::*;

let m = fz_string_map!({
    "Alice": 1,
    "Bob": 2,
    "Sandy": 3,
    "Tom": 4,
});
```

At build time, the macro analyzes the data supplied and determines the best map
implementation type to use. As such, the type of `m` is not known to this code. However, `m`
always implements the [`Map`] trait, so you can rely on type inference and generics even though
you don't know the actual type of `m`:

```rust
use frozen_collections::*;

fn main() {
    let m = fz_string_map!({
        "Alice": 1,
        "Bob": 2,
        "Sandy": 3,
        "Tom": 4,
    });

    more(m);
}

fn more<M>(m: M)
where
    M: Map<&'static str, i32>
{
    assert!(m.contains_key(&"Alice"));
}
```

Rather than specifying all the data inline, you can also create a frozen collection by passing
a vector as input:

```rust
use frozen_collections::*;

let v = vec![
    ("Alice".to_string(), 1),
    ("Bob".to_string(), 2),
    ("Sandy".to_string(), 3),
    ("Tom".to_string(), 4),
];

let m = fz_string_map!(v);
```

The inline form is preferred since it results in faster code. However, whereas the inline form
requires all the data to be provided at compile time, the vector form enables the content of the
frozen collection to be determined at runtime.

### Long Form

The long form lets you provide a type alias name which will be created to
correspond to the collection implementation type chosen by the macro invocation.
Note that you must use the long form if you want to declare a static frozen collection.

```rust
use frozen_collections::*;

fz_string_map!(static MAP: MyMapType<&str, i32> = {
    "Alice": 1,
    "Bob": 2,
    "Sandy": 3,
    "Tom": 4,
});
```

The above creates a static variable called `MAP` with keys that are strings and values which are
integers. As before, you don't know the specific implementation type selected by the macro, but
this time you have a type alias (i.e. `MyMapType`) representing that type. You can then use this alias
anywhere in your code where you need to name the type explicitly.

To use the long form for non-static uses, replace `static` with `let`:

```rust
use frozen_collections::*;

fz_string_map!(let m: MyMapType<&str, i32> = {
    "Alice": 1,
    "Bob": 2,
    "Sandy": 3,
    "Tom": 4,
});

more(m);

struct S {
    m: MyMapType,
}

fn more(m: MyMapType) {
    assert!(m.contains_key("Alice"));
}
```

As with the short form, you can also supply the collection's data via a vector:

```rust
use frozen_collections::*;

let v = vec![
    ("Alice".to_string(), 1),
    ("Bob".to_string(), 2),
    ("Sandy".to_string(), 3),
    ("Tom".to_string(), 4),
];

fz_string_map!(let m: MyMapType<&str, i32> = v);
```

## Partially Immutable

Frozen maps are only partially immutable. The keys associated with a map are determined
at creation time and cannot change, but the values can be updated at will if you have a
mutable reference to the map.

Frozen sets, however, are completely immutable and never change after creation.
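
To make the distinction for maps concrete, here is a minimal sketch that uses only the standard library. It models a map whose key set is fixed at construction while its values stay writable through a mutable reference. `FixedKeyMap` and its methods are hypothetical names chosen for illustration; they are not the crate's types or API, and the sorted-vector layout is an assumption of the sketch.

```rust
/// Hypothetical illustration only: a map whose keys are fixed at creation.
/// This is NOT a type from the frozen-collections crate.
pub struct FixedKeyMap<V> {
    // Entries are kept sorted by key so lookups can use binary search.
    entries: Vec<(&'static str, V)>,
}

impl<V> FixedKeyMap<V> {
    /// Builds the map. There is no insert or remove method, so the key set
    /// cannot change after this call.
    pub fn new(mut entries: Vec<(&'static str, V)>) -> Self {
        entries.sort_by_key(|(k, _)| *k);
        // Drop duplicate keys, keeping the first occurrence of each.
        entries.dedup_by_key(|(k, _)| *k);
        Self { entries }
    }

    /// Read-only access to a value.
    pub fn get(&self, key: &str) -> Option<&V> {
        let idx = self.entries.binary_search_by(|(k, _)| (*k).cmp(key)).ok()?;
        Some(&self.entries[idx].1)
    }

    /// Mutable access to a value; the key itself stays untouched.
    pub fn get_mut(&mut self, key: &str) -> Option<&mut V> {
        let idx = self.entries.binary_search_by(|(k, _)| (*k).cmp(key)).ok()?;
        Some(&mut self.entries[idx].1)
    }

    /// Number of keys, constant for the lifetime of the map.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

With a mutable binding, a caller can update the value stored under `"Alice"` through `get_mut`, but asking for a key that was not present at construction returns `None` rather than creating an entry.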

## Performance Considerations

The analysis performed when creating maps tries to find the best concrete implementation type
given the data at hand. If all the data is visible to the macro at compile time, then you get
the best possible performance. If you supply a vector instead, then the analysis can only be
done at runtime and the resulting collection types are a bit slower.

When creating static collections, the collections produced can often be embedded directly as constant data
into the binary of the application, thus requiring no initialization time and no heap space at
runtime. This also happens to be the fastest form for these collections. When possible, this happens
automatically; you don't need to do anything special to enable this behavior.
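
As a rough picture of what "embedded as constant data" means, the sketch below stores a small table in a `static` array and looks keys up with a binary search. The names `AGES` and `lookup` are hypothetical, and a sorted array is only one possible layout; the representation the macros actually generate is not described here.

```rust
/// Hypothetical static table, sorted by key. A `static` like this is placed in
/// the binary's data section, so it needs no initialization code and no heap.
static AGES: [(&str, i32); 4] = [("Alice", 1), ("Bob", 2), ("Sandy", 3), ("Tom", 4)];

/// Looks up a key by binary search over the sorted static table.
pub fn lookup(key: &str) -> Option<i32> {
    AGES.binary_search_by(|(k, _)| (*k).cmp(key))
        .ok()
        .map(|idx| AGES[idx].1)
}
```

Because the table is fully known at compile time, the lookup function touches only read-only memory.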

## Analysis and Optimizations

Unlike normal collections, the frozen collections require you to provide all the data for
the collection when you create the collection. The data you supply is analyzed to determine
which specific underlying implementation strategy to use and how to lay out data internally.
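
The following sketch gives a feel for this kind of up-front analysis on integer keys. The `Strategy` enum, the `choose_strategy` function, and both thresholds (3 keys, 4x density) are assumptions made up for illustration; the crate's real analysis and cutoffs may differ.

```rust
/// Hypothetical strategy labels, loosely named after the strategies in this README.
#[derive(Debug, PartialEq, Eq)]
pub enum Strategy {
    LinearScan,
    DenseScalarLookup,
    SparseScalarLookup,
    ScalarAsHash,
}

/// Picks a lookup strategy for a set of integer keys.
/// The thresholds used here are invented for this sketch.
pub fn choose_strategy(keys: &[u32]) -> Strategy {
    // Work on the distinct keys only.
    let mut unique = keys.to_vec();
    unique.sort_unstable();
    unique.dedup();

    // Very small collections: scanning beats any indexing scheme.
    if unique.len() <= 3 {
        return Strategy::LinearScan;
    }

    let min = unique[0];
    let max = unique[unique.len() - 1];
    let span = u64::from(max - min) + 1; // size of the covered key range
    let count = unique.len() as u64;

    if span == count {
        // Every value in the range is present: a plain array works.
        Strategy::DenseScalarLookup
    } else if span <= count * 4 {
        // The range has gaps but is still compact enough for an indexed table.
        Strategy::SparseScalarLookup
    } else {
        // Keys are too spread out: use the key itself as the hash code.
        Strategy::ScalarAsHash
    }
}
```

The point of the sketch is that all of this work happens once, when the collection is created, so that every later lookup can take the cheapest path the data allows.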

The available implementation strategies are:

- **Scalar as Hash**. When the keys are of an integer or enum type, this uses the keys themselves
as hash codes, avoiding the overhead of hashing.

- **Dense Scalar Lookup**. When the keys represent a contiguous range of integer or enum values,
lookups use a simple array access instead of hashing.

- **Sparse Scalar Lookup**. When the keys represent a sparse range of integer or enum values,
lookups use a sparse array access instead of hashing.

- **Length as Hash**. When the keys are of a slice type, the lengths of the slices
are used as hash codes, avoiding the overhead of hashing.

- **Left Hand Substring Hashing**. When the keys are of a slice type, this uses sub-slices of
the keys for hashing, reducing the overhead of hashing.

- **Right Hand Substring Hashing**. Similar to the Left Hand Substring Hashing from above, but
using right-aligned sub-slices instead.

- **Linear Scan**. For very small collections, this avoids hashing completely by scanning through the
entries in linear order.

- **Ordered Scan**. For very small collections where the keys implement the [`Ord`] trait,
this avoids hashing completely by scanning through the entries in linear order.

- **Classic Hashing**. This is the fallback when none of the previous strategies apply. The
frozen implementations are generally faster than
[`HashMap`](https://doc.rust-lang.org/std/collections/hash/map/struct.HashMap.html) and
[`HashSet`](https://doc.rust-lang.org/std/collections/hash/set/struct.HashSet.html).
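
As one illustration of the list above, here is a simplified sketch of the idea behind **Length as Hash**. It assumes every key has a distinct length, so the length can index a table directly; the crate's own implementation is not shown in this README and presumably handles keys that share a length. `LengthIndexedMap` is a hypothetical name.

```rust
/// Hypothetical illustration of using a key's length in place of a hash code.
/// Simplification: construction fails if two keys share a length.
pub struct LengthIndexedMap<V> {
    // slots[len] holds the single entry whose key has that length, if any.
    slots: Vec<Option<(&'static str, V)>>,
}

impl<V> LengthIndexedMap<V> {
    /// Returns `None` when two keys have the same length, since this sketch
    /// has no collision handling.
    pub fn new(entries: Vec<(&'static str, V)>) -> Option<Self> {
        let max_len = entries.iter().map(|(k, _)| k.len()).max().unwrap_or(0);
        let mut slots: Vec<Option<(&'static str, V)>> =
            (0..=max_len).map(|_| None).collect();

        for (k, v) in entries {
            let slot = &mut slots[k.len()];
            if slot.is_some() {
                return None;
            }
            *slot = Some((k, v));
        }
        Some(Self { slots })
    }

    /// Computing the "hash" is just reading the length; one string comparison
    /// then confirms the match.
    pub fn get(&self, key: &str) -> Option<&V> {
        match self.slots.get(key.len())? {
            Some((k, v)) if *k == key => Some(v),
            _ => None,
        }
    }
}
```

Note that the keys used in the earlier examples (`"Alice"` and `"Sandy"`, `"Bob"` and `"Tom"`) share lengths, so this simplified version would reject them; that is the kind of property the analysis has to check before a strategy can be selected.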

## Cargo Features

You can specify the following features when you include the `frozen_collections` crate in your
`Cargo.toml` file:

- **`std`**. Enables small features only available when building with the standard library.

The `std` feature is enabled by default.