Various improvements. (#31)
- Added the missing ?Sized bound to the definition of the Q generic in a
few collection types. Without this bound, compilation failed for some
combinations of collection and key type.

- Simplified code generation to use fewer specialized types for small
collections, since the extra types didn't actually improve performance.
I might bring some of them back if I can figure out how to improve branch
prediction for those collections (which I think is what is killing perf).

- Minor cleanup of some of the doc examples.
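
To illustrate the ?Sized item above, here is a minimal sketch of why a lookup's Q generic needs that bound. `TinySet`, `new`, `contains`, and `has_alpha` are hypothetical names made up for this example and are not the crate's actual types; the sketch only assumes the usual `Borrow`-based lookup pattern from the standard library.

```rust
use std::borrow::Borrow;

/// Hypothetical stand-in for a collection type; not the crate's real code.
pub struct TinySet<T> {
    items: Vec<T>,
}

impl<T> TinySet<T> {
    pub fn new(items: Vec<T>) -> Self {
        Self { items }
    }

    /// Looks up `value` through its borrowed form `Q`.
    /// Dropping `?Sized` would add an implicit `Q: Sized` bound,
    /// so `Q = str` (as in `set.contains("alpha")`) would be rejected.
    pub fn contains<Q>(&self, value: &Q) -> bool
    where
        T: Borrow<Q>,
        Q: ?Sized + Eq,
    {
        self.items
            .iter()
            .any(|item| <T as Borrow<Q>>::borrow(item) == value)
    }
}

/// Example call site: `Q` is inferred as the unsized type `str`.
pub fn has_alpha(set: &TinySet<String>) -> bool {
    set.contains("alpha")
}
```

With sized keys such as integers the missing bound goes unnoticed, which fits the note that the errors depended on the collection and key type used.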
geeknoid authored Jan 3, 2025
1 parent 657cd26 commit 36e5be8
Showing 13 changed files with 227 additions and 398 deletions.
64 changes: 32 additions & 32 deletions BENCHMARKS.md
@@ -36,78 +36,78 @@ real world hit rate you experience.

Scalar sets where the values are in a contiguous range.

-| | `HashSet(classic)` | `HashSet(foldhash)` | `FzScalarSet` | `fz_scalar_set` |
-|:-----------|:----------------------------|:--------------------------------|:---------------------------------|:---------------------------------- |
-| **`3`** | `26.83 ns` (✅ **1.00x**) | `7.75 ns` (🚀 **3.46x faster**) | `2.69 ns` (🚀 **9.96x faster**) | `3.98 ns` (🚀 **6.73x faster**) |
-| **`16`** | `207.07 ns` (✅ **1.00x**) | `65.75 ns` (🚀 **3.15x faster**) | `21.39 ns` (🚀 **9.68x faster**) | `21.06 ns` (🚀 **9.83x faster**) |
-| **`256`** | `3.27 us` (✅ **1.00x**) | `1.05 us` (🚀 **3.12x faster**) | `327.73 ns` (🚀 **9.96x faster**) | `310.68 ns` (🚀 **10.51x faster**) |
-| **`1000`** | `12.84 us` (✅ **1.00x**) | `3.82 us` (🚀 **3.36x faster**) | `1.17 us` (🚀 **11.01x faster**) | `1.18 us` (🚀 **10.91x faster**) |
+| | `HashSet(classic)` | `HashSet(foldhash)` | `FzScalarSet` | `fz_scalar_set` |
+|:-----------|:----------------------------|:---------------------------------|:----------------------------------|:---------------------------------- |
+| **`3`** | `27.14 ns` (✅ **1.00x**) | `7.51 ns` (🚀 **3.62x faster**) | `2.57 ns` (🚀 **10.57x faster**) | `2.62 ns` (🚀 **10.35x faster**) |
+| **`16`** | `136.92 ns` (✅ **1.00x**) | `41.63 ns` (🚀 **3.29x faster**) | `14.06 ns` (🚀 **9.73x faster**) | `13.43 ns` (🚀 **10.19x faster**) |
+| **`256`** | `2.32 us` (✅ **1.00x**) | `651.33 ns` (🚀 **3.57x faster**) | `227.27 ns` (🚀 **10.23x faster**) | `226.92 ns` (🚀 **10.24x faster**) |
+| **`1000`** | `9.20 us` (✅ **1.00x**) | `2.63 us` (🚀 **3.50x faster**) | `832.45 ns` (🚀 **11.06x faster**) | `844.27 ns` (🚀 **10.90x faster**) |

### sparse_scalar

Scalar sets where the values are in a non-contiguous range.

-| | `HashSet(classic)` | `HashSet(foldhash)` | `FzScalarSet` | `fz_scalar_set` |
-|:-----------|:----------------------------|:--------------------------------|:----------------------------------|:--------------------------------- |
-| **`3`** | `37.11 ns` (✅ **1.00x**) | `10.79 ns` (🚀 **3.44x faster**) | `3.73 ns` (🚀 **9.95x faster**) | `3.74 ns` (🚀 **9.92x faster**) |
-| **`16`** | `177.22 ns` (✅ **1.00x**) | `52.82 ns` (🚀 **3.36x faster**) | `18.95 ns` (🚀 **9.35x faster**) | `18.29 ns` (🚀 **9.69x faster**) |
-| **`256`** | `3.95 us` (✅ **1.00x**) | `1.08 us` (🚀 **3.66x faster**) | `379.45 ns` (🚀 **10.42x faster**) | `481.16 ns` (🚀 **8.22x faster**) |
-| **`1000`** | `16.41 us` (✅ **1.00x**) | `5.02 us` (🚀 **3.27x faster**) | `1.47 us` (🚀 **11.18x faster**) | `1.47 us` (🚀 **11.15x faster**) |
+| | `HashSet(classic)` | `HashSet(foldhash)` | `FzScalarSet` | `fz_scalar_set` |
+|:-----------|:----------------------------|:---------------------------------|:----------------------------------|:---------------------------------- |
+| **`3`** | `26.11 ns` (✅ **1.00x**) | `7.06 ns` (🚀 **3.70x faster**) | `2.65 ns` (🚀 **9.84x faster**) | `3.02 ns` (🚀 **8.65x faster**) |
+| **`16`** | `140.36 ns` (✅ **1.00x**) | `40.63 ns` (🚀 **3.45x faster**) | `14.40 ns` (🚀 **9.74x faster**) | `20.29 ns` (🚀 **6.92x faster**) |
+| **`256`** | `2.28 us` (✅ **1.00x**) | `629.81 ns` (🚀 **3.62x faster**) | `224.71 ns` (🚀 **10.15x faster**) | `222.06 ns` (🚀 **10.27x faster**) |
+| **`1000`** | `9.29 us` (✅ **1.00x**) | `2.55 us` (🚀 **3.64x faster**) | `831.15 ns` (🚀 **11.18x faster**) | `837.56 ns` (🚀 **11.10x faster**) |

### random_scalar

Scalar sets where the values are randomly distributed.

| | `HashSet(classic)` | `HashSet(foldhash)` | `FzScalarSet` | `fz_scalar_set` |
|:-----------|:----------------------------|:---------------------------------|:---------------------------------|:--------------------------------- |
-| **`3`** | `45.62 ns` (✅ **1.00x**) | `13.70 ns` (🚀 **3.33x faster**) | `8.20 ns` (🚀 **5.56x faster**) | `9.82 ns` (🚀 **4.64x faster**) |
-| **`16`** | `181.57 ns` (✅ **1.00x**) | `55.33 ns` (🚀 **3.28x faster**) | `38.66 ns` (🚀 **4.70x faster**) | `39.36 ns` (🚀 **4.61x faster**) |
-| **`256`** | `2.83 us` (✅ **1.00x**) | `936.44 ns` (🚀 **3.02x faster**) | `688.31 ns` (🚀 **4.10x faster**) | `676.42 ns` (🚀 **4.18x faster**) |
-| **`1000`** | `11.42 us` (✅ **1.00x**) | `3.52 us` (🚀 **3.25x faster**) | `2.73 us` (🚀 **4.18x faster**) | `2.78 us` (🚀 **4.10x faster**) |
+| **`3`** | `26.42 ns` (✅ **1.00x**) | `7.00 ns` (🚀 **3.77x faster**) | `4.61 ns` (🚀 **5.73x faster**) | `4.60 ns` (🚀 **5.75x faster**) |
+| **`16`** | `145.67 ns` (✅ **1.00x**) | `40.70 ns` (🚀 **3.58x faster**) | `25.75 ns` (🚀 **5.66x faster**) | `24.40 ns` (🚀 **5.97x faster**) |
+| **`256`** | `2.27 us` (✅ **1.00x**) | `687.58 ns` (🚀 **3.31x faster**) | `561.43 ns` (🚀 **4.05x faster**) | `730.54 ns` (🚀 **3.11x faster**) |
+| **`1000`** | `11.14 us` (✅ **1.00x**) | `3.51 us` (🚀 **3.17x faster**) | `2.73 us` (🚀 **4.08x faster**) | `2.68 us` (🚀 **4.15x faster**) |

### random_string

String sets where the values are random.

| | `HashSet(classic)` | `HashSet(foldhash)` | `FzStringSet` | `fz_string_set` |
|:-----------|:----------------------------|:---------------------------------|:---------------------------------|:--------------------------------- |
-| **`3`** | `59.10 ns` (✅ **1.00x**) | `26.54 ns` (🚀 **2.23x faster**) | `25.13 ns` (🚀 **2.35x faster**) | `20.80 ns` (🚀 **2.84x faster**) |
-| **`16`** | `331.63 ns` (✅ **1.00x**) | `124.91 ns` (🚀 **2.65x faster**) | `165.83 ns` (🚀 **2.00x faster**) | `116.13 ns` (🚀 **2.86x faster**) |
-| **`256`** | `5.45 us` (✅ **1.00x**) | `2.19 us` (🚀 **2.48x faster**) | `3.08 us` (✅ **1.77x faster**) | `2.38 us` (🚀 **2.29x faster**) |
-| **`1000`** | `23.47 us` (✅ **1.00x**) | `9.63 us` (🚀 **2.44x faster**) | `13.07 us` (✅ **1.80x faster**) | `9.15 us` (🚀 **2.57x faster**) |
+| **`3`** | `61.04 ns` (✅ **1.00x**) | `25.87 ns` (🚀 **2.36x faster**) | `36.21 ns` ( **1.69x faster**) | `22.89 ns` (🚀 **2.67x faster**) |
+| **`16`** | `312.41 ns` (✅ **1.00x**) | `120.63 ns` (🚀 **2.59x faster**) | `181.25 ns` ( **1.72x faster**) | `118.82 ns` (🚀 **2.63x faster**) |
+| **`256`** | `5.04 us` (✅ **1.00x**) | `2.02 us` (🚀 **2.50x faster**) | `3.17 us` (✅ **1.59x faster**) | `2.05 us` (🚀 **2.45x faster**) |
+| **`1000`** | `23.55 us` (✅ **1.00x**) | `8.83 us` (🚀 **2.67x faster**) | `14.15 us` (✅ **1.66x faster**) | `9.51 us` (🚀 **2.48x faster**) |

### prefixed_string

String sets where the values are random, but share a common prefix.

| | `HashSet(classic)` | `HashSet(foldhash)` | `FzStringSet` | `fz_string_set` |
|:-----------|:----------------------------|:---------------------------------|:---------------------------------|:--------------------------------- |
-| **`3`** | `68.35 ns` (✅ **1.00x**) | `30.15 ns` (🚀 **2.27x faster**) | `31.39 ns` (🚀 **2.18x faster**) | `25.52 ns` (🚀 **2.68x faster**) |
-| **`16`** | `355.08 ns` (✅ **1.00x**) | `155.86 ns` (🚀 **2.28x faster**) | `171.22 ns` (🚀 **2.07x faster**) | `131.68 ns` (🚀 **2.70x faster**) |
-| **`256`** | `5.80 us` (✅ **1.00x**) | `2.50 us` (🚀 **2.32x faster**) | `3.24 us` (✅ **1.79x faster**) | `2.23 us` (🚀 **2.61x faster**) |
-| **`1000`** | `25.68 us` (✅ **1.00x**) | `11.92 us` (🚀 **2.15x faster**) | `16.69 us` (✅ **1.54x faster**) | `10.13 us` (🚀 **2.53x faster**) |
+| **`3`** | `64.95 ns` (✅ **1.00x**) | `28.36 ns` (🚀 **2.29x faster**) | `35.17 ns` (🚀 **1.85x faster**) | `23.74 ns` (🚀 **2.74x faster**) |
+| **`16`** | `343.47 ns` (✅ **1.00x**) | `140.68 ns` (🚀 **2.44x faster**) | `181.78 ns` (🚀 **1.89x faster**) | `141.47 ns` (🚀 **2.43x faster**) |
+| **`256`** | `5.65 us` (✅ **1.00x**) | `2.28 us` (🚀 **2.47x faster**) | `3.40 us` (✅ **1.66x faster**) | `2.42 us` (🚀 **2.34x faster**) |
+| **`1000`** | `24.01 us` (✅ **1.00x**) | `10.93 us` (🚀 **2.20x faster**) | `16.88 us` (✅ **1.42x faster**) | `9.03 us` (🚀 **2.66x faster**) |

### hashed

Sets with a complex key type that is hashable.

| | `HashSet(classic)` | `HashSet(foldhash)` | `FzHashSet` | `fz_hash_set` |
|:-----------|:----------------------------|:---------------------------------|:---------------------------------|:--------------------------------- |
-| **`3`** | `78.97 ns` (✅ **1.00x**) | `26.22 ns` (🚀 **3.01x faster**) | `43.42 ns` (🚀 **1.82x faster**) | `45.17 ns` ( **1.75x faster**) |
-| **`16`** | `424.21 ns` (✅ **1.00x**) | `153.06 ns` (🚀 **2.77x faster**) | `137.41 ns` (🚀 **3.09x faster**) | `148.56 ns` (🚀 **2.86x faster**) |
-| **`256`** | `6.65 us` (✅ **1.00x**) | `2.60 us` (🚀 **2.56x faster**) | `2.84 us` (🚀 **2.34x faster**) | `2.58 us` (🚀 **2.57x faster**) |
-| **`1000`** | `27.76 us` (✅ **1.00x**) | `10.24 us` (🚀 **2.71x faster**) | `11.03 us` (🚀 **2.52x faster**) | `10.11 us` (🚀 **2.75x faster**) |
+| **`3`** | `75.34 ns` (✅ **1.00x**) | `28.20 ns` (🚀 **2.67x faster**) | `29.94 ns` (🚀 **2.52x faster**) | `28.91 ns` (🚀 **2.61x faster**) |
+| **`16`** | `395.30 ns` (✅ **1.00x**) | `138.94 ns` (🚀 **2.85x faster**) | `147.65 ns` (🚀 **2.68x faster**) | `123.05 ns` (🚀 **3.21x faster**) |
+| **`256`** | `6.51 us` (✅ **1.00x**) | `2.43 us` (🚀 **2.68x faster**) | `2.45 us` (🚀 **2.65x faster**) | `2.37 us` (🚀 **2.74x faster**) |
+| **`1000`** | `26.28 us` (✅ **1.00x**) | `9.55 us` (🚀 **2.75x faster**) | `10.20 us` (🚀 **2.58x faster**) | `10.05 us` (🚀 **2.61x faster**) |

### ordered

Sets with a complex key type that is ordered.

| | `BTreeSet` | `FzOrderedSet` | `fz_ordered_set` |
|:-----------|:--------------------------|:---------------------------------|:--------------------------------- |
-| **`3`** | `68.85 ns` (✅ **1.00x**) | `62.66 ns` (✅ **1.10x faster**) | `45.39 ns` (✅ **1.52x faster**) |
-| **`16`** | `859.33 ns` (✅ **1.00x**) | `896.02 ns` (✅ **1.04x slower**) | `834.60 ns` (✅ **1.03x faster**) |
-| **`256`** | `29.44 us` (✅ **1.00x**) | `19.59 us` (✅ **1.50x faster**) | `18.74 us` (✅ **1.57x faster**) |
-| **`1000`** | `213.63 us` (✅ **1.00x**) | `191.99 us` (✅ **1.11x faster**) | `184.03 us` (✅ **1.16x faster**) |
+| **`3`** | `68.52 ns` (✅ **1.00x**) | `58.24 ns` (✅ **1.18x faster**) | `57.56 ns` (✅ **1.19x faster**) |
+| **`16`** | `934.93 ns` (✅ **1.00x**) | `593.69 ns` (✅ **1.57x faster**) | `588.56 ns` (✅ **1.59x faster**) |
+| **`256`** | `33.38 us` (✅ **1.00x**) | `24.05 us` (✅ **1.39x faster**) | `23.75 us` (✅ **1.41x faster**) |
+| **`1000`** | `204.16 us` (✅ **1.00x**) | `181.22 us` (✅ **1.13x faster**) | `176.04 us` (✅ **1.16x faster**) |

---
Made with [criterion-table](https://github.com/nu11ptr/criterion-table)
@@ -2,7 +2,6 @@ use crate::utils::BitVec;
 use alloc::vec::Vec;

 /// How to treat a collection of hash codes for best performance.
-//#[derive(Clone)]
 pub struct HashCodeAnalysisResult {
     /// The recommended hash table size. This is not necessarily optimal, but it's good enough.
     pub num_hash_slots: usize,
31 changes: 4 additions & 27 deletions frozen-collections-core/src/emit/collection_emitter.rs
@@ -101,9 +101,6 @@ pub struct CollectionEmitter {
 }

 const INLINE_SCAN_THRESHOLD: usize = 2;
-const SCAN_THRESHOLD: usize = 4;
-const ORDERED_SCAN_THRESHOLD: usize = 7;
-const BINARY_SEARCH_THRESHOLD: usize = 64;

 impl CollectionEmitter {
     /// Creates a new `CollectionEmitter` instance.
@@ -246,7 +243,7 @@ impl CollectionEmitter {
         self.clean_values(&mut entries);

         let gen = self.preflight(entries.len())?;
-        let output = if entries.len() < SCAN_THRESHOLD {
+        let output = if entries.len() < INLINE_SCAN_THRESHOLD {
             gen.gen_inline_scan(entries)
         } else {
             gen.gen_inline_hash_with_bridge(entries)
@@ -275,12 +272,8 @@ impl CollectionEmitter {
         self.clean_values(&mut entries);

         let gen = self.preflight(entries.len())?;
-        let output = if entries.len() < SCAN_THRESHOLD {
+        let output = if entries.len() < INLINE_SCAN_THRESHOLD {
             gen.gen_inline_scan(entries)
-        } else if entries.len() < ORDERED_SCAN_THRESHOLD {
-            gen.gen_inline_ordered_scan(entries)
-        } else if entries.len() < BINARY_SEARCH_THRESHOLD {
-            gen.gen_inline_binary_search(entries)
         } else {
             gen.gen_inline_eytzinger_search(entries)
         };
@@ -314,10 +307,8 @@ impl CollectionEmitter {
             ScalarKeyAnalysisResult::DenseRange => gen.gen_inline_dense_scalar_lookup(entries),
             ScalarKeyAnalysisResult::SparseRange => gen.gen_inline_sparse_scalar_lookup(entries),
             ScalarKeyAnalysisResult::General => {
-                if entries.len() < SCAN_THRESHOLD {
+                if entries.len() < INLINE_SCAN_THRESHOLD {
                     gen.gen_inline_scan(entries)
-                } else if entries.len() < ORDERED_SCAN_THRESHOLD {
-                    gen.gen_inline_ordered_scan(entries)
                 } else {
                     gen.gen_inline_hash_with_passthrough(entries, &PassthroughHasher::new())
                 }
@@ -344,10 +335,8 @@ impl CollectionEmitter {
         self.clean_values(&mut entries);

         let gen = self.preflight(entries.len())?;
-        let output = if entries.len() < SCAN_THRESHOLD {
+        let output = if entries.len() < INLINE_SCAN_THRESHOLD {
             gen.gen_inline_scan(entries)
-        } else if entries.len() < ORDERED_SCAN_THRESHOLD {
-            gen.gen_inline_ordered_scan(entries)
         } else {
             let iter = entries.iter().map(|x| x.key.as_bytes());

@@ -394,8 +383,6 @@ impl CollectionEmitter {
         let gen = self.preflight(entries.len())?;
         let output = if entries.len() < INLINE_SCAN_THRESHOLD {
             gen.gen_inline_scan(entries)
-        } else if entries.len() < SCAN_THRESHOLD {
-            gen.gen_scan(entries)
         } else {
             gen.gen_hash_with_bridge(entries)
         };
@@ -411,12 +398,6 @@ impl CollectionEmitter {
         let gen = self.preflight(entries.len())?;
         let output = if entries.len() < INLINE_SCAN_THRESHOLD {
             gen.gen_inline_scan(entries)
-        } else if entries.len() < SCAN_THRESHOLD {
-            gen.gen_scan(entries)
-        } else if entries.len() < ORDERED_SCAN_THRESHOLD {
-            gen.gen_ordered_scan(entries)
-        } else if entries.len() < BINARY_SEARCH_THRESHOLD {
-            gen.gen_binary_search(entries)
         } else {
             gen.gen_eytzinger_search(entries)
         };
@@ -432,8 +413,6 @@ impl CollectionEmitter {
         let gen = self.preflight(entries.len())?;
         let output = if entries.len() < INLINE_SCAN_THRESHOLD {
             gen.gen_inline_scan(entries)
-        } else if entries.len() < SCAN_THRESHOLD {
-            gen.gen_scan(entries)
         } else {
             gen.gen_fz_scalar(entries)
         };
@@ -449,8 +428,6 @@ impl CollectionEmitter {
         let gen = self.preflight(entries.len())?;
         let output = if entries.len() < INLINE_SCAN_THRESHOLD {
             gen.gen_inline_scan(entries)
-        } else if entries.len() < SCAN_THRESHOLD {
-            gen.gen_scan(entries)
         } else {
             gen.gen_fz_string(entries)
         };
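
The net effect of the hunks above on the inline ordered path can be sketched as a single two-way choice. This is only an illustration: the `OrderedStrategy` enum and `pick_ordered_strategy` function are hypothetical names that do not appear in the crate, and only the threshold value and the two surviving generator names come from the diff.

```rust
/// Threshold value as shown in the diff above.
const INLINE_SCAN_THRESHOLD: usize = 2;

/// Hypothetical enum naming the two strategies that remain for ordered keys.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OrderedStrategy {
    /// Corresponds to `gen_inline_scan` in the diff.
    InlineScan,
    /// Corresponds to `gen_inline_eytzinger_search` in the diff.
    EytzingerSearch,
}

/// Hypothetical helper mirroring the simplified `if`/`else` chain.
pub fn pick_ordered_strategy(len: usize) -> OrderedStrategy {
    if len < INLINE_SCAN_THRESHOLD {
        OrderedStrategy::InlineScan
    } else {
        OrderedStrategy::EytzingerSearch
    }
}
```

Sizes that used to select the ordered-scan or binary-search variants (for example 6 or 63 entries) now fall through to the Eytzinger search.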
6 changes: 3 additions & 3 deletions frozen-collections-core/src/emit/collection_entry.rs
@@ -99,7 +99,7 @@ mod tests {
     fn test_to_tokens() {
         let key_expr: Expr = parse_quote!(key);
         let value_expr: Expr = parse_quote!(value);
-        let entry = CollectionEntry::map_entry("key", key_expr.clone(), value_expr.clone());
+        let entry = CollectionEntry::map_entry("key", key_expr, value_expr);
         let mut tokens = TokenStream::new();
         entry.to_tokens(&mut tokens);

@@ -124,8 +124,8 @@ mod tests {
         let key = "key";
         let key_expr: Expr = parse_quote!(key);
         let value_expr: Expr = parse_quote!(value);
-        let entry = CollectionEntry::map_entry(key, key_expr.clone(), value_expr.clone());
-        let debug_str = format!("{:?}", entry);
+        let entry = CollectionEntry::map_entry(key, key_expr, value_expr);
+        let debug_str = format!("{entry:?}");

         assert_eq!(
             debug_str,