docs: rewrite architecture guide
erikgrinaker committed Feb 6, 2025
1 parent dae2a75 commit cddd7a5
Showing 1 changed file with 69 additions and 7 deletions.
76 changes: 69 additions & 7 deletions docs/architecture.md
@@ -1,12 +1,74 @@
# toyDB Architecture

At the highest level, toyDB consists of a cluster of nodes that execute SQL transactions against
a replicated state machine. Clients can connect to any node in the cluster and submit SQL
statements. It aims to provide
[linearizability](https://jepsen.io/consistency/models/linearizable) (i.e. strong consistency)
and [serializability](https://jepsen.io/consistency/models/serializable), but falls slightly
short as it currently only implements
[snapshot isolation](https://jepsen.io/consistency/models/snapshot-isolation).
toyDB is a simple distributed SQL database, intended to illustrate how such systems are built. The
overall structure is similar to real-world distributed databases, but the design and implementation
have been kept as simple as possible for understandability. Performance and scalability are explicit
non-goals, as these are major sources of complexity in real-world systems.

## Properties

toyDB consists of a cluster of nodes that execute [SQL](https://en.wikipedia.org/wiki/SQL)
transactions against a replicated state machine. Clients can connect to any node in the cluster
and submit SQL statements. It is:

* **Distributed:** runs across a cluster of nodes.
* **Highly available:** tolerates loss of a minority of nodes.
* **SQL compliant:** correctly supports most common SQL features.
* **Strongly consistent:** committed writes are immediately visible to all readers ([linearizability](https://en.wikipedia.org/wiki/Linearizability)).
* **Transactional:** provides ACID transactions:
  * **Atomic:** groups of writes are applied as a single, atomic unit.
  * **Consistent:** database constraints and referential integrity are always enforced.
  * **Isolated:** concurrent transactions don't affect each other ([snapshot isolation](https://en.wikipedia.org/wiki/Snapshot_isolation)).
  * **Durable:** committed writes are never lost.
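
The "highly available" property above comes down to simple majority arithmetic: the cluster keeps working as long as a majority of nodes is available. The following is a minimal sketch of that arithmetic; the function names `quorum` and `tolerated_failures` are made up for illustration and are not taken from the toyDB source.

```rust
/// Number of nodes that make up a strict majority in a cluster of `nodes`.
/// (Hypothetical helper, for illustration only.)
fn quorum(nodes: usize) -> usize {
    nodes / 2 + 1
}

/// Number of nodes that can be lost while a majority remains available.
/// Uses saturating subtraction so that an empty cluster yields 0.
fn tolerated_failures(nodes: usize) -> usize {
    nodes.saturating_sub(quorum(nodes))
}
```

Under this sketch, a 3-node cluster tolerates the loss of 1 node and a 5-node cluster tolerates 2, while going from 3 to 4 nodes does not increase the number of tolerated failures.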

For simplicity, toyDB _is not_:

* **Scalable:** every node stores the full dataset, and all reads/writes happen on a single node.
* **Reliable:** only handles crash failures, not e.g. partial network partitions or node stalls.
* **Performant:** data processing is slow, and not optimized at all.
* **Efficient:** does not compress or garbage collect data, and will pull entire tables into memory.
* **Full-featured:** only basic SQL functionality is implemented.
* **Backwards compatible:** existing databases will break when data formats and protocols change.
* **Flexible:** nodes can't be added or removed while running, and take a long time to join.
* **Secure:** there is no authentication, authorization, or encryption.

## Overview

Internally, toyDB has a few distinct components:

* **Storage engine:** stores data on disk and manages transactions.
* **Raft consensus engine:** replicates data and coordinates cluster nodes.
* **SQL engine:** organizes SQL data, manages SQL sessions, and executes SQL statements.
* **Server:** manages network connections, both with SQL clients and Raft nodes.
* **Client:** provides a SQL user interface and communicates with the server.

This diagram illustrates the internal structure of a single toyDB node:

![toyDB architecture](./images/architecture.svg)

We will go through each of these components from the bottom up.

## Storage Engine

toyDB uses an embedded [key/value store](https://en.wikipedia.org/wiki/Key–value_database) for data
storage, located in the [`storage`](https://github.com/erikgrinaker/toydb/tree/f979d3d869fc880f417130d8f68395369b317b66/src/storage)
module. This stores arbitrary keys and values as binary byte strings, and doesn't care what they
contain. We'll see later how the SQL data model, with tables and rows, is mapped onto this key/value
structure.

The storage engine supports simple set/get/delete operations on individual keys. It does not itself
support transactions -- this is built on top, and we'll get back to it shortly.

Keys are stored in sorted order. This allows range scans, where we can iterate over all key/value
pairs between two specific keys, or with a specific key prefix. This will be necessary e.g. to scan
all rows in a specific SQL table.
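
To see why sorted keys make range and prefix scans straightforward, here is a small sketch that uses the standard library's `BTreeMap` as a stand-in for an ordered key/value store. The `Store` alias and the `scan_range` and `scan_prefix` helpers are assumptions made for this example and are not toyDB's actual API.

```rust
use std::collections::BTreeMap;

/// Hypothetical ordered store: binary keys mapped to binary values.
type Store = BTreeMap<Vec<u8>, Vec<u8>>;

/// Returns the keys in the half-open range [start, end), in sorted order.
fn scan_range(store: &Store, start: &[u8], end: &[u8]) -> Vec<Vec<u8>> {
    // BTreeMap::range panics if the start is greater than the end,
    // so treat such a range as empty instead.
    if start > end {
        return Vec::new();
    }
    store
        .range(start.to_vec()..end.to_vec())
        .map(|(key, _)| key.clone())
        .collect()
}

/// Returns the keys that begin with `prefix`, in sorted order.
fn scan_prefix(store: &Store, prefix: &[u8]) -> Vec<Vec<u8>> {
    store
        // Seek to the first key that is >= the prefix...
        .range(prefix.to_vec()..)
        // ...and stop at the first key that no longer shares it. Because keys
        // are sorted, all keys with the prefix are adjacent.
        .take_while(|(key, _)| key.starts_with(prefix))
        .map(|(key, _)| key.clone())
        .collect()
}
```

If, for instance, all rows of a table were stored under keys sharing a common prefix (an assumed layout, used here only for illustration), `scan_prefix` would visit exactly those rows without touching the rest of the keyspace.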

The storage engine is pluggable: there can be several different implementations, and the user can
choose which one to use in the configuration file. These must implement the `storage::Engine` trait:

https://github.com/erikgrinaker/toydb/blob/f979d3d869fc880f417130d8f68395369b317b66/src/storage/engine.rs#L6-L56
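
As a rough illustration of what a pluggable engine could look like, the sketch below defines a deliberately simplified trait with set/get/delete and one in-memory implementation selected by name. The `KvEngine` trait, the `MemoryEngine` type, the `open_engine` function, and the `"memory"` name are all hypothetical; the real `storage::Engine` trait linked above is the authoritative definition and likely differs in its methods and error handling.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified engine interface (not toyDB's `storage::Engine`).
trait KvEngine {
    /// Stores a value for the key, replacing any existing value.
    fn set(&mut self, key: &[u8], value: Vec<u8>);
    /// Returns the value for the key, if it exists.
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    /// Removes the key, doing nothing if it does not exist.
    fn delete(&mut self, key: &[u8]);
}

/// Hypothetical in-memory engine backed by a sorted map.
#[derive(Default)]
struct MemoryEngine {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KvEngine for MemoryEngine {
    fn set(&mut self, key: &[u8], value: Vec<u8>) {
        self.data.insert(key.to_vec(), value);
    }

    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.data.get(key).cloned()
    }

    fn delete(&mut self, key: &[u8]) {
        self.data.remove(key);
    }
}

/// Picks an engine by name, the way a configuration setting might.
/// Returns `None` for names this sketch does not know about.
fn open_engine(name: &str) -> Option<Box<dyn KvEngine>> {
    match name {
        "memory" => Some(Box::new(MemoryEngine::default())),
        _ => None,
    }
}
```

Code written against the trait does not need to know which implementation it received, which is what would allow an engine to be swapped via configuration.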

# XXX Old

The [Raft algorithm](https://raft.github.io) is used for cluster consensus, which tolerates the
failure of any node as long as a majority of nodes are still available. One node is elected