The goal of this project is to build an efficient record store with persistence in C++, with efficient retrieval using B-trees. This project aims to build a simplified version of a SQL database as an academic exercise.
Further goals include building a query manager, adding concurrency control, and supporting multithreading (the current implementation is single-threaded).
A database is represented on disk as a collection of files:
- pagedata contains the actual data; its size is PAGE_SIZE * number of pages.
- A metadata file holds information about the database, such as the page size, version, number of pages and other details.
- A bitmap file records which pages in pagedata are free.
- A lock file ensures that only one db process can use the database files at a time.
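A minimal sketch of how a free-page bitmap and page offsets could work. The class name `FreePageBitmap`, the helper `page_offset`, and the convention that a set bit means "in use" are assumptions made for illustration; the project's actual on-disk bit layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Assumption: PAGE_SIZE is 4096 bytes, as stated for all pages.
constexpr std::uint64_t kPageSize = 4096;

// Byte offset of a page inside the page data file.
constexpr std::uint64_t page_offset(std::uint64_t page_id) {
    return page_id * kPageSize;
}

// Hypothetical helper: one bit per page, 1 = in use, 0 = free (assumed convention).
class FreePageBitmap {
public:
    explicit FreePageBitmap(std::size_t num_pages)
        : num_pages_(num_pages), bits_((num_pages + 7) / 8, 0) {}

    bool is_used(std::size_t page_id) const {
        assert(page_id < num_pages_);
        return ((bits_[page_id / 8] >> (page_id % 8)) & 1u) != 0;
    }

    void mark_used(std::size_t page_id) {
        assert(page_id < num_pages_);
        bits_[page_id / 8] |= static_cast<std::uint8_t>(1u << (page_id % 8));
    }

    void mark_free(std::size_t page_id) {
        assert(page_id < num_pages_);
        bits_[page_id / 8] &= static_cast<std::uint8_t>(~(1u << (page_id % 8)));
    }

    // Returns the lowest-numbered free page, or std::nullopt if all are in use.
    std::optional<std::size_t> first_free() const {
        for (std::size_t id = 0; id < num_pages_; ++id) {
            if (!is_used(id)) return id;
        }
        return std::nullopt;
    }

private:
    std::size_t num_pages_;
    std::vector<std::uint8_t> bits_;
};
```

A linear scan is the simplest allocation strategy; a real implementation might remember the last free position to avoid rescanning.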
The storage backend is an abstract class which provides persistence for the pages. DiskStorageBackend is a concrete implementation which writes the pages to a file.
The in-memory backend keeps the pages in memory using a map of vectors, and is useful for testing.
A separate page-access interface caches the pages and handles reading and writing them.
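The split between an abstract backend and an in-memory implementation can be sketched as follows. The names `StorageBackendSketch` and `InMemoryBackendSketch`, the method signatures, and the choice to return zero-filled data for unwritten pages are all assumptions for illustration, not the project's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

constexpr std::size_t kPageSize = 4096;
using PageId = std::uint32_t;
using PageBytes = std::vector<std::uint8_t>;

// Hypothetical interface: anything that can persist fixed-size pages.
class StorageBackendSketch {
public:
    virtual ~StorageBackendSketch() = default;
    virtual PageBytes read_page(PageId id) = 0;
    virtual void write_page(PageId id, const PageBytes& data) = 0;
};

// In-memory variant backed by a map of vectors, handy in tests.
class InMemoryBackendSketch : public StorageBackendSketch {
public:
    PageBytes read_page(PageId id) override {
        auto it = pages_.find(id);
        // Assumption: a page that was never written reads back as zeros.
        return it == pages_.end() ? PageBytes(kPageSize, 0) : it->second;
    }

    void write_page(PageId id, const PageBytes& data) override {
        assert(data.size() == kPageSize);
        pages_[id] = data;
    }

private:
    std::map<PageId, PageBytes> pages_;
};
```

Code that depends only on the abstract interface can be tested against the in-memory variant without touching the file system.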
All pages are 4096 bytes (4 KB) in size. Integer values are always stored in little-endian format.
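To illustrate the little-endian rule, here is a small sketch that writes and reads a 32-bit integer byte by byte, so the result does not depend on the host's byte order. The function names `store_u32_le` and `load_u32_le` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Writes value at buf[offset..offset+3], least significant byte first.
void store_u32_le(std::vector<std::uint8_t>& buf, std::size_t offset,
                  std::uint32_t value) {
    assert(offset + 4 <= buf.size());
    for (std::size_t i = 0; i < 4; ++i) {
        buf[offset + i] = static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF);
    }
}

// Reads a 32-bit value stored least significant byte first.
std::uint32_t load_u32_le(const std::vector<std::uint8_t>& buf,
                          std::size_t offset) {
    assert(offset + 4 <= buf.size());
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        value |= static_cast<std::uint32_t>(buf[offset + i]) << (8 * i);
    }
    return value;
}
```

For example, storing `0x12345678` produces the bytes `78 56 34 12` in increasing address order.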
A page can be of three types - data page (which contains the data), internal B+ tree page, or B+ tree leaf page.
Offset | Size (in bytes) | Description |
0 | 1 | Page type |
1 | 8 | Reserved for future use |
9 | 4 | Page id of the current page |
13 | 3 | Reserved for future use |
The common page header size is 16 bytes.
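The header layout above could be read and written roughly like this. The struct `PageHeaderSketch` and both function names are hypothetical, and zeroing the reserved bytes on write is an assumption; only the offsets and sizes come from the table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kPageSize = 4096;
constexpr std::size_t kHeaderSize = 16;     // 1 + 8 + 4 + 3 bytes
constexpr std::size_t kPageTypeOffset = 0;  // 1 byte
constexpr std::size_t kPageIdOffset = 9;    // 4 bytes, little-endian

using Page = std::vector<std::uint8_t>;

struct PageHeaderSketch {
    std::uint8_t page_type;
    std::uint32_t page_id;
};

void write_header(Page& page, const PageHeaderSketch& h) {
    assert(page.size() == kPageSize);
    // Assumption: reserved bytes are written as zero.
    for (std::size_t i = 0; i < kHeaderSize; ++i) page[i] = 0;
    page[kPageTypeOffset] = h.page_type;
    for (std::size_t i = 0; i < 4; ++i) {
        page[kPageIdOffset + i] =
            static_cast<std::uint8_t>((h.page_id >> (8 * i)) & 0xFF);
    }
}

PageHeaderSketch read_header(const Page& page) {
    assert(page.size() == kPageSize);
    PageHeaderSketch h{page[kPageTypeOffset], 0};
    for (std::size_t i = 0; i < 4; ++i) {
        h.page_id |= static_cast<std::uint32_t>(page[kPageIdOffset + i]) << (8 * i);
    }
    return h;
}
```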
A table metadata page contains metadata about a table in the database: the table name, the number and types of its columns, and the column names.
Page type is set to 0x54
Offset | Size | Name | Description |
0 | 128 | table_name | Fixed length string, unused space filled with \0 , maximum length of table name is 128 characters |
128 | 64 | column_format | Fixed length string, type of each column as a byte, max number of columns is 64 |
192 | 4 | table_page_id | Page id which contains the actual records (0 if table has no data) |
196 | 59*64 | column_names | Fixed length column names, of max size 59 characters |
Where each byte in column_format is one of the following
Key Type | Data type |
'b' | uint8_t |
'B' | int8_t |
's' | uint16_t |
'S' | int16_t |
'i' | uint32_t |
'I' | int32_t |
'l' | uint64_t |
'L' | int64_t |
'f' | float |
'd' | double |
'c' | string |
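As a sketch of how the column_format bytes might be interpreted, the helpers below map each key to its storage width and count the columns in a format string. The function names are hypothetical. Two assumptions are made: float and double are 4 and 8 bytes (true on common platforms), and unused trailing bytes in column_format are `\0`. The width of a string column is not specified here, so the sketch reports no fixed width for `'c'`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Width in bytes of a fixed-size column type, or std::nullopt for
// 'c' (string, width not specified) and for unknown keys.
std::optional<std::size_t> fixed_width(char key) {
    switch (key) {
        case 'b': case 'B': return std::size_t{1};
        case 's': case 'S': return std::size_t{2};
        case 'i': case 'I': case 'f': return std::size_t{4};
        case 'l': case 'L': case 'd': return std::size_t{8};
        default: return std::nullopt;
    }
}

bool is_known_type(char key) {
    return key == 'c' || fixed_width(key).has_value();
}

// Number of columns described by a column_format field of at most 64 bytes.
// Assumption: the first '\0' byte ends the list of column types.
std::size_t count_columns(const std::string& column_format) {
    assert(column_format.size() <= 64);
    std::size_t n = 0;
    while (n < column_format.size() && column_format[n] != '\0') ++n;
    return n;
}
```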
This is a special purpose table which holds data about other tables and database implementation information. Its table metadata page is always at page 0.
For simplicity in parsing, every command is a single line, terminated by a newline.
List all tables in the database
> list tables
To create a table
> create table table_name column_1_name column_1_type [column_2_name] [column_2_type] ...
To view information about a table
> describe [table_name]
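Because every command is a single whitespace-separated line, parsing can be kept very simple. The following is a minimal sketch, not the project's parser: `CommandKind`, `ParsedCommand` and `parse_command` are hypothetical names, the caller is assumed to have stripped the trailing newline, and column types are not validated.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

enum class CommandKind { ListTables, CreateTable, Describe, Unknown };

struct ParsedCommand {
    CommandKind kind = CommandKind::Unknown;
    std::string table;
    // (column name, column type) pairs, only filled for CreateTable.
    std::vector<std::pair<std::string, std::string>> columns;
};

ParsedCommand parse_command(const std::string& line) {
    // Assumption: the terminating newline has already been removed.
    assert(line.find('\n') == std::string::npos);

    // Split the line on whitespace.
    std::istringstream in(line);
    std::vector<std::string> tok;
    for (std::string word; in >> word;) tok.push_back(word);

    ParsedCommand cmd;
    if (tok.size() == 2 && tok[0] == "list" && tok[1] == "tables") {
        cmd.kind = CommandKind::ListTables;
    } else if (tok.size() == 2 && tok[0] == "describe") {
        cmd.kind = CommandKind::Describe;
        cmd.table = tok[1];
    } else if (tok.size() >= 5 && tok.size() % 2 == 1 &&
               tok[0] == "create" && tok[1] == "table") {
        // "create table <name>" followed by one or more name/type pairs.
        cmd.kind = CommandKind::CreateTable;
        cmd.table = tok[2];
        for (std::size_t i = 3; i + 1 < tok.size(); i += 2) {
            cmd.columns.emplace_back(tok[i], tok[i + 1]);
        }
    }
    return cmd;
}
```

With this sketch, a line such as `create table users id i name c` yields a CreateTable command for table `users` with two columns, while a column name without a type is rejected as Unknown.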
- Implement page representation class
- Storage backend (Disk)
- Memory storage backend
- Free page management (free list/bitmap) for disk storage
- Metadata reader/writer implementation
- Database lockfile
- Buffer pool manager
- Extendible hash table
- Cache replacer
- Support for integers, floats
- Arrays
- BTrees and indices
- Simple query parsers
- Socket server for serving requests
Use the following commands to build and run the executable target.
cmake -S standalone -B build/standalone
cmake --build build/standalone
./build/standalone/PineDB --help
cmake -S standalone -B build -DUSE_SANITIZER='Address;Undefined' -DUSE_STATIC_ANALYZER=clang-tidy
Use the following commands from the project's root directory to run the test suite.
cmake -S test -B build/test
cmake --build build/test
CTEST_OUTPUT_ON_FAILURE=1 cmake --build build/test --target test
# or simply call the executable:
To collect code coverage information, run CMake with the -DENABLE_TEST_COVERAGE=1 option.
Use the following commands from the project's root directory to check and fix C++ and CMake source style. This requires clang-format, cmake-format and pyyaml to be installed on the current system.
cmake -S test -B build/test
# view changes
cmake --build build/test --target format
# apply changes
cmake --build build/test --target fix-format
See Format.cmake for details. These dependencies can be easily installed using pip.
pip install clang-format==14.0.6 cmake_format==0.6.11 pyyaml
The documentation is automatically built and published whenever a GitHub Release is created. To manually build documentation, call the following command.
cmake -S documentation -B build/doc
cmake --build build/doc --target GenerateDocs
# view the docs
open build/doc/doxygen/html/index.html
To build the documentation locally, you will need Doxygen, jinja2 and Pygments installed on your system.
The project also includes an all directory that allows building all targets at the same time.
This is useful during development, as it exposes all subprojects to your IDE and avoids redundant builds of the library.
cmake -S all -B build
cmake --build build
# run tests
# format code
cmake --build build --target fix-format
# run standalone
./build/standalone/PineDB --help
# build docs
cmake --build build --target GenerateDocs
The test and standalone subprojects include the tools.cmake file which is used to import additional tools on-demand through CMake configuration arguments. The following are currently supported.
Sanitizers can be enabled by configuring CMake with -DUSE_SANITIZER=<Address | Memory | MemoryWithOrigins | Undefined | Thread | Leak | 'Address;Undefined'>.
Static Analyzers can be enabled by setting -DUSE_STATIC_ANALYZER=<clang-tidy | iwyu | cppcheck>, or a combination of those in quotation marks, separated by semicolons.
By default, analyzers will automatically find configuration files such as .clang-format.
Additional arguments can be passed to the analyzers by setting the CLANG_TIDY_ARGS variable.
Ccache can be enabled by configuring with -DUSE_CCACHE=<ON | OFF>.