An AI-powered in-memory relational database system that supports SQL-like operations, enhanced with AI-based natural language processing.
Version: 1.0 | Author: j.sutradhar@symbola.io
- Introduction
- Getting Started
- Architecture
- Command Line Interface (CLI)
- SQL Reference
- Natural Language Processing (NLP)
- Persistence
- Indexing
- Configuration (
config.txt
) - API Integration (Google Gemini)
- Limitations and Considerations
- Troubleshooting
- Contributing
HexaDB is a lightweight, in-memory relational database implemented in C++. It provides basic SQL-like functionality (CRUD operations) and integrates with Google's Gemini API, allowing users to interact via natural language queries translated into SQL.
- In-Memory: Fast data access; data lost on exit unless saved.
- SQL Support: Core commands (
CREATE TABLE
,INSERT
,SELECT
,UPDATE
,DELETE
). - Natural Language Processing (NLP): Execute queries using plain English via the Gemini API.
- Persistence: Save/load database state to/from a file.
- Basic Data Types: Supports
INT
,TEXT
,REAL
. - Simple Indexing: Basic single-column indexing support.
- Learning database internals and C++.
- Demonstrating NLP-database integration.
- Prototyping simple applications.
- Temporary personal data storage.
- C++17 Compiler: GCC (g++) or Clang recommended.
- libcurl: Development library (e.g.,
libcurl4-openssl-dev
on Debian/Ubuntu,libcurl-devel
on Fedora). - nlohmann/json: Single header file (
json.hpp
) from https://github.com/nlohmann/json. - Google Gemini API Key: For NLP functionality.
- Create
config.txt
in the same directory as the executable. - Add the line:
GEMINI_API_KEY=your_actual_gemini_api_key
- Important: Secure this file, as it contains your API key.
Compile the source files (.cpp
, .h
), ensuring json.hpp
is accessible:
g++ -std=c++17 hexadb.cpp nlp_processor.cpp -o hexadb -lcurl
-std=c++17
: Enables required C++ features.-lcurl
: Links the cURL library.
./hexadb
You'll see the welcome message and the HexaDB>
prompt.
HexaDB comprises several interacting components:
- Core Database Engine: Manages tables, rows (
std::variant
basedValue
), columns, indexes, SQL execution, and persistence. - NLP Processor: Interfaces with Gemini API (via
libcurl
) using schema/history context, parses JSON responses (nlohmann/json
), and triggers SQL execution. - Configuration Reader: Reads the
GEMINI_API_KEY
fromconfig.txt
. - Command Line Interface (CLI): Provides the interactive terminal, reads input, routes commands to the appropriate processor (DB Engine or NLP), and displays output.
The NLP Query Data Flow is visualized below:
Run ./hexadb
. The prompt HexaDB>
indicates readiness.
Enter commands at the prompt. SQL keywords are case-insensitive; lookups often ignore case, but string literals are case-sensitive.
CREATE TABLE name (col1 TYPE, ...)
: Creates a table.- Ex:
CREATE TABLE Users (UserID INT, Name TEXT)
- Ex:
CREATE INDEX name ON table (column)
: Creates a single-column index.- Ex:
CREATE INDEX idx_uname ON Users (Name)
- Ex:
INSERT INTO table (cols) VALUES (vals)
: Inserts a row.- Ex:
INSERT INTO Users (UserID, Name) VALUES (101, 'Alice')
- Ex:
SELECT cols | * FROM table [WHERE condition]
: Retrieves rows.- Ex:
SELECT Name FROM Users WHERE UserID > 100
- Ex:
UPDATE table SET col=val, ... [WHERE condition]
: Modifies rows.- Ex:
UPDATE Users SET Name = 'Bob' WHERE UserID = 101
- Ex:
DELETE FROM table [WHERE condition]
: Removes rows (all if noWHERE
).- Ex:
DELETE FROM Users WHERE UserID = 101
- Ex:
PRINT TABLE name
: Displays table contents formatted.- Ex:
PRINT TABLE Users
- Ex:
SAVE DB [filename]
: Saves database state (default:hexadb.data
).- Ex:
SAVE DB backup.hdb
- Ex:
LOAD DB [filename]
: Loads state, replacing current data (default:hexadb.data
).- Ex:
LOAD DB backup.hdb
- Ex:
NLP <natural language query>
: Translates query to SQL via Gemini API and executes.- Ex:
NLP show all users
- Ex:
NLP create table products (id int, price real)
- Ex:
help
: Displays command summary.exit
: Quits HexaDB (does not auto-save).
Supports a limited SQL subset. Parser is basic; complex queries (JOINs, subqueries, AND
/OR
, functions) are not supported.
INT
: Whole numbers.TEXT
: Character strings.REAL
: Floating-point numbers.
(Syntax blocks remain. Explanations slightly shortened.)
CREATE TABLE table_name (col1 TYPE, col2 TYPE, ...);
-- Parentheses required; at least one column needed.
INSERT INTO table_name (col1, col2, ...) VALUES (val1, val2, ...);
-- Column/value counts must match; types must be compatible.
-- Use single quotes for TEXT literals ('example').
SELECT col1, col2 | * FROM table_name [WHERE condition];
-- Select specific columns or all (*). WHERE is optional.
UPDATE table_name SET col1 = val1, ... [WHERE condition];
-- Modifies rows matching WHERE (or all rows if omitted).
DELETE FROM table_name [WHERE condition];
-- Removes rows matching WHERE (or all rows if omitted).
CREATE INDEX index_name ON table_name (column_name);
-- Creates index on a single specified column.
Filters rows in SELECT
, UPDATE
, DELETE
. Simple format: WHERE column OPERATOR value
.
- Operators:
=
,==
,!=
,<
,>
- Limitations: Only one condition; no
AND
/OR
.<
/>
primarily forINT
/REAL
.
- INT:
123
,-45
- REAL:
99.95
,100.0
- TEXT:
'John Doe'
,'Hello'
(use single quotes)
Table/column/index names. Should ideally be alphanumeric + underscore. Lookup is often case-insensitive.
Execute database operations using plain English queries, translated to SQL via the Google Gemini API.
Prefix your query with NLP
:
### 5.5 Literals
* **INT:** `123`, `-45`
* **REAL:** `99.95`, `100.0`
* **TEXT:** `'John Doe'`, `'Hello'` (use single quotes)
### 5.6 Identifiers
Table/column/index names. Should ideally be alphanumeric + underscore. Lookup is often case-insensitive.
---
## 6. Natural Language Processing (NLP)
### 6.1 Overview
Execute database operations using plain English queries, translated to SQL via the Google Gemini API.

### 6.2 Usage
Prefix your query with `NLP`:
HexaDB> NLP
### 6.3 How it Works
HexaDB sends your query, current database schema, and recent command history to the Gemini API. Gemini returns a generated SQL query and reasoning. HexaDB displays this information, then executes the SQL.
### 6.4 Examples
* `NLP create table Customers (id INT, name TEXT)`
* `NLP insert customer id 1, name 'Bob'`
* `NLP show names of customers`
* `NLP update customer name to 'Alice' where id = 1`
* `NLP delete customer with id 5`
### 6.5 Tips for Effective Queries
* Be specific (table/column names, values).
* Keep queries simple.
* Use common keywords (show, list, create, add, update, delete, where).
* Use `PRINT TABLE` to check schema if unsure.
### 6.6 Output Format
Typically shows a box with your NL query, the generated SQL, and Gemini's reasoning, followed by the result of the SQL execution.
### 6.7 Reliability
Accuracy depends on the Gemini LLM understanding the query and schema.
* Incorrect or incompatible SQL may be generated.
* There is no validation before execution; review the generated SQL.
---
## 7. Persistence
### 7.1 Overview
Save/load the in-memory database state to prevent data loss.
### 7.2 `SAVE DB` Command
SAVE DB [filename]
Saves current state (default: `hexadb.data`). Overwrites existing file.
### 7.3 `LOAD DB` Command
LOAD DB [filename]
Loads state (default: `hexadb.data`), **replacing** current database.
### 7.4 File Format (`hexadb.data`)
Custom human-readable text format.
DATABASE_NAME ... TABLE_COUNT ... TABLE_NAME ... COLUMN_COUNT ... COLUMN name type_enum
HexaDB> NLP
### 6.3 How it Works
HexaDB sends your query, current database schema, and recent command history to the Gemini API. Gemini returns a generated SQL query and reasoning. HexaDB displays this information, then executes the SQL.
### 6.4 Examples
* `NLP create table Customers (id INT, name TEXT)`
* `NLP insert customer id 1, name 'Bob'`
* `NLP show names of customers`
* `NLP update customer name to 'Alice' where id = 1`
* `NLP delete customer with id 5`
### 6.5 Tips for Effective Queries
* Be specific (table/column names, values).
* Keep queries simple.
* Use common keywords (show, list, create, add, update, delete, where).
* Use `PRINT TABLE` to check schema if unsure.
### 6.6 Output Format
Typically shows a box with your NL query, the generated SQL, and Gemini's reasoning, followed by the result of the SQL execution.
### 6.7 Reliability
Accuracy depends on the Gemini LLM understanding the query and schema.
* Incorrect or incompatible SQL may be generated.
* There is no validation before execution; review the generated SQL.
---
## 7. Persistence
### 7.1 Overview
Save/load the in-memory database state to prevent data loss.
### 7.2 `SAVE DB` Command
SAVE DB [filename]
Saves current state (default: `hexadb.data`). Overwrites existing file.
### 7.3 `LOAD DB` Command
LOAD DB [filename]
Loads state (default: `hexadb.data`), **replacing** current database.
### 7.4 File Format (`hexadb.data`)
Custom human-readable text format.
DATABASE_NAME ... TABLE_COUNT ... TABLE_NAME ... COLUMN_COUNT ... COLUMN name type_enum ... ROW_COUNT ... ROW type_prefix value ... ...
* Type Enums: `INT=0`, `TEXT=1`, `REAL=2`
* Row Prefixes: `I`(INT), `T`(TEXT), `R`(REAL). TEXT values are quoted (`"`).
### 7.5 Considerations
* Format is simple but brittle; manual editing is risky.
* Potentially slow for very large databases.
* No transactional guarantees.
---
## 8. Indexing
### 8.1 Overview
Basic single-column indexes to potentially speed up exact match lookups.
### 8.2 `CREATE INDEX` Command
```sql
CREATE INDEX index_name ON table_name (column_name);
Creates an index mapping column values to row indices.
- Limited Optimization: May not significantly speed up
WHERE
clauses, especially range queries (<
,>
). Primarily for exact matches (=
). - Update/Delete Overhead: Maintaining indexes can slow down modifications. Rebuilds may be inefficient.
- Single Column Only.
- Not Persistent: Index structures are not saved in the
.data
file and needCREATE INDEX
to be run again after loading.
Expected in the runtime directory. Simple KEY=value
format. #
denotes comments.
The only key used; requires your valid Google Gemini API key.
GEMINI_API_KEY=AIza...................
- Plain text API keys are a risk. Restrict file permissions.
- Do not commit
config.txt
to Git. Consider environment variables for better security.
Enables the NLP feature by translating English queries to SQL.
Uses libcurl
to send HTTPS POST requests (with schema/history/query in JSON) to the Gemini API endpoint, requiring the API key. Receives JSON response with SQL/reasoning.
- Your natural language query.
- Database schema (table/column names, types).
- Recent command history (NL/SQL pairs).
- No actual row data is sent. Be mindful of sensitive schema information.
- Internet connection.
- Valid Gemini API key.
- Availability of the Google Gemini API service.
- SQL Parser: Basic, limited syntax, sensitive to format.
- Performance: In-memory but scans common; index use limited; not for large data/high load.
- Concurrency: Single-threaded only.
- NLP Reliability: Depends on external LLM; potential for errors; no validation.
- Persistence: Simple text format, potentially brittle/slow.
- Error Handling: Basic runtime errors.
- Security: Plain text API key; schema sent externally.
- Data Integrity: No transactions or constraints.
- Feature Set: Lacks advanced database features.
- "Could not open config file...": Check
config.txt
exists, is readable. - "GEMINI_API_KEY not found/empty/default": Verify
config.txt
content and key value. - NLP fails (CURL/JSON error): Check internet, API key validity/status, Gemini service status.
- "Table/Column not found": Check spelling/case; use
PRINT TABLE
. - "Invalid value/Error converting": Ensure data matches column type; use single quotes for
TEXT
. - Load errors: Check
.data
file existence/location; avoid manual edits. - SQL Syntax Errors: Check spacing, keywords, parentheses, commas; refer to SQL Reference; remember limitations.
We welcome contributions to HexaDB! As an educational project and prototype, there's plenty of room for improvement and new features. Whether it's fixing bugs, enhancing the SQL parser, improving the NLP integration, adding more robust error handling, writing tests, or improving this documentation, your help is appreciated. Please feel free to fork the repository, create a feature branch, and submit a pull request. If you find an issue or have a feature suggestion, please open an issue in the project's issue tracker first to discuss it. Thank you for considering contributing!