Skip to content

An AI-powered in-memory relational database system that supports SQL-like operations, enhanced with AI-based natural language processing

Notifications You must be signed in to change notification settings

JeninSutradhar/HexaDB

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

logo

HexaDB

An AI-powered in-memory relational database system that supports SQL-like operations, enhanced with AI-based natural language processing.

High-Level Overview Diagram

Version: 1.0 | Author: j.sutradhar@symbola.io

Table of Contents

  1. Introduction
  2. Getting Started
  3. Architecture
  4. Command Line Interface (CLI)
  5. SQL Reference
  6. Natural Language Processing (NLP)
  7. Persistence
  8. Indexing
  9. Configuration (config.txt)
  10. API Integration (Google Gemini)
  11. Limitations and Considerations
  12. Troubleshooting
  13. Contributing

1. Introduction

1.1 What is HexaDB?

HexaDB is a lightweight, in-memory relational database implemented in C++. It provides basic SQL-like functionality (CRUD operations) and integrates with Google's Gemini API, allowing users to interact via natural language queries translated into SQL.

NLP to Database Query Flow Diagram

1.2 Key Features

  • In-Memory: Fast data access; data lost on exit unless saved.
  • SQL Support: Core commands (CREATE TABLE, INSERT, SELECT, UPDATE, DELETE).
  • Natural Language Processing (NLP): Execute queries using plain English via the Gemini API.
  • Persistence: Save/load database state to/from a file.
  • Basic Data Types: Supports INT, TEXT, REAL.
  • Simple Indexing: Basic single-column indexing support.

1.3 Intended Use Cases

  • Learning database internals and C++.
  • Demonstrating NLP-database integration.
  • Prototyping simple applications.
  • Temporary personal data storage.

2. Getting Started

2.1 Prerequisites

  • C++17 Compiler: GCC (g++) or Clang recommended.
  • libcurl: Development library (e.g., libcurl4-openssl-dev on Debian/Ubuntu, libcurl-devel on Fedora).
  • nlohmann/json: Single header file (json.hpp) from https://github.com/nlohmann/json.
  • Google Gemini API Key: For NLP functionality.

2.2 Configuration

  1. Create config.txt in the same directory as the executable.
  2. Add the line: GEMINI_API_KEY=your_actual_gemini_api_key
  3. Important: Secure this file, as it contains your API key.

2.3 Compilation

Compile the source files (.cpp, .h), ensuring json.hpp is accessible:

g++ -std=c++17 hexadb.cpp nlp_processor.cpp -o hexadb -lcurl
  • -std=c++17: Enables required C++ features.
  • -lcurl: Links the cURL library.

2.4 Running HexaDB

./hexadb

You'll see the welcome message and the HexaDB> prompt.


3. Architecture

HexaDB Architecture Overview Diagram

HexaDB comprises several interacting components:

  • Core Database Engine: Manages tables, rows (std::variant based Value), columns, indexes, SQL execution, and persistence.
  • NLP Processor: Interfaces with Gemini API (via libcurl) using schema/history context, parses JSON responses (nlohmann/json), and triggers SQL execution.
  • Configuration Reader: Reads the GEMINI_API_KEY from config.txt.
  • Command Line Interface (CLI): Provides the interactive terminal, reads input, routes commands to the appropriate processor (DB Engine or NLP), and displays output.

The NLP Query Data Flow is visualized below:

System Interaction Flow Diagram


4. Command Line Interface (CLI)

4.1 Starting the CLI

Run ./hexadb. The prompt HexaDB> indicates readiness.

4.2 Command Structure

Enter commands at the prompt. SQL keywords are case-insensitive; lookups often ignore case, but string literals are case-sensitive.

4.3 Available Commands

4.3.1 SQL DDL

  • CREATE TABLE name (col1 TYPE, ...): Creates a table.
    • Ex: CREATE TABLE Users (UserID INT, Name TEXT)
  • CREATE INDEX name ON table (column): Creates a single-column index.
    • Ex: CREATE INDEX idx_uname ON Users (Name)

4.3.2 SQL DML

  • INSERT INTO table (cols) VALUES (vals): Inserts a row.
    • Ex: INSERT INTO Users (UserID, Name) VALUES (101, 'Alice')
  • SELECT cols | * FROM table [WHERE condition]: Retrieves rows.
    • Ex: SELECT Name FROM Users WHERE UserID > 100
  • UPDATE table SET col=val, ... [WHERE condition]: Modifies rows.
    • Ex: UPDATE Users SET Name = 'Bob' WHERE UserID = 101
  • DELETE FROM table [WHERE condition]: Removes rows (all if no WHERE).
    • Ex: DELETE FROM Users WHERE UserID = 101

4.3.3 Database Management

  • PRINT TABLE name: Displays table contents formatted.
    • Ex: PRINT TABLE Users
  • SAVE DB [filename]: Saves database state (default: hexadb.data).
    • Ex: SAVE DB backup.hdb
  • LOAD DB [filename]: Loads state, replacing current data (default: hexadb.data).
    • Ex: LOAD DB backup.hdb

4.3.4 Natural Language Processing (NLP)

  • NLP <natural language query>: Translates query to SQL via Gemini API and executes.
    • Ex: NLP show all users
    • Ex: NLP create table products (id int, price real)

4.3.5 Utility Commands

  • help: Displays command summary.
  • exit: Quits HexaDB (does not auto-save).

5. SQL Reference

5.1 Overview

Supports a limited SQL subset. Parser is basic; complex queries (JOINs, subqueries, AND/OR, functions) are not supported.

5.2 Data Types

  • INT: Whole numbers.
  • TEXT: Character strings.
  • REAL: Floating-point numbers.

5.3 SQL Commands

(Syntax blocks remain. Explanations slightly shortened.)

5.3.1 CREATE TABLE

CREATE TABLE table_name (col1 TYPE, col2 TYPE, ...);
-- Parentheses required; at least one column needed.

5.3.2 INSERT INTO

INSERT INTO table_name (col1, col2, ...) VALUES (val1, val2, ...);
-- Column/value counts must match; types must be compatible.
-- Use single quotes for TEXT literals ('example').

5.3.3 SELECT

SELECT col1, col2 | * FROM table_name [WHERE condition];
-- Select specific columns or all (*). WHERE is optional.

5.3.4 UPDATE

UPDATE table_name SET col1 = val1, ... [WHERE condition];
-- Modifies rows matching WHERE (or all rows if omitted).

5.3.5 DELETE FROM

DELETE FROM table_name [WHERE condition];
-- Removes rows matching WHERE (or all rows if omitted).

5.3.6 CREATE INDEX

CREATE INDEX index_name ON table_name (column_name);
-- Creates index on a single specified column.

5.4 WHERE Clause

Filters rows in SELECT, UPDATE, DELETE. Simple format: WHERE column OPERATOR value.

  • Operators: =, ==, !=, <, >
  • Limitations: Only one condition; no AND/OR. </> primarily for INT/REAL.

5.5 Literals

  • INT: 123, -45
  • REAL: 99.95, 100.0
  • TEXT: 'John Doe', 'Hello' (use single quotes)

5.6 Identifiers

Table/column/index names. Should ideally be alphanumeric + underscore. Lookup is often case-insensitive.


6. Natural Language Processing (NLP)

6.1 Overview

Execute database operations using plain English queries, translated to SQL via the Google Gemini API.

Example NLP Terminal Output

6.2 Usage

Prefix your query with NLP:

### 5.5 Literals

*   **INT:** `123`, `-45`
*   **REAL:** `99.95`, `100.0`
*   **TEXT:** `'John Doe'`, `'Hello'` (use single quotes)

### 5.6 Identifiers

Table/column/index names. Should ideally be alphanumeric + underscore. Lookup is often case-insensitive.

---

## 6. Natural Language Processing (NLP)

### 6.1 Overview

Execute database operations using plain English queries, translated to SQL via the Google Gemini API.

![Example NLP Terminal Output](https://github.com/user-attachments/assets/377f6346-94ff-4cfc-9bbe-571ac97ffc0a)

### 6.2 Usage

Prefix your query with `NLP`:

HexaDB> NLP


### 6.3 How it Works

HexaDB sends your query, current database schema, and recent command history to the Gemini API. Gemini returns a generated SQL query and reasoning. HexaDB displays this information, then executes the SQL.

### 6.4 Examples

*   `NLP create table Customers (id INT, name TEXT)`
*   `NLP insert customer id 1, name 'Bob'`
*   `NLP show names of customers`
*   `NLP update customer name to 'Alice' where id = 1`
*   `NLP delete customer with id 5`

### 6.5 Tips for Effective Queries

*   Be specific (table/column names, values).
*   Keep queries simple.
*   Use common keywords (show, list, create, add, update, delete, where).
*   Use `PRINT TABLE` to check schema if unsure.

### 6.6 Output Format

Typically shows a box with your NL query, the generated SQL, and Gemini's reasoning, followed by the result of the SQL execution.

### 6.7 Reliability

Accuracy depends on the Gemini LLM understanding the query and schema.
*   Incorrect or incompatible SQL may be generated.
*   There is no validation before execution; review the generated SQL.

---

## 7. Persistence

### 7.1 Overview

Save/load the in-memory database state to prevent data loss.

### 7.2 `SAVE DB` Command

SAVE DB [filename]

Saves current state (default: `hexadb.data`). Overwrites existing file.

### 7.3 `LOAD DB` Command

LOAD DB [filename]

Loads state (default: `hexadb.data`), **replacing** current database.

### 7.4 File Format (`hexadb.data`)

Custom human-readable text format.

DATABASE_NAME ... TABLE_COUNT ... TABLE_NAME ... COLUMN_COUNT ... COLUMN name type_enum

HexaDB> NLP


### 6.3 How it Works

HexaDB sends your query, current database schema, and recent command history to the Gemini API. Gemini returns a generated SQL query and reasoning. HexaDB displays this information, then executes the SQL.

### 6.4 Examples

*   `NLP create table Customers (id INT, name TEXT)`
*   `NLP insert customer id 1, name 'Bob'`
*   `NLP show names of customers`
*   `NLP update customer name to 'Alice' where id = 1`
*   `NLP delete customer with id 5`

### 6.5 Tips for Effective Queries

*   Be specific (table/column names, values).
*   Keep queries simple.
*   Use common keywords (show, list, create, add, update, delete, where).
*   Use `PRINT TABLE` to check schema if unsure.

### 6.6 Output Format

Typically shows a box with your NL query, the generated SQL, and Gemini's reasoning, followed by the result of the SQL execution.

### 6.7 Reliability

Accuracy depends on the Gemini LLM understanding the query and schema.
*   Incorrect or incompatible SQL may be generated.
*   There is no validation before execution; review the generated SQL.

---

## 7. Persistence

### 7.1 Overview

Save/load the in-memory database state to prevent data loss.

### 7.2 `SAVE DB` Command

SAVE DB [filename]

Saves current state (default: `hexadb.data`). Overwrites existing file.

### 7.3 `LOAD DB` Command

LOAD DB [filename]

Loads state (default: `hexadb.data`), **replacing** current database.

### 7.4 File Format (`hexadb.data`)

Custom human-readable text format.

DATABASE_NAME ... TABLE_COUNT ... TABLE_NAME ... COLUMN_COUNT ... COLUMN name type_enum ... ROW_COUNT ... ROW type_prefix value ... ...

*   Type Enums: `INT=0`, `TEXT=1`, `REAL=2`
*   Row Prefixes: `I`(INT), `T`(TEXT), `R`(REAL). TEXT values are quoted (`"`).

### 7.5 Considerations

*   Format is simple but brittle; manual editing is risky.
*   Potentially slow for very large databases.
*   No transactional guarantees.

---

## 8. Indexing

### 8.1 Overview

Basic single-column indexes to potentially speed up exact match lookups.

### 8.2 `CREATE INDEX` Command
```sql
CREATE INDEX index_name ON table_name (column_name);

Creates an index mapping column values to row indices.

8.3 Usage and Limitations

  • Limited Optimization: May not significantly speed up WHERE clauses, especially range queries (<, >). Primarily for exact matches (=).
  • Update/Delete Overhead: Maintaining indexes can slow down modifications. Rebuilds may be inefficient.
  • Single Column Only.
  • Not Persistent: Index structures are not saved in the .data file and need CREATE INDEX to be run again after loading.

9. Configuration (config.txt)

9.1 Location and Format

Expected in the runtime directory. Simple KEY=value format. # denotes comments.

9.2 GEMINI_API_KEY

The only key used; requires your valid Google Gemini API key.

GEMINI_API_KEY=AIza...................

9.3 Security Considerations

  • Plain text API keys are a risk. Restrict file permissions.
  • Do not commit config.txt to Git. Consider environment variables for better security.

10. API Integration (Google Gemini)

10.1 Purpose

Enables the NLP feature by translating English queries to SQL.

10.2 Mechanism

Uses libcurl to send HTTPS POST requests (with schema/history/query in JSON) to the Gemini API endpoint, requiring the API key. Receives JSON response with SQL/reasoning.

10.3 Data Sent

  • Your natural language query.
  • Database schema (table/column names, types).
  • Recent command history (NL/SQL pairs).
  • No actual row data is sent. Be mindful of sensitive schema information.

10.4 Dependencies

  • Internet connection.
  • Valid Gemini API key.
  • Availability of the Google Gemini API service.

11. Limitations and Considerations

  • SQL Parser: Basic, limited syntax, sensitive to format.
  • Performance: In-memory but scans common; index use limited; not for large data/high load.
  • Concurrency: Single-threaded only.
  • NLP Reliability: Depends on external LLM; potential for errors; no validation.
  • Persistence: Simple text format, potentially brittle/slow.
  • Error Handling: Basic runtime errors.
  • Security: Plain text API key; schema sent externally.
  • Data Integrity: No transactions or constraints.
  • Feature Set: Lacks advanced database features.

12. Troubleshooting

  • "Could not open config file...": Check config.txt exists, is readable.
  • "GEMINI_API_KEY not found/empty/default": Verify config.txt content and key value.
  • NLP fails (CURL/JSON error): Check internet, API key validity/status, Gemini service status.
  • "Table/Column not found": Check spelling/case; use PRINT TABLE.
  • "Invalid value/Error converting": Ensure data matches column type; use single quotes for TEXT.
  • Load errors: Check .data file existence/location; avoid manual edits.
  • SQL Syntax Errors: Check spacing, keywords, parentheses, commas; refer to SQL Reference; remember limitations.

13. Contributing

We welcome contributions to HexaDB! As an educational project and prototype, there's plenty of room for improvement and new features. Whether it's fixing bugs, enhancing the SQL parser, improving the NLP integration, adding more robust error handling, writing tests, or improving this documentation, your help is appreciated. Please feel free to fork the repository, create a feature branch, and submit a pull request. If you find an issue or have a feature suggestion, please open an issue in the project's issue tracker first to discuss it. Thank you for considering contributing!

About

An AI-powered in-memory relational database system that supports SQL-like operations, enhanced with AI-based natural language processing

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages