🔒 Privacy Statement: Everything runs locally on your machine. No cloud services, no data collection, no external dependencies. Your conversations and data stay completely private.
## Prerequisites
- Python 3.8 or higher
- Ollama installed
- Windows or macOS
## Installation
```bash
# Clone the repository
git clone https://github.com/farshard/Lowkey-Llama.git
cd Lowkey-Llama

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate   # On macOS/Linux
venv\Scripts\activate      # On Windows

# Install dependencies
python -m pip install -r requirements.txt
```
## Running the Application
```bash
python src/launcher.py
```
The launcher will:
- Check and install dependencies
- Verify Ollama installation and pull required models
- Start the API server (port 8002)
- Launch the Streamlit UI (port 8501)
- Open your default browser automatically
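If you want to confirm that everything actually came up, a quick reachability check like the sketch below works. The ports are the defaults listed above; only the Ollama root endpoint's "Ollama is running" reply is documented Ollama behavior, so for the API server and UI the sketch simply checks that something is answering.

```python
# Quick reachability check after launch (ports are the defaults above).
import requests

services = {
    "Ollama": "http://localhost:11434/",
    "API server": "http://localhost:8002/",
    "Streamlit UI": "http://localhost:8501/",
}

for name, url in services.items():
    try:
        resp = requests.get(url, timeout=5)
        print(f"{name}: reachable at {url} (HTTP {resp.status_code})")
    except requests.ConnectionError:
        print(f"{name}: nothing is listening at {url}")
```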
This application supports creating and using custom models with optimized parameters:
## Using Pre-configured Models
- **`mistral-factual`**: Optimized for accurate, format-compliant responses
  - Ultra-low temperature (0.01) for deterministic outputs
  - Strict format enforcement with zero tolerance
  - Specialized for exact word count requirements
- **`mistral-format`**: Optimized for precise formatting
  - Extremely focused sampling (top-p: 0.1)
  - Minimal token selection (top-k: 3)
  - Format-first approach with strict verification
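These models can also be queried outside the UI through Ollama's HTTP generate endpoint, which is handy for verifying their behavior. A minimal sketch, assuming Ollama is running on its default port and the model has already been created; the prompt is only an example, and the options mirror the parameters listed above:

```python
# Query a pre-configured model through Ollama's generate endpoint.
import requests

payload = {
    "model": "mistral-factual",
    "prompt": "Summarize the water cycle in exactly 50 words.",
    "stream": False,
    "options": {
        "temperature": 0.01,  # mirrors the ultra-low temperature listed above
        "top_p": 0.1,
        "top_k": 3,
    },
}

resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["response"])
```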
## Creating Your Own Models
```bash
# Create a custom model using a modelfile
ollama create my-custom-model -f models/my-custom-model.modelfile
```
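If you prefer to script this step, the sketch below writes a small modelfile and registers it with Ollama. `FROM`, `PARAMETER`, and `SYSTEM` are standard Ollama Modelfile directives; the base model, parameter values, file path, and system prompt here are placeholders, not settings taken from this project.

```python
# Write a minimal Ollama modelfile, then register it with `ollama create`.
import subprocess
from pathlib import Path

# Standard Modelfile directives; values and prompt are illustrative only.
modelfile = """\
FROM mistral
PARAMETER temperature 0.3
PARAMETER top_p 0.5
PARAMETER num_ctx 8192
SYSTEM You are a concise assistant that always answers in complete sentences.
"""

path = Path("models/my-custom-model.modelfile")
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(modelfile)

# Equivalent to: ollama create my-custom-model -f models/my-custom-model.modelfile
subprocess.run(["ollama", "create", "my-custom-model", "-f", str(path)], check=True)
```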
## Model Optimization
See custom_models.md for detailed instructions on:
- Creating custom models with specialized capabilities
- Optimizing parameters for different use cases
- Troubleshooting model issues
- Example modelfiles for different purposes
## Adding to Configuration
- Custom models are automatically detected
- You can add them to `config.json` for persistent settings:
"models": { "mistral-factual": { "temp": 0.01, "top_p": 0.1, "top_k": 3, "max_tokens": 4096, "context_window": 8192 } }
## Hardware
- Minimum 8GB RAM
- 4+ CPU cores
- GPU recommended but not required
## Software
- Python 3.8+
- Ollama 0.5.0+
- Windows 10/11 or macOS 10.15+
## Network
- Fixed port configuration:
  - API server: port 8002
  - UI server: port 8501
  - Ollama: port 11434 (must be available)
- The launcher automatically handles port cleanup and management
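The sketch below shows the kind of availability check this implies; it is an illustration of the idea, not the launcher's actual implementation.

```python
# Rough sketch of a port-availability check (not the launcher's real code).
import socket

FIXED_PORTS = {"api": 8002, "ui": 8501, "ollama": 11434}

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        return sock.connect_ex((host, port)) != 0

for name, port in FIXED_PORTS.items():
    state = "free" if port_is_free(port) else "in use"
    print(f"{name}: port {port} is {state}")
```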
## Port Conflicts
- Error: "Port X is already in use"
- Solutions:
- The system will automatically try fallback ports
- Check the console output for the actual ports being used
- The config.json file will be updated with the working ports
- Restart your computer if ports are held by zombie processes
## API Connection Issues
- Error: "API request failed: 404 Not Found"
- Solutions:
- The UI automatically uses the port from config.json
- If you see this error, restart the application
- The system will find available ports and update the configuration
- If issues persist, try:
taskkill /F /IM python.exe
(Windows) orpkill python
(macOS/Linux)
## Ollama Issues
- Error: "Failed to start Ollama server"
- Solutions:
- Ensure Ollama is installed and in PATH
- Run
ollama serve
manually to check for errors - Check Ollama logs for detailed information
## Dependency Issues
- Error: "Failed to install dependencies"
- Solutions:
- Update pip:
python -m pip install --upgrade pip
- Install manually:
pip install -r requirements.txt
- Check Python version compatibility
## Model Download Issues
- Error: "Failed to pull model"
- Solutions:
- Check internet connection
- Ensure sufficient disk space
- Try pulling model manually:
ollama pull mistral
If you're getting very short (one-word) responses from Mistral models, try these solutions:
- Use the `mistral-fixed` model (recommended):

  ```bash
  # Create the optimized model
  ollama create mistral-fixed -f models/mistral-fixed.modelfile

  # Then select "mistral-fixed" in the model dropdown
  ```
- The `mistral-fixed` model includes:
  - Optimized parameters for detailed responses
  - Enhanced system prompt forcing comprehensive answers
  - Improved handling of the Ollama API's ndjson format
  - Better fallback mechanisms for incomplete responses
- Adjust settings in the UI (the same values can be passed straight to the API, as sketched after this list):
  - Increase temperature (0.7-0.9) for more detailed responses
  - Set max tokens higher (2048+) to allow for longer outputs
  - Use the "mistral-fixed" model, which is pre-configured for verbosity
- Be explicit in your prompts:
  - Add phrases like "Please provide a detailed answer with multiple paragraphs"
  - Ask for explanations: "Explain in detail..."
  - Request a specific number of examples or points
- For developers, see custom_models.md for:
  - How to create custom models with specific optimization parameters
  - Troubleshooting model response issues
  - Advanced configuration options
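The settings above can also be exercised directly against Ollama's HTTP API. The sketch below requests a longer answer with a higher temperature and token limit, then reassembles Ollama's newline-delimited JSON (ndjson) stream; the model name and prompt are only examples, and `num_predict` is Ollama's option for the maximum number of generated tokens.

```python
# Stream a detailed answer from Ollama and stitch its ndjson output together.
import json
import requests

payload = {
    "model": "mistral-fixed",                 # example model name
    "prompt": "Explain in detail how photosynthesis works, using multiple paragraphs.",
    "options": {
        "temperature": 0.8,                   # 0.7-0.9 encourages more detail
        "num_predict": 2048,                  # allow longer outputs (max tokens)
    },
}

with requests.post("http://localhost:11434/api/generate", json=payload,
                   stream=True, timeout=300) as resp:
    resp.raise_for_status()
    parts = []
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)              # ndjson: one JSON object per line
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):                 # the final object carries "done": true
            break

print("".join(parts))
```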
## Check System Status
```bash
# Check Ollama status
curl http://localhost:11434/api/version

# List running Python processes
ps aux | grep python           # macOS/Linux
tasklist | findstr python.exe  # Windows

# Check port usage
netstat -ano | findstr :8002   # Windows
lsof -i :8002                  # macOS/Linux
```
## Clean Start
```bash
# Stop all related processes
taskkill /F /IM ollama.exe   # Windows
pkill ollama                 # macOS/Linux

# Clear temporary files
rm -rf src/temp_audio/*

# Restart application
python src/launcher.py
```
See development.md for:
- Project structure
- Development setup
- Testing guidelines
- Contributing instructions
The system uses a modular architecture with these main components:
- **System Orchestrator**
  - Manages initialization and shutdown
  - Coordinates all components
  - Handles dependency checks
- **API Server**
  - FastAPI backend
  - Handles model interactions
  - Manages audio processing
- **Streamlit UI**
  - User interface
  - Real-time chat
  - Settings management
- **Ollama Integration**
  - Model management
  - Inference handling
  - Server lifecycle
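As an illustration of how the orchestrator ties the other components together, here is a deliberately simplified sketch: it only starts the API server and the UI as subprocesses and shuts them down together. The module and script paths are hypothetical, and the real launcher (src/launcher.py) also performs the dependency, model, and port checks described above.

```python
# Illustrative-only orchestration sketch; not the project's actual launcher.
import subprocess
import sys

def main() -> None:
    # Hypothetical module/script paths, shown only to convey the idea.
    api = subprocess.Popen(
        [sys.executable, "-m", "uvicorn", "src.api:app", "--port", "8002"]
    )
    ui = subprocess.Popen(
        [sys.executable, "-m", "streamlit", "run", "src/ui.py",
         "--server.port", "8501"]
    )
    try:
        ui.wait()                 # block until the UI process exits
    finally:
        for proc in (api, ui):    # shut both components down together
            proc.terminate()

if __name__ == "__main__":
    main()
```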
For detailed API documentation, see api.md.
MIT License - See LICENSE file for details.