ostruct transforms unstructured inputs into structured, usable JSON output using OpenAI APIs.
Given a set of plain-text files (data, source code, CSV, etc.), input variables, a dynamic Jinja2 prompt template, and a JSON schema describing the desired output, ostruct produces a result that conforms to that schema.
- Generate structured JSON output from natural language using OpenAI models and a JSON schema
- Rich template system for defining prompts (Jinja2-based)
- Automatic token counting and context window management
- Streaming support for real-time output
- Secure handling of sensitive data
- Model registry management with support for updating to the latest OpenAI models
- Non-intrusive registry update checks with user notifications
- Python 3.10 or higher
To install the latest stable version from PyPI:

```bash
pip install ostruct-cli
```
If you plan to contribute to the project, see the Development Setup section below for instructions on setting up the development environment with Poetry.
ostruct-cli respects the following environment variables:
- `OPENAI_API_KEY`: Your OpenAI API key (required unless provided via command line)
- `OPENAI_API_BASE`: Custom API base URL (optional)
- `OPENAI_API_VERSION`: API version to use (optional)
- `OPENAI_API_TYPE`: API type (e.g., "azure") (optional)
- `OSTRUCT_DISABLE_UPDATE_CHECKS`: Set to "1", "true", or "yes" to disable automatic registry update checks
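For example, a typical shell setup might look like this (all values below are placeholders, not working credentials):

```bash
# Example environment setup for ostruct-cli; values are placeholders.
export OPENAI_API_KEY="sk-your-key-here"

# Optional: point at a different API endpoint (illustrative URL).
export OPENAI_API_BASE="https://api.openai.com/v1"

# Optional: opt out of automatic registry update checks.
export OSTRUCT_DISABLE_UPDATE_CHECKS="1"
```

Add these lines to your shell profile (e.g. `~/.bashrc`) to make them persistent.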
ostruct-cli supports shell completion for Bash, Zsh, and Fish shells. To enable it:
For Bash, add this to your `~/.bashrc`:

```bash
eval "$(_OSTRUCT_COMPLETE=bash_source ostruct)"
```

For Zsh, add this to your `~/.zshrc`:

```bash
eval "$(_OSTRUCT_COMPLETE=zsh_source ostruct)"
```

For Fish, add this to your `~/.config/fish/completions/ostruct.fish`:

```bash
eval (env _OSTRUCT_COMPLETE=fish_source ostruct)
```
After adding the appropriate line, restart your shell or source the configuration file. Shell completion will help you with:
- Command options and their arguments
- File paths for template and schema files
- Directory paths for the `-d` and `--base-dir` options
- And more!
- Set your OpenAI API key:

```bash
export OPENAI_API_KEY=your-api-key
```
- Create a template file `extract_person.j2`:

```
Extract information about the person from this text: {{ stdin }}
```
- Create a schema file `schema.json`:

```json
{
  "type": "object",
  "properties": {
    "person": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "description": "The person's full name"
        },
        "age": {
          "type": "integer",
          "description": "The person's age"
        },
        "occupation": {
          "type": "string",
          "description": "The person's job or profession"
        }
      },
      "required": ["name", "age", "occupation"],
      "additionalProperties": false
    }
  },
  "required": ["person"],
  "additionalProperties": false
}
```
- Run the CLI:

```bash
# Basic usage
echo "John Smith is a 35-year-old software engineer" | ostruct run extract_person.j2 schema.json

# For longer text using a heredoc
cat << EOF | ostruct run extract_person.j2 schema.json
John Smith is a 35-year-old software engineer
working at Tech Corp. He has been programming
for over 10 years.
EOF

# With advanced options
echo "John Smith is a 35-year-old software engineer" | \
  ostruct run extract_person.j2 schema.json \
  --model gpt-4o \
  --sys-prompt "Extract precise information about the person" \
  --temperature 0.7
```
The command will output:

```json
{
  "person": {
    "name": "John Smith",
    "age": 35,
    "occupation": "software engineer"
  }
}
```
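Because the result is plain JSON, it composes with standard tooling. A small sketch using only the Python standard library, where the `echo` stands in for an actual ostruct invocation:

```bash
# Pipe structured output into further processing; the JSON literal here
# stands in for the output of an ostruct run.
echo '{"person": {"name": "John Smith", "age": 35, "occupation": "software engineer"}}' \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["person"]["name"])'
# → John Smith
```

Tools like `jq` work equally well here if you have them installed.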
- Create a template file `extract_from_file.j2`:

```
Extract information about the person from this text: {{ text.content }}
```
- Use the same schema file `schema.json` as above.
- Run the CLI:

```bash
# Basic usage
ostruct run extract_from_file.j2 schema.json -f text input.txt

# With advanced options
ostruct run extract_from_file.j2 schema.json \
  -f text input.txt \
  --model gpt-4o \
  --max-output-tokens 1000 \
  --temperature 0.7
```
The command will output:

```json
{
  "person": {
    "name": "John Smith",
    "age": 35,
    "occupation": "software engineer"
  }
}
```
ostruct-cli provides three ways to specify a system prompt, with a clear precedence order:
- Command-line option (`--sys-prompt` or `--sys-file`):

```bash
# Direct string
ostruct run template.j2 schema.json --sys-prompt "You are an expert analyst"

# From file
ostruct run template.j2 schema.json --sys-file system_prompt.txt
```

- Template frontmatter:

```
---
system_prompt: You are an expert analyst
---
Extract information from: {{ text }}
```

- Default system prompt (built into the CLI)
When multiple system prompts are provided, they are resolved in this order:
- Command-line options take highest precedence:
  - If both `--sys-prompt` and `--sys-file` are provided, `--sys-prompt` wins
  - Use `--ignore-task-sysprompt` to ignore template frontmatter
- Template frontmatter is used if:
  - No command-line options are provided
  - `--ignore-task-sysprompt` is not set
- Default system prompt is used only if no other prompts are provided
Example combining multiple sources:

```bash
# Command-line prompt will override template frontmatter
ostruct run template.j2 schema.json --sys-prompt "Override prompt"

# Ignore template frontmatter and use default
ostruct run template.j2 schema.json --ignore-task-sysprompt
```
ostruct-cli maintains a registry of OpenAI models and their capabilities, which includes:
- Context window sizes for each model
- Maximum output token limits
- Supported parameters and their constraints
- Model version information
To ensure you're using the latest models and features, you can update the registry:
```bash
# Update from the official repository
ostruct update-registry

# Update from a custom URL
ostruct update-registry --url https://example.com/models.yml

# Force an update even if the registry is current
ostruct update-registry --force
```
This is especially useful when:
- New OpenAI models are released
- Model capabilities or parameters change
- You need to work with custom model configurations
The registry file is stored at `~/.openai_structured/config/models.yml` and is automatically referenced when validating model parameters and token limits.
The update command uses HTTP conditional requests (If-Modified-Since headers) to check if the remote registry has changed before downloading, ensuring efficient updates.
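The idea behind the conditional request can be sketched with standard tools. The snippet below (an illustration of the mechanism, not ostruct's actual code) builds an `If-Modified-Since` value from a local file's modification time; a temp file stands in for the cached `models.yml`:

```bash
# Sketch of the conditional-request idea (illustrative, not ostruct's code).
# GNU date's -r flag reads the file's mtime and formats it as an HTTP-date.
registry=$(mktemp)
ims=$(date -u -r "$registry" '+%a, %d %b %Y %H:%M:%S GMT')
echo "If-Modified-Since: $ims"

# A client sends this header; the server replies "304 Not Modified" when
# the remote file hasn't changed, so no download occurs. curl can do the
# same with its file-based time condition:
#   curl -sS -o models.yml -z models.yml <registry-url>
rm -f "$registry"
```

`curl -z <file>` sends `If-Modified-Since` derived from that file's mtime, which mirrors what the update command does internally.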