# april-changelog #310

**changelog/2025/apr.mdx** (237 additions, 0 deletions)
---
title: "April"
---

**April in review! 🧪✨**

We kicked off April with an announcement: we make 95% of LLM costs vanish overnight. Apologies if you took the bait!

While we can't make bills disappear with a snap, we've shipped some powerful upgrades this month that will help you build and ship robust, reliable GenAI apps faster!

This month, we introduced workspace-level budget and rate limits, Prompt CRUD APIs, virtual keys for self-hosted models, and powerful new integrations with OpenAI Codex CLI and n8n. We added support for the latest models like GPT-4.1, Gemini 2.5, Llama 4, and Qwen 3, and brought enhanced security and moderation with AWS KMS, SCIM, and expanded guardrail coverage.

Here’s what we shipped this month:


## Summary

| Area | Key Updates |
| :-- | :-- |
| Platform | • Prompt CRUD APIs<br/>• Export logs to your internal stack<br/>• Budget limits and rate limits on workspace<br/>• n8n integration<br/>• OpenAI Codex CLI integration<br/>• New retry setting to determine wait times<br/>• Milvus for Semantic Cache<br/>• Plugins moved to org-level Settings<br/>• Virtual Key exhaustion alert includes workspace<br/>• Workspace control setup option |
| Gateway & Providers | • OpenAI embeddings latency improvement (200ms)<br/>• Responses API for OpenAI & Azure OpenAI<br/>• Bedrock prompt caching via unified API<br/>• Virtual keys for self-hosted models<br/>• Tool calling support for Groq, OpenRouter, and Ollama<br/>• New providers: Dashscope, Recraft AI, Replicate, Azure AI Foundry<br/>• Enhanced parameter support: OpenRouter, Vertex AI, Perplexity, Bedrock<br/>• Claude’s `anthropic_beta` parameter for Computer use beta |
| Technical Improvements | • Unified caching/logging of thinking responses<br/>• Strict metadata logging: Workspace > API Key > Request<br/>• Prompt render endpoint available on Gateway URL<br/>• API key default config now locked from overrides |
| New Models & Integrations | • GPT-4.1<br/>• Gemini 2.5 Pro and Flash<br/>• Llama 4 via Fireworks, Together, Groq<br/>• o1-pro<br/>• gpt-image-1<br/>• Qwen 3<br/>• Audio models via Groq |
| Guardrails | • Azure AI Content Safety integration<br/>• Exa Online Search as a Guardrail |

---

## Platform

**Prompt CRUD APIs**

Prompt CRUD APIs give you the control to scale by enabling you to:

- Programmatically create, update, and delete prompts
- Manage prompts in bulk or version-control them
- Integrate prompt updates into your own tools and workflows
- Automate updates for A/B testing and rapid experimentation

Read more about this [here](https://portkey.ai/docs/api-reference/admin-api/control-plane/prompts/create-prompt).
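
As a quick illustration, here's a minimal Python sketch of creating a prompt over HTTP. The endpoint path and payload fields are assumptions for illustration (the `x-portkey-api-key` header is Portkey's standard auth header); check the Create Prompt reference linked above for the exact contract.

```python
import requests

PORTKEY_API_KEY = "YOUR_PORTKEY_API_KEY"

# Hypothetical payload shape -- confirm field names against the
# Create Prompt API reference linked above.
payload = {
    "name": "ticket-summarizer",
    "string": "Summarize this support ticket: {{ticket_body}}",
    "model": "gpt-4.1",
}

resp = requests.post(
    "https://api.portkey.ai/v1/prompts",  # assumed endpoint path
    headers={"x-portkey-api-key": PORTKEY_API_KEY},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # the created prompt ID can feed your CI/CD or A/B tooling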

**Export logs to your internal stack**

Enterprises can now push analytics logs to any OTEL-compliant store through Portkey to centralize monitoring, maintain compliance, and ensure efficient operations.
See how it's done [here](https://portkey.ai/docs/product/enterprise-offering/components#analytics-store).

**Budget limits and rate limits on workspaces**

Configure budget and rate limits at the workspace level to:

- Allocate specific budgets to different departments, teams, or projects
- Prevent individual workspaces from consuming disproportionate resources
- Ensure equitable API access and complete visibility

**n8n integration**

Add enterprise-grade controls to your n8n workflows with:

- **Unified AI Gateway**: Connect to 1600+ models with full API key management—not just OpenAI or Anthropic.
- **Centralized observability**: Track 40+ metrics and request logs in real time.
- **Governance**: Monitor spend, set budgets, and apply RBAC across workflows.
- **Security guardrails**: Enable PII detection, content filtering, and compliance controls.

Read more about the integration [here](https://portkey.ai/docs/integrations/libraries/n8n).

**OpenAI Codex CLI integration**

OpenAI Codex CLI gives developers a streamlined way to analyze, modify, and execute code directly from their terminal. Portkey's integration enhances this experience with:

- Access to 250+ additional models beyond OpenAI Codex CLI's standard offerings
- Content filtering and PII detection with guardrails
- Real-time analytics and logging
- Cost attribution, budget controls, RBAC, and more!

Read more about the integration [here](https://portkey.ai/docs/integrations/libraries/codex#openai-codex-cli).

**Other updates**

- Introduced a new retry setting, `use_retry_after_header`. When set to `true` and the provider returns a `retry-after` or `retry-after-ms` header, the Gateway uses those headers to determine retry wait times instead of applying the default exponential backoff for 429 responses (a config sketch follows this list).
- You can now store and retrieve vector embeddings for semantic cache using Milvus in Portkey. Read more about the semantic cache store [here](https://portkey.ai/docs/product/enterprise-offering/components#semantic-cache-store).
- Plugins have now been moved under Settings (org-level) in the Portkey app.
- Virtual Key exhaustion alert emails now include which workspace the exhausted key belonged to.
- Set up your workspace with Workspace control on the Portkey app.
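
Here's a minimal sketch of enabling the new retry setting in a Gateway config passed to the Python SDK. The `retry.attempts` shape follows Portkey's config convention, but the exact placement of `use_retry_after_header` is an assumption; verify against the config reference.

```python
from portkey_ai import Portkey

# Retry config sketch: when the provider sends retry-after /
# retry-after-ms headers on a 429, honor them instead of the
# default exponential backoff. Placement of use_retry_after_header
# inside the retry block is an assumption.
config = {
    "retry": {
        "attempts": 3,
        "use_retry_after_header": True,
    }
}

client = Portkey(api_key="YOUR_PORTKEY_API_KEY", config=config)
```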

## Gateway & Providers

**OpenAI embeddings latency**

We've optimized the Gateway's handling of OpenAI embeddings requests, cutting response latency by around 200ms.

**Responses API**

You can now use the Responses API to access OpenAI and Azure OpenAI models on Portkey, enabling a more flexible, easier way to build agentic experiences, with:

- Complete observability and usage tracking
- Caching support for streaming requests
- Access to advanced tools — web search, file search, and code execution, with per-tool cost tracking
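
Since the Responses API is OpenAI-compatible, a sketch like the following should work with the OpenAI Python SDK routed through the Portkey gateway; the virtual key name is a placeholder you'd replace with your own.

```python
from openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

client = OpenAI(
    api_key="not-used",  # auth is carried in the Portkey headers
    base_url=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        virtual_key="openai-virtual-key",  # placeholder virtual key
    ),
)

response = client.responses.create(
    model="gpt-4.1",
    input="Write a one-line release note for the Responses API launch.",
)
print(response.output_text)
```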

**Bedrock prompt caching**

You can now implement Amazon Bedrock’s prompt caching through our OpenAI-compliant unified API and prompt templates.

- Cache specific portions of your requests for repeated use
- Reduce inference response latency and input token costs

Read more about the implementation [here](https://portkey.ai/docs/integrations/llms/bedrock/prompt-caching)
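
A minimal sketch of the idea, assuming Anthropic-style `cache_control` content blocks are passed through the unified API (the linked docs are authoritative on the exact shape):

```python
from portkey_ai import Portkey

client = Portkey(api_key="YOUR_PORTKEY_API_KEY", virtual_key="bedrock-virtual-key")

# Mark a large, reusable system prompt as cacheable so repeated
# requests skip re-processing those input tokens. The cache_control
# block mirrors Anthropic's prompt-caching convention and is an
# assumption here.
response = client.chat.completions.create(
    model="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "<several thousand tokens of product docs>",
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        {"role": "user", "content": "How do I rotate my API keys?"},
    ],
)
print(response.choices[0].message.content)
```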

**Virtual keys for self-hosted models**

You can now create a virtual key for any self-hosted model, whether you're running Ollama, vLLM, or any custom or private model.

- No extra setup required
- Stay in control with logs, traces, and key metrics
- Manage all your LLM interactions through one interface
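
For example, once a virtual key pointing at your self-hosted endpoint exists in the Portkey app, calls look identical to any other provider; the key name and model below are placeholders.

```python
from portkey_ai import Portkey

# "ollama-self-hosted" is a placeholder for a virtual key created in
# the Portkey app, pointing at your own Ollama / vLLM endpoint.
client = Portkey(api_key="YOUR_PORTKEY_API_KEY", virtual_key="ollama-self-hosted")

response = client.chat.completions.create(
    model="llama3",  # whatever model your server actually serves
    messages=[{"role": "user", "content": "Hello from behind the gateway!"}],
)
print(response.choices[0].message.content)
```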

**Advanced capabilities**

- **OpenRouter**: Added mapping for new parameters: `modalities`, `reasoning`, `transforms`, `provider`, `models`, and `response_format`.
- **Vertex AI**: Added support for explicitly specifying `mime_type` for URLs sent in the request. Gemini 2.5 thinking parameters are now available.
- **Perplexity**: Added support for the `response_format` and `search_recency_filter` request parameters.
- **Bedrock**: You can now pass the `anthropic_beta` parameter in Bedrock’s Anthropic API via Portkey to enable Claude's Computer use beta feature.

**Tool calling**

Portkey now supports tool calling for Groq, OpenRouter, and Ollama.
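
Tool definitions follow the standard OpenAI format, so a sketch against a Groq-backed virtual key might look like this (the virtual key and model name are placeholders):

```python
from portkey_ai import Portkey

client = Portkey(api_key="YOUR_PORTKEY_API_KEY", virtual_key="groq-virtual-key")

# Standard OpenAI-style tool definition, routed to Groq through Portkey
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # example Groq-hosted model
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```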

**New Providers**

<CardGroup cols={2}>
<Card title="Dashscope">
Integrate with Dashscope
</Card>
<Card title="Recraft AI">
Generate production-ready visuals with Recraft
</Card>
</CardGroup>
<CardGroup cols={2}>
<Card title="Replicate">
Run open-source models via simple APIs with Replicate
</Card>
<Card title="Azure AI Foundry">
Access over 1,800 models with Azure AI Foundry
</Card>
</CardGroup>

## Technical Improvements

- **Caching and Logging Unified Thinking Responses**: Unified thinking responses (`content_blocks`) are now logged and cached for streaming responses.
- **Strict Metadata Enforcement**: Metadata logging now follows the precedence order `Workspace Default > API Key Default > Incoming Request`, giving org admins better control and ensuring the values they set are not overridden.
- **Prompt render endpoint**: Previously only available via the control plane, the prompt render endpoint is now supported directly on the Gateway URL.
- **API key default configs**: The default config attached to an API key can no longer be overridden.


## New Models & Integrations

<CardGroup cols={2}>
<Card title="GPT-4.1">
OpenAI’s new model for faster and improved responses
</Card>
<Card title="Gemini 2.5 Pro">
Google's most advanced model
</Card>
<Card title="Gemini 2.5 Flash">
Google's fast, cost-efficient thinking model
</Card>
<Card title="Llama 4">
Meta's latest model via Fireworks, Together, and Groq
</Card>
<Card title="o1-pro">
OpenAI's model for better reasoning and consistent answers
</Card>
<Card title="gpt-image-1">
OpenAI's latest image generation capabilities
</Card>
<Card title="Qwen 3">
Alibaba's latest model with hybrid reasoning
</Card>
<Card title="Audio models">
Access audio models via Groq
</Card>
</CardGroup>

## Guardrails

- **Azure AI Content Safety**: Use Microsoft’s content filtering solution to moderate inputs and outputs across supported models.

- **Exa Online Search**: You can now configure Exa Online Search as a Guardrail in Portkey to enable real-time, grounded search across the web before answering. This makes any LLM capable of handling current events or live queries without needing model retraining.
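
Once the Exa guardrail is created in the Portkey app, attaching it in a config might look like the sketch below; the `input_guardrails` key follows the pattern in Portkey's guardrails docs, and the guardrail ID is a placeholder.

```python
from portkey_ai import Portkey

# Config sketch: run the Exa Online Search guardrail on incoming
# requests. "exa-guardrail-id" stands in for the ID shown in the
# Portkey app; the input_guardrails key name is an assumption based
# on Portkey's guardrails config pattern.
config = {
    "input_guardrails": ["exa-guardrail-id"],
}

client = Portkey(
    api_key="YOUR_PORTKEY_API_KEY",
    virtual_key="openai-virtual-key",  # placeholder virtual key
    config=config,
)
```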

## Documentation

<Card icon="book" title="Administration Docs" href="https://portkey.ai/docs/product/administration/enforcing-request-metadata" horizontal />

We've made significant improvements to our documentation:

- **Virtual keys access**: Defining who can view and manage virtual keys within workspaces. [Learn more](https://portkey.ai/docs/product/administration/configure-virtual-key-access-permissions)
- **API keys access**: Control how workspace managers and members interact with API keys within their workspaces. [Learn more](https://portkey.ai/docs/product/administration/configure-api-key-access-permissions)


## Community

<iframe
width="560"
height="315"
src="https://www.youtube.com/embed/6MgPd3O3FXs"
title="Build a Customer Support AI Agent with LangGraph & Portkey Prompt API"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
></iframe>

<Frame>
<img src="/images/changelog/testimonial.jpeg" />
</Frame>

<Frame>
<img src="/images/changelog/testimonial2.png" />
</Frame>

**Partner blog**

[See](https://portkey.ai/blog/securing-your-ai-via-ai-gateways/) how Portkey and Pillar together can help you build secure GenAI apps for production.

### Community Contributors

A special thanks to our community contributors this month:
- [unsync](https://github.com/unsync)
- [francescov1](https://github.com/francescov1)
- [Ajay Satish](https://github.com/Ajay-Satish-01)

## Support

<CardGroup cols={2}>
<Card title="Need Help?" icon="bug" href="https://github.com/Portkey-AI/gateway/issues">
Open an issue on GitHub
</Card>
<Card title="Join Us" icon="discord" href="https://portkey.wiki/community">
Get support in our Discord
</Card>
</CardGroup>
Binary file added: images/changelog/testimonial.jpeg

Binary file added: images/changelog/testimonial2.png
**introduction/what-is-portkey.mdx** (1 addition, 1 deletion)

The frontmatter title changes from "What is Portkey?" to "What is"; the description and mode fields are unchanged.