
Add FastAPI-based OpenAI-compatible Text-to-Speech API and Audio Format Conversion Tools #913

Merged (10 commits, Mar 14, 2025)

Conversation

yueguobin

Description

This Pull Request introduces a FastAPI-based Text-to-Speech (TTS) API that aligns with the OpenAI TTS interface specification, enabling text-to-audio conversion with support for multiple output formats (MP3, WAV, OGG). Leveraging the ChatTTS model for speech synthesis, this implementation includes optimized audio processing tools to enhance performance and flexibility.

Key Features

OpenAI-compatible TTS API (openai_api.py)

  • Implements a FastAPI service with a /v1/audio/speech endpoint for text-to-speech conversion.
  • Supports multiple voice options (default, alloy, echo) and audio formats (MP3, WAV, OGG).
  • Offers both streaming and non-streaming response modes to accommodate different use cases.
  • Allows speed adjustment (0.5x to 2.0x) with input validation for robust parameter handling.
  • Uses asyncio.Lock to manage concurrent model access, ensuring thread safety.
  • Preloads speaker embeddings at startup to reduce runtime overhead.
  • Includes a /health endpoint for service status monitoring.
  • Provides unified error handling for improved stability.
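
Below is a minimal sketch of what the request schema and route could look like, using the field names and constraints listed above. synthesize_audio() is a hypothetical stand-in for the actual ChatTTS inference and format conversion in openai_api.py, which may organize things differently.

import asyncio

from fastapi import FastAPI, HTTPException, Response
from pydantic import BaseModel, Field

app = FastAPI()
app.state.model_lock = asyncio.Lock()  # serialize access to the shared model


class SpeechRequest(BaseModel):
    model: str = "tts-1"
    input: str
    voice: str = "default"             # default, alloy, or echo
    response_format: str = "mp3"       # mp3, wav, or ogg
    speed: float = Field(1.0, ge=0.5, le=2.0)
    stream: bool = False


async def synthesize_audio(req: SpeechRequest) -> bytes:
    # Hypothetical stand-in: run ChatTTS inference and convert the
    # resulting PCM to req.response_format.
    raise NotImplementedError


@app.post("/v1/audio/speech")
async def create_speech(req: SpeechRequest):
    if not req.input.strip():
        raise HTTPException(status_code=400, detail="input must not be empty")
    async with app.state.model_lock:      # one inference at a time
        audio = await synthesize_audio(req)
    # Real code may map formats to exact MIME types (e.g. mp3 -> audio/mpeg).
    return Response(content=audio, media_type=f"audio/{req.response_format}")


@app.get("/health")
async def health():
    return {"status": "ok"}

For the streaming mode, the same route can return a StreamingResponse that yields audio chunks as they are synthesized, instead of a single response body.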

Audio Format Conversion Tools

  • Enables efficient conversion of PCM audio data to MP3, OGG, and WAV formats.
  • Supports optional WAV header generation, tailored for streaming scenarios.
  • Utilizes wave and numpy for high-quality audio processing.
  • Returns memoryview objects to optimize memory usage and performance.
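
For illustration, here is a minimal sketch of the WAV path, assuming 24 kHz mono float PCM as input; the function name, defaults, and exact behavior of the PR's conversion tools may differ.

import io
import wave

import numpy as np


def pcm_to_wav(samples: np.ndarray, sample_rate: int = 24000) -> memoryview:
    """Pack float PCM in [-1, 1] into a 16-bit mono WAV and return a memoryview."""
    pcm16 = (np.clip(samples, -1.0, 1.0) * 32767).astype(np.int16)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)          # mono
        wf.setsampwidth(2)          # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm16.tobytes())
    return buf.getbuffer()          # zero-copy view over the in-memory WAV

In streaming scenarios, the WAV header can be emitted once up front and raw PCM frames appended afterwards, which is what the optional header generation mentioned above is for.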

Improvements

  • Thread Safety: Manages global state and model access using app.state and asynchronous locking.
  • Flexibility: Supports multiple audio formats and voice options for customizable outputs.
  • Robustness: Includes input validation and exception handling to prevent invalid requests or runtime errors.
  • Performance: Preloads embeddings and uses streaming to minimize latency and resource consumption.
  • Maintainability: Features modular code design and clear logging for easier debugging and scalability.
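
As an illustration of the preloading and app.state pattern, a minimal lifespan sketch follows; load_chattts_model() is a hypothetical helper standing in for the actual ChatTTS loading and speaker-embedding sampling in openai_api.py.

import asyncio
from contextlib import asynccontextmanager

from fastapi import FastAPI


def load_chattts_model():
    # Hypothetical loader: wraps ChatTTS model loading and speaker-embedding
    # sampling; replace with the real calls from openai_api.py.
    raise NotImplementedError


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load the model and speaker embedding once, before the first request.
    app.state.model, app.state.speaker = load_chattts_model()
    app.state.model_lock = asyncio.Lock()   # guards concurrent model access
    yield


app = FastAPI(lifespan=lifespan)

Loading once at startup keeps the per-request path down to inference plus format conversion, which is where the latency savings mentioned above come from.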

Usage example

OpenAI SDK client (non-streaming):

# OpenAI API test (non-streaming)
from openai import OpenAI
from IPython.display import Audio, display

# Initialize the client
client = OpenAI(
    api_key="dummy-key",
    base_url="http://localhost:8000/v1"
)

# Generate audio
response = client.audio.speech.create(
    model="tts-1",
    voice="echo",
    input= """
    以下是一些中英文对照的话语。 
    1. 早上好!希望你有美好的一天。Good morning! Wish you a wonderful day. 
    2. 你好呀,最近怎么样?Hello there, how have you been recently? 
    3. 别放弃,你能做到的!Don't give up, you can do it! 
    4. 继续努力,你的付出会有回报的。Keep up the good work, your efforts will pay off.
    """,
    response_format="wav"
)

# Get audio binary data
audio_data = response.content  # response.content is of type bytes

# Display and play in the Notebook
display(Audio(audio_data, autoplay=False))

Request example

# Test using the requests module, streaming mode
import requests
from IPython.display import Audio, display
import io

payload = {
    "model": "tts-1",
    "input": """
    以下是一些中英文对照的话语。 
    1. 早上好!希望你有美好的一天。Good morning! Wish you a wonderful day. 
    2. 你好呀,最近怎么样?Hello there, how have you been recently? 
    3. 别放弃,你能做到的!Don't give up, you can do it! 
    4. 继续努力,你的付出会有回报的。Keep up the good work, your efforts will pay off.
    """,
    "voice": "echo",
    "response_format": "wav", 
    "stream": True
}

try:
    response = requests.post("http://localhost:8000/v1/audio/speech", json=payload, stream=True)
    response.raise_for_status()  # Check the status code
    
    audio_buffer = io.BytesIO()
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            audio_buffer.write(chunk)
    
    audio_buffer.seek(0)
    display(Audio(audio_buffer.getvalue(), autoplay=False))
    print("Audio has been loaded into the Notebook and can be played manually")
except requests.exceptions.RequestException as e:
    print(f"Request failed: {str(e)}")
    if e.response is not None:
        print(f"Error details: {e.response.text}")

Stream example

import subprocess

# Use a pipeline (curl | mpv) for streaming playback in WAV format.
# Note: the apostrophe in "Don't" is omitted below to avoid breaking the
# single-quoted shell string.
cmd = (
    'curl -X POST "http://localhost:8000/v1/audio/speech" '
    '-H "Content-Type: application/json" '
    '-d \'{"model": "tts-1", "input": "以下是一些中英文对照的话语。 1. 早上好!希望你有美好的一天。Good morning! Wish you a wonderful day. 2. 你好呀,最近怎么样?Hello there, how have you been recently? 3. 别放弃,你能做到的!Dont give up, you can do it! 4. 继续努力,你的付出会有回报的。Keep up the good work, your efforts will pay off.", "voice": "echo", "response_format": "wav", "stream": true}\' '
    '-s | mpv --no-video -'
)
subprocess.run(cmd, shell=True, check=True)

Here is the demo video.


yueguobin added 10 commits March 6, 2025 08:57
Add a wrapper example for the openai_api interface, with tests.
Add openai_api.py to implement an OpenAI-compatible TTS API.
Supports model, voice, input, and response_format; voice must be
customized. Test code is in openai_api.ipynb.
update #
@github-actions github-actions bot changed the base branch from main to dev March 9, 2025 18:25
Member

@fumiama fumiama left a comment


Thanks!

@fumiama fumiama added enhancement New feature or request ui UI improvements & issues labels Mar 12, 2025
@fumiama fumiama enabled auto-merge (squash) March 12, 2025 13:53
@yueguobin
Author

It looks like I need to fix the CI issue first. Although the test scripts under the tests directory are unrelated to the code I submitted, CI must pass before the PR can be completed.
[Screenshot: 2025-03-14_00-59-59]

@fumiama
Member

fumiama commented Mar 13, 2025

That's indeed the case, but since you didn't make any changes to the CI, the problem lies with the CI itself, so I can merge first and we can figure it out afterwards.

@fumiama fumiama disabled auto-merge March 13, 2025 17:03
@yueguobin
Author

> That's indeed the case, but since you didn't make any changes to the CI, the problem lies with the CI itself, so I can merge first and we can figure it out afterwards.

This is my first PR and I'm not familiar with the process. Is there anything else I need to do on this page?

@fumiama
Member

fumiama commented Mar 14, 2025

No, nothing else is needed.

@fumiama fumiama merged commit d582fd5 into 2noise:dev Mar 14, 2025
2 of 5 checks passed