
Add FastAPI-based OpenAI-compatible Text-to-Speech API and Audio Format Conversion Tools #913

Merged (10 commits, Mar 14, 2025)

Conversation

yueguobin

Description

This Pull Request introduces a FastAPI-based Text-to-Speech (TTS) API that aligns with the OpenAI TTS interface specification, enabling text-to-audio conversion with support for multiple output formats (MP3, WAV, OGG). Leveraging the ChatTTS model for speech synthesis, this implementation includes optimized audio processing tools to enhance performance and flexibility.

Key Features

OpenAI-compatible TTS API (openai_api.py)

  • Implements a FastAPI service with a /v1/audio/speech endpoint for text-to-speech conversion.
  • Supports multiple voice options (default, alloy, echo) and audio formats (MP3, WAV, OGG).
  • Offers both streaming and non-streaming response modes to accommodate different use cases.
  • Allows speed adjustment (0.5x to 2.0x) with input validation for robust parameter handling.
  • Uses asyncio.Lock to manage concurrent model access, ensuring thread safety.
  • Preloads speaker embeddings at startup to reduce runtime overhead.
  • Includes a /health endpoint for service status monitoring.
  • Provides unified error handling for improved stability.
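
Below is a minimal sketch of what the request schema and route could look like, using the field names and constraints listed above. synthesize_audio() is a hypothetical stand-in for the actual ChatTTS inference and format conversion in openai_api.py, which may organize things differently.

import asyncio

from fastapi import FastAPI, HTTPException, Response
from pydantic import BaseModel, Field

app = FastAPI()
app.state.model_lock = asyncio.Lock()  # serialize access to the shared model


class SpeechRequest(BaseModel):
    model: str = "tts-1"
    input: str
    voice: str = "default"             # default, alloy, or echo
    response_format: str = "mp3"       # mp3, wav, or ogg
    speed: float = Field(1.0, ge=0.5, le=2.0)
    stream: bool = False


async def synthesize_audio(req: SpeechRequest) -> bytes:
    # Hypothetical stand-in: run ChatTTS inference and convert the
    # resulting PCM to req.response_format.
    raise NotImplementedError


@app.post("/v1/audio/speech")
async def create_speech(req: SpeechRequest):
    if not req.input.strip():
        raise HTTPException(status_code=400, detail="input must not be empty")
    async with app.state.model_lock:      # one inference at a time
        audio = await synthesize_audio(req)
    # Real code may map formats to exact MIME types (e.g. mp3 -> audio/mpeg).
    return Response(content=audio, media_type=f"audio/{req.response_format}")


@app.get("/health")
async def health():
    return {"status": "ok"}

For the streaming mode, the same route can return a StreamingResponse that yields audio chunks as they are synthesized, instead of a single response body.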

Audio Format Conversion Tools

  • Enables efficient conversion of PCM audio data to MP3, OGG, and WAV formats.
  • Supports optional WAV header generation, tailored for streaming scenarios.
  • Utilizes wave and numpy for high-quality audio processing.
  • Returns memoryview objects to optimize memory usage and performance.
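
For illustration, here is a minimal sketch of the WAV path, assuming 24 kHz mono float PCM as input; the function name, defaults, and exact behavior of the PR's conversion tools may differ.

import io
import wave

import numpy as np


def pcm_to_wav(samples: np.ndarray, sample_rate: int = 24000) -> memoryview:
    """Pack float PCM in [-1, 1] into a 16-bit mono WAV and return a memoryview."""
    pcm16 = (np.clip(samples, -1.0, 1.0) * 32767).astype(np.int16)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)          # mono
        wf.setsampwidth(2)          # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm16.tobytes())
    return buf.getbuffer()          # zero-copy view over the in-memory WAV

In streaming scenarios, the WAV header can be emitted once up front and raw PCM frames appended afterwards, which is what the optional header generation mentioned above is for.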

Improvements

  • Thread Safety: Manages global state and model access using app.state and asynchronous locking.
  • Flexibility: Supports multiple audio formats and voice options for customizable outputs.
  • Robustness: Includes input validation and exception handling to prevent invalid requests or runtime errors.
  • Performance: Preloads embeddings and uses streaming to minimize latency and resource consumption.
  • Maintainability: Features modular code design and clear logging for easier debugging and scalability.
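
As an illustration of the preloading and app.state pattern, a minimal lifespan sketch follows; load_chattts_model() is a hypothetical helper standing in for the actual ChatTTS loading and speaker-embedding sampling in openai_api.py.

import asyncio
from contextlib import asynccontextmanager

from fastapi import FastAPI


def load_chattts_model():
    # Hypothetical loader: wraps ChatTTS model loading and speaker-embedding
    # sampling; replace with the real calls from openai_api.py.
    raise NotImplementedError


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load the model and speaker embedding once, before the first request.
    app.state.model, app.state.speaker = load_chattts_model()
    app.state.model_lock = asyncio.Lock()   # guards concurrent model access
    yield


app = FastAPI(lifespan=lifespan)

Loading once at startup keeps the per-request path down to inference plus format conversion, which is where the latency savings mentioned above come from.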

Usage example

OpenAI SDK client (non-streaming):

# OpenAI API test (non-streaming)
from openai import OpenAI
from IPython.display import Audio, display

# Initialize the client
client = OpenAI(
    api_key="dummy-key",
    base_url="http://localhost:8000/v1"
)

# Generate audio
response = client.audio.speech.create(
    model="tts-1",
    voice="echo",
    input= """
    以下是一些中英文对照的话语。 
    1. 早上好!希望你有美好的一天。Good morning! Wish you a wonderful day. 
    2. 你好呀,最近怎么样?Hello there, how have you been recently? 
    3. 别放弃,你能做到的!Don't give up, you can do it! 
    4. 继续努力,你的付出会有回报的。Keep up the good work, your efforts will pay off.
    """,
    response_format="wav"
)

# Get audio binary data
audio_data = response.content  # response.content is of type bytes

# Display and play in the Notebook
display(Audio(audio_data, autoplay=False))

Request example

# Test using the requests module, streaming mode
import requests
from IPython.display import Audio, display
import io

payload = {
    "model": "tts-1",
    "input": """
    以下是一些中英文对照的话语。 
    1. 早上好!希望你有美好的一天。Good morning! Wish you a wonderful day. 
    2. 你好呀,最近怎么样?Hello there, how have you been recently? 
    3. 别放弃,你能做到的!Don't give up, you can do it! 
    4. 继续努力,你的付出会有回报的。Keep up the good work, your efforts will pay off.
    """,
    "voice": "echo",
    "response_format": "wav", 
    "stream": True
}

try:
    response = requests.post("http://localhost:8000/v1/audio/speech", json=payload, stream=True)
    response.raise_for_status()  # Check the status code
    
    audio_buffer = io.BytesIO()
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            audio_buffer.write(chunk)
    
    audio_buffer.seek(0)
    display(Audio(audio_buffer.getvalue(), autoplay=False))
    print("Audio has been loaded into the Notebook and can be played manually")
except requests.exceptions.RequestException as e:
    print(f"Request failed: {str(e)}")
    if e.response is not None:
        print(f"Error details: {e.response.text}")

Stream example

import subprocess

# Use a pipeline (curl | mpv) for streaming playback in WAV format.
# Note: the apostrophe in "Don't" is omitted below to avoid breaking the
# single-quoted shell string.
cmd = (
    'curl -X POST "http://localhost:8000/v1/audio/speech" '
    '-H "Content-Type: application/json" '
    '-d \'{"model": "tts-1", "input": "以下是一些中英文对照的话语。 1. 早上好!希望你有美好的一天。Good morning! Wish you a wonderful day. 2. 你好呀,最近怎么样?Hello there, how have you been recently? 3. 别放弃,你能做到的!Dont give up, you can do it! 4. 继续努力,你的付出会有回报的。Keep up the good work, your efforts will pay off.", "voice": "echo", "response_format": "wav", "stream": true}\' '
    '-s | mpv --no-video -'
)
subprocess.run(cmd, shell=True, check=True)

Here is the demo video.


yueguobin added 10 commits March 6, 2025 08:57
Add a wrapper example for the openai_api interface, with tests.
Add openai_api.py to implement an OpenAI-compatible TTS API.
Supports model, voice, input, and response_format; voice must be
customized. Test code is in openai_api.ipynb.
update #
@github-actions github-actions bot changed the base branch from main to dev March 9, 2025 18:25
Member

@fumiama fumiama left a comment


Thanks!

@fumiama fumiama added enhancement New feature or request ui UI improvements & issues labels Mar 12, 2025
@fumiama fumiama enabled auto-merge (squash) March 12, 2025 13:53
@yueguobin
Author

It looks like I need to fix the CI issue first. Although the test scripts under the tests directory are unrelated to the code I submitted, CI must pass before the PR can be completed.
[Screenshot: 2025-03-14_00-59-59]

@fumiama
Member

fumiama commented Mar 13, 2025

That's indeed the case, but since you didn't make any changes to the CI, the problem lies with the CI itself, so I can merge first and we can figure it out afterwards.

@fumiama fumiama disabled auto-merge March 13, 2025 17:03
@yueguobin
Author

> That's indeed the case, but since you didn't make any changes to the CI, the problem lies with the CI itself, so I can merge first and we can figure it out afterwards.

This is my first PR and I'm not familiar with the process. Is there anything else I need to do on this page?

@fumiama
Member

fumiama commented Mar 14, 2025

No, nothing else is needed.

@fumiama fumiama merged commit d582fd5 into 2noise:dev Mar 14, 2025
2 of 5 checks passed