implement endpoint: stream_synthesis #1542
Conversation
Thanks for implementing this!!!
It would be easier to try out if you could also write down how to run it!!
(That way we could check the behavior in the editor, for example.)
@@ -380,6 +380,51 @@ def multi_synthesis(
        background=BackgroundTask(try_delete_file, f.name),
    )

@router.post(
    "/stream_synthesis",
Ah, I put some thought into what kind of signature would be good, so please take a look at this for reference!!
#1492 (comment)
(For example, it looks like returning wav rather than pcm would be better, which seems to differ from this implementation.)
That said, I'm not at all confident this spec is the right one, so feedback is very welcome!!
I'll switch it to splitting the wav. Calling it from the editor would take quite a bit of work, so I'll write some plausible sample code in JavaScript instead.
Oh, yes, please do!!
Measurement results:
<html>
  <head>
    <title>音声合成のテスト</title>
    <meta charset="utf-8" />
  </head>
  <body>
    <input
      type="text"
      id="input"
      value="これは音声合成のテストです。ストリーム版はレスポンスを受信しながら同時に再生します。"
      size="50"
    />
    <br />
    <button onclick="speak_normal()">normal</button>
    <button onclick="speak_stream()">stream</button>
    <br />
    <audio id="audio" controls></audio>
    <script>
      async function audio_query() {
        const input = document.getElementById("input").value;
        // encodeURIComponent so that characters like & or ? in the input don't break the query string
        const url = `http://localhost:50021/audio_query?speaker=1&text=${encodeURIComponent(input)}`;
        const response = await fetch(url, {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
          },
        });
        return await response.text();
      }

      async function speak_normal() {
        const start = performance.now();
        const audio = document.getElementById("audio");
        const query = await audio_query();
        const end1 = performance.now();
        const url = `http://localhost:50021/synthesis?speaker=1`;
        const response = await fetch(url, {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
          },
          body: query,
        });
        const blob = await response.blob();
        const end2 = performance.now();
        console.log(`[normal] query: ${Math.round(end1 - start)} ms, synthesis: ${Math.round(end2 - end1)} ms`);
        audio.src = URL.createObjectURL(blob);
        audio.play();
      }

      async function speak_stream() {
        const start = performance.now();
        const audio = document.getElementById("audio");
        const audioContext = new AudioContext();
        const query = await audio_query();
        const end1 = performance.now();
        const url = `http://localhost:50021/stream_synthesis?speaker=1`;
        const response = await fetch(url, {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
          },
          body: query,
        });
        const reader = response.body.getReader();
        let recvLength = 0;
        let recvBuffer = new Uint8Array();
        // skip the 44-byte WAV header; blobPtr points just past the last scheduled sample
        let blobPtr = 44;
        // AudioContext time up to which playback has been scheduled
        let lastBufferedTime = 0;
        while (true) {
          const { done, value } = await reader.read();
          if (done) {
            break;
          }
          // append the newly received chunk to the accumulated buffer
          recvLength += value.length;
          const newBuffer = new Uint8Array(recvLength);
          newBuffer.set(recvBuffer);
          newBuffer.set(value, recvBuffer.length);
          recvBuffer = newBuffer;
          if (recvLength >= blobPtr + 2) {
            const recvView = new DataView(recvBuffer.buffer);
            // consume only whole 16-bit samples; a trailing odd byte waits for the next chunk
            const nextPtr = recvLength - recvLength % 2;
            const numFrames = (nextPtr - blobPtr) / 2;
            const audioArrayBuffer = audioContext.createBuffer(1, numFrames, 24000);
            const audioData = audioArrayBuffer.getChannelData(0);
            for (let i = 0; i < numFrames; i++) {
              // little-endian int16 PCM -> float in [-1, 1)
              const sample = recvView.getInt16(blobPtr + i * 2, true);
              audioData[i] = sample / 32768;
            }
            const source = audioContext.createBufferSource();
            source.buffer = audioArrayBuffer;
            source.connect(audioContext.destination);
            // queue this chunk right after the previously scheduled audio
            source.start(lastBufferedTime);
            lastBufferedTime = Math.max(lastBufferedTime, audioContext.currentTime) + numFrames / 24000;
            if (blobPtr === 44) {
              // this is the first chunk
              const end2 = performance.now();
              console.log(`[stream] query: ${Math.round(end1 - start)} ms, synthesis (first byte): ${Math.round(end2 - end1)} ms`);
            }
            blobPtr = nextPtr;
          }
        }
      }
    </script>
  </body>
</html>
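For reference, the same first-byte timing can be reproduced outside the browser. This is a hedged sketch in Python using requests, assuming a local engine on port 50021; the endpoint names follow this PR, and the chunk size is arbitrary.

import time

import requests

BASE = "http://localhost:50021"
TEXT = "これは音声合成のテストです。"

t0 = time.perf_counter()
query = requests.post(
    f"{BASE}/audio_query", params={"speaker": 1, "text": TEXT}
).json()
t1 = time.perf_counter()

# stream=True lets us time the arrival of the first chunk rather than the whole body
with requests.post(
    f"{BASE}/stream_synthesis", params={"speaker": 1}, json=query, stream=True
) as resp:
    next(resp.iter_content(chunk_size=4096))  # block until the first chunk arrives
    t2 = time.perf_counter()

print(f"query: {(t1 - t0) * 1000:.0f} ms, first byte: {(t2 - t1) * 1000:.0f} ms")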
Summary
Implements an endpoint that performs speech synthesis in fixed-time (1-second) units and returns the response as a stream.
Whereas the existing endpoint (/synthesis) returns its response only after the entire audio is complete (figure below),

this implementation returns a response as soon as the first second is finished (figure below), so the first byte arrives sooner.

Related issues
Screenshots and videos
Other