add offset/duration as arguments in predict.whisper, include new column called segment_offset in output by default, make sure examples on stereo are run in language es
NEWS.md (+2 −1)

@@ -1,7 +1,8 @@
 ## CHANGES IN audio.whisper VERSION 0.4
 
 - Allow to pass on multiple offset/durations
-- Allow to give sections in the audio (e.g. detected with a voice acitivy detector) to filter out these (voiced) data, make the transcription and make sure to add the amount of time which was cut out such that the resulting timepoints in from/to are aligned to the original audio file
+- Allow to give sections in the audio (e.g. detected with a voice activity detector) to filter out these (voiced) data, make the transcription and make sure to add the amount of time which was cut out to the from/to timestamps such that the resulting timepoints in from/to are aligned to the original audio file
+- The data element of the predict.whisper now includes a column called segment_offset indicating the offset of the provided sections or offsets
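The sections feature described in these NEWS entries can be sketched as follows. This is a hedged illustration only, not part of the diff: the model name, the file `audio.wav`, and the example millisecond values are assumptions; the sections data.frame layout (columns start and duration in milliseconds) follows the package documentation.

```r
## Sketch only: assumes whisper model files and a local 16kHz "audio.wav" exist.
library(audio.whisper)
model <- whisper("tiny")

## Voiced regions (start/duration in milliseconds), e.g. from a voice activity detector
sections <- data.frame(start    = c(1000, 7000),
                       duration = c(2500, 3000))

## Only the given sections are transcribed; the cut-out time is added back,
## so from/to in trans$data stay aligned to the original audio timeline
trans <- predict(model, newdata = "audio.wav", language = "en", sections = sections)
head(trans$data)
```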
R/whisper.R (+24 −10)

@@ -8,13 +8,13 @@
 #' @param language the language of the audio. Defaults to 'auto'. For a list of all languages the model can handle: see \code{\link{whisper_languages}}.
 #' @param sections a data.frame with columns start and duration (measured in milliseconds) indicating voice segments to transcribe. This will make a new audio file with
 #' these sections, do the transcription and make sure the from/to timestamps are aligned to the original audio file. Defaults to transcribing the full audio file.
+#' @param offset an integer vector of offsets in milliseconds to start the transcription. Defaults to 0 - indicating to transcribe the full audio file.
+#' @param duration an integer vector of durations in milliseconds indicating how many milliseconds need to be transcribed from the corresponding \code{offset} onwards. Defaults to 0 - indicating to transcribe the full audio file.
 #' @param trim logical indicating to trim leading/trailing white space from the transcription using \code{\link{trimws}}. Defaults to \code{FALSE}.
 #' @param trace logical indicating to print the trace of the evolution of the transcription. Defaults to \code{TRUE}
 #' @param ... further arguments, directly passed on to the C++ function, for expert usage only and subject to naming changes. See the details.
 #' @details
 #' \itemize{
-#' \item{offset: milliseconds indicating to start transcribing from that timepoint onwards. Defaults to 0.}
-#' \item{duration: how many milliseconds need to be transcribed. Defaults to the whole audio file.}
 #' \item{token_timestamps: logical indicating to get the timepoints of each token}
 #' \item{n_threads: how many threads to use to make the prediction. Defaults to 1}
 #' \item{prompt: the initial prompt to pass on the model. Defaults to ''}
@@ -25,10 +25,12 @@
 #' \item{max_context: maximum number of text context tokens to store. Defaults to -1}
 #' \item{diarize: logical indicating to perform speaker diarization for audio with more than 1 channel}
 #' }
+#' If sections are provided
+#' If multiple offsets/durations are provided
 #' @return an object of class \code{whisper_transcription} which is a list with the following elements:
 #' \itemize{
 #' \item{n_segments: the number of audio segments}
-#' \item{data: a data.frame with the transcription with columns segment, text, from, to and optionally speaker if diarize=TRUE}
+#' \item{data: a data.frame with the transcription with columns segment, segment_offset, text, from, to and optionally speaker if diarize=TRUE}
 #' \item{tokens: a data.frame with the transcription tokens with columns segment, token_id, token, token_prob indicating the token probability given the context}
 #' \item{params: a list with parameters used for inference}
 #' \item{timing: a list with elements start, end and duration indicating how long it took to do the transcription}
@@ -69,15 +71,22 @@
 #' trans <- predict(model, newdata = audio, language = "auto", diarize = TRUE)
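The new offset/duration vectors documented in the diff above could be used as in the following sketch. The model name, the file `audio.wav`, and the window values are assumptions for illustration; the argument names (`offset`, `duration`) and the `segment_offset` column come from the documentation changes in this diff.

```r
## Sketch only: model download and "audio.wav" are assumptions, not part of the diff.
library(audio.whisper)
model <- whisper("tiny")

## Transcribe two windows in Spanish: 0-5s and 60-65s (values in milliseconds)
trans <- predict(model, newdata = "audio.wav", language = "es",
                 offset = c(0, 60000), duration = c(5000, 5000))

## segment_offset indicates which provided offset each segment belongs to,
## so from/to timestamps can be mapped back to the original audio timeline
head(trans$data[, c("segment", "segment_offset", "from", "to", "text")])
```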