DISCLAIMER: This project is currently a work in progress.
TVL is a video loading library with a common interface for decoding on the GPU and CPU. Video frames are returned as PyTorch tensors, ready for use with a computer vision model.
import torch
import tvl
# Create a VideoLoader for the video file 'my_video.mkv' that will decode frames as float tensors
# using the first CUDA-enabled GPU device ('cuda:0').
vl = tvl.VideoLoader('my_video.mkv', 'cuda:0', dtype=torch.float32)
# Request three frames by index. Note that the return value is an iterator, and the frames may
# be lazy-loaded.
frames_iter = vl.select_frames([24, 26, 25])
# Force all of the frames to be decoded by creating a list from the iterator. The result will
# be a list of torch.cuda.FloatTensor objects, with shape [3 x H x W].
frames = list(frames_iter)
Depending on the backend(s) you choose, you may need to install the following dependencies:
- ffmpeg >= 4.1
- GCC >= 7
- SWIG 3
- CUDA >= 10
make dist
The wheels for tvl and all backends will be placed in dist/.
docker build -t tvl . && docker run --rm -it tvl
A simple Tkinter-based video player is provided in the examples/ directory. Try it out by running
the following command:
python examples/video_player.py
CUDA and multiprocessing don't mix very well. When multiprocessing is in "fork" mode, CUDA straight-up fails to initialise. Specifying spawn/forkserver for multiprocessing works, but is ridiculously slow.
My recommendation is to run the video loader in a single background thread (note: using more
than one thread can cause a deadlock, depending on the backend). This enables background loading of
video frames in parallel with your program doing some other work (e.g. training a model).
See examples/async_dataloading.py.
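As a rough illustration of the single-background-thread pattern, here is a minimal sketch built only on the standard library. It is not the code in examples/async_dataloading.py; the helper name `iter_in_background` and its `make_iter` and `max_prefetch` parameters are hypothetical and chosen for this example.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def iter_in_background(make_iter, max_prefetch=8):
    """Run make_iter() in ONE background thread and yield its items.

    make_iter is a hypothetical zero-argument callable. With TVL it could
    create the VideoLoader and return vl.select_frames(...), so that all
    decoding happens on the background thread.
    """
    q = queue.Queue(maxsize=max_prefetch)  # bounds how far ahead we decode

    def worker():
        try:
            for item in make_iter():
                q.put(item)
            q.put(_DONE)
        except Exception as exc:
            # Hand the error over to the consuming thread.
            q.put(exc)

    # Daemon thread: it will not keep the process alive if the consumer
    # stops iterating early.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        item = q.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

The consumer can then loop over `iter_in_background(...)` while doing its own work; the bounded queue keeps the loader from running arbitrarily far ahead.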
Backend class | Supported devices |
---|---|
FffrBackend (recommended) | cpu,cuda |
NvdecBackend | cuda |
PyAvBackend | cpu |
OpenCvBackend | cpu |
If you wanted to install tvl with support for the FFFR backend you would install the
package like so:
$ pip install "tvl[FffrBackend]"
When you call select_frames to read a bunch of frames, TVL must decide when to read frames
sequentially and when to seek. Sometimes reading frames sequentially and discarding unneeded frames
is faster than seeking, and sometimes it isn't. The minimum distance between frames for which
seeking is faster than reading sequentially depends on a number of file-specific factors, such as
GoP size. Unfortunately, these factors can't be inferred automatically on the fly.
In TVL you can manually configure the threshold value for triggering seeks using the
seek_threshold backend option.
import tvl
# Provide a hint that if frames are more than 3 frames apart, a seek should be triggered.
vl = tvl.VideoLoader('my_video.mkv', 'cpu', backend_opts={'seek_threshold': 3})
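To make the meaning of the threshold concrete, the following sketch shows one way a threshold rule could split a request into seeks and sequential reads. It is an illustration under stated assumptions, not TVL's actual implementation: `plan_reads` is a hypothetical function, and it assumes frames are visited in sorted order and that the first frame always requires a seek.

```python
def plan_reads(frame_indices, seek_threshold):
    """Return a list of (action, frame_index) pairs.

    action is "seek" when the gap to the previously read frame is larger
    than seek_threshold (or for the first frame), else "sequential".
    """
    steps = []
    prev = None
    for idx in sorted(set(frame_indices)):
        if prev is None or idx - prev > seek_threshold:
            steps.append(("seek", idx))
        else:
            steps.append(("sequential", idx))
        prev = idx
    return steps


# Frames 24..26 are close together, frame 100 is far away.
print(plan_reads([24, 26, 25, 100], seek_threshold=3))
# [('seek', 24), ('sequential', 25), ('sequential', 26), ('seek', 100)]
```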
If you expect to be reading a lot of videos that are encoded in a similar way, we recommend
benchmarking a range of seek_threshold values to find which is fastest.
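One possible way to run such a benchmark is sketched below. The helper `benchmark_thresholds` and its parameters are hypothetical names for this example; it times any callable that takes a threshold, so the TVL-specific part (shown in comments, using only the calls from the example above) stays separate.

```python
import time


def benchmark_thresholds(read_with_threshold, thresholds, repeats=3):
    """Time read_with_threshold(t) for each t; return {t: best_seconds}."""
    results = {}
    for t in thresholds:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            read_with_threshold(t)
            best = min(best, time.perf_counter() - start)
        results[t] = best  # keep the fastest of the repeated runs
    return results


# Example workload (assumes tvl is installed and 'my_video.mkv' exists):
#
# import tvl
#
# def read_with_threshold(t):
#     vl = tvl.VideoLoader('my_video.mkv', 'cpu', backend_opts={'seek_threshold': t})
#     list(vl.select_frames(list(range(0, 300, 10))))
#
# results = benchmark_thresholds(read_with_threshold, [1, 3, 5, 10, 20])
# fastest = min(results, key=results.get)
```

Use frame indices that resemble your real access pattern, since the best threshold depends on how far apart the requested frames are.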
TVL also includes a helper class for managing VideoLoader objects on multiple devices at once. Say, for example, that you want to load at most 2 videos at a time on the first GPU device, and 3 on the CPU.
import tvl
import torch
pool = tvl.VideoLoaderPool({'cuda:0': 2, 'cpu': 3})
def do_video_loading(video_filename):
    # Optional: you can define device-specific backend options.
    backend_opts_by_device = {
        'cuda:0': {'seek_threshold': 5},
    }
    with pool.loader(video_filename, torch.float32, backend_opts_by_device) as vl:
        # ...use the VideoLoader instance (vl)...
        pass
# ...call do_video_loading from multiple threads...
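For the "multiple threads" part, a standard-library thread pool is one straightforward option. The sketch below is an assumption about how you might drive it, not part of TVL: `run_on_threads` is a hypothetical helper, and the default of 5 workers simply matches the 2 + 3 slots configured in the pool above.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_threads(fn, video_filenames, max_workers=5):
    """Call fn(filename) for each file on a thread pool.

    Results are returned in the same order as video_filenames. An exception
    raised by fn is re-raised here when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fn, video_filenames))


# With the function defined above, usage might look like:
# run_on_threads(do_video_loading, ['a.mkv', 'b.mkv', 'c.mkv'])
```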
The following options are supported by all backends, and can be specified by giving a
backend_opts argument to tvl.VideoLoader.
- out_width: Specify a width for read frames to be automatically resized to.
- out_height: Specify a height for read frames to be automatically resized to.
- seek_threshold: Specify the threshold value for seeking instead of reading frames sequentially.
import torch
import tvl
vl = tvl.VideoLoader('my_video.mkv', 'cuda:0', dtype=torch.float32,
                     backend_opts={'out_width': 200, 'out_height': 100})
# This tensor will have dimensions [3, 100, 200].
frame = vl.read_frame()
- GPU support is only available for NVIDIA cards
- Decoding only, no encoding/transcoding