- 🚀🚀🚀 Walkthrough for optimizations done, speeding up the pipeline 9 -> 47 FPS - notebook
- 🎥 Run inference on video - notebook
- [Optional] It's recommended to run with
uv
for faster installation. First, installuv
:
pip install uv
- Install
rt_pose
(you can ignoreuv
in case you want to install with purepip
)
uv pip install rt-pose # with minimal dependencies
uv pip install rt-pose[demo] # with additional dependencies to run `scripts/` and `notebooks/`
import torch
from rt_pose import PoseEstimationPipeline
# Load pose estimation pipeline
pipeline = PoseEstimationPipeline(
object_detection_checkpoint="PekingU/rtdetr_r50vd_coco_o365",
pose_estimation_checkpoint="usyd-community/vitpose-plus-small",
device="cuda",
dtype=torch.bfloat16,
compile=False, # or True to get more speedup
)
# Run pose estimation on image
output = pipeline(image)
# output.person_boxes_xyxy (`torch.Tensor`):
# of shape `(N, 4)` with `N` boxes of detected persons on the image in (x_min, y_min, x_max, y_max) format
# output.keypoints_xy (`torch.Tensor`):
# of shape `(N, 17, 2)` with 17 keypoints per each person
# output.scores (`torch.Tensor`):
# of shape (N, 17) with corresponding scores (aka confidence) for each keypoint
# Visualize with supervision/matplotlib/opencv
# see ./scripts/run_on_image.py
Other object detection checkpoints on the Hub:
Other pose estimation checkpoints on the Hub:
--input
can be URL or path
python scripts/run_on_image.py \
--input "https://res-3.cloudinary.com/dostuff-media/image/upload//w_1200,q_75,c_limit,f_auto/v1511369692/page-image-10656-892d1842-b089-4a7a-80f1-5be99b2b3454.png" \
--output "results/image.png" \
--device "cuda:0"
--input
can be URL or path--dtype
it's recommended to run inbfloat16
precision to get the best precision/speed tradeoff--compile
you can compile models in the pipeline to get even more speed up (x2), but compilation can be quite long, so it makes sense to activate for long videos only.
python scripts/run_on_video.py \
--input "https://huggingface.co/datasets/qubvel-hf/assets/blob/main/rt_pose_break_dance_v1.mp4" \
--output "results/rt_pose_break_dance_v1_annotated.mp4" \
--device "cuda:0" \
--dtype bfloat16