PosePerfect is a Python project that allows you to edit the pose of an object within a scene using cutting-edge generative AI techniques. This project focuses on two key tasks:
Files: PosePerfectT1.ipynb (a notebook, for a quick view of the code and its output) and PosePerfectTask1/run.py (the script to run).
What it does: Isolates the target object from the background image.
How it works: Takes an input image and a text prompt specifying the object class (e.g., chair, table).
Output: A segmented image with a red mask highlighting the object's boundaries.
This code utilizes YOLOv5 for object detection and the Segment Anything Model (SAM) for segmentation to detect, segment, and visually highlight specified objects in images.
- OpenCV (cv2): for image processing (reading and writing images).
- NumPy (np): for numerical operations on image data.
- Torch: to load and run the YOLOv5 and SAM models.
- Matplotlib (plt): for displaying images.
The model path (CHECKPOINT_PATH) and type (MODEL_TYPE) are set, and the device is configured to use a GPU if available.
SAM is loaded using its checkpoint and set to the specified device. An instance of SamAutomaticMaskGenerator is created for segmentation. The YOLOv5 model is loaded from Ultralytics’ repository as a pre-trained model.
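The setup described above can be sketched roughly as follows. The checkpoint filename, model type, and YOLOv5 variant shown here are assumptions for illustration, not values verified against run.py:

```python
# A minimal sketch of the model setup; CHECKPOINT_PATH, MODEL_TYPE, and the
# "yolov5s" variant are assumed values, not taken from the actual script.
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

def load_models(checkpoint_path="sam_vit_h_4b8939.pth", model_type="vit_h"):
    # Imported lazily so the sketch can be read without segment_anything installed.
    from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
    sam = sam_model_registry[model_type](checkpoint=checkpoint_path).to(DEVICE)
    mask_generator = SamAutomaticMaskGenerator(sam)
    yolo = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
    return mask_generator, yolo
```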
The display_image function reads an image, converts it from BGR to RGB, and displays it using Matplotlib.
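The BGR-to-RGB conversion at the heart of display_image amounts to reversing the channel axis. This sketch does it with plain NumPy rather than cv2.cvtColor:

```python
import numpy as np

def bgr_to_rgb(img: np.ndarray) -> np.ndarray:
    # OpenCV reads images in BGR order, while Matplotlib expects RGB;
    # reversing the last (channel) axis performs the conversion.
    return img[..., ::-1]

bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)  # one pure-blue pixel in BGR
rgb = bgr_to_rgb(bgr)                            # becomes (0, 0, 255) in RGB
```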
The function detects objects in the image and checks for the target object. If detected, it extracts the region of interest (ROI), segments it using SAM, and applies a red highlight before saving and displaying the final image.
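The red-highlight step can be sketched as an alpha blend applied only to the masked pixels. The function name and the 50% blend factor are illustrative choices, not the script's exact values:

```python
import numpy as np

def highlight(image: np.ndarray, mask: np.ndarray,
              color=(255, 0, 0), alpha=0.5) -> np.ndarray:
    # Blend the highlight color into the masked pixels only;
    # pixels outside the mask are left untouched.
    out = image.astype(np.float32)
    out[mask] = (1 - alpha) * out[mask] + alpha * np.asarray(color, np.float32)
    return out.astype(np.uint8)
```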
- Open a terminal (for example, in VSCode) and clone the repository:
git clone https://github.com/saarinii/PosePerfect.git
- Navigate to the project directory:
cd PosePerfectTask1
- Install the required packages:
pip install -r requirements.txt
- You can now run the script as follows:
python run.py --image ./example.jpg --class "chair" --output ./generated.png
Replace ./example.jpg with the path to your input image, "chair" with the object class you want to segment, and ./generated.png with the desired output path.
Files: PosePerfectT2.ipynb (a notebook, for a quick view of the code and its output) and PosePerfectTask2/run.py (the script to run).
What it does: Detects a target object in an image, segments it, and then uses inpainting to modify the image based on the mask of the detected object.
How it works: The script accepts an input image and a text prompt specifying the object class (e.g., chair, table). It uses YOLOv5 to detect the specified object, SAM to generate a mask around the object, and Stable Diffusion inpainting to modify the image based on the mask.
Output: An inpainted image where the specified object is detected, segmented, and modified (e.g., filled in or removed) using Stable Diffusion. The output is saved as a new image file.
- OpenCV (cv2): for image processing (reading, writing, and displaying).
- NumPy (np): for numerical operations on image data.
- Torch: for loading and running the YOLOv5, SAM, and Stable Diffusion models.
- PIL (Pillow): for handling images in the inpainting process.
- Diffusers: for the Stable Diffusion inpainting pipeline.
The script checks if CUDA is available to run models on a GPU. It specifies a pre-trained YOLOv5 model from Ultralytics for object detection. The Segment Anything Model (SAM) is loaded using a checkpoint, with its type (e.g., vit_h) specified.
- YOLOv5: loaded from Ultralytics, this pre-trained model detects objects in the input image and extracts the region of interest (ROI) for further segmentation.
- SAM (Segment Anything Model): generates a mask around the detected object, allowing for precise segmentation.
- Stable Diffusion Inpainting: uses the diffusers library to apply inpainting on the segmented area, modifying the image based on the generated mask.

Detection and segmentation function:
Detecting Objects: The script reads an image and detects objects using YOLOv5, identifying the specified object class by its label (e.g., chair). The detected object's coordinates are used to extract the ROI.
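The detection step boils down to finding the first detection whose label matches the target class and cropping its bounding box. The tuple format below is a simplified stand-in for the rows YOLOv5 actually returns, used here only for illustration:

```python
import numpy as np

def crop_target(image: np.ndarray, detections, target: str):
    # detections: iterable of (label, x1, y1, x2, y2) tuples -- a simplified
    # stand-in for YOLOv5's results table, assumed for this sketch.
    for label, x1, y1, x2, y2 in detections:
        if label == target:
            x1, y1, x2, y2 = map(int, (x1, y1, x2, y2))
            return image[y1:y2, x1:x2], (x1, y1, x2, y2)
    return None, None  # target class not found in the image
```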
Segmentation: SAM is employed to generate a mask for the object within the ROI, and the best mask (highest confidence) is selected for further processing.
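Selecting the best mask can be as simple as taking the candidate with the highest confidence score. SamAutomaticMaskGenerator returns a list of dicts, each carrying a predicted_iou score, which this sketch uses as the confidence measure:

```python
def best_mask(masks):
    # SAM's mask generator returns dicts; "predicted_iou" is SAM's own
    # confidence estimate for each candidate mask.
    return max(masks, key=lambda m: m["predicted_iou"])
```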
Inpainting: The mask is applied to the image, and Stable Diffusion is used to perform inpainting on the segmented region. This modifies the image based on the prompt and mask (e.g., removing or replacing the object).
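The inpainting pipeline expects PIL images for both the input image and the mask, with white mask pixels marking the region to repaint. A sketch of that conversion, with the 512x512 target size assumed rather than taken from the script:

```python
import numpy as np
from PIL import Image

def to_inpaint_inputs(image_rgb: np.ndarray, mask: np.ndarray, size=(512, 512)):
    # White pixels (255) in the mask mark the region Stable Diffusion repaints.
    img = Image.fromarray(image_rgb).resize(size)
    msk = Image.fromarray(mask.astype(np.uint8) * 255).resize(size)
    return img, msk

# The pipeline call itself (requires a GPU and a model download) then looks
# roughly like this:
# pipe = StableDiffusionInpaintPipeline.from_pretrained(
#     "runwayml/stable-diffusion-inpainting").to("cuda")
# result = pipe(prompt="a red chair", image=img, mask_image=msk).images[0]
```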
The script saves the segmented mask and the final inpainted image as separate output files, ensuring you get both the segmented and modified versions of the input image.
- Open a terminal (for example, in VSCode) and clone the repository:
git clone https://github.com/saarinii/PosePerfect.git
- Navigate to the project directory:
cd PosePerfectTask2
- Install the required packages:
pip install -r requirements.txt
- You can now run the script as follows:
python run.py --image ./example.jpg --class "chair" --azimuth +72 --polar +0 --output ./generated.png
Replace ./example.jpg with the path to your input image, "chair" with the object class you want to segment, ./generated.png with the desired output path, and the --azimuth and --polar values with the angles you want to rotate the object by.
To see more of my learning journey, including the experimenting, trial and error, and mistakes I made while building this project, head to