Original implementation: https://github.com/JunukCha/Text2HOI
Create an environment and activate it.
conda create -n $YOUR_ENV_NAME python=3.9
conda activate $YOUR_ENV_NAME
Dependencies:
conda install pytorch torchvision torchaudio cudatoolkit=11.8 -c pytorch -c nvidia
pip install numpy==1.23
pip install opencv-python
pip install scikit-image
pip install tensorboard
pip install matplotlib
pip install tqdm
pip install pymeshlab
pip install open3d
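Optionally, a quick sanity check (a minimal sketch, assuming the CUDA 11.8 build installed above) that PyTorch can see your GPU:

```python
import torch

# Should print the installed version, and True if the CUDA build works.
print(torch.__version__)
print(torch.cuda.is_available())
```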
Install additional dependencies and the official CLIP repo as a Python package.
pip install ftfy regex
pip install git+https://github.com/openai/CLIP.git
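A minimal sketch to verify the CLIP install by encoding a text prompt. The "ViT-B/32" variant and the example prompt are assumptions for this check only; the model variant actually used by the project may differ.

```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)  # assumed variant

tokens = clip.tokenize(["grasp the cup with the right hand"]).to(device)
with torch.no_grad():
    text_features = model.encode_text(tokens)
print(text_features.shape)  # (1, 512) for ViT-B/32
```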
We use the MANO model and the MANO implementation in the smplx package. Note that you should follow the MANO license.
- Download models (Models & Code) from the MANO website.
- Unzip and copy the MANO models folder .../mano_v*/models into .../Text2HOI/models/components/mano
- Install additional dependencies.
pip install smplx
pip install trimesh==4.5.2
pip install chumpy==0.70
pip install pyglet==1.5.22
Your folder structure should look like this:
THOI
|-- models
|   |-- components
|   |   |-- mano
|   |   |   |-- models
|   |   |   |   |-- MANO_LEFT.pkl
|   |   |   |   |-- MANO_RIGHT.pkl
|   |   |   |   |-- ...
|   |   |   |-- utils.py
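Once the files are in place, a minimal sketch like the following can verify that smplx finds the MANO models. The path and arguments are assumptions for this check, not the project's actual loading code.

```python
import torch
from smplx import MANO

# Assumed path, matching the folder layout above.
mano = MANO(
    model_path="models/components/mano/models",
    is_rhand=True,   # loads MANO_RIGHT.pkl; set False for MANO_LEFT.pkl
    use_pca=False,   # full 45-dim axis-angle hand pose instead of PCA
)

batch_size = 1
output = mano(
    betas=torch.zeros(batch_size, 10),
    global_orient=torch.zeros(batch_size, 3),
    hand_pose=torch.zeros(batch_size, 45),
    transl=torch.zeros(batch_size, 3),
)
# 778 vertices per hand; the joint count depends on the body_models.py
# modification described later in this README.
print(output.vertices.shape, output.joints.shape)
```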
Install the GRAB repo and follow its instructions to download the GRAB dataset.
For the text prompt annotation, we use the visualization code to identify which hand is interacting with the object, and manually annotate these frames with sentences describing the motions. The annotation procedure follows the protocol below:
- Select no more than 150 consecutive frames from the sequence, ending a few frames after the interaction ends
- If available, select sequences of consecutive frames that use the right hand, the left hand, and both hands
After you download the GRAB dataset and unzip the files, copy the prompt annotation folder in this repo to ../GRAB/GRAB-dataset/tools.
(Optional) If you want to create your own annotations, you can copy our modified visualization code to where you installed the GRAB repo and use it.
Install the following additional dependency as a Python package.
pip install git+https://github.com/steveli/pytorch-sqrtm.git
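A quick sanity check for this install (a minimal sketch, assuming the package exposes a top-level sqrtm function as in its README):

```python
import torch
from sqrtm import sqrtm

# Build a symmetric positive-definite matrix and check sqrtm(A) @ sqrtm(A) ≈ A.
a = torch.randn(5, 5, dtype=torch.float64, requires_grad=True)
spd = a @ a.t() + 1e-3 * torch.eye(5, dtype=torch.float64)
root = sqrtm(spd)
print(torch.allclose(root @ root, spd, atol=1e-5))
```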
A few modifications need to be made before running the code. Please follow the instructions below:
- After you install the smplx package, go to .../Lib/site-packages/smplx/body_models.py -> MANO -> forward() and uncomment the following statement; the number of returned joints will then be 16 (without the 5 fingertips) instead of 21:
  # Add pre-selected extra joints that might be needed
  joints = self.vertex_joint_selector(vertices, joints)  # Uncomment
- After you install the pytorch package, go to .../Lib/site-packages/torch/nn/functional.py -> _verify_batch_size() and comment out the check below to allow batches that contain only a single element (a minimal reproduction of this error is sketched after this list):
  size_prods = size[0]
  for i in range(len(size) - 2):
      size_prods *= size[i + 2]
  # Comment out the following code:
  # if size_prods == 1:
  #     raise ValueError(
  #         f"Expected more than 1 value per channel when training, got input size {size}"
  #     )
- After you acquire the GRAB dataset and finish the extraction using the official script, run .../datasets/preprocess_grab_object_mesh.py to preprocess the GRAB dataset (to reduce the number of vertices).
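For context on the _verify_batch_size change above, the sketch below reproduces the error that unpatched PyTorch raises when a BatchNorm layer in training mode receives a single-element batch; after the check is commented out, such batches pass through:

```python
import torch

bn = torch.nn.BatchNorm1d(8)
bn.train()              # training mode triggers _verify_batch_size
x = torch.randn(1, 8)   # batch containing a single element

try:
    bn(x)
    print("single-element batch accepted (patched torch)")
except ValueError as err:
    print("unpatched torch raised:", err)
```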
There are three ways to run the project:
- Training without resuming from a checkpoint, loading your configuration from a .yml file:
  python run.py --config_dir $YOUR_CONFIG_DIR
- Training, resuming from a checkpoint. In this mode, loading the configuration from --config_dir is disabled:
  python run.py --resume [--checkpoint_dir $YOUR_CHECKPOINT_DIR]
- Inference. In this mode, both --config_dir and --resume are disabled:
  python run.py --inference [--checkpoint_dir $YOUR_CHECKPOINT_DIR] [--result_path $YOUR_RESULT_DIR]