Support HiFA (#381)

* fix zvar loss

* add hifa options for dreamfusion and vsd

* add hifa config

Co-authored-by: JunzheJosephZhu <junzhe.joseph.zhu@gmail.com>
Co-authored-by: Junzhe Zhu <josef@ampere1.stanford.edu>
3 people authored Jan 3, 2024
1 parent 253a072 commit 7afa727
Showing 39 changed files with 1,138 additions and 54 deletions.
3 changes: 0 additions & 3 deletions DOCUMENTATION.md
@@ -414,7 +414,6 @@ For the first three options, you can check more details in [pipe_stable_diffusio
| var_red | bool | Whether to use Eq. 16 in [SJC paper](https://arxiv.org/pdf/2212.00774.pdf). Default: True |
| token_merging | bool | Whether to use token merging. This will speed up the unet forward and slightly affect the performance. Default: False |
| token_merging_params | Optional[dict] | The config for token merging. See [here](https://github.com/dbolya/tomesd/blob/main/tomesd/patch.py#L183-L213) for supported arguments. Default: {} |
| max_step_percent_annealed | float | The percent range (max value) of the random timesteps to add noise and denoise after t annealing. Default: 0.5 |
| anneal_start_step | Optional[int] | If specified, denotes at which step to perform t annealing. Default: None |

### deep-floyd-guidance
@@ -428,8 +427,6 @@ No specific configuration.
| pretrained_model_name_or_path_lora | str | The pretrained base model path for the LoRA model. Default: "stabilityai/stable-diffusion-2-1" |
| guidance_scale_lora | float | The classifier-free guidance scale for the LoRA model. Default: 1. |
| lora_cfg_training | bool | Whether to adopt the classifier-free guidance training strategy in LoRA training. If True, will zero out the camera condition with a probability of 0.1. Default: True |
| max_step_percent_annealed | float | The percent range (max value) of the random timesteps to add noise and denoise after t annealing. Default: 0.5 |
| anneal_start_step | Optional[int] | If specified, denotes at which step to perform t annealing. Default: 5000 |
| camera_condition_type | str | Which to use as the camera condition for the LoRA model, in ["extrinsics", "mvp"]. Default: "extrinsics" |

## Prompt Processors
44 changes: 44 additions & 0 deletions README.md
@@ -14,12 +14,17 @@ threestudio is a unified framework for 3D content creation from text prompts, si
<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/01a00207-3240-4a8e-aa6f-d48436370fe7.png" width="100%">
<br/>
<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/1dbdebab-43d5-4830-872c-66b38d9fda92" width="60%">
<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/d746b874-d82f-4977-a549-98d9ba764dfc" width="30%">

<br/>
<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/437b4044-142c-4e5d-a406-4d9bad0205e1" width="60%">
<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/812741c0-7229-412e-b6ab-81e377890f04" width="30%">

<br/>
<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/4f4d62c5-2304-4e20-b632-afe6d144a203" width="68%">
<br/>
<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/2f36ddbd-e3cf-4431-b269-47a9cb3d6e6e" width="68%">
<br/>
</p>

<p align="center"><b>
@@ -267,6 +272,45 @@ python launch.py --config configs/prolificdreamer-geometry.yaml --train --gpu 0
# texturing with 512x512 rasterization, Stable Diffusion VSD guidance
python launch.py --config configs/prolificdreamer-texture.yaml --train --gpu 0 system.prompt_processor.prompt="a pineapple" system.geometry_convert_from=path/to/stage2/trial/dir/ckpts/last.ckpt
```
### HiFA [![arXiv](https://img.shields.io/badge/arXiv-2305.18766-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2305.18766)
**This is a re-implementation that omits some improvements from the original paper (coarse-to-fine NeRF sampling, kernel smoothing). For the original results, please refer to [https://github.com/JunzheJosephZhu/HiFA](https://github.com/JunzheJosephZhu/HiFA).**

HiFA is essentially a suite of improvements, including image-space SDS, a z-variance loss, and noise-strength annealing, and it is compatible with most optimization-based methods. We therefore provide three variants based on DreamFusion, ProlificDreamer, and Magic123. For the DreamFusion and ProlificDreamer variants we provide both a unified-guidance config and an SDS/VSD guidance config; both should achieve the same results. We also make HiFA compatible with ProlificDreamer-scene.
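
Below is a minimal PyTorch sketch of these three ingredients. It is illustrative only: the function names, tensor shapes, and default values are our own and do not correspond to threestudio's internal API.

```python
import torch
import torch.nn.functional as F


def annealed_timestep(step, max_steps, min_step_percent=0.3,
                      max_step_percent=0.98, num_train_timesteps=1000):
    # Square-root noise-strength annealing: the sampled noise level decays
    # from max_step_percent toward min_step_percent with sqrt(progress),
    # so later iterations refine details under weaker noise.
    progress = step / max_steps
    t_frac = max_step_percent - (max_step_percent - min_step_percent) * progress ** 0.5
    return int(t_frac * num_train_timesteps)


def z_variance_loss(weights, z_vals):
    # weights: (num_rays, num_samples) volume-rendering weights along each ray
    # z_vals:  (num_rays, num_samples) sample depths
    w = weights / (weights.sum(-1, keepdim=True) + 1e-8)
    z_mean = (w * z_vals).sum(-1, keepdim=True)
    # Penalizing the depth spread along each ray discourages "cloudy" geometry.
    return (w * (z_vals - z_mean) ** 2).sum(-1).mean()


def image_space_sds(rendered_rgb, denoised_rgb):
    # Image-space counterpart of SDS: pull the rendering toward the denoised
    # image decoded from the diffusion model's prediction (used as a fixed target).
    return F.mse_loss(rendered_rgb, denoised_rgb.detach())
```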

**Results obtained by threestudio (DreamFusion-HiFA, 512x512)**

https://github.com/threestudio-project/threestudio/assets/24391451/c0030c66-0691-4ec2-8b79-d933101864a0

**Results obtained by threestudio (ProlificDreamer-HiFA, 512x512)**

https://github.com/threestudio-project/threestudio/assets/24391451/ff5dc4d0-d7d7-4a73-964e-84b8c48e2907

**Results obtained by threestudio (Magic123-HiFA, 512x512)**

https://github.com/threestudio-project/threestudio/assets/24391451/eb6f2f74-9143-4e26-8429-e300ad2d2b80

**Example running commands**

```sh
# ------ DreamFusion-HiFA ------- # (similar to original paper)
python launch.py --config configs/hifa.yaml --train --gpu 0 system.prompt_processor.prompt="a plate of delicious tacos"
python launch.py --config configs/experimental/unified-guidance/hifa.yaml --train --gpu 0 system.prompt_processor.prompt="a plate of delicious tacos"
# ------ ProlificDreamer-HiFA ------- #
python launch.py --config configs/prolificdreamer-hifa.yaml --train --gpu 0 system.prompt_processor.prompt="a plate of delicious tacos"
python launch.py --config configs/experimental/unified-guidance/prolificdreamer-hifa.yaml --train --gpu 0 system.prompt_processor.prompt="a plate of delicious tacos"
# ------ ProlificDreamer-scene-HiFA ------- #
python launch.py --config configs/prolificdreamer-scene-hifa.yaml --train --gpu 0 system.prompt_processor.prompt="A DSLR photo of a hamburger inside a restaurant"
# ------ Magic123-HiFA ------ #
python launch.py --config configs/magic123-hifa-coarse-sd.yaml --train --gpu 0 data.image_path=load/images/firekeeper_rgba.png system.prompt_processor.prompt="a toy figure of firekeeper from dark souls"
```

**Tips**

- If the generated object's color looks oversaturated, decrease `lambda_sds_img` (or `lambda_sd_img` if using unified guidance); see the weighting sketch after these tips.
- If the generated object looks cloudy, increase `lambda_z_variance`; if the shape becomes corrupted, decrease it.
- If the generated object appears too bright overall, increase `min_step_percent`.
- Make sure `sqrt_anneal` and `use_img_loss` are both set to True.
- Check out the [original repo](https://github.com/JunzheJosephZhu/HiFA); the results there are better.
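
As a rough guide to how these weights combine (an illustrative sketch; the helper name is ours, and the default weights mirror the unified hifa.yaml shown later in this commit, where they are called lambda_sd and lambda_sd_img):

```python
def hifa_total_loss(latent_sds, image_sds, z_variance,
                    lambda_sds=1.0, lambda_sds_img=0.01, lambda_z_variance=100.0):
    # Weighted sum corresponding to the tips above: lower lambda_sds_img if
    # colors oversaturate; raise lambda_z_variance if results look cloudy,
    # lower it if the shape becomes corrupted.
    return (lambda_sds * latent_sds
            + lambda_sds_img * image_sds
            + lambda_z_variance * z_variance)
```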

### DreamFusion [![arXiv](https://img.shields.io/badge/arXiv-2209.14988-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2209.14988)

2 changes: 0 additions & 2 deletions configs/debugging/stablediffusion.yaml
@@ -13,5 +13,3 @@ system:
    guidance_scale: 7.5
    min_step_percent: 0.02
    max_step_percent: 0.98
    max_step_percent_annealed: 0.5
    anneal_start_step: 5000
1 change: 1 addition & 0 deletions configs/dreamfusion-sd.yaml
@@ -86,6 +86,7 @@ system:
    lambda_orient: [0, 10., 1000., 5000]
    lambda_sparsity: 1.
    lambda_opaque: 0.

  optimizer:
    name: Adam
    args:
1 change: 1 addition & 0 deletions configs/experimental/co3d-imagecondition.yaml
@@ -102,6 +102,7 @@ system:
    # lambda_orient: [1000, 0.0, 10, 6000]
    lambda_sparsity: 0.0
    lambda_opaque: 0.01

  optimizer:
    name: Adan
    args:
1 change: 1 addition & 0 deletions configs/experimental/imagecondition.yaml
@@ -103,6 +103,7 @@ system:
    lambda_sparsity: 0.0
    lambda_opaque: 0.0


  optimizer:
    name: Adan
    args:
1 change: 1 addition & 0 deletions configs/experimental/imagecondition_zero123nerf.yaml
@@ -144,6 +144,7 @@ system:
    lambda_sparsity: 0.1
    lambda_opaque: 0.1


  optimizer:
    name: Adam
    args:
@@ -138,6 +138,7 @@ system:
    lambda_sparsity: 0.
    lambda_opaque: 0.


  optimizer:
    name: Adam
    args:
1 change: 0 additions & 1 deletion configs/experimental/prolificdreamer-importance.yaml
@@ -84,7 +84,6 @@ system:
    lambda_orient: 0.
    lambda_sparsity: 10.
    lambda_opaque: [10000, 0.0, 1000.0, 10001]
    lambda_z_variance: 0.
  optimizer:
    name: AdamW
    args:
1 change: 0 additions & 1 deletion configs/experimental/prolificdreamer-neus-importance.yaml
@@ -93,7 +93,6 @@ system:
    lambda_orient: 0.
    lambda_sparsity: 0.
    lambda_opaque: 0
    lambda_z_variance: 0.
    lambda_eikonal: 100.
  optimizer:
    name: AdamW
2 changes: 2 additions & 0 deletions configs/experimental/unified-guidance/dreamfusion-sd.yaml
@@ -75,6 +75,7 @@ system:
    weighting_strategy: dreamfusion
    min_step_percent: 0.02
    max_step_percent: 0.98
    use_img_loss: false

  loggers:
    wandb:
@@ -87,6 +88,7 @@ system:
    lambda_orient: [0, 10., 1000., 5000]
    lambda_sparsity: 1.
    lambda_opaque: 0.

  optimizer:
    name: Adam
    args:
112 changes: 112 additions & 0 deletions configs/experimental/unified-guidance/hifa.yaml
@@ -0,0 +1,112 @@
name: "hifa-unified"
tag: "${rmspace:${system.prompt_processor.prompt},_}"
exp_root_dir: "outputs"
seed: 0

data_type: "random-camera-datamodule"
data:
  batch_size: 1
  width: 512
  height: 512
  camera_distance_range: [1.0, 1.5]
  fovy_range: [40, 70]
  elevation_range: [-10, 45]
  camera_perturb: 0.
  center_perturb: 0.
  up_perturb: 0.
  eval_camera_distance: 1.5
  eval_fovy_deg: 70.

system_type: "dreamfusion-system"
system:
  geometry_type: "implicit-volume"
  geometry:
    radius: 1.0
    normal_type: null

    density_bias: "blob_magic3d"
    density_activation: softplus
    density_blob_scale: 10.
    density_blob_std: 0.5

    pos_encoding_config:
      otype: HashGrid
      n_levels: 16
      n_features_per_level: 2
      log2_hashmap_size: 19
      base_resolution: 16
      per_level_scale: 1.447269237440378 # max resolution 4096

  material_type: "no-material"
  material:
    n_output_dims: 3
    color_activation: sigmoid

  background_type: "neural-environment-map-background"
  background:
    color_activation: sigmoid
    random_aug: true

  renderer_type: "nerf-volume-renderer"
  renderer:
    radius: ${system.geometry.radius}
    num_samples_per_ray: 512

  prompt_processor_type: "stable-diffusion-prompt-processor"
  prompt_processor:
    pretrained_model_name_or_path: "stabilityai/stable-diffusion-2-1-base"
    prompt: ???

  guidance_type: "stable-diffusion-unified-guidance"
  guidance:
    guidance_type: "sds"
    pretrained_model_name_or_path: "stabilityai/stable-diffusion-2-1-base"
    guidance_scale: 100.
    weighting_strategy: dreamfusion
    min_step_percent: 0.3
    max_step_percent: 0.98
    use_img_loss: true
    sqrt_anneal: true

  loggers:
    wandb:
      enable: false
      project: "threestudio"
      name: None

  loss:
    lambda_sd: 1.
    lambda_sd_img: 0.01
    lambda_orient: 0.
    lambda_sparsity: 1.
    lambda_opaque: 0.
    lambda_z_variance: 100.

  optimizer:
    name: Adam
    args:
      lr: 0.01
      betas: [0.9, 0.99]
      eps: 1.e-15
    params:
      geometry.encoding:
        lr: 0.01
      geometry.density_network:
        lr: 0.001
      geometry.feature_network:
        lr: 0.001
      background:
        lr: 0.001

trainer:
  max_steps: 25000
  log_every_n_steps: 1
  num_sanity_val_steps: 0
  val_check_interval: 200
  enable_progress_bar: true
  precision: 32

checkpoint:
  save_last: true # save at each validation time
  save_top_k: -1
  every_n_train_steps: ${trainer.max_steps}
120 changes: 120 additions & 0 deletions configs/experimental/unified-guidance/prolificdreamer-hifa.yaml
@@ -0,0 +1,120 @@
name: "prolificdreamer-hifa-unified"
tag: "${rmspace:${system.prompt_processor.prompt},_}"
exp_root_dir: "outputs"
seed: 0

data_type: "random-camera-datamodule"
data:
  batch_size: [1, 1]
  # 0-4999: 64x64, >=5000: 512x512
  # this drastically reduces VRAM usage as empty space is pruned in early training
  width: [64, 512]
  height: [64, 512]
  resolution_milestones: [5000]
  camera_distance_range: [1.0, 1.5]
  fovy_range: [40, 70]
  elevation_range: [-10, 45]
  camera_perturb: 0.
  center_perturb: 0.
  up_perturb: 0.
  eval_camera_distance: 1.5
  eval_fovy_deg: 70.

system_type: "prolificdreamer-system"
system:
  stage: coarse
  geometry_type: "implicit-volume"
  geometry:
    radius: 1.0
    normal_type: null

    density_bias: "blob_magic3d"
    density_activation: softplus
    density_blob_scale: 10.
    density_blob_std: 0.5

    pos_encoding_config:
      otype: HashGrid
      n_levels: 16
      n_features_per_level: 2
      log2_hashmap_size: 19
      base_resolution: 16
      per_level_scale: 1.447269237440378 # max resolution 4096

  material_type: "no-material"
  material:
    n_output_dims: 3
    color_activation: sigmoid

  background_type: "neural-environment-map-background"
  background:
    color_activation: sigmoid
    random_aug: true

  renderer_type: "nerf-volume-renderer"
  renderer:
    radius: ${system.geometry.radius}
    num_samples_per_ray: 512

  prompt_processor_type: "stable-diffusion-prompt-processor"
  prompt_processor:
    pretrained_model_name_or_path: "stabilityai/stable-diffusion-2-1-base"
    prompt: ???
    front_threshold: 30.
    back_threshold: 30.

  guidance_type: "stable-diffusion-unified-guidance"
  guidance:
    guidance_type: "vsd"
    pretrained_model_name_or_path: "stabilityai/stable-diffusion-2-1-base"
    guidance_scale: 7.5
    weighting_strategy: dreamfusion
    min_step_percent: 0.3
    max_step_percent: 0.98
    vsd_phi_model_name_or_path: "stabilityai/stable-diffusion-2-1"
    sqrt_anneal: true
    use_img_loss: true

  loggers:
    wandb:
      enable: false
      project: "threestudio"
      name: None

  loss:
    lambda_sd: 1.
    lambda_sd_img: 0.01
    lambda_train_phi: 1.
    lambda_orient: 0.
    lambda_sparsity: 10.
    lambda_opaque: [10000, 0.0, 1000.0, 10001]
    lambda_z_variance: 300.
  optimizer:
    name: AdamW
    args:
      betas: [0.9, 0.99]
      eps: 1.e-15
    params:
      geometry.encoding:
        lr: 0.01
      geometry.density_network:
        lr: 0.001
      geometry.feature_network:
        lr: 0.001
      background:
        lr: 0.001
      guidance:
        lr: 0.0001

trainer:
  max_steps: 25000
  log_every_n_steps: 1
  num_sanity_val_steps: 0
  val_check_interval: 200
  enable_progress_bar: true
  precision: 32

checkpoint:
  save_last: true
  save_top_k: -1
  every_n_train_steps: ${trainer.max_steps}