Paper title:

NViST: In the Wild New View Synthesis from a Single Image with Transformers

1. Environment setup

Create the conda environment:

conda create -n nvist python=3.9

Activate the environment:

conda activate nvist

Install torch, torchvision, and torchaudio (CUDA 12.1 builds):

pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121

Install the remaining dependencies:

pip install tqdm scikit-image opencv-python configargparse lpips imageio-ffmpeg tensorboard torch_efficient_distloss
pip install easydict timm plyfile matplotlib kornia accelerate
pip install tensorflow pandas
pip install git+https://github.com/google/nerfies.git@v2
pip install "git+https://github.com/google/nerfies.git#egg=pycolmap&subdirectory=third_party/pycolmap"
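Before moving on, it can help to confirm that the pinned packages actually landed in the environment. The following is a minimal sketch using only the standard library; the helper name check_pins and the PINNED table are illustrative, not part of the NViST repository.

```python
from importlib import metadata

# Versions pinned by the install command above.
PINNED = {"torch": "2.1.2", "torchvision": "0.16.2", "torchaudio": "2.1.2"}


def check_pins(pinned, get_version=metadata.version):
    """Return {package: (expected, found_or_None, ok)} for each pinned package."""
    report = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Wheels from the PyTorch index carry a local tag such as "+cu121".
        ok = found is not None and found.split("+")[0] == expected
        report[name] = (expected, found, ok)
    return report


if __name__ == "__main__":
    for name, (expected, found, ok) in check_pins(PINNED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, found {found}, {status}")
```

Run it inside the activated nvist environment; any MISMATCH line points at a package that was missing or resolved to a different version.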

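2. Data download and preprocessing

2.1. Getting the download link and password

Open the request form: https://docs.google.com/forms/d/e/1FAIpQLSfU9BkV1hY3r75n5rc37IvlzaK2VFYbdsvohqPGAjb2YWIbUg/viewform

Fill in all required fields.

You will receive a download link and a password.

Open the link and enter the password.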

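2.2. Downloading with Chrome

Open the developer tools (Ctrl+Shift+I on Windows and Linux, Command+Option+J on macOS).

Switch to the Network tab.

Select the files you want and click Download.

If you are downloading to your local machine, just wait for the download to finish. To download to a remote server instead, continue with the steps below.

Find the entry that looks like "download.aspx?...", right-click it, and choose Copy → Copy as cURL.

Append --output mvi_xxx.zip to the copied command, then paste it into a terminal on the remote server and run it.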

2.3. Data preprocessing

2.3.1. Unpack the archives, then downsample the images

python preprocess/downsample_images.py --data_dir [data directory]
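The text gives no command for the unpacking step itself. As a minimal sketch, assuming the downloaded archives are ordinary .zip files collected in one folder, they could be extracted with the standard library before running the downsampling script. The function name, the folder layout, and the optional password argument are all assumptions; whether the archives themselves are encrypted is not stated here.

```python
import zipfile
from pathlib import Path


def extract_archives(download_dir, out_dir, password=None):
    """Extract every *.zip in download_dir into out_dir; return the archive names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    done = []
    for archive in sorted(Path(download_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            if password is not None:
                # zipfile expects bytes and only handles legacy zip encryption.
                zf.setpassword(password)
            # testzip() returns the first corrupt member name, or None.
            bad = zf.testzip()
            if bad is not None:
                raise zipfile.BadZipFile(f"{archive.name}: corrupt member {bad}")
            zf.extractall(out)
        done.append(archive.name)
    return done
```

Calling extract_archives("downloads", "MVImgNet") would unpack every archive into MVImgNet, which can then be passed as --data_dir. The integrity check is useful because a cURL download that was interrupted usually leaves a truncated archive.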

2.3.2. Compute camera poses

Set data_dir in preprocess/read_colmap_results_mvimgnet.py to your data directory, then run:

python preprocess/read_colmap_results_mvimgnet.py

2.3.3. Generate the cache files

python preprocess/make_cache.py --data_dir [data directory]
python preprocess/make_cache.py --data_dir [data directory] --split test
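The preprocessing steps above have to run in order, so it can be convenient to wrap them in one driver. This is a sketch only: the script paths and flags are the ones listed above, while the function names are made up for illustration and the driver is assumed to be started from the repository root.

```python
import subprocess
import sys


def preprocessing_commands(data_dir):
    """Return the argv lists for the preprocessing steps, in the order given above."""
    py = sys.executable
    return [
        [py, "preprocess/downsample_images.py", "--data_dir", data_dir],
        # data_dir for this script is edited inside the file, not passed as a flag.
        [py, "preprocess/read_colmap_results_mvimgnet.py"],
        [py, "preprocess/make_cache.py", "--data_dir", data_dir],
        [py, "preprocess/make_cache.py", "--data_dir", data_dir, "--split", "test"],
    ]


def run_preprocessing(data_dir):
    """Run each step in turn, stopping at the first one that fails."""
    for argv in preprocessing_commands(data_dir):
        print("running:", " ".join(argv))
        subprocess.run(argv, check=True)
```

With check=True a failing step raises CalledProcessError, so the cache is never built from a half-finished pose extraction.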

2.4. Troubleshooting

pycolmap ships with a bug that makes the import fail:

Traceback (most recent call last):
  File "/workspace/xueht@xiaopeng.com/code/nvist_official/preprocess/read_colmap_results.py", line 3, in <module>
    import pycolmap
  File "/opt/conda/envs/nvist/lib/python3.9/site-packages/pycolmap/__init__.py", line 4, in <module>
    from .scene_manager import SceneManager
  File "/opt/conda/envs/nvist/lib/python3.9/site-packages/pycolmap/scene_manager.py", line 22, in <module>
    class SceneManager:
  File "/opt/conda/envs/nvist/lib/python3.9/site-packages/pycolmap/scene_manager.py", line 23, in SceneManager
    INVALID_POINT3D = np.uint64(-1)

This is a clear bug: the line casts -1 to an unsigned integer type, which newer NumPy releases reject. Change it to:

INVALID_POINT3D = np.int64(-1)
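One caveat about this patch: np.int64(-1) is not the same number that np.uint64(-1) used to produce. The old cast wrapped around to the largest 64-bit unsigned integer, and pycolmap presumably compares the constant against unsigned point IDs (an assumption, since the comparison code is not shown here). The sketch below shows the wrapped value using only the standard library and gives an alternative one-line patch as a comment.

```python
import ctypes

# Casting -1 to a 64-bit unsigned integer wraps around to the largest
# representable value; ctypes performs the same C-style conversion.
UINT64_MAX = ctypes.c_uint64(-1).value

print(UINT64_MAX)               # 18446744073709551615
print(UINT64_MAX == 2**64 - 1)  # True

# Alternative patch for scene_manager.py that keeps the original sentinel
# value without the negative-to-unsigned cast (a suggestion, not the fix
# used in this article):
#     INVALID_POINT3D = np.uint64(np.iinfo(np.uint64).max)
```

If the preprocessing output looks wrong after applying the np.int64(-1) patch, this difference in sentinel value is the first thing worth checking.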

3. Training

3.1. Fine-tuning MAE

3.1.1. Download the pretrained model

mkdir pretrained
cd pretrained
wget -nc https://dl.fbaipublicfiles.com/mae/visualize/mae_visualize_vit_base.pth

3.1.2. Edit the config file

Set data_dir and base_dir in configs/mvimgnet_mae.txt:

dataset_name=mvimgnet # dataset name - mvimgnet or shapenet
data_dir=/xxx/MVImgNet_test/ # dataset directory
img_size=[160,90]
vis_every=5000 # how often to visualize intermediate results

batch_size=84
vis_every=5000
n_iters=30001 # number of iterations for training

base_dir=../../output/mae_finetuned # output parent directory
expname=mae_mvimgnet # output directory

using_mae_pretrained=False # whether you would use the pretrained model as initialization

lr_encoder_init=0.0001 # start lr rate (after warm up)
lr_minimum=0.000001 # final lr rate
encoder_warmup_iters=1000 # lr warmup until this iteration - to lr_encoder_init

ckpt=False

# encoder
encoder_patch_size=5
encoder_depth=12
apply_minus_one_to_one_norm=False
encoder_embed_dim=768
encoder_num_heads=12

# mae decoder
mae_decoder_embed_dim=512
mae_decoder_depth=8
mae_decoder_num_heads=16
masking_ratio=0.75 # masking ratio for MAE

using_mae_pretrained=True
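Note that the config above assigns vis_every twice and using_mae_pretrained twice, the latter with conflicting values (False, then True). Which assignment takes effect depends on how the training script parses the file, which is not shown here. A small sketch for spotting such repeats before launching a run, assuming the simple key=value # comment layout shown above (the function name is illustrative):

```python
from collections import defaultdict


def find_duplicate_keys(config_text):
    """Map each key assigned more than once to its values, in file order."""
    values = defaultdict(list)
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding space
        if "=" not in line:
            continue  # blank lines and comment-only lines
        key, value = line.split("=", 1)
        values[key.strip()].append(value.strip())
    return {k: v for k, v in values.items() if len(v) > 1}


sample = """vis_every=5000 # visualization interval
batch_size=84
vis_every=5000
using_mae_pretrained=False
using_mae_pretrained=True
"""
print(find_duplicate_keys(sample))
# {'vis_every': ['5000', '5000'], 'using_mae_pretrained': ['False', 'True']}
```

Removing the unwanted duplicate from the file makes the intended setting explicit regardless of the parser's behavior.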

3.1.3. Run training

accelerate launch --mixed_precision=fp16 scripts/train_mae.py --config configs/mvimgnet_mae.txt --apply_minus_one_to_one_norm False --expname mae_mvimgnet_imgnet

In my test, memory usage on a single GPU was about 15 GB.

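3.2. Training NViST

Multi-GPU training is supported. The two commands below cover single-GPU and two-GPU training. The batch size (number of images fed to the encoder) and batch pixel size (number of pixels used for rendering) they set are chosen to fill 40 GB A100 GPUs.

If you scale the batch size and batch pixel size by a factor of N, scale the learning rate by a factor of \sqrt{N}.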
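The square-root scaling rule is easy to apply by hand, but a tiny helper avoids arithmetic slips when trying several batch sizes. This is a minimal sketch; the base learning rate used in the example is hypothetical and is not read from the repository's config.

```python
import math


def scale_lr(base_lr, n):
    """Scale a learning rate for an n-times larger batch using the sqrt(n) rule."""
    if n <= 0:
        raise ValueError("n must be positive")
    return base_lr * math.sqrt(n)


# Hypothetical base rate, for illustration only.
base = 1e-4
print(scale_lr(base, 2))  # roughly 1.414e-4
print(scale_lr(base, 4))  # 2e-4, up to floating-point rounding
```

The same factor applies to each of the encoder, decoder, and renderer learning rates when the batch size and batch pixel size are scaled together.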

3.2.1. Single-GPU training

CUDA_VISIBLE_DEVICES=0 accelerate launch --mixed_precision=fp16 scripts/train_nvist.py --config configs/mvimgnet_nvist.txt\
 --batch_size 11 --batch_pixel_size 165000 --expname nvist_mvimgnet_1gpu

3.2.2. Two-GPU training

accelerate launch --mixed_precision=fp16 scripts/train_nvist.py --config configs/mvimgnet_nvist.txt\
 --batch_size 22 --batch_pixel_size 330000 --expname nvist_mvimgnet_2gpus --lr_encoder_init 0.00006 --lr_decoder_init 0.0003 --lr_renderer_init 0.0003

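3.3. Troubleshooting

Training crashes with this error:

AttributeError: 'AcceleratorState' object has no attribute 'use_fp16'

This appears to be a bug in the code. Commenting out the line that raises the error is enough.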
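If the offending line is needed rather than disposable, an alternative to commenting it out is to read the precision mode in a way that works across accelerate versions. The sketch below assumes the state object exposes a mixed_precision string (as current accelerate releases do) and falls back to the older use_fp16 flag. The helper name is made up, and which line of the training code fails is not shown in this article.

```python
from types import SimpleNamespace


def uses_fp16(state):
    """Return True if an accelerate state or accelerator object runs in fp16.

    Prefers the `mixed_precision` attribute and falls back to the older
    `use_fp16` flag when it is still present.
    """
    mode = getattr(state, "mixed_precision", None)
    if mode is not None:
        return str(mode) == "fp16"
    return bool(getattr(state, "use_fp16", False))


# Stand-ins for the real objects, so the sketch runs without accelerate installed.
new_style = SimpleNamespace(mixed_precision="fp16")
old_style = SimpleNamespace(use_fp16=True)
print(uses_fp16(new_style), uses_fp16(old_style))  # True True
```

In the training script, the failing attribute access could then be replaced with a call such as uses_fp16(accelerator.state), assuming that is the object involved.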

4. Inference

CUDA_VISIBLE_DEVICES=0 python scripts/eval_nvist.py --config <config_path> --ckpt_dir <ckpt_path>

References

GitHub - wbjang/nvist_official: (CVPR 2024) NViST: In the wild New View Synthesis from a Single Image with Transformers
