Zhenggang Tang, Yuchen Fan, Dilin Wang, Hongyu Xu, Rakesh Ranjan, Alexander Schwing, Zhicheng Yan
Multi-view, pose-free, RGB-only 3D reconstruction in one step. Also supports novel view synthesis and relative pose estimation.
Please see more visual results and videos on our website!
- 2025-1-1: Released a Gradio demo, all checkpoints, training/evaluation code, and training/evaluation trajectories for ScanNet.
- 2025-1-8: Improved the demo's view selection, giving better quality for scenes with multiple rooms.
We have only tested this on a Linux server with CUDA 12.4.
- Clone MV-DUSt3R+:

```bash
git clone https://github.com/facebookresearch/mvdust3r.git
cd mvdust3r
```
- Create the virtual environment with Anaconda:

```bash
./install.sh
```

(The PyTorch and PyTorch3D versions should be changed if you need a different CUDA version.)
- (Optional, for faster runtime) Compile the CUDA kernels for RoPE (the same as in DUSt3R and CroCo):

```bash
cd croco/models/curope/
python setup.py build_ext --inplace
cd ../../../
```
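As an optional sanity check (not part of the original instructions), you can confirm that the installed environment actually sees your GPU before moving on:

```bash
# Verify that PyTorch was installed with CUDA support and a GPU is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```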
Please download the checkpoints here to the folder `checkpoints` before running the demo or evaluation.
| Name | Description |
|---|---|
| MVD.pth | MV-DUSt3R |
| MVDp_s1.pth | MV-DUSt3R+ trained on stage 1 (8 views) |
| MVDp_s2.pth | MV-DUSt3R+ trained on stage 1 then stage 2 (mixed 4~12 views) |
| DUSt3R_ViTLarge_BaseDecoder_224_linear.pth | The pretrained DUSt3R model; our training is fine-tuned from it |
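As a sketch of the expected layout (file names come from the table above; placing them under the repository root is an assumption):

```bash
mkdir -p checkpoints
# After downloading, the folder should contain:
# checkpoints/MVD.pth
# checkpoints/MVDp_s1.pth
# checkpoints/MVDp_s2.pth
# checkpoints/DUSt3R_ViTLarge_BaseDecoder_224_linear.pth
```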
```bash
python demo.py --weights ./checkpoints/{CHECKPOINT}
```
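For example, to launch the demo with the stage-2 MV-DUSt3R+ weights (any checkpoint from the table above can be substituted):

```bash
python demo.py --weights ./checkpoints/MVDp_s2.pth
```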
You will see a UI like this:

The input can be multiple images (a single image is not supported) or a video. You will see the point cloud along with the predicted camera poses (3DGS visualization is left as future work).
The confidence threshold controls how many low-confidence points are filtered out. The No. of video frames setting only applies when the input is a video and controls how many frames are uniformly sampled from the video for reconstruction.
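If you prefer to control frame selection yourself, you can pre-extract uniformly spaced frames and feed them to the demo as images instead. A rough sketch using ffmpeg (ffmpeg is not required by this repo, and `input.mp4` and the output names are placeholders):

```bash
# Extract roughly one frame every 2 seconds; adjust the fps value
# (frames per second) to change how many frames you end up with.
ffmpeg -i input.mp4 -vf fps=1/2 frame_%03d.png
```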
Note that the demo's inference is slower than the runtime reported in the paper due to Gradio and model-loading overhead. If you need faster runtime, please use our evaluation code.
Here are some tips to improve quality, especially for scenes with multiple rooms.
We use five datasets for training and testing: ScanNet, ScanNet++, HM3D, Gibson, and MP3D. Please go to their websites to sign the agreements, then download and extract them into the folder `data`. Here are more instructions.
Currently we have released the ScanNet trajectories for evaluation. Please download them to the folder `trajectories`. More trajectories for training and more data will be released later.
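For reference, a minimal sketch of the folders the instructions above expect (the exact dataset layout inside `data` follows each dataset's own instructions):

```bash
mkdir -p data trajectories
# data/          <- extracted datasets (ScanNet, ScanNet++, HM3D, Gibson, MP3D)
# trajectories/  <- released ScanNet evaluation trajectories
```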
The following scripts in the folder `scripts` run evaluation on ScanNet:
| Name | Description |
|---|---|
| test_mvd.sh | MV-DUSt3R |
| test_mvdp_stage1.sh | MV-DUSt3R+ trained on stage 1 (8 views) |
| test_mvdp_stage2.sh | MV-DUSt3R+ trained on stage 1 then stage 2 (mixed 4~12 views) |
They should reproduce the paper's results on ScanNet (Tab. 2, 3, 4, S2, S3, and S5).
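For example, assuming the checkpoints, ScanNet data, and trajectories are in place, an evaluation run can be launched from the repository root (invocation assumed; check each script for required paths and arguments):

```bash
bash scripts/test_mvdp_stage2.sh
```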
We are still preparing the release of the training-data trajectories and the trajectory-generation code. In the meantime, the training scripts in the folder `scripts` provide more information about our training setup.
| Name | Description |
|---|---|
| train_mvd.sh | MV-DUSt3R, initialized from DUSt3R and fine-tuned |
| train_mvdp_stage1.sh | MV-DUSt3R+ stage-1 training (8 views), initialized from DUSt3R and fine-tuned |
| train_mvdp_stage2.sh | MV-DUSt3R+ stage-2 fine-tuning (mixed 4~12 views), starting from the stage-1 checkpoint |
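Similarly, a stage-1 training run would be launched from the repository root (invocation assumed; the scripts document the expected data paths and hyperparameters):

```bash
bash scripts/train_mvdp_stage1.sh
```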
```bibtex
@article{tang2024mv,
  title={MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds},
  author={Tang, Zhenggang and Fan, Yuchen and Wang, Dilin and Xu, Hongyu and Ranjan, Rakesh and Schwing, Alexander and Yan, Zhicheng},
  journal={arXiv preprint arXiv:2412.06974},
  year={2024}
}
```
The code is released under the CC BY-NC 4.0 license.
Many thanks to:
- DUSt3R for the codebase.