LiveStre4m: Feed-Forward Live Streaming of Novel Views from Unposed Multi-View Video

3DMV@CVPRW 2026

Pedro Quesado¹, Erkut Akdag¹, Yasaman Kashefbahrami¹, Willem Menu¹, Egor Bondarev¹

¹AIMSGroup, Department of Electrical Engineering, Eindhoven University of Technology

Overview

LiveStre4m is a feed-forward model for real-time novel view synthesis (NVS). It enables temporally consistent live streaming from as few as two unposed, sparse multi-view inputs by predicting camera parameters directly from RGB images. By combining a multi-view vision transformer for 3D scene reconstruction with a diffusion-transformer interpolation module, the system reaches an average reconstruction time of 0.07 s per frame at 1024x768 resolution, substantially faster than optimization-based dynamic scene representation methods.
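To put the per-frame figure in streaming terms, the reconstruction time can be converted into an upper bound on reconstructed frames per second. This is a minimal sketch: the helper name `frames_per_second` is illustrative, and the result covers reconstruction only, since the source does not quantify capture, interpolation, or transmission overhead.

```python
def frames_per_second(seconds_per_frame: float) -> float:
    """Convert a per-frame processing time into a throughput bound."""
    if seconds_per_frame <= 0:
        raise ValueError("seconds_per_frame must be positive")
    return 1.0 / seconds_per_frame


# 0.07 s per frame is the average reconstruction time quoted above.
fps = frames_per_second(0.07)
print(f"about {fps:.1f} reconstructed frames per second")
```

At 0.07 s per frame this gives a little over 14 reconstructed frames per second.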

Figure 1. Illustration of the proposed LiveStre4m method, a feed-forward model for live-streaming novel viewpoint video from two or more low-resolution input streams.

Results

LiveStre4m prioritizes efficiency: it synthesizes the target viewpoint from only two neighboring input views, without requiring optimization or ground-truth camera parameters. On complex dynamic scenes it runs in 0.14 seconds per frame, approximately 55x faster than 4DGS (7.80 s) and 19x faster than IGS (2.67 s).
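The speedup factors follow directly from the per-frame runtimes reported in the Dynamic Scene Reconstruction table. The short check below reproduces the arithmetic; the dictionary and function names are illustrative only.

```python
# Runtimes in seconds per frame, copied from the Neural3DVideo comparison table.
RUNTIME_S = {"4DGS": 7.80, "IGS": 2.67, "LiveStre4m": 0.14}


def speedup(baseline_s: float, ours_s: float) -> float:
    """Return how many times faster `ours_s` is than `baseline_s`."""
    return baseline_s / ours_s


for method in ("4DGS", "IGS"):
    factor = speedup(RUNTIME_S[method], RUNTIME_S["LiveStre4m"])
    print(f"{method}: {factor:.1f}x slower than LiveStre4m")
```

The ratios come out at roughly 55.7 and 19.1, matching the rounded 55x and 19x quoted above.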

Figure 2. Qualitative results produced by LiveStre4m on dynamic scenes.

Dynamic Scene Reconstruction

Quantitative comparison against optimization-based methods on the Neural3DVideo dataset, measured on a single A100 GPU.

Category	Method	Runtime (s) ↓	PSNR ↑	Camera Free	#Views
Video Optimization	K-planes	48.00	32.17	❌	>19
	4DGS	7.80	32.70	❌	≥19
	Spacetime-GS	48.00	33.71	❌	≥19
Frame Optimization	StreamRF	15.00	32.09	❌	>19
	3DGStream	16.93	32.75	❌	≥19
	IGS	2.67	33.89	❌	≥19
Feed-forward	LiveStre4m (Ours)	0.14	20.64	✅	2

(Note: MeetRoom results follow a similar trend, with LiveStre4m operating in 0.10 s using only 2 views.)
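The PSNR column reports peak signal-to-noise ratio in dB. The source does not state the evaluation details (peak value, color space, or how frames are averaged), so the following is only a minimal sketch of the standard definition, PSNR = 10 log10(peak^2 / MSE). It assumes pixel values in [0, 1] and images flattened into equal-length sequences; the function name `psnr` is illustrative.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], rendered: Sequence[float], peak: float = 1.0) -> float:
    """Standard PSNR in dB between two equal-length pixel sequences."""
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy example: every pixel is off by 0.1, so MSE = 0.01 and PSNR is about 20 dB.
print(f"{psnr([0.0, 1.0], [0.1, 0.9]):.2f} dB")
```

Because the scale is logarithmic, a 10 dB difference corresponds to a tenfold difference in mean squared error.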

Feed-Forward Scene Reconstruction

Comparison of feed-forward, camera-free scene reconstruction methods on a single H100 GPU.

Dataset	Method	Runtime (s) ↓	PSNR ↑	Resolution	Pose Free	#Views
Neural3DVideo	FLARE	0.249	21.45	512x384	✅	2
	LiveStre4m (Ours)	0.062	22.44	512x384	✅	2
	LiveStre4m (Ours)	0.074	21.11	1024x768	✅	2
MeetRoom	FLARE	0.243	16.65	512x384	✅	2
	LiveStre4m (Ours)	0.062	19.32	512x384	✅	2
	LiveStre4m (Ours)	0.074	18.65	1024x768	✅	2
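The table can be summarized as a speedup and a PSNR difference relative to FLARE at the shared 512x384 resolution. The snippet below is a small sketch of that arithmetic using the values copied from the table; the `ROWS` layout and the `compare` helper are illustrative names.

```python
# (runtime in seconds, PSNR in dB) at 512x384, copied from the table above.
ROWS = {
    ("Neural3DVideo", "FLARE"): (0.249, 21.45),
    ("Neural3DVideo", "LiveStre4m"): (0.062, 22.44),
    ("MeetRoom", "FLARE"): (0.243, 16.65),
    ("MeetRoom", "LiveStre4m"): (0.062, 19.32),
}


def compare(dataset: str) -> tuple[float, float]:
    """Return (speedup over FLARE, PSNR gain in dB) for one dataset."""
    flare_s, flare_db = ROWS[(dataset, "FLARE")]
    ours_s, ours_db = ROWS[(dataset, "LiveStre4m")]
    return flare_s / ours_s, ours_db - flare_db


for dataset in ("Neural3DVideo", "MeetRoom"):
    factor, gain_db = compare(dataset)
    print(f"{dataset}: {factor:.1f}x faster, {gain_db:+.2f} dB")
```

At 512x384, LiveStre4m is roughly 4x faster than FLARE on both datasets, with PSNR gains of about 0.99 dB on Neural3DVideo and 2.67 dB on MeetRoom.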

Citation

@misc{quesado2026livestre4mfeedforwardlivestreaming,
      title={LiveStre4m: Feed-Forward Live Streaming of Novel Views from Unposed Multi-View Video}, 
      author={Pedro Quesado and Erkut Akdag and Yasaman Kashefbahrami and Willem Menu and Egor Bondarev},
      year={2026},
      eprint={2604.06740},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2604.06740}, 
}