As AI models are increasingly integrated into applications involving human interaction, understanding the alignment between human perception and machine vision has become essential. One example is the estimation of visual motion (optical flow) in dynamic applications such as driving assistance. While there are numerous optical flow datasets and benchmarks with ground truth information, human-perceived flow in natural scenes remains underexplored. We introduce HuPerFlow—a benchmark for human-perceived flow measured at 2,400 locations selected from ten representative computer vision optical flow datasets. Through online psychophysical experiments, we collected ~38,400 response vectors from 480 participant instances. Our data demonstrate that human-perceived flow aligns with ground truth in spatiotemporally smooth locations while also showing systematic errors influenced by various environmental properties. Additionally, we evaluated several optical flow algorithms against human-perceived flow, uncovering both similarities and unique aspects of human perception in complex natural scenes. HuPerFlow is the first large-scale human-perceived flow benchmark for alignment between computer vision models and human perception, as well as for scientific exploration of human motion perception in natural scenes. The HuPerFlow benchmark will be available online upon acceptance.
Measuring human motion perception reliably in natural scenes is extremely challenging. Inspired by classical psychophysical methods, we developed an online experiment that collects human-perceived optical flow at specific spatial locations and time points. The paradigm includes a calibration procedure that standardizes each participant's viewing angle, and display resolution and frame rate are adjusted per device so that data are collected accurately across setups.
To maximize reliability, each participant receives thorough training with guided instructions, and we filter out inconsistent participants to ensure data quality. The overall setup allows us to efficiently collect large-scale, reliable data from diverse users.
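As a concrete illustration of the calibration step, the following is a minimal sketch of how physical display measurements might be converted into pixels per degree of visual angle so that stimulus speed is rendered consistently across devices. The function names, the card-matching screen-size estimate, and all parameter values are our own assumptions, not the published implementation.

// Hypothetical calibration helper: converts a measured screen size and an
// assumed viewing distance into pixels per degree of visual angle, so that
// stimulus speed (deg/s) can be rendered consistently across devices.
// All names and the card-matching procedure are illustrative assumptions.
interface CalibrationInput {
  screenWidthPx: number;      // horizontal resolution reported by the browser
  screenWidthCm: number;      // physical width, e.g. estimated by matching a
                              // credit card (8.56 cm wide) on screen
  viewingDistanceCm: number;  // instructed or assumed viewing distance
}

function pixelsPerDegree(c: CalibrationInput): number {
  const pxPerCm = c.screenWidthPx / c.screenWidthCm;
  // Visual angle subtended by 1 cm at the given viewing distance.
  const degPerCm = 2 * Math.atan(0.5 / c.viewingDistanceCm) * (180 / Math.PI);
  return pxPerCm / degPerCm;
}

// Example: a stimulus moving at 4 deg/s on a 1920 px-wide, 34 cm-wide
// display viewed from 57 cm is rendered at this pixel speed per frame:
const ppd = pixelsPerDegree({ screenWidthPx: 1920, screenWidthCm: 34, viewingDistanceCm: 57 });
const frameRateHz = 60; // adjusted per device, as described above
const speedPxPerFrame = (4 * ppd) / frameRateHz;
console.log(ppd.toFixed(1), speedPxPerFrame.toFixed(2));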
In the experiment, participants are first shown a short video with a target location (labeled as Point A). Participants then use mouse controls to match the direction and speed of a moving noise pattern, aligning it with the perceived motion at that location in the natural video.
Each video and noise pattern is presented multiple times, giving participants ample opportunity to match their percept as accurately as possible. Note that the video shown here is slowed down for clarity. Feedback on the ground-truth and response vectors was provided after each response during the practice trials and the training session, but was not shown in the main session, where the benchmark data were collected. The complete experimental workflow is illustrated below.
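To make the matching task concrete, below is a minimal sketch of how a mouse drag might be encoded as a response flow vector and scored against ground truth for the practice-trial feedback. The names, the gain parameter, and the use of endpoint error here are illustrative assumptions rather than the paper's actual code.

// Hypothetical response encoding: the participant drags the mouse to set the
// direction and speed of the matching noise pattern; the drag offset becomes
// a 2D response vector comparable to a ground-truth flow vector.
type FlowVector = { dx: number; dy: number }; // pixels per frame

function responseFromDrag(start: [number, number], end: [number, number],
                          gain: number): FlowVector {
  // `gain` maps mouse displacement to stimulus speed (an assumed parameter).
  return { dx: (end[0] - start[0]) * gain, dy: (end[1] - start[1]) * gain };
}

// Endpoint error (EPE), the standard optical-flow metric, could drive the
// feedback shown after each practice or training response.
function endpointError(resp: FlowVector, gt: FlowVector): number {
  return Math.hypot(resp.dx - gt.dx, resp.dy - gt.dy);
}

const resp = responseFromDrag([400, 300], [430, 290], 0.2);
console.log(endpointError(resp, { dx: 5.0, dy: -1.5 }).toFixed(2));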
@inproceedings{Yang2025HuPerFlow,
  author    = {Yung-hao Yang and Zitang Sun and Taiki Fukiage and Shin'ya Nishida},
  title     = {{HuPerFlow}: A Comprehensive Benchmark for Human vs. Machine Motion Estimation Comparison},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2025},
  note      = {In press}
}