HuPerFlow: A Comprehensive Benchmark for Human vs. Machine Motion Estimation Comparison

¹Cognitive Informatics Lab, Kyoto University, Japan. ²Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Japan.

HuPerFlow collects and analyzes extensive human-perceived optical flow data across multiple established optical flow benchmarks. The video above shows examples from the Spring Benchmark. In many cases, human-perceived motion (red arrows) differs from the ground truth (blue arrows); the discrepancy is quantified by the endpoint error, shown by the size of the green circles.
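The endpoint error used here is simply the Euclidean distance between the human-perceived and ground-truth flow vectors. A minimal sketch (the function name and array layout are ours, not taken from the benchmark code):

import numpy as np

def endpoint_error(flow_perceived, flow_gt):
    """Euclidean distance between two 2D flow vectors (u, v), in pixels."""
    return float(np.linalg.norm(np.asarray(flow_perceived, dtype=float)
                                - np.asarray(flow_gt, dtype=float)))

# Example: human reports (3, 0) px/frame, ground truth is (2, 1).
print(endpoint_error([3.0, 0.0], [2.0, 1.0]))  # ~1.41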




Abstract

As AI models are increasingly integrated into applications involving human interaction, understanding the alignment between human perception and machine vision has become essential. One example is the estimation of visual motion (optical flow) in dynamic applications such as driving assistance. While there are numerous optical flow datasets and benchmarks with ground-truth information, human-perceived flow in natural scenes remains underexplored. We introduce HuPerFlow, a benchmark of human-perceived flow measured at 2,400 locations selected from ten representative computer vision optical flow datasets. Through online psychophysical experiments, we collected approximately 38,400 response vectors from 480 participant instances. Our data show that human-perceived flow aligns with ground truth in spatiotemporally smooth locations, while also exhibiting systematic errors influenced by various environmental properties. Additionally, we evaluated several optical flow algorithms against human-perceived flow, uncovering both similarities between model and human estimates and aspects of motion perception unique to humans in complex natural scenes. HuPerFlow is the first large-scale human-perceived flow benchmark, supporting both the assessment of alignment between computer vision models and human perception and the scientific study of human motion perception in natural scenes. The HuPerFlow benchmark will be available online upon acceptance.


Several representative computer vision models and human-inspired motion estimation models were tested on our human benchmark. Here, we select the VIPER and MPI-Sintel benchmarks as demonstrations. The red, yellow, and blue arrows represent human perception, computer vision (CV) model estimates, and ground truth (GT), respectively. We introduce the Relative Consistency Index (RCI) to quantify each model's alignment with human perception (i.e., with human biases that deviate from GT). The yellow circle denotes this alignment: a larger circle indicates that the CV model's response is closer to human perception than to GT. Overall, we found that models built on basic human-aligned computations, such as motion energy calculations, capture human motion illusions more effectively than state-of-the-art CV models.
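This page does not reproduce the exact RCI formula, but one natural formalization of "closer to human perception than to GT" is a normalized contrast between the model-to-GT and model-to-human endpoint errors. A hypothetical sketch under that assumption:

import numpy as np

def relative_consistency_index(model_flow, human_flow, gt_flow):
    """Illustrative RCI: positive when a model's flow vector lies closer to the
    human-perceived vector than to ground truth, negative otherwise.
    NOTE: this formulation is our assumption, not the paper's exact definition."""
    model, human, gt = (np.asarray(v, dtype=float)
                        for v in (model_flow, human_flow, gt_flow))
    d_human = np.linalg.norm(model - human)  # model-to-human endpoint error
    d_gt = np.linalg.norm(model - gt)        # model-to-GT endpoint error
    return float((d_gt - d_human) / (d_gt + d_human + 1e-8))

# A model that reproduces the human bias exactly scores close to +1.
print(relative_consistency_index([3.0, 0.0], [3.0, 0.0], [2.0, 1.0]))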


Online Psychophysical Experiment

Measuring human motion perception reliably in natural scenes is extremely challenging. Inspired by classical psychophysical methods, we developed an online experiment that collects human-perceived optical flow at specific spatial locations and time points. The paradigm includes a calibration step that establishes each user's viewing angle; display resolution and frame rate are then adjusted so that data are collected accurately across devices.
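For illustration, the standard psychophysical conversion between visual angle and screen pixels depends on viewing distance and pixel density. A sketch of this calculation (parameter names are ours, and HuPerFlow's actual calibration procedure may differ):

import math

def pixels_per_degree(viewing_distance_cm, screen_width_cm, screen_width_px):
    """Pixels subtended by one degree of visual angle on a flat screen viewed
    head-on. Illustrative only; not HuPerFlow's actual calibration code."""
    px_per_cm = screen_width_px / screen_width_cm
    cm_per_degree = 2.0 * viewing_distance_cm * math.tan(math.radians(0.5))
    return px_per_cm * cm_per_degree

# Example: 57 cm viewing distance (where 1 cm on screen is roughly 1 degree),
# on a 30 cm wide display with 1920 px horizontal resolution.
print(pixels_per_degree(57.0, 30.0, 1920))  # ~63.7 px/deg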

To maximize reliability, each participant receives thorough training with guided instructions, and we filter out inconsistent participants to ensure data quality. The overall setup allows us to efficiently collect large-scale, reliable data from diverse users.
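The exclusion criterion is not detailed on this page; a common approach is to measure the scatter of a participant's repeated responses to the same stimulus and drop participants whose scatter exceeds a threshold. A hypothetical sketch:

import numpy as np

def response_scatter(responses):
    """Mean endpoint distance of repeated (u, v) responses from their mean,
    for one stimulus. `responses` is an (n_repeats, 2) array."""
    responses = np.asarray(responses, dtype=float)
    return float(np.linalg.norm(responses - responses.mean(axis=0), axis=1).mean())

def keep_participant(per_stimulus_responses, threshold_px=2.0):
    """Keep a participant whose average within-stimulus scatter is below a
    threshold. Both the statistic and threshold are illustrative assumptions."""
    return np.mean([response_scatter(r) for r in per_stimulus_responses]) < threshold_px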

In the experiment, participants are first shown a short video with a target location (labeled Point A). Participants then use the mouse to adjust the direction and speed of a moving noise pattern until it matches the motion they perceived at that location in the natural video.
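As an illustration of the matching logic, a mouse drag can be mapped linearly onto the noise pattern's per-frame translation; the gain and names below are hypothetical, not HuPerFlow's interface code:

def mouse_drag_to_flow(dx_px, dy_px, gain=0.1):
    """Map a mouse drag (in screen pixels) to the noise patch's per-frame
    translation (u, v). The linear mapping and gain are illustrative."""
    return (gain * dx_px, gain * dy_px)

# A 40-px rightward drag sets the noise drifting at 4 px/frame to the right.
print(mouse_drag_to_flow(40.0, 0.0))  # (4.0, 0.0)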

Each video and noise pattern is presented multiple times, giving participants ample opportunity to report their percept as accurately as possible. Note that the demonstration video above is slowed down for clarity. Feedback showing the ground-truth and response vectors is provided after each response during the practice trials and the training session, but not during the main session in which benchmark data are collected. The complete experimental workflow is illustrated below.


Experimental Workflow Diagram
Figure: Overview of the experimental workflow for collecting human-perceived motion data.
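Once responses are collected, repeated reports for the same location must be combined into a single perceived-flow vector. A minimal sketch using a component-wise median, which is robust to occasional lapses (our choice of aggregator; the paper's aggregation rule may differ):

import numpy as np

def aggregate_responses(responses):
    """Combine repeated (u, v) response vectors for one probed location into a
    single human-perceived flow estimate via the component-wise median."""
    return np.median(np.asarray(responses, dtype=float), axis=0)

# Four repeats of one trial, in px/frame; the last is an attentional lapse.
print(aggregate_responses([[2.9, 0.1], [3.1, -0.2], [3.0, 0.0], [8.0, 5.0]]))
# -> [3.05 0.05]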

BibTeX



@inproceedings{Yang2025Huperflow,
  author    = {Yung-hao Yang and Zitang Sun and Taiki Fukiage and Shin'ya Nishida},
  title     = {HuPerFlow: A Comprehensive Benchmark for Human vs. Machine Motion Estimation Comparison},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2025},
  note      = {In press}
}