UAV datasets – EyeTrackUAV2


EyeTrackUAV2 extends the EyeTrackUAV dataset in the number and variety of contents as well as in the size of the population sample used to establish the ground truth. This large-scale dataset covers a wide range of challenging aspects for saliency prediction, object detection, tracking, and recognition, among others. It addresses the need for more data to enable empirical analyses of visual attention in UAV videos, especially in the present context of big data. To study diverse viewing contexts, we collected gaze data under both Free-Viewing (FV) and Task-related (T) viewing conditions. Every stimulus was watched by 30 observers, and a different population sample took part in the test for each viewing condition (FV and T).

Video Stimuli

The data collection was conducted on 43 RGB sequences at 30 fps, with a resolution of either 1280×720 or 720×480.

They were extracted from the UAV123, DTB70, and VIRAT databases.

These contents were selected to be representative of the UAV ecosystem. Accordingly, the test sequences comprise various visual quality artifacts, camera movements (both translation and rotation), short- and long-term occlusions, illumination variations, viewpoint changes, background clutter, severe weather conditions, and target deformability. The dataset also includes several non-natural contents, reflecting the ability of UAVs to capture infrared, thermal, and multispectral content.

Gaze deployment information

Precise binocular gaze data were collected with the EyeLink® eye tracker, which recorded binocular eye positions at a rate of 1000 Hz with a manufacturer-reported accuracy of 0.25-0.50° of visual angle. The remote mode was used, enabling chinrest-free experiments so that observers could explore the content more naturally.
30 observers participated, viewing the stimuli under Free-Viewing (FV) and Task (T) conditions in a controlled laboratory setup. In FV, an observer explores the content without constraint. Under T conditions, by contrast, participants were required to signal the appearance of a new moving object (e.g. a person, vehicle, or bike) in the video. This task simulates a basic surveillance procedure that encompasses target-specific training (repeated discrimination of targets and non-targets) and visual search scanning (targets potentially located anywhere).

Overall, the dataset comprises eye-tracking information on 42241 frames, which represents 1408 seconds of video.
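The duration above follows directly from the frame count and the frame rate. The short sketch below reproduces that arithmetic and estimates how many 1000 Hz gaze samples fall on a single 30 fps frame. The helper `frame_index` is hypothetical: it assumes sample timestamps are expressed in milliseconds since the start of the sequence and that frames are displayed at a constant rate, which may not match the layout of the distributed files.

```python
FRAME_COUNT = 42241    # frames with eye-tracking data (from the text)
FRAME_RATE = 30        # video frames per second (from the text)
SAMPLING_RATE = 1000   # eye-tracker samples per second (from the text)

# Total duration of the annotated video material, in seconds.
duration_s = FRAME_COUNT / FRAME_RATE

# Average number of gaze samples recorded per eye while one frame is shown.
samples_per_frame = SAMPLING_RATE / FRAME_RATE


def frame_index(sample_time_ms: float, frame_rate: float = FRAME_RATE) -> int:
    """Map a sample timestamp (ms since sequence start) to a 0-based frame index.

    Assumption: constant frame rate and timestamps relative to the first frame.
    """
    return int(sample_time_ms * frame_rate // 1000)


if __name__ == "__main__":
    print(f"{duration_s:.1f} s of video")
    print(f"{samples_per_frame:.1f} samples per frame and per eye")
    print(frame_index(34))  # a sample recorded 34 ms in belongs to the second frame
```

With roughly 33 samples per eye and per frame, each frame-level map aggregates many raw gaze positions from every observer.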

Six Ground truths

We wondered whether the dichotomy between the ocular dominance theory and the cyclopean theory matters for the creation of saliency maps. The maximal mean absolute error between the positions of the two eyes, about 0.6 degrees of visual angle in our dataset, further prompted us to evaluate it.
We tested six methods, namely Left (L), Right (R), Binocular (B), Dominant (D), non-Dominant (nD), and Both Eyes (BE). B corresponds to the average position between the left and right eyes and can be called the version signal.
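As an illustration of how the six scenarios differ, the sketch below derives the gaze positions each one would contribute for a single binocular sample. It is a minimal reading of the definitions above, not the code used to build the dataset: the function name `scenario_points` and the `dominant` argument are hypothetical, and treating BE as "keep both the left and the right position" is an assumption about what Both Eyes means.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) gaze position in image coordinates


def scenario_points(left: Point, right: Point, dominant: str) -> Dict[str, List[Point]]:
    """Return the gaze positions used by each scenario for one binocular sample.

    `dominant` is "L" or "R" and names the observer's dominant eye.
    """
    if dominant not in ("L", "R"):
        raise ValueError("dominant must be 'L' or 'R'")

    # B: average of the left and right eye positions (the version signal).
    binocular = ((left[0] + right[0]) / 2, (left[1] + right[1]) / 2)

    return {
        "L": [left],
        "R": [right],
        "B": [binocular],
        "D": [left if dominant == "L" else right],
        "nD": [right if dominant == "L" else left],
        "BE": [left, right],  # assumption: both positions are kept as-is
    }


if __name__ == "__main__":
    points = scenario_points((100.0, 200.0), (104.0, 202.0), dominant="R")
    for name, positions in points.items():
        print(name, positions)
```

Under this reading, B and BE both use the two eyes but differ in kind: B collapses them into one averaged position, whereas BE contributes two positions per sample.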

We find that using information from both eyes may be preferable for studying saliency. When this is not possible, using information from the dominant eye is encouraged. This is advice rather than a strict recommendation.

Additional insights

We qualitatively found that the center bias, when present at all, is content-dependent and specific to the original database. This emphasises the differences in viewing behavior towards UAV videos compared to conventional contents.

We have also observed that task-related gaze density maps seem more spread out than those of FV: for most sequences, they cover more content than the maps obtained under the free-viewing condition.
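The observation above is qualitative, but "spread" could be quantified in several ways. As one hypothetical option, the sketch below builds a gaze density map by accumulating an isotropic Gaussian at each gaze position and then reports the fraction of pixels whose normalised density exceeds a threshold. The Gaussian width `sigma`, the threshold, and the `coverage` measure itself are illustrative assumptions, not the procedure or metric used for this dataset.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) gaze position in pixels


def gaze_density_map(points: Sequence[Point], width: int, height: int,
                     sigma: float) -> List[List[float]]:
    """Accumulate one Gaussian per gaze point and normalise the peak to 1."""
    density = [[0.0] * width for _ in range(height)]
    for gx, gy in points:
        for y in range(height):
            for x in range(width):
                d2 = (x - gx) ** 2 + (y - gy) ** 2
                density[y][x] += math.exp(-d2 / (2 * sigma ** 2))
    peak = max(max(row) for row in density)
    if peak > 0:
        density = [[v / peak for v in row] for row in density]
    return density


def coverage(density: Sequence[Sequence[float]], threshold: float = 0.5) -> float:
    """Fraction of pixels whose normalised density is at least `threshold`."""
    total = sum(len(row) for row in density)
    above = sum(1 for row in density for v in row if v >= threshold)
    return above / total


if __name__ == "__main__":
    # Toy 32x18 frame: gaze clustered on one spot versus spread over three spots.
    clustered = [(16, 9), (16, 9), (17, 9)]
    spread = [(4, 4), (16, 9), (28, 14)]
    print(coverage(gaze_density_map(clustered, 32, 18, sigma=2.0)))
    print(coverage(gaze_density_map(spread, 32, 18, sigma=2.0)))
```

On this toy input the spread-out gaze yields the larger coverage value, which is the kind of difference one would look for between T and FV maps of the same sequence.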

Both behaviors should be examined further.

Available information

From the dataset, you can download:

  • Gaze raw signals – transformed into the coordinate system of image sequences,
  • Saliency maps for every video frame – computed from raw signals without filtering, for the six scenarios,
  • Saccade and fixation information – computed based on EyeMMV's implementation of the Dispersion-Threshold Identification (I-DT) algorithm.
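For readers unfamiliar with dispersion-threshold identification, the sketch below implements the textbook form of I-DT: a window of samples is labelled a fixation when it lasts at least a minimum duration and its dispersion, taken here as the sum of the horizontal and vertical extents, stays under a threshold. This is not EyeMMV's code, whose details (for example its thresholds) may differ. The function names, the `(time, x, y)` sample layout, and the parameter values in the usage example are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Sequence, Tuple

Sample = Tuple[float, float, float]  # (time in ms, x in pixels, y in pixels)


class Fixation(NamedTuple):
    start_ms: float
    end_ms: float
    x: float  # centroid of the samples in the fixation
    y: float


def dispersion(window: Sequence[Sample]) -> float:
    """Horizontal extent plus vertical extent of the samples in the window."""
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def idt(samples: Sequence[Sample], max_dispersion: float,
        min_duration_ms: float) -> List[Fixation]:
    """Detect fixations in time-sorted samples with textbook I-DT."""
    fixations: List[Fixation] = []
    i, n = 0, len(samples)
    while i < n:
        # Find the shortest window starting at i that spans the minimum duration.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration_ms:
            j += 1
        if j >= n:
            break  # not enough remaining samples to form a fixation
        if dispersion(samples[i:j + 1]) <= max_dispersion:
            # Grow the window while the dispersion stays under the threshold.
            while j + 1 < n and dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            fixations.append(Fixation(window[0][0], window[-1][0], cx, cy))
            i = j + 1
        else:
            i += 1  # drop the first sample and try again
    return fixations


if __name__ == "__main__":
    # Synthetic 1000 Hz trace: 100 ms near (100, 100), then 100 ms near (300, 300).
    trace = [(float(t), 100.0 + t % 3, 100.0) for t in range(100)]
    trace += [(float(t), 300.0, 300.0 + t % 2) for t in range(100, 200)]
    for fixation in idt(trace, max_dispersion=10.0, min_duration_ms=50.0):
        print(fixation)
```

Samples that do not belong to any detected fixation are typically attributed to saccades or noise. The sketch recomputes the dispersion for every window for clarity rather than speed.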

All files can be freely downloaded from

This work results from the collaboration of IRISA and LS2N within the framework of the ongoing research project ANR ASTRID DISSOCIE (Automated Detection of SaliencieS from Operators’ Point of View and Intelligent Compression of DronE videos) referenced as ANR-17-ASTR-0009.

For more details on the dataset and the analyses conducted, see the following paper. If you use any EyeTrackUAV2 data, please cite it:

Perrin, A. F., Krassanakis, V., Zhang, L., Ricordel, V., Perreira Da 
Silva, M., & Le Meur, O. (2020). EyeTrackUAV2: a Large-Scale 
Binocular Eye-Tracking Dataset for UAV Videos. Drones, 4(1), 2.

