Designing 2D-anisotropic Gaussian biases for content-specific saliency in UAV videos
Our aim is to address the specificities of salience in UAV content. We propose a dictionary of biases specific to UAV saliency patterns as a low-complexity prediction system. Such a system could benefit the wider field of UAV research, for instance in object recognition and tracking.
Before designing biases, we must show that saliency patterns in UAV videos differ from those of typical sequences. To visualize the differences, we defined features that represent salience in a high-dimensional space, and used the t-SNE algorithm for dimensionality reduction and visualization.
Differences between UAV and conventional imaging salience
We first design features that we extract from several representations of saliency:
- Human Saliency Maps (HSMs): We characterize spatial complexity with energy and entropy measures, and temporal complexity with the gradient between two consecutive saliency maps.
- Fixations: The number of fixations and the number of clusters (DBSCAN, HDBSCAN) are highly informative.
- Marginal distributions (MDs): Moments describe a distribution comprehensively: mean, standard deviation, skewness, and kurtosis. To enable comparison of the distributions with a Gaussian, we also compute the median and geometric mean.
- 2D K-means and Gaussian Mixture Models (GMMs): We compute K-means and GMMs with one cluster and with the number of clusters found by HDBSCAN. The center coordinates and variances of the most representative components are kept to express saliency.
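A minimal sketch of the per-map feature extraction described above, covering the HSM complexity measures and the marginal-distribution moments. The function name `hsm_features` and the exact moment set are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def hsm_features(hsm):
    """Spatial-complexity and marginal-distribution features for one
    human saliency map (2D array of non-negative values).
    Hypothetical helper; the real feature set also includes temporal
    gradients, fixation counts, and clustering statistics."""
    p = hsm / hsm.sum()                        # normalize to a 2D distribution
    nz = p[p > 0]
    energy = np.sum(p ** 2)                    # spatial energy
    entropy = -np.sum(nz * np.log2(nz))        # spatial entropy (bits)
    feats = [energy, entropy]
    for axis in (0, 1):                        # x then y marginal distribution
        md = p.sum(axis=axis)
        support = np.arange(md.size)
        mean = np.sum(support * md)
        var = np.sum(md * (support - mean) ** 2)
        std = np.sqrt(var)
        skew = np.sum(md * (support - mean) ** 3) / std ** 3
        kurt = np.sum(md * (support - mean) ** 4) / var ** 2
        feats += [mean, std, skew, kurt]       # moments of the marginal
    return np.array(feats)

# Example: a synthetic anisotropic Gaussian saliency map
hsm = np.outer(np.exp(-((np.arange(64) - 32.0) ** 2) / 200.0),
               np.exp(-((np.arange(64) - 20.0) ** 2) / 100.0))
f = hsm_features(hsm)
```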
The t-SNE operates on all features when dealing with frames, and on the mean and standard deviation of the features over the video when dealing with sequences. We also tried using only the MD features and verified that they can form a separability criterion. This gives the following results:
- Different areas of the space are covered by typical (DHF1K) and UAV (DTB70, UAV123, and VIRAT) points. Some points overlap, but not a majority. Consequently, conventional saliency cannot fully describe that of UAV videos, so it makes sense to study UAV saliency patterns specifically.
- UAV sequences also form several separate groups. This observation hints at the presence of several patterns of gaze deployment.
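The visualization step can be sketched as follows with scikit-learn. The feature matrices here are random stand-ins for the real per-frame (or per-sequence) feature vectors; shapes and t-SNE parameters are illustrative:

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in feature matrices (the real ones come from the features above):
# one row per frame, or per sequence via mean/std of frame features.
rng = np.random.default_rng(0)
X_typical = rng.normal(0.0, 1.0, size=(50, 10))   # e.g. DHF1K features
X_uav = rng.normal(3.0, 1.0, size=(50, 10))       # e.g. DTB70/UAV123/VIRAT
X = np.vstack([X_typical, X_uav])

# 2D embedding for visual inspection of how the two sets separate
emb = TSNE(n_components=2, perplexity=30, init="pca",
           random_state=0).fit_transform(X)
```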
Clustering UAV sequences based on saliency-pattern similarity
This time, we focus on UAV videos and their marginal distributions only. Based on a t-SNE embedding of the marginal-distribution features and hierarchical clustering, we defined seven groups of sequences presenting similar saliency patterns. The pipeline above expresses our methodology.
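A sketch of the clustering stage, assuming a 2D t-SNE embedding of the UAV sequences as input; here the embedding is replaced by seven synthetic blobs, and the linkage method (Ward) is an assumption:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical 2D t-SNE embedding of UAV sequences (one point each),
# drawn here as seven well-separated synthetic blobs for illustration.
rng = np.random.default_rng(1)
emb = np.vstack([rng.normal(c, 0.3, size=(10, 2)) for c in range(7)])

Z = linkage(pdist(emb), method="ward")             # hierarchical clustering
labels = fcluster(Z, t=7, criterion="maxclust")    # cut tree into 7 groups
```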
Identification of Biases
The identification of biases takes three steps. First, we characterize the horizontal and vertical marginal distributions with their mean and standard deviation (std). We then compute the distribution of these characteristics over each cluster and define their local maxima as representative. Finally, we combine all parameters to reconstruct a bias: the reconstruction multiplies the two marginal distributions defined by the extracted parameters.
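The reconstruction step above amounts to the outer product of two 1D Gaussians, one per marginal. A minimal sketch, where the function names and the example parameter values (image size, means, stds) are illustrative assumptions:

```python
import numpy as np

def gaussian_1d(size, mean, std):
    """Discrete 1D Gaussian over pixel coordinates, normalized to sum to 1."""
    x = np.arange(size)
    g = np.exp(-0.5 * ((x - mean) / std) ** 2)
    return g / g.sum()

def reconstruct_bias(width, height, mean_x, std_x, mean_y, std_y):
    """2D anisotropic Gaussian bias: outer product of the vertical and
    horizontal marginal distributions, renormalized to sum to 1."""
    bias = np.outer(gaussian_1d(height, mean_y, std_y),
                    gaussian_1d(width, mean_x, std_x))
    return bias / bias.sum()

# Example: a horizontally elongated bias (std_x > std_y), i.e. a center
# bias stretched along the horizon -- parameter values are illustrative.
bias = reconstruct_bias(width=640, height=360,
                        mean_x=320, std_x=120, mean_y=180, std_y=40)
```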
We ultimately obtained 296 biases. A selection process is necessary to keep only significant patterns.
Selection of biases:
We select the three best patterns per cluster based on three conventional metrics for saliency prediction: Correlation Coefficient (CC), Similarity (SIM), and Kullback-Leibler divergence (KL). Eventually, we reduced the dictionary to the 21 best-performing biases, which beat the Center Bias (CB) prediction power on their cluster. The patterns are illustrated below.
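The three selection metrics can be sketched as follows. These are common formulations from the saliency literature, not necessarily the exact variants used here (e.g. the KL direction, ground truth vs. prediction, is an assumption):

```python
import numpy as np

def _norm(m):
    """Normalize a non-negative map into a probability distribution."""
    m = m.astype(float)
    return m / m.sum()

def cc(pred, gt):
    """Pearson Correlation Coefficient between two saliency maps."""
    return np.corrcoef(pred.ravel(), gt.ravel())[0, 1]

def sim(pred, gt):
    """Similarity: sum of pixel-wise minima of the normalized maps."""
    return np.minimum(_norm(pred), _norm(gt)).sum()

def kl(pred, gt, eps=1e-12):
    """KL divergence of the predicted map from the ground-truth map."""
    p, g = _norm(pred) + eps, _norm(gt) + eps
    return np.sum(g * np.log(g / p))
```

An identical prediction and ground truth give CC = 1, SIM = 1, and KL = 0, so higher CC/SIM and lower KL indicate better biases.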
Biases from clusters V, VI, and VII exceed our expectations, as they perform well overall. Those from clusters I, II, III, and IV are more content-specific.
[Accepted paper 30-06-2020]