A&A_dataset: The Aesthetics and visual Attention image dataset

IQ Lab

We kindly request that you cite the following paper in any published work that uses this dataset.

Redi J, Povoa I, "The role of visual attention in the aesthetic appeal of consumer images: a preliminary study," in Proc. International Conference on Visual Communications and Image Processing (VCIP 2013), 2013.

The files are password protected. To obtain the password, contact Judith Redi (j.a.redi@tudelft.nl).

Motivation and Background

Predicting the aesthetic appeal of (consumer) images is of great interest for a number of applications, from image retrieval to visual quality optimization. A key element in determining the beauty of an image is the ability of the photographer to guide the attention of the viewer to the subject of interest. To this end, both image simplicity (i.e., clarity of the subject [1]) and compositional rules (e.g., the well-known rule of thirds [2]) are used to drive the observer’s visual focus and ease perceptual fluency. Although these are well-accepted rules of thumb for good photography, there has been very little effort to validate them scientifically, in particular by investigating how the appreciation of aesthetic appeal interacts with the deployment of visual attention. The A&A dataset aims to provide a basis for such a more rigorous validation. It contains 200 images varying in subject, composition and chromatic properties, along with their corresponding saliency maps, generated by tracking the eye movements of 19 subjects while they observed the images. For each image we also provide the aesthetic appeal judgments of the 19 subjects, measured on a discrete five-point scale [3].

Dataset summary

The dataset consists of 200 images, their subjective ratings on Aesthetic Appeal, Color likeability, Recognizability and Familiarity, and their respective fixation and saliency maps.

Of the 200 images in the dataset, 56 were already used in the study reported in [4], 26 were chosen from images freely available online, and 118 were taken from the private collection of an amateur photographer. The images cover a wide range of subjects and are labeled according to 16 categories from the website 500px.com, which hosts both expert and amateur photography. The following criteria were considered when selecting the categories:

- Compliance with categories used in computer vision literature (e.g., the LHI dataset [4]), as in the case of Landscapes, People and Sport.
- Frequent occurrence in social networks, as in the case of Food and Fashion.
- Need to encompass different levels of familiarity, as in the case of Abstract and Celebrities.

The image evaluation was performed in a room with constant illumination of approximately 70 lux, in an environment compliant with ITU recommendations [3]. A 23" LED-backlit monitor with a resolution of 1360x768 was used to display the stimuli. Participants' face movements were constrained by a chinrest placed 0.7 meters from the display. A SensoMotoric Instruments RED III Eye Tracker with a sampling rate of 50 Hz was used to track the participants' eye movements during image viewing.
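The viewing geometry above determines how many screen pixels correspond to one degree of visual angle, a quantity often needed when working with gaze data. The following is a minimal sketch of that arithmetic, assuming square pixels and that the nominal 23" diagonal equals the active display area (neither is stated in the setup description); the function name is hypothetical.

```python
import math


def pixels_per_degree(diagonal_in, res_x, res_y, distance_m):
    """Approximate horizontal pixels per degree of visual angle at screen centre.

    Assumes square pixels and that the nominal diagonal is the active area.
    """
    diag_px = math.hypot(res_x, res_y)                 # diagonal length in pixels
    width_m = diagonal_in * 0.0254 * res_x / diag_px   # physical screen width
    pixel_pitch_m = width_m / res_x                    # size of one pixel
    # Length on the screen subtended by 1 degree, centred on the line of sight.
    one_degree_m = 2 * distance_m * math.tan(math.radians(0.5))
    return one_degree_m / pixel_pitch_m


# Values taken from the setup description: 23" monitor, 1360x768, 0.7 m, 50 Hz.
ppd = pixels_per_degree(23, 1360, 768, 0.7)
sample_interval_ms = 1000 / 50  # time between consecutive gaze samples

print(f"{ppd:.1f} px/deg, {sample_interval_ms:.0f} ms per gaze sample")
```

Under these assumptions one degree of visual angle spans a little over 30 pixels, and consecutive gaze samples are 20 ms apart.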

For each image in the database, participants were asked to score its aesthetic appeal in a Single Stimulus setup [3], using a 5-point discrete scale ranging from very low (1) to very high (5) aesthetic appeal. At the beginning of every session (and after every break), the eye tracker was calibrated to the participant's gaze using a 13-point grid. A short training session (consisting of rating 3 images) was also performed at the beginning of every experiment to allow participants to familiarize themselves with the task. The training images were not intended as anchoring stimuli for the scoring scale, as we did not want to prime participants with specific criteria for judging images. Participants had no time constraints when observing the images prior to scoring (both in the training and in the actual experiment). The scoring scale became accessible only after the participant had finished viewing an image, in order to avoid distraction during observation. Images were presented in a randomized order for every participant.
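Individual scores collected this way are commonly summarized per image as a mean opinion score (MOS) with a 95% confidence interval. The sketch below illustrates that summary, assuming the normal-approximation interval (1.96 times the standard error); it is not necessarily the analysis used by the dataset authors, and the function name and the example scores are hypothetical.

```python
import math
import statistics


def mean_opinion_score(scores):
    """Return (MOS, half-width of the 95% confidence interval) for one image.

    `scores` holds one rating per observer on the 1..5 discrete scale.
    """
    if not scores:
        raise ValueError("at least one score is required")
    if any(s not in (1, 2, 3, 4, 5) for s in scores):
        raise ValueError("scores must lie on the 1..5 discrete scale")
    mos = statistics.fmean(scores)
    if len(scores) < 2:
        return mos, 0.0  # no spread can be estimated from a single score
    # Normal approximation: 1.96 * sample standard deviation / sqrt(N).
    half_width = 1.96 * statistics.stdev(scores) / math.sqrt(len(scores))
    return mos, half_width


# Made-up ratings from 19 observers for a single image (illustration only).
example_scores = [4, 3, 5, 4, 4, 3, 2, 4, 5, 3, 4, 4, 3, 5, 4, 2, 3, 4, 4]
mos, ci = mean_opinion_score(example_scores)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```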

File list

The dataset consists of three folders:

Images: Contains the 200 original images used in the study

Data and ratings: Includes the files summarizing image information and aesthetic appeal evaluations
    o Image_Categories.xls reports the semantic category of the images in the 'images' folder
    o Subjective_ratings.xls reports the subjective ratings of the 19 observers for aesthetic appeal, color likeability, recognizability and familiarity.
    o A&A_thesis_report.pdf includes a detailed description of the dataset composition, the rationale behind the design choices, the experimental setup, and a preliminary analysis of the data.

Visual Attention: Includes two sub-folders
    o FMAPs: Includes the Fixation Maps corresponding to the 200 images in the 'images' folder
    o FMAPs: Includes the Saliency Maps corresponding to the 200 images in the 'images' folder
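To illustrate how a fixation map relates to a saliency map, the sketch below places a 2D Gaussian at each fixation location and normalizes the result to [0, 1]. This is a common way of deriving saliency maps from eye-tracking data, but the page does not state the exact procedure or parameters used for this dataset, so the function, the grid size, the fixation coordinates and the value of sigma are all assumptions made for illustration.

```python
import math


def saliency_from_fixations(fixations, width, height, sigma_px):
    """Build a saliency map (list of rows, values in [0, 1]) from fixations.

    `fixations` is an iterable of (x, y) pixel coordinates. Each fixation
    contributes an isotropic Gaussian with standard deviation `sigma_px`.
    """
    sal = [[0.0] * width for _ in range(height)]
    two_var = 2.0 * sigma_px ** 2
    for fx, fy in fixations:
        # A 2D isotropic Gaussian is separable into x and y profiles.
        gx = [math.exp(-((x - fx) ** 2) / two_var) for x in range(width)]
        gy = [math.exp(-((y - fy) ** 2) / two_var) for y in range(height)]
        for y in range(height):
            row, wy = sal[y], gy[y]
            for x in range(width):
                row[x] += wy * gx[x]
    peak = max(max(row) for row in sal)
    if peak > 0:
        sal = [[v / peak for v in row] for row in sal]  # scale peak to 1.0
    return sal


# Toy example on a small grid: two nearby fixations and one isolated fixation.
toy_fixations = [(16, 18), (17, 18), (48, 10)]
toy_map = saliency_from_fixations(toy_fixations, width=64, height=36, sigma_px=3.0)
print(f"cluster: {toy_map[18][16]:.2f}, isolated: {toy_map[10][48]:.2f}")
```

A sigma close to one degree of visual angle is a frequent choice in the literature. For full-resolution images a vectorized array library would be far faster than these pure-Python loops.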

Download information

The database can be downloaded here.

The files are password protected. To get the password you can contact Judith Redi (j.a.redi@tudelft.nl).

References

[1] C. D. Cerosaletti, A. C. Loui, and A. C. Gallagher, "Investigating two features of aesthetic perception in consumer photographic images: clutter and center," SPIE Conference Series, 2011, p. 5.
[2] P. Obrador, L. Schmidt-Hackenberg, and N. Oliver, "The role of image composition in image aesthetics," in Proc. IEEE ICIP, 2010.
[3] ITU-R Recommendation BT.500-11, "Methodology for the subjective assessment of the quality of television pictures," Geneva, 2002.
[4] J. A. Redi, "Visual quality beyond artifact visibility," in IS&T/SPIE Electronic Imaging, 2013, 86510N-86510N-11.

This research group is part of the Multimedia Computing Group based at the Technical University of Delft.