A visual analytic tool for multicriterial retrieval in large databases

May 21st, 2010 | Category: Featured Articles, News

The field of visual analytics has opened new directions for smart data analysis and knowledge discovery. Utilizing rich and novel visualization methods and appropriate interaction mechanisms, visual analytics technologies exploit the perceptual abilities and intuition of humans, capacities that have not been matched by machines yet. CERTH / ITI has an active interest in the field of visual analytics and is investing effort into developing relevant technologies.

This article presents some of the current research conducted at CERTH / ITI in the field of visual analytics. In particular, it presents a tool for retrieving search results from large databases according to multiple criteria. The tool is currently used to retrieve items from 3D object databases, but it could be used in any retrieval application, especially where the database is organized hierarchically and multiple search criteria are relevant.

The rest of this article is organized as follows. First, a brief review of the use of visual analytics technologies in retrieval systems is provided. Subsequently, the features used for retrieval of 3D items are discussed. Then the developed multicriterial retrieval interface is described. Finally, a discussion of possible extensions and plans concludes the article.

Visual analytics for search and retrieval

Interesting work in the fields of visual analytics and information visualization on rich representation of search results and advanced interaction mechanisms has recently appeared. An example is ResultMaps [CDF09], which represents the hierarchical structure of search results using treemaps. [GPL*09] presents a tool that clusters images returned by Google’s image search and interacts with the user to determine the most relevant search results.

Similar applications can be found in [FGL*08] and [FKG*09], where the developed tools provide a visual summary of a large number of images retrieved from Flickr, in order to assist search and to provide helpful recommendations to the user. The novelty of the tool developed by CERTH / ITI in comparison to the above is that it gives the user a compact representation of the relevance of search results according to multiple criteria simultaneously.

3D object feature extraction

As discussed, the tool has so far been used for retrieval of 3D items. In 3D item retrieval, a set of descriptors is extracted from each item in the database and used to rank the items according to their similarity to a query item. The literature offers a multitude of methods for extracting geometric features with rich discriminative power. CERTH / ITI has formulated descriptors with high discriminative power that have given strong results in retrieval tasks.
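
The ranking step described above can be sketched in a few lines. This is only an illustration: the item names and three-component descriptor vectors are hypothetical, and Euclidean distance is an assumption, since the article does not name the similarity measure used.

```python
import math

def rank_by_similarity(query, database):
    """Return item names ordered from most to least similar to the query.

    `database` maps an item name to its descriptor vector. Euclidean
    distance is an assumption made for this sketch.
    """
    return sorted(database, key=lambda name: math.dist(query, database[name]))

# Hypothetical descriptors, for illustration only.
database = {
    "vase_01": [0.9, 0.1, 0.3],
    "chair_07": [0.2, 0.8, 0.5],
    "vase_02": [0.8, 0.2, 0.3],
}
ranking = rank_by_similarity([0.88, 0.1, 0.3], database)
print(ranking)  # ['vase_01', 'vase_02', 'chair_07']
```

Real shape descriptors have far more components, but the ranking logic stays the same.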

This tool uses a robust geometric descriptor that resulted from previous work at CERTH / ITI: hybrid Krawtchouk moments [ZDA*07] [MDTS09] combined with the spherical trace transform. In addition, a set of more intuitive geometric descriptors is used as additional search criteria: the volume, the ratio of the volume to the volume of the convex hull, and the surface area of a 3D item.
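
To make the intuitive descriptors concrete, the sketch below computes the volume and surface area of a closed triangle mesh, and the volume-to-hull ratio once a hull volume is available. It assumes a watertight, consistently oriented mesh. Computing the convex hull itself is left to a geometry library and is not shown. The function names and the unit-cube test mesh are illustrative and are not taken from the tool.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def mesh_volume(vertices, triangles):
    """Volume of a closed mesh: sum of signed tetrahedra against the origin."""
    signed = sum(dot(vertices[i], cross(vertices[j], vertices[k]))
                 for i, j, k in triangles) / 6.0
    return abs(signed)

def mesh_surface_area(vertices, triangles):
    """Sum of triangle areas (half the norm of each edge cross product)."""
    total = 0.0
    for i, j, k in triangles:
        n = cross(sub(vertices[j], vertices[i]), sub(vertices[k], vertices[i]))
        total += 0.5 * math.sqrt(dot(n, n))
    return total

def hull_ratio(volume, hull_volume):
    """Ratio of item volume to convex-hull volume (1.0 for a convex item)."""
    return volume / hull_volume

# Unit cube, two outward-facing triangles per face.
CUBE_VERTICES = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
                 (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
CUBE_TRIANGLES = [(0, 2, 1), (0, 3, 2), (4, 5, 6), (4, 6, 7),
                  (0, 1, 5), (0, 5, 4), (3, 7, 6), (3, 6, 2),
                  (0, 4, 7), (0, 7, 3), (1, 2, 6), (1, 6, 5)]

volume = mesh_volume(CUBE_VERTICES, CUBE_TRIANGLES)      # 1.0
area = mesh_surface_area(CUBE_VERTICES, CUBE_TRIANGLES)  # 6.0
print(volume, area, hull_ratio(volume, 1.0))
```

A cube is its own convex hull, so its ratio is 1. Items with cavities or thin protrusions give values below 1.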

Visual representation and interactive search

The developed interface uses a treemap to visualize the hierarchical structure of the items in the database, enabling the user to visually identify relevant groups of data for further inspection. Colors and layout are chosen so that the relevance of the items according to the set of criteria is represented without clutter. In addition, interaction mechanisms for managing the display of results and for refining the search have been developed.

Figure 1: The complete search interface. A search by item is performed and the 10 top-ranked items are displayed. It can clearly be seen that two items that belong to the subgroup “vase” are the most relevant to the query based on all features. Furthermore, it appears that two other subgroups of items are highly relevant to the query, at least according to the geometrical descriptors.

The squarified layout [BHvW99] was chosen for the treemap so that each item is represented by a shape that is as uniform (close to square) as possible. The Princeton benchmark database, which consists of 905 objects in 35 categories and 92 subcategories, is used. The search interface and treemap can be seen in Fig. 1. Groups and subgroups of items are indicated by thicker black and grey lines respectively.
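
For readers unfamiliar with the squarified layout, the following is a minimal single-level sketch of the row-building idea from [BHvW99]. It is not the tool's implementation. It assumes the areas are sorted in descending order and sum to the area of the target rectangle, and the function names are illustrative.

```python
def worst_ratio(row, side):
    """Worst aspect ratio among the areas in `row` laid along `side`."""
    total = sum(row)
    return max(max(side * side * a / (total * total),
                   total * total / (side * side * a)) for a in row)

def squarify(areas, x, y, width, height):
    """Return one (x, y, w, h) rectangle per area, in input order."""
    rects, row, remaining = [], [], list(areas)

    def flush():
        # Place the current row along the shorter side of the free region.
        nonlocal x, y, width, height
        total = sum(row)
        if width >= height:            # row becomes a column at the x origin
            col_w = total / height
            cy = y
            for a in row:
                rects.append((x, cy, col_w, a / col_w))
                cy += a / col_w
            x, width = x + col_w, width - col_w
        else:                          # row becomes a strip at the y origin
            row_h = total / width
            cx = x
            for a in row:
                rects.append((cx, y, a / row_h, row_h))
                cx += a / row_h
            y, height = y + row_h, height - row_h
        row.clear()

    while remaining:
        side = min(width, height)
        candidate = row + [remaining[0]]
        # Keep growing the row while the worst aspect ratio does not get worse.
        if not row or worst_ratio(candidate, side) <= worst_ratio(row, side):
            row.append(remaining.pop(0))
        else:
            flush()
    if row:
        flush()
    return rects

areas = [6, 6, 4, 3, 2, 2, 1]          # example areas; they sum to 6 * 4
for rect in squarify(areas, 0.0, 0.0, 6.0, 4.0):
    print(rect)
```

A hierarchical treemap such as the one described here would apply the same routine recursively, laying out groups first and then the subgroups and items inside each group's rectangle.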

A useful feature of the treemap is that, for each subgroup, it displays within the treemap itself a 2D view of the item most relevant to the current query, thereby improving the user’s perceptual processing of the results. It is important to stress that the displayed item changes dynamically with the query rather than being static (for instance, always displaying the median object), which provides a better clue about the relevance of each group of items.
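
Choosing the thumbnail shown for each subgroup amounts to selecting, per subgroup, the item with the highest relevance to the current query. A minimal sketch follows, with hypothetical item names and relevance scores where higher means more relevant:

```python
def subgroup_representatives(subgroup_of, relevance):
    """Map each subgroup to its most relevant item for the current query."""
    best = {}
    for item, subgroup in subgroup_of.items():
        current = best.get(subgroup)
        if current is None or relevance[item] > relevance[current]:
            best[subgroup] = item
    return best

# Hypothetical data: the representative changes when the query changes.
subgroup_of = {"vase_01": "vase", "vase_02": "vase", "chair_07": "chair"}
query_a = {"vase_01": 0.9, "vase_02": 0.4, "chair_07": 0.2}
query_b = {"vase_01": 0.1, "vase_02": 0.7, "chair_07": 0.3}

print(subgroup_representatives(subgroup_of, query_a))  # vase -> vase_01
print(subgroup_representatives(subgroup_of, query_b))  # vase -> vase_02
```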

The interface presents the relevance of items according to the four features mentioned previously; in addition, an aggregate ranking is displayed. There are two search options. The first is search by value, where range bars are used to filter the displayed results according to the values that the user has provided. The distributions of the relevant features are displayed above the range bars to give the user a hint about the number of items that will be retrieved. The other option is search by item, where the features are automatically extracted from the query item and used to rank the items in the database according to their similarity to the query item’s features. When searching by item, the user can choose how many top-ranked results are displayed by manipulating the rank limit bar for each of the features.
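
The two search modes can be sketched as a range filter and a per-feature top-k ranking. For simplicity the sketch assumes every feature is a single number (the shape descriptor in the tool is a vector), and all names and values are hypothetical.

```python
def filter_by_value(items, ranges):
    """Search by value: keep items whose features all lie inside the ranges.

    `ranges` maps a feature name to a (low, high) pair set by a range bar.
    """
    return [name for name, feats in items.items()
            if all(low <= feats[f] <= high for f, (low, high) in ranges.items())]

def top_ranked(items, query, feature, limit):
    """Search by item: the `limit` items closest to the query on one feature."""
    by_distance = sorted(items, key=lambda n: abs(items[n][feature] - query[feature]))
    return by_distance[:limit]

# Hypothetical scalar features, for illustration only.
items = {
    "a": {"volume": 1.0, "area": 6.0},
    "b": {"volume": 2.5, "area": 11.0},
    "c": {"volume": 4.0, "area": 16.0},
}
print(filter_by_value(items, {"volume": (2.0, 5.0), "area": (0.0, 12.0)}))  # ['b']
print(top_ranked(items, {"volume": 2.0}, "volume", 2))                     # ['b', 'a']
```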

Different colors indicate the relevance or value of different features: the intensity of each color is proportional to the value or to the ranking, depending on the search mode. For instance, when searching by value, the user can specify a range of values of interest for each feature.
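
One way to realize such an intensity coding in search-by-value mode is to blend each feature's base color with white, with items outside the selected range drawn in plain white. The base color, the linear blend and the minimum intensity of 0.2 (so that in-range items never look white) are assumptions of this sketch, not details given in the article.

```python
WHITE = (255, 255, 255)
MIN_INTENSITY = 0.2  # assumed floor so in-range items stay distinguishable from white

def shade(value, low, high, base_rgb):
    """Return an RGB color whose intensity grows with `value` inside [low, high]."""
    if not (low <= value <= high):
        return WHITE                      # out-of-range items are coded white
    span = (value - low) / (high - low) if high > low else 1.0
    t = MIN_INTENSITY + (1.0 - MIN_INTENSITY) * span
    # Blend from white (t = 0) towards the feature's base color (t = 1).
    return tuple(round(255 + t * (c - 255)) for c in base_rgb)

red = (255, 0, 0)                         # hypothetical color for one feature
print(shade(10, 0, 10, red))              # (255, 0, 0): highest value, full intensity
print(shade(0, 0, 10, red))               # (255, 204, 204): lowest in-range value
print(shade(11, 0, 10, red))              # (255, 255, 255): outside the range
```

In search-by-item mode the same function could be fed a closeness score instead of the raw feature value.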

Items with larger feature values are displayed with higher-intensity colors, whereas items with lower values are displayed with lower-intensity colors. Items with feature values outside the specified range are coded with white. Similarly, when searching by item, the items whose feature values are closest to those of the query item are displayed with higher-intensity colors for the corresponding features. The user can choose which features are displayed, in order to spot relevant items according to the set of features that they deem important. Different arrangements, such as horizontal and vertical strips and adjacent rectangles, were tried; however, they were found to create confusion about which features belong to each item and were therefore abandoned.

Instead, concentric rectangles were found to maintain the cohesiveness of the visual representation of each item and were therefore chosen. An appropriate choice of color ranges for each feature ensures that different items are easily compared regardless of which features the user has chosen to display. With this representation, relevant items can be easily spotted when they show all the colors that correspond to the criteria that the user has made visible. Moreover, groups of items that are similar in terms of some feature may be identified as regions where the color of that feature is present. Additionally, the user can set the weight w_k given to the ranking of each criterion when the aggregate ranking is computed using Borda’s algorithm. This allows the user to see the relevance according to different criteria simultaneously and, at the same time, to control the aggregation in order to retrieve the most important results.
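
As an illustration of how per-criterion weights can steer the aggregate ranking, the sketch below implements a weighted Borda count. The article does not give the exact scoring used by the tool, so the point scheme (an item at zero-based position p in a ranking of n items earns n - 1 - p points, scaled by the criterion's weight) and the alphabetical tie-break are assumptions. The criterion and item names are hypothetical.

```python
def weighted_borda(rankings, weights):
    """Aggregate per-criterion rankings (best first) into one ranking.

    `rankings` maps a criterion to a list of item names, best first.
    `weights` maps a criterion to its weight w_k.
    """
    scores = {}
    for criterion, ranking in rankings.items():
        n = len(ranking)
        for position, item in enumerate(ranking):
            points = weights[criterion] * (n - 1 - position)
            scores[item] = scores.get(item, 0.0) + points
    # Highest score first; ties are broken alphabetically (an assumption).
    return sorted(scores, key=lambda item: (-scores[item], item))

rankings = {
    "descriptor": ["vase_01", "vase_02", "chair_07"],
    "volume": ["vase_02", "vase_01", "chair_07"],
    "area": ["chair_07", "vase_02", "vase_01"],
}
equal = {"descriptor": 1.0, "volume": 1.0, "area": 1.0}
shape_first = {"descriptor": 3.0, "volume": 1.0, "area": 1.0}

print(weighted_borda(rankings, equal))        # ['vase_02', 'vase_01', 'chair_07']
print(weighted_borda(rankings, shape_first))  # ['vase_01', 'vase_02', 'chair_07']
```

Raising the weight of the shape descriptor moves vase_01 to the top, which is the kind of control over the aggregation that the weights are meant to give.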

In addition, a list displays the 10 most relevant results according to the aggregated ranking. Apart from the name of the item, the names of its group and subgroup are also displayed. This list is linked to the treemap: when an item in the list is highlighted, the corresponding item in the treemap is also highlighted, and vice versa. If a highlighted item in the treemap is not among the top 10 in the ranking, its details are displayed separately. The list is an important feature, as it provides subtle details that cannot be easily incorporated in the visualization. For instance, it is difficult to distinguish the ranks of closely ranked items based solely on the treemap. Moreover, the list provides more details about the groupwise coherence of the top results.

Evaluation and possible improvements

A treemap-based interface for visualization of search results according to multiple criteria has been presented. A merit of this method is that the relevance of items according to different search parameters can be seen at a glance. Moreover, important groups of items can be identified immediately thanks to the hierarchical structure of the treemap. A critical factor in achieving this has been the choice of the concentric rectangles layout: compared to alternative arrangements, it improves the clarity of the visual representation and maintains the visual cohesiveness of the features of each item.
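
The geometry of the concentric rectangles layout can be sketched as follows: given the treemap cell of one item and the number of visible features, each feature gets a nested rectangle. Equal, proportional insets are an assumption of this sketch; the article does not specify how the rings are sized.

```python
def concentric_rects(x, y, w, h, n_features):
    """Return n nested (x, y, w, h) rectangles, outermost first, one per feature."""
    step_x = w / (2 * n_features)
    step_y = h / (2 * n_features)
    return [(x + i * step_x, y + i * step_y,
             w - 2 * i * step_x, h - 2 * i * step_y)
            for i in range(n_features)]

# A hypothetical 8 x 4 treemap cell showing four features.
for rect in concentric_rects(0.0, 0.0, 8.0, 4.0, 4):
    print(rect)
# (0.0, 0.0, 8.0, 4.0), (1.0, 0.5, 6.0, 3.0), (2.0, 1.0, 4.0, 2.0), (3.0, 1.5, 2.0, 1.0)
```

Drawing the rectangles in this order, each filled with its feature's color, leaves a visible ring per feature around the innermost rectangle.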

Nevertheless, this arrangement is appropriate only for a small number of features or criteria, because additional concentric rectangles lead to visual clutter. A different choice of colors, or a different arrangement of features within the representation of a single item, may be necessary to make this approach suitable for problems with more criteria. A possible extension could be the use of a 3D treemap [CHW09], with the third dimension being used to display the relevance according to the set of criteria at hand. In addition, the implemented system is appropriate only for small databases where all items can be presented simultaneously. Currently, a 3D interface for richer representation of the information and with more advanced interaction mechanisms is being built. This interface is not based on a treemap; however, the system allows hierarchical navigation through the results, with aggregated relevance scores displayed at each level in order to guide the user in searching larger databases.

Georgios Petkos, Vasilios Darlagiannis, Konstantinos Moustakas and Dimitrios Tzovaras

Bibliography

  • [BHvW99] Bruls M., Huizing K., van Wijk J.: Squarified treemaps. In Proceedings of the Joint Eurographics and IEEE TCVG Symposium on Visualization (1999), Press, pp. 33–42.
  • [CDF09] Clarkson E., Desai K., Foley J.: Resultmaps: Visualization for search interfaces. Visualization and Computer Graphics, IEEE Transactions on 15, 6 (Nov.-Dec. 2009), 1057–1064.
  • [CHW09] Chaudhuri A., Shen H.-W.: A self-adaptive treemap-based technique for visualizing hierarchical data in 3d. pp. 105–112.
  • [FGL*08] Fan J., Gao Y., Luo H., Keim D. A., Li Z.: A novel approach to enable semantic and visual image summarization for exploratory image search. In MIR ’08: Proceeding of the 1st ACM international conference on Multimedia information retrieval (New York, NY, USA, 2008), ACM, pp. 358–365.
  • [FKG*09] Fan J., Keim D. A., Gao Y., Luo H., Li Z.: Justclick: personalized image recommendation via exploratory search from large-scale flickr images. IEEE Transactions on Circuits and Systems for Video Technology 19, 2 (2009), 273–288.
  • [GPL*09] Gao Y., Peng J., Luo H., Keim D., Fan J.: An interactive approach for filtering out junk images from keyword-based google search results. IEEE Transactions on Circuits and Systems for Video Technology 19, 12 (December 2009), 1851–1865.
  • [MDTS09] Mademlis A., Daras P., Tzovaras D., Strintzis M. G.: 3d object retrieval using the 3d shape impact descriptor. Pattern Recognition 42, 11 (2009), 2447–2459.
  • [ZDA*07] Zarpalas D., Daras P., Axenopoulos A., Tzovaras D., Strintzis M. G.: 3d model search and retrieval using the spherical trace transform. EURASIP J. Appl. Signal Process. 2007, 1 (2007), 207–207.