Abstract
A basic problem in large collections of documents is finding similar items, whether for search, recommendation, or other purposes. Traditionally, techniques for this have relied only on the text in a document, since analysing images has been difficult and costly. Recently, however, new methods for image analysis based on deep convolutional neural networks have made it possible to classify and compare images efficiently, which opens up the use of images as an additional signal when comparing documents. In this work we explore methods for comparing classified ads in the Norwegian online marketplace finn.no, utilizing both the images and the text in the ads — so-called multi-modal comparison. We evaluate two methods that combine images and text for comparing classified ads: one based on representing images and text with features from a deep convolutional neural network and a topic model, respectively, and one based on combining classifier output via a separately learned word-embedding model that retains "semantic" similarity between words. We compare these two methods against baseline methods that use only text or only images.