This project was inspired from Mahalanobis metric learning which estimates a low rank linear projection of source data, which results in desired distances among the items in a given database.
Instead of generating a dense low rank projection, we aimed at generating a projection to a larger (sub) space, with the constraint of having sparse projections. Such a representation has a much more potential than a low rank projection because it represents a union of subspaces instead of a single subspace.
However the optimization problem is essentially non-linear, hence we opted to use fully connected neural nets with stochastic gradient descent for learning. The resulting algorithm is also extendible to have multiple layers to have a deep network.
The additional benefit of this approach is that it can also be used to learn a low rank Mahalanobis metric projection without a low rank condition during the optimization, hence have a very efficient and parallalizable optimization scheme.
This work was also my first project for which I used python and numpy extensively. I also used the scikit-learn and theano libraries for implementing the (one or multi-layer) neaural network and the stochastic gradient descent algorithm for training it.
One of the obstacles we have encountered during this project was not having a large enough database to train a neural net for image retrieval. Hence we created our own!
This dataset was generated from images taken from publicly available CNN news footage videos, and labeled according to their similarity. It is the largest database for image retrieval as of 2016. The database is available upon request from the website of Patrick Pérez upon request.