On this page, we provide our implementation of the baseline system used in our CVPR'14 papers. If you use any part of the code in your research, please cite either of the following papers.
The system is built on the Bag-of-Words (BoW) model [1], with a number of improvements [2][3][4][5] added on top. A step-by-step implementation is provided below.
1. Codebook Generation code
Following [2], the codebook is trained on an independent dataset, the Flickr60k dataset [2], which can be found on this page. Approximate K-Means (AKM) [1] clustering is used, and the codebook size is set to 20k ([6] shows that 65k is better). We use the FLANN library [7] for Approximate Nearest Neighbor (ANN) computations. We provide the pre-compiled mex64 file as well as the MATLAB code here. Note that this code should be run on 64-bit machines.
Notes:
1) It is OK to run AKM for only a couple of rounds, say, 10 to 20; the search performance remains stable. To train a large codebook such as 1M, 2-5 rounds are enough.
2) During AKM, the ANN precision is set to 0.70 for speed. A higher precision may lead to a finer codebook, at the cost of slower AKM.
3) If you cannot download the code from Google Drive, here is an alternative link.
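The clustering loop described above can be sketched as follows. This is a minimal illustration in plain Python, not the released MATLAB code: it uses exact nearest-neighbor search where AKM would call FLANN for approximate search, and the names `train_codebook`, `nearest`, and `sq_dist` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the closest center (exact search; AKM would use ANN here)."""
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))


def train_codebook(descriptors, k, rounds=10, seed=0):
    """k-means: alternate between assignment and center update."""
    rng = random.Random(seed)
    # Initialize centers with k distinct training descriptors.
    centers = [list(d) for d in rng.sample(descriptors, k)]
    dim = len(descriptors[0])
    for _ in range(rounds):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for d in descriptors:
            j = nearest(d, centers)
            counts[j] += 1
            for i, v in enumerate(d):
                sums[j][i] += v
        for j in range(k):
            if counts[j]:  # keep the old center if a cluster is empty
                centers[j] = [s / counts[j] for s in sums[j]]
    return centers
```

With the assignment step replaced by an approximate search, the cost of each round drops sharply for large codebooks, which is the reason for the 0.70 precision setting mentioned in note 2.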
2. Quantization code
Again, the FLANN library is used for feature quantization. The code is very similar to that for codebook generation, except that the target precision should be higher (0.98 in the code).
Code in Google Drive is here, and in Baidu Pan is here.
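To make the quantization step concrete, here is a small sketch in plain Python. It assumes exact search in place of the FLANN-based approximate search used in the released code, and the function names `quantize` and `bow_histogram` are hypothetical. The `ma` argument returns several nearest visual words per descriptor, which is one way multiple assignment could be realized.

```python
import heapq
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(descriptor, codebook, ma=1):
    """Indices of the `ma` nearest visual words, closest first."""
    return heapq.nsmallest(
        ma, range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k])
    )


def bow_histogram(descriptors, codebook):
    """Count how often each visual word is the nearest one in an image."""
    return Counter(quantize(d, codebook)[0] for d in descriptors)
```

For example, `quantize(d, codebook, ma=3)` would give the three candidate words for one descriptor `d`.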
3. Hamming Embedding code (Bug fixed on 01/31/2015. Thank you, Kelvin, Zhun, and Yanzhang!)
We implement the Hamming Embedding (HE) [2] algorithm. Two steps are involved here.
1) Threshold training on Flickr60k data.
2) Generation of 128-bit binary signatures given SIFT features in an image.
Code in Google Drive is here, and in Baidu Pan is here.
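The two steps above can be sketched as follows, under some assumptions: the projection matrix `proj` is supplied by the caller (HE [2] uses a random orthogonal matrix), thresholds are taken as per-word, per-bit medians of the projected training descriptors, and a signature is packed into a Python integer. All function names are hypothetical and this is not the released MATLAB code.

```python
from statistics import median


def project(proj, desc):
    """Multiply the projection matrix (list of rows) by a descriptor."""
    return [sum(p * x for p, x in zip(row, desc)) for row in proj]


def train_thresholds(proj, descs, words, num_words):
    """Per-word, per-bit medians of the projected training descriptors."""
    buckets = [[] for _ in range(num_words)]
    for d, w in zip(descs, words):
        buckets[w].append(project(proj, d))
    nbits = len(proj)
    thresholds = []
    for bucket in buckets:
        if bucket:
            thresholds.append([median(col) for col in zip(*bucket)])
        else:
            # No training data for this word: placeholder thresholds.
            thresholds.append([0.0] * nbits)
    return thresholds


def signature(proj, desc, word, thresholds):
    """Binary signature as an int: bit i is 1 if component i exceeds its median."""
    sig = 0
    for i, (v, t) in enumerate(zip(project(proj, desc), thresholds[word])):
        if v > t:
            sig |= 1 << i
    return sig
```

With a 128-row `proj`, `signature` yields a 128-bit code, and two codes can be compared with `bin(a ^ b).count("1")`.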
4. Search with BoW and HE code
We provide code for both the BoW baseline and HE on the Holidays dataset. The parameter settings are: codebook size = 20k, HE code length = 128 bits, HE parameters = 52 and 26, respectively, and multiple assignment (MA) = 3. We also apply rootSIFT [3], burstiness weighting [5], and avgIDF [4] to improve performance.
Results: for BoW, mAP = 49.67%; for HE, mAP = 81.75%.
Note: the SIFT descriptors are downloaded from the Holidays webpage.
Code in Google Drive is here, and in Baidu Pan is here.
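As a rough illustration of two ingredients of this step, the sketch below shows rootSIFT [3] and an HE-based match weight. The text does not say which HE parameter is which, so this sketch assumes that 52 is the Hamming distance threshold and 26 is the bandwidth of a Gaussian weighting; the exact weighting form and the function names are also assumptions rather than the released code.

```python
import math


def root_sift(desc):
    """rootSIFT: L1-normalize a non-negative SIFT vector, then take the square root."""
    total = sum(desc)
    if total == 0:
        return [0.0] * len(desc)
    return [math.sqrt(v / total) for v in desc]


def hamming(a, b):
    """Hamming distance between two signatures stored as ints."""
    return bin(a ^ b).count("1")


def he_weight(sig_q, sig_db, threshold=52, sigma=26):
    """Weight of a match between two features quantized to the same visual word.

    Assumed form: reject if the distance exceeds `threshold`,
    otherwise weight by exp(-d^2 / sigma^2).
    """
    d = hamming(sig_q, sig_db)
    if d > threshold:
        return 0.0
    return math.exp(-(d * d) / float(sigma * sigma))
```

An image score would then be accumulated over all feature pairs that share a visual word, with IDF and burstiness weighting applied on top.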
5. Search with HSV global feature code
As stated in our CVPR'14 paper, in addition to the BoW representation, we use a 1000-dim rootHSV histogram as a global feature, and perform global image search on Ukbench as an example. The code in Google Drive is here, and in Baidu Pan is here.
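A rootHSV histogram of this kind could be computed roughly as follows. The text only gives the dimensionality, so the 10 x 10 x 10 binning, the dot-product similarity, and the function names are assumptions made for this sketch; the "root" step mirrors rootSIFT (L1 normalization followed by an element-wise square root).

```python
import colorsys
import math


def root_hsv_histogram(pixels, bins=(10, 10, 10)):
    """Histogram over HSV bins, L1-normalized and square-rooted.

    `pixels` is an iterable of (r, g, b) tuples with values in 0..255.
    """
    hb, sb, vb = bins
    hist = [0.0] * (hb * sb * vb)
    n = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Clamp so that a value of exactly 1.0 falls into the last bin.
        hi = min(int(h * hb), hb - 1)
        si = min(int(s * sb), sb - 1)
        vi = min(int(v * vb), vb - 1)
        hist[(hi * sb + si) * vb + vi] += 1.0
        n += 1
    if n == 0:
        return hist
    return [math.sqrt(c / n) for c in hist]


def similarity(h1, h2):
    """Dot product of two rootHSV vectors (1.0 for identical histograms)."""
    return sum(a * b for a, b in zip(h1, h2))
```

Database images can then be ranked by `similarity` to the query histogram.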
If you have any problems with the code or any suggestions, please email me at zheng-l06@mails.tsinghua.edu.cn.
References
[1] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, "Object Retrieval with Large Vocabularies and Fast Spatial Matching", In: CVPR, 2007.
[2] H. Jegou, M. Douze, and C. Schmid, "Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search", In: ECCV, 2008.
[3] R. Arandjelovic and A. Zisserman, "Three Things Everyone Should Know to Improve Object Retrieval", In: CVPR, 2012.
[4] L. Zheng, S. Wang, Z. Liu, and Q. Tian, "Lp-norm IDF for Large Scale Image Search", In: CVPR, 2013.
[5] H. Jegou, M. Douze, and C. Schmid, "On the Burstiness of Visual Elements", In: CVPR, 2009.
[6] G. Tolias, Y. Avrithis, and H. Jegou, "To Aggregate or not to Aggregate: Selective Match Kernels for Image Search", In: ICCV, 2013.
[7] M. Muja and D. G. Lowe, "Scalable Nearest Neighbor Algorithms for High Dimensional Data", In: IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 36, 2014.