State of the art on the MARS dataset
We summarize state-of-the-art methods on the MARS dataset, reporting mAP together with rank-1, rank-5, and rank-20 accuracy. These are not the only relevant performance measures; other metrics, such as recognition time, are also important. Please contact me at liangzheng06@gmail.com.
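For readers unfamiliar with the two measures, the following is a minimal sketch of how rank-k accuracy and mAP can be computed from ranked retrieval results. It is not the official MARS evaluation code: the function names and the toy data are illustrative assumptions, each query is assumed to arrive as a list of match flags already sorted by ascending distance, and junk gallery entries (for example, same identity under the same camera) are assumed to have been removed beforehand. It uses the plain non-interpolated average precision, which may differ slightly from the official implementation.

```python
from typing import List, Sequence


def rank_k_accuracy(ranked_matches: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of queries with at least one correct match in the top k."""
    hits = sum(1 for flags in ranked_matches if any(flags[:k]))
    return hits / len(ranked_matches)


def average_precision(flags: Sequence[bool]) -> float:
    """Non-interpolated AP: mean of the precision at each correct match."""
    hits = 0
    precisions: List[float] = []
    for position, is_match in enumerate(flags, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / position)
    # A query with no correct match in the gallery contributes 0 (assumption).
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(ranked_matches: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision."""
    return sum(average_precision(f) for f in ranked_matches) / len(ranked_matches)


# Hypothetical toy data: two queries, three gallery items each,
# already sorted from nearest to farthest.
toy = [
    [True, False, True],   # first correct match at rank 1
    [False, False, True],  # first correct match at rank 3
]

print(rank_k_accuracy(toy, 1))       # 0.5
print(rank_k_accuracy(toy, 3))       # 1.0
print(mean_average_precision(toy))   # 7/12, about 0.583
```

The functions return fractions in [0, 1]; the figures reported on this page are the same quantities expressed as percentages.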
Results on MARS (all values in %). An empty Reference cell means the row belongs to the reference above it.

| Reference | rank-1 | rank-5 | rank-20 | mAP | Notes |
| --- | --- | --- | --- | --- | --- |
| "MARS: A Video Benchmark for Large-Scale Person Re-identification", Liang Zheng, Zhi Bie, Yifan Sun, Jingdong Wang, Chi Su, Shengjin Wang, Qi Tian, ECCV 2016 | 2.6 | 6.4 | 12.4 | 0.8 | HOG3D [1] + KISSME [2], Euclidean distance, single query. |
| | 1.2 | 2.8 | 7.4 | 0.4 | GEI [3] + KISSME [2], single query. |
| | 18.6 | 33.0 | 45.9 | 8.0 | HistLBP [4] + XQDA [5], single query. |
| | 30.6 | 46.2 | 59.2 | 15.5 | BoW [6] + KISSME [2], single query. |
| | 60.0 | 77.9 | 87.9 | 42.4 | IDE, average pooling, Euclidean distance, single query. |
| | 65.0 | 81.1 | 88.9 | 45.6 | IDE + KISSME, max pooling, Euclidean distance, single query. |
| | 68.3 | 82.6 | 89.4 | 49.3 | IDE + KISSME, max pooling, Euclidean distance, multiple query. |

Current state of the art

| Reference | rank-1 | rank-5 | rank-20 | mAP | Notes |
| --- | --- | --- | --- | --- | --- |
| "Learning Compact Appearance Representation for Video-based Person Re-Identification", Wei Zhang, Shengnan Hu, Kan Liu, arXiv 2017 | 55.5 | 70.2 | 80.2 | - | A frame selection step is used before feature pooling. |
| "Multi-Target Tracking in Multiple Non-Overlapping Cameras using Constrained Dominant Sets", Yonatan Tariku Tesfaye, Eyasu Zemene, Andrea Prati, Marcello Pelillo, Mubarak Shah, arXiv 2017 | 68.22 | - | - | - | The constrained dominant sets clustering (CDSC) method is proposed. |
| "Re-ranking Person Re-identification with k-reciprocal Encoding", Zhun Zhong, Liang Zheng, Donglin Cao, Shaozi Li, CVPR 2017 | 67.78 | - | - | 57.98 | IDE (CaffeNet) + re-ranking, single query. |
| | 73.94 | - | - | 68.45 | IDE (ResNet50) + re-ranking, single query. |
| "Learning Deep Context-aware Features over Body and Latent Parts for Person Re-identification", Dangwei Li, Xiaotang Chen, Zhang Zhang, Kaiqi Huang, CVPR 2017 | 71.77 | 86.57 | 93.08 | 56.05 | Using the fine-tuned TriNet and Euclidean distance, single query. |
| | 83.03 | 93.69 | 97.63 | 66.43 | TriNet + re-ranking [7]. |
| "See the forest for the trees: Joint spatial and temporal recurrent neural networks for video-based person re-identification", Zhen Zhou, Yan Huang, Wei Wang, Liang Wang, Tieniu Tan, CVPR 2017 | 70.6 | 90.0 | 97.6 | 50.7 | Single query. Handles both spatial and temporal information. |
| "Quality Aware Network for Set to Set Recognition", Yu Liu, Junjie Yan, Wanli Ouyang, CVPR 2017 | 73.74 | 84.90 | 91.62 | 51.70 | P-QAN (GoogLeNet), single query. Numbers were provided by the authors and are not reported in the paper. |
| "In Defense of the Triplet Loss for Person Re-Identification", Alexander Hermans, Lucas Beyer, Bastian Leibe, arXiv 2017 | 79.80 | 91.36 | - | 67.70 | Using the fine-tuned TriNet and Euclidean distance, single query. |
| | 81.21 | 90.76 | - | 77.43 | TriNet + re-ranking [7]. |

Works that use the dataset for training but do not report results, or that use a different evaluation protocol

| Reference | rank-1 | rank-5 | rank-20 | mAP | Notes |
| --- | --- | --- | --- | --- | --- |
| "Simple Online and Realtime Tracking with a Deep Association Metric", Nicolai Wojke, Alex Bewley, Dietrich Paulus, arXiv 2017 | - | - | - | - | The CNN model is trained on MARS. |
| "Jointly Attentive Spatial-Temporal Pooling Networks for Video-based Person Re-Identification", Shuangjie Xu, Yu Cheng, Kang Gu, Yang Yang, Shiyu Chang, Pan Zhou, ICCV 2017 | 44 | 70 | 81 | - | Single query. Joint Spatial and Temporal Attention Pooling Network. The evaluation protocol differs from the original one. |

References
[1] Kläser, A., Marszałek, M., Schmid, C.: A spatio-temporal descriptor based on 3D-gradients. In: BMVC (2008)
[2] Köstinger, M., Hirzer, M., Wohlhart, P., Roth, P.M., Bischof, H.: Large scale metric learning from equivalence constraints. In: CVPR. pp. 2288–2295 (2012)
[3] Han, J., Bhanu, B.: Individual recognition using gait energy image. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(2), 316–322 (2006)
[4] Xiong, F., Gou, M., Camps, O., Sznaier, M.: Person re-identification using kernel-based metric learning methods. In: ECCV (2014)
[5] Liao, S., Hu, Y., Zhu, X., Li, S.Z.: Person re-identification by local maximal occurrence representation and metric learning. In: CVPR (2015)
[6] Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: A benchmark. In: ICCV (2015)
[7] Zhong, Z., Zheng, L., Cao, D., Li, S.: Re-ranking person re-identification with k-reciprocal encoding. In: CVPR (2017)