State of the art on the MARS dataset

We summarize state-of-the-art methods on the MARS dataset and report mAP together with rank-1, rank-5, and rank-20 accuracy. These are not the only relevant performance measures; other metrics, such as recognition time, are also important. Please contact me at liangzheng06@gmail.com to add or correct an entry.
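
Rank-k accuracy (the cumulative match characteristic, CMC) and mAP are both computed from a query-by-gallery distance matrix. The following is a minimal pure-Python sketch of single-query evaluation. It assumes a Market-1501-style protocol in which gallery tracklets that share both the identity and the camera of the query are ignored. The function name `evaluate` and its arguments are hypothetical, and official evaluation scripts may approximate AP slightly differently.

```python
from typing import List, Sequence, Tuple


def evaluate(
    dist: Sequence[Sequence[float]],
    q_ids: Sequence[int],
    q_cams: Sequence[int],
    g_ids: Sequence[int],
    g_cams: Sequence[int],
    ranks: Sequence[int] = (1, 5, 20),
) -> Tuple[List[float], float]:
    """Return (rank-k accuracy for each k in `ranks`, mAP), both in percent.

    dist[i][j] is the distance between query i and gallery item j.
    """
    hits = [0] * len(ranks)
    ap_sum = 0.0
    valid = 0
    for i, row in enumerate(dist):
        # Rank gallery items by ascending distance to query i.
        order = sorted(range(len(row)), key=lambda j: row[j])
        # Assumption: gallery items sharing both ID and camera with the query are ignored.
        order = [j for j in order
                 if not (g_ids[j] == q_ids[i] and g_cams[j] == q_cams[i])]
        matches = [g_ids[j] == q_ids[i] for j in order]
        n_pos = sum(matches)
        if n_pos == 0:
            continue  # no valid true match for this query, so it is skipped
        valid += 1
        first = matches.index(True)  # 0-based position of the first true match
        for r, k in enumerate(ranks):
            if first < k:
                hits[r] += 1
        # Non-interpolated AP: mean of the precision at each true match.
        found, prec_sum = 0, 0.0
        for pos, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                prec_sum += found / pos
        ap_sum += prec_sum / n_pos
    if valid == 0:
        raise ValueError("no query has a valid true match in the gallery")
    cmc = [100.0 * h / valid for h in hits]
    return cmc, 100.0 * ap_sum / valid


# Toy example: 2 queries, 3 gallery tracklets (hypothetical values).
cmc, m_ap = evaluate(
    dist=[[0.1, 0.5, 0.3], [0.2, 0.4, 0.9]],
    q_ids=[1, 2], q_cams=[0, 0],
    g_ids=[1, 2, 1], g_cams=[1, 1, 2],
)
print(cmc, m_ap)  # [50.0, 100.0, 100.0] 75.0
```

In the toy example, the second query's first true match sits at position 2. That costs it rank-1 credit but not rank-5 or rank-20 credit, and it halves its AP.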

Reference	rank-1 (%)	rank-5 (%)	rank-20 (%)	mAP (%)	Notes
"MARS: A Video Benchmark for Large-Scale Person Re-identification", Liang Zheng, Zhi Bie, Yifan Sun, Jingdong Wang, Chi Su, Shengjin Wang, Qi Tian, ECCV 2016	2.6	6.4	12.4	0.8	HOG3D [1] + KISSME [2], Euclidean distance, single query
	1.2	2.8	7.4	0.4	GEI [3] + KISSME [2], single query
	18.6	33.0	45.9	8.0	HistLBP [4] + XQDA [5], single query
	30.6	46.2	59.2	15.5	BoW [6] + KISSME [2], single query
	60.0	77.9	87.9	42.4	IDE, average pooling, Euclidean distance, single query
	65.0	81.1	88.9	45.6	IDE + KISSME, max pooling, Euclidean distance, single query
	68.3	82.6	89.4	49.3	IDE + KISSME, max pooling, Euclidean distance, multiple query
Current state of the art
"Learning Compact Appearance Representation for Video-based Person Re-Identification", Wei Zhang, Shengnan Hu, Kan Liu, arXiv 2017	55.5	70.2	80.2	-	A frame selection step is used before feature pooling
"Multi-Target Tracking in Multiple Non-Overlapping Cameras using Constrained Dominant Sets", Yonatan Tariku Tesfaye, Eyasu Zemene, Andrea Prati, Marcello Pelillo, Mubarak Shah, arXiv 2017	68.22	-	-	-	Proposes the constrained dominant sets clustering (CDSC) method
"Re-ranking Person Re-identification with k-reciprocal Encoding", Zhun Zhong, Liang Zheng, Donglin Cao, Shaozi Li, CVPR 2017	67.78	-	-	57.98	IDE (CaffeNet) + re-ranking, single query
	73.94	-	-	68.45	IDE (ResNet50) + re-ranking, single query
"Learning Deep Context-aware Features over Body and Latent Parts for Person Re-identification", Dangwei Li, Xiaotang Chen, Zhang Zhang, Kaiqi Huang, CVPR 2017	71.77	86.57	93.08	56.05	Using the fine-tuned TriNet and Euclidean distance, single query
	83.03	93.69	97.63	66.43	TriNet + re-ranking [7]
"See the forest for the trees: Joint spatial and temporal recurrent neural networks for video-based person re-identification", Zhen Zhou, Yan Huang, Wei Wang, Liang Wang, Tieniu Tan, CVPR 2017	70.6	90.0	97.6	50.7	Single query; handles both spatial and temporal information
"Quality Aware Network for Set to Set Recognition", Yu Liu, Junjie Yan, Wanli Ouyang, CVPR 2017	73.74	84.90	91.62	51.70	P-QAN (GoogLeNet), single query; numbers were provided by the authors and are not reported in the paper
"In Defense of the Triplet Loss for Person Re-Identification", Alexander Hermans, Lucas Beyer, Bastian Leibe, arXiv 2017	79.80	91.36	-	67.70	Using the fine-tuned TriNet and Euclidean distance, single query
	81.21	90.76	-	77.43	TriNet + re-ranking [7]
Methods that use the dataset for training but do not report results, or that use a different evaluation protocol
"Simple Online and Realtime Tracking with a Deep Association Metric", Nicolai Wojke, Alex Bewley, Dietrich Paulus, arXiv 2017	-	-	-	-	The CNN model is trained on MARS
"Jointly Attentive Spatial-Temporal Pooling Networks for Video-based Person Re-Identification", Shuangjie Xu, Yu Cheng, Kang Gu, Yang Yang, Shiyu Chang, Pan Zhou, ICCV 2017	44	70	81	-	Single query; Joint Spatial and Temporal Attention Pooling Network; the evaluation protocol differs from the original one

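Several IDE rows in the table first reduce frame-level CNN features to one descriptor per tracklet using average or max pooling. They then compare tracklets with Euclidean distance. The sketch below illustrates only this aggregation and distance step in pure Python, and it does not cover feature extraction. The helper names `pool` and `euclidean` are hypothetical. The multiple-query part is an assumed scheme (max pooling over the query tracklets of one identity), not a confirmed description of the protocol.

```python
import math
from typing import List, Sequence


def pool(frames: Sequence[Sequence[float]], mode: str = "avg") -> List[float]:
    """Collapse per-frame feature vectors into a single tracklet descriptor."""
    if not frames:
        raise ValueError("a tracklet needs at least one frame")
    dim = len(frames[0])
    if mode == "avg":
        # Element-wise mean over frames.
        return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    if mode == "max":
        # Element-wise maximum over frames.
        return [max(f[d] for f in frames) for d in range(dim)]
    raise ValueError(f"unknown pooling mode: {mode!r}")


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical 2-D frame features for one tracklet.
frames = [[1.0, 4.0], [3.0, 0.0]]
avg_desc = pool(frames, "avg")  # [2.0, 2.0]
max_desc = pool(frames, "max")  # [3.0, 4.0]
print(euclidean(avg_desc, max_desc))  # sqrt(5), about 2.236

# Multiple query (assumed scheme): max-pool the descriptors of several
# query tracklets of the same identity into one query descriptor.
query_tracklets = [[[1.0, 4.0]], [[3.0, 0.0], [2.0, 5.0]]]
multi_query = pool([pool(t, "max") for t in query_tracklets], "max")  # [3.0, 5.0]
```

Max pooling keeps the strongest response of each feature dimension across frames, while average pooling smooths over all frames. In the table, the max-pooling rows also add KISSME, so the gap between them and the average-pooling row cannot be attributed to pooling alone.
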
References

[1] A. Kläser, M. Marszałek, and C. Schmid. A spatio-temporal descriptor based on 3D-gradients. In BMVC, 2008.
[2] M. Köstinger, M. Hirzer, P. Wohlhart, P. M. Roth, and H. Bischof. Large scale metric learning from equivalence constraints. In CVPR, pp. 2288–2295, 2012.
[3] J. Han and B. Bhanu. Individual recognition using gait energy image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(2):316–322, 2006.
[4] F. Xiong, M. Gou, O. Camps, and M. Sznaier. Person re-identification using kernel-based metric learning methods. In ECCV, 2014.
[5] S. Liao, Y. Hu, X. Zhu, and S. Z. Li. Person re-identification by local maximal occurrence representation and metric learning. In CVPR, 2015.
[6] L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang, and Q. Tian. Scalable person re-identification: A benchmark. In ICCV, 2015.
[7] Z. Zhong, L. Zheng, D. Cao, and S. Li. Re-ranking person re-identification with k-reciprocal encoding. In CVPR, 2017.
