State of the art on the Market-1501 dataset
On this page, we summarize the state-of-the-art methods on the Market-1501 dataset. We report mAP and CMC accuracies at ranks 1, 5, 10, 20, 30, and 50. Note that these are not the only performance measurements; other metrics, such as recognition time, are also important.
When only CMC curves are given in the respective paper, we roughly estimate the numbers and fill in the blanks. Authors are welcome to contact me with the accurate numbers. Priority is given to papers whose code is published. Should you have any inquiry, please contact me at liangzheng06@gmail.com.
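As a quick reference, the sketch below illustrates how the CMC rank-k accuracy and the average precision behind mAP are typically computed for a single query. It is a simplified sketch, not the official Market-1501 evaluation code; the names evaluate_single_query, good_index, and junk_index are illustrative assumptions.

```python
import numpy as np

def evaluate_single_query(ranked_gallery, good_index, junk_index):
    """Simplified per-query CMC / AP computation (illustrative sketch, not the
    official evaluation code). `ranked_gallery`: gallery indices sorted by
    ascending distance to the query; `good_index`: true cross-camera matches;
    `junk_index`: same-camera matches and distractors ignored by the protocol."""
    # Remove "junk" images from the ranking before scoring.
    keep = np.in1d(ranked_gallery, junk_index, invert=True)
    ranking = ranked_gallery[keep]

    hits = np.in1d(ranking, good_index)        # True where a correct match appears
    if not hits.any():                         # no correct match retrieved
        return np.zeros(len(ranking)), 0.0

    # CMC curve: 1 from the position of the first correct match onward.
    cmc = np.zeros(len(ranking))
    cmc[np.argmax(hits):] = 1.0

    # Average precision: mean of the precision at the rank of each correct match.
    ranks_of_hits = np.where(hits)[0] + 1      # 1-based ranks of the correct matches
    precisions = np.arange(1, len(ranks_of_hits) + 1) / ranks_of_hits
    return cmc, precisions.mean()
```

The rank-k columns in the table are the mean of cmc[k-1] over all queries, and mAP is the mean of the per-query AP. In the multiple-query setting reported by some entries, the features of the other images of the same identity under the same camera are typically pooled into a single query before ranking.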
Reference | rank-1 | rank-5 | rank-10 | rank-20 | rank-30 | rank-50 | mAP | Notes
"Scalable person re-identification: a benchmark", Liang Zheng, Liyue Shen, Lu Tian, Shengjin Wang, Jingdong Wang, Qi Tian, ICCV 2015 | 8.28 | - | - | - | - | - | 2.23 | gBiCov [1], Euclidean distance, single query |
9.62 | - | - | - | - | - | 2.72 | HistLBP [2], Euclidean distance, single query. Super thanks to Mengran Gou for sending us the evaluation results | |
26.07 | - | - | - | - | - | 7.75 | LOMO [3], Euclidean distance, single query | |
35.84 | 52.40 | 60.33 | 67.64 | 71.88 | 75.80 | 14.75 | BoW, Euclidean distance, single query | |
44.36 | 60.24 | 66.48 | 73.25 | 76.19 | 79.69 | 19.42 | BoW, Euclidean distance, multiple query | |
34.00 | - | - | - | - | - | 15.66 | BoW + LMNN, single query | |
38.21 | - | - | - | - | - | 17.05 | BoW + ITML, single query | |
44.42 | 63.90 | 72.18 | 78.95 | 82.51 | 87.05 | 20.76 | BoW + KISSME, single query | |
"Person re-identification: Past, Present and Future", Liang Zheng, Yi Yang, Alexander Hauptmann, Arxiv 2016 | 55.49 | 76.28 | 83.55 | 88.98 | 91.72 | 93.97 | 32.36 | AlexNet identification model, using FC7 (4,096-dim) and Euclidean distance for testing, single query. This method is also used in [4,5] |
73.90 | 87.68 | 91.54 | 94.80 | 96.02 | 97.21 | 47.78 | ResNet-50 identification model, using Pool5 (2,048-dim) and Euclidean distance for testing, single query | |
State of the art in Supervised Learning | ||||||||
"Multiregion Bilinear Convolutional Neural Networks for Person Re-Identification", Evgeniya Ustinova, Yaroslav Ganin, Victor Lempitsky, AVSS 2017. | 66.36 | 85.01 | 90.17 | - | - | - | 41.17 | Multiregion Bilinear DML, single query. |
"Scalable Metric Learning via Weighted Approximate Rank Component Analysis", Cijo Jose, François Fleuret, ECCV 2016 | 45.16 | 68.12 | 76 | 84 | 87 | - | - | Use the baseline BoW descriptor and the proposed WARCA metric learning method. |
"A Comprehensive Evaluation and Benchmark for Person Re-Identification: Features, Metrics, and Datasets", Srikrishna Karanam, Mengran Gou, Ziyan Wu, Angels Rates-Borras, Octavia Camps, Richard J. Radke, ArXiv 2016 | 46.5 | 71.1 | 79.9 | 86.9 | - | - | - | HistLBP+kLFDA. Single query. |
"Temporal Model Adaptation for Person Re-Identification", Niki Martinel, Abir Das, Christian Micheloni, Amit K. Roy-Chowdhury, ECCV 2016 | 47.92 | - | - | - | - | - | 22.31 | Using 13.58% of the labeled data. Single query. |
"Deep Linear Discriminant Analysis on Fisher Networks: A Hybrid Architecture for Person Re-identification", Lin Wu, Chunhua Shen, Anton van den Hengel, ArXiv 2016 | 48.15 | - | - | - | - | - | 29.94 | Combines Fisher vector and deep neural network. Not sure whether multiple queries are used. |
"Learning a Discriminative Null Space for Person Re-identification", Li Zhang, Tao Xiang, Shaogang Gong, CVPR 2016. | 55.43 | - | - | - | - | - | 29.87 | LOMO+Discriminative Null Space, single query. |
71.56 | - | - | - | - | - | 46.03 | Both multiple query (MQ) and score-level feature fusion are used. | |
"Similarity Learning with Spatial Constraints for Person Re-identification", Dapeng Chen, Zejian Yuan, Badong Chen, Nanning Zheng, CVPR 2016 | 51.90 | - | - | - | - | - | 26.35 | Extract HSV, LAB, HOG, and SILTP features from patches, and use the proposed SCSP method. Single query. |
"PersonNet: Person Re-identification with Deep Convolutional Neural Networks", Lin Wu, Chunhua Shen, Anton van den Hengel, ArXiv 2016. | 37.21 | - | - | - | - | - | 18.57 | Use single query. Similarity between boxes is learnt end-to-end through a deep network. |
"End-to-End Comparative Attention Networks for Person Re-identification", Hao Liu, Jiashi Feng, Meibin Qi, Jianguo Jiang, Shuicheng Yan, ArXiv 2016. | 48.24 | - | - | - | - | - | 24.43 | Use single query. Features are learned by the Comparative Attention Network |
"Deep Attributes Driven Multi-Camera Person Re-identification", Chi Su, Shiliang Zhang, Junliang Xing, Wen Gao, Qi Tian, ECCV 2016. | 39.4 | - | - | - | - | - | 19.6 | single query. |
49.0 | - | - | - | - | - | 25.8 | Multiple query. | |
"Multi-Scale Triplet CNN for Person Re-Identification", Jiawei Liu, Zheng-Jun Zha, Qi Tian, Dong Liu, Ting Yao, Qiang Ling, Tao Mei, A 2016. | 45.1 | 70.1 | 78.4 | - | 88.7 | - | - | single query. Use a triplet loss CNN model with multi-scale improvement. |
55.4 | 78.9 | 85.6 | - | 93.7 | - | - | Multiple query | |
"Learning Deep Embeddings with Histogram Loss", Evgeniya Ustinova and Victor Lempitsky, NIPS 2016. | 59.47 | 80.73 | 86.94 | 91.09 | - | - | - | It seems the single query mode is chosen. A previously introduced deep metric learning framework is adopted, but with new loss functions. |
"A Siamese Long Short-Term Memory Architecture for Human Re-Identification", Rahul Rama Varior, Bing Shuai, Jiwen Lu, Dong Xu, Gang Wang, ECCV 2016. | 61.6 | - | - | - | - | - | 35.3 | Use multiple queries. The LSTM model processes image regions sequentially. |
"Gated Siamese Convolutional Neural Network Architecture for Human Re-Identification", Rahul Rama Varior, Mrinal Haloi, Gang Wang, ECCV 2016. | 65.88 | - | - | - | - | - | 39.55 | single query. Feature learned by the Gated Siamese CNN. |
76.04 | - | - | - | - | - | 48.45 | Multiple query | |
"Point to Set Similarity Based Deep Feature Learning for Person Re-identification", Sanping Zhou, Jinjun Wang, Jiayun Wang, Yihong Gong, Nanning Zheng, CVPR 2017. | 70.72 | - | - | - | - | - | 44.27 | single query. The pairwise loss, triplet loss and a regularizor are jointly optimzed in the loss function. |
85.78 | - | - | - | - | - | 55.73 | Multiple query | |
"Person Re-Identification by Camera Correlation Aware Feature Augmentation", Ying-Cong Chen, Xiatian Zhu, Wei-Shi Zheng, Jian-Huang Lai, TPAMI 2017. | 71.8 | - | - | - | - | - | 45.5 | single query. Use CRAFT-MFA+LOMO |
79.7 | - | - | - | - | - | 54.3 | Multiple query | |
"Consistent-Aware Deep Learning for Person Re-identification in a Camera Network, Ji Lin, Liangliang Ren, Jiwen Lu, Jianjiang Feng, Jie Zhou, CVPR 2017. | 73.84 | - | - | - | - | - | 47.11 | single query. Pairwise similarities are considered across multiple cameras for samples in a batch. |
80.85 | - | - | - | - | - | 55.58 | Multiple query | |
"Looking Beyond Appearances: Synthetic Training Data for Deep CNNs in Re-identification", Igor Barros Barbosa, Marco Cristani, Barbara Caputo, Aleksander Rognhaugen and Theoharis Theoharis, Arxiv 2017. | 73.87 | 88.03 | 92.22 | 95.07 | 96.20 | 97.39 | 47.89 | single query. Use SOMAnet and Market1501 as training set. |
81.29 | 92.61 | 95.31 | 97.12 | 97.68 | 98.43 | 56.98 | Multiple query | |
"Spindle Net: Person Re-identification with Human Body Region Guided Feature Decomposition and Fusion", Haiyu Zhao, Maoqing Tian, Shuyang Sun, Jing Shao, Junjie Yan, Shuai Yi, Xiaogang Wang, Xiaoou Tang, CVPR 2017. | 76.9 | 91.5 | 94.6 | 96.7 | - | - | - | single query. CPM is trained on MPII for pose estimation and part localization. |
"Re-ranking Person Re-identification with k-reciprocal Encoding", Zhun Zhong, Liang Zheng, Donglin Cao and Shaozi Li, CVPR 2017. | 77.11 | - | - | - | - | - | 63.63 | Single query. Re-ranking is performed. |
"Pose Invariant Embedding for Deep Person Re-identification", Liang Zheng, Yujia Huang, Huchuan Lu, and Yi Yang, Arxiv 2017. | 79.33 | 90.76 | 94.41 | 96.52 | - | - | 55.95 | Single query. The PIE descriptor and kissme is used. |
"Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in vitro", Zhedong Zheng, Liang Zheng, Yi Yang, ICCV 2017. | 78.06 | - | - | - | - | - | 56.23 | single query. GAN images are used in the ResNet baseline. |
85.12 | - | - | - | - | - | 68.52 | Multiple query | |
"A Discriminatively Learned CNN Embedding for Person Re-identification", Zhedong Zheng, Liang Zheng, Yi Yang, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2017. | 79.51 | 90.91 | 94.09 | 96.23 | 97.33 | 98.25 | 59.87 | single query. Identification and Verification losses are used in a siamese network based on ResNet-50. |
85.84 | 94.54 | 96.41 | 97.51 | 98.07 | 98.81 | 70.33 | Multiple query | |
"Learning Deep Context-aware Features over Body and Latent Parts for Person Re-identification", Dangwei Li, Xiaotang Chen, Zhang Zhang, Kaiqi Huang, CVPR 2017 | 80.31 | - | - | - | - | - | 57.53 | single query. Latent body parts are discovered by the spatial transformer network instead of rigid partitioning. |
86.79 | - | - | - | - | - | 66.70 | Multiple query | |
"Deeply-Learned Part-Aligned Representations for Person Re-Identification", Liming Zhao, Xi Li, Jingdong Wang, Yueting Zhuang, ICCV 2017. | 81.0 | - | - | - | - | - | 63.4 | single query. Body parts are detected from feature maps and their respective features are concatenated later. |
"Scalable Person Re-identification on Supervised Smoothed Manifold", Song Bai, Xiang Bai, Qi Tian, CVPR 2017. | 82.21 | - | - | - | - | - | 68.80 | single query. IDE+re-ranking. |
88.18 | - | - | - | - | - | 76.18 | Multiple query | |
"Divide and Fuse: A Re-ranking Approach for Person Re-identification", Rui Yu, Zhichao Zhou, Song Bai, Xiang Bai, BMVC 2017. | 82.30 | - | - | - | - | - | 72.42 | single query. Features are divided into sub-vectors before re-encoded into a new vector. The new vectors are fused into one vector for ranking. |
"SVDNet for Pedestrian Retrieval", Yifan Sun, Liang Zheng, Weijian Deng, Shengjin Wang, ICCV 2017. | 82.3 | - | - | - | - | - | 62.1 | Single query. 1,024-dim pool5 feature from svdnet is used. |
"Pose-driven Deep Convolutional Model for Person Re-identification", Chi Su, Jianing Li, Shiliang Zhang, Junliang Xing, Wen Gao, Qi Tian, ICCV 2017. | 84.14 | 92.73 | 94.92 | 96.82 | - | - | 63.41 | Single query. Human part is discovered with pose models. Local and Global images are used for feature learning. |
"Deep Transfer Learning for Person Re-identification", Mengyue Geng, Yaowei Wang, Tao Xiang, Yonghong Tian, Arxiv 2016. | 83.7 | - | - | - | - | - | 65.5 | single query. Identification and Verification losses are used in a siamese network based on GoogleNet. |
89.6 | - | - | - | - | - | 73.8 | Multiple query |
"Improving Person Re-identification by Attribute and Identity Learning", Yutian Lin, Liang Zheng, Zhedong Zheng, Yu Wu and Yi Yang, Arxiv 2017. | 84.29 | 93.20 | 95.19 | 97.00 | - | - | 64.67 | Single query. Attribute and ID classification are jointly learned.
"Pedestrian Alignment Network for Person Re-identification", Liang Zheng, Zhedong Zheng, Yi Yang, Arxiv 2017. | 82.81 | - | - | - | - | - | 63.35 | single query. Pedestrians are aligned by the Spatial Transformer Network. Results could be higher when fine-tuning on the GAN model [7]. |
85.78 | 93.38 | - | - | - | - | 76.56 | Single query + re-ranking [6] | |
88.18 | - | - | - | - | - | 71.72 | Multiple query | |
89.79 | - | - | - | - | - | 83.79 | Multiple query + re-ranking [6] | |
"Deep Spatial Feature Reconstruction for Partial Person Re-identification: Alignment-free Approach", Lingxiao He, Jian Liang, Haiqing Li, and Zhenan Sun, CVPR 2018. | 83.58 | - | - | - | - | - | 64.25 | single query. Deep Spatial feature Reconstruction (DSR) is further developed to avoid explicit alignment.. |
"Person re-identification by deep joint learning of multi-loss classification", Wei Li, Xiatian Zhu, and Shaogang Gong, IJCAI 2017. | 83.9 | - | - | - | - | - | 64.4 | single query. Stripes and global images are jointly considered in a classification CNN network with multiple streams. |
88.8 | - | - | - | - | - | 72.9 | Single query + re-ranking [6] | |
85.1 | - | - | - | - | - | 65.5 | single query, 4 body parts | |
89.7 | - | - | - | - | - | 74.5 | Multiple query, 4 body parts | |
"In Defense of the Triplet Loss for Person Re-Identification", Alexander Hermans, Lucas Beyer and Bastian Leibe, Arxiv 2017. | 84.92 | 94.21 | - | - | - | - | 69.14 | single query. The triplet-loss based network is fine-tuned. Image size: 256x128. The last layer in ResNet is replaced with one 1,024-dim layer and one 128-dim layer. Batch normalization is used as well. |
86.67 | 93.38 | - | - | - | - | 81.07 | Single query + re-ranking [6] | |
90.53 | 96.29 | - | - | - | - | 76.42 | Multiple query | |
91.75 | 95.78 | - | - | - | - | 87.18 | Multiple query + re-ranking [6] | |
"CamStyle Augmentation", Zhun Zhong, Liang Zheng, Zhedong Zheng, Shaozi Li , Yi Yang, CVPR 2018. | 88.12 | - | - | - | - | - | 68.72 | single query. A new data augmentation approach which transfers images from one camera to the style of another camera. |
89.49 | - | - | - | - | - | 71.55 | Re-ranking. | |
"Deep Mutual Learning", Ying Zhang, Tao Xiang, Timothy Hospedales, Huchuan Lu, CVPR 2018. | 87.73 | - | - | - | - | - | 68.83 | single query. Two MoblieNets learn from each other, and the average re-ID results of the two individual networks is reported. |
91.66 | - | - | - | - | - | 77.14 | multiple query | |
"A Pose-Sensitive Embedding for Person Re-Identification with Expanded Cross Neighborhood Re-Ranking", M. Saquib Sarfraz, Arne Schumann, Andreas Eberle, Rainer Stiefelhagen, CVPR 2018. | 87.7 | - | - | - | - | - | 69.0 | single query. Camera view and body joints are integrated in the network. |
90.3 | - | - | - | - | - | 84.0 | single query + ECN (Expanded Cross Neighborhood) re-ranking | |
"Random Erasing Data Augmentation", Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, Yi Yang, Arxiv 2017. | 87.08 | - | - | - | - | - | 71.31 | single query. SVDNet + random erasing data augmentation. |
89.13 | - | - | - | - | - | 83.93 | Re-ranking is used on the rank list obtained by single query | |
"Features for Multi-Target Multi-Camera Tracking and Re-Identification", Ergys Ristani, Carlo Tomasi, CVPR 2018. | 89.46 | - | - | - | - | - | 75.67 | single query. Based on DPFL, called AWTL (2-stream). |
"Harmonious Attention Network for Person Re-Identification", Wei Li, Xiatian Zhu, Shaogang Gong, CVPR 2018. | 91.2 | - | - | - | - | - | 75.7 | single query. Pixel-level attention, regional level attention and feature learning are jointly optimized. |
93.8 | - | - | - | - | - | 82.8 | multiple queries. | |
State of the art in Unsupervised Learning / Domain Adaptation ||||||||
"Efficient Online Local Metric Adaptation via Negative Samples for Person Re-Identification", Jiahuan Zhou, Pei Yu, Wei Tang and Ying Wu, ICCV 2017. | 40.93 | - | - | 74.06 | - | - | - | Single query. LOMO is used for initialization. This method does not need any positive pairs. |
51.45 | - | - | 80.98 | - | - | - | Multiple query. | |
"Unsupervised Person Re-identification: Clustering and Fine-tuning", Hehe Fan, Liang Zheng and Yi Yang, Arxiv 2017. | 44.7 | 59.1 | 65.6 | 71.7 | - | - | 20.1 | Single query. An IDE model trained on DukeMTMC-reID [7] is used for initialization. Kmeans is used for label estimation. |
41.9 | 57.3 | 64.3 | 70.5 | - | - | 18.0 | Single query. An IDE model trained on CUHK03 is used for initialization. | |
"Cross-view Asymmetric Metric Learning for Unsupervised Person Re-identification", Hong-Xing Yu, Ancong Wu, and Wei-Shi Zheng, ICCV 2017. | 54.5 | - | - | - | - | - | 26.3 | Multiple query. JSTL is used for initialization. A clustering method is used for label estimation. |
"Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification", Weijian Deng, Liang Zheng, Guoliang Kang, Yi Yang, Qixiang Ye, Jianbin Jiao, CVPR 2018. | 51.5 | 70.1 | 76.8 | - | - | - | 22.8 | Single query. DukeMTMC [7] labels are used for domain adaptation. SPGAN is an improved version of CycleGAN. |
57.7 | 75.8 | 82.4 | - | - | - | 26.7 | Single query. Local max pooling is used in addition to SPGAN. | |
57.0 | 73.9 | 80.3 | - | - | - | 27.1 | Multiple query. SPGAN is used without local max pooling. | |
"Transferable Joint Attribute-Identity Deep Learning for Unsupervised Person Re-Identification", Jingya Wang, Xiatian Zhu, Shaogang Gong, Wei Li, CVPR 2018. | 58.2 | 74.8 | 81.1 | 86.5 | - | - | 26.5 | Single query. DukeMTMC [7] labels are used as source for unsupervised domain adaptation. Source attributes are used in addition to the ID labels. |
"Unsupervised Cross-dataset Person Re-identification by Transfer Learning of Spatial-Temporal Patterns", Jianming Lv, Weihang Chen, Qing Li, and Can Yang, CVPR 2018. | 60.75 | 74.44 | 79.25 | - | - | - | - | Single query. Pedestriansâ spatio-temporal patterns in the target domain are learned, in addition to model fusion and learning to rank methods. |
"Unsupervised Person Re-identification by Deep Learning Tracklet Association", Minxian Li, Xiatian Zhu, and Shaogang Gong, ECCV 2018. | 63.7 | - | - | - | - | - | - | Single query. Tracklets are associated across cameras to provide labels for subsequent learning. |
Papers that use the dataset but do not report results, or use different evaluation protocols ||||||||
"Constrained Deep Metric Learning for Person Re-identification", Hailin Shi, Xiangyu Zhu, Shengcai Liao, Zhen Lei, Yang Yang, Stan Z. Li, ArXiv 2015. | - | - | - | - | - | - | - | Used together with CUHK03 as training data for the proposed Constrained Deep Metric Learning. Test on CUHK01 and VIPeR. |
"An Enhanced Deep Feature Representation for Person Re-identification", Shangxuan Wu, Ying-Cong Chen, Xiang Li, An-Cong Wu, Jin-Jie You, Wei-Shi Zheng, WACV 2016. | - | - | - | - | - | - | - | Used as training data for the proposed Feature Fusion Net. Testing is performed on other benchmarks. |
"Semantics-Aware Deep Correspondence Structure Learning for Robust Person Re-identification", Yaqing Zhang, Xi Li, Liming Zhao, Zhongfei Zhang, IJCAI 2016. | - | - | - | - | - | - | - | Used as training data for the proposed DCSL model. |
"Human-In-The-Loop Person Re-Identification", Hanxiao Wang, Shaogang Gong, Xiatian Zhu, Tao Xiang, ECCV 2016. | 78.0 | - | - | - | - | 86.0 | - | 1000 identities, 300 queries are used. Single Shot. 6 random splits. |
33.8 | 61.0 | 73.6 | 85.3 | - | - | - | 501 identities, single shot, 6 random splits. We assume 501 queries are used. |
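Most entries above share the same retrieval pipeline at test time: extract one feature vector per image, compute query-to-gallery distances, and sort the gallery. The sketch below shows that baseline ranking step with the plain Euclidean distance, assuming the features (e.g., BoW histograms or ResNet-50 pool5 vectors) are already extracted as NumPy arrays; the function name rank_gallery is illustrative.

```python
import numpy as np

def rank_gallery(query_feats, gallery_feats):
    """Illustrative Euclidean-distance ranking used by the baseline entries:
    `query_feats` is an (m, d) array of query features and `gallery_feats`
    an (n, d) array of gallery features, both assumed pre-extracted."""
    # Squared Euclidean distances via ||q - g||^2 = ||q||^2 + ||g||^2 - 2 * q.g
    q_sq = (query_feats ** 2).sum(axis=1, keepdims=True)       # (m, 1)
    g_sq = (gallery_feats ** 2).sum(axis=1)                    # (n,)
    dist = q_sq + g_sq - 2.0 * query_feats @ gallery_feats.T   # (m, n)
    # For each query, return gallery indices sorted from nearest to farthest.
    return np.argsort(dist, axis=1)
```

Metric-learning entries such as KISSME, WARCA, or the discriminative null space replace the plain Euclidean distance with a learned metric, and the re-ranking rows [6] post-process the resulting distance matrix.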
References
[1] B. Ma, Y. Su, and F. Jurie. Covariance descriptor based on bio-inspired features for person re-identification and face verification. Image and Vision Computing, 2014.
[2] F. Xiong, M. Gou, O. Camps, and M. Sznaier. Person re-identification using kernel-based metric learning methods. In ECCV, 2014.
[3] S. Liao, Y. Hu, X. Zhu, and S. Z. Li. Person re-identification by local maximal occurrence representation and metric learning. In CVPR, 2015.
[4] L. Zheng, Z. Bie, Y. Sun, J. Wang, C. Su, S. Wang, and Q. Tian. MARS: A Video Benchmark for Large-Scale Person Re-identification. In ECCV, 2016.
[5] L. Zheng, H. Zhang, S. Sun, M. Chandraker, Y. Yang, and Q. Tian. Person Re-identification in the Wild. In CVPR, 2017.
[6] Z. Zhong, L. Zheng, D. Cao, and S. Li. Re-ranking Person Re-identification with k-reciprocal Encoding. In CVPR, 2017.
[7] Z. Zheng, L. Zheng, and Y. Yang. Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in vitro. ArXiv, 2017.