Enhancing Multi-Object Tracking with Compact Model Adjustment
Abstract
Tracking human movement and interactions in complex environments is a key challenge in computer vision, especially for multi-object tracking. Transformer-based models have shown promise in addressing these challenges due to their capacity to recognize complex patterns across sequences. However, their high computational demands and substantial training data requirements often restrict their real-world applicability. This study aimed to enhance multi-object tracking by introducing a Compact Model Adjustment approach that integrates trainable rank-decomposition matrices within the Transformer architecture. This approach involves freezing the pre-trained model weights and adding trainable low-rank matrices to each layer, substantially reducing the number of parameters that need updating during training. This design allows the model to retain its pre-trained knowledge while efficiently adapting to new tasks, thereby reducing the overall computational load. Additionally, the proposed approach utilizes data from both the current and previous frames to refine object localization and association. Experimental results on the MOT17 benchmark demonstrated that this method achieved a Multiple Object Tracking Accuracy of 71.0, comparable to state-of-the-art techniques while enhancing computational efficiency. This work provides a practical solution for real-world applications in areas such as surveillance, autonomous driving, and sports analytics.
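The frozen-weight, low-rank update described in the abstract can be illustrated with a minimal sketch. It is written in plain Python and is not the authors' implementation. The layer sizes, the rank, the scale factor, and the function names `matmul`, `lora_forward`, and `trainable_params` are assumptions chosen for illustration; the abstract does not specify them. The zero initialization of the second low-rank matrix follows the usual convention for low-rank adaptation, so the adapted layer starts out identical to the pre-trained one.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_forward(x, w_frozen, lora_a, lora_b, scale):
    """Compute y = x W + scale * (x A) B.

    W is the frozen pre-trained weight; only A and B would be trained.
    """
    base = matmul(x, w_frozen)
    update = matmul(matmul(x, lora_a), lora_b)
    return [[b + scale * u for b, u in zip(b_row, u_row)]
            for b_row, u_row in zip(base, update)]

def trainable_params(d_in, d_out, rank):
    """Number of entries in A (d_in x rank) plus B (rank x d_out)."""
    return rank * (d_in + d_out)

# Hypothetical sizes: a 256 x 256 projection adapted with rank 8.
full_count = 256 * 256                      # 65536 weights if fully fine-tuned
lora_count = trainable_params(256, 256, 8)  # 4096 weights, 6.25% of the full count

# Tiny forward pass (sizes are illustrative only).
random.seed(0)
d_in, d_out, rank = 4, 3, 2
x = [[random.gauss(0, 1) for _ in range(d_in)]]
w_frozen = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
lora_a = [[random.gauss(0, 0.01) for _ in range(rank)] for _ in range(d_in)]
lora_b = [[0.0] * d_out for _ in range(rank)]  # zero init: no change at the start

base_out = matmul(x, w_frozen)
out = lora_forward(x, w_frozen, lora_a, lora_b, scale=1.0)
```

Under these assumed sizes, only 4,096 of 65,536 weights per adapted projection would receive gradient updates, which shows how such a design can reduce the number of trainable parameters while leaving the pre-trained weights untouched.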
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Copyright: Asia-Pacific International University reserves exclusive rights to publish, reproduce and distribute the manuscript and all contents therein.