Enhancing Multi-Object Tracking with Compact Model Adjustment

Pimpa Cheewaprakobkit

doi:10.62370/hbds.v25i3.276177

Published: Dec 11, 2024

Keywords:

Multi-Object Tracking, Compact Model Adjustment, Transformer architecture

Pimpa Cheewaprakobkit

Asia-Pacific International University, Thailand

Abstract

Tracking human movement and interactions in complex environments is a key challenge in computer vision, especially for multi-object tracking. Transformer-based models have shown promise in addressing these challenges due to their capacity to recognize complex patterns across sequences. However, their high computational demands and substantial training data requirements often restrict their real-world applicability. This study aimed to enhance multi-object tracking by introducing a Compact Model Adjustment approach that integrates trainable rank-decomposition matrices within the Transformer architecture. This approach involves freezing the pre-trained model weights and adding trainable low-rank matrices to each layer, substantially reducing the number of parameters that need updating during training. This design allows the model to retain its pre-trained knowledge while efficiently adapting to new tasks, thereby reducing the overall computational load. Additionally, the proposed approach utilizes data from both the current and previous frames to refine object localization and association. Experimental results on the MOT17 benchmark demonstrated that this method achieved a Multiple Object Tracking Accuracy of 71.0, comparable to state-of-the-art techniques while enhancing computational efficiency. This work provides a practical solution for real-world applications in areas such as surveillance, autonomous driving, and sports analytics.
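
The freezing-plus-low-rank idea described in the abstract can be illustrated with a small sketch. This is not the author's implementation: the class name `LowRankAdaptedLinear`, the helper `lora_param_counts`, the zero initialization of one factor, and the `alpha / rank` scaling are assumptions borrowed from the common low-rank adaptation convention, and the example uses plain Python lists rather than a deep learning framework.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_param_counts(d_out, d_in, rank):
    """Return (frozen, trainable) parameter counts for one adapted layer."""
    return d_out * d_in, rank * (d_out + d_in)


class LowRankAdaptedLinear:
    """Hypothetical linear layer: frozen weight W plus a trainable update B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # pre-trained weight, never updated
        self.scale = alpha / rank     # assumed scaling convention
        # A starts small and random, B starts at zero, so the update is
        # initially zero and the layer reproduces the pre-trained output.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        """Compute W + scale * (B @ A)."""
        delta = matmul(self.B, self.A)
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]

    def forward(self, x):
        """Apply the adapted layer to one input vector."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]


if __name__ == "__main__":
    frozen, trainable = lora_param_counts(256, 256, 8)
    print(f"frozen: {frozen}, trainable: {trainable}")
```

Under these assumptions, a 256 x 256 layer adapted at rank 8 keeps 65,536 weights frozen and trains only 4,096, which is the kind of reduction in updated parameters the abstract refers to.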

Issue

Vol. 25 No. 3 (2024): September - December 2024

Section

Research Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Copyright: Asia-Pacific International University reserves exclusive rights to publish, reproduce and distribute the manuscript and all contents therein.
