Multi-Temporal Remote Sensing Image Registration with Deep Neural Networks and Region of Interest

Document Type: Original Article

Authors
1 associate professor
2 Student of Iran University of Science and Technology
3 Iranian Space Research Institute
Abstract
The purpose of image registration is to align two or more images of the same scene taken at different times, from different viewpoints, and/or with different sensors. In recent years, continuous improvement in our ability to observe the Earth has increased the accuracy and quality of remote sensing images. As a result, there is a need for new registration models that can handle the heavy computation these images require while remaining accurate. In this work, we propose a method that addresses both problems. The proposed solution uses regions of interest to reduce the search area and increase accuracy. First, the regions that the two images have in common are identified; the images are then registered using those regions. To find the regions of interest, a deep transformer neural network is used. The proposed transformer consists of several inner-attention (self-attention) and cross-attention layers, which learn the importance of different positions within an image and between the two images, respectively. The model is self-supervised and generates its training data by segment swapping. The training data was collected from Google Earth images and annotated by us. After training the model and obtaining the similar regions, we apply the standard SIFT method to register the images. For testing, we use Sentinel-2 satellite images. To evaluate the results quantitatively, we use the root mean square error. Quantitative and qualitative results show a significant improvement in computational cost and accuracy compared to conventional registration methods.
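The final two steps described in the abstract, registering from matches restricted to the region of interest and scoring the result with the root mean square error, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The helper names (`keep_in_roi`, `estimate_affine`, `apply_affine`, `registration_rmse`), the affine transform model, and the exact RMSE definition (Euclidean residual over matched points) are all assumptions made for illustration; the matched point pairs are assumed to come from a feature matcher such as SIFT, and the boolean mask from the region-of-interest network.

```python
import numpy as np


def keep_in_roi(pts, mask):
    """Return a boolean array marking points whose (x, y) pixel lies inside the ROI mask."""
    pts = np.asarray(pts, dtype=float)
    cols = np.round(pts[:, 0]).astype(int)
    rows = np.round(pts[:, 1]).astype(int)
    h, w = mask.shape
    inside = (rows >= 0) & (rows < h) & (cols >= 0) & (cols < w)
    keep = np.zeros(len(pts), dtype=bool)
    keep[inside] = mask[rows[inside], cols[inside]]
    return keep


def estimate_affine(src, dst):
    """Least-squares 2D affine transform (assumed model) mapping src points onto dst points."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Homogeneous coordinates: each row is [x, y, 1].
    A = np.hstack([src, np.ones((len(src), 1))])
    # Solve A @ M ~= dst for the 3x2 parameter matrix M.
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M


def apply_affine(M, pts):
    """Apply the 3x2 affine parameter matrix M to an array of (x, y) points."""
    pts = np.asarray(pts, dtype=float)
    return np.hstack([pts, np.ones((len(pts), 1))]) @ M


def registration_rmse(M, src, dst):
    """Root mean square of the Euclidean residuals between transformed src and dst."""
    residuals = apply_affine(M, src) - np.asarray(dst, dtype=float)
    return float(np.sqrt(np.mean(np.sum(residuals ** 2, axis=1))))
```

In use, one would filter the matched keypoints of the sensed image with `keep_in_roi`, fit the transform on the surviving pairs with `estimate_affine`, and report `registration_rmse` on a held-out set of check points rather than on the pairs used for fitting.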

Volume 4, Issue 2
March 2025
Pages 49-62

  • Receive Date 12 June 2024
  • Revise Date 31 October 2024
  • Accept Date 29 December 2024