HTTD: A Hierarchical Transformer for Accurate Table Detection in Document Images

Research Authors

Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Bilel Yagoub, Mostafa Farouk Senussi, Mahmoud Abdalla, and Hyun-Soo Kang

Research Date

Tue, 01/14/2025 - 12:00

Research Department

Multimedia Department

Research File

mathematics-13-00266 (1).pdf

Research Journal

Mathematics

Research Member

Mahmoud Salah Eldin Mohammed

Mohamed Hamdy Hamdan Mahmoud

Mostafa Farouk

Research Publisher

MDPI

Research Vol

13

Research Website

https://doi.org/10.3390/math13020266

Research Year

2025

Research Abstract

Table detection in document images is a challenging problem due to diverse layouts, irregular structures, and embedded graphical elements. In this study, we present HTTD (Hierarchical Transformer for Table Detection), a cutting-edge model that combines a Swin-L Transformer backbone with advanced Transformer-based mechanisms to achieve superior performance. HTTD addresses three key challenges: handling diverse document layouts, including historical and modern structures; improving computational efficiency and training convergence; and demonstrating adaptability to non-standard tasks like medical imaging and receipt key detection. Evaluated on benchmark datasets, HTTD achieves state-of-the-art results, with precision rates of 96.98% on ICDAR-2019 cTDaR, 96.43% on TNCR, and 93.14% on TabRecSet. These results validate its effectiveness and efficiency, paving the way for advanced document analysis and data digitization tasks.
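
The precision figures quoted in the abstract are detection metrics. As a rough illustration of how such a figure is typically computed, the sketch below counts a predicted table box as correct when it overlaps an unmatched ground-truth box with intersection-over-union (IoU) at or above a threshold. The 0.5 threshold, the greedy matching, and the names `iou` and `precision_at_iou` are assumptions made for illustration. The abstract does not state the evaluation protocol, and each benchmark defines its own.

```python
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_at_iou(
    predicted: List[Box], ground_truth: List[Box], threshold: float = 0.5
) -> float:
    """Fraction of predicted boxes that match a ground-truth box.

    Illustrative assumption: greedy one-to-one matching in the order the
    predictions are given. Real protocols usually sort predictions by
    confidence score first.
    """
    unmatched = list(ground_truth)
    true_positives = 0
    for box in predicted:
        # Pick the best-overlapping ground-truth box not yet matched.
        best = max(unmatched, key=lambda gt: iou(box, gt), default=None)
        if best is not None and iou(box, best) >= threshold:
            true_positives += 1
            unmatched.remove(best)
    return true_positives / len(predicted) if predicted else 0.0
```

For example, with two predicted boxes of which only one coincides with the single ground-truth table, `precision_at_iou` returns 0.5: one true positive out of two predictions.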