Table detection in document images is a challenging problem due to diverse layouts, irregular structures, and embedded graphical elements. In this study, we present HTTD (Hierarchical Transformer for Table Detection), a cutting-edge model that combines a Swin-L Transformer backbone with advanced Transformer-based mechanisms to achieve superior performance. HTTD addresses three key challenges: handling diverse document layouts, including historical and modern structures; improving computational efficiency and training convergence; and demonstrating adaptability to non-standard tasks like medical imaging and receipt key detection. Evaluated on benchmark datasets, HTTD achieves state-of-the-art results, with precision rates of 96.98% on ICDAR-2019 cTDaR, 96.43% on TNCR, and 93.14% on TabRecSet. These results validate its effectiveness and efficiency, paving the way for advanced document analysis and data digitization tasks.
Research Date
Research Department
Research File
Research Journal
Mathematics
Research Member
Research Publisher
MDPI
Research Vol
13
Research Website
https://doi.org/10.3390/math13020266
Research Year
2025
Research Abstract