Journal of Beijing University of Posts and Telecommunications
Abstract: To automatically analyze the structure of formatting information in digital and scanned documents, we propose a universal document layout analysis method based on a parameter reallocation strategy, which is used to improve the overall balance of the model. First, we integrate the ideas of ODConv and FasterNet into the feature pyramid network, making the neck lighter to reduce the risk of overfitting. Next, we introduce the Inception-SPPF spatial pyramid pooling structure to strengthen feature extraction for targets of different scales. Finally, we design a universal C3RepLKBlock module that uses large convolutional kernels to extract global features; structural reparameterization, guided by the gradient-flow idea of feature fusion, is used to mitigate excessive smoothing. Experimental results show that the improved model achieves an mAP of 95.9% on the PubLayNet dataset and outperforms YOLOv5s and other algorithms over the 0.5 to 0.95 intersection over union (IoU) threshold range. The method meets the stability, reliability, and high-precision requirements of document layout analysis.
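The abstract does not describe the internals of C3RepLKBlock, so the following is only a generic sketch of structural reparameterization in the style popularized by RepLKNet, not the authors' implementation. It assumes a single-channel 2D convolution, a 7x7 large-kernel branch trained in parallel with a 3x3 small-kernel branch, and inference-mode batch norm after each branch; the kernel sizes, BN statistics, and helper names (`conv2d_same`, `fuse_conv_bn`, `pad_kernel`) are hypothetical. It shows that after folding each BN into its convolution and zero-padding the small kernel, the two branches collapse into one 7x7 convolution with identical output.

```python
import numpy as np

def conv2d_same(x, k, bias=0.0):
    """Single-channel 2D cross-correlation with zero 'same' padding (odd kernel)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))  # zero padding keeps output size
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out + bias

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm using stored running statistics."""
    return gamma * (y - mean) / np.sqrt(var + eps) + beta

def fuse_conv_bn(k, gamma, beta, mean, var, eps=1e-5):
    """Fold BN into the preceding conv; returns (fused kernel, fused bias)."""
    scale = gamma / np.sqrt(var + eps)
    return k * scale, beta - mean * scale

def pad_kernel(k, size):
    """Zero-pad an odd square kernel to size x size with centers aligned."""
    p = (size - k.shape[0]) // 2
    return np.pad(k, ((p, p), (p, p)))

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
k_large = rng.standard_normal((7, 7))  # hypothetical large-kernel branch
k_small = rng.standard_normal((3, 3))  # hypothetical parallel small-kernel branch
bn_large = dict(gamma=1.3, beta=0.2, mean=0.1, var=0.9)
bn_small = dict(gamma=0.7, beta=-0.4, mean=-0.3, var=1.5)

# Training-time form: two parallel conv+BN branches whose outputs are summed.
y_train = (batchnorm(conv2d_same(x, k_large), **bn_large)
           + batchnorm(conv2d_same(x, k_small), **bn_small))

# Inference-time form: fold each BN, pad the small kernel, merge into one conv.
wl, bl = fuse_conv_bn(k_large, **bn_large)
ws, bs = fuse_conv_bn(k_small, **bn_small)
k_merged = wl + pad_kernel(ws, 7)
y_infer = conv2d_same(x, k_merged, bl + bs)

print(np.allclose(y_train, y_infer))  # True
```

The merge works because convolution is linear in its kernel and inference-mode BN is an affine map, so the extra branch costs nothing at inference time; a real multi-channel layer would apply the same folding per output channel.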
Key words: document layout analysis; parameter reallocation; lightweighting; spatial pyramid pooling; large convolutional kernel design
CLC Number: TP391
URL: https://journal.bupt.edu.cn/EN/