
  • EI-indexed core journal





Method of document layout analysis based on parameter reallocation strategy

  1. Nanchang Hangkong University
  • Received: 2023-12-26 Revised: 2024-04-12 Published: 2024-07-18
  • Corresponding author: Yang Cihui (杨词慧)
  • Funding:

Abstract: To automatically extract structured layout information from digital and scanned documents, we propose a general-purpose document layout analysis method based on a parameter reallocation strategy, which rebalances how parameters are distributed across the model. First, we incorporate ideas from ODConv and FasterNet into the feature pyramid network, making the neck layer lighter to reduce the risk of overfitting. Next, we propose Inception-SPPF, a spatial pyramid pooling structure that improves feature extraction for targets of different scales. Finally, we design C3RepLKBlock, a general-purpose module that uses large convolution kernels to extract global features; feature fusion inspired by gradient flow guides the structural reparameterization and mitigates over-smoothing. Experimental results show that the improved model reaches an mAP of 95.9% on the PubLayNet dataset, averaged over intersection-over-union thresholds from 0.5 to 0.95, clearly outperforming YOLOv5s and other algorithms. The method meets the stability, reliability, and high-precision requirements of document layout analysis tasks.

Key words: document layout analysis, parameter reallocation, lightweight design, spatial pyramid pooling, large convolution kernel design
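
The abstract mentions structural reparameterization with large convolution kernels but does not give the details of C3RepLKBlock. As a rough illustration of the general idea only, the following sketch shows how a small-kernel branch trained in parallel with a large-kernel branch can be folded into a single large kernel for inference. It is not the authors' module: the helper names (`conv2d_same`, `merge_kernels`), the 5x5 and 3x3 kernel sizes, and the toy weights are all assumptions made for the example, and batch-normalization folding, which real reparameterized blocks usually also perform, is omitted.

```python
# Illustrative sketch only: generic structural reparameterization of two
# parallel convolution branches. This is NOT the paper's C3RepLKBlock.

def conv2d_same(image, kernel):
    """Zero-padded 'same' 2D cross-correlation for a square, odd-sized kernel."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def merge_kernels(large, small):
    """Fold a small kernel into a large one by adding it at the centre,
    which is equivalent to zero-padding the small kernel to the large size."""
    offset = (len(large) - len(small)) // 2
    merged = [row[:] for row in large]
    for i, row in enumerate(small):
        for j, value in enumerate(row):
            merged[i + offset][j + offset] += value
    return merged


# Hypothetical toy weights and input, chosen only to exercise the code.
large = [[0.01 * (i + j) for j in range(5)] for i in range(5)]
small = [[0.1 * (i - j) for j in range(3)] for i in range(3)]
image = [[float((3 * i + 2 * j) % 7) for j in range(6)] for i in range(6)]

# Training-time form: two parallel branches whose outputs are summed.
branch_large = conv2d_same(image, large)
branch_small = conv2d_same(image, small)
two_branch = [[branch_large[i][j] + branch_small[i][j] for j in range(6)]
              for i in range(6)]

# Inference-time form: one convolution with the merged large kernel.
one_branch = conv2d_same(image, merge_kernels(large, small))

# Largest absolute difference between the two forms (rounding error only).
max_diff = max(abs(two_branch[i][j] - one_branch[i][j])
               for i in range(6) for j in range(6))
```

Because convolution is linear in the kernel, the two forms agree up to floating-point rounding, so the extra branch adds no inference cost once merged.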
