TY - GEN
T1 - DocUNet
T2 - 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018
AU - Ma, Ke
AU - Shu, Zhixin
AU - Bai, Xue
AU - Wang, Jue
AU - Samaras, Dimitris
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/12/14
Y1 - 2018/12/14
N2 - Capturing document images is a common way for digitizing and recording physical documents due to the ubiquitousness of mobile cameras. To make text recognition easier, it is often desirable to digitally flatten a document image when the physical document sheet is folded or curved. In this paper, we develop the first learning-based method to achieve this goal. We propose a stacked U-Net [25] with intermediate supervision to directly predict the forward mapping from a distorted image to its rectified version. Because large-scale real-world data with ground truth deformation is difficult to obtain, we create a synthetic dataset with approximately 100 thousand images by warping non-distorted document images. The network is trained on this dataset with various data augmentations to improve its generalization ability. We further create a comprehensive benchmark1 that covers various real-world conditions. We evaluate the proposed model quantitatively and qualitatively on the proposed benchmark, and compare it with previous non-learning-based methods.
AB - Capturing document images is a common way for digitizing and recording physical documents due to the ubiquitousness of mobile cameras. To make text recognition easier, it is often desirable to digitally flatten a document image when the physical document sheet is folded or curved. In this paper, we develop the first learning-based method to achieve this goal. We propose a stacked U-Net [25] with intermediate supervision to directly predict the forward mapping from a distorted image to its rectified version. Because large-scale real-world data with ground truth deformation is difficult to obtain, we create a synthetic dataset with approximately 100 thousand images by warping non-distorted document images. The network is trained on this dataset with various data augmentations to improve its generalization ability. We further create a comprehensive benchmark1 that covers various real-world conditions. We evaluate the proposed model quantitatively and qualitatively on the proposed benchmark, and compare it with previous non-learning-based methods.
UR - https://www.scopus.com/pages/publications/85062338122
U2 - 10.1109/CVPR.2018.00494
DO - 10.1109/CVPR.2018.00494
M3 - Conference contribution
AN - SCOPUS:85062338122
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 4700
EP - 4709
BT - Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018
PB - IEEE Computer Society
Y2 - 18 June 2018 through 22 June 2018
ER -