Tesseract 4: disable page segmentation

Symptoms

Tesseract version 4 runs automatic "page segmentation" by default. In many cases you would rather have the OCR process the document as a single uniform block of text than have the page segmented.

Cause

By default, Tesseract version 4 extracts text using fully automatic page segmentation (page segmentation mode 3).

Solution

The available page segmentation modes are described in the Tesseract command-line usage documentation:

https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage

To change the behavior, add the --psm parameter to the system.ocr configuration parameter. Mode 6 makes Tesseract assume a single uniform block of text:

system.ocr = /usr/bin/tesseract ${fileIn} ${fileOut} -l eng --psm 6
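
To check what command line this setting produces for a given file, the placeholders can be expanded by hand. The following is a minimal sketch, not the application's own expansion code: the helper name build_ocr_command and the sample paths are hypothetical, and it assumes ${fileIn} and ${fileOut} are replaced by plain file paths.

```python
import shlex
from string import Template
from typing import List

# Value of system.ocr as configured above.
SYSTEM_OCR = "/usr/bin/tesseract ${fileIn} ${fileOut} -l eng --psm 6"


def build_ocr_command(file_in: str, file_out: str,
                      template: str = SYSTEM_OCR) -> List[str]:
    """Hypothetical helper: expand the placeholders into an argument list."""
    # Split the template first and substitute per token, so that a path
    # containing spaces still ends up as a single argument.
    return [
        Template(token).substitute(fileIn=file_in, fileOut=file_out)
        for token in shlex.split(template)
    ]


if __name__ == "__main__":
    # Sample paths are placeholders for illustration only.
    print(build_ocr_command("/tmp/scan.png", "/tmp/scan"))
    # prints ['/usr/bin/tesseract', '/tmp/scan.png', '/tmp/scan', '-l', 'eng', '--psm', '6']
```

The resulting list can be passed to subprocess.run to try the command outside the application and compare the output with and without --psm 6.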

Properties

Date	2018-12-21
Applies to	Core.

Keywords

AllVersions

