Tesseract raises Failed loading language 'eng' error

Symptoms

This error has only been detected on Windows.

The issue can show up in several parts of the application:

  • Zone OCR is not working.
  • Text extraction fails for PDF or image documents, so they cannot be found through the search engine.

The application logs an error like:

2018-11-22 15:46:09,835 [http-nio-0.0.0.0-8080-exec-10] [dms.support1] WARN  com.openkm.util.ExecutionUtils - Abnormal program termination: 1
2018-11-22 15:46:09,836 [http-nio-0.0.0.0-8080-exec-10] [dms.support1] WARN  com.openkm.util.ExecutionUtils - CommandLine: [C:\tomcat-8.5.24\extras\Tesseract-OCR-3.05.02\tesseract.exe, C:\tomcat-8.5.24\temp\okm6648884784480326422.jpg, C:\tomcat-8.5.24\temp\okm6036470490263572358]
2018-11-22 15:46:09,836 [http-nio-0.0.0.0-8080-exec-10] [dms.support1] WARN  com.openkm.util.ExecutionUtils - STDERR: Error opening data file C:\tomcat-8.5.24\extras\Tesseract-OCR\tesseract.exe/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

Cause

The Tesseract OCR engine fails to initialize because the TESSDATA_PREFIX environment variable is missing or set to a wrong value, so Tesseract cannot locate the eng.traineddata language file.

Solution

Add a new environment variable named TESSDATA_PREFIX and set its value to the Tesseract OCR installation path (the parent directory of the "tessdata" directory):

TESSDATA_PREFIX=C:\tomcat-8.5.24\extras\Tesseract-OCR-3.05.02
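To confirm the variable is visible and points at the right place, a small check along the following lines may help. This is only a sketch, not part of OpenKM or Tesseract. The function name check_tessdata_prefix is hypothetical, and the layout it tests (a "tessdata" directory directly under TESSDATA_PREFIX containing eng.traineddata) is taken from the error message shown above.

```python
import os
from pathlib import Path


def check_tessdata_prefix(env=None, language="eng"):
    """Return a list of problems found with TESSDATA_PREFIX (empty if none).

    Hypothetical helper: it assumes the language file is expected at
    <TESSDATA_PREFIX>/tessdata/<language>.traineddata, as the error
    message in the log suggests.
    """
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        return ["TESSDATA_PREFIX is not set"]

    problems = []
    tessdata_dir = Path(prefix) / "tessdata"
    if not tessdata_dir.is_dir():
        problems.append(f"no 'tessdata' directory under {prefix}")
    elif not (tessdata_dir / f"{language}.traineddata").is_file():
        problems.append(f"{language}.traineddata not found in {tessdata_dir}")
    return problems


if __name__ == "__main__":
    issues = check_tessdata_prefix()
    print("OK" if not issues else "\n".join(issues))
```

Run the script under the same Windows account that runs Tomcat. An environment variable change is only picked up by processes started after the change, so Tomcat most likely needs a restart before OCR works again.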

Properties

Date

2018-11-22

Applies to

  • Core
  • Thirdparty software integration

Keywords

  • AllVersions