Tesseract raises a "Failed loading language 'eng'" error

Symptoms

We have detected this error only on Windows.

You might encounter the issue at several points in the application:

  • Zone OCR does not work.
  • Text extraction fails for PDF or image documents, so they cannot be found with the search engine.

The application raises an error like:

2018-11-22 15:46:09,835 [http-nio-0.0.0.0-8080-exec-10] [dms.support1] WARN  com.openkm.util.ExecutionUtils - Abnormal program termination: 1
2018-11-22 15:46:09,836 [http-nio-0.0.0.0-8080-exec-10] [dms.support1] WARN  com.openkm.util.ExecutionUtils - CommandLine: [C:\tomcat-8.5.24\extras\Tesseract-OCR-3.05.02\tesseract.exe, C:\tomcat-8.5.24\temp\okm6648884784480326422.jpg, C:\tomcat-8.5.24\temp\okm6036470490263572358]
2018-11-22 15:46:09,836 [http-nio-0.0.0.0-8080-exec-10] [dms.support1] WARN  com.openkm.util.ExecutionUtils - STDERR: Error opening data file C:\tomcat-8.5.24\extras\Tesseract-OCR\tesseract.exe/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

Cause

The Tesseract OCR engine cannot find its language data file (tessdata/eng.traineddata) because the TESSDATA_PREFIX environment variable is missing or set incorrectly.

Solution

Add a new environment variable named TESSDATA_PREFIX and set its value to the Tesseract OCR installation path (the parent directory of the tessdata folder), then restart Tomcat so the new value is picked up:

TESSDATA_PREFIX=C:\tomcat-8.5.24\extras\Tesseract-OCR-3.05.02
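
To confirm that the variable is visible and points to the right place, a small check like the one below may help. It is a minimal sketch in Python and is not part of OpenKM or Tesseract. The function name check_tessdata_prefix is hypothetical, and the layout it assumes (eng.traineddata inside a tessdata folder directly under TESSDATA_PREFIX) is taken from the error message above.

```python
import os
from pathlib import Path


def check_tessdata_prefix(lang="eng", env=None):
    """Return (ok, message) describing whether TESSDATA_PREFIX looks usable.

    Hypothetical helper for diagnosis only. It assumes the language file
    is located at <TESSDATA_PREFIX>/tessdata/<lang>.traineddata.
    """
    # Use the real process environment unless a mapping is passed in.
    env = os.environ if env is None else env

    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        return False, "TESSDATA_PREFIX is not set"

    # Build the path where the language data file is expected to be.
    data_file = Path(prefix) / "tessdata" / f"{lang}.traineddata"
    if not data_file.is_file():
        return False, f"Missing language file: {data_file}"

    return True, f"Found language file: {data_file}"


if __name__ == "__main__":
    ok, message = check_tessdata_prefix()
    print("OK" if ok else "PROBLEM", "-", message)
```

Run the check in a new session under the same Windows account that starts Tomcat, because a process only sees the environment variables that existed when it was launched.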

Properties

Date

2018-11-22

Applies to

  • Core
  • Third-party software integration

Keywords

  • AllVersions