Tesseract raises a "Failed loading language 'eng'" error
Symptoms
This error has been observed only on Windows.
You might encounter the issue at several points in the application:
- Zone OCR is not working.
- Text extraction fails for PDF or image documents, so those documents cannot be found through the search engine.
The application log contains entries like:
2018-11-22 15:46:09,835 [http-nio-0.0.0.0-8080-exec-10] [dms.support1] WARN com.openkm.util.ExecutionUtils - Abnormal program termination: 1
2018-11-22 15:46:09,836 [http-nio-0.0.0.0-8080-exec-10] [dms.support1] WARN com.openkm.util.ExecutionUtils - CommandLine: [C:\tomcat-8.5.24\extras\Tesseract-OCR-3.05.02\tesseract.exe, C:\tomcat-8.5.24\temp\okm6648884784480326422.jpg, C:\tomcat-8.5.24\temp\okm6036470490263572358]
2018-11-22 15:46:09,836 [http-nio-0.0.0.0-8080-exec-10] [dms.support1] WARN com.openkm.util.ExecutionUtils - STDERR: Error opening data file C:\tomcat-8.5.24\extras\Tesseract-OCR\tesseract.exe/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
Cause
The Tesseract OCR engine cannot find its language data file (tessdata/eng.traineddata) because the TESSDATA_PREFIX environment variable is missing or set incorrectly.
Solution
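Add a new environment variable named TESSDATA_PREFIX and set its value to the Tesseract OCR installation path, which is the parent directory of the tessdata folder:
TESSDATA_PREFIX=C:\tomcat-8.5.24\extras\Tesseract-OCR-3.05.02
Restart Tomcat afterwards so that the OCR process it launches picks up the new variable.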
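To confirm that the variable is visible and points at the right place, a small check like the following may help. It is only a sketch: the function name check_tessdata_prefix is hypothetical, and the expected location of the language file (a tessdata folder directly under the prefix, containing eng.traineddata) is an assumption taken from the error message above. Run it under the same account and environment as Tomcat, since other sessions may see different variables.

```python
import os
from pathlib import Path


def check_tessdata_prefix(env=None, language="eng"):
    """Return (ok, message) describing whether TESSDATA_PREFIX looks usable.

    Hypothetical helper: it only checks that the language file exists at
    <TESSDATA_PREFIX>/tessdata/<language>.traineddata, as the error
    message in the log suggests. It does not run Tesseract itself.
    """
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        return False, "TESSDATA_PREFIX is not set"

    # Assumption: the variable names the parent directory of "tessdata".
    data_file = Path(prefix) / "tessdata" / f"{language}.traineddata"
    if not data_file.is_file():
        return False, f"missing language file: {data_file}"
    return True, f"found language file: {data_file}"


if __name__ == "__main__":
    ok, message = check_tessdata_prefix()
    print("OK" if ok else "PROBLEM", "-", message)
```

If the check reports a missing language file, compare the reported path with the actual installation folder; a value that points at the tessdata folder itself, rather than at its parent, would fail this check.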
Properties
Properties | |
---|---|
Date | 2018-11-22 |
Applies to | |
Keywords | AllVersions |