public class Tesseract3TextExtractor extends AbstractTextExtractor
Constructor and Description |
---|
Tesseract3TextExtractor() Creates a new TextExtractor instance. |
Modifier and Type | Method and Description |
---|---|
String | doOcr(File tmpFileIn) Performs OCR on image file |
String | doOcr(String ocr, File tmpFileIn) Performs OCR on image file |
String | extractText(File input) Extract text from image using Tesseract OCR |
String | extractText(InputStream stream, String type, String encoding) Returns a reader for the text content of the given binary document. |
String | extractText(String ocr, File input) Extract text from image using Tesseract OCR |
String | extractText(String ocr, InputStream stream, String type, String encoding) |
Methods inherited from class AbstractTextExtractor: getContentTypes
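public Tesseract3TextExtractor()
Creates a new TextExtractor instance.
public String extractText(InputStream stream, String type, String encoding) throws IOException
Returns a reader for the text content of the given binary document. The given content type is guaranteed to be one of the types reported by TextExtractor.getContentTypes() unless the implementation explicitly permits other content types.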
The implementation can choose either to read and parse the given document immediately or to return a reader that does it incrementally. The only constraint is that the implementation must close the given stream, at the latest, when the returned reader is closed. The caller, on the other hand, is responsible for closing the returned reader.
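The eager variant of this contract (read everything immediately, then close the stream) might look roughly like the sketch below. EagerExtractorSketch is a hypothetical stand-in, not the OpenKM class, and the UTF-8 fallback for a null encoding is an assumption made only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in, not the OpenKM class: shows the eager variant of the contract.
class EagerExtractorSketch {

    // Reads the whole document immediately and closes the given stream before returning.
    // The type argument is accepted only to mirror the documented signature.
    static String extractText(InputStream stream, String type, String encoding) throws IOException {
        // A null encoding means "not available"; falling back to UTF-8 is an assumption.
        Charset charset = (encoding == null) ? StandardCharsets.UTF_8 : Charset.forName(encoding);
        try (InputStream in = stream) { // try-with-resources closes the caller's stream
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), charset);
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("scanned text".getBytes(StandardCharsets.UTF_8));
        System.out.println(extractText(in, "text/plain", null));
    }
}
```

Because the method returns a String rather than a reader, nothing is left for the caller to close in this variant.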
The implementation should only throw an exception on transient errors, i.e. when it can expect to be able to successfully extract the text content of the same binary at another time. An effort should be made to recover from syntax errors and other similar problems.
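One way to read this error policy is sketched below. The Parser interface is hypothetical, and treating IllegalArgumentException as the signal for a malformed document is an assumption made for the example; a real extractor would map whatever its parser reports.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of the error policy, not the OpenKM implementation.
class ErrorPolicySketch {

    // Hypothetical parsing hook; IOException stands for a transient failure.
    interface Parser {
        String parse(InputStream in) throws IOException;
    }

    static String extractText(InputStream stream, Parser parser) throws IOException {
        try (InputStream in = stream) {
            return parser.parse(in);
        } catch (IllegalArgumentException malformed) {
            // Assumption: this marks a permanently malformed document.
            // Retrying later would not help, so recover with empty text instead of throwing.
            return "";
        }
        // An IOException is not caught here: it propagates so the caller can retry later.
    }

    public static void main(String[] args) throws IOException {
        String text = extractText(new ByteArrayInputStream(new byte[0]),
                in -> { throw new IllegalArgumentException("corrupt header"); });
        System.out.println("recovered text length: " + text.length());
    }
}
```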
This method should be thread-safe, i.e. it is possible that this method is invoked simultaneously by different threads to extract the text content of different documents. On the other hand, the returned reader does not need to be thread-safe.
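The thread-safety requirement is easiest to meet when the extraction method keeps no shared mutable state. The sketch below illustrates that idea with a hypothetical stateless extractText stand-in called from a thread pool; the class and method names are assumptions, not part of the documented API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: concurrent extraction of different documents.
class ConcurrentExtractionSketch {

    // Stateless stand-in: it touches only its arguments and local variables,
    // so simultaneous calls on different documents cannot interfere.
    static String extractText(InputStream stream) throws IOException {
        try (InputStream in = stream) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Extracts every document on a fixed-size pool; results keep the input order.
    static List<String> extractAll(List<byte[]> documents, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (byte[] doc : documents) {
                tasks.add(() -> extractText(new ByteArrayInputStream(doc)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> docs = Arrays.asList(
                "first page".getBytes(StandardCharsets.UTF_8),
                "second page".getBytes(StandardCharsets.UTF_8));
        System.out.println(extractAll(docs, 2));
    }
}
```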
Parameters:
stream - binary document from which to extract text
type - MIME type of the given document, lower case
encoding - the character encoding of the binary data, or null if not available
Throws:
IOException - on transient errors
public String extractText(File input) throws IOException
Throws:
IOException
public String extractText(String ocr, InputStream stream, String type, String encoding) throws IOException
Throws:
IOException
public String extractText(String ocr, File input) throws IOException
Throws:
IOException
public String doOcr(File tmpFileIn) throws IOException
Throws:
IOException
public String doOcr(String ocr, File tmpFileIn) throws IOException
Throws:
IOException
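This page does not say how the stream-based and file-based overloads relate, or what the ocr argument contains. The sketch below shows one plausible arrangement under two loudly stated assumptions: that the stream is first spooled to a temporary file, and that ocr is the path of the Tesseract executable, invoked in the command-line form "tesseract image outputbase", which writes outputbase.txt. OcrCommandSketch and all of its methods are hypothetical and are not part of Tesseract3TextExtractor.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tesseract3TextExtractor.
class OcrCommandSketch {

    // Copies the stream to a temporary file so a command-line tool can read it.
    static File spoolToTempFile(InputStream stream) throws IOException {
        File tmp = File.createTempFile("ocr-input-", ".img");
        try (InputStream in = stream) {
            Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }

    // Assumption: "ocr" is the path of the tesseract executable.
    // "tesseract <image> <outputbase>" writes its result to <outputbase>.txt.
    static List<String> buildCommand(String ocr, File image, File outputBase) {
        return Arrays.asList(ocr, image.getPath(), outputBase.getPath());
    }

    // Runs the command and returns the recognized text; requires Tesseract to be installed.
    static String runOcr(String ocr, File image) throws IOException, InterruptedException {
        File outputBase = File.createTempFile("ocr-output-", "");
        File textFile = new File(outputBase.getPath() + ".txt");
        try {
            Process process = new ProcessBuilder(buildCommand(ocr, image, outputBase))
                    .inheritIO() // avoid blocking on unread process output
                    .start();
            if (process.waitFor() != 0) {
                throw new IOException("OCR command exited with an error");
            }
            return new String(Files.readAllBytes(textFile.toPath()), StandardCharsets.UTF_8);
        } finally {
            textFile.delete();
            outputBase.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = spoolToTempFile(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        try {
            // "/usr/bin/tesseract" is an illustrative path only.
            System.out.println(buildCommand("/usr/bin/tesseract", tmp, new File("out")));
        } finally {
            tmp.delete();
        }
    }
}
```

The main method only prints the command it would run, so the sketch can be tried without Tesseract installed; runOcr is the part that needs the real executable.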
Copyright © 2017 Open Knowledge Management System S.L. All rights reserved.