public class JackrabbitTextExtractor extends Object implements org.apache.jackrabbit.extractor.TextExtractor
Parses the configured TextFilter class names and instantiates the configured classes.
Creates a CompositeTextExtractor instance that contains all the configured extractors and to which all text extraction calls are delegated.
Creates a TextFilterExtractor adapter for a configured TextFilter instance when it is first used and adds that adapter to the composite extractor for use in text extraction.
Creates a dummy EmptyTextExtractor instance for any unsupported content type when it is first detected. The dummy extractor is added to the composite extractor to prevent future warnings about the same content type.
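The class-name handling described above can be illustrated with a minimal sketch. It is not the Jackrabbit implementation: `ConfiguredClassLoader` and `instantiateAll` are hypothetical names, the comma separator is an assumption, and a no-argument constructor is assumed for every configured class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Jackrabbit. */
final class ConfiguredClassLoader {

    /**
     * Splits a list of class names (comma-separated by assumption) and
     * instantiates each class through its no-argument constructor.
     * Names that cannot be loaded or instantiated are skipped.
     */
    static List<Object> instantiateAll(String classNames) {
        List<Object> instances = new ArrayList<>();
        for (String name : classNames.split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray separators
            }
            try {
                Class<?> type = Class.forName(trimmed);
                instances.add(type.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException | LinkageError e) {
                // A real component would log a warning; this sketch writes to stderr.
                System.err.println("Skipping " + trimmed + ": " + e);
            }
        }
        return instances;
    }
}
```

Skipping a broken entry instead of failing keeps one bad class name from disabling every other configured component; whether the real class behaves this way is not stated in this page.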
|Constructor and Description|
Creates a Jackrabbit text extractor containing the configured component classes.
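|Modifier and Type|Method and Description|
|Reader|extractText(InputStream stream, String type, String encoding): Extracts the text content from the given binary stream.|
|String[]|getContentTypes(): Returns the content types that the component extractors are known to support.|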
public String[] getContentTypes()
public Reader extractText(InputStream stream, String type, String encoding) throws IOException
If a matching extractor is not found, the configured text filters are searched for an instance that claims to support the given content type. A text extractor adapter is created for that filter and saved in the extractor map for future use before the request is delegated to the adapter.
If not even a text filter is found for the given content type, a warning is logged and an empty text extractor is created for that content type and saved in the extractor map for future use before delegating the request to the empty extractor.
stream - binary stream
type - content type
encoding - character encoding, or null
IOException - if the binary stream cannot be read
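The lookup order described for extractText (configured extractor, then text filter adapter, then empty extractor) can be sketched as follows. This is an illustration under stated assumptions, not the Jackrabbit source: `SketchExtractor`, `SketchFilter`, and `FallbackExtractor` are hypothetical stand-ins for the real interfaces, and the `warnings` counter exists only to make the behaviour observable.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a text extractor; not the Jackrabbit interface. */
interface SketchExtractor {
    Reader extractText(InputStream stream, String type, String encoding) throws IOException;
}

/** Hypothetical stand-in for a legacy text filter. */
interface SketchFilter {
    boolean canFilter(String type);
    Reader filter(InputStream stream, String encoding) throws IOException;
}

final class FallbackExtractor implements SketchExtractor {
    private final Map<String, SketchExtractor> extractors = new HashMap<>();
    private final List<SketchFilter> filters;
    int warnings = 0; // number of unsupported-type warnings issued so far

    FallbackExtractor(Map<String, SketchExtractor> configured, List<SketchFilter> filters) {
        this.extractors.putAll(configured);
        this.filters = filters;
    }

    @Override
    public Reader extractText(InputStream stream, String type, String encoding) throws IOException {
        SketchExtractor extractor = extractors.get(type);
        if (extractor == null) {
            // Step 2: look for a filter that claims the content type.
            extractor = adaptFilter(type);
            if (extractor == null) {
                // Step 3: warn once, then fall back to an extractor that yields no text.
                warnings++;
                System.err.println("Unsupported content type: " + type);
                extractor = (s, t, e) -> new StringReader("");
            }
            // Save the result so later calls for this type skip the search.
            extractors.put(type, extractor);
        }
        return extractor.extractText(stream, type, encoding);
    }

    /** Wraps the first matching filter as an extractor, or returns null if none matches. */
    private SketchExtractor adaptFilter(String type) {
        for (SketchFilter filter : filters) {
            if (filter.canFilter(type)) {
                return (s, t, e) -> filter.filter(s, e);
            }
        }
        return null;
    }
}
```

Because the fallback extractor is stored in the same map as the regular ones, a second request for the same unsupported type finds an entry immediately and no further warning is issued.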
Copyright © 2016. All rights reserved.