public class JackrabbitTextExtractor extends Object implements org.apache.jackrabbit.extractor.TextExtractor
This class implements the following functionality:

- Parses the configured TextExtractor and TextFilter class names and instantiates the configured classes.
- Acts as the delegate extractor for any configured DelegatingTextExtractor instances.
- Maintains a CompositeTextExtractor instance that contains all the configured extractors and to which all text extraction calls are delegated.
- Creates a TextFilterExtractor adapter for a configured TextFilter instance when it is first used, and adds that adapter to the composite extractor for use in text extraction.
- Logs a warning and creates a dummy EmptyTextExtractor instance for any unsupported content type when it is first detected. The dummy extractor is added to the composite extractor to prevent future warnings about the same content type.
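Parsing and instantiating the configured class names, as described in the first item above, can be sketched with plain Java reflection. This is a minimal illustration under stated assumptions, not the Jackrabbit implementation: the helper class `ClassNameLoader` and its methods are hypothetical, each entry is split on commas and/or whitespace (following the "space- or comma-separated" note on the constructor parameter), each class is assumed to have a public no-argument constructor, and classes that do not match the expected type are simply skipped.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of Jackrabbit: turns configured class
 * names into instances using reflection.
 */
final class ClassNameLoader {

    private ClassNameLoader() {
    }

    /** Splits every configured entry on commas and/or whitespace. */
    static List<String> parseClassNames(List<String> entries) {
        List<String> names = new ArrayList<>();
        for (String entry : entries) {
            for (String token : entry.split("[\\s,]+")) {
                if (!token.isEmpty()) { // leading separators yield an empty first token
                    names.add(token);
                }
            }
        }
        return names;
    }

    /**
     * Instantiates each named class that is assignable to {@code expected};
     * other classes are skipped. Assumes a public no-argument constructor.
     */
    static <T> List<T> instantiateAll(List<String> entries, Class<T> expected) {
        List<T> instances = new ArrayList<>();
        for (String name : parseClassNames(entries)) {
            try {
                Class<?> cls = Class.forName(name);
                if (expected.isAssignableFrom(cls)) {
                    instances.add(expected.cast(cls.getDeclaredConstructor().newInstance()));
                }
            } catch (ReflectiveOperationException e) {
                // In this sketch, an unusable class name is treated as a configuration error.
                throw new IllegalArgumentException("Cannot instantiate " + name, e);
            }
        }
        return instances;
    }
}
```

For example, `ClassNameLoader.instantiateAll(Arrays.asList("java.util.ArrayList java.lang.StringBuilder"), CharSequence.class)` returns a list holding one `StringBuilder`, because `ArrayList` is not a `CharSequence`. Failing fast on an unknown class name is a choice made for this sketch; a real component might prefer to log and skip the entry.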
Constructor | Description |
---|---|
JackrabbitTextExtractor() | |
JackrabbitTextExtractor(List<String> classes) | Creates a Jackrabbit text extractor containing the configured component classes. |
Modifier and Type | Method | Description |
---|---|---|
Reader | extractText(InputStream stream, String type, String encoding) | Extracts the text content from the given binary stream. |
String[] | getContentTypes() | Returns the content types that the component extractors are known to support. |
org.apache.jackrabbit.extractor.CompositeTextExtractor | getExtractor(String type) | |
public JackrabbitTextExtractor()
public JackrabbitTextExtractor(List<String> classes)
Parameters:
classes - configured TextExtractor (and TextFilter) class names (space- or comma-separated)

public String[] getContentTypes()
Specified by:
getContentTypes in interface org.apache.jackrabbit.extractor.TextExtractor
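Because all calls go through a composite extractor, a natural reading of `getContentTypes()` is the union of the types reported by the component extractors. The sketch below shows that merge. `SimpleExtractor` and `ContentTypeUnion` are hypothetical stand-ins chosen so the example compiles without Jackrabbit on the classpath, and removing duplicates while keeping first-seen order is an assumption rather than documented behavior.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a component extractor; not the Jackrabbit interface. */
interface SimpleExtractor {
    String[] getContentTypes();
}

/** Hypothetical helper that merges the content types of several components. */
final class ContentTypeUnion {

    private ContentTypeUnion() {
    }

    static String[] getContentTypes(List<? extends SimpleExtractor> components) {
        // LinkedHashSet drops duplicates and keeps the order in which types first appear.
        Set<String> types = new LinkedHashSet<>();
        for (SimpleExtractor component : components) {
            for (String type : component.getContentTypes()) {
                types.add(type);
            }
        }
        return types.toArray(new String[0]);
    }
}
```

With one component reporting `application/pdf` and another reporting `application/msword` and `application/pdf`, the merged result lists each type once: `application/pdf`, then `application/msword`.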

public Reader extractText(InputStream stream, String type, String encoding) throws IOException
The given content type is first matched against the configured extractors, and the request is delegated to a matching extractor if one is found.

If a matching extractor is not found, the configured text filters are searched for an instance that claims to support the given content type. A text extractor adapter is created for that filter and saved in the extractor map for future use before the request is delegated to the adapter.

If not even a text filter is found for the given content type, a warning is logged, and an empty text extractor is created for that content type and saved in the extractor map for future use before the request is delegated to the empty extractor.
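The three-step lookup described above (configured extractor, then a cached adapter around a matching text filter, then a cached empty extractor plus a warning) can be sketched as follows. All types here, `ContentExtractor`, `ContentFilter` and `FallbackExtractor`, are hypothetical simplifications rather than Jackrabbit's `TextExtractor`, `TextFilter` or `CompositeTextExtractor`; in particular, the filter interface is reduced to a single method returning a `Reader`, and warnings are collected in a list instead of being sent to a logger.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical extractor interface used only in this sketch. */
interface ContentExtractor {
    Reader extractText(InputStream stream, String type, String encoding) throws IOException;
}

/** Hypothetical text filter: claims content types and turns a stream into text. */
interface ContentFilter {
    boolean canFilter(String type);

    Reader filter(InputStream stream, String encoding) throws IOException;
}

/** Sketch of the lookup order; not synchronized, so not safe for concurrent use. */
final class FallbackExtractor implements ContentExtractor {

    /** Extractors keyed by content type; fallbacks are cached here as well. */
    private final Map<String, ContentExtractor> extractors = new HashMap<>();
    private final List<ContentFilter> filters;
    /** Collected warnings, standing in for a logger. */
    final List<String> warnings = new ArrayList<>();

    FallbackExtractor(Map<String, ContentExtractor> configured, List<ContentFilter> filters) {
        this.extractors.putAll(configured);
        this.filters = new ArrayList<>(filters);
    }

    @Override
    public Reader extractText(InputStream stream, String type, String encoding)
            throws IOException {
        ContentExtractor extractor = extractors.get(type);
        if (extractor == null) {
            extractor = createFallback(type);
            extractors.put(type, extractor); // saved for future use
        }
        return extractor.extractText(stream, type, encoding);
    }

    /** Returns a filter adapter if some filter claims the type, else an empty extractor. */
    private ContentExtractor createFallback(String type) {
        for (ContentFilter filter : filters) {
            if (filter.canFilter(type)) {
                // Adapter: exposes the filter through the extractor interface.
                return (stream, t, encoding) -> filter.filter(stream, encoding);
            }
        }
        // No extractor or filter: warn now; the cached empty extractor prevents repeats.
        warnings.add("No text extractor available for content type " + type);
        return (stream, t, encoding) -> new StringReader("");
    }
}
```

Caching the fallback under its content type is what keeps the warning from repeating: a second request for the same unsupported type finds the empty extractor in the map and never reaches `createFallback`.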
Specified by:
extractText in interface org.apache.jackrabbit.extractor.TextExtractor
Parameters:
stream - binary stream
type - content type
encoding - character encoding, or null
Throws:
IOException - if the binary stream cannot be read

public org.apache.jackrabbit.extractor.CompositeTextExtractor getExtractor(String type) throws IOException
Throws:
IOException
Copyright © 2016. All rights reserved.