JackrabbitTextExtractor (OpenKM Web Application 6.3.1 API)

java.lang.Object
- com.openkm.module.jcr.stuff.apache.JackrabbitTextExtractor

All Implemented Interfaces:

org.apache.jackrabbit.extractor.TextExtractor
```
public class JackrabbitTextExtractor
extends Object
implements org.apache.jackrabbit.extractor.TextExtractor
```
Backwards-compatible Jackrabbit text extractor component. This class implements the following functionality:
- Parses the configured TextExtractor and TextFilter class names and instantiates the configured classes.
- Acts as the delegate extractor for any configured DelegatingTextExtractor instances.
- Maintains a CompositeTextExtractor instance that contains all the configured extractors and to which all text extraction calls are delegated.
- Creates a TextFilterExtractor adapter for a configured TextFilter instance when it is first used and adds that adapter to the composite extractor for use in text extraction.
- Logs a warning and creates a dummy EmptyTextExtractor instance for any unsupported content types when first detected. The dummy extractor is added to the composite extractor to prevent future warnings about the same content type.

Constructor Summary

Constructors
Constructor and Description
`JackrabbitTextExtractor()`
`JackrabbitTextExtractor(List<String> classes)` Creates a Jackrabbit text extractor containing the configured component classes.

Method Summary

Modifier and Type	Method and Description
`Reader`	`extractText(InputStream stream, String type, String encoding)` Extracts the text content from the given binary stream.
`String[]`	`getContentTypes()` Returns the content types that the component extractors are known to support.
`org.apache.jackrabbit.extractor.CompositeTextExtractor`	`getExtractor(String type)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - JackrabbitTextExtractor
```
public JackrabbitTextExtractor()
```
  - JackrabbitTextExtractor
```
public JackrabbitTextExtractor(List<String> classes)
```
    Creates a Jackrabbit text extractor containing the configured component classes.
    
    Parameters:
    
    classes - configured TextExtractor (and TextFilter) class names (space- or comma-separated)
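
    As a rough illustration of what "space- or comma-separated" class names could look like once normalized, the sketch below flattens a list of such entries into individual class names. `ClassNameListSketch` and `splitClassNames` are hypothetical names, not part of the OpenKM or Jackrabbit API, and whether the constructor itself splits its entries this way is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenKM or Jackrabbit.
class ClassNameListSketch {

    /** Splits each entry on whitespace and commas, dropping empty tokens. */
    static List<String> splitClassNames(List<String> entries) {
        List<String> names = new ArrayList<>();
        for (String entry : entries) {
            // One or more whitespace characters or commas act as a separator.
            for (String token : entry.split("[\\s,]+")) {
                if (!token.isEmpty()) {
                    names.add(token);
                }
            }
        }
        return names;
    }
}
```

    The resulting list holds one fully qualified class name per element, which is the shape a caller would most likely want before handing names to a class loader.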
- Method Detail
  - getContentTypes
```
public String[] getContentTypes()
```
    Returns the content types that the component extractors are known to support.
    
    Specified by:
    
    getContentTypes in interface org.apache.jackrabbit.extractor.TextExtractor
    
    Returns:
    
    supported content types
  - extractText
```
public Reader extractText(InputStream stream,
                          String type,
                          String encoding)
                   throws IOException
```
    Extracts the text content from the given binary stream. The given content type is used to look up a configured text extractor to which to delegate the request.
    If no matching extractor is found, the configured text filters are searched for an instance that claims to support the given content type. A text extractor adapter is then created for that filter and saved in the extractor map for future use before the request is delegated to the adapter.
    If not even a text filter is found for the given content type, a warning is logged and an empty text extractor is created for that content type and saved in the extractor map for future use before delegating the request to the empty extractor.
    
    Specified by:
    
    extractText in interface org.apache.jackrabbit.extractor.TextExtractor
    
    Parameters:
    
    stream - binary stream
    
    type - content type
    
    encoding - character encoding, or null
    
    Returns:
    
    reader for the text content of the binary stream
    
    Throws:
    
    IOException - if the binary stream cannot be read
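
    The lookup-and-fallback behavior described for extractText can be sketched in a self-contained form. The code below is a simplified illustration only: `SketchExtractor` and `FallbackExtractorSketch` are hypothetical stand-ins rather than the Jackrabbit interfaces, the intermediate text-filter step is omitted, and the `warnings` counter exists purely to make the "warn once per content type" effect observable.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical stand-in for a text extractor; not the Jackrabbit interface. */
interface SketchExtractor {
    Reader extractText(InputStream stream, String type, String encoding) throws IOException;
}

/** Looks up an extractor by content type and caches an empty fallback on a miss. */
class FallbackExtractorSketch {
    private static final Logger LOG =
        Logger.getLogger(FallbackExtractorSketch.class.getName());

    private final Map<String, SketchExtractor> extractors = new HashMap<>();
    int warnings = 0; // number of times the fallback path was taken

    void register(String type, SketchExtractor extractor) {
        extractors.put(type, extractor);
    }

    Reader extractText(InputStream stream, String type, String encoding) throws IOException {
        SketchExtractor extractor = extractors.get(type);
        if (extractor == null) {
            // Unsupported type: warn once, then remember an empty extractor
            // so later calls for the same type stay silent.
            LOG.warning("No extractor for content type: " + type);
            warnings++;
            extractor = (s, t, e) -> new StringReader("");
            extractors.put(type, extractor);
        }
        return extractor.extractText(stream, type, encoding);
    }

    /** Demo convenience: extracts from a byte array and drains the reader. */
    String extractToString(byte[] data, String type) {
        try (Reader reader = extractText(new ByteArrayInputStream(data), type, "UTF-8")) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[1024];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    In this sketch, a second request for the same unsupported content type reuses the cached empty extractor, so the warning is logged only once per type.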
  - getExtractor
```
public org.apache.jackrabbit.extractor.CompositeTextExtractor getExtractor(String type)
                                                                    throws IOException
```
    Throws:
    
    IOException
