Creating your own Text extractor
To create your own text extractor, create a new class that implements the TextExtractor interface:
package com.openkm.plugin.extractor;

import net.xeoh.plugins.base.Plugin;

import java.io.IOException;
import java.io.InputStream;

public interface TextExtractor extends Plugin {
    String[] getContentTypes();

    String extractText(InputStream stream, String type, String encoding) throws IOException;
}
The new class must be placed in the package com.openkm.plugin.extractor, because the application's plugin system loads extractors from there. A complete sample is given under "Example of the Text extractor implementation".
Do not omit the @PluginImplementation annotation; otherwise the application's plugin system will not be able to discover the new class.
More information at Register a new plugin.
Methods description
Method | Type | Description |
---|---|---|
getContentTypes() | String[] | Returns the MIME types supported by this extractor. The returned strings must be in lower case, and the returned array must not be empty. |
extractText(InputStream stream, String type, String encoding) | String | Returns the text content of the given binary document as a String. The content type and character encoding (if available and applicable) are given as arguments. |
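The contract in the table can be illustrated with a small, self-contained sketch. It is not OpenKM code: DemoTextExtractor is a hypothetical stand-in for the real TextExtractor interface (it omits the Plugin supertype so that it compiles without the plugin library), and DemoCsvExtractor and ExtractorLookup are invented names. The sketch assumes that a host application looks extractors up by a normalized content type, which would explain why the returned MIME types must be lower case and the array non-empty.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for com.openkm.plugin.extractor.TextExtractor;
// the real interface also extends net.xeoh.plugins.base.Plugin.
interface DemoTextExtractor {
    String[] getContentTypes();

    String extractText(InputStream stream, String type, String encoding) throws IOException;
}

// Illustrative extractor that simply decodes the bytes of a CSV document.
class DemoCsvExtractor implements DemoTextExtractor {
    public String[] getContentTypes() {
        // Lower case and non-empty, as the method description requires.
        return new String[] { "text/csv" };
    }

    public String extractText(InputStream stream, String type, String encoding) throws IOException {
        // Read the whole stream into memory.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int read;
        while ((read = stream.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        // Assumption for this sketch: default to UTF-8 when no encoding is given.
        String charsetName = (encoding != null) ? encoding : StandardCharsets.UTF_8.name();
        return buffer.toString(charsetName);
    }
}

// Hypothetical registry showing how a caller might rely on the contract.
class ExtractorLookup {
    private final Map<String, DemoTextExtractor> byType = new HashMap<>();

    void register(DemoTextExtractor extractor) {
        String[] types = extractor.getContentTypes();
        if (types == null || types.length == 0) {
            throw new IllegalArgumentException("getContentTypes() must not be empty");
        }
        for (String type : types) {
            if (!type.equals(type.toLowerCase(Locale.ROOT))) {
                throw new IllegalArgumentException("MIME type must be lower case: " + type);
            }
            byType.put(type, extractor);
        }
    }

    // Returns the extractor registered for the content type, or null if none matches.
    DemoTextExtractor find(String contentType) {
        if (contentType == null) {
            return null;
        }
        return byType.get(contentType.toLowerCase(Locale.ROOT));
    }
}
```

With a lookup like this, a document reported as "Text/CSV" still resolves to the extractor registered for "text/csv", while an extractor that returned mixed-case types would be rejected at registration time.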
Example of the Text extractor implementation
package com.openkm.plugin.extractor;

import net.xeoh.plugins.base.annotations.PluginImplementation;
import org.apache.commons.io.IOUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;

/**
 * Text extractor for plain text.
 */
@PluginImplementation
public class PlainTextExtractor extends AbstractTextExtractor {

    /**
     * Logger instance.
     */
    private static final Logger logger = LoggerFactory.getLogger(PlainTextExtractor.class);

    /**
     * Creates a new <code>PlainTextExtractor</code> instance.
     */
    public PlainTextExtractor() {
        super(new String[] { "text/plain" });
    }
    // -------------------------------------------------------< TextExtractor >

    /**
     * Reads the given input stream into a String using the given encoding,
     * or the platform default encoding if the encoding is not given or is
     * reported as unsupported.
     *
     * @param stream binary stream
     * @param type ignored
     * @param encoding character encoding, optional
     * @return the plain text content
     * @throws IOException if the binary stream cannot be read
     */
    public String extractText(InputStream stream, String type, String encoding) throws IOException {
        try {
            if (encoding != null) {
                return IOUtils.toString(stream, encoding);
            }
        } catch (UnsupportedEncodingException e) {
            logger.warn("Unsupported encoding '{}', using default ({}) instead.",
                    new Object[] { encoding, System.getProperty("file.encoding") });
        }

        return IOUtils.toString(stream);
    }
}
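The fallback to the default encoding in the example relies on IOUtils.toString throwing UnsupportedEncodingException. Which exception is thrown depends on the Commons IO version in use: according to the Commons IO Javadoc, releases from 2.2 on throw the unchecked UnsupportedCharsetException instead, which that catch block would not intercept. A minimal JDK-only sketch of resolving the charset before reading is shown below. CharsetFallback and resolve are hypothetical names and are not part of OpenKM.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper: resolves an optional encoding name to a Charset,
// falling back to the platform default when the name is missing or unusable.
final class CharsetFallback {
    private CharsetFallback() {
    }

    static Charset resolve(String encoding) {
        if (encoding == null || encoding.isEmpty()) {
            return Charset.defaultCharset();
        }
        try {
            return Charset.forName(encoding);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // Name is syntactically invalid or not supported by this JVM.
            return Charset.defaultCharset();
        }
    }
}
```

An extractor could then decode the stream with the resolved charset, for instance through an IOUtils.toString overload that accepts a Charset, assuming the Commons IO version on the classpath provides one.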