Configuring the Lucene Analyzer
Search quality depends on the language used in your documents and properties, so choosing a suitable Lucene analyzer can improve search results.
By default, OpenKM uses org.apache.lucene.analysis.standard.StandardAnalyzer, which works well for English and most other languages. An analyzer written for your specific language usually gives better results.
Some analyzers:
- org.apache.lucene.analysis.en.EnglishAnalyzer
- org.apache.lucene.analysis.es.SpanishAnalyzer
- org.apache.lucene.analysis.fr.FrenchAnalyzer
- org.apache.lucene.analysis.it.ItalianAnalyzer
- org.apache.lucene.analysis.de.GermanAnalyzer
- org.apache.lucene.analysis.el.GreekAnalyzer
- org.apache.lucene.analysis.hi.HindiAnalyzer
Special analyzers:
- com.openkm.search.lucene.analysis.AccentInsensitiveAnalyzer
More information is available on the Lucene documentation site and in the Guide to Lucene Analyzers.
If you want searches to ignore accents, try AccentInsensitiveAnalyzer.
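To illustrate what accent-insensitive matching means, here is a minimal sketch using only the Java standard library. It is not the implementation of AccentInsensitiveAnalyzer; the class name AccentFoldSketch and the method fold are hypothetical, and the approach (Unicode decomposition followed by removal of combining marks) is an assumption chosen for illustration.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, for illustration only (not part of OpenKM or Lucene).
class AccentFoldSketch {

    // Decompose accented characters (NFD) so that each accent becomes a
    // separate combining mark, remove those marks, then lowercase.
    public static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "Canci\u00f3n" is "Cancion" with an acute accent on the "o".
        System.out.println(fold("Canci\u00f3n"));                 // prints "cancion"
        // Accented and unaccented spellings fold to the same term.
        System.out.println(fold("CANCI\u00d3N").equals(fold("cancion"))); // prints "true"
    }
}
```

With folding of this kind applied both at index time and at query time, a search for the unaccented spelling can match documents that contain the accented one.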
If you work with East Asian languages such as Chinese or Japanese, several analyzers are available; see the Lucene documentation. You can also try ik-analyzer.
If you only need whitespace tokenization, try WhitespaceAnalyzer.
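As a rough illustration of whitespace tokenization, the sketch below splits text on runs of whitespace and leaves case and punctuation untouched. It uses only the Java standard library and does not call Lucene; the class name WhitespaceSplitSketch and the method tokens are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only (does not use Lucene).
class WhitespaceSplitSketch {

    // Split on runs of whitespace; tokens keep their case and punctuation.
    public static List<String> tokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        // prints "[Annual, Report-2024.PDF, final]"
        System.out.println(tokens("Annual  Report-2024.PDF final"));
    }
}
```

Because nothing is lowercased or stripped, a term such as Report-2024.PDF stays a single token. This suits identifiers and codes, but it also means that searches are case-sensitive unless the query is entered exactly as indexed.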
If you do not configure a search analyzer before starting OpenKM for the first time, the Lucene indexes are created with the default analyzer (StandardAnalyzer).
If you change this configuration property after the OpenKM repository has been created, you must rebuild the Lucene indexes.
Once the rebuild has completed, the Lucene indexes use the new analyzer.
For more information, see Rebuild indexes.
Configure an Analyzer
Edit the $TOMCAT_HOME/openkm.properties file and add the line:
spring.jpa.properties.hibernate.search.analyzer=org.apache.lucene.analysis.es.SpanishAnalyzer
The changes will take effect after restarting the application.
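Before restarting, it can help to confirm that the property is spelled correctly and has no stray characters. The following is a minimal sketch, assuming the file uses standard Java properties syntax; the class name AnalyzerPropertySketch and the method configuredAnalyzer are hypothetical and not part of OpenKM. It reads the analyzer setting from properties text and falls back to the default analyzer named earlier in this page when the key is absent.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, for illustration only (not part of OpenKM).
class AnalyzerPropertySketch {

    static final String KEY = "spring.jpa.properties.hibernate.search.analyzer";
    static final String DEFAULT_ANALYZER =
            "org.apache.lucene.analysis.standard.StandardAnalyzer";

    // Parse properties text and return the configured analyzer class name,
    // or the default when the key is missing.
    public static String configuredAnalyzer(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Properties.load keeps trailing spaces in values, so trim them.
        return props.getProperty(KEY, DEFAULT_ANALYZER).trim();
    }

    public static void main(String[] args) {
        String sample = KEY + "=org.apache.lucene.analysis.es.SpanishAnalyzer \n";
        // prints "org.apache.lucene.analysis.es.SpanishAnalyzer"
        System.out.println(configuredAnalyzer(sample));
    }
}
```

To check a real installation, you could read the contents of $TOMCAT_HOME/openkm.properties into a string and pass it to the same method.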