Configuring Lucene Analyzer
Depending on the language used in the documents and properties, you may obtain better search results by configuring a suitable Lucene Analyzer.
By default, OpenKM uses org.apache.lucene.analysis.standard.StandardAnalyzer, which works fine with English and most other languages, but you can get better search results by configuring a more specific analyzer for your language.
Some analyzers:
- org.apache.lucene.analysis.en.EnglishAnalyzer
- org.apache.lucene.analysis.es.SpanishAnalyzer
- org.apache.lucene.analysis.fr.FrenchAnalyzer
- org.apache.lucene.analysis.it.ItalianAnalyzer
- org.apache.lucene.analysis.de.GermanAnalyzer
- org.apache.lucene.analysis.el.GreekAnalyzer
- org.apache.lucene.analysis.hi.HindiAnalyzer
More information at:
If you are working with East Asian languages such as Chinese or Japanese, there are several analyzers to choose from; see the Lucene documentation. You can also try ik-analyzer.
If you only need an analyzer that tokenizes on whitespace, you can try WhitespaceAnalyzer.
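To make the idea of whitespace-only tokenization concrete, here is a minimal sketch in plain Java. It is a stand-in written for illustration, not Lucene code: the class name WhitespaceTokenSketch and the method tokenize are hypothetical. It mirrors the general behaviour of a whitespace tokenizer, which splits on whitespace and leaves case and punctuation untouched.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: a plain-Java imitation of whitespace tokenization.
// This class is hypothetical and does not use or replace Lucene's analyzer.
class WhitespaceTokenSketch {

    // Splits on runs of whitespace; case and punctuation are kept as-is.
    static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        // "INV-2024/07" and the e-mail address each stay a single token.
        System.out.println(tokenize("Invoice INV-2024/07 sent to J.Smith@example.com"));
    }
}
```

With this kind of tokenization a search term generally has to match the stored token exactly, including case and attached punctuation, so it suits identifiers and codes better than natural-language text.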
If you have not configured the search analyzer before starting OpenKM for the first time, the Lucene indexes will be created using the default analyzer.
If you want to change this configuration property after the OpenKM repository has been created, you need to Rebuild Lucene Indexes.
Once the operation has been completed, the Lucene indexes will be using the new analyzer.
For more information, take a look at Rebuild indexes.
Configure an Analyzer
Edit the $TOMCAT_HOME/OpenKM.cfg file and add the line:
hibernate.search.analyzer=org.apache.lucene.analysis.es.SpanishAnalyzer
The changes will take effect after restarting the application.
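Before restarting, it can be useful to check which analyzer the configuration would select. The following is a minimal sketch, assuming that OpenKM.cfg can be parsed as a standard Java properties file; the class AnalyzerConfigSketch and the method effectiveAnalyzer are hypothetical helpers and not part of OpenKM. It returns the configured value of hibernate.search.analyzer, or the StandardAnalyzer default described above when the property is absent.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of OpenKM: reports which analyzer class
// a given configuration text would select.
class AnalyzerConfigSketch {

    static final String KEY = "hibernate.search.analyzer";
    // Default named in this document.
    static final String DEFAULT_ANALYZER =
            "org.apache.lucene.analysis.standard.StandardAnalyzer";

    // Assumption: the configuration text follows Java properties syntax.
    static String effectiveAnalyzer(String cfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(KEY, DEFAULT_ANALYZER).trim();
    }

    public static void main(String[] args) {
        String cfg = "hibernate.search.analyzer=org.apache.lucene.analysis.es.SpanishAnalyzer\n";
        System.out.println(effectiveAnalyzer(cfg)); // configured analyzer
        System.out.println(effectiveAnalyzer(""));  // falls back to the default
    }
}
```

To check a real installation, the contents of $TOMCAT_HOME/OpenKM.cfg could be read into a string and passed to effectiveAnalyzer. The sketch only reads the property; it does not verify that the named class is available on the classpath.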