Performance configuration parameters

Text extractor queue

Several configuration parameters let you adapt document text extraction to your needs. Text extraction can use a lot of hardware resources (memory and CPU). An aggressive text extraction policy can decrease application performance and make the user interface feel less responsive.

When documents are uploaded, they are automatically added to the text extraction queue. You can see the queue at Administration > Statistics > Text extraction queue.

A crontab task named "Text extractor worker", at Administration > Cron tab, controls the indexing cycle.

After a document is uploaded, it cannot be searched by content until it has been processed by the text extraction queue.

If the application needs to index a lot of files per day, or you have imported a considerable number of files, consider creating two crontab tasks that switch these parameters between day (less aggressive) and night (more aggressive).

Some MIME types, such as PDF or images (which need an OCR engine), can use a lot of CPU (up to 100%).

Field / Property	Type	Description	Default
managed.text.extraction.batch	Integer	Number of documents to be processed on each indexing cycle. When "managed.text.extraction.concurrent" is enabled, it is good practice to make this a multiple of the "managed.text.extraction.threads" value.	20
managed.text.extraction.concurrent	Boolean	Enable or disable concurrent threads for text extraction.	True
managed.text.extraction.threads	Integer	Number of concurrent threads used for text extraction. This number should be less than or equal to the number of hardware cores.	2
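
The sizing advice in the table can be checked with a little arithmetic. The sketch below is illustrative only: the function name `suggest_extraction_settings` and its arguments are hypothetical and are not part of OpenKM. It caps the thread count at the number of cores and rounds the batch size up to the next multiple of that thread count.

```python
import os


def suggest_extraction_settings(wanted_threads, wanted_batch, cores=None):
    """Return (threads, batch) following the sizing guidance in the table.

    Hypothetical helper, not an OpenKM API: it only does the arithmetic.
    """
    if cores is None:
        cores = os.cpu_count() or 1
    # Keep the thread count at or below the number of hardware cores.
    threads = max(1, min(wanted_threads, cores))
    # Round the batch size up to the nearest multiple of the thread count.
    batch = -(-max(wanted_batch, threads) // threads) * threads
    return threads, batch


if __name__ == "__main__":
    threads, batch = suggest_extraction_settings(3, 20, cores=4)
    print(f"managed.text.extraction.threads = {threads}")
    print(f"managed.text.extraction.batch = {batch}")
```

With 3 threads requested on a 4-core machine and a target batch of 20, the sketch suggests a batch of 21 so that each thread gets the same number of documents per cycle.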

OpenOffice LibreOffice conversion service

As part of application startup, OpenKM launches an OpenOffice or LibreOffice service. It is used internally for conversions, for example converting doc files to pdf.

The OpenOffice or LibreOffice service can use a lot of hardware resources (CPU up to 100%), which can decrease the performance of the application. A good practice is to move the conversion service to another server.

Field / Property	Type	Description
system.openoffice.server	String	URL of the OpenOffice or LibreOffice service. Example: http://192.168.1.34:8080/converter/convert
system.openoffice.tasks	String	Restart the service after this many conversions to prevent memory leaks. Example: 5
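
The idea behind restarting the service after a fixed number of conversions can be sketched as a simple counter. This is not OpenKM code: the class `RecyclingConverter` and its methods are hypothetical, and the "restart" is a placeholder that only counts how often it would happen.

```python
class RecyclingConverter:
    """Restart a (simulated) conversion service after a fixed number of tasks."""

    def __init__(self, max_tasks):
        self.max_tasks = max_tasks
        self.tasks_done = 0
        self.restarts = 0

    def _restart_service(self):
        # Placeholder: a real deployment would stop and start the office process here.
        self.restarts += 1
        self.tasks_done = 0

    def convert(self, name):
        # Recycle the service once it has used up its task budget.
        if self.tasks_done >= self.max_tasks:
            self._restart_service()
        self.tasks_done += 1
        return f"{name}.pdf"


if __name__ == "__main__":
    converter = RecyclingConverter(max_tasks=5)
    for i in range(12):
        converter.convert(f"doc{i}")
    print("restarts:", converter.restarts)
```

A low value restarts the service often, which limits the effect of memory leaks at the cost of more startup overhead.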

For more information read:

Antivirus checker

Several configuration parameters let you adapt the antivirus checker to your needs. The antivirus checker is one of the processes that can use a lot of hardware resources (memory and CPU). An aggressive antivirus policy can decrease application performance and make the user interface feel less responsive. The antivirus checker works in live mode: each document is checked during upload, and if a virus is detected an error is raised immediately and the document is not added to the repository.

In live mode the antivirus checker can noticeably slow down the user interface. Keep in mind that scanning some documents can take 3 seconds or more, which is added to the upload time.

Field / Property	Type	Description
system.antivir	String	Set the antivirus path.

Re-indexing the whole Lucene repository

The application can rebuild Lucene indexes in two ways: sequentially or in parallel. Sequential re-indexing is enabled by default, but you can select the mode with the "hibernate.indexer.mass.indexer" configuration parameter.

Field / Property	Type	Description
hibernate.indexer.mass.indexer	Boolean	When enabled, the Lucene index rebuild runs in parallel mode.

Mode	Description
Sequential mode	Use the "hibernate.indexer.batch.size.load.objects" configuration parameter to tell Hibernate how many objects to handle at a time. To avoid OutOfMemory problems, the repository is re-indexed in batches. If the value is too low, performance will be poor; if it is too high, you may run into OutOfMemory problems. hibernate.indexer.batch.size.load.objects: batch size used to load the root entities (default: 30).
Parallel mode	Several configuration properties tune its performance. hibernate.indexer.batch.size.load.objects: batch size used to load the root entities (default: 30). hibernate.indexer.threads.subsequent.fetching: number of threads used to load the lazy collections related to the indexed entities (default: 8). hibernate.indexer.threads.load.objects: number of threads used to load the root entities (default: 4). hibernate.indexer.threads.index.writer: number of threads used to analyse the documents and write to the index (default: 3).
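
The batch size trade-off can be illustrated with a generic batching loop. This is a minimal sketch, not the Hibernate indexer: `in_batches` and `reindex` are hypothetical names, and the "indexing" step is a placeholder. Only one batch is held in memory at a time, so a small batch size means many round trips and a large one means more memory per step.

```python
from itertools import islice


def in_batches(items, batch_size):
    """Yield lists of at most batch_size items, one batch at a time."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def reindex(entity_ids, batch_size=30):
    """Pretend to index entities batch by batch; return the number of batches used."""
    batches = 0
    for batch in in_batches(entity_ids, batch_size):
        # A real indexer would load and index the batch here, then release it.
        batches += 1
    return batches


if __name__ == "__main__":
    print("batches with size 30:", reindex(range(100), 30))
    print("batches with size 1:", reindex(range(100), 1))
```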

Execution timeout

The application runs external programs to process documents: for example, extracting text with an OCR engine, scanning with antivirus software, or converting documents to other formats. Sometimes these processes take a long time, or fail to finish correctly and stay running on the OS, consuming resources. To prevent this, the "system.execution.timeout" parameter sets the maximum execution time allowed for external applications.

Field / Property	Type	Description	Default
system.execution.timeout	Integer	Maximum execution time allowed for external applications, in minutes.	5
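
The effect of an execution timeout can be sketched with the Python standard library. This is not how OpenKM implements it; `run_with_timeout` is a hypothetical helper that takes the limit in minutes, as the parameter does, and relies on `subprocess.run` to kill a child process that exceeds the limit.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_minutes):
    """Run an external command; return its exit code, or None if it timed out."""
    try:
        completed = subprocess.run(command, timeout=timeout_minutes * 60)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before raising this exception.
        return None
    return completed.returncode


if __name__ == "__main__":
    # A stand-in for a slow OCR or antivirus call: sleeps for 30 seconds.
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(slow, timeout_minutes=0.02))
```

Without such a limit, a hung external process would keep consuming resources until someone stopped it by hand.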

Uploading bandwidth

Sometimes you want to restrict the total bandwidth used by each user while uploading files.

Field / Property	Type	Description
upload.throttle.filter	Boolean	When enabled, limits the total upload bandwidth to 10kb/sec per user.
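
To get a feel for what this limit means in practice, the following sketch estimates the minimum upload time for a file. It assumes that "10kb/sec" means 10 kilobytes (10 * 1024 bytes) per second, which the description does not state explicitly; the function name is hypothetical.

```python
def min_upload_seconds(file_size_bytes, limit_bytes_per_sec=10 * 1024):
    """Lower bound on upload time when bandwidth is capped at the given rate."""
    return file_size_bytes / limit_bytes_per_sec


if __name__ == "__main__":
    five_mib = 5 * 1024 * 1024
    print(f"5 MiB needs at least {min_upload_seconds(five_mib):.0f} seconds")
```

Under that assumption a 5 MiB file takes more than eight minutes, so the filter is best suited to installations where large uploads are rare.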

Remote conversion service

On a huge repository with many concurrent users, it is a good idea to dedicate a server to document conversion.

Enabling this feature requires installing a specific OpenKM service application. Contact OpenKM technical staff for more information.

Field / Property	Type	Description
remote.conversion.server	String	URL of the conversion service. Example: http://192.168.2.1:8080/converter/convert

Activity log actions

OpenKM can log a lot of information about user activity, but some actions do not need to be logged and only fill up the activity log table.

The table where the activity log is stored, OKM_ACTIVITY, can grow quickly to millions of records.

The configuration property named "activity.log.actions" sets which actions are logged. By default it is set to the most common or interesting actions. You can use regular expressions to define these actions. Read the Java Regex Tutorial for more about Java regular expressions.

Field / Property	Type	Description
activity.log.actions	List	LOGIN LOGOUT CREATE_.* DELETE_.* PURGE_.* MOVE_.* COPY_.* CHECKOUT_DOCUMENT CHECKIN_DOCUMENT GET_DOCUMENT_CONTENT.*

For a complete list of actions, see Activity log.
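
How such a pattern list selects actions can be sketched as follows. This is an illustration in Python rather than Java, using a subset of the patterns from the table. It assumes each pattern must match the whole action name, which is how Java's `String.matches` behaves but is not confirmed here for OpenKM; the action names used in the demonstration are illustrative.

```python
import re

# A subset of the patterns from the activity.log.actions value.
LOGGED_ACTIONS = [
    "LOGOUT",
    "CREATE_.*",
    "DELETE_.*",
    "CHECKOUT_DOCUMENT",
    "GET_DOCUMENT_CONTENT.*",
]


def should_log(action, patterns=LOGGED_ACTIONS):
    """Return True if the action name fully matches at least one pattern."""
    return any(re.fullmatch(pattern, action) for pattern in patterns)


if __name__ == "__main__":
    # Illustrative action names, not a verified list of OpenKM actions.
    for action in ["CREATE_DOCUMENT", "CHECKOUT_DOCUMENT", "SET_PROPERTY"]:
        print(action, should_log(action))
```

Removing broad patterns such as `GET_DOCUMENT_CONTENT.*` is one way to slow the growth of the activity table when read operations do not need to be audited.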

Lucene

The Lucene search engine can be disabled or configured in several ways. Depending on the configuration, the behavior and overall performance of the application may change.

These configuration parameters go into the "OpenKM.cfg" configuration file.

Field / Property	Type	Description	Default
hibernate.search	String	Enable or disable the Lucene search engine. Allowed values: on, off.	on
hibernate.search.worker.execution	String	Lucene worker execution mode. With "sync", an entity modification does not complete until the Lucene index has been properly updated. With "async", entity modifications are faster because the Lucene update is added to a queue and processed in a background thread. Allowed values: sync, async.	sync
hibernate.search.worker.buffer.queue.max	Integer	Size of the worker queue that stores pending index modification tasks when all worker threads are busy. Every index update is enqueued to minimise the entity modification time, and a thread consumes the queue and executes the index modifications. This parameter only makes sense with async execution.	256
hibernate.search.worker.thread.pool.size	Integer	Worker pool size: the number of threads that handle index modification tasks. Allowed values are integers starting at 1. More than 2 will not help, because only one thread can modify the index at a time. This parameter only makes sense with async execution.	1
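
The async mode described in the table follows a common producer and consumer pattern, sketched below with the Python standard library. This is not Hibernate Search code: `run_async_indexing` is a hypothetical function, the index write is a placeholder, and blocking when the queue is full is a choice made for this sketch only.

```python
import queue
import threading


def run_async_indexing(updates, queue_max=256):
    """Enqueue index updates and apply them on one background thread."""
    pending = queue.Queue(maxsize=queue_max)
    applied = []

    def worker():
        while True:
            item = pending.get()
            if item is None:  # Sentinel: no more work.
                break
            applied.append(item)  # Stand-in for writing to the Lucene index.

    thread = threading.Thread(target=worker)
    thread.start()
    for update in updates:
        # Returns quickly unless the queue is full; in this sketch it then blocks.
        pending.put(update)
    pending.put(None)
    thread.join()
    return applied


if __name__ == "__main__":
    done = run_async_indexing([f"doc-{i}" for i in range(10)], queue_max=4)
    print(len(done), "updates applied in order")
```

A single consumer thread applies the updates in the order they were enqueued, which mirrors the note that only one thread can modify the index at a time.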

OpenKM 6.3 - CE