Cluster configuration
The updated cluster architecture includes enhancements over the previous one and fixes specific bugs found in earlier versions, but it still has some limitations:
- Single point of failure: Only the master can modify the index. If the master fails, the whole system is affected.
- Latency: Slave nodes have to refresh their local index every few minutes, so there is a delay between a change on the master and its appearance on the slaves.
This architecture provides load balancing as well as high availability. Let's take a look at the proposed architecture: several OpenKM servers, with one (or more) HAProxy servers distributing user requests among them. HAProxy can also be configured to detect when an OpenKM server is down and stop sending it requests until it is up again.
Optionally, you can configure several replicated database servers (also handled by another HAProxy), which prevents the service from being interrupted if a database node goes down. By default, OpenKM stores documents on disk to maximize performance, so these files need to be replicated across your OpenKM servers. One option is GlusterFS, a software solution for creating distributed file systems, but NFS is good enough in most situations and easier to set up on Linux servers.
Remember that this is only a proposal, and you are free to implement whatever infrastructure achieves the same results. To avoid an overly complex configuration, we will describe a simplified architecture that meets the same objective: the master node, the database, and the NFS server run on the same machine.
Remember that every OpenKM instance must use the same database: use only one database for all instances. This resource is configured in the TOMCAT_HOME/openkm.properties file.
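For illustration, assuming a MySQL database named okmdb on a host called dbserver (hypothetical names; adjust the dialect, host, and credentials to your environment, and verify the property names against your own file), the Hibernate-style connection entries in openkm.properties would look roughly like this, and they must be identical on every node:
hibernate.dialect=org.hibernate.dialect.MySQL5Dialect
hibernate.url=jdbc:mysql://dbserver:3306/okmdb?autoReconnect=true
hibernate.username=openkm
hibernate.password=secret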
If you look into the TOMCAT_HOME/repository directory, you can see the following:
- cache: These are generated files used primarily for preview. If deleted, they will be recreated.
- datastore: These files are the binary content of the documents stored in OpenKM. Take care of them; a backup is always recommended.
- index: This is where Lucene stores the indexes used for document and metadata search. If deleted, you can re-index the repository from Administration > Utilities.
Given this, you can place the cache and datastore folders on another server as shared resources. On average, 80% of user actions are document reads and only 20% are writes (creating or updating documents). The problematic folder is index, because Lucene indexes can't be shared between several OpenKM instances. So, what can we do? The solution is for each OpenKM instance to keep its own index and for the instances to exchange messages that keep their local indexes updated: when a node updates its index, it also sends a message to the other nodes, which apply the modification locally.
Proxy node
This is the only node that should be accessible to users; it dispatches requests among the slave nodes. The most important part is the HAProxy configuration. The following configuration works with HAProxy v1.5:
frontend LB
    bind 192.168.0.220:80
    reqadd X-Forwarded-Proto:\ http
    default_backend LB

backend LB
    mode http
    stats enable
    stats hide-version
    stats uri /stats
    stats realm HAProxy\ Statistics
    stats auth admin:admin
    balance roundrobin
    option httpchk
    option httpclose
    option forwardfor
    cookie LB insert
    server node1 192.168.0.222:8080 cookie node1 check
    server node2 192.168.0.223:8080 cookie node2 check
According to this configuration, user requests will be balanced between the two OpenKM servers using the round-robin algorithm. If one of the OpenKM servers fails, requests will be forwarded only to the working server. At http://proxy/stats (default user: admin, password: admin), you can see the nodes' status and other statistics. The complete HAProxy manual is available at http://cbonte.github.io/haproxy-dconv/configuration-1.5.html; if you use another HAProxy version, some options may have changed.
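If you prefer to check the backends from the command line, the same statistics are available in CSV form by appending ;csv to the stats URI (the proxy host name and the admin credentials are the defaults from the configuration above):
$ curl -su admin:admin 'http://proxy/stats;csv' | grep -E 'node1|node2'
The status column of each matching line shows whether the node is UP or DOWN.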
Master node
This kind of node is not very different from the slaves; the only real difference is that it can run crontab tasks, among other administrative actions. As we said previously, to simplify the configuration, this master node will also store the cache and datastore files.
In this example, we will also use this server as an NFS server, but it's common to have a SAN that acts as the NFS server.
Let's create these directories:
$ sudo mkdir /mnt/okm_cache
$ sudo mkdir /mnt/okm_dstore
$ sudo mkdir /mnt/okm_extraction
$ sudo mkdir /mnt/okm_ocr_template
$ sudo chown openkm:openkm /mnt/okm_*
We are exposing these directories via NFS, so we need to install the NFS server package:
$ sudo apt-get install nfs-kernel-server
Once installed, edit the /etc/exports file and add these lines:
/mnt/okm_cache *(rw,sync,no_root_squash,no_subtree_check)
/mnt/okm_dstore *(rw,sync,no_root_squash,no_subtree_check)
/mnt/okm_extraction *(rw,sync,no_root_squash,no_subtree_check)
/mnt/okm_ocr_template *(rw,sync,no_root_squash,no_subtree_check)
After saving it, you have to run this command (you should run it every time you modify this file):
$ sudo exportfs -ra
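Before moving on, you can verify that the four directories are actually being exported; the following should list all the /mnt/okm_* paths:
$ showmount -e localhost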
Now let's make the symbolic links:
$ sudo ln -s /mnt/okm_cache TOMCAT_HOME/repository/cache
$ sudo ln -s /mnt/okm_dstore TOMCAT_HOME/repository/datastore
$ sudo ln -s /mnt/okm_extraction TOMCAT_HOME/repository/extraction
$ sudo ln -s /mnt/okm_ocr_template TOMCAT_HOME/repository/ocr_template
$ sudo chown -h openkm:openkm TOMCAT_HOME/repository/cache
$ sudo chown -h openkm:openkm TOMCAT_HOME/repository/datastore
$ sudo chown -h openkm:openkm TOMCAT_HOME/repository/extraction
$ sudo chown -h openkm:openkm TOMCAT_HOME/repository/ocr_template
Instead of making the symbolic links, you can configure OpenKM to use other locations for these directories by setting the following properties:
repository.extraction.home=/mnt/okm_extraction
repository.datastore.home=/mnt/okm_dstore
repository.cache.home=/mnt/okm_cache
repository.ocr.template.home=/mnt/okm_ocr_template
To enable cluster mode, edit the TOMCAT_HOME/openkm.properties file and uncomment this part:
# Cluster - Master
cluster.enabled=true
cluster.node=master
Slave nodes
These nodes can write to the cache and datastore folders. Here we also need to create the directories that will be used as mount points for the shared resources:
$ sudo mkdir /mnt/okm_cache
$ sudo mkdir /mnt/okm_dstore
$ sudo mkdir /mnt/okm_extraction
$ sudo mkdir /mnt/okm_ocr_template
$ sudo chown openkm:openkm /mnt/okm_*
Install NFS client support:
$ sudo apt-get install nfs-common
We want them to be mounted automatically, so edit the /etc/fstab file and append:
# Cluster
master:/mnt/okm_cache /mnt/okm_cache nfs auto,noatime,rsize=8192,wsize=8192,timeo=14,intr
master:/mnt/okm_dstore /mnt/okm_dstore nfs auto,noatime,rsize=8192,wsize=8192,timeo=14,intr
master:/mnt/okm_extraction /mnt/okm_extraction nfs auto,noatime,rsize=8192,wsize=8192,timeo=14,intr
master:/mnt/okm_ocr_template /mnt/okm_ocr_template nfs auto,noatime,rsize=8192,wsize=8192,timeo=14,intr
Edit the /etc/hosts file so that the "master" server name resolves to the correct IP address.
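For example, assuming the master/NFS server's address is 192.168.0.221 (a hypothetical value; use the real address of your master), the entry would be:
192.168.0.221 master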
Now let's mount them:
$ sudo mount -a
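You can check that the four shares are actually mounted with:
$ df -h | grep okm_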
Let's make the symbolic links:
$ sudo ln -s /mnt/okm_cache TOMCAT_HOME/repository/cache
$ sudo ln -s /mnt/okm_dstore TOMCAT_HOME/repository/datastore
$ sudo ln -s /mnt/okm_extraction TOMCAT_HOME/repository/extraction
$ sudo ln -s /mnt/okm_ocr_template TOMCAT_HOME/repository/ocr_template
$ sudo chown -h openkm:openkm TOMCAT_HOME/repository/cache
$ sudo chown -h openkm:openkm TOMCAT_HOME/repository/datastore
$ sudo chown -h openkm:openkm TOMCAT_HOME/repository/extraction
$ sudo chown -h openkm:openkm TOMCAT_HOME/repository/ocr_template
Instead of making the symbolic links, you can configure OpenKM to use other locations for these directories by setting the following properties:
repository.extraction.home=/mnt/okm_extraction
repository.datastore.home=/mnt/okm_dstore
repository.cache.home=/mnt/okm_cache
repository.ocr.template.home=/mnt/okm_ocr_template
To enable cluster mode, edit the TOMCAT_HOME/openkm.properties file and uncomment this part:
# Cluster - Slave
cluster.enabled=true
cluster.node=slave
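Whichever approach you use (symbolic links or the repository.*.home properties), restart Tomcat on each node after editing openkm.properties so the cluster settings take effect. A typical restart would be:
$ sudo TOMCAT_HOME/bin/shutdown.sh
$ sudo TOMCAT_HOME/bin/startup.sh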