Cluster configuration
The updated cluster architecture includes enhancements over the previous one and fixes specific bugs found in earlier versions, but it still has some limitations:
- Single point of failure: Only the master can modify the index. If the master fails, the whole system is affected.
- Latency: Slave nodes have to refresh their local index every few minutes, so there is a delay between a change on the master and its appearance on the slaves.
This architecture provides load balancing as well as high availability. Let's take a look at the proposed architecture: several OpenKM servers, with one (or more) HAProxy servers distributing user requests among them. HAProxy can also be configured to detect when an OpenKM server is down and stop sending it requests until it is up again.
Optionally, you can configure several replicated database servers (also handled by another HAProxy), which prevents the service from being interrupted if a database node goes down. By default, OpenKM stores documents on disk to maximize performance, so these files need to be replicated across your OpenKM servers. One option is GlusterFS, a software solution for creating distributed file systems, but NFS is good enough in most situations and easier to set up on Linux servers.
Remember that this is only a proposal, and you are free to implement whatever infrastructure achieves the same results. To avoid an overly complex configuration, we will describe a simplified architecture that meets the same objective: the master node, the database, and the NFS server run on the same machine.
Remember that every OpenKM instance must use the same database: use only one database for all instances. This resource is configured in the TOMCAT_HOME/openkm.properties file.
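For illustration, assuming a MySQL database named okmdb on a host called dbserver (hypothetical names; adjust the dialect, host, and credentials to your environment, and verify the property names against your own file), the Hibernate-style connection entries in openkm.properties would look roughly like this, and they must be identical on every node:
hibernate.dialect=org.hibernate.dialect.MySQL5Dialect
hibernate.url=jdbc:mysql://dbserver:3306/okmdb?autoReconnect=true
hibernate.username=openkm
hibernate.password=secret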
If you look into the TOMCAT_HOME/repository directory, you can see the following:
- cache: These are generated files used primarily for preview. If deleted, they will be recreated.
- datastore: These files are the binary content of the documents stored in OpenKM. Take care of them; a backup is always recommended.
- index: This is where Lucene stores the indexes used for document and metadata search. If deleted, you can re-index the repository from Administration > Utilities.
Given this, you can place the cache and datastore folders on another server as shared resources. On average, 80% of user actions are document reads and only 20% are writes (creating or updating documents). The problematic folder is index, because Lucene indexes can't be shared between several OpenKM instances. So, what can we do? The solution is for each OpenKM instance to keep its own index and for the instances to exchange messages that keep their local indexes updated: when a node updates its index, it also sends a message to the other nodes, which apply the modification locally.
Proxy node
This is the only node that should be accessible to users; it dispatches requests among the slave nodes. The most important part is the HAProxy configuration. The following configuration works with HAProxy v1.5:
frontend LB
    bind 192.168.0.220:80
    reqadd X-Forwarded-Proto:\ http
    default_backend LB

backend LB
    mode http
    stats enable
    stats hide-version
    stats uri /stats
    stats realm HAProxy\ Statistics
    stats auth admin:admin
    balance roundrobin
    option httpchk
    option httpclose
    option forwardfor
    cookie LB insert
    server node1 192.168.0.222:8080 cookie node1 check
    server node2 192.168.0.223:8080 cookie node2 check
According to this configuration, user requests will be balanced between the two OpenKM servers using the round-robin algorithm. If one of the OpenKM servers fails, requests will be forwarded only to the working server. At http://proxy/stats (default user: admin, password: admin), you can see the nodes' status and other statistics. The complete HAProxy manual is available at http://cbonte.github.io/haproxy-dconv/configuration-1.5.html; if you use another HAProxy version, some options may have changed.
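If you prefer to check the backends from the command line, the same statistics are available in CSV form by appending ;csv to the stats URI (the proxy host name and the admin credentials are the defaults from the configuration above):
$ curl -su admin:admin 'http://proxy/stats;csv' | grep -E 'node1|node2'
The status column of each matching line shows whether the node is UP or DOWN.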
Master node
This kind of node is not very different from the slaves; the only real difference is that it can run crontab tasks, among other administrative actions. As we said previously, to simplify the configuration, this master node will also store the cache and datastore files.
In this example, we will also use this server as an NFS server, but it's common to have a SAN that acts as the NFS server.
Let's create these directories:
$ sudo mkdir /mnt/okm_cache
$ sudo mkdir /mnt/okm_dstore
$ sudo mkdir /mnt/okm_extraction
$ sudo mkdir /mnt/okm_ocr_template
$ sudo chown openkm:openkm /mnt/okm_*
We are exposing these directories via NFS, so we need to install the NFS server package:
$ sudo apt-get install nfs-kernel-server
Once installed, edit the /etc/exports file and add these lines:
/mnt/okm_cache *(rw,sync,no_root_squash,no_subtree_check)
/mnt/okm_dstore *(rw,sync,no_root_squash,no_subtree_check)
/mnt/okm_extraction *(rw,sync,no_root_squash,no_subtree_check)
/mnt/okm_ocr_template *(rw,sync,no_root_squash,no_subtree_check)
After saving it, you have to run this command (you should run it every time you modify this file):
$ sudo exportfs -ra
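Before moving on, you can verify that the four directories are actually being exported; the following should list all the /mnt/okm_* paths:
$ showmount -e localhost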
Now let's make the symbolic links:
$ sudo ln -s /mnt/okm_cache TOMCAT_HOME/repository/cache
$ sudo ln -s /mnt/okm_dstore TOMCAT_HOME/repository/datastore
$ sudo ln -s /mnt/okm_extraction TOMCAT_HOME/repository/extraction
$ sudo ln -s /mnt/okm_ocr_template TOMCAT_HOME/repository/ocr_template
$ sudo chown -h openkm:openkm TOMCAT_HOME/repository/cache
$ sudo chown -h openkm:openkm TOMCAT_HOME/repository/datastore
$ sudo chown -h openkm:openkm TOMCAT_HOME/repository/extraction
$ sudo chown -h openkm:openkm TOMCAT_HOME/repository/ocr_template
Instead of making the symbolic links, you can configure OpenKM to use other locations for these directories by setting the following properties:
repository.extraction.home=/mnt/okm_extraction
repository.datastore.home=/mnt/okm_dstore
repository.cache.home=/mnt/okm_cache
repository.ocr.template.home=/mnt/okm_ocr_template
To enable cluster mode, edit the TOMCAT_HOME/openkm.properties file and uncomment this part:
# Cluster - Master
cluster.enabled=true
cluster.node=master
Slave nodes
These nodes can write to the cache and datastore folders. Here we also need to create the directories that will be used as mount points for the shared resources:
$ sudo mkdir /mnt/okm_cache
$ sudo mkdir /mnt/okm_dstore
$ sudo mkdir /mnt/okm_extraction
$ sudo mkdir /mnt/okm_ocr_template
$ sudo chown openkm:openkm /mnt/okm_*
Install NFS client support:
$ sudo apt-get install nfs-common
We want them to be mounted automatically, so edit the /etc/fstab file and append:
# Cluster
master:/mnt/okm_cache /mnt/okm_cache nfs auto,noatime,rsize=8192,wsize=8192,timeo=14,intr
master:/mnt/okm_dstore /mnt/okm_dstore nfs auto,noatime,rsize=8192,wsize=8192,timeo=14,intr
master:/mnt/okm_extraction /mnt/okm_extraction nfs auto,noatime,rsize=8192,wsize=8192,timeo=14,intr
master:/mnt/okm_ocr_template /mnt/okm_ocr_template nfs auto,noatime,rsize=8192,wsize=8192,timeo=14,intr
Edit the /etc/hosts file so that the "master" server name resolves to the correct IP address.
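For example, assuming the master/NFS server's address is 192.168.0.221 (a hypothetical value; use the real address of your master), the entry would be:
192.168.0.221 master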
Now let's mount them:
$ sudo mount -a
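You can check that the four shares are actually mounted with:
$ df -h | grep okm_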
Let's make the symbolic links:
$ sudo ln -s /mnt/okm_cache TOMCAT_HOME/repository/cache
$ sudo ln -s /mnt/okm_dstore TOMCAT_HOME/repository/datastore
$ sudo ln -s /mnt/okm_extraction TOMCAT_HOME/repository/extraction
$ sudo ln -s /mnt/okm_ocr_template TOMCAT_HOME/repository/ocr_template
$ sudo chown -h openkm:openkm TOMCAT_HOME/repository/cache
$ sudo chown -h openkm:openkm TOMCAT_HOME/repository/datastore
$ sudo chown -h openkm:openkm TOMCAT_HOME/repository/extraction
$ sudo chown -h openkm:openkm TOMCAT_HOME/repository/ocr_template
Instead of making the symbolic links, you can configure OpenKM to use other locations for these directories by setting the following properties:
repository.extraction.home=/mnt/okm_extraction
repository.datastore.home=/mnt/okm_dstore
repository.cache.home=/mnt/okm_cache
repository.ocr.template.home=/mnt/okm_ocr_template
To enable cluster mode, edit the TOMCAT_HOME/openkm.properties file and uncomment this part:
# Cluster - Slave
cluster.enabled=true
cluster.node=slave
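Whichever approach you use (symbolic links or the repository.*.home properties), restart Tomcat on each node after editing openkm.properties so the cluster settings take effect. A typical restart would be:
$ sudo TOMCAT_HOME/bin/shutdown.sh
$ sudo TOMCAT_HOME/bin/startup.sh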