Sharding configuration
With sharding, OpenKM can distribute the repository files across different locations. This is useful when the main disk is full: new documents can then be stored on another disk.
The easiest way to understand sharding is to think of each shard as a different disk.
All documents are stored in the main shard until a new shard is created and set as the current shard. Shards can be managed from the Administration area, where these options are available:
- Create a new shard
- Set a shard as current
Deleting a shard is not possible, because it is dangerous and could cause data loss.
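The rules above can be modeled in a few lines. The following is a minimal Python sketch, not OpenKM code: the `ShardRegistry` class and its methods are hypothetical, shards are numbered from 1, and it assumes that creating a shard does not make it current until it is explicitly set as current.

```python
from dataclasses import dataclass, field


@dataclass
class ShardRegistry:
    """Hypothetical model of the documented shard rules (not OpenKM's API)."""

    shards: list[int] = field(default_factory=lambda: [1])  # shard 1 = main shard
    current: int = 1
    placement: dict[str, int] = field(default_factory=dict)  # document -> shard

    def create_shard(self) -> int:
        # Assumption: a new shard is only created here, not made current.
        new_id = max(self.shards) + 1
        self.shards.append(new_id)
        return new_id

    def set_current(self, shard_id: int) -> None:
        if shard_id not in self.shards:
            raise ValueError(f"unknown shard {shard_id}")
        self.current = shard_id

    def delete_shard(self, shard_id: int) -> None:
        # Mirrors the documented restriction: shards cannot be deleted.
        raise PermissionError("deleting a shard is not supported")

    def store(self, doc_name: str) -> int:
        # New documents always go to the shard that is current at upload time.
        self.placement[doc_name] = self.current
        return self.current


registry = ShardRegistry()
registry.store("contract.pdf")       # main shard
second = registry.create_shard()
registry.store("invoice.pdf")        # still the main shard
registry.set_current(second)
registry.store("report.pdf")         # now the second shard
print(registry.placement)  # {'contract.pdf': 1, 'invoice.pdf': 1, 'report.pdf': 2}
```

Documents already stored keep their shard; only uploads made after switching the current shard land in the new one.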
A typical use case is to place the main shard on a fast but expensive medium such as an SSD, and a second shard on slower, cheaper HDDs. A process that identifies less-accessed documents moves them from the fast main shard to the slower second one. This way, the fast shard holds the frequently used documents, which also improves overall system performance.
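The migration process is not described in detail here. As a rough illustration only, the sketch below assumes that "less accessed" means "not accessed within a given period" and moves those documents from the fast shard to the slow one. `Document`, `select_cold_documents` and `migrate` are hypothetical names, and the real process may use different criteria, such as access counts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Document:
    name: str
    shard: int
    last_access: datetime


def select_cold_documents(docs, fast_shard, max_idle, now):
    """Documents on the fast shard not accessed within max_idle, oldest first."""
    cutoff = now - max_idle
    cold = [d for d in docs if d.shard == fast_shard and d.last_access < cutoff]
    return sorted(cold, key=lambda d: d.last_access)


def migrate(docs, fast_shard, slow_shard, max_idle, now):
    moved = select_cold_documents(docs, fast_shard, max_idle, now)
    for doc in moved:
        # Only the shard assignment changes here; a real process would also
        # copy the binary content to the slow shard's datastore.
        doc.shard = slow_shard
    return [d.name for d in moved]


now = datetime(2024, 1, 31)
docs = [
    Document("daily-report.pdf", 1, datetime(2024, 1, 30)),
    Document("archive-2019.pdf", 1, datetime(2023, 6, 1)),
    Document("old-contract.pdf", 1, datetime(2023, 11, 15)),
]
print(migrate(docs, 1, 2, timedelta(days=30), now))
# ['archive-2019.pdf', 'old-contract.pdf']
```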
By default, there is only one shard, and specific configuration is needed to use this feature.
To configure sharding, use these configuration properties:
repository.cache.home=${repository.home}/${shard.id}/${tenant.id}/cache
repository.datastore.home=${repository.home}/${shard.id}/${tenant.id}/datastore
repository.extraction.home=${repository.home}/${shard.id}/${tenant.id}/extraction
The key is the ${shard.id} variable, which OpenKM replaces with the currently enabled shard (${tenant.id} is likewise replaced with the current tenant). So, if the current shard is #1 and the tenant is #2, the datastore location will be ${repository.home}/shard_1/tenant_2/datastore.
Currently, the cache, datastore and extraction directories can be split into shards.
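The substitution can be pictured as plain placeholder expansion. This minimal sketch is an illustration, not OpenKM's implementation: `expand` and `shard_variables` are hypothetical helpers, and the shard_N / tenant_N naming is assumed from the sample directory layout.

```python
import re

# Matches ${name} placeholders such as ${shard.id}.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand(template, variables):
    """Replace every ${name} in template with variables[name]."""
    def replace(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value for placeholder {name!r}")
        return variables[name]

    return PLACEHOLDER.sub(replace, template)


def shard_variables(repository_home, shard_number, tenant_number):
    # Assumption: directory names follow the shard_N / tenant_N pattern.
    return {
        "repository.home": repository_home,
        "shard.id": f"shard_{shard_number}",
        "tenant.id": f"tenant_{tenant_number}",
    }


template = "${repository.home}/${shard.id}/${tenant.id}/cache"
print(expand(template, shard_variables("/opt/tomcat/repository", 2, 1)))
# /opt/tomcat/repository/shard_2/tenant_1/cache
```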
This is a sample repository directory layout:
- index
- shard_1
- tenant_1
- cache
- datastore
- extraction
- tenant_1
- shard_2
- tenant_1
- cache
- datastore
- extraction
- tenant_1
- tenant_1
- ocr_template
Set the repository.cache.home, repository.datastore.home and repository.extraction.home properties before uploading any documents or emails to OpenKM; otherwise, downloading, previewing and searching documents and emails may not work correctly.
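A quick pre-flight check can catch a missing property before the first upload. The sketch below is hypothetical: it uses a deliberately minimal .properties reader (only key=value lines and "#" or "!" comments; no ":" separators or line continuations) and only checks that each of the three properties mentions ${shard.id}.

```python
REQUIRED = (
    "repository.cache.home",
    "repository.datastore.home",
    "repository.extraction.home",
)


def parse_properties(text):
    """Minimal reader: key=value lines; skips blank lines and #/! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_shard_properties(text):
    """Names of required properties that are absent or lack ${shard.id}."""
    props = parse_properties(text)
    return [key for key in REQUIRED if "${shard.id}" not in props.get(key, "")]


sample = """
# Extra
repository.cache.home=${repository.home}/${shard.id}/${tenant.id}/cache
repository.datastore.home=${repository.home}/${shard.id}/${tenant.id}/datastore
"""
print(missing_shard_properties(sample))  # ['repository.extraction.home']
```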
Sharding and Docker
If you deploy OpenKM with Docker, we recommend externalizing the openkm.properties configuration file so that sharding can be configured.
This is a sample Docker Compose definition for version 8.1.13 (change the image tag for other versions):
services:
  openkm_mysql:
    image: docker.openkm.com/private/professional:8.1.13
    container_name: openkm_mysql_shard
    hostname: openkm
    ports:
      - "8080:8080"
    volumes:
      - ${PWD}/openkm.properties:/opt/tomcat/openkm.properties
      - ${PWD}/repository:/opt/tomcat/repository
    environment:
      - TZ=Europe/Madrid
    entrypoint: [ "wait-for-it.sh", "mysql:3306", "--timeout=0", "--strict", "--", "entrypoint.sh" ]
    links:
      - mysql:mysql
    networks:
      skynet:
        ipv4_address: 172.28.1.1
  mysql:
    image: mysql:8.0.18
    container_name: okmdb_mysql_shard
    hostname: mysql
    command: --default-authentication-plugin=mysql_native_password --character-set-server=utf8 --collation-server=utf8_bin
    volumes:
      - ${PWD}/mysql:/var/lib/mysql
    environment:
      - MYSQL_DATABASE=okmdb
      - MYSQL_USER=openkm
      - MYSQL_PASSWORD=openkm
      - MYSQL_ROOT_PASSWORD=openkm
    security_opt:
      - seccomp:unconfined
    networks:
      skynet:
        ipv4_address: 172.28.1.2
volumes:
  openkm_mysql_shard:
  okmdb_mysql_shard:
networks:
  skynet:
    ipam:
      driver: default
      config:
        - subnet: 172.28.0.0/16
And this is the corresponding openkm.properties (shards with no tenants):
# OpenKM Hibernate configuration values
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.url=jdbc:mysql://mysql:3306/okmdb?useUnicode=true&characterEncoding=UTF8&serverTimezone=CET&nullNamePatternMatchesAll=true
spring.datasource.username=openkm
spring.datasource.password=openkm
# JPA stuff
spring.jpa.hibernate.ddl-auto=create-only
spring.jpa.properties.hibernate.dialect=com.openkm.db.dialect.MySQL5InnoDBDialect
# Logging config
spring.output.ansi.enabled=always
# Extra
repository.datastore.home=/opt/tomcat/repository/${shard.id}/datastore
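With this setup, the sharded datastore lives inside the bind-mounted repository directory on the host. The sketch below works out the host directory for a given shard; the /srv/openkm value standing in for ${PWD} is an assumed example, `datastore_on_host` and `to_host_path` are hypothetical helpers, and ${shard.id} is again assumed to expand to shard_N.

```python
from pathlib import PurePosixPath

# Values taken from the Docker Compose and openkm.properties samples.
CONTAINER_REPOSITORY = "/opt/tomcat/repository"  # container side of the bind mount
DATASTORE_TEMPLATE = "/opt/tomcat/repository/${shard.id}/datastore"


def to_host_path(container_path, pwd):
    """Translate a container path through ${PWD}/repository:/opt/tomcat/repository."""
    relative = PurePosixPath(container_path).relative_to(CONTAINER_REPOSITORY)
    return str(PurePosixPath(pwd) / "repository" / relative)


def datastore_on_host(shard_number, pwd):
    # Assumption: ${shard.id} expands to "shard_<n>".
    container_path = DATASTORE_TEMPLATE.replace("${shard.id}", f"shard_{shard_number}")
    return to_host_path(container_path, pwd)


print(datastore_on_host(2, "/srv/openkm"))  # /srv/openkm/repository/shard_2/datastore
```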