Sharding configuration

With sharding, OpenKM can distribute the repository files into different locations. This feature can help in case the main disk is full, so new documents will be placed on another disk.

The best way to understand sharding is to think that each shard is a different disk.

All documents will be stored in the main shard until a new shard is created and set as the current shard. Shards can be managed from administration and these options are available:

  • Create a new shard
  • Set a shard as current

Shards cannot be deleted, because doing so is dangerous and may cause data loss.

A use case of sharding is to have a fast medium like an SSD (which is also expensive) in the main shard, and other slower and cheaper HDDs in the second shard. Documents will be moved from the fast main shard to the slower second one by a process which can identify less-accessed documents. This way the fast shard will be filled with frequently used documents, which will also improve the overall system performance.
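The text does not describe how the process identifies less-accessed documents. As a rough illustration only, the selection step could be sketched as below. The function name, the 90-day threshold, and the shape of the input (a mapping from document ID to last access time) are all assumptions made for this example and are not part of OpenKM.

```python
from datetime import datetime, timedelta


def select_cold_documents(last_access, now, max_idle_days=90):
    """Return IDs of documents not accessed within max_idle_days, oldest first.

    last_access: dict mapping a document ID to its last access datetime
                 (hypothetical input, not an OpenKM data structure).
    """
    cutoff = now - timedelta(days=max_idle_days)
    # Keep only documents whose last access is older than the cutoff.
    cold = [(ts, doc_id) for doc_id, ts in last_access.items() if ts < cutoff]
    # Sort by timestamp so the least recently used documents come first.
    return [doc_id for ts, doc_id in sorted(cold)]


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    last_access = {
        "contract.pdf": datetime(2023, 12, 20),
        "report_2022.docx": datetime(2023, 6, 1),
        "old_invoice.pdf": datetime(2023, 1, 1),
    }
    # Candidates to move from the fast shard to the slower one.
    print(select_cold_documents(last_access, now))
```

Documents returned by such a selection would be the candidates to move to the slower shard, while recently accessed ones stay on the fast medium.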

By default, there is only one shard and a specific configuration is needed to make use of this new feature. 

To configure sharding, use these configuration properties:

repository.cache.home=${repository.home}/${shard.id}/${tenant.id}/cache
repository.datastore.home=${repository.home}/${shard.id}/${tenant.id}/datastore
repository.extraction.home=${repository.home}/${shard.id}/${tenant.id}/extraction

The key is the "shard.id" variable, which OpenKM replaces with the currently enabled shard. So, if you are in shard #1 and tenant #1, the datastore location will be ${repository.home}/shard_1/tenant_1/datastore.
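The substitution can be illustrated with a small sketch. This is not OpenKM code: the resolve function is hypothetical, /opt/tomcat/repository is only an assumed value for repository.home, and the shard_1 / tenant_1 values follow the directory naming used on this page.

```python
import re

# Matches placeholders of the form ${name}, where name may contain dots.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(value, variables):
    """Replace each ${name} in value with variables[name].

    Placeholders with no matching variable are left untouched.
    """
    return PLACEHOLDER.sub(lambda m: variables.get(m.group(1), m.group(0)), value)


if __name__ == "__main__":
    template = "${repository.home}/${shard.id}/${tenant.id}/datastore"
    variables = {
        "repository.home": "/opt/tomcat/repository",  # assumed value
        "shard.id": "shard_1",
        "tenant.id": "tenant_1",
    }
    print(resolve(template, variables))

    # Setting another shard as current only changes the shard.id value.
    variables["shard.id"] = "shard_2"
    print(resolve(template, variables))
```

Because only the shard.id value changes when another shard becomes current, new files land under a different directory, which can be a mount point for a different disk.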

Currently cache, datastore and extraction can be divided into shards.

This is a sample repository directory distribution:

  • index
  • shard_1
    • tenant_1
      • cache
      • datastore
      • extraction
  • shard_2
    • tenant_1
      • cache
      • datastore
      • extraction
  • tenant_1
    • ocr_template

The repository.cache.home, repository.datastore.home and repository.extraction.home configuration properties should be set before uploading any documents or emails to OpenKM, otherwise you will have problems with document and email download, preview, and search.
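Before the first upload, it may help to check which of these properties in openkm.properties actually contain the shard.id variable. The following is a minimal sketch, not an OpenKM tool: the function names are hypothetical, and the parser is deliberately simplified (it handles only key=value lines and comments, not line continuations).

```python
SHARD_PROPERTIES = (
    "repository.cache.home",
    "repository.datastore.home",
    "repository.extraction.home",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def properties_without_shard(text):
    """Return the shard-related properties that are missing or lack ${shard.id}."""
    props = parse_properties(text)
    return [k for k in SHARD_PROPERTIES if "${shard.id}" not in props.get(k, "")]


if __name__ == "__main__":
    sample = (
        "# Extra\n"
        "repository.datastore.home=/opt/tomcat/repository/${shard.id}/datastore\n"
    )
    # Lists the properties that would not be sharded with this file.
    print(properties_without_shard(sample))
```

An empty result means all three locations depend on the current shard. A non-empty result is only a report: the listed properties are either not set in the file or do not use the shard.id variable.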

Sharding and Docker

If you are using Docker to deploy OpenKM, we recommend externalizing the openkm.properties configuration file to be able to configure sharding.

This is a sample Docker Compose definition for version 8.1.13 (change the version value for other Docker builds):

services:

  openkm_mysql:
    image: docker.openkm.com/private/professional:8.1.13
    container_name: openkm_mysql_shard
    hostname: openkm
    ports:
      - 8080:8080
    volumes:
      - ${PWD}/openkm.properties:/opt/tomcat/openkm.properties
      - ${PWD}/repository:/opt/tomcat/repository
    environment:
      - TZ=Europe/Madrid
    entrypoint: [ "wait-for-it.sh", "mysql:3306", "--timeout=0", "--strict", "--", "entrypoint.sh" ]
    links:
      - mysql:mysql
    networks:
      skynet:
        ipv4_address: 172.28.1.1

  mysql:
    image: mysql:8.0.18
    container_name: okmdb_mysql_shard
    hostname: mysql
    command: --default-authentication-plugin=mysql_native_password --character-set-server=utf8 --collation-server=utf8_bin
    volumes:
      - ${PWD}/mysql:/var/lib/mysql
    environment:
      - MYSQL_DATABASE=okmdb
      - MYSQL_USER=openkm
      - MYSQL_PASSWORD=openkm
      - MYSQL_ROOT_PASSWORD=openkm
    security_opt:
      - seccomp:unconfined
    networks:
      skynet:
        ipv4_address: 172.28.1.2

volumes:
  openkm_mysql_shard:
  okmdb_mysql_shard:

networks:
  skynet:
    ipam:
      driver: default
      config:
        - subnet: 172.28.0.0/16

And this is the corresponding openkm.properties (shards with no tenants):

# OpenKM Hibernate configuration values
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.url=jdbc:mysql://mysql:3306/okmdb?useUnicode=true&characterEncoding=UTF8&serverTimezone=CET&nullNamePatternMatchesAll=true
spring.datasource.username=openkm
spring.datasource.password=openkm

# JPA stuff
spring.jpa.hibernate.ddl-auto=create-only
spring.jpa.properties.hibernate.dialect=com.openkm.db.dialect.MySQL5InnoDBDialect

# Logging config
spring.output.ansi.enabled=always

# Extra
repository.datastore.home=/opt/tomcat/repository/${shard.id}/datastore

 
