Setup

Step 1 - Deploy

Although keas (the keyphrase extraction and summarization service) can be deployed into any Tomcat, we suggest deploying it into the Tomcat that comes with OpenKM.

Step 2 - Configure vocabulary sample

We'll use AGROVOC for testing purposes. You can download it from http://oaei.ontologymatching.org/2007/environment/ (please read the terms of use).

  • Download vocabulary-sample.zip, which contains the sample files.
  • Unzip the file into $TOMCAT_HOME; a folder named "kea" will be created.

Description of the files in vocabulary-sample.zip:

  • The keas/vocabulary/ag_skos_20070219.rdf is a thesaurus SKOS file.
  • The keas/vocabulary/agrovoc_oaei2007.owl is a thesaurus file.
  • The keas/vocabulary/agrovoc.rdf is a thesaurus file.
  • The keas/vocabulary/stopwords_en.txt is a stop words file.
  • The keas/vocabulary/ag_skos_20070219.model is a training model.
  • The keas/model is an empty folder where new models will be saved.
  • The keas/training contains the pairs of .txt and .key files used for generating the model.
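The unzip step and the file list above can be checked with a short script. This is only a sketch: the function names are hypothetical, and the relative paths are copied from the list above, so adjust the root folder name if your archive extracts to a different one.

```python
import zipfile
from pathlib import Path

# Relative paths copied from the file list above.
EXPECTED_FILES = [
    "keas/vocabulary/ag_skos_20070219.rdf",
    "keas/vocabulary/agrovoc_oaei2007.owl",
    "keas/vocabulary/agrovoc.rdf",
    "keas/vocabulary/stopwords_en.txt",
    "keas/vocabulary/ag_skos_20070219.model",
]
EXPECTED_DIRS = ["keas/model", "keas/training"]


def extract_sample(zip_path, tomcat_home):
    """Unzip the vocabulary sample archive into tomcat_home."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(tomcat_home)


def missing_entries(tomcat_home):
    """Return the expected files and folders that are not present."""
    home = Path(tomcat_home)
    missing = [p for p in EXPECTED_FILES if not (home / p).is_file()]
    missing += [p for p in EXPECTED_DIRS if not (home / p).is_dir()]
    return missing
```

Calling extract_sample("vocabulary-sample.zip", tomcat_home) followed by missing_entries(tomcat_home) should return an empty list when the sample is complete.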

Step 3 - Create configuration file

Create a file named keas.properties in $TOMCAT_HOME.

Minimum configuration parameters sample

# OpenKM
openkm.url=https://localhost:8080/OpenKM
base.openkm.url=https://localhost:8080

# OpenKM admin user
admin.user=okmAdmin
admin.password=admin
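Before starting Tomcat, it can help to confirm that keas.properties contains the four minimum parameters shown above. The following is a minimal sketch with hypothetical helper names; the parser is deliberately simplified (no line continuations or escape handling) and is not the loader the application itself uses.

```python
# Keys taken from the minimum configuration sample above.
REQUIRED_KEYS = ["openkm.url", "base.openkm.url", "admin.user", "admin.password"]


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


def missing_required(props):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not props.get(key)]
```

For example, missing_required(parse_properties(open("keas.properties").read())) lists any minimum parameter that still needs to be set.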

Available configuration parameters

Field / Property, Type, Description

kea.summarization.thesaurus.skos.file
  Type: String
  Description: Location of the thesaurus SKOS file in the file system.
    ${catalina.home}/kea/vocabulary/ag_skos_20070219.rdf

kea.summarization.thesaurus.vocabulary.serql
  Type: String
  Description: The SeRQL query used to retrieve the thesaurus vocabulary.
    SELECT X,UID FROM {X} skos:prefLabel {UID} WHERE lang(UID) =\"en\" USING NAMESPACE rdf=<http://www.w3.org/1999/02/22-rdf-syntax-ns#>, skos=<http://www.w3.org/2004/02/skos/core#>,rdfs=<http://www.w3.org/2000/01/rdf-schema#>,dc=<http://purl.org/dc/elements/1.1/>, dcterms=<http://purl.org/dc/terms/>, foaf=<http://xmlns.com/foaf/0.1/>

kea.summarization.model.file
  Type: String
  Description: Training model.
    ${catalina.home}/kea/vocabulary/ag_skos_20070219.model

kea.summarization.stopwords.file
  Type: String
  Description: Stop words file.
    ${catalina.home}/kea/vocabulary/stopwords_en.txt

kea.summarization.automatic.keyword.extraction.number
  Type: Integer
  Description: Number of keywords to extract.
    10

kea.summarization.automatic.keyword.extraction.restriction
  Type: String
  Description: Available values are "on" and "off".
    off

kea.summarization.thesaurus.owl.file
  Type: String
  Description: Thesaurus file.
    ${catalina.home}/kea/vocabulary/agrovoc_oaei2007.owl

kea.summarization.thesaurus.base.url
  Type: String
  Description: Thesaurus base URL.
    http://www.fao.org/aos/agrovoc

kea.summarization.thesaurus.tree.root
  Type: String
  Description: The SeRQL query used to retrieve the root nodes of the thesaurus. The query below retrieves all the nodes that have no parent; these are the root nodes.
    SELECT DISTINCT UID, TEXT FROM {UID} Y {OBJECT}, {UID} rdfs:label {TEXT} ; [rdfs:subClassOf {CLAZZ}] where not bound(CLAZZ) and lang(TEXT)=\"en\" USING NAMESPACE foaf=<http://xmlns.com/foaf/0.1/>, dcterms=<http://purl.org/dc/terms/>, rdf=<http://www.w3.org/1999/02/22-rdf-syntax-ns#>, owl=<http://www.w3.org/2002/07/owl#>, rdfs=<http://www.w3.org/2000/01/rdf-schema#>, skos=<http://www.w3.org/2004/02/skos/core#>, dc=<http://purl.org/dc/elements/1.1/>

kea.summarization.thesaurus.tree.childs
  Type: String
  Description: The SeRQL query used to retrieve the child nodes of a given node in the thesaurus.
    SELECT DISTINCT UID, TEXT FROM {UID} rdfs:subClassOf {CLAZZ}, {UID} rdfs:label {TEXT} where xsd:string(CLAZZ) = \"RDFparentID\" and lang(TEXT)=\"en\" USING NAMESPACE foaf=<http://xmlns.com/foaf/0.1/>, dcterms=<http://purl.org/dc/terms/>, rdf=<http://www.w3.org/1999/02/22-rdf-syntax-ns#>, owl=<http://www.w3.org/2002/07/owl#>, rdfs=<http://www.w3.org/2000/01/rdf-schema#>, skos=<http://www.w3.org/2004/02/skos/core#>, dc=<http://purl.org/dc/elements/1.1/>

kea.summarization.vocabulary.type
  Type: String
  Description: The type of the vocabulary.
    skos

kea.summarization.stemmer.class
  Type: String
  Description: The stemmer class used.
    com.openkm.kea.stemmers.PorterStemmer

kea.summarization.stopword.class
  Type: String
  Description: The stop word class used.
    com.openkm.kea.stopwords.StopwordsEnglish

kea.summarization.language
  Type: String
  Description: The language code used. See the ISO 639-1 language codes.
    en

kea.summarization.document.encoding
  Type: String
  Description: The encoding of the training files.
    UTF-8

application.test.url
  Type: String
  Description: URL that will be used for testing purposes.
    http://localhost:8080/keas
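Several of the parameters above point at files under ${catalina.home}. The sketch below resolves that placeholder and reports file parameters whose target does not exist. It rests on two assumptions: that ${catalina.home} stands for the Tomcat installation directory, and that every property whose name ends in ".file" holds a file path. The function names are hypothetical.

```python
from pathlib import Path


def expand_catalina_home(value, catalina_home):
    """Replace the ${catalina.home} placeholder with a concrete directory."""
    return value.replace("${catalina.home}", str(catalina_home))


def missing_files(props, catalina_home):
    """Return {property: resolved path} for *.file properties with no file on disk."""
    missing = {}
    for key, value in props.items():
        if key.endswith(".file"):  # assumption: these keys hold file paths
            path = Path(expand_catalina_home(value, catalina_home))
            if not path.is_file():
                missing[key] = str(path)
    return missing
```

An empty result from missing_files means every configured thesaurus, model, and stop words file was found at its resolved location.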

Step 4 - Check the application