Setup
Step 1 - Deploy
Although KEAS (the keyphrase extraction summarization service) can be deployed into any Tomcat, we suggest deploying it into the one that comes with OpenKM.
- Stop the OpenKM application.
- Download the latest keas-X.X.zip file from the keyphrase-extraction-summarization-service GitHub project.
- Unzip the file and copy the keas.war file into the Tomcat folder named webapps.
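The unzip-and-copy step above can be scripted. The following is a minimal sketch, assuming the archive contains a file named keas.war somewhere inside it and that the caller supplies the Tomcat root; the function name deploy_war is illustrative and is not part of KEAS or OpenKM.

```python
import shutil
import zipfile
from pathlib import Path


def deploy_war(zip_path, tomcat_home, war_name="keas.war"):
    """Extract war_name from zip_path and copy it into <tomcat_home>/webapps.

    Hypothetical helper: the archive layout is an assumption, so the WAR
    is located by file name wherever it sits inside the zip.
    """
    webapps = Path(tomcat_home) / "webapps"
    if not webapps.is_dir():
        raise FileNotFoundError(f"no webapps folder under {tomcat_home}")

    with zipfile.ZipFile(zip_path) as archive:
        # Keep file entries (not directories) whose base name matches.
        matches = [
            name for name in archive.namelist()
            if not name.endswith("/") and Path(name).name == war_name
        ]
        if not matches:
            raise FileNotFoundError(f"{war_name} not found in {zip_path}")
        target = webapps / war_name
        # Stream the entry to its destination, dropping any directory
        # prefix it had inside the archive.
        with archive.open(matches[0]) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return target
```

Run it while OpenKM is stopped, for example `deploy_war("keas-X.X.zip", "/path/to/tomcat")`, replacing both arguments with the real file name and Tomcat location.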
Step 2 - Configure vocabulary sample
We'll use AGROVOC for testing purposes. You can download it from http://oaei.ontologymatching.org/2007/environment/ (please read the terms of use).
- Download the sample files, vocabulary-sample.zip.
- Unzip the file into $TOMCAT_HOME; a folder named "kea" will be created.
Description of the files in vocabulary-sample.zip:
- The keas/vocabulary/ag_skos_20070219.rdf is a SKOS thesaurus file.
- The keas/vocabulary/agrovoc_oaei2007.owl is a thesaurus file.
- The keas/vocabulary/agrovoc.rdf is a thesaurus file.
- The keas/vocabulary/stopwords_en.txt is a stop words file.
- The keas/vocabulary/ag_skos_20070219.model is a training model.
- The keas/model is an empty folder where new models will be saved.
- The keas/training contains the pairs of .txt and .key files used for generating the model.
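After unzipping, it can be useful to confirm that the entries listed above are actually present. The sketch below is one way to do that; the relative paths come from the list above, while the function name missing_entries and the idea of passing the extracted folder as an argument are assumptions made for illustration.

```python
from pathlib import Path

# Entries taken from the file list above, relative to the extracted folder.
EXPECTED_FILES = [
    "vocabulary/ag_skos_20070219.rdf",
    "vocabulary/agrovoc_oaei2007.owl",
    "vocabulary/agrovoc.rdf",
    "vocabulary/stopwords_en.txt",
    "vocabulary/ag_skos_20070219.model",
]
EXPECTED_DIRS = ["model", "training"]


def missing_entries(base_dir):
    """Return the expected files and folders that are absent under base_dir."""
    base = Path(base_dir)
    missing = [f for f in EXPECTED_FILES if not (base / f).is_file()]
    missing += [d for d in EXPECTED_DIRS if not (base / d).is_dir()]
    return missing
```

An empty list means every listed entry was found under the folder passed in.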
Step 3 - Create configuration file
Create a file named keas.properties in $TOMCAT_HOME.
Minimum configuration parameters sample
# OpenKM
openkm.url=https://localhost:8080/OpenKM
base.openkm.url=https://localhost:8080
# OpenKM admin user
admin.user=okmAdmin
admin.password=admin
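Before starting Tomcat, the file can be checked for the four minimum parameters shown above. This is a minimal sketch: the parser handles only plain key=value lines and comment lines, not the full Java properties syntax (no line continuations or escapes), and the names parse_properties and missing_required are hypothetical.

```python
# The four keys from the minimum configuration sample above.
REQUIRED_KEYS = ("openkm.url", "base.openkm.url", "admin.user", "admin.password")


def parse_properties(text):
    """Parse simple key=value lines; lines starting with '#' or '!' are comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_required(props):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not props.get(key)]
```

For example, `missing_required(parse_properties(open("keas.properties").read()))` returns an empty list when all four keys have a value.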
Available configuration parameters
Field / Property | Type | Description |
---|---|---|
kea.summarization.thesaurus.skos.file | String | Location of the thesaurus SKOS file in the file system. Example: ${catalina.home}/kea/vocabulary/ag_skos_20070219.rdf |
kea.summarization.thesaurus.vocabulary.serql | String | The SeRQL query used to retrieve the thesaurus vocabulary. Example: SELECT X,UID FROM {X} skos:prefLabel {UID} WHERE lang(UID) =\"en\" USING NAMESPACE rdf=<http://www.w3.org/1999/02/22-rdf-syntax-ns#>, skos=<http://www.w3.org/2004/02/skos/core#>,rdfs=<http://www.w3.org/2000/01/rdf-schema#>,dc=<http://purl.org/dc/elements/1.1/>, dcterms=<http://purl.org/dc/terms/>, foaf=<http://xmlns.com/foaf/0.1/> |
kea.summarization.model.file | String | Training model. Example: ${catalina.home}/kea/vocabulary/ag_skos_20070219.model |
kea.summarization.stopwords.file | String | Stop words file. Example: ${catalina.home}/kea/vocabulary/stopwords_en.txt |
kea.summarization.automatic.keyword.extraction.number | Integer | Number of keywords to extract. Example: 10 |
kea.summarization.automatic.keyword.extraction.restriction | String | Available values are "on" and "off". Example: off |
kea.summarization.thesaurus.owl.file | String | Thesaurus file. Example: ${catalina.home}/kea/vocabulary/agrovoc_oaei2007.owl |
kea.summarization.thesaurus.base.url | String | Thesaurus base URL. Example: http://www.fao.org/aos/agrovoc |
kea.summarization.thesaurus.tree.root | String | The SeRQL query used to retrieve the root nodes of the thesaurus. The following query retrieves all nodes that have no parent; these are the root nodes. Example: SELECT DISTINCT UID, TEXT FROM {UID} Y {OBJECT}, {UID} rdfs:label {TEXT} ; [rdfs:subClassOf {CLAZZ}] where not bound(CLAZZ) and lang(TEXT)=\"en\" USING NAMESPACE foaf=<http://xmlns.com/foaf/0.1/>, dcterms=<http://purl.org/dc/terms/>, rdf=<http://www.w3.org/1999/02/22-rdf-syntax-ns#>, owl=<http://www.w3.org/2002/07/owl#>, rdfs=<http://www.w3.org/2000/01/rdf-schema#>, skos=<http://www.w3.org/2004/02/skos/core#>, dc=<http://purl.org/dc/elements/1.1/> |
kea.summarization.thesaurus.tree.childs | String | The SeRQL query used to retrieve the child nodes of a given node in the thesaurus. Example: SELECT DISTINCT UID, TEXT FROM {UID} rdfs:subClassOf {CLAZZ}, {UID} rdfs:label {TEXT} where xsd:string(CLAZZ) = \"RDFparentID\" and lang(TEXT)=\"en\" USING NAMESPACE foaf=<http://xmlns.com/foaf/0.1/>, dcterms=<http://purl.org/dc/terms/>, rdf=<http://www.w3.org/1999/02/22-rdf-syntax-ns#>, owl=<http://www.w3.org/2002/07/owl#>, rdfs=<http://www.w3.org/2000/01/rdf-schema#>, skos=<http://www.w3.org/2004/02/skos/core#>, dc=<http://purl.org/dc/elements/1.1/> |
kea.summarization.vocabulary.type | String | The type of the vocabulary. Example: skos |
kea.summarization.stemmer.class | String | The stemmer class used. Example: com.openkm.kea.stemmers.PorterStemmer |
kea.summarization.stopword.class | String | The stop word class used. Example: com.openkm.kea.stopwords.StopwordsEnglish |
kea.summarization.language | String | The language code used; see the ISO 639-1 language codes. Example: en |
kea.summarization.document.encoding | String | The encoding of the training files. Example: UTF-8 |
application.test.url | String | URL that will be used for testing purposes. Example: http://localhost:8080/keas |
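Several of the parameters above point at files through the ${catalina.home} placeholder, so a wrong path is an easy mistake to make. The sketch below checks those paths before startup. It assumes that ${catalina.home} stands for the Tomcat installation directory and that a plain text substitution is a fair approximation of how the value is resolved; the function name unresolved_paths and the props dictionary (property names mapped to their raw values) are illustrative only.

```python
from pathlib import Path

# Properties from the table above whose values are file system paths.
PATH_KEYS = (
    "kea.summarization.thesaurus.skos.file",
    "kea.summarization.model.file",
    "kea.summarization.stopwords.file",
    "kea.summarization.thesaurus.owl.file",
)


def unresolved_paths(props, catalina_home):
    """Return {key: expanded_path} for path properties that point to no file."""
    problems = {}
    for key in PATH_KEYS:
        value = props.get(key)
        if value is None:
            continue  # property not set, so there is nothing to check
        # Assumption: the placeholder is replaced textually by the Tomcat root.
        expanded = value.replace("${catalina.home}", str(catalina_home))
        if not Path(expanded).is_file():
            problems[key] = expanded
    return problems
```

An empty result means every configured path property resolves to an existing file under the given Tomcat root.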
Step 4 - Check the application
- Start the OpenKM service.
- Open the URL http://localhost:8080/keas
- You can log in with an OpenKM user that has the ROLE_ADMIN grant.
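The URL check above can also be done from a script, which helps when the server has no browser. This is a minimal sketch using only the Python standard library; the function name http_status is hypothetical, and the default URL is the one given in this step.

```python
import urllib.error
import urllib.request


def http_status(url="http://localhost:8080/keas", timeout=5):
    """Return the HTTP status code for url, or None if no server answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered with an error status such as 404.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar.
        return None
```

As a rough guide, a 200 suggests the application is answering, a 404 suggests Tomcat is running but the application was not found at that path, and None suggests nothing is listening at that address. The script does not test the login itself.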