Working with Lucene queries

This information applies only to the Lucene query tab of the advanced search, not to the regular search. To enable the Lucene query tab, go to Administration > Profiles, choose the relevant profile and enable it at Components > Search > Tab Lucene query.

This feature lets you write complex Lucene queries, so you are not limited to the fields of the search form. Writing these queries requires some knowledge of the Lucene query syntax, so let's define a couple of concepts first:

  • Terms: A query is broken up into terms and operators. A term is a single word like "cat" or "dog". Multiple terms can be combined with boolean operators for a more specific query. For example:

    cat AND dog

  • Fields: When you perform a Lucene search, you can specify a field or use the default field. In OpenKM, the default field contains the extracted document text. You can search a field by writing the field name, followed by a colon ":" and a term. For example, this query will search for all nodes in the taxonomy whose name is "animals":

    context:okm_root AND name:animals

You can search for a phrase using quotes:

name:"big animals" 
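
As a minimal sketch, field clauses like the ones above can also be assembled programmatically. The helper name field_clause is hypothetical and is not part of OpenKM or Lucene; it only quotes values that contain whitespace so that they are treated as a phrase.

```python
def field_clause(field, value):
    """Build a 'field:value' clause, quoting multi-word values as a phrase."""
    # A value with whitespace must be quoted, otherwise only its first word
    # would be bound to the field.
    if any(ch.isspace() for ch in value):
        value = '"' + value + '"'
    return field + ":" + value


query = " AND ".join([
    field_clause("context", "okm_root"),
    field_clause("name", "big animals"),
])
print(query)
```

The resulting string can be pasted into the Lucene query tab as it is.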

Term modifiers

Lucene supports different term modifiers to create complex searches.

Wildcard searches

You can use single and multiple character wildcard searches within terms.

  • Single: To perform a single-character wildcard search, use the "?" symbol. For example, to search for "text" or "test":

    te?t

  • Multiple: To perform a multiple-character wildcard, use the "*" symbol. For example, to search for "test", "tests" or "tester":

    test*

You can't use a "*" or "?" as the first character of a search term.
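
To illustrate which words each wildcard accepts, the following sketch translates a wildcard term into a regular expression and tests it against a few plain Python strings. It only emulates the matching rules described above ("?" for exactly one character, "*" for zero or more); it is not how Lucene evaluates the query against its index, and the function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a Lucene-style wildcard term into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")      # exactly one character
        elif ch == "*":
            parts.append(".*")     # zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def wildcard_matches(pattern, word):
    # fullmatch: the whole word must fit the pattern, not just a prefix
    return wildcard_to_regex(pattern).fullmatch(word) is not None


words = ["text", "test", "tests", "tester", "toast"]
print([w for w in words if wildcard_matches("te?t", w)])
print([w for w in words if wildcard_matches("test*", w)])
```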

Proximity searches

Lucene also supports finding words that are within a specific distance of each other. To perform a proximity search, add the tilde "~" symbol followed by the maximum distance at the end of the phrase.

For example, to search for "cat" and "dog" within 10 words of each other in a document:

"cat dog"~10

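The maximum distance is written as a number directly after the tilde. A small sketch of a helper that builds such a query string follows; the name proximity_query is hypothetical and the function only formats text.

```python
def proximity_query(words, max_distance):
    """Build a proximity query such as "cat dog"~10."""
    if max_distance < 0:
        raise ValueError("max_distance must be non-negative")
    phrase = " ".join(words)
    return '"{}"~{}'.format(phrase, max_distance)


print(proximity_query(["cat", "dog"], 10))
```
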
Range searches

Range queries match documents whose field values fall between a lower and an upper bound. Values are compared lexicographically.

For example, to look for documents whose field "name" is between "cat" and "horse", excluding both of these terms:

name:{cat TO horse}

If you want these two terms to be included, use this query:

name:[cat TO horse]

Inclusive range queries are denoted by square brackets. Exclusive range queries are denoted by curly brackets. 
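
The difference between the two bracket styles, and the effect of lexicographic comparison, can be sketched with plain Python string comparison. This is only an analogy: the values Lucene actually compares depend on how the field was indexed, and in_range is a hypothetical helper.

```python
def in_range(value, lower, upper, inclusive):
    """Check a value against range bounds using plain string comparison."""
    if inclusive:
        return lower <= value <= upper   # [lower TO upper]
    return lower < value < upper         # {lower TO upper}


names = ["cat", "cow", "dog", "horse", "zebra"]
print([n for n in names if in_range(n, "cat", "horse", inclusive=False)])
print([n for n in names if in_range(n, "cat", "horse", inclusive=True)])

# Lexicographic order compares character by character, so "10" sorts before "9".
print(in_range("10", "1", "9", inclusive=True))
```

The last line shows why numeric values stored as text can give surprising results in range queries.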

Boolean operators

You can combine several terms using boolean operators. Lucene supports "AND", "+", "OR", "NOT" and "-".

Boolean operators must be in UPPERCASE.

The "OR" operator is the default conjunction operator: if there is no boolean operator between two terms, "OR" is used. This query:

cat animal

is equivalent to this one:

cat OR animal

AND

The AND operator matches documents where both terms exist. The symbol "&&" can also be used to replace the word "AND". 

Let's look for documents with both "cat" and "animal" in the content:

cat AND animal

+

This is called the required operator: the term placed after the "+" must be present in every matching document.

For example, to search for documents that must contain "animal" and may contain "cat":

+animal cat

NOT

The NOT operator excludes documents that contain the term after the "NOT". The symbol "!" can be used to replace the word "NOT".

For example, to search for documents that contain "animal" but not "cat":

animal NOT cat

 

The NOT operator cannot be used with just one term: a query such as "NOT cat" on its own returns no results.

-

This is called the prohibit operator: it excludes documents that contain the term placed after the "-" character.

For example, to search for documents that contain "animal" but not "cat":

animal -cat
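
The behaviour of these operators can be sketched with a toy model in which each document is a set of terms. This is an illustration under simplifying assumptions (no scoring, no analysis of the text), not Lucene's implementation; the document names and the functions doc_matches and search are made up for the example.

```python
def doc_matches(doc_terms, required=(), optional=(), prohibited=()):
    """Toy model of the '+', default OR and '-' operators on a set of terms."""
    if any(t in doc_terms for t in prohibited):
        return False                      # "-term" / "NOT term"
    if not all(t in doc_terms for t in required):
        return False                      # "+term" / "AND"
    if required:
        return True                       # optional terms no longer decide the match
    return any(t in doc_terms for t in optional)   # plain OR


docs = {
    "a.txt": {"cat", "animal"},
    "b.txt": {"animal"},
    "c.txt": {"cat"},
}


def search(**clauses):
    return sorted(name for name, terms in docs.items()
                  if doc_matches(terms, **clauses))


print(search(optional=["cat", "animal"]))               # cat animal
print(search(required=["cat", "animal"]))               # cat AND animal
print(search(required=["animal"], optional=["cat"]))    # +animal cat
print(search(optional=["animal"], prohibited=["cat"]))  # animal -cat
print(search(prohibited=["cat"]))                       # NOT cat (alone)
```

In this model a query made only of prohibited terms matches nothing, which mirrors the restriction on using NOT with a single term.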

Grouping

Lucene supports using parentheses to group clauses to form subqueries. This can be very useful if you want to control the boolean logic for a query.

For example, to search for documents that contain "animal" and either "cat" or "dog", use this query:

(cat OR dog) AND animal
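
To see why the position of the parentheses matters, the sketch below uses Python boolean expressions as a stand-in for the query clauses and evaluates two groupings against a document that mentions only "cat". It is an analogy for the boolean logic, not a description of how Lucene parses or scores the query.

```python
def contains(doc, term):
    return term in doc


doc = {"cat"}   # a document that mentions "cat" only

# (cat OR dog) AND animal
grouped = (contains(doc, "cat") or contains(doc, "dog")) and contains(doc, "animal")

# cat OR (dog AND animal)
regrouped = contains(doc, "cat") or (contains(doc, "dog") and contains(doc, "animal"))

print(grouped, regrouped)
```

The first grouping rejects the document because "animal" is missing, while the second accepts it because "cat" alone is enough.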

Escaping special characters

Lucene supports escaping some special characters that are part of the query syntax by using the backslash "\" before the character to be escaped. This is the current list of special characters:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \
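
When a search value comes from user input, the escaping can be automated. The following is a minimal sketch based only on the character list above; escape_term is a hypothetical helper, and escaping each "&" and "|" individually (rather than only the pairs "&&" and "||") is an assumption of this sketch.

```python
import re

# Single characters from the list above; "&&" and "||" are handled by
# escaping each "&" and "|" individually (an assumption of this sketch).
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\&|])')


def escape_term(text):
    """Prefix every Lucene special character in text with a backslash."""
    return _SPECIAL.sub(r"\\\1", text)


print(escape_term("(1+1):2"))
print(escape_term("report[final].txt"))
```

Characters that are not in the list, such as ".", are left untouched.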

 

Working with metadata fields

OpenKM metadata field names contain characters that are reserved in the Lucene query syntax, such as ":", so the field name must be sanitized before it is used in a query. For example, the field named "okp:consulting.text" in OpenKM must be written as "okp.consulting.text", replacing the ":" character with a ".":

okp.consulting.text:value

When working with metadata fields, you need to enable the sanitize.lucene.fields configuration property; otherwise you won't be able to search by metadata fields.
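
The renaming rule can be sketched as follows. The helper names are hypothetical, and the sketch assumes that the sanitize.lucene.fields property is enabled as described above; it only builds the query text.

```python
def sanitize_field(name):
    """Replace the ':' in an OpenKM property name with '.'."""
    return name.replace(":", ".")


def metadata_clause(name, value):
    # The colon that remains is the Lucene field separator.
    return sanitize_field(name) + ":" + value


print(metadata_clause("okp:consulting.text", "value"))
```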