Working with Lucene queries

Keep in mind that this information applies only to the Lucene query advanced search, not to the regular search. To get the Lucene query tab, go to Administration > Profiles, choose the appropriate profile, and enable it at Components > Search > Tab Lucene query.

With this feature you can create complex Lucene queries, so you are no longer limited by the search form fields. Keep in mind, however, that this can be difficult unless you know how to write these queries. Let's define a couple of items:

  • Terms: A query is broken up into terms and operators. A term is a single word like "cat" or "dog". Multiple terms can be combined with boolean operators for a more specific query. For example:

    cat AND dog

  • Fields: When you perform a Lucene search, you can specify a field or use the default field. In OpenKM the default field contains the text extracted from the document. To search by field, write the field name followed by a colon ":" and a term. For example, this query searches for all nodes in the taxonomy whose name is "animals":

    context:okm_root AND name:animals

You can search for a phrase using quotes:

name:"big animals" 
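
The field syntax above can be assembled programmatically. The following is a minimal sketch, not part of OpenKM: the helper names field_clause and all_of are hypothetical, and the code assumes the values contain no Lucene special characters that would need escaping.

```python
def field_clause(field: str, value: str) -> str:
    """Return a 'field:value' clause (hypothetical helper, not an OpenKM API)."""
    # A value containing whitespace is a phrase, so it is wrapped in quotes.
    if any(ch.isspace() for ch in value):
        return f'{field}:"{value}"'
    return f"{field}:{value}"


def all_of(*clauses: str) -> str:
    """Join clauses with the AND operator."""
    return " AND ".join(clauses)


query = all_of(
    field_clause("context", "okm_root"),
    field_clause("name", "big animals"),
)
print(query)  # context:okm_root AND name:"big animals"
```

The resulting string is what you would type into the Lucene query tab.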

Term modifiers

Lucene supports several term modifiers for creating complex searches.

Wildcard searches

You can use single and multiple character wildcard searches within terms.

  • Single: To perform a single-character wildcard search, use the "?" symbol. For example, to search for "text" or "test":

    te?t

  • Multiple: To perform a multiple-character wildcard search, use the "*" symbol. For example, to search for "test", "tests" or "tester":

    test*

You can't use "*" or "?" as the first character of a search.
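
To get a feel for what each wildcard matches, the sketch below imitates the matching rules with Python's standard fnmatch module. It only approximates the semantics described above and does not call Lucene. The function name wildcard_matches is hypothetical, and fnmatch additionally treats "[" as special, which Lucene wildcards do not.

```python
from fnmatch import fnmatchcase


def wildcard_matches(pattern: str, term: str) -> bool:
    """Approximate '?' (exactly one character) and '*' (zero or more characters)."""
    # Mirror the restriction that a wildcard cannot be the first character.
    if pattern[:1] in ("*", "?"):
        raise ValueError("a wildcard cannot be the first character")
    return fnmatchcase(term, pattern)


print(wildcard_matches("te?t", "text"))     # True
print(wildcard_matches("te?t", "tet"))      # False: '?' needs exactly one character
print(wildcard_matches("test*", "tester"))  # True
print(wildcard_matches("test*", "test"))    # True: '*' may match nothing
```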

Proximity searches

Lucene also supports finding words that are within a specific distance of each other. To perform a proximity search, use the tilde "~" symbol at the end of the phrase, followed by the maximum distance in words.

For example, to search for "cat" and "dog" within 10 words of each other in a document:

"cat dog"~10

Range searches

Range queries match documents whose field values fall between a specified lower and upper bound. Sorting is done lexicographically.

For example, to look for documents whose "name" field is between "cat" and "horse", excluding these two terms:

name:{cat TO horse}

If you want these two terms to be included, use this query:

name:[cat TO horse]

Inclusive range queries are denoted by square brackets. Exclusive range queries are denoted by curly brackets. 
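
Because the comparison is lexicographic, range bounds behave like string comparisons rather than numeric ones. The sketch below illustrates this with plain Python string comparison, which is only an approximation of how Lucene orders terms. The function name in_range is hypothetical.

```python
def in_range(value: str, lower: str, upper: str, inclusive: bool) -> bool:
    """Model [lower TO upper] (inclusive) or {lower TO upper} (exclusive)."""
    if inclusive:
        return lower <= value <= upper
    return lower < value < upper


print(in_range("dog", "cat", "horse", inclusive=False))  # True
print(in_range("cat", "cat", "horse", inclusive=False))  # False: bound excluded
print(in_range("cat", "cat", "horse", inclusive=True))   # True: bound included

# Lexicographic pitfall: as strings, "15" sorts before "2",
# so it falls outside the range even though 15 is between 2 and 20.
print(in_range("15", "2", "20", inclusive=True))         # False
```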

Boolean operators

You can combine several terms using boolean operators. Lucene supports "AND", "+", "OR", "NOT" and "-".

Boolean operators must be in UPPERCASE.

The "OR" operator is the default conjunction operator: if there is no boolean operator between two terms, the "OR" operator is used. This query:

cat animal

is equivalent to this one:

cat OR animal

AND

The AND operator matches documents where both terms exist. The symbol "&&" can also be used in place of the word "AND".

Let's look for documents with both the words "cat" and "animal" in the content:

cat AND animal

+

This is called the required operator. It requires that the term placed after the "+" be present in every matching document.

For example, to search for documents that must contain "animal" and may contain "cat":

+animal cat

NOT

The NOT operator excludes documents that contain the term after the "NOT". The symbol "!" can be used in place of the word "NOT".

For example, to search for documents that contain "animal" but not "cat":

animal NOT cat

 

The NOT operator cannot be used with just one term.

-

This operator excludes documents that contain the term after the "-" character.

For example, to search for documents that contain "animal" but not "cat":

animal -cat

Grouping

Lucene supports using parentheses to group clauses to form sub queries. This can be very useful if you want to control the boolean logic for a query.

For example, to search for documents that contain either "cat" or "dog", and also "animal", use this query:

(cat OR dog) AND animal
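
The boolean operators and grouping can be pictured as set operations over the documents that contain each term. The sketch below uses a small made-up collection (the names doc1 to doc4 and the helper containing are hypothetical). It models only which documents match and ignores Lucene's relevance scoring.

```python
# Hypothetical collection: document name -> words it contains.
docs = {
    "doc1": {"cat", "animal"},
    "doc2": {"dog", "animal"},
    "doc3": {"cat", "dog"},
    "doc4": {"animal"},
}


def containing(term: str) -> set:
    """Return the names of the documents that contain the term."""
    return {name for name, words in docs.items() if term in words}


cat, dog, animal = containing("cat"), containing("dog"), containing("animal")

print(sorted(cat & animal))          # cat AND animal          -> ['doc1']
print(sorted(cat | animal))          # cat OR animal           -> all four documents
print(sorted(animal - cat))          # animal NOT cat          -> ['doc2', 'doc4']
print(sorted((cat | dog) & animal))  # (cat OR dog) AND animal -> ['doc1', 'doc2']
```

Without the parentheses, the grouping of the OR and AND clauses would be left to the parser, which is why explicit grouping is the safer choice.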

Escaping special characters

Lucene supports escaping special characters that are part of the query syntax by placing a backslash "\" before the character to be escaped. This is the current list of special characters:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \
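
When a search value comes from user input, escaping can be automated. The following is a minimal sketch using Python's re module. The function name escape_term is hypothetical. As a simplifying assumption, it escapes every "&" and "|" character individually rather than only the "&&" and "||" pairs.

```python
import re

# One character class covering the special characters listed above.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\&|])')


def escape_term(text: str) -> str:
    """Prefix every special character with a backslash."""
    return _SPECIAL.sub(r"\\\1", text)


print(escape_term("(1+1):2"))  # \(1\+1\)\:2
print(escape_term("plain"))    # plain
```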

 

Working with metadata fields

Because OpenKM uses characters that are reserved in Lucene, such as ":", the field name must be sanitized. For example, a field named "okp:consulting.text" in OpenKM should be sanitized as "okp_consulting_text", replacing the characters ":" and "." with "_".

okp_consulting_text:value

To work with metadata fields you must enable the sanitize.lucene.fields configuration property; otherwise you won't be able to search by metadata fields.
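
The sanitizing rule described above is a simple character replacement. The sketch below shows one way to apply it when building a query. The function names sanitize_field_name and metadata_clause are hypothetical and are not part of the OpenKM API.

```python
def sanitize_field_name(name: str) -> str:
    """Replace the ':' and '.' characters with '_' as described above."""
    return name.replace(":", "_").replace(".", "_")


def metadata_clause(name: str, value: str) -> str:
    """Build a 'field:value' clause for a metadata field (hypothetical helper)."""
    return f"{sanitize_field_name(name)}:{value}"


print(sanitize_field_name("okp:consulting.text"))       # okp_consulting_text
print(metadata_clause("okp:consulting.text", "value"))  # okp_consulting_text:value
```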