Lucene queryparser with "/" in query criteria

When I try to search for something such as "workaround/fix" within Lucene, it throws this error:
org.apache.lucene.queryparser.classic.ParseException: Cannot parse 'workaround/fix': Lexical error at line 1, column 15. Encountered: <EOF> after : "/fix"
at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:131)
at pi.lucengine.LucIndex.main(LucIndex.java:112)
Caused by: org.apache.lucene.queryparser.classic.TokenMgrError: Lexical error at line 1, column 15. Encountered: <EOF> after : "/fix"
at org.apache.lucene.queryparser.classic.QueryParserTokenManager.getNextToken(QueryParserTokenManager.java:1133)
at org.apache.lucene.queryparser.classic.QueryParser.jj_scan_token(QueryParser.java:599)
at org.apache.lucene.queryparser.classic.QueryParser.jj_3R_2(QueryParser.java:482)
at org.apache.lucene.queryparser.classic.QueryParser.jj_3_1(QueryParser.java:489)
at org.apache.lucene.queryparser.classic.QueryParser.jj_2_1(QueryParser.java:475)
at org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:226)
at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:181)
at org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:170)
at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:121)
These are my lines 111 and 112:
QueryParser parser = new QueryParser(Version.LUCENE_43, field, analyzer);
Query query = parser.parse(newLine);
What do I need to do to allow it to parse the "/"?

The query parser interprets slashes as the beginning and end of a regex query (as of 4.0, see documentation here).
So, to include a literal slash in the query, you need to escape it by putting a backslash (\) before it.
You can handle escaping with QueryParser.escape(String).
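As a rough illustration of what that escaping does, here is a standalone sketch that prefixes each query-syntax character with a backslash. It is a hand-rolled stand-in so that it runs without Lucene on the classpath; the class name QueryEscapeSketch and the SPECIAL constant are hypothetical. In real code you would call QueryParser.escape(String) itself.

```java
// Standalone sketch: a hand-rolled stand-in for Lucene's QueryParser.escape(String).
// QueryEscapeSketch and SPECIAL are hypothetical names, not part of Lucene.
class QueryEscapeSketch {

    // Characters the classic query syntax treats specially, including '/',
    // which delimits regex queries as of 4.0.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Returns a copy of raw with a backslash in front of every special character.
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("workaround/fix")); // prints workaround\/fix
        System.out.println(escape("(1+1):2"));        // prints \(1\+1\)\:2
    }
}
```

Note that escaping the whole input this way turns every operator into literal text, so it suits raw user input rather than strings in which you still want wildcards or boolean operators to work.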

I encountered a similar problem when using '/' in Lucene queries issued from the Elasticsearch Kibana dashboard. I was escaping the '/' characters as indicated in the documentation and still had no success. I think this is related to the template bug reported here: https://github.com/elastic/kibana/issues/789. I am not sure yet; I will update this when we update the Logstash components.

I had a case where a query combining a forward slash with wildcards returned no results, even with the slash escaped:
+(*16/17*)
+(*16\/17*)
The solution was to add double quotes:
+("*16/17*")
+("*16\/17*")

Related

Django __iregex crashing for regular expression ^(\\. \\.)$

When I try to make an __iregex call using the regular expression '^(\\. \\.)$' I get:
DataError: invalid regular expression: parentheses () not balanced
I am using the PostgreSQL backend, so the Django documentation states that the equivalent SQL command should be
SELECT ... WHERE title ~* '^(\\. \\.)$';
When I run this query manually through the PSQL command line it works fine. Is there some bug with Django that I don't know about that is causing this to crash?
Edit: Also, it fails for variations of this regular expression, for example
'^(S\\. \\.)$'
'^(\\. S\\.)$'
'^(\\. \\.S)$'

The solution is to replace all " " characters with \s before sending the regexp into __iregex.

Redshift Regex count. Repetition operator error

I am trying to do a simple regex pattern match in Redshift.
With the code below I get the following error:
REGEXP_COUNT ( "code", '^(?=.{8}$)[A-z]{2,5}[0-9]{3,6}$' )
ERROR: Invalid preceding regular expression prior to repetition operator. The error occured while parsing the regular expression fragment: '^(?>>>HERE>>>=.{8}$)[A-'.
The pattern works fine when tested in Python and in online checkers, so I'm guessing it's a regex dialect problem. I have checked the PostgreSQL documentation on regular expressions to try to get help, as I can't find many details specific to Redshift.
Thanks,

SPARQL Update: Underscore not allowed in language tag

I am trying to insert data into blazegraph using the 'Update' tab of blazegraph workbench. Below is a sample code snippet:
INSERT DATA
{
ns:MyNode ns:hasValue "MyValue"@en_us
}
I am specifying the language tag with the @ symbol. However, it throws the following exception:
org.openrdf.query.MalformedQueryException: Lexical error at line 8,
column 49. Encountered: "u" (117), after : "_"
It seems that it does not allow an underscore as part of a language tag. If I try with just 'en', it works fine.
Why is that so? Is underscore a special character here? If so, what is the way to escape it?

The syntax for language tags is defined by an RFC, now revised in RFC 5646. Registration of language tags is controlled by IANA.
Subtags are separated by "-"; only A-Z,0-9 are legal in subtags.
When adopted for RDF syntaxes (N3, SPARQL, Turtle, etc.), the grammar pattern adopted was a compromise syntax that only loosely matches the RFC: '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*, without taking in all the details. The subtag separator is "-", so "_" is not allowed in a language tag and cannot be escaped; write en-us instead.
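The grammar pattern above can be checked with a small regex before a tag is put into an update. This is only a sketch: the class LangTagSketch and its methods are hypothetical helpers, and the pattern covers just the tag text that follows the leading marker character, not the marker itself.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any RDF library.
class LangTagSketch {

    // Letters first, then zero or more "-subtag" groups of letters and digits.
    private static final Pattern LANG_TAG =
            Pattern.compile("[a-zA-Z]+(-[a-zA-Z0-9]+)*");

    // True if the whole string matches the language-tag pattern.
    static boolean isValid(String tag) {
        return LANG_TAG.matcher(tag).matches();
    }

    // Assumption: locale-style tags such as "en_us" only need '_' swapped for '-'.
    static String normalize(String tag) {
        return tag.replace('_', '-');
    }

    public static void main(String[] args) {
        System.out.println(isValid("en"));               // prints true
        System.out.println(isValid("en_us"));            // prints false
        System.out.println(isValid(normalize("en_us"))); // prints true
    }
}
```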

Issue with using wildcards with lucene .net QueryParser

I have the following code for a Lucene.Net search:
If I use a query like:
AccountId:1 AND CompanyId:1 AND CreatedOn:[636288660000000000 TO 636315443990000000] AND AuditProperties.FriendlyName.NewValue:CustomerId|235
It works fine with exact match with CustomerId = 235.
However, if I try to search for a wildcard match like for example:
AccountId:1 AND CompanyId:1 AND CreatedOn:[636288660000000000 TO 636315443990000000] AND AuditProperties.FriendlyName.NewValue:CustomerId|*235*
it doesn't return any results. I think it is still looking for an exact match on the value "*235*". Am I missing anything here?
Thanks!

As per the QueryParser syntax documentation, the character | is not supported. However, it is not very clear whether you intended it to be a logical OR or a literal character.
Logical OR
The correct syntax for logical OR is either CustomerId OR *235*, CustomerId *235* or CustomerId||*235*.
Also, if this is meant to be a logical OR, you have to allow for a leading wildcard character as pointed out in Howto perform a 'contains' search rather than 'starts with' using Lucene.Net.
parser.AllowLeadingWildcard = true;
Literal |
To search for a literal pipe character, you should escape the character so the parser doesn't confuse it with a logical OR.
CustomerId\|*235*

Lucene 5.0.0 - search string with special characters

I am using Lucene version 5.0.0.
In my search string, there is a minus character like “test-”.
I read that the minus sign is a special character in Lucene. So I have to escape that sign, as in the queryparser documentation:
Escaping Special Characters:
Lucene supports escaping special characters that are part of the query syntax. The current list special characters are:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
To escape these character use the \ before the character. For example to search for (1+1):2 use the query:
\(1\+1\)\:2
To do that I use the QueryParser.escape method:
query = parser.parse(QueryParser.escape(searchString));
I use the classic Analyzer because I noticed that the standard Analyzer has some problems with escaping special characters.
The problem is that the Parser deletes the special characters and so the Query has the term
content:test
How can I set up the parser and searcher to search for the real value “test-”?
I also created my own query with the content test- but that didn’t work either. I received 0 results, but my index has entries like:
Test-VRF
Test-IPLS
I am really confused about this problem.

While escaping special characters for the queryparser deals with part of the problem, it doesn't help with analysis.
Neither classic nor standard analyzer will keep punctuation in the indexed form of the field. For each of these examples, the indexed form will be in two terms:
test and vrf
test and ipls
This is why a manually constructed query for "test-" finds nothing. That term does not exist in the index.
The goal of these analyzers is to index words. As such, punctuation is mostly eliminated and is not searchable. Phrase queries for "test vrf", "test-vrf" and "test_vrf" are all effectively identical. If that is not what you need, you'll need to look at other analyzers.

The way to fix this issue is to index the field's value as NOT_ANALYZED, so that it is kept as a single term:
Field field = new Field(key.toLowerCase(), value, Field.Store.YES, Field.Index.NOT_ANALYZED);
Anyone with the same problem needs to take care with how the contents are stored in the index.
To query for the value, escape the search string first
searchString = QueryParser.escape(searchString);
and parse it with, for example, a WhitespaceAnalyzer.
