RavenDb : Issue of matching with Search() and wildcards


I use the Search() method with wildcards in a query.
Example of my problem:
I want to match the string:
uri:fooA:fooB:123.456.789
Examples of wildcards that work:
uri:*
uri:fooA:
uri:fooA:fooB:
Examples of wildcards that do not work:
uri:fooA:fooB:123.*
uri:fooA:fooB:123.456.*
Is the problem due to the numbers in my wildcard?

I found the solution: the problem is caused by the analyzer chosen for the index. See http://ravendb.net/docs/article-page/2.5/Csharp/client-api/querying/static-indexes/configuring-index-options
I chose the "KeywordAnalyzer", which indexes the whole field value as a single term instead of splitting it into tokens.
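
To illustrate why the analyzer matters, here is a minimal Python sketch. It is not RavenDB or Lucene code. `tokenizing_terms` is a hypothetical and deliberately crude stand-in for a tokenizing analyzer, `keyword_terms` mimics the idea of keeping the whole value as one term, and `matches` uses Python's `fnmatchcase` as a stand-in for wildcard matching against the indexed terms.

```python
import re
from fnmatch import fnmatchcase


def tokenizing_terms(value):
    # Assumption: a crude tokenizer that lowercases the value and splits
    # on every character that is not a letter or digit. Real analyzers
    # use more elaborate rules.
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]


def keyword_terms(value):
    # The whole value is kept as a single, unmodified term.
    return [value]


def matches(pattern, terms):
    # A wildcard pattern matches if at least one indexed term matches it.
    return any(fnmatchcase(term, pattern) for term in terms)


value = "uri:fooA:fooB:123.456.789"
pattern = "uri:fooA:fooB:123.*"

print(tokenizing_terms(value))
print(matches(pattern, tokenizing_terms(value)))
print(matches(pattern, keyword_terms(value)))
```

With the crude tokenizer no single term starts with `uri:fooA:fooB:123.`, so the pattern cannot match. With the whole value kept as one term it can. The real analyzers split text differently, so take this as the general idea only.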

Related

Rails - Regex to remove accents from database words

I'm new to Rails and I'm using Google Translate to post here. I have a question.
I'm trying to remove the accents from the words stored in the database when performing a search.
For the parameters this already works with parameterize, but how do I remove the accents on the database side (inside the where)?
I'm using PostgreSQL.
Initially I tried to use regexp_replace, but apparently it didn't work. Could I be applying it wrong?
I have the following scope:
scope :filter_occupation, -> (params) {
  params[:occupation].present? ?
    where("lower(regexp_replace(occupation, '[^\w]+','')) LIKE ?",
    # where("lower(occupation) LIKE ?",
          "%#{params[:occupation].parameterize(separator: ' ')}%")
    :
    all
}
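
A possible reason the attempt above does not help: a pattern such as `[^\w]+` removes non-word characters, but an accented letter is still a word character in a Unicode-aware regex engine, so it survives. Removing the accent means decomposing the letter and dropping the combining mark. The Python sketch below only illustrates that idea. `strip_accents` is a hypothetical helper, not something Rails or PostgreSQL provides. On the database side, PostgreSQL ships an `unaccent` extension that might be an option if it can be enabled on your server.

```python
import re
import unicodedata


def strip_accents(text):
    # Decompose each character into base letter + combining marks (NFD),
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


word = "ocupa\u00e7\u00e3o"  # "ocupação", written with escapes for clarity

# Removing non-word characters leaves the accented letters untouched.
print(re.sub(r"[^\w]+", "", word) == word)

# Decomposing and dropping combining marks removes the accents.
print(strip_accents(word))
```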

Lucene query : parse exception

I am using Alfresco and trying to execute the following two queries.
1st:
PATH:"/app:company_home/st:sites/cm:swsdp/cm:dataLists/cm:aea88103-517e-4aa0-a3be-de258d0e6465//*"
The 1st query works properly, but the 2nd query cannot be parsed.
2nd:
+PATH:"/app:company_home/st:swsdp/cm:/cm:dataLists/cm:9787a75b-cbc9-4d42-b76c-df88461e62c6//*"
Exception:
Cannot parse '+PATH:"/app:company_home/st:swsdp/cm:/cm:dataLists/cm:9787a75b-cbc9-4d42-b76c-df88461e62c6//*" AND +TYPE:"fdm:formDatalist"': Failed to parse XPath...
Unexpected '9787'
I tried escaping the hyphens, but I still get the same error.
+PATH:"/app:company_home/st:swsdp/cm:/cm:dataLists/cm:9787a75b\-cbc9\-4d42-b76c\-df88461e62c6//
I noticed that in the 1st query cm:aea88103-517e-4aa0-a3be-de258d0e6465 starts with a letter, while in the 2nd query cm:9787a75b-cbc9-4d42-b76c-df88461e62c6 starts with a digit, which seems to be why it cannot be parsed.
How can I fix this error?

Certain characters need to be hex-encoded (ISO 9075 encoding) in Lucene PATH queries.
You need to encode your path this way:
var rawString = "//test:123 DIR/FILE.TXT #";
=> rawString: //test:123 DIR/FILE.TXT #
var encodedString = search.ISO9075Encode(rawString);
=> encodedString: _x002f__x002f_test_x003a_123_x0020_DIR_x002f_FILE.TXT_x0020__x0023_
var decodedString = search.ISO9075Decode(encodedString);
=> decodedString: //test:123 DIR/FILE.TXT #
See the Alfresco documentation for more information: http://docs.alfresco.com/5.2/references/API-JS-iso9075Encode.html
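
To show what this encoding does to the path segment from the question, here is a rough Python sketch. It is not Alfresco's implementation. `iso9075_encode_segment` is a hypothetical helper that handles ASCII only, and it skips the rule for names that already contain something looking like `_xHHHH_`. The idea is that a character which is not allowed in an XML name at its position is replaced by `_xHHHH_`, and a digit is not allowed as the first character.

```python
import string

# Assumption: ASCII-only approximation of the XML name rules.
NAME_START = set(string.ascii_letters + "_")
NAME_CHARS = NAME_START | set(string.digits + "-.")


def iso9075_encode_segment(name):
    out = []
    for i, ch in enumerate(name):
        # The first character has stricter rules than the following ones.
        allowed = NAME_START if i == 0 else NAME_CHARS
        if ch in allowed:
            out.append(ch)
        else:
            # Replace the character by its 4-digit hex code point.
            out.append("_x{:04x}_".format(ord(ch)))
    return "".join(out)


print(iso9075_encode_segment("aea88103-517e-4aa0-a3be-de258d0e6465"))
print(iso9075_encode_segment("9787a75b-cbc9-4d42-b76c-df88461e62c6"))
```

Under these assumptions the first name is left unchanged, while the leading `9` of the second name becomes `_x0039_`. Only the local name after `cm:` is passed in here, not the whole path.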

Open CMIS - Querying string property results in weird behavior

I'm executing the following SQL query:
SELECT doc.cmis:description, doc.cmis:name
FROM cmis:document doc
WHERE IN_FOLDER(doc,'folderID')
This results in something like the following:
doc.cmis:description = "this is description"
doc.cmis:name = "fileName"
Now, if I add the following condition, it returns zero results:
and doc.cmis:description = 'this is description'
However, if I change the condition to the following, it works:
and doc.cmis:description like '%'
If I add one character (but, interestingly, not two) as below, it also works:
and doc.cmis:description like '%t%'
Interestingly, the same kind of condition works fine with doc.cmis:name (as well as with other properties).
Does anyone have a clue as to why this strange behavior is occurring?

The specification leaves it to the implementer whether cmis:description is queryable or not.
Anyway, which Alfresco version are you using? There was a bug some time ago, but it should be solved by now: the cmis:description field should be queryable, although I don't know whether the fix is in Enterprise or Community.
By the way, I am currently using Alfresco Community 4.2.f and I have the same problem.

Fuzzy search with stop words produces unexpected results with Lucene / ElasticSearch

I am noticing that the fuzzy operator on stop words does not produce the results I'd expect.
Here's my configuration:
index :
    analysis :
        analyzer :
            my_analyzer :
                tokenizer : my_tokenizer
                filter : [standard, my_stop_english_filter]
        tokenizer :
            my_tokenizer :
                type : standard
                max_token_length : 512
        filter :
            my_stop_english_filter :
                type : stop
                stopwords : [the]
                ignore_case : true
And suppose I have indexed:
the brown fox
If I search for the brown~ fox~, then I get a hit as expected.
However, if I search for the~ brown~ fox~, then I do not get a hit, presumably because the fuzzy operator prevents "the" from being treated as a stop word.
Is there a way I can combine stop words with fuzzy search?
Thanks,
Eric

If I recall correctly, this is the way Lucene is supposed to work as it is currently written: using a fuzzy search disables the removal of stop words. It would take some work, but you could create a modified version of the query parser so that stop words are ignored when applying fuzzy search (but then how do you handle a fuzzy search on something that looks like a stop word?)
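
A very rough sketch of that idea, done as a pre-processing step on the query string rather than inside the Lucene query parser. This is Python and purely illustrative. `drop_fuzzy_stop_words` is a hypothetical helper, it only splits on whitespace, and it does not understand phrases, field prefixes or boolean operators.

```python
def drop_fuzzy_stop_words(query, stop_words):
    kept = []
    for token in query.split():
        # Split "the~0.8" into the base term "the" and the fuzzy part.
        base, tilde, _ = token.partition("~")
        if tilde and base.lower() in stop_words:
            # A fuzzy term whose base is a stop word is dropped entirely.
            continue
        kept.append(token)
    return " ".join(kept)


print(drop_fuzzy_stop_words("the~ brown~ fox~", {"the"}))
```

The trade-off is the one mentioned above: a genuine fuzzy search for a word that looks like a stop word is silently lost.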

Do Lucene and Sphinx support prefix matching?

If not how do you make this work with them and which is better?
E.g. when searching for "mi" I would like results with "microsoft" to potentially show up, even though there is no keyword "mi" as such.

Yes and Yes.
Lucene has PrefixQuery:
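// tokenize(...) and LABEL_FIELD_NAME are helpers from my own code, not part of Lucene
BooleanQuery query = new BooleanQuery();
for (String token : tokenize(queryString)) {
    // every token must match as a prefix of some term in the field
    query.add(new PrefixQuery(new Term(LABEL_FIELD_NAME, token)), Occur.MUST);
}
return query;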
You can also use the Lucene query parser syntax and define the prefix search with a wildcard, e.g. exam*. The query parser syntax also works if you deploy Solr, a separate Lucene-based search server that is called through an HTTP API.
In Sphinx it seems you have to do the following:
Set the minimum prefix length to a value larger than 0
Enable wildcard syntax
Generate a query string with a wildcard, e.g. exam*
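
For intuition only, here is a small Python sketch of prefix matching over a sorted list of terms. It is not how Lucene or Sphinx are implemented internally, and `prefix_matches` is a hypothetical helper. It just shows why "mi" can find "microsoft" even though "mi" itself was never indexed.

```python
from bisect import bisect_left


def prefix_matches(sorted_terms, prefix):
    # Jump to the first term that is >= prefix, then collect terms
    # for as long as they still start with the prefix.
    start = bisect_left(sorted_terms, prefix)
    result = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        result.append(term)
    return result


terms = sorted(["zebra", "mint", "microsoft", "apple", "mist"])
print(prefix_matches(terms, "mi"))
```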
