Search for a numeric range inside a string in Elasticsearch / Lucene

I want to search for a numeric expression in Elasticsearch.
Example documents:
indent code 4.8663 spaces
indent code 121.232 spaces
indent code 12.3232 spaces
Example query: get all strings matching "indent code <number between 1 and 100>".
It should match the 1st and 3rd documents but not the 2nd.
{
  "span_near": {
    "in_order": true,
    "clauses": [
      {
        "span_term": {
          "request": "indent"
        }
      },
      {
        "span_term": {
          "request": "code"
        }
      },
      {
        "span_multi": {
          "match": {
            "range": {
              "request": {
                "from": 1,
                "to": 100
              }
            }
          }
        }
      }
    ],
    "slop": 0,
    "collect_payloads": false
  }
}
This gives the wrong results, because the range clause is executed as a TermRangeQuery (lexicographic string comparison) rather than a NumericRangeQuery.
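The mismatch can be reproduced outside Elasticsearch. A minimal Python sketch, using the values from the question (the comparison logic is only an illustration of term-range vs. numeric-range semantics):

```python
# The three values from the question, as they are stored in the index: strings.
values = ["4.8663", "121.232", "12.3232"]

# TermRangeQuery compares terms lexicographically, character by character.
term_range = [v for v in values if "1" <= v <= "100"]

# NumericRangeQuery compares the parsed numbers, which is what we actually want.
numeric_range = [v for v in values if 1 <= float(v) <= 100]

print(term_range)     # []  -- every value sorts after "100" as a string
print(numeric_range)  # ['4.8663', '12.3232']
```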

If you can either replace float numbers by integer numbers (4.8663 => 5) or multiply your float numbers by a chosen power of 10 so that all numbers become integers (4.8663 => 48663), then you might be able to use the regexp query for this.
I've indexed three documents with integer numbers (5, 121 and 12) and I've been able to successfully retrieve the two in the 1-100 interval using the following query.
{
  "query": {
    "regexp": {
      "request": {
        "value": "<1-100>"
      }
    }
  }
}
If you absolutely need to keep the precision for other reasons, then this might not work out for you.
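The scaling idea above can be sketched as a small preprocessing step. This is a sketch, assuming a factor of 10^4 chosen to match the precision in the question's examples; the query interval would have to be scaled the same way:

```python
def scale(value: float, decimals: int = 4) -> int:
    """Turn a float into an integer by shifting the decimal point,
    so that Lucene's numeric-interval regexp syntax (<from-to>) applies."""
    return int(round(value * 10 ** decimals))

print(scale(4.8663))   # 48663
print(scale(121.232))  # 1212320
# The interval 1..100 then becomes 10000..1000000 in the regexp query.
```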

Related

How to query exactly when a field has long text in Elasticsearch?

I have a field with 100–300 characters. If I want to query this field with a request like:
GET _search
{
  "query": {
    "match": {
      "question": {
        "query": "asdasdasd",
        "minimum_should_match": "75%"
      }
    }
  }
}
Even if I just tap random keys on the keyboard, I still get some results, but they are not relevant at all! I don't want to get them. What can I do to prevent these results from being returned? Thanks!
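For reference, a positive `minimum_should_match` percentage is converted to an absolute clause count by rounding down, so short queries can end up requiring only a few matching terms. A sketch of that arithmetic (the rounding-down rule for positive percentages is taken from the Elasticsearch `minimum_should_match` documentation):

```python
import math

def required_terms(num_terms: int, percent: int = 75) -> int:
    # A positive percentage is rounded down to an absolute number
    # of optional clauses that must match.
    return math.floor(num_terms * percent / 100)

print(required_terms(4))  # 3 -- a four-term query needs 3 matching terms
print(required_terms(8))  # 6
```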

Elasticsearch query: filter out IDs by wildcard

I'm hoping to create a query that filters out IDs matching a wildcard pattern. For instance, I would like to search everywhere except where the ID contains the word current. Is this possible?
Yes, it is possible using a regexp filter/query. I could not figure out a way to do it directly with the regexp complement operator, hence I've used bool must_not to solve your problem for the time being. I'll refine the answer later if possible.
POST <index name>/_search
{
  "query": {
    "match_all": {}
  },
  "filter": {
    "bool": {
      "must_not": [
        {
          "regexp": {
            "ID": {
              "value": ".*current.*"
            }
          }
        }
      ]
    }
  }
}
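One detail worth noting: Lucene regexp queries are anchored, i.e. the pattern must match the entire term, which is why the pattern needs the leading and trailing `.*`. A client-side Python analogy, with hypothetical IDs made up for illustration:

```python
import re

ids = ["current_snapshot", "archive_2014", "logs-current", "users"]

# fullmatch mimics Lucene's anchored regexp semantics: the whole term
# must match the pattern, hence the surrounding .* wildcards.
pattern = re.compile(r".*current.*")
kept = [i for i in ids if not pattern.fullmatch(i)]

print(kept)  # ['archive_2014', 'users']
```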

Elasticsearch: How to prevent the increase of score when search term appears multiple times in document?

When a search term appears not just once but several times in a document, that document's score goes up. While this may be desirable most of the time, it is not in my case.
The query:
"query": {
"bool": {
"should": {
"nested": {
"path": "editions",
"query": {
"match": {
"title_author": {
"query": "look me up",
"operator": "and",
"boost": 2
}
}
}
}
},
"must": {
"nested": {
"path": "editions",
"query": {
"match": {
"title_author": {
"query": "look me up",
"operator": "and",
"fuzziness": 0.5,
"boost": 1
}
}
}
}
}
}
}
doc_1
{
  "editions": [
    {
      "editionid": 1,
      "title_author": "look me up look me up"
    },
    {
      "editionid": 2,
      "title_author": "something else"
    }
  ]
}
and doc_2
{
  "editions": [
    {
      "editionid": 3,
      "title_author": "look me up"
    },
    {
      "editionid": 4,
      "title_author": "something else"
    }
  ]
}
Now, doc_1 would get a higher score because the search terms appear twice. I don't want that. How do I turn this behavior off? I want the same score no matter whether the search term was found once or twice in the matching document.
In addition to what @keety and @Sid1199 talked about, there is another way to do this: the index_options property for fields of type "text". By default it is set to "positions", but you can explicitly set it to "docs" so that term frequencies are not written to the index, and Elasticsearch will not know about repetitions while searching.
"title_author": {
"type": "text",
"index_options": "docs"
}
There is a property in Elasticsearch known as "similarity". There are several similarity modules, but the one that is useful here is "boolean". If you set similarity to "boolean" in your mapping, it will prevent term frequency from boosting your query multiple times.
"title_author": {
  "type": "text",
  "similarity": "boolean"
}
If you run your query against this mapping, a matching term is counted only once regardless of the number of times it appears. You can read up more on similarities here.
This is only available in ES versions 5.4 and above
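The two suggestions can be combined in a single mapping. A sketch as a Python dict, as one would pass to a client's index-creation call (the field name is taken from the question; the typeless mapping layout assumes ES 7 or later):

```python
import json

mapping = {
    "mappings": {
        "properties": {
            "title_author": {
                "type": "text",
                "index_options": "docs",   # drop term frequencies at index time
                "similarity": "boolean",   # score ignores how often a term occurs
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```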

Elasticsearch exact match for a value containing a hash

I am facing a problem with Elasticsearch. I am using a query to search data in a document. The following query searches for a single value:
"query": {
"filtered": {
"query": {
"query_string": {
"query": "'.$lotnumber.'",
"fields": ["LotNumber"]
}
}
}
}
}'
It works fine for simple values, but if $lotnumber contains a value with a hash (#) in it, the query returns all the data in the document. Can anyone help me resolve the problem of searching for an exact value that contains a hash?
The first thing I would think of in this case is to make the field not_analyzed in your mapping. That should do the trick.
In your mapping:
"album": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
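The underlying issue is that the standard analyzer throws the # away at index time, so the query only ever sees the fragments around it. A rough Python approximation (the tokenization rule here is a deliberate simplification of the standard analyzer, and the lot number is made up):

```python
import re

def rough_standard_analyzer(text: str):
    # Crude stand-in for the standard analyzer: lowercase, then split
    # on anything that is not a letter or digit -- the '#' disappears.
    return [t for t in re.split(r"[^0-9A-Za-z]+", text.lower()) if t]

print(rough_standard_analyzer("LOT#4711"))  # ['lot', '4711']
# A not_analyzed (or keyword) field keeps "LOT#4711" as one term instead,
# so an exact-match query on the raw sub-field behaves as expected.
```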

protect certain phrases for search

I am currently trying to improve the corner cases of my Elasticsearch results. One in particular is giving me a headache: "google+", which is simply reduced to "google". Omitting special characters is usually fine, but for this term I would like an exception. Any ideas how to achieve this?
I tried the following setup:
{
  "index": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": [
            "synonym",
            "word_delimiter"
          ]
        }
      },
      "filter": {
        "synonym": {
          "type": "synonym",
          "synonyms_path": "analysis/synonym.txt"
        },
        "word_delimiter": {
          "type": "word_delimiter",
          "protected_words_path": "analysis/protected.txt"
        }
      }
    }
  }
}
protected.txt contains a single line with google+.
I guess the standard tokenizer is stripping the + from google+; you can check that using the Analyze API. I'd use the whitespace tokenizer instead and properly configure the word_delimiter token filter that you're already using.
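The suggested change amounts to swapping the tokenizer in the settings from the question. A sketch as a Python dict (the filter setup and file paths are taken from the question's configuration):

```python
import json

settings = {
    "index": {
        "analysis": {
            "analyzer": {
                "default": {
                    # whitespace keeps "google+" intact; the standard
                    # tokenizer would already have stripped the "+"
                    "tokenizer": "whitespace",
                    "filter": ["synonym", "word_delimiter"],
                }
            },
            "filter": {
                "synonym": {
                    "type": "synonym",
                    "synonyms_path": "analysis/synonym.txt",
                },
                "word_delimiter": {
                    "type": "word_delimiter",
                    # protected words are exempt from splitting, so
                    # "google+" also survives the word_delimiter filter
                    "protected_words_path": "analysis/protected.txt",
                },
            },
        }
    }
}

print(json.dumps(settings, indent=2))
```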
I think pattern replace would be a better idea - http://www.elasticsearch.org/guide/reference/index-modules/analysis/pattern_replace-tokenfilter.html