Simple full text search in ElasticSearch - lucene

I'm trying to understand how ElasticSearch Query DSL works.
It would be a great help if anyone could give me an example of how to perform a search like the following MySQL query:
SELECT * FROM products
WHERE shop_id = 1
AND MATCH(title, description) AGAINST ('test' IN BOOLEAN MODE)

Assuming that you indexed some documents containing at least the shop_id, title and description fields, something like the following example:
{
  "shop_id" : "here goes your shop_id",
  "title" : "here goes your title",
  "description" : "here goes your description"
}
You can execute a multi match query against multiple fields, and give them a different weight (usually title is more important). You can also combine the query with a term filter on shop_id:
{
  "query" : {
    "multi_match" : {
      "query" : "here goes your query",
      "fields" : [ "title^2", "description" ]
    }
  },
  "filter" : {
    "term" : { "shop_id" : "here goes your shop id" }
  }
}
You need to submit the query using the search API. Filters are used to reduce the set of documents the query is executed against. Filters are faster since they don't involve scoring and can be cached. In my example I applied a top-level filter, which may or may not be a good fit for you depending on what else you want to do next. If you want to compute a facet, for instance, the top-level filter would be ignored by the facet. Another way to add a filter, one that is also taken into account when computing facets, is the filtered query.
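For illustration, here is a minimal Python sketch of the request body for the filtered query mentioned above. It only builds the JSON; the helper name build_filtered_search is made up for this example. Note that the filtered query belongs to older Elasticsearch releases (it was deprecated in 2.0 and removed in 5.0, where a bool query with a filter clause takes its place), so treat this as a sketch for the version the answer targets.

```python
import json

def build_filtered_search(text, shop_id):
    """Build a search body that wraps the multi_match query in a `filtered` query."""
    return {
        "query": {
            "filtered": {
                # Scored part: full text search over title (boosted) and description.
                "query": {
                    "multi_match": {
                        "query": text,
                        "fields": ["title^2", "description"],
                    }
                },
                # Non-scored part: restrict the documents to one shop.
                "filter": {"term": {"shop_id": shop_id}},
            }
        }
    }

body = build_filtered_search("test", 1)
print(json.dumps(body, indent=2))
```

Because the filter sits inside the query here, anything computed from the query results (facets included) only sees documents from that shop.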

Is it correct to do 1-to-1 mapping in Update API request param

I need to do a bulk update of user details.
Suppose each user object has the following fields:
User First Name
User ID
User Last Name
User Email ID
User Country
An admin can upload updated user data through a CSV file. Values that differ from the stored data need to be updated. The most likely request format for this bulk update would be (Method 1):
"data" : {
  "userArray" : [
    {
      "id" : 2343565432,
      "f_name" : "David",
      "email" : "david#testmail.com"
    },
    {
      "id" : 2344354351,
      "country" : "United States"
    }
    .
    .
    .
  ]
}
Method 2: I would send the details as two arrays per field, one holding the user IDs and the other holding the corresponding values of that field:
"data" : {
  "userArray" : [
    {
      "ids" : [23234323432, 4543543543, 45654543543],
      "country" : ["United States", "Israel", "Mexico"]
    },
    {
      "ids" : [2323432334543, 567676565],
      "email" : ["groove#drivein.com", "zara#foobar.com"]
    },
    .
    .
    .
  ]
}
In method 1, I need to query the database once for every updated user, so the number of queries grows with the number of edited users. In contrast, with method 2 I query the database only once per field (I pass the array of IDs to the query and fetch all rows whose user ID is in that array in a single query), and then I can update each row with its respective details.
However, most update APIs I have seen online take their parameters in the format of method 1, which is more readable. What would be the advantage of going with method 1 rather than method 2? (Method 2 saves me some query time when the number of users is large, which can improve performance.)
I almost always see it being method 1 style.
With that said, I don't understand why your DB performance depends on the way the input data is structured. That's just the way information gets into your code.
You can have the client send the data as method 1 and then shim it to method 2 on the backend if that helps you structure the DB queries better.
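A rough Python sketch of such a shim, assuming the backend receives the method 1 array as a list of dicts. The function name shim_to_method2, the "values" key, and the sample data are invented for this example; they are not from the question.

```python
from collections import defaultdict

def shim_to_method2(user_array, id_key="id"):
    """Regroup per-user updates (method 1) into per-field id/value arrays (method 2)."""
    grouped = defaultdict(lambda: {"ids": [], "values": []})
    for update in user_array:
        user_id = update[id_key]
        for field, value in update.items():
            if field == id_key:
                continue  # the id is the grouping key, not a field to update
            grouped[field]["ids"].append(user_id)
            grouped[field]["values"].append(value)
    return dict(grouped)

# Hypothetical method 1 payload.
method1 = [
    {"id": 2343565432, "f_name": "David", "email": "david@example.com"},
    {"id": 2344354351, "country": "United States"},
    {"id": 2344354352, "country": "Mexico"},
]
method2 = shim_to_method2(method1)
print(method2["country"])
```

Each key of the result corresponds to one field, so the backend can still issue one lookup per field while the public API keeps the more readable per-user format.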

Elasticsearch query context vs filter context

I am a little bit confused by Elasticsearch Query DSL's query context and filter context. I have the 2 queries below. Both return the same result; the first one evaluates a score and the second one does not. Which one is more appropriate?
1st Query :-
curl -XGET 'localhost:9200/xxx/yyy/_search?pretty' -d'
{
  "query": {
    "bool": {
      "must": {
        "terms": { "mcc" : ["5045","5499"]}
      },
      "must_not":{
        "term":{"maximum_flag":false}
      },
      "filter": {
        "geo_distance": {
          "distance": "500",
          "location": "40.959334, 29.082142"
        }
      }
    }
  }
}'
2nd Query :-
curl -XGET 'localhost:9200/xxx/yyy/_search?pretty' -d'
{
  "query": {
    "bool" : {
      "filter": [
        {"term":{"maximum_flag":true}},
        {"terms": { "mcc" : ["5045","5499"]}},
        {
          "geo_distance": {
            "distance": "500",
            "location": "40.959334, 29.082142"
          }
        }
      ]
    }
  }
}'
Thanks,
In the official guide you have a good explanation:
Query context
A query clause used in query context answers the question “How well does this document match this query clause?” Besides deciding whether or not the document matches, the query clause also calculates a _score representing how well the document matches, relative to other documents.
Query context is in effect whenever a query clause is passed to a query parameter, such as the query parameter in the search API.
Filter context
In filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple Yes or No — no scores are calculated. Filter context is mostly used for filtering structured data, e.g.
Does this timestamp fall into the range 2015 to 2016?
Is the status field set to "published"?
Frequently used filters will be cached automatically by Elasticsearch, to speed up performance.
Filter context is in effect whenever a query clause is passed to a filter parameter, such as the filter or must_not parameters in the bool query, the filter parameter in the constant_score query, or the filter aggregation.
https://www.elastic.co/guide/en/elasticsearch/reference/2.3/query-filter-context.html
As for your case, we would need more information, but since you are matching exact values, a filter would suit it better.
The first query evaluates a score because the "terms" clause sits directly under "must" without being wrapped in "filter", so it runs in query context, which results in a score being calculated.
In the second query the clauses are placed inside "filter", which changes their context from query context to filter context. In filter context no relevance score is calculated; all matching documents receive the same constant _score.
You can find more details about query behavior in this article:
https://towardsdatascience.com/deep-dive-into-querying-elasticsearch-filter-vs-query-full-text-search-b861b06bd4c0
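As a rough illustration of the distinction above, the Python sketch below labels each clause group of a bool query by the context it runs in. The helper name clause_contexts is made up for this example. The mapping assumes the usual bool semantics: must and should contribute to _score, while filter and must_not only include or exclude documents. The second body is written with all three conditions in a single filter array.

```python
# Occurrence types of a bool query, grouped by the context their clauses run in.
SCORING = {"must", "should"}          # query context: contribute to _score
NON_SCORING = {"filter", "must_not"}  # filter context: yes/no only

def clause_contexts(bool_query):
    """Return {occurrence type: "query" or "filter"} for one bool query body."""
    contexts = {}
    for occur in bool_query:
        if occur in SCORING:
            contexts[occur] = "query"
        elif occur in NON_SCORING:
            contexts[occur] = "filter"
    return contexts

geo = {"geo_distance": {"distance": "500", "location": "40.959334, 29.082142"}}

first_query = {
    "must": {"terms": {"mcc": ["5045", "5499"]}},
    "must_not": {"term": {"maximum_flag": False}},
    "filter": geo,
}
second_query = {
    "filter": [
        {"term": {"maximum_flag": True}},
        {"terms": {"mcc": ["5045", "5499"]}},
        geo,
    ]
}

print(clause_contexts(first_query))
print(clause_contexts(second_query))
```

Only the first query has a clause in query context, which is why only it computes a relevance score.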

Lucene query where two fields will be compared

I have an elasticsearch cluster. All documents in the cluster have the same index and type. Each document has two number fields -> field1 and field2.
I want to display all documents in Grafana, where value of field1 > value of field2.
Is there a query like:
document_type:test AND field1 > field2 ?
As far as I'm aware there is no way to perform that sort of query using elasticsearch (lucene). It does support range queries, but not comparison between different fields in the document.
You can do this with a (groovy) script query, like this:
{
  "query" : {
    "term" : {
      "document_type" : "test"
    }
  },
  "filter" : {
    "script" : {
      "script" : "doc['field1'].value > doc['field2'].value"
    }
  }
}
See also the documentation on what is available in the Elasticsearch scripting module.
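On newer Elasticsearch versions the same idea is usually written as a script clause inside a bool filter. The Python sketch below only builds that request body. It assumes a version where scripts are given as an object with source and lang keys and where Painless is the scripting language (the answer above targets Groovy on an older release), so check it against the version you run. build_field_comparison is a hypothetical helper name.

```python
import json

def build_field_comparison(doc_type, left, right):
    """Build a search body matching documents where doc[left] > doc[right]."""
    # Same comparison expression as in the answer, with the field names filled in.
    script = f"doc['{left}'].value > doc['{right}'].value"
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"document_type": doc_type}},
                    # Assumed script syntax for newer versions (source + lang).
                    {"script": {"script": {"source": script, "lang": "painless"}}},
                ]
            }
        }
    }

body = build_field_comparison("test", "field1", "field2")
print(json.dumps(body, indent=2))
```

Both conditions run in filter context here, so no scoring work is done for a query that only needs a yes/no answer per document.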

Count in mongoDB

I have something like this for every user in mongoDB:
{
  "id" : 1234,
  "name" : "Mr. Someone",
  "userdata" : {
    "living" : {
      "city" : "Somecity",
      "address" : "Main Street 10.",
      "zip" : "1023"
    },
    "interest" : "Cars"
  }
}
I'm trying to find a way to count how many subscribers live in Somecity.
My best guess was the following:
db.users.count({userdata:{living:{city:"Somecity"}}})
But the result was 0.
How can I properly count "rows" by a given value in mongoDB?
I'm using mongoDB's documentation (for example: http://docs.mongodb.org/manual/reference/sql-comparison/) but could not resolve my problem yet.
I'm using MongoDB through the shell.
I think I have found the solution to my problem:
db.users.count({"userdata.living.city":"Somecity"})
This dot notation matches on a single field inside the embedded document, while my first attempt required the whole embedded document to match exactly.
Further reading: http://docs.mongodb.org/manual/reference/operator/query/elemMatch/
Quote:
Since the $elemMatch only specifies a single condition, the $elemMatch
expression is not necessary, and instead you can use the following
query:
db.survey.find( { "results.product": "xyz" } )
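To make the difference between the two queries concrete, here is a small pure-Python emulation. It is not how MongoDB is implemented; the helpers get_path, count_exact and count_dotted and the sample documents are invented for this sketch. It also simplifies one detail: MongoDB's exact match on an embedded document is sensitive to field order, which Python dict equality is not.

```python
def get_path(doc, dotted):
    """Follow a dotted path through nested dicts; return None if a step is missing."""
    current = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

def count_exact(docs, field, subdoc):
    """Count documents whose `field` equals `subdoc` as a whole (first attempt)."""
    return sum(1 for d in docs if d.get(field) == subdoc)

def count_dotted(docs, dotted, value):
    """Count documents whose nested field at `dotted` equals `value` (dot notation)."""
    return sum(1 for d in docs if get_path(d, dotted) == value)

# Hypothetical sample collection.
users = [
    {"id": 1234, "userdata": {"living": {"city": "Somecity", "zip": "1023"}, "interest": "Cars"}},
    {"id": 1235, "userdata": {"living": {"city": "Somecity", "zip": "1024"}, "interest": "Boats"}},
    {"id": 1236, "userdata": {"living": {"city": "Othercity", "zip": "2000"}, "interest": "Cars"}},
]

print(count_exact(users, "userdata", {"living": {"city": "Somecity"}}))
print(count_dotted(users, "userdata.living.city", "Somecity"))
```

The exact comparison finds nothing because no userdata value consists of only living.city, while the dotted lookup counts both users living in Somecity.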

How do I return filtering meta data in a REST API search query

I'm currently designing and implementing a RESTful API in PHP.
The API allows users to search for hotels.
A simplified example of the search request is:
GET hotels/searchresults?location=<location> #collection of hotels within location
The response also contains some meta information about the returned collection.
The basic structure of the response is:
"meta": {
  "totalNrOfHotels": 100,
  "totalNrAvailable": 80
},
"hotels": [
  {
    "id": 123,
    "name": "Hotel A"
  },
  {
    "id": 135,
    "name": "Hotel B"
  },
  ...
]
This resource also supports pagination:
GET hotels/searchresults?location=<location>&offset=0&limit=20
Now, there are a few filters that can be applied to the search results, e.g. stars, rating score.
For example, if I want just 2 star hotels, I can query:
GET hotels/searchresults?location=<location>&offset=0&limit=20&stars=2
Now, in the user interface for filtering, it is common to display the number of options available per filter setting.
In my opinion, these numbers can be seen as meta data about the search query. So, we could add an extra field to the meta in the response:
"meta": {
  "totalNrOfHotels": 100,
  "totalNrAvailable": 80,
  "filterNrs": {
    "stars": {
      "1": 1,
      "2": 9,
      "3": 39,
      "4": 12,
      "5": 11,
      "none": 9
    }
  }
},
"hotels": [
  {
    "id": 123,
    "name": "Hotel A"
  },
  {
    "id": 135,
    "name": "Hotel B"
  },
  ...
]
So, I have two questions:
Should this "filterNrs" property sit in the meta section, as proposed above? To me, it doesn't make sense for it to be a separate resource/request.
How can we deal with the fact that this can slow down the query? I'd prefer to make the "filterNrs" field optional. We are thinking of using a "metaFields" parameter to let the user specify which fields in the meta she would like to receive. We already support this for the hotels returned, with a "fields" parameter (similar to https://developers.google.com/youtube/2.0/developers_guide_protocol_partial). Alternatively, we could put the filterNrs field (or the full meta info) in a separate resource, something like hotels/searchresults/meta. From a developer's perspective, would you prefer to have this split into multiple resources, or to have a single resource with the option to show full or partial meta information?
Does the number of hotels per star rating vary with the query? For example, do I get different "filterNrs" for the queries below?
GET hotels/searchresults?location=1
GET hotels/searchresults?location=2
I would expect such filters to be contextual, so different locations would return different numbers per star count, which indicates this is some form of contextual information related to the query.
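To illustrate the contextual case, here is a rough Python sketch that computes the star counts from the hotels matched by one query and only includes them when requested. The function name build_meta, the meta_fields argument (standing in for the proposed "metaFields" parameter), the "stars" and "available" keys, and the sample hotels are all assumptions made for this example.

```python
from collections import Counter

def build_meta(hotels, meta_fields=None):
    """Build the `meta` section; include `filterNrs` only when requested."""
    meta = {
        "totalNrOfHotels": len(hotels),
        "totalNrAvailable": sum(1 for h in hotels if h.get("available")),
    }
    if meta_fields and "filterNrs" in meta_fields:
        # Count hotels per star rating; unrated hotels go under "none".
        stars = Counter(str(h["stars"]) if h.get("stars") else "none" for h in hotels)
        meta["filterNrs"] = {"stars": dict(stars)}
    return meta

# Hypothetical hotels matched for one location.
hotels_in_location = [
    {"id": 123, "name": "Hotel A", "stars": 2, "available": True},
    {"id": 135, "name": "Hotel B", "stars": 3, "available": False},
    {"id": 140, "name": "Hotel C", "stars": None, "available": True},
    {"id": 141, "name": "Hotel D", "stars": 3, "available": True},
]

print(build_meta(hotels_in_location))
print(build_meta(hotels_in_location, meta_fields=["filterNrs"]))
```

Since the counts are derived from the matched hotels, a different location yields different numbers, and clients that do not ask for filterNrs do not pay for the extra counting.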
Otherwise, if the numbers are global, that suggests they belong in a separate resource. In that case, you can use links to expose the numbers and other details:
"meta": {
  "totalNrOfHotels": 100,
  "totalNrAvailable": 80,
  "filterNrs": {
    "stars": {
      "options" : ["1", "2", "3", "4", "5", "none"],
      "details" : "http://example.com/stars"
    }
  }
},