ElasticSearch for Attribute(Key) value data set - e-commerce

I am using Elasticsearch with Haystacksearch and Django and want to search the follow structure:
{
{
"title": "book1",
"category" : ["Cat_1", "Cat_2"],
"key_values" :
[
{
"key_name" : "key_1",
"value" : "sample_value_1"
},
{
"key_name" : "key_2",
"value" : "sample_value_12"
}
]
},
{
"title": "book2",
"category" : ["Cat_3", "Cat_2"],
"key_values" :
[
{
"key_name" : "key_1",
"value" : "sample_value_1"
},
{
"key_name" : "key_3",
"value" : "sample_value_6"
},
{
"key_name" : "key_4",
"value" : "sample_value_5"
}
]
}
}
Right now I have set up an index model using Haystack with a "text" that put all the data together and runs a full text search! In my opinion this is not the a well established search 'cause I am not using my data set structure and hence this is some kind odd.
As an example if for an object I have a key-value
{
"key_name": "key_1",
"value": "sample_value_1"
}
and for another object I have
{
"key_name": "key_2",
"value": "sample_value_1"
}
and we it gets a query like "Key_1 sample_value_1" comes I get a thoroughly mixed result of objects who have these words in their fields rather than using their structures.
P.S. I am totally new to ElasticSearch and better to say new to the search technologies and challenges. I have searched the web and SO button didn't find anything satisfying. Please let me know if there is something wrong with my thoughts and expectations from these search engines and if there is SO duplicate question! And also if there is a better approach to design a database for this kind of search

Read the es docs on nested mappings and do something like this:
"book_type" : {
"properties" : {
// title, cat mappings
"key_values" : {
"type" : "nested"
"properties": {
"key_name": {
"type": "string", "index": "not_analyzed"
},
"value": {
"type": "string"
}
}
}
}
}
Then query using a nested query
"nested" : {
"path" : "key_values",
"query" : {
"bool" : {
"must" : [
{
"term" : {"key_values.key_name" : "key_1"}
},
{
"match" : {"key_values.value" : "sample_value_1"}
}
]
}
}
}

Related

Querying data from Elasticsearch

Using Elasticsearch 7.*, trying to execute SQL query on an index 'com-prod':
GET /com-prod/_search
{
"script_fields": {
"test1": {
"script": {
"lang": "painless",
"source": "params._source.ElapsedTime"
}
}
}
}
It gives the output and below as one of the hit successfully:
"hits" : [
{
"_index" : "com-prod",
"_type" : "_doc",
"_id" : "abcd",
"_score" : 1.0,
"fields" : {
"test1" : [
"29958"
]
}
}
Now, I am trying to increment the ElapsedTime by 2, as below:
GET /com-prod/_search
{
"script_fields": {
"test2": {
"script": {
"lang": "painless",
"source": "params._source.ElapsedTime + 2"
}
}
}
}
But its actually adding number 2 to the output, as below:
"hits" : [
{
"_index" : "com-prod",
"_type" : "_doc",
"_id" : "abcd",
"_score" : 1.0,
"fields" : {
"test2" : [
"299582"
]
}
}
Please guide what could be wrong here, and how to get the output as 29960.
You are getting 299582, instead of 29960, because the ElapsedTime field is of string type ("29958"), so when you are adding 2 in this using script, 2 gets appended at the end (similar to concat two strings).
So, in order to solve this issue, you can :
Create a new index, with updated mapping of the ElaspsedTIme field of int type, then reindex the data. Then you can use the same search query as given in the question above.
Convert the string to an int type value, using Integer.parseInt()
GET /com-prod/_search
{
"script_fields": {
"test2": {
"script": {
"lang": "painless",
"source": "Integer.parseInt(params._source.ElapsedTime) + 2"
}
}
}
}

Json Schema required validation

I have my json schema where all values are required. For example:
....
{
"properties" : {
"minimumDelay" : {
"type" : "number"
},
"length" : {
"type" : "number"
},
},
"required": {
"minimumDelay",
"length"
}
Here the json data will be valid if I enter both minimumDelay and length values.
But my requirement is json data must be valid when I enter either 1 of the values(like XOR case). How my schema must be modified to achieve the same?
In JSON Schema, the XOR operator is oneOf.
{
"properties" : {
"minimumDelay" : {
"type" : "number"
},
"length" : {
"type" : "number"
}
},
"oneOf": [
{ "required": ["minimumDelay"] },
{ "required": ["length"] }
]
}

What is the default doc sequence of the result from an Elasticsearch filter request?

I recently run an Elasticsearch filter request that is
{
"from" : 0,
"size" : 10,
"query" : {
"filtered" : {
"filter" : {
"bool" : {
"must" : {
"terms" : {
"a_id" : [ 257793, 257798, 257844 ]
}
}
}
}
}
},
"explain" : false,
"fields" : "a_id"
}
So that I can find all docs with a_id in 257793, 257798, 257844 and the results are 257844, 257798, 257793. So far so good.
Then I find that whatever the sequence of the term numbers are, the return docs are always in the same a_id order. That is, even I run
"terms" : {
"a_id" : [257798, 257844, 257793 ]
}
The result docs are in the order of 257844, 257798, 257793 as well.
So I am so curious about the mechanism behind the Elasticsearch filtering. Can anyone help me and give me a hint?
By default, ES returns in descending order of _score. You can provide the sort option, to say in which order and based on what you want the results to be returned. For e.g., for based on date field
{
"sort": { "date": { "order": "desc" }}
"query" : {
"term" : { "user" : "kimchy" }
}
}
You can get more information:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/_sorting.html

How to know if a geo coordinate lies within a geo polygon in elasticsearch?

I am using elastic search 1.4.1 - 1.4.4. I'm trying to index a geo polygon shape (document) into my index and now when the shape is indexed i want to know if a geo coordinate lies within the boundaries of that particular indexed geo-polygon shape.
GET /city/_search
{
"query":{
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"geo_polygon" : {
"location" : {
"points" : [
[72.776491, 19.259634],
[72.955705, 19.268060],
[72.945406, 19.189611],
[72.987291, 19.169507],
[72.963945, 19.069596],
[72.914506, 18.994300],
[72.873994, 19.007933],
[72.817689, 18.896882],
[72.816316, 18.941052],
[72.816316, 19.113720],
[72.816316, 19.113720],
[72.790224, 19.192205],
[72.776491, 19.259634]
]
}
}
}
}
}
}
With above geo polygon filter i'm able get all indexed geo-coordinates lies within described polygon but i also need to know if a non-indexed geo-coordinate lies with in this geo polygon or not. My doubt is that if that is possible in the elastic search 1.4.1.
Yes, Percolator can be used to solve this problem.
As in normal use case of Elasticsearch, we index our docs into elasticsearch and then we run queries on indexed data to retrieve matched/ required documents.
But percolators works in a different way of it.
In percolators you register your queries and then you percolate your documents through registered queries and gets back the queries which matches your documents.
After going through infinite number of google results and many of blogs i wasn't able to find any thing which could explain how i can use percolators to solve this problem.
So i'm explaining this with an example so that other people facing same problem can take a hint from my problem and the solution i found. I would like if someone can improve my answer or can share a better approach of doing it.
e.g:-
First of all we need to create an index.
PUT /city/
then, we need to add a mapping for user document which consist a user's
latitude-longitude for percolating against registered queries.
PUT /city/user/_mapping
{
"user" : {
"properties" : {
"location" : {
"type" : "geo_point"
}
}
}
}
Now, we can register our geo polygon queries as percolators with id as city name or any other identifier you want to.
PUT /city/.percolator/mumbai
{
"query":{
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"geo_polygon" : {
"location" : {
"points" : [
[72.776491, 19.259634],
[72.955705, 19.268060],
[72.945406, 19.189611],
[72.987291, 19.169507],
[72.963945, 19.069596],
[72.914506, 18.994300],
[72.873994, 19.007933],
[72.817689, 18.896882],
[72.816316, 18.941052],
[72.816316, 19.113720],
[72.816316, 19.113720],
[72.790224, 19.192205],
[72.776491, 19.259634]
]
}
}
}
}
}
}
Let's register another geo polygon filter for another city
PUT /city/.percolator/delhi
{
"query":{
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"geo_polygon" : {
"location" : {
"points" : [
[76.846998, 28.865160],
[77.274092, 28.841104],
[77.282331, 28.753252],
[77.482832, 28.596619],
[77.131269, 28.395064],
[76.846998, 28.865160]
]
}
}
}
}
}
}
Now we have registered 2 queries as percolators and we can make sure by making this API call.
GET /city/.percolator/_count
Now to know if a geo point exist with any of registered cities we can percolate a user document using below query.
GET /city/user/_percolate
{
"doc": {
"location" : {
"lat" : 19.088415,
"lon" : 72.871248
}
}
}
This will return : _id as "mumbai"
{
"took": 25,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"total": 1,
"matches": [
{
"_index": "city",
"_id": "mumbai"
}
]
}
trying another query with different lat-lon
GET /city/user/_percolate
{
"doc": {
"location" : {
"lat" : 28.539933,
"lon" : 77.331770
}
}
}
This will return : _id as "delhi"
{
"took": 25,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"total": 1,
"matches": [
{
"_index": "city",
"_id": "delhi"
}
]
}
Let's run another query with random lat-lon
GET /city/user/_percolate
{
"doc": {
"location" : {
"lat" : 18.539933,
"lon" : 45.331770
}
}
}
and this query will return no matched results.
{
"took": 5,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"total": 0,
"matches": []
}

Can Elasticsearch make suggestions for mapping?

Playing around with Elasticsearch I added a document to my index called "pets", that looks like this:
{
"name" : "Piper",
"type" : "dog"
}
Then I added a second document:
{
"name" : "Max",
"type" : "dog",
"breed": "Scottish Terrier"
}
Now, I understand that the mapping of my "pets" index is initially created based on my first document ( unless i define a mapping at some point ). However, I am curious to know if ES can suggest a mapping based on the existing data ( like MySQL's "Propose table structure" ) or maybe update the mapping automatically.
Yes, ElasticSearch will automatically update the mapping.
Sometimes the language in the ElasticSearch documentation makes it sound like once the mapping is set, it cannot be changed. This is only true for the existing fields. Any additional fields will be automatically assigned a type and added to the mapping.
Remember you can always check the mapping of an index with the get mapping API:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-get-mapping.html
For example, with the example you have above, after your first "pet" document the mapping is:
{
"my_index": {
"mappings": {
"pet": {
"properties": {
"name": {
"type": "string"
},
"type": {
"type": "string"
}
}
}
}
}
}
And after the second "pet" document, your mapping is:
{
"my_index": {
"mappings": {
"pet": {
"properties": {
"breed": {
"type": "string"
},
"name": {
"type": "string"
},
"type": {
"type": "string"
}
}
}
}
}
}
I'm not familiar with MySQL's propose table structure, so I can't comment on that...