Field being ignored when querying - lucene

I have the following document
{
"authors" : "Nanna Friis",
"authorsId" : [ "4642" ],
"description" : "Med denne praktiske og pædagogiske håndbog kommer du hele vejen rundt om at skrive godt til nettet. Du bliver taget ved hånden og får en grundig gennemgang af de helt særlige præmisser, der hersker på nettet. ",
"iSBN" : "9788762904118",
"mediaType" : "10",
"name" : "Kort, klart og klikbart",
"nameSort" : "Kort, klart og klikbart",
"price" : 250.0,
"productId" : "9788762904118",
"publicationAreaCode" : "3077",
"tags" : [ ],
"titleId" : "25004"
}
When I run a query like http://localhost:9200/titles/_search?q=Nanna* I don't get any results. If I instead query on, e.g., the productId with http://localhost:9200/titles/_search?q=9788762904118, I get the document in question.
What is going on?

You did not specify a field in the query, so the search runs against the default search field. As the documentation puts it:
When not explicitly specifying the field to search on in
the query string syntax, the index.query.default_field will be used to
derive which field to search on. It defaults to _all field.
So, if _all field is disabled, it might make sense to change it to set
a different default field.
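As a minimal sketch of the alternative, the field can be named explicitly in the query string (field:term), so the search no longer depends on the default field. The helper name below is hypothetical; the index and field names are taken from the question.

```python
from urllib.parse import urlencode


def build_field_query_url(base_url, index, field, term):
    """Build a URI search URL that targets one field explicitly."""
    # "field:term" is the query string syntax for searching a specific field.
    # ':' and '*' are left unescaped here only to keep the URL readable.
    query_string = urlencode({"q": f"{field}:{term}"}, safe=":*")
    return f"{base_url}/{index}/_search?{query_string}"


print(build_field_query_url("http://localhost:9200", "titles", "authors", "Nanna*"))
```

Whether the wildcard term then matches still depends on how the authors field is analyzed.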

Related

ElasticSearch Query results in error "input_mismatch_exception" when executing LOWER()

I am working on search functionality and I need to execute a simple query that checks if there is anything matching the search string converted to lowercase. In simpler terms, user searches "SiteName", and I query if there is anything matching "sitename".
However, I get an error when I use LOWER() function in the query.
This is what I tried:
POST /_sql?format=json
{
"query":"SELECT siteid, sitename FROM zones WHERE sitename LIKE LOWER('SiteFirst') ", "fetch_size" : 90
}
and I get this error:
{
"error" : {
"root_cause" : [
{
"type" : "parsing_exception",
"reason" : "line 1:70: mismatched input 'LOWER' expecting {'?', STRING}"
}
],
"type" : "parsing_exception",
"reason" : "line 1:70: mismatched input 'LOWER' expecting {'?', STRING}",
"caused_by" : {
"type" : "input_mismatch_exception",
"reason" : null
}
},
"status" : 400
}
This same query works without LOWER().
Any suggestions about how to fix this error?
Thanks!
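I'm pretty sure LOWER is called LCASE in ES SQL. Note, though, that the error message suggests the parser accepts only a string literal or a ? parameter after LIKE, so a function call in that position may be rejected whatever its name.
More importantly, LIKE works only on exact fields, and it's recommended to use MATCH instead of LIKE.
So try this: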
POST /_sql?format=json
{
"query": "SELECT siteid, sitename FROM zones WHERE MATCH(sitename, 'SiteFirst')",
"fetch_size": 90
}
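If the search string comes from user input, the request body can be assembled in application code. The sketch below is only an illustration: the helper name is hypothetical, and it assumes that a single quote inside an ES SQL string literal is escaped by doubling it.

```python
import json


def build_match_sql_body(search_text, fetch_size=90):
    """Build the body for POST /_sql?format=json using MATCH."""
    # Assumption: single quotes in an ES SQL string literal are escaped by doubling.
    escaped = search_text.replace("'", "''")
    query = (
        "SELECT siteid, sitename FROM zones "
        f"WHERE MATCH(sitename, '{escaped}')"
    )
    return {"query": query, "fetch_size": fetch_size}


print(json.dumps(build_match_sql_body("SiteFirst"), indent=2))
```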

How do I query an array in MongoDB?

I have been trying multiple queries but still can't figure it out. I have multiple documents that look like this:
{
"_id" : ObjectId("5b51f519a33e7f54161a0efb"),
"assigneesEmail" : [
"felipe#gmail.com"
],
"organizationId" : "5b4e0de37accb41f3ac33c00",
"organizationName" : "PaidUp Volleyball Club",
"type" : "athlete",
"firstName" : "Mylo",
"lastName" : "Fernandes",
"description" : "",
"status" : "active",
"createOn" : ISODate("2018-07-20T14:43:37.610Z"),
"updateOn" : ISODate("2018-07-20T14:43:37.610Z"),
"__v" : 0
}
I need help writing a query that finds this document by looking up an email, or any part of one, in the elements of the assigneesEmail array. Any suggestions? I have tried $elemMatch but still could not get it to work.
Looks like my query was just incorrect. I figured it out.
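The final query was not posted, so the following is not the author's query, only a sketch of two filters that are commonly used for this kind of lookup. In MongoDB, an equality filter on an array field matches documents whose array contains that value, and $regex applied to an array field is tested against each element. The function names are hypothetical.

```python
import re


def exact_email_filter(email):
    """Match documents whose assigneesEmail array contains this exact value."""
    return {"assigneesEmail": email}


def partial_email_filter(fragment):
    """Match documents where any array element contains the fragment."""
    # re.escape keeps characters such as '.' from acting as regex operators.
    return {"assigneesEmail": {"$regex": re.escape(fragment), "$options": "i"}}


print(exact_email_filter("felipe#gmail.com"))
print(partial_email_filter("felipe"))
```

With a driver such as pymongo, either dictionary would be passed to find() on the collection; $elemMatch is not needed for an array of plain strings.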

On inserting province/state on orders using Shopify API

I can't seem to insert the state name or code on API orders in Shopify.
This is what I send when creating an order through the API:
"shipping_address" : {
"first_name" : "Ajo",
"last_name" : "Fod",
"address1" : "90 selwyn rd",
"address2" : "",
"city" : "Braintree ",
"province" : "Ma",
"zip" : "02184",
"country" : "Usa",
"province_code" : "MA"
},
I get this in the order dashboard on Shopify:
Why is the state code missing?
How do I get the state code back in there?
The countries and provinces are validated. Instead of Usa try using US or another ISO 3166-1 alpha-2 country code.
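A minimal sketch of that change, assuming the payload is built in application code: only the country value is replaced, and everything else is copied from the question. The helper name is hypothetical, and the check is only a shape check, not a lookup against the ISO list.

```python
def with_iso_country(address, iso_alpha2):
    """Return a copy of the address with a two-letter country code."""
    if len(iso_alpha2) != 2 or not iso_alpha2.isalpha():
        raise ValueError("expected a two-letter ISO 3166-1 alpha-2 code")
    fixed = dict(address)  # leave the caller's dictionary untouched
    fixed["country"] = iso_alpha2.upper()
    return fixed


shipping_address = {
    "first_name": "Ajo",
    "last_name": "Fod",
    "address1": "90 selwyn rd",
    "address2": "",
    "city": "Braintree ",
    "province": "Ma",
    "zip": "02184",
    "country": "Usa",
    "province_code": "MA",
}

print(with_iso_country(shipping_address, "US"))
```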

Search multiple fields with "and" operator (but use fields' own analyzers)

ElasticSearch Version: 0.90.2
Here's the problem: I want to find documents in the index so that they:
match all query tokens across multiple fields
use each field's own analyzer
So if there are 4 documents:
{ "_id" : 1, "name" : "Joe Doe", "mark" : "1", "message" : "Message First" }
{ "_id" : 2, "name" : "Ann", "mark" : "3", "message" : "Yesterday Joe Doe got 1 for the message First"}
{ "_id" : 3, "name" : "Joe Doe", "mark" : "2", "message" : "Message Second" }
{ "_id" : 4, "name" : "Dan Spencer", "mark" : "2", "message" : "Message Third" }
And the query is "Joe First 1" it should find ids 1 and 2. I.e., it should find documents which contain all the tokens from search query, no matter in which fields they are (maybe all tokens are in one field, or maybe each token is in its own field).
One solution would be to use elasticsearch "_all" field functionality: that way it will merge all the fields I need (name, mark, message) into one and I'll be able to query it with something like
"match": {
"_all": {
"query": "Joe First 1",
"operator": "and"
}
}
But this way I can specify an analyzer for the "_all" field only, and I need the "name" and "message" fields to have different sets of tokenizers/token filters (let's say name will have a phonetic analyzer and message will have some stemming token filter).
Is there a way to do this?
Thanks to the guys at the elasticsearch group, here's the solution... pretty simple, needless to say :)
All I needed to do was use the query_string query (http://www.elasticsearch.org/guide/reference/query-dsl/query-string-query/) with default_operator = AND, and it does the trick:
{
"query": {
"query_string": {
"fields": [
"name",
"mark",
"message"
],
"query": "Joe First 1",
"default_operator": "AND"
}
}
}
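To check expectations against the four sample documents, the matching rule from the question (every query token must appear in at least one of the listed fields) can be sketched in plain Python. This is not a description of how query_string evaluates the query internally, and a single lowercase whitespace tokenizer stands in for the per-field analyzers.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return set(text.lower().split())


def matches_all_tokens(doc, query, fields):
    """True if every query token occurs in at least one of the given fields."""
    doc_tokens = set()
    for field in fields:
        doc_tokens |= tokens(str(doc[field]))
    return tokens(query) <= doc_tokens


docs = [
    {"_id": 1, "name": "Joe Doe", "mark": "1", "message": "Message First"},
    {"_id": 2, "name": "Ann", "mark": "3",
     "message": "Yesterday Joe Doe got 1 for the message First"},
    {"_id": 3, "name": "Joe Doe", "mark": "2", "message": "Message Second"},
    {"_id": 4, "name": "Dan Spencer", "mark": "2", "message": "Message Third"},
]

fields = ["name", "mark", "message"]
matching_ids = [d["_id"] for d in docs if matches_all_tokens(d, "Joe First 1", fields)]
print(matching_ids)  # [1, 2]
```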
I think using a multi match query makes sense here. Something like:
"multi_match": {
"query": "Joe First 1",
"operator": "and",
"fields": [ "name", "message", "mark"]
}
As you say, you can set the analyzer (or search_analyzer/index_analyzer) to be used on the _all field. It seems to me that should indeed be your first step to achieve the query results you're looking for.
From http://jontai.me/blog/2012/10/lucene-scoring-and-elasticsearch-_all-field/, we have this tasty quote:
... the _all field copies the text from the other fields and analyzes
them again; it doesn’t copy the pre-analyzed tokens. You can set a
separate analyzer for the _all field.
Which I interpret to mean that you should set your _all analyzer(s) as well as individual field analyzer(s). The _all field won't re-analyze the individual field data; it will grab the original field contents.

elasticsearch: how to index terms which are stopwords only?

I had much success building my own little search with elasticsearch in the background. But there is one thing I couldn't find in the documentation.
I'm indexing the names of musicians and bands. There is one band called "The The" and due to the stop words list this band is never indexed.
I know I can ignore the stop words list completely but this is not what I want since the results searching for other bands like "the who" would explode.
So, is it possible to keep "The The" in the index without disabling stop words entirely?
You can use the synonym filter to convert The The into a single token, e.g. thethe, which won't be removed by the stopwords filter.
First, configure the analyzer:
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
"settings" : {
"analysis" : {
"filter" : {
"syn" : {
"synonyms" : [
"the the => thethe"
],
"type" : "synonym"
}
},
"analyzer" : {
"syn" : {
"filter" : [
"lowercase",
"syn",
"stop"
],
"type" : "custom",
"tokenizer" : "standard"
}
}
}
}
}
'
Then test it with the string "The The The Who".
curl -XGET 'http://127.0.0.1:9200/test/_analyze?pretty=1&text=The+The+The+Who&analyzer=syn'
{
"tokens" : [
{
"end_offset" : 7,
"position" : 1,
"start_offset" : 0,
"type" : "SYNONYM",
"token" : "thethe"
},
{
"end_offset" : 15,
"position" : 3,
"start_offset" : 12,
"type" : "<ALPHANUM>",
"token" : "who"
}
]
}
"The The" has been tokenized as "thethe", and "The Who" as "who" because the preceding "the" was removed by the stopwords filter.
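The order of the filters matters: the synonym filter has to run before the stop filter, otherwise both "the" tokens are gone before they can be merged. A simplified plain-Python emulation of the lowercase, synonym, stop chain illustrates this; it is not how Lucene implements these filters, and the one-word stop list is an assumption standing in for the default list.

```python
STOPWORDS = {"the"}  # assumption: tiny stand-in for the default stop list


def analyze(text):
    """Emulate lowercase -> synonym ("the the => thethe") -> stop."""
    tokens = text.lower().split()
    merged = []
    i = 0
    while i < len(tokens):
        # Synonym step: collapse an adjacent "the the" pair into one token.
        if tokens[i:i + 2] == ["the", "the"]:
            merged.append("thethe")
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    # Stop step: runs last, so "thethe" survives while a lone "the" is dropped.
    return [t for t in merged if t not in STOPWORDS]


print(analyze("The The The Who"))  # ['thethe', 'who']
```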
To stop or not to stop
Which brings us back to whether we should include stopwords or not? You said:
I know I can ignore the stop words list completely
but this is not what I want since the results searching
for other bands like "the who" would explode.
What do you mean by that? Explode how? Index size? Performance?
Stopwords were originally introduced to improve search engine performance by removing common words which are likely to have little effect on the relevance of a query. However, we've come a long way since then. Our servers are capable of much more than they were back in the 80s.
Indexing stopwords won't have a huge impact on index size. For instance, to index the word the means adding a single term to the index. You already have thousands of terms - indexing the stopwords as well won't make much difference to size or to performance.
Actually, the bigger problem is that the is very common and thus will have a low impact on relevance, so a search for "The The concert Madrid" will prefer Madrid over the other terms.
This can be mitigated by using a shingle filter, which would result in these tokens:
['the the','the concert','concert madrid']
While the may be common, the the isn't and so will rank higher.
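As a rough illustration of what the two-word shingles look like, here is a plain-Python sketch that pairs each token with its successor. It only produces the bigrams listed above; the actual shingle filter has further options (for example, whether single tokens are emitted as well) that are not modelled here.

```python
def bigram_shingles(text):
    """Return adjacent two-word shingles for a whitespace-tokenized string."""
    tokens = text.lower().split()
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


print(bigram_shingles("the the concert madrid"))
# ['the the', 'the concert', 'concert madrid']
```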
You wouldn't query the shingled field by itself, but you could combine a query against a field tokenized by the standard analyzer (with stopword removal disabled) with a query against the shingled field.
We can use a multi-field to analyze the text field in two different ways:
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
"mappings" : {
"test" : {
"properties" : {
"text" : {
"fields" : {
"shingle" : {
"type" : "string",
"analyzer" : "shingle"
},
"text" : {
"type" : "string",
"analyzer" : "no_stop"
}
},
"type" : "multi_field"
}
}
}
},
"settings" : {
"analysis" : {
"analyzer" : {
"no_stop" : {
"stopwords" : "",
"type" : "standard"
},
"shingle" : {
"filter" : [
"standard",
"lowercase",
"shingle"
],
"type" : "custom",
"tokenizer" : "standard"
}
}
}
}
}
'
Then use a multi_match query to query both versions of the field, giving the shingled version more "boost"/relevance. In this example the text.shingle^2 means that we want to boost that field by 2:
curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1' -d '
{
"query" : {
"multi_match" : {
"fields" : [
"text",
"text.shingle^2"
],
"query" : "the the concert madrid"
}
}
}
'