How to search for an element in literal array or integer array in cloud search - amazon-cloudsearch

I have an cloud search domain where I have index on a column named "color_f_la". Its a faceted index and is a literal array. Sample value for it is :
[
"Blue",
"Green"
]
I've been trying to find out the documentation to construct a query which would search for a particular Color, but to no avail. Is it even possible?

If you're running the search with the "Structured" Query Parser, you should be able to pass your value directly as if it were a literal. There is no difference for querying literal-array fields.
For example:
(and color_f_la:'Blue')
Or if you want to return things that have blue or green:
(or color_f_la:'Blue' color_f_la:'Green')
Or if you want to return things that have both blue and green only:
(and color_f_la:'Blue' color_f_la:'Green')

Related

get each number in String and Compare in TCL/tk

I have string output:
1 4 2 1 4
I want to get each character in string to compare.
I did it to want to know whether the list is sorted yet.
It's not exactly clear to me what you are trying to achieve. Going by "to know whether the list is sorted", and assuming a list of integers, you can use tcl::mathop::< or tcl::mathop::<=, depending on whether you want to allow duplicate values:
if {[tcl::mathop::<= {*}$list]} {
puts "List is sorted"
} else {
puts "List is mixed up"
}
This will also work for ASCII comparison of strings. For more complex comparisons, like using dictionary rules or case insensitive, it's probably easiest to combine that with lsort along with the -indices option:
tcl::mathop::< {*}[lsort -indices -dictionary $list]
The -indices option returns the original index of each list element in sorted order. By checking if those indices are in incremental order, you know if the original list was already sorted.
Of course, if the point of the exercise was to avoid unnecessary sorting, then this is no use. But then again, bubble sort of an already sorted list is very fast and will basically do exactly the comparisons you described. So just sorting will probably be faster than first checking for a sorted list via a scripted loop.
To get each character in the string, do split $the_string "" (yes, on the empty string). That gives you a list of all the characters in the string; you can use foreach to iterate over them. Remember, you can iterate over two (or more) lists at once:
foreach c1 [split $the_string ""] c2 $target_comparison_list {
if {$c1 ne $c2} {
puts "The first not equal character is “$c1” when “$c2” was expected"
break
}
}
Note that it's rarely useful to continue comparison after a difference is found as the most common differences are (relative to the target string) insertions and deletions; almost everything after either of those will differ.

How to get the equivalent of combinig [contains] and [in] operators in the same query?

So I have a field that's a multi-choice on the Directus back end so when the JSON comes out of the API it's a one-dimensional array, like so:
"field_name": [
"",
"option 6",
"option 11",
""
]
(btw I have no idea why all these fields produce those blank values, but that's a matter for another day)
I am trying to make an interface on the front end where you can select one or more of these values and the result will come back if ANY of them are found for that record. Think of it like a tag list, if the item has just one of the values it should be returned.
I can use the [contains] operator to find if it has one of the values I'm looking for, but I can only pass a single value, whereas I need all that have either optionX OR optionY OR optionZ. I would basically need a combination of [contains] and [in] to achieve what I'm trying to do. Is there a way to achieve this?
I've also tried setting the [logical] operator to OR, but then that screws up the other filters that need to be included as AND (or I'm doing something wrong). Not to mention the query gets completely unruly.
Help?

Lucene NOT_ANALYZED not working with uppercase characters

I have build an index using a StandardAnalyzer, in this index are a few fields. For example purposes, imagine it has Id and Type. Both are NON_ANALYZED, meaning you can only search for them as-is.
There are a few entries in my index:
{Id: "1", Type: "Location"},
{Id: "2", Type: "Group"},
{Id: "3", Type: "Location"}
When I search for +Id:1 or any other number, I get the appropriate result (again using StandardAnalyzer).
However, when I search for +Type:Location or the +Type:Group, I'm not getting any results. The strange thing is that when I enable leading wildcards, that +Type:*ocation does return results! +Type:*Location or other combinations do not.
This got me leading to believe the indexer/query doesn't like uppercase characters! After lowercasing the Type to location and group before indexing them, I could search for them as such.
If I turn the Type-field to ANALYZED, it works with pretty much any search (uppercase/lowercase, etc), but I want to query for the Type-field as-is.
I'm completely baffled why it's doing this. Could anyone explain to me why my indexer doesn't let me search for NON_ANALYZED fields that have a capital in their value?
Are you using StandardAnalyzer when parsing your your query string (+Type:Location)? The StandardAnalyzer will lower-case all terms, so you're really searching with +Type:location.
Always use the same analyzer when searching and indexing. Look into using the PerFieldAnalyzer and set the Type field to use the KeywordAnalyzer.

Cloudant - Lucene range search using numbers stored as text

I have a number of documents in Cloudant, that have ID field of type string. ID can be a simple string, like "aaa", "bbb" or number stored as text, e.g. "111", "222", etc. I need to be able to full text search using the above field, but I encountered some problems.
Assuming that I have two documents, having ID="aaa" and ID="111", then searching with query:
ID:aaa
ID:"aaa"
ID:[aaa TO zzz]
ID:["aaa" TO "zzz"]
returns first document, as expected
ID:111
returns nothing, but
ID:"111"
returns second document, so at least there is a way to retrieve it.
Unfortunately, when searching for range:
ID:[111 TO 999]
ID:["111" TO "999"]
I get no results, and I have no idea what to do to get around this problem. Is there any special syntax for such case?
UPDATE:
Index function:
function(doc){
if(!doc.ID) return;
index("ID", doc.ID, { index:'not_analyzed_no_norms', store:true });
}
Changing index to analyzed doesn't help. Analyzer itself is keyword, but changing to standard doesn't help either.
UPDATE 2
Just to add some more context, because I think I missed one key point. The field I'm indexing will be searched using ranges, and both min and max values can be provided by user. So it is possible that one of them will be number stored as a string, while other will be a standard non-numeric text. For example search all document where ID >= "11" and ID <= "foo".
Assumig that database contains documents with ID "1", "5", "alpha", "beta", "gamma", this query should return "5", "alpha", "beta". Please note that "5" should actually be returned, because string "5" is greater than string "11".
Our team just came to a workaround solution. We managed to get proper results by adding some arbitrary character, e.g. 'a' to an upper range value, and by introducing additional search term, to exclude documents having ID between upper range value and upper range value + 'a'.
When searching for a range
ID:[X TO Y]
actual query would be
(ID:[X TO Ya] AND -ID:{Y TO Ya])
For example, to find a documents having ID between 23 and 758, we execute
(ID:[23 TO 758a] AND -ID:{758 TO 758a]).
First of all, I would suggest to use keyword analyzer, so you can control the right tokenization during both indexing and search.
"analyzer": "keyword",
"index": "function(doc){\n if(!doc.ID) return;\n index(\"ID\", doc.ID, {store:true });\n}
To retrieve you document with _id "111", use the following range query:
curl -X GET "http://.../facetrangetest/_design/ddoc/_search/f?q=ID:\[111%20TO%A\]"
If you use a query q=ID:\[111%20TO%20999\], Cloudant search seeing numbers on both size of the range, will interpret it as NumericRangeQuery; and since your ID of "111" is a String, it will not be part of the results returned. Including a string into query [111%20TO%20A], will make Cloudant interpret it as a range query on strings.
You can get both docs returned like this:
q=ID:["111" TO "CCC"]
Here's a working live example:
https://rajsingh.cloudant.com/facetrangetest/_design/ddoc/_search/f?q=ID:[%22111%22%20TO%20%22CCC%22]
I found something quirky. It seems that range queries on strings only work if at least one of the range values is a string. Querying on ID:["111" TO "555"] doesn't return anything either, so maybe this is resolving to a numeric query somehow? Could be a bug.
This could also be achieved using regular expressions in queries. Something line this:
curl -X POST "https://.../facetrangetest/_design/ddoc/_search/f" -d '{"q":"ID:/<23-758>/"}' | jq .
This regular expressions means to retrieve all documents with ID field from 23 to 758. Slashes: / / are used to enclose a regular expression; the interval is enclosed inside <>.

Elasticsearch: match every position only once

In my Elasticsearch index I have documents that have multiple tokens at the same position.
I want to get a document back when I match at least one token at every position.
The order of the tokens is not important.
How can I accomplish that? I use Elasticsearch 0.90.5.
Example:
I index a document like this.
{
"field":"red car"
}
I use a synonym token filter that adds synonyms at the same positions as the original token.
So now in the field, there are 2 positions:
Position 1: "red"
Position 2: "car", "automobile"
My solution for now:
To be able to ensure that all positions match, I index the maximum position as well.
{
"field":"red car",
"max_position": 2
}
I have a custom similarity that extends from DefaultSimilarity and returns 1 tf(), idf() and lengthNorm(). The resulting score is the number of matching terms in the field.
Query:
{
"custom_score": {
"query": {
"match": {
"field": "a car is an automobile"
}
},
"_script": "_score*100/doc[\"max_position\"]+_score"
},
"min_score":"100"
}
Problem with my solution:
The above search should not match the document, because there is no token "red" in the query string. But it matches, because Elasticsearch counts the matches for car and automobile as two matches and that gives a score of 2 which leads to a script score of 102, which satisfies the "min_score".
If you needed to guarantee 100% matches against the query terms you could use minimum_should_match. This is the more common case.
Unfortunately, in your case, you wish to provide 100% matches of the indexed terms. To do this, you'll have to drop down to the Lucene level and write a custom (java - here's boilerplate you can fork) Similarity class, because you need access to low-level index information that is not exposed to the Query DSL:
Per document/field scanned in the query scorer:
Number of analyzed terms matched (overlap is the Lucene terminology, it is used the the coord() method of the DefaultSimilarity class)
Number of total analyzed terms in the field: Look at this thread for a couple different ways to get this information: How to count the number of terms for each document in lucene index?
Then your custom similarity (you can probably even extend DefaultSimilarity) will need to detect queries where terms matched < total terms and multiply their score by zero.
Since query and index-time analysis have already happened at this level of scoring, the total number of indexed terms will already be expanded to include synonyms, as should the query terms, avoiding the false-positive "a car is an automobile" issue above.