Elasticsearch query context vs filter context - apache

I am little bit confused with ElasticSearch Query DSL's query context and filter context. I have 2 below queries. Both queries return same result, first one evaluate score and second one does not. Which one is more appropriate ?
1st Query :-
curl -XGET 'localhost:9200/xxx/yyy/_search?pretty' -d'
{
"query": {
"bool": {
"must": {
"terms": { "mcc" : ["5045","5499"]}
},
"must_not":{
"term":{"maximum_flag":false}
},
"filter": {
"geo_distance": {
"distance": "500",
"location": "40.959334, 29.082142"
}
}
}
}
}'
2nd Query :-
curl -XGET 'localhost:9200/xxx/yyy/_search?pretty' -d'
{
"query": {
"bool" : {
"filter": [
{"term":{"maximum_flag":true}},
{"terms": { "mcc" : ["5045","5499"]}}
],
"filter": {
"geo_distance": {
"distance": "500",
"location": "40.959334, 29.082142"
}
}
}
}
}'
Thanks,

In the official guide you have a good explanation:
Query context
A query clause used in query context answers the question “How well does this document match this query clause?” Besides deciding whether or not the document matches, the query clause also calculates a _score representing how well the document matches, relative to other documents.
Query context is in effect whenever a query clause is passed to a query parameter, such as the query parameter in the search API.
Filter context
In filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple Yes or No — no scores are calculated. Filter context is mostly used for filtering structured data, e.g.
Does this timestamp fall into the range 2015 to 2016?
Is the status field set to "published"?
Frequently used filters will be cached automatically by Elasticsearch, to speed up performance.
Filter context is in effect whenever a query clause is passed to a filter parameter, such as the filter or must_not parameters in the bool query, the filter parameter in the constant_score query, or the filter aggregation.
https://www.elastic.co/guide/en/elasticsearch/reference/2.3/query-filter-context.html
About your case, we would need more information, but taking into account you are looking for exact values, a filter would suit it better.

The first query is evaluating score because your are using "term" here directly inside without wrapping it inside "filter" so by default "term" written directly inside query run in Query context format which result in calculating score.
But in the case of second query you "term" inside "filter" which change it's context from Query Context to filter Context . And in the case of filter no score is calculated (by default _score 1 is allocated to all matching documents).
You can find more details about queries behavior in this article
https://towardsdatascience.com/deep-dive-into-querying-elasticsearch-filter-vs-query-full-text-search-b861b06bd4c0

Related

How to specify 'includes' for nested properties in a document in RavenDB

I am trying to use RavenDB's REST API to make some calls to my database and I'm wondering if there's a way to use the 'includes' feature to return documents that are nested in a document.
For example, I have an object, Order, that looks similar to this:
"Order": {
"Lines": [
{
"Product": "products/11-A"
},
{
"Product": "products/42-A"
},
{
"Product": "products/72-A"
}
],
"OrderedAt": "1996-07-04T00:00:00.0000000",
"Company": "companies/85-A"
}
Company maps to another document and is simple enough to include in the query.
{ "Query": "from Orders include Company" }
My problem is Product that is nested in Lines, which is an array of order lines. Since I didn't find anything in the documentation about it I tried things like include Product or include Lines.Product, but those didn't work.
Is this kind of thing possible with the REST API? If so, how would I go about doing it?
The syntax to Query for Related Documents from the client can be found in this demo:
https://demo.ravendb.net/demos/csharp/related-documents/query-related-documents
The matching RQL to be used in the Query when using REST API is:
from 'Orders' include 'Lines[].Product'

Cumulocity Inventory API filter by Creation Date

I'm currently trying to implement a simple date filter for the Inventory API using the query language. The filter should return a list of managed objects which were created after a given date. For some reasons I always receive an empty list as result but the example in the query language documentation looks the same as my query:
GET {{url}}/inventory/managedObjects?query=creationTime+gt+'2018-12-01T09:00:53.351Z'
gives me
{
"managedObjects": [],
"next": "{{url}}/inventory/managedObjects?query=creationTime+gt+'2018-12-01T09:00:53.351Z'&pageSize=5&currentPage=2",
"statistics": {
"currentPage": 1,
"pageSize": 5
},
"self": "{{url}}/inventory/managedObjects?query=creationTime+gt+'2018-12-01T09:00:53.351Z'&pageSize=5&currentPage=1"
}
And if I try this structure for the timestamp I even receive an error:
GET {{url}}/inventory/managedObjects?query=creationTime+gt+'2018-12-01T09:00:53.3512B1:00'
{
"error": "inventory/Invalid Data",
"info": "https://www.cumulocity.com/guides/reference-guide/#error_reporting",
"message": "Find by filter query failed : Query 'creationTime gt '2018-12-01T09:00:00'' could not be understood. Please try again."
}
Try to filter by
creationTime.date
Background is that the timestamps are stored as MongoDb dates.
You can also check the device list filter in device management which has a filter on creationTime as well.

How to implement a backend for GraphQL connections?

In GraphQL the recommended way for pagination is to use connections as described here. I understand the reasons and advantages of this usage but I need an advice how to implement it.
The server side of the application works on top of a SQL database (Postgres in my case). Some of the GraphQL connection fields have optional argument to specify sorting. Now with knowing the sorting columns and a cursor from the GraphQL query, how can I build an SQL query? Of course it should be efficient - if there is a SQL index index for the combination of sorting columns it should be used.
The problem is that SQL doesn't know anything like GraphQL cursors - we can't tell it to select all rows after certain row. There is just WHERE, OFFSET and LIMIT. From my point of view it seems I need to firstly select a single row based on the cursor and then build a second SQL query using the values of the sorting columns in that row to specify a complicated WHERE clause - not sure if the database would use index in that case.
What bothers me is that I could not find any article on this topic. Does it mean that SQL database is not usually used when implementing a GraphQL server? What database should be used then? How are GraphQL queries to connection fields usually transformed to queries for the underlying database?
EDIT: This is more or less what I came up with myself. The problem is how to extend it to support sorting as well and how to implement it efficiently using database indexes.
The trick here is that, as the server implementer, the cursor can be literally any value you want encoded as a string. Most examples I've seen have been base64-encoded for a bit of opacity, but it doesn't have to be. (Try base64-decoding the cursors from the Star Wars examples in your link, for example.)
Let's say your GraphQL schema looks like
enum ThingColumn { FOO BAR }
input ThingFilter {
foo: Int
bar: Int
}
type Query {
things(
filter: ThingFilter,
sort: ThingColumn,
first: Int,
after: String
): ThingConnection
}
Your first query might be
query {
things(filter: { foo: 1 }, sort: BAR, first: 2) {
edges {
node { bar }
}
pageInfo {
endCursor
hasNextPage
}
}
}
This on its own could fairly directly translate into an SQL query like
SELECT bar FROM things WHERE foo=1 ORDER BY bar ASC LIMIT 2;
Now as you iterate through each item you can just use a string version of its offset as its cursor; that's totally allowed by the spec.
{
"data": {
"things": {
"edges": [
{ "node": { "bar": 17 } },
{ "node": { "bar": 42 } }
],
"pageInfo": {
"endCursor": "2",
"hasNextPage": true
}
}
}
}
Then when the next query says after: "2", you can turn that back into an SQL OFFSET and repeat the query.
If you're trying to build a generic GraphQL interface that gets translated to reasonably generic SQL queries, it's impossible to create indexes such that every query is "fast". Like other cases, you need to figure out what your common and/or slow queries are and CREATE INDEX as needed. You might be able to limit the options in your schema to things you know you can index:
type Other {
things(first: Int, after: String): ThingConnection
}
query OtherThings($id: ID!, $cursor: String) {
node(id: $id) {
... on Other {
things(first: 100, after: $cursor) { ... FromAbove }
}
}
}
SELECT * FROM things WHERE other_id=? ORDER BY id LIMIT ?;
CREATE INDEX things_other ON things(other_id);

How to create elasticsearch index alias that excludes specific fields

I'm using Elasticsearch's index aliases to create restricted views on a more-complete index to support a legacy search application. This works well. But I'd also like to exclude certain sensitive fields from the returned result (they contain email addresses, and we want to preclude harvesting.)
Here's what I have:
PUT full-index/_alias/restricted-index-alias
{
"_source": {
"exclude": [ "field_with_email" ]
},
"filter": {
"term": { "indexflag": "noindex" }
}
}
This works for queries (I don't see field_with_email), and the filter term works (I get a restricted index) but I still see the field_with_email in query results from the index alias.
Is this supposed to work?
(I don't want to exclude from _source in the mapping, as I'm also using partial updates; these are easier if the entire document is available in _source.)
No, it is not supposed to work, and the documentation doesn't suggest that it should work.

Simple full text search in ElasticSearch

I'm trying to understand how ElasticSearch Query DSL works.
It would be a lot of help if anyone could give me an example how to perform a search like the following MySQL query:
SELECT * FROM products
WHERE shop_id = 1
AND MATCH(title, description) AGAINST ('test' IN BOOLEAN MODE)
Assuming that you indexed some documents containing at least the shop_id, title and description fields, something like the following example:
{
"shop_id" : "here goes your shop_id",
"title" : "here goes your title",
"description" : "here goes your description"
}
You can execute a multi match query against multiple fields, and give them a different weight (usually title is more important). You can also combine the query with a term filter on shop_id:
{
"query" : {
"multi_match" : {
"query" : "here goes your query",
"fields" : [ "title^2", "description" ]
},
"filter" : {
"term" : { "shop_id" : "here goes your shop id" }
}
}
You need to submit the query using the search API. Filters are used to reduce the set of documentsthe query is eecute against. Filters are faster since don't involve scoring and cached. In my example I applied a top level filter, which might be or not a good fit for you depending on what else you want to do next. If you want to make a facet, for instance, the filter would be ignored in the facet. Another way to add a filter, which would be taken into account while computing the facets as well, is the filtered query.