How to query a field inside nested arrays in Cosmos DB SQL

How can I return all documents which have parameter.code = "123", given this document structure, using CosmosDB SQL query? Is it necessary to use a UDF? (If so, how?)
{
  "batch_id": "abc",
  "samples": [
    {
      "sample_id": "123",
      "tests": [
        {
          "parameter": {
            "code": "123" // <- target
          }
        }
      ]
    }
  ]
}

No need to use a UDF (User Defined Function); just use a Cosmos DB SQL query with a double JOIN.
SQL:
SELECT c.batch_id FROM c
JOIN samples IN c.samples
JOIN tests IN samples.tests
WHERE tests.parameter.code = "123"
Output:
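Running this against the sample document above should return a single result:
[
  {
    "batch_id": "abc"
  }
]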

There's a count difference between a Druid native query and Druid SQL

I have a problem with a Druid query.
I wanted to get the data count with hour granularity.
So I used Druid SQL like this:
SELECT TIME_FLOOR(__time, 'PT1H') AS t, count(1) AS cnt FROM mydatasource GROUP BY 1
Then I got a response like this:
[
{
"t": "2022-08-31T09:00:00.000Z",
"cnt": 12427
},
{
"t": "2022-08-31T10:00:00.000Z",
"cnt": 16693
},
{
"t": "2022-08-31T11:00:00.000Z",
"cnt": 16694
},
...
But when using a native query like this,
{
"queryType": "timeseries",
"dataSource": "mydatasource",
"intervals": "2022-08-31T07:01Z/2022-09-01T07:01Z",
"granularity": {
"type": "period",
"period": "PT1H",
"timeZone": "Etc/UTC"
},
"aggregations": [
{
"name": "count",
"type": "longSum",
"fieldName": "count"
}
],
"context": {
"skipEmptyBuckets": "true"
}
}
the results are different:
[
{
"timestamp": "2022-08-31T09:00:00.000Z",
"result": {
"count": 1288965
}
},
{
"timestamp": "2022-08-31T10:00:00.000Z",
"result": {
"count": 1431215
}
},
{
"timestamp": "2022-08-31T11:00:00.000Z",
"result": {
"count": 1545258
}
},
...
I want to use the result of the native query.
What's the problem with my Druid SQL query?
How do I create a query that gets the native query results?
I found the difference:
when using a longSum type aggregation, I get the same result as the native query.
So I want to know how to write the aggregation below using SQL.
"aggregations": [
{
"type": "longSum",
"name": "count",
"fieldName": "count"
}
]
I found the solution.
Query like this:
SELECT TIME_FLOOR(__time, 'PT1H') AS t, sum("count") AS cnt FROM mydatasource GROUP BY 1
Given that your datasource has a "count" column, I'm assuming it comes from an ingestion that uses rollup. This means that the original raw rows have been aggregated, and the "count" column contains the count of raw rows that were summarized into each aggregate row.
The native query is using the longSum aggregation over the "count" column.
The original SQL you used is just counting the aggregate rows.
So yes, the correct way to get the count of raw rows is SUM("count").
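If it helps to see both measures side by side, the two aggregations can be combined in a single Druid SQL query (a sketch against the same mydatasource):
-- COUNT(*) counts the rolled-up rows; SUM("count") recovers the raw-row count,
-- matching the native longSum aggregation.
SELECT
  TIME_FLOOR(__time, 'PT1H') AS t,
  COUNT(*)     AS rollup_rows,
  SUM("count") AS raw_rows
FROM mydatasource
GROUP BY 1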

I want to combine json rows in t-sql into single json row

I have a table:
id | json
---+----------------
1  | {"url":"url1"}
2  | {"url":"url2"}
I want to combine these into a single statement where the output is:
{
"graphs": [
{
"id": "1",
"json": [
{
"url": "url1"
}
]
},
{
"id": "2",
"json": [
{
"url": "url2"
}
]
}
]
}
I am using T-SQL. I've noticed there is some stuff for Postgres but can't find much on T-SQL.
Any help would be greatly appreciated.
You need to use JSON_QUERY on the json column to ensure it is not escaped.
SELECT
    t.id,
    JSON_QUERY('[' + t.json + ']') AS json
FROM YourTable t
FOR JSON PATH, ROOT('graphs');
db<>fiddle
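For reference, a minimal self-contained setup to try this out (table and column names are taken from the question):
CREATE TABLE YourTable (id int, json nvarchar(max));
INSERT INTO YourTable (id, json) VALUES
    (1, N'{"url":"url1"}'),
    (2, N'{"url":"url2"}');

SELECT
    t.id,
    JSON_QUERY('[' + t.json + ']') AS json
FROM YourTable t
FOR JSON PATH, ROOT('graphs');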

How to convert this SQL query to MongoDB

Considering this query written in SQL Server, how would I efficiently convert it to MongoDB:
select * from thetable where column1 = column2 * 2
You can use the aggregation below.
You add a new field comp with $addFields to hold the comparison value, followed by a $match to keep the docs where comp is 0, and a $project with exclusion to drop the comp field.
db.collection.aggregate([
{ $addFields: {"comp": {$cmp: ["$column1", {$multiply: [ 2, "$column2" ]} ]}}},
{ $match: {"comp":0}},
{ $project:{"comp":0}}
])
If you want to run your query in the mongo shell, try the code below:
db.thetable.find({}).forEach(function(tt) {
    var ttcol2 = tt.column2 * 2;
    var compareCurrent = db.thetable.findOne({ _id: tt._id, column1: ttcol2 });
    if (compareCurrent) {
        printjson(compareCurrent);
    }
});
I liked the answer posted by @Veeram, but it would also be possible to achieve this using $project and $match pipeline stages.
This is just for understanding the flow.
Assume we have the below 2 documents stored in a math collection
Mongo Documents
{
"_id" : ObjectId("58a055b52f67a312c3993553"),
"num1" : 2,
"num2" : 4
}
{
"_id" : ObjectId("58a055be2f67a312c3993555"),
"num1" : 2,
"num2" : 6
}
Now we need to find the documents where num2 equals 2 times num1 (in our case the document with _id ObjectId("58a055b52f67a312c3993553") matches this condition).
Query:
db.math.aggregate([
{
"$project": {
"num2": {
"$multiply": ["$num2",1]
},
"total": {
"$multiply": ["$num1",2]
},
"doc": "$$ROOT"
}
},
{
"$project": {
"areEqual": {"$eq": ["$num2","$total"]
},
doc: 1
}
},
{
"$match": {
"areEqual": true
}
},
{
"$project": {
"_id": 1,
"num1": "$doc.num1",
"num2": "$doc.num2"
}
}
])
Pipeline operation steps:
The 1st pipeline stage, $project, calculates the total.
The 2nd pipeline stage, $project, checks whether the total matches num2. This is needed because we cannot compare num2 with total directly in the $match stage.
The 3rd pipeline stage matches documents where areEqual is true.
The 4th pipeline stage, $project, is just used for projecting the output fields.
Note:
In the 1st pipeline stage I multiplied num2 by 1 because num1 and num2 are stored as integers and $multiply returns a double value. So in case I did not use $multiply for num2, it would try to match 4 against 4.0, which would not match the document.
There is certainly no need for multiple pipeline stages when a single $redact pipeline will suffice, as it neatly incorporates the functionality of the $project and $match steps. Consider running the following pipeline for an efficient query:
db.collection.aggregate([
    {
        "$redact": {
            "$cond": [
                { "$eq": ["$column1", { "$multiply": ["$column2", 2] }] },
                "$$KEEP",
                "$$PRUNE"
            ]
        }
    }
])
In the above, $redact will return all documents that match the condition using $$KEEP and discard those that don't match using the $$PRUNE system variable.
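For completeness, on MongoDB 3.6 or later the same comparison can also be expressed without an aggregation pipeline, using the $expr query operator (column names taken from the original SQL):
db.thetable.find({
    $expr: { $eq: ["$column1", { $multiply: ["$column2", 2] }] }
})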

Nest Elastic - Building Dynamic Nested Query

I have to query a nested object using NEST; however, the query is built in a dynamic way. Below is code that demonstrates querying the nested "books" in a static way:
QueryContainer qry;
qry = new QueryStringQuery()
{
DefaultField = "name",
DefaultOperator = Operator.And,
Query = "salman"
};
QueryContainer qry1 = null;
qry1 = new RangeQuery() // used to search for range ( from , to)
{
Field = "modified",
GreaterThanOrEqualTo = Convert.ToDateTime("21/12/2015").ToString("dd/MM/yyyy"),
};
QueryContainer all = qry && qry1;
var results = elastic.Search<Document>(s => s
.Query(q => q
.Bool(qb => qb
.Must(all)))
.Filter(f =>
f.Nested(n => n
.Path("books")
.Filter(f3 => f3.And(
f1 => f1.Term("book.isbn", "122"),
f2 => f2.Term("book.author", "X"))
)
)
)
);
The problem is that I need to combine multiple queries (using AND, OR operators) for "books" in a dynamic fashion. For example, get the books that satisfy these sets of conditions:
Condition 1: Books that have author "X" and isbn "1"
Condition 2: Books that have author "X" and isbn "2"
Condition 3: Books that have author "Z" and isbn "3"
Other conditions: .....
Now, the filter in the nested query should retrieve books if:
Condition 1 AND Condition 2 OR Condition 3
Suppose that I have a class named FilterOptions (sketched below) that contains the following attributes:
FieldName
Value
Operator (which will combine it with the next filter)
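A hypothetical sketch of that class (the property types are my assumption; Operator is the NEST enum already used in the snippets below):
public class FilterOptions
{
    public string FieldName { get; set; }
    public string Value { get; set; }
    public Operator Operator { get; set; } // combines this filter with the next one
}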
I am going to loop on the given FilterOptions array to build the query.
Question:
What should I use to build the nested query? Is it a FilterDescriptor, and how do I combine them and add the nested query to the Search method?
Please recommend any valuable links or examples.
I agree with paweloque; it seems your first two conditions are contradictory and wouldn't work if AND-ed together. Ignoring that, here's my solution. I've implemented it in such a way that it allows for more than the three specific conditions you have. I too feel this would fit better in a bool query.
QueryContainer andQuery = null;
QueryContainer orQuery = null;

foreach (var authorFilter in FilterOptions.Where(f => f.Operator == Operator.And))
{
    andQuery &= new TermQuery
    {
        Field = authorFilter.FieldName,
        Value = authorFilter.Value
    };
}

foreach (var authorFilter in FilterOptions.Where(f => f.Operator == Operator.Or))
{
    orQuery |= new TermQuery
    {
        Field = authorFilter.FieldName,
        Value = authorFilter.Value
    };
}
After that, in the .Nested call I would put:
.Path("books")
.Query(q=>q
.Bool(bq=>bq
.Must(m=>m.MatchAll() && andQuery)
.Should(orQuery)
))
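Putting it together, the whole search call might look roughly like this (a sketch in the NEST 1.x-style syntax used in the question; exact method names can differ between NEST versions):
var results = elastic.Search<Document>(s => s
    .Query(q => q
        .Nested(n => n
            .Path("books")
            .Query(nq => nq
                .Bool(bq => bq
                    .Must(m => m.MatchAll() && andQuery)
                    .Should(orQuery)
                )
            )
        )
    )
);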
In the specific case of Condition 1 and Condition 2 you'd probably not get any results, because these are exclusive conditions. But I assume now that you want to get results which match either of those conditions. You've chosen nested, which is definitely the way to go. With the nested type you can combine parameters for a single book.
Combining nested queries
For your use case I'd use bool query type with must or should clauses.
A query to get books for either Condition 1 or Condition 2 would be:
POST /books/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "books",
            "query": {
              "bool": {
                "must": [
                  { "match": { "books.isbn": "2" } },
                  { "match": { "books.author": "X" } }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "books",
            "query": {
              "bool": {
                "must": [
                  { "match": { "books.isbn": "1" } },
                  { "match": { "books.author": "X" } }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
Can you explain why your books are nested? Without nesting them in a top structure, but indexing them directly as top-level objects in an index/type, you could simplify your queries.
Not-Analyzed
There is another caveat to keep in mind: if you want an exact match on the author and the ISBN, you have to make sure that the ISBN and author fields are set to not_analyzed. Otherwise they get analyzed and split into parts, and your match wouldn't work very well.
E.g. if you have an ISBN number with dashes, then it would get split into parts:
978-3-16-148410-0
would become indexed as:
978
3
16
148410
0
And a search with exactly the same ISBN number would give you all the books which have one of the sub-numbers in their ISBN. If you want to prevent this, use the not_analyzed index type and multi-fields:
"isbn": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
Then to address the not_analyzed isbn field you'd have to call it:
books.isbn.raw
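For example, an exact-match term query against that raw sub-field (using the sample ISBN from above) would look like:
{
  "term": {
    "books.isbn.raw": "978-3-16-148410-0"
  }
}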
Hope this helps.

Elasticsearch: match two fields

How can I get this simple SQL query running on Elasticsearch?
SELECT * FROM [mytype] WHERE a = -23.4807339 AND b = -46.60068
I'm really having trouble with its syntax; multi-match queries don't work in my case. Which query type should I use?
For queries like yours, the bool filter is preferred over the and filter. See here for the whole story about this suggestion and why it is considered more efficient.
That being said, I would choose to do it like this:
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "a": -23.4807339 } },
            { "term": { "b": -46.60068 } }
          ]
        }
      }
    }
  }
}
You can approach this with the and filter. For your example, something like:
{
  "size": 100,
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "and": [
          { "term": { "a": -23.4807339 } },
          { "term": { "b": -46.60068 } }
        ]
      }
    }
  }
}
Be sure to direct the query against the correct index and type. Note that I specified the size of the return set as 100 arbitrarily - you'd have to specify a value that fits your use case.
There's more on filtered queries here, and more on the and filter here.