Query with a logical expression comparing field values - sql

I need to do a simple Mongo query which resembles this SQL:
SELECT * FROM insights WHERE category = 1 AND param_count > param_mean + 1

Ideally you would use $redact as an aggregation expression for this, coupled with an initial $match so that at least the non-computed part of the query has a chance to use an index:
db.collection.aggregate([
    // Match first so the plain "category" predicate can use an index
    { "$match": { "category": 1 } },
    // $redact keeps or drops each document based on the computed comparison
    { "$redact": {
        "$cond": {
            "if":   { "$gt": [ "$param_count", { "$add": [ "$param_mean", 1 ] } ] },
            "then": "$$KEEP",
            "else": "$$PRUNE"
        }
    }}
])
If your MongoDB server version is older than 2.6 and therefore lacks the $redact operator, you can use $where, which evaluates a JavaScript expression to a boolean in order to select results:
db.collection.find({
    "category": 1,
    "$where": "this.param_count > this.param_mean + 1"
})
While shorter in syntax, this takes considerably more processing time because the JavaScript expression must be evaluated against every document.
Where possible, use $redact, or avoid the calculation altogether and store the computed value in the document instead. That last statement holds for just about "all" databases.
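Two hedged asides beyond the original answer: on MongoDB 3.6 or newer, the $expr query operator accepts the same aggregation expression directly inside find(), and precomputing the difference into a stored field (a hypothetical param_diff, maintained at write time) makes the query fully index-friendly:
// MongoDB 3.6+: the same comparison via $expr, no aggregation pipeline needed
db.collection.find({
    "category": 1,
    "$expr": { "$gt": [ "$param_count", { "$add": [ "$param_mean", 1 ] } ] }
})

// If param_diff = param_count - param_mean is stored on each write,
// a compound index on { category: 1, param_diff: 1 } serves this query directly
db.collection.find({ "category": 1, "param_diff": { "$gt": 1 } })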

Related

JsonSchema number validation with multiple ranges

Is there a supported way (without using the anyOf keyword) to specify multiple ranges for a number in JsonSchema?
E.g., a value can either be in the range of 0-10 or 50-100 (a value with 10 < x < 50 is considered invalid).
The anyOf keyword can be used as follows:
{
    "anyOf": [
        {
            "type": "number",
            "minimum": 0,
            "maximum": 10
        },
        {
            "type": "number",
            "minimum": 50,
            "maximum": 100
        }
    ]
}
Additionally, if the only allowed values were whole integers, I could use an enum and hand-specify each allowed number, but that's obviously less ideal than specifying ranges.
So, just wondering if there is a way to accomplish this with something like a "restrictions" keyword:
//NOTE: the below is not actually supported (I don't think), just using it as an example of what I'm interested in
{
    "type": "number",
    "restrictions": [
        {
            "minimum": 0,
            "maximum": 10
        },
        {
            "minimum": 50,
            "maximum": 100
        }
    ]
}
Also, for those wondering why, given that anyOf is available: I have some custom tooling to maintain, and supporting anyOf would be more of a lift than something specific to numeric validation.
The scenario you describe is exactly why anyOf exists. So no: if you want to express a logical OR, you need to support the keyword that expresses it. I don't see why adding a new custom keyword would make things any easier.
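One hedged observation that may help the custom tooling: sibling keywords combine with anyOf as a logical AND, so the shared type can be hoisted out, leaving the anyOf branches looking exactly like the hypothetical "restrictions" array:
{
    "type": "number",
    "anyOf": [
        { "minimum": 0, "maximum": 10 },
        { "minimum": 50, "maximum": 100 }
    ]
}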

PostgreSQL how to query jsonb by a value?

Running
SELECT * FROM some_table;
returns results like the following:
{
    "sku0": {
        "Id": "18418",
        "Desc": "yes"
    },
    "sku1": {
        "Id": "17636",
        "Desc": "no"
    },
    "sku2": {
        "Id": "206714",
        "Desc": "yes"
    },
    "brand": "abc",
    "displayName": "something"
}
First, the number of skus is not fixed. It may be sku0, sku1, sku2, sku3, sku4 ... but they all start with sku.
Then, I want to find the entry whose Id is 17636 and determine whether its Desc value is yes or no. After reading the PostgreSQL JSON Functions and Operators documentation, I sadly didn't find a good way to do this.
I could convert the result into a Python dictionary and easily achieve my requirements with Python's methods.
But if the requirements can also be met with PostgreSQL statements, is that approach more recommendable than the Python dictionary?
I am not sure I completely understand what result you want. But if you want to filter on the Id, you need to unnest all the elements inside the JSON column:
select d.v ->> 'Desc' as description
from the_table t
cross join jsonb_each(t.data) as d(k,v)
where d.v ->> 'Id' = '17636'
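If all you need is a single true/false answer ("does Id 17636 have Desc = 'yes'?"), the same unnesting can be wrapped in EXISTS; this sketch keeps the answer's assumed table the_table and jsonb column data:
select exists (
    select 1
    from the_table t
    cross join jsonb_each(t.data) as d(k,v)
    where d.v ->> 'Id' = '17636'
      and d.v ->> 'Desc' = 'yes'
) as desc_is_yes;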
You could use the new jsonpath notation of PostgreSQL v12:
SELECT data @@ '$.* ? (@.Id == "17636").Desc == "yes"'
FROM some_table;
That will start at the root of data ($), look at every attribute in it (*), filter only those attributes that contain an Id with value "17636", take their Desc attribute, and return TRUE only if that attribute is "yes".
Nice, isn't it?
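If you want the Desc value itself rather than a boolean, jsonb_path_query (also new in v12) can extract it with the same filter; a sketch under the same assumptions:
SELECT jsonb_path_query(data, '$.* ? (@.Id == "17636").Desc')
FROM some_table;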
This will probably give you what you need.
select value->>'Desc' from jsonb_each('{
    "sku0": {
        "Id": "18418",
        "Desc": "yes"
    },
    "sku1": {
        "Id": "17636",
        "Desc": "no"
    },
    "sku2": {
        "Id": "206714",
        "Desc": "yes"
    },
    "brand": "abc",
    "displayName": "something"
}'::jsonb)
where key like 'sku%'
  and value->>'Id' = '17636'
Best regards,
Bjarni

How to "zip" multiple nested JSON arrays without using id key?

I'm trying to merge some nested JSON arrays without looking at the id. Currently I'm getting this when I make a GET request to /surveyresponses:
{
    "surveys": [
        {
            "id": 1,
            "name": "survey 1",
            "isGuest": true,
            "house_id": 1
        },
        {
            "id": 2,
            "name": "survey 2",
            "isGuest": false,
            "house_id": 1
        },
        {
            "id": 3,
            "name": "survey 3",
            "isGuest": true,
            "house_id": 2
        }
    ],
    "responses": [
        {
            "question": "what is this anyways?",
            "answer": "test 1"
        },
        {
            "question": "why?",
            "answer": "test 2"
        },
        {
            "question": "testy?",
            "answer": "test 3"
        }
    ]
}
But I would like each survey to have its own questions and answers, so something like this:
{
    "surveys": [
        {
            "id": 1,
            "name": "survey 1",
            "isGuest": true,
            "house_id": 1,
            "question": "what is this anyways?",
            "answer": "test 1"
        }
    ]
}
Because I'm not querying by a specific id, I'm not sure how to make the relationship work. This is the current query that's producing those results.
export function getSurveyResponse(id: number): QueryBuilder {
    return db('surveys')
        .join('questions', 'questions.survey_id', '=', 'surveys.id')
        .join('questionAnswers', 'questionAnswers.question_id', '=', 'questions.id')
        .select('surveys.name', 'questions.question', 'questionAnswers.answer')
        .where({ survey_id: id, question_id: id });
}
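For what it's worth, the literal "zip" from the title can also be done in application code once both arrays are fetched; a minimal sketch, assuming equal lengths and matching order (the helper name is hypothetical):
// Merge each survey with the response at the same index (assumes equal length)
function zipSurveys<T extends object, U extends object>(surveys: T[], responses: U[]): (T & U)[] {
    return surveys.map((survey, i) => ({ ...survey, ...responses[i] }));
}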
Assuming jsonb in current Postgres 10 or 11, this query does the job:
SELECT t.data, to_jsonb(s) AS new_data
FROM   tbl t
LEFT   JOIN LATERAL (
   SELECT jsonb_agg(s || r) AS surveys
   FROM  (
      SELECT jsonb_array_elements(t.data->'surveys')   AS s
           , jsonb_array_elements(t.data->'responses') AS r
      ) sub
   ) s ON true;
db<>fiddle here
I unnest both nested JSON arrays in parallel to get the desired behavior of "zipping" them directly. The number of elements in the two nested arrays has to match, or you need to do more work (else you lose data).
This builds on implementation details of how Postgres deals with multiple set-returning functions in a SELECT list to make it short and fast. See:
What is the expected behaviour for multiple set-returning functions in select clause?
One could be more explicit with a ROWS FROM expression, which works properly since Postgres 9.4:
SELECT t.data
     , to_jsonb(s) AS new_data
FROM   tbl t
LEFT   JOIN LATERAL (
   SELECT jsonb_agg(s || r) AS surveys
   FROM   ROWS FROM (jsonb_array_elements(t.data->'surveys')
                   , jsonb_array_elements(t.data->'responses')) sub(s,r)
   ) s ON true;
The manual about combining multiple table functions.
Or you could use WITH ORDINALITY to get the original order of elements and combine them as you wish (a sketch follows after the link below):
PostgreSQL unnest() with element number
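A hedged sketch of that WITH ORDINALITY route, pairing elements by position explicitly (same assumed table tbl with jsonb column data; the inner join drops any unmatched tail):
SELECT t.data, to_jsonb(x) AS new_data
FROM   tbl t
LEFT   JOIN LATERAL (
   SELECT jsonb_agg(s.elem || r.elem) AS surveys
   FROM   jsonb_array_elements(t.data->'surveys')   WITH ORDINALITY AS s(elem, ord)
   JOIN   jsonb_array_elements(t.data->'responses') WITH ORDINALITY AS r(elem, ord)
          USING (ord)  -- pair elements by their original array position
) x ON true;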

Use sprintf syntax inside logstash's sprintf syntax

For the below data structure:
{
    "sprints": [
        {
            "id": 17193,
            "name": "Sprint 12"
        },
        {
            "id": 16510,
            "name": "Sprint 11"
        }
    ],
    "velocityStatEntries": {
        "16510": {
            "estimated": {
                "value": 49
            },
            "completed": {
                "value": 36
            }
        },
        "17193": {
            "estimated": {
                "value": 52
            },
            "completed": {
                "value": 70
            }
        }
    }
}
Given this, I want to be able to produce an Elasticsearch object that's easier to handle, by adding the values of the estimated and completed fields to the sprints with matching IDs.
Ideally, I would like to handle this without writing Ruby, but I am not finding a Logstash-native solution that handles this scenario.
First, I split the data on the sprints field using the split filter, so that each event carries a single sprints object and I can use [sprints][id] to know which sprint I'm processing.
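(A minimal sketch of that split step, for context:)
filter {
    split { field => "sprints" }
}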
Then, I have attempted to work with the mutate filter, in one of two ways:
- using merge to add the matching [velocityStatEntries][] object to the current sprint
- using add_field to add the two fields I need
Is this syntactically possible? Ideally, I would want to be able to do a 'double substitution' of sorts, obtaining the estimated time for the current sprint with something like:
add_field => {
"estimatedTime" => "%{[velocityStatEntries][%{[sprints][id]}][estimated][value]}"
}
but this only seems to work with a hardcoded key, such as "estimatedTime" => "%{[velocityStatEntries][1234][estimated][value]}"
Do I have to use a Ruby filter for this?
For what it's worth, the Ruby solution is very simple:
ruby {
    code => "
        # Look up this sprint's entry in velocityStatEntries by its id
        sprintId  = event.get('[sprints][id]')
        estimated = event.get('[velocityStatEntries][' + sprintId.to_s + '][estimated][value]')
        completed = event.get('[velocityStatEntries][' + sprintId.to_s + '][completed][value]')
        event.set('[sprints][estimatedUnits]', estimated)
        event.set('[sprints][completedUnits]', completed)
    "
}

How to perform a SELECT in the results returned from a GROUP BY Druid?

I am having a hard time converting this simple SQL Query below into Druid:
SELECT country, city, COUNT(*)
FROM people_data
WHERE name = 'Mary'
GROUP BY country, city;
So far I have come up with this query:
{
    "queryType": "groupBy",
    "dataSource": "people_data",
    "granularity": "all",
    "metric": "num_of_pages",
    "dimensions": ["country", "city"],
    "filter": {
        "type": "and",
        "fields": [
            {
                "type": "in",
                "dimension": "name",
                "values": ["Mary"]
            },
            {
                "type": "javascript",
                "dimension": "email",
                "function": "function(value) { return (value.length !== 0) }"
            }
        ]
    },
    "aggregations": [
        { "type": "longSum", "name": "num_of_pages", "fieldName": "count" }
    ],
    "intervals": [ "2016-07-20/2016-07-21" ]
}
The query above runs, but it doesn't seem like the groupBy is even being evaluated, since I see people with names other than Mary in my output. Does anyone have any input on how to make this work?
The simple answer is that you cannot select arbitrary dimensions in your groupBy queries.
Strictly speaking, even the SQL query would not make sense with non-grouped columns: if, for a given combination of country and city, there are many different values of name and street, how do you squeeze those into a single row? You have to aggregate them, e.g. by using the max function.
In that case you can include the same column in your data as both a dimension and a metric, e.g. name_dim and name_metric, and include a corresponding aggregation over your metric, max(name_metric).
Please note that if these columns (name etc.) have high-cardinality values, that will kill Druid's roll-up feature.
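For reference, a hedged sketch of how the SQL at the top of the question might map to a groupBy query: "metric" is a topN parameter rather than a groupBy one, so it is dropped here, and the count comes from a plain count aggregator (if the datasource was ingested with roll-up and a count metric, a longSum over that metric would be the right choice instead):
{
    "queryType": "groupBy",
    "dataSource": "people_data",
    "granularity": "all",
    "dimensions": ["country", "city"],
    "filter": { "type": "selector", "dimension": "name", "value": "Mary" },
    "aggregations": [
        { "type": "count", "name": "row_count" }
    ],
    "intervals": [ "2016-07-20/2016-07-21" ]
}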