Ignore data from stream analytics input array containing a certain value - azure-stream-analytics

Using this example data:
{
"device":"TestDevice",
"id":30,
"flags":[ "New", "Ignore"]
}
I want to select all data without the flag "Ignore". I got it working with a UDF:
SELECT
device, id, flags
FROM input
WHERE udf.ArrayContains(flags, "Ignore") = 0
Is it possible to do this without a user-defined function?

This will do the trick. The idea is to explode the array with GetArrayElements, drop the 'Ignore' elements, and then keep only the events where the number of surviving elements equals the array length (empty arrays are let through explicitly):
with cte as
(
select
i.*
from
localInput as i
outer APPLY
getarrayelements(i.flags) ae
where
ae.ArrayValue != 'Ignore'
or
getarraylength(i.flags) = 0
)
select
c.id,
c.device,
c.flags
from
cte c
group by
c.id,
c.device,
c.flags,
System.Timestamp
having
count(*) = getarraylength(c.flags)
or
getarraylength(c.flags) = 0
I tested it with the following sample data:
{"device":"TestDevice1","id":1,"flags":[ "New", "Ignore"]}
{"device":"TestDevice2","id":2,"flags":[ "New"]}
{"device":"TestDevice3","id":3,"flags":[ "Ignore"]}
{"device":"TestDevice2","id":4,"flags":[ "Something", "Else"]}
{"device":"TestDevice2","id":5,"flags":[]}

Related

BigQuery: joining a field and unnested JSON results in only the left-hand-side table

I need help getting a SELECT statement that returns both the normal text records and the JSON-unnested answers. I am getting only the left-hand normal text records. Am I missing something?
#standardSQL
CREATE TEMP FUNCTION jsonunnest(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return JSON.parse(input).map(j=>JSON.stringify(j));
""";
WITH `Impact_JSON` AS (
SELECT
Impact_Question_id,
Impact_Question_text, json,
ROW_NUMBER() OVER (PARTITION BY bdmp_id, DATE(Impact_Question_aktualisiert_am_ts)
ORDER BY
Impact_Question_aktualisiert_am_ts DESC) AS ROW
FROM
`<project.dataset.table>` basetable
),
json_answers AS (
SELECT
regexp_replace(SPLIT(ANY_VALUE(JSON_EXTRACT_SCALAR(Impact, '$.Impact_antwort_id')),'_')[SAFE_OFFSET(1)], "[^0-9]+"," " ) AS Interview_ID,
regexp_replace(SPLIT(ANY_VALUE(JSON_EXTRACT_SCALAR(Impact, '$.Impact_antwort_id')),'_')[SAFE_OFFSET(3)], "[^0-9]+"," " ) AS Quest_ID,
STRING_AGG(DISTINCT(JSON_EXTRACT_SCALAR(Impact, '$.Impact_antwort_id')), ',\n')
AS Impact_antwort_id,
STRING_AGG(DISTINCT(JSON_EXTRACT_SCALAR(Impact, '$.Impact_antwort_daten_typ')),',\n')
AS Impact_reply_data_type,
IFNULL(JSON_EXTRACT_SCALAR(Impact, '$.Impact_topic_text'), 'Empty') AS Impact_topic_text,
IFNULL(JSON_EXTRACT_SCALAR(Impact, '$.Impact_reply_get'), 'Empty') AS Impact_reply_get,
FROM `Impact_JSON`,
UNNEST(jsonunnest(JSON_EXTRACT(json, '$.reply'))) Impact
GROUP by 5,6
),
Impact_Question_id_TBL AS (
select Impact_Question_id from `Impact_JSON` AS C
)
SELECT
Impact_Question_id
FROM
`Impact_JSON` AS T
left join
json_answers as J
ON
(SAFE_CAST(J.Interview_ID as INT64))
=
T.Impact_Question_id
The left-hand-side records and right-hand-side records in the same table should be captured.
What do you expect as output?
For me the UDF did not work as posted. Apart from that, I generated some sample data and shortened your query for testing; sample data like this would have been a good starting point for the question!
Also, I changed the join to a full outer join.
For each dataset there is a number in the column Impact_Question_id. There is a json column containing a nested array data structure. The JSON data is unnested and grouped by the column Impact_reply_get. The first part of the column Impact_antwort_id is extracted and named Interview_ID. Because of the grouping, you select a random value. Then you join by this to the master table on the column Impact_Question_id.
The random selection by ANY_VALUE of the column Impact_antwort_id (Interview_ID) could be the cause of the mismatch. I would also group by this value and risk double matches.
#standardSQL
CREATE TEMP FUNCTION jsonunnest(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
try{
return [].concat(JSON.parse(input) || [] ).map(j=>JSON.stringify(j));
} catch(e) {return ["no",JSON.parse(input)]}
""";
WITH
basetable as (Select row_number() over () Impact_Question_id,
"txt" as Impact_Question_text,
json
#1 as bdmp_id,
#current_date() Impact_Question_aktualisiert_am_ts,
from unnest([ '{"reply":[{"Impact_antwort_id":"anytext_2","Impact_reply_get":"ok test 2"}]}','{"reply":[{"Impact_antwort_id":"anytext_3"}]}']) as json
),
`Impact_JSON` AS (
SELECT
Impact_Question_id,
Impact_Question_text, json,
#ROW_NUMBER() OVER (PARTITION BY bdmp_id, DATE(Impact_Question_aktualisiert_am_ts)
# ORDER BY
# Impact_Question_aktualisiert_am_ts DESC) AS ROW
FROM
basetable
),
json_answers AS (
SELECT
regexp_replace(SPLIT(ANY_VALUE(JSON_EXTRACT_SCALAR(Impact, '$.Impact_antwort_id')),'_')[SAFE_OFFSET(1)], "[^0-9]+"," " ) AS Interview_ID,
# regexp_replace(SPLIT(ANY_VALUE(JSON_EXTRACT_SCALAR(Impact, '$.Impact_antwort_id')),'_')[SAFE_OFFSET(3)], "[^0-9]+"," " ) AS Quest_ID,
# STRING_AGG(DISTINCT(JSON_EXTRACT_SCALAR(Impact, '$.Impact_antwort_id')), ',\n')
# AS Impact_antwort_id,
#STRING_AGG(DISTINCT(JSON_EXTRACT_SCALAR(Impact, '$.Impact_antwort_daten_typ')),',\n')
# AS Impact_reply_data_type,
# IFNULL(JSON_EXTRACT_SCALAR(Impact, '$.Impact_topic_text'), 'Empty') AS Impact_topic_text,
IFNULL(JSON_EXTRACT_SCALAR(Impact, '$.Impact_reply_get'), 'Empty') AS Impact_reply_get,
string_agg(Impact) as impact_parsed_json,
FROM `Impact_JSON`,
UNNEST(jsonunnest(JSON_EXTRACT(json, '$.reply'))) Impact
GROUP by 2 #5,6
)
SELECT
*
FROM
`Impact_JSON` AS T
full join
json_answers as J
ON
(SAFE_CAST(J.Interview_ID as INT64))
=
T.Impact_Question_id

Hierarchically aggregate JSON depending on value in row using PostgreSQL 10

I have a PostgreSQL 10 table that works as a "dictionary" and is structured as follows:
key                        value
style_selection_color     ...
style_selection_weight    ...
style_line_color          ...
style_line_weight         ...
...
Now I was wondering if there is a way of building a JSON from the values in the table that derives its hierarchy from the parts of "key"?
Something like:
style --> selection --> color and
style --> line --> color
Ending up with a JSON:
{
style: [
selection: {
color: "...",
weight: "..."
},
line: {
color: "...",
weight: "..."
}
]
}
Is such a feat achievable? If so, how would I go about it?
Could it be done so that regardless of what keys I have in my table it always returns the JSON properly built?
Thanks in advance
Working solution for Postgres 10 and above
I propose a generic solution which converts the key data into the text[] type so that it can be used as a path inside the standard jsonb_set() function.
But as we will iterate on jsonb_set(), we first need to create an aggregate function based on it:
CREATE AGGREGATE jsonb_set_agg(p text[], z jsonb, b boolean)
( sfunc = jsonb_set
, stype = jsonb
, initcond = '{}'
);
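For context on why the paths must be created level by level: in the standard jsonb_set(target, path, new_value, create_missing) function, all path elements except the last must already exist, otherwise the target is returned unchanged. A quick illustration:
SELECT jsonb_set('{}'::jsonb, '{style,color}', '"red"'::jsonb, true);
-- returns {} : the intermediate key "style" does not exist, so nothing is set
SELECT jsonb_set('{"style": {}}'::jsonb, '{style,color}', '"red"'::jsonb, true);
-- returns {"style": {"color": "red"}}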
Then we convert the key data into text[] and automatically generate the list of paths that will let us build the final jsonb data progressively and iteratively:
SELECT i.id
, max(i.id) OVER (PARTITION BY t.key) AS id_max
, p.path[1 : i.id] AS jsonpath
, to_jsonb(t.value) AS value
FROM mytable AS t
CROSS JOIN LATERAL string_to_array(t.key, '_') AS p(path)
CROSS JOIN LATERAL generate_series(1, array_length(p.path, 1)) AS i(id)
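For a key such as style_selection_color, this expansion generates one row per path prefix; only the deepest prefix (where id = id_max) will later receive the actual value, while the shorter ones get an empty object:
id   jsonpath                    receives
1    {style}                     '{}'::jsonb
2    {style,selection}           '{}'::jsonb
3    {style,selection,color}     to_jsonb(t.value)   -- id = id_max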
The final query looks like this:
WITH list AS
( SELECT i.id
, max(i.id) OVER (PARTITION BY t.key) AS id_max
, p.path[1 : i.id] AS jsonpath
, to_jsonb(t.value) AS value
FROM mytable AS t
CROSS JOIN LATERAL string_to_array(t.key, '_') AS p(path)
CROSS JOIN LATERAL generate_series(1, array_length(p.path, 1)) AS i(id)
)
SELECT jsonb_set_agg( l.jsonpath
, CASE
WHEN l.id = l.id_max THEN l.value
ELSE '{}' :: jsonb
END
, true
ORDER BY l.id
)
FROM list AS l
And the result is slightly different from your expectation (the top-level json array is replaced by a json object), but that seems more logical to me:
{"style": {"line": {"color": "C"
, "weight": "D"
}
, "selection": {"color": "A"
, "weight": "B"
}
}
}
full test result in dbfiddle.
Well, I am not sure about your Postgres version; hoping this works on it, I tried this on version 11.
;WITH dtbl as (
select split_part(tbl.col, '_', 1) as style,
split_part(tbl.col, '_', 2) as cls,
split_part(tbl.col, '_', 3) as property_name,
tbl.val
from (
select 'style_selection_color' as col, 'red' as val
union all
select 'style_selection_weight', '1rem'
union all
select 'style_line_color', 'gray'
union all
select 'style_line_weight', '200'
union all
select 'stil_line_weight', '200'
) as tbl
),
classes as (
select dtbl.style,
dtbl.cls,
(
SELECT json_object_agg(
nested_props.property_name, nested_props.val
)
FROM (
SELECT dtbl2.property_name,
dtbl2.val
FROM dtbl dtbl2
where dtbl2.style = dtbl.style
and dtbl2.cls = dtbl.cls
) AS nested_props
) AS properties
from dtbl
group by dtbl.style, dtbl.cls),
styles as (
select style
from dtbl
group by style
)
,
class_obj as (
select classes.style,
classes.cls,
json_build_object(
classes.cls, classes.properties) as cls_json
from styles
join classes on classes.style = styles.style
)
select json_build_object(
class_obj.style,
json_agg(class_obj.cls_json)
)
from class_obj
group by style
;
If you change the first part of the query to match your table and column names, this should work.
The idea is to build the JSON objects nested, but you cannot do this in one pass, as Postgres does not let you nest json_agg functions; this is why we have to use more than one query: first build the line and selection objects, then aggregate them into the style objects.
Sorry for the naming; this is the best I could do.
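A minimal sketch of that restriction, with hypothetical toy values:
select json_agg(json_object_agg(k, v))  -- nesting two aggregates directly
from (values ('color', 'red')) as t(k, v);
-- ERROR: aggregate function calls cannot be nested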
EDIT1:
This is the output of that query:
{"stil" : [{"line" : [{"weight" : "200"}]}]}
{"style" : [{"selection" : [{"color" : "red"}, {"weight" : "1rem"}]}, {"line" : [{"color" : "gray"}, {"weight" : "200"}]}]}
Looking at this output, it is not exactly what you wanted: you got an array of objects for the properties. :)
You wanted {"color":"red", "weight": "1rem"} but the output is
[{"color":"red"}, {"weight": "1rem"}]
EDIT2:
Well, json_object_agg is the solution, so I used json_object_agg to build the property objects; now I am thinking this might be made even simpler.
This is the new output from the query:
{"stil" : [{"line" : { "weight" : "200" }}]}
{"style" : [{"selection" : { "color" : "red", "weight" : "1rem" }}, {"line" : { "color" : "gray", "weight" : "200" }}]}
This is the trimmed-down version; json_object_agg made things a bit simpler, so I got rid of some subselects. Tested on Postgres 10.
https://www.db-fiddle.com/f/tjzNBoQ3LTbECfEWb9Nrcp/0
;
WITH dtbl as (
select split_part(tbl.col, '_', 1) as style,
split_part(tbl.col, '_', 2) as cls,
split_part(tbl.col, '_', 3) as property_name,
tbl.val
from (
select 'style_selection_color' as col, 'red' as val
union all
select 'style_selection_weight', '1rem'
union all
select 'style_line_color', 'gray'
union all
select 'style_line_weight', '200'
union all
select 'stil_line_weight', '200'
) as tbl
),
result as (
select dtbl.style,
dtbl.cls,
json_build_object(dtbl.cls,
(
SELECT json_object_agg(
nested_props.property_name, nested_props.val
)
FROM (
SELECT dtbl2.property_name,
dtbl2.val
FROM dtbl dtbl2
where dtbl2.style = dtbl.style
and dtbl2.cls = dtbl.cls
) AS nested_props
)) AS cls_json
from dtbl
group by dtbl.style, dtbl.cls)
select json_build_object(
result.style,
json_agg(result.cls_json)
)
from result
group by style
;
You can think of dtbl as your main table; I just added a bonus row called stil, similar to the other rows, to make sure the grouping is correct.
Here is the output:
{"style":
[{"line":{"color":"gray", "weight":"200"}},
{"selection":{"color":"red","weight":"1rem"}}]
}
{"stil":[{"line":{"weight":"200"}}]}

Using COUNTIF inside aggregation function

I'm making a betting system app. I need to count the points of a user based on his bets, knowing that some of the bets can be 'combined', which makes the calculation a bit more complex than a simple addition.
So if I have 3 bets: {points: 3, combined: false}, {points: 5, combined: true}, {points: 10, combined: true}, there are two combined bets here, so the total points should be 3 + (5 * 2) + (10 * 2). Reality is a bit more complex, since the points are not directly in the bet object but in the match it refers to.
Here is a part of my query. As you can see, I first check whether the bet is right based on the match result; in that case, if the bet is combined I multiply it by the value of combinedLength, otherwise I just sum the value of that bet. I tried to replicate the COUNTIF inside the CASE, which gave me an error like 'cannot aggregate inside aggregation'.
SELECT
JSON_EXTRACT_SCALAR(data, '$.userId') AS userId,
COUNTIF(JSON_EXTRACT_SCALAR(data, '$.combined') = 'true') AS combinedLength,
SUM (
(
CASE WHEN JSON_EXTRACT_SCALAR(data, '$.value') = match.result
THEN IF(JSON_EXTRACT_SCALAR(data, '$.combined') = "true", match.odd * combinedLength, match.odd)
ELSE 0
END
)
) AS totalScore,
FROM data.user_bets_raw_latest
INNER JOIN matchLines ON matchLines.match.matchId = JSON_EXTRACT(data, '$.fixtureId')
GROUP BY userId
I've been looking for days... thanks so much for the help!
If I follow you correctly, then you need to count the total number of combined bets per user in a subquery, using a window function. Then, you can use this information while aggregating.
Consider:
select
userid,
sum(case when combined = 'true' then odd * cnt_combined else odd end) total_score
from (
select
b.*,
m.match.odd,
countif(b.combined = 'true') over(partition by userid) as cnt_combined,
from (
select
json_extract_scalar(data, '$.userid') userid,
json_extract_scalar(data, '$.combined') combined,
json_extract_scalar(data, '$.value') value,
json_extract_scalar(data, '$.fixtureid') fixtureid
from data.user_bets_raw_latest
) b
left join matchlines m
on m.match.matchid = b.fixtureid
and m.match.result = b.value
) t
group by userid
I find that it is simpler to use a left join and put the condition on the match result in there.
I moved the json extractions to a subquery to reduce the length of the query.

Redshift Postgresql - How to Parse Nested JSON

I am trying to parse a JSON text using JSON_EXTRACT_PATH_TEXT() function.
JSON sample:
{
"data":[
{
"name":"ping",
"idx":0,
"cnt":27,
"min":16,
"max":33,
"avg":24.67,
"dev":5.05
},
{
"name":"late",
"idx":0,
"cnt":27,
"min":8,
"max":17,
"avg":12.59,
"dev":2.63
}
]
}
I tried the JSON_EXTRACT_PATH_TEXT(event, '{"name":"late"}', 'avg') function to get 'avg' for name = "late", but it returns blank.
Can anyone help, please?
Thanks
This is a rather complicated task in Redshift, which, unlike Postgres, has poor support for managing JSON and no function to unnest arrays.
Here is one way to do it using a number table; you need to populate the table with incrementing numbers starting at 0, like:
create table nums as
select 0 i union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6
union all select 7 union all select 8 union all select 9
;
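If your arrays can hold more than ten elements, one option (a sketch, reusing the nums table above) is to cross-join the table with itself to cover 0..99:
create table nums100 as
select t.i * 10 + u.i as i  -- 0..99
from nums t
cross join nums u;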
Once the table is created, you can use it to walk the JSON array using json_extract_array_element_text(), and check its content with json_extract_path_text():
select json_extract_path_text(item, 'avg') as my_avg
from (
select json_extract_array_element_text(t.items, n.i, true) as item
from (
select json_extract_path_text(mycol, 'data', true ) as items
from mytable
) t
inner join nums n on n.i < json_array_length(t.items, true)
) t
where json_extract_path_text(item, 'name') = 'late';
You'll need to use json_array_elements for that:
select obj->'avg'
from foo f, json_array_elements(f.event->'data') obj
where obj->>'name' = 'late';
Working example
create table foo (id int, event json);
insert into foo values (1,'{
"data":[
{
"name":"ping",
"idx":0,
"cnt":27,
"min":16,
"max":33,
"avg":24.67,
"dev":5.05
},
{
"name":"late",
"idx":0,
"cnt":27,
"min":8,
"max":17,
"avg":12.59,
"dev":2.63
}]}');
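With this sample row, the query should return 12.59, the avg of the array element whose name is late.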

Need to update alternate rows of the data returned by the query below; not able to use window functions in an UPDATE statement (MS SQL Server)

I need some help with the query below: I want to update every alternate row of a table, given some conditions that involve multiple tables. I am not able to use a window function inside an UPDATE. How can I modify this query to make it work?
UPDATE loanacct
SET
collection_officer_no =
(
CASE
WHEN
ROW_NUMBER()OVER (ORDER BY acctrefno) %2 = 0
THEN
4
ELSE
7
END
)
WHERE acctrefno in
(
SELECT
[acctrefno]
FROM
[NLS].[dbo].[loanacct] L
INNER JOIN nlsusers U ON U.userno = L.collection_officer_no
WHERE
U.username like 'house' AND
L.loan_group_no in ( '2', '4', '5') AND
L.days_past_due > 25 AND
status_code_no = 0)
You can use an updatable CTE. This is pseudo-SQL, but should get you on the right path:
WITH CTE AS(
SELECT {YourColumns},
ROW_NUMBER() OVER (/* PARTITION BY ??? */ ORDER BY {Column}) AS RN
FROM YourTable
WHERE ...
)
UPDATE CTE
SET ...
WHERE RN % 2 = 0;
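A sketch of how that might look filled in with the tables and filters from the question (untested; it assumes collection_officer_no is the column being reassigned, as in the original UPDATE):
WITH CTE AS (
    SELECT L.collection_officer_no,
           ROW_NUMBER() OVER (ORDER BY L.acctrefno) AS RN
    FROM [NLS].[dbo].[loanacct] L
    INNER JOIN nlsusers U ON U.userno = L.collection_officer_no
    WHERE U.username LIKE 'house'
      AND L.loan_group_no IN ('2', '4', '5')
      AND L.days_past_due > 25
      AND L.status_code_no = 0
)
UPDATE CTE
SET collection_officer_no = CASE WHEN RN % 2 = 0 THEN 4 ELSE 7 END;
Updating through a CTE with a join is allowed in SQL Server as long as only one base table's columns are modified.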