Select with subtotals using PostgreSQL

I've the following query:
select
json_build_object('id', i.id, 'task_id', i.task_id, 'time_spent', i.summary)
from
intervals I
where
extract(month from "created_at") = 10
and extract(year from "created_at") = 2021
group by
i.id, i.task_id
order by i.task_id
Which gives the following output:
json_build_object
{"id" : 53, "task_id" : 1, "time_spent" : "3373475"}
{"id" : 40, "task_id" : 1, "time_spent" : "3269108"}
{"id" : 60, "task_id" : 2, "time_spent" : "2904084"}
{"id" : 45, "task_id" : 4, "time_spent" : "1994341"}
{"id" : 38, "task_id" : 5, "time_spent" : "1933766"}
{"id" : 62, "task_id" : 5, "time_spent" : "2395378"}
{"id" : 44, "task_id" : 6, "time_spent" : "3304280"}
{"id" : 58, "task_id" : 6, "time_spent" : "3222501"}
{"id" : 48, "task_id" : 6, "time_spent" : "1990195"}
{"id" : 55, "task_id" : 7, "time_spent" : "1984300"}
How can I add subtotals of time_spent for each task?
I'd like to have an array structure of objects like this:
{
"total": 3968600,
"details": [
{"id" : 55, "task_id" : 7, "time_spent" : "1984300"},
{"id" : 55, "task_id" : 7, "time_spent" : "1984300"}
]
}
How can I achieve it? Thank you!

You may try the following modification, which groups your data by task_id and uses json_agg and json_build_object to produce the structure you want.
select
json_build_object(
'total', SUM(i.summary),
'details',json_agg(
json_build_object(
'id', i.id,
'task_id', i.task_id,
'time_spent', i.summary
)
)
) as result
from
intervals I
where
extract(month from "created_at") = 10
and extract(year from "created_at") = 2021
group by
i.task_id
order by i.task_id
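See working demo fiddle online here.
Note: if summary is stored as text (the quoted "time_spent" values in your output suggest it may be), SUM cannot be applied to it directly; cast it first, e.g. SUM(i.summary::bigint).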
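To make the target shape concrete, here is a small Python sketch of what the query computes: rows are grouped by task_id, each group gets a "total" (the SUM) and a "details" list (the json_agg). It assumes, as the quoted values in the question suggest, that summary holds an integer stored as a string; the function name subtotals_by_task is purely illustrative.

```python
import json
from collections import defaultdict

def subtotals_by_task(rows):
    """Group (id, task_id, summary) rows by task_id, like GROUP BY i.task_id.

    Each group becomes {"total": ..., "details": [...]}, mirroring
    SUM(i.summary) and json_agg(json_build_object(...)).
    Assumption: summary is an integer stored as a string.
    """
    groups = defaultdict(list)
    for row_id, task_id, summary in rows:
        groups[task_id].append({"id": row_id, "task_id": task_id, "time_spent": summary})
    result = []
    for task_id in sorted(groups):  # ORDER BY i.task_id
        details = groups[task_id]
        # int(...) plays the role of the SUM(i.summary::bigint) cast
        total = sum(int(d["time_spent"]) for d in details)
        result.append({"total": total, "details": details})
    return result

# A few rows taken from the question's output.
rows = [
    (53, 1, "3373475"),
    (40, 1, "3269108"),
    (60, 2, "2904084"),
    (44, 6, "3304280"),
    (58, 6, "3222501"),
    (48, 6, "1990195"),
]
print(json.dumps(subtotals_by_task(rows), indent=2))
```

The Python version keeps the details in input order. In SQL, json_agg makes no ordering promise inside a group, so if the order of the details matters, an aggregate ORDER BY such as json_agg(json_build_object(...) ORDER BY i.id) is the usual way to pin it down.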

Related

Postgres how to sum (aggregate) all the values in a key/value json?

My query is here:
select datetime,array_to_json(array_agg(json_build_object('parameter',parameter,'channel_id',channel_id,'value',value,'status',status,'units',units))) as parameters
from dbp_istasyondata
where site_id=10
and channel_id IN (0,1,2,3,4)
and datetime between '2022-12-01T00:00:00' and '2022-12-01T01:30:00'
group by 1
order by 1;
This is what the query returns:
"datetime" "parameters"
"2022-12-01 00:00:00" "[{""channel_id"" : 0, ""value"" : 7.72},{""channel_id"" : 1, ""value"" : 1593.87}]"
"2022-12-01 00:01:00" "[{""channel_id"" : 1, ""value"" : 1612.26},{""channel_id"" : 0, ""value"" : 7.72}]"
"2022-12-01 00:02:00" "[{""channel_id"" : 0, ""value"" : 7.72},{""channel_id"" : 1, ""value"" : 1615.36}]"
"2022-12-01 00:03:00" "[{""channel_id"" : 0, ""value"" : 7.72},{""channel_id"" : 1, ""value"" : 1625.99}]"
"2022-12-01 00:04:00" "[{""channel_id"" : 0, ""value"" : 7.71},{""channel_id"" : 1, ""value"" : 1623.12}]"
"2022-12-01 00:05:00" "[{""channel_id"" : 0, ""value"" : 7.72},{""channel_id"" : 1, ""value"" : 1638.58}]"
"2022-12-01 01:00:00" "[{""channel_id"" : 0, ""value"" : 7.74},{""channel_id"" : 1, ""value"" : 1647.09}]"
"2022-12-01 01:01:00" "[{""channel_id"" : 0, ""value"" : 7.74},{""channel_id"" : 1, ""value"" : 1656.71}]"
"2022-12-01 01:02:00" "[{""channel_id"" : 1, ""value"" : 1646.86},{""channel_id"" : 0, ""value"" : 7.74}]"
"2022-12-01 01:03:00" "[{""channel_id"" : 1, ""value"" : 1656.34},{""channel_id"" : 0, ""value"" : 7.74}]"
"2022-12-01 01:04:00" "[{""channel_id"" : 1, ""value"" : 1652.63},{""channel_id"" : 0, ""value"" : 7.74}]"
"2022-12-01 01:05:00" "[{""channel_id"" : 0, ""value"" : 7.74},{""channel_id"" : 1, ""value"" : 1648.01}]"
The result I want:
"datetime" "parameters"
"2022-12-01 00:00:00" "[{""channel_id"" : 0, ""value"" : 50},{""channel_id"" : 1, ""value"" : 1593.87}]"
"2022-12-01 01:00:00" "[{""channel_id"" : 1, ""value"" : 102348},{""channel_id"" : 0, ""value"" : 7.72}]"
What I want is the values summed per hour and displayed as JSON. Is there a way to do this?
While researching, I found the following sum, but I could not get it to give me the result I wanted. Can you help me?
select date_trunc('hour', datetime), SUM (value) as total
from dbp_istasyondata
where channel_id=3
and site_id=16
and datetime between '2022-11-01T00:00:00' and '2022-12-01T04:00:00'
group by 1;
As I understand your question, you want a result set with one row per hour, and a column holding a JSON array that gives the sum of values for each channel.
You would typically need two levels of aggregation: one to compute the sums per hour and per channel, and another to aggregate per hour only.
select datehour, jsonb_agg( to_jsonb(t) - 'datehour') res
from (
select date_trunc('hour', datetime) datehour, channel_id, sum(value) value
from dbp_istasyondata
where site_id = 10
and channel_id in (0, 1, 2, 3, 4)
and datetime between '2022-12-01T00:00:00' and '2022-12-01T01:30:00'
group by 1, 2
) t
group by 1
order by 1
Notes:
date_trunc truncates a timestamp or date to the given precision (here, the hour)
to_jsonb(t) converts each row returned by the subquery to a JSONB object (from which we remove the datehour key using the jsonb - operator)
the aggregate function jsonb_agg gathers all objects into a JSONB array
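The two levels of aggregation can be sketched in Python to show what each GROUP BY does. This is an illustration of the logic only, not a replacement for the SQL; the tuple layout (timestamp, channel_id, value) and the name hourly_channel_sums are assumptions, and the rounding only hides floating-point noise that the numeric SQL sum would not have.

```python
from collections import defaultdict
from datetime import datetime

def hourly_channel_sums(readings):
    """Two-level aggregation over (timestamp, channel_id, value) tuples."""
    # Level 1: one sum per (hour, channel), like the inner "group by 1, 2".
    per_hour_channel = defaultdict(float)
    for ts, channel_id, value in readings:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # date_trunc('hour', ts)
        per_hour_channel[(hour, channel_id)] += value
    # Level 2: one list of {channel_id, value} objects per hour, like the
    # outer "group by 1" with jsonb_agg(to_jsonb(t) - 'datehour').
    per_hour = defaultdict(list)
    for (hour, channel_id), total in sorted(per_hour_channel.items()):
        per_hour[hour].append({"channel_id": channel_id, "value": round(total, 2)})
    return dict(per_hour)

# A few readings taken from the question's output.
readings = [
    (datetime(2022, 12, 1, 0, 0), 0, 7.72),
    (datetime(2022, 12, 1, 0, 0), 1, 1593.87),
    (datetime(2022, 12, 1, 0, 1), 1, 1612.26),
    (datetime(2022, 12, 1, 0, 1), 0, 7.72),
    (datetime(2022, 12, 1, 1, 0), 0, 7.74),
    (datetime(2022, 12, 1, 1, 0), 1, 1647.09),
]
for hour, channels in hourly_channel_sums(readings).items():
    print(hour, channels)
```

The sketch sorts by (hour, channel) so each hour's array lists channels in order; jsonb_agg itself does not guarantee an order unless an ORDER BY is given inside the aggregate.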

How to input data into a line chart in Pentaho?

I want to make this chart in Pentaho CDE:
based on this chart (I think it is the most similar one among the CCC components):
(The code is in this link.)
but I don't know how I can adapt my data input to that graph.
For example, I want to consume the data with this format:
[Year, customers_A, customers_B, cars_A, cars_B]
[2014, 8, 4, 23, 20]
[2015, 20, 6, 30, 38]
How can I input my data into this chart?
Your data should come as an object such as this:
data = {
metadata: [
{ colName: "Year", colType:"Numeric", colIndex: 1},
{ colName: "customers_A", colType:"Numeric", colIndex: 2},
{ colName: "customers_B", colType:"Numeric", colIndex: 3},
{ colName: "cars_A", colType:"Numeric", colIndex: 4},
{ colName: "cars_B", colType:"Numeric", colIndex: 5}
],
resultset: [
[2014, 8, 4, 23, 20],
[2015, 20, 6, 30, 38]
],
queryInfo: {totalRows: 2}
}
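If the rows start out in the question's format (a header list followed by data rows), a short Python sketch can wrap them in the metadata/resultset/queryInfo shape shown above, for example in whatever backend serves the data as JSON (an assumption about your setup). The helper name to_cda_object is hypothetical; it hard-codes colType "Numeric" because every column in the example is numeric, and it starts colIndex at 1 only to copy the numbering in the example object.

```python
import json

def to_cda_object(columns, rows):
    """Wrap a header row and data rows in the metadata/resultset/queryInfo shape.

    Assumptions: every column is "Numeric" (true for the example data) and
    colIndex starts at 1, copying the numbering in the example object.
    """
    metadata = [
        {"colName": name, "colType": "Numeric", "colIndex": index}
        for index, name in enumerate(columns, start=1)
    ]
    return {
        "metadata": metadata,
        "resultset": [list(row) for row in rows],
        "queryInfo": {"totalRows": len(rows)},
    }

data = to_cda_object(
    ["Year", "customers_A", "customers_B", "cars_A", "cars_B"],
    [[2014, 8, 4, 23, 20], [2015, 20, 6, 30, 38]],
)
print(json.dumps(data))
```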

How can I select records that match a value in a json field array?

I'm trying to return records where an array inside a json field contains a specific value.
I found "How to select all records containing certain values from a postgres json field containing an array", which is close to my question. However, I think the key difference is that I use json and not jsonb. For reasons, I need to use json at the moment. When I tried the steps from that other post, I got the same error as below.
I have this example data
{"name": "Bob", "scores": [64, 66]}
{"name": "Sally", "scores": [66, 65]}
{"name": "Kurt", "scores": [69, 71, 72, 67, 68]}
{"name": "Libby", "scores": [72, 73, 74, 75]}
{"name": "Frank", "scores": [80, 81, 82, 83]}
I'm trying to run this query:
SELECT data
FROM tests.results
where (data->>'scores') #> '[72]';
I expect two rows from the results:
{"name": "Kurt", "scores": [69, 71, 72, 67, 68]}
{"name": "Libby", "scores": [72, 73, 74, 75]}
but I get:
SQL Error [42883]: ERROR: operator does not exist: text #> integer
Hint: No operator matches the given name and argument type(s).
You might need to add explicit type casts.
I am currently using Postgres 10, but am likely upgrading to 12. Any help is appreciated.
You need to use -> instead of ->>. The former returns json, while the latter returns text. Then cast the array to jsonb and use the containment operator @>, which is defined for jsonb but not for json or text; the cast only applies to the expression, so the column itself can stay json.
SELECT data
FROM tests.results
where (data->'scores')::jsonb @> '[72]';
Demo on DB Fiddle:
WITH results AS (
SELECT '{"name": "Bob", "scores": [64, 66]}'::json mydata
UNION ALL SELECT '{"name": "Sally", "scores": [66, 65]}'::json
UNION ALL SELECT '{"name": "Kurt", "scores": [69, 71, 72, 67, 68]}'::json
UNION ALL SELECT '{"name": "Libby", "scores": [72, 73, 74, 75]}'::json
UNION ALL SELECT '{"name": "Frank", "scores": [80, 81, 82, 83]}'::json
)
SELECT mydata::text
FROM results
where (mydata->'scores')::jsonb @> '[72]';
| mydata |
| ------------------------------------------------ |
| {"name": "Kurt", "scores": [69, 71, 72, 67, 68]} |
| {"name": "Libby", "scores": [72, 73, 74, 75]} |
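For intuition about why only Kurt and Libby come back, here is a rough Python model of jsonb array containment (Postgres's @> operator) applied to the sample rows. It only models flat arrays of numbers: real jsonb containment also handles nested objects and arrays and treats true and 1 as different values, which Python's `in` does not. The function name is illustrative.

```python
import json

def jsonb_array_contains(left, right):
    """Rough model of jsonb @> for two flat arrays of scalars:
    true when every element of `right` also appears in `left`."""
    return all(item in left for item in right)

rows = [
    '{"name": "Bob", "scores": [64, 66]}',
    '{"name": "Sally", "scores": [66, 65]}',
    '{"name": "Kurt", "scores": [69, 71, 72, 67, 68]}',
    '{"name": "Libby", "scores": [72, 73, 74, 75]}',
    '{"name": "Frank", "scores": [80, 81, 82, 83]}',
]
# Same filter as: where (data->'scores')::jsonb @> '[72]'
matches = [r for r in rows if jsonb_array_contains(json.loads(r)["scores"], [72])]
for r in matches:
    print(r)
```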

Parsing string in a hive table

I have a Hive table with two columns, day and type_of_day, both of type string:
"monday" [{"temp" : 45, "weather": "rainny"}, {"temp" : 25, "weather": "sunny"}, {"temp" : 15, "weather": "storm"}]
"tuesday" [{"temp" : 5, "weather": "winter"}, {"temp" : 10, "weather": "sun"}, {"temp" : 18, "weather": "dawn"}]
I want to split the array (I guess "explode" is the technical term) and then get a list of the temp values for each day. I know how to do this in Python, but is there a way to do it directly in Hive?
"monday" [45, 25, 15]
"tuesday" [5, 10, 18]
Tested with your example data. Replace the CTE with your table, and read the comments in the code:
with your_table as (--use your table instead of this CTE
select stack(2,
"monday",'[{"temp" : 45, "weather": "rainny"}, {"temp" : 25, "weather": "sunny"}, {"temp" : 15, "weather": "storm"}]',
"tuesday" ,'[{"temp" : 5, "weather": "winter"}, {"temp" : 10, "weather": "sun"}, {"temp" : 18, "weather": "dawn"}]'
)as (day, type_of_day)
) --use your table instead of this CTE
select s.day, array(get_json_object(type_of_day_array[0],'$.temp'),
get_json_object(type_of_day_array[1],'$.temp'),
get_json_object(type_of_day_array[2],'$.temp')
) as result_array --extract JSON elements and construct array
from
(
select day, split(regexp_replace(regexp_replace(type_of_day,'\\[|\\]',''), --remove square brackets
'\\}, *\\{','\\}##\\{'), --make convenient split separator
'##') --split
as type_of_day_array
from your_table --use your table instead of this CTE
)s;
Result:
s.day result_array
monday ["45","25","15"]
tuesday ["5","10","18"]
If the JSON array can contain more than three elements, use lateral view explode or posexplode and then rebuild the resulting array, as in this answer: https://stackoverflow.com/a/51570035/2700344.
Wrap the array elements in cast(... as int) if you need array<int> as the result instead of array<string>:
cast(get_json_object(type_of_day_array[0],'$.temp') as int)...
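The string manipulation in the Hive query can be traced step by step in Python: strip the square brackets, turn "}, {" into "}##{", split on "##", then read temp from each JSON fragment. This is a sketch for checking the logic, not Hive code; the function name is hypothetical. Unlike the Hive query, the list comprehension handles any number of elements, and like the Hive query it would break if a weather string itself contained brackets or "}, {".

```python
import json
import re

def temps_from_type_of_day(type_of_day):
    """Python version of the Hive string steps for one type_of_day value."""
    no_brackets = re.sub(r"\[|\]", "", type_of_day)     # regexp_replace(..., '\\[|\\]', '')
    with_sep = re.sub(r"\}, *\{", "}##{", no_brackets)  # '}, {' becomes '}##{'
    parts = with_sep.split("##")                        # split(..., '##')
    # get_json_object(..., '$.temp') returns a string, hence str(...)
    return [str(json.loads(part)["temp"]) for part in parts]

monday = '[{"temp" : 45, "weather": "rainny"}, {"temp" : 25, "weather": "sunny"}, {"temp" : 15, "weather": "storm"}]'
tuesday = '[{"temp" : 5, "weather": "winter"}, {"temp" : 10, "weather": "sun"}, {"temp" : 18, "weather": "dawn"}]'
print("monday", temps_from_type_of_day(monday))
print("tuesday", temps_from_type_of_day(tuesday))
```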

sed pattern matching between two strings: display only the first match

I am trying to use sed pattern matching between two strings to parse a file and find the first match. Using this first match, I want to perform some actions iteratively in a loop. However, the sed range between the two strings prints all matches, and I only want the first one:
File:
},{
"prefix" : "AD",
"prefix" : "CQ",
"last" : 0,
"last" : 0,
"month" : 0,
"month" : 5,
"today": 0,
"today": 0,
"yesterday": 2,
"yesterday": 0,
"agents": 0
},{
"prefix" : "CS",
"prefix" : "AE",
"last" : 1,
"last" : 0,
"month" : 130,
"month" : 0,
"today": 0,
"today": 20,
"yesterday": 0,
"yesterday": 38,
"agents": 0
},{
"prefix" : "AF",
"prefix" : "CZ",
"last" : 0,
"last" : 0,
"month" : 6,
I am trying to extract the lines between prefix and agents, but only the first match, using the sed command below:
sed -n '/prefix/,/agents/p' /var/saas/stats/usage_1499245200.json.2 >> /var/saas/stats/try
Is there a way to extract only the first match from the file usage_1499245200.json.2 during the first iteration and then execute the loop?
Thanks,
Sriram.V
Using awk:
[akshay@localhost test]$ awk '/prefix/{found=1}found;/agents/{exit}' infile
"prefix" : "AD",
"prefix" : "CQ",
"last" : 0,
"last" : 0,
"month" : 0,
"month" : 5,
"today": 0,
"today": 0,
"yesterday": 2,
"yesterday": 0,
"agents": 0
Input
[akshay@localhost test]$ cat infile
},{
"prefix" : "AD",
"prefix" : "CQ",
"last" : 0,
"last" : 0,
"month" : 0,
"month" : 5,
"today": 0,
"today": 0,
"yesterday": 2,
"yesterday": 0,
"agents": 0
},{
"prefix" : "CS",
"prefix" : "AE",
"last" : 1,
"last" : 0,
"month" : 130,
"month" : 0,
"today": 0,
"today": 20,
"yesterday": 0,
"yesterday": 38,
"agents": 0
},{
"prefix" : "AF",
"prefix" : "CZ",
"last" : 0,
"last" : 0,
"month" : 6,
With sed, you could add a command that quits when agents is encountered:
sed -n -e '/prefix/,/agents/p' -e '/agents/q' infile
Result:
"prefix" : "AD",
"prefix" : "CQ",
"last" : 0,
"last" : 0,
"month" : 0,
"month" : 5,
"today": 0,
"today": 0,
"yesterday": 2,
"yesterday": 0,
"agents": 0
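Both one-liners follow the same logic: start collecting at the first line that matches prefix and stop right after the first line that matches agents. The Python sketch below mirrors the awk version so the control flow is explicit; it uses plain substring tests, which behave like awk's /prefix/ and /agents/ regexes for these literal words. The function name and the shortened sample input are illustrative.

```python
def first_block(lines, start="prefix", end="agents"):
    """Mimic awk '/prefix/{found=1}found;/agents/{exit}'.

    Lines are collected from the first one containing `start`; processing
    stops right after the first line containing `end`.
    """
    out = []
    found = False
    for line in lines:
        if start in line:
            found = True
        if found:
            out.append(line)
        if end in line:
            break  # awk's {exit} / sed's q
    return out

# Shortened version of the input file.
sample = """},{
"prefix" : "AD",
"prefix" : "CQ",
"last" : 0,
"agents": 0
},{
"prefix" : "CS",
"agents": 0"""
for line in first_block(sample.splitlines()):
    print(line)
```

Because the loop stops at the first agents line, later blocks are never read, which is what makes the approach cheap on large files; processing every block would instead mean resetting `found` at each agents line rather than breaking.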