SQL - Group By based on sequence - Oracle/Postgres

I need some help performing a GROUP BY based on a sequence.
I'm using Oracle or Postgres.
I have the following scenario: ID_SEQ is a sequence derived from EQUIP_ID, DAY and STAT.
I need to group the intervals between these sequences.
Example:
EQUIP_ID  DAY         STAT  DATE              ID_SEQ
JSTD123   19/06/2017  ON    19/06/2017 16:39  1
JSTD123   19/06/2017  OFF   19/06/2017 16:41  1
JSTD123   01/07/2017  ON    01/07/2017 13:50  1
JSTD123   01/07/2017  OFF   01/07/2017 13:51  1
JSTD123   01/07/2017  OFF   01/07/2017 14:40  2
JSTD123   01/07/2017  ON    01/07/2017 15:20  1
JSTD123   01/07/2017  ON    01/07/2017 15:20  2
JSTD123   01/07/2017  ON    01/07/2017 15:22  3
JSTD123   01/07/2017  ON    01/07/2017 15:22  4
JSTD123   01/07/2017  ON    01/07/2017 15:23  5
JSTD123   01/07/2017  ON    01/07/2017 15:26  6
JSTD123   01/07/2017  ON    01/07/2017 15:26  7
I would like to have the following result:
EQUIP_ID  DATE        STAT  START             END
JSTD123   19/06/2017  ON    19/06/2017 16:39  19/06/2017 16:39
JSTD123   19/06/2017  OFF   19/06/2017 16:41  19/06/2017 16:41
JSTD123   01/07/2017  ON    01/07/2017 13:50  01/07/2017 13:50
JSTD123   01/07/2017  OFF   01/07/2017 13:51  01/07/2017 14:40
JSTD123   01/07/2017  ON    01/07/2017 15:20  01/07/2017 15:26
I haven't been able to produce an output like this.

I think this is what you are trying to do: group consecutive rows with the same stat on a given day, and get the min date and max date of each group.
The logic is to compare each row's stat with the previous value of stat (per equip_id and day) using lag, and then take a running sum that increments whenever a new stat value is encountered. That running sum is the group number. Once the groups are assigned, a plain group by gives the min and max date per equip_id, day, stat and grp.
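-- Note: DATE is a reserved word in Oracle; quote or rename that column there.
SELECT equip_id,
       day,
       stat,
       min(date) AS start_date,
       max(date) AS end_date
FROM
  (SELECT t.*,
          -- running sum of the change flags numbers each run of equal stat
          sum(col) OVER (PARTITION BY equip_id, day ORDER BY date) AS grp
   FROM
     (SELECT t.*,
             -- 0 when stat repeats the previous row's stat, 1 when it changes
             CASE WHEN stat = lag(stat) OVER (PARTITION BY equip_id, day ORDER BY date) THEN 0 ELSE 1 END AS col
      FROM t
     ) t
  ) t
GROUP BY equip_id, day, stat, grp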
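For readers who want to check the grouping logic outside the database, here is a minimal Python sketch of the same idea. It is not the query above, just an illustration: it assumes the rows are already sorted by equip_id, day and timestamp, and the names `rows` and `collapse_runs` are made up for this example. `itertools.groupby` starts a new group whenever the key changes, which plays the role of the lag + running sum.

```python
from itertools import groupby

# Sample rows from the question: (equip_id, day, stat, timestamp),
# assumed to be pre-sorted by equip_id, day and timestamp.
rows = [
    ("JSTD123", "19/06/2017", "ON", "19/06/2017 16:39"),
    ("JSTD123", "19/06/2017", "OFF", "19/06/2017 16:41"),
    ("JSTD123", "01/07/2017", "ON", "01/07/2017 13:50"),
    ("JSTD123", "01/07/2017", "OFF", "01/07/2017 13:51"),
    ("JSTD123", "01/07/2017", "OFF", "01/07/2017 14:40"),
    ("JSTD123", "01/07/2017", "ON", "01/07/2017 15:20"),
    ("JSTD123", "01/07/2017", "ON", "01/07/2017 15:20"),
    ("JSTD123", "01/07/2017", "ON", "01/07/2017 15:22"),
    ("JSTD123", "01/07/2017", "ON", "01/07/2017 15:22"),
    ("JSTD123", "01/07/2017", "ON", "01/07/2017 15:23"),
    ("JSTD123", "01/07/2017", "ON", "01/07/2017 15:26"),
    ("JSTD123", "01/07/2017", "ON", "01/07/2017 15:26"),
]


def collapse_runs(sorted_rows):
    """Collapse consecutive rows sharing (equip_id, day, stat) into one interval."""
    result = []
    # A new group starts each time (equip_id, day, stat) differs from the previous row.
    for (equip_id, day, stat), run in groupby(sorted_rows, key=lambda r: r[:3]):
        stamps = [r[3] for r in run]
        # Rows are sorted, so the first and last stamps are the min and max.
        result.append((equip_id, day, stat, stamps[0], stamps[-1]))
    return result


intervals = collapse_runs(rows)
for interval in intervals:
    print(*interval)
```

With the sample data this prints five intervals, matching the result table asked for in the question.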

Related

Getting wrong(?) average when calculating values in a time range

I am working with AWS Redshift / PostgreSQL. I have two tables that can be joined on interval_date (DATE data type) and interval_time_utc (VARCHAR data type) and/or on the status and price_source columns. Source A corresponds to status Y and Source B corresponds to status N. I am trying to get the average price and the sum of mw_power for a given hour for each status / price_source. An hour covers the timestamps from XX:05 to XX:00, so for 15:00 the values should come from the 14:05 through 15:00 timestamps. Even when every row in an hour has the same status, I still need the average price for both price_source values; the sum of mw_power for the missing status would be 0. I pass the date and time intervals in from my application code. I am seeing a different average price for the 15:00 hour than I expect, so either I am bad at math or there is a bug in my query that I can't find. The 14:00 and 16:00 hour results come back as expected.
power_table
interval_date  interval_time_utc  mw_power  status
2022-05-09     13:00              92.25     N
2022-05-09     13:05              90.75     N
2022-05-09     13:10              91.25     N
2022-05-09     13:15              92.00     N
2022-05-09     13:20              92.00     N
2022-05-09     13:25              90.00     N
2022-05-09     13:30              93.00     N
2022-05-09     13:35              91.75     N
2022-05-09     13:40              90.25     N
2022-05-09     13:45              93.00     N
2022-05-09     13:50              91.00     N
2022-05-09     13:55              94.00     N
2022-05-09     14:00              91.00     N
2022-05-09     14:05              91.00     N
2022-05-09     14:10              94.00     N
2022-05-09     14:15              92.00     N
2022-05-09     14:20              91.00     N
2022-05-09     14:25              94.00     Y
2022-05-09     14:30              92.00     Y
2022-05-09     14:35              91.75     Y
2022-05-09     14:40              92.25     Y
2022-05-09     14:45              91.00     Y
2022-05-09     14:50              92.00     Y
2022-05-09     14:55              93.00     Y
2022-05-09     15:00              90.00     Y
price_table
interval_date  interval_time_utc  price   price_source
2022-05-09     13:00              54.20   Source A
2022-05-09     13:05              54.20   Source A
2022-05-09     13:10              54.20   Source A
2022-05-09     13:00              54.20   Source B
2022-05-09     13:05              54.20   Source B
2022-05-09     13:10              54.20   Source B
2022-05-09     13:15              34.11   Source A
2022-05-09     13:20              34.11   Source A
2022-05-09     13:25              34.11   Source A
2022-05-09     13:15              39.61   Source B
2022-05-09     13:20              39.61   Source B
2022-05-09     13:25              39.61   Source B
2022-05-09     13:30              2.81    Source A
2022-05-09     13:35              2.81    Source A
2022-05-09     13:40              2.81    Source A
2022-05-09     13:30              17.13   Source B
2022-05-09     13:35              17.13   Source B
2022-05-09     13:40              17.13   Source B
2022-05-09     13:45              1.58    Source A
2022-05-09     13:50              1.58    Source A
2022-05-09     13:55              1.58    Source A
2022-05-09     13:45              15.98   Source B
2022-05-09     13:50              15.98   Source B
2022-05-09     13:55              15.98   Source B
2022-05-09     14:00              4.60    Source A
2022-05-09     14:05              4.60    Source A
2022-05-09     14:10              4.60    Source A
2022-05-09     14:00              18.09   Source B
2022-05-09     14:05              18.09   Source B
2022-05-09     14:10              18.09   Source B
2022-05-09     14:15              2.46    Source A
2022-05-09     14:20              2.46    Source A
2022-05-09     14:25              2.46    Source A
2022-05-09     14:15              16.66   Source B
2022-05-09     14:20              16.66   Source B
2022-05-09     14:25              16.66   Source B
2022-05-09     14:30              3.36    Source A
2022-05-09     14:35              3.36    Source A
2022-05-09     14:40              3.36    Source A
2022-05-09     14:30              21.52   Source B
2022-05-09     14:35              21.52   Source B
2022-05-09     14:40              21.52   Source B
2022-05-09     14:45              4.55    Source A
2022-05-09     14:50              4.55    Source A
2022-05-09     14:55              4.55    Source A
2022-05-09     14:45              16.30   Source B
2022-05-09     14:50              16.30   Source B
2022-05-09     14:55              16.30   Source B
2022-05-09     15:00              -21.87  Source A
2022-05-09     15:00              4.96    Source B
-- query that I am using to get hourly values
SELECT pricet.price_source,
       COALESCE(powert.volume, 0),
       pricet.price,
       powert.status
FROM (SELECT status,
             SUM(mw_power) volume
      FROM power_table
      WHERE (interval_date || ' ' || interval_time_utc)::timestamp BETWEEN '2022-05-09 14:05:00.0' AND '2022-05-09 15:00:00.0'
      GROUP BY status) powert
RIGHT JOIN (SELECT price_source,
                   AVG(price) price
            FROM price_table
            WHERE (interval_date || ' ' || interval_time_utc)::timestamp BETWEEN '2022-05-09 14:05:00.0' AND '2022-05-09 15:00:00.0'
            GROUP BY price_source) pricet
  ON pricet.price_source = CASE WHEN powert.status = 'Y' THEN 'Source A'
                                ELSE 'Source B'
                           END;
I am looking to get an expected output of the following for the 15:00 hour:
price_source  volume  price  status
Source A      736.00  0.54   Y
Source B      368.00  17.38  N
Result that I'm getting from query:
price_source  volume  price  status
Source A      736.00  1.54   Y
Source B      368.00  17.05  N
db fiddle link of tables and query and results: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=474b009c5cf5366961751a61c0f96c6c

I think you made a calculator error. I changed your fiddle to add a rolling sum and rolling average for the second part of your query. To get an average of 0.54 (Source A), your sum would need to be 12 less than the actual total of the values for this hour. 12 is also the count of values for the hour, so perhaps you subtracted 12 by mistake before dividing by 12?
For the other source (B), the total would need to be off by 4 (an addition of 4 to the sum). Not sure how this could have happened, but ...
Anyway, the fiddle is at https://dbfiddle.uk/?rdbms=postgres_14&fiddle=e65c38677f3ab92607bbff778bc0f69e
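The arithmetic can be re-checked by hand with a short Python sketch. It uses only the twelve prices per source from 14:05 to 15:00 in the question's price_table; the variable names (`prices`, `expected`, `summary`) are made up for this check, and `Decimal` is used so the two-decimal values are summed exactly.

```python
from decimal import Decimal

# Prices from price_table for the 14:05 .. 15:00 timestamps (12 values per source).
prices = {
    "Source A": ["4.60"] * 2 + ["2.46"] * 3 + ["3.36"] * 3 + ["4.55"] * 3 + ["-21.87"],
    "Source B": ["18.09"] * 2 + ["16.66"] * 3 + ["21.52"] * 3 + ["16.30"] * 3 + ["4.96"],
}
# Averages the asker expected to see.
expected = {"Source A": Decimal("0.54"), "Source B": Decimal("17.38")}

summary = {}
for source, values in prices.items():
    nums = [Decimal(v) for v in values]
    total = sum(nums)
    average = (total / len(nums)).quantize(Decimal("0.01"))
    # How far the real total is from the total the expected average would need.
    gap = expected[source] * len(nums) - total
    summary[source] = (total, average, gap)
    print(source, "sum:", total, "avg:", average, "gap to expected sum:", gap)
```

The averages come out as 1.54 and 17.05, the same as the query result, and the gaps are -11.96 and 3.98, which is where the "about 12 less" and "about 4 more" above come from.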

Get time series at 8-hour intervals

I am generating a time series using the query below.
SELECT *
FROM (
    SELECT *
    FROM generate_series(
        date_trunc('hour', '2021-11-13 10:01:38'::timestamp),
        '2021-12-13 10:01:38'::timestamp,
        concat(480, ' minutes')::interval) AS t(time_ent)) AS t
WHERE t."time_ent" BETWEEN '2021-11-13 10:01:38'::timestamp AND '2021-12-13 10:01:38'::timestamp
It gives me output like this:
2021-11-13 18:00:00.000
2021-11-14 02:00:00.000
2021-11-14 10:00:00.000
2021-11-14 18:00:00.000
2021-11-15 02:00:00.000
But I need output like this:
2021-11-13 16:00:00.000
2021-11-14 00:00:00.000
2021-11-14 08:00:00.000
2021-11-14 16:00:00.000
2021-11-15 00:00:00.000
Currently, the hours in the series depend on the timestamp that I pass. Above, it gives me hours like 02, 10, 18, but I want hours like 00, 08, 16; the hours should not depend on the time I pass in the query. I tried many things without success.

Because the start of your generate_series is 10:00:00, the next step is 18:00:00.
You have to start the series from 00:00:00 (cast the start to date), e.g.:
SELECT
    time_ent::timestamp without time zone
FROM (
    SELECT *
    FROM generate_series(
        date_trunc('hour', '2021-11-13 10:01:38'::date),
        '2021-12-13 10:01:38'::timestamp,
        concat(480, ' minutes')::interval) AS t(time_ent)
) AS t
WHERE t."time_ent" BETWEEN '2021-11-13 10:01:38'::timestamp AND '2021-12-13 10:01:38'::timestamp
and the result will be:
2021-11-13 16:00:00.000
2021-11-14 00:00:00.000
2021-11-14 08:00:00.000
2021-11-14 16:00:00.000
2021-11-15 00:00:00.000
2021-11-15 08:00:00.000
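The same idea, anchoring the grid at midnight and only then filtering by the requested range, can be sketched in plain Python. This is just an illustration of the logic, not a replacement for the query; the function name `aligned_series` and its `step_hours` parameter are made up for this example.

```python
from datetime import datetime, timedelta


def aligned_series(start, end, step_hours=8):
    """Yield timestamps on a midnight-anchored grid that fall within [start, end]."""
    step = timedelta(hours=step_hours)
    # Anchor the grid at midnight of the start day, not at the start hour.
    tick = start.replace(hour=0, minute=0, second=0, microsecond=0)
    while tick <= end:
        if tick >= start:
            yield tick
        tick += step


start = datetime(2021, 11, 13, 10, 1, 38)
end = datetime(2021, 12, 13, 10, 1, 38)
series = list(aligned_series(start, end))
for tick in series[:5]:
    print(tick)
```

The first values are 16:00 on 2021-11-13, then 00:00, 08:00 and 16:00 on 2021-11-14, regardless of the time of day in `start`.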

SQL Server : update query doesn't change anything, shows no error

I am trying to update a table with values from the same table.
I want to change the connection setup in rows where the worker and client are the same and where the row's Connection setup started within 5 minutes after another connection (with the same worker and client) ended.
I first created a SELECT query that returns all the rows that need to be changed:
SELECT t.*
FROM Table1 t
WHERE EXISTS (SELECT 1
              FROM Table1
              WHERE worker = t.worker
                AND client = t.client
                AND t.SessionNo != SessionNo
                AND t.[Connection setup] <= DATEADD(mi, 5, [Connection end])
                AND t.[Connection setup] >= [Connection end])
Then I tried to use this logic inside an UPDATE query, but it didn't change anything, and it doesn't show any errors.
UPDATE t
SET t.Start = t2.Start
FROM Table1 t
INNER JOIN Table1 t2 ON (t.SessionNo = t2.SessionNo)
WHERE t.worker = t2.worker
  AND t.client = t2.client
  AND t2.SessionNo <> t.SessionNo
  AND t.[Connection setup] <= DATEADD(mi, 5, t2.[Connection end])
  AND t.[Connection setup] >= t2.[Connection end]
Example:
The first table shows the rows that should be changed. The right_time column shows the value they should have after the update.
    SessionNo  worker  Tag                  Start  Ende   Dauer  Connection setup     Connection end       client          right_time
1   424568     mh      09.01.2020 00:00:00  13:45  13:49  00:04  09.01.2020 13:45:00  09.01.2020 13:49:00  OBENAT1D0209    13:44
2   269650     mg      09.03.2020 00:00:00  10:25  10:47  00:21  09.03.2020 10:25:00  09.03.2020 10:47:00  OBENAT1D0117    10:24
3   280892     mg      09.03.2020 00:00:00  12:19  12:22  00:03  09.03.2020 12:19:00  09.03.2020 12:22:00  OBENAT1D0117    12:19
4   175250     mg      09.03.2020 00:00:00  13:12  13:13  00:01  09.03.2020 13:12:00  09.03.2020 13:13:00  ORTNERAT1D0001  13:04
5   332684     dg      09.05.2020 00:00:00  16:05  16:33  00:28  09.05.2020 16:05:00  09.05.2020 16:33:00  KILLYAT3D0102   15:57
But as you can see here, the Start column is still the same.
    SessionNo  worker  Tag                  Start  Ende   Dauer  Connection setup     Connection end       client          right_time
1   317045     mh      09.01.2020 00:00:00  09:29  09:38  00:09  09.01.2020 09:29:00  09.01.2020 09:38:00  AUMAAT1D0124    09:29
2   144035     sb      09.01.2020 00:00:00  11:09  11:27  00:18  09.01.2020 11:09:00  09.01.2020 11:27:00  OBENAT1D0231    11:09
3   437704     mh      09.01.2020 00:00:00  13:44  13:44  00:00  09.01.2020 13:44:00  09.01.2020 13:44:00  OBENAT1D0209    13:44
4   424568     mh      09.01.2020 00:00:00  13:45  13:49  00:04  09.01.2020 13:45:00  09.01.2020 13:49:00  OBENAT1D0209    13:44
5   219640     mh      09.01.2020 00:00:00  15:16  15:26  00:10  09.01.2020 15:16:00  09.01.2020 15:26:00  OBENAT1D0209    15:16
6   201023     mh      09.01.2020 00:00:00  16:29  16:35  00:06  09.01.2020 16:29:00  09.01.2020 16:35:00  OBENAT1D0209    16:29
7   236114     mg      09.03.2020 00:00:00  08:55  09:08  00:12  09.03.2020 08:55:00  09.03.2020 09:08:00  NULL            NULL
8   271379     mg      09.03.2020 00:00:00  10:24  10:25  00:00  09.03.2020 10:24:00  09.03.2020 10:25:00  OBENAT1D0117    10:24
9   269650     mg      09.03.2020 00:00:00  10:25  10:47  00:21  09.03.2020 10:25:00  09.03.2020 10:47:00  OBENAT1D0117    10:24
10  290765     mg      09.03.2020 00:00:00  12:19  12:19  00:00  09.03.2020 12:19:00  09.03.2020 12:19:00  OBENAT1D0117    12:19
11  280892     mg      09.03.2020 00:00:00  12:19  12:22  00:03  09.03.2020 12:19:00  09.03.2020 12:22:00  OBENAT1D0117    12:19
12  538583     mg      09.03.2020 00:00:00  12:30  12:58  00:28  09.03.2020 12:30:00  09.03.2020 12:58:00  RATTAYAT1D0107  NULL
13  697202     mg      09.03.2020 00:00:00  13:04  13:08  00:04  09.03.2020 13:04:00  09.03.2020 13:08:00  ORTNERAT1D0001  13:04
14  175250     mg      09.03.2020 00:00:00  13:12  13:13  00:01  09.03.2020 13:12:00  09.03.2020 13:13:00  ORTNERAT1D0001  13:04
15  330580     dg      09.05.2020 00:00:00  15:57  16:05  00:08  09.05.2020 15:57:00  09.05.2020 16:05:00  KILLYAT3D0102   15:57
16  332684     dg      09.05.2020 00:00:00  16:05  16:33  00:28  09.05.2020 16:05:00  09.05.2020 16:33:00  KILLYAT3D0102   15:57
NOTE: In this case, in order to test the values, I am changing the Start column instead of the Connection setup column.

You are updating zero rows, because of:
ON (t.SessionNo = t2.SessionNo)
...
AND t2.SessionNo <> t.SessionNo
You want to find rows with a different session number, but you join on t.SessionNo = t2.SessionNo, which is exactly what you don't want. The two conditions contradict each other, so no row pair can ever match.
You seem to think that a join needs a comparison with = on a single column, but this is not true. A join condition can be any boolean expression.
This may work for you:
UPDATE t
SET t.Start = t2.Start
FROM Table1 t
INNER JOIN Table1 t2 ON t.worker = t2.worker
                    AND t.client = t2.client
                    AND t.SessionNo <> t2.SessionNo
                    AND t.[Connection setup] <= DATEADD(mi, 5, t2.[Connection end])
                    AND t.[Connection setup] >= t2.[Connection end];
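To see which row pairs such a join condition selects, here is a rough Python sketch that applies the same boolean expression to three of the sample rows. It is only an illustration of the pairing logic, not of SQL Server's UPDATE behaviour. It assumes the six-digit number in the sample data is SessionNo and the two-letter code is worker; the names `sessions`, `ts` and `follow_ups` are hypothetical.

```python
from datetime import datetime, timedelta


def ts(text):
    """Parse the dd.mm.yyyy hh:mm:ss timestamps used in the sample data."""
    return datetime.strptime(text, "%d.%m.%Y %H:%M:%S")


# (session_no, worker, client, connection_setup, connection_end)
sessions = [
    (437704, "mh", "OBENAT1D0209", ts("09.01.2020 13:44:00"), ts("09.01.2020 13:44:00")),
    (424568, "mh", "OBENAT1D0209", ts("09.01.2020 13:45:00"), ts("09.01.2020 13:49:00")),
    (219640, "mh", "OBENAT1D0209", ts("09.01.2020 15:16:00"), ts("09.01.2020 15:26:00")),
]


def follow_ups(rows, window=timedelta(minutes=5)):
    """Pair each row t with a different session t2 of the same worker and client
    whose end is at most `window` before t's setup."""
    pairs = []
    for t in rows:
        for t2 in rows:
            same_worker_and_client = t[1] == t2[1] and t[2] == t2[2]
            different_session = t[0] != t2[0]
            starts_soon_after = t2[4] <= t[3] <= t2[4] + window
            if same_worker_and_client and different_session and starts_soon_after:
                pairs.append((t[0], t2[0]))
    return pairs


pairs = follow_ups(sessions)
print(pairs)
```

Only session 424568 is paired (with 437704, which ended one minute before it started). Requiring equal session numbers on top of `different_session` would leave the list empty, which is what happened in the original UPDATE.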

SQL : GROUP and MAX multiple columns

I am a SQL beginner. Can anyone please help me with a SQL query?
My table looks like this:
PatientID  Date       Time   Temperature
1          1/10/2020  9:15   36.2
1          1/10/2020  20:00  36.5
1          2/10/2020  8:15   36.1
1          2/10/2020  18:20  36.3
2          1/10/2020  9:15   36.7
2          1/10/2020  20:00  37.5
2          2/10/2020  8:15   37.1
2          2/10/2020  18:20  37.6
3          1/10/2020  8:15   36.2
3          2/10/2020  18:20  36.3
How can I get each patient's maximum temperature for each day?
PatientID  Date       Temperature
1          1/10/2020  36.5
1          2/10/2020  36.3
2          1/10/2020  37.5
2          2/10/2020  37.6
Thanks in advance!

For this dataset, simple aggregation seems sufficient:
select patientid, date, max(temperature) temperature
from mytable
group by patientid, date
On the other hand, if there are other columns that you want to display on the row that has the maximum daily temperature, then it is different. You need some filtering; one option uses window functions:
select *
from (
    select t.*,
        rank() over(partition by patientid, date order by temperature desc) as rn
    from mytable t
) t
where rn = 1
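As a cross-check of the "keep the whole row with the daily maximum" idea, here is a small Python sketch over the sample data. It is an assumption-laden illustration rather than a translation of the query: `daily_max_rows` is a made-up helper, and unlike rank(), which returns every tied row, it keeps only the first row seen for a tied maximum.

```python
# Sample rows from the question: (patient_id, date, time, temperature).
rows = [
    (1, "1/10/2020", "9:15", 36.2),
    (1, "1/10/2020", "20:00", 36.5),
    (1, "2/10/2020", "8:15", 36.1),
    (1, "2/10/2020", "18:20", 36.3),
    (2, "1/10/2020", "9:15", 36.7),
    (2, "1/10/2020", "20:00", 37.5),
    (2, "2/10/2020", "8:15", 37.1),
    (2, "2/10/2020", "18:20", 37.6),
    (3, "1/10/2020", "8:15", 36.2),
    (3, "2/10/2020", "18:20", 36.3),
]


def daily_max_rows(all_rows):
    """Return, per (patient_id, date), the full row with the highest temperature."""
    best = {}
    for row in all_rows:
        key = (row[0], row[1])
        # Replace the stored row only when a strictly higher temperature shows up.
        if key not in best or row[3] > best[key][3]:
            best[key] = row
    return list(best.values())


result = daily_max_rows(rows)
for row in result:
    print(*row)
```

Each output row still carries the time of the reading, which is the extra column a plain max() with group by would lose.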

Create an event log from an excel file by turning columns into repeated rows

I have an Excel sheet like the following:
ID  Arrival Passed    Berthing Date     UnBerthing Date   Departure Passed
1   13/05/2017 15:30  13/05/2017 16:00  31/05/2017 20:44  31/05/2017
2   15/05/2017 16:56  15/05/2017 17:15  16/05/2017 00:00  16/05/2017
3   20/05/2017 09:54  20/05/2017 10:26  20/05/2017 18:07  20/05/2017
4   24/05/2017 16:09  24/05/2017 16:35  25/05/2017 01:03  25/05/2017
5   29/05/2017 10:30  29/05/2017 10:45  29/05/2017 17:33  29/05/2017
I need this in the following format:
ID  Event      Time
1   Arrival    13/05/2017 15:30
1   Berth      13/05/2017 16:00
1   UnBerth    31/05/2017 20:44
1   Departure  31/05/2017 20:58
2   Arrival    15/05/2017 16:56
2   Berth      15/05/2017 17:15
2   UnBerth    16/05/2017 00:00
2   Departure  16/05/2017 00:04
etc
I've searched the web and this site (and YouTube), but found no answer that works. I've tried the TRANSPOSE function and a pivot table, but I couldn't get them to do it.
Any help would be appreciated.
Thank you.

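Assuming that your dataset is in range A2:E6. Each ID produces four output rows, so ROWS($A$1:A1) counts the current output row, CEILING(.../4,1) picks the source row and MOD(...-1,4) picks which of the four events it is.
For getting ID:
=INDEX($A$2:$E$6,CEILING(ROWS($A$1:A1)/4,1),1)
For getting Event:
=CHOOSE(MOD(ROWS($A$1:A1)-1,4)+1,"Arrival","Berth","UnBerth","Departure")
For getting Time:
=INDEX($A$2:$E$6,CEILING(ROWS($A$1:A1)/4,1),MOD(ROWS($A$1:A1)-1,4)+2)
Then copy the formulas down until you get an error.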
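If the sheet can be exported, the same reshaping is short in Python as well. This is only a sketch of the wide-to-long idea, not part of the Excel answer: the names `wide`, `EVENTS` and `to_event_log` are made up, and the departure times are taken from the desired output in the question because the wide table as posted shows only the date.

```python
# Event labels in the order of the four timestamp columns.
EVENTS = ("Arrival", "Berth", "UnBerth", "Departure")

# First two rows of the sheet: (id, arrival, berthing, unberthing, departure).
wide = [
    (1, "13/05/2017 15:30", "13/05/2017 16:00", "31/05/2017 20:44", "31/05/2017 20:58"),
    (2, "15/05/2017 16:56", "15/05/2017 17:15", "16/05/2017 00:00", "16/05/2017 00:04"),
]


def to_event_log(rows):
    """Turn each wide row into one (id, event, time) row per timestamp column."""
    return [
        (row[0], event, time)
        for row in rows
        for event, time in zip(EVENTS, row[1:])
    ]


log = to_event_log(wide)
for entry in log:
    print(*entry)
```

Each input row becomes four output rows in column order, which is the same mapping the CEILING and MOD formulas compute from the output row number.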
