Getting wrong(?) average when calculating values in a time range - sql

I am working with AWS Redshift / PostgreSQL. I have two tables that can be joined on interval_date (DATE data type) and interval_time_utc (VARCHAR data type) and/or on the status and price_source columns. Source A is equivalent to the Y status and Source B is equivalent to the N status. I am trying to get the average price and the sum of mw_power for a given hour for each status / price_source. An hour covers the timestamps from XX:05 to XX:00, so for 15:00 the values should come from the 14:05 through 15:00 timestamps. Even for an hour in which every row has the same status, I still need the average price for both price_source values; the sum of mw_power for the status with no rows would be 0. I pass the date and time intervals in from my application code. I am seeing a different average price for the 15:00 hour than I expect, so either I am bad at math or there is a bug in my query that I can't find. The 14:00 and 16:00 hour results come back as expected.
power_table
interval_date  interval_time_utc  mw_power  status
2022-05-09     13:00              92.25     N
2022-05-09     13:05              90.75     N
2022-05-09     13:10              91.25     N
2022-05-09     13:15              92.00     N
2022-05-09     13:20              92.00     N
2022-05-09     13:25              90.00     N
2022-05-09     13:30              93.00     N
2022-05-09     13:35              91.75     N
2022-05-09     13:40              90.25     N
2022-05-09     13:45              93.00     N
2022-05-09     13:50              91.00     N
2022-05-09     13:55              94.00     N
2022-05-09     14:00              91.00     N
2022-05-09     14:05              91.00     N
2022-05-09     14:10              94.00     N
2022-05-09     14:15              92.00     N
2022-05-09     14:20              91.00     N
2022-05-09     14:25              94.00     Y
2022-05-09     14:30              92.00     Y
2022-05-09     14:35              91.75     Y
2022-05-09     14:40              92.25     Y
2022-05-09     14:45              91.00     Y
2022-05-09     14:50              92.00     Y
2022-05-09     14:55              93.00     Y
2022-05-09     15:00              90.00     Y
price_table
interval_date  interval_time_utc  price   price_source
2022-05-09     13:00              54.20   Source A
2022-05-09     13:05              54.20   Source A
2022-05-09     13:10              54.20   Source A
2022-05-09     13:00              54.20   Source B
2022-05-09     13:05              54.20   Source B
2022-05-09     13:10              54.20   Source B
2022-05-09     13:15              34.11   Source A
2022-05-09     13:20              34.11   Source A
2022-05-09     13:25              34.11   Source A
2022-05-09     13:15              39.61   Source B
2022-05-09     13:20              39.61   Source B
2022-05-09     13:25              39.61   Source B
2022-05-09     13:30              2.81    Source A
2022-05-09     13:35              2.81    Source A
2022-05-09     13:40              2.81    Source A
2022-05-09     13:30              17.13   Source B
2022-05-09     13:35              17.13   Source B
2022-05-09     13:40              17.13   Source B
2022-05-09     13:45              1.58    Source A
2022-05-09     13:50              1.58    Source A
2022-05-09     13:55              1.58    Source A
2022-05-09     13:45              15.98   Source B
2022-05-09     13:50              15.98   Source B
2022-05-09     13:55              15.98   Source B
2022-05-09     14:00              4.60    Source A
2022-05-09     14:05              4.60    Source A
2022-05-09     14:10              4.60    Source A
2022-05-09     14:00              18.09   Source B
2022-05-09     14:05              18.09   Source B
2022-05-09     14:10              18.09   Source B
2022-05-09     14:15              2.46    Source A
2022-05-09     14:20              2.46    Source A
2022-05-09     14:25              2.46    Source A
2022-05-09     14:15              16.66   Source B
2022-05-09     14:20              16.66   Source B
2022-05-09     14:25              16.66   Source B
2022-05-09     14:30              3.36    Source A
2022-05-09     14:35              3.36    Source A
2022-05-09     14:40              3.36    Source A
2022-05-09     14:30              21.52   Source B
2022-05-09     14:35              21.52   Source B
2022-05-09     14:40              21.52   Source B
2022-05-09     14:45              4.55    Source A
2022-05-09     14:50              4.55    Source A
2022-05-09     14:55              4.55    Source A
2022-05-09     14:45              16.30   Source B
2022-05-09     14:50              16.30   Source B
2022-05-09     14:55              16.30   Source B
2022-05-09     15:00              -21.87  Source A
2022-05-09     15:00              4.96    Source B
-- query that I am using to get hourly values
SELECT pricet.price_source,
       COALESCE(powert.volume, 0),
       pricet.price,
       powert.status
FROM (SELECT status,
             SUM(mw_power) volume
      FROM power_table
      WHERE (interval_date || ' ' || interval_time_utc)::timestamp BETWEEN '2022-05-09 14:05:00.0' AND '2022-05-09 15:00:00.0'
      GROUP BY status) powert
RIGHT JOIN (SELECT price_source,
                   AVG(price) price
            FROM price_table
            WHERE (interval_date || ' ' || interval_time_utc)::timestamp BETWEEN '2022-05-09 14:05:00.0' AND '2022-05-09 15:00:00.0'
            GROUP BY price_source) pricet
ON pricet.price_source = CASE WHEN powert.status = 'Y' THEN 'Source A'
                              ELSE 'Source B'
                         END;
I am looking to get an expected output of the following for the 15:00 hour:
price_source  volume  price  status
Source A      736.00  0.54   Y
Source B      368.00  17.38  N
Result that I'm getting from query:
price_source  volume  price  status
Source A      736.00  1.54   Y
Source B      368.00  17.05  N
db fiddle link of tables and query and results: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=474b009c5cf5366961751a61c0f96c6c

I think you made a calculator error. I changed your fiddle to add a rolling sum and rolling average for the second part of your query. To get an average of 0.54 (Source A), your sum would need to be 12 less than the actual total of the values for this hour. 12 is also the count of values for the hour, so perhaps you subtracted 12 by mistake before dividing by 12?
For the other source (B), the total would need to be off by 4 (an addition of 4 to the sum). Not sure how this could have happened, but ...
Anyway the fiddle is at https://dbfiddle.uk/?rdbms=postgres_14&fiddle=e65c38677f3ab92607bbff778bc0f69e
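The arithmetic can also be checked outside the database. Here is a minimal sketch in plain Python, using the twelve price_table values per source that fall in the 14:05 to 15:00 window (the variable and function names are only illustrative, not part of the original query):

```python
# Prices from price_table for 14:05 through 15:00 on 2022-05-09 (12 rows per source).
source_a = [4.60, 4.60, 2.46, 2.46, 2.46, 3.36, 3.36, 3.36, 4.55, 4.55, 4.55, -21.87]
source_b = [18.09, 18.09, 16.66, 16.66, 16.66, 21.52, 21.52, 21.52, 16.30, 16.30, 16.30, 4.96]


def hourly_average(prices):
    """Plain arithmetic mean, rounded to two decimals like the query output."""
    return round(sum(prices) / len(prices), 2)


avg_a = hourly_average(source_a)  # 1.54, same as the query result
avg_b = hourly_average(source_b)  # 17.05, same as the query result

# How far each sum would have to move to give the expected 0.54 and 17.38.
gap_a = round(sum(source_a) - 0.54 * len(source_a), 2)   # 11.96, i.e. about 12 too high
gap_b = round(17.38 * len(source_b) - sum(source_b), 2)  # 3.98, i.e. about 4 too low
```

Both averages agree with what the query returns, which supports the idea that the slip is in the hand calculation rather than in the SQL.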

Related

SQL query pivot values to new columns at change in value

I am looking for help with a query relating to staff clocking in/out.
My code is currently:
SELECT CAST(EVENTTIME AS Date) AS Date, FORMAT(EVENTTIME, 'HH:mm') AS Time,SUM(UserID) AS UserID, FirstName + Space(1) + Surname, EventSubTypeDescription
FROM EventsEx
WHERE EventTime >= '2022-05-09' AND (PeripheralName ='clock (In)' OR PeripheralName ='clock (Out)')
GROUP BY userid, EventTime, FirstName, Surname, EventSubTypeDescription
ORDER BY Date,UserID, Time ASC
Which results in:
Date        Time   UserID  UserName  EventSubTypeDescription
2022-05-09  07:53  393     Jennifer  Clock in
2022-05-09  13:33  393     Jennifer  Clock out
2022-05-09  14:06  393     Jennifer  Clock in
2022-05-09  16:57  393     Jennifer  Clock out
2022-05-09  07:59  401     agency 2  Clock in
2022-05-09  12:58  401     agency 2  Clock out
2022-05-09  13:27  401     agency 2  Clock in
2022-05-09  16:56  401     agency 2  Clock out
2022-05-09  07:57  422     Tash      Clock in
2022-05-09  13:56  422     Tash      Clock out
2022-05-09  07:58  432     agency 4  Clock in
2022-05-09  13:00  432     agency 4  Clock out
2022-05-09  13:30  432     agency 4  Clock in
2022-05-09  16:56  432     agency 4  Clock out
2022-05-09  07:57  434     Jordan    Clock in
2022-05-09  13:32  434     Jordan    Clock out
2022-05-09  14:03  434     Jordan    Clock in
2022-05-09  16:59  434     Jordan    Clock out
2022-05-09  07:59  438     Adam      Clock in
2022-05-09  12:59  438     Adam      Clock out
2022-05-09  13:29  438     Adam      Clock in
2022-05-09  16:56  438     Adam      Clock out
Each user clocks in and out during the day. I need to move the Times to separate columns therefore each user has one row per day.
Date        UserID  Username  EventSubTypeDescription  Clock in  Clock Out  Clock in 2  Clock out 2
09/05/2022  393     Jennifer  Clock in                 07:53     13:33      14:06       16:57
09/05/2022  401     agency 2  Clock in                 07:59     12:58      13:27       16:56
09/05/2022  422     Tash      Clock in                 07:57     13:56
09/05/2022  432     agency 4  Clock in                 07:58     13:00      13:30       16:56
09/05/2022  434     Jordan    Clock in                 07:57     13:32      14:03       16:59
09/05/2022  438     Adam      Clock in                 07:59     12:59      13:29       16:56
Starting from your current output table, you can assign a column number to each row by partitioning on the date and user and ordering by time. Then extract the values for clock in, clock out, clock in 2 and clock out 2 separately, using the MAX function over a CASE expression:
SELECT Date,
       UserID,
       Username,
       MAX(CASE WHEN ColNum=1 THEN Time ELSE NULL END) AS Clock_in,
       MAX(CASE WHEN ColNum=2 THEN Time ELSE NULL END) AS Clock_out,
       MAX(CASE WHEN ColNum=3 THEN Time ELSE NULL END) AS Clock_in_2,
       MAX(CASE WHEN ColNum=4 THEN Time ELSE NULL END) AS Clock_out_2
FROM (SELECT *,
             ROW_NUMBER() OVER(PARTITION BY Date, UserID
                               ORDER BY Time) ColNum
      FROM output_table) ranked_clock_events
GROUP BY Date,
         UserID,
         Username
ORDER BY UserID
Try it here.
Side Note 1: to combine this with your own query, it's sufficient to put your query in place of output_table (as a subquery).
Side Note 2: a better query would avoid the intermediate result set (output_table) altogether; ping me here if you can share the EventsEx table.
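To make the numbering-then-pivoting idea concrete, here is a rough plain-Python sketch of the same two steps on a few rows from the question. It is only an illustration of the logic, not a replacement for the SQL; the function name pivot_clock_events and the slots parameter are made up for this example.

```python
from collections import defaultdict

# A few rows in the shape of the question's output: (date, time, user_id, user_name).
events = [
    ("2022-05-09", "07:53", 393, "Jennifer"),
    ("2022-05-09", "13:33", 393, "Jennifer"),
    ("2022-05-09", "14:06", 393, "Jennifer"),
    ("2022-05-09", "16:57", 393, "Jennifer"),
    ("2022-05-09", "07:57", 422, "Tash"),
    ("2022-05-09", "13:56", 422, "Tash"),
]


def pivot_clock_events(rows, slots=4):
    """Number each user's events per day by time, then spread them over columns."""
    grouped = defaultdict(list)
    for date, time, user_id, user_name in rows:
        grouped[(date, user_id, user_name)].append(time)  # PARTITION BY Date, UserID

    result = []
    # Sort the groups by user id, like ORDER BY UserID in the query.
    for (date, user_id, user_name), times in sorted(grouped.items(), key=lambda kv: kv[0][1]):
        ordered = sorted(times)                      # ORDER BY Time ("HH:mm" sorts as text)
        padded = (ordered + [None] * slots)[:slots]  # missing events stay None, like NULL
        result.append((date, user_id, user_name, *padded))
    return result


pivoted = pivot_clock_events(events)
# Jennifer gets four times; Tash gets two times followed by two None values.
```

A user with fewer than four events simply ends up with empty trailing columns, which is what the MAX(CASE ...) expressions produce as NULL.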

Create an event log from an excel file by turning columns into repeated rows

I have an Excel sheet like the following:
ID Arrival Passed Berthing Date UnBerthing Date Departure Passed
1 13/05/2017 15:30 13/05/2017 16:00 31/05/2017 20:44 31/05/2017
2 15/05/2017 16:56 15/05/2017 17:15 16/05/2017 00:00 16/05/2017
3 20/05/2017 09:54 20/05/2017 10:26 20/05/2017 18:07 20/05/2017
4 24/05/2017 16:09 24/05/2017 16:35 25/05/2017 01:03 25/05/2017
5 29/05/2017 10:30 29/05/2017 10:45 29/05/2017 17:33 29/05/2017
I need this in the following format:
ID Event Time
1 Arrival 13/05/2017 15:30
1 Berth 13/05/2017 16:00
1 UnBerth 31/05/2017 20:44
1 Departure 31/05/2017 20:58
2 Arrival 15/05/2017 16:56
2 Berth 15/05/2017 17:15
2 UnBerth 16/05/2017 00:00
2 Departure 16/05/2017 00:04
etc
I've searched the web and this site (and YouTube), but found no suitable answer. I've tried the TRANSPOSE function and a pivot table, but I couldn't make it work.
Any help would be appreciated.
Thank you.
Assuming that your dataset is in range A2:E6.
For getting ID:
=INDEX($A$2:$E$6,CEILING(ROWS($A$1:A1)/4,1),1)
For getting Event:
=CHOOSE(MOD(ROWS($A$1:A1)-1,4)+1,"Arrival","Berth","Unberth","Departure")
For getting Time:
=INDEX($A$2:$E$6,CEILING(ROWS($A$1:A1)/4,1),MOD(ROWS($A$1:A1)-1,4)+2)
and then copy each formula down until you get an error.
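The index arithmetic in those formulas may be easier to follow when written out. Below is a small Python sketch of the same mapping, where n stands in for ROWS($A$1:A1), i.e. the 1-based number of the output row. The two sample rows use the values from the desired output in the question; the names sheet, EVENTS and unpivot are assumptions for this illustration.

```python
import math

# Two source rows, with values taken from the desired output in the question.
# Columns: ID, Arrival, Berthing, UnBerthing, Departure.
sheet = [
    [1, "13/05/2017 15:30", "13/05/2017 16:00", "31/05/2017 20:44", "31/05/2017 20:58"],
    [2, "15/05/2017 16:56", "15/05/2017 17:15", "16/05/2017 00:00", "16/05/2017 00:04"],
]
EVENTS = ["Arrival", "Berth", "UnBerth", "Departure"]


def unpivot(rows):
    """Output row n reads source row ceil(n/4) and column (n-1) mod 4 + 2."""
    out = []
    for n in range(1, len(rows) * len(EVENTS) + 1):
        src_row = math.ceil(n / 4)               # CEILING(n/4, 1), 1-based row
        event_idx = (n - 1) % 4                  # MOD(n-1, 4), 0-based event
        record_id = rows[src_row - 1][0]         # INDEX(range, src_row, 1)
        time = rows[src_row - 1][event_idx + 1]  # INDEX(range, src_row, event_idx + 2)
        out.append((record_id, EVENTS[event_idx], time))
    return out


event_log = unpivot(sheet)
```

Each source row is visited four times in a row, once per event column, which is why the formulas divide by 4 for the row and take the remainder for the column.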

How to group field by id and find the sum?

I have the following data
id starting_point ending_point Date
A 2525 6565 25/05/2017 13:25:00
B 5656 8989 25/01/2017 10:55:00
A 1234 5656 20/05/2017 03:20:00
A 4562 6245 01/02/2017 19:45:00
B 6496 9999 06/12/2016 21:55:00
B 1122 2211 20/03/2017 18:30:00
How can I group the data by id, in ascending order of date, and find the sum of the first starting point and the last ending point? In this case,
Expected output is :
id starting_point ending_point Date Value
A 4562 6245 01/02/2017 19:45:00
A 1234 5656 20/05/2017 03:20:00
A 2525 6565 25/05/2017 13:25:00 4562 + 6565 = 11127
B 6496 9999 06/12/2016 21:55:00
B 1122 2211 20/03/2017 18:30:00 6496 + 2211 = 8707
IIUC:
In [146]: x.groupby('id').apply(lambda df: df['starting_point'].head(1).values[0]
+ df['ending_point'].tail(1).values[0])
Out[146]:
id
A 8770
B 7867
dtype: int64
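Note that the values in the Out above (8770 and 7867) match the rows taken in their listed order (2525 + 6245 and 5656 + 2211); the expected values in the question need the rows sorted by date first. Here is a minimal plain-Python sketch of that, without pandas, assuming the dates are day-first strings as shown (the function name is only illustrative):

```python
from datetime import datetime

# Rows from the question: (id, starting_point, ending_point, date).
rows = [
    ("A", 2525, 6565, "25/05/2017 13:25:00"),
    ("B", 5656, 8989, "25/01/2017 10:55:00"),
    ("A", 1234, 5656, "20/05/2017 03:20:00"),
    ("A", 4562, 6245, "01/02/2017 19:45:00"),
    ("B", 6496, 9999, "06/12/2016 21:55:00"),
    ("B", 1122, 2211, "20/03/2017 18:30:00"),
]


def first_start_plus_last_end(records):
    """Per id: sort by date, then add the earliest starting_point to the latest ending_point."""
    by_id = {}
    for rec_id, start, end, date_text in records:
        when = datetime.strptime(date_text, "%d/%m/%Y %H:%M:%S")
        by_id.setdefault(rec_id, []).append((when, start, end))

    totals = {}
    for rec_id, items in by_id.items():
        items.sort(key=lambda item: item[0])         # ascending by date
        totals[rec_id] = items[0][1] + items[-1][2]  # first start + last end
    return totals


totals = first_start_plus_last_end(rows)  # {'A': 11127, 'B': 8707}
```

In pandas the equivalent step would be to parse the Date column and sort by it before the groupby.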

SQL Server SUM(Values)

I have the following query that works perfectly well. The query sums the values in a given day.
SELECT
SUM(fldValue) AS 'kWh',
DAY(fldDateTime) AS 'Day',
MONTH(fldDateTime) AS 'Month',
YEAR(fldDateTime) AS 'Year'
FROM
[Data.tblData]
WHERE
tblData_Id IN (SELECT DISTINCT tblData_Id
FROM [Data.tblData])
GROUP BY
YEAR(fldDateTime), MONTH(fldDateTime), DAY(fldDateTime),
tblData_Id,fldDateTime
ORDER BY
YEAR(fldDateTime), MONTH(fldDateTime), DAY(fldDateTime)
The problem I have is that it sums from midnight to midnight; I need it to sum the values after midnight ( >= Midnight) up to and including midnight of the next day. The reason for this is that the data that comes in for a day is always stamped after midnight. For example, the first logged data will be '2016-01-01 00:01:00' and the final logged data will be '2016-01-02 00:00:00'. This is how the hardware that sends me the data works.
I would like to know how to encapsulate >= midnight to midnight in the query.
Dataset:
DateTime Value
20/03/2016 00:30 69.00
20/03/2016 01:00 69.00
20/03/2016 01:30 69.00
20/03/2016 02:00 69.00
20/03/2016 02:30 69.00
20/03/2016 03:00 69.00
20/03/2016 03:30 11.88
20/03/2016 04:00 0.52
20/03/2016 04:30 1.51
20/03/2016 05:00 2.22
20/03/2016 05:30 2.11
20/03/2016 06:00 0.05
20/03/2016 06:30 6.78
20/03/2016 07:00 14.79
20/03/2016 07:30 1.57
20/03/2016 08:00 1.51
20/03/2016 08:30 4.81
20/03/2016 09:00 0.11
20/03/2016 09:30 8.99
20/03/2016 10:00 10.06
20/03/2016 10:30 15.28
20/03/2016 11:00 3.22
20/03/2016 11:30 1.73
20/03/2016 12:00 19.10
20/03/2016 12:30 2.08
20/03/2016 13:00 2.61
20/03/2016 13:30 0.84
20/03/2016 14:00 8.65
20/03/2016 14:30 2.37
20/03/2016 15:00 16.34
20/03/2016 15:30 12.66
20/03/2016 16:00 2.64
20/03/2016 16:30 0.19
20/03/2016 17:00 3.91
20/03/2016 17:30 2.39
20/03/2016 18:00 0.57
20/03/2016 18:30 1.30
20/03/2016 19:00 5.06
20/03/2016 19:30 17.45
20/03/2016 20:00 13.04
20/03/2016 20:30 5.00
20/03/2016 21:00 7.47
20/03/2016 21:30 5.09
20/03/2016 22:00 0.33
20/03/2016 22:30 5.29
20/03/2016 23:00 15.33
20/03/2016 23:30 5.39
21/03/2016 00:00 6.74
Thank you in advance.
The expected sum output value for 20/03/2016 is: 662.98
The output table will look like:
SumValue Day Month Year Meter Id
659.18 20 3 2016 6
251.37 21 3 2016 6
279.03 22 3 2016 6
280.03 23 3 2016 6
284.22 24 3 2016 6
310.12 25 3 2016 6
320.84 26 3 2016 6
269.29 27 3 2016 6
276.11 28 3 2016 6
279.11 29 3 2016 6
The value column is the sum of the values for that day, made up of lots of individual times.
Use the below query for summing up the midnight value with previous day.
SELECT
SUM(fldValue) AS 'kWh',
CASE WHEN CONVERT(VARCHAR(8), fldDateTime, 108)='00:00:00' THEN DAY(fldDateTime)-1 ELSE DAY(fldDateTime) END AS 'Day',
MONTH(fldDateTime) AS 'Month',
YEAR(fldDateTime) AS 'Year'
FROM
Data.[tblData]
GROUP BY
YEAR(fldDateTime), MONTH(fldDateTime),CASE WHEN CONVERT(VARCHAR(8), fldDateTime, 108)='00:00:00' THEN DAY(fldDateTime)-1 ELSE DAY(fldDateTime) END
ORDER BY
YEAR(fldDateTime), MONTH(fldDateTime), CASE WHEN CONVERT(VARCHAR(8), fldDateTime, 108)='00:00:00' THEN DAY(fldDateTime)-1 ELSE DAY(fldDateTime) END
First, I have no idea what the WHERE clause is doing, so I'm going to remove it.
Second, don't use single quotes for column names.
Third, your GROUP BY clause is too complicated. You only need to include the unaggregated columns in the SELECT.
Finally, the key idea is to subtract one hour from the values everywhere they are used. Here is a simple method:
SELECT SUM(fldValue) AS kWh,
DAY(newdt) AS [Day],
MONTH(newdt) AS [Month],
YEAR(newdt) AS [Year]
FROM (SELECT d.*, DATEADD(hour, -1, fldDateTime) as newdt
FROM Data.tblData d
) d
GROUP BY YEAR(newdt), MONTH(newdt), DAY(newdt)
ORDER BY YEAR(newdt), MONTH(newdt), DAY(newdt)
Same answer as @Gordon, but you can subtract one minute instead of one hour.
SELECT SUM(fldValue) AS kWh,
DAY(newdt) AS [Day],
MONTH(newdt) AS [Month],
YEAR(newdt) AS [Year]
FROM (SELECT d.*, DATEADD(minute, -1, fldDateTime) as newdt
FROM Data.tblData d
) d
GROUP BY YEAR(newdt), MONTH(newdt), DAY(newdt)
ORDER BY YEAR(newdt), MONTH(newdt), DAY(newdt)
declare @tempTable table ([DateTime] datetime, Value float)
insert into @tempTable ([DateTime], [Value])
select convert(datetime,'20/03/2016 00:30',103), 69.00 union all
select convert(datetime,'20/03/2016 01:00',103), 69.00 union all
select convert(datetime,'21/03/2016 00:00',103), 6.74
select * from @tempTable
select [sum] = SUM(value), [year] = year(DT), [month] = month(DT), [day] = day(DT)
from (select Value, DT = dateadd(second, -1, [DateTime]) from @tempTable) x
group by year(DT), month(DT), day(DT)
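The effect of shifting the timestamp before grouping can be seen on the three sample readings from the last snippet. This is a small Python sketch of the idea only; the function name daily_totals and the default one-minute shift are assumptions, and any shift smaller than the logging interval should behave the same way.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# The three sample readings used in the last snippet above.
readings = [
    ("20/03/2016 00:30", 69.00),
    ("20/03/2016 01:00", 69.00),
    ("21/03/2016 00:00", 6.74),
]


def daily_totals(rows, shift=timedelta(minutes=1)):
    """Sum values per day after moving each timestamp back by `shift`,
    so a reading stamped exactly at midnight counts toward the day that just ended."""
    totals = defaultdict(float)
    for stamp, value in rows:
        shifted = datetime.strptime(stamp, "%d/%m/%Y %H:%M") - shift
        totals[shifted.date()] += value
    return dict(totals)


totals = daily_totals(readings)
# All three readings land on 2016-03-20, including the one stamped 21/03/2016 00:00.
```

Without the shift, the 00:00 reading would be grouped under 21/03/2016 and the total for the 20th would come out too low.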

How to calculate time difference of rows using Lag/Lead in DB2?

I have the following table:
id Date Hour Description Username
1 2015-05-13 10:08 SessionClosed Thierry
2 2015-05-12 23:30 SessionClosed Leao
3 2015-05-12 20:50 SessionOpened Thierry
4 2015-05-11 17:10 SessionOpened Leao
How can I calculate the difference in time of each user's session?
I'm using DB2.
The result should look like this:
id Date Hour Description Username DiffTime
1 2015-05-13 10:08 SessionClosed Thierry 14:18
2 2015-05-12 23:30 SessionClosed Leao 30:20
3 2015-05-12 20:50 SessionOpened Thierry 00:00
4 2015-05-11 17:10 SessionOpened Leao 00:00
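As a way of checking the expected DiffTime values, here is a plain-Python sketch of the calculation: for each row, take the time elapsed since the same user's previous event, which is roughly what a LAG over rows partitioned by Username and ordered by timestamp would provide. The function name is made up for this example. For Thierry the elapsed time from 2015-05-12 20:50 to 2015-05-13 10:08 works out to 13:18 rather than the 14:18 listed in the table above.

```python
from datetime import datetime

# Rows from the question: (id, date, hour, description, username).
sessions = [
    (1, "2015-05-13", "10:08", "SessionClosed", "Thierry"),
    (2, "2015-05-12", "23:30", "SessionClosed", "Leao"),
    (3, "2015-05-12", "20:50", "SessionOpened", "Thierry"),
    (4, "2015-05-11", "17:10", "SessionOpened", "Leao"),
]


def session_durations(rows):
    """For each row, the time elapsed since the same user's previous event, as HH:MM."""
    stamped = sorted(
        (datetime.strptime(f"{d} {h}", "%Y-%m-%d %H:%M"), row_id, user)
        for row_id, d, h, _desc, user in rows
    )
    previous = {}  # username -> timestamp of that user's previous event
    result = {}
    for when, row_id, user in stamped:
        if user in previous:
            minutes = int((when - previous[user]).total_seconds() // 60)
        else:
            minutes = 0  # first event for this user, nothing to compare against
        result[row_id] = f"{minutes // 60:02d}:{minutes % 60:02d}"
        previous[user] = when
    return result


diffs = session_durations(sessions)
```

Hours are not wrapped at 24, so a session that spans more than a day shows up as, for example, 30:20.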