SQL query pivot values to new columns at change in value - sql

I am looking for help with a query relating to staff clocking in/out.
My code is currently:
SELECT CAST(EVENTTIME AS Date) AS Date,
       FORMAT(EVENTTIME, 'HH:mm') AS Time,
       SUM(UserID) AS UserID,
       FirstName + Space(1) + Surname,
       EventSubTypeDescription
FROM EventsEx
WHERE EventTime >= '2022-05-09'
  AND (PeripheralName = 'clock (In)' OR PeripheralName = 'clock (Out)')
GROUP BY userid, EventTime, FirstName, Surname, EventSubTypeDescription
ORDER BY Date, UserID, Time ASC
Which results in:
Date        Time   UserID  UserName  EventSubTypeDescription
2022-05-09  07:53  393     Jennifer  Clock in
2022-05-09  13:33  393     Jennifer  Clock out
2022-05-09  14:06  393     Jennifer  Clock in
2022-05-09  16:57  393     Jennifer  Clock out
2022-05-09  07:59  401     agency 2  Clock in
2022-05-09  12:58  401     agency 2  Clock out
2022-05-09  13:27  401     agency 2  Clock in
2022-05-09  16:56  401     agency 2  Clock out
2022-05-09  07:57  422     Tash      Clock in
2022-05-09  13:56  422     Tash      Clock out
2022-05-09  07:58  432     agency 4  Clock in
2022-05-09  13:00  432     agency 4  Clock out
2022-05-09  13:30  432     agency 4  Clock in
2022-05-09  16:56  432     agency 4  Clock out
2022-05-09  07:57  434     Jordan    Clock in
2022-05-09  13:32  434     Jordan    Clock out
2022-05-09  14:03  434     Jordan    Clock in
2022-05-09  16:59  434     Jordan    Clock out
2022-05-09  07:59  438     Adam      Clock in
2022-05-09  12:59  438     Adam      Clock out
2022-05-09  13:29  438     Adam      Clock in
2022-05-09  16:56  438     Adam      Clock out
Each user clocks in and out several times during the day. I need to move the times into separate columns so that each user has one row per day:
Date        UserID  Username  EventSubTypeDescription  Clock in  Clock Out  Clock in 2  Clock out 2
09/05/2022  393     Jennifer  Clock in                 07:53     13:33      14:06       16:57
09/05/2022  401     agency 2  Clock in                 07:59     12:58      13:27       16:56
09/05/2022  422     Tash      Clock in                 07:57     13:56
09/05/2022  432     agency 4  Clock in                 07:58     13:00      13:30       16:56
09/05/2022  434     Jordan    Clock in                 07:57     13:32      14:03       16:59
09/05/2022  438     Adam      Clock in                 07:59     12:59      13:29       16:56

Starting from your current output table, you can assign a column number to each row by partitioning on the date and the user and ordering by time. Then extract the values for clock in, clock out, clock in 2 and clock out 2 separately, using the MAX function over a CASE expression:
SELECT Date,
       UserID,
       Username,
       MAX(CASE WHEN ColNum = 1 THEN Time ELSE NULL END) AS Clock_in,
       MAX(CASE WHEN ColNum = 2 THEN Time ELSE NULL END) AS Clock_out,
       MAX(CASE WHEN ColNum = 3 THEN Time ELSE NULL END) AS Clock_in_2,
       MAX(CASE WHEN ColNum = 4 THEN Time ELSE NULL END) AS Clock_out_2
FROM (SELECT *,
             ROW_NUMBER() OVER(PARTITION BY Date, UserID
                               ORDER BY Time) ColNum
      FROM output_table) ranked_clock_events
GROUP BY Date,
         UserID,
         Username
ORDER BY UserID
Side Note 1: to use this with your own query, it's sufficient to put your query in place of output_table.
Side Note 2: for a better query that avoids the intermediate result set (output_table), ping me here if you are able to share the EventsEx table.
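As a rough way to try the ROW_NUMBER plus MAX(CASE) pattern without a SQL Server instance, the sketch below runs it from Python against an in-memory SQLite database. This is only an illustration under assumptions: SQLite (3.25 or newer, needed for window functions) stands in for SQL Server, only two users from the question's sample output are loaded, and the helper name pivot_clock_events is made up for this example.

```python
import sqlite3

# Sample rows copied from the question's current output (two users only).
ROWS = [
    ("2022-05-09", "07:53", 393, "Jennifer", "Clock in"),
    ("2022-05-09", "13:33", 393, "Jennifer", "Clock out"),
    ("2022-05-09", "14:06", 393, "Jennifer", "Clock in"),
    ("2022-05-09", "16:57", 393, "Jennifer", "Clock out"),
    ("2022-05-09", "07:57", 422, "Tash", "Clock in"),
    ("2022-05-09", "13:56", 422, "Tash", "Clock out"),
]

# Same shape as the answer's query; column names are quoted for SQLite.
PIVOT_SQL = """
SELECT "Date", UserID, UserName,
       MAX(CASE WHEN ColNum = 1 THEN "Time" END) AS Clock_in,
       MAX(CASE WHEN ColNum = 2 THEN "Time" END) AS Clock_out,
       MAX(CASE WHEN ColNum = 3 THEN "Time" END) AS Clock_in_2,
       MAX(CASE WHEN ColNum = 4 THEN "Time" END) AS Clock_out_2
FROM (SELECT *,
             ROW_NUMBER() OVER (PARTITION BY "Date", UserID
                                ORDER BY "Time") AS ColNum
      FROM output_table) AS ranked_clock_events
GROUP BY "Date", UserID, UserName
ORDER BY UserID
"""


def pivot_clock_events(rows):
    """Load rows into a throwaway table and return one row per user and day."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE output_table ("Date" TEXT, "Time" TEXT, UserID INTEGER, '
        "UserName TEXT, EventSubTypeDescription TEXT)"
    )
    conn.executemany("INSERT INTO output_table VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(PIVOT_SQL).fetchall()
    conn.close()
    return result


pivoted = pivot_clock_events(ROWS)
for row in pivoted:
    print(row)
```

A user with only two events, such as Tash, ends up with NULL (None in Python) in the Clock_in_2 and Clock_out_2 columns, because no row in that partition receives ColNum 3 or 4.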

Related

Getting wrong(?) average when calculating values in a time range

I am working with AWS Redshift / PostgreSQL. I have two tables that can be joined on interval_date (DATE data type) and interval_time_utc (VARCHAR data type) and/or on the status and price_source columns. Source A is equivalent to the Y status and Source B is equivalent to the N status.
I am trying to get the average price and the sum of mw_power for a given hour for each status / price_source. An hour covers the timestamps from XX:05 to XX:00, so for 15:00 the values should come from the 14:05 to 15:00 timestamps. Even for an hour interval where every status has the same value, I still need to calculate the average price for both price_source values, but the sum of mw_power would be 0.
I am passing in the date and time intervals through my application code. I am seeing a different average price for the 15:00 hour than I expect, so either I am bad at math or there is a bug in my query that I can't find. The 14:00 and 16:00 hour results come back as expected.
power_table
interval_date  interval_time_utc  mw_power  status
2022-05-09     13:00              92.25     N
2022-05-09     13:05              90.75     N
2022-05-09     13:10              91.25     N
2022-05-09     13:15              92.00     N
2022-05-09     13:20              92.00     N
2022-05-09     13:25              90.00     N
2022-05-09     13:30              93.00     N
2022-05-09     13:35              91.75     N
2022-05-09     13:40              90.25     N
2022-05-09     13:45              93.00     N
2022-05-09     13:50              91.00     N
2022-05-09     13:55              94.00     N
2022-05-09     14:00              91.00     N
2022-05-09     14:05              91.00     N
2022-05-09     14:10              94.00     N
2022-05-09     14:15              92.00     N
2022-05-09     14:20              91.00     N
2022-05-09     14:25              94.00     Y
2022-05-09     14:30              92.00     Y
2022-05-09     14:35              91.75     Y
2022-05-09     14:40              92.25     Y
2022-05-09     14:45              91.00     Y
2022-05-09     14:50              92.00     Y
2022-05-09     14:55              93.00     Y
2022-05-09     15:00              90.00     Y
price_table
interval_date  interval_time_utc  price   price_source
2022-05-09     13:00              54.20   Source A
2022-05-09     13:05              54.20   Source A
2022-05-09     13:10              54.20   Source A
2022-05-09     13:00              54.20   Source B
2022-05-09     13:05              54.20   Source B
2022-05-09     13:10              54.20   Source B
2022-05-09     13:15              34.11   Source A
2022-05-09     13:20              34.11   Source A
2022-05-09     13:25              34.11   Source A
2022-05-09     13:15              39.61   Source B
2022-05-09     13:20              39.61   Source B
2022-05-09     13:25              39.61   Source B
2022-05-09     13:30              2.81    Source A
2022-05-09     13:35              2.81    Source A
2022-05-09     13:40              2.81    Source A
2022-05-09     13:30              17.13   Source B
2022-05-09     13:35              17.13   Source B
2022-05-09     13:40              17.13   Source B
2022-05-09     13:45              1.58    Source A
2022-05-09     13:50              1.58    Source A
2022-05-09     13:55              1.58    Source A
2022-05-09     13:45              15.98   Source B
2022-05-09     13:50              15.98   Source B
2022-05-09     13:55              15.98   Source B
2022-05-09     14:00              4.60    Source A
2022-05-09     14:05              4.60    Source A
2022-05-09     14:10              4.60    Source A
2022-05-09     14:00              18.09   Source B
2022-05-09     14:05              18.09   Source B
2022-05-09     14:10              18.09   Source B
2022-05-09     14:15              2.46    Source A
2022-05-09     14:20              2.46    Source A
2022-05-09     14:25              2.46    Source A
2022-05-09     14:15              16.66   Source B
2022-05-09     14:20              16.66   Source B
2022-05-09     14:25              16.66   Source B
2022-05-09     14:30              3.36    Source A
2022-05-09     14:35              3.36    Source A
2022-05-09     14:40              3.36    Source A
2022-05-09     14:30              21.52   Source B
2022-05-09     14:35              21.52   Source B
2022-05-09     14:40              21.52   Source B
2022-05-09     14:45              4.55    Source A
2022-05-09     14:50              4.55    Source A
2022-05-09     14:55              4.55    Source A
2022-05-09     14:45              16.30   Source B
2022-05-09     14:50              16.30   Source B
2022-05-09     14:55              16.30   Source B
2022-05-09     15:00              -21.87  Source A
2022-05-09     15:00              4.96    Source B
-- query that i am using to get hourly values
SELECT pricet.price_source,
       COALESCE(powert.volume, 0),
       pricet.price,
       powert.status
FROM (SELECT status,
             SUM(mw_power) volume
      FROM power_table
      WHERE (interval_date || ' ' || interval_time_utc)::timestamp BETWEEN '2022-05-09 14:05:00.0' AND '2022-05-09 15:00:00.0'
      GROUP BY status) powert
RIGHT JOIN (SELECT price_source,
                   AVG(price) price
            FROM price_table
            WHERE (interval_date || ' ' || interval_time_utc)::timestamp BETWEEN '2022-05-09 14:05:00.0' AND '2022-05-09 15:00:00.0'
            GROUP BY price_source) pricet
        ON pricet.price_source = CASE WHEN powert.status = 'Y' THEN 'Source A'
                                      ELSE 'Source B'
                                 END;
I am looking to get an expected output of the following for the 15:00 hour:
price_source  volume  price  status
Source A      736.00  0.54   Y
Source B      368.00  17.38  N
Result that I'm getting from query:
price_source  volume  price  status
Source A      736.00  1.54   Y
Source B      368.00  17.05  N
db fiddle link of tables and query and results: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=474b009c5cf5366961751a61c0f96c6c
I think you made a calculation error. I changed your fiddle to add a rolling sum and a rolling average for the second part of your query. To get an average of 0.54 for Source A, your sum would need to be about 12 less than the actual total of the values for this hour. 12 is also the count of values for the hour, so perhaps 12 was subtracted by mistake before dividing by 12?
For the other source (B), the total would need to be off by about 4 (an addition of 4 to the sum). Not sure how this could have happened, but ...
Anyway the fiddle is at https://dbfiddle.uk/?rdbms=postgres_14&fiddle=e65c38677f3ab92607bbff778bc0f69e
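The arithmetic behind this answer can be re-checked by hand. The sketch below is a minimal check that uses only the price_table rows from the question that fall between 14:05 and 15:00; the helper name total_and_average and the variable names are illustrative and not part of the original query.

```python
from decimal import Decimal

# price_table values for 14:05 through 15:00, copied from the question
# (12 rows per source).
source_a = ["4.60"] * 2 + ["2.46"] * 3 + ["3.36"] * 3 + ["4.55"] * 3 + ["-21.87"]
source_b = ["18.09"] * 2 + ["16.66"] * 3 + ["21.52"] * 3 + ["16.30"] * 3 + ["4.96"]


def total_and_average(prices):
    """Return the exact sum and the average rounded to two decimals."""
    values = [Decimal(p) for p in prices]
    total = sum(values)
    average = (total / len(values)).quantize(Decimal("0.01"))
    return total, average


total_a, avg_a = total_and_average(source_a)
total_b, avg_b = total_and_average(source_b)

# Gap between the sum implied by the asker's expected average and the real sum.
gap_a = Decimal("0.54") * 12 - total_a
gap_b = Decimal("17.38") * 12 - total_b

print(total_a, avg_a, gap_a)  # 18.44 1.54 -11.96
print(total_b, avg_b, gap_b)  # 204.58 17.05 3.98
```

The averages agree with what the query returns (1.54 and 17.05), and the gaps of roughly -12 and +4 are the discrepancies described above.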

How to merge records with aggregate historical data?

I have a table with individual records and another which holds historical information about the individuals in the former.
I want to extract information about the individuals from the second table. Both tables have timestamp. It is very important that the historical information happened before the record in the first table.
Date_Time name
0 2021-09-06 10:46:00 Leg It Liam
1 2021-09-06 10:46:00 Hollyhill Island
2 2021-09-06 10:46:00 Shani El Bolsa
3 2021-09-06 10:46:00 Kilbride Fifi
4 2021-09-06 10:46:00 Go
2100 2021-10-06 11:05:00 Slaneyside Babs
2101 2021-10-06 11:05:00 Hillview Joe
2102 2021-10-06 11:05:00 Fairway Flyer
2103 2021-10-06 11:05:00 Whiteys Surprise
2104 2021-10-06 11:05:00 Astons Lucy
The name is the variable by which you connect the two tables:
Date_Time name cc
13 2021-09-15 12:16:00 Hollyhill Island 6.00
14 2021-09-06 10:46:00 Hollyhill Island 4.50
15 2021-05-30 18:28:00 Hollyhill Island 3.50
16 2021-05-25 10:46:00 Hollyhill Island 2.50
17 2021-05-18 12:46:00 Hollyhill Island 2.38
18 2021-04-05 12:31:00 Hollyhill Island 3.50
19 2021-04-28 12:16:00 Hollyhill Island 3.75
I want to add aggregated data from this table to the first. Such as adding the cc mean and count.
Date_Time name
1 2021-09-06 10:46:00 Hollyhill Island
To this row I would add 5 as the cc count and 3.126 as the cc mean. Remember that the historical records need to be before the date and time of the individual record.
I am a bit confused how to do this efficiently. I know I need to groupby the historical data.
Also the individual records are usually in groups of Date_Time, if that makes it any easier.
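If I understand correctly, try:
# merge both DataFrames on name; the history timestamp becomes Date_Time_y
out = df1.merge(df2, on='name', suffixes=('', '_y'))
# blank out rows where the history is not strictly earlier, then drop them
out = out.mask(out['Date_Time'] <= out['Date_Time_y']).dropna()
# aggregate cc per individual record
out = out.groupby(['Date_Time', 'name'])['cc'].agg(['count', 'mean']).reset_index()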
output of out:
Date_Time name count mean
0 2021-09-06 10:46:00 Hollyhill Island 5 3.126
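To double-check the expected count and mean without pandas, here is a minimal standard-library sketch of the same filter-then-aggregate idea. The history rows are copied from the question; the function name cc_stats_before is hypothetical and chosen only for this example.

```python
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M:%S"

# Historical rows for one name, copied from the question.
HISTORY = [
    ("2021-09-15 12:16:00", "Hollyhill Island", 6.00),
    ("2021-09-06 10:46:00", "Hollyhill Island", 4.50),
    ("2021-05-30 18:28:00", "Hollyhill Island", 3.50),
    ("2021-05-25 10:46:00", "Hollyhill Island", 2.50),
    ("2021-05-18 12:46:00", "Hollyhill Island", 2.38),
    ("2021-04-05 12:31:00", "Hollyhill Island", 3.50),
    ("2021-04-28 12:16:00", "Hollyhill Island", 3.75),
]


def cc_stats_before(name, when, history):
    """Count and mean of cc for `name`, using only rows strictly before `when`."""
    cutoff = datetime.strptime(when, FMT)
    values = [
        cc
        for stamp, row_name, cc in history
        if row_name == name and datetime.strptime(stamp, FMT) < cutoff
    ]
    if not values:
        return 0, None
    return len(values), round(mean(values), 3)


count, cc_mean = cc_stats_before("Hollyhill Island", "2021-09-06 10:46:00", HISTORY)
print(count, cc_mean)  # 5 3.126
```

The row stamped exactly 2021-09-06 10:46:00 is excluded because the comparison is strict, which matches the `<=` mask in the pandas version.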

SQL : GROUP and MAX multiple columns

I am a SQL beginner. Can anyone please help me with a SQL query?
My table looks like this:
PatientID Date Time Temperature
1 1/10/2020 9:15 36.2
1 1/10/2020 20:00 36.5
1 2/10/2020 8:15 36.1
1 2/10/2020 18:20 36.3
2 1/10/2020 9:15 36.7
2 1/10/2020 20:00 37.5
2 2/10/2020 8:15 37.1
2 2/10/2020 18:20 37.6
3 1/10/2020 8:15 36.2
3 2/10/2020 18:20 36.3
How can I get each patient's maximum temperature for each day?
PatientID Date Temperature
1 1/10/2020 36.5
1 2/10/2020 36.3
2 1/10/2020 37.5
2 2/10/2020 37.6
Thanks in advance!
For this dataset, simple aggregation seems sufficient:
select patientid, date, max(temperature) temperature
from mytable
group by patientid, date
On the other hand, if there are other columns that you want to display on the row that has the maximum daily temperature, then it is different. You need some filtering; one option uses window functions:
select *
from (
    select t.*,
           rank() over(partition by patientid, date order by temperature desc) rn
    from mytable t
) t
where rn = 1
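As an illustration of the window-function variant, the sketch below runs it from Python against an in-memory SQLite database loaded with the question's sample rows. It is only a sketch under assumptions: SQLite (3.25 or newer) stands in for whatever database the asker uses, the table is called mytable as in the answer, and the helper name hottest_readings is made up here.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (1, "1/10/2020", "9:15", 36.2),
    (1, "1/10/2020", "20:00", 36.5),
    (1, "2/10/2020", "8:15", 36.1),
    (1, "2/10/2020", "18:20", 36.3),
    (2, "1/10/2020", "9:15", 36.7),
    (2, "1/10/2020", "20:00", 37.5),
    (2, "2/10/2020", "8:15", 37.1),
    (2, "2/10/2020", "18:20", 37.6),
    (3, "1/10/2020", "8:15", 36.2),
    (3, "2/10/2020", "18:20", 36.3),
]

# rank() numbers readings per patient and day, hottest first; rn = 1 keeps
# the whole row (including the time) of each daily maximum.
RANK_SQL = """
SELECT patientid, date, time, temperature
FROM (
    SELECT t.*,
           RANK() OVER (PARTITION BY patientid, date
                        ORDER BY temperature DESC) AS rn
    FROM mytable t
) ranked
WHERE rn = 1
ORDER BY patientid, date
"""


def hottest_readings(rows):
    """Return the full row of the maximum temperature per patient and day."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (patientid INTEGER, date TEXT, time TEXT, temperature REAL)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(RANK_SQL).fetchall()
    conn.close()
    return result


hottest = hottest_readings(ROWS)
for row in hottest:
    print(row)
```

Because rank() gives tied readings the same number, a day with two equal maximum temperatures would return both rows; row_number() could be used instead if exactly one row per day is wanted.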

Create an event log from an excel file by turning columns into repeated rows

I have an Excel sheet like the following:
ID Arrival Passed Berthing Date UnBerthing Date Departure Passed
1 13/05/2017 15:30 13/05/2017 16:00 31/05/2017 20:44 31/05/2017
2 15/05/2017 16:56 15/05/2017 17:15 16/05/2017 00:00 16/05/2017
3 20/05/2017 09:54 20/05/2017 10:26 20/05/2017 18:07 20/05/2017
4 24/05/2017 16:09 24/05/2017 16:35 25/05/2017 01:03 25/05/2017
5 29/05/2017 10:30 29/05/2017 10:45 29/05/2017 17:33 29/05/2017
I need this in the following format:
ID Event Time
1 Arrival 13/05/2017 15:30
1 Berth 13/05/2017 16:00
1 UnBerth 31/05/2017 20:44
1 Departure 31/05/2017 20:58
2 Arrival 15/05/2017 16:56
2 Berth 15/05/2017 17:15
2 UnBerth 16/05/2017 00:00
2 Departure 16/05/2017 00:04
etc
I've searched the web and this site (and YouTube...) but found no working answer. I've tried the TRANSPOSE function and a pivot table, but I couldn't make it work.
Any help would be appreciated.
Thank you.
Assuming that your dataset is in range A2:E6.
For getting ID:
=INDEX($A$2:$E$6,CEILING(ROWS($A$1:A1)/4,1),1)
For getting Event:
=CHOOSE(MOD(ROWS($A$1:A1)-1,4)+1,"Arrival","Berth","UnBerth","Departure")
For getting Time:
=INDEX($A$2:$E$6,CEILING(ROWS($A$1:A1)/4,1),MOD(ROWS($A$1:A1)-1,4)+2)
Then copy the formulas down until you get an error.
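The index arithmetic in these formulas can be sketched outside Excel. The Python below is only an illustration of the same CEILING and MOD logic, assuming each source row holds an ID followed by four timestamps; the two sample rows use the values shown in the question's desired output, and the function name unpivot is made up for this example.

```python
import math

EVENTS = ["Arrival", "Berth", "UnBerth", "Departure"]

# ID followed by the four timestamps, taken from the desired output for IDs 1 and 2.
WIDE = [
    [1, "13/05/2017 15:30", "13/05/2017 16:00", "31/05/2017 20:44", "31/05/2017 20:58"],
    [2, "15/05/2017 16:56", "15/05/2017 17:15", "16/05/2017 00:00", "16/05/2017 00:04"],
]


def unpivot(wide):
    """Turn one wide row per ID into four (ID, Event, Time) rows."""
    long_rows = []
    for n in range(1, len(wide) * len(EVENTS) + 1):  # n plays the role of ROWS($A$1:A1)
        source_row = math.ceil(n / 4)   # CEILING(n/4, 1): which source row to read
        event_index = (n - 1) % 4       # MOD(n-1, 4): which of the four events
        record = wide[source_row - 1]
        # Column MOD(n-1,4)+2 in Excel's 1-based INDEX is event_index + 1 here.
        long_rows.append((record[0], EVENTS[event_index], record[event_index + 1]))
    return long_rows


event_log = unpivot(WIDE)
for row in event_log:
    print(row)
```

Every block of four consecutive output rows maps back to one source row, which is why the formulas divide the running row count by 4 and use the remainder to pick the column.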

How do you summarize row data in sybase table

I have this table in sybase:
Date File_name File_Size customer Id
1/1/205 11:00:00 temp.csv 100000 ESPN 1111
1/1/205 11:10:00 temp.csv 200000 ESPN 1122
1/1/205 11:20:00 temp.csv 400000 ESPN 1456
1/1/205 11:30:00 temp.csv 400000 ESPN 2345
1/2/205 11:00:00 llc.csv 100000 LLC 445
1/2/205 11:10:00 llc1.txt 200000 LLC 677
1/2/205 11:20:00 dtt.txt 500000 LLC 76
1/2/205 11:30:00 jpp.txt 400000 LLC 666
I need to come up with a query to summarize this data by day which will be month/day/Year.
Date total_file_size number_of_unique_customers number_unique_id
1/1/2015 110,000 1 4
1/2/2015 120,000 1 4
How would I do this in sql query? I tried this:
select convert(varchar,arrived_at,110) as Date
sum(File_Size),
count(distinct(customer)),
count(distinct(id))
group by Date
Does not seem to be working, any ideas?
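Try this. The column alias cannot be used in GROUP BY, so the convert expression is repeated there:
select
    convert(varchar,arrived_at,110) as Date,
    SUM(File_Size) as total_file_size,
    count(distinct customer) as number_of_unique_customers,
    count(distinct id) as number_unique_id
from your_table -- placeholder: the question does not name the table
group by convert(varchar,arrived_at,110)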
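To see what this grouping should return for the sample rows, here is a small standard-library sketch of the same per-day aggregation. It rests on assumptions: the year in the question's table is read as 2015 (it is printed as 205), the day key is formatted as mm-dd-yyyy to mirror convert style 110, and the function name summarize_by_day is made up for this example.

```python
from collections import defaultdict
from datetime import datetime

# Rows copied from the question, with the year read as 2015 (assumption).
ROWS = [
    ("1/1/2015 11:00:00", "temp.csv", 100000, "ESPN", 1111),
    ("1/1/2015 11:10:00", "temp.csv", 200000, "ESPN", 1122),
    ("1/1/2015 11:20:00", "temp.csv", 400000, "ESPN", 1456),
    ("1/1/2015 11:30:00", "temp.csv", 400000, "ESPN", 2345),
    ("1/2/2015 11:00:00", "llc.csv", 100000, "LLC", 445),
    ("1/2/2015 11:10:00", "llc1.txt", 200000, "LLC", 677),
    ("1/2/2015 11:20:00", "dtt.txt", 500000, "LLC", 76),
    ("1/2/2015 11:30:00", "jpp.txt", 400000, "LLC", 666),
]


def summarize_by_day(rows):
    """Return {day: (total_file_size, unique_customers, unique_ids)}."""
    buckets = defaultdict(lambda: {"size": 0, "customers": set(), "ids": set()})
    for stamp, _file_name, size, customer, row_id in rows:
        # Truncate the timestamp to its day, like convert(varchar, ..., 110).
        day = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S").strftime("%m-%d-%Y")
        bucket = buckets[day]
        bucket["size"] += size             # SUM(File_Size)
        bucket["customers"].add(customer)  # count(distinct customer)
        bucket["ids"].add(row_id)          # count(distinct id)
    return {
        day: (b["size"], len(b["customers"]), len(b["ids"]))
        for day, b in sorted(buckets.items())
    }


summary = summarize_by_day(ROWS)
for day, stats in summary.items():
    print(day, stats)
```

With these rows the totals come out as 1,100,000 and 1,200,000, so the figures in the question's expected output may have dropped a digit.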