Different where condition for each column - sql

Is there a way to write query like this in SQL Server, without using select two times and then join?
select trans_date, datepart(HOUR,trans_time) as hour,
(datepart(MINUTE,trans_time)/30)*30 as minute,
case
when paper_number = 11111/*paperA*/
then sum(t1.price*t1.amount)/SUM(t1.amount)*100
end as avgA,
case
when paper_number = 22222/*PaperB*/
then sum(t1.price*t1.amount)/SUM(t1.amount)*100
end as avgB
from dbo.transactions t1
where trans_date = '2006-01-01' and (paper_number = 11111 or paper_number = 22222)
group by trans_date, datepart(HOUR,trans_time), datepart(MINUTE,trans_time)/30
order by hour, minute
SQL Server asks me to add paper_number to the GROUP BY, and returns NULLs when I do so:
trans_date hour minute avgA             avgB
2006-01-01 9    30     1802.57199725463 NULL
2006-01-01 9    30     NULL             169125.886524823
2006-01-01 10   0      1804.04742534103 NULL
2006-01-01 10   0      NULL             169077.777777778
2006-01-01 10   30     1806.18773535637 NULL
2006-01-01 10   30     NULL             170274.550381867
2006-01-01 11   0      1804.43466045433 NULL
2006-01-01 11   0      NULL             170743.4
2006-01-01 11   30     1807.04532012137 NULL
2006-01-01 11   30     NULL             171307.00280112

Try:
with cte as
(select trans_date,
datepart(HOUR,trans_time) as hour,
(datepart(MINUTE,trans_time)/30)*30 as minute,
sum(case when paper_number = 11111/*paperA*/
then t1.price*t1.amount else 0 end) as wtdSumA,
sum(case when paper_number = 11111/*paperA*/
then t1.amount else 0 end) as amtSumA,
sum(case when paper_number = 22222/*PaperB*/
then t1.price*t1.amount else 0 end) as wtdSumB,
sum(case when paper_number = 22222/*PaperB*/
then t1.amount else 0 end) as amtSumB
from dbo.transactions t1
where trans_date = '2006-01-01'
group by trans_date, datepart(HOUR,trans_time), datepart(MINUTE,trans_time)/30)
select trans_date, hour, minute,
case amtSumA when 0 then 0 else 100 * wtdSumA / amtSumA end as avgA,
case amtSumB when 0 then 0 else 100 * wtdSumB / amtSumB end as avgB
from cte
order by hour, minute
You can derive this without the CTE, like so:
select trans_date,
datepart(HOUR,trans_time) as hour,
(datepart(MINUTE,trans_time)/30)*30 as minute,
case sum(case when paper_number = 11111/*paperA*/ then t1.amount else 0 end)
when 0 then 0
else 100 * sum(case when paper_number = 11111 then t1.price*t1.amount else 0 end)
/ sum(case when paper_number = 11111 then t1.amount else 0 end) end as avgA,
case sum(case when paper_number = 22222/*paperB*/ then t1.amount else 0 end)
when 0 then 0
else 100 * sum(case when paper_number = 22222 then t1.price*t1.amount else 0 end)
/ sum(case when paper_number = 22222 then t1.amount else 0 end) end as avgB
from dbo.transactions t1
where trans_date = '2006-01-01'
group by trans_date, datepart(HOUR,trans_time), datepart(MINUTE,trans_time)/30
order by 1,2,3
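The conditional-aggregation idea above can be tried outside SQL Server. The following is only a sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in engine, so strftime replaces datepart, and NULLIF returns NULL for an empty bucket where the queries above return 0. The function name half_hour_averages and the sample rows are made up for illustration.

```python
import sqlite3


def half_hour_averages(rows):
    """Return (trans_date, hour, minute, avgA, avgB) per half-hour bucket.

    rows: iterable of (trans_date, trans_time, paper_number, price, amount).
    SQLite is used here only as a stand-in for SQL Server.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE transactions ("
        "trans_date TEXT, trans_time TEXT, paper_number INTEGER, "
        "price REAL, amount REAL)"
    )
    con.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT trans_date, hr, mi,
               -- the CASE sits inside SUM, so paper_number need not be grouped
               100.0 * SUM(CASE WHEN paper_number = 11111 THEN price * amount END)
                     / NULLIF(SUM(CASE WHEN paper_number = 11111 THEN amount END), 0) AS avgA,
               100.0 * SUM(CASE WHEN paper_number = 22222 THEN price * amount END)
                     / NULLIF(SUM(CASE WHEN paper_number = 22222 THEN amount END), 0) AS avgB
        FROM (
            SELECT trans_date, paper_number, price, amount,
                   CAST(strftime('%H', trans_time) AS INTEGER) AS hr,
                   CAST(strftime('%M', trans_time) AS INTEGER) / 30 * 30 AS mi
            FROM transactions
        ) AS bucketed
        GROUP BY trans_date, hr, mi
        ORDER BY trans_date, hr, mi
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```

Each half-hour bucket comes back as a single row, because both papers are aggregated in the same GROUP BY instead of being split by paper_number.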

Use SUM() function on the entire CASE expression
select trans_date, datepart(HOUR,trans_time) as hour, (datepart(MINUTE,trans_time)/30)*30 as minute,
sum(case when paper_number = 11111/*paperA*/ then t1.price*t1.amount end) * 1.00
/ sum(case when paper_number = 11111/*paperA*/ then t1.amount end) * 100 as avgA,
sum(case when paper_number = 22222/*PaperB*/ then t1.price*t1.amount end) * 1.00
/ sum(case when paper_number = 22222/*paperB*/ then t1.amount end) * 100 as avgB
from dbo.transactions t1
where trans_date = '2006-01-01'
group by trans_date, datepart(HOUR,trans_time), datepart(MINUTE,trans_time)/30
order by hour, minute

You could also try using UNPIVOT and PIVOT like below:
WITH prepared AS (
SELECT
trans_date,
trans_time = DATEADD(MINUTE, DATEDIFF(MINUTE, '00:00', trans_time) / 30 * 30, CAST('00:00' AS time)),
paper_number,
total = price * amount,
amount
FROM transactions
),
unpivoted AS (
SELECT
trans_date,
trans_time,
attribute = attribute + CAST(paper_number AS varchar(10)),
value
FROM prepared
UNPIVOT (value FOR attribute IN (total, amount)) u
),
pivoted AS (
SELECT
trans_date,
trans_time,
avgA = total11111 * 100 / amount11111,
avgB = total22222 * 100 / amount22222
FROM unpivoted
PIVOT (
SUM(value) FOR attribute IN (total11111, amount11111, total22222, amount22222)
) p
)
SELECT *
FROM pivoted
;
To explain how the above query works, here is a description of the transformations the original dataset undergoes as the query executes, using the following example:
trans_date trans_time paper_number price amount
---------- ---------- ------------ ----- ------
2013-04-09 11:12:35 11111 10 15
2013-04-09 11:13:01 22222 24 10
2013-04-09 11:28:44 11111 12 5
2013-04-09 11:36:20 22222 20 11
The prepared CTE produces the following column set:
trans_date trans_time paper_number total amount
---------- ---------- ------------ ----- ------
2013-04-09 11:00:00 11111 150 15
2013-04-09 11:00:00 22222 240 10
2013-04-09 11:00:00 11111 60 5
2013-04-09 11:30:00 22222 220 11
where trans_time is the original trans_time rounded down to the nearest half-hour and total is price multiplied by amount.
The unpivoted CTE unpivots the total and amount values to produce attribute and value:
trans_date trans_time paper_number attribute value
---------- ---------- ------------ --------- -----
2013-04-09 11:00:00 11111 total 150
2013-04-09 11:00:00 11111 amount 15
2013-04-09 11:00:00 22222 total 240
2013-04-09 11:00:00 22222 amount 10
2013-04-09 11:00:00 11111 total 60
2013-04-09 11:00:00 11111 amount 5
2013-04-09 11:30:00 22222 total 220
2013-04-09 11:30:00 22222 amount 11
Then paper_number is combined with attribute to form a single column, also called attribute:
trans_date trans_time attribute value
---------- ---------- ----------- -----
2013-04-09 11:00:00 total11111 150
2013-04-09 11:00:00 amount11111 15
2013-04-09 11:00:00 total22222 240
2013-04-09 11:00:00 amount22222 10
2013-04-09 11:00:00 total11111 60
2013-04-09 11:00:00 amount11111 5
2013-04-09 11:30:00 total22222 220
2013-04-09 11:30:00 amount22222 11
Finally, the pivoted CTE pivots the value data back aggregating them along the way with SUM() and using the attribute values for column names:
trans_date trans_time total11111 amount11111 total22222 amount22222
---------- ---------- ---------- ----------- ---------- -----------
2013-04-09 11:00:00 210 20 240 10
2013-04-09 11:30:00 NULL NULL 220 11
The pivoted values are then additionally processed (every totalNNN is multiplied by 100 and divided by the corresponding amountNNN) to form the final output:
trans_date trans_time avgA avgB
---------- ---------- ---- ----
2013-04-09 11:00:00 1050 2400
2013-04-09 11:30:00 NULL 2000
There are a couple of issues that may need to be addressed:
If price and amount are different data types, total and amount may end up as different data types as well. UNPIVOT requires the values being unpivoted to be of exactly the same type, so you'll need to add an explicit conversion of total and amount to some common type, preferably one that prevents data/precision loss. That could be done in the prepared CTE like this (assuming the common type to be decimal(10,2)):
total = CAST(price * amount AS decimal(10,2)),
amount = CAST(amount AS decimal(10,2))
If aggregated amounts may ever end up 0, you'll need to account for the division by 0 issue. One way to do that could be to substitute the 0 amount with NULL, which would make the result of the division NULL as well. Applying ISNULL or COALESCE to that result would allow you to transform it to some default value, 0 for instance. So, change this bit in the pivoted CTE:
avgA = ISNULL(total11111 * 100 / NULLIF(amount11111, 0), 0),
avgB = ISNULL(total22222 * 100 / NULLIF(amount22222, 0), 0)
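The divide-by-zero guard described above can be checked in isolation. This is a minimal sketch, assuming SQLite via Python's sqlite3 module as the engine, so COALESCE is used in place of SQL Server's ISNULL. The helper name guarded_average is hypothetical.

```python
import sqlite3


def guarded_average(total, amount, default=0):
    """Evaluate total * 100 / amount in SQL, returning `default` when amount is 0.

    NULLIF turns a zero divisor into NULL, the division then yields NULL,
    and COALESCE replaces that NULL with the default value.
    """
    con = sqlite3.connect(":memory:")
    (value,) = con.execute(
        "SELECT COALESCE(? * 100.0 / NULLIF(?, 0), ?)",
        (total, amount, default),
    ).fetchone()
    con.close()
    return value
```

For example, guarded_average(210, 20) evaluates the normal division, while guarded_average(5, 0) falls back to the default instead of raising an error.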

Related

Calculate how long a process took, taking into account opening hours

I have two tables. The opening hours table gives, for each seller and store, the opening and closing times for each day of the week. The second table is the operation table, which has all the information about the processes.
What I need is to calculate how many seconds each process took, counting only the hours when the store was open.
I tried to solve that with CASE WHEN. It works when the process takes less than 2 days, but I don't know how to handle it when it takes more days. The other problem with this code is that the CASE WHEN takes a long time to run. Can anybody help me with these issues?
Opening hours table:
sellerid sellerstoreid day dayweek  opening  closing  next_day opening_next_day days_to_next
-------- ------------- --- -------- -------- -------- -------- ---------------- ------------
123      abc           1   monday   09:00:00 17:00:00 2        09:00:00         1
123      abc           2   tuesday  09:00:00 17:00:00 4        09:00:00         2
123      abc           4   thursday 09:00:00 17:00:00 5        09:30:00         1
123      abc           5   friday   09:30:00 17:00:00 1        09:00:00         3
Where:
sellerid + sellerstoreid + day works as a primary key;
dayweek translates day from number to name;
opening and closing are the opening and closing time for that day;
opening_next_day shows the opening time of the next available date for that store and seller;
days_to_next indicates in how many days the store will reopen.
Process table:
delivery_id sellerid sellerstoreid process end_time
----------- -------- ------------- ------- -----------------------
a1          123      abc           p1      05/12/2022 16:00:00.000
a1          123      abc           p2      06/12/2022 16:00:00.000
a1          123      abc           p3      06/12/2022 16:00:00.000
a1          123      abc           p4      08/12/2022 16:00:00.000
a1          123      abc           p5      13/12/2022 16:00:00.000
Where:
The end_time of the previous process is the start time of the current process.
with
table_1 as (
select
delivery_id
, sellerid
, sellerstoreid
, process
, lag(end_time, 1) over (partition by delivery_id order by end_time) as start_time
, extract(dow from lag(end_time, 1) over (partition by delivery_id order by end_time)) as dow_start_time
, end_time
, extract(dow from end_time) as dow_end_time
from process_table
),
table_2 as (
select
tb1.*
, oh_start.opening as start_opening
, oh_start.closing as start_closing
, oh_end .opening as end_opening
, oh_end .closing as end_closing
from table_1 tb1
left join opening_hours oh_start
on oh_start.sellerid = tb1.sellerid
and oh_start.sellerstoreid = tb1.sellerstoreid
and oh_start.day = dow_start_time
left join opening_hours oh_end
on oh_end .sellerid = tb1.sellerid
and oh_end.sellerstoreid = tb1.sellerstoreid
and oh_end.day = dow_end_time
)
select
*
, case
when dow_start_time = dow_end_time then
extract(epoch from
(case
when end_time::time > start_opening then
(case
when end_time::time > start_closing then start_closing
else end_time::time
end)
else start_opening
end
-
case
when start_time::time > start_opening then
(case
when start_time::time < start_closing then start_time::time
else start_closing
end
)
else start_opening
end))
when dow_start_time <> dow_end_time then
extract(epoch from
(start_closing
-
case
when start_time::time > start_opening then
(case
when start_time::time < start_closing then start_time::time
else start_closing
end)
else start_opening
end)
+
(case
when end_time::time > end_opening then
(case
when end_time::time > end_closing then end_closing
else end_time::time
end)
else end_opening
end
-
end_opening)
end status_duration
from table_2
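One way to reason about processes that span more than two days is to walk the calendar day by day and add up each day's overlap with the opening window. The sketch below is written in Python rather than SQL and is not a drop-in replacement for the query above. It assumes the day column uses the same numbering as extract(dow ...) (0 = Sunday, 1 = Monday), that a day missing from the table means the store is closed, and that dates are day/month/year. The names OPENING_HOURS and open_seconds are hypothetical.

```python
from datetime import datetime, time, timedelta

# Opening hours keyed by day of week (0 = Sunday ... 6 = Saturday), taken
# from the sample table; days without an entry are assumed to be closed.
OPENING_HOURS = {
    1: (time(9, 0), time(17, 0)),   # monday
    2: (time(9, 0), time(17, 0)),   # tuesday
    4: (time(9, 0), time(17, 0)),   # thursday
    5: (time(9, 30), time(17, 0)),  # friday
}


def open_seconds(start, end, hours=OPENING_HOURS):
    """Seconds between start and end that fall inside opening hours."""
    if end <= start:
        return 0.0
    total = 0.0
    day = start.date()
    while day <= end.date():
        slot = hours.get(day.isoweekday() % 7)  # map Sunday from 7 to 0
        if slot is not None:
            open_at = datetime.combine(day, slot[0])
            close_at = datetime.combine(day, slot[1])
            # Overlap of [start, end] with [open_at, close_at]; negative means none.
            overlap = min(end, close_at) - max(start, open_at)
            total += max(overlap.total_seconds(), 0.0)
        day += timedelta(days=1)
    return total
```

Because every calendar day between the two timestamps is visited, the same loop covers same-day, next-day and multi-day processes without separate cases.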

SQL query to get top 24 records, then average the first 12 and bottom 12

I'm attempting to analyze each account's performance (A_Count & B_Count) during their first year versus their second year. This should only return clients who have at least 24 months of totals (records).
Volume Table
Account ReportDate A_Count B_Count
------- ---------- ------- -------
1001A   2019-01-01 47      100
1001A   2019-02-01 50      105
1002A   2019-02-01 50      105
I think I'm on the right track by wanting to grab the top 24 records for each account (only if 24 exist) and then grabbing the top 12 and bottom 12, but not sure how to get there.
I guess ideal output would be:
Account YR1_A_Avg YR1_B_Avg YR2_A_Avg YR2_B_Avg FirstDate  LastDate
------- --------- --------- --------- --------- ---------- ----------
1001A   47        100       53        115       2019-01-01 2021-12-31
1002A   50        105       65        130       2019-02-01 2022-01-01
1003A   15        180       38        200       2017-05-01 2019-04-01
I'm not too worried about performance.
Assuming there are no gaps in ReportDate (per Account).
select Account
,avg(case when year_index = 1 then A_Count end) as YR1_A_Avg
,avg(case when year_index = 1 then B_Count end) as YR1_B_Avg
,avg(case when year_index = 2 then A_Count end) as YR2_A_Avg
,avg(case when year_index = 2 then B_Count end) as YR2_B_Avg
,min(ReportDate) as FirstDate
,max(ReportDate) as LastDate
from
(
select *
,count(*) over(partition by Account) as cnt
,(row_number() over(partition by Account order by ReportDate)-1)/12 +1 as year_index
from Volume
) t
where cnt >= 24 and year_index <= 2
group by Account
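The logic of the query above (number the rows per account, split them into blocks of 12, keep only accounts with at least 24 rows) can be mirrored in plain Python to sanity-check results on a small sample. This is an illustrative sketch, not a replacement for the SQL; the function name first_two_year_averages is hypothetical and ReportDate is assumed to be an ISO-formatted string so that sorting is chronological.

```python
from collections import defaultdict
from statistics import mean


def first_two_year_averages(rows):
    """rows: iterable of (account, report_date, a_count, b_count).

    Returns {account: summary} for accounts with at least 24 records,
    averaging records 1-12 as year 1 and records 13-24 as year 2.
    """
    by_account = defaultdict(list)
    for account, report_date, a_count, b_count in rows:
        by_account[account].append((report_date, a_count, b_count))

    result = {}
    for account, records in by_account.items():
        if len(records) < 24:  # same filter as cnt >= 24
            continue
        records.sort()  # ISO dates sort chronologically
        year1, year2 = records[:12], records[12:24]  # year_index 1 and 2
        result[account] = {
            "YR1_A_Avg": mean(r[1] for r in year1),
            "YR1_B_Avg": mean(r[2] for r in year1),
            "YR2_A_Avg": mean(r[1] for r in year2),
            "YR2_B_Avg": mean(r[2] for r in year2),
            "FirstDate": records[0][0],
            "LastDate": records[23][0],
        }
    return result
```

As in the query, FirstDate and LastDate describe only the first 24 records, so an account with more history still reports the end of its second year as LastDate.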

Add a counting condition into dense_rank window Function SQL

I have a function that counts how many times you've visited and if you have converted or not.
What I'd like is for the dense_rank to re-start the count, if there has been a conversion:
SELECT
uid,
channel,
time,
conversion,
dense_rank() OVER (PARTITION BY uid ORDER BY time asc) as visit_order
FROM table
In the current output, this customer (uid) had a conversion at visit 18. From there I want the visit_order count from dense_rank to restart at 0 for the same customer until it hits the next non-null conversion.
See this (I do not like "try this" 😉):
SELECT
id,
ts,
conversion,
-- SC,
ROW_NUMBER() OVER (PARTITION BY id,SC ORDER BY ts) R
FROM (
SELECT
id,
ts,
conversion,
-- COUNT(conversion) OVER (PARTITION BY id, conversion=0 ORDER BY ts ) CC,
SUM(CASE WHEN conversion=1 THEN 1000 ELSE 1 END) OVER (PARTITION BY id ORDER BY ts ) - SUM(CASE WHEN conversion=1 THEN 1000 ELSE 1 END) OVER (PARTITION BY id ORDER BY ts )%1000 SC
FROM sample
ORDER BY ts
) x
ORDER BY ts;
output:
id ts                  conversion R
-- ------------------- ---------- -
1  2022-01-15 10:00:00 0          1
1  2022-01-16 10:00:00 0          2
1  2022-01-17 10:00:00 0          3
1  2022-01-18 10:00:00 1          1
1  2022-01-19 10:00:00 0          2
1  2022-01-20 10:00:00 0          3
1  2022-01-21 10:00:00 0          4
1  2022-01-22 10:00:00 0          5
1  2022-01-23 10:00:00 0          6
1  2022-01-24 10:00:00 0          7
1  2022-01-25 10:00:00 1          1
1  2022-01-26 10:00:00 0          2
1  2022-01-27 10:00:00 0          3
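The numbering shown in the output above (the counter goes back to 1 on each conversion row and keeps counting afterwards) can be expressed as a simple running counter. The sketch below is plain Python for a single customer whose visits are already sorted by time; visit_order is a hypothetical helper name, and it only illustrates the expected numbering rather than how the SQL computes it.

```python
def visit_order(conversions):
    """Number visits 1, 2, 3, ... restarting at 1 on every conversion row.

    conversions: iterable of 0/1 flags (or falsy/truthy values) for one
    customer, ordered by visit time.
    """
    order = []
    counter = 0
    for converted in conversions:
        # A conversion row starts a new group; any other row extends the group.
        counter = 1 if converted else counter + 1
        order.append(counter)
    return order
```

Feeding it the conversion column from the table above reproduces the R column, which makes it a handy cross-check when adapting the query.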

MsSql Compare specific datetimes in sequence based on ID

I have a table where we store our data from a call and it looks like this:
CallID Arrive_Seq DateTime ActivitytypeID
1 1 2018-01-01 05:00:00 1
1 2 2018-01-01 05:00:01 2
1 3 2018-01-01 06:00:00 21
1 4 2018-01-01 06:00:01 28
1 5 2018-01-01 06:00:02 13
1 6 2018-01-01 06:00:03 22
1 7 2018-01-01 06:00:05 29
1 8 2018-01-01 06:05:00 21
1 9 2018-01-01 06:05:01 28
1 10 2018-01-01 06:05:02 13
1 11 2018-01-01 06:05:03 22
1 12 2018-01-01 06:07:45 29
Now I want to select the datediff between ActivitytypeID 21 and 29 in Arrive_Seq order. In this example they each occur twice (21 on Arrive_Seq 3 and 8, 29 on Arrive_Seq 7 and 12). The number of occurrences is not fixed: the ActivitytypeIDs can occur more or fewer times in the sequence, but they are always paired with each other. Think of it as ActivitytypeID 21 = 'call started' and ActivitytypeID 29 = 'call ended'.
In the example the answer would be:
SELECT DATEDIFF (SECOND, '2018-01-01 06:00:00', '2018-01-01 06:00:05') = 5 -- Compares datetime of arrive_seq 3 and 7
AND
SELECT DATEDIFF (SECOND, '2018-01-01 06:00:05', '2018-01-01 06:07:45') = 460 -- Compares datetime of arrive_seq 21 and 29
Total duration = 465
I have tried with this code but it doesn't work all the time due to row# can change based on arrive_seq and ActivitytypeID
;WITH CallbackDuration AS (
SELECT ROW_NUMBER() OVER(ORDER BY a.time_stamp ASC) AS RowNumber, DATEDIFF(second, a.time_stamp, b.time_stamp) AS 'Duration'
FROM Table a
JOIN Table b on a.call_id = b.call_id
WHERE a.call_id = 1 AND a.activity_type = 21 AND b.activity_type = 29
GROUP BY a.time_stamp, b.time_stamp,a.call_id)
SELECT SUM(Duration) AS 'Duration' FROM CallbackDuration WHERE RowNumber in (1,5,9)
I think this is what you want:
select
call_start,
call_end,
datediff (second, call_start, call_end) as duration
from
(
select
call_timestamp as call_end,
lag(call_timestamp) over (partition by call_id order by call_timestamp) as call_start,
activity_type as call_end_activity,
lag (activity_type) over (partition by call_id order by call_timestamp) as call_start_activity
from
call_log
where
activity_type in (21, 29)
) x
where
call_start_activity = 21;
Result:
call_start call_end duration
----------------------- ----------------------- -----------
2018-01-01 06:00:00.000 2018-01-01 06:00:05.000 5
2018-01-01 06:05:00.000 2018-01-01 06:07:45.000 165
(2 rows affected)
Note that the time of the second call is based on your sample data with start time 2018-01-01 06:05:00
This query seems to return your expected result
declare @x int = 21
declare @y int = 29
;with cte(CallID, Arrive_Seq, DateTime, ActivitytypeID) as (
select
a, b, cast(c as datetime), d
from (values
(1,1,'2018-01-01 05:00:00',1)
,(1,2,'2018-01-01 05:00:01',2)
,(1,3,'2018-01-01 06:00:00',21)
,(1,4,'2018-01-01 06:00:01',28)
,(1,5,'2018-01-01 06:00:02',13)
,(1,6,'2018-01-01 06:00:03',22)
,(1,7,'2018-01-01 06:00:05',29)
,(1,8,'2018-01-01 06:05:00',21)
,(1,9,'2018-01-01 06:05:01',28)
,(1,10,'2018-01-01 06:05:02',13)
,(1,11,'2018-01-01 06:05:03',22)
,(1,12,'2018-01-01 06:07:45',29)
) t(a,b,c,d)
)
select
sum(ss)
from (
select
*, ss = datediff(ss, DateTime, lead(datetime) over (order by Arrive_Seq))
, rn = row_number() over (order by Arrive_Seq)
from
cte
where
ActivitytypeID in (@x, @y)
) t
) t
where
rn % 2 = 1
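Both answers rely on the same idea: keep only the start (21) and end (29) events, then pair each start with the end that follows it. A rough Python sketch of that pairing, useful for checking the expected numbers on the sample data, could look like this; call_durations is a hypothetical name and the events are assumed to be sorted by Arrive_Seq already.

```python
from datetime import datetime


def call_durations(events, start_type=21, end_type=29):
    """Pair each start event with the next end event and return seconds per call.

    events: list of (timestamp, activity_type) tuples in arrival order.
    """
    durations = []
    started_at = None
    for ts, activity in events:
        if activity == start_type:
            started_at = ts
        elif activity == end_type and started_at is not None:
            durations.append(int((ts - started_at).total_seconds()))
            started_at = None  # wait for the next start event
    return durations
```

On the sample rows this gives one duration per call, and summing the list gives the total that the second query returns.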

How to calculate time duration using Microsoft SQL?

I want to find the time duration for each person from one start time. I want to calculate the time duration from 1 start time for each day and multiple end times for multiple users. This is my code:
SELECT *,
CAST(DATEDIFF(n, CAST(End_Time AS datetime),
CAST(Start_Time AS datetime)) AS FLOAT) / 60 AS Time_Duration
FROM
( SELECT NAME,
MAX(CASE WHEN DESCRIPTION = 'Green' THEN Final_Value END) AS Start_Time,
MAX(CASE WHEN DESCRIPTION = 'Red' THEN Final_Value END) AS End_Time
FROM mydata
WHERE NAME != 'NA'
GROUP BY NAME
) C
I am not able to get any results for time duration.
This is what my output looks like:
  Name   Start_time  End_time    Time_Duration
1 Day_1  5/6/15 2:30
2 John               5/6/15 3:30
3 Ben                5/6/15 4:30
4 Mike               5/6/15 5:30
5 Day_2  5/7/15 2:30
6 John_2             5/7/15 4:30
7 Ben_2              5/7/15 5:30
8 Mike_2             5/7/15 6:30
I want it to look like this:
  Name   Start_time  End_time    Time_Duration
1 Day_1  5/6/15 2:30
2 John               5/6/15 3:30 1.00
3 Ben                5/6/15 4:30 2.00
4 Mike               5/6/15 5:30 3.00
5 Day_2  5/7/15 2:30
6 John_2             5/7/15 4:30 2.00
7 Ben_2              5/7/15 5:30 3.00
8 Mike_2             5/7/15 6:30 4.00
Assuming that the values in the name column carry the day number as a suffix (and no suffix for day 1):
WITH td AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [day] ORDER BY final_value) rnum
FROM (SELECT *,
CASE WHEN CHARINDEX('_', name) = 0
THEN '1'
ELSE SUBSTRING(name, CHARINDEX('_', name) + 1, LEN(name) - CHARINDEX('_', name))
END [day]
FROM t_dur
) tt
)
SELECT t1.name,
CASE WHEN rnum = 1 THEN t1.final_value END start_time,
CASE WHEN rnum <> 1 THEN t1.final_value END end_time,
CASE CAST(DATEDIFF(hour, (SELECT t2.final_value FROM td t2 WHERE t2.[day] = t1.[day] AND t2.rnum = 1),
t1.final_value) AS DECIMAl(5,2))
WHEN 0 THEN NULL
ELSE CAST(DATEDIFF(hour, (SELECT t2.final_value FROM td t2 WHERE t2.[day] = t1.[day] AND t2.rnum = 1),
t1.final_value) AS DECIMAl(5,2))
END time_duration
FROM td t1
Result
name   start_time              end_time                time_duration
Day_1  2015-05-06 02:30:00.000 NULL                    NULL
John   NULL                    2015-05-06 03:30:00.000 1.00
Ben    NULL                    2015-05-06 04:30:00.000 2.00
Mike   NULL                    2015-05-06 05:30:00.000 3.00
Day_2  2015-05-07 02:30:00.000 NULL                    NULL
John_2 NULL                    2015-05-07 04:30:00.000 2.00
Ben_2  NULL                    2015-05-07 05:30:00.000 3.00
Mike_2 NULL                    2015-05-07 06:30:00.000 4.00
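The result above measures every row against the earliest timestamp of its day. A small Python sketch of the same idea follows, with two stated differences from the query: rows are grouped by the calendar date of the timestamp instead of the name suffix, and the duration is the exact elapsed time in hours, whereas DATEDIFF(hour, ...) counts hour boundaries crossed. The function name durations_from_day_start is hypothetical.

```python
from datetime import datetime


def durations_from_day_start(rows):
    """rows: list of (name, timestamp).

    Returns {name: hours since the earliest timestamp on the same date},
    with None for the row that is itself the day's start.
    """
    first_by_day = {}
    for _, ts in rows:
        day = ts.date()
        if day not in first_by_day or ts < first_by_day[day]:
            first_by_day[day] = ts

    result = {}
    for name, ts in rows:
        start = first_by_day[ts.date()]
        result[name] = None if ts == start else (ts - start).total_seconds() / 3600
    return result
```

With the sample data, the Day_1 and Day_2 rows come back as None and the remaining rows carry the hours elapsed since that day's start.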