Counting Consecutive Weeks based on a separate numeric column - SQL

I'm working on a problem where employees get a certain score each week. They only have one score per week, saved each Saturday. I want to count the number of consecutive weeks (working backwards from today) that their score is above 50. If the most recent week is not above 50, they have 0 consecutive weeks. If they've had a score above 50 every week for the past year, they have 52 consecutive weeks.
I've tried using the ROW_NUMBER() function to get this, but can't figure out how to incorporate the score as a factor.
This is an example of the data set:
EmpID Last Week Score
A 7/6/2019 60
A 6/29/2019 84
A 6/22/2019 21
B 7/6/2019 41
B 6/29/2019 92
C 7/6/2019 77
C 6/29/2019 55
C 6/22/2019 71
C 6/15/2019 63
This is what I've tried so far
SELECT
EmpID,
EOW,
SCORE,
ROW_NUMBER() OVER(PARTITION BY EmpID ORDER BY EOW DESC) AS RN
FROM a
ORDER BY EmpID, EOW DESC
But that just numbers every row for each employee. I need the count to stop when their score drops below 50, as below:
EmpID Last Week Score RN
A 7/6/2019 60 1
A 6/29/2019 84 2
A 6/22/2019 21 -
B 7/6/2019 41 -
B 6/29/2019 92 -
C 7/6/2019 77 1
C 6/29/2019 55 2
C 6/22/2019 71 3
C 6/15/2019 63 4
I then need to get a single number of the consecutive weeks for each employee so that I can join the results to a larger query that pulls additional info about the employee. The scores are in a different table which is why I have to join it. The query should produce a result like:
EmpID Last Week Consecutive Week
A 7/6/2019 2
B 7/6/2019 0
C 7/6/2019 4
Does this make sense? Any help would be appreciated.

I used conditional aggregation and a running total.
The basic idea is:
If the score is >= 50, the running total adds 0 for that row, so it stays at 0.
The run of zeros stops at the first score below 50.
Then count the number of zeros.
I added a special case, group D, to the test sample:
('D','7/6/2019' , 51)
('D','6/29/2019' , 49)
('D','6/22/2019' , 52)
There is only a single zero in this case. When there is only one zero, I think the consecutive weeks should be zero instead of one.
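For reference, a minimal setup for the test sample (a sketch: the query below reads from a table named TEST with columns EmpID, EOW and Score; the column types and the M/D/YYYY date literals are assumptions, the latter relying on a US-style date setting):
-- types are assumptions; EOW must sort as a date for the running total to work
CREATE TABLE TEST (EmpID varchar(10), EOW date, Score int);

INSERT INTO TEST (EmpID, EOW, Score) VALUES
('A','7/6/2019',60), ('A','6/29/2019',84), ('A','6/22/2019',21),
('B','7/6/2019',41), ('B','6/29/2019',92),
('C','7/6/2019',77), ('C','6/29/2019',55), ('C','6/22/2019',71), ('C','6/15/2019',63),
('D','7/6/2019',51), ('D','6/29/2019',49), ('D','6/22/2019',52);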
Try this:
SELECT B.EmpID,
       B.[Last Week],
       -- per the group D rule above, a single qualifying week counts as zero
       CASE WHEN B.TOTAL <= 1 THEN 0 ELSE B.TOTAL END AS RN
FROM (
    SELECT A.EmpID,
           MAX(EOW) AS [Last Week],
           -- count the weeks whose running total is still zero
           SUM(CASE WHEN A.COUNT1 = 0 THEN 1 ELSE 0 END) AS TOTAL
    FROM (
        SELECT EmpID, EOW, Score,
               -- running total, newest week first: stays at 0 until the first score below 50
               SUM(CASE WHEN Score >= 50 THEN 0 ELSE 1 END)
                   OVER (PARTITION BY EmpID ORDER BY EOW DESC) AS COUNT1
        FROM TEST
        GROUP BY EmpID, EOW, Score
    ) A
    GROUP BY A.EmpID
) B
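Running the query against the sample data above (including group D), the result should be:
EmpID Last Week RN
A 7/6/2019 2
B 7/6/2019 0
C 7/6/2019 4
D 7/6/2019 0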

Related

Frequency of Address changes in number of days SQL

Hi, I'm trying to find out how frequently a business changes its address. I've got two tables, one with trading addresses and the other with office addresses. The complicated part is that one ID can have several sequence numbers. I need to find the difference between one address's create date and the next address's create date.
Trading address table
ID Create_date Seq_no Address
1 2002-03-23 1 20 bottle way
1 2002-05-23 2 12 sunset blvd
2 2003-01-14 1 76 moonrise ct
Office address table
ID Create_date Seq_no Address
1 2004-02-13 1 12 paper st
2 2005-03-01 1 30 pencil way
2 2005-04-01 2 25 mouse rd
2 2005-08-01 3 89 glass cct
My result set will be
Difference NumberOfID's
30 days 1
60 days 1
120 days 1
Other 2
I think I solved it. The steps are:
1. Did a union of the two tables and created a separate column (DENSE_RANK) to get the actual sequence number for the combined set.
2. Used the LEAD function to create a separate column that brings up the next create date.
3. Used DATEDIFF to find the actual difference in days between consecutive create dates for each ID.
4. Used a CASE statement to categorize the day ranges and counted the IDs.
WITH BASE AS (
SELECT ID,SEQ_NO,CREATE_DATE
FROM TradingAddress
UNION ALL
SELECT ID,SEQ_NO,CREATE_DATE
FROM OfficeAddress
),
WORKINGS AS (
SELECT ID,CREATE_DATE,
DENSE_RANK() OVER (PARTITION BY ID ORDER BY CREATE_DATE ASC) AS SNO,
LEAD(CREATE_DATE) OVER (PARTITION BY ID ORDER BY CREATE_DATE) AS REF_DATE,
DATEDIFF(DAY,CREATE_DATE,LEAD(CREATE_DATE) OVER (PARTITION BY ID ORDER BY CREATE_DATE)) AS DATE_DIFFERENCE
FROM BASE
),
WORKINGS_2 AS (
SELECT *,
CASE WHEN DATE_DIFFERENCE BETWEEN 1 AND 30 THEN '1-30 DAYS'
WHEN DATE_DIFFERENCE BETWEEN 31 AND 60 THEN '31-60 DAYS'
WHEN DATE_DIFFERENCE BETWEEN 61 AND 90 THEN '61-90 DAYS'
WHEN DATE_DIFFERENCE BETWEEN 91 AND 120 THEN '91-120 DAYS'
ELSE 'MORE THAN 120 DAYS'
END AS DIFFERENCE_DAYS
FROM WORKINGS
WHERE REF_DATE IS NOT NULL
)
SELECT DIFFERENCE_DAYS,COUNT(DIFFERENCE_DAYS) AS NUMBEROFIDS
FROM WORKINGS_2
GROUP BY DIFFERENCE_DAYS
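For anyone who wants to run this, here is a minimal test setup matching the sample tables (a sketch, assuming SQL Server; the column types are assumptions):
-- column types are assumptions
CREATE TABLE TradingAddress (ID int, Create_date date, Seq_no int, Address varchar(50));
INSERT INTO TradingAddress (ID, Create_date, Seq_no, Address) VALUES
(1, '2002-03-23', 1, '20 bottle way'),
(1, '2002-05-23', 2, '12 sunset blvd'),
(2, '2003-01-14', 1, '76 moonrise ct');

CREATE TABLE OfficeAddress (ID int, Create_date date, Seq_no int, Address varchar(50));
INSERT INTO OfficeAddress (ID, Create_date, Seq_no, Address) VALUES
(1, '2004-02-13', 1, '12 paper st'),
(2, '2005-03-01', 1, '30 pencil way'),
(2, '2005-04-01', 2, '25 mouse rd'),
(2, '2005-08-01', 3, '89 glass cct');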
You can do it this way:
SELECT DATEDIFF(day, t1.create_date, t2.create_date) AS yourdats,
       COUNT(*) AS ids
FROM test1 t1
JOIN test2 t2 ON t1.id = t2.id
GROUP BY DATEDIFF(day, t1.create_date, t2.create_date)

Rolling sum to calculate YTD for each month, grouped by product and saved to separate columns using SQL

I have data like this:
Order_No Product Month Qty
3001 r33 1 8
3002 r34 1 11
3003 r33 1 17
3004 r33 2 3
3005 r34 2 11
3006 r34 3 1
3007 r33 3 -10
3008 r33 3 18
I'd like to calculate the total YTD qty for each product and month and save each month into a separate column. Below is what I want:
Product Qty_sum_jan Qty_sum_feb Qty_sum_mar
r33 25 28 36
r34 11 22 23
I know how to use window functions to calculate rolling sums, but I have no idea how to group them into separate columns. I currently use something like this:
case when Month = 1 then sum(Qty) over(partition by Product order by Month) else 0 end as Qty_sum_jan,
case when Month <=2 then sum(Qty) over(partition by Product order by Month) else 0 end as Qty_sum_feb,
case when Month <=3 then sum(Qty) over(partition by Product order by Month) else 0 end as Qty_sum_mar,
This gets me the rolling sum per order, but how do I get to the product level as shown above? If I use GROUP BY it throws an error, since Month is not in the GROUP BY clause. I also can't just use MAX() to get the last running value, since Qty can be negative and the last value may not be the maximum. I'm using Spark SQL, by the way.
To my understanding, there is no need to use window functions. The following query achieves your desired output:
select
product,
sum(case when month = 1 then qty else 0 end) as sum_qty_jan,
sum(case when month <= 2 then qty else 0 end) as sum_qty_feb,
sum(case when month <= 3 then qty else 0 end) as sum_qty_mar
from your_table
group by 1;
Output:
product sum_qty_jan sum_qty_feb sum_qty_mar
r33 25 28 36
r34 11 22 23
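If you do want the rolling YTD as rows (one per product and month) rather than pivoted into columns, a window-function variant along the lines you started with could look like this (a sketch, assuming Spark SQL and the same your_table placeholder name):
select product,
       month,
       -- running total of the per-month sums, ordered by month within each product
       sum(month_qty) over (partition by product order by month) as ytd_qty
from (
    select product, month, sum(qty) as month_qty
    from your_table
    group by product, month
) m;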

SQL: How to calculate average time between order purchases? (doing SQL calculations based on next and previous row)

I have a simple table that contains the customer email, their order count (so if this is their 1st order, 3rd, 5th, etc), the date that order was created, the value of that order, and the total order count for that customer.
Here is what my table looks like
Email Order Date Value Total
r2n1w#gmail.com 1 12/1/2016 85 5
r2n1w#gmail.com 2 2/6/2017 125 5
r2n1w#gmail.com 3 2/17/2017 75 5
r2n1w#gmail.com 4 3/2/2017 65 5
r2n1w#gmail.com 5 3/20/2017 130 5
ation#gmail.com 1 2/12/2018 150 1
ylove#gmail.com 1 6/15/2018 36 3
ylove#gmail.com 2 7/16/2018 41 3
ylove#gmail.com 3 1/21/2019 140 3
keria#gmail.com 1 8/10/2018 54 2
keria#gmail.com 2 11/16/2018 65 2
What I want to do is calculate the average time between purchases for each customer. So let's take customer ylove: the first purchase is on 6/15/18, the next one is on 7/16/18, so that's 31 days, and the next purchase is on 1/21/2019, which is another 189 days. The average time between orders would be 110 days.
But I have no idea how to make SQL look at the next row and calculate based on that, but then restart when it reaches a new customer.
Here is my query to get that table:
SELECT
F.CustomerEmail
,F.OrderCountBase
,F.Date_Created
,F.Total
,F.TotalOrdersBase
FROM #FullBase F
ORDER BY f.CustomerEmail
If anyone can give me some suggestions, that would be greatly appreciated.
And then maybe I can calculate value differences (in percentage). So for example, ylove spent $36 on their first order and $41 on their second, which is a 13% increase. Then their third order was $140, which is a 341% increase. So on average, this customer increased their purchase order value by 177%. Unrelated to SQL, but is this the correct way of calculating a metric like this?
Looking at your sample, you could try using the difference between the min and max date divided by the total:
select email, datediff(day, min(Order_Date), max(Order_Date)) / (total - 1) as avg_days
from your_table
group by email, total
And to also handle customers with only one order:
select email,
       case when total - 1 > 0
            then datediff(day, min(Order_Date), max(Order_Date)) / (total - 1)
            else datediff(day, min(Order_Date), max(Order_Date))
       end as avg_days
from your_table
group by email, total
The simplest formulation is:
select email,
       datediff(day, min(Order_Date), max(Order_Date)) / nullif(total - 1, 0) as avg_days
from t
group by email, total;
You can see this is the case. Consider three orders with od1, od2, and od3 as the order dates. The average is:
( (od2 - od1) + (od3 - od2) ) / 2
Check the arithmetic:
--> ( od2 - od1 + od3 - od2 ) / 2
--> ( od3 - od1 ) / 2
This pretty obviously generalizes to more orders.
Hence the max() minus min().
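If you specifically want SQL to "look at the next row" per customer, a lag()-based variant gives the same averages (a sketch, assuming SQL Server and the #FullBase table and column names from your query; customers with a single order simply drop out of the result):
SELECT CustomerEmail,
       -- multiply by 1.0 so the average isn't truncated by integer division
       AVG(DATEDIFF(day, PrevDate, Date_Created) * 1.0) AS AvgDaysBetween
FROM (
    SELECT CustomerEmail,
           Date_Created,
           LAG(Date_Created) OVER (PARTITION BY CustomerEmail ORDER BY Date_Created) AS PrevDate
    FROM #FullBase
) x
WHERE PrevDate IS NOT NULL
GROUP BY CustomerEmail;
The same LAG() pattern would also hand you the previous order's value for the percentage-increase calculation.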

Snapshot Table Status Change

I am trying to write a SQL query (in Amazon Redshift) that counts the number of times a customer goes from not meeting criteria to meeting criteria, i.e. when a 1 occurs on the date after a 0.
I'm struggling to figure out the logic to do this.
ID Snapshot_date Meets Criteria
55 1/1/2018 0
55 1/5/2018 1
55 1/10/2018 1
55 1/15/2018 1
55 1/20/2018 0
55 1/25/2018 1
Use lag() to get the previous value, check for the condition, and count.
select id, count(*)
from (select id, snapshot_date, meets_criteria,
             lag(meets_criteria, 1) over (partition by id order by snapshot_date) as prev_m_c
      from tbl
     ) t
where prev_m_c = 0 and meets_criteria = 1
group by id
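With the sample data, prev_m_c is 0 and meets_criteria is 1 on 1/5/2018 and on 1/25/2018, so the query should return a count of 2 for ID 55:
id count
55 2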

Query help, rank function maybe?

Rank function? Can anybody assist me with this? I'd like my query to only return the row with the lowest date for the P0260 job, and the row with the lowest date for the PAINTING job.
JOB ID LINE ORDER RCVD USE DATE
P0260 61785 1 2400 24 10/26/2012
P0260 63462 3 2400 24 11/14/2012
P0260 66372 1 1 0 2/15/2013
P0260 66371 1 5 0 3/1/2013
PAINTING 12246 1 29 27 11/30/2006
PAINTING 30885 1 160 0 9/29/2009
Painting 30885 2 160 0 9/29/2009
PAINTING 31155 1 25 0 11/6/2009
Ok, without knowing which RDBMS (and version) you are using, this solution should work on most of them:
SELECT A.*
FROM YourTable A
INNER JOIN (SELECT JOB, MIN([USE DATE]) MinUseDate
FROM YourTable
GROUP BY JOB) B
ON A.JOB = B.JOB AND A.[USE DATE] = B.MinUseDate
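Given the sample data (and assuming a case-insensitive collation, so that 'PAINTING' and 'Painting' group together), this should return the 61785 row (10/26/2012) for P0260 and the 12246 row (11/30/2006) for PAINTING. Note that if two rows tie on the minimum [USE DATE] for a JOB, both will come back.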
SELECT query2.PART_ID, query2.ID, query2.LINE_NO, query2.DEL_SCHED_LINE_NO,
       query2.SYSADM_PURC_LINE_DEL.ORDER_QTY, query2.RECEIVED_QTY,
       query2.DESIRED_RECV_DATE, query2.SYSADM_PURC_ORDER_LINE.ORDER_QTY,
       query2.TOTAL_RECEIVED_QTY, query2.[USE DATE]
FROM query2
INNER JOIN (SELECT query2.PART_ID, MIN(query2.[USE DATE]) MinUseDate
            FROM query2
            GROUP BY PART_ID) B
ON query2.PART_ID = B.PART_ID AND query2.[USE DATE] = B.MinUseDate;