Looping Through Dates in SQL? (Databricks) - sql

I am currently learning SQL and ran into a problem. Through my searches I have found that looping in SQL is a big no-no, so I was wondering if anyone could point me in the right direction.
The dataframe looks like this:
Group | ATP Date | JTH Date
:---- | :-------- | :--------
A | 5/17/2022 | 6/17/2022
A | 5/17/2022 | Null
B | 5/17/2022 | Null
A | 5/16/2022 | 6/16/2022
B | 5/16/2022 | 6/16/2022
B | 5/15/2022 | 6/17/2022
B | 5/15/2022 | Null
A | 5/14/2022 | 6/1/2022
A | 5/13/2022 | Null
A | 5/13/2022 | 6/1/2022
A | 5/13/2022 | 6/5/2022
I am trying to make a query to pull this:
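Date | Group | CountNo | CountYes | Ratio (No/Yes)
:-------- | :---- | ------: | -------: | -------------:
5/17/2022 | A | 1 | 1 | 1
5/17/2022 | B | 0 | 1 | 0
5/16/2022 | A | 1 | 0 | Null
5/16/2022 | B | 2 | 1 | 2
5/14/2022 | A | 1 | 0 | Null
5/13/2022 | A | 2 | 0 | Null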
This is what I currently created:
select
max(ATP_Date) as Date,
Group,
sum(
case
when ATP_Date < '2022-05-18'
and JTH_Date > '2022-05-18' then 1
else 0
END
) as CountNo,
sum(
case
when ATP_Date < '2022-05-18'
and JTH_Date IS null then 1
else 0
END
) as CountYes,
sum(
case
when ATP_date < '2022-05-18'
and JTH_Date > '2022-05-18' then 1
else 0
END
) / sum(
case
when ATP_Date < '2022-05-18'
and JTH_Date IS null then 1
else 0
END
) as ratio
from
dataframe
where group = "A"
GROUP BY group
Which outputs this:
Date | Group | CountNo | CountYes | Ratio
:-------- | :---- | ------: | -------: | ----:
5/17/2022 | A | 318 | 1064 | 0.3
This is what I want, but I need to do it for each date over roughly the last 4 years, so that the result looks like the second table posted. I could manually edit the dates for each query, but that would take forever, which made me think of looping: I believe I would basically need to loop through the SELECT portion with different dates to get the output I want. If anyone has advice or could point me in the right direction, it would be greatly appreciated. Thanks.

I couldn't write the query as a comment, so I am posting it here. If it's not what you are expecting, let me know and I will delete this.
Assuming 2022-05-18 is constant (as per your sample)
Creating sample table
create or replace table dt_query
(group string, atp_date date, jth_date date);
insert into dt_query values
('A','2022-05-17','2022-06-17')
,('A','2022-05-17',NULL)
,('B','2022-05-17',NULL)
,('A','2022-05-16','2022-06-16')
,('B','2022-05-16','2022-06-16')
,('B','2022-05-16','2022-06-15')
,('B','2022-05-16',NULL)
Slightly modified your select statement
select
max(ATP_Date) as Date,
Group,
sum(
case
when ATP_Date < '2022-05-18'
and JTH_Date > '2022-05-18' then 1
else 0
END
) as CountNo,
sum(
case
when ATP_Date < '2022-05-18'
and JTH_Date IS null then 1
else 0
END
) as CountYes,
sum(
case
when ATP_date < '2022-05-18'
and JTH_Date > '2022-05-18' then 1
else 0
END
) / sum(
case
when ATP_Date < '2022-05-18'
and JTH_Date IS null then 1
else 0
END
) as ratio
from
dt_query
GROUP BY group,atp_date
The result matches with what you are expecting.

To avoid using loops here, you can include atp_date in the GROUP BY clause, which will result in one row for each combination of that date and the "grouping" column.
It isn't clear why you compare to '2022-05-18' but this appears to be the day following the maximum date found in atp_date. So to avoid hardcoding, you could approach it by using a derived table of 1 row, cross joined to the data:
SELECT
ATP_Date
, grouping
, sum(CASE
WHEN ATP_Date < cj.max_dt AND JTH_Date > cj.max_dt
THEN 1
ELSE 0
END) AS CountNo
, sum(CASE
WHEN ATP_Date < cj.max_dt AND JTH_Date IS NULL
THEN 1
ELSE 0
END) AS CountYes
, sum(CASE
WHEN ATP_date < cj.max_dt AND JTH_Date > cj.max_dt
THEN 1
ELSE 0
END)
/ sum(CASE
WHEN ATP_Date < cj.max_dt AND JTH_Date IS NULL
THEN 1
ELSE NULL
END) AS ratio
FROM dt_query
CROSS JOIN (select max(atp_date) + interval '1 day' max_dt from dt_query) AS cj
GROUP BY
grouping
, atp_date
ORDER BY
atp_date DESC
, grouping
atp_date | grouping | countno | countyes | ratio
:--------- | :------- | ------: | -------: | ----:
2022-05-17 | A | 1 | 1 | 1
2022-05-17 | B | 0 | 1 | 0
2022-05-16 | A | 1 | 0 | null
2022-05-16 | B | 2 | 1 | 2
2022-05-14 | A | 1 | 0 | null
2022-05-13 | A | 2 | 1 | 2
nb: to avoid issues with the reserved word "group" I have used the column name "grouping" instead, and the example SQL is written for Postgres, so some syntax may need alteration (e.g. the addition of 1 day). Also note that the ratio calculation can result in a divide-by-zero error, so the CASE in the denominator returns NULL instead of zero.
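As an alternative way to guard the division, here is a minimal sketch using NULLIF on the denominator. The table name dt_demo and the column name grp are hypothetical, and the CASE conditions are simplified (they only test whether jth_date is NULL, without the cutoff-date comparison used above), so treat it as an illustration of the guard rather than a drop-in replacement.

```sql
-- Hypothetical sample table; "grp" stands in for the reserved word "group".
CREATE TABLE dt_demo (grp VARCHAR(10), atp_date DATE, jth_date DATE);

INSERT INTO dt_demo VALUES
  ('A', '2022-05-17', '2022-06-17'),
  ('A', '2022-05-17', NULL),
  ('B', '2022-05-17', NULL),
  ('A', '2022-05-16', '2022-06-16');

SELECT
    atp_date,
    grp,
    SUM(CASE WHEN jth_date IS NOT NULL THEN 1 ELSE 0 END) AS count_no,
    SUM(CASE WHEN jth_date IS NULL     THEN 1 ELSE 0 END) AS count_yes,
    -- NULLIF turns a zero denominator into NULL, so the ratio becomes NULL
    -- instead of raising a divide-by-zero error.
    -- Multiplying by 1.0 avoids integer division on engines that truncate.
    SUM(CASE WHEN jth_date IS NOT NULL THEN 1 ELSE 0 END) * 1.0
      / NULLIF(SUM(CASE WHEN jth_date IS NULL THEN 1 ELSE 0 END), 0) AS ratio
FROM dt_demo
GROUP BY atp_date, grp
ORDER BY atp_date DESC, grp;
```

With this sample data, the row for 2022-05-16 / A has count_yes = 0, so its ratio comes back as NULL rather than failing the whole query.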

Related

How to achieve the bucket values in SQL?

I have schedule table like this (table name = testSch)
ID Amount scheduleDate
1 7230.00 2018-07-13
1 7272.00 2018-07-27
1 7314.00 2018-08-10
1 7356.00 2018-08-24
1 7398.00 2018-09-07
1 7441.00 2018-09-21
1 7439.00 2018-10-08
1 7526.00 2018-10-22
1 7570.00 2018-11-05
1 7613.00 2018-11-19
1 5756.00 2018-12-03
I need to sum the Amount field based on specific bucket values as shown below
Principal_7To30_Days
Principal_1To3_Months
Principal_3To6_Months
Principal_6To12_Months
Principal_1To3_Years
by giving an input date
And my input date is 2018-07-09 and below is my query;
;with cteSchedule as (
select *,DATEDIFF(DAY,'20180709',scheduleDate) as datedifference,
DATEDIFF(MONTH,'20180709',scheduleDate) as monthdifference from testSch)
select ISNULL((SELECT SUM(cteSchedule.Amount)
FROM cteSchedule
WHERE cteSchedule.datedifference <7),0) AS Principal_0To7_Days,
ISNULL((SELECT SUM(cteSchedule.Amount)
FROM cteSchedule
WHERE cteSchedule.datedifference>=7 and cteSchedule.datedifference<30),0)
AS Principal_7To30_Days,
ISNULL((SELECT SUM(cteSchedule.Amount)
FROM cteSchedule
WHERE cteSchedule.datedifference>=30 and cteSchedule.datedifference<90),0) AS Principal_1To3_Months,
ISNULL((SELECT SUM(cteSchedule.Amount)
FROM cteSchedule
WHERE cteSchedule.datedifference>=90 and cteSchedule.datedifference<180),0) AS Principal_3To6_Months,
ISNULL((SELECT SUM(cteSchedule.Amount)
FROM cteSchedule
WHERE cteSchedule.datedifference>=180 and cteSchedule.datedifference<365),0) AS Principal_6To12_Months
And below is my output
Principal_0To7_Days Principal_7To30_Days Principal_1To3_Months Principal_3To6_Months Principal_6To12_Months
7230.00 7272.00 29509.00 35904.00 0.00
But the correct output should be
Principal_0To7_Days Principal_7To30_Days Principal_1To3_Months Principal_3To6_Months Principal_6To12_Months
7230.00 7272.00 36948.00 28465.00 0.00
So the problem is that I'm getting wrong values for Principal_1To3_Months and Principal_3To6_Months. When I asked my client how they calculate this in their legacy system, they replied that they calculate it by adding a number of months, not days. So if today is 2018-07-09, adding 3 months gives 2018-10-09.
So I used the month difference in my cte query as below
DATEDIFF(MONTH,'20180709',scheduleDate) as monthdifference
And use this in my overall query as below
ISNULL((SELECT SUM(cteSchedule.Amount)
FROM cteSchedule
WHERE cteSchedule.monthdifference>=1 and cteSchedule.monthdifference<=3),0) AS Principal_1To3_Months
But this time I also get the same values as in my very first output.
Can someone please point out where my mistake is and how to achieve the values shown in the correct output?
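I wouldn't use DATEDIFF to calculate the day or month difference, because months differ in length (some have 31 days, others 30), so fixed day counts such as 30 or 90 do not line up with calendar months.
DATEDIFF(MONTH, ...) is not accurate either, because it counts the month boundaries crossed rather than the full months elapsed.
I would use DATEADD instead of DATEDIFF in the conditions.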
;with cteSchedule as (
select *,'20180709' compareDay
from testSch
)
SELECT Sum(CASE
WHEN t.scheduleDate < DATEADD(day, 7, compareDay)
THEN t.amount
ELSE 0
END) AS Principal_0To7_Days,
Sum(CASE
WHEN t.scheduleDate >=DATEADD(day, 7, compareDay) AND t.scheduleDate < DATEADD(day, 30, compareDay)
THEN t.amount
ELSE 0
END) AS Principal_7To30_Days,
Sum(CASE
WHEN t.scheduleDate >=DATEADD(month,1,compareDay) AND t.scheduleDate < DATEADD(month,3,compareDay)
THEN t.amount
ELSE 0
END) AS Principal_1To3_Months,
Sum(CASE
WHEN t.scheduleDate >=DATEADD(month,3,compareDay) AND t.scheduleDate < DATEADD(month,6,compareDay)
THEN t.amount
ELSE 0
END) AS Principal_3To6_Months,
Sum(CASE
WHEN t.scheduleDate >=DATEADD(month,6,compareDay) AND t.scheduleDate < DATEADD(month,12,compareDay)
THEN t.amount
ELSE 0
END) AS Principal_6To12_Months
from cteSchedule t
[Results]:
| Principal_0To7_Days | Principal_7To30_Days | Principal_1To3_Months | Principal_3To6_Months | Principal_6To12_Months |
|---------------------|----------------------|-----------------------|-----------------------|------------------------|
| 7230 | 7272 | 36948 | 28465 | 0 |
Note
You can use CASE WHEN inside the SUM aggregate function instead of a separate SELECT subquery per column; the performance will generally be better.
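To see why a month-based DATEDIFF does not match the "add N months" rule, here is a small T-SQL sketch (SQL Server syntax, matching the DATEADD/DATEDIFF calls above). The variable @inputDate and the inline two-row data set are illustrative assumptions; the two dates are taken from the schedule table in the question.

```sql
-- T-SQL sketch: compare DATEDIFF(MONTH) with a DATEADD-based boundary test.
DECLARE @inputDate date = '20180709';

SELECT
    d.scheduleDate,
    -- Counts month boundaries crossed, so every October 2018 date yields 3.
    DATEDIFF(MONTH, @inputDate, d.scheduleDate) AS month_boundaries,
    -- Calendar-based test: 1 only when the date is on or after 2018-10-09.
    CASE WHEN d.scheduleDate >= DATEADD(MONTH, 3, @inputDate)
         THEN 1 ELSE 0 END AS at_least_3_months
FROM (VALUES (CAST('20181008' AS date)),
             (CAST('20181022' AS date))) AS d(scheduleDate);
```

Both rows report 3 month boundaries, but only 2018-10-22 passes the DATEADD test, which is why 2018-10-08 belongs in the 1-to-3-month bucket under the client's rule.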

Average call volumes between times, total for months

Table Calls c
CALL_ID | CALL_START | CONTACT_ID | CALL_TYPE
--------------------------------------------------
2 | 25/11/2010 11:28:16 | 850 | I
3 | 25/11/2010 11:28:57 | 850 | I
5 | 29/11/2010 10:18:44 | 848 | I
Table Contacts ct
CONTACT_ID | COMPANY_ID | RECORD_STATUS
-----------------------------------
1 | 1 | A
19 | 2 | A
20 | 3 | A
21 | 4 | A
22 | 5 | A
I want to extract the number of incoming and outgoing calls per month,
for one particular Company_id and for all the others.
I also need the average number of calls between 0700 and 1500, between 1500 and 2200, and between 2200 and 0700.
Everything starts from October 2016, and I have a list of operator IDs to include...
select
extract(year from c.call_start) as MyYear,
extract(month from c.call_start) as MyMonth,
count(case when call_type='I' and ct.Company_id <> 391 then 1 else null end) as Incoming_Main,
count(case when call_type='O' and ct.Company_id <> 391 then 1 else null end) as Outgoing_Main,
count(case when call_type='I' and ct.Company_id = 391 then 1 else null end) as Incoming_SMG,
count(case when call_type='O' and ct.Company_id = 391 then 1 else null end) as Outgoing_SMG,
SUM(CASE WHEN CAST(c.CALL_START AS TIME) BETWEEN '07:00:00' and '15:00:00' and ct.company_id = 391 THEN 1 ELSE 0 END)/31 Am391,
SUM(CASE WHEN CAST(c.CALL_START AS TIME) BETWEEN '07:00:00' and '15:00:00' and ct.company_id <> 391 THEN 1 ELSE 0 END)/31 AmMAIN,
SUM(CASE WHEN CAST(c.CALL_START AS TIME) BETWEEN '15:00:00' and '22:00:00' and ct.company_id = 391 THEN 1 ELSE 0 END)/31 PM391,
SUM(CASE WHEN CAST(c.CALL_START AS TIME) BETWEEN '15:00:00' and '22:00:00' and ct.company_id <> 391 THEN 1 ELSE 0 END)/31 PmMAIN
from
CALLS c
left outer join CONTACTS ct on c.CONTACT_ID = ct.CONTACT_ID
where ct.RECORD_STATUS='A'
and c.OPERATOR_ID in (1,19,22)
and c.call_start >= '2016/10/01'
group by
extract(year from c.call_start),
extract(month from c.call_start)
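The three time windows described in the question could be sketched as follows. This is a PostgreSQL-flavoured illustration, not a tested answer for the asker's database: calls_demo is a hypothetical stand-in for CALLS, and the half-open ranges (>= start, < end) are a design assumption so that a call at exactly 15:00:00 is counted once rather than in two windows, as it would be with BETWEEN.

```sql
-- Hypothetical sample data, one call per interesting boundary.
CREATE TABLE calls_demo (call_id INT, call_start TIMESTAMP);

INSERT INTO calls_demo VALUES
  (1, '2016-10-03 06:59:59'),  -- night (before 07:00)
  (2, '2016-10-03 07:00:00'),  -- morning window
  (3, '2016-10-03 15:00:00'),  -- afternoon window, counted once
  (4, '2016-10-03 23:30:00');  -- night (after 22:00)

SELECT
    EXTRACT(YEAR  FROM call_start) AS my_year,
    EXTRACT(MONTH FROM call_start) AS my_month,
    SUM(CASE WHEN CAST(call_start AS TIME) >= '07:00:00'
              AND CAST(call_start AS TIME) <  '15:00:00' THEN 1 ELSE 0 END) AS am_calls,
    SUM(CASE WHEN CAST(call_start AS TIME) >= '15:00:00'
              AND CAST(call_start AS TIME) <  '22:00:00' THEN 1 ELSE 0 END) AS pm_calls,
    -- The night window wraps past midnight, so it needs OR rather than AND.
    SUM(CASE WHEN CAST(call_start AS TIME) >= '22:00:00'
               OR CAST(call_start AS TIME) <  '07:00:00' THEN 1 ELSE 0 END) AS night_calls
FROM calls_demo
GROUP BY EXTRACT(YEAR FROM call_start), EXTRACT(MONTH FROM call_start);
```

For the four sample rows this gives one morning call, one afternoon call and two night calls. Dividing each sum by the number of days in the month would then give the daily average.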

Is it possible to return fields conditionally in SQL?

I'd like to build a query that returns whether tasks are very late/late/near on time/on time.
Task status :
early if -2 days
near on time if -1 day
late if 1 day
very late if 2 days
What I've tried:
SELECT field_1, diff,
COUNT(CASE WHEN diff <= -2 THEN 1 END) onTime,
COUNT(CASE WHEN diff <= -1 THEN 1 END) nearOnTime,
Count(CASE WHEN diff >= 2 THEN 1 END) veryLate,
Count(CASE WHEN diff >= 0 THEN 1 END) Late
FROM(
SELECT field_1, DATEDIFF(day,Max(predicted_date), realization_date) as diff
FROM table
Group by field_1, realization_date
HAVING end_date is not null) as req1
GROUP BY field_1, diff
diff: the difference between a predicted date and a realization date
=> returns the number of days between these two dates
It returns :
field_1 | diff | onTime | nearOnTime | veryLate | Late
---------+--------+----------+--------------+------------+-------
task1 | -3 | 1 | 1 | 0 | 0
task2 | 2 | 0 | 0 | 1 | 1
I think my approach is bad, so what are my options for returning the task status?
Maybe something along these lines (a fiddle would help; this has not been tested):
SELECT field_1, diff,
CASE WHEN diff <= -2 THEN 'On Time'
WHEN diff <= -1 THEN 'nearOnTime'
WHEN diff >= 2 THEN 'veryLate'
WHEN diff >= 0 THEN 'Late'
else 'OK' END as status
FROM(
SELECT field_1, DATEDIFF(day,Max(predicted_date), realization_date) as diff
FROM table
Group by field_1, realization_date
HAVING end_date is not null) as req1
GROUP BY field_1, diff
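A single CASE expression evaluates its WHEN branches from top to bottom and returns the first match, which is what makes one status column per task possible. Here is a minimal self-contained sketch of that idea; the table task_demo, its sample rows, and the choice of 'on time' for diff = 0 are assumptions for illustration, with the labels taken from the status list in the question.

```sql
-- Hypothetical table holding an already-computed day difference per task.
CREATE TABLE task_demo (field_1 VARCHAR(20), diff INT);

INSERT INTO task_demo VALUES
  ('task1', -3),
  ('task2',  2),
  ('task3', -1),
  ('task4',  1);

SELECT field_1,
       diff,
       -- Branches are tested in order, so the stricter condition comes first
       -- on each side (<= -2 before <= -1, >= 2 before >= 1).
       CASE
           WHEN diff <= -2 THEN 'early'
           WHEN diff <= -1 THEN 'near on time'
           WHEN diff >=  2 THEN 'very late'
           WHEN diff >=  1 THEN 'late'
           ELSE 'on time'
       END AS status
FROM task_demo
ORDER BY field_1;
```

With these rows, task1 is labelled early, task2 very late, task3 near on time and task4 late.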

How to count sql from one column, and display it in two column

I have a table like this:
idrecord | date
----------------------------------------------
INC-20140308102029 | 2014-03-08 00:00:00.000
INC-20140308102840 | 2014-03-06 00:00:00.000
INC-20140310164404 | 2014-03-10 00:00:00.000
INC-20140311075714 | 2014-03-09 00:00:00.000
NRM-20140310130512 | 2014-04-02 00:00:00.000
NRM-20140311134720 | 2014-03-11 00:00:00.000
USF-20140317212232 | 2014-03-17 00:00:00.000
USF-20140321075402 | 2014-03-18 00:00:00.000
USF-20140321083137 | 2014-03-21 00:00:00.000
How can I count the rows in this table and display the result like this:
month | INC | NRM | USF
march | 4 | 1 | 3
April | 0 | 1 | 0
Thank you
You'd use case to count 1 or zero depending on the string matching or not. Use sum to count.
select
extract(month from thedate) as whichmonth,
sum( case when idrecord like 'INC%' then 1 else 0 end) as inc,
sum( case when idrecord like 'NRM%' then 1 else 0 end) as nrm,
sum( case when idrecord like 'USF%' then 1 else 0 end) as usf
from mytable
group by extract(month from thedate);
The function to extract the month from the date may vary from dbms to dbms. Look the appropriate function up in Google, if extract doesn't work for you.
Don't use the name date for a column. Date is a reserved word in SQL.
Try this
SELECT convert(char(3), date, 0) AS Month,
SUM(Case when LEFT(idrecord,3) = 'INC' then 1 else 0 end) as 'INC',
SUM(Case when LEFT(idrecord,3) = 'NRM' then 1 else 0 end) as 'NRM',
SUM(Case when LEFT(idrecord,3) = 'USF' then 1 else 0 end) as 'USF'
FROM Table1
Group By convert(char(3), date, 0)
or:
SELECT datename(mm, date) AS Month,
SUM(Case when LEFT(idrecord,3) = 'INC' then 1 else 0 end) as 'INC',
SUM(Case when LEFT(idrecord,3) = 'NRM' then 1 else 0 end) as 'NRM',
SUM(Case when LEFT(idrecord,3) = 'USF' then 1 else 0 end) as 'USF'
FROM Table1
Group By datename(mm, date)
Output:
month | INC | NRM | USF
march | 4 | 1 | 3
April | 0 | 1 | 0
Try this one:
select month(date) as month,
sum( case when idrecord like 'INC%' then 1 else 0 end) as inc,
sum( case when idrecord like 'NRM%' then 1 else 0 end) as nrm,
sum( case when idrecord like 'USF%' then 1 else 0 end) as usf
from Table1
group by month(date);
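A common pitfall with conditional counting is combining COUNT with ELSE 0: COUNT counts every non-NULL value, and 0 is not NULL, so every row is counted. The sketch below contrasts the three variants on a tiny hypothetical table (rec_demo and its rows are made up for illustration).

```sql
-- Hypothetical sample: two INC records and one NRM record.
CREATE TABLE rec_demo (idrecord VARCHAR(30));

INSERT INTO rec_demo VALUES ('INC-1'), ('INC-2'), ('NRM-1');

SELECT
    -- Returns 3: the ELSE 0 branch is non-NULL, so all rows are counted.
    COUNT(CASE WHEN idrecord LIKE 'INC%' THEN 1 ELSE 0 END) AS count_else_0,
    -- Returns 2: without ELSE, non-matching rows yield NULL and are skipped.
    COUNT(CASE WHEN idrecord LIKE 'INC%' THEN 1 END)        AS count_no_else,
    -- Returns 2: summing 1s and 0s gives the number of matches.
    SUM(CASE WHEN idrecord LIKE 'INC%' THEN 1 ELSE 0 END)   AS sum_else_0
FROM rec_demo;
```

So either SUM with ELSE 0 or COUNT without an ELSE branch gives the intended per-prefix totals.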

How to separate positive and negative numbers into their own columns?

I have a table with the following columns and data:
activity_dt | activity_amt
2009-01-01 | -500
2009-01-01 | 750
Can I write a query that looks at the sign of activity_amt and puts it in the credits column if it's positive, and the debits column if it's negative? (I'm using Sybase)
activity_dt | debits | credits
2009-01-01 | -500 | 750
select activity_dt,
sum(case when activity_amt < 0 then activity_amt else 0 end) as debits,
sum(case when activity_amt > 0 then activity_amt else 0 end) as credits
from the_table
group by activity_dt
order by activity_dt
I'm not sure about the exact syntax in Sybase, but you should be able to group on the date and sum up the positive and negative values:
select
activity_dt,
sum(case when activity_amt < 0 then activity_amt else 0 end) as debits,
sum(case when activity_amt >= 0 then activity_amt else 0 end) as credits
from
theTable
group by
activity_dt
I found a new answer to this problem using the DECODE function.
I hope this turns out to be useful for everyone.
select activity_dt,
sum(DECODE(SIGN(activity_amt), 1, activity_amt, 0)) as credits,
sum(DECODE(SIGN(activity_amt), -1, activity_amt, 0)) as debits
from the_table
group by activity_dt
order by activity_dt;
select (select JV_GroupsHead.GroupTitle
from JV_GroupsHead
where JV_GroupsHead.Id = jv.GroupId) as 'GroupName'
,jv.Revenue
,jv.AccountNo
,jv.AccountNoTitle
,(case when jv.Revenue < 0 then jv.Revenue else 0 end) as 'debits'
,(case when jv.Revenue> 0 then jv.Revenue else 0 end) as 'credits'
from JVFunction1('2010-07-08','2010-08-08') as jv