How to get Maximum Difference between Dates in a row using SQL

Date 1      Date 2      Date 3      Date 4      LineCount  Month_Gap
2020-01-01  2019-10-01  2019-09-06              1
2020-01-01  2019-10-01  2019-09-13  2019-09-06  2          0
2020-01-01  2019-10-01  2019-08-13  2019-09-06  2          1
If LineCount is 1, then Month_Gap should be the larger of the two month differences (Date1, Date3) and (Date2, Date3). Date3 will always be in between Date1 and Date2.
In this case, the output should be the maximum of the month differences (2020-01-01 minus 2019-09-06) and (2019-10-01 minus 2019-09-06), which is 3 months:
Date 1      Date 2      Date 3      Date 4      LineCount  Month_Gap
2020-01-01  2019-10-01  2019-09-06              1          3
2020-01-01  2019-10-01  2019-09-13  2019-09-06  2          0
2020-01-01  2019-10-01  2019-08-13  2019-09-06  2          1
I was trying something like this, but I am not sure how to go about it:
CASE WHEN LineCount = 1 THEN MAX(DATE_DIFF(.....), which I guess won't work.

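The pattern you should use is
SELECT TIMESTAMPDIFF(MONTH, LEAST(date1,date2,date3,date4), GREATEST(date1,date2,date3,date4)) as `maximum_difference`;
This looks through the four columns, finds the earliest and the latest date, and returns the number of months between them. In MySQL the unit (MONTH) is a keyword rather than a quoted string, and LEAST/GREATEST return NULL if any argument is NULL, so rows with an empty Date4 need separate handling.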

SELECT
CASE WHEN LineCount = 1 THEN GREATEST(DATE_DIFF('month', Date3, Date1),
DATE_DIFF('month', Date3, Date2)) END AS Month_Gap
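As a cross-check of the expected value for the LineCount = 1 row, here is a minimal Python sketch of the same idea: take the whole-month difference for each pair and keep the larger one. The helper names months_between and month_gap_for_single_line are made up for this illustration, and the whole-month rule (a month only counts once the day of month has been reached) is an assumption; SQL engines may count month boundaries differently.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole months from `earlier` to `later`, assuming earlier <= later.

    A month only counts once the day of month has been reached, so
    2019-09-06 -> 2019-10-01 is 0 months and 2019-09-06 -> 2020-01-01 is 3.
    """
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the last month is not complete yet
    return months


def month_gap_for_single_line(date1: date, date2: date, date3: date) -> int:
    """Larger of the two month differences measured from date3 (hypothetical helper)."""
    return max(months_between(date3, date1), months_between(date3, date2))


# First row of the question's table (LineCount = 1)
gap = month_gap_for_single_line(date(2020, 1, 1), date(2019, 10, 1), date(2019, 9, 6))
print(gap)
```

For the first row this gives 3, which matches the Month_Gap the question asks for.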

Related

How can i create a new column count in SQL table where count=1 if hours column >=6 else count=0

I aim to first achieve this
id  employee  Datelog     TimeIn    TimeOut   Hours     Count
5   Two       2022-08-10  09:00:00  16:00:00  07:00:00  1
4   Two       2022-08-09  09:00:00  16:00:00  07:00:00  1
3   Two       2022-08-08  09:00:00  16:00:00  07:00:00  1
2   One       2022-08-05  09:00:00  16:00:00  07:00:00  1
1   Two       2022-08-04  09:00:00  10:00:00  01:00:00  0
My main objective is then to give a bonus of 2k to employees whose TotalCount per month is >= 3.
employee  Month   TotalCount  Bonus
Two       August  3           2000
One       August  1           0
Here's the answer using Postgres. It's pretty much generic, except that extracting the month from datelog might have a slightly different syntax in other databases.
select employee
,max(date_part('month', datelog ))
,count(*)
,case when count(*) >= 3 then 2000 else 0 end as bonus
from t
where hours >= time '06:00:00'
group by employee
employee  max  count  bonus
Two       8    3      2000
One       8    1      0
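The same counting logic can be sketched in plain Python as a sanity check on the expected totals. The row tuples below are copied from the question's table; the 6-hour threshold and the 2000 bonus come from the question, while the variable names and the choice to key the counts by (employee, year, month) are assumptions made for this illustration.

```python
from collections import Counter
from datetime import date, timedelta

# (employee, datelog, hours worked), taken from the question's sample table
rows = [
    ("Two", date(2022, 8, 10), timedelta(hours=7)),
    ("Two", date(2022, 8, 9), timedelta(hours=7)),
    ("Two", date(2022, 8, 8), timedelta(hours=7)),
    ("One", date(2022, 8, 5), timedelta(hours=7)),
    ("Two", date(2022, 8, 4), timedelta(hours=1)),
]

MIN_HOURS = timedelta(hours=6)

# Count the qualifying days (hours >= 6) per employee and calendar month
counts = Counter(
    (employee, day.year, day.month)
    for employee, day, hours in rows
    if hours >= MIN_HOURS
)

# Bonus of 2000 when an employee has at least 3 qualifying days in a month
bonus = {key: 2000 if total >= 3 else 0 for key, total in counts.items()}

for key in counts:
    print(key, counts[key], bonus[key])
```

Keying by year and month keeps different months apart, which the SQL above would need a matching GROUP BY term for if the table spans more than one month.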

Comparing dates from Multiple rows with the same IDs

I have the following table
ID FromDate ToDate
1 2020-01-01 2020-12-31
1 2021-01-01 2021-12-31
1 2022-03-01 2022-12-31
If the difference between ToDate in any row and FromDate in the subsequent row is less than 30 days, then I should get one row with the first FromDate and the second ToDate.
Below is what I would expect to get:
ID FromDate ToDate
1 2020-01-01 2021-12-31
1 2022-03-01 2022-12-31
Any suggestions would be greatly appreciated
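One way to think about this is as an interval merge: sort the rows per ID, then extend the previous range whenever the next FromDate starts less than 30 days after the previous ToDate. The following is only a sketch in plain Python under that reading of the question; the function name merge_close_ranges and the max_gap_days parameter are made up for illustration, and consecutive close ranges are assumed to chain into one row.

```python
from datetime import date


def merge_close_ranges(rows, max_gap_days=30):
    """Merge (id, from_date, to_date) rows whose gap to the previous row
    of the same id is less than max_gap_days (hypothetical helper)."""
    merged = []
    for id_, start, end in sorted(rows):
        if merged:
            last_id, last_start, last_end = merged[-1]
            if id_ == last_id and (start - last_end).days < max_gap_days:
                # Close enough: extend the previous range instead of adding a row
                merged[-1] = (last_id, last_start, max(last_end, end))
                continue
        merged.append((id_, start, end))
    return merged


rows = [
    (1, date(2020, 1, 1), date(2020, 12, 31)),
    (1, date(2021, 1, 1), date(2021, 12, 31)),
    (1, date(2022, 3, 1), date(2022, 12, 31)),
]

result = merge_close_ranges(rows)
for row in result:
    print(row)
```

With the sample data the first two rows are one day apart and collapse into 2020-01-01 to 2021-12-31, while the third row starts 60 days later and stays separate.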

filter data based on month start and month end

Given a dataframe with date column in this format.
Date Group
2020-05-18 1
2020-06-22 1
2019-07-11 1
2018-03-01 1
2021-01-21 2
2021-05-05 2
2021-09-11 2
And two strings:
Start = 2020-05 (indicating month start)
End = 2021-09 (indicating month end)
I want to filter out the data so that only the dates that fall within the start and end date are available in the dataframe.
Expected output:
Date Group
2020-05-18 1
2020-06-22 1
2021-01-21 2
2021-05-05 2
2021-09-11 2
import pandas as pd

# Creating dummy data
d = {'dt': ['2020-05-18',
            '2020-06-22',
            '2019-07-11',
            '2018-03-01',
            '2021-01-21',
            '2021-05-05',
            '2021-09-11'],
     'group': [1, 1, 1, 1, 2, 2, 2]}
dt_df = pd.DataFrame(data=d)
# Convert the strings to datetime64 so they can be compared with timestamps
dt_df['dt'] = pd.to_datetime(dt_df['dt'])
dt_df['dt']
Initial input:
0 2020-05-18
1 2020-06-22
2 2019-07-11
3 2018-03-01
4 2021-01-21
5 2021-05-05
6 2021-09-11
Name: dt, dtype: datetime64[ns]
Start = '2020-05'
End = '2021-09'
Start = pd.to_datetime(Start)  # 2020-05-01
# Move End to the first day of the following month (2021-10-01).
# Adding np.timedelta64(1, 'M') to a Timestamp may raise in recent pandas versions,
# so a calendar-aware offset is used instead.
End = pd.to_datetime(End) + pd.offsets.MonthBegin(1)
Use loc to select only the dates from Start up to, but not including, End.
dt_df.loc[(dt_df['dt'] >= Start) & (dt_df['dt'] < End)]
Output:
dt group
0 2020-05-18 1
1 2020-06-22 1
4 2021-01-21 2
5 2021-05-05 2
6 2021-09-11 2
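If the dates are still ISO-formatted strings (YYYY-MM-DD), the month filter can also be sketched without pandas, because the first seven characters of such a string compare in calendar order. This is a minimal illustration under that assumption; the variable names are made up here, and it does not replace the DataFrame approach above.

```python
# (date string, group) pairs from the question's sample data
records = [
    ("2020-05-18", 1),
    ("2020-06-22", 1),
    ("2019-07-11", 1),
    ("2018-03-01", 1),
    ("2021-01-21", 2),
    ("2021-05-05", 2),
    ("2021-09-11", 2),
]

start = "2020-05"  # first month to keep
end = "2021-09"    # last month to keep (inclusive)

# "YYYY-MM" prefixes sort lexicographically in calendar order,
# so a plain string comparison is enough for ISO-formatted dates.
kept = [(day, group) for day, group in records if start <= day[:7] <= end]

for row in kept:
    print(row)
```

Comparing on the month prefix also avoids having to work out the last day of the end month.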

Replacing values in a row based on conditions

I'm trying to fill down a column based on two conditions: if the index (a time series) falls between sunrise and sunset, I want 1 in a new column called "sunlight"; otherwise, I want the value to be zero. I'm new to pandas, coming from Excel, so I'm trying to do this as I would there, probably wrongly.
df['sunlight'] = 0
mask1 = df.index > df['sunrise']
mask2 = df.index < df['sunset']
df[mask1 & mask2]
df.loc[df[mask1 & mask2],'sunlight'] = 1
df
Index     sunrise   sunset    Sunlight
08:18:00  08:19:17  15:56:43  0
08:19:00  08:19:17  15:56:43  0
08:20:00  08:19:17  15:56:43  1
08:21:00  08:19:17  15:56:43  1
08:22:00  08:19:17  15:56:43  1
Let's look at a DataFrame with only one day of data at a frequency of one hour (not minutes) as an example.
import pandas as pd

df = pd.DataFrame({'sunrise': [pd.to_datetime('2020-01-01 08:19:17')] * 24,
                   'sunset': [pd.to_datetime('2020-01-01 15:46:43')] * 24},
                  index=pd.date_range('2020-01-01 00:00:00', '2020-01-01 23:00:00', freq='H')
                  )
If you now cast the truth values to integers, you can multiply both conditions in one step; the product is 1 only where both are true.
df['sunlight'] = (df['sunrise'] < df.index).astype(int) * (df.index < df['sunset']).astype(int)
The output then looks like this (excerpt):
                     sunrise              sunset               sunlight
2020-01-01 07:00:00  2020-01-01 08:19:17  2020-01-01 15:46:43  0
2020-01-01 08:00:00  2020-01-01 08:19:17  2020-01-01 15:46:43  0
2020-01-01 09:00:00  2020-01-01 08:19:17  2020-01-01 15:46:43  1
2020-01-01 10:00:00  2020-01-01 08:19:17  2020-01-01 15:46:43  1

Get quarter start/end dates for more than a year (start year to current year)

I've been trying to get start and end dates range for each quarter given a specific date/year, like this:
SELECT DATEADD(mm, (quarter - 1) * 3, year_date) StartDate,
DATEADD(dd, 0, DATEADD(mm, quarter * 3, year_date)) EndDate
--quarter QuarterNo
FROM
(
SELECT '2012-01-01' year_date
) s CROSS JOIN
(
SELECT 1 quarter UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4
) q
which produces the following output:
2012-01-01 00:00:00 2012-04-01 00:00:00
2012-04-01 00:00:00 2012-07-01 00:00:00
2012-07-01 00:00:00 2012-10-01 00:00:00
2012-10-01 00:00:00 2013-01-01 00:00:00
Problem: I need to do this for a given start_date and end_date, where end_date is the current day. How can I achieve this:
2012-01-01 00:00:00 2012-04-01 00:00:00
2012-04-01 00:00:00 2012-07-01 00:00:00
2012-07-01 00:00:00 2012-10-01 00:00:00
2012-10-01 00:00:00 2013-01-01 00:00:00
... ...
2021-01-01 00:00:00 2021-01-06 00:00:00
I think this is what you want to do:
SET startdatevar AS DATEtime = '2020-01-10'
;WITH RECURSIVE cte AS (
SELECT startdatevar AS startdate , DATEADD(QUARTER, 1 , startdatevar) enddate , 1 quarter
UNION ALL
SELECT enddate , CASE WHEN DATEADD(QUARTER, 1 , enddate) > CURRENT_DATE() THEN GETDATE() ELSE DATEADD(QUARTER, 1 , enddate) END enddate, quarter + 1
FROM cte
WHERE
cte.enddate <= CURRENT_DATE()
and quarter < 4
)
SELECT * FROM cte
To use your code, if you want to have more than 4 quarters:
SET quarter_limit = DATEDIFF(quarter , <startdate>,<enddate>)
;WITH RECURSIVE cte(q, qDate,enddate) as
(
select 1,
DATEFROMPARTS(year('2012-01-01'::date), 1, 1) -- First quarter date
,time_slice('2012-01-01'::date, 3, 'MONTH', 'END')
UNION ALL
select q+1,
DATEADD(q, 1, qdate) -- next quarter start date
,time_slice(qdate::date, (q+1)*3, 'MONTH', 'END')
from cte
where q < quarter_limit -- limiting the number of next quarters
AND cte.endDate <= <enddate>
)
SELECT * FROM cte
After @eshirvana's answer, I came up with this slight change:
WITH RECURSIVE cte(q, qDate,enddate) as
(
select 1,
DATEFROMPARTS(year('2012-01-01'::date), 1, 1) -- First quarter date
,time_slice('2012-01-01'::date, 3, 'MONTH', 'END')
UNION ALL
select q+1,
DATEADD(q, 1, qdate) -- next quarter start date
,time_slice(qdate::date, (q+1)*3, 'MONTH', 'END')
from cte
where q <4 -- limiting the number of next quarters
AND cte.endDate <= CURRENT_DATE()
)
SELECT * FROM cte
Which works fine for whatever year I pass there (2012 will produce 4 records, 2021 just one, since we're still on the first quarter right now).
[EDIT]: it still doesn't work as expected after your 2nd code suggestion:
WITH RECURSIVE cte(q, qDate,enddate) as
(
select 1,
DATEFROMPARTS(year('2012-01-01'::date), 1, 1) -- First quarter date
,CASE WHEN time_slice('2012-01-01'::date, 3, 'MONTH', 'END') > CURRENT_DATE
THEN current_date
ELSE time_slice('2012-01-01'::date, 3, 'MONTH', 'END')
END
UNION ALL
select q+1,
DATEADD(q, 1, qdate) -- next quarter start date
,time_slice(qdate::date, (q+1)*3, 'MONTH', 'END')
from cte
where q < DATEDIFF(quarter , '2012-01-01'::date,'2021-01-06'::date)
AND cte.endDate <= '2021-01-06'::date
)
SELECT * FROM cte
Sorry @eshirvana, it doesn't work as expected. It all goes well up to a point, but it does not return all the records. Instead, it produces fewer records, some of them wrong, like this:
1 2012-01-01 2012-04-01
2 2012-04-01 2012-07-01
3 2012-07-01 2012-10-01
4 2012-10-01 2013-01-01
5 2013-01-01 2013-10-01
6 2013-04-01 2013-07-01
7 2013-07-01 2013-10-01
8 2013-10-01 2014-01-01
9 2014-01-01 2015-01-01
10 2014-04-01 2015-01-01
11 2014-07-01 2016-10-01
12 2014-10-01 2015-01-01
13 2015-01-01 2015-07-01
14 2015-04-01 2015-07-01
15 2015-07-01 2018-10-01
16 2015-10-01 2018-01-01
17 2016-01-01 2016-10-01
18 2016-04-01 2019-07-01
19 2016-07-01 2017-07-01
20 2016-10-01 2020-01-01
21 2017-01-01 2017-04-01
22 2017-04-01 2019-07-01
23 2017-07-01 2021-10-01
Although my logic is still not OK for printing just the Q1 dates for 2021, could these output issues be related to the date format or something?
Now it seems to be working, at least for 2012-01-01 until today (2021-01-06).
The code :
WITH RECURSIVE cte(q, qDate,enddate) as
(
select
-- it might not be the first quarter, so better to protect that:
quarter('2012-01-01'::date)::numeric
, DATEFROMPARTS(year('2012-01-01'::date), 1, 1) -- First quarter date
, CASE WHEN time_slice('2012-01-01'::date, 3, 'MONTH', 'END') > '2021-01-06'::date
THEN '2021-01-06'::date
ELSE time_slice('2012-01-01'::date, 3, 'MONTH', 'END')
END
UNION ALL
select q+1
, DATEADD(q, 1, qdate) -- next quarter start date
,CASE WHEN time_slice(DATEADD(q, 1, qdate), 3, 'MONTH', 'END')> '2021-01-06'::date
THEN '2021-01-06'::date
ELSE time_slice(DATEADD(q, 1, qdate), 3, 'MONTH', 'END')
END
from cte
where q <= DATEDIFF(quarter , '2012-01-01'::date,'2021-01-06'::date)
AND cte.endDate <= '2021-01-06'::date
)
SELECT * FROM cte
The output:
1 2012-01-01 2012-04-01
2 2012-04-01 2012-07-01
3 2012-07-01 2012-10-01
4 2012-10-01 2013-01-01
5 2013-01-01 2013-04-01
6 2013-04-01 2013-07-01
7 2013-07-01 2013-10-01
8 2013-10-01 2014-01-01
9 2014-01-01 2014-04-01
10 2014-04-01 2014-07-01
11 2014-07-01 2014-10-01
12 2014-10-01 2015-01-01
13 2015-01-01 2015-04-01
14 2015-04-01 2015-07-01
15 2015-07-01 2015-10-01
16 2015-10-01 2016-01-01
17 2016-01-01 2016-04-01
18 2016-04-01 2016-07-01
19 2016-07-01 2016-10-01
20 2016-10-01 2017-01-01
21 2017-01-01 2017-04-01
22 2017-04-01 2017-07-01
23 2017-07-01 2017-10-01
24 2017-10-01 2018-01-01
25 2018-01-01 2018-04-01
26 2018-04-01 2018-07-01
27 2018-07-01 2018-10-01
28 2018-10-01 2019-01-01
29 2019-01-01 2019-04-01
30 2019-04-01 2019-07-01
31 2019-07-01 2019-10-01
32 2019-10-01 2020-01-01
33 2020-01-01 2020-04-01
34 2020-04-01 2020-07-01
35 2020-07-01 2020-10-01
36 2020-10-01 2021-01-01
37 2021-01-01 2021-01-06
In case you're wondering: yes, the idea is to present the end date as the last day of the quarter's final month plus one day. But it could easily be adapted.
It's not pretty, but I think it's fairly easy to understand.
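To double-check the expected rows independently of the database, the same quarter ranges can be sketched in plain Python: start at the first day of the quarter containing the start date, step three months at a time, and clamp the last end date to the given end date. The function name quarter_ranges is made up for this illustration, and the clamping rule is an assumption based on the output shown above.

```python
from datetime import date


def quarter_ranges(start: date, end: date):
    """Return (quarter_start, quarter_end) pairs from the quarter containing
    `start` up to `end`; the last quarter_end is clamped to `end`."""
    ranges = []
    # First day of the quarter that contains `start` (months 1, 4, 7 or 10)
    q_start = date(start.year, 3 * ((start.month - 1) // 3) + 1, 1)
    while q_start < end:
        month = q_start.month + 3
        if month > 12:
            next_start = date(q_start.year + 1, month - 12, 1)
        else:
            next_start = date(q_start.year, month, 1)
        ranges.append((q_start, min(next_start, end)))
        q_start = next_start
    return ranges


ranges = quarter_ranges(date(2012, 1, 1), date(2021, 1, 6))
for number, (q_start, q_end) in enumerate(ranges, start=1):
    print(number, q_start, q_end)
```

For 2012-01-01 to 2021-01-06 this yields 37 rows, the last one being 2021-01-01 to 2021-01-06, which agrees with the working query's output.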