How to get time difference in months in SQL - sql

Here is how my current dataset is formatted:
USER START_DATE END_DATE NB_MONTHS
--------------------------------------------
111 2020-01-01 2021-02-01 13
222 2020-05-17 2020-09-28 16
333 2020-02-01 2020-03-01 0
Each of my users has a start date and an end date for an action they've completed.
I wish to find the duration of their action in MONTHS (as defined by the NB_MONTHS flag).
Here is my current query to get this NB_MONTHS flag:
SELECT
USERS,
FLOOR((END_DATE)-(START_DATE))/30.00 as NB_MONTHS
FROM
TABLE1;
I am currently rounding this flag down, as that is what makes the most sense for my analysis.
Here is where I run into an issue:
My user 333, who technically took 1 month to complete the action (the whole of February), is currently being flagged as "0 months" because February has fewer than 30 days (29 in 2020), which doesn't work with the division by 30 in my query.
Does anyone know how I can avoid this problem?

Does datediff() do what you want?
SELECT USERS,
DATEDIFF(MONTH, START_DATE, END_DATE) as NB_MONTHS
FROM TABLE1;
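To make the behaviour easier to reason about, here is a minimal Python sketch of the two possible readings of "difference in months". It assumes the SQL Server flavour of DATEDIFF, which counts the month boundaries crossed and ignores the day of the month. The function names month_boundaries and full_months are made up for this illustration, and both assume the end date is not before the start date.

```python
from datetime import date

def month_boundaries(start: date, end: date) -> int:
    """Count month boundaries crossed between start and end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def full_months(start: date, end: date) -> int:
    """Count completed months: drop one month if the end day is before the start day."""
    months = month_boundaries(start, end)
    if end.day < start.day:
        months -= 1
    return months

# The three users from the question
print(month_boundaries(date(2020, 1, 1), date(2021, 2, 1)))
print(month_boundaries(date(2020, 5, 17), date(2020, 9, 28)))
print(month_boundaries(date(2020, 2, 1), date(2020, 3, 1)))
```

Note that the boundary count treats 2020-01-31 to 2020-02-01 as 1 month, while full_months returns 0 for it. If you need strict rounding down, the second variant may be closer to what you want.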

Related

SQL Query - Identifying entries between payment dates greater than 6 years

I have this table (in reality it has more fields but for simplicity, it will demonstrate what I'm after)
Payment_Type  Person ID  Payment_date  Payment_Amount
Normal        1          2015-01-01    £1.00
Normal        1          2017-01-01    £2.00
Reversal      1          2022-01-09    £3.00
Normal        2          2016-12-29    £3.00
Reversal      2          2022-01-02    £4.00
I need 2 specific things from this:
I need all entries where there is over 6 years' difference between any given payment dates (when it's been greater than or equal to 6 years from the date of the latest payment). I don't need to count them; I just need the query to return all the entries that meet this criterion.
I also need it to flag cases where a normal payment hasn't been made for 6 years or more from today's date, but a reversal has nevertheless occurred within the last 6 years. (This might need to be a separate query, but I'm open to suggestions.)
I'm using Data Lake (Hue).
Thank you.
I've tried running a subquery with a join and a union, but I'm not getting the desired results, so I will need to start from scratch. Any advice or insight on this is greatly appreciated.
Ideally, query one will show:
Payment_Type  Person ID  Payment_date  Payment_Amount
Normal        1          2015-01-01    £1.00
Normal        1          2017-01-01    £2.00
Normal        2          2016-12-29    £3.00
Query 2 results should show:
Payment_Type  Person ID  Payment_date  Payment_Amount
Normal        1          2017-01-01    £2.00
Reversal      1          2022-01-09    £3.00
Normal        2          2016-12-29    £3.00
Reversal      2          2022-01-02    £4.00

Calculating difference (or deltas) between current and previous row with clickhouse

It would be awesome if there was a way to index rows during a query.
Is there a way to SELECT (compute) the difference of a single column between consecutive rows?
Let's say, something like the following query
SELECT
toStartOfDay(stamp) AS day,
count(day) AS events,
day[current] - day[previous] AS difference, -- how do I calculate this
day[current] / day[previous] AS percent -- and this
FROM records
GROUP BY day
ORDER BY day
I want to get the integer and percentage difference between the current row's 'events' column and the previous one for something similar to this:
day                  events  difference  percent
2022-01-06 00:00:00  197     NULL        NULL
2022-01-07 00:00:00  656     459         3.32
2022-01-08 00:00:00  15      -641        0.02
2022-01-09 00:00:00  7       -8          0.46
2022-01-10 00:00:00  137     130         19.5
My version of ClickHouse doesn't support window functions but, while looking into the LAG() function mentioned in the comments, I found neighbor(), which works perfectly for what I'm trying to do:
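SELECT
day,
events,
(events - neighbor(events, -1)) AS diff,
(events / neighbor(events, -1)) AS perc
FROM
(
-- neighbor() follows the row order inside a data block, so sort in a subquery first
SELECT
toStartOfDay(stamp) AS day,
count(day) AS events
FROM records
GROUP BY day
ORDER BY day
)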
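For anyone who wants to check the numbers outside the database, the same "current row versus previous row" logic can be sketched in plain Python. This is only an illustration of the calculation, not ClickHouse code. The function name with_deltas is made up, and it uses None for the first row to mirror the NULLs in the desired output.

```python
def with_deltas(events):
    """Return one (difference, ratio) pair per row, compared with the previous row."""
    rows = []
    previous = None
    for current in events:
        if previous is None:
            # First row: there is nothing to compare against
            rows.append((None, None))
        else:
            # Guard against dividing by a zero count on the previous day
            ratio = current / previous if previous != 0 else None
            rows.append((current - previous, ratio))
        previous = current
    return rows

# Daily event counts from the example table, already sorted by day
for diff, ratio in with_deltas([197, 656, 15, 7, 137]):
    print(diff, ratio)
```

The input has to be sorted by day before it goes in, which is the same ordering requirement the SQL version has.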

Function for business hours in seconds based on calendar table

I'm fairly new to this forum and to T-SQL.
I'm looking for a function to calculate business hours in seconds based on my calendar table. My calendar table has 2 columns: the 1st column is the date and opening time, and the 2nd column is the date and closing time.
I tried the solution from @Ezlo in "SQL Server counting hours between dates excluding Fri 6pm - Mon 6am".
In his solution, when the start and end fall on the same date, the time is doubled: for example, where the output should be 75 secs, it is 150 secs. I want to be able to call the function in a query like WorkTime (@StartDate DATETIME, @FinishDate DATETIME) and have it go through my calendar table. The start date and finish date should accept any value I pass in, e.g. columns (datecreated, dateclosed) containing dates.
E.g. a query with 1000 rows in this format:
Scenario 1
TicketID: 111111
DateCreated: 2019-01-01 10:00:52
DateClosed: 2019-01-02 08:35:00
Function result has to be 300 secs after checking my calendar table.
Scenario 2
TicketID: 111112
DateCreated: 2019-01-02 16:30:00
DateClosed: 2019-01-02 16:15:00
Function result has to be 900 secs after checking my calendar table.
Scenario 3
TicketID: 111113
DateCreated: 2019-01-02 20:00:00
DateClosed: 2019-01-03 09:30:00
Function result: 3600 secs
Scenario 4
TicketID: 111114
DateCreated: 2019-01-05 20:00:00
DateClosed: 2019-01-07 09:00:00
Function result: 1800 secs
Calendar table
As you can see, I have e.g. the 1st of January set to 08:30 so that no time is counted for it (holiday). I have a set of holidays set up the same way.
Weekends are left out of the calendar table; that way they are excluded and the time starts counting on the first business day.
I have tried multiple times, but with no success in getting it to work the way I want.
Hopefully you gurus can help me achieve this.
After days of searching this forum, I found the answer I was looking for here:
Calculate time difference (only working hours) in minutes between two dates
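For reference, the underlying idea (sum the overlap between the ticket interval and every calendar row) can be sketched in Python as below. This is not the T-SQL from the linked answer. The calendar rows are hypothetical since my real table isn't shown here, and the 17:00 closing time in particular is an assumption. The name work_seconds is also made up.

```python
from datetime import datetime

# Hypothetical calendar rows (opening, closing); the 17:00 closing time is an assumption.
CALENDAR = [
    (datetime(2019, 1, 1, 8, 30), datetime(2019, 1, 1, 8, 30)),  # holiday: zero-length day
    (datetime(2019, 1, 2, 8, 30), datetime(2019, 1, 2, 17, 0)),
    (datetime(2019, 1, 3, 8, 30), datetime(2019, 1, 3, 17, 0)),
    (datetime(2019, 1, 4, 8, 30), datetime(2019, 1, 4, 17, 0)),
    # 5 and 6 January are a weekend, so they have no rows at all
    (datetime(2019, 1, 7, 8, 30), datetime(2019, 1, 7, 17, 0)),
]

def work_seconds(start, finish, calendar=CALENDAR):
    """Sum the seconds of [start, finish] that fall inside any calendar row."""
    total = 0
    for opening, closing in calendar:
        overlap_start = max(start, opening)
        overlap_end = min(finish, closing)
        # Only count rows that actually overlap the ticket interval
        if overlap_end > overlap_start:
            total += int((overlap_end - overlap_start).total_seconds())
    return total

# Scenario 1 from the question
print(work_seconds(datetime(2019, 1, 1, 10, 0, 52), datetime(2019, 1, 2, 8, 35)))
```

Because each calendar row is clipped against the ticket interval exactly once, a ticket opened and closed on the same day is not counted twice.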

Check if date range fall within a WW

I have a question about comparing date ranges.
I have a table that holds a state log of a machine. The state can be 0 or 1. In addition, I have the date when each machine state started and when it ended:
START_DATE | END_DATE | STATE
For example:
START_DATE | END_DATE | STATE
2019-05-28 07:12:43 2019-05-29 09:12:43 1
2019-05-29 09:12:43 2019-06-01 08:12:43 0
2019-06-11 10:12:43 2019-06-12 16:12:43 1
2019-06-12 16:12:43 2019-06-12 17:12:43 0
I want to make a report that iterates through each WW (work week) and checks what the average state was in that WW.
My problem is that a state could have started in WW22 and ended in WW24, so when I GROUP BY WW I get no values for WW23, because no state started or ended in WW23. But in WW23 the machine was in state 0, because that state started in WW22, ended in WW24, and stayed 0 the whole time.
It seems that I can't use GROUP BY WW to solve this.
I may need to check START_DATE and END_DATE in cases where there are no records in WW23, and add something like:
CASE WHEN WW BETWEEN START_DATE AND END_DATE THEN...
But I'm not sure how to loop over the WWs without using GROUP BY.
I use Oracle SQL.
Thanks.
I hope I understood correctly. It would help if you showed us your query and told us how you compute the average state and where these weeks come from. Anyway, here is a query which generates all the weeks of 2019 and joins them with your log:
select to_char(wsd, 'iw') week, wsd, start_date, end_date, state
from (
select trunc(date '2019-01-01', 'iw') + level * 7 - 7 wsd
from dual
connect by trunc(date '2019-01-01', 'iw') + level * 7 <= date '2020-01-01')
left join log on wsd < end_date and start_date < wsd + 7
The interesting range is this one:
week week_start_date log_start log_end state
21 2019-05-20
22 2019-05-27 2019-05-28 07:12:43 2019-05-29 09:12:43 1
22 2019-05-27 2019-05-29 09:12:43 2019-06-01 08:12:43 0
23 2019-06-03
24 2019-06-10 2019-06-11 10:12:43 2019-06-12 16:12:43 1
24 2019-06-10 2019-06-12 16:12:43 2019-06-12 17:12:43 0
25 2019-06-17
I don't know how you compute the average state for weeks 22 and 24. Maybe it is an average weighted by the subtracted times, maybe something else. But that's not important; now you have a row for week 23, with a missing state.
If this means that the previous value is still valid for this week, use:
nvl(state, lag(state) over (order by wsd))
or
coalesce(state, lag(state) over (order by wsd), 0)
when you want 0 as the default value if the previous week(s) are missing too. If two or more consecutive weeks may be missing, add ignore nulls to lag.
Then you can group the data by week and compute the average values.
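The same approach (generate every week, left join on overlap, then carry the last known state forward) can be sketched in Python. This is only an illustration using the sample rows from the question. The names week_starts and weekly_rows are made up, and the fill corresponds to the "ignore nulls" variant with 0 as the default.

```python
from datetime import date, datetime, timedelta

# Sample log rows from the question: (start_date, end_date, state)
LOG = [
    (datetime(2019, 5, 28, 7, 12, 43), datetime(2019, 5, 29, 9, 12, 43), 1),
    (datetime(2019, 5, 29, 9, 12, 43), datetime(2019, 6, 1, 8, 12, 43), 0),
    (datetime(2019, 6, 11, 10, 12, 43), datetime(2019, 6, 12, 16, 12, 43), 1),
    (datetime(2019, 6, 12, 16, 12, 43), datetime(2019, 6, 12, 17, 12, 43), 0),
]

def week_starts(first, last):
    """Yield the Monday of every ISO week from first to last."""
    monday = first - timedelta(days=first.weekday())
    while monday <= last:
        yield monday
        monday += timedelta(days=7)

def weekly_rows(log, first, last):
    """Return (week_start, state, was_filled) rows, one per overlapping log row or gap week."""
    rows = []
    last_state = None
    for monday in week_starts(first, last):
        wsd = datetime.combine(monday, datetime.min.time())
        # Same overlap test as the join: wsd < end_date and start_date < wsd + 7
        hits = [state for start, end, state in log
                if wsd < end and start < wsd + timedelta(days=7)]
        if hits:
            for state in hits:
                rows.append((monday, state, False))
                last_state = state
        else:
            # Gap week: reuse the last known state, or 0 if there is none yet
            rows.append((monday, 0 if last_state is None else last_state, True))
    return rows

for row in weekly_rows(LOG, date(2019, 5, 20), date(2019, 6, 17)):
    print(row)
```

With the sample data, the week starting 2019-06-03 (week 23) gets a filled row with state 0, which can then be averaged together with the real rows.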

GROUP BY several hours

I have a table where our product records its activity log. The product starts working at 23:00 every day and usually works for one or two hours. This means that once a batch has started at 23:00, it finishes at about 1:00am the next day.
Now I need statistics on how many posts are registered per batch, but I cannot figure out a script that would let me achieve this. So far I have the following SQL code:
SELECT COUNT(*), DATEPART(DAY,registrationtime),DATEPART(HOUR,registrationtime)
FROM RegistrationMessageLogEntry
WHERE registrationtime > '2014-09-01 20:00'
GROUP BY DATEPART(DAY, registrationtime), DATEPART(HOUR,registrationtime)
ORDER BY DATEPART(DAY, registrationtime), DATEPART(HOUR,registrationtime)
which results in the following:
count day hour
....
1189 9 23
8611 10 0
2754 10 23
6462 11 0
1885 11 23
I.e. I want the number for 9th 23:00 grouped with the number for 10th 00:00, 10th 23:00 with 11th 00:00 and so on. How could I do it?
You can do it very easily. Use DATEADD to add an hour to the original registrationtime. That way, all the registrationtimes of one batch are moved to the same day, and you can simply group by the date part.
You could also do it in a more complicated way using CASE WHEN, but that's overkill in view of this easy solution.
I had to do something similar a few days ago. I had fixed timespans for work shifts to group by where one of them could start on one day at 10pm and end the next morning at 6am.
What I did was:
Define a "shift date" for every entry in the table, which was simply the day (with a zeroed time part) on which the shift started. I did this by checking whether the timestamp of the entry was between 0am and 6am. In that case I took only the date part of DATEADD(dd, -1, entryDate), which gives the previous day for all entries between 0am and 6am.
I also added an ID for the shift: 0 for the first one (6am to 2pm), 1 for the second one (2pm to 10pm) and 2 for the last one (10pm to 6am).
I was then able to group by the shift date and shift ID.
Example:
Consider the following source entries:
Timestamp SomeData
=============================
2014-09-01 06:01:00 5
2014-09-01 14:01:00 6
2014-09-02 02:00:00 7
Step one extended the table as follows:
Timestamp SomeData ShiftDay
====================================================
2014-09-01 06:01:00 5 2014-09-01 00:00:00
2014-09-01 14:01:00 6 2014-09-01 00:00:00
2014-09-02 02:00:00 7 2014-09-01 00:00:00
Step two extended the table as follows:
Timestamp SomeData ShiftDay ShiftID
==============================================================
2014-09-01 06:01:00 5 2014-09-01 00:00:00 0
2014-09-01 14:01:00 6 2014-09-01 00:00:00 1
2014-09-02 02:00:00 7 2014-09-01 00:00:00 2
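The two steps above can be sketched in Python like this. It is a rough illustration rather than the SQL I actually used, and the function name shift_key is made up for the example.

```python
from datetime import datetime, date, timedelta

def shift_key(ts):
    """Return (shift_date, shift_id) for a timestamp."""
    if ts.hour < 6:
        # Between 0am and 6am: still the night shift that started the previous day
        return (ts.date() - timedelta(days=1), 2)
    if ts.hour < 14:
        return (ts.date(), 0)  # 6am to 2pm
    if ts.hour < 22:
        return (ts.date(), 1)  # 2pm to 10pm
    return (ts.date(), 2)      # 10pm to midnight: night shift starting today

# The three source entries from the example
for ts in (datetime(2014, 9, 1, 6, 1), datetime(2014, 9, 1, 14, 1), datetime(2014, 9, 2, 2, 0)):
    print(ts, shift_key(ts))
```

Grouping by the returned pair keeps an overnight shift together even though it spans two calendar days.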
If you add one hour to registrationtime, you will be able to group by the date part:
GROUP BY
CAST(DATEADD(HOUR, 1, registrationtime) AS date)
If the starting hour must be reflected accurately in the output (as 9, 23, 10, 23 rather than as 10, 0, 11, 0), you could obtain it as MIN(registrationtime) in the SELECT clause:
SELECT
count = COUNT(*),
day = DATEPART(DAY, MIN(registrationtime)),
hour = DATEPART(HOUR, MIN(registrationtime))
Finally, in case you are not aware, you can reference columns by their aliases in ORDER BY:
ORDER BY
day,
hour
just so that you do not have to repeat the expressions.
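To see why the one-hour shift works, here is a small Python sketch of the same grouping. It is an illustration only: the function name batch_stats and the sample timestamps are made up, and it assumes a batch never runs past 23:00 of the following day.

```python
from datetime import datetime, date, timedelta

def batch_stats(timestamps):
    """Group by the date of (timestamp + 1 hour); keep the count and earliest timestamp."""
    stats = {}
    for ts in timestamps:
        # 23:xx moves to 00:xx of the next day, so it lands with that night's 00:xx rows
        key = (ts + timedelta(hours=1)).date()
        count, first = stats.get(key, (0, ts))
        stats[key] = (count + 1, min(first, ts))
    return stats

# Hypothetical registration times for two consecutive batches
sample = [
    datetime(2014, 9, 9, 23, 10),
    datetime(2014, 9, 10, 0, 20),
    datetime(2014, 9, 10, 23, 5),
    datetime(2014, 9, 11, 0, 40),
]
for key, (count, first) in sorted(batch_stats(sample).items()):
    print(key, count, first)
```

The earliest timestamp per group plays the role of MIN(registrationtime), so the output can still show the real starting day and hour of each batch.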
The query below will give you what you are expecting:
;WITH CTE AS
(
SELECT COUNT(*) Count, DATEPART(DAY,registrationtime) Day,DATEPART(HOUR,registrationtime) Hour,
RANK() over (partition by DATEPART(HOUR,registrationtime) order by DATEPART(DAY,registrationtime),DATEPART(HOUR,registrationtime)) Batch_ID
FROM RegistrationMessageLogEntry
WHERE registrationtime > '2014-09-01 20:00'
GROUP BY DATEPART(DAY, registrationtime), DATEPART(HOUR,registrationtime)
)
SELECT SUM(COUNT) Count,Batch_ID
FROM CTE
GROUP BY Batch_ID
ORDER BY Batch_ID
You can write CASE expressions as below:
CASE WHEN DATEPART(HOUR,registrationtime) = 23
THEN DATEPART(DAY,registrationtime)+1
ELSE DATEPART(DAY,registrationtime)
END,
CASE WHEN DATEPART(HOUR,registrationtime) = 23
THEN 0
ELSE DATEPART(HOUR,registrationtime)
END