Subtracting a date from another date - sql

I am trying to subtract one date from another but having issues:
SELECT MIN(date) AS first_day,
MAX(date) AS last_date,
((MAX(date)) - (MIN(date))) AS total_days
FROM dates;
Could someone please clarify the number format of the number it is returning below?
+------------+
| total_days |
+------------+
| 29001900 |
I have tried using DATEDIFF but this rounds the days to the nearest whole number and I need to carry out further calculations with the data. The rounding means my solutions are a little off.
In the version of the DB I am using, DATEDIFF() only takes two parameters, so as far as I'm aware it always works in days. I get an error if I try to use hours.

SELECT DATEDIFF
(
(SELECT MAX(date) FROM dates),
(SELECT MIN(date) FROM dates)
)
AS total_days;
should do the trick. Note that each subquery needs its own parentheses.

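I'm assuming your RDBMS is MySQL.
If so, the number you got is neither days nor seconds. When you subtract two DATETIME values with -, MySQL first converts each one to a number of the form YYYYMMDDhhmmss and then subtracts those numbers.
Subtracting two DATE values works the same way on YYYYMMDD numbers, which is why the tests below show 100 for two dates that are 30 days apart.
There's more than DATEDIFF to work with. For example, TIMESTAMPDIFF(SECOND, min, max) returns the elapsed seconds, which you can divide by 86400 to get fractional days.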
Test data
create table mytest (
id int auto_increment primary key,
date_col date not null,
datetime_col datetime not null
);
insert into mytest(date_col, datetime_col) values
('2021-06-16', '2021-06-16 14:15:30')
,('2021-07-16', '2021-07-16 19:06:15');
Test using dates
select
min(date_col) as min_date
, max(date_col) as max_date
, max(date_col) - min(date_col) as subtracted
, datediff(max(date_col), min(date_col)) as days
from mytest
min_date | max_date | subtracted | days
:--------- | :--------- | ---------: | ---:
2021-06-16 | 2021-07-16 | 100 | 30
Test using datetimes
select
min(datetime_col) as min_date
, max(datetime_col) as max_date
, max(datetime_col) - min(datetime_col) as seconds
, datediff(max(datetime_col), min(datetime_col)) as days
from mytest
min_date | max_date | seconds | days
:------------------ | :------------------ | --------: | ---:
2021-06-16 14:15:30 | 2021-07-16 19:06:15 | 100049085 | 30
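To see where the value 100049085 in the table above comes from, here is a minimal Python sketch. It assumes the subtraction is performed on the two datetimes read as plain YYYYMMDDhhmmss integers; the helper name as_number is hypothetical and only illustrates that reading. It also computes the real elapsed time for comparison.

```python
from datetime import datetime

# The two datetime_col values from the test data above.
start = datetime(2021, 6, 16, 14, 15, 30)
end = datetime(2021, 7, 16, 19, 6, 15)


def as_number(dt):
    """Read a datetime as the integer YYYYMMDDhhmmss (hypothetical helper)."""
    return int(dt.strftime("%Y%m%d%H%M%S"))


# Difference of the two integers: this is not a duration.
numeric_diff = as_number(end) - as_number(start)

# Real elapsed time between the two values.
elapsed_seconds = int((end - start).total_seconds())
fractional_days = elapsed_seconds / 86400

print(numeric_diff)               # 100049085
print(elapsed_seconds)            # 2609445
print(round(fractional_days, 4))  # 30.2019
```

The integer difference matches the subtracted column exactly, while the true gap is about 30.2 days, which is the kind of fractional value the question asks for.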
Using sec_to_time and extract (838:59:59 is the largest value the TIME type can hold, so the result below is clamped)
select seconds
, sec_to_time(seconds) as tm
, extract(day from sec_to_time(seconds)) as days
from (
select
max(datetime_col) - min(datetime_col) as seconds
from mytest
) q
seconds | tm | days
--------: | :-------- | ---:
100049085 | 838:59:59 | 30

Related

Querying the retention rate on multiple days with SQL

Given a simple data model that consists of a user table and a check_in table with a date field, I want to calculate the retention rate of my users. For example, for all users with one or more check-ins, I want the percentage of users who checked in on their 2nd day, on their 3rd day, and so on.
My SQL skills are pretty basic as it's not a tool that I use that often in my day-to-day work, and I know that this is beyond the types of queries I am used to. I've been looking into pivot tables to achieve this but I am unsure if this is the correct path.
Edit:
The user table does not have a registration date. One can assume it only contains the ID for this example.
Here is some sample data for the check_in table:
| user_id | date |
=====================================
| 1 | 2020-09-02 13:00:00 |
-------------------------------------
| 4 | 2020-09-04 12:00:00 |
-------------------------------------
| 1 | 2020-09-04 13:00:00 |
-------------------------------------
| 4 | 2020-09-04 11:00:00 |
-------------------------------------
| ... |
-------------------------------------
And the expected output of the query would be something like this:
| day_0 | day_1 | day_2 | day_3 |
=================================
| 70% | 67% | 44% | 32% |
---------------------------------
Please note that I've used random numbers for this output just to illustrate the format.
Oh, I see. Assuming you mean days between checkins for users -- and users might have none -- then just use aggregation and window functions:
select sum( (ci.date = ci.min_date)::int::numeric ) / u.num_users as day_0,
sum( (ci.date = ci.min_date + interval '1 day')::int::numeric ) / u.num_users as day_1,
sum( (ci.date = ci.min_date + interval '2 day')::int::numeric ) / u.num_users as day_2
from (select u.*, count(*) over () as num_users
from users u
) u left join
(select ci.user_id, ci.date::date as date,
min(min(ci.date::date)) over (partition by ci.user_id) as min_date
from checkins ci
group by ci.user_id, ci.date::date
) ci
on ci.user_id = u.id -- assumes the key column of the users table is named id
group by u.num_users;
Note that this aggregates the checkins table by user id and date. This ensures that there is only one row per date.
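The same idea can be sketched outside the database. The Python below is only an illustration under stated assumptions: the function name retention, the max_day parameter, and the user count of 2 are hypothetical, and the input is the four sample check-ins from the question reduced to dates. It first collapses check-ins to one entry per user and day, then counts how many users checked in N days after their first check-in.

```python
from collections import defaultdict
from datetime import date


def retention(checkins, num_users, max_day):
    """Return the fraction of users who checked in on day 0..max_day.

    checkins is an iterable of (user_id, date) pairs; duplicates on the
    same day are collapsed, mirroring the GROUP BY user and date.
    """
    days_by_user = defaultdict(set)
    for user_id, day in checkins:
        days_by_user[user_id].add(day)

    counts = [0] * (max_day + 1)
    for days in days_by_user.values():
        first = min(days)  # the user's first check-in is day 0
        for day in days:
            offset = (day - first).days
            if offset <= max_day:
                counts[offset] += 1
    return [count / num_users for count in counts]


# Sample rows from the question, truncated to dates.
sample = [
    (1, date(2020, 9, 2)),
    (4, date(2020, 9, 4)),
    (1, date(2020, 9, 4)),
    (4, date(2020, 9, 4)),
]

# num_users=2 is an assumption for this sketch only.
rates = retention(sample, num_users=2, max_day=2)
print(rates)  # [1.0, 0.0, 0.5]
```

Dividing by the total number of users, rather than by the number of users with check-ins, matches the left join in the query: users without check-ins lower every percentage.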

SQL (BigQuery) Grouping Runtime Per Day

I have the following data which I want to group into seconds per day in BigQuery.
Source Table:
+--------------+---------------------+---------------------+
| ComputerName | StartDatetime | EndDatetime |
+--------------+---------------------+---------------------+
| Computer1 | 2020-06-10T21:01:28 | 2020-06-10T21:20:19 |
+--------------+---------------------+---------------------+
| Computer1 | 2020-06-10T22:54:01 | 2020-06-11T05:21:48 |
+--------------+---------------------+---------------------+
| Computer2 | 2020-06-08T09:11:54 | 2020-06-10T11:36:27 |
+--------------+---------------------+---------------------+
I want to be able to visualise the data in the following way
+------------+--------------+------------------+
| Date | ComputerName | Runtime(Seconds) |
+------------+--------------+------------------+
| 2020-06-10 | Computer1 | 5089 |
+------------+--------------+------------------+
| 2020-06-11 | Computer1 | 19308 |
+------------+--------------+------------------+
| 2020-06-08 | Computer2 | 53285 |
+------------+--------------+------------------+
| 2020-06-09 | Computer2 | 86400 |
+------------+--------------+------------------+
| 2020-06-10 | Computer2 | 41787 |
+------------+--------------+------------------+
I am not too sure of the way I should approach this. Some input would be greatly appreciated.
This is an interval overlap problem. You can solve this by splitting each time period into separate days and then looking at the overlap for each day:
with t as (
select 'Computer1' as computername, datetime '2020-06-10T21:01:28' as startdatetime, datetime '2020-06-10T21:20:19' as enddatetime union all
select 'Computer1' as computername, datetime '2020-06-10T22:54:01' as startdatetime, datetime '2020-06-11T05:21:48' as enddatetime union all
select 'Computer2' as computername, datetime '2020-06-08T09:11:54' as startdatetime, datetime '2020-06-10T11:36:27' as enddatetime
)
select dte, t.computername,
sum(case when enddatetime >= dte and
startdatetime < date_add(dte, interval 1 day)
then datetime_diff(least(date_add(dte, interval 1 day), enddatetime),
greatest(dte, startdatetime),
second)
end) as runtime_seconds
from (select t.*,
generate_date_array(date(t.startdatetime), date(t.enddatetime), interval 1 day) gda
from t
) t cross join
unnest(gda) dte
group by dte, t.computername;
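As a cross-check of the interval-overlap idea, here is a small Python sketch that does the same split by calendar day. The function name runtime_per_day is hypothetical; the rows are the three sample rows from the question. For each row it walks day by day and adds the overlap between the row's interval and that day.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def runtime_per_day(rows):
    """Sum, per (date, computer), the seconds each interval spends in that day."""
    totals = defaultdict(int)
    for name, start, end in rows:
        # Midnight at the beginning of the day the interval starts in.
        day_start = datetime.combine(start.date(), datetime.min.time())
        while day_start < end:
            day_end = day_start + timedelta(days=1)
            # Overlap of [start, end) with [day_start, day_end).
            overlap = min(day_end, end) - max(day_start, start)
            totals[(day_start.date(), name)] += int(overlap.total_seconds())
            day_start = day_end
    return dict(totals)


rows = [
    ("Computer1", datetime(2020, 6, 10, 21, 1, 28), datetime(2020, 6, 10, 21, 20, 19)),
    ("Computer1", datetime(2020, 6, 10, 22, 54, 1), datetime(2020, 6, 11, 5, 21, 48)),
    ("Computer2", datetime(2020, 6, 8, 9, 11, 54), datetime(2020, 6, 10, 11, 36, 27)),
]

for (day, name), seconds in sorted(runtime_per_day(rows).items()):
    print(day, name, seconds)
```

This sketch measures each day up to midnight, so it returns 5090 and 53286 for the first day of each computer. The question's expected output lists 5089 and 53285, one second less, presumably because it counts up to 23:59:59.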
Below is for BigQuery Standard SQL
#standardSQL
select Date, ComputerName,
sum(datetime_diff(
least(datetime (Date + 1), EndDatetime),
greatest(datetime(Date), StartDatetime),
second
)) as Runtime_Seconds
from `project.dataset.table`,
unnest(generate_date_array(date(StartDatetime), date(EndDatetime))) Date
group by Date, ComputerName
Applied to the sample data in your question, it looks like this:
#standardSQL
with `project.dataset.table` as (
select 'Computer1' ComputerName, datetime '2020-06-10T21:01:28' StartDatetime, datetime '2020-06-10T21:20:19' EndDatetime union all
select 'Computer1', '2020-06-10T22:54:01', '2020-06-11T05:21:48' union all
select 'Computer2', '2020-06-08T09:11:54', '2020-06-10T11:36:27'
)
select Date, ComputerName,
sum(datetime_diff(
least(datetime (Date + 1), EndDatetime),
greatest(datetime(Date), StartDatetime),
second
)) as Runtime_Seconds
from `project.dataset.table`,
unnest(generate_date_array(date(StartDatetime), date(EndDatetime))) Date
group by Date, ComputerName
Another option for BigQuery Standard SQL
A straightforward, slightly silly, and almost logic-free option is to simply count the seconds that fall within each day. It still looks like a valid option to me.
#standardSQL
select Date, ComputerName,
countif(second >= timestamp(StartDatetime) and second < timestamp(EndDatetime)) as Runtime_Seconds
from `project.dataset.table`,
unnest(generate_date_array(date(StartDatetime), date(EndDatetime))) Date,
unnest(generate_timestamp_array(timestamp(Date + 1), timestamp(Date), interval -1 second)) second with offset
where offset > 0
group by Date, ComputerName

Is there way to add date difference values we get to the date automatically?

I have two dates and use DATEDIFF to get the difference between them. For example, I have a planned start date and an actual start date, and the difference between them is 5 days. Now I want to add those days to the finish date.
If the finish date is later than I assumed, I want to add that difference to find the next finish date, and the upcoming dates after it, because we are running behind.
Sum (DATEDIFF(day, sa.PlannedStartDate, sa.ActualStartDate)) OVER
(Partition
By ts.Id)as TotalVariance,
Case when (Sum (DATEDIFF(day, sa.PlannedStartDate, sa.ActualStartDate))
OVER
(Partition By ts.Id) >30) then 'Positive' end as Violation,
DATEADD (day, DATEDIFF(day, sa.PlannedStartDate, sa.ActualStartDate))as
Summar violations,
Suppose activity 1 has a planned start date of 8/21/2019 but an actual start date of 9/21/2019. In this case we are 31 days behind.
The next activity will now be delayed as well, so I want to add this difference to the next activity.
If the second activity's planned start date was 8/26/2019, its start date will change because of the delay in activity 1, and I want to find that new date.
+------------+------------------+-----------------+----------+---------------------+
| Activity   | PlannedStartDate | ActualStartDate | Variance | NewPlannedStartDate |
+------------+------------------+-----------------+----------+---------------------+
| Activity 1 | 8/21/2019        | 9/21/2019       | 31       |                     |
| Activity 2 | 8/26/2019        | null            |          | 9/26/2019           |
+------------+------------------+-----------------+----------+---------------------+
Here's an example you can run in SSMS:
-- CREATE ACTIVITY TABLE AND ADD SOME DATA --
DECLARE @Activity TABLE ( ActivityId INT, PlannedStart DATE, ActualStart DATE );
INSERT INTO @Activity (
ActivityId, PlannedStart, ActualStart
)
VALUES
( 1, '08/21/2019', '08/27/2019' ), ( 1, '08/26/2019', NULL ), ( 1, '09/14/2019', NULL );
Query @Activity to see what's in it:
SELECT * FROM @Activity ORDER BY ActivityId, PlannedStart;
@Activity content:
+------------+--------------+-------------+
| ActivityId | PlannedStart | ActualStart |
+------------+--------------+-------------+
| 1 | 2019-08-21 | 2019-08-27 |
| 1 | 2019-08-26 | NULL |
| 1 | 2019-09-14 | NULL |
+------------+--------------+-------------+
Query @Activity to work out the new starting dates:
-- THE ACTIVITY TO REPORT ON; RUN EVERYTHING IN ONE BATCH SO THE TABLE VARIABLE STAYS IN SCOPE --
DECLARE @ActivityId INT = 1;
;WITH Activity_CTE AS (
SELECT
ROW_NUMBER() OVER ( ORDER BY PlannedStart ) AS Id,
ActivityId, PlannedStart, ActualStart, DATEDIFF( dd, PlannedStart, ActualStart ) Delayed
FROM @Activity
WHERE
ActivityId = @ActivityId
)
SELECT
ActivityId,
PlannedStart,
ActualStart,
DATEADD( dd, Delays.DaysDelayed, PlannedStart ) AS NewStart
FROM Activity_CTE AS Activity
OUTER APPLY (
SELECT CASE
WHEN ( Delayed IS NOT NULL ) THEN Delayed
ELSE ISNULL( ( SELECT TOP 1 Delayed FROM Activity_CTE WHERE Id < Activity.Id AND Delayed IS NOT NULL ORDER BY Id DESC ), 0 )
END AS DaysDelayed
) AS Delays
ORDER BY
PlannedStart;
Returns
+------------+--------------+-------------+------------+
| ActivityId | PlannedStart | ActualStart | NewStart |
+------------+--------------+-------------+------------+
| 1 | 2019-08-21 | 2019-08-27 | 2019-08-27 |
| 1 | 2019-08-26 | NULL | 2019-09-01 |
| 1 | 2019-09-14 | NULL | 2019-09-20 |
+------------+--------------+-------------+------------+
The real "magic" here is this line:
ELSE ISNULL( ( SELECT TOP 1 Delayed FROM Activity_CTE WHERE Id < Activity.Id AND Delayed IS NOT NULL ORDER BY Id DESC ), 0 )
It checks for any record prior to the current one that has a delay. If none is found, it returns 0. This value is then added, in days, to the PlannedStart date to determine the NewStart date. The ORDER BY is of particular note too: sorting in DESC order ensures we get the "closest" delay prior to the current row.
Using a CTE in this way also takes into account the idea that the delay may not happen on the very first record (e.g., say the 08/26 planned was delayed instead of 08/21). It conveniently gives us a subtable to query against in our OUTER APPLY.
This is what you would see if you included all columns on the CTE's SELECT:
+----+------------+--------------+-------------+---------+-------------+
| Id | ActivityId | PlannedStart | ActualStart | Delayed | DaysDelayed |
+----+------------+--------------+-------------+---------+-------------+
| 1 | 1 | 2019-08-21 | 2019-08-27 | 6 | 6 |
| 2 | 1 | 2019-08-26 | NULL | NULL | 6 |
| 3 | 1 | 2019-09-14 | NULL | NULL | 6 |
+----+------------+--------------+-------------+---------+-------------+
Because the very first record is the only record with a delay, its delay of 6 days persists through each of the following records.
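The carry-forward behaviour described above can be sketched in a few lines of Python. This is an illustration only, not a replacement for the query: the function name new_starts is hypothetical, and the input is assumed to be already sorted by planned start date, as the ROW_NUMBER() ordering does in the CTE.

```python
from datetime import date, timedelta


def new_starts(activities):
    """Shift each planned start by the most recent known delay.

    activities is a list of (planned_start, actual_start_or_None) pairs,
    sorted by planned start. A row with an actual start sets the delay;
    rows without one inherit the closest earlier delay, or 0 if none.
    """
    result = []
    delay = 0
    for planned, actual in activities:
        if actual is not None:
            delay = (actual - planned).days
        result.append(planned + timedelta(days=delay))
    return result


activities = [
    (date(2019, 8, 21), date(2019, 8, 27)),
    (date(2019, 8, 26), None),
    (date(2019, 9, 14), None),
]

for shifted in new_starts(activities):
    print(shifted)
# 2019-08-27
# 2019-09-01
# 2019-09-20
```

Keeping a running delay variable plays the role of the TOP 1 ... ORDER BY Id DESC subquery: both pick the closest earlier row that has a delay.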

PostgreSQL query group by two "parameters"

I've been trying to figure out the following PostgreSQL query with no success for two days now.
Let's say I have the following table:
| date | value |
-------------------------
| 2018-05-11 | 0.20 |
| 2018-05-11 | -0.12 |
| 2018-05-11 | 0.15 |
| 2018-05-10 | -1.20 |
| 2018-05-10 | -0.70 |
| 2018-05-10 | -0.16 |
| 2018-05-10 | 0.07 |
And I need to find out the query to count positive and negative values per day:
| date | positives | negatives |
------------------------------------------
| 2018-05-11 | 2 | 1 |
| 2018-05-10 | 1 | 3 |
I've been able to figure out the query to extract only positives or negatives, but not both at the same time:
SELECT to_char(table.date, 'DD/MM') AS date,
COUNT(*) AS negative
FROM table
WHERE table.date >= DATE(NOW() - '20 days' :: INTERVAL) AND
value < '0'
GROUP BY to_char(date, 'DD/MM'), table.date
ORDER BY table.date DESC;
Can someone please assist? This is driving me mad. Thank you.
Use a FILTER clause with the aggregate function.
SELECT to_char(table.date, 'DD/MM') AS date,
COUNT(*) FILTER (WHERE value < 0) AS negative,
COUNT(*) FILTER (WHERE value > 0) AS positive
FROM table
WHERE table.date >= DATE(NOW() - '20 days'::INTERVAL)
GROUP BY 1, DATE(table.date)
ORDER BY DATE(table.date) DESC;
I would simply do:
select date_trunc('day', t.date) as dte,
sum( (value < 0)::int ) as negatives,
sum( (value > 0)::int ) as positives
from t
where t.date >= current_date - interval '20 days'
group by date_trunc('day', t.date)
order by dte desc;
Notes:
I prefer using date_trunc() to casting to a string for removing the time component.
You don't need to use now() and convert to a date. You can just use current_date.
Converting a string to an interval seems awkward, when you can specify an interval using the interval keyword.
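To make the conditional-count idea concrete, here is a small Python sketch over the sample rows from the question. The function name count_signs is hypothetical. It keeps one counter per sign and increments only the one whose condition holds, which is what both the FILTER clause and the sum over a boolean cast do.

```python
from collections import Counter
from datetime import date


def count_signs(rows):
    """Return {day: (positives, negatives)} with the newest day first."""
    positives = Counter()
    negatives = Counter()
    for day, value in rows:
        if value > 0:
            positives[day] += 1
        elif value < 0:
            negatives[day] += 1
        # Zero values are counted in neither column, as in the queries above.
    days = sorted({day for day, _ in rows}, reverse=True)
    return {day: (positives[day], negatives[day]) for day in days}


rows = [
    (date(2018, 5, 11), 0.20),
    (date(2018, 5, 11), -0.12),
    (date(2018, 5, 11), 0.15),
    (date(2018, 5, 10), -1.20),
    (date(2018, 5, 10), -0.70),
    (date(2018, 5, 10), -0.16),
    (date(2018, 5, 10), 0.07),
]

for day, (pos, neg) in count_signs(rows).items():
    print(day, pos, neg)
# 2018-05-11 2 1
# 2018-05-10 1 3
```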

Is it possible to group by day, month or year with timestamp values?

I have a table ORDERS(idOrder, idProduct, Qty, OrderDate), where OrderDate is a varchar column holding timestamp values. Is it possible to get the Qty for each day, week, month or year?
The table looks like this :
------------------------------------------
| idOrder | idProduct | Qty | OrderDate  |
------------------------------------------
| 1       | 5         | 20  | 1504011790 |
| 2       | 5         | 50  | 1504015790 |
| 3       | 5         | 60  | 1504611790 |
| 4       | 5         | 90  | 1504911790 |
------------------------------------------
And I want something like this:
--------------------------------
| idProduct | Qty | OrderDate  |
--------------------------------
| 5         | 70  | 08/29/2017 |
| 5         | 60  | 09/05/2017 |
| 5         | 90  | 09/08/2017 |
--------------------------------
It looks like you want to do two things here. First, group by your idProduct and OrderDate:
select idProduct, sum(Qty), OrderDate from [yourtable] group by idProduct, OrderDate
This will get you the sums that you want. Next, you want to convert time formats. I assume that your stamps are in Epoch time (number of seconds from Jan 1, 1970) so converting them takes the form:
dateadd(s,[your time field],'19700101')
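It also looks like you want your dates formatted as mm/dd/yyyy.
convert(NVARCHAR, [date], 101) is the format for accomplishing that.
Together, grouping by the converted date rather than the raw timestamp so that orders from the same day are summed:
select idProduct, sum(Qty), convert(NVARCHAR, dateadd(s, OrderDate, '19700101'), 101)
from [yourtable]
group by idProduct, convert(NVARCHAR, dateadd(s, OrderDate, '19700101'), 101)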
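The conversion and grouping can be checked with a short Python sketch over the four sample orders. It assumes, as the answer does, that the values are Unix epoch seconds, and it additionally assumes they should be read as UTC; the function name qty_per_day is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def qty_per_day(orders):
    """Sum Qty per (idProduct, mm/dd/yyyy), reading OrderDate as epoch seconds in UTC."""
    totals = defaultdict(int)
    for _id_order, id_product, qty, order_date in orders:
        # OrderDate is stored as text, so convert it to an integer first.
        moment = datetime.fromtimestamp(int(order_date), tz=timezone.utc)
        totals[(id_product, moment.strftime("%m/%d/%Y"))] += qty
    return dict(totals)


orders = [
    (1, 5, 20, "1504011790"),
    (2, 5, 50, "1504015790"),
    (3, 5, 60, "1504611790"),
    (4, 5, 90, "1504911790"),
]

for (id_product, day), qty in sorted(qty_per_day(orders).items()):
    print(id_product, qty, day)
# 5 70 08/29/2017
# 5 60 09/05/2017
# 5 90 09/08/2017
```

The first two orders fall on the same day and are summed to 70 only because the grouping key is the converted date, not the raw timestamp.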
Unfortunately, the TSQL TIMESTAMP data type isn't really a date. According to this SO question they're even changing the name because it's such a misnomer. You're much better off creating a DATETIME field with a DEFAULT = GETDATE() to keep an accurate record of when a line was created.
That being said, the most performant way I've seen to track dates down to the day/week/month/quarter/etc. is to use a date dimension table that lists every date and has fields like WeekOfMonth and DayOfYear. Once you join your new DateCreated field to it, you can get all sorts of information about that date. You can google scripts that will create a date dimension table for you.
Yes, it's very simple. In Oracle, use TRUNC:
TRUNC ( date [, format ] )
Format can be:
TRUNC(TO_DATE('22-AUG-03'), 'YEAR')
Result: '01-JAN-03'
TRUNC(TO_DATE('22-AUG-03'), 'MONTH')
Result: '01-AUG-03'
TRUNC(TO_DATE('22-AUG-03'), 'DDD')
Result: '22-AUG-03'
TRUNC(TO_DATE('22-AUG-03'), 'DAY')
Result: '17-AUG-03'
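For readers who want to check these results without a database, here is a rough Python equivalent of the four format units shown above. The function name trunc is hypothetical, and the 'DAY' case assumes the week starts on Sunday, which matches the 17-AUG-03 result above but may differ with other territory settings.

```python
from datetime import date, timedelta


def trunc(d, unit):
    """Truncate a date to the start of its year, month, day, or week."""
    if unit == "YEAR":
        return d.replace(month=1, day=1)
    if unit == "MONTH":
        return d.replace(day=1)
    if unit == "DDD":
        return d  # a plain date has no time part left to drop
    if unit == "DAY":
        # weekday() is 0 for Monday and 6 for Sunday; step back to Sunday.
        return d - timedelta(days=(d.weekday() + 1) % 7)
    raise ValueError(f"unsupported unit: {unit}")


sample = date(2003, 8, 22)
for unit in ("YEAR", "MONTH", "DDD", "DAY"):
    print(unit, trunc(sample, unit))
# YEAR 2003-01-01
# MONTH 2003-08-01
# DDD 2003-08-22
# DAY 2003-08-17
```

Grouping rows by the truncated value then gives totals per year, month, day, or week.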