Sum of item count in an SQL query based on DATE_TRUNC - sql

I've got a table which contains event status data, similar to this:
ID Time Status
------ -------------------------- ------
357920 2019-12-25 09:31:38.854764 1
362247 2020-01-02 09:31:42.498483 1
362248 2020-01-02 09:31:46.166916 1
362249 2020-01-02 09:31:47.430933 1
362300 2020-01-03 09:31:46.932333 1
362301 2020-01-03 09:31:47.231288 1
I'd like to construct a query which returns the number of successful events each day, so:
Time Count
-------------------------- -----
2019-12-25 00:00:00.000000 1
2020-01-02 00:00:00.000000 3
2020-01-03 00:00:00.000000 2
I've stumbled across this SO answer to a similar question, but the answer there is for all the data returned by the query, whereas I need the sum grouped by date range.
Also, I cannot use BETWEEN to select a specific date range, since this query is for a Grafana dashboard, and the date range is determined by the dashboard's UI. I'm using Postgres for the SQL dialect, in case that matters.

You need to remove the time component from the timestamp. In most databases, you can do this by converting to a date:
select cast(time as date) as dte,
sum(case when status = 1 then 1 else 0 end) as num_successful
from t
group by cast(time as date)
order by dte;
This assumes that 1 means "successful".
The cast() does not work in all databases. Other alternatives are things like trunc(time), date_trunc('day', time), date_trunc(time, day) -- and no doubt many others.
In Postgres, I would phrase this as:
select date_trunc('day', time) as dte,
count(*) filter (where status = 1) as num_successful
from t
group by dte
order by dte;
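For anyone who wants to try the grouping without a Postgres instance, here is a minimal sketch that runs the generic conditional-sum form against an in-memory SQLite database. SQLite's date() stands in for the cast / date_trunc('day', ...) call, the table name t is the placeholder used above, and the last row (a failed event) is made up so that the status filter has something to exclude.

```python
import sqlite3

# Rows copied from the question, plus one made-up failed event (status 0)
# to show that it is not counted.
rows = [
    (357920, "2019-12-25 09:31:38.854764", 1),
    (362247, "2020-01-02 09:31:42.498483", 1),
    (362248, "2020-01-02 09:31:46.166916", 1),
    (362249, "2020-01-02 09:31:47.430933", 1),
    (362300, "2020-01-03 09:31:46.932333", 1),
    (362301, "2020-01-03 09:31:47.231288", 1),
    (999999, "2020-01-03 10:00:00.000000", 0),  # hypothetical failed event
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, time TEXT, status INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# date(time) drops the time of day, so all events of one day share a key.
daily = conn.execute(
    """
    SELECT date(time) AS dte,
           SUM(CASE WHEN status = 1 THEN 1 ELSE 0 END) AS num_successful
    FROM t
    GROUP BY date(time)
    ORDER BY dte
    """
).fetchall()

for dte, num_successful in daily:
    print(dte, num_successful)
```

With the sample data this gives 1, 3 and 2 successful events for the three days, matching the output asked for in the question.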

How about like this:
-- Note: sum(status) equals the number of successful events only if status is always 0 or 1
SELECT date(Time), sum(status)
FROM t
GROUP BY date(Time)
ORDER BY min(Time)

Related

T-SQL filtering records based on dates and time difference with other records

I have a table for which I have to perform a rather complex filter: first a filter by date is applied, but then records from the previous and next days should be included if the time difference to their previous or next record (depending on whether the date is earlier or later than the filter date) does not exceed 8 hours.
For those adjacent days the selection should stop at the first record that does not satisfy this condition.
This is what my raw data looks like:
Id Desc         EntryDate
-- ------------ -----------------------
1  Event type 1 2021-03-12 21:55:00.000
2  Event type 1 2021-03-12 01:10:00.000
3  Event type 1 2021-03-11 20:17:00.000
4  Event type 1 2021-03-11 05:04:00.000
5  Event type 1 2021-03-10 23:58:00.000
6  Event type 1 2021-03-10 11:01:00.000
7  Event type 1 2021-03-10 10:00:00.000
In this example set, if my filter date is '2021-03-11', my expected result set should be all records from that day plus adjacent records from 03-12 and 03-10 that satisfy the 8-hour condition. Note how the record with Id 7 is not included because the record with Id 6 does not comply:
Id EntryDate
-- -----------------------
2  2021-03-12 01:10:00.000
3  2021-03-11 20:17:00.000
4  2021-03-11 05:04:00.000
5  2021-03-10 23:58:00.000
I need advice on how to write this complex query.
This is a variant of gaps-and-islands. Define the difference . . . and then define groups based on the differences:
with e as (
      -- grp increases by 1 whenever the gap to the previous row is 8 hours or more
      select t.*,
             sum(case when prev_entrydate > dateadd(hour, -8, entrydate) then 0 else 1 end) over (order by entrydate) as grp
      from (select t.*,
                   lag(entrydate) over (order by entrydate) as prev_entrydate
            from t
           ) t
     )
select e.*
from e
-- keep every group that has at least one row on the filter date
where e.grp in (select e2.grp
                from e e2
                where cast(e2.entrydate as date) = @filterdate
               );
Note: I'm not sure exactly how filter date is applied. This assumes that it is any events on the entire day, which means that there might be multiple groups. If there is only one group (say the first group on the day), the query can be simplified a bit from a performance perspective.
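To check the grouping logic against the sample data without a SQL Server instance, here is a plain-Python sketch of the same idea. It is not a translation of the query plan, just the same rule: start a new group whenever the gap to the previous row is 8 hours or more, then keep every group that touches the filter date. The function name islands_for_day is made up for this example.

```python
from datetime import date, datetime, timedelta

# Sample rows (Id, EntryDate) copied from the question.
rows = [
    (1, datetime(2021, 3, 12, 21, 55)),
    (2, datetime(2021, 3, 12, 1, 10)),
    (3, datetime(2021, 3, 11, 20, 17)),
    (4, datetime(2021, 3, 11, 5, 4)),
    (5, datetime(2021, 3, 10, 23, 58)),
    (6, datetime(2021, 3, 10, 11, 1)),
    (7, datetime(2021, 3, 10, 10, 0)),
]


def islands_for_day(rows, filter_date, max_gap=timedelta(hours=8)):
    """Return the ids of every island that has at least one row on filter_date."""
    ordered = sorted(rows, key=lambda r: r[1])
    grouped = []  # (grp, id, entry_date)
    grp = 0
    prev = None
    for row_id, entry in ordered:
        # Same test as the CASE expression: a new island starts when there is
        # no previous row or the gap to it is 8 hours or more.
        if prev is None or entry - prev >= max_gap:
            grp += 1
        grouped.append((grp, row_id, entry))
        prev = entry
    # Groups that contain at least one row on the filter date.
    wanted = {g for g, _, entry in grouped if entry.date() == filter_date}
    return sorted(row_id for g, row_id, _ in grouped if g in wanted)


selected = islands_for_day(rows, date(2021, 3, 11))
print(selected)
```

On the sample data the two records from 03-11 fall into different groups, and the rows selected are Ids 2, 3, 4 and 5, which is the result set the question asks for.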
declare @DateTime datetime = '2021-03-11'
select *
from t
where t.EntryDate between DATEADD(hour , -8 , @DateTime) and DATEADD(hour , 32 , @DateTime)

PostgreSQL: how to build a table with an ever-increasing daily total?

Assuming this data:
ID DATE
44 2019-12-31
45 2020-01-01
46 2020-01-01
47 2020-01-02
48 2020-01-03
48 2020-01-03
48 2020-01-03
How do I make a query that returns something like this?
TOTAL DATE
2 2020-01-01
3 2020-01-02
6 2020-01-03
I want all entries after a certain date, but with a counter that starts with the number of entries on the first day, then for every day, it adds the number of entries on that day. I want to plot them on a chart that shows the speed of the growth.
Is this possible? I'm on PostgreSQL 10.
Thanks!
You can use aggregation and window functions:
select date,
count(*) as count_on_date,
sum(count(*)) over (order by min(date)) as running_count
from t
where date >= '2020-01-01'
group by date
order by date;
If you wanted a count going back in time, then you would use:
select greatest(date, '2020-01-01'::date) as date,
count(*) as count_on_date,
sum(count(*)) over (order by min(date)) as running_count
from t
group by greatest(date, '2020-01-01'::date)
order by greatest(date, '2020-01-01'::date);
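The difference between the two queries is easy to miss, so here is a small plain-Python sketch of both, assuming the DATE values from the question. The helper running_totals is a made-up name; it counts rows per day and accumulates the counts in date order, which is what the window sum over the grouped counts does.

```python
from collections import Counter
from datetime import date
from itertools import accumulate

# DATE column from the question (the ID column is not needed here).
dates = [
    date(2019, 12, 31),
    date(2020, 1, 1), date(2020, 1, 1),
    date(2020, 1, 2),
    date(2020, 1, 3), date(2020, 1, 3), date(2020, 1, 3),
]
start = date(2020, 1, 1)


def running_totals(days):
    """Count rows per day, then accumulate the counts in date order."""
    per_day = sorted(Counter(days).items())
    totals = accumulate(count for _, count in per_day)
    return [(day, total) for (day, _), total in zip(per_day, totals)]


# Variant 1: WHERE date >= start, so earlier rows are dropped entirely.
filtered = running_totals(d for d in dates if d >= start)

# Variant 2: greatest(date, start), so earlier rows are folded into the first day.
clamped = running_totals(max(d, start) for d in dates)

print(filtered)
print(clamped)
```

The first variant yields totals of 2, 3, 6, as in the question. The second yields 3, 4, 7, because the 2019-12-31 entry is counted on the first reported day.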

Is there a method to write a SQL query that returns cumulative results based on the count of another column?

I have a query where I am counting the total number of new users signed up to a particular service each day since the service started.
So far I have:
SELECT DISTINCT CONVERT(DATE, Account_Created) AS Date_Created,
COUNT(ID) OVER (PARTITION BY CONVERT(DATE, Account_Created)) AS New_Users
FROM My_Table.dbo.NewAccts
ORDER BY Date_Created
This returns:
Date_Created | New_Users
--------------------------
2020-01-01 1
2020-01-03 3
2020-01-04 2
2020-01-06 5
2020-01-07 9
What I would like is to return a third column with a cumulative total for each day starting from the beginning until the present. So the first day there was only one new user. On January 3rd, three new users signed up for a total of four since the beginning--so on and so forth.
Date_Created | New_Users | Cumulative_Tot
------------------------------------------
2020-01-01 1 1
2020-01-03 3 4
2020-01-04 2 6
2020-01-06 5 11
2020-01-07 9 20
My thought process was to involve the ROW_NUMBER() function so that I can separate and add each consecutive row together, though I am not sure if that is correct. My feeling is that I am probably thinking about this too hard and the logic is simply just escaping me at the moment. Thank you for any help.
As a starter: I would recommend aggregation rather than DISTINCT and a window count. This makes the intent clearer, and is likely more efficient.
Then, you can make use of a window sum to compute the cumulative count.
SELECT
CONVERT(DATE, Account_Created) AS Date_Created,
COUNT(*) AS New_Users,
SUM(COUNT(*)) OVER(ORDER BY CONVERT(DATE, Account_Created)) Cumulative_New_Users
FROM My_Table.dbo.NewAccts
GROUP BY CONVERT(DATE, Account_Created)
ORDER BY Date_Created
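Here is a rough way to check the cumulative column against the numbers in the question, using an in-memory SQLite database (window functions need SQLite 3.25 or later). Several things are assumptions for the sake of the sketch: SQLite's date() replaces CONVERT(DATE, ...), the per-day aggregation is done in a subquery before the window sum rather than nested as SUM(COUNT(*)), and the individual account rows (all created at 12:00:00) are made up to reproduce the daily counts shown above.

```python
import sqlite3

# New-user counts per day, taken from the question's output.
per_day = {
    "2020-01-01": 1,
    "2020-01-03": 3,
    "2020-01-04": 2,
    "2020-01-06": 5,
    "2020-01-07": 9,
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NewAccts (ID INTEGER PRIMARY KEY, Account_Created TEXT)")
# One row per account; the 12:00:00 time of day is made up for illustration.
conn.executemany(
    "INSERT INTO NewAccts (Account_Created) VALUES (?)",
    [(f"{day} 12:00:00",) for day, n in per_day.items() for _ in range(n)],
)

# Step 1 (subquery): aggregate to one row per day.
# Step 2 (outer query): window sum over the daily counts in date order.
result = conn.execute(
    """
    SELECT Date_Created,
           New_Users,
           SUM(New_Users) OVER (ORDER BY Date_Created) AS Cumulative_New_Users
    FROM (SELECT date(Account_Created) AS Date_Created,
                 COUNT(*) AS New_Users
          FROM NewAccts
          GROUP BY date(Account_Created))
    ORDER BY Date_Created
    """
).fetchall()

for row in result:
    print(row)
```

The third column comes out as 1, 4, 6, 11, 20, which matches the Cumulative_Tot column requested in the question.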

SQL Calculating time from last transaction for each ID

Hello I'm stuck trying to calculate the difference in time between each transaction for each ID.
The data looks like
Customer_ID | Transaction_Time
1 00:30
1 00:35
1 00:37
1 00:38
2 00:20
2 00:21
2 00:23
I'm trying to get the result to look something like
Customer_ID | Time_diff
1 5
1 2
1 1
2 1
2 2
I would really appreciate any help.
Thanks
Most databases support the LAG() function. However, the date/time functions can depend on the database. Here is an example for SQL Server:
select t.*
from (select t.*,
datediff(second,
lag(transaction_time) over (partition by customer_id order by transaction_time),
transaction_time
) as diff
from t
) t
where diff is not null;
The logic would be similar in most databases, although the function for calculating the time difference varies.
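As a database-independent illustration of what LAG() with PARTITION BY does here, this is a minimal Python sketch on the sample data. Reading the Transaction_Time values as HH:MM on a single day and reporting the difference in minutes are both assumptions, since the question does not state the column type or unit.

```python
from datetime import datetime
from itertools import groupby

# (Customer_ID, Transaction_Time) from the question. Treating the values as
# HH:MM on a single day is an assumption.
rows = [
    (1, "00:30"), (1, "00:35"), (1, "00:37"), (1, "00:38"),
    (2, "00:20"), (2, "00:21"), (2, "00:23"),
]

# Sort by customer, then by time, as PARTITION BY ... ORDER BY would.
parsed = sorted((cid, datetime.strptime(t, "%H:%M")) for cid, t in rows)

diffs = []
# groupby plays the role of PARTITION BY; pairing each row with its
# predecessor plays the role of LAG().
for cid, group in groupby(parsed, key=lambda r: r[0]):
    times = [t for _, t in group]
    for prev, cur in zip(times, times[1:]):
        minutes = int((cur - prev).total_seconds() // 60)
        diffs.append((cid, minutes))

print(diffs)
```

The first transaction of each customer has no predecessor, so it produces no row, just like the `where diff is not null` filter in the query.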

Assign a counter in SQL Server to records with sequential dates, and only increment when dates not sequential

I am trying to assign a Trip # to records for Customers with sequential days, and increment the Trip ID if they have a break in sequential days, and come later in the month for example. The data structure looks like this:
CustomerID Date
1 2014-01-01
1 2014-01-02
1 2014-01-04
2 2014-01-01
2 2014-01-05
2 2014-01-06
2 2014-01-08
The desired output based upon the above example dataset would be:
CustomerID Date Trip
1 2014-01-01 1
1 2014-01-02 1
1 2014-01-04 2
2 2014-01-01 1
2 2014-01-05 2
2 2014-01-06 2
2 2014-01-08 3
So if the Dates for that Customer are back-to-back, it is considered the same Trip, and has the same Trip #. Is there a way to do this in SQL Server? I am using MSSQL 2012.
My initial thoughts are to use the LAG, ROW_NUMBER, or OVER/PARTITION BY function, or even a Recursive Table Variable Function. I can paste some code, but in all honesty, my code isn't working so far. If this is a simple query, but I am just not thinking about it correctly, that would be great.
Thank you in advance.
Since Date is a DATE (i.e. has no time component), you could, for example, subtract ROW_NUMBER() days from each Date, which gives a constant value for runs of consecutive days, and then apply DENSE_RANK() to that value, something like:
WITH cte AS (
SELECT CustomerID, Date,
DATEADD(DAY,
-ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY Date),
Date) dt
FROM trips
)
SELECT CustomerID, Date,
DENSE_RANK() OVER (PARTITION BY CustomerID ORDER BY dt) AS Trip
FROM cte;
An SQLfiddle to test with.
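If the date-minus-row-number trick looks like magic, this small Python sketch walks through it on the sample data. It assumes each customer has at most one row per date (duplicates would break the trick in the SQL as well); the variable names are only illustrative.

```python
from datetime import date, timedelta
from itertools import groupby

# (CustomerID, Date) rows from the question.
rows = [
    (1, date(2014, 1, 1)), (1, date(2014, 1, 2)), (1, date(2014, 1, 4)),
    (2, date(2014, 1, 1)), (2, date(2014, 1, 5)), (2, date(2014, 1, 6)),
    (2, date(2014, 1, 8)),
]

trips = []
for cid, group in groupby(sorted(rows), key=lambda r: r[0]):
    days = [d for _, d in group]
    # Date minus row number: consecutive days collapse to the same anchor date.
    anchors = [d - timedelta(days=n) for n, d in enumerate(days, start=1)]
    # DENSE_RANK over the anchors: 1 for the smallest distinct anchor, and so on.
    rank = {a: i for i, a in enumerate(sorted(set(anchors)), start=1)}
    trips.extend((cid, d, rank[a]) for d, a in zip(days, anchors))

for cid, d, trip in trips:
    print(cid, d, trip)
```

For customer 2, for example, the anchors are 2013-12-31, 2014-01-03, 2014-01-03 and 2014-01-04, so the trip numbers come out as 1, 2, 2, 3, as in the desired output.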