Count first occurrences in time (SQL)

I have a table like this:
+----+---------------------+
| Id | Date application    |
+----+---------------------+
| 1  | 2016-08-22 03:05:06 |
| 2  | 2016-08-22 03:05:06 |
| 1  | 2016-08-23 03:05:06 |
| 2  | 2016-08-23 03:05:06 |
+----+---------------------+
I would like to find out when the first application was for each user (Id), and then count how many of those first applications occurred in the past 7 days.
So far, here is what I have:
SELECT id,
       MIN(date_of_application)
FROM mytable
GROUP BY id
ORDER BY MIN(date_of_application) ASC
Will MIN() work on dates?
From there, how do I count how many first applications there are in the past 7 days?

Please tag your database. MIN() will work on dates.
Assuming yours is a MySQL db, here is what you can do to get the application usage count in the past 7 days from now:
select
    id, count(*) as 'appUsageCount'
from
    mytable
where
    date_of_application >= DATE(DATE_SUB(NOW(), INTERVAL 7 DAY))
    and date_of_application <= DATE(NOW())
group by id

@Neeraj: using your query with a little modification.
Try this:
select
    id, count(id) as 'appUsageCount', min(date_of_application)
from
    mytable
where
    date_of_application >= DATE(DATE_SUB(NOW(), INTERVAL 7 DAY))
    and date_of_application <= DATE(NOW())
group by id
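Note that both queries above count all applications in the window, not just each user's first one. To count only first applications in the past 7 days, one option (a minimal sketch, assuming MySQL and the date_of_application column used above) is to derive each user's first application and then filter on it:
-- Sketch: number of users whose *first* application falls in the past 7 days.
-- Assumes MySQL and a column named date_of_application.
SELECT COUNT(*) AS firstAppsLast7Days
FROM (
    SELECT id, MIN(date_of_application) AS first_application
    FROM mytable
    GROUP BY id
) AS firsts
WHERE first_application >= DATE_SUB(NOW(), INTERVAL 7 DAY)
  AND first_application <= NOW();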

Related

Running sum of unique users in redshift

I have a table as follows, with user visits by day -
| date     | user_id |
|:---------|:--------|
| 01/31/23 | a       |
| 01/31/23 | a       |
| 01/31/23 | b       |
| 01/30/23 | c       |
| 01/30/23 | a       |
| 01/29/23 | c       |
| 01/28/23 | d       |
| 01/28/23 | e       |
| 01/01/23 | a       |
| 12/31/22 | c       |
I am looking to get a running count of unique user_id values over the last 30 days. Here is the expected output -
| date     | distinct_users |
|:---------|:---------------|
| 01/31/23 | 5              |
| 01/30/23 | 4              |
.
.
.
Here is the query I tried -
SELECT date
, SUM(COUNT(DISTINCT user_id)) over (order by date rows between 30 preceding and current row) AS unique_users
FROM mytable
GROUP BY date
ORDER BY date DESC
The problem I am running into is that this query is not counting unique user_id values: for instance, the result I am getting for 01/31/23 is 9 instead of 5, as it counts user_id 'a' every time it occurs.
Thank you, appreciate your help!
Not the most performant approach, but you could use a correlated subquery to find the distinct count of users over a window of the past 30 days:
SELECT DISTINCT
    t1.date,
    (SELECT COUNT(DISTINCT t2.user_id)
     FROM mytable t2
     WHERE t2.date BETWEEN t1.date - INTERVAL '30 day' AND t1.date) AS distinct_users
FROM mytable t1
ORDER BY t1.date;
There are a few things going on here. First, window functions run after GROUP BY and aggregation, so COUNT(DISTINCT user_id) gives the count of user_ids for each date and only then does the window function run. Also, a window frame set up like this works over the past 30 rows, not the past 30 days, so you would need to fill in missing dates to use it.
As to how to do this, I can only think of the "expand the data so each date and user_id has a row" method. This requires a CTE to generate the last 2 years of dates plus 30 days, so that the look-back window works for the earliest dates. Then window over the past 30 days for each user_id and date to see which rows have an occurrence of that user_id within the past 30 days, setting the value to NULL if there are none in the window. Then count the non-NULL user_ids, grouping by just date, to get the number of unique user_ids for that date.
This means expanding the data significantly, but I see no other way to get truly unique user_ids over the past 30 days. I can help code this up if you need, but it will look something like:
WITH RECURSIVE CTE to generate the needed dates,
CTE to cross join these dates with a distinct set of all the user_ids in use for the past 2 years,
CTE to join the date/user_id data set with the table of real data for the past 2 years and 30 days and window back counting non-NULL user_ids, partitioned by date and user_id, ordered by date, setting any zero counts to NULL with a DECODE() or CASE statement,
SELECT, grouping by just date, count the user_ids by date;
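A rough sketch of that outline, assuming Redshift, a table mytable(date, user_id) where date is a DATE column, and that roughly two years of history is enough; the names and look-back lengths here are illustrative only:
-- Illustrative sketch only: fill in missing (date, user_id) rows, then count
-- users seen at least once in the 30 days ending on each date.
WITH RECURSIVE dates (d) AS (
    SELECT DATEADD(day, -760, CURRENT_DATE)::date      -- ~2 years + 30-day look-back
    UNION ALL
    SELECT DATEADD(day, 1, d)::date FROM dates WHERE d < CURRENT_DATE
),
users AS (
    SELECT DISTINCT user_id FROM mytable
),
expanded AS (
    -- one row per (date, user_id); flag dates on which the user actually visited
    SELECT d.d AS date, u.user_id,
           MAX(CASE WHEN m.user_id IS NOT NULL THEN 1 END) AS visited
    FROM dates d
    CROSS JOIN users u
    LEFT JOIN mytable m ON m.date = d.d AND m.user_id = u.user_id
    GROUP BY d.d, u.user_id
),
windowed AS (
    -- was this user seen at least once in the 30 days ending on this date?
    SELECT date, user_id,
           MAX(visited) OVER (PARTITION BY user_id ORDER BY date
                              ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS seen_in_window
    FROM expanded
)
SELECT date, COUNT(seen_in_window) AS distinct_users   -- COUNT ignores NULLs
FROM windowed
GROUP BY date
ORDER BY date DESC;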

How to aggregate based on various conditions

Let's say I have a table which stores ItemID, Date and Total_shipped over a period of time:
ItemID | Date      | Total_shipped
__________________________________
1      | 1/20/2000 | 2
2      | 1/20/2000 | 3
1      | 1/21/2000 | 5
2      | 1/21/2000 | 4
1      | 1/22/2000 | 1
2      | 1/22/2000 | 7
1      | 1/23/2000 | 5
2      | 1/23/2000 | 6
Now I want to aggregate based on several periods of time. For example, I want to know how many of each item were shipped every two days and in total. So the desired output should look something like:
ItemID | Jan20-Jan21 | Jan22-Jan23 | Jan20-Jan23
________________________________________________
1      | 7           | 6           | 13
2      | 7           | 13          | 20
How do I do that in the most efficient way?
I know I can write three different subqueries, but I think there should be a better way. My real data is large and there are several different time periods to consider, i.e. in my real problem I want the shipped items for current_week, last_week, two_weeks_ago, three_weeks_ago, last_month, two_months_ago and three_months_ago, so I do not think writing 7 different subqueries would be a good idea.
Here is the general idea of what I can already run, but it is very expensive for the database:
WITH
sq1 as (
    SELECT ItemID, sum(Total_shipped) sum1
    FROM Table
    WHERE Date BETWEEN '1/20/2000' and '1/21/2000'
    GROUP BY ItemID),
sq2 as (
    SELECT ItemID, sum(Total_Shipped) sum2
    FROM Table
    WHERE Date BETWEEN '1/22/2000' and '1/23/2000'
    GROUP BY ItemID),
sq3 as (
    SELECT ItemID, sum(Total_Shipped) sum3
    FROM Table
    GROUP BY ItemID)
SELECT Table.ItemID, sq1.sum1, sq2.sum2, sq3.sum3
FROM Table
JOIN sq1 on Table.ItemID = sq1.ItemID
JOIN sq2 on Table.ItemID = sq2.ItemID
JOIN sq3 on Table.ItemID = sq3.ItemID
I don't know why you have tagged this question with multiple databases.
Anyway, you can use conditional aggregation as follows in Oracle:
select
    item_id,
    sum(case when "date" between date '2000-01-20' and date '2000-01-21' then total_shipped end) as "Jan20-Jan21",
    sum(case when "date" between date '2000-01-22' and date '2000-01-23' then total_shipped end) as "Jan22-Jan23",
    sum(case when "date" between date '2000-01-20' and date '2000-01-23' then total_shipped end) as "Jan20-Jan23"
from my_table
group by item_id
Cheers!!
Use FILTER:
select
    item_id,
    sum(total_shipped) filter (where date between '2000-01-20' and '2000-01-21') as "Jan20-Jan21",
    sum(total_shipped) filter (where date between '2000-01-22' and '2000-01-23') as "Jan22-Jan23",
    sum(total_shipped) filter (where date between '2000-01-20' and '2000-01-23') as "Jan20-Jan23"
from my_table
group by 1
item_id | Jan20-Jan21 | Jan22-Jan23 | Jan20-Jan23
---------+-------------+-------------+-------------
1 | 7 | 6 | 13
2 | 7 | 13 | 20
(2 rows)
Db<>fiddle.
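Since the real goal involves relative periods (current_week, last_week, last_month, and so on), the same conditional-aggregation pattern extends to ranges computed from the current date. A rough sketch, assuming Oracle and the my_table / total_shipped / "date" names used in the answer above; the period boundaries are illustrative and should be adjusted to your own definitions:
-- Illustrative only: relative periods via conditional aggregation (Oracle syntax assumed).
select
    item_id,
    sum(case when "date" >= trunc(sysdate, 'IW')
             then total_shipped end) as current_week,
    sum(case when "date" >= trunc(sysdate, 'IW') - 7
              and "date" <  trunc(sysdate, 'IW')
             then total_shipped end) as last_week,
    sum(case when "date" >= add_months(trunc(sysdate, 'MM'), -1)
              and "date" <  trunc(sysdate, 'MM')
             then total_shipped end) as last_month
from my_table
group by item_id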

Calculate time span over a number of records

I have a table that has the following schema:
ID | FirstName | Surname | TransmissionID | CaptureDateTime
1  | Billy     | Goat    | ABCDEF         | 2018-09-20 13:45:01.098
2  | Jonny     | Cash    | ABCDEF         | 2018-09-20 13:45:01.108
3  | Sally     | Sue     | ABCDEF         | 2018-09-20 13:45:01.298
4  | Jermaine  | Cole    | PQRSTU         | 2018-09-20 13:45:01.398
5  | Mike      | Smith   | PQRSTU         | 2018-09-20 13:45:01.498
There are well over 70,000 records, and they store logs of transmissions to a web-service. What I'd like to know is: how would I go about writing a script that would select the distinct TransmissionID values and also show the timespan between the earliest CaptureDateTime record and the latest one? Essentially I'd like to see the rate at which the web-service is reading & writing records.
Is it even possible to do so in a single SELECT statement or should I just create a stored procedure or report in code? I don't know where to start aside from SELECT DISTINCT TransmissionID for this sort of query.
Here's what I have so far (I'm stuck on the time calculation)
SELECT DISTINCT [TransmissionID],
COUNT(*) as 'Number of records'
FROM [log_table]
GROUP BY [TransmissionID]
HAVING COUNT(*) > 1
Not sure how to get the difference between the first and last record with the same TransmissionID. I would like to get a result set like:
TransmissionID | TimeToCompletion | Number of records |
ABCDEF         | 2.001            | 5000              |
Simply GROUP BY and use the MIN / MAX functions to find the min/max date in each group and subtract them:
SELECT
TransmissionID,
COUNT(*),
DATEDIFF(second, MIN(CaptureDateTime), MAX(CaptureDateTime))
FROM yourdata
GROUP BY TransmissionID
HAVING COUNT(*) > 1
Use MIN and MAX to calculate the timespan:
SELECT [TransmissionID],
       COUNT(*) as 'Number of records',
       datediff(s, min(CaptureDateTime), max(CaptureDateTime)) as timespan
FROM [log_table]
GROUP BY [TransmissionID]
HAVING COUNT(*) > 1
A method that returns the average time for all transmissionids, even those with only 1 record:
SELECT TransmissionID,
COUNT(*),
DATEDIFF(second, MIN(CaptureDateTime), MAX(CaptureDateTime)) * 1.0 / NULLIF(COUNT(*) - 1, 0)
FROM yourdata
GROUP BY TransmissionID;
Note that you may not actually want the maximum of the capture date for a given transmissionId. You might want the overall maximum in the table -- so you can consider the final period after the most recent record.
If so, this looks like:
SELECT TransmissionID,
COUNT(*),
DATEDIFF(second,
MIN(CaptureDateTime),
MAX(MAX(CaptureDateTime)) OVER ()
) * 1.0 / COUNT(*)
FROM yourdata
GROUP BY TransmissionID;

SQL get the time of different rows

I want to do a select that gives me the time an employee spent resolving a ticket.
The problem is that the ticket is divided into actions, so it's not only getting the time from one row; it can come from n rows.
This is an abbreviation of what I have:
Tickets
TicketID | Days | Hours | Minutes
---------------------------------
12       | 0    | 2     | 32
12       | 1    | 0     | 12
12       | 4    | 6     | 0
13       | 2    | 5     | 12
13       | 0    | 2     | 33
And this is what I want to get:
TicketID | Time (in minutes)
----------------------------
12       | 2994
13       | 1425
(Or just one row, with a WHERE condition specifying the TicketID.)
This is the select that I'm doing right now:
select distinct ((Days*8)*60) + (Hours*60) + Minutes from Tickets where ticketid = 12
But it is not working as I want.
select ticketid, sum((Days*8)*60), sum((Hours*60)), sum (Minutes)
from tickets
group by ticketid
select TicketID, sum((Days*8)*60) + sum(Hours*60) + sum(Minutes) as Time_in_minutes
from Tickets
group by TicketID
Distinct, as you were trying before, takes each row in the source table (Tickets) and filters out all of the duplicate rows. Instead, you are trying to sum up the days, minutes, and hours for each ticket. So sum them up, and group by the ticket number.
Try this:
SELECT TicketID, (Sum(Minutes)+(Sum(Hours)*60)+(sum(Days)*24*60) ) time
FROM Tickets Group by TicketID
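If only a single ticket is needed, as mentioned in the question, the same aggregation can simply be filtered first; a small sketch, keeping the 8-hour working day assumed in the earlier answers and using TicketID 12 as an example:
-- Sketch: total minutes for one ticket, assuming an 8-hour working day as above.
select TicketID,
       sum(Days * 8 * 60) + sum(Hours * 60) + sum(Minutes) as Time_in_minutes
from Tickets
where TicketID = 12
group by TicketID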

SQL Query AVG Date Time In same Table Column

I'm trying to make a query that returns the difference in days so I can get the average number of days in a period of time. This is the situation: I need to get the max date for status 2 and the max date for status 3 of a request, and get how much time the user spent in that period.
So far, this is the query I have right now. I get the max and min and the difference between the days, but they are not the max of status 2 and the max of status 3.
Query I have so far:
SELECT distinct t1.user, t1.Request,
Min(t1.Time) as MinDate,
Max(t1.Time) as MaxDate,
DATEDIFF(day, MIN(t1.Time), MAX(t1.Time))
FROM [Hst_Log] t1
where t1.Request = 146800
GROUP BY t1.Request, t1.user
ORDER BY t1.user, max(t1.Time) desc
Example table:
-------------------------------
user | Request | Status | Time
-------------------------------
User 1 | 2 | 1 | 6/1/15 3:25 PM
User 2 | 1 | 1 | 2/1/15 3:24 PM
User 2 | 3 | 1 | 2/1/15 3:24 PM
User 1 | 4 | 1 | 5/10/15 3:18 PM
User 3 | 3 | 2 | 5/4/15 2:36 PM
User 2 | 2 | 2 | 6/4/15 2:34 PM
User 3 | 2 | 3 | 6/10/15 5:51 PM
User 1 | 1 | 2 | 5/1/15 5:49 PM
User 3 | 4 | 2 | 5/16/15 2:39 PM
User 2 | 4 | 2 | 5/17/15 2:32 PM
User 2 | 3 | 2 | 4/6/15 2:22 PM
User 2 | 3 | 3 | 4/7/15 2:06 PM
-------------------------------
I will appreciate all the help
You'll need to use subqueries since the groups for the min and max times are different. One query will pull the min value where the status is 2. Another will pull the max value where the status is 3.
Something like this:
SELECT MinDt.[User], minDt.MinTime, MaxDt.MaxTime, datediff(d,minDt.MinTime, MaxDt.MaxTime) as TimeSpan
FROM
(SELECT t1.[user], t1.Request,
Min(t1.Time) as MinTime
FROM [Hst_Log] t1
where t1.Request = 146800
and t1.[status] = 2
GROUP BY t1.Request, t1.[user]) MinDt
INNER JOIN
(SELECT t1.[user], t1.Request,
Max(t1.Time) as MaxTime
FROM [Hst_Log] t1
where t1.[status] = 3
GROUP BY t1.Request, t1.[user]) MaxDt
ON MinDt.[User] = MaxDt.[User] and minDt.Request = maxDt.Request
something like this?
(mysql)
SELECT t.*, MAX(t.UFecha), x.*, y.*, MIN(t.UFecha) as MinDate,
       MAX(t.UFecha) as MaxDate,
       AVG(x.Expr2 + y.Expr3),   -- ?????
       DATEDIFF(MIN(t.UFecha), MAX(t.UFecha)) AS Expr1
FROM `app_upgrade_hst_log` t
left join (select count(*), Request, DATEDIFF(MIN(UFecha), MAX(UFecha)) AS Expr2 FROM `app_upgrade_hst_log` where Status=1 group by Request, Status) x on t.Request = x.Request
left join (select count(*), Request, DATEDIFF(MIN(UFecha), MAX(UFecha)) AS Expr3 FROM `app_upgrade_hst_log` where Status=2 group by Request, Status) y on t.Request = y.Request
group by t.Request, t.Status
What is the SQL-Server version? Maybe you could use your query as a CTE and do a follow-up SELECT where you can use the Min and Max dates as the date period.
EDIT: Example
WITH myCTE AS
(
    -- put your query here
)
SELECT * FROM myCTE
You can use myCTE for further joins too, pick out the needed dates, use a sub-select, whatever... AND: have a look at the OVER link below, it could be helpful...
Depending on the version, you could also think about using OVER:
https://msdn.microsoft.com/en-us/library/ms189461.aspx
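For completeness, a rough sketch of an alternative to the two subqueries in the first answer, assuming SQL Server and the Hst_Log names from the question: conditional aggregation picks out the max Time per status within a single GROUP BY (swap MAX for MIN on status 2 if that is what is actually wanted).
-- Illustrative sketch only: conditional aggregation instead of two joined subqueries,
-- assuming SQL Server and the [Hst_Log] table from the question.
SELECT t.[user],
       t.Request,
       MAX(CASE WHEN t.[Status] = 2 THEN t.[Time] END) AS MaxStatus2,
       MAX(CASE WHEN t.[Status] = 3 THEN t.[Time] END) AS MaxStatus3,
       DATEDIFF(day,
                MAX(CASE WHEN t.[Status] = 2 THEN t.[Time] END),
                MAX(CASE WHEN t.[Status] = 3 THEN t.[Time] END)) AS DaysBetween
FROM [Hst_Log] t
WHERE t.Request = 146800
GROUP BY t.[user], t.Request;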