I have a users table that records each user's signup time:
id, signup_time
100 2020-09-01
001 2018-01-01
....
How can I find the number of existing users for every month in the history? Use the last day of each month as the cut-off day. "Existing" means that when I observe on the last day of July, 2020-07-31, the user had already signed up before 2020-07-01; when I observe on the last day of June, 2020-06-30, the user had already signed up before 2020-06-01.
Something like a for loop in another language:
observation_year_month_list = ['2020-04','2020-05','2020-06']
for month in observation_year_month_list:
    for user in users:
        if user.signup_time < first_day_of(month):
            monthly_existing_user_count[month] += 1
While PL/SQL has loops, that is a procedural language extension. SQL is a declarative language and does not use loops. Instead, you describe the results you want and the database comes up with a query plan to make it happen.
Your case is handled by group by, which aggregates rows into groups, in this case by month using date_trunc. You then use the aggregate function count to count how many users are in each group.
select
count(id) as num_users,
date_trunc('month', signup_time) as signup_month
from users
group by date_trunc('month', signup_time)
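Note that the query above counts sign-ups per month, not the users who already existed at each cut-off. A minimal sketch of the cumulative version follows, assuming signup_time is stored as ISO 'YYYY-MM-DD' text (so string comparison matches date order). It uses Python's sqlite3 module only to keep the example self-contained; the months table and the function name monthly_existing_users are hypothetical, not part of the original schema.

```python
import sqlite3


def monthly_existing_users(signups, observation_months):
    """Count users who signed up before the first day of each observation month.

    signups: iterable of (id, signup_time) with signup_time as 'YYYY-MM-DD' text.
    observation_months: iterable of 'YYYY-MM' strings.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("create table users (id text, signup_time text)")
    conn.executemany("insert into users values (?, ?)", signups)
    # Hypothetical helper table: one row per observation month (its first day).
    conn.execute("create table months (month_start text)")
    conn.executemany(
        "insert into months values (?)",
        [(m + "-01",) for m in observation_months],
    )
    rows = conn.execute(
        """
        select substr(m.month_start, 1, 7) as observation_month,
               count(u.id) as existing_users
        from months m
        left join users u on u.signup_time < m.month_start
        group by m.month_start
        order by m.month_start
        """
    ).fetchall()
    conn.close()
    return rows
```

The left join keeps months with no existing users in the result (with a count of 0), which a plain group by on the users table would drop.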
Related
I am trying to block off a window within my script that will attribute a sale to a 7-day window. The issue that I am having is that I want the seven-day window to not include the open date so open date = 0 and the sales window begins on day 1.
Here is the current way that I am creating that window -
and oh.Order_Date >= first_open_date.first_open
and oh.Order_Date <= first_open_date.first_open + 7
If you can provide some example data I can help with a more accurate answer, but for now I hope the below will share some ideas.
Please consider the below approach, where I'm assuming your 'opens' refer to tracking whether a user has opened a marketing campaign.
select orders.*,campaigns.campaign_name
from orders_table as orders
left join
(
select distinct user_id,timestamp as open_date,campaign_name from campaign_data -- user_id is required by the join condition below
) as campaigns
on orders.user_id = campaigns.user_id and campaigns.open_date < orders.order_date and campaigns.open_date >= date_sub(orders.order_date,interval 7 day)
This example is based on something similar to what I've created for work in the past, which looks at each order date in the order table and then what campaigns were opened before that date.
You may also want to consider using a window statement like row_number or dense_rank with this if you wish to pull only the first or last campaign that was opened to answer questions like "What was the last google ad a user interacted with before placing an order".
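For the window described in the question, where the open date is day 0 and sales count from day 1 through day 7, the boundary logic can be sketched as below. This is only an illustration, assuming both values are plain dates with no time component; the function name in_sales_window is made up for the example. In the question's SQL, the equivalent change would be using > instead of >= on the lower bound.

```python
from datetime import date


def in_sales_window(first_open: date, order_date: date, window_days: int = 7) -> bool:
    """Return True when the order falls on day 1..window_days after the open date.

    The open date itself is day 0 and is excluded from the window.
    """
    days_after_open = (order_date - first_open).days
    return 1 <= days_after_open <= window_days
```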
Hope this helps,
Tom
Given ~23 million users, what is the most efficient way to compute the cumulative number of logins within the last X months for any given day (even when no login was performed) ? Start date of a customer is its first ever login, end date is today.
Desired output
c_id day nb_logins_past_6_months
----------------------------------------------
1 2019-01-01 10
1 2019-01-02 10
1 2019-01-03 9
...
1 today 5
➔ One line per user per day with the number of logins between current day and 179 days in the past
Approach 1
1. Cross join each customer ID with calendar table
2. Left join on login table on day
3. Compute window function (i.e. `sum(nb_logins) over (partition by c_id order by day rows between 179 preceding and current row)`)
+ Easy to understand and maintain
- Really heavy, quite impossible to run on daily basis
- Incremental does not bring much benefit : still have to go 179 days in the past
Approach 2
1. Cross join each customer ID with calendar table
2. Left join on login table on day between today and 179 days in the past
3. Group by customer ID and day to get nb logins within 179 days
+ Easier to do incremental
- Table at step 2 is exceeding 300 billion rows
What is the common way to deal with this? This is not the only use case: we have to compute other columns like this one (nb logins in the past 12 months, etc.).
In standard SQL, you would use:
select l.*,
count(*) over (partition by customerid
order by login_date
range between interval '6 month' preceding and current row
) as num_logins_180day
from logins l;
This assumes that the logins table has a date of the login with no time component.
I see no reason to multiply 23 million users by 180 days to generate a result set in excess of 4 billion rows to answer this question.
For performance, don't do the entire task all at once. Instead, gather subtotals at the end of each month (or day or whatever makes sense for your data). Then SUM up the subtotals to provide the 'report'.
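The subtotal idea can be sketched for a single customer as follows. This is a rough illustration rather than a tuned implementation: it assumes the daily subtotals are already available as a dict mapping a date to that day's login count (a hypothetical daily_logins structure), and it keeps one running total that gains the current day and loses the day that just left the window, so each day costs constant work instead of a 180-day rescan.

```python
from datetime import date, timedelta


def rolling_login_counts(daily_logins, start, end, window_days=180):
    """Return [(day, logins in the window_days days ending on day)] from start to end.

    daily_logins: dict mapping date -> number of logins on that day (missing = 0).
    """
    # Prime the total with the window that ends the day before `start`.
    running_total = sum(
        daily_logins.get(start - timedelta(days=k), 0)
        for k in range(1, window_days + 1)
    )
    results = []
    day = start
    while day <= end:
        # Add today's subtotal, drop the subtotal that just left the window.
        running_total += daily_logins.get(day, 0)
        running_total -= daily_logins.get(day - timedelta(days=window_days), 0)
        results.append((day, running_total))
        day += timedelta(days=1)
    return results
```

Days without logins still produce a row, which matches the "one line per user per day" output requested in the question.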
More discussion (with a focus on MySQL): http://mysql.rjweb.org/doc.php/summarytables
(You should tag questions with the specific product; different products have different syntax/capability/performance/etc.)
I'm a receptionist keeping track of incoming calls in MS-Access 2010. The table has Date column. I can get count of calls per day but am having trouble with SQL to get average calls per day.
Assuming your table has one record per call, you can use a query like this; just replace the table and field names:
SELECT Avg(TotalCalls.DailyCalls) AS AverageCalls
FROM
(
SELECT MyTable.MyDateField, Count(MyTable.MyDateField) AS DailyCalls
FROM MyTable
WHERE MyTable.MyDateField >= #1-Feb-2017# AND MyTable.MyDateField <= #28-Feb-2017#
GROUP BY MyTable.MyDateField
) AS TotalCalls
This won't take into account days that have no calls, just the ones that do. The WHERE clause is optional, but you might want to use that to pick a specific date range.
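To make the difference concrete, here is a small sketch (outside Access, and only as an illustration) that computes the average both ways. It assumes one date per call; the function name average_calls_per_day and the include_empty_days flag are invented for the example.

```python
from datetime import date


def average_calls_per_day(call_dates, start, end, include_empty_days=True):
    """Average number of calls per day between start and end (both inclusive).

    call_dates: one date per call.
    include_empty_days: if True, divide by every calendar day in the range;
    if False, divide only by days that had at least one call (as the query does).
    """
    in_range = [d for d in call_dates if start <= d <= end]
    if include_empty_days:
        days = (end - start).days + 1
    else:
        days = len(set(in_range))
    return len(in_range) / days if days > 0 else 0.0
```

With three calls spread over two days of a four-day range, the first variant gives 0.75 and the second gives 1.5.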
I have a Table called SR_Audit which holds all of the updates for each ticket in our Helpdesk Ticketing system.
The table is formatted as per the below representation:
|-----------------|------------------|------------|------------|------------|
| SR_Audit_RecID | SR_Service_RecID | Audit_text | Updated_By | Last_Update|
|-----------------|------------------|------------|------------|------------|
|........PK.......|.......FK.........|
I've constructed the below query that provides me with the appropriate output that I require in the format I want it. That is to say that I'm looking to measure how many tickets each staff member completes every day for a month.
select SR_audit.updated_by, CONVERT(CHAR(10),SR_Audit.Last_Update,101) as DateOfClose, count (*) as NumberClosed
from SR_Audit
where SR_Audit.Audit_Text LIKE '%to "Completed"%' AND SR_Audit.Last_Update >= DATEADD(day, -30, GETDATE())
group by SR_audit.updated_by, CONVERT(CHAR(10),SR_Audit.Last_Update,101)
order by CONVERT(CHAR(10),SR_Audit.Last_Update,101)
However the query has one weakness which I'm looking to overcome.
A ticket can be reopened once it's completed, which means that it can be completed again. This allows a staff member to artificially inflate their score by re-opening a ticket and completing it again, thus increasing their completed ticket count by one each time they do this.
The table has a field called SR_Service_RecID which is essentially the ticket number. I want to put a condition in the query so that each ticket is only counted once regardless of how many times it's completed, while still honouring the current where clause.
I've tried sub queries and a few other methods but haven't been able to get the results I'm after.
Any assistance would be appreciated.
Cheers.
Courtenay
Use:
COUNT(DISTINCT(SR_Service_RecID)) as NumberClosed
Use:
COUNT(DISTINCT SR_Service_RecID) as NumberClosed
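The effect of COUNT(DISTINCT ...) can be checked on a few rows. The sketch below uses Python's sqlite3 module instead of SQL Server purely so it is self-contained, so date() stands in for the CONVERT call; table and column names follow the question, and the function name closed_per_staff_per_day is hypothetical.

```python
import sqlite3


def closed_per_staff_per_day(audit_rows):
    """Compare count(*) with count(distinct ticket) per staff member and day.

    audit_rows: iterable of (SR_Service_RecID, Audit_Text, Updated_By, Last_Update),
    with Last_Update as 'YYYY-MM-DD HH:MM:SS' text.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table SR_Audit ("
        "SR_Service_RecID integer, Audit_Text text, Updated_By text, Last_Update text)"
    )
    conn.executemany("insert into SR_Audit values (?, ?, ?, ?)", audit_rows)
    rows = conn.execute(
        """
        select Updated_By,
               date(Last_Update) as DateOfClose,
               count(*) as TimesCompleted,
               count(distinct SR_Service_RecID) as NumberClosed
        from SR_Audit
        where Audit_Text like '%to "Completed"%'
        group by Updated_By, date(Last_Update)
        order by DateOfClose, Updated_By
        """
    ).fetchall()
    conn.close()
    return rows
```

One caveat worth keeping in mind: the distinct count applies within each group, so a ticket that is completed again on a different day (or by a different person) is still counted once in each of those groups.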
Let's say I have a table UserActivity in SQL Server 2012 with two columns:
ActivityDateTime
UserID
I want to calculate the number of distinct users with any activity in a 30-day period (my monthly active users) on a daily basis. (So I have a 30-day window that increments a day at a time.) How do I do this efficiently using window functions in SQL Server?
The output would look like this:
Date,NumberActiveUsersInPrevious30Days
01-01-2010,13567
01-02-2010,14780
01-03-2010,13490
01-04-2010,15231
01-05-2010,15321
01-06-2010,14513
...
SQL Server doesn't support COUNT(DISTINCT ...) OVER () or a numeric offset (such as 30 PRECEDING) in conjunction with RANGE.
I wouldn't bother trying to coerce window functions into doing this. Because of the COUNT(DISTINCT UserID) requirement it is always going to have to re-examine the entire 30 day window for each date.
You can create a calendar table with a row for each date and use
SELECT C.Date,
NumberActiveUsersInPrevious30Days
FROM Calendar C
CROSS APPLY (SELECT COUNT(DISTINCT UserID)
FROM UserActivity
WHERE ActivityDateTime >= DATEADD(DAY, -30, C.[Date])
AND ActivityDateTime < C.[Date]) CA(NumberActiveUsersInPrevious30Days)
WHERE C.Date BETWEEN '2010-01-01' AND '2010-01-06'
Option 1: For (while) loop through each day and select 30 days backward for each (obviously quite slow).
Option 2: A separate table with a row for each day and join on the original table (again quite slow).
Option 3: Recursive CTEs or stored procs (still not doing much better).
Option 4: For (while) loop in combination with cursors (efficient, but requires some advanced SQL knowledge). With this solution you will step through each day and each row in order and keep track of the average (you'll need some sort of wrap-around array to know what value to subtract when a day moves out of range).
Option 5: Option 3 in a general-purpose / scripting programming language (C++ / Java / PHP) (easy to do with basic knowledge of one of those languages, efficient).
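As a rough sketch of the step-through-each-day idea in a general-purpose language, the function below slides a window one day at a time and keeps, for each user, the number of days in the window on which they were active. The distinct count is then just the number of users still present. It assumes activity has already been reduced to (date, user_id) pairs, and it uses the same window as the CROSS APPLY query: the 30 days before each reported date, excluding that date. The name rolling_active_users is made up for the example.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def rolling_active_users(activity, start, end, window_days=30):
    """Return [(day, distinct users active in [day - window_days, day))]."""
    users_by_day = defaultdict(set)
    for day, user_id in activity:
        users_by_day[day].add(user_id)

    # active_days[user] = number of days inside the current window with activity.
    active_days = Counter()
    for offset in range(1, window_days + 1):
        for user_id in users_by_day.get(start - timedelta(days=offset), ()):
            active_days[user_id] += 1

    results = []
    day = start
    while day <= end:
        results.append((day, len(active_days)))
        # Slide the window forward: `day` enters, `day - window_days` leaves.
        for user_id in users_by_day.get(day, ()):
            active_days[user_id] += 1
        for user_id in users_by_day.get(day - timedelta(days=window_days), ()):
            active_days[user_id] -= 1
            if active_days[user_id] == 0:
                del active_days[user_id]
        day += timedelta(days=1)
    return results
```

Each activity row is touched twice (once entering and once leaving the window), so the work grows with the number of rows rather than with rows times window length.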