I'm new to SQL and I'm struggling with a few of the questions in my exercise. How do I calculate the Day 7 retention rate of all players acquired on 01 October 2019? The tables shown are just a sample of the full tables.
My answer was:
SELECT
Session_table,
COUNT(DISTINCT user_id) as active_users,
COUNT(DISTINCT future_ user_id) as retained_users,
CAST(COUNT(DISTINCT future_ user_id) / COUNT(DISTINCT user_id) AS float) retention
FROM inapp_purchase
LEFT JOIN user_id as future_id
ON user_id = future_ user_id
AND user_id. inapp_purchase = user_id.session_table - datetime(page_views.pv_ts, '+7 day')
GROUP BY 1
There are two tables:
session_table
When a user_id starts a session, the session gets a unique session_id. session_length_seconds tells how long a session lasted in seconds. level tells which game level the player was at when that session ended. The session_table has one row for each session each user has (users can have multiple sessions per day, each with a unique session_id).
inapp_purchase
The inapp_purchase table has one line for each product that the user (denoted by user_id) purchased. There can be multiple purchases in a day per user, per session. The session_id and user_id here can link to the session_table to track active users who also make a payment. Product_id tells which product was purchased and the purchase_value tells the amount the user paid in $.
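For what it's worth, here is a minimal sketch of one way Day 7 retention could be computed, run through Python's sqlite3 so it can be tried on toy data. It rests on assumptions that are not in the post: a `session_date` column on session_table (the description names no date field), "acquired" meaning the date of a player's first session, and "Day 7 retained" meaning a session exactly 7 days after that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE session_table (
    user_id INTEGER,
    session_id INTEGER,
    session_date TEXT  -- hypothetical column; the post does not name a date field
);
INSERT INTO session_table VALUES
    (1, 101, '2019-10-01'), (1, 102, '2019-10-08'),  -- user 1 returns on day 7
    (2, 103, '2019-10-01'), (2, 104, '2019-10-05'),  -- user 2 does not
    (3, 105, '2019-09-30'), (3, 106, '2019-10-08');  -- user 3 was acquired earlier
""")

query = """
WITH cohort AS (
    -- Assumption: a player is "acquired" on the date of their first session.
    SELECT user_id
    FROM session_table
    GROUP BY user_id
    HAVING MIN(session_date) = '2019-10-01'
)
SELECT
    COUNT(DISTINCT c.user_id) AS cohort_size,
    COUNT(DISTINCT s.user_id) AS retained_users,
    -- Multiply by 1.0 before dividing; integer / integer would truncate to 0.
    1.0 * COUNT(DISTINCT s.user_id) / COUNT(DISTINCT c.user_id) AS day7_retention
FROM cohort c
LEFT JOIN session_table s
    ON s.user_id = c.user_id
   AND s.session_date = date('2019-10-01', '+7 day')
"""
cohort_size, retained_users, day7_retention = conn.execute(query).fetchone()
print(cohort_size, retained_users, day7_retention)
```

The LEFT JOIN keeps every cohort member in the denominator even when they have no session on day 7. Casting the result of the division, as in the attempt above, comes too late in engines that divide integers as integers.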
There are also three requests for the inapp_purchase table:
Calculate daily average revenue per daily active user for last 3 months
Calculate the daily average revenue per paying user, year to date.
Calculate the daily conversion rate over the last month. This measure is defined as the proportion of daily active users who make a purchase on the day in question.
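As a rough sketch of how the daily conversion rate and revenue per daily active user might be shaped, here is a toy version in Python's sqlite3. It assumes a hypothetical `session_date` column on session_table and that a purchase takes its date from the session it links to; the date-range filters from the requests (last month, last 3 months) are left out and would go in a WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- session_date is a hypothetical column; the post does not name a date field.
CREATE TABLE session_table (user_id INTEGER, session_id INTEGER, session_date TEXT);
CREATE TABLE inapp_purchase (user_id INTEGER, session_id INTEGER,
                             product_id TEXT, purchase_value REAL);
INSERT INTO session_table VALUES
    (1, 11, '2019-10-01'), (1, 12, '2019-10-01'),
    (2, 13, '2019-10-01'),
    (1, 14, '2019-10-02'), (3, 15, '2019-10-02');
INSERT INTO inapp_purchase VALUES
    (1, 11, 'gems', 2.0), (1, 12, 'coins', 3.0),
    (3, 15, 'gems', 4.0);
""")

query = """
WITH daily_users AS (
    -- One row per user per day, so several sessions count once.
    SELECT session_date, user_id
    FROM session_table
    GROUP BY session_date, user_id
),
daily_revenue AS (
    -- Purchases carry no date of their own here; take it from the session.
    SELECT s.session_date, p.user_id, SUM(p.purchase_value) AS revenue
    FROM inapp_purchase p
    JOIN session_table s ON s.session_id = p.session_id
    GROUP BY s.session_date, p.user_id
)
SELECT
    u.session_date,
    COUNT(*) AS dau,
    COUNT(r.user_id) AS paying_users,
    1.0 * COUNT(r.user_id) / COUNT(*) AS conversion_rate,
    COALESCE(SUM(r.revenue), 0) / COUNT(*) AS revenue_per_dau
FROM daily_users u
LEFT JOIN daily_revenue r
    ON r.session_date = u.session_date AND r.user_id = u.user_id
GROUP BY u.session_date
ORDER BY u.session_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Both sides are reduced to one row per user per day before the join, so a user with several sessions or several purchases on one day is counted once in each figure.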
Please let me know if the above information is sufficient.
Thank you for your help.
I'm doing some work around what we're spending on support vs. how much those users bring in, and I ran into this unusual problem.
Tables I have:
Revenue table: A row for each time a user generates revenue on the platform
Support Contacts Table: A row for each time a user contacts support and the cost associated with that contact
I'm trying to get a table at a daily grain that details...
How many users contacted support on the given day
How much revenue did all users bring in in the last 30 days?
How much did we spend on support contacts in the last 30 days?
The tough part: How much did the users who contacted support on the given day bring in in the last 30 days?
Here's what I have so far:
SELECT DISTINCT
-- Revenue generation date
r.revenue_date
-- Easy summing of contacts/revenue/costs on the given day
,COUNT(DISTINCT sc.user_pk) AS num_user_contacting_support_on_day
,SUM(r.revenue) AS all_users_revenue_for_day
,SUM(sc.support_contact_cost) AS support_costs_on_day
-- Double check that this would sum the p30d revenue/costs for the given day?
,SUM(IF(r.revenue_date BETWEEN r.revenue_date AND DATE_SUB(r.revenue_date, INTERVAL 30 DAY), c.revenue, NULL)) AS p30d_revenue_all_users
,SUM(IF(sc.support_contact_date BETWEEN r.revenue_date AND DATE_SUB(r.revenue_date, INTERVAL 30 DAY), sc.support_contact_cost, NULL)) AS p30d_support_contact_cost
-- The tough part:
-- How do I get the revenue for ONLY users who contacted support on the given day?
-- How do I get the p30d revenue for ONLY users who contacted support on the given day?
FROM revenue_table r
LEFT JOIN support_contact_table sc
ON r.revenue_date = sc.support_contact_date
GROUP BY r.revenue_date
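One possible shape for the "tough part", sketched with Python's sqlite3 on toy data. It assumes revenue_table also carries a `user_pk` column (the query above only shows it on the support table), and it uses SQLite's `date()` in place of `DATE_SUB`. Note that BETWEEN expects the lower bound first, so the window is written as "30 days ago AND the given day".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- user_pk on revenue_table is an assumption; the post only shows sc.user_pk.
CREATE TABLE revenue_table (user_pk INTEGER, revenue_date TEXT, revenue REAL);
CREATE TABLE support_contact_table (user_pk INTEGER, support_contact_date TEXT,
                                    support_contact_cost REAL);
INSERT INTO revenue_table VALUES
    (1, '2020-01-05', 10.0), (1, '2020-01-20', 5.0),
    (2, '2020-01-18', 7.0),    -- user 2 never contacted support
    (1, '2019-12-01', 100.0);  -- more than 30 days before the contact
INSERT INTO support_contact_table VALUES
    (1, '2020-01-20', 3.0), (1, '2020-01-20', 2.0);  -- same user, two contacts
""")

query = """
WITH contacts AS (
    -- Collapse to one row per user per day first, so a user with several
    -- contacts on one day does not multiply their revenue rows in the join.
    SELECT support_contact_date AS day, user_pk,
           SUM(support_contact_cost) AS cost
    FROM support_contact_table
    GROUP BY support_contact_date, user_pk
),
contact_revenue AS (
    -- Trailing revenue per contacting user; both window ends are inclusive.
    SELECT c.day, c.user_pk, c.cost,
           COALESCE(SUM(r.revenue), 0) AS p30d_revenue
    FROM contacts c
    LEFT JOIN revenue_table r
        ON r.user_pk = c.user_pk
       AND r.revenue_date BETWEEN date(c.day, '-30 day') AND c.day
    GROUP BY c.day, c.user_pk, c.cost
)
SELECT day,
       COUNT(*) AS num_users_contacting_support_on_day,
       SUM(cost) AS support_costs_on_day,
       SUM(p30d_revenue) AS p30d_revenue_of_contacting_users
FROM contact_revenue
GROUP BY day
ORDER BY day
"""
rows = conn.execute(query).fetchall()
print(rows)
```

This sketch only returns days on which someone contacted support. The all-users figures would be computed separately at the daily grain and joined on the date, rather than joining the two raw tables on date, which pairs every revenue row with every contact row of that day.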
In my database, I have three tables - page visit events, checkout started events, and checkout completed events. I'm trying to get the utm (marketing campaign) parameters out of the page visit events table and then count the # of checkout started and # of checkout completed events for that group, and sum the total revenue from checkout completed by marketing campaign, grouped by campaign and date. The tables (simplified) look like this:
# page visit events
user_id | utm_campaign_name | utm_campaign_source | timestamp
string  | string            | string              | timestamp
# checkout started events
user_id | timestamp
string  | timestamp
# checkout completed events
user_id | timestamp | total
string  | timestamp | int
The goal is to get all events for a day grouped by utm_campaign_name and utm_campaign_source, and then count the number of rows in the checkout started events table where the user_id existed in page visits, and then also count the number of rows in the checkout completed events table where the user_id existed in the page visits table but also sum the total from the checkout completed table.
The end result is to have a row tell me the total revenue from a campaign as well as how many people arrived based on that campaign and how many started and completed checkouts so we can also analyze conversion rates. This isn't BigQuery specific but I'm using BigQuery.
SELECT
`pages`.`utm_campaign_name`,
`pages`.`utm_campaign_source`,
TIMESTAMP_TRUNC(`pages`.`timestamp`, DAY, "UTC") AS date,
SUM(`checkout_completed`.`total`) AS revenue,
COUNT(`pages`.`user_id`) AS visitors,
COUNTIF(`checkout_completed`.`user_id` IS NOT NULL) AS checkoutStarted,
SUM(`checkout_completed`.`total`) / COUNT(`checkout_completed`.`user_id`) AS average,
FROM
`pages`
LEFT JOIN
`checkout_completed`
ON
`checkout_completed`.`user_id` = `pages`.`user_id`
AND `checkout_completed`.`user_id` IS NOT NULL
WHERE
`utm_campaign_source` IS NOT NULL
AND `pages`.`timestamp` IS NOT NULL
GROUP BY
`pages`.`utm_campaign_name`,
`pages`.`utm_campaign_source`,
date
ORDER BY
date DESC
This gets me close but visitors and checkoutStarted are the same value (I believe they're both visitors so I'm not getting checkoutStarted) and this also kills my confidence that I have the correct value for the average order (total / checkoutCompleted).
Note that the events tables also have rows with no attribution, i.e., a checkout completed event could have happened with no initial utm_campaign_name. What I really want is to count the checkout events only for those user_ids that appear in the page visit events with utm_campaign... parameters.
Is this a case for subqueries (if so, any ideas?) or window functions (same - I haven't used them before) or something else? Ideas?
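A subquery (CTE) approach might look like the sketch below, run through Python's sqlite3 on toy data. The table names `pages`, `checkout_started` and `checkout_completed` follow the query above where possible (`checkout_started` is assumed), `date(timestamp)` stands in for `TIMESTAMP_TRUNC`, and the attribution rule is a deliberate simplification: every checkout event of a user is credited to each campaign/day row that user appears in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages (user_id TEXT, utm_campaign_name TEXT,
                    utm_campaign_source TEXT, timestamp TEXT);
CREATE TABLE checkout_started (user_id TEXT, timestamp TEXT);
CREATE TABLE checkout_completed (user_id TEXT, timestamp TEXT, total INTEGER);
INSERT INTO pages VALUES
    ('a', 'spring', 'email', '2021-03-01 09:00:00'),
    ('a', 'spring', 'email', '2021-03-01 09:05:00'),  -- repeat visit, same day
    ('b', 'spring', 'email', '2021-03-01 10:00:00'),
    ('c', NULL, NULL, '2021-03-01 11:00:00');         -- no attribution
INSERT INTO checkout_started VALUES
    ('a', '2021-03-01 09:10:00'), ('a', '2021-03-01 09:20:00'),
    ('b', '2021-03-01 10:10:00'), ('c', '2021-03-01 11:10:00');
INSERT INTO checkout_completed VALUES
    ('a', '2021-03-01 09:30:00', 40), ('c', '2021-03-01 11:30:00', 99);
""")

query = """
WITH visitors AS (
    -- One row per attributed user per campaign per day.
    SELECT DISTINCT utm_campaign_name, utm_campaign_source,
           date(timestamp) AS day, user_id
    FROM pages
    WHERE utm_campaign_source IS NOT NULL
),
started AS (
    SELECT user_id, COUNT(*) AS n_started
    FROM checkout_started
    GROUP BY user_id
),
completed AS (
    SELECT user_id, COUNT(*) AS n_completed, SUM(total) AS revenue
    FROM checkout_completed
    GROUP BY user_id
)
SELECT
    v.utm_campaign_name, v.utm_campaign_source, v.day,
    COUNT(*) AS visitors,
    COALESCE(SUM(s.n_started), 0) AS checkout_started,
    COALESCE(SUM(c.n_completed), 0) AS checkout_completed,
    COALESCE(SUM(c.revenue), 0) AS revenue
FROM visitors v
LEFT JOIN started s ON s.user_id = v.user_id
LEFT JOIN completed c ON c.user_id = v.user_id
GROUP BY v.utm_campaign_name, v.utm_campaign_source, v.day
ORDER BY v.day DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Each event table is aggregated to one row per user before it is joined, so the joins cannot multiply rows and the three counts stay independent. User `c` has checkouts but no campaign parameters, so those events drop out.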
Given ~23 million users, what is the most efficient way to compute the cumulative number of logins within the last X months for any given day (even when no login was performed)? A customer's start date is their first ever login; the end date is today.
Desired output
c_id day nb_logins_past_6_months
----------------------------------------------
1 2019-01-01 10
1 2019-01-02 10
1 2019-01-03 9
...
1 today 5
➔ One line per user per day with the number of logins between current day and 179 days in the past
Approach 1
1. Cross join each customer ID with calendar table
2. Left join on login table on day
3. Compute window function (i.e. `sum(nb_logins) over (partition by c_id order by day rows between 179 preceding and current row)`)
+ Easy to understand and maintain
- Really heavy, practically impossible to run on a daily basis
- Incremental loading does not bring much benefit: each run still has to look 179 days into the past
Approach 2
1. Cross join each customer ID with calendar table
2. Left join on login table on day between today and 179 days in the past
3. Group by customer ID and day to get nb logins within 179 days
+ Easier to do incremental
- Table at step 2 is exceeding 300 billion rows
What is the common way to deal with this, knowing that this is not the only use case? We have to compute other columns like this (nb logins in the past 12 months, etc.).
In standard SQL, you would use:
select l.*,
count(*) over (partition by customerid
order by login_date
range between interval '6 month' preceding and current row
) as num_logins_180day
from logins l;
This assumes that the logins table has a date of the login with no time component.
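Not every engine accepts an interval as a RANGE offset. As a hedged illustration of the same idea, here is a sketch in Python's sqlite3 (it needs an SQLite build with RANGE offsets, 3.28 or later) that orders by the numeric `julianday()` value and looks back 179 days. The table and column names mirror the query above; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logins (customerid INTEGER, login_date TEXT);
INSERT INTO logins VALUES
    (1, '2019-01-01'), (1, '2019-03-01'), (1, '2019-07-15'),
    (2, '2019-02-01');
""")

query = """
SELECT customerid, login_date,
       COUNT(*) OVER (
           PARTITION BY customerid
           -- julianday() turns the date into a number, so the RANGE offset
           -- below is measured in days.
           ORDER BY julianday(login_date)
           RANGE BETWEEN 179 PRECEDING AND CURRENT ROW
       ) AS num_logins_180day
FROM logins
ORDER BY customerid, login_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Unlike a ROWS frame, a RANGE frame measures distance in the ordering value, so gaps between login dates are handled without a calendar cross join. The result has one row per login rather than one row per customer per day.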
I see no reason to multiply 23 million users by 180 days to generate a result set in excess of 4 billion rows to answer this question.
For performance, don't do the entire task all at once. Instead, gather subtotals at the end of each month (or day or whatever makes sense for your data). Then SUM up the subtotals to provide the 'report'.
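To make the subtotal idea concrete, here is a small sketch in Python's sqlite3. The `daily_logins` summary table, the `login_ts` column and both helper functions are hypothetical names chosen for illustration; the grain (one subtotal per customer per day) is just one option.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logins (c_id INTEGER, login_ts TEXT);
INSERT INTO logins VALUES
    (1, '2019-01-01 08:00:00'), (1, '2019-01-01 19:30:00'),
    (1, '2019-02-10 12:00:00'), (2, '2019-02-10 07:15:00');

-- Hypothetical summary table: one subtotal row per customer per day.
CREATE TABLE daily_logins (
    c_id INTEGER, day TEXT, nb_logins INTEGER,
    PRIMARY KEY (c_id, day)
);
""")

def refresh_day(day):
    """Rebuild the subtotals for a single day (the incremental step)."""
    conn.execute("DELETE FROM daily_logins WHERE day = ?", (day,))
    conn.execute("""
        INSERT INTO daily_logins (c_id, day, nb_logins)
        SELECT c_id, date(login_ts), COUNT(*)
        FROM logins
        WHERE date(login_ts) = ?
        GROUP BY c_id, date(login_ts)
    """, (day,))

def logins_in_window(c_id, day, window_days=180):
    """Sum the daily subtotals for the window ending on `day` (inclusive)."""
    row = conn.execute("""
        SELECT COALESCE(SUM(nb_logins), 0)
        FROM daily_logins
        WHERE c_id = ?
          AND day BETWEEN date(?, ?) AND ?
    """, (c_id, day, f"-{window_days - 1} day", day)).fetchone()
    return row[0]

for d in ("2019-01-01", "2019-02-10"):
    refresh_day(d)

print(logins_in_window(1, "2019-02-10"))
```

Only the newest day has to be recomputed on each run, and the same subtotals can serve a 12-month column by passing a different `window_days`.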
More discussion (with a focus on MySQL): http://mysql.rjweb.org/doc.php/summarytables
(You should tag questions with the specific product; different products have different syntax/capability/performance/etc.)
I'm using the Ahoy (https://github.com/ankane/ahoy.js) gem for analytics. I'd like to calculate daily active users. The current SQL query that I have is based on the GroupDate documentation. Given the table ahoy_visits, which has columns started_at and user_id, it calculates the number of visits per day.
SELECT date_trunc('day', started_at)::date AS day, COUNT(*) AS visits FROM ahoy_visits GROUP BY day ORDER BY day
The problem is that it double counts visits from the same user on the same day. I only want to count one visit per day per user. Is there an easy way to modify this query to accomplish that goal?
You want count(distinct):
SELECT date_trunc('day', started_at)::date AS day, COUNT(DISTINCT user_id) AS visits
FROM ahoy_visits
GROUP BY day
ORDER BY day
I'm quite new to Access and I am currently in the process of making a database for my company.
I have a 'Jobs' table with these fields in:
Job No.
Year Initiated
Month Initiated
Company ID
Job Description
Amount Quoted
Amount to Invoice
Invoice Number
Completed By
Cost
Profit
What I want to know is: what is the best way to calculate, either in a form or in a query, the overall profit for each month?
Please help. The database is really coming along, apart from this, which I'm entirely stuck on.
You want to find all rows matching a specific year / month, and add together all the profit entries for that month to get a total?
If so, try this :
SELECT Sum([Profit]) AS TotalProfit FROM Jobs WHERE [Year Initiated] = 2013 AND [Month Initiated] = 2
Or, if you want to retrieve this information for all months in one go, try this :
SELECT [Year Initiated], [Month Initiated], Sum([Profit]) AS TotalProfit FROM Jobs GROUP BY [Year Initiated], [Month Initiated]