Calculate rolling retention with SQL (BigQuery)

I have a table of logins with these columns:
id - unique id of the user
day - days passed since registration (0-30)
Each record in this table represents one login, so there may be duplicate rows (a user can log in multiple times a day). I need to calculate how many users logged in on a given day of their lifetime (30 days) or on any later day (rolling retention). The output table should contain a day column (1-30) and the number of users for each day. If a user logged in on the 30th day, we count them as retained on every day before 30. Any ideas? :)

Try this one:
select single_day, count(distinct id)
from mytable, unnest(generate_array(1, day)) as single_day
group by single_day
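
To see why this works, here is a minimal Python sketch of the same logic on a tiny in-memory sample. The function name `rolling_retention` and the sample rows are made up for illustration; it assumes the input is a list of (id, day) pairs shaped like the table in the question.

```python
from collections import defaultdict

def rolling_retention(logins, max_day=30):
    """logins: iterable of (user_id, day) pairs; duplicate rows are allowed.

    Returns {day: number of distinct users who logged in on that day or later}.
    """
    # Latest day on which each user was seen; duplicate rows collapse here.
    last_day = {}
    for user_id, day in logins:
        last_day[user_id] = max(day, last_day.get(user_id, 0))

    # A user whose latest login is on day d counts for every day 1..d.
    # This mirrors unnest(generate_array(1, day)) followed by count(distinct id).
    retained = defaultdict(int)
    for d in last_day.values():
        for single_day in range(1, min(d, max_day) + 1):
            retained[single_day] += 1
    return dict(sorted(retained.items()))

# Hypothetical sample: user "a" logs in twice on day 3, user "c" only on day 0.
sample = [("a", 0), ("a", 3), ("a", 3), ("b", 1), ("c", 0)]
print(rolling_retention(sample))
```

As in the query, a user who only appears on day 0 contributes to no day, and days on which nobody is retained are simply absent from the result.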

Related

SQL in BigQuery - How to loop through every month in a timeframe and calculate some metrics (i.e. count userID)

I am doing a test to calculate user retention for a bank. I can already calculate the number of churned users and the number of loyal users for a given date period (for example, from 1 Jan to 1 Feb). However, the data covers every transaction/activity of every unique user from 1 Jan 2016 to 1 Jun 2018. I know we could write a loop in SQL to go through every month automatically, but the calculation makes this hard: new users join every month, and the new users of month 1 become either loyal or churned users in month 2.
Could anybody shed some light on this for me?

GROUP BY date and empty data

I have a table hits with columns created and user_id.
I want to get hit counts for the last 30 days, grouped by day. The problem is that on some days a user has no traffic,
and as a result those days are missing from the report.
How can I get a row for every day (with 0 hits), even when there are no hits?
My query:
SELECT user_id, toDate(created) as date, COUNT() as count
FROM hits
WHERE created > NOW() - INTERVAL 30 DAY
GROUP BY toDate(created), user_id

SQLite - Determine average sales made for each day of week

I am trying to write a query in SQLite that determines the average sales made on each weekday in a year.
As an example, I'd like to be able to say
"The average sales for Monday are $400.50 in 2017"
I have a sales table where each row represents a sale you made. You can have multiple sales for the same day. Columns of interest here:
Id, SalesTotal, DayCreated, MonthCreated, YearCreated, CreationDate, PeriodOfTheDay
DayCreated/MonthCreated/YearCreated are integers that represent the day, month, and year of the sale. CreationDate is a unix timestamp of the date/time the sale was created (and so always agrees with the day/month/year columns).
PeriodOfTheDay is 0 or 1 (day or night). You can have multiple records for a given day (typically at most 2, but some people like to add all of their sales individually, so you could have 5 or more for a day).
Where I am stuck
Because you can have two records on the same day (i.e. a day sale and a night sale, or multiple of each), I can't just group by day of the week (i.e. group all records by Saturday).
This is because the number of sales you made does not equal the number of days you worked. For example, I could have worked 10 Saturdays but had 30 sales, so grouping by 'Saturday' would count 30 sales, since 30 records exist for Saturday (some just happen to share the same day).
Furthermore, if I group by DayCreated, MonthCreated, YearCreated, it works in the sense that it produces x rows (where x is the number of days you worked), but that means I need to return this result set to the back end and do a row count. I'd rather do this in the query, so I can take the sales total and divide it by the number of days worked.
Would anyone be able to assist?
Thanks!
UPDATE
I think I got it - I would love someone to tell me if I'm right:
SELECT COUNT(DISTINCT CAST(( julianday((datetime(CreationDate / 1000, 'unixepoch', 'localtime'))) ) / 7 AS INT))
FROM Sales
WHERE strftime('%w', datetime(CreationDate / 1000, 'unixepoch'), 'localtime') = '6'
AND YearCreated = 2017
This would produce the number of Saturdays worked, and then I'd just put this in as an inner query, dividing the sales total by this number of days.
Buddy,
You can group your query by the day of week and the week number taken from the day created or the creation date.
In MSSQL
DATEPART(WEEK,'2017-08-14') -- Will give you week 33
DATEPART(WEEKDAY,'2017-08-14') -- Will give you day 2
In MySQL
WEEK('2017-08-14') -- Will give you week 33
DAYOFWEEK('2017-08-14') -- Will give you day 2
See these figures:
Day of Week
1-Sunday, 2-Monday, 3-Tuesday, 4-Wednesday, 5-Thursday, 6-Friday, 7-Saturday
Week Number
1 - 53 weeks in a year
This will be the key, so that every Saturday in every month is counted separately.
Hope this helps in building your query.
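
For the SQLite case in the question, one way to sketch the idea is to divide the summed sales by the number of distinct calendar days per weekday, all in one query. The example below is a hedged illustration, not the asker's final query: the trimmed-down Sales table, the sample rows, and the choice of UTC instead of 'localtime' (to keep the result reproducible) are all assumptions. CreationDate is assumed to be in milliseconds, as the `/ 1000` in the question suggests.

```python
import sqlite3
from datetime import datetime, timezone

def ms(year, month, day, hour):
    """Milliseconds since the unix epoch for a UTC date/time."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp() * 1000)

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical version of the Sales table from the question.
conn.execute("CREATE TABLE Sales (Id INTEGER PRIMARY KEY, SalesTotal REAL, CreationDate INTEGER)")
conn.executemany(
    "INSERT INTO Sales (SalesTotal, CreationDate) VALUES (?, ?)",
    [
        (100.0, ms(2017, 8, 12, 10)),  # Saturday, day sale
        (50.0, ms(2017, 8, 12, 22)),   # same Saturday, night sale
        (30.0, ms(2017, 8, 19, 10)),   # following Saturday
        (40.0, ms(2017, 8, 14, 10)),   # Monday
    ],
)

# %w is the weekday as '0'..'6' with Sunday = 0.
# COUNT(DISTINCT date(...)) counts days worked, not rows.
rows = conn.execute(
    """
    SELECT strftime('%w', CreationDate / 1000, 'unixepoch') AS weekday,
           SUM(SalesTotal)
             / COUNT(DISTINCT date(CreationDate / 1000, 'unixepoch')) AS avg_per_day
    FROM Sales
    WHERE strftime('%Y', CreationDate / 1000, 'unixepoch') = '2017'
    GROUP BY weekday
    ORDER BY weekday
    """
).fetchall()
print(rows)
```

With this sample, the two Saturdays total 180 across three rows, but the divisor is 2 (days worked), not 3 (rows), which is the distinction the question is after.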

use of week of year & subsquend in bigquery

I need to show distinct users per week. I have a date-visit column and a user id; it is a big table with 1 billion rows.
I can change the date column in the CSVs to year, month, and day columns, but how do I deduce the week from those in the query?
I could calculate the week when generating the CSV, but that is a big processing step.
I also need to show how many distinct users visit day after day, so I am looking for a workaround, as there is no date type.
Any ideas?
To get the week of year number:
SELECT STRFTIME_UTC_USEC(TIMESTAMP('2015-5-19'), '%W')
20
If you have your date as a timestamp (i.e. microseconds since the epoch) you can use the UTC_USEC_TO_DAY/UTC_USEC_TO_WEEK functions. Alternatively, if you have a formatted date string (e.g. "2012/03/13 19:00:06 -0700") you can call PARSE_UTC_USEC to turn the string into a timestamp and then use that to get the week or day.
To see an example, try:
SELECT LEFT(FORMAT_UTC_USEC(day), 10) AS day, cnt
FROM (
  SELECT day, COUNT(*) AS cnt
  FROM (
    SELECT UTC_USEC_TO_DAY(PARSE_UTC_USEC(created_at)) AS day
    FROM [publicdata:samples.github_timeline])
  GROUP BY day
  ORDER BY cnt DESC)
To show week, just change UTC_USEC_TO_DAY(...) to UTC_USEC_TO_WEEK(..., 0) (the 0 at the end is to indicate the week starts on Sunday). See the documentation for the above functions at https://developers.google.com/bigquery/docs/query-reference for more information.
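
If the data only has separate year, month, and day columns, the week bucket can still be derived from them. The following is a small Python sketch of that derivation, not BigQuery code: the function name `distinct_users_per_week` and the sample visits are hypothetical, and it assumes Python's `%W` (week of year, Monday as first day) is an acceptable stand-in for the `'%W'` format used above. For 2015-05-19 it also yields 20.

```python
from collections import defaultdict
from datetime import date

def distinct_users_per_week(visits):
    """visits: iterable of (year, month, day, user_id) tuples.

    Returns {(year, week_of_year): number of distinct users}.
    """
    users = defaultdict(set)
    for year, month, day, user_id in visits:
        d = date(year, month, day)
        # %W: week number of the year, weeks starting on Monday.
        week = int(d.strftime("%W"))
        users[(year, week)].add(user_id)
    return {key: len(ids) for key, ids in sorted(users.items())}

# Hypothetical sample: u1 visits twice on the same day, then again a week later.
visits = [
    (2015, 5, 19, "u1"),
    (2015, 5, 19, "u1"),
    (2015, 5, 20, "u2"),
    (2015, 5, 26, "u1"),
]
print(distinct_users_per_week(visits))
```

Keying on (year, week) rather than week alone keeps the same week number in different years from being merged.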

Single SQL Server query to get the total score by hourly, daily, weekly, monthly, and annual

I have the following requirement:
A single SQL Server query to get the total score by hourly, daily, weekly, monthly, and annual periods.
This is how the result should look:
thisperiodtotalscore previousperiodtotalscore sumtotalscore periodType
It must meet the following criteria:
The score comes from the table data; totalscore has to be summed across the different members of different teams.
The period can be hourly, daily, weekly, monthly, or annual. Hourly is determined by the working-hour definition, i.e. any number of selected hours (for example: 1st working hour, 2nd working hour, and so on).
Weekly is determined by the working-days definition (for example, Wednesday to Wednesday),
and likewise for monthly and annual.
If a period has no data, for example because it is a holiday/leave, that period should not be skipped in the count.
Note:
thisperiodtotalscore - the total score for the period (hourly, daily, weekly, monthly, or annual) that contains the user's input date, e.g. the score for the week or the month of the input date
previousperiodtotalscore - the total score for the period immediately before that one, e.g. the previous week's or previous month's score relative to the input date
sumtotalscore - the total of thisperiodtotalscore and previousperiodtotalscore
periodType - hourly, daily, weekly, monthly, or annual, based on the requested period type
This is the requirement. Suggestions for other cases I may have missed in this kind of scenario are also welcome.
Thanks in advance,
GravityPush
To start, take a look at DimDate in the AdventureWorksDW sample database. You can create a similar table in your database and write join queries against it. I suggest using queries that are dynamically grouped by different columns of DimDate.
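
As a rough illustration of that suggestion, the sketch below builds a tiny date dimension in SQLite (via Python) and groups scores by a column chosen at run time. The table layout is a simplified, hypothetical stand-in and not the real AdventureWorksDW schema; the column names WeekOfYear and MonthOfYear, the Scores table, and the `totals` helper are all assumptions. The LEFT JOIN from the date table is what keeps periods with no data in the result.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DimDate (DateKey TEXT PRIMARY KEY, WeekOfYear INTEGER, MonthOfYear INTEGER);
    CREATE TABLE Scores (ScoreDate TEXT, Member TEXT, Score INTEGER);
    """
)

# Fill the date dimension for two weeks starting Monday 2018-01-01 (ISO weeks 1 and 2).
start = date(2018, 1, 1)
for offset in range(14):
    d = start + timedelta(days=offset)
    conn.execute(
        "INSERT INTO DimDate VALUES (?, ?, ?)",
        (d.isoformat(), d.isocalendar()[1], d.month),
    )

# Scores exist only in the first week; the second week is an "empty" period.
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?, ?)",
    [("2018-01-02", "a", 5), ("2018-01-03", "b", 7)],
)

# Allowlist of grouping columns, so the column name is never taken from user input.
PERIOD_COLUMNS = {"weekly": "WeekOfYear", "monthly": "MonthOfYear"}

def totals(period):
    """Total score per period; periods without scores are reported as 0."""
    col = PERIOD_COLUMNS[period]
    query = (
        f"SELECT d.{col}, COALESCE(SUM(s.Score), 0) "
        "FROM DimDate d LEFT JOIN Scores s ON s.ScoreDate = d.DateKey "
        f"GROUP BY d.{col} ORDER BY d.{col}"
    )
    return conn.execute(query).fetchall()

print(totals("weekly"))
print(totals("monthly"))
```

Custom definitions such as working hours or a Wednesday-to-Wednesday week could be modelled as further columns on the date table, which is the main appeal of this design.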