GROUP BY date and empty data - sql

I have a table hits with columns created and user_id.
I want to get hit counts for the last 30 days, grouped by day. The problem is that on some days a user has no traffic,
and as a result those days are missing from the report.
How can I get a row for every day (with 0 hits), even when there are no hits?
My query:
SELECT user_id, toDate(created) as date, COUNT() as count
FROM hits
WHERE created > NOW() - INTERVAL 30 DAY
GROUP BY toDate(created), user_id

Related

getting day wise query result for a certain time period in postgresql

I have a table in a PostgreSQL database called orders, where all the order-related information is stored. If an order gets rejected, that order row is moved out of the orders table and stored in the rejected_orders table. As a result, the count function does not return the correct number of orders.
So, if I want to get the number of order requests on a certain day, I have to subtract the id of the first order of the day from the id of the last order of the day. Below is my query for the total number of requests on March 1st, 2022. Sadly, the previous employee forgot to save the timezone correctly in the database. Data is saved in the DB in the UTC+00 timezone; fetched data needs to be in the GMT+06 timezone.
select
(select id from orders
where created_at<'2022-03-02 00:00:00+06'
order by created_at desc limit 1
)
-
(select id from orders
where created_at>='2022-03-01 00:00:00+06'
order by created_at limit 1
) as march_1st;
march_1st
-----------
185
Now, if I want to get the total requests per day for a certain time period (let's say for the month of March, 2021), how can I do that in one SQL query without having to write one query per day?
To wrap up:
total_request_per_day = id of last order of the day - id of first order of the day.
How do I write a query based on that logic that gives me total_request_per_day for every day in a certain month?
Like this:
|Date | total requests|
|01-03-2022 | 187 |
|02-03-2022 | 202 |
|03-03-2022 | 227 |
................
................
With respect, using id numbers to determine numbers of rows in a time period is incorrect. DELETEing rows leaves gaps in id number sequences; they are not designed for this purpose.
This is a job for date_trunc(), COUNT(*), and GROUP BY.
The date_trunc('day', created_at) function turns an arbitrary timestamp into midnight on its day. For example, it turns 2022-03-02 16:41:00 into 2022-03-02 00:00:00. Using that, we can write the query this way.
SELECT COUNT(*) order_count,
date_trunc('day', created_at) day
FROM orders
WHERE created_at >= date_trunc('day', NOW()) - INTERVAL '7 day'
AND created_at < date_trunc('day', NOW())
GROUP BY date_trunc('day', created_at)
This query gives the number of orders on each day in the last 7 days.
Every minute you spend learning how to use SQL date arithmetic like this will pay off in hours saved in your work.
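To make the truncate-then-count idea concrete, here is a minimal Python sketch of the same logic. The helper names truncate_to_day and count_per_day and the sample timestamps are made up for illustration; they stand in for date_trunc('day', created_at) and the GROUP BY query above.

```python
from collections import Counter
from datetime import datetime


def truncate_to_day(ts: datetime) -> datetime:
    """Mimic date_trunc('day', ts): keep the date, zero out the time part."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def count_per_day(timestamps):
    """Mimic COUNT(*) ... GROUP BY date_trunc('day', created_at)."""
    return Counter(truncate_to_day(ts) for ts in timestamps)


# Hypothetical sample data standing in for orders.created_at.
sample_orders = [
    datetime(2022, 3, 2, 16, 41, 0),
    datetime(2022, 3, 2, 9, 5, 30),
    datetime(2022, 3, 3, 0, 0, 1),
]
daily_counts = count_per_day(sample_orders)
```

As with the GROUP BY query, a day with no rows simply does not appear as a key in the result.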
Try this :
SELECT d.ref_date :: date AS "date"
, count(orders.id) AS "total requests" -- count(*) would report 1 for days with no orders
FROM generate_series('20220301' :: timestamp, '20220331' :: timestamp, '1 day') AS d(ref_date)
LEFT JOIN orders
ON date_trunc('day', d.ref_date) = date_trunc('day', created_at)
GROUP BY d.ref_date
generate_series() generates the list of reference days where you want to count the number of orders.
Then you join with the orders table by comparing the reference date with the created_at date on year/month/day only. LEFT JOIN allows you to select reference days with no existing order.
Finally you count the number of orders per day by grouping by reference day.
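The calendar-plus-LEFT-JOIN idea can be sketched outside the database as well. The following Python is only an illustration of the logic, not a replacement for the query; day_series, requests_per_day and the sample rows are hypothetical names and data.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def day_series(start: date, end: date):
    """Yield every date from start to end inclusive, like generate_series(..., '1 day')."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def requests_per_day(created_at_values, start: date, end: date):
    """Return (day, count) for every day in the range, with 0 for days without rows."""
    counts = Counter(ts.date() for ts in created_at_values)
    # Looking each reference day up with a default of 0 plays the role of the LEFT JOIN.
    return [(day, counts.get(day, 0)) for day in day_series(start, end)]


# Hypothetical sample data: two orders on March 1st, none on the 2nd, one on the 3rd.
sample_created_at = [
    datetime(2022, 3, 1, 8, 0),
    datetime(2022, 3, 1, 23, 59),
    datetime(2022, 3, 3, 12, 30),
]
report = requests_per_day(sample_created_at, date(2022, 3, 1), date(2022, 3, 3))
```

The point of driving the loop from the list of days rather than from the data is that empty days still produce a row.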

Calculate Rolling retention with SQL (BigQuery)

I have a table of logins with such columns:
id - unique id of user
day - days passed since registration (0-30)
Each record in this table represents one login by a user, so there may be identical rows (because a user can log in multiple times a day). I need to calculate how many users logged in on a given day of their life (30 days) or on any later day (rolling retention). The output table should contain a column with the day (1-30) and a column with the number of users. If a user logged in on the 30th day, we count them as retained on every day before 30. Any ideas? :)
Try this one:
select single_day, count(distinct id)
from mytable, unnest(generate_array(1, day)) as single_day
group by single_day
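To see why expanding each row into the days 1..day gives rolling retention, here is a small Python sketch of the same logic. The function name rolling_retention and the sample rows are made up for illustration.

```python
from collections import defaultdict


def rolling_retention(logins):
    """logins: iterable of (user_id, day) rows; duplicate rows are allowed."""
    users_by_day = defaultdict(set)
    for user_id, day in logins:
        # generate_array(1, day) is inclusive on both ends, hence day + 1 here.
        for single_day in range(1, day + 1):
            users_by_day[single_day].add(user_id)
    # The set per day plays the role of count(distinct id).
    return {d: len(users) for d, users in sorted(users_by_day.items())}


# Hypothetical sample: user "a" logged in twice on day 3, "b" on day 1, "c" only on day 0.
sample_logins = [("a", 3), ("a", 3), ("b", 1), ("c", 0)]
retention = rolling_retention(sample_logins)
# retention == {1: 2, 2: 1, 3: 1}
```

A login on day 0 expands to an empty range, so it does not contribute to any of the days 1-30.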

Time looping an average

I have a table with 17,000 records that is ordered by time spaced in 15 minute intervals. The time values loop back onto themselves every 24 hours, so for example, I could have 100 records that are all at 1 AM, just on different days. I want to create a 'average day' by taking those 100 records at 1 am and finding the average of them for the averaged 1 am.
I don't know how to format the table to make it show up nicely here.
I'm assuming you want to calculate the average value per time interval regardless of the day in a query. You could use this SQL to group your table by Time interval only (assuming that it's separate from the date field), and average whichever fields you want to average. Do not select or group by the date field, just select and group by the time field.
SELECT TimeField
, AVG([Field1ToAverage])
, AVG([Field2ToAverage])
FROM MyTable
GROUP BY TimeField;
If the date and time fields are stored together in the same column, you will have to extract the time only:
SELECT TimeValue([DateTimeField])
, AVG([Field1ToAverage])
, AVG([Field2ToAverage])
FROM MyTable
GROUP BY TimeValue([DateTimeField]);
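The same "group by time of day only" logic, sketched in Python for a single value column. The function name average_by_time_of_day and the sample readings are hypothetical; ts.time() plays the role of TimeValue([DateTimeField]).

```python
from collections import defaultdict
from datetime import datetime, time


def average_by_time_of_day(readings):
    """readings: iterable of (datetime, value); the date part is ignored."""
    totals = defaultdict(lambda: [0.0, 0])  # time of day -> [sum, count]
    for ts, value in readings:
        bucket = totals[ts.time()]
        bucket[0] += value
        bucket[1] += 1
    return {t: total / n for t, (total, n) in totals.items()}


# Hypothetical sample: two readings at 1:00 AM on different days, one at 1:15 AM.
sample_readings = [
    (datetime(2020, 1, 1, 1, 0), 10.0),
    (datetime(2020, 1, 2, 1, 0), 20.0),
    (datetime(2020, 1, 1, 1, 15), 4.0),
]
average_day = average_by_time_of_day(sample_readings)
# average_day[time(1, 0)] == 15.0
```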

Bigquery SQL for sliding window aggregate

Hi I have a table that looks like this
Date Customer Pageviews
2014/03/01 abc 5
2014/03/02 xyz 8
2014/03/03 abc 6
I want to get page view aggregates grouped by week, but with each row showing the aggregate for the past 30 days (a sliding window aggregate with a window size of 30 days, computed every week).
I am using Google BigQuery.
EDIT: Gordon, re your comment about "Customer": what I actually need is slightly more complicated, which is why I included Customer in the table above. I am looking to get the number of customers who had >n pageviews in a 30-day window, every week. Something like this:
Date Customers>10 pageviews in 30day window
2014/02/01 10
2014/02/08 5
2014/02/15 6
2014/02/22 15
However, to keep it simple, I can work my way from there if I could just get a sliding window aggregate of pageviews, ignoring customers altogether. Something like this:
Date count of pageviews in 30day window
2014/02/01 50
2014/02/08 55
2014/02/15 65
2014/02/22 75
How about this:
SELECT changes + changes1 + changes2 + changes3 changes28days, login, USEC_TO_TIMESTAMP(week)
FROM (
SELECT changes,
LAG(changes, 1) OVER (PARTITION BY login ORDER BY week) changes1,
LAG(changes, 2) OVER (PARTITION BY login ORDER BY week) changes2,
LAG(changes, 3) OVER (PARTITION BY login ORDER BY week) changes3,
login,
week
FROM (
SELECT SUM(payload_pull_request_changed_files) changes,
UTC_USEC_TO_WEEK(created_at, 1) week,
actor_attributes_login login,
FROM [publicdata:samples.github_timeline]
WHERE payload_pull_request_changed_files > 0
GROUP BY week, login
))
HAVING changes28days > 0
For each user, the inner query counts how many changes they submitted per week. Then with LAG() we can peek at the previous rows to see how many changes they submitted 1, 2, and 3 weeks earlier. Then we just add those 4 weeks to see how many changes were submitted in the last 28 days.
Now you can wrap everything in a new query to filter users with changes>X, and count them.
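A rough Python sketch of what the LAG() arithmetic does, assuming one input row per login and week. The function name four_week_totals and the sample rows are made up. Rows that do not have three earlier rows are skipped here, on the assumption that a missing LAG() value (NULL) would make the sum NULL and fail the HAVING filter.

```python
from collections import defaultdict


def four_week_totals(weekly_rows):
    """weekly_rows: iterable of (login, week, changes), one row per login and week."""
    by_login = defaultdict(list)
    for login, week, changes in weekly_rows:
        by_login[login].append((week, changes))

    result = []
    for login, rows in by_login.items():
        rows.sort()  # ORDER BY week within each PARTITION BY login
        for i in range(3, len(rows)):
            window = rows[i - 3 : i + 1]  # current row plus LAG 1, 2 and 3
            result.append((login, rows[i][0], sum(c for _, c in window)))
    return result


# Hypothetical sample: week numbers stand in for the week timestamps.
sample_weeks = [
    ("ann", 1, 5), ("ann", 2, 1), ("ann", 3, 2), ("ann", 4, 4), ("ann", 5, 10),
    ("bob", 1, 7),
]
totals = four_week_totals(sample_weeks)
# totals == [("ann", 4, 12), ("ann", 5, 17)]
```

Note that the window is counted in rows, not in calendar weeks: if a login has no row for some week, the three previous rows reach further back than 28 days.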
I have created the following "Times" table:
Table Details: Dim_Periods
Schema
Date TIMESTAMP
Year INTEGER
Month INTEGER
day INTEGER
QUARTER INTEGER
DAYOFWEEK INTEGER
MonthStart TIMESTAMP
MonthEnd TIMESTAMP
WeekStart TIMESTAMP
WeekEnd TIMESTAMP
Back30Days TIMESTAMP -- the date 30 days before "Date"
Back7Days TIMESTAMP -- the date 7 days before "Date"
and I use such query to handle "running sums"
SELECT Date,Count(*) as MovingCNT
FROM
(SELECT Date,
Back7Days
FROM DWH.Dim_Periods
where Date < timestamp(current_date()) AND
Date >= (DATE_ADD (CURRENT_TIMESTAMP(), -5, 'month'))
)P
CROSS JOIN EACH
(SELECT repository_url,repository_created_at
FROM [publicdata:samples.github_timeline]
) L
WHERE timestamp(repository_created_at)>= Back7Days
AND timestamp(repository_created_at)<= Date
GROUP EACH BY Date
Note that it can be used for "Month to date", "Week to date", "30 days back", etc. aggregations as well.
However, performance is not the best and the query can take a while on larger data sets due to the Cartesian join.
Hope this helps
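For illustration, the period-table approach boils down to comparing every period row against every event row, which is also why it gets slow. A minimal Python sketch, with a hypothetical moving_counts function and made-up dates; window_days=7 corresponds to the Back7Days column.

```python
from datetime import date, timedelta


def moving_counts(period_dates, event_dates, window_days=7):
    """For each period date, count events with (date - window_days) <= event <= date."""
    result = {}
    for d in period_dates:
        window_start = d - timedelta(days=window_days)  # the Back7Days value for d
        # Every period is checked against every event, like the CROSS JOIN.
        result[d] = sum(1 for e in event_dates if window_start <= e <= d)
    return result


# Hypothetical sample data.
periods = [date(2014, 3, 8), date(2014, 3, 9)]
events = [date(2014, 3, 1), date(2014, 3, 2), date(2014, 3, 9), date(2014, 3, 20)]
counts = moving_counts(periods, events)
# counts == {date(2014, 3, 8): 2, date(2014, 3, 9): 2}
```

One difference from the query: here a period with no events in its window shows up with 0, whereas in the query the WHERE clause removes all of its joined rows, so that date is absent from the output.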

use of week of year & subsquend in bigquery

I need to show distinct users per week. I have a date-visit column and a user id; it is a big table with 1 billion rows.
I can change the date column from the CSVs into year, month, and day columns, but how do I deduce the week from that in the query?
I can calculate the week from the CSV, but this is a big processing step.
I also need to show how many distinct users visit day after day, and I am looking for a workaround, as there is no date type.
Any ideas?
To get the week of year number:
SELECT STRFTIME_UTC_USEC(TIMESTAMP('2015-5-19'), '%W')
20
If you have your date as a timestamp (i.e microseconds since the epoch) you can use the UTC_USEC_TO_DAY/UTC_USEC_TO_WEEK functions. Alternately, if you have an iso-formatted date string (e.g. "2012/03/13 19:00:06 -0700") you can call PARSE_UTC_USEC to turn the string into a timestamp and then use that to get the week or day.
To see an example, try:
SELECT LEFT((format_utc_usec(day)),10) as day, cnt
FROM (
SELECT day, count(*) as cnt
FROM (
SELECT UTC_USEC_TO_DAY(PARSE_UTC_USEC(created_at)) as day
FROM [publicdata:samples.github_timeline])
GROUP BY day
ORDER BY cnt DESC)
To show week, just change UTC_USEC_TO_DAY(...) to UTC_USEC_TO_WEEK(..., 0) (the 0 at the end is to indicate the week starts on Sunday). See the documentation for the above functions at https://developers.google.com/bigquery/docs/query-reference for more information.
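As a cross-check of the week-number logic outside BigQuery, here is a small Python sketch. It assumes Python's strftime("%W") convention (weeks start on Monday, days before the first Monday fall in week 0); the helper names and sample visits are made up.

```python
from collections import defaultdict
from datetime import datetime


def week_of_year(ts: datetime) -> int:
    """Week number via strftime('%W'): Monday-based, week 0 before the first Monday."""
    return int(ts.strftime("%W"))


def distinct_users_per_week(visits):
    """visits: iterable of (datetime, user_id); returns {(year, week): distinct users}."""
    users = defaultdict(set)
    for ts, user_id in visits:
        users[(ts.year, week_of_year(ts))].add(user_id)
    return {key: len(ids) for key, ids in users.items()}


# Hypothetical sample data.
sample_visits = [
    (datetime(2015, 5, 19, 10, 0), "u1"),
    (datetime(2015, 5, 19, 18, 0), "u1"),
    (datetime(2015, 5, 20, 9, 0), "u2"),
    (datetime(2015, 5, 26, 9, 0), "u1"),
]
weekly_users = distinct_users_per_week(sample_visits)
# week_of_year(datetime(2015, 5, 19)) == 20
```

Keying on (year, week) rather than the week number alone keeps the same week number from different years apart.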