getting day wise query result for a certain time period in postgresql - sql

I have a table called orders in a PostgreSQL database, where all order-related information is stored. If an order gets rejected, its row is moved out of the orders table and stored in the rejected_orders table. As a result, the count function does not give the correct number of orders.
To get the number of order requests on a given day, I subtract the id of the first order of the day from the id of the last order of the day. Below is my query for the total number of requests on March 1st, 2022. Sadly, a previous employee forgot to save the time zone correctly in the database: data is saved in the DB in UTC+00, but fetched data needs to be in GMT+06.
select
(select id from orders
where created_at<'2022-03-02 00:00:00+06'
order by created_at desc limit 1
)
-
(select id from orders
where created_at>='2022-03-01 00:00:00+06'
order by created_at limit 1
) as march_1st;
march_1st
-----------
185
Now, if I want the total requests per day for a certain time period (let's say for the month of March, 2021), how can I do that in one SQL query without having to write one query per day?
To wrap up:
total_request_per_day = id of last order of the day - id of first order of the day.
How do I write a query based on that logic that gives me total_request_per_day for every day in a certain month?
Like this:
|Date | total requests|
|01-03-2022 | 187 |
|02-03-2022 | 202 |
|03-03-2022 | 227 |
................
................

With respect, using id numbers to determine numbers of rows in a time period is incorrect. DELETEing rows leaves gaps in id number sequences; they are not designed for this purpose.
This is a job for date_trunc(), COUNT(*), and GROUP BY.
The date_trunc('day', created_at) function turns an arbitrary timestamp into midnight on its day. For example, it turns 2022-03-02 16:41:00 into 2022-03-02 00:00:00. Using that, we can write the query this way.
SELECT COUNT(*) order_count,
date_trunc('day', created_at) day
FROM orders
WHERE created_at >= date_trunc('day', NOW()) - INTERVAL '7 day'
AND created_at < date_trunc('day', NOW())
GROUP BY date_trunc('day', created_at)
This query gives the number of orders on each day in the last 7 days.
Every minute you spend learning how to use SQL date arithmetic like this will pay off in hours saved in your work.
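The query above handles neither the GMT+06 requirement nor the rows that were moved to rejected_orders. Here is a minimal sketch of one way to cover both. It assumes that created_at is a timestamptz column, that rejected_orders keeps the same created_at column (the question does not show its schema), and that 'Asia/Dhaka' is an acceptable name for the UTC+06 zone:

```sql
-- Assumptions (not confirmed by the question): created_at is timestamptz in
-- both tables, and 'Asia/Dhaka' (UTC+06, no DST) is the reporting time zone.
WITH all_requests AS (
    SELECT created_at FROM orders
    UNION ALL
    SELECT created_at FROM rejected_orders   -- rejected rows still count as requests
)
SELECT (created_at AT TIME ZONE 'Asia/Dhaka')::date AS local_day,
       COUNT(*) AS total_requests
FROM all_requests
WHERE created_at >= TIMESTAMPTZ '2022-03-01 00:00:00+06'
  AND created_at <  TIMESTAMPTZ '2022-04-01 00:00:00+06'
GROUP BY local_day
ORDER BY local_day;
```

A named zone is used on purpose: in PostgreSQL, AT TIME ZONE '+06' follows the POSIX sign convention and means six hours west of UTC. If created_at is a plain timestamp holding UTC values, created_at AT TIME ZONE 'UTC' AT TIME ZONE 'Asia/Dhaka' would be needed instead. Days with no requests at all do not appear in this result.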

Try this:
SELECT d.ref_date :: date AS "date"
, count(o.created_at) AS "total requests" -- counts matched orders only, so empty days show 0
FROM generate_series('20220301' :: timestamp, '20220331' :: timestamp, '1 day') AS d(ref_date)
LEFT JOIN orders AS o
ON date_trunc('day', d.ref_date) = date_trunc('day', o.created_at)
GROUP BY d.ref_date
ORDER BY d.ref_date
generate_series() generates the list of reference days for which you want to count the number of orders.
Then you join with the orders table by comparing the reference date with the created_at date on year/month/day only. LEFT JOIN allows you to select reference days with no existing order.
Finally, you count the number of orders per day by grouping by reference day. Counting o.created_at rather than * makes a day without orders report 0 instead of 1, because the LEFT JOIN still produces one row (with NULLs) for that day.

Related

Create a Series of Dates between two Dates in a table - SQL

I have a table like this:
I want to list one row per day between each Start Date and End Date, with Total Payment divided by the number of days (I assume I would need a window function with PARTITION BY name here). But my main concern is how to create the series of dates for each name based on its Start Date and End Date.
Using the table above I would like the output to look like this:
Consider a range join with count window function to spread out total by days:
SELECT t."Name",
t."Total Payment" / COUNT(dates) OVER(PARTITION BY t."Name") AS Payment,
t."Start Date",
t."End Date",
dates AS "Date of"
FROM generate_series(
timestamp without time zone '2022-01-01',
timestamp without time zone '2022-12-31',
'1 day'
) AS dates
INNER JOIN my_table t
ON dates BETWEEN t."Start Date" AND t."End Date"
You can get what you're after in a single query by using generate_series to produce each day and by simply subtracting the two dates. (Since you seem to want both dates included in the day count, an additional 1 needs to be added.)
select name, (total_payment/( (end_date-start_date) +1))::numeric(6,2), start_date, end_date, d::date date_of
from test t
cross join generate_series(t.start_date
,t.end_date
,interval ' 1 day'
) gs(d)
order by name desc, date_of;
See demo. I leave it to you to decide what to do when the total_payment is not a multiple of the number of days. The demo just ignores it.
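One possible way to handle a total that does not divide evenly is to round the daily amount and put the rounding remainder on the last day. This is only a sketch under assumptions: it reuses the test table from the query above and assumes total_payment is numeric and start_date and end_date are of type date.

```sql
-- Assumes test(name, total_payment numeric, start_date date, end_date date).
SELECT t.name,
       CASE WHEN gs.d::date = t.end_date
            -- last day absorbs whatever rounding left over
            THEN t.total_payment - ROUND(t.total_payment / n.days, 2) * (n.days - 1)
            ELSE ROUND(t.total_payment / n.days, 2)
       END AS payment,
       t.start_date,
       t.end_date,
       gs.d::date AS date_of
FROM test t
CROSS JOIN LATERAL (SELECT (t.end_date - t.start_date) + 1 AS days) n
CROSS JOIN generate_series(t.start_date, t.end_date, INTERVAL '1 day') AS gs(d)
ORDER BY t.name, date_of;
```

For a total of 100 spread over three days this yields 33.33, 33.33 and 33.34, so the daily amounts still add up to the original total.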

GROUP BY date and empty data

I have a table hits with columns created and user_id.
I want hit counts for the last 30 days, grouped by day. The problem is that on some days a user has no traffic, and as a result those days are missing from the report.
How do I get data for every day (with 0 hits), even when there are no hits?
My query:
SELECT user_id, toDate(created) as date, COUNT() as count
FROM hits
WHERE created > NOW() - INTERVAL 30 DAY
GROUP BY toDate(created), user_id

Bigquery SQL for sliding window aggregate

Hi I have a table that looks like this
Date Customer Pageviews
2014/03/01 abc 5
2014/03/02 xyz 8
2014/03/03 abc 6
I want to get page view aggregates grouped by week but showing aggregates for past 30 days - (sliding window aggregates with window-size of 30 days for every week)
I am using google bigquery
EDIT: Gordon - re your comment about "Customer": what I actually need is slightly more complicated, which is why I included customer in the table above. I am looking to get the number of customers who had >n pageviews in a 30-day window, every week. Something like this:
Date Customers>10 pageviews in 30day window
2014/02/01 10
2014/02/08 5
2014/02/15 6
2014/02/22 15
However to keep it simple, I will work my way if I could just get a sliding window aggregate of pageviews ignoring customers altogether. something like this
Date count of pageviews in 30day window
2014/02/01 50
2014/02/08 55
2014/02/15 65
2014/02/22 75
How about this:
SELECT changes + changes1 + changes2 + changes3 changes28days, login, USEC_TO_TIMESTAMP(week)
FROM (
SELECT changes,
LAG(changes, 1) OVER (PARTITION BY login ORDER BY week) changes1,
LAG(changes, 2) OVER (PARTITION BY login ORDER BY week) changes2,
LAG(changes, 3) OVER (PARTITION BY login ORDER BY week) changes3,
login,
week
FROM (
SELECT SUM(payload_pull_request_changed_files) changes,
UTC_USEC_TO_WEEK(created_at, 1) week,
actor_attributes_login login,
FROM [publicdata:samples.github_timeline]
WHERE payload_pull_request_changed_files > 0
GROUP BY week, login
))
HAVING changes28days > 0
For each user it counts how many changes they have submitted per week. Then with LAG() we can peek into the previous rows to see how many changes they submitted 1, 2, and 3 weeks earlier. Then we just add those 4 weeks to see how many changes were submitted in the last 28 days.
Now you can wrap everything in a new query to filter users with changes>X, and count them.
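As a point of comparison, the same four-week total can be expressed with a window frame instead of three LAG() calls. This sketch is written in standard SQL window syntax rather than the legacy BigQuery dialect used above, and it assumes a hypothetical pre-aggregated table weekly_changes(login, week, changes) with one row per user per week:

```sql
-- Hypothetical table: weekly_changes(login, week, changes),
-- i.e. the result of the innermost GROUP BY week, login query.
SELECT login,
       week,
       SUM(changes) OVER (
           PARTITION BY login
           ORDER BY week
           ROWS BETWEEN 3 PRECEDING AND CURRENT ROW   -- this week plus the 3 rows before it
       ) AS changes_28_days
FROM weekly_changes;
```

Like the LAG() version, this treats consecutive rows as consecutive weeks, so a user with a missing week gets a window wider than 28 days. Unlike adding LAG() results, it does not turn NULL when a user has fewer than three earlier rows.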
I have created the following "Times" table:
Table Details: Dim_Periods
Schema
Date TIMESTAMP
Year INTEGER
Month INTEGER
day INTEGER
QUARTER INTEGER
DAYOFWEEK INTEGER
MonthStart TIMESTAMP
MonthEnd TIMESTAMP
WeekStart TIMESTAMP
WeekEnd TIMESTAMP
Back30Days TIMESTAMP -- the date 30 days before "Date"
Back7Days TIMESTAMP -- the date 7 days before "Date"
and I use such query to handle "running sums"
SELECT Date,Count(*) as MovingCNT
FROM
(SELECT Date,
Back7Days
FROM DWH.Dim_Periods
where Date < timestamp(current_date()) AND
Date >= (DATE_ADD (CURRENT_TIMESTAMP(), -5, 'month'))
)P
CROSS JOIN EACH
(SELECT repository_url,repository_created_at
FROM [publicdata:samples.github_timeline]
) L
WHERE timestamp(repository_created_at)>= Back7Days
AND timestamp(repository_created_at)<= Date
GROUP EACH BY Date
Note that it can be used for "Month to Date", "Week to Date", "30 days back", etc. aggregations as well.
However, performance is not the best and the query can take a while on larger data sets due to the Cartesian join.
Hope this helps

use of week of year & subsquend in bigquery

I need to show distinct users per week. I have a date-visit column and a user id; it is a big table with 1 billion rows.
I can change the date column from the CSVs into year, month, and day columns, but how do I deduce the week from that in the query?
I could calculate the week when producing the CSVs, but that is a big processing step.
I also need to show how many distinct users visit day after day, so I am looking for a workaround as there is no date type.
Any ideas?
To get the week of year number:
SELECT STRFTIME_UTC_USEC(TIMESTAMP('2015-5-19'), '%W')
20
If you have your date as a timestamp (i.e microseconds since the epoch) you can use the UTC_USEC_TO_DAY/UTC_USEC_TO_WEEK functions. Alternately, if you have an iso-formatted date string (e.g. "2012/03/13 19:00:06 -0700") you can call PARSE_UTC_USEC to turn the string into a timestamp and then use that to get the week or day.
To see an example, try:
SELECT LEFT((format_utc_usec(day)),10) as day, cnt
FROM (
SELECT day, count(*) as cnt
FROM (
SELECT UTC_USEC_TO_DAY(PARSE_UTC_USEC(created_at)) as day
FROM [publicdata:samples.github_timeline])
GROUP BY day
ORDER BY cnt DESC)
To show week, just change UTC_USEC_TO_DAY(...) to UTC_USEC_TO_WEEK(..., 0) (the 0 at the end is to indicate the week starts on Sunday). See the documentation for the above functions at https://developers.google.com/bigquery/docs/query-reference for more information.

PostgreSQL - Getting statistical data

I need to collect some statistical information in my application.
I have a table of users (tb_user)
Every time a new user accesses the application, a new record is added to this table, i.e. one row per user. The main fields are id and date_time (the timestamp of the first time the user accessed the application).
tb_user
id (bigint) | date_time (timestamp with time zone)
1 | 2012-01-29 11:29:50.359-03
2 | 2012-01-31 14:27:10.359-03
I need to get:
the average number of users by day, week and month
Example:
by day: 55.45
by week : XX.XX
month: XX.XX
EDIT:
My best solution was:
WITH daily_count AS (SELECT COUNT(id) AS user_count FROM tb_user)
SELECT user_count, tbaux2.days, (user_count/tbaux2.days) FROM daily_count,
(SELECT EXTRACT(DAY FROM (t2.diff) ) + 1 AS days
FROM
(with tbaux AS(SELECT min(date_time) AS min FROM tb_user)
SELECT (now() - min) AS diff
FROM tbaux) AS t2) AS tbaux2
GROUP BY user_count, tbaux2.days
But this solution only worked with EXTRACT(DAY ...). It did not work with weeks and months.
Any help is welcome.
Alternatively:
SELECT user_count, tbaux2.days, (user_count/tbaux2.days) AS userPerDay, ((user_count/tbaux2.days) * 7) AS userPerWeek, ((user_count/tbaux2.days) * 30) AS userPerMonth
EDIT 2:
Based on the responses from @Bruno, there are some considerations:
When I asked the question, what I really requested was a way to select data by day, month and year. I believe the query that I posted and @Bruno refined should be interpreted as an average "per day, per 7 days and per 30 days" rather than by calendar days, weeks and months. Interpreted this way, there will be no problems of the kind quoted in the example (the 10% drop). This "every N days" approach is the answer I need at the moment, so I will accept this answer.
As improvements to the post, I suggest:
Consider only closed days in the result (do not collect users of the current day, and do not count the current day in the division).
The result should have two decimal digits.
A new query that considers data really per week and per month.
Thanks.
You should look into aggregate functions (min, max, count, avg), which go hand in hand with GROUP BY. For date-based aggregations, date_trunc is also useful.
For example, this will return the number of rows per day:
SELECT date_trunc('day', date_time) AS day_start,
COUNT(id) AS user_count FROM tb_user
GROUP BY date_trunc('day', date_time);
You can then do the daily average using something like this (with a CTE):
WITH daily_count AS (SELECT date_trunc('day', date_time) AS day_start,
COUNT(id) AS user_count FROM tb_user
GROUP BY date_trunc('day', date_time))
SELECT AVG(user_count) FROM daily_count;
Use 'week' instead of day for the weekly counts, and so on (see date_trunc documentation).
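As a sketch of that substitution, the weekly version of the CTE above would look like this (same tb_user table and columns as in the question):

```sql
-- Average number of new users per week, over the weeks that have at least one row.
WITH weekly_count AS (
    SELECT date_trunc('week', date_time) AS week_start,  -- weeks start on Monday
           COUNT(id) AS user_count
    FROM tb_user
    GROUP BY date_trunc('week', date_time)
)
SELECT AVG(user_count) AS avg_users_per_week
FROM weekly_count;
```

As with the daily query, weeks without any rows produce no group, so they are not included in the average.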
EDIT: (Following comment: average up to and including 5/1/2012, i.e. before the 6th.)
WITH daily_count AS (SELECT date_trunc('day', date_time) AS day_start,
COUNT(id) AS user_count
FROM tb_user
WHERE date_time >= DATE('2012-01-01') AND date_time < DATE('2012-01-06')
GROUP BY date_trunc('day', date_time))
SELECT SUM(user_count)/(DATE('2012-01-06') - DATE('2012-01-01')) FROM daily_count;
What's above is over-complicated, in this case. This should give you the same result:
SELECT COUNT(id)::numeric/(DATE('2012-01-06') - DATE('2012-01-01')) -- cast to numeric so the division is not truncated
FROM tb_user
WHERE date_time >= DATE('2012-01-01') AND date_time < DATE('2012-01-06');
EDIT 2: After your edit, I guess what you're after is just a single global average for the entire period of existence of your database, rather than groups by month/week/day.
This should give you the average number of rows per day:
WITH total_min_max AS (SELECT
COUNT(id) AS total_visits,
MIN(date_time) AS first_date_time,
MAX(date_time) AS last_date_time
FROM tb_user)
SELECT total_visits::numeric/((last_date_time::date-first_date_time::date)+1) AS users_per_day -- numeric cast avoids integer division
FROM total_min_max
(I would replace last_date_time with NOW() to make the average over the time until now, rather than until the last visit, if there's no recent visit.)
Then, for daily, weekly, and "monthly":
WITH daily_avg AS (
WITH total_min_max AS (SELECT
COUNT(id) AS total_visits,
MIN(date_time) AS first_date_time,
MAX(date_time) AS last_date_time
FROM tb_user)
SELECT total_visits::numeric/((last_date_time::date-first_date_time::date)+1) AS users_per_day -- numeric cast avoids integer division
FROM total_min_max)
SELECT
users_per_day,
(users_per_day * 7) AS users_per_week,
(users_per_day * 30) AS users_per_month
FROM daily_avg
This being said, conclusions you draw from such statistics might not be great, especially if you want to see how it changes.
I would also normalise the data per day rather than assuming 30 days in a month (if not per hour, because not all days have 24 hours). Say you have 10 visits per day in Jan 2011 and 10 visits per day in Feb 2011. That gives you 310 visits in Jan and 280 visits in Feb. If you don't pay attention, you could think you've had almost a 10% drop in the number of visitors, so something went wrong in Feb, when really this isn't the case.
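The per-day normalisation described above can be sketched by dividing each month's count by the actual number of days in that month. This is a minimal illustration on the tb_user table from the question, not a full reporting query:

```sql
-- Users per day for each calendar month, using the real month length.
WITH monthly AS (
    SELECT date_trunc('month', date_time) AS month_start,
           COUNT(id) AS user_count
    FROM tb_user
    GROUP BY date_trunc('month', date_time)
)
SELECT month_start,
       user_count,
       ROUND(user_count::numeric
             -- last day of the month = first day + 1 month - 1 day
             / EXTRACT(DAY FROM month_start + INTERVAL '1 month' - INTERVAL '1 day')::numeric,
             2) AS users_per_day
FROM monthly
ORDER BY month_start;
```

With the numbers from the example, January (310 visits over 31 days) and February (280 visits over 28 days) both come out at 10.00 users per day, so the apparent drop disappears.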