BigQuery SQL: how to count totals per hour over a date range

I want to count the total number of placed orders, per hour of the day, over a date range of days 01-31.
Customer_placed_order_datetime
01 01:10:38
01 01:12:38
02 01:14:30
31 23:42:22
Example outcome would be like (for the whole range of dates 01-31):

Date 01-31 | total orders
1 hour     | 500
2 hour     | 300

and so forth. Thank you

Consider below approach
select
format_datetime('%Y-%m', Customer_placed_order_datetime) year_month,
extract(hour from Customer_placed_order_datetime) hour,
count(*) total_orders
from your_table
group by year_month, hour
if applied to sample data as in your question
with your_table as (
select datetime '2021-12-01 01:10:38' Customer_placed_order_datetime union all
select '2021-12-01 01:12:38' union all
select '2021-12-02 01:14:30' union all
select '2021-12-31 23:42:22'
)
output is
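Working the four sample rows through by hand (all fall in 2021-12, three in hour 1 and one in hour 23), that output should come out as:

year_month | hour | total_orders
2021-12    | 1    | 3
2021-12    | 23   | 1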

Related

Calculate percentage in SQL Oracle

I have the following data and I am trying to find, per ID, the percentage of purchases on weekdays vs weekends in Oracle SQL.
ID | DAY     | total per day
1  | weekday | 78
1  | weekend | 20
2  | weekday | 13
2  | weekend | 37
The output I am expecting is:

ID | DAY     | percentage per day
1  | weekday | 79
1  | weekend | 20
2  | weekday | 26
2  | weekend | 74
The percentage is calculated as (total per day / SUM(total per day)) for each ID. What is the best way to do it?
Appreciate your help!
Use SUM as an analytic function:
SELECT id,
day,
100 * total_per_day / SUM(total_per_day) OVER (PARTITION BY id)
AS percentage
FROM table_name
Which, for the sample data:
CREATE TABLE table_name (ID, DAY, total_per_day) AS
SELECT 1, 'weekday', 78 FROM DUAL UNION ALL
SELECT 1, 'weekend', 20 FROM DUAL UNION ALL
SELECT 2, 'weekday', 13 FROM DUAL UNION ALL
SELECT 2, 'weekend', 37 FROM DUAL;
Outputs:
ID | DAY     | PERCENTAGE
1  | weekday | 79.59183673469387755102040816326530612245
1  | weekend | 20.40816326530612244897959183673469387755
2  | weekday | 26
2  | weekend | 74
db<>fiddle here
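If you want whole-number percentages like the expected output in the question, one option (a small variation, not part of the original answer) is to wrap the expression in ROUND:

SELECT id,
       day,
       ROUND(100 * total_per_day / SUM(total_per_day) OVER (PARTITION BY id)) AS percentage
FROM   table_name;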

SQL Bigquery Counting repeated customers from transaction table

I have a transaction table that looks something like this.
userid | orderDate  | amount
111    | 2021-11-01 | 20
112    | 2021-09-07 | 17
111    | 2021-11-21 | 17
I want to count how many distinct customers (userid) that bought from our store this month also bought from our store in the previous month. For example, in February 2020, we had 20 customers and out of these 20 customers 7 of them also bought from our store in the previous month, January 2020. I want to do this for all the previous months so ending up with something like.
year | month | repeated customers
2020 | 01    | 11
2020 | 02    | 7
2020 | 03    | 9
I have written this, but it only works for the current month. How would I rewrite it to get the table shown above?
WITH CURRENT_PERIOD AS (
SELECT DISTINCT userid
FROM table1
WHERE DATE(orderDate) BETWEEN DATE_TRUNC(CURRENT_DATE(),MONTH) AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
),
PREVIOUS_PERIOD AS (
SELECT DISTINCT userid
FROM table1
WHERE DATE(orderDate) BETWEEN DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH),MONTH) AND LAST_DAY(DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH))
)
SELECT count(1)
FROM CURRENT_PERIOD RC
WHERE RC.userid IN (SELECT DISTINCT userid FROM PREVIOUS_PERIOD)
You can summarize to get one record per month, use lag(), and then aggregate:
select yyyymm,
       countif(prev_yyyymm = date_sub(yyyymm, interval 1 month)) as repeated_customers
from (select userid,
             date_trunc(date(orderDate), month) as yyyymm,
             lag(date_trunc(date(orderDate), month))
               over (partition by userid order by date_trunc(date(orderDate), month)) as prev_yyyymm
      from table1
      group by 1, 2
     ) t
group by yyyymm
order by yyyymm;
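To get separate year and month columns as in the desired output, one possible variation (a sketch, not from the original answer) is to extract them from yyyymm in the select list:

select extract(year from yyyymm) as year,
       extract(month from yyyymm) as month,
       countif(prev_yyyymm = date_sub(yyyymm, interval 1 month)) as repeated_customers
from (select userid,
             date_trunc(date(orderDate), month) as yyyymm,
             lag(date_trunc(date(orderDate), month))
               over (partition by userid order by date_trunc(date(orderDate), month)) as prev_yyyymm
      from table1
      group by 1, 2
     ) t
group by yyyymm
order by yyyymm;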

Add missing monthly dates in time series data in PostgreSQL

I have monthly time series data in a table where the dates are the last day of each month. Some of the dates are missing from the data. I want to insert those dates with a zero value for the other attributes.
Table is as follows:
id report_date price
1 2015-01-31 40
1 2015-02-28 56
1 2015-04-30 34
2 2014-05-31 45
2 2014-08-31 47
I want to convert this table to
id report_date price
1 2015-01-31 40
1 2015-02-28 56
1 2015-03-31 0
1 2015-04-30 34
2 2014-05-31 45
2 2014-06-30 0
2 2014-07-31 0
2 2014-08-31 47
Is there any way we can do this in PostgreSQL?
Currently we are doing this in Python, but as our data grows day by day it is not efficient to handle the extra I/O just for this one task.
Thank you
You can do this using generate_series() to generate the dates and then left join to bring in the values:
with m as (
select id, min(report_date) as minrd, max(report_date) as maxrd
from t
group by id
)
select m.id, m.report_date, coalesce(t.price, 0) as price
from (select m.*, generate_series(minrd, maxrd, interval '1' month) as report_date
from m
) m left join
t
on t.id = m.id and m.report_date = t.report_date;
EDIT:
Turns out that the above doesn't quite work, because adding months to the end of month doesn't keep the last day of the month.
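You can see the drift with a quick illustrative query (not part of the original answer); once the series steps onto 2015-02-28, later values stay on the 28th:

select generate_series(date '2015-01-31', date '2015-04-30', interval '1 month');
-- 2015-01-31, 2015-02-28, 2015-03-28, 2015-04-28 (as timestamps);
-- only the first value is still a month end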
This is easily fixed:
with t as (
select 1 as id, date '2012-01-31' as report_date, 10 as price union all
select 1 as id, date '2012-04-30', 20
), m as (
select id, min(report_date) - interval '1 day' as minrd, max(report_date) - interval '1 day' as maxrd
from t
group by id
)
select m.id, m.report_date, coalesce(t.price, 0) as price
from (select m.*, generate_series(minrd, maxrd, interval '1' month) + interval '1 day' as report_date
from m
) m left join
t
on t.id = m.id and m.report_date = t.report_date;
The first CTE is just to generate sample data.
This is a slight improvement over Gordon's query which fails to get the last date of a month in some cases.
Essentially you generate all the month end dates between the min and max date for each id (using generate_series) and left join on this generated table to show the missing dates with 0 price.
with minmax as (
select id, min(report_date) as mindt, max(report_date) as maxdt
from t
group by id
)
select m.id, m.report_date, coalesce(t.price, 0) as price
from (select *,
generate_series(date_trunc('MONTH',mindt+interval '1' day),
date_trunc('MONTH',maxdt+interval '1' day),
interval '1' month) - interval '1 day' as report_date
from minmax
) m
left join t on t.id = m.id and m.report_date = t.report_date
Sample Demo
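Since the goal is to actually insert the missing rows, either SELECT can feed an INSERT. A hedged sketch along those lines, assuming report_date is a date column and reusing the month-end generation from the answer above, adding only rows that do not exist yet:

insert into t (id, report_date, price)
select m.id, m.report_date::date, 0
from (select id,
             generate_series(date_trunc('MONTH', mindt + interval '1' day),
                             date_trunc('MONTH', maxdt + interval '1' day),
                             interval '1' month) - interval '1 day' as report_date
      from (select id, min(report_date) as mindt, max(report_date) as maxdt
            from t
            group by id) minmax
     ) m
left join t on t.id = m.id and t.report_date = m.report_date
where t.id is null;   -- anti-join: keep only generated month ends missing from t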

PostgreSQL group by and order by

I have a table with a date column. I want to get the count per month and display the months in calendar order. Months should be displayed as 'Jan', 'Feb', etc. If I use the to_char function, the ORDER BY happens on the text. I can use extract(month from dt), but that displays the month as a number. This is part of a report and the month should be displayed in 'Mon' format only.
SELECT to_char(dt,'Mon'), COUNT(*) FROM tb GROUP BY to_char(dt,'Mon') ORDER BY to_char(dt,'Mon');
to_char | count
---------+-------
Dec | 1
Jan | 1
Jul | 2
select month, total
from (
select
extract(month from dt) as month_number,
to_char(dt,'mon') as month,
count(*) as total
from tb
group by 1, 2
) s
order by month_number
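If you prefer to avoid the subquery, one alternative (a sketch, not from the original answer) is to order by an aggregate of the month number, which PostgreSQL allows in a grouped query:

select to_char(dt, 'Mon') as month, count(*) as total
from tb
group by to_char(dt, 'Mon')
order by min(extract(month from dt));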

Calculating the difference between a daily sum and the average for the same day of the week over a defined time range (Oracle 10g SQL)

Hi, I'm working with data that depends mostly on the day of the week. The data is formatted in a table as
Date - position - count/number.
There are multiple different positions.
I was able to sum my data for each day of the week using:
select MOD(to_char(time, 'J'), 7),
       sum(count)
from TABLE
where time > sysdate - x
group by to_char(time, 'J')
order by to_char(time, 'J');
This outputs daily sums according to the day of the week.
Now I'm able to get an average for a single day of the week over a year.
This code outputs the average for Sundays only:
SELECT AVG(asset_sums)
FROM (
  select MOD(to_char(time, 'J'), 7),
         sum(count) as asset_sums
  from table
  where time > sysdate - 365
  and MOD(TO_CHAR(time, 'J'), 7) + 1 IN (7)
  group by to_char(time, 'J')
  order by to_char(time, 'J')
);
My goal is to get a table comparing each daily sum with the yearly average for that particular day of the week.
For example, the yearly average for Mondays is 57 and for Tuesdays 60.
This week my Monday is 59 and Tuesday is 57, so the output of the table is
Monday +2, Tuesday -3.
What is the easiest / most efficient way?
Thanks for your help.
Edit: format of my data:
Date: yyyy-mm-dd | Place: xxxx | Number (of customers): 0 to 10000
2013-09-16 | AAAA | 1534
2013-09-16 | AAAB | 534
2013-09-17 | AAAA | 1434
2013-09-17 | AAAC | 834
2013-09-18 | AAAA | 134
2013-09-18 | AAAD | 183
Needed output:
Date       | Day of the week | Sum  | Average Monday this year | Difference Sum-AVG
2013-09-16 | 1 (= Monday)    | 2068 | 2015                     | 53
For clarity I will use subquery factoring. First, select the current week's data. Next, subquery the sum for each day over the current week. Then, subquery the sum for each day over the past year. Then, average those daily sums for each day of the week. Finally, join the two and display the difference.
with
this_week as (
  select time, count
  from table
  where time > sysdate - 7
),
this_week_dly_sum as (
  select to_char(time, 'd') day,
         sum(count) sum
  from this_week
  group by to_char(time, 'd')
),
this_year_dly_sum as (
  select trunc(time) day,
         sum(count) sum
  from table
  where time > sysdate - 365
  group by trunc(time)
),
this_year_dly_avg as (
  select to_char(day, 'd') day,
         avg(sum) avg
  from this_year_dly_sum
  group by to_char(day, 'd')
)
select distinct
  trunc(this_week.time) dt,
  to_char(this_week.time, 'day') day_of_week,
  this_week_dly_sum.sum,
  this_year_dly_avg.avg,
  this_week_dly_sum.sum - this_year_dly_avg.avg difference
from this_week
inner join this_week_dly_sum
  on to_char(this_week.time, 'd') = this_week_dly_sum.day
inner join this_year_dly_avg
  on to_char(this_week.time, 'd') = this_year_dly_avg.day;
You can use analytic functions for this:
select date1, to_char(date1, 'd'),
       sum(val) over (partition by to_char(date1, 'd')),
       avg(val) over (partition by to_char(date1, 'd')),
       sum(val) over (partition by to_char(date1, 'd')) -
         avg(val) over (partition by to_char(date1, 'd'))
from table1
where date1 > add_months(sysdate, -12);
This will give you daily counts for the last year:
SELECT TRUNC(time, 'DD') AS dt,
       SUM(count) AS asset_sum
FROM yourtable
WHERE time > SYSDATE - 365
GROUP BY TRUNC(time, 'DD')
You can modify it to additionally return averages per day of the week for the specified range:
SELECT TRUNC(time, 'DD') AS dt,
       SUM(count) AS asset_sum,
       AVG(SUM(count)) OVER
         (PARTITION BY TO_CHAR(TRUNC(time, 'DD'), 'D')) AS asset_sum_avg
FROM yourtable
WHERE time > SYSDATE - 365
GROUP BY TRUNC(time, 'DD')
At this point you have all the initial data you need, but probably for more days than necessary. You can use the above query as a derived table to limit the rows to just those where dt > SYSDATE - x:
WITH last_year_by_day AS
(
  SELECT TRUNC(time, 'DD') AS dt,
         SUM(count) AS asset_sum,
         AVG(SUM(count)) OVER
           (PARTITION BY TO_CHAR(TRUNC(time, 'DD'), 'D')) AS asset_sum_avg
  FROM yourtable
  WHERE time > SYSDATE - 365
  GROUP BY TRUNC(time, 'DD')
)
SELECT dt,
       TO_CHAR(dt, 'D') AS day_of_week,
       asset_sum,
       asset_sum_avg,
       asset_sum - asset_sum_avg AS asset_sum_diff
FROM last_year_by_day
WHERE dt > SYSDATE - x
;
As some expressions are being repeated multiple times, it can be a good idea to re-factor the query to avoid the repetition. Here's one way:
WITH last_year AS
(
  SELECT TRUNC(time, 'DD') AS dt,
         TO_CHAR(time, 'D') AS day_of_week,
         count
  FROM yourtable
  WHERE time > SYSDATE - 365
),
last_year_by_day AS
(
  SELECT dt,
         day_of_week,
         SUM(count) AS asset_sum,
         AVG(SUM(count)) OVER (PARTITION BY day_of_week) AS asset_sum_avg
  FROM last_year
  GROUP BY dt, day_of_week
)
SELECT dt,
       day_of_week,
       asset_sum,
       asset_sum_avg,
       asset_sum - asset_sum_avg AS asset_sum_diff
FROM last_year_by_day
WHERE dt > SYSDATE - x
;
One last note is about TO_CHAR(date, 'D'), which is used to obtain the day_of_week values. Since you are using a different method for the same result, you may not be aware that the result of TO_CHAR(date, 'D') depends on the NLS_TERRITORY setting. You may want to use an ALTER SESSION statement to set NLS_TERRITORY to a value that makes TO_CHAR(date, 'D') return 1 for Monday, 2 for Tuesday, and so on (the Oracle globalization documentation lists the supported territories).
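As an illustrative sketch (not from the original answer): GERMANY is one territory whose week starts on Monday, and TRUNC(d, 'IW') offers an NLS-independent alternative, since it always returns the Monday of d's ISO week:

ALTER SESSION SET NLS_TERRITORY = 'GERMANY';

-- NLS-independent alternative: 1 = Monday ... 7 = Sunday regardless of territory
SELECT 1 + TRUNC(sysdate) - TRUNC(sysdate, 'IW') AS day_of_week FROM dual;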