Sales amounts of the top n selling vendors by month in bigquery - google-bigquery

I have a table in BigQuery like this (260,000 rows):
vendor date item_price
x 2021-07-08 23:41:10 451,5
y 2021-06-14 10:22:10 41,7
z 2020-01-03 13:41:12 74
s 2020-04-12 01:14:58 88
....
What I want is to group this data by month and, for each month, sum the sales of only the top 20 vendors in that month. Expected output:
month sum_of_only_top20_vendor's_sales
2020-01 7857
2020-02 9685
2020-03 3574
2020-04 7421
.....

Consider below approach
select month, sum(sale) as sum_of_only_top20_vendor_sales
from (
select vendor,
format_datetime('%Y-%m', date) month,
sum(item_price) as sale
from your_table
group by vendor, month
qualify row_number() over(partition by month order by sale desc) <= 20
)
group by month
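The logic of the query above (aggregate per vendor and month, rank vendors within each month, keep the top N, then sum) can be checked locally. The following is only a sketch under stated assumptions: it uses SQLite through Python's sqlite3 module rather than BigQuery, replaces QUALIFY (which SQLite lacks) with a filtered subquery, uses made-up sample rows, and takes the top 2 instead of the top 20 so the effect is visible on a tiny table. It assumes a SQLite build with window functions (3.25 or newer).

```python
import sqlite3

# Hypothetical sample data: (vendor, date, item_price). Not from the question.
rows = [
    ("x", "2020-01-03 13:41:12", 100.0),
    ("x", "2020-01-15 09:00:00", 50.0),
    ("y", "2020-01-20 10:22:10", 120.0),
    ("z", "2020-01-21 11:00:00", 30.0),
    ("x", "2020-02-01 08:00:00", 10.0),
    ("y", "2020-02-02 08:00:00", 20.0),
    ("z", "2020-02-03 08:00:00", 40.0),
]

TOP_N = 2  # the question uses 20; 2 keeps the example small

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (vendor TEXT, date TEXT, item_price REAL)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)

query = """
SELECT month, SUM(sale) AS sum_of_top_vendor_sales
FROM (
    -- rank vendors inside each month by their monthly total
    SELECT vendor, month, sale,
           ROW_NUMBER() OVER (PARTITION BY month ORDER BY sale DESC) AS rn
    FROM (
        -- one row per vendor per month
        SELECT vendor,
               strftime('%Y-%m', date) AS month,
               SUM(item_price) AS sale
        FROM your_table
        GROUP BY vendor, strftime('%Y-%m', date)
    )
)
WHERE rn <= ?
GROUP BY month
ORDER BY month
"""
result = conn.execute(query, (TOP_N,)).fetchall()
print(result)
```

In January the vendor totals are 150, 120 and 30, so only the first two are summed; the lowest vendor is excluded in each month.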

Another solution that can potentially perform much better on very large data (note that approx_top_sum is an approximate aggregate, so the result may not be exact):
select month,
(select sum(sum) from t.top_20_vendors) as sum_of_only_top20_vendor_sales
from (
select
format_datetime('%Y-%m', date) month,
approx_top_sum(vendor, item_price, 20) top_20_vendors
from your_table
group by month
) t
Or, with a little refactoring:
select month, sum(sum) as sum_of_only_top20_vendor_sales
from (
select
format_datetime('%Y-%m', date) month,
approx_top_sum(vendor, item_price, 20) top_20_vendors
from your_table
group by month
) t, t.top_20_vendors
group by month

Related

How to use SUM() OVER (partition by)?

Imagine that from the 1st to the 3rd of November you sold a certain amount of goods (there are two types, A and B), and now you need to determine how much was sold in total on each day.
How can I compute the last two columns (total quantity and total amount per date) so that my table looks like this?
Date Type Quantity Amount Sum_Quantity Sum_Amount
01-11 A 2 100 5 300
01-11 B 3 200 5 300
02-11 A 1 700 3 950
02-11 B 2 250 3 950
03-11 A 2 600 7 800
03-11 B 5 200 7 800
And how can I query, if I want to take the results partitioned by month?
SELECT date,
type,
quantity,
amount,
-- Partition by date
SUM(quantity) OVER (PARTITION BY date) AS sum_quantity_date_part,
SUM(amount) OVER (PARTITION BY date) AS sum_amount_date_part,
-- Partition by month
SUM(quantity) OVER (
PARTITION BY EXTRACT(YEAR FROM date),
EXTRACT(MONTH FROM date)
) AS sum_quantity_month_part,
SUM(amount) OVER (
PARTITION BY EXTRACT(YEAR FROM date),
EXTRACT(MONTH FROM date)
) AS sum_amount_month_part
FROM sales
ORDER BY date, type
;
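To see what the window sums return on the sample rows from the question, here is a small sketch that runs the same idea in SQLite via Python's sqlite3 module. The year 2021 is an assumption (the question only gives day and month), and strftime('%Y-%m', ...) stands in for the EXTRACT(YEAR ...) / EXTRACT(MONTH ...) pair because SQLite has no EXTRACT. It assumes a SQLite build with window functions (3.25 or newer).

```python
import sqlite3

# Rows from the question; the year 2021 is assumed for illustration.
rows = [
    ("2021-11-01", "A", 2, 100),
    ("2021-11-01", "B", 3, 200),
    ("2021-11-02", "A", 1, 700),
    ("2021-11-02", "B", 2, 250),
    ("2021-11-03", "A", 2, 600),
    ("2021-11-03", "B", 5, 200),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (date TEXT, type TEXT, quantity INTEGER, amount INTEGER)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

query = """
SELECT date, type, quantity, amount,
       -- totals per day, repeated on every row of that day
       SUM(quantity) OVER (PARTITION BY date) AS sum_quantity,
       SUM(amount)   OVER (PARTITION BY date) AS sum_amount,
       -- totals per calendar month
       SUM(quantity) OVER (PARTITION BY strftime('%Y-%m', date)) AS sum_quantity_month,
       SUM(amount)   OVER (PARTITION BY strftime('%Y-%m', date)) AS sum_amount_month
FROM sales
ORDER BY date, type
"""
per_row = conn.execute(query).fetchall()
for row in per_row:
    print(row)
```

Unlike GROUP BY, the window version keeps all six input rows and attaches the day and month totals to each of them, which is the shape the question asks for.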

How to calculate average number of actions in selected month per client in Teradata SQL?

I have table with transactions in Teradata SQL like below:
ID | trans_date
-------------------
123 | 2021-09-15
456 | 2021-10-20
777 | 2021-11-02
890 | 2021-02-14
... | ...
And I need to calculate average number of transactions made by clients in month: 09, 10 and 11, so as a result I need something like below:
Month | Avg_num_trx
--------------------------------------------------------
09 | *average number of transactions per client in month 09*
10 | *average number of transactions per client in month 10*
11 | *average number of transactions per client in month 11*
How can I do that in Teradata SQL?
I'm not that familiar with Teradata, but you could start by extracting the month from trans_date, then grouping by ID and month and adding COUNT(ID). From there you could group by month and take AVG of the counts. Something like this:
WITH extraction AS (
SELECT
ID,
EXTRACT (MONTH FROM trans_date) AS MM
FROM your_table)
,
id_counter AS (
SELECT
ID,
MM,
COUNT(ID) as id_count
FROM extraction
GROUP BY ID, MM)
SELECT
MM,
AVG(id_count) AS Avg_num_trx
FROM id_counter
GROUP BY MM
ORDER BY MM;
The first CTE grabs month from trans_date.
The second CTE groups ID and month with count(ID) - should give you the total actions in that month for that client ID as id_count.
The final table gets the average of id_count grouped by month, which should be the average interactions per client for the period.
If EXTRACT doesn't work for some reason, you could also try STRTOK(trans_date, '-', 2).
Other potential replacements for the month extraction:
--current
EXTRACT (MONTH FROM trans_date) AS MM
--option 1
STRTOK(trans_date, '-', 2) AS MM
--option 2
LEFT(RIGHT(trans_date, 5),2) AS MM
The above reworked as subqueries, which should help with debugging:
SELECT
MM,
AVG(id_count) AS Avg_num_trx
FROM (SELECT
ID,
MM,
COUNT(ID) as id_count
FROM (SELECT
ID,
EXTRACT (MONTH FROM trans_date) AS MM
FROM your_table) AS a
GROUP BY ID, MM) AS b
GROUP BY MM
ORDER BY MM;
This will return the expected answer:
SELECT
Extract (MONTH From trans_date) AS MM,
Cast(Count(*) AS FLOAT) / Count(DISTINCT id)
FROM my_table
GROUP BY MM
Compare to @procopypaster's answer to see which one is more efficient for your data.
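Both approaches should give the same number, since the average of the per-client counts in a month equals the total transactions in that month divided by the distinct clients active in it. A minimal sketch to check this, assuming SQLite through Python's sqlite3 module in place of Teradata, with made-up rows and a hypothetical table name my_table:

```python
import sqlite3

# Hypothetical transactions: (id, trans_date). Not from the question.
rows = [
    (123, "2021-09-15"), (123, "2021-09-20"), (123, "2021-09-25"),
    (456, "2021-09-01"),
    (456, "2021-10-20"), (456, "2021-10-21"),
    (777, "2021-10-05"),
    (777, "2021-11-02"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, trans_date TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)

# Approach 1: count per client and month, then average the counts per month.
avg_of_counts = conn.execute("""
    SELECT mm, AVG(id_count) AS avg_num_trx
    FROM (
        SELECT id,
               CAST(strftime('%m', trans_date) AS INTEGER) AS mm,
               COUNT(*) AS id_count
        FROM my_table
        GROUP BY id, CAST(strftime('%m', trans_date) AS INTEGER)
    )
    GROUP BY mm
    ORDER BY mm
""").fetchall()

# Approach 2: total transactions divided by distinct clients, per month.
ratio = conn.execute("""
    SELECT CAST(strftime('%m', trans_date) AS INTEGER) AS mm,
           CAST(COUNT(*) AS REAL) / COUNT(DISTINCT id) AS avg_num_trx
    FROM my_table
    GROUP BY CAST(strftime('%m', trans_date) AS INTEGER)
    ORDER BY mm
""").fetchall()

print(avg_of_counts)
print(ratio)
```

Note that both variants average only over clients who had at least one transaction in the month, and both merge the same month of different years because only the month number is extracted.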

SQL Bigquery Counting repeated customers from transaction table

I have a transaction table that looks something like this.
userid orderDate amount
111 2021-11-01 20
112 2021-09-07 17
111 2021-11-21 17
I want to count how many distinct customers (userid) that bought from our store this month also bought from our store in the previous month. For example, in February 2020, we had 20 customers and out of these 20 customers 7 of them also bought from our store in the previous month, January 2020. I want to do this for all the previous months so ending up with something like.
year month repeated customers
2020 01 11
2020 02 7
2020 03 9
I have written this, but it only works for the current month. How would I iterate or rewrite it to get the table shown above?
WITH CURRENT_PERIOD AS (
SELECT DISTINCT userid
FROM table1
WHERE DATE(orderDate) BETWEEN DATE_TRUNC(CURRENT_DATE(),MONTH) AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
),
PREVIOUS_PERIOD AS (
SELECT DISTINCT userid
FROM table1
WHERE DATE(orderDate) BETWEEN DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH),MONTH) AND LAST_DAY(DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH))
)
SELECT count(1)
FROM CURRENT_PERIOD RC
WHERE RC.userid IN (SELECT DISTINCT userid FROM PREVIOUS_PERIOD)
You can summarize to get one record per user per month, use lag(), and then aggregate:
select yyyymm,
countif(prev_yyyymm = date_add(yyyymm, interval -1 month)) as repeated_customers
from (select userid, date_trunc(date(orderDate), month) as yyyymm,
lag(date_trunc(date(orderDate), month)) over (partition by userid order by date_trunc(date(orderDate), month)) as prev_yyyymm
from table1
group by 1, 2
) t
group by yyyymm
order by yyyymm;
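The lag() idea can be tried on a small table. This is a sketch under assumptions, not the BigQuery query itself: it runs on SQLite through Python's sqlite3 module, the rows are made up, and because SQLite has no DATE_TRUNC or COUNTIF it compares a month index (year * 12 + month, a helper introduced here) and uses SUM(CASE ...) instead. It assumes a SQLite build with window functions (3.25 or newer).

```python
import sqlite3

# Hypothetical orders: (userid, orderDate, amount). Not from the question.
rows = [
    (112, "2019-12-30", 5),
    (111, "2020-01-05", 20), (111, "2020-01-20", 17), (112, "2020-01-10", 17),
    (111, "2020-02-03", 12), (113, "2020-02-07", 9),
    (112, "2020-03-01", 30), (113, "2020-03-02", 8),
    (111, "2020-04-01", 14),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (userid INTEGER, orderDate TEXT, amount INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)

query = """
WITH user_months AS (
    -- one row per user per month in which the user ordered
    SELECT DISTINCT userid,
           strftime('%Y-%m', orderDate) AS yyyymm,
           CAST(strftime('%Y', orderDate) AS INTEGER) * 12
             + CAST(strftime('%m', orderDate) AS INTEGER) AS month_idx
    FROM table1
),
with_prev AS (
    -- previous active month of the same user (NULL for the first one)
    SELECT userid, yyyymm, month_idx,
           LAG(month_idx) OVER (PARTITION BY userid ORDER BY month_idx) AS prev_idx
    FROM user_months
)
SELECT yyyymm,
       SUM(CASE WHEN prev_idx = month_idx - 1 THEN 1 ELSE 0 END) AS repeated_customers
FROM with_prev
GROUP BY yyyymm
ORDER BY yyyymm
"""
repeated = conn.execute(query).fetchall()
print(repeated)
```

A user counts as repeated in a month only when their previous active month is exactly one month earlier, so user 111 is not counted in April (the previous purchase was in February), while user 112 is counted in January 2020 because of the December 2019 order.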

postgreSQL- Count for value between previous month start date and end date

I have a table as follows
user_id date month year visiting_id
123 11-04-2017 APRIL 2017 4500
123 12-05-2017 MAY 2017 4567
123 13-05-2017 MAY 2017 4568
123 17-05-2017 MAY 2017 4569
123 22-05-2017 MAY 2017 4570
123 11-06-2017 JUNE 2017 4571
123 12-06-2017 JUNE 2017 4572
I want to calculate the visiting count for the current month and last month at the monthly level as follows:
user_id month year visit_count_this_month visit_count_last_month
123 APRIL 2017 1 0
123 MAY 2017 4 1
123 JUNE 2017 2 4
I was able to calculate visit_count_this_month using the following query
SELECT v.user_id, v.month, v.year,
SUM(is_visit_this_month) as visit_count_this_month
FROM
(SELECT user_id, date, month, year,
CASE WHEN TO_CHAR(date, 'MM/YYYY') = TO_CHAR(date, 'MM/YYYY')
THEN 1 ELSE 0
END as is_visit_this_month
FROM visits
GROUP BY user_id, date, month, year
HAVING user_id = 123) v
GROUP BY v.user_id, v.month, v.year
However, I'm stuck with calculating visit_count_last_month. Similar to this, I also want to calculate visit_count_last_2months.
Can somebody help?
You can use a LATERAL JOIN like this:
SELECT user_id, month, year, COUNT(*) as visit_count_this_month, visit_count_last_month
FROM visits v
CROSS JOIN LATERAL (
SELECT COUNT(*) as visit_count_last_month
FROM visits
WHERE user_id = v.user_id
AND date = (CAST(v.date AS date) - interval '1 month')
) l
GROUP BY user_id, month, year, visit_count_last_month;
SQLFiddle - http://sqlfiddle.com/#!15/393c8/2
Assuming there are values for every month, you can get the counts per month first and use lag to get the previous month's values per user.
SELECT T.*
,COALESCE(LAG(visits,1) OVER(PARTITION BY USER_ID ORDER BY year,mth),0) as last_month_visits
,COALESCE(LAG(visits,2) OVER(PARTITION BY USER_ID ORDER BY year,mth),0) as last_2_month_visits
FROM (
SELECT user_id, extract(month from date) as mth, year, COUNT(*) as visits
FROM visits
GROUP BY user_id, extract(month from date), year
) T
If there can be missing months, it is best to generate all months within a specified timeframe and left join the table onto that. (This example shows it for all the months in 2017.)
select user_id,yr,mth,visits
,coalesce(lag(visits,1) over(PARTITION BY USER_ID ORDER BY yr,mth),0) as last_month_visits
,coalesce(lag(visits,2) OVER(PARTITION BY USER_ID ORDER BY yr,mth),0) as last_2_month_visits
from (select u.user_id,extract(year from d.dt) as yr, extract(month from d.dt) as mth,count(v.visiting_id) as visits
from generate_series(date '2017-01-01', date '2017-12-31',interval '1 month') d(dt)
cross join (select distinct user_id from visits) u
left join visits v on extract(month from v.date)=extract(month from d.dt) and extract(year from v.date)=extract(year from d.dt) and u.user_id=v.user_id
group by u.user_id,extract(year from d.dt), extract(month from d.dt)
) t

Oracle SQL Query:Find out which year total sales amount is maximum

My working table is named sales.
Here is the table structure (sl_no is the primary key):
CREATE TABLE SALES
( SL_NO NUMBER PRIMARY KEY, REGION VARCHAR2(10) NOT NULL,
MONTH VARCHAR2(20) NOT NULL, YEAR NUMBER NOT NULL,
SALES_AMOUNT NUMBER NOT NULL )
and here is table data:
SQL> select * from sales;
SL_NO REGION MONTH YEAR SALES_AMOUNT
---------- ---------- -------------------- ---------- ------------
1 east december 2011 750000
2 east august 2011 800000
3 west january 2012 640000
5 east march 2012 1200000
6 west february 2011 580000
4 west april 2011 555000
6 rows selected.
I tried this query to view the total sales amount for each year (2011 and 2012):
SELECT year, SUM(sales_amount) FROM sales GROUP BY year;
YEAR SUM(SALES_AMOUNT)
---------- -----------------
2011 2685000
2012 1840000
My goal: I want to find the year with the maximum total sales amount.
I tried this and it works perfectly, but when I also try to display the year, it gives an error.
SQL> select max(sum(sales_amount)) from sales group by year;
MAX(SUM(SALES_AMOUNT))
----------------------
2685000
SQL> select year, max(sum(sales_amount)) from sales group by year;
select year, max(sum(sales_amount)) from sales group by year
*
ERROR at line 1:
ORA-00937: not a single-group group function
Additional question: what if multiple rows have the same value, that is, what if the total sales amount of both years (2011 and 2012) were the same?
Please help me solve this problem.
This should work.
with yr_agg as (
select year, sum(sales_amount) as total
from sales
group by year
)
select year, total as max_total
from yr_agg
where total = (select max(total)
from yr_agg);
I think the simplest way is to order the results and take the first row:
select year, sales_amount
from (SELECT year, SUM(sales_amount) as sales_amount
FROM sales
GROUP BY year
order by sum(sales_amount) desc
) t
where rownum = 1;
EDIT:
If you need to display all the matching rows (which isn't mentioned in the question), I would suggest using the dense_rank() analytic function:
select year, sales_amount
from (SELECT year, SUM(sales_amount) as sales_amount,
dense_rank() over (order by SUM(sales_amount) desc) as seqnum
FROM sales
GROUP BY year
order by sum(sales_amount) desc
) t
where seqnum = 1;
Or, you might like the max() version instead:
select year, sales_amount
from (SELECT year, SUM(sales_amount) as sales_amount,
max(sum(sales_amount)) over () as maxsa
FROM sales
GROUP BY year
order by sum(sales_amount) desc
) t
where sales_amount = maxsa;
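To illustrate how a dense_rank() filter behaves with and without ties, here is a sketch that runs the same idea on the question's six rows. It is an assumption-laden stand-in: it uses SQLite through Python's sqlite3 module instead of Oracle, aggregates in an inner subquery before ranking, and the helper name best_years and the extra row with sl_no 7 are hypothetical, added only to force a tie. It assumes a SQLite build with window functions (3.25 or newer).

```python
import sqlite3

def best_years(rows):
    """Return every (year, total) whose yearly total equals the maximum."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (sl_no INTEGER PRIMARY KEY, region TEXT, "
        "month TEXT, year INTEGER, sales_amount INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
    query = """
    SELECT year, total
    FROM (
        -- rank the yearly totals; tied totals share the same rank
        SELECT year, total,
               DENSE_RANK() OVER (ORDER BY total DESC) AS seqnum
        FROM (SELECT year, SUM(sales_amount) AS total FROM sales GROUP BY year)
    )
    WHERE seqnum = 1
    ORDER BY year
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# The six rows shown in the question.
question_rows = [
    (1, "east", "december", 2011, 750000),
    (2, "east", "august", 2011, 800000),
    (3, "west", "january", 2012, 640000),
    (5, "east", "march", 2012, 1200000),
    (6, "west", "february", 2011, 580000),
    (4, "west", "april", 2011, 555000),
]
print(best_years(question_rows))

# Hypothetical extra row that lifts 2012 to the same total as 2011.
tied_rows = question_rows + [(7, "west", "may", 2012, 845000)]
print(best_years(tied_rows))
```

With the original data only 2011 is returned; once the totals tie, both years come back, which is the difference from the rownum = 1 version that always returns a single row.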
The following select should do what you need (untested; I do not have Oracle at home):
select year, total
from (
select year, sum(sales_amount) total
from sales
group by year
)
where total = (select max(total_amount)
from (
select year, sum(sales_amount) total_amount
from sales
group by year
))
Take into account, though, that it returns more than one year if two of them have exactly the same total amount. You might want to add some more conditions if you need a single row.
Here is my query, which can return multiple rows when years tie:
SELECT year,MAX(total_sale) as max_total
FROM
(SELECT year,SUM(sales_amount) AS total_sale FROM sales GROUP BY year)
GROUP BY
year HAVING MAX(total_sale) =
(SELECT MAX(total_sale) FROM (SELECT SUM(sales_amount) AS total_sale FROM sales GROUP BY year));