I am trying to extract itemised sales data for the past 12 months and build a dynamic table with column headers for each month ID. Extracting the data as below works; however, when I get to the point of creating a SUM column for each month ID, I get stuck. I have tried to find similar questions, but I'm not sure of the best approach.
Select Item, Qty, format([Transaction Date],'MMM-yy') as [Month ID]
from Transactions
Data Extract:
Item   Qty   Month ID
-----  ----  --------
A123   50    Apr-22
A123   30    May-22
A123   50    Jun-22
A321   50    Apr-22
A999   25    May-22
A321   10    Jun-22
Desired Output:
Item   Apr-22   May-22   Jun-22
-----  ------   ------   ------
A123   50       30       50
A321   50       Null     10
A999   Null     25       Null
Any advice would be greatly appreciated.
This is a typical case of a pivot operation, where you first filter every value according to its "Month_ID", then aggregate on the common "Item":
WITH cte AS (
SELECT Item, Qty, FORMAT([Transaction Date],'MMM-yy') AS Month_ID
FROM Transactions
)
SELECT Item,
MAX(CASE WHEN Month_ID = 'Apr-22' THEN Qty END) AS [Apr-22],
MAX(CASE WHEN Month_ID = 'May-22' THEN Qty END) AS [May-22],
MAX(CASE WHEN Month_ID = 'Jun-22' THEN Qty END) AS [Jun-22]
FROM cte
GROUP BY Item
Note: you don't need SUM as long as there's only one value for each pair <"Item", "Month_ID">.
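If the extract can contain several rows for the same pair (e.g. multiple transactions for an item in one month), the same query works with SUM in place of MAX; a minimal sketch:
WITH cte AS (
    SELECT Item, Qty, FORMAT([Transaction Date],'MMM-yy') AS Month_ID
    FROM Transactions
)
SELECT Item,
       SUM(CASE WHEN Month_ID = 'Apr-22' THEN Qty END) AS [Apr-22],
       SUM(CASE WHEN Month_ID = 'May-22' THEN Qty END) AS [May-22],
       SUM(CASE WHEN Month_ID = 'Jun-22' THEN Qty END) AS [Jun-22]
FROM cte
GROUP BY Item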
The PIVOT and UNPIVOT functions in Snowflake are not efficient for turning 30+ unique columns into rows.
Use case: I have 35 different month columns which need to become rows, and another 35 columns holding the quantity for the corresponding month.
So in the end there will be 2 columns (one for the month and another for the quantity) derived from 70 unique columns,
with the quantity aggregated by month.
But unpivoting is not at all efficient: the query below scans 15 GB of data from the main table used.
select part_num ,concat(date_part(year, dates),'-',date_part(month, dates)) as month_year,
sum(quantity) as quantities
from table_name
unpivot(dates for cols in (month_1, 30 other unique cols))
unpivot(quantity for cols in (qunatity_1, 30 other unique cols))
group by part_num, month_year
Is there any other approach to unpivot a large dataset?
Thanks
An alternative approach could be conditional aggregation:
with cte as (
select part_num
,concat(date_part(year, dates),'-',date_part(month, dates)) as month_year
,sum(quantity) as quantities
from table_name
group by part_num, month_year
)
SELECT part_num
-- lowest date
,'2020-01' AS "2020-01"
,MAX(IFF(month_year='2020-01', quantities, NULL)) AS "quantities_2020-01"
-- next date
,...
-- last date
,'2022-04' AS "2022-04"
,MAX(IFF(month_year='2022-04', quantities, NULL)) AS "quantities_2022-04"
FROM cte
GROUP BY part_num;
Version using single GROUP BY and TO_VARCHAR with format:
SELECT part_num
-- lowest date
,MAX(IFF(TO_VARCHAR(dates,'YYYY-MM')='2020-01','2020-01',NULL)) AS "2020-01"
,MAX(IFF(TO_VARCHAR(dates,'YYYY-MM')='2020-01',quantities,NULL)) AS "quantities_2020-01"
-- next date
,...
-- last date
,MAX(IFF(TO_VARCHAR(dates,'YYYY-MM')='2022-04','2022-04',NULL)) AS "2022-04"
,MAX(IFF(TO_VARCHAR(dates,'YYYY-MM')='2022-04',quantities,NULL)) AS "quantities_2022-04"
FROM table_name
GROUP BY part_num;
So let's get some example data and test whether what is happening is what is wanted.
Here is a trivial and tiny CTE's worth of data:
with table_name(part_num, month_1, month_2, month_3, qunatity_1, qunatity_2, qunatity_3) as (
select * from values
(1, '2022-01-01'::date, '2022-02-01'::date, '2022-03-01'::date, 4, 5, 6)
)
Now, pointing your SQL at it (after making it compile):
select
part_num
,to_char(dates, 'yyyy-mm') as month_year
,sum(quantity) as quantities
from table_name
unpivot(dates for month in (month_1, month_2, month_3))
unpivot(quantity for quan in (qunatity_1, qunatity_2, qunatity_3))
group by part_num, month_year
gives:
PART_NUM  MONTH_YEAR  QUANTITIES
--------  ----------  ----------
1         2022-01     15
1         2022-02     15
1         2022-03     15
which is not what I think you are after.
If we look at the unaggregated rows:
PART_NUM  MONTH    DATES       QUAN        QUANTITY
--------  -------  ----------  ----------  --------
1         MONTH_1  2022-01-01  QUNATITY_1  4
1         MONTH_1  2022-01-01  QUNATITY_2  5
1         MONTH_1  2022-01-01  QUNATITY_3  6
1         MONTH_2  2022-02-01  QUNATITY_1  4
1         MONTH_2  2022-02-01  QUNATITY_2  5
1         MONTH_2  2022-02-01  QUNATITY_3  6
1         MONTH_3  2022-03-01  QUNATITY_1  4
1         MONTH_3  2022-03-01  QUNATITY_2  5
1         MONTH_3  2022-03-01  QUNATITY_3  6
we are getting a cross join, which is not what I believe you want.
My understanding is that you want a one-to-one relationship between month (1-35) and quantity (1-35),
thus a mix like:
PART_NUM  MONTH    DATES       QUAN        QUANTITY
--------  -------  ----------  ----------  --------
1         MONTH_1  2022-01-01  QUNATITY_1  4
1         MONTH_2  2022-02-01  QUNATITY_2  5
1         MONTH_3  2022-03-01  QUNATITY_3  6
Guessed Answer:
My guess at what you really want is:
select
part_num
,to_char(dates, 'yyyy-mm') as month_year
,array_construct(qunatity_1, qunatity_2, qunatity_3)[split_part(month,'_',2)::number - 1] as qunatity
from table_name
unpivot(dates for month in (month_1, month_2, month_3))
order by 1,2;
which gives (for the same CTE data above):
PART_NUM  MONTH_YEAR  QUNATITY
--------  ----------  --------
1         2022-01     4
1         2022-02     5
1         2022-03     6
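The array trick works because the unpivoted MONTH name carries its ordinal (MONTH_2 → 2), so split_part(month,'_',2)::number - 1 picks the matching zero-based slot out of the quantity array.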
Another way to get that guessed answer:
select
part_num
,to_char(dates, 'yyyy-mm') as month_year
,sum(iff(split_part(month,'_',2)=split_part(q_name,'_',2), q_val, null)) as qunatity
from table_name
unpivot(dates for month in (month_1, month_2, month_3))
unpivot(q_val for q_name in (qunatity_1, qunatity_2, qunatity_3))
group by 1,2
order by 1,2;
This uses the double unpivot, so it might be slow, but it only aggregates the values when the month and quantity ordinals match. That feels almost as gross as building an array just to rip it apart, but the array version avoids the large joins, at the cost of some per-row grossness.
Assuming your data is already aggregated at part_num level, you could divide and conquer like this:
with year_month as
(select a.part_num, b.index+1 as month_num, left(b.value,7) as year_month
from my_table a,table(flatten(input=>array_construct(m1,m2,m3...))) b),
quantities as
(select a.part_num, b.index+1 as month_num, b.value::int as quantity
from my_table a,table(flatten(input=>array_construct(q1,q2,q3...))) b)
select a.part_num, a.year_month, b.quantity
from year_month a
join quantities b on a.part_num=b.part_num and a.month_num=b.month_num
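For illustration, here is that approach pointed at the tiny CTE from the earlier answer, mapping m1,m2,m3.../q1,q2,q3... onto its month_/qunatity_ columns (a sketch; it assumes the flattened date values render as 'YYYY-MM-DD' strings, so left(value, 7) yields 'YYYY-MM'):
with my_table(part_num, month_1, month_2, month_3, qunatity_1, qunatity_2, qunatity_3) as (
    select * from values
    (1, '2022-01-01'::date, '2022-02-01'::date, '2022-03-01'::date, 4, 5, 6)
),
year_month as (
    -- one row per (part_num, month ordinal), carrying the year-month text
    select a.part_num, b.index + 1 as month_num, left(b.value::string, 7) as year_month
    from my_table a, table(flatten(input => array_construct(month_1, month_2, month_3))) b
),
quantities as (
    -- one row per (part_num, month ordinal), carrying the quantity
    select a.part_num, b.index + 1 as month_num, b.value::int as quantity
    from my_table a, table(flatten(input => array_construct(qunatity_1, qunatity_2, qunatity_3))) b
)
select a.part_num, a.year_month, b.quantity
from year_month a
join quantities b on a.part_num = b.part_num and a.month_num = b.month_num;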
I have a table of customer transactions where each item purchased by a customer is stored as one row, so a single transaction can produce multiple rows in the table. I have another column called visit_date.
There is a category column called cal_month_nbr, which ranges from 1 to 12 based on the month in which the transaction occurred.
The data looks like below:
Id   visit_date   cal_month_nbr
---  ----------   -------------
1    01/01/2020   1
1    01/02/2020   1
1    01/01/2020   1
2    02/01/2020   2
1    02/01/2020   2
1    03/01/2020   3
3    03/01/2020   3
First, I want to know how many times each customer visits per month, using their visit_date, i.e. I want the output below:
id   cal_month_nbr   visit_per_month
---  -------------   ---------------
1    1               2
1    2               1
1    3               1
2    2               1
3    3               1
and then what the average frequency of visits per month is for each id, i.e.:
id   Avg_freq_per_month
---  ------------------
1    1.33
2    1
3    1
I tried the query below, but it counts each item as one transaction:
select avg(count_e) as num_visits_per_month, individual_id
from
(
    select r.individual_id, cal_month_nbr, count(*) as count_e
    from ww_customer_dl_secure.cust_scan r
    group by
        r.individual_id, cal_month_nbr
    order by count_e desc
) as t
group by individual_id
I would appreciate any help, guidance or suggestions
You can divide the total visits by the number of months:
select individual_id,
count(*) / count(distinct cal_month_nbr)
from ww_customer_dl_secure.cust_scan c
group by individual_id;
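With your sample, id 1 has 5 item rows across 3 distinct months, so this returns 5 / 3 ≈ 1.67; every item row counts as a visit.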
If you want the average number of days per month, then:
select individual_id,
count(distinct visit_date) / count(distinct cal_month_nbr)
from ww_customer_dl_secure.cust_scan c
group by individual_id;
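For id 1 in your sample, that is 4 distinct visit dates (01/01, 01/02, 02/01, 03/01) across 3 distinct months, i.e. 4 / 3 = 1.33, matching your desired output.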
Actually, Hive may not be efficient at calculating count(distinct), so multiple levels of aggregation might be faster:
select individual_id, avg(num_visit_days)
from (select individual_id, cal_month_nbr, count(*) as num_visit_days
from (select distinct individual_id, visit_date, cal_month_nbr
from ww_customer_dl_secure.cust_scan c
) iv
group by individual_id, cal_month_nbr
) ic
group by individual_id;
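On the sample data, the innermost SELECT DISTINCT collapses id 1's duplicate (01/01/2020, month 1) rows into one, so the middle level counts 2 visit days in month 1 and 1 each in months 2 and 3, and the outer AVG again returns (2 + 1 + 1) / 3 = 1.33.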
I have 3 tables: PRODUCT_INVENTORY, CUSTOMER_INFORMATION, and SALES_ORDER.
I'm using the following SQL to generate a monthly revenue report:
SELECT
PRODUCT_INVENTORY.UNIT_PRICE, SALES_ORDER.UNITS_SOLD , SALES_ORDER.SALE_DATE,
PRODUCT_INVENTORY.UNIT_PRICE * SALES_ORDER.UNITS_SOLD AS TOTAL_SALES
FROM PRODUCT_INVENTORY
INNER JOIN SALES_ORDER
ON PRODUCT_INVENTORY.PRODUCT_ID = SALES_ORDER.PRODUCT_ID
WHERE SALES_ORDER.SALE_DATE >= '01-JAN-09'
AND SALES_ORDER.SALE_DATE <= '31-JAN-09';
This is the data I'm getting back:
UNIT_PRICE UNITS_SOLD SALE_DATE TOTAL_SALES
---------- ---------- --------- -----------
900 2 11-JAN-09 1800
1700 2 12-JAN-09 3400
My question is how do I add the values in the new TOTAL_SALES column and have it display something like this?
UNIT_PRICE UNITS_SOLD SALE_DATE TOTAL_SALES
---------- ---------- --------- -----------
900 2 11-JAN-09 1800
1700 2 12-JAN-09 3400
TOTAL_REVENUE
5200
You can use a CTE or a subquery:
WITH CTE AS
(
    -- your query from above
    SELECT
        PRODUCT_INVENTORY.UNIT_PRICE, SALES_ORDER.UNITS_SOLD, SALES_ORDER.SALE_DATE,
        PRODUCT_INVENTORY.UNIT_PRICE * SALES_ORDER.UNITS_SOLD AS TOTAL_SALES
    FROM PRODUCT_INVENTORY
    INNER JOIN SALES_ORDER
        ON PRODUCT_INVENTORY.PRODUCT_ID = SALES_ORDER.PRODUCT_ID
    WHERE SALES_ORDER.SALE_DATE >= '01-JAN-09'
      AND SALES_ORDER.SALE_DATE <= '31-JAN-09'
)
SELECT SUM(TOTAL_SALES) AS TOTAL_REVENUE
FROM CTE
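If you also want the detail rows above the total, as in your desired output, one option (a sketch, not the only way) is to UNION ALL a total row onto the detail rows, reusing the same CTE:
SELECT UNIT_PRICE, UNITS_SOLD, SALE_DATE, TOTAL_SALES FROM CTE
UNION ALL
SELECT NULL, NULL, NULL, SUM(TOTAL_SALES) FROM CTE
Note that without an ORDER BY the position of the total row is not guaranteed.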
I have one table with the following data:
saleId amount date
-------------------------
1 2000 10/10/2012
2 3000 12/10/2012
3 2000 11/12/2012
2 3000 12/10/2012
1 4000 11/10/2012
4 6000 10/10/2012
From my table I want the result with the max of the summed amount between the dates 10/10/2012 and 12/10/2012, which for the data above will be:
saleId amount
---------------
1 6000
2 6000
4 6000
Here 6000 is the max of the sums (by saleId) so I want ids 1, 2 and 4.
You have to use sub-queries, like this:
SELECT saleId, SUM(amount) AS Amount
FROM Table1
WHERE date BETWEEN '10/10/2012' AND '12/10/2012'
GROUP BY saleId
HAVING SUM(amount) =
(
    SELECT MAX(AMOUNT) FROM
    (
        SELECT SUM(amount) AS AMOUNT FROM Table1
        WHERE date BETWEEN '10/10/2012' AND '12/10/2012'
        GROUP BY saleId
    ) AS A
)
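For the sample data, the date-filtered sums are 6000 for saleId 1 (2000 + 4000), 6000 for saleId 2 (3000 + 3000) and 6000 for saleId 4; the inner query finds the maximum sum (6000), and the HAVING clause keeps exactly those three ids.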
This query goes through the table only once and is fairly optimised.
select top(1) with ties saleid, amount
from (
select saleid, sum(amount) amount
from tbl
where date between '20121010' and '20121210'
group by saleid
) x
order by amount desc;
You can produce the SUM with the WHERE clause as a derived table, then SELECT TOP(1) in the query using WITH TIES to show all the ones with the same (MAX) amount.
When presenting dates to SQL Server, try to always use the format YYYYMMDD for robustness.
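TOP(1) WITH TIES is T-SQL specific; an equivalent formulation with a window function (a sketch, not from the answers above, portable to most engines) is:
select saleid, amount
from (
    select saleid, sum(amount) as amount,
           -- rank the per-saleId sums; ties on the top sum all get rank 1
           dense_rank() over (order by sum(amount) desc) as rnk
    from tbl
    where date between '20121010' and '20121210'
    group by saleid
) x
where rnk = 1;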