How to pivot multiple aggregation in Snowflake

How to pivot multiple aggregation in Snowflake - sql

I have the table structure as below
product_id
Period
Sales
Profit
x1
L13
$100
$10
x1
L26
$200
$20
x1
L52
$300
$30
x2
L13
$500
$110
x2
L26
$600
$120
x2
L52
$700
$130
I want to pivot the period column over and have the sales value and profit in those columns. I need a table like below.
product_id
SALES_L13
SALES_L26
SALES_L52
PROFIT_L13
PROFIT_L26
PROFIT_L52
x1
$100
$200
$300
$10
$20
$30
x2
$500
$600
$700
$110
$120
$130
I am using the snowflake to write the queries. I tried using the pivot function of snowflake but there I can only specify one aggregation function.
Can anyone help as how I can achieve this solution ?
Any help is appreciated.
Thanks

How about we stack sales and profit before we pivot? I'll leave it up to you to fix the column names that I messed up.
with cte (product_id, period, amount) as
(select product_id, period||'_profit', profit from t
union all
select product_id, period||'_sales', sales from t)
select *
from cte
pivot(max(amount) for period in ('L13_sales','L26_sales','L52_sales','L13_profit','L26_profit','L52_profit'))
as p (product_id,L13_sales,L26_sales,L52_sales,L13_profit,L26_profit,L52_profit);
If you wish to pivot period twice for sales and profit, you'll need to duplicate the column so you have one for each instance of pivot. Obviously, this will create nulls due to duplicate column still being present after the first pivot. To handle that, we can use max in the final select. Here's what the implementation looks like
select product_id,
max(L13_sales) as L13_sales,
max(L26_sales) as L26_sales,
max(L52_sales) as L52_sales,
max(L13_profit) as L13_profit,
max(L26_profit) as L26_profit,
max(L52_profit) as L52_profit
from (select *, period as period2 from t) t
pivot(max(sales) for period in ('L13','L26','L52'))
pivot(max(profit) for period2 in ('L13','L26','L52'))
as p (product_id, L13_sales,L26_sales,L52_sales,L13_profit,L26_profit,L52_profit)
group by product_id;
At this point, it's an eye soar. You might as well use conditional aggregation or better yet, handle pivoting inside the reporting application. A more compact alternative of conditional aggregation uses decode
select product_id,
max(decode(period,'L13',sales)) as L13_sales,
max(decode(period,'L26',sales)) as L26_sales,
max(decode(period,'L52',sales)) as L52_sales,
max(decode(period,'L13',profit)) as L13_profit,
max(decode(period,'L26',profit)) as L26_profit,
max(decode(period,'L52',profit)) as L52_profit
from t
group by product_id;

Using conditional aggregation:
SELECT product_id
,SUM(CASE WHEN Period = 'L13' THEN Sales END) AS SALES_L13
,SUM(CASE WHEN Period = 'L26' THEN Sales END) AS SALES_L26
,SUM(CASE WHEN Period = 'L52' THEN Sales END) AS SALES_L52
,SUM(CASE WHEN Period = 'L13' THEN Profit END) AS PROFIT_L52
,SUM(CASE WHEN Period = 'L26' THEN Profit END) AS PROFIT_L52
,SUM(CASE WHEN Period = 'L52' THEN Profit END) AS PROFIT_L52
FROM tab
GROUP BY product_id

I'm not 100% happy with this answer ... pretty sure someone can improve on this approach.
Basically PIVOTING an ARRAY ... the list of aggregation functions available to an ARRAY is not huge ... there's just one ARRAY_AGG. And PIVOT only supposed to support AVG, COUNT, MAX, MIN, and SUM. So this shouldn't work ... it does as I think PIVOT just requires an aggregation of some sorts.
I'd recommend aggregating your metrics PRIOR to constructing the ARRAY ... but does let you pivot multiple Metrics at once - which from reading Stack Overflow shouldn't be possible!
Copy|Paste|Run| .. and IMPROVE please :-)
WITH CTE AS( SELECT 'X1' PRODUCT_ID,'L13' PERIOD,100 SALES,10 PROFIT
UNION SELECT 'X1' PRODUCT_ID,'L26' PERIOD,200 SALES,20 PROFIT
UNION SELECT 'X1' PRODUCT_ID,'L52' PERIOD,300 SALES,30 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L13' PERIOD,500 SALES,110 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L26' PERIOD,600 SALES,120 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L52' PERIOD,700 SALES,130 PROFIT)
SELECT
PRODUCT_ID
,"'L13'"[0][0] SALES_L13
,"'L13'"[0][1] PROFIT_L13
,"'L26'"[0][0] SALES_L26
,"'L26'"[0][1] PROFIT_L26
,"'L52'"[0][0] SALES_L52
,"'L52'"[0][1] PROFIT_L52
FROM
(SELECT * FROM
(
SELECT PRODUCT_ID, PERIOD,ARRAY_CONSTRUCT(SALES,PROFIT) S FROM CTE)
PIVOT (ARRAY_AGG(S) FOR PERIOD IN ('L13','L26','L52')
)
)
Example with aggregations (added 1700,1130 to L52 X2)
WITH CTE AS(
SELECT 'X1' PRODUCT_ID,'L13' PERIOD,100 SALES,10 PROFIT
UNION SELECT 'X1' PRODUCT_ID,'L26' PERIOD,200 SALES,20 PROFIT
UNION SELECT 'X1' PRODUCT_ID,'L52' PERIOD,300 SALES,30 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L13' PERIOD,500 SALES,110 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L26' PERIOD,600 SALES,120 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L52' PERIOD,700 SALES,130 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L52' PERIOD,1700 SALES,1130 PROFIT)
SELECT
PRODUCT_ID
,"'L13'"[0][0] SALES_L13
,"'L13'"[0][1] PROFIT_L13
,"'L26'"[0][0] SALES_L26
,"'L26'"[0][1] PROFIT_L26
,"'L52'"[0][0] SALES_L52
,"'L52'"[0][1] PROFIT_L52
FROM
(SELECT * FROM
(
SELECT PRODUCT_ID, PERIOD,ARRAY_CONSTRUCT(SUM(SALES),SUM(PROFIT)) S FROM CTE GROUP BY 1,2)
PIVOT (ARRAY_AGG(S) FOR PERIOD IN ('L13','L26','L52')
)
)

Heres an alternative form using OBJECT_AGG with LATERAL FLATTEN that avoids the potential support issue of PIVOT with ARRAY_AGG proposed by Adrian White.
This should work for any aggregates on multiple input columns included within the initial ARRAY_CONSTRUCT in the OBJ_TALL CTE. I expect that the conditional aggregation option with CASE statements would be faster but you'd need to test at scale to see.
-- OBJECT FORM USING LATERAL FLATTEN
WITH CTE AS(
SELECT 'X1' PRODUCT_ID,'L13' PERIOD,100 SALES,10 PROFIT
UNION SELECT 'X1' PRODUCT_ID,'L26' PERIOD,200 SALES,20 PROFIT
UNION SELECT 'X1' PRODUCT_ID,'L52' PERIOD,300 SALES,30 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L13' PERIOD,500 SALES,110 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L26' PERIOD,600 SALES,120 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L52' PERIOD,700 SALES,130 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L52' PERIOD,1700 SALES,1130 PROFIT)
,OBJ_TALL AS ( SELECT PRODUCT_ID,
OBJECT_CONSTRUCT(PERIOD,
ARRAY_CONSTRUCT( SUM(SALES)
,SUM(PROFIT)
)
) S
FROM CTE
GROUP BY PRODUCT_ID, PERIOD)
SELECT * FROM OBJ_TALL;
,OBJ_WIDE AS ( SELECT PRODUCT_ID, OBJECT_AGG(KEY,VALUE) OA
FROM OBJ_TALL, LATERAL FLATTEN(INPUT => S)
GROUP BY PRODUCT_ID)
-- SELECT * FROM OBJ_WIDE;
SELECT
PRODUCT_ID
,OA:L13[0] SALES_L13
,OA:L13[1] PROFIT_L13
,OA:L26[0] SALES_L26
,OA:L26[1] PROFIT_L26
,OA:L52[0] SALES_L52
,OA:L52[1] PROFIT_L52
FROM OBJ_WIDE
ORDER BY 1;
For easy comparison to the above, heres Adrians ARRAY_AGG and PIVOT version reformatted using CTE's.
-- ARRAY FORM - RE-WRITTEN WITH CTES FOR CLARITY AND COMPARISON TO OBJECT FORM
WITH CTE AS(
SELECT 'X1' PRODUCT_ID,'L13' PERIOD,100 SALES,10 PROFIT
UNION SELECT 'X1' PRODUCT_ID,'L26' PERIOD,200 SALES,20 PROFIT
UNION SELECT 'X1' PRODUCT_ID,'L52' PERIOD,300 SALES,30 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L13' PERIOD,500 SALES,110 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L26' PERIOD,600 SALES,120 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L52' PERIOD,700 SALES,130 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L52' PERIOD,1700 SALES,1130 PROFIT)
,ARR_TALL AS (SELECT PRODUCT_ID,
PERIOD,
ARRAY_CONSTRUCT( SUM(SALES)
,SUM(PROFIT)
) S
FROM CTE GROUP BY 1,2)
,ARR_WIDE AS (SELECT *
FROM ARR_TALL PIVOT (ARRAY_AGG(S) FOR PERIOD IN ('L13','L26','L52') ) )
SELECT
PRODUCT_ID
,"'L13'"[0][0] SALES_L13
,"'L13'"[0][1] PROFIT_L13
,"'L26'"[0][0] SALES_L26
,"'L26'"[0][1] PROFIT_L26
,"'L52'"[0][0] SALES_L52
,"'L52'"[0][1] PROFIT_L52
FROM ARR_WIDE
ORDER BY 1;

I believe you can only have one pivot at one time but you can check by running the first code below. Then you can run separately only with one pivot to see if it is working fine. Unfortunately, if multiple pivots are not allowed i.e first code then you can use the third code i.e case when method OR use union first to combine them i.e (Phil Culson method from above).
select *
from [table name]
pivot(sum(amount) for PERIOD in (L13, L26, L52)),
pivot(sum(profit) for PERIOD in (L13, L26, L52))
order by product_id;
if the above one doesn't work try with one for example:
https://count.co/sql-resources/snowflake/pivot-tables
select *
from [table name]
pivot(sum(amount) for PERIOD in (L13, L26, L52))
order by product_id;
Otherwise you will have to apply the manual case when logic:
select
product_id,
sum(case when Period = 'L13' then Sales end) as sales_l13,
sum(case when Period = 'L26' then Sales end) as sales_l26,
sum(case when Period = 'L52' then Sales end) as sales_l52,
sum(case when Period = 'L13' then Profit end) as profi_l13,
sum(case when Period = 'L26' then Profit end) as profit_l26,
sum(case when Period = 'L52' then Profit end) as profit_l52
from [table name]
group by 1

Related

Averaging and Grouping In google Big Query

I have the table as shown in google big Query:
I just want to do the following:
Calculate Category wise total units sold
Calculate Category wise average selling price

consider below approach
select 'category' type, category name, count(1) units_sold, sum(sale_price) total_sale, round(avg(sale_price), 2) average_selling_price
from your_table group by category
union all
select * from (
select 'product' type, product name, count(1) units_sold, sum(sale_price) total_sale, round(avg(sale_price), 2) average_selling_price
from your_table group by product
order by total_sale desc limit 10
)
union all
select * from (
select 'order_date' type, '' || order_date name, count(1) units_sold, sum(sale_price) total_sale, round(avg(sale_price), 2) average_selling_price
from your_table group by order_date
order by total_sale desc limit 5
)
order by type
if applied to sample/dummy data - output would be like below

SQL Columns to Rows- for a View

I have a view which has
ID INQCLASS INQDETAIL Period BAL
1233 GROSS water 12-3-2017 233.32
1233 GROSS ENergy 12-3-2017 122.00
ID,INQCLASS, Period is same. Except the INQDETAIL and BAL
I want to combine this into a single row which displays water and energy Bal.
Any Suggestions would be helpful. Thank you

SELECT ID,
INQCLASS,
Period,
MAX(CASE WHEN INQDETAIL = 'water' then BAL else 0 end) as WaterBal,
MAX(CASE WHEN INQDETAIL = 'ENergy' then BAL else 0 end) as ENergyBal
FROM View_Name
GROUP BY ID, INQLASS, Period
The case statement serves to show the BAL only when the condition is met. So with case alone, this would still return two rows for each item, but one would have a Waterbal value and no energybal value, and the other would be the reverse.
When you do GROUP BY, every field has to either be in the GROUP BY list (in this case, ID, INQCLASS, Period), or have an aggregate function like SUM, MAX, COUNT, etc. (in this case Waterbal and energyBal have aggregate functions).
The GROUP BY in this case collapses the common ID, INQLASS, Period into single rows, and then takes the largest (MAX) value for Waterbal and energyBal. Since one is always 0, it simply supplies the other one.

A simple pivot table ought to do it. As long as you know Inqdetail values ahead of time:
select ID,
INQCLASS,
[Period],
[Water] AS [Water Bal],
[Energy] as [Energy Bal]
from
(
select [ID],
[INQCLASS],
[INQDETAIL],
[Period],
[BAL]
from #util
) As Utilities
PIVOT
(
SUM([BAL])
FOR [inqdetail] IN ([Water],[Energy])
) AS Pvttbl

Try something like this:
SELECT INQDETAIL
, PERIOD
, SUM(BAL) AS energy_Bal
FROM your_view
WHERE INQDETAIL LIKE 'water'
GROUP BY INQDETAIL
, PERIOD;

Try this:
SELECT *
FROM
(SELECT * FROM #temp) AS P
PIVOT
(
max(bal) FOR INQDETAIL IN ([water], [ENergy])
) AS pv1

Getting rid of grouping field

Is there a safe way to not have to group by a field when using an aggregate in another field? Here is my example
SELECT
C.CustomerName
,D.INDUSTRY_CODE
,CASE WHEN D.INDUSTRY_CODE IN ('003','004','005','006','007','008','009','010','017','029')
THEN 'PM'
WHEN UPPER(CustomerName) = 'ULINE INC'
THEN 'ULINE'
ELSE 'DR'
END AS BU
,ISNULL((SELECT SUM(GrossAmount)
where CONVERT(date,convert(char(8),InvoiceDateID )) between DATEADD(yy, DATEDIFF(yy, 0, GETDATE()) - 1, 0) and DATEADD(year, -1, GETDATE())),0) [PREVIOUS YEAR GROSS]
FROM factMargins A
LEFT OUTER JOIN dimDate B ON A.InvoiceDateID = B.DateId
LEFT OUTER JOIN dimCustomer C ON A.CustomerID = C.CustomerId
LEFT OUTER JOIN CRCDATA.DBO.CU10 D ON D.CUST_NUMB = C.CustomerNumber
GROUP BY
C.CustomerName,D.INDUSTRY_CODE
,A.InvoiceDateID
order by CustomerName
before grouping I was only getting 984 rows but after grouping by the A.InvoiceDateId field I am getting over 11k rows. The rows blow up since there are multiple invoices per customer. Min and Max wont work since then it will pull data incorrectly. Would it be best to let my application (crystal) get rid of the extra lines? Usually I like to have my base data be as close as possible to how the report will layout if possible.

Try moving the reference to InvoiceDateID to within an aggregate function, rather than within a selected subquery's WHERE clause.
In Oracle, here's an example:
with TheData as (
select 'A' customerID, 25 AMOUNT , trunc(sysdate) THEDATE from dual union
select 'B' customerID, 35 AMOUNT , trunc(sysdate-1) THEDATE from dual union
select 'A' customerID, 45 AMOUNT , trunc(sysdate-2) THEDATE from dual union
select 'A' customerID, 11000 AMOUNT , trunc(sysdate-3) THEDATE from dual union
select 'B' customerID, 12000 AMOUNT , trunc(sysdate-4) THEDATE from dual union
select 'A' customerID, 15000 AMOUNT , trunc(sysdate-5) THEDATE from dual)
select
CustomerID,
sum(amount) as "AllRevenue"
sum(case when thedate<sysdate-3 then amount else 0 end) as "OlderRevenue",
from thedata
group by customerID;
Output:
CustomerID | AllRevenue | OlderRevenue
A | 26070 | 26000
B | 12035 | 12000
This says:
For each customerID
I want the sum of all amounts
and I want the sum of amounts earlier than 3 days ago

multiple dates against count

I have 3 columns
Order_ID
Activation_Date
Order_Received_Date
So, if i distinctly count all the order_IDs on Order_Received_Date, I will get "Number of Orders" Similarly, if i distinctly count all the order_IDs on Activation_Date, I will get "Number of Activations"
What i want is two columns called "# Orders" and "# Activations"
Appreciate any inputs, thanks

I use union all for this type of calculation. The following is standard SQL so it should work in any database:
select thedate, sum(r) as numorders, sum(a) as numactivations
from (select activation_date as thedate, 1 as a, 0 as r from table t
union all
select order_received_date, 0, 1 from table t
) t
group by thedate
order by thedate;

If I understand you correctly, I guess the below query might help you solve this issue:
SELECT
OrderID,
ActivationDate,
OrderReceivedDate,
COUNT(OrderID) OVER (),
COUNT(ActivationDate) OVER ()
FROM
(
SELECT 1 AS OrderID, '2014-01-01' AS ActivationDate, '2013-12-15' AS OrderReceivedDate UNION
SELECT 2 AS OrderID, NULL AS ActivationDate, '2013-12-19' AS OrderReceivedDate UNION
SELECT 3 AS OrderID, '2014-03-01' AS ActivationDate, '2013-12-17' AS OrderReceivedDate UNION
SELECT 4 AS OrderID, '2014-04-01' AS ActivationDate, '2013-12-03' AS OrderReceivedDate
) t
I wish this helps you...

How to get the real totals (without the TOP) when I do a ms access report with the SELECT TOP 10 ....?

When I do a report I use the variable
=sum([sell])
and the result here is the sum of TOP 10. My question is how do I show the result of the sum with all the elements, like the TOP 10 wouldn't exists?
SQL example:
Select top 10 name, cust, sell from sales
In practice the query is monstrous, big and dirty:
SELECT top 125 COD_FAM, NOME_FAM, ID_VENDEDOR, NOME_VENDEDOR, ID_ZONA, CONTA_CLI, SUB_CONTA_CLI, NOME_CLI, SUM(VENDA1) AS VENDAS1, SUM(VENDA2) AS VENDAS2, ROUND(IIF(SUM(VENDA1)=0, 9999, ((SUM(VENDA2)-SUM(VENDA1)))/abs(SUM(VENDA1))*100), 2) AS PER_DIFF FROM( SELECT quarter, month, COD_FAM, NOME_FAM, ID_VENDEDOR, NOME_VENDEDOR, ID_ZONA, CONTA_CLI, SUB_CONTA_CLI, NOME_CLI, VENDA AS VENDA1, 0 AS VENDA2 FROM STKQRY_VENDAS07_FAM_MONTH_VND_CLI_F1 WHERE year = '2012' AND Month between '00' and '05' UNION ALL SELECT quarter, month, COD_FAM, NOME_FAM, ID_VENDEDOR, NOME_VENDEDOR, ID_ZONA, CONTA_CLI, SUB_CONTA_CLI, NOME_CLI, 0 AS VENDA1, VENDA AS VENDA2 FROM STKQRY_VENDAS07_FAM_MONTH_VND_CLI_F1 WHERE year = '2013' AND Month between '00' and '05' ) GROUP BY COD_FAM, NOME_FAM, ID_VENDEDOR, NOME_VENDEDOR, ID_ZONA, CONTA_CLI, SUB_CONTA_CLI, NOME_CLI HAVING (SUM(VENDA1) > 1000 OR SUM(VENDA2) > 1000) ORDER BY vendas2 desc

You need to join 2 queries like
SELECT TOP 10 Company, SUM(Sales) from MyTable Group By Company --Query to get data for TOP 10
Union All
SELECT 'Grand Total', SUM(Sales) from MyTable --Query to get the Grand total

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How to pivot multiple aggregation in Snowflake - sql

Related

Averaging and Grouping In google Big Query

SQL Columns to Rows- for a View

Getting rid of grouping field

multiple dates against count

How to get the real totals (without the TOP) when I do a ms access report with the SELECT TOP 10 ....?

Categories

Resources