I have the table structure as below:
product_id  Period  Sales  Profit
----------  ------  -----  ------
x1          L13     $100   $10
x1          L26     $200   $20
x1          L52     $300   $30
x2          L13     $500   $110
x2          L26     $600   $120
x2          L52     $700   $130
I want to pivot the period column and have the sales and profit values in the resulting columns. I need a table like below:
product_id  SALES_L13  SALES_L26  SALES_L52  PROFIT_L13  PROFIT_L26  PROFIT_L52
----------  ---------  ---------  ---------  ----------  ----------  ----------
x1          $100       $200       $300       $10         $20         $30
x2          $500       $600       $700       $110        $120        $130
I am using Snowflake to write the queries. I tried Snowflake's PIVOT function, but it only lets me specify one aggregation function.
Can anyone help me achieve this?
Any help is appreciated.
Thanks
How about we stack sales and profit before we pivot? I'll leave it up to you to fix the column names that I messed up.
with cte (product_id, period, amount) as
(select product_id, period||'_profit', profit from t
union all
select product_id, period||'_sales', sales from t)
select *
from cte
pivot(max(amount) for period in ('L13_sales','L26_sales','L52_sales','L13_profit','L26_profit','L52_profit'))
as p (product_id,L13_sales,L26_sales,L52_sales,L13_profit,L26_profit,L52_profit);
If you wish to pivot period twice, once for sales and once for profit, you'll need to duplicate the column so that each pivot has its own copy. This produces nulls, because the duplicate column is still present after the first pivot and keeps the rows apart. To collapse them, we can use max in the final select. Here's what the implementation looks like:
select product_id,
max(L13_sales) as L13_sales,
max(L26_sales) as L26_sales,
max(L52_sales) as L52_sales,
max(L13_profit) as L13_profit,
max(L26_profit) as L26_profit,
max(L52_profit) as L52_profit
from (select *, period as period2 from t) t
pivot(max(sales) for period in ('L13','L26','L52'))
pivot(max(profit) for period2 in ('L13','L26','L52'))
as p (product_id, L13_sales,L26_sales,L52_sales,L13_profit,L26_profit,L52_profit)
group by product_id;
At this point, it's an eyesore. You might as well use conditional aggregation or, better yet, handle pivoting inside the reporting application. A more compact form of conditional aggregation uses decode:
select product_id,
max(decode(period,'L13',sales)) as L13_sales,
max(decode(period,'L26',sales)) as L26_sales,
max(decode(period,'L52',sales)) as L52_sales,
max(decode(period,'L13',profit)) as L13_profit,
max(decode(period,'L26',profit)) as L26_profit,
max(decode(period,'L52',profit)) as L52_profit
from t
group by product_id;
Using conditional aggregation:
SELECT product_id
,SUM(CASE WHEN Period = 'L13' THEN Sales END) AS SALES_L13
,SUM(CASE WHEN Period = 'L26' THEN Sales END) AS SALES_L26
,SUM(CASE WHEN Period = 'L52' THEN Sales END) AS SALES_L52
,SUM(CASE WHEN Period = 'L13' THEN Profit END) AS PROFIT_L13
,SUM(CASE WHEN Period = 'L26' THEN Profit END) AS PROFIT_L26
,SUM(CASE WHEN Period = 'L52' THEN Profit END) AS PROFIT_L52
FROM tab
GROUP BY product_id
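The conditional aggregation above can be tried without a Snowflake account. The following is a minimal sketch that runs the same pattern against an in-memory SQLite database from Python, assuming the question's sample data with the dollar signs dropped. The `periods` and `metrics` lists and the generated column names are illustrative assumptions, not part of the original answer; they only show how the CASE columns could be generated instead of typed by hand.

```python
import sqlite3

# In-memory copy of the question's sample data (dollar signs dropped so the
# values are numeric). The table name `tab` follows the answer above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (product_id TEXT, period TEXT, sales INTEGER, profit INTEGER)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [
        ("x1", "L13", 100, 10),
        ("x1", "L26", 200, 20),
        ("x1", "L52", 300, 30),
        ("x2", "L13", 500, 110),
        ("x2", "L26", 600, 120),
        ("x2", "L52", 700, 130),
    ],
)

# Hypothetical helper lists: one output column is generated per (metric, period).
# They are trusted constants here; never build SQL this way from user input.
periods = ["L13", "L26", "L52"]
metrics = ["sales", "profit"]
columns = ",\n       ".join(
    f"SUM(CASE WHEN period = '{p}' THEN {m} END) AS {m}_{p}"
    for m in metrics
    for p in periods
)
query = (
    f"SELECT product_id,\n       {columns}\n"
    "FROM tab\nGROUP BY product_id\nORDER BY product_id"
)

cursor = conn.execute(query)
header = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(header)
for row in rows:
    print(row)
```

Each product collapses to a single row with one column per metric and period, which is the shape the question asks for.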
I'm not 100% happy with this answer; I'm pretty sure someone can improve on this approach.
The idea is to PIVOT an ARRAY. The list of aggregate functions that work on an ARRAY is short: there is just one, ARRAY_AGG. PIVOT is only supposed to support AVG, COUNT, MAX, MIN, and SUM, so this shouldn't work. It does, I think because PIVOT just requires an aggregation of some sort.
I'd recommend aggregating your metrics PRIOR to constructing the ARRAY. The approach does, however, let you pivot multiple metrics at once, which from reading Stack Overflow shouldn't be possible!
Copy | Paste | Run ... and IMPROVE please :-)
WITH CTE AS( SELECT 'X1' PRODUCT_ID,'L13' PERIOD,100 SALES,10 PROFIT
UNION SELECT 'X1' PRODUCT_ID,'L26' PERIOD,200 SALES,20 PROFIT
UNION SELECT 'X1' PRODUCT_ID,'L52' PERIOD,300 SALES,30 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L13' PERIOD,500 SALES,110 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L26' PERIOD,600 SALES,120 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L52' PERIOD,700 SALES,130 PROFIT)
SELECT
PRODUCT_ID
,"'L13'"[0][0] SALES_L13
,"'L13'"[0][1] PROFIT_L13
,"'L26'"[0][0] SALES_L26
,"'L26'"[0][1] PROFIT_L26
,"'L52'"[0][0] SALES_L52
,"'L52'"[0][1] PROFIT_L52
FROM
(SELECT * FROM
(
SELECT PRODUCT_ID, PERIOD,ARRAY_CONSTRUCT(SALES,PROFIT) S FROM CTE)
PIVOT (ARRAY_AGG(S) FOR PERIOD IN ('L13','L26','L52')
)
)
Example with aggregations (a second L52 row for X2 was added, with sales 1700 and profit 1130):
WITH CTE AS(
SELECT 'X1' PRODUCT_ID,'L13' PERIOD,100 SALES,10 PROFIT
UNION SELECT 'X1' PRODUCT_ID,'L26' PERIOD,200 SALES,20 PROFIT
UNION SELECT 'X1' PRODUCT_ID,'L52' PERIOD,300 SALES,30 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L13' PERIOD,500 SALES,110 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L26' PERIOD,600 SALES,120 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L52' PERIOD,700 SALES,130 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L52' PERIOD,1700 SALES,1130 PROFIT)
SELECT
PRODUCT_ID
,"'L13'"[0][0] SALES_L13
,"'L13'"[0][1] PROFIT_L13
,"'L26'"[0][0] SALES_L26
,"'L26'"[0][1] PROFIT_L26
,"'L52'"[0][0] SALES_L52
,"'L52'"[0][1] PROFIT_L52
FROM
(SELECT * FROM
(
SELECT PRODUCT_ID, PERIOD,ARRAY_CONSTRUCT(SUM(SALES),SUM(PROFIT)) S FROM CTE GROUP BY 1,2)
PIVOT (ARRAY_AGG(S) FOR PERIOD IN ('L13','L26','L52')
)
)
Here's an alternative form using OBJECT_AGG with LATERAL FLATTEN that avoids the potential support issue of PIVOT with ARRAY_AGG proposed by Adrian White.
This should work for any aggregates on multiple input columns included within the initial ARRAY_CONSTRUCT in the OBJ_TALL CTE. I expect that the conditional aggregation option with CASE statements would be faster but you'd need to test at scale to see.
-- OBJECT FORM USING LATERAL FLATTEN
WITH CTE AS(
SELECT 'X1' PRODUCT_ID,'L13' PERIOD,100 SALES,10 PROFIT
UNION SELECT 'X1' PRODUCT_ID,'L26' PERIOD,200 SALES,20 PROFIT
UNION SELECT 'X1' PRODUCT_ID,'L52' PERIOD,300 SALES,30 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L13' PERIOD,500 SALES,110 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L26' PERIOD,600 SALES,120 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L52' PERIOD,700 SALES,130 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L52' PERIOD,1700 SALES,1130 PROFIT)
,OBJ_TALL AS ( SELECT PRODUCT_ID,
OBJECT_CONSTRUCT(PERIOD,
ARRAY_CONSTRUCT( SUM(SALES)
,SUM(PROFIT)
)
) S
FROM CTE
GROUP BY PRODUCT_ID, PERIOD)
-- SELECT * FROM OBJ_TALL;
,OBJ_WIDE AS ( SELECT PRODUCT_ID, OBJECT_AGG(KEY,VALUE) OA
FROM OBJ_TALL, LATERAL FLATTEN(INPUT => S)
GROUP BY PRODUCT_ID)
-- SELECT * FROM OBJ_WIDE;
SELECT
PRODUCT_ID
,OA:L13[0] SALES_L13
,OA:L13[1] PROFIT_L13
,OA:L26[0] SALES_L26
,OA:L26[1] PROFIT_L26
,OA:L52[0] SALES_L52
,OA:L52[1] PROFIT_L52
FROM OBJ_WIDE
ORDER BY 1;
For easy comparison with the above, here's Adrian's ARRAY_AGG and PIVOT version reformatted using CTEs.
-- ARRAY FORM - RE-WRITTEN WITH CTES FOR CLARITY AND COMPARISON TO OBJECT FORM
WITH CTE AS(
SELECT 'X1' PRODUCT_ID,'L13' PERIOD,100 SALES,10 PROFIT
UNION SELECT 'X1' PRODUCT_ID,'L26' PERIOD,200 SALES,20 PROFIT
UNION SELECT 'X1' PRODUCT_ID,'L52' PERIOD,300 SALES,30 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L13' PERIOD,500 SALES,110 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L26' PERIOD,600 SALES,120 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L52' PERIOD,700 SALES,130 PROFIT
UNION SELECT 'X2' PRODUCT_ID,'L52' PERIOD,1700 SALES,1130 PROFIT)
,ARR_TALL AS (SELECT PRODUCT_ID,
PERIOD,
ARRAY_CONSTRUCT( SUM(SALES)
,SUM(PROFIT)
) S
FROM CTE GROUP BY 1,2)
,ARR_WIDE AS (SELECT *
FROM ARR_TALL PIVOT (ARRAY_AGG(S) FOR PERIOD IN ('L13','L26','L52') ) )
SELECT
PRODUCT_ID
,"'L13'"[0][0] SALES_L13
,"'L13'"[0][1] PROFIT_L13
,"'L26'"[0][0] SALES_L26
,"'L26'"[0][1] PROFIT_L26
,"'L52'"[0][0] SALES_L52
,"'L52'"[0][1] PROFIT_L52
FROM ARR_WIDE
ORDER BY 1;
I believe you can only have one pivot at a time, but you can check by running the first query below. Then you can run it with a single pivot to see whether that works. If multiple pivots are not allowed (i.e. the first query fails), you can use the third query (the CASE WHEN method), or use a union first to combine the columns (Phil Culson's method from above).
select *
from [table name]
pivot(sum(sales) for PERIOD in ('L13', 'L26', 'L52')),
pivot(sum(profit) for PERIOD in ('L13', 'L26', 'L52'))
order by product_id;
If the above doesn't work, try with a single pivot, for example:
https://count.co/sql-resources/snowflake/pivot-tables
select *
from [table name]
pivot(sum(sales) for PERIOD in ('L13', 'L26', 'L52'))
order by product_id;
Otherwise you will have to apply the manual case when logic:
select
product_id,
sum(case when Period = 'L13' then Sales end) as sales_l13,
sum(case when Period = 'L26' then Sales end) as sales_l26,
sum(case when Period = 'L52' then Sales end) as sales_l52,
sum(case when Period = 'L13' then Profit end) as profit_l13,
sum(case when Period = 'L26' then Profit end) as profit_l26,
sum(case when Period = 'L52' then Profit end) as profit_l52
from [table name]
group by 1
I'm trying to produce a table that lists the month, account and product name from our billing database. However, I also want to understand (for subsequent cohort analysis) what the earliest use is of "Product A" for each line item too. I was hoping I could do the following:
SELECT
Month,
AccountID,
ProductName,
SUM(NetRevenue) AS NetRevenue,
MIN(Month) OVER(PARTITION BY AccountID, 'Product A') AS EarliestUse
FROM
<<my-billing-table>>
WHERE
NetRevenue > 0
AND AccountID IN (
SELECT DISTINCT AccountID
FROM <<my-billing-table>>
WHERE ProductName = 'Product A' AND NetRevenue > 0
)
GROUP BY 1,2,3
...but it seems that just using "Product A" within the OVER clause does not have the desired effect (it seems to just return the first month for AccountID).
While the syntax is fine and the query runs, I'm obviously missing something regarding PARTITIONing the OVER clause. Any help much appreciated!
I think you want conditional aggregation along with a window function:
SELECT Month, AccountID, ProductName,
SUM(NetRevenue) AS NetRevenue,
MIN(MIN(CASE WHEN ProductName = 'Product A' THEN month END)) OVER (PARTITION BY AccountID) AS EarliestUse
FROM <<my-billing-table>>
WHERE NetRevenue > 0 AND
AccountID IN (SELECT AccountID
FROM <<my-billing-table>>
WHERE ProductName = 'Product A' AND NetRevenue > 0
)
GROUP BY 1,2,3;
The key expression here is an aggregation function nested inside a window function. The aggregation function is MIN(CASE WHEN ProductName = 'Product A' THEN month END). It calculates the earliest month for the specified product for each row of the grouped result. This could be a column in the result set, and you would see the minimum value on the Product A rows and NULL on the others.
The window function then "spreads" this value over all rows for a given AccountID.
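To see the two layers separately, here is a minimal sketch against an in-memory SQLite database (window functions need SQLite 3.25 or later). It is not the answer's exact query: the aggregation and the window function are split into two steps through a CTE, the IN filter is left out for brevity, and the `billing` table, its snake_case column names, and the sample rows are all assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical billing table; months are ISO strings so MIN() sorts them correctly.
conn.execute(
    "CREATE TABLE billing (month TEXT, account_id INTEGER, "
    "product_name TEXT, net_revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO billing VALUES (?, ?, ?, ?)",
    [
        ("2020-01", 1, "Product B", 10),
        ("2020-02", 1, "Product A", 20),
        ("2020-03", 1, "Product A", 30),
        ("2020-03", 1, "Product A", 15),
        ("2020-03", 1, "Product B", 5),
        ("2020-01", 2, "Product C", 3),
        ("2020-03", 2, "Product A", 7),
    ],
)

query = """
WITH grouped AS (
    -- Step 1: the aggregation. first_a_month is NULL unless the group is Product A.
    SELECT month, account_id, product_name,
           SUM(net_revenue) AS net_revenue,
           MIN(CASE WHEN product_name = 'Product A' THEN month END) AS first_a_month
    FROM billing
    WHERE net_revenue > 0
    GROUP BY month, account_id, product_name
)
-- Step 2: the window function spreads the per-account minimum over every row.
SELECT month, account_id, product_name, net_revenue,
       MIN(first_a_month) OVER (PARTITION BY account_id) AS earliest_use
FROM grouped
ORDER BY account_id, month, product_name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Every row of account 1 carries 2020-02 and every row of account 2 carries 2020-03, including the rows for other products.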
You are using a constant in the PARTITION BY clause, so it has no effect on the result. Use the ProductName column in the partition instead to get the earliest use of each product:
SELECT
Month,
AccountID,
ProductName,
SUM(NetRevenue) AS NetRevenue,
MIN(Month) OVER(PARTITION BY AccountID, ProductName) AS EarliestUse
FROM
<<my-billing-table>>
WHERE
NetRevenue > 0
AND AccountID IN (
SELECT DISTINCT AccountID
FROM <<my-billing-table>>
WHERE ProductName = 'Product A' AND NetRevenue > 0
)
GROUP BY 1,2,3
I have a sample DB below. I'm looking to see how many TV and Internet bundles we sold. In the sample data, only Bob and Trevor sold that bundle so we sold 2.
How do I write the query for the number of bundles sold by each Sales rep and the total price of the bundles sold?
Thanks
I imagine that, for a bundle to happen, the same sales person needs to have sold both products to the same customer.
I would approach this with two levels of aggregation. First group by sales person and customer in a subquery to identify the bundles, then, in an outer query, count how many such bundles happened for each sales person:
SELECT sales_person, COUNT(*) bundles_sold, SUM(total_price) total_price
FROM (
SELECT sales_person, customer_name, SUM(total_price) total_price
FROM mytable
WHERE product_name in ('TV', 'Internet')
GROUP BY sales_person, customer_name
HAVING COUNT(DISTINCT product_name) = 2
) x
GROUP BY sales_person
You can simply group by sales person and count the distinct products each one sold:
SELECT Sales_Person, FLOOR(COUNT(DISTINCT product_name)/2) NO_OF_BUNDLES, sum(total_price)
FROM YOUR_TAB
WHERE product_name IN ('TV', 'Internet')
GROUP BY Sales_Person
HAVING COUNT(DISTINCT product_name) >= 2
Using cte as below:
with cte1(sales_person, customer_name, product_count) as
(
select sales_person, customer_name, count(product_name)
from sales
where product_name in ('TV', 'Internet')
group by sales_person, customer_name
having count(product_name) = 2
)
select sales_person, count(product_count)
from cte1
group by sales_person
I would suggest two levels of aggregation:
select sales_person, count(*), sum(total_price)
from (select sales_person, customer_name,
sum(total_price) as total_price,
max(case when product_name = 'tv' then 1 else 0 end) as has_tv,
max(case when product_name = 'phone' then 1 else 0 end) as has_phone,
max(case when product_name = 'internet' then 1 else 0 end) as has_internet
from t
group by sales_person, customer_name
) sc
where has_phone = 0 and
has_tv = 1 and
has_internet = 1
group by sales_person;
I recommend this structure because it is pretty easy to change the conditions in the where clause to return this for any bundle -- or even to aggregate by the three flags and return the totals for all bundles in one query.
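As a rough illustration of both uses of the flags, the sketch below runs the per-customer subquery in an in-memory SQLite database, first filtered to the TV and Internet bundle per sales person and then grouped by the three flags. The sample rows, the customer names, and the prices are invented for the example, since the question's sample data is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (sales_person TEXT, customer_name TEXT, "
    "product_name TEXT, total_price INTEGER)"
)
# Hypothetical sales: product names are lowercase to match the query above.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("Bob", "Alice", "tv", 100),
        ("Bob", "Alice", "internet", 50),
        ("Trevor", "Carl", "tv", 120),
        ("Trevor", "Carl", "internet", 60),
        ("Bob", "Dan", "tv", 100),
        ("Bob", "Dan", "phone", 30),
        ("Bob", "Dan", "internet", 50),
        ("Sue", "Eve", "tv", 90),
    ],
)

# First level: one row per (sales person, customer) with a flag per product.
per_customer = """
    SELECT sales_person, customer_name,
           SUM(total_price) AS total_price,
           MAX(CASE WHEN product_name = 'tv' THEN 1 ELSE 0 END) AS has_tv,
           MAX(CASE WHEN product_name = 'phone' THEN 1 ELSE 0 END) AS has_phone,
           MAX(CASE WHEN product_name = 'internet' THEN 1 ELSE 0 END) AS has_internet
    FROM t
    GROUP BY sales_person, customer_name
"""

# Second level, variant 1: TV + Internet bundles per sales person.
by_rep = conn.execute(f"""
    SELECT sales_person, COUNT(*), SUM(total_price)
    FROM ({per_customer}) sc
    WHERE has_tv = 1 AND has_internet = 1 AND has_phone = 0
    GROUP BY sales_person
    ORDER BY sales_person
""").fetchall()

# Second level, variant 2: totals for every flag combination in one query.
by_flags = conn.execute(f"""
    SELECT has_tv, has_phone, has_internet, COUNT(*), SUM(total_price)
    FROM ({per_customer}) sc
    GROUP BY has_tv, has_phone, has_internet
    ORDER BY has_tv, has_phone, has_internet
""").fetchall()

print(by_rep)
print(by_flags)
```

Only the WHERE clause or the GROUP BY of the outer query changes between the two variants; the flag subquery stays the same.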
Transaction -> Transaction_id, buyer_id, seller_id, object_id, Shipping_id, Price, Quantity, site_id, transaction_date, expected_delivery_date, check_out_status, leaf_category_id, defect_id
Buyer -> Buyer_id, name, country
Seller -> Seller_id, name, country, segment, standard
Listing -> object_id, seller_id, auction_start_date, auction_end_date, listing_site_id, leaf_category_id, quantity
For the sellers from the UK who transacted in the second week of December (6 December 2015 to 12 December 2015), find the number of sellers who have at least twice the total transaction amount (qty*price) in the following week.
I have tried the query below to get the sellers who transacted in the second week of December, but I get an error when calculating which of those sellers have twice the transaction amount in the following week.
With trans_dec_uk as
(
select s.seller_id,t.transaction_date, sum(t.Qty * Price) trans_amount
from transaction t join seller s
on t.seller_id =s.seller_id
where s.country ='UK'
and t.transaction_date between '12-05-2015' and '12-18-2015'
group by s.seller_id,t.transaction_date
)
select count(seller) from trans_dec_uk
where trans_amount = 2 * to_char(sysdate+7,'DD-MM')
with uk_sellers as (
select * from <dataset>.Seller where country = 'UK'
),
first_week_uk as (
select seller_id, sum(Price*Quantity) as first_week_total
from <dataset>.Transaction
inner join uk_sellers using(seller_id)
where transaction_date between '2015-12-06' and '2015-12-12'
group by 1
),
second_week_uk as (
select seller_id, sum(Price*Quantity) as second_week_total
from <dataset>.Transaction
inner join uk_sellers using(seller_id)
where transaction_date between '2015-12-13' and '2015-12-19'
group by 1
)
select count(distinct seller_id) as the_answer
from first_week_uk
inner join second_week_uk using(seller_id)
where second_week_total >= 2*first_week_total
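The week-over-week comparison can be checked on a small data set. Below is a minimal sketch using an in-memory SQLite database, with the week boundaries stated in the question (6 to 12 December 2015, then 13 to 19 December 2015). The table name `txn` (chosen because TRANSACTION is a reserved word in SQLite), the reduced column list, and all sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seller (seller_id INTEGER, country TEXT)")
conn.execute(
    "CREATE TABLE txn (seller_id INTEGER, price INTEGER, "
    "quantity INTEGER, transaction_date TEXT)"
)
conn.executemany(
    "INSERT INTO seller VALUES (?, ?)",
    [(1, "UK"), (2, "UK"), (3, "US"), (4, "UK"), (5, "UK")],
)
conn.executemany(
    "INSERT INTO txn VALUES (?, ?, ?, ?)",
    [
        (1, 10, 1, "2015-12-07"), (1, 5, 5, "2015-12-14"),   # 10 -> 25, qualifies
        (2, 20, 2, "2015-12-08"), (2, 10, 5, "2015-12-15"),  # 40 -> 50, too low
        (3, 10, 1, "2015-12-07"), (3, 50, 2, "2015-12-14"),  # not a UK seller
        (4, 30, 1, "2015-12-06"),                            # nothing in week two
        (5, 5, 2, "2015-12-12"), (5, 4, 5, "2015-12-19"),    # 10 -> 20, exactly twice
    ],
)

query = """
WITH uk_sellers AS (
    SELECT seller_id FROM seller WHERE country = 'UK'
),
first_week_uk AS (
    SELECT seller_id, SUM(price * quantity) AS first_week_total
    FROM txn JOIN uk_sellers USING (seller_id)
    WHERE transaction_date BETWEEN '2015-12-06' AND '2015-12-12'
    GROUP BY seller_id
),
second_week_uk AS (
    SELECT seller_id, SUM(price * quantity) AS second_week_total
    FROM txn JOIN uk_sellers USING (seller_id)
    WHERE transaction_date BETWEEN '2015-12-13' AND '2015-12-19'
    GROUP BY seller_id
)
SELECT COUNT(DISTINCT seller_id)
FROM first_week_uk JOIN second_week_uk USING (seller_id)
WHERE second_week_total >= 2 * first_week_total
"""
answer = conn.execute(query).fetchone()[0]
print(answer)
```

The inner join between the two weekly CTEs drops sellers with no transactions in the following week, so they are never counted.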
Good day.
I have the following tables:
Order_Header(Order_id {pk}, customer_id {fk}, agent_id {fk}, Order_date(DATE FORMAT))
Invoice_Header (Invoice_ID {pk}, Customer_ID {fk}, Agent_ID{fk}, invoice_Date{DATE FORMAT} )
Stock( Product_ID {pk}, Product_description)
I created a table called AVG_COMPLETION_TIME_FACT and want to populate it with the following values regarding the previous 3 tables:
Product_ID
Invoice_month
Invoice_Year
AVG_Completion_Time (Invoice_date - Order_date)
I have the following code that doesn't work:
INSERT INTO AVG_COMPLETION_TIME_FACT(
SELECT PRODUCT_ID, EXTRACT (YEAR FROM INVOICE_DATE), EXTRACT (MONTH FROM INVOICE_DATE), (INVOICE_DATE - ORDER_DATE)
FROM STOCK, INVOICE_HEADER, ORDER_HEADER
GROUP BY PRODUCT_ID, EXTRACT (YEAR FROM INVOICE_DATE), EXTRACT (MONTH FROM INVOICE_DATE)
);
I want to group it by the product_id, year of invoice and month of invoice.
Is this possible?
Any advice would be much appreciated.
Regards
Short answer: it may be possible - if your database contains some more columns that are needed for writing the correct query.
There are several problems, apart from the syntactical ones. When we create some test tables, you can see that the answer you are looking for cannot be derived from the columns you have provided in your question. Example tables (Oracle 12c), all PK/FK constraints omitted:
-- 3 tables, similar to the ones described in your question,
-- including some test data
create table order_header (id, customer_id, agent_id, order_date )
as
select 1000, 100, 1, date'2018-01-01' from dual union all
select 1001, 100, 2, date'2018-01-02' from dual union all
select 1002, 100, 3, date'2018-01-03' from dual
;
create table invoice_header ( id, customer_id, agent_id, invoice_date )
as
select 2000, 100, 1, date'2018-02-01' from dual union all
select 2001, 100, 2, date'2018-03-11' from dual union all
select 2002, 100, 3, date'2018-04-21' from dual
;
create table stock( product_id, product_description)
as
select 3000, 'product3000' from dual union all
select 3001, 'product3001' from dual union all
select 3002, 'product3002' from dual
;
If you join the tables as you have done it (using a cross join), you will see that you get more rows than expected ... But: Neither the invoice_header table, nor the order_header table contains any PRODUCT_ID data. Thus, we cannot tell which product_ids are associated with the stored order_ids or invoice_ids.
select
product_id
, extract( year from invoice_date )
, extract( month from invoice_date )
, invoice_date - order_date
from stock, invoice_header, order_header -- cross join -> too many rows in the resultset!
-- group by ...
;
...
27 rows selected.
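The row count follows directly from the table sizes: a cross join returns every combination, so three tables of 3 rows each give 3 * 3 * 3 = 27 rows. A minimal sketch that confirms this in an in-memory SQLite database, assuming the same test data as above with the dates stored as ISO strings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same shape as the Oracle test tables above; dates are plain ISO strings here.
conn.executescript("""
    CREATE TABLE order_header (id INTEGER, customer_id INTEGER, agent_id INTEGER, order_date TEXT);
    INSERT INTO order_header VALUES
        (1000, 100, 1, '2018-01-01'), (1001, 100, 2, '2018-01-02'), (1002, 100, 3, '2018-01-03');
    CREATE TABLE invoice_header (id INTEGER, customer_id INTEGER, agent_id INTEGER, invoice_date TEXT);
    INSERT INTO invoice_header VALUES
        (2000, 100, 1, '2018-02-01'), (2001, 100, 2, '2018-03-11'), (2002, 100, 3, '2018-04-21');
    CREATE TABLE stock (product_id INTEGER, product_description TEXT);
    INSERT INTO stock VALUES
        (3000, 'product3000'), (3001, 'product3001'), (3002, 'product3002');
""")

# Row count of each table on its own.
sizes = [
    conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]
    for name in ("stock", "invoice_header", "order_header")
]

# Comma-separated FROM list with no join condition = cross join.
cross_rows = conn.execute(
    "SELECT COUNT(*) FROM stock, invoice_header, order_header"
).fetchone()[0]

print(sizes, cross_rows)
```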
For getting your query right, you should probably write INNER JOINs and conditions (keyword: ON). If we try to do this with your original table definitions (as provided in your question) you will see that we cannot join all 3 tables, as they do not contain all the columns needed: PRODUCT_ID (table STOCK) cannot be associated with ORDER_HEADER or INVOICE_HEADER.
One column that these 2 tables (ORDER_HEADER and INVOICE_HEADER) do have in common is: customer_id, but that's not enough for answering your question. However, we can use it for demonstrating how you could code the JOINs.
select
-- product_id
IH.customer_id as cust_id
, OH.id as OH_id
, IH.id as IH_id
, extract( year from invoice_date ) as year_
, extract( month from invoice_date ) as month_
, invoice_date - order_date as completion_time
from invoice_header IH
join order_header OH on IH.customer_id = OH.customer_id
-- the stock table cannot be joined at this stage
;
Missing columns:
Please regard the following just as "proof of concept" code. Assuming that somewhere in your database, you have tables that have columns that {1} link STOCK and ORDER_HEADER (name here: STOCK_ORDER) and {2} link ORDER_HEADER and INVOICE_HEADER (name here: ORDER_INVOICE), you could actually get the information you want.
-- each ORDER_HEADER is mapped to multiple product_ids
create table stock_order
as
select S.product_id, OH.id as oh_id -- STOCK and ORDER_HEADER
from stock S, order_header OH ; -- cross join, we use all possible combinations here
select product_id, oh_id
from stock_order
order by product_id, oh_id
;
PRODUCT_ID      OH_ID
---------- ----------
      3000       1000
      3000       1001
      3000       1002
      3001       1000
      3001       1001
      3001       1002
      3002       1000
      3002       1001
      3002       1002
9 rows selected.
-- each INVOICE_HEADER mapped to a single ORDER_HEADER
create table order_invoice ( order_id, invoice_id )
as
select 1000, 2000 from dual union all
select 1001, 2001 from dual union all
select 1002, 2002 from dual
;
For querying, make sure that you code the correct JOIN conditions (ON ...), e.g.:
-- example query. NOTICE: conditions in ON ...
select
S.product_id
, IH.customer_id as cust_id
, OH.id as OH_id
, IH.id as IH_id
, extract( year from invoice_date ) as year_
, extract( month from invoice_date ) as month_
, invoice_date - order_date as completion_time
from invoice_header IH
join order_invoice OI on IH.id = OI.invoice_id -- <- new "link" table
join order_header OH on OI.order_id = OH.id
join stock_order SO on OH.id = SO.OH_id -- <- new "link" table
join stock S on S.product_id = SO.product_id
;
Now you can add the GROUP BY, and SELECT only the columns you need. Combined with an INSERT, you should write something like ...
-- example avg_completion_time_fact table.
create table avg_completion_time_fact (
product_id number
, year_ number
, month_ number
, avg_completion_time number
) ;
insert into avg_completion_time_fact ( product_id, year_, month_, avg_completion_time )
select
S.product_id
, extract( year from invoice_date ) as year_
, extract( month from invoice_date ) as month_
, avg( invoice_date - order_date ) as avg_completion_time
from invoice_header IH
join order_invoice OI on IH.id = OI.invoice_id
join order_header OH on OI.order_id = OH.id
join stock_order SO on OH.id = SO.OH_id
join stock S on S.product_id = SO.product_id
group by S.product_id, extract( year from invoice_date ), extract( month from invoice_date )
;
The AVG_COMPLETION_TIME_FACT table now contains:
SQL> select * from avg_completion_time_fact order by product_id ;
PRODUCT_ID      YEAR_     MONTH_ AVG_COMPLETION_TIME
---------- ---------- ---------- -------------------
      3000       2018          3                  68
      3000       2018          4                 108
      3000       2018          2                  31
      3001       2018          3                  68
      3001       2018          2                  31
      3001       2018          4                 108
      3002       2018          3                  68
      3002       2018          4                 108
      3002       2018          2                  31
It is not completely clear what the final query for your database (or schema) will look like, as we don't know the definitions of all the tables it contains. However, if you apply the techniques and stick to the syntax of the examples, you should be able to obtain the required results. Best of luck!