Can't use column alias in GROUP BY - sql

I can run this in mysql with no problem
SELECT
DATE_FORMAT(trans_date, '%Y-%m') month,
COUNTRY, COALESCE(COUNT(*), 0) trans_count,
COALESCE(SUM(CASE WHEN state ='approved' THEN 1 END), 0) approved_count,
COALESCE(SUM(amount), 0) trans_total_amount,
COALESCE(SUM(CASE WHEN state ='approved' THEN amount END), 0) approved_total_amount
FROM
Transactions
GROUP BY
month, COUNTRY
ORDER BY
month;
but the same query doesn't run in Orcale, I can't use GROUP BY using aggregation alias, and I can't aggregate without using GROUP BY.
I can call subquery over subquery or use CTE, but it is just so tedious.
What is a good query for type of issue?

As mentioned in another answer, You can not add aliases in GROUP BY but you can add aliases in ORDER BY. Also, DATE_FORMAT is MySql function. It is TO_CHAR in Oracle.
So your final query should be as following:
SELECT
TO_CHAR(TRANS_DATE, 'YYYY-MM') AS MONTH,
COUNTRY,
COUNT(*) AS TRANS_COUNT,
SUM(CASE WHEN STATE = 'approved' THEN 1 ELSE 0 END) AS APPROVED_COUNT,
SUM(AMOUNT) AS TRANS_TOTAL_AMOUNT,
SUM(CASE WHEN STATE = 'approved' THEN AMOUNT ELSE 0 END) AS APPROVED_TOTAL_AMOUNT
FROM TRANSACTIONS
GROUP BY TO_CHAR(TRANS_DATE, 'YYYY-MM'), COUNTRY
ORDER BY MONTH;

Oracle doesn't support aliases for the GROUP BY. Also, the COALESCE() is unnecessary in this case:
SELECT DATE_FORMAT(trans_date, '%Y-%m') as month, COUNTRY,
COUNT(*) as trans_count,
SUM(CASE WHEN state ='approved' THEN 1 ELSE 0 END) as approved_count,
SUM(amount) as trans_total_amount,
SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) as approved_total_amount
FROM Transactions
GROUP BY DATE_FORMAT(trans_date, '%Y-%m'), COUNTRY
ORDER BY month;

Related

Getting sum of distinct values when grouping by an unnested value in google big query

I am querying google big query table which has many rows but the ones I am interested in looks like this:
date fullVisitorId hits.product.productSKU hits.product.v2ProductName hits.transaction.transactionId
20210427 63546815 MM52AF panda 149816182
20210427 65198162 KGSA5A giraffe 321498182
I am trying to count the total transactions by counting distinct hits.transaction.transactionId.
with t1 as
(
SELECT
DATE_TRUNC(PARSE_DATE("%Y%m%d", date), MONTH) as month,
fullVisitorId,
product.productSKU as sku,
product.v2ProductName as v2,
case when hits.ecommerceaction.action_type = '2' then 1 else 0 end as pdp_visitor,
count(case when hits.ecommerceaction.action_type = '2' then fullvisitorid else null end) AS views_pdp,
count(case when hits.ecommerceaction.action_type = '3' then fullvisitorid else null end) AS add_cart,
count(case when hits.ecommerceaction.action_type = '6' then hits.transaction.transactionid else null end) AS conversions,
count(distinct(hits.transaction.transactionId)) as transaction_id_cnt,
FROM `table` AS nr,
UNNEST(hits) hits,
UNNEST(product) product
GROUP BY 1,2,3,4,5
)
select
month,
product.productSKU as sku,
product.v2ProductName as v2,
sum(views_pdp) as pdp
,sum(add_cart) as add_cart
,sum(conversions) as conversions
,sum(transaction_id_cnt) as transactions
from t1
group by 1
order by 1 desc;
Which returns:
month sku v2 pdp add_cart conversions transactions
2021-04-01 AHBS 615 10146410 365569 46885 46640
2021-03-01 HERD 154 10074095 399483 58162 57811
But transactions is not correct, I get the correct output using this:
with t1 as
(
SELECT
DATE_TRUNC(PARSE_DATE("%Y%m%d", date), MONTH) as month,
fullVisitorId,
ARRAY_AGG(DISTINCT product.productSKU IGNORE NULLS) AS productSKU_list, -- changed this
ARRAY_AGG(DISTINCT product.v2ProductName IGNORE NULLS) AS productName_list, -- changed this
case when hits.ecommerceaction.action_type = '2' then 1 else 0 end as pdp_visitor,
0 AS views_impressions,
count(case when hits.ecommerceaction.action_type = '2' then fullvisitorid else null end) AS views_pdp,
count(case when hits.ecommerceaction.action_type = '3' then fullvisitorid else null end) AS add_cart,
0 AS add_shortlist,
count(case when hits.ecommerceaction.action_type = '5' then fullvisitorid else null end) AS checkouts,
count(case when hits.ecommerceaction.action_type = '6' then hits.transaction.transactionid else null end) AS conversions,
count(distinct(hits.transaction.transactionId)) as transaction_id_cnt,
FROM `table` AS nr,
UNNEST(hits) hits,
UNNEST(product) product
GROUP BY 1,2,5
)
select
month,
sum(views_pdp) as pdp
,sum(add_cart) as add_cart
,sum(conversions) as conversions
,sum(transaction_id_cnt) as transactions
from t1
group by 1
order by 1 desc;
Which returns the correct transactions
month pdp add_cart conversions transactions
2021-04-01 9978511 396333 46885 30917
2021-03-01 15101718 568904 58162 23017
But using this :
...
ARRAY_AGG(DISTINCT product.productSKU IGNORE NULLS) AS productSKU_list,
ARRAY_AGG(DISTINCT product.v2ProductName IGNORE NULLS) AS productName_list,
...
Does not allow me to group or select productSKU_list and productName_list in my second select statement.
I believe this is because if one order is made with multiple items in the basket there are multiple lines in google big query with the same hits.transaction.transactionId
I tried confirming this with:
select distinct(hits.transaction.transactionId), count(distinct hits.transaction.transactionId) as total
FROM `table` AS nr,
UNNEST(hits) hits,
UNNEST(product) product
WHERE _TABLE_SUFFIX between '200101' AND '210428'
GROUP BY 1
order by 2 desc
But I get:
transactionId total
ABSAD54 1
515ABDG 1
So at this point, I am lost, as I am unsure why I get the correct answer if I use the second script or when I comment out this part from the first query.
--product.productSKU,
--product.v2ProductName,
Any tips on how google big query works is accepted.
My goal is to have the correct output of transactions which is achieved in the second script but still be able to group and have values of product.productSKU and product.v2ProductName.
in your second query , you need to aggregate them again :
select
month,
sum(views_pdp) as pdp
,sum(add_cart) as add_cart
,sum(conversions) as conversions
,sum(transaction_id_cnt) as transactions
,ARRAY_AGG(productSKU_list)
,ARRAY_AGG(productName_list)
from t1
group by month
order by month desc;

Sum of distinct values after grouping explodes a metric

I am using
with t1 as
(
SELECT
DATE_TRUNC(PARSE_DATE("%Y%m%d", date), MONTH) as month,
fullVisitorId,
product.productSKU,
product.v2ProductName,
case when hits.ecommerceaction.action_type = '2' then 1 else 0 end as pdp_visitor,
count(case when hits.ecommerceaction.action_type = '2' then fullvisitorid else null end) AS views_pdp,
count(case when hits.ecommerceaction.action_type = '3' then fullvisitorid else null end) AS add_cart,
count(case when hits.ecommerceaction.action_type = '6' then hits.transaction.transactionid else null end) AS conversions,
count(distinct(hits.transaction.transactionId)) as transaction_id_cnt,
FROM `table` AS nr,
UNNEST(hits) hits,
UNNEST(product) product
GROUP BY 1,2,3,4,5
)
select
month,
sum(views_pdp) as pdp
,sum(add_cart) as add_cart
,sum(conversions) as conversions
,sum(transaction_id_cnt)
from t1
group by 1
order by 1 desc;
Which returns
month pdp add_cart conversions f0_
2021-02-01 500 100 20 10
2021-01-01 600 200 30 20
I know that f0_ ( count(distinct(hits.transaction.transactionId)) ) is bad here because of product.productSKU and product.v2ProductName grouping.
In general, when user makes an order with 3 items in his basket, I want to count this as one order, whereas now it is counted as 3.
This count(distinct(hits.transaction.transactionId)) as transaction_id_cnt results in the correct output if I comment out product.productSKU and product.v2ProductName.
Running this query:
with t1 as
(
SELECT
DATE_TRUNC(PARSE_DATE("%Y%m%d", date), MONTH) as month,
fullVisitorId,
-- product.productSKU, # commented out
-- product.v2ProductName, # commented out
case when hits.ecommerceaction.action_type = '2' then 1 else 0 end as pdp_visitor,
count(case when hits.ecommerceaction.action_type = '2' then fullvisitorid else null end) AS views_pdp,
count(case when hits.ecommerceaction.action_type = '3' then fullvisitorid else null end) AS add_cart,
count(case when hits.ecommerceaction.action_type = '6' then hits.transaction.transactionid else null end) AS conversions,
count(distinct(hits.transaction.transactionId)) as transaction_id_cnt,
FROM `table` AS nr,
UNNEST(hits) hits,
UNNEST(product) product
GROUP BY 1,2,3,4,5
)
select
month,
sum(views_pdp) as pdp
,sum(add_cart) as add_cart
,sum(conversions) as conversions
,sum(transaction_id_cnt)
from t1
group by 1
order by 1 desc;
Returns what is expected, but now I don't have productSKU and v2ProductName which I need. I suspect that the problem is that each order is a new line in google big query and when I ask to to select it by product name and SKU, I count the uniques and then sum it.
How can I achieve the correct summation of count(distinct(hits.transaction.transactionId)) without losing the grouping by product.productSKU and product.v2ProductName which explodes this metric?
On the group by Query you could cherry pick them as array(so you don't group by them):
ARRAY_AGG(DISTINCT product.productSKU IGNORE NULLS) AS productSKU_list,
ARRAY_AGG(DISTINCT product.v2ProductName IGNORE NULLS) AS productName_list,
Update per your below comment: If you want to use them in further group by just save them as string instead of array.
STRING_AGG(DISTINCT product.productSKU, ',') AS productSKU_list,
STRING_AGG(DISTINCT product.v2ProductName, ',') AS productName_list,

Advanced SQL with window function

I have Table a(Dimension table) and Table B(Fact table) stores transaction shopper history.
Table a : shopped id(surrogate key) created for unique combination(any of column 2,colum3,column4 repeated it will have same shopper id)
Table b is transaction data.
I am trying to identify New customers and repeated customers for each week, expected output is below.
I am thinking following SQL Statement
Select COUNT(*) OVER (PARTITION BY shopperid,weekdate) as total_new_shopperid for Repeated customer,
for Identifying new customer(ie unique) in same join condition, I am stuck on window function..
thanks,
Sam
You can use the DENSE_RANK analytical function along with aggregate function as follows:
SELECT WEEK_DATE,
COUNT(DISTINCT CASE WHEN DR = 1 THEN SHOPPER_ID END) AS TOTAL_NEW_CUSTOMER,
SUM(CASE WHEN DR = 1 THEN AMOUNT END) AS TOTAL_NEW_CUSTOMER_AMT,
COUNT(DISTINCT CASE WHEN DR > 1 THEN SHOPPER_ID END) AS TOTAL_REPEATED_CUSTOMER,
SUM(CASE WHEN DR > 1 THEN AMOUNT END) AS TOTAL_REPEATED_CUSTOMER_AMT
FROM
(
select T.*,
DENSE_RANK() OVER (PARTITION BY SHOPPER_ID ORDER BY WEEK_DATE) AS DR
FROM YOUR_TABLE T);
GROUP BY WEEK_DATE;
Cheers!!
Tejash's answer is fine (and I'm upvoting it).
However, Oracle is quite efficient with aggregation, so two levels of aggregation might have better performance (depending on the data):
select week_date,
sum(case when min_week_date = week_date then 1 else 0 end) as new_shoppers,
sum(case when min_week_date = week_date then amount else 0 end) as new_shopper_amount,
sum(case when min_week_date > week_date then 1 else 0 end) as returning_shoppers,
sum(case when min_week_date > week_date then amount else 0 end) as returning_amount
from (select shopper_id, week_date,
sum(amount) as amount,
min(week_date) over (partition by shopper_id) as min_week_date
from t
group by shopper_id, week_date
) sw
group by week_date
order by week_date;
Note: If this has better performance, it is probably due to the elimination of count(distinct).

AVG duplication SQL

I'm currently having issues creating two columns with AVG for different date ranges.
I've tried the below code to try and resolve this.
WITH Tbl AS(
SELECT FORMAT(SaleDate,'MM')+'.'+FORMAT(SaleDate,'yyyy') AS SALE_MY, Employee, NewScheme
FROM Salereport
WHERE Business Area='Sales'
)
SELECT
AgentName,
(SELECT AVG(NewScheme) FROM Tbl WHERE SALE_MY='01.2019' OR SALE_MY='02.2019' OR SALE_MY='03.2019'),
(SELECT AVG(NewScheme) FROM Tbl WHERE SALE_MY='04.2019' OR SALE_MY='05.2019' OR SALE_MY='06.2019')
FROM Tbl
GROUP BY Employee
Result is just the same AVG for everyone.
Use conditional aggregation:
SELECT Employee,
AVG(CASE WHEN SaleDate >= '2019-01-01' AND SaleDate < '2019-04-01'
THEN NewScheme
END),
AVG(CASE WHEN SaleDate >= '2019-01-04' AND SaleDate < '2019-04-07'
THEN NewScheme
END),
FROM Salereport
WHERE Business Area = 'Sales'
GROUP BY Employee;
When working with dates, you should be using date operations. The only time you normally need to convert to a string is to format dates in the result set.
Incidentally, your version is taking the average across all employees. No subquery is needed, but if you were to use one, you would want a correlated subquery.
Try this:
WITH Tbl AS(
SELECT FORMAT(SaleDate,'MM')+'.'+FORMAT(SaleDate,'yyyy') AS SALE_MY, Employee, NewScheme
FROM Salereport
WHERE Business Area='Sales'
)
SELECT Employee,
AVG(CASE WHEN SALE_MY IN ('01.2019', '02.2019', '03.2019') THEN NewScheme ELSE NULL END) AS Q1_Avg,
AVG(CASE WHEN SALE_MY IN ('04.2019', '05.2019', '06.2019') THEN NewScheme ELSE NULL END) AS Q2_Avg
FROM Tbl
GROUP BY Employee

MSSQL Group by and Select rows from grouping

I'm trying to figure out if what I'm trying to do is possible. Instead of resorting to multiple queries on a table, I wanted to group the records by business date and id then group by the id and select one date for a field and another date for the other field.
SELECT
*
{AMOUNT FROM DATE}
{AMOUNT FROM OTHER DATE}
FROM (
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
AS subquery
GROUP BY id
It seems that you're looking to do a pivot query. I usually use cross tabs for this. Based on the query you posted, it could look like:
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM (
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
)AS subquery
GROUP BY id;
You could also use a CTE.
WITH CTE AS(
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
)
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM CTE
GROUP BY id;
Or even be a rebel and do the operation directly.
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM CTE
GROUP BY id;
However, some people have tested for performance and found that pre-aggregating can improve performance.
If I understand you correctly, then you're just trying to pivot, but only with two particular dates:
select id,
date1 = sum(iif(date = '2000-01-01', amount, null)),
date2 = sum(iif(date = '2000-01-02', amount, null))
from [table]
group by id