Why my CASE WHEN gave me an AGGREGATION error message? - google-bigquery

I'm trying to make a promo grouping using one promo_code field in a month where there's a chance that a single customer_ID would have more than one transaction and could have two different promo code
SELECT customer_id AS buyer,
CASE
WHEN COUNT(DISTINCT flag_promo) = 2 THEN 'Mixed'
WHEN COUNT(DISTINCT flag_promo) = 1 AND flag_promo = 1 THEN 'Promo'
WHEN COUNT(DISTINCT flag_promo) = 1 AND flag_promo = 0 THEN 'Organic'
END AS promo_group
FROM TABLE
WHERE DATE BETWEEN '2019-04-01' AND '2019-04-30'
GROUP BY 1
ORDER BY 2
It gave me an error message :
SELECT list expression references column flag_promo which is neither grouped nor aggregated at [4:41]

Below is for BigQuery Standard SQL
#standardSQL
SELECT customer_id AS buyer,
CASE
WHEN COUNT(DISTINCT flag_promo) > 1 THEN 'Mixed'
WHEN ANY_VALUE(flag_promo) = 1 THEN 'Promo'
WHEN ANY_VALUE(flag_promo) = 2 THEN 'Organic'
END AS promo_group
FROM `project.dataset.table`
WHERE DATE BETWEEN '2019-04-01' AND '2019-04-30'
GROUP BY 1
ORDER BY 2

This is the query I think you intended to do:
SELECT
customer_id AS buyer,
CASE WHEN COUNT(DISTINCT flag_promo) = 2 THEN 'Mixed'
WHEN COUNT(DISTINCT flag_promo) = 1 AND MIN(flag_promo) = 1 THEN 'Promo'
WHEN COUNT(DISTINCT flag_promo) = 1 AND MIN(flag_promo) = 2 THEN 'Organic'
END AS promo_group
FROM TABLE
WHERE
DATE BETWEEN '2019-04-01' AND '2019-04-30'
GROUP BY 1
ORDER BY 2;
This assumes that a flag_promo value of 1 means Promo and a value of 2 means Organic. If not, then we can easily edit the above query.

Related

Combine 2 queries together

I am struggling to work out combining a query that should give me 3 columns of Month, total_sold_products and drinks_sold_products
Query 1:
Select month(date), count(id) as total_sold_products
from Products
where date between '2022-01-01' and '2022-12-31'
Query 2
Select month(date), count(id) as drinks_sold_products
from Products where type = 'drinks' and date between '2022-01-01' and '2022-12-31'
I tried the union function but it summed count(id) twice and gave me only 2 columns
Many thanks!
Union is for attaching sets of data on top of each other. You need conditional aggregation or a join. See below.
SELECT MONTH(date),
COUNT(*) AS total_sold_products,
COUNT(CASE WHEN type = 'drinks' THEN 1 ELSE 0 END) AS drinks_sold_products,
FORMAT((CASE
WHEN COUNT(*) > 0 THEN
COUNT(CASE WHEN type = 'drinks' THEN 1 ELSE 0 END)/COUNT(*)
ELSE 0 END),
'P') AS Percentage
FROM Products
WHERE date BETWEEN'2022-01-01' AND '2022-12-31'
GROUP BY MONTH(date)

How to exclude 0 from count()? in sql?

I have a code as below where I want to count number of first purchases for a given period of time. I have a column in my sales table where if the buyer is not a first time buyer, then is_first_purchase = 0
For example:
buyer_id = 456391 is already an existing buyer who made purchases on 2 different dates.
Hence is_first_purchase column will show as 0 as per below.
If i do a count() on is_first_purchase for this buyer_id = 456391 then it should return 0 instead of 2.
My query is as follows:
with first_purchases as
(select *,
case when is_first_purchase = 1 then 'Yes' else 'No' end as first_purchase
from sales)
select
count(case when first_purchase = 'Yes' then 1 else 0 end) as no_of_first_purchases
from first_purchases
where buyer_id = 456391
and date_id between '2021-02-01' and '2021-03-01'
order by 1 desc;
It returned the below which is not an intended output
Appreciate if someone can help explain how to exclude is_first_purchase = 0 from the count, thanks.
Because COUNT function count when the value isn't NULL (include 0), if you don't want to count, need to let CASE WHEN return NULL
There are two ways you can count as your expectation, one is SUM other is COUNT but remove the part of else 0
SUM(case when first_purchase = 'Yes' then 1 else 0 end) as no_of_first_purchases
COUNT(case when first_purchase = 'Yes' then 1 end) as no_of_first_purchases
From your question, I would combine CTE and main query as below
select
COUNT(case when is_first_purchase = 1 then 1 end) as no_of_first_purchases
from sales
where buyer_id = 456391
and date_id between '2021-02-01' and '2021-03-01'
order by 1 desc;
I think that you are using COUNT() when you want SUM().
with first_purchases as
(select *,
case when is_first_purchase = 1 then 'Yes' else 'No' end as first_purchase
from sales)
select
SUM(case when first_purchase = 'Yes' then 1 else 0 end) as no_of_first_purchases
from first_purchases
where buyer_id = 456391
and date_id between '2021-02-01' and '2021-03-01'
order by 1 desc;
You could simplify your query as:
SELECT COUNT(*) AS
FROM sales no_of_first_purchases
WHERE is_first_purchase = 1
AND buyer_id = 456391
AND date_id BETWEEN '2021-02-01' AND '2021-03-01'
ORDER BY 1 DESC;
It is better to avoid the use of functions like IF and CASE when it can be done with WHERE.
The simplest approach for Trino (f.k.a. Presto SQL) is to use an aggregate with a filter:
count(name) FILTER (WHERE first_purchase = 'Yes') AS no_of_first_purchases

BigQuery: group counters by month after self-join

I have table that looks like this:
I'm trying to build a query, that will show specific partnerId counters groupped by keywordName and month.
To solve first part(without grouping by month), I've built this query:
SELECT keywordName, COUNT(keywordName) as total, IFNULL(b.ebay_count, 0) as ebay, IFNULL(c.amazon_count, 0) as amazon,
FROM LogFilesv2_Dataset.FR_Clickstats_v2 a
LEFT JOIN
(SELECT keywordName as kw , SUM(CASE WHEN partnerId='eBay' THEN 1 ELSE 0 END) as ebay_count
FROM LogFilesv2_Dataset.FR_Clickstats_v2
WHERE partnerId = 'eBay' GROUP BY kw) b
ON keywordName = b.kw
LEFT JOIN
(SELECT keywordName as kw , SUM(CASE WHEN partnerId='AmazonApi' THEN 1 ELSE 0 END) as amazon_count
FROM LogFilesv2_Dataset.FR_Clickstats_v2
WHERE partnerId = 'AmazonApi' GROUP BY kw) c
ON keywordName = c.kw
WHERE keywordName = 'flipper' -- just to filter out single kw.
GROUP BY keywordName, ebay, amazon
It works quite well and returns following output:
Now I'm trying to make additional group by month, but all my attempts returned incorrect results.
Final output supposed to be similar to this:
You can do this with conditional aggregation:
select
date_trunc(dt, month) dt,
keywordName,
count(*) total,
sum(case when partnerId = 'eBay' then 1 else 0 end) ebay,
sum(case when partnerId = 'AmazonApi' then 1 else 0 end) amazon
from LogFilesv2_Dataset.FR_Clickstats_v2
group by date_trun(dt, month), keywordName

order by clause not showing expected result

when i run the following query where i need to use trim function on date,
the order of output is not proper
select trim(man_date_created)as createddate,count(*) recordcount
from man
where man_date_created>sysdate-15
group by trim(man_date_created) ORDER BY createddate;
this the out put i am getting from this query
01-APR-16
02-APR-16
03-APR-16
04-APR-16
05-APR-16
06-APR-16
07-APR-16
08-APR-16
09-APR-16
10-APR-16
11-APR-16
27-MAR-16
28-MAR-16
29-MAR-16
30-MAR-16
31-MAR-16
where you can see that after 11 april its showing entries of march.
is there any solution for this so that i cant get the count of all status?
You should convert your string in date
SELECT TO_DATE('12-4-2016','YYYY-MM-DD');
select trim(DATE(date,'YYYY-MM-DD'))as createddate,count(*) recordcount
from man
where man_date_created>sysdate-15
group by trim(man_date_created) ORDER BY createddate;
in your case try this
select DATE(mandate,'YYYY-MM-DD') createddate, count(*) recordcount,
count(case when man_status = 'A' then 1 end) as a,
count(case when man_status = 'S' then 1 end) as s,
count(case when man_status = 'C' then 1 end) as c,
count(case when man_status = 'R' then 1 end) as r
from man
where man_status IN ('A','S','C','R') and mandate>sysdate-15
group bycreateddate ORDER BY createddate;
You have to convert the string to date in the ORDER BY clause:
select trim(date)as createddate,count(*) recordcount
from man
where man_date_created>sysdate-15
group by trim(man_date_created) ORDER BY TO_DATE(date, 'DD/Month/YYYY');

Multiple Queries in different table

(Also posted here.)
So I have two tables, one is invalid table and the other is valid table.
valid table:
id
status
date
invalid table:
id
status
date
I have to produce a report with this output:
date on-time late total valid invalid1 invalid2 total rate
--------- ------- ---- ----- ----- -------- -------- ----- ----
9/10/2011 4 10 14 3 3 3 6
date: common fields on the 2 tables, field to group by, how many records on that day has
on-time: count of all the id on the valid table
late: count of all the records(id) on the invalid table
total: total of on-time and late
valid: count of id on the valid table with the "valid" status
invalid1: count of id on the invalid table with "invalid1" status
invalid2: count of id on the invalid table with "invalid2" status
total: total of valid, invalid1, invalid2
rate: average of totals
It's basically multiple queries with different table. How can I achieve it?
Someting like this?
SELECT
*,
(result.total + result._total) / 2 AS rate
FROM (
SELECT
date,
SUM(CASE WHEN data.valid = 1 THEN 1 ELSE 0 END) AS ontime,
SUM(CASE WHEN data.valid = 0 THEN 1 ELSE 0 END) AS late,
COUNT(*) AS total,
SUM(CASE WHEN data.valid = 1 AND data.status = 'valid' THEN 1 ELSE 0 END) AS valid,
SUM(CASE WHEN data.valid = 0 AND data.status = 'invalid1' THEN 1 ELSE 0 END) AS invalid1,
SUM(CASE WHEN data.valid = 0 AND data.status = 'invalid2' THEN 1 ELSE 0 END) AS invalid2,
SUM(CASE WHEN data.status IN ('valid', 'invalid', 'invalid2') THEN 1 ELSE 0 END) AS _total
FROM (
SELECT
date,
status,
valid = 1
FROM
Valid
UNION ALL
SELECT
date,
status,
valid = 0
FROM
InValid ) AS data
GROUP BY
date) AS result
SELECT date, ontime, late, ontime+late total, valid, invalid1, invalid2, valid+invalid1+invalid2 total
FROM
(SELECT date,
COUNT(*) late,
COUNT(IIF(status = 'invalid1', 1, NULL)) invalid1,
COUNT(IIF(status = 'invalid2', 1, NULL)) invalid2,
FROM invalid
GROUP BY date
) JOIN (
SELECT date,
COUNT(*) ontime,
COUNT(IIF(status = 'valud', 1, NULL)) valid,
FROM valid
GROUP BY date
) USING (date)
First of all, it seems that you are holding exactly the same information in 2 tables - I would recommend merging those tables together and add an additional boolean column called valid to hold the info related to validity of the record.
The query on your existent DB structure might look something like this:
SELECT unioned.* FROM (
( SELECT v.date AS date, v.status AS status, v.id AS id, COUNT(id) AS valid, 0 AS invalid1, 0 AS invalid2 FROM valid v GROUP BY v.date)
UNION
( SELECT i1.date AS date, i1.status AS status, i1.id AS id, 0 AS valid, COUNT(i1.id) AS invalid1, 0 AS invalid2 FROM invalid1 i1 GROUP BY i1.date)
UNION
( SELECT i2.date AS date, i2.status AS status, i2.id AS id, 0 AS valid, 0 AS invalid1, COUNT(i.id) AS invalid2 FROM invalid1 i1 GROUP BY i1.date)
) AS unioned GROUP BY unioned.date