Combine 2 queries together - sql

I am struggling to work out combining a query that should give me 3 columns of Month, total_sold_products and drinks_sold_products
Query 1:
Select month(date), count(id) as total_sold_products
from Products
where date between '2022-01-01' and '2022-12-31'
Query 2
Select month(date), count(id) as drinks_sold_products
from Products where type = 'drinks' and date between '2022-01-01' and '2022-12-31'
I tried the union function but it summed count(id) twice and gave me only 2 columns
Many thanks!

Union is for attaching sets of data on top of each other. You need conditional aggregation or a join. See below.
SELECT MONTH(date),
COUNT(*) AS total_sold_products,
COUNT(CASE WHEN type = 'drinks' THEN 1 ELSE 0 END) AS drinks_sold_products,
FORMAT((CASE
WHEN COUNT(*) > 0 THEN
COUNT(CASE WHEN type = 'drinks' THEN 1 ELSE 0 END)/COUNT(*)
ELSE 0 END),
'P') AS Percentage
FROM Products
WHERE date BETWEEN'2022-01-01' AND '2022-12-31'
GROUP BY MONTH(date)

Related

How to exclude 0 from count()? in sql?

I have a code as below where I want to count number of first purchases for a given period of time. I have a column in my sales table where if the buyer is not a first time buyer, then is_first_purchase = 0
For example:
buyer_id = 456391 is already an existing buyer who made purchases on 2 different dates.
Hence is_first_purchase column will show as 0 as per below.
If i do a count() on is_first_purchase for this buyer_id = 456391 then it should return 0 instead of 2.
My query is as follows:
with first_purchases as
(select *,
case when is_first_purchase = 1 then 'Yes' else 'No' end as first_purchase
from sales)
select
count(case when first_purchase = 'Yes' then 1 else 0 end) as no_of_first_purchases
from first_purchases
where buyer_id = 456391
and date_id between '2021-02-01' and '2021-03-01'
order by 1 desc;
It returned the below which is not an intended output
Appreciate if someone can help explain how to exclude is_first_purchase = 0 from the count, thanks.
Because COUNT function count when the value isn't NULL (include 0), if you don't want to count, need to let CASE WHEN return NULL
There are two ways you can count as your expectation, one is SUM other is COUNT but remove the part of else 0
SUM(case when first_purchase = 'Yes' then 1 else 0 end) as no_of_first_purchases
COUNT(case when first_purchase = 'Yes' then 1 end) as no_of_first_purchases
From your question, I would combine CTE and main query as below
select
COUNT(case when is_first_purchase = 1 then 1 end) as no_of_first_purchases
from sales
where buyer_id = 456391
and date_id between '2021-02-01' and '2021-03-01'
order by 1 desc;
I think that you are using COUNT() when you want SUM().
with first_purchases as
(select *,
case when is_first_purchase = 1 then 'Yes' else 'No' end as first_purchase
from sales)
select
SUM(case when first_purchase = 'Yes' then 1 else 0 end) as no_of_first_purchases
from first_purchases
where buyer_id = 456391
and date_id between '2021-02-01' and '2021-03-01'
order by 1 desc;
You could simplify your query as:
SELECT COUNT(*) AS
FROM sales no_of_first_purchases
WHERE is_first_purchase = 1
AND buyer_id = 456391
AND date_id BETWEEN '2021-02-01' AND '2021-03-01'
ORDER BY 1 DESC;
It is better to avoid the use of functions like IF and CASE when it can be done with WHERE.
The simplest approach for Trino (f.k.a. Presto SQL) is to use an aggregate with a filter:
count(name) FILTER (WHERE first_purchase = 'Yes') AS no_of_first_purchases

MSSQL Group by and Select rows from grouping

I'm trying to figure out if what I'm trying to do is possible. Instead of resorting to multiple queries on a table, I wanted to group the records by business date and id then group by the id and select one date for a field and another date for the other field.
SELECT
*
{AMOUNT FROM DATE}
{AMOUNT FROM OTHER DATE}
FROM (
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
AS subquery
GROUP BY id
It seems that you're looking to do a pivot query. I usually use cross tabs for this. Based on the query you posted, it could look like:
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM (
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
)AS subquery
GROUP BY id;
You could also use a CTE.
WITH CTE AS(
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
)
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM CTE
GROUP BY id;
Or even be a rebel and do the operation directly.
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM CTE
GROUP BY id;
However, some people have tested for performance and found that pre-aggregating can improve performance.
If I understand you correctly, then you're just trying to pivot, but only with two particular dates:
select id,
date1 = sum(iif(date = '2000-01-01', amount, null)),
date2 = sum(iif(date = '2000-01-02', amount, null))
from [table]
group by id

Why my CASE WHEN gave me an AGGREGATION error message?

I'm trying to make a promo grouping using one promo_code field in a month where there's a chance that a single customer_ID would have more than one transaction and could have two different promo code
SELECT customer_id AS buyer,
CASE
WHEN COUNT(DISTINCT flag_promo) = 2 THEN 'Mixed'
WHEN COUNT(DISTINCT flag_promo) = 1 AND flag_promo = 1 THEN 'Promo'
WHEN COUNT(DISTINCT flag_promo) = 1 AND flag_promo = 0 THEN 'Organic'
END AS promo_group
FROM TABLE
WHERE DATE BETWEEN '2019-04-01' AND '2019-04-30'
GROUP BY 1
ORDER BY 2
It gave me an error message :
SELECT list expression references column flag_promo which is neither grouped nor aggregated at [4:41]
Below is for BigQuery Standard SQL
#standardSQL
SELECT customer_id AS buyer,
CASE
WHEN COUNT(DISTINCT flag_promo) > 1 THEN 'Mixed'
WHEN ANY_VALUE(flag_promo) = 1 THEN 'Promo'
WHEN ANY_VALUE(flag_promo) = 2 THEN 'Organic'
END AS promo_group
FROM `project.dataset.table`
WHERE DATE BETWEEN '2019-04-01' AND '2019-04-30'
GROUP BY 1
ORDER BY 2
This is the query I think you intended to do:
SELECT
customer_id AS buyer,
CASE WHEN COUNT(DISTINCT flag_promo) = 2 THEN 'Mixed'
WHEN COUNT(DISTINCT flag_promo) = 1 AND MIN(flag_promo) = 1 THEN 'Promo'
WHEN COUNT(DISTINCT flag_promo) = 1 AND MIN(flag_promo) = 2 THEN 'Organic'
END AS promo_group
FROM TABLE
WHERE
DATE BETWEEN '2019-04-01' AND '2019-04-30'
GROUP BY 1
ORDER BY 2;
This assumes that a flag_promo value of 1 means Promo and a value of 2 means Organic. If not, then we can easily edit the above query.

SQL Efficiency on Date Range or Separate Tables

I'm calculating historical amount from a table in years(ex. 2015-2016, 2014-2015, etc.) I would like to seek expertise if its more efficient to do it in one batch or repeat the query multiple times filtered by the date required.
Thanks in advance!
OPTION 1:
select
id,
sum(case when year(getdate()) - year(txndate) between 5 and 6 then amt else 0 end) as amt_6_5,
...
sum(case when year(getdate()) - year(txndate) between 0 and 1 then amt else 0 end) as amt_1_0,
from
mytable
group by
id
OPTION 2:
select
id, sum(amt) as amt_6_5
from
mytable
group by
id
where
year(getdate()) - year(txndate) between 5 and 6
...
select
id, sum(amt) as amt_1_0
from
mytable
group by
id
where
year(getdate()) - year(txndate) between 0 and 1
1.
Unless you have resources issues I would go with the CASE version.
Although it has no impact on the results, filtering on the requested period in the WHERE clause might have a significant performance advantage.
2. Your period definition creates overlapping.
select id
,sum(case when year(getdate()) - year(txndate) = 6 then amt else 0 end) as amt_6
-- ...
,sum(case when year(getdate()) - year(txndate) = 0 then amt else 0 end) as amt_0
where txndate >= dateadd(year, datediff(year,0, getDate())-6, 0)
from mytable
group by id
This may be help you,
WITH CTE
AS
(
SELECT id,
(CASE WHEN year(getdate()) - year(txndate) BETWEEN 5 AND 6 THEN 'year_5-6'
WHEN year(getdate()) - year(txndate) BETWEEN 4 AND 5 THEN 'year_4-5'
...
END) AS my_year,
amt
FROM mytable
)
SELECT id,my_year,sum(amt)
FROM CTE
GROUP BY id,my_year
Here, inside the CTE, just assigned a proper year_tag for each records (based on your conditions), after that select a summary for the CTE grouped by that year_tag.

change rows to columns and count

how to calculate count based on rows?
SOURCE TABLE
each employee can take 2 days off
Employee-----First_Day_Off-----Second_Day_Off
1------------10/21/2009--------12/6/2009
2------------09/3/2009--------12/6/2009
3------------09/3/2009--------NULL
4
5
.
.
.
Now i need a table that shows the dates and number of people taking off on that day
Date---------First_Day_Off-------Second_Day_Off
10/21/2009---1-------------------0
12/06/2009---1--------------------1
09/3/2009----2--------------------0
Any ideas?
Oracle 9i+, using Subquery Factoring (WITH):
WITH sample AS (
SELECT a.employee,
a.first_day_off AS day_off,
1 AS day_number
FROM YOUR_TABLE a
WHERE a.first_day_off IS NOT NULL
UNION ALL
SELECT b.employee,
b.second_day_off,
2 AS day_number
FROM YOUR_TABLE b
WHERE b.second_day_off IS NOT NULL)
SELECT s.day_off AS date,
SUM(CASE WHEN s.day_number = 1 THEN 1 ELSE 0 END) AS first_day_off,
SUM(CASE WHEN s.day_number = 2 THEN 1 ELSE 0 END) AS second_day_off
FROM sample s
GROUP BY s.day_off
Non Subquery Version
SELECT s.day_off AS date,
SUM(CASE WHEN s.day_number = 1 THEN 1 ELSE 0 END) AS first_day_off,
SUM(CASE WHEN s.day_number = 2 THEN 1 ELSE 0 END) AS second_day_off
FROM (SELECT a.employee,
a.first_day_off AS day_off,
1 AS day_number
FROM YOUR_TABLE a
WHERE a.first_day_off IS NOT NULL
UNION ALL
SELECT b.employee,
b.second_day_off,
2 AS day_number
FROM YOUR_TABLE b
WHERE b.second_day_off IS NOT NULL) s
GROUP BY s.day_off
It is a bit awkward to handle these queries, since you have days off stored in different columns. A better layout would be to have something like
EMPLOYEE_ID DAY_OFF
Then you would have multiple rows if an employee took multiple days off
EMPLOYEE_ID DAY_OFF
1 10/21/2009
1 12/6/2009
2 09/3/2009
2 12/6/2009
3 09/3/2009
...
In that case, you could find out how many days off each person took by using the following query:
SELECT EMPLOYEE_ID, COUNT(*) AS NUM_DAYS_OFF FROM DAYS_OFF_TABLE GROUP BY EMPLOYEE_ID
And the number of people who took days off on each date like this:
SELECT DAY_OFF, COUNT(*) AS NUM_PEOPLE FROM DAYS_OFF_TABLE GROUP BY DAY_OFF
But I digress...
You can try to use an SQL CASE statement to help with this:
SELECT Employee, CASE
WHEN First_Day_Off is NULL AND Second_Day_Off is NULL THEN 0
WHEN First_Day_Off is NOT NULL AND Second_Day_Off is NULL THEN 1
WHEN First_Day_Off is NULL AND Second_Day_Off is NOT NULL THEN 1
ELSE 2
END AS NUM_DAYS_OFF
FROM DAYS_OFF_TABLE
(note that you may need to change around the syntax slightly depending on your database.
Getting dates and number of people who took off on that day might be more complicated.
I don't know if this would work, but you can try it:
SELECT
Date_Off,
COUNT(*) AS Num_People
FROM
(SELECT
First_Day_Off, COUNT(*) AS Num_People FROM DAYS_OFF_TABLE WHERE First_Day_Off IS NOT NULL GROUP BY First_Day_Off
UNION
SELECT Second_Day_Off, COUNT(*) AS Num_People FROM DAYS_OFF_TABLE WHERE Second_Day_Off IS NOT NULL GROUP BY Second_Day_Off)
GROUP BY
Num_People