SQL: Combine result columns - sql

SELECT Category, SUM (Volume) as Volume
FROM Product
GROUP BY Category;
The above query returns this result:
Category Volume
-------------------
Oth 2
Tv Kids 4
{null} 1
Humour 3
Tv 5
Theatrical 13
Doc 6
I want to combine some of the categories into one, as follows:
Oth,{null}, Humour, Doc as Others
Tv Kids, Tv as TV
Theatrical as Film
So my result would look like:
Category Volume
-------------------
Others 12
Tv 9
Film 13
How would I go about this?

You need a CASE here, like this:
SELECT
CASE
WHEN Category IN ('Oth','Humour','Doc')
OR Category IS NULL THEN 'Others'
WHEN Category IN ('Tv Kids','Tv') THEN 'TV'
WHEN Category = 'Theatrical' THEN 'Film'
END as category ,
SUM (Volume) as Volume
from Product
GROUP BY
CASE
WHEN Category IN ('Oth','Humour','Doc')
OR Category IS NULL THEN 'Others'
WHEN Category IN ('Tv Kids','Tv') THEN 'TV'
WHEN Category = 'Theatrical' THEN 'Film'
END;
NULL must be handled outside the IN list: comparing anything to NULL yields unknown rather than true, so a NULL placed inside an IN list never matches.
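To make the NULL point concrete, here is a minimal sketch that loads the question's sample data into an in-memory SQLite database (SQLite is an assumption here, the question does not name a database) and compares a NULL inside the IN list with an explicit IS NULL test. The alias grp is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (Category TEXT, Volume INTEGER)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?)",
    [("Oth", 2), ("Tv Kids", 4), (None, 1), ("Humour", 3),
     ("Tv", 5), ("Theatrical", 13), ("Doc", 6)],
)

# A NULL placed inside the IN list never matches the row whose Category is NULL.
in_list_only = conn.execute(
    "SELECT SUM(Volume) FROM Product "
    "WHERE Category IN ('Oth', 'Humour', 'Doc', NULL)"
).fetchone()[0]

# An explicit IS NULL test does pick that row up.
with_is_null = conn.execute(
    "SELECT SUM(Volume) FROM Product "
    "WHERE Category IN ('Oth', 'Humour', 'Doc') OR Category IS NULL"
).fetchone()[0]

# The full grouping; SQLite accepts the alias grp in GROUP BY,
# other databases may need the CASE expression repeated there.
grouped = dict(conn.execute("""
    SELECT CASE
             WHEN Category IN ('Oth', 'Humour', 'Doc')
                  OR Category IS NULL THEN 'Others'
             WHEN Category IN ('Tv Kids', 'Tv') THEN 'TV'
             WHEN Category = 'Theatrical' THEN 'Film'
           END AS grp,
           SUM(Volume)
    FROM Product
    GROUP BY grp
""").fetchall())

print(in_list_only, with_is_null, grouped)
```

The first sum misses the volume of the NULL row, while the second includes it, which is why the IS NULL test sits outside the IN list.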

I think you need to use a case statement to group categories together.
select case when category in ('Tv Kids', 'Tv') then 'Tv'
when category = 'Theatrical' then 'Film'
else 'Others'
end as CategoryGroup,
sum(Volume) as Volume
from (
SELECT Category, SUM (Volume) as Volume
FROM Product
GROUP BY Category
) subcategoryTotals
group by CategoryGroup
(Some databases, MySQL and PostgreSQL for example, will allow you to group by the alias CategoryGroup. If yours does not, you can repeat the CASE expression in the GROUP BY clause.)
Edit: Just a final thought (or two):
You should consider normalizing your database - for example, the Category column should really be a foreign key to a Categories table.
Also, this sql is reasonably ok because the case statement isn't too long or complex. If you wanted to split things up further it could quickly get to be unmanageable. I'd be inclined to use the idea of categories and subcategories in my database.

The best solution might be to implement those groups in the database. For instance:
category_group
id_category_group  name    sortkey
1                  Others  3
2                  TV      2
3                  Film    1
category
id_category  name        id_category_group
1            Oth         1
2            Tv Kids     2
3            Humour      1
4            Tv          2
5            Theatrical  3
6            Doc         1
query
SELECT g.Name, SUM (p.Volume) as Volume
FROM Product p
LEFT JOIN Category c ON c.Id_Category = p.Id_Category
LEFT JOIN Category_Group g ON g.Id_Category_Group = c.Id_Category_Group
GROUP BY g.Id_Category_Group, g.Name, g.sortkey
ORDER BY g.sortkey;
This makes NULL a group of its own, though. But well, it is a group of its own, as NULL means not known (yet), so you don't actually know whether it's TV, Film or Other. If you still want to count NULL as Others, change the ON clause accordingly:
LEFT JOIN Category_Group g
ON g.Id_Category_Group = COALESCE(c.Id_Category_Group, 1) -- default to group 'Others' (id 1)
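The lookup-table approach can be tried out with a small sketch. It assumes SQLite and assumes that Product carries an id_category foreign key (the answer's join implies this, but the question's table only shows a Category text column). Products without a category fall back to group 1, which is 'Others' in the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category_group (
    id_category_group INTEGER PRIMARY KEY, name TEXT, sortkey INTEGER);
CREATE TABLE category (
    id_category INTEGER PRIMARY KEY, name TEXT,
    id_category_group INTEGER REFERENCES category_group);
-- Assumption: product references category by id; NULL means "not known".
CREATE TABLE product (
    id_category INTEGER REFERENCES category, volume INTEGER);

INSERT INTO category_group VALUES (1, 'Others', 3), (2, 'TV', 2), (3, 'Film', 1);
INSERT INTO category VALUES
    (1, 'Oth', 1), (2, 'Tv Kids', 2), (3, 'Humour', 1),
    (4, 'Tv', 2), (5, 'Theatrical', 3), (6, 'Doc', 1);
INSERT INTO product VALUES
    (1, 2), (2, 4), (NULL, 1), (3, 3), (4, 5), (5, 13), (6, 6);
""")

# Uncategorised products are mapped to group 1 ('Others') by the COALESCE.
rows = conn.execute("""
    SELECT g.name, SUM(p.volume) AS volume
    FROM product p
    LEFT JOIN category c ON c.id_category = p.id_category
    LEFT JOIN category_group g
           ON g.id_category_group = COALESCE(c.id_category_group, 1)
    GROUP BY g.id_category_group, g.name, g.sortkey
    ORDER BY g.sortkey
""").fetchall()

print(rows)
```

Regrouping a category then becomes a data change (an UPDATE on category) rather than a change to every query that contains the CASE expression.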

Try the following:
select category_group , sum(volume) as Volume from
(
SELECT
Category,
Volume,
case
WHEN Category IN ('Oth','Humour','Doc') OR Category IS NULL THEN 'Others'
WHEN Category IN ('Tv Kids','Tv') THEN 'TV'
WHEN Category = 'Theatrical' THEN 'Film'
end as category_group
FROM Product
) T
group by category_group

Related

Inner join + group by - select common columns and aggregate functions

Let's say I have two tables
Customer
---
Id Name
1 Foo
2 Bar
and
CustomerPurchase
---
CustomerId, Amount, AmountVAT, Accountable(bit)
1 10 11 1
1 20 22 0
2 5 6 0
2 2 3 0
I need a single record for every joined and grouped Customer and CustomerPurchase group.
Every record would contain
columns from table Customer
some aggregation functions like SUM
a 'calculated' column. For example difference of other columns
result of subquery to CustomerPurchase table
An example of the result I would like to get
CustomerPurchases
---
Name Total TotalVAT VAT TotalAccountable
Foo 30 33 3 10
Bar 7 9 2 0
I was only able to get a single row per customer by grouping by all the common columns, which I don't think is the right way to do it. I also have no idea how to do the 'VAT' column or the 'TotalAccountable' column, which should filter for only certain rows of CustomerPurchase and then run some kind of aggregate function on the result. The following example doesn't work, of course, but it shows what I would like to achieve:
select C.Name,
SUM(CP.Amount) as 'Total',
SUM(CP.AmountVAT) as 'TotalVAT',
diff? as 'VAT',
subquery? as 'TotalAccountable'
from Customer C
inner join CustomerPurchase CP
on C.Id = CP.CustomerId
group by C.Id
I would suggest you only need the following slight changes to your query. For clarity, I would also consider using the terms net and gross, if you can, which are typical for prices excluding and including VAT.
select c.[Name],
Sum(cp.Amount) as Total,
Sum(cp.AmountVAT) as TotalVAT,
Sum(cp.AmountVAT) - Sum(CP.Amount) as VAT,
Sum(case when cp.Accountable = 1 then cp.Amount else 0 end) as TotalAccountable
from Customer c
join CustomerPurchase cp on cp.CustomerId = c.Id
group by c.[Name];
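As a quick check of the conditional aggregation, here is a sketch that runs the query against the question's sample rows in an in-memory SQLite database (an assumption; the bracketed [Name] in the answer suggests SQL Server). It groups by Id as well as Name so that two customers who share a name are not merged, and uses ELSE 0 so that a customer with no accountable purchases gets 0 rather than NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE CustomerPurchase (
    CustomerId INTEGER, Amount INTEGER, AmountVAT INTEGER, Accountable INTEGER);
INSERT INTO Customer VALUES (1, 'Foo'), (2, 'Bar');
INSERT INTO CustomerPurchase VALUES
    (1, 10, 11, 1), (1, 20, 22, 0), (2, 5, 6, 0), (2, 2, 3, 0);
""")

rows = conn.execute("""
    SELECT c.Name,
           SUM(cp.Amount)                      AS Total,
           SUM(cp.AmountVAT)                   AS TotalVAT,
           SUM(cp.AmountVAT) - SUM(cp.Amount)  AS VAT,
           -- Only accountable rows contribute; the others add 0.
           SUM(CASE WHEN cp.Accountable = 1 THEN cp.Amount ELSE 0 END)
                                               AS TotalAccountable
    FROM Customer c
    JOIN CustomerPurchase cp ON cp.CustomerId = c.Id
    GROUP BY c.Id, c.Name
    ORDER BY c.Id
""").fetchall()

for row in rows:
    print(row)
```

The two result rows correspond to the Foo and Bar lines of the desired output in the question.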

Categorising "group by" groups by their contents

I have a view which is a product of two joined tables:
ID Type
1 A
2 A
2 B
3 B
There can only be two values in Type column: A or B.
I would like to aggregate IDs into three categories: Category_A, Category_B and Category_AB. If the ID is associated only with type A, it is assigned Category_A; if only with type B, Category_B; and if the ID is associated with both types A and B, it is assigned Category_AB. Based on these rules, the view above should be categorised as follows:
ID Category
1 Category_A
2 Category_AB
3 Category_B
Is it possible to write an SQL query to achieve this?
I would name them differently, but the logic is:
select id,
(case when min(type) = max(type)
then 'Category_' || min(type)
else 'Category_AB'
end) as category
from t
group by id;
Independently of Gordon's answer, I came up with the following...
SELECT ID,
CASE
WHEN COUNT(*) > 1 THEN 'AB'
ELSE MAX(Type)
END AS Category
FROM Products
GROUP BY ID
See SQLFiddle to run :)
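Both answers can be checked against the sample view with a short sketch. It assumes SQLite (which supports the || concatenation operator) and uses a plain table named t in place of the view. The first query compares MIN and MAX of Type per ID; the second counts rows per ID, which gives the same grouping as long as the view has no duplicate (ID, Type) rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, Type TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "A"), (2, "A"), (2, "B"), (3, "B")],
)

# MIN = MAX means the ID has only one distinct type.
by_min_max = conn.execute("""
    SELECT ID,
           CASE WHEN MIN(Type) = MAX(Type)
                THEN 'Category_' || MIN(Type)
                ELSE 'Category_AB'
           END AS Category
    FROM t
    GROUP BY ID
    ORDER BY ID
""").fetchall()

# Row-count variant; relies on (ID, Type) pairs being unique.
by_count = conn.execute("""
    SELECT ID,
           CASE WHEN COUNT(*) > 1 THEN 'AB' ELSE MAX(Type) END AS Category
    FROM t
    GROUP BY ID
    ORDER BY ID
""").fetchall()

print(by_min_max)
print(by_count)
```

The MIN/MAX form is the more defensive of the two, since repeated rows for the same ID and type do not change its result.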

Case Statement for multiple criteria

I would like to ignore some of the results of my query because, for all intents and purposes, they are duplicates. Because of the way the request was made, we need to use this hierarchy, and although we are seeing different Company_Name values, we need to ignore one of the results.
Query:
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
2
ORDER BY
3 ASC, 2 ASC
This code omits half a dozen joins and WHERE clauses that are not germane to this question.
Results:
Customer_Name_Count Company_Name Total_Sales
-------------------------------------------------------------
1 3 Blockbuster 1,000
2 6 Jimmy's Bar 1,500
3 6 Jimmy's Restaurant 1,500
4 9 Impala Hotel 2,000
5 12 Sports Drink 2,500
In the above set, we can see that numbers 2 & 3 have the same count and the same total_sales number and similar company names. Is there a way to create a case statement that takes these 3 factors into consideration and then drops one or the other for Jimmy's enterprises? The other issue is that this has to be variable as there are other instances where this happens. And I would only want this to happen if the count and sales number match each other with a similar name in the company name.
Desired result:
Customer_Name_Count Company_Name Total_Sales
--------------------------------------------------------------
1 3 Blockbuster 1,000
2 6 Jimmy's Bar 1,500
3 9 Impala Hotel 2,000
4 12 Sports Drink 2,500
It looks like the other answers are accurate under the assumption that the Company_IDs are the same for both.
If the Company_IDs are different for Jimmy's Bar and Jimmy's Restaurant, then you can use something like this. I suggest you get functional users involved and do some data clean-up, or else you'll be maintaining this every time the issue arises:
SELECT
COUNT(DISTINCT CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END) AS Customer_Name_Count
,CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END AS Company_Name
,SUM(A12.Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END
Your problem is that the joins you are using are multiplying the number of rows. Somewhere along the way, multiple names are associated with exactly the same entity (which is why the numbers are the same). You can fix this by aggregating by the right id:
SELECT COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
MAX(Company_Name) as Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY Company_id -- I'm guessing the column is something like this
ORDER BY 3 ASC, 2 ASC;
This might actually overstate the sales (I don't know). Better would be fixing the join so it only returned one name. One possibility is that it is a type-2 dimension, meaning that there is a time component for values that change over time. You may need to restrict the join to a single time period.
You need a function that returns a common name for the companies, and then use DISTINCT:
SELECT DISTINCT
Customer_Name_Count,
dbo.GetCommonName(Company_Name) as Company_Name,
Total_Sales
FROM dbo.theTable
You can try using the ROW_NUMBER window function to number the rows within each Customer_Name_Count and Total_Sales pair, and then keep only rn = 1:
SELECT * FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY Customer_Name_Count,Total_Sales ORDER BY Company_Name) rn
FROM (
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
Company_Name
)t1
)t2
WHERE rn = 1

SQL - Query the same column but with 2 different conditions

I have a table called Products which contains the entire catalog. That table has a unique Product_ID, the Category it belongs to, and then a field Available which shows in which countries (US, UK, DE, ...) the product can be sold. If a product can be sold in multiple countries, then the combination of Product_ID and Available looks like:
23523 DE
23523 UK
23523 US
...
I need to do a query that produces 3 columns:
Category Total_Number_Products DE_Number_Products
I can do this on 2 separate queries, one for Total_Number_Products and the other for DE_Number_Products, each one with a Count - the 1st one without any condition and the 2nd one checking if "Available = 'DE'".
How can I or should I query that same column with COUNT(Product_ID) twice on the same query, once for all the products and then for the DE specific products?
Please consider this:
select category,
count(distinct product_id) total_number_products,
sum(case available when 'DE' then 1 else 0 end) de_number_products
from products
group by category
You can do conditional aggregation here:
select category,
count(distinct product_id) as total_number_products,
count(case when available = 'DE' then 1 end) as DE_number_products
from products
group by category
group by category;
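Here is a small sketch of the conditional aggregation, assuming SQLite and some made-up sample rows (the categories 'Toys' and 'Books' and the product IDs other than 23523 are hypothetical). Because a product has one row per country, the sketch also returns COUNT(*) to show how a plain row count differs from the number of distinct products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_id INTEGER, category TEXT, available TEXT)"
)
# Hypothetical sample data: one row per (product, country) pair.
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        (23523, "Toys", "DE"),
        (23523, "Toys", "UK"),
        (23523, "Toys", "US"),
        (11111, "Toys", "US"),
        (22222, "Books", "DE"),
    ],
)

rows = conn.execute("""
    SELECT category,
           COUNT(DISTINCT product_id)                   AS total_number_products,
           -- CASE yields NULL for non-DE rows, and COUNT skips NULLs.
           COUNT(CASE WHEN available = 'DE' THEN 1 END) AS de_number_products,
           COUNT(*)                                     AS row_count
    FROM products
    GROUP BY category
    ORDER BY category
""").fetchall()

for row in rows:
    print(row)
```

For 'Toys' the row count is 4 while the number of distinct products is 2, which is the difference to watch for when a product appears once per country.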

SQL Query For Most Popular Combination

Suppose I have a grocery store application with a table of purchases:
customerId int
itemId int
Four customers come into the store:
Bob buys a banana, lemonade, and a cookie
Kevin buys a banana, lemonade, and a donut
Sam buys a banana, orange juice, and a cupcake
Susie buys a banana
I am trying to write a query which would return which combinations of items are most popular. In this case, the results of this query should be:
banana and lemonade-2
I have already written a query which tells me a list of all items which were in a multi-item purchase (we exclude sales of one item - it cannot form a "combination"). It returns:
banana - 3
lemonade - 2
cookie - 1
donut - 1
cupcake - 1
orange juice - 1
Here is the query:
SELECT itemId, count( * )
FROM grocery_store
INNER JOIN (
SELECT customerId
FROM grocery_store
GROUP BY customerId
HAVING count( itemId ) > 1
)subQuery ON subQuery.customerId = grocery_store.customerId
GROUP BY itemId;
Could I get a pointer about how to expand my existing query to get the desired output?
select a.itemID, b.itemID, COUNT(*) countForCombination
from grocery_store a
inner join grocery_store b
on a.customerId = b.customerId
and a.itemID < b.itemID
group by a.itemID, b.itemID
order by countForCombination desc
Assumed:
grocery_store = sales records
customerId = unique sale
This query takes all the grocery_store records and, for each single sales transaction, creates all the possible combinations (a.itemID, b.itemID) in a specific order (a.itemID < b.itemID).
This specific order eliminates duplicates: (apple, orange) is kept, whereas (orange, apple) is not necessary.
After producing all the combinations from all sales, a simple GROUP BY and a sort by count are used to show the most popular combinations at the top.
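The self-join can be sketched against the four customers from the question. The numeric IDs are made up for illustration (customers: Bob 1, Kevin 2, Sam 3, Susie 4; items: banana 1, lemonade 2, cookie 3, donut 4, orange juice 5, cupcake 6), and SQLite is assumed as the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grocery_store (customerId INTEGER, itemId INTEGER)")
# Hypothetical IDs standing in for the purchases described in the question.
conn.executemany(
    "INSERT INTO grocery_store VALUES (?, ?)",
    [
        (1, 1), (1, 2), (1, 3),   # Bob: banana, lemonade, cookie
        (2, 1), (2, 2), (2, 4),   # Kevin: banana, lemonade, donut
        (3, 1), (3, 5), (3, 6),   # Sam: banana, orange juice, cupcake
        (4, 1),                   # Susie: banana only, so no pair is formed
    ],
)

# a.itemId < b.itemId keeps each unordered pair once and drops self-pairs.
pairs = conn.execute("""
    SELECT a.itemId, b.itemId, COUNT(*) AS countForCombination
    FROM grocery_store a
    INNER JOIN grocery_store b
            ON a.customerId = b.customerId
           AND a.itemId < b.itemId
    GROUP BY a.itemId, b.itemId
    ORDER BY countForCombination DESC, a.itemId, b.itemId
""").fetchall()

print(pairs[0])
```

The first row is the banana and lemonade pair with a count of 2. Single-item purchases such as Susie's drop out on their own, because the join condition can never be satisfied for them.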