Categorising "group by" groups by their contents - sql

I have a view which is a product of two joined tables:
ID Type
1 A
2 A
2 B
3 B
There can only be two values in Type column: A or B.
I would like to aggregate IDs into three categories: Category_A, Category_B and Category_AB. If an ID is associated only with type A, it is assigned Category_A; if only with type B, Category_B; if it is associated with both types A and B, it is assigned Category_AB. Based on these rules, the view above should be categorised as follows:
ID Category
1 Category_A
2 Category_AB
3 Category_B
Is it possible to write an SQL query to achieve this?

I would name them differently, but the logic is:
select id,
       (case when min(Type) = max(Type)
             then 'Category_' || min(Type)
             else 'Category_AB'
        end) as Category
from t
group by id;
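To check the min/max logic against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the view is available as a table named t with columns ID and Type; the table name and the in-memory database are assumptions for illustration only.

```python
import sqlite3

# Assumption: the joined view is materialised as table t(ID, Type).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, Type TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "A"), (2, "A"), (2, "B"), (3, "B")])

# If the smallest and largest Type in a group match, the group holds one type only.
query = """
SELECT ID,
       CASE WHEN MIN(Type) = MAX(Type)
            THEN 'Category_' || MIN(Type)
            ELSE 'Category_AB'
       END AS Category
FROM t
GROUP BY ID
ORDER BY ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the comparison is on MIN and MAX rather than on a row count, duplicate (ID, Type) rows coming out of the join do not change the category.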

Independently of Gordon's answer, I came up with the following...
SELECT ID,
       CASE
           WHEN COUNT(DISTINCT Type) > 1 THEN 'Category_AB'
           ELSE 'Category_' || MAX(Type)
       END AS Category
FROM Products
GROUP BY ID
See SQLFiddle to run :)

Related

Case Statement for multiple criteria

I would like to ignore some of the results of my query because, for all intents and purposes, they are duplicates. Because of the way the request was made, we need to use this hierarchy, and although the Company_Name values differ, we need to ignore one of the results.
Query:
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
2
ORDER BY
3 ASC, 2 ASC
This code omits half a dozen joins and WHERE clauses that are not germane to this question.
Results:
    Customer_Name_Count  Company_Name         Total_Sales
    -----------------------------------------------------
1   3                    Blockbuster          1,000
2   6                    Jimmy's Bar          1,500
3   6                    Jimmy's Restaurant   1,500
4   9                    Impala Hotel         2,000
5   12                   Sports Drink         2,500
In the above set, rows 2 and 3 have the same count, the same Total_Sales figure and similar company names. Is there a way to write a CASE expression that takes these three factors into consideration and then drops one or the other of the Jimmy's entries? It also has to be general, because there are other instances where this happens. I only want a row dropped when both the count and the sales figure match and the company names are similar.
Desired result:
    Customer_Name_Count  Company_Name         Total_Sales
    -----------------------------------------------------
1   3                    Blockbuster          1,000
2   6                    Jimmy's Bar          1,500
3   9                    Impala Hotel         2,000
4   12                   Sports Drink         2,500
It looks like the other answers are accurate if you assume that the Company_IDs are the same for both.
If the Company_IDs are different for Jimmy's Bar and Jimmy's Restaurant, then you can use something like this. I suggest you get functional users involved and do some data clean-up, otherwise you'll be maintaining this query every time the issue arises:
SELECT
COUNT(DISTINCT CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END) AS Customer_Name_Count
,CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END AS Company_Name
,SUM(A12.Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END
Your problem is that the joins you are using are multiplying the number of rows. Somewhere along the way, multiple names are associated with exactly the same entity (which is why the numbers are the same). You can fix this by aggregating by the right id:
SELECT COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
MAX(Company_Name) as Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY Company_id -- I'm guessing the column is something like this
ORDER BY 3 ASC, 2 ASC;
This might actually overstate the sales (I don't know). Better would be fixing the join so it only returned one name. One possibility is that it is a type-2 dimension, meaning that there is a time component for values that change over time. You may need to restrict the join to a single time period.
You need a function that returns a common name for the companies, and then use DISTINCT:
SELECT DISTINCT
Customer_Name_Count,
dbo.GetCommonName(Company_Name) as Company_Name,
Total_Sales
FROM dbo.theTable
You can use the ROW_NUMBER window function to number the rows within each (Customer_Name_Count, Total_Sales) pair, and then keep only the rows where rn = 1:
SELECT * FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY Customer_Name_Count,Total_Sales ORDER BY Company_Name) rn
FROM (
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
Company_Name
)t1
)t2
WHERE rn = 1
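As a rough check of the ROW_NUMBER approach, the sketch below loads the five result rows from the question into an in-memory SQLite table and keeps the first name per (count, sales) pair. The table name agg is an assumption, the sales figures are stored as plain integers, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Assumption: the already-aggregated result from the question is stored as table agg.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agg (Customer_Name_Count INTEGER,"
             " Company_Name TEXT, Total_Sales INTEGER)")
conn.executemany("INSERT INTO agg VALUES (?, ?, ?)", [
    (3, "Blockbuster", 1000),
    (6, "Jimmy's Bar", 1500),
    (6, "Jimmy's Restaurant", 1500),
    (9, "Impala Hotel", 2000),
    (12, "Sports Drink", 2500),
])

# Number rows inside each (count, sales) pair and keep the alphabetically first name.
query = """
SELECT Customer_Name_Count, Company_Name, Total_Sales
FROM (
    SELECT agg.*,
           ROW_NUMBER() OVER (PARTITION BY Customer_Name_Count, Total_Sales
                              ORDER BY Company_Name) AS rn
    FROM agg
) AS ranked
WHERE rn = 1
ORDER BY Total_Sales, Company_Name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that this only compares the count and the sales figure. It does not test whether the names are similar, so two unrelated companies that happen to share both numbers would also be collapsed into one row.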

SQL: How to exclude group from result set by one of the elements, not using subqueries

Input:
id group_id type_id
1 1 aaaaa
2 1 BAD
3 2 bbbbb
4 2 ccccc
5 3 ddddd
6 3 eeeee
7 3 aaaaa
I need to output the group_ids of groups that consist only of members for which type_id <> 'BAD'. A whole group with at least one BAD member should be excluded.
Use of subqueries (or CTE or NOT EXISTS or views or T-SQL inline functions) is not allowed!
Use of except is not allowed!
Use of cursors is not allowed.
Any solutions which trick the rules above are appreciated. Any RDBMS is ok.
Bad example solution producing correct results, (using except):
select distinct group_id
from input
except
select group_id
from input
where type_id = 'BAD'
group by group_id, type_id
Output:
group_id
2
3
I would just use group by and having:
select group_id
from input
group by group_id
having max(case when type_id = 'BAD' then 1 else 0 end) = 0;
This particular version treats a NULL type_id as not bad. It is easily modified if NULL values need different handling.
EDIT:
If you are looking for the groups that contain at least one bad member, then just do:
select group_id
from input
where type_id = 'BAD'
group by group_id;
Group by group_id and count occurrences of 'BAD':
select group_id
from mytable
group by group_id
having count(case when type_id = 'BAD' then 'count me' end) = 0;
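The conditional count can be checked against the input from the question with a small sqlite3 sketch. The table name mytable follows the query above; the in-memory database is just an assumption for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, group_id INTEGER, type_id TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    (1, 1, "aaaaa"), (2, 1, "BAD"),
    (3, 2, "bbbbb"), (4, 2, "ccccc"),
    (5, 3, "ddddd"), (6, 3, "eeeee"), (7, 3, "aaaaa"),
])

# CASE yields NULL for non-BAD rows, and COUNT ignores NULLs,
# so the count is the number of BAD members in each group.
query = """
SELECT group_id
FROM mytable
GROUP BY group_id
HAVING COUNT(CASE WHEN type_id = 'BAD' THEN 1 END) = 0
ORDER BY group_id
"""
good_groups = [row[0] for row in conn.execute(query)]
print(good_groups)
```

No subquery, EXCEPT or cursor is involved: the filtering happens entirely in HAVING after the groups are formed.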

Difference in output from two SQL queries

What is the difference between the two SQL queries below, other than Query2 returning an additional field? Are there any possible scenarios where the output of the two queries would be different (other than the additional field in Query2)?
Query1:
SELECT Field1, COUNT(*)
FROM Table1
GROUP BY Field1
HAVING COUNT(*) > 1
Query2:
SELECT Field1, Field2, COUNT(*)
FROM Table1
GROUP BY Field1, Field2
HAVING COUNT(*) > 1
Absolutely, these are different. Query2's Group By clause specifies an extra field. That means when the results are aggregated, they will be aggregated for the combined unique values of Field1 AND Field2. That is, two records are aggregated if and only if both Field1 and Field2 are equal.
For example:
SELECT Profession, Count(*)
FROM People
GROUP BY Profession
HAVING Count(*) > 1
will return a list of professions with associated counts like:
Software Developer, 10
PM, 5
Tester, 2
whereas:
SELECT Profession, Gender, Count(*)
FROM People
GROUP BY Profession, Gender
HAVING Count(*) > 1
will return a list of professions broken out by gender like:
Software Developer, Male, 5
Software Developer, Female, 5
PM, Male, 3
PM, Female, 2
Tester, Male, 2
Edit with additional requested information:
You can retrieve counts of professions with rows for both genders via:
SELECT Profession, Count(*)
FROM People
GROUP BY Profession
HAVING SUM(case Gender when 'Female' then 1 else 0 end) > 0 AND SUM(case Gender when 'Male' then 1 else 0 end) > 0
It gets a bit hairy (you need subqueries) if you also need the associated gender counts.
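To see the two queries diverge beyond the extra column, here is a small sqlite3 sketch. The four rows in Table1 are made-up values chosen for illustration, not data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Field1 TEXT, Field2 TEXT)")
# Hypothetical rows: 'x' repeats with different Field2 values, 'y' repeats exactly.
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [("x", "p"), ("x", "q"), ("y", "r"), ("y", "r")])

query1 = conn.execute("""
    SELECT Field1, COUNT(*) FROM Table1
    GROUP BY Field1 HAVING COUNT(*) > 1
    ORDER BY Field1
""").fetchall()

query2 = conn.execute("""
    SELECT Field1, Field2, COUNT(*) FROM Table1
    GROUP BY Field1, Field2 HAVING COUNT(*) > 1
    ORDER BY Field1, Field2
""").fetchall()

print(query1)  # groups keyed on Field1 only
print(query2)  # groups keyed on the (Field1, Field2) pair
```

With these rows, 'x' passes the HAVING filter in Query1 but disappears from Query2, because each of its (Field1, Field2) pairs occurs only once.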
The extra GROUP BY column in Query2 changes how the rows are grouped, and therefore which groups survive the HAVING filter. The example below walks through it.
test data:
id name
1 a
2 b
3 a
4 a
When you group by name, SQL first identifies the distinct values of name. Take the query below:
select name,sum(id)
from test
group by name
-- first, find the distinct values of the GROUP BY column (here, name)
a
b
-- next, collect the rows that fall into each group
group   id  name
a       1   a
        3   a
        4   a
b       2   b
With the groups formed, you can calculate any aggregate over each of them. In our case it is SUM(id), so the output looks like this:
a 8
b 2
As the output above shows, you can calculate any aggregate per group (here the groups a and b), such as COUNT(id), alongside expressions on the grouping column, such as LEN(name):
select name,len(name),sum(id)
from test
group by name
The same thing happens when you group by an additional field, for example:
select id,name
from
test
group by id,name
In this case, SQL first finds all distinct (id, name) combinations:
1 a
2 b
3 a
4 a
The next step is to collect the rows that fall into each group:
group (id, name)    rows in the group
1 a                 1 a
2 b                 2 b
3 a                 3 a
4 a                 4 a
Now you can calculate aggregations on these groups. I hope this helps in visualizing GROUP BY. Finally, HAVING eliminates groups after the GROUP BY phase, while WHERE eliminates rows before the GROUP BY phase.
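The WHERE versus HAVING distinction can be tried on the same four test rows. This is a sketch with sqlite3; the particular filter conditions (id > 1 and COUNT(*) > 1) are assumptions picked to make the difference visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "a"), (4, "a")])

# No filter: one row per distinct name.
plain = conn.execute(
    "SELECT name, SUM(id) FROM test GROUP BY name ORDER BY name").fetchall()

# WHERE removes the row with id = 1 before any groups are formed.
with_where = conn.execute(
    "SELECT name, SUM(id) FROM test WHERE id > 1 "
    "GROUP BY name ORDER BY name").fetchall()

# HAVING removes whole groups after grouping: group b has a single row.
with_having = conn.execute(
    "SELECT name, SUM(id) FROM test GROUP BY name "
    "HAVING COUNT(*) > 1 ORDER BY name").fetchall()

print(plain, with_where, with_having)
```

The WHERE version changes the sum for group a because a row is removed before aggregation, while the HAVING version leaves the sum untouched and drops group b entirely.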

Exclude value of a record in a group if another is present v2

In the example table below, I'm trying to figure out a way to sum amount over marks in two situations: the first, when mark 'C' exists within a single id, and the second, when mark 'C' doesn't exist within an id (see id 1 or 2). In the first situation, I want to exclude the amount against mark 'A' within that id (see id 3 in the desired conversion table below). In the second situation, I want to perform no exclusion and take a simple sum of the amounts against the marks.
In other words, for id's containing both mark 'A' and 'C', I want to make the amount against 'A' as zero. For id's that do not contain mark 'C' but contain mark 'A', keep the original amount against mark 'A'.
My desired output is at the bottom. I've considered trying to partition over id or use the EXISTS command, but I'm having trouble conceptualizing the solution. If any of you could take a look and point me in the right direction, it would be greatly appreciated :)
example table:
id mark amount
------------------
1 A 1
2 A 3
2 B 2
3 A 1
3 C 3
desired conversion:
id mark amount
------------------
1 A 1
2 A 3
2 B 2
3 A 0
3 C 3
desired output:
mark sum(amount)
--------------------
A 4
B 2
C 3
You could slightly modify my previous answer and end up with this:
SELECT
mark,
sum(amount) AS sum_amount
FROM atable t
WHERE mark <> 'A'
OR NOT EXISTS (
SELECT *
FROM atable
WHERE id = t.id
AND mark = 'C'
)
GROUP BY
mark
;
There's a live demo at SQL Fiddle.
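For anyone who wants to run it locally, the NOT EXISTS query can be reproduced with sqlite3 on the example table from the question. The table name atable follows the query above; everything else is plain sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atable (id INTEGER, mark TEXT, amount INTEGER)")
conn.executemany("INSERT INTO atable VALUES (?, ?, ?)",
                 [(1, "A", 1), (2, "A", 3), (2, "B", 2), (3, "A", 1), (3, "C", 3)])

# A row is kept unless it is an 'A' row whose id also has a 'C' row.
query = """
SELECT mark, SUM(amount) AS sum_amount
FROM atable AS t
WHERE mark <> 'A'
   OR NOT EXISTS (SELECT 1 FROM atable WHERE id = t.id AND mark = 'C')
GROUP BY mark
ORDER BY mark
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The 'A' row of id 3 is dropped rather than zeroed, which gives the same sums as long as mark 'A' still appears for some other id.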
Try:
select
    mark,
    sum(amount)
from ( select
           id,
           mark,
           case
               when (mark = 'A' and id in (select id from atable where mark = 'C')) then 0
               else amount
           end as amount
       from atable ) t1
group by mark

Counting values in columns

What I am looking for is to group by and count the total of different data in the same table and have them show in two different columns. Like below.
Data in table A
Fields:
Name Type
Bob 1
John 2
Bob 1
Steve 1
John 1
Bob 2
Desired result from query:
Name    Type 1   Type 2
Bob     2        1
John    1        1
Steve   1        0
This will do the trick in SQL Server:
SELECT
name,
SUM( CASE type WHEN 1 THEN 1 ELSE 0 END) AS type1,
SUM( CASE type WHEN 2 THEN 1 ELSE 0 END) AS type2
FROM
myTable
GROUP BY
name
No time to write the code, but the CASE expression is what you want here. Simply produce a value of 1 if the row meets the condition and 0 if it doesn't. Then you can sum the columns.
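The 1-or-0 CASE idea can be tried on the sample data with a short sqlite3 sketch. The table name myTable is taken from the answer above; the in-memory database is an assumption for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (name TEXT, type INTEGER)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", [
    ("Bob", 1), ("John", 2), ("Bob", 1),
    ("Steve", 1), ("John", 1), ("Bob", 2),
])

# Each CASE turns a row into 1 or 0, so SUM counts the matching rows per name.
query = """
SELECT name,
       SUM(CASE type WHEN 1 THEN 1 ELSE 0 END) AS type1,
       SUM(CASE type WHEN 2 THEN 1 ELSE 0 END) AS type2
FROM myTable
GROUP BY name
ORDER BY name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Steve gets an explicit 0 for type 2 because the ELSE branch contributes a zero instead of leaving the group empty.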
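Use two separate GROUP BY subqueries, each joined to the list of names. Count the rows rather than summing Type, filter with WHERE before GROUP BY, and use LEFT JOIN so that names with no rows of a given type still appear:
SELECT n.Name, COALESCE(a.Count1, 0) AS Count1, COALESCE(b.Count2, 0) AS Count2
FROM (SELECT DISTINCT Name FROM myTable) AS n
LEFT JOIN
(SELECT Name, COUNT(*) AS Count1 FROM myTable WHERE Type = 1 GROUP BY Name) AS a ON a.Name = n.Name
LEFT JOIN
(SELECT Name, COUNT(*) AS Count2 FROM myTable WHERE Type = 2 GROUP BY Name) AS b ON b.Name = n.Name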
You're looking for a CrossTab solution. The above solutions will work, but you'll come unstuck if you want a general solution and have N types.
A CrossTab solution will solve this for you. If this is for quickly crunching some numbers then dump your data into Excel and use the native Pivot Table feature.
If it's for a RDBMS in an app, then it depends upon the RDBMS. MS SQL 2005 and above has a crosstab syntax. See:
http://www.databasejournal.com/features/mssql/article.php/3521101/Cross-Tab-reports-in-SQL-Server-2005.htm
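Where the RDBMS has no crosstab syntax, one workaround for N types is to build the CASE columns from the data in application code. The sketch below does this with Python and sqlite3. It is not the SQL Server crosstab syntax described in the article, and it assumes that type holds small integers, so the generated column aliases are safe to embed in the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (name TEXT, type INTEGER)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", [
    ("Bob", 1), ("John", 2), ("Bob", 1),
    ("Steve", 1), ("John", 1), ("Bob", 2),
])

# Step 1: discover which types exist.
types = [row[0] for row in
         conn.execute("SELECT DISTINCT type FROM myTable ORDER BY type")]

# Step 2: emit one SUM(CASE ...) column per type. The value is bound as a
# parameter; only the integer-derived alias is embedded in the SQL text.
columns = ", ".join(
    f"SUM(CASE WHEN type = ? THEN 1 ELSE 0 END) AS type{int(t)}" for t in types
)
query = f"SELECT name, {columns} FROM myTable GROUP BY name ORDER BY name"

cursor = conn.execute(query, types)
header = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(header)
for row in rows:
    print(row)
```

Adding a row with a new type value produces an extra column on the next run without any change to the query-building code.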
@Seb has a good solution, but it's server-dependent. Here's an alternative using subselects that should be portable:
select
name,
(select count(type) from myTable where type=1 and name=a.name) as type1,
(select count(type) from myTable where type=2 and name=a.name) as type2
from
myTable as a
group by
name