Group Count in SQL

I am looking for a way to display the rows of a table where a combination of several attributes appears more than once.
For example, suppose I have a table Tbl1 with attributes A, B, C, D, E.
How do I write a query that shows only the rows whose combination of A, B, C (taken together as a group) appears more than once, while D and E may or may not differ?
My attempt:
SELECT *
FROM Tbl1
WHERE COUNT(A, B, C) > 1
and I get an error: "group function is not allowed here"

The reason is that you cannot use an aggregate function in the WHERE clause of an SQL statement.
SELECT columns
FROM tables
WHERE condition
Here the condition is evaluated against one row of the table at a time.
What you want is HAVING:
SELECT columns
FROM tables
GROUP BY columns
HAVING condition
The condition after HAVING is evaluated after the grouping, so it can use aggregate functions such as COUNT or SUM.
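The following minimal sketch shows the difference. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for the asker's database, and the sample rows are made up. SQLite words its error differently from the one in the question, but it enforces the same rule.

```python
import sqlite3

# In-memory SQLite stand-in for Tbl1 (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tbl1 (A, B, C, D, E)")
conn.executemany(
    "INSERT INTO Tbl1 VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 'x', 'p'), (1, 1, 1, 'y', 'q'), (2, 2, 2, 'z', 'r')],
)

# An aggregate in WHERE is rejected: WHERE is checked row by row, before any grouping exists.
try:
    conn.execute("SELECT * FROM Tbl1 WHERE COUNT(*) > 1").fetchall()
    where_failed = False
except sqlite3.OperationalError as exc:
    where_failed = True
    print("WHERE error:", exc)

# The same condition in HAVING is evaluated once per group, after GROUP BY.
having_rows = conn.execute(
    "SELECT A, B, C FROM Tbl1 GROUP BY A, B, C HAVING COUNT(*) > 1"
).fetchall()
print(having_rows)  # [(1, 1, 1)]
```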

Use the GROUP BY clause (SQL Server: http://msdn.microsoft.com/en-us/library/ms177673.aspx, MySQL: http://www.tutorialspoint.com/mysql/mysql-group-by-clause.htm).
Within each group, you'll want to get the count of rows in that group (using COUNT(*)) and then use a HAVING clause to filter on that count. HAVING is like a WHERE clause for GROUP BY. It filters on the results of the grouping and can refer to the grouped columns (in this case, A, B and C) or to any aggregates (in this case, COUNT(*)).
Here's what your query could look like. Note that the SELECT list may only contain columns that appear in the GROUP BY clause or that are wrapped in aggregate functions such as COUNT() and MAX(). Older MySQL versions (or MySQL with ONLY_FULL_GROUP_BY disabled) let you get away with other columns, but SQL Server raises an error. It's best to follow this rule even where the database allows otherwise.
SELECT A,
B,
C,
COUNT(*) AS GroupCount
FROM Tbl1
GROUP BY A, B, C
HAVING COUNT(*) > 1
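A minimal sketch that runs the query above against an in-memory SQLite database through Python's sqlite3 module. The rows are invented for illustration, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tbl1 (A, B, C, D, E)")
conn.executemany(
    "INSERT INTO Tbl1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 'x', 'p'),
        (1, 1, 1, 'y', 'q'),   # (1, 1, 1) appears twice
        (2, 2, 2, 'z', 'r'),   # (2, 2, 2) appears once: filtered out by HAVING
        (3, 3, 3, 'u', 's'),
        (3, 3, 3, 'u', 's'),
        (3, 3, 3, 'v', 't'),   # (3, 3, 3) appears three times
    ],
)

# One summary row per (A, B, C) group that occurs more than once.
groups = conn.execute(
    """
    SELECT A, B, C, COUNT(*) AS GroupCount
    FROM Tbl1
    GROUP BY A, B, C
    HAVING COUNT(*) > 1
    ORDER BY A, B, C
    """
).fetchall()
print(groups)  # [(1, 1, 1, 2), (3, 3, 3, 3)]
```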
If you want the full rows where this is true, you can use a derived table (selecting Tbl1.* so that only Tbl1's own columns are returned):
SELECT Tbl1.*
FROM Tbl1
JOIN (
SELECT A,
B,
C,
COUNT(*) AS GroupCount
FROM Tbl1
GROUP BY A, B, C
HAVING COUNT(*) > 1
) AS duplicates
ON duplicates.A = Tbl1.A AND
duplicates.B = Tbl1.B AND
duplicates.C = Tbl1.C
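A minimal sketch of the derived-table join, run against an in-memory SQLite database via Python's sqlite3 module with made-up rows. It also shows an alternative that is not part of the original answer. If the database supports window functions (SQLite 3.25+, SQL Server, PostgreSQL, Oracle, MySQL 8.0), COUNT(*) OVER (PARTITION BY A, B, C) attaches the group size to every row, so no self-join is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tbl1 (A, B, C, D, E)")
conn.executemany(
    "INSERT INTO Tbl1 VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 'x', 'p'), (1, 1, 1, 'y', 'q'), (2, 2, 2, 'z', 'r')],
)

# Derived-table join: keep every Tbl1 row whose (A, B, C) group has more than one row.
# Tbl1.* limits the output to Tbl1's columns; a bare * would also return the
# derived table's A, B, C and GroupCount.
joined = conn.execute(
    """
    SELECT Tbl1.*
    FROM Tbl1
    JOIN (
        SELECT A, B, C, COUNT(*) AS GroupCount
        FROM Tbl1
        GROUP BY A, B, C
        HAVING COUNT(*) > 1
    ) AS duplicates
      ON duplicates.A = Tbl1.A
     AND duplicates.B = Tbl1.B
     AND duplicates.C = Tbl1.C
    ORDER BY Tbl1.D
    """
).fetchall()

# Window-function alternative: count each row's group in place, then filter outside.
windowed = conn.execute(
    """
    SELECT A, B, C, D, E
    FROM (
        SELECT Tbl1.*, COUNT(*) OVER (PARTITION BY A, B, C) AS GroupCount
        FROM Tbl1
    ) AS counted
    WHERE GroupCount > 1
    ORDER BY D
    """
).fetchall()

print(joined)  # [(1, 1, 1, 'x', 'p'), (1, 1, 1, 'y', 'q')]
```

The window version reads the table once, whereas the join reads it twice. Whether that matters in practice depends on the engine and its query plan.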

Related

SQL count without group by to return zero when no matches

I am using SQL Server. This is part of a stored procedure.
If I have a query like this:
SELECT COUNT(a) AS countA, b
FROM [dbo].[tbl] AS TBL
WHERE TBL.site='100'
AND TBL.status='status'
AND TBL.b IN (SELECT b FROM #tbTemp)
GROUP BY TBL.b
#tbTemp contains a list of ids 1,2,3,4 etc.
I need to return column b with my count, so I am forced to use GROUP BY.
This now returns no rows when no records satisfy the WHERE condition.
How can I return both count(a) and b, and if the WHERE condition fails return a 0?
Thank you
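One common way to get zeros, offered here as a sketch rather than as the thread's answer, is to drive the query from the list of ids with a LEFT JOIN. Put the site and status filters in the ON clause, and count a column from tbl. COUNT(TBL.a) ignores the NULLs produced for unmatched ids, so those ids get 0. COUNT(*) would count the NULL-extended row and give 1. The sketch runs on SQLite through Python's sqlite3. The names tbl and tbTemp are hypothetical stand-ins for [dbo].[tbl] and #tbTemp, and the rows are made up. The same query shape should carry over to SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins: tbl for [dbo].[tbl], tbTemp for the #tbTemp temp table.
conn.execute("CREATE TABLE tbl (a, b, site, status)")
conn.execute("CREATE TABLE tbTemp (b)")
conn.executemany("INSERT INTO tbTemp VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [
        (10, 1, '100', 'status'),
        (11, 1, '100', 'status'),
        (12, 2, '100', 'status'),
        (13, 3, '200', 'status'),  # wrong site: must not be counted
    ],
)

rows = conn.execute(
    """
    SELECT COUNT(TBL.a) AS countA, t.b
    FROM tbTemp AS t
    LEFT JOIN tbl AS TBL
      ON TBL.b = t.b
     AND TBL.site = '100'       -- filters go in ON, not WHERE,
     AND TBL.status = 'status'  -- or the LEFT JOIN behaves like an inner join
    GROUP BY t.b
    ORDER BY t.b
    """
).fetchall()
print(rows)  # [(2, 1), (1, 2), (0, 3), (0, 4)]
```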

group by primary key vs primary key and dependent columns

CREATE TABLE T1 (a int primary key, b int);
SELECT a, b FROM T1 GROUP BY a;
--Msg 8120 Level 16. Column 'T1.b' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I would have expected this to work: column b is functionally dependent on the primary key a, so grouping by a, b is the same as grouping by a.
I am using SQL Server 2016.
SQL Server does not recognize functional dependencies in GROUP BY (here, the fact that b is determined by the primary key a). Every database deviates from the SQL standard in some respects, so although the standard allows what you want to do, not all databases support it (PostgreSQL, for example, accepts it when the primary key is in the GROUP BY).
Just use an aggregate function:
SELECT a, MAX(b) as b
FROM T1
GROUP BY a;
Or include it in the GROUP BY:
SELECT a, b
FROM T1
GROUP BY a, b;
I should also point out that the GROUP BY is unnecessary in this case: because a is the primary key, every group contains exactly one row. I suspect the question is a simplified version of a more complicated query where grouping would be appropriate.
If you need distinct values, use DISTINCT rather than GROUP BY:
SELECT DISTINCT a, b FROM T1
Otherwise, use an appropriate aggregate function for each column not listed in the GROUP BY clause.
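A quick sketch showing why GROUP BY adds nothing here. Because a is the primary key, each group holds exactly one row, so all of these queries return the same rows. It uses SQLite through Python's sqlite3 with made-up rows. SQLite is more permissive than SQL Server and would accept the original SELECT a, b ... GROUP BY a, so this does not reproduce Msg 8120.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (a INTEGER PRIMARY KEY, b INTEGER)")
conn.executemany("INSERT INTO T1 VALUES (?, ?)", [(1, 10), (2, 20), (3, 20)])

# Each variant from the answers, plus a plain SELECT for comparison.
queries = {
    "plain": "SELECT a, b FROM T1 ORDER BY a",
    "max_b": "SELECT a, MAX(b) AS b FROM T1 GROUP BY a ORDER BY a",
    "group_a_b": "SELECT a, b FROM T1 GROUP BY a, b ORDER BY a",
    "distinct": "SELECT DISTINCT a, b FROM T1 ORDER BY a",
}
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
print(results["plain"])  # [(1, 10), (2, 20), (3, 20)]
```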

Sorted SQL groups

I was trying to do something like:
SELECT
a, b, c, MAX(d)
FROM
table -- table with 4 columns a, b, c and d
GROUP BY
a, b
I would like c as an additional column that I do not want to group by, but that distinguishes rows within a group. My problem is that GROUP BY returns c from what looks like the first row of each group, not from the row that actually has d = MAX(d).
ORDER BY is applied to the whole result, so it's not an option. Can I achieve that in any other way than sorting the table prematurely (as a subquery) and then applying the grouping? Would that work in every SQL engine? Do standards define such behaviors?
Edit1:
I tested something like:
SELECT
t.*,
MAX(d) AS v
FROM
(SELECT
a, b, c, d
FROM
table
ORDER BY
d DESC) AS t
GROUP BY
a, b
It works, but I doubt anyone can guarantee that the subquery's sort order carries over to the rows chosen for each group. Maybe it works this way in MySQL, but how will it behave in Oracle or PostgreSQL?
This is ANSI SQL:
SELECT a,
b,
c,
MAX(d) over (partition by a,b) as max_d
FROM the_table
This still returns all rows from the table, with the max value repeated on every row. If you want only the rows that hold the max value, wrap this in a derived table:
select a,b,c,d
from (
SELECT a,
b,
c,
d,
MAX(d) over (partition by a,b) as max_d
FROM the_table
) t
where d = max_d;
That returns multiple rows per group if the same max value occurs more than once within the group. If you want exactly one row per group, use row_number() instead.
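A sketch of the row_number() variant next to the MAX() OVER filter, run on SQLite (3.25+ is needed for window functions) through Python's sqlite3. The rows are made up and include a tie for the maximum. The tie-breaker ORDER BY d DESC, c is an arbitrary choice for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "the_table" as in the answer; invented rows with a tie on d = 9 in group (1, 1).
conn.execute("CREATE TABLE the_table (a, b, c, d)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?)",
    [
        (1, 1, 'first', 5),
        (1, 1, 'second', 9),
        (1, 1, 'third', 9),
        (2, 2, 'only', 3),
    ],
)

# MAX() OVER keeps every row that ties for the group maximum.
ties = conn.execute(
    """
    SELECT a, b, c, d
    FROM (SELECT a, b, c, d, MAX(d) OVER (PARTITION BY a, b) AS max_d
          FROM the_table) t
    WHERE d = max_d
    ORDER BY a, b, c
    """
).fetchall()

# ROW_NUMBER() numbers rows within each group, so rn = 1 picks exactly one row per group.
one_per_group = conn.execute(
    """
    SELECT a, b, c, d
    FROM (SELECT a, b, c, d,
                 ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY d DESC, c) AS rn
          FROM the_table) t
    WHERE rn = 1
    ORDER BY a, b
    """
).fetchall()
print(ties)           # [(1, 1, 'second', 9), (1, 1, 'third', 9), (2, 2, 'only', 3)]
print(one_per_group)  # [(1, 1, 'second', 9), (2, 2, 'only', 3)]
```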
You can also join the grouped maximum back to the table on the group key plus d:
select x.*, y.c
from (SELECT a, b, MAX(d) as d FROM table GROUP BY a, b) x
join table y
  on y.a = x.a and y.b = x.b and y.d = x.d
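When joining the grouped maximum back to the table, the join has to match a and b as well as d. Joining on d alone can pair one group's maximum with rows from other groups that happen to have the same d value. The sketch below compares both versions on SQLite via Python's sqlite3 with made-up rows. The table is renamed t because TABLE is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a, b, c, d)")  # "table" renamed to t: TABLE is reserved
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 1, 'low', 3), (1, 1, 'high', 7), (2, 2, 'other', 3)],
)

grouped_max = "(SELECT a, b, MAX(d) AS d FROM t GROUP BY a, b)"

# Join on d only: group (2, 2) has max d = 3, which also matches the 'low' row of group (1, 1).
d_only = conn.execute(
    f"SELECT x.*, y.c FROM {grouped_max} x JOIN t y ON x.d = y.d ORDER BY x.a, y.c"
).fetchall()

# Join on the group key as well as d: only the row(s) holding each group's max.
full_key = conn.execute(
    f"SELECT x.*, y.c FROM {grouped_max} x JOIN t y "
    "ON y.a = x.a AND y.b = x.b AND y.d = x.d ORDER BY x.a, y.c"
).fetchall()
print(d_only)    # [(1, 1, 7, 'high'), (2, 2, 3, 'low'), (2, 2, 3, 'other')]
print(full_key)  # [(1, 1, 7, 'high'), (2, 2, 3, 'other')]
```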

Can I add aggregated column without performing a join?

I have a table table1 with three columns a, b, c. I create another column, d, by grouping on c and computing some function func(a, b), which gives me view1. To add column d to table1, the only approach I can think of is a join between view1 and table1. However, both have millions of rows and the join is really slow. Is there any way to do this without a join? Intuitively, it seems like it should be possible.
Here is a snippet of the script:
with
found_mean
as
(select sum(count*avg)/sum(count) as combined_avg , b from view_1 group by b),
view_1_m
as
(select combined_avg , count , avg, variance , found_mean.b from found_mean , view_1 where found_mean.b = view_1.b),
Depending on what your function is, you can use window functions (sometimes called analytic functions). For instance, if you wanted the maximum value of b for a given a:
select a, b, c, max(b) over (partition by a) as d
from table1;
Without more information, it is hard to be more specific.
EDIT:
You should be able to do this with analytic functions:
select count , avg, variance,
(sum(count * avg) over (partition by b) /
sum(count) over (partition by b)
) as weighted_average
from view_1;
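A minimal sketch that compares the window-function version with the join-based CTE from the question, using SQLite via Python's sqlite3 and a made-up view_1 table. It illustrates that both produce the same weighted average per b. It says nothing about relative speed, which depends on the engine and its indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up stand-in for view_1. avg is REAL here; with integer columns, SQLite and
# SQL Server would perform integer division in the weighted average.
conn.execute("CREATE TABLE view_1 (b, count, avg, variance)")
conn.executemany(
    "INSERT INTO view_1 VALUES (?, ?, ?, ?)",
    [('g1', 2, 10.0, 1.0), ('g1', 6, 20.0, 4.0), ('g2', 4, 5.0, 0.5)],
)

# Window-function version: no join, every view_1 row is kept.
windowed = conn.execute(
    """
    SELECT b, count, avg, variance,
           SUM(count * avg) OVER (PARTITION BY b) /
           SUM(count) OVER (PARTITION BY b) AS weighted_average
    FROM view_1
    ORDER BY b, avg
    """
).fetchall()

# Join-based version modeled on the question's CTE, for comparison.
joined = conn.execute(
    """
    WITH found_mean AS (
        SELECT SUM(count * avg) / SUM(count) AS combined_avg, b
        FROM view_1
        GROUP BY b
    )
    SELECT v.b, v.count, v.avg, v.variance, f.combined_avg
    FROM view_1 v
    JOIN found_mean f ON f.b = v.b
    ORDER BY v.b, v.avg
    """
).fetchall()
print(windowed)
```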

Selecting the distinct values from three columns with the max of a fourth where there are duplicates

I have a table with one numeric value (n) and three string values (a,b,c). How do I query this table so that I get only distinct values of (a,b,c) and if there are duplicates, take the maximum of the corresponding set of n values?
select max(n), a, b, c
from mytable
group by a, b, c
Use GROUP BY:
select a, b, c, max(n)
from table
group by a, b, c;
This returns each distinct combination of a, b, c once, along with the maximum n found in that group.
MAX is an aggregate function and is typically used with GROUP BY. Other potentially useful aggregate functions include MIN, AVG, and COUNT.
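A short sketch of the GROUP BY query with several aggregates side by side, run on SQLite via Python's sqlite3 with made-up rows. SQLite's AVG always returns a floating-point value. Other engines may return an integer or decimal type for integer input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (n, a, b, c)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1, 'x', 'y', 'z'), (5, 'x', 'y', 'z'), (3, 'p', 'q', 'r')],
)

# One row per distinct (a, b, c), with several aggregates over that group's n values.
rows = conn.execute(
    """
    SELECT a, b, c, MAX(n), MIN(n), AVG(n), COUNT(*)
    FROM mytable
    GROUP BY a, b, c
    ORDER BY a
    """
).fetchall()
print(rows)  # [('p', 'q', 'r', 3, 3, 3.0, 1), ('x', 'y', 'z', 5, 1, 3.0, 2)]
```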