Group by primary key vs. primary key and dependent columns - sql

CREATE TABLE T1 (a int primary key, b int);
SELECT a, b FROM T1 GROUP BY a;
--Msg 8120 Level 16. Column 'T1.b' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I would have expected this to work, since column b is clearly a dependent column, so grouping by a, b is clearly the same as grouping by a.
I am using SQL Server 2016.

SQL Server does not support functionally dependent columns in GROUP BY. What you want to do is allowed by the SQL standard, but every database deviates from the standard in some respects, and SQL Server does not implement this feature.
Just use an aggregation function:
SELECT a, MAX(b) as b
FROM T1
GROUP BY a;
Or include it in the GROUP BY:
SELECT a, b
FROM T1
GROUP BY a, b;
And I should also point out that the GROUP BY is unnecessary in this case. I suspect that this is in reference to more complicated queries where it would be appropriate.

If you need distinct values, use DISTINCT rather than GROUP BY:
SELECT DISTINCT a, b FROM T1
Otherwise, use a proper aggregate function for any column not mentioned in the GROUP BY clause.
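As a quick check of the claim that these rewrites are interchangeable when a is the primary key, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory SQLite database standing in for SQL Server (an assumption made only so the snippet is self-contained), and the sample rows are made up.

```python
import sqlite3


def grouped_rows():
    """Run the three suggested rewrites against a tiny T1 and return their results."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server here
    conn.execute("CREATE TABLE T1 (a INTEGER PRIMARY KEY, b INTEGER)")
    # Made-up sample data: a is unique, b may repeat.
    conn.executemany("INSERT INTO T1 VALUES (?, ?)", [(1, 10), (2, 20), (3, 20)])

    by_max = conn.execute(
        "SELECT a, MAX(b) AS b FROM T1 GROUP BY a ORDER BY a"
    ).fetchall()
    by_both = conn.execute(
        "SELECT a, b FROM T1 GROUP BY a, b ORDER BY a"
    ).fetchall()
    distinct = conn.execute(
        "SELECT DISTINCT a, b FROM T1 ORDER BY a"
    ).fetchall()
    conn.close()
    return by_max, by_both, distinct


if __name__ == "__main__":
    for rows in grouped_rows():
        print(rows)
```

Because every value of a identifies exactly one row, each group contains a single b, so all three queries return the same rows.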

Related

SQL count(distinct) from both the table

I have 2 tables. Let's say Table A and Table B. Table A has a column called "name". Table B also has a column "name". I want to find out the count(distinct name). Name should take values from both the columns.
For example:
Table A
name
A
B
C
Table B
name
A
B
D
Output should be 4.
The general approach is to first combine the data the way you want using a subquery (or CTE), and then deduplicate as a second step.
For example,
WITH COMBINED AS (
SELECT
name
FROM
TableA
UNION ALL
SELECT
name
FROM
TableB
)
SELECT
DISTINCT name
FROM
COMBINED
In your situation, the 2nd step can be accomplished by changing UNION ALL to a UNION. This will dedupe the values automatically. You won't even need a subquery or a 2nd step. But I wanted to teach you the concept because it comes up often.
SELECT name FROM TableA
UNION
SELECT name FROM TableB
The UNION in the CTE removes all duplicates, so a COUNT(*) will suffice:
WITH CTE AS (
SELECT name FROM TableA
UNION
SELECT name FROM TableB
)
SELECT COUNT(*) FROM CTE
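The sample data from the question can be used to check this. The sketch below is an illustration only: it runs the UNION query through Python's sqlite3 module against an in-memory SQLite database (an assumption, chosen so it runs anywhere), and also computes the sum of the per-table distinct counts for contrast.

```python
import sqlite3


def count_names():
    """Return (distinct names across both tables, sum of per-table distinct counts)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableA (name TEXT)")
    conn.execute("CREATE TABLE TableB (name TEXT)")
    # Sample data from the question.
    conn.executemany("INSERT INTO TableA VALUES (?)", [("A",), ("B",), ("C",)])
    conn.executemany("INSERT INTO TableB VALUES (?)", [("A",), ("B",), ("D",)])

    # UNION removes duplicates, so counting the combined rows is enough.
    combined = conn.execute(
        "SELECT COUNT(*) FROM ("
        " SELECT name FROM TableA UNION SELECT name FROM TableB"
        ")"
    ).fetchone()[0]

    # For contrast: counting each table separately counts shared names twice.
    per_table_sum = conn.execute(
        "SELECT (SELECT COUNT(DISTINCT name) FROM TableA)"
        "     + (SELECT COUNT(DISTINCT name) FROM TableB)"
    ).fetchone()[0]
    conn.close()
    return combined, per_table_sum


if __name__ == "__main__":
    print(count_names())
```

The combined count is 4, as the question expects, while the per-table sum is 6 because A and B appear in both tables. That is why the tables have to be combined before the distinct count is taken.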
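This query should do it:
SELECT COUNT(DISTINCT name) AS total_names
FROM (
SELECT name FROM TableA
UNION ALL
SELECT name FROM TableB
) t;
Note: written for SQL Server. The distinct count has to be taken after the two tables are combined; adding up a separate COUNT(DISTINCT name) per table would count names that appear in both tables twice.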
Yet another option, if you are on BigQuery and an approximate count is acceptable:
select hll_count.merge(hll_sketch) names
from (
select hll_count.init(name) hll_sketch from tableA
union all
select hll_count.init(name) from tableB
)
HLL++ functions are approximate aggregate functions. Approximate aggregation typically requires less memory than exact aggregation functions, like COUNT(DISTINCT), but also introduces statistical error. This makes HLL++ functions appropriate for large data streams for which linear memory usage is impractical, as well as for data that is already approximate.
See more about benefits of using HyperLogLog++ functions

How to select multiple columns from a table while ensuring that one specific column doesn't contain duplicate values in sql server?

Table:
x_id---y---z_id------a-------b-------c
1------0----NULL----Blah----Blah---Blah
2------0----NULL----Blah----Blah---Blah
3------10---6-------Blah----Blah---Blah
3------10---5-------Blah----Blah---Blah
3------10---4-------Blah----Blah---Blah
3------10---3-------Blah----Blah---Blah
3------10---2-------Blah----Blah---Blah
3------10---1-------Blah----Blah---Blah
4------0----NULL----Blah----Blah---Blah
5------0----NULL----Blah----Blah---Blah
My Query:
SELECT
#temp.x_id,
#temp.y,
MAX(#temp.z_id) AS z_id
FROM #temp
GROUP BY
#temp.x_id
Error:
Column 'y' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Requirement:
I want x_id to be unique and I want to select the MAX value of z_id.
Expected Output:
x_id---y---z_id------a-------b-------c
1------0----NULL---Blah----Blah-----Blah
2------0----NULL---Blah----Blah-----Blah
3------10---6------Blah----Blah-----Blah
4------0----NULL---Blah----Blah-----Blah
5------0----NULL---Blah----Blah-----Blah
The error you are seeing is a common one. When you write GROUP BY x_id, you are telling SQL Server to return a single record for every value of x_id. When you then select y, it is unclear which one of possibly many values of y within that group you want, so SQL Server raises an error. One correct approach is to use ROW_NUMBER:
SELECT TOP 1 WITH TIES x_id, y, z_id, a, b, c
FROM #temp
ORDER BY ROW_NUMBER() OVER (PARTITION BY x_id ORDER BY z_id DESC);
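TOP 1 WITH TIES is specific to SQL Server. The same idea, numbering the rows within each x_id and keeping only the first one, can also be written as a filter on a subquery. The sketch below is one possible way to try it out, not a drop-in replacement: it uses Python's sqlite3 module with an in-memory SQLite database (window functions need SQLite 3.25 or later), a hypothetical table name temp_rows in place of #temp, and only the x_id, y and z_id columns from the question.

```python
import sqlite3


def top_row_per_x_id():
    """Return one row per x_id: the row with the highest z_id."""
    conn = sqlite3.connect(":memory:")
    # temp_rows is a hypothetical stand-in for the #temp table in the question.
    conn.execute("CREATE TABLE temp_rows (x_id INTEGER, y INTEGER, z_id INTEGER)")
    rows = [(1, 0, None), (2, 0, None)]
    rows += [(3, 10, z) for z in (6, 5, 4, 3, 2, 1)]
    rows += [(4, 0, None), (5, 0, None)]
    conn.executemany("INSERT INTO temp_rows VALUES (?, ?, ?)", rows)

    result = conn.execute(
        """
        SELECT x_id, y, z_id
        FROM (
            SELECT x_id, y, z_id,
                   ROW_NUMBER() OVER (
                       PARTITION BY x_id ORDER BY z_id DESC
                   ) AS rn
            FROM temp_rows
        ) AS numbered
        WHERE rn = 1          -- keep only the top-ranked row of each x_id
        ORDER BY x_id
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in top_row_per_x_id():
        print(row)
```

Since whole rows are selected rather than aggregated, every other column (a, b, c in the question) can be added to the select list without touching a GROUP BY clause.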

SQL Server, include columns that are not in group by statement

I have a recurring problem.
Let's assume that I have the following columns:
T:A(PK), B, C, D, E
Now, this works:
select A, MAX(B) from T group BY A
But I can't do:
select A, C, MAX(B) from T group BY A
I don't understand why. When it comes to AVG or SUM I get it. However, MAX or MIN takes its value from exactly one row.
How to deal with it?
You can use ROW_NUMBER() for that like this:
select A, C, B
from (
select *
, row_number() over (partition by A order by B desc) seq
-- partition by A acts like GROUP BY A; order by B desc puts the MAX(B) row first
from yourTable ) t
where seq = 1;
That's because non-aggregated columns in the select list must also be part of the GROUP BY clause. You may have columns that are part of the GROUP BY but not present in the select list, but the reverse is not possible.
Generally, you put in the select clause only those columns on which you want the grouping to happen, plus aggregates.
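Try this. It groups by one column (f1) and takes the MAX of f3, and also returns another column (f2) without affecting the MAX operation. The ORDER BY inside the CROSS APPLY makes TOP(1) pick f2 from the row that holds the maximum, rather than from an arbitrary row:
SELECT m.f1, s.f2, m.maxf3 FROM
(SELECT f1, MAX(f3) maxf3 FROM t1 GROUP BY f1) m
CROSS APPLY (SELECT TOP(1) f2, f1 FROM t1 WHERE m.f1 = f1 ORDER BY f3 DESC) s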
Your question isn't very clear in that we aren't sure what you are trying to do.
Assuming you don't actually want to do a group by in your main query but want to return the max of B based on column A you can do it like so.
select A, C,(Select Max(B) from T as T2 WHERE T.A = T2.A) as MaxB from T

How to apply Count on multiple distinct columns and use Having clause

I would like to do something like this, but I am getting an error. Can you suggest a good way to do it?
select A,B,C, count(Distinct A,B,C)
from table_name
group by A,B,C
having count(Distinct A,B,C) > 1
Basically, I have an index on the columns (A, B, C), and some rows don't have a unique combination of them, so I'm trying a query like this to identify the rows that violate the unique constraint. Please let me know if there is a better way.
If you group by these columns, you already get one row per unique combination, and you can then use count(*) to see how many times each combination occurs:
select A,B,C, count(*)
from table_name
group by A,B,C
HAVING count(*) > 1
What @jurgend said is right, and you can further find the exact rows (I'm assuming there are more fields to look at, including maybe a PK) by doing
SELECT *
FROM table_name
WHERE (A,B,C) IN (
SELECT A, B, C
FROM table_name
GROUP BY A, B, C
HAVING COUNT(*) > 1
)
A tuple IN query like this works in Oracle, but not in every DBMS (SQL Server, for example, does not support it).
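Where the tuple IN syntax is not available, a correlated subquery that compares the columns one by one is a more portable way to express the same filter. The sketch below is an illustration under stated assumptions: it uses Python's sqlite3 module with an in-memory SQLite database, made-up sample rows, and a hypothetical id column added only to make the result easy to read.

```python
import sqlite3


def rows_with_duplicate_abc():
    """Return ids of rows whose (A, B, C) combination occurs more than once."""
    conn = sqlite3.connect(":memory:")
    # id is a hypothetical extra column; A, B, C are the columns from the question.
    conn.execute("CREATE TABLE table_name (id INTEGER, A TEXT, B TEXT, C TEXT)")
    conn.executemany(
        "INSERT INTO table_name VALUES (?, ?, ?, ?)",
        [(1, "x", "y", "z"), (2, "x", "y", "z"), (3, "p", "q", "r")],
    )

    # For each row, count the rows sharing its A, B and C; keep it if there are several.
    ids = conn.execute(
        """
        SELECT t.id
        FROM table_name AS t
        WHERE (
            SELECT COUNT(*)
            FROM table_name AS d
            WHERE d.A = t.A AND d.B = t.B AND d.C = t.C
        ) > 1
        ORDER BY t.id
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in ids]


if __name__ == "__main__":
    print(rows_with_duplicate_abc())
```

Note that the equality comparisons never match NULL, so rows with a NULL in A, B or C are not reported by this form.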

Group Count in SQL

I am looking for a way to display a table where a set of multiple attributes appear more than one time.
For example, suppose I had a table, Tbl1 with attributes A, B, C, D, E
How do I make a query such that it only shows rows where A, B, C appear more than once (as in the same A, B, C as a group), but D and E may or may not be different?
My attempt:
SELECT *
FROM Tbl1
WHERE COUNT(A, B, C) > 1
and I get an error: "group function is not allowed here"
The reason is that you cannot use an aggregate function in the WHERE clause of a query. In
SELECT columns
FROM tables
WHERE condition
the condition refers to a single row of the table.
What you want is HAVING:
SELECT columns
FROM tables
GROUP BY columns
HAVING condition
The condition after HAVING is evaluated after the grouping, and there you can use aggregate functions like COUNT or SUM.
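To see the difference between the two clauses in practice, here is a small sketch. It is only an illustration: it uses Python's sqlite3 module with an in-memory SQLite database rather than the asker's DBMS, the sample rows are made up, and the exact error message for the WHERE version differs between database systems.

```python
import sqlite3


def where_vs_having():
    """Return (was the aggregate in WHERE rejected?, groups found via HAVING)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Tbl1 (A TEXT, B TEXT, C TEXT, D INTEGER, E INTEGER)")
    # Made-up rows: the first two share A, B, C but differ in D and E.
    conn.executemany(
        "INSERT INTO Tbl1 VALUES (?, ?, ?, ?, ?)",
        [("a1", "b1", "c1", 1, 1), ("a1", "b1", "c1", 2, 2), ("a2", "b2", "c2", 3, 3)],
    )

    # WHERE is evaluated per row, so an aggregate there is rejected.
    try:
        conn.execute("SELECT * FROM Tbl1 WHERE COUNT(*) > 1").fetchall()
        where_rejected = False
    except sqlite3.Error:
        where_rejected = True

    # HAVING is evaluated per group, after GROUP BY, so COUNT(*) is allowed.
    groups = conn.execute(
        """
        SELECT A, B, C, COUNT(*) AS GroupCount
        FROM Tbl1
        GROUP BY A, B, C
        HAVING COUNT(*) > 1
        """
    ).fetchall()
    conn.close()
    return where_rejected, groups


if __name__ == "__main__":
    print(where_vs_having())
```

Only the (a1, b1, c1) combination is returned, with a count of 2, regardless of the differing D and E values.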
Use the GROUP BY clause (SQL Server: http://msdn.microsoft.com/en-us/library/ms177673.aspx, MySQL: http://www.tutorialspoint.com/mysql/mysql-group-by-clause.htm).
Within each group, you'll want to get the count of rows in that group (using COUNT(*)) and then use a HAVING clause to filter on that count. HAVING is like a WHERE clause for GROUP BY. It filters on the results of the grouping, and can make reference to the grouped columns (in this case, A, B and C), or any aggregates (in this case, COUNT(*)).
Here's what your query could look like. Note that you can only include columns in the SELECT field list that are mentioned in the GROUP BY or that are contained in aggregate functions such as COUNT() and MAX(). MySQL will let you get away with putting other columns in (when its ONLY_FULL_GROUP_BY mode is disabled), but SQL Server will give you an error. It's best to follow this rule even if the database allows otherwise.
SELECT A,
B,
C,
COUNT(*) AS GroupCount
FROM Tbl1
GROUP BY A, B, C
HAVING COUNT(*) > 1
If you want the full rows where this is true, then you can use a derived table:
SELECT *
FROM Tbl1
JOIN (
SELECT A,
B,
C,
COUNT(*) AS GroupCount
FROM Tbl1
GROUP BY A, B, C
HAVING COUNT(*) > 1
) AS duplicates
ON duplicates.A = Tbl1.A AND
duplicates.B = Tbl1.B AND
duplicates.C = Tbl1.C