How can I use a COUNT(DISTINCT var) to return the count of unique values per group? - sql

I need to return a count of unique values, but unique per group of the result set, not unique to the entire result set. For example I would like the following code:
SELECT col1 AS letters, count(DISTINCT col2) AS numbers
GROUP BY col1;
applied to this data:
col1 col2
a 5
a 5
a 6
b 1
b 2
b 6
To return this:
col1 col2
a 2
b 3
If the above code will not produce this, how can I accomplish this is T-SQL?

I hope this works for your solution, you need to use group by on col2 with count distinct of col2
SELECT
col1,
COUNT(DISTINCT col2)
FROM
count_unique_values_per_group
GROUP BY
col1

Try this:
SELECT DISTINCT col1
,dense_rank() over (partition by col1 order by col2 asc) + dense_rank() over (partition by col1 order by col2 desc) - 1
FROM my_table

Apply concat function to get the unique count. Hope this helps..
SELECT col1, count(distinct col1 + col2) FROM table_name group by col1;
or
SELECT col1, count(distinct concat(col1,col2)) FROM table_name group by col1;

Related

SQL with having statement now want complete rows

Here is a mock table
MYTABLE ROWS
PKEY 1,2,3,4,5,6
COL1 a,b,b,c,d,d
COL2 55,44,33,88,22,33
I want to know which rows have duplicated COL1 values:
select col1, count(*)
from MYTABLE
group by col1
having count(*) > 1
This returns :
b,2
d,2
I now want all the rows that contain b and d. Normally, I would use where in stmt, but with the count column, not certain what type of statement I should use?
maybe you need
select * from MYTABLE
where col1 in
(
select col1
from MYTABLE
group by col1
having count(*) > 1
)
Use a CTE and a windowed aggregate:
WITH CTE AS(
SELECT Pkey,
Col1,
Col2,
COUNT(1) OVER (PARTITION BY Col1) AS C
FROM dbo.YourTable)
SELECT PKey,
Col1,
Col2
FROM CTE
WHERE C > 1;
Lots of ways to solve this here's another
select * from MYTABLE
join
(
select col1 ,count(*)
from MYTABLE
group by col1
having count(*) > 1
) s on s.col1 = mytable.col1;

SQL DISTINCT based on a single column, but keep all columns as output

--mytable
col1 col2 col3
1 A red
2 A green
3 B purple
4 C blue
Let's call the table above mytable. I want to select only distinct values from col2:
SELECT DISTINCT
col2
FROM
mytable
When I do this the output looks like this, which is expected:
col2
A
B
C
but how do I perform the same type of query, yet keep all columns? The output would look like below. In essence I'm going through mytable looking at col2, and when there's multiple occurrences of col2 I'm only keeping the first row.
col1 col2 col3
1 A red
3 B purple
4 C blue
Do SQL functions (eg DISTINCT) have arguments I could set? I could imagine it to be something like KeepAllColumns = TRUE for this DISTINCT function? Or do I need to perform JOINs to get what I want?
You can use window functions, particularly row_number():
select t.*
from (select t.*, row_number() over (partition by col2 order by col2) as seqnum
from mytable t
) t
where seqnum = 1;
row_number() enumerates the rows, starting with "1". You can control whether you get the oldest, earliest, biggest, smallest . . .
You can use the QUALIFY clause in Teradata:
SELECT col1, col2, col3
FROM mytable
QUALIFY ROW_NUMBER() OVER(PARTITION BY col2 ORDER BY col2) = 1 -- Get 1st row per group
If you want to change the ordering for how to determine which col2 row to get, just change the expression in the ORDER BY.
With NOT EXISTS:
select m.* from mytable m
where not exists (
select 1 from mytable
where col2 = m.col2 and col1 < m.col1
)
This code will return the rows for which there is not another row with the same col2 and a smaller value in col1.

How to use and "in" clause in "having" in HIVE?

I have my data in sometable like this:
col1 col2 col3
A B 3
A B 1
A B 2
C B 1
And I want to get all of the unique groups of col1 and col2 that contain certain rows of col3. Like, all groups of col1 and col2 that contain a "2".
I wanted to do something like this:
select col1, col2 from sometable
group by col1, col2
having col3=1 and col3=2
But I want it to only return groups that have an instance of both 1 and 2 in col3. so, the result after the query should return this:
col1 col2
A B
How do I express this in HIVE? THANK YOU.
I don't know why others deleted answers that where correct and then almost correct but I will put their's back up.
SELECT col1, col2, COUNT(DISTINCT col3)
FROM
sometable
WHERE
col3 IN (1,2)
GROUP BY col1, col2
HAVING
COUNT(DISTINCT col3) > 1
If you actually want to return all of the records that meet your criteria you need to do a sub select and join back to the main table to get them.
SELECT s.*
FROM
sometable s
INNER JOIN (
SELECT col1, col2, COUNT(DISTINCT col3)
FROM
sometable
WHERE
col3 IN (1,2)
GROUP BY col1, col2
HAVING
COUNT(DISTINCT col3) > 1
) t
ON s.Col1 = t.Col1
AND s.Col2 = t.Col2
AND s.col3 IN (1,2)
The gist of this is narrow/filter your rowset to the rows that you want to test col3 IN (1,2) then count the DISTINCT values of col3 to make sure both 1 and 2 exist and not just 1 & 1 or 2 & 2.
I think below mentioned query will be useful for your question.
select col1,col2
from Abc
group by col1,col2
having count(col1) >1 AND COUNT(COL2)>2

Return distinct rows from not entirely distinct results

Two columns, first is distcint, second not so much.
Col1 ---- Col2
1 ---- abc
1 ---- abc (123)
2 ---- def
2 ---- def (324)
etc
I need to bring back distinct records, but only the ones with the longer Col2.
I've tried using the CONTAINS function, but my table isn't full-text indexed.
One option is to use use ROW_NUMBER() ordering by the LEN() of Col2:
SELECT *
FROM (
SELECT Col1, Col2, ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY LEN(Col2) DESC) rn
FROM YourTable
) t
WHERE rn = 1
SQL Fiddle Demo
SELECT col1 ,
col2
FROM ( SELECT col1 ,
col2 ,
Rank() OVER ( PARTITION BY col1 ORDER BY col2 DESC ) row
FROM dbo.table
) t
WHERE row = 1
You can also try this ..

select all columns with one column has different value

In my table,some records have all column values are the same, except one. I need write a query to get those records. what's the best way to do it? the table is like this:
colA colB colC
a b c
a b d
a b e
What's the best way to get all records with all the columns? Thanks for everyone's help.
Assuming you know that column3 will always be different, to get the rows that have more than one value:
SELECT Col1, Col2
FROM Table t
GROUP BY Col1, Col2
HAVING COUNT(distinct col3) > 1
If you need all the values in the three columns, then you can join this back to the original table:
SELECT t.*
FROM table t join
(SELECT Col1, Col2
FROM Table t
GROUP BY Col1, Col2
HAVING COUNT(distinct col3) > 1
) cols
on t.col1 = cols.col1 and t.col2 = cols.col2
Just select those rows that have the different values:
SELECT col1, col2
FROM myTable
WHERE colWanted != knownValue
If this is not what you are looking for, please post examples of the data in the table and the wanted output.
How about something like
SELECT Col1, Col2
FROM Table
GROUP BY Col1, Col2
HAVING COUNT(*) = 1
This will give you Col1, Col2 that have unique data.
Assuming col3 has the difs
SELECT Col1, Col2
FROM Table
GROUP BY Col1, Col2
HAVING COUNT(*) > 1
OR TO SHOW ALL 3 COLS
SELECT Col1, Col2, Col3
FROM Table1
GROUP BY Col1, Col2, Col3
HAVING COUNT(Col3) > 1