Distinct over multiple columns in SQL Server - sql

How to apply distinct on multiple rows in SQL Server? The query that I have tried below does not work on SQL Server.
select distinct(column1, column2), column3
from table_name

select distinct applies to all columns in the row. So, you can do:
select distinct col1, col2, col3
from t;
If you only want col1 and col2 to be distinct, then group by works:
select col1, col2, min(col3)
from t
group by col1, col2;
Or if you want random rows, you can use row_number(). For instance:
select t.*
from (select t.*,
row_number() over (partition by col1, col2 order by newid()) as seqnum
from t
) t
where seqnum = 1;
A clever version of this doesn't require a subquery:
select top (1) with ties t.*
from t
order by row_number() over (partition by col1, col2 order by newid());

Related

SQL with having statement now want complete rows

Here is a mock table
MYTABLE ROWS
PKEY 1,2,3,4,5,6
COL1 a,b,b,c,d,d
COL2 55,44,33,88,22,33
I want to know which rows have duplicated COL1 values:
select col1, count(*)
from MYTABLE
group by col1
having count(*) > 1
This returns :
b,2
d,2
I now want all the rows that contain b and d. Normally, I would use where in stmt, but with the count column, not certain what type of statement I should use?
maybe you need
select * from MYTABLE
where col1 in
(
select col1
from MYTABLE
group by col1
having count(*) > 1
)
Use a CTE and a windowed aggregate:
WITH CTE AS(
SELECT Pkey,
Col1,
Col2,
COUNT(1) OVER (PARTITION BY Col1) AS C
FROM dbo.YourTable)
SELECT PKey,
Col1,
Col2
FROM CTE
WHERE C > 1;
Lots of ways to solve this here's another
select * from MYTABLE
join
(
select col1 ,count(*)
from MYTABLE
group by col1
having count(*) > 1
) s on s.col1 = mytable.col1;

Is there an equivalent stats_mode from Oracle in Netezza?

I need to create a view in Netezza that currently exists in Oracle. The Oracle view uses 'STATS_MODE' to return the value that occurs most often. Is there an equivalent function in Netezza?
You can use two levels of aggregation:
select col1, col2 as mode
from (select col1, col2, count(*) as cnt,
row_number() over (partition by col1 order by count(*) desc) seqnum
from t
group by col1, col2
) t
where seqnum = 1;

Hive / SQL query for top n values per key

I want top 2 valus per key. The result would look like:
What should be the hive query.
You can use a window function with OVER() close:
select col1,col2 from (SELECT col1,
col2,
ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 DESC) AS row_num
FROM data)f
WHERE f.row_num < 3
order by col1,col2

Alternative for count distinct

I want an alternative way to write the following query
SELECT COUNT(DISTINCT col1) FROM table.
I dont want to use distinct. Is there an alternative way?
Try GROUP BY as a subquery and COUNT() from outside query. It would achieve same result.
SELECT COUNT(*)
FROM
(
SELECT Col1
FROM Table
GROUP BY Col1
) tbl
Select count(col1) from table GROUP BY col1
Try this
SELECT COUNT(Col1)
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col1) As RNO, Col1
FROM Table_Name)
WHERE RNO = 1

Multiple rows match, but I only want one?

Sometimes I wish to perform a join whereby I take the largest value of one column. Doing this I have to perform a max() and a groupby- which prevents me from retrieving the other columns from the row which was the max (beause they were not contained in a GROUP BY or aggregate function).
To fix this, I join the max value back on the original data source, to get the other columns. However, my problem is that this sometimes returns more than one row.
So, so far I have something like:
SELECT * FROM
(SELECT Col1, Max(Col2) FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2 FROM Table) tab2
ON tab1.Col2 = tab2.Col2
If the above query now returns three rows (which match the largest value for column2) I have a bit of a headache.
If there was an extra column- col3 and for the rows returned by the above query, I only wanted to return the one which was, say the minimum Col3 value- how would I do this?
If you are using SQL Server 2005+. Then you can do it like this:
CTE way
;WITH CTE
AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
)
SELECT
*
FROM
CTE
WHERE
CTE.RowNbr=1
Subquery way
SELECT
*
FROM
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
) AS T
WHERE
T.RowNbr=1
As I got it can be something like this
SELECT * FROM
(SELECT Col1, Max(Col2) FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2 FROM Table) tab2
ON tab1.Col2 = tab2.Col2 and Col3 = (select min(Col3) from table )
Assuming you are using SQL-Server 2005 or later You can make use of Window functions here. I have chosen ROW_NUMBER() but it is not hte only option.
;WITH T AS
( SELECT *,
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) [RowNumber]
FROM Table
)
SELECT *
FROM T
WHERE RowNumber = 1
The PARTITION BY within the OVER clause is equivalent to your group by in your subquery, then your ORDER BY determines the order in which to start numbering the rows. In this case Col2 DESC to start with the highest value of col2 (Equivalent to your MAX statement).