Alternative for count distinct - sql

I want an alternative way to write the following query:
SELECT COUNT(DISTINCT col1) FROM table
I don't want to use DISTINCT. Is there an alternative way?

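Try GROUP BY in a subquery and COUNT() in the outer query. It gives the same result as long as Col1 contains no NULLs: GROUP BY returns NULL as a group of its own, which COUNT(*) counts but COUNT(DISTINCT Col1) ignores.
SELECT COUNT(*)
FROM
(
    SELECT Col1
    FROM Table
    GROUP BY Col1
) tbl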
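
To check the equivalence on a small data set, here is a minimal sketch that runs both forms through Python's built-in sqlite3 module. The table name t and the sample values are made up for illustration, and SQLite stands in for whatever database you actually use.

```python
import sqlite3

# Hypothetical sample data: three distinct non-NULL values plus one NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT)")
conn.executemany(
    "INSERT INTO t (col1) VALUES (?)",
    [("a",), ("b",), ("b",), ("c",), (None,)],
)

# The original query: NULLs are ignored.
count_distinct = conn.execute(
    "SELECT COUNT(DISTINCT col1) FROM t"
).fetchone()[0]

# GROUP BY keeps NULL as its own group, so COUNT(*) includes it.
group_by_star = conn.execute(
    "SELECT COUNT(*) FROM (SELECT col1 FROM t GROUP BY col1) AS tbl"
).fetchone()[0]

# Counting the column instead of * skips the NULL group again.
group_by_col = conn.execute(
    "SELECT COUNT(col1) FROM (SELECT col1 FROM t GROUP BY col1) AS tbl"
).fetchone()[0]

print(count_distinct, group_by_star, group_by_col)  # prints: 3 4 3
```

If the column can hold NULLs, using COUNT(Col1) in the outer query keeps the result identical to COUNT(DISTINCT Col1).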

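SELECT COUNT(col1) FROM table GROUP BY col1
This returns one row per distinct col1 value, so the distinct count is the number of rows returned rather than any single value in the result.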

Try this:
SELECT COUNT(Col1)
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col1) AS RNO, Col1
      FROM Table_Name) t
WHERE RNO = 1

Related

SQL with having statement now want complete rows

Here is a mock table
MYTABLE ROWS
PKEY 1,2,3,4,5,6
COL1 a,b,b,c,d,d
COL2 55,44,33,88,22,33
I want to know which rows have duplicated COL1 values:
select col1, count(*)
from MYTABLE
group by col1
having count(*) > 1
This returns:
b,2
d,2
I now want all the rows that contain b and d. Normally I would use a WHERE ... IN statement, but with the count column in the result I am not certain what type of statement to use.
Maybe you need:
select * from MYTABLE
where col1 in
(
select col1
from MYTABLE
group by col1
having count(*) > 1
)
Use a CTE and a windowed aggregate:
WITH CTE AS(
SELECT Pkey,
Col1,
Col2,
COUNT(1) OVER (PARTITION BY Col1) AS C
FROM dbo.YourTable)
SELECT PKey,
Col1,
Col2
FROM CTE
WHERE C > 1;
There are lots of ways to solve this. Here's another:
select * from MYTABLE
join
(
    select col1, count(*) as cnt
    from MYTABLE
    group by col1
    having count(*) > 1
) s on s.col1 = MYTABLE.col1;
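
The three approaches in this thread (IN with a subquery, a windowed COUNT, and a join to a derived table) can be compared on the mock table from the question. This is only a sketch: it loads the question's rows into an in-memory SQLite database through Python's sqlite3 module, and the cnt alias and the dictionary keys are names chosen here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MYTABLE (PKEY INTEGER, COL1 TEXT, COL2 INTEGER)")
# The mock rows from the question.
conn.executemany(
    "INSERT INTO MYTABLE VALUES (?, ?, ?)",
    [(1, "a", 55), (2, "b", 44), (3, "b", 33),
     (4, "c", 88), (5, "d", 22), (6, "d", 33)],
)

queries = {
    "in_subquery": """
        SELECT PKEY, COL1, COL2 FROM MYTABLE
        WHERE COL1 IN (SELECT COL1 FROM MYTABLE
                       GROUP BY COL1 HAVING COUNT(*) > 1)
        ORDER BY PKEY""",
    "windowed_count": """
        WITH CTE AS (
            SELECT PKEY, COL1, COL2,
                   COUNT(*) OVER (PARTITION BY COL1) AS C
            FROM MYTABLE)
        SELECT PKEY, COL1, COL2 FROM CTE WHERE C > 1
        ORDER BY PKEY""",
    "join_derived": """
        SELECT m.PKEY, m.COL1, m.COL2 FROM MYTABLE m
        JOIN (SELECT COL1, COUNT(*) AS cnt FROM MYTABLE
              GROUP BY COL1 HAVING COUNT(*) > 1) s
          ON s.COL1 = m.COL1
        ORDER BY m.PKEY""",
}

# Run each variant and collect its rows.
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

All three return the rows with PKEY 2, 3, 5 and 6, so the choice mostly comes down to readability and how your database optimizes each form.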

How to select max(column) and a column in the same request in Teradata

I need to select the max of a column and the column itself in the same request using Teradata SQL Assistant.
I tried:
select distinct id, col1, max(col1) from tab where id='myId' group by col1,id;
I also tried:
SELECT DISTINCT a.id, a.col1 FROM tab a
INNER JOIN (SELECT max(a.col1) AS maxINT,id FROM tab GROUP BY id)x
ON a.id = x.id
WHERE a.I_INTNE_DOSS_FIN = 'myId' ;
The problem is that I get the value of col1 in both col1 and max(col1).
Any ideas, please?
Thanks in advance.
I think you want the row where col1 has the greatest value for each id.
In Teradata, you can do this with row_number() and qualify:
select *
from tab
qualify row_number() over(partition by id order by col1 desc) = 1
It seems you want both details and an aggregate in the same SELECT. This is easy using windowed aggregates, probably:
select id, col1, max(col1) over ()
from tab
where id='myId'
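
To see why the windowed aggregate differs from the GROUP BY attempt in the question, here is a rough sketch using Python's sqlite3 module. SQLite is only a stand-in for Teradata, and the sample rows and the max_col1 alias are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id TEXT, col1 INTEGER)")
# Hypothetical rows: three for 'myId' and one for another id.
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [("myId", 10), ("myId", 30), ("myId", 20), ("other", 99)],
)

# Grouping by col1 makes every group hold a single col1 value,
# so max(col1) simply repeats col1 (the problem described in the question).
grouped = conn.execute("""
    SELECT id, col1, MAX(col1) FROM tab
    WHERE id = 'myId' GROUP BY col1, id ORDER BY col1
""").fetchall()

# An empty OVER () computes the max across all rows left after the WHERE,
# while still returning one output row per detail row.
windowed = conn.execute("""
    SELECT id, col1, MAX(col1) OVER () AS max_col1 FROM tab
    WHERE id = 'myId' ORDER BY col1
""").fetchall()

print(grouped)
print(windowed)
```

The WHERE clause is applied before the window is evaluated, so the row for the other id does not influence the maximum.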
I think you just want one row. If so:
select top (1) t.*
from tab t
where id = 'myId'
order by col1 desc;

Distinct over multiple columns in SQL Server

How can I apply DISTINCT to multiple columns in SQL Server? The query I tried below does not work on SQL Server.
select distinct(column1, column2), column3
from table_name
select distinct applies to all columns in the row. So, you can do:
select distinct col1, col2, col3
from t;
If you only want col1 and col2 to be distinct, then group by works:
select col1, col2, min(col3)
from t
group by col1, col2;
Or if you want random rows, you can use row_number(). For instance:
select t.*
from (select t.*,
             row_number() over (partition by col1, col2 order by newid()) as seqnum
      from t
     ) t
where seqnum = 1;
A clever version of this doesn't require a subquery:
select top (1) with ties t.*
from t
order by row_number() over (partition by col1, col2 order by newid());
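
The GROUP BY and row_number() variants above can be tried out with a small sketch in Python's sqlite3 module. Note the assumptions: SQLite has no newid() or TOP ... WITH TIES, so random() replaces newid() and only the subquery form is shown; the sample rows and the alias s are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT, col3 INTEGER)")
# Hypothetical rows: two (col1, col2) pairs, each with two col3 values.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "x", 5), (1, "x", 7), (2, "y", 9), (2, "y", 3)],
)

# One row per (col1, col2), keeping the smallest col3.
grouped = conn.execute("""
    SELECT col1, col2, MIN(col3) FROM t
    GROUP BY col1, col2
    ORDER BY col1, col2
""").fetchall()

# One arbitrary row per (col1, col2); random() stands in for newid().
picked = conn.execute("""
    SELECT col1, col2, col3
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY col1, col2
                                    ORDER BY random()) AS seqnum
          FROM t) AS s
    WHERE seqnum = 1
    ORDER BY col1, col2
""").fetchall()

print(grouped)  # prints: [(1, 'x', 5), (2, 'y', 3)]
print(picked)   # col3 varies from run to run
```

The GROUP BY form is deterministic but can only return aggregates of the remaining columns, whereas the row_number() form returns a complete source row.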

Hive / SQL query for top n values per key

I want the top 2 values per key. The result would look like:
What should the Hive query be?
You can use a window function with an OVER() clause:
select col1, col2
from (SELECT col1,
             col2,
             ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 DESC) AS row_num
      FROM data) f
WHERE f.row_num < 3
order by col1, col2
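
Since the expected result from the question is not shown, here is a small sketch of what the query returns on made-up data. It uses Python's sqlite3 module rather than Hive; the window-function syntax in this query is the same in both, but the rows below are purely hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (col1 TEXT, col2 INTEGER)")
# Hypothetical rows: key 'a' has three values, 'b' two, 'c' one.
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [("a", 1), ("a", 5), ("a", 3), ("b", 2), ("b", 8), ("c", 4)],
)

# Number the rows of each key from the highest col2 down, keep the first two.
top2 = conn.execute("""
    SELECT col1, col2
    FROM (SELECT col1,
                 col2,
                 ROW_NUMBER() OVER (PARTITION BY col1
                                    ORDER BY col2 DESC) AS row_num
          FROM data) f
    WHERE f.row_num < 3
    ORDER BY col1, col2
""").fetchall()

print(top2)  # prints: [('a', 3), ('a', 5), ('b', 2), ('b', 8), ('c', 4)]
```

Keys with fewer than two rows simply return what they have. If tied values should all be kept, RANK() or DENSE_RANK() can replace ROW_NUMBER().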

Multiple rows match, but I only want one?

Sometimes I wish to perform a join whereby I take the largest value of one column. Doing this, I have to perform a max() and a GROUP BY, which prevents me from retrieving the other columns from the row that held the max (because they are not contained in a GROUP BY or an aggregate function).
To fix this, I join the max value back to the original data source to get the other columns. However, my problem is that this sometimes returns more than one row.
So far I have something like:
SELECT * FROM
(SELECT Col1, Max(Col2) FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2 FROM Table) tab2
ON tab1.Col2 = tab2.Col2
If the above query now returns three rows (all of which match the largest value of Col2), I have a bit of a headache.
If there was an extra column, Col3, and of the rows returned by the above query I only wanted the one with, say, the minimum Col3 value, how would I do this?
If you are using SQL Server 2005+, you can do it like this:
CTE way:
;WITH CTE
AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
)
SELECT
*
FROM
CTE
WHERE
CTE.RowNbr=1
Subquery way:
SELECT
*
FROM
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
) AS T
WHERE
T.RowNbr=1
As I understand it, it could be something like this:
SELECT * FROM
(SELECT Col1, Max(Col2) FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2 FROM Table) tab2
ON tab1.Col2 = tab2.Col2 and Col3 = (select min(Col3) from table )
Assuming you are using SQL Server 2005 or later, you can make use of window functions here. I have chosen ROW_NUMBER(), but it is not the only option.
;WITH T AS
( SELECT *,
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) [RowNumber]
FROM Table
)
SELECT *
FROM T
WHERE RowNumber = 1
The PARTITION BY within the OVER clause is equivalent to the GROUP BY in your subquery, and the ORDER BY determines the order in which the rows are numbered. In this case Col2 DESC starts numbering at the highest value of Col2 (equivalent to your MAX).
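
The question also asks how to keep only the row with the minimum Col3 when several rows share the largest Col2. One possible way, sketched here under assumptions rather than taken from the answers above, is to add Col3 as a second sort key inside the OVER clause. The example uses Python's sqlite3 module, and the table name items and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (Col1 TEXT, Col2 INTEGER, Col3 INTEGER)")
# Hypothetical rows: three rows tie on the largest Col2 for key 'k1'.
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("k1", 10, 7), ("k1", 10, 2), ("k1", 10, 5), ("k1", 4, 1), ("k2", 8, 9)],
)

# Col2 DESC picks the largest Col2; Col3 ASC breaks ties with the smallest Col3.
rows = conn.execute("""
    WITH T AS (
        SELECT Col1, Col2, Col3,
               ROW_NUMBER() OVER (PARTITION BY Col1
                                  ORDER BY Col2 DESC, Col3 ASC) AS RowNumber
        FROM items
    )
    SELECT Col1, Col2, Col3
    FROM T
    WHERE RowNumber = 1
    ORDER BY Col1
""").fetchall()

print(rows)  # prints: [('k1', 10, 2), ('k2', 8, 9)]
```

Without the second sort key, ROW_NUMBER() still returns exactly one row per Col1, but which of the tied rows it picks is not guaranteed.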