Is there an equivalent of Oracle's STATS_MODE in Netezza?

I need to create a view in Netezza that currently exists in Oracle. The Oracle view uses 'STATS_MODE' to return the value that occurs most often. Is there an equivalent function in Netezza?

You can use two levels of aggregation:
select col1, col2 as mode
from (select col1, col2, count(*) as cnt,
row_number() over (partition by col1 order by count(*) desc) seqnum
from t
group by col1, col2
) t
where seqnum = 1;
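To see the two-level approach run end to end, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for Netezza, which is an assumption made only so the example is runnable; the sample rows and the name mode_value are made up. The counting step is split into its own CTE for readability, but the idea is the same: count per (col1, col2), then keep the most frequent col2 per col1.

```python
import sqlite3

# In-memory SQLite stands in for Netezza here (assumption for the demo only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z"), ("b", "w"), ("b", "w")],
)

mode_rows = conn.execute(
    """
    WITH counts AS (            -- level 1: frequency of each col2 within col1
        SELECT col1, col2, COUNT(*) AS cnt
        FROM t
        GROUP BY col1, col2
    ),
    ranked AS (                 -- level 2: rank the values by frequency
        SELECT col1, col2,
               ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY cnt DESC) AS seqnum
        FROM counts
    )
    SELECT col1, col2 AS mode_value
    FROM ranked
    WHERE seqnum = 1
    ORDER BY col1
    """
).fetchall()

print(mode_rows)
```

If two values tie for most frequent, ROW_NUMBER() picks one of them arbitrarily; add a second ORDER BY key if you need a deterministic choice.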

Related

Distinct over multiple columns in SQL Server

How can I apply DISTINCT to multiple columns in SQL Server? The query I tried below does not work.
select distinct(column1, column2), column3
from table_name
select distinct applies to all columns in the row. So, you can do:
select distinct col1, col2, col3
from t;
If you only want col1 and col2 to be distinct, then group by works:
select col1, col2, min(col3)
from t
group by col1, col2;
Or if you want random rows, you can use row_number(). For instance:
select t.*
from (select t.*,
row_number() over (partition by col1, col2 order by newid()) as seqnum
from t
) t
where seqnum = 1;
A clever version of this doesn't require a subquery:
select top (1) with ties t.*
from t
order by row_number() over (partition by col1, col2 order by newid());
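The difference between DISTINCT over the whole row and GROUP BY on two columns can be checked with a small sketch. This uses Python with an in-memory SQLite database rather than SQL Server (an assumption for the sake of a runnable demo), so the newid() and TOP ... WITH TIES variants are not shown; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, "a"), (1, 1, "b"), (1, 1, "b"), (2, 1, "c")],
)

# DISTINCT compares whole rows, so the pair (1, 1) still appears twice.
distinct_rows = conn.execute(
    "SELECT DISTINCT col1, col2, col3 FROM t ORDER BY col1, col2, col3"
).fetchall()

# GROUP BY keeps one row per (col1, col2); MIN picks a representative col3.
grouped_rows = conn.execute(
    "SELECT col1, col2, MIN(col3) FROM t GROUP BY col1, col2 ORDER BY col1, col2"
).fetchall()

print(distinct_rows)
print(grouped_rows)
```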

Hive / SQL query for top n values per key

I want the top 2 values per key. The result would look like:
What should the Hive query be?
You can use a window function with an OVER() clause:
select col1,col2 from (SELECT col1,
col2,
ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 DESC) AS row_num
FROM data)f
WHERE f.row_num < 3
order by col1,col2
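Here is a runnable sketch of the same top-2-per-key query. It is run against an in-memory SQLite database from Python instead of Hive (an assumption, since the window function syntax is the same for this case); the table name data matches the answer, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [("k1", 10), ("k1", 30), ("k1", 20), ("k2", 5), ("k2", 7), ("k2", 1)],
)

top2 = conn.execute(
    """
    SELECT col1, col2
    FROM (SELECT col1, col2,
                 ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 DESC) AS row_num
          FROM data) AS f
    WHERE f.row_num < 3          -- keep the two largest col2 values per key
    ORDER BY col1, col2
    """
).fetchall()

print(top2)
```

Changing the row_num < 3 filter to row_num <= n gives the top n per key.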

How to replace a DISTINCT ON with GROUP BY in PostgreSQL 9?

I have been using the DISTINCT ON predicate and have decided to replace it with GROUP BY, mainly because it "is not part of the SQL standard and is sometimes considered bad style because of the potentially indeterminate nature of its results".
I am using DISTINCT ON in conjunction with ORDER BY in order to select the latest records in a history table, but it's not clear to me how to do the same with the GROUP BY.
What could be a general approach in order to move from one construct to the other one?
An example could be
SELECT
DISTINCT ON (f1, f2 ) *
FROM table
ORDER BY f1, f2, datefield DESC;
where I get the "latest" pairs of (f1,f2).
If you have a query like this:
select distinct on (col1) t.*
from table t
order by col1, col2
Then you would replace this with window functions, not a group by:
select t.*
from (select t.*,
row_number() over (partition by col1 order by col2) as seqnum
from table t
) t
where seqnum = 1;
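Applied to the question's "latest row per (f1, f2)" case, the window-function rewrite might look like the sketch below. It runs on an in-memory SQLite database from Python, which has no DISTINCT ON, so only the ROW_NUMBER() form is shown; the table name history, the note column, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (f1 TEXT, f2 TEXT, datefield TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?, ?)",
    [
        ("a", "x", "2020-01-01", "old"),
        ("a", "x", "2020-03-01", "new"),
        ("b", "y", "2019-05-05", "only"),
    ],
)

latest = conn.execute(
    """
    SELECT f1, f2, datefield, note
    FROM (SELECT h.*,
                 ROW_NUMBER() OVER (PARTITION BY f1, f2
                                    ORDER BY datefield DESC) AS seqnum
          FROM history h) AS ranked
    WHERE seqnum = 1             -- newest row in each (f1, f2) group
    ORDER BY f1, f2
    """
).fetchall()

print(latest)
```

The PARTITION BY columns play the role of the DISTINCT ON list, and the window's ORDER BY carries the remaining sort keys.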

Alternative for count distinct

I want an alternative way to write the following query
SELECT COUNT(DISTINCT col1) FROM table.
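I don't want to use DISTINCT. Is there an alternative way?
Try a GROUP BY in a subquery and COUNT() in the outer query. It gives the same result as long as Col1 contains no NULLs; COUNT(*) counts a NULL group as one extra row, which COUNT(DISTINCT col1) ignores.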
SELECT COUNT(*)
FROM
(
SELECT Col1
FROM Table
GROUP BY Col1
) tbl
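Note that the GROUP BY alone, without the outer query, is not enough: Select count(col1) from table GROUP BY col1 returns one row per distinct value (each holding that value's row count), not the number of distinct values.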
Try this (the derived table needs an alias in SQL Server):
SELECT COUNT(Col1)
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col1) AS RNO, Col1
FROM Table_Name) t
WHERE RNO = 1
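A quick way to compare these alternatives is to run them side by side on data that includes a NULL. The sketch below does that from Python with an in-memory SQLite database (an assumption; the original question does not name a database), using a made-up table called items.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (col1 TEXT)")
conn.executemany("INSERT INTO items VALUES (?)", [("a",), ("a",), ("b",), (None,)])

# Baseline: COUNT(DISTINCT ...) skips NULLs.
count_distinct = conn.execute(
    "SELECT COUNT(DISTINCT col1) FROM items"
).fetchone()[0]

# GROUP BY subquery with COUNT(*): the NULL group is counted as a row.
group_count_star = conn.execute(
    "SELECT COUNT(*) FROM (SELECT col1 FROM items GROUP BY col1) AS g"
).fetchone()[0]

# GROUP BY subquery with COUNT(col1): NULL is skipped, matching the baseline.
group_count_col = conn.execute(
    "SELECT COUNT(col1) FROM (SELECT col1 FROM items GROUP BY col1) AS g"
).fetchone()[0]

# ROW_NUMBER() variant: one row per value, then COUNT(col1) skips the NULL row.
rownum_count = conn.execute(
    """
    SELECT COUNT(col1)
    FROM (SELECT ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col1) AS rno, col1
          FROM items) AS r
    WHERE rno = 1
    """
).fetchone()[0]

print(count_distinct, group_count_star, group_count_col, rownum_count)
```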

Multiple rows match, but I only want one?

Sometimes I wish to perform a join whereby I take the largest value of one column. To do this I have to use max() and a GROUP BY, which prevents me from retrieving the other columns from the row that held the max (because they are not contained in the GROUP BY or an aggregate function).
To fix this, I join the max value back on the original data source, to get the other columns. However, my problem is that this sometimes returns more than one row.
So, so far I have something like:
SELECT * FROM
(SELECT Col1, Max(Col2) FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2 FROM Table) tab2
ON tab1.Col2 = tab2.Col2
If the above query now returns three rows (which match the largest value for column2) I have a bit of a headache.
If there were an extra column, Col3, and of the rows returned by the above query I only wanted the one with, say, the minimum Col3 value, how would I do this?
If you are using SQL Server 2005 or later, you can do it like this:
CTE way
;WITH CTE
AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
)
SELECT
*
FROM
CTE
WHERE
CTE.RowNbr=1
Subquery way
SELECT
*
FROM
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
) AS T
WHERE
T.RowNbr=1
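As I understand it, it could be something like this (the max needs an alias, the join should also match on Col1, and the min(Col3) has to be taken within the matching group rather than over the whole table):
SELECT tab2.* FROM
(SELECT Col1, Max(Col2) AS Col2 FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2, Col3 FROM Table) tab2
ON tab1.Col1 = tab2.Col1 and tab1.Col2 = tab2.Col2
and tab2.Col3 = (select min(t.Col3) from Table t where t.Col1 = tab2.Col1 and t.Col2 = tab2.Col2)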
Assuming you are using SQL Server 2005 or later, you can make use of window functions here. I have chosen ROW_NUMBER(), but it is not the only option.
;WITH T AS
( SELECT *,
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) [RowNumber]
FROM Table
)
SELECT *
FROM T
WHERE RowNumber = 1
The PARTITION BY within the OVER clause is equivalent to the GROUP BY in your subquery, and the ORDER BY determines the order in which the rows are numbered. In this case Col2 DESC starts with the highest value of Col2 (equivalent to your MAX). To break ties by the smallest Col3, as the question asks, add Col3 ASC as a second key in the ORDER BY.
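For the tie-breaking part of the question (largest Col2, then smallest Col3), a sketch along these lines should work. It is run from Python against an in-memory SQLite database instead of SQL Server (an assumption made so the example is runnable); the table name src and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (Col1 TEXT, Col2 INTEGER, Col3 INTEGER)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?)",
    [("a", 5, 3), ("a", 5, 1), ("a", 2, 0), ("b", 7, 9), ("b", 7, 4)],
)

one_per_group = conn.execute(
    """
    WITH ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY Col1
                                  ORDER BY Col2 DESC,   -- largest Col2 first
                                           Col3 ASC     -- ties: smallest Col3
                                 ) AS RowNumber
        FROM src
    )
    SELECT Col1, Col2, Col3
    FROM ranked
    WHERE RowNumber = 1
    ORDER BY Col1
    """
).fetchall()

print(one_per_group)
```

Because ROW_NUMBER() assigns each row in a partition a unique number, this returns exactly one row per Col1 even when several rows share the maximum Col2.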