Multiple rows match, but I only want one? - sql

Sometimes I want to perform a join in which I take the largest value of one column. To do this I have to use MAX() and GROUP BY, which prevents me from retrieving the other columns of the row that held the maximum (because they are not contained in the GROUP BY or in an aggregate function).
To fix this, I join the max value back onto the original data source to get the other columns. However, my problem is that this sometimes returns more than one row.
So far I have something like:
SELECT * FROM
(SELECT Col1, MAX(Col2) AS Col2 FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2 FROM Table) tab2
ON tab1.Col1 = tab2.Col1 AND tab1.Col2 = tab2.Col2
If the above query returns three rows for a group (all of which match the largest value of Col2), I have a bit of a headache.
If there were an extra column, Col3, and of the rows returned by the above query I only wanted the one with, say, the minimum Col3 value, how would I do this?

If you are using SQL Server 2005 or later, you can do it like this:
CTE way
;WITH CTE
AS
(
    SELECT
        ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
        table.*
    FROM
        table
)
SELECT
    *
FROM
    CTE
WHERE
    CTE.RowNbr=1
Subquery way
SELECT
    *
FROM
(
    SELECT
        ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
        table.*
    FROM
        table
) AS T
WHERE
    T.RowNbr=1

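As I understand it, it could be something like this:
SELECT * FROM
(SELECT Col1, MAX(Col2) AS Col2 FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2, Col3 FROM Table) tab2
ON tab1.Col1 = tab2.Col1 AND tab1.Col2 = tab2.Col2
AND tab2.Col3 = (SELECT MIN(t.Col3) FROM Table t WHERE t.Col1 = tab2.Col1 AND t.Col2 = tab2.Col2)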

Assuming you are using SQL Server 2005 or later, you can make use of window functions here. I have chosen ROW_NUMBER(), but it is not the only option.
;WITH T AS
(   SELECT  *,
            ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) [RowNumber]
    FROM    Table
)
SELECT  *
FROM    T
WHERE   RowNumber = 1
The PARTITION BY within the OVER clause is equivalent to the GROUP BY in your subquery, and the ORDER BY determines the order in which the rows are numbered. In this case, Col2 DESC starts the numbering with the highest value of Col2 (equivalent to your MAX).
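The ROW_NUMBER() queries above pick one row per Col1, but which of the tied rows wins is left open. A minimal sketch of adding Col3 as a tie-breaker follows. It runs the SQL through Python's built-in sqlite3 module rather than SQL Server (this assumes SQLite 3.25 or later for window functions), and the table name t and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: two rows of group 'a' tie on the largest Col2.
rows = [
    ("a", 10, 7),
    ("a", 10, 3),
    ("a", 4, 1),
    ("b", 5, 9),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Col1 TEXT, Col2 INTEGER, Col3 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# Number the rows within each Col1 group: largest Col2 first,
# and among rows tied on Col2, smallest Col3 first.
query = """
WITH numbered AS (
    SELECT Col1, Col2, Col3,
           ROW_NUMBER() OVER (
               PARTITION BY Col1
               ORDER BY Col2 DESC, Col3 ASC
           ) AS RowNumber
    FROM t
)
SELECT Col1, Col2, Col3
FROM numbered
WHERE RowNumber = 1
ORDER BY Col1
"""
top_rows = conn.execute(query).fetchall()
print(top_rows)
```

The same ORDER BY Col2 DESC, Col3 ASC should carry over to the SQL Server versions unchanged, since only the OVER clause differs.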

Related

Need to sort first column by values in a dataset and then find average

There are values in my data set, in 3 columns.
Column 1 has the values 1,1,3,4,5,5,6,7,7,7,7. I need to sort by this column and then apply an average.
1,1 means two rows with index 1. I need to average the values in the remaining columns, i.e. column 2 and column 3, for those rows,
and similarly for the rows with 5,5 and so on. I am able to sort but cannot manage the averaging.
ROW_NUMBER() should do the sorting for you, and (col1+col2+col3)/3 should compute the average. For nullable columns you will need to make some changes to the code.
SELECT t1.rownumber, (t1.col1 + t2.col2 + t3.col3)/3 as "AVG"
FROM (SELECT ROW_NUMBER() OVER(ORDER BY col1 DESC) AS rownumber, col1 FROM MyTable) t1
INNER JOIN (SELECT ROW_NUMBER() OVER(ORDER BY col2 ASC) AS rownumber, col2 FROM MyTable) as t2 on t1.rownumber = t2.rownumber
INNER JOIN (SELECT ROW_NUMBER() OVER(ORDER BY col3 ASC) AS rownumber, col3 FROM MyTable) as t3 on t1.rownumber = t3.rownumber
Your question sounds like a convoluted way of describing aggregation. Is this what you want?
select col1, avg(col2), avg(col3)
from t
group by col1;
If you want the average on each row, then use window functions:
select col1,
       avg(col2) over (partition by col1),
       avg(col3) over (partition by col1)
from t;
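To make the difference between the two forms concrete, here is a small sketch that runs both through Python's sqlite3 module (assuming SQLite 3.25 or later for the window version). The sample values are invented; only the table name t and the column names come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 REAL, col3 REAL)")
# Hypothetical rows: index 1 and index 5 each occur twice.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 2.0, 10.0), (1, 4.0, 20.0), (3, 5.0, 7.0), (5, 1.0, 1.0), (5, 3.0, 5.0)],
)

# GROUP BY collapses each col1 value into a single output row.
grouped = conn.execute("""
    SELECT col1, AVG(col2), AVG(col3)
    FROM t
    GROUP BY col1
    ORDER BY col1
""").fetchall()

# The window form keeps every input row and repeats the group average on it.
windowed = conn.execute("""
    SELECT col1,
           AVG(col2) OVER (PARTITION BY col1),
           AVG(col3) OVER (PARTITION BY col1)
    FROM t
    ORDER BY col1
""").fetchall()

print(grouped)
print(windowed)
```

With this data the grouped query yields three rows and the windowed query five, one per input row.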

Select group by with a max predicate

Quite often I have to do queries like below:
select col1, max(id)
from Table
where col2 = 'value'
  and col3 = ( select max(col3)
               from Table
               where col2 = 'value'
             )
group by col1
Are there any other ways to avoid subqueries and temp tables? Basically I need a group by on all the rows with a particular max value. Assuming all proper indices are used.
You can use an OLAP function to achieve this. I would say this solution is marginally better in that your predicates are not duplicated between the main query and subquery, so you don't violate DRY:
SELECT col1, max(id) AS max_id
FROM (
    -- rank all filtered rows by col3; rank 1 marks the rows holding the overall maximum
    select col1, id,
           RANK() OVER (ORDER BY col3 DESC) AS irow
    from Table
    where col2 = 'value'
) subquery
WHERE subquery.irow = 1
GROUP BY col1
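As a sanity check, the sketch below compares the subquery form from the question with a RANK()-based form in which the ranking runs over the whole filtered set (no PARTITION BY), so that rank 1 marks exactly the rows holding the overall maximum col3. It uses Python's sqlite3 module (SQLite 3.25 or later assumed); the table name tbl and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, col1 TEXT, col2 TEXT, col3 INTEGER)")
# Hypothetical rows; the largest col3 among col2 = 'value' rows is 9.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [
        (1, "x", "value", 5),
        (2, "x", "value", 9),
        (3, "x", "value", 9),
        (4, "y", "value", 9),
        (5, "y", "other", 12),
        (6, "z", "value", 2),
    ],
)

# Form from the question: the predicate is repeated inside the subquery.
with_subquery = conn.execute("""
    SELECT col1, MAX(id)
    FROM tbl
    WHERE col2 = 'value'
      AND col3 = (SELECT MAX(col3) FROM tbl WHERE col2 = 'value')
    GROUP BY col1
    ORDER BY col1
""").fetchall()

# RANK() form: filter once, rank by col3, keep rank 1, then group.
with_rank = conn.execute("""
    SELECT col1, MAX(id)
    FROM (
        SELECT col1, id,
               RANK() OVER (ORDER BY col3 DESC) AS irow
        FROM tbl
        WHERE col2 = 'value'
    ) AS ranked
    WHERE irow = 1
    GROUP BY col1
    ORDER BY col1
""").fetchall()

print(with_subquery)
print(with_rank)
```

RANK() rather than ROW_NUMBER() matters here: tied rows all receive rank 1, which matches the equality test against MAX(col3).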

Hive / SQL query for top n values per key

I want the top 2 values per key. The result would look like:
What should the Hive query be?
You can use a window function with an OVER() clause:
select col1, col2
from (SELECT col1,
             col2,
             ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 DESC) AS row_num
      FROM data) f
WHERE f.row_num < 3
order by col1, col2
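The question's sample data did not survive, so the sketch below uses invented rows to show what the query returns. It runs the same SQL through Python's sqlite3 module instead of Hive (SQLite 3.25 or later assumed); the table name data matches the answer, the values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (col1 TEXT, col2 INTEGER)")
# Hypothetical rows: k1 has three values, k2 two, k3 one.
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [("k1", 5), ("k1", 9), ("k1", 7), ("k2", 1), ("k2", 4), ("k3", 8)],
)

# Number rows per key from the largest col2 down and keep the first two.
top2 = conn.execute("""
    SELECT col1, col2
    FROM (
        SELECT col1, col2,
               ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 DESC) AS row_num
        FROM data
    ) AS f
    WHERE f.row_num < 3
    ORDER BY col1, col2
""").fetchall()

print(top2)
```

Keys with fewer than two rows, such as k3 here, simply contribute what they have.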

How to intersect two tables without losing the duplicate values oracle

How to intersect two tables without losing the duplicate values in Oracle?
TAB1:
A
A
B
C
TAB2:
A
A
B
D
Output:
A
A
B
A subquery will filter the rows:
select *
from tab1
where col in (select col from tab2)
If I understand correctly:
select a.*, row_number() over (partition by col1 order by col1)
from a
intersect
select b.*, row_number() over (partition by col1 order by col1)
from b;
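This adds a sequential number to each row within its group of equal values. INTERSECT then keeps a duplicate only as many times as it occurs in both tables, because the row numbers match only up to the smaller count.
This uses partition by col1; the choice of col1 is arbitrary. You may need to include all columns in the partition by.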
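A small sketch of the row-number trick, run through Python's sqlite3 module rather than Oracle (SQLite 3.25 or later assumed). The table names tab1 and tab2 and the first data set come from the question; the helper multiset_intersect, the column name col, and the second data set are made up for illustration.

```python
import sqlite3

QUERY = """
SELECT m.col
FROM (
    SELECT col, ROW_NUMBER() OVER (PARTITION BY col ORDER BY col) AS seq FROM tab1
    INTERSECT
    SELECT col, ROW_NUMBER() OVER (PARTITION BY col ORDER BY col) AS seq FROM tab2
) AS m
ORDER BY m.col
"""

def multiset_intersect(left, right):
    """Return the values common to both lists, keeping shared duplicates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab1 (col TEXT)")
    conn.execute("CREATE TABLE tab2 (col TEXT)")
    conn.executemany("INSERT INTO tab1 VALUES (?)", [(v,) for v in left])
    conn.executemany("INSERT INTO tab2 VALUES (?)", [(v,) for v in right])
    result = [row[0] for row in conn.execute(QUERY).fetchall()]
    conn.close()
    return result

# Data from the question.
from_question = multiset_intersect(["A", "A", "B", "C"], ["A", "A", "B", "D"])
# Hypothetical case where the duplicate counts differ between the tables.
uneven = multiset_intersect(["A", "A", "A"], ["A"])

print(from_question)
print(uneven)
```

The second call is where this differs from the IN subquery shown earlier: IN would return all three A rows from tab1, whereas the numbered INTERSECT returns A once.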

Add Identity column to a view in SQL Server 2008

This is my view:
Create View [MyView] as
(
    Select col1, col2, col3 From Table1
    Union All
    Select col1, col2, col3 From Table2
)
I need to add a new column named Id, and this column must be unique, so I am thinking of adding it as an identity column. I should mention that this view returns a large amount of data, so I need a way that performs well. Also, I use two select queries with UNION ALL, which I think might complicate things. What is your suggestion?
Use the ROW_NUMBER() function in SQL Server 2008.
Create View [MyView] as
SELECT ROW_NUMBER() OVER( ORDER BY col1 ) AS id, col1, col2, col3
FROM(
Select col1, col2, col3 From Table1
Union All
Select col1, col2, col3 From Table2 ) AS MyResults
GO
The view is just a stored query that does not contain the data itself, so you cannot add a truly stable ID. If you need an id for other purposes, such as paging, you can do something like this:
create view MyView as
(
    select row_number() over ( order by col1) as ID, col1 from (
        Select col1 From Table1
        Union All
        Select col1 From Table2
    ) a
)
There is no guarantee that the rows returned by a query using ROW_NUMBER() will be ordered exactly the same with each execution unless the following conditions are true:
Values of the partitioned column are unique. [Partitions are parent-child, like a boss who has 3 employees; this can be ignored here.]
Values of the ORDER BY columns are unique. [If column 1 is unique, row_number should be stable.]
Combinations of values of the partition column and ORDER BY columns are unique. [If you need 10 columns in your ORDER BY to make it unique, go for it to make row_number stable.]
There is a secondary issue here, because this is a view. SQL Server only allows ORDER BY in a view definition together with TOP, and even then it does not guarantee the order of the rows when the view is queried. Ignoring the row_number() for a second:
create view MyView as
(
    select top 10000000 col1 -- or: top 99.9999999 percent
    from (
        Select col1 From Table1
        Union All
        Select col1 From Table2
    ) a order by col1
)
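The uniqueness condition quoted above can be sketched as follows: if the ORDER BY inside ROW_NUMBER() covers a combination of columns that is unique across the combined rows, the generated ids come out the same on every run. The example uses Python's sqlite3 module instead of SQL Server (SQLite 3.25 or later assumed), and the sample rows are invented. It assumes that (col1, col2, col3) is unique across both tables, which may not hold for real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (col1 INTEGER, col2 TEXT, col3 TEXT)")
conn.execute("CREATE TABLE Table2 (col1 INTEGER, col2 TEXT, col3 TEXT)")
# Hypothetical rows; col1 alone is not unique, but (col1, col2, col3) is.
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", [(1, "a", "x"), (3, "c", "z")])
conn.executemany("INSERT INTO Table2 VALUES (?, ?, ?)", [(2, "b", "y"), (1, "b", "w")])

# Ordering by the full unique combination leaves no ties for ROW_NUMBER() to break.
conn.execute("""
    CREATE VIEW MyView AS
    SELECT ROW_NUMBER() OVER (ORDER BY col1, col2, col3) AS id,
           col1, col2, col3
    FROM (
        SELECT col1, col2, col3 FROM Table1
        UNION ALL
        SELECT col1, col2, col3 FROM Table2
    ) AS MyResults
""")

select_all = "SELECT id, col1, col2, col3 FROM MyView ORDER BY id"
first_run = conn.execute(select_all).fetchall()
second_run = conn.execute(select_all).fetchall()

print(first_run)
```

The ids stay stable only while the underlying rows do not change; inserting a row that sorts earlier shifts every id after it.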
Using "row_number() over ( order by col1) as ID" can be very expensive, since the combined rows have to be sorted.
This way is cheaper, although newid() generates a new value every time the view is queried, so the IDs are unique but not stable:
Create View [MyView] as
(
    Select ID = isnull(cast(newid() as varchar(40)), '')
         , col1
         , col2
         , col3
    From Table1
    Union All
    Select ID = isnull(cast(newid() as varchar(40)), '')
         , col1
         , col2
         , col3
    From Table2
)
Use ROW_NUMBER() with "order by (select null)". This will be less expensive and will still give you the result.
Create View [MyView] as
SELECT ROW_NUMBER() over (order by (select null)) as id, *
FROM(
Select col1, col2, col3 From Table1
Union All
Select col1, col2, col3 From Table2 ) R
GO