I have two queries:
SELECT col1,col2,col3
from scnd1
where col2<>''
group by col1,col2,col3
order by col1
SELECT * FROM
(SELECT ROW_NUMBER() OVER (PARTITION BY COL1 ORDER BY COL1) AS RN FROM SCND1) AS A
WHERE RN > 1
I need a single query on the same table that combines these two,
i.e. first sort the table and remove the NULL (empty) values, then delete the repeated rows using the second query.
Try a subquery along these lines:
SELECT
main.col1,
main.col2,
main.col3
FROM scnd1 main
JOIN (SELECT
col1,
col2,
col3,
ROW_NUMBER() OVER (PARTITION BY(col1) ORDER BY col1) AS RN
FROM SCND1
WHERE col2<>''
) sub
ON sub.Col1 = main.Col1 AND sub.Col2 = main.col2 AND sub.Col3 = main.col3
WHERE RN > 1
GROUP BY main.col1,main.col2,main.col3
ORDER BY main.col1
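A minimal sketch of the same idea, run through Python's sqlite3 module against an in-memory table so it can be tried without a server. SQLite (3.25 or later, for window functions) is an assumption standing in for the asker's database, and the sample rows are invented. It drops rows with an empty col2, collapses exact duplicates, and numbers what is left per col1, so rn > 1 marks the repeats:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scnd1 (col1 INTEGER, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO scnd1 VALUES (?, ?, ?)",
    [(1, "a", "x"), (1, "a", "x"), (1, "b", "y"), (2, "", "z"), (2, "c", "w")],
)

query = """
SELECT col1, col2, col3, rn
FROM (
    SELECT col1, col2, col3,
           ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2, col3) AS rn
    FROM (
        -- filter and de-duplicate first, then number the survivors
        SELECT DISTINCT col1, col2, col3
        FROM scnd1
        WHERE col2 <> ''
    ) AS cleaned
) AS numbered
ORDER BY col1, rn
"""
rows = conn.execute(query).fetchall()

# rn = 1 is the first row of each col1 group; rn > 1 are the repeated keys.
first_per_key = [r[:3] for r in rows if r[3] == 1]
repeats = [r[:3] for r in rows if r[3] > 1]
```

Whether you keep rn = 1 or rn > 1 depends on whether you want the de-duplicated rows or the duplicates themselves; the question's RN > 1 returns the latter.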
Related
There is a simple table with two columns. Col1 is like an identifier.
I want to SUM the quantity of rows with the same Col1 value and UPDATE the current table accordingly.
Current table:
Col1 | Quantity
-----+----------
12 | 3
15 | 7
12 | 2
The updated table should SUM the quantities of the two rows with Col1 = 12 (e.g. ... WHERE Col1 = 12 ...) and merge them into one row:
Col1 | Quantity
-----+----------
12 | 5
15 | 7
How can this be done in a SQL Server query?
Please note that I need to update the table, not just select rows.
Use SUM() aggregation with GROUP BY:
with cte as
(
select col1,quantity,row_number() over(partition by col1 order by quantity) as rn
from tablename
)
update a set a.quantity=b.qty
from cte a
inner join
(select col1, sum(quantity) as qty
from tablename
group by col1
)b on a.col1=b.col1 where rn=1
;with cte as
(
select col1,quantity,row_number() over(partition by col1 order by quantity desc) as rn
from tablename
)
-- after the update the summed row sorts first (assuming non-negative quantities), so keep rn=1
delete from cte where rn>1
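The update-then-delete pattern above can be sketched end to end as follows. This is a hedged illustration, not the answer's exact code: it runs on SQLite through Python's sqlite3 module, and it relies on SQLite's implicit rowid to pick the surviving row (SQL Server has no rowid, so there you would use a key column or a ROW_NUMBER() CTE instead). The totals table name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (col1 INTEGER, quantity INTEGER)")
conn.executemany("INSERT INTO tablename VALUES (?, ?)", [(12, 3), (15, 7), (12, 2)])

# Step 1: snapshot the per-group total and the row that will survive.
conn.execute("""
    CREATE TEMP TABLE totals AS
    SELECT col1, SUM(quantity) AS qty, MIN(rowid) AS keep_id
    FROM tablename
    GROUP BY col1
""")

# Step 2: write the total into the surviving row of each group.
conn.execute("""
    UPDATE tablename
    SET quantity = (SELECT qty FROM totals WHERE totals.keep_id = tablename.rowid)
    WHERE rowid IN (SELECT keep_id FROM totals)
""")

# Step 3: delete every other row.
conn.execute("DELETE FROM tablename WHERE rowid NOT IN (SELECT keep_id FROM totals)")
conn.commit()

result = conn.execute("SELECT col1, quantity FROM tablename ORDER BY col1").fetchall()
```

Taking the snapshot first means the delete step never depends on values the update has just changed, which is the pitfall of numbering rows by quantity twice.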
You can use the MERGE statement for this.
As with the UPDATE statement, it can be combined with a common table expression (CTE),
and window functions can be used inside the CTE.
;WITH CTE AS
(
SELECT *,
ROW_NUMBER() OVER (partition by Col1 order by Quantity) as rn,
SUM(Quantity) OVER (partition by Col1) as TotalQuantity,
COUNT(*) OVER (partition by Col1) as Cnt
FROM TestTable
)
MERGE CTE t
USING (SELECT * FROM CTE WHERE rn = 1) src
ON (src.Col1 = t.Col1 AND src.rn = t.rn)
WHEN NOT MATCHED BY SOURCE
THEN DELETE
WHEN MATCHED
THEN UPDATE SET t.Quantity = src.TotalQuantity;
Tested on rextester.
Such a statement would, however, update every record in the table each time you re-run it, even after the duplicates have already been deleted.
But with a few tweaks it becomes a MERGE query that leaves rows without duplicates untouched.
;WITH CTE AS
(
SELECT *
FROM
(
SELECT Col1, Quantity,
ROW_NUMBER() OVER (partition by Col1 order by Quantity DESC) as rn,
COUNT(*) OVER (partition by Col1) as cnt,
SUM(Quantity) OVER (partition by Col1) as TotalQuantity
FROM TestTable
) q
WHERE cnt > 1
)
MERGE CTE t
USING (SELECT * FROM CTE WHERE rn = 1) src
ON (src.Col1 = t.Col1 AND src.rn = t.rn)
WHEN NOT MATCHED BY SOURCE
THEN DELETE
WHEN MATCHED
THEN UPDATE SET t.Quantity = src.TotalQuantity;
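To illustrate why restricting the work to groups with cnt > 1 makes the operation safe to re-run, here is a rough sketch in Python with sqlite3. SQLite has no MERGE statement, so this swaps in a plain UPDATE plus DELETE per duplicated group; the merge_duplicates helper and the use of rowid are assumptions made for the example, not part of the answer.

```python
import sqlite3

def merge_duplicates(conn):
    """Collapse duplicated Col1 groups; return how many rows were updated or deleted."""
    # Only groups with more than one row are candidates, mirroring "cnt > 1".
    dup_groups = conn.execute("""
        SELECT Col1, SUM(Quantity), MIN(rowid)
        FROM TestTable
        GROUP BY Col1
        HAVING COUNT(*) > 1
    """).fetchall()
    changed = 0
    for col1, total, keep_id in dup_groups:
        # Put the group total on the surviving row.
        changed += conn.execute(
            "UPDATE TestTable SET Quantity = ? WHERE rowid = ?", (total, keep_id)
        ).rowcount
        # Remove the other rows of the group.
        changed += conn.execute(
            "DELETE FROM TestTable WHERE Col1 = ? AND rowid <> ?", (col1, keep_id)
        ).rowcount
    return changed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestTable (Col1 INTEGER, Quantity INTEGER)")
conn.executemany("INSERT INTO TestTable VALUES (?, ?)", [(12, 3), (15, 7), (12, 2)])

first_run = merge_duplicates(conn)   # touches only the Col1 = 12 group
second_run = merge_duplicates(conn)  # no duplicated groups remain
final_rows = conn.execute(
    "SELECT Col1, Quantity FROM TestTable ORDER BY Col1"
).fetchall()
```

The second call changes nothing because no group passes the HAVING filter any more, which is the same effect the WHERE cnt > 1 filter has in the MERGE version.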
My data set has 3 columns.
Column 1 has the values 1,1,3,4,5,5,6,7,7,7,7. I need to sort by this column and then apply an average.
1,1 means two rows with index 1. I need to average the values in the remaining columns, i.e. column 2 and column 3, for those rows.
Similarly for the rows with 5,5 and so on. I am able to sort but cannot work out the averaging.
ROW_NUMBER() should do the sorting for you, and (col1 + col2 + col3)/3 gives the average. For nullable columns you will need to adjust the code.
SELECT t1.rownumber, (t1.col1 + t2.col2 + t3.col3)/3 as "AVG"
FROM (SELECT ROW_NUMBER() OVER(ORDER BY col1 DESC) AS rownumber, col1 FROM MyTable) t1
INNER JOIN (SELECT ROW_NUMBER() OVER(ORDER BY col2 ASC) AS rownumber, col2 FROM MyTable) as t2 on t1.rownumber = t2.rownumber
INNER JOIN (SELECT ROW_NUMBER() OVER(ORDER BY col3 ASC) AS rownumber, col3 FROM MyTable) as t3 on t1.rownumber = t3.rownumber
Your question sounds like a convoluted way of describing aggregation. Is this what you want?
select col1, avg(col2), avg(col3)
from t
group by col1;
If you want the average on each row, then use window functions:
select col1,
avg(col2) over (partition by col1),
avg(col3) over (partition by col1)
from t;
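The difference between the two forms is easiest to see side by side. The sketch below uses Python's sqlite3 module with invented sample rows; SQLite 3.25 or later is assumed for the window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 REAL, col3 REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(5, 10.0, 1.0), (1, 2.0, 4.0), (5, 20.0, 3.0), (1, 4.0, 8.0), (3, 6.0, 6.0)],
)

# GROUP BY: one output row per distinct col1 value.
grouped = conn.execute("""
    SELECT col1, AVG(col2), AVG(col3)
    FROM t
    GROUP BY col1
    ORDER BY col1
""").fetchall()

# Window functions: every input row is kept and carries its group's averages.
windowed = conn.execute("""
    SELECT col1, col2, col3,
           AVG(col2) OVER (PARTITION BY col1) AS avg2,
           AVG(col3) OVER (PARTITION BY col1) AS avg3
    FROM t
    ORDER BY col1, col2
""").fetchall()
```

Here grouped has three rows (one per key) while windowed still has all five, with the averages repeated on each row of a group.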
I want the top 2 values per key. The result would look like:
What should the Hive query be?
You can use a window function with an OVER() clause:
select col1,col2 from (SELECT col1,
col2,
ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 DESC) AS row_num
FROM data)f
WHERE f.row_num < 3
order by col1,col2
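A small runnable sketch of the same top-N-per-key query, using Python's sqlite3 module in place of Hive (an assumption made so the example is self-contained; the window-function syntax is the same here). The sample rows and the top_n_per_key helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [("a", 5), ("a", 9), ("a", 1), ("b", 4), ("b", 7), ("c", 3)],
)

def top_n_per_key(conn, n):
    """Return the n largest col2 values for each col1."""
    return conn.execute("""
        SELECT col1, col2
        FROM (
            SELECT col1, col2,
                   ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 DESC) AS row_num
            FROM data
        ) AS f
        WHERE f.row_num <= ?
        ORDER BY col1, col2 DESC
    """, (n,)).fetchall()

top2 = top_n_per_key(conn, 2)
```

ROW_NUMBER() breaks ties arbitrarily; if tied values should all be returned, RANK() or DENSE_RANK() may be a better fit.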
I had an issue with writing a query that would gather groups in a column, and then select one of them by a number.
A good person (@sstan) gave me this:
select your_col
from (select your_col,
row_number() over (order by your_col) as rn
from your_table
group by your_col)
where rn = 2
And it works. However, it appears that my query needs to consider other columns. For now, it looks like this:
select MAINCOL, sum(some_col+other_col) as together_col, count(another_col)
from my_table
where date_col >= next_day(trunc(sysdate), 'MONDAY') - 14
and date_col < next_day(trunc(sysdate), 'MONDAY') - 7
group by MAINCOL, other_col, together_col
order by MAINCOL
So the challenge is to extend the upper query with what is below. I couldn't make it work, although it seems simple.
You could try it with an alias on the inner table:
SELECT t.your_col, t.your_col2, t.your_col3
FROM (select your_col, your_col2, your_col3,
row_number() over (order by your_col) as rn
from your_table
group by your_col, your_col2, your_col3) t
where t.rn = 2
Got it!
With help of Stack, of course.
select t.*
from (select MAINCOL, col1, col2, col3, col4, DENSE_RANK()OVER(ORDER BY MAINCOL) GROUPID
from tab_1
group by MAINCOL, col1, col2, col3, col4
) t
where GROUPID = 1;
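The reason DENSE_RANK() works here is that every row sharing the same MAINCOL value gets the same rank, so the rank acts as a group number. A minimal sketch, assuming SQLite 3.25+ via Python's sqlite3 in place of Oracle, with invented rows and a hypothetical rows_of_group helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab_1 (maincol TEXT, col1 INTEGER)")
conn.executemany(
    "INSERT INTO tab_1 VALUES (?, ?)",
    [("beta", 1), ("alpha", 2), ("beta", 3), ("gamma", 4), ("alpha", 5)],
)

def rows_of_group(conn, group_id):
    """Return all rows belonging to the group_id-th distinct maincol value."""
    return conn.execute("""
        SELECT maincol, col1
        FROM (
            SELECT maincol, col1,
                   DENSE_RANK() OVER (ORDER BY maincol) AS groupid
            FROM tab_1
        ) AS t
        WHERE groupid = ?
        ORDER BY col1
    """, (group_id,)).fetchall()

second_group = rows_of_group(conn, 2)
```

With ROW_NUMBER() each row would get its own number, so rn = 2 would return a single row rather than a whole group.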
Sometimes I wish to perform a join whereby I take the largest value of one column. To do this I have to use MAX() with a GROUP BY, which prevents me from retrieving the other columns from the row that held the maximum (because they are not contained in the GROUP BY or an aggregate function).
To fix this, I join the max value back on the original data source, to get the other columns. However, my problem is that this sometimes returns more than one row.
So, so far I have something like:
SELECT * FROM
(SELECT Col1, Max(Col2) AS Col2 FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2 FROM Table) tab2
ON tab1.Col1 = tab2.Col1 AND tab1.Col2 = tab2.Col2
If the above query returns three rows (all matching the largest value for Col2), I have a bit of a headache.
If there were an extra column, Col3, and from the rows returned by the above query I only wanted the one with, say, the minimum Col3 value, how would I do this?
If you are using SQL Server 2005+, you can do it like this:
CTE way
;WITH CTE
AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
)
SELECT
*
FROM
CTE
WHERE
CTE.RowNbr=1
Subquery way
SELECT
*
FROM
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
) AS T
WHERE
T.RowNbr=1
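As I understand it, it could be something like this:
SELECT * FROM
(SELECT Col1, Max(Col2) AS Col2 FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2, Col3 FROM Table) tab2
ON tab1.Col1 = tab2.Col1 AND tab1.Col2 = tab2.Col2
AND tab2.Col3 = (SELECT MIN(t.Col3) FROM Table t WHERE t.Col1 = tab1.Col1 AND t.Col2 = tab1.Col2)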
Assuming you are using SQL Server 2005 or later, you can make use of window functions here. I have chosen ROW_NUMBER(), but it is not the only option.
;WITH T AS
( SELECT *,
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) [RowNumber]
FROM Table
)
SELECT *
FROM T
WHERE RowNumber = 1
The PARTITION BY within the OVER clause is equivalent to the GROUP BY in your subquery, and the ORDER BY determines the order in which the rows are numbered. In this case, Col2 DESC starts with the highest value of Col2 (equivalent to your MAX).
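To also cover the question's tie-break (prefer the minimum Col3 when several rows share the maximum Col2), a second sort key can be added to the ORDER BY. This is a sketch under assumptions: it runs on SQLite 3.25+ through Python's sqlite3 rather than SQL Server, the table is named tbl because Table is a reserved word, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (Col1 TEXT, Col2 INTEGER, Col3 INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [("x", 10, 7), ("x", 10, 2), ("x", 4, 1), ("y", 8, 5), ("y", 8, 9)],
)

best = conn.execute("""
    WITH T AS (
        SELECT Col1, Col2, Col3,
               ROW_NUMBER() OVER (
                   PARTITION BY Col1
                   ORDER BY Col2 DESC, Col3 ASC  -- highest Col2, then lowest Col3
               ) AS RowNumber
        FROM tbl
    )
    SELECT Col1, Col2, Col3
    FROM T
    WHERE RowNumber = 1
    ORDER BY Col1
""").fetchall()
```

Because ROW_NUMBER() assigns exactly one row the number 1 in each partition, the query returns a single row per Col1 even when the maximum Col2 is tied.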