If two rows have same id but different col2, how can you keep only the ones that have max col3? - sql

I have a table with four columns (id, col2, col3, col4), where col2 is A or B and col3 and col4 are integers. My problem is that there are many rows that have the same id but a different col2 value, and I want to select ONLY the rows that have the maximum value in col3.
For instance, if we have:
id | col2 | col3 | col4
1 | A | 3 | 2
1 | B | 5 | 3
2 | A | 6 | 2
...
I want to keep only the tuple (1, B, 5, 3). How can I achieve this?
I've tried this:
SELECT id, col2, MAX(col3), col4 FROM t GROUP BY id;
but I get an error saying that this is not a valid GROUP BY statement.

You can use keep:
SELECT id,
MAX(col2) KEEP (DENSE_RANK FIRST ORDER BY col3 DESC) as col2,
MAX(col3),
MAX(col4) KEEP (DENSE_RANK FIRST ORDER BY col3 DESC) as col4
FROM t
GROUP BY id;
Or:
SELECT id, col2, col3, col4
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY col3 DESC) as seqnum
FROM t
) t
WHERE seqnum = 1;
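If you don't have an Oracle instance to hand, the ROW_NUMBER() variant can be tried out with a small sketch like the one below. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later (needed for window functions) and the three sample rows from the question; the alias ranked is only illustrative.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, col2 TEXT, col3 INTEGER, col4 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, "A", 3, 2), (1, "B", 5, 3), (2, "A", 6, 2)],
)

# Number the rows of each id by descending col3 and keep the first one.
top_rows = conn.execute(
    """
    SELECT id, col2, col3, col4
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY col3 DESC) AS seqnum
          FROM t) AS ranked
    WHERE seqnum = 1
    ORDER BY id
    """
).fetchall()

print(top_rows)  # [(1, 'B', 5, 3), (2, 'A', 6, 2)]
```

Note that this keeps one row per id, including ids that only have a single col2 value (id 2 here).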

This query:
select t.*
from tablename t inner join (
select id, max(col3) col3
from tablename
group by id
having count(distinct col2) > 1
) g on g.id = t.id and g.col3 = t.col3
returns for each id that has different values in col2 only 1 row: the one containing the maximum value of col3.
If you also want the other rows where each id does not have different values in col2, then use UNION ALL:
select t.*
from tablename t inner join (
select id, max(col3) col3
from tablename
group by id
having count(distinct col2) > 1
) g on g.id = t.id and g.col3 = t.col3
union all
select t.* from tablename t
where not exists (
select 1 from tablename
where id = t.id and col2 <> t.col2
)
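To see the difference between the two queries above, here is a rough sketch that runs both against the question's sample rows. It assumes Python's built-in sqlite3 module rather than Oracle; the alias x and the Python variable names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (id INTEGER, col2 TEXT, col3 INTEGER, col4 INTEGER)"
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [(1, "A", 3, 2), (1, "B", 5, 3), (2, "A", 6, 2)],
)

# Max-col3 row for ids that have more than one distinct col2 value.
max_for_multi = """
    SELECT t.*
    FROM tablename t
    INNER JOIN (SELECT id, MAX(col3) AS col3
                FROM tablename
                GROUP BY id
                HAVING COUNT(DISTINCT col2) > 1) g
      ON g.id = t.id AND g.col3 = t.col3
"""

# Rows of ids for which no row with a different col2 value exists.
single_col2 = """
    SELECT t.*
    FROM tablename t
    WHERE NOT EXISTS (SELECT 1 FROM tablename x
                      WHERE x.id = t.id AND x.col2 <> t.col2)
"""

multi_only = conn.execute(max_for_multi).fetchall()
with_singles = sorted(
    conn.execute(max_for_multi + " UNION ALL " + single_col2).fetchall()
)

print(multi_only)    # [(1, 'B', 5, 3)]
print(with_singles)  # [(1, 'B', 5, 3), (2, 'A', 6, 2)]
```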

select * from TableName where col3 = (select max(col3) from TableName)

Related

Compare main table with all records from another table to derive the column value of the main table

I have two tables, tb1 and main_tbl, with the sample data shown below, and I'm trying to derive the value of the column COL_VAL for the main table. I have written a query that returns the expected value, but I'm looking for a way to reduce the number of lines of code while getting the same result.
main_tbl Table:
col1 col2 col3 COL_VAL
123 Hi 568 ??
tb1 Table:
col1 col2 col3 col4 col5
123 LN Y IP 2021-02-01
123 LN N NON-IP 2021-02-01
123 MOB Y AP 2021-02-01
123 MOB N NON-AP 2021-02-01
Main Query:
SELECT
d.COL1,
d.COL2,
d.COL3,
CAST(COALESCE(FRT_QRY.COL4,SND_QRY.COL4,FIF_QRY.COL4,TRD_QRY.COL4) AS STRING) AS COL_VAL
FROM
(
SELECT * FROM db.main_tbl)d
LEFT JOIN
(
SELECT * FROM
( SELECT *,
ROW_NUMBER() OVER(PARTITION BY col1,col2,col3 ORDER BY col5 desc) as Rnk
FROM ( select * from db.tb1 where col2 IN ('LN') and col3 = 'Y') b
) a where a.Rnk =1
) SND_QRY
on d.col1=SND_QRY.col1
LEFT JOIN
(
SELECT * FROM
( SELECT *,
ROW_NUMBER() OVER(PARTITION BY col1,col2,col3 ORDER BY col5 desc) as Rnk
FROM ( select * from db.tb1 where col2 IN ('LN') and col3 = 'N') b
) a where a.Rnk =1
) TRD_QRY
on d.col1=TRD_QRY.col1
LEFT JOIN
(
SELECT * FROM
( SELECT *,
ROW_NUMBER() OVER(PARTITION BY col1,col2,col3 ORDER BY col5 desc) as Rnk
FROM ( select * from db.tb1 where col2 IN ('MOB') and col3 = 'Y') b
) a where a.Rnk =1
) FRT_QRY
on d.col1=FRT_QRY.col1
LEFT JOIN
(
SELECT * FROM
( SELECT *,
ROW_NUMBER() OVER(PARTITION BY col1,col2,col3 ORDER BY col5 desc) as Rnk
FROM ( select * from db.tb1 where col2 IN ('MOB') and col3 = 'N') b
) a where a.Rnk =1
) FIF_QRY
on d.col1=FIF_QRY.col1
Expected Output - main_tbl Table:
col1 col2 col3 COL_VAL
123 Hi 568 AP
To start with something, I noticed that all your subqueries contain different filters applied to the same columns and those columns are in partition by clause. This means that filters do not affect row_number and you can calculate row_number once without filters and use filters as join conditions or filter in join subqueries:
WITH RANKED AS (
SELECT * FROM
( SELECT b.*,
ROW_NUMBER() OVER(PARTITION BY col1,col2,col3 ORDER BY col5 desc) as Rnk
FROM db.tb1 b
) a where a.Rnk =1
)
SELECT
d.COL1,
d.COL2,
d.COL3,
CAST(COALESCE(FRT_QRY.COL4,SND_QRY.COL4,FIF_QRY.COL4,TRD_QRY.COL4) AS STRING) AS COL_VAL
FROM
(
SELECT * FROM db.main_tbl)d
LEFT JOIN RANKED SND_QRY on d.col1=SND_QRY.col1 AND SND_QRY.col2 IN ('LN') AND SND_QRY.col3 = 'Y'
LEFT JOIN RANKED TRD_QRY on d.col1=TRD_QRY.col1 AND TRD_QRY.col2 IN ('LN') AND TRD_QRY.col3 = 'N'
LEFT JOIN RANKED FRT_QRY on d.col1=FRT_QRY.col1 AND FRT_QRY.col2 IN ('MOB') AND FRT_QRY.col3 = 'Y'
LEFT JOIN RANKED FIF_QRY on d.col1=FIF_QRY.col1 AND FIF_QRY.col2 IN ('MOB') AND FIF_QRY.col3 = 'N'
Also, if your Hive version has the CTE materialization feature, use this setting:
set hive.optimize.cte.materialize.threshold=2;--HIVE-11752
The RANKED CTE will then be calculated only once and the same result reused in all joins.
You can also try to eliminate the repeated joins to the same table: calculate all fields in a single query using CASE expressions plus aggregation, and join only once. A single aggregation is usually cheaper than several joins:
WITH RANKED AS (
SELECT col1,
--aggregate all in single row per col1
max(case when col2 IN ('LN') AND col3 = 'Y' then COL4 else null end) as SND_COL4,
max(case when col2 IN ('LN') AND col3 = 'N' then COL4 else null end) as TRD_COL4,
max(case when col2 IN ('MOB') AND col3 = 'Y' then COL4 else null end) as FRT_COL4,
max(case when col2 IN ('MOB') AND col3 = 'N' then COL4 else null end) as FIF_COL4
FROM
( SELECT b.*,
ROW_NUMBER() OVER(PARTITION BY col1,col2,col3 ORDER BY col5 desc) as Rnk
FROM db.tb1 b
WHERE (col2 IN ('LN') AND col3 = 'Y')
or (col2 IN ('LN') AND col3 = 'N')
or (col2 IN ('MOB') AND col3 = 'Y')
or (col2 IN ('MOB') AND col3 = 'N')
) a where a.Rnk =1
GROUP BY col1
)
SELECT
d.COL1,
d.COL2,
d.COL3,
CAST(COALESCE(R.FRT_COL4,R.SND_COL4,R.FIF_COL4, R.TRD_COL4) AS STRING) AS COL_VAL
FROM
(
SELECT * FROM db.main_tbl)d
LEFT JOIN RANKED R ON d.col1=R.col1
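As a sanity check of the CASE plus aggregation idea, here is a minimal sketch that runs the same shape of query against the sample data from the question. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) instead of Hive, so the db. prefix and the CAST to STRING are dropped, and the four OR conditions are folded into two IN lists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE main_tbl (col1 INTEGER, col2 TEXT, col3 INTEGER);
    CREATE TABLE tb1 (col1 INTEGER, col2 TEXT, col3 TEXT, col4 TEXT, col5 TEXT);
    INSERT INTO main_tbl VALUES (123, 'Hi', 568);
    INSERT INTO tb1 VALUES
        (123, 'LN',  'Y', 'IP',     '2021-02-01'),
        (123, 'LN',  'N', 'NON-IP', '2021-02-01'),
        (123, 'MOB', 'Y', 'AP',     '2021-02-01'),
        (123, 'MOB', 'N', 'NON-AP', '2021-02-01');
    """
)

rows = conn.execute(
    """
    WITH ranked AS (
        -- one row per col1, with one column per (col2, col3) combination
        SELECT col1,
               MAX(CASE WHEN col2 = 'LN'  AND col3 = 'Y' THEN col4 END) AS snd_col4,
               MAX(CASE WHEN col2 = 'LN'  AND col3 = 'N' THEN col4 END) AS trd_col4,
               MAX(CASE WHEN col2 = 'MOB' AND col3 = 'Y' THEN col4 END) AS frt_col4,
               MAX(CASE WHEN col2 = 'MOB' AND col3 = 'N' THEN col4 END) AS fif_col4
        FROM (SELECT b.*,
                     ROW_NUMBER() OVER (PARTITION BY col1, col2, col3
                                        ORDER BY col5 DESC) AS rnk
              FROM tb1 b
              WHERE col2 IN ('LN', 'MOB') AND col3 IN ('Y', 'N')) a
        WHERE a.rnk = 1
        GROUP BY col1
    )
    SELECT d.col1, d.col2, d.col3,
           COALESCE(r.frt_col4, r.snd_col4, r.fif_col4, r.trd_col4) AS col_val
    FROM main_tbl d
    LEFT JOIN ranked r ON d.col1 = r.col1
    """
).fetchall()

print(rows)  # [(123, 'Hi', 568, 'AP')]
```

The COALESCE order is kept as in the original query, so the MOB/Y value wins whenever it exists.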

Oracle Group by based on next row value

We are trying to get a group by result by checking the next rows value.
Sample Data:
Table A
COL1 COL2 COL3
---- ---- ----
B BUY 1
B SELL 1.2
B SELL 2
C BUY 3
C SELL 4
C BUY 5
Result:
COL1 COL2 COUNT(1)
---- ---- --------
B BUY 1
B SELL 2
C BUY 1
C SELL 1
C BUY 1
You appear to have ordered by COL3; if this is the case then:
SELECT col1,
col2,
change - COALESCE( LAG( change ) OVER ( PARTITION BY col1 ORDER BY change ), 0 )
AS cnt
FROM (
SELECT col1,
col2,
CASE LEAD( col2 ) OVER ( PARTITION BY col1 ORDER BY col3 )
WHEN col2
THEN NULL
ELSE ROW_NUMBER() OVER ( PARTITION BY col1 ORDER BY col3 )
END AS change
FROM a
)
WHERE change IS NOT NULL;
If I understand correctly, you can do this with a difference of row numbers approach:
select col1, col2, count(*)
from (select t.*,
row_number() over (partition by col1 order by col3) as seqnum,
row_number() over (partition by col1, col2 order by col3) as seqnum_2
from t
) t
group by col1, col2, (seqnum - seqnum_2);
This identifies groups of adjacent col2 values based on the ordering in col3.
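To make the difference-of-row-numbers trick easier to follow, here is a small sketch on the sample data. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) in place of Oracle; the table is called a as in the question, and the ORDER BY on MIN(col3) is added only so the groups come back in the same order as the expected result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (col1 TEXT, col2 TEXT, col3 REAL)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [("B", "BUY", 1), ("B", "SELL", 1.2), ("B", "SELL", 2),
     ("C", "BUY", 3), ("C", "SELL", 4), ("C", "BUY", 5)],
)

# seqnum counts rows within col1, seqnum_2 within (col1, col2).
# Their difference stays constant inside a run of adjacent equal col2 values.
runs = conn.execute(
    """
    SELECT col1, col2, COUNT(*) AS cnt
    FROM (SELECT a.*,
                 ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col3) AS seqnum,
                 ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3) AS seqnum_2
          FROM a) AS numbered
    GROUP BY col1, col2, seqnum - seqnum_2
    ORDER BY col1, MIN(col3)
    """
).fetchall()

for row in runs:
    print(row)
# ('B', 'BUY', 1)
# ('B', 'SELL', 2)
# ('C', 'BUY', 1)
# ('C', 'SELL', 1)
# ('C', 'BUY', 1)
```

For the two C/BUY rows the differences are 1 - 1 = 0 and 3 - 2 = 1, which is why they end up in separate groups even though col1 and col2 match.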

Count on case Oracle

We have the data below in an Oracle database:
col1 col2
Z1 A
Z1 B
Z2 A
Z2 C
Z3 A
Z4 D
I want a count on column two such that:
Output:
col2 count
A 3 (Z1,Z2,Z3)
B 0 (Don't count if A is already present for the record)
C 0
D 1 (Z4)
Best Regards
You can use window function rank() to achieve this.
select col2, count(case when rn = 1 then 1 end) cnt from (
select t.*,
rank() over (partition by col1 order by case when col2 = 'A' then 1 else 2 end) rn
from your_table t -- your_table is a placeholder; TABLE itself is a reserved word
) group by col2;
A more general solution, in which each COL1 key is counted only for its first COL2 value (in alphabetical order):
WITH tab AS
(
SELECT 'Z1' col1, 'A' col2 FROM dual UNION ALL
SELECT 'Z1' col1, 'B' col2 FROM dual UNION ALL
SELECT 'Z2' col1, 'A' col2 FROM dual UNION ALL
SELECT 'Z2' col1, 'C' col2 FROM dual UNION ALL
SELECT 'Z3' col1, 'A' col2 FROM dual UNION ALL
SELECT 'Z4' col1, 'D' col2 FROM dual
), tab2 as (
select COL1, COL2,
row_number() over (partition by COL1 order by COL2) as rn
from tab)
select COL1, COL2,
case when rn = 1 then 1 else 0 end is_valid
from tab2
order by 1,2
;
COL1 COL2 IS_VALID
---- ---- ----------
Z1 A 1
Z1 B 0
Z2 A 1
Z2 C 0
Z3 A 1
Z4 D 1
The rest is simple group by with a SUM on IS_VALID
select COL2, sum(is_valid) cnt from tab3 -- TAB3 is the above row source
group by COL2
order by 1
COL2 CNT
---- ----------
A 3
B 0
C 0
D 1
Thanks Guys. But I could do this way -
select count(case
when (LISTAGG(col2,'-') WITHIN GROUP (ORDER BY col2)) like '%A%' then 1
else null
end) A,
count(case
when (LISTAGG(col2,'-') WITHIN GROUP (ORDER BY col2)) = 'B' then 1
else null
end) B,
count(case
when (LISTAGG(col2,'-') WITHIN GROUP (ORDER BY col2)) = 'C' then 1
else null
end) C,
count(case
when (LISTAGG(col2,'-') WITHIN GROUP (ORDER BY col2)) = 'D' then 1
else null
end) D
from T
GROUP BY col1
Thanks for your replies
Assuming your table is named table_name, one way to do it is this:
WITH table_a AS
(
SELECT DISTINCT col1
FROM table_name
WHERE col2 = 'A'
)
SELECT col2,
SUM(CASE WHEN col1 IN (SELECT col1 FROM table_a)
THEN DECODE(col2, 'A', 1, 0)
ELSE 1 END
) count
FROM table_name
GROUP BY col2
ORDER BY col2;
Tested ok:
WITH table_name AS
(
SELECT 'Z1' col1, 'A' col2 FROM dual UNION ALL
SELECT 'Z1' col1, 'B' col2 FROM dual UNION ALL
SELECT 'Z2' col1, 'A' col2 FROM dual UNION ALL
SELECT 'Z2' col1, 'C' col2 FROM dual UNION ALL
SELECT 'Z3' col1, 'A' col2 FROM dual UNION ALL
--SELECT 'Z4' col1, 'B' col2 FROM dual UNION ALL
SELECT 'Z4' col1, 'D' col2 FROM dual
)
, table_a AS
(
SELECT DISTINCT col1
FROM table_name
WHERE col2 = 'A'
)
SELECT col2,
SUM(CASE WHEN col1 IN (SELECT col1 FROM table_a)
THEN DECODE(col2, 'A', 1, 0)
ELSE 1 END
) count
FROM table_name
GROUP BY col2
ORDER BY col2;
You want to count each record where either col2 is 'A' or no 'A' record exists for col1.
select
col2,
count(
case
when col2 = 'A' or col1 not in (select col1 from table_name where col2 = 'A') then 1
end) as cnt
from table_name
group by col2;
select col2, count(case when col2 = col3 then 'x' end) as ct
from ( select col2, min(col2) over (partition by col1) as col3
from your_table
)
group by col2
order by col2 -- if needed
;
Explanation:
There is an inner query (a.k.a. "subquery") which returns one row for each row in the original table. It returns col2 as is, and an additional (new) column, labeled col3. col3 is calculated as the "first" or min() value of col2 (in alphabetical order) for all the rows in the original table that have the same value in col1 as the current row does. This is a typical example of an analytic function; partition by col1 is similar to group by col1 but it returns all the rows in the group (all the original rows from the original table) instead of one row per group, as would an aggregate function.
To see what the inner query does by itself, select it and run it in your favorite front-end. You may add col1 to the select in the inner query - that will make what's going on in this query even clearer. You'll get the initial table, with one more column, col3, that shows the "min" col2 for each value of col1. I didn't include col1 in the subquery because I don't need it, but add it back to see what the subquery really does.
Then in the outer query I take the results from the inner query and I group by col2. For each col2 I count just how many times it is equal to the "min" value of col2 for the corresponding col1 value. That's what the case expression does in the count() function; when col2 is not equal to col3, then case returns null (by default) so the expression - and therefore the row - is not counted.
I should add that the query written this way assumes there are no duplicate (col1, col2) rows in the original table. If there are, then the inner subquery should select from a sub-subquery; line 3 of my code should be
from (select distinct col1, col2 from your_table)
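The explanation above can be checked with a quick sketch on the question's data. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) instead of Oracle, and the alias with_min is only there for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("Z1", "A"), ("Z1", "B"), ("Z2", "A"),
     ("Z2", "C"), ("Z3", "A"), ("Z4", "D")],
)

# Inner query: every row plus the smallest col2 within its col1 group.
# Outer query: count a row only when its col2 equals that minimum.
counts = conn.execute(
    """
    SELECT col2, COUNT(CASE WHEN col2 = col3 THEN 'x' END) AS ct
    FROM (SELECT col2, MIN(col2) OVER (PARTITION BY col1) AS col3
          FROM your_table) AS with_min
    GROUP BY col2
    ORDER BY col2
    """
).fetchall()

print(counts)  # [('A', 3), ('B', 0), ('C', 0), ('D', 1)]
```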
Use the below script:
SELECT A.COL2, NVL(B.CNT, 0) AS CNT
FROM (SELECT DISTINCT COL2 FROM TET) A
LEFT JOIN (SELECT COL2, COUNT(COL2) AS CNT
FROM (SELECT SUBSTR(F, 1, INSTR(F, ',') - 1) AS COL2,
ROW_NUMBER() OVER(PARTITION BY SUBSTR(F, 1, INSTR(F, ',') - 1) ORDER BY SUBSTR(F, 1, INSTR(F, ',') - 1)) AS U
FROM (SELECT COL1,
LISTAGG(COL2, ',') WITHIN GROUP(ORDER BY COL2) || ',' AS F
FROM TET
GROUP BY COL1)) A
GROUP BY COL2) B
ON A.COL2 = B.COL2
ORDER BY A.COL2;

SQL Server : get max of the column2 and column3 value must be 1

I have output from part of my stored procedure like this:
col1 col2 col3 col4
--------------------------
2016-05-05 1 2 2
2016-05-05 1 3 32
2016-05-12 2 1 11
2016-05-12 3 1 31
Now I need to get result based on this condition
col2 = 1 and col3 = max or col3 = 1
and col2 = max
The final result should be
col1 col2 col3 col4
-------------------------
2016-05-05 1 3 32
2016-05-12 3 1 31
Not sure if that's the most efficient way, but you can use ROW_NUMBER():
SELECT * FROM (
SELECT t.*,
ROW_NUMBER() OVER(PARTITION BY t.col1 ORDER BY t.col3 DESC) as rnk
FROM YourTable t -- YourTable is a placeholder for your result set
WHERE t.col2 = 1
UNION ALL
SELECT t.*,
ROW_NUMBER() OVER(PARTITION BY t.col1 ORDER BY t.col2 DESC) as rnk
FROM YourTable t
WHERE t.col3 = 1) tt
WHERE rnk = 1
This will give you all the records with
(col2=1 and col3=max) or (col3=1 and col2=max)
This is a bit tricky. Your data has no ambiguities, such as duplicate maxima in col4 or "1" values in both col2 and col3.
The following is a direct translation of the logic in your question:
select t.*
from t
where t.col4 = (select max(t2.col4)
from t t2
where t2.col1 = t.col1 and (t2.col2 = 1 or t2.col3 = 1)
);
Try this. Note that if several rows tie for the maximum, all of them are returned. It should work for all scenarios, even when col1 is not in sync with col2 and col3.
I am first finding highest values of col2 and col3 and assigning them value as 1. Then in outer query, I am using your join condition. Demo created for Postgres DB as SQLServer wasn't available.
select col1,col2,col3,col4
from
(
select t.*,
RANK() OVER(ORDER BY col3 DESC) as col3_max,
RANK() OVER(ORDER BY col2 DESC) as col2_max
from your_table t
) t1
where
(col2=1 and col3_max=1)
OR
(col3=1 and col2_max=1)
Alternative way:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY iif(col2 = 1, col3, col2) DESC) as r
FROM tbl) t
WHERE r = 1
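Here is a rough way to try out the single ROW_NUMBER() idea on the sample rows. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) rather than SQL Server, and swaps iif for an equivalent CASE expression, since iif is not available in older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col1 TEXT, col2 INTEGER, col3 INTEGER, col4 INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [("2016-05-05", 1, 2, 2), ("2016-05-05", 1, 3, 32),
     ("2016-05-12", 2, 1, 11), ("2016-05-12", 3, 1, 31)],
)

# Rank by col3 when col2 = 1, otherwise by col2, and keep the top row per col1.
picked = conn.execute(
    """
    SELECT col1, col2, col3, col4
    FROM (SELECT *,
                 ROW_NUMBER() OVER (
                     PARTITION BY col1
                     ORDER BY CASE WHEN col2 = 1 THEN col3 ELSE col2 END DESC
                 ) AS r
          FROM tbl) AS t
    WHERE r = 1
    ORDER BY col1
    """
).fetchall()

print(picked)  # [('2016-05-05', 1, 3, 32), ('2016-05-12', 3, 1, 31)]
```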

Getting the value of no grouping column

I know the basics in SQL programming and I know how to apply some tricks in SQL Server in order to get the result set, but I don't know all tricks in Oracle.
I have these columns:
col1 col2 col3
And I wrote this query
SELECT
col1, MAX(col3) AS mx3
FROM
myTable
GROUP BY
col1
And I need to get the value of col2 in the same row where I found the max value of col3, do you know some trick to solve this problem?
The easiest way to do this, IMHO, is not to use max, but the window function rank:
SELECT col1 , col2, col3
FROM (SELECT col1, col2, col3,
RANK() OVER (PARTITION BY col1 ORDER BY col3 DESC) rk
FROM myTable) t
WHERE rk = 1
BTW, the same syntax should also work for MS SQL Server and most other modern databases; MySQL is the notable exception before version 8.0, which added window functions.
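One thing worth knowing about RANK() is that tied maxima all get rank 1, so a group can return more than one row. The sketch below illustrates this; it assumes Python's built-in sqlite3 module (SQLite 3.25 or later) and uses made-up sample rows in which both groups have a tie on col3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 2), (1, 1, 3), (1, 3, 3),
     (2, 10, 1), (2, 23, 2), (2, 12, 2)],
)

# RANK() gives every row that shares the maximum col3 the rank 1.
ranked_rows = conn.execute(
    """
    SELECT col1, col2, col3
    FROM (SELECT col1, col2, col3,
                 RANK() OVER (PARTITION BY col1 ORDER BY col3 DESC) AS rk
          FROM myTable) AS t
    WHERE rk = 1
    ORDER BY col1, col2
    """
).fetchall()

print(ranked_rows)  # [(1, 1, 3), (1, 3, 3), (2, 12, 2), (2, 23, 2)]
```

If exactly one row per col1 is needed, ROW_NUMBER() with a tie-breaking ORDER BY column is the usual substitute.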
A couple of different ways to do this:
In both cases I'm treating your initial query as either a common table expression or as an inline view and joining it back to the base table to get your added column. The trick here is that the INNER JOIN eliminates all the records not in your max query.
SELECT A.*
FROM myTable A
INNER JOIN (SELECT col1 , MAX( col3 ) AS mx3 FROM myTable GROUP BY col1) B
on A.Col1=B.Col1
and B.mx3 = A.Col3
or
with CTE AS (SELECT col1 , MAX( col3 ) AS mx3 FROM myTable GROUP BY col1)
SELECT A.*
FROM MyTable A
INNER JOIN CTE
on A.col1 = CTE.col1
and A.col3= cte.mx3
Here's an alternative that's just a slight extension of your existing group by query (ie. doesn't require querying the same table more than once):
with mytable as (select 1 col1, 1 col2, 1 col3 from dual union all
select 1 col1, 2 col2, 2 col3 from dual union all
select 1 col1, 1 col2, 3 col3 from dual union all
select 1 col1, 3 col2, 3 col3 from dual union all
select 2 col1, 10 col2, 1 col3 from dual union all
select 2 col1, 23 col2, 2 col3 from dual union all
select 2 col1, 12 col2, 2 col3 from dual)
SELECT
col1,
MAX(col2) keep (dense_rank first order by col3 desc) mx2,
MAX(col3) AS mx3
FROM
myTable
GROUP BY
col1;
COL1 MX2 MX3
---------- ---------- ----------
1 3 3
2 23 2
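KEEP (DENSE_RANK FIRST ...) is Oracle-specific. On an engine without it, roughly the same result can be had by filtering to DENSE_RANK() = 1 and then aggregating. The sketch below is one such approximation, not a drop-in equivalent; it assumes Python's built-in sqlite3 module (SQLite 3.25 or later) and reuses the sample rows above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 2), (1, 1, 3), (1, 3, 3),
     (2, 10, 1), (2, 23, 2), (2, 12, 2)],
)

# Keep only the rows holding the highest col3 per col1 (dr = 1),
# then take the largest col2 among them.
emulated = conn.execute(
    """
    SELECT col1, MAX(col2) AS mx2, MAX(col3) AS mx3
    FROM (SELECT col1, col2, col3,
                 DENSE_RANK() OVER (PARTITION BY col1 ORDER BY col3 DESC) AS dr
          FROM myTable) AS ranked
    WHERE dr = 1
    GROUP BY col1
    ORDER BY col1
    """
).fetchall()

print(emulated)  # [(1, 3, 3), (2, 23, 2)]
```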