In Postgres, you can query for the first value in a group with DISTINCT ON. How can this be achieved in Oracle?
From the Postgres manual:
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of
each set of rows where the given expressions evaluate to equal. The
DISTINCT ON expressions are interpreted using the same rules as for
ORDER BY (see above). Note that the "first row" of each set is
unpredictable unless ORDER BY is used to ensure that the desired row
appears first.
For example, for a given table:
col1 | col2
------+------
A | AB
A | AD
A | BC
B | AN
B | BA
C | AC
C | CC
Ascending sort:
> select distinct on(col1) col1, col2 from tmp order by col1, col2 asc;
col1 | col2
------+------
A | AB
B | AN
C | AC
Descending sort:
> select distinct on(col1) col1, col2 from tmp order by col1, col2 desc;
col1 | col2
------+------
A | BC
B | BA
C | CC
The same effect can be replicated in Oracle either by using the first_value() function or by using one of the rank() or row_number() functions.
Both variants also work in Postgres.
first_value()
select distinct col1,
first_value(col2) over (partition by col1 order by col2 asc)
from tmp
first_value gives the first value for the partition, but repeats it for each row, so it is necessary to use it in combination with distinct to get a single row for each partition.
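To see why the distinct is needed, here is a minimal sketch that runs the query with and without it. It uses Python's built-in sqlite3 module as a stand-in for Oracle/Postgres (an assumption on my part; it needs SQLite 3.25 or later for window functions), and the alias first_col2 is only there for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle/Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table tmp (col1 text, col2 text)")
conn.executemany(
    "insert into tmp values (?, ?)",
    [("A", "AB"), ("A", "AD"), ("A", "BC"),
     ("B", "AN"), ("B", "BA"),
     ("C", "AC"), ("C", "CC")],
)

# Without DISTINCT: one output row per input row, the first value repeated.
repeated = conn.execute("""
    select col1,
           first_value(col2) over (partition by col1 order by col2 asc) as first_col2
    from tmp
    order by col1
""").fetchall()

# With DISTINCT: the repeated rows collapse to one row per partition.
collapsed = conn.execute("""
    select distinct col1,
           first_value(col2) over (partition by col1 order by col2 asc) as first_col2
    from tmp
    order by col1
""").fetchall()

print(len(repeated))  # 7
print(collapsed)      # [('A', 'AB'), ('B', 'AN'), ('C', 'AC')]
```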
row_number() / rank()
select col1, col2 from (
select col1, col2,
row_number() over (partition by col1 order by col2 asc) as rownumber
from tmp
) foo
where rownumber = 1
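Replacing row_number() with rank() in this example yields the same result, because col2 has no ties within a partition. With ties, rank() returns every tied row, whereas row_number() returns exactly one.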
A feature of this variant is that it can be used to fetch the first N rows for a given partition (e.g. "last 3 updated") simply by changing rownumber = 1 to rownumber <= N.
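A rough sketch of the "first N per group" idea, again using Python's sqlite3 module as a stand-in database (assumption; SQLite 3.25+). The helper name first_n_per_group and its parameters are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tmp (col1 text, col2 text)")
conn.executemany(
    "insert into tmp values (?, ?)",
    [("A", "AB"), ("A", "AD"), ("A", "BC"),
     ("B", "AN"), ("B", "BA"),
     ("C", "AC"), ("C", "CC")],
)

def first_n_per_group(conn, n, direction="asc"):
    """Return the first n rows of each col1 group, ordered by col2."""
    # The sort direction cannot be a bound parameter, so whitelist it.
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    sql = f"""
        select col1, col2 from (
            select col1, col2,
                   row_number() over (partition by col1 order by col2 {direction}) as rownumber
            from tmp
        ) foo
        where rownumber <= ?
        order by col1, rownumber
    """
    return conn.execute(sql, (n,)).fetchall()

print(first_n_per_group(conn, 1))          # same rows as DISTINCT ON, ascending
print(first_n_per_group(conn, 1, "desc"))  # same rows as DISTINCT ON, descending
print(first_n_per_group(conn, 2))          # first two rows of every group
```

With n = 1 this reproduces the two DISTINCT ON results from the question; with n = 2 every group contributes its first two rows.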
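If you have more than two columns, use beerbajay's answer as a subquery (note the DESC order). This assumes the col2 values are unique across groups; otherwise the IN filter also matches rows from other groups: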
select col1,col2, col3,col4 from tmp where col2 in
(
select distinct
first_value(col2) over (partition by col1 order by col2 DESC) as col2
from tmp
--WHERE you decide conditions
)
Related
Let's say I have a table with col1 and col2
I group by col1 and order by col1
From the first group, I want to have all values of col2 but from the second group, I want to have only those values which were present in the first group and so on with the consecutive groups.
sample table
col1 col2
1 A
1 B
1 C
1 D
2 E
2 A
2 B
2 G
3 B
3 D
And the output should be
col1 col2
1 A
1 B
1 C
1 D
2 A
2 B
3 B
You can use window functions to avoid reading the same table twice:
Number the groups so that they run 1, 2, 3, ... without gaps.
Get a running count per col2 value, i.e. the number of groups it has appeared in so far.
Only show rows where the group number equals that count.
The query:
select col1, col2
from
(
select
col1, col2,
dense_rank() over (order by col1) as rn,
count(*) over (partition by col2 order by col1) as cnt
from mytable
) numbered_and_counted
where rn = cnt
order by col1, col2;
Demo: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=f0cc6a211a1a4c767c9e3ce9deb8c28f
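For anyone who wants to see the intermediate rn and cnt values, here is a small sketch that runs the same query against the sample data. It uses Python's sqlite3 module instead of Oracle (an assumption; SQLite 3.25+ is needed for dense_rank() and the windowed count).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (col1 integer, col2 text)")
conn.executemany(
    "insert into mytable values (?, ?)",
    [(1, "A"), (1, "B"), (1, "C"), (1, "D"),
     (2, "E"), (2, "A"), (2, "B"), (2, "G"),
     (3, "B"), (3, "D")],
)

# Inner query only: group number (rn) and running count per col2 value (cnt).
numbered = conn.execute("""
    select col1, col2,
           dense_rank() over (order by col1) as rn,
           count(*) over (partition by col2 order by col1) as cnt
    from mytable
    order by col1, col2
""").fetchall()

# Full query: keep a row only if col2 has appeared in every group so far.
result = conn.execute("""
    select col1, col2
    from (
        select col1, col2,
               dense_rank() over (order by col1) as rn,
               count(*) over (partition by col2 order by col1) as cnt
        from mytable
    ) numbered_and_counted
    where rn = cnt
    order by col1, col2
""").fetchall()

for row in numbered:
    print(row)
print(result)
```

The row (3, 'D') has rn = 3 but cnt = 2, because D was missing from group 2, so it is filtered out.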
I have the data below in a BigQuery table:
col1 col2
abc 3/22/2020
abc 3/4/2020
xyz 3/22/2020
xyz 3/4/2020
I am trying to get the output below:
col1 col2
abc 3/22/2020
xyz 3/22/2020
For this I have tried using the rank() OVER Partition clause, but no luck. Please advise.
select * from (select col1, col2, RANK() over (partition by col1, col2 order by col1, col2 desc) as r1 from table1) temp
where temp.r1 = 1
You were very close. The fix is a slight adjustment of your query: partition by col1 only and order by col2 descending.
#standardSQL
SELECT * EXCEPT(r1) FROM (
SELECT col1, col2, RANK() OVER (PARTITION BY col1 ORDER BY col2 DESC) AS r1
FROM table1) temp
WHERE r1 = 1
While the above should work, the following is a more efficient and more BigQuery-idiomatic option:
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY col2 DESC LIMIT 1)[OFFSET(0)]
FROM table1 t
GROUP BY col1
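One caveat worth checking: ordering by col2 DESC only picks 3/22/2020 if col2 sorts chronologically, e.g. if it is a DATE column. If it is stored as text in M/D/YYYY form, '3/4/2020' sorts after '3/22/2020'. The sketch below illustrates this with Python's sqlite3 module standing in for BigQuery (an assumption; SQLite 3.25+), and the table names table1_text and table1_iso are invented for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same rows twice: once as M/D/YYYY text, once as ISO-8601 text,
# which sorts in chronological order even when compared as a string.
conn.execute("create table table1_text (col1 text, col2 text)")
conn.execute("create table table1_iso (col1 text, col2 text)")
conn.executemany(
    "insert into table1_text values (?, ?)",
    [("abc", "3/22/2020"), ("abc", "3/4/2020"),
     ("xyz", "3/22/2020"), ("xyz", "3/4/2020")],
)
conn.executemany(
    "insert into table1_iso values (?, ?)",
    [("abc", "2020-03-22"), ("abc", "2020-03-04"),
     ("xyz", "2020-03-22"), ("xyz", "2020-03-04")],
)

def latest_per_col1(table):
    # `table` is one of the two fixed names above, never user input.
    sql = f"""
        select col1, col2 from (
            select col1, col2,
                   rank() over (partition by col1 order by col2 desc) as r1
            from {table}
        ) temp
        where r1 = 1
        order by col1
    """
    return conn.execute(sql).fetchall()

as_text = latest_per_col1("table1_text")
as_iso = latest_per_col1("table1_iso")
print(as_text)  # [('abc', '3/4/2020'), ('xyz', '3/4/2020')]
print(as_iso)   # [('abc', '2020-03-22'), ('xyz', '2020-03-22')]
```

So if col2 is a string, parse it into a date before ranking.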
I don't know if this is good practice, but I've done this several times. It assumes col2 is stored as a string in month/day/year format:
select col1, MAX(PARSE_DATE('%m/%d/%Y', col2)) as col2
from table1
group by col1
Part of my stored procedure produces output like this:
col1 col2 col3 col4
--------------------------
2016-05-05 1 2 2
2016-05-05 1 3 32
2016-05-12 2 1 11
2016-05-12 3 1 31
Now I need to get result based on this condition
col2 = 1 and col3 = max or col3 = 1
and col2 = max
The final result should be
col1 col2 col3 col4
-------------------------
2016-05-05 1 3 32
2016-05-12 3 1 31
Not sure if that's the most efficient way, but you can use ROW_NUMBER():
SELECT * FROM (
SELECT t.*,
ROW_NUMBER() OVER(PARTITION BY t.col1 ORDER BY t.col3 DESC) as rnk
FROM your_table t -- your_table is a placeholder for your result set
WHERE t.col2 = 1
UNION ALL
SELECT t.*,
ROW_NUMBER() OVER(PARTITION BY t.col1 ORDER BY t.col2 DESC) as rnk
FROM your_table t
WHERE t.col3 = 1) tt
WHERE rnk = 1
This will give you all the records with
(col2=1 and col3=max) or (col3=1 and col2=max)
This is a bit tricky. Your data has no ambiguities, such as duplicate maxima in col4 or "1" values in both col2 and col3.
The following is a direct translation of the logic in your question:
select t.*
from t
where t.col4 = (select max(t2.col4)
from t t2
where t2.col1 = t.col1 and (t2.col2 = 1 or t2.col3 = 1)
);
Try this. Note that if several rows tie for the maximum, all of them appear in the output. It works for all scenarios, even when col1 is not in sync with col2 and col3.
I first rank col2 and col3 in descending order, so the highest value of each gets rank 1. Then, in the outer query, I apply your condition. The demo was created for Postgres because SQL Server wasn't available.
SQLFiddle Demo
select col1,col2,col3,col4
from
(
select t.*,
RANK() OVER(ORDER BY col3 DESC) as col3_max,
RANK() OVER(ORDER BY col2 DESC) as col2_max
from your_table t
) t1
where
(col2=1 and col3_max=1)
OR
(col3=1 and col2_max=1)
Alternative way:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY iif(col2 = 1, col3, col2) DESC) as r
FROM tbl) t
WHERE r = 1
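To show what the sort key in that last query does, here is a sketch run against the sample rows from the question. It uses Python's sqlite3 module as a stand-in database (assumption; SQLite 3.25+), and it spells iif(col2 = 1, col3, col2) as the equivalent CASE expression so it also runs on engines without iif().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (col1 text, col2 integer, col3 integer, col4 integer)")
conn.executemany(
    "insert into tbl values (?, ?, ?, ?)",
    [("2016-05-05", 1, 2, 2),
     ("2016-05-05", 1, 3, 32),
     ("2016-05-12", 2, 1, 11),
     ("2016-05-12", 3, 1, 31)],
)

# Sort key per row: col3 when col2 = 1, otherwise col2.
# The row with the largest key in each col1 group gets r = 1.
rows = conn.execute("""
    select col1, col2, col3, col4 from (
        select *,
               row_number() over (
                   partition by col1
                   order by case when col2 = 1 then col3 else col2 end desc
               ) as r
        from tbl
    ) t
    where r = 1
    order by col1
""").fetchall()

print(rows)  # [('2016-05-05', 1, 3, 32), ('2016-05-12', 3, 1, 31)]
```

This matches the expected result for the sample data. Like the original, it does not handle rows where neither col2 nor col3 is 1.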
I have the following data, which I would like to filter so that I get only one row per value of the first column: the one with the max date.
col2 contains unique values.
col1 | col2 | date
1 | 123 | 2013
1 | 124 | 2012
1 | 125 | 2014
2 | 213 | 2011
2 | 214 | 2015
2 | 215 | 2018
so the results I want are:
1 | 125 | 2014
2 | 215 | 2018
I've tried a few examples which I found on here (as below), as well as other group by / distinct / max(date) approaches, but with no luck:
select t.*
from (select t.*,
row_number() over (partition by col1, col2 order by date desc) as seqnum
from t
) t
where seqnum = 1
Change the partition in the row_number() to only partition by col1 but keep the order by date desc:
select col1, col2, date
from
(
select col1, col2, date,
row_number() over (partition by col1
order by date desc) as rn
from yourtable
) x
where rn = 1
See SQL Fiddle with Demo.
Since you were partitioning by both col1 and col2 you were getting unique values for each row. So it would not return the row with the max date.
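A quick sketch of the difference, using Python's sqlite3 module as a stand-in database (an assumption; SQLite 3.25+ for row_number()). The variable names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table yourtable (col1 integer, col2 integer, date integer)")
conn.executemany(
    "insert into yourtable values (?, ?, ?)",
    [(1, 123, 2013), (1, 124, 2012), (1, 125, 2014),
     (2, 213, 2011), (2, 214, 2015), (2, 215, 2018)],
)

QUERY = """
    select col1, col2, date
    from (
        select col1, col2, date,
               row_number() over (partition by {partition} order by date desc) as rn
        from yourtable
    ) x
    where rn = 1
    order by col1, col2
"""

# col2 is unique, so (col1, col2) puts every row in its own partition
# and every row gets rn = 1.
too_many = conn.execute(QUERY.format(partition="col1, col2")).fetchall()

# Partitioning by col1 alone numbers the rows within each col1 group.
wanted = conn.execute(QUERY.format(partition="col1")).fetchall()

print(len(too_many))  # 6
print(wanted)         # [(1, 125, 2014), (2, 215, 2018)]
```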
I prefer bluefeet's method, but here is an equivalent using MAX:
SELECT t.col1, t.col2, t.date
FROM yourtable t
JOIN (
SELECT col1, MAX(date) maxDate
FROM yourtable
GROUP BY col1
) t2 on t.col1 = t2.col1 AND t.date = t2.maxDate
SQL Fiddle Demo (borrowed from other post)
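-- Caveat: this also returns a row from another group if its date happens to equal some group's maximum.
Select * from yourtable where date in
(select max(date) from yourtable group by col1);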
I'm using this query to find duplicate values in a table:
select col1,
count(col1)
from table1
group by col1
having count (col1) > 1
order by 2 desc;
But also I want to add another column from the same table, like this:
select col1,
col2,
count(col1)
from table1
group by col1
having count (col1) > 1
order by 2 desc;
I get an ORA-00979 error with that second query
How can I add another column in my search?
Your query should be
SELECT * FROM (
select col1,
col2,
count(col1) over (partition by col1) col1_cnt
from table1
)
WHERE col1_cnt > 1
order by 2 desc;
Presumably you want to get col2 for each duplicate of col1 that turns up. You can't really do that in a single query^. Instead, what you need to do is get your list of duplicates, then use that to retrieve any other associated values:
select col1, col2
from table1
where col1 in (select col1
from table1
group by col1
having count (col1) > 1)
order by col2 desc
^ Okay, you can, by using analytic functions, as @rs. demonstrated. For this scenario, I suspect that the nested query will be more efficient, but both should give you the same results.
Based on comments, it seems like you're not clear on why you can't just add the second column. Assume you have sample data that looks like this:
Col1 | Col2
-----+-----
1 | A
1 | B
2 | C
2 | D
3 | E
If you run
select Col1, count(*) as cnt
from table1
group by Col1
having count(*) > 1
then your results will be:
Col1 | Cnt
-----+-----
1 | 2
2 | 2
You can't just add Col2 to this query without adding it to the group by clause because the database will have no way of knowing which value you actually want (i.e. for Col1=1 should the DB return 'A' or 'B'?). If you add Col2 to the group by clause, you get the following:
select Col1, Col2, count(*) as cnt
from table1
group by Col1, Col2
having count(*) > 1
Col1 | Col2 | Cnt
-----+------+----
[no results]
This is because the count is for each combination of Col1 and Col2 (each of which is unique).
Finally, by using either a nested query (as in my answer) or an analytic function (as in @rs.'s answer), you'll get the following result (query changed slightly to return the count):
select t1.col1, t1.col2, cnt
from table1 t1
join (select col1, count(*) as cnt
from table1
group by col1
having count (col1) > 1) t2
on t1.col1 = t2.col1
Col1 | Col2 | Cnt
-----+------+----
1 | A | 2
1 | B | 2
2 | C | 2
2 | D | 2
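The three cases above can be checked side by side with a short sketch. It uses Python's sqlite3 module as a stand-in for Oracle (an assumption; SQLite 3.25+ for the windowed count), and the variable names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (col1 integer, col2 text)")
conn.executemany(
    "insert into table1 values (?, ?)",
    [(1, "A"), (1, "B"), (2, "C"), (2, "D"), (3, "E")],
)

# Grouping by both columns: every (col1, col2) pair is unique, so nothing qualifies.
grouped_by_both = conn.execute("""
    select col1, col2, count(*) as cnt
    from table1
    group by col1, col2
    having count(*) > 1
""").fetchall()

# Nested query: find the duplicated col1 values, then join back for col2.
nested = conn.execute("""
    select t1.col1, t1.col2, t2.cnt
    from table1 t1
    join (select col1, count(*) as cnt
          from table1
          group by col1
          having count(col1) > 1) t2
      on t1.col1 = t2.col1
    order by t1.col1, t1.col2
""").fetchall()

# Analytic function: count per col1 without collapsing the rows.
analytic = conn.execute("""
    select col1, col2, col1_cnt
    from (select col1, col2,
                 count(col1) over (partition by col1) as col1_cnt
          from table1)
    where col1_cnt > 1
    order by col1, col2
""").fetchall()

print(grouped_by_both)  # []
print(nested)           # [(1, 'A', 2), (1, 'B', 2), (2, 'C', 2), (2, 'D', 2)]
print(nested == analytic)  # True
```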
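You should list all non-aggregated columns from the select list in the group by clause as well. Note that the count is then per (col1, col2) combination rather than per col1.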
select col1,
col2,
count(col1)
from table1
group by col1, col2
having count (col1) > 1
order by 2 desc;
Cause of Error
You tried to execute an SQL SELECT statement that included a GROUP BY
function (ie: SQL MIN Function, SQL MAX Function, SQL SUM Function,
SQL COUNT Function) and an expression in the SELECT list that was not
in the SQL GROUP BY clause.
select col1,
col2,
count(col1)
from table1
group by col1,col2
having count (col1) > 1
order by 2 desc;