Rank function in BigQuery

I have the data below in a BigQuery table:
col1 col2
abc 3/22/2020
abc 3/4/2020
xyz 3/22/2020
xyz 3/4/2020
I am trying to get the output below:
col1 col2
abc 3/22/2020
xyz 3/22/2020
I have tried using RANK() with an OVER (PARTITION BY) clause, but with no luck. Please advise.
select * from (select col1, col2, RANK() over (partition by col1, col2 order by col1, col2 desc) as r1 from table1) temp
where temp.r1 = 1

You were very close. The correct query (just a slight adjustment of yours) partitions by col1 only and orders by col2 descending:
#standardSQL
SELECT * EXCEPT(r1) FROM (
SELECT col1, col2, RANK() OVER (PARTITION BY col1 ORDER BY col2 DESC) AS r1
FROM table1) temp
WHERE r1 = 1
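One way to check the corrected query is to run it against an in-memory SQLite database from Python. This is a sketch, not BigQuery: it assumes SQLite 3.25 or later (for RANK()), and latest_per_col1 is a hypothetical helper. It also shows why the column type matters: if col2 were stored as text such as '3/22/2020', descending text order would put '3/4/2020' first, because '4' sorts after '2'.

```python
import sqlite3

# Sample data from the question, with col2 stored as ISO-8601 text so that
# text order matches date order (an assumption; BigQuery would use DATE).
ROWS = [("abc", "2020-03-22"), ("abc", "2020-03-04"),
        ("xyz", "2020-03-22"), ("xyz", "2020-03-04")]

QUERY = """
    SELECT col1, col2 FROM (
        SELECT col1, col2,
               RANK() OVER (PARTITION BY col1 ORDER BY col2 DESC) AS r1
        FROM table1
    ) AS ranked
    WHERE r1 = 1
    ORDER BY col1
"""

def latest_per_col1(rows):
    """Return the top-ranked (largest) col2 for each col1."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (col1 TEXT, col2 TEXT)")
    con.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(latest_per_col1(ROWS))  # [('abc', '2020-03-22'), ('xyz', '2020-03-22')]

# Pitfall: with US-style text dates, text comparison ranks '3/4/2020' first.
print(latest_per_col1([("abc", "3/22/2020"), ("abc", "3/4/2020")]))  # [('abc', '3/4/2020')]
```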
While the above should work, the following is a more optimal, BigQuery-idiomatic option:
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY col2 DESC LIMIT 1)[OFFSET(0)]
FROM table1 t
GROUP BY col1

I don't know if this is good practice, but I've done this several times:
SELECT col1, MAX(PARSE_DATE('%m/%d/%Y', col2)) AS col2
FROM table1
GROUP BY col1
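The parse-then-aggregate idea can be sketched in plain Python as an analogue of MAX(PARSE_DATE(...)), not as BigQuery itself; latest_date_per_group is a hypothetical helper. The sketch also shows that the format string has to match the separators in the data.

```python
from datetime import datetime

# Sample rows from the question, with col2 kept as US-style text.
ROWS = [("abc", "3/22/2020"), ("abc", "3/4/2020"),
        ("xyz", "3/22/2020"), ("xyz", "3/4/2020")]

def latest_date_per_group(rows, fmt="%m/%d/%Y"):
    """Parse col2 with `fmt`, then keep the maximum date for each col1."""
    latest = {}
    for col1, col2 in rows:
        # strptime accepts single-digit months and days such as '3/4/2020'.
        parsed = datetime.strptime(col2, fmt).date()
        if col1 not in latest or parsed > latest[col1]:
            latest[col1] = parsed
    return {key: latest[key].isoformat() for key in sorted(latest)}

print(latest_date_per_group(ROWS))  # {'abc': '2020-03-22', 'xyz': '2020-03-22'}

# A format string with dashes does not match slash-separated input.
try:
    latest_date_per_group(ROWS, fmt="%m-%d-%Y")
except ValueError as err:
    print("parse failed:", err)
```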

Related

If two rows have the same id but different col2 values, how can you keep only the one that has the max col3?

I have a table with four columns (id, col2, col3, col4), where col2 is A or B and col3 and col4 are integers. My problem is that many rows have the same id but a different col2 value, and I want to select ONLY the rows that have the maximum value in col3 for their id.
For instance, if we have:
id | col2 | col3 | col4
1 | A | 3 | 2
1 | B | 5 | 3
2 | A | 6 | 2
...
I want to keep only the tuple (1, B, 5, 3). How can I achieve this?
I've tried this:
SELECT id, col2, MAX(col3), col4 FROM t GROUP BY id;
but I get an error saying that this is not a valid GROUP BY statement.
In Oracle, you can use KEEP:
SELECT id,
MAX(col2) KEEP (DENSE_RANK FIRST ORDER BY col3 DESC) as col2,
MAX(col3) as col3,
MAX(col4) KEEP (DENSE_RANK FIRST ORDER BY col3 DESC) as col4
FROM t
GROUP BY id;
Or:
SELECT id, col2, col3, col4
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY col3 DESC) as seqnum
FROM t
) t
WHERE seqnum = 1;
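The ROW_NUMBER() version is portable enough to try in SQLite (3.25 or later) from Python. The sketch below uses only the three sample rows shown in the question, and max_col3_row_per_id is a hypothetical helper. Unlike KEEP, which needs one aggregate per column, it returns the whole winning row.

```python
import sqlite3

# Rows from the question's example table t(id, col2, col3, col4).
ROWS = [(1, "A", 3, 2), (1, "B", 5, 3), (2, "A", 6, 2)]

QUERY = """
    SELECT id, col2, col3, col4
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY col3 DESC) AS seqnum
          FROM t) AS ranked
    WHERE seqnum = 1
    ORDER BY id
"""

def max_col3_row_per_id(rows):
    """Return one full row per id: the one with the largest col3."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, col2 TEXT, col3 INTEGER, col4 INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(max_col3_row_per_id(ROWS))  # [(1, 'B', 5, 3), (2, 'A', 6, 2)]
```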
This query:
select t.*
from tablename t inner join (
select id, max(col3) col3
from tablename
group by id
having count(distinct col2) > 1
) g on g.id = t.id and g.col3 = t.col3
returns, for each id that has different values in col2, only one row: the one containing the maximum value of col3 (or several rows if that maximum is tied).
If you also want the rows of ids that do not have different values in col2, use UNION ALL:
select t.*
from tablename t inner join (
select id, max(col3) col3
from tablename
group by id
having count(distinct col2) > 1
) g on g.id = t.id and g.col3 = t.col3
union all
select t.* from tablename t
where not exists (
select 1 from tablename
where id = t.id and col2 <> t.col2
)
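To see what the UNION ALL version returns, here is a sketch that runs it in SQLite from Python, with hypothetical data in which id 2 has a single col2 value. Note that the second branch returns every row of such an id, not only its maximum col3 row.

```python
import sqlite3

# Hypothetical data: id 1 has two col2 values, id 2 has only 'A'.
ROWS = [(1, "A", 3, 2), (1, "B", 5, 3), (2, "A", 6, 2), (2, "A", 4, 1)]

QUERY = """
    SELECT t.*
    FROM tablename t INNER JOIN (
        SELECT id, MAX(col3) AS col3
        FROM tablename
        GROUP BY id
        HAVING COUNT(DISTINCT col2) > 1
    ) g ON g.id = t.id AND g.col3 = t.col3
    UNION ALL
    SELECT t.* FROM tablename t
    WHERE NOT EXISTS (
        SELECT 1 FROM tablename
        WHERE id = t.id AND col2 <> t.col2
    )
    ORDER BY 1, 3  -- sort the combined result by id, then col3
"""

def max_or_single_group_rows(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tablename (id INTEGER, col2 TEXT, col3 INTEGER, col4 INTEGER)")
    con.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(max_or_single_group_rows(ROWS))  # [(1, 'B', 5, 3), (2, 'A', 4, 1), (2, 'A', 6, 2)]
```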
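A simpler alternative is a correlated subquery that compares each row with the maximum col3 for its own id:
select * from TableName t where col3 = (select max(t2.col3) from TableName t2 where t2.id = t.id)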

Oracle: GROUP BY based on the next row's value

We are trying to get a grouped result by checking the next row's value.
Sample Data:
Table A
COL1 COL2 COL3
---- ---- ----
B BUY 1
B SELL 1.2
B SELL 2
C BUY 3
C SELL 4
C BUY 5
Result:
COL1 COL2 COUNT(1)
---- ---- --------
B BUY 1
B SELL 2
C BUY 1
C SELL 1
C BUY 1
You appear to have ordered by COL3; if this is the case then:
SELECT col1,
col2,
change - COALESCE( LAG( change ) OVER ( PARTITION BY col1 ORDER BY change ), 0 )
AS cnt
FROM (
SELECT col1,
col2,
CASE LEAD( col2 ) OVER ( PARTITION BY col1 ORDER BY col3 )
WHEN col2
THEN NULL
ELSE ROW_NUMBER() OVER ( PARTITION BY col1 ORDER BY col3 )
END AS change
FROM a
)
WHERE change IS NOT NULL;
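The LEAD()/LAG() query can be checked in SQLite 3.25+ from Python against the sample table A (run_lengths is a hypothetical helper, and the final ORDER BY is added only to make the output deterministic). The inner query marks the last row of each run of equal col2 values with its row number; because WHERE is applied before the outer window function, LAG() then sees only those run-end rows, so subtracting consecutive run ends gives each run's length.

```python
import sqlite3

# Sample data from "Table A": (col1, col2, col3).
ROWS = [("B", "BUY", 1), ("B", "SELL", 1.2), ("B", "SELL", 2),
        ("C", "BUY", 3), ("C", "SELL", 4), ("C", "BUY", 5)]

QUERY = """
    SELECT col1, col2,
           change - COALESCE(LAG(change) OVER (PARTITION BY col1 ORDER BY change), 0)
               AS cnt
    FROM (
        SELECT col1, col2,
               CASE LEAD(col2) OVER (PARTITION BY col1 ORDER BY col3)
                   WHEN col2 THEN NULL  -- next row continues the run
                   ELSE ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col3)
               END AS change
        FROM a
    ) AS marked
    WHERE change IS NOT NULL
    ORDER BY col1, change
"""

def run_lengths(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE a (col1 TEXT, col2 TEXT, col3 REAL)")
    con.executemany("INSERT INTO a VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(run_lengths(ROWS))
# [('B', 'BUY', 1), ('B', 'SELL', 2), ('C', 'BUY', 1), ('C', 'SELL', 1), ('C', 'BUY', 1)]
```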
If I understand correctly, you can do this with a difference of row numbers approach:
select col1, col2, count(*)
from (select t.*,
row_number() over (partition by col1 order by col3) as seqnum,
row_number() over (partition by col1, col2 order by col3) as seqnum_2
from t
) t
group by col1, col2, (seqnum - seqnum_2);
This identifies groups of adjacent col2 values based on the ordering in col3.
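A sketch of the difference-of-row-numbers query in SQLite 3.25+ from Python, using the sample table (runs_by_row_number_gap is a hypothetical helper). Within a run of identical col2 values both row numbers increase by one per row, so their difference stays constant and labels the run; ORDER BY MIN(col3) is added only to list the runs in their original order.

```python
import sqlite3

# Sample data from "Table A": (col1, col2, col3).
ROWS = [("B", "BUY", 1), ("B", "SELL", 1.2), ("B", "SELL", 2),
        ("C", "BUY", 3), ("C", "SELL", 4), ("C", "BUY", 5)]

QUERY = """
    SELECT col1, col2, COUNT(*) AS cnt
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col3) AS seqnum,
                 ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3) AS seqnum_2
          FROM t) AS numbered
    GROUP BY col1, col2, (seqnum - seqnum_2)
    ORDER BY col1, MIN(col3)
"""

def runs_by_row_number_gap(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (col1 TEXT, col2 TEXT, col3 REAL)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(runs_by_row_number_gap(ROWS))
# [('B', 'BUY', 1), ('B', 'SELL', 2), ('C', 'BUY', 1), ('C', 'SELL', 1), ('C', 'BUY', 1)]
```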

SQL Server: get the max of col3 where col2 = 1, or the max of col2 where col3 = 1

I have output from part of my stored procedure like this:
col1 col2 col3 col4
--------------------------
2016-05-05 1 2 2
2016-05-05 1 3 32
2016-05-12 2 1 11
2016-05-12 3 1 31
Now I need to get the result based on this condition:
(col2 = 1 and col3 = max) or (col3 = 1 and col2 = max)
The final result should be
col1 col2 col3 col4
-------------------------
2016-05-05 1 3 32
2016-05-12 3 1 31
Not sure if that's the most efficient way, but you can use ROW_NUMBER():
SELECT * FROM (
SELECT t.*,
ROW_NUMBER() OVER(PARTITION BY t.col1 ORDER BY t.col3 DESC) as rnk
FROM your_table t
WHERE t.col2 = 1
UNION ALL
SELECT t.*,
ROW_NUMBER() OVER(PARTITION BY t.col1 ORDER BY t.col2 DESC) as rnk
FROM your_table t
WHERE t.col3 = 1) tt
WHERE rnk = 1
This will give you all the records with
(col2=1 and col3=max) or (col3=1 and col2=max)
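Run through SQLite 3.25+ from Python (standing in for SQL Server; your_table is a placeholder name and rows_matching_condition a hypothetical helper), the ROW_NUMBER()/UNION ALL approach reproduces the expected result. Because WHERE is applied before the window function, each branch numbers only the rows that already satisfy its "= 1" condition.

```python
import sqlite3

# Output rows from the question: (col1, col2, col3, col4).
ROWS = [("2016-05-05", 1, 2, 2), ("2016-05-05", 1, 3, 32),
        ("2016-05-12", 2, 1, 11), ("2016-05-12", 3, 1, 31)]

QUERY = """
    SELECT col1, col2, col3, col4 FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY t.col1 ORDER BY t.col3 DESC) AS rnk
        FROM your_table t
        WHERE t.col2 = 1
        UNION ALL
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY t.col1 ORDER BY t.col2 DESC) AS rnk
        FROM your_table t
        WHERE t.col3 = 1
    ) tt
    WHERE rnk = 1
    ORDER BY col1
"""

def rows_matching_condition(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE your_table (col1 TEXT, col2 INTEGER, col3 INTEGER, col4 INTEGER)")
    con.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(rows_matching_condition(ROWS))  # [('2016-05-05', 1, 3, 32), ('2016-05-12', 3, 1, 31)]
```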
This is a bit tricky. Your sample data has no ambiguities, such as duplicate maxima in col4 or rows with "1" in both col2 and col3.
The following is a direct translation of the logic in your question:
select t.*
from t
where t.col4 = (select max(t2.col4)
from t t2
where t2.col1 = t.col1 and (t2.col2 = 1 or t2.col3 = 1)
);
Try this. Note that if more than one row shares the maximum value, all of them are returned. It also works in all scenarios, even when col1 is not in sync with col2 and col3.
I first rank col2 and col3 so that the highest value of each gets rank 1. Then, in the outer query, I apply your condition. The demo was created for PostgreSQL because SQL Server wasn't available.
select col1,col2,col3,col4
from
(
select t.*,
RANK() OVER(ORDER BY col3 DESC) as col3_max,
RANK() OVER(ORDER BY col2 DESC) as col2_max
from your_table t
) t1
where
(col2=1 and col3_max=1)
OR
(col3=1 and col2_max=1)
Alternative way:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY iif(col2 = 1, col3, col2) DESC) as r
FROM tbl) t
WHERE r = 1
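In SQL Server, iif(a, b, c) is shorthand for CASE WHEN a THEN b ELSE c END, so the sort key can also be written portably. A sketch in SQLite 3.25+ from Python, using the sample rows, the table name tbl from the answer, and a hypothetical helper name; per col1 it keeps the row with the largest col3 when col2 = 1, and otherwise the row with the largest col2.

```python
import sqlite3

# Output rows from the question: (col1, col2, col3, col4).
ROWS = [("2016-05-05", 1, 2, 2), ("2016-05-05", 1, 3, 32),
        ("2016-05-12", 2, 1, 11), ("2016-05-12", 3, 1, 31)]

QUERY = """
    SELECT col1, col2, col3, col4 FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY col1
                   -- portable spelling of iif(col2 = 1, col3, col2)
                   ORDER BY CASE WHEN col2 = 1 THEN col3 ELSE col2 END DESC
               ) AS r
        FROM tbl t
    ) ranked
    WHERE r = 1
    ORDER BY col1
"""

def top_row_by_conditional_key(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (col1 TEXT, col2 INTEGER, col3 INTEGER, col4 INTEGER)")
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(top_row_by_conditional_key(ROWS))  # [('2016-05-05', 1, 3, 32), ('2016-05-12', 3, 1, 31)]
```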

Getting the value of a non-grouped column

I know the basics of SQL programming and some tricks in SQL Server for getting a result set, but I don't know all the tricks in Oracle.
I have these columns:
col1 col2 col3
And I wrote this query
SELECT
col1, MAX(col3) AS mx3
FROM
myTable
GROUP BY
col1
I need to get the value of col2 from the same row where I found the max value of col3. Do you know a trick to solve this problem?
The easiest way to do this, IMHO, is not to use max, but the window function rank:
SELECT col1 , col2, col3
FROM (SELECT col1, col2, col3,
RANK() OVER (PARTITION BY col1 ORDER BY col3 DESC) rk
FROM myTable) t
WHERE rk = 1
BTW, the same syntax should also work in MS SQL Server and most other modern databases; MySQL before 8.0, which lacks window functions, is the notable exception.
A couple of different ways to do this:
In both cases I'm treating your initial query as either a common table expression or as an inline view and joining it back to the base table to get your added column. The trick here is that the INNER JOIN eliminates all the records not in your max query.
SELECT A.*
FROM myTable A
INNER JOIN (SELECT col1 , MAX( col3 ) AS mx3 FROM myTable GROUP BY col1) B
on A.Col1=B.Col1
and B.mx3 = A.Col3
or
with CTE AS (SELECT col1 , MAX( col3 ) AS mx3 FROM myTable GROUP BY col1)
SELECT A.*
FROM MyTable A
INNER JOIN CTE
on A.col1 = CTE.col1
and A.col3= cte.mx3
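A sketch of the CTE join in SQLite from Python, reusing the sample rows from the test data in the next answer (max_col3_rows is a hypothetical helper). When the maximum col3 is tied within a group, the join returns every tied row, so col1 = 1 yields two rows here.

```python
import sqlite3

# Sample rows (col1, col2, col3), as in the next answer's test data.
ROWS = [(1, 1, 1), (1, 2, 2), (1, 1, 3), (1, 3, 3),
        (2, 10, 1), (2, 23, 2), (2, 12, 2)]

QUERY = """
    WITH cte AS (SELECT col1, MAX(col3) AS mx3 FROM myTable GROUP BY col1)
    SELECT a.col1, a.col2, a.col3
    FROM myTable a
    INNER JOIN cte
        ON a.col1 = cte.col1
       AND a.col3 = cte.mx3
    ORDER BY a.col1, a.col2
"""

def max_col3_rows(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE myTable (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
    con.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(max_col3_rows(ROWS))  # [(1, 1, 3), (1, 3, 3), (2, 12, 2), (2, 23, 2)]
```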
Here's an alternative that's just a slight extension of your existing GROUP BY query (i.e., it doesn't require querying the same table more than once):
with mytable as (select 1 col1, 1 col2, 1 col3 from dual union all
select 1 col1, 2 col2, 2 col3 from dual union all
select 1 col1, 1 col2, 3 col3 from dual union all
select 1 col1, 3 col2, 3 col3 from dual union all
select 2 col1, 10 col2, 1 col3 from dual union all
select 2 col1, 23 col2, 2 col3 from dual union all
select 2 col1, 12 col2, 2 col3 from dual)
SELECT
col1,
MAX(col2) keep (dense_rank first order by col3 desc) mx2,
MAX(col3) AS mx3
FROM
myTable
GROUP BY
col1;
COL1 MX2 MX3
---------- ---------- ----------
1 3 3
2 23 2
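To make the KEEP semantics concrete, here is a plain-Python emulation (not Oracle) of MAX(col2) KEEP (DENSE_RANK FIRST ORDER BY col3 DESC) alongside MAX(col3); keep_dense_rank_first is a hypothetical helper. Among the rows that share the top col3 in each group it takes the largest col2, which is how ties are broken and why one row per col1 comes back, unlike the join approach.

```python
# Same rows as the WITH clause above: (col1, col2, col3).
ROWS = [(1, 1, 1), (1, 2, 2), (1, 1, 3), (1, 3, 3),
        (2, 10, 1), (2, 23, 2), (2, 12, 2)]

def keep_dense_rank_first(rows):
    """Per col1: (col1, MAX(col2) among rows with the top col3, MAX(col3))."""
    mx3 = {}
    for col1, _, col3 in rows:
        mx3[col1] = max(col3, mx3.get(col1, col3))
    mx2 = {}
    for col1, col2, col3 in rows:
        if col3 == mx3[col1]:  # row belongs to the "first" dense rank
            mx2[col1] = max(col2, mx2.get(col1, col2))
    return [(c, mx2[c], mx3[c]) for c in sorted(mx3)]

print(keep_dense_rank_first(ROWS))  # [(1, 3, 3), (2, 23, 2)]
```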

Oracle equivalent of Postgres' DISTINCT ON?

In Postgres, you can query for the first value in a group with DISTINCT ON. How can this be achieved in Oracle?
From the postgres manual:
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of
each set of rows where the given expressions evaluate to equal. The
DISTINCT ON expressions are interpreted using the same rules as for
ORDER BY (see above). Note that the "first row" of each set is
unpredictable unless ORDER BY is used to ensure that the desired row
appears first.
For example, for a given table:
col1 | col2
------+------
A | AB
A | AD
A | BC
B | AN
B | BA
C | AC
C | CC
Ascending sort:
> select distinct on(col1) col1, col2 from tmp order by col1, col2 asc;
col1 | col2
------+------
A | AB
B | AN
C | AC
Descending sort:
> select distinct on(col1) col1, col2 from tmp order by col1, col2 desc;
col1 | col2
------+------
A | BC
B | BA
C | CC
The same effect can be replicated in Oracle either by using the first_value() function or by using one of the rank() or row_number() functions.
Both variants also work in Postgres.
first_value()
select distinct col1,
first_value(col2) over (partition by col1 order by col2 asc)
from tmp
first_value gives the first value for the partition, but repeats it for each row, so it is necessary to use it in combination with distinct to get a single row for each partition.
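The first_value() variant can be tried in SQLite 3.25+ from Python with the tmp table from the question; first_per_group is a hypothetical helper, and the alias first_col2 is chosen to avoid clashing with the col2 column. It reproduces both the ascending and descending DISTINCT ON results shown above.

```python
import sqlite3

# The tmp table from the question: (col1, col2).
ROWS = [("A", "AB"), ("A", "AD"), ("A", "BC"), ("B", "AN"),
        ("B", "BA"), ("C", "AC"), ("C", "CC")]

def first_per_group(rows, direction="ASC"):
    """Emulate DISTINCT ON (col1) with DISTINCT + FIRST_VALUE()."""
    if direction not in ("ASC", "DESC"):  # only splice known keywords into SQL
        raise ValueError("direction must be 'ASC' or 'DESC'")
    query = f"""
        SELECT DISTINCT col1,
               FIRST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col2 {direction})
                   AS first_col2
        FROM tmp
        ORDER BY col1
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tmp (col1 TEXT, col2 TEXT)")
    con.executemany("INSERT INTO tmp VALUES (?, ?)", rows)
    result = con.execute(query).fetchall()
    con.close()
    return result

print(first_per_group(ROWS, "ASC"))   # [('A', 'AB'), ('B', 'AN'), ('C', 'AC')]
print(first_per_group(ROWS, "DESC"))  # [('A', 'BC'), ('B', 'BA'), ('C', 'CC')]
```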
row_number() / rank()
select col1, col2 from (
select col1, col2,
row_number() over (partition by col1 order by col2 asc) as rownumber
from tmp
) foo
where rownumber = 1
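Replacing row_number() with rank() in this example yields the same result, because col2 values are unique within each col1 group; with ties, rank() can return more than one row per group.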
A feature of this variant is that it can be used to fetch the first N rows for a given partition (e.g. "last 3 updated") simply by changing rownumber = 1 to rownumber <= N.
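A sketch of the first-N-per-group variant in SQLite 3.25+ from Python, binding N as a query parameter; first_n_per_group is a hypothetical helper and the tmp rows come from the question.

```python
import sqlite3

# The tmp table from the question: (col1, col2).
ROWS = [("A", "AB"), ("A", "AD"), ("A", "BC"), ("B", "AN"),
        ("B", "BA"), ("C", "AC"), ("C", "CC")]

QUERY = """
    SELECT col1, col2 FROM (
        SELECT col1, col2,
               ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 ASC) AS rownumber
        FROM tmp
    ) foo
    WHERE rownumber <= ?  -- keep the first N rows of each partition
    ORDER BY col1, col2
"""

def first_n_per_group(rows, n):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tmp (col1 TEXT, col2 TEXT)")
    con.executemany("INSERT INTO tmp VALUES (?, ?)", rows)
    result = con.execute(QUERY, (n,)).fetchall()
    con.close()
    return result

print(first_n_per_group(ROWS, 1))  # [('A', 'AB'), ('B', 'AN'), ('C', 'AC')]
print(first_n_per_group(ROWS, 2))
# [('A', 'AB'), ('A', 'AD'), ('B', 'AN'), ('B', 'BA'), ('C', 'AC'), ('C', 'CC')]
```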
If you have more than two fields, use beerbajay's answer as a subquery (note the DESC order), matching on both col1 and col2 so that a col2 value that comes first in one group cannot pull in rows from another group:
select col1,col2, col3,col4 from tmp where (col1, col2) in
(
select distinct
col1, first_value(col2) over (partition by col1 order by col2 DESC) as col2
from tmp
--WHERE you decide conditions
)
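Matching on col2 alone can pick up rows from other col1 groups that happen to contain a group's first value. The sketch below compares both filters in SQLite from Python (window functions need 3.25+, row-value IN needs 3.15+), with hypothetical rows in which 'BC' is group A's DESC-first value but also appears in group B; the helper and constant names are illustrative.

```python
import sqlite3

# Hypothetical rows (col1, col2, col3).
ROWS = [("A", "BC", "x"), ("A", "AB", "y"), ("B", "BC", "z"), ("B", "CA", "w")]

COL2_ONLY = """
    SELECT * FROM tmp WHERE col2 IN (
        SELECT DISTINCT FIRST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col2 DESC)
        FROM tmp
    ) ORDER BY col1, col2
"""

PAIRS = """
    SELECT * FROM tmp WHERE (col1, col2) IN (
        SELECT DISTINCT col1,
               FIRST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col2 DESC)
        FROM tmp
    ) ORDER BY col1, col2
"""

def run(query, rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tmp (col1 TEXT, col2 TEXT, col3 TEXT)")
    con.executemany("INSERT INTO tmp VALUES (?, ?, ?)", rows)
    result = con.execute(query).fetchall()
    con.close()
    return result

print(run(COL2_ONLY, ROWS))  # [('A', 'BC', 'x'), ('B', 'BC', 'z'), ('B', 'CA', 'w')]
print(run(PAIRS, ROWS))      # [('A', 'BC', 'x'), ('B', 'CA', 'w')]
```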