BigQuery: reproducing a query that uses another table in a subselect - sql

I am stuck trying to reproduce in BigQuery a query similar to the following MSSQL one:
SELECT
COL1,
COL2, COL3,
CASE
WHEN ( COL1 % 2 ) = 0 THEN COL2
ELSE (SELECT TOP 1 COL99 FROM ANOTHER_TABLE AS AT WHERE AT.COL8 = T.COL2 AND AT.COL9 < T.COL3 ORDER BY AT.COL9 DESC)
END AS COL4
FROM TABLE AS T
First, I tried to reproduce the query in BigQuery as follows:
SELECT
COL1,
COL2, COL3,
CASE
WHEN ( COL1 % 2 ) = 0 THEN COL2
ELSE (SELECT COL99 FROM PROJECT.DATASET.ANOTHER_TABLE AS AT WHERE AT.COL8 = T.COL2 AND AT.COL9 < T.COL3 ORDER BY AT.COL9 DESC LIMIT 1)
END AS COL4
FROM PROJECT.DATASET.TABLE AS T
But it fails with this error:
Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.
I understand this error, and I agree that the original query is not well optimized, since the subselect may be executed for every row in the table.
Knowing that, I tried the following, which doesn't raise an error but gives wrong results (too many rows):
SELECT
COL1,
COL2, COL3,
CASE
WHEN ( COL1 % 2 ) = 0 THEN COL2
ELSE AT.COL99
END AS COL4
FROM PROJECT.DATASET.TABLE AS T
LEFT JOIN (
SELECT * FROM (
SELECT
COL99,
COL8,
COL9,
ROW_NUMBER() OVER (PARTITION BY COL8 ORDER BY COL9 DESC) AS rn
FROM PROJECT.DATASET.ANOTHER_TABLE
) AS TMP
/*WHERE TMP.rn = 1*/
) AS AT
ON AT.COL8 = T.COL2
AND AT.COL9 < T.COL3
This query returns more rows than expected, which is normal given the condition "AND AT.COL9 < T.COL3", but I can't figure out how to take the minimum ROW_NUMBER value (rn) to reproduce the TOP 1 of the original query.
I tried putting TMP.rn = 1 in the AT subquery, but the problem is that the first row is not always the one that satisfies the condition AND AT.COL9 < T.COL3.
To summarize, my goal is to reproduce the first query at the top of this question in BigQuery. I've tried something, but I am stuck on how to take the minimum value of ROW_NUMBER (rn) among the rows matching the condition AND AT.COL9 < T.COL3.
Has anyone had a similar use case by any chance?
Edit: adding input and expected output:
TABLE AS T:
COL1   COL2   COL3
1234   AAA    25/12/2022
1235   BBB    25/12/2022
1236   CCC    25/12/2022
1237   AAA    24/12/2022
1238   AAA    23/12/2022
1239   AAA    22/12/2022
ANOTHER_TABLE AS AT:
COL99  COL8   COL9
1111   AAA    25/12/2022
2222   BBB    25/12/2022
3333   CCC    25/12/2022
9999   AAA    23/12/2022
8888   AAA    22/12/2022
7777   AAA    21/12/2022
Expected output:
COL1   COL2   COL3         COL4
1234   AAA    25/12/2022   AAA
1235   BBB    25/12/2022   NULL
1236   CCC    25/12/2022   CCC
1237   AAA    24/12/2022   9999
1238   AAA    23/12/2022   AAA
1239   AAA    22/12/2022   7777

You can use FIRST_VALUE() window function:
SELECT DISTINCT T.COL1, T.COL2, T.COL3,
CASE
WHEN T.COL1 % 2 = 0 THEN T.COL2
ELSE FIRST_VALUE(AT.COL99) OVER (PARTITION BY T.COL1, T.COL2, T.COL3 ORDER BY AT.COL9 DESC)
END AS COL4
FROM FIRST_TABLE AS T LEFT JOIN ANOTHER_TABLE AS AT
ON AT.COL8 = T.COL2 AND AT.COL9 < T.COL3 AND T.COL1 % 2 <> 0;
If COL1 is unique in the first table, you can simplify the PARTITION BY clause to:
OVER (PARTITION BY T.COL1 ORDER BY AT.COL9 DESC)
See the demo (for MySql but it is standard SQL).
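As a quick sanity check, the FIRST_VALUE() query can be run against the sample data from the question. This is only a sketch using Python's built-in sqlite3 module (SQLite 3.25 or later for window functions), so it confirms the logic rather than BigQuery's behavior. Assumptions on my side: dates are stored as ISO strings so that `<` compares chronologically, COL99 is stored as text so both CASE branches have the same type, the COL1 value 1237 is taken from the expected output, and the alias A is used instead of AT (I believe AT is a reserved keyword in BigQuery, so a different alias may be needed there anyway).

```python
import sqlite3

# In-memory SQLite stand-in for the two tables; this is not BigQuery.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE FIRST_TABLE (COL1 INTEGER, COL2 TEXT, COL3 TEXT);
CREATE TABLE ANOTHER_TABLE (COL99 TEXT, COL8 TEXT, COL9 TEXT);
INSERT INTO FIRST_TABLE VALUES
  (1234, 'AAA', '2022-12-25'), (1235, 'BBB', '2022-12-25'),
  (1236, 'CCC', '2022-12-25'), (1237, 'AAA', '2022-12-24'),
  (1238, 'AAA', '2022-12-23'), (1239, 'AAA', '2022-12-22');
INSERT INTO ANOTHER_TABLE VALUES
  ('1111', 'AAA', '2022-12-25'), ('2222', 'BBB', '2022-12-25'),
  ('3333', 'CCC', '2022-12-25'), ('9999', 'AAA', '2022-12-23'),
  ('8888', 'AAA', '2022-12-22'), ('7777', 'AAA', '2022-12-21');
""")

# Same shape as the answer's query, partitioned by COL1 (assumed unique).
rows = con.execute("""
SELECT DISTINCT T.COL1, T.COL2, T.COL3,
  CASE
    WHEN T.COL1 % 2 = 0 THEN T.COL2
    ELSE FIRST_VALUE(A.COL99) OVER (PARTITION BY T.COL1 ORDER BY A.COL9 DESC)
  END AS COL4
FROM FIRST_TABLE AS T
LEFT JOIN ANOTHER_TABLE AS A
  ON A.COL8 = T.COL2 AND A.COL9 < T.COL3 AND T.COL1 % 2 <> 0
ORDER BY T.COL1
""").fetchall()

for row in rows:
    print(row)
```

On this data COL4 comes out as AAA, NULL, CCC, 9999, AAA, 7777, which matches the expected output in the question.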

The query provided by @forpas returns good results on my example but does not return the result I expect in my real use case.
But @forpas's idea inspired me and I found a way to solve my problem.
It gives the same result in the demo linked by @forpas, and the query looks like this in MySQL:
SELECT T.COL1, T.COL2, T.COL3,
CASE
WHEN T.COL1 % 2 = 0 THEN T.COL2
ELSE AT1.COL99
END AS COL4
FROM FIRST_TABLE AS T
LEFT JOIN (
SELECT * FROM (
SELECT
AT.COL99,
T.COL2,
T.COL3,
ROW_NUMBER() OVER (PARTITION BY T.COL3, T.COL2, AT.COL8 ORDER BY AT.COL9 DESC) AS COUNTER
FROM ANOTHER_TABLE AS AT
INNER JOIN FIRST_TABLE AS T
ON AT.COL8 = T.COL2 AND AT.COL9 < T.COL3) TEMP
WHERE TEMP.COUNTER = 1
) AS AT1
ON AT1.COL2 = T.COL2 AND AT1.COL3 = T.COL3 ;
The query might be more complex than necessary, and if someone has something more optimized I would be happy to try it.
Thank you @forpas for the proposal!
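One possibly simpler variant, offered only as a sketch: join the two tables once, number the matches per row of the first table with ROW_NUMBER(), and keep the first one. It assumes COL1 uniquely identifies a row of the first table, which the question does not confirm. It is checked here with Python's sqlite3 module (SQLite 3.25 or later), not BigQuery, with dates stored as ISO strings, COL99 stored as text, and the alias A instead of AT.

```python
import sqlite3

# In-memory SQLite stand-in for the sample data; this is not BigQuery.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE FIRST_TABLE (COL1 INTEGER, COL2 TEXT, COL3 TEXT);
CREATE TABLE ANOTHER_TABLE (COL99 TEXT, COL8 TEXT, COL9 TEXT);
INSERT INTO FIRST_TABLE VALUES
  (1234, 'AAA', '2022-12-25'), (1235, 'BBB', '2022-12-25'),
  (1236, 'CCC', '2022-12-25'), (1237, 'AAA', '2022-12-24'),
  (1238, 'AAA', '2022-12-23'), (1239, 'AAA', '2022-12-22');
INSERT INTO ANOTHER_TABLE VALUES
  ('1111', 'AAA', '2022-12-25'), ('2222', 'BBB', '2022-12-25'),
  ('3333', 'CCC', '2022-12-25'), ('9999', 'AAA', '2022-12-23'),
  ('8888', 'AAA', '2022-12-22'), ('7777', 'AAA', '2022-12-21');
""")

# rn = 1 marks, for each row of T, the match with the latest COL9 before T.COL3.
# Rows of T without any match still get one row (with NULLs) from the LEFT JOIN.
rows = con.execute("""
SELECT COL1, COL2, COL3, COL4
FROM (
  SELECT T.COL1, T.COL2, T.COL3,
    CASE WHEN T.COL1 % 2 = 0 THEN T.COL2 ELSE A.COL99 END AS COL4,
    ROW_NUMBER() OVER (PARTITION BY T.COL1 ORDER BY A.COL9 DESC) AS rn
  FROM FIRST_TABLE AS T
  LEFT JOIN ANOTHER_TABLE AS A
    ON A.COL8 = T.COL2 AND A.COL9 < T.COL3
) AS ranked
WHERE rn = 1
ORDER BY COL1
""").fetchall()

for row in rows:
    print(row)
```

Partitioning by the key of the first table, rather than by COL8 alone, is what lets rn = 1 mean "the best match for this particular row" instead of "the latest row for this COL8 overall".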

Related

How to get Distinct value for a column on the basis of other column in Oracle

I want to get the distinct values from COL1 along with the corresponding COL3 value. The condition is: if COL1 = COL2, it should pick the matching COL3 value; otherwise it should pick the COL3 value from the COL1 row where they are not the same. I'm stuck on the logic, any help will be appreciated!
Please see the image below for more detail:
select DISTINCT COL1,
CASE WHEN COL1 = COL2 THEN COL3 END COL3 from TABLE1
WHERE COL1 IS NOT NULL;
Do a GROUP BY to get distinct COL1 values.
Use COALESCE() to return the COL3 value if there exists a COL1 = COL2 row, otherwise return the max COL3 value for the COL1. (Could use MIN() too, if that's better.)
select COL1,
COALESCE( MAX(CASE WHEN COL1 = COL2 THEN COL3 END), MAX(COL3) )
FROM table1
WHERE COL1 IS NOT NULL
GROUP BY COL1
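To illustrate how the COALESCE() fallback behaves, here is a small sketch run through Python's sqlite3 module instead of Oracle. The rows are hypothetical, invented only for this illustration, since the question's sample data was in an image.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE TABLE1 (COL1 INTEGER, COL2 INTEGER, COL3 TEXT);
-- Hypothetical rows, invented for illustration only.
INSERT INTO TABLE1 VALUES
  (11, 11, 'ABC'),    -- COL1 = COL2, so this COL3 should win for 11
  (11, 12, 'XYZ'),
  (13, 14, 'BDG'),    -- no COL1 = COL2 row for 13, so MAX(COL3) is used
  (NULL, 15, 'ZZZ');  -- removed by the COL1 IS NOT NULL filter
""")

# The CASE yields NULL for non-matching rows, and MAX() ignores NULLs,
# so COALESCE only falls back to MAX(COL3) when no row has COL1 = COL2.
rows = con.execute("""
SELECT COL1,
       COALESCE(MAX(CASE WHEN COL1 = COL2 THEN COL3 END), MAX(COL3)) AS COL3
FROM TABLE1
WHERE COL1 IS NOT NULL
GROUP BY COL1
ORDER BY COL1
""").fetchall()

print(rows)
```

For 11 the result is 'ABC' even though 'XYZ' is the larger string, because the matching row takes priority over the fallback.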
Use a correlated subquery:
select col1,col3
from TABLE1 a
where col2 in (select min(col2) from table1 b where a.col1=b.col1)
select distinct COL1, if(COL1 = COL2, COL3, COL1) as result
from table1
I think you can join the table with itself, use the join condition to keep only the rows where COL2 = COL1, and then decide in the SELECT which COL3 to return:
SELECT DISTINCT a.COL1, CASE WHEN b.COL1 IS NULL THEN a.COL3 ELSE b.COL3 END as COL3
FROM TABLE1 a
LEFT JOIN TABLE1 b
on a.COL1 = b.COL2
and a.COL1 = b.COL1
This way table a holds all the data, and table b holds data if and only if COL1 matches COL2. Then you select whichever COL3 is not null, preferably the one from table b. Oracle's COALESCE function does just that.
With a self join:
select distinct
t.col1,
case
when tt.col1 is null then t.col3
else tt.col3
end col3
from tablename t left join tablename tt
on tt.col1 = t.col1 and tt.col2 = t.col1
See the demo.
Results:
> COL1 | COL3
> ---: | :---
> 11 | ABC
> 12 | ABC
> 13 | BDG
> 14 | DEF
> 15 | CEG

How to get previous row data in sql server

I would like to get the data from previous row. I have used LAG function but did not get the expected result.
Table:-
col1 col2 col3
ABCD 1 Y
ABCD 2 N
ABCD 3 N
EFGH 4 N
EFGH 5 Y
EFGH 6 N
XXXX 7 Y
Expected result
col1 col2 col3 col4
ABCD 1 A NULL
ABCD 2 B A
ABCD 3 C B
EFGH 4 A NULL
EFGH 5 B A
EFGH 6 E B
XXXX 7 F NULL
Col4 should hold the data from previous row grouping by the value in Col1.
Please let me know how this can be achieved.
Use lag() function
select *, lag(col3) over (partition by col1 order by col2) as col4
from table t;
However, you can also use a subquery if your SQL dialect doesn't have LAG():
select *,
(select top 1 col3
from table
where col1 = t.col1 and col2 < t.col2
order by col2 desc
) as col4
from table t;
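The two forms above should give the same col4. A minimal sketch to check that on the question's input table, using Python's sqlite3 module rather than SQL Server (so LIMIT 1 stands in for TOP 1, and the table name samples is my own placeholder):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE samples (col1 TEXT, col2 INTEGER, col3 TEXT);
INSERT INTO samples VALUES
  ('ABCD', 1, 'Y'), ('ABCD', 2, 'N'), ('ABCD', 3, 'N'),
  ('EFGH', 4, 'N'), ('EFGH', 5, 'Y'), ('EFGH', 6, 'N'),
  ('XXXX', 7, 'Y');
""")

# Window-function form: previous col3 within the same col1 group.
with_lag = con.execute("""
SELECT col1, col2, col3,
       LAG(col3) OVER (PARTITION BY col1 ORDER BY col2) AS col4
FROM samples
ORDER BY col2
""").fetchall()

# Correlated-subquery form: latest earlier row in the same col1 group.
with_subquery = con.execute("""
SELECT s.col1, s.col2, s.col3,
       (SELECT p.col3
        FROM samples AS p
        WHERE p.col1 = s.col1 AND p.col2 < s.col2
        ORDER BY p.col2 DESC
        LIMIT 1) AS col4
FROM samples AS s
ORDER BY s.col2
""").fetchall()

print(with_lag == with_subquery)
for row in with_lag:
    print(row)
```

The first row of each col1 group gets NULL in col4, which is the effect of PARTITION BY col1; without it, LAG() would carry values across groups.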
Assuming SQL Server 2012 or newer...
SELECT
*,
LAG(col3) OVER (PARTITION BY col1 ORDER BY col2) AS col4
FROM
yourTable
If you're on SQL Server 2008 or older...
SELECT
*,
(
SELECT TOP(1) previous.col3
FROM yourTable AS previous
WHERE previous.col1 = yourTable.col1
AND previous.col2 < yourTable.col2
ORDER BY previous.col2 DESC
)
AS col4
FROM
yourTable
If you are on 2008 or earlier, try this:
select t1.col1, t1.col2, t1.col3, t2.col3 as col4
from table1 t1
left join table1 t2 on t1.col1 = t2.col1 and t1.col2 - 1 = t2.col2
Note that this join assumes col2 increases by exactly 1 from one row to the next. The lag() function is the bee's knees, though. Use that, if you can.
Thank you all for the replies. By using the lag function with a partition I got the expected result. I had left out the partition previously, and because of that I was getting wrong results.

SQL Server : get max of the column2 and column3 value must be 1

I have an output from part of my stored procedure like this:
col1 col2 col3 col4
--------------------------
2016-05-05 1 2 2
2016-05-05 1 3 32
2016-05-12 2 1 11
2016-05-12 3 1 31
Now I need to get the result based on this condition:
(col2 = 1 and col3 = max) or (col3 = 1 and col2 = max)
The final result should be
col1 col2 col3 col4
-------------------------
2016-05-05 1 3 32
2016-05-12 3 1 31
Not sure if that's the most efficient way, but you can use ROW_NUMBER():
SELECT * FROM (
SELECT t.*,
ROW_NUMBER() OVER(PARTITION BY t.col1 ORDER BY t.col3 DESC) as rnk
FROM YourTable t -- YourTable is a placeholder for the procedure's output
WHERE t.col2 = 1
UNION ALL
SELECT t.*,
ROW_NUMBER() OVER(PARTITION BY t.col1 ORDER BY t.col2 DESC) as rnk
FROM YourTable t
WHERE t.col3 = 1) tt
WHERE rnk = 1
This will give you all the records with
(col2=1 and col3=max) or (col3=1 and col2=max)
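A runnable check of the UNION ALL approach on the question's four rows, with a FROM clause in each branch. It is only a sketch: it uses Python's sqlite3 module instead of SQL Server, and the table name results is a placeholder I made up for the stored procedure's output.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE results (col1 TEXT, col2 INTEGER, col3 INTEGER, col4 INTEGER);
INSERT INTO results VALUES
  ('2016-05-05', 1, 2, 2), ('2016-05-05', 1, 3, 32),
  ('2016-05-12', 2, 1, 11), ('2016-05-12', 3, 1, 31);
""")

# Branch 1 ranks the col2 = 1 rows by col3, branch 2 ranks the col3 = 1 rows
# by col2; rnk = 1 then keeps the top row of each branch per col1.
rows = con.execute("""
SELECT col1, col2, col3, col4
FROM (
  SELECT t.*,
         ROW_NUMBER() OVER (PARTITION BY t.col1 ORDER BY t.col3 DESC) AS rnk
  FROM results AS t
  WHERE t.col2 = 1
  UNION ALL
  SELECT t.*,
         ROW_NUMBER() OVER (PARTITION BY t.col1 ORDER BY t.col2 DESC) AS rnk
  FROM results AS t
  WHERE t.col3 = 1
) AS tt
WHERE rnk = 1
ORDER BY col1
""").fetchall()

for row in rows:
    print(row)
```

One caveat that the sample data does not exercise: a row with col2 = 1 and col3 = 1 belongs to both branches and could be returned twice.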
This is a bit tricky. Your data has no ambiguities, such as duplicate maxima in col4 or "1" values in both col2 and col3.
The following is a direct translation of the logic in your question:
select t.*
from t
where t.col4 = (select max(t2.col4)
from t t2
where t2.col1 = t.col1 and (t2.col2 = 1 or t2.col3 = 1)
);
Try this. Note that if the maximum value occurs more than once, all of those rows are returned. It should work for all scenarios, even when col1 is not in sync with col2 and col3.
I first rank the values of col2 and col3 so that the highest of each gets rank 1. Then, in the outer query, I apply your condition. The demo was created for Postgres because SQL Server wasn't available.
SQLFiddle Demo
select col1,col2,col3,col4
from
(
select t.*,
RANK() OVER(ORDER BY col3 DESC) as col3_max,
RANK() OVER(ORDER BY col2 DESC) as col2_max
from your_table t
) t1
where
(col2=1 and col3_max=1)
OR
(col3=1 and col2_max=1)
Alternative way:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY iif(col2 = 1, col3, col2) DESC) as r
FROM tbl) t
WHERE r = 1

Query to get the rows in a table where value in column 1 is mapped to exactly one value in column 2

I have a table T1 like below:
COL1 COL2
---------------
aaa 10
bbb 20
bbb 20
bbb 10
ccc 30
ccc 30
aaa 30
ddd 30
I want col1 and col2 values where col1 is mapped to only one col2.
COL1 COL2
-----------
ccc 30
ddd 30
Please let me know how to achieve my goal.
I tried with the following to get required result set:
select distinct col1, col2
from t1
where col1 in (select col1
from (select distinct col1, col2 from t1)
group by col1
having count(col2) = 1);
What other options are there that avoid so many inner queries?
Thanks In Advance.
select Col1
, max(Col2) as Col2
from YourTable
group by
Col1
having count(distinct Col2) = 1
The having clause makes sure there's only one Col2 in a single group. You can display it using max, min or even avg.
See it working at SQL Fiddle (thanks to Amit Singh.)
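For reference, a minimal sketch of the HAVING COUNT(DISTINCT ...) query on the question's data, run through Python's sqlite3 module (the question does not say which database is in use, so this only checks the logic):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE T1 (COL1 TEXT, COL2 INTEGER);
INSERT INTO T1 VALUES
  ('aaa', 10), ('bbb', 20), ('bbb', 20), ('bbb', 10),
  ('ccc', 30), ('ccc', 30), ('aaa', 30), ('ddd', 30);
""")

# Groups with exactly one distinct COL2 survive; MAX() just displays that value.
rows = con.execute("""
SELECT COL1, MAX(COL2) AS COL2
FROM T1
GROUP BY COL1
HAVING COUNT(DISTINCT COL2) = 1
ORDER BY COL1
""").fetchall()

print(rows)
```

Duplicate rows such as the two ('ccc', 30) entries collapse into one output row, because the GROUP BY already returns one row per COL1.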
Select Distinct A.Col1,A.Col2 from Table1 A
inner join
(Select Col1,Count(Distinct Col2) as col3 from Table1 group by Col1) B on
A.Col1=B.Col1 and B.Col3=1
Sql Fiddle

SQL Server - Query to return groups with multiple distinct records

My table:
Col1 Col2
1 xyz
1 abc
2 abc
3 yyy
4 zzz
4 zzz
I have a table with two columns. I want to query for records where col1 has more than one DISTINCT col2 values. In the example table given above, the query should return records for col1 with value "1".
Expected query result:
Col1 Col2
1 xyz
1 abc
SELECT *
FROM tableName
WHERE Col1 IN
(
SELECT Col1
FROM tableName
GROUP BY Col1
HAVING COUNT(DISTINCT col2) > 1
)
SQLFiddle Demo
select t.col1, t.col2
from (
select col1
from tbl
group by col1
having MIN(col2) <> MAX(col2)
) x
join tbl t on t.col1 = x.col1
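Both answers can be compared on the question's table. The sketch below uses Python's sqlite3 module rather than SQL Server, names the table tbl for both queries, and joins the second query on the derived table's alias x.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tbl (col1 INTEGER, col2 TEXT);
INSERT INTO tbl VALUES
  (1, 'xyz'), (1, 'abc'), (2, 'abc'), (3, 'yyy'), (4, 'zzz'), (4, 'zzz');
""")

# Variant 1: IN with COUNT(DISTINCT col2) > 1.
with_count = con.execute("""
SELECT col1, col2
FROM tbl
WHERE col1 IN (
  SELECT col1 FROM tbl GROUP BY col1 HAVING COUNT(DISTINCT col2) > 1
)
ORDER BY col2
""").fetchall()

# Variant 2: MIN(col2) <> MAX(col2) means at least two different values.
with_min_max = con.execute("""
SELECT t.col1, t.col2
FROM (
  SELECT col1 FROM tbl GROUP BY col1 HAVING MIN(col2) <> MAX(col2)
) AS x
JOIN tbl AS t ON t.col1 = x.col1
ORDER BY t.col2
""").fetchall()

print(with_count)
print(with_count == with_min_max)
```

Group 4 is excluded by both variants: it has two rows, but only one distinct col2 value.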