Better way to do a correlated query with a COUNT condition in AWS Athena SQL

There are two tables, A and B. Table A has a one-to-many relationship with B.
I want to fetch records from A and the corresponding single record from B (if B has only one record).
If there are multiple records in Table B, then take the first one with status = 'ACTIVE'.
Below is the query, which runs in Oracle, but we want the same functionality in AWS Athena. However, correlated subqueries are not supported in AWS Athena SQL. Athena supports ANSI SQL.
SELECT b.*
FROM A a, B b
WHERE a.instruction_id = b.txn_report_instruction_id
  AND b.txn_report_instruction_id IN
      (SELECT b2.txn_report_instruction_id
       FROM B b2
       WHERE b2.txn_report_instruction_id = b.txn_report_instruction_id
       GROUP BY b2.txn_report_instruction_id
       HAVING COUNT(b2.txn_report_instruction_id) = 1
      )
UNION
SELECT * FROM
  (SELECT b.*
   FROM A a, B b
   WHERE a.instruction_id = b.txn_report_instruction_id
     AND b.txn_report_instruction_id IN
         (SELECT b2.txn_report_instruction_id
          FROM B b2
          WHERE b2.txn_report_instruction_id = b.txn_report_instruction_id
            AND b2.status = 'ACTIVE'
          GROUP BY b2.txn_report_instruction_id
          HAVING COUNT(b2.txn_report_instruction_id) > 1
         )
  )
When using GROUP BY, every selected field must either be listed in the GROUP BY clause or wrapped in an aggregate function, so GROUP BY is not preferable here.
Any help would be much appreciated.
(Image: Output result table)

Joining the best row can be achieved with a lateral join.
select *
from a
outer apply
(
  select *
  from b
  where b.txn_report_instruction_id = a.instruction_id
  order by case when b.status = 'ACTIVE' then 1 else 2 end
  fetch first row only
) bb;
Another option is a window function:
select *
from a
left join
(
  select
    b.*,
    row_number() over (partition by txn_report_instruction_id
                       order by case when status = 'ACTIVE' then 1 else 2 end) as rn
  from b
) bb on bb.txn_report_instruction_id = a.instruction_id and bb.rn = 1;
I don't know Amazon Athena's SQL coverage. All of this is standard SQL, though, except for OUTER APPLY, I think. If I am not mistaken, the SQL standard requires LEFT OUTER JOIN LATERAL (...) ON ... instead, which needs a dummy ON clause such as ON 1 = 1. So if the above queries fail, that is another option for you :-)
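To see the window-function variant end to end, here is a minimal sketch that runs it on SQLite through Python's built-in sqlite3 module (SQLite 3.25 or newer is needed for ROW_NUMBER). SQLite is only a stand-in for Athena here, and the sample rows plus the `id` column on b are hypothetical; `id` is added as a tie-breaker so the chosen row is deterministic when no row is ACTIVE.

```python
import sqlite3

def best_b_rows(conn):
    """One row per A record, preferring the ACTIVE row from B (ROW_NUMBER approach)."""
    query = """
        SELECT a.instruction_id, bb.id, bb.status
        FROM a
        LEFT JOIN (
            SELECT b.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY txn_report_instruction_id
                       -- ACTIVE rows sort first; the hypothetical id breaks ties
                       ORDER BY CASE WHEN status = 'ACTIVE' THEN 1 ELSE 2 END, id
                   ) AS rn
            FROM b
        ) bb
          ON bb.txn_report_instruction_id = a.instruction_id AND bb.rn = 1
        ORDER BY a.instruction_id
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (instruction_id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, txn_report_instruction_id INTEGER, status TEXT);
    INSERT INTO a VALUES (1), (2), (3);
    -- instruction 1: one B row; instruction 2: several rows, one ACTIVE; instruction 3: none
    INSERT INTO b VALUES (10, 1, 'CLOSED'), (20, 2, 'CLOSED'), (21, 2, 'ACTIVE'), (22, 2, 'PENDING');
""")
rows = best_b_rows(conn)
print(rows)  # [(1, 10, 'CLOSED'), (2, 21, 'ACTIVE'), (3, None, None)]
```

Because the filter `bb.rn = 1` sits in the ON clause rather than in WHERE, A records with no B rows still appear with NULLs, as in the single-record and multi-record cases of the question.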

Related

Query Efficiency in Redshift

I have a question about query efficiency in Redshift. I have two sample queries, shown below.
Query A
select a.*, b.*
from a
left outer join b
on a.id=b.id
where a.market_id = 1
and a.dataset_date = to_date('{RUN_DATE_YYYY-MM-DD}', 'YYYY-MM-DD')
and b.market_id = 1
and b.user_group in ('X');
Query B
select a.*, b.*
from (
select *
from a
where marketplace_id = 1
and dataset_date = to_date('{RUN_DATE_YYYY-MM-DD}', 'YYYY-MM-DD')
)a
left outer join
(select *
from b
where market_id = 1
and user_group in ('X')
)b
on a.id=b.id
where b.market_id = 1
and b.user_group in ('X');
I used to think that Query B would be more efficient because the datasets it joins are filtered first and are therefore smaller. But I recently learned that Redshift optimizes each query automatically before running it. If so, Queries A and B should perform about the same, and Query A is easier to understand and maintain.
I'm not sure how Redshift's optimization works here, so I'm posting this question. I'd really appreciate any knowledge you can share!
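Before comparing performance, it helps to confirm that the two queries are logically equivalent. The sketch below runs simplified versions of both on SQLite via Python's sqlite3 module. The sample data is hypothetical, the date is stored as plain text in place of `to_date(...)`, and Query B is assumed to filter on `market_id` like Query A. It says nothing about Redshift's planner; it only shows that both forms return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, market_id INTEGER, dataset_date TEXT);
    CREATE TABLE b (id INTEGER, market_id INTEGER, user_group TEXT);
    INSERT INTO a VALUES (1, 1, '2024-01-01'), (2, 1, '2024-01-01'),
                         (3, 2, '2024-01-01'), (4, 1, '2024-01-02');
    INSERT INTO b VALUES (1, 1, 'X'), (2, 1, 'Y'), (3, 2, 'X'), (4, 1, 'X');
""")

RUN_DATE = "2024-01-01"  # hypothetical value for the {RUN_DATE_YYYY-MM-DD} placeholder

# Query A: filter after the join
query_a = """
    SELECT a.id, b.user_group
    FROM a LEFT OUTER JOIN b ON a.id = b.id
    WHERE a.market_id = 1 AND a.dataset_date = ?
      AND b.market_id = 1 AND b.user_group IN ('X')
    ORDER BY a.id
"""
# Query B: filter both inputs first, then join (and filter again)
query_b = """
    SELECT a.id, b.user_group
    FROM (SELECT * FROM a WHERE market_id = 1 AND dataset_date = ?) a
    LEFT OUTER JOIN (SELECT * FROM b WHERE market_id = 1 AND user_group IN ('X')) b
      ON a.id = b.id
    WHERE b.market_id = 1 AND b.user_group IN ('X')
    ORDER BY a.id
"""
rows_a = conn.execute(query_a, (RUN_DATE,)).fetchall()
rows_b = conn.execute(query_b, (RUN_DATE,)).fetchall()
print(rows_a == rows_b, rows_a)  # True [(1, 'X')]
```

Note that a.id = 2 disappears from both results: the WHERE conditions on b reject the NULL-extended rows, so both LEFT OUTER JOINs behave like inner joins.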

Query left join without all the right rows from B table

I have 2 tables, A and B.
I need all columns from A plus one column from B in my SELECT.
Unfortunately, B has multiple rows (all identical) for each row in A on the join condition.
I tried, but I can't match each row in A to just one row in B with, for example, a LEFT JOIN while keeping my select list.
How can I write this query? It is in Oracle SQL.
Thanks in advance.
This is a good use for OUTER APPLY. The structure of the query looks like this:
select a.*, b.col
from a outer apply
(select b.col
 from b
 where b.? = a.?
 fetch first 1 row only
) b;
Normally, you would only use FETCH FIRST together with ORDER BY. In this case, because the B rows are identical, it doesn't make a difference which row you get. (OUTER APPLY and FETCH FIRST require Oracle 12c or later.)
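When only one column from B is needed, a correlated scalar subquery in the select list gives the same "any one matching value" result. This sketch uses SQLite through Python's sqlite3 module, where OUTER APPLY is not available, so `LIMIT 1` stands in for `fetch first 1 row only`. The table layout (`b.a_id`, `b.col`) and data are hypothetical stand-ins for the `b.? = a.?` condition above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (a_id INTEGER, col TEXT);
    INSERT INTO a VALUES (1, 'first'), (2, 'second'), (3, 'third');
    -- B holds identical duplicate rows per A row; A row 3 has no match at all
    INSERT INTO b VALUES (1, 'x'), (1, 'x'), (1, 'x'), (2, 'y'), (2, 'y');
""")

query = """
    SELECT a.*,
           -- at most one value per A row; NULL when B has no match
           (SELECT b.col FROM b WHERE b.a_id = a.id LIMIT 1) AS col
    FROM a
    ORDER BY a.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'first', 'x'), (2, 'second', 'y'), (3, 'third', None)]
```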
You can group B by its join key in a subquery, using an aggregate (like MAX or MIN) to pick any of the identical B values, and then left join that result to A:
select a.*
, b.min_col1
from TableA a
left join
(
select a_id
, min(col1) as min_col1
from TableB
group by
a_id
) b
on b.a_id = a.id
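A quick way to check that the grouped subquery removes the duplicates is to compare it with a plain LEFT JOIN. This is a minimal sketch on SQLite via Python's sqlite3 module; the extra `name` column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE TableB (a_id INTEGER, col1 TEXT);
    INSERT INTO TableA VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO TableB VALUES (1, 'x'), (1, 'x'), (2, 'y'), (2, 'y'), (2, 'y');
""")

# Grouped subquery: one row per a_id, so the join cannot multiply A rows
grouped = """
    SELECT a.*, b.min_col1
    FROM TableA a
    LEFT JOIN (
        SELECT a_id, MIN(col1) AS min_col1
        FROM TableB
        GROUP BY a_id
    ) b ON b.a_id = a.id
    ORDER BY a.id
"""
rows = conn.execute(grouped).fetchall()

# Naive join for comparison: each A row repeats once per matching B row
naive = conn.execute(
    "SELECT a.*, b.col1 FROM TableA a LEFT JOIN TableB b ON b.a_id = a.id"
).fetchall()

print(rows)        # [(1, 'first', 'x'), (2, 'second', 'y'), (3, 'third', None)]
print(len(naive))  # 6
```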

Efficient way to check if row exists for multiple records in postgres

I saw answers to a related question, but couldn't really apply what they are doing to my specific case.
I have a large table (300k rows) that I need to join efficiently with an even larger table (1-2M rows). For my purposes, I only need to know whether a matching row exists in the second table. I came up with a nested query like so:
SELECT
id,
CASE cnt WHEN 0 then 'NO_MATCH' else 'YES_MATCH' end as match_exists
FROM
(
SELECT
A.id as id, count(*) as cnt
FROM
A, B
WHERE
A.id = B.foreing_id
GROUP BY A.id
) AS id_and_matches_count
Is there a better and/or more efficient way to do it?
Thanks!
You just want a left outer join:
SELECT
A.id as id, count(B.foreing_id) as cnt
FROM A
LEFT OUTER JOIN B ON
A.id = B.foreing_id
GROUP BY A.id
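The difference matters for rows without a match: with the original inner join, those A rows never reach the CASE, so 'NO_MATCH' can never appear. The sketch below runs both versions on SQLite via Python's sqlite3 module (a stand-in for Postgres; the data is hypothetical) and folds the CASE from the question into the LEFT JOIN version. `COUNT(B.foreing_id)` counts only non-NULL values, so unmatched A rows get 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY);
    CREATE TABLE B (foreing_id INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (1), (1), (3);  -- A.id = 2 has no match
""")

# The question's inner-join version
inner_rows = conn.execute("""
    SELECT id, CASE cnt WHEN 0 THEN 'NO_MATCH' ELSE 'YES_MATCH' END AS match_exists
    FROM (SELECT A.id AS id, COUNT(*) AS cnt
          FROM A, B
          WHERE A.id = B.foreing_id
          GROUP BY A.id) AS id_and_matches_count
    ORDER BY id
""").fetchall()

# LEFT OUTER JOIN version: unmatched A rows survive with a count of 0
left_rows = conn.execute("""
    SELECT A.id,
           CASE COUNT(B.foreing_id) WHEN 0 THEN 'NO_MATCH' ELSE 'YES_MATCH' END AS match_exists
    FROM A
    LEFT OUTER JOIN B ON A.id = B.foreing_id
    GROUP BY A.id
    ORDER BY A.id
""").fetchall()

print(inner_rows)  # [(1, 'YES_MATCH'), (3, 'YES_MATCH')]
print(left_rows)   # [(1, 'YES_MATCH'), (2, 'NO_MATCH'), (3, 'YES_MATCH')]
```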

How to select records from a Table that has a certain number of rows in a related table in SQL Server?

Not quite sure how to ask this, but I have 2 tables that are related in a one-to-many relationship. I need to select all records in the "1" table that have fewer than three records in the "many" table.
select b.foreignkey,count(b.foreignkey) as bidcount
from b
where b.foreignkey in (select a.id from a) and bidcount< 3
group by b.foreignkey
I know this doesn't work at all, but I am at a loss as to how to do this.
In the end, I need to select all the records from the "a" table based on this criterion. Sorry if that is confusing!
Just using your code, not tested:
SELECT
b.foreignkey,
count(b.foreignkey) as bidcount
FROM
b
WHERE
b.foreignkey IN (SELECT a.id FROM a)
GROUP BY
b.foreignkey
HAVING
count(b.foreignkey) < 3
Try this:
SELECT t1.id,COUNT(t2.parentId)
FROM table1 as t1
INNER JOIN table2 as t2
ON t1.id = t2.parentId
GROUP BY t1.id
HAVING COUNT(t2.parentId) < 3
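One caveat with the INNER JOIN form: a parent with zero children also has "fewer than three" children, but the inner join drops it before the HAVING runs. If those parents should be included, a LEFT JOIN keeps them with a count of 0. This is a minimal sketch on SQLite via Python's sqlite3 module (a stand-in for SQL Server), with hypothetical data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (parentId INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    -- parent 1 has three children, parent 2 has one, parent 3 has none
    INSERT INTO table2 VALUES (1), (1), (1), (2);
""")

inner_rows = conn.execute("""
    SELECT t1.id, COUNT(t2.parentId)
    FROM table1 AS t1
    INNER JOIN table2 AS t2 ON t1.id = t2.parentId
    GROUP BY t1.id
    HAVING COUNT(t2.parentId) < 3
    ORDER BY t1.id
""").fetchall()

left_rows = conn.execute("""
    SELECT t1.id, COUNT(t2.parentId)
    FROM table1 AS t1
    LEFT JOIN table2 AS t2 ON t1.id = t2.parentId
    GROUP BY t1.id
    HAVING COUNT(t2.parentId) < 3
    ORDER BY t1.id
""").fetchall()

print(inner_rows)  # [(2, 1)]
print(left_rows)   # [(2, 1), (3, 0)]
```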
You didn't mention which version of SQL Server you're using; if you're on SQL Server 2005 or newer, you could use this CTE (Common Table Expression):
;WITH ChildRows AS
(
    SELECT A.Id, COUNT(B.Id) AS BCount
    FROM dbo.TableA A
    INNER JOIN dbo.TableB B ON B.TableAId = A.Id
    GROUP BY A.Id  -- required because A.Id is selected next to the COUNT aggregate
)
SELECT A.*, R.BCount
FROM dbo.TableA A
INNER JOIN ChildRows R ON A.Id = R.Id
WHERE R.BCount < 3;  -- the question's "fewer than three" criterion
The inner SELECT lists the Id column from TableA and the number of child rows associated with each Id (using the INNER JOIN to TableB), and the outer SELECT just builds on top of that result set and shows all fields from TableA (plus the count from TableB).
If you want to return all fields of your "1" table in one query, I suggest you consider using CROSS APPLY:
SELECT t1.* FROM table_1 t1
CROSS APPLY (SELECT COUNT(*) cnt FROM Table_Many t2 WHERE t2.fk = t1.pk) a
where a.cnt < 3
In some particular cases, depending on your indexes and database structure, this query may run 4 times faster than the GROUP BY method.
You posted this question under SQL Server, but I have an answer for Oracle (I don't know whether it will also run in SQL Server or not).
It is as follows:
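SQLite has no CROSS APPLY, but the same per-row count can be written as a correlated subquery in the WHERE clause. The sketch below (Python's sqlite3 module, hypothetical data) shows that this keeps parents with zero children, since `COUNT(*)` over no rows is 0. It illustrates the result set only and says nothing about the performance claim above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (pk INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Table_Many (fk INTEGER);
    INSERT INTO table_1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- pk 1 has three children, pk 2 has one, pk 3 has none
    INSERT INTO Table_Many VALUES (1), (1), (1), (2);
""")

query = """
    SELECT t1.*
    FROM table_1 t1
    -- per-row child count, playing the role of the CROSS APPLY column a.cnt
    WHERE (SELECT COUNT(*) FROM Table_Many t2 WHERE t2.fk = t1.pk) < 3
    ORDER BY t1.pk
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(2, 'b'), (3, 'c')]
```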
select [desired column list] from
(select b.*, count(*) over (partition by b.foreignkey) c_1
 from b
 where b.foreignkey in (select a.id from a)
) t  -- alias t: SQL Server requires an alias on a derived table, Oracle accepts it
where c_1 < 3;
I hope it works on SQL Server as well.
If not, please let me know so I can update it.
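Here is the same analytic-count query run on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer). The `bid` column and the rows are hypothetical. Note that the window count is computed after the WHERE filter, and that this form returns B rows rather than A rows, so a parent with no B rows never appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (bid INTEGER PRIMARY KEY, foreignkey INTEGER);
    INSERT INTO a VALUES (1), (2);
    -- parent 1 has three rows, parent 2 has two, foreignkey 99 has no parent in a
    INSERT INTO b VALUES (10, 1), (11, 1), (12, 1), (20, 2), (21, 2), (30, 99);
""")

query = """
    SELECT bid, foreignkey, c_1
    FROM (
        SELECT b.*, COUNT(*) OVER (PARTITION BY b.foreignkey) AS c_1
        FROM b
        WHERE b.foreignkey IN (SELECT a.id FROM a)
    ) t
    WHERE c_1 < 3
    ORDER BY bid
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(20, 2, 2), (21, 2, 2)]
```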

Querying a table finding if child table's matching records exist in ANSI SQL

I have two tables, A and B, with a one-to-many relationship.
Now I want some records from A, along with an existence field that shows whether B has any matching records. I don't want to use the COUNT function, because B has so many records that it delays SQL execution. Nor do I want to use proprietary keywords such as Oracle's ROWNUM, as in the query below, because I need as much compatibility as possible.
select A.*, (
select 1 from B where ref_column = A.ref_column and rownum = 1
) existence
...
You would use LEFT JOIN + COUNT anyway: a subquery in the select list can be executed once per row, while the join is done only once.
You can also consider EXISTS:
select A.*, case when exists (
  -- no ROWNUM needed: EXISTS only checks whether at least one row matches
  select 1 from B where ref_column = A.ref_column
) then 1 else 0 end as existence
from A
Use an EXISTS clause. If the foreign key in B is indexed, performance should not be an issue.
SELECT *
FROM a
WHERE EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id)
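Both EXISTS forms (a 0/1 flag in the select list, or a filter in WHERE) avoid COUNT and ROWNUM. This is a minimal sketch on SQLite via Python's sqlite3 module with hypothetical data; the index on `B.ref_column` mirrors the "if the foreign key in B is indexed" advice but is not required for correctness.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, ref_column TEXT);
    CREATE TABLE B (ref_column TEXT);
    CREATE INDEX idx_b_ref_column ON B (ref_column);  -- hypothetical index name
    INSERT INTO A VALUES (1, 'r1'), (2, 'r2'), (3, 'r3');
    INSERT INTO B VALUES ('r1'), ('r1'), ('r3');  -- 'r2' has no match
""")

# Existence flag for every A row
flags = conn.execute("""
    SELECT A.id,
           CASE WHEN EXISTS (SELECT 1 FROM B WHERE B.ref_column = A.ref_column)
                THEN 1 ELSE 0 END AS existence
    FROM A
    ORDER BY A.id
""").fetchall()

# Only the A rows that have at least one match
matched = conn.execute("""
    SELECT A.id FROM A
    WHERE EXISTS (SELECT 1 FROM B WHERE B.ref_column = A.ref_column)
    ORDER BY A.id
""").fetchall()

print(flags)    # [(1, 1), (2, 0), (3, 1)]
print(matched)  # [(1,), (3,)]
```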