SQL: matching two tables with two possible conditions

Is there a way in SQL, when joining two tables table_A and table_B, to fall back to a second criterion criteria_Y if the join finds no match at all on the first criterion criteria_X?
Something like this:
select *
from table_A, table_B
where table_A.id = table_B.id2
and (if there is no row where table_B.criteria_X = X then try table_B.criteria_Y = Y)
The following query is not a solution:
..
and (table_B.criteria_X = X OR table_B.criteria_Y = Y)
Thanks

This is a "find the best match" query:
select *
from
(
    select *,
           row_number() -- based on priority, #1 criteria_X, #2 criteria_Y
           over (partition by table_A.id
                 order by case when table_B.criteria_X = X then 1
                               else 2
                          end) as best_match
    from table_A, table_B
    where table_A.id = table_B.id2
      and (table_B.criteria_X = X OR table_B.criteria_Y = Y)
) dt
where best_match = 1
If the ORed condition results in losing indexed access, you might try splitting it into two UNION ALL selects.
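One way that split might look (a sketch only, keeping the placeholder names from the query above; a row matching both criteria appears in both branches, but the final filter still keeps a single best row per table_A.id):
select *
from
(
    select *,
           row_number()
           over (partition by id   -- table_A.id
                 order by priority) as best_match
    from
    (
        select table_A.*, table_B.*, 1 as priority
        from table_A, table_B
        where table_A.id = table_B.id2
          and table_B.criteria_X = X
        union all
        select table_A.*, table_B.*, 2 as priority
        from table_A, table_B
        where table_A.id = table_B.id2
          and table_B.criteria_Y = Y
    ) u
) dt
where best_match = 1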

A typical method uses left join twice, once for each criterion, and then uses coalesce() in the select. With indexes on the join keys, this should also have very good performance:
select a.*, coalesce(b1.colx, b2.colx)
from table_A a left join
     table_B b1
     on a.id = b1.id2 and b1.criteria_X = X left join
     table_B b2
     on a.id = b2.id2 and b2.criteria_Y = Y
where b1.id2 is not null or b2.id2 is not null;
The where clause ensures that at least one row matches.
This does not work under all circumstances -- in particular, each join needs to return only 0 or 1 matching rows. This is often the situation with this type of "priority" join.
An alternative version uses row_number(). This is sort of similar to @dnoeth's approach, but the row number calculation is done before the join:
select a.*, b.colx
from table_A a join
     (select b.*,
             row_number() over (partition by id2
                                order by (case when criteria_x = X then 1
                                               when criteria_y = Y then 2
                                          end)
                               ) as seqnum
      from table_B b
      where criteria_x = X or criteria_y = Y
     ) b
     on a.id = b.id2 and seqnum = 1;

Related

Better way to do a correlated query with a count condition in AWS Athena SQL

There are two tables A and B. Table A has a one-to-many relationship with B.
I want to fetch records from A and the corresponding single record from B (if B has one record).
If there are multiple records in table B, then pick the one having status = 'Active' first.
Below is the query, running in Oracle, but we want the same functionality in AWS Athena; however, correlated subqueries are not supported in Athena SQL. Athena supports ANSI SQL.
SELECT b.*
FROM A a, B b
WHERE a.instruction_id = b.txn_report_instruction_id
  AND b.txn_report_instruction_id IN
      (SELECT b2.txn_report_instruction_id
       FROM B b2
       WHERE b2.txn_report_instruction_id = b.txn_report_instruction_id
       GROUP BY b2.txn_report_instruction_id
       HAVING COUNT(b2.txn_report_instruction_id) = 1
      )
UNION
SELECT * FROM
(SELECT b.*
 FROM A a, B b
 WHERE a.instruction_id = b.txn_report_instruction_id
   AND b.txn_report_instruction_id IN
       (SELECT b2.txn_report_instruction_id
        FROM B b2
        WHERE b2.txn_report_instruction_id = b.txn_report_instruction_id
          AND b2.status = 'ACTIVE'
        GROUP BY b2.txn_report_instruction_id
        HAVING COUNT(b2.txn_report_instruction_id) > 1
       )
)
We would need to put every field in the select or in an aggregate function when using group by, so group by is not preferable.
Any help would be much appreciated.
Joining the best row can be achieved with a lateral join.
select *
from a
outer apply
(
select *
from b
where b.txn_report_instruction_id = a.instruction_id
order by case when b.status = 'ACTIVE' then 1 else 2 end
fetch first row only
) bb;
Another option is a window function:
select *
from a
left join
(
select
b.*,
row_number() over (partition by txn_report_instruction_id
order by case when status = 'ACTIVE' then 1 else 2 end) as rn
from b
) bb on bb.txn_report_instruction_id = a.instruction_id and bb.rn = 1;
I don't know about Amazon Athena's SQL coverage. This is all standard SQL, however, except for OUTER APPLY, I think. If I am not mistaken, the SQL standard requires LEFT OUTER JOIN LATERAL (...) ON ... instead, for which you need a dummy ON clause, such as ON 1 = 1. So if the above queries fail, there is another option for you :-)
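For reference, the first query above rewritten in that standard form might look like this (a sketch only; I have not tested it against Athena):
select *
from a
left join lateral
(
    select *
    from b
    where b.txn_report_instruction_id = a.instruction_id
    order by case when b.status = 'ACTIVE' then 1 else 2 end
    fetch first row only
) bb on 1 = 1;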

Subqueries vs Multi Table Join

I have 3 tables A, B, C. I want to get the count of their intersection.
Way 1:
select count(id) from A a join B b on a.id = b.id join C c on B.id = C.id;
Result Count - X
Way 2:
SELECT count(id) FROM A WHERE id IN (SELECT id FROM B WHERE id IN (SELECT id FROM C));
Result Count - Y
The result count of each query is different. What exactly is wrong?
A JOIN can multiply the number of rows as well as filter out rows.
In this case, the second count should be the correct one because nothing is double counted -- assuming id is unique in a. If not, it needs count(distinct a.id).
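For example, a quick way to see the multiplication with inline sample data (a sketch using PostgreSQL-style VALUES lists; the literals are made up, not part of the question):
select count(a.id)
from (values (1)) a(id)
join (values (1), (1)) b(id) on a.id = b.id
join (values (1)) c(id) on b.id = c.id;
-- returns 2, while the IN / EXISTS forms return 1 for the same data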
The equivalent using JOIN would use COUNT(DISTINCT):
select count(distinct a.id)
from A a join
     B b
     on a.id = b.id join
     C c
     on b.id = c.id;
I mention this for completeness but do not recommend this approach. Multiplying the number of rows just to remove them using distinct is inefficient.
In many databases, the most efficient method might be:
select count(*)
from a
where exists (select 1 from b where b.id = a.id) and
exists (select 1 from c where c.id = a.id);
Note: This assumes there are indexes on the id columns and that id is unique in a.
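For illustration, the assumed supporting indexes might look like this (the index names are made up; a primary key on a.id would serve the same purpose as the unique index):
create unique index a_id_uidx on a (id);
create index b_id_idx on b (id);
create index c_id_idx on c (id);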

Select table as json array

I have three tables in PostgreSQL: A, B, C.
I want to get a row from table A with a specific id, plus all records from tables B and C with matching id as aggregated JSON.
For example:
Table A                        Table B           Table C
---------------------------------------------------------------
id / column1   / column2       id / colum1       id / colum1
1    someValue   someValue     1    someVal1     1    someVal1
                               1    someVal2     1    someVal2
The expected output for id = 1 would be:
a.column1   a.column2   ARRAY_JSON_B             ARRAY_JSON_C
------------------------------------------------------------------------------
someValue   someValue   [{colum1:'someVal1'},    [{colum1:'someVal1'},
                         {colum1:'someVal2'}]     {colum1:'someVal2'}]
This requires Postgres 9.3 or later.
Simple case
I suggest using the simpler json_agg(), which is meant for this purpose, in LATERAL joins:
SELECT *
FROM a
LEFT JOIN LATERAL (SELECT json_agg(b) AS array_json_b FROM b WHERE id = a.id) b ON true
LEFT JOIN LATERAL (SELECT json_agg(c) AS array_json_c FROM c WHERE id = a.id) c ON true
WHERE id = 1;
LEFT JOIN LATERAL ... ON true keeps rows in the result that have no match on the right side of the join. Details:
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
Subtle difference: this query returns NULL where no match is found in b or c; @stas' query with correlated subqueries returns an empty array instead. May or may not be important.
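If the empty-array behavior is needed, a small variation of the query above (a sketch; it only spells out the select list and wraps the aggregates in COALESCE) would be:
SELECT a.*,
       COALESCE(b.array_json_b, '[]'::json) AS array_json_b,
       COALESCE(c.array_json_c, '[]'::json) AS array_json_c
FROM a
LEFT JOIN LATERAL (SELECT json_agg(b) AS array_json_b FROM b WHERE id = a.id) b ON true
LEFT JOIN LATERAL (SELECT json_agg(c) AS array_json_c FROM c WHERE id = a.id) c ON true
WHERE id = 1;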
Actual answer
Your example in the question excludes the redundant id column in b and c from the result, which makes sense. To achieve this, you can't use @stas' simple correlated subquery. While it would still work for a single column instead of the whole row, it would lose the column name and produce a simple array. Also, it would not work for more than one column.
Use json_object_agg() for a single selected column (which also allows you to choose the tag name freely):
SELECT *
FROM a
LEFT JOIN LATERAL (
SELECT json_object_agg('colum1', colum1) AS array_json_b
FROM b WHERE id = a.id
) b ON true
LEFT JOIN LATERAL (
SELECT json_object_agg('colum1', colum1) AS array_json_c
FROM c WHERE id = a.id
) c ON true
WHERE id = 1;
Or use a subselect for any selection (col1 and col2 in this example):
SELECT *
FROM a
LEFT JOIN LATERAL (
SELECT json_agg(x) AS array_json_b
FROM (SELECT col1, col2 FROM b WHERE id = a.id) x
) b ON true
LEFT JOIN LATERAL (
SELECT json_agg(x) AS array_json_c
FROM (SELECT col1, col2 FROM c WHERE id = a.id) x
) c ON true
WHERE id = 1;
Related:
Return multiple columns of the same row as JSON array of objects
How do I return a jsonb array and array of objects from my data?
select
a.*,
to_json(array(select b from b where b.id = a.id)) array_json_b,
to_json(array(select c from c where c.id = a.id)) array_json_c
from a
where
a.id = 1;
I hope your PostgreSQL version is 9.3 or higher. There is a clever function, to_json, which can convert anything to JSON. So we take an array of all related rows from b and convert it. Same with c.

one to one distinct restriction on selection

I encountered a problem like this. There are two tables (the x values are ordered in an increasing trend!):
Table A
id x
1 1
1 3
1 4
1 7
Table B
id x
1 2
1 5
I want to join these two tables:
1) on the condition that the ids are equal, and
2) each row of A should be matched to only one row of B, and vice versa (a one-to-one relationship), based on the absolute difference of the x values (a pair with a smaller difference has higher priority to match).
The description above alone is not complete, because if two pairs that share a common row in one of the tables have the same difference, there is no way to decide which one goes first. So define A as the "main" table: the row in table A with the smaller line number always goes first.
Expected result of demo:
id A.x B.x abs_diff
1 1 2 1
1 4 5 1
End of table (the two extra rows in A shouldn't be considered, because of the one-to-one rule).
I am using PostgreSQL, so the thing I have tried is DISTINCT ON, but it cannot solve this.
select distinct on (A.x) id,A.x,B.x,abs_diff
from
(A join B
on A.id=B.id)
order by A.x,greatest(A.x,B.x)-least(A.x,B.x)
Do you have any ideas? It seems to be tricky in plain SQL.
Try:
select a.id, a.x as ax, b.x as bx, x.min_abs_diff
from table_a a
join table_b b
on a.id = b.id
join (select a.id, min(abs(a.x - b.x)) as min_abs_diff
from table_a a
join table_b b
on a.id = b.id
group by a.id) x
on x.id = a.id
and abs(a.x - b.x) = x.min_abs_diff
fiddle: http://sqlfiddle.com/#!15/ab5ae/5/0
Although it doesn't match your expected output, I think the output is correct based on what you described; as you can see, each pair has a difference with an absolute value of 1.
Edit: try the following, based on the order of a to b:
select *
from (select a.id,
a.x as ax,
b.x as bx,
x.min_abs_diff,
row_number() over(partition by a.id, b.x order by a.id, a.x) as rn
from table_a a
join table_b b
on a.id = b.id
join (select a.id, min(abs(a.x - b.x)) as min_abs_diff
from table_a a
join table_b b
on a.id = b.id
group by a.id) x
on x.id = a.id
and abs(a.x - b.x) = x.min_abs_diff) x
where x.rn = 1
Fiddle: http://sqlfiddle.com/#!15/ab5ae/19/0
One possible solution for your currently ambiguous question:
SELECT *
FROM (
SELECT id, x AS a, lead(x) OVER (PARTITION BY grp ORDER BY x) AS b
FROM (
SELECT *, count(tbl) OVER (PARTITION BY id ORDER BY x) AS grp
FROM (
SELECT TRUE AS tbl, * FROM table_a
UNION ALL
SELECT NULL, * FROM table_b
) x
) y
) z
WHERE b IS NOT NULL
ORDER BY 1,2,3;
This way, every a.x is assigned the next bigger (or same) b.x, unless there is another a.x that is still smaller than the next b.x (or the same).
Produces the requested result for the demo case. Not sure about various ambiguous cases.
SQL Fiddle.

query optimization with condition

I want to optimize the following query so that it does not use a subquery to get the max value:
select c.ida2a2 from table1 m, table2 c
where c.ida3a5 = m.ida2a2
and (c.createstampa2 < (select max(cc.createstampa2)
from table2 cc where cc.ida3a5 = c.ida3a5));
Any ideas? Please let me know if you need more info.
This may be a more efficient way to write the query:
select c.ida2a2
from table1 m join
(select c.*, MAX(createstampa2) over (partition by ida3a5) as maxcs
from table2 c
) c
on c.ida3a5 = m.ida2a2
where c.createstampa2 < maxcs
I'm pretty sure Oracle optimizes this correctly (filtering the rows before the join). If you wanted to be clearer:
select c.ida2a2
from table1 m join
(select c.*
from (select c.*, MAX(createstampa2) over (partition by ida3a5) as maxcs
from table2 c
) c
where c.createstampa2 < c.maxcs
) c
on c.ida3a5 = m.ida2a2