Select table as json array - sql

I have three tables in PostgreSQL: A, B, C.
I want to get a row from table A with a specific id, plus all records from tables B and C with matching id as aggregated JSON.
For example:
Table A
id | colum1    | colum2
1  | someValue | someValue

Table B
id | colum1
1  | someVal1
1  | someVal2

Table C
id | colum1
1  | someVal1
1  | someVal2
The expected output for id = 1 would be:
a.colum1  | a.colum2  | ARRAY_JSON_B                               | ARRAY_JSON_C
someValue | someValue | [{colum1:'someVal1'}, {colum1:'someVal2'}] | [{colum1:'someVal1'}, {colum1:'someVal2'}]

This requires Postgres 9.3 or later.
Simple case
I suggest using the simpler json_agg(), which is meant for this purpose, in LATERAL joins:
SELECT *
FROM a
LEFT JOIN LATERAL (SELECT json_agg(b) AS array_json_b FROM b WHERE id = a.id) b ON true
LEFT JOIN LATERAL (SELECT json_agg(c) AS array_json_c FROM c WHERE id = a.id) c ON true
WHERE id = 1;
LEFT JOIN LATERAL ... ON true keeps rows from the left side of the join in the result even if the lateral subquery on the right side returns no row. Details:
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
Subtle difference: This query returns NULL where no match is found in b or c, while @stas' query with correlated subqueries returns an empty array instead. This may or may not be important.
Actual answer
Your example in the question excludes the redundant id column in b and c from the result, which makes sense. To achieve this, you can't use @stas' simple correlated subquery. While it would still work for a single column instead of the whole row, it would lose the column name and produce a simple array. Also, it would not work for more than one column.
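Use json_build_object() inside json_agg() for a single selected column (which also lets you choose the key name freely). Both functions together require Postgres 9.4 or later. Note that json_object_agg() alone would not produce the expected array: it folds all rows into a single JSON object with a repeated key.
SELECT *
FROM a
LEFT JOIN LATERAL (
SELECT json_agg(json_build_object('colum1', colum1)) AS array_json_b
FROM b WHERE id = a.id
) b ON true
LEFT JOIN LATERAL (
SELECT json_agg(json_build_object('colum1', colum1)) AS array_json_c
FROM c WHERE id = a.id
) c ON true
WHERE id = 1;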
Or use a subselect for any selection (col1 and col2 in this example):
SELECT *
FROM a
LEFT JOIN LATERAL (
SELECT json_agg(x) AS array_json_b
FROM (SELECT col1, col2 FROM b WHERE id = a.id) x
) b ON true
LEFT JOIN LATERAL (
SELECT json_agg(x) AS array_json_c
FROM (SELECT col1, col2 FROM c WHERE id = a.id) x
) c ON true
WHERE id = 1;
Related:
Return multiple columns of the same row as JSON array of objects
How do I return a jsonb array and array of objects from my data?

select
a.*,
to_json(array(select b from b where b.id = a.id)) array_json_b,
to_json(array(select c from c where c.id = a.id)) array_json_c
from a
where
a.id = 1;
I hope your PostgreSQL version is 9.3 or higher. The handy function to_json can convert almost anything to JSON, so we build an array of all related rows from b and convert it. The same goes for c.
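The correlated-subquery shape above can be tried out without a PostgreSQL server. The following is only a sketch under loud assumptions: an in-memory SQLite database stands in for PostgreSQL, and SQLite's json_group_array() and json_object() stand in for to_json(array(...)); the function name fetch_with_children is hypothetical. It also selects only colum1 per child row, so the redundant id is left out of the JSON.

```python
import json
import sqlite3


def fetch_with_children(a_id):
    """Return (colum1, colum2, rows_of_b, rows_of_c) for one row of table a."""
    # Assumption: SQLite replaces PostgreSQL for this sketch.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE a (id INTEGER, colum1 TEXT, colum2 TEXT);
        CREATE TABLE b (id INTEGER, colum1 TEXT);
        CREATE TABLE c (id INTEGER, colum1 TEXT);
        INSERT INTO a VALUES (1, 'someValue', 'someValue');
        INSERT INTO b VALUES (1, 'someVal1'), (1, 'someVal2');
        INSERT INTO c VALUES (1, 'someVal1'), (1, 'someVal2');
    """)
    # One correlated scalar subquery per child table, each returning a JSON array.
    row = con.execute("""
        SELECT a.colum1, a.colum2,
               (SELECT json_group_array(json_object('colum1', b.colum1))
                FROM b WHERE b.id = a.id) AS array_json_b,
               (SELECT json_group_array(json_object('colum1', c.colum1))
                FROM c WHERE c.id = a.id) AS array_json_c
        FROM a
        WHERE a.id = ?
    """, (a_id,)).fetchone()
    con.close()
    # Decode the JSON text so the caller gets Python lists of dicts.
    return row[0], row[1], json.loads(row[2]), json.loads(row[3])
```

Calling fetch_with_children(1) yields the two scalar columns of a plus one list of objects per child table.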

Related

Subqueries vs Multi Table Join

I've 3 tables A, B, C. I want to list the intersection count.
Way 1:-
select count(id) from A a join B b on a.id = b.id join C c on B.id = C.id;
Result Count - X
Way 2:-
SELECT count(id) FROM A WHERE id IN (SELECT id FROM B WHERE id IN (SELECT id FROM C));
Result Count - Y
The two queries return different counts. What exactly is wrong?
A JOIN can multiply the number of rows as well as filtering out rows.
In this case, the second count should be the correct one because nothing is double counted -- assuming id is unique in a. If not, it needs count(distinct a.id).
The equivalent using JOIN would use COUNT(DISTINCT):
select count(distinct a.id)
from A a join
B b
on a.id = b.id join
C c
on B.id = C.id;
I mention this for completeness but do not recommend this approach. Multiplying the number of rows just to remove them using distinct is inefficient.
In many databases, the most efficient method might be:
select count(*)
from a
where exists (select 1 from b where b.id = a.id) and
exists (select 1 from c where c.id = a.id);
Note: This assumes there are indexes on the id columns and that id is unique in a.
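To see the row multiplication in numbers, here is a small sketch. It assumes an in-memory SQLite database as a stand-in for whatever database is in use, and the sample data and the function name intersection_counts are made up for illustration; id is unique in a but duplicated in b and c.

```python
import sqlite3


def intersection_counts():
    """Run the four counting variants on the same data and return their results."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE a (id INTEGER);
        CREATE TABLE b (id INTEGER);
        CREATE TABLE c (id INTEGER);
        INSERT INTO a VALUES (1), (2), (3);
        INSERT INTO b VALUES (1), (1), (2);  -- id 1 appears twice
        INSERT INTO c VALUES (1), (2), (2);  -- id 2 appears twice
    """)
    queries = {
        # Plain join: every duplicate in b or c multiplies the matching row of a.
        "join": """SELECT count(a.id) FROM a
                   JOIN b ON a.id = b.id JOIN c ON b.id = c.id""",
        # Join, then collapse the duplicates again.
        "join_distinct": """SELECT count(DISTINCT a.id) FROM a
                            JOIN b ON a.id = b.id JOIN c ON b.id = c.id""",
        # IN only filters rows of a; it never multiplies them.
        "in": """SELECT count(id) FROM a
                 WHERE id IN (SELECT id FROM b WHERE id IN (SELECT id FROM c))""",
        # EXISTS behaves like IN here.
        "exists": """SELECT count(*) FROM a
                     WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
                       AND EXISTS (SELECT 1 FROM c WHERE c.id = a.id)""",
    }
    result = {name: con.execute(sql).fetchone()[0] for name, sql in queries.items()}
    con.close()
    return result
```

With this data only ids 1 and 2 are in all three tables, yet the plain join counts four rows because each duplicate in b or c produces an extra joined row.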

select join result as columns to same row

I have a one-to-many relation in a pg database. I have table A and table B, where rows of B have a foreign key to A.
I want to select certain rows from A and attach certain columns from matching rows of B to same row from A.
E.g.
A
id | created_at |
B
id | created_at | a_id | type |
I tried to do multiple subqueries, e.g.
select A.id,
(select created_at from B where b.a_id = a.id and B.type = 'some_type' limit 1) as some_type_created_at,
(select created_at from B where b.a_id = a.id and B.type = 'another_type' limit 1) as another_type_created_at
from A
But this feels ugly and wrong. What is a better way of achieving it in Postgres?
Of course I can do a join and get the full Cartesian product, but I want the result from the db to come back directly in this shape.
There's nothing wrong about using scalar subqueries the way you are doing it. That will work well and will give you the result you want.
Alternatively, you could use lateral table expressions. They give the same result but are more complex, and in this case I don't see any particular benefit in using them. Lateral queries take the form:
select
a.id,
b1.created_at as some_type_created_at,
b2.created_at as another_type_created_at
from a
left join lateral (
select created_at from B where b.a_id = a.id and B.type = 'some_type' limit 1
) b1 on true
left join lateral (
select created_at from B where b.a_id = a.id and B.type = 'another_type' limit 1
) b2 on true
In sum, you are good as you are.
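As a quick check of the scalar-subquery form, the sketch below runs it against made-up rows. It assumes an in-memory SQLite database in place of Postgres; the dates, ids, and the function name created_at_by_type are hypothetical.

```python
import sqlite3


def created_at_by_type():
    """Return one row per a.id with one created_at column per type of b."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE a (id INTEGER, created_at TEXT);
        CREATE TABLE b (id INTEGER, created_at TEXT, a_id INTEGER, type TEXT);
        INSERT INTO a VALUES (1, '2020-01-01'), (2, '2020-01-01');
        INSERT INTO b VALUES
            (10, '2020-01-02', 1, 'some_type'),
            (11, '2020-01-03', 1, 'another_type'),
            (12, '2020-01-04', 2, 'some_type');
    """)
    # Each scalar subquery yields at most one value, or NULL when nothing matches.
    rows = con.execute("""
        SELECT a.id,
               (SELECT created_at FROM b
                WHERE b.a_id = a.id AND b.type = 'some_type'
                LIMIT 1) AS some_type_created_at,
               (SELECT created_at FROM b
                WHERE b.a_id = a.id AND b.type = 'another_type'
                LIMIT 1) AS another_type_created_at
        FROM a
        ORDER BY a.id
    """).fetchall()
    con.close()
    return rows
```

Row 2 of a has no 'another_type' row in b, so its last column comes back as NULL (None in Python) while the row itself is still returned.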

Query left join without all the right rows from B table

I have 2 tables, A and B.
I need all columns from A + 1 column from B in my select.
Unfortunately, B has multiple rows (all identical) for one row in A on the join condition.
I tried, but I can't get exactly one row from B for each row in A with a left join, for example, while keeping my select.
How can I write this query? The database is Oracle.
Thanks in advance.
This is a good use for outer apply, which Oracle supports from 12c on. The structure of the query looks like this:
select a.*, b.col
from a outer apply
(select b.col
from b
where b.? = a.?
fetch first 1 row only
) b;
Normally, you would only use fetch first 1 row only together with order by. In this case, it doesn't seem to make a difference which row you choose.
You can group B by the join key, and then use an aggregate (like max or min) to pick any of the identical B values:
select a.*
, b.min_col1
from TableA a
left join
(
select a_id
, min(col1) as min_col1
from TableB
group by
a_id
) b
on b.a_id = a.id
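The pre-aggregated join can be sketched as follows. This assumes an in-memory SQLite database standing in for Oracle, and the column name plus the sample rows and the function name one_b_value_per_a are invented for illustration.

```python
import sqlite3


def one_b_value_per_a():
    """Left join TableA to a derived table holding one row per a_id."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE TableA (id INTEGER, name TEXT);
        CREATE TABLE TableB (a_id INTEGER, col1 TEXT);
        INSERT INTO TableA VALUES (1, 'x'), (2, 'y'), (3, 'z');
        -- Identical duplicates in TableB for the same a_id.
        INSERT INTO TableB VALUES (1, 'p'), (1, 'p'), (1, 'p'), (2, 'q'), (2, 'q');
    """)
    rows = con.execute("""
        SELECT a.id, a.name, b.min_col1
        FROM TableA a
        LEFT JOIN (
            SELECT a_id, min(col1) AS min_col1
            FROM TableB
            GROUP BY a_id          -- collapses the duplicates to one row per key
        ) b ON b.a_id = a.id
        ORDER BY a.id
    """).fetchall()
    con.close()
    return rows
```

Because the derived table has exactly one row per a_id, the left join cannot multiply rows of TableA, and the row with no match in TableB is kept with a NULL.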

SQL : matching two tables with two possibles conditions

Is there a way in SQL, when joining two tables table_A and table_B, to fall back to a second criterion criteria_Y if no rows match on the first criterion criteria_X?
Something like this:
select *
from table_A, table_B
where table_A.id = table_B.id2
and (if there is no row where table_B.criteria_X = X then try table_B.criteria_Y = Y)
The following query is not a solution:
..
and (table_B.criteria_X = X OR table_B.criteria_Y = Y)
Thanks
This is a find the best match query:
select *
from
(
select *,
row_number() -- based on priority, #1 criteria_X, #2 criteria_Y
over (partition by table_A.id
order by case when table_B.criteria_X = X then 1
else 2
end) as best_match
from table_A, table_B
where table_A.id = table_B.id2
and (table_B.criteria_X = X OR table_B.criteria_Y = Y)
) dt
where best_match = 1
If the ORed condition results in losing indexed access, you might try splitting it into two UNION ALL selects.
A typical method uses left join twice, once for each criterion, and then uses coalesce() in the select. With indexes on the join keys, this should also have very good performance:
select a.*, coalesce(b1.colx, b2.colx)
from table_A a left join
table_B b1
on a.id = b1.id2 and b1.criteria_X = X left join
table_B b2
on a.id = b2.id2 and b2.criteria_Y = Y
where b1.id2 is not null or b2.id2 is not null;
The where clause ensures that at least one row matches.
This does not work under all circumstances -- in particular, each join needs to return only 0 or 1 matching rows. This is often the situation with this type of "priority" joins.
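A small sketch of the two-left-joins approach, with each join matching at most one row as required above. It assumes an in-memory SQLite database, and the values 'x' and 'y' for X and Y, the sample rows, and the function name priority_match_coalesce are hypothetical.

```python
import sqlite3


def priority_match_coalesce(x="x", y="y"):
    """Prefer the row matching criteria_X; fall back to the criteria_Y row."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table_A (id INTEGER);
        CREATE TABLE table_B (id2 INTEGER, criteria_X TEXT, criteria_Y TEXT, colx TEXT);
        INSERT INTO table_A VALUES (1), (2), (3);
        INSERT INTO table_B VALUES
            (1, 'x',     'other', 'b1-x'),    -- id 1 matches on X ...
            (1, 'other', 'y',     'b1-y'),    -- ... and also on Y
            (2, 'other', 'y',     'b2-y'),    -- id 2 matches on Y only
            (3, 'other', 'other', 'b3-none'); -- id 3 matches on neither
    """)
    rows = con.execute("""
        SELECT a.id, coalesce(b1.colx, b2.colx) AS colx
        FROM table_A a
        LEFT JOIN table_B b1 ON a.id = b1.id2 AND b1.criteria_X = :x
        LEFT JOIN table_B b2 ON a.id = b2.id2 AND b2.criteria_Y = :y
        WHERE b1.id2 IS NOT NULL OR b2.id2 IS NOT NULL
        ORDER BY a.id
    """, {"x": x, "y": y}).fetchall()
    con.close()
    return rows
```

For id 1 both joins find a row and coalesce() picks the criteria_X one; id 2 falls back to criteria_Y; id 3 is dropped by the where clause.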
An alternative version uses row_number(). This is sort of similar to @dnoeth's approach, but the row number calculation is done before the join:
select a.*, b.colx
from table_A a join
(select b.*,
row_number() over (partition by id2
order by (case when criteria_x = X then 1
when criteria_y = Y then 2
end)
) as seqnum
from table_B b
where criteria_x = X or criteria_y = Y
) b
on a.id = b.id2 and seqnum = 1
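The row_number() variant still returns one row per id when several rows of table_B match the same criterion. The sketch below illustrates that, assuming an in-memory SQLite database (version 3.25 or later, for window functions); the values 'x' and 'y', the sample rows, and the function name priority_match_row_number are hypothetical.

```python
import sqlite3


def priority_match_row_number(x="x", y="y"):
    """Rank the candidate rows per id2 first, then join only the top-ranked one."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table_A (id INTEGER);
        CREATE TABLE table_B (id2 INTEGER, criteria_X TEXT, criteria_Y TEXT, colx TEXT);
        INSERT INTO table_A VALUES (1), (2);
        INSERT INTO table_B VALUES
            (1, 'x',     'other', 'b1-x1'),  -- two rows match X for id 1
            (1, 'x',     'other', 'b1-x2'),
            (1, 'other', 'y',     'b1-y'),
            (2, 'other', 'y',     'b2-y');
    """)
    rows = con.execute("""
        SELECT a.id, b.colx
        FROM table_A a
        JOIN (SELECT b.*,
                     row_number() OVER (
                         PARTITION BY id2
                         ORDER BY CASE WHEN criteria_X = :x THEN 1
                                       WHEN criteria_Y = :y THEN 2
                                  END) AS seqnum
              FROM table_B b
              WHERE criteria_X = :x OR criteria_Y = :y
             ) b ON a.id = b.id2 AND b.seqnum = 1
        ORDER BY a.id
    """, {"x": x, "y": y}).fetchall()
    con.close()
    return rows
```

Which of the two tied criteria_X rows wins for id 1 is not determined by the ordering used here; adding a tie-breaker column to the order by would make it deterministic.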

Value present in more than one table

I have 3 tables. All of them have a column - id. I want to find if there is any value that is common across the tables. Assuming that the tables are named a, b and c, if id value 3 is present in a and b, there is a problem. The query can/should exit at the first such occurrence. There is no need to probe further. What I have now is something like
( select id from a intersect select id from b )
union
( select id from b intersect select id from c )
union
( select id from a intersect select id from c )
Obviously, this is not very efficient. Database is PostgreSQL, version 9.0
id is not unique in the individual tables. It is OK to have duplicates in the same table. But if a value is present in just 2 of the 3 tables, that also needs to be flagged and there is no need to check for existence in the third table, or check if there are more such values. One value, present in more than one table, and I can stop.
Although id is not unique within any given table, it should be unique across the tables; a union of distinct id should be unique, so:
select id from (
select distinct id from a
union all
select distinct id from b
union all
select distinct id from c) x
group by id
having count(*) > 1
Note the use of union all, which preserves duplicates (plain union removes duplicates).
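A runnable sketch of this query, assuming an in-memory SQLite database in place of PostgreSQL; the sample ids and the function name ids_in_more_than_one_table are made up. Duplicates inside one table are removed by distinct before the union all, so only ids shared between tables reach a count above 1.

```python
import sqlite3


def ids_in_more_than_one_table():
    """Return ids that occur in at least two of the tables a, b, c."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE a (id INTEGER);
        CREATE TABLE b (id INTEGER);
        CREATE TABLE c (id INTEGER);
        INSERT INTO a VALUES (1), (1), (2);  -- duplicate inside a is allowed
        INSERT INTO b VALUES (3), (4);
        INSERT INTO c VALUES (4), (5), (5);  -- 4 is shared with b
    """)
    rows = con.execute("""
        SELECT id FROM (
            SELECT DISTINCT id FROM a
            UNION ALL
            SELECT DISTINCT id FROM b
            UNION ALL
            SELECT DISTINCT id FROM c
        ) x
        GROUP BY id
        HAVING count(*) > 1
        ORDER BY id
    """).fetchall()
    con.close()
    return [r[0] for r in rows]
```

Here ids 1 and 5 are repeated within a single table and are not reported, while id 4 appears in both b and c and is.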
I would suggest a simple join:
select a.id
from a join
b
on a.id = b.id join
c
on a.id = c.id
limit 1;
If you have a query that uses union or group by (or order by, but that is not relevant here), then you need to process all the data before returning a single row. A join can start returning rows as soon as the first values are found.
An alternative, but similar method is:
select a.id
from a
where exists (select 1 from b where a.id = b.id) and
exists (select 1 from c where a.id = c.id);
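If a is the smallest table and id is indexed in b and c, then this could be quite fast. Note that both queries only report ids present in all three tables; to flag an id shared by just two of them, each pair of tables has to be checked.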
Try this
select id from
(
select distinct id, 1 as t from a
union all
select distinct id, 2 as t from b
union all
select distinct id, 3 as t from c
) as t
group by id having count(t)=3
"It is OK to have duplicates in the same table."
"The query can/should exit at the first such occurrence."
SELECT 'OMG!' AS danger_bill_robinson
WHERE EXISTS (SELECT 1
FROM a,b,c -- maybe there is a place for old-style joins ...
WHERE a.id = b.id
OR a.id = c.id
OR c.id = b.id
);
Update: it appears the optimiser does not like Cartesian joins with 3 OR conditions. The query below is a bit faster:
SELECT 'WTF!' AS danger_bill_robinson
WHERE exists (select 1 from a JOIN b USING (id))
OR exists (select 1 from a JOIN c USING (id))
OR exists (select 1 from c JOIN b USING (id))
;
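The pairwise EXISTS check can be wrapped in a small helper to try it on different data. This is a sketch assuming an in-memory SQLite database rather than PostgreSQL; the function name shared_id_exists and its list parameters are hypothetical.

```python
import sqlite3


def shared_id_exists(a_ids, b_ids, c_ids):
    """Return True if any id occurs in at least two of the three tables."""
    con = sqlite3.connect(":memory:")
    for table, ids in (("a", a_ids), ("b", b_ids), ("c", c_ids)):
        # Table names come from the fixed tuple above, never from user input.
        con.execute(f"CREATE TABLE {table} (id INTEGER)")
        con.executemany(f"INSERT INTO {table} VALUES (?)", [(i,) for i in ids])
    # One EXISTS per pair of tables; each can stop at its first matching row.
    found = con.execute("""
        SELECT EXISTS (SELECT 1 FROM a JOIN b USING (id))
            OR EXISTS (SELECT 1 FROM a JOIN c USING (id))
            OR EXISTS (SELECT 1 FROM c JOIN b USING (id))
    """).fetchone()[0]
    con.close()
    return bool(found)
```

For example, shared_id_exists([1, 1, 2], [3, 4], [4, 5]) finds id 4 in both b and c, whereas duplicates confined to a single table do not trigger a hit.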