PostgreSQL union two tables and join with a third table


I want to union two tables and join them with a third metadata table, and I would like to know which approach is the best/fastest.
The database is PostgreSQL.
Below are my two suggestions, but other approaches are welcome.
To do the join before the union on both tables:
SELECT a.id, a.feature_type, b.datetime, b.file_path
FROM table1 a, metadata b WHERE a.metadata_id = b.id
UNION ALL
SELECT a.id, a.feature_type, b.datetime, b.file_path
FROM table2 a, metadata b WHERE a.metadata_id = b.id
Or to do the union first and then do the join:
SELECT a.id, a.feature_type, b.datetime, b.file_path
FROM
(
SELECT id, feature_type, metadata_id FROM table1
UNION ALL
SELECT id, feature_type, metadata_id FROM table2
)a, metadata b
WHERE a.metadata_id = b.id

Run EXPLAIN ANALYZE on both statements, then you will see which one is more efficient.

It can be unpredictable because of the SQL engine's optimizer, so it's better to look at the execution plan. In the end, both approaches may be executed in the same way.

As far as I can remember, running EXPLAIN will reveal that PostgreSQL plans the second query the same way as the first, provided that there is no GROUP BY clause (explicit, or implicit due to UNION instead of UNION ALL) in any of the subqueries.
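As a quick check that the two formulations return the same rows, here is a minimal sketch. It uses Python's built-in sqlite3 module instead of PostgreSQL, the sample rows are made up, and it only confirms that the results match; it says nothing about which plan PostgreSQL picks, so EXPLAIN ANALYZE on the real data is still needed for timing.

```python
import sqlite3

# In-memory SQLite database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE metadata (id INTEGER PRIMARY KEY, datetime TEXT, file_path TEXT);
    CREATE TABLE table1 (id INTEGER, feature_type TEXT, metadata_id INTEGER);
    CREATE TABLE table2 (id INTEGER, feature_type TEXT, metadata_id INTEGER);
    INSERT INTO metadata VALUES (1, '2020-01-01', '/data/a'), (2, '2020-01-02', '/data/b');
    INSERT INTO table1 VALUES (10, 'point', 1), (11, 'line', 2);
    INSERT INTO table2 VALUES (20, 'polygon', 1);
""")

# Approach 1: join each table to metadata, then UNION ALL.
join_then_union = """
    SELECT a.id, a.feature_type, b.datetime, b.file_path
    FROM table1 a JOIN metadata b ON a.metadata_id = b.id
    UNION ALL
    SELECT a.id, a.feature_type, b.datetime, b.file_path
    FROM table2 a JOIN metadata b ON a.metadata_id = b.id
"""

# Approach 2: UNION ALL the two tables, then join once.
union_then_join = """
    SELECT a.id, a.feature_type, b.datetime, b.file_path
    FROM (
        SELECT id, feature_type, metadata_id FROM table1
        UNION ALL
        SELECT id, feature_type, metadata_id FROM table2
    ) a JOIN metadata b ON a.metadata_id = b.id
"""

# Sort both results so the comparison ignores row order.
rows_first = sorted(conn.execute(join_then_union).fetchall())
rows_second = sorted(conn.execute(union_then_join).fetchall())
print(rows_first == rows_second)
```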

Related

SQL with string concatenation is slow

I have a stored procedure which is taking 15 seconds to execute. The database is SQL Server 2008 R2.
The SQL query is as follows,
SELECT * FROM EmployeeSetA
UNION
SELECT * FROM EmployeeSetB
WHERE
Name+''+Id NOT IN (SELECT Name+''+Id FROM EmployeeSetA)
The query unions EmployeeSetA and EmployeeSetB, and ensures that rows whose Name and Id already appear in EmployeeSetA are excluded from EmployeeSetB before the union is performed.
When I checked, the string concatenation is what makes the SQL run slowly. Is there a better approach? Any suggestion will be greatly appreciated.

NOT EXISTS is the ideal solution. Having said that, you could also stack the two tables first and then use EXCEPT to get rid of unwanted records. Feel free to change UNION ALL to UNION if that's what you actually want.
select * from EmployeeSetA
union all
select * from EmployeeSetB
except
select b.* from EmployeeSetB b
join EmployeeSetA a on a.name=b.name and a.id=b.id;
Or more directly,
select * from EmployeeSetA
union all
select b.* from EmployeeSetB b
left join EmployeeSetA a on a.name=b.name and a.id=b.id
where a.name is null or a.id is null;
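The NOT EXISTS form mentioned in this answer is not spelled out, so here is a minimal sketch of what it might look like. It runs against SQLite through Python's sqlite3 module rather than SQL Server, and the two-column table layout and sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database; the Name/Id-only schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EmployeeSetA (Name TEXT, Id INTEGER);
    CREATE TABLE EmployeeSetB (Name TEXT, Id INTEGER);
    INSERT INTO EmployeeSetA VALUES ('Ann', 1), ('Bob', 2);
    INSERT INTO EmployeeSetB VALUES ('Bob', 2), ('Cid', 3);
""")

# Keep every row of A, plus the rows of B that have no
# (Name, Id) match in A. Comparing the columns directly avoids
# the string concatenation and lets the engine use indexes.
query = """
    SELECT Name, Id FROM EmployeeSetA
    UNION ALL
    SELECT b.Name, b.Id FROM EmployeeSetB b
    WHERE NOT EXISTS (
        SELECT 1 FROM EmployeeSetA a
        WHERE a.Name = b.Name AND a.Id = b.Id
    )
"""
rows = sorted(conn.execute(query).fetchall())
print(rows)
```

Unlike NOT IN, NOT EXISTS is not thrown off by NULLs in the subquery, which is one reason it tends to be preferred for this kind of anti-join.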

Can I select several tables in the same WITH query?

I have a long query with a with structure. At the end of it, I'd like to output two tables. Is this possible?
(The tables and queries are in snowflake SQL by the way.)
The code looks like this:
with table_a as (
select id,
product_a
from x.x ),
table_b as (
select id,
product_b
from x.y ),
table_c as (
..... many more alias tables and subqueries here .....
)
select * from table_g where z = 3 ;
But for the very last row, I'd like to query table_g twice, once with z = 3 and once with another condition, so I get two tables as the result. Is there a way of doing that (ending with two queries rather than just one) or do I have to re-run the whole code for each table I want as output?

One query = one result set. That's just the way RDBMSs work.
A CTE (WITH statement) is just syntactic sugar for a subquery.
For instance, a query similar to yours:
with table_a as (
select id,
product_a
from x.x ),
table_b as (
select id,
product_b
from x.y ),
table_c as (
select id,
product_c
from x.z )
select *
from table_a
inner join table_b on table_a.id = table_b.id
inner join table_c on table_b.id = table_c.id;
Is 100% identical to:
select *
from
(select id, product_a from x.x) table_a
inner join (select id, product_b from x.y) table_b
on table_a.id = table_b.id
inner join (select id, product_c from x.z) table_c
on table_b.id = table_c.id
The CTE version doesn't give you any extra features that aren't available in the non-cte version (with the exception of a recursive cte) and the execution path will be 100% the same (EDIT: Please see Simon's answer and comment below where he notes that Snowflake may materialize the derived table defined by the CTE so that it only has to perform that step once should the CTE be referenced multiple times in the main query). As such there is still no way to get a second result set from the single query.
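If the goal is to avoid re-running the whole chain for each output, one common workaround is to materialize the final result once and then run two cheap queries against it. The sketch below shows the idea with SQLite through Python's sqlite3 module; the source_rows table, its contents, and the second condition z = 5 are hypothetical, and whether the same pattern performs well with Snowflake temporary tables is an assumption to verify there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the real source tables.
    CREATE TABLE source_rows (id INTEGER, z INTEGER);
    INSERT INTO source_rows VALUES (1, 3), (2, 3), (3, 5);

    -- Run the CTE chain once and keep its final result.
    CREATE TEMP TABLE table_g AS
    WITH table_a AS (
        SELECT id, z FROM source_rows
    )
    SELECT id, z FROM table_a;
""")

# Two separate queries, two result sets, one pass over the CTE chain.
result_z3 = conn.execute(
    "SELECT id, z FROM table_g WHERE z = 3 ORDER BY id"
).fetchall()
result_z5 = conn.execute(
    "SELECT id, z FROM table_g WHERE z = 5 ORDER BY id"
).fetchall()
print(result_z3, result_z5)
```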

While the two forms are logically equivalent, they don't necessarily get the same execution plan.
The first case is when one of the stages in the CTE is expensive and is reused many times by other CTEs or joins. Under Snowflake, with that stage written as a CTE, I have witnessed it running the "expensive" part only a single time, which can be good. For example:
WITH expensive_select AS (
SELECT a.a, b.b, c.c
FROM table_a AS a
JOIN table_b AS b
JOIN table_c AS c
WHERE complex_filters
), do_some_thing_with_results AS (
SELECT a, stuff
FROM expensive_select
WHERE filters_1
), do_some_agregation AS (
SELECT a, SUM(b) as sum_b
FROM expensive_select
WHERE filters_2
GROUP BY a
)
SELECT a.a
,a.b
,b.stuff
,c.sum_b
FROM expensive_select AS a
LEFT JOIN do_some_thing_with_results AS b ON a.a = b.a
LEFT JOIN do_some_agregation AS c ON a.a = c.a;
This was originally unrolled, and the expensive part was some views: the date range filter applied at the top level was not getting pushed down into them (due to window functions), which resulted in full table scans, multiple times. By pushing the filters into the CTE, the cost was paid once. (In our case, putting date range filters in the CTE made Snowflake notice the filters and push them down into the view. Things can change, though: a few weeks later the original code ran as well as the modified version, so they "fixed" something.)
In other cases like this, the different paths that used the CTE used smaller subsets of the results, so using the CTE reduced the remote IO and improved performance, although there were then more stalls in the execution plan.
I also use CTEs like this to make the code easier to read, by giving the CTE a meaningful name but then aliasing it to something short for use. Really love that.

Mssql union two tables based on column value

I have two tables (tab_a and tab_b), both carrying customer data. The two tables have some records in common, or better, customers in common. The problem is that they are identified by different customer codes, and the name of the same customer may vary from one table to the other. The only common key they have is the VAT number.
What I need is a recordset with the customers from both tables but without duplicates.
I tried a regular UNION, but the problem is that if the name of a customer is written slightly differently from one table to the other, I get a duplicate.
In short, I need the result of
SELECT t1.vatnumber FROM tab_a AS t1 UNION SELECT t2.vatnumber FROM tab_b AS t2
but with the addition of the customer's name and customer code (taken from tab_a or, if not present in tab_a, from tab_b).
Any help is truly appreciated.
Regards

You are not going to get what you want readily with distinct. A better approach might be union all with aggregation. The following returns what you want -- but without precedence. That is, if a name or code exists in both tables, then an arbitrary one is returned:
select vatnumber, max(name) as name, max(code) as code
from ((select a.vatnumber, a.name, a.code
from tab_a a
) union all
(select b.vatnumber, b.name, b.code
from tab_b b
)
) ab
group by vatnumber;
If you want precedence, then it is a bit more cumbersome. Here is one method:
select vatnumber,
coalesce(max(case when which = 'a' then name end), max(name)) as name,
coalesce(max(case when which = 'a' then code end), max(code)) as code
from ((select a.vatnumber, a.name, a.code, 'a' as which
from tab_a a
) union all
(select b.vatnumber, b.name, b.code, 'b' as which
from tab_b b
)
) ab
group by vatnumber;
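To see the precedence rule in action, here is a small worked sketch. It runs the same query shape against SQLite through Python's sqlite3 module instead of SQL Server, and the customer rows are invented for illustration. The tab_b name for VAT1 is chosen to sort after the tab_a name, so a plain max(name) would pick the wrong one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab_a (vatnumber TEXT, name TEXT, code TEXT);
    CREATE TABLE tab_b (vatnumber TEXT, name TEXT, code TEXT);
    -- VAT1 exists in both tables with different spellings and codes.
    INSERT INTO tab_a VALUES ('VAT1', 'Acme Ltd', 'A-001');
    INSERT INTO tab_b VALUES ('VAT1', 'Zeta ACME Limited', 'B-900'),
                             ('VAT2', 'Bolt Inc', 'B-901');
""")

# Prefer the tab_a value when one exists; otherwise fall back
# to whatever value is available (here, from tab_b).
query = """
    SELECT vatnumber,
           COALESCE(MAX(CASE WHEN which = 'a' THEN name END), MAX(name)) AS name,
           COALESCE(MAX(CASE WHEN which = 'a' THEN code END), MAX(code)) AS code
    FROM (
        SELECT vatnumber, name, code, 'a' AS which FROM tab_a
        UNION ALL
        SELECT vatnumber, name, code, 'b' AS which FROM tab_b
    ) ab
    GROUP BY vatnumber
    ORDER BY vatnumber
"""
rows = conn.execute(query).fetchall()
print(rows)
```

VAT1 comes back with the tab_a name and code, while VAT2, which only exists in tab_b, falls through to the tab_b values.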

Don't use UNION, it is simpler with FULL OUTER JOIN:
SELECT
ISNULL(a.vatnumber, b.vatnumber) vatnumber,
ISNULL(a.name, b.name) name,
...
FROM tab_a a FULL JOIN tab_b b ON a.vatnumber = b.vatnumber
The exact syntax depends on the database you are using; this example should be OK for MSSQL.

Join or Union All To Combine Records?

I have 2 similar tables that contain campaign names. I know I can do a UNION ALL to combine the tables, but I was wondering if there is a way to do this using some form of join instead. I want to create a table Z with the campaign names from table A plus the campaign names from table B (which are not in A). Can I do this with a join, or is UNION ALL the only way?

UNION is the easier and correct way to do that. Purely for the exercise you can do it with a JOIN but it is a lot more complex, unreadable, and the perf will be way worse...

SELECT * INTO TABLEZ
FROM
(
SELECT Column1, Column2, Column3.... FROM TABLEA
UNION ALL
SELECT Column1, Column2, Column3.... FROM TABLEB
)Q

Here is how you would do this with a full outer join:
select distinct coalesce(a.campaign, b.campaign)
from a full outer join
b
on a.campaign = b.campaign;
The union/union all approach is totally reasonable. I'm just offering this as a join solution that you seem to be alluding to in the question.
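For comparison, here is a minimal sketch of two ways to get "everything in A plus whatever B has that A lacks". It uses SQLite through Python's sqlite3 module, and the campaign names are invented. The second query uses a join only to filter B (an anti-join) and still needs UNION ALL to stack the rows, which suggests why a pure join is the awkward tool here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (campaign TEXT);
    CREATE TABLE b (campaign TEXT);
    INSERT INTO a VALUES ('spring'), ('summer');
    INSERT INTO b VALUES ('summer'), ('winter');
""")

# UNION removes duplicates, so campaigns present in both appear once.
with_union = conn.execute(
    "SELECT campaign FROM a UNION SELECT campaign FROM b ORDER BY campaign"
).fetchall()

# Anti-join: keep all of A, then add only the B rows with no match in A.
with_anti_join = conn.execute("""
    SELECT campaign FROM a
    UNION ALL
    SELECT b.campaign FROM b
    LEFT JOIN a ON a.campaign = b.campaign
    WHERE a.campaign IS NULL
    ORDER BY campaign
""").fetchall()
print(with_union, with_anti_join)
```

The two differ when A itself contains repeated names: UNION collapses them, while the anti-join version keeps A exactly as it is.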

SQL getting results from two tables in order by primary key

In my query I want to get the rows from two different tables descending by their primary keys. They have two different keys so is it possible to be able to do this in one query?

Your question is a little vague. To "get rows from two different tables" you can do a JOIN, or you can do a UNION.
In the case of a JOIN:
SELECT a.id, a.something, b.id, b.something
FROM a
INNER JOIN b ON b.aId = a.id
ORDER BY a.id, b.id
In the case of a UNION:
SELECT id, something
FROM (
SELECT a.id, a.something FROM a
UNION
SELECT b.id, b.something FROM b
) t
ORDER BY t.id
These are very different, but it seems like one of them will meet your needs.
(Note that UNION by default eliminates duplicates. Use UNION ALL to keep duplicates.)

SELECT *
FROM ( select a.pk, a.foo, a.bar from a
union
select b.pk, b.foo, b.bar from b
) c
ORDER BY c.pk DESC;
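Since the two tables have separate key sequences, the same key value can appear in both. A variation on the query above, sketched here with SQLite through Python's sqlite3 module, adds a hypothetical src column so that each row shows which table it came from, and uses UNION ALL so that no rows are dropped. The table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (pk INTEGER PRIMARY KEY, foo TEXT);
    CREATE TABLE b (pk INTEGER PRIMARY KEY, foo TEXT);
    INSERT INTO a VALUES (1, 'a1'), (4, 'a4');
    INSERT INTO b VALUES (2, 'b2'), (4, 'b4');
""")

# Stack both tables, tag each row with its source table, and sort
# by key descending; src breaks ties when both tables share a key.
rows = conn.execute("""
    SELECT c.src, c.pk, c.foo
    FROM (
        SELECT 'a' AS src, pk, foo FROM a
        UNION ALL
        SELECT 'b' AS src, pk, foo FROM b
    ) c
    ORDER BY c.pk DESC, c.src
""").fetchall()
print(rows)
```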
