SQL with string concatenation is slow

I have a stored procedure which is taking 15 seconds to execute. The database is SQL Server 2008 R2.
The SQL query is as follows,
SELECT * FROM EmployeeSetA
UNION
SELECT * FROM EmployeeSetB
WHERE
Name+''+Id NOT IN (SELECT Name+''+Id FROM EmployeeSetA)
The query unions EmployeeSetA and EmployeeSetB, and includes a row from EmployeeSetB only if its Name and Id are not already in EmployeeSetA.
When I investigated, the string concatenation turned out to be what makes the query slow. Is there a better approach? Any suggestion will be greatly appreciated.

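You could stack the two tables and use except to get rid of the unwanted records. Feel free to change union all to union if that's what you actually want. Note the parentheses: without them, union all and except are evaluated left to right, so the except would also remove any row of EmployeeSetA that is identical to a matching row of EmployeeSetB. Having said that, not exists is the ideal solution.
select * from EmployeeSetA
union all
(select * from EmployeeSetB
except
select b.* from EmployeeSetB b
join EmployeeSetA a on a.name=b.name and a.id=b.id);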
Or more directly,
select * from EmployeeSetA
union all
select b.* from EmployeeSetB b
left join EmployeeSetA a on a.name=b.name and a.id=b.id
where a.name is null or a.id is null;
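The NOT EXISTS form recommended above is not written out, so here is a minimal sketch of it. It runs against an in-memory SQLite database from Python rather than SQL Server 2008 R2, so the concatenation operator is || instead of +, and the Dept column and all rows are made up for illustration. It also shows that the concatenation test is risky as well as slow: two different (Name, Id) pairs can concatenate to the same string.

```python
import sqlite3

# Hypothetical schema and rows: only the table names and the Name/Id
# columns come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EmployeeSetA (Name TEXT, Id TEXT, Dept TEXT);
    CREATE TABLE EmployeeSetB (Name TEXT, Id TEXT, Dept TEXT);
    INSERT INTO EmployeeSetA VALUES ('ab', '1', 'Sales'), ('cd', '2', 'Ops');
    INSERT INTO EmployeeSetB VALUES ('ab', '1', 'Legal'), ('a', 'b1', 'HR'), ('ef', '3', 'IT');
""")

# Compare the two columns separately, so an index on (Name, Id) can be used
# and no strings have to be built per row.
not_exists_sql = """
    SELECT * FROM EmployeeSetA
    UNION ALL
    SELECT b.* FROM EmployeeSetB AS b
    WHERE NOT EXISTS (
        SELECT 1 FROM EmployeeSetA AS a
        WHERE a.Name = b.Name AND a.Id = b.Id
    )
"""

# The original approach, with SQLite's || in place of SQL Server's +.
concat_sql = """
    SELECT * FROM EmployeeSetA
    UNION ALL
    SELECT * FROM EmployeeSetB
    WHERE Name || '' || Id NOT IN (SELECT Name || '' || Id FROM EmployeeSetA)
"""

rows_not_exists = sorted(conn.execute(not_exists_sql).fetchall())
rows_concat = sorted(conn.execute(concat_sql).fetchall())

# ('a', 'b1') is a different pair from ('ab', '1'), but both concatenate to
# 'ab1', so the concatenation version wrongly drops it.
print(rows_not_exists)
print(rows_concat)
```

UNION ALL is used here because the NOT EXISTS filter already removes the overlapping keys. Switch back to UNION if exact duplicate rows within a table should also be collapsed.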

Related

Can I select several tables in the same WITH query?

I have a long query with a with structure. At the end of it, I'd like to output two tables. Is this possible?
(The tables and queries are in snowflake SQL by the way.)
The code looks like this:
with table_a as (
select id,
product_a
from x.x ),
table_b as (
select id,
product_b
from x.y ),
table_c as (
..... many more alias tables and subqueries here .....
)
select * from table_g where z = 3 ;
But for the very last row, I'd like to query table_g twice, once with z = 3 and once with another condition, so I get two tables as the result. Is there a way of doing that (ending with two queries rather than just one) or do I have to re-run the whole code for each table I want as output?
One query = one result set. That's just the way RDBMSs work.
A CTE (WITH statement) is just syntactic sugar for a subquery.
For instance, a query similar to yours:
with table_a as (
select id,
product_a
from x.x ),
table_b as (
select id,
product_b
from x.y ),
table_c as (
select id,
product_c
from x.z )
select *
from table_a
inner join table_b on table_a.id = table_b.id
inner join table_c on table_b.id = table_c.id;
Is 100% identical to:
select *
from
(select id, product_a from x.x) table_a
inner join (select id, product_b from x.y) table_b
on table_a.id = table_b.id
inner join (select id, product_c from x.z) table_c
on table_b.id = table_c.id
The CTE version doesn't give you any extra features that aren't available in the non-cte version (with the exception of a recursive cte) and the execution path will be 100% the same (EDIT: Please see Simon's answer and comment below where he notes that Snowflake may materialize the derived table defined by the CTE so that it only has to perform that step once should the CTE be referenced multiple times in the main query). As such there is still no way to get a second result set from the single query.
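One common workaround, not mentioned in the answers, is to run the long WITH query once into a temporary table and then issue as many separate queries against it as needed. The sketch below uses an in-memory SQLite database from Python only so that it is runnable. The table x, its columns, and its rows are hypothetical stand-ins for the real CTE chain, and the second condition (z = 4) is invented. Snowflake has its own CREATE TEMPORARY TABLE ... AS SELECT, so check its documentation for the exact form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source data standing in for x.x, x.y and so on.
conn.executescript("""
    CREATE TABLE x (id INTEGER, product_a TEXT, z INTEGER);
    INSERT INTO x VALUES (1, 'apple', 3), (2, 'pear', 4), (3, 'plum', 3);
""")

# Run the (here trivially short) WITH query once and keep its output
# for the rest of the session.
conn.execute("""
    CREATE TEMP TABLE table_g AS
    WITH table_a AS (
        SELECT id, product_a, z FROM x
    )
    SELECT * FROM table_a
""")

# Two separate queries, two result sets, one evaluation of the CTE chain.
first = conn.execute("SELECT * FROM table_g WHERE z = 3 ORDER BY id").fetchall()
second = conn.execute("SELECT * FROM table_g WHERE z = 4 ORDER BY id").fetchall()

print(first)
print(second)
```

The trade-off is that the intermediate result is written out once, which only pays off when the CTE chain is expensive compared with the size of its output.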
While they are equivalent syntactically, they don't necessarily have the same execution plan.
The first case is when one of the stages in the CTE is expensive and is reused by other CTEs or joined to many times. With Snowflake, when such a stage is written as a CTE, I have witnessed it running the "expensive" part only a single time, which can be good. For example:
WITH expensive_select AS (
SELECT a.a, b.b, c.c
FROM table_a AS a
JOIN table_b AS b
JOIN table_c AS c
WHERE complex_filters
), do_some_thing_with_results AS (
SELECT a, stuff
FROM expensive_select
WHERE filters_1
), do_some_agregation AS (
SELECT a, SUM(b) as sum_b
FROM expensive_select
WHERE filters_2
GROUP BY a
)
SELECT a.a
,a.b
,b.stuff
,c.sum_b
FROM expensive_select AS a
LEFT JOIN do_some_thing_with_results AS b ON a.a = b.a
LEFT JOIN do_some_agregation AS c ON a.a = c.a;
This was originally unrolled, and the expensive part was some views. The date range filter applied at the top level was not getting pushed down into them (due to window functions), which resulted in full table scans, multiple times. Once the filters were pushed into the CTE, the cost was paid only once. (In our case, putting the date range filters in the CTE made Snowflake notice the filters and push them down into the view. Things can change, though: a few weeks later the original code ran as well as the modified version, so they "fixed" something.)
In other cases like this, the different paths that used the CTE each used smaller subsets of the results, so using the CTE reduced the remote IO and improved performance, although there were then more stalls in the execution plan.
I also use CTEs like this to make the code easier to read, by giving the CTE a meaningful name and then aliasing it to something short for use. Really love that.

get all rows from A plus missing rows from B

This seems so obvious but I am failing.
In Teradata SQL, how to get all rows from table A, plus those from table B, that do not occur in table A, based on key field key?
This must have been asked a thousand times. But honestly I do not find the answer.
Full outer join seems to give me duplicate "inner join" results.
--Edit , based on first comment (thanks) --
so if I would do
select * from A
union all
select * from B
left join A
on A.key = B.key
where A.key IS NULL
I guess that would work (untested) but is that the most performant way?
Sometimes EXISTS or NOT EXISTS performs better than joins:
select * from A
union all
select * from B
where not exists (
select 1 from A
where A.key = B.key
)
I assume the key columns are already indexed.
Your version is fine . . . if you select the right columns:
select A.* from A
union all
select B.*
from B left join
A
on A.key = B.key
where A.key IS NULL;
I think Teradata does a good job optimizing joins. That said, EXISTS is also a very reasonable option.
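To make the "select the right columns" point concrete, here is a small sketch run on an in-memory SQLite database from Python, not on Teradata. The column k stands in for the key field, and val and all rows are hypothetical. With select *, the second branch returns the columns of both B and A, so the union is rejected. With A.* and B.*, the left join form and the NOT EXISTS form return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: k plays the role of the key column from the question.
conn.executescript("""
    CREATE TABLE A (k INTEGER, val TEXT);
    CREATE TABLE B (k INTEGER, val TEXT);
    INSERT INTO A VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO B VALUES (2, 'b2'), (3, 'b3');
""")

# The untested version from the question: the second branch yields four
# columns (B.k, B.val, A.k, A.val) against two in the first branch.
try:
    conn.execute("""
        SELECT * FROM A
        UNION ALL
        SELECT * FROM B LEFT JOIN A ON A.k = B.k WHERE A.k IS NULL
    """)
    star_error = None
except sqlite3.OperationalError as exc:
    star_error = str(exc)

# Anti-join with explicit column lists per branch.
left_join_rows = sorted(conn.execute("""
    SELECT A.* FROM A
    UNION ALL
    SELECT B.* FROM B LEFT JOIN A ON A.k = B.k WHERE A.k IS NULL
""").fetchall())

# The NOT EXISTS formulation of the same request.
not_exists_rows = sorted(conn.execute("""
    SELECT * FROM A
    UNION ALL
    SELECT * FROM B WHERE NOT EXISTS (SELECT 1 FROM A WHERE A.k = B.k)
""").fetchall())

print(star_error)
print(left_join_rows)
print(not_exists_rows)
```

Which of the two forms is faster depends on the optimizer and the indexes, so it is worth comparing the plans on the real Teradata tables.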

How to retrieve only those rows of a table (db1) which are not in another table (db2)

I have a table t1 in db1, and another table t2 in db2. I have the same columns in both tables.
How do I retrieve only those rows which are not in the other table?
select id_num
from [db1].[dbo].[Tbl1]
except
select id_num
from [db2].[dbo].[Tb01]
You can use a LEFT JOIN or a WHERE NOT IN condition.
Using WHERE NOT IN:
select
dbase1.id_num from [db1].[dbo].[Tbl1] as dbase1
where dbase1.id_num not in
(select dbase2.id_num from [db2].[dbo].[Tb01] as dbase2)
Using LEFT JOIN (recommended, as this is often much faster):
SELECT dbase1.id_num
FROM [db1].[dbo].[Tbl1] as dbase1
LEFT JOIN [db2].[dbo].[Tb01] as dbase2 ON dbase2.id_num COLLATE Latin1_General_CI_A = dbase1.id_num COLLATE Latin1_General_CI_A
WHERE dbase2.id_num IS NULL
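Besides speed, there is a correctness difference between these forms when the second table can contain NULLs: NOT IN returns no rows at all if its subquery yields a NULL, while EXCEPT, LEFT JOIN ... IS NULL and NOT EXISTS are unaffected. The sketch below shows this on an in-memory SQLite database from Python. Both tables live in one database here rather than in db1 and db2, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical rows; note the NULL id_num in Tb01.
conn.executescript("""
    CREATE TABLE Tbl1 (id_num INTEGER);
    CREATE TABLE Tb01 (id_num INTEGER);
    INSERT INTO Tbl1 VALUES (1), (2), (3);
    INSERT INTO Tb01 VALUES (2), (NULL);
""")

def ids(sql):
    """Run a query and return its single column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# For id_num = 1, "1 NOT IN (2, NULL)" evaluates to NULL, not true,
# so every row is filtered out.
not_in = ids("""
    SELECT id_num FROM Tbl1
    WHERE id_num NOT IN (SELECT id_num FROM Tb01)
""")

except_rows = ids("""
    SELECT id_num FROM Tbl1
    EXCEPT
    SELECT id_num FROM Tb01
""")

left_join = ids("""
    SELECT t1.id_num
    FROM Tbl1 AS t1
    LEFT JOIN Tb01 AS t2 ON t2.id_num = t1.id_num
    WHERE t2.id_num IS NULL
""")

not_exists = ids("""
    SELECT t1.id_num
    FROM Tbl1 AS t1
    WHERE NOT EXISTS (SELECT 1 FROM Tb01 AS t2 WHERE t2.id_num = t1.id_num)
""")

print(not_in, except_rows, left_join, not_exists)
```

If NOT IN is kept, adding WHERE id_num IS NOT NULL inside the subquery avoids the empty result.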
To compare tables in DB2: other databases may have a select a - b statement or similar. Because my database didn't have a - b at the time, I use the following. Wrap the statement in a create table statement to dig into the results. If it returns no rows, the tables are identical. I've added a BEFORE|AFTER column, which makes the results easy to read.
SELECT 'AFTER', A.* FROM
(SELECT * FROM &AFTER
EXCEPT
SELECT * FROM &BEFORE) AS A
UNION
SELECT 'BEFORE', B.* FROM
(SELECT * FROM &BEFORE
EXCEPT
SELECT * FROM &AFTER) AS B

Multiple reusable SQL queries

Sorry this is a newbie question. I have one very long SQL query that is getting harder to manage. In fact there are some sub-queries that are being used multiple times. What is the best way to break up the query? I would prefer to keep it in the database, rather than take it out into the calling program. It goes something like this.
Select A, B, C
from (select D from Table_1 where ...)
Union Select E, F
from Table_2
Inner Join (Select D, E from Table_1 where...)..
So what I would like to do is
Result1 = select D,E from Table_1 where....
Result2 = Select A,B,C from Result1 Union Select E,F from Table_2 Inner Join Result1 ...
What is the best way to do this? I can't use Views because I don't have privileges. How can I use the results from the first query in the second query? Can cursors be used in this case?
Using a CTE you can access the same subquery multiple times (this is the main difference to Derived Tables):
with CTE as
(Select D, E from Table_1 where...)
Select A, B, C
from CTE
Union
Select E, F
from Table_2
Inner Join CTE ..
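The query above is elided, so here is a complete, runnable variant of the same idea, run on an in-memory SQLite database from Python. The column types, the rows, the filter D >= 2 and the join condition are all hypothetical, since the question leaves them out. The CTE is named Result1 to match the question's wording, and both branches of the union select two columns, because a union requires matching column counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables and rows; only the names Table_1, Table_2, D, E, F
# come from the question.
conn.executescript("""
    CREATE TABLE Table_1 (D INTEGER, E TEXT);
    CREATE TABLE Table_2 (D INTEGER, F TEXT);
    INSERT INTO Table_1 VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO Table_2 VALUES (2, 'two'), (3, 'three'), (4, 'four');
""")

# Result1 is defined once and referenced twice: directly in the first
# branch and as a join partner in the second.
rows = conn.execute("""
    WITH Result1 AS (
        SELECT D, E FROM Table_1 WHERE D >= 2
    )
    SELECT D, E FROM Result1
    UNION
    SELECT t.D, t.F
    FROM Table_2 AS t
    INNER JOIN Result1 AS r ON r.D = t.D
    ORDER BY 1, 2
""").fetchall()

print(rows)
```

A CTE only lives for the duration of the one statement, so it needs no extra privileges, unlike a view.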

PostgreSQL union two tables and join with a third table

I want to union two tables and join them with a third metadata table, and I would like to know which approach is the best/fastest.
The database is PostgreSQL.
Below are my two suggestions, but other approaches are welcome.
To do the join before the union on both tables:
SELECT a.id, a.feature_type, b.datetime, b.file_path
FROM table1 a, metadata b WHERE a.metadata_id = b.id
UNION ALL
SELECT a.id, a.feature_type, b.datetime, b.file_path
FROM table2 a, metadata b WHERE a.metadata_id = b.id
Or to do the union first and then do the join:
SELECT a.id, a.feature_type, b.datetime, b.file_path
FROM
(
SELECT id, feature_type, metadata_id FROM table1
UNION ALL
SELECT id, feature_type, metadata_id FROM table2
)a, metadata b
WHERE a.metadata_id = b.id
Run an EXPLAIN ANALYZE on both statements then you will see which one is more efficient.
It can be unpredictable because of the SQL engine's optimizer, so it's better to look at the execution plan. In the end, both approaches may be represented in the same way internally.
As far as I can remember, running EXPLAIN will reveal that PostgreSQL interprets the second as the first, provided that there is no group by clause (explicit, or implicit due to union instead of union all) in any of the subqueries.
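The advice to compare plans can be tried out on a small scale. The sketch below runs both formulations on an in-memory SQLite database from Python, checks that they return the same rows, and prints each plan. It uses SQLite's EXPLAIN QUERY PLAN rather than PostgreSQL's EXPLAIN ANALYZE, so the plans themselves say nothing about what PostgreSQL does. The column types and rows are hypothetical, and the comma joins are written as explicit JOIN ... ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical data; table and column names follow the question.
conn.executescript("""
    CREATE TABLE metadata (id INTEGER PRIMARY KEY, datetime TEXT, file_path TEXT);
    CREATE TABLE table1 (id INTEGER, feature_type TEXT, metadata_id INTEGER);
    CREATE TABLE table2 (id INTEGER, feature_type TEXT, metadata_id INTEGER);
    INSERT INTO metadata VALUES (10, '2020-01-01', '/a'), (20, '2020-01-02', '/b');
    INSERT INTO table1 VALUES (1, 'road', 10), (2, 'river', 20);
    INSERT INTO table2 VALUES (3, 'lake', 10), (4, 'road', 99);
""")

# Approach 1: join each table to metadata, then union.
join_first = """
    SELECT a.id, a.feature_type, b.datetime, b.file_path
    FROM table1 a JOIN metadata b ON a.metadata_id = b.id
    UNION ALL
    SELECT a.id, a.feature_type, b.datetime, b.file_path
    FROM table2 a JOIN metadata b ON a.metadata_id = b.id
"""

# Approach 2: union the tables, then join once.
union_first = """
    SELECT a.id, a.feature_type, b.datetime, b.file_path
    FROM (
        SELECT id, feature_type, metadata_id FROM table1
        UNION ALL
        SELECT id, feature_type, metadata_id FROM table2
    ) a JOIN metadata b ON a.metadata_id = b.id
"""

rows_join_first = sorted(conn.execute(join_first).fetchall())
rows_union_first = sorted(conn.execute(union_first).fetchall())

# Print the plan the engine chose for each formulation.
for label, sql in (("join first", join_first), ("union first", union_first)):
    print(label)
    for step in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("   ", step)
```

On PostgreSQL the equivalent check is to prefix each statement with EXPLAIN ANALYZE and compare the reported plans and timings on realistic data volumes.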