SQL select from either one or other table - sql

Assume I have a table A with a lot of records (> 100'000) and a table B which has the same columns as A and about the same amount of data.
Is there a clever single SELECT statement with which I can get either all records of table A or all records of table B?
I am not happy with my current approach because of its performance:
select
column1
,column2
,column3
from (
select 'A' as tablename, a.* from table_a a
union
select 'B' as tablename, b.* from table_b b
) x
where
x.tablename = 'A'

Offhand, your approach seems like the only approach in standard SQL.
You will improve performance considerably by changing the UNION to UNION ALL. A UNION must read the data from both tables and then eliminate duplicates before returning any rows.
UNION ALL does not eliminate duplicates. How much better it performs depends on the database engine and possibly on tuning parameters.
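The difference is easy to see on a small sample. Here is a sketch using SQLite via Python's sqlite3 module (with made-up data); UNION collapses the duplicate value while UNION ALL keeps every row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table_a (c1 INTEGER);
    CREATE TABLE table_b (c1 INTEGER);
    INSERT INTO table_a VALUES (1), (2);
    INSERT INTO table_b VALUES (2), (3);   -- 2 also exists in table_a
""")

# UNION must collect rows from both tables and deduplicate them first.
union = con.execute(
    "SELECT c1 FROM table_a UNION SELECT c1 FROM table_b").fetchall()
# UNION ALL just concatenates the two result sets.
union_all = con.execute(
    "SELECT c1 FROM table_a UNION ALL SELECT c1 FROM table_b").fetchall()

print(len(union), len(union_all))  # 3 4
```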
Actually, there is another possibility. I don't know how well it will work, but you can try it:
-- @TableName is a variable holding either 'A' or 'B'
select *
from ((select const.tableName, a.*
       from A cross join
            (select 'A' as tableName where @TableName = 'A') const
      ) union all
      (select const.tableName, b.*
       from B cross join
            (select 'B' as tableName where @TableName = 'B') const
      )
     ) t
No promises. But the idea is to cross join to a derived table with either one or zero rows. This will not work in MySQL, because it does not allow a WHERE clause without a FROM. In other databases, you might need a table name such as dual. This gives the query engine an opportunity to optimize away the read of a table entirely when its subquery contains no records. Of course, just because you give a SQL engine the opportunity to optimize does not mean that it will.
Also, the "*" is a bad idea, particularly in UNIONs. But I've left it in because that is not the focus of the question.
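The cross-join idea can be sketched in SQLite (which does allow WHERE without a FROM), with a bound parameter playing the role of the table-selector variable; the branch whose one-row subquery filters down to zero rows contributes nothing. Table and column names here are made up:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table_a (c1 INTEGER);
    CREATE TABLE table_b (c1 INTEGER);
    INSERT INTO table_a VALUES (1), (2);
    INSERT INTO table_b VALUES (3), (4);
""")

# Each branch cross joins against a subquery yielding one row or zero rows,
# depending on the parameter value.
sql = """
    SELECT const.tablename, a.c1
    FROM table_a a CROSS JOIN
         (SELECT 'A' AS tablename WHERE ? = 'A') const
    UNION ALL
    SELECT const.tablename, b.c1
    FROM table_b b CROSS JOIN
         (SELECT 'B' AS tablename WHERE ? = 'B') const
"""
rows = con.execute(sql, ("A", "A")).fetchall()
print(sorted(rows))  # [('A', 1), ('A', 2)]
```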

You can try the following solution; it selects only from table tmp1, because 'A' = 'A' is always true:
select *
from tmp1
where 'A' = 'A'
union all
select *
from tmp2
where 'B' = 'A'
SQL Fiddle demo here
Check the execution plan.
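A minimal SQLite sketch of the same constant-predicate trick (hypothetical tmp1/tmp2 data); the 'B' = 'A' branch can never produce rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tmp1 (c1 INTEGER);
    CREATE TABLE tmp2 (c1 INTEGER);
    INSERT INTO tmp1 VALUES (1), (2);
    INSERT INTO tmp2 VALUES (3), (4);
""")

# 'A' = 'A' is always true, 'B' = 'A' is always false, so only tmp1 rows
# appear in the result.
rows = con.execute("""
    SELECT c1 FROM tmp1 WHERE 'A' = 'A'
    UNION ALL
    SELECT c1 FROM tmp2 WHERE 'B' = 'A'
""").fetchall()
print(sorted(rows))  # [(1,), (2,)]
```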

Hard to tell exactly what you want without a little more context, but perhaps something like this could work?
DECLARE @TableName nvarchar(15);
DECLARE @Query nvarchar(50);
SELECT @TableName = YourField
FROM YourTable
WHERE ...
SET @Query = 'SELECT * FROM ' + @TableName
EXEC (@Query)
Syntax might differ a bit depending on which RDBMS you are using, and more specifically on what you are trying to accomplish, but this might be a push in the right direction.
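The same build-a-string-and-execute idea, sketched with Python's sqlite3 (table names here are made up). Since identifiers cannot be passed as bound parameters, the table name is validated against a whitelist before being spliced into the SQL string:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table_a (c1 INTEGER);
    INSERT INTO table_a VALUES (1), (2);
""")

table_name = "table_a"  # would come from a lookup query in the original

# Identifiers can't be bound as parameters, so whitelist them before
# interpolating into the SQL string, to avoid injection.
if table_name not in {"table_a", "table_b"}:
    raise ValueError("unexpected table name")

rows = con.execute(f"SELECT * FROM {table_name}").fetchall()
print(sorted(rows))  # [(1,), (2,)]
```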

The proper way to do this and maintain performance requires some modification to your physical table design.
If you can add a column to each table that holds your indicator value, and add a check constraint on that column, you can achieve "partition" elimination for your query.
DDL:
create table table_a (
c1 ...
,c2 ...
,c3 ...
,table_ind char(1) not null generated always as ('A')
,constraint ck_table_a_ind check (table_ind = 'A')
);
create table table_b (
c1 ...
,c2 ...
,c3 ...
,table_ind char(1) not null generated always as ('B')
,constraint ck_table_b_ind check (table_ind = 'B')
);
create view v1 as (
select * from table_a
union all
select * from table_b
);
If you execute the query select c1,c2,c3 from v1 where table_ind = 'A' the DB2 optimizer will use the check constraint to recognize that no rows in table_b can match the table_ind = 'A' predicate, so it will completely eliminate the table from the access plan.
This was used (and still is in some cases) before DB2 for Linux/UNIX/Windows supported Range Partitioning. You can read more about this technique in this research paper [PDF] written by some of the IBM DB2 developers back in 2002.
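The semantics of the indicator-column design can be sketched in SQLite, using DEFAULT in place of DB2's GENERATED ALWAYS (the constraint-based table elimination itself is a DB2 optimizer feature, so this only demonstrates the query behaviour, not the access-plan benefit):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table_a (
        c1 INTEGER,
        table_ind CHAR(1) NOT NULL DEFAULT 'A' CHECK (table_ind = 'A'));
    CREATE TABLE table_b (
        c1 INTEGER,
        table_ind CHAR(1) NOT NULL DEFAULT 'B' CHECK (table_ind = 'B'));
    INSERT INTO table_a (c1) VALUES (1), (2);
    INSERT INTO table_b (c1) VALUES (3);
    CREATE VIEW v1 AS
        SELECT * FROM table_a
        UNION ALL
        SELECT * FROM table_b;
""")

# Filtering the view on the indicator column returns only table_a's rows.
rows = con.execute("SELECT c1 FROM v1 WHERE table_ind = 'A'").fetchall()
print(sorted(rows))  # [(1,), (2,)]
```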

Related

(Teradata version) - get all records plus all corresponding records in another table

Can the following query be optimised for Teradata?
We need all records from small table A, plus all corresponding records from large table B that match on a non-unique key.
Or, in other words: everything except the rows from B that have no match in A.
Maybe something with a JOIN? Or a subselect with a non-correlated query; does that approach also work in Teradata?
SELECT a.nonunique
, a.colX
FROM small_tab a
UNION ALL
SELECT b.nonunique
, b.colY
FROM large_tab b
WHERE EXISTS (
SELECT 1
FROM small_tab a
WHERE a.nonunique = b.nonunique
);
Thanks for the help!
========= UPDATE =========
Based on quano's answer to this MySQL question, would the following statement with a non-correlated subquery also be faster in Teradata?
SELECT a.nonunique
, a.colX
FROM small_tab a
UNION ALL
SELECT b.nonunique
, b.colY
FROM large_tab b
WHERE b.nonunique IN
(
SELECT DISTINCT nonunique
FROM small_tab
GROUP BY nonunique
)
I cannot test in Teradata currently; I only have an Oracle instance at home.
I'm not sure whether it is a typo, but you have a redundant SELECT query after the WHERE clause. Also, you will have to use the same column name in the SELECT query that is used in the WHERE clause.
Below query works fine in Teradata.
SELECT a.nonunique, a.colX
FROM small_tab a
UNION ALL
SELECT b.nonunique, b.colY
FROM large_tab b
WHERE b.id IN (
SELECT id
FROM small_tab)
Hope it helps. If you have any questions about the above query, please let me know.
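The EXISTS and IN forms are both semi-joins and return the same rows (NULL keys aside); a SQLite sketch with made-up data confirming the equivalence:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE small_tab (nonunique INTEGER, colX TEXT);
    CREATE TABLE large_tab (nonunique INTEGER, colY TEXT);
    INSERT INTO small_tab VALUES (1, 'a'), (2, 'b');
    INSERT INTO large_tab VALUES (1, 'p'), (1, 'q'), (3, 'r');
""")

exists_q = """
    SELECT b.nonunique, b.colY FROM large_tab b
    WHERE EXISTS (SELECT 1 FROM small_tab a
                  WHERE a.nonunique = b.nonunique)"""
in_q = """
    SELECT b.nonunique, b.colY FROM large_tab b
    WHERE b.nonunique IN (SELECT nonunique FROM small_tab)"""

r1 = con.execute(exists_q).fetchall()
r2 = con.execute(in_q).fetchall()
print(sorted(r1) == sorted(r2), sorted(r1))  # True [(1, 'p'), (1, 'q')]
```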

How to create temporary result set from selected set of data in SQL?

For debugging purpose I want to create pseudo "result set" in order to join them, like:
with tmp_tbl as ( select v from dual where v in ('cat', 'dog', 'fish') )
select real_tbl.* from tmp_tbl
left outer join real_tbl on real_tbl.id = tmp_tbl.id;
I understand that the above expression is invalid and can be transformed into another form that works. But my real example is too complicated to show here.
My question is: how do I make this expression:
select v from dual where v in ('cat', 'dog', 'fish')
a valid result set so I can use it with joins and the FROM keyword?
dual doesn't have a v column. I'm looking for a way to bend the SQL syntax so I can avoid CREATE TABLE calls.
I'm still not quite sure what you're trying to do, but it looks to me like you want a dummy table with fixed values. If so you can select multiple dummy values from dual and union all the results, which will give you multiple rows. You can then use that as a sub-select, or if you're effectively masking a real table (from the 'debug' comment) then a CTE might be clearer:
with tmp_tbl as (
select 'cat' as id from dual
union all select 'dog' from dual
union all select 'fish' from dual
)
select tmp_tbl.id, read_tbl.*
from tmp_tbl
left outer join real_tbl
on real_tbl.id = tmp_tbl.id;
You referred to a v column in the text, but you're joining on id, so I've aliased the fixed value as id inside the CTE (it only needs to be named in the first row). You can just change that to something else if you prefer. And you can of course select several fixed values (with different aliases) in each select from dual to make it look more like a real table.
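The same fixed-value CTE works outside Oracle too; in SQLite the rows can come straight from a VALUES list (real_tbl and its columns are made up here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE real_tbl (id TEXT, name TEXT);
    INSERT INTO real_tbl VALUES ('cat', 'Tom'), ('dog', 'Rex');
""")

# The CTE plays the role of a temporary table with fixed values;
# 'fish' has no match in real_tbl, so its name comes back NULL.
rows = con.execute("""
    WITH tmp_tbl(id) AS (VALUES ('cat'), ('dog'), ('fish'))
    SELECT tmp_tbl.id, real_tbl.name
    FROM tmp_tbl
    LEFT OUTER JOIN real_tbl ON real_tbl.id = tmp_tbl.id
""").fetchall()
print(sorted(rows))  # [('cat', 'Tom'), ('dog', 'Rex'), ('fish', None)]
```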
For this purpose you can use subquery factoring, also known as “the with clause”
with t as
( select v from dual where v in ('cat','dog','fish') )
select * from t
Oracle may decide to materialize this result set internally or not. If you want to control this behavior, you can use the optimizer hints “materialize” and “inline”.
Hope this helps.
Regards,
Rob.
Just enclose the query in brackets and give it a name, then you can use it in joins as you wish:
SELECT *
FROM ( select v from dual where v in ('cat', 'dog', 'fish') ) tmp_table
JOIN other_table ON tmp_table.v = other_table.v
WHERE tmp_table.v = xxx etc

Select columnValue if the column exists otherwise null

I'm wondering if I can select the value of a column if the column exists and just select null otherwise. In other words I'd like to "lift" the select statement to handle the case when the column doesn't exist.
SELECT uniqueId
, columnTwo
, /*WHEN columnThree exists THEN columnThree ELSE NULL END*/ AS columnThree
FROM (subQuery) s
Note, I'm in the middle of solidifying my data model and design. I hope to remove this logic in the coming weeks, but I'd really like to get past this problem right now, because the data model fix is a more time-consuming endeavor than I'd like to tackle at the moment.
Also note, I'd like to be able to do this in one query. So I'm not looking for an answer like
check what columns are on your sub query first. Then modify your
query to appropriately handle the columns on your sub query.
You cannot do this with a simple SQL statement. A SQL query will not compile unless all table and column references in the query exist.
You can do this with dynamic SQL if the "subquery" is a table reference or a view.
In dynamic SQL, you would do something like:
declare @sql nvarchar(max) = '
SELECT uniqueId, columnTwo, '+
(case when exists (select *
from INFORMATION_SCHEMA.COLUMNS
where table_name = @TableName and
column_name = 'ColumnThree' -- and schema name too, if you like
)
then 'ColumnThree'
else 'NULL as ColumnThree'
end) + '
FROM (select * from '+@SourceName+') s
';
exec sp_executesql @sql;
For an actual subquery, you could approximate the same thing by checking to see if the subquery returned something with that column name. One method for this is to run the query: select top 0 * into #temp from (<subquery>) s and then check the columns in #temp.
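The "select zero rows and inspect the columns" probe translates naturally to application code; with Python's sqlite3, cursor.description lists the result columns without fetching any data (src is a stand-in for the subquery):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE src (uniqueId INTEGER, columnTwo TEXT)")

# WHERE 0 returns no rows, but the cursor still knows the column names,
# so we can test for columnThree before building the real query.
cur = con.execute("SELECT * FROM (SELECT * FROM src) s WHERE 0")
cols = [d[0] for d in cur.description]
print(cols, "columnThree" in cols)  # ['uniqueId', 'columnTwo'] False
```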
EDIT:
I don't usually update such old questions, but based on the comment below. If you have a unique identifier for each row in the "subquery", you can run the following:
select t..., -- everything but columnthree
(select columnthree -- not qualified!
from t t2
where t2.pk = t.pk
) as columnthree
from t cross join
(values (NULL)) v(columnthree);
The subquery will pick up column3 from the outer query if it doesn't exist. However, this depends critically on having a unique identifier for each row. The question is explicitly about a subquery, and there is no reason to expect that the rows are easily uniquely identified.
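The name-resolution trick also works in SQLite, which makes it easy to see in miniature (t, pk, and the column names are made up; note the unqualified columnthree inside the subquery):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# t deliberately has NO columnthree column.
con.executescript("""
    CREATE TABLE t (pk INTEGER PRIMARY KEY, columntwo TEXT);
    INSERT INTO t VALUES (1, 'x'), (2, 'y');
""")

# The unqualified columnthree resolves against t2 first; since t2 lacks
# it, it falls through to v.columnthree from the outer cross join.
rows = con.execute("""
    SELECT t.pk, t.columntwo,
           (SELECT columnthree FROM t t2 WHERE t2.pk = t.pk) AS columnthree
    FROM t CROSS JOIN (SELECT NULL AS columnthree) v
""").fetchall()
print(sorted(rows))  # [(1, 'x', None), (2, 'y', None)]
```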
As others already suggested, the sane approach is to have queries that meet your table design.
There is a rather exotic approach to achieve what you want in (pure, not dynamic) SQL though. A similar problem was posted at DBA.SE: How to select specific rows if a column exists or all rows if a column doesn't, but it was simpler, as only one row and one column were wanted as the result. Your problem is more complex, so the query is more convoluted, to say the least. Here it is, the insane approach:
; WITH s AS
(subquery) -- subquery
SELECT uniqueId
, columnTwo
, columnThree =
( SELECT ( SELECT columnThree
FROM s AS s2
WHERE s2.uniqueId = s.uniqueId
) AS columnThree
FROM (SELECT NULL AS columnThree) AS dummy
)
FROM s ;
It also assumes that the uniqueId is unique in the result set of the subquery.
Tested at SQL-Fiddle
And a simpler method which has the additional advantage that allows more than one column with a single subquery:
SELECT s.*
FROM
( SELECT NULL AS columnTwo,
NULL AS columnThree,
NULL AS columnFour
) AS dummy
CROSS APPLY
( SELECT
uniqueId,
columnTwo,
columnThree,
columnFour
FROM tableX
) AS s ;
The question has also been asked at DBA.SE and has been answered by @Andriy M (using CROSS APPLY too!) and Michael Ericsson (using XML):
Why can't I use a CASE statement to see if a column exists and not SELECT from it?
You can use dynamic SQL.
First you need to check that the column exists, and then build the dynamic query.
DECLARE @query NVARCHAR(MAX) = '
SELECT FirstColumn, SecondColumn, '+
(CASE WHEN exists (SELECT 1 FROM syscolumns
WHERE name = 'ColumnName' AND id = OBJECT_ID('TableName'))
THEN 'ColumnName'
ELSE 'NULL as ColumnName'
END) + '
FROM TableName'
EXEC sp_executesql @query;

When #tmp table is null ignore where

I'm trying to ignore my WHERE condition when a #tmp table is empty.
Like:
create table #tmp
(
my_id int
)
create table #tmp2
(
my_name_id int
)
select * from foo
where foo_id in (select my_id from #tmp)
and foo_name_id in (select my_name_id from #tmp2)
And now the case:
when one of the tables is empty, the query will not generate any result.
#tmp is not empty
#tmp2 is empty
So my condition on #tmp2 should be ignored.
Got any clue how to do it?
Just add additional conditions:
select * from foo
where (foo_id in (select my_id from #tmp) or not exists(select * from #tmp))
and (foo_name_id in (select my_name_id from #tmp2) or not exists(select * from #tmp2))
The general form you've adopted, however, makes it look like you're taking quite a procedural approach to SQL, storing partial results in temp tables and then combining them at the end. It's usually better to write the entire desired result as a single query and let SQL Server work out how best to compute it (and cache intermediate results if required).
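A quick SQLite check of the OR NOT EXISTS pattern (made-up data; tmp2 is left empty so its filter drops out):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE foo (foo_id INTEGER, foo_name_id INTEGER);
    CREATE TABLE tmp (my_id INTEGER);
    CREATE TABLE tmp2 (my_name_id INTEGER);
    INSERT INTO foo VALUES (1, 10), (2, 20);
    INSERT INTO tmp VALUES (1);          -- tmp2 stays empty
""")

# Each filter applies only when its table is non-empty; the NOT EXISTS
# branch makes the condition vacuously true for an empty table.
rows = con.execute("""
    SELECT * FROM foo
    WHERE (foo_id IN (SELECT my_id FROM tmp)
           OR NOT EXISTS (SELECT * FROM tmp))
      AND (foo_name_id IN (SELECT my_name_id FROM tmp2)
           OR NOT EXISTS (SELECT * FROM tmp2))
""").fetchall()
print(rows)  # [(1, 10)] -- the empty tmp2 condition was ignored
```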

updating changes rows

I have a requirement to update a couple of thousand rows in a table based on whether any changes have happened to any of the values. At the moment I'm just updating all the values regardless, but I was wondering which is more efficient. Should I check all the columns to see if there are any changes and only then update, or should I just update regardless? E.g.
update someTable Set
column1 = somevalue,
column2 = somevalue,
column3 = somevalue,
etc....
from someTable inner join sometable2 on
someTable.id = sometable2.id
where
someTable.column1 != sometable2.column1 or
someTable.column2 != sometable2.column2 or
someTable.column3 != sometable2.column3 or
etc etc......
What's faster, and what's best practice?
See two articles on Paul White's Blog.
The Impact of Non-Updating Updates for discussion of the main issue.
Undocumented Query Plans: Equality Comparisons for a less tedious way of doing the inequality comparisons particularly if your columns are nullable (WHERE NOT EXISTS (SELECT someTable.* INTERSECT SELECT someTable2.*)).
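The INTERSECT trick is null-safe where != is not; SQLite also accepts it, so a small sketch (hypothetical tables) shows the difference on a row that changes from NULL to a value:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE someTable  (id INTEGER, c1 INTEGER, c2 TEXT);
    CREATE TABLE someTable2 (id INTEGER, c1 INTEGER, c2 TEXT);
    INSERT INTO someTable  VALUES (1, 10, NULL), (2, 20, 'a');
    INSERT INTO someTable2 VALUES (1, 10, 'x'),  (2, 20, 'a');
""")

# Row 1 differs only NULL-vs-'x'; c2 != c2 evaluates to NULL there, so a
# plain inequality WHERE clause would miss it. INTERSECT compares the two
# rows null-safely: if they differ, the intersection is empty.
changed = con.execute("""
    SELECT t1.id
    FROM someTable t1 JOIN someTable2 t2 ON t1.id = t2.id
    WHERE NOT EXISTS (SELECT t1.c1, t1.c2
                      INTERSECT
                      SELECT t2.c1, t2.c2)
""").fetchall()
print(changed)  # [(1,)]
```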
I believe this is the best way.
Tables and data:
declare @someTable1 table(id int, column1 int, column2 varchar(2))
declare @someTable2 table(id int, column1 int, column2 varchar(2))
insert @someTable1
select 1, 10, 'a3'
union all select 2, 20, 'a3'
union all select 3, null, 'a4'
insert @someTable2
select 1, 10, 'a3'
union all select 2, 19, 'a3'
union all select 3, null, 'a5'
Update:
UPDATE t1
set t1.column1 = t2.column1,
t1.column2 = t2.column2
from @someTable1 t1
JOIN
(select * from @someTable2
EXCEPT
select * from @someTable1) t2
on t2.id = t1.id
Result:
select * from @someTable1

id          column1     column2
----------- ----------- -------
1           10          a3
2           19          a3
3           NULL        a5
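A portable SQLite version of the same idea, using correlated subqueries for the SET (UPDATE ... FROM requires a newer SQLite) while EXCEPT does the null-safe change detection:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE someTable1 (id INTEGER, column1 INTEGER, column2 TEXT);
    CREATE TABLE someTable2 (id INTEGER, column1 INTEGER, column2 TEXT);
    INSERT INTO someTable1 VALUES (1, 10, 'a3'), (2, 20, 'a3'), (3, NULL, 'a4');
    INSERT INTO someTable2 VALUES (1, 10, 'a3'), (2, 19, 'a3'), (3, NULL, 'a5');
""")

# EXCEPT compares whole rows null-safely, so only genuinely changed rows
# (ids 2 and 3 here) are touched by the update.
con.execute("""
    UPDATE someTable1
    SET column1 = (SELECT column1 FROM someTable2
                   WHERE someTable2.id = someTable1.id),
        column2 = (SELECT column2 FROM someTable2
                   WHERE someTable2.id = someTable1.id)
    WHERE id IN (SELECT id FROM
                 (SELECT * FROM someTable2
                  EXCEPT
                  SELECT * FROM someTable1))
""")
rows = con.execute("SELECT * FROM someTable1 ORDER BY id").fetchall()
print(rows)  # [(1, 10, 'a3'), (2, 19, 'a3'), (3, None, 'a5')]
```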
I've found that explicitly including a WHERE clause that excludes no-op updates performs faster when working against large tables, but this is very much a YMMV type of question.
If possible, compare the two approaches side by side against a realistic set of data. E.g. if your tables contain millions of rows and the updates affect only 10, make sure your sample data affects just a few rows. Likewise, if it's likely that most rows will change, make your sample data reflect that.