In my current project there is a query where a set of parameters is given and I need to check those parameters against another table. Each of these parameters can be NULL and in this case has to be ignored. What I currently do is the following:
SELECT t.col1,
t.col2,
t.col3,
t.col4,
t.col5,
t.col6,
t.col7,
t.col8
FROM table1 t
INNER JOIN #parameters p ON (p.col1 IS NULL OR p.col1 = t.col1)
AND (p.col2 IS NULL OR p.col2 = t.col2)
AND (p.col3 IS NULL OR p.col3 = t.col3)
AND (p.col4 IS NULL OR p.col4 = t.col4)
AND (p.col5 IS NULL OR p.col5 = t.col5)
AND (p.col6 IS NULL OR p.col6 = t.col6)
AND (p.col7 IS NULL OR p.col7 >= t.col7)
AND (p.col8 IS NULL OR p.col8 <= t.col8)
This means that if a column in the parameters table is NULL, it is ignored; otherwise it is compared to the corresponding column in table1. This works, but unfortunately it is VERY slow. Does anybody know a better solution (other than concatenating a string query)?
It seems like you don't have any real criteria that could be used to limit the data in your table, and that kind of structure rarely performs well. As far as I know, there's not much you can do to improve that.
Is any one of these columns included in the parameters often (for all rows), and could it limit the data a lot? If so, you could use UNION to do something like this:
SELECT ...
FROM table1 t
INNER JOIN #parameters p ON p.col1 = t.col1 ...
union
SELECT ...
FROM table1 t
INNER JOIN #parameters p ... where p.col1 is NULL
If you're lucky something like that might work.
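A minimal sketch of that UNION split, using Python's built-in sqlite3 module as a stand-in for SQL Server. The temp table is modelled as a plain table named parameters, only two of the eight columns are kept, and the data is made up. It checks that splitting on col1 returns the same rows as the original OR-style join. Note that UNION also removes duplicate rows, which the original join would keep if several parameter rows matched the same table1 row.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (data is made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (col1 INTEGER, col2 INTEGER);
CREATE TABLE parameters (col1 INTEGER, col2 INTEGER);
INSERT INTO table1 VALUES (1, 10), (1, 20), (2, 10), (3, 30);
-- First parameter row filters on both columns; the second ignores col1.
INSERT INTO parameters VALUES (1, 20), (NULL, 30);
""")

# Original shape: a NULL parameter means "ignore this column".
OR_JOIN = """
SELECT t.col1, t.col2
FROM table1 t
JOIN parameters p
  ON (p.col1 IS NULL OR p.col1 = t.col1)
 AND (p.col2 IS NULL OR p.col2 = t.col2)
"""

# UNION split on col1: the first branch has a plain equality on col1,
# the second handles only parameter rows whose col1 is NULL.
UNION_SPLIT = """
SELECT t.col1, t.col2
FROM table1 t
JOIN parameters p
  ON p.col1 = t.col1
 AND (p.col2 IS NULL OR p.col2 = t.col2)
UNION
SELECT t.col1, t.col2
FROM table1 t
JOIN parameters p
  ON (p.col2 IS NULL OR p.col2 = t.col2)
WHERE p.col1 IS NULL
"""

or_rows = sorted(conn.execute(OR_JOIN).fetchall())
union_rows = sorted(conn.execute(UNION_SPLIT).fetchall())
print(or_rows)     # [(1, 20), (3, 30)]
print(union_rows)  # [(1, 20), (3, 30)]
```

The benefit on a real server would come from the first branch being able to seek on an index over col1; whether that happens has to be checked in the execution plan.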
The other option that comes to mind is to somehow iterate over the rows in the #parameters table, which is probably what you meant by string concatenation: either build dynamic SQL with OR clauses or UNIONs, or create a temp table (perhaps with an IGNORE_DUP_KEY index) and build and run a dynamic INSERT statement for each row in the parameters table, one by one.
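If you do go the per-row route, here is a rough sketch of the idea in Python, with sqlite3 standing in for SQL Server. For each parameter row, only the non-NULL columns become predicates, values are bound as placeholders rather than concatenated into the string, and a Python set plays the role of the IGNORE_DUP_KEY table. The column whitelist, the table contents and the build_query helper are all hypothetical.

```python
import sqlite3

ALLOWED_COLUMNS = ("col1", "col2", "col3")  # hypothetical column whitelist


def build_query(params):
    """Build one SELECT using only the non-NULL entries of a parameter row.

    `params` maps column name -> value; None means "ignore this column".
    Column names come from a fixed whitelist; values become ? placeholders.
    """
    clauses, args = [], []
    for col in ALLOWED_COLUMNS:
        value = params.get(col)
        if value is not None:
            clauses.append(f"t.{col} = ?")
            args.append(value)
    sql = "SELECT t.col1, t.col2, t.col3 FROM table1 t"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, args


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (col1 INTEGER, col2 INTEGER, col3 INTEGER);
INSERT INTO table1 VALUES (1, 10, 100), (1, 20, 200), (2, 10, 100);
""")

parameter_rows = [
    {"col1": 1, "col2": None, "col3": None},  # only col1 is filtered
    {"col1": None, "col2": 10, "col3": 100},  # col2 and col3 are filtered
]

results = set()  # de-duplicates rows, like an IGNORE_DUP_KEY index would
for row in parameter_rows:
    sql, args = build_query(row)
    results.update(conn.execute(sql, args).fetchall())
print(sorted(results))  # [(1, 10, 100), (1, 20, 200), (2, 10, 100)]
```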
Related
I ran into a problem today that I couldn't quite understand, so I was hoping for some outside knowledge. I was trying to find the number of items in a table where their id isn't referenced in another. I ran two different queries and seem to have conflicting results.
select count(*)
from TableA
where ID not in (select aID from TableB)
returns 0
select count(*)
from TableA a
left join TableB b on b.aID = a.ID
where b.aID is null
returns a few thousand.
All IDs in both TableA and TableB are unique. An ID from TableA never shows up in the aID column from TableB more than once. To me, it seems like I am querying the same thing but receiving different results. Where am I going wrong?
Do not use NOT IN with a subquery. If any value returned by the subquery is NULL, then all rows are filtered out. These are the rules of how NULL is defined in SQL. The LEFT JOIN is correct.
The reason is that NULL means an unknown value. Almost any comparison with NULL returns NULL (unknown), which a WHERE clause treats as false. So when the subquery contains a NULL, NOT IN can only go two ways: either an element matches what you are looking for, and the expression returns false, or an element is NULL, and the expression returns NULL, which is also treated as false.
I usually advise replacing the NOT IN with NOT EXISTS:
select count(*)
from TableA a
where not exists (select 1 from TableB b where b.aID = a.ID);
The LEFT JOIN performs correctly and usually has good performance.
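To see the difference concretely, here is a small reproduction using Python's sqlite3 module. SQLite applies the same NULL rules to NOT IN as SQL Server; the three-row data set is invented. A single NULL aID in TableB is enough to make NOT IN return nothing, while NOT EXISTS and the LEFT JOIN agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (ID INTEGER);
CREATE TABLE TableB (aID INTEGER);
INSERT INTO TableA VALUES (1), (2), (3);
-- TableB references ID 1 and also contains one NULL aID.
INSERT INTO TableB VALUES (1), (NULL);
""")

# 2 NOT IN (1, NULL) evaluates to NULL, so every row is filtered out.
not_in = conn.execute(
    "SELECT COUNT(*) FROM TableA WHERE ID NOT IN (SELECT aID FROM TableB)"
).fetchone()[0]

# NOT EXISTS only asks whether a matching row exists; the NULL never matches.
not_exists = conn.execute(
    "SELECT COUNT(*) FROM TableA a "
    "WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.aID = a.ID)"
).fetchone()[0]

# LEFT JOIN keeps unmatched TableA rows with b.aID set to NULL.
left_join = conn.execute(
    "SELECT COUNT(*) FROM TableA a "
    "LEFT JOIN TableB b ON b.aID = a.ID WHERE b.aID IS NULL"
).fetchone()[0]

print(not_in, not_exists, left_join)  # 0 2 2
```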
You should always use the EXISTS operator if the columns involved are nullable. EXISTS is also often faster than IN.
Using the IN/NOT IN operators might produce an inferior plan, and NOT IN can lead to misleading results if a NULL value is inserted in the table, just as in your case.
I have a query in SQL Server 2014 that takes a lot of time to get the results when I execute it.
When I remove either the TOP or the ORDER BY clause, it executes faster, but if I write both of them, it takes a lot of time.
SELECT TOP (10) A.ColumnValue AS ValueA
FROM TableA AS A
INNER JOIN TableB AS B
ON A.ID = B.ID
WHERE A.DateValue > '1982-05-02'
ORDER BY ValueA
How could I make it faster?
You say
When I remove the TOP or the ORDER BY ... it executes faster
That would indicate that SQL Server has no problem generating the entire result set in the desired order. It just goes pear-shaped with the limit of TOP 10. This is a common issue with row goals: when SQL Server knows you only need the first few results, it can choose a different plan, optimised for that case, which can backfire.
More recent versions (SQL Server 2016 SP1 and later) include the hint USE HINT ('DISABLE_OPTIMIZER_ROWGOAL') to disable this on a per-query basis. On older versions you can use QUERYTRACEON 4138, as below.
SELECT TOP (10) A.ColumnValue AS ValueA
FROM TableA AS A
INNER JOIN TableB AS B
ON A.ID = B.ID
WHERE A.DateValue > '1982-05-02'
ORDER BY ValueA
OPTION (QUERYTRACEON 4138)
You can use this to verify the cause, but you may find that the permissions needed to run QUERYTRACEON are a problem.
In that eventuality you can hide the TOP value in a variable as below
DECLARE @Top INT = 10;
SELECT TOP (@Top) A.ColumnValue AS ValueA
FROM TableA AS A
INNER JOIN TableB AS B
ON A.ID = B.ID
WHERE A.DateValue > '1982-05-02'
ORDER BY ValueA
-- Compile the plan as if @Top were large, so no small row goal is applied.
OPTION (OPTIMIZE FOR (@Top = 1000000));
Create indexes based on the ID column of both tables:
CREATE INDEX index_nameA
ON TableA (ID, DateValue)
;
CREATE INDEX index_nameB
ON TableB (ID)
They should help the optimizer produce a better plan at query execution time.
The best way would be to use the indexes to improve performance.
Here, the index could be put on (DateValue).
For uses of indexes, refer to this URL: using indexes
This is pretty hopeless, unless most of your data has an earlier date. If the date is special, you could create a computed persisted column to speed up the query in general. However, I doubt that is the case.
I can envision a better execution plan for the query phrased this way:
SELECT TOP (10) A.ColumnValue AS ValueA
FROM TableA A
WHERE EXISTS (SELECT 1 FROM TableB b WHERE A.ID = B.ID) AND
A.DateValue > '1982-05-02'
ORDER BY ValueA;
with indexes on TableA(ColumnValue, DateValue, ID) and TableB(ID). That execution plan would scan the index from the beginning, test DateValue and ID for each entry, and return ColumnValue for the matching rows.
However, I don't think SQL Server would generate this plan (although it is worth a try), and I don't know how to force it if it doesn't.
I have a view that returns some NULL values for the columns B.EMISSOR and B.INDEXADOR. When they are NULL, I need to look these values up first in table TB_CAD_RF and, if they are still NULL, query TB_CAD_RF_2.
I tried the logic below, but it's not working. I also tried to come up with something using CASE statements but can't figure it out.
Could anyone help me, please?
select A.NM_ATIVO, B.EMISSOR, B.INDEXADOR from VW_POSICAO as A
LEFT JOIN
TB_CAD_RF B on A.NM_ATIVO = B.CODIGO
where a.NM_EMISSOR is null
as C LEFT JOIN (
select C.EMISSOR, C.INDEXADOR from TB_CAD_RF_2 as D ON B.NM_ATIVO = C.CODIGO where C.EMISSOR is null)
This pattern:
SELECT
COALESCE( first.choice, second.choice, third.choice) as a
FROM
first
LEFT JOIN second on first.id = second.id
LEFT JOIN third on first.id = third.id
COALESCE returns the first non-NULL argument passed to it, scanning from left to right.
Here is one way. Always join both and coalesce the fields in the order of desired results.
select A.NM_ATIVO,
EMISSOR = COALESCE(A.EMISSOR, B.EMISSOR, D.EMISSOR),
INDEXADOR = COALESCE(A.INDEXADOR, B.INDEXADOR, D.INDEXADOR)
from VW_POSICAO A
LEFT JOIN TB_CAD_RF B on A.NM_ATIVO = B.CODIGO
LEFT JOIN TB_CAD_RF_2 D ON A.NM_ATIVO =D.CODIGO
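A quick check of that fallback order, sketched with Python's sqlite3 module instead of SQL Server. VW_POSICAO is modelled as a plain table, each table keeps only the columns the query uses, and the rows are made up: X is resolved from the view itself, Y from TB_CAD_RF and Z from TB_CAD_RF_2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins for the view and the two lookup tables.
CREATE TABLE VW_POSICAO (NM_ATIVO TEXT, EMISSOR TEXT, INDEXADOR TEXT);
CREATE TABLE TB_CAD_RF (CODIGO TEXT, EMISSOR TEXT, INDEXADOR TEXT);
CREATE TABLE TB_CAD_RF_2 (CODIGO TEXT, EMISSOR TEXT, INDEXADOR TEXT);
INSERT INTO VW_POSICAO VALUES
    ('X', 'viewEmissor', 'viewIdx'), ('Y', NULL, NULL), ('Z', NULL, NULL);
INSERT INTO TB_CAD_RF VALUES ('Y', 'rfEmissor', 'rfIdx'), ('Z', NULL, NULL);
INSERT INTO TB_CAD_RF_2 VALUES ('Z', 'rf2Emissor', 'rf2Idx');
""")

# Both lookup tables are always joined; COALESCE picks the first non-NULL.
rows = conn.execute("""
SELECT A.NM_ATIVO,
       COALESCE(A.EMISSOR, B.EMISSOR, D.EMISSOR) AS EMISSOR,
       COALESCE(A.INDEXADOR, B.INDEXADOR, D.INDEXADOR) AS INDEXADOR
FROM VW_POSICAO A
LEFT JOIN TB_CAD_RF B ON A.NM_ATIVO = B.CODIGO
LEFT JOIN TB_CAD_RF_2 D ON A.NM_ATIVO = D.CODIGO
ORDER BY A.NM_ATIVO
""").fetchall()
print(rows)
```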
ISNULL((select col from table1 where ...), (select col from table2 where ...))
ISNULL works a bit like an IIF statement: if the first expression (here a scalar subquery returning a single value) is NULL, it returns the second one instead, which can be an alternate query or a fixed alternate value.
Syntax may be a bit off, as I haven't written this query in a while, but it should put you on the right path. Google SQL ISNULL.
We have a large-ish query here that has several params, and for each one, the query only differs by one portion of the where clause, like so:
CASE WHEN @IncludeNames = 1 AND @NameFilter IS NULL THEN
(SELECT blah FROM blahBlah
INNER JOIN ...
INNER JOIN ...
INNER JOIN ...
WHERE blahBlah.Id = x.Id)
WHEN @IncludeNames = 1 AND @NameFilter IS NOT NULL THEN
(SELECT blah FROM blahBlah
INNER JOIN ...
INNER JOIN ...
INNER JOIN ...
WHERE blahBlah.Id = x.Id
AND table2.Id = @NameFilter)
It goes on like that for several instances, differing only by one condition on the where clause.
Keep in mind this is in the middle of a larger select.
Is there a good way of cleaning this up, without placing it all into one large concatenated sql string and running exec on it, or using something absurd like multiple stored procs per block, as shown here: http://www.developerfusion.com/article/7305/dynamic-search-conditions-in-tsql/7/
Server is SQL Server 2008 R2. TIA!
Try setting up your query so that each clause accepts either a specific value or an "all" value, e.g.:
SELECT x.*
FROM x
WHERE (x.id = @NameFilter
OR @NameFilter is null)
AND (x.typeId = @typeFilter
OR -1 = @typeFilter)
AND (x.date = @date
OR @date is null)
AND (x.someStringType = @someStringType
OR '' = @someStringType)
This should allow you to concatenate your clauses into a single select statement. Each parameter may apply a filter or have no effect (if set to the default such as null, empty string or -1).
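A runnable sketch of this catch-all pattern using Python's sqlite3 module, where named placeholders such as :typeFilter stand in for the T-SQL @ parameters and the date filter is left out for brevity. The table contents and the search helper are hypothetical. Each default (NULL, -1 or '') turns its predicate into a no-op.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE x (id INTEGER, typeId INTEGER, someStringType TEXT);
INSERT INTO x VALUES (1, 5, 'a'), (2, 5, 'b'), (3, 7, 'a');
""")

# Each predicate is switched off by its "all" default: NULL, -1 or ''.
CATCH_ALL = """
SELECT id FROM x
WHERE (x.id = :nameFilter OR :nameFilter IS NULL)
  AND (x.typeId = :typeFilter OR -1 = :typeFilter)
  AND (x.someStringType = :someStringType OR '' = :someStringType)
ORDER BY id
"""


def search(name_filter=None, type_filter=-1, some_string_type=""):
    """Run the single catch-all query; defaults mean "no filter"."""
    params = {
        "nameFilter": name_filter,
        "typeFilter": type_filter,
        "someStringType": some_string_type,
    }
    return [row[0] for row in conn.execute(CATCH_ALL, params)]


print(search())                                     # [1, 2, 3]
print(search(type_filter=5))                        # [1, 2]
print(search(type_filter=5, some_string_type="a"))  # [1]
```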
Okay, I know there are a few posts that discuss this, but my problem cannot be solved by a conditional where statement on a join (the common solution).
I have three join statements, and depending on the query parameters, I may need to run any combination of the three. My Join statement is quite expensive, so I want to only do the join when the query needs it, and I'm not prepared to write a 7 combination IF..ELSE.. statement to fulfill those combinations.
Here is what I've used for solutions thus far, but all of these have been less than ideal:
LEFT JOIN joinedTable jt
ON jt.someCol = someCol
WHERE jt.someCol = conditions
OR @neededJoin is null
(This is just too expensive, because I'm performing the join even when I don't need it, just not evaluating the join)
OUTER APPLY
(SELECT TOP(1) * FROM joinedTable jt
WHERE jt.someCol = someCol
AND @neededjoin is null)
(this is even more expensive than always left joining)
SELECT @sql = @sql + ' INNER JOIN joinedTable jt ' +
' ON jt.someCol = someCol ' +
' WHERE (conditions...) '
(this one is IDEAL, and how it is written now, but I'm trying to convert it away from dynamic SQL).
Any thoughts or help would be great!
EDIT:
If I take the dynamic SQL approach, I'm trying to figure out what would be most efficient with regard to structuring my query. Given that I have three optional conditions, and I need the results from all of them, my current query does something like this:
IF condition one
SELECT from db
INNER JOIN condition one
UNION
IF condition two
SELECT from db
INNER JOIN condition two
UNION
IF condition three
SELECT from db
INNER JOIN condition three
My non-dynamic query does this task by performing left joins:
SELECT from db
LEFT JOIN condition one
LEFT JOIN condition two
LEFT JOIN condition three
WHERE condition one is true
OR condition two is true
OR condition three is true
Which makes more sense, given that all of the code from the "SELECT from db" statement is the same? It appears that the UNION version is more efficient, but my query is VERY long because of it....
Thanks!
LEFT JOIN
joinedTable jt ON jt.someCol = someCol AND jt.someCol = conditions AND @neededjoin ...
...
OR
LEFT JOIN
(
SELECT col1, someCol, col2 FROM joinedTable WHERE someCol = conditions AND @neededjoin ...
) jt ON jt.someCol = someCol
...
OR
;WITH jtCTE AS
(SELECT col1, someCol, col2 FROM joinedTable WHERE someCol = conditions AND @neededjoin ...)
SELECT
...
LEFT JOIN
jtCTE ON jtCTE.someCol = someCol
...
To be honest, there is no such construct as a conditional JOIN unless you use literals.
If it's in the SQL statement, it's evaluated, so keep it out of the SQL statement by using dynamic SQL or IF ... ELSE.
The dynamic SQL solution is usually the best for these situations, but if you really need to get away from it, a series of IF statements in a stored proc will do the job. It's a pain and you have to write much more code, but it will be faster than trying to make joins conditional in the statement itself.
I would go for a simple and straightforward approach like this:
DECLARE @ret TABLE(...) ;
IF <condition one> BEGIN ;
INSERT INTO @ret (...) SELECT ...
END ;
IF <condition two> BEGIN ;
INSERT INTO @ret (...) SELECT ...
END ;
IF <condition three> BEGIN ;
INSERT INTO @ret (...) SELECT ...
END ;
SELECT DISTINCT ... FROM @ret ;
Edit: I am suggesting a table variable, not a temporary table, so that the procedure will not recompile every time it runs. Generally speaking, three simpler inserts have a better chance of getting better execution plans than one big huge monster query combining all three.
However, we cannot guesstimate performance; we must benchmark to determine it. Still, simpler chunks of code are better for readability and maintainability.
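The same control flow, sketched in Python with sqlite3. A TEMP table stands in for the table variable @ret (SQLite has no table variables), the if test in the loop plays the role of each IF <condition> BEGIN ... END block, and SELECT DISTINCT merges the results. The tables one, two and three are hypothetical stand-ins for the three expensive joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE db (id INTEGER);
CREATE TABLE one (id INTEGER);
CREATE TABLE two (id INTEGER);
CREATE TABLE three (id INTEGER);
INSERT INTO db VALUES (1), (2), (3), (4);
INSERT INTO one VALUES (1), (2);
INSERT INTO two VALUES (2), (3);
INSERT INTO three VALUES (4);
""")

# One simple INSERT ... SELECT per optional join (hypothetical tables).
BRANCHES = {
    "one": "INSERT INTO ret SELECT db.id FROM db JOIN one ON one.id = db.id",
    "two": "INSERT INTO ret SELECT db.id FROM db JOIN two ON two.id = db.id",
    "three": "INSERT INTO ret SELECT db.id FROM db JOIN three ON three.id = db.id",
}


def run(wanted):
    """Run only the branches named in `wanted`, then de-duplicate."""
    conn.execute("DROP TABLE IF EXISTS temp.ret")
    conn.execute("CREATE TEMP TABLE ret (id INTEGER)")
    for name, insert_sql in BRANCHES.items():
        if name in wanted:  # plays the role of IF <condition> BEGIN ... END
            conn.execute(insert_sql)
    return [row[0] for row in conn.execute("SELECT DISTINCT id FROM ret ORDER BY id")]


print(run({"one", "two"}))  # [1, 2, 3]
print(run({"three"}))       # [4]
```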
Try this:
LEFT JOIN joinedTable jt
ON jt.someCol = someCol
AND jt.someCol = conditions
AND @neededJoin = 1 -- or whatever indicates join is needed
I think you'll find it is good performance and does what you need.
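A small sketch of what that ON clause does, using Python's sqlite3 module with made-up tables (core is a hypothetical name for the main table, and :neededJoin stands in for @neededJoin). It only demonstrates the result semantics: when the flag is 0, no row of joinedTable can qualify, so the LEFT JOIN keeps every core row with NULLs. Whether SQL Server then avoids reading joinedTable at all is something to confirm in the actual execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE core (id INTEGER, someCol INTEGER);
CREATE TABLE joinedTable (someCol INTEGER, extra TEXT);
INSERT INTO core VALUES (1, 10), (2, 20);
INSERT INTO joinedTable VALUES (10, 'ten'), (20, 'twenty');
""")

# The flag lives in the ON clause, not the WHERE clause, so a flag of 0
# makes every joinedTable row fail the join condition while the LEFT JOIN
# still returns every core row.
SQL = """
SELECT c.id, jt.extra
FROM core c
LEFT JOIN joinedTable jt
  ON jt.someCol = c.someCol
 AND :neededJoin = 1
ORDER BY c.id
"""

with_join = conn.execute(SQL, {"neededJoin": 1}).fetchall()
without_join = conn.execute(SQL, {"neededJoin": 0}).fetchall()
print(with_join)     # [(1, 'ten'), (2, 'twenty')]
print(without_join)  # [(1, None), (2, None)]
```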
Update
If this doesn't give the performance I claimed, perhaps that's because the last time I did this, I was joining through a mapping table. The value I needed could come from one of 3 tables, based on 2 columns, so I built a 'join-map' table like so:
Col1 Col2 TableCode
1 2 A
1 4 A
1 3 B
1 5 B
2 2 C
2 5 C
1 11 C
Then,
SELECT
V.*,
LookedUpValue =
CASE M.TableCode
WHEN 'A' THEN A.Value
WHEN 'B' THEN B.Value
WHEN 'C' THEN C.Value
END
FROM
ValueMaster V
INNER JOIN JoinMap M ON V.Col1 = M.Col1 AND V.Col2 = M.Col2
LEFT JOIN TableA A ON M.TableCode = 'A'
LEFT JOIN TableB B ON M.TableCode = 'B'
LEFT JOIN TableC C ON M.TableCode = 'C'
This gave me a huge performance improvement querying these tables (most of them dozens or hundreds of million-row tables).
This is why I'm asking if you actually get improved performance. Of course it's going to throw a join into the execution plan and assign it some cost, but overall it's going to do a lot less work than some plan that just indiscriminately joins all 3 tables and then Coalesce()s to find the right value.
If you find that compared to dynamic SQL it's only 5% more expensive to do the joins this way, but with the indiscriminate joins is 100% more expensive, it might be worth it to you to do this because of the correctness, clarity, and simplicity over dynamic SQL, all of which are probably more valuable than a small improvement (depending on what you're doing, of course).
Whether the cost scales with the number of rows is also another factor to consider. If even with a huge amount of data you only save 200ms of CPU on a query that isn't run dozens of times a second, it's a no-brainer to use it.
The reason I keep hammering on the fact that I think it's going to perform well is that even with a hash match, it wouldn't have any rows to probe with, or it wouldn't have any rows to create a hash of. The hash operation is going to stop a lot earlier compared to using the WHERE clause OR-style query of your initial post.
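Here is a runnable sketch of the join-map idea using Python's sqlite3 module. The original snippet does not show how TableA, TableB and TableC relate to ValueMaster, so this version assumes a hypothetical Id column on each; the data is invented. The TableCode test in each ON clause means only the mapped table contributes a value to the CASE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ValueMaster (Col1 INTEGER, Col2 INTEGER, Id INTEGER);
CREATE TABLE JoinMap (Col1 INTEGER, Col2 INTEGER, TableCode TEXT);
CREATE TABLE TableA (Id INTEGER, Value TEXT);
CREATE TABLE TableB (Id INTEGER, Value TEXT);
CREATE TABLE TableC (Id INTEGER, Value TEXT);
INSERT INTO JoinMap VALUES (1, 2, 'A'), (1, 3, 'B'), (2, 2, 'C');
INSERT INTO ValueMaster VALUES (1, 2, 100), (1, 3, 100), (2, 2, 100);
INSERT INTO TableA VALUES (100, 'from A');
INSERT INTO TableB VALUES (100, 'from B');
INSERT INTO TableC VALUES (100, 'from C');
""")

rows = conn.execute("""
SELECT V.Col1, V.Col2,
       CASE M.TableCode
           WHEN 'A' THEN A.Value
           WHEN 'B' THEN B.Value
           WHEN 'C' THEN C.Value
       END AS LookedUpValue
FROM ValueMaster V
JOIN JoinMap M ON V.Col1 = M.Col1 AND V.Col2 = M.Col2
-- The TableCode test stops each LEFT JOIN from matching rows mapped to
-- another table; Id is a hypothetical key linking the lookups to V.
LEFT JOIN TableA A ON M.TableCode = 'A' AND A.Id = V.Id
LEFT JOIN TableB B ON M.TableCode = 'B' AND B.Id = V.Id
LEFT JOIN TableC C ON M.TableCode = 'C' AND C.Id = V.Id
ORDER BY V.Col1, V.Col2
""").fetchall()
print(rows)  # [(1, 2, 'from A'), (1, 3, 'from B'), (2, 2, 'from C')]
```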
The dynamic SQL solution is best in most respects; you are trying to run different queries with different numbers of joins without rewriting the query to do different numbers of joins - and that doesn't work very well in terms of performance.
When I was doing this sort of stuff an æon or so ago (say the early 90s), the language I used was I4GL and the queries were built using its CONSTRUCT statement. This was used to generate part of a WHERE clause, so (based on the user input), the filter criteria it generated might look like:
a.column1 BETWEEN 1 AND 50 AND
b.column2 = 'ABCD' AND
c.column3 > 10
In those days, we didn't have the modern JOIN notations; I'm going to have to improvise a bit as we go. Typically there is a core table (or a set of core tables) that are always part of the query; there are also some tables that are optionally part of the query. In the example above, I assume that 'c' is the alias for the main table. The way the code worked would be:
Note that table 'a' was referenced in the query:
Add 'FullTableName AS a' to the FROM clause
Add a join condition 'AND a.join1 = c.join1' to the WHERE clause
Note that table 'b' was referenced...
Add bits to the FROM clause and WHERE clause.
Assemble the SELECT statement from the select-list (usually fixed), the FROM clause and the WHERE clause (occasionally with decorations such as GROUP BY, HAVING or ORDER BY too).
The same basic technique should be applied here - but the details are slightly different.
First of all, you don't have the string to analyze; you know from other circumstances which tables you need to add to your query. So, you still need to design things so that they can be assembled, but...
The SELECT clause with its select-list is probably fixed. It will identify the tables that must be present in the query because values are pulled from those tables.
The FROM clause will probably consist of a series of joins.
One part will be the core query:
FROM CoreTable1 AS C1
JOIN CoreTable2 AS C2
ON C1.JoinColumn = C2.JoinColumn
JOIN CoreTable3 AS M
ON M.PrimaryKey = C1.ForeignKey
Other tables can be added as necessary:
JOIN AuxiliaryTable1 AS A
ON M.ForeignKey1 = A.PrimaryKey
Or you can specify a full query:
JOIN (SELECT PrimaryKey, RelevantColumn1, RelevantColumn2
FROM AuxiliaryTable1
WHERE Column1 BETWEEN 1 AND 50) AS A
ON M.ForeignKey1 = A.PrimaryKey
In the first case, you have to remember to add the WHERE criterion to the main WHERE clause, and trust the DBMS Optimizer to move the condition into the JOIN table as shown. A good optimizer will do that automatically; a poor one might not. Use query plans to help you determine how able your DBMS is.
Add the WHERE clause for any inter-table criteria not covered in the joining operations, and any filter criteria based on the core tables. Note that I'm thinking primarily in terms of extra criteria (AND operations) rather than alternative criteria (OR operations), but you can deal with OR too as long as you are careful to parenthesize the expressions sufficiently.
Occasionally, you may have to add a couple of JOIN conditions to connect a table to the core of the query - that is not dreadfully unusual.
Add any GROUP BY, HAVING or ORDER BY clauses (or limits, or any other decorations).
Note that you need a good understanding of the database schema and the join conditions. Basically, this is coding in your programming language the way you have to think about constructing the query. As long as you understand this and your schema, there aren't any insuperable problems.
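The assembly steps can be sketched in Python as plain string building. Everything here is hypothetical: the assemble helper, the OptionalTable class, and names such as AuxiliaryTable2, M.Name and M.Column3. Optional tables contribute their JOIN and their filters only when requested, and each criterion is parenthesized before being ANDed, in line with the advice above about OR expressions. Values are inlined only to keep the sketch short; real code should bind them as parameters.

```python
from dataclasses import dataclass, field


@dataclass
class OptionalTable:
    """A table that joins onto the core only when the request needs it."""
    join_sql: str                                # JOIN ... ON ... text
    filters: list = field(default_factory=list)  # WHERE criteria for this table


def assemble(select_list, core_from, optional, wanted, core_filters=(), order_by=None):
    """Build SELECT / FROM / WHERE / ORDER BY from fixed and optional pieces.

    `optional` maps a short name to an OptionalTable; `wanted` is the set of
    optional names the current request needs. All criteria are ANDed.
    """
    parts = ["SELECT " + ", ".join(select_list), core_from]
    criteria = list(core_filters)
    for name, table in optional.items():
        if name in wanted:
            parts.append(table.join_sql)
            criteria.extend(table.filters)
    if criteria:
        # Parenthesize each criterion so an OR inside one cannot leak out.
        parts.append("WHERE " + " AND ".join(f"({c})" for c in criteria))
    if order_by:
        parts.append("ORDER BY " + order_by)
    return "\n".join(parts)


core = ("FROM CoreTable1 AS C1\n"
        "JOIN CoreTable2 AS C2 ON C1.JoinColumn = C2.JoinColumn\n"
        "JOIN CoreTable3 AS M ON M.PrimaryKey = C1.ForeignKey")
optional = {
    "a": OptionalTable("JOIN AuxiliaryTable1 AS A ON M.ForeignKey1 = A.PrimaryKey",
                       ["A.Column1 BETWEEN 1 AND 50"]),
    "b": OptionalTable("JOIN AuxiliaryTable2 AS B ON M.ForeignKey2 = B.PrimaryKey",
                       ["B.Column2 = 'ABCD'"]),
}
sql = assemble(["M.Name"], core, optional, wanted={"a"}, core_filters=["M.Column3 > 10"])
print(sql)
```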
Good luck...
Just because no one else mentioned this, here's something that you could use (not dynamic). If the syntax looks weird, it's because I tested it in Oracle.
Basically, you turn your joined tables into sub-selects whose WHERE clause returns nothing if your condition does not match. If the condition does match, the sub-select returns data for that table. The CASE expression lets you pick which column is returned in the overall select.
with m as (select 1 Num, 'One' Txt from dual union select 2, 'Two' from dual union select 3, 'Three' from dual),
t1 as (select 1 Num from dual union select 11 from dual),
t2 as (select 2 Num from dual union select 22 from dual),
t3 as (select 3 Num from dual union select 33 from dual)
SELECT m.*
,CASE 1
WHEN 1 THEN
t1.Num
WHEN 2 THEN
t2.Num
WHEN 3 THEN
t3.Num
END SelectedNum
FROM m
LEFT JOIN (SELECT * FROM t1 WHERE 1 = 1) t1 ON m.Num = t1.Num
LEFT JOIN (SELECT * FROM t2 WHERE 1 = 2) t2 ON m.Num = t2.Num
LEFT JOIN (SELECT * FROM t3 WHERE 1 = 3) t3 ON m.Num = t3.Num
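The same trick translated to SQLite through Python's sqlite3 module, as a sketch: the literal 1 in the CASE and in the WHERE clauses becomes a bound :choice parameter, FROM dual is dropped because SQLite does not need it, and the sub-select aliases are renamed j1 to j3. Only the sub-select whose predicate matches the choice returns rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

QUERY = """
WITH m AS (SELECT 1 AS Num, 'One' AS Txt UNION SELECT 2, 'Two' UNION SELECT 3, 'Three'),
     t1 AS (SELECT 1 AS Num UNION SELECT 11),
     t2 AS (SELECT 2 AS Num UNION SELECT 22),
     t3 AS (SELECT 3 AS Num UNION SELECT 33)
SELECT m.Num, m.Txt,
       CASE :choice WHEN 1 THEN j1.Num WHEN 2 THEN j2.Num WHEN 3 THEN j3.Num END AS SelectedNum
FROM m
-- Each sub-select is empty unless its literal matches :choice.
LEFT JOIN (SELECT * FROM t1 WHERE :choice = 1) j1 ON m.Num = j1.Num
LEFT JOIN (SELECT * FROM t2 WHERE :choice = 2) j2 ON m.Num = j2.Num
LEFT JOIN (SELECT * FROM t3 WHERE :choice = 3) j3 ON m.Num = j3.Num
ORDER BY m.Num
"""


def selected_nums(choice):
    """Return (Num, Txt, SelectedNum) rows for the chosen lookup table."""
    return conn.execute(QUERY, {"choice": choice}).fetchall()


print(selected_nums(1))  # [(1, 'One', 1), (2, 'Two', None), (3, 'Three', None)]
print(selected_nums(3))  # [(1, 'One', None), (2, 'Two', None), (3, 'Three', 3)]
```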