SQL Server - Query Short-Circuiting?

Do T-SQL queries in SQL Server support short-circuiting?
For instance, I have a situation where I have two databases, and I'm comparing data between two tables to match and copy some info across. In one table, the "ID" field will always have leading zeros (such as "000000001234"), and in the other table, the ID field may or may not have leading zeros (it might be "000000001234" or "1234").
So my query to match the two is something like:
select * from table1 where table1.ID LIKE '%1234'
To speed things up, I'm thinking of adding an OR before the like that just says:
table1.ID = table2.ID
to handle the case where both ID's have the padded zeros and are equal.
Will doing so speed up the query by matching items on the "=" and not evaluating the LIKE for every single row (will it short circuit and skip the LIKE)?

SQL Server does NOT short-circuit WHERE conditions.
It can't, since it's a cost-based system: How SQL Server short-circuits WHERE condition evaluation.

You could add a computed column to the table. Then, index the computed column and use that column in the join.
Ex:
Alter Table Table1 Add PaddedId As Right('000000000000' + Id, 12)
Create Index idx_WhateverIndexNameYouWant On Table1(PaddedId)
Then your query would be...
select * from table1 where table1.PaddedID ='000000001234'
This will use the index you just created to quickly return the row.

You want to make sure that at least one of the tables is using its actual data type for the IDs and that it can use an index seek if possible. Which one should be converted to the other depends on the selectivity of your query and the rate of matches, though. If you know that you have to scan through the entire first table anyway, then you can't use a seek on it, and you should convert that ID to the data type of the other table.
To make sure that you can use indexes, also avoid LIKE. As an example, it's much better to have:
WHERE
T1.ID = CAST(T2.ID AS VARCHAR) OR
T1.ID = RIGHT('0000000000' + CAST(T2.ID AS VARCHAR), 10)
than:
WHERE
T1.ID LIKE '%' + CAST(T2.ID AS VARCHAR)
As Steven A. Lowe mentioned, the second query might be inaccurate as well.
If you are going to be using all of the rows from T1 though (in other words a LEFT OUTER JOIN to T2) then you might be better off with:
WHERE
CAST(T1.ID AS INT) = T2.ID
Do some query plans with each method if you're not sure and see what works best.
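One lightweight way to compare them (a sketch; these SET options are standard T-SQL):
SET STATISTICS IO ON
SET STATISTICS TIME ON
-- run each candidate query here, then compare logical reads and elapsed times
SET STATISTICS IO OFF
SET STATISTICS TIME OFF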
The absolute best route, though, is as others have suggested: change the data types of the tables to match, if that's at all possible. Even if you can't do it before this project is due, put it on your "to do" list for the near future.

How about,
table1WithZero.ID = REPLICATE('0', 12-len(table2.ID))+table2.ID
In this case, it should be able to use the index on table1.
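In full context, the padding happens on the table2 side at query time, so an index on table1.ID can still be used for a seek; a sketch assuming 12-character padded IDs (the join form is illustrative):
select * from table1 t1
join table2 t2 on t1.ID = REPLICATE('0', 12 - LEN(t2.ID)) + t2.ID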

Just in case it's useful: as the page linked in Mladen Prajdic's answer explains, CASE expressions are short-circuit evaluated.
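A minimal sketch of that behavior (this particular query is illustrative, not from the linked page):
DECLARE @divisor INT = 0
SELECT CASE WHEN @divisor = 0 THEN NULL ELSE 100 / @divisor END
-- No divide-by-zero error: once the WHEN matches, the ELSE branch is never evaluated.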

If the ID is purely numeric (as in your example), I would recommend (if possible) changing that field to a number type instead. If the database is already in use, it might be hard to change the type, though.

fix the database to be consistent
select * from table1 where table1.ID LIKE '%1234'
will match '1234', '01234', '00000000001234', but also '999991234'. Using LIKE pretty much guarantees an index scan (assuming table1.ID is indexed!). Cleaning up the data will improve performance significantly.
if cleaning up the data is not possible, write a user-defined function (UDF) to strip off leading zeros, e.g.
select * from table1 where dbo.udfStripLeadingZeros(table1.ID) = '1234'
this may not improve performance (since the function will have to run for each row), but it will eliminate false matches and make the intent of the query more obvious
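A minimal sketch of such a UDF (the name matches the query above; the 12-character cap is an illustrative assumption):
CREATE FUNCTION dbo.udfStripLeadingZeros (@Id VARCHAR(12))
RETURNS VARCHAR(12)
AS
BEGIN
    -- PATINDEX finds the first non-zero character; it returns 0 if the input is all zeros
    RETURN CASE
        WHEN PATINDEX('%[^0]%', @Id) = 0 THEN '0'
        ELSE SUBSTRING(@Id, PATINDEX('%[^0]%', @Id), LEN(@Id))
    END
END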
EDIT: Tom H's suggestion to CAST to an integer would be best, if that is possible.

Related

SQL LIKE operator very slow when using a value from another table in AWS Athena

I have a SQL query in Athena that is very slow when using a LIKE operator with a value taken from another table:
Select * from table1 t1
Where t1.value like (select concat('%', t2.value, '%') as val
                     from table2 t2
                     where t2.id = 1
                     limit 1)
The above query is very slow
When I use something like the query below, it works super fast:
Select * from table1 t1
Where t1.value like
'%somevalue%'
In my scenario the LIKE value is not fixed; it can change over time, which is why I need to take the value from another table.
Please suggest the fastest way.
"Slow" is a relative term, but a query that joins two tables will always be slower than a query that doesn't. A query that compares against a pattern that needs to be looked up in another table at query time will always be slower than a query that uses a static pattern.
Does that mean that the second query is slow? Perhaps, but you have to base that on what you're actually asking the query engine to do.
Let's dissect what your query is doing:
The outer query looks for all columns of all rows of the first table where one of the columns contains a particular string.
That string is dynamically looked up by scanning every row in the second table looking for a row with a particular value for the id column.
In other words, the first query scans only the first table but the second scans both tables. That's always going to be slower, because it's doing a lot more work. How much more work? That depends on the sizes of the tables. You aren't specifying the running times of any of the queries or the sizes of the tables, so it's hard to know.
You don't provide enough context in your question to answer any more precisely than this. We can only respond with generalities like: if it's slow, don't use LIKE, that's a slow operation. Don't use a scalar subquery that reads the whole second table; that's slow.
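One common rewrite (a sketch in Presto/Athena syntax; the table and column names follow the question) is to pull the pattern out through a join, so it is produced once rather than inside the predicate:
SELECT t1.*
FROM table1 t1
CROSS JOIN (SELECT value FROM table2 WHERE id = 1 LIMIT 1) p
WHERE t1.value LIKE concat('%', p.value, '%')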
I have found another method to do the same thing, and it's super fast in Athena:
Select * from table1 t1
Where POSITION((select t2.value from table2 t2 where t2.id = 1 limit 1) in t1.value) > 0
(Note that POSITION performs a plain substring search, so the '%' wildcards from the LIKE version are unnecessary; POSITION would treat them as literal characters to search for.)

Difference between WHERE column = '' and column LIKE '' in SQL [duplicate]

This question skirts around what I'm wondering, but the answers don't exactly address it.
It would seem that in general '=' is faster than 'like' when using wildcards. This appears to be the conventional wisdom. However, let's suppose I have a column containing a limited number of different fixed, hardcoded varchar identifiers, and I want to select all rows matching one of them:
select * from table where value like 'abc%'
and
select * from table where value = 'abcdefghijklmn'
'Like' should only need to test the first three chars to find a match, whereas '=' must compare the entire string. In this case it would seem to me that 'like' would have an advantage, all other things being equal.
This is intended as a general, academic question, and so should not matter which DB, but it arose using SQL Server 2005.
See https://web.archive.org/web/20150209022016/http://myitforum.com/cs2/blogs/jnelson/archive/2007/11/16/108354.aspx
Quote from there:
The rules for index usage with LIKE are loosely like this:

1. If your filter criteria uses equals (=) and the field is indexed, then most likely it will use an INDEX/CLUSTERED INDEX SEEK.

2. If your filter criteria uses LIKE with no wildcards (like if you had a parameter in a web report that COULD have a % but you instead use the full string), it is about as likely as #1 to use the index. The increased cost is almost nothing.

3. If your filter criteria uses LIKE, but with a wildcard at the beginning (as in Name0 LIKE '%UTER'), it's much less likely to use the index, but it still may at least perform an INDEX SCAN on a full or partial range of the index.

4. HOWEVER, if your filter criteria uses LIKE, but starts with a STRING FIRST and has wildcards somewhere AFTER that (as in Name0 LIKE 'COMP%ER'), then SQL may just use an INDEX SEEK to quickly find rows that have the same first starting characters, and then look through those rows for an exact match.

(Also keep in mind, the SQL engine still might not use an index the way you're expecting, depending on what else is going on in your query and what tables you're joining to. The SQL engine reserves the right to rewrite your query a little to get the data in a way that it thinks is most efficient, and that may include an INDEX SCAN instead of an INDEX SEEK.)
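To make the four cases concrete (a sketch; the table and the index on Name0 are assumed, following the quote's example column):
SELECT * FROM T WHERE Name0 = 'COMPUTER'     -- #1: index seek
SELECT * FROM T WHERE Name0 LIKE 'COMPUTER'  -- #2: no wildcard, about as likely to seek
SELECT * FROM T WHERE Name0 LIKE '%UTER'     -- #3: leading wildcard, index scan at best
SELECT * FROM T WHERE Name0 LIKE 'COMP%ER'   -- #4: seek on 'COMP', then check for an exact match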
It's a measurable difference.
Run the following:
Create Table #TempTester (id int, col1 varchar(20), value varchar(20))
go
INSERT INTO #TempTester (id, col1, value)
VALUES
(1, 'this is #1', 'abcdefghij')
GO
INSERT INTO #TempTester (id, col1, value)
VALUES
(2, 'this is #2', 'foob'),
(3, 'this is #3', 'abdefghic'),
(4, 'this is #4', 'other'),
(5, 'this is #5', 'zyx'),
(6, 'this is #6', 'zyx'),
(7, 'this is #7', 'zyx'),
(8, 'this is #8', 'klm'),
(9, 'this is #9', 'klm'),
(10, 'this is #10', 'zyx')
GO 10000
CREATE CLUSTERED INDEX ixId ON #TempTester(id)
CREATE NONCLUSTERED INDEX ixTesting ON #TempTester(value)
Then:
SET SHOWPLAN_XML ON
Then:
SELECT * FROM #TempTester WHERE value LIKE 'abc%'
SELECT * FROM #TempTester WHERE value = 'abcdefghij'
The resulting execution plan shows you that the cost of the first operation, the LIKE comparison, is about 10 times more expensive than the = comparison.
If you can use an = comparison, please do so.
You should also keep in mind that when using LIKE, some SQL flavors will ignore indexes, and that will kill performance. This is especially true if you don't use the "starts with" pattern as in your example.
You should really look at the execution plan for the query and see what it's doing, guess as little as possible.
That being said, the "starts with" pattern can be and is optimized in SQL Server; it will use the table's index. EF 4.0 switched to LIKE for StartsWith for this very reason.
If value is unindexed, both result in a table-scan. The performance difference in this scenario will be negligible.
If value is indexed, as Daniel points out in his comment, the = will result in an index lookup which is O(log N) performance. The LIKE will (most likely - depending on how selective it is) result in a partial scan of the index >= 'abc' and < 'abd' which will require more effort than the =.
Note that I'm talking SQL Server here - not all DBMSs will be nice with LIKE.
You are asking the wrong question. In databases it is not the operator performance that matters; it is always the SARGability of the expression, and the coverability of the overall query. Performance of the operator itself is largely irrelevant.
So, how do LIKE and = compare in terms of SARGability? LIKE, when used with an expression that does not start with a constant (e.g. LIKE '%something'), is by definition non-SARGable. But does that make = or LIKE 'something%' SARGable? No. As with any question about SQL performance, the answer does not lie with the text of the query, but with the schema deployed. These expressions may be SARGable if an index exists to satisfy them.
So, truth be told, there are small differences between = and LIKE. But asking whether one operator or the other is 'faster' in SQL is like asking 'What goes faster, a red car or a blue car?'. You should be asking questions about engine size and vehicle weight, not about the color... To approach questions about optimizing relational tables, the place to look is your indexes and your expressions in the WHERE clause (and other clauses, but it usually starts with the WHERE).
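To make that concrete (a sketch with hypothetical table and index names):
CREATE INDEX ix_t_col ON t(col)
SELECT * FROM t WHERE col LIKE 'something%'  -- SARGable: the optimizer can seek ix_t_col
SELECT * FROM t WHERE col LIKE '%something'  -- non-SARGable: at best a scan, index or no index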
A personal example using MySQL 5.5: I had an inner join between two tables, one of 3 million rows and one of 10 thousand rows.
When using a LIKE on an indexed column as below (no wildcards), it took about 30 seconds:
where login like '12345678'
Using EXPLAIN, I could see that the index was not being used for a seek (EXPLAIN output screenshot omitted).
When using an '=' on the same query, it took about 0.1 seconds:
where login ='12345678'
Using EXPLAIN, I could see that the index was being used (EXPLAIN output screenshot omitted).
As you can see, the LIKE completely cancelled the index seek, so the query took 300 times longer.
= is much faster than LIKE, even without wildcards. I tested on MySQL with 11 GB of data and more than 100 million records; the f_time column is indexed.
SELECT * FROM XXXXX WHERE f_time = '1621442261'
# took 0.00 sec and returned 330 records
SELECT * FROM XXXXX WHERE f_time LIKE '1621442261'
# took 44.71 sec and returned 330 records
Besides all the answers, there is this to consider: 'like' is case-insensitive under most default collations, so every character may need to be compared in both cases, whereas '=' only compares once for identical characters.
This issue arises with or without indexes.
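Whether a comparison is case-sensitive is really a property of the collation in force; a small T-SQL sketch (these are standard SQL Server collation names):
SELECT CASE WHEN 'abc' = 'ABC' COLLATE Latin1_General_CI_AS THEN 'equal' ELSE 'different' END  -- 'equal': case-insensitive collation
SELECT CASE WHEN 'abc' = 'ABC' COLLATE Latin1_General_CS_AS THEN 'equal' ELSE 'different' END  -- 'different': case-sensitive collation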
Maybe you are looking for Full-Text Search.
In contrast to full-text search, the LIKE Transact-SQL predicate works on character patterns only. Also, you cannot use the LIKE predicate to query formatted binary data. Furthermore, a LIKE query against a large amount of unstructured text data is much slower than an equivalent full-text query against the same data. A LIKE query against millions of rows of text data can take minutes to return; whereas a full-text query can take only seconds or less against the same data, depending on the number of rows that are returned.
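For comparison, a minimal sketch of the full-text alternative (this assumes a full-text index has already been created on the column; the table and column names are illustrative):
SELECT * FROM Documents WHERE CONTAINS(Body, 'computer')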
I was working with a huge database that has more than 400M records, and I put LIKE in the search query. Here are the final results.
There were three tables: tb1, tb2, and tb3. When I used = in the queries against all tables, the response time was 193 ms. When I put LIKE in one of the tables, the response time was 19.22 sec, and with LIKE on all tables the response time was 112 sec.

Index on VARCHAR column

I have a table of 32,589 rows, and one of the columns is called 'Location' and is of the Varchar(40) column type. The column holds a location, which is actually a suburb, in all-uppercase text.
A function that uses this table does a:
IF EXISTS(SELECT * FROM MyTable WHERE Location = 'A Suburb')
...
Would it be beneficial to add an index to this column, for efficiency? This is more of a read-only table, so there aren't many edits or inserts except for maintenance.
Without an index SQL Server will have to perform a table scan to find the first instance of the location you're looking for. You might get lucky and have the value be in one of the first few rows, but it could be at row 32,000, which would be a waste of time. Adding an index only takes a few seconds, and you'll probably see a big performance gain.
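A minimal sketch of the suggested index (the index name is illustrative):
CREATE NONCLUSTERED INDEX IX_MyTable_Location ON MyTable (Location)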
I concur with Brian Shamblen's answer.
Also, try using TOP 1 in the inner select
IF EXISTS(SELECT TOP 1 * FROM MyTable WHERE Location = 'A Suburb')
You don't have to select all the records matching your criteria for EXISTS; one is enough.
An opportunistic approach to performance tuning is usually a bad idea.
To answer the specific question - if your function is using location in a where clause, and the table has more than a few hundred rows, and the values in the location column are not all identical, creating an index will speed up your function.
Whether you notice any difference is hard to say - there may be much bigger performance problems lurking in the database, and you might be fixing the wrong problem.

Comparing a column with itself in the WHERE clause of an Oracle SELECT

I have a multi-table SELECT query which compares column values with itself like below:
SELECT * FROM table1 t1, table2 t2
WHERE t1.col1 = t2.col1 -- different tables, so OK
AND t1.col1 = t1.col1   -- same table??
AND t2.col1 = t2.col1   -- same table??
This seems redundant to me.
My question is: will removing them have any impact on logic/performance?
Thanks in advance.
This seems redundant; its only effect is removing rows that have NULL values in these columns. Make sure the columns are NOT NULL before removing those clauses.
If the columns are nullable you can safely replace these lines with (easier to read, easier to maintain):
AND t1.col1 IS NOT NULL
AND t2.col1 IS NOT NULL
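The NULL behavior in question, as a quick sketch (the table and column names are illustrative): a self-comparison evaluates to UNKNOWN for NULLs, so those rows are filtered out.
SELECT * FROM t WHERE col1 = col1        -- excludes rows where col1 IS NULL
SELECT * FROM t WHERE col1 IS NOT NULL   -- equivalent filter, and much clearer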
Update following Jeffrey's comment
You're absolutely right, I don't know how I didn't see it myself: the join condition t1.col1=t2.col1 implies that only the rows with the join columns not null will be considered. The clauses tx.col1=tx.col1 are therefore completely redundant and can be safely removed.
Don't remove them until you understand the impact. If, as others are pointing out, they have no effect on the query and are probably optimised out, there's no harm in leaving them there but there may be harm in removing them.
Don't try to fix something that's working until you're damn sure you're not breaking something else.
The reason I mention this is because we inherited a legacy reporting application that had exactly this construct, along the lines of:
where id = id
And, being a sensible fellow, I ditched it, only to discover that the database engine wasn't the only thing using the query.
It first went through a pre-processor which extracted every column that existed in a where clause and ensured they were indexed. Basically an auto-tuning database.
Well, imagine our surprise on the next iteration when the database slowed to a fraction of its former speed when users were doing ad-hoc queries on the id field :-)
Turns out this was a kludge put in by the previous support team to ensure common ad-hoc queries were using indexed columns as well, even though none of our standard queries did so.
So, I'm not saying you can't do it, just suggesting that it might be a good idea to understand why it was put in first.
Yes, the self-comparison conditions are obviously redundant, since the two sides of each are identical!
SELECT * FROM table1 t1,table2 t2
WHERE t1.col1=t2.col1
But you do need the cross-table join condition itself. Otherwise, you'd have a cartesian join on your hands: each row from table1 would be joined to every row in table2. If table1 had 100 rows and table2 had 1,000 rows, the resulting query would return 100,000 results.
SELECT * FROM table1 t1,table2 t2 --warning!
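The same join in explicit ANSI syntax (a sketch), which makes a forgotten join condition much harder to miss:
SELECT * FROM table1 t1
INNER JOIN table2 t2 ON t1.col1 = t2.col1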

Sql Un-Wizardry: Compare values from one list in another

I have a comparison I'd like to make more efficient in SQL.
The input field (fldInputField) is a comma separated list of "1,3,4,5"
The database has a field (fldRoleList) which contains "1,2,3,4,5,6,7,8"
So, for the first value in fldInputField that occurs within fldRoleList, tell us which value it was.
Is there a way to achieve the following in MySQL or a Stored Procedure?
Pseudo-code:
SELECT *
FROM aTable t1
WHERE fldInputField in t1.fldRoleList
I'm guessing there might be some functions that are best suited for this type of comparison? I couldn't find anything in the search, if someone could direct me I'll delete the question... Thanks!
UPDATE: This isn't the ideal (or a good) way to do things. It's inherited code, and we are simply trying to put in a quick fix while we look at building in the logic to deal with this via normalized rows. Luckily, this isn't heavily used code.
I agree with Ken White's answer that comma-delimited lists have no place in a normalized database design.
The solution would be simpler and perform better if you stored the fldRoleList as multiple rows in a dependent table:
SELECT t1.*, r1.fldRole
FROM aTable t1 JOIN aTableRoles r1 USING (aTable_id)
WHERE FIND_IN_SET(r1.fldRole, fldInputField);
(see the MySQL function FIND_IN_SET())
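FIND_IN_SET returns the 1-based position of a value within a comma-separated list, or 0 if it is absent; a quick sketch:
SELECT FIND_IN_SET('3', '1,2,3,4,5,6,7,8')  -- returns 3
SELECT FIND_IN_SET('9', '1,2,3,4,5,6,7,8')  -- returns 0 (not found)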
But that outputs multiple rows if multiple roles match the comma-separated input string. If you need to restrict the result to one row per aTable entry, with the first matching role:
SELECT t1.*, MIN(r1.fldRole) AS First_fldRole
FROM aTable t1 JOIN aTableRoles r1 USING (aTable_id)
WHERE FIND_IN_SET(r1.fldRole, fldInputField)
GROUP BY t1.aTable_id;
You have a terrible schema design, you know. Comma-delimited lists have no business in a DB.
That being said... You're looking for LIKE.
SELECT * FROM aTable t1 WHERE t1.fldRoleList LIKE CONCAT(fldInputField, '%')
If the content might not always match at the beginning, add another percent sign before fldInputField.
SELECT * FROM aTable t1 WHERE t1.fldRoleList LIKE CONCAT('%', fldInputField, '%')