Placing index columns on the left of a mysql WHERE statement? - sql

I was curious since i read it in a doc. Does writing
select * from CONTACTS where id = ‘098’ and name like ‘Tom%’;
speed up the query as oppose to
select * from CONTACTS where name like ‘Tom%’ and id = ‘098’;
The first has an indexed column on the left side. Does it actually speed things up or is it superstition?
Using php and mysql

Check the query plans with explain. They should be exactly the same.

This is purely superstition. I see no reason that either query would differ in speed. If it was an OR query rather than an AND query however, then I could see that having it on the left may spped things up.

interesting question, i tried this once. query plans are the same (using EXPLAIN).
but considering short-circuit-evaluation i was wondering too why there is no difference (or does mysql fully evaluate boolean statements?)

You may be mis-remembering or mis-reading something else, regarding which side the wildcards are on a string literal in a Like predicate. Putting the wildcard on the right (as in yr example), allows the query engine to use any indices that might exist on the table column you are searching (in this case - name). But if you put the wildcard on the left,
select * from CONTACTS where name like ‘%Tom’ and id = ‘098’;
then the engine cannot use any existing index and must do a complete table scan.

Related

Is it possible to use LIKE with a set of strings instead of a single element?

I have a list of proper names (in a table), and another table with a free-text field. I want to check whether that field contains any of the proper names. If it were just one, I could do
WHERE free_text LIKE "%proper_name%"
but how do you do that for an entire list? Is there a better string function I can use with a list?
Thanks
No, like does not have that capability.
Many databases support regular expressions, which enable to you do what you want. For instance, in Postgres this is phrased as:
where free_text ~ 'name1|name2|name3'
Many databases also have full-text search capabilities that speed such searches.
Both capabilities are highly specific to the database you are using.
Well, you can use LIKE in a standard JOIN, but the query most likely will be slow, because it will search each proper name in each free_text.
For example, if you have 10 proper names in a list and a certain free_text value contains the first name, the server will continue processing the rest of 9 names.
Here is the query:
SELECT -- DISTINCT
free_text_table.*
FROM
free_text_table
INNER JOIN proper_names_table ON free_text_table.free_text LIKE proper_names_table.proper_name
;
If a certain free_text value contains several proper names, that row will be returned several times, so you may need to add DISTINCT to the query. It depends on what you need.
It is possible to use LATERAL JOIN to avoid Cartesian product (where each row in free_text_table is compared to each rows in proper_names_table). The end result may be faster, than the simple variant. It depends on your data distribution.
Here is SQL Server syntax.
SELECT
free_text_table.*
FROM
free_text_table
CROSS APPLY
(
SELECT TOP(1)
proper_names_table.proper_name
FROM proper_names_table
WHERE free_text_table.free_text LIKE proper_names_table.proper_name
-- ORDER BY proper_names_table.frequency
) AS A
;
Here we don't need DISTINCT, there will be at most one row in the result for each row from free_text_table (one or zero). Optimiser should be smart enough to stop reading and processing proper_names_table as soon as the first match is found due to TOP(1) clause.
If you also can somehow order your proper names and put those that are most likely to be found first, then the query is more likely to be faster than a simple JOIN. (Add a suitable ORDER BY clause in subquery).

For an Oracle NUMBER datatype, LIKE operator vs BETWEEN..AND operator

Assume mytable is an Oracle table and it has a field called id. The datatype of id is NUMBER(8). Compare the following queries:
select * from mytable where id like '715%'
and
select * from mytable where id between 71500000 and 71599999
I would think the second is more efficient since I think "number comparison" would require fewer number of assembly language instructions than "string comparison". I need a confirmation or correction. Please confirm/correct and throw any further comment related to either operator.
UPDATE: I forgot to mention 1 important piece of info. id in this case must be an 8-digit number.
If you only want values between 71500000 and 71599999 then yes the second one is much more efficient. The first one would also return values between 7150-7159, 71500-71599 etc. and so forth. You would either need to sift through unecessary results or write another couple lines of code to filter the rest of them out. The second option is definitely more efficient for what you seem to want to do.
It seems like the execution plan on the second query is more efficient.
The first query is doing a full table scan of the id's, whereas the second query is not.
My Test Data:
Execution Plan of first query:
Execution Plan of second query:
I don't like the idea of using LIKE with a numeric column.
Also, it may not give the results you are looking for.
If you have a value of 715000000, it will show up in the query result, even though it is larger than 71599999.
Also, I do not like between on principle.
If a thing is between two other things, it should not include those two other things. But this is just a personal annoyance.
I prefer to use >= and <= This avoids confusion when I read the query. In addition, sometimes I have to change the query to something like >= a and < c. If I started by using the between operator, I would have to rewrite it when I don't want to be inclusive.
Harv
In addition to the other points raised, using LIKE in the manner you suggest would cause Oracle to not use any indexes on the ID column due to the implicit conversion of the data from number to character, resulting in a full table scan when using LIKE versus and index range scan when using BETWEEN. Assuming, of course, you have an index on ID. Even if you don't, however, Oracle will have to do the type conversion on each value it scans in the LIKE case, which it won't have to do in the other.
You can use math function, otherwise you have to use to_char function to use like, but it will cause performance problems.
select * from mytable where floor(id /100000) = 715
or
select * from mytable where floor(id /100000) = TO_NUMBER('715') // this is parametric

Querying for software using SQL query in SCCM

I am looking for specific pieces of software across our network by querying the SCCM Database. My problem is that, for various reasons, sometimes I can search by the program's name and other times I need to search for a specific EXE.
When I run the query below, does it take 13 seconds to run if the where clause contains an AND, but it will run for days with no results if the AND is replaced with OR. I'm assuming it is doing this because I am not properly joining the tables. How can I fix this?
select vrs.Name0
FROM v_r_system as vrs
join v_GS_INSTALLED_SOFTWARE as VIS on VIS.resourceid = vrs.resourceid
join v_GS_SoftwareFile as sf on SF.resourceid = vrs.resourceid
where
VIS.productname0 LIKE '%office%' AND SF.Filename LIKE 'Office2007%'
GROUP BY vrs.Name0
Thanks!
Your LIKE clause contains a wildcard match at the start of a string:
LIKE '%office%'
This prevents SQL Server from using an index on this column, hence the slow running query. Ideally you should change your query so your LIKE clause doesn't use a wildcard at the start.
In the case where the WHERE clause contains an AND its querying based on the Filename clause first (it is able to use an index here and so this is relatively quick) and then filtering this reduced rowset based on your productname0 clause. When you use an OR however it isn't restricted to just returning rows that match your Filename clause, and so it must search through the entire table checking to see if each productname0 field matches.
Here's a good Microsoft article http://msdn.microsoft.com/en-us/library/ms172984.aspx on improving indexes. See the section on Indexes with filter clauses (reiterates the previous answer.)
Have you tried something along these lines instead of a like query?
... where product name in ('Microsoft Office 2000','Microsoft Office xyz','Whateverelse')

Can scalar functions be applied before filtering when executing a SQL Statement?

I suppose I have always naively assumed that scalar functions in the select part of a SQL query will only get applied to the rows that meet all the criteria of the where clause.
Today I was debugging some code from a vendor and had that assumption challenged. The only reason I can think of for this code failing is that the Substring() function is getting called on data that should have been filtered out by the WHERE clause. But it appears that the substring call is being applied before the filtering happens, the query is failing.
Here is an example of what I mean. Let's say we have two tables, each with 2 columns and having 2 rows and 1 row respectively. The first column in each is just an id. NAME is just a string, and NAME_LENGTH tells us how many characters in the name with the same ID. Note that only names with more than one character have a corresponding row in the LONG_NAMES table.
NAMES: ID, NAME
1, "Peter"
2, "X"
LONG_NAMES: ID, NAME_LENGTH
1, 5
If I want a query to print each name with the last 3 letters cut off, I might first try something like this (assuming SQL Server syntax for now):
SELECT substring(NAME,1,len(NAME)-3)
FROM NAMES;
I would soon find out that this would give me an error, because when it reaches "X" it will try using a negative number for in the substring call, and it will fail.
The way my vendor decided to solve this was by filtering out rows where the strings were too short for the len - 3 query to work. He did it by joining to another table:
SELECT substring(NAMES.NAME,1,len(NAMES.NAME)-3)
FROM NAMES
INNER JOIN LONG_NAMES
ON NAMES.ID = LONG_NAMES.ID;
At first glance, this query looks like it might work. The join condition will eliminate any rows that have NAME fields short enough for the substring call to fail.
However, from what I can observe, SQL Server will sometimes try to calculate the the substring expression for everything in the table, and then apply the join to filter out rows. Is this supposed to happen this way? Is there a documented order of operations where I can find out when certain things will happen? Is it specific to a particular Database engine or part of the SQL standard? If I decided to include some predicate on my NAMES table to filter out short names, (like len(NAME) > 3), could SQL Server also choose to apply that after trying to apply the substring? If so then it seems the only safe way to do a substring would be to wrap it in a "case when" construct in the select?
Martin gave this link that pretty much explains what is going on - the query optimizer has free rein to reorder things however it likes. I am including this as an answer so I can accept something. Martin, if you create an answer with your link in it i will gladly accept that instead of this one.
I do want to leave my question here because I think it is a tricky one to search for, and my particular phrasing of the issue may be easier for someone else to find in the future.
TSQL divide by zero encountered despite no columns containing 0
EDIT: As more responses have come in, I am again confused. It does not seem clear yet when exactly the optimizer is allowed to evaluate things in the select clause. I guess I'll have to go find the SQL standard myself and see if i can make sense of it.
Joe Celko, who helped write early SQL standards, has posted something similar to this several times in various USENET newsfroups. (I'm skipping over the clauses that don't apply to your SELECT statement.) He usually said something like "This is how statements are supposed to act like they work". In other words, SQL implementations should behave exactly as if they did these steps, without actually being required to do each of these steps.
Build a working table from all of
the table constructors in the FROM
clause.
Remove from the working table those
rows that do not satisfy the WHERE
clause.
Construct the expressions in the
SELECT clause against the working table.
So, following this, no SQL dbms should act like it evaluates functions in the SELECT clause before it acts like it applies the WHERE clause.
In a recent posting, Joe expands the steps to include CTEs.
CJ Date and Hugh Darwen say essentially the same thing in chapter 11 ("Table Expressions") of their book A Guide to the SQL Standard. They also note that this chapter corresponds to the "Query Specification" section (sections?) in the SQL standards.
You are thinking about something called query execution plan. It's based on query optimization rules, indexes, temporaty buffers and execution time statistics. If you are using SQL Managment Studio you have toolbox over your query editor where you can look at estimated execution plan, it shows how your query will change to gain some speed. So if just used your Name table and it is in buffer, engine might first try to subquery your data, and then join it with other table.

SQL Query with Multiple Like Clauses

I would like to create a SQL query, which does the following..
- I have a few parameters, for instance like "John","Smith"
- Now I have a articles tables with a content column, which I would like to be searched
- Now, How can I find out the rows in the articles table, which has the any one of those values("John","Smith")
I cannot use content LIKE "%john% or content LIKE "%smith%", as there could be any number of incoming parameters.
Can you guys please tell me a way to do this....
Thanks
Have you considered full-text search?
While HLGEM's solution is ideal, if full-text search is not possible, you could construct a regular expression that you could test only once per row. How exactly you do that depends on the DBMS you're using.
This depends a lot on the DBMS you're using. Generally - if you don't want to use full-text search - you can almost always use regular expressions to achive this goal. For MySQL see this manual page - they even have example answering your question.
If full text search is overkill, consider putting the parameters in a table and use LIKE in theJOIN` condition e.g.
SELECT * -- column list in production code
FROM Entities AS E1
INNER JOIN Params AS P1
ON E1.entity_name LIKE '%' + P1.param + '%';