I was using Toad's SQL Optimizer and it came up with the following addition to my join conditions.
instead of say
emp.emplid = dept.emplid
it suggested
emp.emplid = dept.emplid + 0
What does the '+ 0' do? I've searched online for the past hour and I cannot find anything. I know what the (+) notation means, but I've never seen anything like this.
The + 0 does exactly what it looks like: it adds 0 to dept.emplid. But from a performance point of view it does make a difference: by turning the column into an expression, Oracle is no longer able to use any index on dept.emplid.
So if Oracle is choosing an index on dept.emplid but you would rather it used a different index/plan, adding + 0 is a way to influence the optimiser, as there is no longer a match on that particular column. Any expression would have done the trick.
The other way to go about this would be optimiser hints, although these can be a bit of a pain for big queries.
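For illustration, the two approaches would look roughly like this (dept_emplid_ix is a hypothetical index name, just for the sketch):

-- Suppressing the index with an expression:
SELECT *
FROM emp
JOIN dept ON emp.emplid = dept.emplid + 0;

-- Same intent, stated explicitly with a hint:
SELECT /*+ NO_INDEX(dept dept_emplid_ix) */ *
FROM emp
JOIN dept ON emp.emplid = dept.emplid;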
What the '+ 0' does is effectively tell the optimizer to use another index. In other words, I'm pretty sure that one of the two fields in emp.emplid = dept.emplid has, in addition to the index backing the foreign key, another index on it. The + 0 disables the index the optimizer takes by default (the foreign-key one) and tells it to choose another index.
I am working on some legacy SQL code, and am looking for some help trying to fix an odd query I have. It looks like this:
SELECT *
FROM TA LEFT OUTER JOIN TB ON TA.a1 = TB.b3
WHERE TA.a1 LIKE 'usersearch'
OR TB.b1 + ':' + TB.b2 LIKE 'usersearch'
usersearch is a user-supplied regex that is unknown at the time of the query's creation.
The usersearch variable is the same in both LIKE sections. This is an insane bit of code and takes forever to run, but I'm having a hard time figuring out how to optimize it.
The terrible part is that it constructs a string to run the match against for every single row. However, I'm not really sure how to avoid this.
If anyone has any ideas I'd love to hear them!
Your where condition: TB.b1 + ':' + TB.b2 LIKE 'usersearch' is non-sargable and will always result in a table scan. You might want to consider making a computed column that is persisted so that you can index it. That should improve performance. However, your question indicated that 'usersearch' is a RegEx. LIKE does not work with RegEx. It does work with the % and _ wildcards. I'm hoping that this was just a terminology mistake. If it really is a RegEx, then you'll need a very different solution. Regardless, the sargable issue still needs to be resolved.
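As a minimal sketch of the computed-column idea (column and index names invented for the example):

ALTER TABLE TB ADD b1_b2 AS (b1 + ':' + b2) PERSISTED;
CREATE INDEX IX_TB_b1_b2 ON TB (b1_b2);

-- The filter can then reference the indexed column:
-- ... OR TB.b1_b2 LIKE 'usersearch'

Note the index still only helps if the search pattern doesn't start with a wildcard.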
I have a SQL query from my teammate that I think has a lot of predicates, and I suspect this is the cause of its bad performance. It looks like this:
WHERE
(@IdSenales IS NULL OR senalesIds.id = comp.IdSenal)
AND
(@IdAnunciantes IS NULL OR anunciantesIds.id = comp.IdAnunciante)
AND
(@IdProgramas IS NULL OR programasIds.id = emision.IdProgramaVariante)
AND
(@IdTipoPublicidades IS NULL OR publicidadesIds.id = orden.IdTipoPublicidad)
AND
(@Canje = 0 OR (@Canje = 1 AND comp.IdTipoCondicionCobro != 12))
AND
(emision.Fecha BETWEEN @FechaDesdeContrato AND ISNULL(@FechaHastaContrato, emision.Fecha))
AND
(comp.FechaEmision BETWEEN @FechaDesde AND @FechaHasta)
AND
(@IdSectorImputacion = 0 OR @IdSectorImputacion = simp.IdSectorImputacion)
I'm from Argentina so it's commented in Spanish (sorry).
My question is: is there a way to improve performance by replacing the comparisons in the WHERE clause with a function of my own that returns the same result?
Thank you very much,
David
This is a bit long for a comment.
The only way you can really significantly improve performance is to use indexes. That would require a bunch of indexes for all the different combinations -- but perhaps a few are more common and would suffice for most use-cases.
SQL Server is pretty bad at optimizing complex where clauses. What you could do is use dynamic SQL: construct the where clause by only putting in the conditions that are necessary.
Then, be sure you have indexes for the common situations. And when the query is compiled, it should run faster.
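A rough sketch of what that could look like (same parameter names as above; the select list and joins are elided here):

DECLARE @sql NVARCHAR(MAX) = N'SELECT ... FROM ... WHERE 1 = 1';

IF @IdSenales IS NOT NULL
    SET @sql = @sql + N' AND senalesIds.id = comp.IdSenal';
IF @IdAnunciantes IS NOT NULL
    SET @sql = @sql + N' AND anunciantesIds.id = comp.IdAnunciante';
-- ...and so on for the other optional filters...

EXEC sp_executesql @sql,
    N'@IdSenales INT, @IdAnunciantes INT',
    @IdSenales, @IdAnunciantes;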
As @GordonLinoff mentions, your best option is to look into the indexes used. He's also a much better coder than me, so take his advice over mine if you're able to. However, if dynamic SQL is not allowed at your company for some reason, or the rewrite is not an option, read on...
You may not have as big a problem as you think here; have you seen a performance problem, or are you just looking at the code & thinking "there's a lot of stuff going on with a lot of brackets, so that's bad"?
i.e. take this line: (@IdSenales IS NULL OR senalesIds.id = comp.IdSenal).
This compares a parameter with null, so it only needs to be evaluated once, rather than once per row, which isn't too bad. Thereafter it's no different to either not having this condition, or having only senalesIds.id = comp.IdSenal. The same is true for most of these lines.
That said, SQL will generate a query plan the first time it runs this code, and would thereafter use this for all subsequent queries, regardless of which parameters were used; so the plan may be entirely inappropriate for the new set of options. A good fix here is to add OPTION (RECOMPILE). You'll find a good explanation of this here: https://blogs.msdn.microsoft.com/robinlester/2016/08/10/improving-query-performance-with-option-recompile-constant-folding-and-avoiding-parameter-sniffing-issues/
Beyond that, this line may be a problem, since it involves applying a function, the output of which will be different for each row; so it won't be easy to optimise:
(emision.Fecha BETWEEN @FechaDesdeContrato AND ISNULL(@FechaHastaContrato, emision.Fecha))
Change this to:
(emision.Fecha >= @FechaDesdeContrato AND (@FechaHastaContrato IS NULL OR emision.Fecha <= @FechaHastaContrato))
...and you should be OK.
Full Code:
WHERE
(@IdSenales IS NULL OR senalesIds.id = comp.IdSenal)
AND
(@IdAnunciantes IS NULL OR anunciantesIds.id = comp.IdAnunciante)
AND
(@IdProgramas IS NULL OR programasIds.id = emision.IdProgramaVariante)
AND
(@IdTipoPublicidades IS NULL OR publicidadesIds.id = orden.IdTipoPublicidad)
AND
(@Canje = 0 OR (@Canje = 1 AND comp.IdTipoCondicionCobro != 12))
AND
(emision.Fecha >= @FechaDesdeContrato AND (@FechaHastaContrato IS NULL OR emision.Fecha <= @FechaHastaContrato))
AND
(comp.FechaEmision BETWEEN @FechaDesde AND @FechaHasta)
AND
(@IdSectorImputacion = 0 OR @IdSectorImputacion = simp.IdSectorImputacion)
OPTION (RECOMPILE)
Thanks for your suggestion @JohnLBevan!!
I have checked the predicates because I read an article from Gail Shaw that said:
"Another common cause of SQL Server choosing to scan is a query that contains multiple predicates, when no single index exists that has all the columns necessary to evaluate the WHERE clause. For example, an index on (FirstName, Surname), would fully support any query with a WHERE clause of FirstName = #FirstName AND Surname = #Surname. However, if there was one index on FirstName only, and a second separate index on Surname, then SQL Server can use neither one efficiently. It may choose to seek one index, do lookups for the other columns and then do a secondary filter; it may choose to seek both indexes and perform an index intersection, or it may give up and scan the table."
https://www.red-gate.com/simple-talk/sql/database-administration/gail-shaws-sql-server-howlers/
When I read this I remembered that I had seen multiple predicates in my query. I want to mention that this query is one of the most expensive ones returned by the query I use to check the cost of all the queries against the database.
Well, I should check whether there are enough indexes and/or create new ones.
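For example, per the quote, a single composite index covering both predicate columns would look like this (using the article's example names):

CREATE INDEX IX_Person_FirstName_Surname ON Person (FirstName, Surname);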
David Linares.
I have what I think is a strange issue. Normally, I would expect a query to take less time if I add a restriction (so that fewer rows are processed). But for some reason that is not the case here. Maybe I'm doing something wrong, but I get no error; the query just seems to run 'till infinity'.
This is the query:
SELECT
A.ENTITYID AS ORG_ID,
A.ID_VALUE AS LEI,
A.MODIFIED_BY,
A.AUDITDATETIME AS LAST_DATE_MOD
FROM (
SELECT
CASE WHEN IFE.NEWVALUE IS NOT NULL
THEN EXTRACTVALUE(xmltype(IFE.NEWVALUE), '/DocumentElement/ORG_IDENTIFIERS/ID_TYPE')
ELSE NULL
END AS ID_TYPE,
CASE WHEN IFE.NEWVALUE IS NOT NULL
THEN EXTRACTVALUE(xmltype(IFE.NEWVALUE), '/DocumentElement/ORG_IDENTIFIERS/ID_VALUE')
ELSE NULL
END AS ID_VALUE,
(select u.username from admin.users u where u.userid = ife.analystuserid) as Modified_by,
ife.*
FROM ife.audittrail ife
WHERE
--IFE.AUDITDATETIME >= '01-JUN-2016' AND
attributeid = 499
AND ROWNUM <= 10000
AND (CASE WHEN IFE.NEWVALUE IS NOT NULL then EXTRACTVALUE(xmltype(IFE.NEWVALUE), '/DocumentElement/ORG_IDENTIFIERS/ID_TYPE') ELSE NULL end) = '38') A
--WHERE A.AUDITDATETIME >= '01-JUN-2016';
So I tried with each of the two commented clauses (one at a time, of course).
And the same thing happens with both of them: the query runs for so long that I have to abort it.
Do you know why this could be happening? How else could I apply the restriction?
The values of the field AUDITDATETIME are in the format '06-MAY-2017', for example.
Thank you very much in advance
I think you may misunderstand how databases work.
Firstly, read up on EXPLAIN - you can find out exactly what is taking time, and why, by learning to read the EXPLAIN statement.
Secondly - the performance characteristics of any given query are determined by a whole range of things, but usually the biggest effort goes not into processing rows, but into finding them.
Without an index, the database has to look at every row in the table and compare it to your where clause. It's the equivalent of searching the phone book for a phone number, rather than a name (the phone book is indexed on "last name").
You can improve this by creating indexes - for instance, on columns "AUDITDATETIME" and "attributeid".
Unlike the phone book, a database server can support multiple indexes - and if those indexes match your where clause, your query will be (much) faster.
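For example (index name invented for this sketch):

CREATE INDEX audittrail_attr_date_ix
    ON ife.audittrail (attributeid, auditdatetime);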
Finally, using an XML string extraction for a comparison in the where clause is likely to be extremely slow unless you've got an index on that XML data.
This is the equivalent of searching the phone book and translating the street address from one language to another - not only do you have to inspect every address, you have to execute an expensive translation step for each item.
You probably need index(es)... We can all make guesses about which indexes you already have and which you need to add, but most DBMSs have built-in query optimizers.
If you are using MS SQL Server you can execute the query with the actual execution plan included; that will tell you which index you need to add to optimize this particular query. It will even let you copy/paste the command to create it.
I have a query which slows down immensely when I add an additional WHERE part,
which essentially is just a LIKE lookup on a varchar(500) field:
where...
and (xxxxx.yyyy like '% blahblah %')
I've been racking my brain, but the query simply slows down terribly when I add this in.
I'm wondering if anyone has suggestions in terms of changing field type, index setup, or index hints or something that might assist.
any help appreciated.
sql 2000 enterprise.
HERE IS SOME ADDITIONAL INFO:
Oops - as some background: unfortunately I do need (in the case of the LIKE statement) to have the % at the front.
There is business logic behind that which I can't avoid.
I have since created a full-text catalogue on the field which is causing me problems,
and converted the search to use the CONTAINS syntax.
Unfortunately, although this has increased performance on occasion, it appears to be slower for new word searches.
So if I search for apple, apple appears to be faster on subsequent searches, but not for a new search for orange (for example).
So i don't think i can go with that (unless you can suggest some tinkering to make that more consistent).
Additional info:
the table contains only around 60k records
the field i'm trying to filter is a varchar(500)
sql 2000 on windows server 2003
The query I'm using is definitely convoluted.
Sorry, I've had to replace proprietary stuff, but this should give you an indication of the query:
SELECT TOP 99 AAAAAAAA.Item_ID, AAAAAAAA.CatID, AAAAAAAA.PID, AAAAAAAA.Description,
AAAAAAAA.Retail, AAAAAAAA.Pack, AAAAAAAA.CatID, AAAAAAAA.Code, BBBBBBBB.blahblah_PictureFile AS PictureFile,
AAAAAAAA.CL1, AAAAAAAA.CL1, AAAAAAAA.CL2, AAAAAAAA.CL3
FROM CCCCCCC INNER JOIN DDDDDDDD ON CCCCCCC.CID = DDDDDDDD.CID
INNER JOIN AAAAAAAA ON DDDDDDDD.CID = AAAAAAAA.CatID LEFT OUTER JOIN BBBBBBBB
ON AAAAAAAA.PID = BBBBBBBB.Product_ID INNER JOIN EEEEEEE ON AAAAAAAA.BID = EEEEEEE.ID
WHERE
(CCCCCCC.TID = 654321) AND (DDDDDDDD.In_Use = 1) AND (AAAAAAAA.Unused = 0)
AND (DDDDDDDD.Expiry > '10-11-2010 09:23:38') AND
(
(AAAAAAAA.Code = 'red pen') OR
(
(my_search_description LIKE '% red %') AND (my_search_description LIKE '% nose %')
AND (DDDDDDDD.CID IN (63,153,165,305,32,33))
)
)
AND (DDDDDDDD.CID IN (20,32,33,63,64,65,153,165,232,277,294,297,300,304,305,313,348,443,445,446,447,454,472,479,481,486,489,498))
ORDER BY AAAAAAAA.f_search_priority DESC, DDDDDDDD.Priority DESC, AAAAAAAA.Description ASC
You can see that throwing in the my_search_description filter also brings in a DDDDDDDD.CID filter (business logic).
This is the part which is slowing things down (from a 1.5-2 second load of my pages to a 6-8 second load (ow ow ow)).
It might be my lack of understanding of how to get the full-text search catalogue working.
Am very impressed by the answers, so if anyone has any tips I'd be most grateful.
If you haven't already, enable full text indexing.
Unfortunately, using the LIKE clause on a query really does slow things down. Full Text Indexing is really the only way that I know of to speed things up (at the cost of storage space, of course).
Here's a link to an overview of Full-Text Search in SQL Server which will show you how to configure things and change your queries to take advantage of the full-text indexes.
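As a rough sketch, once a full-text index exists on the column, the pair of leading-wildcard LIKEs from the question could become something like (select list elided):

SELECT ...
WHERE CONTAINS(my_search_description, '"red" AND "nose"');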
More details would certainly help, but...
Full-text indexing can certainly be useful (depending on the more details about the table and your query). Full Text indexing requires a good bit of extra work both in setup and querying, but it's the only way to try to do the sort of search you seek efficiently.
The problem with a LIKE pattern that starts with a wildcard is that SQL Server has to do a complete table scan to find matching records - not only does it have to scan every row, it has to read the contents of the char-based field you are querying.
With or without a full-text index, one thing can possibly help: Can you narrow the range of rows being searched, so at least SQL doesn't need to scan the whole table, but just some subset of it?
The '% blahblah %' is a problem for improving performance. Putting the wildcard at the beginning tells SQL Server that the string can begin with any legal character, so it must scan the entire index. Your best bet if you must have this filter is to focus on your other filters for improvement.
Using LIKE with a wildcard at the beginning of the search pattern forces the server to scan every row. It's unable to use any indexes. Indexes work from left to right, and since there is no constant on the left, no index is used.
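For example, of these two filters only the second can use an index seek (the pattern text is just illustrative):

WHERE my_search_description LIKE '%blahblah%' -- leading wildcard: full scan
WHERE my_search_description LIKE 'blahblah%'  -- constant prefix: index seek possible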
From your WHERE clause, it looks like you're trying to find rows where a specific word exists in an entry. If you're searching for a whole word, then full text indexing may be a solution for you.
Full text indexing creates an index entry for each word that's contained in the specified column. You can then quickly find rows that contain a specific word.
As other posters have correctly pointed out, the use of the wildcard character % within the LIKE expression is resulting in a query plan being produced that uses a SCAN operation. A scan operation touches every row in the table or index, dependant on the type of scan operation being performed.
So the question really then becomes, do you actually need to search for the given text string anywhere within the column in question?
If not, great, problem solved but if it is essential to your business logic then you have two routes of optimization.
1. Really go to town on increasing the overall selectivity of your query by focusing your optimization efforts on the remaining search arguments.
2. Implement a Full Text Indexing solution.
I don't think this is a valid answer, but I'd like to throw it out there for some more experienced posters' comments... are these equivalent?
where (xxxxx.yyyy like '% blahblah %')
vs
where patindex('%blahblah%', xxxxx.yyyy) > 0
As far as I know, that's equivalent from a database-logic standpoint, as it forces the same scan. Guess it couldn't hurt to try?
Ok I need to build a query based on some user input to filter the results.
The query basically goes something like this:
SELECT * FROM my_table ORDER BY ordering_fld;
There are four text boxes in which users can choose to filter the data, meaning I'd have to dynamically build a "WHERE" clause into it for the first filter used and then "AND" clauses for each subsequent filter entered.
Because I'm too lazy to do this, I've just made every filter an "AND" clause and put a "WHERE 1" clause in the query by default.
So now I have:
SELECT * FROM my_table WHERE 1 {AND filters} ORDER BY ordering_fld;
So my question is, have I done something that will adversely affect the performance of my query or buggered anything else up in any way I should be remotely worried about?
MySQL will optimize your 1 away.
I just ran this query on my test database:
EXPLAIN EXTENDED
SELECT *
FROM t_source
WHERE 1 AND id < 100
and it gave me the following description:
select `test`.`t_source`.`id` AS `id`,`test`.`t_source`.`value` AS `value`,`test`.`t_source`.`val` AS `val`,`test`.`t_source`.`nid` AS `nid` from `test`.`t_source` where (`test`.`t_source`.`id` < 100)
As you can see, no 1 at all.
The documentation on WHERE clause optimization in MySQL mentions this:
Constant folding:
(a<b AND b=c) AND a=5
-> b>5 AND b=c AND a=5
Constant condition removal (needed because of constant folding):
(B>=5 AND B=5) OR (B=6 AND 5=5) OR (B=7 AND 5=6)
-> B=5 OR B=6
Note the 5 = 5 and 5 = 6 parts in the example above.
You can EXPLAIN your query:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
and see if it does anything differently, which I doubt. I would use 1=1, just so it is clearer.
You might want to add LIMIT 1000 or something: when no parameters are used and the table gets large, will you really want to return everything?
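For example:

SELECT * FROM my_table WHERE 1 ORDER BY ordering_fld LIMIT 1000;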
WHERE 1 is a constant, deterministic expression which will be "optimized out" by any decent DB engine.
If there is a good way in your chosen language to avoid building SQL yourself, use that instead. I like Python and Django, and the Django ORM makes it very easy to filter results based on user input.
If you are committed to building the SQL yourself, be sure to sanitize user inputs against SQL injection, and try to encapsulate SQL building in a separate module from your filter logic.
Also, query performance should not be your concern until it becomes a problem, which it probably won't until you have thousands or millions of rows. And when it does come time to optimize, adding a few indexes on columns used for WHERE and JOIN goes a long way.
To improve performance, use column indexes on the fields listed in the WHERE clause.
Standard SQL Injection Disclaimers here...
One thing you could do to avoid SQL injection, since you know it's only four parameters, is use a stored procedure where you pass values for the fields or NULL. I am not sure of MySQL stored proc syntax, but the query would boil down to:
SELECT *
FROM my_table
WHERE Field1 = ISNULL(@Field1, Field1)
AND Field2 = ISNULL(@Field2, Field2)
...
ORDER BY ordering_fld
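Since the question is about MySQL, note that MySQL's ISNULL() only takes one argument; COALESCE (or IFNULL) would play the same role, so the same idea would look roughly like:

SELECT *
FROM my_table
WHERE Field1 = COALESCE(@Field1, Field1)
  AND Field2 = COALESCE(@Field2, Field2)
ORDER BY ordering_fld;

One caveat with this pattern either way: rows where the column itself is NULL get filtered out, since NULL = NULL is not true.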
We did something similar not too long ago and there are a few things that we observed:
Setting up indexes on the columns we were (possibly) filtering on improved performance.
The WHERE 1 part can be left out completely if the filters aren't used (not sure if that applies to your case). It doesn't make a difference, but it 'feels' right.
SQL injection shouldn't be forgotten
Also, if you 'only' have 4 filters, you could build a stored procedure and pass in null values and check for them (just like n8wrl suggested in the meantime).
That will work - some considerations:
About dynamically built SQL in general, some databases (Oracle at least) will cache execution plans for queries, so if you end up running the same query many times it won't have to completely start over from scratch. If you use dynamically built SQL, you are creating a different query each time so to the database it will look like 100 different queries instead of 100 runs of the same query.
You'd probably just need to measure the performance to find out if it works well enough for you.
Do you need all the columns? Explicitly specifying them is probably better than using * anyways because:
You can visually see what columns are being returned
If you add or remove columns to the table later, they won't change your interface
Not bad, I didn't know this snippet for getting rid of the 'is it the first filter?' question.
Though you should be ashamed of your code ( ^^ ), it doesn't do anything to performance, as any DB engine will optimize it away.
The only reason I've used WHERE 1 = 1 is for dynamic SQL; it's a hack to make appending WHERE clauses easier by using AND .... It is not something I would include in my SQL otherwise - it does nothing to affect the query overall because it always evaluates as being true and does not hit the table(s) involved so there aren't any index lookups or table scans based on it.
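i.e. the builder can always append with AND, so the generated statement ends up looking something like this sketch (field names are placeholders):

SELECT *
FROM my_table
WHERE 1 = 1
  AND field1 = 'foo'   -- appended only when filter 1 was supplied
  AND field3 = 'bar'   -- appended only when filter 3 was supplied
ORDER BY ordering_fld;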
I can't speak to how MySQL handles optional criteria, but I know that using the following:
WHERE (@param IS NULL OR t.column = @param)
...is the typical way of handling optional parameters. COALESCE and ISNULL are not ideal because the query is still utilizing indexes (or worse, table scans) based on a sentinel value. The example I provided won't hit the table unless a value has been provided.
That said, my experience with Oracle (9i, 10g) has shown that it doesn't handle [ WHERE (@param IS NULL OR t.column = @param) ] very well. I saw a huge performance gain by converting the SQL to be dynamic, and used CONTEXT variables to determine what to add. My impression of SQL Server 2005 is that these are handled better.
I have usually done something like this:
for(int i=0; i<numConditions; i++) {
sql += (i == 0 ? "WHERE " : "AND ");
sql += dbFieldNames[i] + " = " + safeVariableValues[i];
}
Makes the generated query a little cleaner.
One alternative I sometimes use is to build the where clause as an array and then join the parts together:
my @wherefields;
foreach my $c (@conditionfields) {
    push @wherefields, "$c = ?";
}
my $sql = "select * from table";
if (@wherefields) { $sql .= " WHERE " . join(" AND ", @wherefields); }
The above is written in Perl, but most languages have some kind of join function.