What can I do to speed this slow query up?

We have a massive, multi-table Sybase query, which we call get_safari_exploration_data, that fetches all sorts of info related to explorers going on a safari and all the animals they encounter.
This query is slow, and I've been asked to speed it up. The first thing that jumps out at me is that there doesn't seem to be a pressing need for the nested SELECT statement inside the outer FROM clause. In that nested SELECT, there also seem to be several fields that aren't necessary (vegetable, broomhilda, devoured, etc.). I'm also skeptical about the use of the joins ("*=" instead of "INNER JOIN...ON").
SELECT
dog_id,
cat_metadata,
rhino_id,
buffalo_id,
animal_metadata,
has_4_Legs,
is_mammal,
is_carnivore,
num_teeth,
does_hibernate,
last_spotted_by,
last_spotted_date,
purchased_by,
purchased_date,
allegator_id,
cow_id,
cow_name,
cow_alias,
can_be_ridden
FROM
(
SELECT
mp.dog_id as dog_id,
ts.cat_metadata + '-yoyo' as cat_metadata,
mp.rhino_id as rhino_id,
mp.buffalo_id as buffalo_id,
mp.animal_metadata as animal_metadata,
isnull(mp.has_4_Legs, 0) as has_4_Legs,
isnull(mp.is_mammal, 0) as is_mammal,
isnull(mp.is_carnivore, 0) as is_carnivore,
isnull(mp.num_teeth, 0) as num_teeth,
isnull(mp.does_hibernate, 0) as does_hibernate,
jungle_info.explorer as last_spotted_by,
exploring_journal.spotted_datetime as last_spotted_date,
early_jungle_info.explorer as purchased_by,
early_exploreration_journal.spotted_datetime as purchased_date,
alleg_id as allegator_id,
ho.cow_id,
ho.cow_name,
ho.cow_alias,
isnull(mp.is_ridable,0) as can_be_ridden,
ts.cat_metadata as broomhilda,
ts.squirrel as vegetable,
convert (varchar(15), mp.rhino_id) as tms_id,
0 as devoured
FROM
mammal_pickles mp,
very_tricky_animals vt,
possibly_venomous pv,
possibly_carniv_and_tall pct,
tall_and_skinny ts,
tall_and_skinny_type ptt,
exploration_history last_exploration_history,
master_exploration_journal exploring_journal,
adventurer jungle_info,
exploration_history first_exploration_history,
master_exploration_journal early_exploreration_journal,
adventurer early_jungle_info,
hunting_orders ho
WHERE
mp.exploring_strategy_id = 47
and mp.cow_id = ho.cow_id
and ho.cow_id IN (20, 30, 50)
and mp.rhino_id = vt.rhino_id
and vt.version_id = pv.version_id
and pv.possibly_carniv_and_tall_id = pct.possibly_carniv_and_tall_id
and vt.tall_and_skinny_id = ts.tall_and_skinny_id
and ts.tall_and_skinny_type_id = ptt.tall_and_skinny_type_id
and mp.alleg_id *= last_exploration_history.exploration_history_id
and last_exploration_history.master_exploration_journal_id *= exploring_journal.master_exploration_journal_id
and exploring_journal.person_id *= jungle_info.person_id
and mp.first_exploration_history_id *= first_exploration_history.exploration_history_id
and first_exploration_history.master_exploration_journal_id *= early_exploreration_journal.master_exploration_journal_id
and early_exploreration_journal.person_id *= early_jungle_info.person_id
) TEMP_TBL
So I ask:
Am I correct about the nested SELECT?
Am I correct about the unnecessary fields inside the nested SELECT?
Am I correct about the structure/syntax/usage of the joins?
Is there anything else about the structure/nature of this query that jumps out at you as being terribly inefficient/slow?
Unfortunately, unless there is irrefutable, matter-of-fact proof that decomposing this large query into smaller queries is beneficial in the long run, management will simply not approve refactoring it out into multiple, smaller queries, as this will take considerable time to refactor and test. Thanks in advance for any help/insight here!

Am I correct about the nested SELECT?
You would be in some cases, but a competent planner would collapse it and ignore it here.
Am I correct about the unnecessary fields inside the nested SELECT?
Yes, especially considering that some of them don't show up at all in the final list of fields.
Am I correct about the structure/syntax/usage of the joins?
Insofar as I'm aware, *= and =* are merely the legacy syntax for a left and a right outer join, but I might be wrong in stating that. If they're not equivalent, then they merely force the way the joins occur, and they may be necessary for your query to work at all.
Is there anything else about the structure/nature of this query that jumps out at you as being terribly inefficient/slow?
Yes.
Firstly, you have some calculations that aren't needed, e.g. convert(varchar(15), mp.rhino_id) as tms_id. Perhaps a join or two as well, but I admittedly haven't looked at the gory details of the query.
Next, you might have a problem with the db design itself, e.g. a cow_id field. (Seriously? :-) )
Last, there occasionally is something to be said about doing multiple queries instead of a single one, to avoid doing tons of joins.
In a blog, for instance, it's usually a good idea to grab the top 10 posts, and then to use a separate query to fetch their tags (where id in (id1, id2, etc.)). In your case, the selective part seems to be around here:
mp.exploring_strategy_id = 47
and mp.cow_id = ho.cow_id
and ho.cow_id IN (20, 30, 50)
so maybe isolate that part in one query, and then build an in () clause using the resulting IDs, and fetch the cosmetic bits and pieces in one or more separate queries.
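For illustration, something along these lines (a sketch only; the table and column names come from the query above, but exactly which key columns you need to carry over from step 1 is an assumption):
-- Step 1: run only the selective part and keep the keys it produces.
SELECT mp.rhino_id, mp.alleg_id, mp.first_exploration_history_id
FROM mammal_pickles mp, hunting_orders ho
WHERE mp.exploring_strategy_id = 47
  and mp.cow_id = ho.cow_id
  and ho.cow_id IN (20, 30, 50)

-- Step 2: feed those keys back in to fetch the cosmetic columns,
-- e.g. (the ids below are placeholders for the list built from step 1):
SELECT vt.rhino_id, ts.cat_metadata
FROM very_tricky_animals vt, tall_and_skinny ts
WHERE vt.tall_and_skinny_id = ts.tall_and_skinny_id
  and vt.rhino_id IN (101, 102, 103)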
Oh, and as pointed out by Gordon, check your indexes as well. But then, note that the indexes may end up of little use without splitting the query into more manageable parts.

I would suggest the following approach.
First, rewrite the query using ANSI standard joins with the on clause. This will make the conditions and filtering much easier to understand. Also, this is "safe" -- you should get exactly the same results as the current version. Be careful, because the *= is an outer join, so not everything is an inner join.
I doubt this step will improve performance.
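To illustrate, a partial sketch of the same join chain in ANSI syntax (only a handful of the tables shown; the remaining inner joins follow the same pattern, and the *= chains become LEFT OUTER JOINs):
SELECT mp.dog_id, ho.cow_name, jungle_info.explorer AS last_spotted_by
FROM mammal_pickles mp
INNER JOIN hunting_orders ho ON mp.cow_id = ho.cow_id
INNER JOIN very_tricky_animals vt ON mp.rhino_id = vt.rhino_id
-- ...the other inner-joined tables continue in the same way...
LEFT OUTER JOIN exploration_history last_exploration_history
    ON mp.alleg_id = last_exploration_history.exploration_history_id
LEFT OUTER JOIN master_exploration_journal exploring_journal
    ON last_exploration_history.master_exploration_journal_id = exploring_journal.master_exploration_journal_id
LEFT OUTER JOIN adventurer jungle_info
    ON exploring_journal.person_id = jungle_info.person_id
WHERE mp.exploring_strategy_id = 47
  AND ho.cow_id IN (20, 30, 50)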
Then, check each of the reference tables and be sure that the join keys have indexes on them in the reference table. If keys are missing, then add them in.
Then, check whether the left outer joins are necessary. There are filters on tables that are left outer join'ed in . . . these filters convert the outer joins to inner joins. Probably not a performance hit, but you never know.
Then, consider indexing the fields used for filtering (in the where clause).
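For example (hedged; the right indexes depend on what already exists and on the plans you actually see, and the index names here are made up for illustration):
-- A join key on one of the reference tables, and a composite index on the
-- filtering columns of the driving table.
CREATE INDEX ix_vt_rhino_id ON very_tricky_animals (rhino_id)
CREATE INDEX ix_mp_strategy_cow ON mammal_pickles (exploring_strategy_id, cow_id)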
And, learn how to use the explain capabilities. Any nested loop joins (without an index) are likely culprits for performance problems.
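In Sybase ASE, for instance, session settings along these lines show the chosen plan and the I/O it does, so you don't have to guess:
-- Turn on plan and I/O reporting for the session, run the query, then turn them off.
set showplan on
set statistics io on
go
-- ...run the query here...
set showplan off
set statistics io off
go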
As for the nested select, I think Sybase is smart enough to "do the right thing". Even if it wrote out and re-read the result set, that probably would have a marginal effect on the query compared to getting the joins right.
If this is your real data structure, by the way, it sounds like a very interesting domain. It is not often that I see a field called allegator_id in a table.

I will answer some of your questions.
You think that the fields (vegetable, broomhilda, devoured) in the nested SELECT could be causing a performance issue. Not necessarily. The two unused fields (vegetable, broomhilda) in the nested SELECT are from the ts table, but the cat_metadata field which is being used is also from the ts table. So unless cat_metadata is covered by the index used on the ts table, there won't be any performance impact, because to extract the cat_metadata field the data page from the table will need to be fetched anyway. Extracting the other two fields will take a little CPU, that's it. So don't worry about that. The 'devoured' field is just a constant, so that will not affect performance either.
Dennis pointed out the usage of the convert function, convert(varchar(15), mp.rhino_id). I disagree that that will affect performance, as it will only consume CPU.
Lastly I would say, try using set table count 13, as there are 13 tables in the query. By default, Sybase considers only four tables at a time when optimising the join order.
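That is a session-level setting, something like:
-- Let the optimiser consider up to 13 tables at a time when choosing
-- the join order for this session.
set table count 13
go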

Related

SQL Server querying : improve performance by reducing WHERE clauses

I have a SQL query from my teammate that has a lot of predicates, and I think this is the cause of the bad performance. It looks like this:
WHERE
(@IdSenales IS NULL OR senalesIds.id = comp.IdSenal)
AND
(@IdAnunciantes IS NULL OR anunciantesIds.id = comp.IdAnunciante)
AND
(@IdProgramas IS NULL OR programasIds.id = emision.IdProgramaVariante)
AND
(@IdTipoPublicidades IS NULL OR publicidadesIds.id = orden.IdTipoPublicidad)
AND
(@Canje = 0 OR (@Canje = 1 AND comp.IdTipoCondicionCobro != 12))
AND
(emision.Fecha BETWEEN @FechaDesdeContrato AND ISNULL(@FechaHastaContrato, emision.fecha))
AND
(comp.FechaEmision BETWEEN @FechaDesde AND @FechaHasta)
AND
(@IdSectorImputacion = 0 OR @IdSectorImputacion = simp.IdSectorImputacion)
I'm from Argentina so it's commented in Spanish (sorry).
My question is: is there a way to improve the performance by replacing the comparisons in the WHERE clause with a function created by me that returns the same result?
Thank you very much,
David
This is a bit long for a comment.
The only way you can really significantly improve performance is to use indexes. That would require a bunch of indexes for all the different combinations -- but perhaps a few are more common and would suffice for most use-cases.
SQL Server is pretty bad about optimizing complex where clauses. What you could do is use dynamic SQL. Construct the where clause by only putting in the conditions that are necessary.
Then, be sure you have indexes for the common situations. And when the query is compiled, it should run faster.
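As a rough sketch of the dynamic SQL idea (simplified to a single table, and assuming it runs inside your existing procedure where @IdSenales and the other parameters are already declared):
-- Build only the predicates that are actually needed, then execute with
-- sp_executesql so each distinct shape of the query gets its own plan.
DECLARE @sql NVARCHAR(MAX) =
    N'SELECT * FROM comp WHERE comp.FechaEmision BETWEEN @FechaDesde AND @FechaHasta';

IF @IdSenales IS NOT NULL
    SET @sql = @sql + N' AND comp.IdSenal = @IdSenales';
IF @IdAnunciantes IS NOT NULL
    SET @sql = @sql + N' AND comp.IdAnunciante = @IdAnunciantes';
-- ...one IF per remaining optional filter...

EXEC sp_executesql @sql,
    N'@FechaDesde DATE, @FechaHasta DATE, @IdSenales INT, @IdAnunciantes INT',
    @FechaDesde = @FechaDesde, @FechaHasta = @FechaHasta,
    @IdSenales = @IdSenales, @IdAnunciantes = @IdAnunciantes;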
As @GordonLinoff mentions, your best option is to look into the indexes used. He's also a much better coder than me; so take his advice over mine if you're able to. However, if dynamic SQL is not allowed at your company for some reason, or the rewrite is not an option, read on...
You may not have as big a problem as you think here; have you seen a performance problem, or are you just looking at the code & thinking "there's a lot of stuff going on with a lot of brackets, so that's bad"?
i.e. take this line: (@IdSenales IS NULL OR senalesIds.id = comp.IdSenal).
This compares a parameter with null, so it will only need to be evaluated once, rather than once per row; which isn't too bad. Thereafter it's no different to either not having this condition, or having only senalesIds.id = comp.IdSenal. The same is true for most of these lines.
That said, SQL will generate a query plan the first time it runs this code, and would thereafter use this for all subsequent queries, regardless of which parameters were used; so the plan may be entirely inappropriate for the new set of options. A good fix here is to add OPTION (RECOMPILE). You'll find a good explanation of this here: https://blogs.msdn.microsoft.com/robinlester/2016/08/10/improving-query-performance-with-option-recompile-constant-folding-and-avoiding-parameter-sniffing-issues/
Beyond that, this line may be a problem, since it involves applying a function, the output of which will be different for each row; so it won't be easy to optimise:
(emision.Fecha BETWEEN @FechaDesdeContrato AND ISNULL(@FechaHastaContrato, emision.fecha))
Change this to:
(emision.Fecha >= @FechaDesdeContrato AND (@FechaHastaContrato IS NULL OR emision.Fecha <= @FechaHastaContrato))
...and you should be OK.
Full Code:
WHERE
(@IdSenales IS NULL OR senalesIds.id = comp.IdSenal)
AND
(@IdAnunciantes IS NULL OR anunciantesIds.id = comp.IdAnunciante)
AND
(@IdProgramas IS NULL OR programasIds.id = emision.IdProgramaVariante)
AND
(@IdTipoPublicidades IS NULL OR publicidadesIds.id = orden.IdTipoPublicidad)
AND
(@Canje = 0 OR (@Canje = 1 AND comp.IdTipoCondicionCobro != 12))
AND
(emision.Fecha >= @FechaDesdeContrato AND (@FechaHastaContrato is null or emision.Fecha <= @FechaHastaContrato))
AND
(comp.FechaEmision BETWEEN @FechaDesde AND @FechaHasta)
AND
(@IdSectorImputacion = 0 OR @IdSectorImputacion = simp.IdSectorImputacion)
OPTION (RECOMPILE)
Thanks for your suggestion @JohnLBevan!!
I have checked the predicates because I read an article from Gail Shaw that said:
"Another common cause of SQL Server choosing to scan is a query that contains multiple predicates, when no single index exists that has all the columns necessary to evaluate the WHERE clause. For example, an index on (FirstName, Surname), would fully support any query with a WHERE clause of FirstName = #FirstName AND Surname = #Surname. However, if there was one index on FirstName only, and a second separate index on Surname, then SQL Server can use neither one efficiently. It may choose to seek one index, do lookups for the other columns and then do a secondary filter; it may choose to seek both indexes and perform an index intersection, or it may give up and scan the table."
https://www.red-gate.com/simple-talk/sql/database-administration/gail-shaws-sql-server-howlers/
When I read this I remembered that I had seen multiple predicates in my query. I want to mention that this query is one of the most expensive queries returned by the query I use to check the cost of all the queries against the database.
Well, I should check whether there are enough indexes and/or create new ones.
David Linares.

Performance of SQL functions vs. code functions

We're currently investigating the load against our SQL server and looking at ways to alleviate it. During my post-secondary education, I was always told that, from a performance standpoint, it was cheaper to make SQL Server do the work. But is this true?
Here's an example:
SELECT ord_no FROM oelinhst_sql
This returns 783119 records in 14 seconds. The field is a char(8), but all of our order numbers are six digits long, so each has two leading blank characters. We typically trim this field, so I ran the following test:
SELECT LTRIM(ord_no) FROM oelinhst_sql
This returned the 783119 records in 13 seconds. I also tried one more test:
SELECT LTRIM(RTRIM(ord_no)) FROM oelinhst_sql
There is nothing to trim on the right, but I was trying to see if there was any overhead in the mere act of calling the function; it still returned in 13 seconds.
My manager was talking about moving things like string trimming out of the SQL and into the source code, but the test results suggest otherwise. My manager also says he heard somewhere that using SQL functions meant that indexes would not be used. Is there any truth to this either?
Only optimize code that you have proven to be the slowest part of your system. Your data so far indicates that the SQL string manipulation functions are not affecting performance at all. Take this data to your manager.
If you use a function or type cast in the WHERE clause it can often prevent the SQL server from using indexes. This does not apply to transforming returned columns with functions.
It's typically user defined functions (UDFs) that get a bad rap with regards to SQL performance and might be the source of the advice you're getting.
The reason for this is you can build some pretty hairy functions that cause massive overhead with exponential effect.
As you've found with rtrim and ltrim this isn't a blanket reason to stop using all functions on the sql side.
It somewhat depends on what all is encompassed by "things like string trimming", but, for string trimming at least, I'd definitely let the database do that (there will be less network traffic as well). As for the indexes, they will still be used if your where clause is just using the column itself (as opposed to a function of the column). Use of the indexes won't be affected whatsoever by using functions on the actual columns you're retrieving (just on how you're selecting the rows).
You may want to have a look at this for performance improvement suggestions: http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/
As I said in my comment, reduce the data read per query and you will get a speed increase.
You said: "our order numbers are six-digits long so each has two blank characters leading".
This makes me think you are storing numbers in a string; if so, why are you not using a numeric data type? The smallest numeric type which will take 6 digits is an INT (I'm assuming SQL Server), and that already saves you 4 bytes per order number. Over the number of rows you mention, that's quite a lot less data to read off disk and send over the network.
Fully optimise your database before looking to deal with the data outside of it; it's what a database server is designed to do, serve data.
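If changing the type is an option, the conversion itself is short (a sketch only, using the table and column names from the question; any constraints, defaults, or indexes on the column would need handling first):
-- char(8) values such as '  123456' convert implicitly to INT.
-- NOT NULL is an assumption about the current column definition.
ALTER TABLE oelinhst_sql ALTER COLUMN ord_no INT NOT NULL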
As you found, it often pays to measure, but what I think your manager may have been referring to is something like this.
This is typically much faster
SELECT SomeFields FROM oelinhst_sql
WHERE
datetimeField > '1/1/2011'
and
datetimeField < '2/1/2011'
than this
SELECT SomeFields FROM oelinhst_sql
WHERE
Month(datetimeField) = 1
and
year(datetimeField) = 2011
even though the rows that are returned are the same

where like over varchar(500)

I have a query which slows down immensely when I add an additional where part, which essentially is just a LIKE lookup on a varchar(500) field:
where...
and (xxxxx.yyyy like '% blahblah %')
I've been racking my brain, but pretty much the query slows down terribly when I add this in.
I'm wondering if anyone has suggestions in terms of changing field type, index setup, or index hints or something that might assist.
any help appreciated.
sql 2000 enterprise.
HERE IS SOME ADDITIONAL INFO:
Oops, as some background: unfortunately I do need (in the case of the like statement) to have the % at the front.
There is business logic behind that which I can't avoid.
I have since created a full text catalogue on the field which is causing me problems
and converted the search to use the contains syntax.
Unfortunately, although this has increased performance on occasion, it appears to be slow (slower) for new word searches.
So if I search for apple, apple appears to be faster the subsequent times, but not for new searches of orange (for example).
So I don't think I can go with that (unless you can suggest some tinkering to make it more consistent).
Additional info:
the table contains only around 60k records
the field I'm trying to filter is a varchar(500)
sql 2000 on windows server 2003
The query I'm using is definitely convoluted.
Sorry, I've had to replace proprietary stuff, but it should give you an indication of the query:
SELECT TOP 99 AAAAAAAA.Item_ID, AAAAAAAA.CatID, AAAAAAAA.PID, AAAAAAAA.Description,
AAAAAAAA.Retail, AAAAAAAA.Pack, AAAAAAAA.CatID, AAAAAAAA.Code, BBBBBBBB.blahblah_PictureFile AS PictureFile,
AAAAAAAA.CL1, AAAAAAAA.CL1, AAAAAAAA.CL2, AAAAAAAA.CL3
FROM CCCCCCC INNER JOIN DDDDDDDD ON CCCCCCC.CID = DDDDDDDD.CID
INNER JOIN AAAAAAAA ON DDDDDDDD.CID = AAAAAAAA.CatID LEFT OUTER JOIN BBBBBBBB
ON AAAAAAAA.PID = BBBBBBBB.Product_ID INNER JOIN EEEEEEE ON AAAAAAAA.BID = EEEEEEE.ID
WHERE
(CCCCCCC.TID = 654321) AND (DDDDDDDD.In_Use = 1) AND (AAAAAAAA.Unused = 0)
AND (DDDDDDDD.Expiry > '10-11-2010 09:23:38') AND
(
(AAAAAAAA.Code = 'red pen') OR
(
(my_search_description LIKE '% red %') AND (my_search_description LIKE '% nose %')
AND (DDDDDDDD.CID IN (63,153,165,305,32,33))
)
)
AND (DDDDDDDD.CID IN (20,32,33,63,64,65,153,165,232,277,294,297,300,304,305,313,348,443,445,446,447,454,472,479,481,486,489,498))
ORDER BY AAAAAAAA.f_search_priority DESC, DDDDDDDD.Priority DESC, AAAAAAAA.Description ASC
You can see throwing in the my_search_description filter also includes a dddd.cid filter (business logic).
This is the part which is slowing things down (from a 1.5-2 second load of my pages down to a 6-8 second load (ow ow ow))
It might be my lack of understanding of how to get the full text search catalogue working.
I am very impressed by the answers, so if anyone has any tips I'd be most grateful.
If you haven't already, enable full text indexing.
Unfortunately, using the LIKE clause on a query really does slow things down. Full Text Indexing is really the only way that I know of to speed things up (at the cost of storage space, of course).
Here's a link to an overview of Full-Text Search in SQL Server which will show you how to configure things and change your queries to take advantage of the full-text indexes.
More details would certainly help, but...
Full-text indexing can certainly be useful (depending on the more details about the table and your query). Full Text indexing requires a good bit of extra work both in setup and querying, but it's the only way to try to do the sort of search you seek efficiently.
The problem with LIKE that starts with a Wildcard is that SQL server has to do a complete table scan to find matching records - not only does it have to scan every row, but it has to read the contents of the char-based field you are querying.
With or without a full-text index, one thing can possibly help: Can you narrow the range of rows being searched, so at least SQL doesn't need to scan the whole table, but just some subset of it?
The '% blahblah %' is a problem for improving performance. Putting the wildcard at the beginning tells SQL Server that the string can begin with any legal character, so it must scan the entire index. Your best bet if you must have this filter is to focus on your other filters for improvement.
Using LIKE with a wildcard at the beginning of the search pattern forces the server to scan every row. It's unable to use any indexes. Indexes work from left to right, and since there is no constant on the left, no index is used.
From your WHERE clause, it looks like you're trying to find rows where a specific word exists in an entry. If you're searching for a whole word, then full text indexing may be a solution for you.
Full text indexing creates an index entry for each word that's contained in the specified column. You can then quickly find rows that contain a specific word.
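For example, once a full-text index on my_search_description is populated, the two leading-wildcard LIKE filters in your query could be expressed as a single CONTAINS predicate (a sketch only; the exact full-text setup steps depend on your SQL 2000 configuration):
-- Word search goes through the full-text index instead of scanning the column.
AND CONTAINS(my_search_description, '"red" AND "nose"')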
As other posters have correctly pointed out, the use of the wildcard character % within the LIKE expression results in a query plan that uses a SCAN operation. A scan operation touches every row in the table or index, dependent on the type of scan operation being performed.
So the question really then becomes, do you actually need to search for the given text string anywhere within the column in question?
If not, great, problem solved; but if it is essential to your business logic then you have two routes of optimization:
Really go to town on increasing the overall selectivity of your query by focusing your optimization efforts on the remaining search arguments.
Implement a Full Text Indexing Solution.
I don't think this is a valid answer, but I'd like to throw it out there for some more experienced posters' comments... are these equivalent?
where (xxxxx.yyyy like '% blahblah %')
vs
where patindex('%blahblah%', xxxx.yyyy) > 0
As far as I know, that's equivalent from a database logic standpoint, as it forces the same scan. Guess it couldn't hurt to try?

Building Query from Multi-Selection Criteria

I am wondering how others would handle a scenario like such:
Say I have multiple choices for a user to choose from.
Like, Color, Size, Make, Model, etc.
What is the best solution or practice for handling the build of your query for this scenario?
So, if they select 6 of the 8 possible colors, 4 of the 7 possible makes, and 8 of the 12 possible brands?
You could do dynamic OR statements or dynamic IN Statements, but I am trying to figure out if there is a better solution for handling this "WHERE" criteria type logic?
EDIT:
I am getting some really good feedback (thanks everyone)...one other thing to note is that some of the selections could even be like (40 of the selections out of the possible 46) so kind of large. Thanks again!
Thanks,
S
What I would suggest doing is creating a function that takes in a delimited list of makeIds, colorIds, etc. (these will probably be ints, or whatever your key is) and splits them into a table for you.
Your SP will take in a list of makes, colors, etc as you've said above.
YourSP '1,4,7,11', '1,6,7', '6'....
Inside your SP you'll call your splitting function, which will return a table-
SELECT * FROM
Cars C
JOIN YourFunction(@models) YF ON YF.Id = C.ModelId
JOIN YourFunction(@colors) YF2 ON YF2.Id = C.ColorId
Then, if they select nothing they get nothing. If they select everything, they'll get everything.
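A minimal sketch of the kind of splitting function this assumes (the name dbo.SplitIds is made up for illustration; written for SQL Server 2008+ and a comma-delimited list of int keys, so adjust to your key type):
-- Turns '1,4,7,11' into a one-column table of ints.
CREATE FUNCTION dbo.SplitIds (@list VARCHAR(MAX))
RETURNS @result TABLE (Id INT)
AS
BEGIN
    DECLARE @pos INT = CHARINDEX(',', @list);
    WHILE @pos > 0
    BEGIN
        INSERT INTO @result (Id) VALUES (CAST(LEFT(@list, @pos - 1) AS INT));
        SET @list = SUBSTRING(@list, @pos + 1, LEN(@list));
        SET @pos = CHARINDEX(',', @list);
    END;
    IF LEN(@list) > 0
        INSERT INTO @result (Id) VALUES (CAST(@list AS INT));
    RETURN;
END;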
What is the best solution or practice for handling the build of your query for this scenario?
Dynamic SQL.
A single parameter represents two states - NULL/non-existent, or having a value. Each additional parameter doubles the number of total possibilities: 2 parameters yield 4, 3 yield 8, and so on. A single, non-dynamic query can contain all the possibilities but will perform horribly between the use of:
ORs
overall non-sargability
and inability to reuse the query plan
...when compared to a dynamic SQL query that constructs the query out of only the absolutely necessary parts.
The query plan is cached in SQL Server 2005+, if you use the sp_executesql command - it is not if you only use EXEC.
I highly recommend reading The Curse and Blessing of Dynamic SQL.
For something this complex, you may want a session table that you update when the user selects their criteria. Then you can join the session table to your items table.
This solution may not scale well to thousands of users, so be careful.
If you want to create dynamic SQL it won't matter if you use the OR approach or the IN approach. SQL Server will process the statements the same way (maybe with little variation in some situations.)
You may also consider using temp tables for this scenario. You can insert the selections for each criteria into temp tables (e.g., #tmpColor, #tmpSize, #tmpMake, etc.). Then you can create a non-dynamic SELECT statement. Something like the following may work:
SELECT <column list>
FROM MyTable
WHERE MyTable.ColorID in (SELECT ColorID FROM #tmpColor)
OR MyTable.SizeID in (SELECT SizeID FROM #tmpSize)
OR MyTable.MakeID in (SELECT MakeID FROM #tmpMake)
The dynamic OR/IN and the temp table solutions work fine if each condition is independent of the other conditions. In other words, if you need to select rows where ((Color is Red and Size is Medium) or (Color is Green and Size is Large)) you'll need to try other solutions.

How bad is my query?

Ok I need to build a query based on some user input to filter the results.
The query basically goes something like this:
SELECT * FROM my_table ORDER BY ordering_fld;
There are four text boxes in which users can choose to filter the data, meaning I'd have to dynamically build a "WHERE" clause into it for the first filter used and then "AND" clauses for each subsequent filter entered.
Because I'm too lazy to do this, I've just made every filter an "AND" clause and put a "WHERE 1" clause in the query by default.
So now I have:
SELECT * FROM my_table WHERE 1 {AND filters} ORDER BY ordering_fld;
So my question is, have I done something that will adversely affect the performance of my query or buggered anything else up in any way I should be remotely worried about?
MySQL will optimize your 1 away.
I just ran this query on my test database:
EXPLAIN EXTENDED
SELECT *
FROM t_source
WHERE 1 AND id < 100
and it gave me the following description:
select `test`.`t_source`.`id` AS `id`,`test`.`t_source`.`value` AS `value`,`test`.`t_source`.`val` AS `val`,`test`.`t_source`.`nid` AS `nid` from `test`.`t_source` where (`test`.`t_source`.`id` < 100)
As you can see, no 1 at all.
The documentation on WHERE clause optimization in MySQL mentions this:
Constant folding:
(a<b AND b=c) AND a=5
-> b>5 AND b=c AND a=5
Constant condition removal (needed because of constant folding):
(B>=5 AND B=5) OR (B=6 AND 5=5) OR (B=7 AND 5=6)
-> B=5 OR B=6
Note the 5 = 5 and 5 = 6 parts in the example above.
You can EXPLAIN your query:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
and see if it does anything differently, which I doubt. I would use 1=1, just so it is more clear.
You might want to add LIMIT 1000 or something; when no parameters are used and the table gets large, will you really want to return everything?
WHERE 1 is a constant, deterministic expression which will be "optimized out" by any decent DB engine.
If there is a good way in your chosen language to avoid building SQL yourself, use that instead. I like Python and Django, and the Django ORM makes it very easy to filter results based on user input.
If you are committed to building the SQL yourself, be sure to sanitize user inputs against SQL injection, and try to encapsulate SQL building in a separate module from your filter logic.
Also, query performance should not be your concern until it becomes a problem, which it probably won't until you have thousands or millions of rows. And when it does come time to optimize, adding a few indexes on columns used for WHERE and JOIN goes a long way.
To improve performance, use column indexes on the fields listed in the WHERE clause.
Standard SQL Injection Disclaimers here...
One thing you could do, to avoid SQL injection since you know it's only four parameters is use a stored procedure where you pass values for the fields or NULL. I am not sure of mySQL stored proc syntax, but the query would boil down to
SELECT *
FROM my_table
WHERE Field1 = ISNULL(@Field1, Field1)
AND Field2 = ISNULL(@Field2, Field2)
...
ORDER BY ordering_fld
We were doing something similar not too long ago, and there are a few things that we observed:
Setting up the indexes on the columns we were (possibly) filtering, improved performance
The WHERE 1 part can be left out completely if the filters aren't used (not sure if it applies to your case). It doesn't make a difference, but it 'feels' right.
SQL injection shouldn't be forgotten
Also, if you 'only' have 4 filters, you could build up a stored procedure and pass in null values and check for them. (just like n8wrl suggested in the meantime)
That will work - some considerations:
About dynamically built SQL in general, some databases (Oracle at least) will cache execution plans for queries, so if you end up running the same query many times it won't have to completely start over from scratch. If you use dynamically built SQL, you are creating a different query each time so to the database it will look like 100 different queries instead of 100 runs of the same query.
You'd probably just need to measure the performance to find out if it works well enough for you.
Do you need all the columns? Explicitly specifying them is probably better than using * anyways because:
You can visually see what columns are being returned
If you add or remove columns to the table later, they won't change your interface
Not bad, I didn't know this snippet to get rid of the "is it the first filter?" question.
Though you should be ashamed of your code ( ^^ ), it doesn't do anything to performance, as any DB engine will optimize it away.
The only reason I've used WHERE 1 = 1 is for dynamic SQL; it's a hack to make appending WHERE clauses easier by using AND .... It is not something I would include in my SQL otherwise - it does nothing to affect the query overall because it always evaluates as being true and does not hit the table(s) involved so there aren't any index lookups or table scans based on it.
I can't speak to how MySQL handles optional criteria, but I know that using the following:
WHERE (@param IS NULL OR t.column = @param)
...is the typical way of handling optional parameters. COALESCE and ISNULL are not ideal because the query is still utilizing indexes (or worse, table scans) based on a sentinel value. The example I provided won't hit the table unless a value has been provided.
That said, my experience with Oracle (9i, 10g) has shown that it doesn't handle [ WHERE (@param IS NULL OR t.column = @param) ] very well. I saw a huge performance gain by converting the SQL to be dynamic, and used CONTEXT variables to determine what to add. My impression of SQL Server 2005 is that these are handled better.
I have usually done something like this:
for(int i=0; i<numConditions; i++) {
    sql += (i == 0 ? "WHERE " : "AND ");
    sql += dbFieldNames[i] + " = " + safeVariableValues[i] + " ";
}
Makes the generated query a little cleaner.
One alternative I sometimes use is to build the where clause as an array and then join the parts together:
my @wherefields;
foreach $c (@conditionfields) {
push @wherefields, "$c = ?";
}
my $sql = "select * from table";
if(@wherefields) { $sql.=" WHERE " . join (" AND ", @wherefields); }
The above is written in Perl, but most languages have some kind of join function.