SQL Server querying : improve performance by reducing WHERE clauses - sql

I have a SQL query from my teammate that I think has a lot of predicates, and I suspect this is the cause of the bad performance. It looks like this:
WHERE
(@IdSenales IS NULL OR senalesIds.id = comp.IdSenal)
AND
(@IdAnunciantes IS NULL OR anunciantesIds.id = comp.IdAnunciante)
AND
(@IdProgramas IS NULL OR programasIds.id = emision.IdProgramaVariante)
AND
(@IdTipoPublicidades IS NULL OR publicidadesIds.id = orden.IdTipoPublicidad)
AND
(@Canje = 0 OR (@Canje = 1 AND comp.IdTipoCondicionCobro != 12))
AND
(emision.Fecha BETWEEN @FechaDesdeContrato AND ISNULL(@FechaHastaContrato, emision.Fecha))
AND
(comp.FechaEmision BETWEEN @FechaDesde AND @FechaHasta)
AND
(@IdSectorImputacion = 0 OR @IdSectorImputacion = simp.IdSectorImputacion)
I'm from Argentina so it's commented in Spanish (sorry).
My question is: is there a way to improve performance by replacing the comparisons in the WHERE clause with a function I write that returns the same result?
Thank you very much,
David

This is a bit long for a comment.
The only way you can really significantly improve performance is to use indexes. That would require a bunch of indexes for all the different combinations -- but perhaps a few are more common and would suffice for most use-cases.
SQL Server is pretty bad about optimizing complex where clauses. What you could do is use dynamic SQL. Construct the where clause by only putting in the conditions that are necessary.
Then, be sure you have indexes for the common situations. And when the query is compiled, it should run faster.
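For illustration, a minimal sketch of that approach (table and parameter names are taken from the question; the int types and the simplified select list are assumptions). sp_executesql keeps the query parameterized and caches one plan per distinct combination of appended predicates:

DECLARE @sql nvarchar(max);
SET @sql = N'SELECT comp.* FROM comp WHERE 1 = 1';

-- Append only the predicates that apply to this call.
IF @IdSenales IS NOT NULL
    SET @sql = @sql + N' AND comp.IdSenal = @IdSenales';
IF @IdAnunciantes IS NOT NULL
    SET @sql = @sql + N' AND comp.IdAnunciante = @IdAnunciantes';
-- ...and so on for the remaining optional filters...

EXEC sp_executesql @sql,
    N'@IdSenales int, @IdAnunciantes int',
    @IdSenales = @IdSenales,
    @IdAnunciantes = @IdAnunciantes;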

As @GordonLinoff mentions, your best option is to look into the indexes used. He's also a much better coder than me, so take his advice over mine if you're able to. However, if dynamic SQL is not allowed at your company for some reason, or the rewrite is not an option, read on...
You may not have as big a problem as you think here; have you seen a performance problem, or are you just looking at the code & thinking "there's a lot of stuff going on with a lot of brackets, so that's bad"?
i.e. take this line: (@IdSenales IS NULL OR senalesIds.id = comp.IdSenal).
This compares a parameter with null, so it only needs to be evaluated once, rather than once per row, which isn't too bad. Thereafter it's no different from either not having this condition at all, or having only senalesIds.id = comp.IdSenal. The same is true for most of these lines.
That said, SQL Server will generate a query plan the first time it runs this code, and will thereafter reuse it for all subsequent executions, regardless of which parameters are used; so the plan may be entirely inappropriate for the new set of options. A good fix here is to add OPTION (RECOMPILE). You'll find a good explanation of this here: https://blogs.msdn.microsoft.com/robinlester/2016/08/10/improving-query-performance-with-option-recompile-constant-folding-and-avoiding-parameter-sniffing-issues/
Beyond that, this line may be a problem, since it applies a function whose output will be different for each row, so it won't be easy to optimise:
(emision.Fecha BETWEEN @FechaDesdeContrato AND ISNULL(@FechaHastaContrato, emision.Fecha))
Change this to:
(emision.Fecha >= @FechaDesdeContrato AND (@FechaHastaContrato IS NULL OR emision.Fecha <= @FechaHastaContrato))
...and you should be OK.
Full Code:
WHERE
(@IdSenales IS NULL OR senalesIds.id = comp.IdSenal)
AND
(@IdAnunciantes IS NULL OR anunciantesIds.id = comp.IdAnunciante)
AND
(@IdProgramas IS NULL OR programasIds.id = emision.IdProgramaVariante)
AND
(@IdTipoPublicidades IS NULL OR publicidadesIds.id = orden.IdTipoPublicidad)
AND
(@Canje = 0 OR (@Canje = 1 AND comp.IdTipoCondicionCobro != 12))
AND
(emision.Fecha >= @FechaDesdeContrato AND (@FechaHastaContrato IS NULL OR emision.Fecha <= @FechaHastaContrato))
AND
(comp.FechaEmision BETWEEN @FechaDesde AND @FechaHasta)
AND
(@IdSectorImputacion = 0 OR @IdSectorImputacion = simp.IdSectorImputacion)
OPTION (RECOMPILE)

Thanks for your suggestion @JohnLBevan !!
I have checked the predicates because I read an article from Gail Shaw that said:
"Another common cause of SQL Server choosing to scan is a query that contains multiple predicates, when no single index exists that has all the columns necessary to evaluate the WHERE clause. For example, an index on (FirstName, Surname), would fully support any query with a WHERE clause of FirstName = #FirstName AND Surname = #Surname. However, if there was one index on FirstName only, and a second separate index on Surname, then SQL Server can use neither one efficiently. It may choose to seek one index, do lookups for the other columns and then do a secondary filter; it may choose to seek both indexes and perform an index intersection, or it may give up and scan the table."
https://www.red-gate.com/simple-talk/sql/database-administration/gail-shaws-sql-server-howlers/
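For illustration, a hypothetical sketch of the single composite index Shaw describes (the Person table and its columns come from her example, not from my query), which lets one seek evaluate both predicates:

CREATE INDEX IX_Person_FirstName_Surname
    ON dbo.Person (FirstName, Surname);

-- Fully supported by a single index seek:
SELECT *
FROM dbo.Person
WHERE FirstName = @FirstName
  AND Surname = @Surname;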
When I read this I remembered that I had seen multiple predicates in my query. I want to mention that this query is one of the most expensive queries returned by the query I run to check the cost of all the queries against the database.
Well, I should check whether there are enough indexes and/or create new ones.
David Linares.

Related

Querying time higher with 'Where' than without it

I have something that I think is a strange issue. Normally, I'd think a query should take less time if I put a restriction on it (so that fewer rows are processed). But I don't know why, that is not the case here. Maybe I'm doing something wrong, but I don't get an error; the query just seems to run 'till infinity'.
This is the query
SELECT
A.ENTITYID AS ORG_ID,
A.ID_VALUE AS LEI,
A.MODIFIED_BY,
A.AUDITDATETIME AS LAST_DATE_MOD
FROM (
SELECT
CASE WHEN IFE.NEWVALUE IS NOT NULL
then EXTRACTVALUE(xmltype(IFE.NEWVALUE), '/DocumentElement/ORG_IDENTIFIERS/ID_TYPE')
ELSE NULL
end as ID_TYPE,
case when IFE.NEWVALUE is not null
then EXTRACTVALUE(xmltype(IFE.NEWVALUE), '/DocumentElement/ORG_IDENTIFIERS/ID_VALUE')
ELSE NULL
END AS ID_VALUE,
(select u.username from admin.users u where u.userid = ife.analystuserid) as Modified_by,
ife.*
FROM ife.audittrail ife
WHERE
--IFE.AUDITDATETIME >= '01-JUN-2016' AND
attributeid = 499
AND ROWNUM <= 10000
AND (CASE WHEN IFE.NEWVALUE IS NOT NULL then EXTRACTVALUE(xmltype(IFE.NEWVALUE), '/DocumentElement/ORG_IDENTIFIERS/ID_TYPE') ELSE NULL end) = '38') A
--WHERE A.AUDITDATETIME >= '01-JUN-2016';
So I tried with the two clauses commented (one at a time, of course).
And with both of them the same thing happens; the query runs for such a long time that I have to abort it.
Do you know why this could be happening? How could I do, maybe in a different way, to put the restriction?
The values of the AUDITDATETIME field look like '06-MAY-2017', for example - in that format.
Thank you very much in advance
I think you may misunderstand how databases work.
Firstly, read up on EXPLAIN - you can find out exactly what is taking time, and why, by learning to read the EXPLAIN statement.
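In Oracle (which the EXTRACTVALUE/ROWNUM syntax suggests you are using), a minimal sketch looks like this:

-- Record the plan Oracle would use for the slow statement...
EXPLAIN PLAN FOR
SELECT *
FROM ife.audittrail
WHERE attributeid = 499;

-- ...then display it; a TABLE ACCESS FULL line means no index was used.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);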
Secondly - the performance characteristics of any given query are determined by a whole range of things, but usually the biggest effort goes not into processing rows, but into finding them.
Without an index, the database has to look at every row in the database and compare it to your where clause. It's the equivalent of searching in the phone book for a phone number, rather than a name (the phone book is indexed on "last name").
You can improve this by creating indexes - for instance, on columns "AUDITDATETIME" and "attributeid".
Unlike the phone book, a database server can support multiple indexes - and if those indexes match your where clause, your query will be (much) faster.
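A hypothetical sketch for this query (the index name is made up; verify the column list against your real workload):

CREATE INDEX ix_audittrail_attr_date
    ON ife.audittrail (attributeid, auditdatetime);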
Finally, using an XML string extraction for a comparison in the where clause is likely to be extremely slow unless you've got an index on that XML data.
This is the equivalent of searching the phone book and translating the street address from one language to another - not only do you have to inspect every address, you have to execute an expensive translation step for each item.
You probably need index(es)... We can all make guesses about what indexes you already have and need to add, but most DBMSs have built-in query optimizers.
If you are using MS SQL Server you can execute the query with the actual execution plan enabled; that will tell you what index you need to add to optimize this particular query. It will even let you copy/paste the command to create it.
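The missing-index hint it shows generates DDL roughly like the following (all names here are placeholders, not taken from the question):

CREATE NONCLUSTERED INDEX IX_MyTable_Suggested
    ON dbo.MyTable (filtered_column)
    INCLUDE (returned_column_1, returned_column_2);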

What can I do to speed this slow query up?

We have a massive, multi-table Sybase query we call the get_safari_exploration_data query, that fetches all sorts of info related to explorers going on a safari, and all the animals they encounter.
This query is slow, and I've been asked to speed it up. The first thing that jumps out at me is that there doesn't seem to be a pressing need for the nested SELECT statement inside the outer FROM clause. In that nested SELECT, there also seems to be several fields that aren't necessary (vegetable, broomhilda, devoured, etc.). I'm also skeptical about the use of the joins ("*=" instead of "INNER JOIN...ON").
SELECT
dog_id,
cat_metadata,
rhino_id,
buffalo_id,
animal_metadata,
has_4_Legs,
is_mammal,
is_carnivore,
num_teeth,
does_hibernate,
last_spotted_by,
last_spotted_date,
purchased_by,
purchased_date,
allegator_id,
cow_id,
cow_name,
cow_alias,
can_be_ridden
FROM
(
SELECT
mp.dog_id as dog_id,
ts.cat_metadata + '-yoyo' as cat_metadata,
mp.rhino_id as rhino_id,
mp.buffalo_id as buffalo_id,
mp.animal_metadata as animal_metadata,
isnull(mp.has_4_Legs, 0) as has_4_Legs,
isnull(mp.is_mammal, 0) as is_mammal,
isnull(mp.is_carnivore, 0) as is_carnivore,
isnull(mp.num_teeth, 0) as num_teeth,
isnull(mp.does_hibernate, 0) as does_hibernate,
jungle_info.explorer as last_spotted_by,
exploring_journal.spotted_datetime as last_spotted_date,
early_jungle_info.explorer as purchased_by,
early_exploreration_journal.spotted_datetime as purchased_date,
alleg_id as allegator_id,
ho.cow_id,
ho.cow_name,
ho.cow_alias,
isnull(mp.is_ridable,0) as can_be_ridden,
ts.cat_metadata as broomhilda,
ts.squirrel as vegetable,
convert (varchar(15), mp.rhino_id) as tms_id,
0 as devoured
FROM
mammal_pickles mp,
very_tricky_animals vt,
possibly_venomous pv,
possibly_carniv_and_tall pct,
tall_and_skinny ts,
tall_and_skinny_type ptt,
exploration_history last_exploration_history,
master_exploration_journal exploring_journal,
adventurer jungle_info,
exploration_history first_exploration_history,
master_exploration_journal early_exploreration_journal,
adventurer early_jungle_info,
hunting_orders ho
WHERE
mp.exploring_strategy_id = 47
and mp.cow_id = ho.cow_id
and ho.cow_id IN (20, 30, 50)
and mp.rhino_id = vt.rhino_id
and vt.version_id = pv.version_id
and pv.possibly_carniv_and_tall_id = pct.possibly_carniv_and_tall_id
and vt.tall_and_skinny_id = ts.tall_and_skinny_id
and ts.tall_and_skinny_type_id = ptt.tall_and_skinny_type_id
and mp.alleg_id *= last_exploration_history.exploration_history_id
and last_exploration_history.master_exploration_journal_id *= exploring_journal.master_exploration_journal_id
and exploring_journal.person_id *= jungle_info.person_id
and mp.first_exploration_history_id *= first_exploration_history.exploration_history_id
and first_exploration_history.master_exploration_journal_id *= early_exploreration_journal.master_exploration_journal_id
and early_exploreration_journal.person_id *= early_jungle_info.person_id
) TEMP_TBL
So I ask:
Am I correct about the nested SELECT?
Am I correct about the unnecessary fields inside the nested SELECT?
Am I correct about the structure/syntax/usage of the joins?
Is there anything else about the structure/nature of this query that jumps out at you as being terribly inefficient/slow?
Unfortunately, unless there is irrefutable, matter-of-fact proof that decomposing this large query into smaller queries is beneficial in the long run, management will simply not approve refactoring it out into multiple, smaller queries, as this will take considerable time to refactor and test. Thanks in advance for any help/insight here!
Am I correct about the nested SELECT?
You would be in some cases, but a competent planner would collapse it and ignore it here.
Am I correct about the unnecessary fields inside the nested SELECT?
Yes, especially considering that some of them don't show up at all in the final list of fields.
Am I correct about the structure/syntax/usage of the joins?
Insofar as I'm aware, *= and =* are merely syntactic sugar for a left and right join, but I might be wrong in stating that. If not, then they merely force the way joins occur, but they may be necessary for your query to work at all.
Is there anything else about the structure/nature of this query that jumps out at you as being terribly inefficient/slow?
Yes.
Firstly, you've some calculations that aren't needed, e.g. convert (varchar(15), mp.rhino_id) as tms_id. Perhaps a join or two as well, but I admittedly haven't looked at the gory details of the query.
Next, you might have a problem with the db design itself, e.g. a cow_id field. (Seriously? :-) )
Last, there occasionally is something to be said about doing multiple queries instead of a single one, to avoid doing tons of joins.
In a blog, for instance, it's usually a good idea to grab the top 10 posts, and then to use a separate query to fetch their tags (where id in (id1, id2, etc.)). In your case, the selective part seems to be around here:
mp.exploring_strategy_id = 47
and mp.cow_id = ho.cow_id
and ho.cow_id IN (20, 30, 50)
so maybe isolate that part in one query, and then build an in () clause using the resulting IDs, and fetch the cosmetic bits and pieces in one or more separate queries.
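A sketch of that split, reusing the question's tables and assuming your Sybase version accepts ANSI join syntax; the IN list in the second query would be built by the application from the first query's results:

-- Query 1: just the selective part.
SELECT mp.rhino_id
FROM mammal_pickles mp
INNER JOIN hunting_orders ho ON mp.cow_id = ho.cow_id
WHERE mp.exploring_strategy_id = 47
  AND ho.cow_id IN (20, 30, 50)

-- Query 2: the cosmetic bits for only those rows
-- (101, 102, 103 stand in for the IDs returned by query 1).
SELECT vt.rhino_id, ts.cat_metadata
FROM very_tricky_animals vt
INNER JOIN tall_and_skinny ts
    ON vt.tall_and_skinny_id = ts.tall_and_skinny_id
WHERE vt.rhino_id IN (101, 102, 103)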
Oh, and as pointed out by Gordon, check your indexes as well. But then, note that the indexes may end up of little use without splitting the query into more manageable parts.
I would suggest the following approach.
First, rewrite the query using ANSI standard joins with the on clause. This will make the conditions and filtering much easier to understand. Also, this is "safe" -- you should get exactly the same results as the current version. Be careful, because the *= is an outer join, so not everything is an inner join.
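A sketch of the first few tables converted (names from the question; the remaining joins follow the same pattern, with every *= becoming a LEFT OUTER JOIN):

SELECT mp.dog_id, mp.rhino_id, ho.cow_name
FROM mammal_pickles mp
INNER JOIN hunting_orders ho
    ON mp.cow_id = ho.cow_id
INNER JOIN very_tricky_animals vt
    ON mp.rhino_id = vt.rhino_id
LEFT OUTER JOIN exploration_history last_exploration_history
    ON mp.alleg_id = last_exploration_history.exploration_history_id
LEFT OUTER JOIN master_exploration_journal exploring_journal
    ON last_exploration_history.master_exploration_journal_id
       = exploring_journal.master_exploration_journal_id
WHERE mp.exploring_strategy_id = 47
  AND ho.cow_id IN (20, 30, 50)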
I doubt this step will improve performance.
Then, check each of the reference tables and be sure that the join keys have indexes on them in the reference table. If keys are missing, then add them in.
Then, check whether the left outer joins are necessary. If there are filters on tables that are left outer joined in, those filters convert the outer joins to inner joins. Probably not a performance hit, but you never know.
Then, consider indexing the fields used for filtering (in the where clause).
And, learn how to use the explain capabilities. Any nested loop joins (without an index) are likely culprits for performance problems.
As for the nested select, I think Sybase is smart enough to "do the right thing". Even if it wrote out and re-read the result set, that probably would have a marginal effect on the query compared to getting the joins right.
If this is your real data structure, by the way, it sounds like a very interesting domain. It is not often that I see a field called allegator_id in a table.
I will answer some of your questions.
You think that the fields (vegetable, broomhilda, devoured) in the nested SELECT could be causing a performance issue. Not necessarily. The two unused fields (vegetable, broomhilda) in the nested SELECT are from the ts table, but the cat_metadata field, which is being used, is also from the ts table. So unless cat_metadata is covered by the index used on the ts table, there won't be any performance impact, because to extract the cat_metadata field the data page from the table will need to be fetched anyway. Extracting the other two fields will take a little CPU, that's it. So don't worry about that. The devoured field is a constant, so it will not affect performance either.
Dennis pointed out the usage of the convert function, convert(varchar(15), mp.rhino_id). I disagree that it will affect performance, as it only consumes CPU.
Lastly, I would say: try raising the table count to 13, as there are 13 tables in the query. By default Sybase considers only four tables at a time during join optimisation.
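The exact form of that setting varies by ASE version; in many versions it is simply (an assumption - check your server's documentation):

set tablecount 13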

Oracle SQL '+ 0' syntax for join?

I was using Toad's SQL optimizer and it came up with the following addition to my join statements..
instead of say
emp.emplid = dept.emplid
it suggested
emp.emplid = dept.emplid + 0
What does the '+ 0' do? I've searched for the past hour online and I cannot find anything. I know the (+) meaning, but I've never seen anything like this.
The + 0 does what it looks like: it adds 0 to dept.emplid. But from a performance point of view, this does make a difference. By turning the column into an expression, Oracle is not able to use any index on dept.emplid.
So if Oracle is choosing an index on dept.emplid but you would rather it used a different index/plan, then adding + 0 is a way to influence the optimiser, as there is no longer a match on that particular column. Any expression would have done the trick.
The other way to go about this would be to get into optimiser hints, although this can be a bit of a pain for big queries.
What the '+ 0' does is tell the optimizer to use another index. In other words, I'm pretty sure that one of these two fields (emp.emplid = dept.emplid) has, in addition to the foreign-key index, another index on it. The + 0 cancels the index the optimizer takes by default (the foreign key) and tells it to choose another one.

where like over varchar(500)

I have a query which slows down immensely when I add an additional WHERE part,
which is essentially just a LIKE lookup on a varchar(500) field:
where...
and (xxxxx.yyyy like '% blahblah %')
I've been racking my brain, but the query slows down terribly when I add this in.
I'm wondering if anyone has suggestions in terms of changing field type, index setup, or index hints or something that might assist.
any help appreciated.
SQL Server 2000 Enterprise.
HERE IS SOME ADDITIONAL INFO:
Oops - as some background, unfortunately I do need (in the case of the LIKE statement) to have the % at the front.
There is business logic behind that which I can't avoid.
I have since created a full text catalogue on the field which is causing me problems
and converted the search to use the contains syntax.
Unfortunately, although this has increased performance on occasion, it appears to be slow (slower) for new word searches.
So if I search for apple, apple appears to be faster on subsequent searches, but not for a new search for orange (for example).
So I don't think I can go with that (unless you can suggest some tinkering to make it more consistent).
Additional info:
the table contains only around 60k records
the field I'm trying to filter is a varchar(500)
SQL Server 2000 on Windows Server 2003
The query I'm using is definitely convoluted.
Sorry, I've had to replace proprietary stuff, but this should give you an indication of the query:
SELECT TOP 99 AAAAAAAA.Item_ID, AAAAAAAA.CatID, AAAAAAAA.PID, AAAAAAAA.Description,
AAAAAAAA.Retail, AAAAAAAA.Pack, AAAAAAAA.CatID, AAAAAAAA.Code, BBBBBBBB.blahblah_PictureFile AS PictureFile,
AAAAAAAA.CL1, AAAAAAAA.CL1, AAAAAAAA.CL2, AAAAAAAA.CL3
FROM CCCCCCC INNER JOIN DDDDDDDD ON CCCCCCC.CID = DDDDDDDD.CID
INNER JOIN AAAAAAAA ON DDDDDDDD.CID = AAAAAAAA.CatID LEFT OUTER JOIN BBBBBBBB
ON AAAAAAAA.PID = BBBBBBBB.Product_ID INNER JOIN EEEEEEE ON AAAAAAAA.BID = EEEEEEE.ID
WHERE
(CCCCCCC.TID = 654321) AND (DDDDDDDD.In_Use = 1) AND (AAAAAAAA.Unused = 0)
AND (DDDDDDDD.Expiry > '10-11-2010 09:23:38') AND
(
(AAAAAAAA.Code = 'red pen') OR
(
(my_search_description LIKE '% red %') AND (my_search_description LIKE '% nose %')
AND (DDDDDDDD.CID IN (63,153,165,305,32,33))
)
)
AND (DDDDDDDD.CID IN (20,32,33,63,64,65,153,165,232,277,294,297,300,304,305,313,348,443,445,446,447,454,472,479,481,486,489,498))
ORDER BY AAAAAAAA.f_search_priority DESC, DDDDDDDD.Priority DESC, AAAAAAAA.Description ASC
You can see that throwing in the my_search_description filter also includes a DDDDDDDD.CID filter (business logic).
This is the part which is slowing things down (from a 1.5-2 second load of my pages to a 6-8 second load (ow ow ow)).
It might be my lack of understanding of how to get the full-text search catalogue working.
Am very impressed by the answers, so if anyone has any tips I'd be most grateful.
If you haven't already, enable full text indexing.
Unfortunately, using the LIKE clause on a query really does slow things down. Full Text Indexing is really the only way that I know of to speed things up (at the cost of storage space, of course).
Here's a link to an overview of Full-Text Search in SQL Server which will show you how to configure things and change your queries to take advantage of the full-text indexes.
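For example, with my_search_description (the column from the question) in a full-text catalog, the two LIKE filters could collapse into one word-level CONTAINS predicate; a sketch:

-- Instead of:
--   (my_search_description LIKE '% red %') AND (my_search_description LIKE '% nose %')
-- query the full-text index:
WHERE CONTAINS(my_search_description, '"red" AND "nose"')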
More details would certainly help, but...
Full-text indexing can certainly be useful (depending on the more details about the table and your query). Full Text indexing requires a good bit of extra work both in setup and querying, but it's the only way to try to do the sort of search you seek efficiently.
The problem with a LIKE pattern that starts with a wildcard is that SQL Server has to do a complete table scan to find matching records - not only does it have to scan every row, but it has to read the contents of the char-based field you are querying.
With or without a full-text index, one thing can possibly help: Can you narrow the range of rows being searched, so at least SQL doesn't need to scan the whole table, but just some subset of it?
The '% blahblah %' is a problem for improving performance. Putting the wildcard at the beginning tells SQL Server that the string can begin with any legal character, so it must scan the entire index. Your best bet if you must have this filter is to focus on your other filters for improvement.
Using LIKE with a wildcard at the beginning of the search pattern forces the server to scan every row. It's unable to use any indexes. Indexes work from left to right, and since there is no constant on the left, no index is used.
From your WHERE clause, it looks like you're trying to find rows where a specific word exists in an entry. If you're searching for a whole word, then full text indexing may be a solution for you.
Full text indexing creates an index entry for each word that's contained in the specified column. You can then quickly find rows that contain a specific word.
As other posters have correctly pointed out, the use of the wildcard character % at the start of the LIKE expression results in a query plan that uses a SCAN operation. A scan operation touches every row in the table or index, depending on the type of scan operation being performed.
So the question really then becomes, do you actually need to search for the given text string anywhere within the column in question?
If not, great, problem solved but if it is essential to your business logic then you have two routes of optimization.
Really go to town on increasing the overall selectivity of your query by focusing your optimization efforts on the remaining search arguments.
Implement a Full Text Indexing Solution.
I don't think this is a valid answer, but I'd like to throw it out there for some more experienced posters' comments... are these equivalent?
where (xxxxx.yyyy like '% blahblah %')
vs
where patindex('% blahblah %', xxxxx.yyyy) > 0
As far as I know, they're equivalent from a database logic standpoint, as both force the same scan. Guess it couldn't hurt to try?

How bad is my query?

Ok I need to build a query based on some user input to filter the results.
The query basically goes something like this:
SELECT * FROM my_table ORDER BY ordering_fld;
There are four text boxes in which users can choose to filter the data, meaning I'd have to dynamically build a "WHERE" clause into it for the first filter used and then "AND" clauses for each subsequent filter entered.
Because I'm too lazy to do this, I've just made every filter an "AND" clause and put a "WHERE 1" clause in the query by default.
So now I have:
SELECT * FROM my_table WHERE 1 {AND filters} ORDER BY ordering_fld;
So my question is, have I done something that will adversely affect the performance of my query or buggered anything else up in any way I should be remotely worried about?
MySQL will optimize your 1 away.
I just ran this query on my test database:
EXPLAIN EXTENDED
SELECT *
FROM t_source
WHERE 1 AND id < 100
and it gave me the following description:
select `test`.`t_source`.`id` AS `id`,`test`.`t_source`.`value` AS `value`,`test`.`t_source`.`val` AS `val`,`test`.`t_source`.`nid` AS `nid` from `test`.`t_source` where (`test`.`t_source`.`id` < 100)
As you can see, no 1 at all.
The documentation on WHERE clause optimization in MySQL mentions this:
Constant folding:
(a<b AND b=c) AND a=5
-> b>5 AND b=c AND a=5
Constant condition removal (needed because of constant folding):
(B>=5 AND B=5) OR (B=6 AND 5=5) OR (B=7 AND 5=6)
-> B=5 OR B=6
Note the 5 = 5 and 5 = 6 parts in the example above.
You can EXPLAIN your query:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
and see if it does anything differently, which I doubt. I would use 1=1, just so it is clearer.
You might want to add LIMIT 1000 or something, when no parameters are used and the table gets large, will you really want to return everything?
WHERE 1 is a constant, deterministic expression which will be "optimized out" by any decent DB engine.
If there is a good way in your chosen language to avoid building SQL yourself, use that instead. I like Python and Django, and the Django ORM makes it very easy to filter results based on user input.
If you are committed to building the SQL yourself, be sure to sanitize user inputs against SQL injection, and try to encapsulate SQL building in a separate module from your filter logic.
Also, query performance should not be your concern until it becomes a problem, which it probably won't until you have thousands or millions of rows. And when it does come time to optimize, adding a few indexes on columns used for WHERE and JOIN goes a long way.
To improve performance, use column indexes on the fields listed in the WHERE clause.
Standard SQL Injection Disclaimers here...
One thing you could do to avoid SQL injection, since you know it's only four parameters, is to use a stored procedure where you pass values for the fields or NULL. I am not sure of the MySQL stored proc syntax, but the query would boil down to
SELECT *
FROM my_table
WHERE Field1 = COALESCE(@Field1, Field1)
AND Field2 = COALESCE(@Field2, Field2)
...
ORDER BY ordering_fld
We've been doing something similar not too long ago, and there are a few things that we observed:
Setting up indexes on the columns we were (possibly) filtering on improved performance.
The WHERE 1 part can be left out completely if the filters are not used (not sure if it applies to your case). It doesn't make a difference, but it 'feels' right.
SQL injection shouldn't be forgotten
Also, if you 'only' have 4 filters, you could build up a stored procedure and pass in null values and check for them. (just like n8wrl suggested in the meantime)
That will work - some considerations:
About dynamically built SQL in general, some databases (Oracle at least) will cache execution plans for queries, so if you end up running the same query many times it won't have to completely start over from scratch. If you use dynamically built SQL, you are creating a different query each time so to the database it will look like 100 different queries instead of 100 runs of the same query.
You'd probably just need to measure the performance to find out if it works well enough for you.
Do you need all the columns? Explicitly specifying them is probably better than using * anyways because:
You can visually see what columns are being returned
If you add or remove columns to the table later, they won't change your interface
Not bad; I didn't know this snippet to get rid of the 'is it the first filter?' question.
Though you should be ashamed of your code ( ^^ ), it doesn't do anything to performance, as any DB engine will optimize it.
The only reason I've used WHERE 1 = 1 is for dynamic SQL; it's a hack to make appending WHERE clauses easier by using AND .... It is not something I would include in my SQL otherwise - it does nothing to affect the query overall because it always evaluates as being true and does not hit the table(s) involved so there aren't any index lookups or table scans based on it.
I can't speak to how MySQL handles optional criteria, but I know that using the following:
WHERE (@param IS NULL OR t.column = @param)
...is the typical way of handling optional parameters. COALESCE and ISNULL are not ideal because the query is still utilizing indexes (or worse, table scans) based on a sentinel value. The example I provided won't hit the table unless a value has been provided.
That said, my experience with Oracle (9i, 10g) has shown that it doesn't handle [ WHERE (@param IS NULL OR t.column = @param) ] very well. I saw a huge performance gain by converting the SQL to be dynamic, and used CONTEXT variables to determine what to add. My impression of SQL Server 2005 is that these are handled better.
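On SQL Server, one way to keep that pattern and still get a sensible plan is to pair it with OPTION (RECOMPILE), as mentioned earlier on this page; a sketch with a hypothetical table t and column some_col:

SELECT *
FROM t
WHERE (@param IS NULL OR t.some_col = @param)
OPTION (RECOMPILE);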
I have usually done something like this:
for (int i = 0; i < numConditions; i++) {
    // Prefix the first condition with WHERE, the rest with AND
    // (leading spaces keep the generated SQL tokens separated).
    sql += (i == 0 ? " WHERE " : " AND ");
    sql += dbFieldNames[i] + " = " + safeVariableValues[i];
}
Makes the generated query a little cleaner.
One alternative I sometimes use is to build the where clause as an array and then join the pieces together:
my @wherefields;
foreach my $c (@conditionfields) {
    push @wherefields, "$c = ?";
}
my $sql = "select * from table";
if (@wherefields) { $sql .= " WHERE " . join(" AND ", @wherefields); }
The above is written in Perl, but most languages have some kind of join function.