I am wondering how others would handle a scenario like such:
Say I have multiple choices for a user to choose from.
Like, Color, Size, Make, Model, etc.
What is the best solution or practice for handling the build of your query for this scneario?
so if they select 6 of the 8 possible colors, 4 of the possible 7 makes, and 8 of the 12 possible brands?
You could do dynamic OR statements or dynamic IN Statements, but I am trying to figure out if there is a better solution for handling this "WHERE" criteria type logic?
EDIT:
I am getting some really good feedback (thanks everyone)...one other thing to note is that some of the selections could even be like (40 of the selections out of the possible 46) so kind of large. Thanks again!
Thanks,
S
What I would suggest doing is creating a function that takes in a delimited list of makeIds, colorIds, etc. This is probably going to be an int (or whatever your key is). And splits them into a table for you.
Your SP will take in a list of makes, colors, etc as you've said above.
YourSP '1,4,7,11', '1,6,7', '6'....
Inside your SP you'll call your splitting function, which will return a table-
SELECT * FROM
Cars C
JOIN YourFunction(#models) YF ON YF.Id = C.ModelId
JOIN YourFunction(#colors) YF2 ON YF2.Id = C.ColorId
Then, if they select nothing they get nothing. If they select everything, they'll get everything.
What is the best solution or practice for handling the build of your query for this scenario?
Dynamic SQL.
A single parameter represents two states - NULL/non-existent, or having a value. Two more means squaring the number of parameters to get the number of total possibilities: 2 yields 4, 3 yields 9, etc. A single, non-dynamic query can contain all the possibilities but will perform horribly between the use of:
ORs
overall non-sargability
and inability to reuse the query plan
...when compared to a dynamic SQL query that constructs the query out of only the absolutely necessary parts.
The query plan is cached in SQL Server 2005+, if you use the sp_executesql command - it is not if you only use EXEC.
I highly recommend reading The Curse and Blessing of Dynamic SQL.
For something this complex, you may want a session table that you update when the user selects their criteria. Then you can join the session table to your items table.
This solution may not scale well to thousands of users, so be careful.
If you want to create dynamic SQL it won't matter if you use the OR approach or the IN approach. SQL Server will process the statements the same way (maybe with little variation in some situations.)
You may also consider using temp tables for this scenario. You can insert the selections for each criteria into temp tables (e.g., #tmpColor, #tmpSize, #tmpMake, etc.). Then you can create a non-dynamic SELECT statement. Something like the following may work:
SELECT <column list>
FROM MyTable
WHERE MyTable.ColorID in (SELECT ColorID FROM #tmpColor)
OR MyTable.SizeID in (SELECT SizeID FROM #tmpSize)
OR MyTable.MakeID in (SELECT MakeID FROM #tmpMake)
The dynamic OR/IN and the temp table solutions work fine if each condition is independent of the other conditions. In other words, if you need to select rows where ((Color is Red and Size is Medium) or (Color is Green and Size is Large)) you'll need to try other solutions.
Related
Thank you for checking my question out!
I'm trying to write a query for a very specific problem we're having at my workplace and I can't seem to get my head around it.
Short version: I need to be able to target columns by their name, and more specifically by a part of their name that will be consistent throughout all the columns I need to combine or compare.
More details:
We have (for example), 5 different surveys. They have many questions each, but SOME of the questions are part of the same metric, and we need to create a generic field that keeps it. There's more background to the "why" of that, but it's pretty important for us at this point.
We were able to kind of solve this with either COALESCE() or CASE statements but the challenge is that, as more surveys/survey versions continue to grow, our vendor inevitably generates new columns for each survey and its questions.
Take this example, which is what we do currently and works well enough:
CASE
WHEN SURVEY_NAME = 'Service1' THEN SERV1_REC
WHEN SURVEY_NAME = 'Notice1' THEN FNOL1_REC
WHEN SURVEY_NAME = 'Status1' THEN STAT1_REC
WHEN SURVEY_NAME = 'Sales1' THEN SALE1_REC
WHEN SURVEY_NAME = 'Transfer1' THEN Null
ELSE Null
END REC
And also this alternative which works well:
COALESCE(SERV1_REC, FNOL1_REC, STAT1_REC, SALE1_REC) as REC
But as I mentioned, eventually we will have a "SALE2_REC" for example, and we'll need them BOTH on this same statement. I want to create something where having to come into the SQL and make changes isn't needed. Given that the columns will ALWAYS be named "something#_REC" for this specific metric, is there any way to achieve something like:
COALESCE(all columns named LIKE '%_REC') as REC
Bonus! Related, might be another way around this same problem:
Would there also be a way to achieve this?
SELECT (columns named LIKE '%_REC') FROM ...
Thank you very much in advance for all your time and attention.
-Kendall
Table and column information in Db2 are managed in the system catalog. The relevant views are SYSCAT.TABLES and SYSCAT.COLUMNS. You could write:
select colname, tabname from syscat.tables
where colname like some_expression
and syscat.tabname='MYTABLE
Note that the LIKE predicate supports expressions based on a variable or the result of a scalar function. So you could match it against some dynamic input.
Have you considered storing the more complicated properties in JSON or XML values? Db2 supports both and you can query those values with regular SQL statements.
I'm trying to optimize a big query (with no sub-queries but huge table size) in Oracle PL/ SQL (10g). To start, I removed few tables from the join and cached the data they were returning in collections (nested tables to be specific), which lead to two choices
Example:
(say the variable name is cache) and the condition in the original query was
where a.col_in_a = b.col_in_b;
so I changed it to,
select b.col_in_b bulk collect into cache from b where -- some conditions;
option 1
where a.col_in_a member of cache;
option 2
where a.col_in_b in (select column_value from table(cache));
Question1: which of the above two is better.
Going further, I also had a choice of not using collections and simply use a with clause with materialize hint
with cache as (SELECT /*+ materialize */
col_in_b from b where -- some condtions
)
select col_in_a from a,b where a.col_in_b = cache.col_in_b;
Question 2 use collections or with clause to cache
Once I take this decision, I would like to make this new query a little flexible, i.e. make it dynamic with few tables and conditions changing little bit here and there. This leads me to two more options
1. EXECUTE IMMEDIATE 'big select query'
2. OPEN-WITH-FOR 'big select query'
Since, I'm storing the result of this big query in a ref cursor, I believe OPEN-WITH-FOR is the way to go. Which leads me to writing the dynamic query in two ways
Question 3: How to make something like below work in the ways metioned in question 1 and which is the best?
open cursor for 'select col_in_a from a,b where a.col_in_b in :cache' using dummyCache
where dummyCache can be a comma separated list of strings OR can I pass a nested table using a bind variable and use a member of clause.. I'm not sure how to make this work..
I hope I've tried to explain clearly. Request all the folks to suggest me some better way as well..
I suppose I have always naively assumed that scalar functions in the select part of a SQL query will only get applied to the rows that meet all the criteria of the where clause.
Today I was debugging some code from a vendor and had that assumption challenged. The only reason I can think of for this code failing is that the Substring() function is getting called on data that should have been filtered out by the WHERE clause. But it appears that the substring call is being applied before the filtering happens, the query is failing.
Here is an example of what I mean. Let's say we have two tables, each with 2 columns and having 2 rows and 1 row respectively. The first column in each is just an id. NAME is just a string, and NAME_LENGTH tells us how many characters in the name with the same ID. Note that only names with more than one character have a corresponding row in the LONG_NAMES table.
NAMES: ID, NAME
1, "Peter"
2, "X"
LONG_NAMES: ID, NAME_LENGTH
1, 5
If I want a query to print each name with the last 3 letters cut off, I might first try something like this (assuming SQL Server syntax for now):
SELECT substring(NAME,1,len(NAME)-3)
FROM NAMES;
I would soon find out that this would give me an error, because when it reaches "X" it will try using a negative number for in the substring call, and it will fail.
The way my vendor decided to solve this was by filtering out rows where the strings were too short for the len - 3 query to work. He did it by joining to another table:
SELECT substring(NAMES.NAME,1,len(NAMES.NAME)-3)
FROM NAMES
INNER JOIN LONG_NAMES
ON NAMES.ID = LONG_NAMES.ID;
At first glance, this query looks like it might work. The join condition will eliminate any rows that have NAME fields short enough for the substring call to fail.
However, from what I can observe, SQL Server will sometimes try to calculate the the substring expression for everything in the table, and then apply the join to filter out rows. Is this supposed to happen this way? Is there a documented order of operations where I can find out when certain things will happen? Is it specific to a particular Database engine or part of the SQL standard? If I decided to include some predicate on my NAMES table to filter out short names, (like len(NAME) > 3), could SQL Server also choose to apply that after trying to apply the substring? If so then it seems the only safe way to do a substring would be to wrap it in a "case when" construct in the select?
Martin gave this link that pretty much explains what is going on - the query optimizer has free rein to reorder things however it likes. I am including this as an answer so I can accept something. Martin, if you create an answer with your link in it i will gladly accept that instead of this one.
I do want to leave my question here because I think it is a tricky one to search for, and my particular phrasing of the issue may be easier for someone else to find in the future.
TSQL divide by zero encountered despite no columns containing 0
EDIT: As more responses have come in, I am again confused. It does not seem clear yet when exactly the optimizer is allowed to evaluate things in the select clause. I guess I'll have to go find the SQL standard myself and see if i can make sense of it.
Joe Celko, who helped write early SQL standards, has posted something similar to this several times in various USENET newsfroups. (I'm skipping over the clauses that don't apply to your SELECT statement.) He usually said something like "This is how statements are supposed to act like they work". In other words, SQL implementations should behave exactly as if they did these steps, without actually being required to do each of these steps.
Build a working table from all of
the table constructors in the FROM
clause.
Remove from the working table those
rows that do not satisfy the WHERE
clause.
Construct the expressions in the
SELECT clause against the working table.
So, following this, no SQL dbms should act like it evaluates functions in the SELECT clause before it acts like it applies the WHERE clause.
In a recent posting, Joe expands the steps to include CTEs.
CJ Date and Hugh Darwen say essentially the same thing in chapter 11 ("Table Expressions") of their book A Guide to the SQL Standard. They also note that this chapter corresponds to the "Query Specification" section (sections?) in the SQL standards.
You are thinking about something called query execution plan. It's based on query optimization rules, indexes, temporaty buffers and execution time statistics. If you are using SQL Managment Studio you have toolbox over your query editor where you can look at estimated execution plan, it shows how your query will change to gain some speed. So if just used your Name table and it is in buffer, engine might first try to subquery your data, and then join it with other table.
This is hopefully just a simple question involving performance optimizations when it comes to queries in Sql 2008.
I've worked for companies that use Stored Procs a lot for their ETL processes as well as some of their websites. I've seen the scenario where they need to retrieve specific records based on a finite set of key values. I've seen it handled in 3 different ways, illustrated via pseudo-code below.
Dynamic Sql that concatinates a string and executes it.
EXEC('SELECT * FROM TableX WHERE xId IN (' + #Parameter + ')'
Using a user defined function to split a delimited string into a table
SELECT * FROM TableY INNER JOIN SPLIT(#Parameter) ON yID = splitId
USING XML as the Parameter instead of a delimited varchar value
SELECT * FROM TableZ JOIN #Parameter.Nodes(xpath) AS x (y) ON ...
While I know creating the dynamic sql in the first snippet is a bad idea for a large number of reasons, my curiosity comes from the last 2 examples. Is it more proficient to do the due diligence in my code to pass such lists via XML as in snippet 3 or is it better to just delimit the values and use an udf to take care of it?
There is now a 4th option - table valued parameters, whereby you can actually pass a table of values in to a sproc as a parameter and then use that as you would normally a table variable. I'd be preferring this approach over the XML (or CSV parsing approach)
I can't quote performance figures between all the different approaches, but that's one I'd be trying - I'd recommend doing some real performance tests on them.
Edit:
A little more on TVPs. In order to pass the values in to your sproc, you just define a SqlParameter (SqlDbType.Structured) - the value of this can be set to any IEnumerable, DataTable or DbDataReader source. So presumably, you already have the list of values in a list/array of some sort - you don't need to do anything to transform it into XML or CSV.
I think this also makes the sproc clearer, simpler and more maintainable, providing a more natural way to achieve the end result. One of the main points is that SQL performs best at set based/not looping/non string manipulation activities.
That's not to say it will perform great with a large set of values passed in. But with smaller sets (up to ~1000) it should be fine.
UDF invocation is a little bit more costly than splitting the XML using the built-in function.
However, this only needs to be done once per query, so the performance difference will be negligible.
I have two queries from v$sqlarea. For example
query 1: select * from employee emp where emp.eid = 5
query 2: select * from employee v where v.eid = 15
Both are exactly the same in structure. but they will be compiled separately each time..
I need to match such queries that vary only by alias names or bind variables.
The inbuilt function utl_match.jaro_winkler_similarity() in oracle provides a pretty good string match algorithm. But sql perspective match is not provided. Is there any other solution?
there is a script on Asktom that will find exactly this kind of statements (statements NOT using binds).
The output of that last query will show you statements that are identical in the shared
pool after all numbers and character string constants have been removed. These
statements -- and more importantly their counts -- are the potential bottlenecks. In
addition to causing the contention, they will be HUGE cpu consumers.
Oracle actually does this internally, to support cursor_sharing = similar. I am not aware that they expose this functionality anywhere.
The alias names is a tricky one. You could look for SQLs with the same PLAN_HASH_VALUE as a starting point.