Query Optimization Suggestion

Query Optimization Suggestion - sql

Suppose you have a query (PL-SQL) such as:
Select a.*
From Table a
Where a.foo in (#1)
or a.bar in (#1);
Where #1 is a list containing about 10.000 (string) parameters. Yes, the list is repeated in both restrictions. And for any given row, a.foo <> a.bar.
This list comes from a webservice, and it changes according to a set of parameters. Suppose it is not possible to store them. The strings in this list are numeric strings, with 9 characters, such as '001234567'.
Is there a better way to structure this query?

Given the information in your question (and comments) then the simple answer is no there is not a better way to structure your query, however there is almost certainly a better way to design this process so that you aren't forced to use the procedure to which this horrible pseudo-code snippet alludes.

Related

SQL DB2 - How to SELECT or compare columns based on their name?

Thank you for checking my question out!
I'm trying to write a query for a very specific problem we're having at my workplace and I can't seem to get my head around it.
Short version: I need to be able to target columns by their name, and more specifically by a part of their name that will be consistent throughout all the columns I need to combine or compare.
More details:
We have (for example), 5 different surveys. They have many questions each, but SOME of the questions are part of the same metric, and we need to create a generic field that keeps it. There's more background to the "why" of that, but it's pretty important for us at this point.
We were able to kind of solve this with either COALESCE() or CASE statements but the challenge is that, as more surveys/survey versions continue to grow, our vendor inevitably generates new columns for each survey and its questions.
Take this example, which is what we do currently and works well enough:
CASE
WHEN SURVEY_NAME = 'Service1' THEN SERV1_REC
WHEN SURVEY_NAME = 'Notice1' THEN FNOL1_REC
WHEN SURVEY_NAME = 'Status1' THEN STAT1_REC
WHEN SURVEY_NAME = 'Sales1' THEN SALE1_REC
WHEN SURVEY_NAME = 'Transfer1' THEN Null
ELSE Null
END REC
And also this alternative which works well:
COALESCE(SERV1_REC, FNOL1_REC, STAT1_REC, SALE1_REC) as REC
But as I mentioned, eventually we will have a "SALE2_REC" for example, and we'll need them BOTH on this same statement. I want to create something where having to come into the SQL and make changes isn't needed. Given that the columns will ALWAYS be named "something#_REC" for this specific metric, is there any way to achieve something like:
COALESCE(all columns named LIKE '%_REC') as REC
Bonus! Related, might be another way around this same problem:
Would there also be a way to achieve this?
SELECT (columns named LIKE '%_REC') FROM ...
Thank you very much in advance for all your time and attention.
-Kendall

Table and column information in Db2 are managed in the system catalog. The relevant views are SYSCAT.TABLES and SYSCAT.COLUMNS. You could write:
select colname, tabname from syscat.tables
where colname like some_expression
and syscat.tabname='MYTABLE
Note that the LIKE predicate supports expressions based on a variable or the result of a scalar function. So you could match it against some dynamic input.
Have you considered storing the more complicated properties in JSON or XML values? Db2 supports both and you can query those values with regular SQL statements.

Replace one column in a table with a column in a different table and select *

Apologies for the somewhat confusing Title, I've been struggling to find an answer to my question, partly because it's hard to concisely describe it in the title line or come up with a good search string for it. Anyhoooo, here's the problem I'm facing:
Short version of the question is:
How can I write the following (invalid but understandable SQL) in valid SQL understood by Oracle:
select B.REPLACER as COL, A.* except A.COL from A join B on a.COL = B.COL;
Here's the long version (if you already know what I want from reading the short version, you don't need to read this :P ):
My (simplified) task is to come up with service that massages a table's data and provide it as a sub-query. The table has a lot of columns (a few dozens or more), and I am stuck with using "select *" rather than explicitly listing out all columns one by one, because new columns may be added to or removed from the table without me knowing, although my downstream systems will know and adjust accordingly.
Say, this table (let's call it Table A from now on) has a column called "COL", and we need to replace the values in that COL with the value in the REPLACER column of table B where the two COL value matches.
How do I do this? I cannot rename the column because the downstream systems expect "COL"; I cannot do without the "expect A.COL" part because that would cause the sql to be ambiguous.
Appreciate your help, almighty StackOverflow
Ray

You can either use table.* or table.fieldName.
There is no syntax available for table.* (except field X).
This means that you can only get what you want by explicitly listing all of the fields...
select
A.field1,
A.field2,
B.field3,
A.field4,
etc
from
A join B on a.COL = B.COL;
This means that you may need to re-model your data so as to ensure you don't keep getting new fields. OR write dynamic sql. Interrogate the database to find out the column names, use code to write a query as above, and then run that dynamically generated query.

Try this: (not tested)
select Case B.COL
when null then A.COL
else B.REPLACER
end as COLAB, A.*
from A left join B on A.COL = B.COL;
This should get the B.REPLACER when exists B.COL = A.COL, you can add more column in the select (like sample col1, col2) or use A.* (change COL into COLAB to make it distinguish with A.COL in A.*) .

Like said before, you cannot specify in regular sql which column not to select. you could write a procedure for that, but it would be quite complex, because you would need to return a variable table type. Probably something with refcursor magic stuff.
The closest I could come up with is joining with using. This will give you the column col in the first field once and for the rest all columns in a and b. So not what you want basically. :)
select *
from a
join b using (col)

Let's start from first principles. select * from .... is a bug waiting to happend and has no place in production code. Of course everybody uses it because it entails less typing but that doesn't make it a good practice.
Beyond that, the ANSI SQL standard doesn't support select * except col1 from .... syntax. I know a lot of people wish it would but it doesn't.
There are a couple of ways to avoid excessive typing. One is to generate the query by selecting from data dictionary, using one of the views like USER_TAB_COLUMNS. It is worth writing the PL?SQL block to do this if you need lots of queries like this.
A simpler hack is to use the SQL*Plus describe to list out the structure of table A; cut'n'paste it into an IDE which supports regular expressions and edit the columns to give you the query's projection.
Both these options might strike you as labourious but frankly either workaround (and especially the second) would have taken less effort than asking StackOverflow. You'll know better next time.

Hide Empty columns

I got a table with 75 columns,. what is the sql statement to display only the columns with values in in ?
thanks

It's true that a similar statement doesn't exist (in a SELECT you can use condition filters only for the rows, not for the columns). But you could try to write a (bit tricky) procedure. It must check which are the columns that contains at least one not NULL/empty value, using queries. When you get this list of columns just join them in a string with a comma between each one and compose a query that you can run, returning what you wanted.
EDIT: I thought about it and I think you can do it with a procedure but under one of these conditions:
find a way to retrieve column names dynamically in the procedure, that is the metadata (I never heard about it, but I'm new with procedures)
or hardcode all column names (loosing generality)
You could collect column names inside an array, if stored procedures of your DBMS support arrays (or write the procedure in a programming language like C), and loop on them, making a SELECT each time, checking if it's an empty* column or not. If it contains at least one value concatenate it in a string where column names are comma-separated. Finally you can make your query with only not-empty columns!
Alternatively to stored procedure you could write a short program (eg in Java) where you can deal with a better flexibility.
*if you check for NULL values it will be simple, but if you check for empty values you will need to manage with each column data type... another array with data types?

I would suggest that you write a SELECT statement and define which COLUMNS you wish to display and then save that QUERY as a VIEW.
This will save you the trouble of typing in the column names every time you wish to run that query.
As marc_s pointed out in the comments, there is no select statement to hide columns of data.

You could do a pre-parse and dynamically create a statement to do this, but this would be a very inefficient thing to do from a SQL performance perspective. Would strongly advice against what you are trying to do.

A simplified version of this is to just select the relevant columns, which was what I needed personally. A quick search of what we're dealing with in a table
SELECT * FROM table1 LIMIT 10;
-> shows 20 columns where im interested in 3 of them. Limit is just to not overflow the console.
SELECT column1,column3,colum19 FROM table1 WHERE column3='valueX';
It is a bit of a manual filter but it works for what I need.

Building Query from Multi-Selection Criteria

I am wondering how others would handle a scenario like such:
Say I have multiple choices for a user to choose from.
Like, Color, Size, Make, Model, etc.
What is the best solution or practice for handling the build of your query for this scneario?
so if they select 6 of the 8 possible colors, 4 of the possible 7 makes, and 8 of the 12 possible brands?
You could do dynamic OR statements or dynamic IN Statements, but I am trying to figure out if there is a better solution for handling this "WHERE" criteria type logic?
EDIT:
I am getting some really good feedback (thanks everyone)...one other thing to note is that some of the selections could even be like (40 of the selections out of the possible 46) so kind of large. Thanks again!
Thanks,
S

What I would suggest doing is creating a function that takes in a delimited list of makeIds, colorIds, etc. This is probably going to be an int (or whatever your key is). And splits them into a table for you.
Your SP will take in a list of makes, colors, etc as you've said above.
YourSP '1,4,7,11', '1,6,7', '6'....
Inside your SP you'll call your splitting function, which will return a table-
SELECT * FROM
Cars C
JOIN YourFunction(#models) YF ON YF.Id = C.ModelId
JOIN YourFunction(#colors) YF2 ON YF2.Id = C.ColorId
Then, if they select nothing they get nothing. If they select everything, they'll get everything.

What is the best solution or practice for handling the build of your query for this scenario?
Dynamic SQL.
A single parameter represents two states - NULL/non-existent, or having a value. Two more means squaring the number of parameters to get the number of total possibilities: 2 yields 4, 3 yields 9, etc. A single, non-dynamic query can contain all the possibilities but will perform horribly between the use of:
ORs
overall non-sargability
and inability to reuse the query plan
...when compared to a dynamic SQL query that constructs the query out of only the absolutely necessary parts.
The query plan is cached in SQL Server 2005+, if you use the sp_executesql command - it is not if you only use EXEC.
I highly recommend reading The Curse and Blessing of Dynamic SQL.

For something this complex, you may want a session table that you update when the user selects their criteria. Then you can join the session table to your items table.
This solution may not scale well to thousands of users, so be careful.

If you want to create dynamic SQL it won't matter if you use the OR approach or the IN approach. SQL Server will process the statements the same way (maybe with little variation in some situations.)
You may also consider using temp tables for this scenario. You can insert the selections for each criteria into temp tables (e.g., #tmpColor, #tmpSize, #tmpMake, etc.). Then you can create a non-dynamic SELECT statement. Something like the following may work:
SELECT <column list>
FROM MyTable
WHERE MyTable.ColorID in (SELECT ColorID FROM #tmpColor)
OR MyTable.SizeID in (SELECT SizeID FROM #tmpSize)
OR MyTable.MakeID in (SELECT MakeID FROM #tmpMake)
The dynamic OR/IN and the temp table solutions work fine if each condition is independent of the other conditions. In other words, if you need to select rows where ((Color is Red and Size is Medium) or (Color is Green and Size is Large)) you'll need to try other solutions.

Sql Optimization: Xml or Delimited String

This is hopefully just a simple question involving performance optimizations when it comes to queries in Sql 2008.
I've worked for companies that use Stored Procs a lot for their ETL processes as well as some of their websites. I've seen the scenario where they need to retrieve specific records based on a finite set of key values. I've seen it handled in 3 different ways, illustrated via pseudo-code below.
Dynamic Sql that concatinates a string and executes it.
EXEC('SELECT * FROM TableX WHERE xId IN (' + #Parameter + ')'
Using a user defined function to split a delimited string into a table
SELECT * FROM TableY INNER JOIN SPLIT(#Parameter) ON yID = splitId
USING XML as the Parameter instead of a delimited varchar value
SELECT * FROM TableZ JOIN #Parameter.Nodes(xpath) AS x (y) ON ...
While I know creating the dynamic sql in the first snippet is a bad idea for a large number of reasons, my curiosity comes from the last 2 examples. Is it more proficient to do the due diligence in my code to pass such lists via XML as in snippet 3 or is it better to just delimit the values and use an udf to take care of it?

There is now a 4th option - table valued parameters, whereby you can actually pass a table of values in to a sproc as a parameter and then use that as you would normally a table variable. I'd be preferring this approach over the XML (or CSV parsing approach)
I can't quote performance figures between all the different approaches, but that's one I'd be trying - I'd recommend doing some real performance tests on them.
Edit:
A little more on TVPs. In order to pass the values in to your sproc, you just define a SqlParameter (SqlDbType.Structured) - the value of this can be set to any IEnumerable, DataTable or DbDataReader source. So presumably, you already have the list of values in a list/array of some sort - you don't need to do anything to transform it into XML or CSV.
I think this also makes the sproc clearer, simpler and more maintainable, providing a more natural way to achieve the end result. One of the main points is that SQL performs best at set based/not looping/non string manipulation activities.
That's not to say it will perform great with a large set of values passed in. But with smaller sets (up to ~1000) it should be fine.

UDF invocation is a little bit more costly than splitting the XML using the built-in function.
However, this only needs to be done once per query, so the performance difference will be negligible.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas