Why would an IN condition be slower than "=" in sql? - sql

See the question "This SELECT query takes 180 seconds to finish" (and the comments on the question itself).
The IN is compared against only one value, yet the time difference is enormous.
Why is that?

Summary: This is a known problem in MySQL and was fixed in MySQL 5.6.x. The problem is due to a missing optimization: a subquery using IN is incorrectly identified as a dependent subquery instead of an independent subquery.
When you run EXPLAIN on the original query it returns this:
1 'PRIMARY' 'question_law_version' 'ALL' '' '' '' '' 10148 'Using where'
2 'DEPENDENT SUBQUERY' 'question_law_version' 'ALL' '' '' '' '' 10148 'Using where'
3 'DEPENDENT SUBQUERY' 'question_law' 'ALL' '' '' '' '' 10040 'Using where'
When you change IN to = you get this:
1 'PRIMARY' 'question_law_version' 'ALL' '' '' '' '' 10148 'Using where'
2 'SUBQUERY' 'question_law_version' 'ALL' '' '' '' '' 10148 'Using where'
3 'SUBQUERY' 'question_law' 'ALL' '' '' '' '' 10040 'Using where'
Each dependent subquery is run once per row in the query it is contained in, whereas a plain (independent) subquery is run only once. MySQL can sometimes optimize dependent subqueries when there is a condition that can be converted to a join, but here that is not the case.
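To make the cost difference concrete, here is a minimal Python sketch that counts how often the inner query is evaluated under each plan. It is only a toy model, not how MySQL works internally; the function names and the row counts (1000 each) are made up for illustration.

```python
# Toy model of the two plans; an illustration, not MySQL internals.
calls = {"inner": 0}

def inner_query(inner_rows):
    """Stand-in for the subquery: scans its table and returns one value."""
    calls["inner"] += 1
    return max(inner_rows)

def run_dependent(outer_rows, inner_rows):
    """DEPENDENT SUBQUERY: the inner query is re-evaluated for every outer row."""
    return [r for r in outer_rows if r == inner_query(inner_rows)]

def run_independent(outer_rows, inner_rows):
    """SUBQUERY: the inner query is evaluated once and its result reused."""
    value = inner_query(inner_rows)
    return [r for r in outer_rows if r == value]

outer = list(range(1, 1001))   # hypothetical outer table, 1000 rows
inner = list(range(1, 1001))   # hypothetical inner table, 1000 rows

calls["inner"] = 0
dependent_result = run_dependent(outer, inner)
dependent_calls = calls["inner"]

calls["inner"] = 0
independent_result = run_independent(outer, inner)
independent_calls = calls["inner"]

print(dependent_calls, independent_calls)  # 1000 1
```

Both plans return the same rows; only the number of inner scans differs, which is where the time goes once both tables hold thousands of rows.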
Now this of course leaves the question of why MySQL believes that the IN version needs to be a dependent subquery. I have made a simplified version of the query to help investigate this. I created two tables 'foo' and 'bar' where the former contains only an id column, and the latter contains both an id and a foo id (though I didn't create a foreign key constraint). Then I populated both tables with 1000 rows:
CREATE TABLE foo (id INT PRIMARY KEY NOT NULL);
CREATE TABLE bar (id INT PRIMARY KEY, foo_id INT NOT NULL);
-- populate tables with 1000 rows in each
SELECT id
FROM foo
WHERE id IN
(
SELECT MAX(foo_id)
FROM bar
);
This simplified query has the same problem as before - the inner select is treated as a dependent subquery and no optimization is performed, causing the inner query to be run once per row. The query takes almost one second to run. Changing the IN to = again allows the query to run almost instantly.
The code I used to populate the tables is below, in case anyone wishes to reproduce the results.
CREATE TABLE filler (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;
DELIMITER $$
CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
DECLARE _cnt INT;
SET _cnt = 1;
WHILE _cnt <= cnt DO
INSERT
INTO filler
SELECT _cnt;
SET _cnt = _cnt + 1;
END WHILE;
END
$$
DELIMITER ;
CALL prc_filler(1000);
INSERT foo SELECT id FROM filler;
INSERT bar SELECT id, id FROM filler;

It's about inner queries (a.k.a. subqueries) vs. joins, not about IN vs. =, and the reasons are explained in that post.
MySQL version 5.4 is supposed to introduce an improved optimiser that can rewrite some subqueries into a more efficient form.
The worst thing you can do is to use a so-called correlated subquery:
http://dev.mysql.com/doc/refman/5.1/en/correlated-subqueries.html

SQL optimizers don't always do what you expect them to do. I'm not sure there's a better answer than that. That's why you have to examine EXPLAIN PLAN output, and profile your queries to find out where the time is spent.

Interestingly, the problem can also be solved with prepared statements (not sure whether this suits everybody), e.g.:
mysql> EXPLAIN SELECT * FROM words WHERE word IN (SELECT word FROM phrase_words);
+----+--------------------+--------------+...
| id | select_type | table |...
+----+--------------------+--------------+...
| 1 | PRIMARY | words |...
| 2 | DEPENDENT SUBQUERY | phrase_words |...
+----+--------------------+--------------+...
mysql> EXPLAIN SELECT * FROM words WHERE word IN ('twist','rollers');
+----+-------------+-------+...
| id | select_type | table |...
+----+-------------+-------+...
| 1 | SIMPLE | words |...
+----+-------------+-------+...
So just prepare the statement in a stored procedure, then execute it. Here is the idea:
SET @words = (SELECT GROUP_CONCAT(word SEPARATOR '\',\'') FROM phrase_words);
SET @words = CONCAT("'", @words, "'");
SET @query = CONCAT("SELECT * FROM words WHERE word IN (", @words, ");");
PREPARE q FROM @query;
EXECUTE q;
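The same idea (run the subquery first, then hand the main query a literal IN list) can also be sketched from application code. The snippet below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the table contents are made up, and it binds the values through placeholders instead of concatenating them into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words (word TEXT PRIMARY KEY);
    CREATE TABLE phrase_words (word TEXT);
    INSERT INTO words VALUES ('twist'), ('rollers'), ('shout');
    INSERT INTO phrase_words VALUES ('twist'), ('rollers');
""")

# Step 1: run the former subquery on its own and collect its values.
in_values = [row[0] for row in conn.execute("SELECT DISTINCT word FROM phrase_words")]

# Step 2: build an IN list with one placeholder per value.
placeholders = ", ".join("?" for _ in in_values)
query = f"SELECT word FROM words WHERE word IN ({placeholders}) ORDER BY word"

# Step 3: execute the main query with the values bound as parameters.
matches = [row[0] for row in conn.execute(query, in_values)]
conn.close()
print(matches)  # ['rollers', 'twist']
```

Binding the values avoids the quoting tricks needed in the GROUP_CONCAT version, at the price of an extra round trip to the database.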

Related

SQL calculated field based on data in another table

I have two tables, Engineering table and Instrumentation table. In the Engineering table I have the columns and possible data below:
Tag | Speed Control
PC-1234 |
ME-1235 |
BF-1236 |
In the Instrumentation Table I have the following columns and data
Function | Tag
SC | 1234
SC | 1235
SC | 1237
I want to automate the Speed Control column in the Engineering table so that it says Yes or No, depending on whether there is a row in the Instrumentation table whose Function is SC and whose Tag matches the numeric part of the Tag column in the Engineering table. So the results would look like the below:
Tag | Speed Control
PC-1234 | Yes
ME-1235 | Yes
BF-1236 | No
Please help with the best way to do this. Thanks in advance for any help.
You don't want a separate column in the Engineering table for this. You just need a view which you can query
CREATE VIEW EngineeringSpeedControl
AS
SELECT
e.Tag,
SpeedControl = CASE WHEN i.Tag IS NULL THEN 'No' ELSE 'Yes' END
FROM dbo.Engineering e
LEFT JOIN dbo.Instrumentation i
ON i.Tag = RIGHT(e.Tag, LEN(e.Tag) - 3)
AND i.[Function] = 'SC';
Unfortunately, due to the poor design of the tables, you need to muck around with string manipulation.
Ideally you would have the Engineering.Tag column split into separate parts, so you could just do a straight join
LEFT JOIN dbo.Instrumentation i
ON i.Tag = e.Tag
AND i.[Function] = 'SC';
As I don't know whether the Tag column in Engineering always has the same format, I wrote my query so that it handles any layout of the form xxxxx-nnnnnnnnn with a hyphen in between.
UPDATE [dbo].[Engineering]
SET [Speed Control] =
CASE WHEN EXISTS ( SELECT 1 FROM [dbo].[Instrumentation] i WHERE RIGHT([dbo].[Engineering] .[tag],CHARINDEX('-', (REVERSE([dbo].[Engineering] .[tag]))) - 1) = CAST(i.[Tag] AS VARCHAR(10))) then 'YES' ELSE 'NO' END
WHERE [Speed Control] IS NULL
Create a function and use a function in calculated columns
create table instrumentation([Function] varchar(200) null, Tag varchar(200) null)
insert into instrumentation values('SC', '1234'),('SC', '1235'),('SC', '1237')
create Function fn_Speed (@tag varchar(200))
returns varchar(200)
as
begin
-- take the part of the tag after the hyphen, e.g. 'PC-1234' -> '1234'
declare @tagg varchar(200)= (select SUBSTRING(@tag, charindex('-', @tag)+1,10))
declare @result varchar(200)
--return @tagg
If exists (
select 1 from instrumentation where tag =@tagg)
select @result= 'True'
else
select @result= 'False'
return @result
end
Create table engineering (tag varchar(200), Speed as dbo.fn_Speed (tag) )
insert into engineering(tag)values('PC-1234'), ('ME-1235'), ('BF-1236')

How does one automatically insert the results of several function calls into a table?

Wasn't sure how to title the question but hopefully this makes sense :)
I have a table (OldTable) with an index and a column of comma separated lists. I'm trying to split the strings in the list column and create a new table with the indexes coupled with each of the sub strings of the string it was connected to in the old table.
Example:
OldTable
index | list
1 | 'a,b,c'
2 | 'd,e,f'
NewTable
index | letter
1 | 'a'
1 | 'b'
1 | 'c'
2 | 'd'
2 | 'e'
2 | 'f'
I have created a function that will split the string and return each sub string as a record in a 1 column table as so:
SELECT * FROM Split('a,b,c', ',', 1)
Which will result in:
Result
index | string
1 | 'a'
1 | 'b'
1 | 'c'
I was hoping that I could use this function as so:
SELECT * FROM Split((SELECT * FROM OldTable), ',')
And then use the id and string columns from OldTable in my function (by re-writing it slightly) to create NewTable. But as far as I understand, sending tables into the function doesn't work, as I get: "Subquery returned more than 1 value. ... not permitted ... when the subquery is used as an expression."
One solution I was thinking of would be to run the function, as is, on all the rows of OldTable and insert the result of each call into NewTable. But I'm not sure how to iterate over each row without a function. And I can't send tables into a function to iterate, so I'm back at square one.
I could do it manually but OldTable contains a few records (1000 or so) so it seems like automation would be preferable.
Is there a way to either:
Iterate over OldTable row by row, run the row through Split(), add the result to NewTable for all rows in OldTable. Either by a function or through regular sql-transactions
Re-write Split() to take a table variable after all
Get rid of the function altogether and just do it in sql transactions?
I'd prefer not to use procedures (I don't know if there is a solution with them either), mostly because I don't want the functionality inside of the DB to be exposed to the outside. If, however, that is the "best"/only way to go, I'll have to consider it. I'm quite (read: very) new to SQL, so it might be a needless worry.
Here is my Split() function if it is needed:
CREATE FUNCTION Split (
@string nvarchar(4000),
@delimitor nvarchar(10),
@index int = 0
)
RETURNS @splitTable TABLE (id int, string nvarchar(4000) NOT NULL) AS
BEGIN
DECLARE @startOfSubString smallint;
DECLARE @endOfSubString smallint;
SET @startOfSubString = 1;
SET @endOfSubString = CHARINDEX(@delimitor, @string, @startOfSubString);
IF (@endOfSubString <> 0)
WHILE @endOfSubString > 0
BEGIN
INSERT INTO @splitTable
SELECT @index, SUBSTRING(@string, @startOfSubString, @endOfSubString - @startOfSubString);
SET @startOfSubString = @endOfSubString+1;
SET @endOfSubString = CHARINDEX(@delimitor, @string, @startOfSubString);
END;
-- the last (or only) substring has no trailing delimiter
INSERT INTO @splitTable
SELECT @index, SUBSTRING(@string, @startOfSubString, LEN(@string)-@startOfSubString+1);
RETURN;
END
Hope my problem and attempt was explained and possible to understand.
You are looking for CROSS APPLY:
SELECT t.[index], s.string
FROM OldTable t
CROSS APPLY dbo.Split(t.list, ',', t.[index]) s;
Inserting into the new table just requires an INSERT ... SELECT or a SELECT ... INTO clause.
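If it helps to see what CROSS APPLY does conceptually, here is a small Python sketch of the same operation: the split function is called once per row of OldTable and all the returned rows are concatenated. The Python `split` function and the list-of-tuples tables are stand-ins for the T-SQL objects in the question, not a translation of them.

```python
def split(string, delimiter, index=0):
    """Mimics the Split() table function: one (index, substring) row per piece."""
    return [(index, piece) for piece in string.split(delimiter)]

# OldTable from the question: (index, list)
old_table = [(1, "a,b,c"), (2, "d,e,f")]

# CROSS APPLY: call split() once per outer row and concatenate the results.
new_table = [
    row
    for index, items in old_table
    for row in split(items, ",", index)
]
print(new_table)
# [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'd'), (2, 'e'), (2, 'f')]
```

The nested loop is exactly the per-row iteration the question asks for; CROSS APPLY lets the database do it without a cursor.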

Execute table valued function from row values

Given a table as below, where fn contains the name of an existing table-valued function and param contains the parameter to be passed to that function
fn | param
----------------
'fn_one' | 1001
'fn_two' | 1001
'fn_one' | 1002
'fn_two' | 1002
Is there a way to get a resulting table like this by using set-based operations?
The resulting table would contain 0-* lines for each line from the first table.
param | resultval
---------------------------
1001 | 'fn_one_result_a'
1001 | 'fn_one_result_b'
1001 | 'fn_two_result_one'
1002 | 'fn_two_result_one'
I thought I could do something like (pseudo)
select t1.param, t2.resultval
from table1 t1
cross join exec sp_executesql('select * from '+t1.fn+'('+t1.param+')') t2
but that gives a syntax error at exec sp_executesql.
Currently we're using cursors to loop through the first table and insert into a second table with exec sp_executesql. While this does the job correctly, it is also the heaviest part of a frequently used stored procedure and I'm trying to optimize it. Changes to the data model would probably imply changes to most of the core of the application, and that would cost more than just throwing hardware at SQL Server.
I believe that this should do what you need, using dynamic SQL to generate a single statement that can give you your results and then using that with EXEC to put them into your table. The FOR XML trick is a common one for concatenating VARCHAR values together from multiple rows. It has to be written with the AS [text()] for it to work.
--=========================================================
-- Set up
--=========================================================
CREATE TABLE dbo.TestTableFunctions (function_name VARCHAR(50) NOT NULL, parameter VARCHAR(20) NOT NULL)
INSERT INTO dbo.TestTableFunctions (function_name, parameter)
VALUES ('fn_one', '1001'), ('fn_two', '1001'), ('fn_one', '1002'), ('fn_two', '1002')
CREATE TABLE dbo.TestTableFunctionsResults (function_name VARCHAR(50) NOT NULL, parameter VARCHAR(20) NOT NULL, result VARCHAR(200) NOT NULL)
GO
CREATE FUNCTION dbo.fn_one
(
@parameter VARCHAR(20)
)
RETURNS TABLE
AS
RETURN
SELECT 'fn_one_' + @parameter AS result
GO
CREATE FUNCTION dbo.fn_two
(
@parameter VARCHAR(20)
)
RETURNS TABLE
AS
RETURN
SELECT 'fn_two_' + @parameter AS result
GO
--=========================================================
-- The important stuff
--=========================================================
DECLARE @sql VARCHAR(MAX)
SELECT @sql =
(
SELECT 'SELECT ''' + T1.function_name + ''', ''' + T1.parameter + ''', F.result FROM ' + T1.function_name + '(' + T1.parameter + ') F UNION ALL ' AS [text()]
FROM
TestTableFunctions T1
FOR XML PATH ('')
)
-- strip the trailing ' UNION ALL' (LEN ignores the final space)
SELECT @sql = SUBSTRING(@sql, 1, LEN(@sql) - 10)
INSERT INTO dbo.TestTableFunctionsResults
EXEC(@sql)
SELECT * FROM dbo.TestTableFunctionsResults
--=========================================================
-- Clean up
--=========================================================
DROP TABLE dbo.TestTableFunctions
DROP TABLE dbo.TestTableFunctionsResults
DROP FUNCTION dbo.fn_one
DROP FUNCTION dbo.fn_two
GO
The first SELECT statement (ignoring the setup) builds a string which has the syntax to run all of the functions in your table, returning the results all UNIONed together. That makes it possible to run the string with EXEC, which means that you can then INSERT those results into your table.
A couple of quick notes though... First, the functions must all return identical result set structures - the same number of columns with the same data types (technically, they might be able to be different data types if SQL Server can always do implicit conversions on them, but it's really not worth the risk). Second, if someone were able to update your functions table they could use SQL injection to wreak havoc on your system. You'll need that to be tightly controlled and I wouldn't let users just enter in function names, etc.
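For anyone who wants to see the generated statement without running the T-SQL, here is a rough Python sketch of the same string building, with a simple guard against the injection risk mentioned above. The allowlist `ALLOWED_FUNCTIONS` and the digits-only check on the parameter are assumptions added for illustration; they are not part of the T-SQL answer.

```python
ALLOWED_FUNCTIONS = {"fn_one", "fn_two"}  # hypothetical allowlist of callable functions

def build_union_sql(rows):
    """Builds one SELECT per (function, parameter) row, joined with UNION ALL."""
    selects = []
    for fn, param in rows:
        # Refuse anything that is not a known function with a numeric parameter.
        if fn not in ALLOWED_FUNCTIONS or not param.isdigit():
            raise ValueError(f"refusing to build SQL for {fn!r}({param!r})")
        selects.append(f"SELECT '{fn}', '{param}', F.result FROM {fn}({param}) F")
    # Joining avoids having to trim a trailing ' UNION ALL ' afterwards.
    return " UNION ALL ".join(selects)

rows = [("fn_one", "1001"), ("fn_two", "1001"), ("fn_one", "1002"), ("fn_two", "1002")]
sql = build_union_sql(rows)
print(sql)
```

The resulting string has the same shape as the one the FOR XML query produces, so it could be passed to EXEC in the same way.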
You cannot access objects by referencing their names in a SQL statement. One method would be to use a case statement:
select t1.*,
(case when fn = 'fn_one' then dbo.fn_one(t1.param)
when fn = 'fn_two' then dbo.fn_two(t1.param)
end) as resultval
from table1 t1 ;
Interestingly, you could encapsulate the case as another function, and then do:
select t1.*, dbo.fn_generic(t1.fn, t1.param) as resultval
from table1 t1 ;
However, in SQL Server, you cannot use dynamic SQL in a user-defined function (defined in T-SQL), so you would still need to use case or similar logic.
Either of these methods is likely to be much faster than a cursor, because they do not require issuing multiple queries.

creating a SQL table with multiple columns automatically

I must create an SQL table with 90+ fields; the majority of them are bit fields like N01, N02, N03 ... N89, N90. Is there a fast way of creating multiple fields, or is it possible to have one single field containing an array of true/false values? I need a solution that can also easily be queried.
There is no easy way to do this, and it will be very challenging to write queries against such a table. Create a table with three columns: item number, bit field number and a value field. Then you will be able to write 'good', succinct T-SQL queries against the table.
At least you can generate ALTER TABLE scripts for bit fields, and then run those scripts.
DECLARE @COUNTER INT = 1
-- prints N01..N09; raise the limit to 91 to cover N01..N90
WHILE @COUNTER < 10
BEGIN
PRINT 'ALTER TABLE table_name ADD N' + RIGHT('00' + CONVERT(NVARCHAR(4), @COUNTER), 2) + ' bit'
SET @COUNTER += 1
END
TLDR: Use binary arithmetic.
For a structure like this
==============
Table_Original
==============
Id | N01| N02 |...
I would recommend an alternate table structure like this
==============
Table_Alternate
==============
Id | One_Col
This One_Col is of varchar type which will have value set as
cast(n01 as nvarchar(1)) + cast(n02 as nvarchar(1))+ cast(n03 as nvarchar(1)) as One_Col
I however feel that you'd use C# or some other programming language to set value into column. You can also use bit and bit-shift operations.
Whenever you need to get a value, you can use SQL or C# syntax(treating as string)
In sql query terms you can use a query like
SELECT SUBSTRING(one_col,@pos,1)
and @pos can be set like
DECLARE @pos INT
DECLARE @Colname nvarchar(4)
SET @colname=N'N32'
-- ....
SET @pos= CAST(REPLACE(@colname,'N','') as INT)
You can also do binary arithmetic with ease in any programming language.
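As a rough illustration of the binary-arithmetic variant, here is a Python sketch that packs the N01..N90 flags into one integer and reads a single flag back by column name. The function names are made up; note also that 90 bits do not fit into a single 64-bit integer column, so in a database this would need two integer columns or a binary column (Python integers have no such limit).

```python
def pack_flags(flags):
    """Packs a list of booleans (N01 first) into one integer bit mask."""
    mask = 0
    for position, flag in enumerate(flags):
        if flag:
            mask |= 1 << position
    return mask

def get_flag(mask, column_name):
    """Reads flag 'Nxx' back out of the mask (N01 is bit 0)."""
    position = int(column_name.replace("N", "")) - 1
    return (mask >> position) & 1 == 1

flags = [False] * 90
flags[0] = True    # N01
flags[31] = True   # N32
mask = pack_flags(flags)

print(get_flag(mask, "N32"), get_flag(mask, "N02"))  # True False
```

The position lookup mirrors the REPLACE/CAST trick used for the string column: strip the 'N' and use the number as an offset.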
Use three columns.
Table
ID NUMBER,
FIELD_NAME VARCHAR2(10),
VALUE NUMBER(1)
Example
ID FIELD VALUE
1 N01 1
1 N02 0
.
1 N90 1
.
2 N01 0
2 N02 1
.
2 N90 1
.
You can also OR an entire column for a fieldname (or fieldnameS):
select DECODE(SUM(VALUE), 0, 0, 1) from table where field_name = 'N01';
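And even perform an AND by multiplying the values (note that LN(0) is undefined, so this form only works when no VALUE is 0; for 0/1 data MIN(VALUE) is a safer way to AND):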
select EXP(SUM(LN(VALUE))) from table where field_name = 'N01';
(see http://viralpatel.net/blogs/row-data-multiplication-in-oracle/)

SQL use column values for IF conditions

I have a Conditions table that looks like this :
Conditions
ConditionID
MappingID
VariableID
CompareToVariableID
ConditionOperator ( can be == or <> )
ConjunctionOperator ( can be && or || )
ConjunctionOrder
Now I have a SP that I want to implement these conditions on depending on MappingID.
If a MappingID is selected, all mapped conditions should be checked before anything happens in my SP.
SELECT *
FROM Conditions
WHERE MappingID = @MappingID
ORDER BY ConjunctionOrder
For example if a certain MappingID had three rows in the Conditions table then :
IF VARIABLEID CONDITIONOPERATOR COMPARETOVARIABLEID CONJUNCTIONOPERATOR --row1
VARIABLEID CONDITIONOPERATOR COMPARETOVARIABLEID CONJUNCTIONOPERATOR --row2
VARIABLEID CONDITIONOPERATOR COMPARETOVARIABLEID --row3 (ignore ConjunctionOperator for last row)
BEGIN
--my code goes here
END
ELSE
BEGIN
--my code here
END
I would like to know how I can implement the IF statement.
One way of doing it is like this (these are the steps in your stored procedure)
Take your conditions from the Conditions table and produce a special boolean expression out of them. You can do it by using a cursor over the SELECT query from your post. You should convert your variable IDs so that they won't mix with the variable values (you'll see later what I mean). Let's suppose your query returned:
|-----------------------------------------------------------------------------------------|
|VariableID |CompareToVariableID |ConditionOperator |ConjunctionOperator |ConjunctionOrder|
|-----------|--------------------|------------------|--------------------|----------------|
|A |B |== |&& |1 |
|C |D |<> (NE) | |2 |
|-----------|--------------------|------------------|--------------------|----------------|
Your expression (kept in a local SP variable) should be:
SELECT 1 WHERE #A# = #B# AND #C# <> #D#
In other words, you add SELECT 1 WHERE at the beginning and replace the condition operator == with = (<> stays as it is), the conjunction operator && with 'AND', and || with 'OR'.
Now you do a cursor query over your Variables table. In each loop you replace the names of the variables with their values (i.e. #A# is replaced with the value of A). At the end your expression should look like 'SELECT 1 WHERE 1=2 AND 3<>3'.
You use sp_executesql to execute your dynamically created query. You can put the output in an OUTPUT parameter and check it in the IF statement in your SP.
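The steps above can be sketched outside the database as follows. This is only an illustration in Python, using the built-in sqlite3 module to evaluate the generated expression instead of sp_executesql; the `values` lookup stands in for the Variables table, and the values are bound as parameters rather than substituted into the text. As in plain SQL, AND binds tighter than OR, so mixed conjunctions are not evaluated strictly in ConjunctionOrder.

```python
import sqlite3

# Translate the stored operators into SQL; anything else raises KeyError.
OPERATORS = {"==": "=", "<>": "<>"}
CONJUNCTIONS = {"&&": "AND", "||": "OR"}

def build_expression(conditions, values):
    """conditions: (VariableID, CompareToVariableID, ConditionOperator,
    ConjunctionOperator) rows, already sorted by ConjunctionOrder."""
    parts, params = [], []
    last = len(conditions) - 1
    for i, (variable_id, compare_to_id, op, conjunction) in enumerate(conditions):
        parts.append(f"? {OPERATORS[op]} ?")
        params += [values[variable_id], values[compare_to_id]]
        if i < last:  # the conjunction on the last row is ignored
            parts.append(CONJUNCTIONS[conjunction])
    return "SELECT 1 WHERE " + " ".join(parts), params

def conditions_hold(conditions, values):
    """Runs the generated query; a returned row means the conditions are true."""
    sql, params = build_expression(conditions, values)
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql, params).fetchone() is not None
    finally:
        conn.close()

conditions = [("A", "B", "==", "&&"), ("C", "D", "<>", None)]
values = {"A": 1, "B": 2, "C": 3, "D": 3}   # hypothetical variable values

print(build_expression(conditions, values)[0])  # SELECT 1 WHERE ? = ? AND ? <> ?
print(conditions_hold(conditions, values))      # False
```

With these values the expression is the parameterised form of 'SELECT 1 WHERE 1=2 AND 3<>3', so the ELSE branch of the stored procedure would run.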
This is a basic example of what you want:
IF EXISTS (SELECT * FROM sys.tables WHERE name = '{0}') Select 1 ELSE SELECT 0
You can also try:
IF (SELECT ConditionID FROM Conditions)=COMPARETOVARIABLEID
BEGIN
---Do your thing
END
GO