T-SQL: SELECT query inside another SELECT query takes a long time

I have a procedure with arguments, but calling it takes a very long time. I decided to check what is wrong with my query and came to the conclusion that the problem is the column IN (SELECT [...]) construct.
Both queries return 1500 rows.
First query: 45 seconds
Second query: 0 seconds
1.
declare @FILTER_OPTION int
declare @ID_DISTRIBUTOR type_int_value
declare @ID_DATA_TYPE type_bigint_value
declare @ID_AGGREGATION_TYPE type_int_value
set @FILTER_OPTION = 8
insert into @ID_DISTRIBUTOR values (19)
insert into @ID_DATA_TYPE values (30025)
insert into @ID_AGGREGATION_TYPE values (10)
SELECT * FROM dbo.[DATA] WHERE
[ID_DISTRIBUTOR] IN (select [VALUE] from @ID_DISTRIBUTOR)
AND [ID_DATA_TYPE] IN (select [VALUE] from @ID_DATA_TYPE)
AND [ID_AGGREGATION_TYPE] IN (select [VALUE] from @ID_AGGREGATION_TYPE)
2.
select * FROM dbo.[DATA] WHERE
[ID_DISTRIBUTOR] IN (19)
AND [ID_DATA_TYPE] IN (30025)
AND [ID_AGGREGATION_TYPE] IN (10)
Why is this happening?
How should I write a stored procedure that takes an array of arguments so that it runs quickly?
Edit:
Maybe it's a problem with indexes? Indexes are created on these three columns.

For such a large performance difference, I would guess that you have one or more indexes. In particular, if you have an index on (ID_DISTRIBUTOR, ID_DATA_TYPE, ID_AGGREGATION_TYPE), then the second query can make use of the index. SQL Server can recognize that the IN is really = and the query is a simple lookup.
In the first case, SQL Server doesn't "know" that the subqueries really have only one row in them. That requires a different set of optimizations. In particular, the above index cannot be used, because the IN generally optimizes differently from =.
As for what to do: first, look at the execution plans so you can see the difference between the two versions. Then, test the second version with more than one value in the IN lists.
If you can live with just one value for each comparison, then use = rather than IN.
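As an illustration only (not taken from the question or the answer above), here is a sketch of such a composite index together with a join-based rewrite that is often tried in this situation; OPTION (RECOMPILE) lets the optimizer see how many rows the table-type variables actually hold:
-- hypothetical composite index on the three filter columns
CREATE INDEX IX_DATA_Dist_Type_Agg
ON dbo.[DATA] ([ID_DISTRIBUTOR], [ID_DATA_TYPE], [ID_AGGREGATION_TYPE]);
-- joining the table-type variables instead of using nested IN subqueries
SELECT d.*
FROM dbo.[DATA] d
JOIN @ID_DISTRIBUTOR dist ON d.[ID_DISTRIBUTOR] = dist.[VALUE]
JOIN @ID_DATA_TYPE dt ON d.[ID_DATA_TYPE] = dt.[VALUE]
JOIN @ID_AGGREGATION_TYPE agg ON d.[ID_AGGREGATION_TYPE] = agg.[VALUE]
OPTION (RECOMPILE); -- lets the optimizer use the actual row counts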

Related

Is there any SQL query character limit while executing it by using the JDBC driver [duplicate]

I'm using the following code:
SELECT * FROM table
WHERE Col IN (123,123,222,....)
However, if I put more than ~3000 numbers in the IN clause, SQL throws an error.
Does anyone know if there's a size limit or anything similar?
Depending on the database engine you are using, there can be limits on the length of a statement.
SQL Server has a very large limit:
http://msdn.microsoft.com/en-us/library/ms143432.aspx
Oracle, on the other hand, has a limit that is very easy to reach.
So, for large IN clauses, it's better to create a temp table, insert the values, and do a JOIN. It usually performs faster, too.
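A minimal sketch of that approach in T-SQL (the table, column, and ID values are illustrative):
CREATE TABLE #Ids (Id int NOT NULL PRIMARY KEY);
INSERT INTO #Ids (Id)
VALUES (123), (222), (456); -- load however many IDs you need, in batches if necessary
SELECT t.*
FROM [table] t
JOIN #Ids i ON i.Id = t.Col;
DROP TABLE #Ids;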
There is a limit, but you can split your values into separate IN () blocks:
Select *
From table
Where Col IN (123,123,222,....)
or Col IN (456,878,888,....)
Parameterize the query and pass the ids in using a Table Valued Parameter.
For example, define the following type:
CREATE TYPE IdTable AS TABLE (Id INT NOT NULL PRIMARY KEY)
Along with the following stored procedure:
CREATE PROCEDURE sp__Procedure_Name
@OrderIDs IdTable READONLY
AS
SELECT *
FROM table
WHERE Col IN (SELECT Id FROM @OrderIDs)
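Calling it from T-SQL would then look something like this (the ID values are made up):
DECLARE @OrderIDs IdTable;
INSERT INTO @OrderIDs (Id) VALUES (123), (222), (456);
EXEC sp__Procedure_Name @OrderIDs = @OrderIDs;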
Why not do a WHERE IN on a sub-select...
Pre-query into a temp table or something...
CREATE TABLE SomeTempTable AS
SELECT YourColumn
FROM SomeTable
WHERE UserPickedMultipleRecordsFromSomeListOrSomething
then...
SELECT * FROM OtherTable
WHERE YourColumn IN ( SELECT YourColumn FROM SomeTempTable )
Depending on your version, use a table valued parameter in 2008, or some approach described here:
Arrays and Lists in SQL Server 2005
For MS SQL 2016, passing ints into the IN clause, it looks like it can handle close to 38,000 records.
select * from user where userId in (1,2,3,etc)
I solved this by simply using ranges
WHERE Col >= 123 AND Col <= 10000
then removed unwanted records in the specified range by looping in the application code. It worked well for me because I was looping over the records anyway, and ignoring a couple of thousand of them didn't make any difference.
Of course, this is not a universal solution but it could work for situation if most values within min and max are required.
You did not specify the database engine in question; in Oracle, an option is to use tuples like this:
SELECT * FROM table
WHERE (Col, 1) IN ((123,1),(123,1),(222,1),....)
This ugly hack only works in Oracle SQL, see https://asktom.oracle.com/pls/asktom/asktom.search?tag=limit-and-conversion-very-long-in-list-where-x-in#9538075800346844400
However, a much better option is to use stored procedures and pass the values as an array.

SQL Function in column running slow

I have a computed column backed by a function that is causing one of my tables to be extremely slow (its output is a column in my table). I thought it might be some logical statements in my function, so I commented those out and just returned the string 'test'. This still caused the table to be slow. I believe the SELECT statement is slowing down the function; when I comment out the SELECT statement, everything is fine. I think I am not using functions in the correct manner.
FUNCTION [dbo].[Pend_Type](@Suspense_ID int, @Loan_ID nvarchar(10), @Suspense_Date datetime, @Investor nvarchar(10))
RETURNS nvarchar(20)
AS
BEGIN
DECLARE @Closing_Date Datetime, @Paid_Date Datetime
DECLARE @pendtype nvarchar(20)
--This is the issue!!!!
SELECT @Closing_Date = Date_Closing, @Paid_Date = Date_Paid from TABLE where Loan_ID = @Loan_ID
SET @pendtype = 'test'
--commented out logic
RETURN @pendtype
END
UPDATE:
I have another computed column that does something similar and is a column in the same table. This one runs fast. Does anyone see why there would be a difference?
Declare @yOrn AS nvarchar(1)
IF((Select count(suspense_ID) From TABLE where suspense_ID = @suspenseID) = 0)
SET @yOrn = 'N'
ELSE
SET @yOrn = 'Y'
RETURN @yOrn
You have isolated the performance problem in the select statement:
SELECT TOP 1 @Closing_Date = Date_Closing, @Paid_Date = Date_Paid
from TABLE
where Loan_ID = @Loan_ID;
To make this run faster, create a composite index on TABLE(Loan_ID, Date_Closing, Date_Paid).
By the way, you are using top with no order by. When multiple rows match, you can get any one of them back. Normally, top is used with order by.
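For example, if the latest closing date is the row you actually want (an assumption, since the question doesn't say), the deterministic form would be:
SELECT TOP 1 @Closing_Date = Date_Closing, @Paid_Date = Date_Paid
from TABLE
where Loan_ID = @Loan_ID
order by Date_Closing DESC;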
EDIT:
You can create the index by issuing the following command:
create index idx_table_loan_closing_paid on TABLE(Loan_ID, Date_Closing, Date_Paid);
Scalar functions are often executed like cursors, one row at a time; that is why they are slow and are to be avoided. I would not use the function as written but would write a set-based version instead. Incidentally, a SELECT TOP 1 without an ORDER BY column will not always give you the same record and is generally poor practice. In this case I would think you would want the latest date, for instance, or the earliest one.
In this particular case I think you would be better off not using a function but using a derived table join.
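A minimal sketch of such a set-based version, using OUTER APPLY from the table that currently holds the computed column (called SuspenseTable here, a hypothetical name) instead of calling a scalar function per row:
SELECT s.*,
d.Date_Closing,
d.Date_Paid
FROM SuspenseTable s
OUTER APPLY (
SELECT TOP 1 t.Date_Closing, t.Date_Paid
FROM TABLE t
WHERE t.Loan_ID = s.Loan_ID
ORDER BY t.Date_Closing DESC -- assumption: the latest closing date is wanted
) d;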

Optimizing stored procedure with multiple "LIKE"s

I am passing in a comma-delimited list of values that I need to compare to the database.
Here is an example of the values I'm passing in:
@orgList = "1123, 223%, 54%"
To use the wildcards I think I have to use LIKE, but the query runs for a long time and only returns 14 rows (the results are correct, but it's just taking forever, probably because I'm using the join incorrectly).
Can I make it better?
This is what I do now:
declare @tempTable Table (SearchOrg nvarchar(max) )
insert into @tempTable
select * from dbo.udf_split(@orgList) as split
-- this splits the values at the comma and puts them in a temp table
-- then I do a join on the main table and the temp table to do a like on it....
-- but I think it's not right because it's too long.
select something
from maintable gt
join @tempTable tt on gt.org like tt.SearchOrg
where
AYEAR = ISNULL(@year, ayear)
and (AYEAR >= ISNULL(@yearR1, ayear) and ayear <= ISNULL(@yearr2, ayear))
and adate = ISNULL(@Date, adate)
and (adate >= ISNULL(@dateR1, adate) and adate <= ISNULL(@DateR2, adate))
The final result would be all rows where maintable.org is 1123, or starts with 223, or starts with 54.
The reason for my date craziness is that sometimes the stored procedure only checks for a year, sometimes for a year range, sometimes for a specific date, and sometimes for a date range... everything that's not used is passed in as null.
Maybe the problem is there?
Try something like this:
Declare @tempTable Table
(
-- Since the column is a varchar(10), you don't want to use nvarchar here.
SearchOrg varchar(20)
);
INSERT INTO @tempTable
SELECT * FROM dbo.udf_split(@orgList);
SELECT
something
FROM
maintable gt
WHERE
some where statements go here
And
Exists
(
SELECT 1
FROM @tempTable tt
WHERE gt.org Like tt.SearchOrg
)
Such a dynamic query with optional filters and a LIKE driven by a table (!) is very hard to optimize because almost nothing is statically known. The optimizer has to create a very general plan.
You can do two things to speed this up by orders of magnitude:
Play with OPTION (RECOMPILE). If the compile times are acceptable this will at least deal with all the optional filters (but not with the LIKE table).
Do code generation and EXEC sp_executesql the code. Build a query with all LIKE clauses inlined into the SQL so that it looks like this: WHERE a LIKE @like0 OR a LIKE @like1 ... (not sure if you need OR or AND). This allows the optimizer to get rid of the join and just execute a normal predicate.
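A minimal sketch of the sp_executesql shape for a fixed two-pattern case (the parameter names and patterns are illustrative; the string for an arbitrary number of patterns would be built with the same concatenation idea):
DECLARE @sql nvarchar(max) = N'
SELECT something
FROM maintable gt
WHERE gt.org LIKE @like0 OR gt.org LIKE @like1';
EXEC sp_executesql
@sql,
N'@like0 varchar(20), @like1 varchar(20)',
@like0 = '223%',
@like1 = '54%';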
Your query may be difficult to optimize. Part of the question is what is in the where clause. You probably want to filter these first, and then do the join using like. Or, you can try to make the join faster, and then do a full table scan on the results.
SQL Server should optimize a LIKE of the form 'abc%' -- that is, where the wildcard is at the end. (See here, for example.) So, you can start with an index on maintable.org. Fortunately, your examples meet this criterion. However, if you have '%abc' -- the wildcard comes first -- then the optimization won't work.
For the index to work best, it might also need to take into account the conditions in the where clause. In other words, adding the index is suggestive, but the rest of the query may preclude the use of the index.
And, let me add, the best solution for these types of searches is to use the full text search capability in SQL Server (see here).
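A rough sketch of what that might look like, assuming a full-text catalog can be created and that maintable has a unique index (named PK_maintable here purely as a placeholder):
CREATE FULLTEXT CATALOG OrgCatalog AS DEFAULT; -- assumption: no default catalog exists yet
CREATE FULLTEXT INDEX ON maintable(org) KEY INDEX PK_maintable; -- PK_maintable is a hypothetical unique index name
-- prefix searches such as 223% and 54% become prefix terms
SELECT something
FROM maintable
WHERE CONTAINS(org, '"223*" OR "54*"');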

How to structure a query with a large, complex where clause?

I have an SQL query that takes these parameters:
@SearchFor nvarchar(200) = null
,@SearchInLat Decimal(18,15) = null
,@SearchInLng Decimal(18,15) = null
,@SearchActivity int = null
,@SearchOffers bit = null
,@StartRow int
,@EndRow int
The variables @SearchFor, @SearchActivity, and @SearchOffers can each be either null or not null. @SearchInLat and @SearchInLng must both be null, or both have values.
I'm not going to post the whole query as it's boring and hard to read, but the WHERE clause is shaped like this:
( -- filter by activity --
(@SearchActivity IS NULL)
OR (@SearchActivity = Activities.ActivityID)
)
AND ( -- filter by Location --
(@SearchInLat is NULL AND @SearchInLng is NULL)
OR ( ... )
)
AND ( -- filter by activity --
@SearchActivity is NULL
OR ( ... )
)
AND ( -- filter by has offers --
@SearchOffers is NULL
OR ( ... )
)
AND (
... -- more stuff
)
I have read that this is a bad way to structure a query - that SQL Server has trouble working out an efficient execution plan with lots of clauses like this - so I'm looking for other ways to do it.
I see two ways of doing this:
Construct the query as a string in my client application, so that the WHERE clause only contains filters for the relevant parameters. The problem with this is that it means not accessing the database through stored procedures, as everything else does at the moment.
Change the stored procedure so that it examines which arguments are null, and executes child procedures depending on which arguments it is passed. The problem here is that it would mean repeating myself a lot in the definition of the procs, and thus be harder to maintain.
What should I do? Or should I just keep on as I am currently doing? I have OPTION (RECOMPILE) set for the procedures, but I've heard that this doesn't work right in SQL Server 2005. Also, I plan to add more parameters to this proc, so I want to make sure whatever solution I have is fairly scalable.
The answer is to use dynamic SQL (be it in the client, or in an SP using sp_executesql), but the reasoning is long, so here's a link...
Dynamic Search Conditions in T-SQL
The very short version is that one size does not fit all, and since the optimiser creates one plan for one query, that single plan ends up slow for some parameter combinations. So the solution is to continue using parameterised queries (for execution plan caching), but to have many queries, one for each type of search that can happen.
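A rough sketch of that idea inside the procedure, appending only the filters that apply and keeping the query parameterised (the column names come from the question; the elided predicates are written as ... just as they are above):
DECLARE @sql nvarchar(max) = N'SELECT ... FROM ... WHERE 1 = 1';
IF @SearchActivity IS NOT NULL
SET @sql = @sql + N' AND Activities.ActivityID = @SearchActivity';
IF @SearchInLat IS NOT NULL AND @SearchInLng IS NOT NULL
SET @sql = @sql + N' AND ( ... )'; -- the location predicate elided in the question
IF @SearchOffers IS NOT NULL
SET @sql = @sql + N' AND ( ... )'; -- the offers predicate elided in the question
EXEC sp_executesql
@sql,
N'@SearchActivity int, @SearchInLat decimal(18,15), @SearchInLng decimal(18,15), @SearchOffers bit',
@SearchActivity = @SearchActivity,
@SearchInLat = @SearchInLat,
@SearchInLng = @SearchInLng,
@SearchOffers = @SearchOffers;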
Perhaps an alternative might be to perform several separate select statements?
e.g.
-- filter by activity --
if @SearchActivity is not null
insert into tmpTable (<columns>)
select *
from myTable
where (@SearchActivity = Activities.ActivityID)
-- filter by Location --
if @SearchInLat is not null and @SearchInLng is not null
insert into tmpTable (<columns>)
select *
from myTable
where (latCol = @SearchInLat AND lngCol = @SearchInLng)
etc...
then select from the temp table to return the final result set.
I'm not sure how this would work with respect to the optimiser and the query plans, but each individual select would be very straightforward and could utilise the indexes that you would have created on each column, which should make them very quick.
Depending on your requirements it also may make sense to create a primary key on the temp table to allow you to join to it on each select (to avoid duplicates).
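A minimal sketch of that temp table, assuming each result row can be identified by a single key column (names are illustrative):
CREATE TABLE #results
(
Id int NOT NULL PRIMARY KEY -- guarantees a row can only appear once
-- , <other columns to return>
);
-- each IF block above would then insert only rows that are not already present, e.g.
-- INSERT INTO #results (Id, ...) SELECT ... FROM myTable m
-- WHERE <filter> AND NOT EXISTS (SELECT 1 FROM #results r WHERE r.Id = m.Id);
SELECT * FROM #results;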
Look at the performance first, like others have said.
If possible, you can use IF clauses to simplify the queries based on what parameters are provided.
You could also use functions or views to encapsulate some of the code if you find you are repeating it often.

How to efficiently SELECT rows from database table based on selected set of values

I have a transaction table of 1 million rows. The table has a field named "Code" to keep the customer's ID. There are about 10,000 different customer codes.
I have a GUI that allows the user to render a report from the transaction table. The user may select an arbitrary number of customers for rendering.
I used the IN operator first, and it works for a few customers:
SELECT * FROM TRANS_TABLE WHERE CODE IN ('...', '...', '...')
I quickly ran into problems when selecting a few thousand customers; there is a limitation on using the IN operator.
An alternative is to create a temporary table with only one CODE field, insert the selected customer codes into it using INSERT statements, and then use
SELECT A.* FROM TRANS_TABLE A INNER JOIN TEMP B ON (A.CODE=B.CODE)
This works nicely for huge selections. However, there is performance overhead in creating the temporary table, running the INSERTs, and dropping the temporary table.
Are you aware of a better solution to handle this situation?
If you use SQL Server 2008, the fastest way to do this is usually with a Table-Valued Parameter (TVP):
CREATE TYPE CodeTable AS TABLE
(
Code int NOT NULL PRIMARY KEY
)
DECLARE @Codes AS CodeTable
INSERT @Codes (Code) VALUES (1)
INSERT @Codes (Code) VALUES (2)
INSERT @Codes (Code) VALUES (3)
-- Snip codes
SELECT t.*
FROM @Codes c
INNER JOIN Trans_Table t
ON t.Code = c.Code
Using ADO.NET, you can populate the TVP directly from your code, so you don't need to generate all those INSERT statements - just pass in a DataTable and ADO.NET will handle the rest. So you can write a Stored Procedure like this:
CREATE PROCEDURE GetTransactions
@Codes CodeTable READONLY
AS
SELECT t.*
FROM @Codes c
INNER JOIN Trans_Table t
ON t.Code = c.Code
... and just pass in the @Codes value as a parameter.
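From T-SQL, the call would look something like this (the codes are made up; from ADO.NET you would pass a DataTable as a parameter of SqlDbType.Structured instead of writing the INSERTs):
DECLARE @Codes AS CodeTable;
INSERT @Codes (Code) VALUES (1), (2), (3);
EXEC GetTransactions @Codes = @Codes;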
You can generate SQL such as
SELECT * FROM TRANS_TABLE WHERE CODE IN (?,?,?,?,?,?,?,?,?,?,?)
and re-use it in a loop until you've loaded all the IDs you need. The advantage is that if you only need a few IDs, your DB doesn't need to parse all those IN clauses. If needing many IDs is a rare case, then the performance hit may not matter. If you are not worried about the SQL parsing cache, then you can limit the size of the IN clause to the DB's actual limit, so that sometimes you don't need a loop and other times you do.
As you have to pass the IDs somehow, IN should be the fastest way.
MSDN mentions:
Including an extremely large number of values (many thousands) in an IN clause can consume resources and return errors 8623 or 8632. To work around this problem, store the items in the IN list in a table.
If you can still use IN and the query is too slow, you could try to adjust your indexes, for example by using a covering index for your query. Looking up random values by the clustered index can be slow because of the random disk I/O required. A covering index could reduce that problem.
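A sketch of such a covering index, assuming the report only needs a handful of columns from TRANS_TABLE (the included column names are placeholders):
CREATE NONCLUSTERED INDEX IX_TransTable_Code_Covering
ON TRANS_TABLE (CODE)
INCLUDE (TransDate, Amount); -- hypothetical columns; include the ones the report actually selects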
If you really do exceed the limit of IN and you create a temporary table, I don't expect the creation of the table to be a major problem, as long as you insert the values all at once (not with thousands of separate queries, of course). Choose the method with the least overhead, like one of those mentioned here:
http://blog.sqlauthority.com/2008/07/02/sql-server-2008-insert-multiple-records-using-one-insert-statement-use-of-row-constructor/
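For example, the row-constructor syntax from that article loads many codes in a single statement (the temp table and code values are illustrative; a single INSERT ... VALUES is limited to 1,000 row constructors):
CREATE TABLE #Codes (Code int NOT NULL PRIMARY KEY);
INSERT INTO #Codes (Code)
VALUES (101), (102), (103), (104), (105);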
Of course, if there is some static pattern in your IDs you could select by that (like in SPs or UDFs). If you get those thousands of IDs out of your database itself, instead of passing them back and forth, you could just store them or use a subquery...
Maybe you could pass the customer codes to a stored procedure comma separated and use the split sql function mentioned here: http://www.devx.com/tips/Tip/20009.
Then declare a table variable, insert the split values into it, and use an IN clause.
CREATE PROCEDURE prc_dosomething (
@CustomerCodes varchar(MAX)
)
AS
DECLARE @customercodetable table(code varchar(10)) -- or whatever length you require.
-- a table variable cannot be assigned with SET; populate it from the split function instead
INSERT INTO @customercodetable (code)
SELECT value FROM dbo.UTILfn_Split(@CustomerCodes) -- see the article above for the split function; its output column name depends on its definition.
-- do some magic stuff here :).
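Calling it would then look something like this (the codes are made up):
EXEC prc_dosomething @CustomerCodes = '1001,1002,1003';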