Stored procedure SQL optimization - sql

I'm running a query and it's taking a long time. Here is my sample code:
SELECT @AppleCount = COUNT(*)
FROM (
    SELECT * FROM @iDToStoreMapping sm
    WHERE StoreFront = 73
    AND sm.CategoryCountryCategoryTYpeMappingID NOT IN
        (SELECT * FROM @FinishedDls)
) rows
@AppleCount is supposed to be all the CategoryCountryCategoryTypeMappingIDs in existence, and @FinishedDls holds an ID once an application has finished its download and written that ID there, so this query is supposed to count the IDs that haven't been downloaded yet. There are about 50k IDs, and I have to run this query 3 times, but each run takes a couple of minutes. Is there anything I'm doing wrong?

Sometimes using an explicit join instead of NOT IN results in better performance:
SELECT @AppleCount = COUNT(*)
FROM @iDToStoreMapping sm LEFT OUTER JOIN
     @FinishedDls fd
     ON sm.CategoryCountryCategoryTYpeMappingID = fd.id
WHERE StoreFront = 73 AND
      fd.id IS NULL;

I didn't use a primary key on my table variables, and that is what caused the terrible performance. Sorry, everyone.
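For anyone who wants to try the variants from this thread side by side, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table and column names (id_to_store_mapping, finished_dls, mapping_id) are simplified, hypothetical ones. It only shows that NOT IN, LEFT JOIN ... IS NULL and NOT EXISTS return the same count when both tables are keyed on the compared column; it says nothing about how SQL Server would time them.

```python
import sqlite3

# SQLite stands in for SQL Server; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE id_to_store_mapping (
        mapping_id  INTEGER PRIMARY KEY,  -- key on the compared column
        store_front INTEGER NOT NULL
    );
    CREATE TABLE finished_dls (
        id INTEGER PRIMARY KEY            -- key on the lookup column
    );
""")
# 1000 mapping ids; the even ones belong to store front 73.
conn.executemany(
    "INSERT INTO id_to_store_mapping VALUES (?, ?)",
    [(i, 73 if i % 2 == 0 else 10) for i in range(1, 1001)],
)
# Every fourth id has finished downloading.
conn.executemany(
    "INSERT INTO finished_dls VALUES (?)",
    [(i,) for i in range(4, 1001, 4)],
)

queries = {
    "not_in": """
        SELECT COUNT(*) FROM id_to_store_mapping sm
        WHERE sm.store_front = 73
          AND sm.mapping_id NOT IN (SELECT id FROM finished_dls)
    """,
    "left_join": """
        SELECT COUNT(*) FROM id_to_store_mapping sm
        LEFT OUTER JOIN finished_dls fd ON sm.mapping_id = fd.id
        WHERE sm.store_front = 73 AND fd.id IS NULL
    """,
    "not_exists": """
        SELECT COUNT(*) FROM id_to_store_mapping sm
        WHERE sm.store_front = 73
          AND NOT EXISTS (SELECT 1 FROM finished_dls fd WHERE fd.id = sm.mapping_id)
    """,
}
# Run each variant and collect the single count it returns.
counts = {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}
print(counts)
```

On a real SQL Server instance the comparison that matters is the execution plan with and without the primary keys, which this sketch does not reproduce.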

Related

SQL - LEFT JOIN takes an extremely long time to execute

I'm trying to see if there are any rows in table A which I've missed in table B.
For this, I'm using the following query:
SELECT t1.cusa
FROM patch t1
LEFT JOIN trophy t2
ON t2.titleid = t1.titleid
WHERE t2.titleid IS NULL
And the query worked before, but now that the trophy table has nearly 200,000 rows, it's extremely slow. I waited 5 minutes for it to execute, but it was still loading and eventually timed out.
Is there any way to speed this query up?
Adding indexes on titleid in both tables (but especially t2) is the quickest way to get better performance. 200K records is nothing for SQL Server.
Try this and it might perform a bit better!
SELECT t1.cusa
FROM patch t1
WHERE NOT EXISTS (SELECT 1
FROM trophy t2
WHERE t2.titleid = t1.titleid );

Long SQL subquery trouble

I just registered and want to ask a question.
I haven't been learning SQL for long, and I ran into trouble when I decided to move a table to another database. I read a few articles about building long subqueries, but they didn't help.
Everything worked perfectly before I made that change.
I just moved the table and have spent the whole day trying to rewrite the query.
update [dbo].Full
set [salary] = 1000
where [dbo].Full.id in (
select distinct k1.id
from (
select id, Topic, User
from Full
where User not in (select distinct topic_name from [DB_1].dbo.S_School)
) k1
where k1.id not in (
select distinct k2.id
from (
select id, Topic, User
from Full
where User not in (select distinct topic_name from [DB_1].dbo.Shool)
) k2,
List_School t3
where charindex (t3.NameApp, k2.Topic)>5
)
)
I moved the table List_School to database [DB_1] and I can't get the query to work with it.
I can't write [DB_1].dbo.List_School. Should I use one more subquery?
I even thought about creating a few temporary tables, but that could affect execution speed.
SQL gurus, please spare some of your time for me. Thank you in advance.
I will be grateful for any hint you can give me.
There appear to be a number of issues. You are comparing the User column to the topic_name column. Going by what those column names would normally mean, you are probably not comparing the correct columns, but that is a guess.
In the final subquery you have an old-style (comma) join on table List_School with no join columns, which means the join with k2 is a cartesian product (aka cross join), which is not what you would want in most situations. Again, this is a guess, as no details of the actual problem data or error messages were provided.
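The cartesian-product point can be checked on toy data. This is a rough sketch only: it runs on SQLite through Python's sqlite3 module rather than SQL Server, the rows are invented for illustration, and SQLite's instr(string, substring) stands in for CHARINDEX(substring, string), so the argument order is reversed.

```python
import sqlite3

# SQLite stands in for SQL Server; the rows below are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE k2 (id INTEGER PRIMARY KEY, topic TEXT);
    CREATE TABLE list_school (name_app TEXT);
""")
conn.executemany("INSERT INTO k2 VALUES (?, ?)", [
    (1, "Visit Lincoln School"),
    (2, "Weather report"),
    (3, "Trip to Oak Academy"),
])
conn.executemany("INSERT INTO list_school VALUES (?)",
                 [("Lincoln",), ("Oak",), ("Maple",)])

# A comma join with no join condition pairs every k2 row with every
# list_school row: 3 x 3 combinations.
cross_rows = conn.execute("SELECT COUNT(*) FROM k2, list_school").fetchone()[0]

# The substring test is then evaluated against every one of those pairs.
matched_ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT k2.id
    FROM k2, list_school t3
    WHERE instr(k2.topic, t3.name_app) > 5
    ORDER BY k2.id
""")]
print(cross_rows, matched_ids)
```

With two small tables this is harmless, but the number of pairs the filter has to examine grows as the product of the two row counts.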

Can this SQL Query be optimized to run faster?

I have an SQL Query (For SQL Server 2008 R2) that takes a very long time to complete. I was wondering if there was a better way of doing it?
SELECT @count = COUNT(Name)
FROM Table1 t
WHERE t.Name = @name AND t.Code NOT IN (SELECT Code FROM ExcludedCodes)
Table1 has around 90 million rows in it and is indexed by Name and Code.
ExcludedCodes only has around 30 rows in it.
This query is in a stored procedure and gets called around 40k times; the total time it takes the procedure to finish is 27 minutes. I believe this is my biggest bottleneck because of the massive number of rows it queries against and the number of times it does so.
So if you know of a good way to optimize this, it would be greatly appreciated! If it cannot be optimized, then I guess I'm stuck with 27 minutes...
EDIT
I changed the NOT IN to NOT EXISTS and it cut the time down to 10:59, so that alone is a massive gain on my part. I am still going to attempt the GROUP BY statement as suggested below, but that will require a complete rewrite of the stored procedure and might take some time... (as I said before, I'm not the best at SQL, but it is starting to grow on me. ^^)
In addition to workarounds to get the query itself to respond faster, have you considered maintaining a column in the table that tells whether it is in this set or not? It requires a lot of maintenance but if the ExcludedCodes table does not change often, it might be better to do that maintenance. For example you could add a BIT column:
ALTER TABLE dbo.Table1 ADD IsExcluded BIT;
Make it NOT NULL and default to 0. Then you could create a filtered index:
CREATE INDEX n ON dbo.Table1(name)
WHERE IsExcluded = 0;
Now you just have to update the table once:
UPDATE t
SET IsExcluded = 1
FROM dbo.Table1 AS t
INNER JOIN dbo.ExcludedCodes AS x
ON t.Code = x.Code;
And ongoing you'd have to maintain this with triggers on both tables. With this in place, your query becomes:
SELECT @Count = COUNT(Name)
FROM dbo.Table1 WHERE Name = @name AND IsExcluded = 0;
EDIT
As for "NOT IN being slower than LEFT JOIN" here is a simple test I performed on only a few thousand rows:
EDIT 2
I'm not sure why this query wouldn't do what you're after, and be far more efficient than your 40K loop:
SELECT src.Name, COUNT(*)
FROM dbo.Table1 AS src
INNER JOIN #temptable AS t
ON src.Name = t.Name
WHERE src.Code NOT IN (SELECT Code FROM dbo.ExcludedCodes)
GROUP BY src.Name;
Or the LEFT JOIN equivalent:
SELECT src.Name, COUNT(*)
FROM dbo.Table1 AS src
INNER JOIN #temptable AS t
ON src.Name = t.Name
LEFT OUTER JOIN dbo.ExcludedCodes AS x
ON src.Code = x.Code
WHERE x.Code IS NULL
GROUP BY src.Name;
I would put money on either of those queries taking less than 27 minutes. I would even suggest that running both queries sequentially will be far faster than your one query that takes 27 minutes.
Finally, you might consider an indexed view. I don't know your table structure or whether you violate any of the restrictions, but it is worth investigating IMHO.
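To make the set-based suggestion concrete, here is a small sketch that runs the per-name query in a loop and then the single grouped query, and compares the results. It uses Python's sqlite3 module as a stand-in for SQL Server, and the table names (table1, excluded_codes, wanted_names) and rows are invented for illustration, so treat it as a check of the logic rather than a benchmark.

```python
import sqlite3

# SQLite stands in for SQL Server; names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name TEXT NOT NULL, code INTEGER NOT NULL);
    CREATE INDEX ix_table1_name_code ON table1 (name, code);
    CREATE TABLE excluded_codes (code INTEGER PRIMARY KEY);
    CREATE TABLE wanted_names (name TEXT PRIMARY KEY);
""")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [
    ("a", 1), ("a", 2), ("a", 99),
    ("b", 1), ("b", 98),
    ("c", 99),
    ("d", 5),
])
conn.executemany("INSERT INTO excluded_codes VALUES (?)", [(98,), (99,)])
conn.executemany("INSERT INTO wanted_names VALUES (?)", [("a",), ("b",), ("c",)])

# Row-by-row approach: one query per name, as in the 40K-call procedure.
per_name_sql = """
    SELECT COUNT(name) FROM table1 t
    WHERE t.name = ?
      AND NOT EXISTS (SELECT 1 FROM excluded_codes e WHERE e.code = t.code)
"""
names = [row[0] for row in conn.execute("SELECT name FROM wanted_names ORDER BY name")]
looped = {n: conn.execute(per_name_sql, (n,)).fetchone()[0] for n in names}

# Set-based approach: one grouped query over all wanted names.
grouped = dict(conn.execute("""
    SELECT src.name, COUNT(*)
    FROM table1 AS src
    INNER JOIN wanted_names AS w ON src.name = w.name
    LEFT OUTER JOIN excluded_codes AS x ON src.code = x.code
    WHERE x.code IS NULL
    GROUP BY src.name
""").fetchall())
print(looped, grouped)
```

One difference to keep in mind: a name whose rows are all excluded gets a count of 0 from the loop but is simply absent from the grouped result, so the calling code may need to treat a missing name as zero.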
You say this gets called around 40K times. Why? Is it in a cursor? If so, do you really need a cursor? Couldn't you put the values you want for @name in a temp table, index it, and then join to it?
select t.name, count(t.name)
from Table1 t
join #name n on t.name = n.name
where NOT EXISTS (SELECT Code FROM ExcludedCodes WHERE Code = t.code)
group by t.name
That might get you all your results in one query and is almost certainly faster than 40K separate queries. Of course, if you need the count of all the names, it's even simpler:
select t.name, count(t.name)
from Table1 t
where NOT EXISTS (SELECT Code FROM ExcludedCodes WHERE Code = t.code)
group by t.name
NOT EXISTS typically performs better than NOT IN, but you should test it on your system.
SELECT @count = COUNT(Name)
FROM Table1 t
WHERE t.Name = @name AND NOT EXISTS (SELECT 1 FROM ExcludedCodes e WHERE e.Code = t.Code)
Without knowing more about your query it's tough to supply concrete optimization suggestions (i.e. code suitable for copy/paste). Does it really need to run 40,000 times? Sounds like your stored procedure needs reworking, if that's feasible. You could exec the above once at the start of the proc and insert the results in a temp table, which can keep the indexes from Table1, and then join on that instead of running this query.
This particular bit might not even be the bottleneck that makes your query run 27 minutes. For example, are you using a cursor over those 90 million rows, or scalar valued UDFs in your WHERE clauses?
Have you thought about doing the query once and populating the data in a table variable or temp table? Something like:
insert into #temp (name, Namecount)
select Name, Count(name)
from table1
where code not in (select code from excludedcodes)
group by name
And don't forget that you could possibly use a filtered index as long as the excluded codes table is somewhat static.
Start evaluating the execution plan. Which is the heaviest part to compute?
Regarding the relation between the two tables, use a JOIN on indexed columns: indexes will optimize query execution.

Why is an IN statement with a list of items faster than an IN statement with a subquery?

I'm having the following situation:
I've got a quite complex view from which I've to select a couple of records.
SELECT * FROM VW_Test INNER JOIN TBL_Test ON VW_Test.id = TBL_Test.id
WHERE VW_Test.id IN (1000,1001,1002,1003,1004,[etc])
This returns a result practically instantly (currently with 25 items in that IN statement). However when I use the following query it slows down really fast.
SELECT * FROM VW_Test INNER JOIN TBL_Test ON VW_Test.id = TBL_Test.id
WHERE VW_Test.id IN (SELECT id FROM TBL_Test)
With 25 records in the TBL_Test this query takes about 5 seconds. I've got an index on that id in the TBL_Test.
Anyone got an idea why this happens and how to get performance up?
EDIT: I forgot to mention that this subquery
SELECT id FROM TBL_Test
returns a result instantly as well.
Well, when using a subquery the database engine will first have to generate the results for the subquery before it can do anything else, which takes time. If you have a predefined list, this will not need to happen and the engine can simply use those values 'as is'. At least, this is how I understand it.
How to improve performance: do away with the subquery. I don't think you even need the IN clause in this case. The INNER JOIN should suffice.
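As a quick sanity check of the claim that the IN clause is redundant here, the sketch below runs both forms and compares the rows. It uses Python's sqlite3 module in place of the asker's database, and the view definition and data (base_data, payload) are hypothetical stand-ins for the real "quite complex" view.

```python
import sqlite3

# SQLite stands in for the real database; base_data and its rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base_data (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE VIEW vw_test AS SELECT id, payload FROM base_data;
    CREATE TABLE tbl_test (id INTEGER PRIMARY KEY);
""")
conn.executemany("INSERT INTO base_data VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(1, 11)])
conn.executemany("INSERT INTO tbl_test VALUES (?)", [(3,), (5,), (7,)])

# Original shape: join plus an IN subquery over the same table.
with_in = [row[0] for row in conn.execute("""
    SELECT vw_test.id
    FROM vw_test INNER JOIN tbl_test ON vw_test.id = tbl_test.id
    WHERE vw_test.id IN (SELECT id FROM tbl_test)
    ORDER BY vw_test.id
""")]

# Suggested shape: the join alone already restricts ids to those in tbl_test.
join_only = [row[0] for row in conn.execute("""
    SELECT vw_test.id
    FROM vw_test INNER JOIN tbl_test ON vw_test.id = tbl_test.id
    ORDER BY vw_test.id
""")]
print(with_in, join_only)
```

Because the join condition already requires vw_test.id to match a row in tbl_test, the IN filter cannot remove anything further, so dropping it changes the plan but not the result.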

SQL queries with views and subqueries

select nid, avg, std from sView1
where sid = 4891
and nid in (select distinct nid from tblref where rid = 799)
and oid in (select distinct oid from tblref where rid = 799)
and anscount > 3
This is a query I'm currently trying to run. And running it like this takes about 3-4 seconds. However, if I replace the "4891" value with a subquery saying (select distinct sid from tblref where rid = 799) the procedure just hangs, even though the subquery only returns one sid.
The query is supposed to return a dataset with averages (avg) and standard deviations (std) over a resultset which is calculated through nested views in sView1. This dataset is then run through another view to get some top-level averages and stdevs.
The averages may need to include more than 1 sid (sid identifies a dataset).
It's difficult describing it more without revealing codebase and codestructure that shouldn't be revealed ;)
Can anyone suggest why the query hangs when trying to use the subquery? (The code is rebuilt from originally using nested cursors, since I have been told that cursors are the work of the devil, and nested cursors may make me sterile)
Try this. EXISTS returns as soon as it finds a matching row, whereas SELECT DISTINCT requires going through the dataset and possibly sorting it to remove the duplicates.
SELECT nid,avg,std from sView1 AS SV
WHERE EXISTS (SELECT * FROM TblRef AS TR WHERE sv.sid = Tr.sid AND Sv.nid = tr.nid AND sv.oid = tr.oid AND tr.rid = 799)
AND ansCount>3
Also, it is pretty difficult to provide a meaningful answer without access to query plans and table structures. So DDL and sample data will definitely help.