Large query increases TempDB - sql-server-2005

I have a huge query on my SQL Server 2005 instance. It has to run once every day, but while it runs, tempdb grows from 2 GB to 48 GB. What is the best way to optimize it, or to find out why tempdb grows so much, when the query only adds/updates about 80K records (~120 columns) in a single table?
What should I change in this query so that tempdb doesn't grow so much?
Any suggestions will be appreciated.
NOTE:
This query doesn't use any temp tables, table variables, or CTEs -- just a bunch of INSERT INTO ... statements with multi-table JOINs and subqueries.

You may want to look at this. It's likely that your query is using a temp-table to run, but it's very hard to tell without knowing anything about it.
Looking at your question update, it seems probable your subqueries are using a worktable (temp table) strategy, which floods your TempDB. Getting rid of those subqueries, and/or reducing the amount of data you work with in a single run, should help curb the growth of your TempDB.
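If you want to confirm where the space is going before rewriting anything, SQL Server 2005 exposes per-session tempdb allocation through a DMV. A minimal sketch, assuming you can query the server while the nightly INSERT is running (internal objects are the worktables and spools created for sorts, hash joins and spooled subqueries):

SELECT  session_id,
        internal_objects_alloc_page_count * 8 / 1024 AS internal_objects_mb,  -- pages are 8 KB
        user_objects_alloc_page_count     * 8 / 1024 AS user_objects_mb
FROM    sys.dm_db_session_space_usage
ORDER BY internal_objects_alloc_page_count DESC;

If the internal-object column dominates for the session running the job, the spools and sorts from the subqueries are the culprit, which fits the advice above.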

Without seeing the exact code it is hard to help you, but the query clearly needs to be optimized.
Of course, you could also just size tempdb to stay at 48 GB; at least that way it won't have to take the time to grow every time this thing runs.
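If you do go the pre-sizing route, a minimal sketch (the logical file name tempdev is the default; check sys.master_files on your instance, and note that tempdb is re-created at its configured size on every restart):

-- Pre-size the tempdb data file so the nightly job never has to wait on autogrowth.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 48GB, FILEGROWTH = 512MB);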


SAS: Approximating runtime/processing time for a query without running

I was wondering if there is a way to estimate the total run time for a query without actually running it.
I have found that particular queries can take hours, and it would come in handy to know the approximate completion time; there have been times when I was stuck at work waiting for a query to finish.
Sorry if this is a silly question -- I'm a bit new to SAS.
Thanks guys.
This comes down to how well you know the data you're working with. There is no simple method of estimating this that is guaranteed to work in all situations, as there are so many factors that contribute to query performance. That said, there are a few heuristics you can use:
If you're reading every row from a large table, try reading a small proportion of it first before scaling up to get an idea of how much that read will contribute to the total query execution time.
Try running your query with proc sql inobs=100 _method; to find out what sorts of join methods the query planner is selecting (see the sketch after this list). If there are any Cartesian joins (sqxjsl in the log output), your query is going to take at least O(m*n) to run, where m and n are the numbers of rows in the tables being joined.
Check whether there are any indexes on the tables that could potentially speed up your query.
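A minimal sketch of the second suggestion above; the table and column names are placeholders:

/* INOBS= caps how many rows PROC SQL reads from each source table;
   _METHOD writes the planner's chosen access/join methods to the log. */
proc sql inobs=100 _method;
  select a.id, b.value
  from   big_table a, other_table b
  where  a.id = b.id;
quit;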

SQL Performance - Views vs Stored Procedure

I have a statement that takes around 15 seconds to run, which is way too long. I would like to know the best way to cache this data in memory. Would I use some kind of view or stored procedure for this? I'm aware I can use triggers and another table, but I would like to avoid that at all costs; there is quite a bit of memory to spare.
Any suggestions?
You could check out indexed views (usually called materialized views in other RDBMS).
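A minimal sketch of what an indexed view looks like; the table and column names are made up, and the real rules are stricter (SCHEMABINDING, two-part names, deterministic expressions, COUNT_BIG(*) with GROUP BY, non-nullable columns under SUM, and so on):

CREATE VIEW dbo.vSalesByProduct
WITH SCHEMABINDING
AS
SELECT  ProductID,
        SUM(LineTotal) AS TotalSales,   -- assumes LineTotal is declared NOT NULL
        COUNT_BIG(*)   AS RowCnt        -- required when the view uses GROUP BY
FROM    dbo.SalesDetail
GROUP BY ProductID;
GO
-- The unique clustered index is what actually materializes and maintains the result set.
CREATE UNIQUE CLUSTERED INDEX IX_vSalesByProduct ON dbo.vSalesByProduct (ProductID);

On Enterprise Edition the optimizer can substitute the view automatically; on other editions you generally have to reference it with the NOEXPAND hint.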
Do you know why your query is taking 15 seconds to run? Is the query working off the correct indexes? As others have mentioned, running the same query within a stored procedure is going to produce the same performance as the execution plan will be the same.
You might get better mileage out of using the SQL Query Optimizer and optimizing out the bottlenecks in your query. This is a good article on using the SQL Query Optimizer.
It all depends on your situation. Make sure to check your execution plan and try to avoid too many scans; you will get better performance. I hope this helps.

How to improve query performance

I have a lot of records in a table. When I execute the following query it takes a lot of time. How can I improve the performance?
SET ROWCOUNT 10
SELECT StxnID
,Sprovider.description as SProvider
,txnID
,Request
,Raw
,Status
,txnBal
,Stxn.CreatedBy
,Stxn.CreatedOn
,Stxn.ModifiedBy
,Stxn.ModifiedOn
,Stxn.isDeleted
FROM Stxn,Sprovider
WHERE Stxn.SproviderID = SProvider.Sproviderid
AND Stxn.SProviderid = ISNULL(@pSProviderID,Stxn.SProviderid)
AND Stxn.status = ISNULL(@pStatus,Stxn.status)
AND Stxn.CreatedOn BETWEEN ISNULL(@pStartDate,getdate()-1) and ISNULL(@pEndDate,getdate())
AND Stxn.CreatedBy = ISNULL(@pSellerId,Stxn.CreatedBy)
ORDER BY StxnID DESC
The Stxn table has more than 100,000 records.
The query is run from a report viewer in ASP.NET (C#).
This is my go-to article when I'm trying to do a search query that has several search conditions which might be optional.
http://www.sommarskog.se/dyn-search-2008.html
The biggest problem with your query is the column = ISNULL(@param, column) syntax. MSSQL won't use an index for that. Consider changing each of those predicates to (@param IS NULL OR column = @param).
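A minimal sketch of the rewritten filters, keeping the SELECT list, FROM clause and date range as they are (OPTION (RECOMPILE) helps the optimizer plan for the actual parameter values, although on SQL Server 2005 the dynamic SQL approach from the linked article is often the better fit):

-- SELECT list and FROM clause unchanged
WHERE Stxn.SproviderID = SProvider.Sproviderid
  AND (@pSProviderID IS NULL OR Stxn.SProviderid = @pSProviderID)
  AND (@pStatus      IS NULL OR Stxn.status      = @pStatus)
  AND Stxn.CreatedOn BETWEEN ISNULL(@pStartDate, GETDATE() - 1) AND ISNULL(@pEndDate, GETDATE())
  AND (@pSellerId    IS NULL OR Stxn.CreatedBy   = @pSellerId)
ORDER BY Stxn.StxnID DESC
OPTION (RECOMPILE);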
You should look at the execution plan and check for missing indexes. Also, how long does it take to execute? What counts as slow for you?
Maybe you could also return fewer rows, but that is just a guess. Really, we need to see your tables and indexes plus the execution plan.
Check sql-tuning-tutorial
For one, use SELECT TOP (10) instead of SET ROWCOUNT - the optimizer will have a much better chance that way. Another suggestion is to use a proper INNER JOIN instead of the old-style table,table join syntax, which makes it much easier to accidentally end up with a cartesian product (that's not what is happening here, but the mistake is far easier to make with the old syntax). Should be:
...
FROM Stxn INNER JOIN Sprovider
ON Stxn.SproviderID = SProvider.Sproviderid
...
And if you think 100K rows is a lot, or that this volume is a reason for slowness, you're sorely mistaken. Most likely you have really poor indexing strategies in place, possibly some parameter sniffing, possibly some implicit conversions... it's hard to tell without knowing the data types and indexes and seeing the plan.
There are a lot of things that could impact the performance of a query, although 100k records really isn't all that many.
Items to consider (in no particular order)
Hardware:
Is SQL Server memory constrained? In other words, does it have enough RAM to do its job? If it is swapping memory to disk, then this is a sure sign that you need an upgrade.
Is the machine disk constrained? In other words, are the drives fast enough to keep up with the queries you need to run? If the server is memory constrained, disk speed becomes an even larger factor.
Is the machine processor constrained? For example, when you execute the query does the processor spike for long periods of time? Or, are there already lots of other queries running that are taking resources away from yours...
Database Structure:
Do you have indexes on the columns used in your WHERE clause? If the tables do not have indexes, the engine will have to do a full scan of both tables to determine which records match.
Eliminate the ISNULL function calls. If this is a direct query, have the calling code validate the parameters and set default values before executing. If it is in a stored procedure, do the checks at the top of the procedure. Unless you execute with a RECOMPILE hint so the actual parameter values can be sniffed, those functions will have to be evaluated for each row.
Network:
Is the network slow between you and the server? Depending on the columns pulled (I'm not sure what is stored in the "Raw" column), you could be sending GBs of data across the wire. The first question you need to ask here is: how much data is going back to the client? For example, if each record is 1 MB+ in size, then you'll probably have disk and network constraints at play.
General:
I'm not sure what "slow" means in your question. Does it mean that the query is taking around 1 second to process or does it mean it's taking 5 minutes? Everything is relative here.
Basically, it is going to be impossible to give a hard answer without a lot more detail from you. All of these factors will bear out if you profile the query, understand what and how much data is going back to the client, and watch the interactions among the various parts.
Finally depending on the amount of data going back to the client there might not be a way to improve performance short of hardware changes.
Make sure Stxn.SproviderID, Stxn.status, Stxn.CreatedOn, Stxn.CreatedBy, Stxn.StxnID and SProvider.Sproviderid all have indexes defined.
(NB -- you might not need all, but it can't hurt.)
I don't see much that can be done on the query itself, but I can see things being done on the schema:
Create an index / PK on Stxn.SproviderID
Create an index / PK on SProvider.Sproviderid
Create indexes on status, CreatedOn, CreatedBy, StxnID
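A minimal sketch of those statements; the index and constraint names are made up, and you should check what already exists before adding any of them:

-- If they aren't already primary keys:
ALTER TABLE Sprovider ADD CONSTRAINT PK_Sprovider PRIMARY KEY (Sproviderid);
ALTER TABLE Stxn      ADD CONSTRAINT PK_Stxn      PRIMARY KEY (StxnID);

-- Supporting indexes for the filters and the join:
CREATE INDEX IX_Stxn_SproviderID ON Stxn (SproviderID);
CREATE INDEX IX_Stxn_CreatedOn   ON Stxn (CreatedOn);
CREATE INDEX IX_Stxn_CreatedBy   ON Stxn (CreatedBy);
CREATE INDEX IX_Stxn_Status      ON Stxn (status);

A single composite index led by CreatedOn (the only predicate that always filters) may serve this particular query better than four separate single-column indexes; the execution plan will tell you.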
Something to consider: When ROWCOUNT or TOP are used with an ORDER BY clause, the entire result set is created and sorted first and then the top 10 results are returned.
How does this run without the Order By clause?

Is it quicker to insert sorted data into a Sybase table?

A table in Sybase has a unique varchar(32) column, and a few other columns. It is indexed on this column too.
At regular intervals, I need to truncate it, and repopulate it with fresh data from other tables.
insert into MyTable
select list_of_columns
from OtherTable
where some_simple_conditions
order by MyUniqueId
If we are dealing with a few thousand rows, would adding the ORDER BY clause to the SELECT help speed up the insert? If so, would the time gained compensate for the extra time needed to sort the SELECT's result?
I could try this out, but currently my data set is small and the results don’t say much.
With only a few thousand rows, you're not likely to see much difference even if it is a little faster. If you anticipate approaching 10,000 rows or so, that's when you'll probably start seeing a noticeable difference -- try creating a large test data set and doing a benchmark to see if it helps.
Since you're truncating, though, deleting and recreating the index should be faster than inserting into a table with an existing index. Again, for a relatively small table, it shouldn't matter -- if everything can fit comfortably in the amount of RAM you have available, then it's going to be pretty quick.
One other thought -- depending on how Sybase does its indexing, passing a sorted list could slow it down. Try benchmarking against an ORDER BY RANDOM() to see if this is the case.
I don't believe insertion order speeds up an INSERT, so don't add an ORDER BY in a vain attempt to improve performance.
I'd say that it doesn't really matter in which order you do these steps.
Just use a normal INSERT INTO, and do the rest afterwards.
I can't speak for Sybase, but MS SQL Server inserts faster if records are sorted carefully. Sorting can minimize the number of index page splits. As you know, it is better to populate the table and then create the index; sorting the data before insertion has a similar effect.
The order in which you insert data will generally not improve performance. The issues that affect insert speed have more to do with your database's mechanisms for data storage than with the order of inserts.
One performance problem you may experience when inserting a lot of data into a table is the time it takes to update the indexes on the table. However, again, the order in which you insert the data will not help you here.
If you have a lot of data and by a lot I mean hundreds of thousands perhaps millions of records you could consider dropping the indexes on the table, inserting the records then recreating the indexes.
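A minimal sketch of that pattern, reusing the placeholders from the question; the index name is hypothetical, and the DROP INDEX table.index form shown is the Sybase/older T-SQL syntax:

-- Drop the index, reload, then rebuild the index once over the full data set.
DROP INDEX MyTable.UQ_MyTable_MyUniqueId;

TRUNCATE TABLE MyTable;

INSERT INTO MyTable
SELECT list_of_columns
FROM   OtherTable
WHERE  some_simple_conditions;

CREATE UNIQUE INDEX UQ_MyTable_MyUniqueId ON MyTable (MyUniqueId);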
Dropping and recreating indexes (at least in SQL server) is by far the best way to do the inserts. At least some of the time ;-) Seriously though, if you aren't noticing any major performance problems, don't mess with it.

Temp tables and SQL SELECT performance

Why does the use of temp tables with a SELECT statement improve the logical I/O count? Wouldn't it increase the number of hits to the database instead of decreasing it? Is this because the 'problem' is broken down into sections? I'd like to know what's going on behind the scenes.
There's no general answer. It depends on how the temp table is being used.
The temp table may reduce IO by caching rows created after a complex filter/join that are used multiple times later in the batch. This way, the DB can avoid hitting the base tables multiple times when only a subset of the records are needed.
The temp table may increase IO by storing records that are never used later in the query, or by taking up a lot of space in the engine's cache that could have been better used by other data.
Creating a temp table to use all of its contents once is slower than including the temp's query in the main query because the query optimizer can't see past the temp table and it forces a (probably) unnecessary spool of the data instead of allowing it to stream from the source tables.
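For the first case, a minimal sketch with made-up tables and columns: the expensive aggregate is computed once into a temp table and then read twice, instead of hitting the base table twice.

-- Compute the expensive intermediate result once...
SELECT  o.CustomerID,
        SUM(o.Total) AS Total
INTO    #recent
FROM    dbo.Orders o
WHERE   o.OrderDate >= DATEADD(day, -30, GETDATE())
GROUP BY o.CustomerID;

-- ...then both of these read the small cached set instead of rescanning dbo.Orders.
SELECT CustomerID FROM #recent WHERE Total > 1000;
SELECT AVG(Total) AS AvgTotal FROM #recent;

DROP TABLE #recent;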
I'm going to assume by temp tables you mean a sub-select in a WHERE clause. (This is referred to as a semijoin operation and you can usually see that in the text execution plan for your query.)
When the query optimizer encounters a sub-select/temp table, it makes some assumptions about what to do with that data. Essentially, the optimizer will create an execution plan that performs a join on the sub-select's result set, reducing the number of rows that need to be read from the other tables. Since there are fewer rows, the query engine is able to read fewer pages from disk/memory, reducing the amount of I/O required.
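For reference, this is the shape being described, with made-up names; the IN sub-select typically shows up as a semijoin in the plan:

SELECT c.CustomerID, c.Name
FROM   dbo.Customers c
WHERE  c.CustomerID IN (SELECT o.CustomerID
                        FROM   dbo.Orders o
                        WHERE  o.OrderDate >= '20240101');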
AFAIK, at least with MySQL, temp tables are kept in RAM, making SELECTs against them much faster than anything that hits the disk.
There is a class of problems where building the result in a collection structure on the database side is much preferable to returning the result's parts to the client and round-tripping for each part.
For example: arbitrary depth recursive relationships (boss of)
There's another class of query problems where the data is not, and will not be, indexed in a manner that makes the query run efficiently. Pulling the results into a collection structure, which can be indexed in a custom way, will reduce the logical I/O for these queries.