I wrote an application that performs around 40 queries and then does some processing on the results of each query. (Right now I'm using Qt 3.2.2 in Visual C++ 6.0 on Windows XP with SQL Server 2005, but that's not required.) The paradigm is to create a QSqlQuery object with the query (this causes the query to be performed) and then loop:
while (query.next()) { operate(query.value(0)); }
By profiling I find that the query.next() call is taking up half the running time of the program, which seems excessive for simply fetching a row of data (6 or 7 fields).
This performance is unacceptable, and I'm looking for a way to improve this. I'm open to changing anything -- switching my compiler, switching languages, switching the paradigm I use to get data from the database. How can I speed this up?
Here's the query:
select rtrim(Portfolio.securityid), rtrim(type), rtrim(coordinate), rtrim(value)
from MarketData
inner join portfolio
on cast(MarketData.securityid as varchar(36))=portfolio.securityid
where Portfolioname=?
and type in
('bond_profit', 'bondoption_profit', 'equity_profit', 'equityoption_profit')
and marketdate=?
order by Portfolio.securityid, type, coordinate
CPU usage is around 40% while the program is running, so I suspect it's spending the majority of its time waiting for the .next() call to return with more data.
Performing the same query in SSMS returns 4.5 million rows in about 5 minutes, but the total time spent waiting on .next() during the run of the program is 30 minutes.
My task is to optimize a pretty heavy query (~10,000 rows). I would like to use multithreading so that each thread processes and returns a specific range of data. For example, I create 3 threads:
1st thread processes and returns first 100 rows,
2nd - next 100 rows,
3rd - next 100 rows
When a thread has finished its work, it takes the next 100 rows, and so on until there is no more data to be returned.
I've read about the TPL, but it has only been native functionality since .NET 4.0, and my project is based on 3.5. I also read about the Reactive Extensions library, which provides TPL functionality for .NET 3.5, but I was unable to get it working.
It boils down to this: how do I break the query down into pieces that can be executed by a number of threads (possibly in a loop)?
P.S. I prefer LINQ, but a plain SQL script is acceptable as well.
So after some tinkering I found a pretty basic way to achieve multithreaded query processing without the TPL on .NET Framework 3.5.
My approach:
Get the total row count of the table
Batch size = row count / thread count
Create the threads so that each of them gets a specific row subset based on the batch size. The paging syntax differs between SQL Server versions before 2012 and 2012+; see the T-SQL sketch after this list.
(example: table has 300 rows, we use 3 threads, each thread would return a batch of 100 rows)
Start all the threads and wait for them to complete (I used a flag)
Dispose of the threads
Don't forget to add "MultipleActiveResultSets=True" (MARS) when writing your connection string or DB connection configuration. This will allow multiple active batches on a single connection.
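The per-thread batch query itself might look roughly like the following. This is only a sketch with hypothetical names: dbo.SourceTable, its Id column, and the @Offset/@BatchSize parameters are placeholders, not from my actual project. ROW_NUMBER() works on servers before 2012; OFFSET/FETCH needs 2012 or later.
-- Pre-2012 (e.g. 2005/2008): page with ROW_NUMBER()
SELECT Id, Col1, Col2
FROM (
    SELECT Id, Col1, Col2,
           ROW_NUMBER() OVER (ORDER BY Id) AS rn
    FROM dbo.SourceTable
) AS numbered
WHERE rn > @Offset AND rn <= @Offset + @BatchSize;

-- 2012 and later: page with OFFSET / FETCH
SELECT Id, Col1, Col2
FROM dbo.SourceTable
ORDER BY Id
OFFSET @Offset ROWS
FETCH NEXT @BatchSize ROWS ONLY;
Each thread is handed its own @Offset (thread index × batch size) so the subsets don't overlap.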
This works quite well for me. Please comment if you have a better idea of how to approach multithreaded querying on .NET 3.5.
I am trying to diagnose slow application performance on a client site.
A log file on the client machine tells me the execution time for each query as measured from the application side. It appears that many bare-bones simple queries to the remote DB are taking an exorbitant amount of time to complete. For example,
SELECT CONVERT(varchar, GETDATE(), 121)
This query is repeatedly taking over 5 seconds to execute as timed from the application. Other queries almost as simple (inserting one recordset into one table) are taking over a minute to complete. On my test system (with a copy of the client's database) I do not experience any of these problems.
I would suspect a slow network, except that the problem reliably disappears after running a report from Crystal Reports. Then after 1-2 hours the application slows down again.
For the sake of isolating the problem further, I would like to retrieve/log the execution time on the server side. I am trying to figure out what the best way of doing this is. I could use a variable to obtain the execution time for a single query, but I don't have the option of modifying every single query in my application.
sys.dm_exec_query_stats looked very promising for retrieving execution times of previous queries, but the values it reports for last_elapsed_time seem far too high to be milliseconds.
Can anyone help me figure out how to obtain timing for my queries?
Here is one way:
set statistics time on
SELECT CONVERT(varchar, GETDATE(), 121)
set statistics time off
And it will report the time spent for the query as:
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 0 ms.
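If modifying every query isn't an option, the sys.dm_exec_query_stats view mentioned in the question can still help; its *_elapsed_time and *_worker_time columns are reported in microseconds, which is probably why the values looked far too high when read as milliseconds. A rough sketch (the column aliases and TOP 20 are just my choices):
SELECT TOP 20
       t.text,
       qs.last_elapsed_time / 1000.0  AS last_elapsed_ms,
       qs.total_elapsed_time / 1000.0 AS total_elapsed_ms,
       qs.execution_count
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) t
ORDER BY qs.last_execution_time DESC;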
I'm using SQL Server 2012 and SET STATISTICS TIME ON to measure the CPU time for my SQL statements. I use this because I only want the time the database needs to execute the statement.
When returning a large amount of data from a SELECT, I noticed the CPU time going up quite high: using TOP 2000 needs about 400 ms, but without it the query needs about 10000 ms of CPU time.
What I'm not sure about is:
Is it possible that the CPU time I get back includes something like the time needed to display the millions of returned rows in SQL Server Management Studio? That would be a pretty bad situation.
Update:
The time I want to receive is the execution time of the SQL Server without the time SSMS needs to display the rows. There are several time statistics displayed in the Client Statistics, but after searching for a long time it's still really hard to find good references explaining what they are. Any suggestions?
Idea: elapsed time (SQL Server execution time) - client processing time (Client Statistics)
Maybe this is an option?
In a multi-threaded world, CPU time is increasingly less helpful for simple tuning. Execution time is worth looking at.
To see if execution time (elapsed time) spent on displaying results is included you could SELECT TOP 2000 * INTO #temp to compare execution times.
Update:
My quick tests suggest the overhead of creating and inserting into a #temp table outweighs that of displaying the results (at 5,000 rows). When I go to 50,000 results the SELECT INTO runs more quickly. The count at which the two become equivalent depends on how many fields are returned and of what type. I tested with:
SET STATISTICS TIME ON
SELECT TOP 50000 NEWID()
FROM master..spt_values v1, master..spt_values v2
WHERE v1.number > 100
SET STATISTICS TIME OFF
-- CPU time = 32 ms, elapsed time = 121 ms.
SET STATISTICS TIME ON
SELECT TOP 50000 NEWID() col1
INTO #test
FROM master..spt_values v1, master..spt_values v2
WHERE v1.number > 100
SET STATISTICS TIME OFF
-- CPU time = 15 ms, elapsed time = 87 ms.
CPU time in SET STATISTICS TIME ON only counts the time that SQL Server needs to execute the query. It doesn't include any time the client takes to render the results. It also excludes any time SQL Server spends waiting for buffers to clear. In short, it really is pretty independent of the client.
I have the following script:
SELECT
DEPT.F03 AS F03, DEPT.F238 AS F238, SDP.F04 AS F04, SDP.F1022 AS F1022,
CAT.F17 AS F17, CAT.F1023 AS F1023, CAT.F1946 AS F1946
FROM
DEPT_TAB DEPT
LEFT OUTER JOIN
SDP_TAB SDP ON SDP.F03 = DEPT.F03,
CAT_TAB CAT
ORDER BY
DEPT.F03
The tables are huge. When I execute the script in SQL Server directly it takes around 4 minutes to execute, but when I run it in the third-party program (SMS LOC, based on Delphi) it gives me the error
<msg> out of memory</msg> <sql> the code </sql>
Is there any way I can lighten the script so it can be executed? Or has anyone had the same problem and solved it somehow?
I remember having had to resort to the ROBUST PLAN query hint once on a query where the query-optimizer kind of lost track and tried to work it out in a way that the hardware couldn't handle.
=> http://technet.microsoft.com/en-us/library/ms181714.aspx
But I'm not sure I understand why it would work for one 'technology' and not another.
Then again, the error message might not be from SQL but rather from the 3rd-party program that gathers the output and does so in a 'less than ideal' way.
Consider adding paging to the user edit screen and the underlying data call. The point is that you don't need to see all the rows at one time, but they remain available to the user upon request.
This will alleviate much of your performance problem.
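If the tool lets you change the underlying data call, one way to page it on the server is the OFFSET/FETCH clause (SQL Server 2012+ only; the @PageNumber/@PageSize parameters here are illustrative additions, not part of the original query):
SELECT
    DEPT.F03 AS F03, DEPT.F238 AS F238, SDP.F04 AS F04, SDP.F1022 AS F1022,
    CAT.F17 AS F17, CAT.F1023 AS F1023, CAT.F1946 AS F1946
FROM
    DEPT_TAB DEPT
LEFT OUTER JOIN
    SDP_TAB SDP ON SDP.F03 = DEPT.F03,
    CAT_TAB CAT
ORDER BY
    DEPT.F03
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;
Each request then pulls only one page of rows instead of the whole result set, which keeps the client's memory use bounded.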
I had a project where I had to add over 7 million individual lines of T-SQL code via batch (I couldn't figure out how to programmatically leverage the new SEQUENCE command). The problem was that there was a limited amount of memory available on my VM (I was allocated the maximum amount of memory for this VM). Because of the large number of lines of T-SQL code, I had to first test how many lines it could take before the server crashed. For whatever reason, SQL Server (2012) doesn't release the memory it uses for large batch jobs such as mine (we're talking around 12 GB of memory), so I had to reboot the server every million or so lines. This is what you may have to do if resources are limited for your project.
I am currently trying to wring more performance out of my reporting website, which uses LINQ to SQL and a SQL Server Express 2008 database.
I am finding that, as I now approach a million rows in one of my more 'ugly' tables, performance is becoming a real issue, with one report in particular taking 3 minutes to generate.
Essentially, I have a loop that, for each user, hits the database and grabs a collection of data on them. This data is then queried in various ways (and more rows loaded as needed) until I have a nice little summary object that I can fire off to a set of Silverlight charts. Lazy loading is used and the reporting pulls in data from around 8 linked tables.
The problem is I don't know where the bottleneck now is and how to improve performance. Due to certain constraints I was forced to use uniqueidentifiers for a number of primary keys in the tables involved - could this be an issue?
Basically, I need to put time into increasing performance but don't have enough to do that for both the database and the LINQ to SQL. Is there any way I can see where the bottlenecks are?
As I'm running Express I don't have access to the Profiler. I am considering rewriting my queries as compiled LINQ to SQL queries, but fear the database may be the culprit.
I understand this question is a bit open-ended and it's hard to answer without knowing much more about my setup (database schema etc.), but any advice on how to find out where the bottlenecks are is much appreciated!
Thanks
UPDATE:
Thanks for all the great advice guys, and some links to some great tools.
UPDATE for those interested
I have been unable to make my queries any quicker through tweaking the LINQ. The problem seems to be that the majority of my database access code takes place in a loop. I can't see a way around it. Basically I am building up a report by looking through a number of users' data - hence the loop. Pulling all the records up front seems a bit crazy - 800,000+ rows. My gut feeling is that there is a much better way, but it's a technological leap too far for me!
However, adding another index to one of the foreign keys in one of the tables boosted performance, so now the report takes 20 seconds to generate as opposed to 3 minutes!
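(For anyone wanting a concrete picture, a foreign-key index like that is a single statement; the table and column names below are made up for illustration, not my actual schema.)
-- Hypothetical example: index the foreign-key column the report joins on
CREATE NONCLUSTERED INDEX IX_UserActivity_UserId
    ON dbo.UserActivity (UserId);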
I used this excellent tool: Linq2Sql profiler. It works on the application side, so there is no need for database server profiling functionality.
You have to add one line of initialization code to your application, and then, in a separate desktop application, the profiler shows you the SQL query for each LINQ query, with the exact line of code where it was executed (.cs or .aspx), the database time and application time of the executions; it even detects some common performance problems like N+1 queries (a query executed per iteration) or unbounded result sets. You have to pay for it, but a trial version is also available.
As you're using SQL Express which doesn't have Profiler, there is a free third party profiler you can download here. I've used it when running SQL Express. That will allow you to trace what's going on in the database.
Also, you can query the Dynamic Management Views to see what the costly queries are:
e.g. TOP 10 queries that have taken the most time
SELECT TOP 10 t.text, q.*, p.query_plan
FROM sys.dm_exec_query_stats q
CROSS APPLY sys.dm_exec_sql_text(q.sql_handle) t
CROSS APPLY sys.dm_exec_query_plan (q.plan_handle) AS p
ORDER BY q.total_worker_time DESC
There are two tools I use for this: LinqPad and the Visual Studio debugger. First, check out LinqPad; even the free version is very powerful, showing you the execution time and the SQL generated, and you can use it to run any code snippet... it's tremendously useful.
Second, you can use the Visual Studio debugger. This is something we use on our DataContext (note: only use this in debug builds; it's a performance hit and completely unnecessary outside of debugging):
#if DEBUG
private readonly Stopwatch Watch = new Stopwatch();
private static void Connection_StateChange(object sender, StateChangeEventArgs e)
{
if (e.OriginalState == ConnectionState.Closed && e.CurrentState == ConnectionState.Open)
{
Current.Watch.Start();
}
else if (e.OriginalState == ConnectionState.Open && e.CurrentState == ConnectionState.Closed)
{
Current.Watch.Stop();
string msg = string.Format("SQL took {0}ms", Current.Watch.ElapsedMilliseconds);
Trace.WriteLine(msg);
}
}
#endif
private static DataContext New
{
get
{
var dc = new DataContext(ConnectionString);
#if DEBUG
if (Debugger.IsAttached)
{
dc.Connection.StateChange += Connection_StateChange;
dc.Log = new DebugWriter();
}
#endif
return dc;
}
}
In a debug build, as an operation completes with each context, we see the timestamp in the debug window and the SQL it ran. The DebugWriter class you see can be found here (credit: Kris Vandermotten). We can quickly see if a query is taking a while. To use it, we just instantiate a DataContext by:
var DB = DataContext.New;
(The Profiler is not an option for me since we don't use SQL Server; this answer is simply to give you some alternatives that have been very useful for me.)