In Oracle, there's a view called V$SQLAREA that lists statistics on shared SQL area and contains one row per SQL string. It provides statistics on SQL statements that are in memory, parsed, and ready for execution.
There is one column -ROWS_PROCESSED that sums the Total number of rows processed on behalf of this SQL statement.
I'm looking for collateral information in SQLSERVER 2005.
I looked in some of the DMV's (like sys.dm_exec_query_stats), but I haven't found anything related.
##ROWCOUNT won't be useful to me, as I want incremental statistics information that will sum the rows_processed of the top cpu/io/execution consumption queries in the database.
I would appreciate any help in regards the subject.
Thanks in advance,
Roni.
I saw that when I query the following query, I receive the Query Plan in XML.
Inside the XML plan code, there's a "EstimateRows" part with a number that correlates to the number of estimation rows of the query.
I'm thinking of the option to substr the query_plan column to retreive only the above information (unless I would find it in some system views/tables).
Where can I find the Actual number of rows of the query ? Where is it stored ?
SELECT
case when sql_handle IS NULL
then ' '
else ( substring(st.text,(statement_start_offset+2)/2,
(case when qs.statement_end_offset = -1
then len(convert(nvarchar(MAX),st.text))*2
else statement_end_offset
end - statement_start_offset) /2 ) )
end as query_text ,
query_plan,
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle)
cross apply sys.dm_exec_sql_text(sql_handle) st;
There isn't a direct equivalent especially for rowcounts as far as I know. The relevant dmvs track IO cost which is used to show missing indexed, most expensive queries etc
This will give you stats per SQL statement for current sessions. YMMV: I've just put it together based on scripts I have lying around.
SELECT
*
FROM
sys.dm_exec_query_stats QS
CROSS APPLY
sys.dm_exec_sql_text(sql_handle) ST
My scripts are based on the links I mentioned here
You can use the system variable ##rowcount to know the number of records affected by last statement.
Related
After looking in Azure's QueryPerformanceInsight for a sql database I use daily, I have noticed that a query is being executed a lot of times per day (thousands per hour it seems). The query is basically:
select * from TableName t where t.Id = id
I am trying to hunt down what is executing this query so many times in such a short space of time, and have put this together:
select top 1 text.[text], stat.last_execution_time, stat.execution_count, plans.*, stat.*
from sys.dm_exec_cached_plans as plans
join sys.dm_exec_query_stats as stat on plans.plan_handle = stat.plan_handle
cross apply sys.dm_exec_sql_text(plans.plan_handle) as text
order by stat.execution_count desc
This is showing me the query that I am looking for (as the text is the same as shown in Azure Insights) and it also shows how quickly the execution count is rising.
The issue is, I dont know how to find where this query is being executed from! None of the 3 tables I am referencing in the above query have a SessionId column on them so I am unsure where to go from here!
If anyone has any experience about hunting down a query being spammed against a database over and over, please help!
I have a process in sql server that seems to never end. To spot if there is a block in this process I used EXEC sp_who2 and I seen that the SPID is 197.
The status is runnable and there is no block. Command is inserting. The weird thing is the CPU time which is the biggest: 68891570 and the DISK IO operations: 16529185.
This process truncates two tables and then insert the data from a another table to these two tables. It is true that there is a lot of information (101962758 rows in the origin table) but I think that there is too much time.
What can I do to accelerate this process or to spot what is happening?
Thank you
It depends on the scenario here. I recommend the following steps in order to decide where to move next.
Find most expensive queries
Using the following SQL to determine the most expensive queries:
SELECT TOP 10 SUBSTRING(qt.TEXT, (qs.statement_start_offset/2)+1,
((CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(qt.TEXT)
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2)+1),
qs.execution_count,
qs.total_logical_reads, qs.last_logical_reads,
qs.total_logical_writes, qs.last_logical_writes,
qs.total_worker_time,
qs.last_worker_time,
qs.total_elapsed_time/1000000 total_elapsed_time_in_S,
qs.last_elapsed_time/1000000 last_elapsed_time_in_S,
qs.last_execution_time,
qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
ORDER BY qs.total_logical_reads DESC -- logical reads
-- ORDER BY qs.total_logical_writes DESC -- logical writes
-- ORDER BY qs.total_worker_time DESC -- CPU time
Execution plan
This could help to determine what is going on with your actual query. More information could be found here.
Performance tips
Indexes. Remove all indexes, except for those needed by the insert (SELECT INTO)
Constraints and triggers. Remove them from the table.
Choosing good clustered index. New records will be inserted at the end of the table.
Fill factor. Set it to 0 or 100 (the same as 0). This will reduce the number of pages that the data is spread across.
Recovery model. Change it to Simple.
Also
Consider reviewing Insert into table select * from table vs bulk insert.
From the info you provided,it seems the query is still running...
This process truncates two tables and then insert the data from a another table to these two tables. It is true that there is a lot of information (101962758 rows in the origin table) but I think that there is too much time.
Can you follow below process (instead of your approach), assuming your tables are Main, T1, T2.
Select * into t1_dup,t2_dup from main
rename t1,t2 to t1dup,t2dup
rename t1_dup,t2_dup to t1,t2
drop t1dup,t2dup
I'm looking to improve the performance of one of our stored procedures and have come across something that I'm struggling to find information on. I'm by no means a DBA, so my knowledge of SQL is not brilliant.
Here's a simplified version of the problem:
If I use the following query -
SELECT * FROM Product
ORDER BY Name
OFFSET 100 ROWS
FETCH NEXT 28 ROWS ONLY
I get the results in around 20ms
However if I apply a conditional ordering -
DECLARE #so int = 1
SELECT * FROM Product
ORDER BY
CASE WHEN #so = 1 THEN Name END,
CASE WHEN #so = 2 THEN Name END DESC
OFFSET 100 ROWS
FETCH NEXT 28 ROWS ONLY
The overall request in my mind is the same, but the results take 600ms, 30x longer.
The execution plans are drastically different, but being a novice I've no idea how to bring the execution path for the second case into line with the the first case.
Is this even possible, or should I look at creating separate procedures for the order by cases and move choosing the order logic to the code?
NB. This is using MS SQL Server
The reason is because SQL Server can no longer use an index. One solution is dynamic SQL. Another is a simple IF:
IF (#so = 1)
BEGIN
SELECT p.*
FROM Product p
ORDER BY Name
OFFSET 100 ROWS
FETCH NEXT 28 ROWS ONLY
END;
ELSE
BEGIN
SELECT p.*
FROM Product p
ORDER BY Name DESC
OFFSET 100 ROWS
FETCH NEXT 28 ROWS ONLY;
END:
Gordon Linoff is right that this prevents an index from being used, but to expand a bit on that:
When SQL Server prepares execution of a query, it generates an execution plan. This is a query being compiled to steps that the database engine can execute. It's at this point, generally, that it looks at which indices are available for use, but at this point, parameter values are not yet known, so the query optimiser cannot see whether an index on name is useful.
The workarounds in his answer are valid, but I'd like to offer one more:
Add OPTION (RECOMPILE) to your query. This forces your query execution plan to be recompiled each time, and each time, the parameter values are known, and allows the optimiser to optimise for those specific parameter values. It will generally be a bit less efficient than fully dynamic SQL, since dynamic SQL allows each possible statement's execution plan to be cached, but it will likely be better than what you have now, and more maintainable than the other options.
I have signficant performcance issues (up to time-out) in MS Access 2010 with the query below. The table TempTableAnalysis contains between 10'000-15'000 records. I have already received input from this forum to work with a temporary table in the top 10 query (MS Access 2010 SQL Top N query by group performance issue)
Can anyone explain how to implement the temporary table in the subquery and how to join it? I can't get it to work.
Any other suggestions to improve performance are highly appreciated.
Here is my query:
SELECT
t2.Loc,
t2.ABCByPick,
t2.Planner,
t2.DmdUnit,
ROUND(t2.MASE,2) AS MASE,
ROUND(t2.AFAR,2) AS AFAR
FROM TempTableAnalysis AS t2
WHERE t2.MASE IN (
SELECT TOP 10 t1.MASE
FROM TempTableAnalysis AS t1
WHERE t1.ABCByPick = t2.ABCByPick
ORDER BY t1.MASE DESC
)
ORDER BY
t2.ABCByPick,
t2.MASE DESC;
Optimizing Access Query Performance For Large Data Sets
Based on your posted SQL Query, you have some options available to optimize and speed up the performance.
SELECT
t2.Loc,
t2.ABCByPick,
t2.Planner,
t2.DmdUnit,
ROUND(t2.MASE,2) AS MASE,
ROUND(t2.AFAR,2) AS AFAR
FROM TempTableAnalysis AS t2
...
This is the first part where TempTableAnalysis is the multi-thousand record subquery. If you want to squeeze a little more performance out of the use of this "Temp" Table, don't use it as a dynamic query (i.e., calculated on demand each time the query is opened), try constructing a macro that pushes the output to a static table:
Appending Subquery Data to a Static Table:
Create a QUERY object and change its type to DELETE. Design it to delete the contents of your "temporary" table object. If you prefer using SQL, the command will look like:
DELETE My_Table.*
FROM My_Table;
Create a QUERY object and change its type to APPEND. Design it to query all fields from your query defined by the SQL statement of this OP. Again, the SQL version of this task has the following syntax:
INSERT INTO StaticAnalysisTable ( ID, Loc, Item, AvgOfScaledError )
SELECT t1.ID, t1.Loc, t1.Item, t1.AvgOfScaledError
FROM TempTableAnalysis as t1;
The next step is to automate the population of this static table and it is optional. It's simple however and will make it less likely that you will make the mistake of forgetting to "Refresh" and accessing your static table while it has stale data... causing inaccuracies in your results.
Create a macro with two steps. Each step will have the following definition: OPEN QUERY. When prompted for the query to open, reference the objects you created in the previous two steps in the following order (important): (1) DELETE Query: (your delete query name) then (2) APPEND Query: (your append query name).
SQL Query Comments and Suggestions
The following part of the posted SQL query could use some help:
...
WHERE t2.MASE IN (
SELECT TOP 10 t1.MASE
FROM TempTableAnalysis AS t1
WHERE t1.ABCByPick = t2.ABCByPick
ORDER BY t1.MASE DESC
)
ORDER BY
t2.ABCByPick,
t2.MASE DESC;
There is a join across the sub query that generates the TOP-10 data and the outermost query that correlates these results with the supplementing MASE table data. This isn't necessary if the TempTableAnalysis.MASE represents a key value.
ORDER BY
in the inner most query isn't necessary unless it is intended to force some sort of selection criteria (as in when using SQL analytical functions) this doesn't look like one of those cases. Ordering records from large data sets is also a wasteful cpu and memory sink.
EDIT: Just as a counter-point argument, the ORDER BY clause used beside a TOP N query actually has a purpose, but I am still not clear if it is necessary. Just to round out the discussion, another SO thread talks about How to Select Top 10 in an Access Query.
WHERE t2.MASE IN (...
You may be experiencing blocks in performance with very large in-list set operations. On an Oracle database server, I have discovered with other developers that there is a limitation to the number of discrete elements in an in-list query operator. That value was in the thousands... which may be further limited based on server and database resources.
Consider using a SQL JOIN operator. The place where you define TABLE objects can also be populated with SQL defined queries with aliases known as INLINE VIEWS. Since you're using ACCESS, if an inline view does not work directly, just define another ACCESS QUERY object and reference it in your final query as if it were a table...
A possible rewrite to the ending part of the original query:
SELECT
t2.Loc,
t2.ABCByPick,
t2.Planner,
...
FROM TempTableAnalysis AS t2,
(SELECT TOP 10 t1.MASE, t1.ABCByPick
FROM TempTableAnalysis AS t1) AS ttop
WHERE t2.MASE = ttop.MASE
AND t2.ABCByPick = ttop.ABCByPick
ORDER BY
t2.ABCByPick,
t2.MASE DESC;
You will definitely need to run through these recommendations and validate the output data for accuracy. This represents approaches to capturing some of the "low-hanging fruit" (easy items) that you can pursue to speed up your query and reporting operations.
Conclusions and Closing Comments
As a background to other readers, the database object TempTableAnalysis is not a static table. It is the result of a sub query presented in another SO post requesting help with a Access TOP N Query. The query comes from multiple tables approaching 10,000 records in size (each?).
Tip: A query result in Access ALSO has potential table-like behaviors. You can push the output to a table for joining (as described above) or just join to the query object itself (careful though, especially when you get to "chaining" multiple query operations...)
The strategy of this solution was:
To minimize the number of trips through one or more instances of this very large table.
To pre-process and index optimize any data that would otherwise be "static" for the duration of its analysis.
To audit and review the SQL code used to obtain the final results.
Definitely look into Access MACROS. Coupled with identifying static data in your data sets, you can offload processing of your complex background analytic queries to improve the user experience when they view and query through the final results. Good Luck!
Given the example queries below (Simplified examples only)
DECLARE #DT int; SET #DT=20110717; -- yes this is an INT
WITH LargeData AS (
SELECT * -- This is a MASSIVE table indexed on dt field
FROM mydata
WHERE dt=#DT
), Ordered AS (
SELECT TOP 10 *
, ROW_NUMBER() OVER (ORDER BY valuefield DESC) AS Rank_Number
FROM LargeData
)
SELECT * FROM Ordered
and ...
DECLARE #DT int; SET #DT=20110717;
BEGIN TRY DROP TABLE #LargeData END TRY BEGIN CATCH END CATCH; -- dump any possible table.
SELECT * -- This is a MASSIVE table indexed on dt field
INTO #LargeData -- put smaller results into temp
FROM mydata
WHERE dt=#DT;
WITH Ordered AS (
SELECT TOP 10 *
, ROW_NUMBER() OVER (ORDER BY valuefield DESC) AS Rank_Number
FROM #LargeData
)
SELECT * FROM Ordered
Both produce the same results, which is a limited and ranked list of values from a list based on a fields data.
When these queries get considerably more complicated (many more tables, lots of criteria, multiple levels of "with" table alaises, etc...) the bottom query executes MUCH faster then the top one. Sometimes in the order of 20x-100x faster.
The Question is...
Is there some kind of query HINT or other SQL option that would tell the SQL Server to perform the same kind of optimization automatically, or other formats of this that would involve a cleaner aproach (trying to keep the format as much like query 1 as possible) ?
Note that the "Ranking" or secondary queries is just fluff for this example, the actual operations performed really don't matter too much.
This is sort of what I was hoping for (or similar but the idea is clear I hope). Remember this query below does not actually work.
DECLARE #DT int; SET #DT=20110717;
WITH LargeData AS (
SELECT * -- This is a MASSIVE table indexed on dt field
FROM mydata
WHERE dt=#DT
**OPTION (USE_TEMP_OR_HARDENED_OR_SOMETHING) -- EXAMPLE ONLY**
), Ordered AS (
SELECT TOP 10 *
, ROW_NUMBER() OVER (ORDER BY valuefield DESC) AS Rank_Number
FROM LargeData
)
SELECT * FROM Ordered
EDIT: Important follow up information!
If in your sub query you add
TOP 999999999 -- improves speed dramatically
Your query will behave in a similar fashion to using a temp table in a previous query. I found the execution times improved in almost the exact same fashion. WHICH IS FAR SIMPLIER then using a temp table and is basically what I was looking for.
However
TOP 100 PERCENT -- does NOT improve speed
Does NOT perform in the same fashion (you must use the static Number style TOP 999999999 )
Explanation:
From what I can tell from the actual execution plan of the query in both formats (original one with normal CTE's and one with each sub query having TOP 99999999)
The normal query joins everything together as if all the tables are in one massive query, which is what is expected. The filtering criteria is applied almost at the join points in the plan, which means many more rows are being evaluated and joined together all at once.
In the version with TOP 999999999, the actual execution plan clearly separates the sub querys from the main query in order to apply the TOP statements action, thus forcing creation of an in memory "Bitmap" of the sub query that is then joined to the main query. This appears to actually do exactly what I wanted, and in fact it may even be more efficient since servers with large ammounts of RAM will be able to do the query execution entirely in MEMORY without any disk IO. In my case we have 280 GB of RAM so well more then could ever really be used.
Not only can you use indexes on temp tables but they allow the use of statistics and the use of hints. I can find no refernce to being able to use the statistics in the documentation on CTEs and it says specifically you cann't use hints.
Temp tables are often the most performant way to go when you have a large data set when the choice is between temp tables and table variables even when you don't use indexes (possobly because it will use statistics to develop the plan) and I might suspect the implementation of the CTE is more like the table varaible than the temp table.
I think the best thing to do though is see how the excutionplans are different to determine if it is something that can be fixed.
What exactly is your objection to using the temp table when you know it performs better?
The problem is that in the first query SQL Server query optimizer is able to generate a query plan. In the second query a good query plan isn't able to be generated because you're inserting the values into a new temporary table. My guess is there is a full table scan going on somewhere that you're not seeing.
What you may want to do in the second query is insert the values into the #LargeData temporary table like you already do and then create a non-clustered index on the "valuefield" column. This might help to improve your performance.
It is quite possible that SQL is optimizing for the wrong value of the parameters.
There are a couple of options
Try using option(RECOMPILE). There is a cost to this as it recompiles the query every time but if different plans are needed it might be worth it.
You could also try using OPTION(OPTIMIZE FOR #DT=SomeRepresentatvieValue) The problem with this is you pick the wrong value.
See I Smell a Parameter! from The SQL Server Query Optimization Team blog