I would like to find out or at least estimate how much memory does a single query (a specific query) eats up while executing. There is no point in posting the query here as I would like to do this on multiple queries and see if there is a change over different databases. Is there any way to get this info?
Using SQL Server 2008 R2
thanks
Gilad.
You might want to take a look at the DMVs (Dynamic Management Views), specifically sys.dm_exec_query_memory_grants. See for example this query (taken from here):
DECLARE @mgcounter INT;
SET @mgcounter = 1;

WHILE @mgcounter <= 5 -- return data from the DMV 5 times when there is data
BEGIN
    IF (SELECT COUNT(*)
        FROM sys.dm_exec_query_memory_grants) > 0
    BEGIN
        SELECT *
        FROM sys.dm_exec_query_memory_grants mg
        CROSS APPLY sys.dm_exec_sql_text(mg.sql_handle); -- shows the query text
        -- WAITFOR DELAY '00:00:01' -- add a delay if you see the exact same query in the results
        SET @mgcounter = @mgcounter + 1;
    END
END
While it runs, the script above waits until some query is executing and collects its memory grant data. So to use it, start the script and then, in another session, run the query you want to monitor.
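If you just want a quick one-off snapshot instead of the loop, something along these lines (the column selection is my own, not from the linked article) shows the headline numbers for any query that currently holds a memory grant:

SELECT mg.session_id,
       mg.requested_memory_kb,
       mg.granted_memory_kb,
       mg.used_memory_kb,
       mg.max_used_memory_kb,
       st.text AS query_text
FROM sys.dm_exec_query_memory_grants AS mg
CROSS APPLY sys.dm_exec_sql_text(mg.sql_handle) AS st;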
What do you mean by "how much memory a query eats up?", and why exactly do you want to know?
I don't think memory in SQL Server works the way you might imagine - memory management in SQL Server is an incredibly complex topic - you could easily write entire books about SQL Server's memory management. I can't claim to know that much about it, but I do know that there is pretty much no useful information you can extrapolate from knowing how much memory a single query uses.
That said, if you did want to have a go at understanding what's going on with memory when you execute a query, then I would probably start by looking at the buffer pool. Nearly all memory in SQL Server is organised into 8KB chunks (the same size as a page) that can be used to store anything from a data or index page to a cached query plan. The buffer pool is the main memory component in SQL Server - all 8KB chunks of memory not in use elsewhere remain in the buffer pool to be used as a cache for data pages.
Note that in order for a data page or index page to be used it must exist in memory - this means that if it doesn't already exist in memory, a free buffer must be made available to read the page into. The buffer pool serves both as a pool of "expendable" free buffers and as a cache of pages already present in memory.
You can examine what's in the buffer pool using DMVs; there is a suitable query listed on this page:
What's swimming in your bufferpool?
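I can't reproduce that page's exact query here, but a common sketch along the same lines counts cached pages per object in the current database via sys.dm_os_buffer_descriptors:

SELECT OBJECT_NAME(p.object_id) AS object_name,
       COUNT(*) AS cached_pages,
       COUNT(*) * 8 / 1024 AS cached_mb -- each page is 8KB
FROM sys.dm_os_buffer_descriptors AS bd
JOIN sys.allocation_units AS au
    ON bd.allocation_unit_id = au.allocation_unit_id
JOIN sys.partitions AS p
    ON (au.container_id = p.hobt_id AND au.type IN (1, 3))
    OR (au.container_id = p.partition_id AND au.type = 2)
WHERE bd.database_id = DB_ID()
GROUP BY p.object_id
ORDER BY cached_pages DESC;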
By cleaning out your buffer pool using the command DBCC DROPCLEANBUFFERS (DON'T DO THIS ON A PRODUCTION SQL SERVER!) and then executing your query, in theory the new pages that appear in the buffer pool should be the pages used by that query.
This can give you a rough idea of the data and index pages used by a query; however, it doesn't cover other areas of SQL Server where memory is used, such as the query plan cache, SQL Server workers, etc.
Like I said, SQL Server memory management is complex - If you really want to know more I recommend that you buy a book on SQL Server internals.
Update: You can also use the query statistics to view aggregate performance statistics for a query, including "physical reads" (pages read from disk) and "logical reads" (pages read from the buffer pool). See this page for a suitable query.
This might also give you some more hints on how much memory a query is using, but beware - while playing around I found queries that performed many more logical reads than physical reads, which as far as I can work out means they read the same pages over and over again, i.e. 100 logical reads != 100 pages used in the buffer pool.
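The linked page has its own version, but a typical query against sys.dm_exec_query_stats looks roughly like this (sorted by logical reads, the counter discussed above):

SELECT TOP (20)
       qs.execution_count,
       qs.total_physical_reads,
       qs.total_logical_reads,
       st.text AS batch_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads DESC;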
I want to set up a Postgres server on AWS, the biggest table will be 10GB - do I have to select 10GB of memory for this instance?
What happens when my query result is larger than 10GB?
Nothing will happen; the entire result set is not loaded into memory. The maximum available memory will be used and reused as needed while the result is prepared, and it will spill over to disk as needed.
See PostgreSQL resource documentation for more info.
Specifically, look at work_mem:
work_mem (integer)
Specifies the amount of memory to be used by internal sort operations and hash tables before writing to temporary disk files.
As long as you don't run out of working memory on a single operation or set of parallel operations you are fine.
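As a hedged illustration (the 256MB figure is arbitrary), work_mem can be raised for a single session before running a big sort, without touching the server-wide default:

SET work_mem = '256MB';
SHOW work_mem;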
Edit: The above was an answer to the question What happens when you query a 10GB table without 10GB of memory on the server/instance?
Here is an updated answer to the updated question:
Only server side resources are used to produce the result set
Assuming JDBC drivers are used, by default the entire result set is sent to your local computer, which could cause out-of-memory errors
This behavior can be changed by altering the fetch size through the use of a cursor.
Reference to this behavior here
Getting results based on a cursor
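Roughly speaking, setting a fetch size makes the driver pull rows through a server-side cursor in batches. A minimal sketch of the same idea in plain SQL (big_table is a placeholder):

BEGIN;
DECLARE big_cur CURSOR FOR SELECT * FROM big_table;
FETCH 1000 FROM big_cur; -- repeat until no more rows come back
CLOSE big_cur;
COMMIT;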
On the server side, with a simple query like yours, it just keeps a "cursor" that points to where it is as it spools the results to you, and uses very little memory. If there were some sorts in there that couldn't use an index, that might use up lots of memory, though I'm not sure. On the client side, the Postgres JDBC client by default loads the entire result set into memory before passing it back to you (this can be overcome by specifying a fetch count).
With more complex queries (for example, give me all 100M rows but order them by "X", where X is not indexed) I don't know for sure, but internally it probably creates a temp table (so it won't run out of RAM) which, treated as a normal table, uses disk backing. If there's a matching index then it can just traverse that, using a pointer, and still use little RAM.
Using Management Studio, I have a table with the following six columns on my SQL Server:
FileID - int
File_GUID - nvarchar(258)
File_Parent_GUID - nvarchar(258)
File Extension - nvarchar(50)
File Name - nvarchar(100)
File Path - nvarchar(400)
It has a primary key on FileID.
This table has around 200M rows.
If I try to process the full data set, I receive a memory error.
So I have decided to load it in partitions, using a SELECT statement for every 20M rows, splitting on the FileID number.
These SELECTs take forever; the retrieval of rows is extremely slow and I have no idea why. There are no calculations whatsoever, just a pull of data using a SELECT.
When I ran the query analyzer I see:
Select cost = 0%
Clustered Index Cost = 100%
Do you guys have any idea on why this could be happening or maybe some tips that I can apply ?
My query:
SELECT * FROM Dim_TFS_File
Thank you!!
Monitor the query while it's running to see if it's blocked or waiting on resources. If you can't easily see where the bottleneck is during monitoring the database and client machines, I suggest you run a couple of simple tests to help identify where you should focus your efforts. Ideally, run the tests with no other significant activity and a cold cache.
First, run the query on the database server and discard the results. This can be done from SSMS with the discard results option (Query-->Query Options-->Results-->Grid-->Discard Results after execution). Alternatively, use a Powershell script like the one below:
$connectionString = "Data Source=YourServer;Initial Catalog=YourDatabase;Integrated Security=SSPI;Application Name=Performance Test";
$connection = New-Object System.Data.SqlClient.SqlConnection($connectionString);
$command = New-Object System.Data.SqlClient.SqlCommand("SELECT * FROM Dim_TFS_File;", $connection);
$command.CommandTimeout = 0;
$sw = [System.Diagnostics.Stopwatch]::StartNew();
$connection.Open();
[void]$command.ExecuteNonQuery(); #this will discard returned results
$connection.Close();
$sw.Stop();
Write-Host "Done. Elapsed time is $($sw.Elapsed.ToString())";
Repeat the above test on the client machine. The elapsed time difference reflects the network data transfer overhead. If the client machine test is significantly faster than the application, focus your efforts on the app code. Otherwise, take a closer look at the database and network. Below are some random notes that might help remediate performance issues.
This trivial query will likely perform a full clustered index scan. The limiting performance factors on the database server will be:
CPU: Throughput of this single-threaded query will be limited by the speed of a single CPU core.
Storage: The SQL Server storage engine will use read-ahead reads during large scans to fetch data asynchronously so that data will already be in memory by the time it is needed by the query. Sequential read performance is important to keep up with the query.
Fragmentation: Fragmentation will result in more disk head movement on spinning media, adding several milliseconds per physical disk IO. This is typically a consideration only for large sequential scans on single-spindle or low-end local storage arrays, not SSDs or enterprise-class SANs. Fragmentation can be eliminated by reorganizing or rebuilding the clustered index. Be sure to specify MAXDOP 1 for rebuilds for maximum benefit.
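For example, a rebuild limited to a single core might look like this (a sketch only; check index names and schedule it in a maintenance window):

ALTER INDEX ALL ON dbo.Dim_TFS_File REBUILD WITH (MAXDOP = 1);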
SQL Server streams results as fast as they can be consumed by the client app, but the client may be constrained by network bandwidth and latency. It seems you are returning many GB of data, which will take quite some time. You can reduce bandwidth needs considerably with different data types. For example, assuming the GUID-named columns actually contain GUIDs, using uniqueidentifier instead of nvarchar will save about 80 bytes per row over the network and on disk. Similarly, use varchar instead of nvarchar if you don't actually need Unicode characters, cutting that data size in half.
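As a purely hypothetical sketch (assuming the GUID columns really do hold GUID strings, and glossing over nullability and the cost of rewriting 200M rows; test on a copy first):

ALTER TABLE dbo.Dim_TFS_File ALTER COLUMN File_GUID uniqueidentifier;        -- was nvarchar(258)
ALTER TABLE dbo.Dim_TFS_File ALTER COLUMN File_Parent_GUID uniqueidentifier; -- was nvarchar(258)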
Client processing time: The time to process 20M rows in the app code will be limited by CPU and code efficiency (especially memory management). Since you ran out of memory, it seems you are either loading all rows into memory or have a leak. Even without an outright out-of-memory error, high memory usage can result in paging and greatly slow throughput. Importantly, database and network performance are moot if the app code can't process rows as fast as the data are returned.
How can I configure the maximum memory that a query (select query) can use in sql server 2008?
I know there is a way to set the minimum value but how about the max value? I would like to use this because I have many processes in parallel. I know about the MAXDOP option but this is for processors.
Update:
What I am actually trying to do is run a data load continuously. This data load is in ETL form (extract, transform and load). While the data is loading I want to run some SELECT queries. All of them are expensive queries (containing GROUP BY). The most important process for me is the data load. I get an average speed of 10,000 rows/sec, and when I run the queries in parallel it drops to 4,000 rows/sec or even lower. I know a few more details should be provided, but this is a complex product I work on and I cannot detail it further. Another thing I can guarantee is that my load speed does not drop due to locking problems, because I monitored for those and removed them.
There isn't any way of setting a maximum memory at a per query level that I can think of.
If you are on Enterprise Edition you can use Resource Governor to set a maximum amount of memory that a particular workload group can consume, which might help.
In SQL 2008 you can use Resource Governor to achieve this. There you can set request_max_memory_grant_percent to cap the memory (this is a percentage relative to the pool size specified by the pool's max_memory_percent value). This setting is not query specific; it is session specific.
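A rough sketch of such a setup (Enterprise Edition only; every name and percentage here is made up, and the classifier function must be created in master):

CREATE RESOURCE POOL ReportPool WITH (MAX_MEMORY_PERCENT = 40);
CREATE WORKLOAD GROUP ReportGroup
    WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 25)
    USING ReportPool;
GO
-- route a hypothetical login into the workload group
CREATE FUNCTION dbo.rg_classifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    RETURN (CASE WHEN SUSER_SNAME() = N'report_user'
                 THEN N'ReportGroup' ELSE N'default' END);
END
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rg_classifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;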
In addition to Martin's answer
If your queries are all the same or similar, working on the same data, then they will be sharing memory anyway.
Example:
A busy web site with 100 concurrent connections running 6 different parametrised queries between them on broadly the same range of data.
6 execution plans
100 user contexts
one buffer pool with assorted flags and counters to show usage of each data page
If you have 100 different queries or they are not parametrised then fix the code.
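For reference, a parametrised call through sp_executesql looks like this (the table and column are placeholders); all 100 connections issuing it share a single cached plan:

EXEC sp_executesql
     N'SELECT * FROM dbo.Orders WHERE CustomerID = @cust',
     N'@cust int',
     @cust = 42;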
Memory per query is something I've never thought or cared about since the last millennium.
If I look in my profiler for SQL-server, it comes up with a lot of duplicate queries such as:
exec sp_executesql N'SELECT *
FROM [dbo].[tblSpecifications] AS [t0]
WHERE [t0].[clientID] = @p0
ORDER BY [t0].[Title]', N'@p0 int', @p0 = 21
A lot of these queries are not needed to display real-time data; that is, if someone inserted a new record matched by that query, it wouldn't matter if it didn't display for up to an hour after insertion.
You can output cache the asp.net pages, but I was wondering if there was similar functionality on the dbms (SQL-server in particular), which saves a query results in a cache and renews that cache after a set period of time, say 1 hour, with the aim of improving retrieval speeds of records.
In SQL Server 2000 and prior, you can use DBCC PINTABLE (databaseid, tableid), but it's best to allow SQL Server to manage your memory.
If you have an expensive aggregate query that you would like "cached", create an indexed view to materialize the results.
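A minimal sketch, borrowing names from the query above (the view definition itself is an assumption, not your actual workload); note that an indexed view with GROUP BY must be schema-bound and must include COUNT_BIG(*):

CREATE VIEW dbo.vSpecCountsByClient
WITH SCHEMABINDING
AS
SELECT clientID,
       COUNT_BIG(*) AS SpecCount
FROM dbo.tblSpecifications
GROUP BY clientID;
GO
CREATE UNIQUE CLUSTERED INDEX IX_vSpecCountsByClient
    ON dbo.vSpecCountsByClient (clientID);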
Otherwise, the amount of time a database page remains in memory is determined by the least recently used policy. The header of each data page in cache stores details about the last two times it was accessed. A background process scans the cache, and decrements a usecount if the page has not been accessed since the last scan. When SQL Server needs to free cache, pages with the lowest usecount are flushed first. (Professional SQL Server 2008 Internals and Troubleshooting)
sys.dm_os_buffer_descriptors contains one row for each data page currently in cache
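For a quick per-database summary of what is currently cached (a sketch, not a tuning tool):

SELECT DB_NAME(database_id) AS database_name,
       COUNT(*) AS cached_pages,
       COUNT(*) * 8 / 1024 AS cached_mb -- each page is 8KB
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY cached_pages DESC;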
Query results are not cached, but the data pages themselves will remain in cache until they are pushed out by other read operations. They next time your query is submitted, these pages will be read from memory instead of disk.
This is a main reason to avoid table scans where possible. If the table being scanned is big enough, your cache gets flooded with potentially useless data.
A lot of people have a "who cares how long the query takes, it is running in batch mode" attitude, but they fail to see the impact on other processes, such as the one you mentioned.
No, but there are a ton of caching solutions out there such as Memcached and Ehcache.
Not to miss the obvious, you could also create a wholly separate reporting table and update it hourly. While there'd be a cost in populating and administering it, you could limit the fields to what's needed and optimize the indices for reads.
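For example, an hourly SQL Agent job could do something like this (the reporting table name is made up, and the column list is trimmed to what the cached query above actually needs):

TRUNCATE TABLE dbo.rptSpecifications;

INSERT INTO dbo.rptSpecifications (clientID, Title)
SELECT clientID, Title
FROM dbo.tblSpecifications;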
When I run a certain stored procedure for the first time it takes about 2 minutes to finish. When I run it for the second time it finished in about 15 seconds. I'm assuming that this is because everything is cached after the first run. Is it possible for me to "warm the cache" before I run this procedure for the first time? Is the cached information only used when I call the same stored procedure with the same parameters again or will it be used if I call the same stored procedure with different params?
When you perform your query, the data is read into memory in blocks. These blocks remain in memory but they get "aged". This means the blocks are tagged with the last access time, and when SQL Server requires another block for a new query and the memory cache is full, the least recently used block (the oldest) is kicked out of memory. (In most cases - full table scan blocks are instantly aged to prevent full table scans from overrunning memory and choking the server.)
What is happening here is that the data blocks in memory from the first query haven't been kicked out of memory yet so can be used for your second query, meaning disk access is avoided and performance is improved.
So what your question is really asking is "can I get the data blocks I need into memory without reading them into memory (actually doing a query)?". The answer is no, unless you want to cache the entire tables and have them reside in memory permanently which, from the query time (and thus data size) you are describing, probably isn't a good idea.
Your best bet for performance improvement is looking at your query execution plans and seeing whether changing your indexes might give a better result. There are two major areas that can improve performance here:
creating an index where the query could use one to avoid inefficient queries and full table scans
adding more columns to an index to avoid a second disk read. For example, you have a query that returns columns A and B, with a WHERE clause on A and C, and you have an index on column A. Your query will use the index on column A, requiring one disk read, but then require a second disk hit to get columns B and C. If the index contained all of columns A, B and C, the second disk hit to get the data could be avoided.
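A sketch of that covering index (table and column names are just the placeholders from the example):

CREATE NONCLUSTERED INDEX IX_my_table_A_C
    ON dbo.my_table (A, C)
    INCLUDE (B); -- B is stored at the leaf level, so the second disk hit is avoided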
I don't think that generating the execution plan will cost more than 1 second.
I believe that the difference between first and second run is caused by caching the data in memory.
The data in the cache can be reused by any further query (stored procedure or simple select).
You can 'warm' the cache by reading the data through any SELECT that reads the same data. But that will still cost about 90 seconds.
You can check the execution plan to find out which tables and indexes your query uses. You can then execute some SQL to get the data into the cache, depending on what you see.
If you see a clustered index seek, you can simply do SELECT * FROM my_big_table to force all the table's data pages into the cache.
If you see a non-clustered index seek, you could try SELECT first_column_in_index FROM my_big_table.
To force a load of a specific index, you can also use the WITH(INDEX(index)) table hint in your cache warmup queries.
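Putting those together, hypothetical warm-up statements might look like this (table, column and index names are placeholders):

SELECT * FROM my_big_table; -- pulls the table's data pages into cache
SELECT first_column_in_index
FROM my_big_table WITH (INDEX(IX_my_index)); -- pulls a specific non-clustered index into cache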
SQL Server caches data read from disk.
Subsequent reads will do less IO.
This is of great help since disk IO is usually the bottleneck.
More at:
http://blog.sqlauthority.com/2014/03/18/sql-server-performance-do-it-yourself-caching-with-memcached-vs-automated-caching-with-safepeak/
The execution plan (the cached info for your procedure) is reused every time, even with different parameters. It is one of the benefits of using stored procs.
The very first time a stored procedure is executed, SQL Server generates an execution plan and puts it in the procedure cache.
Certain changes to the database can trigger an automatic update of the execution plan (and you can also explicitly demand a recompile).
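For completeness, two ways to demand that recompile (the procedure name and parameter are placeholders):

EXEC sp_recompile N'dbo.MyProc'; -- a new plan is compiled on the next execution
EXEC dbo.MyProc @SomeParam = 1 WITH RECOMPILE; -- compiles a throwaway plan for this call only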
Execution plans are dropped from the procedure cache based on their "age". (From MSDN: Objects infrequently referenced are soon eligible for deallocation, but are not actually deallocated unless memory is required for other objects.)
I don't think there is any way to "warm the cache", except to perform the stored proc once. This will guarantee that there is an execution plan in the cache and any subsequent calls will reuse it.
More detailed information is available in the MSDN documentation: http://msdn.microsoft.com/en-us/library/ms181055(SQL.90).aspx