I am running a fairly simple query (LEFT JOINing a few tables on their primary keys) but SQL Server is asking for a large amount of memory to run this query:
This results in only a few queries being run at a time. Is there a way to force less memory to be granted, or for less memory to be requested per query?
I am using SQL Server 2012 (not the Enterprise edition), and I have tried reducing the max degree of parallelism and increasing the cost threshold for parallelism without much change in the memory requirements.
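For reference, both of those settings are instance-wide sp_configure options; this is only a sketch of how they are typically adjusted, and the values shown are placeholders rather than recommendations:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- limit how many cores a single query can use (placeholder value)
EXEC sp_configure 'max degree of parallelism', 4;
-- raise the threshold so only genuinely expensive plans go parallel (placeholder value)
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;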
Using Management Studio, I have a table with the following six columns on my SQL Server:
FileID - int
File_GUID - nvarchar(258)
File_Parent_GUID - nvarchar(258)
File Extension - nvarchar(50)
File Name - nvarchar(100)
File Path - nvarchar(400)
It has a primary key on FileID.
This table has around 200M rows.
If I try to process the full data set, I receive a memory error.
So I have decided to load it in partitions, using a SELECT statement for every 20M rows, split on the FileID number (sketched below).
These SELECTs take forever; the retrieval of rows is extremely slow and I have no idea why. There are no calculations whatsoever, just a pull of data using a SELECT.
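A sketch of the kind of range-based batch described above, assuming FileID is the clustered primary key; the boundary values are placeholders:

DECLARE @start int = 1;           -- first FileID of this slice (placeholder)
DECLARE @batch int = 20000000;    -- 20M rows per slice

SELECT *
FROM Dim_TFS_File
WHERE FileID >= @start
  AND FileID <  @start + @batch;  -- range seek on the clustered PK, one range per run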
When I look at the execution plan, I see:
Select cost = 0%
Clustered Index Cost = 100%
Do you have any idea why this could be happening, or some tips that I can apply?
My query:
Select * FROM Dim_TFS_File
Thank you!!
Monitor the query while it's running to see if it's blocked or waiting on resources. If you can't easily see where the bottleneck is while monitoring the database and client machines, I suggest you run a couple of simple tests to help identify where you should focus your efforts. Ideally, run the tests with no other significant activity and a cold cache.
First, run the query on the database server and discard the results. This can be done from SSMS with the discard results option (Query --> Query Options --> Results --> Grid --> Discard results after execution). Alternatively, use a PowerShell script like the one below:
$connectionString = "Data Source=YourServer;Initial Catalog=YourDatabase;Integrated Security=SSPI;Application Name=Performance Test";
$connection = New-Object System.Data.SqlClient.SqlConnection($connectionString);
$command = New-Object System.Data.SqlClient.SqlCommand("SELECT * FROM Dim_TFS_File;", $connection);
$command.CommandTimeout = 0;
$sw = [System.Diagnostics.Stopwatch]::StartNew();
$connection.Open();
[void]$command.ExecuteNonQuery(); #this will discard returned results
$connection.Close();
$sw.Stop();
Write-Host "Done. Elapsed time is $($sw.Elapsed.ToString())";
Repeat the above test on the client machine. The elapsed time difference reflects data transfer network overhead. If the client machine test is significantly faster than the application, focus your efforts on the app code. Otherwise, take a closer look at the database and network. Below are some notes that might help remediate performance issues.
This trivial query will likely perform a full clustered index scan. The limiting performance factors on the database server will be:
CPU: Throughput of this single-threaded query will be limited by the speed of a single CPU core.
Storage: The SQL Server storage engine will use read-ahead reads during large scans to fetch data asynchronously so that data will already be in memory by the time it is needed by the query. Sequential read performance is important to keep up with the query.
Fragmentation: Fragmentation will result in more disk head movement against spinning media, adding several milliseconds per physical disk IO. This is typically a consideration only for large sequential scans on single-spindle or low-end local storage arrays, not SSD or enterprise-class SAN. Fragmentation can be eliminated by reorganizing or rebuilding the clustered index (see the sketch after this list). Be sure to specify MAXDOP 1 for rebuilds for maximum benefit.
Network: SQL Server streams results as fast as they can be consumed by the client app, but the client may be constrained by network bandwidth and latency. It seems you are returning many GB of data, which will take quite some time. You can reduce bandwidth needs considerably with different data types. For example, assuming the GUID-named columns actually contain GUIDs, using uniqueidentifier instead of nvarchar will save about 80 bytes per row over the network and on disk. Similarly, use varchar instead of nvarchar if you don't actually need Unicode characters to cut the data size in half (see the sketch after this list).
Client processing time: The time to process 20M rows in the app code will be limited by CPU and code efficiency (especially memory management). Since you ran out of memory, it seems you are either loading all rows into memory or have a leak. Even without an outright out-of-memory error, high memory usage can result in paging and greatly slow throughput. Importantly, database and network performance are moot if the app code can't process rows as fast as the data are returned.
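A sketch of the two maintenance changes mentioned above, the MAXDOP 1 rebuild and the narrower data types. The table and column names come from the question; treat the type changes as illustrative only, since altering 200M rows is itself a heavy operation:

-- rebuild the clustered index single-threaded to keep leaf pages in logical order
ALTER INDEX ALL ON dbo.Dim_TFS_File REBUILD WITH (MAXDOP = 1);

-- if the GUID columns really hold GUIDs, a 16-byte uniqueidentifier beats nvarchar(258);
-- assumes every existing value parses as a GUID, and test on a copy first
ALTER TABLE dbo.Dim_TFS_File ALTER COLUMN File_GUID uniqueidentifier;        -- re-state NULL/NOT NULL to match the original definition
ALTER TABLE dbo.Dim_TFS_File ALTER COLUMN File_Parent_GUID uniqueidentifier;

-- varchar halves storage versus nvarchar when Unicode isn't needed
ALTER TABLE dbo.Dim_TFS_File ALTER COLUMN [File Path] varchar(400);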
I am working on optimizing the load on a decently large SQL Server 2008 cluster, and I have a sample of the queries submitted to the server over a short timespan. This amounts to about 1.7 million queries, and I am working on determining how many are one-off ad hoc queries and how many are substantially the same and submitted often by applications, so that the highest-usage and most resource-intensive queries are optimized first.
To do this, I wish to use the query hash and query plan hash, and add them to my analysis table. The DMVs in SQL Server only keep these values for a few minutes (maybe a bit longer depending on memory usage), so I can't query the DMVs to pull the hashes. I know that the hashes can be generated one at a time by using the SET SHOWPLAN_XML option, but that isn't exactly friendly: showplan must be turned on, the result returned and parsed, then showplan turned off before saving to a table.
I am hoping there is an undocumented function that can generate the two hashes and return a value I can store in a table; so far I haven't found one. Does anyone know of such a function?
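For reference, this is the shape of the DMV query that exposes the hashes while plans are still cached; dbo.QueryHashSnapshot is a hypothetical target table, and on a busy server such a snapshot would have to run very frequently to catch plans before they age out:

-- hypothetical snapshot table
CREATE TABLE dbo.QueryHashSnapshot (
    capture_time    datetime2     NOT NULL,
    query_hash      binary(8)     NOT NULL,
    query_plan_hash binary(8)     NOT NULL,
    statement_text  nvarchar(max) NULL,
    execution_count bigint        NOT NULL
);

-- capture query_hash / query_plan_hash while the plans are still in cache
INSERT INTO dbo.QueryHashSnapshot (capture_time, query_hash, query_plan_hash, statement_text, execution_count)
SELECT
    SYSDATETIME(),
    qs.query_hash,
    qs.query_plan_hash,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(st.text)
          ELSE qs.statement_end_offset END - qs.statement_start_offset) / 2) + 1),
    qs.execution_count
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st;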
I understand there are limitations to using SQLite, but I'd like to know if it should be able to handle this scenario.
My table has over 300 million records and the db is about 12 gigs. The data import util with SQLite is nice and fast. But then I added an index to a string column in this table, and it ran all night to complete this operation. I haven't compared this to other db's, but it seemed quite slow to me.
Now that my index is added, I want to look for duplicates in the data. So I'm trying to run a "having count > 1" query and it seems to be taking hours as well. My query looks like:
select col1, count(*)
from table1
group by col1
having count(*) > 1
I would assume this query would use my index on col1, but the slow execution makes me wonder if it does not.
Would SQL Server perhaps handle this kind of thing better?
SQLite's count() isn't optimized; it does a full table scan even if the column is indexed. The recommended first step is to run EXPLAIN QUERY PLAN and verify what the query is actually doing:
EXPLAIN QUERY PLAN SELECT COUNT(FIELD_NAME) FROM TABLE_NAME;
I get something like this:
0|0|0|SCAN TABLE TABLE_NAME (~1000000 rows)
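For the duplicate search itself, an index on col1 should let SQLite satisfy the GROUP BY from the index alone rather than the base table. A sketch to verify, with an illustrative index name:

CREATE INDEX IF NOT EXISTS idx_table1_col1 ON table1(col1);

-- the plan should now report a covering index scan instead of a plain table scan
EXPLAIN QUERY PLAN
SELECT col1, COUNT(*)
FROM table1
GROUP BY col1
HAVING COUNT(*) > 1;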
But then I added an index to a string column in this table, and it ran all night to complete this operation. I haven't compared this to other db's, but it seemed quite slow to me.
I hate to tell you, but what does your server look like? Not arguing, but that is a potentially very resource-intensive operation that may require a lot of IO, and normal computers or cheap web servers with a slow hard disc are not suited for significant database work. I run hundreds-of-gigabytes db project work, and my smallest "large data" server has 2 SSDs and 8 Velociraptors for data and log. The largest one has 3 storage nodes with a total of 1000 GB of SSD discs, simply because IO is what a db server lives and breathes on.
So I'm trying to run a "having count > 0" query and it seems to be taking hours as well
How much RAM? Enough to fit it all in memory, or a low-memory virtual server where the missing memory turns into bad IO? How much memory can / does SQLite use? How is the temp setup? In memory? SQL Server would possibly use a lot of memory / tempdb space for this type of check.
Increase the SQLite cache via PRAGMA cache_size=<number of pages>. The memory used is <number of pages> times <size of page> (which can be set via PRAGMA page_size=<size of page>).
By setting those values to 16000 and 32768 respectively (or about 512 MB), I was able to get one program's bulk load down from 20 minutes to 2 minutes (although I think that if the disk on that system weren't so slow, this might not have had as much effect).
You might not have this extra memory available on lesser embedded platforms, and I don't recommend increasing it as much as I did on those, but for desktop or laptop level systems it can greatly help.
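A sketch of those two PRAGMAs with the values mentioned above; note that page_size only takes effect on a brand-new database or after a VACUUM, so set it before creating tables:

PRAGMA page_size = 32768;   -- bytes per page; applies to new databases or after VACUUM
PRAGMA cache_size = 16000;  -- pages to cache; 16000 * 32768 bytes is roughly 512 MB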
How can I configure the maximum memory that a query (a SELECT query) can use in SQL Server 2008?
I know there is a way to set the minimum value, but what about the maximum? I would like to use this because I have many processes running in parallel. I know about the MAXDOP option, but that is for processors.
Update:
What I am actually trying to do is run a data load continuously. This data load is in ETL form (extract, transform and load). While the data is being loaded I want to run some queries (SELECTs). All of them are expensive queries (containing GROUP BY). The most important process for me is the data load. I obtained an average speed of 10000 rows/sec, and when I run the queries in parallel it drops to 4000 rows/sec and even lower. I know that a few more details should be provided, but this is a more complex product that I work on and I cannot detail it further. Another thing that I can guarantee is that my load speed does not drop due to lock problems, because I monitored and removed them.
There isn't any way of setting a maximum memory at a per query level that I can think of.
If you are on Enterprise Edition you can use resource governor to set a maximum amount of memory that a particular workload group can consume which might help.
In SQL 2008 you can use Resource Governor to achieve this. There you can set request_max_memory_grant_percent to set the memory (this is the percentage relative to the pool size specified by the pool's max_memory_percent value). This setting is not query specific, it is session specific.
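A minimal Resource Governor sketch along those lines; the pool, group and classifier names are made up, and the classifier here routes sessions by application name, which is only one possible criterion:

USE master;
GO
CREATE RESOURCE POOL ReportPool WITH (MAX_MEMORY_PERCENT = 50);
CREATE WORKLOAD GROUP ReportGroup
    WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 25)   -- cap each request's grant at 25% of the pool
    USING ReportPool;
GO
-- classifier decides which workload group each new session joins
CREATE FUNCTION dbo.rg_classifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    IF APP_NAME() = N'ReportingApp'   -- hypothetical application name
        RETURN N'ReportGroup';
    RETURN N'default';
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rg_classifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;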
In addition to Martin's answer
If your queries are all the same or similar, working on the same data, then they will be sharing memory anyway.
Example:
A busy web site with 100 concurrent connections running 6 different parametrised queries between them on broadly the same range of data.
6 execution plans
100 user contexts
one buffer pool with assorted flags and counters to show usage of each data page
If you have 100 different queries or they are not parametrised then fix the code.
Memory per query is something I've never thought or cared about since the last millennium.
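On the parametrisation point above: a parametrised call looks roughly like this (the table, column and parameter names are illustrative), so that all the connections reuse one cached plan instead of compiling many near-duplicates:

-- one cached plan is reused for every value of @CustomerId
EXEC sp_executesql
    N'SELECT OrderId, OrderDate FROM dbo.Orders WHERE CustomerId = @CustomerId',
    N'@CustomerId int',
    @CustomerId = 42;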
I am trying to improve the performance of one of my queries.
My query is made up of 10 different SELECTs.
The actual production query is taking 36sec to execute.
If I display the execution plan, for one select I have a query cost of 18%.
So I replaced an IN clause (in this SELECT) with an XML-based query (http://www.codeproject.com/KB/database/InClauseAndSQLServer.aspx).
The new query now takes 28 sec to execute, but SQL Server tells me that the above SELECT has a query cost of 100%. And this is the only change I made. And there is no parallelism in any query.
PRODUCTION :
36sec, my select is 18% (the others are 10%).
NEW VERSION :
28sec, my select is 100% (the others are 0%).
Do you have any idea how SQL Server computes this "query cost"? (I'm starting to believe it's random or something like that.)
Query cost is a unitless measure of a combination of CPU cycles, memory, and disk IO.
Very often you will see operators or plans with a higher cost but faster execution time.
Primarily this is due to the difference in speed of the above three components. CPU and Memory are fairly quick, and also uncommon as bottlenecks. If you can shift some pressure from the disk IO subsystem to the CPU, the query may show a higher cost but should execute substantially faster.
If you want to get more detailed information about the execution of your specific queries, you can use:
SET STATISTICS IO ON
SET STATISTICS TIME ON
This will output detailed information about CPU cycles, plan creation, and page reads (both from disk and from memory) to the messages tab.
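For example, a run might be wrapped like this (the SELECT is a placeholder for one of the 10 statements); the timings and reads then appear on the Messages tab:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- the statement under test goes here
SELECT COUNT(*) FROM dbo.SomeLargeTable;   -- placeholder query

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;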