max memory per query - sql

How can I configure the maximum memory that a query (select query) can use in sql server 2008?
I know there is a way to set the minimum value but how about the max value? I would like to use this because I have many processes in parallel. I know about the MAXDOP option but this is for processors.
Update:
What I am actually trying to do is run some data load continuously. This data load is in the ETL form (extract transform and load). While the data is loaded I want to run some queries ( select ). All of them are expensive queries ( containing group by ). The most important process for me is the data load. I obtained an average speed of 10000 rows/sec and when I run the queries in parallel it drops to 4000 rows/sec and even lower. I know that a little more details should be provided but this is a more complex product that I work at and I cannot detail it more. Another thing that I can guarantee is that my load speed does not drop due to lock problems because I monitored and removed them.

There isn't any way of setting a maximum memory at a per query level that I can think of.
If you are on Enterprise Edition you can use resource governor to set a maximum amount of memory that a particular workload group can consume which might help.

In SQL 2008 you can use resource governor to achieve this. There you can set the request_max_memory_grant_percent to set the memory (this is the percent relative to the pool size specified by the pool's max_memory_percent value). This setting in not query specific, it is session specific.

In addition to Martin's answer
If your queries are all the same or similar, working on the same data, then they will be sharing memory anyway.
Example:
A busy web site with 100 concurrent connections running 6 different parametrised queries between them on broadly the same range of data.
6 execution plans
100 user contexts
one buffer pool with assorted flags and counters to show usage of each data page
If you have 100 different queries or they are not parametrised then fix the code.
Memory per query is something I've never thought or cared about since last millenium

Related

the faster way to extract all records from oracle

I have oracle table contain 900 million records , this table partioned to 24 partion , and have indexes :
i try to using hint and i put fetch_buffer to 100000:
select /+ 8 parallel +/
* from table
it take 30 minutes to get 100 million records
my question is :
is there are any way more faster to get the 900 million (all data in the table ) ? should i use partions and did 24 sequential queries ? or should i use indexes and split my query to 10 queries for example
The network is almost certainly the bottleneck here. Oracle parallelism only impacts the way the database retrieves the data, but data is still sent to the client with a single thread.
Assuming a single thread doesn't already saturate your network, you'll probably want to build a concurrent retrieval solution. It helps that the table is already partitioned, then you can read large chunks of data without re-reading anything.
I'm not sure how to do this in Scala, but you want to run multiple queries like this at the same time, to use all the client and network resources possible:
select * from table partition (p1);
select * from table partition (p2);
...
Not really an answer but too long for a comment.
A few too many variables can impact this to give informed advice, so the following are just some general hints.
Is this over a network or local on the server? If the database is remote server then you are paying a heavy network price. I would suggest (if possible) running the extract on the server using the BEQUEATH protocol to avoid using the network. Once the file(s) complete, is will be quicker to compress and transfer to destination than transferring the data direct from database to local file via JDBC row processing.
With JDBC remember to set the cursor fetch size to reduce round tripping - setFetchSize. The default value is tiny (10 I think), try something like 1000 to see how that helps.
As for the query, you are writing to a file so even though Oracle might process the query in parallel, your write to file process probably doesn't so it's a bottleneck.
My approach would be to write the Java program to operate off a range of values as command line parameters, and experiment to find which range size and concurrent instances of the Java give optimal performance. The range will likely fall within discrete partitions so you will benefit from partition pruning (assuming the range value is an a indexed column ideally the partition key).
Roughly speaking I would start with range of 5m, and run concurrent instances that match the number of CPU cores - 2; this is not a scientifically derive number just one that I tend to use as my first stab and see what happens.

How to view throughput using a SELECT?

This seems like it should be simple. I am trying to get an idea of the read transfer rate on a DB using a SELECT statement. e.g. I want 'x MB/sec'. The table to be used in this query has millions of rows.
I have tried using SET IO STATISTICS ON/OFF on either side of my SELECT but this doesn't return the transfer rate to me, I just get the number of rows affected and # of reads, writes.
A simple way I can think of is to load up perfmon on your machine and watch it while you're doing the select. This will give you the transfer rate to your machine from the DB.
If you want to know the IO to disk on the DB, then you'll probably have to stop all other loads, load up perfmon on the DB, and watch it while you're executing the select. This result is highly dependent on how much of the data is already in the cache.
If you can't isolate your select, then you can average what your baseline is and see how much more throughput there is during your select.
If you can't pull up perfmon, then you can see if the relevant counters are in sys.dm_os_performance_counters (http://technet.microsoft.com/en-us/library/ms187743.aspx).

libpq very slow for large (20 million record) database

I am new to SQL/RDBMS.
I have an application which adds rows with 10 columns in PostgreSQL server using the libpq library. Right now, my server is running on same machine as my visual c++ application.
I have added around 15-20 million records. The simple query of getting total count is taking 4-5 minutes using select count(*) from <tableName>;.
I have indexed my table with the time I am entering the data (timecode). Most of the time I need count with different WHERE / AND clauses added.
Is there any way to make things fast? I need to make it as fast as possible because once the server moves to network, things will become much slower.
Thanks
I don't think network latency will be a large factor in how long your query takes. All the processing is being done on the PostgreSQL server.
The PostgreSQL MVCC design means each row in the table - not just the index(es) - must be walked to calculate the count(*) which is an expensive operation. In your case there are a lot of rows involved.
There is a good wiki page on this topic here http://wiki.postgresql.org/wiki/Slow_Counting with suggestions.
Two suggestions from this link, one is to use an index column:
select count(index-col) from ...;
... though this only works under some circumstances.
If you have more than one index see which one has the least cost by using:
EXPLAIN ANALYZE select count(index-col) from ...;
If you can live with an approximate value, another is to use a Postgres specific function for an approximate value like:
select reltuples from pg_class where relname='mytable';
How good this approximation is depends on how often autovacuum is set to run and many other factors; see the comments.
Consider pg_relation_size('tablename') and divide it by the seconds spent in
select count(*) from tablename
That will give the throughput of your disk(s) when doing a full scan of this table. If it's too low, you want to focus on improving that in the first place.
Having a good I/O subsystem and well performing operating system disk cache is crucial for databases.
The default postgres configuration is meant to not consume too much resources to play nice with other applications. Depending on your hardware and the overall utilization of the machine, you may want to adjust several performance parameters way up, like shared_buffers, effective_cache_size or work_mem. See the docs for your specific version and the wiki's performance optimization page.
Also note that the speed of select count(*)-style queries have nothing to do with libpq or the network, since only one resulting row is retrieved. It happens entirely server-side.
You don't state what your data is, but normally the why to handle tables with a very large amount of data is to partition the table. http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html
This will not speed up your select count(*) from <tableName>; query, and might even slow it down, but if you are normally only interested in a portion of the data in the table this can be helpful.

How long should a query that returns 5 million records take?

I realise the answer should probably be 'as little time as possible' but I'm trying to learn how to optimise databases and I have no idea what an acceptable time is for my hardware.
For a start I'm using my local machine with a copy of sql server 2008 express. I have a dual-core processor, 2GB ram and a 64bit OS (if that makes a difference). I'm only using a simple table with about 6 varchar fields.
At first I queried the data without any indexing. This took a ridiculously long amount of time so I cancelled and added a clustered index (using the PK) to the table. This cut the time down to 1 minute 14 sec. I have no idea if this is the best I can get or whether I'm still able to cut this down even further?
Am I limited by my hardware or is there anything else I can do to my table/database/queries to get results faster?
FYI I'm only using a standard SELECT * FROM <Table> to retrieve my results.
EDIT: Just to clarify, I'm only doing this for testing purposes. I don't NEED to pull out all the data, I'm just using that as a consistent test to see if I can cut down the query times.
I suppose what I'm asking is: Is there anything I can do to speed up the performance of my queries other than a) upgrading hardware and b) adding indexes (assuming the schema is already good)?
I think you are asking the wrong question.
First of all - why do you need so many articles at one time on the local machine? What do you want to do with them? I'm asking because I think you want to transfer this of data to somewhere, so you should be measuring how long it takes to transfer the data.
Some advice:
Your applications should not select 5 million records at the time. Try to split your query and get the data in smaller sets.
UPDATE:
Because you are doing this for testing, I suggest that you
Remove * from your query - it takes SQL server some time to resolve this.
Put your data in temporary storage, try using VIEW or a temporary table for this.
Use plan caching on your server
to improve performance. But even if you're just testing, I still don't understand why you would need such tests if your application would never use such a query. Testing just for the sake of testing is a bad use of time
Look at the query execution plan. If your query is doing a table scan, it will obviously take a long time. The query execution plan can help you decide what kind of indexing you would need on the table. Also, creating table partitions can help sometimes in cases where the data is partitioned by a condition (usually date and time).
I did 5.5 million in 20 seconds. That's taking over 100k schedules with different frequencies and forecasting them for the next 25 years. Just max scenario testing, but proves the speed you can achieve in a scheduling system as an example.
The best optimized way depends on the indexing strategy you choose. As many of the above answers, i too would say partitioning the table would help sometimes. And its not the best practice to query all the billion record in a single time frame. Will give you much better results if you could try to query partially with the iterations. you may check this link to clear the doubts on the minimum requirements for the Sql server 2008 Minimum H/W and S/W Requirements for Sql server 2008
When fecthing 5 million rows you are almost 100% going spool to tempdb. you should try to optimize your temp Db by adding additional files. if you have multiple drives on seperate disks you should split the table data into different ndf files located on seperate disks. parititioning wont help when querying all the data on the disk
U can also use a query hint to force parrallelism MAXDOP this will increase the CPU utilization. Ensure that the columns contain few nulls as possible and rebuild ur indexes and stats

SQL Server single query memory usage

I would like to find out or at least estimate how much memory does a single query (a specific query) eats up while executing. There is no point in posting the query here as I would like to do this on multiple queries and see if there is a change over different databases. Is there any way to get this info?
Using SQL Server 2008 R2
thanks
Gilad.
You might want to take a look into DMV (Dynamic Management Views) and specifically into sys.dm_exec_query_memory_grants. See for example this query (taken from here):
DECLARE #mgcounter INT
SET #mgcounter = 1
WHILE #mgcounter <= 5 -- return data from dmv 5 times when there is data
BEGIN
IF (SELECT COUNT(*)
FROM sys.dm_exec_query_memory_grants) > 0
BEGIN
SELECT *
FROM sys.dm_exec_query_memory_grants mg
CROSS APPLY sys.dm_exec_sql_text(mg.sql_handle) -- shows query text
-- WAITFOR DELAY '00:00:01' -- add a delay if you see the exact same query in results
SET #mgcounter = #mgcounter + 1
END
END
While issuing the above query it will wait until some query is running and will collect the memory data. So to use it, just run the above query and after that your query that you want to monitor.
What do you mean by "how much memory a query eats up?", and why exactly do you want to know?
I don't think memory in SQL Server works the way you might imagine - memory management in SQL Server is an incredibly complex topic - you could easily write entire books about SQL Servers memory management. I can't claim to know that much about SQL Servers memory management, but I do know that there is pretty much no useful information that you can extrapolate from knowing how much memory a single query uses up.
That said, if you did want to have a go at understanding whats going on with memory when you execute a query then I would probably start with looking at the buffer pool. Nearly all memory in SQL Server is organised into 8KB chunks (the same size as a page) of memory that can be used to store anything from a data page or index page to a cached query plans. The buffer pool is the main memory component in SQL Server - All 8KB chunks of memory not in use elsewhere remains in the buffer pool to be used as a cache for data pages.
Note that in order for a data page or index page to be used it must exist in memory - this means that if it doesn't already exist in memory elsewhere ready for use, a free buffer must be made available to ready the page in to. The buffer pool serves both as a pool of "expendable" free buffers, and a cache of pages already present in memory.
You can examine whats in the buffer pool using DMVs, there is a suitable query listed on this page:
What's swimming in your bufferpool?
By cleaning out your buffer pool using the command DBCC DROPCLEANBUFFERS (DONT DO THIS ON A PRODUCTION SQL SERVER!!!) and then executing your query, in theory the new pages that appear in the buffer pool should be the pages that were used in the last query.
This can give you a rough idea of the data and index pages used in a query, however doesn't cover other areas of SQL Server where memory is used, such as in the query plan cache, SQL Server Workers etc..
Like I said, SQL Server memory management is complex - If you really want to know more I recommend that you buy a book on SQL Server internals.
Update: You can also use the query statistics to view the aggregate performance statistics for a query including "physical reads" (pages read from the disk) and "logcal reads" (pages read from the buffer pool). See this page for a suitable query.
This might also give you some more hints on how much memory a query is using, however beware - playing around I found queries that performed many more logical reads than they did physical reads, which as far as I can work out meant that they read the same pages over and over again, i.e. 100 logical reads != 100 pages used in the buffer pool.