Slow SELECT statement performance on a simple table - SQL

Using Management Studio, I have a table with the following six columns on my SQL Server:
FileID - int
File_GUID - nvarchar(258)
File_Parent_GUID - nvarchar(258)
File Extension - nvarchar(50)
File Name - nvarchar(100)
File Path - nvarchar(400)
It has a primary key on FileID.
This table has around 200M rows.
If I try to process the full data set, I receive a memory error.
So I have decided to load the data in partitions, using a SELECT statement for every 20M rows, splitting on the FileID number.
These selects take forever; the retrieval of rows is extremely slow and I have no idea why. There are no calculations whatsoever, just a pull of data using a SELECT.
When I ran the query analyzer I see:
Select cost = 0%
Clustered Index Cost = 100%
Do you guys have any idea why this could be happening, or maybe some tips I can apply?
My query:
SELECT * FROM Dim_TFS_File
Thank you!!

Monitor the query while it's running to see if it's blocked or waiting on resources. If you can't easily see where the bottleneck is while monitoring the database and client machines, I suggest you run a couple of simple tests to help identify where you should focus your efforts. Ideally, run the tests with no other significant activity and a cold cache.
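For the monitoring step, one quick check is sys.dm_exec_requests (requires VIEW SERVER STATE permission); this is a sketch of one possible query to run while the SELECT is executing, not the only way to monitor:
SELECT r.session_id, r.status, r.command, r.wait_type, r.wait_time,
       r.blocking_session_id, r.cpu_time, r.reads
FROM sys.dm_exec_requests r
WHERE r.session_id <> @@SPID; -- exclude this monitoring session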
First, run the query on the database server and discard the results. This can be done from SSMS with the discard results option (Query-->Query Options-->Results-->Grid-->Discard Results after execution). Alternatively, use a PowerShell script like the one below:
$connectionString = "Data Source=YourServer;Initial Catalog=YourDatabase;Integrated Security=SSPI;Application Name=Performance Test";
$connection = New-Object System.Data.SqlClient.SqlConnection($connectionString);
$command = New-Object System.Data.SqlClient.SqlCommand("SELECT * FROM Dim_TFS_File;", $connection);
$command.CommandTimeout = 0;
$sw = [System.Diagnostics.Stopwatch]::StartNew();
$connection.Open();
[void]$command.ExecuteNonQuery(); #this will discard returned results
$connection.Close();
$sw.Stop();
Write-Host "Done. Elapsed time is $($sw.Elapsed.ToString())";
Repeat the above test on the client machine. The difference in elapsed time reflects network data transfer overhead. If the client machine test is significantly faster than the application, focus your efforts on the app code. Otherwise, take a closer look at the database and network. Below are some notes that might help remediate performance issues.
This trivial query will likely perform a full clustered index scan. The limiting performance factors on the database server will be:
CPU: Throughput of this single-threaded query will be limited by the speed of a single CPU core.
Storage: The SQL Server storage engine will use read-ahead reads during large scans to fetch data asynchronously so that data will already be in memory by the time it is needed by the query. Sequential read performance is important to keep up with the query.
Fragmentation: Fragmentation will result in more disk head movement against spinning media, adding several milliseconds per physical disk IO. This is typically a consideration only for large sequential scans on single-spindle or low-end local storage arrays, not SSD or enterprise-class SAN. Fragmentation can be eliminated by reorganizing or rebuilding the clustered index. Be sure to specify MAXDOP 1 for rebuilds for maximum benefit.
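For example, a single-threaded rebuild might look like the following (the index name is a guess based on the FileID primary key; substitute the actual clustered index name):
ALTER INDEX PK_Dim_TFS_File ON dbo.Dim_TFS_File
REBUILD WITH (MAXDOP = 1);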
SQL Server streams results as fast as they can be consumed by the client app, but the client may be constrained by network bandwidth and latency. It seems you are returning many GB of data, which will take quite some time. You can reduce bandwidth needs considerably with different data types. For example, assuming the GUID-named columns actually contain GUIDs, using uniqueidentifier instead of nvarchar will save about 80 bytes per row over the network and on disk. Similarly, use varchar instead of nvarchar if you don't actually need Unicode characters, cutting the data size in half.
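A sketch of such a conversion, assuming every value really is a valid GUID (the statements will fail otherwise, and altering columns on a 200M-row table is itself a long, heavily logged operation):
ALTER TABLE dbo.Dim_TFS_File ALTER COLUMN File_GUID uniqueidentifier;
ALTER TABLE dbo.Dim_TFS_File ALTER COLUMN File_Parent_GUID uniqueidentifier;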
Client processing time: The time to process 20M rows in the app code will be limited by CPU and code efficiency (especially memory management). Since you ran out of memory, it seems you are either loading all rows into memory or have a leak. Even without an outright out-of-memory error, high memory usage can result in paging and greatly slow throughput. Importantly, database and network performance are moot if the app code can't process rows as fast as data are returned.

Related

What happens when a SQL query runs out of memory?

I want to set up a Postgres server on AWS; the biggest table will be 10GB. Do I have to select 10GB of memory for this instance?
What happens when my query result is larger than 10GB?
Nothing will happen: the entire result set is not loaded into memory. The maximum available memory will be used and re-used as needed while the result is prepared, and it will spill over to disk as needed.
See PostgreSQL resource documentation for more info.
Specifically, look at work_mem:
work_mem (integer)
Specifies the amount of memory to be used by internal sort operations and hash tables before writing to temporary disk files.
As long as you don't run out of working memory on a single operation or set of parallel operations, you are fine.
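For instance, you can inspect and raise it for the current session:
SHOW work_mem;
SET work_mem = '64MB'; -- applies per sort/hash operation, not per query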
Edit: The above was an answer to the question What happens when you query a 10GB table without 10GB of memory on the server/instance?
Here is an updated answer to the updated question:
Only server side resources are used to produce the result set
Assuming JDBC drivers are used, by default, the entire result set is sent to your local computer which could cause out of memory errors
This behavior can be changed by altering the fetch size through the use of a cursor.
Reference to this behavior here
Getting results based on a cursor
On the server side, with a simple query like yours, it just keeps a "cursor" which points to where it is as it spools the results to you, and uses very little memory. If there were sorts in there that couldn't use an index, that might use up a lot of memory, but I'm not sure. On the client side, the Postgres JDBC client by default loads the entire result set into memory before passing it back to you (this can be overcome by specifying a fetch count).
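The plain-SQL equivalent of a small fetch count is a server-side cursor fetched in batches; a minimal sketch (names are illustrative):
BEGIN;
DECLARE big_cur CURSOR FOR SELECT * FROM my_big_table;
FETCH 10000 FROM big_cur; -- repeat until no rows are returned
CLOSE big_cur;
COMMIT;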
With more complex queries (for example: give me all 100M rows, but order them by X, where X is not indexed), I don't know for certain, but internally it probably creates a temp table (so it won't run out of RAM), which, treated as a normal table, uses disk backing. If there's a matching index, it can just traverse that using a pointer, which still uses little RAM.

INSERT INTO goes much slower with time in SQL Server 2012

We have a very big database, WriteDB, which stores raw trading data, and we use this table for fast writes. Then, with SQL scripts, I import data from WriteDB into ReadDB into roughly the same table, but extended with some extra values and an added relation. The import script is like this:
TRUNCATE TABLE [ReadDB].[dbo].[Price]
GO
INSERT INTO [ReadDB].[dbo].[Price]
SELECT a.*, 0 as ValueUSD, 0 as ValueEUR
from [WriteDB].[dbo].[Price] a
JOIN [ReadDB].[dbo].[Companies] b ON a.QuoteId = b.QuoteID
So initially there are around 130 million rows in this table (~50GB). Each day some rows are added and some change, so for now we decided not to overcomplicate the logic and just re-import all the data. The problem is that, for some reason, this script takes longer and longer on almost the same amount of data. The first run took ~1h; now it already takes 3h.
Also, SQL Server does not behave well after the import. After the import (or during it), if I try to run different queries, even the simplest ones often fail with timeout errors.
What is the reason for such bad behavior, and how do I fix it?
One theory is that your first 50GB dataset has filled available memory for caching. Upon truncating the table, your cache is now effectively empty. This alternating behavior makes effective use of the cache difficult and incurs a substantial number of cache misses / increased IO time.
Consider the following sequence of events:
You load your initial dataset into WriteDb. During the load operation, pages in WriteDb are cached. There's very little memory contention because there's only one copy of the dataset and sufficient memory.
You initially populate ReadDb. The pages required to populate ReadDb (the data in WriteDb) are already largely cached. Fewer reads are required from disk, and your IO time can be dedicated to writing the inserted data for ReadDb. (This is your fast first run.)
You load your second dataset into WriteDb. During the load operation, there is insufficient memory to cache both existing data in ReadDb and new data written to WriteDb. This memory contention leads to fewer pages of WriteDb cached.
You truncate ReadDb. This invalidates a substantial portion of your cache (i.e. the 50GB of ReadDb data that was cached).
You then attempt your second load of ReadDb. Here you have very little of WriteDb cached, so your IO time is split between reading pages of WriteDb (your query) and writing pages of ReadDb (your insert). (This is your slow second run.)
You could test this theory by comparing the SQL Server cache miss ratio during your first and second load operations.
Some ways to improve performance might be to:
Use separate disk arrays for ReadDb / WriteDb to increase parallel IO performance.
Increase the available cache (amount of server memory) to accommodate the combined size of ReadDb + WriteDb and minimize cache misses.
Minimize the impact of each load operation on existing cached pages by using a MERGE statement instead of dumping / loading 50GB of data at a time (see the sketch below).
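A hypothetical MERGE sketch (the key and value columns are invented here, since the real Price schema isn't shown):
MERGE [ReadDB].[dbo].[Price] AS tgt
USING (
    SELECT a.PriceId, a.QuoteId, a.Value -- assumed columns
    FROM [WriteDB].[dbo].[Price] a
    JOIN [ReadDB].[dbo].[Companies] b ON a.QuoteId = b.QuoteID
) AS src
ON tgt.PriceId = src.PriceId -- assumed key
WHEN MATCHED THEN
    UPDATE SET tgt.Value = src.Value
WHEN NOT MATCHED BY TARGET THEN
    INSERT (PriceId, QuoteId, Value, ValueUSD, ValueEUR)
    VALUES (src.PriceId, src.QuoteId, src.Value, 0, 0);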

Oracle 10g Full table scan(parallel access) 100x times faster than index access by rowid

There was a query in production which had been running for several (5-6) hours. I looked into its execution plan and found that it was ignoring a parallel hint on a huge table, because it was using TABLE ACCESS BY INDEX ROWID. After I added a /*+ full(huge_table) */ hint before the parallel(huge_table) hint, the query started running in parallel and finished in less than 3 minutes. What I could not fathom was the reason for this HUGE difference in performance.
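For reference, the hint combination described looks roughly like this (the parallel degree is illustrative):
SELECT /*+ FULL(huge_table) PARALLEL(huge_table 8) */ *
FROM huge_table;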
The following are the advantages of parallel FTS I can think of:
Parallel operations are inherently fast if you have more idle CPUs.
Parallel operations in 10g use direct I/O, which bypasses the buffer cache, which means there is no risk of "buffer busy waits" or any other contention for the buffer cache.
Sure, there are the above advantages, but then again the following disadvantages are still there:
Parallel operations still have to do I/O, and this I/O would be more than what we have for TABLE ACCESS BY INDEX ROWID, as the entire table is scanned, which is costlier (all physical reads).
Parallel operations are not very scalable, which means that if there aren't enough free resources, they are going to be slow.
With the above knowledge at hand, I see only one reason that could have caused the poor performance when the query used ACCESS BY INDEX ROWID - some sort of contention like "buffer busy waits". But that doesn't show up in the AWR top 5 wait events; the top two events were "db file sequential read" and "db file scattered read". Is there something else I have failed to take into consideration? Please enlighten me.
First, without knowing anything about your data volumes, statistics, the selectivity of your predicates, etc., I would guess that the major benefit you're seeing is from doing a table scan rather than trying to use an index. Indexes are not necessarily fast and table scans are not necessarily slow. If you are using a rowid from an index to access a row, Oracle is limited to doing single block reads (sequential reads in Oracle terms), and it's going to have to read the same block many times if the block has many rows of interest. A full table scan, on the other hand, can do nice, efficient multiblock reads (scattered reads in Oracle terms). Sure, an individual single block read is going to be more efficient than a single multiblock read, but the multiblock read is much more efficient per byte read. Additionally, if you're using an index, you've potentially got to read a number of blocks from the index periodically to find out the next rowid to read from the table.
You don't actually need to read all that much data from the table before a table scan is more efficient than an index. Depending on a host of other factors, the tipping point is probably in the 10-20% range (that's a very, very rough guess). Imagine that you had to get a bunch of names from the phone book and that the phone book had an index that included the information you're filtering on and the page that the entry is on. You could use an index to find the name of a single person you want to look at, flip to the indicated page, record the information, flip back to the index, find the next name, flip back, etc. Or you could simply start at the first name, scan until you find a name of interest, record the information, and continue the scan. It doesn't take too long before you're better off ignoring the index and just reading from the table.
Adding parallelism doesn't reduce the amount of work your query does (in fact, adding in parallel query coordination means that you're doing more work). It's just that you're doing that work over a shorter period of elapsed time by using more of the server's available resources. If you're running the query with 6 parallel slaves, that could certainly allow the query to run 5 times faster overall (parallel query obviously scales a bit less than linearly because of overheads). If that's the case, you'd expect that doing a table scan made the query 20 times faster and adding parallelism added another factor of 5 to get your 100x improvement.

Performance of queries using count(*) on tables with many rows (300 million+)

I understand there are limitations to using sqlite, but I'd like to know if it should be able to handle this scenario.
My table has over 300 million records and the db is about 12 gigs. The data import utility with sqlite is nice and fast. But then I added an index to a string column in this table, and it ran all night to complete that operation. I haven't compared this to other dbs, but it seemed quite slow to me.
Now that my index is added, I want to look for duplicates in the data. So I'm trying to run a "having count(*) > 1" query and it seems to be taking hours as well. My query looks like:
select col1, count(*)
from table1
group by col1
having count(*) > 1
I would assume this query would use my index on col1, but the slow query execution makes me wonder whether it does.
Would SQL Server perhaps handle this kind of thing better?
SQLite's count() isn't optimized - it does a full table scan even when an index exists. Run EXPLAIN QUERY PLAN to verify, and you'll see:
EXPLAIN QUERY PLAN SELECT COUNT(FIELD_NAME) FROM TABLE_NAME;
I get something like this:
0|0|0|SCAN TABLE TABLE_NAME (~1000000 rows)
But then I added an index to a string column in this table, and it ran all night to complete this operation. I haven't compared this to other db's, but seemed quite slow to me.
I hate to tell you, but what does your server look like? Not to argue, but this is a potentially very resource-intensive operation that may require a lot of IO, and normal computers or cheap web servers with a slow hard disc are not suited for significant database work. I run database projects of hundreds of gigabytes, and my smallest "large data" server has 2 SSDs and 8 Velociraptors for data and log. The largest one has 3 storage nodes with a total of 1000GB of SSD discs - simply because IO is what a db server lives and breathes on.
So I'm trying to run a "having count(*) > 1" query and it seems to be taking hours as well
How much RAM? Enough to fit it all in memory, or a low-memory virtual server where the missing memory blows up into bad IO? How much memory can/does SQLite use? How is the temp storage set up? In memory? SQL Server would possibly use a lot of memory / tempdb space for this type of check.
Increase the SQLite cache via PRAGMA cache_size=<number of pages>. The memory used is <number of pages> times <size of page> (which can be set via PRAGMA page_size=<size of page>).
By setting those values to 16000 and 32768 respectively (or about 512MB), I was able to get one program's bulk load down from 20 minutes to 2 minutes. (Although I think that if the disk on that system wasn't so slow, this might not have had as much effect.)
You might not have this extra memory available on lesser embedded platforms, so I don't recommend increasing it as much as I did on those, but for desktop or laptop level systems it can greatly help.
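For reference, those settings as SQLite statements (note that page_size only takes effect on a brand-new database or after VACUUM):
PRAGMA page_size = 32768;
PRAGMA cache_size = 16000; -- 16000 pages x 32KB/page ~= 512MB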

SQL Server single query memory usage

I would like to find out, or at least estimate, how much memory a single query (a specific query) eats up while executing. There is no point in posting the query here, as I would like to do this for multiple queries and see if there is a change across different databases. Is there any way to get this info?
Using SQL Server 2008 R2
thanks
Gilad.
You might want to take a look at DMVs (Dynamic Management Views), and specifically at sys.dm_exec_query_memory_grants. See for example this query (taken from here):
DECLARE @mgcounter INT
SET @mgcounter = 1

WHILE @mgcounter <= 5 -- return data from the DMV 5 times when there is data
BEGIN
    IF (SELECT COUNT(*)
        FROM sys.dm_exec_query_memory_grants) > 0
    BEGIN
        SELECT *
        FROM sys.dm_exec_query_memory_grants mg
        CROSS APPLY sys.dm_exec_sql_text(mg.sql_handle) -- shows query text
        -- WAITFOR DELAY '00:00:01' -- add a delay if you see the exact same query in results
        SET @mgcounter = @mgcounter + 1
    END
END
When you run the above query, it will loop until some query with a memory grant is running and collect its memory data. So to use it, run the above query, and after that run the query you want to monitor.
What do you mean by "how much memory a query eats up?", and why exactly do you want to know?
I don't think memory in SQL Server works the way you might imagine - memory management in SQL Server is an incredibly complex topic; you could easily write entire books about SQL Server's memory management. I can't claim to know that much about SQL Server's memory management, but I do know that there is pretty much no useful information you can extrapolate from knowing how much memory a single query uses.
That said, if you did want to have a go at understanding what's going on with memory when you execute a query, then I would probably start by looking at the buffer pool. Nearly all memory in SQL Server is organised into 8KB chunks (the same size as a page) that can be used to store anything from a data page or index page to a cached query plan. The buffer pool is the main memory component in SQL Server - all 8KB chunks of memory not in use elsewhere remain in the buffer pool to be used as a cache for data pages.
Note that in order for a data page or index page to be used, it must exist in memory - this means that if it doesn't already exist in memory elsewhere ready for use, a free buffer must be made available to read the page into. The buffer pool serves both as a pool of "expendable" free buffers and as a cache of pages already present in memory.
You can examine what's in the buffer pool using DMVs; there is a suitable query listed on this page:
What's swimming in your bufferpool?
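A common query of this kind (a sketch, not necessarily the one from that page) aggregates sys.dm_os_buffer_descriptors by database:
SELECT DB_NAME(database_id) AS database_name,
       COUNT(*) * 8 / 1024 AS cached_mb -- 8KB pages converted to MB
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY cached_mb DESC;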
By cleaning out your buffer pool using the command DBCC DROPCLEANBUFFERS (DON'T DO THIS ON A PRODUCTION SQL SERVER!) and then executing your query, in theory the new pages that appear in the buffer pool should be the pages used by the last query.
This can give you a rough idea of the data and index pages used in a query; however, it doesn't cover other areas of SQL Server where memory is used, such as the query plan cache and SQL Server workers.
Like I said, SQL Server memory management is complex - If you really want to know more I recommend that you buy a book on SQL Server internals.
Update: You can also use the query statistics to view aggregate performance statistics for a query, including "physical reads" (pages read from disk) and "logical reads" (pages read from the buffer pool). See this page for a suitable query.
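A sketch of such a query (not necessarily the one on the linked page), listing the top statements by logical reads:
SELECT TOP 10
       qs.total_logical_reads,
       qs.total_physical_reads,
       qs.execution_count,
       st.text AS query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_logical_reads DESC;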
This might also give you some more hints on how much memory a query is using; however, beware - playing around, I found queries that performed many more logical reads than physical reads, which as far as I can work out means they read the same pages over and over again, i.e. 100 logical reads != 100 pages used in the buffer pool.