Indexing SQL database slows down inserts too much - sql

I have two queries that take too long and time out when run from an Azure website.
1st:
SELECT Value FROM SEN.ValueTable WHERE OptId = @optId
2nd:
INSERT INTO SEN.ValueTable (Value, OptId)
SELECT Value, OptId FROM REF.ValueTable WHERE OptId = @optId
Both SELECTs always return 7,860 values. The problem is that I run around 10 of these queries with different @optId values. At first I ran without any indexes, and the 1st query would time out every now and then. I then added a non-clustered index to SEN.ValueTable, and the 2nd query began to time out.
The 1st query is run from an Azure VM; the 2nd query from an Azure Web App.
I've tried to increase the timeout through the .config files, but the queries still time out within 30 seconds. (There is no time limit from the customer; retrieving data from the SQL database will not be the slow part of the application anyway.)
Is there any way to speed this up or get rid of the timeouts? Will indexing REF.ValueTable speed up the insert at all?

First, the obvious solution is to add an index to SEN.ValueTable(OptId, Value) and to have no index on REF.ValueTable(OptId, Value). I think this gets around your performance problem.
More importantly, it should not be taking 30 seconds to fetch or insert 7,860 rows -- nothing like that. So, what else is going on? Is there a trigger on REF.ValueTable that might be slowing things down? Are there other constraints? Are the columns particularly wide? I mean, if Value is VARCHAR(MAX) and normally 100 Mbytes, then inserting values might be an issue.
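For reference, a minimal sketch of the index suggested above, plus a quick check for triggers on the tables involved (the index name is made up):

-- Covering index so the SELECT against SEN.ValueTable becomes a seek instead of a scan
CREATE NONCLUSTERED INDEX IX_SEN_ValueTable_OptId   -- hypothetical name
ON SEN.ValueTable (OptId)
INCLUDE (Value);

-- Look for triggers that could be slowing the INSERT down
SELECT t.name, t.is_disabled
FROM sys.triggers AS t
WHERE t.parent_id IN (OBJECT_ID('SEN.ValueTable'), OBJECT_ID('REF.ValueTable'));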

If you really run such a query:
SELECT Value, OptId
FROM REF.ValueTable
WHERE OptId = @optId;
The best index for it would be the following:
CREATE INDEX idx_ValueTable_OptId_Value
ON REF.ValueTable (OptId)
INCLUDE (Value);
Any index will slow inserts down but will benefit read queries. If you want a more elaborate answer, post more details: table DDLs and execution plans.

Try resumable online index rebuild -
https://azure.microsoft.com/en-us/blog/resumable-online-index-rebuild-is-in-public-preview-for-azure-sql-db/

Related

Executing the same simple select statement or stored procedure on SQL Azure takes a long time or times out

I have two SQL Server Azure instances with Standard S2: 50 DTUs. When I run simple select statements on the two instances, one of them takes more time than the other or times out. The slower one has more records in its tables.
Both instances have the same table schema. In the slower instance, the LogEvidence table has 1,324,928 records and the LogItem table has 649,391. In the faster instance, the LogEvidence table has 89,504 records and the LogItem table has 89,496.
Below is the simple select statement
select count(*) from logitem
The simple select statement above takes 0 s on the faster instance, while the slower instance takes 138 s. And if I execute any stored procedure, the slower instance takes more time or times out.
I would expect both instances to take almost the same time.
Those simple queries perform big scans on the table and involve reading all rows. If the table has a clustered index, you don't have to run a SELECT COUNT(*) to know the number of records the table has. The following query should do that faster:
SELECT OBJECT_NAME(ps.object_id) AS table_name, i.name AS index_name, ps.row_count
FROM sys.dm_db_partition_stats AS ps
INNER JOIN sys.indexes AS i
    ON ps.index_id = i.index_id AND ps.object_id = i.object_id
WHERE i.name LIKE '%logitem%'
If the table does not have an ID, add an auto-increment ID column to the table and make it the clustered index.
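If you go that route, a minimal sketch, assuming the table is dbo.logitem and has no clustered index yet (the column and constraint names are made up):

ALTER TABLE dbo.logitem ADD id BIGINT IDENTITY(1,1) NOT NULL;                 -- auto-increment id
ALTER TABLE dbo.logitem ADD CONSTRAINT PK_logitem PRIMARY KEY CLUSTERED (id); -- make it the clustered index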
You can also try adding a trivial WHERE clause like the one below to the query, and you may get better performance.
SELECT count(*)
FROM logitem
WHERE id > 0
Where Id is the autoid column.
I have some experience with Azure, and from your description I think there are a few things you can check:
Since you are only using COUNT, indexes play no role. I understand the other answer says to use WHERE id > 0, but Azure should be able to count 1M rows without a 30-second timeout. For other queries, though, you need indexes, or Azure will fail.
Check whether your server is under maintenance. It is unlikely, but it does happen to us: we are on S4, and occasionally our server just gets slow, then works fine again after 10-30 minutes. Maybe the underlying hardware is running some process that slows it down.
This is the most important reason for slow execution, especially if a lot of writes and deletes happen on your server. Check the database size. Azure databases get fragmented quickly; we have to fix index fragmentation every 10 days. If your bacpac size is 100 MB but your database size in Azure shows 5-6 GB, it definitely needs optimization, because a lot of fragmentation has built up. MSDN provides some queries to rebuild indexes and remove fragmentation; I don't remember the URL, but a simple Google search will bring it up (see the sketch after this list). It should speed things up.
Azure has a feature that auto-creates indexes. Check whether both tables share the same indexes; maybe your faster instance has an index that Azure created by itself.
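As a rough sketch of the fragmentation check and fix mentioned above (the table name and the thresholds are only illustrative):

-- Check fragmentation of every index in the current database
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 10;

-- Then either reorganize (moderate fragmentation) or rebuild (heavy fragmentation)
ALTER INDEX ALL ON dbo.LogItem REORGANIZE;
ALTER INDEX ALL ON dbo.LogItem REBUILD WITH (ONLINE = ON);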
You should step back and ponder your assumption:
1. "performance should be about the same" - you have more data in one case vs. the other. In the limit, you should expect the performance of the second one to potentially be somewhat slower than the original one.
Now, let's go into the "why" it can be slower and how you can investigate each case:
Step 1: Look at the query plans for each case and see what you have. Likely, you will have something like:
StreamAgg <- Clustered Index Scan
(if you have other b-tree indexes, you might scan one of them and it might be faster since the index would not be as wide and thus the index will have fewer pages to scan)
Step 2: You can look at the actual execution times and resource use for each query to see why they are different. One way to do this is to run "set statistics time on", then "set statistics io on", then run your query. It will dump extra information into SSMS when you run the query from there. (You can read about this here: https://learn.microsoft.com/en-us/sql/t-sql/statements/set-statistics-io-transact-sql?view=sql-server-2017)
If you review the output from each one, you may find reasons why the performance is different. One possible explanation is that the amount of memory is limited in an S2 and you are just at the boundary for where all the pages fit in memory vs. not for the two examples. In that case, doing a count(*) query would need to cycle through all the pages and do much more IO than in the smaller case where they might all be in memory already.
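A minimal sketch of that Step 2 workflow, run from SSMS against each instance (the table name comes from the question):

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT COUNT(*) FROM logitem;   -- compare CPU time, elapsed time and logical/physical reads between instances

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;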
Step 3: You can also potentially examine the query store to get insight into why one case is fast and one case is not. An overview of how to use it is here:
https://learn.microsoft.com/en-us/sql/relational-databases/performance/monitoring-performance-by-using-the-query-store?view=sql-server-2017
Note: it is on-by-default in SQL Azure so you can just go look at the time window when you ran the queries to get insight into what was happening at that time in your database.
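If you prefer to query it directly, a rough sketch against the Query Store catalog views could look like this (durations are reported in microseconds; the LIKE filter is just an example):

SELECT qt.query_sql_text,
       rs.count_executions,
       rs.avg_duration,           -- microseconds
       rs.avg_logical_io_reads
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q          ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p           ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
WHERE qt.query_sql_text LIKE '%logitem%'
ORDER BY rs.avg_duration DESC;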
Finally, you might consider ways to make the query go faster if you need it to be faster.
* creating a narrow b-tree index on the table may help for that one query (count(*) doesn't return any columns and just needs a count of rows from some non-filtered index).
* you could use a Columnstore (which requires an S3 or above for memory reasons). This kind of column-oriented index is optimized for this kind of query and would be much faster as the size of the table increases in the future.
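Sketches of both options, assuming logitem has a narrow NOT NULL column such as an id (the index names and the column choice are hypothetical):

-- Option 1: narrow b-tree index; COUNT(*) can scan this instead of the wide clustered index
CREATE NONCLUSTERED INDEX IX_logitem_narrow ON dbo.logitem (id);

-- Option 2: nonclustered columnstore index (S3 or above), built for scans and aggregates
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_logitem ON dbo.logitem (id);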
Hope that helps.

Deleting 500 records from a table with 1 million records shouldn't take this long

I hope someone can help me. I have a simple SQL statement:
delete from sometable
where tableidcolumn in (...)
I have 500 records I want to delete and recreate. The table recently grew to over 1 million records. The problem is that the statement above runs for over 5 minutes without completing. I have a primary key and 2 non-clustered, non-unique indexes. My delete statement is using the primary key.
Can anyone help me understand why this statement is taking so long and how I can speed it up?
There are two areas I would look at first: locking and a bad plan.
Locking - run your query, and while it is running, see if it is being blocked by anything else: "SELECT * FROM sys.dm_exec_requests WHERE blocking_session_id <> 0". If you see anything blocking your request, then I would start by looking at:
https://www.simple-talk.com/sql/database-administration/the-dba-as-detective-troubleshooting-locking-and-blocking/
If there is no locking, then get the execution plan for the delete. What is it doing? Is the cost exceptionally high?
Other than that, how long do you expect it to take? Is it a little bit longer than that or a lot longer? Did it only get so slow after it grew significantly or has it been getting slower over a long period of time?
What is the I/O performance? What are your average read/write times, etc.?
TL;DR: Don't do that (instead of a big 'in' clause: preload and use a temporary table).
With that number of parameters, an unknown backend configuration (even though it should be fine by today's standards), and no way to guess what your in-memory size may be during processing, you may be hitting (in order) a stack, batch, or memory size limit, starting with this answer. It is also possible to hit an instruction size limit.
The troubleshooting comments may lead you to another answer. My focus is the 'in' clause, statement size, and the fact that all of these links include advice to preload a temporary table and use that in your query.
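A minimal sketch of the temp-table approach, using the table and key column names from the question (the temp-table name is made up):

-- Preload the ids to delete instead of passing a huge IN (...) list
CREATE TABLE #ids_to_delete (tableidcolumn INT NOT NULL PRIMARY KEY);

INSERT INTO #ids_to_delete (tableidcolumn)
VALUES (1), (2), (3);   -- ...the 500 ids, or bulk-load them

DELETE s
FROM sometable AS s
JOIN #ids_to_delete AS d
    ON d.tableidcolumn = s.tableidcolumn;

DROP TABLE #ids_to_delete;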

SQL Table causes slow running Queries

I have a SQL table that has approximately 6,000 records with 17 columns each. If I do a basic search on the table (i.e. SELECT * FROM table_Orders), it takes 1.5 minutes to return all the records! Any query that I run using this table is also very slow. I have reindexed the table, so fragmentation is not an issue. This table has 2 nvarchar(max) columns in it that store XML data. Returning the table without these columns is extremely fast (less than 1 second). So I'm guessing it's the XML data that is bogging down the queries. Is there anything I can do to speed up the performance of queries that use columns with XML in them? Any insight will be greatly appreciated. I don't typically work with XML within SQL, so I don't even know where to start.
This sounds like network speed and NOT search times. Even a table scan of 6000 rows only takes a fraction of a second - to search all the rows. Returning those rows to a client though... you're downloading all that data, so you're going to see a difference when you retrieve a lot of it. This has nothing to do with "query performance" and there isn't anything you can do about it unless you can make the network faster or deliver less data.
You can test this by issuing queries searching for a key in your clustered index. Assuming you have a clustered index on RowID...
SELECT RowId, NonXmlColumn FROM table_Orders WHERE RowId = 3 -- or some other reasonable key
SELECT RowId, XmlColumn FROM table_Orders WHERE RowId = 3
The search time for those queries will be the same. So, any difference in speed can be attributed to the network.
Not to sound like a jerk, but I would not store that data in a table, and if I did, I would not store it in VARCHAR.
SQL Server has an xml data type: http://msdn.microsoft.com/en-us/library/hh403385.aspx
It says there are limitations; you should see which ones apply to your scenario.
If you need to preserve the XML, store it somewhere else and pull out of it whatever fields you'll need to search on.
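As a rough sketch of that last idea (the column names and XPath are purely hypothetical, and the conversion only works if the stored text is well-formed XML):

-- Keep the raw payload as xml instead of nvarchar(max)
ALTER TABLE dbo.table_Orders ALTER COLUMN OrderXml XML;

-- Promote the fields you search on into ordinary, indexable columns
ALTER TABLE dbo.table_Orders ADD CustomerId INT NULL;

UPDATE dbo.table_Orders
SET CustomerId = OrderXml.value('(/Order/CustomerId)[1]', 'INT');

CREATE INDEX IX_table_Orders_CustomerId ON dbo.table_Orders (CustomerId);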

Speed-up SQL Insert Statements

I am facing an issue with an ever-slowing process that runs every hour and inserts around 3-4 million rows daily into a SQL Server 2008 database.
The schema consists of a large table which contains all of the above data and has a clustered index on a datetime field (by day), a unique index on a combination of fields in order to exclude duplicate inserts, and a couple more indexes on 2 varchar fields.
The typical behavior as of late is that the insert statements get suspended for a while before they complete. The overall process used to take 4-5 minutes, and now it usually takes well over 40 minutes.
The inserts are executed by a .NET service that parses a series of XML files, performs some data transformations, and then inserts the data into the DB. The service has not changed at all; it's just that the inserts take longer than they used to.
At this point I'm willing to try everything. Please, let me know whether you need any more info and feel free to suggest anything.
Thanks in advance.
Sounds like you have exhausted the buffer pool's ability to cache all the pages needed for the insert process. Append-style inserts (like those into your date-clustered table) have a very small working set of just a few pages. Random-style inserts have basically the entire index as their working set. If you insert a row at a random location, the existing page that the row is supposed to be written to must be read first.
This probably means tons of disk seeks for inserts.
Make sure to insert all rows in one statement. Use bulk insert or TVPs. This allows SQL Server to optimize the query plan by sorting the inserts by key value, making I/O much more efficient.
That alone will, however, not realize the full speedup (though I have seen around 5x in similar situations). To regain the original performance you must bring the working set back into memory: add RAM, purge old data, or partition such that you only need to touch very few partitions.
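A minimal sketch of the TVP route (the type, procedure, table and column names are all made up; SQL Server 2008 supports table-valued parameters):

-- One-time setup: a table type shaped like the rows to insert
CREATE TYPE dbo.SalesRowType AS TABLE
(
    EventTime DATETIME    NOT NULL,
    Field1    VARCHAR(50) NOT NULL,
    Field2    VARCHAR(50) NOT NULL
);
GO

CREATE PROCEDURE dbo.InsertBatch
    @rows dbo.SalesRowType READONLY
AS
BEGIN
    -- One set-based insert per batch; SQL Server can order the rows by the clustered key
    INSERT INTO dbo.BigTable (EventTime, Field1, Field2)
    SELECT EventTime, Field1, Field2
    FROM @rows;
END;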
Drop the indexes before the insert and recreate them on completion.

How to quickly duplicate rows in SQL

Edit: I'm running SQL Server 2008.
I have about 400,000 rows in my table. I would like to duplicate these rows until my table has 160 million rows or so. I have been using a statement like this:
INSERT INTO [DB].[dbo].[Sales]
([TotalCost]
,[SalesAmount]
,[ETLLoadID]
,[LoadDate]
,[UpdateDate])
SELECT [TotalCost]
,[SalesAmount]
,[ETLLoadID]
,[LoadDate]
,[UpdateDate]
FROM [DB].[dbo].[Sales]
This process is very slow, and I have to re-issue the query a large number of times. Is there a better way to do this?
To do this many inserts you will want to disable all indexes and constraints (including foreign keys) and then run a series of:
INSERT INTO mytable
SELECT fields FROM mytable
If you need to specify ID, pick some number like 80,000,000 and include ID + 80000000 in the SELECT list. Run as many times as necessary (no more than 10, since it should double each time).
Also, don't run within a transaction. The overhead of doing so over such a huge dataset will be enormous. You'll probably run out of resources (rollback segments or whatever your database uses) anyway.
Then re-enable all the constraints and indexes. This will take a long time but overall it will be quicker than adding to indexes and checking constraints on a per-row basis.
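A rough sketch of that workflow (the nonclustered index name is hypothetical; do not disable the clustered index or primary key, or the table becomes unreadable):

-- Disable foreign keys and nonclustered indexes before the mass insert
ALTER TABLE [DB].[dbo].[Sales] NOCHECK CONSTRAINT ALL;
ALTER INDEX IX_Sales_SomeNonclustered ON [DB].[dbo].[Sales] DISABLE;

-- Run the doubling INSERT ... SELECT about 9 times
INSERT INTO [DB].[dbo].[Sales] (TotalCost, SalesAmount, ETLLoadID, LoadDate, UpdateDate)
SELECT TotalCost, SalesAmount, ETLLoadID, LoadDate, UpdateDate
FROM [DB].[dbo].[Sales];

-- Rebuild the disabled indexes and re-check the constraints afterwards
ALTER INDEX ALL ON [DB].[dbo].[Sales] REBUILD;
ALTER TABLE [DB].[dbo].[Sales] WITH CHECK CHECK CONSTRAINT ALL;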
Since each time you run that command it will double the size of your table, you would only need to run it about 9 times (400,000 * 2^9 = 204,800,000). Yes, it might take a while, because copying that much data takes some time.
The speed of the insert will depend on a number of things...the physical disk speed, indexes, etc. I would recommend removing all indexes from the table and adding them back when you're done. If the table is heavily indexed then that should help quite a bit.
You should be able to repeatedly run that query in a loop until the desired number of rows is achieved. Every time you run it you'll double the data, so you'll end up with:
400,000
800,000
1,600,000
3,200,000
6,400,000
12,800,000
25,600,000
51,200,000
102,400,000
204,800,000
After nine executions.
You don't state your SQL database, but most have a bulk loading tool to handle this scenario. Check the docs. If you have to do it with INSERTs, remove all indexes from the table first and reapply them after the data is INSERTed; this will generally be much faster than indexing during insertion.
This may still take a while to run... you might want to minimize logging (e.g. switch to the SIMPLE or BULK_LOGGED recovery model) while you create your data.
-- The CROSS JOIN against a TOP 400 subquery multiplies every row by 400,
-- so 400,000 rows become roughly 160,000,000 in one statement.
INSERT INTO [DB].[dbo].[Sales] (
[TotalCost] ,[SalesAmount] ,[ETLLoadID]
,[LoadDate] ,[UpdateDate]
)
SELECT s.[TotalCost] ,s.[SalesAmount] ,s.[ETLLoadID]
,s.[LoadDate] ,s.[UpdateDate]
FROM [DB].[dbo].[Sales] s (NOLOCK)
CROSS JOIN (SELECT TOP 400 totalcost FROM [DB].[dbo].[Sales] (NOLOCK)) o