deleting 500 records from table with 1 million records shouldn't take this long - sql

I hope someone can help me. I have a simple SQL statement:
delete from sometable
where tableidcolumn in (...)
I have 500 records I want to delete and recreate. The table recently grew to over 1 million records. The problem is that the statement above is taking over 5 minutes without completing. I have a primary key and 2 non-clustered, non-unique indexes. My delete statement is using the primary key.
Can anyone help me understand why this statement is taking so long and how I can speed it up?

There are two areas I would look at first: locking and a bad plan.
Locking - run your query and, while it is running, see if it is being blocked by anything else: select * from sys.dm_exec_requests where blocking_session_id <> 0. If you see anything blocking your request, then I would start by looking at:
https://www.simple-talk.com/sql/database-administration/the-dba-as-detective-troubleshooting-locking-and-blocking/
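As a rough sketch of that blocking check (this just expands the query quoted above using the standard SQL Server DMVs; nothing here is specific to your schema):

-- who is blocked, what they are waiting on, and which session is blocking them
select r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       t.text as running_sql
from sys.dm_exec_requests r
cross apply sys.dm_exec_sql_text(r.sql_handle) t
where r.blocking_session_id <> 0;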
If there is no locking, then get the execution plan for the delete. What is it doing? Is the cost exceptionally high?
Other than that, how long do you expect it to take? Is it a little bit longer than that or a lot longer? Did it only get this slow after the table grew significantly, or has it been getting slower over a long period of time?
What is the I/O performance like? What are your average read/write times, and so on?

TL;DR: Don't do that (instead of a big 'in' clause, preload a temporary table and use that).
With that number of parameters, an unknown backend configuration (even though it should be fine by today's standards), and no way to guess what your in-memory size may be during processing, you may be hitting (in order) a stack, batch or memory size limit, starting with this answer. It is also possible to hit an instruction size limit.
The troubleshooting comments may lead you to another answer. My focus is the 'in' clause and statement size, and the fact that all of these links include advice to preload a temporary table and use that with your query.
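A minimal sketch of that temp-table approach, using the placeholder names from the question (sometable / tableidcolumn) rather than your real schema:

-- load the 500 ids once, then delete by joining on the primary key
create table #ids_to_delete (tableidcolumn int primary key);

insert into #ids_to_delete (tableidcolumn)
values (1), (2), (3); -- ...the rest of your 500 ids, or an insert ... select from wherever they come from

delete s
from sometable s
join #ids_to_delete d on d.tableidcolumn = s.tableidcolumn;

drop table #ids_to_delete;

The optimizer then gets a straightforward join on the primary key instead of a 500-element 'in' list to parse and plan around.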

Related

Postgres SQL sentence performance

I have a Postgres instance running on a 16-core/32 GB Windows Server workstation.
I followed performance improvement tips I saw in places like this: https://www.postgresql.org/docs/9.3/static/performance-tips.html.
When I run an update like:
analyze;
update amazon_v2
set states_id = amazon.states_id,
geom = amazon.geom
from amazon
where amazon_v2.fid = amazon.fid
where fid is the primary key in both tables and both have 68M records, it takes almost a day to run.
Is there any way to improve the performance of SQL sentences like this? Should I write a stored procedure to process it record by record, for example?
You don't show the execution plan, but I bet it's performing a Full Table Scan on amazon_v2 and using an Index Seek on amazon.
I don't see how to improve performance here, since it's close to optimal already. The only thing I can think of is to use table partitioning and parallelize the execution.
Another, totally different strategy is to update only the "modified" rows. Maybe you can track those to avoid updating all 68 million rows every time.
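To confirm the guess about the plan, plain EXPLAIN (without ANALYZE) shows the chosen plan without actually executing the update; the statement below is just the one from the question:

-- shows seq scan vs. index usage for the update without running it
EXPLAIN
UPDATE amazon_v2
SET states_id = amazon.states_id,
    geom = amazon.geom
FROM amazon
WHERE amazon_v2.fid = amazon.fid;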
Your query is executed in a very long transaction. The transaction may be blocked by other writers. Query pg_locks.
Long transactions have a negative impact on the performance of autovacuum. Does execution time increase over time? If so, check table bloat.
Performance usually increases when big transactions are divided into smaller ones. Unfortunately, the operation is then no longer atomic, and there is no golden rule on the optimal batch size.
You should also follow advice from https://stackoverflow.com/a/50708451/6702373
Let's sum it up:
Update modified rows only (if only a few rows are modified)
Check locks
Check table bloat
Check hardware utilization (related to other issues)
Split the operation into batches (see the sketch after this list).
Replace updates with delete/truncate & insert/copy (this works if the update changes most rows).
(if nothing else helps) Partition table
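As a hedged sketch of the batching idea (it assumes fid is a numeric key you can range over; the step of 1,000,000 is purely illustrative):

-- update one fid range per transaction so each batch stays small
UPDATE amazon_v2
SET states_id = amazon.states_id,
    geom = amazon.geom
FROM amazon
WHERE amazon_v2.fid = amazon.fid
  AND amazon_v2.fid >= 0
  AND amazon_v2.fid < 1000000;
-- commit, then repeat with the next range (1000000 to 2000000, and so on)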

Select LIMIT 1 takes long time on postgresql

I'm running a simple query on localhost PostgreSQL database and it runs too long:
SELECT * FROM features LIMIT 1;
I expect such a query to finish in a fraction of a second, as it basically says "peek anywhere in the database and pick one row". Or doesn't it?
table size is 75GB with estimated row count 1.84405e+008
I'm the only user of the database
the database server was just started, so I guess nothing is cached in memory
I totally agree with the comment #larwa1n left on your post.
The reason here, I guess, is simply that the SELECT on this table is too slow.
In my experience there may be other reasons as well. I list them below:
The table is too big, so add a WHERE clause and an index.
The performance of your server/disk drive is too slow.
Other processes are taking most of the resources.
Another possible reason is a maintenance task: check whether autovacuum is running. If it isn't, check whether this table has ever been vacuumed; if not, run a VACUUM FULL on it. When you do a lot of inserts/updates/deletes on a large table without vacuuming, the table ends up stored in fragmented disk blocks, which makes queries take longer.
Hopefully this answer will help you find the final reason.
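A quick sketch of that vacuum check (pg_stat_user_tables is a standard PostgreSQL view; the table name comes from the question):

-- when was the table last (auto)vacuumed, and how many dead rows does it carry?
SELECT relname, last_vacuum, last_autovacuum, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'features';

-- if it has never been vacuumed and is heavily bloated, reclaim the space
-- (VACUUM FULL rewrites the table and takes an exclusive lock, so plan for downtime)
VACUUM FULL features;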

BigQuery query taking a long time

A simple count query on one of my tables takes a long time to complete (~18 secs). This table has around half a million rows, while making the same query on a bigger table (around 3 million rows) takes less than 3 secs. The schema is exactly the same and the query is a simple SELECT count(*) FROM [dataset.table].
Any ideas why this is happening and what can I do to prevent it?
It looks like the issue with your table is that it was created in a lot of small chunks; this takes more work to query, since we spend a lot of time on filesystem operations (listing files and opening them).
Even so, a table the size of yours should not be so slow; BigQuery is currently experiencing high filesystem load that is causing high variability in latency. We're actively working on resolving this one. So that is the first problem.
The second problem is that we probably should do a better job of compacting the table. I've filed an internal bug that we should tweak our heuristics to be a bit more aggressive in compaction.
As a workaround, you can compact the table manually by copying the table in place. In other words, run a SELECT * from ... and write the output to the same table, using writeDisposition:WRITE_TRUNCATE, destinationTable:<your table>, allowLargeResults:true and flattenSchema:false.
Again, this last step shouldn't be needed, but for now it should improve your situation.
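As a rough sketch of that copy-in-place (legacy BigQuery SQL, matching the [dataset.table] form used in the question; the options mentioned above are set on the query job configuration, not in the SQL itself):

-- run as a query job configured with writeDisposition WRITE_TRUNCATE,
-- destinationTable pointing at the same table, allowLargeResults enabled,
-- and result flattening disabled
SELECT * FROM [dataset.table];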

View Clustered Index Seek over 0.5 million rows takes 7 minutes

Take a look at this execution plan: http://sdrv.ms/1agLg7K
It’s not estimated, it’s actual. From an actual execution that took roughly 30 minutes.
Select the second statement (takes 47.8% of the total execution time – roughly 15 minutes).
Look at the top operation in that statement – View Clustered Index Seek over _Security_Tuple4.
The operation costs 51.2% of the statement – roughly 7 minutes.
The view contains about 0.5M rows (for reference, log2(0.5M) ~= 19 – a mere 19 steps given the index tree node size is two, which in reality is probably higher).
The result of that operator is zero rows (doesn’t match the estimate, but never mind that for now).
Actual executions – zero.
So the question is: how the bleep could that take seven minutes?! (and of course, how do I fix it?)
EDIT: Some clarification on what I'm asking here.
I am not interested in general performance-related advice, such as "look at indexes", "look at sizes", "parameter sniffing", "different execution plans for different data", etc.
I know all that already, I can do all that kind of analysis myself.
What I really need is to know what could cause that one particular clustered index seek to be so slow, and then what could I do to speed it up.
Not the whole query.
Not any part of the query.
Just that one particular index seek.
END EDIT
Also note how the second and third most expensive operations are seeks over _Security_Tuple3 and _Security_Tuple2 respectively, and they only take 7.5% and 3.7% of time. Meanwhile, _Security_Tuple3 contains roughly 2.8M rows, which is six times that of _Security_Tuple4.
Also, some background:
This is the only database from this project that misbehaves.
There are a couple dozen other databases of the same schema, none of them exhibit this problem.
The first time this problem was discovered, it turned out that the indexes were 99% fragmented.
Rebuilding the indexes did speed it up, but not significantly: the whole query took 45 minutes before rebuild and 30 minutes after.
While playing with the database, I have noticed that simple queries like “select count(*) from _Security_Tuple4” take several minutes. WTF?!
However, they only took several minutes on the first run, and after that they were instant.
The problem is not connected to the particular server, neither to the particular SQL Server instance: if I back up the database and then restore it on another computer, the behavior remains the same.
First I'd like to point out a little misconception here: although the delete statement is said to take nearly 48% of the entire execution, this does not have to mean it takes 48% of the time needed; in fact, the 51% assigned inside that part of the query plan most definitely should NOT be interpreted as taking 'half of the time' of the entire operation!
Anyway, going by your remark that it takes a couple of minutes to do a COUNT(*) of the table "the first time", I'm inclined to say that you have an IO issue related to said table/view. Personally I don't like materialized views very much, so I have no real experience with them and how they behave internally, but normally I would suggest that fragmentation is taking its toll on the underlying storage system. The reason it works fast the second time is that it's much faster to access the pages from the cache than it was when fetching them from disk, especially when they are all over the place. (Are there any (max) fields in the view?)
Anyway, to find out what is taking so long, I'd suggest you take this code out of the trigger it's currently in, "fake" the inserted and deleted tables, and then try running the queries again, adding timestamps and/or using a program like SQL Sentry Plan Explorer to see how long each part REALLY takes (it has a duration column when you run a script from within the program).
It might well be that you're looking at the wrong part; experience shows that cost and actual execution times are not always as related as we'd like to think.
Observations include:
Is this the biggest of these databases that you are working with? If so, size matters to the optimizer. It will make quite a different plan for large datasets versus smaller data sets.
The estimated rows and the actual rows are quite divergent. This is most apparent in the fourth query, "delete c from #alternativeRoutes....", where _Security_Tuple5 is estimated to return 16 rows but actually returned 235,904 rows. For that many rows an Index Scan could be more performant than Index Seeks. Are the statistics on the table up to date, or do they need to be updated? (See the sketch after these observations.)
The "select count(*) from _Security_Tuple4" takes several minutes, the first time. The second time is instant. This is because the data is all now cached in memory (until it ages out) and the second query is fast.
Because the problem moves with the database, the statistics, any missing indexes, et cetera are in the database. I would also suggest checking that the indexes match those of the other databases using the same schema.
This is not a full analysis, but it gives you some things to look at.
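If stale statistics do turn out to be the problem, a minimal sketch of refreshing them (the object names are taken from the question, the dbo schema is an assumption, and FULLSCAN is optional):

-- refresh statistics on the suspect objects so estimates line up with actuals
UPDATE STATISTICS dbo._Security_Tuple4 WITH FULLSCAN;
UPDATE STATISTICS dbo._Security_Tuple5 WITH FULLSCAN;

-- or refresh everything in the database
EXEC sp_updatestats;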
Fyodor,
First:
The problem is not connected to the particular server, neither to the particular SQL Server instance: if I back up the database and then restore it on another computer, the behavior remains the same.
I presume that you: a) run this query in an isolated environment, and b) the data is not under mutation.
Is this correct?
Second: post here your CREATE INDEX script. Do you have a funny FILLFACTOR? SORT_IN_TEMPDB?
Third: what type are your ParentId and ObjectId columns? int, smallint, uniqueidentifier, varchar?

Select query takes 3 seconds to pull 330 records. Need Optimization

I have a normal select query that takes nearly 3 seconds to execute (SELECT * FROM users). There are only 310 records in the user table.
The configuration of my production server is:
SQL Server Express Edition
Server configuration: Pentium 4 HT, 3 GHz, 2 GB RAM
Column Name    Type    NULL    Comments
user_companyId int Not Null
user_userId int Not Null Primary Column
user_FirstName nvarchar(50) Null
user_lastName nvarchar(60) Null
user_logon nvarchar(50) Null
user_password nvarchar(255) Null
user_emailid nvarchar(255) Null
user_status bit Null
user_Notification bit Null
user_role int Null
user_verifyActivation nvarchar(255) Null
user_verifyEmail nvarchar(255) Null
user_loginattempt smallint Null
user_createdby int Null
user_updatedby int Null
user_createddate datetime Null
user_updateddate datetime Null
user_Department nvarchar(1000) Null
user_Designation nvarchar(1000) Null
As there is no WHERE clause this isn't down to indexes etc.; SQL Server will do a full table scan and return all the data. I'd be looking at other things running on the machine, or SQL Server having run for a long time and used up a lot of VM. The bottom line is that this isn't a SQL issue - it's a machine issue.
Is anything else happening on this machine?
Even if you made the worst possible data structure, SELECT * FROM Users should not take 3 seconds for 310 records. Either there are more (a lot more) records in the table, or there is some problem outside of SQL Server (some other process blocking the query, or hardware issues).
Well, indexes don't much matter here - you're getting a table scan with that query.
So, there are only a few other items that could be bogging this down. One is row size. What columns do you have? If you have tons of text or image columns, this could be causing a delay in bringing them back.
As for your hardware, what are your HDD's RPMs? Remember, this is reading off of a disk, so if there are any other IO tasks being carried out, this will obviously cause a performance hit.
There's a number of things you should consider:
Don't use the Express edition, it's probably called that for a reason. Go with a real DBMS (and, yes, this includes the non-express SQL Server).
Use of "select * from" is always a bad idea unless you absolutely need every column. Change it to get only the columns you need.
Are you using a "where" or "order by" clause? If so, you need to ensure you have indexes set up correctly (even for 330 rows, since tables always get bigger than you think).
Use EXPLAIN, or whatever tool Microsoft provides as an equivalent. It will show you why the query is running slow.
If your DB is on a different machine, there may be network issues (not necessarily problems, you may have stateful packet filters that slow down connections, for example).
Examine the system loads of the boxes you're using. If there are other processes using a lot of CPU grunt or disk I/O, they may be causing slowdown.
The best thing to do is profile the server and also pay attention to what kinds of locks are occurring during the query.
If you are using any isolation level options, or the default, determine if you can lower the isolation level, which will decrease the locking occurring on the table. The more locking that occurs, the more conflicts you will have where the query has to wait for other queries to complete. However, be very careful when lowering the isolation level, because you need to understand how that will affect the data you get back.
Determine if you can add where criteria to your query. Restricting the records you query can reduce locks.
Edit: If you find locking issues, you need to also evaluate any other queries that are being run.
If it's consistently 3 seconds, the problem isn't the query or the schema, barring a design so bad it would be obviously irrational to the designer. I'd guess it's hardware or network issues. Is there anything you can do with the database that doesn't take 3 seconds? What do SET STATISTICS IO ON and SET STATISTICS TIME ON tell you? I bet there's nothing there that supports 3 seconds.
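For reference, a tiny sketch of that check (nothing here beyond the standard SET options already mentioned):

-- show logical/physical reads and CPU/elapsed time for the statement
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT * FROM Users;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;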
Without a better indexing strategy, leaving certain columns out will only reduce the impact on the network, which shouldn't be awful for only 310 rows. My guess is that it's a locking issue.
So consider using:
SELECT * FROM Users WITH (NOLOCK);
This will mean that you don't respect any locks that are currently on the table by other connections. You may get 'dirty reads', seeing data which hasn't been committed yet - but it could be worthwhile from a performance perspective.
Let's face it - if you've considered the ramifications, then it should be fine...
Rob
The first thing you should do for any performance problem is to get an execution plan for the query - this is a representation of the run path SQL Server chooses when it executes your query. The best place to look for info on how to do this is Google - you want an actual plan with statistics, as it includes information about how many rows are returned.
This doesn't sound like a problem with the execution plan, however, as the query is so simple - in fact I'm fairly sure that query counts as a "trivial plan", i.e. there is only 1 possible plan.
This leaves locking or hardware issues (is the query only slow on your production database, and is it always slow or does the execution time vary?). The query will attempt to get a shared lock on the whole table - if anyone is writing, then you will be blocked from reading until the writer is finished. You can check whether this is the source of your problem by looking at the DMVs; see http://sqllearningsdmvdmf.blogspot.com/2009/03/top-10-resource-wait-times.html
Finally, there are restrictions on SQL Server Express in terms of CPU utilisation, memory use, etc. What is the load on your server like (operations per second)?
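A small sketch along the lines of that DMV link (sys.dm_os_wait_stats is a standard DMV; the TOP 10 cut-off is arbitrary):

-- top waits since the instance started, highest cumulative wait time first
SELECT TOP 10
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;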
Without the table structure to know what your table looks like, we can't answer such a question....
What about not using SELECT * FROM Users, but actually specifying which fields you really need from the table?
SELECT user_companyId, user_userId,
user_FirstName, user_lastName, user_logon
FROM Users
How does that perform? Do you still need 3 seconds for this query, or is it significantly faster?
If you really need all the users and all their attributes, then maybe that's just the time it takes on your system to retrieve that amount of data. The best way to speed things up is to limit the attributes retrieved (do you really need the user's photo?) by specifying a list of fields, and to limit the number of elements retrieved using a WHERE clause (do you REALLY need ALL users? Not just those that.........)
Marc
Is there a possible way that the performance may degrade based on the column size (the length of the data in the column)?
In your table you have the last 2 columns with a size of NVARCHAR(1000); are they always filled with that amount of data?
I'm not a SQL expert, but consider the packet size it's about to return for 310 records: with those 2 columns and without them it will be different.
I saw a similar post here on Stack Overflow; you can just go through this:
performance-in-sql