Assume a huge table with several hundred million records whose columns are well indexed. Is there any performance difference between
SELECT * FROM HUGE_TABLE WHERE ... AND ... FOR UPDATE
and
SELECT * FROM HUGE_TABLE WHERE ... AND ...
The main reason for the FOR UPDATE clause is that several instances of the application may run the same query at the same time, and we need to avoid update conflicts.
At this point I am concerned about two performance issues: 1. If no other query is running, is SELECT ... FOR UPDATE slower? 2. If there are many other active SELECT/UPDATE queries against the huge table, what is the performance impact on the table as a whole (updates included)?
SELECT ... FOR UPDATE creates at least two performance issues compared to a regular SELECT:
Blocking other sessions. This is a bit obvious and you already understand this, but it's worth mentioning that of course creating more locks can cause performance issues. The good news is that Oracle never escalates locks, so it will only lock exactly what you ask it to.
Writing lock data. The FOR UPDATE works by making a small update to each relevant block. It acts like a regular change in some ways, and will create redo and undo records as you can see by running queries like select used_urec from gv$transaction;. Depending on how many rows are locked, this could be significantly more expensive than a regular SELECT, even if no other sessions are involved.
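You can see the undo cost for yourself. A minimal sketch, assuming an Oracle session with access to gv$transaction (huge_table and region_id are hypothetical names):
-- Lock the matching rows; writing the lock bytes into each block generates undo
SELECT * FROM huge_table WHERE region_id = 42 FOR UPDATE;
-- Still inside the open transaction: used_urec grows with the number of rows locked
SELECT used_urec FROM gv$transaction;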
I am the end-user of a highly updated Microsoft SQL Server DB containing dozens of tables with hundreds of millions of rows each.
A banking DB is a good example of what I am working with, with the exception that in my DB UPDATE statements are rarely used and INSERT statements are used frequently (once a row has entered a table, it rarely changes).
Personally, I am not using any UPDATE/INSERT statements, only SELECT statements (with complex WHERE/JOIN/CROSS/GROUP clauses).
I have some questions about locking and using NOLOCK/READPAST.
1. How can I know whether a query I am using locks only a row or the entire table?
For example, I noticed this query didn't block other users from inserting new data into the table:
SELECT *
FROM Table
while this query did:
SELECT COUNT(Date)
FROM Table
These are of course just examples, not the actual full queries I am using.
As I mentioned, rows rarely change, so locking a row doesn't concern me, but locking a table is highly concerning.
2. I would like to know the risks of using NOLOCK/READPAST in my queries (to remove any concern I might have about blocking a table from updating).
I searched a lot but could not find a full answer.
I don't care if by using NOLOCK/READPAST I might get stale data (again, data rarely changes) or miss some newly added data.
I did read in a couple of places that using NOLOCK might cause duplicate or corrupted data; that would be a problem for me.
3. What exactly is the difference between READPAST and NOLOCK? Which one is "safer" regarding the concerns mentioned above?
Thank you.
This is highly dependent on your server's settings. Generally speaking, you want to lock records even when you are just reading them, because you don't want data to change while you are reading it. This isn't just something that affects updated records, but also inserts. You can learn more about read committed and snapshot isolation here:
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql/snapshot-isolation-in-sql-server
Both NOLOCK and READPAST should be avoided at all costs. There is a very small handful of scenarios where these make sense, but they are exceedingly rare. You are better off optimizing your query to perform better, reducing the number of records being locked and the time those records spend being locked. One case where I can see NOLOCK being useful would be a log table that only receives inserts, where your query doesn't join the data to other tables, AND a dirty record wouldn't cause problems.
NOLOCK doesn't take locks on the records it reads. The risk here is that records you are reading can literally change mid-read. This means you can begin reading a record and get some column values from before an update was made and some from after it. If another transaction rolls back, you could end up reading records that were never actually committed to the database.
READPAST skips any rows that are locked. If another query runs and its criteria cause rows 1-25 of 100 to be locked while you are querying the same data, you are only going to see records 26-100. To your query, locked rows don't exist.
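For reference, here is how the two hints look in practice, reusing the COUNT query from the question (a sketch, not a recommendation):
-- Dirty read: may include uncommitted rows, or rows changing mid-read
SELECT COUNT([Date]) FROM [Table] WITH (NOLOCK);
-- Skips locked rows entirely: to this query, locked rows don't exist
SELECT COUNT([Date]) FROM [Table] WITH (READPAST);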
Great article with the details:
https://www.mssqltips.com/sqlservertip/4468/compare-sql-server-nolock-and-readpast-table-hints/
You would be far better served by spending time learning to optimize your queries to reduce the number of records they need to lock, and improving the performance so that the amount of time those locks exist is kept to a minimum.
I have a database table with about 2,500 rows in it, which is frequently read by my web application. Will it affect the performance of reading from that table if all of the data in it is frequently (e.g. every 1-5 minutes) deleted and re-inserted?
By that I mean:
DELETE FROM MyTable
INSERT INTO MyTable SELECT ...
Probably not, at the given numbers ...
However, if you have one or more indexes on your table (to help with reads/selects, or created automatically for any PK/UK), consider that every delete/insert also has to maintain each such index (on top of the delete/insert itself). This doesn't directly affect table reads as such, but it adds to the overall load on the DB server.
There is no source code, but it appears you are using this table as an intermediate/interface to something else. So while 'updating', you'd want to bundle your delete(s)/insert(s) into transactions as best you can, rather than executing them all individually, e.g. in a loop (see the sketch below). Or see if you can keep your PKs and just update the rows instead?
This could also help reduce fragmentation in the underlying storage ...
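A minimal sketch of the bundling idea, assuming SQL Server syntax (SourceTable and the column list are hypothetical stand-ins for the real source):
-- One transaction: readers never observe a half-refreshed table
BEGIN TRANSACTION;
    DELETE FROM MyTable;
    INSERT INTO MyTable (Id, Name)
    SELECT Id, Name FROM SourceTable;
COMMIT TRANSACTION;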
We face the following situation (Teradata):
Business layer frequently executes long-running queries on Table X_Past UNION ALL Table X_Today.
Table X_Today gets updated frequently, say once every 10 minutes. X_Past is updated only once, after midnight (via a full load).
Writing process should not block reading process.
Writing should happen as soon as new data is available.
Proposed approach:
2 "Today" and a "past" table, plus a UNION ALL view that selects from one of them based on the value in a load status table.
X_Today_1
X_Today_0
X_Past
The loading process will load X_Today_1 and set the active_table value in the load status table to "X_Today_1";
the next time it will load X_Today_0 and set the active_table value to "X_Today_0";
etc.
The view that is used to select on the table will be built as follows:
select *
from X_PAST
UNION ALL
select td1.*
from X_Today_1 td1
, ( select active_table from LOAD_STATUS ) active_tab1
where active_tab1.active_table = 'X_Today_1'
UNION ALL
select td0.*
from X_Today_0 td0
, ( select active_table from LOAD_STATUS ) active_tab0
where active_tab0.active_table = 'X_Today_0'
My main questions:
When executing the select, will there be a lock on ALL tables, or only on those that are actually accessed for data? Because of the WHERE clause, data from one of the Today_1/0 tables will always be ignored, and that table should remain available for loading.
Do we need any explicit form of locking, or is the default locking mechanism what we want (which I suspect it is)?
Will this work, or am I overlooking something?
It is important that the loading process waits if the reading process takes longer than 20 minutes and the loader is about to refresh the second table again. The reading process should never really be blocked, except maybe by itself.
Any input is much appreciated...
thank you for your help.
A few comments on your questions:
Depending on the query structure, the Optimizer will try to get the default locks (in this case a READ lock) at different levels -- most likely table or row-hash locks. For example, if you do a SELECT * FROM my_table WHERE PI_column = 'value', you should get a row-hash lock and not a table lock.
Try running an EXPLAIN on your SELECT and see if it gives you any locking info. The Optimizer might be smart enough to determine there are 0 rows in one of the joined tables and reduce the lock requests. If it still locks both tables, see the end of this post for an alternative approach.
Your query written as-is will result in READ locks, which would block any WRITE requests on the tables. If you are worried about locking issues / concurrency, have you thought about using an explicit ACCESS lock? This would allow your SELECT to run without ever having to wait for your write queries to complete. This is called a "dirty read", since there could be other requests still modifying the tables while they are being read, so it may or may not be appropriate depending on your requirements.
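A common way to do this in Teradata is to put the locking modifier into the view itself, so every reader gets it automatically. A sketch using the tables from the question (X_AllData is a hypothetical view name; the status-table filter is omitted for brevity):
REPLACE VIEW X_AllData AS
LOCKING TABLE X_Past    FOR ACCESS
LOCKING TABLE X_Today_1 FOR ACCESS
LOCKING TABLE X_Today_0 FOR ACCESS
SELECT * FROM X_Past
UNION ALL
SELECT * FROM X_Today_1
UNION ALL
SELECT * FROM X_Today_0;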
Your approach seems feasible. You could also do something similar, but instead of having two UNIONs, have a single "X_Today" view that points to the "active" table. After your load process completes, you could re-point the view to the appropriate table as needed via a MACRO call:
-- macros (switch between active / loading)
REPLACE MACRO switch_to_today_table_0 AS (
  REPLACE VIEW X_Today AS SELECT * FROM X_Today_0;
);
REPLACE MACRO switch_to_today_table_1 AS (
  REPLACE VIEW X_Today AS SELECT * FROM X_Today_1;
);
-- SELECT query
SELECT * FROM X_PAST UNION ALL SELECT * FROM X_Today;
-- Write request
MERGE INTO x_today_0...;
-- Switch active "today" table to most recently loaded one
EXEC switch_to_today_table_0;
You'd have to manage which table to write to (or possibly do that using a view too) and which "switch" macro to call within your application.
One thing to think about is that having two physical tables that logically represent the same table (i.e. should have the same data) may potentially allow for situations where one table is missing data and needs to be manually synced.
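If that's a concern, a quick consistency check can catch drift between the two tables (a sketch assuming a key column named id):
-- Rows loaded into X_Today_1 but missing from X_Today_0
SELECT id FROM X_Today_1
MINUS
SELECT id FROM X_Today_0;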
Also, if you haven't looked at them already, a few ideas to optimize your SELECT queries to run faster: row partitioning, indexes, compression, statistics, primary index selection.
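For example, row partitioning plus collected statistics might look like this (a sketch; the columns and date range are hypothetical):
CREATE TABLE X_Today_1 (
    id      INTEGER NOT NULL,
    load_dt DATE NOT NULL,
    payload VARCHAR(100)
)
PRIMARY INDEX (id)
PARTITION BY RANGE_N(load_dt BETWEEN DATE '2024-01-01'
                             AND     DATE '2026-12-31'
                             EACH INTERVAL '1' DAY);

COLLECT STATISTICS COLUMN (id) ON X_Today_1;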
We have a fairly wide table BaseData with some 33 million rows in it. Then we have an update query that joins it to several other tables containing all kinds of parameters, some functions are applied, there is a group by the original Id, and then the results are written back to the BaseData table in a few columns.
This process is very slow, so I'm looking into ways of speeding it up. Most of my experience is with SQL Server, so I don't yet know these kinds of Oracle internals.
One thing I suspect is that during the update Oracle creates versions of every row so any other readers can read the unaffected row. This however takes up considerable resources. Is there any way to have the update take a write lock on the table so it wouldn't create versions of every row?
Any other tips you guys have for large updates? We already broke it down into batches. Each batch is in a separate partition of the table, and then several updates are run in parallel. But still it's all much too slow.
The short answer is that no, in Oracle, taking an exclusive lock on a table won't prevent other sessions from reading it, or having to incur the work of generating a read-consistent view of the data. Similarly, in Oracle, you can't tell a session to enable "dirty reads."
Well, the first question is what's slow - is it all the work of joining and applying functions, or is it the writing back? How does a SELECT my_updated_resultset FROM BASEDATA JOIN... perform compared to your update statement? Have you verified that there's contention between the readers of BaseData and the update process? Also, is it too slow for the business, or just slower than you think it should be?
Another option to consider is to use partition exchange to perform your updates. The high level concept would be:
CREATE TABLE BASEDATA_XCHG as SELECT * FROM BASEDATA WHERE 1 = 0;
INSERT /*+ append */ INTO BASEDATA_XCHG SELECT my_updated_resultset FROM BASEDATA PARTITION (ONLY_ONE_PARTITION) JOIN...
Create all the required indexes and constraints on the BASEDATA_XCHG table.
ALTER TABLE BASEDATA EXCHANGE PARTITION ONLY_ONE_PARTITION WITH TABLE BASEDATA_XCHG;
If you're updating most of the rows in a partition of BASEDATA table, don't update them - create a new table and exchange it out. Tim Gorman has an excellent paper called "Scaling to Infinity" that covers this concept in greater depth; you may wish to check it out.
In addition to Adam's answer:
Run an EXPLAIN PLAN on your update statement and check the execution plan (a sketch follows below).
Chances are that adding indexes to support your joins and WHERE conditions can speed up the query.
Oracle uses undo segments for read consistency (along with SCNs).
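For the EXPLAIN PLAN tip above, a minimal sketch (the UPDATE is a hypothetical placeholder for the real statement):
EXPLAIN PLAN FOR
UPDATE basedata SET result_col = 0 WHERE result_col IS NULL;

-- Show the plan the optimizer chose
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);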
I'm assuming these large batch processes are running on a staging area and not a "prod" instance that is being used by a lot of various processes. If you are updating 25% or more (rough figures) of some big table, it may be better to do a CTAS (CREATE TABLE ... AS SELECT) than attempting updates. Your CTAS would contain the update logic for the new table. Once done, add indexes/grants/etc. on the new table and rename the new one to the old. You can also add a parallel hint and NOLOGGING on the CTAS to potentially speed things up even more.
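A sketch of that CTAS approach (table, column, and join names are hypothetical; the update logic is folded into the SELECT):
CREATE TABLE basedata_new NOLOGGING PARALLEL 8 AS
SELECT /*+ PARALLEL(b, 8) */
       b.id,
       NVL(p.new_value, b.old_value) AS old_value  -- the "update" happens here
FROM   basedata b
LEFT JOIN param_table p ON p.id = b.id;

-- Recreate indexes/grants on basedata_new, then swap the names:
-- ALTER TABLE basedata RENAME TO basedata_old;
-- ALTER TABLE basedata_new RENAME TO basedata;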
The situation is as follows:
A big production client/server system where one central database table has a certain column that used to have NULL as its default value but now has 0 as the default. All the rows created before that change of course still have NULL there, and that generates a lot of unnecessary error messages in this system.
The solution is of course as simple as:
update theTable set theColumn = 0 where theColumn is null
But I guess it's gonna take a lot of time to complete this transaction? Apart from that, will there be any other issues I should think of before I do this? Will this big transaction block the whole database, or that particular table during the whole update process?
This particular table has about 550k rows, and 500k of them have NULL values and will be affected by the above SQL statement.
The impact on the performance of other connected clients depends on:
How fast the servers hardware is
How many indexes containing the column your update statement has to update
Which transaction isolation settings the other clients use when connecting to the database
The db engine will acquire write locks, so when your clients only need read access to the table, it should not be a big problem.
500,000 records doesn't sound like too much to me, but as I said, the time and resources the update takes depend on many factors.
Do you have a similar test system, where you can try out the update?
Another solution is to split the one big update into many small ones and call them in a loop, as sketched below.
When you have clients writing frequently to that table, your update statement might get blocked "forever". I have seen databases where performing the update row by row was the only way of getting the update through. But that was a table with about 200,000,000 records and about 500 very active clients!
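A minimal sketch of the batched approach, assuming SQL Server syntax and the table/column names from the question:
-- Update in small chunks so each transaction stays short and locks are released quickly
WHILE 1 = 1
BEGIN
    UPDATE TOP (10000) theTable
    SET    theColumn = 0
    WHERE  theColumn IS NULL;

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to update
END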
it's gonna take a lot of time to complete this transaction
There's no definite way to say. It depends a lot on the hardware, the number of concurrent sessions, whether the table is already locked, the number of interdependent triggers, et al.
Will this big transaction block the whole database, or that particular table during the whole update process
If the "whole database" is dependent on this table then it might.
will there be any other issues I should think of before I do this
If the table has been locked by another transaction, you might run into a row-lock situation; in rare cases, perhaps a deadlock. It would be best to ensure that no one is using the table, check for any pre-existing locks, and then run the statement.
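One way to check for pre-existing locks, assuming SQL Server (other vendors expose similar views, e.g. Oracle's V$LOCK):
-- Current lock requests in this database; look for sessions holding locks on the table
SELECT request_session_id, resource_type, request_mode, request_status
FROM   sys.dm_tran_locks
WHERE  resource_database_id = DB_ID();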
Locking issues are vendor specific.
Assuming no triggers on the table, half a million rows is not much for a dedicated database server, even with many indexes on the table.