When is a row actually inserted into DB? - sql

When is a row actually inserted into the database? Is it when the "INSERT" statement finishes, or only when the "COMMIT" that follows the "INSERT" finishes?

Later than you think. The principles here apply generally.
The whole point of the transaction log is to ensure the ACID properties hold even if the power fails just as the INSERT finishes. The INSERT will be rolled forward or rolled back as part of the recovery phase (in most RDBMSs).
So what matters most is that the transaction log entry is acknowledged as written to stable media; only then can the INSERT commit.
The data page containing the changed row will end up on disk eventually (at a checkpoint, etc.), but not necessarily at the point of a successful commit.
However, the data page is in memory and available for use.
Note that an INSERT can cause a page split, index updates, triggers firing, etc., so what I've described is simplified.
And it doesn't really matter when the data page itself reaches disk, as long as I can read the data and it is safe in the event of, say, a power failure.
An oldie but still relevant for SQL Server: SQL Server 2000 I/O Basics
And what I've summarized here is Write-Ahead Logging.
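To make the ordering concrete, here is a minimal T-SQL sketch (SQL Server assumed; the table is made up). The comments mark where write-ahead logging and checkpointing come in:
CREATE TABLE dbo.Demo (Id INT NOT NULL);
BEGIN TRANSACTION;
    INSERT INTO dbo.Demo (Id) VALUES (1);  -- change is written to the log buffer and to the data page in memory
COMMIT TRANSACTION;                        -- the log record must be hardened to disk before this returns
-- The modified data page may still exist only in the buffer pool; it is
-- flushed later by a checkpoint, which can also be requested explicitly:
CHECKPOINT;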

If you are running inside a transaction, when the transaction is committed. Otherwise, immediately.

It depends on the database/table implementation. It might only be when the transaction log is applied to the data files; until then the row exists only in the transaction log and in memory.

Related

What does ExecuteNonQuery() do during a bulk insert?

I have some bulk insert vb.net code (working) that I have written. It calls ExecuteNonQuery() for each insert and then at the end does a commit().
My question is about where these inserts are held while waiting for the commit() call. I have not made any changes to support batching yet, so with my existing code a million rows will be inserted before calling commit(). I ask this question to find out whether I will run into memory issues, which would force me to make changes to my code now.
In the normal rollback journal mode, changes are simply written to the database. However, to allow atomic commits, the previous contents of all changed database pages are written to the rollback journal so that a rollback can restore the previous state.
(When you do so many inserts that new pages need to be allocated, there is no old state for those pages.)
In WAL mode, all changes are written to the write-ahead log.
In either case, nothing is actually written until the amount of data overflows the page cache (which has a size of about 2 MB by default).
So the size of a transaction is not limited by memory, only by disk space.
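A minimal SQLite-flavoured sketch of the point above (the table is made up; the PRAGMAs shown are standard SQLite ones):
PRAGMA journal_mode = WAL;          -- or keep the default rollback-journal mode
PRAGMA cache_size;                  -- defaults to roughly 2 MB of page cache
CREATE TABLE foo (value INTEGER);
BEGIN;
INSERT INTO foo (value) VALUES (1);
-- ...a million more inserts: once the changes no longer fit in the page
-- cache they start spilling to disk (database file plus rollback journal,
-- or the WAL), so the transaction is limited by disk space, not memory.
COMMIT;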
In a bulk insert, command.ExecuteNonQuery() returns the number of rows affected by the INSERT, UPDATE, or DELETE statement.
In your case it will return 1 (as an integer) after each successful insert.
If you're not explicitly using a transaction, Commit doesn't make any sense; the changes are committed automatically.
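At the SQL level (again assuming SQLite, as in the answer above), the count that ExecuteNonQuery() returns corresponds to SQLite's changes() function; a small sketch with a made-up table:
CREATE TABLE foo (value INTEGER);
INSERT INTO foo (value) VALUES (1);
SELECT changes();   -- 1: rows affected by the last INSERT/UPDATE/DELETE,
                    -- which is the value ExecuteNonQuery() hands back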

Why is an implicit table lock being released prior to end of transaction in RedShift?

I have an ETL process that is building dimension tables incrementally in RedShift. It performs actions in the following order:
Begins transaction
Creates a table staging_foo like foo
Copies data from external source into staging_foo
Performs mass insert/update/delete on foo so that it matches staging_foo
Drops staging_foo
Commits transaction
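In Redshift SQL the sequence above might look roughly like this (a sketch only; the column id, the S3 location, and the IAM role are hypothetical placeholders, and the merge predicates are simplified):
BEGIN;
CREATE TABLE staging_foo (LIKE foo);
COPY staging_foo
FROM 's3://example-bucket/foo/'                       -- hypothetical source
IAM_ROLE 'arn:aws:iam::<account>:role/<copy-role>';   -- hypothetical credentials
-- mass merge: make foo match staging_foo (simplified)
DELETE FROM foo USING staging_foo WHERE foo.id = staging_foo.id;
INSERT INTO foo SELECT * FROM staging_foo;
DROP TABLE staging_foo;
COMMIT;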
Individually this process works, but in order to achieve continuous streaming refreshes to foo and redundancy in the event of failure, I have several instances of the process running at the same time. When that happens I occasionally get concurrent serialization errors. This is because both processes are replaying some of the same changes to foo from staging_foo in overlapping transactions.
What happens is that the first process creates the staging_foo table, and the second process is blocked when it attempts to create a table with the same name (this is what I want). When the first process commits its transaction (which can take several seconds) I find that the second process gets unblocked before the commit is complete. So it appears to be getting a snapshot of the foo table before the commit is in place, which causes the inserts/updates/deletes (some of which may be redundant) to fail.
I am theorizing based on the documentation http://docs.aws.amazon.com/redshift/latest/dg/c_serial_isolation.html where it says:
Concurrent transactions are invisible to each other; they cannot detect each other's changes. Each concurrent transaction will create a snapshot of the database at the beginning of the transaction. A database snapshot is created within a transaction on the first occurrence of most SELECT statements, DML commands such as COPY, DELETE, INSERT, UPDATE, and TRUNCATE, and the following DDL commands:
ALTER TABLE (to add or drop columns)
CREATE TABLE
DROP TABLE
TRUNCATE TABLE
The documentation quoted above is somewhat confusing to me because it first says a snapshot will be created at the beginning of a transaction, but subsequently says a snapshot will be created only at the first occurrence of some specific DML/DDL operations.
I do not want to do a deep copy where I replace foo instead of incrementally updating it. I have other processes that continually query this table so there is never a time when I can replace it without interruption. Another question asks a similar question for deep copy but it will not work for me: How can I ensure synchronous DDL operations on a table that is being replaced?
Is there a way for me to perform my operations in a way that I can avoid concurrent serialization errors? I need to ensure that read access is available for foo so I can't LOCK that table.
OK, Postgres (and therefore Redshift [more or less]) uses MVCC (Multi Version Concurrency Control) for transaction isolation instead of a db/table/row/page locking model (as seen in SQL Server, MySQL, etc.). Simplistically every transaction operates on the data as it existed when the transaction started.
So your comment "I have several instances of the process running at the same time" explains the problem. If Process 2 starts while Process 1 is running then Process 2 has no visibility of the results from Process 1.
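To make the snapshot timing concrete, here is a minimal Redshift-flavoured sketch based on the documentation quoted in the question (it assumes the foo table from the question already exists; the counts are purely illustrative):
BEGIN;                        -- no snapshot is taken yet
SELECT COUNT(*) FROM foo;     -- first SELECT/DML statement: the snapshot is created here
-- Rows committed by other sessions after this point are invisible to the
-- rest of this transaction, which is why a process that started while an
-- earlier one was still running can replay overlapping changes and then
-- fail with a serialization error at COMMIT.
SELECT COUNT(*) FROM foo;     -- same result as above, even if another session
                              -- has committed new rows in the meantime
COMMIT;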

Is a commit needed on a select query in DB2?

I have a vendor reporting product executing queries to pull report data, no inserts, no updates just reading data.
We have doubled our heap size 3 times and are now at 1024 4K pages. The app will run fine for a week, then we begin to see DB2 SQL error: SQLCODE: -954, SQLSTATE: 57011, indicating the transaction log is not able to accommodate the request.
It's not the size of the reports, since they run fine after a recycle. I spoke with another DBA about this. He believes the problem is a difference between Oracle and DB2: the vendor code is crappy and not issuing commits on the selects. This causes the references not to be cleaned up, and they slowly accumulate as garbage in the heap.
I wanted to know if this is accurate, as I thought only inserts and updates needed to have commits issued. Is there any IBM documentation on this?
We are currently recycling on a weekly basis to alleviate the problem but I would like to have a good handle on the issue before going back to the vendor asking them to alter their code.
Any transaction needs to be properly terminated -- why did you think that only applies to inserts and updates? Consider running transactionally a "select a from b where c > 12" and then "select a from b where c <= 12"; within a transaction the DB has to guarantee that every a gets returned exactly once either from the first or second select, not both (assuming c is never null;-). Without transactionality, some a's might fall between the cracks or be returned twice if their corresponding c was changed by a different transaction, and that's just not ACID!-)
So when you do not need separate SELECT queries to be transactional wrt each other, tell the DB! And the way you tell, is by terminating the transaction after each select (normally commit is what you use for the purpose, though I guess you could, indifferently, choose to use rollback here;-).
Per Alex's response, the first SQL activity after any CONNECT, COMMIT, or ROLLBACK initiates a transaction.
To get a handle on your resource issue (transaction logs full), you should investigate your application that issues the reports - ensure that transactions are being closed out explicitly in code. I've seen cases where application developers rely upon the Garbage Collector to clean up database objects - while those objects are waiting for cleanup, the database resources (transactions) are held open.
It's always good practice to explicitly COMMIT or ROLLBACK your transactions as soon as you are done with the data - regardless of the programming methodology you use.
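A minimal sketch of the pattern being recommended, in plain DB2 SQL (SALES and its columns are made up): even a read-only unit of work should be ended explicitly so the database can release the resources it is holding for that transaction.
SELECT REGION, SUM(AMOUNT)
FROM SALES                                -- hypothetical reporting table
WHERE SALE_DATE >= CURRENT DATE - 7 DAYS
GROUP BY REGION;
-- End the unit of work even though nothing was changed, so DB2 can release
-- the locks and other transaction resources held on behalf of the SELECT.
COMMIT;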
I get this error when committing a transaction on a SELECT query, but despite the error it does return a result set that includes the queried data.
tran.Commit();
error [hy011] [ibm] cli0126e the operation is invalid sqlstate=hy011
I changed my code to tran.Rollback(); and the error disappeared.
Can anyone explain this behavior?

Update a newly created row before final commit

insert into XYZ(col1, col2) values (1,2)
update XYZ set ... where col1 = 1
COMMIT
As you can see in the code above, we haven't yet committed our insert statement, we perform an update on the same row, and finally we commit the whole batch.
What exactly would happen in this case? Are there any chances of losing data in this scenario?
Your session is always able to see its own modifications, even before you issue a commit.
The newly inserted row would be updated.
The only way you could "lose data" in this scenario would be an interruption before the commit, in which case none of the operations would take effect.
The important words in Vincent's response are "your session".
A separate session will only see the unmodified data until you commit. That's part of what read consistency means.
Depending on the frameworks and tools you're using, your session may get a lock on the record when you perform the update, preventing other sessions from updating it until you commit or rollback.
For further reading, here is a link to the "Data Concurrency and Consistency" section of the excellent Oracle Concepts Guide 10gR2
http://download.oracle.com/docs/cd/B19306_01/server.102/b14220/consist.htm
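To see both points at once, here is a sketch using the XYZ table from the question; the two halves are meant to be run in two separate sessions (it is a timeline, not a single script), and the column values are illustrative:
-- Session A
insert into XYZ(col1, col2) values (1, 2);
update XYZ set col2 = 99 where col1 = 1;
-- Session B, run before Session A commits:
select col2 from XYZ where col1 = 1;   -- does not see Session A's uncommitted row
-- Session A
COMMIT;
-- Session B, run again after the commit:
select col2 from XYZ where col1 = 1;   -- now returns 99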
In fact, all transactions are recorded in rollback segments within the tablespace of that particular instance. A rollback segment is a storage area within a tablespace that holds the transaction information used to guarantee data integrity during a ROLLBACK and to provide read consistency across multiple transactions.

SQL Transactions (b)locking Selects - is my understanding correct

We are using ADO.NET to connect to a SQL 2005 server, and doing a number of inserts/updates and selects in it. We changed one of the updates to be inside a transaction however it appears to (b)lock the entire table when we do it, regardless of the IsolationLevel we set on the transaction.
The behavior that I seem to see is that:
If you have no transactions then it's an all out fight (losers getting dead locked)
If you have a few transactions then they win all the time and block all others out unless
If you have a few transactions and you set something like nolock on the rest, then the transactions go through and nothing gets blocked. This is because every statement (select/insert/delete/update) has an isolation level regardless of transactions.
Is this correct?
The answer to your question is: It depends.
If you are updating a table, SQL Server uses several strategies to decide how much to lock: row-level locks, page locks, or a full table lock.
If you are updating more than a certain percentage of the table (configurable as I remember), then SQL Server gives you a table level lock, which may block selects.
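A hedged T-SQL sketch of how you could observe which granularity the engine chose for a particular update (dbo.Orders and its columns are made up; sys.dm_tran_locks exists from SQL Server 2005 onwards):
BEGIN TRANSACTION;
UPDATE dbo.Orders                      -- hypothetical table
SET Status = 'Shipped'
WHERE OrderDate < '2009-01-01';
-- Inspect the locks this session is holding: KEY/PAGE resources indicate
-- row- or page-level locking, while an exclusive OBJECT lock means the
-- engine escalated to a table-level lock.
SELECT resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID;
ROLLBACK TRANSACTION;                  -- it's only an experiment, so undo the change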
The best references are:
Understanding Locking in SQL Server (for SQL Server 2000): http://msdn.microsoft.com/en-us/library/aa213039(SQL.80).aspx
Introduction to Locking in SQL Server: http://www.sqlteam.com/article/introduction-to-locking-in-sql-server
Isolation Levels in the Database Engine (for SQL Server 2008, but a 2005 version is available): http://msdn.microsoft.com/en-us/library/ms189122.aspx
Good luck.
Your update statement (i.e. one that changes data) will hold locks regardless of the isolation level and whether or not you have explicitly defined a transaction.
What you can control is the granularity of the locks by using query hints. So if the update is locking the entire table, then you can specify a query hint to only lock the affected rows (ROWLOCK hint). That is unless your query is updating the whole table of course.
So to answer your question: the first connection to request locks on a resource will hold those locks for the duration of the transaction. You can specify that a select does not hold locks by using the read uncommitted isolation level; statements that change data (insert/update/delete) always hold locks regardless. The next connection to request locks on the same resource will wait until the first has finished, and will then hold its locks. Deadlocking is a specific scenario where two connections each hold locks and each is waiting for the other connection's resource; to avoid the engine waiting forever, one connection is chosen as the deadlock victim.
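A hedged T-SQL sketch of the two levers mentioned above (the table and columns are invented):
-- Writer: ask for row-level locks (a hint, not a guarantee; the engine may
-- still escalate if too many rows qualify). The locks are held until the
-- surrounding transaction commits or rolls back.
UPDATE dbo.Accounts WITH (ROWLOCK)
SET Balance = Balance + 10
WHERE AccountId = 42;
-- Reader in another connection that accepts uncommitted data and therefore
-- does not wait on the writer's locks:
SELECT Balance
FROM dbo.Accounts WITH (NOLOCK)        -- per-statement equivalent of READ UNCOMMITTED
WHERE AccountId = 42;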