sql queries and inserts

sql queries and inserts - sql

I have a random question. If I were to do a sql select and while the sql server was querying my request someone else does a insert statement... could that data that was inputted in that insert statement also be retrieved from my select statement?

Queries are queued, so if the SELECT occurs before the INSERT there's no possibility of seeing the newly inserted data.
Using default isolation levels, SELECT is generally given higher privilege over others but still only reads COMMITTED data. So if the INSERT data has not been committed by the time the SELECT occurs--again, you wouldn't see the newly inserted data. If the INSERT has been committed, the subsequent SELECT will include the newly inserted data.
If the isolation level allowed reading UNCOMMITTED (AKA dirty) data, then yes--a SELECT occurring after the INSERT but before the INSERT data was committed would return that data. This is not recommended practice, because UNCOMMITTED data could be subject to a ROLLBACK.

If the SELECT statement is executed before the INSERT statement, the selected data will certainly not include the new inserted data.
What happens in MySQL with MyISAM, the default engine, is that all INSERT statements require a table lock; as a result, once an INSERT statement is executed, it first waits for all existing SELECTs to complete before locking the table, performs the INSERT, and then unlocks it.
For more information, see: Internal Locking Methods in the MySQL manual

No, a SELECT that is already executing that the moment of the INSERT will never gather new records that did not exist when the SELECT statement started executing.
Also if you use the transactional storage engine InnoDB, you can be assured that your SELECT will not include rows that are currently being inserted. That's the purpose of transaction isolation, or the "I" in ACID.
For more details see http://dev.mysql.com/doc/refman/5.1/en/set-transaction.html because there are some nuances about read-committed and read-uncommitted transaction isolation modes.

I don't know particulars for MySQL, but in SQL Server it would depend on if there were any locking hints used, and the default behavior for locks. You have a couple of options:
Your SELECT locks the table, which means the INSERT won't process until your select is finished.
Your SELECT is able to do a "dirty read" which means the transaction doesn't care if you get slightly out-of-date data, and you miss the INSERT
Your SELECT is able to do a "dirty read" but the INSERT happens before the SELECT hits that row, and you get the result that was added.

The only way you do that is with a "dirty read".
Take a look at MYSql's documentation on TRANSACTION ISOLATION LEVELS to get a better understanding of what that is.

Related

SQL Server row versioning overhead on INSERT/SELECT workflow

SQL Server 2016
Workflow consists of several continuous inserts from one writer and occasional select from a separate reader which returns several rows (all same table). Insert latency is prioritized over select performance. There are no updates/deletes and the selects will never need to return rows that have recently been inserted.
Both ALLOW_SNAPSHOT_ISOLATION and READ_COMMITTED_SNAPSHOT are set to ON.
The issue is that whenever a select query is sent via SqlCommand.ExecuteReader, there is a significant spike in insert latency until SqlCommand.ExecuteReader returns with a SqlDataReader. Since insert latency is important, this degradation needs to be minimized. Select is under read committed isolation level.
Using NOLOCK table hint in the select query does not show this same spike in insert latency & given the use case of the table, dirty reads aren't a concern since they can't happen.
Using READPAST table hint gives similar results to no hint (read committed snapshot).
I haven't found anything online that explains this discrepancy. What overhead is there with read committed snapshot (current state) that impacts insert latency that is not seen when NOLOCK is used?

SELECT after INSERT doesn't select recently added rows

This is quite a basic and somewhat strange question, I guess. Suppose I have a stored procedure that contains an INSERT (or MERGE) statement, followed by a SELECT statement.
Can I always assume that the INSERT statement has finished writing/committing data when I run SELECT? Is it to be expected that the SELECT statement (sometimes) doesn't select all recently inserted rows? If so, what options do I have to make the SELECT statement wait for the INSERT statement to have finished (in a stored procedure) or include possibly uncommitted data?

If it is in the same session it will see it , whether committed or not, unless it has been rolled back.
Once committed other sessions can see it.

If 'select' is from different session and you want read uncommited data
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

In SQL Server 2005, when does a Select query block Inserts or Updates to the same or other table(s)?

In the past I always thought that select query would not blocks other insert sql. However, recently I wrote a query that takes a long time (more than 2 min) to select data from a table. During the select, a number of insert sql statements were timing out.
If select blocks insert, what would be the solution way to prevent the timeout without causing dirty read?
I have investigate option of using isolation snapshot, but currently I have no access to change the client's database to enable the “ALLOW_SNAPSHOT_ISOLATION”.
Thanks

When does a Select query block Inserts or Updates to the same or
other table(s)?
When it holds a lock on a resource that is mutually exclusive with one that the insert or update statement needs.
Under readcommitted isolation level with no additional locking hints then the S locks taken out are typically released as soon as the data is read. For repeatable read or serializable however they will be held until the end of the transaction (statement for a single select not running in an explicit transaction).
serializable will often take out range locks which will cause additional blocking over and above that caused by the holding of locks on the rows and pages actually read.

READPAST might be what you're looking for - check out this article.

How to do a massive insert

I have an application in which I have to insert in a database, with SQL Server 2008, in groups of N tuples and all the tuples have to be inserted to be a success insert, my question is how I insert these tuples and in the case that someone of this fail, I do a rollback to eliminate all the tuples than was inserted correctly.
Thanks

On SQL Server you might consider doing a bulk insert.

From .NET, you can use SQLBulkCopy.
Table-valued parameters (TVPs) are a second route. In your insert statement, use WITH (TABLOCK) on the target table for minimal logging. eg:
INSERT Table1 WITH (TABLOCK) (Col1, Col2....)
SELECT Col1, Col1, .... FROM #tvp
Wrap it in a stored procedure that exposes #tvp as parameter, add some transaction handling, and call this procedure from your app.
You might even try passing the data as XML if it has a nested structure, and shredding it to tables on the database side.

You should look into transactions. This is a good intro article that discusses rolling back and such.

If you are inserting the data directly from the program, it seems like what you need are transactions. You can start a transaction directly in a stored procedure or from a data adapter written in whatever language you are using (for instance, in C# you might be using ADO.NET).
Once all the data has been inserted, you can commit the transaction or do a rollback if there was an error.
See Scott Mitchell's "Managing Transactions in SQL Server Stored Procedures for some details on creating, committing, and rolling back transactions.

For MySQL, look into LOAD DATA INFILE which lets you insert from a disk file.
Also see the general MySQL discussion on Speed of INSERT Statements.
For a more detailed answer please provide some hints as to the software stack you are using, and perhaps some source code.

You have two competing interests, doing a large transaction (which will have poor performance, high risk of failure), or doing a rapid import (which is best not to do all in one transaction).
If you are adding rows to a table, then don't run in a transaction. You should be able to identify which rows are new and delete them should you not like how the look on the first round.
If the transaction is complicated (each row affects dozens of tables, etc) then run them in transactions in small batches.
If you absolutely have to run a huge data import in one transaction, consider doing it when the database is in single user mode and consider using the checkpoint keyword.

What does a transaction around a single statement do?

I understand how a transaction might be useful for co-ordinating a pair of updates. What I don't understand is wrapping single statements in transactions, which is 90% of what I've ever seen. In fact, in real life code it is more common in my experience to find a series of logically related transactions each wrapped in their own transaction, but the whole is not wrapped in a transaction.
In MS-SQL, is there any benefit from wrapping single selects, single updates, single inserts or single deletes in a transaction?
I suspect this is superstitious programming.

It does nothing. All individual SQL Statements, (with rare exceptions like Bulk Inserts with No Log, or Truncate Table) are automaticaly "In a Transaction" whether you explicitly say so or not.. (even if they insert, update, or delete millions of rows).
EDIT: based on #Phillip's comment below... In current versions of SQL Server, Even Bulk Inserts and Truncate Table do write some data to the transaction log, although not as much as other operations do. The critical distinction from a transactional perspective, is that in these other types of operations, the data in your database tables being modified is not in the log in a state that allows it to be rolled back.
All this means is that the changes the statement makes to data in the database are logged to the transaction log so that they can be undone if the operation fails.
The only function that the "Begin Transaction", "Commit Transaction" and "RollBack Transaction" commands provide is to allow you to put two or more individual SQL statements into the same transaction.
EDIT: (to reinforce marks comment...) YES, this could be attributed to "superstitious" programming, or it could be an indication of a fundamental misunderstanding of the nature of database transactions. A more charitable interpretation is that it is simply the result of an over-application of consistency which is inappropriate and yet another example of Emersons euphemism that:
A foolish consistency is the hobgoblin of little minds,
adored by little statesmen and philosophers and divines

As Charles Bretana said, "it does nothing" -- nothing in addition to what is already done.
Ever hear of the "ACID" requirements of a relational database? That "A" stands for Atomic, meaning that either the statement works in its entirety, or it doesn't--and while the statement is being performed, no other queries can be done on the data affected by that query. BEGIN TRANSACTION / COMMIT "extends" this locking functionality to the work done by multiple statements, but it adds nothing to single statements.
However, the database transaction log is always written to when a database is modified (insert, update, delete). This is not an option, a fact that tends to irritate people. Yes, there's wierdness with bulk inserts and recovery modes, but it still gets written to.
I'll name-drop isolation levels here too. Fussing with this will impact individual commands, but doing so will still not make a declared-transaction-wrapped query perform any differently than a "stand-alone" query. (Note that they can be very powerful and very dangeroug with multi-statement declared transactions.) Note also that "nolock" does not apply to inserts/updates/deletes -- those actions always required locks.

For me, wrapping a single statement in a transaction means that I have the ability to roll it back if I, say, forget a WHERE clause when executing a manual, one-time UPDATE statement. It has saved me a few times.
e.g.
--------------------------------------------------------------
CREATE TABLE T1(CPK INT IDENTITY(1,1) NOT NULL, Col1 int, Col2 char(3));
INSERT INTO T1 VALUES (101, 'abc');
INSERT INTO T1 VALUES (101, 'abc');
INSERT INTO T1 VALUES (101, 'abc');
INSERT INTO T1 VALUES (101, 'abc');
INSERT INTO T1 VALUES (101, 'abc');
INSERT INTO T1 VALUES (101, 'abc');
INSERT INTO T1 VALUES (101, 'abc');
SELECT * FROM T1
--------------------------------------------------------------
/* MISTAKE SCENARIO (run each row individually) */
--------------------------------------------------------------
BEGIN TRAN YOUR_TRANS_NAME_1; /* open a trans named YOUR_TRANS_NAME_1 */
UPDATE T1 SET COL2 = NULL; /* run some update statement */
SELECT * FROM T1; /* OOPS ... forgot the where clause */
ROLLBACK TRAN YOUR_TRANS_NAME_1; /* since it did bad things, roll it back */
SELECT * FROM T1; /* tans rolled back, data restored. */
--------------------------------------------------------------
/* NO MISTAKES SCENARIO (run each row individually) */
--------------------------------------------------------------
BEGIN TRAN YOUR_TRANS_NAME_2;
UPDATE T1 SET COL2 = 'CBA' WHERE CPK = 4; /* run some update statement */
SELECT * FROM T1; /* did it correctly this time */
COMMIT TRAN YOUR_TRANS_NAME_2 /* commit (close) the trans */
--------------------------------------------------------------
DROP TABLE T1
--------------------------------------------------------------

One possible excuse is that that single statement could cause a bunch of other SQL to run via triggers, and that they're protecting against something going bad in there, although I'd expect any DBMS to have the common sense to use implicit transactions in the same way already.
The other thing I can think of is that some APIs allow you to disable autocommit, and the code's written just in case someone does that.

When you start an explicit transaction and issue a DML, the resources being locked by the statement remain locked, and the results of statement are not visible from outside the transaction until you manually commit or rollback it.
This is what you may or may not need.
For instance, you may want to show preliminary results to outer world while still keeping a lock on them.
In this case, you start another transaction which places a lock request before the first one commits, thus avoiding race condition
Implicit transactions are commited or rolled back immediatley after the DML statement completes or fails.

SQL Server has a setting which allows turning autocommit off for a session. It's even the default for some clients (see https://learn.microsoft.com/en-us/sql/t-sql/statements/set-implicit-transactions-transact-sql?view=sql-server-2017)
Depending on a framework and/or a database client you use, not putting each individual command into its own transaction might cause them to be all lumped together into a default transaction. Explicitly wrapping each of them in a transaction clearly declares the intent and actually makes sure it happens the way the programmer intended, regardless of the current autocommit setting, especially if there isn't a company-wide policy on autocommit.
If the begin tran / commit tran commands are being observed in the database (as per your comment here), it is also possible that a framework is generating them on behalf of an unsuspecting programmer. (How many developers closely inspect SQL code generated by their framework?)
I hope this is still relevant, despite the question being somewhat ancient.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas