I want to run a series of SQL INSERT statements.
The problem is I want an all-or-nothing approach. Either they all execute or if one of them doesn't execute then no changes are made to the database.
The only solution I can think of is using a conditional loop, but that would mean a lot of redundant code (determining changes made, dropping tables, etc.).
Is there a simpler solution?
I have searched extensively for a solution but didn't find any similar questions, so apologies if this has been asked before.
You need to use a transaction; you can find an MSDN example here.
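For illustration, a minimal sketch of the pattern (assuming SQL Server; the table and column names are hypothetical):

-- All-or-nothing: either every INSERT is committed, or none of them are.
BEGIN TRY
    BEGIN TRANSACTION;
    INSERT INTO Orders (Id, Item) VALUES (1, 'widget');
    INSERT INTO Orders (Id, Item) VALUES (2, 'gadget');
    COMMIT TRANSACTION;   -- all inserts become permanent together
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;   -- any failure undoes every insert
END CATCH;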
Fortunately, a good database is one that has the four ACID properties - Atomicity, Consistency, Isolation and Durability.
The first property - Atomicity - refers to the behavior of transactions: either the entire transaction is committed, or no changes take place at all.
Read Korth's book "Database System Concepts" for further reference.
If you are using a mainstream database like Oracle, MS SQL Server, MySQL, DB2, etc., all you have to do is learn a little about how it handles transactions and place your DML statements within a transaction.
Find out about Oracle's transactions support here.
P.S. - It looks like you're working in the banking domain. These people are hell-bent on these things.
Related
I just realized I've had a headache for years. Well, metaphorically speaking. In reality I was looking at my database structure and somehow just realized I never use transactions. Doh.
There's a lot of data on the internet about transactions (begin transaction, rollback, commit, etc.), but surprisingly not much detail about exactly why they are vital, and just exactly how vital?
I understand the concept of handling things if something goes wrong. This makes sense when one is doing multiple updates, for example in multiple tables in one go, but that is bad practice as far as I know and I don't do it. All of my queries just update one table. If a query errors, it cancels, transaction or no transaction. What else could go wrong or potentially corrupt a one-table update, besides my pulling the plug out of my server?
In other words, my question is,
exactly how vital is it that I implement transactions on all of my tables? Am I fully blasphemous for not having them, or does it really matter that much?
UPDATE
+1 to invisal, who pointed out that queries are automatically wrapped as transactions, which I did not know. He also pointed out multiple good references on the subject of my question.
This made a lot of sense when one is doing multiple updates, for example, in multiple tables in one go. But basically all of my queries just update one table at a time. If a query errors, it cancels, transaction or no transaction.
In your case, it does nothing. A single statement is a transaction in itself. For more information you can read these existing questions and answers:
What does a transaction around a single statement do?
Transaction necessary for single update query?
Do i need transaction for joined query?
The most important property of a database is to keep your data reliably.
Database reliability is assured by conforming to ACID principles (Atomicity, Consistency, Isolation, Durability). In the context of databases, a single logical operation on the data is called a transaction. Without transactions, such reliability would not be possible.
In addition to reliability, using transactions properly lets you improve the performance of some data operations considerably. For example, you can start a transaction, insert a lot of data (say 100k rows), and only then commit. The server does not have to actually write to disk until commit is called, effectively batching the data in memory, which can improve performance a lot.
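A minimal sketch of that batching pattern (generic SQL; the table is hypothetical):

-- One transaction around many inserts: the server can defer the expensive
-- flush to disk until COMMIT instead of paying it per statement.
BEGIN TRANSACTION;
INSERT INTO measurements (sensor_id, reading) VALUES (1, 20.5);
INSERT INTO measurements (sensor_id, reading) VALUES (1, 20.7);
-- ...many more rows...
COMMIT;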
You should be aware that every updating action against your database is performed inside a transaction, even if it only touches one table (SQL Server automatically creates a transaction for it).
The reason for always using transactions is to ensure ACID, as others have mentioned. Here I'd like to elaborate on the isolation point. Without transaction isolation you may run into problems such as dirty reads, non-repeatable reads and phantom reads.
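As an illustration, this is roughly how you would ask SQL Server for stricter isolation (the table name is hypothetical):

-- SERIALIZABLE prevents non-repeatable and phantom reads within the transaction.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
SELECT COUNT(*) FROM accounts WHERE balance < 0;
-- ...statements that rely on that count staying stable...
COMMIT TRANSACTION;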
It depends. If you are updating one table and one row, then the only advantage is going to be in the logging... but if you update multiple rows in a table at one time, then without transactions you could still run into some corruption.
Well, it depends. SQL is most of the time used to provide data to a host language like C, C++, Java, PHP, C# and others. I have not worked with many technologies, but if you are using the following combinations then here is my point of view:
SQL with C / C++ : Commit Required
SQL with Java : Not Required
SQL with C# : Not Required
SQL with PHP : Not Required
It also depends on which SQL you are using, and on the different flavors of SQL like Oracle, SQL Server, SQLite, MySQL, etc.
When you are using Oracle SQL in its console, like Oracle 11g, Oracle 10g, etc., COMMIT is required.
As far as corruption of tables and data is concerned: yes, it happens. I had a very bad experience with it. If you pull the power cable or something while you are updating your table, then you might end up with a massive disaster.
Concluding, I would suggest you commit explicitly.
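A tiny sketch of what that looks like in an Oracle console session (the table is hypothetical):

INSERT INTO accounts (id, balance) VALUES (1, 100);
COMMIT;   -- until this point the change is visible only to your session and can be rolled back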
I'm currently writing an application that uses SQLite. SQLite doesn't have the ON UPDATE function for timestamps so I wrote my first trigger for it. Then I wrote a trigger to add the current timestamp on the modified and created fields on an insert.
The problem came when I went and deleted the setting of the modified/created fields for the insert. I felt like I was hiding something from developers that might see my code in the future. It could be a source of confusion.
How will they know that the SQL is coming from a trigger? Should I comment it? Is this bad practice?
As a rule of thumb, triggers are meant to implement SQL functional rules, such as inclusions, exclusions, partitions etc.
This kind of thing belongs to the model and should be implemented as triggers whenever it is possible. It has to be delivered with the database otherwise the model would be broken.
As for your case, it is more of a hack than anything else. If you can't do it differently, do it and then add a comment like you said, but it should remain an exception.
Keep in mind that almost everything a trigger does could also be done at the application layer (whichever one you prefer).
Good observation. There are some things only triggers can do. However I suggest that if there is any alternative to using a trigger then use the alternative. I'm not familiar with SQLite, but in any other database I would use a DEFAULT rather than a trigger to timestamp a new record. To capture updated date I would enclose this in a stored procedure, or whatever database side logic you have (kind of what RandomUs1r suggested). I might consider a trigger but only for very basic operations.
You are correct that triggers can be confusing and difficult to debug.
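As a sketch of the DEFAULT approach suggested above (this works in SQLite too; the table and column names are assumptions):

CREATE TABLE item (
    id      INTEGER PRIMARY KEY,
    name    TEXT,
    created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);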
"I felt like I was hiding something from developers..." - this is a very good point. I've came across many developers who use ##Identity and were genuinely shocked that if somebody put a trigger on the table which inserted another row, they'd end up with the wrong identity. (As opposed to SCOPE_IDENTITY() - I know these are sql server specific, but that's pretty much all I know...)
It is hidden - other than documentation I'm not sure you can make it more visible either.
Which is why many avoid them where possible. I guess if there's no easy way around using them in some cases, then it's fine as long as it's well documented, etc. Like cursors, although scorned by many, they can be very powerful and useful... but if they can be avoided, it's probably for the best.
On the code that modifies the record, to get the current timestamp in SQLite, try:
DATETIME('NOW')
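For instance, a minimal sketch of the kind of trigger the question describes (the table and column names are assumptions):

-- Keep a "modified" timestamp current on every update (SQLite).
CREATE TRIGGER trg_item_touch_modified
AFTER UPDATE ON item
FOR EACH ROW
BEGIN
    UPDATE item SET modified = DATETIME('NOW') WHERE id = NEW.id;
END;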
I was just wondering if a storage engine type exists that allows you to do version control on row-level contents. For instance, if I have a simple table with ID, name, value, where ID is the PK, I could see that row 354 started as (354, "zak", "test") in v1, then was updated to be (354, "zak", "this is version 2 of the value") in v2, and could see a change history on the row with something like select history(value) where ID = 354.
It's kind of an esoteric thing, but it would beat having to keep writing these separate history tables and functions every time a change is made...
It seems you are looking more for auditing features. Oracle and several other DBMS have full auditing features. But many DBAs still end up implementing trigger based row auditing. It all depends on your needs.
Oracle supports several granularities of auditing that are easy to configure from the command line.
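For instance, with Oracle's traditional auditing a table can be covered with a one-liner (the schema and table name are hypothetical):

AUDIT INSERT, UPDATE, DELETE ON app.important_table BY ACCESS;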
I see you tagged this as MySQL but asked about any storage engine. Anyway, the other answers are saying the same thing, so I'm going to delete this post, as originally it was about the flashback features.
Obviously you are really after a MySQL solution, so this probably won't help you much, but Oracle has a feature called Total Recall (more formally Flashback Archive) which automates the process you are currently hand-rolling. The Archive is a set of compressed tables which are populated with changes automatically, and queryable with a simple AS OF syntax.
Naturally, being Oracle, they charge for it: it needs an additional license on top of the Enterprise Edition, alas. Find out more (PDF).
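To give a flavour of the AS OF syntax (the table and columns follow the question's example; the interval is arbitrary):

-- Oracle Flashback Query: read the row as it was a day ago.
SELECT value
FROM   important_table AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' DAY)
WHERE  id = 354;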
Oracle and SQL Server both call this feature Change Data Capture. There is no equivalent for MySQL at this time.
You can achieve similar behavior with triggers (search for "triggers to catch all database changes") - particularly if they implement SQL92 INFORMATION_SCHEMA.
Otherwise I'd agree with mrjoltcola
Edit: The only gotcha I'd mention with MySQL and triggers is that (as of the latest community version I downloaded) it requires the user account to have the SUPER privilege, which can make things a little ugly.
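To make the trigger-based approach above concrete, a minimal MySQL sketch (the history table and trigger names are assumptions; the item columns come from the question):

-- History table that receives a copy of every updated row.
CREATE TABLE item_history (
    hist_id    INT AUTO_INCREMENT PRIMARY KEY,
    id         INT,
    name       VARCHAR(100),
    value      VARCHAR(255),
    changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Single-statement trigger body, so no DELIMITER change is needed.
CREATE TRIGGER trg_item_history
AFTER UPDATE ON item
FOR EACH ROW
    INSERT INTO item_history (id, name, value)
    VALUES (NEW.id, NEW.name, NEW.value);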
CouchDB has full versioning for every change made, but it is part of the NOSQL world, so would probably be a pretty crazy shift from what you are currently doing.
The wikipedia article on google's bigtable mentions that it allows versioning by adding a time dimension to the tables:
Each table has multiple dimensions (one of which is a field for time, allowing versioning).
There are also links there to several non-google implementations of a bigtable-type dbms.
I think Bigtable, the Google DB engine, does something like that: it associates a timestamp with every update of a row.
Maybe you can try Google App Engine.
There is a Google paper explaining how Big Table works.
The book Refactoring Databases has some insights on the matter.
But it also points out that there is no real solution currently, other than carefully making changes and managing them manually.
One approximation to this is a temporal database - which allows you to see the status of the whole database at different times in the past. I'm not sure that wholly answers your question though; it would not allow you to see the contents of Row 1 at time t1 while simultaneously letting you look at the contents of Row 2 at a separate time t2.
"It's kind of an esoteric thing, but it would beat having to keep writing these separate history tables and functions every time a change is made..."
I wouldn't call audit trails (which is obviously what you're talking about) an "esoteric thing"...
And: there is still a difference between the history of database updates and the history of reality. Historical database tables should really be used to reflect the history of reality, NOT the history of database updates.
The history of database updates is already kept by the DBMS in its logs and journals. If someone needs to inquire into the history of database updates, then he/she should really resort to the logs and journals, not to any kind of application-level construct that can NEVER provide a sufficient guarantee that it reflects ALL updates.
I'm using locks and transactions a bit like a VBA excel programmer trying to write a multithreaded c++ server for the first time...
I've tried to ask my coworkers for advice, but while we're all quite good (or so we think) at designing complex databases, writing fast and efficient queries, using index and constraints when it's needed, and so on, none of us has a good knowledge of this topic.
All the online resources I've found are either syntax references or dummy tutorials explaining that a transaction begins with 'begin tran' and ends with a commit or a rollback.
I've browsed SO too without success.
What I'm looking for is a list of simple real world problems, along with the right way to solve them.
Example :
Let's say I've got a table with one Active bit column, and that I don't want to have two active rows at the same time. Of course, many processes can try to insert data at the same time.
should I lock the whole table?
or maybe use a data constraint so that an insert of a second "Active" row will fail?
or use a transaction with the repeatable read isolation level?
or maybe write:
update tbFoo set Active=0
insert into tbFoo (foo, Active) select 'foo',1 where not exists (select * from tbFoo where Active=1)
Please don't comment/answer on this specific problem and on my silly suggestions. I'm just trying to pinpoint the fact that I don't have a clue :)
Where can I find some good walkthroughs on simple yet relevant locking situations? If it makes a difference, I'm using SQL Server 2008
I'm also curious to know whether other people feel the same way I do about this topic.
Q&A from Chas Boyd
Kalen Delaney article
And of course, good old BOL
For SQL 2008, you can probably use MERGE to achieve the update and insert in one transaction-safe statement.
I've not used it yet so I'd do one of these:
- A trigger to check after the fact (so it's in the same transaction)
- TABLOCKX lock for the MERGE
In this case though, check this thread (and my answer of course :)
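For reference, a rough sketch of the MERGE approach (using the asker's tbFoo example; treat it as an untested illustration):

-- Upsert serialized with HOLDLOCK so concurrent sessions can't race each other.
BEGIN TRANSACTION;
MERGE tbFoo WITH (HOLDLOCK) AS target
USING (SELECT 'foo' AS foo) AS source
ON (target.foo = source.foo)
WHEN MATCHED THEN
    UPDATE SET Active = 1
WHEN NOT MATCHED THEN
    INSERT (foo, Active) VALUES (source.foo, 1);
COMMIT TRANSACTION;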
The best bet in my opinion if you really need to understand how locks and transactions work in SQL server, is to find out how the SQL server engine works internally.
I found the book “Inside Microsoft® SQL Server® 2008: T-SQL Programming” to be the most comprehensive guide.
The section “Handle transactions, concurrency, and error handling” should help you with the specific issue you are having.
However, I found that I learned little nuggets from all sections of the book, all of which have made life that bit easier for me when dealing with SQL Server.
I know it seems almost nostalgic to buy a paper book these days, but this is the one I would recommend, as I found it difficult to get a comprehensive guide online!
You need to get deep into the product specific documentation to understand this.
It is product specific, and what works in one database does not work in another. That said, for most cases careful choice of isolation level will be sufficient; it is only when you start to really push things that more use of query hints will be needed.
For SQL Server, "Inside SQL Server 2000" (MSPress) is where I got enough detail to know most of the time I do not need that level of detail (and also working with people with decades of Oracle specialisation).
We have a bit of a messy database situation.
Our main back-office system is written in Visual FoxPro with local data (yes, I know!)
In order to effectively work with the data in our websites, we have chosen to regularly export data to a SQL database. However the process that does this basically clears out the tables each time and does a re-insert.
This means we have two SQL databases - one that our FoxPro export process writes to, and another that our websites read from.
This question is concerned with the transform from one SQL database to the other (SqlFoxProData -> SqlWebData).
For a particular table (one of our main application tables), because various data transformations take place during this process, it's not a matter of straightforward UPDATE, INSERT and DELETE statements using self-joins; we're having to use cursors instead (I know!)
This has been working fine for many months but now we are starting to hit upon performance problems when an update is taking place (this can happen regularly during the day)
Basically when we are updating SqlWebData.ImportantTable from SqlFoxProData.ImportantTable, it's causing occasional connection timeouts/deadlocks/other problems on the live websites.
I've worked hard at optimising queries, caching etc etc, but it's come to a point where I'm looking for another strategy to update the data.
One idea that has come to mind is to have two copies of ImportantTable (A and B), some concept of which table is currently 'active', updating the non-active table, then switching the currently active table,
i.e. websites read from ImportantTableA whilst we're updating ImportantTableB, then we switch websites to read from ImportantTableB.
Question is, is this feasible and a good idea? I have done something like it before but I'm not convinced it's necessarily good for optimisation/indexing etc.
Any suggestions welcome, I know this is a messy situation... and the long term goal would be to get our FoxPro application pointing to SQL.
(We're using SQL 2005 if it helps)
I should add that data consistency isn't particularly important in this instance, seeing as the data is always slightly out of date.
There are a lot of ways to skin this cat.
I would attack the locking issues first. It is extremely rare that I would use CURSORS, and I think improving the performance and locking behavior there might resolve a lot of your issues.
I expect that I would solve it by using two separate staging tables. One for the FoxPro export in SQL and one transformed into the final format in SQL side-by-side. Then either swapping the final for production using sp_rename, or simply using 3 INSERT/UPDATE/DELETE transactions to apply all changes from the final table to production. Either way, there is going to be some locking there, but how big are we talking about?
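A rough sketch of the sp_rename swap (table names are hypothetical; the rename is near-instant but does take a brief schema lock):

BEGIN TRANSACTION;
EXEC sp_rename 'dbo.ImportantTable', 'ImportantTable_Old';
EXEC sp_rename 'dbo.ImportantTable_Staging', 'ImportantTable';
COMMIT TRANSACTION;
-- The old copy can then be truncated and reused as the next staging table.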
You should be able to maintain one db for the website and just replicate to that table from the other sql db table.
This is assuming that you do not update any data from the website itself.
"For a particular table (one of our main application tables), because various data transformations take places during this process, it's not a straightforward UPDATE, INSERT and DELETE statements using self-joins, but we're having to use cursors instead (I know!)"
I cannot think of a case where I would ever need to perform an insert, update or delete using a cursor. If you can write the select for the cursor, you can convert it into an insert, update or delete. You can join to other tables in these statements and use the CASE statement for conditional processing. Taking the time to do this in a set-based fashion may solve your problem.
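A sketch of what that conversion can look like (the database and table names come from the question; the columns and the CASE logic are invented for illustration):

-- Row-by-row cursor logic rewritten as a single set-based UPDATE.
UPDATE w
SET    w.Price = CASE WHEN f.Discounted = 1 THEN f.Price * 0.9 ELSE f.Price END
FROM   SqlWebData.dbo.ImportantTable AS w
JOIN   SqlFoxProData.dbo.ImportantTable AS f
       ON f.ID = w.ID;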
One thing you may consider if you have lots of data to move: we occasionally create a view of the data we want and then have two tables - one active and one that data will be loaded into. When the data has finished loading, as part of your process run a simple command to switch the table the view uses to the one you just finished loading. That way the users are only down for a couple of seconds at most, and you won't create locking issues where they are trying to access data as you are loading.
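The switch itself can be as simple as repointing the view (the view, table and column names are hypothetical):

-- Users query dbo.ImportantTableView; after loading, repoint it at the fresh copy.
ALTER VIEW dbo.ImportantTableView
AS
SELECT ID, Name, Value
FROM dbo.ImportantTable_B;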
You might also look at using SSIS to move the data.
Do you have the option of making the updates more incremental, rather than the stated 'clear out and re-insert'? I think Visual FoxPro supports triggers, right? For your key tables, can you add a trigger to the update/insert/delete to capture the IDs of records that change, then move (or delete) just those records?
Or how about writing all changes to an offline database, and letting SQL Server replication take care of the sync?
[sorry, this would have been a comment, if I had enough reputation!]
Based on your response to Ernie above, you asked how to replicate databases. Here is Microsoft's how-to about replication in SQL 2005.
However, if you're asking about replication and how to do it, it indicates to me that you are a little light on experience with SQL Server. That being said, it's fairly easy to muck things up, and while I'm all for learning by experience, if this is mission-critical data you might be better off hiring a DBA or, at the very least, testing the #$##$% out of this before you actually implement it.