What's the best way to insert/update/delete multiple records in a database from an application? - sql

Given a small set of entities (say, 10 or fewer) to insert, delete, or update in an application, what is the best way to perform the necessary database operations? Should multiple queries be issued, one for each entity to be affected? Or should some sort of XML construct that can be parsed by the database engine be used, so that only one command needs to be issued?
I ask this because a common pattern at my current shop is to build an XML document containing all the changes, then send that string to the database to be processed by the database engine's XML functionality. However, using XML in this way seems rather cumbersome given the simple nature of the task to be performed.

It depends on how many you need to do, and how fast the operations need to run. If it's only a few, then doing them one at a time with whatever mechanism you have for doing single operations will work fine.
If you need to do thousands or more, and it needs to run quickly, you should re-use the connection and command, changing only the parameter values on each iteration. This minimizes resource usage; you don't want to re-create the connection and command for each operation.

You didn't mention which database you are using, but in SQL Server 2008 you can use table-valued parameters to pass structured data like this to a stored procedure, then process it there and perform your operations. For more info, see Scott Allen's article on OdeToCode.
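A rough sketch of what that looks like (the type, table and column names here are made up; SQL Server 2008+ only):

    -- A user-defined table type describing one change per row.
    CREATE TYPE dbo.CustomerChange AS TABLE
    (
        CustomerID   int           NOT NULL,
        CustomerName nvarchar(100) NOT NULL,
        ChangeAction char(1)       NOT NULL   -- 'I', 'U' or 'D'
    );
    GO

    CREATE PROCEDURE dbo.ApplyCustomerChanges
        @Changes dbo.CustomerChange READONLY
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Apply all inserts in one statement.
        INSERT INTO dbo.Customer (CustomerID, CustomerName)
        SELECT CustomerID, CustomerName
        FROM   @Changes
        WHERE  ChangeAction = 'I';

        -- Apply all updates in one statement.
        UPDATE c
        SET    c.CustomerName = ch.CustomerName
        FROM   dbo.Customer c
        JOIN   @Changes ch ON ch.CustomerID = c.CustomerID
        WHERE  ch.ChangeAction = 'U';

        -- Apply all deletes in one statement.
        DELETE c
        FROM   dbo.Customer c
        JOIN   @Changes ch ON ch.CustomerID = c.CustomerID
        WHERE  ch.ChangeAction = 'D';
    END;
    GO

The application then sends the whole batch of changes as a single parameter in one call, with no XML to build or parse.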

Most databases support set-based (bulk) UPDATE and DELETE operations, so a single statement can affect many rows at once.
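For example (hypothetical table and columns), one set-based statement replaces a loop of per-row commands:

    -- One statement per operation instead of one round trip per row.
    UPDATE dbo.Widget
    SET    Price = Price * 1.10
    WHERE  CategoryID = 3;

    DELETE FROM dbo.Widget
    WHERE  DiscontinuedDate < '2008-01-01';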

From a "business entity" design standpoint, if you are doing different operations on each of a set of entities, you should have each entity handle its own persistence.
If there are common batch activities (like "delete all older than x date", for instance), I would write a static method on a collection class that executes the batch update or delete. I generally let entities handle their own inserts atomically.

The answer depends on the volume of data you're talking about. If you've got a fairly small set of records in memory that you need to synchronise back to disk, then issuing multiple queries is probably appropriate. If it's a larger set of data, you need to look at other options.
I recently had to implement a mechanism where an external data feed gave me ~17,000 rows of data that I needed to synchronise with a local table. The solution I chose was to load the external data into a staging table and call a stored proc that did the synchronisation completely within the database.
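As a sketch of that kind of staging-table sync (made-up table and column names; MERGE is one possible mechanism, available in SQL Server 2008 and several other engines):

    -- Push staged rows into the local table: update matches, insert new rows,
    -- delete local rows that no longer appear in the feed.
    MERGE dbo.LocalProduct AS target
    USING dbo.StagingProduct AS source
        ON target.ExternalID = source.ExternalID
    WHEN MATCHED AND target.Price <> source.Price THEN
        UPDATE SET Price = source.Price
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (ExternalID, Price) VALUES (source.ExternalID, source.Price)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;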

Related

How do you translate old SQL database data to a new table layout?

We have an old database with a poorly thought out table structure, virtually no relationships set up and no naming scheme. I've created a new database with a clean relational data structure that implements proper design practices.
I'm looking for advice on different methods to migrate the old data over to the new format. This will require a lot of data re-shaping which won't be fun. The data is heavily accessed and the challenge will be to keep both databases in sync for all relevant data (accounts, important services etc).
I thought triggers might be the way to go here - but maybe there is a different method that I am unaware of (maybe MS Sync Framework, or a code-level data adapter which will be more work because there is so much data access code spread all over the place, classic ASP and .Net over dozens of projects). The database in question is SQL Server 2005, running in SQL Server 2000 compatibility mode.
I think the way to go is to write a stored procedure in the new database that pulls your delta changes (only the modifications made between the last run and the instant the stored proc is run), and put this stored procedure in a SQL Agent job.
Configure the SQL Agent job to run every 15 minutes and let the data sync in.
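A rough sketch of such a delta proc (the watermark table, the modified-date column and all names here are assumptions, not taken from your schema):

    CREATE PROCEDURE dbo.SyncAccounts
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @LastRun datetime, @ThisRun datetime;
        SET @ThisRun = GETDATE();

        SELECT @LastRun = LastRunDTM
        FROM   dbo.SyncWatermark
        WHERE  TableName = 'Accounts';

        -- Reshape and copy only the rows touched since the last run.
        INSERT INTO NewDb.dbo.Account (LegacyAccountID, AccountName)
        SELECT a.acct_id, LTRIM(RTRIM(a.acct_nm))
        FROM   OldDb.dbo.accounts a
        WHERE  a.modified_dtm >  @LastRun
          AND  a.modified_dtm <= @ThisRun
          AND  NOT EXISTS (SELECT 1 FROM NewDb.dbo.Account n
                           WHERE n.LegacyAccountID = a.acct_id);

        -- Move the watermark forward for the next run.
        UPDATE dbo.SyncWatermark
        SET    LastRunDTM = @ThisRun
        WHERE  TableName = 'Accounts';
    END;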
Disadvantages of using triggers in this scenario:
Triggers will reduce performance, because SQL Server executes the trigger code along with every update/insert/delete statement and includes it as part of the same execution. If your trigger code takes 2 seconds to execute and the update statement alone takes 2 seconds, the update will take 4 seconds with the trigger in place. So employing triggers here might result in a huge performance bottleneck.
I'm dealing with the same situation at my work, and I'm currently writing an application to do the migration. The original database has no established relationships, so it's really like a set of disconnected spreadsheets. By building my own application, I'm able to migrate the data using newly-established foreign keys, and assign data-specific defaults in place of nulls.

Triggers vs stored procedure vs inline SQL in DAL for updating tables in a database with history

Say I have a database with history tables:
[SomeTable]
SomeColumn1
SomeColumn2
UpdatedDTM
UpdatedUserID
IsObselete
[SomeTableHistory]
SomeColumn1
SomeColumn2
UpdatedDTM
UpdatedUserID
AuditedDTM
When a row in SomeTable is updated, the following needs to happen:
The row in SomeTable needs inserting into SomeTableHistory.
The row in SomeTable needs updating with the new values.
The UpdatedDTM column in SomeTable needs setting to the current time.
My first thought was to use stored procedures:
add_sometable_entry(SomeColumn1, SomeColumn2, UserID)
update_sometable_entry(ID, SomeColumn1, SomeColumn2, UserID)
expire_sometable_entry(ID, UserID)
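For example, update_sometable_entry would do something roughly like this (just a sketch; I'm assuming an integer id primary key and text columns):

    -- Sketch only: copy the current row to history, then apply the new values.
    CREATE OR REPLACE FUNCTION update_sometable_entry(
        p_id integer, p_col1 text, p_col2 text, p_userid integer)
    RETURNS void AS $$
    BEGIN
        INSERT INTO SomeTableHistory
            (SomeColumn1, SomeColumn2, UpdatedDTM, UpdatedUserID, AuditedDTM)
        SELECT SomeColumn1, SomeColumn2, UpdatedDTM, UpdatedUserID, now()
        FROM   SomeTable
        WHERE  id = p_id;

        UPDATE SomeTable
        SET    SomeColumn1   = p_col1,
               SomeColumn2   = p_col2,
               UpdatedDTM    = now(),
               UpdatedUserID = p_userid
        WHERE  id = p_id;
    END;
    $$ LANGUAGE plpgsql;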
Then I wondered if maybe I should use triggers instead, allowing the usual "insert into sometable" and "update sometable" SQL calls to work on SomeTable, with the history mechanism working automatically.
Of course there is also the option of just inlining the SQL to do this for each history table within the DAL.
I'm currently leaning towards the stored procedures so I can keep the DAL clean, and also allow insert/update/delete access to the database only via the stored procedures, which will help stop distributors/customers from "having a play" and manipulating the tables directly.
What are people's thoughts/experiences on this?
I'm using PostgreSQL (though that should have no bearing, should it?..)
AFAIC, users should not be able to insert/update/delete the tables directly (unless you want chaos and complete loss of data/referential integrity). That should be allowed from stored procedures only, which ensure that whatever transactions (business functions) are allowed are executed within the context of a transaction; atomic as per the ACID properties; correct and complete; with errors handled consistently; etc.
Triggers are incapable of that.
Now, with a historic or audit requirement, which should be transactional, that is merely a matter of adding a few lines to the existing transactions/stored procs. It would be absurd to deploy half the code in stored procs and the rest in triggers; and if you do, the trigger code unnecessarily complicates the otherwise clean transaction code.
The only circumstance where triggers are worth considering is where you have a non-SQL, e.g. no transactions and no stored procs (if it had one but not the other, I would still normalise the code segments). In that circumstance, sure, deploy the code for maintaining the history tables in triggers, and the rest elsewhere.
Of course, the business logic that relates to the database should be in the database, not outside it. E.g. in the real world beyond tiny non-SQLs, a single corporate database may be used by five apps: all the validation and the transactions for the Db reside in one place, in the Db. It would be stupid to place that anywhere outside the Db.
There is a separate requirement, that the DAL should not attempt invalid actions, which waste server resources; and therefore it must check or validate every action before attempting it. That is not "duplicating" such validation code which may exist at the top of each transaction; it saves both the user interaction time and server resources.
Triggers are powerful, but tend to be a pain, as they are hard to test/debug.
You could argue that Stored Procedures would be better.
But if you're going to be writing 'code', you could argue that (rather than in an SP) it's much nicer to put that code in the DAL, where it's easier to get version control, unit testing, debugging etc.
Finally, if you look at 'all updates to (someentity) should be recorded as (someentityhistory)' as a business rule (aka domain rule), then you could argue that in your business-logic-layer (aka domain logic layer) you should have code that implements this rule. So that would move the code up above the DAL into the Business-Layer. (so that it would be dealing with entity objects rather than SQL).
One factor to bear in mind is: if someone is importing data or doing bulk data updates, are they going to do it via the business layer (through an API or something) or with direct database access? If it's direct database access, and you still want this rule to fire, then that might be an argument for triggers.
In summary: you could argue.
I'm not a big fan of triggers in general, but this is one usage of them that I wouldn't object to.
However, there are other benefits to stored procedures, as you have said, such as enforcing only permitted, valid updates to the tables. And if you are going to have stored procedures for those reasons, then they may as well do the auditing while they are about it!
For storing the history, the tablelog addon might be helpful.
http://pgfoundry.org/projects/tablelog/

What is the best way to log all user request operations (inserts, updates, deletes) in SQL Server 2008?

I have a database with 50 tables and I want to log users requests, such as inserts, updates or deletes on all the tables in the database. I can also create a trigger for this for each request type.
What is the best way to do this from a performance perspective or is there a better way to track this?
You can also create audit tables which are populated by triggers (and which allow much more flexibility than Change Data Capture). The critical component is to capture sets of data, not to try to work row-by-row. It does add some overhead, yes, but if you write the triggers correctly it isn't that much. Be sure to capture who (including which application, if you have multiple applications hitting the database) and when, as well as the old and new values. Set up one audit table per table you want audited (too much locking if you use only one audit table).
At the time you set up your system, write the code to get data back from a bad transaction or set of transactions. That makes it easier to recover when something does go wrong and you need to revert.
We use two tables per table audited: one contains the info about the process that did the changes (name of the application, date, user, etc. and an audit ID), the other contains the details about what was changed (old and new values, the ID of the record affected and the column affected). This structure lets us use the same layout for each table being audited, allows the tables to change without having to change the audit tables, and makes it easy to script the audit tables for new tables.
It is also easy for us to see which records were changed at the same time or in the same process, and to find out which of the many applications that touch our database was responsible for bad data, as well as who in particular was responsible. This helps us track down application bugs and find out why the data was changed the way it was in some cases. It also makes it easier to track down all the data affected by a broken process, rather than just the one record we knew about.
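A simplified sketch of a set-based audit trigger (single audit table and hypothetical names; our real version splits header and detail as described above):

    CREATE TRIGGER trg_Customer_Audit
    ON dbo.Customer
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Operate on the whole set of affected rows, never row-by-row.
        INSERT INTO dbo.Audit_Customer
            (CustomerID, OldName, NewName, ChangedBy, ChangedAt)
        SELECT d.CustomerID, d.CustomerName, i.CustomerName,
               SUSER_SNAME(), GETDATE()
        FROM   deleted d
        JOIN   inserted i ON i.CustomerID = d.CustomerID
        WHERE  ISNULL(d.CustomerName, '') <> ISNULL(i.CustomerName, '');
    END;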
If you have Enterprise Edition, look into Change Data Capture. If you don't have Enterprise and aren't interested in capturing the historical values of the columns that change, look into Change Tracking.
See Comparing Change Data Capture and Change Tracking to understand the differences between the two.
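If you do have Enterprise Edition, enabling CDC is just two calls per database/table (the table name here is only an example):

    -- Requires SQL Server 2008 Enterprise Edition and SQL Server Agent running.
    EXEC sys.sp_cdc_enable_db;

    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'Customer',
        @role_name     = NULL;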
Assuming all requests to insert, update and/or delete data go through some middle-tier data access layer, I would suggest you do your logging there. This is where we do all of ours. It is much simpler than trying to extract the actual insert/delete/update statements out of SQL Server.
If you want to do auditing of data, you can look into Change Data Capture (CDC). But this requires the Enterprise Edition.

Best practices for writing SQL scripts for deployment

I was wondering what the best practices are for writing SQL scripts to set up databases for production and/or development, for instance:
Should I include the CREATE DATABASE statement?
Should I create users for the database in the same script?
Is it correct to disable FK checks before executing the body of the script?
May I include the whole script in a transaction?
Is it better to generate one script per database or one script for all of them?
Thanks!
The problem with your question is that it is hard to answer, as it depends on the way the scripts are used and what you are trying to achieve. You also don't say which DB server you are using, as there are tools provided that can make some tasks easier.
Taking your points in order, here are some suggestions, which will probably be very different to everyone else's :)
Should I include the CREATE DATABASE statement?
What alternative are you thinking of using? If your question is whether you should put the CREATE DATABASE statement in the same script as the table creation, it depends. When developing a DB I use a separate CREATE DATABASE script, as I have a script to drop all objects, so I don't need to create the database again.
Should I create users for the database in the same script?
I wouldn't, simply because the users may well change while your schema does not. You might as well manage those changes in a smaller script.
Is it correct to disable FK checks before executing the body of the script?
If you are importing the data in an attempt to recover the database, then you may well have to if you are using auto-increment IDs and want to keep the same values. Also, you may end up importing the tables "out of order" and not want checks performed.
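For example, in SQL Server that part of an import script might look like this (the table name is only an example):

    -- Disable FK checks and keep the original identity values for the import.
    ALTER TABLE dbo.Orders NOCHECK CONSTRAINT ALL;
    SET IDENTITY_INSERT dbo.Orders ON;

    -- import statements for dbo.Orders go here

    SET IDENTITY_INSERT dbo.Orders OFF;
    -- Re-enable the constraints and re-validate the imported data.
    ALTER TABLE dbo.Orders WITH CHECK CHECK CONSTRAINT ALL;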
May I include the whole script in a transaction?
Yes, you can, but again it depends on the type of script you are running. If you are importing data after rebuilding a DB then the whole import should work or fail. However, your transaction log is going to grow huge during the import.
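If you do wrap it, something along these lines (SQL Server 2005+ sketch) keeps the whole run atomic:

    BEGIN TRY
        BEGIN TRANSACTION;

        -- schema changes and/or data import statements go here

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;

        -- Re-raise so the deployment tool sees the failure.
        DECLARE @msg nvarchar(2048);
        SET @msg = ERROR_MESSAGE();
        RAISERROR(@msg, 16, 1);
    END CATCH;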
Is it better to generate one script per database or one script for all of them?
Again, for maintenance purposes it's probably better to keep them separate.
This probably depends on what kind of database it is and how it is used and deployed. I am developing an n-tier standard application that is deployed at many different customer sites.
I do not add a CREATE DATABASE statement in the script. Creating the database is part of the installation script, which allows the user to choose the server, database name and collation.
I have no knowledge of the users at my customers' sites, so I don't add CREATE USER statements; the only user that needs access to the database is the user running the middle-tier application.
I do not disable FK checks. I need them to protect the consistency of the database, even if I am the one who wrote the body of the scripts. I use FKs to catch my errors.
I do not include the entire script in one transaction. I require the users to take a backup of the db before they run any db upgrade scripts. For creating a new database there is nothing to protect, so running in a transaction is unnecessary. For upgrades there are sometimes extensive changes to the db. A couple of years ago we switched from varchar to nvarchar in about 250 tables; not something you would want to do in one transaction.
I would recommend you to generate one script per database and version control the scripts separately.
Direct answers; please ask if you need me to expand on any point.
* Should I include the CREATE DATABASE statement?
Normally I would include it since you are creating and owning the database.
* Should I create users for the database in the same script?
This is also a good idea, especially if your application uses specific users.
* Is it correct to disable FK checks before executing the body of the script?
If the script includes data population, then it helps to disable them so that the order is not too important; otherwise you can end up with complex scripts that insert (without the FK link), create the FK record, then update the FK column.
* May I include the whole script in a transaction?
This is normally not a good idea, especially if data population is included, as the transaction can become unwieldy and large. Since you are creating the database, just drop it and start again if something goes awry.
* Is it better to generate one script per database or one script for all of them?
One per database is my recommendation so that they are isolated and easier to troubleshoot if the need arises.
For development purposes it's a good idea to create one script per database object (one script for each table, stored procedure, etc). If you check them into your source control system that way then developers can check out individual objects and you can easily keep track of versions and know what changed and when.
When you deploy you may want to combine the changes for each release into one single script. Tools like Red Gate SQL compare or Visual Studio Team System will help you do that.
Should I include the CREATE DATABASE statement?
Should I create users for the database in the same script?
That depends on your DBMS and your customer.
In an Oracle environment you will probably never be allowed to do such a thing (mainly because in the Oracle world a "database" is something completely different than e.g. in the PostgreSQL or MySQL world).
Sometimes the customer will have a DBA that won't let you create databases (or schemas or users - depending on the DBMS in use). So you will need to supply that information to the DBA in order for him/her to prepare the environment for your script.
May I include the whole script in a transaction?
That totally depends on the DBMS that you are using.
Some DBMS don't support transactional DDL and will implicitly commit any transaction when you execute a DDL statement, so you need to consider the order of your installation script.
For populating the tables with data I would definitely try to do that in a single transaction, but again this depends on your DBMS.
Some DBMS are faster if you commit only once or very seldom (Oracle and PostgreSQL fall into this category) but will slow down if you commit more often.
Other DBMS handle smaller but more frequent transactions better and will slow down if the transactions get too big (SQL Server and MySQL tend to fall into that category).
The best practices will differ considerably depending on whether it is a first-time set-up or a new version being pushed. For the first-time set-up, yes, you need CREATE DATABASE and CREATE TABLE scripts. For a new version, you need to script only the changes from the previous version, so no CREATE DATABASE and no CREATE TABLE unless it is a new table. Now you need ALTER TABLE statements because you don't want to lose the existing data. I usually write stored procs, functions and views with a drop-and-create statement, as dropping those objects doesn't generally affect the underlying data.
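For example, a proc script in such a release might follow a drop-and-create pattern like this (SQL Server syntax, hypothetical names):

    IF OBJECT_ID('dbo.usp_GetCustomer', 'P') IS NOT NULL
        DROP PROCEDURE dbo.usp_GetCustomer;
    GO

    CREATE PROCEDURE dbo.usp_GetCustomer
        @CustomerID int
    AS
    BEGIN
        SET NOCOUNT ON;
        SELECT CustomerID, CustomerName
        FROM   dbo.Customer
        WHERE  CustomerID = @CustomerID;
    END;
    GO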
I find it best to create all database changes with scripts that are stored in source control under the version. So if a client is new, you run the create version 1.0 scripts, then apply all the other versions in order. If a client is just upgrading from version 1.2 to version 1.3, then you run just the scripts in version 1.3 source control repository. This would also include scripts to populate or add records to lookup tables.
For transactions you may want to break them up into several chunks so as not to leave a prod database locked in one huge transaction.
We also write reversal scripts to return to the old version if need be. This makes life easier if you have a part of a change that causes unanticipated problems on prod (usually performance issues).

Bulk Translation Of Table Contents

I'm currently performing a migration from a legacy database. I need to migrate millions of originating rows, breaking the original content apart into multiple destination parent/child rows.
As it's not a simple 1-to-1 migration and the resulting rows are parent/child rows based on identity-generated keys, what's the best mechanism for performing the migration?
I'm assuming that I can't use bulk insert as the identity values for the child rows cannot be determined at the point of generating the script content? The only solution I can currently think of is to set the identity explicitly and then have a predetermined starting point for the import.
If anyone else has any input I'd appreciate the feedback.
This is my standard approach:
create your new data model
pull the data into the new DB unchanged
write (and run) a SQL script to perform the migration
test
(optional) drop the tables with the legacy data
You can get a long way towards migrating the data with plain SQL. For the case you described, you might not need to deal with a single Cursor to get it across.
Running the process in Query Analyzer (or an analog in your dbms), you'll have the advantage that you can wrap everything in a Transaction so that you can roll back if anything goes wacky along the way. Write it in little bits and test it in chunks, on your dev database. Once everything is working correctly, set the script loose on your production database.
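For the parent/child identity problem specifically, one plain-SQL trick (all names here are made up) is to carry the legacy key on the new parent table, at least for the duration of the migration, so the generated identities can be joined back for the children:

    -- Parents: the database assigns ParentID (identity); LegacyKey links back.
    INSERT INTO dbo.NewParent (LegacyKey, ParentName)
    SELECT o.legacy_id, o.parent_name
    FROM   dbo.LegacyRows o;

    -- Children: resolve the parent identity with a join instead of
    -- predetermining identity values.
    INSERT INTO dbo.NewChild (ParentID, ChildValue)
    SELECT p.ParentID, o.child_value
    FROM   dbo.LegacyRows o
    JOIN   dbo.NewParent p ON p.LegacyKey = o.legacy_id;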
Sorted.
Thanks for the suggestion but I'd prefer to produce a programmatic solution. I'm currently using Nant / CruiseControl to automate the tests and need something I can recreate on the fly based on the current live legacy content.