Consider a regular web application doing mostly form-based CRUD operations over a SQL database. Should there be explicit transaction management in such a web application? Or should it simply use autocommit mode? And if doing transactions, is "transaction per request" sufficient?
I would only use explicit transactions when you're doing things that are actually transactional, e.g., issuing several SQL commands that are highly interrelated. I guess the classic example of this is a banking application -- withdrawing money from one account and depositing it in another account must always succeed or fail as a batch, otherwise someone gets ripped off!
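To make the banking example concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for whatever database you actually run. The accounts table, its CHECK constraint, and the function names are assumptions made up for illustration.

```python
import sqlite3


def make_bank_db():
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts (id, balance) VALUES (?, ?)",
        [(1, 100), (2, 50)],
    )
    conn.commit()
    return conn


def transfer(conn, from_id, to_id, amount):
    """Withdraw and deposit as one unit: both happen or neither does."""
    # Used as a context manager, the connection commits when the block
    # finishes normally and rolls back when it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, from_id),
        )
        cur = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, to_id),
        )
        if cur.rowcount != 1:
            # Raising here undoes the withdrawal above as well.
            raise ValueError("destination account not found")
```

If the deposit fails, or the withdrawal would overdraw the account, the whole transfer is rolled back, so the balances never reflect only half of the operation.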
We use transactions on SO, but only sparingly. Most of our database updates are standalone and atomic. Very few have the properties of the banking example above.
I strongly recommend using transaction mode to safeguard data integrity, because autocommit mode can leave data partially saved.
This is usually handled for me at the database interface layer - The web application rarely calls multiple stored procedures within a transaction. It usually calls a single stored procedure which manages the entire transaction, so the web application only needs to worry about whether it fails.
Usually the web application is not allowed access to other things (tables, views, internal stored procedures) which could allow the database to be in an invalid state if they were attempted without being wrapped in a transaction initiated at the connection level by the client prior to their calls.
There are exceptions to this where a transaction is initiated by the web application, but they are generally few and far between.
You should use transactions given that different users will be hitting the database at the same time. I would recommend you do not use autocommit. Use explicit transaction brackets. As to the resolution of each transaction, you should bracket a particular unit of work (whatever that means in your context).
You might also want to look into the different transaction isolation levels that your SQL database supports. They will offer a range of behaviours in terms of what reading users see of partially updated records.
It depends on how the CRUD handling is done. If, and only if, every creation and modification of a model instance is made in a single UPDATE or INSERT query, you can use autocommit.
If you are doing CRUD with multiple queries per operation (a bad idea, IMO), then you certainly should define transactions explicitly, as these queries would certainly be 'transactionally related'; you don't want to end up with half a model in your database. This is relevant because some web frameworks tend to do things the 'multiple query' way for various reasons.
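As a rough illustration of the 'half model' problem, the sketch below saves a parent row and its child rows in one explicit transaction. It uses Python's sqlite3 module, and the orders/order_lines schema and function names are hypothetical.

```python
import sqlite3


def make_orders_db():
    """Throwaway in-memory schema, invented for this example."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE order_lines ("
        " order_id INTEGER NOT NULL REFERENCES orders(id),"
        " product TEXT NOT NULL,"
        " qty INTEGER NOT NULL CHECK (qty > 0))"
    )
    conn.commit()
    return conn


def save_order(conn, customer, lines):
    """Insert an order and all of its lines, or nothing at all."""
    with conn:  # commit on success, roll back on any exception
        cur = conn.execute(
            "INSERT INTO orders (customer) VALUES (?)", (customer,)
        )
        order_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO order_lines (order_id, product, qty) VALUES (?, ?, ?)",
            [(order_id, product, qty) for product, qty in lines],
        )
    return order_id
```

With autocommit, a failure on the second line would leave an order with only its first line in the database; here the failed save leaves no trace.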
As for which transaction mode to use it depends on what you can support in terms of data views (ie, how current the data needs to be when seen by clients) and what you'll have to support in terms of performance.
It is better to insert/update into multiple tables in a single stored procedure. That way, there is no need to manage transactions from the web application.
What is the "best" way to look for changes in tables on a SQL Server 2008 instance?
We have an external application and the user wants to be "informed", when changes happen...
Today we use triggers, but the performance is not the best.
I thought SqlDependency (Service Broker) in combination with .NET (a C# application...) would be faster. Or are there any other possibilities?
Thanks in advance,
Frank
Consider using Change Tracking.
Change tracking is a lightweight solution that provides an efficient change tracking mechanism for applications. Typically, to enable applications to query for changes to data in a database and access information that is related to the changes, application developers had to implement custom change tracking mechanisms. Creating these mechanisms usually involved a lot of work and frequently involved using a combination of triggers, timestamp columns, new tables to store tracking information, and custom cleanup processes.
Synchronous change tracking will always have some overhead. However, using change tracking can help minimize the overhead. The overhead will frequently be less than that of using alternative solutions, especially solutions that require the use of triggers.
If you change your strategy to use stored procedures for altering data, your stored procedure could send a change notification along with updating the data.
Change notification can be implemented, for example, as another table watched by your application.
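A minimal sketch of that "notification table" idea, written with Python's sqlite3 module rather than SQL Server. The rename_customer function plays the role of the stored procedure, and the customer and change_log tables are assumptions made up for the example.

```python
import sqlite3


def make_notify_db():
    """Throwaway in-memory schema (hypothetical table names)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE change_log (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            table_name TEXT NOT NULL,
            row_id INTEGER NOT NULL,
            action TEXT NOT NULL
        );
    """)
    return conn


def rename_customer(conn, customer_id, new_name):
    """Stand-in for the stored procedure: update data and log the change."""
    with conn:  # the update and its notification commit together
        conn.execute(
            "UPDATE customer SET name = ? WHERE id = ?",
            (new_name, customer_id),
        )
        conn.execute(
            "INSERT INTO change_log (table_name, row_id, action)"
            " VALUES ('customer', ?, 'update')",
            (customer_id,),
        )


def poll_changes(conn, last_seq):
    """Return (new_last_seq, rows) for entries logged after last_seq."""
    rows = conn.execute(
        "SELECT seq, table_name, row_id, action FROM change_log"
        " WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    new_last_seq = rows[-1][0] if rows else last_seq
    return new_last_seq, rows
```

The watching application keeps the last sequence number it has seen and calls poll_changes on a timer, so it only ever reads the small log table instead of rescanning the data tables.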
Say if I have a database with history tables:
[SomeTable]
SomeColumn1
SomeColumn2
UpdatedDTM
UpdatedUserID
IsObselete
[SomeTableHistory]
SomeColumn1
SomeColumn2
UpdatedDTM
UpdatedUserID
AuditedDTM
When a row in SomeTable is updated, the following needs to happen:
The row in SomeTable needs inserting into SomeTableHistory.
The row in SomeTable needs updating with the new values.
The UpdatedDTM column in SomeTable needs setting to the current time.
My first thought was to use stored procedures:
add_sometable_entry(SomeColumn1, SomeColumn2, UserID)
update_sometable_entry(ID, SomeColumn1, SomeColumn2, UserID)
expire_sometable_entry(ID, UserID)
Then I wondered if maybe I should use triggers instead, allowing the usual "insert into sometable" and "update sometable" sql calls to work on SomeTable and with the history mechanisms working automatically.
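For comparison, here is roughly what the trigger variant could look like. This is a sketch in SQLite syntax, run through Python's sqlite3 module, not PostgreSQL (where a trigger calls a trigger function instead). The ID column and the column types are assumptions, and setting UpdatedDTM is left to the UPDATE statement itself.

```python
import sqlite3

HISTORY_SCHEMA = """
CREATE TABLE SomeTable (
    ID INTEGER PRIMARY KEY,
    SomeColumn1 TEXT,
    SomeColumn2 TEXT,
    UpdatedDTM TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UpdatedUserID INTEGER
);
CREATE TABLE SomeTableHistory (
    ID INTEGER,
    SomeColumn1 TEXT,
    SomeColumn2 TEXT,
    UpdatedDTM TEXT,
    UpdatedUserID INTEGER,
    AuditedDTM TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Before each update, copy the outgoing row into the history table.
CREATE TRIGGER SomeTable_history
BEFORE UPDATE ON SomeTable
FOR EACH ROW
BEGIN
    INSERT INTO SomeTableHistory
        (ID, SomeColumn1, SomeColumn2, UpdatedDTM, UpdatedUserID)
    VALUES
        (OLD.ID, OLD.SomeColumn1, OLD.SomeColumn2,
         OLD.UpdatedDTM, OLD.UpdatedUserID);
END;
"""


def make_history_db():
    """Create an in-memory database with the tables and the trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(HISTORY_SCHEMA)
    return conn
```

With this in place, a plain "update sometable" call from the DAL produces a history row without any extra code on the calling side, which is the appeal of the trigger approach.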
Of course, then there is the option of just inlining the SQL to do this for each history table within the DAL.
I'm currently leaning towards the stored procedures so I can keep the DAL clean, and also only allowing insert/update/delete access to the database via the stored procedures which will help to stop distributors/customers from "having a play" and manipulating the tables directly.
What are people's thoughts/experiences on this?
I'm using PostgreSQL (though that should have no bearing, should it?..)
AFAIC, the users should not be able to insert/update/delete the tables directly (unless you want chaos and complete loss of data/referential Integrity). That should be allowed from stored procedures only, which ensure that whatever Transactions (business functions) are allowed are executed within the context of a Transaction; Atomic as per ACID Properties; that it is correct and complete; errors are handled consistently; etc.
Triggers are incapable of that.
Now, with a Historic or Audit requirement, which should be transactional, that is merely adding a few lines to the existing Transactions/stored procs. It would be absurd to deploy half the code in stored procs and the rest in triggers; and if you do, the trigger code unnecessarily complicates the otherwise clean transaction code.
The only circumstance where triggers are worth considering is, where you have a Non-SQL, eg. no Transactions and no stored procs (if it had either one and not the other, I would still normalise the code segments). In that circumstance, sure, deploy the code for maintaining historic tables in triggers, and the rest elsewhere.
Of course, the business logic that relates to the database, should be in the database, not outside it. Eg. in the real world beyond tiny Non-SQLs, a single corporate database may be used by five apps: all the validation and the transactions for the Db reside in one place, in the Db. It would be stupid to place that anywhere outside the Db.
There is a separate requirement, that the DAL should not attempt invalid actions, which waste server resources; and therefore it must check or validate every action before attempting it. That is not "duplicating" such validation code which may exist at the top of each transaction; it saves both the user interaction time and server resources.
Triggers are powerful, but tend to be a pain, as they are hard to test/debug.
You could argue that Stored Procedures would be better.
But if you're going to be writing 'code', you could argue that (rather than in an SP) it's much nicer to put code in the DAL, where it's easier to get version control, unit testing, debugging etc.
Finally, if you look at 'all updates to (someentity) should be recorded as (someentityhistory)' as a business rule (aka domain rule), then you could argue that in your business-logic-layer (aka domain logic layer) you should have code that implements this rule. So that would move the code up above the DAL into the Business-Layer. (so that it would be dealing with entity objects rather than SQL).
One factor to bear in mind is: if someone is importing data or doing bulk data updates, are they going to do it via the business layer (through an API or something) or with direct database access? If it's direct database access, and you still want this rule to fire, then that might be an argument for triggers.
In summary: you could argue.
I'm not a big fan of triggers in general, but this is one usage of them that I wouldn't object to.
However, there are other benefits to stored procedures as you have said to enforce only permitted, valid updates to the tables. And if you are going to have stored procedures for those reasons then they may as well do the auditing while they are about it!
For storing the history, the tablelog addon might be helpful.
http://pgfoundry.org/projects/tablelog/
I'm wondering about the real advantage of performing DML commands (inserts, updates, deletes) in the database via stored procedures for simple CRUD applications. What's the benefit of that approach over just using some generic procedure in the front-end that generates the DML commands?
Thanks in advance.
Performance wise you are unlikely to see any benefit. I think it is more about security of the database.
The advantage of stored procedures in any case is the ability for the DBA to control the security access to the data differently. It often is a preference call by the DBA. Putting the CRUD access to the server in the server means they control 100% access to the server. Your code has to meet their stored proc "API".
If you include the logic in the Visual FoxPro code via a remote view, cursor adapter, or SQL Passthrough SQLExec() it means you have 100% of the code control and the DBA has to grant you access to the database components, or through the application role your code would use for the connection. Your code might be a bit more flexible with respect to building the CRUD SQL statement on the fly. The stored proc is going to have to handle flexible parameters to build the statements generically.
Rick Schummer
Visual Foxpro? For single record updates, I doubt that there is any performance benefit to SP's. The effort to maintain them certainly trumps any marginal performance gain you might get.
The stored procedures gurus might have thoughts on other benefits besides performance.
One of the main benefits is control of access. The application only has EXECUTE permission and no direct data access permission. This way the administrator can inspect the procedures and ensure they use proper access paths (ie. indexes). If the application has direct access to the tables, developers will write crappy SQL and bring down the server.
For stored procs:
Remove SQL injection risks
Encapsulation (security, treat them like methods)
Allow client code to change (same API to database)
Deal with increased complexity (eg insert parent and child in a SQL-side transaction)
Easier to manage transactions
I have seen some guidance which recommends that you secure a database by layering all data access through stored procedures.
I know that for SQL Server, you can secure tables, and even columns against CRUD operations.
For example:
--// Logged in as 'sa'
USE AdventureWorks;
GRANT SELECT ON Person.Address(AddressID, AddressLine1) to Matt;
GRANT UPDATE ON Person.Address(AddressLine1) to Matt;
--// Logged in as 'Matt'
SELECT * from Person.Address; --// Fail
SELECT AddressID, AddressLine1 from Person.Address; --// Succeed
UPDATE Person.Address SET AddressLine1 = '#____ 2700 Production Way'
WHERE AddressID = 497; --// Succeed
Given that you can secure tables and even columns against CRUD operations, how does using a stored procedure provide additional security, or management of security?
Because by limiting all access to those stored procs you have established a defined interface to the database, through which all access must occur... Since you will have DENY'd direct select, insert, update, and delete operations against the tables and views, no one can directly write SQL of their own design that does whatever they want... If you want to limit inserts into the employee table where the employee is assigned to more than three projects to only those employees that have a score greater than 85 on a proficiency test, then you can write that constraint into the SaveEmployee sproc, and have it throw an exception to any client code that attempts to violate it...
Sure, you COULD do the same thing using client-side code, but using sprocs makes the process easier to design and manage, because it's all in one place, and ALL applications that attempt to access this database system HAVE to conform to whatever constraints and/or security provisions you define in the sprocs... No rogue developer writing a new separate client app that hits the database can ignore or work around a constraint or security provision in a sproc if that sproc is the ONLY WAY to insert or update a record...
You might not want to give Matt carte blanche to update certain tables or columns directly. What if Matt decided to do this:
UPDATE Person.Address SET AddressLine1 = NULL
Whoops. Matt forgot the WHERE clause and just hosed your database. Or maybe Matt just got pissed at his boss and has decided to quit at the end of the day. Or maybe Matt's password isn't as secure as it should have been and now a hacker has it.
This is just one simple example. Control over tables and columns could become much more complex and might be untenable through anything other than stored procedures.
Stored procedures provide additional security by allowing users to perform CRUD operations (insert, update, delete) but only in a limited fashion. For example allowing user Matt to update the address of some rows but not others.
It allows you to add data checks to make sure that the data inserted is valid data, not random garbage. For most things you can use constraints and or triggers to do some of this work, but there are limitations. Stored procedures enhance security by ensuring that operations being performed are allowed by the user.
It's easier to track changes to the database through a single point of access, controlled by your applications, rather than through any number of interfaces. And the procedure can update an audit log.
In SQL Server you do not have to grant any direct access to tables if you properly use stored procs (that means no dynamic SQL). This means your users can only do those things defined by the procs. If you have any financial data at all in your database or data of a sensitive nature, only the fewest possible number of people (generally only DBAs) should have direct access to the tables. This seriously reduces the risk of fraud, of disgruntled employees trashing your business-critical data, or of employees stealing personal information to commit identity theft. In accounting terms this is a necessary internal control, and developer convenience or personal desires to do everything dynamically from the user interface should be trumped by the need to secure the data. Unfortunately, in all too many companies, it is not. Most developers seem to only worry about outside threats to their data, but internal ones are often far more critical.
If you restrict the user at the table level and then the user fires off a query to do a legitimate insert, it won't happen. If you give them the rights to do inserts, then they can do any ad hoc insert they want and aren't just limited to the ones coming from the user interface. With stored procs, they can only do the things specifically defined by the proc.
In most (all?) RDBMS's you can 'GRANT' access on specific tables to specific users. A stored procedure can run as a different user, one with greater access. But the Stored procedure is not the same as giving access to the whole table, rather it could first check some things and only return rows that match your particular security concerns.
You might be able to do similar checks with a view but stored procedures are usually much more flexible since they can run almost any SQL - compare the results and decide what rows to return.
The stored procedure is better because the security on the stored procedure (IIRC) will trump security on the tables/columns.
For single-table access, that's not a big deal. However, if you have an operation that involves multiple columns on multiple tables, having one access/deny flag for a table/column might not work for all situations.
However, a stored procedure can perform the composite operation and you can set the security appropriately on that.
Simply put, it lets you define security functionally rather than structurally. In other words, it restricts what a user is allowed to do (with a greater degree of granularity) rather than what database objects are accessible (at very coarse granularity.)
Note that we're speaking of "security controlled by the DBA", rather than by the site or system administrator or software module, all of which are useful and part of the overall security infrastructure.
The first benefit, discussed at length here, is better control of permissions - users can be limited to specific rows, not just per column (which btw is heck to manage in a large system); SPs can enforce business logic and transactional logic; data might be only retrieved dependent on other data (e.g. a join); updates might be limited to single rows at a time; etc.
Second, this can provide an additional layer of protection against SQL Injection (albeit it's not complete and automatic). While this may be broken by dynamic SQL inside the SP, or by bad concatenation calls, the SP does enforce parameter types and whatnot, separating code from data.
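The "separating code from data" point can be shown without a stored procedure at all. The sketch below contrasts string concatenation with a bound parameter, using Python's sqlite3 module; the users table and function names are hypothetical.

```python
import sqlite3


def make_users_db():
    """Throwaway in-memory table with two users (invented data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(1, "alice"), (2, "bob")],
    )
    conn.commit()
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: the input is spliced into the SQL text, so quotes in
    # `name` can change the structure of the statement.
    sql = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def find_user_safe(conn, name):
    # The input travels as a bound parameter and is only ever compared
    # as a value, never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()
```

Passing the string x' OR '1'='1 to the first function returns every row, while the second one simply finds no user with that odd name. A stored procedure with typed parameters gives you the second behaviour, unless it builds dynamic SQL by concatenation internally.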
Third, it comes down to control, at the development phase - typically you'd have trained DBAs writing the SPs, as opposed to programmers (who are trained in code...)
Not to mention the non-security benefits, such as better performance.
In stored procedures, you can add logic controls. You can return an error code if something is not right instead of updating the table data directly.
For example, you have a feedback system. Feedback can only be submitted after the administrator has started the feedback campaign, which simply updates a flag in some table.
Then when a user comes to submit feedback, the SP can check whether the flag is set.
-- Inside the stored procedure; @ID is a parameter, @IsFeedbackDefined a local BIT variable
SELECT @IsFeedbackDefined = IsFeedbackDefined FROM sometable WHERE ID = @ID
IF @IsFeedbackDefined IS NULL OR @IsFeedbackDefined = 0
BEGIN
    RETURN -2 -- cannot submit feedback
END
Given a small set of entities (say, 10 or fewer) to insert, delete, or update in an application, what is the best way to perform the necessary database operations? Should multiple queries be issued, one for each entity to be affected? Or should some sort of XML construct that can be parsed by the database engine be used, so that only one command needs to be issued?
I ask this because a common pattern at my current shop seems to be to format up an XML document containing all the changes, then send that string to the database to be processed by the database engine's XML functionality. However, using XML in this way seems rather cumbersome given the simple nature of the task to be performed.
It depends on how many you need to do, and how fast the operations need to run. If it's only a few, then doing them one at a time with whatever mechanism you have for doing single operations will work fine.
If you need to do thousands or more, and it needs to run quickly, you should re-use the connection and command, changing the arguments for the parameters to the query during each iteration. This will minimize resource usage. You don't want to re-create the connection and command for each operation.
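As a sketch of "one connection, one command, many parameter sets", here is how that might look with Python's sqlite3 module, where executemany runs the same parameterized statement once per tuple. The products table and function names are assumptions for illustration.

```python
import sqlite3


def make_products_db(n):
    """Throwaway in-memory table with n products priced at 100."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, price INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO products (id, price) VALUES (?, ?)",
        [(i, 100) for i in range(1, n + 1)],
    )
    conn.commit()
    return conn


def bulk_update_prices(conn, changes):
    """Apply many (new_price, product_id) pairs over one connection.

    The SQL text stays the same for every row; only the bound
    parameter values change between iterations.
    """
    with conn:  # a single commit at the end instead of one per row
        conn.executemany(
            "UPDATE products SET price = ? WHERE id = ?", changes
        )
```

In ADO.NET terms the equivalent would be creating the connection and command once, adding the parameters once, and only assigning new parameter values inside the loop.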
You didn't mention what database you are using, but in SQL Server 2008, you can use table-valued parameters (table variables passed as parameters) to pass complex data like this to a stored procedure. Process it there and perform your operations. For more info, see Scott Allen's article on OdeToCode.
Most databases support some form of bulk update or bulk delete operation.
From a "business entity" design standpoint, if you are doing different operations on each of a set of entities, you should have each entity handle its own persistence.
If there are common batch activities (like "delete all older than x date", for instance), I would write a static method on a collection class that executes the batch update or delete. I generally let entities handle their own inserts atomically.
The answer depends on the volume of data you're talking about. If you've got a fairly small set of records in memory that you need to synchronise back to disk then multiple queries is probably appropriate. If it's a larger set of data you need to look at other options.
I recently had to implement a mechanism where an external data feed gave me ~17,000 rows of data that I needed to synchronise with a local table. The solution I chose there was to load the external data into a staging table and call a stored proc that did the synchronisation completely within the database.
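A rough sketch of the staging-table pattern, using Python's sqlite3 module in place of a stored procedure. The items and staging tables are hypothetical, and treating rows that are missing from the feed as deletions is an assumption; your synchronisation rules may differ.

```python
import sqlite3

# Set-based statements that reconcile `items` with `staging`.
SYNC_STATEMENTS = [
    # 1. Update rows that exist on both sides.
    "UPDATE items"
    " SET name = (SELECT s.name FROM staging s WHERE s.id = items.id)"
    " WHERE id IN (SELECT id FROM staging)",
    # 2. Insert rows that are new in the feed.
    "INSERT INTO items (id, name)"
    " SELECT s.id, s.name FROM staging s"
    " WHERE NOT EXISTS (SELECT 1 FROM items i WHERE i.id = s.id)",
    # 3. Delete rows that no longer appear in the feed (assumed rule).
    "DELETE FROM items WHERE id NOT IN (SELECT id FROM staging)",
]


def make_sync_db():
    """Throwaway in-memory schema for the target and staging tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.commit()
    return conn


def sync_from_feed(conn, feed_rows):
    """Load the feed into staging, then reconcile in one transaction."""
    with conn:
        conn.execute("DELETE FROM staging")
        conn.executemany(
            "INSERT INTO staging (id, name) VALUES (?, ?)", feed_rows
        )
        for statement in SYNC_STATEMENTS:
            conn.execute(statement)
```

The point is that the reconciliation runs as three set-based statements inside the database, however many rows the feed contains, rather than as one round trip per row.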