This is an issue I've spent hours researching in the past. It seems like something modern RDBMS solutions should have addressed by now, but so far I have not found anything that really tackles what I see as an incredibly common need in any web or Windows application with a database back-end.
I speak of dynamic sorting. In my fantasy world, it should be as simple as something like:
ORDER BY @sortCol1, @sortCol2
This is the canonical example given by newbie SQL and Stored Procedure developers all over forums across the Internet. "Why isn't this possible?" they ask. Invariably, somebody eventually comes along to lecture them about the compiled nature of stored procedures, of execution plans in general, and all sorts of other reasons why it isn't possible to put a parameter directly into an ORDER BY clause.
I know what some of you are already thinking: "Let the client do the sorting, then." Naturally, this offloads the work from your database. In our case, though, our database servers aren't even breaking a sweat 99% of the time, and they aren't even multi-core yet, nor do they have any of the other myriad improvements to system architecture that arrive every six months. For this reason alone, having our databases handle sorting wouldn't be a problem. Additionally, databases are very good at sorting: they are optimized for it and have had years to get it right, the language for doing it is incredibly flexible, intuitive, and simple, and above all any beginner SQL writer knows how to do it and, even more importantly, how to edit it, make changes, and do maintenance on it. When your databases are far from being taxed and you just want to simplify (and shorten!) development time, this seems like an obvious choice.
Then there's the web issue. I've played around with JavaScript that will do client-side sorting of HTML tables, but it inevitably isn't flexible enough for my needs and, again, since my databases aren't overly taxed and can do sorting really, really easily, I have a hard time justifying the time it would take to rewrite or roll my own JavaScript sorter. The same generally goes for server-side sorting, though it is probably already much preferred over JavaScript. I'm not one who particularly likes the overhead of DataSets, so sue me.
But this brings back the point that it isn't possible — or rather, not easily. With prior systems, I've used an incredibly hacky way of getting dynamic sorting. It wasn't pretty, intuitive, simple, or flexible, and a beginner SQL writer would be lost within seconds. Already this is looking to be not so much a "solution" as a "complication."
The following examples are not meant to expose any sort of best practices or good coding style or anything, nor are they indicative of my abilities as a T-SQL programmer. They are what they are and I fully admit they are confusing, bad form, and just plain hack.
We pass an integer value as a parameter to a stored procedure (let's call the parameter just "sort") and from that we determine a bunch of other variables. For example... let's say sort is 1 (or the default):
DECLARE @sortCol1 AS varchar(20)
DECLARE @sortCol2 AS varchar(20)
DECLARE @dir1 AS varchar(20)
DECLARE @dir2 AS varchar(20)
DECLARE @col1 AS varchar(20)
DECLARE @col2 AS varchar(20)
SET @col1 = 'storagedatetime';
SET @col2 = 'vehicleid';
IF @sort = 1 -- Default sort.
BEGIN
    SET @sortCol1 = @col1;
    SET @dir1 = 'asc';
    SET @sortCol2 = @col2;
    SET @dir2 = 'asc';
END
ELSE IF @sort = 2 -- Reversed order default sort.
BEGIN
    SET @sortCol1 = @col1;
    SET @dir1 = 'desc';
    SET @sortCol2 = @col2;
    SET @dir2 = 'desc';
END
You can already see how, if I declared more @colX variables to define other columns, I could really get creative with the columns to sort on based on the value of "sort"... To use it, it usually ends up looking like the following incredibly messy clause:
ORDER BY
    CASE @dir1
        WHEN 'desc' THEN
            CASE @sortCol1
                WHEN @col1 THEN [storagedatetime]
                WHEN @col2 THEN [vehicleid]
            END
    END DESC,
    CASE @dir1
        WHEN 'asc' THEN
            CASE @sortCol1
                WHEN @col1 THEN [storagedatetime]
                WHEN @col2 THEN [vehicleid]
            END
    END,
    CASE @dir2
        WHEN 'desc' THEN
            CASE @sortCol2
                WHEN @col1 THEN [storagedatetime]
                WHEN @col2 THEN [vehicleid]
            END
    END DESC,
    CASE @dir2
        WHEN 'asc' THEN
            CASE @sortCol2
                WHEN @col1 THEN [storagedatetime]
                WHEN @col2 THEN [vehicleid]
            END
    END
Obviously this is a very stripped-down example. The real stuff is worse: we usually have four or five columns to support sorting on, each with a possible secondary or even tertiary sort column on top of that (for example, date descending, then sorted secondarily by name ascending), and each supporting bi-directional sorting, which effectively doubles the number of cases. Yeah... it gets hairy really quickly.
The idea is that one could "easily" change the sort cases such that vehicleid gets sorted before the storagedatetime... but the pseudo-flexibility, at least in this simple example, really ends there. Essentially, each case that fails a test (because our sort method doesn't apply to it this time around) renders a NULL value. And thus you end up with a clause that functions like the following:
ORDER BY NULL DESC, NULL, [storagedatetime] DESC, blah blah
You get the idea. It works because when a CASE expression evaluates to NULL for every row, that sort key contributes nothing to the ordering. This is incredibly hard to maintain, as anyone with any basic working knowledge of SQL can probably see. If I've lost any of you, don't feel bad. It took us a long time to get it working and we still get confused trying to edit it or create new ones like it. Thankfully it doesn't need changing often, otherwise it would quickly become "not worth the trouble."
Yet it did work.
My question is then: is there a better way?
I'm okay with solutions other than Stored Procedure ones, as I realize it may just not be the way to go. Preferably, I'd like to know if anyone can do it better within the Stored Procedure, but if not, how do you all handle letting the user dynamically sort tables of data (bi-directionally, too) with ASP.NET?
And thank you for reading (or at least skimming) such a long question!
PS: Be glad I didn't show my example of a stored procedure that supports dynamic sorting, dynamic filtering/text-searching of columns, pagination via ROW_NUMBER() OVER, AND TRY...CATCH with transaction rollback on errors... "behemoth-sized" doesn't even begin to describe them.
Update:
I would like to avoid dynamic SQL. Piecing a string together and running EXEC on it defeats a lot of the purpose of having a stored procedure in the first place. Sometimes I wonder, though, whether the cons of doing such a thing wouldn't be worth it, at least in these special dynamic sorting cases. Still, I always feel dirty whenever I build dynamic SQL strings like that — like I'm still living in the Classic ASP world.
A lot of the reason we want stored procedures in the first place is for security. I don't get to make the call on security concerns, only suggest solutions. With SQL Server 2005 we can set permissions (on a per-user basis if need be) at the schema level or on individual stored procedures, and then deny any queries against the tables directly. Critiquing the pros and cons of this approach is perhaps a question for another day, but again, it's not my decision. I'm just the lead code monkey. :)
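For concreteness, the kind of lock-down I mean looks roughly like this (the role, schema, and table names here are invented for the example):
-- Invented names: role WebAppUser, schema dbo, table dbo.Vehicles.
CREATE ROLE WebAppUser;
-- Let the application account run the stored procedures in the schema...
GRANT EXECUTE ON SCHEMA::dbo TO WebAppUser;
-- ...but deny it direct access to the underlying tables.
DENY SELECT, INSERT, UPDATE, DELETE ON dbo.Vehicles TO WebAppUser;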
Yeah, it's a pain, and the way you're doing it looks similar to what I do:
order by
case when @SortExpr = 'CustomerName' and @SortDir = 'ASC'
then CustomerName end asc,
case when @SortExpr = 'CustomerName' and @SortDir = 'DESC'
then CustomerName end desc,
...
This, to me, is still much better than building dynamic SQL from code, which turns into a scalability and maintenance nightmare for DBAs.
What I do from code is refactor the paging and sorting so I at least don't have a lot of repetition there with populating values for @SortExpr and @SortDir.
As far as the SQL is concerned, keep the design and formatting the same between different stored procedures, so it's at least neat and recognizable when you go in to make changes.
This approach keeps the sortable columns from being duplicated twice in the order by, and is a little more readable IMO:
SELECT
s.*
FROM
(SELECT
CASE @SortCol1
WHEN 'Foo' THEN t.Foo
WHEN 'Bar' THEN t.Bar
ELSE null
END as SortCol1,
CASE @SortCol2
WHEN 'Foo' THEN t.Foo
WHEN 'Bar' THEN t.Bar
ELSE null
END as SortCol2,
t.*
FROM
MyTable t) as s
ORDER BY
CASE WHEN @dir1 = 'ASC' THEN SortCol1 END ASC,
CASE WHEN @dir1 = 'DESC' THEN SortCol1 END DESC,
CASE WHEN @dir2 = 'ASC' THEN SortCol2 END ASC,
CASE WHEN @dir2 = 'DESC' THEN SortCol2 END DESC
Dynamic SQL is still an option. You just have to decide whether that option is more palatable than what you currently have.
Here is an article that shows that: https://web.archive.org/web/20211029044050/https://www.4guysfromrolla.com/webtech/010704-1.shtml.
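If you do go the dynamic route, one common mitigation (just a sketch with an invented table name, not something from the article) is to validate the requested column against the catalog and wrap it in QUOTENAME before concatenating:
-- Sketch only: check @sortCol against the table's real columns before
-- building the ORDER BY, then run it with sp_executesql.
DECLARE @sortCol sysname, @dir varchar(4), @sql nvarchar(2000);
SET @sortCol = 'storagedatetime';  -- would arrive as a parameter
SET @dir = 'ASC';
IF NOT EXISTS (SELECT 1 FROM sys.columns
               WHERE object_id = OBJECT_ID('dbo.Vehicles')  -- invented table
                 AND name = @sortCol)
BEGIN
    RAISERROR('Invalid sort column.', 16, 1);
END
ELSE
BEGIN
    SET @sql = N'SELECT * FROM dbo.Vehicles ORDER BY '
             + QUOTENAME(@sortCol)
             + CASE WHEN @dir = 'DESC' THEN N' DESC' ELSE N' ASC' END;
    EXEC sp_executesql @sql;
END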
My applications do this a lot but they are all dynamically building the SQL. However, when I deal with stored procedures I do this:
Make the stored procedure a function that returns a table of your values - no sort.
Then in your application code do a select * from dbo.fn_myData() where ... order by ... so you can dynamically specify the sort order there.
Then at least the dynamic part is in your application, but the database is still doing the heavy lifting.
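Roughly, and with invented object names, that looks like the following; the function itself has no ORDER BY, and the caller supplies it:
-- Invented names: the function just exposes the unsorted result set.
CREATE FUNCTION dbo.fn_myData ()
RETURNS TABLE
AS
RETURN
(
    SELECT vehicleid, storagedatetime, notes
    FROM dbo.Vehicles
);
The application then issues something like select * from dbo.fn_myData() where vehicleid > 100 order by storagedatetime desc, with the WHERE and ORDER BY built dynamically on the application side.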
A stored procedure technique (hack?) I've used to avoid dynamic SQL for certain jobs is to have a unique sort column. I.e.,
SELECT
    name_last,
    name_first,
    CASE @sortCol WHEN 'name_last' THEN [name_last] ELSE '0' END AS mySort
FROM
    [table]
ORDER BY
    mySort
This one is easy to beat into submission -- you can concat fields in your mySort column, reverse the order with math or date functions, etc.
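For example, a direction flip done with a date function rather than a second CASE branch per direction might look like this (column names and @sort values are illustrative):
-- A numeric mySort: negate the value to get "descending" out of an
-- ascending ORDER BY. DATEDIFF in minutes keeps the numbers small.
SELECT
    vehicleid,
    storagedatetime,
    CASE @sort
        WHEN 1 THEN  DATEDIFF(minute, '20000101', storagedatetime)  -- oldest first
        WHEN 2 THEN -DATEDIFF(minute, '20000101', storagedatetime)  -- newest first
        ELSE 0
    END AS mySort
FROM dbo.Vehicles
ORDER BY mySort;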
Preferably, though, I use my ASP.NET GridViews or other objects with built-in sorting to do the sorting for me AFTER retrieving the data from SQL Server. Or even if it's not built-in -- e.g., DataTables, etc. in ASP.NET.
There are a couple of different ways you can hack this in.
Prerequisites:
Only one SELECT statement in the sp
Leave out any sorting (or have a default)
Method #1: insert into a temp table:
create table #temp ( your columns )
insert #temp
exec foobar
select * from #temp order by whatever
Method #2: set up a linked server back to itself, then select from this using openquery:
http://www.sommarskog.se/share_data.html#OPENQUERY
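Per that link, the shape is roughly as follows (untested here, and the linked server and procedure names are invented):
-- One-time setup: a linked server pointing back at the local instance.
DECLARE @srv sysname;
SET @srv = @@SERVERNAME;
EXEC sp_addlinkedserver
    @server = N'LOOPBACK',
    @srvproduct = N'',
    @provider = N'SQLNCLI',
    @datasrc = @srv;
-- The procedure's result set can now be treated like a table and sorted.
SELECT *
FROM OPENQUERY(LOOPBACK, 'EXEC MyDb.dbo.uspSomeProcedure')
ORDER BY storagedatetime DESC, vehicleid;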
There may be a third option, since your server has lots of spare cycles - use a helper procedure to do the sorting via a temporary table. Something like
create procedure uspCallAndSort
(
    @sql varchar(2048),        --exec dbo.uspSomeProcedure arg1,'arg2',etc.
    @sortClause varchar(512)   --comma-delimited field list
)
AS
insert into #tmp EXEC(@sql)
declare @msql varchar(3000)
set @msql = 'select * from #tmp order by ' + @sortClause
EXEC(@msql)
drop table #tmp
GO
Caveat: I haven't tested this, but it "should" work in SQL Server 2005 (which will create a temporary table from a result set without specifying the columns in advance.)
At some point, doesn't it become worth it to move away from stored procedures and just use parameterized queries to avoid this sort of hackery?
I agree, use client side. But it appears that is not the answer you want to hear.
So, it is perfect the way it is. I don't know why you would want to change it, or even ask "Is there a better way." Really, it should be called "The Way". Besides, it seems to work and suit the needs of the project just fine and will probably be extensible enough for years to come. Since your databases aren't taxed and sorting is really really easy it should stay that way for years to come.
I wouldn't sweat it.
When you are paging sorted results, dynamic SQL is a good option. If you're paranoid about SQL injection you can use the column numbers instead of the column name. I've done this before using negative values for descending. Something like this...
declare @o int;
set @o = -1;
declare @sql nvarchar(2000);
set @sql = N'select * from table order by ' +
    cast(abs(@o) as varchar) + case when @o < 0 then ' desc' else ' asc' end + ';'
exec sp_executesql @sql
Then you just need to make sure the number is between 1 and the number of columns. You could even expand this to a list of column numbers and parse it into a table of ints with a split function (a stand-in sketch follows the example below). Then you would build the ORDER BY clause like so...
declare @cols varchar(100);
set @cols = '1 -2 3 6';
declare @order_by varchar(200)
select @order_by = isnull(@order_by + ', ', '') +
    cast(abs(number) as varchar) +
    case when number < 0 then ' desc' else '' end
from dbo.iter_intlist_to_tbl(@cols) order by listpos
print @order_by
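The split function referenced above came from a link; as a rough stand-in with the same shape (a listpos column and a number column), something like this would do:
-- Not the original function, just a minimal stand-in: splits a
-- space-separated list of integers, e.g. '1 -2 3 6'.
CREATE FUNCTION dbo.iter_intlist_to_tbl (@list varchar(8000))
RETURNS @tbl TABLE (listpos int IDENTITY(1,1), number int)
AS
BEGIN
    DECLARE @pos int, @piece varchar(20);
    SET @list = LTRIM(RTRIM(@list)) + ' ';
    SET @pos = CHARINDEX(' ', @list);
    WHILE @pos > 0
    BEGIN
        SET @piece = LTRIM(RTRIM(LEFT(@list, @pos - 1)));
        IF @piece <> ''
            INSERT @tbl (number) VALUES (CAST(@piece AS int));
        SET @list = SUBSTRING(@list, @pos + 1, 8000);
        SET @pos = CHARINDEX(' ', @list);
    END
    RETURN;
END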
One drawback is that you have to remember the order of each column on the client side, especially when you don't display all the columns or display them in a different order. When the client wants to sort, you map the column names to the column order and generate the list of ints.
An argument against doing the sorting on the client side is large data volumes and pagination. Once your row count gets beyond what you can easily display, you're often sorting as part of a skip/take, which you probably want to run in SQL.
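For what it's worth, a 2005-era sketch of that, combining the CASE-style sort key with ROW_NUMBER() paging (table, column, and parameter names are illustrative):
-- Illustrative only: page 2, 25 rows per page, sorted by whichever
-- column @sortCol1 names.
DECLARE @sortCol1 varchar(20), @page int, @pageSize int;
SET @sortCol1 = 'storagedatetime';
SET @page = 2;
SET @pageSize = 25;
SELECT *
FROM
(
    SELECT
        v.*,
        ROW_NUMBER() OVER (
            ORDER BY
                CASE WHEN @sortCol1 = 'storagedatetime' THEN v.storagedatetime END,
                CASE WHEN @sortCol1 = 'vehicleid' THEN v.vehicleid END
        ) AS rowNum
    FROM dbo.Vehicles v
) AS paged
WHERE paged.rowNum BETWEEN (@page - 1) * @pageSize + 1 AND @page * @pageSize
ORDER BY paged.rowNum;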
For Entity Framework, you could use a stored procedure to handle your text search. If you encounter the same sort issue, the solution I've seen is to use a stored proc for the search, returning only an id key set for the match. Next, re-query (with the sort) against the db using the ids in a list (contains). EF handles this pretty well, even when the ID set is pretty large. Yes, this is two round trips, but it allows you to always keep your sorting in the DB, which can be important in some situations, and prevents you from writing a boatload of logic in the stored procedure.
How about handling the sorting in whatever displays the results -- grids, reports, etc. -- rather than in SQL?
EDIT:
To clarify things since this answer got down-voted earlier, I'll elaborate a bit...
You stated you knew about client-side sorting but wanted to steer clear of it. That's your call, of course.
What I want to point out, though, is that by doing it on the client-side, you're able to pull data ONCE and then work with it however you want -- versus doing multiple trips back and forth to the server each time the sort gets changed.
Your SQL Server isn't getting taxed right now and that's awesome. It shouldn't be. But just because it isn't overloaded yet doesn't mean that it'll stay like that forever.
If you're using any of the newer ASP.NET stuff for displaying on the web, a lot of that stuff is already baked right in.
Is it worth adding so much code to each stored procedure just to handle sorting? Again, your call.
I'm not the one who will ultimately be in charge of supporting it. But give some thought to what will be involved as columns are added/removed within the various datasets used by the stored procedures (requiring modifications to the CASE statements) or when suddenly instead of sorting by two columns, the user decides they need three -- requiring you to now update every one of your stored procedures that uses this method.
For me, it's worth it to get a working client-side solution and apply it to the handful of user-facing displays of data and be done with it. If a new column is added, it's already handled. If the user wants to sort by multiple columns, they can sort by two or twenty of them.
Sorry I'm late to the party, but here's another option for those who really want to avoid dynamic SQL, but want the flexibility it offers:
Instead of dynamically generating the SQL on the fly, write code to generate a unique proc for every possible variation. Then you can write a method in the code to look at the search options and have it choose the appropriate proc to call.
If you only have a few variations then you can just create the procs by hand. But if you have a lot of variations, then rather than maintaining them all by hand, you would just maintain your proc generator and have it recreate them.
As an added benefit, you'll get better SQL plans for better performance doing it this way too.
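A toy version of such a generator, which just prints one proc per sortable column (all names here are made up, and a real one would also loop over directions and secondary columns):
-- Prints a CREATE PROCEDURE statement per sortable column.
DECLARE @col sysname, @ddl nvarchar(max);
DECLARE cols CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.columns
    WHERE object_id = OBJECT_ID('dbo.Vehicles')       -- hypothetical table
      AND name IN ('storagedatetime', 'vehicleid');   -- sortable columns
OPEN cols;
FETCH NEXT FROM cols INTO @col;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @ddl = N'CREATE PROCEDURE dbo.GetVehicles_SortBy_' + @col + N'
AS
    SELECT * FROM dbo.Vehicles ORDER BY ' + QUOTENAME(@col) + N';';
    PRINT @ddl;   -- or EXEC (@ddl) to (re)create the proc
    FETCH NEXT FROM cols INTO @col;
END
CLOSE cols;
DEALLOCATE cols;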
This solution might only work in .NET, I don't know.
I fetch the data into the C# with the initial sort order in the SQL order by clause, put that data in a DataView, cache it in a Session variable, and use it to build a page.
When the user clicks on a column heading to sort (or page, or filter), I don't go back to the database. Instead, I go back to my cached DataView and set its "Sort" property to an expression I build dynamically, just like I would dynamic SQL. (I do the filtering the same way, using the "RowFilter" property.)
You can see/feel it working in a demo of my app, BugTracker.NET, at http://ifdefined.com/btnet/bugs.aspx
You should avoid sorting in SQL Server unless necessary. Why not sort on the app server or client side? Also, .NET generics do exceptional sorting.
Related
I seem to approach thinking about SQL the wrong way. I am always writing things that do not work.
For example, I need a variable. So I think:
DECLARE @CNT AS INT
SET @CNT = COUNT(DISTINCT database.schema.table.column)
Why doesn't this work...? I am using a fully qualified reference here, so the value I want should be clear.
DECLARE @CNT AS INT
SET @CNT = (SELECT COUNT(DISTINCT database.schema.table.column) FROM database.schema.table)
This works... but why do I have to use select?
Does everything have to be prefaced with one of the DDL or DML statements?
Secondly:
I can't debug line by line because a SQL statement is treated as one step. The only way I can debug is to select the innermost sub-query and run that, then include the next outer sub-query and run that, and so on and so forth.
Is there a locals window?
I've heard about set-based thinking rather than iterative thinking; I guess I am still iterative even for functional languages... the iteration is just from innermost parentheses to outermost parentheses, applied to the whole set. But even here I run into trouble because I don't know which value in the set causes the error.
Sorry if this seems scatterbrained... I guess that just kinda reflects how I feel about it. I don't know how to architect a big stored procedure from lots of little components... Like in VBA, I can just call another sub-routine and make sure the variables I need are global.
tl;dr: I need the conceptual grounding to know what actually happens when I type something and hit F5.
On question #1: you need SELECT because that's how SQL works. You've given it a name, but haven't told it what to do with that name (select it, update it, delete it?). Just naming the column is not grammatically correct.
On #2: yes, SQL is declarative; you're not telling it how to get the data, you're telling it what to return. It will retrieve the data in whatever order is most efficient at that particular moment. Normally your sub-query will be the last thing to run, not the first.
Yes, you have to use SELECT in order to fetch the data first and then assign it to the variable. You can also do it like this:
DECLARE @CNT AS INT
SELECT @CNT = COUNT(DISTINCT [column]) FROM database.schema.table
My code actually works; I don't need help with that. What I would like to know is whether what I have done is considered acceptable.
In one particular part of a T-SQL script I am writing I have to run almost similar insert statements about 20 times. Only a portion of the WHERE clause is different in each case. Wanting to loop, rather than have 20 almost identical inserts, I use a WHILE loop to run some dynamic SQL and I store the portion of the WHERE clause that differs in the database. Works like a charm. It's worth noting that the INSERT statements in this case may vary in number or in content and I felt this solution allowed a way to deal with that rather simply.
When I showed this solution to one of my peers at work, his eyebrow went up and he looked at me as though I was growing a new head. He suggested that there was a better way. That may be, and with me being the junior I'll humbly accept it. But I did want to ask the community whether this seems weird, unprofessional, or against general standards/best practices.
I can post the code if needed, but for the purposes of this question hopefully I have given you enough to comment one way or the other.
TIA
Edit--
OK, as requested here is the code. I won't try to explain it as it's a can of worms but here it is.
DECLARE @varOfferId INT = 1
DECLARE @MaxOfferId INT = (SELECT COUNT(DISTINCT offer_id) FROM obp.CellCodes_Offers)
DECLARE @SQLWhereClause VARCHAR(1000)
DECLARE @SQLStatement VARCHAR(1000)
WHILE @varOfferId <= @MaxOfferId
BEGIN
    SET @SQLWhereClause = (SELECT where_clause FROM obp.Offers WHERE offer_id = @varOfferId)
    SET @SQLStatement =
        'INSERT INTO obp.Offers_Contacts ' +
        'SELECT DISTINCT o.contact_id, ' + CONVERT(VARCHAR(2), @varOfferId) +
        ' FROM obp.Onboarding AS o
          WHERE ' + @SQLWhereClause +
        ' AND o2.contact_id = o.contact_id)
          AND ' + CONVERT(VARCHAR(2), @varOfferId) + ' IN(
              SELECT cc.offer_id
              FROM obp.CellCodes_Offers AS cc
              WHERE cc.cellcode = o.cellcode)'
    EXECUTE (@SQLStatement)
    SET @varOfferId = @varOfferId + 1
END
So, it seems that the consensus thus far is that this is not a good idea. OK, I'm good with that. But I'm not sure I agree that it is easier from a maintenance standpoint. Right now my code looks at the 'Offers' table, gets the row count, and loops that many times. If they add more offers going forward (or reduce the offers), all I have to do is an INSERT (or DELETE) and include the offer with the appropriate WHERE clause and we are on our way. Alternatively, if I write all the individual INSERTs, then when they add or remove offers I've got to touch the code, which means testing/QA. Thoughts?
However, I do agree with several other points so I guess I'll be going back to the drawing board tomorrow!
Pros:
You've kept your code shorter, saved some time
Cons:
You are now susceptible to SQL Injection
Your SQL logic is now split between the script and data stored in a table; this will make maintenance harder for whoever maintains your code.
Debugging is going to be difficult.
If you have to write 20 different statements, it may be possible to autogenerate them using a very similar WHILE LOOP to the one you've already made.
e.g.
SELECT 'insert into mytable (x,y,z) select x,y,z from a join b on a.x = b.x ' + wherecolumn
from wheretable
This would give you the code you need to paste into your stored procedure. You could even keep that statement above in the stored procedure, commented out, so others may re-use it in future if column structures change.
For the best post I've ever seen on dynamic SQL, check out Erland Sommarskog's page here.
I think storing the differences in the database is relatively less straightforward and less convenient to modify afterwards. I would just write a script to do this, with the conditions written in the script directly.
For example, in Python you may write something like this.
import MySQLdb
import MySQLdb.cursors
field_value_pairs = {'f1':'v1', 'f2':'v2', 'f3':'v3'} # this is where you could modify to meet your different cases
db = MySQLdb.connect(host=host_name, user=user_name, passwd=password, \
unix_socket=socket_info)
cursor = db.cursor()
db.select_db(db_name)
for field, value in field_value_pairs.items():
    # column names cannot be bound as query parameters, so the name is
    # substituted into the statement text and only the value is bound
    cursor.execute("INSERT INTO tbl_name (%s) VALUES (%%s)" % field, (value,))
db.commit()
cursor.close()
db.close()
I'm currently writing a stored procedure that provides a calling application with the retrieval of valid city and state values.
When provided with a zip_code, the stored procedure will return a list of all valid city/state combinations for the specified input parameter.
However, if a zip_code does not exist, then the stored procedure must return an error string 'ZipCode Wrong!' back to the calling application instead of an empty dataset.
I've considered two approaches:
First Approach
SELECT City, State FROM ZipCodeTable WHERE Zip = @ZipCode
IF (@@ROWCOUNT = 0)
    return 'ZipCode Wrong!'
Second Approach
SELECT COALESCE(
    (SELECT City, State FROM ZipCodeTable WHERE Zip = @ZipCode FOR XML PATH ('')),
    (SELECT 'ZipCode Wrong!') FOR XML PATH (''))
As this transaction will be run MANY, MANY times per second, I want to make it as efficient as possible. From a performance standpoint which one is more efficient? Also, if there's another, better approach, feel free to let me know. Thanks!
Maybe don't use a COUNT, and don't use a bare SELECT: one counts your results and the other returns a whole result set.
You really mean to ask whether or not it exists... so why not use the clause intended for that purpose?
EXISTS
http://msdn.microsoft.com/en-us/library/ms188336.aspx
If you instead want to return the result if it exists, and the error message if it doesn't, then @@ROWCOUNT is likely faster... you should benchmark it using the profiler.
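For reference, the EXISTS shape might look something like this (at the cost of touching the table twice when the zip is valid):
IF EXISTS (SELECT 1 FROM ZipCodeTable WHERE Zip = @ZipCode)
BEGIN
    SELECT City, State FROM ZipCodeTable WHERE Zip = @ZipCode;
END
ELSE
BEGIN
    SELECT 'ZipCode Wrong!' AS ErrorMessage;
END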
Consider your use case, though.
Do you expect significantly more errors than valid returns? If so, then perhaps the EXISTS syntax is better as a filter; if you expect many more valid returns, then the SELECT with @@ROWCOUNT may be preferred.
For this one, the answer is pretty squarely: It depends
What database are you using? Perhaps there is an implementation-specific approach for this.
I think you should handle this in the application layer. Look at the number of rows returned and detect the error that way. Much faster, easier and cleaner than a SQL solution.
IF/THEN branches generally take more time; COALESCE is easier for the query processor to optimize.
That said, they will probably run at about the same speed; I would speed-test both solutions.
Don't create an entire result set when you don't need it; just get a count:
declare @cnt int
select @cnt = count(*) from ZipCodeTable where Zip = @ZipCode
if (@cnt = 0) begin
    return 'ZipCode Wrong!'
end
We are attempting to concatenate possibly thousands of rows of text in SQL with a single query. The query that we currently have looks like this:
DECLARE @concatText NVARCHAR(MAX)
SET @concatText = ''
UPDATE TOP (SELECT MAX(PageNumber) + 1 FROM #OrderedPages) [#OrderedPages]
SET @concatText = @concatText + [ColumnText] + '
'
WHERE (RTRIM(LTRIM([ColumnText])) != '')
This is working perfectly fine from a functional standpoint. The only issue we're having is that sometimes the ColumnText can be a few kilobytes in length. As a result, we're filling up tempDB when we have thousands of these rows.
The best reason that we have come up with is that as we're doing these updates to @concatText, SQL is using implicit transactions so the strings are effectively immutable.
We are trying to figure out a good way of solving this problem and so far we have two possible solutions:
1) Do the concatenation in .NET. This is an OK option, but that's a lot of data that may go back across the wire.
2) Use .WRITE which operates in a similar fashion to .NET's String.Join method. I can't figure out the syntax for this as BoL doesn't cover this level of SQL shenanigans.
This leads me to the question: Will .WRITE work? If so, what's the syntax? If not, are there any other ways to do this without sending data to .NET? We can't use FOR XML because our text may contain illegal XML characters.
Thanks in advance.
I'd look at using CLR integration, as suggested in @Martin's comment. A CLR aggregate function might be just the ticket.
What exactly is filling up tempdb? It cannot be @concatText = @concatText + [ColumnText]; there is no immutability involved, and the @concatText variable will be at worst 2 GB in size (I expect your tempdb is much larger than that; if not, increase it). It seems more like your query plan creates a spool for Halloween protection and that spool is the culprit.
As a generic answer, using the UPDATE ... SET @var = @var + ... for concatenation is known to have correctness issues and is not supported. Alternative approaches that work more reliably are discussed in Concatenating Row Values in Transact-SQL.
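One of the alternatives covered there is the FOR XML PATH('') pattern; routed through TYPE and .value() it decodes entities like &lt; back to the original characters (genuinely invalid XML control characters are a separate issue). A sketch against the table from your query:
-- Sketch: concatenate ColumnText in page order into a single variable.
DECLARE @concatText nvarchar(max);
SELECT @concatText =
    (SELECT [ColumnText] + NCHAR(13) + NCHAR(10)
     FROM #OrderedPages
     WHERE RTRIM(LTRIM([ColumnText])) <> ''
     ORDER BY PageNumber
     FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)');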
First, from your post, it isn't clear whether or why you need temp tables. Concatenation can be done inline in a query. If you show us more about the query that is filling up tempdb, we might be able to help you rewrite it. Second, an option that hasn't been mentioned is to do the string manipulation outside of T-SQL entirely. I.e., in your middle-tier query for the raw data, do the manipulation and push it back to the database. Lastly, you can use Xml such that the results handle escapes and entities properly. Again, we'd need to know more about what and how you are trying to accomplish.
Agreed. A CLR user-defined function would be the best approach for what you're doing. You could read the text values into an object, join them all together (inside the CLR), and have the function return an NVARCHAR(MAX) result. If you need details on how to do this, let me know.
Within our business rules, we need to track when a row is designated as being changed. The table contains multiple columns designated as non-relevant per our business purposes (such as a date entered field, timestamp, reviewed bit field, or received bit field). The table has many columns and I'm trying to find an elegant way to determine if any of the relevant fields have changed and then record an entry in an auditing table (entering the PK value of the row - the PK cannot be edited). I don't even need to know which column actually changed (although it would be nice down the road).
I am able to accomplish it through a stored procedure, but it is an ugly SP using the following syntax for an update (OR statements shortened considerably for post):
INSERT INTO [TblSourceDataChange] (pkValue)
SELECT d.pkValue
FROM deleted d INNER JOIN inserted i ON d.pkValue=i.pkValue
WHERE ( i.[F440] <> d.[F440]
OR i.[F445] <> d.[F445]
OR i.[F450] <> d.[F450])
I'm trying to find a generic way to designate the ignored fields so that the stored proc would still work even if I added additional relevant fields to the table. The non-relevant fields do not change very often, whereas the relevant fields tend to be a little more dynamic.
Have a look at Change Data Capture. This is a new feature in SQL Server 2008.
First You enable CDC on the database:
EXEC sys.sp_cdc_enable_db
Then you can enable it on specific tables, and specify which columns to track:
EXEC sys.sp_cdc_enable_table
@source_schema = 'dbo',
@source_name = 'xxx',
@supports_net_changes = 1,
@role_name = NULL,
@captured_column_list = N'xxx1,xxx2,xxx3'
This creates a change table named cdc.dbo_xxx. Any changes made to records in the table are recorded in that table.
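Reading the captured changes back out might look roughly like this; the table-valued function name is generated from the capture instance (dbo_xxx in this example):
-- Pull every change captured so far for dbo.xxx.
DECLARE @from_lsn binary(10), @to_lsn binary(10);
SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_xxx');
SET @to_lsn = sys.fn_cdc_get_max_lsn();
SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_xxx(@from_lsn, @to_lsn, N'all');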
I object! The one word I cannot use to describe the options available is elegant. I have yet to find a satisfying way to accomplish what you want. There are options, but all of them feel a bit unsatisfactory. When and why you'd choose each of them depends on some factors you didn't mention.
How often do you need to "ask" which fields changed? Meaning, do users only occasionally click the "audit history" link, or is this needed all the time to sort out how your app should behave?
How much does disk space cost you? I'm not being flippant, but I've worked places where the storage strategy for our auditing was a million-dollar issue based on what we were being charged for SAN space -- meaning that being expensive for SQL Server to reconstitute wasn't a consideration, storage size was. You may be in the same boat, or the inverse.
Change Data Capture
As @TGnat mentioned, you can use CDC. This method is great because you simply enable change tracking, then call the sproc to start tracking. CDC is nice because it's pretty efficient storage- and horsepower-wise. You also kind of set it and forget it -- that is, until developers come along and want to change the shape of your tables. For developer sanity you'll want to generate a script that disables/enables tracking for your entities.
I noticed you want to exclude certain columns rather than include them. You could accomplish this with a FOR XML PATH trick. You could write a query something like the following, then use the @capturedColList variable when calling sys.sp_cdc_enable_table:
SET @capturedColList = Substring((
    SELECT ',' + COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = '<YOUR_TABLE>' AND
          COLUMN_NAME NOT IN ('excludedA', 'excludedB')
    FOR XML PATH( '' )
    ), 2, 8000)
Triggers w/Cases
The second option I see is some sort of code generation. It could be an external harness or a SPROC that writes your triggers. Whatever your poison, it will need to be automated and generic. Basically, you write code that writes DDL for triggers that compare current to INSERTED or DELETED using tons of unwieldy CASE statements for each column.
There is a discussion of the style here.
Log Everything, Sort it out later
The last option is to use a trigger to log every row change. Then you write code (SPROCs/UDFs) that can look through your log data and recognize when a change has occurred. Why would you choose this option? Because disk space isn't a concern, and while you need to be able to understand what changed, you only rarely ask the system this question.
HTH,
-eric
Use a trigger and make sure it can handle multiple row inserts.
I found the answer in the post SQL Server Update, Get only modified fields, and adapted the SQL to fit my needs (this SQL is in a trigger). The SQL is posted below:
DECLARE @idTable INT
SELECT @idTable = T.id
FROM sysobjects P JOIN sysobjects T ON P.parent_obj = T.id
WHERE P.id = @@PROCID
IF EXISTS
(SELECT * FROM syscolumns WHERE id = @idTable
AND CONVERT(VARBINARY, REVERSE(COLUMNS_UPDATED())) & POWER(CONVERT(BIGINT, 2), colorder - 1) > 0 AND name NOT IN ('timestamp','Reviewed')
)
BEGIN
--Do appropriate stuff here
END