SQL lazy update operations

SQL lazy update operations - sql

We insert SQL entities into a table, one by one. It's easy and fast. After the entity insert, we are executing an SP to updates several tables according to the new entity, update some calculated fields, some lookup tables to help to find this new entity. This takes a lot of time and sometimes ends up in a deadlock state.
Inserting the main entity must be fast and reliable, updating the additional tables is not important to happen immediately. I was wondering (I am not a DB expert) if there is an SQL methodology similar to the thread handling in C#, to maintain an update thread, which can be awakened when a new entity arrives to update the additional tables after the insertion. This thread can update these tables in "one thread" to avoid deadlock.
I can imagine an sql job which executes every minute, searches for new entities and executes the updates, but it seems too rough to me.
What is the best practice to implement this on MS SQL side?

There are a number of ways you could achieve this. You mention that the two can be done separately - immediate updating is not important. In that case, you could set up a SQL Agent to run a stored procedure that checks for missing records and performs the update.
Another approach would be to put the entire original update inside a stored procedure responsible for performing the update and all the housekeeping work, then all you would do is call the stored procedure with the right parameters and it would do all the work behind the curtain.
Another way would be to add triggers on the inserted table to do the update for you. Sounds like the first is what you probably want.

Related

DB2: Working with concurrent DDL operations

We are working on a data warehouse using IBM DB2 and we wanted to load data by partition exchange. That means we prepare a temporary table with the data we want to load into the target table and then use that entire table as a data partition in the target table. If there was previous data we just discard the old partition.
Basically you just do "ALTER TABLE target_table ATTACH PARTITION pname [starting and ending clauses] FROM temp_table".
It works wonderfully, but only for one operation at a time. If we do multiple loads in parallel or try to attach multiple partitions to the same table it's raining deadlock errors from the database.
From what I understand, the problem isn't necessarily with parallel access to the target table itself (locking it changes nothing), but accesses to system catalog tables in the background.
I have combed through the DB2 documentation but the only reference to the topic of concurrent DDL statements I found at all was to avoid doing them. The answer to this question, can't be to simply not attempt it?
Does anyone know a way to deal with this problem?
I tried to have a global, single synchronization table to lock if you want to attach any partitions, but it didn't help either. Either I'm missing something (implicit commits somewhere?) or some of the data catalog updates even happen asynchronously, which makes the whole problem much worse. If that is the case, is there are any chance at all to query if the attach is safe to perform at any given moment?

T-SQL: Trigger that runs right before the end of a modifying transaction

Problem statement
I have a view for recursively collecting and aggregating infos from 3 different large to very large tables. This view itself needs quite a time to execute but is needed in many select statements and is executed quite often.
The resulting view, however, is very small (a few dozend results in 2 columns).
All updating actions typically start a transaction, execute many thousand INSERTs and then commit the transaction. This does not occur very frequently, but if something is written to the database it is usually a large amount of data.
What I tried
As the view is small, does not change frequently and is read often, I thought of creating an indexed view. However, sadly you can not create an indexed view with CTEs or even recursive CTEs.
To 'emulate' a indexed or materialized view, I thought about writing a trigger that executes the view and stores the results into a table every time one of the base tables get modified. However, I guess this would take forever if a large amout of entries are UPDATEed or INSERTed and the trigger runs for each INSERT/UPDATE statement on those tables, even if they are inside a single transaction.
Actual question
Is it possible to write a trigger that runs once before commiting and after the last insert/update statement of a transaction has finished and only if any of the statements has changed any of the three tables?

No, there's no direct way to make a trigger that runs right before the end of a transaction. DML Triggers run once per triggering DML statement (INSERT, UPDATE, DELETE), and there's no other kind of trigger related to data modification.
Indirectly, you could have all of your INSERT's insert into a temporary table and then INSERT them all together from the #temp table into the real table, resulting in one trigger firing for that table. But if you are writing to multiple tables, you would still have the same problem.
The SOP (Standard Operating Practice) way to address this is to have a stored procedure handle everything up front instead of a Trigger trying to catch everything on the back-side.
If data consistency is important, then I'd recommend that you follow the SOP approach based in a stored procedure that I mentioned above. Here's a hi-level outline of this approach:
Use a stored procedure that dumps all of the changes into #temp tables first,
then start a transaction,
then make the changes, moving data/changes from your #temp table(s) into your actual tables,
then do the follow-up work you wanted in a trigger. If these are consistency checks, then if they fail, you should rollback the transaction.
Otherwise, it then commits the transaction.
This is almost always how something like this is done correctly.

If your view is small and queried frequently and your underline tables are rarely changed, you don't need a "view". Instead you need a summary table with the same result of the view and updated by your triggers on each underline table.
A trigger is triggered every time you have data modification (insert, delete and update) but one modification will only trigger once, whether it updates one record or one million rows. You don't need worry about the size of update. Instead the frequency of updating is your concern.
If your have a procedure periodically insert large number of rows, or updates large number of rows one by one, you can change the procedure and disable the triggers before the update so the summary table will be updated only before the end of procedure, where you can call the same "sum" procedure and enable those triggers.
If you HAVE TO keep the "summary" up-to-date all the time, even during large number of transactions (i doubt it's very helpful or practical, if your view is slow to execute), you can disable those triggers, do some calculation by yourself after each transaction, update the summary table after each transaction, in your procedure.

Is it better to do database operations from an sql script or from application code?

Consider the following abstract situation (just as an example):
I have two tables TableA and TableB. They have unique IDs and possibly other columns (which are irrelevant) The relatioship between them is many to many so I have a third table AssociationTable that is used to store the relationships between them. Basically, AssociationTable will have two columns (ID_A and ID_B - foreign keys).
If I delete a row in AssociationTable and the ID_A that was deleted was the last one, I would also like to delete the entry from TableA that corresponds to that ID.
I could do this:
a) From the application that uses the database
b) by using an SQL trigger
My question, basically, is the following:
Is there any good practice that says "if you can do something from both the application and from SQL, always prefer sql." ?
Or does it depend on the case? If so, what should I take into account?

Performance: The query plan for stored procedures is compiled onn DB Server and subsequent requests can run faster.
A stored procedure can execute multiple steps and the intermediate results need not go back to application layer, reducing traffic between an application and the DB server.
Security: Stored procedures are well defined database objects that can be locked down with security measures. Use of typed parameters can help prevent SQL injection attacks.
Code re-use: SQL queries can be written once and re-used across multiple clients without writing the same SQL commands over and over again.
Abstraction: By putting all the SQL code into a stored procedure, the application is completely abstracted from the field names, tables names, etc. So when a SQL query needs to be changed, there is almost zero or NO impact in the application code.

There are more benefits of doing it in the database.
Other client application code need not worry about data integrity.
The data logic should remain as close to data as possible
It could be faster if managed by DB (trigger invocation).

What is the best way to maintain a LastUpdatedDate column in SQL?

Suppose I have a database table that has a timedate column of the last time it was updated or inserted. Which would be preferable:
Have a trigger update the field.
Have the program that's doing the insertion/update set the field.
The first option seems to be the easiest since I don't even have to recompile to do it, but that's not really a huge deal. Other than that, I'm having trouble thinking of any reasons to do one over the other. Any suggestions?

The first option can be more robust because the database will be maintaining the field. This comes with the possible overhead of using triggers.
If you could have other apps writing to this table in the future, via their own interfaces, I'd go with a trigger so you're not repeating that logic anywhere else.
If your app is pretty much it, or any other apps would access the database through the same datalayer, then I'd avoid that nightmare that triggers can induce and put the logic directly in your datalayer (SQL, ORM, stored procs, etc.).
Of course you'd have to make sure your time-source (your app, your users' pcs, your SQL server) is accurate in either case.
Regarding why I don't like triggers:
Perhaps I was rash by calling them a nightmare. Like everything else, they are appropriate in moderation. If you use them for very simple things like this, I could get on board.
It's when the trigger code gets complex (and expensive) that triggers start to cause lots of problems. They are a hidden tax on every insert/update/delete query you execute (depending on the type of trigger). If that tax is acceptable then they can be the right tool for the job.

You didn't mention 3. Use a stored procedure to update the table. The procedure can set timestamps as desired.
Perhaps that's not feasible for you, but I didn't see it mentioned.

As long as I'm using a DBMS in whose triggers I trust, I'd always go with the trigger option. It allows the DBMS to take care of as many things as possible, which is usually a good thing.
It work make sure under any circumstances that the timestamp column has the correct value. The overhead would be negligible.
The only thing that would be against triggers is portability. If that's not an issue, I don't think there is a question which direction to go.

I would say trigger just in case that someone uses something besides your app to update the table, you probably also want to have a LastUpdatedBy and use SUSER_SNAME() for that, this way you can see who did the update

I'm a proponent of stored procedures for everything. Your update proc could contain a GETDATE() for the column.
And I don't like triggers for this kind of update. Lack of visibility of triggers tends to cause confusion.

This sounds like business logic to me ... I would be more disposed to putting this in the code. Let the database manage the storage of data ... No more and no less.

Triggers are a blessing and a curse.
Blessing: You can use them to enable all kinds of custom constraint checking and data management without backend systems knowledge or changes.
Curse: You don't know whats happening behind your back. Concurrency issues/deadlocks by additional objects brought into transactions that were not origionally expected. Phantom behavior including session environment changes, unreliable rowcounts. Excessive triggering of conditions..additional hotspot/performance penalties.
The answer to this question (Update dates implicitly(trigger) or explicitly (code)) ususally weights heavily on context. For example if you are using last change date as an informational field you might want to only change it when a 'user' actually makes salient changes to a row vs an automated process that simply updates some sort of internal marker users don't care about.
If you are using the trigger for change synchronization or you have no control over code that is executing a trigger makes a lot more sense.
My advise on trigger use it to be careful. Most systems allow you to filter execution based on the operation and fields changed. Proper use of 'before' vs 'after' triggers can have a significant performance impacts.
Finally a few systems are capable of executing a single trigger on multiple changes (multiple rows effected within a transaction) your code should be prepared to apply itself as a bulk update to multiple rows.

Normally I'd say do it database side, but it depends on your application. If you're using LINQ-to-SQL you can just set the field as Timestamp and have your DAL use the Timestamp field for concurrency. It handles it for you automatically, so having to repeat code is a non event.
If you're writing your DAL yourself though, then I'd be more likely to handle this on the database side as it makes writing user interfaces far more flexible - although, I'd likely do this in a stored procedure that has "public" access and the tables locked down - you don't want just any clown coming along and bypassing your stored procedure by writing to the tables directly... unless you plan on making your DAL a standalone component that any future application must use to access the database, in which case, you could code it directly into the DAL - of course, you should only do this if you can guarantee that everyone accessing the database is doing so through your DAL component.
If you're going to allow "public" access to the database to insert into tables, then you'll have to go with the trigger because otherwise anyone can insert/update a single field in the table and the updated field could never get updated.

I would have the date maintained at the database, i.e., a trigger, stored procedure, etc. In most of your database-driven applications the user app is not going to be the only means by which the business users get at data. There are reporting tools, extracts, user SQL, etc. There's also updates and corrections that are done by the DBA that the application won't be providing the date for as well.
But honestly the #1 reason I wouldn't do it from the application is you have no control over the date/time on the client machine. They might be rolling it back to get more days out of a trial license on something or may just want to do bad things to your program.

You can do this without the trigger if your database supports default values on the fields. For example, in SQL Server 2005 I have a table with a field created like this:
create table dbo.Repository
(
...
last_updated datetime default getdate(),
...
)
then the insert code just leaves that field out of the insert field list.
I forgot that only worked for the first insert - I do have an update trigger as well, to update the date fields and put a copy of the updated record in my history table - which I would post ... but the editor keeps erroring out on my code ...
Finally:
create trigger dbo.Repository_Upd on dbo.Repository instead of update
as
--**************************************************************************
-- Trigger: Repository_Upd
-- Author: Ron Savage
-- Date: 09/28/2008
--
-- Description:
-- This trigger sets the last_updated and updated_by fields before the update
-- and puts a copy of the updated row into the Repository_History table.
--
-- Modification History:
-- Date Init Comment
-- 10/22/2008 RS Blocked .prm files from updating the history as they
-- get updated every time the cfg file is run.
-- 10/21/2008 RS Updated the insert into the history table to use the
-- d.last_updated field from the Repository table rather
-- than getdate() to avoid micro second differences.
-- 09/28/2008 RS Created.
--**************************************************************************
begin
--***********************************************************************
-- Update the record but fill in the updated_by, updated_system and
-- last_updated date with current information.
--***********************************************************************
update cr set
cr.filename = i.filename,
cr.created_by = i.created_by,
cr.created_system = i.created_system,
cr.create_date = i.create_date,
cr.updated_by = user,
cr.updated_system = host_name(),
cr.last_updated = getdate(),
cr.content = i.content
from
Repository cr
JOIN Inserted i
on (i.config_id = cr.config_id);
--***********************************************************************
-- Put a copy in the history table
--***********************************************************************
declare #extention varchar(3);
select #extention = lower(right(filename,3)) from Inserted;
if (#extention <> 'prm')
begin
Insert into Repository_History
select
i.config_id,
i.filename,
i.created_by,
i.created_system,
i.create_date,
user as updated_by,
host_name() as updated_system,
d.last_updated,
d.content
from
Inserted i
JOIN Repository d
on (d.config_id = i.config_id);
end
end
Ron

CREATE TRIGGER is taking more than 30 minutes on SQL Server 2005

On our live/production database I'm trying to add a trigger to a table, but have been unsuccessful. I have tried a few times, but it has taken more than 30 minutes for the create trigger statement to complete and I've cancelled it.
The table is one that gets read/written to often by a couple different processes. I have disabled the scheduled jobs that update the table and attempted at times when there is less activity on the table, but I'm not able to stop everything that accesses the table.
I do not believe there is a problem with the create trigger statement itself. The create trigger statement was successful and quick in a test environment, and the trigger works correctly when rows are inserted/updated to the table. Although when I created the trigger on the test database there was no load on the table and it had considerably less rows, which is different than on the live/production database (100 vs. 13,000,000+).
Here is the create trigger statement that I'm trying to run
CREATE TRIGGER [OnItem_Updated]
ON [Item]
AFTER UPDATE
AS
BEGIN
SET NOCOUNT ON;
IF update(State)
BEGIN
/* do some stuff including for each row updated call a stored
procedure that increments a value in table based on the
UserId of the updated row */
END
END
Can there be issues with creating a trigger on a table while rows are being updated or if it has many rows?
In SQLServer triggers are created enabled by default. Is it possible to create the trigger disabled by default?
Any other ideas?

The problem may not be in the table itself, but in the system tables that have to be updated in order to create the trigger. If you're doing any other kind of DDL as part of your normal processes they could be holding it up.
Use sp_who to find out where the block is coming from then investigate from there.

I believe the CREATE Trigger will attempt to put a lock on the entire table.
If you have a lots of activity on that table it might have to wait a long time and you could be creating a deadlock.
For any schema changes you should really get everyone of the database.
That said it is tempting to put in "small" changes with active connections. You should take a look at the locks / connections to see where the lock contention is.

That's odd. An AFTER UPDATE trigger shouldn't need to check existing rows in the table. I suppose it's possible that you aren't able to obtain a lock on the table to add the trigger.
You might try creating a trigger that basically does nothing. If you can't create that, then it's a locking issue. If you can, then you could disable that trigger, add your intended code to the body, and enable it. (I do not believe you can disable a trigger during creation.)

Part of the problem may also be the trigger itself. Could your trigger accidentally be updating all rows of the table? There is a big differnce between 100 rows in a test database and 13,000,000. It is a very bad idea to develop code against such a small set when you have such a large dataset as you can have no way to predict performance. SQL that works fine for 100 records can completely lock up a system with millions for hours. You really want to know that in dev, not when you promote to prod.
Calling a stored proc in a trigger is usually a very bad choice. It also means that you have to loop through records which is an even worse choice in a trigger. Triggers must alawys account for multiple record inserts/updates or deletes. If someone inserts 100,000 rows (not unlikely if you have 13,000,000 records), then looping through a record based stored proc could take hours, lock the entire table and cause all users to want to hunt down the developer and kill (or at least maim) him because they cannot get their work done.
I would not even consider putting this trigger on prod until you test against a record set simliar in size to prod.
My friend Dennis wrote this article that illustrates why testing a small volumn of information when you have a large volumn of information can create difficulties on prd that you didn't notice on dev:
http://blogs.lessthandot.com/index.php/DataMgmt/?blog=3&title=your-testbed-has-to-have-the-same-volume&disp=single&more=1&c=1&tb=1&pb=1#c1210

Run DISABLE TRIGGER triggername ON tablename before altering the trigger, then reenable it with ENABLE TRIGGER triggername ON tablename

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas