I am working in SQL Server and here is the scenario.
Below is the workflow:
We get data from upstream every 10-15 seconds. Upstream inserts directly into our table - this is our staging table.
We wrote an AFTER INSERT trigger on this staging table which moves data from the staging table to the master table. The master table also has an AFTER INSERT trigger, which takes about 23 seconds to complete.
We are not sure when the final batch of data will be submitted by upstream.
Problem area where a solution is needed
Upstream pushes data into our staging table every 10 seconds or so. Once the trigger on the staging table starts, it waits until the trigger on the master table has completed. If upstream sends data during that execution window of about 24 seconds, the staging table is locked and the newly received data is dropped because of the locks.
If there is any potential workaround, please let us know.
As Dan Guzman suggested, you could just get rid of the staging table and process everything using a stored procedure.
A better, though trickier, approach would be to use a message queue instead of a staging table, with an activation procedure that deals with the rows based on your business logic.
This could be a better-suited and less blocking way to handle streaming data.
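In SQL Server the built-in way to do that is Service Broker. A minimal sketch is below; the message type, queue, service, and procedure names are all made up, and the activation procedure body is just a placeholder for your business logic (a production version would normally loop with WAITFOR (RECEIVE ...)):

-- Service Broker has to be enabled on the database first:
-- ALTER DATABASE YourDb SET ENABLE_BROKER;

CREATE MESSAGE TYPE StagingRowMessage VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT StagingRowContract (StagingRowMessage SENT BY INITIATOR);
CREATE QUEUE StagingQueue;
CREATE SERVICE StagingService ON QUEUE StagingQueue (StagingRowContract);
GO

-- Activation procedure: drains the queue asynchronously, so the sender
-- returns immediately instead of waiting on the 23-second master-table work.
CREATE PROCEDURE dbo.ProcessStagingQueue
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @handle UNIQUEIDENTIFIER, @body XML;

    RECEIVE TOP (1)
        @handle = conversation_handle,
        @body   = CAST(message_body AS XML)
    FROM StagingQueue;

    IF @handle IS NOT NULL
    BEGIN
        -- apply your business logic / master-table insert using @body here
        END CONVERSATION @handle;
    END
END
GO

ALTER QUEUE StagingQueue WITH ACTIVATION (
    STATUS = ON,
    PROCEDURE_NAME = dbo.ProcessStagingQueue,
    MAX_QUEUE_READERS = 1,
    EXECUTE AS OWNER);

The upstream insert (or a very thin trigger on the staging table) would then just SEND ON CONVERSATION to the service and return, instead of doing the heavy work inline.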
Hi, I am running an ETL process via Python.
I have a simple SQL file that I run from Python, like:
truncate table foo_stg;
insert into foo_stg
(
select blah,blah .... from tables
);
truncate table foo;
insert into foo
(
select * from foo_stg
);
This query sometimes takes a lock on the table which it does not release.
Because of this, other processes get queued behind it.
Currently I check which table has the lock and kill the process that caused it.
I want to know what changes I can make in my code to mitigate such issues.
Thanks in advance!
The TRUNCATE is probably breaking your transaction logic. Recommend doing all truncates upfront. I'd also recommend adding some processing logic to ensure that each instance of the ETL process either: A) has exclusive access to the staging tables or B) uses a separate set of staging tables.
TRUNCATE in Redshift (and many other DBs) does an implicit COMMIT.
…be aware that TRUNCATE commits the transaction in which it is run.
Redshift tries to make this clear by returning the following INFO message to confirm success: TRUNCATE TABLE and COMMIT TRANSACTION. However, this INFO message may not be displayed by the SQL client tool. Run the SQL in psql to see it.
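So one option, sketched roughly below, is to move both truncates to the front of the script and keep only the inserts inside an explicit transaction (the first SELECT is a placeholder for your real query):

TRUNCATE TABLE foo_stg;   -- each TRUNCATE commits on its own in Redshift
TRUNCATE TABLE foo;

BEGIN;
INSERT INTO foo_stg
SELECT blah, blah FROM tables;   -- placeholder for the real query
INSERT INTO foo
SELECT * FROM foo_stg;
COMMIT;

If emptying and reloading foo has to stay atomic, DELETE FROM foo inside the transaction is the transactional (if slower) alternative to the second TRUNCATE.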
In my case, I created the table for the first time and tried to load it from the stage table using insert into the table select c1,c2,c3 from stage; I am running this from a Python script.
The table locks and the data does not load. Another interesting scenario: when I run the same insert SQL from the editor, it loads, and after that my Python script loads the same table without any locks. The lock only happens the first time. Not sure what the issue is.
We use a DB2 database. Some data warehouse tables are TRUNCATEd and reloaded every day. We run into deadlock issues when another process is running an INSERT statement against that same table.
Scenario
TRUNCATE is executed on a table.
At the same time, another process INSERTs some data into the same table. (The process is based on a trigger and can start at any time.)
Is there a workaround?
What we have thought of so far is to prioritize the truncate and then go through with the insert. Is there any way to implement this? Any help would be appreciated.
You should request a table lock before you execute the truncate.
If you do this you can't get a deadlock: the table lock won't be granted until the insert finishes, and once you have the lock another insert can't occur.
Update from comment:
You can use the LOCK TABLE command. The details depend on your situation, but you should be able to get away with SHARE mode. This will allow reads but not inserts (which is the issue you are having, I believe).
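A rough sketch of that in DB2 (the table names are made up; note that some DB2 versions require TRUNCATE to be the first statement in its unit of work, so inside a lock-protected transaction a DELETE is the safer way to empty the table):

LOCK TABLE dw.sales_fact IN SHARE MODE;                   -- blocks the trigger-driven INSERTs, still allows reads
DELETE FROM dw.sales_fact;                                -- transactional stand-in for TRUNCATE here
INSERT INTO dw.sales_fact SELECT * FROM dw.sales_stage;   -- placeholder for the real reload
COMMIT;                                                   -- releases the table lock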
It is possible this won't fix your problem. That probably means your insert statement is too complicated - maybe it is reading from a bunch of other tables or from a federated table. If this is the case, re-architect your solution to include a staging table (first insert into the staging table... slowly... then insert into the target table from the staging table).
I have a PC that has a MSSQL database with 800 variables being populated every second. I need that database to merge/backup to a second database on another server PC at least every 10 minutes. Additionally, the first database needs to be wiped clean once per week, in order to save local drive space, so that only 1 week's worth of data is stored on that first database at any given time; meanwhile, the second database keeps everything intact and never gets cleared, only being added upon by the merges that occur every 10 minutes.
To my knowledge, this means I cannot rely on database mirroring, since the first one will be wiped every week. So from what I have gathered, this means I have to have scheduled merges going on every 10 minutes.
I will readily admit I know next to nothing about SQL. So my two questions are:
How do I set up scheduled merges to occur from one database to another in 10 minute frequencies?
How do I set a database to be scheduled/scripted so that it gets cleared every week?
(Note: both databases are running on MS SQL Server 2012 Standard.)
Assuming you can create a linked server on server A that connects to server B (here's a guide), then create a trigger on your table, for example table1:
CREATE TRIGGER trigger1
ON table1
AFTER INSERT
AS
INSERT INTO ServerB.databaseB.dbo.table1
SELECT *
FROM inserted;
More on triggers here.
For part 2, you can schedule a job to truncate the table on whatever schedule you would like. How to create a scheduled job.
The trigger only fires on Inserts so deleting rows does nothing to the table on server B.
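For part 2, a rough sketch of the weekly cleanup using SQL Server Agent's msdb procedures is below; the job, schedule, and database names are made up, and the schedule shown is Sunday at 01:00:

USE msdb;
GO
EXEC sp_add_job      @job_name = N'WeeklyWipeTable1';
EXEC sp_add_jobstep  @job_name = N'WeeklyWipeTable1',
                     @step_name = N'Truncate table1 on server A',
                     @subsystem = N'TSQL',
                     @database_name = N'databaseA',
                     @command = N'TRUNCATE TABLE dbo.table1;';
EXEC sp_add_schedule @schedule_name = N'WeeklySunday0100',
                     @freq_type = 8,               -- weekly
                     @freq_interval = 1,           -- Sunday
                     @freq_recurrence_factor = 1,
                     @active_start_time = 010000;  -- 01:00:00
EXEC sp_attach_schedule @job_name = N'WeeklyWipeTable1',
                        @schedule_name = N'WeeklySunday0100';
EXEC sp_add_jobserver   @job_name = N'WeeklyWipeTable1';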
How is the purging/deleting of the data happening - via a stored proc? If so, you could also try transactional replication and replicate the execution of that particular stored proc, but make the proc a dummy on the subscriber, so that when the proc is replicated and executed on the subscriber, nothing gets deleted/purged.
I am working with a transaction table where I store tasks and their status. Now I am building a dashboard where I need to show how many tasks are in each status.
I thought of having staging tables corresponding to each status so that the dashboard won't affect any transactional activity. To push the data into the staging tables I have two options:
Have a trigger on the transactional table so that on each status update the staging table is updated.
Have a SQL job that runs every 5 minutes to update the data in the staging table (a sketch of this option follows).
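A minimal sketch of what the job in option 2 could run every 5 minutes; dbo.TaskTransaction, dbo.TaskStatusCounts, and the column names are made up:

BEGIN TRAN;
DELETE FROM dbo.TaskStatusCounts;
INSERT INTO dbo.TaskStatusCounts (Status, TaskCount)
SELECT Status, COUNT(*)
FROM dbo.TaskTransaction
GROUP BY Status;
COMMIT;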
Please suggest which way to go.
Thanks
On our live/production database I'm trying to add a trigger to a table, but have been unsuccessful. I have tried a few times, but it has taken more than 30 minutes for the create trigger statement to complete and I've cancelled it.
The table is one that gets read/written to often by a couple different processes. I have disabled the scheduled jobs that update the table and attempted at times when there is less activity on the table, but I'm not able to stop everything that accesses the table.
I do not believe there is a problem with the create trigger statement itself. The create trigger statement was successful and quick in a test environment, and the trigger works correctly when rows are inserted/updated to the table. Although when I created the trigger on the test database there was no load on the table and it had considerably less rows, which is different than on the live/production database (100 vs. 13,000,000+).
Here is the CREATE TRIGGER statement that I'm trying to run:
CREATE TRIGGER [OnItem_Updated]
ON [Item]
AFTER UPDATE
AS
BEGIN
SET NOCOUNT ON;
IF update(State)
BEGIN
/* do some stuff including for each row updated call a stored
procedure that increments a value in table based on the
UserId of the updated row */
END
END
Can there be issues with creating a trigger on a table while rows are being updated or if it has many rows?
In SQL Server, triggers are created enabled by default. Is it possible to create the trigger disabled by default?
Any other ideas?
The problem may not be in the table itself, but in the system tables that have to be updated in order to create the trigger. If you're doing any other kind of DDL as part of your normal processes they could be holding it up.
Use sp_who to find out where the block is coming from, then investigate from there.
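For example (sp_who's blk column shows the blocking spid; the DMV query is a newer alternative on SQL Server 2005 and later):

EXEC sp_who;   -- look at the blk column for the blocking spid

SELECT session_id, blocking_session_id, wait_type, wait_resource
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;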
I believe the CREATE TRIGGER statement will attempt to put a lock on the entire table.
If you have a lot of activity on that table, it might have to wait a long time, and you could be creating a deadlock.
For any schema changes you should really get everyone off the database.
That said, it is tempting to put in "small" changes while there are active connections. You should take a look at the locks/connections to see where the lock contention is.
That's odd. An AFTER UPDATE trigger shouldn't need to check existing rows in the table. I suppose it's possible that you aren't able to obtain a lock on the table to add the trigger.
You might try creating a trigger that basically does nothing. If you can't create that, then it's a locking issue. If you can, then you could disable that trigger, add your intended code to the body, and enable it. (I do not believe you can disable a trigger during creation.)
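Roughly, that approach would look like this, using the names from the question (the empty body is only there to test whether the DDL can get its lock):

CREATE TRIGGER [OnItem_Updated]
ON [Item]
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;   -- intentionally does nothing
END
GO
DISABLE TRIGGER [OnItem_Updated] ON [Item];
GO
-- later: ALTER TRIGGER [OnItem_Updated] ... with the real body, then
-- ENABLE TRIGGER [OnItem_Updated] ON [Item];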
Part of the problem may also be the trigger itself. Could your trigger accidentally be updating all rows of the table? There is a big difference between 100 rows in a test database and 13,000,000. It is a very bad idea to develop code against such a small data set when you have such a large dataset, as you have no way to predict performance. SQL that works fine for 100 records can completely lock up a system with millions of them for hours. You really want to find that out in dev, not when you promote to prod.
Calling a stored proc in a trigger is usually a very bad choice. It also means that you have to loop through records, which is an even worse choice in a trigger. Triggers must always account for multiple-record inserts, updates, or deletes. If someone inserts 100,000 rows (not unlikely if you have 13,000,000 records), then looping through a row-by-row stored proc could take hours, lock the entire table, and make every user want to hunt down the developer and kill (or at least maim) him because they cannot get their work done.
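Purely as an illustration, a set-based version might look something like the sketch below; dbo.UserCounters and its columns are made up, standing in for whatever table the stored proc increments:

CREATE TRIGGER [OnItem_Updated]
ON [Item]
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    IF UPDATE(State)
    BEGIN
        -- one set-based statement instead of a per-row stored procedure call
        UPDATE c
        SET c.CounterValue = c.CounterValue + i.UpdatedRows
        FROM dbo.UserCounters AS c
        JOIN (SELECT UserId, COUNT(*) AS UpdatedRows
              FROM inserted
              GROUP BY UserId) AS i
            ON i.UserId = c.UserId;
    END
END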
I would not even consider putting this trigger on prod until you have tested it against a record set similar in size to prod.
My friend Dennis wrote this article that illustrates why testing against a small volume of information when you have a large volume of information can create difficulties on prod that you didn't notice in dev:
http://blogs.lessthandot.com/index.php/DataMgmt/?blog=3&title=your-testbed-has-to-have-the-same-volume&disp=single&more=1&c=1&tb=1&pb=1#c1210
Run DISABLE TRIGGER triggername ON tablename before altering the trigger, then re-enable it with ENABLE TRIGGER triggername ON tablename.