Thanks in advance for any help. Here is the scenario that I am trying to recreate in MuleSoft.
There are 1,500,000 records in a table. Here is the current process that we use:
Start a transaction.
Delete all records from the table.
Reload the table from a flat file.
Commit the transaction.
In the end we need the table to be in a good state, hence the use of the transaction. If there is any failure, the data in the table will be rolled back to the initial valid state.
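In SQL terms the current process is roughly this (a sketch only, using SQL Server syntax as an example; the table and file names are placeholders):

    BEGIN TRANSACTION;
        DELETE FROM dbo.inventory;
        -- reload from the flat file
        BULK INSERT dbo.inventory
            FROM 'C:\data\inventory.csv'
            WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');
    COMMIT TRANSACTION;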
I was able to get the speed that we needed (under 10 minutes) by using the Batch element, but it appears that transactions are not supported around the whole batch flow.
Any ideas how I could get this to work in Mulesoft?
Thanks again.
A slightly different workflow, but how about:
Load the temp table from the flat file.
If successful, drop the original table.
Rename the temp table to the original table name.
You can keep your Mule batch processing workflow to load the temp table and forget about rolling back.
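Roughly, the swap step could look like this (a sketch only, assuming a SQL Server-style database; the table names are placeholders, and other databases have their own rename syntax):

    BEGIN TRANSACTION;
        DROP TABLE dbo.target_table;
        -- rename the freshly loaded temp table into place
        EXEC sp_rename 'dbo.target_table_temp', 'target_table';
    COMMIT TRANSACTION;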
For this you might try the following:
Use XA transactions (since more than one connector will be used, regardless of whether the same transport is used or not).
Enlist the resource used in the custom Java code in the transaction.
This can also be applied within the same transport (e.g. JDBC in the Mule configuration and also in the Java component), so it is not restricted to the case demonstrated in the PoC, which is only given as a reference.
Please refer to this article https://dzone.com/articles/passing-java-arrays-in-oracle-stored-procedure-fro
Poll the records from the temp table. You can construct an array with any number of records; with a size of 100K, the 1.5 million rows will only involve 15 round trips in total.
To identify error records you can insert them into an error table, but that has to be implemented in the database procedure.
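On the database side, the procedure could look roughly like this (a sketch only, assuming Oracle; the collection type, table and procedure names are hypothetical):

    -- hypothetical collection type used to pass the records in bulk
    CREATE OR REPLACE TYPE id_array AS TABLE OF NUMBER;
    /
    CREATE OR REPLACE PROCEDURE load_batch (p_ids IN id_array) AS
      dml_errors EXCEPTION;
      PRAGMA EXCEPTION_INIT(dml_errors, -24381);
    BEGIN
      -- insert the whole array in one round trip, collecting failures
      FORALL i IN 1 .. p_ids.COUNT SAVE EXCEPTIONS
        INSERT INTO target_table (id) VALUES (p_ids(i));
    EXCEPTION
      WHEN dml_errors THEN
        -- record each failed row in an error table instead of failing the load
        FOR i IN 1 .. SQL%BULK_EXCEPTIONS.COUNT LOOP
          INSERT INTO load_errors (failed_index, error_code)
          VALUES (SQL%BULK_EXCEPTIONS(i).ERROR_INDEX,
                  SQL%BULK_EXCEPTIONS(i).ERROR_CODE);
        END LOOP;
    END;
    /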
We are working on a data warehouse using IBM DB2 and we wanted to load data by partition exchange. That means we prepare a temporary table with the data we want to load into the target table and then use that entire table as a data partition in the target table. If there was previous data we just discard the old partition.
Basically you just do "ALTER TABLE target_table ATTACH PARTITION pname [starting and ending clauses] FROM temp_table".
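Concretely, the statement we run looks roughly like this (the partition name and range bounds here are just placeholders):

    ALTER TABLE target_table
      ATTACH PARTITION pname
      STARTING FROM (1) ENDING AT (1000)
      FROM temp_table;
    -- the attached partition stays in SET INTEGRITY PENDING state until validated
    SET INTEGRITY FOR target_table IMMEDIATE CHECKED;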
It works wonderfully, but only for one operation at a time. If we do multiple loads in parallel or try to attach multiple partitions to the same table it's raining deadlock errors from the database.
From what I understand, the problem isn't necessarily with parallel access to the target table itself (locking it changes nothing), but with accesses to the system catalog tables in the background.
I have combed through the DB2 documentation, but the only reference to the topic of concurrent DDL statements I found at all was to avoid doing them. The answer to this question surely can't be to simply not attempt it?
Does anyone know a way to deal with this problem?
I tried to have a global, single synchronization table to lock if you want to attach any partitions, but it didn't help either. Either I'm missing something (implicit commits somewhere?) or some of the catalog updates even happen asynchronously, which makes the whole problem much worse. If that is the case, is there any chance at all to query whether the attach is safe to perform at any given moment?
I need to perform some calculations using a few columns from a table. This database table, which gets updated every couple of hours, generates duplicates on a couple of columns every other day. There is no way to tell which row was inserted first, which affects my calculations.
Is there a way to copy these rows into a new table automatically as data gets added every couple of hours, and perform the calculations on the fly? That way, whatever comes first will be captured into a new table for a dashboard and for other business use cases.
I thought of creating a stored procedure and using a job scheduler to perform this, but I do not have admin access and cannot schedule jobs. Is there another way of doing this efficiently? Much appreciated!
Edit: My request for admin access is being approved.
Another way, in addition to what is stated in the other answers, is the following:
Make a temp table.
Make a prod table.
Use a stored procedure to copy everything from the temp table into the prod table after each load has been done.
Use the same stored procedure to clean the temp table after the load is done.
I don't know if this will work for you, but this is in general how we deal with a huge amount of load on a daily basis.
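A minimal sketch of such a procedure, assuming SQL Server and hypothetical table/column names:

    CREATE PROCEDURE dbo.move_staged_rows
    AS
    BEGIN
        SET XACT_ABORT ON;
        BEGIN TRANSACTION;
            -- copy everything loaded so far into the prod table
            INSERT INTO dbo.prod_table (col1, col2)
            SELECT col1, col2
            FROM dbo.temp_table;

            -- clean the temp table so the next load starts empty
            DELETE FROM dbo.temp_table;
        COMMIT TRANSACTION;
    END;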
I have a table set up in my SQL Server that keeps track of inventory items (in another database) that have changed. This table is fed by several different triggers. Every 15 minutes a scheduled task runs a batch file that executes a number of different queries that send updates on the items flagged in this table to several ecommerce websites. The last query in the batch file resets the flags.
As you can imagine there is potential to lose changes if an item is flagged while this batch file is running. I have worked around this by replaying the last 25 hours of updates every 24 hours, just in case this scenario happened. It works, but IMO is kind of clumsy.
What I would like to do is delay any writes to this table until my script finishes, and resets the flags on all the rows that were flagged when the script started. Then allow all of these delayed writes to happen.
I've looked into doing this with table hints (TABLOCK), but this seems to be limited to one query, unless I'm misunderstanding what I have read, which is certainly possible. I have several queries that run in succession. TIA.
Alex
Could you modify your script into a stored procedure that extracts all the data into a temporary table, using a select statement that applies a lock to the production table? You could then drop your lock on the main table and do all your processing in the temporary table (or a permanent table built for the purpose), away from the live system. It will be a lot slower and put more load on your SQL box, but speed shouldn't be an issue once you have a point-in-time snapshot to work from.
If that option is not applicable, then maybe you could play with wrapping the whole thing in a transaction and putting a table lock on your production table with the first select statement.
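As a rough sketch (the table and column names here are placeholders for your own):

    BEGIN TRANSACTION;
        -- take and hold an exclusive table lock for the whole run
        SELECT item_id
        INTO #changed_items
        FROM dbo.item_changes WITH (TABLOCKX, HOLDLOCK);

        -- ... run the existing update queries against #changed_items here ...

        -- reset the flags captured above; the lock is released on commit
        UPDATE dbo.item_changes SET flagged = 0;
    COMMIT TRANSACTION;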
Good luck mate
I have an issue with my data flow task locking. This task compares a couple of tables from the same server, and the result is inserted into one of the tables being compared. The table being inserted into is compared using a NOT EXISTS clause.
When performing a fast load the task freezes without errors; when doing a regular insert the task gives a deadlock error.
I have 2 other tasks that perform the same action against the same table and they work fine, but the amount of information being inserted is a lot smaller. I am not running these tasks in parallel.
I am considering using a NOLOCK hint to get around this, because this is the only task that writes to a certain table partition; however, I am only coming to this conclusion because I cannot figure out anything else, aside from using a temp table or a hashed anti join.
You probably have a so-called deadlock situation. In your Data Flow Task (DFT) you have two separate connection instances to the same table. The first connection instance runs the SELECT and places a Shared lock on the table, and the second runs the INSERT and places a page or table lock.
A few words on the possible cause. The SSIS DFT reads table rows and processes them in batches. When the number of rows is small, the read is completed within a single batch, and the Shared lock is released by the time the Insert takes place. When the number of rows is substantial, SSIS splits the rows into several batches and processes them sequentially. This allows the steps following the DFT Data Source to run before the Data Source has completed reading.
The design of reading and writing the same table in the same Data Flow is not good because of exactly this kind of locking issue. Ways to work around it:
Move all the DFT logic inside a single INSERT statement and get rid of the DFT. Might not be possible.
Split the DFT: move the data into an intermediate table, and then move it to the target table with a following DFT or SQL Command. An additional table is needed.
Set Read Committed Snapshot Isolation (RCSI) on the DB and use Read Committed on the SELECT. Applicable to MS SQL DB only.
The most universal way is the second one, with an additional table. The third is for MS SQL only; a sketch of enabling it follows.
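For the third option, enabling RCSI is a one-time database setting (YourDb is a placeholder; the switch needs exclusive access to the database, or WITH ROLLBACK IMMEDIATE):

    ALTER DATABASE YourDb SET READ_COMMITTED_SNAPSHOT ON;
    -- with RCSI enabled, the SELECT side of the Data Flow reads row versions
    -- instead of taking shared locks, so it no longer blocks the INSERT destination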
We need to call an external web service to get some data and store in a table locally. This process needs to be repeated every 10 minutes as the data that the external web service publishes changes rapidly. As part of this, we need to clear the entire table and re-insert the current data that is published by the web service.
The tricky situation we have is: what if, at the time the table is truncated, a user queries the table and gets no results? This results in invalid data being displayed to the user.
Can anyone please give me some advice on this?
Use a transaction around both operations. Something like this:
BEGIN TRANSACTION;
    TRUNCATE TABLE your_table;
    -- populate the table with the new data from the web service
COMMIT TRANSACTION;
Snapshot isolation guarantees that the data readers see will be consistent; while the reload transaction is open they will see the previously committed rows rather than an empty table.
If you can create a view over your table, then you can load the data into a new table, take however long you need to populate it, and then just alter the view to reference the new table.
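A rough sketch of that approach, assuming SQL Server syntax and placeholder names:

    -- readers always query the view, never the tables directly
    CREATE VIEW dbo.current_data AS
    SELECT * FROM dbo.data_a;
    GO

    -- load the fresh data into dbo.data_b, taking as long as needed,
    -- then repoint the view in one quick metadata change
    ALTER VIEW dbo.current_data AS
    SELECT * FROM dbo.data_b;
    GO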