How can I trigger a Mule application when the value of a row in a database gets updated?
Thanks in advance.
It depends on how you define whether a row has been updated. However, a good starting point is the Poll scope combined with watermarks.
Poll allows you to poll a resource such as a database connector with a particular SQL SELECT query, and a watermark lets you store tracking information such as the last 'id' processed or the 'lastupdated' column of a database table, for example.
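For example (a minimal sketch assuming a hypothetical orders table with a lastupdated column), the poll's SELECT can filter on the stored watermark so that each run only returns rows changed since the previous run:

    SELECT id, name, lastupdated
    FROM orders
    WHERE lastupdated > :lastProcessedTimestamp  -- placeholder for the watermark value stored by the previous poll
    ORDER BY lastupdated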
Some links with examples:
http://www.mulesoft.org/documentation/display/current/Poll+Reference#PollReference-PollingforUpdatesusingWatermarks
http://blogs.mulesoft.org/data-synchronizing-made-easy-with-mule-watermarks/
Related
I am using ADF to keep an Azure SQL DB in sync with an on-prem DB. The on-prem DB is read-only and the direction is one-way, from the Azure SQL DB to the on-prem DB.
My source table in the Azure SQL cloud DB is quite large (tens of millions of rows), so I have the pipeline set to use an UPSERT (merge, trying to create a differential merge). I am using a filter on the source table, and the Filter Query has a WHERE condition that looks like this:
[HistoryDate] >= '#{formatDateTime(pipeline().parameters.windowStart, 'yyyy-MM-dd HH:mm' )}'
AND [HistoryDate] < '#{formatDateTime(pipeline().parameters.windowEnd, 'yyyy-MM-dd HH:mm' )}'
The HistoryDate column is auto-maintained in the source table with a getUTCDate() type approach. New records will always get a higher value and be included in the WHERE condition.
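For reference, the column is kept current with something along these lines (a sketch only; the table name below is hypothetical and the exact DDL in the source may differ):

    ALTER TABLE dbo.SourceTable
        ADD CONSTRAINT DF_SourceTable_HistoryDate
        DEFAULT (getutcdate()) FOR HistoryDate;
    -- plus update logic (e.g., an AFTER UPDATE trigger) that re-stamps HistoryDate on modified rows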
This works well, but here is my question: I am testing on my local machine before deploying to the client. When I am not working, my laptop hibernates and the pipeline rightfully fails because my local SQL instance is "offline" during that run. When I move this to production, hibernation should not be an issue, but what happens if the client's connection is temporarily lost (i.e., the client loses internet for a time)? Because my pipeline has a WHERE condition on the source to keep the upsert to a practical number of rows, any failure would result in the loss of any data created during that 5-minute window.
A failed pipeline can be rerun, but the run time would be different at that point, and I would effectively miss the block of records that would have been picked up had the pipeline run on time: pipeline().parameters.windowStart and pipeline().parameters.windowEnd will now be different.
As an FYI, I have this running every 5 minutes to keep the local copy in sync as close to real-time as possible.
Am I approaching this correctly? I'm sure others have this scenario and it's likely I am missing something obvious. :-)
Thanks...
Sorry to answer my own question, but to potentially help others in the future, it seems there was a better way to deal with this.
ADF offers a "Metadata-driven Copy Task" utility/wizard on the home screen that creates a pipeline. When I used it, it offered a "Delta Load" option for tables, which takes a "Watermark". The watermark is a column such as an incrementing IDENTITY column, an increasing date, or a timestamp. At the end of the wizard, it lets you download a script that builds a control table and a corresponding stored procedure that maintains the watermark value after each run. For example, if I base my delta load on an IDENTITY column, it stores the max value seen in a particular pipeline run. The next time a run happens (trigger), it uses this value (minus 1) as the MIN and the current MAX value of the IDENTITY column to get the records added since the last run.
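To illustrate the pattern (a rough sketch only; the wizard generates its own script and names, those below are hypothetical):

    CREATE TABLE dbo.WatermarkTable (
        TableName nvarchar(255) NOT NULL PRIMARY KEY,
        WatermarkValue bigint NOT NULL -- last IDENTITY value copied for that table
    );
    GO
    CREATE PROCEDURE dbo.usp_UpdateWatermark
        @TableName nvarchar(255),
        @NewWatermarkValue bigint
    AS
    BEGIN
        UPDATE dbo.WatermarkTable
        SET WatermarkValue = @NewWatermarkValue
        WHERE TableName = @TableName;
    END;
    GO
    -- The next run's source query then selects only the delta between the stored
    -- watermark and the current MAX of the IDENTITY column, roughly:
    -- SELECT * FROM dbo.SourceTable
    -- WHERE Id > (SELECT WatermarkValue FROM dbo.WatermarkTable WHERE TableName = 'SourceTable');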
I was going to approach things this way, but it seems like ADF already does this heavy lifting for us. :-)
I have taken over a project with minimal knowledge of how to use Azure Data Factory, so I need some help. The data factory copies data from a PostgreSQL server over to my Azure SQL server. It runs 3 times a day and inserts new rows perfectly. But when data has changed in Postgres, it does not update the row as needed in the sink database. Can anyone point me in the right direction?
Since the source is on-premises, you can't use a data flow. That means the tutorial @Mark Kromer provided for you doesn't work.
In my experience with the Copy activity, we can only copy (insert) the data into the sink table; it won't update existing rows. I'm afraid to say we can't update rows with the Copy activity.
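If updates are needed, a common workaround (not part of the Copy activity itself; table and column names below are hypothetical) is to copy into a staging table and then call a stored procedure or script that merges the staged rows into the target:

    MERGE dbo.TargetTable AS t
    USING dbo.StagingTable AS s
        ON t.Id = s.Id
    WHEN MATCHED THEN
        UPDATE SET t.Name = s.Name, t.UpdatedAt = s.UpdatedAt
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Id, Name, UpdatedAt) VALUES (s.Id, s.Name, s.UpdatedAt);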
Is there a way to find the last updated date of a table without using sys.dm_db_index_usage_stats? I have been searching for an hour now, but all the answers I found use this DMV, which seems to be reset on a SQL Server restart.
Thanks.
You can use this DMV (which is strongly advised).
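For example (the object name is hypothetical):

    SELECT MAX(last_user_update) AS LastUpdated
    FROM sys.dm_db_index_usage_stats
    WHERE database_id = DB_ID()
      AND object_id = OBJECT_ID('dbo.MyTable');
    -- note: these stats are reset when the SQL Server instance restarts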
Or you can write your own ON UPDATE trigger that populates such a tracking table (or another homemade one) on its own.
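A minimal sketch of that idea, assuming a hypothetical source table dbo.MyTable and a homemade tracking table:

    CREATE TABLE dbo.TableLastUpdated (
        TableName sysname NOT NULL PRIMARY KEY,
        LastUpdated datetime2 NOT NULL
    );
    GO
    CREATE TRIGGER dbo.trg_MyTable_LastUpdated
    ON dbo.MyTable
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- record the time of the latest change; insert the row the first time
        UPDATE dbo.TableLastUpdated
        SET LastUpdated = SYSUTCDATETIME()
        WHERE TableName = 'dbo.MyTable';
        IF @@ROWCOUNT = 0
            INSERT dbo.TableLastUpdated (TableName, LastUpdated)
            VALUES ('dbo.MyTable', SYSUTCDATETIME());
    END;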
Also, if you just wish to collect some data about current usage, you can set up a SQL Server Profiler trace that will do the job (then parse the results somehow, in Excel or whatever).
Last option: successively restore the backups you have taken (onto a copy), hoping you have enough backup retention to find the data you're searching for.
I'm trying to create a data sync using MuleSoft so that Db1 is checked for any updates based on LastModifiedDate and, if there are any, the updates are applied to Db2.
I've got the flow working to the point where, when it is first started, the data is copied from Db1 to Db2. After that, however, the flow keeps constantly updating the records in Db2. (Below is my flow diagram.)
I've tried to set up recordVars in the message enricher (in Batch_Step) to check whether records exist and route them accordingly in the Choice (in Batch_Step1).
I've also enabled a watermark on the Poll for the timestamp, but nothing is preventing the constant updating of already-inserted records.
Below are screenshots of my configs:
Watermark Setup:
Db1 query:
BatchStep Accept Expression:
Message Enricher:
Choice Setup:
Add LastModifiedDate to the SELECT statement from Db1 so the watermark will be able to access the field payload.LastModifiedDate.
Also, what is your query in the Db2 batch step? Check it, because it might always be returning results, which could be why payload.size is always > 0.
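A sketch of what the Db1 poll query could look like with that column included (table name and watermark variable are hypothetical):

    SELECT Id, Name, LastModifiedDate
    FROM Customers
    WHERE LastModifiedDate > #[flowVars.lastModifiedWatermark]  -- watermark value stored by the Poll
    ORDER BY LastModifiedDate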
I have a table that is a replica of a table from a different server.
Unfortunately I don't have access to the transaction information; all I have is the table that shows the "as is" information, and an SSIS package that replicates the table on my server every day (the table gets truncated and the new data is pulled every night).
Everything has been fine and good, but I want to start tracking what has changed. i.e. I want to know if a new row has been inserted or a value of a column has changed.
Is this something that could be done easily?
I would appreciate any help..
The SQL version is SQL Server 2012 SP1 | Enterprise
If you want to do this for a particular table, you can use an SCD (Slowly Changing Dimension) transform in the SSIS data flow, which will keep the history records in a different table,
or
you can enable CDC (Change Data Capture) on that table. CDC will help you monitor every DML operation on that table; the modified rows are recorded in CDC's system change tables.
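A minimal sketch of enabling CDC on the table (database and table names are hypothetical):

    USE MyReplicaDb;
    GO
    EXEC sys.sp_cdc_enable_db;
    GO
    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name = N'MyTable',
        @role_name = NULL;  -- no gating role
    GO
    -- Changes can then be read from the generated function, e.g.:
    -- SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all');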