Hi there, I'm new to Mule and I need a pointer on how to process records. I'm trying to perform an operation where I insert a new record into one table and, if the record is inserted successfully, obtain the primary key and insert it into another table where that primary key is part of a foreign key.
I don't know which connector or component to use to check if an insert was successful so that I can insert the primary key into another table.
My primary key is a UUID generated as a variable. I tried returning the GUID from SQL Server using the following documentation, but it didn't work. Any help or pointers on either question would be appreciated.
https://doctorjw.wordpress.com/2015/10/01/mule-and-getting-the-generated-id-of-a-newly-inserted-row/
If you want a DB-generated Id, you can use two DB blocks, saving a variable between them:
1st DB block: generate a unique Id through a sequence, for example:
select GENERAID_ESB.nextval from dual
Save it in a variable (Session or Flow, depending on the scope you need):
#[payload.get(0).nextval]
2nd DB block: insert your record into the DB with the saved unique id, for example:
INSERT INTO ESB_TABLE values(#[(sessionVars.'idTable')],
#[message.outboundProperties.'yourInformation'])
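If the target is SQL Server rather than Oracle (the question mentions returning the generated id from SQL Server), a hedged alternative is a single DB block whose INSERT returns the new key via an OUTPUT clause. This is only a sketch, assuming a hypothetical ESB_TABLE with an IDENTITY column named id:

INSERT INTO ESB_TABLE (your_information)
OUTPUT INSERTED.id -- the generated key comes back as a one-row result set
VALUES ('your value');

The returned id can then be saved to a Flow or Session variable in the same way as the nextval example above.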
I hope this helps.
I have a table on premise that is about 21 million rows with a primary key constraint and when I search that table, there are no duplicates. This table is in an OLTP application database that is constantly moving.
I have the exact same table in Azure which has the same primary key constraint. This table is not an application table, it's just a copy of the one that is on-premise (the goal is to use this one for ad hoc queries, as a source for other systems, etc.).
When I use Azure Data Factory to copy all columns from the on-premise table to the table in Azure, it returns a violation of the primary key constraint. No matter how many times I run this data factory pipeline, it comes back with a primary key violation for duplicate keys (though the offending keys are different each time).
So I dropped the primary key constraint in Azure and ran the pipeline again, and sure enough, duplication exists.
Upon investigation, it appears that the on-premise database is inserting a new record and then updating the old record to inactivate it. So for a fraction of a second there are two active rows, which ADF grabs and then tries to insert into the table in Azure, which of course fails because of duplicate primary keys.
Now, to the best of my knowledge, this shouldn't be possible. You can't insert a new row that violates the primary key constraint. But ADF seems to be grabbing all the data, and some of those rows are mid-flight, where the insert has happened but the update to inactivate the old row hasn't happened yet.
For those that are curious, the insert and the update of the old row happen within less than a second... typically 10-20 microseconds apart. I don't know how this is possible and I don't know how to fix it (because I can't modify the application code). The on-premise database is SQL Server 2000 and the destination is an Azure SQL database.
Try the READPAST hint. It will skip any rows that are currently locked.
SELECT * FROM yourtable WITH (readpast)
Since you have created_date and updated_date columns, you can also select only rows older than 5 seconds to avoid the duplication.
select * from yourtable where created_date<=dateadd(second,-5,getdate()) and updated_date<=dateadd(second,-5,getdate());
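Putting the two together (the table and column names are taken from the snippets above), a combined query might look like this:

-- Skip rows that are currently locked (mid insert/update) and anything touched in the last 5 seconds
SELECT *
FROM yourtable WITH (READPAST)
WHERE created_date <= DATEADD(second, -5, GETDATE())
  AND updated_date <= DATEADD(second, -5, GETDATE());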
You need to enable fault tolerance in the Azure Data Factory pipeline.
Copy data from a Source SQL to a Sink SQL database. A primary key is defined in the sink SQL database, but no such primary key is defined in the source SQL server. The duplicated rows that exist in the source cannot be copied to the sink. Copy activity copies only the first row of the source data into the sink. The subsequent source rows that contain the duplicated primary key value are detected as incompatible and are skipped.
To skip the incompatible rows, set "enableSkipIncompatibleRow": true in the copy activity's JSON definition.
Please Refer: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-fault-tolerance
If it is possible to modify your application, check whether the primary key already exists before inserting or updating, using the EXISTS() function.
Example:
IF EXISTS(SELECT * FROM Table_Name WHERE primary key condition)
BEGIN
UPDATE Table_Name
SET Col_Name= value
WHERE condition
END
ELSE
BEGIN
INSERT INTO Table_Name (col_Name1, col_Name2, ...)
VALUES ('', '', ...)
END
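As a concrete illustration of the same pattern (the table, column names, and values here are all made up), it would look something like this:

-- Hypothetical table and key, for illustration only
DECLARE @Id INT, @Name VARCHAR(50);
SET @Id = 1;
SET @Name = 'Test';

IF EXISTS (SELECT 1 FROM dbo.Customer WHERE Id = @Id)
BEGIN
    UPDATE dbo.Customer SET Name = @Name WHERE Id = @Id;
END
ELSE
BEGIN
    INSERT INTO dbo.Customer (Id, Name) VALUES (@Id, @Name);
END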
Scenario: I'm copying data from Azure Table Storage to an Azure SQL DB using an upsert stored procedure like this:
CREATE PROCEDURE [dbo].[upsertCustomer] @customerTransaction dbo.CustomerTransaction READONLY
AS
BEGIN
MERGE customerTransactionstable WITH (HOLDLOCK) AS target_sqldb
USING @customerTransaction AS source_tblstg
ON (target_sqldb.customerReferenceId = source_tblstg.customerReferenceId AND
target_sqldb.Timestamp = source_tblstg.Timestamp)
WHEN MATCHED THEN
UPDATE SET
AccountId = source_tblstg.AccountId,
TransactionId = source_tblstg.TransactionId,
CustomerName = source_tblstg.CustomerName
WHEN NOT MATCHED THEN
INSERT (
AccountId,
TransactionId,
CustomerName,
CustomerReferenceId,
Timestamp
)
VALUES (
source_tblstg.AccountId,
source_tblstg.TransactionId,
source_tblstg.CustomerName,
source_tblstg.CustomerReferenceId,
source_tblstg.Timestamp
);
END
GO
where customerReferenceId & Timestamp constitute the composite key for the CustomerTransactionstable
However, when I update the rows in my source(Azure table) and rerun the Azure data factory, I see this error:
"ErrorCode=FailedDbOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A
database operation failed with the following error: 'Violation of
PRIMARY KEY constraint 'PK_CustomerTransactionstable'.
Cannot insert duplicate key in object
'dbo.CustomerTransactionstable'. The duplicate key value is
(Dec 31 1990 12:49AM, ABCDEFGHIGK).\r\nThe statement has been
terminated.',Source=.Net SqlClient Data
Provider,SqlErrorNumber=2627,Class=14,ErrorCode=-2146232060,State=1,Errors=[{Class=14,Number=2627,State=1,Message=Violation
of PRIMARY KEY constraint 'PK_CustomerTransactionstable'"
Now, I have verified that there's only one row in both the source and sink with a matching primary key; the only difference is that some columns in my source row have been updated.
This link in the Azure documentation speaks about repeatable copying; however, I don't want to delete rows for a time range from my destination before inserting any data, nor do I have the ability to add a new sliceIdentifierColumn to my existing table or make any schema change.
Questions:
Is there something wrong with my upsert logic? If yes, is there a better way to do upsert to Azure SQL DBs?
If I choose to use a SQL cleanup script, is there a way to delete only those rows from my Sink that match my primary key?
Edit:
This has now been resolved.
Solution:
The primary key violation will only occur if it tries to insert a record whose primary key already exists. In my case, although there was just one record in the sink, the condition on which the merge was being done wasn't being satisfied due to a mismatch between datetime and datetimeoffset fields.
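To make the type mismatch concrete, here is a small, self-contained sketch (the literal values are made up); the same explicit CAST can be applied inside the MERGE's ON clause so that both sides are compared as one type:

-- A datetime and a datetimeoffset holding the "same" clock time
DECLARE @sinkTimestamp datetime, @sourceTimestamp datetimeoffset;
SET @sinkTimestamp = '1990-12-31 00:49:00';
SET @sourceTimestamp = '1990-12-31 00:49:00 +02:00';

SELECT
    -- implicit conversion treats the datetime as offset +00:00, so this reports 'no match'
    CASE WHEN @sinkTimestamp = @sourceTimestamp THEN 'match' ELSE 'no match' END AS implicit_compare,
    -- casting the datetimeoffset down to datetime drops the offset, so this reports 'match'
    CASE WHEN @sinkTimestamp = CAST(@sourceTimestamp AS datetime) THEN 'match' ELSE 'no match' END AS explicit_compare;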
Have you tried using ADF Mapping Data Flows instead of coding it through a stored procedure? It may be much easier for you to do upserts with SQL that way.
With an Alter Row transformation, you can perform Upsert, Update, Delete, Insert via UI settings and picking a PK: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-alter-row
You would still just need a Copy Activity prior to your Data Flow activity to copy the data from Table Storage. Put it in a Blob folder and then Data Flow can read the Source from there.
I know how to insert a new record like this:
INSERT INTO dbo.Customer_data (Customer_id, Customer_Name, Credit_card_number)
VALUES (25665, 'mssqltips4', EncryptByKey( Key_GUID('SymmetricKey1'), CONVERT(varchar,'4545-58478-1245') ) );
But I want to insert a new record with a normal INSERT statement and have it encrypted automatically.
ex:
INSERT INTO dbo.Customer_data (Customer_id, Customer_Name, Credit_card_number)
VALUES (25665, 'mssqltips4', '4545-58478-1245');
A few months ago I had a similar situation: a table containing personal data needed to have some of its columns encrypted, but the table was used in a legacy application and had many references.
So, you can create a separate table to hold the encrypted data:
CREATE TABLE [dbo].[Customer_data_encrypted]
(
[customer_id] INT PRIMARY KEY -- you can create a foreign key to the original one, too
,[name] VARBINARY(MAX)
,[credit_card_number] VARBINARY(MAX)
);
Then create an INSTEAD OF INSERT, UPDATE, DELETE trigger on the original table. The logic in the trigger is simple (a minimal sketch of the insert branch follows the list):
on delete, delete from both tables
on update/insert, encrypt the data and insert it into the new table; write some kind of mask to the original table (for example *** or 43-****-****-****)
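A minimal sketch of the insert branch of such a trigger, assuming the table definitions above, the SymmetricKey1 key from the question, a certificate called Certificate1 protecting it, and plain varchar columns in the original table (all of these are assumptions):

CREATE TRIGGER trg_Customer_data_instead_of_insert
ON dbo.Customer_data
WITH EXECUTE AS OWNER -- so the trigger itself can open the key
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    OPEN SYMMETRIC KEY SymmetricKey1
        DECRYPTION BY CERTIFICATE Certificate1; -- assumed certificate name

    -- encrypted values go to the side table
    INSERT INTO dbo.Customer_data_encrypted (customer_id, name, credit_card_number)
    SELECT i.Customer_id,
           EncryptByKey(Key_GUID('SymmetricKey1'), CONVERT(varchar(100), i.Customer_Name)),
           EncryptByKey(Key_GUID('SymmetricKey1'), CONVERT(varchar(100), i.Credit_card_number))
    FROM inserted AS i;

    -- only masks are written to the original table
    INSERT INTO dbo.Customer_data (Customer_id, Customer_Name, Credit_card_number)
    SELECT i.Customer_id, '***', '43-****-****-****'
    FROM inserted AS i;

    CLOSE SYMMETRIC KEY SymmetricKey1;
END

The update and delete branches would follow the same pattern against both tables.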
Then, perform an initial migration to move the data from the original table to the new one and then mask it.
The steps above are nice because:
every insert/update to the original table continues to work
you can create the trigger with EXECUTE AS OWNER so that it has access to the symmetric keys and performs the changes directly in T-SQL, without opening the certificates explicitly or granting access to users who should not have it
all existing reads get the masked data, so you are not worried about critically breaking the application
having the trigger gives you the ability to easily adjust how the information is created and changed
It depends on your environment and business needs; for one of the tables I stored the encrypted value as a new column rather than in a separate table. So, choose what is more appropriate for you.
Scenario:
I have a set of test data that needs to be deployed to our build server daily (our build server database is first overwritten with the current live database, and has all data over a month old removed).
This test data has foreign key references within it which need to stay.
I can't simply switch on IDENTITY_INSERT as the primary keys may clash with data that is already in the database (because we aren't starting from a blank database).
The test data needs to be regenerated fairly regularly, so the thought of going through the deploy script, fudging the id columns to be something outlandish (or a negative number, for instance), and then changing the related foreign key columns to match every time we regenerate the data doesn't thrill me.
Ideally I would like to know if there is a tool which can scan a database, pick up the foreign key constraints and generate the insert scripts accordingly, something like:
INSERT INTO MyTable VALUES('TEST','TEST');
DECLARE @Id INT;
SET @Id = (SELECT @@IDENTITY)
INSERT INTO MyRelatedTable VALUES(@Id,'TEST')
It sounds like you want to look into an ETL process that copes with the change in id. As you're using SQL Server, you can look at the OUTPUT clause - use this to build up some temporary tables that can map the "old" id to the "new" id for each primary key to map the foreign keys when migrating the "child" tables.
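A hedged sketch of that OUTPUT-based mapping (all table and column names here are assumptions; MERGE is used because a plain INSERT ... OUTPUT cannot reference source columns):

-- Hypothetical staging data: OldId is the id used inside the test-data set
CREATE TABLE #StagedParent (OldId INT, Name VARCHAR(50));
CREATE TABLE #StagedChild (OldParentId INT, Detail VARCHAR(50));
DECLARE @IdMap TABLE (OldId INT, NewId INT);

-- Insert the parent rows and capture old id -> newly generated identity
MERGE dbo.MyTable AS tgt
USING #StagedParent AS src
    ON 1 = 0 -- never matches, so every staged row is inserted
WHEN NOT MATCHED THEN
    INSERT (Name) VALUES (src.Name)
OUTPUT src.OldId, INSERTED.Id INTO @IdMap (OldId, NewId);

-- Re-point the child rows at the newly generated parent ids
INSERT INTO dbo.MyRelatedTable (MyTableId, Detail)
SELECT m.NewId, c.Detail
FROM #StagedChild AS c
JOIN @IdMap AS m ON m.OldId = c.OldParentId;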
I set an identity increment on the primary key in one table, and now I want to insert data through the GUI in SQL Server.
However, the primary key column is now read-only and I am not able to edit it.
Any idea to solve this problem? Thanks.
Just leave the primary key field untouched and SQL Server will generate the value for you.
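In T-SQL terms the same thing applies (table and column names here are hypothetical): leave the identity column out of the column list and SQL Server fills it in.

-- The identity primary key is not listed, so SQL Server generates it
INSERT INTO dbo.YourTable (Name, CreatedDate)
VALUES ('example row', GETDATE());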