I have created an SSIS package that contains one Execute SQL Task and a Data Flow Task (DFT). The Execute SQL Task just truncates the destination table, and inside the DFT I insert records from an OLE DB Source (a table with the same structure) into the destination table on the destination SQL Server.
While executing this package I get a primary key violation error. Both the source and destination tables have the same primary key, and I truncate the whole table before the insert.
What can I check in this case? When I ran the Execute SQL Task on its own it truncated the data and the table was empty afterwards, and the Execute SQL Task raises no error when it runs together with the DFT.
Please suggest what to check.
Related
I have a table on premise that is about 21 million rows with a primary key constraint and when I search that table, there are no duplicates. This table is in an OLTP application database that is constantly moving.
I have the exact same table in Azure which has the same primary key constraint. This table is not an application table, it's just a copy of the one that is on-premise (the goal is to use this one for ad hoc queries, as a source for other systems, etc.).
When I use Azure Data Factory to copy all columns from the on-premise table to the table in Azure, it returns a violation of the primary key constraint. No matter how many times I run this Data Factory pipeline, it comes back with a primary key violation for duplicate keys (the duplicated keys are different each time, though).
So I dropped the primary key constraint in Azure and ran the pipeline again, and sure enough, duplication exists.
Upon investigation, it appears that the on-premise database inserts a new record and then updates the old record to inactivate it. So for a fraction of a second there are two active rows, which ADF grabs and then tries to insert into the table in Azure, and that of course fails because of duplicate primary keys.
Now to the best of my knowledge, this shouldn't be possible. You can't insert a new row that violates the primary key constraint. But ADF seems to be grabbing all the data and some of those rows are mid-flight where the insert has happened and the update to inactivate the old row hasn't happened yet.
For those who are curious, the insert and the update of the old row happen within less than a second, typically 10-20 microseconds apart. I don't know how this is possible and I don't know how to fix it (because I can't modify the application code). The on-premise database is SQL Server 2000 and the target is an Azure SQL database.
Try the READPAST hint. It skips rows that are currently locked, so it should not pick up rows that are mid-update:
SELECT * FROM yourtable WITH (READPAST)
Alternatively, since you have created_date and updated_date columns, you can select only rows older than 5 seconds to avoid the duplication:
SELECT * FROM yourtable
WHERE created_date <= DATEADD(second, -5, GETDATE())
  AND updated_date <= DATEADD(second, -5, GETDATE());
You need to enable fault tolerance on the Copy activity in your Azure Data Factory pipeline.
Copy data from a Source SQL to a Sink SQL database. A primary key is defined in the sink SQL database, but no such primary key is defined in the source SQL server. The duplicated rows that exist in the source cannot be copied to the sink. Copy activity copies only the first row of the source data into the sink. The subsequent source rows that contain the duplicated primary key value are detected as incompatible and are skipped.
To skip the incompatible rows, set "enableSkipIncompatibleRow": true in the Copy activity's JSON definition.
Please refer to: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-fault-tolerance
If it is possible to modify your application, check whether the primary key already exists before the insert or update, using EXISTS().
Example:
IF EXISTS (SELECT * FROM Table_Name WHERE <primary key condition>)
BEGIN
    UPDATE Table_Name
    SET Col_Name = value
    WHERE <primary key condition>
END
ELSE
BEGIN
    INSERT INTO Table_Name (Col_Name1, Col_Name2, ...)
    VALUES ('value1', 'value2', ...)
END
I have two files which I am importing via Node.js into SQL Server. The table has a unique key on the equity instrument identifier (ISIN):
data1.csv and data2.csv
I first import data1.csv and each row is inserted into the database. After this I import data2.csv (the values are again inserted into the database), which may contain the same ISINs, but its related values take priority over the first file (there are not many of these ISINs, roughly 5 out of 1,000).
What can I do in SQL Server to overwrite the values when the unique constraint is violated? I understand that there is an option to upload data2.csv first; however, there are some external constraints that do not allow me to do that.
Please tell me if additional information is required
I would recommend a staging process for this:
1. Create a staging table with a similar schema to your target table.
2. Before loading, delete all rows from the staging table (you can use TRUNCATE).
3. Upload the file to the staging table.
4. Load the data from staging into the final table. Here you can apply logic to insert only new rows and update existing ones; the MERGE command is useful in a scenario like this (see the sketch below).
Repeat steps 2 to 4 for each source file.
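A minimal sketch of step 4, assuming a hypothetical Instruments target table and InstrumentsStaging staging table keyed on ISIN (the table and column names are illustrative, not from the question):
-- Upsert from staging into the final table, keyed on ISIN
MERGE dbo.Instruments AS target
USING dbo.InstrumentsStaging AS source
    ON target.ISIN = source.ISIN
WHEN MATCHED THEN
    UPDATE SET target.InstrumentName = source.InstrumentName,  -- overwrite with the later, higher-priority values
               target.Price          = source.Price
WHEN NOT MATCHED THEN
    INSERT (ISIN, InstrumentName, Price)
    VALUES (source.ISIN, source.InstrumentName, source.Price);
Running the same MERGE after loading data2.csv into the staging table updates the rows whose ISIN already exists and inserts the rest, so the unique constraint is never violated.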
Scenario: I'm copying data from Azure Table Storage to an Azure SQL DB using an upsert stored procedure like this:
CREATE PROCEDURE [dbo].[upsertCustomer] @customerTransaction dbo.CustomerTransaction READONLY
AS
BEGIN
    MERGE customerTransactionstable WITH (HOLDLOCK) AS target_sqldb
    USING @customerTransaction AS source_tblstg
    ON (target_sqldb.customerReferenceId = source_tblstg.customerReferenceId AND
        target_sqldb.Timestamp = source_tblstg.Timestamp)
    WHEN MATCHED THEN
        UPDATE SET
            AccountId = source_tblstg.AccountId,
            TransactionId = source_tblstg.TransactionId,
            CustomerName = source_tblstg.CustomerName
    WHEN NOT MATCHED THEN
        INSERT (
            AccountId,
            TransactionId,
            CustomerName,
            CustomerReferenceId,
            Timestamp
        )
        VALUES (
            source_tblstg.AccountId,
            source_tblstg.TransactionId,
            source_tblstg.CustomerName,
            source_tblstg.CustomerReferenceId,
            source_tblstg.Timestamp
        );
END
GO
where customerReferenceId & Timestamp constitute the composite key for the CustomerTransactionstable
However, when I update the rows in my source (Azure Table) and rerun the Azure Data Factory pipeline, I see this error:
"ErrorCode=FailedDbOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A
database operation failed with the following error: 'Violation of
PRIMARY KEY constraint 'PK_CustomerTransactionstable'.
Cannot insert duplicate key in object
'dbo.CustomerTransactionstable'. The duplicate key value is
(Dec 31 1990 12:49AM, ABCDEFGHIGK).\r\nThe statement has been
terminated.',Source=.Net SqlClient Data
Provider,SqlErrorNumber=2627,Class=14,ErrorCode=-2146232060,State=1,Errors=[{Class=14,Number=2627,State=1,Message=Violation
of PRIMARY KEY constraint 'PK_CustomerTransactionstable'"
Now, I have verified that there is only one row in both the source and the sink with a matching primary key; the only difference is that some columns in my source row have been updated.
This link in the Azure documentation speaks about repeatable copying; however, I don't want to delete rows for a time range from my destination before inserting any data, nor do I have the ability to add a new sliceIdentifierColumn to my existing table or make any other schema change.
Questions:
Is there something wrong with my upsert logic? If yes, is there a better way to do upsert to Azure SQL DBs?
If I choose to use a SQL cleanup script, is there a way to delete only those rows from my Sink that match my primary key?
Edit:
This has now been resolved.
Solution:
The primary key violation will only occur when it tries to insert a record whose primary key already exists. In my case, although there was just one record in the sink, the condition the MERGE was matching on wasn't being satisfied because of a mismatch between the datetime and datetimeoffset fields, so it fell through to the INSERT branch.
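One way to guard against that kind of mismatch, sketched here under the assumption that the table-valued parameter carries Timestamp as datetimeoffset while the table column is datetime (cast in whichever direction matches your actual schema):
-- Align the types explicitly in the MERGE ON clause so the MATCHED branch fires
-- instead of falling through to a duplicate INSERT
ON (target_sqldb.customerReferenceId = source_tblstg.customerReferenceId AND
    target_sqldb.Timestamp = CAST(source_tblstg.Timestamp AS datetime))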
Have you tried doing this with ADF Mapping Data Flows instead of coding it through a stored procedure? It may be much easier for upserts into SQL.
With an Alter Row transformation, you can perform Upsert, Update, Delete, Insert via UI settings and picking a PK: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-alter-row
You would still just need a Copy Activity prior to your Data Flow activity to copy the data from Table Storage. Put it in a Blob folder and then Data Flow can read the Source from there.
I created a SQL Server database first (2 tables) and then tried to load data through an SSIS Data Flow Task. At the last step an error occurred.
When I remove the relationship between the two tables in the database, the SSIS task completes successfully and the data is loaded! But after I load data into the tables, I can't create the relationship between them.
Based on this, it seems the relationship can only be created when there is no data in the table. Just to mention, the data types are the same in both tables.
How could I work out a solution?
Thank you!
It seems the error in SSIS is due to a foreign key violation. The purpose of the foreign key relationship is to prevent you from loading bad data. When you loaded without the FK, you inserted bad data, and so you cannot create a (trusted) foreign key constraint afterward.
The solution is to either fix the source data or modify your package to avoid inserting data that doesn't exist in the referenced table. The latter can be done with a Lookup transformation, sending found rows down the happy path to the target table; you can either ignore the not-found rows or write them to an error table or file. A query along the lines of the one below can show you which rows are blocking the constraint.
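A sketch of that check, with ChildTable, ParentTable, and ParentId standing in for your actual table and key names:
-- Child rows whose key has no match in the referenced (parent) table;
-- these are the rows that violate the would-be foreign key
SELECT c.*
FROM dbo.ChildTable AS c
LEFT JOIN dbo.ParentTable AS p
    ON p.ParentId = c.ParentId
WHERE p.ParentId IS NULL;
Once those rows are corrected or removed, the foreign key can be created (and will be trusted).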
I am new to SSIS packages and just require assistance on how to transfer data from one data source into my own database.
Below is my data flow:
Now I have an ODBC Source (Http_Requests Source) where I take data from a PostgreSQL database table (see screenshot below for the table columns and data):
Below is the OLE DB Destination with the table I want to transfer the data to (this table is currently empty):
Now I tried to start debugging to extract the data, but I get a few errors (displayed below):
I am a complete novice, so I would like some guidance on what I need to include in order to get this SSIS package to transfer data across. Would I need to include a MERGE statement, and how do I apply it? I heard you can write a MERGE as a stored procedure and call the procedure as a SQL command. Does that mean I need to write the procedure in SSMS and then call it within the OLE DB Destination?
If somebody can provide an example and screenshot then that would be very helpful as I am really new to SSIS.
Thank you,
Check the constraints on the destination table, or disable them before running the package.
Below are the queries you can use:
-- Disable all table constraints
ALTER TABLE YourTableName NOCHECK CONSTRAINT ALL
-- Enable all table constraints
ALTER TABLE YourTableName CHECK CONSTRAINT ALL
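Note that re-enabling with CHECK CONSTRAINT ALL does not re-validate the rows loaded while the constraints were off, so the constraints stay untrusted. If you want SQL Server to re-check the existing data and mark them trusted again, you can use WITH CHECK:
-- Re-enable and re-validate existing rows so the constraints become trusted again
ALTER TABLE YourTableName WITH CHECK CHECK CONSTRAINT ALL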
Tick the Keep Identity box, or drop the primary key on the table. After you apply the changes, do not forget to refresh the metadata by reopening the mappings in SSIS.
The error means that PerformanceId is an IDENTITY column on your destination table. IDENTITY columns are read-only unless you tell SQL Server otherwise. In T-SQL, to insert explicit values into an IDENTITY column we would turn on IDENTITY_INSERT; because you are in SSIS, you can accomplish the same thing by checking the "Keep Identity" box.
HOWEVER, whenever you get an error like this it is usually a sign that you should NOT be mapping ID to PerformanceId. The question you have to ask is: is the identity from your source supposed to be the identity of the destination table? Usually not; most of the time the destination would have its own surrogate key column. Then you have to work out whether it is even possible, because if there is a unique constraint or primary key, the identity cannot repeat, which means you have to know that your source's ID column will not cause a duplicate primary key violation.
More than likely the actual fix is for you to unmap ID from the source and ignore the value.
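For reference, a minimal sketch of the T-SQL equivalent of the Keep Identity box (the table and column names here are illustrative):
-- Allow explicit values to be inserted into the IDENTITY column
SET IDENTITY_INSERT dbo.Performance ON;

INSERT INTO dbo.Performance (PerformanceId, SomeColumn)
VALUES (42, 'example');

SET IDENTITY_INSERT dbo.Performance OFF;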
The column PerformanceID (in the target) is almost certainly an identity column, and that is why it is not working. You may not want to transfer it (and have SQL Server generate values for PerformanceID), or you can check 'Keep Identity'.