Partitioned Table Integrity in SQL Server 2012 - sql-server-2012

One of the issues that I worked with in previous versions of SQL Server was working with maintaining data integrity on a partitioned table where the partition key was not part of the primary key of the records being stored on the table. Specifically, creating a unique constraint on the value that uniquely identified each record would generate a non-aligned index which would prevent using some of the more fun partition features such as swapping data.
Is there any way in SQL Server 2012 to be able to enforce unique values on a partitioned table to enforce data integrity and allow for foreign key relationships without disabling features?

Related

SQL Server constraint enforcement being violated for split seconds

I have a table on premise that is about 21 million rows with a primary key constraint and when I search that table, there are no duplicates. This table is in an OLTP application database that is constantly moving.
I have the exact same table in Azure which has the same primary key constraint. This table is not an application table, it's just a copy of the one that is on-premise (the goal is to use this one for ad hoc queries, as a source for other systems, etc.).
When I use Azure Data Factory to select all_columns from table on premise to the table in Azure, it returns a violation of the primary key constraint. No matter how many times I run this data factory pipeline, it comes back with a primary key violation for duplicate keys (the keys are always changing though).
So I dropped the primary key constraint in Azure and ran the pipeline again, and sure enough, duplication exists.
Upon investigation, it appears that the on-premise database is doing an insert new record then update the old record to inactivate it. So for a fraction of a second, there are two active rows that ADF is grabbing to then try to insert into the table in Azure which of course fails because of duplicate primary keys.
Now to the best of my knowledge, this shouldn't be possible. You can't insert a new row that violates the primary key constraint. But ADF seems to be grabbing all the data and some of those rows are mid-flight where the insert has happened and the update to inactivate the old row hasn't happened yet.
For those that are curious, the insert happens and the update of the old row happens within less than a second... it's typically 10-20 microseconds. I don't know how this is possible and I don't know how to fix it (because I can't modify the application code). The database for the on-premise database is a SQL Server 2000 database and Azure SQL is an Azure SQL database.
Try with readpast hint. It should not select any rows in locking state.
SELECT * FROM yourtable WITH (readpast)
Since you have create_date and updated_date column then you can select rows older than 5 seconds to avoid duplication.
select * from yourtable where created_date<=dateadd(second,-5,getdate()) and updated_date<=dateadd(second,-5,getdate());
Need to enable the Fault tolerance in a Pipeline Azure Data Factory
Copy data from a Source SQL to a Sink SQL database. A primary key is defined in the sink SQL database, but no such primary key is defined in the source SQL server. The duplicated rows that exist in the source cannot be copied to the sink. Copy activity copies only the first row of the source data into the sink. The subsequent source rows that contain the duplicated primary key value are detected as incompatible and are skipped.
To configure Json Definition skip the incompatible rows in copy activity "enableSkipIncompatibleRow": true
Please Refer: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-fault-tolerance
If possible to modify your application, need to check the Primary key constraint before insert or update using EXISTS() function.
Example:
IF EXISTS(SELECT * FROM Table_Name WHERE primary key condition)
BEGIN
UPDATE Table_Name
SET Col_Name= value
WHERE condition
END
ELSE
BEGIN
INSERT INTO Table_Name ( col_Name1,col_Name2,,.. )
VALUES ( ‘’,’’,’’,….)
END

How to create an Index in Amazon Redshift

I'm trying to create indexes in Amazon Redshift but I received an error
create index on session_log(UserId);
UserId is an integer field.
If you try and create an index (with a name) on a Redshift table:
create index IX1 on "SomeTable"("UserId");
You'll receive the error
An error occurred when executing the SQL command:
create index IX1 on "SomeTable"("UserId")
ERROR: SQL command "create index IX1 on "SomeTable"("UserId")" not supported on Redshift tables.
This is because, like other data warehouses, Redshift uses columnar storage, and as a result, many of the indexing techniques (like adding non-clustered indexes) used in other RDBMS aren't applicable.
You do however have the option of providing a single sort key per table, and you can also influence performance with a distribution key for sharding your data, and selecting appropriate compression encodings for each column to minimize storage and I/O overheads.
For example, in your case, you may elect to use UserId as a sort key:
create table if not exists "SomeTable"
(
"UserId" int,
"Name" text
)
sortkey("UserId");
You might want to read a few primers like these
You can Define Constraints but will be informational only, as Amazon says: they are not enforced by Amazon Redshift. Nonetheless, primary keys and foreign keys are used as planning hints and they should be declared if your ETL process or some other process in your application enforces their integrity.
Some services like pipelines with insert mode (REPLACE_EXISTING) will need a primary key defined in your table.
For other performance purposes the Stuart's response is correct.
Redshift allow to create primary key
create table user (
id int ,
phone_number int,
primary key(id))
but since Redshift does not enforce this constraints, primary key accepts duplicate values.
attached article on that issue
http://www.sqlhaven.com/amazon-redshift-what-you-need-to-think-before-defining-primary-key/

PostgreSQL FOREIGN KEY with second database

I'm running the following queries on PostgreSQL 9.3:
CREATE TABLE "app_item"
(
"id" SERIAL NOT NULL PRIMARY KEY,
"location_id" UUID NOT NULL
);
CREATE INDEX app_item_e274a5da
ON "app_item" ("location_id");
ALTER TABLE "app_item"
ADD CONSTRAINT app_item_location_id_5cecc1c0b46e12e2_fk_fias_addrobj_aoguid
FOREIGN KEY ("location_id") REFERENCES "fias_addrobj" ("aoguid") deferrable
initially deferred;
Third query returns:
ERROR: relation "fias_addrobj" does not exist
app_item - table in first database
fias_addrobj - table in second database
How to do correct query with this databases?
A local table must be referenced
However, as stated within the below link, you could maybe use a trigger which uses a cross server join (facilitated by dblink) to simulate the built-in methods for constraining?
For instance, you could have a trigger set up that on INSERT, checks to see if a given FK exists to aid with enforcing referential integrity, or on DELETE to cascade
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=101322
P.S. Would avoid this at all costs.
I've not had occasion to use this myself, but you might want to look into Foreign Data Wrappers, which are essentially the successor to dblink. In particular, postgres-fdw.
Once the general setup of the fdw is in place (steps 1-3 in the link above), you could create a foreign table via CREATE FOREIGN TABLE, defined like the table in your remote DB, and then use that table as part of the foreign key CONSTRAINT, and see if it works.
If that doesn't work, another option would be to have a process which ETL's the data (say, via a Python script) from the remote server over to the local server (say, on an hourly or daily basis, depending on the size), and then you would have a true local table to use in the foreign key CONSTRAINT. It wouldn't be real-time, but depending on your needs, may suffice.

Add Foreign Key Contraint between Tables in different Databases [duplicate]

I have two tables in two different databases. In table1 (in database1) there is a column called column1 and it is a primary key. Now in table2 (in database2) there is a column called column2 and I want to add it as a foreign key.
I tried to add it and it gave me the following error:
Msg 1763, Level 16, State 0, Line 1
Cross-database foreign key references are not supported. Foreign key Database2.table2.
Msg 1750, Level 16, State 0, Line 1
Could not create constraint. See previous errors.
How do I do that since the tables are in different databases.
You would need to manage the referential constraint across databases using a Trigger.
Basically you create an insert, update trigger to verify the existence of the Key in the Primary key table. If the key does not exist then revert the insert or update and then handle the exception.
Example:
Create Trigger dbo.MyTableTrigger ON dbo.MyTable, After Insert, Update
As
Begin
If NOT Exists(select PK from OtherDB.dbo.TableName where PK in (Select FK from inserted) BEGIN
-- Handle the Referential Error Here
END
END
Edited: Just to clarify. This is not the best approach with enforcing referential integrity. Ideally you would want both tables in the same db but if that is not possible. Then the above is a potential work around for you.
If you need rock solid integrity, have both tables in one database, and use an FK constraint. If your parent table is in another database, nothing prevents anyone from restoring that parent database from an old backup, and then you have orphans.
This is why FK between databases is not supported.
You could use check constraint with a user defined function to make the check. It is more reliable than a trigger. It can be disabled and reenabled when necessary same as foreign keys and rechecked after a database2 restore.
CREATE FUNCTION dbo.fn_db2_schema2_tb_A
(#column1 INT)
RETURNS BIT
AS
BEGIN
DECLARE #exists bit = 0
IF EXISTS (
SELECT TOP 1 1 FROM DB2.SCHEMA2.tb_A
WHERE COLUMN_KEY_1 = #COLUMN1
) BEGIN
SET #exists = 1
END;
RETURN #exists
END
GO
ALTER TABLE db1.schema1.tb_S
ADD CONSTRAINT CHK_S_key_col1_in_db2_schema2_tb_A
CHECK(dbo.fn_db2_schema2_tb_A(key_col1) = 1)
In my experience, the best way to handle this when the primary authoritative source of information for two tables which are related has to be in two separate databases is to sync a copy of the table from the primary location to the secondary location (using T-SQL or SSIS with appropriate error checking - you cannot truncate and repopulate a table while it has a foreign key reference, so there are a few ways to skin the cat on the table updating).
Then add a traditional FK relationship in the second location to the table which is effectively a read-only copy.
You can use a trigger or scheduled job in the primary location to keep the copy updated.
The short answer is that SQL Server (as of SQL 2008) does not support cross database foreign keys--as the error message states.
While you cannot have declarative referential integrity (the FK), you can reach the same goal using triggers. It's a bit less reliable, because the logic you write may have bugs, but it will get you there just the same.
See the SQL docs # http://msdn.microsoft.com/en-us/library/aa258254%28v=sql.80%29.aspx Which state:
Triggers are often used for enforcing
business rules and data integrity. SQL
Server provides declarative
referential integrity (DRI) through
the table creation statements (ALTER
TABLE and CREATE TABLE); however, DRI
does not provide cross-database
referential integrity. To enforce
referential integrity (rules about the
relationships between the primary and
foreign keys of tables), use primary
and foreign key constraints (the
PRIMARY KEY and FOREIGN KEY keywords
of ALTER TABLE and CREATE TABLE). If
constraints exist on the trigger
table, they are checked after the
INSTEAD OF trigger execution and prior
to the AFTER trigger execution. If the
constraints are violated, the INSTEAD
OF trigger actions are rolled back and
the AFTER trigger is not executed
(fired).
There is also an OK discussion over at SQLTeam - http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=31135
Achieving referential integrity accross databases is not an easy task.
Here is a list of frequently employed mechanisms:
Clone & Sync: The referenced data is regularly cloned/merged into the referencing database. This may be suitable if the referenced data is rarely changing. You end up with two physical copies of the same data, and need a reliable process to keep them in sync (e.g. with an ETL pipeline).
Triggers: Changes to the referencing data and the referenced data are caught by SQL triggers, which ensure referential integrity. However, triggers can be slow, and may not fire at a database restore. It cannot hurt to run scheduled consistency checks as part of the operations monitoring. Write access to the referenced database is required for installing and maintaining the trigger.
Check constraints: SQL-Server offers user-defined contraints, which ensure that every row satisfies a given condition. One can exploit this functionality by writing a user defined function that checks the existence of a row in the referenced data, and then use this function as a CHECK's predicate in the referencing table. This does not catch changes in the referenced data. It is an RDBMS-specific solution, but works accross server boundaries (e.g. using linked servers). It is a good choice for referencing globally unique IDs, such as article codes in a company's ERP system, which never get deleted or re-assigned.
Re-think database architecture: When all the above mechanisms are unsatisfactory, multiple databases may be merged in a single database. The originating database names can become schema names, allowing effective grouping of database objects.
As the error message says, this is not supported on sql server.
The only way to ensure refrerential integrity is to work with triggers.

Add Foreign Key relationship between two Databases

I have two tables in two different databases. In table1 (in database1) there is a column called column1 and it is a primary key. Now in table2 (in database2) there is a column called column2 and I want to add it as a foreign key.
I tried to add it and it gave me the following error:
Msg 1763, Level 16, State 0, Line 1
Cross-database foreign key references are not supported. Foreign key Database2.table2.
Msg 1750, Level 16, State 0, Line 1
Could not create constraint. See previous errors.
How do I do that since the tables are in different databases.
You would need to manage the referential constraint across databases using a Trigger.
Basically you create an insert, update trigger to verify the existence of the Key in the Primary key table. If the key does not exist then revert the insert or update and then handle the exception.
Example:
Create Trigger dbo.MyTableTrigger ON dbo.MyTable, After Insert, Update
As
Begin
If NOT Exists(select PK from OtherDB.dbo.TableName where PK in (Select FK from inserted) BEGIN
-- Handle the Referential Error Here
END
END
Edited: Just to clarify. This is not the best approach with enforcing referential integrity. Ideally you would want both tables in the same db but if that is not possible. Then the above is a potential work around for you.
If you need rock solid integrity, have both tables in one database, and use an FK constraint. If your parent table is in another database, nothing prevents anyone from restoring that parent database from an old backup, and then you have orphans.
This is why FK between databases is not supported.
You could use check constraint with a user defined function to make the check. It is more reliable than a trigger. It can be disabled and reenabled when necessary same as foreign keys and rechecked after a database2 restore.
CREATE FUNCTION dbo.fn_db2_schema2_tb_A
(#column1 INT)
RETURNS BIT
AS
BEGIN
DECLARE #exists bit = 0
IF EXISTS (
SELECT TOP 1 1 FROM DB2.SCHEMA2.tb_A
WHERE COLUMN_KEY_1 = #COLUMN1
) BEGIN
SET #exists = 1
END;
RETURN #exists
END
GO
ALTER TABLE db1.schema1.tb_S
ADD CONSTRAINT CHK_S_key_col1_in_db2_schema2_tb_A
CHECK(dbo.fn_db2_schema2_tb_A(key_col1) = 1)
In my experience, the best way to handle this when the primary authoritative source of information for two tables which are related has to be in two separate databases is to sync a copy of the table from the primary location to the secondary location (using T-SQL or SSIS with appropriate error checking - you cannot truncate and repopulate a table while it has a foreign key reference, so there are a few ways to skin the cat on the table updating).
Then add a traditional FK relationship in the second location to the table which is effectively a read-only copy.
You can use a trigger or scheduled job in the primary location to keep the copy updated.
The short answer is that SQL Server (as of SQL 2008) does not support cross database foreign keys--as the error message states.
While you cannot have declarative referential integrity (the FK), you can reach the same goal using triggers. It's a bit less reliable, because the logic you write may have bugs, but it will get you there just the same.
See the SQL docs # http://msdn.microsoft.com/en-us/library/aa258254%28v=sql.80%29.aspx Which state:
Triggers are often used for enforcing
business rules and data integrity. SQL
Server provides declarative
referential integrity (DRI) through
the table creation statements (ALTER
TABLE and CREATE TABLE); however, DRI
does not provide cross-database
referential integrity. To enforce
referential integrity (rules about the
relationships between the primary and
foreign keys of tables), use primary
and foreign key constraints (the
PRIMARY KEY and FOREIGN KEY keywords
of ALTER TABLE and CREATE TABLE). If
constraints exist on the trigger
table, they are checked after the
INSTEAD OF trigger execution and prior
to the AFTER trigger execution. If the
constraints are violated, the INSTEAD
OF trigger actions are rolled back and
the AFTER trigger is not executed
(fired).
There is also an OK discussion over at SQLTeam - http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=31135
Achieving referential integrity accross databases is not an easy task.
Here is a list of frequently employed mechanisms:
Clone & Sync: The referenced data is regularly cloned/merged into the referencing database. This may be suitable if the referenced data is rarely changing. You end up with two physical copies of the same data, and need a reliable process to keep them in sync (e.g. with an ETL pipeline).
Triggers: Changes to the referencing data and the referenced data are caught by SQL triggers, which ensure referential integrity. However, triggers can be slow, and may not fire at a database restore. It cannot hurt to run scheduled consistency checks as part of the operations monitoring. Write access to the referenced database is required for installing and maintaining the trigger.
Check constraints: SQL-Server offers user-defined contraints, which ensure that every row satisfies a given condition. One can exploit this functionality by writing a user defined function that checks the existence of a row in the referenced data, and then use this function as a CHECK's predicate in the referencing table. This does not catch changes in the referenced data. It is an RDBMS-specific solution, but works accross server boundaries (e.g. using linked servers). It is a good choice for referencing globally unique IDs, such as article codes in a company's ERP system, which never get deleted or re-assigned.
Re-think database architecture: When all the above mechanisms are unsatisfactory, multiple databases may be merged in a single database. The originating database names can become schema names, allowing effective grouping of database objects.
As the error message says, this is not supported on sql server.
The only way to ensure refrerential integrity is to work with triggers.