Guarantee primary key is present in update trigger - sql

I am writing an update trigger and accessing the 'inserted' table to see which rows have been modified.
I have two related questions:
Does the inserted table always contain all the columns of the real table?
If the inserted table contains only the columns that have changed, will there always at least be the primary key columns in the inserted table?

Yes, it includes all columns from the original table, except:
SQL Server 2012 does not allow for text, ntext, or image column references in the inserted and deleted tables for AFTER triggers.
(Similar language, with different version numbers, exists for older versions of SQL Server)
Ask yourself how useful they would be if only a single (non-key) column was updated. You could tell that an update had occurred but you'd be unable to do any further useful processing.
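For example, a typical update trigger joins inserted to deleted on the primary key to see what actually changed. A minimal sketch (the table and column names here are made up for illustration):

CREATE TRIGGER trg_Orders_Update
ON dbo.Orders
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- The primary key is present in both pseudo-tables, so it can be used to pair rows
    SELECT i.OrderID,
           d.Status AS OldStatus,
           i.Status AS NewStatus
    FROM inserted AS i
    INNER JOIN deleted AS d
        ON d.OrderID = i.OrderID;  -- assumes the primary key itself was not updated
END;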

Postgresql turns update to insert and creates duplicate record

I'm not quite sure how to ask this question.
The table stores its main data in a JSONB column. The other columns are an integer primary key, a unique text secondary key, an application generated integer transaction id, and the type of operation last performed (insert, update, delete).
There are 5 triggers.
On before insert and update, set the new.operation column to TG_OP (sketched just after this list; more on this later)
On before insert, generate a unique 6 digit alphameric code for use in URLs
On before insert, generate a unique, random 6 digit numeric code avoiding the German Tank Problem.
On before insert, add the numeric and alphameric codes to the JSONB object.
On after update and delete, insert the old record appended with the new tranid and operation column to an unindexed archive table.
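For context, the first trigger (stamping the operation) is roughly this shape; the function and trigger names here are simplified and the table name is a stand-in:

CREATE OR REPLACE FUNCTION set_operation() RETURNS trigger AS $$
BEGIN
    NEW.operation := TG_OP;   -- 'INSERT' or 'UPDATE'
    RETURN NEW;               -- BEFORE row triggers must return NEW to keep the row
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_set_operation
BEFORE INSERT OR UPDATE ON main_table
FOR EACH ROW EXECUTE PROCEDURE set_operation();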
All of the triggers seem to work and the records get created with the new ids and the ids in the JSONB column.
However, on an update the new operation column gets set to UPDATE from the TG_OP variable, but the record gets inserted into the table, creating duplicate keys. Subsequent operations on that record fail because of the duplicate records.
I've stepped through it in the pgAdmin debugger. It seems to go through each trigger correctly. It completes with a record from the insert (e.g. tranid=254, operation=insert) and another from the update (e.g. tranid=256, operation=update). The archive table has one record added which shows the original info was 254/insert and it was replaced by 256/update.
But there are two records in the main table!!!
This is a violation of two uniqueness constraints which should have caused it to fail:
CONSTRAINT npprimarykey_id PRIMARY KEY (id),
CONSTRAINT npid_txt_unique UNIQUE (id_txt)
Beyond that, the command being executed was an UPDATE.
I'm not clear where to look or on what forum to ask the question. Which forum do the people building Postgresql frequent?
Thanks,
David

Define One to Many Relationships with SQL

I'm looking for a way to set up a one to many relationship between 2 tables. The table structure is explained below, but I've tried to leave out everything that has nothing to do with the problem.
Table objects has 1 column called uuid.
Table contents has 3 columns called content, object_uuid and timestamp.
The basic idea is to insert a row into objects and get a new uuid from the database. This uuid is then stored with every row in contents to associate contents with objects.
Now I'm trying to use the database to enforce that:
Each row in contents references a row in objects (a foreign key should do)
No row in objects exists without at least a row in contents
These constraints should be enforced on commit of transactions.
Ordinary triggers probably can't help, because when a row in the objects table is written there can't be a row in contents yet. Postgres does have so-called constraint triggers that can be deferred until the end of the transaction. It would be possible to use those, but they seem to be some sort of internal construct not intended for everyday use.
Ideas or solutions should be standard SQL (preferred) or work with Postgres (version does not matter). Thanks for any input.
Your main problem is that, other than foreign key constraints, no constraint can reference another table.
Your best bet is to denormalize this a little and have a column on object containing the count of contents that reference it. You can create a trigger to keep this up to date.
contents_count INTEGER NOT NULL DEFAULT 0
This won't be as unbreakable unless you put some user security over who can update this column. But if you keep it up to date with a trigger and all you're looking to avoid is accidental corruption, this should be sufficient.
EDIT: As per the comment, CHECK constraints are not deferrable. This solution would raise an error if all the contents are removed even if the intention is to add more in the same transaction.
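A rough sketch of the counter-maintenance trigger, using the table and column names from the question (updates that move a content row to a different object are not handled here):

CREATE OR REPLACE FUNCTION maintain_contents_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE objects SET contents_count = contents_count + 1
        WHERE uuid = NEW.object_uuid;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE objects SET contents_count = contents_count - 1
        WHERE uuid = OLD.object_uuid;
    END IF;
    RETURN NULL;   -- AFTER row trigger: the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_contents_count
AFTER INSERT OR DELETE ON contents
FOR EACH ROW EXECUTE PROCEDURE maintain_contents_count();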
Maybe what you want to do is normalize a little bit more. You need a third table that references elements of the other two tables. Table objects should have its own uuid, and table contents should also have its own uuid and no reference to the table objects. The third table should have only the references to the other two tables, but the primary key is the combination of both references.
So, for example, if you have a uuid from the table objects and you want all the contents for that uuid, and assuming that the third table has the columns object_uuid and content_uuid and the table contents has its own serial column named uuid, your query would look like this:
SELECT * FROM thirdtable, contents
WHERE thirdtable.content_uuid = contents.uuid AND thirdtable.object_uuid = 34;
Then you can use an AFTER INSERT OR UPDATE trigger on every table:
CREATE TRIGGER my_insert_trigger AFTER INSERT OR UPDATE ON contents
FOR EACH ROW EXECUTE PROCEDURE my_check_function();
and then, in the function my_check_function(), delete every row in objects that is not present in the third table. Somebody else answered first while I was answering; if you like my solution, I could help you write the my_check_function() function.
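For what it's worth, my_check_function() might look roughly like this (a sketch only, using the table names above):

CREATE OR REPLACE FUNCTION my_check_function() RETURNS trigger AS $$
BEGIN
    -- Remove any objects row that is no longer referenced by the third table
    DELETE FROM objects o
    WHERE NOT EXISTS (
        SELECT 1 FROM thirdtable t WHERE t.object_uuid = o.uuid
    );
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;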

If exist update else insert records in SQL Server 2008 table

I have one staging table and want to insert its data into the Main table. While inserting data from staging to Main I want to check whether each record exists: if it exists, update it, otherwise insert it as a new record. The issue is that neither the staging table nor the Main table has any key column on which I can compare values.
Is it possible to do this without having key columns, i.e. a primary key, on both tables? If yes, please suggest how.
Thanks in advance.
If there is no unique key or set of data within a row to define uniqueness, then no.
The set of data can be a combination of the data in each column, creating a sum of parts which will provide uniqueness; however, without seeing your data, that is a decision you would need to make.
You write the WHERE clause to include all the fields that make your record unique (i.e. the fields that decide whether the record is new or should be updated).
Take a look at this article (http://blogs.msdn.com/b/miah/archive/2008/02/17/sql-if-exists-update-else-insert.aspx) for hints on how to construct it.
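For illustration, the update-else-insert pattern with a multi-column match might look like this (the table names and Col1/Col2/Col3 are placeholders for whatever combination of fields makes your rows unique):

UPDATE m
SET    Col3 = s.Col3
FROM   MainTable AS m
INNER JOIN StagingTable AS s
    ON s.Col1 = m.Col1 AND s.Col2 = m.Col2;

INSERT INTO MainTable (Col1, Col2, Col3)
SELECT s.Col1, s.Col2, s.Col3
FROM   StagingTable AS s
WHERE  NOT EXISTS (
    SELECT 1 FROM MainTable AS m
    WHERE m.Col1 = s.Col1 AND m.Col2 = s.Col2
);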
If you are using SQL Server 2008r2, you could also use the MERGE statement - I haven't tried it on tables without keys, so I don't know whether it would work for you.

How to create a Primary Key on quasi-unique data keys?

I have a nightly SSIS process that exports a TON of data from an AS400 database system. Due to bugs in the AS400 DB software, occasional duplicate keys are inserted into data tables. Every time a new duplicate is added to an AS400 table, it kills my nightly export process. This issue has moved from being a nuisance to a problem.
What I need is an option to insert only unique data. If there are duplicates, select the first encountered row of the duplicate rows. Is there SQL syntax available that could help me do this? I know of the DISTINCT ROW clause, but that doesn't work in my case because, for most of the offending records, the data is unique except for the fields which comprise the PK.
In my case, it is more important for my primary keys to remain unique in my SQL Server DB cache than to have a full snapshot of the data. Is there something I can do to force this constraint on the export in SSIS/SQL Server without crashing the process?
EDIT
Let me further clarify my request. What I need is to assure that the data in my exported SQL Server tables maintains the same keys that are maintained in the AS400 data tables. In other words, creating a unique Row Count identifier wouldn't work, nor would inserting all of the data without a primary key.
If a bug in the AS400 software allows for mistaken, duplicate PKs, I want to either ignore those rows or, preferably, just select one of the rows with the duplicate key but not both of them.
This selection should probably happen in the SELECT statement in my SSIS project, which connects to the mainframe through an ODBC connection.
I suspect that there may not be a "simple" solution to my problem. I'm hoping, however, that I'm wrong.
Since you are using SSIS, you must be using OLE DB Source to fetch the data from AS400 and you will be using OLE DB Destination to insert data into SQL Server.
Let's assume that you don't have any transformations.
Add a Sort transformation after the OLE DB Source. In the Sort transformation, there is a check box option at the bottom to remove duplicate rows based on a given set of column values. Check all the fields but don't select the Primary Key that comes from AS400. This will eliminate the duplicate rows but will insert the data that you still need.
I hope that is what you are looking for.
In SQL Server 2005 and above:
SELECT *
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY almost_unique_field ORDER BY id) rn
    FROM import_table
) q
WHERE rn = 1
There are several options.
If you use IGNORE_DUP_KEY (http://www.sqlservernation.com/home/creating-indexes-with-ignore_dup_key.html) option on your primary key, SQL will issue a warning and only the duplicate records will fail.
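For example (the table and column names are illustrative):

ALTER TABLE dbo.ImportTable
ADD CONSTRAINT PK_ImportTable PRIMARY KEY CLUSTERED (Id)
WITH (IGNORE_DUP_KEY = ON);   -- duplicate key rows are skipped with a warning instead of failing the batch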
You can also group/roll-up your data but this can get very expensive. What I mean by that is:
SELECT Id, MAX(value1), MAX(value2), MAX(value3), etc. FROM staging_table GROUP BY Id
Another option is to add an identity column (and cluster on this for an efficient join later) to your staging table and then create a mapping in a temp table. The mapping table would be:
CREATE TABLE #mapping
(
    PKID INT PRIMARY KEY CLUSTERED,
    RowID INT
)

INSERT INTO #mapping
SELECT PKID, MIN(RowID) FROM staging_table
GROUP BY PKID

INSERT INTO presentation_table
SELECT S.*
FROM Staging_table S
INNER JOIN #mapping M
    ON S.RowID = M.RowID
If I understand you correctly, you have duplicated PKs that have different data in the other fields.
First, put the data from the other database into a staging table. I find it easier to research issues with imports (especially large ones) if I do this. Actually I use two staging tables (and for this case I strongly recommend it), one with the raw data and one with only the data I intend to import into my system.
Now you can use an Execute SQL task to grab one of the records for each key (see @Quassnoi's answer for an idea of how to do that; you may need to adjust his query for your situation). Personally, I put an identity into my staging table so I can identify which is the first or last occurrence of duplicated data. Then put the record you chose for each key into your second staging table. If you are using an exception table, copy the records you are not moving to it and don't forget a reason code for the exception ("Duplicated key", for instance).
Now that you have only one record per key in a staging table, your next task is to decide what to do about the other data that is not unique. If there are two different business addresses for the same customer, which do you choose? This is a matter of business rule definition, not strictly speaking SSIS or SQL code. You must define the business rules for how you choose the data when the data needs to be merged between two records (what you are doing is the equivalent of a de-duping process). If you are lucky, there is a date field or another way to determine which is the newest or oldest data, and that is the data they want you to use. In that case, once you have selected just one record, you are done with the initial transform.
More than likely, though, you may need different rules for each field to choose the correct one. In this case you write SSIS transforms in a data flow or Execute SQL tasks to pick the correct data and update the staging table.
Once you have the exact records you want to import, do the data flow to move them to the correct production tables.

SQL Trigger: On update of primary key, how to determine which "deleted" record corresponds to which "inserted" record?

Assume that I know that updating a primary key is bad.
There are other questions which imply that the inserted and deleted table records match by position (the first of one matches the first of the other). Is this a fact or a coincidence?
Is there anything that could join the two tables together when the primary key changes on an update?
There is no positional match between rows in the inserted and deleted virtual tables.
And no, you can't match the rows.
Some options:
there is another unique unchanging (for that update) key to link rows
limit to single row actions.
use a stored procedure with the OUTPUT clause to capture before and after keys (sketched after this list)
INSTEAD OF trigger with OUTPUT clause (TBH not sure if you can do this)
disallow primary key updates (added after comment)
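A sketch of the OUTPUT-clause option (the table, column, and procedure names are illustrative):

CREATE PROCEDURE dbo.ChangeKey
    @OldID INT,
    @NewID INT
AS
BEGIN
    DECLARE @KeyChanges TABLE (OldID INT NOT NULL, NewID INT NOT NULL);

    UPDATE dbo.SomeTable
    SET ID = @NewID
    OUTPUT deleted.ID, inserted.ID INTO @KeyChanges (OldID, NewID)
    WHERE ID = @OldID;

    -- @KeyChanges now pairs each old key value with its new value
    SELECT OldID, NewID FROM @KeyChanges;
END;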
Each table is allowed to have one identity column. Identity columns are not updateable; they are assigned a value when the records are inserted (or when the column is added), and they can never change. If the primary key is updateable, it must not be an identity column. So, either the table has another column which is an identity column, or you can add one to it. There is no rule that says the identity column has to be the primary key. Then in the trigger, rows in inserted and deleted that have the same identity value are the same row, and you can support updating the primary key on multiple rows at a time.
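A sketch of that approach (the table and column names are illustrative; RowID is the identity column and AccountCode is the updatable primary key):

CREATE TRIGGER trg_Accounts_KeyChange
ON dbo.Accounts
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    SELECT d.AccountCode AS OldKey,
           i.AccountCode AS NewKey
    FROM inserted AS i
    INNER JOIN deleted AS d
        ON d.RowID = i.RowID;   -- the identity column never changes, so it pairs the rows
END;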
Yes -- create an "old_primary_key" field in the table you're updating, and populate it first.
Nothing you can do to match up the inserted and deleted pseudo table record keys -- even if you store their data in a log table somewhere.
I guess alternatively, you could create a separate log table that tracked changes to primary keys (old and new). This might be more useful than adding a field to the table you're updating as I suggested right at first, as it would allow you to track more than one change for a given record. Just depends on your situation, I guess.
But that said -- before you do anything, please go find a chalk board and write this 100 times:
I know that updating a primary key is bad.
I know that updating a primary key is bad.
I know that updating a primary key is bad.
I know that updating a primary key is bad.
I know that updating a primary key is bad.
...
:-) (just kidding)