PostgreSQL import, if constraint not met set null

I'm trying to import a database from FileMaker Pro into PostgreSQL. Because it was not maintained well, the links between its tables have degraded over time.
I've tried importing the data with no constraints and then adding the constraint with a USING block that sets the value to NULL if the referenced row doesn't exist.
I have two tables, people and show. I want to set every show_leader_id in the show table that refers to a nonexistent person to NULL. Here's what I have:
BEGIN;
ALTER TABLE show ADD FOREIGN KEY (show_leader_id) REFERENCES people
USING (CASE WHEN (SELECT COUNT(*) FROM people WHERE person_id=show_leader_id)=1 THEN show_leader_id ELSE NULL END);
COMMIT;

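ALTER TABLE ... ADD FOREIGN KEY has no USING clause (USING belongs to ALTER COLUMN ... TYPE), so clean up the data first. Find the orphans with a NOT EXISTS anti-join and set them to NULL: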
UPDATE show s
SET show_leader_id = NULL
WHERE s.show_leader_id IS NOT NULL
AND NOT EXISTS (SELECT 1 FROM people p WHERE p.person_id = s.show_leader_id);
Then add the FK constraint, without USING:
ALTER TABLE show ADD FOREIGN KEY (show_leader_id) REFERENCES people;
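A minimal sketch of the cleanup, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption; the rows are made up). Foreign-key enforcement is switched off to mimic an import "with no constraints". SQLite cannot add a foreign key to an existing table, so instead of the final ALTER TABLE the sketch runs PRAGMA foreign_key_check to confirm that no orphans remain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Enforcement off: mimics importing the data without constraints.
conn.execute("PRAGMA foreign_keys = OFF")
conn.executescript("""
    CREATE TABLE people (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE show (
        show_id        INTEGER PRIMARY KEY,
        show_leader_id INTEGER REFERENCES people (person_id)
    );
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob');
    -- show 11 points at person 99, who does not exist (a broken link)
    INSERT INTO show VALUES (10, 1), (11, 99), (12, NULL), (13, 2);
""")

# NULL out every leader id that has no matching person.
conn.execute("""
    UPDATE show
    SET show_leader_id = NULL
    WHERE show_leader_id IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM people p
                      WHERE p.person_id = show.show_leader_id)
""")
conn.commit()

rows = conn.execute(
    "SELECT show_id, show_leader_id FROM show ORDER BY show_id").fetchall()
# Lists rows that violate declared foreign keys; empty means the data is clean.
violations = conn.execute("PRAGMA foreign_key_check").fetchall()
print(rows)        # [(10, 1), (11, None), (12, None), (13, 2)]
print(violations)  # []
```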
If you have concurrent write operations, run both statements in the same transaction, as @Eelke advises. (But that's probably not the case in your situation.)
NOT IN can be treacherous if there are NULL values in people.person_id. Since you are dealing with a mess, this is not unlikely. Details:
Find records where join doesn't exist
Select rows which are not present in other table
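To see why NOT IN is treacherous, here is a small sketch (again SQLite via Python's sqlite3, with made-up rows) where people.person_id contains a NULL. "99 NOT IN (1, 2, NULL)" evaluates to UNKNOWN rather than TRUE, so NOT IN silently misses the orphan, while NOT EXISTS still finds it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- person_id deliberately allows NULL, as a messy import might
    CREATE TABLE people (person_id INTEGER);
    CREATE TABLE show (show_id INTEGER, show_leader_id INTEGER);
    INSERT INTO people VALUES (1), (2), (NULL);
    INSERT INTO show VALUES (10, 1), (11, 99);
""")

# NOT IN: the NULL in the subquery turns "no match" into UNKNOWN, so nothing is returned.
not_in = conn.execute("""
    SELECT show_id FROM show
    WHERE show_leader_id NOT IN (SELECT person_id FROM people)
""").fetchall()

# NOT EXISTS only asks whether a matching row exists; NULLs in people don't matter.
not_exists = conn.execute("""
    SELECT show_id FROM show s
    WHERE NOT EXISTS (SELECT 1 FROM people p WHERE p.person_id = s.show_leader_id)
""").fetchall()

print(not_in)      # []
print(not_exists)  # [(11,)]
```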

Delete multiple rows of data from multiple tables in SQL

I need help writing a SQL query: when I delete a main category, it should delete all the data of its subcategories and all the products related to those subcategories. Basically, deleting data from three tables.
Can it be done in one query?
I can't give you an exact code sample without knowing your database and schema. But if all the tables share the same key, you can try a multi-table DELETE with INNER JOIN (this syntax is MySQL's):
DELETE table1, table2, table3
FROM table1
INNER JOIN table2 ON table2.key = table1.key
INNER JOIN table3 ON table3.key = table1.key
WHERE table1.key = value;
The key value should be common across all tables, something like an "ID". Note that with INNER JOIN, rows are deleted only where a match exists in all three tables.
Instead of solving it with a query, you could use foreign keys with the ON DELETE CASCADE option enabled,
if your database type/version supports it.
E.g. ANSI SQL:2003, MySQL, Oracle, PostgreSQL, MS SQL Server, SQLite.
That way, when you delete from the main table, the records in the other tables that reference the deleted records are automatically deleted as well.
For example in MS Sql Server:
ALTER TABLE SubCategory
ADD CONSTRAINT FK_SubCategory_MainCategoryID_Cascade
FOREIGN KEY (MainCategoryID)
REFERENCES MainCategory(ID) ON DELETE CASCADE;
ALTER TABLE Products
ADD CONSTRAINT FK_Products_SubCategoryID_Cascade
FOREIGN KEY (SubCategoryID)
REFERENCES SubCategory(ID) ON DELETE CASCADE;
It's a way to automatically maintain the referential integrity between them.
After that, a delete from the MainCategory table will delete the related records from the SubCategory table, and also the Products that are related to the deleted SubCategory records.
And here is a db<>fiddle example for SQLite to demonstrate.
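A minimal SQLite sketch of the same cascade, run through Python's sqlite3 module (the category and product rows are made up). One detail specific to SQLite: foreign keys, including their ON DELETE actions, are only enforced after PRAGMA foreign_keys = ON on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys (and ON DELETE actions) when this is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE MainCategory (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE SubCategory (
        ID INTEGER PRIMARY KEY,
        MainCategoryID INTEGER REFERENCES MainCategory (ID) ON DELETE CASCADE
    );
    CREATE TABLE Products (
        ID INTEGER PRIMARY KEY,
        SubCategoryID INTEGER REFERENCES SubCategory (ID) ON DELETE CASCADE
    );
    INSERT INTO MainCategory VALUES (1, 'Books'), (2, 'Music');
    INSERT INTO SubCategory VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO Products VALUES (100, 10), (101, 11), (200, 20);
""")

# One DELETE on the parent; the cascades remove its subcategories and their products.
conn.execute("DELETE FROM MainCategory WHERE ID = 1")
conn.commit()

subs = conn.execute("SELECT ID FROM SubCategory ORDER BY ID").fetchall()
products = conn.execute("SELECT ID FROM Products ORDER BY ID").fetchall()
print(subs)      # [(20,)]
print(products)  # [(200,)]
```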
P.S. Personally, I would prefer ON DELETE SET NULL or ON DELETE SET DEFAULT (with a suitable column default), or even triggers. Sure, that would require an extra cleanup of the unreferenced records afterwards, e.g. via a scheduled script. But it feels less detrimental, because it's easier to fix when someone accidentally deletes a MainCategory that shouldn't have been deleted.
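A sketch of the ON DELETE SET NULL alternative plus the later cleanup, again in SQLite via Python's sqlite3 (data made up; the "scheduled script" is reduced to a single DELETE). Between the two steps, the detached subcategories are still there and can be reattached if the delete was a mistake.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE MainCategory (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE SubCategory (
        ID INTEGER PRIMARY KEY,
        MainCategoryID INTEGER REFERENCES MainCategory (ID) ON DELETE SET NULL
    );
    INSERT INTO MainCategory VALUES (1, 'Books'), (2, 'Music');
    INSERT INTO SubCategory VALUES (10, 1), (11, 1), (20, 2);
""")

conn.execute("DELETE FROM MainCategory WHERE ID = 1")
conn.commit()
# The subcategories survive with a NULL parent; a mistaken delete is still repairable.
detached = conn.execute(
    "SELECT ID FROM SubCategory WHERE MainCategoryID IS NULL ORDER BY ID").fetchall()

# Later, e.g. in a scheduled cleanup job: remove what is still unreferenced.
conn.execute("DELETE FROM SubCategory WHERE MainCategoryID IS NULL")
conn.commit()
remaining = conn.execute("SELECT ID FROM SubCategory ORDER BY ID").fetchall()
print(detached)   # [(10,), (11,)]
print(remaining)  # [(20,)]
```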

Insert Records with Violations in SQL Server

I want to insert 5000 records in the format below into a particular table.
Insert into #Table
(c1,c2,c3,c4,c5)
Values
(1,2,3,4,5),
(2,2,3,4,5),
(3,2,3,4,5),
(4,2,3,4,5),
(5,2,3,4,5)
....
....
Up to 1000 rows
When I try to execute it, I get a foreign key violation. I know the reason: one of the values does not exist in its corresponding parent table.
A few records cause this violation. It's very hard to find them among the 1000 rows, so I want to insert at least the valid records into my target table and leave the violating rows as they are for now.
I am not sure how to do this. Please suggest any ideas.
If this is a one-time thing, you can do the following:
1. Drop the FK constraint:
ALTER TABLE MyTable
DROP CONSTRAINT FK_Constraint
GO
2. Execute the INSERT.
3. Find the records with no matching parent ID:
SELECT * FROM MyTable MT WHERE NOT EXISTS (SELECT 1 FROM ParentTable PT WHERE MT.ParentId = PT.ID)
4. DELETE those records or do something else with them.
5. Recreate the FK constraint.
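The steps above can be sketched in SQLite via Python's sqlite3, with one substitution stated plainly: SQLite cannot drop a constraint from an existing table, so the sketch switches enforcement off with PRAGMA foreign_keys instead of dropping FK_Constraint, and re-enables it at the end. The rows are made up; ParentId 7 plays the missing parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ParentTable (ID INTEGER PRIMARY KEY);
    CREATE TABLE MyTable (
        c1 INTEGER PRIMARY KEY,
        ParentId INTEGER REFERENCES ParentTable (ID)
    );
    INSERT INTO ParentTable VALUES (2), (3);
""")

# Step 1 (SQLite analog): switch enforcement off instead of dropping the constraint.
conn.execute("PRAGMA foreign_keys = OFF")
# Step 2: the bulk INSERT succeeds even though ParentId 7 has no parent.
conn.executemany("INSERT INTO MyTable VALUES (?, ?)", [(1, 2), (2, 7), (3, 3)])
conn.commit()
# Step 3: the NOT EXISTS query finds the orphans.
orphans = conn.execute("""
    SELECT * FROM MyTable MT
    WHERE NOT EXISTS (SELECT 1 FROM ParentTable PT WHERE MT.ParentId = PT.ID)
""").fetchall()
# Step 4: delete them (or copy them elsewhere first).
conn.execute("""
    DELETE FROM MyTable
    WHERE NOT EXISTS (SELECT 1 FROM ParentTable PT WHERE MyTable.ParentId = PT.ID)
""")
conn.commit()  # the PRAGMA below is ignored inside an open transaction
# Step 5 (SQLite analog): re-enable and confirm nothing violates the FK any more.
conn.execute("PRAGMA foreign_keys = ON")
remaining_violations = conn.execute("PRAGMA foreign_key_check").fetchall()
print(orphans)               # [(2, 7)]
print(remaining_violations)  # []
```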
Disable the foreign key or fix your data.
Finding the bad data is simple - you can always temporarily insert it into a buffer table and run queries to find which data is missing in the related table.
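A sketch of the buffer-table idea in SQLite via Python's sqlite3 (the table name Buffer and all rows are hypothetical; in SQL Server the buffer would typically be a #temp table). The constraint on MyTable stays enforced the whole time: only rows whose parent exists are copied over, and the rejects stay in the buffer for later inspection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # the real constraint stays enforced
conn.executescript("""
    CREATE TABLE ParentTable (ID INTEGER PRIMARY KEY);
    CREATE TABLE MyTable (
        c1 INTEGER PRIMARY KEY,
        ParentId INTEGER NOT NULL REFERENCES ParentTable (ID)
    );
    -- hypothetical buffer table: same columns, no foreign key
    CREATE TEMP TABLE Buffer (c1 INTEGER, ParentId INTEGER);
    INSERT INTO ParentTable VALUES (2), (3);
""")

conn.executemany("INSERT INTO Buffer VALUES (?, ?)",
                 [(1, 2), (2, 7), (3, 3), (4, 9)])

# Copy only the rows whose parent exists, so the FK is never violated.
conn.execute("""
    INSERT INTO MyTable (c1, ParentId)
    SELECT b.c1, b.ParentId FROM Buffer b
    WHERE EXISTS (SELECT 1 FROM ParentTable p WHERE p.ID = b.ParentId)
""")
conn.commit()

# The rows left behind are exactly the ones to investigate later.
rejected = conn.execute("""
    SELECT c1, ParentId FROM Buffer b
    WHERE NOT EXISTS (SELECT 1 FROM ParentTable p WHERE p.ID = b.ParentId)
    ORDER BY c1
""").fetchall()
loaded = conn.execute("SELECT c1 FROM MyTable ORDER BY c1").fetchall()
print(loaded)    # [(1,), (3,)]
print(rejected)  # [(2, 7), (4, 9)]
```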

Moving large amounts of data instead of updating it

I have a large table (about 40M rows) in which a number of columns contain 0 where they should be NULL, so that we can key the data better.
I've written scripts that chop the update into chunks of 10,000 records, find the occurrences of 0 in those columns, and update them to NULL.
Example:
update FooTable
set order_id = case when order_id = 0 then null else order_id end,
person_id = case when person_id = 0 then null else person_id end
WHERE person_id = 0
OR order_id = 0
This works great, but it takes forever.
I think a better way might be to create a second table, insert the data into it, and then rename it to replace the old table with the zeros.
Question: can I do an INSERT INTO table2 SELECT ... FROM table1 and cleanse the data in the process, before it goes in?
Depending on the DB server you are using, you can usually create a new, sanitised table.
The hard part is that if there are other tables in the database, you may have issues with foreign keys that refer to the original table, and you will have to recreate its indexes, constraints, etc. on the new one.
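Yes, the cleansing can happen inside the SELECT. A sketch in SQLite via Python's sqlite3 (the table FooTable_clean and the rows are hypothetical): NULLIF(x, 0) returns NULL when x = 0 and x otherwise, which is equivalent to the CASE expressions in the question. The rename syntax differs per server (SQL Server uses sp_rename, for example), so treat the swap step as SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FooTable (id INTEGER PRIMARY KEY, order_id INTEGER, person_id INTEGER);
    INSERT INTO FooTable VALUES (1, 0, 5), (2, 7, 0), (3, 8, 9), (4, 0, 0);

    -- New table with the same shape; recreate indexes and constraints here as needed.
    CREATE TABLE FooTable_clean (id INTEGER PRIMARY KEY, order_id INTEGER, person_id INTEGER);

    -- The cleanse happens in the SELECT: zeros become NULL on the way in.
    INSERT INTO FooTable_clean (id, order_id, person_id)
    SELECT id, NULLIF(order_id, 0), NULLIF(person_id, 0) FROM FooTable;

    -- Swap the tables.
    ALTER TABLE FooTable RENAME TO FooTable_old;
    ALTER TABLE FooTable_clean RENAME TO FooTable;
    DROP TABLE FooTable_old;
""")

rows = conn.execute(
    "SELECT id, order_id, person_id FROM FooTable ORDER BY id").fetchall()
print(rows)  # [(1, None, 5), (2, 7, None), (3, 8, 9), (4, None, None)]
```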
Whether making a new sanitised table will be quicker than updating your existing table is something you can only tell by trying it.
Dump the PK/clustered key of all the records you want to update into a temp table, then perform the update joining to the temp table. That will ensure the lowest locking level and quickest access. You can also add an identity column to the temp table; then you can loop through it and do the updates in batches.
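A sketch of the batched variant in SQLite via Python's sqlite3 (the temp table ToFix, the tiny BATCH_SIZE and the rows are hypothetical). The INTEGER PRIMARY KEY column seq plays the role of the identity column, and because SQLite only gained UPDATE ... FROM in 3.33, the join to the temp table is written as an IN subquery here.

```python
import sqlite3

BATCH_SIZE = 2  # tiny for the demo; the question uses 10,000

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FooTable (id INTEGER PRIMARY KEY, order_id INTEGER, person_id INTEGER);
    INSERT INTO FooTable VALUES (1, 0, 5), (2, 7, 0), (3, 8, 9), (4, 0, 0), (5, 0, 1);

    -- Keys of the rows to fix, numbered 1..n by seq (the "identity column").
    CREATE TEMP TABLE ToFix (seq INTEGER PRIMARY KEY, id INTEGER NOT NULL);
    INSERT INTO ToFix (id)
    SELECT id FROM FooTable WHERE order_id = 0 OR person_id = 0 ORDER BY id;
""")

(max_seq,) = conn.execute("SELECT COALESCE(MAX(seq), 0) FROM ToFix").fetchone()
batches = 0
for start in range(1, max_seq + 1, BATCH_SIZE):
    # Each batch touches only rows whose key falls in this slice of ToFix.
    conn.execute("""
        UPDATE FooTable
        SET order_id  = NULLIF(order_id, 0),
            person_id = NULLIF(person_id, 0)
        WHERE id IN (SELECT id FROM ToFix WHERE seq BETWEEN ? AND ?)
    """, (start, start + BATCH_SIZE - 1))
    conn.commit()  # one short transaction per batch
    batches += 1

rows = conn.execute(
    "SELECT id, order_id, person_id FROM FooTable ORDER BY id").fetchall()
print(batches)  # 2
print(rows)     # [(1, None, 5), (2, 7, None), (3, 8, 9), (4, None, None), (5, None, 1)]
```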

Writing data constraints into tables

I want to add something to a table (a trigger?) so that exactly one row per ID has a specific value in a specific column. If a statement is run that makes this no longer true, an exception should be thrown and the statement rolled back.
Let's take this schema.
ID  Current  Value
1   Y        0
1   N        0
1   N        2
2   Y        2
And the constraint I want is that for each ID, exactly one row has a current of 'Y'.
Therefore, each of these statements should fail with an appropriate error:
insert into table values (1,'Y',1);
insert into table values (3,'N',2);
update table set current = 'N' where ID = 1;
I have two questions:
Is it a good idea to code this kind of constraint logic into your table, or is that best saved for the applications that manipulate the data? Why?
How can it be done? What kind of tool does Oracle provide to create a constraint like this?
It's best if you can specify it declaratively rather than procedurally (e.g. with triggers), especially because triggers without some kind of locking scheme will NOT work anyway: concurrent sessions can insert or update the table at the same time.
In this instance, the simplest solution is a unique, function-based index, e.g.:
CREATE UNIQUE INDEX only_one_current ON thetable
(CASE WHEN Current = 'Y' THEN ID END);
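The expression is NULL when Current = 'N', and Oracle does not store all-NULL keys in a (B-tree) index, so the uniqueness constraint applies only to rows where Current = 'Y'. Note that this enforces at most one current row per ID, not exactly one: the UPDATE that sets every row of ID 1 to 'N' would still succeed.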
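The same index expression can be tried out in SQLite through Python's sqlite3, which supports indexes on expressions and, like Oracle here, lets NULL keys coexist in a UNIQUE index. This is a sketch with the question's sample rows; "Current" is double-quoted as an identifier to stay clear of keyword clashes, and try_exec is a hypothetical helper. It also shows the "at most one" limitation mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thetable (ID INTEGER, "Current" TEXT, Value INTEGER);
    INSERT INTO thetable VALUES (1, 'Y', 0), (1, 'N', 0), (1, 'N', 2), (2, 'Y', 2);
    -- Same expression as the Oracle index; NULL keys never collide.
    CREATE UNIQUE INDEX only_one_current ON thetable
        (CASE WHEN "Current" = 'Y' THEN ID END);
""")

def try_exec(sql):
    """Run one statement; True if it succeeded, False on a constraint error."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

second_y = try_exec("INSERT INTO thetable VALUES (1, 'Y', 1)")  # rejected
extra_n = try_exec("INSERT INTO thetable VALUES (1, 'N', 5)")   # more 'N' rows are fine
no_y = try_exec('UPDATE thetable SET "Current" = \'N\' WHERE ID = 1')  # allowed: at most one, not exactly one
print(second_y, extra_n, no_y)  # False True True
```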
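I think what you are looking for is just a unique constraint.
You can add it with the statement below, so that only unique combinations of ID and Current can exist in the table.
ALTER TABLE table_name ADD CONSTRAINT constraint_name UNIQUE (ID, Current);
Be aware that this also forbids a second 'N' row for the same ID, which the sample data already contains for ID 1.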

How to configure reference to be deleted on parent table update?

I have two tables:
info: ID, fee_id
and
fee: ID, amount
and a reference between them (SQL Server 2008):
ALTER TABLE info WITH CHECK ADD CONSTRAINT FK_info_fee FOREIGN KEY(fee_id)
REFERENCES fee (ID)
ALTER TABLE info CHECK CONSTRAINT FK_info_fee
GO
How can I configure this reference so that a record in fee is deleted when info.fee_id becomes NULL?
EDIT: Or, alternatively, set info.fee_id to NULL when the corresponding record in fee is deleted.
Anyway, I can do it this way:
UPDATE info SET fee_id = NULL WHERE ..
DELETE FROM fee WHERE ..
but I'm sure that this can be done by the database itself.
You probably don't want to do this. What would you expect to happen if multiple info rows referenced the same fee row?
If you really want to do something like this, adding logic to an AFTER UPDATE, DELETE trigger on the info table would probably be the way to go. Check if any other info rows reference that same fee row, and if not, delete the fee row.
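A sketch of that logic as a SQLite trigger, run through Python's sqlite3 (the trigger name delete_orphan_fee and the rows are hypothetical). Unlike SQL Server's statement-level trigger with its inserted/deleted tables, SQLite triggers fire once per row and expose OLD/NEW. Only the UPDATE case is covered; a matching AFTER DELETE trigger would be needed for deletes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fee (ID INTEGER PRIMARY KEY, amount NUMERIC);
    CREATE TABLE info (ID INTEGER PRIMARY KEY, fee_id INTEGER REFERENCES fee (ID));
    INSERT INTO fee VALUES (1, 10), (2, 20);
    INSERT INTO info VALUES (100, 1), (101, 2), (102, 2);   -- fee 2 is shared

    -- After fee_id changes, drop the old fee only if no info row still uses it.
    CREATE TRIGGER delete_orphan_fee
    AFTER UPDATE OF fee_id ON info
    WHEN OLD.fee_id IS NOT NULL
    BEGIN
        DELETE FROM fee
        WHERE ID = OLD.fee_id
          AND NOT EXISTS (SELECT 1 FROM info WHERE fee_id = OLD.fee_id);
    END;
""")

conn.execute("UPDATE info SET fee_id = NULL WHERE ID IN (100, 101)")
conn.commit()
fees = conn.execute("SELECT ID FROM fee ORDER BY ID").fetchall()
print(fees)  # [(2,)]  fee 1 is gone; fee 2 is still used by info 102
```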
Some thoughts:
If you have a one-to-one reference, can the two tables be combined?
Drilling up from child to parent is odd: if it's 1:1, can you reverse the FK direction and simply use ON DELETE SET NULL?
Otherwise, you'll have to use a trigger but assuming 1:1 makes me uneasy...
... unless you have a unique constraint/index on info_fee.fee_id
Like so:
ALTER TABLE info WITH CHECK ADD
CONSTRAINT FK_fee_info_fee FOREIGN KEY (id) REFERENCES info_fee (fee_ID) ON DELETE SET NULL
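For the EDIT's alternative (set info.fee_id to NULL when its fee row is deleted), no trigger or reversed direction is needed: it is ON DELETE SET NULL on the existing FK_info_fee, which SQL Server also supports. A sketch in SQLite via Python's sqlite3, with made-up rows; SQLite needs PRAGMA foreign_keys = ON for the action to fire.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE fee (ID INTEGER PRIMARY KEY, amount NUMERIC);
    CREATE TABLE info (
        ID INTEGER PRIMARY KEY,
        fee_id INTEGER,
        CONSTRAINT FK_info_fee FOREIGN KEY (fee_id)
            REFERENCES fee (ID) ON DELETE SET NULL
    );
    INSERT INTO fee VALUES (1, 10), (2, 20);
    INSERT INTO info VALUES (100, 1), (101, 2);
""")

# Deleting the fee nulls out the referencing info rows automatically.
conn.execute("DELETE FROM fee WHERE ID = 1")
conn.commit()
rows = conn.execute("SELECT ID, fee_id FROM info ORDER BY ID").fetchall()
print(rows)  # [(100, None), (101, 2)]
```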
If you really intend to remove rows when fee_id is set to null, one way is an update trigger. In an update trigger, the deleted table contains the old version of the updated rows, and the inserted table contains the new version. By joining them, you can take action when a fee_id changes to null:
CREATE TRIGGER deleteFee
ON info
FOR UPDATE
AS
DELETE FROM Fee
WHERE Fee.id IN (
SELECT old.fee_id
FROM deleted old
JOIN inserted new ON old.id = new.id
WHERE old.fee_id = fee.id
AND new.fee_id is null
)
This is tricky when multiple info rows refer to the same fee: the fee will be removed if any one of those info rows is set to NULL. A full-sync trigger would avoid that:
CREATE TRIGGER deleteFee
ON info
FOR UPDATE
AS
DELETE FROM Fee
WHERE NOT EXISTS (
SELECT *
FROM Info
WHERE Fee.id = Info.fee_id
)
But this can have other unintended consequences, like deleting half the Fee table in response to an update. In this case, as in most cases, triggers add more complexity than they solve. Triggers are evil and should be avoided at almost any cost.