I have a use case where multiple rows in table A are aggregated down to a single row in table B. We represent the origin of rows in table B with a foreign key column in table A, each row effectively saying "I contributed to row X in table B".
We want to find the best solution so that once every row from table A that contributed to a given row in table B has been deleted, that row in table B is deleted as an orphan.
I'm not sure if there's some way to use ON DELETE CASCADE to handle this, but I'm guessing not, and that triggers are probably the best option.
I can't just purge all orphans on a schedule because the changes need to be persisted very soon after occurring.
Using the given schema, what's our best option? Alternatively, is there some other schema that better sets us up for the scenario I described?
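For concreteness, here is a minimal sketch of the kind of schema being described; the table and column names are hypothetical, not part of the original question:

create table b (
    id bigint primary key
    -- aggregated values ...
);

create table a (
    id bigint primary key,
    b_id bigint references b(id)  -- "as a row, I contributed to this row in B"
);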
Related
I'm new to SQL and I have two tables, one for fridges and one for food. Only 5 items of food can be stored in one fridge, so I was wondering if there is a way to limit the food table to only have 5 entries with the same fridge_id?
There is no straightforward way to enforce such a constraint.
The best I can think of is the following (a sketch follows the list):
have a (redundant) column food_count on the fridges table
define an AFTER INSERT OR UPDATE OR DELETE trigger on the food table that updates food_count whenever something changes
add a check constraint on the fridges table that limits food_count to 5
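A minimal sketch of those three steps, assuming PostgreSQL 11+ and that food has a fridge_id column referencing fridges(id); the function and trigger names are illustrative:

alter table fridges
    add column food_count integer not null default 0
    check (food_count between 0 and 5);

create function maintain_food_count() returns trigger as $$
begin
    -- count the row against its new fridge on insert, or on a move
    if tg_op in ('INSERT', 'UPDATE') then
        update fridges set food_count = food_count + 1 where id = new.fridge_id;
    end if;
    -- un-count the row from its old fridge on delete, or on a move
    if tg_op in ('DELETE', 'UPDATE') then
        update fridges set food_count = food_count - 1 where id = old.fridge_id;
    end if;
    return null;  -- the return value is ignored for AFTER triggers
end;
$$ language plpgsql;

create trigger food_count_trigger
    after insert or update of fridge_id or delete on food
    for each row execute function maintain_food_count();

With this in place, an insert that would push a fridge past five items fails with a check-constraint violation.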
I'm using postgresql. I have 3 tables.
Table A has an ID column that's a Primary Key
Table B and Table C have ID columns that are foreign key references to A's ID.
In a single process, I would like to lock any rows that have a particular ID and then possibly delete rows and insert rows with that ID in B and C.
My current approach is (sketched below):
SELECT FOR UPDATE on A on the ID.
Then I try to delete and insert rows in B and C.
commit/end
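In SQL, that sequence looks roughly like this; this is only a sketch of the approach described above, with placeholder table/column names and a placeholder ID of 42:

begin;
-- lock the parent row; concurrent transactions on the same ID block here
select * from a where id = 42 for update;
delete from b where id = 42;
insert into b (id, data) values (42, 'new value');
delete from c where id = 42;
insert into c (id, data) values (42, 'new value');
commit;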
Unfortunately, my code deadlocks trying to do the insert.
What am I doing wrong? What is the proper way to prevent other processes from adding, removing, or updating rows with a given ID in B and C (until I am done with my transaction)?
Thanks in advance!
It looks like I was doing things correctly from the start. My issue was that I was accidentally creating two different database connections in my code. So, from PostgreSQL's perspective, there were two different transactions - hence the deadlocking.
There are two tables A and B. As they have a many-to-many relation, there's also table C.
A
------
id PK
B
------
id PK
C
------
id_A PK
id_B PK
Now, a row in B only exists when at least one row of A has a relation to it, and one B may have a relation to two or more different rows of A.
My question is, how do I automatically delete a row from B if there isn't any foreign key reference to it in C? My initial thought was to set up a trigger, but I'm not too sure about this and I'd like a second opinion on how to proceed. Thank you.
First, one assumes that the data is initially set up correctly. That is, the only b records are the ones that meet your condition.
Then, the solution involves triggers on table c. When a row is deleted, it would check:
Does id_b have any other rows in the table?
If not, then delete the row.
This can actually be a bit tricky. In general, you don't want to query the table being triggered. So, I might suggest an alternative approach:
Add a counter on b.
Add insert/update/delete triggers on c that increment or decrement the count in b.
If the counter is 0 (or 1 before decrementing), then delete the row. A sketch follows this list.
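A minimal PostgreSQL (11+) sketch of this counter approach; the column name ab_counter and the function/trigger names are illustrative:

alter table b add column ab_counter integer not null default 0;

create function maintain_ab_counter() returns trigger as $$
begin
    if tg_op in ('INSERT', 'UPDATE') then
        update b set ab_counter = ab_counter + 1 where id = new.id_b;
    end if;
    if tg_op in ('DELETE', 'UPDATE') then
        update b set ab_counter = ab_counter - 1 where id = old.id_b;
        -- if that was the last reference, remove the orphaned b row
        delete from b where id = old.id_b and ab_counter = 0;
    end if;
    return null;
end;
$$ language plpgsql;

create trigger ab_counter_trigger
    after insert or update of id_b or delete on c
    for each row execute function maintain_ab_counter();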
Gosh, you might find that the counter itself is sufficient, and there is no need to actually delete the row. You can get that effect if you use a view:
create view v_b as
select b.*
from b
where ab_counter > 0;
You could also create a view on b and not have to deal with triggers at all:
create view v_b as
select b.*
from b
where exists (select 1 from c where c.id_b = b.id);
Gordon's solution above is great; however, a slight modification might help.
First, one assumes that the data is initially set up correctly. That is, the only b records are the ones that meet your condition.
Then, the solution involves triggers on table c. When a row is deleted, it would check:
Does id_b have any other rows in the table?
If not, then delete the row.
This is a bit tricky because you have to check whether other rows exist. This check can be automated by using
FOREIGN KEY(id) REFERENCES B(id) ON DELETE RESTRICT
on table C. Now you only need to delete the row from B in the trigger, without any explicit checks: the RESTRICT constraint automatically blocks the delete while any row in table C still references the row in table B; otherwise the delete succeeds.
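In PostgreSQL, a delete blocked by RESTRICT raises an error rather than silently doing nothing, so a sketch of this trigger would trap that error; the function and trigger names here are illustrative, not from the answer above:

create function delete_orphan_b() returns trigger as $$
begin
    begin
        delete from b where id = old.id_b;
    exception when foreign_key_violation then
        null;  -- the b row is still referenced from c, so keep it
    end;
    return null;
end;
$$ language plpgsql;

create trigger delete_orphan_b_trigger
    after delete on c
    for each row execute function delete_orphan_b();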
I am planning an incremental load into a warehouse (especially for updates of source tables in the RDBMS).
I am capturing the updated rows in staging tables from the RDBMS based on the update datetime. But how do I determine which columns of a particular row need to be updated in the target warehouse tables?
Or do I just delete a particular row in the warehouse table (based on the primary key of the row in staging table) and insert the new updated row?
What is the best way to implement the incremental load between the RDBMS and the warehouse using PL/SQL and SQL coding?
In my opinion, the easiest way to accomplish this is as follows:
Create a stage table identical to your host table. When you do your incremental/net-change load, load all changed records into this table (based on whatever your "last updated" field is).
Delete the records from your actual table based on the primary key. For example, if your primary key is customer, part, the query might look like this:
delete from main_table m
where exists (
    select null
    from stage_table s
    where m.customer = s.customer
      and m.part = s.part
);
Insert the records from the stage table into the main table, as shown below.
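With the same illustrative table names, the insert step can be as simple as:

insert into main_table
select *
from stage_table;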
You could also do an update existing records / insert new records, but either way that's two steps. The advantage of the method I listed is that it will work even if your tables have partitions and the newly updated data violates one of the original partition rules, whereas an update would not accomplish that. Also, the syntax is much simpler as your update would have to list every single field, whereas the delete from / insert into allows you list only the primary key fields.
Oracle also has a merge clause that will update if it exists or insert if it does not. I honestly don't know how that would be impacted if you had partitions.
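As a hedged sketch of what that might look like here, reusing the illustrative customer/part key and a made-up qty column:

merge into main_table m
using stage_table s
on (m.customer = s.customer and m.part = s.part)
when matched then update set
    m.qty = s.qty
when not matched then insert
    (customer, part, qty)
    values (s.customer, s.part, s.qty);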
One major caveat: if your updates include deletes -- records that need to be deleted from the main table -- none of these approaches will resolve that, and you will need some other way to handle it. It may not be necessary, depending on your circumstances, but it's something to consider.
I have a table people with less than 100,000 records and I have taken a backup of this table using the following:
create table people_backup as select * from people
I add some new records to my people table over time, but eventually I want to merge the records from my backup table into people. Unfortunately I cannot simply DROP my table as my new records will be lost!
So I want to update the records in my people table using the records from people_backup, based on their primary key id and I have found 2 ways to do this:
MERGE the tables together
use some sort of fancy correlated update
Great! However, both of these methods use SET and make me specify which columns I want to update. Unfortunately, I am lazy: the structure of people may change over time, and while my CTAS statement doesn't need to be updated, my update/merge script will need changes, which feels like unnecessary work.
Is there a way to merge entire rows without having to specify columns? I see here that not specifying columns during an INSERT will direct SQL to insert values in order; can the same methodology be applied here, and is it safe?
NB: The structure of the table will not change between backups
Given that your table is small, you could simply
DELETE FROM people t
WHERE EXISTS ( SELECT 1
               FROM people_backup b
               WHERE t.id = b.id );

INSERT INTO people
SELECT *
FROM people_backup;
That is slow and not particularly elegant (particularly if most of the data from the backup hasn't changed) but assuming the columns in the two tables match, it does allow you to not list out the columns. Personally, I'd much prefer writing out the column names (presumably those don't change all that often) so that I could do an update.