Performance issue when using ON DELETE CASCADE in SQL Server

I have a parent table with just 10 rows, but its child table has 100K records: for each ID in the parent table there are 10K records in the child table.
When I fire a DELETE against the parent table it also deletes the matching records from the child table, but it takes around 5 minutes to delete all 10K records.
So my question is: what is the best practice for deleting records from the child table when the tables have a cascading foreign key?
10K records is just an example; for some IDs we have millions of records to delete.

Assuming SQL Server could make use of an index while deleting, placing an index on the foreign key column in the child table might speed up the deletion. For example:
parent (id, col1, col2, ...)
child (id, parent_id, ...)
CREATE INDEX IX_child_parent_id ON child (parent_id);
Such an index might let SQL Server quickly look up the child records for a given parent record.
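To make the setup concrete, here is a minimal sketch of the cascading foreign key the question describes, together with the delete that triggers the cascade; the table, column, and constraint names are illustrative:

CREATE TABLE parent (
    id   INT PRIMARY KEY,
    col1 INT
);

CREATE TABLE child (
    id        INT PRIMARY KEY,
    parent_id INT NOT NULL,
    CONSTRAINT FK_child_parent FOREIGN KEY (parent_id)
        REFERENCES parent (id) ON DELETE CASCADE
);

CREATE INDEX IX_child_parent_id ON child (parent_id);

-- deleting one parent row cascades to all of its child rows;
-- with the index above, SQL Server can seek on parent_id instead of scanning child
DELETE FROM parent WHERE id = 1;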

This is too long for a comment.
It takes time to delete thousands of rows in a table. The deletion has to:
Find the rows to delete.
Change the data on the data pages.
Log the changes.
Modify indexes.
Execute triggers (if any).
Delete related rows in other tables (if any).
This can be quite expensive.
Given the low number of parent ids, I think you can speed this up by partitioning the child table by the parent id. A clustered index on the parent id might also help (see the sketch below) -- but that could affect insert performance.
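A minimal sketch of the clustered-index idea, assuming the child table currently has a clustered primary key that must first be recreated as nonclustered; all object names are illustrative:

-- move the primary key off the clustered slot (hypothetical constraint name)
ALTER TABLE child DROP CONSTRAINT PK_child;
ALTER TABLE child ADD CONSTRAINT PK_child PRIMARY KEY NONCLUSTERED (id);

-- cluster the child rows by parent_id so a cascaded delete touches contiguous pages
CREATE CLUSTERED INDEX CIX_child_parent_id ON child (parent_id);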

Related

Postgres transaction table with Billion rows and multiple JSON columns

So we have a new project where we need to use Postgres 14 to scale a transaction table that gets heavily updated. The master table has about 60 million rows over a six-month period and a child table has about 600 million rows. The data retention period is six months, after which we have to drop the oldest monthly partition.
I'd like opinions from Postgres experts on whether this design is right and whether anything has been overlooked:
Parent/Master table
ID
JSON 1 ---> a couple of hundred characters
JSON 2 ---> 50 characters
The table has about 20 columns. Updates are always based on the primary key.
Child Table
Parent_IDFK (parent key from the parent/master table)
Occurance_id (every parent has 10 rows in the child table, numbered 1, 2, 3, 4, 5, ...). These are the occurrences.
Occurance JSON. Each child row linked to a parent has a specific JSON; let's call it the occurrence JSON. So child 1 has occurrence 1 JSON, child 2 has occurrence 2 JSON.
Over the course of a day, a row first gets inserted into the master, then about 10 rows get inserted into the child. After the child records are inserted, we have to update the parent
with the aggregate of the occurrences. The aggregate update on the parent table will look something like this:
UPDATE PARENT SET AGGREGATE_JSON = (sum of the occurrences in the child table for that parent key) WHERE ID = <>;
There will also be updates to the child table based on the primary key and occurrence id.
Other than that, there will be heavy reads. Here is my design:
1) Primary key on the master ID. There may be no need to partition a sixty-million-row table. Because searches are based on dates, I will have another index on the start date.
2) Child table. The primary key is (master ID, occurrence ID, start date). The table is partitioned by start date.
3) Compute aggregates on a daily basis as much as possible and read from those aggregates, so full table scans are avoided.
4) When we update the child table, we always specify the partition key. Something like this:
UPDATE CHILD SET <> WHERE PARENT_IDFK = <> AND OCCURANCE_ID = <> AND START_DATE (the partition key) = <>;
That way full table scans are avoided.
5) All inserts/updates will go through stored procedures, keeping the Python/Flask middleware as free of SQL as possible.
Any other points you would add, or is this good enough?
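To make the proposed layout concrete, here is a minimal sketch of the two tables as described in points 1 and 2. The column types, exact names, and the monthly partition bounds are illustrative assumptions, not taken from the post:

CREATE TABLE parent (
    id             bigint PRIMARY KEY,
    start_date     date NOT NULL,
    json1          jsonb,
    json2          jsonb,
    aggregate_json jsonb
    -- ... roughly 20 columns in total
);
CREATE INDEX idx_parent_start_date ON parent (start_date);

-- child table, range-partitioned by start_date; the partition key must be part of the primary key
CREATE TABLE child (
    parent_idfk    bigint NOT NULL,   -- logically references parent (id)
    occurance_id   int NOT NULL,
    start_date     date NOT NULL,
    occurance_json jsonb,
    PRIMARY KEY (parent_idfk, occurance_id, start_date)
) PARTITION BY RANGE (start_date);

-- one partition per month; the oldest partition can simply be dropped after six months
CREATE TABLE child_2024_01 PARTITION OF child
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');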

Incremental load for Updates into Warehouse

I am planning an incremental load into a warehouse (especially for updates of source tables in an RDBMS).
I am capturing the updated rows from the RDBMS in staging tables based on the update datetime. But how do I determine which column of a particular row needs to be updated in the target warehouse tables?
Or do I just delete a particular row in the warehouse table (based on the primary key of the row in the staging table) and insert the new updated row?
What is the best way to implement the incremental load between the RDBMS and the warehouse using PL/SQL and SQL?
In my opinion, the easiest way to accomplish this is as follows:
Create a stage table identical to your host table. When you do your incremental/net-change load, load all changed records into this table (based on whatever your "last updated" field is).
Delete the records from your actual table based on the primary key. For example, if your primary key is (customer, part), the query might look like this:
delete from main_table m
where exists (
    select null
    from stage_table s
    where m.customer = s.customer
      and m.part = s.part
);
Insert the records from the stage table into the main table, as sketched below.
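Assuming the stage table really is column-for-column identical to the main table (as in step 1), that insert can be as simple as:

insert into main_table
select * from stage_table;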
You could also update existing records / insert new records, but either way that's two steps. The advantage of the method I listed is that it will work even if your tables have partitions and the newly updated data violates one of the original partition rules, whereas an update would not. Also, the syntax is much simpler: an update would have to list every single field, whereas the delete from / insert into lets you list only the primary key fields.
Oracle also has a MERGE statement that will update a row if it exists or insert it if it does not. I honestly don't know how that is affected by partitions.
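A minimal hedged sketch of what that MERGE could look like for the (customer, part) example; the non-key column val is an assumed name:

merge into main_table m
using stage_table s
on (m.customer = s.customer and m.part = s.part)
when matched then
    update set m.val = s.val
when not matched then
    insert (customer, part, val)
    values (s.customer, s.part, s.val);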
One major caveat: if your changes include deletes -- records that need to be removed from the main table -- none of these approaches will handle that, and you will need some other mechanism for it. It may not be necessary, depending on your circumstances, but it's something to consider.

Inserting Data into Tables and Integrity

This is my first post, so please excuse me for any obvious or simple questions as I am very new to programming and all my projects are a first to me.
I am currently working on my first database project: a relational database using Oracle SQL. I'm new to my course, so I am not sure of all the concepts yet, but I'm working at it.
I have used some modelling software to help me construct a 13-table database. I have set up all my columns and assigned primary and foreign keys to all 13 tables. What I am looking to do now is insert 10 rows of test data into each table. I have done the parent tables but am confused about the child tables. When I assign ID numbers to all the parent tables' primary keys, will the child tables' foreign keys be populated at the same time?
I have not used sequences yet as I'm not 100% sure how to make them work; instead I've entered my own values like 100, 101, 102, etc. I know those values need to appear in the foreign key columns, but wouldn't manually inserting them into many tables get confusing?
Is there an easier approach, or am I overcomplicating the process?
I will need to use some queries later but I just want to be happy that the data is sound.
Thanks for your help
Rob
No, the child table data won't be populated automatically -- if there is a child table, that implies a 0-or-1-to-many relationship between the two. One row in the parent table may have 0 rows in the child table or it may have dozens, so nothing could possibly be populated automatically.
If you are manually assigning primary key values, you'd need to hard-code those same values as the foreign key values when you insert data into the child tables. In the real world, you wouldn't manually insert data into many tables at once; you'd have an application that did so, and that knew what keys to use either from parameters passed in or by reading the currval of the sequence used to populate the primary key after inserting into the parent table.
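A small sketch of that sequence/currval pattern in Oracle; the sequence, table, and column names here are illustrative, not from the question:

-- hypothetical sequence used to generate the parent table's primary key
CREATE SEQUENCE parent_seq START WITH 100;

INSERT INTO parent_table (parent_id, name)
VALUES (parent_seq.NEXTVAL, 'First parent');

-- CURRVAL returns the value just generated in this session,
-- so the child row points at the parent row we just inserted
INSERT INTO child_table (child_id, parent_id, description)
VALUES (1, parent_seq.CURRVAL, 'First child of the first parent');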
It's necessary that the data for the foreign key is present in the parent table, but not the other way around.
If you want to create test data, I suggest you use something like the query below.
insert into child_table (fk_column, column1, column2, ...)
select pk_column, '#dummy_value1#', '#dummy_value2#', ...
from parent_table;
If you have 10 rows in the parent, this will add 10 rows to the child.
If you want more rows, e.g. 10 per parent value (100 in total), you need to duplicate the parent data. For that, use the query below.
insert into child_table (fk_column, column1, column2, ...)
select pk_column, '#dummy_value1#', '#dummy_value2#', ...
from parent_table
cross join (select level from dual connect by level <= 10);
This will add 100 child rows for the 10 parent rows (10 per parent).

Fire a SQL Server trigger once for parent or child updates

Our database has a parent table and multiple child tables for one-to-many relationships. The parent table has an UPDATE/INSERT trigger so that when data is modified, a 'count' of the fields that contain data in the current row is updated.
I would also like an INSERT, DELETE, or UPDATE on related records in the child tables to trigger an update of this count. So I put a trigger on each child table that updates the count on the parent table. But when, in a single transaction, rows are inserted into or deleted from multiple child tables, the count procedure fires multiple times. Is there a way to avoid this? I am looking for a single trigger, fired only once for changes across multiple tables, that still has access to the inserted/deleted primary key IDs for the union of all modified tables.
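For reference, a hedged sketch of the kind of per-child-table trigger described above; the table, column, and function names are illustrative assumptions, and it is this trigger, repeated on every child table, that ends up firing once per table in a multi-table transaction:

CREATE TRIGGER trg_child1_recount ON child1
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- recompute the stored count for every parent touched by this statement
    UPDATE p
    SET data_field_count = dbo.fn_count_fields(p.parent_id)  -- hypothetical recount function
    FROM parent p
    WHERE p.parent_id IN (SELECT parent_id FROM inserted
                          UNION
                          SELECT parent_id FROM deleted);
END;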

Massive delete (2 joins required) with no partitioning

My delete seems to run forever. I have tried many ways to deal with this problem, but it still takes too much time. I will try to be as clear as possible.
There are 4 tables involved in this problem.
The deletion is driven by the given pool_id.
Table 1 contains the pool_id.
Table 2 contains the ticket_id; its ticket_pool_id joins to Table 1's pool_id.
Table 3 contains the ticket_child_id; its ticket_id joins to Table 2's ticket_id.
Table 4 contains the ticket_grand_child_id; its ticket_child_id joins to Table 3's ticket_child_id.
Row counts concerned for each table:
table 1 ----> 1
table 2 ----> 1,200,000
table 3 ----> 6,300,000
table 4 ----> 6,300,000
So in fact it's 6.3M + 6.3M + 1.2M + 1 rows to be deleted.
Here are the constraints:
No partitioning
Oracle version 9
Online all the time, so no downtime and no CTAS
We cannot use cascading constraints
The normalization is very important
Here's what I tried:
Bulk delete
Delete statements with IN and EXISTS clauses
A temp table for each level and a one-level join
A procedure committing every 20K rows
None of those finished in a decent time frame, i.e. less than one hour. The fact that we cannot base the delete on one of the column values is not helping. Is there a way?
If you try to delete by joining all the tables, the complexity may become cubic or even worse. With tables holding this many records, that becomes a performance killer. You can instead write the list of values to delete from the first table into a temporary table, then use another temporary table to select the IDs to delete from the second table, and so on (see the sketch below). I suppose that having proper indexes will keep the complexity quadratic and let the task complete in a normal amount of time. Good luck.
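A hedged sketch of that level-by-level approach in Oracle, using global temporary tables. The column names come from the question's description, but the table names (table1 ... table4 standing in for Tables 1-4) and the DDL are illustrative assumptions:

-- temp tables holding the ids to delete at each level
CREATE GLOBAL TEMPORARY TABLE tmp_ticket_ids (ticket_id NUMBER) ON COMMIT PRESERVE ROWS;
CREATE GLOBAL TEMPORARY TABLE tmp_child_ids (ticket_child_id NUMBER) ON COMMIT PRESERVE ROWS;

-- level 2: tickets belonging to the pool
INSERT INTO tmp_ticket_ids
    SELECT ticket_id FROM table2 WHERE ticket_pool_id = :pool_id;

-- level 3: children of those tickets
INSERT INTO tmp_child_ids
    SELECT ticket_child_id FROM table3
    WHERE ticket_id IN (SELECT ticket_id FROM tmp_ticket_ids);

-- delete bottom-up so no foreign key is violated
DELETE FROM table4 WHERE ticket_child_id IN (SELECT ticket_child_id FROM tmp_child_ids);
DELETE FROM table3 WHERE ticket_child_id IN (SELECT ticket_child_id FROM tmp_child_ids);
DELETE FROM table2 WHERE ticket_id IN (SELECT ticket_id FROM tmp_ticket_ids);
DELETE FROM table1 WHERE pool_id = :pool_id;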