Deleting from one table and updating another - sql

I have two tables with following columns:
SUMMARY(sum_id, sum_number) and DETAILS(det_id, det_number, sum_id)
I want to delete rows from table DETAILS whose det_id is in a list of IDs, which can be done with:
DELETE FROM details WHERE det_id in (1,2,3...)
BUT
At the same time I need to update table SUMMARY for the rows where summary.sum_id=details.sum_id:
UPDATE summary SET sum_number = sum_number - somefunction(details.det_number)
WHERE summary.sum_id = details.sum_id
Moreover, afterwards it would be great to delete rows from the SUMMARY table if sum_number <= 0.
How to do all this in an intelligent way?
What if I know, from the very beginning, both IDs: details.det_id (to delete) AND the summary.sum_id values that correspond to those det_ids?

You did not specify a DBMS so I'm assuming PostgreSQL.
You can do this with a single statement using the writable CTE feature (data-modifying statements in WITH, available in PostgreSQL 9.1 and later):
with deleted as (
  delete from details
  where det_id in (1,2,3...)
  returning details.*
),
new_summary as (
  update summary
  set sum_number = sum_number - some_function(deleted.det_number)
  from deleted
  where deleted.sum_id = summary.sum_id
  returning summary.sum_id
)
delete from summary
where sum_number <= 0
and sum_id in (select sum_id from new_summary);
The IN condition in the outer delete is not strictly necessary (data-modifying CTEs are executed even if the primary query never references them), but it ensures that the result of the new_summary CTE is actually used. Additionally it can improve performance a bit, because only the changed summary rows are checked (not all of them).

Without writable CTEs it is not possible to perform all of these operations in a single statement. You would have to do something like this:
UPDATE summary
SET sum_number = sum_number - somefunction(details.det_number)
FROM summary INNER JOIN details ON summary.sum_id = details.sum_id
WHERE details.det_id IN (1,2,3,...);

DELETE FROM details WHERE det_id IN (1,2,3,...);

DELETE FROM summary WHERE sum_number <= 0;

I would use a trigger... then the database is responsible for the deletes.
Using an update trigger, once/if the UPDATE is successful it will fire the trigger, which can do as much or as little as you need, i.e. it can do your 2 deletes.
For an example, have a read of this tutorial:
http://www.mysqltutorial.org/create-the-first-trigger-in-mysql.aspx
This answer (http://stackoverflow.com/questions/6296313/mysql-trigger-after-update-only-if-row-has-changed) from Stack Overflow also provides a good example.
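To make the idea concrete, here is a minimal sketch in MySQL syntax that approaches it from the delete side instead: a trigger on DETAILS that keeps SUMMARY in step (the trigger name is made up, and somefunction is the question's placeholder):
DELIMITER //
CREATE TRIGGER details_after_delete
AFTER DELETE ON details
FOR EACH ROW
BEGIN
    -- subtract the deleted detail's contribution from its summary row
    UPDATE summary
    SET sum_number = sum_number - somefunction(OLD.det_number)
    WHERE sum_id = OLD.sum_id;

    -- drop the summary row once nothing is left
    DELETE FROM summary
    WHERE sum_id = OLD.sum_id
      AND sum_number <= 0;
END//
DELIMITER ;
With this in place, the single DELETE FROM details WHERE det_id IN (1,2,3...) from the question takes care of everything else.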

Related

Postgres: select for update using CTE

I always had a query in the form of:
UPDATE users
SET col = 1
WHERE user_id IN (
    SELECT user_id
    FROM users
    WHERE ...
    LIMIT 1
    FOR UPDATE
);
And I was pretty sure that it generates a lock on the affected row until the update is done.
Now I wrote the same query using CTE and doing
WITH query AS (
    SELECT user_id
    FROM users
    WHERE ...
    LIMIT 1
    FOR UPDATE
)
UPDATE users
SET col = 1
WHERE user_id IN (
    SELECT user_id
    FROM query
);
I'm actually having some doubts that it is applying a row lock, because of the results I get, but I couldn't find anything documented about this.
Can someone clarify this? Thanks.
Edit:
I managed to find this:
If specific tables are named in FOR UPDATE or FOR SHARE, then only rows coming from those tables are locked; any other tables used in the SELECT are simply read as usual. A FOR UPDATE or FOR SHARE clause without a table list affects all tables used in the statement. If FOR UPDATE or FOR SHARE is applied to a view or sub-query, it affects all tables used in the view or sub-query. However, FOR UPDATE/FOR SHARE do not apply to WITH queries referenced by the primary query. If you want row locking to occur within a WITH query, specify FOR UPDATE or FOR SHARE within the WITH query.
https://www.postgresql.org/docs/9.0/sql-select.html#SQL-FOR-UPDATE-SHARE
So I guess the rows are locked only when the FOR UPDATE is written inside the WITH query itself, not in the outer query that references it?

Alternatives to UPDATE statement Oracle 11g

I'm currently using Oracle 11g and let's say I have a table with the following columns (more or less)
Table1
ID varchar(64)
Status int(1)
Transaction_date date
tons of other columns
And this table has about 1 billion rows. I want to update the status column with a specific WHERE clause, let's say
where transaction_date = somedatehere
What other alternatives can I use rather than just the normal UPDATE statement?
Currently what I'm trying is to use CTAS or INSERT INTO ... SELECT to copy the rows I want to update into another table, selecting the new value with an alias (e.g. 3 AS STATUS) so the values are already updated in the new/temporary table, which looks something like this:
INSERT INTO TABLE1_TEMPORARY (
    ID,
    STATUS,
    TRANSACTION_DATE,
    TONS_OF_OTHER_COLUMNS)
SELECT
    ID,
    3 AS STATUS,
    TRANSACTION_DATE,
    TONS_OF_OTHER_COLUMNS
FROM TABLE1
WHERE TRANSACTION_DATE = SOMEDATE
So far everything seems to work faster than the normal update statement. The problem now is that I also need the remaining data from the original table, the rows I do not need to update but which must still be included in my updated table/list.
What I tried at first was to use DELETE on the original table with the same WHERE clause, so that in theory everything left on that table would be the data I do not need to update, leaving me with two tables:
TABLE1 --which now contains the rows that I did not need to update
TABLE1_TEMPORARY --which contains the data I updated
But the DELETE statement itself is also too slow, about as slow as the original UPDATE statement, so dropping it brings me to this point:
TABLE1 --which contains BOTH the data that I want to update and the data I do not
TABLE1_TEMPORARY --which contains the data I updated
What other alternatives can I use to get the data that is the opposite of my WHERE clause? (Note that the WHERE clause in this example has been simplified, so I'm not looking for an answer involving NOT EXISTS/NOT IN/NOT EQUALS; those clauses are also slower than positive ones.)
I have ruled out deletion by partition since the data I need to update and not update can exist in different partitions, as well as TRUNCATE since I'm not updating all of the data, just part of it.
Is there some kind of JOIN statement I use with my TABLE1 and TABLE1_TEMPORARY in order to filter out the data that does not need to be updated?
I would also like to achieve this with as little REDO/UNDO/LOGGING as possible.
Thanks in advance.
I'm assuming this is not a one-time operation, but you are trying to design for a repeatable procedure.
Partition/subpartition the table in such a way that the touched rows are not spread over all partitions but confined to a few.
Ensure your transactions wouldn't use these partitions for now.
For each partition/subpartition you would normally UPDATE, perform a CTAS of all its rows (even the rows which stay the same go into TABLE1_TEMPORARY). Then EXCHANGE PARTITION and rebuild the index partitions; see the sketch below.
At the end rebuild global indexes.
If you don't have Oracle Enterprise Edition, you would need either to CTAS the entire billion rows (followed by ALTER TABLE RENAME instead of ALTER TABLE EXCHANGE PARTITION) or to prepare some kind of "poor man's partitioning" using a view (SELECT UNION ALL SELECT UNION ALL SELECT etc.) and a bunch of tables.
There is some chance that this mess would actually be faster than UPDATE.
I'm not saying that this is elegant or optimal, I'm saying that this is the canonical way of speeding up large UPDATE operations in Oracle.
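A rough sketch of the per-partition step (the partition name P1, the literal date and NOLOGGING are illustrative assumptions; the real CTAS must list the tons of other columns too so the table shapes match):
CREATE TABLE table1_temporary NOLOGGING AS
SELECT id,
       CASE WHEN transaction_date = DATE '2015-01-01' THEN 3 ELSE status END AS status,
       transaction_date
       -- ... tons of other columns ...
FROM   table1 PARTITION (p1);

ALTER TABLE table1 EXCHANGE PARTITION p1 WITH TABLE table1_temporary;
-- then rebuild that partition's local indexes and, once all partitions are done, the global indexes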
How about keeping the UPDATE on the same table, but breaking it into multiple small chunks?
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 0000000 and 0999999
COMMIT
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 1000000 and 1999999
COMMIT
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 2000000 and 2999999
COMMIT
This could help if the total workload is potentially manageable, but doing it all in one chunk is the problem. This approach breaks it into modest-sized pieces.
Doing it this way could, for example, enable other apps to keep running and give other workloads a look-in; and it would avoid needing a single humongous transaction in the logfile.
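If the IDs are spread evenly enough, the same chunks could be driven from a small PL/SQL loop rather than hand-written statements (a sketch; it assumes the IDs are effectively numeric, as the ranges above already do, and uses a placeholder date):
DECLARE
  l_date DATE := DATE '2015-01-01';  -- stands in for somedatehere
BEGIN
  FOR i IN 0 .. 9 LOOP
    UPDATE table1
    SET    status = 3
    WHERE  transaction_date = l_date
    AND    id BETWEEN i * 1000000 AND (i + 1) * 1000000 - 1;
    COMMIT;  -- one modest transaction per chunk
  END LOOP;
END;
/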

Updating Table Records in a Batch and Auditing it

Consider this Table:
Table: ORDER
Columns: id, order_num, order_date, order_status
This table has 1 million records. I want to update order_status to the value '5' for a bunch (about 10,000) of order_nums that I will be reading from an input text file.
My SQL could be:
(A) update ORDER set order_status=5 where order_num in ('34343', '34454', '454545',...)
OR
(B) update ORDER set order_status=5 where order_num='34343'
I can loop over this update several times until I have covered my 10,000 order updates.
(Also note that I have a few child tables of ORDER, like ORDER_ITEMS, where a similar status must be updated and the information audited.)
My problem here is:
How can I audit this update in a separate ORDER_AUDIT table:
Order_Num: 34343 - Updated Successfully
Order_Num: 34454 - Order Not Found
Order_Num: 454545 - Updated Successfully
Order_Num: 45457 - Order Not Found
If I go for the batch update as in (A), I cannot audit at order level.
If I go for the single-order-at-a-time update as in (B), I can audit at order level, but I will have to loop 10,000 times, which may be quite slow.
Is there any other way?
First of all, build an external table over your "input text file". That way you can run a simple single UPDATE statement:
update ORDER
set order_status=5
where order_num in ( select col1 from ext_table )
Neat and efficient. (Oracle does not accept an ORDER BY inside an IN sub-query, so the list is left unsorted; the key point is that we can treat external tables like regular tables and use the full panoply of the SELECT syntax on them.) Find out more.
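For reference, a minimal sketch of such an external table (the directory object, delimiter and file name are assumptions):
CREATE TABLE ext_table (
    col1 NUMBER
)
ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY data_dir    -- a directory object pointing at the file's location
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ','
    )
    LOCATION ('order_nums.txt')
)
REJECT LIMIT UNLIMITED;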
Secondly, use the RETURNING clause to capture the hits.
update ORDER
set order_status=5
where order_num in ( select col1 from ext_table )
returning order_num bulk collect into l_nums;
l_nums in this context is a PL/SQL collection of NUMBER. The RETURNING clause will give you the ORDER_NUM values of the updated rows only. Find out more.
If you declare the type for l_nums as a SQL nested table object you can use it in further SQL statements for your auditing:
insert into order_audit
select 'Order_Num: '||to_char(t.column_value)||' - Updated Successfully'
from table ( l_nums ) t
/
insert into order_audit
select 'Order_Num: '||to_char(col1)||' - Order Not Found'
from ( select col1 from ext_table
       minus
       select column_value from table ( l_nums ) )
/
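Pulling the pieces together, the whole flow might look something like this sketch (num_tab is a made-up type name; note that ORDER is actually a reserved word in Oracle, so the table is written as orders here):
create type num_tab as table of number;
/
declare
    l_nums num_tab;
begin
    update orders   -- the question's ORDER table
    set order_status = 5
    where order_num in ( select col1 from ext_table )
    returning order_num bulk collect into l_nums;

    -- audit the hits
    insert into order_audit
    select 'Order_Num: '||to_char(t.column_value)||' - Updated Successfully'
    from table ( l_nums ) t;

    -- audit the misses
    insert into order_audit
    select 'Order_Num: '||to_char(col1)||' - Order Not Found'
    from ( select col1 from ext_table
           minus
           select column_value from table ( l_nums ) );

    commit;
end;
/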
Notes on performance:
You don't say how many of the rows in the input text file will match. Perhaps you don't know (on re-reading, it's actually not clear whether 10,000 is the number of rows in the file or the number of matching rows). PL/SQL collections use private session memory, so very large collections can blow the PGA. However, you should be able to cope with ten thousand NUMBER instances without flinching.
My solution does require you to read the external table twice. This shouldn't be a problem. And it will certainly be way faster than dynamically assembling one hundred IN clauses of a thousand numbers and looping over each.
Note that update is often the slowest bulk operation known to man. There are ways of speeding them up, but those methods can get quite involved. However, if this is something you'll want to do often and performance becomes a sticking point you should read this OraFAQ article.
Use MERGE. First, load the data into a temporary table called ORDER_UPD_TMP with only one column, id. You can do it using the SQL Developer import feature. Then use MERGE to update your base table:
MERGE INTO ORDER b
USING (
    SELECT id
    FROM ORDER_UPD_TMP
) e
ON (b.id = e.id)
WHEN MATCHED THEN
    UPDATE SET b.order_status = 5
You can also insert records that don't match (WHEN NOT MATCHED THEN INSERT). Check the documentation for more details:
http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_9016.htm
I think the best way will be:
to import your file into the database first,
then run a few SQL UPDATE/INSERT statements in one transaction to update the status of all orders and create the audit records, as in the sketch below.
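A sketch of those statements, assuming the file was loaded into a staging table order_upd_tmp(order_num) and writing the question's ORDER table as orders (ORDER is a reserved word):
update orders
set order_status = 5
where order_num in ( select order_num from order_upd_tmp );

insert into order_audit
select 'Order_Num: '||t.order_num||' - '||
       case when o.order_num is null then 'Order Not Found'
            else 'Updated Successfully' end
from order_upd_tmp t
left join orders o on o.order_num = t.order_num;

commit;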

Slow join on Inserted/Deleted trigger tables

We have a trigger that creates audit records for a table and joins the inserted and deleted tables to see if any columns have changed. The join has been working well for small sets, but now I'm updating about 1 million rows and it doesn't finish in days. I tried updating a select number of rows with different orders of magnitude and it's obvious this is exponential, which would make sense if the inserted/deleted tables are being scanned to do the join.
I tried creating an index but get the error:
Cannot find the object "inserted" because it does not exist or you do not have permissions.
Is there any way to make this any faster?
Inserting into temporary tables indexed on the joining columns could well improve things, as inserted and deleted are not indexed.
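A sketch of that staging step inside the trigger (assuming the table's key column is id):
-- copy the virtual tables once, then index the copies
SELECT * INTO #ins FROM inserted;
SELECT * INTO #del FROM deleted;

CREATE UNIQUE CLUSTERED INDEX IX_ins ON #ins (id);
CREATE UNIQUE CLUSTERED INDEX IX_del ON #del (id);

-- now run the existing column-comparison join against #ins and #del instead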
You can check @@ROWCOUNT inside the trigger so you only perform this logic above some threshold number of rows, though on SQL Server 2008 this might overstate the number somewhat if the trigger was fired as the result of a MERGE statement (it will return the total number of rows affected by all MERGE actions, not just the one relevant to that specific trigger).
In that case you can just do something like SELECT @NumRows = COUNT(*) FROM (SELECT TOP 10 * FROM inserted) T to see if the threshold is met.
Addition
One other possibility you could experiment with is simply bypassing the trigger for these large updates. You could use SET CONTEXT_INFO to set a flag and check the value of this inside the trigger. You could then use OUTPUT inserted.*, deleted.* to get the "before" and "after" values for a row without needing to JOIN at all.
DECLARE @TriggerFlag varbinary(128)
SET @TriggerFlag = CAST('Disabled' AS varbinary(128))
SET CONTEXT_INFO @TriggerFlag

UPDATE YourTable
SET Bar = 'X'
OUTPUT inserted.*, deleted.* INTO #T

/*Reset the flag*/
SET CONTEXT_INFO 0x
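The matching check at the top of the trigger body might look like this (a sketch; binary equality comparisons ignore trailing zero-bytes, but it is worth verifying that padding behaviour on your version):
-- bail out of the trigger when the bypass flag is set for this session
IF CONTEXT_INFO() = CAST('Disabled' AS varbinary(128))
    RETURN;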

Creating a default query for a column in a table (SQL)?

I have a column in one of my tables which is supposed to hold the total sum of values from the rows of a number of other tables. Is there a way I can have a default query which runs for the total sum column, so that every time a row is added to the other table an update is made to the total sum column?
Thanks
You might want to look at using a view instead of a table for this; something like the following might help.
Select table.something, Sum(otherTable.column) As TotalSum
From table
Inner Join otherTable On table.something = otherTable.something
Group By table.something
Sounds like you want to add a trigger.
http://dev.mysql.com/doc/refman/5.0/en/triggers.html
You want to update the total sum column every time one of the columns in the other tables is changed? Then a trigger may serve your purposes.
Create Trigger UpdateSumColumn On OtherTable
For Insert, Update, Delete
As
Update s Set
    SumColumn =
        (Select Sum(Column)
         From OtherTable
         Where Something = s.Something)
From SumTable s
Where s.Something In
    (Select Distinct Something From inserted
     Union
     Select Distinct Something From deleted)
or, you can separate the code for a delete from the code for an insert or update by writing separate triggers, or by:
Create Trigger UpdateSumColumnSplit On OtherTable
For Insert, Update, Delete
As
If Exists(Select * From inserted) And Update(Column)
    Update s Set
        SumColumn =
            (Select Sum(Column)
             From OtherTable
             Where Something = s.Something)
    From SumTable s
    Where s.Something In
        (Select Distinct Something
         From inserted)
Else If Exists(Select * From deleted)
    Update s Set
        SumColumn =
            (Select Sum(Column)
             From OtherTable
             Where Something = s.Something)
    From SumTable s
    Where s.Something In
        (Select Distinct Something
         From deleted)
As Charles said, a trigger works well in this situation. If the summed rows in the other tables change frequently, however, a trigger may cause performance issues. There are two other approaches:
Views - A view is essentially a saved query, and you query it just like a table. If the sum data is only needed for reporting-type work, you may be better off removing the sum column from your main table and using the view for reporting.
Stored Procedure - If you prefer to keep the column in the main table, you could run a stored procedure on a regular basis that keeps the sum information up-to-date for all rows.
I would compare the performance of the view idea against the trigger idea before deciding which to use. Do this against the full data set you expect the view to cover, not just a small test set. Make sure to index the view if possible.