PL/SQL: Get number of records updated vs inserted when a MERGE statement is used

MERGE will always give you the total number of records merged via SQL%ROWCOUNT, regardless of how many records were inserted or updated.
But how do I find out the number of records that were actually inserted vs. the number that were actually updated?
I tried the options from this post, but they don't seem to work:
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_QUESTION_ID:122741200346595110
Any help?

You cannot, in general, tell how a row touched by a MERGE statement was affected, so there is no way to get separate counts for inserted, updated, and deleted rows.
If you really need separate figures, you could issue separate INSERT and UPDATE statements, though that is likely to be less efficient. There are non-general solutions that depend on particular query plans, but those are brittle and generally wouldn't be recommended.
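If it helps, here is a minimal PL/SQL sketch of that two-statement approach; target_t and source_t are hypothetical tables keyed on id (with one source row per id), and each statement's SQL%ROWCOUNT supplies a separate count:
DECLARE
  -- target_t / source_t are hypothetical tables with (id, val) columns
  l_updated  PLS_INTEGER;
  l_inserted PLS_INTEGER;
BEGIN
  -- update rows that already exist in the target
  UPDATE target_t t
     SET t.val = (SELECT s.val FROM source_t s WHERE s.id = t.id)
   WHERE EXISTS (SELECT 1 FROM source_t s WHERE s.id = t.id);
  l_updated := SQL%ROWCOUNT;

  -- insert rows that do not exist in the target yet
  INSERT INTO target_t (id, val)
  SELECT s.id, s.val
    FROM source_t s
   WHERE NOT EXISTS (SELECT 1 FROM target_t t WHERE t.id = s.id);
  l_inserted := SQL%ROWCOUNT;

  DBMS_OUTPUT.PUT_LINE('updated = '  || l_updated);
  DBMS_OUTPUT.PUT_LINE('inserted = ' || l_inserted);
END;
/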

Related

Update necessary fields only in VoltDB

I have a table with some 50 columns. Every time a row changes I don't know in advance which columns will change, and I don't want to deal with each and every permutation and combination when updating the table.
So currently I update all 50 columns, which I know takes much more time than I'd expect when dealing with a huge number of updates.
To address this I have one solution: create different sets of fields that are frequently updated together and design my application that way, which I know will require changes whenever a new field is added to my table.
UPDATE TBLX SET NAME = ? WHERE ID = ?;
Result of Explain Update...
UPDATE
INDEX SCAN of "TBLX" using "TBLX_ID"
scan matches for (U_ID = ?5), filter by (column#0 = ?6)
Another approach is to write a query with CASE WHEN ... THEN (as shown below). This way my code will still need updating, but not as much as in the first approach.
UPDATE TBLX SET NAME = CASE WHEN (? != '####') THEN ? ELSE NAME END WHERE ID = ?;
Result of Explain Update...
UPDATE
INDEX SCAN of "TBLX" using "TBLX_ID"
scan matches for (U_ID = ?3), filter by (column#0 = ?4)
So my question is about the internals of query execution.
How will each type of query be treated, and which one will run faster?
What I want to understand is whether the executor will skip the part of the query where I am not changing the column's value, i.e. where it assigns the same value back to the column.
The plans show that both queries are using a match on the TBLX_ID index, which is the fastest way to find the particular row or rows to be updated. If it is a single row, this should be quite fast.
The difference between these two queries is essentially what it is doing for the update work once it has found the row. While the plan doesn't show the steps it will take when updating one row, it should be fast either way. At that point, it's native C++ code updating a row in memory that it has exclusive access to. If I had to guess, the one using the CASE clause may take slightly longer, but it could be a negligible difference. You'd have to run a benchmark to measure the difference in execution times to be certain, but I would expect it to be fast in both cases.
What would be more significant than the difference between these two updates is how you handle updating multiple columns. The cost of finding the affected row may be higher than the logic of the actual updates to the columns. If you design it so that updating n columns means queueing n SQL statements, the engine has to execute n statements and use the same index to find the same row n times; all of that overhead is much more significant. If instead you have one complex UPDATE statement with many parameters, where you can pass in different values to update various columns or leave them at their current value, the engine only has to execute one statement and find the row once, so even though that seems complex, it would probably be faster. Faster still may be to simply update all of the columns to the new values, whether each is the same as the current value or not.
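For illustration, here is a sketch of that single-statement idea, extending the asker's CASE pattern to several columns; EMAIL and PHONE are hypothetical columns, and '####' is the asker's sentinel meaning "leave this column alone":
-- EMAIL and PHONE are hypothetical columns; '####' is the no-change sentinel
UPDATE TBLX SET
  NAME  = CASE WHEN ? != '####' THEN ? ELSE NAME  END,
  EMAIL = CASE WHEN ? != '####' THEN ? ELSE EMAIL END,
  PHONE = CASE WHEN ? != '####' THEN ? ELSE PHONE END
WHERE ID = ?;
-- pass '####' for both parameters of any column that should keep its current value
The engine still executes one statement and locates the row once, regardless of how many columns actually change.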
If you can test this and run a few hundred examples, the output of "exec @Statistics PROCEDUREDETAIL 0;" will show the average execution time for each of the SQL statements as well as for the entire procedure. That should provide the metrics necessary to find the optimal approach.

Db2 for i: How to select rows while deleting?

I've found this article that explains how to get the deleted record with the OLD TABLE keywords.
https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/apsg/src/tpc/db2z_selectvaluesdelete.html
However, this doesn't seem to work in Db2 for i (version 7.2).
Do you know any alternatives to get the same result?
Thanks
As you have discovered, this syntax is not valid for DB2 for i. But, I can think of a couple ways to do what you want.
You can use two statements: one to retrieve the records to be deleted into a temporary table, then one to perform the delete (just use the same WHERE clause for both). Unfortunately, there is a chance, however small, that you will delete more than you read: if additional records matching your WHERE clause are inserted between the time you select and the time you delete, your log will not be accurate.
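A minimal sketch of that two-statement option, assuming a hypothetical ORDERS table, a log table ORDERS_LOG with a compatible layout, and a condition on STATUS:
-- ORDERS and ORDERS_LOG are hypothetical tables
INSERT INTO ORDERS_LOG
  SELECT * FROM ORDERS WHERE STATUS = 'X';

DELETE FROM ORDERS WHERE STATUS = 'X';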
You can use a delete trigger to insert records into the log as they are deleted from the table. This might be the best way, as it will always log deletes, no matter how the records are deleted. But that cuts both ways: if you only want records logged within certain processes, you will need to build dependencies between your trigger and those processes, making both more complex.
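A sketch of such a delete trigger, reusing the hypothetical ORDERS and ORDERS_LOG tables (DELETED_AT is an assumed timestamp column on the log):
-- ORDERS / ORDERS_LOG are hypothetical; DELETED_AT is an assumed column
CREATE TRIGGER ORDERS_DELETE_LOG
  AFTER DELETE ON ORDERS
  REFERENCING OLD AS O
  FOR EACH ROW MODE DB2SQL
  INSERT INTO ORDERS_LOG (ID, STATUS, DELETED_AT)
    VALUES (O.ID, O.STATUS, CURRENT TIMESTAMP);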
You can use a stored procedure with a cursor and a positioned delete, as mentioned by Mark Bairinstein in the comments above. This lets you delete records with logging, and also avoids the issue with the first option. But it leaves users the opportunity to delete records in a way that is not logged, which may be good or bad depending on your requirements.
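A rough SQL PL sketch of that stored-procedure option, again against the hypothetical ORDERS and ORDERS_LOG tables:
-- ORDERS / ORDERS_LOG are hypothetical tables
CREATE PROCEDURE DELETE_ORDERS_WITH_LOG ()
LANGUAGE SQL
BEGIN
  DECLARE V_ID INT;
  DECLARE V_DONE INT DEFAULT 0;
  DECLARE C1 CURSOR FOR
    SELECT ID FROM ORDERS WHERE STATUS = 'X' FOR UPDATE;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET V_DONE = 1;

  OPEN C1;
  FETCH_LOOP: LOOP
    FETCH C1 INTO V_ID;
    IF V_DONE = 1 THEN
      LEAVE FETCH_LOOP;
    END IF;
    INSERT INTO ORDERS_LOG (ID) VALUES (V_ID); -- log first
    DELETE FROM ORDERS WHERE CURRENT OF C1;    -- then the positioned delete
  END LOOP;
  CLOSE C1;
END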

Find which table is causing duplicate rows in a view

I have a view in SQL Server which should return one row per project, but a few of the projects have multiple rows. The view has a lot of table joins, so I would rather not manually run a script on each table to find out which one is causing the duplicates. Is there a quick, automated way to find the problem table (i.e., the one contributing duplicate rows)?
The quickest way I've found is:
1. Find an example dupe.
2. Copy out the query.
3. Comment out all the joins.
4. Add the joins back one at a time until you get an extra row.
Whichever join you have just added back when the dupes appear is the one joining in multiple records.
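For illustration, a sketch of the last step against hypothetical Projects, Tasks, and Owners tables: keep the WHERE clause pinned to one known dupe and re-enable one join at a time:
-- Projects / Tasks / Owners are hypothetical tables
SELECT p.ProjectID
FROM Projects p
-- JOIN Tasks  t ON t.ProjectID = p.ProjectID  -- re-add these one at a time
-- JOIN Owners o ON o.ProjectID = p.ProjectID
WHERE p.ProjectID = 42; -- a project known to come back duplicated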
My technique is to make a copy of the view and modify it to return every column from every table in the order of the FROM clause, with an extra marker column in front of each table's columns, named after that table (see the example below). Then select a few rows and slowly scan to the right until you find the table whose row data is NOT duplicated; that is the one causing the dupes.
SELECT
TableA = '----------', TableA.*,
TableB = '----------', TableB.*
FROM ...
This is usually a very fast way to find the culprit. The problem with commenting out joins is that you then also have to comment out the matching columns in the SELECT clause each time.
I used a variation of SpectralGhost's technique to get this working, even though neither method really avoids manually checking each table for duplicate rows.
My variation was to use a divide-and-conquer method of commenting out the joins instead of commenting out each one individually. Given the sheer number of joins, this was much faster.

Logging the results of a MERGE statement

I have 2 tables: a temporary table with raw data, where rows may repeat (appear more than once), and the target table with actual data (every row is unique).
I'm transferring rows using a cursor; inside the cursor I use a MERGE statement. How can I print to the console, using DBMS_OUTPUT.PUT_LINE, which rows are updated and which are deleted?
According to the official documentation there is no such feature for this statement.
Is there any workaround?
I don't understand why you would want to do this. The output of dbms_output requires someone to be there to look at it, and it's pointless unless that someone reads all of it; if there are more than, say, 20 rows, no one will bother. If no one reads through all the output to verify it, but you actually need to log it, then you are actively harming yourself by doing it this way.
If you really need to log which rows are updated or deleted there are a couple of options; both involve performance hits though.
You could switch to BULK COLLECT, creating a cursor that carries the ROWID of the temporary table: BULK COLLECT a JOIN of your two tables into a collection, update or delete from the target table according to your business logic, then update the temporary table by ROWID with a flag of some kind to indicate the operation performed.
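A rough sketch of that idea, assuming hypothetical tables stg (temporary, with an op_flag column added) and tgt (target), each with id and val columns and at most one staging row per id:
DECLARE
  -- stg / tgt are hypothetical tables; op_flag is an assumed column on stg
  TYPE t_rowid_tab IS TABLE OF ROWID;
  TYPE t_id_tab    IS TABLE OF tgt.id%TYPE;
  TYPE t_val_tab   IS TABLE OF tgt.val%TYPE;
  l_rowids t_rowid_tab;
  l_ids    t_id_tab;
  l_vals   t_val_tab;
BEGIN
  -- staging rows that match the target are updates
  SELECT s.ROWID, s.id, s.val
    BULK COLLECT INTO l_rowids, l_ids, l_vals
    FROM stg s
    JOIN tgt t ON t.id = s.id;

  FORALL i IN 1 .. l_ids.COUNT
    UPDATE tgt SET val = l_vals(i) WHERE id = l_ids(i);

  -- flag the staging rows whose data was applied as an update
  FORALL i IN 1 .. l_rowids.COUNT
    UPDATE stg SET op_flag = 'U' WHERE ROWID = l_rowids(i);

  DBMS_OUTPUT.PUT_LINE('rows updated: ' || l_ids.COUNT);
END;
/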
You create a trigger on your target table which logs what's happening to another table.
In reality, unless the number of updates/deletes is genuinely important, you should not do any of this. Write your MERGE statement so that it errors if anything goes wrong, and use the DML error logging clause to capture any errors you receive; those are more likely to be the things you should be paying attention to.
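For reference, a sketch of a MERGE with Oracle's DML error logging clause, using the same hypothetical stg and tgt tables; the err$_tgt table is assumed to have been created beforehand with DBMS_ERRLOG.CREATE_ERROR_LOG('TGT'):
-- stg / tgt are hypothetical; err$_tgt was created via DBMS_ERRLOG.CREATE_ERROR_LOG
MERGE INTO tgt t
USING stg s
   ON (t.id = s.id)
 WHEN MATCHED THEN UPDATE SET t.val = s.val
 WHEN NOT MATCHED THEN INSERT (id, val) VALUES (s.id, s.val)
  LOG ERRORS INTO err$_tgt ('nightly load') REJECT LIMIT UNLIMITED;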
Previous posters already said that this approach is suspicious, both because of the cursor/loop and the output log for review.
On SQL Server, the MERGE statement has an OUTPUT clause that lets you insert a row into another table with the $action taken (INSERT, UPDATE, or DELETE) and any columns you want from the inserted or deleted/overwritten data. This lets you summarize exactly as you asked.
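A T-SQL sketch of that OUTPUT clause, again with hypothetical stg, tgt, and merge_log tables:
-- stg / tgt / merge_log are hypothetical tables
MERGE INTO tgt AS t
USING stg AS s
   ON t.id = s.id
 WHEN MATCHED THEN UPDATE SET t.val = s.val
 WHEN NOT MATCHED THEN INSERT (id, val) VALUES (s.id, s.val)
OUTPUT $action, inserted.id, deleted.id
  INTO merge_log (action_taken, new_id, old_id);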
The equivalent Oracle RETURNING clause may not work for MERGE but does for UPDATE and DELETE.

What is the fastest way to compare 2 rows in SQL?

I have 2 different databases. When something changes in the big one (which I don't have access to), some rows get imported into my database, into a similar HUGE table. I have a job that checks for records in this table and, if there are any, executes a stored procedure, processes them, and deletes them from the table.
Performance is the concern (huge amount of data). I would like to know the fastest way to tell whether something has changed between, let's say, 2 imported rows with 100 columns each. I don't have FKs; I don't need them. Chances are that even though I have records in my table, nothing has actually changed.
Also, let's say something actually has changed. Is it possible, for example, to check only for changes in the datetime columns?
Thanks
You can always use update triggers: these give you access to two logical tables, inserted and deleted. You can compare their values and base your action on the results.
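For illustration, a minimal T-SQL sketch (assuming SQL Server, as the inserted/deleted tables suggest), with a hypothetical imported table ImportedRows keyed on Id, a datetime column ChangedAt, and a ChangeLog table for recording real changes:
-- ImportedRows / ChangeLog and their columns are hypothetical
CREATE TRIGGER trg_ImportedRows_Update
ON ImportedRows
AFTER UPDATE
AS
BEGIN
  SET NOCOUNT ON;
  -- compare old and new row images; act only when the datetime column changed
  INSERT INTO ChangeLog (Id, OldChangedAt, NewChangedAt)
  SELECT i.Id, d.ChangedAt, i.ChangedAt
  FROM inserted i
  JOIN deleted d ON d.Id = i.Id
  WHERE i.ChangedAt <> d.ChangedAt;
END;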