SQL Oracle - better way to write update and delete queries

I have around 270 million rows indexed on month||id_nr, and the update/delete queries below take around 4 hours to complete.
I was wondering if there is any other way to do the update/delete that would be faster.
Update Query:-
update table_A
set STATUS='Y'
where
month||id_nr in (select distinct month||id_nr from table_A where STATUS='Y');
Delete Query:-
Delete from table_B
where
month||id_nr in (select distinct month||id_nr from table_A where STATUS='Y');

Why the string concatenation? And never try to force the DBMS to make rows distinct in an IN clause. Let the DBMS decide what it considers the best approach to look up the data.
So, just:
where (month, id_nr) in (select month, id_nr from table_A where status = 'Y');
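Applied to the original statements, the rewrite would look like this (a sketch, keeping the table and column names from the question):
update table_A
set STATUS = 'Y'
where (month, id_nr) in (select month, id_nr from table_A where STATUS = 'Y');

delete from table_B
where (month, id_nr) in (select month, id_nr from table_A where STATUS = 'Y');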
I suppose that id_nr is not a unique ID in table_b, for otherwise you wouldn't have to look at it combined with the month. An appropriate index would hence be:
create index idx_b on table_b (id_nr, month);
Or maybe, if you work a lot with the month, it may be a good idea to even partition the table by month. This could speed up queries, updates, and deletes immensely.
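For illustration, a list-partitioned layout might look like this (a sketch; the column types are assumptions, and AUTOMATIC list partitioning requires Oracle 12.2 or later):
create table table_b (
month  varchar2(6),   -- assumed format, e.g. '202401'
id_nr  number,
status char(1)
)
partition by list (month) automatic
(partition p_first values ('202401'));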
For table_a I suggest
create index idx_a on table_a (status, id_nr, month);
which is a covering index. The first column will help find the desired rows quickly; the other two columns will be available without having to read the table row.

Related

Never ending query

My query worked fine up to this day.
I suppose there is some problem with an index.
Note: there is an index on the column type (in table_1), and type is not unique in table_1.
Here is what the query looks like:
--works fine (finishes in 5 seconds)
select *
from table_1 a,table_2 b
where a.id=b.id
and a.date < date '2014-06-30'
and a.type=2
When I increase the date range (to include one more month, that's about 1000 records more) it doesn't finish, so I have to stop it.
--never ending
select *
from table_1 a,table_2 b
where a.id=b.id
and a.date < date '2014-07-31'
and a.type=2
But when I omit the column that has the index, it's OK:
--works fine
select *
from table_1 a,table_2 b
where a.id=b.id
and a.date < date '2014-07-31'
I would be grateful for any hint.
Try disabling the index this way:
ALTER INDEX idxname DISABLE;
You can also rebuild the index:
ALTER INDEX idxname REBUILD;
or gather statistics on the table
EXEC DBMS_STATS.GATHER_TABLE_STATS ('yourschema', 'table');
But be careful, it could take a long time!
You can use the query plan to see what's happening.
Before you gather statistics or disable the index, record the query plan first, then compare it with the query plan after gathering statistics or disabling the index. Then you can figure out for yourself what's happening in your database.
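For example, in Oracle you can capture and display the plan like this (a sketch using the query from the question):
explain plan for
select *
from table_1 a, table_2 b
where a.id = b.id
and a.date < date '2014-07-31'
and a.type = 2;

select * from table(dbms_xplan.display);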
You should know that not all indexes help.
Sometimes, without an index, the DBMS will choose a full table scan and a hash join to improve performance.
But with some indexes, the DBMS may use a nested loop join instead; this type of join may have a lower estimated cost, but it may execute serially.

How to check if all fields are unique in Oracle?

How to check if all fields are unique in Oracle?
SELECT myColumn, COUNT(*)
FROM myTable
GROUP BY myColumn
HAVING COUNT(*) > 1
This will return all myColumn values along with their number of occurrences, if the number of occurrences is higher than one (i.e. they are not unique).
If the result of this query is empty, then you have unique values in this column.
An easy way to do this is to analyze the table using DBMS_STATS. After you do, you can look at dba_tables... look at the num_rows column. Then look at dba_tab_columns and compare the num_distinct for each column to the number of rows. This is a roundabout way of doing what you want without performing a full table scan, if you are worried about affecting a production system on a huge table. If you want direct results, do what the others suggest by running the query against the table with a GROUP BY.
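A minimal sketch of that approach (the schema and table names are placeholders):
EXEC DBMS_STATS.GATHER_TABLE_STATS('YOURSCHEMA', 'MYTABLE');

SELECT t.num_rows, c.column_name, c.num_distinct
FROM dba_tables t
JOIN dba_tab_columns c
ON c.owner = t.owner AND c.table_name = t.table_name
WHERE t.owner = 'YOURSCHEMA'
AND t.table_name = 'MYTABLE';
A column whose num_distinct equals num_rows had all-unique values as of the last statistics gathering.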
One way is to create a unique index.
If the index creation fails, you have existing duplicate info; if an insert fails, it would have produced a duplicate...
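For example (a sketch; the index name is made up):
CREATE UNIQUE INDEX mytable_mycolumn_uq ON myTable (myColumn);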

SQL Query to delete oldest rows over a certain row count?

I have a table that contains log entries for a program I'm writing. I'm looking for ideas on an SQL query (I'm using SQL Server Express 2005) that will keep the newest X number of records, and delete the rest. I have a datetime column that is a timestamp for the log entry.
I figure something like the following would work, but I'm not sure of the performance with the IN clause for larger numbers of records. Performance isn't critical, but I might as well do the best I can the first time.
DELETE FROM MyTable WHERE PrimaryKey NOT IN
(SELECT TOP 10000 PrimaryKey FROM MyTable ORDER BY TimeStamp DESC)
I should mention that this query will run 3-4 times a day (as part of another process), so the number of records that will be deleted with each query will be small in comparison to the number of records that will be kept.
Try this:
DECLARE @X int
SELECT @X = COUNT(*) FROM MyTable
-- assumes the table currently holds more than 10,000 rows; TOP with a negative argument raises an error
SET @X = @X - 10000
DELETE MyTable
WHERE PrimaryKey IN (SELECT TOP(@X) PrimaryKey
FROM MyTable
ORDER BY TimeStamp ASC
)
It kind of depends: if you are deleting fewer than 10,000 rows, this might run faster, as it identifies the rows to delete, not the rows to keep.
Try this: it uses a CTE to get the row ordinal number, and then only deletes X rows at a time. You can alter this variable to suit your server.
Adding the READPAST table hint should prevent locking.
DECLARE @numberToDelete INT;
DECLARE @ROWSTOKEEP INT;
SET @ROWSTOKEEP = 50000;
SET @numberToDelete = 1000;
WHILE 1=1
BEGIN
;WITH ROWSTODELETE AS
(
SELECT ROW_NUMBER() OVER(ORDER BY dtsTimeStamp DESC) rn,
*
FROM MyTable
)
DELETE TOP (@numberToDelete) FROM ROWSTODELETE WITH(READPAST)
WHERE rn > @ROWSTOKEEP;
IF @@ROWCOUNT = 0
BREAK;
END;
The query you have is about as efficient as it gets, and is readable.
NOT IN and NOT EXISTS are more efficient than LEFT JOIN/IS NULL, but only because both columns can never be null. You can read this link for a more in-depth comparison.
This depends on your scenario (whether it's feasible for you) and how many rows you have, but there is a potentially far more optimal approach.
Create a new copy of the log table with a new name
Insert into the new table the most recent 10,000 records from the original table
Drop the original table (or rename)
Rename the new table to the proper name
This obviously requires more thought than just deleting rows (e.g. if the table has an IDENTITY column, this needs to be set on the new table, etc.). But if you have a large table, it would be more efficient to copy 10,000 rows to a new table and then drop the original table than to try to delete millions of rows to leave just 10,000.
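A rough T-SQL sketch of those steps (note that SELECT ... INTO copies neither indexes nor constraints, so those would need to be recreated):
SELECT TOP 10000 *
INTO MyTable_new
FROM MyTable
ORDER BY TimeStamp DESC;

DROP TABLE MyTable; -- or rename it aside first as a safety net
EXEC sp_rename 'MyTable_new', 'MyTable';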
DELETE FROM MyTable
WHERE TimeStamp < (SELECT min(TimeStamp)
FROM (SELECT TOP 10000 TimeStamp
FROM MyTable
ORDER BY TimeStamp DESC) AS newest)
or
DELETE FROM MyTable
WHERE TimeStamp < (SELECT min(TimeStamp)
FROM MyTable
WHERE PrimaryKey IN (SELECT TOP 10000 PrimaryKey
FROM MyTable
ORDER BY TimeStamp DESC))
Not sure if these are an improvement as far as efficiency goes, though.

Deleting duplicate rows in a database without using rowid or creating a temp table

Many years ago, I was asked during a phone interview to delete duplicate rows in a database. After giving several solutions that do work, I was eventually told the restrictions are:
Assume table has one VARCHAR column
Cannot use rowid
Cannot use temporary tables
The interviewer refused to give me the answer. I've been stumped ever since.
After asking several colleagues over the years, I'm convinced there is no solution. Am I wrong?!
And if you did have an answer, would a new restriction suddenly present itself? Since you mention ROWID, I assume you were using Oracle. The solutions below are for SQL Server.
Inspired by SQLServerCentral.com http://www.sqlservercentral.com/scripts/T-SQL/62866/
while(1=1) begin
delete top (1)
from MyTable
where VarcharColumn in
(select VarcharColumn
from MyTable
group by VarcharColumn
having count(*) > 1)
if @@rowcount = 0
break
end
Deletes one row at a time. When the second to last row of a set of duplicates disappears then the remaining row won't be in the subselect on the next pass through the loop. (BIG Yuck!)
Also, see http://www.sqlservercentral.com/articles/T-SQL/63578/ for inspiration. There, RBarry Young suggests a way that might be modified to store the deduplicated data in the same table, delete all the original rows, and then convert the stored deduplicated data back into the right format. He had three columns, so it's not exactly analogous to what you are doing.
And then it might be doable with a cursor. Not sure, and I don't have time to look it up, but: create a cursor to select everything out of the table, in order, and a variable to track what the last row looked like. If the current row is the same, delete it; else set the variable to the current row.
This is a completely jacked-up way to do it, but given the asinine requirements, here is a workable solution, assuming SQL 2005 or later:
DELETE from MyTable
WHERE ROW_NUMBER() over(PARTITION BY [MyField] order by MyField)>1
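As noted further down, ROW_NUMBER() cannot actually appear in a DELETE's WHERE clause; in SQL Server, the usual workaround is to wrap it in a CTE and delete through the CTE (a sketch, using the single VARCHAR column from the question):
WITH Dupes AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY VarcharColumn
ORDER BY VarcharColumn) AS rn
FROM MyTable
)
DELETE FROM Dupes WHERE rn > 1;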
I would put a unique number of fixed size in the VARCHAR column for the duplicated rows, then parse out the number and delete all but the minimum row. Maybe that's what his VARCHAR constraint is for. But that stinks because it assumes that your unique number will fit. Lame question. You didn't want to work there anyway. ;-)
Assume you are implementing the DELETE statement for a SQL engine. How will you delete two rows from a table that are exactly identical? You need something to distinguish one from the other!
You actually cannot delete entirely duplicate rows (ALL columns being equal) under the following constraints (as provided to you):
No use of ROWID or ROWNUM
No Temporary Table
No procedural code
It can, however, be done if even one of the conditions is relaxed. Here are solutions, each using at least one of the three forbidden options.
Assume the table is defined as below:
Create Table t1 (
col1 varchar2(100),
col2 number(5),
col3 number(2)
);
Duplicate rows identification:
Select col1, col2, col3
from t1
group by col1, col2, col3
having count(*) >1
Duplicate rows can also be identified using this:
select col1, col2, col3, row_number() over (partition by col1, col2, col3 order by col1, col2, col3) rn from t1
NOTE: The row_number() analytic function cannot be used in a DELETE statement as suggested by JohnFx, at least in Oracle 10g.
Solution using ROWID
Delete from t1
where rowid > (select min(t1_inner.rowid)
from t1 t1_inner
where t1_inner.col1 = t1.col1
and t1_inner.col2 = t1.col2
and t1_inner.col3 = t1.col3);
Solution using temp table
create table t1_dups as
select col1, col2, col3
from t1
group by col1, col2, col3
having count(*) > 1;

delete from t1
where (col1, col2, col3) in (select col1, col2, col3 from t1_dups);

insert into t1
select col1, col2, col3 from t1_dups;
Solution using procedural code
This will use an approach similar to the case where we use a temp table.
create table temp as
select col1, col2, col3
from t1
group by col1, col2, col3
having (count(*) > 1 or count(*) = 1); -- i.e. one row per distinct combination
Now drop the base table.
Rename the temp table to the base table's name.
Mine was resolved using this query in PL/SQL:
delete from <table> where <column> in (select <column> from <table> group by <column> having count(*) > 1);

Performance of updating one big table using values from one small table

First, I know that the sql statement to update table_a using values from table_b is in the form of:
Oracle:
UPDATE table_a
SET (col1, col2) = (SELECT cola, colb
FROM table_b
WHERE table_a.key = table_b.key)
WHERE EXISTS (SELECT *
FROM table_b
WHERE table_a.key = table_b.key)
MySQL:
UPDATE table_a
INNER JOIN table_b ON table_a.key = table_b.key
SET table_a.col1 = table_b.cola,
table_a.col2 = table_b.colb
What I understand is the database engine will go through records in table_a and update them with values from matching records in table_b.
So, if I have 10 million records in table_a and only 10 records in table_b:
Does that mean that the engine will do 10 million iterations through table_a just to update 10 records? Are Oracle/MySQL/etc. smart enough to do only 10 iterations through table_b?
Is there a way to force the engine to actually iterate through records in table_b instead of table_a to do the update? Is there an alternative syntax for the sql statement?
Assume that table_a.key and table_b.key are indexed.
Either engine should be smart enough to optimize the query based on the fact that there are only ten rows in table_b. How the engine determines what to do is based on factors like indexes and statistics.
If the "key" column is the primary key and/or is indexed, the engine will have to do very little work to run this query. It will basically already sort of "know" where the matching rows are, and look them up very quickly. It won't have to "iterate" at all.
If there is no index on the key column, the engine will have to do a "table scan" (roughly the equivalent of "iterate") to find the right values and match them up. This means it will have to scan through 10 million rows.
Do a little reading on what's called an Execution Plan. This is basically an explanation of what work the engine had to do in order to run your query (some databases show it in text only, some have the option of seeing it graphically). Learning how to interpret an Execution Plan will give you great insight into adding indexes to your tables and optimizing your queries.
Look these up if they don't work (it's been a while), but it's something like:
In MySQL, put the word "EXPLAIN" in front of your SELECT statement
In Oracle, run "SET AUTOTRACE ON" before you run your SELECT statement
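Concretely, something like this (a sketch against the tables from the question):
-- MySQL: prefix the statement
EXPLAIN SELECT table_a.* FROM table_a INNER JOIN table_b ON table_a.key = table_b.key;

-- Oracle, in SQL*Plus: subsequent statements will show their plan and statistics
SET AUTOTRACE ON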
I think the first (Oracle) query would be better written with a JOIN instead of a WHERE EXISTS. The engine may be smart enough to optimize it properly either way. Once you get the hang of interpreting an execution plan, you can run it both ways and see for yourself. :)
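For what it's worth, Oracle can also express the join form directly by updating an inline view, but only if table_b.key is declared unique so the view is key-preserved; a sketch:
UPDATE (SELECT a.col1, a.col2, b.cola, b.colb
FROM table_a a
JOIN table_b b ON a.key = b.key)
SET col1 = cola,
col2 = colb;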
Okay, I know answering your own question is usually frowned upon, but I already accepted another answer and won't unaccept it, so meh, here it is...
I've discovered a much better alternative that I'd like to share with anyone who encounters the same scenario: the MERGE statement.
Apparently, newer Oracle versions introduced the MERGE statement, which simply blows the UPDATE approach away! Not only is the performance so much better in most cases, the syntax is so simple and makes so much sense that I feel stupid for using the UPDATE statement! Here it comes:
MERGE INTO table_a
USING table_b
ON (table_a.key = table_b.key)
WHEN MATCHED THEN UPDATE SET
table_a.col1 = table_b.cola,
table_a.col2 = table_b.colb;
What's more, I can also extend the statement to include an INSERT action when table_a does not have matching records for some records in table_b:
MERGE INTO table_a
USING table_b
ON (table_a.key = table_b.key)
WHEN MATCHED THEN UPDATE SET
table_a.col1 = table_b.cola,
table_a.col2 = table_b.colb
WHEN NOT MATCHED THEN INSERT
(key, col1, col2)
VALUES (table_b.key, table_b.cola, table_b.colb);
This new statement type made my day the day I discovered it :)