How can I delete rows quickly in Postgres? - sql

I have a table with 17,000,000 rows and I need to delete 500,000 of them that match certain conditions. At the moment I have a script with 500,000 lines that look like this:
delete from table where name = 'John' and date = '2010-08-04';
delete from table where name = 'John' and date = '2010-08-05';
delete from table where name = 'Adam' and date = '2010-08-06';
Each statement takes about 2.5 seconds to execute, which is too long. How can I improve the speed?

If there is no index on the name and date fields, try creating the index below and then run your code again.
CREATE INDEX idx_table_name_date ON table (name, date);
If possible, you can also reduce the number of delete statements by merging them.
Instead of
delete from table where name = 'John' and date = '2010-08-04';
delete from table where name = 'John' and date = '2010-08-05';
It can be:
delete from table where name = 'John' and date in('2010-08-04','2010-08-05');

I would suggest that you load the rows to delete into a table and use:
delete from table t
using todelete td
where t.name = td.name and t.date = td.date;
Even without indexes, this should be faster than zillions of separate delete statements. But you want an index on table(name, date) for performance.
If the data already comes from a table or query, then you can just use that directly.
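For instance, here is a minimal sketch of how the staging table might be built if you are loading the name/date pairs by hand (todelete and the sample pairs are just the placeholders used above):
create temp table todelete (name text, date date);
insert into todelete (name, date) values
('John', '2010-08-04'),
('John', '2010-08-05'),
('Adam', '2010-08-06'); -- ...and the remaining pairs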
You can also incorporate this into one query by listing the values explicitly in the using clause:
delete from table t
using (values ('John', date '2010-08-04'),
              ('John', date '2010-08-05'),
              ('Adam', date '2010-08-06')
      ) td(name, date)
where t.name = td.name and t.date = td.date;

Related

Removing duplicates and keeping one copy

I have been going through the threads about removing duplicates from a table while keeping one copy. I have seen an illustration for the case where the table has a composite key. Does anyone have an idea?
The table contr has a composite key of checkno, salary_month, salary_year. This is what I tried:
delete (select * from CONTR t1
INNER JOIN
(select CHECKNO, SALARY_YEAR,SALARY_MONTH FROM CONTR
group by CHECKNO, SALARY_YEAR,SALARY_MONTH HAVING COUNT(*) > 1) dupes
ON
t1.CHECKNO = dupes.CHECKNO AND
t1.SALARY_YEAR= dupes.SALARY_YEAR AND
t1.SALARY_MONTH=dupes.SALARY_MONTH);
I expected one duplicate to be removed and one maintained.
You can use the query below to remove the duplicates, using rowid as a column that has a unique value for every row:
delete contr t1
where rowid <
(
select max(rowid)
from contr t2
where t2.checkno = t1.checkno
and t2.salary_year = t1.salary_year
and t2.salary_month = t1.salary_month
);
Another way to achieve this, assuming the duplicates are on the 3 columns you mentioned, is:
Create a temp table with distinct values
Drop your table
Rename the temp table
Especially if you are dealing with a huge volume of data, this approach will be a lot faster than a delete.
If the duplicated data you are working on is a subset of your main table, the steps would be:
Create a temp table with the distinct values
Delete all duplicated rows from the main table
Insert the data from the temp table back into the main table
The SQL for the first step would be
create table tmp_CONTR AS
select distinct CHECKNO, SALARY_YEAR,SALARY_MONTH -- this part can be modified to match your needs
from CONTR t1;
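The remaining steps might look roughly like the sketch below. This is only an outline: it assumes tmp_CONTR was created with every column you actually want to keep, so the select list above would need to be widened accordingly.
-- Variant 1: replace the whole table (drop and rename).
DROP TABLE CONTR;
ALTER TABLE tmp_CONTR RENAME TO CONTR;

-- Variant 2: the dupes are only a subset of CONTR, so delete them
-- and re-insert the de-duplicated rows from the temp table.
DELETE FROM CONTR
WHERE (CHECKNO, SALARY_YEAR, SALARY_MONTH) IN
      (SELECT CHECKNO, SALARY_YEAR, SALARY_MONTH FROM tmp_CONTR);

INSERT INTO CONTR (CHECKNO, SALARY_YEAR, SALARY_MONTH)
SELECT CHECKNO, SALARY_YEAR, SALARY_MONTH FROM tmp_CONTR;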

SQLite Update Query Optimization

So I have tables with the following structure:
TimeStamp,
var_1,
var_2,
var_3,
var_4,
var_5,...
The table contains about 600 columns named var_##. The user parses some data stored by a machine, and I have to update every null value in the table to the last valid value (the most recent non-null value at or before that row's timestamp). At the moment I use the following query for each column:
update tableName
set var_## =
(select b.var_## from tableName as b
where b.timeStamp <= tableName.timeStamp and b.var_## is not null
order by timeStamp desc limit 1)
where tableName.var_## is null;
The problem right now is the time it takes to run this query for all the columns. Is there any way to optimize it?
UPDATE: this is the query that is executed for one column:
update wme_test2
set var_6 =
(select b.var_6 from wme_test2 as b
where b.timeStamp <= wme_test2.timeStamp and b.var_6 is not null
order by timeStamp desc limit 1)
where wme_test2.var_6 is null;
Having 600 indexes on the data columns would be silly. (But not necessarily more silly than having 600 columns.)
All queries can be sped up with an index on the timeStamp column.
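A minimal sketch of that index, using the table name from the example above (the index name is just illustrative):
create index if not exists idx_wme_test2_timestamp on wme_test2(timeStamp); -- one index serves the correlated subquery for every var_## column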

Need to improve the following UPDATE SQL statement or rewrite it so it can execute faster

How can I adjust the following UPDATE statement?
There are 3,000,000 rows in the database table, and when I execute the UPDATE statement it takes forever to run. It has been running for the last 17 hours and I still haven't seen a result. But when I execute the equivalent SELECT statement, it takes only 2 minutes and 36 seconds.
q is the fact table, while a is the dimension table.
UPDATE q
SET q.[DID] = a.[DID]
FROM [dbo].[CallDetail] q
JOIN [DimSchart] a ON a.[Schart] = q.[Schart]
WHERE q.[DID] IS NULL;
GO
Create a temporary table with the PK fields of the CallDetail table and the DID field.
INSERT into this table with a SELECT query that gets CallDetail's PK fields and the DID from DimSchart.
Then UPDATE CallDetail from the temp table.
EDIT (added code):
CREATE TABLE #tmpCallDetailUpdate(CallDetailID int, DID int);
INSERT INTO #tmpCallDetailUpdate(CallDetailID, DID)
select q.CallDetailID, a.DID
FROM CallDetail q
JOIN DimSchart a ON a.Schart = q.Schart
WHERE q.DID IS NULL;
UPDATE q
SET q.DID = u.DID
FROM CallDetail q
JOIN #tmpCallDetailUpdate u ON u.CallDetailID = q.CallDetailID;
(assuming there is a column CallDetailID in your CallDetail table; if not, substitute whatever the PK is on the table.)
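One optional tweak, which is my assumption rather than part of the original suggestion: index the temp table's join key so the final UPDATE can seek on CallDetailID instead of scanning the temp table.
CREATE CLUSTERED INDEX IX_tmpCallDetailUpdate_CallDetailID
ON #tmpCallDetailUpdate (CallDetailID); -- assumes CallDetailID is the PK of CallDetail, so it is unique here too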

Update table by giving old and new entries from a query

I select a list of old and new values for a table with a query:
select new, old from SOME_TABLE;
new old
----------- -----------
1174154 1064267743
1174164 1072037230
1174167 1065180221
1174180 1071828953
1174181 1067402664
1174204 1073143287
1174215 1057480190
1174222 1061816319
1174331 1072011864
1174366 1061275972
Now I need to update a table that contains these old values and replace them with the new ones.
update OTHER_TABLE set some_column = <newvalue> where some_column = <oldvalue>
Is it possible to do this with one query, or do I need to loop over the result tuples and run an update for each row?
I cannot change the database layout or write a trigger that does this automatically...
Try the below:
UPDATE OTHER_TABLE t1
SET some_column = (SELECT t2.new FROM SOME_TABLE t2
WHERE t2.old = t1.old_value_column)
WHERE EXISTS (SELECT 1 FROM SOME_TABLE t2
WHERE t2.old = t1.old_value_column); -- the outer WHERE keeps rows without a match untouched (otherwise they would be set to NULL)
Just replace old_value_column with the column name that holds the old value in OTHER_TABLE, along with the other table and column names.
I would not use a subquery, since as far as I know it will select from SOME_TABLE once for each row of the table being updated. Should the number of rows in both tables be large, you may run into performance problems. Therefore I suggest the update-from method outlined below.
update t
set yourcolumn = s.new
from yourtable t
join some_table s
on t.yourcolumn = s.old;
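For reference, the same update-from idea written in Postgres syntax, as a sketch with the same placeholder names (Postgres puts only the joined table in the from clause and the join condition in where):
update yourtable t
set yourcolumn = s.new
from some_table s
where t.yourcolumn = s.old;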

SQL With... Update

Is there any way to do some kind of "WITH...UPDATE" action in SQL?
For example:
WITH changes AS
(...)
UPDATE table
SET id = changes.target
FROM table INNER JOIN changes ON table.id = changes.base
WHERE table.id = changes.base;
Some context information: What I'm trying to do is to generate a base/target list from a table and then use it to change values in another table (changing values equal to base into target)
Thanks!
You can use merge, with the equivalent of your with clause as the using clause. But because you're updating the field you're joining on, you need to do a bit more work; this:
merge into t42
using (
select 1 as base, 10 as target
from dual
) changes
on (t42.id = changes.base)
when matched then
update set t42.id = changes.target;
... gives this error:
ORA-38104: Columns referenced in the ON Clause cannot be updated: "T42"."ID"
Of course, it depends a bit on what you're doing in the CTE, but as long as you can join to your table within it to get the rowid, you can use that for the on clause instead:
merge into t42
using (
select t42.id as base, t42.id * 10 as target, t42.rowid as r_id
from t42
where id in (1, 2)
) changes
on (t42.rowid = changes.r_id)
when matched then
update set t42.id = changes.target;
If I create my t42 table with an id column and have rows with values 1, 2 and 3, this will update the first two to 10 and 20, and leave the third one alone.
It doesn't have to be rowid; it can be a real column, as long as that column uniquely identifies the row. Normally that would be an id, which as a primary key would normally never change; you just can't reference it in the on clause and update it at the same time.
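For instance, if t42 also had a separate primary key column, say pk (an assumption for illustration), you could key the merge on that and update id freely:
merge into t42
using (
select pk, id * 10 as target -- pk is the assumed primary key column
from t42
where id in (1, 2)
) changes
on (t42.pk = changes.pk)
when matched then
update set t42.id = changes.target;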