Update the same data on many specific rows - sql

I want to update multiple rows. I have a lot of ids that specify which row to update (around 12k ids).
What would be the best way to achieve this?
I know I could do
UPDATE table SET col = 'value' WHERE id = 1 OR id = 24 OR id = 27 OR id = ....repeatx10000
But I figure that would give bad performance, right? So is there a better way to specify which ids to update?
PostgreSQL version is 9.1

In terms of strict update performance not much will change: all rows with the given IDs must be found and updated.
One thing that may simplify your call is to use the IN keyword. It goes like this:
UPDATE table SET col = 'value' WHERE id in (1, 24, 27, ...);
I would also suggest listing the IDs in the same order as the index on id, most likely ascending.
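If building that literal list is awkward, PostgreSQL (including 9.1) also accepts the equivalent array form, which can be handed over as a single parameter from client code; a minimal sketch with the same placeholder names:
UPDATE table SET col = 'value'
WHERE id = ANY (ARRAY[1, 24, 27]);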

Put your IDs in a table. Then do something like this:
UPDATE table SET col = 'value' WHERE id in (select id from table_of_ids_to_update)
Or if the source of your ids is some other query, use that query to get the ids you want to update.
UPDATE table SET col = 'value' WHERE id in (
select distinct id from some_other_table
where some_condition_for_updating is true
... etc. ...
)
For more complex cases of updating based on another table, this question gives a good example.
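For the ~12k IDs in the question, a minimal sketch of loading them into the table referenced above (a temp table is assumed here; COPY or a driver-side batch insert would replace the hand-written VALUES list):
CREATE TEMP TABLE table_of_ids_to_update (id integer PRIMARY KEY);
INSERT INTO table_of_ids_to_update (id) VALUES (1), (24), (27);  -- or COPY the full list in
UPDATE table SET col = 'value'
WHERE id in (SELECT id FROM table_of_ids_to_update);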

UPDATE table SET col = 'value' WHERE id in (select id from table);
Also add an index on your id field so you will get better performance.
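A minimal sketch, assuming id is not already the primary key (the index name is a placeholder):
CREATE INDEX table_id_idx ON table (id);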

It's worth noting that if you do reference a table as @dan1111 suggests, don't use IN (SELECT ...), and certainly avoid DISTINCT! Instead, use EXISTS:
update table
set col = value
where exists (
select 1 from other_table
where other_table.id = table.id
)
This ensures that the reference table is only scanned as much as it is needed.
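PostgreSQL's join-style UPDATE ... FROM is another common way to write the same thing; a minimal sketch with the same placeholder names (it assumes id is unique in other_table, otherwise a row could be matched by more than one source row):
update table
set col = value
from other_table
where other_table.id = table.id;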

Related

Postgres: select for update using CTE

I always had a query in the form of:
UPDATE
users
SET
col = 1
WHERE
user_id IN (
SELECT
user_id
FROM
users
WHERE
...
LIMIT 1
FOR UPDATE
);
And I was pretty sure that it generates a lock on the affected row until the update is done.
Now I wrote the same query using CTE and doing
WITH query AS (
select
user_id
FROM
users
WHERE
...
LIMIT 1
FOR UPDATE
)
UPDATE
users
SET
col = 1
WHERE
user_id IN (
SELECT
user_id
FROM
query
);
I’m actually having some doubts that it is applying a row lock because of the results I get, but I couldn’t find anything documented about this.
Can someone make it clear? Thanks
Edit:
I managed to find this:
If specific tables are named in FOR UPDATE or FOR SHARE, then only rows coming from those tables are locked; any other tables used in the SELECT are simply read as usual. A FOR UPDATE or FOR SHARE clause without a table list affects all tables used in the statement. If FOR UPDATE or FOR SHARE is applied to a view or sub-query, it affects all tables used in the view or sub-query. However, FOR UPDATE/FOR SHARE do not apply to WITH queries referenced by the primary query. If you want row locking to occur within a WITH query, specify FOR UPDATE or FOR SHARE within the WITH query.
https://www.postgresql.org/docs/9.0/sql-select.html#SQL-FOR-UPDATE-SHARE
So I guess the row lock is only taken if the FOR UPDATE is inside the WITH query itself, and not when it only appears in the query that uses the WITH?
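One way to check whether the CTE version really takes the row lock is to keep its transaction open in one session and, from a second session, try to lock the same row without waiting; a minimal sketch (42 stands in for whatever user_id the first session picked):
BEGIN;
SELECT user_id FROM users WHERE user_id = 42 FOR UPDATE NOWAIT;
-- errors out immediately instead of blocking if the first session already holds the row lock
ROLLBACK;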

SQL update query according to row sequence number

I have a table and I want to update this table using SQL query statement.
How can I write a statement to update the 3rd row, for example?
What I mean is that I want to update a certain row according to its position in the table.
3rd row
When ordered by what? SQL databases make no guarantees about the order of records in any given query unless an ORDER BY clause is given explicitly.
Identify the row you want to update using a WHERE clause. For example:
UPDATE SomeTable SET SomeColumn = 'Some Value' WHERE AnotherColumn = 'Another Value'
You can chain together lots of boolean logic into that WHERE clause to create more complex ways of identifying the record(s) you want to update. But the point is that you have to identify the record. "The 3rd row" doesn't mean anything to SQL.
Once you do have that ORDER BY clause, you can perhaps do something a little more complex. For example, sort the records in a sub-query and use an identifier from that query. So you might get "the 3rd row" like this:
SELECT ID FROM SomeTable ORDER BY ID LIMIT 1 OFFSET 2
Then use that in the UPDATE:
UPDATE SomeTable SET SomeColumn = 'Some Value'
WHERE ID IN (SELECT ID FROM (SELECT ID FROM SomeTable ORDER BY ID LIMIT 1 OFFSET 2) AS t)
(assuming MySQL, since you didn't specify; the extra derived table is needed because MySQL doesn't allow LIMIT directly inside an IN subquery, nor a plain subquery on the table being updated. Other RDBMS engines have similar capabilities.)
I don't know which DBMS you're using. If you're using Oracle DB there are pseudocolumns ROWID and ROWNUM you could use for this purpose.
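A hedged sketch of the Oracle variant, reusing the SomeTable / SomeColumn / ID placeholder names from above; it uses the ROWID pseudocolumn together with ROW_NUMBER() (rather than ROWNUM) so that the ordering stays explicit:
UPDATE SomeTable
SET SomeColumn = 'Some Value'
WHERE ROWID = (SELECT rid
               FROM (SELECT ROWID AS rid,
                            ROW_NUMBER() OVER (ORDER BY ID) AS rn
                     FROM SomeTable)
               WHERE rn = 3);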

Python/SQL query to use ROWID to realign out of sync data in SQLite3 db

I have an SQLite3 table with 4 columns: "date", "price", "price2" and "vol". There are about 200k lines of data, but the last 2 columns are out of sync by 798 rows. That is, the values of the second two columns in row 1 actually correspond to the values of the first two columns at row 798.
I am using Python 2.7.
I was thinking there must be a way of using the ROWID column as a unique identifier where I can extract the first two columns, then extract the second two columns and rejoin based upon "ROWID+798" or something like that.
Is this possible and if so would anyone know how to do this?
I'm curious how your database could get corrupted like that, and sceptical of your assessment that you know exactly what is wrong. If something like this could happen, it seems likely that there are many other problems.
In any case, the query you describe should look like this, if I understood correctly.
In most DBMSs you could do this with a single subquery, but the (col1, col2) = (...) syntax is not allowed in SQLite, so you have to use two subqueries:
UPDATE table1 SET
col1 =
(SELECT t.col1 FROM table1 AS t
WHERE t.rowid = table1.rowid + 798),
col2 =
(SELECT t.col2 FROM table1 AS t
WHERE t.rowid = table1.rowid + 798)

how to select the newly added rows in a table efficiently?

I need to periodically update a local cache with new additions to some DB table. The table rows contain an auto-increment sequential number (SN) field. The cache keeps this number too, so basically I just need to fetch all rows with SN larger than the highest I already have.
SELECT * FROM table where SN > <max_cached_SN>
However, the majority of the attempts will bring no data (I just need to make sure that I have an absolutely up-to-date local copy). So I wonder if this will be more efficient:
count = SELECT count(*) from table;
if (count > <cache_size>)
// fetch new rows as above
I suppose that selecting by an indexed numeric field is quite efficient, so I wonder whether using count has any benefit. On the other hand, this test/update will be done quite frequently and by many clients, so there is a motivation to optimize it.
this test/update will be done quite frequently and by many clients
This could lead to unexpected races between clients over cache generation. I would suggest the following (a sketch follows below):
upon each new addition to your table, add the newest id into a queue table
use something like crontab to trigger the cache generation by checking the queue table
once the new cache is generated, delete the id from the queue table
Since, as you stress, the majority of the attempts will bring no data, the above only triggers cache generation when there is a new addition, and the queue-table concept can even be extended to cover updates and deletes.
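A minimal sketch of the queue-table idea; the table and column names are placeholders and MySQL trigger syntax is assumed (the last answer on this question uses MySQL):
CREATE TABLE new_row_queue (sn INT UNSIGNED NOT NULL PRIMARY KEY);

CREATE TRIGGER queue_new_rows AFTER INSERT ON data_table
FOR EACH ROW
INSERT INTO new_row_queue (sn) VALUES (NEW.sn);

-- periodic job (e.g. from cron): regenerate the cache only if the queue is non-empty
SELECT d.* FROM data_table d JOIN new_row_queue q ON q.sn = d.sn;
DELETE FROM new_row_queue;  -- or delete only the sn values just processed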
I believe that
SELECT * FROM table where SN > <max_cached_SN>
will be faster, because SELECT COUNT(*) may require a full table scan. Just for clarification: do you never delete rows from this table?
SELECT COUNT(*) may involve a scan (even a full scan), while SELECT ... WHERE SN > constant can effectively use an index by SN, and looking at very few index nodes may suffice. Don't count items if you don't need the exact total, it's expensive.
You don't need to use SELECT COUNT(*).
There are two solutions:
You can use a small extra table with one field that holds the last row count of your table, and create a trigger after insert on your table that increments that field.
You can use a small extra table with one field that holds the last SN of your table, and create a trigger after insert on your table that updates that field. A sketch of this second option follows below.
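A minimal sketch of the second option (placeholder names, MySQL trigger syntax assumed): a one-row tracking table kept current by a trigger, so clients read that single row instead of counting.
CREATE TABLE last_sn_tracker (id TINYINT NOT NULL PRIMARY KEY, last_sn INT UNSIGNED NOT NULL);
INSERT INTO last_sn_tracker (id, last_sn) VALUES (1, 0);

CREATE TRIGGER track_last_sn AFTER INSERT ON data_table
FOR EACH ROW
UPDATE last_sn_tracker SET last_sn = NEW.sn WHERE id = 1;

-- clients compare their cached SN against this single row instead of running SELECT COUNT(*):
SELECT last_sn FROM last_sn_tracker;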
not much to this really
drop table if exists foo;
create table foo
(
foo_id int unsigned not null auto_increment primary key
)
engine=innodb;
insert into foo values (null),(null),(null),(null),(null),(null),(null),(null),(null);
select * from foo order by foo_id desc limit 10;
insert into foo values (null),(null),(null),(null),(null),(null),(null),(null),(null);
select * from foo order by foo_id desc limit 10;

Creating a default query for a column in a table (SQL)?

I have a column in one of my tables which is supposed to be the total sum of values from the rows of a number of other tables. Is there a way I can have a default query which runs on the total-sum column, so that every time a row is added to the other table an update is made to the total-sum column?
Thanks
You might want to look at using a view instead of a stored column for this; something like the following might help.
Select table.*, totals.total
from table
inner join (select something, sum(otherTable.column) as total
            from otherTable
            group by something) as totals
    on table.something = totals.something
Sounds like you want to add a trigger.
http://dev.mysql.com/doc/refman/5.0/en/triggers.html
You want to update the total sum column every time one of the columns in the other tables is changed? Then a trigger may serve your purposes.
Create Trigger OtherTable_SumTrigger  -- placeholder name
On OtherTable
For Insert, Update, Delete
As
Update s Set
    SumColumn =
        (Select Sum(Column)
         From OtherTable
         Where Something = s.Something)
From SumTable s
Where s.Something In
    (Select Distinct Something From inserted
     Union
     Select Distinct Something From deleted)
or, you can separate the code for a delete from the code for an insert or update by writing separate triggers, or by:
Create Trigger OtherTable_SumTrigger  -- placeholder name
On OtherTable
For Insert, Update, Delete
As
If Exists(Select * From inserted) And Update(Column)
    Update s Set
        SumColumn =
            (Select Sum(Column)
             From OtherTable
             Where Something = s.Something)
    From SumTable s
    Where s.Something In
        (Select Distinct Something
         From inserted)
Else If Exists(Select * From deleted)
    Update s Set
        SumColumn =
            (Select Sum(Column)
             From OtherTable
             Where Something = s.Something)
    From SumTable s
    Where s.Something In
        (Select Distinct Something
         From deleted)
As Charles said, a trigger works well in this situation. If the summed rows in the other tables change frequently, however, I'm not sure whether a trigger might cause performance issues. There are two other approaches:
Views - A view is essentially a saved query, and you query on it just like a table. If the sum data is only needed for reporting-type stuff, you may be better off removing the sum column from your main table and using the view for reporting
Stored Procedure - If you prefer to keep the column in the main table, you could run a stored procedure on a regular basis that keeps the sum information up-to-date for all rows.
I would compare performance between the view idea and the trigger idea before deciding which to use. Do this against the full data set you expect the view to have, not just a small test set of data. Make sure to index the view if it is possible.