I have a set of records indexed by id numbers, and I need to convert these records' ids to new id numbers. I have a two-column table mapping the old numbers to the new numbers.
For example, given these two tables, what would the update statement look like?
Given:
OLD_TO_NEW
oldid | newid
-----------------
1234  | 0987
7698  | 5645
...   | ...
and
id | data
----------------
1234  | 'yo'
7698  | 'hey'
...   | ...
Need:
id | data
----------------
0987  | 'yo'
5645  | 'hey'
...   | ...
This is Oracle, so I have access to PL/SQL; I'm just trying to avoid it.
I'd have a unique index on OLD_TO_NEW.oldid and update on an inline view:
update (select id,
               newid
          from old_to_new,
               my_table
         where my_table.id = old_to_new.oldid)
   set id = newid;
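The unique index is what makes the inline view key-preserved, and therefore updatable; for example (the index name is illustrative):

create unique index old_to_new_uq on old_to_new (oldid);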
UPDATE base_table
   SET id = (SELECT newid FROM old_to_new WHERE oldid = base_table.id)
 WHERE EXISTS (SELECT 1 FROM old_to_new WHERE oldid = base_table.id);
That's how I would do it in MySQL and I think that's pretty standard. The WHERE EXISTS guard keeps ids without a mapping from being overwritten with NULL.
Both of the update statements from Ramon and David Aldridge should work fine, but based on the number of records to be updated it may be faster to work with a temp table like this:
create table temp as (
  select newid, data
    from old_to_new
    join my_table on my_table.id = old_to_new.oldid);
Then truncate the old table and copy the temp table into the old table, or drop the old table and rename the temp table. (Note: add some extra statements to handle records where you don't have new values.)
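For example, a minimal sketch of the swap in Oracle syntax; handle the unmapped rows before truncating:

truncate table my_table;
insert into my_table (id, data)
select newid, data from temp;

-- or instead, drop and rename:
-- drop table my_table;
-- alter table temp rename to my_table;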
First, make a database backup. I'd also personally make a table backup in a work database in case something goes wrong and you need to get back to the old values in a hurry.
The next concern is: do you have related tables that will also need these ids? If not, you can update with a plain update statement. Write your update statement so that you can run it as a select first and make sure it will update properly. If you are updating a lot of records, you may want to do this in batches, say 1000 records at a time. One situation to watch out for: if the old and new id values overlap, a straight update might not work (you'll run into the unique index). In that case you need to add a column, populate it with the new values, then delete the old column and rename the new one. You will also need to script out all indexes, FKs, etc., because you will need to re-create them.
Related tables become much more complicated, but the new-column approach is the best way to go in that case as well.
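A hedged sketch of that add-column route in Oracle syntax (new_id is an illustrative name; re-run your scripted indexes, FKs, etc. afterwards):

alter table my_table add (new_id number);  -- new_id is an illustrative name

update my_table m
   set new_id = (select o.newid from old_to_new o where o.oldid = m.id);

alter table my_table drop column id;
alter table my_table rename column new_id to id;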
This is how I would do it in Microsoft SQL Server 2005. Haven't had access to an Oracle database in years so this may not work for Oracle.
UPDATE target_table
SET id = newid
FROM OLD_TO_NEW
WHERE target_table.id = OLD_TO_NEW.oldid;
You may want to index OLD_TO_NEW.oldid so the update join can run efficiently.
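For example (the index name is illustrative):

CREATE INDEX IX_OLD_TO_NEW_oldid ON OLD_TO_NEW (oldid);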
Related
I have a table people with fewer than 100,000 records, and I have taken a backup of this table using the following:
create table people_backup as select * from people
I add some new records to my people table over time, but eventually I want to merge the records from my backup table into people. Unfortunately I cannot simply DROP my table as my new records will be lost!
So I want to update the records in my people table using the records from people_backup, based on their primary key id and I have found 2 ways to do this:
MERGE the tables together
use some sort of fancy correlated update
Great! However, both of these methods use SET and make me specify which columns I want to update. Unfortunately I am lazy, and the structure of people may change over time; while my CTAS statement doesn't need to be updated, my update/merge script will need changes, which feels like unnecessary work for me.
Is there a way to merge entire rows without having to specify columns? I see here that not specifying columns during an INSERT will direct SQL to insert values in order; can the same methodology be applied here, and is this safe?
NB: The structure of the table will not change between backups
Given that your table is small, you could simply
DELETE FROM table t
 WHERE EXISTS( SELECT 1
                 FROM backup b
                WHERE t.key = b.key );

INSERT INTO table
SELECT *
  FROM backup;
That is slow and not particularly elegant (particularly if most of the data from the backup hasn't changed) but assuming the columns in the two tables match, it does allow you to not list out the columns. Personally, I'd much prefer writing out the column names (presumably those don't change all that often) so that I could do an update.
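For reference, a hedged sketch of the explicit-column MERGE the question mentions, assuming hypothetical columns name and age alongside the id key:

MERGE INTO people p
USING people_backup b
   ON (p.id = b.id)
WHEN MATCHED THEN
  UPDATE SET p.name = b.name, p.age = b.age  -- name/age are hypothetical columns
WHEN NOT MATCHED THEN
  INSERT (id, name, age) VALUES (b.id, b.name, b.age);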
I have a table with 32 million rows and 31 columns in PostgreSQL 9.2.10. I am altering the table by adding columns with updated values.
For example, if the initial table is:
id initial_color
-- -------------
1 blue
2 red
3 yellow
I am modifying the table so that the result is:
id initial_color modified_color
-- ------------- --------------
1 blue blue_green
2 red red_orange
3 yellow yellow_brown
I have code that will read the initial_color column and update the value.
Given that my table has 32 million rows and that I have to apply this procedure on five of the 31 columns, what is the most efficient way to do this? My present choices are:
Copy the column and update the rows in the new column
Create an empty column and insert new values
I could do either option with one column at a time or with all five at once. The column types are either character varying or character.
The column types are either character varying or character.
Don't use character, that's a misunderstanding. varchar is ok, but I would suggest just text for arbitrary character data.
Any downsides of using data type "text" for storing strings?
Given that my table has 32 million rows and that I have to apply this procedure on five of the 31 columns, what is the most efficient way to do this?
If you don't have objects (views, foreign keys, functions) depending on the existing table, the most efficient way is to create a new table. Something like this (details depend on the details of your installation):
BEGIN;
LOCK TABLE tbl_org IN SHARE MODE;  -- to prevent concurrent writes
CREATE TABLE tbl_new (LIKE tbl_org INCLUDING STORAGE INCLUDING COMMENTS);
ALTER TABLE tbl_new
  ADD COLUMN modified_color text
, ADD COLUMN modified_something text;
-- , etc.
INSERT INTO tbl_new (<all columns in order here>)
SELECT <all columns in order here>
     , myfunction(initial_color) AS modified_color  -- etc.
FROM   tbl_org;
-- ORDER BY tbl_id;  -- optionally order rows while being at it
-- Add constraints and indexes like in the original table here
DROP TABLE tbl_org;
ALTER TABLE tbl_new RENAME TO tbl_org;
COMMIT;
If you have depending objects, you need to do more.
Either way, be sure to add all five at once. If you update each in a separate query, you write another row version each time due to the MVCC model of Postgres.
Related cases with more details, links and explanation:
Updating database rows without locking the table in PostgreSQL 9.2
Best way to populate a new column in a large table?
Optimizing bulk update performance in PostgreSQL
While creating a new table you might also order columns in an optimized fashion:
Calculating and saving space in PostgreSQL
Maybe I'm misreading the question, but as far as I know, you have 2 possibilities for creating a table with the extra columns:
CREATE TABLE
This would create a new table; it could be filled either with
CREATE TABLE ... AS SELECT ... at creation time, or
with a separate INSERT ... SELECT ... later on.
Neither variant is what you seem to want, as you asked for a solution without listing all the fields.
Also, this would require all the data (plus the new fields) to be copied.
ALTER TABLE...ADD ...
This creates the new columns. As I'm not aware of any way to reference existing column values during the ALTER, you will need an additional UPDATE ... SET ... to fill in the values.
So I'm not seeing any way to realize a procedure that follows your choice 1.
Nevertheless, copying the (column) data just to overwrite it in a second step would be suboptimal in any case. Altering a table to add new columns does minimal I/O. So even if your choice 1 were possible, choice 2 promises better performance by factors.
Thus, two statements will achieve what you want: one ALTER TABLE adding all your new columns in one go, and then one UPDATE providing the new values for these columns.
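For example, a minimal sketch of that two-statement approach, reusing the hypothetical myfunction from the earlier answer (column names beyond initial_color are illustrative):

ALTER TABLE tbl_org
  ADD COLUMN modified_color text
, ADD COLUMN modified_something text;  -- add all five columns in one statement

UPDATE tbl_org
   SET modified_color     = myfunction(initial_color)
     , modified_something = myfunction(initial_something);  -- fill all five in one pass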
Create the new column (modified_color); it will have a value of NULL on all records. Then run an update statement, assuming your table name is 'table':
update table
set modified_color = 'blue_green'
where initial_color = 'blue'
If I am correct, this can also be done for each color like this:
update table set modified_color = 'blue_green' where initial_color = 'blue';
update table set modified_color = 'red_orange' where initial_color = 'red';
update table set modified_color = 'yellow_brown' where initial_color = 'yellow';
Once you have done this you can do another update (assuming you have another column that I will call modified_color1):
update table set modified_color1 = modified_color;
I have a large table (about 40M rows) with a number of columns that contain 0 where they need to be NULL instead, so we can better key the data.
I've written scripts to chop the update into chunks of 10,000 records, find the occurrences of the columns with zero, and update them to NULL.
Example:
update FooTable
set order_id = case when order_id = 0 then null else order_id end,
person_id = case when person_id = 0 then null else person_id end
WHERE person_id = 0
OR order_id = 0
This works great, but it takes forever.
I'm thinking the better way to do this would be to create a second table, insert the data into it, and then rename it to replace the old table that has the zero-valued columns.
Question is: can I do an INSERT INTO table2 SELECT ... FROM table1 and, in the process, cleanse the data from table1 before it goes in?
You can usually create a new, sanitised table, depending on the actual DB server you are using.
The hard thing is that if there are other tables in the database, you may have issues with foreign keys, indexes, etc which will refer to the original table.
Whether making a new sanitised table will be quicker than updating your existing table is something you can only tell by trying it.
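For example, a hedged sketch of cleansing while copying, reusing the columns from the question (other_col stands in for the remaining columns, and table2 must already exist with a matching structure):

INSERT INTO table2 (order_id, person_id, other_col)
SELECT NULLIF(order_id, 0),   -- NULLIF yields NULL when the two arguments are equal
       NULLIF(person_id, 0),
       other_col
FROM   FooTable;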
Dump the pk/clustered key of all the records you want to update into a temp table. Then perform the update joining to the temp table. That will ensure the lowest locking level and quickest access. You can also add an identity column to the temp table; then you can loop through and do the updates in batches.
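A hedged sketch of that batching approach in SQL Server 2008+ syntax, assuming FooTable is keyed on a hypothetical id column (the batch size is illustrative):

-- dump the keys of affected rows, numbered by an identity column
SELECT IDENTITY(int, 1, 1) AS rn, id   -- id is a hypothetical pk column
INTO   #to_update
FROM   FooTable
WHERE  order_id = 0 OR person_id = 0;

DECLARE @batch int = 10000, @start int = 1, @max int;
SELECT @max = MAX(rn) FROM #to_update;

WHILE @start <= @max
BEGIN
    -- update one batch, joining to the temp table
    UPDATE f
       SET order_id  = NULLIF(f.order_id, 0),
           person_id = NULLIF(f.person_id, 0)
      FROM FooTable f
      JOIN #to_update t ON t.id = f.id
     WHERE t.rn BETWEEN @start AND @start + @batch - 1;
    SET @start = @start + @batch;
END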
We have around 20,80,000 (about 2 million) records in the table.
We needed to add a new column to it, which we did.
This new column needs to be the primary key, so we want to populate all rows from a sequence.
Here's the query:
BEGIN
  FOR loop_counter IN 1 .. 211 LOOP
    UPDATE user_char
       SET id = USER_CHAR__ID_SEQ.nextval
     WHERE user_char.id IS NULL
       AND rownum < 100000;
    COMMIT;
  END LOOP;
END;
But it's now been almost a day and the query is still running.
Note: I am not db developer/programmer.
Is there anything wrong with this query, or is there another (quicker) way to do the same job?
First, there does not appear to be any reason to use PL/SQL here. It would be more efficient to simply issue a single SQL statement to update every row:
UPDATE user_char
SET id = USER_CHAR__ID_SEQ.nextval
WHERE id IS NULL;
Depending on the situation, it may also be more efficient to create a new table and move the data from the old table to the new table in order to avoid row migration, i.e.
ALTER TABLE user_char
  RENAME TO user_char_old;

CREATE TABLE user_char
AS
SELECT USER_CHAR__ID_SEQ.nextval AS id,
       <<list of other columns>>
  FROM user_char_old;
<<Build indexes on user_char>>
<<Drop and recreate any foreign key constraints involving user_char>>
If this were a large table, you could use parallelism in the CREATE TABLE statement. It's not obvious that you'd get much benefit from parallelism with a small 2 million row table, but it might shave a few seconds off the operation.
Second, if it is taking a day to update a mere 2 million rows, something else must be going on. A 2 million row table is pretty small these days; I can populate and update a 2 million row table on my laptop in somewhere between a few seconds and a few minutes. Are there triggers on this table? Are there foreign keys? Are there other sessions updating the rows? What is the query waiting on?
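To see what the session is waiting on, a hedged starting point (requires access to V$SESSION; the username is a placeholder):

SELECT sid, status, event, wait_class, seconds_in_wait
  FROM v$session
 WHERE username = '<<your user>>';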
I have a table with 4 columns. The first column is unique for each row, but it's a string (URL format).
I want to update my table, but instead of using "WHERE", I want to update the rows in order.
The first query will update the first row, the second query updates the second row and so on.
What's the SQL code for that? I'm using SQLite.
Edit: My table schema
CREATE TABLE mytable (
  url varchar(150),
  views int(5),
  clicks int(5)
);
Edit2: What I'm doing right now is a loop of SQL queries
update mytable set views = 5, clicks = 10 where url = 'http://someurl.com';
There are around 4 million records in the database. It's taking around 16 seconds on my server to run the updates. Since the loop updates the rows in order (the first query updates the first row), I'm wondering whether updating the rows in order could be faster than using the WHERE clause, which needs to search 4 million rows.
You can't do what you want without using WHERE as this is the only way to select rows from a table for reading, updating or deleting. So you will want to use:
UPDATE table SET url = ... WHERE url = '<whatever>'
HOWEVER... SQLite has an extra feature: the autogenerated column ROWID. You can use this column in queries. You don't see this data by default, so if you want it you need to request it explicitly, e.g.:
SELECT ROWID, * FROM table
What this means is that you may be able to do what you want referencing this column directly:
UPDATE table SET url = ... WHERE ROWID = 1
You still need to use the WHERE clause, but this allows you to access the rows in insert order without doing anything else.
CAVEAT
ROWID effectively stores the INSERT order of the rows. If you delete rows from the table, the ROWIDs for remaining rows will NOT change - hence it is possible to have gaps in the ROWID sequence. This is by design and there is no workaround short of re-creating the table and re-populating the data.
PORTABILITY
Note that this only applies to SQLite - you may not be able to do the same thing with other SQL engines should you ever need to port this. It would be MUCH better to add an EXPLICIT auto-number column (aka an IDENTITY field) that you can use and manage.
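In SQLite, a column declared INTEGER PRIMARY KEY becomes an alias for the rowid; here is a minimal sketch reusing the question's schema:

CREATE TABLE mytable (
  id INTEGER PRIMARY KEY,  -- explicit, stable row identifier (aliases rowid)
  url varchar(150),
  views int(5),
  clicks int(5)
);

-- updates can then target the key directly
UPDATE mytable SET views = 5, clicks = 10 WHERE id = 1;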