Massive update with where clause with calculated value for each row? - sql

I have a table with 400,000 rows and I added a new column to it. The table has an indexed id column, and I want to update each row with a different value that I have already calculated.
For example, I have the update statements like these:
UPDATE TABLE SET SECOND_NAME = 'Alfred' WHERE id = 510675;
UPDATE TABLE SET SECOND_NAME = 'Pedro' WHERE id = 123123;
UPDATE TABLE SET SECOND_NAME = 'Robert' WHERE id = 123123;
SECOND_NAME is the new column that I want to populate. Let's say I have around 400,000 such update statements: is there a way to run them as a faster bulk update? If not, is there a way to estimate beforehand how long running them one by one would take?

I think your best bet is probably to use an external table. This AskTom answer gives you basically everything you need. To adapt it to your question, save the information you have to a CSV file and load it as follows.
CREATE TABLE new_names (id NUMBER(10), second_name VARCHAR2(50))
ORGANIZATION EXTERNAL
(type oracle_loader
default directory data_dir
access parameters ( fields terminated by ',' )
location ('names.csv')
)
/
-- An external table cannot have an enforced primary key, so an UPDATE of the join
-- may be rejected as not key-preserved; MERGE does not need that guarantee
MERGE INTO table_name t
USING new_names n
ON (t.id = n.id)
WHEN MATCHED THEN
UPDATE SET t.second_name = n.second_name
/
To make that work, create the data_dir directory object pointing at the folder where you place names.csv.
When you are done, you can drop the external table.
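To show the same staging-table idea end to end without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no external tables, so the CSV text is loaded into a temporary table first; the target table `people` and the function `bulk_update_from_csv` are hypothetical stand-ins for your own table and script.

```python
import csv
import io
import sqlite3


def bulk_update_from_csv(conn: sqlite3.Connection, csv_text: str) -> int:
    """Load 'id,second_name' lines into a staging table, then update in one statement."""
    # Stand-in for the external table: a temporary staging table filled from the CSV.
    conn.execute("CREATE TEMP TABLE new_names (id INTEGER PRIMARY KEY, second_name TEXT)")
    rows = csv.reader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO new_names (id, second_name) VALUES (?, ?)",
        ((int(row[0]), row[1]) for row in rows),
    )
    # One set-based UPDATE instead of one statement per row.
    cur = conn.execute(
        """
        UPDATE people
        SET second_name = (SELECT n.second_name FROM new_names AS n WHERE n.id = people.id)
        WHERE id IN (SELECT id FROM new_names)
        """
    )
    updated = cur.rowcount
    conn.execute("DROP TABLE new_names")
    conn.commit()
    return updated
```

The correlated-subquery form is used because it works on any SQLite version; in Oracle, the statement run against the external table plays the same role. Oracle-specific details (directory objects, access parameters) are not modelled here.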

Related

UPDATE two columns with new value under large size table

We have a table like this:
mytable (pid, string_value, int_value)
It has more than 20M rows in total. Now we have a feature that needs to mark all rows in this table as invalid, so we need to set string_value = NULL and int_value = 0, which indicates an invalid row (we still want to keep pid, as it is important to us).
What is the best way to do this?
I use the following SQL:
UPDATE Mytable
SET string_value = NULL,
int_value = 0;
but this query takes more than 4 minutes in my test environment. Is there a better way?
Updating all the rows can be quite expensive. Often, it is faster to empty the table and reload it.
In generic SQL this looks like:
create table mytable_temp as
select pid
from mytable;
truncate table mytable; -- back it up first!
insert into mytable (pid, string_value, int_value)
select pid, null, 0
from mytable_temp;
The syntax for creating the temporary table may differ depending on your database.
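A minimal sketch of the copy, empty, and reload sequence, using Python's sqlite3 module as a stand-in database. SQLite has no TRUNCATE, so an unqualified DELETE is used instead; the function name `reset_by_reload` is hypothetical. This only illustrates the order of the statements, not the speed-up, which depends on your DBMS, indexes, triggers, and logging.

```python
import sqlite3


def reset_by_reload(conn: sqlite3.Connection) -> None:
    """Mark every row invalid by saving the keys, emptying the table and reloading it."""
    with conn:  # commits if every statement succeeds, rolls back pending changes otherwise
        # Keep only the column that must survive.
        conn.execute("CREATE TEMP TABLE mytable_temp AS SELECT pid FROM mytable")
        # SQLite has no TRUNCATE; a DELETE without WHERE is its equivalent.
        conn.execute("DELETE FROM mytable")
        conn.execute(
            "INSERT INTO mytable (pid, string_value, int_value) "
            "SELECT pid, NULL, 0 FROM mytable_temp"
        )
        conn.execute("DROP TABLE mytable_temp")
```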
Updates can take time to complete. Another way to achieve this is to follow these steps:
1) Add new columns with the values you need set as their default values.
2) Drop the original columns.
3) Rename the new columns to the names of the original columns.
You can then drop the default values on the new columns.
This needs to be tested, as different DBMSs allow different kinds of table alterations (e.g., not all DBMSs allow dropping a default or dropping a column).

Insert various values

I have a table like this.
create table help(
id number primary key,
number_s integer NOT NULL);
I had to set the value 0 for ids 1 through 915. I solved that simply with:
update help set number_s=0 where id<=915;
This one was easy.
Now I have to set a number that differs for each row, for the rows after id 915 up to the last row.
I was doing
update help set number_s=51 where id=916;
update help set number_s=3 where id=917;
There are more than 1,000 rows to update. How can I do this quickly?
When I had this kind of problem before, I used a sequence to auto-increment the id, for example:
insert into help(id,number_s) values (id_sequence.nextval,16);
insert into help(id,number_s) values (id_sequence.nextval,48);
and so on. But that can't be used here, because in this case the ids start from 915, not 1. How can I do this quickly? I hope the problem is clear.
Since your ids and numbers are in a simply structured file, the number of rows is fairly small, and assuming this is a one-off task, honestly I would pull the file into Excel, use its text functions to build the 1,000 update statements, and paste them into your SQL client.
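If you go the generate-the-statements route, a short script can do what the Excel text functions would. This sketch assumes each line of the file is `id,number` separated by a comma (an assumption; adjust the split to your real format) and targets the `help` table from the question; `build_updates` is a hypothetical helper.

```python
def build_updates(lines, table="help", column="number_s"):
    """Turn 'id,value' lines into one UPDATE statement per line."""
    statements = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        id_text, value_text = line.split(",")
        # int() rejects anything that is not a plain integer, so values need no quoting.
        statements.append(
            f"UPDATE {table} SET {column} = {int(value_text)} WHERE id = {int(id_text)};"
        )
    return statements
```

For example, `build_updates(open("numbers.csv"))` (file name hypothetical) returns the statements ready to paste or write to a script file.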
If those assumptions are incorrect, you could (1) use sqlldr to load this file into a temporary table and (2) run an update on your help table based on the rows in that temporary table.
As mentioned in previous answers, and given your comment that the data is in a file on your system, you can use an external table or SQL*Loader to achieve this.
Here is a demo:
-- Create an external table pointing to your file
CREATE TABLE "EXT_SEQUENCES" (
ID NUMBER,
NUMBER_S NUMBER -- unquoted, so E.number_s below resolves to this column
)
ORGANIZATION EXTERNAL ( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "<directory name>" ACCESS PARAMETERS (
RECORDS DELIMITED BY NEWLINE
BADFILE 'bad_file.txt'
LOGFILE 'log_file.txt'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' MISSING FIELD VALUES ARE NULL
) LOCATION ( '<file name>' )
) REJECT LIMIT UNLIMITED;
-- Now update your help table
MERGE INTO help H
USING EXT_SEQUENCES E
ON ( H.ID = E.ID)
WHEN MATCHED THEN
UPDATE SET H.number_s = E.number_s;
Note: You need to change the access parameters of the external table according to your actual data in the file.
I hope this points you in the right direction.
Cheers!!

How Do I Deep Copy a Set of Data, and Change FK References to Point to All the Copies?

Suppose I have Table A and Table B. Table B references Table A. I want to deep copy a set of rows in Table A and Table B. I want all of the new Table B rows to reference the new Table A rows.
Note that I'm not copying the rows into any other tables. The rows in table A will be copied into table A, and the rows in table B will be copied into table B.
How can I ensure that the foreign key references get readjusted as part of the copy?
To clarify, I'm trying to find a generic way to do this. The example I'm giving involves two tables, but in practice the dependency graph may be much more complicated. Even a generic way to dynamically generate SQL to do the work would be fine.
UPDATE:
People are asking why this is necessary, so I'll give some background. It may be way too much, but here goes:
I'm working with an old desktop application that's been moved to a client-server model. However, the application still uses a rudimentary in-house binary file format for storing data for its tables. A data file is just a header followed by a series of rows, each of which is just the binary serialized field values, in an order determined by a schema text file. The only good thing about it is that it's very fast. It's terrible in every other respect. I'm moving the application to SQL Server and trying not to degrade the performance too badly.
This is a kind of scheduling application; the data's not critical to anybody, and there's no audit tracking, etc. necessary. It's not a supermassive amount of data, and we don't necessarily need to keep very old data around if the database grows too large.
One feature that they are accustomed to is the ability to duplicate entire schedules in order to create "what-if" scenarios that they can muck with. Any user can do this as many times as they want, as often as they want. In the old database, the data files for each schedule are stored in their own data folder, identified by name. So, copying a schedule was as simple as copying the data folder and renaming it.
I must be able to do effectively the same thing with SQL Server or the migration will not work. Maybe you're thinking that I could copy only the data that actually gets changed, to avoid redundancy; but honestly that sounds too complicated to be feasible.
To throw another wrench into the mix, there can be a hierarchy of schedule data folders. So, a data folder may contain a data folder, which may contain a data folder. And the copying can occur at any level.
In SQL Server, I'm implementing a nested set hierarchy to mimic this. I have a DATA_SET table like this:
CREATE TABLE dbo.DATA_SET
(
DATA_SET_ID BIGINT PRIMARY KEY,
NAME NVARCHAR(128) NOT NULL,
LFT INT NOT NULL,
RGT INT NOT NULL
)
So, there's a tree structure of data sets. Each data set represents a schedule, and may contain child data sets. Every row in every table has a DATA_SET_ID FK reference, indicating which data set it belongs to. Whenever I copy a data set, I copy all the rows in each table that belong to that data set and to every data set nested under it, into the same table, but referencing the new data sets.
So, here's a simple concrete example:
CREATE TABLE FOO
(
FOO_ID BIGINT PRIMARY KEY,
DATA_SET_ID BIGINT FOREIGN KEY REFERENCES DATA_SET(DATA_SET_ID) NOT NULL
)
CREATE TABLE BAR
(
BAR_ID BIGINT PRIMARY KEY,
DATA_SET_ID BIGINT FOREIGN KEY REFERENCES DATA_SET(DATA_SET_ID) NOT NULL,
FOO_ID BIGINT FOREIGN KEY REFERENCES FOO(FOO_ID) NOT NULL
)
INSERT INTO FOO
SELECT 1, 1 UNION ALL
SELECT 2, 1 UNION ALL
SELECT 3, 1
INSERT INTO BAR
SELECT 1, 1, 1 UNION ALL
SELECT 2, 1, 2 UNION ALL
SELECT 3, 1, 3
So, let's say I copy data set 1 into a new data set with ID 2. After the copy, the tables will look like this:
FOO
FOO_ID  DATA_SET_ID
1       1
2       1
3       1
4       2
5       2
6       2
BAR
BAR_ID  DATA_SET_ID  FOO_ID
1       1            1
2       1            2
3       1            3
4       2            4
5       2            5
6       2            6
As you can see, the new BAR rows reference the new FOO rows. I'm not asking about rewiring the DATA_SET_IDs; I'm asking about rewiring foreign keys in general.
So, that was surely too much information, but there you go.
I'm sure there are a lot of concerns about performance with the idea of bulk copying the data like this. The tables are not going to be huge. I'm not expecting more than 1000 records in any table, and most of the tables will be much much smaller than that. Old data sets can be deleted outright with no repercussions.
Thanks,
Tedderz
Here is an example with three tables that can probably get you started.
DB schema
CREATE TABLE users
(user_id int auto_increment PRIMARY KEY,
user_name varchar(32));
CREATE TABLE agenda
(agenda_id int auto_increment PRIMARY KEY,
`user_id` int, `agenda_name` varchar(7));
CREATE TABLE events
(event_id int auto_increment PRIMARY KEY,
`agenda_id` int,
`event_name` varchar(8));
A stored procedure to clone a user along with their agenda and event records:
DELIMITER $$
CREATE PROCEDURE clone_user(IN uid INT)
BEGIN
DECLARE last_user_id INT DEFAULT 0;
INSERT INTO users (user_name)
SELECT user_name
FROM users
WHERE user_id = uid;
SET last_user_id = LAST_INSERT_ID();
INSERT INTO agenda (user_id, agenda_name)
SELECT last_user_id, agenda_name
FROM agenda
WHERE user_id = uid;
INSERT INTO events (agenda_id, event_name)
SELECT a3.agenda_id_new, e.event_name
FROM events e JOIN
(SELECT a1.agenda_id agenda_id_old,
a2.agenda_id agenda_id_new
FROM
(SELECT agenda_id, @n := @n + 1 n
FROM agenda, (SELECT @n := 0) n
WHERE user_id = uid
ORDER BY agenda_id) a1 JOIN
(SELECT agenda_id, @m := @m + 1 m
FROM agenda, (SELECT @m := 0) m
WHERE user_id = last_user_id
ORDER BY agenda_id) a2 ON a1.n = a2.m) a3
ON e.agenda_id = a3.agenda_id_old;
END$$
DELIMITER ;
To clone a user
CALL clone_user(3);
Here is a SQLFiddle demo.
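The procedure above pairs old and new agenda rows by their row number. A different way to rewire the foreign keys, sketched here with Python's sqlite3 module and the same three tables, is to record each new agenda_id as it is inserted in an old-to-new map and use that map when copying events. This is row-by-row rather than set-based, which should be acceptable at the sizes the question mentions (under about 1,000 rows per table); `clone_user` here is a hypothetical Python function, not the stored procedure.

```python
import sqlite3


def clone_user(conn: sqlite3.Connection, uid: int) -> int:
    """Copy a user, their agendas and those agendas' events; return the new user_id."""
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO users (user_name) SELECT user_name FROM users WHERE user_id = ?", (uid,)
    )
    if cur.rowcount == 0:
        raise ValueError(f"no user with id {uid}")
    new_uid = cur.lastrowid

    agenda_map = {}  # old agenda_id -> new agenda_id
    old_agendas = cur.execute(
        "SELECT agenda_id, agenda_name FROM agenda WHERE user_id = ? ORDER BY agenda_id",
        (uid,),
    ).fetchall()
    for old_id, name in old_agendas:
        cur.execute("INSERT INTO agenda (user_id, agenda_name) VALUES (?, ?)", (new_uid, name))
        agenda_map[old_id] = cur.lastrowid

    # Events are copied per agenda, pointing at the new agenda id from the map.
    for old_id, new_id in agenda_map.items():
        cur.execute(
            "INSERT INTO events (agenda_id, event_name) "
            "SELECT ?, event_name FROM events WHERE agenda_id = ?",
            (new_id, old_id),
        )
    conn.commit()
    return new_uid
```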
I recently needed to solve a similar problem: I had to copy a set of rows in a table (Table A) as well as all of the rows in related tables that have foreign keys pointing to Table A's primary key. I was using Postgres, so the exact queries may differ, but the overall approach is the same. The biggest benefit of this approach is that it can be applied recursively to go arbitrarily deep.
TL;DR: the approach looks like this:
1) find all the tables/columns related to Table A
2) copy the necessary data into temporary tables
3) create a trigger and function to propagate primary key column updates to the related foreign key columns in the temporary tables
4) update the primary key column in the temporary tables to the next value in the auto-increment sequence
5) re-insert the data into the source tables, and drop the temporary tables/triggers/function
1) The first step is to query the information schema to find all of the tables and columns which are referencing Table A. In Postgres this might look like the following:
SELECT tc.table_name, kcu.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu
ON ccu.constraint_name = tc.constraint_name
WHERE constraint_type = 'FOREIGN KEY'
AND ccu.table_name='<Table A>'
AND ccu.column_name='<Primary Key>'
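SQLite has no information_schema, but step 1 can be approximated there with `PRAGMA foreign_key_list`, which returns one row per foreign key column as (id, seq, table, from, to, on_update, on_delete, match). A minimal sketch, with the helper name `referencing_columns` made up for illustration:

```python
import sqlite3


def referencing_columns(conn: sqlite3.Connection, target: str):
    """Return (table, fk_column, referenced_column) for every FK that points at `target`."""
    found = []
    tables = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    ]
    for table in tables:
        # Each row: (id, seq, table, from, to, on_update, on_delete, match)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            if fk[2] == target:
                found.append((table, fk[3], fk[4]))
    return found
```

The referenced column comes back as None when a foreign key is declared without an explicit column list (it then refers to the primary key).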
2) Next, we need to copy the data from Table A and from any other tables that reference Table A; let's say there is one called Table B. To start, let's create a temporary table for each of these tables and populate it with the data that we need to copy. This might look like the following:
CREATE TEMP TABLE temp_table_a AS (
SELECT * FROM <Table A> WHERE ...
);
CREATE TEMP TABLE temp_table_b AS (
SELECT * FROM <Table B> WHERE <Foreign Key> IN (
SELECT <Primary Key> FROM temp_table_a
)
);
3) We can now define a function that cascades primary key updates out to the related foreign key columns, and a trigger that executes it whenever the primary key column changes. For example:
CREATE OR REPLACE FUNCTION cascade_temp_table_a_pk()
RETURNS trigger AS
$$
BEGIN
UPDATE <Temp Table B> SET <Foreign Key> = NEW.<Primary Key>
WHERE <Foreign Key> = OLD.<Primary Key>;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trigger_temp_table_a
AFTER UPDATE
ON <Temp Table A>
FOR EACH ROW
WHEN (OLD.<Primary Key> != NEW.<Primary Key>)
EXECUTE PROCEDURE cascade_temp_table_a_pk();
4) Now we just update the primary key column of the temporary copy of Table A to the next value of the source table's sequence. This fires the trigger, and the updates are cascaded out to the foreign key columns in the temporary copy of Table B. In Postgres you can do the following:
UPDATE <Temp Table A>
SET <Primary Key> = nextval(pg_get_serial_sequence('<Table A>', '<Primary Key>'));
5) Insert the data from the temporary tables back into the source tables, then drop the temporary tables, trigger, and function.
INSERT INTO <Table A> SELECT * FROM <Temp Table A>;
INSERT INTO <Table B> SELECT * FROM <Temp Table B>;
DROP TRIGGER trigger_temp_table_a ON <Temp Table A>;
DROP FUNCTION cascade_temp_table_a_pk();
DROP TABLE <Temp Table A>, <Temp Table B>;
It is possible to turn this general approach into a script that calls itself recursively to go arbitrarily deep. I ended up doing just that in Python (our application used Django, so I could use the Django ORM to make some of this easier).
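To see steps 2 to 5 work together, here is a minimal SQLite sketch driven from Python, using the FOO and BAR tables from the question. SQLite has no sequences, so step 4 is replaced by shifting each copied key past the table's current maximum (an assumption that is only safe when nothing else inserts concurrently); `copy_data_set` is a hypothetical name, and the data set ids are interpolated with int() for brevity.

```python
import sqlite3


def copy_data_set(conn: sqlite3.Connection, src_ds: int, new_ds: int) -> None:
    """Deep-copy FOO/BAR rows of one data set via temp tables and a cascade trigger."""
    conn.executescript(f"""
        -- Step 2: copy the rows to duplicate into temporary tables
        CREATE TEMP TABLE temp_foo AS SELECT * FROM foo WHERE data_set_id = {int(src_ds)};
        CREATE TEMP TABLE temp_bar AS
            SELECT * FROM bar WHERE foo_id IN (SELECT foo_id FROM temp_foo);

        -- Step 3: cascade primary-key changes in temp_foo to temp_bar.foo_id
        CREATE TEMP TRIGGER cascade_temp_foo_pk AFTER UPDATE OF foo_id ON temp_foo
        FOR EACH ROW WHEN OLD.foo_id <> NEW.foo_id
        BEGIN
            UPDATE temp_bar SET foo_id = NEW.foo_id WHERE foo_id = OLD.foo_id;
        END;

        -- Step 4: new keys are all above the current maximum, so they never
        -- collide with an old key that the trigger has not rewritten yet
        UPDATE temp_foo SET foo_id = foo_id + (SELECT MAX(foo_id) FROM foo),
                            data_set_id = {int(new_ds)};
        UPDATE temp_bar SET bar_id = bar_id + (SELECT MAX(bar_id) FROM bar),
                            data_set_id = {int(new_ds)};

        -- Step 5: copy back and clean up (dropping temp_foo also drops its trigger)
        INSERT INTO foo SELECT * FROM temp_foo;
        INSERT INTO bar SELECT * FROM temp_bar;
        DROP TABLE temp_foo;
        DROP TABLE temp_bar;
    """)
```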

change ID number to smooth out duplicates in a table

I have run into a problem that I'm trying to solve: every day I import new records, each with an ID number, into a table.
Most of them are new (never seen in the system before), but some come in again. If an incoming ID number is already in the archive, and only if the row's data differs from the archived data, I need to append a letter to the end of the ID number. This must be done sequentially: if 12345 is seen a second time with different data, I change it to 12345A; if 12345 is seen again, with different data again, I change it to 12345B, and so on.
Originally I tried a WHILE loop that put all the 'seen again' records in a temp table, assigned A on the first pass, deleted those, assigned B to what was left, deleted those, and so on until the temp table was empty, but that hasn't worked out.
Alternatively, I've been thinking of trying subqueries, as in:
update table
set IDNO = (select max(idno) from archive) + 1
Any suggestions?
How about this as an idea? Mind you, this is basically pseudocode so adjust as you see fit.
With "src" as the table that all the data will ultimately be inserted into and "TMP" as your temporary table, and presuming that the ID column in TMP is a double:
do
update tmp set id = id + 0.01 where id in (select id from src);
until no_rows_changed;
alter table TMP change id into id varchar(255);
update TMP set id = concat(int(id), chr(round((id - int(id)) * 100) + 64));
insert into SRC select * from tmp;
What happens when you get to 12345Z?
Anyway, change the table structure slightly. Here's the recipe:
1) Drop any indexes on ID.
2) Split ID (apparently a varchar) into ID_Num (long int) and ID_Alpha (varchar, not null). Make the default value for ID_Alpha an empty string (''). So 12345B (varchar) becomes 12345 (long int) and 'B' (varchar), etc.
3) Create a unique, ideally clustered, index on columns ID_Num and ID_Alpha.
4) Make this the primary key. Or, if you must, use an auto-incrementing integer as a surrogate primary key.
Now, when adding new data, finding duplicate ID numbers is trivial, and the last ID_Alpha can be obtained with a simple MAX() operation.
Resolving duplicate IDs should then be an easier task, using either a WHILE loop or a cursor (if you must).
But it should also be possible to avoid "Row By Agonizing Row" (RBAR) processing and use a set-based approach. A few days of reading Jeff Moden's articles should give you ideas in that regard.
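A minimal row-by-row sketch of this recipe using Python's sqlite3 module, with a hypothetical `records` table in which one `payload` column stands in for "the data in the row". It is exactly the RBAR approach the answer suggests moving beyond, but it shows how the composite key turns "find the last suffix" into a MAX() lookup and makes the 12345Z limit explicit.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical table holding the split key: ID_Num plus ID_Alpha ('' for the first sighting)."""
    conn.execute("""
        CREATE TABLE records (
            id_num   INTEGER NOT NULL,
            id_alpha TEXT    NOT NULL DEFAULT '',
            payload  TEXT,
            PRIMARY KEY (id_num, id_alpha)
        )""")


def next_suffix(conn: sqlite3.Connection, id_num: int) -> str:
    """ID_Alpha for the next row with this ID_Num, found with a simple MAX()."""
    (current,) = conn.execute(
        "SELECT MAX(id_alpha) FROM records WHERE id_num = ?", (id_num,)
    ).fetchone()
    if current is None:
        return ""  # number never seen before
    if current == "":
        return "A"  # second sighting
    if current == "Z":
        raise ValueError(f"no suffix left after {id_num}Z")
    return chr(ord(current) + 1)


def add_record(conn: sqlite3.Connection, id_num: int, payload: str):
    """Insert unless identical data already exists under this ID_Num; return the full ID."""
    duplicate = conn.execute(
        "SELECT 1 FROM records WHERE id_num = ? AND payload = ?", (id_num, payload)
    ).fetchone()
    if duplicate:
        return None  # same data seen again: nothing to insert
    suffix = next_suffix(conn, id_num)
    conn.execute(
        "INSERT INTO records (id_num, id_alpha, payload) VALUES (?, ?, ?)",
        (id_num, suffix, payload),
    )
    return f"{id_num}{suffix}"
```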
Here is my final solution:
update a
set IDnum=b.IDnum
from tempimporttable A inner join
(select * from archivetable
where IDnum in
(select max(IDnum) from archivetable
where IDnum in
(select IDnum from tempimporttable)
group by left(IDnum,7)
)
) b
on b.IDnum like a.IDnum + '%'
WHERE
*row from tempimport table = row from archive table*
to set incoming rows to the same IDnum as old rows, and then
update a
set IDnum = case
when len((select max(IDnum) from archive where left(IDnum,7) = left(a.IDnum,7)))= 7 then a.IDnum + 'A'
else left(a.IDnum,7) + char(ascii(right((select max(IDnum) from archive where left(IDnum,7) = left(a.IDnum,7)),1))+1)
end
from tempimporttable a
where not exists ( *select rows from archive table* )
I don't know if anyone wants to delve too far into this, but I'd appreciate constructive criticism...

increase Ids in table

I would like to increase all ids in my table by 1000, because I need to insert data from another table that has exactly the same ids. What is the best way to do that?
update dbo.table set id = id + 1000
go
The best way to go is not to do that. You would have to change all related records as well, and if you are using identity columns it gets even more complicated. If you do anything wrong, you will seriously mess up your data integrity. I would suggest changing the ids of the data you want to insert instead; if you need to relate it back to the data in the other table, store its original id in a new column called something like table2id or database2id. If you can't change the existing table, you can use a lookup table that holds both the old id value and the new one.
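The lookup-table alternative could look like the following minimal sketch, using Python's sqlite3 module. The `items` and `id_map` tables and the `import_rows` function are hypothetical: incoming rows get fresh ids from the database, and `id_map` keeps the original id from the other table so related data can still be joined back.

```python
import sqlite3


def import_rows(conn: sqlite3.Connection, incoming):
    """Insert (source_id, name) rows under fresh ids, recording source id -> new id."""
    mapping = {}
    for source_id, name in incoming:
        cur = conn.execute("INSERT INTO items (name) VALUES (?)", (name,))  # id assigned by SQLite
        conn.execute(
            "INSERT INTO id_map (source_id, new_id) VALUES (?, ?)", (source_id, cur.lastrowid)
        )
        mapping[source_id] = cur.lastrowid
    conn.commit()
    return mapping
```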
Under no circumstances should you attempt something of this nature without taking a backup first.
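First, as HLGEM said, this seems to be a bad idea (think about the foreign keys that reference these ids: you must add 1000 to them too).
Second, dbo.table in your statement is just a placeholder for your own table name; in SQL Server 2008 the list of tables itself is exposed through the sys.tables catalog view.
Finally, you'll need to find the foreign key columns. As a rough heuristic based on column names, you can use this query: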
SELECT name,OBJECT_NAME(object_id)
FROM sys.columns
WHERE name like '%id' or name like 'id%'
--depends on where is 'id' in your columns names
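Here name is the column name and OBJECT_NAME(object_id) is the table name.
Then update everything. This needs dynamic SQL, because a table or column name cannot come from a column value in a plain UPDATE. The following is untested, so print and review the generated statements before running them: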
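CREATE TABLE #TablesWithIds (
columnName sysname,
tableName sysname
)
Insert into #TablesWithIds
SELECT name as columnName, OBJECT_NAME(object_id) as tableName
FROM sys.columns
WHERE name like '%id%'
AND OBJECTPROPERTY(object_id, 'IsUserTable') = 1 -- skip system objects and views
-- Build one UPDATE per (table, column) pair as dynamic SQL
DECLARE @sql nvarchar(max)
SET @sql = N''
SELECT @sql = @sql + N'UPDATE ' + QUOTENAME(tableName)
+ N' SET ' + QUOTENAME(columnName) + N' = ' + QUOTENAME(columnName) + N' + 1000;'
FROM #TablesWithIds
PRINT @sql -- review first, then run: EXEC sp_executesql @sql
drop table #TablesWithIds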