Update/Create table from select with changes in the column values in Oracle 11g (speed up) - sql

At work we have an update script for an Oracle 11g database that takes around 20 hours, and some of the most demanding statements are updates where we change some values, something like:
UPDATE table1 SET
column1 = DECODE(table1.column1,null,null,'no info','no info','default value'),
column2 = DECODE(table1.column2,null,null,'no info','no info','another default value'),
column3 = 'default value';
And so on; we have many updates like this. The problem is that the tables have around 10 million rows. We also have some updates where columns get a default value but remain nullable (I know that if a column has NOT NULL and DEFAULT constraints, adding it is almost immediate, because the default is kept in the catalog), so updating or adding such columns costs a lot of time.
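For reference, the fast case mentioned above would look something like this in 11g, where the default is recorded as metadata and no rows are rewritten (a minimal sketch; the column name and type are illustrative):
-- metadata-only in 11g because the column has both DEFAULT and NOT NULL
ALTER TABLE table1 ADD (col_new VARCHAR2(20) DEFAULT 'default value' NOT NULL);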
My approach is to recreate the table (as Tom suggests in https://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_QUESTION_ID:6407993912330 ). But I have no idea how to carry over some columns from the original table unchanged while switching others to a default value (columns that held sensitive info before the update), because we need to keep some info private.
So, my approach is something like this:
CREATE TABLE table1_tmp PARALLEL NOLOGGING
AS (SELECT col1, col2, col3, col4 FROM table1);
ALTER TABLE table1_tmp ADD (col5 VARCHAR2(10) DEFAULT 'some info' NOT NULL);
ALTER TABLE table1_tmp ADD (col6 VARCHAR2(10) DEFAULT 'some info' NOT NULL);
ALTER TABLE table1_tmp ADD (col7 VARCHAR2(10));
ALTER TABLE table1_tmp ADD (col8 VARCHAR2(10));
MERGE INTO table1_tmp tt
USING table1 t
ON (t.col1 = tt.col1)
WHEN MATCHED THEN
UPDATE SET
tt.col7 = 'some default value that may be null',
tt.col8 = 'some value that may be null';
I also tried creating the nullable columns as NOT NULL so the add would be fast, and that worked; the problem is that switching the columns back to nullable afterwards takes too much time. The code above also ended up consuming a great amount of time (more than one hour in the MERGE).
I hope someone has an idea on how to improve performance for operations like this.
Thanks in advance!

Maybe you can try using NVL while joining in the MERGE:
MERGE INTO table1_tmp tt
USING table1 t
ON (nvl(t.col1,'-3') = nvl(tt.col1,'-3'))
WHEN MATCHED THEN ....
If you don't want to update null values, you can also do it like this:
MERGE INTO table1_tmp tt
USING table1 t
ON (nvl(t.col1,'-3') = nvl(tt.col1,'-2'))
WHEN MATCHED THEN .....
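Putting that join together with the MERGE from the question, the whole statement might look like this (a sketch only; it assumes '-3' never occurs as a real col1 value on either side):
MERGE INTO table1_tmp tt
USING table1 t
ON (NVL(t.col1, '-3') = NVL(tt.col1, '-3'))
WHEN MATCHED THEN
UPDATE SET
  tt.col7 = 'some default value that may be null',
  tt.col8 = 'some value that may be null';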

In the end, I solved it by creating a temp table with the data from the original table, doing everything in the CREATE itself: filling in the default values, applying the DECODEs and so on, and if I wanted to set something to NULL, I did it right there. Something like:
CREATE TABLE table1_tmp (
column1 DEFAULT 'default message',
column2, --This column with no change at all
column3  --This will take the value from the DECODE below
) AS SELECT
'default message' column1,
column2, --This column with no change at all
DECODE(column3, 'Something', NULL, 'A', 'B') column3
FROM table1;
That is how I solved the problem. The time for copying a 23-million-row table was about 3 to 5 minutes, while the update used to take hours. Now I just need to set privileges, constraints, indexes and comments, and that's it; that stuff only takes seconds.
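Those follow-up steps might look roughly like this (a hypothetical sketch; the constraint, index, role and comment names are illustrative only):
-- re-add constraints, indexes, comments and grants on the new table
ALTER TABLE table1_tmp ADD CONSTRAINT table1_tmp_pk PRIMARY KEY (column2);
CREATE INDEX table1_tmp_ix1 ON table1_tmp (column3);
COMMENT ON COLUMN table1_tmp.column1 IS 'masked with a default message';
GRANT SELECT ON table1_tmp TO some_role;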
Thanks for the answer @thehazal; I could not check your approach, but it sounds interesting.

Related

INSERT statement is changing integers to null values

I'm creating a table with 3 million rows of data and 9 columns.
I am using the following syntax to insert my data.
INSERT INTO myTable
( column1,
column2,
column3,
...
column11,
problemColumn
)
Select
<exampleQuery>
One column (which I will refer to as problemColumn) ends up with 1.2 million null values in this table.
When I run exampleQuery on its own (not inserting it into the table), problemColumn returns 0 null values.
problemColumn is correctly defined as an integer when the table is created
problemColumn has 300,000 distinct values. Each value appears in the table at least once, which means that it can't be an issue of a poorly-formatted value
There is no obvious pattern of values being systematically deleted
Edit: Some additional clarifications:
There are no calculations or joins done on problemColumn. I am simply selecting that variable from another table
problemColumn is an integer in the source table, so it is not an issue of a mismatched variable type
Could this be an issue with the size of the table in the database? I cannot comprehend why a query's results would fundamentally change when it is used in an insert statement.
The most likely cause (I've done it myself) is fat fingers: the column you're inserting into is in the wrong position. It's hard to verify without seeing the actual code, but it might be as simple as:
insert into table
(column1,
column2)
select
column2,
column1
from somewhere
Second possibility - there's a trigger on the destination table, which is changing the data. One of the many reasons I hate triggers.
I don't know Teradata, but the point of an RDBMS is to be able to handle exactly this scenario, so it's very unlikely it's anything to do with the size. To verify this, please try to limit the query to 1 result, and see what happens.
If that doesn't work, please convert the results of that query into an insert statement using "values"
INSERT INTO myTable
( column1,
column2,
column3,
...
column11,
problemColumn
)
values
(....)
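As a quick, hedged sanity check before anything else, you could also count NULLs per column in the target to see whether the NULLs really concentrate in problemColumn or whether a shifted column list spread them elsewhere (column names are the placeholders used above):
SELECT
    SUM(CASE WHEN problemColumn IS NULL THEN 1 ELSE 0 END) AS problemcolumn_nulls,
    SUM(CASE WHEN column11      IS NULL THEN 1 ELSE 0 END) AS column11_nulls
FROM myTable;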

ORA-00947 - not enough values: Occurs in one server but not another

I am working on a project which has to add one column to an existing table.
It is like this:
The OLD TBL Layout
OldTbl(
column1 number(1) not null,
column2 number(1) not null
);
SQL TO Create the New TBL
create table NewTbl(
column1 number(1) not null,
column2 number(1) not null,
column3 number(1)
);
When I try to insert the data with the SQL below, it executes successfully on one Oracle server, but on another Oracle server I get the error "ORA-00947: not enough values":
insert into NewTbl select
column1,
column2
from OldTbl;
Is there any Oracle option that may cause this kind of difference?
ORA-00947: not enough values
This is the error you received, and it means your table actually has more columns than you specified in the INSERT.
Perhaps you didn't add the new column on one of the servers.
There is also a different syntax for INSERT, which is more readable: you list the column names as well. When such a SQL statement is issued, the INSERT still works even if a column is left out (unless that column is NOT NULL), with NULL going into the omitted columns.
INSERT INTO TABLE1
(COLUMN1,
COLUMN2)
SELECT
COLUMN1,
COLUMN2
FROM
TABLE2
insert into NewTbl select
column1,
column2
from OldTbl;
The above query is wrong because your new table has three columns while your SELECT lists only two. Had the number and the order of the columns been the same, it would have worked.
If the number or the order of the columns differs, then you must list the column names explicitly, in the correct order.
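In this case the explicit-column version is enough, since column3 is nullable and can simply be left NULL for the copied rows (a minimal sketch using the tables from the question):
INSERT INTO NewTbl (column1, column2)
SELECT column1, column2
FROM OldTbl;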
I would prefer CTAS (CREATE TABLE AS SELECT) here; it would be faster than the INSERT.
CREATE TABLE new_tbl AS
SELECT column1, column2, 1 AS column3 FROM old_tbl;
You could use NOLOGGING and PARALLEL to increase the performance.
CREATE TABLE new_tbl NOLOGGING PARALLEL 4 AS
SELECT column1, column2, 1 AS column3 FROM old_tbl;
This will create the new table with 3 columns: the first two will have data from the old table, and the third will have the value 1 for all rows. You could use any value for the third column; I kept it as 1 because you wanted the third column to have data type NUMBER(1).

Update if different/changed

Is it possible to perform an update statement in sql, but only update if the updates are different?
for example
if in the database, col1 = "hello"
update table1 set col1 = 'hello'
should not perform any kind of update
however, if
update table1 set col1 = 'bye'
this should perform an update.
During query compilation and execution, SQL Server does not take the time to figure out whether an UPDATE statement will actually change any values or not. It just performs the writes as expected, even if unnecessary.
In the scenario like
update table1 set col1 = 'hello'
you might think SQL won’t do anything, but it will – it will perform all of the writes necessary as if you’d actually changed the value. This occurs for both the physical table (or clustered index) as well as any non-clustered indexes defined on that column. This causes writes to the physical tables/indexes, recalculating of indexes and transaction log writes. When working with large data sets, there is huge performance benefits to only updating rows that will receive a change.
If we want to avoid the overhead of these writes when they are not necessary, we have to devise a way to check whether the update is needed at all. One way to check would be to add a filter like WHERE col1 <> 'hello':
update table1 set col1 = 'hello' where col1 <> 'hello'
But this would not perform well in some cases, for example if you were updating multiple columns in a table with many rows and only a small subset of those rows would actually change. This is because you then need to filter on all of those columns, non-equality predicates are generally not able to use index seeks, and you still pay the overhead of the table and index writes and transaction log entries mentioned above.
But there is a much better alternative using a combination of an EXISTS clause with an EXCEPT clause. The idea is to compare the values in the destination row to the values in the matching source row to determine if an update is actually needed. Look at the modified query below and examine the additional query filter starting with EXISTS. Note how inside the EXISTS clause the SELECT statements have no FROM clause. That part is particularly important because this only adds on an additional constant scan and a filter operation in the query plan (the cost of both is trivial). So what you end up with is a very lightweight method for determining if an UPDATE is even needed in the first place, avoiding unnecessary write overhead.
update table1 set col1 = 'hello'
/* AVOID NET ZERO CHANGES */
where exists
(
/* DESTINATION */
select table1.col1
except
/* SOURCE */
select col1 = 'hello'
)
This looks overly complicated compared to a simple WHERE clause for the simple scenario in the original question, where you are updating one column for all rows with a literal value. However, this technique works very well if you are updating multiple columns in a table, the source of your update is another query, and you want to minimize writes and transaction log entries. It also performs better than testing every field with <>.
A more complete example might be
update table1
set col1 = 'hello',
col2 = 'hello',
col3 = 'hello'
/* Only update rows from CustomerId 100, 101, 102 & 103 */
where table1.CustomerId IN (100, 101, 102, 103)
/* AVOID NET ZERO CHANGES */
and exists
(
/* DESTINATION */
select table1.col1,
table1.col2,
table1.col3
except
/* SOURCE */
select z.col1,
z.col2,
z.col3
from #anytemptableorsubquery z
where z.CustomerId = table1.CustomerId
)
The idea is not to perform any update if the new value is the same as the one in the DB right now:
WHERE col1 != @newValue
(obviously there should also be some Id field to identify the row)
WHERE Id = @Id AND col1 != @newValue
PS: Originally you wanted to update only if the value is 'bye', so you could just add AND col1 = 'bye', but I feel that this is redundant.
PS 2: (From a comment) Also note that this won't update the value if col1 is NULL, so if NULL is a possibility, make it WHERE Id = @Id AND (col1 != @newValue OR col1 IS NULL).
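Put together, the whole statement might look like this (a minimal sketch; @Id and @newValue are the parameter placeholders from this answer, and the declared types and values are assumptions):
DECLARE @Id int = 1, @newValue varchar(50) = 'bye';

UPDATE table1
SET col1 = @newValue
WHERE Id = @Id
  AND (col1 != @newValue OR col1 IS NULL);  -- skip rows that already hold the value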
If you want to change the field to 'hello' only if it is 'bye', use this:
UPDATE table1
SET col1 = 'hello'
WHERE col1 = 'bye'
If you want to update only if it is different that 'hello', use:
UPDATE table1
SET col1 = 'hello'
WHERE col1 <> 'hello'
Is there a reason for this strange approach? As Daniel commented, there is no special gain - except perhaps if you have thousands of rows with col1='hello'. Is that the case?
This is possible with a before-update trigger.
In this trigger you can compare the old values with the new values and cancel the update if they don't differ. But this will then lead to an error on the caller's side.
I don't know why you want to do this, but here are several possibilities:
Performance: There is no performance gain here, because the update would not only need to find the correct row but additionally compare the data.
Trigger: If you want the trigger to be fired only if there was a real change, you need to implement the trigger so that it compares all old values to the new values before doing anything (see the sketch after the procedure below).
CREATE OR REPLACE PROCEDURE stackoverflow([your_value] IN TYPE) AS
BEGIN
UPDATE [your_table] t
SET t.[your_column] = [your_value]
WHERE t.[your_column] != [your_value];
COMMIT;
EXCEPTION
[YOUR_EXCEPTION];
END stackoverflow;
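For the trigger point above, a hedged T-SQL sketch of the "only react to real changes" idea might look like this (SQL Server has no BEFORE trigger, so it uses AFTER UPDATE; the table, column and key names are illustrative):
CREATE TRIGGER trg_table1_col1_changed ON table1
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- EXCEPT makes the old/new comparison NULL-safe
    IF EXISTS (SELECT i.Id, i.col1 FROM inserted i
               EXCEPT
               SELECT d.Id, d.col1 FROM deleted d)
    BEGIN
        PRINT 'col1 really changed for at least one row';  -- audit/notify here
    END
END;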
You need a unique key id in your table (let's suppose its value is 1) to do something like:
UPDATE table1 SET col1='hello' WHERE id=1 AND col1!='hello'
Old question but none of the answers correctly address null values.
Using <> or != will get you into trouble when comparing values if there is a potential NULL in the new or old value. To safely update only when the value has changed, use the IS DISTINCT FROM operator in Postgres.
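A minimal PostgreSQL sketch, assuming an id column identifies the row (IS DISTINCT FROM treats NULL as a comparable value, so a NULL column still gets updated):
UPDATE table1
SET col1 = 'hello'
WHERE id = 1
  AND col1 IS DISTINCT FROM 'hello';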
I think this should do the trick for ya...
create trigger [trigger_name] on [table_name]
for insert
AS declare @new_val datatype, @id int;
select @new_val = i.column_name from inserted i;
select @id = i.Id from inserted i;
update table_name set column_name = @new_val
where table_name.Id = @id and column_name != @new_val;

How to use multiple identity numbers in one table?

I have a web application that creates printable forms; these forms have a unique number on them. The problem is that I have 2 forms for which separate number ranges need to be created.
ie)
Form1- Numbered 2000000-2999999
Form2- Numbered 3000000-3999999
dbo.test2 - is my form information table
Tsel - is my autoinc table for the 3000000 series numbers
Tadv - is my autoinc table for the 2000000 series numbers
What I have done is create 2 tables with just an autoinc column (one for the 2000000-series numbers and one for the 3000000-series numbers). I then created a trigger that adds a record to the corresponding table, reads back the autoinc number, and stores it in the table that holds the form information, so each form gets the just-created autoinc number for the right series.
Although it does work, I'm concerned that the numbers will get messed up under load.
I'm not sure @@IDENTITY will always return the right value when many people are using the system. (I cannot have duplicates, and I need to use the numbering scheme shown above.)
See code below.
**** TRIGGER ****
CREATE TRIGGER MAKEANID2 ON dbo.test2
AFTER INSERT
AS
SET NOCOUNT ON
declare @someid int
declare @someid2 int
declare @startfrom int
declare @test1 varchar(10)
select @someid = @@IDENTITY
select @test1 = (select name1 from test2 where sysid = @someid)
if @test1 = 'select'
begin
insert into Tsel default values
select @someid2 = @@IDENTITY
end
if @test1 = 'adv'
begin
insert into Tadv default values
select @someid2 = @@IDENTITY
end
update test2
set name2 = @someid2 where sysid = @someid
SET NOCOUNT OFF
The best way to keep the two IDs in sync is to create a persisted computed column based on the actual identity column, where Col1 is the identity column and Col2 is the persisted computed column derived from Col1 by some formula. You can then even create indexes on computed columns.
test this out:
CREATE TABLE YourTable
(Col1 int not null identity(2000000,1)
,Col2 AS (Col1-2000000+3000000) PERSISTED
,Col3 varchar(5)
)
GO
insert into YourTable (col3) values ('a')
insert into YourTable (col3) SELECT 'b' UNION SELECT 'c'
SELECT * FROM YourTable
OUTPUT:
Col1 Col2 Col3
----------- ----------- -----
2000000 3000000 a
2000001 3000001 b
2000002 3000002 c
(3 row(s) affected)
EDIT: After the OP's comments, I'm still not 100% sure what you are after.
I never used SQL Server 2000 (we skipped that version), and I don't really want to look up how to do everything in that version; it is so limited without the OUTPUT clause, ROW_NUMBER(), CTEs, etc.
I can think of three methods:
1) You could just create a sequence table where you have 2 rows, one for A and one for B. Each time you need to insert a row, look up, increment, and save the value for the type of sequence you need, and then insert with that value. For example, if you are inserting a type "A" row, do this:
INSERT INTO test2
(col1, col2, col3,...)
SELECT
ISNULL(MAX(NextSeq),0)+1, col2, col3,...
FROM YourSequenceTable WITH (UPDLOCK, HOLDLOCK)
WHERE SequenceType='A'
UPDATE YourSequenceTable
SET NextSeq=ISNULL(NextSeq,0)+1
WHERE SequenceType='A'
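One caveat worth hedging: HOLDLOCK only holds its lock until the end of the current transaction, so the read-and-increment pair above presumably needs to run inside a single explicit transaction to serialize concurrent callers. A sketch (the col2/col3 values are placeholders):
BEGIN TRANSACTION;

INSERT INTO test2 (col1, col2, col3)
SELECT ISNULL(MAX(NextSeq), 0) + 1, 'some col2 value', 'some col3 value'
FROM YourSequenceTable WITH (UPDLOCK, HOLDLOCK)
WHERE SequenceType = 'A';

UPDATE YourSequenceTable
SET NextSeq = ISNULL(NextSeq, 0) + 1
WHERE SequenceType = 'A';

COMMIT TRANSACTION;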
2) Change your table structure to just save the data in Tsel or Tadv and have a trigger insert into a third, common table where you can have your additional "common" identity. The common table would be like:
CREATE TABLE CommonTable (
    ID     int NOT NULL IDENTITY(1,1) PRIMARY KEY,
    TselID int NULL,  -- FK to Tsel.PK
    TadvID int NULL   -- FK to Tadv.PK
)
3) If you need a single table, try this, which is a real hack. Change your Tsel and Tadv tables to contain all the necessary columns. From the application, INSERT INTO Tsel when the value is 'select', and have a trigger grab that identity value, INSERT it into test2, and then remove the row from Tsel. Likewise, when the value is 'adv', INSERT INTO Tadv from the application and have a trigger on that table insert the data into test2 and remove the row from Tadv. You need all data columns in Tsel and Tadv so the trigger can copy the values to test2, but the trigger removes the rows afterwards (the identity will remain sequential even though the original rows are removed).
your Tsel trigger would look like:
CREATE Trigger MAKEANID2_Tsel ON dbo.Tsel
AFTER INSERT
AS
--copy data from Tsel into test2; test2 can still have its own identity value
INSERT INTO test2
(PK, col1, col2, col3,...)
SELECT
col0, col1, col2, col3,....
FROM INSERTED
--remove rows from Tsel, which were just copied and not needed anymore.
DELETE Tsel
WHERE PK IN (SELECT PK FROM INSERTED)
GO
You are right to worry about @@IDENTITY; it is not recommended. If someone else adds a different trigger that inserts a row with an identity and that trigger fires first, that is the value you will get.
But you have much bigger problems. Your trigger is designed to work on only one record at a time. That is a very, very bad thing to do with a trigger. Triggers operate on sets of data and must ALWAYS (even if you think there will never be more than one record inserted at a time) be set up to handle sets of data, not one record. Further, you don't need to ask for the identity: you have the identities of all records inserted in the batch in a pseudotable available in triggers called inserted.
Now, reading one of your comments, you say you can't have any missing values at all. In that case you cannot, under any circumstance, use an identity column, as it will have gaps whenever a transaction is rolled back. You will have to write your own process to create the numbers based on the last number, and watch out for race conditions.

Add column2 before column1

I have a table structure with 3 columns (column1, column2, column3) and I want to add another column with a SQL statement like:
alter table tbl_name add column4
but I need to put it between column1 and column2.
Can I do something like in MySQL:
alter table tbl_name add column4 after column1
I don't think SQL Server allows you to do anything like that. If you want to put a column in the middle, you'll need to create a new table with the desired layout, migrate the data, delete the old table, and rename the new table.
@bdukes is correct. This is essentially what SSMS does when you add a column in any place other than the last position of the table. You could, however, achieve something similar using views. That is, simply add your column to the end of the table, then create a view which has the columns in a different order. Use the view instead of the actual table. I only offer this as one alternative that can be useful in certain situations; I'm not necessarily recommending it for your situation. Generally, I use the designer in SSMS and haven't had any problems with it updating tables when inserting a column. Backups, of course, are your friend!
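A minimal sketch of that view idea, assuming SQL Server and the column names from the question (the view name and the nchar(10) type are illustrative):
ALTER TABLE tbl_name ADD column4 nchar(10) NULL;   -- physically lands at the end
GO
CREATE VIEW tbl_name_ordered AS
SELECT column1, column4, column2, column3          -- present the desired order
FROM tbl_name;
GO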
A more fundamental question is why you want to do this? Other than the default display of columns in the SQL Enterprise manager, the order of columns is irrelevant. You can order the columns output by a SQL query any way you want, no matter how the columns are 'natively' ordered inside the database.
In fact, from an academic perspective, one of the cardinal properties of a RDBMS 'Relation' (the academic name for a table), is that the 'attributes' (columns) of a relation are unordered.
This is not to say that wanting the 'default' order to be a certain way is not reason enough; I often drop and recreate tables for exactly that reason. But just understand that the order doesn't matter for anything else.
In SQL Server 2005 you can save a change script of table changes. For instance, I created a test table with col1, col2 and col3, then added col4 between col1 and col2. I saved the change script and this is what it generated.
Have a look
/* To prevent any potential data loss issues, you should review this script in detail before running it outside the context of the database designer.*/
BEGIN TRANSACTION
SET QUOTED_IDENTIFIER ON
SET ARITHABORT ON
SET NUMERIC_ROUNDABORT OFF
SET CONCAT_NULL_YIELDS_NULL ON
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
COMMIT
BEGIN TRANSACTION
GO
CREATE TABLE dbo.Tmp_zzzzzzzz
(
col1 nchar(10) NULL,
col4 nchar(10) NULL,
col2 nchar(10) NULL,
col3 nchar(10) NULL
) ON [PRIMARY]
GO
IF EXISTS(SELECT * FROM dbo.zzzzzzzz)
EXEC('INSERT INTO dbo.Tmp_zzzzzzzz (col1, col2, col3)
SELECT col1, col2, col3 FROM dbo.zzzzzzzz WITH (HOLDLOCK TABLOCKX)')
GO
DROP TABLE dbo.zzzzzzzz
GO
EXECUTE sp_rename N'dbo.Tmp_zzzzzzzz', N'zzzzzzzz', 'OBJECT'
GO
COMMIT
It seems like there is no way to simply add the column in place. Also, be aware that this drops the original table, so it is good that the script runs inside a transaction.