Moving data from one column to another in PostgreSQL

Sometimes, one might want to move some data from one column to another. By moving (in contrast to copying), I mean that the new column was originally NULL before the operation, and the old column should be set to NULL after the operation.
I have a table defined as such:
CREATE TABLE photos(id BIGSERIAL PRIMARY KEY, photo1 BYTEA, photo2 BYTEA);
Suppose there is an entry in the table where photo1 contains some data and photo2 is NULL. I would like to make an UPDATE query such that photo1 becomes NULL and photo2 contains the data that was originally in photo1.
I issue the following SQL command (WHERE clause left out for brevity):
UPDATE photos SET photo2 = photo1, photo1 = NULL;
It seems to work.
I also tried it this way:
UPDATE photos SET photo1 = NULL, photo2 = photo1;
It also seems to work.
But is it guaranteed to work? Specifically, could photo1 be set to NULL before photo2 is set to photo1, thereby causing me to end up with NULL in both columns?
As an aside, this standard UPDATE syntax seems inefficient when my BYTEAs are large, as photo2 has to be copied byte-by-byte from photo1, when a simple swapping of pointers might have sufficed. Maybe there is a more efficient way that I don't know about?

This is definitely safe.
Column-references in the UPDATE refer to the old columns, not the new values. There is in fact no way to reference a computed new value from another column.
See, e.g.
CREATE TABLE x (a integer, b integer);
INSERT INTO x (a,b) VALUES (1,1), (2,2);
UPDATE x SET a = a + 1, b = a + b;
results in
test=> SELECT * FROM x;
a | b
---+---
2 | 2
3 | 4
... and the ordering of assignments is not significant. If you try to multiply-assign a value, you'll get
test=> UPDATE x SET a = a + 1, a = a + 1;
ERROR: multiple assignments to same column "a"
because it makes no sense to assign to the same column multiple times, given that both expressions reference the old tuple values, and order is not significant.
However, to avoid a full table rewrite in this case, I would just ALTER TABLE ... RENAME COLUMN ... TO ..., then CREATE the new column with the old name.
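For example, a minimal sketch of that rename approach in PostgreSQL, assuming every row is being moved and the existing photo2 column is empty and disposable:
BEGIN;
ALTER TABLE photos DROP COLUMN photo2;              -- discard the old, all-NULL column
ALTER TABLE photos RENAME COLUMN photo1 TO photo2;  -- the data "moves" without being rewritten
ALTER TABLE photos ADD COLUMN photo1 BYTEA;         -- recreate the old name, NULL everywhere
COMMIT;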

Related

UPDATE two columns with new values in a large table

We have a table like:
mytable (pid, string_value, int_value)
This table has more than 20M rows in total. Now we have a feature that needs to mark all the rows in this table as invalid, so we need to update the columns to string_value = NULL and int_value = 0, which indicates an invalid row (we still want to keep the pid, as it is important to us).
So what is the best way?
I use the following SQL:
UPDATE Mytable
SET string_value = NULL,
int_value = 0;
but this query takes more than 4 minutes in my test env. Is there any better way we can improve it?
Updating all the rows can be quite expensive. Often, it is faster to empty the table and reload it.
In generic SQL this looks like:
create table mytable_temp as
select pid
from mytable;
truncate table mytable; -- back it up first!
insert into mytable (pid, string_value, int_value)
select pid, null, 0
from mytable_temp;
The creation of the temporary table may use different syntax, depending on your database.
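For instance, in dialects that support it (PostgreSQL and MySQL both do), the temporary-table step might look like this:
CREATE TEMPORARY TABLE mytable_temp AS
SELECT pid
FROM mytable;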
Updates can take time to complete. Another way of achieving this is to follow these steps (sketched below):
Add new columns with the values you need set as their default values.
Drop the original columns.
Rename the new columns to the names of the original columns.
You can then drop the default values on the new columns.
This needs to be tested, as different DBMSs allow different levels of table alteration (e.g., not all DBMSs allow dropping a default or dropping a column).
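A rough sketch of those steps in SQL Server-style syntax (the column types and the sp_rename calls are assumptions, not taken from the question):
-- 1. Add replacement columns whose defaults are the values you want:
ALTER TABLE mytable ADD string_value_new VARCHAR(100) NULL,
                        int_value_new INT NOT NULL DEFAULT 0;
-- 2. Drop the original columns:
ALTER TABLE mytable DROP COLUMN string_value, int_value;
-- 3. Rename the new columns back to the original names:
EXEC sp_rename 'mytable.string_value_new', 'string_value', 'COLUMN';
EXEC sp_rename 'mytable.int_value_new', 'int_value', 'COLUMN';
-- 4. Optionally drop the DEFAULT constraint that was created for int_value.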

cannot insert value NULL into column error shows wrong column name

I've added a new column (NewValue) to my table; it holds an int and allows nulls. Now I want to update the column, but my insert statement only attempts to update the first column in the table, not the one I specified.
I basically start with a temp table that I put my initial data into and it has two columns like this:
create table #tempTable
(
OldValue int,
NewValue int
)
I then do an insert into that table, and depending on the data, NewValue can be null.
Example data in #tempTable:
OldValue NewValue
-------- --------
34556 8765432
34557 7654321
34558 null
Once that's complete I planned to insert NewValue into the primary table like so:
insert into myPrimaryTable(NewValue)
select tt.NewValue from #tempTable tt
left join myPrimaryTable mpt on mpt.Id = tt.OldValue
where tt.NewValue is not null
I only want NewValue applied to rows in myPrimaryTable where the Id matches the OldValue. However, when I try to execute this code, I get the following error:
Cannot insert the value NULL into column 'myCode', table 'myPrimaryTable'; column does not allow nulls. INSERT fails.
But I'm not trying to insert into 'myCode'; I specified 'NewValue' as the column, but it doesn't seem to see it. I've checked NewValue: it is an int column, it allows nulls, and it does exist on the right table in the right database. The column 'myCode' is actually the second column in the table. Could someone please point me in the right direction with this error?
Thanks in advance.
INSERT always creates new rows, it never modifies existing rows. If you skip specifying a value for a column in an INSERT and that column has no DEFAULT bound to it and is not identity, that column will be NULL in the new row--thus your error. I believe you might be looking for an UPDATE instead of an INSERT.
Here's a potential query that might work for you:
UPDATE mpt
SET
mpt.NewValue = tt.NewValue
FROM
myPrimaryTable mpt
INNER JOIN #tempTable tt
ON mpt.Id = tt.OldValue -- really?
WHERE
tt.NewValue IS NOT NULL;
Note that I changed it to an INNER JOIN. A LEFT JOIN is clearly incorrect since you are filtering #tempTable for only rows with values, and don't want to update mpt where there is no match to tt--so LEFT JOIN expresses the wrong logical join type.
I put "really?" as a comment on the ON clause since I was wondering if OldValue is really an Id. It probably is--you know your table best. It just raised a mild red flag in my mind to see an Id column being compared to a column that does not have Id in its name (so if it is correct, I would suggest OldId as a better column choice than OldValue).
Also, I recommend that you never name a column just Id again--column names should be the same in every table in the database. Also, when it comes join time you will be more likely to make mistakes when your columns from different tables can coincide. It is much better to follow the format of SomethingId in the Something table, instead of just Id. Correspondingly, the suggested old column name would be OldSomethingId.

change ID number to smooth out duplicates in a table

I have run into a problem that I'm trying to solve: every day I import new records, each with an ID number, into a table.
Most of them are new (never seen in the system before), but some are coming in again. What I need to do is append an alpha character to the end of the ID number if the number is found in the archive, but only if the data in the row is different from the data in the archive, and this needs to be done sequentially: if 12345 is seen a second time with different data, I change it to 12345A, and if 12345 is seen again and is again different, I change it to 12345B, etc.
Originally I tried using a while loop that put all the 'seen again' records in a temp table, assigned A the first time, deleted those, assigned B to what was left, deleted those, and so on until the temp table was empty, but that hasn't worked out.
Alternately, I've been thinking of trying subqueries as in:
update table
set IDNO = (select max(IDNO) from archive) + 1
Any suggestions?
How about this as an idea? Mind you, this is basically pseudocode so adjust as you see fit.
With "src" as the table that all the data will ultimately be inserted into, and "TMP" as your temporary table.. and this is presuming that the ID column in TMP is a double.
do
update tmp set id = id + 0.01 where id in (select id from src);
until no_rows_changed;
alter table TMP change id id varchar(255);
update TMP set id = concat(int(id), chr((id - int(id)) * 100 + 64));
insert into SRC select * from tmp;
What happens when you get to 12345Z?
Anyway, change the table structure slightly, here's the recipe:
Drop any indices on ID.
Split ID (apparently varchar) into ID_Num (long int) and ID_Alpha (varchar, not null). Make the default value for ID_Alpha an empty string ('').
So, 12345B (varchar) becomes 12345 (long int) and 'B' (varchar), etc.
Create a unique, ideally clustered, index on columns ID_Num and ID_Alpha.
Make this the primary key. Or, if you must, use an auto-incrementing integer as a pseudo primary key.
Now, when adding new data, finding duplicate ID numbers is trivial, and the last ID_Alpha can be obtained with a simple max() operation.
Resolving duplicate ID's should now be an easier task, using either a while loop or a cursor (if you must).
But it should also be possible to avoid "row by agonizing row" (RBAR) processing and use a set-based approach. A few days of reading Jeff Moden articles should give you ideas in that regard.
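For example, a hedged sketch of that max() lookup, using the table names from the final solution below and the ID_Num/ID_Alpha columns described above:
-- For each incoming ID, find the highest suffix already used in the archive
-- ('' sorts before 'A', 'B', ..., so IDs with no suffix yet return ''):
SELECT t.ID_Num, MAX(a.ID_Alpha) AS last_alpha
FROM tempimporttable t
INNER JOIN archivetable a ON a.ID_Num = t.ID_Num
GROUP BY t.ID_Num;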
Here is my final solution:
update a
set IDnum=b.IDnum
from tempimporttable a inner join
(select * from archivetable
where IDnum in
(select max(IDnum) from archivetable
where IDnum in
(select IDnum from tempimporttable)
group by left(IDnum,7)
)
) b
on b.IDnum like a.IDnum + '%'
WHERE
*row from tempimport table = row from archive table*
to set incoming rows to the same IDnum as old rows, and then
update a
set patient_account_number = case
when len((select max(IDnum) from archive where left(IDnum,7) = left(a.IDnum,7)))= 7 then a.IDnum + 'A'
else left(a.IDnum,7) + char(ascii(right((select max(IDnum) from archive where left(IDnum,7) = left(a.IDnum,7)),1))+1)
end
from tempimporttable a
where not exists ( *select rows from archive table* )
I don't know if anyone wants to delve too far into this, but I appreciate constructive criticism...

Doing UPSERT when row is referenced by a FK

Let's say that I have a table of items, and for each item, there can be additional information stored for it, which goes into a second table. The additional information is referenced by a FK in the first table, which can be NULL (if the item doesn't have additional info).
TABLE item (
...
item_addtl_info_id INTEGER
)
CONSTRAINT fk_item_addtl_info FOREIGN KEY (item_addtl_info_id)
REFERENCES addtl_info (addtl_info_id)
TABLE addtl_info (
addtl_info_id INTEGER NOT NULL
GENERATED BY DEFAULT
AS IDENTITY (
INCREMENT BY 1
NO CACHE
),
addtl_info_text VARCHAR(100)
...
CONSTRAINT pk_addtl_info PRIMARY KEY (addtl_info_id)
)
What is the "best practice" to update an item's additional info (in IBM DB2 SQL, preferably)?
It should be an UPSERT operation, meaning that if additional info does not yet exist then a new record is created in the second table, but if it does, then it is only updated, and the FK in the first table does not change.
So imperatively, this is the logic:
UPSERT(item, item_info):
CASE WHEN item.item_addtl_info_id IS NULL THEN
INSERT INTO addtl_info (item_info)
UPDATE item.item_addtl_info_id (addtl_info.addtl_info_id)
^^^^^^^^^^^^^
ELSE
UPDATE addtl_info (item_info)
END
My main problem is how to get the newly inserted addtl_info row's id (underlined above). In a stored proc I can request the id from a sequence and store it in a variable, but maybe there is a more straightforward way. Isn't it something that comes up all the time when programming databases?
I mean, I'm really not interested in what the id of the addtl_info record is as long as it remains unique and is referenced properly. So using sequences seems a bit of an overkill to me in this case.
As a matter of fact, this UPSERT operation should be part of the SQL language as a standard operation (maybe it is, and I just don't know about it?)...
The syntax I was looking for is:
SELECT * FROM NEW TABLE ( INSERT INTO phone_book VALUES ( 'Peter Doe','555-2323' ) )
from Wikipedia (http://en.wikipedia.org/wiki/Insert_%28SQL%29)
This is how to refer to the record that was just inserted in the table.
My colleague called this construct an "in-place trigger", which is what it really is...
Here is the first version that I put together as a compound SQL statement:
begin atomic
declare addtl_id integer;
set addtl_id = (select item_addtl_info_id from item where item.item_id = XXX);
if addtl_id is null
then
set addtl_id = (select addtl_info_id from new table
(insert into addtl_info
(addtl_info_text)
values ('My brand new additional info')
)
);
update item set item.item_addtl_info_id = addtl_id
where item.item_id = XXX;
else
update addtl_info set addtl_info_text = 'My updated additional info'
where addtl_info.addtl_info_id = addtl_id;
end if;
end
XXX being equal to the item id to be updated - this code can now be easily inserted into a sproc, and XXX can be converted to an input parameter.
I also tried using MERGE INTO, but I couldn't figure out a syntax for updating a table different from what was specified as the target.
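For reference, this is roughly what a single-table MERGE sketch would look like (names taken from the compound statement above; syntax only illustrative): it can upsert the addtl_info row, but it cannot also set item.item_addtl_info_id in the same statement, which is why the compound statement was needed.
MERGE INTO addtl_info ai
USING (SELECT item_addtl_info_id FROM item WHERE item_id = XXX) AS src
ON ai.addtl_info_id = src.item_addtl_info_id
WHEN MATCHED THEN
UPDATE SET addtl_info_text = 'My updated additional info'
WHEN NOT MATCHED THEN
INSERT (addtl_info_text) VALUES ('My brand new additional info');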

Custom sort in SQL Server

I have a table where the results are sorted using an "ORDER" column, eg:
Doc_Id  Doc_Value  Doc_Order
------  ---------  ---------
1       aaa        1
12      xxx        5
2       bbb        12
3       ccc        24
My issue is to initially set up this order column as efficiently and reusably as possible.
My initial take was to set up a scalar function that could be used as a default value when a new entry is added to the table:
ALTER FUNCTION [dbo].[Documents_Initial_Order]
( )
RETURNS int
AS
BEGIN
RETURN (SELECT ISNULL(MAX(DOC_ORDER),0) + 1 FROM dbo.Documents)
END
When a user wants to permute 2 documents, I can then easily switch the 2 orders.
It works nicely, but I now have a second table I need to set up the same way, and I am quite sure there is a nicer way to do it. Any idea?
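For what it's worth, a sketch of the swap described above (document ids 1 and 12 are just placeholders from the sample data; within a single UPDATE, the old values are read before any are written):
UPDATE d1
SET d1.Doc_Order = d2.Doc_Order
FROM dbo.Documents d1
INNER JOIN dbo.Documents d2
ON (d1.Doc_Id = 1 AND d2.Doc_Id = 12)
OR (d1.Doc_Id = 12 AND d2.Doc_Id = 1);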
Based on your comment, I think you have a very workable solution. You could make it a little more user-friendly by specifying it as a default:
alter table documents
add constraint constraint_name
default (dbo.documents_initial_order()) for doc_order
As an alternative, you could create an update trigger that copies the identity field to the doc_order field after an insert:
create trigger Doc_Trigger
on Documents
for insert
as
update d
set d.doc_order = d.doc_id
from Documents d
inner join inserted i on i.doc_id = d.doc_id
Example defining doc_id as an identity column:
create table Documents (
doc_id int identity primary key,
doc_order int,
doc_value ntext
)
It sounds like you want an identity column that you can then override once it gets its initial value. One solution would be to have two columns: one called "InitialOrder", which is an auto-increment identity column, and a second column called doc_order that is initially set to the same value as the InitialOrder field (perhaps as part of the insert trigger, or of a stored procedure if you are doing inserts that way), but which the user is able to edit.
It does require an extra few bytes per record, but it solves your problem, and if it's of any value at all, you would have both the initial document order and the user-reset order available.
Also, I am not sure if your doc_order needs to be unique or not, but if not, you can then sort return values by doc_order and InitialOrder to ensure a consistent return sequence.
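A minimal sketch of that two-column idea (column and trigger names are illustrative; create the trigger in its own batch):
create table Documents (
doc_id int primary key,
doc_value ntext,
InitialOrder int identity, -- fixed insertion order, never edited
doc_order int              -- user-editable order, seeded from InitialOrder below
)
create trigger Doc_Seed_Order
on Documents
for insert
as
update d
set d.doc_order = d.InitialOrder
from Documents d
inner join inserted i on i.doc_id = d.doc_id
Sorting by doc_order, InitialOrder then gives a consistent sequence even when doc_order is not unique.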
If there is no need to have any control over what that DOC_ORDER value might be, try using an identity column.