PostgreSQL update query - sql

I need to update a table in my database. For the sake of simplicity, let's assume the table is named tab and has two columns: id (PRIMARY KEY, NOT NULL) and col (UNIQUE VARCHAR(300)). I need to update the table this way:
id col
----------------------------------------------------
1 'One two three'
2 'One twothree'
3 'One two three'
4 'Remove white spaces'
5 'Something'
6 'Remove whitespaces '
to:
id col
----------------------------------------------------
1 'Onetwothree'
2 'Removewhitespaces'
3 'Something'
The id numbers and the order of the rows after the update are not important and can be different. I use PostgreSQL. Some of the columns are FOREIGN KEYs, which is why dropping the UNIQUE constraint from col would be troublesome.

I think just using replace in this format will do what you want.
update tab
set col = replace(col, ' ', '');
Here's a SQLFiddle for it.

You shouldn't be using the non-descriptive column name id, even if some half-wit ORMs are in the habit of doing that. I use tab_id instead for this demo.
I interpret your description this way: You have other tables with FK columns pointing to tab.col. Like table child1 in my example below.
To clean up the mess, do all of this in a single session to preserve the temporary table I use. Better yet, do it all in a single transaction.
Update all referencing tables so that all referencing rows point to the "first" row (unambiguously, however you define that) in each set of soon-to-be duplicates in tab.
Create a translation table up to be used for all updates:
CREATE TEMP TABLE up AS
WITH t AS (
SELECT tab_id, col, replace(col, ' ', '') AS col1
,row_number() OVER (PARTITION BY replace(col, ' ', '')
ORDER BY tab_id) AS rn
FROM tab
)
SELECT b.col AS old_col, a.col AS new_col
FROM (SELECT * FROM t WHERE rn = 1) a
JOIN (SELECT * FROM t WHERE rn > 1) b USING (col1);
Then update all your referencing tables.
UPDATE child1 c
SET col = up.new_col
FROM up
WHERE c.col = up.old_col;
-- more tables?
Now, all references point to the "first" in a group of dupes, and you have got your license to kill the rest.
Remove duplicate rows except the first from tab.
DELETE FROM tab t
USING up
WHERE t.col = up.old_col;
Be sure that all referencing FK constraints have the ON UPDATE CASCADE clause.
ALTER TABLE child1 DROP CONSTRAINT child1_col_fkey;
ALTER TABLE child1 ADD CONSTRAINT child1_col_fkey FOREIGN KEY (col)
REFERENCES tab (col)
ON UPDATE CASCADE;
-- more tables?
Sanitize your values by removing white space
UPDATE tab
SET col = replace(col, ' ', '');
This only takes care of good old space characters (ASCII value 32, Unicode U+0020). Do you have others?
All FK constraints should be pointing to tab.tab_id to begin with. Your tables would be smaller and faster and all of this would be easier.
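The sequence described above (repoint the child rows, delete the duplicates, then strip the spaces) can be sketched end to end. This is only a minimal sketch: it uses Python's sqlite3 module as a stand-in for PostgreSQL, the sample rows and the child1 contents are invented for illustration, and the translation table is built with a correlated min() instead of row_number() so it runs on older SQLite builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
    CREATE TABLE tab (tab_id INTEGER PRIMARY KEY, col TEXT UNIQUE);
    CREATE TABLE child1 (
        child_id INTEGER PRIMARY KEY,
        col TEXT REFERENCES tab (col) ON UPDATE CASCADE
    );
    -- Hypothetical sample data: rows 1-3 and rows 4/6 collide once spaces go.
    INSERT INTO tab VALUES
        (1, 'One two three'), (2, 'One twothree'), (3, 'Onetwo three'),
        (4, 'Remove white spaces'), (5, 'Something'), (6, 'Remove whitespaces ');
    INSERT INTO child1 VALUES
        (10, 'One twothree'), (11, 'Remove whitespaces '), (12, 'Something');
""")


def dedupe_and_strip(conn):
    """Repoint children, drop duplicate parents, then strip spaces."""
    conn.executescript("""
        BEGIN;
        -- Translation table: every duplicate maps to the lowest tab_id
        -- sharing the same space-stripped value.
        CREATE TEMP TABLE up AS
        SELECT b.col AS old_col, a.col AS new_col
        FROM tab AS b
        JOIN tab AS a
          ON replace(a.col, ' ', '') = replace(b.col, ' ', '')
        WHERE a.tab_id = (SELECT min(t.tab_id) FROM tab AS t
                          WHERE replace(t.col, ' ', '') = replace(b.col, ' ', ''))
          AND b.tab_id <> a.tab_id;

        UPDATE child1
        SET col = (SELECT new_col FROM up WHERE up.old_col = child1.col)
        WHERE col IN (SELECT old_col FROM up);

        DELETE FROM tab WHERE col IN (SELECT old_col FROM up);

        -- ON UPDATE CASCADE carries the cleaned value into child1.
        UPDATE tab SET col = replace(col, ' ', '');

        DROP TABLE up;
        COMMIT;
    """)


dedupe_and_strip(conn)
tab_rows = conn.execute("SELECT tab_id, col FROM tab ORDER BY tab_id").fetchall()
child_rows = conn.execute(
    "SELECT child_id, col FROM child1 ORDER BY child_id").fetchall()
```

Inspecting tab_rows and child_rows afterwards is a cheap way to confirm that no child row was orphaned before trying the same steps on real data.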

I solved it more simply than Erwin did. I don't have SQL on my computer to test it, but something like this worked for me:
DELETE FROM tab WHERE id IN (
SELECT id FROM (
SELECT id, row_number() OVER (PARTITION BY regexp_replace(col, '\s+', '', 'g') ORDER BY id) AS c
FROM tab
) AS numbered
WHERE c > 1
);
UPDATE tab SET col = regexp_replace(col, '\s+', '', 'g');

Related

SQL: Dedupe table data and manipulate merged data

I have an SQL table with:
Id INT, Name NVARCHAR(MAX), OldName NVARCHAR(MAX)
There are multiple duplicates in the name column.
I would like to remove these duplicates, keeping only one master copy of each 'Name'. When the dedupe happens, I want to concatenate the old names into the OldName field.
E.G:
Dave | Steve
Dave | Will
Would become
Dave | Steve, Will
After merging.
I know how to de-dupe data using something like:
with x as (select *,rn = row_number()
over(PARTITION BY OrderNo,item order by OrderNo)
from #temp1)
select * from x
where rn > 1
But not sure how to update the new 'master' record whilst I am at it.
This is really too complicated to do in a single update, because you need to update and delete rows.
select n.name,
stuff((select ',' + t2.oldname
from sqltable t2
where t2.name = n.name
for xml path (''), type
).value('/', 'nvarchar(max)'
), 1, 1, '') as oldnames
into _temp
from (select distinct name from sqltable) n;
truncate table sqltable;
insert into sqltable(name, oldname)
select name, oldnames
from _temp;
Of course, test, test, test before deleting the old table (copy it for safe keeping). This doesn't use a temporary table. That way, if something happens -- like a server reboot -- before the insert is finished, you still have all the data.
Your question doesn't specify what to do with the id column. You can add min(id) or max(id) to the _temp if you want to use one of those values.
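The same rebuild idea can be tried out on a scratch database first. The following is a rough sketch using Python's sqlite3 module rather than SQL Server, so group_concat replaces the FOR XML PATH trick; the sample rows are invented, min(Id) is kept as one possible answer to the id question, and SQLite does not promise any particular order for the concatenated names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sqltable (Id INTEGER, Name TEXT, OldName TEXT);
    -- Hypothetical sample data.
    INSERT INTO sqltable VALUES
        (1, 'Dave', 'Steve'), (2, 'Dave', 'Will'), (3, 'Anna', 'Ann');
""")


def merge_duplicates(conn):
    """Collapse rows sharing a Name into one row listing every old name."""
    conn.executescript("""
        BEGIN;
        -- Regular (non-temporary) staging table, as in the answer above.
        CREATE TABLE _temp AS
        SELECT min(Id) AS Id, Name, group_concat(OldName, ', ') AS OldName
        FROM sqltable
        GROUP BY Name;
        DELETE FROM sqltable;
        INSERT INTO sqltable (Id, Name, OldName)
        SELECT Id, Name, OldName FROM _temp;
        DROP TABLE _temp;
        COMMIT;
    """)


merge_duplicates(conn)
rows = conn.execute("SELECT Id, Name, OldName FROM sqltable ORDER BY Id").fetchall()
ids = [row[0] for row in rows]
# Normalise the concatenation order, which SQLite leaves unspecified.
merged = {name: sorted(old.split(", ")) for _, name, old in rows}
```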

SQL: pull distinct values from 1 column with all values from 2nd column

It's easier to explain what I need to do with an example.
The table looks like this:
Col 1, Col 2
1, a
1, b
2, a
2, b
2, c
I need a query to return something like
1,a,b
2,a,b,c
You would want a line such as:
UPDATE t
SET t.dupcustodians = dt.custadmin
FROM tbldoc t
INNER JOIN (SELECT t1._dupid,
(SELECT DISTINCT custadmin + ', '
FROM tbldoc t2
WHERE t2._dupid = t1._dupid
ORDER BY custadmin + ', '
FOR XML PATH('')) AS custadmin
FROM tbldoc t1
GROUP BY _dupid) AS dt
ON t._dupid = dt._dupid
;
I had a similar problem where everything had a name in the "CustAdmin" field and then they all had potentially duplicate _DupID values. I wanted it to list out in a new field "DupCustodians" all the names that were there when the _DupID values were alike from one record to the next. So swap those names with the field names you need (and don't forget to change the table names, of course) and you should be good.
Well, if you are using MySQL, then you can do this:
SELECT Col1, GROUP_CONCAT(Col2)
FROM MyTable
GROUP BY Col1
Other databases that don't have the MySQL specific GROUP_CONCAT function might require a more complex query.
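Where no aggregate such as GROUP_CONCAT is available, one fallback is to fetch the rows in sorted order and do the grouping on the client. The sketch below assumes Python with an in-memory SQLite database purely as a convenient data source; MyTable and its columns follow the example above, and the approach is an illustration rather than a replacement for a proper server-side query.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Col1 INTEGER, Col2 TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "a"), (2, "b"), (2, "c")],
)

# groupby only merges adjacent rows, so the ORDER BY is essential.
rows = conn.execute(
    "SELECT DISTINCT Col1, Col2 FROM MyTable ORDER BY Col1, Col2")

grouped = [
    (key, ",".join(col2 for _, col2 in group))
    for key, group in groupby(rows, key=itemgetter(0))
]

for col1, joined in grouped:
    print(f"{col1},{joined}")
```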

Tricky MS Access SQL query to remove surplus duplicate records

I have an Access table of the form (I'm simplifying it a bit)
ID AutoNumber Primary Key
SchemeName Text (50)
SchemeNumber Text (15)
This contains some data eg...
ID SchemeName SchemeNumber
--------------------------------------------------------------------
714 Malcolm ABC123
80 Malcolm ABC123
96 Malcolms Scheme ABC123
101 Malcolms Scheme ABC123
98 Malcolms Scheme DEF888
654 Another Scheme BAR876
543 Whatever Scheme KJL111
etc...
Now. I want to remove duplicate names under the same SchemeNumber. But I want to leave the record which has the longest SchemeName for that scheme number. If there are duplicate records with the same longest length then I just want to leave only one, say, the lowest ID (but any one will do really). From the above example I would want to delete IDs 714, 80 and 101 (to leave only 96).
I thought this would be relatively easy to achieve but it's turning into a bit of a nightmare! Thanks for any suggestions. I know I could loop it programmatically but I'd rather have a single DELETE query.
See if this query returns the rows you want to keep:
SELECT r.SchemeNumber, r.SchemeName, Min(r.ID) AS MinOfID
FROM
(SELECT
SchemeNumber,
SchemeName,
Len(SchemeName) AS name_length,
ID
FROM tblSchemes
) AS r
INNER JOIN
(SELECT
SchemeNumber,
Max(Len(SchemeName)) AS name_length
FROM tblSchemes
GROUP BY SchemeNumber
) AS w
ON
(r.SchemeNumber = w.SchemeNumber)
AND (r.name_length = w.name_length)
GROUP BY r.SchemeNumber, r.SchemeName
ORDER BY r.SchemeName;
If so, save it as qrySchemes2Keep. Then create a DELETE query to discard rows from tblSchemes whose ID value is not found in qrySchemes2Keep.
DELETE
FROM tblSchemes AS s
WHERE Not Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID);
Just beware, if you later use Access' query designer to make changes to that DELETE query, it may "helpfully" convert the SQL to something like this:
DELETE s.*, Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID)
FROM tblSchemes AS s
WHERE (((Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID))=False));
DELETE FROM Table t1
WHERE EXISTS (SELECT 1 from Table t2
WHERE t1.SchemeNumber = t2.SchemeNumber
AND Length(t2.SchemeName) > Length(t1.SchemeName)
)
Depending on your RDBMS, you may need a function other than Length (Oracle: length, MySQL: length, SQL Server: LEN).
delete ShortScheme
from Scheme ShortScheme
join Scheme LongScheme
on ShortScheme.SchemeNumber = LongScheme.SchemeNumber
and (len(ShortScheme.SchemeName) < len(LongScheme.SchemeName) or (len(ShortScheme.SchemeName) = len(LongScheme.SchemeName) and ShortScheme.ID > LongScheme.ID))
(SQL Server flavored)
Now updated to include the specified tie resolution. Although, you may get better performance doing it in two queries: first deleting the schemes with shorter names as in my original query and then going back and deleting the higher ID where there was a tie in name length.
I'd do this in multiple steps. Large delete operations done in a single step make me too nervous -- what if you make a mistake? There's no sql 'undo' statement.
-- Setup the data
DROP Table foo;
DROP Table bar;
DROP Table bat;
DROP Table baz;
CREATE TABLE foo (
id int(11) NOT NULL,
SchemeName varchar(50),
SchemeNumber varchar(15),
PRIMARY KEY (id)
);
insert into foo values (714, 'Malcolm', 'ABC123' );
insert into foo values (80, 'Malcolm', 'ABC123' );
insert into foo values (96, 'Malcolms Scheme', 'ABC123' );
insert into foo values (101, 'Malcolms Scheme', 'ABC123' );
insert into foo values (98, 'Malcolms Scheme', 'DEF888' );
insert into foo values (654, 'Another Scheme ', 'BAR876' );
insert into foo values (543, 'Whatever Scheme ', 'KJL111' );
-- Find all the records that have dups, find the longest one
create table bar as
select max(length(SchemeName)) as max_length, SchemeNumber
from foo
group by SchemeNumber
having count(*) > 1;
-- Find the one we want to keep
create table bat as
select min(a.id) as id, a.SchemeNumber
from foo a join bar b on a.SchemeNumber = b.SchemeNumber
and length(a.SchemeName) = b.max_length
group by a.SchemeNumber;
-- Select into this table all the rows to delete
create table baz as
select a.id from foo a join bat b on a.SchemeNumber = b.SchemeNumber
and a.id != b.id;
This will give you a new table with only records for rows that you want to remove.
Now check these out and make sure that they contain only the rows you want deleted. This way you can make sure that when you do the delete, you know exactly what to expect. It should also be pretty fast.
Then, when you're ready, use this command to delete the rows:
delete from foo where id in (select id from baz);
This seems like more work because of the different tables, but it's safer and probably just as fast as the other ways. Plus you can stop at any step and make sure the data is what you want before you do any actual deletes.
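The stage-then-delete idea can be condensed into a single staging table. Here is a minimal sketch, assuming Python's sqlite3 module instead of MySQL; the table names foo and baz follow the answer above, while the correlated ORDER BY ... LIMIT 1 subquery used to pick the keeper (longest name, then lowest id) is a substitution for the intermediate bar and bat tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, SchemeName TEXT, SchemeNumber TEXT);
    INSERT INTO foo VALUES
        (714, 'Malcolm', 'ABC123'), (80, 'Malcolm', 'ABC123'),
        (96, 'Malcolms Scheme', 'ABC123'), (101, 'Malcolms Scheme', 'ABC123'),
        (98, 'Malcolms Scheme', 'DEF888'), (654, 'Another Scheme', 'BAR876'),
        (543, 'Whatever Scheme', 'KJL111');
""")


def stage_deletions(conn):
    """Record the ids to delete in baz without touching foo yet."""
    conn.executescript("""
        CREATE TABLE baz AS
        SELECT f.id
        FROM foo AS f
        WHERE f.id <> (SELECT k.id
                       FROM foo AS k
                       WHERE k.SchemeNumber = f.SchemeNumber
                       ORDER BY length(k.SchemeName) DESC, k.id ASC
                       LIMIT 1);
    """)
    return [row[0] for row in conn.execute("SELECT id FROM baz ORDER BY id")]


def apply_deletions(conn):
    """Delete exactly the staged rows, inside one transaction."""
    with conn:
        conn.execute("DELETE FROM foo WHERE id IN (SELECT id FROM baz)")


staged = stage_deletions(conn)  # review this list before going further
apply_deletions(conn)
kept = [row[0] for row in conn.execute("SELECT id FROM foo ORDER BY id")]
```

Keeping the review step separate from the delete preserves the main benefit of the approach: the list of doomed ids can be checked by hand before anything is removed.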
If your platform supports ranking functions and common table expressions:
with cte as (
select row_number()
over (partition by SchemeNumber order by len(SchemeName) desc) as rn
from Table)
delete from cte where rn > 1;
try this:
Select * From Table t
Where Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber )
And Id >
(Select Min (Id)
From Table
Where SchemeNumber = t.SchemeNumber
And SchemeName = t.SchemeName)
or this:
Select * From Table t
Where Id >
(Select Min(Id) From Table
Where SchemeNumber = t.SchemeNumber
And Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber))
If either of these selects the records that should be deleted, just change it to a delete:
Delete
From Table t
Where Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber )
And Id >
(Select Min (Id)
From Table
Where SchemeNumber = t.SchemeNumber
And SchemeName = t.SchemeName)
or using the second construction:
Delete From Table t Where Id >
(Select Min(Id) From Table
Where SchemeNumber = t.SchemeNumber
And Len(SchemeName) <
(Select Max(Len(Schemename))
From Table
Where SchemeNumber = t.SchemeNumber))

Counting a cell up per Objects

I've got a problem once again :D
A little info first:
I'm trying to copy data from one table to another table (the structure is the same).
One column needs to be incremented, beginning at 1 for each group (just like a history).
I have this table:
create table My_Test/My_Test2 (
my_Id Number(8,0),
my_Num Number(6,0),
my_Data Varchar2(100));
(my_Id, my_Num) is a composite PK.
If I want to insert a new row, I need to check whether the value in my_Id already exists.
If it does, I need to use the next my_Num for this Id.
I have this in my table:
My_Id My_Num My_Data
1 1 'test1'
1 2 'test2'
2 1 'test3'
If I now add a row for my_Id 1, the row would look like this:
My_Id My_Num My_Data
1 3 'test4'
This sounds pretty easy; now I need to do it in SQL.
On SQL Server I had the same problem and used this:
Insert Into My_Test (My_Id,My_Num,My_Data)
SELECT my_Id,
(
SELECT
CASE (
CASE MAX(a.my_Num)
WHEN NULL
THEN 0
Else Max(A.My_Num)
END) + b.My_Num
WHEN NULL
THEN 1
ELSE (
CASE MAX(a.My_Num)
WHEN NULL
THEN 0
Else Max(A.My_Num)
END) + b.My_Num
END
From My_Test A
where my_id = 1
)
,My_Data
From My_Test2 B
where my_id = 1;
This select returns NULL if no rows are found in the subselect.
Is there a way to use MAX in the CASE, so that if it returns NULL, 0 or 1 is used instead?
Edit:
I'm now using this:
Insert INTO My_Test ( My_Id,My_Num,My_Data )
SELECT B.My_Id,
(
SELECT COALESCE(MAX(a.My_Num),0) + b.my_Num
FROM My_Test A
Where a.My_Id = b.My_Id)
,b.My_Data
FROM My_Test2 B
WHERE My_Id = 1
THX to Bharat and OMG Ponies
greets
Auro
Try this one
Insert Into My_Test (My_Id,My_Num,My_Data)
SELECT my_Id,(
SELECT NVL(MAX(My_Num),0) + 1
From My_Test
where my_id = b.my_id
)
,My_Data
From My_Test2 B
where my_id = <your id>;
Insert Into My_Test (My_Id,My_Num,My_Data)
select 1, coalesce(max(My_num),0) + 1, 'test4' from My_Test
where My_id=1
All of these solutions share a problem: they don't work in a multi-user environment. If two sessions issue that insert statement at the same time, they both get the same (my_id, my_num) combination, and one of them fails with an ORA-00001 unique constraint violation. Therefore, if you need this to work in a multi-user environment, the best advice is to use a single primary key column and populate it with a sequence. Keep your my_id column as well, since it serves as a grouping or foreign key column. If your end users really want to see the "my_num" column in their (web) application, you can compute it with the row_number analytic function.
You can read more about this scenario in this blogpost of mine: http://rwijk.blogspot.com/2008/01/sequence-within-parent.html
Regards,
Rob.
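The surrogate-key idea can be illustrated roughly as follows. This sketch assumes Python's sqlite3 module in place of Oracle, so an AUTOINCREMENT column stands in for a sequence; the pk column name is hypothetical, and the window function requires SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_test (
        pk      INTEGER PRIMARY KEY AUTOINCREMENT,  -- stand-in for a sequence
        my_id   INTEGER NOT NULL,                   -- grouping column
        my_data TEXT
    );
""")


def add_row(conn, my_id, my_data):
    """Insert without computing my_num, so concurrent inserts cannot collide."""
    with conn:
        conn.execute(
            "INSERT INTO my_test (my_id, my_data) VALUES (?, ?)",
            (my_id, my_data),
        )


for my_id, data in [(1, "test1"), (1, "test2"), (2, "test3"), (1, "test4")]:
    add_row(conn, my_id, data)

# my_num is derived at query time from insertion order within each my_id.
numbered = conn.execute("""
    SELECT my_id,
           row_number() OVER (PARTITION BY my_id ORDER BY pk) AS my_num,
           my_data
    FROM my_test
    ORDER BY my_id, my_num
""").fetchall()
```

One consequence of deriving my_num this way is that deleting a row renumbers the later rows in its group, which may or may not be acceptable for a history-style column.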

Move SQL data from one table to another

I was wondering if it is possible to move all rows of data from one table to another, that match a certain query?
For example, I need to move all table rows from Table1 to Table2 where their username = 'X' and password = 'X', so that they will no longer appear in Table1.
I'm using SQL Server 2008 Management Studio.
Should be possible using two statements within one transaction, an insert and a delete:
BEGIN TRANSACTION;
INSERT INTO Table2 (<columns>)
SELECT <columns>
FROM Table1
WHERE <condition>;
DELETE FROM Table1
WHERE <condition>;
COMMIT;
This is the simplest form. If you have to worry about new matching records being inserted into Table1 between the two statements, you can add an AND EXISTS (<matching row in Table2>) condition to the DELETE.
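A small runnable sketch of this insert-then-delete pattern, including the EXISTS guard, might look like the following. It assumes Python's sqlite3 module rather than SQL Server, an id key column shared by both tables, and invented sample rows; the move_rows helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY, username TEXT, password TEXT);
    CREATE TABLE Table2 (id INTEGER PRIMARY KEY, username TEXT, password TEXT);
    -- Hypothetical sample data.
    INSERT INTO Table1 VALUES (1, 'X', 'X'), (2, 'alice', 'pw'), (3, 'X', 'X');
""")


def move_rows(conn, username, password):
    """Copy matching rows to Table2, then delete only rows that were copied."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            """INSERT INTO Table2 (id, username, password)
               SELECT id, username, password
               FROM Table1
               WHERE username = ? AND password = ?""",
            (username, password),
        )
        cur = conn.execute(
            """DELETE FROM Table1
               WHERE username = ? AND password = ?
                 AND EXISTS (SELECT 1 FROM Table2
                             WHERE Table2.id = Table1.id)""",
            (username, password),
        )
    return cur.rowcount


moved = move_rows(conn, "X", "X")
remaining = conn.execute("SELECT id FROM Table1 ORDER BY id").fetchall()
destination = conn.execute("SELECT id FROM Table2 ORDER BY id").fetchall()
```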
This is an ancient post, sorry, but I only came across it now and I wanted to give my solution to whoever might stumble upon this one day.
As some have mentioned, performing an INSERT and then a DELETE might lead to integrity issues, so perhaps a way to get around it, and to perform everything neatly in a single statement, is to take advantage of the [deleted] temporary table.
DELETE FROM [source]
OUTPUT [deleted].<column_list>
INTO [destination] (<column_list>)
All these answers run the same query for the INSERT and DELETE. As mentioned previously, this risks the DELETE picking up records inserted between statements and could be slow if the query is complex (although clever engines "should" make the second call fast).
The correct way (assuming the INSERT is into a fresh table) is to do the DELETE against table1 using the key field of table2.
The delete should be:
DELETE FROM tbl_OldTableName WHERE id in (SELECT id FROM tbl_NewTableName)
Excuse my syntax, I'm jumping between engines but you get the idea.
A cleaner representation of what some other answers have hinted at:
DELETE sourceTable
OUTPUT DELETED.*
INTO destTable (Comma, separated, list, of, columns)
WHERE <conditions (if any)>
Yes, it is. First INSERT + SELECT, and then DELETE the originals.
INSERT INTO Table2 (UserName,Password)
SELECT UserName,Password FROM Table1 WHERE UserName='X' AND Password='X'
Then delete the originals:
DELETE FROM Table1 WHERE UserName='X' AND Password='X'
You may want to preserve UserID or some other primary key; in that case you can use IDENTITY INSERT to preserve the key.
see more on SET IDENTITY_INSERT on MSDN
You should be able to do this with a subquery in the INSERT statement.
INSERT INTO table2(column1, column2) SELECT column1, column2 FROM table1 WHERE ...;
followed by deleting the same rows from table1.
Remember to run it as a single transaction so that if anything goes wrong you can roll the entire operation back.
Use this single SQL statement, which is safe and avoids the need for commit/rollback across multiple statements.
INSERT Table2 (
username,password
) SELECT username,password
FROM (
DELETE Table1
OUTPUT
DELETED.username,
DELETED.password
WHERE username = 'X' and password = 'X'
) AS RowsToMove ;
Works on SQL Server; make appropriate changes for MySQL.
Try this
INSERT INTO TABLE2 (Cols...) SELECT Cols... FROM TABLE1 WHERE Criteria
Then
DELETE FROM TABLE1 WHERE Criteria
You could try this:
SELECT * INTO tbl_NewTableName
FROM tbl_OldTableName
WHERE Condition1=@Condition1Value
Then run a simple delete:
DELETE FROM tbl_OldTableName
WHERE Condition1=@Condition1Value
You may use "Logical Partitioning" to switch data between tables.
By updating the partition column, data is automatically moved to the other table.
Here is a sample:
CREATE TABLE TBL_Part1
(id INT NOT NULL,
val VARCHAR(10) NULL,
PartitionColumn VARCHAR(10) CONSTRAINT CK_Part1 CHECK(PartitionColumn = 'TBL_Part1'),
CONSTRAINT TBL_Part1_PK PRIMARY KEY(PartitionColumn, id)
);
CREATE TABLE TBL_Part2
(id INT NOT NULL,
val VARCHAR(10) NULL,
PartitionColumn VARCHAR(10) CONSTRAINT CK_Part2 CHECK(PartitionColumn = 'TBL_Part2'),
CONSTRAINT TBL_Part2_PK PRIMARY KEY(PartitionColumn, id)
);
GO
CREATE VIEW TBL(id, val, PartitionColumn)
WITH SCHEMABINDING
AS
SELECT id, val, PartitionColumn FROM dbo.TBL_Part1
UNION ALL
SELECT id, val, PartitionColumn FROM dbo.TBL_Part2;
GO
--Insert sample to TBL ( will be inserted to Part1 )
INSERT INTO TBL
VALUES(1, 'rec1', 'TBL_Part1');
INSERT INTO TBL
VALUES(2, 'rec2', 'TBL_Part1');
GO
--Query sub table to verify
SELECT * FROM TBL_Part1
GO
--move the data to table TBL_Part2 by Logical Partition switching technique
UPDATE TBL
SET
PartitionColumn = 'TBL_Part2';
GO
--Query sub table to verify
SELECT * FROM TBL_Part2
Here is how to do it with a single statement:
WITH deleted_rows AS (
DELETE FROM source_table WHERE id = 1
RETURNING *
)
INSERT INTO destination_table
SELECT * FROM deleted_rows;
EXAMPLE:
postgres=# select * from test1 ;
id | name
----+--------
1 | yogesh
2 | Raunak
3 | Varun
(3 rows)
postgres=# select * from test2;
id | name
----+------
(0 rows)
postgres=# WITH deleted_rows AS (
postgres(# DELETE FROM test1 WHERE id = 1
postgres(# RETURNING *
postgres(# )
postgres-# INSERT INTO test2
postgres-# SELECT * FROM deleted_rows;
INSERT 0 1
postgres=# select * from test2;
id | name
----+--------
1 | yogesh
(1 row)
postgres=# select * from test1;
id | name
----+--------
2 | Raunak
3 | Varun
If the two tables use the same ID or have a common UNIQUE key:
1) Insert the selected records into table2
INSERT INTO table2 SELECT * FROM table1 WHERE (conditions)
2) Delete the selected records from table1 if they are present in table2
DELETE FROM table1 WHERE (conditions) AND ID IN (SELECT ID FROM table2)
It will create a table and copy all the data from old table to new table
SELECT * INTO event_log_temp FROM event_log
And you can clear the old table data.
DELETE FROM event_log
For some scenarios, it might be the easiest to script out Table1, rename the existing Table1 to Table2 and run the script to recreate Table1.