How to delete Duplicates in MySQL table - sql

I've given a client the following query to delete duplicate phone no. records in an MSSQL database, but now they need to also do it on MySQL, and they report that MySQL complains about the format of the query. I've included the setup of a test table with duplicates for my code sample, but the actual delete query is what counts.
I'm asking this in ignorance and urgency, as I am still busy downloading and installing MySQL, and maybe somebody can help in the meantime.
create table bkPhone
(
phoneNo nvarchar(20),
firstName nvarchar(20),
lastName nvarchar(20)
)
GO
insert bkPhone values('0783313780','Brady','Kelly')
insert bkPhone values('0845319792','Mark','Smith')
insert bkPhone values('0834976958','Bill','Jones')
insert bkPhone values('0845319792','Mark','Smith')
insert bkPhone values('0828329792','Mickey','Mouse')
insert bkPhone values('0834976958','Bill','Jones')
alter table bkPhone add phoneId int identity
delete from bkPhone
where phoneId not in
(
select min(phoneId)
from bkPhone
group by phoneNo,firstName,lastName
having count(*) >= 1
)

There are many ways to do this; here is one. It is very fast, so you can use it with big databases. Don't forget the indexes.
The trick is to make phoneNo unique and use "ignore".
drop table if exists bkPhone_template;
create table bkPhone_template (
phoneNo varchar(20),
firstName varchar(20),
lastName varchar(20)
);
insert into bkPhone_template values('0783313780','Brady','Kelly');
insert into bkPhone_template values('0845319792','Mark','Smith');
insert into bkPhone_template values('0834976958','Bill','Jones');
insert into bkPhone_template values('0845319792','Mark','Smith');
insert into bkPhone_template values('0828329792','Mickey','Mouse');
insert into bkPhone_template values('0834976958','Bill','Jones');
drop table if exists bkPhone;
create table bkPhone like bkPhone_template;
alter table bkPhone add unique (phoneNo);
insert ignore into bkPhone (phoneNo,firstName,lastName) select phoneNo,firstName,lastName from bkPhone_template;
drop table bkPhone_template;
If the data table already exists, you only have to run a create table ... select followed by an insert ignore ... select. At the end you run some table renaming statements. That's all.
This workaround is much, much faster than a delete operation.
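For a table that already exists, the steps described above might look like the following in MySQL. This is only a sketch: bkPhone_new and bkPhone_old are placeholder names, and it uses CREATE TABLE ... LIKE as in the example above. The unique key on phoneNo alone follows this answer; a key on all three columns would match the grouping in the question.

```sql
-- Sketch only: bkPhone_new and bkPhone_old are placeholder names.
CREATE TABLE bkPhone_new LIKE bkPhone;

-- The unique key is what makes INSERT IGNORE skip the duplicates.
ALTER TABLE bkPhone_new ADD UNIQUE (phoneNo);

-- Rows that would violate the unique key are silently dropped.
INSERT IGNORE INTO bkPhone_new
SELECT * FROM bkPhone;

-- Swap the tables in one statement, then remove the old data.
RENAME TABLE bkPhone TO bkPhone_old, bkPhone_new TO bkPhone;
DROP TABLE bkPhone_old;
```

Which of the duplicate rows survives is not defined here, which is fine only when the duplicates are identical in every column you care about.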

You can select out the unique ones by:
select distinct(phoneNo) from bkPhone
and put them into another table, delete the old table and rename the new one to the old name.
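Note that select distinct(phoneNo) returns only the phoneNo column, so the names would be lost. A sketch of the same idea in MySQL that keeps all three columns, assuming bkPhone_distinct as a placeholder table name:

```sql
-- Sketch only: bkPhone_distinct is a placeholder name.
-- DISTINCT over all three columns keeps one copy of each full row.
CREATE TABLE bkPhone_distinct AS
SELECT DISTINCT phoneNo, firstName, lastName
FROM bkPhone;

-- Replace the old table with the de-duplicated copy.
DROP TABLE bkPhone;
RENAME TABLE bkPhone_distinct TO bkPhone;
```

CREATE TABLE ... AS SELECT does not copy indexes or keys, so any that the old table had would need to be added again afterwards.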

MySQL complains because the query makes no sense: you are trying to aggregate with min() on a column by which you group.
Now, if you're trying to delete duplicate phone numbers for the same person, the SQL should be:
delete from bkPhone
where phoneId not in
(
select min(phoneId)
from bkPhone
group by firstName,lastName /* i.e. grouping by person and NOT grouping by phoneId */
having count(*) >= 1
)
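The question does not quote the MySQL error, so the following rests on an assumption: a common cause is error 1093 ("You can't specify target table ... for update in FROM clause"), which MySQL raises when the subquery of a DELETE reads from the table being deleted from. A sketch that works around it by wrapping the subquery in a derived table, using the grouping from the question; keepers and keepId are made-up names:

```sql
-- MySQL has no IDENTITY keyword; AUTO_INCREMENT is the closest equivalent
-- and the column must be a key. Only needed if phoneId does not exist yet.
ALTER TABLE bkPhone ADD phoneId INT AUTO_INCREMENT PRIMARY KEY;

-- The inner GROUP BY query is materialized as the derived table "keepers",
-- so the DELETE no longer reads bkPhone directly in its subquery.
DELETE FROM bkPhone
WHERE phoneId NOT IN (
    SELECT keepId
    FROM (
        SELECT MIN(phoneId) AS keepId
        FROM bkPhone
        GROUP BY phoneNo, firstName, lastName
    ) AS keepers
);
```

The GO batch separator from the MSSQL script is not MySQL syntax either and would have to be dropped.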

MySQL is also covered here:
http://mssql-to-postgresql.blogspot.com/2007/12/deleting-duplicates-in-postgresql-ms.html

Related

Add a new column in table with a sequence - Oracle

I have a table that has 60 million rows of data. I would like to introduce a new column say "id" for the table that is an auto incremented sequence.
For example:
CREATE TABLE Persons (
LastName varchar(255),
FirstName varchar(255)
);
INSERT INTO Persons VALUES ('abc', 'def');
INSERT INTO Persons VALUES ('abcd', 'ghi');
CREATE SEQUENCE "PERSON_SEQUENCE" START WITH 1 INCREMENT BY 1;
ALTER TABLE PERSONS ADD (PERSONID NUMBER);
UPDATE persons SET personid = PERSON_SEQUENCE.NEXTVAL;
In the above sql statements, I am able to create a sequence then alter the table and update it.
Since the amount of data I need to update is large, I would like to do this at as low a cost as possible.
I am trying to do something like this:
ALTER TABLE PERSONS ADD (PERSONID NUMBER DEFAULT(PERSON_SEQUENCE.NEXTVAL));
but the above does not work. Oracle throws me the below error:
Error starting at line : 1 in command -
ALTER TABLE PERSONS ADD (PERSONID NUMBER DEFAULT(PERSON_SEQUENCE.NEXTVAL))
Error report -
ORA-00984: column not allowed here
00984. 00000 - "column not allowed here"
*Cause:
*Action:
However this works:
ALTER TABLE PERSONS ADD (PERSONID NUMBER DEFAULT(0));
Could someone help me alter the table (create a new column) and populate the column with a sequence id, both in a single SQL statement? Thank you!
For a table with 60 million rows, I would not do an add column + update, but create the table anew:
RENAME persons TO persons_old;
CREATE TABLE Persons (
personid number,
LastName varchar(255),
FirstName varchar(255)
);
INSERT INTO persons (personid, lastname, firstname)
SELECT person_sequence.nextval, lastname, firstname
FROM persons_old;
DROP TABLE persons_old;
If this is still taking too long, speak to your DBA about ALTER TABLE NOLOGGING and INSERT /*+ APPEND */ and PARALLEL DML.
EDIT: Ah, yes, for 60 million you could even increase the cache size of the sequence for the initial assignment:
ALTER SEQUENCE PERSON_SEQUENCE CACHE 1000;
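The options mentioned above (NOLOGGING, the APPEND hint, and parallel DML) could be combined roughly as follows. This is a sketch to discuss with your DBA rather than a recipe: the degree of parallelism 4 is an arbitrary assumption, and whether redo is actually skipped depends on database settings such as force logging.

```sql
-- Reduce redo generation for direct-path loads into the new table.
ALTER TABLE persons NOLOGGING;

-- Parallel DML must be enabled per session before the insert.
ALTER SESSION ENABLE PARALLEL DML;

-- APPEND requests a direct-path insert; the degree 4 is an assumed value.
INSERT /*+ APPEND PARALLEL(persons, 4) */ INTO persons (personid, lastname, firstname)
SELECT person_sequence.nextval, lastname, firstname
FROM persons_old;

-- A direct-path insert must be committed before the session reads the table again.
COMMIT;

-- Restore normal logging once the load is done.
ALTER TABLE persons LOGGING;
```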
This worked for me (a sequence as a column default needs Oracle 12c or later):
alter table PERSONS add (PERSON_ID number default PERSON_SEQ.nextval);

SQL Server trigger can't insert

I'm beginning to learn how to write triggers with this basic database.
I'm also making my very first database.
Schema
Team:
TeamID int PK (TeamID int IDENTITY(0,1) CONSTRAINT TeamID_PK PRIMARY KEY)
TeamName nvarchar(100)
History:
HistoryID int PK (HistoryID int IDENTITY(0,1) CONSTRAINT HistoryID_PK PRIMARY KEY)
TeamID int FK REF Team(TeamID)
WinCount int
LoseCount int
My trigger: when a new team is inserted, it should insert a new history row with that team id
CREATE TRIGGER after_insert_Player
ON Team
FOR INSERT
AS
BEGIN
INSERT INTO History (TeamID, WinCount, LoseCount)
SELECT DISTINCT i.TeamID
FROM Inserted i
LEFT JOIN History h ON h.TeamID = i.TeamID
AND h.WinCount = 0 AND h.LoseCount = 0
END
When executed, it returns:
The select list for the INSERT statement contains fewer items than the insert list. The number of SELECT values must match the number of INSERT columns.
Please help, thanks. I'm using SQL Server.
The error text is the best guide; it is quite clear.
You are trying to insert one value (i.TeamID) into three columns (TeamID, WinCount, LoseCount).
Supply values for WinCount and LoseCount in the select as well.
Note: I think the structure of the History table needs revisiting; you should select WinCount and LoseCount as expressions, not as actual columns.
When you specify insert columns, you say which columns you will be filling. But in your case, right after insert you select only one column (team id).
You either have to change the insert list to contain only one column, or change the select to return three columns, matching the insert list.
If you list the columns to be inserted into (using INSERT ... SELECT), the SELECT statement has to return the same number of columns as the insert list. Also ensure they are of the same data types, or you might face other issues.
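Putting the answers together, a corrected trigger might look like the sketch below. It assumes a new team should start with 0 wins and 0 losses, which the question only hints at through its join condition, and it keeps the trigger name from the question.

```sql
CREATE TRIGGER after_insert_Player
ON Team
FOR INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Three insert columns, three selected values.
    -- Assumption: a newly inserted team starts at 0 wins and 0 losses.
    INSERT INTO History (TeamID, WinCount, LoseCount)
    SELECT i.TeamID, 0, 0
    FROM Inserted i
    -- Skip teams that already have a history row.
    WHERE NOT EXISTS (SELECT 1 FROM History h WHERE h.TeamID = i.TeamID);
END
```

Selecting from Inserted rather than from a single variable keeps the trigger correct when one INSERT statement adds several teams at once.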

Insert data from one table to another table while the target table has a primary key

In SQL Server I have a table RawTable (temp) which gets fed by a CSV; let's say it has 22 columns. I then need to copy existing records (only a few columns, not all) into another table, Visitors, which is not a temporary table.
The Visitors table has an ID column of type INT that is the primary key and incremental.
RawTable table
id PK, int not null
VisitorDate Varchar(10)
VisitorTime Varchar(11)
Visitors table
VisitorID, PK, big int, not null
VisitorDate, Varchar(10), null
VisitorTime Varchar(11), null
So I did:
insert into [dbo].[Visitors] ( [VisitorDate], [VisitorTime])
select [VisitorDate], [VisitorTime]
from RawTable /*this is temp table */
Seems SQL Server doesn't like this method so it throws
Msg 515, Level 16, State 2, Line 1
Cannot insert the value NULL into column 'VisitorID', table 'TS.dbo.Visitors'; column does not allow nulls. INSERT fails. The statement has been terminated.
How can I stop SQL Server from complaining about the primary key? This column, as you know, should be filled in by SQL Server itself.
Any ideas?
Just because your Visitors table has an ID column that is the primary key doesn't mean that the server will supply the ID values for you. If you want SQL Server to provide the IDs, you need to change the table definition and make the VisitorID column an IDENTITY column.
Otherwise, you can pseudo-create these IDs during the insert with the ROW_NUMBER function:
DECLARE @maxId BIGINT;
/* start from 0 when the table is still empty */
SELECT @maxId = ISNULL(MAX(VisitorID), 0) FROM dbo.Visitors;
INSERT INTO [dbo].[Visitors] ([VisitorID], [VisitorDate], [VisitorTime])
SELECT @maxId + ROW_NUMBER() OVER (ORDER BY VisitorDate), [VisitorDate], [VisitorTime]
FROM RawTable /* this is the temp table */
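For the IDENTITY option, note that SQL Server cannot turn an existing column into an IDENTITY column in place, so the sketch below assumes the Visitors table can be dropped and recreated (or is still empty). The column types follow the question; the seed and increment of 1 are assumptions.

```sql
-- Sketch only: assumes dbo.Visitors can be recreated from scratch.
CREATE TABLE dbo.Visitors (
    VisitorID   BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- generated by SQL Server
    VisitorDate VARCHAR(10) NULL,
    VisitorTime VARCHAR(11) NULL
);

-- VisitorID is left out of the column list; SQL Server fills it in.
INSERT INTO dbo.Visitors (VisitorDate, VisitorTime)
SELECT VisitorDate, VisitorTime
FROM RawTable;
```

With this definition the insert from the question works unchanged.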

Insert and alter in one statement

I'd like to store a set of data in a database, but if a record already exists, I'd like to alter it; otherwise, create a new one. Is there a combined statement for that? (I haven't found any when googling.)
Right now, the best I have is to check if already exists and then perform one of the operations. Seems cumbersome to me.
create table Stuff (
Id int identity(1001, 1) primary key clustered,
Beep int unique,
Boop nvarchar(50))
In MySQL:
You may use INSERT ... ON DUPLICATE KEY UPDATE.
For example:
INSERT INTO t1 (a,b,c) VALUES (4,5,6)
ON DUPLICATE KEY UPDATE c=9;
For more information: http://dev.mysql.com/doc/refman/5.6/en/insert-on-duplicate.html
MySQL uses INSERT... ON DUPLICATE KEY and MSSQL uses MERGE
MERGE is supported by Azure, and I can highly recommend this blog article on it, as a good intro to the statement
Here is a merge statement based on the schema provided...
create table #Stuff (
Id int identity(1001, 1) primary key clustered,
Beep int unique,
Boop nvarchar(50),
Baap nvarchar(50)
);
INSERT INTO #Stuff VALUES (1,'boop', 'poop');
INSERT INTO #Stuff VALUES (2,'beep', 'peep');
SELECT * FROM #STUFF;
MERGE #Stuff
USING (VALUES(1,'BeepBeep','PeepPeep')) AS TheNewThing(A,B,C)
ON #Stuff.Beep = TheNewThing.A
WHEN MATCHED THEN UPDATE SET #Stuff.Boop = TheNewThing.B, #Stuff.Baap = 'fixed'
WHEN NOT MATCHED THEN INSERT (Beep,Boop,Baap) VALUES (
TheNewThing.A, TheNewThing.B, TheNewThing.C);
SELECT * FROM #STUFF
I also found a really good SO Q which might make good further reading
Yes, you can easily do it using PL/SQL. Here is sample code that will help you:
http://docs.oracle.com/cd/B10501_01/appdev.920/a96624/01_oview.htm#7106

How to add data to two tables linked via a foreign key?

Say I have 2 tables, call them TableA and TableB. TableB contains a foreign key which refers to TableA. I now need to add data to both TableA and TableB for a given scenario. To do this I first have to insert data in TableA, then find and retrieve TableA's last inserted primary key and use it as the foreign key value in TableB. I then insert values in TableB. This seems like a bit too much work just to insert 1 set of data. How else can I achieve this? If possible, please provide me with SQL statements for SQL Server 2005.
That sounds about right. Note that you can use SCOPE_IDENTITY() on a per-row basis, or you can do set-based operations if you use the INSERT/OUTPUT syntax and then join to the set of output from the first insert. For example, here we only have 1 INSERT (each) into the "real" tables:
/*DROP TABLE STAGE_A
DROP TABLE STAGE_B
DROP TABLE B
DROP TABLE A*/
SET NOCOUNT ON
CREATE TABLE STAGE_A (
CustomerKey varchar(10),
Name varchar(100))
CREATE TABLE STAGE_B (
CustomerKey varchar(10),
OrderNumber varchar(100))
CREATE TABLE A (
Id int NOT NULL IDENTITY(51,1) PRIMARY KEY,
CustomerKey varchar(10),
Name varchar(100))
CREATE TABLE B (
Id int NOT NULL IDENTITY(1123,1) PRIMARY KEY,
CustomerId int,
OrderNumber varchar(100))
ALTER TABLE B ADD FOREIGN KEY (CustomerId) REFERENCES A(Id);
INSERT STAGE_A VALUES ('foo', 'Foo Corp')
INSERT STAGE_A VALUES ('bar', 'Bar Industries')
INSERT STAGE_B VALUES ('foo', '12345')
INSERT STAGE_B VALUES ('foo', '23456')
INSERT STAGE_B VALUES ('bar', '34567')
DECLARE @CustMap TABLE (CustomerKey varchar(10), Id int NOT NULL)
INSERT A (CustomerKey, Name)
OUTPUT INSERTED.CustomerKey,INSERTED.Id INTO @CustMap
SELECT CustomerKey, Name
FROM STAGE_A
INSERT B (CustomerId, OrderNumber)
SELECT map.Id, b.OrderNumber
FROM STAGE_B b
INNER JOIN @CustMap map ON map.CustomerKey = b.CustomerKey
SELECT * FROM A
SELECT * FROM B
If you work directly with SQL you have the right solution.
In case you're performing the insert from code, you may have higher level structures that help you achieve this (LINQ, Django Models, etc).
If you are going to do this in direct SQL, I suggest creating a stored procedure that takes all of the data as parameters, then performs the insert/select identity/insert steps inside a transaction. Even though the process is still the same as your manual inserts, using the stored procedure will allow you to more easily use it from your code. As @Rax mentions, you may also be able to use an ORM to get similar functionality.
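A minimal sketch of such a stored procedure, reusing tables A and B from the earlier answer. The procedure name and parameters are hypothetical, and the syntax is kept to what SQL Server 2005 accepts (DECLARE and SET on separate lines).

```sql
-- Sketch only: dbo.AddCustomerWithOrder and its parameters are hypothetical names.
CREATE PROCEDURE dbo.AddCustomerWithOrder
    @CustomerKey varchar(10),
    @Name        varchar(100),
    @OrderNumber varchar(100)
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;  -- roll back the whole transaction on a run-time error

    BEGIN TRANSACTION;

    -- Step 1: insert the parent row.
    INSERT A (CustomerKey, Name) VALUES (@CustomerKey, @Name);

    -- Step 2: capture the identity value generated in this scope.
    DECLARE @CustomerId int;
    SET @CustomerId = SCOPE_IDENTITY();

    -- Step 3: insert the child row using the captured key.
    INSERT B (CustomerId, OrderNumber) VALUES (@CustomerId, @OrderNumber);

    COMMIT TRANSACTION;
END
```

It could then be called as EXEC dbo.AddCustomerWithOrder 'baz', 'Baz Ltd', '45678'; where the values are made up for illustration. SCOPE_IDENTITY() is used rather than @@IDENTITY so that identity values generated by triggers on table A do not leak into the foreign key.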