Compare unique values from two MySQL tables - sql

I have two MySQL tables: TableA has 10,000 records and TableB has 2,000 records.
I want to copy the 8,000 unique records from TableA into TableB ignoring the 2,000 in TableB which have already been copied.

If uniqueness is determined by a PRIMARY KEY or UNIQUE constraint, then you can use INSERT IGNORE:
INSERT IGNORE INTO TableB SELECT * FROM TableA;
The rows that are duplicates and that conflict with rows already in TableB will be silently skipped and the other 8,000 rows should be inserted.
See the docs on INSERT for more details.
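To try this without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite spells the same idea INSERT OR IGNORE instead of INSERT IGNORE, so this shows the behavior rather than MySQL itself; the id/name columns and the scaled-down row counts are hypothetical.

```python
import sqlite3

def copy_missing_rows(conn: sqlite3.Connection) -> int:
    """Copy TableA into TableB, skipping ids TableB already holds.

    INSERT OR IGNORE (SQLite) plays the role of MySQL's INSERT IGNORE:
    rows that would violate TableB's PRIMARY KEY are silently dropped.
    """
    before = conn.execute("SELECT COUNT(*) FROM TableB").fetchone()[0]
    conn.execute("INSERT OR IGNORE INTO TableB SELECT * FROM TableA")
    after = conn.execute("SELECT COUNT(*) FROM TableB").fetchone()[0]
    return after - before  # number of rows actually inserted

conn = sqlite3.connect(":memory:")
# Hypothetical schema: both tables share the same columns and primary key.
conn.execute("CREATE TABLE TableA (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE TableB (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO TableA VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1, 11)])
# TableB already holds 2 of the 10 rows (scaled down from 2,000 of 10,000).
conn.execute("INSERT INTO TableB SELECT * FROM TableA WHERE id <= 2")

inserted = copy_missing_rows(conn)
print(inserted)  # 8
```

Running the copy a second time inserts nothing, because every id now conflicts.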
If you need to do this in PHP, read about the array_diff_key() function. Store your arrays with the primary key values as the key of the array elements. No guarantees for the performance of this PHP function on such large arrays, though!
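The key-based diff can be sketched in Python rather than PHP. This is an analogue of array_diff_key(), not a port of it, and the sample rows are made up. Dict membership tests are constant time on average, so the Python version scales reasonably to tens of thousands of keys.

```python
def diff_by_key(rows_a: dict, rows_b: dict) -> dict:
    """Return the entries of rows_a whose keys do not appear in rows_b.

    Like PHP's array_diff_key(), only the keys (here: primary key values)
    are compared; the row values are ignored.
    """
    return {pk: row for pk, row in rows_a.items() if pk not in rows_b}

# Hypothetical rows keyed by primary key.
table_a = {1: "alice", 2: "bob", 3: "carol"}
table_b = {1: "alice"}
print(diff_by_key(table_a, table_b))  # {2: 'bob', 3: 'carol'}
```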

Use INSERT INTO ... SELECT with a NOT EXISTS check:
INSERT INTO TABLE_B
SELECT *
FROM TABLE_A a
WHERE NOT EXISTS(SELECT NULL
FROM TABLE_B b
WHERE b.column = a.column)
You'll need to change the condition WHERE b.column = a.column to match however you determine that a record already exists in TABLE_B.
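A runnable sketch of the same anti-join in SQLite (through Python's sqlite3), assuming the tables are matched on a hypothetical id column and have no constraints at all. Unlike INSERT IGNORE, NOT EXISTS does not rely on a PRIMARY KEY or UNIQUE constraint.

```python
import sqlite3

# Anti-join: copy only the TABLE_A rows that have no match in TABLE_B.
COPY_SQL = """
    INSERT INTO TABLE_B
    SELECT *
    FROM TABLE_A a
    WHERE NOT EXISTS (SELECT NULL
                      FROM TABLE_B b
                      WHERE b.id = a.id)
"""

conn = sqlite3.connect(":memory:")
# Hypothetical tables matched on a single "id" column (no keys declared).
conn.execute("CREATE TABLE TABLE_A (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE TABLE_B (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO TABLE_A VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])
conn.execute("INSERT INTO TABLE_B VALUES (1, 'a')")

conn.execute(COPY_SQL)
ids = [row[0] for row in conn.execute("SELECT id FROM TABLE_B ORDER BY id")]
print(ids)  # [1, 2, 3]
```

Because the query only inserts rows that are still missing, it is safe to run repeatedly.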

What about something like this:
insert into TableB
select *
from TableA
where not exists (
select 1
from TableB
where TableB.id = TableA.id
)
Or, if the rows in TableB count as duplicates because of their primary key, an INSERT IGNORE might do the trick, I suppose.

Related

Writing the SQL query based on 2 tables

I am really confused about writing this SQL query; it might be easy, but I still cannot come up with the right solution.
Idea: delete rows from TableA whose foreign keys point to primary keys in TableB, but only for the TableB rows that match some other value within TableB.
For table B it should look like this:
SELECT Column1
FROM TableB
WHERE Column2 = 'Value';
And then delete the rows in TableA that match the values in Column1 (TableB).
The IN operator is fine when you have hard-coded values, like WHERE SomeColumn IN ('value1', 'value2'),
or when you are checking against a primary key column, like WHERE SomeColumn IN (SELECT PK_Column FROM SomeTable),
because in either case there will be no NULL value inside your IN list.
NULL values inside the list can bring back unexpected results, most visibly with NOT IN: if the subquery returns even one NULL, WHERE x NOT IN (...) is never true, so no rows qualify.
A better option is the EXISTS operator, something like:
DELETE FROM TableA
WHERE EXISTS ( SELECT 1
FROM TableB
WHERE TableA.ColumnX = TableB.Column1
AND TableB.Column2 = 'Value'
);
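The NULL problem is easiest to see with NOT IN. This small sketch runs in SQLite via Python's sqlite3 (the data is made up, with one NULL in TableB.Column1) and compares NOT IN against NOT EXISTS on the same subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: one of the matching TableB rows has a NULL Column1.
conn.execute("CREATE TABLE TableA (ColumnX INTEGER)")
conn.execute("CREATE TABLE TableB (Column1 INTEGER, Column2 TEXT)")
conn.executemany("INSERT INTO TableA VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO TableB VALUES (?, ?)",
                 [(1, "Value"), (None, "Value")])

# NOT IN: "2 NOT IN (1, NULL)" is UNKNOWN rather than TRUE, so no row qualifies.
not_in = conn.execute("""
    SELECT ColumnX FROM TableA
    WHERE ColumnX NOT IN (SELECT Column1 FROM TableB WHERE Column2 = 'Value')
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL is harmless.
not_exists = conn.execute("""
    SELECT ColumnX FROM TableA
    WHERE NOT EXISTS (SELECT 1 FROM TableB
                      WHERE TableA.ColumnX = TableB.Column1
                        AND TableB.Column2 = 'Value')
    ORDER BY ColumnX
""").fetchall()

print(not_in)      # []
print(not_exists)  # [(2,), (3,)]
```

With a plain IN (as in the DELETE above) the NULL is less dangerous: rows that don't match evaluate to UNKNOWN, which WHERE treats the same as false.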
Assuming that you need to match ColumnX in TableA:
DELETE FROM TableA
WHERE ColumnX IN (SELECT Column1
FROM TableB
WHERE Column2 = 'Value');

Insert new/Changes from one table to another in Oracle SQL

I have two tables with the same number of columns: Table A and Table B.
Every day I insert data from Table B into Table A. This is the insert query I use now:
insert into table_a (select * from table_b);
But this insert also re-inserts the data that was inserted earlier. I only want the rows that are new or have changed since the last load. How can this be done?
You can use minus:
insert into table_a
select *
from table_b
minus
select *
from table_a;
This assumes that by "duplicate" you mean that all the columns are duplicated.
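SQLite has no MINUS keyword, but EXCEPT (the standard SQL spelling) does the same job. A sketch via Python's sqlite3, with hypothetical id/val columns and sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-column tables with identical structure.
conn.execute("CREATE TABLE table_a (id INTEGER, val TEXT)")
conn.execute("CREATE TABLE table_b (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO table_a VALUES (?, ?)", [(1, "old"), (2, "same")])
# In table_b: row 1 has changed, row 2 is unchanged, row 3 is new.
conn.executemany("INSERT INTO table_b VALUES (?, ?)",
                 [(1, "new"), (2, "same"), (3, "added")])

# EXCEPT here plays the role of Oracle's MINUS.
conn.execute("""
    INSERT INTO table_a
    SELECT * FROM table_b
    EXCEPT
    SELECT * FROM table_a
""")
rows = conn.execute("SELECT id, val FROM table_a ORDER BY id, val").fetchall()
print(rows)  # [(1, 'new'), (1, 'old'), (2, 'same'), (3, 'added')]
```

Note that the changed row 1 now appears twice: MINUS/EXCEPT picks up rows that differ, but it appends them rather than updating the existing row.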
If you have a timestamp field, you could use it to limit the records to those created after the last copy.
Another option, assuming you have a primary key (the id column in my example) that tells you whether a record has already been copied, is to create a table_c with the same structure as table_a and table_b and do the following:
insert into table_c
select b.* from table_b b
left join table_a a on (a.id = b.id)
where a.id is null;
insert into table_a select * from table_c;
truncate table table_c;
You need to adjust this query in order to use the actual primary key.
Hope this helps!
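A runnable sketch of the staging-table idea in SQLite (via Python's sqlite3), copying new table_b rows into table_a as the question asks. The id/val columns and data are hypothetical, and since SQLite has no TRUNCATE, an unfiltered DELETE empties the staging table instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("table_a", "table_b", "table_c"):
    # Hypothetical identical structure; id tells us whether a row was copied.
    conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO table_a VALUES (?, ?)", [(1, "x")])
conn.executemany("INSERT INTO table_b VALUES (?, ?)", [(1, "x"), (2, "y")])

# 1. Stage the table_b rows with no matching id in table_a (LEFT JOIN anti-join).
conn.execute("""
    INSERT INTO table_c
    SELECT b.* FROM table_b b
    LEFT JOIN table_a a ON a.id = b.id
    WHERE a.id IS NULL
""")
# 2. Copy the staged rows across.
conn.execute("INSERT INTO table_a SELECT * FROM table_c")
# 3. Empty the staging table (SQLite's stand-in for TRUNCATE).
conn.execute("DELETE FROM table_c")
conn.commit()
```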
If the tables have a primary or unique key, then you could leverage that in an anti-join:
insert into table_a
select *
from table_b b
where not exists (
select null
from table_a a
where
a.pk_field_1 = b.pk_field_1 and
a.pk_field_2 = b.pk_field_2
)
You don't say what your key is. Assuming you have a key column ID, and you only want IDs that are not already in Table A, you can also use a MERGE statement for this:
MERGE INTO A USING B ON (A.ID = B.ID)
WHEN NOT MATCHED THEN INSERT (... columns of A) VALUES (... columns of B)

How to loop through rows in two tables and create a new set based on the merged results in SQL

Here is my obstacle.
I have two tables. Table A contains more rows than Table B. I have to merge the results: if Table A does not contain a row from Table B, I insert that row into the new set. If, however, a row in Table A has the same primary key as a row in Table B, the new set takes the row from Table B.
Would this best be done with a cursor, or is there an easier way? I ask because there are 20 million rows, and while I am new to SQL, I've heard cursors are expensive.
Your phrasing is a little vague. It seems that you want everything from TableB and then rows from TableA that have no matching primary key in B. The following query solves this problem:
select *
from tableB
union all
select *
from tableA
where tableA.pk not in (select pk from tableB)
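A quick check of that query in SQLite through Python's sqlite3, with a hypothetical pk/val schema and made-up rows: every row from tableB survives, and tableA contributes only the keys tableB lacks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables sharing the primary key "pk".
conn.execute("CREATE TABLE tableA (pk INTEGER PRIMARY KEY, val TEXT)")
conn.execute("CREATE TABLE tableB (pk INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO tableA VALUES (?, ?)",
                 [(1, "A1"), (2, "A2"), (3, "A3")])
conn.executemany("INSERT INTO tableB VALUES (?, ?)", [(2, "B2"), (4, "B4")])

# tableB wins on pk 2; tableA fills in pks 1 and 3.
merged = conn.execute("""
    SELECT pk, val FROM tableB
    UNION ALL
    SELECT pk, val FROM tableA
    WHERE tableA.pk NOT IN (SELECT pk FROM tableB)
    ORDER BY pk
""").fetchall()
print(merged)  # [(1, 'A1'), (2, 'B2'), (3, 'A3'), (4, 'B4')]
```

UNION ALL is enough here because the two halves can never share a pk, so there is no need to pay for UNION's duplicate elimination.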
Yep, cursors are expensive.
There's a MERGE statement in newer SQL versions that will do this in one shot, but it's so cumbersome. Better to do it in two pieces. First:
UPDATE A SET
field1 = B.field1
,field2 = B.field2
-- , etc.
FROM A JOIN B on B.id = A.id
Then:
INSERT A SELECT * FROM B --enumerate fields if different
WHERE B.id not in (select id FROM A)
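The same two-step update-then-insert, sketched in SQLite via Python's sqlite3. The UPDATE ... FROM A JOIN B form above is SQL Server syntax, so this version swaps in a portable correlated subquery; tables A/B, field1 and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables A and B keyed on id.
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, field1 TEXT)")
conn.execute("CREATE TABLE B (id INTEGER PRIMARY KEY, field1 TEXT)")
conn.executemany("INSERT INTO A VALUES (?, ?)", [(1, "old"), (2, "keep")])
conn.executemany("INSERT INTO B VALUES (?, ?)", [(1, "new"), (3, "added")])

with conn:  # commit both steps together, or roll back both on error
    # Step 1: overwrite rows of A that also exist in B
    # (correlated subquery instead of UPDATE ... FROM A JOIN B).
    conn.execute("""
        UPDATE A
        SET field1 = (SELECT B.field1 FROM B WHERE B.id = A.id)
        WHERE A.id IN (SELECT id FROM B)
    """)
    # Step 2: append rows of B whose id is not yet in A.
    conn.execute("INSERT INTO A SELECT * FROM B "
                 "WHERE B.id NOT IN (SELECT id FROM A)")

print(conn.execute("SELECT * FROM A ORDER BY id").fetchall())
# [(1, 'new'), (2, 'keep'), (3, 'added')]
```

Doing the UPDATE first means it never touches the rows the INSERT is about to add.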
An OUTER JOIN should do what you need and be more efficient than a cursor.
Try this query
--first get the rows that match between TableA and TableB
INSERT INTO [new set]
SELECT TableB.* --or columns of your choice
FROM TableA LEFT JOIN TableB ON [matching key criteria]
WHERE TableB.[joining column/PK] IS NOT NULL
--then get the rows from TableA that don't have a match
INSERT INTO [new set]
SELECT TableA.* --you didn't say what was inserted if there was no matching row
FROM TableA LEFT JOIN TableB ON [matching key criteria]
WHERE TableB.[joining column/PK] IS NULL

SQL Select Into Field

I want to accomplish something like the following:
Select DISTINCT(tableA.column) INTO tableB.column FROM tableA
The goal would be to select a distinct data set and then insert that data into a specific column of a new table.
SELECT column INTO tableB FROM tableA
SELECT INTO will create a table as it inserts new records into it. If that is not what you want (if tableB already exists), then you will need to do something like this:
INSERT INTO tableB (
column
)
SELECT DISTINCT
column
FROM tableA
Remember that if tableB has more columns than just the one, you will need to list the columns you are inserting into (as I have done in my example).
You're pretty much there.
SELECT DISTINCT column INTO tableB FROM tableA
It's going to insert into whatever column(s) are specified in the select list, so you would need to alias your select values if you need to insert into columns of tableB that aren't in tableA.
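SQLite has no SELECT ... INTO for creating tables; its closest equivalent is CREATE TABLE ... AS SELECT. A sketch via Python's sqlite3 (the category/colour names and rows are hypothetical) showing that the alias in the select list becomes the new table's column name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source table with a duplicated value in "category".
conn.execute("CREATE TABLE tableA (id INTEGER, category TEXT)")
conn.executemany("INSERT INTO tableA VALUES (?, ?)",
                 [(1, "red"), (2, "blue"), (3, "red")])

# Create tableB from the distinct values; the alias names tableB's column.
conn.execute("""
    CREATE TABLE tableB AS
    SELECT DISTINCT category AS colour FROM tableA
""")
cols = [row[1] for row in conn.execute("PRAGMA table_info(tableB)")]
values = sorted(row[0] for row in conn.execute("SELECT colour FROM tableB"))
print(cols, values)  # ['colour'] ['blue', 'red']
```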
Try the following...
INSERT INTO tableB (column)
Select DISTINCT(tableA.column)
FROM tableA
I don't know what the schema of tableB is. If tableB already exists and there is no unique constraint on the column, you can do as the others here suggest:
INSERT INTO tableB (column)
SELECT DISTINCT(tableA.column)
FROM tableA
But if tableB already exists and has a unique constraint, you'll have to exclude the values already in tableB:
INSERT INTO tableB (column)
Select DISTINCT(tableA.column)
FROM tableA
WHERE tableA.column NOT IN (SELECT /* NOTE */ tableB.column FROM tableB)
-- NOTE: Remember that if there is a unique constraint, you don't need the more
-- costly form of a "SELECT DISTINCT" in this subquery against tableB
-- This could be done in a number of different ways - this is just
-- one version. Best version will depend on size of data in each table,
-- indexes available, etc. Always prototype different ways and measure perf.
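A small prototype of that last query in SQLite via Python's sqlite3, assuming a hypothetical single-column tableB with a UNIQUE constraint. DISTINCT collapses duplicates coming from tableA, and NOT IN skips values tableB already has; drop either one and the INSERT fails with a UNIQUE constraint violation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: tableB.col carries a UNIQUE constraint.
conn.execute("CREATE TABLE tableA (col TEXT)")
conn.execute("CREATE TABLE tableB (col TEXT UNIQUE)")
conn.executemany("INSERT INTO tableA VALUES (?)",
                 [("x",), ("y",), ("y",), ("z",)])
conn.execute("INSERT INTO tableB VALUES ('x')")

# DISTINCT removes the duplicate 'y'; NOT IN skips 'x', already in tableB.
conn.execute("""
    INSERT INTO tableB (col)
    SELECT DISTINCT tableA.col
    FROM tableA
    WHERE tableA.col NOT IN (SELECT tableB.col FROM tableB)
""")
print(sorted(row[0] for row in conn.execute("SELECT col FROM tableB")))
# ['x', 'y', 'z']
```

If tableB.col can hold NULL, NOT IN would match nothing at all; a NOT EXISTS subquery avoids that trap.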

deleting multiple records from two tables

I have two tables called TableA and TableB.
TableA has the following fields:
TableA_ID
FileName
TableB has the following fields:
TableB_ID
TableA_ID
CreationDate
There is a foreign key link between the two tables on the TableA_ID field.
I need to delete records from both tables. I need to look at the “CreationDate” on TableB, and if it’s after a certain date, delete that record. I will also need to delete the record in TableA with the same TableA_ID as the record in TableB.
There may be several records in TableB that use the TableA_ID (a one to many relationship). So I can’t delete the record in TableA if entries in TableB still use it.
I know this can’t be done in a single statement, but I am happy to do it in a transaction. The problem is that I’m not sure how to do this. I’m using MS SQL Server 2008. I don’t want to use triggers if possible.
Can there be records in TableA with no matching record in TableB? If not, then we know after we delete from TableB, we can delete any non-matching records in TableA:
begin transaction
-- @SomeDate holds the cutoff date
delete from TableB
where CreationDate > @SomeDate
delete from TableA
where TableA_ID not in (select TableA_ID from TableB)
commit transaction
Otherwise:
begin transaction
-- Save the TableA_IDs being deleted:
select distinct TableA_ID
into #TableA_Delete
from TableB
where CreationDate > @SomeDate
-- Depending on the expected size of #TableA_Delete, you may want
-- to create an index here, to speed up the delete from TableA.
delete from TableB
where CreationDate > @SomeDate
delete from TableA
where TableA_id in (select TableA_Id from #TableA_Delete)
and TableA_id not in (select TableA_id from TableB)
commit transaction
NOTE: Both solutions above need error handling added.
Also, see NYSystemsAnalyst's answer for another way of storing the IDs temporarily.
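A runnable sketch of the second approach in SQLite via Python's sqlite3. A Python list stands in for the #TableA_Delete temp table, NOT EXISTS replaces NOT IN to sidestep NULL surprises, and the dates are hypothetical ISO strings (which compare correctly as text). The `with conn:` block makes the deletes one transaction, which also covers the "needs error handling" note: any exception rolls everything back.

```python
import sqlite3

def purge_after(conn: sqlite3.Connection, cutoff: str) -> None:
    """Delete TableB rows created after `cutoff`, then the TableA rows
    those deletions left without any remaining TableB children."""
    with conn:  # single transaction: commit on success, roll back on error
        # Remember which parents are affected (stand-in for #TableA_Delete).
        ids = [row[0] for row in conn.execute(
            "SELECT DISTINCT TableA_ID FROM TableB WHERE CreationDate > ?",
            (cutoff,))]
        # Children first, so the foreign key is never violated.
        conn.execute("DELETE FROM TableB WHERE CreationDate > ?", (cutoff,))
        # Parents only if no child row still references them.
        conn.executemany(
            """DELETE FROM TableA
               WHERE TableA_ID = ?
                 AND NOT EXISTS (SELECT 1 FROM TableB
                                 WHERE TableB.TableA_ID = TableA.TableA_ID)""",
            [(i,) for i in ids])

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.execute("CREATE TABLE TableA (TableA_ID INTEGER PRIMARY KEY, FileName TEXT)")
conn.execute("""CREATE TABLE TableB (
    TableB_ID INTEGER PRIMARY KEY,
    TableA_ID INTEGER REFERENCES TableA (TableA_ID),
    CreationDate TEXT)""")
conn.executemany("INSERT INTO TableA VALUES (?, ?)", [(1, "f1"), (2, "f2")])
# File 1 has only a new child; file 2 has one new and one old child.
conn.executemany("INSERT INTO TableB VALUES (?, ?, ?)",
                 [(10, 1, "2024-06-01"), (20, 2, "2024-06-01"),
                  (21, 2, "2023-01-01")])
conn.commit()

purge_after(conn, "2024-01-01")
print(conn.execute("SELECT TableA_ID FROM TableA").fetchall())  # [(2,)]
```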
Referential integrity can delete the child rows when the parent is deleted, provided the foreign key is defined with cascading deletes (ON DELETE CASCADE). Make sure that is enabled.
Otherwise you have to write two DELETE statements, one for the child records and one for the parent record:
DELETE b FROM TableB AS b INNER JOIN TableA AS a ON a.TableA_ID = b.TableA_ID WHERE b.CreationDate >= @SomeDate
DELETE FROM TableA WHERE TableA_ID = @SomeID
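A minimal demonstration of ON DELETE CASCADE in SQLite via Python's sqlite3, reusing the question's column names with made-up rows. SQLite only honors foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for ON DELETE CASCADE in SQLite
conn.execute("CREATE TABLE TableA (TableA_ID INTEGER PRIMARY KEY, FileName TEXT)")
conn.execute("""CREATE TABLE TableB (
    TableB_ID INTEGER PRIMARY KEY,
    TableA_ID INTEGER REFERENCES TableA (TableA_ID) ON DELETE CASCADE,
    CreationDate TEXT)""")
conn.execute("INSERT INTO TableA VALUES (1, 'f1')")
conn.executemany("INSERT INTO TableB VALUES (?, 1, ?)",
                 [(10, "2024-06-01"), (11, "2023-01-01")])

# Deleting the parent removes every child that references it.
conn.execute("DELETE FROM TableA WHERE TableA_ID = 1")
remaining = conn.execute("SELECT COUNT(*) FROM TableB").fetchone()[0]
print(remaining)  # 0
```

Cascading works from parent to child, so it removes both children here regardless of their dates. The question filters on the child's CreationDate instead, which is why the cascade alone does not solve it.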
Edit: Sorry, I didn't see the part of your question stating the need to keep the TableA entries if they have multiple records in TableB. Use this instead:
/* Table variable to hold the TableA IDs to be deleted. */
DECLARE @IDs AS table
(
ID int
);
/* Get the TableA IDs whose TableB records will all be deleted
(assumes @MyCreationDate is already declared and set). */
INSERT INTO @IDs (
ID
)
SELECT TableA_ID
FROM TableB
GROUP BY TableA_ID
HAVING MIN(CreationDate) >= @MyCreationDate; /* No older TableB rows remain for these IDs */
/* Delete the TableB records. */
DELETE b
FROM TableB AS b
WHERE CreationDate >= @MyCreationDate;
/* Delete the TableA records. */
DELETE a
FROM TableA AS a
INNER JOIN @IDs AS c ON (a.TableA_ID = c.ID);
You should wrap all of this in a transaction to ensure data integrity.
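One way to pick only the TableA ids whose TableB rows will all be removed is to group TableB by TableA_ID and keep the groups whose earliest CreationDate is on or after the cutoff. Here is a sketch of that in SQLite via Python's sqlite3. A Python list stands in for the T-SQL table variable, and the rows and dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (TableA_ID INTEGER PRIMARY KEY, FileName TEXT)")
conn.execute("CREATE TABLE TableB "
             "(TableB_ID INTEGER PRIMARY KEY, TableA_ID INTEGER, CreationDate TEXT)")
conn.executemany("INSERT INTO TableA VALUES (?, ?)",
                 [(1, "f1"), (2, "f2"), (3, "f3")])
conn.executemany("INSERT INTO TableB VALUES (?, ?, ?)", [
    (10, 1, "2024-06-01"), (11, 1, "2024-07-01"),  # all of file 1's rows are new
    (20, 2, "2024-06-01"), (21, 2, "2023-01-01"),  # file 2 keeps an old row
    (30, 3, "2023-01-01"),                         # file 3 is untouched
])
cutoff = "2024-01-01"

with conn:  # the three steps commit or roll back together
    # Parents whose *every* child row is on/after the cutoff.
    ids = [row[0] for row in conn.execute("""
        SELECT TableA_ID FROM TableB
        GROUP BY TableA_ID
        HAVING MIN(CreationDate) >= ?""", (cutoff,))]
    conn.execute("DELETE FROM TableB WHERE CreationDate >= ?", (cutoff,))
    conn.executemany("DELETE FROM TableA WHERE TableA_ID = ?",
                     [(i,) for i in ids])

print(ids)  # [1]
print(conn.execute("SELECT TableA_ID FROM TableA ORDER BY TableA_ID").fetchall())
# [(2,), (3,)]
```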