SQL: How can I add to a table without some duplicates? - sql

I am trying to insert the rows of one table into another, but it is not working the way I want.
Both tables have three columns. I only want to add a row from table 2 to table 1 if its first two column values are not already present together in table 1 (I don't care about the third column).
This is what I have so far:
INSERT INTO table1 (col1, col2, col3)
SELECT a.col1, a.col2, a.col3
FROM table2 as a
WHERE NOT EXISTS (SELECT b.col1, b.col2
FROM table1 as b
WHERE a.col1 = b.col1 AND a.col2 = b.col2);
I checked around and this looks like it should work, but it doesn't. Can anyone see why?

I often have trouble when there are two fields to search for. One way is to combine them together:
INSERT INTO table1 (col1, col2, col3)
SELECT a.col1, a.col2, a.col3 from table2 as a
WHERE concat(a.col1,':', a.col2)
NOT IN (SELECT concat(col1,':',col2) from table1);
Another way is a left join:
INSERT INTO table1 (col1, col2, col3)
SELECT a.col1, a.col2, a.col3
from table2 as a
LEFT OUTER JOIN table1 as b
ON a.col1 = b.col1
AND a.col2 = b.col2
WHERE b.col1 IS NULL AND b.col2 IS NULL;
In the second example, it is better to test a primary key (or another NOT NULL column) of table1 in the WHERE clause, so that a NULL can only mean "no match was found".
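To make the left join variant concrete, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration, and it assumes col1 is never NULL in table1, so checking b.col1 alone is enough to detect "no match".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 TEXT NOT NULL, col2 TEXT NOT NULL, col3 TEXT);
    CREATE TABLE table2 (col1 TEXT NOT NULL, col2 TEXT NOT NULL, col3 TEXT);
    -- Hypothetical sample data, not from the original question.
    INSERT INTO table1 VALUES ('a', 'x', 'old'), ('b', 'y', 'old');
    INSERT INTO table2 VALUES ('a', 'x', 'new'), ('c', 'z', 'new');
""")

# Anti-join: keep only table2 rows that found no (col1, col2) partner in table1.
conn.execute("""
    INSERT INTO table1 (col1, col2, col3)
    SELECT a.col1, a.col2, a.col3
    FROM table2 AS a
    LEFT OUTER JOIN table1 AS b
        ON a.col1 = b.col1 AND a.col2 = b.col2
    WHERE b.col1 IS NULL
""")

rows_after_left_join = conn.execute(
    "SELECT col1, col2, col3 FROM table1 ORDER BY col1, col2"
).fetchall()
print(rows_after_left_join)
```

Only ('c', 'z', 'new') is added; ('a', 'x', 'new') is skipped because its first two columns already exist in table1.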

Try this:
merge into table1 a
using
(select col1,col2,col3 from table2) b
on
(b.col1=a.col1 and b.col2=a.col2)
when not matched then
insert (a.col1,a.col2,a.col3)
values
(b.col1,b.col2,b.col3);

Try this:
INSERT INTO table1 (col1, col2, col3)
(SELECT a.col1, a.col2, a.col3
FROM table2 a
WHERE NOT EXISTS (SELECT b.col1, b.col2
FROM table1 b
WHERE a.col1 = b.col1 AND a.col2 = b.col2));
I suspect the AS keyword before the table alias is the problem. Some databases (Oracle, for example) do not accept AS for table aliases.
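For what it's worth, the NOT EXISTS pattern itself does behave as intended. Below is a small sketch with Python's sqlite3 module and made-up rows (SQLite is just a convenient stand-in; your database may differ in details such as alias syntax). Running the insert a second time adds nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER, col2 INTEGER, col3 TEXT);
    CREATE TABLE table2 (col1 INTEGER, col2 INTEGER, col3 TEXT);
    -- Hypothetical sample data for illustration only.
    INSERT INTO table1 VALUES (1, 10, 'keep'), (2, 20, 'keep');
    INSERT INTO table2 VALUES (1, 10, 'dup'), (3, 30, 'new'), (2, 21, 'new');
""")

insert_missing = """
    INSERT INTO table1 (col1, col2, col3)
    SELECT a.col1, a.col2, a.col3
    FROM table2 a
    WHERE NOT EXISTS (SELECT 1
                      FROM table1 b
                      WHERE a.col1 = b.col1 AND a.col2 = b.col2)
"""

def count_table1():
    return conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0]

conn.execute(insert_missing)
count_after_first_run = count_table1()

# The second run finds every (col1, col2) pair already present.
conn.execute(insert_missing)
count_after_second_run = count_table1()

print(count_after_first_run, count_after_second_run)
```

Note that this only guards against pairs already in table1. If table2 itself contains the same (col1, col2) pair twice, both rows would still be inserted in a single run.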

Related

Show Rows That Are Different Between Two Tables - MS Access

I have been working on trying to convert the following SQL-Server code to achieve a similar result in MS Access.
WITH TableA(Col1, Col2, Col3)
AS (SELECT 'Dog',1,1 UNION ALL
SELECT 'Cat',27,86 UNION ALL
SELECT 'Cat',128,92),
TableB(Col1, Col2, Col3)
AS (SELECT 'Dog',1,1 UNION ALL
SELECT 'Cat',27,105 UNION ALL
SELECT 'Lizard',83,NULL)
SELECT CA.*
FROM TableA A
FULL OUTER JOIN TableB B
ON A.Col1 = B.Col1
AND A.Col2 = B.Col2
/*Unpivot the joined rows*/
CROSS APPLY (SELECT 'TableA' AS what, A.* UNION ALL
SELECT 'TableB' AS what, B.*) AS CA
/*Exclude identical rows*/
WHERE EXISTS (SELECT A.*
EXCEPT
SELECT B.*)
/*Discard NULL extended row*/
AND CA.Col1 IS NOT NULL
ORDER BY CA.Col1, CA.Col2
Gives
what Col1 Col2 Col3
------ ------ ----------- -----------
TableA Cat 27 86
TableB Cat 27 105
TableA Cat 128 92
TableB Lizard 83 NULL
So far I have been able to replicate the FULL OUTER JOIN using the following code, but I have been unable to replicate unpivoting the joined rows (CROSS APPLY).
(SELECT *
FROM TableA AA
INNER JOIN TableB BB ON AA.Col1 = BB.Col1
UNION ALL
SELECT *
FROM TableA AA
LEFT JOIN TableB BB ON AA.Col1 = BB.Col1
WHERE BB.Col1 IS NULL
UNION ALL
SELECT *
FROM TableA AA
RIGHT JOIN TableB BB ON AA.Col1 = BB.Col1
WHERE AA.Col1 IS NULL
)
I could use some help achieving the same result in a MS-Access query.
From what I can gather, you have two tables that have unique rows. You want to return rows that are present in one table but not the other.
I would suggest aggregation and HAVING for this -- in either database:
SELECT col1, col2, col3
FROM ((SELECT col1, col2, col3 FROM TableA) UNION ALL
(SELECT col1, col2, col3 FROM TableB)
) as ab
GROUP BY col1, col2, col3
HAVING COUNT(*) = 1;
Or alternatively, two NOT EXISTS clauses:
SELECT a.*
FROM TableA as a
WHERE NOT EXISTS (SELECT 1
FROM TableB as b
WHERE (a.col1 = b.col1 OR a.col1 IS NULL AND b.col1 IS NULL) AND
(a.col2 = b.col2 OR a.col2 IS NULL AND b.col2 IS NULL) AND
(a.col3 = b.col3 OR a.col3 IS NULL AND b.col3 IS NULL)
)
UNION ALL
SELECT b.*
FROM TableB as b
WHERE NOT EXISTS (SELECT 1
FROM TableA as a
WHERE (a.col1 = b.col1 OR a.col1 IS NULL AND b.col1 IS NULL) AND
(a.col2 = b.col2 OR a.col2 IS NULL AND b.col2 IS NULL) AND
(a.col3 = b.col3 OR a.col3 IS NULL AND b.col3 IS NULL)
);
Here is a db<>fiddle that uses SQL Server, but the syntax should be basically the same in MS Access.
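As a quick check of the aggregation approach above, here is a sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. The extra MIN(what) column is my own addition (not part of the answer) to show which table each surviving row came from; since every surviving group has exactly one row, MIN just returns that row's label.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Col1 TEXT, Col2 INTEGER, Col3 INTEGER);
    CREATE TABLE TableB (Col1 TEXT, Col2 INTEGER, Col3 INTEGER);
    INSERT INTO TableA VALUES ('Dog', 1, 1), ('Cat', 27, 86), ('Cat', 128, 92);
    INSERT INTO TableB VALUES ('Dog', 1, 1), ('Cat', 27, 105), ('Lizard', 83, NULL);
""")

# Stack both tables, then keep the rows that occur exactly once overall.
# GROUP BY treats NULLs as equal, so the NULL in Col3 needs no special handling.
differing_rows = conn.execute("""
    SELECT MIN(what) AS what, Col1, Col2, Col3
    FROM (SELECT 'TableA' AS what, Col1, Col2, Col3 FROM TableA
          UNION ALL
          SELECT 'TableB' AS what, Col1, Col2, Col3 FROM TableB) AS ab
    GROUP BY Col1, Col2, Col3
    HAVING COUNT(*) = 1
    ORDER BY Col1, Col2, Col3
""").fetchall()

for row in differing_rows:
    print(row)
```

This relies on each table having unique rows, as assumed in the answer. A row duplicated inside a single table would have a count of 2 and be dropped.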

Using a column from inside a not in query

I have a sql query as below:
SELECT
A.COL1, A.COL2
FROM
SOMESCHEMA.TABLE1 A
WHERE
A.COL3 NOT IN (SELECT A1.COL3 FROM SOMESCHEMA.TABLE2 B, SOMESCHEMA.TABLE1 A1 WHERE A.COL4 = B.COL4 AND B.DATE >= '2014-01-17')
The result of above query is two columns COL1 and COL2.
Now I want the DATE column of the second table into my result.
That is, the result should be COL1, COL2 and DATE.
How to achieve this?
Thanks for reading!
This is exactly the situation where you want to use a join:
SELECT
A.COL1, A.COL2, B.DATE
FROM
SOMESCHEMA.TABLE1 A INNER JOIN SOMESCHEMA.TABLE2 B ON A.COL3 = B.COL3
WHERE B.DATE >= '2014-01-17'
You can find more info on using JOINS in DB2 here: http://www-01.ibm.com/support/knowledgecenter/?lang=en#!/SSEPEK_10.0.0/com.ibm.db2z10.doc.intro/src/tpc/db2z_innerjoin.dita
Your question as stated makes no sense - you're asking for matched data where the data does not match. Sample data from each table and a sample output would really help here.
Here is my best guess as to what you're trying to do:
Return rows where a match on COL4 exists, the Table2 date is on or after January 17th, 2014, and COL3 differs:
SELECT
A.COL1, A.COL2, B.DATE
FROM
SOMESCHEMA.TABLE1 A
INNER JOIN
SOMESCHEMA.TABLE2 B ON
A.COL4 = B.COL4 AND
B.DATE >= '2014-01-17' AND
A.COL3 <> B.COL3

using with as and nesting sql

I have a query like this:
with t1 as (
a.col1 as 'c1',
a.col2 as 'c2',
b.col1 as 'c3',
b.col2 as 'c4'
from table1 a left join table2 b
on a.col1 = b.col1
)
select
c.c1,
c.c2,
c.c3,
c.c4
from t1 c
I want to turn this whole thing into another "with ... as" block, T2, so that I can select from what is currently the outer query. I need this because I perform calculations on the data, rename the resulting columns, perform calculations on the renamed columns, and then repeat that once more. I can't figure out how to make the whole statement a "table" that I can then select from.
I have tried nesting another ;with as () inside it, but either that is not possible or I am not doing it right, and my guess is the latter.
Thanks in advance!
Is this what you want?
with t1 as (
select a.col1 as c1, a.col2 as c2, b.col1 as c3, b.col2 as c4
from table1 a left join
table2 b
on a.col1 = b.col1
),
t2 as (
select c.c1, c.c2, c.c3, c.c4
from t1 c
)
select *
from t2;
You can define multiple CTEs in a single WITH clause. They are separated by commas, and each one can refer to the CTEs defined before it.
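Here is a small sketch of that chaining, run against SQLite through Python's sqlite3 module. The calculations (total, doubled), the t3 step, and the sample rows are invented purely to show one CTE building on the previous one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER, col2 INTEGER);
    CREATE TABLE table2 (col1 INTEGER, col2 INTEGER);
    -- Hypothetical sample data.
    INSERT INTO table1 VALUES (1, 10), (2, 20);
    INSERT INTO table2 VALUES (1, 5);
""")

cte_rows = conn.execute("""
    WITH t1 AS (
        SELECT a.col1 AS c1, a.col2 AS c2, b.col1 AS c3, b.col2 AS c4
        FROM table1 a
        LEFT JOIN table2 b ON a.col1 = b.col1
    ),
    t2 AS (
        -- First calculation, on the columns renamed in t1.
        SELECT c1, c2 + COALESCE(c4, 0) AS total
        FROM t1
    ),
    t3 AS (
        -- Second calculation, on the column renamed in t2.
        SELECT c1, total, total * 2 AS doubled
        FROM t2
    )
    SELECT c1, total, doubled
    FROM t3
    ORDER BY c1
""").fetchall()

print(cte_rows)
```

Each step only needs to select from the step before it, so there is no need to nest one WITH inside another.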

Optimize SQL statement

I have a requirement to update a column in table A if the count of records in table B grouped by 3 columns (matching between A and B) is less than 7. I have written below query, but it is running long. Please suggest any optimal query or tune this.
update /*+ parallel(A) */ A set A.col4=0
where exists
(select 1
from B
where A.col1=B.col1 and A.col2=B.col2
and A.col3=B.col3
group by col1,col2,col3
having count(*) < 7)
Try this,
MERGE INTO A
USING (
SELECT col1, col2, col3
FROM B
GROUP BY col1, col2, col3
HAVING COUNT(*) < 7
) b ON (A.col1 = b.col1 AND A.col2=b.col2 AND A.col3= b.col3)
WHEN MATCHED THEN UPDATE
SET A.col4 = 0;
View it on SQL Fiddle: http://www.sqlfiddle.com/#!4/dcdf1/17
Let me know if it worked or not!
My first suggestion is to create an index on B: B(col1, col2, col3).
The next attempt would be to switch this to a join:
update A
set col4 = 0
from (select col1, col2, col3
from B
group by col1, col2, col3
having count(*) < 7
) B
where A.col1 = B.col1 and A.col2 = B.col2
and A.col3 = B.col3 ;
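To sanity-check the logic of the two suggestions (index, then update against the pre-aggregated groups), here is a sketch on SQLite via Python's sqlite3 module. It says nothing about performance on your database. The sample rows and the index name b_cols are made up, and I use EXISTS against the grouped subquery rather than UPDATE ... FROM, because not every engine or version accepts the latter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 INTEGER);
    CREATE TABLE B (col1 INTEGER, col2 INTEGER, col3 INTEGER);
    -- The suggested index on the three matching columns (name is hypothetical).
    CREATE INDEX b_cols ON B (col1, col2, col3);
    INSERT INTO A VALUES (1, 1, 1, 99), (2, 2, 2, 99), (3, 3, 3, 99);
""")
# Group (1,1,1) has 3 rows in B, group (2,2,2) has 7, group (3,3,3) has none.
conn.executemany(
    "INSERT INTO B VALUES (?, ?, ?)",
    [(1, 1, 1)] * 3 + [(2, 2, 2)] * 7,
)

# Aggregate B once, then update the A rows whose group has fewer than 7 rows.
conn.execute("""
    UPDATE A
    SET col4 = 0
    WHERE EXISTS (
        SELECT 1
        FROM (SELECT col1, col2, col3
              FROM B
              GROUP BY col1, col2, col3
              HAVING COUNT(*) < 7) AS g
        WHERE g.col1 = A.col1 AND g.col2 = A.col2 AND g.col3 = A.col3
    )
""")

updated_a = conn.execute("SELECT col1, col4 FROM A ORDER BY col1").fetchall()
print(updated_a)
```

As in the original query, rows of A with no matching rows in B at all are left untouched, because there is no group for them.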

What's the SQL query to list all rows that have 2 column sub-rows as duplicates?

I have a table that has redundant data and I'm trying to identify all rows that have duplicate sub-rows (for lack of a better word). By sub-rows I mean considering COL1 and COL2 only.
So let's say I have something like this:
COL1 COL2 COL3
---------------------
aa 111 blah_x
aa 111 blah_j
aa 112 blah_m
ab 111 blah_s
bb 112 blah_d
bb 112 blah_d
cc 112 blah_w
cc 113 blah_p
I need a SQL query that returns this:
COL1 COL2 COL3
---------------------
aa 111 blah_x
aa 111 blah_j
bb 112 blah_d
bb 112 blah_d
Does this work for you?
select t.* from table t
left join ( select col1, col2, count(*) as count from table group by col1, col2 ) c on t.col1=c.col1 and t.col2=c.col2
where c.count > 1
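Here is a quick sketch of that idea on the question's sample data, using Python's sqlite3 module. The table name theTable and the alias cnt are placeholders I picked, since "table" and "count" are awkward names in real SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE theTable (col1 TEXT, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO theTable VALUES (?, ?, ?)",
    [
        ("aa", 111, "blah_x"), ("aa", 111, "blah_j"), ("aa", 112, "blah_m"),
        ("ab", 111, "blah_s"), ("bb", 112, "blah_d"), ("bb", 112, "blah_d"),
        ("cc", 112, "blah_w"), ("cc", 113, "blah_p"),
    ],
)

# Count rows per (col1, col2) pair, then keep the rows whose pair occurs more than once.
duplicate_rows = conn.execute("""
    SELECT t.col1, t.col2, t.col3
    FROM theTable t
    JOIN (SELECT col1, col2, COUNT(*) AS cnt
          FROM theTable
          GROUP BY col1, col2) c
      ON t.col1 = c.col1 AND t.col2 = c.col2
    WHERE c.cnt > 1
    ORDER BY t.col1, t.col2, t.col3
""").fetchall()

for row in duplicate_rows:
    print(row)
```

Because the count is taken per pair and not per row comparison, the two identical bb rows are both returned, each exactly once.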
With the data you have listed, your query is not possible. The data on rows 5 & 6 is not distinct within itself.
Assuming that your table is named 'quux', if you start with something like this:
SELECT a.COL1, a.COL2, a.COL3
FROM quux a, quux b
WHERE a.COL1 = b.COL1 AND a.COL2 = b.COL2 AND a.COL3 <> b.COL3
ORDER BY a.COL1, a.COL2
You'll end up with this answer:
COL1 COL2 COL3
---------------------
aa 111 blah_x
aa 111 blah_j
That's because rows 5 & 6 have the same values for COL3. Any query that returns both rows 5 & 6 will also return duplicates of ALL of the rows in this dataset.
On the other hand, if you have a primary key (ID), then you can use this query instead:
SELECT a.COL1, a.COL2, a.COL3
FROM quux a, quux b
WHERE a.COL1 = b.COL1 AND a.COL2 = b.COL2 AND a.ID <> b.ID
ORDER BY a.COL1, a.COL2
[Edited to simplify the WHERE clause]
And you'll get the results you want:
COL1 COL2 COL3
---------------------
aa 111 blah_x
aa 111 blah_j
bb 112 blah_d
bb 112 blah_d
I just tested this on SQL Server 2000, but you should see the same results on any modern SQL database.
blorgbeard proved me wrong -- good for him!
Join on yourself like this:
SELECT a.col3, b.col3, a.col1, a.col2
FROM tablename a, tablename b
WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 != b.col3
If you're using PostgreSQL (a version and table that still have an oid column), you can use the oid to return fewer duplicate results, like this:
SELECT a.col3, b.col3, a.col1, a.col2
FROM tablename a, tablename b
WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 != b.col3
AND a.oid < b.oid
Don't have a database handy to test this, but I think it should work...
select
*
from
theTable
where
col1||col2 in
(
select
col1||col2
from
theTable
group by
col1||col2
having
count(*) > 1
)
My naive attempt would be
select a.*, b.* from table a, table b where a.col1 = b.col1 and a.col2 = b.col2 and a.col3 != b.col3;
but that would return all the rows twice. I'm not sure how you'd restrict it to just returning them once. Maybe if there was a primary key, you could add "and a.pkey < b.pkey".
Like I said, that's not elegant and there is probably a better way to do this.
Something like this should work:
SELECT a.COL1, a.COL2, a.COL3
FROM YourTable a
JOIN YourTable b ON b.COL1 = a.COL1 AND b.COL2 = a.COL2 AND b.COL3 <> a.COL3
In general, the JOIN clause should include every column that you're considering to be part of a "duplicate" (COL1 and COL2 in this case), and at least one column (or as many as it takes) to eliminate a row joining to itself (COL3, in this case).
This is pretty similar to the self-join, except it will not have the duplicates.
select COL1,COL2,COL3
from theTable a
where exists (select 'x'
from theTable b
where a.col1=b.col1
and a.col2=b.col2
and a.col3<>b.col3)
order by col1,col2,col3
Here is how you find duplicates. Tested in Oracle 10g with your data.
select * from tst
where (col1, col2) in
(select col1, col2 from tst group by col1, col2 having count(*) > 1)
select COL1,COL2,COL3
from table
group by COL1,COL2,COL3
having count(*)>1
Forget joins -- use an analytic function:
select col1, col2, col3
from
(
select col1, col2, col3, count(*) over (partition by col1, col2) rows_per_col1_col2
from table
)
where rows_per_col1_col2 > 1
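A sketch of the analytic (window) function version on the question's data, run through Python's sqlite3 module. It assumes the bundled SQLite is 3.25 or newer, which is when window functions were added. The table name theTable is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE theTable (col1 TEXT, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO theTable VALUES (?, ?, ?)",
    [
        ("aa", 111, "blah_x"), ("aa", 111, "blah_j"), ("aa", 112, "blah_m"),
        ("ab", 111, "blah_s"), ("bb", 112, "blah_d"), ("bb", 112, "blah_d"),
        ("cc", 112, "blah_w"), ("cc", 113, "blah_p"),
    ],
)

# COUNT(*) OVER attaches the size of each (col1, col2) group to every row,
# so a single pass over the table is enough and no self-join is needed.
windowed_duplicates = conn.execute("""
    SELECT col1, col2, col3
    FROM (SELECT col1, col2, col3,
                 COUNT(*) OVER (PARTITION BY col1, col2) AS rows_per_col1_col2
          FROM theTable) AS counted
    WHERE rows_per_col1_col2 > 1
    ORDER BY col1, col2, col3
""").fetchall()

for row in windowed_duplicates:
    print(row)
```

Unlike the self-joins that compare col3 with <>, this also returns the two bb rows, which are identical in all three columns.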