Copy records from one table to another without duplicates - sql

IF object_id('tempdb..#A') IS NOT NULL DROP TABLE #A
IF object_id('tempdb..#B') IS NOT NULL DROP TABLE #B
CREATE TABLE #A (fname varchar(20), lname varchar(20))
CREATE TABLE #B (fname varchar(20), lname varchar(20))
INSERT INTO #A
SELECT 'Kevin', 'XP'
UNION ALL
SELECT 'Tammy', 'Win7'
UNION ALL
SELECT 'Wes', 'XP'
UNION ALL
SELECT 'Susan', 'Win7'
UNION ALL
SELECT 'Kevin', 'Win7'
SELECT * FROM #A
INSERT INTO #B
SELECT a.fname, a.lname FROM #A a
WHERE a.fname NOT IN (SELECT fname from #B)
SELECT * FROM #B
DELETE FROM #B
INSERT INTO #B
SELECT a.fname, a.lname FROM #A a
LEFT OUTER JOIN #B b ON a.fname = b.fname
WHERE a.fname NOT IN (SELECT fname from #B)
SELECT * FROM #B
Both of these examples copy all 5 records to the new table.
I only want to see one unique fname so only one Kevin should show up.
Why don't these work, or is there a better way to do it?
It seems like such a simple thing.

This would create rows with unique fname and take Win7 if both Win7 and XP existed.
INSERT INTO #B
SELECT a.fname, MIN(a.lname)
FROM #A a
GROUP BY a.fname
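A minimal sketch of the GROUP BY / MIN approach follows. It runs against an in-memory SQLite database through Python's standard `sqlite3` module, which stands in for SQL Server here. The tables `a` and `b` mirror #A and #B, and `dedupe_by_min` is a hypothetical helper name. MIN(lname) keeps the alphabetically first lname, so Kevin gets 'Win7' because 'W' sorts before 'X':

```python
import sqlite3

# Sample rows copied from the #A example in the question.
ROWS = [("Kevin", "XP"), ("Tammy", "Win7"), ("Wes", "XP"),
        ("Susan", "Win7"), ("Kevin", "Win7")]


def dedupe_by_min(rows):
    """Copy one row per fname from a into b, keeping the smallest lname."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (fname TEXT, lname TEXT)")
    conn.execute("CREATE TABLE b (fname TEXT, lname TEXT)")
    conn.executemany("INSERT INTO a VALUES (?, ?)", rows)
    # GROUP BY collapses each fname to one row; MIN decides which lname survives.
    conn.execute("INSERT INTO b SELECT fname, MIN(lname) FROM a GROUP BY fname")
    result = conn.execute("SELECT fname, lname FROM b ORDER BY fname").fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in dedupe_by_min(ROWS):
        print(row)
```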

As per the comments, given that W comes before X, you should be able to do:
INSERT INTO #B
SELECT fname, lname
FROM (
SELECT fname, lname,
ROW_NUMBER() OVER(PARTITION BY fname ORDER BY lname) r
FROM #A
) t
WHERE r=1
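The same ROW_NUMBER() idea can be sketched with Python's `sqlite3` module, with SQLite standing in for SQL Server. This assumes the bundled SQLite is 3.25 or newer, since older versions lack window functions; you can check `sqlite3.sqlite_version`. The table name `a` and the helper `first_per_fname` are illustrative:

```python
import sqlite3

# Sample rows copied from the #A example in the question.
ROWS = [("Kevin", "XP"), ("Tammy", "Win7"), ("Wes", "XP"),
        ("Susan", "Win7"), ("Kevin", "Win7")]


def first_per_fname(rows):
    """Return the first row per fname (ordered by lname) using ROW_NUMBER()."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (fname TEXT, lname TEXT)")
    conn.executemany("INSERT INTO a VALUES (?, ?)", rows)
    query = """
        SELECT fname, lname
        FROM (
            SELECT fname, lname,
                   ROW_NUMBER() OVER (PARTITION BY fname ORDER BY lname) AS r
            FROM a
        ) AS t
        WHERE r = 1            -- keep only the first row of each fname partition
        ORDER BY fname
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(first_per_fname(ROWS))
```

Unlike MIN, this keeps whole rows together. Changing the ORDER BY inside OVER (for example `ORDER BY lname DESC`) changes which row survives.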

Answering your question, why don't your queries work?
INSERT INTO #B
SELECT a.fname, a.lname FROM #A a
WHERE a.fname NOT IN (SELECT fname from #B)
This statement is evaluated in two steps. First, the SELECT part of the query is executed and returns a table. At that point #B is empty, so every tuple in #A is part of this result. Then, once this result has been computed, it is inserted into #B. #B ends up being a copy of #A.
The DBMS does not insert one tuple and then re-evaluate the query for the next tuple of #A, as your question seems to imply. The insertion is done only AFTER the query has been completely evaluated.
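To make the difference concrete, here is a small sketch using Python's `sqlite3` module, with SQLite as a stand-in for SQL Server. It runs the question's single INSERT ... SELECT and then a hypothetical row-by-row loop that re-checks `b` before each insert, which is what the question seemed to expect. The loop is only an illustration, not a recommended approach:

```python
import sqlite3

# Sample rows copied from the #A example in the question.
ROWS = [("Kevin", "XP"), ("Tammy", "Win7"), ("Wes", "XP"),
        ("Susan", "Win7"), ("Kevin", "Win7")]


def fresh_db():
    """Create table a with the sample rows and an empty table b."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (fname TEXT, lname TEXT)")
    conn.execute("CREATE TABLE b (fname TEXT, lname TEXT)")
    conn.executemany("INSERT INTO a VALUES (?, ?)", ROWS)
    return conn


def count_b(conn):
    return conn.execute("SELECT COUNT(*) FROM b").fetchone()[0]


def set_based_copy():
    """The question's query: the SELECT sees b as it was before the statement."""
    conn = fresh_db()
    conn.execute("INSERT INTO b SELECT fname, lname FROM a "
                 "WHERE fname NOT IN (SELECT fname FROM b)")
    n = count_b(conn)
    conn.close()
    return n


def row_by_row_copy():
    """Hypothetical loop: b is re-checked before each individual insert."""
    conn = fresh_db()
    rows = conn.execute("SELECT fname, lname FROM a ORDER BY rowid").fetchall()
    for fname, lname in rows:
        conn.execute("INSERT INTO b SELECT ?, ? "
                     "WHERE ? NOT IN (SELECT fname FROM b)",
                     (fname, lname, fname))
    n = count_b(conn)
    conn.close()
    return n


if __name__ == "__main__":
    print(set_based_copy(), row_by_row_copy())
```

The single statement copies all five rows because `b` is still empty when its subquery is evaluated. The loop copies four rows because the second Kevin is rejected.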
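If your goal is to insert into #B the tuples in #A without duplicate tuples, there are many ways to do that. One of them is the following, which removes only rows that are identical in every column (the two Kevin rows differ in lname, so both would still be copied):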
INSERT INTO #B SELECT distinct * from #A;

Just use DISTINCT over the SELECT query:
INSERT INTO TARGET_TABLE
SELECT DISTINCT * FROM
(
-- some big query
) x
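The sketch below shows what DISTINCT actually deduplicates. It uses Python's `sqlite3` module as a stand-in for SQL Server. The sample is the #A data plus one exact duplicate ('Wes', 'XP'), which is added purely for illustration:

```python
import sqlite3

# The #A sample plus one exact duplicate ('Wes', 'XP') added for illustration.
ROWS = [("Kevin", "XP"), ("Tammy", "Win7"), ("Wes", "XP"),
        ("Susan", "Win7"), ("Kevin", "Win7"), ("Wes", "XP")]


def distinct_copy(rows):
    """Copy a into b with SELECT DISTINCT * over a derived table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (fname TEXT, lname TEXT)")
    conn.execute("CREATE TABLE b (fname TEXT, lname TEXT)")
    conn.executemany("INSERT INTO a VALUES (?, ?)", rows)
    # DISTINCT compares whole rows, so only fully identical rows collapse.
    conn.execute("INSERT INTO b SELECT DISTINCT * FROM "
                 "(SELECT fname, lname FROM a) AS x")
    result = conn.execute("SELECT fname, lname FROM b "
                          "ORDER BY fname, lname").fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(distinct_copy(ROWS))
```

The duplicate Wes row disappears, but both Kevin rows remain. To keep a single row per fname, use one of the GROUP BY or ROW_NUMBER() answers instead.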

Related

Find Groups that don't contain all records

I feel like I should be able to get this and I'm just having a brain fart. I've simplified the problem to the following example:
DECLARE @A TABLE (ID int);
DECLARE @B TABLE (GroupID char(1), ID int);
INSERT @A VALUES (1);
INSERT @A VALUES (2);
INSERT @A VALUES (3);
INSERT @B VALUES ('X', 1);
INSERT @B VALUES ('X', 2);
INSERT @B VALUES ('X', 3);
INSERT @B VALUES ('Y', 1);
INSERT @B VALUES ('Y', 2);
INSERT @B VALUES ('Z', 1);
INSERT @B VALUES ('Z', 2);
INSERT @B VALUES ('Z', 3);
INSERT @B VALUES ('Z', 4);
So table A contains a set of some records. Table B contains multiple copies of the set contained in A with Group IDs. But some of those groups may be missing one or more records of the set. I want to find the groups that are missing records. So in the above example, my results should be:
GroupID
-------
Y
But for some reason I just can't wrap my head around this today. Any help would be appreciated.
Awesome use-case for relational division! (Here's a must-read blog post about it)
SELECT DISTINCT b1.GroupID
FROM @B b1
WHERE EXISTS (
SELECT 1
FROM @A a
WHERE NOT EXISTS (
SELECT 1
FROM @B b2
WHERE b1.GroupID = b2.GroupID
AND b2.ID = a.ID
)
);
How to read this?
I want all distinct GroupIDs in @B for which there is a record in @A for which there isn't a record in @B with the same GroupID and the same ID.
In fact, this is the "remainder" of the relational division.
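A runnable sketch of the double NOT EXISTS query follows. It uses Python's `sqlite3` module, with SQLite standing in for SQL Server. The tables `a` and `b` mirror @A and @B, and `load` and `incomplete_groups` are hypothetical helper names:

```python
import sqlite3

A_IDS = [1, 2, 3]
B_ROWS = [("X", 1), ("X", 2), ("X", 3), ("Y", 1), ("Y", 2),
          ("Z", 1), ("Z", 2), ("Z", 3), ("Z", 4)]


def load():
    """Build tables a(ID) and b(GroupID, ID) with the question's data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (ID INTEGER)")
    conn.execute("CREATE TABLE b (GroupID TEXT, ID INTEGER)")
    conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in A_IDS])
    conn.executemany("INSERT INTO b VALUES (?, ?)", B_ROWS)
    return conn


def incomplete_groups():
    conn = load()
    query = """
        SELECT DISTINCT b1.GroupID
        FROM b AS b1
        WHERE EXISTS (                 -- there is an ID in a ...
            SELECT 1
            FROM a
            WHERE NOT EXISTS (         -- ... that has no row in this group
                SELECT 1
                FROM b AS b2
                WHERE b2.GroupID = b1.GroupID
                  AND b2.ID = a.ID))
        ORDER BY b1.GroupID
    """
    result = [group for (group,) in conn.execute(query)]
    conn.close()
    return result


if __name__ == "__main__":
    print(incomplete_groups())
```

Group Z is not reported even though it has an extra ID 4. The query only asks whether some ID from `a` is missing, not whether the sets are equal.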
Try this:
SELECT b.GroupID, COUNT(b.GroupID)
FROM @A a INNER JOIN @B b
ON a.ID = b.ID
GROUP BY b.GroupID
HAVING COUNT(b.GroupID) < (SELECT COUNT(*) FROM @A)
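The counting approach can be sketched the same way, using Python's `sqlite3` as a stand-in for SQL Server. It assumes IDs are unique in `a` and that `b` has no duplicate (GroupID, ID) pairs, since duplicates would inflate the counts. The extra group ('W', 9) in the test is hypothetical. It shows that a group matching none of the IDs in `a` is dropped by the INNER JOIN and never reported:

```python
import sqlite3

A_IDS = [1, 2, 3]
B_ROWS = [("X", 1), ("X", 2), ("X", 3), ("Y", 1), ("Y", 2),
          ("Z", 1), ("Z", 2), ("Z", 3), ("Z", 4)]


def load(extra_b_rows=()):
    """Build tables a(ID) and b(GroupID, ID), optionally with extra b rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (ID INTEGER)")
    conn.execute("CREATE TABLE b (GroupID TEXT, ID INTEGER)")
    conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in A_IDS])
    conn.executemany("INSERT INTO b VALUES (?, ?)",
                     list(B_ROWS) + list(extra_b_rows))
    return conn


def groups_by_count(extra_b_rows=()):
    conn = load(extra_b_rows)
    query = """
        SELECT b.GroupID, COUNT(*) AS matched
        FROM a INNER JOIN b ON a.ID = b.ID   -- IDs not in a (like 4) drop out
        GROUP BY b.GroupID
        HAVING COUNT(*) < (SELECT COUNT(*) FROM a)
        ORDER BY b.GroupID
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(groups_by_count())
```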
This will give you all the combinations that are missing.
select FullList.*
from (select distinct a.ID,
b.GroupId
from @A a
cross join @B b) FullList
left join @B b
on FullList.ID = b.ID
and FullList.GroupID = b.GroupID
where b.ID is null
The answer to your question would just be the same but with the first line:
select distinct FullList.GroupID
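Below is a sketch of the "full list minus what exists" anti-join. It uses Python's `sqlite3` module, with SQLite standing in for SQL Server. The derived table cross joins the IDs in `a` with the distinct GroupIDs in `b`. A LEFT JOIN with `b.ID IS NULL` then keeps only the pairs that have no real row. Table and helper names are illustrative:

```python
import sqlite3

A_IDS = [1, 2, 3]
B_ROWS = [("X", 1), ("X", 2), ("X", 3), ("Y", 1), ("Y", 2),
          ("Z", 1), ("Z", 2), ("Z", 3), ("Z", 4)]


def load():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (ID INTEGER)")
    conn.execute("CREATE TABLE b (GroupID TEXT, ID INTEGER)")
    conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in A_IDS])
    conn.executemany("INSERT INTO b VALUES (?, ?)", B_ROWS)
    return conn


def missing_combinations():
    conn = load()
    query = """
        SELECT FullList.GroupID, FullList.ID
        FROM (SELECT DISTINCT a.ID, g.GroupID
              FROM a CROSS JOIN (SELECT DISTINCT GroupID FROM b) AS g) AS FullList
        LEFT JOIN b
          ON FullList.ID = b.ID AND FullList.GroupID = b.GroupID
        WHERE b.ID IS NULL             -- no real row exists for this pair
        ORDER BY FullList.GroupID, FullList.ID
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(missing_combinations())
```

Selecting only `DISTINCT FullList.GroupID` from the same query gives the list of incomplete groups.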
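A variant that cross joins only the distinct GroupIDs (so the full list is not built from every row of @B) also gives you all the combinations that are missing.
select FullList.*
from (select distinct a.ID,
b.GroupId
from @A a
cross join (select distinct db.GroupId from @B db) b
) as FullList
left join @B b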
on FullList.ID = b.ID
and FullList.GroupID = b.GroupID
where b.ID is null
The answer to your question would just be the same but with the first line:
select distinct FullList.GroupID

SQL intersect with other tables, how do I ignore it?

I am trying to run a query given three tables.
DECLARE @TABLE1 TABLE (ID CHAR(2))
DECLARE @TABLE2 TABLE (ID CHAR(2))
DECLARE @TABLE3 TABLE (ID CHAR(2))
INSERT INTO @TABLE1 VALUES('1')
INSERT INTO @TABLE1 VALUES('2')
INSERT INTO @TABLE2 VALUES('1')
--NOTHING in TABLE3
I need to get only the values that are present and ignore the empty table. This doesn't work since TABLE3 has no values:
SELECT ID
FROM @TABLE1
INTERSECT
SELECT ID
FROM @TABLE2
INTERSECT
SELECT ID
FROM @TABLE3
Result should be 1.
How do I ignore any table that is empty but keep the other values?
Why not do a union of select distincts from each table, and then group that by ID and select count(*), and select only rows with count(*) equal to the maximum value of count(*) in the result?
It's a bit of a mess of subqueries at this point unfortunately but you should get the logic :)
Intersect is not going to work for you as you can't add conditions to it.
From what I understand, you want to select all records where the ID appears in at least 2 of the tables. I am assuming that each ID appears at most once within each table.
The following works in MS SQL Server:
DECLARE @TABLE1 TABLE (ID CHAR(2))
DECLARE @TABLE2 TABLE (ID CHAR(2))
DECLARE @TABLE3 TABLE (ID CHAR(2))
INSERT INTO @TABLE1 VALUES('1')
INSERT INTO @TABLE1 VALUES('2')
INSERT INTO @TABLE2 VALUES('1')
--NOTHING in TABLE3
;WITH AllValues AS
(
SELECT ID
FROM @TABLE1
UNION ALL
SELECT ID
FROM @TABLE2
UNION ALL
SELECT ID
FROM @TABLE3
)
SELECT ID
FROM AllValues
GROUP BY ID
HAVING COUNT(*) > 1
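The sketch below contrasts the question's INTERSECT with the UNION ALL / HAVING answer. It uses Python's `sqlite3` module, with SQLite standing in for SQL Server. `table1` to `table3` mirror @TABLE1 to @TABLE3, and `table3` is left empty on purpose:

```python
import sqlite3


def load():
    conn = sqlite3.connect(":memory:")
    for name in ("table1", "table2", "table3"):
        conn.execute(f"CREATE TABLE {name} (ID TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?)", [("1",), ("2",)])
    conn.execute("INSERT INTO table2 VALUES ('1')")
    # table3 is deliberately left empty, as in the question.
    return conn


def intersect_all_three():
    """INTERSECT with an empty table always yields an empty result."""
    conn = load()
    rows = conn.execute("SELECT ID FROM table1 INTERSECT SELECT ID FROM table2 "
                        "INTERSECT SELECT ID FROM table3").fetchall()
    conn.close()
    return [r[0] for r in rows]


def in_at_least_two():
    """Count occurrences across all tables and keep IDs seen more than once."""
    conn = load()
    query = """
        WITH AllValues AS (
            SELECT ID FROM table1
            UNION ALL SELECT ID FROM table2
            UNION ALL SELECT ID FROM table3)
        SELECT ID FROM AllValues
        GROUP BY ID
        HAVING COUNT(*) > 1
        ORDER BY ID
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(intersect_all_three(), in_at_least_two())
```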
Maybe... But the design of the system is extremely foreign; a real-world example would help us understand what you're trying to do.
Select count(*), ID FROM (
Select ID from @TABLE1
UNION ALL
Select ID from @TABLE2
UNION ALL
Select ID from @TABLE3) Derived
GROUP BY ID
ORDER BY count(*) DESC
UNION ALL is needed here: a plain UNION removes duplicates across the tables, so every ID would be counted only once.

How to scan for differences between two queries?

I have a table that loads new data every day and another table that contains a history of changes to that table. What's the best way to check if any of the data have changed since the last time data was loaded?
For example, I have table @a with some strategies for different countries, and table @b tracks the changes made to table @a. I can use CHECKSUM() to hash the fields that can change, and add a row to @b whenever the existing hash differs from the new hash. However, MSDN advises against this because collisions can occur: two different values can map to the same checksum.
MSDN link for checksum
http://msdn.microsoft.com/en-us/library/aa258245(v=SQL.80).aspx
Sample code:
declare @a table
(
ownerid bigint
,Strategy varchar(50)
,country char(3)
)
insert into @a
select 1,'Long','USA'
insert into @a
select 2,'Short','CAN'
insert into @a
select 3,'Neutral','AUS'
declare @b table
(
Lastupdated datetime
,ownerid bigint
,Strategy varchar(50)
,country char(3)
)
insert into @b
(
Lastupdated
,ownerid
,strategy
,country
)
select
getdate()
,a.ownerid
,a.strategy
,a.country
from @a a left join @b b
on a.ownerid=b.ownerid
where
b.ownerid is null
select * from @b
--get a different timestamp
waitfor delay '00:00:00.1'
--change source data
update @a
set strategy='Short'
where ownerid=1
--add newly changed data into @b
insert into @b
select
getdate()
,a.ownerid
,a.strategy
,a.country
from
(select *,checksum(strategy,country) as hashval from @a) a
left join
(select *,checksum(strategy,country) as hashval from @b) b
on a.ownerid=b.ownerid
where
a.hashval<>b.hashval
select * from @b
How about writing a query using EXCEPT? Just write queries for both tables and put EXCEPT between them:
(SELECT * FROM table_new) EXCEPT (SELECT * FROM table_old)
The result will be the entries in table_new that aren't in table_old (i.e. that have been updated or inserted).
Note: To get rows that have been deleted (along with the old versions of updated rows), reverse the order of the queries.
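A small sketch of EXCEPT in both directions follows. It uses Python's `sqlite3` module, with SQLite standing in for SQL Server. The two snapshots are hypothetical sample data, not from the question. SQLite does not accept parentheses around the halves of a compound SELECT, so they are omitted here:

```python
import sqlite3

# Hypothetical snapshots of the same table taken on two different days.
OLD = [(1, "Long", "USA"), (2, "Short", "CAN"), (3, "Neutral", "AUS")]
NEW = [(1, "Short", "USA"), (2, "Short", "CAN"), (4, "Long", "GBR")]


def diff_snapshots(old, new):
    """Return (rows only in new, rows only in old) using EXCEPT."""
    conn = sqlite3.connect(":memory:")
    for name, rows in (("table_old", old), ("table_new", new)):
        conn.execute(f"CREATE TABLE {name} "
                     "(ownerid INTEGER, strategy TEXT, country TEXT)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    added_or_changed = conn.execute(
        "SELECT * FROM table_new EXCEPT SELECT * FROM table_old ORDER BY 1"
    ).fetchall()
    removed_or_old = conn.execute(
        "SELECT * FROM table_old EXCEPT SELECT * FROM table_new ORDER BY 1"
    ).fetchall()
    conn.close()
    return added_or_changed, removed_or_old


if __name__ == "__main__":
    print(diff_snapshots(OLD, NEW))
```

An updated row shows up on both sides: its new version in the first result and its old version in the second.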
There is no need to check for changes if you use a different approach to the problem.
On your master table, create a trigger for INSERT, UPDATE and DELETE that tracks the changes for you by writing to a history table such as @b.
If you search the internet for "SQL audit table" you will find many pages describing the process, for example: Adding simple trigger-based auditing to your SQL Server database
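To illustrate the trigger-based audit idea, here is a sketch in SQLite via Python's `sqlite3` module. The table names `strategies` and `strategies_audit` and the one-letter action codes are hypothetical. SQLite triggers fire once per affected row and use NEW/OLD. SQL Server triggers fire once per statement and read the `inserted`/`deleted` pseudo-tables, so the real T-SQL looks different:

```python
import sqlite3


def audited_changes():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE strategies (ownerid INTEGER PRIMARY KEY, "
                 "strategy TEXT, country TEXT)")
    conn.execute("CREATE TABLE strategies_audit ("
                 "changed_at TEXT DEFAULT CURRENT_TIMESTAMP, action TEXT, "
                 "ownerid INTEGER, strategy TEXT, country TEXT)")
    # One trigger per event; each writes the affected row to the audit table.
    conn.executescript("""
        CREATE TRIGGER strategies_ins AFTER INSERT ON strategies BEGIN
            INSERT INTO strategies_audit (action, ownerid, strategy, country)
            VALUES ('I', NEW.ownerid, NEW.strategy, NEW.country);
        END;
        CREATE TRIGGER strategies_upd AFTER UPDATE ON strategies BEGIN
            INSERT INTO strategies_audit (action, ownerid, strategy, country)
            VALUES ('U', NEW.ownerid, NEW.strategy, NEW.country);
        END;
        CREATE TRIGGER strategies_del AFTER DELETE ON strategies BEGIN
            INSERT INTO strategies_audit (action, ownerid, strategy, country)
            VALUES ('D', OLD.ownerid, OLD.strategy, OLD.country);
        END;
    """)
    conn.executemany("INSERT INTO strategies VALUES (?, ?, ?)",
                     [(1, "Long", "USA"), (2, "Short", "CAN"),
                      (3, "Neutral", "AUS")])
    conn.execute("UPDATE strategies SET strategy = 'Short' WHERE ownerid = 1")
    conn.execute("DELETE FROM strategies WHERE ownerid = 3")
    rows = conn.execute("SELECT action, ownerid, strategy, country "
                        "FROM strategies_audit ORDER BY rowid").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in audited_changes():
        print(row)
```

With this approach, no later comparison step is needed. Every change is recorded at the moment it happens.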
Thanks to @newenglander I was able to use EXCEPT to find the changed row. As @Tony said, I'm not sure how multiple changes will work, but here's the same sample code reworked to use EXCEPT instead of CHECKSUM:
declare @a table
(
ownerid bigint
,Strategy varchar(50)
,country char(3)
)
insert into @a
select 1,'Long','USA'
insert into @a
select 2,'Short','CAN'
insert into @a
select 3,'Neutral','AUS'
declare @b table
(
Lastupdated datetime
,ownerid bigint
,Strategy varchar(50)
,country char(3)
)
insert into @b
(
Lastupdated
,ownerid
,strategy
,country
)
select
getdate()
,a.ownerid
,a.strategy
,a.country
from @a a left join @b b
on a.ownerid=b.ownerid
where
b.ownerid is null
select * from @b
--get a different timestamp
waitfor delay '00:00:00.1'
--change source data
update @a
set strategy='Short'
where ownerid=1
--add newly changed data using EXCEPT
insert into @b
select getdate(),
ownerid,
strategy,
country
from
(
(
select
ownerid
,strategy
,country
from @a changedtable
)
EXCEPT
(
select
ownerid
,strategy
,country
from @b historicaltable
)
) x
select * from @b
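The sketch below tests the multiple-changes concern. It uses Python's `sqlite3` module, with SQLite standing in for SQL Server. The tables `cur` and `hist` play the roles of @a and @b, and `record_changes` is a hypothetical helper that applies the EXCEPT insert above. Because the comparison runs against the whole history, a value that reverts to an earlier state matches an old history row and is not recorded:

```python
import sqlite3


def setup():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cur (ownerid INTEGER, strategy TEXT, country TEXT)")
    conn.execute("CREATE TABLE hist (Lastupdated TEXT, ownerid INTEGER, "
                 "strategy TEXT, country TEXT)")
    conn.executemany("INSERT INTO cur VALUES (?, ?, ?)",
                     [(1, "Long", "USA"), (2, "Short", "CAN"),
                      (3, "Neutral", "AUS")])
    return conn


def record_changes(conn):
    """Append rows of cur with no identical copy anywhere in hist."""
    conn.execute("""
        INSERT INTO hist (Lastupdated, ownerid, strategy, country)
        SELECT CURRENT_TIMESTAMP, ownerid, strategy, country
        FROM (SELECT ownerid, strategy, country FROM cur
              EXCEPT
              SELECT ownerid, strategy, country FROM hist) AS x
    """)
    return conn.execute("SELECT COUNT(*) FROM hist").fetchone()[0]


def history_sizes():
    conn = setup()
    sizes = [record_changes(conn)]                      # initial load
    conn.execute("UPDATE cur SET strategy = 'Short' WHERE ownerid = 1")
    sizes.append(record_changes(conn))                  # change is recorded
    conn.execute("UPDATE cur SET strategy = 'Long' WHERE ownerid = 1")
    sizes.append(record_changes(conn))                  # revert is NOT recorded
    conn.close()
    return sizes


if __name__ == "__main__":
    print(history_sizes())
```

Comparing only against the latest history row per ownerid might avoid this, but that is not shown here.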

sql like clause multiple values

I have a table with multiple words, from 1 to n.
declare @words table
(
word varchar(100) not null
)
insert into @words (word) values ('word1')
insert into @words (word) values ('word2')
insert into @words (word) values ('word3')
declare @tablea table
(
column1 varchar(100) not null
)
insert into @tablea (column1) values ('aword1a aword2a aword3a')
insert into @tablea (column1) values ('word2a')
insert into @tablea (column1) values ('word3a')
I'm having trouble writing a query that selects from a table the rows where a column is like all of these words, i.e. I need AND semantics. If @words contains word1, word2 and word3, the LIKE condition must match all three words, which means I want to get back only the first row of @tablea.
select *
from tablea
where
column1 like ?
Updated:
select t.column1
from @tablea t
inner join @words w on charindex(w.word, t.column1) > 0
group by t.column1
having count(distinct w.word) = (select count(*) from @words)
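Here is a runnable sketch of the join-and-count answer, using Python's `sqlite3` module with SQLite as a stand-in for SQL Server. SQLite has no CHARINDEX. Its `instr(haystack, needle)` does the same job with the arguments in the opposite order. The tables `words` and `tablea` mirror @words and @tablea:

```python
import sqlite3

WORDS = ["word1", "word2", "word3"]
TEXTS = ["aword1a aword2a aword3a", "word2a", "word3a"]


def load(words, texts):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word TEXT NOT NULL)")
    conn.execute("CREATE TABLE tablea (column1 TEXT NOT NULL)")
    conn.executemany("INSERT INTO words VALUES (?)", [(w,) for w in words])
    conn.executemany("INSERT INTO tablea VALUES (?)", [(t,) for t in texts])
    return conn


def rows_containing_all(words, texts):
    conn = load(words, texts)
    # instr(haystack, needle) corresponds to CHARINDEX(needle, haystack).
    query = """
        SELECT t.column1
        FROM tablea AS t
        INNER JOIN words AS w ON instr(t.column1, w.word) > 0
        GROUP BY t.column1
        HAVING COUNT(DISTINCT w.word) = (SELECT COUNT(*) FROM words)
    """
    result = [c for (c,) in conn.execute(query)]
    conn.close()
    return result


if __name__ == "__main__":
    print(rows_containing_all(WORDS, TEXTS))
```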
Since each column1 value needs to contain all the values in the @words table, I would use NOT EXISTS and try to find a value in @words that isn't contained in the column1 field.
select
*
from
@tablea a
where not exists (
select 1
from @words w
where a.column1 not like '%' + w.word + '%'
)
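The NOT EXISTS / NOT LIKE version can be sketched the same way, with SQLite (via Python's `sqlite3`) standing in for SQL Server. SQLite concatenates strings with `||` rather than `+`:

```python
import sqlite3

WORDS = ["word1", "word2", "word3"]
TEXTS = ["aword1a aword2a aword3a", "word2a", "word3a"]


def load(words, texts):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word TEXT NOT NULL)")
    conn.execute("CREATE TABLE tablea (column1 TEXT NOT NULL)")
    conn.executemany("INSERT INTO words VALUES (?)", [(w,) for w in words])
    conn.executemany("INSERT INTO tablea VALUES (?)", [(t,) for t in texts])
    return conn


def rows_containing_all(words, texts):
    conn = load(words, texts)
    query = """
        SELECT a.column1
        FROM tablea AS a
        WHERE NOT EXISTS (             -- no word is missing from column1
            SELECT 1
            FROM words AS w
            WHERE a.column1 NOT LIKE '%' || w.word || '%')
        ORDER BY a.rowid
    """
    result = [c for (c,) in conn.execute(query)]
    conn.close()
    return result


if __name__ == "__main__":
    print(rows_containing_all(WORDS, TEXTS))
```

One behavioral difference from the join-and-count version: if the word table is empty, NOT EXISTS is true for every row, so all rows are returned. The count version returns nothing in that case.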
This will do it, but I'm not sure how extensible it is:
SELECT column1
FROM @tablea t
JOIN @words w ON t.column1 LIKE '%'+w.word+'%'
GROUP BY column1
HAVING COUNT(*) = (SELECT COUNT(*) FROM @words)
In the long run, you may be better off implementing Full Text Search.

Query SQL with like operator from two tables

How can I do a SQL query with the like operator from two different tables?
I need something like:
select * from table1 where name like %table2.name
It's not a common field but a substring of a field on another table.
Edit
(original answer is further down)
Your comment (and subsequent edit) completely changes the question.
To do that, you can use LIKE as part of the ON clause in a join:
CREATE TABLE a (foo varchar(254))
GO
CREATE TABLE b (id int, bar varchar(254))
GO
INSERT INTO a (foo) VALUES ('one')
INSERT INTO a (foo) VALUES ('tone')
INSERT INTO a (foo) VALUES ('phone')
INSERT INTO a (foo) VALUES ('two')
INSERT INTO a (foo) VALUES ('three')
INSERT INTO b (id, bar) VALUES (2, 'ne')
INSERT INTO b (id, bar) VALUES (3, 't')
SELECT a.foo
FROM a
INNER JOIN b ON a.foo LIKE '%' + b.bar
WHERE b.id = 2
(That's the SQL Server version; for MySQL, add in the various semicolons, remove the GOs, and use ...LIKE concat('%', b.bar) instead.)
That uses id = 2 to find bar = "ne" in table b, then prepends the % wildcard and uses the result to filter rows from a. Results are:
one
tone
phone
You won't have to do the concat if you can store the wildcard in b.bar.
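The JOIN ... ON ... LIKE version can be reproduced in SQLite through Python's `sqlite3` module, using the same tables and rows as the SQL Server script. As in the MySQL note, the concatenation syntax differs: SQLite uses `'%' || b.bar`. The helper `suffix_matches` is a hypothetical name:

```python
import sqlite3


def suffix_matches(bar_id):
    """Return a.foo values that end with the b.bar of the given b.id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (foo TEXT)")
    conn.execute("CREATE TABLE b (id INTEGER, bar TEXT)")
    conn.executemany("INSERT INTO a VALUES (?)",
                     [("one",), ("tone",), ("phone",), ("two",), ("three",)])
    conn.executemany("INSERT INTO b VALUES (?, ?)", [(2, "ne"), (3, "t")])
    query = """
        SELECT a.foo
        FROM a
        INNER JOIN b ON a.foo LIKE '%' || b.bar   -- b.bar must end a.foo
        WHERE b.id = ?
        ORDER BY a.rowid
    """
    result = [foo for (foo,) in conn.execute(query, (bar_id,))]
    conn.close()
    return result


if __name__ == "__main__":
    print(suffix_matches(2))
```

Any `%` or `_` stored in b.bar would also act as a wildcard, since the column value becomes part of the pattern.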
Separately, I was surprised to find that this works (on SQL Server) as well:
SELECT foo
FROM a
WHERE foo LIKE (
SELECT TOP 1 '%' + bar
FROM b
WHERE id = 2
)
...but the version using JOIN is probably more flexible.
That should get you going.
Original answer
(Arguably no longer relevant)
It's hard to tell what you're asking, but here's an example of using LIKE to limit the results from a JOIN:
SELECT a.foo, b.bar
FROM someTable a
INNER JOIN someOtherTable b
ON a.someField = b.someField
WHERE a.foo LIKE 'SOMETHING%'
AND b.bar LIKE '%SOMETHING ELSE'
That will give you foo from someTable and bar from someOtherTable where the rows are related by someField and foo starts with "SOMETHING" and bar ends with "SOMETHING ELSE".
Not particularly sure about the precise syntax, but here's an idea:
select ... from (
select ID, Foo as F from FooTable
union all
select ID, Bar as F from BarTable) R
where R.F like '%text%'
Based on @TJCrowder's answer:
Test Tables
create table #A (
id varchar(10)
)
create table #b(
number varchar(5)
)
insert into #A values('123')
insert into #A values('456')
insert into #A values('789')
insert into #A values('0123')
insert into #A values('4567')
select * from #A
insert into #B values('12')
insert into #b values('45')
insert into #b values('987')
insert into #b values('012')
insert into #b values('666')
Actual query
select * from #a, #b
where #a.id like '%' + #b.number + '%'
Modify the query as needed, for example:
select #A.* from #A,#B ...
or
select * from #a, #b
where #a.id like #b.number + '%' -- one side match
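The sketch below compares the "anywhere" and "one side" patterns on the test tables above. It uses Python's `sqlite3` module, with SQLite standing in for SQL Server and `||` replacing `+` for concatenation. The pattern strings are fixed constants, not user input, so building the query with an f-string is only a convenience for this illustration:

```python
import sqlite3

IDS = ["123", "456", "789", "0123", "4567"]
NUMBERS = ["12", "45", "987", "012", "666"]

CONTAINS = "'%' || B.number || '%'"   # number appears anywhere inside id
PREFIX = "B.number || '%'"            # id starts with number


def matches(pattern_sql):
    """Return (id, number) pairs where A.id LIKE the given pattern."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (id TEXT)")
    conn.execute("CREATE TABLE B (number TEXT)")
    conn.executemany("INSERT INTO A VALUES (?)", [(i,) for i in IDS])
    conn.executemany("INSERT INTO B VALUES (?)", [(n,) for n in NUMBERS])
    query = f"""
        SELECT A.id, B.number
        FROM A, B
        WHERE A.id LIKE {pattern_sql}
        ORDER BY A.rowid, B.rowid
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(matches(CONTAINS))
    print(matches(PREFIX))
```

'0123' matches both '12' and '012' with the "anywhere" pattern, but only '012' with the one-side pattern.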