merging content of two tables without duplicating content - sql

I have two identical SQL Server tables (SOURCE and DESTINATION) with lots of columns in each. I want to insert into table DESTINATION rows from table SOURCE that do not already exist in table DESTINATION. I define equality between the two rows if all columns match except for the timestamp, a count column, and the integer primary key. So I want to insert into DESTINATION all rows in SOURCE that don't already exist in DESTINATION, ignoring the count, timestamp, and primary key columns.
How do I do this?
Thanks for all the contributions! I chose to use the Merge command since it is structured to allow for updates and inserts in one statement and I needed to do the update separately.
This is the code that worked:
Merge
into DESTINATION as D
using SOURCE as S
on (
D.Col1 = S.Col1
and D.Col2 = S.Col2
and D.Col3 = S.Col3
)
WHEN MATCHED
THEN UPDATE SET D.Count = S.Count
WHEN NOT MATCHED THEN
INSERT (Col1, Col2, Col3, Count, timestamp)
VALUES (S.Col1, S.Col2, S.Col3, S.Count, S.timestamp);
Note: when I first wrote this question I called the tables AAA and BBB. I later renamed AAA to SOURCE and BBB to DESTINATION for clarity.

Using a SELECT statement for this purpose is obsolete since SQL Server 2008; instead of SELECT you can use the MERGE statement:
ref:
http://technet.microsoft.com/en-us/library/bb510625.aspx
http://weblogs.sqlteam.com/peterl/archive/2007/09/20/Example-of-MERGE-in-SQL-Server-2008.aspx

Something like this:
INSERT INTO BBB(id, timestamp, mycount, col1, col2, col3, etc.)
SELECT id, timestamp, mycount, col1, col2, col3, etc.
FROM AAA
WHERE
NOT EXISTS(SELECT NULL FROM BBB oldb WHERE
oldb.col1 = AAA.col1
AND oldb.col2 = AAA.col2
AND oldb.col3 = AAA.col3
)
Add columns as needed to the NOT EXISTS clause.
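To make the NOT EXISTS pattern above easy to try out, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for SQL Server, a cut-down three-column version of the AAA and BBB tables, and a hypothetical helper name insert_missing_rows. Unlike the statement above, it leaves the id column out of the insert so that BBB generates its own primary key.

```python
import sqlite3


def insert_missing_rows(conn):
    """Copy rows from AAA into BBB unless BBB already has the same col1..col3."""
    conn.execute(
        """
        INSERT INTO BBB (timestamp, mycount, col1, col2, col3)
        SELECT timestamp, mycount, col1, col2, col3
        FROM AAA
        WHERE NOT EXISTS (
            SELECT 1 FROM BBB AS oldb
            WHERE oldb.col1 = AAA.col1
              AND oldb.col2 = AAA.col2
              AND oldb.col3 = AAA.col3
        )
        """
    )


conn = sqlite3.connect(":memory:")
for name in ("AAA", "BBB"):
    # Same (assumed) structure for both tables; id is generated automatically.
    conn.execute(
        f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, timestamp TEXT, "
        "mycount INTEGER, col1 TEXT, col2 TEXT, col3 TEXT)"
    )
conn.executemany(
    "INSERT INTO AAA (timestamp, mycount, col1, col2, col3) VALUES (?, ?, ?, ?, ?)",
    [("t1", 1, "a", "b", "c"), ("t2", 2, "x", "y", "z")],
)
conn.execute(
    "INSERT INTO BBB (timestamp, mycount, col1, col2, col3) "
    "VALUES ('t0', 9, 'a', 'b', 'c')"
)

insert_missing_rows(conn)
rows = conn.execute(
    "SELECT col1, col2, col3, mycount FROM BBB ORDER BY id"
).fetchall()
print(rows)  # [('a', 'b', 'c', 9), ('x', 'y', 'z', 2)]
```

Running the insert a second time adds nothing, because every AAA row now has a match. One caveat: a plain = never matches NULL against NULL, so rows with NULLs in the compared columns would be treated as missing and inserted again.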

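A solution using a good old-fashioned LEFT JOIN. Note that in the example below only the first row of BBB is inserted into AAA, because only it has no matching row in AAA. You'd replace col1 and col2 with the actual columns of the tables. (The session below is from MySQL. SQL Server does not support JOIN ... USING, so there you would write ON AAA.col1 = BBB.col1 AND AAA.col2 = BBB.col2 instead.)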
> select * from AAA;
+---------------------+------+------+
| timestamp | col1 | col2 |
+---------------------+------+------+
| 2012-03-17 08:17:22 | 1 | 1 |
| 2012-03-17 08:17:27 | 1 | 2 |
| 2012-03-17 08:17:30 | 1 | 3 |
| 2012-03-17 08:17:32 | 1 | 4 |
| 2012-03-17 08:17:49 | 2 | 2 |
| 2012-03-17 08:17:52 | 2 | 3 |
| 2012-03-17 08:17:54 | 2 | 4 |
+---------------------+------+------+
7 rows in set (0.00 sec)
> select * from BBB;
+---------------------+------+------+
| timestamp | col1 | col2 |
+---------------------+------+------+
| 2012-03-17 08:18:16 | 2 | 1 |
| 2012-03-17 08:18:18 | 2 | 2 |
| 2012-03-17 08:18:20 | 2 | 3 |
+---------------------+------+------+
3 rows in set (0.00 sec)
> INSERT INTO AAA
SELECT BBB.* FROM BBB
LEFT JOIN AAA
USING(col1,col2)
WHERE AAA.timestamp IS NULL;
> select * from AAA;
+---------------------+------+------+
| timestamp | col1 | col2 |
+---------------------+------+------+
| 2012-03-17 08:17:22 | 1 | 1 |
| 2012-03-17 08:17:27 | 1 | 2 |
| 2012-03-17 08:17:30 | 1 | 3 |
| 2012-03-17 08:17:32 | 1 | 4 |
| 2012-03-17 08:17:49 | 2 | 2 |
| 2012-03-17 08:17:52 | 2 | 3 |
| 2012-03-17 08:17:54 | 2 | 4 |
| 2012-03-17 08:18:16 | 2 | 1 |
+---------------------+------+------+
8 rows in set (0.00 sec)

Related

Over Partition to find duplicates and remove them based on criteria SQL

I hope everyone is doing well. I have a dilemma that I cannot quite figure out.
I am trying to find a unique value for a field that is not a duplicate.
For example:
Table 1
|Col1 | Col2| Col3 |
| 123 | A | 1 |
| 123 | A | 2 |
| 12 | B | 1 |
| 12 | B | 2 |
| 12 | C | 3 |
| 12 | D | 4 |
| 1 | A | 1 |
| 2 | D | 1 |
| 3 | D | 1 |
Col1 is the field that would have the duplicate values. Col2 is the owner of the value in Col1. Col3 uses ROW_NUMBER() OVER (PARTITION BY ...) to number the rows in ascending order.
The goal I am trying to accomplish is to remove the value in Col1 if it is not truly unique when looking at Col2.
Example:
Col1 has the value 123, Col2 has the value A. Although there are two instances of 123 being owned by A, I can determine that it is indeed unique.
Now look at Col1 that has the value 12 with values in Col2 of B,C,D.
Value 12 is associated with three different owners thus eliminating 12 from our result list.
So in the end I would like to see a result table such as this:
|Col1 | Col2|
| 123 | A |
| 1 | A |
| 2 | D |
| 3 | D |
To summarize, I would like to first use the partition numbers to identify whether the value in Col1 is repeated. From there I want to verify that the values in Col2 are the same. If so, the Col1 and Col2 values remain as one single entry. However, if the values in Col2 do not match, all records for that Col1 value are removed.
I will provide the syntax code for my query if needed.
Update**
I failed to mention that table 1 is the result of inner joining two tables.
So Col1 comes from table a and Col2 comes from table b.
The values in table a for col2 are hard to interpret, so I had to make sense of them and assign them proper name values.
The join query I used to combine the two is:
Select a.Col1, B.Col2 FROM Table a INNER JOIN Table b on a.Colx = b.Colx
Update**
Table a:
|Col1 | Colx| Col3 |
| 123 | SMS | 1 |
| 123 | S9W | 2 |
| 12 | NAV | 1 |
| 12 | NFR | 2 |
| 12 | ABC | 3 |
| 12 | DEF | 4 |
| 1 | SMS | 1 |
| 2 | DEF | 1 |
| 3 | DES | 1 |
Table b:
|Colx | Col2|
| SMS | A |
| S9W | A |
| DEF | D |
| DES | D |
| NAV | B |
| NFR | B |
| ABC | C |
Above are sample data for both tables that get joined in order to create the first table displayed in this body.
Thank you all so much!
NOT EXISTS operator can be used to do this task:
SELECT distinct Col1 , Col2
FROM table t
WHERE NOT EXISTS(
SELECT 1 FROM table t1
WHERE t.col1=t1.col1 AND t.col2 <> t1.col2
)
If I understand correctly, you want:
select col1, min(col2) as col2
from t
group by col1
having min(col2) = max(col2);
I think the third column is confusing you. It doesn't seem to play any role in the logic you want.
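As a quick check of the GROUP BY approach against the question's sample data, here is a small sketch. It assumes SQLite via Python's sqlite3 module rather than SQL Server, a table named t holding the already joined Col1/Col2 pairs, and a HAVING filter that keeps a Col1 value only when its smallest and largest Col2 are the same, i.e. when it has a single owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT)")
# Col1/Col2 pairs from "Table 1" in the question (Col3 is not needed).
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(123, "A"), (123, "A"), (12, "B"), (12, "B"), (12, "C"), (12, "D"),
     (1, "A"), (2, "D"), (3, "D")],
)

single_owner = conn.execute(
    """
    SELECT col1, MIN(col2) AS col2
    FROM t
    GROUP BY col1
    HAVING MIN(col2) = MAX(col2)  -- only one distinct owner for this col1
    ORDER BY col1
    """
).fetchall()
print(single_owner)  # [(1, 'A'), (2, 'D'), (3, 'D'), (123, 'A')]
```

The value 12 drops out because its owners range from B to D, which matches the result table the question asks for.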

Get count of combinations of rows having different values in different columns

I want to get the count of combinations of rows having different values in different columns.
Sample Data as below:
+------+---------+---------+
| GUID | Column1 | Column2 |
+------+---------+---------+
| XXX | A | aaa |
| XXX | B | bbb |
| YYY | C | ccc |
| YYY | D | ddd |
| XXX | A | aaa |
| XXX | B | bbb |
+------+---------+---------+
I am expecting the following result. XXX should be 2, as we have 2 records in which Column1=A, Column2=aaa and Column1=B, Column2=bbb (a combination of two different column values):
XXX 2
YYY 1
You can group by GUID and Column2, then take the max of count(*) to get the number of combinations:
declare @tmp table ([GUID] varchar(3), Column1 varchar(1), Column2 varchar(3))
insert into @tmp values ('XXX','A','aaa'),('XXX','B','bbb'),('YYY','C','ccc'),
('YYY','D','ddd'),('XXX','A','aaa'),('XXX','B','bbb')
select T.[GUID], max(T.cnt) as count_combinations
from (
select [GUID], Column2, count(*) as cnt
from @tmp
group by [GUID], Column2
) T
group by T.[GUID]
Results:
GUID | count_combinations
XXX | 2
YYY | 1

Check if data for update is same as before in SQL Server

I have a table Table1:
ID | RefID | Answer | Points |
----+-------+---------+--------+
1 | 1 | 1 | 5 |
2 | 1 | 2 | 0 |
3 | 1 | 3 | 3 |
4 | 2 | 1 | 4 |
I have a result set in a temp table Temp1 with the same structure, and I have to update Table1 only if, for a given RefID, the answer and points have changed; otherwise the record should be deleted.
I tried:
update table1
set table1.answer = temp1.answer,
table1.points = temp1.points
from table1
join temp1 on table1.refid = temp1.refid
where table1.answer != temp1.answer or table1.points != temp1.points
Here is a fiddle http://sqlfiddle.com/#!18/60424/1/1
However, this is not working, and I don't know how to add the delete condition.
The desired result, if the tables are not the same (e.g. the second row has answer 2, points 3), is:
ID | RefID | Answer | Points |
----+-------+---------+--------+
1 | 1 | 1 | 5 |
2 | 1 | 2 | 3 |
3 | 1 | 3 | 3 |
4 | 2 | 1 | 4 |
If they are the same, all records with that RefID are deleted.
For explanation, this is the case when temp1 has this data:
ID | RefID | Answer | Points |
----+-------+---------+--------+
12 | 1 | 1 | 5 |
13 | 1 | 2 | 0 |
14 | 1 | 3 | 3 |
EDIT: adding another ID column, Qid (question ID), and including it in the join solved the update.
Table structure is now:
ID | RefID | Qid |Answer | Points |
----+-------+------+-------+--------+
1 | 1 | 10 | 1 | 5 |
2 | 1 | 11 | 2 | 0 |
3 | 1 | 12 | 3 | 3 |
4 | 2 | 11 | 1 | 4 |
SQL for update is: (fiddle http://sqlfiddle.com/#!18/00f87/1/1) :
update table1
set table1.answer = temp1.answer,
table1.points = temp1.points
from table1
join temp1 on table1.refid = temp1.refid and table1.qid = temp1.qid
where table1.answer != temp1.answer or table1.points != temp1.points;
SELECT ID, refid, answer, points
FROM table1
How can I handle the deletion case, if the data is the same?
You need to add one more condition in the join to match the rows exactly. Try this one:
update table1
set table1.answer=temp1.answer,
table1.points=temp1.points
from
table1 join temp1 on table1.refid=temp1.refid and table1.ID=temp1.ID
where table1.answer!=temp1.answer or table1.points!=temp1.points
I would first do the delete, and only then the update.
The reason for this is that once you've deleted all the records where the three columns are the same, your update statement becomes simpler - you only need the join, and no where clause:
DELETE t1
FROM table1 AS t1
JOIN temp1 ON t1.refid = temp1.refid
AND t1.qid = temp1.qid
AND t1.answer = temp1.answer
AND t1.points = temp1.points;
UPDATE t1
SET answer = temp1.answer,
points = temp1.points
FROM table1 AS t1
JOIN temp1 ON t1.refid = temp1.refid
AND t1.qid = temp1.qid;
I think, from what I understood, that you need to use id instead of refid, or both, if id is unique.
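The delete-first, update-second order suggested above can be tried with the question's data in the following sketch. It assumes SQLite via Python's sqlite3 module, which has no joined DELETE, so both statements are rewritten with correlated subqueries; the logic is otherwise the same per-row comparison on refid and qid. Note that this deletes unchanged rows one by one, which is not the same as deleting a RefID only when every one of its rows is unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("table1", "temp1"):
    conn.execute(
        f"CREATE TABLE {name} (id INTEGER, refid INTEGER, qid INTEGER, "
        "answer INTEGER, points INTEGER)"
    )
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 10, 1, 5), (2, 1, 11, 2, 0), (3, 1, 12, 3, 3), (4, 2, 11, 1, 4)],
)
# temp1 differs from table1 only in the points for refid 1, qid 11.
conn.executemany(
    "INSERT INTO temp1 VALUES (?, ?, ?, ?, ?)",
    [(12, 1, 10, 1, 5), (13, 1, 11, 2, 3), (14, 1, 12, 3, 3)],
)

# Step 1: delete rows whose answer and points are unchanged.
conn.execute(
    """
    DELETE FROM table1
    WHERE EXISTS (
        SELECT 1 FROM temp1
        WHERE temp1.refid = table1.refid AND temp1.qid = table1.qid
          AND temp1.answer = table1.answer AND temp1.points = table1.points
    )
    """
)
# Step 2: every remaining row with a counterpart in temp1 has changed.
conn.execute(
    """
    UPDATE table1
    SET answer = (SELECT temp1.answer FROM temp1
                  WHERE temp1.refid = table1.refid AND temp1.qid = table1.qid),
        points = (SELECT temp1.points FROM temp1
                  WHERE temp1.refid = table1.refid AND temp1.qid = table1.qid)
    WHERE EXISTS (
        SELECT 1 FROM temp1
        WHERE temp1.refid = table1.refid AND temp1.qid = table1.qid
    )
    """
)
remaining = conn.execute("SELECT * FROM table1 ORDER BY id").fetchall()
print(remaining)  # [(2, 1, 11, 2, 3), (4, 2, 11, 1, 4)]
```

The row for refid 2 is untouched because temp1 has no counterpart for it.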

Find list of values in list of values

I'm trying to write a sql with a where clause, that checks if any element in a list is in another list. Is there a shorter way to accomplish this rather than check each member of the first list?
SELECT * from FOO
WHERE FOO.A IN ('2','3', '5', '7','11','13','17','19') OR
FOO.B IN ('2','3', '5', '7','11','13','17','19') OR
FOO.C IN ('2','3', '5', '7','11','13','17','19') OR
FOO.D IN ('2','3', '5', '7','11','13','17','19') OR
FOO.E IN ('2','3', '5', '7','11','13','17','19') OR
FOO.F IN ('2','3', '5', '7','11','13','17','19')
That is the simplified sql.
Was trying not to muddy waters too much, but since you ask:
Ultimately what I am trying to do here is, select rows from FOO, that has columns fulfilling various criteria. These criteria are stored in a second table (call it BAR), mainly db, name, type must match and flag must be 1. Was planning to build the IN list from BAR, comparing them with column names in INFORMATION_SCHEMA.COLUMNS containing FOO
FOO:
+--------+--------+---------+---------+--------+-------+
| DB | Name | Type | Col1 | Col2 | Col3 |
+--------+--------+---------+---------+--------+-------+
| 4 | AC1 | LO | 1 | 10 | 2 |
| 4 | AC1 | HI | 2 | 20 | 4 |
| 1 | DC2 | HI-HI | 11 | 5 | 2 |
| 1 | DC2 | HI | 22 | 10 | 4 |
| 1 | DC2 | LO | 33 | 15 | 6 |
+--------+--------+---------+---------+--------+-------+
BAR:
+--------+--------+---------+---------+--------+
| DB | Name | Type | Field | Flag |
+--------+--------+---------+---------+--------+
| 4 | AC1 | LO | Col1 | 1 |
| 4 | AC1 | HI | Col1 | 1 |
| 1 | DC2 | HI-HI | Col1 | 1 |
| 1 | DC2 | HI | Col1 | 1 |
| 1 | DC2 | LO | Col1 | 1 |
| 4 | AC1 | LO | Col2 | 0 |
| 4 | AC1 | HI | Col2 | 0 |
| 1 | DC2 | LO | Col2 | 0 |
| 1 | DC2 | HI-HI | Col2 | 0 |
| 1 | DC2 | HI | Col2 | 0 |
| 4 | AC1 | LO | Col3 | 0 |
| 4 | AC1 | HI | Col3 | 0 |
| 1 | DC2 | LO | Col3 | 0 |
| 1 | DC2 | HI-HI | Col3 | 0 |
| 1 | DC2 | HI | Col3 | 0 |
+--------+--------+---------+---------+--------+
On first examination, it would seem your schema is not appropriate for the type of query you're performing. It seems like you would want a FOOVAL table with a type and a value; then your query simply becomes:
CREATE TABLE FOOVAL
(
ID int, -- References FOO.ID
TYPE char, -- A, B, C, D, E, F
VAL int
);
SELECT * FROM FOO WHERE FOO.ID IN
(SELECT DISTINCT FOOVAL.ID FROM FOOVAL WHERE FOOVAL.VAL IN ('2','3', '5', '7','11','13','17','19'))
Your method probably performs the best. Here is an alternative that only requires creating the list once. It uses a CTE to create a list of the values and then an exists clause to check whether any values match:
with vals as (
select '2' as p union all
select '3' union all
select '5' union all
select '7' union all
select '11' union all
select '13' union all
select '17' union all
select '19'
)
select *
from foo
where exists (select 1 from vals where vals.p in (foo.A, foo.B, foo.C, foo.D, foo.E, foo.F))
If you are using a database that doesn't support CTEs, you can just put the code in the where clause:
select *
from foo
where exists (select 1
from (select '2' as p union all
select '3' union all
select '5' union all
select '7' union all
select '11' union all
select '13' union all
select '17' union all
select '19'
) t
where t.p in (foo.A, foo.B, foo.C, foo.D, foo.E, foo.F)
)
If you are using Oracle, then you need to add from dual in the statements after the string constants. Otherwise, I think one or the other should work in any SQL database.
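For a runnable illustration of checking one value list against several columns, here is a sketch that assumes SQLite via Python's sqlite3 module, a three-column version of foo, and a temporary table named vals in place of the CTE. Holding the list in a table means it is written once and can be filled from query parameters.

```python
import sqlite3

PRIMES = ["2", "3", "5", "7", "11", "13", "17", "19"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, A TEXT, B TEXT, C TEXT)")
conn.executemany(
    "INSERT INTO foo (A, B, C) VALUES (?, ?, ?)",
    [("1", "4", "6"), ("8", "13", "9"), ("2", "2", "2"), ("10", "12", "14")],
)

# The value list lives in one place instead of being repeated per column.
conn.execute("CREATE TEMP TABLE vals (p TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO vals VALUES (?)", [(p,) for p in PRIMES])

matching_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id
        FROM foo
        WHERE EXISTS (
            SELECT 1 FROM vals
            WHERE vals.p IN (foo.A, foo.B, foo.C)
        )
        ORDER BY id
        """
    )
]
print(matching_ids)  # [2, 3]
```

Row 2 matches through column B and row 3 through all three columns; the other rows contain none of the listed values.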
While it is not exactly clear what you want to do with the data, since you are using SQL Server my suggestion would be to use the UNPIVOT function to turn the col1, col2 and col3 columns into rows which will make it easier to filter the data:
select db, name, type, col, value
from foo
unpivot
(
value
for col in (Col1, Col2, Col3)
) unpiv;
See SQL Fiddle with Demo. This gives the data in the following format:
| DB | NAME | TYPE | COL | VALUE |
------------------------------------
| 4 | AC1 | LO | Col1 | 1 |
| 4 | AC1 | LO | Col2 | 10 |
| 4 | AC1 | LO | Col3 | 2 |
| 4 | AC1 | HI | Col1 | 2 |
Once the data is in row format, it should be significantly easier to apply any filters or even join to your BAR table.

TSQL select the value from two rows that has higher priority and is not null

I am trying to consolidate two rows of the same table, where each row has a priority.
The value of interest is the value having priority 1 if it is not NULL; otherwise the value with priority 0.
An example data source could be:
| Id | GroupId | Priority | Col1 | Col2 | Col3 | ... | Coln |
-----------------------------------------------------------------
| 1 | 1 | 0 | NULL | 4711 | 3.41 | ... | f00 |
| 2 | 1 | 1 | NULL | NULL | 2.83 | ... | bar |
| 3 | 2 | 0 | NULL | 4711 | 3.41 | ... | f00 |
| 4 | 2 | 1 | 23 | NULL | 2.83 | ... | NULL |
and I want to have:
| GroupId | Col1 | Col2 | Col3 | ... | Coln |
-------------------------------------------------
| 1 | NULL | 4711 | 2.83 | ... | bar |
| 2 | 23 | 4711 | 2.83 | ... | f00 |
Is there a generic way in TSQL without the need to check each column explicitly?
SELECT
t1.GroupId,
ISNULL(t2.Col1, t1.Col1) as Col1,
ISNULL(t2.Col2, t1.Col2) as Col2,
ISNULL(t2.Col3, t1.Col3) as Col3,
...
ISNULL(t2.Coln, t1.Coln) as Coln
FROM mytable t1
JOIN mytable t2 ON t1.GroupId = t2.GroupId
WHERE
t1.Priority = 0 AND
t2.Priority = 1
Regards
I'll elaborate on the ROW_NUMBER() solution that @KM suggested since IMO it's the best solution for this. (In CTE form for easier readability)
WITH cte AS (
SELECT
t1.GroupId,
t1.Col1,
t1.Col2,
ROW_NUMBER() OVER(PARTITION BY t1.GroupId ORDER BY t1.Priority DESC) AS [row_id]
FROM
mytable t1
)
SELECT
*
FROM
cte
WHERE
row_id = 1
That will give you the row with the highest priority (according to your rules) for each GroupId in mytable.
ROW_NUMBER and RANK are two of my favorite TSQL tricks. http://msdn.microsoft.com/en-us/library/ms186734.aspx
edit: Another favorite of mine is PIVOT/UNPIVOT which you can use to transpose rows/columns which is another way of going about this type of problem. http://msdn.microsoft.com/en-us/library/ms177410.aspx
I think this would do what you are asking for without using isnull for every column
select
*
from
mytable t1
where
priority=(select max(priority) from mytable where groupid=t1.groupid group by groupid)
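One way to avoid typing a fallback expression for every column is to generate the query from the table's own column list. The sketch below is an illustration under stated assumptions, not a T-SQL solution: it uses SQLite via Python's sqlite3 module, COALESCE in place of ISNULL, a hypothetical helper named build_merge_query, and the column names Id, GroupId and Priority as the fixed, non-merged columns. It reads the column names with PRAGMA table_info and emits one COALESCE per remaining column, preferring the priority 1 row.

```python
import sqlite3

KEY_COLUMNS = {"Id", "GroupId", "Priority"}  # assumed names, not merged


def build_merge_query(conn, table="mytable"):
    """Build a COALESCE-per-column query from the table's own column list."""
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    value_columns = [c for c in columns if c not in KEY_COLUMNS]
    merged = ",\n       ".join(
        f'COALESCE(hi."{c}", lo."{c}") AS "{c}"' for c in value_columns
    )
    return (
        f"SELECT lo.GroupId,\n       {merged}\n"
        f"FROM {table} AS lo\n"
        f"JOIN {table} AS hi ON hi.GroupId = lo.GroupId\n"
        "WHERE lo.Priority = 0 AND hi.Priority = 1\n"
        "ORDER BY lo.GroupId"
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (Id INTEGER PRIMARY KEY, GroupId INTEGER, "
    "Priority INTEGER, Col1 INTEGER, Col2 INTEGER, Col3 REAL)"
)
# First three value columns of the example data in the question.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, 0, None, 4711, 3.41), (2, 1, 1, None, None, 2.83),
     (3, 2, 0, None, 4711, 3.41), (4, 2, 1, 23, None, 2.83)],
)

merged_rows = conn.execute(build_merge_query(conn)).fetchall()
print(merged_rows)  # [(1, None, 4711, 2.83), (2, 23, 4711, 2.83)]
```

Unlike picking the single row with the highest priority, this falls back column by column, which is what the expected output in the question requires. In SQL Server the same idea would mean building the statement as dynamic SQL from the catalog views.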