How to find missing data from duplicate table without using 'EXCEPT' - sql

I used to use 'EXCEPT' to find missing data from 2 tables that should have the same data but was told not to use it anymore. I found a solution but I'm not entirely sure how it works. Could someone explain it to me or help me with another solution?
This is a basic example of my query:
SELECT MIN(C.TABLE_NAME) as TABLE_NAME,columnid,column
FROM
(
SELECT DISTINCT 'Source' as TABLE_NAME,columnid,column
FROM table1
UNION ALL
SELECT DISTINCT 'Output' as TABLE_NAME,columnid,column
FROM table2
) AS C
GROUP BY columnid,column
HAVING COUNT(*) = 1;
The output result shouldn't display any rows if the data is matching. The above code works as intended as I tested it on a table where I know the data is matching and not matching. I'm just not sure how it works. Sorry for the simple question. I'm new to this.
Edit:
I quickly made some sample data if it helps.
WITH salesman AS
(
SELECT 5005 AS id, 'Pit Alex' AS [name]
UNION ALL
SELECT 5006 AS id, 'Mc Lyon' AS [name]
UNION ALL
SELECT 5011 AS id, 'Lauson Hen' AS [name]
UNION ALL
SELECT 5007 AS id, 'Paul Adam' AS [name]
) ,
salesmancopy AS
(
SELECT 5005 AS id, 'Pit Alex' AS [name]
UNION ALL
SELECT 5006 AS id, 'Mc Lyon' AS [name]
UNION ALL
SELECT 5010 AS id, 'Lauson Hen' AS [name]
)
SELECT MIN(C.TABLE_NAME) as TABLE_NAME,id,[name]
FROM
(
SELECT DISTINCT 'original' as TABLE_NAME,id,[name]
FROM salesman
UNION ALL
SELECT DISTINCT 'copy' as TABLE_NAME,id,[name]
FROM salesmancopy
) AS C
GROUP BY id,[name]
HAVING COUNT(*) = 1;

Your query returns rows that appear in only one of the two tables, whichever table that is. If you want only the rows from table1 that are not in table2, it gives the right answer only when table2 contains no rows of its own, in other words when every row in table2 also exists in table1. Another solution is to use NOT EXISTS:
select *
from table1 t1
where not exists (
select 1
from table2 t2
where t1.columnid = t2.columnid and
t1.column = t2.column
)
Here you can see a comparison of different approaches to this problem, where the NOT EXISTS solution is preferred over the LEFT JOIN + IS NULL solution.
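To see why the GROUP BY / HAVING COUNT(*) = 1 query from the question works, and how it differs from NOT EXISTS, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server (only so the example is self-contained) and loads the salesman sample data from the question's edit into two ordinary tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salesman     (id INTEGER, name TEXT);
    CREATE TABLE salesmancopy (id INTEGER, name TEXT);
    INSERT INTO salesman VALUES
        (5005, 'Pit Alex'), (5006, 'Mc Lyon'),
        (5011, 'Lauson Hen'), (5007, 'Paul Adam');
    INSERT INTO salesmancopy VALUES
        (5005, 'Pit Alex'), (5006, 'Mc Lyon'), (5010, 'Lauson Hen');
""")

# After DISTINCT + UNION ALL, a row present in both tables appears twice
# (COUNT(*) = 2) and a row present in only one table appears once.
# HAVING COUNT(*) = 1 keeps the latter; MIN(table_name) then simply
# reports the single source label of that row.
both_sides = conn.execute("""
    SELECT MIN(c.table_name) AS table_name, c.id, c.name
    FROM (
        SELECT DISTINCT 'original' AS table_name, id, name FROM salesman
        UNION ALL
        SELECT DISTINCT 'copy' AS table_name, id, name FROM salesmancopy
    ) AS c
    GROUP BY c.id, c.name
    HAVING COUNT(*) = 1
    ORDER BY c.id
""").fetchall()

# NOT EXISTS looks in one direction only: rows of salesman missing from the copy.
missing_from_copy = conn.execute("""
    SELECT s.id, s.name
    FROM salesman AS s
    WHERE NOT EXISTS (
        SELECT 1 FROM salesmancopy AS c
        WHERE c.id = s.id AND c.name = s.name
    )
    ORDER BY s.id
""").fetchall()

print(both_sides)
print(missing_from_copy)
```

The first query reports differences from both tables (including the 5010 row that exists only in the copy), while the NOT EXISTS query reports only what the original has and the copy lacks.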

EXCEPT is the fastest way to determine whether a row exists on only one side.
But if you want to check both tables in a single pass, you could use a FULL OUTER JOIN:
IF OBJECT_ID('tempdb..#t1') IS NOT NULL
DROP TABLE #t1;
IF OBJECT_ID('tempdb..#t2') IS NOT NULL
DROP TABLE #t2;
SELECT *
INTO #t1
FROM (SELECT 1 AS num UNION SELECT 2 AS num UNION SELECT 3 AS num) d;
SELECT *
INTO #t2
FROM (SELECT 1 AS num UNION SELECT 2 AS num UNION SELECT 5 AS num) d;
SELECT *
FROM #t1
FULL OUTER JOIN #t2
ON #t2.num = #t1.num
WHERE #t1.num IS NULL
OR #t2.num IS NULL;
Output: two rows, (3, NULL) for the value found only in #t1 and (NULL, 5) for the value found only in #t2.

To get around the issue mentioned by Radim Bača, i.e. if you need to find the differences between two tables when rows exist in table1 but not in table2, you can use the following approach:
1. Create two columns to indicate whether the record is from the original or the copy.
2. Group by the columns you wish to compare.
3. Use the clause HAVING COUNT(orig) <> COUNT(copy).
Use the same query as before with small changes:
WITH salesman AS
(
SELECT 5005 AS id, 'Pit Alex' AS [name]
UNION ALL
SELECT 5006 AS id, 'Mc Lyon' AS [name]
UNION ALL
SELECT 5011 AS id, 'Lauson Hen' AS [name]
UNION ALL
SELECT 5007 AS id, 'Paul Adam' AS [name]
) ,
salesmancopy AS
(
SELECT 5005 AS id, 'Pit Alex' AS [name]
UNION ALL
SELECT 5006 AS id, 'Mc Lyon' AS [name]
UNION ALL
SELECT 5010 AS id, 'Lauson Hen' AS [name]
)
SELECT c.id
,c.name
,count(orig) as present_in_orig
,count(copy) as present_in_copy
FROM
(
SELECT 'original' as orig
,null as copy
,id
,[name]
FROM salesman
UNION ALL
SELECT null as orig
,'copy' as copy
,id
,[name]
FROM salesmancopy
) AS C
GROUP BY id
,[name]
HAVING COUNT(copy)<> count(orig)
order by 1,2
See the following link from Stew who details this method very nicely.
https://stewashton.wordpress.com/2014/02/04/compare-and-sync-tables-tom-kyte-and-group-by/

Related

Exists - Not exists - Exclude records that have status 0, ignoring other statuses associated with that record

Below is my data.
with cte as(
select 'A' name, 0 status
union all select 'A' name, 1 status
union all select 'B' name, 1 status
union all select 'C' name, 2 status
union all select 'D' name, 1 status
)
I want to get only B, C, D as output from the query. Let's say 0 means status-complete and I want to ignore every record associated with it.
This I am able to achieve using the not in clause as below.
select * from cte c
where c.name not in (select cf.name from cte cf where cf.status=0)
But I want to achieve this using exists or not exists clause in where condition.
Could you please share the logic ?
thanks,
Can you please try with this:
SELECT * FROM cte c
WHERE NOT EXISTS (SELECT cf.name
FROM cte cf WHERE c.name = cf.name AND cf.status = 0)
With NOT EXISTS, the outer WHERE clause does not name a column to compare. The comparison (c.name = cf.name) moves into the WHERE clause of the subquery, which makes it a correlated subquery evaluated for each outer row.
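As a quick check that the NOT EXISTS form returns the same rows as the NOT IN form from the question, here is a small sketch. It assumes Python's built-in sqlite3 module in place of SQL Server, and a hypothetical table named items that holds the same five rows as the question's cte.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (name TEXT, status INTEGER);
    INSERT INTO items VALUES ('A', 0), ('A', 1), ('B', 1), ('C', 2), ('D', 1);
""")

# Original approach: exclude every name that has a status-0 row.
not_in_rows = conn.execute("""
    SELECT c.name, c.status FROM items AS c
    WHERE c.name NOT IN (SELECT cf.name FROM items AS cf WHERE cf.status = 0)
    ORDER BY c.name
""").fetchall()

# Correlated NOT EXISTS: for each outer row, look for a status-0 row
# with the same name; keep the outer row only if none is found.
not_exists_rows = conn.execute("""
    SELECT c.name, c.status FROM items AS c
    WHERE NOT EXISTS (
        SELECT 1 FROM items AS cf
        WHERE cf.name = c.name AND cf.status = 0
    )
    ORDER BY c.name
""").fetchall()

print(not_in_rows)
print(not_exists_rows)
```

Both rows for A are dropped, including the one with status 1, because the subquery matches on name rather than on the individual row.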
Please try this
with cte as(
select 'A' name, 0 status
union all select 'A' name, 1 status
union all select 'B' name, 1 status
union all select 'C' name, 2 status
union all select 'D' name, 1 status
)
Select * from cte c
where NOT EXISTS (select 1 from cte cf where cf.status=0 AND c.name = cf.name)
With NOT EXISTS
with cte as(
select 'A' name, 0 status
union all select 'A' name, 1 status
union all select 'B' name, 1 status
union all select 'C' name, 2 status
union all select 'D' name, 1 status
)
select * from cte out where NOT EXISTS
(select inn.name from cte inn WHERE out.name = inn.name and inn.status=0)
DECLARE @tbl1 AS TABLE
(
Name VARCHAR(50),
Status INT
)
INSERT INTO @tbl1 VALUES('A',0)
INSERT INTO @tbl1 VALUES('A',1)
INSERT INTO @tbl1 VALUES('B',1)
INSERT INTO @tbl1 VALUES('C',1)
INSERT INTO @tbl1 VALUES('D',1)
INSERT INTO @tbl1 VALUES('E',0)
With NOT EXISTS:
SELECT
*
FROM @tbl1 T1
WHERE NOT EXISTS( SELECT T2.Name FROM @tbl1 T2 WHERE T2.Status=0 AND T1.Name=T2.Name)
With EXISTS:
-- keep a name only if none of its rows has Status = 0
SELECT
*
FROM @tbl1 T1
WHERE EXISTS( SELECT T2.Name FROM @tbl1 T2 WHERE T1.Name=T2.Name GROUP BY T2.Name HAVING SUM(CASE WHEN T2.Status=0 THEN 1 ELSE 0 END)=0 )
Output (both queries): rows B, C and D, each with Status 1.

SQL Having count logic

I need help with HAVING COUNT. I have the result set below:
CREATE TABLE #tmpTest1 (Code VARCHAR(50), Name VARCHAR(100))
INSERT INTO [#tmpTest1]
(
[Code],
[Name]
)
SELECT '160215-039','ROBIN'
UNION ALL SELECT '160215-039','ROBIN'
UNION ALL SELECT '160215-046','SENGAROB'
UNION ALL SELECT '160215-046','BABYPANGET'
UNION ALL SELECT '160215-045','JONG'
UNION ALL SELECT '160215-045','JAPZ'
UNION ALL SELECT '160215-044','AGNES'
UNION ALL SELECT '160215-044','AGNES'
UNION ALL SELECT '160215-041','BABYTOT'
UNION ALL SELECT '160215-041','BABYTOT'
UNION ALL SELECT '160215-041','BABYTOT'
I want to show only the rows that have the same code but different names. In this case my expected result is below, since those rows share a code but have different names:
160215-045 JAPZ
160215-045 JONG
160215-046 BABYPANGET
160215-046 SENGAROB
But when I group by the two columns and then use HAVING COUNT, as in the query below:
SELECT [Code], [Name] FROM [#tmpTest1]
GROUP BY [Code], [Name] HAVING COUNT([Code]) > 1
it gives me the wrong result below: rows that have the same code and the same name, which is the opposite of what I want.
160215-044 AGNES
160215-041 BABYTOT
160215-039 ROBIN
How can I get my expected output?
Thanks in advance, any help would be much appreciated.
I believe this query will give you the result you want, although your original question is a bit unclear.
SELECT t1.[Code], t1.[Name]
FROM [#tmpTest1] t1
INNER JOIN
(
SELECT [Code]
FROM [#tmpTest1]
GROUP BY [Code]
HAVING COUNT(DISTINCT [Name]) > 1
) t2
ON t1.[Code] = t2.[Code]
Follow the link below for a running demo:
SQLFiddle
If you want rows with the same code and name, then use window functions:
select t.*
from (select t.*, count(*) over (partition by code, name) as cnt
from #tmpTest1 t
) t
where cnt >= 2;
From your comment
if there is 1 different name for the codes , i want to show those
records for me to know that there is one differs to others..
This sounds like an exists query because you want to check if another row with the same code but different name exists.
select * from [#tmpTest1] t1
where exists (
select 1 from [#tmpTest1] t2
where t2.code = t1.code
and t2.name <> t1.name
)
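To compare the EXISTS approach with the COUNT(DISTINCT ...) join from the earlier answer, here is a minimal sketch run against the question's sample rows. It assumes Python's built-in sqlite3 module instead of SQL Server, and a hypothetical permanent table named tmp_test1 in place of the #tmpTest1 temp table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp_test1 (code TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO tmp_test1 VALUES (?, ?)",
    [
        ("160215-039", "ROBIN"), ("160215-039", "ROBIN"),
        ("160215-046", "SENGAROB"), ("160215-046", "BABYPANGET"),
        ("160215-045", "JONG"), ("160215-045", "JAPZ"),
        ("160215-044", "AGNES"), ("160215-044", "AGNES"),
        ("160215-041", "BABYTOT"), ("160215-041", "BABYTOT"),
        ("160215-041", "BABYTOT"),
    ],
)

# EXISTS: keep a row if some other row has the same code but a different name.
exists_rows = conn.execute("""
    SELECT t1.code, t1.name
    FROM tmp_test1 AS t1
    WHERE EXISTS (
        SELECT 1 FROM tmp_test1 AS t2
        WHERE t2.code = t1.code AND t2.name <> t1.name
    )
    ORDER BY t1.code, t1.name
""").fetchall()

# Join: first find codes that carry more than one distinct name,
# then return every row belonging to those codes.
join_rows = conn.execute("""
    SELECT t1.code, t1.name
    FROM tmp_test1 AS t1
    INNER JOIN (
        SELECT code FROM tmp_test1
        GROUP BY code
        HAVING COUNT(DISTINCT name) > 1
    ) AS t2 ON t1.code = t2.code
    ORDER BY t1.code, t1.name
""").fetchall()

print(exists_rows)
```

On this data both queries return the four rows listed as the expected result in the question.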

How to avoid Sorting in Union ALL

My question is simple: how do you avoid the automatic sorting that the UNION ALL query does?
This is my query
SELECT * INTO #TEMP1 FROM Final
SELECT * INTO #TEMP2 FROM #TEMP1 WHERE MomentId = @MomentId
SELECT * INTO #TEMP3 FROM #TEMP1 WHERE RowNum BETWEEN @StartRow AND @EndRow
SELECT * INTO #TEMP4 FROM (SELECT * FROM #TEMP3 UNION ALL SELECT * FROM #TEMP2) as tmp
SELECT DISTINCT * FROM #TEMP4
I'm using SQL Server 2008. I need the Union ALL to perform like a simple Concatenate, which it isn't! Appreciate your help in this.
I think you're mistaken about which operation is actually causing the sort. Check the code below: UNION ALL will not cause a sort. You may be looking at the DISTINCT operation, which uses a sort (it sorts all items and then eliminates duplicates).
CREATE TABLE #Temp1
(
i int
)
CREATE TABLE #temp2
(
i int
)
INSERT INTO #Temp1
SELECT 3 UNION ALL
SELECT 1 UNION ALL
SELECT 8 UNION ALL
SELECT 2
INSERT INTO #Temp2
SELECT 7 UNION ALL
SELECT 1 UNION ALL
SELECT 5 UNION ALL
SELECT 6
SELECT * INTO #TEMP3
FROM (SELECT * FROM #Temp1 UNION ALL SELECT * FROM #temp2) X
UNION ALL adds all the records, whereas UNION adds only new/distinct records.
Since you are using UNION ALL and using DISTINCT soon after, I think you are looking for UNION
SELECT * INTO #TEMP4 FROM
(
SELECT * FROM #TEMP3
UNION --JUST UNION
SELECT * FROM #TEMP2
) AnotherTemp
Or you can simplify it as
SELECT DISTINCT *
INTO #TEMP4
FROM Final
WHERE MomentId = @MomentId OR RowNum BETWEEN @StartRow AND @EndRow
I'm not familiar with SQL Server, but you might get my idea:
-- sort_col is a placeholder for whatever column defines each table's order
select *, 'A' as tid, row_number() over (order by sort_col) as tno from tableA
union all
select *, 'B', row_number() over (order by sort_col) from tableB
order by tid, tno;
This should get you all records of tableA in their specific order, followed by all records of tableB in their specific order.
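The idea of tagging each source and sorting explicitly can be sketched as follows. This is only an illustration, assuming Python's built-in sqlite3 module rather than SQL Server; the tables table_a and table_b and their seq column (an explicit per-table ordering key) are hypothetical. The underlying point is that a result set has no guaranteed order unless the outermost query has an ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (seq INTEGER, val INTEGER);
    CREATE TABLE table_b (seq INTEGER, val INTEGER);
    INSERT INTO table_a VALUES (1, 3), (2, 1), (3, 8), (4, 2);
    INSERT INTO table_b VALUES (1, 7), (2, 1), (3, 5), (4, 6);
""")

# Tag each branch of the UNION ALL with a source id, then sort by
# (source id, per-table sequence) so the result reads as
# "all of A in its order, then all of B in its order".
rows = conn.execute("""
    SELECT u.val
    FROM (
        SELECT 'A' AS tid, seq, val FROM table_a
        UNION ALL
        SELECT 'B' AS tid, seq, val FROM table_b
    ) AS u
    ORDER BY u.tid, u.seq
""").fetchall()

values = [row[0] for row in rows]
print(values)
```

The duplicate value 1 is kept twice because UNION ALL does not deduplicate; the order comes entirely from the final ORDER BY.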

SELECT COUNT(DISTINCT [name]) from several tables

I can perform the following SQL Server selection of distinct (or non-repeating names) from a column in one table like so:
SELECT COUNT(DISTINCT [Name]) FROM [MyTable]
But what if I have more than one table (all of them contain a name column called [Name]) and I need to know the count of non-repeating names across two or more tables?
If I run something like this:
SELECT COUNT(DISTINCT [Name]) FROM [MyTable1], [MyTable2], [MyTable3]
I get an error, "Ambiguous column name 'Name'".
PS. All three tables [MyTable1], [MyTable2], [MyTable3] are a product of a previous selection.
After the clarification, use:
SELECT x.name, COUNT(x.[name])
FROM (SELECT [name]
FROM [MyTable]
UNION ALL
SELECT [name]
FROM [MyTable2]
UNION ALL
SELECT [name]
FROM [MyTable3]) x
GROUP BY x.name
If I understand correctly, use:
SELECT COUNT(DISTINCT x.[name])
FROM (SELECT [name]
FROM [MyTable]
UNION ALL
SELECT [name]
FROM [MyTable2]
UNION ALL
SELECT [name]
FROM [MyTable3]) x
UNION will remove duplicates; UNION ALL will not, and is faster for it.
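The difference between the two operators, and how each can answer the original "count of distinct names across several tables" question, can be sketched like this. The example assumes Python's built-in sqlite3 module instead of SQL Server, and the table names and sample names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table1 (name TEXT);
    CREATE TABLE my_table2 (name TEXT);
    CREATE TABLE my_table3 (name TEXT);
    INSERT INTO my_table1 VALUES ('Ann'), ('Bob');
    INSERT INTO my_table2 VALUES ('Bob'), ('Cid');
    INSERT INTO my_table3 VALUES ('Ann'), ('Dee'), ('Dee');
""")

# UNION ALL keeps every row from every table.
total_rows = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT name FROM my_table1
        UNION ALL SELECT name FROM my_table2
        UNION ALL SELECT name FROM my_table3
    ) AS x
""").fetchone()[0]

# UNION removes duplicates, so counting its rows counts distinct names.
distinct_via_union = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT name FROM my_table1
        UNION SELECT name FROM my_table2
        UNION SELECT name FROM my_table3
    ) AS x
""").fetchone()[0]

# Equivalent: keep all rows with UNION ALL and deduplicate in the aggregate.
distinct_via_count = conn.execute("""
    SELECT COUNT(DISTINCT x.name) FROM (
        SELECT name FROM my_table1
        UNION ALL SELECT name FROM my_table2
        UNION ALL SELECT name FROM my_table3
    ) AS x
""").fetchone()[0]

print(total_rows, distinct_via_union, distinct_via_count)
```

Either way the deduplication has to happen somewhere: in the UNION itself or in COUNT(DISTINCT ...).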
EDIT: Had to change after seeing recent comment.
Does this give you what you want? This gives a count for each person after combining the rows from all tables.
SELECT [NAME], COUNT(*) as TheCount
FROM
(
SELECT [Name] FROM [MyTable1]
UNION ALL
SELECT [Name] FROM [MyTable2]
UNION ALL
SELECT [Name] FROM [MyTable3]
) AS [TheNames]
GROUP BY [NAME]
Here's another way:
SELECT x.name, SUM(x.cnt)
FROM ( SELECT [name], COUNT(*) AS cnt
FROM [MyTable]
GROUP BY [name]
UNION ALL
SELECT [name], COUNT(*) AS cnt
FROM [MyTable2]
GROUP BY [name]
UNION ALL
SELECT [name], COUNT(*) AS cnt
FROM [MyTable3]
GROUP BY [name]
) AS x
GROUP BY x.name
If your tables have different numbers of columns, for example:
table1 has 3 columns,
table2 has 2 columns,
table3 has 1 column
and you want the count of distinct values for each of the differently named columns, what worked for me in Athena SQL was CROSS JOIN. Each subquery returns a single row, so the cross join yields exactly one combined row:
SELECT * FROM (
SELECT COUNT(DISTINCT name1) as amt_name1,
COUNT(DISTINCT name2) as amt_name2,
COUNT(DISTINCT name3) as amt_name3
FROM table1 ) t1
CROSS JOIN
(SELECT COUNT(DISTINCT name4) as amt_name4,
COUNT(DISTINCT name5) as amt_name5,
MAX(t3.amt_name6) as amt_name6
FROM table2
CROSS JOIN
(SELECT COUNT(DISTINCT name6) as amt_name6
FROM table3) t3) t2
Would return a table with one row and their counts:
amt_name1 | amt_name2 | amt_name3 | amt_name4 | amt_name5 | amt_name6
4123 | 675 | 564 | 2346 | 18667 | 74567

SQL Select Condition Question

I have a quick question about a select statement condition.
I have the following table with the following items. What I need to get is the object id that matches both type id's.
TypeId ObjectId
1 10
2 10
1 11
So I need to get object 10, because it matches both type id 1 and type id 2.
SELECT ObjectId
FROM Table
WHERE TypeId = 1
AND TypeId = 2
Obviously this doesn't work because it won't match both conditions for the same row. How do I perform this query?
Also note that I may pass in 2 or more type id's to narrow down the results.
Self-join:
SELECT t1.ObjectId
FROM Table AS t1
INNER JOIN Table AS t2
ON t1.ObjectId = t2.ObjectId
AND t1.TypeId = 1
AND t2.TypeId = 2
Not sure how you want the behavior to work when passing in values, but that's a start.
I upvoted the answer from @Cade Roux, and that's how I would do it.
But FWIW, here's an alternative solution:
SELECT ObjectId
FROM Table
WHERE TypeId IN (1, 2)
GROUP BY ObjectId
HAVING COUNT(*) = 2;
Assuming uniqueness over TypeId, ObjectId.
Re the comment from @Josh that he may need to search for three or more TypeId values:
The solution using JOIN requires a join per value you're searching for. The solution above using GROUP BY may be easier if you find yourself searching for an increasing number of values.
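To illustrate how the GROUP BY version extends to any number of TypeId values, here is a hedged sketch. It assumes Python's built-in sqlite3 module in place of the asker's database; the table name object_types and the helper objects_matching_all are hypothetical. It uses COUNT(DISTINCT TypeId) rather than COUNT(*) so that duplicate (TypeId, ObjectId) rows do not inflate the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE object_types (TypeId INTEGER, ObjectId INTEGER);
    -- (1, 10) is inserted twice on purpose to show duplicates are harmless.
    INSERT INTO object_types VALUES (1, 10), (2, 10), (1, 11), (1, 10);
""")


def objects_matching_all(conn, type_ids):
    """Return ObjectIds that have a row for every TypeId in type_ids."""
    wanted = sorted(set(type_ids))
    # One "?" placeholder per requested type id, bound as parameters.
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT ObjectId
        FROM object_types
        WHERE TypeId IN ({placeholders})
        GROUP BY ObjectId
        HAVING COUNT(DISTINCT TypeId) = ?
        ORDER BY ObjectId
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


match_1_and_2 = objects_matching_all(conn, [1, 2])
match_1_only = objects_matching_all(conn, [1])
match_1_2_3 = objects_matching_all(conn, [1, 2, 3])
print(match_1_and_2, match_1_only, match_1_2_3)
```

Adding another type id only changes the parameter list; the shape of the query stays the same, which is the scaling advantage described above.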
This code is written with Oracle in mind. It should be general enough for other flavors of SQL
select t1.ObjectId from Table t1
join Table t2 on t2.TypeId = 2 and t1.ObjectId = t2.ObjectId
where t1.TypeId = 1;
To add additional TypeIds, you just have to add another join:
select t1.ObjectId from Table t1
join Table t2 on t2.TypeId = 2 and t1.ObjectId = t2.ObjectId
join Table t3 on t3.TypeId = 3 and t1.ObjectId = t3.ObjectId
join Table t4 on t4.TypeId = 4 and t1.ObjectId = t4.ObjectId
where t1.TypeId = 1;
Important note: as you add more joins, performance will suffer a LOT.
In regards to Bill's answer you can change it to the following to get rid of the need to assume uniqueness:
SELECT ObjectId
FROM (SELECT DISTINCT ObjectId, TypeId FROM Table) AS t
WHERE TypeId IN (1, 2)
GROUP BY ObjectId
HAVING COUNT(*) = 2;
His way of doing it scales better as the number of types gets larger.
Try this
Sample Input:(Case 1)
declare @t table(Typeid int,ObjectId int)
insert into @t
select 1,10 union all select 2,10 union all
select 1,11
select * from @t
Sample Input:(Case 2)
declare @t table(Typeid int,ObjectId int)
insert into @t
select 1,10 union all select 2,10 union all
select 3,10 union all select 4,10 union all
select 5,10 union all select 6,10 union all
select 1,11 union all select 2,11 union all
select 3,11 union all select 4,11 union all
select 5,11 union all select 1,12 union all
select 2,12 union all select 3,12 union all
select 4,12 union all select 5,12 union all
select 6,12
select * from @t
Sample Input:(Case 3)[Duplicate entries are there]
declare @t table(Typeid int,ObjectId int)
insert into @t
select 1,10 union all select 2,10 union all
select 1,10 union all select 2,10 union all
select 3,10 union all select 4,10 union all
select 5,10 union all select 6,10 union all
select 1,11 union all select 2,11 union all
select 3,11 union all select 4,11 union all
select 5,11 union all select 1,12 union all
select 2,12 union all select 3,12 union all
select 4,12 union all select 5,12 union all
select 6,12 union all select 3,12
For case 1, the output should be 10
For case 2 & 3, the output should be 10 and 12
Query:
select X.ObjectId from
(
select
T.ObjectId
,count(T.Typeid) cnt
from (select distinct ObjectId, Typeid from @t) T
group by T.ObjectId ) X
-- compare against the number of distinct type ids, not the largest id,
-- so the query still works when the ids are not numbered 1..N
join (select count(distinct Typeid) typecnt from @t) Y
on X.cnt = Y.typecnt