Select rows that are different in SQL - sql

I have a table with way too many columns and a couple million rows that I need to query for differences.
On these rows there will hopefully be only one column that is different and that should be the Auto incremented id field.
What I need to do is check to see if these rows ARE actually the same and if there are any that have any differences in any of the fields.
So for example, if the "Name" column is supposed to be "Peter, Paul and Mary" and the "Order #" column is supposed to be "132" I need to find any rows where those values aren't true, but I need to find it for every column in the table AND I don't actually know what the correct values are (meaning I can't just create a "SELECT...WHERE Name='This'" for each column).
So how can I find the rows that are different? (using straight SQL, no programming)

Does this answer cover what you are looking for? Here's a link to find the appropriate SQL query.
Let's suppose you coded an email newsletter signup form, but you forgot to check that the email address was not a duplicate, i.e. already in the database. We can write a query to find all the emails in our table that are duplicates, i.e. that occur in more than one row.
The following SQL query works great for finding duplicate values in a table.
SELECT email,
COUNT(email) AS NumOccurrences
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )
By using GROUP BY and then HAVING with a count greater than one, we find the email addresses that appear in more than one row.
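To try the GROUP BY / HAVING query without touching a real database, here is a minimal sketch using Python's built-in sqlite3 module. The users table layout and the sample addresses are assumptions made up for illustration; only the SELECT itself comes from the answer above.

```python
import sqlite3

def find_duplicate_emails(emails):
    """Return (email, count) pairs for emails that occur more than once."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal schema: an auto-incremented id plus the email.
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO users (email) VALUES (?)", [(e,) for e in emails])
    result = conn.execute(
        """
        SELECT email, COUNT(email) AS NumOccurrences
        FROM users
        GROUP BY email
        HAVING COUNT(email) > 1
        ORDER BY email
        """
    ).fetchall()
    conn.close()
    return result

duplicates = find_duplicate_emails(
    ["a@example.com", "b@example.com", "a@example.com"]
)
print(duplicates)
```

Only the address that occurs twice comes back, together with its count.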

If you know the limit of the wrong results (say 10 for example) then you could order them and get only the first 11 results. You see where I am going with this, right?
I have no SQL expertise whatsoever though :)

Do you need to do this programmatically, or can you just run a few queries yourself to check it?
If the latter, I'd just do "select distinct name, order#" to start. This should return a list that includes "Peter Paul and Mary, 132" and possibly some other things.
Then find the other things by doing select ... where name = "this" as you suggest.
You could get even more info out of that first query by doing "select distinct name, order#, count(*) from ... group by name, order#". This would give you both the list of values and the frequency of a given set of values.
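As a rough sketch of that frequency query, here it is run against an in-memory SQLite table from Python. The orders table, the order_num column name (standing in for "Order #") and the sample rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: auto-incremented id plus the two columns from the question.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT, order_num INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (name, order_num) VALUES (?, ?)",
    [
        ("Peter, Paul and Mary", 132),
        ("Peter, Paul and Mary", 132),
        ("Peter, Paul and Mary", 132),
        ("Peter, Paul and Mary", 133),  # the odd one out
    ],
)

# One output row per distinct combination, most frequent first.
frequencies = conn.execute(
    """
    SELECT name, order_num, COUNT(*) AS occurrences
    FROM orders
    GROUP BY name, order_num
    ORDER BY occurrences DESC, order_num
    """
).fetchall()
print(frequencies)
```

The combination with the highest count is presumably the "correct" one, and anything below it is a candidate for a closer look. GROUP BY already collapses the rows, so DISTINCT is not needed in this form.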

If I understand you correctly (your question is not 100% clear to me), you are trying to find the rows that are unnecessary duplicates? If so, try these SQL queries:
Select A.Id, B.Id
From Table A
Join Table B
On A.Id <> B.Id
And A.ColA = B.ColA
And A.ColB = B.ColB
And A.ColC = B.ColC
...
Or
Select ColA, ColB, etc.
From Table
Group By ColA, ColB, etc.
Having Count(*) > 1
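The two queries above are not quite interchangeable, which the following sketch tries to show with SQLite from Python. The table name T, the two columns and the sample rows are assumptions. The point is that the self-join compares with =, which is never true for NULL, while GROUP BY puts NULLs into the same group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Id INTEGER PRIMARY KEY, ColA TEXT, ColB TEXT)")
conn.executemany(
    "INSERT INTO T (Id, ColA, ColB) VALUES (?, ?, ?)",
    [
        (1, "x", "y"),
        (2, "x", "y"),    # duplicate of row 1
        (3, "x", "z"),    # differs in ColB
        (4, None, "q"),
        (5, None, "q"),   # duplicate of row 4, but ColA is NULL
    ],
)

# Self-join: every pair of different ids whose columns compare equal.
pairs = conn.execute(
    """
    SELECT A.Id, B.Id
    FROM T A
    JOIN T B
      ON A.Id <> B.Id
     AND A.ColA = B.ColA
     AND A.ColB = B.ColB
    ORDER BY A.Id, B.Id
    """
).fetchall()

# Grouping: every column combination that occurs more than once.
groups = conn.execute(
    """
    SELECT ColA, ColB
    FROM T
    GROUP BY ColA, ColB
    HAVING COUNT(*) > 1
    ORDER BY ColA
    """
).fetchall()

print(pairs)
print(groups)
```

The join reports each pair twice (once in each direction) and misses rows 4 and 5, whereas the grouped query reports the NULL combination as well. Which behaviour you want depends on whether NULLs should count as equal in your data.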

If you have a correlation between two "independent" columns where there is really only one "correct" value for column B whenever column A is a given value, then you have a broken database design, because this correlation should have been factored out into a separate table.

Try this:
SELECT T1.Name, T1.OrderNum
FROM Orders T1
LEFT JOIN (
SELECT Name, OrderNum
FROM Orders
GROUP BY Name, OrderNum
HAVING COUNT(*) > 1) T2
ON T1.Name = T2.Name
AND T1.OrderNum = T2.OrderNum
WHERE T2.Name IS NULL
The nested select identifies the duplicates, so you will need to target your common fields. The LEFT JOIN combined with WHERE T2.Name IS NULL keeps only the rows that have no match among the duplicates, which excludes them from your result set. So essentially you are joining the table to itself to identify the duplicates and exclude them from your results. If you want only the duplicates, change the LEFT JOIN to a plain JOIN and drop the WHERE clause.
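For anyone who wants to see the two variants side by side, here is a small sketch in Python with sqlite3. It uses an anti-join (LEFT JOIN plus an IS NULL filter) to drop the duplicated rows and a plain JOIN to keep only them. The Orders schema and the sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (Id INTEGER PRIMARY KEY, Name TEXT, OrderNum INTEGER)"
)
conn.executemany(
    "INSERT INTO Orders (Id, Name, OrderNum) VALUES (?, ?, ?)",
    [(1, "A", 1), (2, "A", 1), (3, "B", 2)],
)

# Subquery shared by both variants: combinations that occur more than once.
DUPLICATED = """
    SELECT Name, OrderNum
    FROM Orders
    GROUP BY Name, OrderNum
    HAVING COUNT(*) > 1
"""

# Anti-join: keep rows that found no partner in the duplicated set.
non_duplicated = conn.execute(
    f"""
    SELECT T1.Name, T1.OrderNum
    FROM Orders T1
    LEFT JOIN ({DUPLICATED}) T2
      ON T1.Name = T2.Name AND T1.OrderNum = T2.OrderNum
    WHERE T2.Name IS NULL
    ORDER BY T1.Id
    """
).fetchall()

# Inner join: keep only rows that belong to a duplicated combination.
only_duplicated = conn.execute(
    f"""
    SELECT T1.Name, T1.OrderNum
    FROM Orders T1
    JOIN ({DUPLICATED}) T2
      ON T1.Name = T2.Name AND T1.OrderNum = T2.OrderNum
    ORDER BY T1.Id
    """
).fetchall()

print(non_duplicated)
print(only_duplicated)
```

Note that the IS NULL check is on a column from the right-hand side of the join; that is what turns the outer join into a filter.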

Related

table with duplicate id inner join table no duplicate id

In the result of the last SELECT I see duplicate IDs. How can I remove the duplicates? See the attached picture (three SELECT queries).
While your sample data does not allow a 100% accurate answer, here are some guidelines that will hopefully help you.
To avoid duplicates in a join when no column can be used to uniquely identify the duplicate records, I would suggest to suppress them in a subquery, and then join the subquery with the other table.
As you seem to have true duplicates, meaning all columns that you retrieve from the child table have identical values, DISTINCT should do the trick:
SELECT i.*, c.TELEPHONE_NUM
FROM [dbo].[GT_Import] AS i
JOIN (
SELECT DISTINCT TELEPHONE_NUM
FROM [dbo].[ComplainSubscriber_Import]
) AS c ON c.TELEPHONE_NUM = i.TELEPHONE_NUM
WHERE ...
You can add more columns to the subquery, as long as they do have duplicated values.
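Here is a small sketch of why the DISTINCT subquery helps, using SQLite from Python. The table names are shortened versions of the ones above and the telephone numbers are made-up sample data, so treat the schema as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GT_Import (ID INTEGER PRIMARY KEY, TELEPHONE_NUM TEXT)")
conn.execute("CREATE TABLE ComplainSubscriber_Import (TELEPHONE_NUM TEXT)")
conn.executemany(
    "INSERT INTO GT_Import (ID, TELEPHONE_NUM) VALUES (?, ?)",
    [(1, "555"), (2, "666")],
)
# '555' is a true duplicate in the child table.
conn.executemany(
    "INSERT INTO ComplainSubscriber_Import (TELEPHONE_NUM) VALUES (?)",
    [("555",), ("555",), ("666",)],
)

# Joining the raw child table repeats ID 1 once per duplicate.
plain_ids = [row[0] for row in conn.execute(
    """
    SELECT i.ID
    FROM GT_Import AS i
    JOIN ComplainSubscriber_Import AS c ON c.TELEPHONE_NUM = i.TELEPHONE_NUM
    ORDER BY i.ID
    """
)]

# Collapsing the duplicates first keeps one row per ID.
distinct_ids = [row[0] for row in conn.execute(
    """
    SELECT i.ID
    FROM GT_Import AS i
    JOIN (
        SELECT DISTINCT TELEPHONE_NUM
        FROM ComplainSubscriber_Import
    ) AS c ON c.TELEPHONE_NUM = i.TELEPHONE_NUM
    ORDER BY i.ID
    """
)]

print(plain_ids)
print(distinct_ids)
```

If a column added to the subquery differs between the otherwise duplicated rows, DISTINCT no longer collapses them and the repeated IDs come back.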

How to query if just one record of a grouped result set satisfies a condition?

I've created this db fiddle example to help illustrate the issue. Basically I want the last four records that were inserted into the table to be excluded from the result set. The reason they should be excluded and the first four records should be included is because I'm looking for just the records from the same person which have the same amount and at least one of those records has the description of 'fee'. However I'm not sure how to check for the last part of that sentence. Any ideas?
This should do it. Feel free to ask if anything is unclear:
Select *
from tableA C
where exists(
Select * From
tableA a JOIN
tableA b on a.PersonId=b.PersonId and a.Amount=b.Amount and a.Description ='fee' and
b.Description!='fee'
where C.PersonId=a.PersonId and C.Amount=a.Amount)
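To check what that EXISTS query actually keeps, here is a sketch with SQLite from Python. The tableA schema and the rows are assumptions, since the original fiddle data is not shown here. As written, the query keeps a (PersonId, Amount) group only if it contains at least one 'fee' row and at least one non-'fee' row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableA ("
    "Id INTEGER PRIMARY KEY, PersonId INTEGER, Amount INTEGER, Description TEXT)"
)
conn.executemany(
    "INSERT INTO tableA (Id, PersonId, Amount, Description) VALUES (?, ?, ?, ?)",
    [
        (1, 1, 10, "fee"),      # person 1: one fee row ...
        (2, 1, 10, "payment"),  # ... and one non-fee row -> group is kept
        (3, 2, 10, "payment"),  # person 2: no fee row at all -> dropped
        (4, 2, 10, "payment"),
        (5, 3, 5, "fee"),       # person 3: only fee rows -> dropped as well
        (6, 3, 5, "fee"),
    ],
)

kept_ids = [row[0] for row in conn.execute(
    """
    SELECT C.Id
    FROM tableA C
    WHERE EXISTS (
        SELECT *
        FROM tableA a
        JOIN tableA b
          ON a.PersonId = b.PersonId
         AND a.Amount = b.Amount
         AND a.Description = 'fee'
         AND b.Description != 'fee'
        WHERE C.PersonId = a.PersonId
          AND C.Amount = a.Amount
    )
    ORDER BY C.Id
    """
)]
print(kept_ids)
```

If groups made up only of 'fee' rows should be kept too, the b side of the join can be dropped so that the subquery only looks for a single 'fee' row.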

Nesting SQL queries

I have the following SQL query. In this query, I am left joining a table tblTestImport with access query named "unique". I am trying to integrate the query "unique" into the code below. I am not having any luck, please help.
DELETE tblTestImport.ID
FROM tblTestImport
WHERE tblTestImport.[ID]
in (SELECT tblTestImport.ID
FROM tblTestImport
LEFT JOIN [unique]
ON tblTestImport.ID = unique.LastOfID
WHERE (((unique.LastOfID) Is Null)));
Code for "unique" query
SELECT Last(tblTestImport.ID) AS LastOfID
FROM tblTestImport
GROUP BY tblTestImport.Url, tblTestImport.Kms, tblTestImport.Price, tblTestImport.Time;
Further info: I am trying to delete duplicates from the Access Table, and leave unique ones only.
tblTestImport has duplicate records. The "unique" query displays unique records. Then, I join the tblTestImport table with the "unique" query to determine which records in tblTestImport do not appear in "unique". This gives me a list of duplicates, which I want to delete.
Within the big chunk of code, I have [unique], which I would like to replace with the small piece of code below.
Your query will return no results, assuming that id is never NULL.
Why? Well unique.lastOfId is always a valid id in tblTestImport. As such, the LEFT JOIN will always match, and lastOfId will never be NULL.
So, no rows are in the subquery.
My suggestion is that you ask another question. Explain what you want to do and provide sample data and desired results.
It's hard to tell exactly what you're after based on the current setup of your question. However, this may be what you're looking for:
DELETE tblTestImport.ID
FROM tblTestImport
WHERE tblTestImport.[ID] IN
(SELECT tblTestImport.ID
FROM tblTestImport
LEFT JOIN [unique] ON tblTestImport.ID = unique.LastOfID
WHERE unique.LastOfID Is Null
AND tblTestImport.ID IN (
SELECT Last(tblTestImport.ID) AS LastOfID
FROM tblTestImport
GROUP BY tblTestImport.Url, tblTestImport.Kms, tblTestImport.Price, tblTestImport.Time)
)
(This is a simple nested query in the previous query, but it would all depend on the relationship of the data itself. This query assumes that they are linked on the same tblTestImport.ID field)
Try the following, untested:
DELETE FROM tblTestImport
WHERE ID <> (SELECT Min(ID) AS MinOfID FROM tblTestImport AS Dupe
WHERE (Dupe.Url= tblTestImport.Url)
AND (Dupe.Kms= tblTestImport.Kms)
AND (Dupe.Price= tblTestImport.Price)
AND (Dupe.Time= tblTestImport.Time));
Solution based on the following http://allenbrowne.com/subquery-01.html
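The correlated-subquery delete can be tried out safely on a throwaway table. The sketch below uses SQLite from Python rather than Access, so it only shows the idea; Access syntax may differ in details. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblTestImport ("
    "ID INTEGER PRIMARY KEY, Url TEXT, Kms INTEGER, Price INTEGER, Time TEXT)"
)
conn.executemany(
    "INSERT INTO tblTestImport (ID, Url, Kms, Price, Time) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "u1", 10, 100, "t"),
        (2, "u1", 10, 100, "t"),  # duplicate of ID 1
        (3, "u2", 20, 200, "t"),
        (4, "u1", 10, 100, "t"),  # duplicate of ID 1
    ],
)

# For each row, look up the smallest ID among rows with the same values
# and delete the row unless it is that smallest one.
conn.execute(
    """
    DELETE FROM tblTestImport
    WHERE ID <> (
        SELECT MIN(Dupe.ID)
        FROM tblTestImport AS Dupe
        WHERE Dupe.Url = tblTestImport.Url
          AND Dupe.Kms = tblTestImport.Kms
          AND Dupe.Price = tblTestImport.Price
          AND Dupe.Time = tblTestImport.Time
    )
    """
)

remaining_ids = [row[0] for row in conn.execute(
    "SELECT ID FROM tblTestImport ORDER BY ID"
)]
print(remaining_ids)
```

One caveat: the comparison uses =, so rows with a NULL in any of the four columns never match anything, the subquery returns NULL for them, and they are left in place.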

Getting way more results than expected in SQL left join query

My code is such:
SELECT COUNT(*)
FROM earned_dollars a
LEFT JOIN product_reference b ON a.product_code = b.product_code
WHERE a.activity_year = '2015'
I'm trying to match two tables based on their product codes. I would expect the same number of results back from this as total records in table a (with a year of 2015). But for some reason I'm getting close to 3 million.
Table a has about 40,000,000 records and table b has 2000. When I run this statement without the join I get 2,500,000 results, so I would expect this even with the left join, but somehow I'm getting 300,000,000. Any ideas? I even referred to the diagram in this post.
It means either your left join is using only part of the foreign key, which causes row multiplication, or there are simply duplicate rows in the joined table.
use COUNT(DISTINCT a.product_code)
What is the question you are trying to answer with the T-SQL?
Instead of select count(*), try select a.product_code, b.product_code. That will show you which records match and which don't.
You should also add a where b.product_code is not null. That should exclude the records that don't match.
Is b the parent table and a the child table? Try a right join instead.
Or use the table's unique identifier, i.e.
SELECT COUNT(a.earned_dollars_id)
Not sure what your datamodel looks like and how it is structured, but i'm guessing you only care about earned_dollars?
SELECT COUNT(*)
FROM earned_dollars a
WHERE a.activity_year = '2015'
and exists (select 1 from product_reference b where a.product_code = b.product_code)
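Here is a tiny reproduction of the row multiplication, sketched with SQLite from Python. The schema is a guess at the asker's tables and the data is invented: product code 'A' appears twice in product_reference, which is enough to inflate the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE earned_dollars ("
    "earned_dollars_id INTEGER PRIMARY KEY, product_code TEXT, activity_year TEXT)"
)
conn.execute("CREATE TABLE product_reference (product_code TEXT)")
conn.executemany(
    "INSERT INTO earned_dollars (product_code, activity_year) VALUES (?, ?)",
    [("A", "2015"), ("A", "2015"), ("B", "2015"), ("C", "2015")],
)
# 'A' is duplicated in the reference table; 'C' is missing from it.
conn.executemany(
    "INSERT INTO product_reference (product_code) VALUES (?)",
    [("A",), ("A",), ("B",)],
)

def count(sql):
    """Run a query that returns a single number and return that number."""
    return conn.execute(sql).fetchone()[0]

without_join = count(
    "SELECT COUNT(*) FROM earned_dollars a WHERE a.activity_year = '2015'"
)
with_left_join = count(
    """
    SELECT COUNT(*)
    FROM earned_dollars a
    LEFT JOIN product_reference b ON a.product_code = b.product_code
    WHERE a.activity_year = '2015'
    """
)
with_exists = count(
    """
    SELECT COUNT(*)
    FROM earned_dollars a
    WHERE a.activity_year = '2015'
      AND EXISTS (SELECT 1 FROM product_reference b
                  WHERE a.product_code = b.product_code)
    """
)
duplicated_codes = conn.execute(
    """
    SELECT product_code, COUNT(*)
    FROM product_reference
    GROUP BY product_code
    HAVING COUNT(*) > 1
    """
).fetchall()

print(without_join, with_left_join, with_exists, duplicated_codes)
```

The last query is probably the quickest diagnostic on the real data: if it returns anything, product_code is not unique in product_reference and every match on those codes multiplies rows. EXISTS never multiplies rows, but it does drop rows from a that have no match at all.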

How do I write an SQL query to identify duplicate values in a specific field?

This is the table I'm working with:
I would like to identify only the ReviewIDs that have duplicate deduction IDs for different parameters.
For example, in the image above, ReviewID 114 has two different parameter IDs, but both records have the same deduction ID.
For my purposes, this record (ReviewID 114) has an error. There should not be two or more unique parameter IDs that have the same deduction ID for a single ReviewID.
I would like to write a query to identify these types of records, but my SQL skills aren't there yet. Help?
Thanks!
Update 1: I'm using TSQL (SQL Server 2008) if that helps
Update 2: The output that I'm looking for would be the same as the image above, minus any records that do not match the criteria I've described.
Cheers!
SELECT * FROM table t1 INNER JOIN (
SELECT review_id, deduction_id FROM table
GROUP BY review_id, deduction_id
HAVING COUNT(parameter_id) > 1
) t2 ON t1.review_id = t2.review_id AND t1.deduction_id = t2.deduction_id;
http://www.sqlfiddle.com/#!3/d858f/3
If it is possible to have exact duplicates and that is ok, you can modify the HAVING clause to COUNT(DISTINCT parameter_id).
Select ReviewID, deduction_ID from Table
Group By ReviewID, deduction_ID
Having count(ReviewID) > 1
http://www.sqlfiddle.com/#!3/6e113/3 has an example
If I understand the criteria: For each combination of ReviewID and deduction_id you can have only one parameter_id and you want a query that produces a result without the ReviewIDs that break those rules (rather than identifying those rows that do). This will do that:
;WITH review_errors AS (
SELECT ReviewID
FROM test
GROUP BY ReviewID,deduction_ID
HAVING COUNT(DISTINCT parameter_id) > 1
)
SELECT t.*
FROM test t
LEFT JOIN review_errors r
ON t.ReviewID = r.ReviewID
WHERE r.ReviewID IS NULL
To explain: review_errors is a common table expression (think of it as a named sub-query that doesn't clutter up the main query). It selects the ReviewIDs that break the criteria. When you left join on it, it selects all rows from the left table regardless of whether they match the right table and only the rows from the right table that match the left table. Rows that do not match will have nulls in the columns for the right-hand table. By specifying WHERE r.ReviewID IS NULL you eliminate the rows from the left hand table that match the right hand table.
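The CTE plus anti-join pattern can be tried out as follows. This sketch runs it in SQLite from Python instead of SQL Server 2008, so the leading semicolon is left out, and the rows in test are invented to mirror the ReviewID 114 example from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (ReviewID INTEGER, parameter_id INTEGER, deduction_ID INTEGER)"
)
conn.executemany(
    "INSERT INTO test (ReviewID, parameter_id, deduction_ID) VALUES (?, ?, ?)",
    [
        (114, 1, 7),
        (114, 2, 7),  # second parameter with the same deduction -> error
        (115, 1, 7),
        (115, 2, 8),  # different deductions -> fine
    ],
)

# ReviewIDs that break the rule.
error_ids = [row[0] for row in conn.execute(
    """
    SELECT ReviewID
    FROM test
    GROUP BY ReviewID, deduction_ID
    HAVING COUNT(DISTINCT parameter_id) > 1
    """
)]

# All rows whose ReviewID is not among the offenders.
clean_rows = conn.execute(
    """
    WITH review_errors AS (
        SELECT ReviewID
        FROM test
        GROUP BY ReviewID, deduction_ID
        HAVING COUNT(DISTINCT parameter_id) > 1
    )
    SELECT t.ReviewID, t.parameter_id, t.deduction_ID
    FROM test t
    LEFT JOIN review_errors r ON t.ReviewID = r.ReviewID
    WHERE r.ReviewID IS NULL
    ORDER BY t.ReviewID, t.parameter_id
    """
).fetchall()

print(error_ids)
print(clean_rows)
```

To list the offending rows instead of the clean ones, the same CTE can be used with an inner JOIN and without the IS NULL filter.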