Inner join between a table with duplicate IDs and a table without duplicate IDs - SQL

In the result of the last SELECT I see duplicate IDs. How can I remove the duplicates? See the attached picture.

While your sample data does not allow a 100% accurate answer, here are some guidelines that will hopefully help you.
To avoid duplicates in a join when no column can be used to uniquely identify the duplicate records, I would suggest suppressing them in a subquery and then joining the subquery with the other table.
As you seem to have true duplicates, meaning all columns that you retrieve from the child table have identical values, DISTINCT should do the trick:
SELECT i.*, c.TELEPHONE_NUM
FROM [dbo].[GT_Import] AS i
JOIN (
SELECT DISTINCT TELEPHONE_NUM
FROM [dbo].[ComplainSubscriber_Import]
) AS c ON c.TELEPHONE_NUM = i.TELEPHONE_NUM
WHERE ...
You can add more columns to the subquery, as long as they do have duplicated values.
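To see the effect of the DISTINCT subquery, here is a minimal sketch using Python's built-in sqlite3 module. The table names come from the query above, but the ID column and the sample phone numbers are made up for illustration, and SQLite stands in for whatever database you are actually running.

```python
import sqlite3

# In-memory database; the column layout and the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE GT_Import (ID INTEGER, TELEPHONE_NUM TEXT);
    CREATE TABLE ComplainSubscriber_Import (TELEPHONE_NUM TEXT);
    INSERT INTO GT_Import VALUES (1, '555-0100'), (2, '555-0101'), (3, '555-0102');
    -- '555-0100' appears twice in the child table: a true duplicate
    INSERT INTO ComplainSubscriber_Import VALUES ('555-0100'), ('555-0100'), ('555-0101');
""")

# Plain join: the duplicated child row multiplies the parent row.
plain_join = conn.execute("""
    SELECT i.ID, c.TELEPHONE_NUM
    FROM GT_Import AS i
    JOIN ComplainSubscriber_Import AS c ON c.TELEPHONE_NUM = i.TELEPHONE_NUM
    ORDER BY i.ID
""").fetchall()

# Join against a de-duplicated subquery: one row per matching parent.
distinct_join = conn.execute("""
    SELECT i.ID, c.TELEPHONE_NUM
    FROM GT_Import AS i
    JOIN (SELECT DISTINCT TELEPHONE_NUM FROM ComplainSubscriber_Import) AS c
      ON c.TELEPHONE_NUM = i.TELEPHONE_NUM
    ORDER BY i.ID
""").fetchall()

print(plain_join)
print(distinct_join)
```

With this sample data the plain join returns ID 1 twice, while the DISTINCT version returns it once.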

Related

Best way to combine two tables, remove duplicates, but keep all other non-duplicate values in SQL

I am looking for the best way to combine two tables so that duplicate records, matched on email, are removed, with the values from "Table 2" taking priority over any duplicates. I have considered FULL OUTER JOIN and UNION ALL, but UNION ALL will be too large, as each table has several thousand columns. I want to create this combined table as my full reference table and save it as a view, so I can reference it without always adding a union or something to that effect in my already complex statements. From my understanding, a full outer join will not necessarily remove duplicates. I want to:
a. Create table with ALL columns from both tables (fields that don't apply to records in one table will just have null values)
b. Remove duplicate records from this master table based on email field but only remove the table 1 records and keep the table 2 duplicates as they have the information that I want
c. A left-join will not work as both tables have unique records that I want to retain and I would like all 1000+ columns to be retained from each table
I don't know how feasible this even is but thank you so much for any answers!
If I understand your question correctly you want to join two large tables with thousands of columns that (hopefully) are the same between the two tables using the email column as the join condition and replacing duplicate records between the two tables with the records from Table 2.
I had to do something similar a few days ago so maybe you can modify my query for your purposes:
WITH only_in_table_1 AS(
SELECT *
FROM table_1 A
WHERE NOT EXISTS
(SELECT * FROM table_2 B WHERE B.email_field = A.email_field))
SELECT * FROM table_2
UNION ALL
SELECT * FROM only_in_table_1
If the columns/fields aren't the same between tables you can use a full outer join on only_in_table_1 and table_2
Try using a FULL OUTER JOIN between the two tables, and then a COALESCE function on each result set column to determine from which table that column is populated.
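The NOT EXISTS plus UNION ALL approach above can be tried out on a toy data set. This is only a sketch: it uses Python's sqlite3 module, two hypothetical two-column tables instead of the thousands of columns in the question, and made-up email addresses. It also assumes both tables have the same columns in the same order, which UNION ALL requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, heavily simplified versions of the two tables
    CREATE TABLE table_1 (email_field TEXT, name TEXT);
    CREATE TABLE table_2 (email_field TEXT, name TEXT);
    INSERT INTO table_1 VALUES ('a@example.com', 'old A'), ('b@example.com', 'only in 1');
    INSERT INTO table_2 VALUES ('a@example.com', 'new A'), ('c@example.com', 'only in 2');
""")

# Everything from table_2, plus the table_1 rows whose email is not in table_2.
combined = conn.execute("""
    WITH only_in_table_1 AS (
        SELECT * FROM table_1 A
        WHERE NOT EXISTS (SELECT 1 FROM table_2 B WHERE B.email_field = A.email_field)
    )
    SELECT * FROM table_2
    UNION ALL
    SELECT * FROM only_in_table_1
    ORDER BY email_field
""").fetchall()

print(combined)
```

For the shared email a@example.com only the table_2 row survives, and the rows unique to either table are both kept.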

Why do I get a duplicate column name error only when I SELECT FROM (SELECT)

I imagine this is a really basic oversight on my part, but I have an SQL query which works fine. But when I SELECT from that result (SELECT FROM (SELECT))
I get a 'duplicate column' error. There are duplicate column names, for sure, in two tables where I compare them but they do not cause a problem in the initial result. For example:
SELECT _dia_tagsrel.tag_id,_dia_tagsrel.article_id, _dia_tags.tag_id, _dia_tags.tag
FROM _dia_tagsrel
JOIN _dia_tags
ON _dia_tagsrel.tag_id = _dia_tags.tag_id
Works fine but when I try to select from it, I get the error:
SELECT DISTINCT tag FROM
(SELECT _dia_tagsrel.tag_id,_dia_tagsrel.article_id, _dia_tags.tag_id, _dia_tags.tag
FROM _dia_tagsrel
JOIN _dia_tags
ON _dia_tagsrel.tag_id = _dia_tags.tag_id) a
Regardless of the DISTINCT. OK, I can change the column names to be unique, but the question really is: why do I get the error when I SELECT FROM (SELECT) and not in the initial query?
Thanks
Solution:
SELECT DISTINCT tag_id, tag FROM (SELECT _dia_tagsrel.tag_id, _dia_tagsrel.article_id, _dia_tags.tag
FROM _dia_tagsrel
JOIN _dia_tags
ON _dia_tagsrel.tag_id = _dia_tags.tag_id) a
I only needed to SELECT one of the duplicate columns, even though I was comparing both of them. Provided by the answer below.
In your second query, i.e. the subquery, you are selecting tag_id twice. Although the two columns come from different tables, that works when you are only selecting the data. But when you select from a subquery that contains the same column name twice, you get the duplicate column error. Below is the column list you selected, which is incorrect:
_dia_tagsrel.tag_id,_dia_tagsrel.article_id, _dia_tags.tag_id, _dia_tags.tag
When using subqueries, MERGE, IN or EXISTS clauses, avoid using the same column name multiple times.
A simple join works, with no need for a subquery:
SELECT _dia_tagsrel.tag_id,_dia_tagsrel.article_id, _dia_tags.tag_id, _dia_tags.tag
FROM _dia_tagsrel
JOIN _dia_tags
ON _dia_tagsrel.tag_id = _dia_tags.tag_id
Your first query returns four columns:
tag_id
article_id
tag_id
tag
Duplicate column names are allowed in a result set, but are not allowed in a table, or in a derived table, view, CTE, or most subqueries (EXISTS subqueries are an exception).
I hope you can see the duplicate. There is no need to select tag_id twice, because the JOIN requires that the values are the same. So just select three columns:
SELECT tr.tag_id, tr.article_id, t.tag
FROM _dia_tagsrel tr JOIN
_dia_tags t
ON tr.tag_id = t.tag_id
Your subquery has two tag_id columns, so how should the database engine decide which one you want to use?
So either use just one (the join requires the tag_ids to be the same) or rename it:
If _dia_tags has unique tags, then you can use EXISTS instead of INNER JOIN:
SELECT t.tag
FROM _dia_tags t
WHERE EXISTS (SELECT 1 FROM _dia_tagsrel tr WHERE tr.tag_id = t.tag_id);
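For the "rename it" route, here is a small sketch of giving each tag_id its own alias inside the derived table. It runs against SQLite through Python's sqlite3 module, which is an assumption on my part: SQLite may not raise the same duplicate column error as the asker's database, so the sketch only shows the renamed form working. The alias rel_tag_id and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE _dia_tags (tag_id INTEGER, tag TEXT);
    CREATE TABLE _dia_tagsrel (tag_id INTEGER, article_id INTEGER);
    INSERT INTO _dia_tags VALUES (1, 'sql'), (2, 'joins');
    INSERT INTO _dia_tagsrel VALUES (1, 10), (1, 11), (2, 10);
""")

# Every column of the derived table "a" has a distinct name,
# because the two tag_id columns are aliased differently.
tags = conn.execute("""
    SELECT DISTINCT tag FROM (
        SELECT tr.tag_id AS rel_tag_id, tr.article_id, t.tag_id AS tag_id, t.tag
        FROM _dia_tagsrel tr
        JOIN _dia_tags t ON tr.tag_id = t.tag_id
    ) a
    ORDER BY tag
""").fetchall()

print(tags)
```

If you never need both IDs in the outer query, dropping one of them as in the answers above is simpler than aliasing.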

Nesting SQL queries

I have the following SQL query. In this query, I am left joining the table tblTestImport with an Access query named "unique". I am trying to integrate the query "unique" into the code below. I am not having any luck, please help.
DELETE tblTestImport.ID
FROM tblTestImport
WHERE tblTestImport.[ID]
in (SELECT tblTestImport.ID
FROM tblTestImport
LEFT JOIN [unique]
ON tblTestImport.ID = unique.LastOfID
WHERE (((unique.LastOfID) Is Null)));
Code for "unique" query
SELECT Last(tblTestImport.ID) AS LastOfID
FROM tblTestImport
GROUP BY tblTestImport.Url, tblTestImport.Kms, tblTestImport.Price, tblTestImport.Time;
Further info: I am trying to delete duplicates from the Access table and leave only the unique records.
tblTestImport has duplicate records. The "unique" query displays the unique records. Then I join the tblTestImport table with the "unique" query, to determine which unique records do not exist in tblTestImport. This gives me a list of duplicates, which I want to delete.
Within the big chunk of code, I have [unique], which I would like to replace with the small piece of code for the "unique" query.
Your query will return no results, assuming that id is never NULL.
Why? Well unique.lastOfId is always a valid id in tblTestImport. As such, the LEFT JOIN will always match, and lastOfId will never be NULL.
So, no rows are in the subquery.
My suggestion is that you ask another question. Explain what you want to do and provide sample data and desired results.
It's hard to tell exactly what you're after based on the current setup of your question. However, this may be what you're looking for:
DELETE tblTestImport.ID
FROM tblTestImport
WHERE tblTestImport.[ID] IN
(SELECT tblTestImport.ID
FROM tblTestImport
LEFT JOIN [unique] ON tblTestImport.ID = unique.LastOfID
WHERE unique.LastOfID Is Null
AND tblTestImport.ID IN (
SELECT Last(tblTestImport.ID) AS LastOfID
FROM tblTestImport
GROUP BY tblTestImport.Url, tblTestImport.Kms, tblTestImport.Price, tblTestImport.Time)
)
(This is a simple nested query in the previous query, but it would all depend on the relationship of the data itself. This query assumes that they are linked on the same tblTestImport.ID field)
Try the following, untested:
DELETE FROM tblTestImport
WHERE ID <> (SELECT Min(ID) AS MinOfID FROM tblTestImport AS Dupe
WHERE (Dupe.Url= tblTestImport.Url)
AND (Dupe.Kms= tblTestImport.Kms)
AND (Dupe.Price= tblTestImport.Price)
AND (Dupe.Time= tblTestImport.Time));
Solution based on the following http://allenbrowne.com/subquery-01.html
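To check what the correlated-subquery DELETE does, here is a minimal sketch run against SQLite via Python's sqlite3 module rather than Access, so treat it as an illustration of the idea and not as tested Access SQL. The rows are invented. Note that it keeps the lowest ID of each group, whereas the "unique" query in the question keeps the last one, and that rows with NULL in any compared column are not treated as duplicates because NULL = NULL is not true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblTestImport (
        ID INTEGER PRIMARY KEY, Url TEXT, Kms INTEGER, Price INTEGER, Time TEXT
    );
    -- IDs 1, 2 and 4 are duplicates of each other; ID 3 is unique (made-up rows)
    INSERT INTO tblTestImport (Url, Kms, Price, Time) VALUES
        ('a.example', 100, 5000, '10:00'),
        ('a.example', 100, 5000, '10:00'),
        ('b.example', 200, 7000, '11:00'),
        ('a.example', 100, 5000, '10:00');
""")

# Delete every row whose ID is not the smallest ID among its identical rows.
conn.execute("""
    DELETE FROM tblTestImport
    WHERE ID <> (SELECT MIN(Dupe.ID) FROM tblTestImport AS Dupe
                 WHERE Dupe.Url = tblTestImport.Url
                   AND Dupe.Kms = tblTestImport.Kms
                   AND Dupe.Price = tblTestImport.Price
                   AND Dupe.Time = tblTestImport.Time)
""")

remaining = conn.execute("SELECT ID, Url FROM tblTestImport ORDER BY ID").fetchall()
print(remaining)
```

Running the same statement as a SELECT first (replace DELETE with SELECT *) is a cheap way to preview which rows would go.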

How do I write an SQL query to identify duplicate values in a specific field?

This is the table I'm working with:
I would like to identify only the ReviewIDs that have duplicate deduction IDs for different parameters.
For example, in the image above, ReviewID 114 has two different parameter IDs, but both records have the same deduction ID.
For my purposes, this record (ReviewID 114) has an error. There should not be two or more unique parameter IDs that have the same deduction ID for a single ReviewID.
I would like to write a query to identify these types of records, but my SQL skills aren't there yet. Help?
Thanks!
Update 1: I'm using TSQL (SQL Server 2008) if that helps
Update 2: The output that I'm looking for would be the same as the image above, minus any records that do not match the criteria I've described.
Cheers!
SELECT * FROM table t1 INNER JOIN (
SELECT review_id, deduction_id FROM table
GROUP BY review_id, deduction_id
HAVING COUNT(parameter_id) > 1
) t2 ON t1.review_id = t2.review_id AND t1.deduction_id = t2.deduction_id;
http://www.sqlfiddle.com/#!3/d858f/3
If it is possible to have exact duplicates and that is ok, you can modify the HAVING clause to COUNT(DISTINCT parameter_id).
Select ReviewID, deduction_ID from Table
Group By ReviewID, deduction_ID
Having count(ReviewID) > 1
http://www.sqlfiddle.com/#!3/6e113/3 has an example
If I understand the criteria: For each combination of ReviewID and deduction_id you can have only one parameter_id and you want a query that produces a result without the ReviewIDs that break those rules (rather than identifying those rows that do). This will do that:
;WITH review_errors AS (
SELECT ReviewID
FROM test
GROUP BY ReviewID,deduction_ID
HAVING COUNT(DISTINCT parameter_id) > 1
)
SELECT t.*
FROM test t
LEFT JOIN review_errors r
ON t.ReviewID = r.ReviewID
WHERE r.ReviewID IS NULL
To explain: review_errors is a common table expression (think of it as a named sub-query that doesn't clutter up the main query). It selects the ReviewIDs that break the criteria. When you left join on it, it selects all rows from the left table regardless of whether they match the right table and only the rows from the right table that match the left table. Rows that do not match will have nulls in the columns for the right-hand table. By specifying WHERE r.ReviewID IS NULL you eliminate the rows from the left hand table that match the right hand table.
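If you want the offending rows themselves rather than the clean ones, the GROUP BY / HAVING COUNT(DISTINCT ...) idea from the answers above can be sketched as follows. It uses SQLite through Python's sqlite3 module instead of SQL Server 2008, and the table layout and the four sample rows are assumptions modeled on the description of ReviewID 114.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (ReviewID INTEGER, parameter_id INTEGER, deduction_ID INTEGER);
    INSERT INTO test VALUES
        (114, 1, 50),   -- two different parameters ...
        (114, 2, 50),   -- ... sharing one deduction: the error case
        (115, 1, 50),
        (115, 2, 51);
""")

# Join back to the groups that have more than one distinct parameter
# for the same (ReviewID, deduction_ID) pair.
flagged = conn.execute("""
    SELECT t.ReviewID, t.parameter_id, t.deduction_ID
    FROM test t
    JOIN (
        SELECT ReviewID, deduction_ID
        FROM test
        GROUP BY ReviewID, deduction_ID
        HAVING COUNT(DISTINCT parameter_id) > 1
    ) e ON t.ReviewID = e.ReviewID AND t.deduction_ID = e.deduction_ID
    ORDER BY t.parameter_id
""").fetchall()

print(flagged)
```

Only the two rows for ReviewID 114 come back; ReviewID 115 uses a different deduction for each parameter and is left out.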

Select rows that are different in SQL

I have a table with way too many columns and a couple million rows that I need to query for differences.
On these rows there will hopefully be only one column that is different and that should be the Auto incremented id field.
What I need to do is check to see if these rows ARE actually the same and if there are any that have any differences in any of the fields.
So for example, if the "Name" column is supposed to be "Peter, Paul and Mary" and the "Order #" column is supposed to be "132" I need to find any rows where those values aren't true, but I need to find it for every column in the table AND I don't actually know what the correct values are (meaning I can't just create a "SELECT...WHERE Name='This'" for each column).
So how can I find the rows that are different? (using straight SQL, no programming)
Is this answer what you are looking for, and would it help you? Here's a link to find the appropriate SQL query.
Let's suppose you coded an email newsletter signup form, but you forgot to double check that the email address was not a duplicate, or already in the database. We can write a query to find all the emails in our table that are duplicates, i.e. that occur in more than one row.
The following SQL query works great for finding duplicate values in a table.
SELECT email,
COUNT(email) AS NumOccurrences
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )
By using GROUP BY and then HAVING with a count greater than one, we find rows with duplicate email addresses using the above SQL.
If you know the limit of the wrong results (say 10 for example) then you could order them and get only the first 11 results. You see where I am going with this, right?
I have no SQL expertise whatsoever though :)
Do you need to do this programmatically, or can you just run a few queries yourself to check it?
If the latter, I'd just do "select distinct name, order#" to start. This should return a list that includes "Peter Paul and Mary, 132" and possibly some other things.
Then find the other things by doing select ... where name = "this" as you suggest.
You could get even more info out of that first query by doing "select distinct name, order#, count(*) from ... group by name, order#". This would give you both the list of values and the frequency of a given set of values.
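Here is a rough sketch of that frequency query, using Python's sqlite3 module. The table name orders, the column order_num (standing in for "Order #") and the sample rows are all made up. The idea is that the most frequent combination is probably the intended one, and whatever else shows up is what you need to look at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT, order_num INTEGER);
    -- three matching rows and one with a typo in the name (hypothetical data)
    INSERT INTO orders (name, order_num) VALUES
        ('Peter, Paul and Mary', 132),
        ('Peter, Paul and Mary', 132),
        ('Peter, Paul and Mary', 132),
        ('Peter, Paul and Marry', 132);
""")

# One row per distinct combination, most frequent first.
counts = conn.execute("""
    SELECT name, order_num, COUNT(*) AS n
    FROM orders
    GROUP BY name, order_num
    ORDER BY n DESC
""").fetchall()

# Everything after the most common combination is a candidate for a bad row.
outliers = counts[1:]
print(counts)
print(outliers)
```

With many columns you would list all of them except the auto-incremented id in both the SELECT and the GROUP BY.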
If I understand you correctly (your question is not 100% clear to me), you are trying to find the rows that are unnecessary duplicates? If so, try these SQL queries:
Select A.Id, B.Id
From Table A
Join Table B
On A.Id <> B.Id
And A.ColA = B.ColA
And A.ColB = B.ColB
And A.ColC = B.ColC
...
Or
Select ColA, ColB, etc.
From Table
Group By ColA, ColB, etc.
Having Count(*) > 1
If you have a correlation between two "independent" columns where there is really only one "correct" value for column B whenever column A is a given value, then you have a broken database design, because these correlations should have been factored out into a separate table.
Try this:
SELECT T1.Name, T1.OrderNum
FROM Orders T1
LEFT JOIN (
SELECT Name, OrderNum
FROM Orders
GROUP BY Name, OrderNum
HAVING COUNT(*) > 1) T2
ON T1.Name = T2.Name
AND T1.OrderNum = T2.OrderNum
WHERE T2.Name IS NULL
The nested select identifies the duplicated combinations, so you will need to target your common fields. The LEFT JOIN together with WHERE T2.Name IS NULL keeps only the rows that have no match in that set, which excludes the duplicates from your result set (a FULL OUTER JOIN on its own would not exclude anything). So essentially you are joining the table to itself to identify the duplicates and exclude them from your results. If you want only the duplicates, change the LEFT JOIN to a plain JOIN and drop the WHERE clause.