Duplicate rows when self-joining tables in SQL

I am trying to self-join a table on the column "Warehouse Number". The goal is to list the part numbers, descriptions, and item classes of every pair of parts that are in the same item class and the same warehouse. Below are the starting data and an example of the desired output.
(Image: starting data)
(Image: example of some desired data)
However, when the self-join runs, the rows aren't "exact" duplicates, but each pair appears twice in the table: once as part A/part B and once as part B/part A.
(Image: example of output with problems, highlighted)
I have tried most variations of UNION, INNER JOIN, and other join methods. Is it possible to remove these mirrored pairs, given that neither row is technically an exact duplicate of the other?
(Image: current SQL code)

You may alter your join condition to check that the first part number is strictly less than the second one:
SELECT
    t1.PARTNUMB, t1.PARTDESC, t1.ITEMCLSS,
    t2.PARTNUMB, t2.PARTDESC, t2.ITEMCLSS
FROM PARTFIRST t1
INNER JOIN PARTSECOND t2
    ON t1.WRHSNUMB = t2.WRHSNUMB
   AND t1.ITEMCLSS = t2.ITEMCLSS
   AND t1.PARTNUMB < t2.PARTNUMB;  -- keeps only one orientation of each pair
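The problem with using FIRST.PARTNUMB <> SECOND.PARTNUMB is that every pair of different part numbers is reported twice: once with part A on the left and part B on the right, and once the other way around. A strict less-than comparison keeps only one orientation of each pair, which removes the "duplicates" as you view them (it also prevents a part from being paired with itself).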
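A minimal runnable sketch of the difference, using Python's built-in sqlite3 module as a stand-in for your database. The table name PART and its rows are hypothetical; only the column names come from the query above, and both sides of the join use the same table, as in a true self-join.

```python
import sqlite3

# Hypothetical table and rows; only the column names come from the query above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PART (PARTNUMB TEXT, PARTDESC TEXT, ITEMCLSS TEXT, WRHSNUMB INTEGER)"
)
conn.executemany(
    "INSERT INTO PART VALUES (?, ?, ?, ?)",
    [
        ("P1", "Part one", "HW", 3),
        ("P2", "Part two", "HW", 3),
        ("P3", "Part three", "HW", 3),
        ("P4", "Part four", "SG", 3),  # only SG part in warehouse 3, so it has no pair
    ],
)

def pairs(op):
    # Self-join PART to itself; op is a trusted constant ("<>" or "<"), never user input.
    sql = f"""
        SELECT t1.PARTNUMB, t2.PARTNUMB
        FROM PART t1
        INNER JOIN PART t2
            ON t1.WRHSNUMB = t2.WRHSNUMB
           AND t1.ITEMCLSS = t2.ITEMCLSS
           AND t1.PARTNUMB {op} t2.PARTNUMB
        ORDER BY 1, 2
    """
    return conn.execute(sql).fetchall()

with_ne = pairs("<>")  # every pair twice, e.g. (P1, P2) and (P2, P1)
with_lt = pairs("<")   # every pair once, smaller part number on the left

print(with_ne)
print(with_lt)
```

With three HW parts in the same warehouse, the <> version returns six rows and the < version returns the three distinct pairs.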

Related

Natural Joins Creating More Records Than Desired

I have the following tables populated with these records:
I have created a view that looks like this and have selected all records from it:
However, the results are not as expected. Each store location is matched with each craft item, even items that are not supplied to that store.
Even regions that don't have recorded stores display records:
I imagine this has something to do with the natural join being mixed with the left outer join, but I don't understand why.
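Based on the comments others have provided, and since you may not yet be comfortable with the explicit join syntax: think of the LEFT side as the first table in the join and the RIGHT side as the second. A left join returns every row from the left-side table regardless of whether it has a match in the other table; where a match exists, the right-side columns are filled in according to the join condition, and otherwise they are NULL. A NATURAL JOIN, by contrast, matches on every column the two tables share by name, and if they share none it behaves like a cross join, pairing every row with every row, which would explain each store being matched with each craft item. For what you have, you are probably looking for something like this, with every join condition spelled out explicitly: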
create or replace view detailedCraftRegaion as
select
    cr.CraftRegionDescription,
    cs.StoreAddress,
    cs.StoreCity,
    cs.StoreState,
    cs.StoreZipCode,
    csi.CraftItemName
from
    CraftStore cs
    JOIN CraftRegion cr
        on cs.CraftRegionID = cr.CraftRegionID
    JOIN CraftShipItems csi
        on cs.CraftStoreID = csi.CraftStoreID;
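A small sketch of the view above against hypothetical rows, run with Python's sqlite3 module (SQLite has no CREATE OR REPLACE VIEW, so it uses plain CREATE VIEW). It also illustrates the two behaviors discussed: a NATURAL JOIN between tables with no shared column names returns every combination of rows, and a LEFT JOIN keeps a region that has no stores, with NULL in the store columns. All data values are assumptions for illustration.

```python
import sqlite3

# Hypothetical rows; the table and column names follow the view in the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CraftRegion (CraftRegionID INTEGER, CraftRegionDescription TEXT);
CREATE TABLE CraftStore (CraftStoreID INTEGER, CraftRegionID INTEGER, StoreAddress TEXT,
                         StoreCity TEXT, StoreState TEXT, StoreZipCode TEXT);
CREATE TABLE CraftShipItems (CraftStoreID INTEGER, CraftItemName TEXT);
INSERT INTO CraftRegion VALUES (1, 'North'), (2, 'South');  -- South has no stores
INSERT INTO CraftStore VALUES (10, 1, '1 Main St', 'Town', 'ST', '00001'),
                              (11, 1, '2 Oak Ave', 'Town', 'ST', '00002');
INSERT INTO CraftShipItems VALUES (10, 'Yarn'), (10, 'Glue'), (11, 'Paint');
""")

# The answer's view, with explicit join conditions.
conn.execute("""
CREATE VIEW detailedCraftRegaion AS
SELECT cr.CraftRegionDescription, cs.StoreAddress, cs.StoreCity,
       cs.StoreState, cs.StoreZipCode, csi.CraftItemName
FROM CraftStore cs
JOIN CraftRegion cr ON cs.CraftRegionID = cr.CraftRegionID
JOIN CraftShipItems csi ON cs.CraftStoreID = csi.CraftStoreID
""")
view_rows = conn.execute(
    "SELECT StoreAddress, CraftItemName FROM detailedCraftRegaion ORDER BY 1, 2"
).fetchall()

# No shared column names, so NATURAL JOIN has nothing to match on: every region x every item.
natural_count = conn.execute(
    "SELECT COUNT(*) FROM CraftRegion NATURAL JOIN CraftShipItems"
).fetchone()[0]

# LEFT JOIN keeps every region; South appears once with NULL for the store.
left_rows = conn.execute("""
SELECT cr.CraftRegionDescription, cs.CraftStoreID
FROM CraftRegion cr
LEFT JOIN CraftStore cs ON cs.CraftRegionID = cr.CraftRegionID
ORDER BY 1, 2
""").fetchall()

print(view_rows)      # only the items actually shipped to each store
print(natural_count)  # 2 regions x 3 items
print(left_rows)
```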

SQL newbie: I ran an inner join query, but the result shows only the columns

I ran this query against AdventureWorks. It runs successfully, but I only get the column headers and no data rows. Why?
select
    a.BusinessEntityID, b.bonus, b.SalesLastYear
from
    [Sales].[SalesPersonQuotaHistory] a
    inner join [Sales].[SalesPerson] b
        on a.SalesQuota = b.SalesQuota
My best guess is that instead of joining the tables on SalesQuota, you should be joining them on something else: typically an ID field.
I don't have AdventureWorks here, but judging from the names of the tables and columns you've provided, I would assume there is an ID field of some sort that actually connects a salesperson's quota history to the salesperson. (In AdventureWorks releases that use BusinessEntityID, the column your query already selects, that is the key on both Sales.SalesPersonQuotaHistory and Sales.SalesPerson; older releases called it SalesPersonID.)
I would expect that you're looking for something closer to this:
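SELECT
    a.BusinessEntityID
    ,b.bonus
    ,b.SalesLastYear
FROM [Sales].[SalesPersonQuotaHistory] a
INNER JOIN [Sales].[SalesPerson] b
    -- AdventureWorks 2008 and later key both tables on BusinessEntityID (older releases: SalesPersonID)
    ON a.BusinessEntityID = b.BusinessEntityID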
General Knowledge:
INNER JOIN means "Show me only entries (rows) that have a matching value on both sides of the condition" (i.e., the value in Table A matches the value in Table B).
So ON a.SalesQuota = b.SalesQuota means "Only where the value of SalesQuota in Table A matches the value of SalesQuota in Table B."
I'm not sure what the purpose of this query could be. Two salespeople may well have the same SalesQuota value, in which case a quota-history row is paired with each of them and you get duplicate (and mismatched) rows; or the values may not match at all, in which case you get no rows. I suspect the latter is what's happening to you.
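Both failure modes can be reproduced with Python's sqlite3 module. QuotaHistory and SalesPerson below are hypothetical, simplified stand-ins for Sales.SalesPersonQuotaHistory and Sales.SalesPerson, and every value is made up for illustration.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the two AdventureWorks tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE QuotaHistory (BusinessEntityID INTEGER, SalesQuota REAL);
CREATE TABLE SalesPerson (BusinessEntityID INTEGER, SalesQuota REAL, Bonus REAL);
INSERT INTO QuotaHistory VALUES (1, 100.0), (1, 120.0), (2, 200.0);
INSERT INTO SalesPerson VALUES (1, 300.0, 10.0), (2, 300.0, 20.0);
""")

ON_QUOTA = """
SELECT a.BusinessEntityID, b.BusinessEntityID
FROM QuotaHistory a
INNER JOIN SalesPerson b ON a.SalesQuota = b.SalesQuota
ORDER BY 1, 2
"""

# No historical quota equals a current quota, so the join on amounts returns nothing.
no_match = conn.execute(ON_QUOTA).fetchall()

# Joining on the shared ID pairs each history row with its own salesperson.
by_id = conn.execute("""
SELECT a.BusinessEntityID, a.SalesQuota, b.Bonus
FROM QuotaHistory a
INNER JOIN SalesPerson b ON a.BusinessEntityID = b.BusinessEntityID
ORDER BY 1, 2
""").fetchall()

# A history row whose amount equals a quota shared by two people matches both of them.
conn.execute("INSERT INTO QuotaHistory VALUES (2, 300.0)")
collisions = conn.execute(ON_QUOTA).fetchall()

print(no_match)
print(by_id)
print(collisions)
```

In the last result, salesperson 2's history row is also paired with salesperson 1, which is exactly the kind of spurious row a join on amounts produces.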
Consider the conditions of what you're trying to join. Are you really trying to join quota amounts, or are you trying to retrieve quota information for specific salespeople? The answer should help guide your JOIN conditions.

Why does this SQL query need DISTINCT?

I've written a query to filter a table based on criteria found in a master table, and then remove rows that match a third table. I'm executing the query in Access, so I can't use MINUS. It works, but I found that it returns duplicate rows for some, but not all, of the selected records. I fixed it with DISTINCT, but I don't know why it would return duplicates in the first place. It's a pretty simple query:
select distinct sq.*
from
    (select List_to_Check.*, Master_List.SELECTION_VAR
     from List_to_Check
     left join Master_List
         on List_to_Check.SUB_ID = Master_List.SUB_ID
     where Master_List.SELECTION_VAR = 'criteria'
    ) as sq
    left join List_to_Exclude
        on sq.SUB_ID = List_to_Exclude.SUB_ID
where List_to_Exclude.SUB_ID is null
;
Edit: The relationships between all three tables are 1-to-1 on the SUB_ID var. Combined with using a LEFT JOIN, I would expect one line per ID.
I recommend breaking your query apart and checking for duplicates. My guess is that the cause is your data: SUB_ID probably isn't as unique as you think.
Start with your subquery, since you're returning all of its columns. If you get duplicates there, your query is going to return duplicates regardless of what is in your exclusion table.
Once you have those duplicates cleared up, check the exclusion table for duplicate SUB_ID values.
To save troubleshooting time, if you already know which records are duplicated, limit the query to those values so you can focus on what is peculiar about that data.
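A sketch of that troubleshooting step, run in Python's sqlite3 rather than Access, so treat it as an approximation. The rows and the NAME column are hypothetical: SUB_ID 2 appears twice in Master_List despite the assumed 1-to-1 relationship, which is enough to make the original query (without DISTINCT) return a duplicate.

```python
import sqlite3

# Hypothetical rows: SUB_ID 2 is duplicated in Master_List.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE List_to_Check (SUB_ID INTEGER, NAME TEXT);
CREATE TABLE Master_List (SUB_ID INTEGER, SELECTION_VAR TEXT);
CREATE TABLE List_to_Exclude (SUB_ID INTEGER);
INSERT INTO List_to_Check VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO Master_List VALUES (1, 'criteria'), (2, 'criteria'), (2, 'criteria'), (3, 'criteria');
INSERT INTO List_to_Exclude VALUES (3);
""")

# Step 1: find SUB_IDs that occur more than once in a table assumed to be 1-to-1.
dupes = conn.execute("""
SELECT SUB_ID, COUNT(*) AS n
FROM Master_List
GROUP BY SUB_ID
HAVING COUNT(*) > 1
""").fetchall()

# Step 2: the original query without DISTINCT; the duplicated SUB_ID shows up twice.
rows = conn.execute("""
SELECT sq.*
FROM (SELECT List_to_Check.*, Master_List.SELECTION_VAR
      FROM List_to_Check
      LEFT JOIN Master_List ON List_to_Check.SUB_ID = Master_List.SUB_ID
      WHERE Master_List.SELECTION_VAR = 'criteria') AS sq
LEFT JOIN List_to_Exclude ON sq.SUB_ID = List_to_Exclude.SUB_ID
WHERE List_to_Exclude.SUB_ID IS NULL
ORDER BY sq.SUB_ID
""").fetchall()

print(dupes)
print(rows)
```

Running the same GROUP BY ... HAVING COUNT(*) > 1 check against List_to_Check is the natural next step if Master_List turns out to be clean.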
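I'm not sure this is a problem, but look into the logic on
on List_to_Check.SUB_ID = Master_List.SUB_ID
where Master_List.SELECTION_VAR = 'criteria'
A WHERE condition on a column from the right-hand side of a LEFT OUTER JOIN may not return the data you expect: rows with no match have NULL in that column, so they fail the condition and are dropped, and the left join behaves like an inner join. Try moving the condition into the ON clause and see what happens: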
on List_to_Check.SUB_ID = Master_List.SUB_ID
and Master_List.SELECTION_VAR = 'criteria'
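The effect of the two placements can be compared on a few hypothetical rows with Python's sqlite3: SUB_ID 2 matches with a different SELECTION_VAR, and SUB_ID 3 has no match at all.

```python
import sqlite3

# Hypothetical rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE List_to_Check (SUB_ID INTEGER);
CREATE TABLE Master_List (SUB_ID INTEGER, SELECTION_VAR TEXT);
INSERT INTO List_to_Check VALUES (1), (2), (3);
INSERT INTO Master_List VALUES (1, 'criteria'), (2, 'other');
""")

# Condition in WHERE: rows without a qualifying match are filtered out after the join.
in_where = conn.execute("""
SELECT List_to_Check.SUB_ID, Master_List.SELECTION_VAR
FROM List_to_Check
LEFT JOIN Master_List ON List_to_Check.SUB_ID = Master_List.SUB_ID
WHERE Master_List.SELECTION_VAR = 'criteria'
ORDER BY 1
""").fetchall()

# Condition in ON: every List_to_Check row is kept, with NULL where the condition fails.
in_on = conn.execute("""
SELECT List_to_Check.SUB_ID, Master_List.SELECTION_VAR
FROM List_to_Check
LEFT JOIN Master_List ON List_to_Check.SUB_ID = Master_List.SUB_ID
                     AND Master_List.SELECTION_VAR = 'criteria'
ORDER BY 1
""").fetchall()

print(in_where)
print(in_on)
```

The two forms are not interchangeable: if the goal is to keep only rows that meet the criteria, the WHERE placement (or an INNER JOIN) is the one that filters, while the ON placement keeps every List_to_Check row.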
The inner query joins List_to_Check and Master_List, but the outer query joins List_to_Exclude with Subscriber (substitute your own names for the three tables I'm referring to).
To avoid duplicates, you need to use one of the tables in both the inner and the outer query.

SQL: separate one field into columns depending on the value of the initial field

I am trying to split one field in one table into two columns in a report, populating each column depending on the value of that field. Here is how the tables are structured: (Image: tables view). I am quite new to SQL and am learning slowly; I have tried CASE expressions and subqueries, but no luck... I do hope some kind soul will be able to help me. :P
My database is structured like this:
This is the query I am using:
SELECT
    Insurance_Folder.code, Rating_Section.rating_section_type_id, Rating_Section.sum_insured, Rating_Section_Type.description
FROM
    dbo.Rating_Section
    Left Outer Join dbo.Rating_Section_Type ON
        Rating_Section.rating_section_type_id = Rating_Section_Type.rating_section_type_id
    Left Outer Join dbo.Insurance_Folder
    Left Outer Join dbo.Insurance_File ON
        Insurance_Folder.insurance_folder_cnt = Insurance_File.insurance_folder_cnt
    Left Outer Join dbo.insurance_file_risk_link ON
        Insurance_File.insurance_file_cnt = insurance_file_risk_link.insurance_file_cnt and insurance_file_risk_link.risk_cnt = Rating_Section.risk_cnt
WHERE
    Rating_Section.rating_section_type_id = 219 or Rating_Section.rating_section_type_id = 228
This gives me the following result (I cannot post another image):
- All the codes from Insurance_Folder, with each rating section type for each code on its own row. I want to take the row with the second value and put it on the same row as the code, in a separate column, depending on the value of rating_section_type_id.
And these are the results I am looking for:
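One common technique for this kind of report is conditional aggregation: group by the code and put a CASE expression inside an aggregate for each type id. The sketch below runs it in Python's sqlite3 against a hypothetical, flattened table named report_rows that stands in for the output of the query above; in SQL Server the same SELECT could be applied to that query wrapped as a derived table. Which value belongs in each column (sum_insured here) is an assumption.

```python
import sqlite3

# Hypothetical stand-in for the query output: one row per (code, rating section type).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE report_rows (code TEXT, rating_section_type_id INTEGER, sum_insured REAL);
INSERT INTO report_rows VALUES
    ('F001', 219, 1000.0), ('F001', 228, 250.0),
    ('F002', 219, 5000.0);
""")

# One output row per code; each CASE picks the value for one type id (NULL if absent).
rows = conn.execute("""
SELECT code,
       MAX(CASE WHEN rating_section_type_id = 219 THEN sum_insured END) AS sum_insured_219,
       MAX(CASE WHEN rating_section_type_id = 228 THEN sum_insured END) AS sum_insured_228
FROM report_rows
GROUP BY code
ORDER BY code
""").fetchall()

print(rows)
```

Codes that lack one of the two types, like F002 here, get NULL in the corresponding column rather than disappearing from the report.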

How do I invert my join criteria in TSQL?

I think this is a relatively basic operation in SQL, but I'm having a hard time figuring it out.
I have two tables: a source table I'm trying to SELECT my data from, and a reference table containing serial numbers and transaction numbers (the source table also has these columns, and more). I want to run two different SELECTs: one where the serial/trans pair exists in the reference table and one where it does not. (The combination of serial and trans number is the primary key of both tables.)
Initially I'm doing this with a join like this:
SELECT * FROM source s
INNER JOIN reference r ON s.serial = r.serial AND s.trans = r.trans
I would think this should give me everything from the source table that has a serial/trans pair matching with one in the reference table. I'm not positive this is correct, but it is returning a reasonable number of results and I think it looks good.
When I go to do the opposite, get everything from source where the serial/trans pair do not match up with one in reference, I encounter a problem. I tried the following query:
SELECT * FROM source s
INNER JOIN reference r ON s.serial <> r.serial AND s.trans <> r.trans
When I run this, the query goes on forever and starts returning far more results than it should, more than there are in the entire source table. It eventually ends with an out-of-memory exception after running for 20+ minutes. For some perspective, the source table has about 13 million records and the reference table about 105,000.
So how do I get the results I'm looking for? If it is not already clear, the number of results from the first query + results from second query should equal the total number of records in my source table.
Thanks for any help!
I think you need something like NOT EXISTS:
SELECT *
FROM source s
WHERE NOT EXISTS (SELECT 1
FROM reference r
WHERE s.serial = r.serial AND s.trans = r.trans)
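The above query returns everything from source where the serial/trans pair does not match one in reference. Because serial/trans is the primary key of reference, your first query returns each source row at most once, so the two queries together account for every row in source. The INNER JOIN with <> instead pairs each source row with every reference row whose serial and trans both differ, which is nearly all of them; that is why it returned more rows than the whole source table.
As @Dan points out in a comment below, NOT EXISTS generally has a performance advantage over LEFT JOIN in a situation like this (see this for an example).
A LEFT JOIN matches the records that satisfy the join condition and returns NULL in the right-hand columns for those that don't.
So non-matching records have NULL in every column coming from the reference table. You can then add a WHERE clause that returns only the records with NULL in a non-nullable column of the reference table (such as its primary key).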
SELECT * FROM source s
LEFT JOIN reference r ON s.serial = r.serial AND s.trans = r.trans
WHERE r.serial IS NULL
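A small check of both answers, using Python's sqlite3 with hypothetical rows: the matching query plus either anti-join (NOT EXISTS or LEFT JOIN ... IS NULL) covers every source row exactly once, while the <> join from the question repeats a source row once for each reference row it differs from.

```python
import sqlite3

# Hypothetical rows; serial/trans is the primary key of both tables, as in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source (serial INTEGER, trans INTEGER, payload TEXT, PRIMARY KEY (serial, trans));
CREATE TABLE reference (serial INTEGER, trans INTEGER, PRIMARY KEY (serial, trans));
INSERT INTO source VALUES (1, 1, 'a'), (1, 2, 'b'), (2, 1, 'c'), (3, 5, 'd');
INSERT INTO reference VALUES (1, 1), (2, 1);
""")

def keys(sql):
    return conn.execute(sql).fetchall()

# Source rows whose pair exists in reference.
matched = keys("""
SELECT s.serial, s.trans FROM source s
INNER JOIN reference r ON s.serial = r.serial AND s.trans = r.trans
ORDER BY 1, 2""")

# Anti-join written with NOT EXISTS.
not_exists = keys("""
SELECT s.serial, s.trans FROM source s
WHERE NOT EXISTS (SELECT 1 FROM reference r
                  WHERE s.serial = r.serial AND s.trans = r.trans)
ORDER BY 1, 2""")

# The same anti-join written as LEFT JOIN ... IS NULL.
left_null = keys("""
SELECT s.serial, s.trans FROM source s
LEFT JOIN reference r ON s.serial = r.serial AND s.trans = r.trans
WHERE r.serial IS NULL
ORDER BY 1, 2""")

# The <> join from the question: one row per (source, reference) combination where both differ.
not_equal = keys("""
SELECT s.serial, s.trans FROM source s
INNER JOIN reference r ON s.serial <> r.serial AND s.trans <> r.trans
ORDER BY 1, 2""")

total = conn.execute("SELECT COUNT(*) FROM source").fetchone()[0]
print(matched, not_exists, left_null, not_equal, total)
```

Here (3, 5) appears twice in the <> result because it differs from both reference rows; with about 105,000 reference rows, that repetition is what makes the result grow past the size of the source table.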