getting duplicates when joining tables - sql

I have two tables that I want to join. Table1 has sales orders, but it doesn’t have the name of the salesperson, only the employee ID. Table2 has the names of employees, and employeeID is common to both tables. Normally I would use an inner join to get the name of the salesperson from table2. The problem is that table2 has multiple entries for each employee: if they changed manager, changed roles within the company, or went on FMLA, a new row is created. Therefore, when I join the tables, I get duplicates because of the multiple entries in table2. A sale shows up 3 or 4 times in my results.
Select
a.state_name
,order_number
,a.employeeID
,b.Sales_Rep_Name
,a.order_date
from
table1 as A
Inner join table2 as B
On a.employeeid = b.employeeID
where
b.monthperiod = 'November' -- If I remove this one it adds duplicates
Is there a way to avoid these duplicates? I tried DISTINCT but it didn’t work, probably because the rows differ in at least one column. I was able to eliminate the duplicates by adding a WHERE clause that restricts table2 to last month, but I need all months, not just one. As it stands, I have to change the month manually and rerun the query to get the full year.
Any help would be appreciated. Thanks

Use a subquery to get a list of distinct employee records (one row per employeeID), and then join the sales table to that subquery instead of to table2 directly.
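A minimal sketch of that idea, run through Python's built-in sqlite3 module so it can be tried as-is. The sample rows and names are made up, and the column list is trimmed. It assumes the name is the same on every history row for an employee; GROUP BY with MAX() is used here so the subquery is guaranteed to return one row per employeeID even if that assumption fails (SELECT DISTINCT employeeID, Sales_Rep_Name would be equivalent when the name never varies).

```python
import sqlite3

# Hypothetical miniature versions of table1 (orders) and table2 (employee history).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (order_number INTEGER, employeeID INTEGER, order_date TEXT);
CREATE TABLE table2 (employeeID INTEGER, Sales_Rep_Name TEXT, monthperiod TEXT);
INSERT INTO table1 VALUES (1001, 7, '2023-03-14'), (1002, 8, '2023-11-02');
-- Employee 7 has three history rows, employee 8 has one.
INSERT INTO table2 VALUES
  (7, 'Ann Lee', 'March'), (7, 'Ann Lee', 'July'), (7, 'Ann Lee', 'November'),
  (8, 'Bob Ray', 'November');
""")

# Join the orders to a derived table that has exactly one row per employeeID.
rows = conn.execute("""
SELECT a.order_number, a.employeeID, b.Sales_Rep_Name, a.order_date
FROM table1 AS a
INNER JOIN (
    SELECT employeeID, MAX(Sales_Rep_Name) AS Sales_Rep_Name
    FROM table2
    GROUP BY employeeID
) AS b ON a.employeeID = b.employeeID
ORDER BY a.order_number
""").fetchall()

print(rows)  # one row per order, regardless of how many history rows exist
```

Because the derived table no longer depends on monthperiod, the month filter is not needed and every order appears once for the whole year.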

Related

How to return no duplicates for records that have many-to-many values

I am trying to find the correct join construction to join together the relevant customer info from the Rentals table with the Accidents table. I often run into this issue where my joining fields aren't unique but not sure what else to join on. The accidents table only has about 1500 records but when I join it to pull in more customer data, I get like 35k records. I can do some joins but frequently have joins like this at work and I feel like a dummy because I am not sure how to troubleshoot...
SELECT a.*,
r.market,
r.date_of_birth,
r.is_blocked,
r.rate_type
FROM `accidents` a
LEFT JOIN `rentals` r -- I also tried an INNER JOIN
USING (customer_id) -- Other fields I tried to match on: Full Name
ORDER BY accident_dt DESC

Why do repeated values appear in SQL results

I have a question about joins. For example, using the example database dvdrental, this query:
SELECT customer.customer_id
, first_name
, last_name
FROM customer
INNER JOIN payment ON Customer.customer_id = Payment.customer_id
returns some records repeatedly. For example, "342 Harold Martino" appears 3 times:
342 Harold Martino
342 Harold Martino
342 Harold Martino
Do you know why the same record appears 3 times in this example? Does this repetition mean that there are 3 records in the payment table where customer_id = 342? But the query "select * from payment where customer_id = 342" returns 32 records, so I don't properly understand how the join works.
There are many resources on this, so to be brief, your expression says this in plain English:
Get all the records from the customer table
Then for each of those records, get every payment record that has the same value in the customer_Id field.
Return a single row for each payment record, repeating all the fields from the matching customer record on every one of those rows.
Finally, only return 3 columns:
the customer_id column from the customer table
the first_name column from whichever of the customer or payment tables contains it
the last_name column from whichever of the customer or payment tables contains it
Note that we didn't bring back any columns from the payment table, which is why the repeated rows look identical. (I assume first_name and last_name are fields in the customer table.)
Keep in mind, a CROSS JOIN is a join that pairs every row on the left side with every row on the right side, with no matching condition. So the number of rows returned by a CROSS JOIN is the number of rows in the current table multiplied by the number of rows in the joined table.
In your query, an INNER JOIN produces a recordset that includes all the fields from the current table structure and the fields from the joined table as well.
An INNER JOIN returns only the row pairs that satisfy the ON condition, so in this case only Payment records that match a customer_id in the (currently unfiltered) customer table are returned, and customers with no payments are dropped.
The number of resulting rows is the number of matching pairs. Because customer_id is unique in the customer table, that is the number of rows in the payment table that have a match in customer.
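To make the row counts concrete, here is a small sketch using Python's built-in sqlite3 module. The two tables are cut-down stand-ins for the dvdrental ones, and the rows (including the second customer's name and the payment amounts) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE payment (payment_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
-- Two customers; only customer 342 has payments (three of them).
INSERT INTO customer VALUES (341, 'Test', 'Customer'), (342, 'Harold', 'Martino');
INSERT INTO payment (customer_id, amount) VALUES (342, 2.99), (342, 4.99), (342, 0.99);
""")

def count(sql):
    """Run a query that returns a single number and return that number."""
    return conn.execute(sql).fetchone()[0]

# INNER JOIN: one row per matching (customer, payment) pair.
inner_rows = count("""
SELECT COUNT(*) FROM customer
INNER JOIN payment ON customer.customer_id = payment.customer_id
""")

# CROSS JOIN: every customer paired with every payment.
cross_rows = count("SELECT COUNT(*) FROM customer CROSS JOIN payment")

# DISTINCT collapses the repeated customer rows back to one per customer.
distinct_customers = count("""
SELECT COUNT(*) FROM (
    SELECT DISTINCT customer.customer_id
    FROM customer
    INNER JOIN payment ON customer.customer_id = payment.customer_id
)
""")

print(inner_rows, cross_rows, distinct_customers)
```

With this data the inner join returns one row per payment, the cross join returns customers times payments, and the DISTINCT version returns only the customer who has payments.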
If instead you want to query:
Select all the customers that have payments
then you can use a join, but you should also use a DISTINCT clause, to only return the unique records:
SELECT DISTINCT customer.customer_id
, first_name
, last_name
FROM customer
INNER JOIN payment ON Customer.customer_id = Payment.customer_id
An alternative way to do this is to use a sub-query instead of a join:
SELECT customer_id
, first_name
, last_name
FROM customer
WHERE EXISTS (SELECT customer_id FROM payment WHERE payment.customer_id = customer.customer_id)
The rules on when to use one style of query over the other are pretty convoluted and very dependent on the number of rows in each table, the types of indexes that are available and even the type or version of the RDBMS you are operating within.
To optimise, run both queries, compare the results and timing and use the one that fits your database better. If later performance becomes an issue, try the other one :)
Select the Customer_id field

Select distinct record with join count records

I have two tables: Company and Contact, with a relationship of one-to-many.
I have another table Track which identifies some of the companies as parent companies to other companies.
I want to write a SQL query that selects the parent companies from Track and the number of contacts that each parent has.
SELECT Track.ParentId, Count(Contact.companyId)
FROM Track
INNER JOIN Contact
ON Track.ParentId = Contact.companyId
GROUP BY Track.ParentId
However, the result holds fewer records than when I run the following query:
SELECT DISTINCT Track.ParentId
FROM Track
I tried the first query with an added DISTINCT and it returned the same results (fewer than expected).
You're performing an INNER JOIN with the Contact table, which means that any rows from the first table (Track in this case) with no matches to the JOINed table will not show up in your results. Try using a LEFT OUTER JOIN instead.
The COUNT with Contact.companyId will only count rows where there is a match (Contact.companyId is not NULL). Since you're counting contacts that's fine, as parents with no contacts will count as 0. If you were trying to count some other set of data and tried to do a COUNT on a specific column (rather than COUNT(*)) then any NULL values in that column would not count towards your total, which might or might not be what you want.
I used an INNER JOIN, which returns only records that have a match in both tables.
To return all records from Track table, and records that match in the Contact table, I need to use LEFT JOIN.
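A small sketch of the difference, using Python's built-in sqlite3 module. The table contents are invented, and the schema is reduced to the columns the query touches. It also shows COUNT(*) next to COUNT(Contact.companyId) for a parent with no contacts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Track (ParentId INTEGER);
CREATE TABLE Contact (companyId INTEGER, name TEXT);
INSERT INTO Track VALUES (1), (2), (3);
-- Parent 3 has no contacts at all.
INSERT INTO Contact VALUES (1, 'c1'), (1, 'c2'), (2, 'c3');
""")

# INNER JOIN: parent 3 disappears because it has no matching Contact row.
inner = conn.execute("""
SELECT Track.ParentId, COUNT(Contact.companyId)
FROM Track
INNER JOIN Contact ON Track.ParentId = Contact.companyId
GROUP BY Track.ParentId
ORDER BY Track.ParentId
""").fetchall()

# LEFT OUTER JOIN: every parent is kept. COUNT(column) skips the NULL that the
# unmatched row carries, while COUNT(*) counts the row itself.
left = conn.execute("""
SELECT Track.ParentId, COUNT(Contact.companyId), COUNT(*)
FROM Track
LEFT OUTER JOIN Contact ON Track.ParentId = Contact.companyId
GROUP BY Track.ParentId
ORDER BY Track.ParentId
""").fetchall()

print(inner)
print(left)
```

The last row of the LEFT JOIN result is the reason to count Contact.companyId rather than use COUNT(*): the former reports 0 contacts for parent 3, the latter would report 1.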

Inner join sql statement

I have two tables, Invoices and members, connected by PK/FK relationship through the field InvoiceNum. I have created the following sql and it works fine, and pulls 44 records as expected.
SELECT
INVOICES.InvoiceNum,
INVOICES.GroupNum,
INVOICES.DivisionNum,
INVOICES.DateBillFrom,
INVOICES.DateBillTo
FROM INVOICES
INNER JOIN MEMBERS ON INVOICES.InvoiceNum = MEMBERS.InvoiceNum
WHERE MEMBERS.MemberNum = '20032526000'
Now, I want to replace INVOICES.GroupNum and INVOICES.DivisionNum in the above query with GroupName and DivisionName. These values are present in the Groups and Divisions tables which also have the corresponding Group_num and Division_num fields. I have created the following sql. The problem is that it now pulls 528 records instead of 44!
SELECT
INVOICES.InvoiceNum,
INVOICES.DateBillFrom,
INVOICES.DateBillTo,
DIVISIONS.DIVISION_NAME,
GROUPS.GROUP_NAME
FROM INVOICES
INNER JOIN MEMBERS ON INVOICES.InvoiceNum = MEMBERS.InvoiceNum
INNER JOIN GROUPS ON INVOICES.GroupNum = GROUPS.Group_Num
INNER JOIN DIVISIONS ON INVOICES.DivisionNum = DIVISIONS.Division_Num
WHERE MEMBERS.MemberNum = '20032526000'
Any help is greatly appreciated.
You have at least one relation between your tables which is missing in your query. It gives you extra records. Find all common fields. Say, are divisions related to groups?
The statement is fine, as far as the SQL syntax goes.
But the question you have to ask yourself (and answer it):
How many rows in Groups do you get for that given GroupNum?
Ditto for Divisions - how many rows exist for that DivisionNum?
It would appear that those numbers aren't unique: multiple rows exist for each number, and therefore you get multiple rows returned.
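One way to answer those questions is to group each lookup table by its join key and list the keys that occur more than once. The sketch below uses Python's built-in sqlite3 module with invented rows. GROUPS is written in double quotes because it is a keyword in some databases. The same check would apply to DIVISIONS and Division_Num.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "GROUPS" (Group_Num INTEGER, GROUP_NAME TEXT);
-- Group 10 appears twice, so any invoice in group 10 would be doubled by the join.
INSERT INTO "GROUPS" VALUES (10, 'Alpha'), (10, 'Alpha (old)'), (20, 'Beta');
""")

# List every join key that is not unique, with the number of rows it has.
duplicate_keys = conn.execute("""
SELECT Group_Num, COUNT(*) AS row_count
FROM "GROUPS"
GROUP BY Group_Num
HAVING COUNT(*) > 1
ORDER BY Group_Num
""").fetchall()

print(duplicate_keys)
```

An empty result means the key is unique and that join cannot multiply rows; any key listed here multiplies the matching invoices by its row_count.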

SQL - Multiple criteria with a LEFT OUTER JOIN

I am trying to do an OUTER JOIN, with multiple join conditions. Here is my query (I will explain issue below):
SELECT ad.*, cp.P_A, cp.P_B, cp.P_C
INTO #AggData3
FROM #AggData2 ad
LEFT OUTER JOIN #CompPriceTemp cp
ON ad.PART=cp.Part_No
and ad.[Month]=cp.[Month]
and ad.[Year]=cp.[Year]
GO
For each record in #AggData2, which is average price and volume by month for each part, I want to join the prices of the three competitors (A, B & C). Thus, I want to join based on Part, Month, and Year. Because some competitors don't offer all parts, I am using a LEFT OUTER JOIN. So, the resulting table (#AggData3), should have the exact same number of rows as the initial table (#AggData2), just with the three additional columns with competitor prices.
However, the new table (#AggData3), has ~35,000 more rows than #AggData2.
Any ideas why that is happening, and how to fix my query?
Because there are multiple rows in Table #CompPriceTemp that match to one row in #AggData2.
Is there one for each of the three competitors, perhaps? If so, then you need three joins, each to the same table, one for each of the 3 competitors.
But if there is supposed to be one row in #CompPriceTemp for each Month, Year, and product, with three separate columns one column for each competitor, then you have some bad data in there.
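If the extra rows do come from one row per competitor, a variation on the three-joins idea is to collapse #CompPriceTemp to one row per Part, Month and Year before joining. The sketch below is only an illustration under assumptions that are not confirmed by the question: that each competitor's price sits on its own row with the other two price columns NULL, and that an AvgPrice column exists in #AggData2. It uses Python's built-in sqlite3 module, so the temp tables are written without the # prefix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AggData2 (PART TEXT, [Month] INTEGER, [Year] INTEGER, AvgPrice REAL);
CREATE TABLE CompPriceTemp (Part_No TEXT, [Month] INTEGER, [Year] INTEGER,
                            P_A REAL, P_B REAL, P_C REAL);
INSERT INTO AggData2 VALUES ('P1', 1, 2020, 9.5), ('P2', 1, 2020, 4.0);
-- Assumed layout: one row per competitor for the same part and month.
INSERT INTO CompPriceTemp VALUES
  ('P1', 1, 2020, 9.0, NULL, NULL),
  ('P1', 1, 2020, NULL, 9.2, NULL),
  ('P1', 1, 2020, NULL, NULL, 9.9);
""")

# The original join: P1 matches three rows, so the result grows from 2 rows to 4.
naive_count = conn.execute("""
SELECT COUNT(*)
FROM AggData2 ad
LEFT OUTER JOIN CompPriceTemp cp
  ON ad.PART = cp.Part_No AND ad.[Month] = cp.[Month] AND ad.[Year] = cp.[Year]
""").fetchone()[0]

# Collapse the competitor rows first; MAX() ignores NULLs, so each price survives.
collapsed = conn.execute("""
SELECT ad.PART, cp.P_A, cp.P_B, cp.P_C
FROM AggData2 ad
LEFT OUTER JOIN (
    SELECT Part_No, [Month], [Year],
           MAX(P_A) AS P_A, MAX(P_B) AS P_B, MAX(P_C) AS P_C
    FROM CompPriceTemp
    GROUP BY Part_No, [Month], [Year]
) cp
  ON ad.PART = cp.Part_No AND ad.[Month] = cp.[Month] AND ad.[Year] = cp.[Year]
ORDER BY ad.PART
""").fetchall()

print(naive_count)
print(collapsed)
```

Since the derived table has at most one row per join key, the result keeps exactly the row count of the left table. If the duplicates are genuinely bad data rather than one row per competitor, MAX() would silently pick one of the conflicting prices, so it is worth checking the data first.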
Wild guess:
ON ad.PART=cp.Part_No
and ad.[Month]=cp.[Month]
and ad.[Year]=cp.[Year]
This query does not uniquely identify rows in CP. Or CP has ~35000 duplicate rows.
Are you sure that you have only one matching row in CompPriceTemp for every single row in AggData2?