Why do repeated values appear in SQL results - sql

I'm with a doubt about joins. For example, using an example database dvdrental, this query:
SELECT customer.customer_id
, first_name
, last_name
FROM customer
INNER JOIN payment ON Customer.customer_id = Payment.customer_id
Some records appear repeated, for example, it appears 3 times "342 Harold Martino" like:
342 Harold Martino
342 Harold Martino
342 Harold Martino
Do you know why it appears repeated records like in this example that appears the same Record 3 times? This repetition means that there are 3 records in the payment table where customer_id = 342? But this query "select * from payment where customer_id = 342" returns 32 records. So I'm not understanding properly how the join works.

There are many resources around this, so to be short your expression says this in plain english:
Get all the records from the customer table
Then for each of those records, get every payment record that has the same value in the customer_Id field.
return a single row for each payment record that duplicates all the fields from the customer record for each row in the payment record.
Finally, only return 3 columns:
the customer_id column from the customer table
the first_name column that is in one of the customer or payment table
the last_name column that is in one of the customer or payment table
Note that we didn't bring back any columns from the payment table... (I assume first_name and last_name are fields in the customer table...)
Keep in mind, a CROSS JOIN (or a FULL OUTER JOIN) is a join that says take all fields from the left side and create a single row combination that is multiplied by the right side, so for every row on the left, return a combination of the left row with every row on the right. So the number of rows returned in a CROSS JOIN is the number of rows in the current table, multiplied by the number of rows in the joined table.
In your query, an INNER JOIN or LEFT INNER JOIN will produce a recordset that includes all the fields from the current table structure and will include fields from the joined table as well.
the implicit LEFT component specifies that only records that match the existing table structure should be returned, so in this case only Payment records that match a customer_id in the currently not filtered customer table will be returned.
The number of resulting rows is the number of rows in the joined table that have a match in the current table.
If instead you want to query:
Select all the customers that have payments
then you can use a join, but you should also use a DISTINCT clause, to only return the unique records:
SELECT DISTINCT customer.customer_id
, first_name
, last_name
FROM customer
INNER JOIN payment ON Customer.customer_id = Payment.customer_id
An alternative way to do this is to use a sub-query instead of a join:
SELECT customer_id
, first_name
, last_name
FROM customer
WHERE EXISTS (SELECT customer_id FROM payment WHERE payment.customer_id = customer.customer_id)
The rules on when to use one style of query over the other are pretty convoluted and very dependant on the number of rows in each table, the types of indexes that are available and even the type or version of the RDBMS you are operating within.
To optimise, run both queries, compare the results and timing and use the one that fits your database better. If later performance becomes an issue, try the other one :)
Select the Customer_id field

Related

Left Join misbehaving - VBA SQL

Table 1: Customer Info Data (has ~50K records)
Table 2: Sales by customer ID (has ~25K records)
When I am performing a left join of Sales data from Table 2 on Table 1 based on the Customer ID, the output has a handful of records (~500) with a unit increase in the sales quantity. For e.g. sales quantity for customer #1 is 200, the sale quantity that I am getting in my joined output is 201. Note again that this is only for handful of records, for the majority of the data, it is joined absolutely correctly.
The SQL query is a pretty standard one:
SELECT [Table1$].[ID], Name, Volume, Amount
FROM [Table1$] LEFT JOIN [Table2$] ON [Table1$].[ID] = [Table2$].[ID]
What is weird is that this error is only for a few records and it is changing only by a unit for all of them. What do you think could be the potential reason here? Note that I run this SQL query from VBA.
Edit:
Maybe if I add an image, it would help, I have masked my data:
Table1, blue colored column is the joined column, I have filtered for the ID where there is a problem:
Table2, source of the joined column, as you can see it is 20, but the value that has been joined with table 1 is 21. Interestingly, there is no ID with value 21 in table2
There is certainly a space or something that makes seemingly identical IDs different. To make sure do a select distinct id on the sales table.

getting duplicates when joining tables

I have two tables that I want to join. Table1 has sales order, but it doesn’t have the name of the sales person. It only has employee ID. I have table2, that has the names of employees, and employeeID is common between the two tables. Normally I would use an inner join to get the name of the sales person from table2. The problem is that on table2, there are multiple entries for each employee. If they changed manager, or changed roles within the company, or perhaps went on FMLA, it creates a new row. Therefore, when I join the tables, it creates duplicates because of the multiple entries in table2. A sale shows 3 or 4 times in my results.
Select
a.state_name
,order_number
,a.employeeID
,b.Sales_Rep_Name
,a.order_date
from
table1 as A
Inner join table2 as B
On a.employeeid = b.employeeID
where
b.monthperiod = 'November' <-- If I remove this one it adds duplicates
Is there a way to not get these duplicates? I tried distinct but didn’t work. Probably because the rows have at least one column different. I was able to eliminate the duplicates when I added a where clause asking for last month on table 2, but I am in a situation where I need all months, not just one. I have to manually change the month in order to get the full year.
Any help would be appreciated. Thanks
Use a subquery to get list of distinct employee records and then query the sales table

Select distinct record with join count records

I have two tables: Company and Contact, with a relationship of one-to-many.
I have another table Track which identifies some of the companies as parent companies to other companies.
I want to write a SQL query that selects the parent companies from Track and the amount of contacts that each parent has.
SELECT Track.ParentId, Count(Contact.companyId)
FROM Track
INNER JOIN Contact
ON Track.ParentId = Contact.companyId
GROUP BY Track.ParentId
however The result holds less records than when I run the following query:
SELECT DISTINCT Track.ParentId
FROM Track
I tried the first query with an added DISTINCT and it returned the same results (less then what it was meant to).
You're performing an INNER JOIN with the Contact table, which means that any rows from the first table (Track in this case) with no matches to the JOINed table will not show up in your results. Try using a LEFT OUTER JOIN instead.
The COUNT with Contact.companyId will only count rows where there is a match (Contact.companyId is not NULL). Since you're counting contacts that's fine as they will count as 0. If you were trying to count some other set of data and tried to do a COUNT on a specific column (rather than COUNT(*)) then any NULL values in that column would not count towards your total, which might or might not be what you want.
I used an INNER JOIN which returns only records that are identical in both tables.
To return all records from Track table, and records that match in the Contact table, I need to use LEFT JOIN.

SQL, Select from table A and use result ID to loop though table B?

I have a windows server running MS-SQL 2008.
I have a customer table that I need to select the id's of all customers that are Active, On Hold, or Suspended.
get all customers that are Y= current active customer H=on hold S=Suspended
select id from customer where active = 'Y';
the above statement worked fine for selecting the ID's of the affected customers.
I need to use these results to loop though the following command in order to find out what rates all the affected customers have.
get all rates for a customer
select rgid from custrate where custid = [loop though changing this id with results from first statement];
the id from the customer table coincides with the custid from the custrate table.
So in the end I need a list of all affected customer id's and what rgid's(rate group(s)) they have.
SQL isn't about loops in general and instead you should think in terms of joins.
select customer.id, custrate.rgid, customer.active
from customer
inner join custrate
on customer.id = custrate.custid
where active in ('Y', 'H', 'S")
order by customer.active, customer.id
would be a starting point to think about. However, that is just a wild guess as the schema was not specified nor the relations between the tables.

Joining table issue with SQL Server 2008

I am using the following query to obtain some sales figures. The problem is that it is returning the wrong data.
I am joining together three tables tbl_orders tbl_orderitems tbl_payment. The tbl_orders table holds summary information, the tbl_orderitems holds the items ordered and the tbl_payment table holds payment information regarding the order. Multiple payments can be placed against each order.
I am trying to get the sum of the items sum(mon_orditems_pprice), and also the amount of items sold count(uid_orderitems).
When I run the following query against a specific order number, which I know has 1 order item. It returns a count of 2 and the sum of two items.
Item ProdTotal ProdCount
Westvale Climbing Frame 1198 2
This order has two payment records held in the tbl_payment table, which is causing the double count. If I remove the payment table join it reports the correct figures, or if I select an order which has a single payment it works as well. Am I missing something, I am tired!!??
SELECT
txt_orditems_pname,
SUM(mon_orditems_pprice) AS prodTotal,
COUNT(uid_orderitems) AS prodCount
FROM dbo.tbl_orders
INNER JOIN dbo.tbl_orderitems ON (dbo.tbl_orders.uid_orders = dbo.tbl_orderitems.uid_orditems_orderid)
INNER JOIN dbo.tbl_payment ON (dbo.tbl_orders.uid_orders = dbo.tbl_payment.uid_pay_orderid)
WHERE
uid_orditems_orderid = 61571
GROUP BY
dbo.tbl_orderitems.txt_orditems_pname
ORDER BY
dbo.tbl_orderitems.txt_orditems_pname
Any suggestions?
Thank you.
Drill down Table columns
dbo.tbl_payment.bit_pay_paid (1/0) Has this payment been paid, yes no
dbo.tbl_orders.bit_order_archive (1/0) Is this order archived, yes no
dbo.tbl_orders.uid_order_webid (integer) Web Shop's ID
dbo.tbl_orders.bit_order_preorder (1/0) Is this a pre-order, yes no
YEAR(dbo.tbl_orders.dte_order_stamp) (2012) Sales year
dbo.tbl_orders.txt_order_status (varchar) Is the order dispatched, awaiting delivery
dbo.tbl_orderitems.uid_orditems_pcatid (integer) Product category ID
It's a normal behavior, if you remove grouping clause you'll see that there really are 2 rows after joining and they both have 599 as a mon_orditems_pprice hence the SUM is correct. When there is a multiple match in any joined table the entire output row becomes multiple and the data that is being summed (or counted or aggregated in any other way) also gets summed multiple times. Try this:
SELECT txt_orditems_pname,
SUM(mon_orditems_pprice) AS prodTotal,
COUNT(uid_orderitems) AS prodCount
FROM dbo.tbl_orders
INNER JOIN dbo.tbl_orderitems ON (dbo.tbl_orders.uid_orders = dbo.tbl_orderitems.uid_orditems_orderid)
INNER JOIN
(
SELECT x.uid_pay_orderid
FROM dbo.tbl_payment x
GROUP BY x.uid_pay_orderid
) AS payments ON (dbo.tbl_orders.uid_orders = payments.uid_pay_orderid)
WHERE
uid_orditems_orderid = 61571
GROUP BY
dbo.tbl_orderitems.txt_orditems_pname
ORDER BY
dbo.tbl_orderitems.txt_orditems_pname
I don't know what data from tbl_payment you are using, are any of the columns from the SELECT list actually from tbl_payment? Why is tbl_payment being joined?