duplicated rows in select query

duplicated rows in select query - sql

I am trying to run a query to get all the customers from my database. These are my tables in a diagram :
when running the query by joining the table Companies_Customers and the Customers table based on the customerId in both tables(doesn't show in the join table in the pic), I get duplicate rows, which is not the desired outcome.
This is normal from a database standpoint since a Customer can be related to different companies (Companies can share single customer).
My question is how do I get rid of the duplication via SQL.

There can be 2 approaches to your problem.
Either only select data from Customers table:
SELECT * FROM Customers
Or select from both tables joined together, but without CompanyName and with GROUP BY CompanyCustomerId - although I highly suggest the first approach.

Related

Counting Unique IDs In a LEFT/RIGHT JOIN Query in Access

I am working on a database to track staff productivity. Two of the ways we do that is by monitoring the number of orders they fulfil and by tracking their error rate.
Each order they finish is recorded in a table. In one day they can complete many orders.
It is also possible for a single order to have multiple errors.
I am trying to create a query that provides a summary of their results. This query should have one column with "TotalOrders" and another with "TotalErrors".
I connect the two tables with a LEFT/RIGHT join since not all orders will have errors.
The problem comes when I want to total the number of orders. If someone made multiple mistakes on an order, that order gets counted multiple times; once for each error.
I want to modify my query so that when counting the number of orders it only counts records with distinct OrderID's; yet, in the same query, also count the total errors without losing any.
Is this possible?
Here is my SQL
SELECT Count(tblTickets.TicketID) AS TotalOrders,
Count(tblErrors.ErrorID) AS TotalErrors
FROM tblTickets
LEFT JOIN tblErrors ON tblTickets.TicketID = tblErrors.TicketID;
I have played around with SELECT DISTINCT and UNION but am struggling with the correct syntax in Access. Also, a lot of the examples I have seen are trying to total a single field rather than two fields in different ways.
To be clear when totalling the OrderCount field I want to only count records with DISTINCT TicketID's. When totalling the ErrorCount field I want to count ALL errors.
Ticket = Order.
Query Result: Order Count Too High
Ticket/Order Table: Total of 14 records
Error Table: You can see two errors for the same order on 8th

do a query that counts orders by staff from Orders table and a query that counts errors by staff from Errors table then join those two queries to Staff table (queries can be nested for one long SQL statement)
correlated subqueries
SELECT Staff.*,
(SELECT Count(*) FROM Orders WHERE StaffID = Staff.ID) AS CntOrders,
(SELECT Count(*) FROM Errors WHERE StaffID = Staff.ID) AS CntErrors
FROM Staff;
use DCount() domain aggregate function
Option 1 is probably the most efficient and option 3 the least.

MS Access SQL Query merging two different fields from separate tables shows reference ID's instead of actual values

I have two separate tables in my access database, which both use a third table as the reference for one particular field on each table. The data is entered onto the different tables by separate forms. Then I have several queries that then reference those particular fields that count and show unique values. Those queries show the actual values, then I created an sql query that does the same thing, only it shows the reference ID instead of the value in the actual field.
table ODI----------table CDN----------reference table
id RHA---------id CHA----------------id HA
1 blank----------1 radio---------------1 internet
2 internet-------2 tv------------------2 radio
3 referral-------3 radio---------------3 referral
4 tv-------------4 blank---------------4 repeat customer
5 blank----------5 internet------------5 tv
6 internet-------6 referral------------6 employee
7 referral-------7 referral------------7 social media
this is the code I am trying to make work.
SELECT m.[Marketing Results], Count(*) AS [Count]
FROM (SELECT RHA as [Marketing Results] FROM ODI
UNION ALL
SELECT CHA as [Marketing Results] FROM CDN) AS m
GROUP BY m.[Marketing Results]
HAVING (((m.[Marketing Results]) Is Not Null))
ORDER BY Count(*) DESC;
and what my desired result is,
Marketing Results--Counts
referral------------------4
internet------------------3
radio---------------------2
tv------------------------2

Lookup fields with alias don't show what is actually stored in table. ID is stored, not descriptive alias. Lookup alias will carry into regular queries but Union query only pulls actual stored values. At some point need to include reference table in query by joining on key fields in order to retrieve descriptive alias. Options:
in each UNION query SELECT line, join tables
join UNION query to reference table
join aggregate query to reference table
Most experienced developers will not build lookups in table because of confusion they cause. Also, they are not portable to other database platforms. http://access.mvps.org/Access/lookupfields.htm

How do you JOIN tables to a view using a Vertica DB?

Good morning/afternoon! I was hoping someone could help me out with something that probably should be very simple.
Admittedly, I’m not the strongest SQL query designer. That said, I’ve spent a couple hours beating my head against my keyboard trying to get a seemingly simple three way join working.
NOTE: I'm querying a Vertica DB.
Here is my query:
SELECT A.CaseOriginalProductNumber, A.CaseCreatedDate, A.CaseNumber, B.BU2_Key as BusinessUnit, C.product_number_desc as ModelNumber
FROM pps_sfdc.v_Case A
INNER JOIN reference_data.DIM_PRODUCT_LINE_HIERARCHY B
ON B.PL_Key = A.CaseOriginalProductLine
INNER JOIN reference_data.DIM_PRODUCT C
ON C.product_line_code = A.CaseOriginalProductLine
WHERE B.BU2_Key = 'XWT'
LIMIT 20
I have a view (v_Case) that I’m trying to join to two other tables so I can lookup a value from each of them. The above query returns identical data on everything EXCEPT the last column (see below). It's like it's iterating through the last column to pull out the unique entries, sort of like a "GROUP BY" clause. What SHOULD be happening is that I get unique rows with specific "BusinessUnit" and "ModelNumber" for that record.
DUMEPRINT 5/2/2014 8:56:27 AM 3002845327 JJT Product 1
DUMEPRINT 5/2/2014 8:56:27 AM 3002845327 JJT Product 2
DUMEPRINT 5/2/2014 8:56:27 AM 3002845327 JJT Product 3
DUMEPRINT 5/2/2014 8:56:27 AM 3002845327 JJT Product 4
I modeled my solution after this post:
How to deal with multiple lookup tables for beginners of SQL?
What am I doing wrong?
Thank you for any help you can provide.

Data issue. General rule in trouble shooting these is the column that is distinct (in this case C.product_number_desc as ModelNumber) for each record is generally where the issue is going to be...and why I pointed you towards dim_product.
If you receive duplicates, this query below will help identify if this table is giving you the issues. Remember key in this statement can be multiple fields...whatever you are joining the table on:
Select key,count(1) from table group by key having count(1)>1
Other options for the future...don't assume it's your code, duplicates like this almost always point towards dirty data (other option is you are causing cross joins because keys are not correct). If you comment out the 'c' table and the column referred to in the select clause, you would have received one row...hence your dupes were coming from the 'c' table here.
Good luck with it

Translate SQL query to absolute beginner (me)

I am trying to understand the JOIN command from w3schools tutorial and there is an example.
Can you please "translate" it to me what exactly it does? I already know what do the dots do, but INNER JOIN, ON and so on messed me up?
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers
ON Orders.CustomerID=Customers.CustomerID;
And does it creates a new table in my SQL database or it just creates (lets call it..) "virtual" table which I can use it in the moment

The dot notation here signifies the column of a table.
Table.Column
So the SELECT is retrieving the columns specified from each table.
JOIN matches the tables based on the condition that appears after ON
this queries joins customers to orders based the condition of having the same CustomerID.
For more on joins check out http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html

Perhaps the first thing to understand is the "dot notation." When you see a value like Orders.CustomerID SQL will read that to mean the CustomerID column within the Orders table.
The INNER JOIN is looking for an exact match for the records you specify with the ON clause...
So, in this example, SQL will first find all the records in the Orders table which have a CustomerID that matches the CustomerID found in the Customers table.
Then, it will look to the SELECT clause and show you the OrderID (from the Orders table), the CustomerName (from the Customers table) and the OrderDate from the Orders table.
If you're using SQL Server Management Studio (SSMS) or some other similar tool, it should display your results - kindof like a virtual table... but it doesn't actually create a table.
Hope that helps,
Russell

There are INNER AND OUTER JOINS. I think this post sums up those differences well.
What is the difference between "INNER JOIN" and "OUTER JOIN"?
As far as the concept itself you are 'joining' the tables by finding things in common between them based on a shared value in each. If you have ever done a vlookup with Microsoft Excel then you are familiar with this concept. In the query you've posted you are selecting values from both the Orders and Customers table. You can tell which table you are selecting values from by the prefix Orders. or Customers. You can then tell which column by the value following the dot i.e. OrderID, CustomerName, OrderDate.
The way the query knows to pull a conjoined field is if the CustomerID field matches.
So step by step
You are selecting
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
From the table Orders
FROM Orders
You wish to pull corresponding values from Customers. There are numerous join methods and logic that goes with each but you have chosen INNER JOIN explained in the link above.
INNER JOIN Customers
The criteria for joining is CustomerID
ON Orders.CustomerID=Customers.CustomerID;
To use an approximate Excel Vlookup analogy
The Lookup_value is CustomerID
The Table array is ORDERS
The range lookup is the type of join you are performing

SELECT statement returns data, it doesn't create any table not even virtual table whatsoever (if you see any table after executing SELECT statement, thats just a way the returned data presented). Create and modify table done using DDL (Data Definition Language) statements when SELECT is one of DML (Data Manipulation Language) [reference].
Join statement explained very well here, from basic use to more advance with illustration.

Selecting Columns from 3 Inner Joins - still getting cartesian product. S

I have 3 tables. They are inner joined. When I choose to select from all three tables I get a cartesian product.
I've used distinct and I've tried cross apply with top 1. Top 1 brings back the right amount of records but it repeats the fields used in that select top N.
Basic question. Can you select from 3 different tables and avoid a cartesian result? I can have all three tables joined and with distinct I can get records from two of the tables without a cartesian. It is when I choose to select from a third is where the cartesian appears.
If this is possible, what other tsql commands/constructs should I be experimenting with?
http://imageshack.us/f/255/50353790.png/
SELECT CRT.[TransactionID]
,CRT.[creditrewardsID]
,CRT.[OwnerID]
, CRT.[TransactionDate]
,CRT.[ItemID]
,CRT.[VALUE]
,CRM.First
,CRM.MI
,CRM.Last
,CTI.fn
,CTI.ln
FROM [ownership].[dbo].[creditrewardsTransactions] CRT
Join [ownership].[dbo].[creditrewardsMembers] CRM
on CRT.creditrewardsid = CRM.[creditrewardsID]
Join [Exchange].[dbo].[CreditTourInfo] CTI
on CRM.CRMemberNum = CTI.PRIMECRPNum
--where CRT.creditrewardsID = 11111

To answer your "basic" question, yes you can easily join 3 tables and not receive a Cartesian product. However, you need to ensure you have proper One-to-One or One-to-Many joins.
The issue above likely has to do with the data you have in the creditrewardsMembers and CreditTourInfo tables. Joining the CRMemberNum and PRIMECRPNum fields is likely a Many-to-Many join, which will give you a Cartesian product.
You can run these SQL statements to confirm. If they both return records, then you have a Many-to-Many join issue.
SELECT CRM.CRMemberNum, COUNT(*)
FROM [ownership].[dbo].[creditrewardsMembers] CRM
GROUP BY CRM.CRMemberNum
HAVING COUNT(*) > 1
SELECT CTI.PRIMECRPNum, COUNT(*)
FROM [Exchange].[dbo].[CreditTourInfo] CTI
GROUP BY CTI.PRIMECRPNum
HAVING COUNT(*) > 1
If the fields are in fact the correct ones to join, then you may need to just pick the first record from one of the tables in your query like this.
SELECT CRT.[TransactionID]
,CRT.[creditrewardsID]
,CRT.[OwnerID]
,CRT.[TransactionDate]
,CRT.[ItemID]
,CRT.[VALUE]
,CRM.First
,CRM.MI
,CRM.Last
,CTI.fn
,CTI.ln
FROM [ownership].[dbo].[creditrewardsTransactions] CRT
JOIN [ownership].[dbo].[creditrewardsMembers] CRM
ON CRT.creditrewardsid = CRM.[creditrewardsID]
JOIN (
SELECT PRIMECRPNum, MIN(fn) as [fn], MIN(ln) as [ln]
FROM [Exchange].[dbo].[CreditTourInfo]
GROUP BY PRIMECRPNum
) CTI
ON CRM.CRMemberNum = CTI.PRIMECRPNum
Ultimately, it's going to depend on the limitations of your specific data model and the tradeoffs you may need to make. (ie. getting the first record only from [CreditTourInfo])

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas