Remove Duplicates from SQL Query - sql

I am using this query currently to trigger a customer event. But sometimes the same customer will be in the results because they are assigned a different WorkOrderId, which is my primary filter. So, I want to expand the filter to also look for a unique CustomerName. In other words, if the query returns two rows with the same CustomerName then I want it to exclude the 2nd row altogether.
SELECT CustomerName, JobId, Email, CCEmail, WorkOrderId AS id
FROM dbo.vwWorkOrderDetail
WHERE JobStatusId=3 AND Active=1 AND LocationStopTypeId=1
ORDER BY WorkOrderId DESC
I've tried using DISTINCT, but I continue to get results that include the same CustomerName in different rows. How can I set up this query so it returns the rows that pass all of the WHERE conditions and then shows only one row per CustomerName?

As long as you include WorkOrderId, DISTINCT will do nothing for you. DISTINCT can only eliminate duplicates where all of the columns specified in the SELECT contain the same information. So to use DISTINCT to eliminate duplicate customers, you would need to do this:
SELECT DISTINCT CustomerName, JobId, Email, CCEmail
FROM dbo.vwWorkOrderDetail
WHERE JobStatusId=3 AND Active=1 AND LocationStopTypeId=1
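ORDER BY CustomerName -- with SELECT DISTINCT, SQL Server only allows ORDER BY on columns in the select list, so WorkOrderId cannot be used here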
The best way to approach this while preserving a WorkOrderId is to make a new view based on the underlying tables. You will need to decide which of the available WorkOrderIds you want to present. Typically this is the highest ID. If all you need is the WorkOrderId itself and not the details, this is actually pretty simple. Note that the code below is a naïve example that assumes CustomerId is tied directly to a work order. To really answer this properly, you'd need to provide the code for vwWorkOrderDetail.
SELECT CustomerName, JobId, Email, CCEmail, (SELECT MAX(WorkOrderId) FROM WorkOrders WHERE CustomerID = Customers.CustomerID) AS WorkOrderID
FROM Customers
WHERE JobStatusId=3 AND Active=1 AND LocationStopTypeId=1
ORDER BY WorkOrderId DESC

SELECT CustomerName, JobId, Email, CCEmail, max(WorkOrderId) AS id
FROM dbo.vwWorkOrderDetail
WHERE JobStatusId=3 AND Active=1 AND LocationStopTypeId=1
GROUP BY CustomerName, JobId, Email, CCEmail
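To see the difference between DISTINCT and the GROUP BY / MAX approach on actual rows, here is a minimal sketch using Python's built-in sqlite3 module. The table WorkOrderDetail is a hypothetical, trimmed stand-in for dbo.vwWorkOrderDetail, and the sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical stand-in for dbo.vwWorkOrderDetail, trimmed to three columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE WorkOrderDetail (CustomerName TEXT, Email TEXT, WorkOrderId INTEGER)"
)
conn.executemany(
    "INSERT INTO WorkOrderDetail VALUES (?, ?, ?)",
    [
        ("Acme", "a@example.com", 101),
        ("Acme", "a@example.com", 105),
        ("Bolt", "b@example.com", 103),
    ],
)

# DISTINCT compares whole rows, so the differing WorkOrderId keeps both Acme rows.
distinct_rows = conn.execute(
    "SELECT DISTINCT CustomerName, Email, WorkOrderId FROM WorkOrderDetail"
).fetchall()

# GROUP BY collapses each customer to one row and keeps the highest WorkOrderId.
grouped_rows = conn.execute(
    """
    SELECT CustomerName, Email, MAX(WorkOrderId) AS id
    FROM WorkOrderDetail
    GROUP BY CustomerName, Email
    ORDER BY id DESC
    """
).fetchall()

print(len(distinct_rows))
print(grouped_rows)
```

One caveat with this approach: every column in the GROUP BY still has to match for rows to collapse, so if a customer's rows differ in JobId and JobId is part of the grouping, that customer will still appear more than once.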

Related

Query GROUP BY and COUNT

I'm new to SQL and taking Coursera's "SQL for Data Science" course. I have the following question in a summary assignment:
Show the number of orders placed by each customer and sort the result by the number of orders in descending order.
I failed to write the correct code myself. The answer given is as follows (one of several possible options, of course):
SELECT *
,COUNT (InvoiceId) AS number_of_orders
FROM Invoices
GROUP BY CustomerId
ORDER BY number_of_orders DESC
I am still having trouble understanding the query logic. I would appreciate your assistance in understanding this query.
I seriously hope that Coursera isn't giving you the query you cited above as the recommended answer. It won't run on most databases, and even in cases such as MySQL where it might run, it is not completely correct. You should be using this version:
SELECT CustomerId, COUNT (InvoiceId) AS number_of_orders
FROM Invoices
GROUP BY CustomerId
ORDER BY number_of_orders DESC;
A basic rule of GROUP BY is that the only columns available for selection are those which appear in the GROUP BY clause. In addition to these columns, aggregates of any column(s) may also appear in the select. The version I gave you above follows these rules, and is ANSI compliant, meaning it would run on any database.
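As a quick way to watch the grouping happen, here is a small sketch that runs the corrected query through Python's built-in sqlite3 module. The Invoices table here is a hypothetical two-column version of the course table, and the rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified Invoices table with invented rows.
conn.execute("CREATE TABLE Invoices (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER)")
conn.executemany(
    "INSERT INTO Invoices VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 10), (4, 10), (5, 20), (6, 30)],
)

# One output row per CustomerId; COUNT runs once per group of invoices.
order_counts = conn.execute(
    """
    SELECT CustomerId, COUNT(InvoiceId) AS number_of_orders
    FROM Invoices
    GROUP BY CustomerId
    ORDER BY number_of_orders DESC
    """
).fetchall()

for customer_id, number_of_orders in order_counts:
    print(customer_id, number_of_orders)
```

Customer 10 has three invoices, customer 20 has two and customer 30 has one, so the rows come back in that order.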
SELECT * stands for ALL COLUMNS, but you are grouping by CustomerId only, which is not valid in standard SQL.
Every non-aggregated column that you want to show must also be listed in the GROUP BY clause.
The script should be something like
SELECT CustomerName, DateEntered
,COUNT (InvoiceId) AS number_of_orders
FROM Invoices
GROUP BY CustomerId, CustomerName, DateEntered
ORDER BY number_of_orders DESC

MS Access Count unique values of one table appearing in second table which is related to a third table

I am working on my lab database and am close to completing it. But I am stuck on a query, and a few similar queries all give back similar results.
Here is the Query in design mode
and this is what it gives out
This query is counting the number of ID values in table PatientTestIDs, whereas I want to count the number of unique PatientID values grouped by department.
I have even tried the Unique Values and Unique Records properties, but it gives the same result every time.
What you want requires two queries.
Query1:
SELECT DISTINCT PatientID, DepartmentID FROM PatientTestIDs;
Query2:
SELECT Count(*) AS PatientsPerDept, DepartmentID FROM Query1 GROUP BY DepartmentID;
Nested all in one:
SELECT Count(*) AS PatientsPerDept, DepartmentID FROM (SELECT DISTINCT PatientID, DepartmentID FROM PatientTestIDs) AS Query1 GROUP BY DepartmentID;
You can include the Departments table in query 2 (or the nested version) to pull in descriptive fields but will have to include those additional fields in the GROUP BY.
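Here is a rough sketch of the nested version run against sample data with Python's built-in sqlite3 module, so you can compare the raw row count with the distinct patient count. The column types and the rows are assumptions made up for the example; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed shape of PatientTestIDs: one row per test, so patients repeat.
conn.execute(
    "CREATE TABLE PatientTestIDs (ID INTEGER PRIMARY KEY, PatientID INTEGER, DepartmentID INTEGER)"
)
conn.executemany(
    "INSERT INTO PatientTestIDs (PatientID, DepartmentID) VALUES (?, ?)",
    [(1, 1), (1, 1), (2, 1), (3, 2), (3, 2), (3, 2)],
)

# Counting rows directly counts tests, not patients.
tests_per_dept = conn.execute(
    "SELECT DepartmentID, COUNT(*) FROM PatientTestIDs "
    "GROUP BY DepartmentID ORDER BY DepartmentID"
).fetchall()

# De-duplicate (PatientID, DepartmentID) pairs first, then count per department.
patients_per_dept = conn.execute(
    """
    SELECT DepartmentID, COUNT(*) AS PatientsPerDept
    FROM (SELECT DISTINCT PatientID, DepartmentID FROM PatientTestIDs) AS Query1
    GROUP BY DepartmentID
    ORDER BY DepartmentID
    """
).fetchall()

print(tests_per_dept)
print(patients_per_dept)
```

Both departments have three test rows, but department 1 has two distinct patients and department 2 has only one.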

ORDER BY + IN statement?

The marketing department wants to focus on the customers from North America first. Get the ID, last name and country of all customers. Build the list by bringing up the customers living in Canada and in the USA first, and finally order them by ID. Tip: use the IN expression in the ORDER BY clause.
I've tried many times
SELECT CustomerID, LastName, Country
FROM Customer
ORDER BY Country IN ('Canada', 'USA'), CustomerID
but it doesn't seem to work: it returns the specified fields for all customers in the order they appear in the original table (which is also ordered by ID), regardless of which country they belong to, even if I remove CustomerID from the ORDER BY clause.
What should I do? I'm really new to SQL, and have no idea on how to fix this.
Edit: WHERE isn't suitable at all, as I need to take all customers into consideration, only making sure the Canadian and American ones appear at the top of the list.
Also, I'm unsure whether statements like UNION, AS, EXCEPT and so on are meant to be used, because the tutorial hasn't gone that deep yet.
Not every DBMS has a boolean datatype. So the result of
Country IN ('Canada', 'USA'),
which is a boolean, can not be sorted in these DBMS.
You can use a CASE expression, however, to assign a value:
SELECT CustomerID, LastName, Country
FROM Customer
ORDER BY CASE WHEN Country IN ('Canada', 'USA') THEN 1 ELSE 2 END, CustomerID;
SELECT CustomerID, LastName, Country
FROM Customer
ORDER BY Country IN ('Canada', 'USA') desc, CustomerID asc
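To check both variants on real rows, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a convenient stand-in here: it happens to evaluate IN to 0 or 1, so it accepts both forms. The Customer rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, LastName TEXT, Country TEXT)")
# Invented sample rows.
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [
        (1, "Silva", "Brazil"),
        (2, "Tremblay", "Canada"),
        (3, "Muller", "Germany"),
        (4, "Smith", "USA"),
        (5, "Roy", "Canada"),
    ],
)

# Portable form: CASE maps the two countries to 1 and everything else to 2.
case_ids = [row[0] for row in conn.execute(
    """
    SELECT CustomerID, LastName, Country
    FROM Customer
    ORDER BY CASE WHEN Country IN ('Canada', 'USA') THEN 1 ELSE 2 END, CustomerID
    """
)]

# Boolean form: IN yields 1 for a match, so DESC is needed to put matches first.
bool_ids = [row[0] for row in conn.execute(
    """
    SELECT CustomerID, LastName, Country
    FROM Customer
    ORDER BY Country IN ('Canada', 'USA') DESC, CustomerID ASC
    """
)]

print(case_ids)
print(bool_ids)
```

This also suggests why the query in the question appeared not to work on a DBMS that accepts it: without DESC, the matching rows (value 1) sort after the non-matching ones (value 0).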
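In SQL Server, an IN expression doesn't return a value that can be sorted, so you can't use it directly in ORDER BY.
You can try the following, but note that it returns only the Canadian and US customers, sorted together by CustomerID: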
SELECT CustomerID, LastName, Country
FROM Customer
WHERE Country='Canada'
UNION ALL
SELECT CustomerID, LastName, Country
FROM Customer
WHERE Country='USA'
ORDER BY CustomerID
Using ORDER BY with UNION, EXCEPT, and INTERSECT
When a query uses the UNION, EXCEPT, or INTERSECT operators, the ORDER BY clause must be specified at the end of the statement and the results of the combined queries are sorted. The following example returns all products that are red or yellow and sorts this combined list by the column ListPrice.
https://msdn.microsoft.com/en-us/library/ms188385.aspx#Union

how to use distinct in ms access

I have two tables. Task and Categories.
TaskID is not a primary key, as there are duplicate values. When multiple contacts are selected for a specific task, the TaskID and other details are duplicated. I wrote the query:
SELECT Priority, Subject, Status, DueDate, Completed, Category
FROM Task, Categories
WHERE Categories.CategoryID=Task.CategoryID;
Now, as multiple contacts are selected for that task, there are two records for TaskID = T4 (highlighted in gray). I have tried using DISTINCT in MS Access 2003, but it's not working. I want to display distinct records. (There's no requirement to show the TaskID here.) If I write:
select priority, distinct(subject), .......
and keep the rest the same as in the query above, it gives me an error. I have tried DISTINCTROW as well, but without success. How do I get distinct values in MS Access?
Okay. It's working this way.
SELECT DISTINCT Task.Priority, Task.Subject, Task.Status, Task.DueDate,
Task.Completed, Categories.Category
FROM Task, Categories
WHERE (((Categories.CategoryID)=[Task].[CategoryID]));
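I don't like using SELECT DISTINCT; I have found that it makes my code take longer to compile. The other way I do it is by using GROUP BY. Note that every non-aggregated column in the SELECT list has to appear in the GROUP BY clause:
SELECT Priority, Subject, Status, DueDate, Completed, Category
FROM Task, Categories
WHERE Categories.CategoryID=Task.CategoryID
GROUP BY Priority, Subject, Status, DueDate, Completed, Category;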
I do not have VBA up at the moment but this should work as well.
Using SELECT DISTINCT will work for you, but a better solution here would be to change your database design.
Duplicate records may lead to inconsistent data. For example, imagine having two different statuses in different records with the same TaskID. Which one would be right?
A better design would include something like a Task table, a Contact table and an Assignment table, as follows (the fields in brackets are the PK):
Tasks: [TaskID], TaskPriority, Subject, Status, DueDate, Completed, StartDate, Owner, CategoryID, ContactID, ...
Contact: [ID], Name, Surname, Address, PhoneNumber, ...
Assignment: [TaskID, ContactID]
Then you can retrieve the tasks with a simple SELECT from the Tasks table.
And whenever you need to know the contacts assigned to a task, you would do so using the JOIN clause, like this (Access needs the parentheses when chaining more than one join):
SELECT T.*, C.*
FROM (Tasks AS T
INNER JOIN Assignment AS A
ON T.TaskID = A.TaskID)
INNER JOIN Contact AS C
ON A.ContactID = C.ID
Or similar. You can filter, sort or group the results using all of SQL's query power.
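For a concrete picture of the three-table design, here is a minimal sketch using Python's built-in sqlite3 module. The column lists are cut down, the sample task and contacts are invented, and SQLite stands in for Access, so treat it as an illustration of the idea rather than a drop-in schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One row per task; TaskID can now be a real primary key.
    CREATE TABLE Tasks (TaskID TEXT PRIMARY KEY, Subject TEXT, Status TEXT);
    CREATE TABLE Contact (ID INTEGER PRIMARY KEY, Name TEXT);
    -- Junction table: one row per (task, contact) pairing.
    CREATE TABLE Assignment (
        TaskID TEXT REFERENCES Tasks(TaskID),
        ContactID INTEGER REFERENCES Contact(ID),
        PRIMARY KEY (TaskID, ContactID)
    );
    INSERT INTO Tasks VALUES ('T4', 'Quarterly report', 'Open');
    INSERT INTO Contact VALUES (1, 'Ann'), (2, 'Raj');
    INSERT INTO Assignment VALUES ('T4', 1), ('T4', 2);
    """
)

# The task list needs no DISTINCT: T4 is stored exactly once.
task_rows = conn.execute("SELECT TaskID, Subject, Status FROM Tasks").fetchall()

# Contacts for each task come from joining through the junction table.
assigned = conn.execute(
    """
    SELECT T.TaskID, C.Name
    FROM Tasks AS T
    INNER JOIN Assignment AS A ON T.TaskID = A.TaskID
    INNER JOIN Contact AS C ON A.ContactID = C.ID
    ORDER BY C.ID
    """
).fetchall()

print(task_rows)
print(assigned)
```

Because the status lives in a single Tasks row, two contacts on the same task can no longer end up with conflicting copies of it.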

Return all Fields and Distinct Rows

What's the best way to do this when looking for distinct rows?
SELECT DISTINCT name, address
FROM table;
I still want to return all fields, ie address1, city etc but not include them in the DISTINCT row check.
Then you have to decide what to do when there are multiple rows with the same value in the column you want the distinct check to apply to, but with different values in the other columns. In that case, how does the query processor know which of the multiple values in the other columns to output? If you don't care, just write a GROUP BY on the distinct column, with Min() or Max() on all the other ones.
EDIT: I agree with the comments from others that as long as you have multiple dependent columns in the same table (e.g., Address1, Address2, City, State), this approach is going to give you mixed (and therefore inconsistent) results. If each column attribute in the table is independent (if addresses are all in an Address table and only an AddressId is in this table), then it's not as significant an issue, because at least all the columns from a join to the Address table will return data for the same address. But you are still getting a more or less random selection of one of the set of multiple addresses.
This will not mix and match your city, state, etc. and should give you the last one added even:
select b.*
from (
select max(id) id, Name, Address
from table a
group by Name, Address) as a
inner join table b
on a.id = b.id
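To see that the join-back really returns one intact row per group, here is a small sketch with Python's built-in sqlite3 module. The table name people is a hypothetical replacement for the placeholder "table" (which is a reserved word), and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "people" is a hypothetical name standing in for the placeholder "table".
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, Name TEXT, Address TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1, "abc", "123", "def"),
        (2, "abc", "123", "hij"),
        (3, "xyz", "9", "klm"),
    ],
)

# The inner query picks the highest id per (Name, Address);
# the join then fetches that whole row, so columns are never mixed.
latest_rows = conn.execute(
    """
    SELECT b.*
    FROM (
        SELECT MAX(id) AS id, Name, Address
        FROM people
        GROUP BY Name, Address
    ) AS a
    INNER JOIN people AS b ON a.id = b.id
    ORDER BY b.id
    """
).fetchall()

print(latest_rows)
```

For the duplicated pair ('abc', '123') only the row with id 2 survives, and its City value 'hij' comes from that same row.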
When you have a mixed set of fields, some of which you want to be DISTINCT and others that you just want to appear, you require an aggregate query rather than DISTINCT. DISTINCT is only for returning single copies of identical fieldsets. Something like this might work:
SELECT name,
GROUP_CONCAT(DISTINCT address) AS addresses,
GROUP_CONCAT(DISTINCT city) AS cities
FROM the_table
GROUP BY name;
The above will get one row for each name. addresses contains a comma-delimited string of all the distinct addresses for that name. cities does the same for all the cities.
However, I don't see how the results of this query are going to be useful. It will be impossible to tell which address belongs to which city.
If, as is often the case, you are trying to create a query that will output rows in the format you require for presentation, you're much better off accepting multiple rows and then processing the query results in your application layer.
I don't think you can do this because it doesn't really make sense.
name | address | city | etc...
abc | 123 | def | ...
abc | 123 | hij | ...
If you were to include city but not have it as part of the distinct clause, the value of city would be unpredictable unless you did something like Max(city).
You can do
SELECT Name, Address, MAX(Address1), MAX(City)
FROM table
GROUP BY Name, Address
Use @JBrooks's answer below. He has a better answer.
If you're using SQL Server 2005 or above you can use the ROW_NUMBER function. This will get you the row with the lowest ID for each name. If you want to 'group' by more columns, add them in the PARTITION BY section of the ROW_NUMBER call.
SELECT id, Name, Address, ...
FROM (select id, Name, Address, ...,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY id) AS RowNo
from table) sub
WHERE RowNo = 1
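Here is a rough sketch of the same ROW_NUMBER pattern run through Python's built-in sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (needed for window functions), and the table name people and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "people" is a hypothetical name standing in for the placeholder "table".
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, Name TEXT, Address TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1, "abc", "123", "def"),
        (2, "abc", "123", "hij"),
        (3, "xyz", "9", "klm"),
    ],
)

# Number the rows within each Name by ascending id, then keep row 1 of each.
# Window functions need SQLite 3.25+ (an assumption about the local build).
first_rows = conn.execute(
    """
    SELECT id, Name, Address, City
    FROM (
        SELECT id, Name, Address, City,
               ROW_NUMBER() OVER (PARTITION BY Name ORDER BY id) AS RowNo
        FROM people
    ) AS sub
    WHERE RowNo = 1
    ORDER BY id
    """
).fetchall()

print(first_rows)
```

Changing ORDER BY id to ORDER BY id DESC inside the OVER clause would keep the highest id per name instead of the lowest.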