SQL Server query to select the highest quantity data row - sql

I have been trying to find a similar case. I found a lot, but I still can't figure out how to adapt them to my query.
I have a testDB in SQL Server that has 3 tables, as shown in picture below:
I created query as below:
SELECT P.FirstName,
P.LastName,
O.ProductType,
PO.ProductName,
PO.Quantity
FROM Persons AS P
INNER JOIN Orders AS O ON P.PersonID = O.PersonID
INNER JOIN ProductOrders AS PO ON PO.OrderID = O.OrderID;
Currently, the result shows all records from ProductOrders. See the picture below:
I want the result to show, for each person, only the record with the highest quantity. My expected result is shown in the picture below:
Thanks very much for your help.

SQL Server has the TOP WITH TIES/ROW_NUMBER() trick that does this very elegantly:
SELECT TOP (1) WITH TIES P.FirstName, P.LastName, O.ProductType, PO.ProductName, PO.Quantity
FROM Persons P INNER JOIN
Orders O
ON P.PersonID = O.PersonID INNER JOIN
ProductOrders PO
ON PO.OrderID = O.OrderID
ORDER BY ROW_NUMBER() OVER (PARTITION BY P.PersonID, O.ProductType ORDER BY PO.Quantity DESC);
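For readers who find the ORDER BY trick hard to follow, the same idea can be sketched with a derived table: number the rows within each group, then keep row 1. This assumes, as in the question's query, that ProductType comes from Orders and ProductName and Quantity come from ProductOrders; the alias ranked and the column seqnum are names made up for this illustration.

```sql
-- Sketch: keep the highest-quantity row per person and product type.
-- "ranked" and "seqnum" are illustrative names, not part of the original schema.
SELECT FirstName, LastName, ProductType, ProductName, Quantity
FROM (
    SELECT P.FirstName, P.LastName, O.ProductType, PO.ProductName, PO.Quantity,
           ROW_NUMBER() OVER (PARTITION BY P.PersonID, O.ProductType
                              ORDER BY PO.Quantity DESC) AS seqnum
    FROM Persons AS P
    INNER JOIN Orders AS O ON P.PersonID = O.PersonID
    INNER JOIN ProductOrders AS PO ON PO.OrderID = O.OrderID
) AS ranked
WHERE seqnum = 1;
```

If only one row per person is wanted regardless of product type, dropping O.ProductType from the PARTITION BY should do it. When two rows tie on Quantity, ROW_NUMBER() picks one of them arbitrarily unless a tie-breaker column is added to the ORDER BY.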

Use Window functions:
SELECT distinct P.FirstName
, P.LastName
, O.ProductType
, first_value(PO.ProductName) OVER (Partition By P.FirstName, P.LastName, O.ProductType Order by PO.Quantity desc) as [Productname]
, max(PO.Quantity) OVER (Partition By P.FirstName, P.LastName, O.ProductType) as [Quantity]
FROM Persons AS P
INNER JOIN Orders AS O ON P.PersonID = O.PersonID
INNER JOIN ProductOrders AS PO ON PO.OrderID = O.OrderID;

Related

Retrieving records with inner joins

My assignment is to get the first name, middle name and last name for all customers that have had an order before '2012-09-30' and after '2013-09-30'. I'm using AdventureWorks2017 as a sample DB.
Table: Sales.SalesOrderHeader
[SalesOrderID]
,[RevisionNumber]
,[OrderDate]
,[DueDate]
,[ShipDate]
,[Status]
,[OnlineOrderFlag]
,[SalesOrderNumber]
,[PurchaseOrderNumber]
,[AccountNumber]
,[CustomerID]
,[SalesPersonID]
,[TerritoryID]
,[BillToAddressID]
,[ShipToAddressID]
,[ShipMethodID]
,[CreditCardID]
,[CreditCardApprovalCode]
,[CurrencyRateID]
,[SubTotal]
,[TaxAmt]
,[Freight]
,[TotalDue]
,[Comment]
,[rowguid]
,[ModifiedDate]
Table: Person.Person
[BusinessEntityID]
,[PersonType]
,[NameStyle]
,[Title]
,[FirstName]
,[MiddleName]
,[LastName]
,[Suffix]
,[EmailPromotion]
,[AdditionalContactInfo]
,[Demographics]
,[rowguid]
,[ModifiedDate]
Table: Sales.Customer
[CustomerID]
,[PersonID]
,[StoreID]
,[TerritoryID]
,[AccountNumber]
,[rowguid]
,[ModifiedDate]
My Query
SELECT DISTINCT person_table.FirstName,
person_table.MiddleName,
person_table.LastName
FROM Sales.SalesOrderHeader as sales_order_table
inner join Sales.Customer as sales_customer_table
on (sales_customer_table.CustomerID = sales_order_table.CustomerID
and sales_order_table.OrderDate <= '2012-09-30' )
inner join Sales.Customer as sales_customer_table2
on (sales_customer_table2.CustomerID = sales_order_table.CustomerID
and sales_order_table.OrderDate >= '2013-06-30' )
inner join Sales.Customer as match_result
on (match_result.CustomerID = sales_customer_table2.CustomerID)
inner join Person.Person as person_table
on (person_table.BusinessEntityID = match_result.PersonID)
In its current state the query returns no rows, and I'm unsure where the problem is.
[UPDATE]
I found a relatively good solution to the problem by editing Bilal Fakih's answer:
SELECT DISTINCT person_table.FirstName,
person_table.MiddleName,
person_table.LastName,
count(*) as Total_Instanses
FROM Sales.SalesOrderHeader as sales_order_table
inner join Sales.Customer as sales_customer_table
on (sales_customer_table.CustomerID = sales_order_table.CustomerID)
inner join Person.Person as person_table
on (person_table.BusinessEntityID = sales_customer_table.PersonID)
WHERE sales_order_table.OrderDate NOT BETWEEN '2012-09-30' AND '2013-06-30'
GROUP BY person_table.FirstName,
person_table.MiddleName,
person_table.LastName
HAVING count(*) >= 2
The suggestion was good, but it would return records that had only one instance. I'm running into a few corner cases now. For example, a person who has made 2 orders that are both before 2012 or both after 2013 will still be shown. The result I'm looking for is for a person to show up only when they have made orders before AND after the given dates.
Try this. I'm not sure it works, since I don't have the dataset to test against, but it should:
SELECT DISTINCT person_table.FirstName,
person_table.MiddleName,
person_table.LastName
FROM Sales.SalesOrderHeader as sales_order_table
inner join Sales.Customer as sales_customer_table
on (sales_customer_table.CustomerID = sales_order_table.CustomerID)
inner join Person.Person as person_table
on (person_table.BusinessEntityID = sales_customer_table.PersonID)
WHERE sales_order_table.OrderDate NOT BETWEEN '2012-09-30' AND '2013-06-30'
You could simplify this using the query below. Also, your date filter was not correct.
SELECT DISTINCT p.FirstName,
p.MiddleName,
p.LastName
FROM Sales.SalesOrderHeader as s
INNER JOIN Sales.Customer as c
ON c.CustomerID = s.CustomerID
INNER JOIN Person.Person as p
ON p.BusinessEntityID = c.PersonID
WHERE s.OrderDate >= '2012-09-30' -- add this
AND s.OrderDate <= '2013-06-30' -- and this
My assignment is to get the First name, Middle name and Last name for all Customers that have had an order before '2012-09-30' and after '2013-09-30'.
One method uses aggregation:
SELECT p.FirstName, p.MiddleName, p.LastName
FROM Person.Person p JOIN
Sales.Customer c
ON p.BusinessEntityID = c.PersonID JOIN
Sales.SalesOrderHeader so
ON c.CustomerID = so.CustomerID
GROUP BY p.FirstName, p.MiddleName, p.LastName
HAVING MIN(so.OrderDate) < '2012-09-30' AND
MAX(so.OrderDate) > '2013-06-30';
I will say that this condition looks suspicious:
ON p.BusinessEntityID = c.PersonID
However, that is what you use in your query. I would expect the person table to have an id called something like PersonId.
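The "before AND after" requirement can also be written as two EXISTS checks, which some readers find closer to the wording of the assignment. This is only a sketch: it reuses the table and column names from the question, and it assumes the cut-off dates '2012-09-30' and '2013-06-30' that the question's own queries use (the assignment text mentions '2013-09-30', so adjust as needed).

```sql
-- Sketch: customers with at least one order before the first date
-- and at least one order after the second date.
SELECT DISTINCT p.FirstName, p.MiddleName, p.LastName
FROM Person.Person AS p
INNER JOIN Sales.Customer AS c
    ON c.PersonID = p.BusinessEntityID
WHERE EXISTS (SELECT 1
              FROM Sales.SalesOrderHeader AS so
              WHERE so.CustomerID = c.CustomerID
                AND so.OrderDate < '2012-09-30')   -- an early order exists
  AND EXISTS (SELECT 1
              FROM Sales.SalesOrderHeader AS so
              WHERE so.CustomerID = c.CustomerID
                AND so.OrderDate > '2013-06-30');  -- a late order exists
```

One difference from the aggregation version: the checks here run per customer, whereas grouping by the three name columns treats two different people with identical names as one group.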

Is there a way to distinct multiple columns in sql?

Is there a way to distinct multiple columns? When I tried to do it with p.name it says that there is an error that occurred.
SELECT DISTINCT( V.NAME ),
POH.status,
poh.shipdate,
pod.orderqty,
POD.receivedqty,
POD.rejectedqty,
p.NAME
FROM purchasing.vendor v
INNER JOIN purchasing.productvendor pv
ON v.businessentityid = pv.businessentityid
INNER JOIN production.product p
ON pv.productid = P.productid
INNER JOIN purchasing.purchaseorderdetail POD
ON P.productid = POD.productid
INNER JOIN purchasing.purchaseorderheader POH
ON POD.purchaseorderid = POH.purchaseorderid
ORDER BY v.NAME,
p.NAME;
If you want one row per NAME, then you can use ROW_NUMBER():
with q as (
<your query here with columns renamed so there are no duplicates>
)
select q.*
from (select q.*,
row_number() over (partition by v_name order by v_name) as seqnum
from q
) q
where seqnum = 1;
DISTINCT is not a function; it is an operator, and its scope is the entire SELECT clause.
(The query formatting is just for emphasizing the point)
SELECT DISTINCT
V.NAME,
POH.status,
poh.shipdate,
pod.orderqty,
POD.receivedqty,
POD.rejectedqty,
p.NAME
FROM purchasing.vendor v
...
That answers the error you get, however, I doubt if this will give you the results you are looking for
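To see what "scope is the entire SELECT clause" means in practice, here is a tiny self-contained sketch using inline values instead of the real tables. The column names vendor_name and product_name are made up for the illustration.

```sql
-- Sketch: DISTINCT removes duplicate rows, not duplicate values in one column.
-- The parentheses around vendor_name are just an ordinary parenthesized expression.
SELECT DISTINCT (vendor_name), product_name
FROM (VALUES ('A', 'x'),
             ('A', 'x'),
             ('A', 'y')) AS t(vendor_name, product_name);
-- returns two rows: (A, x) and (A, y)
```

Vendor A still appears twice because the two remaining rows differ in product_name. Getting one row per vendor needs GROUP BY or ROW_NUMBER() instead.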

Selecting the Id of the item with a MAX value when doing a left join

Each product in the database can have many revisions. Some products might not have any revisions at all.
ProductRevision table has the following fields: Id, Version, SubmitDate
I am trying to figure out how I can select a field called LatestRevisionId based on the MAX SubmitDate; if a product has no revision, the field should be null.
SELECT p.Id, p.Name, p.Price
FROM Product p
LEFT OUTER JOIN ProductRevision pr ON p.Id = pr.ProductId
Do I have to do a sub select in my select? I really want to try and use HAVING but can't figure out how to do it with a left join.
I was trying to do the following as a sub select:
(SELECT Id
FROM ProductRevision
WHERE ProductId=p.Id
HAVING SubmitDate=MAX(SubmitDate)
) AS LatestVersionId
Please note that I am using SQL SERVER 2008
Here's one option using row_number:
SELECT *
FROM (
SELECT p.Id, p.Name, p.Price, pr.id as LatestRevisionId,
row_number() over (partition by p.Id order by pr.SubmitDate desc) rn
FROM Product p
LEFT OUTER JOIN ProductRevision pr ON p.Id = pr.ProductId
) t
WHERE rn = 1
This will return one row per Product, with the latest matching row from the ProductRevision table (or NULL if the product has no revisions).
If you just prefer to use max, then you need to join the table back to itself again:
SELECT p.Id, p.Name, p.Price, pr.Id as LatestRevisionId
FROM Product p
LEFT OUTER JOIN (SELECT ProductId, MAX(SubmitDate) MaxSubmitDate
FROM ProductRevision
GROUP BY ProductId) mpr ON p.Id = mpr.ProductId
LEFT OUTER JOIN ProductRevision pr ON pr.ProductId = mpr.ProductId AND
pr.SubmitDate = mpr.MaxSubmitDate
This could perhaps return duplicates though if multiple revisions share the same date.
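One way to sidestep the duplicate problem is to pick exactly one revision per product with OUTER APPLY and TOP (1), which is available on SQL Server 2008. The sketch below assumes ProductRevision has a ProductId column (as the question's join implies) and uses Id as a tie-breaker; that tie-breaker is an assumption for illustration, not something the question specifies. The alias latest is likewise made up.

```sql
-- Sketch: one row per product, with the Id of its most recent revision
-- (NULL when the product has no revisions).
SELECT p.Id, p.Name, p.Price, latest.Id AS LatestRevisionId
FROM Product AS p
OUTER APPLY (
    SELECT TOP (1) pr.Id
    FROM ProductRevision AS pr
    WHERE pr.ProductId = p.Id
    ORDER BY pr.SubmitDate DESC,
             pr.Id DESC            -- assumed tie-breaker for equal SubmitDate
) AS latest;
```

Because the inner query returns at most one row, the outer query cannot fan out, and OUTER APPLY (unlike CROSS APPLY) keeps products that have no revisions.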
If you are using SQL Server 2012 or above, the code below will give you the desired result.
SELECT DISTINCT p.Id, p.Name, p.Price, FIRST_VALUE(pr.ID) OVER (PARTITION BY p.Id ORDER BY pr.SubmitDate DESC) AS LatestVersionId
FROM Product p
LEFT OUTER JOIN ProductRevision pr ON p.Id = pr.ProductId
You can use a LEFT JOIN like this:
SELECT p.Id, p.Name, p.Price, pr.Id as LatestRevisionId
FROM Product p LEFT OUTER JOIN
(SELECT pr.*,
ROW_NUMBER() OVER (PARTITION BY ProductId ORDER BY SubmitDate DESC) as seqnum
FROM ProductRevision pr
) pr
ON p.Id = pr.ProductId AND pr.seqnum = 1;
If you want to aggregate other values as well, then just do:
SELECT p.Id, p.Name, p.Price,
MAX(CASE WHEN pr.seqnum = 1 THEN pr.Id END) as LatestRevisionId
FROM Product p LEFT OUTER JOIN
(SELECT pr.*,
ROW_NUMBER() OVER (PARTITION BY ProductId ORDER BY SubmitDate DESC) as seqnum
FROM ProductRevision pr
) pr
ON p.Id = pr.ProductId
GROUP BY p.Id, p.Name, p.Price;
This should be the simplest method to accomplish what you want.

How to write this query to display a COUNT with other fields

I have following two tables:
Person {PersonId, FirstName, LastName,Age .... }
Photo {PhotoId,PersonId, Size, Path}
Obviously, PersonId in the Photo table is an FK referencing the Person table.
I want to write a query to display all the fields of a Person , along with the number of photos he/she has in the Photo table.
A row of the result will look like
24|Ryan|Smith|28|6
How do I write such a query in T-SQL?
Thanks,
You need a subquery in order to avoid having to repeat all the columns from Person in your group by clause.
SELECT
p.PersonId,
p.FirstName,
p.LastName,
p.Age,
coalesce(ph.PhotoCount, 0) as Photocount
FROM
Person p
LEFT OUTER JOIN
(SELECT PersonId,
COUNT(PhotoId) as PhotoCount
FROM Photo
GROUP BY PersonId) ph
ON p.PersonId = ph.PersonId
SELECT
p.PersonId,
p.FirstName,
p.LastName,
p.Age,
CASE WHEN
t.ThePhotoCount IS NULL THEN 0 ELSE t.ThePhotoCount END AS TheCount
--the above line could also use COALESCE
FROM
Person p
LEFT JOIN
(SELECT
PersonId,
COUNT(*) As ThePhotoCount
FROM
Photo
GROUP BY PersonId) t
ON t.PersonId = p.PersonID
SELECT P.PersonId, FirstName, LastName,Age, COUNT(PhotoId) AS Num
FROM Person P
LEFT OUTER JOIN PHOTO PH ON P.PersonId = PH.PersonId
GROUP BY P.PersonId, FirstName, LastName,Age
select Person.*, count(PhotoId) from Person left join Photo on Person.PersonId = Photo.PersonId
IMO GROUP BY should be the solution, something like this works for me even with other table joins:
SELECT meetings.id, meetings.location, meetings.date, COUNT( users.id ) AS attendees
FROM `meetings`
LEFT JOIN users ON meetings.id = users.meeting_id
WHERE meetings.moderator_id = 'XXX'
GROUP BY meetings.id

Compare subselect value with value in master select

In MS Access, I have a query where I want to use a column in the outer query as a condition in the inner query:
SELECT P.FirstName, P.LastName, Count(A.attendance_date) AS CountOfattendance_date,
First(A.attendance_date) AS FirstOfattendance_date,
(SELECT COUNT (*)
FROM(SELECT DISTINCT attendance_date
FROM tblEventAttendance AS B
WHERE B.event_id=8
AND B.attendance_date >= FirstOfattendance_date)
) AS total
FROM tblPeople AS P INNER JOIN tblEventAttendance AS A ON P.ID = A.people_id
WHERE A.event_id=8
GROUP BY P.FirstName, P.LastName
;
The key point is FirstOfattendance_date - I want the comparison deep in the subselect to use the value in each iteration of the master select. Obviously this doesn't work, it asks me for the value of FirstOfattendance_date when I try to run it.
I'd like to do this without resorting to VB code... any ideas?
How about:
SELECT
p.FirstName,
p.LastName,
Count(a.attendance_date) AS CountOfattendance_date,
First(a.attendance_date) AS FirstOfattendance_date,
c.total
FROM (
tblPeople AS p
INNER JOIN tblEventAttendance AS a ON
a.people_id = p.ID)
INNER JOIN (SELECT people_id, Count (attendance_date) As total
FROM (
SELECT DISTINCT people_id,attendance_date
FROM tblEventAttendance)
Group By people_id) AS c ON
p.ID = c.people_id
GROUP BY
p.ID, c.total;
Can you change
B.attendance_date >= FirstOfattendance_date
to
B.attendance_date >= First(A.attendance_date)