SQL Server 2012 -- loop keys - sql

I have written a query which left joins separate tables and attempts to discover at a point in time (the point of insertion) which PK keys are the newest in a db for a given record.
you will see i have pared it back to patientid 100 as this is the only way i can seem to get it working.
The current query works as shown:
SELECT TOP 1 P1.PatientID,
P1.DimPatientPK,
DA1.DimAdmissionPK,
DD1.DiagnosisPK,
DI1.Investigation1PK,
DIE1.InvestigationECGPK,
IEG1.InvestigationEchoGoldPK,
MH1.DimMedicalHistoryPK,
FH1.DimPatientFamilyHistoryPK,
PHT1.PatientHospitalisationTreatmentPK,
PMP1.PatientMedicalPersonnelPK,
RR1.PatientReferralReasonPK,
PEA1.PhysicalExamAHSPK,
PEM1.PhysicalExamMurmursPK,
SI1.SocialIssuePK,
TRT.TreatmentPK
--DT1.Treatment1PK
FROM
DimPatient P1 LEFT JOIN DimAdmission DA1 ON P1.PatientID = DA1.PatientID
LEFT JOIN DimDiagnosis DD1 ON P1.PatientID = DD1.PatientID
LEFT JOIN DimInvestigation1 DI1 ON P1.PatientID = DI1.PatientID
LEFT JOIN DimInvestigationECG DIE1 ON P1.PatientID = DIE1.PatientId
LEFT JOIN DimInvestigationECHOgold IEG1 ON P1.PatientID = DIE1.PatientId
LEFT JOIN DimMedicalHistory MH1 ON P1.PatientID = MH1.PatientId
LEFT JOIN DimPatientFamilyHistory FH1 ON P1.PatientId = FH1.PatientID
LEFT JOIN DimPatientHospitalisationTreatment PHT1 ON P1.PatientID = PHT1.PatientId
LEFT JOIN DimPatientMedicalPersonnel PMP1 ON P1.PatientID = PMP1.PatientId
LEFT JOIN DimPatientReferralReason RR1 ON P1.PatientID = RR1.PatientId
LEFT JOIN DimPhysicalExamAHS PEA1 ON P1.PatientId = PEA1.PatientId
LEFT JOIN DimPhysicalExamination PE1 ON P1.PatientID = PE1.PatientId
LEFT JOIN DimPhysicalExamMurmurs PEM1 ON P1.PatientID = PEM1.PatientId
LEFT JOIN DimSocialIssue SI1 ON P1.PatientID = SI1.PatientID
LEFT JOIN DimTreatment TRT ON P1.PatientID = TRT.PatientId
WHERE P1.patientid IN(100)
ORDER BY DA1.DimAdmissionPK DESC,
P1.DimPatientPK DESC,
DD1.DiagnosisPK DESC,
DI1.Investigation1PK DESC,
DIE1.InvestigationECGPK DESC,
IEG1.InvestigationEchoGoldPK DESC,
MH1.DimMedicalHistoryPK DESC,
FH1.DimPatientFamilyHistoryPK DESC,
PHT1.PatientHospitalisationTreatmentPK DESC,
PMP1.PatientMedicalPersonnelPK DESC,
RR1.PatientReferralReasonPK DESC,
PEA1.PhysicalExamAHSPK DESC,
PE1.PhysicalExaminationPK DESC,
PEM1.PhysicalExamMurmursPK DESC,
SI1.SocialIssuePK DESC,
TRT.TreatmentPK DESC;
This successfully recovers a full record whether it has been filled out or not for the patid 100.
I am having trouble expanding this so that it loops through and collects the same results for every patient in the db.
i.e. if i remove the where clause, i only get 1 row still ..
if i remove select top 1 .. then it returns me multiple sets of patientid 90 - i basically want 1 row for each patientID - ie 90, 91, 92 with the corresponding maximum key value from each table matched.
Anyone have any ideas on how to achieve this?

One (or more) tables you are joining is empty. Change your joins to left outer join, i.e.
LEFT OUTER JOIN DimAdmission DA1 ON P1.PatientID = DA1.PatientID
for all joined tables. The empty table will have Nulls in the results set columns.
If multiple selections for the same patient ID exist, then one or more of your tables has more than one record for each patient ID. You either need to exclude these tables from the query or look at your data structure and see if there is a field which will let you select single record.

I suggest you add the ROW_NUMBER function into your WHERE clause, e.g.
WHERE ROW_NUMBER() (PARTION BY P1.patientid ORDER BY DA1.DimAdmissionPK desc,
P1.DimPatientPK desc, ... ) = 1
By "..." I mean move your entire ORDER BY clause inside the ROW_NUMBER function.
Good luck - it seems you are missing a Fact table ...

Related

How does this SQL query return results with same id_product?

I am facing a complex SQL query in some code, which is suppose to return products without duplicates (by the use of DISTINCT keywork at the beginning), here is the query:
SELECT DISTINCT p.`id_product`, p.*, product_shop.*, pl.* , m.`name` AS manufacturer_name, x.`id_feature` , x.`id_feature_value` , s.`name` AS supplier_name
FROM `ps_product` p
INNER JOIN ps_product_shop product_shop
ON (product_shop.id_product = p.id_product AND product_shop.id_shop = 1)
LEFT JOIN `ps_product_attribute` y ON (y.`id_product` = p.`id_product`)
LEFT JOIN `ps_product_attribute_combination` ac ON (y.`id_product_attribute` = ac.`id_product_attribute`)
LEFT JOIN `ps_product_lang` pl ON (p.`id_product` = pl.`id_product` AND pl.id_shop = 1 )
LEFT JOIN `ps_manufacturer` m ON (m.`id_manufacturer` = p.`id_manufacturer`)
LEFT JOIN `ps_feature_product` x ON (x.`id_product` = p.`id_product`)
LEFT JOIN `ps_supplier` s ON (s.`id_supplier` = p.`id_supplier`)
LEFT JOIN `ps_category_product` c ON (c.`id_product` = p.`id_product`)
WHERE pl.`id_lang` = 1 AND c.`id_category` = 18 AND p.`price` between 0 and 1000
AND product_shop.`visibility` IN ("both", "catalog") AND product_shop.`active` = 1
ORDER BY p.`id_product` ASC LIMIT 1,4
But it returns 4 product with 2 products with same "id_product" (11941)
What I need is to return 4 products but of different ids each.
Anyone ?
Thanks a lot
Aymeric
[EDIT]
The result of this query shows 4 rows, with 2 having the same exact columns values EXCEPT for the id_feature_value column which 36 for one and 38 for the other.
SELECT DISTINCT gets all the distinct combinations of all selected fields in your query, not just the first field.
Now, you could solve that by using GROUP BY to select only distinct values of id_product specifically, like:
SELECT p.`id_product`, p.*, product_shop.*, pl.* , m.`name` AS manufacturer_name, x.`id_feature` , x.`id_feature_value` , s.`name` AS supplier_name
FROM `ps_product` p
INNER JOIN ps_product_shop product_shop
ON (product_shop.id_product = p.id_product AND product_shop.id_shop = 1)
LEFT JOIN `ps_product_attribute` y ON (y.`id_product` = p.`id_product`)
LEFT JOIN `ps_product_attribute_combination` ac ON (y.`id_product_attribute` = ac.`id_product_attribute`)
LEFT JOIN `ps_product_lang` pl ON (p.`id_product` = pl.`id_product` AND pl.id_shop = 1 )
LEFT JOIN `ps_manufacturer` m ON (m.`id_manufacturer` = p.`id_manufacturer`)
LEFT JOIN `ps_feature_product` x ON (x.`id_product` = p.`id_product`)
LEFT JOIN `ps_supplier` s ON (s.`id_supplier` = p.`id_supplier`)
LEFT JOIN `ps_category_product` c ON (c.`id_product` = p.`id_product`)
WHERE pl.`id_lang` = 1 AND c.`id_category` = 18 AND p.`price` between 0 and 1000
AND product_shop.`visibility` IN ("both", "catalog") AND product_shop.`active` = 1
GROUP BY p.`id_product`
ORDER BY p.`id_product` ASC LIMIT 1,4
However, the problem now is that your query has multiple different values of all the other fields you are selected to choose from, and no deterministic way to pick from them. Even though the id_product is unique in it's table, it's not unique in the result set because in at least one of your JOINs there is a one-to-many relationship, meaning there are several rows that match the JOIN conditions.
On older versions of MySQL, it will just pick the first value it finds in this case, but on SQL Server it will actually error out and tell you that the remaining fields either have to be mentioned in the GROUP BY clause, or they have to be aggregated. So, you've got a few ways you can go from here:
You are on an old version of MySQL and you don't particularly care which values are returned for the rest of the fields, so leave the query as I've posted and use that. I wouldn't recommend this, as it's undefined behaviour so in theory it could change at MySQL's whim. All the values returned will be from the same result row though.
Add aggregate functions, such as MIN() or MAX() to the rest of the remaining fields in the select clause. This will reduce the possible values for the fields down to one, but you will probably end up with a mixture of values from different rows.
Remove any one-to-many JOINs from your query so that you only ever get one row back in the result set for each individual id_product. Then, fetch the remaining data you need in a separate query.
There may be other alternative solutions, but it depends a lot on which values you want returned for the rest of the rows and what RDBMS you are using. For example, on SQL Server you could potentially make use of PARTITION BY to select the first row for each distinct id_product deterministically.

SELECT statement where rows are omitted based on another table

Table with orders has another table with positions. I want the orders table to show but then only have the most up to-date position on it. Below is a picture of the 3 rows I want showing. Omit the rest.
SELECT DispatchTable.ordernumber, DispatchTable.truck,
DispatchTable.driver, DispatchTable.actualpickup,
DispatchTable.actualdropoff, orders.pickupdateandtime,
orders.dropoffdateandtime, Truck002.lastposition,
Truck002.lastdateandtime
FROM DispatchTable
INNER JOIN orders ON DispatchTable.ordernumber = orders.id
INNER JOIN Truck002 ON DispatchTable.truck = Truck002.name
WHERE (orders.status = 'onRoute')
Assuming that you want the row having the latest lastdateandtime for the truck name, this should work:
SELECT DispatchTable.ordernumber,
DispatchTable.truck,
DispatchTable.driver,
DispatchTable.actualpickup,
DispatchTable.actualdropoff,
orders.pickupdateandtime,
orders.dropoffdateandtime,
TruckLatest.lastposition,
TruckLatest.lastdateandtime
FROM DispatchTable
INNER JOIN orders ON DispatchTable.ordernumber = orders.id
INNER JOIN (SELECT name,
lastposition,
lastdateandtime
FROM Truck002 Truck1
WHERE lastdateandtime =
(SELECT MAX(lastdateandtime)
FROM Truck002 Truck2
WHERE Truck2.name = Truck1.name)) TruckLatest
ON DispatchTable.truck = TruckLatest.name
WHERE (orders.status = 'onRoute')
If I understand correctly, you can get the most recent record for a truck using ROW_NUMBER():
SELECT dt.ordernumber, dt.truck,
dt.driver, dt.actualpickup,
dt.actualdropoff, o.pickupdateandtime,
o.dropoffdateandtime, t.lastposition,
t.lastdateandtime
FROM DispatchTable dt INNER JOIN
orders o
ON dt.ordernumber = o.id INNER JOIN
(SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.name ORDER BY t.lastdateandtime DESC) as seqnum
FROM Truck002 t
) t
ON dt.truck = t.name
WHERE o.status = 'onRoute' AND seqnum = 1;
Firstly, why are you using Truck002's name field rather than its id field as the link to DispacthTable? This is considered a less efficient way of doing it than using id (which is either a numerical field or a shorter string than name).
Secondly, you should mention in your Question that each Order can have many DispatchTable's and that each DispacthTable can have many Truck002's, otherwise many people will start by assuming that it is the other way round between DispatchTable and Truck002.
Thirdly, please try...
SELECT DispatchTable.ordernumber,
DispatchTable.truck,
DispatchTable.driver,
DispatchTable.actualpickup,
DispatchTable.actualdropoff,
orders.pickupdateandtime,
orders.dropoffdateandtime,
Truck002.lastposition,
Truck002.lastdateandtime
FROM DispatchTable
INNER JOIN orders ON DispatchTable.ordernumber = orders.id
INNER JOIN Truck002 ON DispatchTable.truck = Truck002.name
WHERE (orders.status = 'onRoute')
GROUP BY ordernumber
HAVING lastdateandtime = MAX( lastdateandtime )
If you have any questions or comments, then please feel free to post a Comment accordingly.
Further Reading
https://msdn.microsoft.com/en-us/library/bb177906(v=office.12).aspx (on HAVING)
https://www.w3schools.com/sql/sql_having.asp (on HAVING)
https://msdn.microsoft.com/en-us/library/bb177905(v=office.12).aspx (on GROUP BY)
https://www.w3schools.com/sql/sql_groupby.asp (on GROUP BY)

SQL Query to retrieve single record per filter

I have the following query:
SELECT min(salesorder.SOM_SalesOrderID) AS salesorder,
Item.IMA_ItemID,
Item.IMA_ItemName,
Customer.CUS_CorpName,
WK.WKO_WorkOrderID,
min(WK.WKO_OrigRequiredDate),
WK.WKO_WorkOrderTypeCode,
min(WK.WKO_RequiredDate),
max(WK.WKO_LastWorkDate),
min(wk.WKO_RequiredQty),
wk.WKO_MatlIssueDate,
min(SalesOrderDelivery.SOD_RequiredQty),
Item.IMA_ItemTypeCode,
Item.IMA_OnHandQty,
min(SalesOrderDelivery.SOD_PromiseDate),
min(WO.woo_operationseqID) AS seqid
FROM SalesOrder
INNER JOIN SalesOrderLine ON SalesOrder.SOM_RecordID = SalesOrderLine.SOI_SOM_RecordID
INNER JOIN SalesOrderDelivery ON SalesOrderLine.SOI_RecordID = SalesOrderDelivery.SOD_SOI_RecordID,
WO.
INNER JOIN Item ON SalesOrderLine.SOI_IMA_RecordID = Item.IMA_RecordID
INNER JOIN WKO wk ON Item.IMA_ItemID = WK.WKO_ItemID
INNER JOIN Customer ON SalesOrder.SOM_CUS_RecordID = Customer.CUS_RecordID
INNER JOIN woo WO ON WO.WOO_WorkOrderID = WK.WKO_WorkOrderID
WHERE wk.WKO_StatusCode = 'Released'
AND WO.WOO_StatusCode IS NULL
AND SalesOrderDelivery.SOD_ShipComplete = 'false'
GROUP BY WK.WKO_WorkOrderID,
Item.IMA_ItemID,
Item.IMA_ItemName,
Customer.CUS_CorpName,
WK.WKO_WorkOrderTypeCode,
wk.WKO_MatlIssueDate,
Item.IMA_ItemTypeCode,
Item.IMA_OnHandQty
I need 1 record returned for each wk.wko_workorderid. There is a field that is not included that I'm not sure how to get. I need to retrieve the woo.woo_workcenterid that corresponds to min(WO.woo_operationseqID)as seqid. I cannot include it in the general query since there are multiple workcenterids in the table and I only want the specific one that is part of the min operation sequence record.
Any help would be appreciated.

Joining top records in T-SQL

SELECT MD.*, Contact.FirstName
FROM MerchantData MD
JOIN Merchant M ON M.MerchID = MD.MerchID
JOIN (SELECT TOP 1 * FROM Location WHERE Location.BusID = MD.BusID) L ON L.BusID=MD.BusID
AND L.Deleted = 0
JOIN Contact ON Contact.ContactID = L.PrincipalID
I am using SQLSERVER 2008 and trying to write this SQL statement. There is some times multiple locations for a busid and I want to join in only the first found. I am getting an error on the part "Location.BusID = MD.BusID" as MD.BusID cannot be bound. Is it possible to use the MD table in the nested select statment in this join or is there another way of accomplishing this?
I am contiplating putting the data using nested querys in the column list to grab the contact data driectly there.
It would be simpler I think to have a subquery of the full result set:
SELECT MD.*, Contact.FirstName
FROM MerchantData MD
JOIN Merchant M ON M.MerchID = MD.MerchID
JOIN (SELECT BusID, MAX(PrincipalID)
FROM Location
WHERE Deleted = 0
GROUP BY BusID) L ON L.BusID=MD.BusID
JOIN Contact ON Contact.ContactID = L.PrincipalID
You still get one record per BusID in the JOIN but it's not correlated.
SELECT MD.*, Contact.FirstName
FROM MerchantData MD
JOIN Merchant M ON M.MerchID = MD.MerchID
CROSS APPLY (SELECT TOP 1 * FROM Location WHERE BusID = MD.BusID AND DELETED = 0) L
JOIN Contact ON Contact.ContactID = L.PrincipalID
This is a case of the "top n per group" problem. This question will guide you:
SQL Server query select 1 from each sub-group
You'll want to be doing something like this:
SELECT MD.* ,
Contact.FirstName
FROM MerchantData MD
JOIN Merchant M ON M.MerchID = MD.MerchID
JOIN ( select * ,
seq = rank() over( partition by BusID order by BusID , ... )
from Location
where L.Deleted = 0
) L on L.BusID = MD.BusID
and seq = 1
JOIN Contact ON Contact.ContactID = L.PrincipalID
The virtual table expression should return at most 1 Location per BusID (0 if the BusID has no non-deleted Locations).
To try and isolate out the error I would try. See if it can match Location.BusID = MD.BusID.
SELECT MD.*, Contact.FirstName
FROM MerchantData MD
JOIN Merchant M ON M.MerchID = MD.MerchID
JOIN Location On Location.BusID = MD.BusID
You do not use the * so use
SELECT TOP 1 Location.BusID FROM Location WHERE Location.BusID = MD.BusID
Once you get the syntax working.
You do know that once you get this working it will only match if the "first" row and then check if it is deleted. The problem is that without an order by the "first" row is arbitrary. Even with a clustered index on a table there is no guaranteed sort without an order by clause. To get a repeatable answer you need a sort. But if you are sorting and only want the top row then a MAX or MIN and a group be more straight forward.
If you want just business that have one or more deleted locations then the following should work but you need to break out the columns for the group by. If two deleted locations have differenct contact name then it will report each. So, this may not be what you are looking for.
SELECT MD.col1, MD.col2, Contact.FirstName
FROM MerchantData MD
JOIN Merchant M ON M.MerchID = MD.MerchID
JOIN Location L
ON L.BusID = MD.BusID
AND L.Deleted = 0
JOIN Contact ON Contact.ContactID = L.PrincipalID
GROUP BY MD.col1, MD.col2, Contact.FirstName

Max date in view on left outer join

Thanks to a previous question, I found out how to pull the most recent data based on a linked table. BUT, now I have a related question.
The solution that I found used row_number() and PARTITION to pull the most recent set of data. But what if there's a possibility for zero or more rows in a linked table in the view? For example, the table FollowUpDate might have 0 rows, or 1, or more. I just want the most recent FollowUpDate:
SELECT
EFD.FormId
,EFD.StatusName
,MAX(EFD.ActionDate)
,EFT.Name AS FormType
,ECOA.Account AS ChargeOffAccount
,ERC.Name AS ReasonCode
,EAC.Description AS ApprovalCode
,MAX(EFU.FollowUpDate) AS FollowUpDate
FROM (
SELECT EF.FormId, EFD.ActionDate, EFS.Name AS StatusName, EF.FormTypeId, EF.ChargeOffId, EF.ReasonCodeId, EF.ApprovalCodeId,
row_number() OVER ( PARTITION BY EF.FormId ORDER BY EFD.ActionDate DESC ) DateSortKey
FROM Extension.FormDate EFD INNER JOIN Extension.Form EF ON EFD.FormId = EF.FormId INNER JOIN Extension.FormStatus EFS ON EFD.StatusId = EFS.StatusId
) EFD
INNER JOIN Extension.FormType EFT ON EFD.FormTypeId = EFT.FormTypeId
LEFT OUTER JOIN Extension.ChargeOffAccount ECOA ON EFD.ChargeOffId = ECOA.ChargeOffId
LEFT OUTER JOIN Extension.ReasonCode ERC ON EFD.ReasonCodeId = ERC.ReasonCodeId
LEFT OUTER JOIN Extension.ApprovalCode EAC ON EFD.ApprovalCodeId = EAC.ApprovalCodeId
LEFT OUTER JOIN (Select EFU.FormId, EFU.FollowUpDate, row_number() OVER (PARTITION BY EFU.FormId ORDER BY EFU.FollowUpDate DESC) FUDateSortKey FROM Extension.FormFollowUp EFU INNER JOIN Extension.Form EF ON EFU.FormId = EF.FormId) EFU ON EFD.FormId = EFU.FormId
WHERE EFD.DateSortKey = 1
GROUP BY
EFD.FormId, EFD.ActionDate, EFD.StatusName, EFT.Name, ECOA.Account, ERC.Name, EAC.Description, EFU.FollowUpDate
ORDER BY
EFD.FormId
If I do a similar pull using row_number() and PARTITION, I get the data only if there is at least one row in FollowUpDate. Kinda defeats the purpose of a LEFT OUTER JOIN. Can anyone help me get this working?
I rewrote your query - you had unnecessary subselects, and used row_number() for the FUDateSortKey but didn't use the column:
SELECT t.formid,
t.statusname,
MAX(t.actiondate) 'actiondate',
t.formtype,
t.chargeoffaccount,
t.reasoncode,
t.approvalcode,
MAX(t.followupdate) 'followupdate'
FROM (
SELECT t.formid,
fs.name 'StatusName',
t.actiondate,
ft.name 'formtype',
coa.account 'ChargeOffAccount',
rc.name 'ReasonCode',
ac.description 'ApprovalCode',
ffu.followupdate,
row_number() OVER (PARTITION BY ef.formid ORDER BY t.actiondate DESC) 'DateSortKey'
FROM EXTENSION.FORMDATE t
JOIN EXTENSION.FORM ef ON ef.formid = t.formid
JOIN EXTENSION.FORMSTATUS fs ON fs.statusid = t.statusid
JOIN EXTENSION.FORMTYPE ft ON ft.formtypeid = ef.formtypeid
LEFT JOIN EXTENSION.CHARGEOFFACCOUNT coa ON coa.chargeoffid = ef.chargeoffid
LEFT JOIN EXTENSION.REASONCODE rc ON rc.reasoncodeid = ef.reasoncodeid
LEFT JOIN EXTENSION.APPROVALCODE ac ON ac.approvalcodeid = ef.approvalcodeid
LEFT JOIN EXTENSION.FORMFOLLOWUP ffu ON ffu.formid = t.formid) t
WHERE t.datesortkey = 1
GROUP BY t.formid, t.statusname, t.formtype, t.chargeoffaccount, t.reasoncode, t.approvalcode
ORDER BY t.formid
The change I made to allow for FollowUpDate was to use a LEFT JOIN onto the FORMFOLLOWUP table - you were doing an INNER JOIN, so you'd only get rows with FORMFOLLOWUP records associated.
It's pretty hard to guess what's going on without table definitions and sample data.
Also, this is confusing: "the table FollowUpDate might have 0 rows" and you "want the most recent FollowUpDate." (especially when there is no table named FollowUpDate) There is no "most recent FollowUpDate" if there are zero FollowUpDates.
Maybe you want
WHERE <follow up date row number> in (1,NULL)
I figured it out. And as usual, I need a nap. I just needed to change my subselect to something I would swear I'd tried with no success:
SELECT field1, field2
FROM Table1 t1
LEFT JOIN (
SELECT field3, max(dateColumn)
FROM Table2
GROUP BY
field3
) t2
ON (t1.field1 = t2.field3)