Remove multiple rows with same ID - sql

So I've done some looking around and wasn't unable to find quite what I was looking for. I have two tables.
1.) Table where general user information is stored
2.) Where a status is generated and stored.
The problem is, is that there are multiple rows for the same users and querying these results in multiple returns. I can't just merge them because they aren't all the same status. I need just the newest status from that table.
Example of the table:
SELECT DISTINCT
TOP(50) cam.UserID AS PatientID,
mppi.DisplayName AS Surgeon,
ISNULL(sci.IOPStatus, 'N/A') AS Status,
tkstat.TrackerStatusID AS Stat_2
FROM
Main AS cam
INNER JOIN
Providers AS rap
ON cam.VisitID = rap.VisitID
INNER JOIN
ProviderInfo AS mppi
ON rap.UnvUserID = mppi.UnvUserID
LEFT OUTER JOIN
Inop AS sci
ON cam.CwsID = sci.CwsID
LEFT OUTER JOIN
TrackerStatus AS tkstat
ON cam.CwsID = tkstat.CwsID
WHERE
(
cam.Location_ID IN
(
'SURG'
)
)
AND
(
rap.IsAttending = 'Y'
)
AND
(
cam.DateTime BETWEEN CONCAT(CAST(GETDATE() AS DATE), ' 00:00:00') AND CONCAT(CAST(GETDATE() AS DATE), ' 23:59:59')
)
AND
(
cam.Status_StatusID != 'Cancelled'
)
ORDER BY
cam.UserID ASC
So I need to grab only the newest Stat_2 from each ID so they aren't returning multiple rows. Each Stat_2 also has an update time meaning I can sort by the time/date that column is : StatusDateTime

One way to handle this is to create a calculated row_number for the table where you need the newest record.
Easiest way to do that is to change your TKSTAT join to a derived table with the row_number calculation and then add a constraint to your join where the RN =1
SELECT DISTINCT TOP (50)
cam.UserID AS PatientID, mppi.DisplayName AS Surgeon, ISNULL(sci.IOPStatus, 'N/A') AS Status, tkstat.TrackerStatusID AS Stat_2
FROM Main AS cam
INNER JOIN Providers AS rap ON cam.VisitID = rap.VisitID
INNER JOIN ProviderInfo AS mppi ON rap.UnvUserID = mppi.UnvUserID
LEFT OUTER JOIN Inop AS sci ON cam.CwsID = sci.CwsID
LEFT OUTER JOIN (SELECT tk.CwsID, tk.TrackerStatusId, ROW_NUMBER() OVER (PARTITION BY tk.cwsId ORDER BY tk.CreationDate DESC) AS rn FROM TrackerStatus tk)AS tkstat ON cam.CwsID = tkstat.CwsID
AND tkstat.rn = 1
WHERE (cam.Location_ID IN ( 'SURG' )) AND (rap.IsAttending = 'Y')
AND (cam.DateTime BETWEEN CONCAT(CAST(GETDATE() AS DATE), ' 00:00:00') AND CONCAT(CAST(GETDATE() AS DATE), ' 23:59:59'))
AND (cam.Status_StatusID != 'Cancelled')
ORDER BY cam.UserID ASC;
Note you need a way to derive what the "newest" status is; I assume there is a created_date or something; you'll need to enter the correct colum name
ROW_NUMBER() OVER (PARTITION BY tk.cwsId ORDER BY tk.CreationDate DESC) AS rn

SQL Server doesn't offer a FIRST function, but you can reproduce the functionality with ROW_NUMBER() like this:
With Qry1 (
Select <other columns>,
ROW_NUMBER() OVER(
PARTITION BY <group by columns>
ORDER BY <time stamp column*> DESC
) As Seq
From <the rest of your select statement>
)
Select *
From Qry1
Where Seq = 1
* for the "newest" record.

Related

multiple joins to same sub query

I need to join to the same table repeatedly but this looks ugly. Any suggestions appreciated. Below is simplified SQL, I have 8 tables in the subquery and it produces many duplicate records of the same date, so I need to find only the newest record for each client. (I don't think DB and/or version should matter, but I am using DB2 11.1 LUW)
select c.client_num, a.eff_date, t.trx_date
from client c
join address a on(a.id = c.addr_id)
join transaction t on(t.addr_id = a.id)
{many other joins}
where {many conditions};
select SQ.*
from [ABOVE_SUBQUERY] SQ
join
(select client_num, max(eff_date) AS newest_date from [ABOVE_SUBQUERY] group by client_num) AA
ON(SQ.client_num = AA.client_num and SQ.eff_date = AA.newest_date)
join
(select client_num, max(trx_date) AS newest_date from [ABOVE_SUBQUERY] group by client_num) TT
ON(SQ.client_num = TT.client_num and SQ.trx_date = TT.newest_date)
I need to fine only the newest record for each client.
Can't you just use row_number()?
select t.*
from (select t.*,
row_number() over (partition by client_num order by eff_date desc, eff_time desc) as seqnum
from <whatever> t
) t
where seqnum = 1;

Sub-Query Should return Null when comparing it to the Next rows from the different table

My T-SQL subquery returning more than 1 rows:
WITH ReportView AS
(
SELECT
PMA.AssetName, ReleaseDt, ExpiresDt, TicketNumber, ChangeDt,
ChangeReasonCd,
ROW_NUMBER() OVER (PARTITION BY PMA.AssetName, ReleaseDt, ExpiresDt, TicketNumber, ChangeReasonCd
ORDER BY ChangeDt ASC) AS ROW_NUM
FROM
pmm.pmmreleaserequest PRR WITH (nolock)
LEFT JOIN
pmm.PmmManagedAccount AS PMA WITH (nolock) ON PRR.ManagedAccountID = PMA.ManagedAccountID
LEFT JOIN
dbo.ManagedEntity AS ME WITH (nolock) ON PRR.ManagedSystemID = ME.ManagedEntityID
LEFT JOIN
dbo.Asset AS AST WITH (nolock) ON ME.AssetID = AST.AssetID
LEFT JOIN
pmm.PmmLogChange AS PLC WITH (nolock) ON PRR.ManagedAccountID = PLC.ManagedAccountID
AND PRR.ExpiresDt < PLC.ChangeDt
)
SELECT *
FROM ReportView
WHERE ROW_NUM = 1
How to compare current row of ChangeDt with the next row of ReleaseDt. Example ChangeDt(current row) < ReleaseDt(Next row)
How can I put this condition.
This is how row_num is defined:
ROW_NUMBER() OVER (PARTITION BY PMA.AssetName, ReleaseDt, ExpiresDt, TicketNumber, ChangeReasonCd
ORDER BY ChangeDt ASC) AS ROW_NUM
It is going to return one row per unique combination of the columns in the PARTITION BY. That is one row per PMA.AssetName / ReleaseDt / ExpiresDt / TicketNumber / ChangeReasonCd.
I would expect there to be more than one such combination.
If you want only one row, then use SELECT TOP (1) or OFFSET/FETCH.

Joining Onto Partition Statement HANA SQL

I am trying to join two tables to the following query:
SELECT "NUMBER",
"U_ANALYZED_DATE",
"DV_SALES_ACCOUNT",
"U_USD_TOTAL_POTENTIAL_NNACV"
FROM
(select *, row_number() over ( partition by "DV_SALES_ACCOUNT" order by "U_ANALYZED_DATE" desc ) rownum
from "SURF_RT"."SALES_REQUEST")
WHERE rownum = 1
AND "DV_SALES_CATEGORY" = 'Compliance'
AND "DV_STATE" NOT IN ('Closed Canceled')
AND (YEAR("U_ANALYZED_DATE") = '2019' AND MONTH("U_ANALYZED_DATE") IN ('10','11','12')
OR YEAR("U_ANALYZED_DATE") = '2020' AND MONTH("U_ANALYZED_DATE") IN ('1','2','3'))
AND "U_USD_TOTAL_POTENTIAL_NNACV" > 0
ORDER BY "U_ANALYZED_DATE" desc
The tables should be joined as follows:
JOIN "SURF_RT"."SALES_ACCOUNT" on "SURF_RT"."SALES_ACCOUNT"."NAME" = "SURF_RT"."SALES_REQUEST"."DV_SALES_ACCOUNT"
JOIN "SURF_RT"."SALES_CONTRACT" on "SURF_RT"."SALES_CONTRACT"."DV_ACCOUNT" = "SURF_RT"."SALES_REQUEST"."DV_SALES_ACCOUNT"
I am getting an error no matter what I try and it has to be because of the partition. Does anyone know the solution here?
I suspect that you just need to alias the derived table so you can then refer to join it. In many databases this is mandatory anyway, but apparently not in Hana (otherwise your original query would not run).
But to join on a derived table (the resultset that is generated by the subquery), alias do help:
SELECT
...
FROM
(
select
*,
row_number() over ( partition by "DV_SALES_ACCOUNT" order by "U_ANALYZED_DATE" desc ) rownum
from "SURF_RT"."SALES_REQUEST"
) sr -- table alias
JOIN "SURF_RT"."SALES_ACCOUNT"
ON "SURF_RT"."SALES_ACCOUNT"."NAME" = "SURF_RT"."SALES_REQUEST"."DV_SALES_ACCOUNT"
JOIN "SURF_RT"."SALES_CONTRACT"
ON "SURF_RT"."SALES_CONTRACT"."DV_ACCOUNT" = sr."DV_SALES_ACCOUNT" --reference to the derived table
WHERE
...
Side notes:
you should use table aliases for other tables involved in the query too, in order to make the code more readable
you should also prefix each column in the query with the identifier of the table it belongs to, so the query is unambiguous (and easier to maintain)
The problem with this query is not "that the where rownum = 1 needs to be at the end" but that the OP got confused by the bracketing of SQL expression.
More specifically, trying to reference the sub-query data by specifying join conditions against the base table that is used in the sub-query:
JOIN "SURF_RT"."SALES_ACCOUNT"
on "SURF_RT"."SALES_ACCOUNT"."NAME" = "SURF_RT"."SALES_REQUEST"."DV_SALES_ACCOUNT"
JOIN "SURF_RT"."SALES_CONTRACT"
on "SURF_RT"."SALES_CONTRACT"."DV_ACCOUNT" = "SURF_RT"."SALES_REQUEST"."DV_SALES_ACCOUNT"
Since the sub-query (derived table) is used in the query and should be used for the join, it needs to be referred in the join condition, instead.
So, yes, it needs a table alias here and the join conditions need to refer to it.
SELECT
...
FROM
(select *
, row_number() over
(partition by "DV_SALES_ACCOUNT"
order by "U_ANALYZED_DATE" desc) rownum
from
"SURF_RT"."SALES_REQUEST") sr
INNER JOIN "SURF_RT"."SALES_ACCOUNT" sa
on sa."NAME" = sr."DV_SALES_ACCOUNT"
INNER JOIN "SURF_RT"."SALES_CONTRACT" sc
on sc."DV_ACCOUNT" = sr."DV_SALES_ACCOUNT"
WHERE
sr.rownum = 1
AND "DV_SALES_CATEGORY" = 'Compliance'
AND "DV_STATE" NOT IN ('Closed Canceled')
AND (YEAR("U_ANALYZED_DATE") = '2019'
AND MONTH("U_ANALYZED_DATE") IN ('10','11','12')
OR YEAR("U_ANALYZED_DATE") = '2020'
AND MONTH("U_ANALYZED_DATE") IN ('1','2','3'))
AND "U_USD_TOTAL_POTENTIAL_NNACV" > 0
ORDER BY
"U_ANALYZED_DATE" desc;
With just this little bit of standard SQL syntax and formatting of code the query got a lot easier to understand.
Now it's even obvious that the IN conditions for MONTH and YEAR should in fact be integers, not strings as these functions return integers.
SELECT
...
FROM
(SELECT *
, row_number() over
(partition by "DV_SALES_ACCOUNT"
order by "U_ANALYZED_DATE" desc) rownum
FROM
"SURF_RT"."SALES_REQUEST") sr
INNER JOIN "SURF_RT"."SALES_ACCOUNT" sa
on sa."NAME" = sr."DV_SALES_ACCOUNT"
INNER JOIN "SURF_RT"."SALES_CONTRACT" sc
on sc."DV_ACCOUNT" = sr."DV_SALES_ACCOUNT"
WHERE
sr.rownum = 1
AND "DV_SALES_CATEGORY" = 'Compliance'
AND "DV_STATE" NOT IN ('Closed Canceled')
AND ( YEAR("U_ANALYZED_DATE") = 2019
AND MONTH("U_ANALYZED_DATE") IN (10, 11, 12)
OR YEAR("U_ANALYZED_DATE") = '2020'
AND MONTH("U_ANALYZED_DATE") IN (1 , 2 , 3 )
)
AND "U_USD_TOTAL_POTENTIAL_NNACV" > 0
ORDER BY
"U_ANALYZED_DATE" DESC

INNER JOIN on a Sub Query

I have a list of tasks in a table called dbo.Task
In the database, each Task can have 1 or more rows in the TaskLine table.
TaskLine has a TaskID to related the Tasklines to the Task.
A TaskLine has a column called TaskHeadingTypeID
I need to return all the tasks, joined to the LAST TaskLine for that Task.
In english, I need to display a task, with the latest TaskLine heading. So, I basically need to join to the TaskLine table, like this (which, is incorrect and maybe inefficient, but hopefully shows what I am trying to do)
SELECT *
FROM #Results r
INNER JOIN (
SELECT TOP 1 TaskID, TaskHeadingTypeID FROM dbo.TaskLine
ORDER BY TaskLineID DESC
) tl
ON tl.TaskID = r.TaskID
However, the issue is, the sub query only brings back the last TaskLine row, which is incorrect.
Edit:
At the moment, it's 'Working' like the code below, but it seems highly inefficient, because for each task row, it has to run two extra queries. And they're both on the same table, just slightly different columns in that table:
(An extract of the columns in the SELECT cause)
SELECT TaskStatusID,
TaskStatus,
(SELECT TOP 1 TaskHeadingTypeID FROM dbo.TaskLine
WHERE TaskID = r.TaskID
ORDER BY TaskLineID DESC) AS TaskHeadingID,
(SELECT TOP 1 LongName FROM dbo.TaskLine tl
INNER JOIN ref.TaskHeadingType tht
ON tht.TaskHeadingTypeID = tl.TaskHeadingTypeID
WHERE TaskID = r.TaskID
ORDER BY TaskLineID DESC) AS TaskHeading,
PersonInCareID,
ICMSPartyID,
CarerID.... FROM...
EDIT:
Thanks to the ideas and comments below, I have ended up with this, using CTE:
;WITH ValidTaskLines (RowNumber, TaskID, TaskHeadingTypeID, TaskHeadingType)
AS
(SELECT
ROW_NUMBER()OVER(PARTITION BY tl.TaskID, tl.TaskHeadingTypeID ORDER BY tl.TaskLineID) AS RowNumber,
tl.TaskID,
tl.TaskHeadingTypeID,
LongName AS TaskHeadingType
FROM dbo.TaskLine tl
INNER JOIN ref.TaskHeadingType tht
ON tht.TaskHeadingTypeID = tl.TaskHeadingTypeID
)
SELECT AssignedByBusinessUserID,
BusinessUserID,
LoginName,
Comments,
r.CreateDate,
r.CreateUser,
r.Deleted,
r.Version,
IcmsBusinessUserID,
r.LastUpdateDate,
r.LastUpdateUser,
OverrrideApprovalBusinessUserID,
PlacementID,
r.TaskID,
TaskPriorityTypeID,
TaskPriorityCode,
TaskPriorityType,
TaskStatusID,
TaskStatus,
vtl.TaskHeadingTypeID AS TaskHeadingID,
vtl.TaskHeadingType AS TaskHeading,
PersonInCareID,
ICMSPartyID,
CarerID,
ICMSCarerEntityID,
StartDate,
EndDate
FROM #Results r
INNER JOIN ValidTaskLines vtl
ON vtl.TaskID = r.TaskID
AND vtl.RowNumber = 1
You could use the ROW_NUMBER() function for this:
SELECT *
FROM #Results r
INNER JOIN (SELECT TaskID
, TaskHeadingTypeID
, ROW_NUMBER()OVER(PARTITION BY TaskID, TaskHeadingTypeID ORDER BY TAskLineID DESC) RN
FROM dbo.TaskLine
) tl
ON tl.TaskID = r.TaskID
AND t1.RN = 1
The ROW_NUMBER() function assigns a number to each row. PARTITION BY is optional, but used to start the numbering over for each value in that group, ie: if you PARTITION BY Some_Date then for each unique date value the numbering would start over at 1. ORDER BY of course is used to define how the counting should go, and is required in the ROW_NUMBER() function.
You may need to adjust the PARTITION BY to suit your query, run the subquery by itself to get an idea of how the ROW_NUMBER() works.

Limit join to one row

I have the following query:
SELECT sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount, 'rma' as
"creditType", "Clients"."company" as "client", "Clients".id as "ClientId", "Rmas".*
FROM "Rmas" JOIN "EsnsRmas" on("EsnsRmas"."RmaId" = "Rmas"."id")
JOIN "Esns" on ("Esns".id = "EsnsRmas"."EsnId")
JOIN "EsnsSalesOrderItems" on("EsnsSalesOrderItems"."EsnId" = "Esns"."id" )
JOIN "SalesOrderItems" on("SalesOrderItems"."id" = "EsnsSalesOrderItems"."SalesOrderItemId")
JOIN "Clients" on("Clients"."id" = "Rmas"."ClientId" )
WHERE "Rmas"."credited"=false AND "Rmas"."verifyStatus" IS NOT null
GROUP BY "Clients".id, "Rmas".id;
The problem is that the table "EsnsSalesOrderItems" can have the same EsnId in different entries. I want to restrict the query to only pull the last entry in "EsnsSalesOrderItems" that has the same "EsnId".
By "last" entry I mean the following:
The one that appears last in the table "EsnsSalesOrderItems". So for example if "EsnsSalesOrderItems" has two entries with "EsnId" = 6 and "createdAt" = '2012-06-19' and '2012-07-19' respectively it should only give me the entry from '2012-07-19'.
SELECT (count(*) * sum(s."price")) AS amount
, 'rma' AS "creditType"
, c."company" AS "client"
, c.id AS "ClientId"
, r.*
FROM "Rmas" r
JOIN "EsnsRmas" er ON er."RmaId" = r."id"
JOIN "Esns" e ON e.id = er."EsnId"
JOIN (
SELECT DISTINCT ON ("EsnId") *
FROM "EsnsSalesOrderItems"
ORDER BY "EsnId", "createdAt" DESC
) es ON es."EsnId" = e."id"
JOIN "SalesOrderItems" s ON s."id" = es."SalesOrderItemId"
JOIN "Clients" c ON c."id" = r."ClientId"
WHERE r."credited" = FALSE
AND r."verifyStatus" IS NOT NULL
GROUP BY c.id, r.id;
Your query in the question has an illegal aggregate over another aggregate:
sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount
Simplified and converted to legal syntax:
(count(*) * sum(s."price")) AS amount
But do you really want to multiply with the count per group?
I retrieve the the single row per group in "EsnsSalesOrderItems" with DISTINCT ON. Detailed explanation:
Select first row in each GROUP BY group?
I also added table aliases and formatting to make the query easier to parse for human eyes. If you could avoid camel case you could get rid of all the double quotes clouding the view.
Something like:
join (
select "EsnId",
row_number() over (partition by "EsnId" order by "createdAt" desc) as rn
from "EsnsSalesOrderItems"
) t ON t."EsnId" = "Esns"."id" and rn = 1
this will select the latest "EsnId" from "EsnsSalesOrderItems" based on the column creation_date. As you didn't post the structure of your tables, I had to "invent" a column name. You can use any column that allows you to define an order on the rows that suits you.
But remember the concept of the "last row" is only valid if you specifiy an order or the rows. A table as such is not ordered, nor is the result of a query unless you specify an order by
Necromancing because the answers are outdated.
Take advantage of the LATERAL keyword introduced in PG 9.3
left | right | inner JOIN LATERAL
I'll explain with an example:
Assuming you have a table "Contacts".
Now contacts have organisational units.
They can have one OU at a point in time, but N OUs at N points in time.
Now, if you have to query contacts and OU in a time period (not a reporting date, but a date range), you could N-fold increase the record count if you just did a left join.
So, to display the OU, you need to just join the first OU for each contact (where what shall be first is an arbitrary criterion - when taking the last value, for example, that is just another way of saying the first value when sorted by descending date order).
In SQL-server, you would use cross-apply (or rather OUTER APPLY since we need a left join), which will invoke a table-valued function on each row it has to join.
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
-- CROSS APPLY -- = INNER JOIN
OUTER APPLY -- = LEFT JOIN
(
SELECT TOP 1
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(#in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(#in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
) AS FirstOE
In PostgreSQL, starting from version 9.3, you can do that, too - just use the LATERAL keyword to achieve the same:
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
LEFT JOIN LATERAL
(
SELECT
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(__in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(__in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
LIMIT 1
) AS FirstOE
Try using a subquery in your ON clause. An abstract example:
SELECT
*
FROM table1
JOIN table2 ON table2.id = (
SELECT id FROM table2 WHERE table2.table1_id = table1.id LIMIT 1
)
WHERE
...