SQL - Return most recent row from history table

I am working with Salesforce Lead and Contact data ('people') and parsing the record ownership history to see whether a certain type of user currently owns, or previously owned, one of these people.
One or more such users may have owned a record; when there are several, I want to return only the most recent owner's info (based on assignment date).
Here is my SQL, which returns duplicate rows for each Lead/Contact record:
with history as (
select
id,
leadid as lead_or_contact_id,
createddate,
field,
oldvalue,
newvalue
from salesforce_leadhistory
union
select
id,
contactid as lead_or_contact_id,
createddate,
field,
oldvalue,
newvalue
from salesforce_contacthistory
)
select
h.createddate as assignment_date,
h.lead_or_contact_id,
coalesce(old_user.name, new_user.name) as user_name,
coalesce(old_user.id, new_user.id) as user_id,
coalesce(old_user_role.name, new_user_role.name) as user_role
from history h
left join salesforce_user new_user on new_user.id = h.newvalue
left join salesforce_userrole new_user_role on new_user_role.id = new_user.userroleid
left join salesforce_user old_user on old_user.id = h.oldvalue
left join salesforce_userrole old_user_role on old_user_role.id = old_user.userroleid
where
field = 'Owner'
and ( old_user_role.name like 'SDR%' or new_user_role.name like 'SDR%' )
This returns, for example, two rows that refer to the same person record:
assignment_date lead_or_contact_id user_name user_id user_role
2020-01-01T00:42:37.000+00:00 00QXYZ Joe Jones 123 SDR - EMEA
2020-10-14T03:25:39.000+00:00 00QXYZ Max Clark 456 SDR - USA
In this instance, I would prefer all of the data from the second row (assignment date of 2020-10-14, user_name of Max Clark, user_id 456, and user_role "SDR - USA") to be returned, since that is the most recent assignment history.
How can I accomplish this?

You can use row_number() (keep the history CTE from your query in front of this select):
select * from
(
select
h.createddate as assignment_date,
h.lead_or_contact_id,
coalesce(old_user.name, new_user.name) as user_name,
coalesce(old_user.id, new_user.id) as user_id,
coalesce(old_user_role.name, new_user_role.name) as user_role,
row_number() over (partition by h.lead_or_contact_id order by h.createddate desc) as rn
from history h
left join salesforce_user new_user on new_user.id = h.newvalue
left join salesforce_userrole new_user_role on new_user_role.id = new_user.userroleid
left join salesforce_user old_user on old_user.id = h.oldvalue
left join salesforce_userrole old_user_role on old_user_role.id = old_user.userroleid
where
field = 'Owner'
and ( old_user_role.name like 'SDR%' or new_user_role.name like 'SDR%' )
)A where rn=1
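
As a rough, runnable illustration of the row_number() approach, here is a minimal sketch that uses Python's built-in sqlite3 module purely as a test harness. It assumes a SQLite build of 3.25 or newer (for window functions). The owner_history table is a hypothetical, flattened stand-in for the joined history/user/role result, and the third sample row is made up.

```python
import sqlite3

# Hypothetical, flattened stand-in for the joined history/user/role rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owner_history (
        lead_or_contact_id TEXT,
        assignment_date    TEXT,
        user_name          TEXT,
        user_role          TEXT
    );
    INSERT INTO owner_history VALUES
        ('00QXYZ', '2020-01-01T00:42:37', 'Joe Jones', 'SDR - EMEA'),
        ('00QXYZ', '2020-10-14T03:25:39', 'Max Clark', 'SDR - USA'),
        ('00QABC', '2020-05-05T10:00:00', 'Ann Lee',   'SDR - USA');
""")

# Number the rows of each record from newest to oldest, then keep row 1.
latest = conn.execute("""
    SELECT lead_or_contact_id, assignment_date, user_name, user_role
    FROM (
        SELECT h.*,
               ROW_NUMBER() OVER (
                   PARTITION BY lead_or_contact_id
                   ORDER BY assignment_date DESC
               ) AS rn
        FROM owner_history h
    ) AS ranked
    WHERE rn = 1
    ORDER BY lead_or_contact_id
""").fetchall()

for row in latest:
    print(row)
```

Each lead_or_contact_id comes back exactly once, carrying every column from its newest assignment row.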

Related

Return Duplicate emails along with User Ids that are different

I'm running into an issue with a duplicate query and I hope you can help.
Essentially, I want to find and list the duplicate emails that are associated with different user IDs.
My query is:
select UserId, acitveid, email, userstatusid
from (select u.UserId, u.acitveid, cd.email, u.userstatusid,
count(*)over (partition by cd.email) as cnt
from ContactDetails cd
join UserContactDetails ucd on ucd.ContactDetailsId = cd.ContactDetailsId
join dbo.[User] u on u.UserId = ucd.UserId ) ua
where cnt >1
The issue I have with the above query is that it is returning the same userids for some of the results so it looks like:
Userid AcitveId email UserStatusid
123 1 abc@123.com 1
123 1 abc@123.com 1
135 1 efg@123.com 1
142 1 efg@123.com 1
The results I'm looking for are simply:
Userid AcitveId email UserStatusid
135 1 efg@123.com 1
142 1 efg@123.com 1
WITH base AS (
SELECT DISTINCT u.UserId
,u.acitveid
,cd.email
,u.userstatusid
FROM ContactDetails cd
JOIN UserContactDetails ucd ON ucd.ContactDetailsId = cd.ContactDetailsId
JOIN dbo.[User] u ON u.UserId = ucd.UserId
)
,duplicate_emails AS (
SELECT email
,count(userId) AS cnt
FROM base
GROUP BY email
HAVING count(userId) > 1
)
SELECT b.*
FROM base b
JOIN duplicate_emails de ON b.email = de.email
A self join across Email = email and id <> id would work fine here. That said, your request and lack of sample data means that we are largely guessing based off the query and sample output you have provided. The below should get you pretty close and, if you update your OP, I am sure we can get you exactly what you're after.
SELECT ActiveUser.UserID AS Active_UserID,
ActiveUser.ActiveID AS Active_ActiveID,
ContactDetails.email AS Email,
DuplicateUser.UserID AS Dup_UserID,
DuplicateUser.ActiveID AS Dup_ActiveID
FROM ContactDetails INNER JOIN
ContactDetails AS Duplicate ON ContactDetails.email = Duplicate.email INNER JOIN
UserContactDetails AS ActiveUserContactDetails ON ActiveUserContactDetails.ContactDetailsID = ContactDetails.ContactDetailsID INNER JOIN
dbo.[User] AS ActiveUser ON ActiveUser.UserID = ActiveUserContactDetails.UserID INNER JOIN
UserContactDetails AS DuplicateUserContactDetails ON DuplicateUserContactDetails.ContactDetailsID = Duplicate.ContactDetailsID INNER JOIN
dbo.[User] AS DuplicateUser ON DuplicateUser.UserID = DuplicateUserContactDetails.UserID
-- the user IDs live on UserContactDetails / [User], so they are compared here
AND DuplicateUser.UserID <> ActiveUser.UserID
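
A third way to express "emails shared by more than one distinct user" is COUNT(DISTINCT ...) in a HAVING clause. The sketch below runs it through Python's sqlite3 module; the single user_emails table is a hypothetical flattening of the three joined tables, loaded with the sample rows from the question.

```python
import sqlite3

# Hypothetical flattened view of User + UserContactDetails + ContactDetails.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_emails (user_id INTEGER, email TEXT);
    INSERT INTO user_emails VALUES
        (123, 'abc@123.com'),
        (123, 'abc@123.com'),
        (135, 'efg@123.com'),
        (142, 'efg@123.com');
""")

# Keep only emails that belong to more than one distinct user,
# then list each (user, email) pair once.
dupes = conn.execute("""
    SELECT DISTINCT ue.user_id, ue.email
    FROM user_emails ue
    JOIN (
        SELECT email
        FROM user_emails
        GROUP BY email
        HAVING COUNT(DISTINCT user_id) > 1
    ) AS shared ON shared.email = ue.email
    ORDER BY ue.user_id
""").fetchall()

for row in dupes:
    print(row)
```

User 123 drops out because both of its rows point at the same user, which is the case the original COUNT(*) OVER (...) query could not distinguish.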

Remove multiple rows with same ID

So I've done some looking around and wasn't able to find quite what I was looking for. I have two tables:
1.) A table where general user information is stored
2.) A table where a status is generated and stored.
The problem is that there are multiple rows for the same users, and querying these results in multiple returns. I can't just merge them because they aren't all the same status. I need just the newest status from that table.
Example of the query:
SELECT DISTINCT
TOP(50) cam.UserID AS PatientID,
mppi.DisplayName AS Surgeon,
ISNULL(sci.IOPStatus, 'N/A') AS Status,
tkstat.TrackerStatusID AS Stat_2
FROM
Main AS cam
INNER JOIN
Providers AS rap
ON cam.VisitID = rap.VisitID
INNER JOIN
ProviderInfo AS mppi
ON rap.UnvUserID = mppi.UnvUserID
LEFT OUTER JOIN
Inop AS sci
ON cam.CwsID = sci.CwsID
LEFT OUTER JOIN
TrackerStatus AS tkstat
ON cam.CwsID = tkstat.CwsID
WHERE
(
cam.Location_ID IN
(
'SURG'
)
)
AND
(
rap.IsAttending = 'Y'
)
AND
(
cam.DateTime BETWEEN CONCAT(CAST(GETDATE() AS DATE), ' 00:00:00') AND CONCAT(CAST(GETDATE() AS DATE), ' 23:59:59')
)
AND
(
cam.Status_StatusID != 'Cancelled'
)
ORDER BY
cam.UserID ASC
So I need to grab only the newest Stat_2 for each ID so that multiple rows aren't returned. Each Stat_2 also has an update time, so I can sort by that date/time column: StatusDateTime.
One way to handle this is to calculate a row_number for the table where you need the newest record.
The easiest way to do that is to change your tkstat join to a derived table with the row_number calculation, and then add a condition to the join where rn = 1:
SELECT DISTINCT TOP (50)
cam.UserID AS PatientID, mppi.DisplayName AS Surgeon, ISNULL(sci.IOPStatus, 'N/A') AS Status, tkstat.TrackerStatusID AS Stat_2
FROM Main AS cam
INNER JOIN Providers AS rap ON cam.VisitID = rap.VisitID
INNER JOIN ProviderInfo AS mppi ON rap.UnvUserID = mppi.UnvUserID
LEFT OUTER JOIN Inop AS sci ON cam.CwsID = sci.CwsID
LEFT OUTER JOIN (
SELECT tk.CwsID, tk.TrackerStatusId,
ROW_NUMBER() OVER (PARTITION BY tk.CwsID ORDER BY tk.CreationDate DESC) AS rn
FROM TrackerStatus tk
) AS tkstat ON cam.CwsID = tkstat.CwsID
AND tkstat.rn = 1
WHERE (cam.Location_ID IN ( 'SURG' )) AND (rap.IsAttending = 'Y')
AND (cam.DateTime BETWEEN CONCAT(CAST(GETDATE() AS DATE), ' 00:00:00') AND CONCAT(CAST(GETDATE() AS DATE), ' 23:59:59'))
AND (cam.Status_StatusID != 'Cancelled')
ORDER BY cam.UserID ASC;
Note that you need a way to derive what the "newest" status is; I assume there is a created_date column or something similar, so you'll need to enter the correct column name in:
ROW_NUMBER() OVER (PARTITION BY tk.cwsId ORDER BY tk.CreationDate DESC) AS rn
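
To see why the rn = 1 test belongs in the join condition rather than in the WHERE clause, here is a small sketch run through Python's sqlite3 module (assuming SQLite 3.25+ for window functions). The patients and tracker_status tables, their columns, and the sample rows are all hypothetical simplifications of the tables in the question.

```python
import sqlite3

# Hypothetical, cut-down versions of Main and TrackerStatus.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (cws_id INTEGER, patient_id INTEGER);
    CREATE TABLE tracker_status (cws_id INTEGER, status_id TEXT, status_time TEXT);
    INSERT INTO patients VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO tracker_status VALUES
        (1, 'PREOP', '2024-01-01 08:00:00'),
        (1, 'INOP',  '2024-01-01 09:30:00'),
        (2, 'PREOP', '2024-01-01 08:15:00');
""")

# rn = 1 lives in the ON clause, so patients without any status
# survive the LEFT JOIN with a NULL status.
rows = conn.execute("""
    SELECT p.patient_id, t.status_id
    FROM patients p
    LEFT JOIN (
        SELECT cws_id, status_id,
               ROW_NUMBER() OVER (
                   PARTITION BY cws_id
                   ORDER BY status_time DESC
               ) AS rn
        FROM tracker_status
    ) AS t ON t.cws_id = p.cws_id AND t.rn = 1
    ORDER BY p.patient_id
""").fetchall()

for row in rows:
    print(row)
```

Had the filter been written as WHERE t.rn = 1, patient 300 (no status rows at all) would have been dropped, turning the outer join into an inner join.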
SQL Server doesn't offer a FIRST function, but you can reproduce the functionality with ROW_NUMBER() like this:
With Qry1 As (
Select <other columns>,
ROW_NUMBER() OVER(
PARTITION BY <group by columns>
ORDER BY <time stamp column*> DESC
) As Seq
From <the rest of your select statement>
)
Select *
From Qry1
Where Seq = 1
* for the "newest" record.

Avoiding a redundant query in a join

The salient feature of this problem is trying to get two columns from a row without a second query returning the same row. I've included more information for context.
I have the following (simplified) tables which represent documents sent to us by customers, scanned in batches by users here.
Batches: Id, ...
Documents: Id, CustomerId, ...
Documents_Batches: Id, BatchId, DocumentId
And document history (Creation, state changes, edits, etc.):
DocumentEvents: Id, DocumentId, UserId, Occurred (datetime)
What I want is list of documents in a given batch, plus some event data:
Result: DocumentId, CustomerId, Created, CreatedBy, ...
How do I get both the Created date, and the CreatedBy value in that same row?
ALTER PROCEDURE [dbo].[sp_GetBatchDocuments]
@BatchId INT
AS
BEGIN
SELECT
Documents.Id,
Documents.CustomerId,
MIN(DocumentEvents.Occurred) AS Created,
/* UserId value of the 'Created' row, AS CreatedBy */
MAX(DocumentEvents.Occurred) AS Modified
/* UserId value of the 'Modified' row, AS ModifiedBy */
FROM
Documents
INNER JOIN Documents_Batches
ON Documents.Id = Documents_Batches.DocumentId
INNER JOIN DocumentEvents
ON Documents.Id = DocumentEvents.DocumentId
WHERE Documents_Batches.BatchId = @BatchId
GROUP BY Documents.Id, Documents.CustomerId;
END
Although I could likely get them beforehand, or with a function call, every case I can think of would mean multiple queries of the same row.
EDIT: Barring some surprise idea from SO, I've concluded that this isn't logically possible without a second query to the same row (for each Date/User column pair I want). To make this happen, SQL would need a row-valued (vs. a table-valued) function, and internally that would need to first filter by DocumentId and then filter that result by the lowest/highest date. No matter the approach, it's two queries. Maybe it's time to reassess the normalization strategy for this data.
Playing with the CTE and the ROW_NUMBER function you can do something like
WITH MinMax AS (
SELECT d.Id
, d.CustomerId
, de.Occurred
, de.UserId
, RowAsc = ROW_NUMBER() OVER (PARTITION BY d.Id, d.CustomerId
ORDER BY de.Occurred)
, RowDesc =ROW_NUMBER() OVER (PARTITION BY d.Id, d.CustomerId
ORDER BY de.Occurred Desc)
FROM Documents d
INNER JOIN Documents_Batches d_b ON d.Id = d_b.DocumentId
INNER JOIN DocumentEvents de ON d.Id = de.DocumentId
WHERE d_b.BatchId = @BatchId
)
SELECT Id, CustomerId
, Created = Max(Case When RowAsc = 1 Then Occurred Else Null End)
, CreatedBy = Max(Case When RowAsc = 1 Then UserId Else Null End)
, Modified = Max(Case When RowDesc = 1 Then Occurred Else Null End)
, ModifiedBy = Max(Case When RowDesc = 1 Then UserId Else Null End)
FROM MinMax
WHERE 1 IN (RowAsc, RowDesc)
GROUP BY Id, CustomerId
In MinMax the row with RowAsc = 1 is the row with the min date and the row with RowDesc = 1 is the row with the max date for the Id, CustomerId group
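
The two-row-number idea can be tried on a toy data set. The sketch below uses Python's sqlite3 module as a harness (assuming SQLite 3.25+); the document_events table, its column names, and the sample users are hypothetical.

```python
import sqlite3

# Hypothetical event log: one row per document event.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document_events (document_id INTEGER, user_id TEXT, occurred TEXT);
    INSERT INTO document_events VALUES
        (1, 'alice', '2024-01-01'),
        (1, 'bob',   '2024-01-05'),
        (1, 'carol', '2024-01-09'),
        (2, 'dave',  '2024-02-01');
""")

# Rank each document's events from both ends, keep the first and last,
# then pivot them into one row with conditional aggregation.
rows = conn.execute("""
    WITH ranked AS (
        SELECT document_id, user_id, occurred,
               ROW_NUMBER() OVER (PARTITION BY document_id ORDER BY occurred)      AS row_asc,
               ROW_NUMBER() OVER (PARTITION BY document_id ORDER BY occurred DESC) AS row_desc
        FROM document_events
    )
    SELECT document_id,
           MAX(CASE WHEN row_asc  = 1 THEN occurred END) AS created,
           MAX(CASE WHEN row_asc  = 1 THEN user_id  END) AS created_by,
           MAX(CASE WHEN row_desc = 1 THEN occurred END) AS modified,
           MAX(CASE WHEN row_desc = 1 THEN user_id  END) AS modified_by
    FROM ranked
    WHERE row_asc = 1 OR row_desc = 1
    GROUP BY document_id
    ORDER BY document_id
""").fetchall()

for row in rows:
    print(row)
```

A document with a single event (document 2 here) gets the same row as both its first and its last event, so Created and Modified coincide.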
I would do it like this. Two joins are not redundant, they select different information.
SELECT
Documents.Id,
Documents.CustomerId,
MinTable.Created,
MinTable.UserId AS CreatedBy,
MaxTable.Modified,
MaxTable.UserId AS ModifiedBy
FROM
Documents
INNER JOIN Documents_Batches
ON Documents.Id = Documents_Batches.DocumentId
INNER JOIN (SELECT de.Occurred AS Created, de.UserId, de.DocumentId FROM DocumentEvents de WHERE de.Occurred = (SELECT MIN(Occurred) FROM DocumentEvents WHERE DocumentId = de.DocumentId)) AS MinTable
ON Documents.Id = MinTable.DocumentId
INNER JOIN (SELECT de.Occurred AS Modified, de.UserId, de.DocumentId FROM DocumentEvents de WHERE de.Occurred = (SELECT MAX(Occurred) FROM DocumentEvents WHERE DocumentId = de.DocumentId)) AS MaxTable
ON Documents.Id = MaxTable.DocumentId
WHERE Documents_Batches.BatchId = @BatchId;

How to get some records without cursor, without Cross Apply in T-SQL

I have a table called Objects which contains some files, say:
User
Teacher
There is another table (States) which holds the possible states of these objects, like:
Active
Idle
Teaching
Resting
Authoring
And there is a third table (junction table) which logs each state change of each object. In this third table (ObjectStates) records are like:
1, 1, DateTime1 (User was active on DateTime1)
2, 5, DateTime2 (Teacher was authoring on DateTime2)
etc.
Now, what I want is a query to get each object with its latest state (not its state history). It's possible to get this result using cursors or the CROSS APPLY operator. However, I'd like to know whether there is any other way to get the latest state of each object from these three tables, because cursors are costly.
Using the row_number() windowing function...
select *
from
(
select objects.*,
states.state,
objectstates.changedate,
row_number() over (partition by objects.id order by objectstates.changedate desc) rn
from
objects
inner join
objectstates
on objects.id = objectstates.objectid
inner join
states
on objectstates.stateid = states.stateid
) v
where rn = 1
If you can't use row_number because you're on SQL 2000, for example, you can use a join on a max/group by query.
select objects.*,
states.state,
objectstates.changedate
from
objects
inner join
objectstates
on objects.id = objectstates.objectid
inner join
states
on objectstates.stateid = states.stateid
inner join
(select objectid, max(changedate) as maxdate from objectstates group by objectid) maxstates
on objectstates.objectid=maxstates.objectid
and objectstates.changedate = maxstates.maxdate
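
For engines without window functions, the max/group by join can be exercised as follows. This is a minimal sketch run through Python's sqlite3 module; the object_states table is a hypothetical, denormalized stand-in (the state name is stored inline instead of through a States lookup), and the rows are invented.

```python
import sqlite3

# Hypothetical state log with the state name stored inline.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE object_states (object_id INTEGER, state TEXT, change_date TEXT);
    INSERT INTO object_states VALUES
        (1, 'Active',    '2012-01-01'),
        (1, 'Idle',      '2012-01-02'),
        (2, 'Teaching',  '2012-01-01'),
        (2, 'Authoring', '2012-01-03');
""")

# Find each object's newest change date, then join back on
# (object_id, change_date) to pick up the rest of that row.
rows = conn.execute("""
    SELECT os.object_id, os.state, os.change_date
    FROM object_states os
    JOIN (
        SELECT object_id, MAX(change_date) AS max_date
        FROM object_states
        GROUP BY object_id
    ) AS latest
      ON latest.object_id = os.object_id
     AND latest.max_date  = os.change_date
    ORDER BY os.object_id
""").fetchall()

for row in rows:
    print(row)
```

One caveat of this form: if two rows of the same object share the maximum date, both are returned, whereas row_number() keeps exactly one.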
You can join on the ObjectStates table twice. The first join of the table will get the max(activedate) for each objectid. The second time, you will join on both the objectid and the value of the max(activedate) and this will get the state associated with that value:
select o.name o_name,
s.name s_name,
os1.activedate
from objects o
left join
(
select max(activeDate) activedate, objectid
from objectstates
group by objectid
) os1
on o.id = os1.objectid
left join ObjectStates os2
on os1.objectid = os2.objectid
and os1.activedate = os2.activedate
left join states s
on os2.stateid = s.id
See SQL Fiddle with Demo
You can use partition over to find the latest row for each Object, like this
create table #ObjectState
(
Object int NOT NULL,
State int NOT NULL,
TimeStamp datetime NOT NULL
)
INSERT INTO #ObjectState (Object, State, TimeStamp) VALUES (1, 1, '2012-01-01')
INSERT INTO #ObjectState (Object, State, TimeStamp) VALUES (1, 2, '2012-01-02')
INSERT INTO #ObjectState (Object, State, TimeStamp) VALUES (1, 3, '2012-01-03')
INSERT INTO #ObjectState (Object, State, TimeStamp) VALUES (2, 4, '2012-01-01')
INSERT INTO #ObjectState (Object, State, TimeStamp) VALUES (2, 2, '2012-01-02')
select *, ROW_NUMBER() over (partition by Object order by TimeStamp desc) as RowNo from #ObjectState
select InnerSelect.Object, InnerSelect.State, InnerSelect.TimeStamp FROM
(
select *, ROW_NUMBER() over (partition by Object order by TimeStamp desc) as RowNo from #ObjectState
) InnerSelect
where InnerSelect.RowNo = 1
DROP TABLE #ObjectState
gives output
Object State TimeStamp
1 3 2012-01-03 00:00:00.000
2 2 2012-01-02 00:00:00.000
for the last select
In the good old days, we just used Scalar subqueries.
select o.*, (select top(1) s.description
from objectstates os
join states s on s.id = os.state_id
where os.object_id = o.id
order by os.recorded_time desc) last_state
from objects o;
CROSS APPLY now replaces this. To return more than one field, the query had to be extended to something like:
select *
from (
select o.*, (select top(1) os.id
from objectstates os
where os.object_id = o.id
order by os.recorded_time desc) last_state
from objects o
) x
join objectstates os on os.id = x.last_state
join states s on s.id = os.state_id;
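
The scalar-subquery pattern can be tried outside SQL Server as well. The sketch below runs it through Python's sqlite3 module, where ORDER BY ... LIMIT 1 plays the role of TOP(1). The table layout is a hypothetical simplification (state names stored inline), and the 'Admin' object is made up to show the no-history case.

```python
import sqlite3

# Hypothetical objects plus a state log with inline state names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE objectstates (object_id INTEGER, state TEXT, recorded_time TEXT);
    INSERT INTO objects VALUES (1, 'User'), (2, 'Teacher'), (3, 'Admin');
    INSERT INTO objectstates VALUES
        (1, 'Active',    '2012-01-01'),
        (1, 'Idle',      '2012-01-02'),
        (2, 'Teaching',  '2012-01-01'),
        (2, 'Authoring', '2012-01-03');
""")

# The correlated subquery runs once per object and returns at most one value.
rows = conn.execute("""
    SELECT o.name,
           (SELECT os.state
            FROM objectstates os
            WHERE os.object_id = o.id
            ORDER BY os.recorded_time DESC
            LIMIT 1) AS last_state
    FROM objects o
    ORDER BY o.id
""").fetchall()

for row in rows:
    print(row)
```

Objects with no state history still appear, with a NULL last_state, so this behaves like an outer join.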

Limit join to one row

I have the following query:
SELECT sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount, 'rma' as
"creditType", "Clients"."company" as "client", "Clients".id as "ClientId", "Rmas".*
FROM "Rmas" JOIN "EsnsRmas" on("EsnsRmas"."RmaId" = "Rmas"."id")
JOIN "Esns" on ("Esns".id = "EsnsRmas"."EsnId")
JOIN "EsnsSalesOrderItems" on("EsnsSalesOrderItems"."EsnId" = "Esns"."id" )
JOIN "SalesOrderItems" on("SalesOrderItems"."id" = "EsnsSalesOrderItems"."SalesOrderItemId")
JOIN "Clients" on("Clients"."id" = "Rmas"."ClientId" )
WHERE "Rmas"."credited"=false AND "Rmas"."verifyStatus" IS NOT null
GROUP BY "Clients".id, "Rmas".id;
The problem is that the table "EsnsSalesOrderItems" can have the same EsnId in different entries. I want to restrict the query to only pull the last entry in "EsnsSalesOrderItems" that has the same "EsnId".
By "last" entry I mean the following:
The one that appears last in the table "EsnsSalesOrderItems". So for example if "EsnsSalesOrderItems" has two entries with "EsnId" = 6 and "createdAt" = '2012-06-19' and '2012-07-19' respectively it should only give me the entry from '2012-07-19'.
SELECT (count(*) * sum(s."price")) AS amount
, 'rma' AS "creditType"
, c."company" AS "client"
, c.id AS "ClientId"
, r.*
FROM "Rmas" r
JOIN "EsnsRmas" er ON er."RmaId" = r."id"
JOIN "Esns" e ON e.id = er."EsnId"
JOIN (
SELECT DISTINCT ON ("EsnId") *
FROM "EsnsSalesOrderItems"
ORDER BY "EsnId", "createdAt" DESC
) es ON es."EsnId" = e."id"
JOIN "SalesOrderItems" s ON s."id" = es."SalesOrderItemId"
JOIN "Clients" c ON c."id" = r."ClientId"
WHERE r."credited" = FALSE
AND r."verifyStatus" IS NOT NULL
GROUP BY c.id, r.id;
Your query in the question has an illegal aggregate over another aggregate:
sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount
Simplified and converted to legal syntax:
(count(*) * sum(s."price")) AS amount
But do you really want to multiply with the count per group?
I retrieve the single row per group in "EsnsSalesOrderItems" with DISTINCT ON. Detailed explanation:
Select first row in each GROUP BY group?
I also added table aliases and formatting to make the query easier to parse for human eyes. If you could avoid camel case you could get rid of all the double quotes clouding the view.
Something like:
join (
select "EsnId",
"SalesOrderItemId",
row_number() over (partition by "EsnId" order by "createdAt" desc) as rn
from "EsnsSalesOrderItems"
) t ON t."EsnId" = "Esns"."id" and t.rn = 1
This keeps only the latest row per "EsnId" from "EsnsSalesOrderItems", based on the column "createdAt". You can use any column that allows you to define an order on the rows that suits you.
But remember that the concept of the "last row" is only valid if you specify an order for the rows. A table as such is not ordered, nor is the result of a query unless you specify an order by.
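
The point about needing an explicit order also applies to ties: if two rows share the same timestamp, the "last" one is only well defined once a tie-breaker is added. Below is a minimal sketch run through Python's sqlite3 module (assuming SQLite 3.25+). The snake_case table and columns are hypothetical stand-ins for "EsnsSalesOrderItems", and using id as the tie-breaker is an assumption, not something the question specifies.

```python
import sqlite3

# Hypothetical stand-in for "EsnsSalesOrderItems".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE esns_sales_order_items (
        id                  INTEGER PRIMARY KEY,
        esn_id              INTEGER,
        sales_order_item_id INTEGER,
        created_at          TEXT
    );
    INSERT INTO esns_sales_order_items VALUES
        (1, 6, 10, '2012-06-19'),
        (2, 6, 11, '2012-07-19'),
        (3, 7, 12, '2012-07-01'),
        (4, 7, 13, '2012-07-01');  -- same created_at as row 3
""")

# created_at decides the order; id breaks ties so the result is deterministic.
rows = conn.execute("""
    SELECT esn_id, sales_order_item_id
    FROM (
        SELECT esn_id, sales_order_item_id,
               ROW_NUMBER() OVER (
                   PARTITION BY esn_id
                   ORDER BY created_at DESC, id DESC
               ) AS rn
        FROM esns_sales_order_items
    ) AS ranked
    WHERE rn = 1
    ORDER BY esn_id
""").fetchall()

for row in rows:
    print(row)
```

Without the id DESC tie-breaker, either of the two rows for esn_id 7 could legitimately come back.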
Necromancing because the answers are outdated.
Take advantage of the LATERAL keyword introduced in PG 9.3
left | right | inner JOIN LATERAL
I'll explain with an example:
Assuming you have a table "Contacts".
Now contacts have organisational units.
They can have one OU at a point in time, but N OUs at N points in time.
Now, if you have to query contacts and OU in a time period (not a reporting date, but a date range), you could N-fold increase the record count if you just did a left join.
So, to display the OU, you need to just join the first OU for each contact (where what shall be first is an arbitrary criterion - when taking the last value, for example, that is just another way of saying the first value when sorted by descending date order).
In SQL Server, you would use CROSS APPLY (or rather OUTER APPLY, since we need a left join), which will invoke a table-valued expression for each row it has to join.
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
-- CROSS APPLY -- = INNER JOIN
OUTER APPLY -- = LEFT JOIN
(
SELECT TOP 1
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(@in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(@in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
) AS FirstOE
In PostgreSQL, starting from version 9.3, you can do that, too - just use the LATERAL keyword to achieve the same:
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
LEFT JOIN LATERAL
(
SELECT
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(__in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(__in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
LIMIT 1
) AS FirstOE ON TRUE -- PostgreSQL requires a join condition on LEFT JOIN, even with LATERAL
Try using a subquery in your ON clause. An abstract example:
SELECT
*
FROM table1
JOIN table2 ON table2.id = (
SELECT id FROM table2 WHERE table2.table1_id = table1.id LIMIT 1
)
WHERE
...
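
A concrete version of the subquery-in-ON idea, run through Python's sqlite3 module. The created column and the sample rows are hypothetical, and an ORDER BY has been added inside the subquery (an addition to the abstract example) so that the row picked by LIMIT 1 is the newest one rather than an arbitrary one.

```python
import sqlite3

# Hypothetical parent/child tables matching the abstract example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, table1_id INTEGER, created TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO table2 VALUES
        (10, 1, '2020-01-01'),
        (11, 1, '2020-02-01'),
        (12, 2, '2020-01-15');
""")

# For each table1 row, join only the table2 row whose id is returned
# by the correlated subquery (the newest child row).
rows = conn.execute("""
    SELECT table1.id, table2.id
    FROM table1
    JOIN table2 ON table2.id = (
        SELECT t.id
        FROM table2 t
        WHERE t.table1_id = table1.id
        ORDER BY t.created DESC
        LIMIT 1
    )
    ORDER BY table1.id
""").fetchall()

for row in rows:
    print(row)
```

Because the join target is matched on its primary key, each table1 row is paired with at most one table2 row.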