Select all records from Table 1 but only the row in Table 2 with the max Version number

I have two tables, Trades.Transaction and Trades.BondRef. They can be joined on InstrumentDescription, but the join is one-to-many because there are multiple ISIN/CUSIP rows (BondRef) per InstrumentDescription (Transaction). I would like to join them but only display the row from Trades.BondRef which has the max Version number. I have reviewed numerous posts and come up with the code below.
SELECT tr.TradeDate,
tr.InstrumentDescription,
B.maxVersion,
B.IsLatest,
B.Isin,
B.Cusip,
B.RbcType1,
B.RbcType2,
B.RbcType3
FROM [trade_management].[dbo].[Trades.Transaction] tr WITH (NOLOCK)
INNER JOIN (
SELECT InstrumentDescription,
MAX(version) maxVersion,
IsLatest,
Isin,
Cusip,
RbcType1,
RbcType2,
RbcType3
FROM [trade_management].[dbo].[Trades.BondRef]
WHERE ValidTo between '2018-10-30 00:00:00.0000000 +00:00' and '2018-10-30 23:59:29.0000000 +00:00'
GROUP BY InstrumentDescription,IsLatest, Isin,Cusip,RbcType1,RbcType2,RbcType3
) AS B
ON B.InstrumentDescription = tr.InstrumentDescription
WHERE
(tr.OrigSystem = 'RBCE TOMS' OR tr.OrigSystem = 'SALE')
and (BookingAccountType = 'CLIENT' OR BookingAccountType = 'MASTER')
and tr.BookingAccountFacilitatorTeamCode in ('ESF','MJC','43B','DWV','G9J','698','9DN','A2T','AX3') -- HK Sales
and tr.IsLatest = 1
and tr.Status not in ('Cancelled')
and tr.TradeDate between '2018-10-30 00:00:00.0000000 +00:00' and '2018-10-30 23:59:29.0000000 +00:00'
order by tr.tradedate
I'm getting duplicate rows because my GROUP BY includes Isin and Cusip. Note that for CBAAU 4 1/2 12/09/25, the row with Version 249 should be the only one returned.
TradeDate InstrumentDescription maxVersion Isin Cusip RbcType1 RbcType2 RbcType3
2018-10-30 NESNVX 3 1/8 03/22/23 124 XS1796233150 NULL CORP INDUSTRIAL EURO_MTN
2018-10-30 HSBC 6 1/4 PERP 116 US404280BN80 404280BN8 CORP BANK GLOBAL
2018-10-30 CBAAU 4 1/2 12/09/25 248 US2027A0HR32 2027A0HR3 CORP BANK PRIV_PLACEMENT
2018-10-30 CBAAU 4 1/2 12/09/25 249 US2027A1HR15 2027A1HR1 CORP BANK EURO-DOLLAR
2018-10-30 EIB 8 3/4 08/18/25 434 XS1274823571 NULL SUPRA NATIONAL EURO_MTN
But if I remove them from the GROUP BY, I can no longer display those fields:
Column 'trade_management.dbo.Trades.BondRef.Isin' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
So how can I retrieve the columns in Trades.BondRef in the SELECT statement if they are not included in the subquery's GROUP BY?

Instead of using a GROUP BY and a MAX, you could use the window function ROW_NUMBER.
ROW_NUMBER can be given an order, which determines which record gets row_number = 1 within each partition.
You can also combine an ORDER BY ROW_NUMBER with a TOP 1 WITH TIES, so that every row numbered 1 is returned:
...
INNER JOIN (
SELECT TOP 1 WITH TIES
InstrumentDescription,
version AS maxVersion,
IsLatest,
Isin,
Cusip,
RbcType1,
RbcType2,
RbcType3
FROM [trade_management].[dbo].[Trades.BondRef]
WHERE ValidTo between '2018-10-30 00:00:00.0000000 +00:00' and '2018-10-30 23:59:29.0000000 +00:00'
ORDER BY ROW_NUMBER() OVER (PARTITION BY InstrumentDescription ORDER BY version DESC)
) AS B
ON B.InstrumentDescription = tr.InstrumentDescription
...
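
To see the ROW_NUMBER idea in isolation, here is a minimal sketch run through Python's sqlite3 module rather than SQL Server (it assumes SQLite 3.25 or later for window functions). The bond_ref table and its three columns are a cut-down stand-in for Trades.BondRef, not the real schema, and the rows are taken from the question's output. SQLite has no TOP ... WITH TIES, so the sketch filters on rn = 1 in an outer query, which selects the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified stand-in for Trades.BondRef
CREATE TABLE bond_ref (InstrumentDescription TEXT, Version INTEGER, Isin TEXT);
INSERT INTO bond_ref VALUES
    ('CBAAU 4 1/2 12/09/25', 248, 'US2027A0HR32'),
    ('CBAAU 4 1/2 12/09/25', 249, 'US2027A1HR15'),
    ('HSBC 6 1/4 PERP',      116, 'US404280BN80');
""")

# Number the rows of each instrument from highest Version down,
# then keep only the row numbered 1.
latest = conn.execute("""
SELECT InstrumentDescription, Version, Isin
FROM (
    SELECT b.*,
           ROW_NUMBER() OVER (PARTITION BY InstrumentDescription
                              ORDER BY Version DESC) AS rn
    FROM bond_ref b
)
WHERE rn = 1
ORDER BY InstrumentDescription
""").fetchall()

for row in latest:
    print(row)
```

Because no GROUP BY is involved, Isin, Cusip and the other columns come along with the winning row without having to be aggregated.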

You don't do anything to remove any non-maximum versions. If you can use common-table expressions then this is just another step, to find the maximum version per instrument description:
WITH B AS ( SELECT InstrumentDescription,
MAX(version) maxVersion,
IsLatest,
Isin,
Cusip,
RbcType1,
RbcType2,
RbcType3
FROM [trade_management].[dbo].[Trades.BondRef]
WHERE ValidTo between '2018-10-30 00:00:00.0000000 +00:00' and '2018-10-30 23:59:29.0000000 +00:00'
GROUP BY InstrumentDescription,IsLatest, Isin,Cusip,RbcType1,RbcType2,RbcType3),
C AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY InstrumentDescription ORDER BY maxVersion DESC) AS version_id FROM B)
SELECT tr.TradeDate,
tr.InstrumentDescription,
C.maxVersion,
C.IsLatest,
C.Isin,
C.Cusip,
C.RbcType1,
C.RbcType2,
C.RbcType3
FROM [trade_management].[dbo].[Trades.Transaction] tr WITH (NOLOCK)
INNER JOIN C
ON C.InstrumentDescription = tr.InstrumentDescription AND c.version_id = 1
WHERE
(tr.OrigSystem = 'RBCE TOMS' OR tr.OrigSystem = 'SALE')
and (BookingAccountType = 'CLIENT' OR BookingAccountType = 'MASTER')
and tr.BookingAccountFacilitatorTeamCode in ('ESF','MJC','43B','DWV','G9J','698','9DN','A2T','AX3') -- HK Sales
and tr.IsLatest = 1
and tr.Status not in ('Cancelled')
and tr.TradeDate between '2018-10-30 00:00:00.0000000 +00:00' and '2018-10-30 23:59:29.0000000 +00:00'
order by tr.tradedate

Related

oracle sql get transactions between the period

I have 3 tables in oracle sql namely investor, share and transaction.
I am trying to find investors who are new to a share within a certain period. For an investor to count as new, there must be no transaction in the transaction table for that investor against that share prior to the search period.
For the transaction table with the following records:
Id TranDt InvCode ShareCode
1 2020-01-01 00:00:00.000 inv1 S1
2 2019-04-01 00:00:00.000 inv1 S1
3 2020-04-01 00:00:00.000 inv1 S1
4 2021-03-06 11:50:20.560 inv2 S2
5 2020-04-01 00:00:00.000 inv3 S1
For the search period between 2020-01-01 and 2020-05-01, I should get the output as
5 2020-04-01 00:00:00.000 inv3 S1
Though there are transactions for inv1 in the table for that period, there is also a transaction prior to the search period, so inv1 shouldn't be included: it isn't a new investor within the search period.
The query below works, but it takes so long to return results when called from C# code that it leads to timeouts. Is there anything we can do to make it return results more quickly?
WITH
INVESTORS AS
(
SELECT I.INVCODE FROM INVESTOR I WHERE I.CLOSED IS NULL
),
SHARES AS
(
SELECT S.SHARECODE FROM SHARE S WHERE S.DORMANT IS NULL
),
SHARES_IN_PERIOD AS
(
SELECT DISTINCT
T.INVCODE,
T.SHARECODE,
T.TYPE
FROM TRANSACTION T
JOIN INVESTORS I ON T.INVCODE = I.INVCODE
JOIN SHARES S ON T.SHARECODE = S.SHARECODE
WHERE T.TRANDT >= :startDate AND T.TRANDT <= :endDate
),
PREVIOUS_SHARES AS
(
SELECT DISTINCT
T.INVCODE,
T.SHARECODE,
T.TYPE
FROM TRANSACTION T
JOIN INVESTORS I ON T.INVCODE = I.INVCODE
JOIN SHARES S ON T.SHARECODE = S.SHARECODE
WHERE T.TRANDT < :startDate
)
SELECT
DISTINCT
SP.INVCODE AS InvestorCode,
SP.SHARECODE AS ShareCode,
SP.TYPE AS ShareType
FROM SHARES_IN_PERIOD SP
WHERE (SP.INVCODE, SP.SHARECODE, SP.TYPE) NOT IN
(
SELECT
PS.INVCODE,
PS.SHARECODE,
PS.TYPE
FROM PREVIOUS_SHARES PS
)
Following the suggestion from @Gordon Linoff, I tried the following options (for all the shares I need), but they take a long time too. The transaction table has over 32 million rows.
1.
WITH
SHARES AS
(
SELECT S.SHARECODE FROM SHARE S WHERE S.DORMANT IS NULL
)
select t.invcode, t.sharecode, t.type
from (select t.*,
row_number() over (partition by invcode, sharecode, type order by trandt)
as seqnum
from transactions t
) t
join shares s on s.sharecode = t.sharecode
where seqnum = 1 and
t.trandt >= date '2020-01-01' and
t.trandt < date '2020-05-01';
2.
WITH
INVESTORS AS
(
SELECT I.INVCODE FROM INVESTOR I WHERE I.CLOSED IS NULL
),
SHARES AS
(
SELECT S.SHARECODE FROM SHARE S WHERE S.DORMANT IS NULL
)
select t.invcode, t.sharecode, t.type
from (select t.*,
row_number() over (partition by invcode, sharecode, type order by trandt)
as seqnum
from transactions t
) t
join investors i on i.invcode = t.invcode
join shares s on s.sharecode = t.sharecode
where seqnum = 1 and
t.trandt >= date '2020-01-01' and
t.trandt < date '2020-05-01';
3.
select t.invcode, t.sharecode, t.type
from (select t.*,
row_number() over (partition by invcode, sharecode, type order by trandt)
as seqnum
from transactions t
) t
where seqnum = 1 and
t.sharecode IN (SELECT S.SHARECODE FROM SHARE S WHERE S.DORMANT IS NULL)
and
t.trandt >= date '2020-01-01' and
t.trandt < date '2020-05-01';

If you want to know whether an investor's first record in transactions for a share falls within a period, you can use window functions:
select t.*
from (select t.*,
row_number() over (partition by invcode, sharecode order by trandt) as seqnum
from transactions t
) t
where seqnum = 1 and
t.sharecode = :sharecode and
t.trandt >= date '2020-01-01' and
t.trandt < date '2020-05-01';
For this query to perform well, you want an index on transactions(invcode, sharecode, trandt).
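
As a quick check of the window-function approach against the question's sample rows, here is a minimal sketch using Python's sqlite3 module instead of Oracle (SQLite 3.25 or later is assumed for ROW_NUMBER). The transactions table below has only the four columns shown in the question, and dates are stored as ISO strings so that plain string comparison orders them correctly; both are simplifications for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Sample rows from the question; dates kept as ISO-8601 text
CREATE TABLE transactions (id INTEGER, trandt TEXT, invcode TEXT, sharecode TEXT);
INSERT INTO transactions VALUES
    (1, '2020-01-01', 'inv1', 'S1'),
    (2, '2019-04-01', 'inv1', 'S1'),
    (3, '2020-04-01', 'inv1', 'S1'),
    (4, '2021-03-06', 'inv2', 'S2'),
    (5, '2020-04-01', 'inv3', 'S1');
""")

# seqnum = 1 marks each investor's earliest transaction per share;
# the investor is "new" if that earliest transaction is inside the period.
new_investors = conn.execute("""
SELECT id, invcode, sharecode
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY invcode, sharecode
                              ORDER BY trandt) AS seqnum
    FROM transactions t
)
WHERE seqnum = 1
  AND trandt >= :start_date
  AND trandt <  :end_date
""", {"start_date": "2020-01-01", "end_date": "2020-05-01"}).fetchall()

print(new_investors)
```

inv1 is excluded because its earliest S1 transaction (2019-04-01) predates the period, and inv2 because its first transaction is after it, which leaves only the inv3 row the question expects.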

Iterating through rows to capture the value in the next row

I have been a long time reader of this forum. It has helped me a lot, however I have a question which I cannot find a solution specific to my requirements.
I have been given the task of developing a metric to determine how many days the 'Staff Performance Evaluations' are past due. The data comes in the following format:
EmployeeID LastEvalCompleteDate NextEvalDueDate
1001 2010-01-01 2010-11-01
1001 2010-11-20 2011-11-01
1001 2011-10-29 2012-11-15
1002 NULL 2013-12-01
According to the sample data above, the employee 1001 has had 3 evals since 2010-01-01. Employee 1002 has started this year and his first eval is due on 2013-12-01.
What I need to do is to convert the data to this format:
EmployeeID EvalDueDate EvalCompleteDate DaysPastDue
1001 2010-11-01 2010-11-20 19
1001 2011-11-01 2011-10-29 -3
1001 2012-11-15 NULL 342 (based on today's date)
1002 2013-12-01 NULL -39 (based on today's date)
As you can see, I derive a new row by taking the value of the NextEvalDueDate column and mapping it to the EvalDueDate column in my new table. I also take the value of the LastEvalCompleteDate column in the NEXT row and map it to the EvalCompleteDate column.
I am having trouble with iterating through the rows for a given EmployeeID. I tried using ROW_NUMBER() OVER (PARTITION BY ...) but it did not take me anywhere.
I appreciate any kind of help. Thank you.

You were heading in the right direction with ROW_NUMBER() OVER (PARTITION BY ...). I don't know where you got stuck, but it should be something like this:
WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY NextEvalDueDate) RN
FROM dbo.Table1
)
SELECT
c1.EmployeeID
, c1.NextEvalDueDate AS EvalDueDate
, c2.LastEvalCompleteDate AS EvalCompleteDate
, DATEDIFF(DAY, c1.NextEvalDueDate, COALESCE(c2.LastEvalCompleteDate, GETDATE())) AS DaysPastDue
FROM CTE c1
LEFT JOIN CTE c2 ON c1.EmployeeID = c2.EmployeeID AND c1.RN = c2.RN - 1
ORDER BY c1.EmployeeID, c1.RN
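
On engines that have it (SQL Server 2012 and later, among others), the LEAD window function can fetch the next row's value without the self-join. The sketch below swaps that technique in and runs it through Python's sqlite3 module (SQLite 3.25+ assumed), so julianday arithmetic stands in for DATEDIFF. The evals table name is made up for the example, and the fixed as-of date 2013-10-23 is inferred from the question's 342 and -39 figures rather than stated in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table name; rows are the question's sample data
CREATE TABLE evals (EmployeeID INTEGER, LastEvalCompleteDate TEXT, NextEvalDueDate TEXT);
INSERT INTO evals VALUES
    (1001, '2010-01-01', '2010-11-01'),
    (1001, '2010-11-20', '2011-11-01'),
    (1001, '2011-10-29', '2012-11-15'),
    (1002, NULL,         '2013-12-01');
""")

# LEAD pulls LastEvalCompleteDate from the employee's next row;
# open evaluations are measured against the as-of date instead.
rows = conn.execute("""
SELECT EmployeeID, EvalDueDate, EvalCompleteDate,
       CAST(julianday(COALESCE(EvalCompleteDate, :as_of))
            - julianday(EvalDueDate) AS INTEGER) AS DaysPastDue
FROM (
    SELECT EmployeeID,
           NextEvalDueDate AS EvalDueDate,
           LEAD(LastEvalCompleteDate) OVER (PARTITION BY EmployeeID
                                            ORDER BY NextEvalDueDate) AS EvalCompleteDate
    FROM evals
)
ORDER BY EmployeeID, EvalDueDate
""", {"as_of": "2013-10-23"}).fetchall()

for row in rows:
    print(row)
```

The second row comes out as -3, since 2011-10-29 is three days before 2011-11-01.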

DECLARE @Results TABLE
(
EmployeeID INT NOT NULL,
RowNum INT NOT NULL,
PRIMARY KEY (RowNum, EmployeeID),
LastEvalCompleteDate DATE,
NextEvalDueDate DATE
);
INSERT @Results (RowNum, EmployeeID, LastEvalCompleteDate, NextEvalDueDate)
SELECT ROW_NUMBER() OVER(PARTITION BY e.EmployeeID ORDER BY e.LastEvalCompleteDate),
e.EmployeeID,
e.LastEvalCompleteDate,
e.NextEvalDueDate
FROM dbo.EmployeeEvaluation e;
WITH Base
AS
(
SELECT crt.RowNum,
crt.EmployeeID,
crt.NextEvalDueDate AS EvalDueDate,
nxt.LastEvalCompleteDate AS EvalCompleteDate
FROM @Results crt
LEFT JOIN @Results nxt ON crt.EmployeeID = nxt.EmployeeID AND crt.RowNum + 1 = nxt.RowNum
)
SELECT r.*,
DATEDIFF(DAY, r.EvalDueDate, ISNULL(r.EvalCompleteDate, GETDATE())) AS DaysPastDue
FROM Base r
ORDER BY r.EmployeeID, r.RowNum

Sql Server Self JOIN (pushing column values down)

I am asked to do the following:
"CycleStartDate needs to be the BillDate from the previous BillDate record. If a previous record does not exist, you should use the most recent CycleEndDate from the DataTime table"
CycleStartDate and CycleEndDate are columns in a table called DataTime
BillDate is a column in a table called BillingData
These are the BillDate values:
2012-07-27 00:00:00.000
2012-07-27 00:00:00.000
2012-08-27 00:00:00.000
2012-08-27 00:00:00.000
2012-09-28 00:00:00.000
2012-09-28 00:00:00.000
2012-10-26 00:00:00.000
2012-10-26 00:00:00.000
2012-11-27 00:00:00.000
2012-11-27 00:00:00.000
2012-12-27 00:00:00.000
How would I set the CycleStartDate values based on the requirements?
The tables DataTime and BillingData are connected by a column called MeterID.

Try something similar to this...
SELECT B.BillDate,
ISNULL(
B2.BillDate,
(SELECT MAX(CycleEndDate) FROM DataTime DT WHERE DT.MeterID = B.MeterID)
) CycleStartDate
FROM BillingData B
OUTER APPLY (
SELECT TOP 1 B2.BillDate
FROM BillingData B2
WHERE B2.MeterID = B.MeterID AND
B2.BillDate < B.BillDate
ORDER BY B2.BillDate DESC
) B2
I still have one doubt... Do you need to take the SELECT MAX(CycleEndDate) FROM DataTime DT WHERE DT.MeterID = B.MeterID or the SELECT MAX(CycleEndDate) FROM DataTime DT WHERE DT.MeterID = B.MeterID AND DT.CycleEndDate < B.BillDate?
But it can be done without the OUTER APPLY...
SELECT B.BillDate,
ISNULL(
(SELECT MAX(B2.BillDate)
FROM BillingData B2
WHERE B2.MeterID = B.MeterID AND
B2.BillDate < B.BillDate),
(SELECT MAX(CycleEndDate) FROM DataTime DT WHERE DT.MeterID = B.MeterID)
) CycleStartDate
FROM BillingData B
I think the second version is quite readable. For each row of BillingData B, look for the largest BillDate (MAX(B2.BillDate)) that is less than the current BillDate and has the same MeterID. If there is none, the first argument of ISNULL is NULL, so it falls through to the second argument: the largest CycleEndDate from DataTime with the same MeterID.
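
A small runnable sketch of that fallback logic, using Python's sqlite3 module rather than SQL Server, so COALESCE takes the place of ISNULL. The question gives no MeterID or CycleEndDate values, so the meter id 1 and the two DataTime rows below are invented purely for illustration; the BillDate values are from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BillingData (MeterID INTEGER, BillDate TEXT);
CREATE TABLE DataTime (MeterID INTEGER, CycleEndDate TEXT);
-- MeterID 1 and both CycleEndDate values are made-up sample data
INSERT INTO BillingData VALUES
    (1, '2012-07-27'), (1, '2012-08-27'), (1, '2012-09-28');
INSERT INTO DataTime VALUES (1, '2012-05-25'), (1, '2012-06-27');
""")

# First choice: the latest earlier BillDate for the same meter.
# Fallback: the latest CycleEndDate recorded for that meter.
rows = conn.execute("""
SELECT b.BillDate,
       COALESCE(
           (SELECT MAX(b2.BillDate)
              FROM BillingData b2
             WHERE b2.MeterID = b.MeterID
               AND b2.BillDate < b.BillDate),
           (SELECT MAX(dt.CycleEndDate)
              FROM DataTime dt
             WHERE dt.MeterID = b.MeterID)
       ) AS CycleStartDate
FROM BillingData b
ORDER BY b.BillDate
""").fetchall()

for row in rows:
    print(row)
```

Only the earliest bill (2012-07-27) has no predecessor, so it is the only row that picks up a CycleEndDate from DataTime.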

You can use the ROW_NUMBER() function for offsetting a JOIN:
SELECT a.BillDate, COALESCE(b.BillDate,c.CycleEndDate) 'CycleStartDate'
FROM (SELECT *,ROW_NUMBER() OVER (PARTITION BY MeterID ORDER BY BillDate DESC)'RowRank'
FROM YourTable
)a
LEFT JOIN (SELECT *,ROW_NUMBER() OVER (PARTITION BY MeterID ORDER BY BillDate DESC)'RowRank'
FROM YourTable
)b
ON a.RowRank = b.RowRank - 1
AND a.MeterID = b.MeterID
LEFT JOIN (SELECT MeterID,MAX(CycleEndDate)'CycleEndDate'
FROM DataTime
GROUP BY MeterID
) c
ON a.MeterID = c.MeterID
The PARTITION BY, and the MeterID criterion in the JOIN, may not be necessary. Your wording is a little unclear as to whether the ORDER BY should be ascending or descending. As written above, each row picks up the BillDate of the next older record, and the oldest record is the one that gets its date from the DataTime table; remove DESC to reverse that, so that the newest record gets its value from that table.

SQL query to sort data by time and date and then select only the newest record

I am trying to create a query that will sort data by date and time, find the newest record and then update another field in the record, marking it as such.
To make my life harder, the time and date are two separate fields, and the time is also a string.
So here is what I have so far:
UPDATE server.ESCC_HWAY_ASSETS_GULLIES_N
set CURRENT_REC = 'Y'
From server.ESCC_HWAY_ASSETS_GULLIES_N A
inner join (
SELECT GULLY_ID, Max([DATE]) AS MaxDate, MAX([TIME]) AS MaxTime
FROM server.ESCC_HWAY_ASSETS_GULLIES_N B
GROUP BY GULLY_ID, [DATE] ) B
on A.GULLY_ID = B.GULLY_ID and A.[DATE] = B.MaxDate and A.[TIME] = B.MaxTime
This results in data that is sorted by time and date, but it updates every record it finds, except on dates where there are two entries, where it only updates the newest record.
I am testing on a single record, B47605, which gives the following results for this query:
SELECT GULLY_ID, Max([DATE]) AS MaxDate, MAX([TIME]) AS MaxTime
FROM ESMAPADMIN.ESCC_HWAY_ASSETS_GULLIES_N B
WHERE GULLY_ID = 'B47605'
GROUP BY GULLY_ID, [DATE]
Gully_ID MaxDate MaxTime
B47605 2008-03-12 00:00:00.000 09:02:29
B47605 2008-09-19 00:00:00.000 09:51:14
B47605 2009-02-16 00:00:00.000 11:18:28
B47605 2009-08-21 00:00:00.000 12:34:45
B47605 2010-03-16 00:00:00.000 09:22:26
B47605 2010-08-25 00:00:00.000 11:19:55
B47605 2011-03-07 00:00:00.000 12:19:56
B47605 2012-05-02 00:00:00.000 20:57:54
The result I would like is to only have the newest record returned so -
Gully_ID MaxDate MaxTime
B47605 2012-05-02 00:00:00.000 20:57:54
I am not sure how to get from where I am to where I need to be, so any help would be appreciated.

Assuming you are using SQL Server 2005+ (because of the [] I see)
;WITH latestResult
AS
(
SELECT Gully_ID, MaxDate, MaxTime,
ROW_NUMBER() OVER (PARTITION BY Gully_ID
ORDER BY MaxDate DESC, MaxTime DESC) RN
FROM tableName
)
SELECT Gully_ID, MaxDate, MaxTime
FROM latestResult
WHERE RN = 1

Ended up using the following, thanks to all that helped me with this.
UPDATE ....
set CURRENT_REC = 'Y'
where [objectID] in
(
select [objectID] from
(
SELECT [objectID],[GULLY_ID], [date], [time],
ROW_NUMBER() over (partition by gully_id order by date desc, time desc) rown
FROM ....
) as t
where rown=1
)
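
For anyone wanting to try the same UPDATE pattern end to end, here is a minimal sketch using Python's sqlite3 module (SQLite 3.25+ assumed) instead of SQL Server. The gullies table, the rec_date and rec_time column names and the objectID values are simplified stand-ins, not the real schema; the dates and times are three of the B47605 rows from the question. It relies on the time string being zero-padded HH:MM:SS, since only then does sorting it as text match chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified table; objectID values are invented
CREATE TABLE gullies (
    objectID    INTEGER PRIMARY KEY,
    gully_id    TEXT,
    rec_date    TEXT,   -- ISO date
    rec_time    TEXT,   -- zero-padded HH:MM:SS string
    current_rec TEXT
);
INSERT INTO gullies (objectID, gully_id, rec_date, rec_time) VALUES
    (1, 'B47605', '2010-08-25', '11:19:55'),
    (2, 'B47605', '2011-03-07', '12:19:56'),
    (3, 'B47605', '2012-05-02', '20:57:54');

-- Flag only the newest row per gully
UPDATE gullies
SET current_rec = 'Y'
WHERE objectID IN (
    SELECT objectID
    FROM (
        SELECT objectID,
               ROW_NUMBER() OVER (PARTITION BY gully_id
                                  ORDER BY rec_date DESC, rec_time DESC) AS rown
        FROM gullies
    )
    WHERE rown = 1
);
""")

current = conn.execute(
    "SELECT objectID, rec_date, rec_time FROM gullies WHERE current_rec = 'Y'"
).fetchall()
print(current)
```

Ordering by date and then time inside one ROW_NUMBER avoids the original problem, where MAX([DATE]) and MAX([TIME]) were computed independently per date group.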

Mixing date frequencies in SQL

I have the query below:
select s1.DATADATE, s1.PRCCD, c.EBIT
from sec_dprc s1
left outer join rdq_temp c
on s1.GVKEY = c.GVKEY
and s1.DATADATE = c.rdq
where s1.GVKEY = 008068
order by s1.DATADATE
I am trying to create a rolling calculation between the two columns. The PRCCD column holds daily prices and the EBIT column is a quarterly value. I want to calculate the product of the two, i.e. PRCCD*EBIT, for every day, but EBIT only changes once a quarter, on irregular dates. In short, I want to compute PRCCD*EBIT going forward, carrying the most recent EBIT value until a new one is reported.
DATADATE PRCCD EBIT
1984-02-01 00:00:00.000 28.625 NULL
1984-02-02 00:00:00.000 27.875 NULL
1984-02-03 00:00:00.000 26.75 420.155
1984-02-06 00:00:00.000 27 NULL
1984-02-07 00:00:00.000 26.875 NULL
.
.
.
DATADATE PRCCD EBIT
1984-05-02 00:00:00.000 30.75 NULL
1984-05-03 00:00:00.000 30.875 NULL
1984-05-04 00:00:00.000 30.75 NULL
1984-05-07 00:00:00.000 31.125 499.228
1984-05-08 00:00:00.000 31.75 NULL
.
.
.
1984-07-31 00:00:00.000 25.625 NULL
1984-08-01 00:00:00.000 26.75 NULL
1984-08-02 00:00:00.000 26.375 348.364
1984-08-03 00:00:00.000 26.75 NULL
1984-08-06 00:00:00.000 27 NULL
Thanks for the help!
One of the solutions I came up with:
select TD.Date, TD.C CD, TQ.C CQ, TQ.C1, TQ.C/TQ.C1 EBITps,TQ.C/TQ.C1/TD.C PE
from
(select DataDate date, PRCCD C from sec_dprc where GVKEY = 008068) TD
cross apply (select top 1 rdq date, ebit C, csh12q C1 from rdq_temp where rdq<=TD.Date order by rdq desc) TQ
order by TD.Date

What you are looking for is a non-equijoin between the two tables. This would be much easier if you had effective and end date on the rdq_temp data. In order to add them in SQL Server, you can do a self join and aggregation (other databases support lag() and lead() functionality).
The following query does this, where the condition on the join is essentially a "between":
-- datadate in rdq_temp is the report date (the rdq column in the question's schema)
with rdq as (
select r.datadate, r.ebit, min(rnext.datadate) as nextdatadate
from rdq_temp r left outer join
rdq_temp rnext
on r.datadate < rnext.datadate
group by r.datadate, r.ebit
)
select sd.datadate, sd.prccd, rdq.ebit
from sec_dprc sd left outer join
rdq
on sd.datadate >= rdq.datadate and
(rdq.nextdatadate is null or sd.datadate < rdq.nextdatadate)
I'm guessing that data by quarters is not very big, so this should work fine. If you had more data, I would strongly suggest having effective and end dates, rather than just the asof date, in the rdq records.
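
To illustrate the effective/end date idea on a few of the question's rows, here is a minimal sketch using Python's sqlite3 module (SQLite 3.25+ assumed). It derives the end date with LEAD rather than the self-join, which is a swapped-in technique, and it leaves out GVKEY for brevity; with several companies you would partition the LEAD by GVKEY and add it to the join. The ranges CTE and its eff_date and next_date column names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- A handful of rows from the question; GVKEY omitted for brevity
CREATE TABLE sec_dprc (datadate TEXT, prccd REAL);
CREATE TABLE rdq_temp (rdq TEXT, ebit REAL);
INSERT INTO sec_dprc VALUES
    ('1984-02-02', 27.875), ('1984-02-03', 26.75), ('1984-02-06', 27.0),
    ('1984-05-07', 31.125), ('1984-05-08', 31.75);
INSERT INTO rdq_temp VALUES ('1984-02-03', 420.155), ('1984-05-07', 499.228);
""")

# Each quarterly value is effective from its own date until the next one;
# the latest value has no end date (next_date IS NULL).
rows = conn.execute("""
WITH ranges AS (
    SELECT rdq AS eff_date,
           ebit,
           LEAD(rdq) OVER (ORDER BY rdq) AS next_date
    FROM rdq_temp
)
SELECT sd.datadate, sd.prccd, r.ebit
FROM sec_dprc sd
LEFT JOIN ranges r
       ON sd.datadate >= r.eff_date
      AND (r.next_date IS NULL OR sd.datadate < r.next_date)
ORDER BY sd.datadate
""").fetchall()

for row in rows:
    print(row)
```

Days before the first reported EBIT keep a NULL, and every later day carries the most recent quarterly value, so PRCCD*EBIT can then be computed per row.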

I haven't checked the performance of this one, but I think it gives the result you want.
select datadate
,prccd
,ebit
,( select top 1 ebit
from sec_dprc s2
where s2.datadate <= s1.datadate
and ebit is not null
order by datadate desc
) as latestEbit
from sec_dprc s1
