SQL Server self JOIN (pushing column values down)

I am asked to do the following:
"CycleStartDate needs to be the BillDate from the previous BillDate record. If a previous record does not exist, you should use the most recent CycleEndDate from the DataTime table"
CycleStartDate and CycleEndDate are columns in a table called DataTime
BillDate is a column in a table called BillingData
This is the BillDate values:
2012-07-27 00:00:00.000
2012-07-27 00:00:00.000
2012-08-27 00:00:00.000
2012-08-27 00:00:00.000
2012-09-28 00:00:00.000
2012-09-28 00:00:00.000
2012-10-26 00:00:00.000
2012-10-26 00:00:00.000
2012-11-27 00:00:00.000
2012-11-27 00:00:00.000
2012-12-27 00:00:00.000
How would I set the CycleStartDate values based on the requirements?
The tables DataTime and BillingData are connected by a column called MeterID.

Try something similar to this...
SELECT B.BillDate,
ISNULL(
B2.BillDate,
(SELECT MAX(CycleEndDate) FROM DataTime DT WHERE DT.MeterID = B.MeterID)
) CycleStartDate
FROM BillingData B
OUTER APPLY (
SELECT TOP 1 B2.BillDate
FROM BillingData B2
WHERE B2.MeterID = B.MeterID AND
B2.BillDate < B.BillDate
ORDER BY B2.BillDate DESC
) B2
I still have one doubt... Do you need to take the SELECT MAX(CycleEndDate) FROM DataTime DT WHERE DT.MeterID = B.MeterID or the SELECT MAX(CycleEndDate) FROM DataTime DT WHERE DT.MeterID = B.MeterID AND DT.CycleEndDate < B.BillDate?
But it can be done without the OUTER APPLY...
SELECT B.BillDate,
ISNULL(
(SELECT MAX(B2.BillDate)
FROM BillingData B2
WHERE B2.MeterID = B.MeterID AND
B2.BillDate < B.BillDate),
(SELECT MAX(CycleEndDate) FROM DataTime DT WHERE DT.MeterID = B.MeterID)
) CycleStartDate
FROM BillingData B
I think the second version is quite readable. For each row of BillingData B, it looks for the largest BillDate (MAX(B2.BillDate)) that is earlier than the current BillDate and has the same MeterID. If there is none, the first subquery returns NULL, so ISNULL falls through to its second argument: the largest CycleEndDate from DataTime with the same MeterID.
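The "previous BillDate, else latest CycleEndDate" rule can be tried out without a SQL Server instance. Below is a minimal sketch using Python's built-in sqlite3 module. The MeterID value, the three BillDate rows and the CycleEndDate values are made up for illustration, and COALESCE is used in place of ISNULL because it works in both SQLite and SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BillingData (MeterID INTEGER, BillDate TEXT);
CREATE TABLE DataTime (MeterID INTEGER, CycleStartDate TEXT, CycleEndDate TEXT);
-- Hypothetical sample rows, not the original poster's data.
INSERT INTO BillingData VALUES (1, '2012-07-27'), (1, '2012-08-27'), (1, '2012-09-28');
INSERT INTO DataTime VALUES (1, NULL, '2012-05-25'), (1, NULL, '2012-06-26');
""")

rows = conn.execute("""
SELECT B.BillDate,
       COALESCE(
           -- latest earlier bill for the same meter, NULL if there is none
           (SELECT MAX(B2.BillDate)
              FROM BillingData B2
             WHERE B2.MeterID = B.MeterID
               AND B2.BillDate < B.BillDate),
           -- fallback: most recent CycleEndDate for the same meter
           (SELECT MAX(DT.CycleEndDate)
              FROM DataTime DT
             WHERE DT.MeterID = B.MeterID)
       ) AS CycleStartDate
  FROM BillingData B
 ORDER BY B.BillDate
""").fetchall()

for bill_date, cycle_start in rows:
    print(bill_date, cycle_start)
```

Only the first bill has no predecessor, so it is the only row that takes its value from DataTime.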

You can use the ROW_NUMBER() function for offsetting a JOIN:
SELECT a.BillDate, COALESCE(b.BillDate, c.CycleEndDate) AS CycleStartDate
FROM (SELECT *,ROW_NUMBER() OVER (PARTITION BY MeterID ORDER BY BillDate DESC)'RowRank'
FROM YourTable
)a
LEFT JOIN (SELECT *,ROW_NUMBER() OVER (PARTITION BY MeterID ORDER BY BillDate DESC)'RowRank'
FROM YourTable
)b
ON a.RowRank = b.RowRank - 1
AND a.MeterID = b.MeterID
LEFT JOIN (SELECT MeterID,MAX(CycleEndDate)'CycleEndDate'
FROM DataTime
GROUP BY MeterID
) c
ON a.MeterID = c.MeterID
The PARTITION BY, as well as the MeterID criterion in the JOIN, may not be necessary. Your wording is a little unclear as to whether the ORDER BY should be ascending or descending. As written above, each row is paired with the next older row, so the oldest record is the one that gets its date from the DataTime table; remove DESC to pair each row with the next newer one instead, so that the newest record gets its value from that table.
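To see which row ends up without a join partner under each sort direction, the offset join can be run on a few made-up rows. This is a sketch using Python's sqlite3 module (it needs SQLite 3.25 or later for window functions). BillingData stands in for YourTable, and the three BillDate values and the single CycleEndDate are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BillingData (MeterID INTEGER, BillDate TEXT);
CREATE TABLE DataTime (MeterID INTEGER, CycleEndDate TEXT);
-- Hypothetical rows for illustration only.
INSERT INTO BillingData VALUES (1, '2012-07-27'), (1, '2012-08-27'), (1, '2012-09-28');
INSERT INTO DataTime VALUES (1, '2012-06-26');
""")

QUERY = """
WITH ranked AS (
    SELECT MeterID, BillDate,
           ROW_NUMBER() OVER (PARTITION BY MeterID
                              ORDER BY BillDate {direction}) AS RowRank
      FROM BillingData
)
SELECT a.BillDate,
       b.BillDate AS PartnerBillDate,
       COALESCE(b.BillDate, c.CycleEndDate) AS CycleStartDate
  FROM ranked a
  LEFT JOIN ranked b
         ON a.MeterID = b.MeterID AND a.RowRank = b.RowRank - 1
  LEFT JOIN (SELECT MeterID, MAX(CycleEndDate) AS CycleEndDate
               FROM DataTime
              GROUP BY MeterID) c
         ON a.MeterID = c.MeterID
 ORDER BY a.BillDate
"""

def run(direction):
    # direction is only ever one of the fixed literals "ASC" or "DESC"
    return conn.execute(QUERY.format(direction=direction)).fetchall()

desc_rows = run("DESC")
asc_rows = run("ASC")

for label, result in (("DESC", desc_rows), ("ASC", asc_rows)):
    print(label)
    for row in result:
        print("  ", row)
```

With DESC, each row is paired with the next older BillDate and the oldest row falls back to DataTime; with ASC, the pairing is with the next newer BillDate and the newest row falls back.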

Related

Select all records from Table 1 but only the row in Table 2 with the max Version number

I have Table Trades.Transaction and Table Trades.BondRef. They can be joined on InstrumentDescription, but the join produces one-to-many rows because there are multiple ISIN/CUSIP (BondRef) per InstrumentDescription (Transaction). I would like to join but only display the row from Trades.BondRef which has the max Version number. I have reviewed numerous posts and come up with the code below.
SELECT tr.TradeDate,
tr.InstrumentDescription,
B.maxVersion,
B.IsLatest,
B.Isin,
B.Cusip,
B.RbcType1,
B.RbcType2,
B.RbcType3
FROM [trade_management].[dbo].[Trades.Transaction] tr WITH (NOLOCK)
INNER JOIN (
SELECT InstrumentDescription,
MAX(version) maxVersion,
IsLatest,
Isin,
Cusip,
RbcType1,
RbcType2,
RbcType3
FROM [trade_management].[dbo].[Trades.BondRef]
WHERE ValidTo between '2018-10-30 00:00:00.0000000 +00:00' and '2018-10-30 23:59:29.0000000 +00:00'
GROUP BY InstrumentDescription,IsLatest, Isin,Cusip,RbcType1,RbcType2,RbcType3
) AS B
ON B.InstrumentDescription = tr.InstrumentDescription
WHERE
(tr.OrigSystem = 'RBCE TOMS' OR tr.OrigSystem = 'SALE')
and (BookingAccountType = 'CLIENT' OR BookingAccountType = 'MASTER')
and tr.BookingAccountFacilitatorTeamCode in ('ESF','MJC','43B','DWV','G9J','698','9DN','A2T','AX3') -- HK Sales
and tr.IsLatest = 1
and tr.Status not in ('Cancelled')
and tr.TradeDate between '2018-10-30 00:00:00.0000000 +00:00' and '2018-10-30 23:59:29.0000000 +00:00'
order by tr.tradedate
I'm getting duplicate rows being returned as my groupby includes the Isin and Cusip. Note CBAAU 4 1/2 12/09/25 with Version 249 should be the only row returned.
TradeDate InstrumentDescription maxVersion Isin Cusip RbcType1 RbcType2 RbcType3
2018-10-30 NESNVX 3 1/8 03/22/23 124 XS1796233150 NULL CORP INDUSTRIAL EURO_MTN
2018-10-30 HSBC 6 1/4 PERP 116 US404280BN80 404280BN8 CORP BANK GLOBAL
2018-10-30 CBAAU 4 1/2 12/09/25 248 US2027A0HR32 2027A0HR3 CORP BANK PRIV_PLACEMENT
2018-10-30 CBAAU 4 1/2 12/09/25 249 US2027A1HR15 2027A1HR1 CORP BANK EURO-DOLLAR
2018-10-30 EIB 8 3/4 08/18/25 434 XS1274823571 NULL SUPRA NATIONAL EURO_MTN
But if I remove them I can't display the fields:
Column 'trade_management.dbo.Trades.BondRef.Isin' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
So how can I retrieve the columns in Trades.BondRef in the SELECT statement if they are not included in the subquery?
Instead of using a GROUP BY and a MAX, you could use the window function ROW_NUMBER.
ROW_NUMBER can be given an order, which determines which record gets row_number = 1 within each partition.
You can also combine an ORDER BY ROW_NUMBER() with a TOP 1 WITH TIES.
...
INNER JOIN (
SELECT TOP 1 WITH TIES
InstrumentDescription,
version AS maxVersion,
IsLatest,
Isin,
Cusip,
RbcType1,
RbcType2,
RbcType3
FROM [trade_management].[dbo].[Trades.BondRef]
WHERE ValidTo between '2018-10-30 00:00:00.0000000 +00:00' and '2018-10-30 23:59:29.0000000 +00:00'
ORDER BY ROW_NUMBER() OVER (PARTITION BY InstrumentDescription ORDER BY version DESC)
) AS B
ON B.InstrumentDescription = tr.InstrumentDescription
...
You don't do anything to remove any non-maximum versions. If you can use common-table expressions then this is just another step, to find the maximum version per instrument description:
WITH B AS ( SELECT InstrumentDescription,
MAX(version) maxVersion,
IsLatest,
Isin,
Cusip,
RbcType1,
RbcType2,
RbcType3
FROM [trade_management].[dbo].[Trades.BondRef]
WHERE ValidTo between '2018-10-30 00:00:00.0000000 +00:00' and '2018-10-30 23:59:29.0000000 +00:00'
GROUP BY InstrumentDescription,IsLatest, Isin,Cusip,RbcType1,RbcType2,RbcType3),
C AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY InstrumentDescription ORDER BY maxVersion DESC) AS version_id FROM B)
SELECT tr.TradeDate,
tr.InstrumentDescription,
C.maxVersion,
C.IsLatest,
C.Isin,
C.Cusip,
C.RbcType1,
C.RbcType2,
C.RbcType3
FROM [trade_management].[dbo].[Trades.Transaction] tr WITH (NOLOCK)
INNER JOIN C
ON C.InstrumentDescription = tr.InstrumentDescription AND c.version_id = 1
WHERE
(tr.OrigSystem = 'RBCE TOMS' OR tr.OrigSystem = 'SALE')
and (BookingAccountType = 'CLIENT' OR BookingAccountType = 'MASTER')
and tr.BookingAccountFacilitatorTeamCode in ('ESF','MJC','43B','DWV','G9J','698','9DN','A2T','AX3') -- HK Sales
and tr.IsLatest = 1
and tr.Status not in ('Cancelled')
and tr.TradeDate between '2018-10-30 00:00:00.0000000 +00:00' and '2018-10-30 23:59:29.0000000 +00:00'
order by tr.tradedate
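The "keep only the highest version per InstrumentDescription" step can be checked in isolation. The following is a minimal sketch using Python's sqlite3 module (SQLite 3.25 or later). The table BondRef is a cut-down, hypothetical stand-in for Trades.BondRef, with a few of the values from the question's output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-in for Trades.BondRef (an assumption; the real table has more columns).
CREATE TABLE BondRef (InstrumentDescription TEXT, Version INTEGER, Isin TEXT, RbcType3 TEXT);
INSERT INTO BondRef VALUES
  ('CBAAU 4 1/2 12/09/25', 248, 'US2027A0HR32', 'PRIV_PLACEMENT'),
  ('CBAAU 4 1/2 12/09/25', 249, 'US2027A1HR15', 'EURO-DOLLAR'),
  ('HSBC 6 1/4 PERP',      116, 'US404280BN80', 'GLOBAL');
""")

latest = conn.execute("""
WITH ranked AS (
    SELECT InstrumentDescription, Version, Isin, RbcType3,
           -- rn = 1 marks the row with the highest Version in each group
           ROW_NUMBER() OVER (PARTITION BY InstrumentDescription
                              ORDER BY Version DESC) AS rn
      FROM BondRef
)
SELECT InstrumentDescription, Version, Isin, RbcType3
  FROM ranked
 WHERE rn = 1
 ORDER BY InstrumentDescription
""").fetchall()

for row in latest:
    print(row)
```

Because no aggregate is involved, every column of the winning row is available without listing it in a GROUP BY.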

Filter LEFT JOINed table with dates to display current event, else future, else past?

I have a table that lists vacation information for different users (username, vacation start, and vacation end dates) -- 4 users are listed below:
Username VacationStart DeploymentEnd
rsuarez 2014-03-10 2014-03-26
studd 2014-01-18 2014-01-29
studd 2014-02-11 2014-02-26
studd 2014-03-02 2014-03-04
ssteele 2014-03-11 2014-03-26
ssteele 2014-03-18 2014-03-28
atidball 2014-03-05 2014-03-20
atidball 2014-03-06 2014-03-26
atidball 2014-03-13 2014-03-20
atidball 2014-03-18 2014-03-31
For a new query, I want to display only 4 rows, with each user having only one set of vacation dates displayed, either current/in-progress vacation, future/next vacation (if no current exists) or most recent (if two above are false).
The end result should be the following (assuming today is 3/9/2014):
Username VacationStart DeploymentEnd
rsuarez 2014-03-10 2014-03-26
studd 2014-03-02 2014-03-04
ssteele 2014-03-11 2014-03-26
atidball 2014-03-05 2014-03-20
Vacation dates are actually coming from another table (data_vacations), which I left join to data_users. I am trying to perform case selection inside left join statement.
Here is what I tried before, but my logic fails because it ends up mixing vacation end dates with vacation start dates from different rows:
SELECT Username, VacationStart, VacationEnd
FROM data_users
LEFT JOIN
(
SELECT userGUID,
CASE WHEN MIN(CASE WHEN (VacationEnd < getdate()) THEN NULL ELSE VacationStart END) IS NULL THEN MAX(VacationStart)
ELSE MIN(VacationStart) END AS VacationStart,
CASE WHEN MIN(CASE WHEN (VacationEnd < getdate()) THEN NULL ELSE VacationEnd END) IS NULL THEN MAX(VacationEnd)
ELSE MIN(VacationEnd) END AS VacationEnd
FROM data_vacations
GROUP BY userGUID
) b ON (data_users.userGUID = b.userGUID)
What am I doing wrong? How could I fix it?
Also, on a side note: am I performing this filtering in the LEFT JOIN correctly? data_users is much bigger and has distinct user ids, and I would like to join the available vacation information as in the example above while still displaying all unique user ids.
Using a common table expression to rank by category (current = 1, future = 2, past = 3), and within each category by start date or by the difference from GETDATE(), you can get the result you want by picking the top-ranked row with ROW_NUMBER():
DECLARE @DATE DATETIME = GETDATE()
;WITH cte AS (
SELECT *, 1 r, VacationStart s FROM data_users
WHERE @DATE BETWEEN VacationStart and DeploymentEnd
UNION ALL
SELECT *,2 r, VacationStart - @DATE s FROM data_users
WHERE VacationStart > @DATE
UNION ALL
SELECT *,3 r, @DATE - DeploymentEnd s FROM data_users
WHERE DeploymentEnd < @DATE
), cte2 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY username ORDER BY r,s) rn FROM cte
)
SELECT Username, VacationStart, DeploymentEnd FROM cte2 WHERE rn=1;
An SQLfiddle to test with.
Getting the date as a variable is necessary to get a consistent GETDATE() value over the whole query, otherwise it may not be consistent if called multiple times.
select u.name, s.startdate, s.enddate
from users u
left join
(
select su.name,
max(su.[start]) as startdate,
max(su.[end]) as enddate from users su group by su.name
) s on u.name = s.name
Since you are asking two questions I will answer the one about getting the vacation dates and let you figure out the join.
I don't think you can get the desired vacation dates in one simple query. First you need to establish whether a given date range is in the past, present or future. Then you need to order those ranges by start/end dates to get the most recent or the next upcoming one: past vacations are sorted in descending order and upcoming ones in ascending order. Funnily enough, user atidball has two vacations in progress; I sorted those in the same manner as future vacations. Finally, apply your rules; I did that by sorting by state.
declare @currentDate date = '20140309'
;
with cte1 as
(
-- state: the lower number the higher priority
select Username, VacationStart, DeploymentEnd,
case
when VacationStart <= @currentDate and DeploymentEnd >= @currentDate
then 0 -- in progress
when VacationStart > @currentDate
then 1 -- future
when DeploymentEnd < @currentDate
then 2 -- past
else NULL
end as state
from data_vacations
)
, cte2 as
(
select *,
row_number() over(partition by username, state order by VacationStart, DeploymentEnd) as rn
from cte1
where state < 2 -- current or upcoming
union all
select *,
row_number() over(partition by username, state order by DeploymentEnd desc, VacationStart desc) as rn
from cte1
where state = 2 -- past
)
, cte3 as
(
-- apply the rules: find the record with highest priority
select Username, min(state) as minstate
from cte1
group by Username
)
select cte2.Username, cte2.VacationStart, cte2.DeploymentEnd
from cte2
inner join cte3
on cte2.Username = cte3.Username
and cte2.state = cte3.minstate
and cte2.rn = 1 -- most recent or next upcoming
See the SQLFiddle.
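A compact variant of the same prioritisation can be checked against the question's sample rows. This is a sketch using Python's sqlite3 module (SQLite 3.25 or later), not the answerer's exact query: it folds the two sort directions into one ROW_NUMBER() by using CASE expressions in the ORDER BY. The table name data_vacations and the fixed date 2014-03-09 come from the question; storing dates as ISO text is an assumption made for SQLite.

```python
import sqlite3

TODAY = "2014-03-09"  # the "today" assumed in the question

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_vacations (Username TEXT, VacationStart TEXT, DeploymentEnd TEXT)"
)
conn.executemany("INSERT INTO data_vacations VALUES (?, ?, ?)", [
    ("rsuarez", "2014-03-10", "2014-03-26"),
    ("studd", "2014-01-18", "2014-01-29"),
    ("studd", "2014-02-11", "2014-02-26"),
    ("studd", "2014-03-02", "2014-03-04"),
    ("ssteele", "2014-03-11", "2014-03-26"),
    ("ssteele", "2014-03-18", "2014-03-28"),
    ("atidball", "2014-03-05", "2014-03-20"),
    ("atidball", "2014-03-06", "2014-03-26"),
    ("atidball", "2014-03-13", "2014-03-20"),
    ("atidball", "2014-03-18", "2014-03-31"),
])

picked = conn.execute("""
WITH classified AS (
    SELECT Username, VacationStart, DeploymentEnd,
           CASE WHEN VacationStart <= :today AND DeploymentEnd >= :today THEN 0  -- in progress
                WHEN VacationStart > :today THEN 1                               -- future
                ELSE 2                                                           -- past
           END AS state
      FROM data_vacations
), ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY Username
               ORDER BY state,
                        -- current/future: earliest start first
                        CASE WHEN state < 2 THEN VacationStart END,
                        -- past: latest end first
                        CASE WHEN state = 2 THEN DeploymentEnd END DESC
           ) AS rn
      FROM classified
)
SELECT Username, VacationStart, DeploymentEnd
  FROM ranked
 WHERE rn = 1
 ORDER BY Username
""", {"today": TODAY}).fetchall()

for row in picked:
    print(row)
```

The same ORDER BY with CASE expressions is valid in SQL Server, where the date would be passed as a variable instead of a bound parameter.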

Iterating through rows to capture the value in the next row

I have been a long time reader of this forum. It has helped me a lot, however I have a question which I cannot find a solution specific to my requirements.
I have been given the task of developing a metric to determine how many days the 'Staff Performance Evaluations' are past due. The data comes in the following format:
EmployeeID LastEvalCompleteDate NextEvalDueDate
1001 2010-01-01 2010-11-01
1001 2010-11-20 2011-11-01
1001 2011-10-29 2012-11-15
1002 NULL 2013-12-01
According to the sample data above, the employee 1001 has had 3 evals since 2010-01-01. Employee 1002 has started this year and his first eval is due on 2013-12-01.
What I need to do is to convert the data to this format:
EmployeeID EvalDueDate EvalCompleteDate DaysPastDue
1001 2010-11-01 2010-11-20 19
1001 2011-11-01 2011-10-29 -2
1001 2012-11-15 NULL 342 (based on today's date)
1002 2013-12-01 NULL -39 (based on today's date)
As you can see, I derive a new row by taking the value of the NextEvalDueDate column and mapping it to the EvalDueDate column in my new table. I also take the value in the LastEvalCompleteDate column of the NEXT row and map it to the EvalCompleteDate column.
I am having trouble with iterating through the rows for a given EmployeeID. I tried using ROW_NUMBER() OVER (PARTITION BY ...) but it did not take me anywhere.
I appreciate any kind of help. Thank you.
You were heading in the right direction with ROW_NUMBER() OVER (PARTITION BY ...). I don't know where you got stuck, but it should be something like this:
WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY NextEvalDueDate) RN
FROM dbo.Table1
)
SELECT
c1.EmployeeID
, c1.NextEvalDueDate AS EvalDueDate
, c2.LastEvalCompleteDate AS EvalCompleteDate
, DATEDIFF(DAY, c1.NextEvalDueDate, COALESCE(c2.LastEvalCompleteDate, GETDATE())) AS DaysPastDue
FROM CTE c1
LEFT JOIN CTE c2 ON c1.EmployeeID = c2.EmployeeID AND c1.RN = c2.RN - 1
ORDER BY c1.EmployeeID, c1.RN
DECLARE @Results TABLE
(
EmployeeID INT NOT NULL,
RowNum INT NOT NULL,
PRIMARY KEY (RowNum, EmployeeID),
LastEvalCompleteDate DATE,
NextEvalDueDate DATE
);
INSERT @Results (RowNum, EmployeeID, LastEvalCompleteDate, NextEvalDueDate)
SELECT ROW_NUMBER() OVER(PARTITION BY e.EmployeeID ORDER BY e.LastEvalCompleteDate),
e.EmployeeID,
e.LastEvalCompleteDate,
e.NextEvalDueDate
FROM dbo.EmployeeEvaluation e;
WITH Base
AS
(
SELECT crt.RowNum,
crt.EmployeeID,
crt.NextEvalDueDate AS EvalDueDate,
nxt.LastEvalCompleteDate AS EvalCompleteDate
FROM @Results crt
LEFT JOIN @Results nxt ON crt.EmployeeID = nxt.EmployeeID AND crt.RowNum + 1 = nxt.RowNum
)
SELECT r.*,
DATEDIFF(DAY, r.EvalDueDate, ISNULL(r.EvalCompleteDate, GETDATE())) AS DaysPastDue
FROM Base r
ORDER BY r.EmployeeID, r.RowNum
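Where the LEAD() window function is available (SQL Server 2012 and later, and SQLite 3.25 and later), the "value from the next row" lookup needs no self join. Below is a sketch using Python's sqlite3 module with the question's sample rows. The table name EmployeeEvaluation is borrowed from the answer above, and the fixed date 2013-10-23 is an assumption: it is the date for which the question's "based on today's date" figures of 342 and -39 both hold. Note that the calendar difference for the second row works out to -3 rather than the -2 shown in the question's table.

```python
import sqlite3

TODAY = "2013-10-23"  # assumed "today", inferred from the question's sample output

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeEvaluation "
    "(EmployeeID INTEGER, LastEvalCompleteDate TEXT, NextEvalDueDate TEXT)"
)
conn.executemany("INSERT INTO EmployeeEvaluation VALUES (?, ?, ?)", [
    (1001, "2010-01-01", "2010-11-01"),
    (1001, "2010-11-20", "2011-11-01"),
    (1001, "2011-10-29", "2012-11-15"),
    (1002, None, "2013-12-01"),
])

result = conn.execute("""
WITH shifted AS (
    SELECT EmployeeID,
           NextEvalDueDate AS EvalDueDate,
           -- completion date recorded on the employee's next row, NULL if none yet
           LEAD(LastEvalCompleteDate) OVER (
               PARTITION BY EmployeeID ORDER BY NextEvalDueDate
           ) AS EvalCompleteDate
      FROM EmployeeEvaluation
)
SELECT EmployeeID, EvalDueDate, EvalCompleteDate,
       -- whole days between due date and completion (or today if not completed)
       CAST(julianday(COALESCE(EvalCompleteDate, :today))
            - julianday(EvalDueDate) AS INTEGER) AS DaysPastDue
  FROM shifted
 ORDER BY EmployeeID, EvalDueDate
""", {"today": TODAY}).fetchall()

for row in result:
    print(row)
```

In T-SQL the day difference would be written with DATEDIFF(DAY, ...) as in the answers above; julianday() is the SQLite substitute used here.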

SQL query to sort data by time and date and then select only the newest record

I am trying to create a query that will sort data by date and time, find the newest record, and then update another field in that record to mark it as such.
To make my life harder, the time and date are two separate fields, and the time is also a string.
So here is what I have so far,
UPDATE server.ESCC_HWAY_ASSETS_GULLIES_N
set CURRENT_REC = 'Y'
From server.ESCC_HWAY_ASSETS_GULLIES_N A
inner join (
SELECT GULLY_ID, Max([DATE]) AS MaxDate, MAX([TIME]) AS MaxTime
FROM server.ESCC_HWAY_ASSETS_GULLIES_N B
GROUP BY GULLY_ID, [DATE] ) B
on A.GULLY_ID = B.GULLY_ID and A.[DATE] = B.MaxDate and A.[TIME] = B.MaxTime
This results in data that is sorted by time and date, but it updates every record it finds, except on dates where there are two entries, where it updates only the newest record.
I am testing on a single record, B47605, which gives the following results for this query:
SELECT GULLY_ID, Max([DATE]) AS MaxDate, MAX([TIME]) AS MaxTime
FROM ESMAPADMIN.ESCC_HWAY_ASSETS_GULLIES_N B
WHERE GULLY_ID = 'B47605'
GROUP BY GULLY_ID, [DATE]
Gully_ID MaxDate MaxTime
B47605 2008-03-12 00:00:00.000 09:02:29
B47605 2008-09-19 00:00:00.000 09:51:14
B47605 2009-02-16 00:00:00.000 11:18:28
B47605 2009-08-21 00:00:00.000 12:34:45
B47605 2010-03-16 00:00:00.000 09:22:26
B47605 2010-08-25 00:00:00.000 11:19:55
B47605 2011-03-07 00:00:00.000 12:19:56
B47605 2012-05-02 00:00:00.000 20:57:54
The result I would like is to only have the newest record returned so -
Gully_ID MaxDate MaxTime
B47605 2012-05-02 00:00:00.000 20:57:54
I am not sure how to go from where I am to where I need to be, so any help would be appreciated.
Assuming you are using SQL Server 2005+ (because of the [] I see)
;WITH latestResult
AS
(
SELECT Gully_ID, MaxDate, MaxTime,
ROW_NUMBER() OVER (PARTITION BY Gully_ID
ORDER BY MaxDate DESC, MaxTime DESC) RN
FROM tableName
)
SELECT Gully_ID, MaxDate, MaxTime
FROM latestResult
WHERE RN = 1
SQLFiddle Demo
Ended up using the following, thanks to all that helped me with this.
UPDATE ....
set CURRENT_REC = 'Y'
where [objectID] in
(
select [objectID] from
(
SELECT [objectID],[GULLY_ID], [date], [time],
ROW_NUMBER() over (partition by gully_id order by date desc, time desc) rown
FROM ....
) as t
where rown=1
)
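The final UPDATE can be exercised end to end on a small table. This is a sketch using Python's sqlite3 module (SQLite 3.25 or later). The table name gullies, the second gully X00001 and the same-day 08:15:00 row are hypothetical; the other B47605 values are from the question. It also relies on the time strings being zero-padded HH:MM:SS, since only then does string ordering match chronological ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE gullies (
    objectID    INTEGER PRIMARY KEY,
    GULLY_ID    TEXT,
    [DATE]      TEXT,
    [TIME]      TEXT,              -- stored as a zero-padded 'HH:MM:SS' string
    CURRENT_REC TEXT DEFAULT 'N'
)
""")
conn.executemany("INSERT INTO gullies (GULLY_ID, [DATE], [TIME]) VALUES (?, ?, ?)", [
    ("B47605", "2011-03-07", "12:19:56"),
    ("B47605", "2012-05-02", "08:15:00"),  # hypothetical earlier visit on the same day
    ("B47605", "2012-05-02", "20:57:54"),
    ("X00001", "2010-08-25", "11:19:55"),  # hypothetical second gully
])

conn.execute("""
UPDATE gullies
   SET CURRENT_REC = 'Y'
 WHERE objectID IN (
       SELECT objectID
         FROM (SELECT objectID,
                      ROW_NUMBER() OVER (PARTITION BY GULLY_ID
                                         ORDER BY [DATE] DESC, [TIME] DESC) AS rown
                 FROM gullies) AS t
        WHERE rown = 1
 )
""")

flagged = conn.execute("""
SELECT GULLY_ID, [DATE], [TIME]
  FROM gullies
 WHERE CURRENT_REC = 'Y'
 ORDER BY GULLY_ID
""").fetchall()

for row in flagged:
    print(row)
```

Ordering by date first and time second is what breaks the tie between two entries on the same day, which the MAX-per-date approach in the question could not do.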

Mixing date frequencies in SQL

I have the query below:
select s1.DATADATE, s1.PRCCD, c.EBIT
from sec_dprc s1
left outer join rdq_temp c
on s1.GVKEY = c.GVKEY
and s1.DATADATE = c.rdq
where s1.GVKEY = 008068
order by s1.DATADATE
I am trying to create a rolling calculation across the two columns: the PRCCD column holds daily prices and the EBIT column is a quarterly value. I want to calculate the product of the two, i.e. PRCCD*EBIT, for every day, but EBIT only changes once a quarter, on irregular dates. In short, I want to calculate the product of EBIT and PRCCD going forward, picking up a new EBIT value only when it changes each quarter.
DATADATE PRCCD EBIT
1984-02-01 00:00:00.000 28.625 NULL
1984-02-02 00:00:00.000 27.875 NULL
1984-02-03 00:00:00.000 26.75 420.155
1984-02-06 00:00:00.000 27 NULL
1984-02-07 00:00:00.000 26.875 NULL
.
.
.
DATADATE PRCCD EBIT
1984-05-02 00:00:00.000 30.75 NULL
1984-05-03 00:00:00.000 30.875 NULL
1984-05-04 00:00:00.000 30.75 NULL
1984-05-07 00:00:00.000 31.125 499.228
1984-05-08 00:00:00.000 31.75 NULL
.
.
.
1984-07-31 00:00:00.000 25.625 NULL
1984-08-01 00:00:00.000 26.75 NULL
1984-08-02 00:00:00.000 26.375 348.364
1984-08-03 00:00:00.000 26.75 NULL
1984-08-06 00:00:00.000 27 NULL
Thanks for the help!
One of the solutions I came to:
select TD.Date, TD.C CD, TQ.C CQ, TQ.C1, TQ.C/TQ.C1 EBITps,TQ.C/TQ.C1/TD.C PE
from
(select DataDate date, PRCCD C from sec_dprc where GVKEY = 008068) TD
cross apply (select top 1 rdq date, ebit C, csh12q C1 from rdq_temp where rdq<=TD.Date order by rdq desc) TQ
order by TD.Date
What you are looking for is a non-equijoin between the two tables. This would be much easier if you had effective and end dates on the rdq_temp data. To add them in SQL Server, you can do a self join and aggregation (SQL Server 2012 and later, like several other databases, also support lag() and lead() for this).
The following query does this; the condition on the join is essentially a "between":
with rdq as (
select r.rdq as datadate, r.ebit, min(rnext.rdq) as nextdatadate
from rdq_temp r left outer join
rdq_temp rnext
on r.rdq < rnext.rdq
group by r.rdq, r.ebit
)
select sd.datadate, sd.prccd, rdq.ebit
from sec_dprc sd left outer join
rdq
on sd.datadate >= rdq.datadate and
(rdq.nextdatadate is null or sd.datadate < rdq.nextdatadate)
I'm guessing that data by quarters is not very big, so this should work fine. If you had more data, I would strongly suggest having effective and end dates, rather than just the asof date, in the rdq records.
I haven't checked the performance of this one, but I think it gives the result you want.
select datadate
,prccd
,ebit
,( select top 1 ebit
from sec_dprc s2
where s2.datadate <= s1.datadate
and ebit is not null
order by datadate desc
) as latestEbit
from sec_dprc s1
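The "latest EBIT on or before each price date" lookup can be tried on a handful of the question's rows. This is a sketch using Python's sqlite3 module, with ORDER BY ... DESC LIMIT 1 standing in for T-SQL's TOP 1. Treating GVKEY as text and keeping EBIT in rdq_temp keyed by rdq are assumptions based on the question's join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sec_dprc (GVKEY TEXT, DATADATE TEXT, PRCCD REAL);
CREATE TABLE rdq_temp (GVKEY TEXT, rdq TEXT, EBIT REAL);
INSERT INTO sec_dprc VALUES
  ('008068', '1984-02-01', 28.625),
  ('008068', '1984-02-02', 27.875),
  ('008068', '1984-02-03', 26.75),
  ('008068', '1984-02-06', 27.0),
  ('008068', '1984-05-04', 30.75),
  ('008068', '1984-05-07', 31.125);
INSERT INTO rdq_temp VALUES
  ('008068', '1984-02-03', 420.155),
  ('008068', '1984-05-07', 499.228);
""")

rows = conn.execute("""
SELECT s.DATADATE, s.PRCCD,
       -- most recent quarterly EBIT reported on or before this price date
       (SELECT r.EBIT
          FROM rdq_temp r
         WHERE r.GVKEY = s.GVKEY
           AND r.rdq <= s.DATADATE
         ORDER BY r.rdq DESC
         LIMIT 1) AS LatestEbit
  FROM sec_dprc s
 WHERE s.GVKEY = '008068'
 ORDER BY s.DATADATE
""").fetchall()

# Daily product PRCCD * EBIT; None until the first EBIT value has been reported.
products = [(d, None if ebit is None else price * ebit) for d, price, ebit in rows]

for row in products:
    print(row)
```

Days before the first report have no EBIT to carry forward, so their product stays empty rather than being guessed.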