SQL Using PARTITION when comparing values in consecutive DataRows - sql

I'm using a SQL statement to compare consecutive values of a field [Allocation] as follows:
;WITH cteMain AS
(SELECT AllocID, CaseNo, FeeEarner, Allocation, ROW_NUMBER() OVER (ORDER BY AllocID) AS sn
FROM tblAllocations)
SELECT m.AllocID, m.CaseNo, m.FeeEarner, m.Allocation,
ISNULL(sLag.Allocation, 0) AS prevAllocation,
(m.Allocation - ISNULL(sLag.Allocation, 0)) AS movement
FROM cteMain AS m
LEFT OUTER JOIN cteMain AS sLag
ON sLag.sn = m.sn-1;
The query returns a calculated field [movement] which is the increase or decrease in consecutive values of [Allocation].
I have included a screen shot of the data returned by this query.
However the query is not yet complete. I need to revise the statement so that the consecutive values of [Allocation] compared are grouped / partitioned by [FeeEarner] and [CaseNo].
For example, at line 18 of the data, the [Allocation] is 800 and is compared to a previous value of 600. But the previous value belongs to a different [CaseNo] i.e. 6 rather than 31. In fact [FeeEarner] 'PJW' has no previous [Allocation] on [CaseNo] '31' and so the [prevAllocation] should be '0' from the ISNULL keyword.
I have tried changing
OVER (ORDER BY AllocID)
to
OVER (PARTITION BY CaseNo, FeeEarner ORDER BY AllocID)
But that results in a lot of lines of data being repeated.
Can someone advise how to compare consecutive values of [Allocation] but only between rows of data with matching [FeeEarner] AND [CaseNo] please?
NOTE - I cannot use LAG because my customer is using SQL Server 2008 R2, which does not support it.

I believe you were close. Try this (notice the added pieces in the join clause to match the partition - without this you will match every row number 3 with every row number 2 across partitions, which is what you were seeing):
;WITH cteMain AS
(
SELECT AllocID, CaseNo, FeeEarner, Allocation,
ROW_NUMBER() OVER (PARTITION BY CaseNo, FeeEarner ORDER BY AllocID) AS sn
FROM tblAllocations
)
SELECT m.AllocID, m.CaseNo, m.FeeEarner, m.Allocation,
ISNULL(sLag.Allocation, 0) AS prevAllocation,
(m.Allocation - ISNULL(sLag.Allocation, 0)) AS movement
FROM cteMain AS m
LEFT OUTER JOIN cteMain AS sLag
ON sLag.CaseNo = m.CaseNo
AND sLag.FeeEarner = m.FeeEarner
AND sLag.sn = m.sn-1
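To see the effect of matching the join to the partition, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. The sample rows are hypothetical (the thread's screenshot data is not available), and COALESCE stands in for ISNULL because SQLite has no ISNULL function; this assumes a SQLite build with window function support (3.25 or later).

```python
import sqlite3

# Hypothetical sample rows: (AllocID, CaseNo, FeeEarner, Allocation).
sample_rows = [
    (1, 6, "PJW", 500),
    (2, 6, "PJW", 600),
    (3, 31, "PJW", 800),
    (4, 6, "ABC", 100),
    (5, 6, "PJW", 450),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblAllocations ("
    "AllocID INTEGER PRIMARY KEY, CaseNo INTEGER, FeeEarner TEXT, Allocation INTEGER)"
)
conn.executemany("INSERT INTO tblAllocations VALUES (?, ?, ?, ?)", sample_rows)

# Row numbers restart for each (CaseNo, FeeEarner) pair, so the join must
# also match on those columns; otherwise sn = 2 in one partition would pair
# with sn = 1 in every other partition.
query = """
WITH cteMain AS (
    SELECT AllocID, CaseNo, FeeEarner, Allocation,
           ROW_NUMBER() OVER (PARTITION BY CaseNo, FeeEarner ORDER BY AllocID) AS sn
    FROM tblAllocations
)
SELECT m.AllocID, m.CaseNo, m.FeeEarner, m.Allocation,
       COALESCE(sLag.Allocation, 0) AS prevAllocation,
       m.Allocation - COALESCE(sLag.Allocation, 0) AS movement
FROM cteMain AS m
LEFT OUTER JOIN cteMain AS sLag
  ON sLag.CaseNo = m.CaseNo
 AND sLag.FeeEarner = m.FeeEarner
 AND sLag.sn = m.sn - 1
ORDER BY m.AllocID
"""
partitioned_result = conn.execute(query).fetchall()
for row in partitioned_result:
    print(row)
```

With this sample data, the row for CaseNo 31 gets a prevAllocation of 0 because 'PJW' has no earlier row on that case, and each input row appears exactly once in the output.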

You need to change your join condition as well:
FROM cteMain m LEFT OUTER JOIN
cteMain sLag
ON sLag.sn = m.sn-1 AND sLag.FeeEarner = m.FeeEarner AND sLag.CaseNo = m.CaseNo
Also, you should have only one order by in the row_number() call.
Also, if you are using Oracle, SQL Server 2012 or later, newer versions of DB2, or Postgres, then the lead()/lag() functions would be a better choice.
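For servers where lag() is available, the self-join disappears entirely. The sketch below is an illustration only: it uses SQLite (3.25 or later) from Python as a stand-in engine, with hypothetical sample rows, and relies on the three-argument form LAG(expr, offset, default) so that a missing previous row yields 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblAllocations ("
    "AllocID INTEGER PRIMARY KEY, CaseNo INTEGER, FeeEarner TEXT, Allocation INTEGER)"
)
# Hypothetical rows: (AllocID, CaseNo, FeeEarner, Allocation).
conn.executemany(
    "INSERT INTO tblAllocations VALUES (?, ?, ?, ?)",
    [(1, 6, "PJW", 500), (2, 6, "PJW", 600), (3, 31, "PJW", 800)],
)

# LAG looks one row back within the same (CaseNo, FeeEarner) partition and
# falls back to 0 when there is no earlier row.
lag_rows = conn.execute(
    """
    SELECT AllocID, CaseNo, FeeEarner, Allocation,
           LAG(Allocation, 1, 0)
               OVER (PARTITION BY CaseNo, FeeEarner ORDER BY AllocID) AS prevAllocation
    FROM tblAllocations
    ORDER BY AllocID
    """
).fetchall()

# movement = current allocation minus the previous one in the partition.
movements = [(alloc_id, allocation - prev) for alloc_id, _, _, allocation, prev in lag_rows]
print(movements)
```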

One more option with OUTER APPLY and EXISTS
SELECT t1.AllocID, t1.CaseNo, t1.FeeEarner, t1.Allocation,
ISNULL(o.Allocation, 0) AS PrevAllocation,
(t1.Allocation - ISNULL(o.Allocation, 0)) AS movement
FROM tblAllocations t1
OUTER APPLY (
SELECT t2.AllocID, t2.CaseNo, t2.FeeEarner, t2.Allocation
FROM tblAllocations t2
WHERE EXISTS (
-- t2 must be the latest earlier row for the same CaseNo and FeeEarner
SELECT 1
FROM tblAllocations t3
WHERE t1.AllocID > t3.AllocID
AND t3.CaseNo = t1.CaseNo
AND t3.FeeEarner = t1.FeeEarner
HAVING MAX(t3.AllocID) = t2.AllocID
) AND t1.CaseNo = t2.CaseNo
AND t1.FeeEarner = t2.FeeEarner
) o

Related

SQL - Get the sum of several groups of records

DESIRED RESULT
Get the hours SUM of all [Hours] including only a single result from each [DevelopmentID] where [Revision] is highest value
e.g SUM 1, 2, 3, 5, 6 (Result should be 22.00)
I'm stuck trying to get the appropriate grouping.
DECLARE @CompanyID INT = 1
SELECT
SUM([s].[Hours]) AS [Hours]
FROM
[dbo].[tblDev] [d] WITH (NOLOCK)
JOIN
[dbo].[tblSpec] [s] WITH (NOLOCK) ON [d].[DevID] = [s].[DevID]
WHERE
[s].[Revision] = (
SELECT MAX([s2].[Revision]) FROM [tblSpec] [s2]
)
GROUP BY
[s].[Hours]
Use row_number() to identify the latest revision (order by Revision descending so that R = 1 is the highest revision):
SELECT SUM([Hours])
FROM (
SELECT s.[Hours], R = ROW_NUMBER() OVER (PARTITION BY d.DevID
ORDER BY s.Revision DESC)
FROM [dbo].[tblDev] d
JOIN [dbo].[tblSpec] s
ON d.[DevID] = s.[DevID]
) d
WHERE R = 1
If you want one row per DevId, then that should be in the GROUP BY (and presumably in the SELECT as well):
SELECT s.DevId, SUM(s.Hours) as hours
FROM [dbo].[tblDev] d JOIN
[dbo].[tblSpec] s
ON [d].[DevID] = [s].[DevID]
WHERE s.Revision = (SELECT MAX(s2.Revision) FROM tblSpec s2 WHERE s2.DevID = s.DevID)
GROUP BY s.DevId;
Also, don't use WITH (NOLOCK) unless you really know what you are doing -- and I'm guessing you do not. It is basically a license that says: "You can get me data even if it is not 100% accurate."
I would also dispense with all the square brackets. They just make the query harder to write and to read.
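As a quick check of the row_number() idea, here is a small sketch run through SQLite (3.25 or later) from Python. The table contents and the column set are hypothetical, since the question's data is not shown; the values were simply chosen so that the latest revisions add up to 22. The ordering is by Revision descending so that R = 1 marks the highest revision for each DevID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblDev (DevID INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE tblSpec (DevID INTEGER, Revision INTEGER, Hours REAL)")
conn.executemany("INSERT INTO tblDev VALUES (?)", [(1,), (2,), (3,)])
# Hypothetical rows: (DevID, Revision, Hours).
conn.executemany(
    "INSERT INTO tblSpec VALUES (?, ?, ?)",
    [
        (1, 1, 4.0), (1, 2, 5.0),               # latest revision for DevID 1: 5.0
        (2, 1, 7.5),                            # only revision for DevID 2: 7.5
        (3, 1, 1.0), (3, 2, 2.0), (3, 3, 9.5),  # latest revision for DevID 3: 9.5
    ],
)

# Number the revisions of each DevID from newest to oldest, keep R = 1,
# then sum the hours of those rows only.
total_hours = conn.execute(
    """
    SELECT SUM(Hours)
    FROM (
        SELECT s.Hours,
               ROW_NUMBER() OVER (PARTITION BY s.DevID ORDER BY s.Revision DESC) AS R
        FROM tblDev AS d
        JOIN tblSpec AS s ON d.DevID = s.DevID
    ) AS latest
    WHERE R = 1
    """
).fetchone()[0]
print(total_hours)
```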

Hive SQL Cross Join Question on condition

Wonderful coders of the universe, I have a question:
I'm writing a HIVE SQL script, and I'm wondering if it's possible to cross join on a condition (condition below is where the dayofweek is a friday), or if there's a performance-light alternative to what I'm doing below. I ONLY need to add in 2 rows to dates that are Fridays, which is just a persist of the Friday date data for Saturdays and Sundays. I get an error on the join condition, but I'm wondering if it's possible to bypass that somehow.
To be crystal clear, the way the query is written below gives me an error (specifically and DAYOFWEEK(performance_end_date) = 6). Just wondering if there's a way to write this where the syntax will be accepted.
Please advise.
select
portfolio_name
,Cast(Date_add(a.performance_end_date, crs.crs) AS TIMESTAMP) AS performance_end_date
,return
,nav
,nav_id
,row_no
from
(
SELECT portfolio_name, performance_end_date, return, cast(cast(nav as decimal(20,2))as string) as nav, nav_id
,row_number() over (partition by a.portfolio_code,a.performance_end_date order by a.nav_id desc) as row_no
FROM carsales a
WHERE
portfolio_code IN ('1994','2078','2155','2365','2367')
and
year=2020 and month=09
) a
CROSS JOIN (SELECT stack(2, 1,2) as crs) crs and DAYOFWEEK(performance_end_date) = 6
where a.row_no = 1
A CROSS JOIN does not take a join condition, so move your criteria to the WHERE clause:
CROSS JOIN (SELECT stack(2, 1,2) as crs) crs
where a.row_no = 1 and DAYOFWEEK(performance_end_date) = 6
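Note that a plain WHERE filter on the day of week keeps only the Friday rows. If the intent is to keep every row and add Saturday and Sunday copies for Fridays only, one possible variant is to cross join against offsets 0, 1 and 2 and keep the non-zero offsets only for Fridays. The sketch below illustrates that idea with SQLite from Python, not Hive: the table nav_daily and its rows are hypothetical, and SQLite's strftime('%w', ...) numbers Friday as 5, whereas the question's DAYOFWEEK check uses 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nav_daily (portfolio_name TEXT, performance_end_date TEXT, nav REAL)"
)
# Hypothetical rows; 2020-09-04 is a Friday.
conn.executemany(
    "INSERT INTO nav_daily VALUES (?, ?, ?)",
    [("P1", "2020-09-03", 10.0), ("P1", "2020-09-04", 11.0), ("P1", "2020-09-07", 12.0)],
)

# Offset 0 keeps every original row; offsets 1 and 2 survive the WHERE
# clause only for Fridays, producing the Saturday and Sunday copies.
weekend_filled = conn.execute(
    """
    SELECT a.portfolio_name,
           date(a.performance_end_date, '+' || crs.crs || ' days') AS shifted_date,
           a.nav
    FROM nav_daily AS a
    CROSS JOIN (SELECT 0 AS crs UNION ALL SELECT 1 UNION ALL SELECT 2) AS crs
    WHERE crs.crs = 0
       OR strftime('%w', a.performance_end_date) = '5'
    ORDER BY shifted_date
    """
).fetchall()
for row in weekend_filled:
    print(row)
```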

ORACLE SQL - Compare dates without join

I have a very large table of data 1+ billion rows. If I try to join that table to itself to do a comparison, the cost on the estimated plan is unrunnable (cost: 226831405289150). Is there a way I can achieve the same results as the query below without a join, perhaps an over partition?
What I need to do is make sure another event did not happen within 24 hours before or after the one with the wildcare was received.
Thanks so much for your help!
select e2.SYSTEM_NO,
min(e2.DT) as dt
from SYSTEM_EVENT e2
inner join table1.event el2
on el2.event_id = e2.event_id
left join ( Select se.DT
from SYSTEM_EVENT se
where
--fails
( se.event_id in ('101','102','103','104')
--restores
or se.event_id in ('106','107','108','109')
)
) e3
on e3.dt-e2.dt between .0001 and 1
or e3.dt-e2.dt between -1 and .0001
where el2.descr like '%WILDCARE%'
and e3.dt is null
and e2.REC_STS_CD = 'A'
group by e2.SYSTEM_NO
Without any test data it is difficult to determine exactly what you are trying to achieve, but it appears you could try using an analytic function with a range window:
SELECT system_no,
MIN( dt ) AS dt
FROM (
SELECT system_no,
dt,
event_id, -- needed by the outer EXISTS filter
rec_sts_cd, -- needed by the outer WHERE clause
COUNT(
CASE
WHEN ( event_id in ('101','102','103','104') --fails
OR event_id in ('106','107','108','109') ) --restores
THEN 1
END
) OVER (
ORDER BY dt
RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING
) AS num
FROM system_event
) se
WHERE num = 0
AND REC_STS_CD = 'A'
AND EXISTS(
SELECT 1
FROM table1.event te
WHERE te.descr like '%WILDCARE%'
AND te.event_id = se.event_id
)
GROUP BY system_no
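The range-window idea can be tried on a toy data set. The sketch below is only an approximation of the Oracle query: it runs on SQLite (3.28 or later, for RANGE with a numeric offset) from Python, stores dt as a plain day number (Oracle DATE arithmetic is also in days), uses hypothetical rows, and filters on a made-up event_id '200' in place of the EXISTS lookup against table1.event.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE system_event (system_no INTEGER, event_id TEXT, dt REAL)")
# Hypothetical rows: (system_no, event_id, dt in days).
conn.executemany(
    "INSERT INTO system_event VALUES (?, ?, ?)",
    [
        (1, "200", 10.0),  # candidate, nothing within one day
        (1, "101", 12.0),  # a "fail" event
        (1, "200", 12.5),  # candidate, but a fail happened 0.5 days earlier
        (2, "200", 20.0),  # candidate, nothing within one day
        (2, "106", 21.5),  # a "restore" event, 1.5 days later
    ],
)

# For every row, count the fail/restore events whose dt lies within one day
# before or after it; a single pass over the table replaces the self-join.
isolated = conn.execute(
    """
    SELECT system_no, MIN(dt) AS dt
    FROM (
        SELECT system_no, event_id, dt,
               COUNT(CASE WHEN event_id IN ('101','102','103','104',
                                            '106','107','108','109')
                          THEN 1 END)
                   OVER (ORDER BY dt RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS num
        FROM system_event
    ) AS se
    WHERE num = 0
      AND event_id = '200'
    GROUP BY system_no
    ORDER BY system_no
    """
).fetchall()
print(isolated)
```

As in the answer's query, the window is not partitioned by system_no, so an event on any system counts as a neighbour; adding PARTITION BY system_no would restrict the check to the same system if that is what is wanted.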
This is not a direct answer to your question, but it is a bit too long for a comment.
How old can inserted data be? If data is inserted incrementally, the 48-hour window means you only need to check a subset of the data, not the whole 1-billion-row table. If that is the case, reduce the data being compared with a WITH clause or a temporary table.
If you still need to compare across the whole table, I would partition by event_id (or by another attribute, if there is a better partition key) and compare each group separately.
The predicate where el2.descr like '%WILDCARE%' is a performance killer on such a huge table.

Percentage difference between numbers in two columns

My SQL experience is fairly minimal so please go easy on me here. I have a table tblForEx and I'm trying to create a query that looks at one particular column LastSalesRateChangeDate and also ForExRate.
Basically what I want to do is for the query to check that LastSalesRateChangeDate and then pull the ForExRate that is on the same line (obviously in the ForExRate column), then I need to check to see if there is a +/- 5% change since the last time the LastSalesRateChangeDate changed. I hope this makes sense, I tried to explain it as clearly as possible.
I believe I would need to create a 'subquery' to look at the LastSalesRateChangeDate and pull the ForEx rate from that date, but I just don't know how to go about this.
I should add this is being done in Access (SQL)
Sample data, here is what the table looks like:
| BaseCur | ForCur | ForExRate | LastSalesRateChangeDate
| USD | BRL | 1.718 | 12/9/2008
| USD | BRL | 1.65 | 11/8/2008
So I would need a query to look at the LastSalesRateChangeDate column, check to see if the date has changed, if so take the ForExRate value and then give a percentage difference of that ForExRate value since the last record.
So the final result would likely look like
"BaseCur" "ForCur" "Percentage Change since Last Sales Rate Change"
USD BRL X%
Gordon's answer pointed in the right direction:
SELECT t2.*, (SELECT top 1 t.ForExRate
FROM tblForEx t
where t.BaseCur=t2.BaseCur AND t.ForCur=t2.ForCur and t.LastSalesRateChangeDate<t2.LastSalesRateChangeDate
order by t.LastSalesRateChangeDate DESC, t.ForExRate DESC
) AS PreviousRate, [ForExRate]/[PreviousRate]-1 AS ChangeRatio
FROM tblForEx AS t2;
Access gives errors where the TOP 1 in the subquery causes "ties". We broke the ties and therefore removed the error by adding an extra item to the ORDER BY clause. To get the ratio to display as a percentage, switch to the design view and change the properties of that column accordingly.
If I understand correctly, you want the previous value. In MS Access, you can use a correlated subquery:
select t.*,
(select top (1) t2.LastSalesRateChangeDate
from tblForEx as t2
where t2.BaseCur = t.BaseCur and t2.ForCur = t.ForCur and
t2.LastSalesRateChangeDate < t.LastSalesRateChangeDate
order by t2.LastSalesRateChangeDate desc
) as prev_LastSalesRateChangeDate
from tblForEx as t;
Now, with this as a subquery, you can get the previous exchange rate using a join:
select t.*, ( (t.ForExRate / tprev.ForExRate) - 1) as change_ratio
from (select t.*,
(select top (1) t2.LastSalesRateChangeDate
from tblForEx as t2
where t2.BaseCur = t.BaseCur and t2.ForCur = t.ForCur and
t2.LastSalesRateChangeDate < t.LastSalesRateChangeDate
order by t2.LastSalesRateChangeDate desc
) as prev_LastSalesRateChangeDate
from tblForEx as t
) as t inner join
tblForEx as tprev
on tprev.BaseCur = t.BaseCur and tprev.ForCur = t.ForCur and
tprev.LastSalesRateChangeDate = t.prev_LastSalesRateChangeDate;
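To make the correlated-subquery approach concrete, here is a rough sketch using SQLite from Python rather than Access, so LIMIT 1 replaces TOP 1. The two USD/BRL rows come from the question's sample, with the dates assumed to be month/day/year and stored as ISO strings; the USD/EUR row and the 5% threshold check in Python are illustrative additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblForEx ("
    "BaseCur TEXT, ForCur TEXT, ForExRate REAL, LastSalesRateChangeDate TEXT)"
)
conn.executemany(
    "INSERT INTO tblForEx VALUES (?, ?, ?, ?)",
    [
        ("USD", "BRL", 1.718, "2008-12-09"),
        ("USD", "BRL", 1.65, "2008-11-08"),
        ("USD", "EUR", 0.78, "2008-12-01"),  # hypothetical pair with no history
    ],
)

# For each row, fetch the rate from the latest earlier change date of the
# same currency pair; rows without an earlier date get NULL (None in Python).
rate_rows = conn.execute(
    """
    SELECT t.BaseCur, t.ForCur, t.LastSalesRateChangeDate, t.ForExRate,
           (SELECT p.ForExRate
            FROM tblForEx AS p
            WHERE p.BaseCur = t.BaseCur
              AND p.ForCur = t.ForCur
              AND p.LastSalesRateChangeDate < t.LastSalesRateChangeDate
            ORDER BY p.LastSalesRateChangeDate DESC
            LIMIT 1) AS PreviousRate
    FROM tblForEx AS t
    ORDER BY t.BaseCur, t.ForCur, t.LastSalesRateChangeDate
    """
).fetchall()

# Percentage change since the previous rate, flagged when it reaches +/- 5%.
changes = {}
for base, foreign, changed_on, rate, previous in rate_rows:
    if previous is not None:
        pct = (rate / previous - 1) * 100
        changes[(base, foreign, changed_on)] = (pct, abs(pct) >= 5)
print(changes)
```

With the sample rates, the move from 1.65 to 1.718 is roughly a 4.12% increase, so it stays under the 5% threshold.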
As I understand it, you can use the LEAD function to bring the rate from the previous change date into a new column, using the query below:
WITH CTE AS (
SELECT *, LEAD(ForExRate, 1) OVER(PARTITION BY BaseCur, ForCur ORDER BY LastChangeDate DESC) LastValue
FROM #TT
)
SELECT BaseCur, ForCur, ForExRate, LastChangeDate , CAST( ((ForExRate - ISNULL(LastValue, 0))/LastValue)*100 AS float)
FROM CTE
The problems here are:
The last row in each group (the earliest date) will have NULL in the new column calculated with the LEAD function, because there is no earlier row to read from.
If there is only a single row for a particular BaseCur and ForCur, that row will also have NULL in the column.
Resolution:
If you are sure that there will be at least two rows for each BaseCur and ForCur, then you can use a WHERE clause to remove the NULL values from the final result.
WITH CTE AS (
SELECT *, LEAD(ForExRate, 1) OVER(PARTITION BY BaseCur, ForCur ORDER BY LastChangeDate DESC) LastValue
FROM #TT
)
SELECT BaseCur, ForCur, ForExRate, LastChangeDate , CAST( ((ForExRate - ISNULL(LastValue, 0))/LastValue)*100 AS float) Percentage
FROM CTE
WHERE LastValue IS NOT NULL
SELECT basetbl.BaseCur, basetbl.ForCur, basetbl.NewDate, basetbl.OldDate, num2.ForExRate/num1.ForExRate*100 AS PercentChange FROM
(((SELECT t.BaseCur, t.ForCur, MAX(t.LastSalesRateChangeDate) AS NewDate, summary.Last_Date AS OldDate
FROM (tblForEx AS t
LEFT JOIN (SELECT TOP 2 BaseCur, ForCur, MAX(LastSalesRateChangeDate) AS Last_Date FROM tblForEx AS t1
WHERE LastSalesRateChangeDate <>
(SELECT MAX(LastSalesRateChangeDate) FROM tblForEx t2 WHERE t2.BaseCur = t1.BaseCur AND t2.ForCur = t1.ForCur)
GROUP BY BaseCur, ForCur) AS summary
ON summary.ForCur = t.ForCur AND summary.BaseCur = t.BaseCur)
GROUP BY t.BaseCur, t.ForCur, summary.Last_Date) basetbl
LEFT JOIN tblForEx num1 ON num1.BaseCur=basetbl.BaseCur AND num1.ForCur = basetbl.ForCur AND num1.LastSalesRateChangeDate = basetbl.OldDate))
LEFT JOIN tblForEx num2 ON num2.BaseCur=basetbl.BaseCur AND num2.ForCur = basetbl.ForCur AND num2.LastSalesRateChangeDate = basetbl.NewDate;
This uses a series of subqueries. First, you are selecting the most recent date for the BaseCur and ForCur. Then, you are joining onto that the previous date. I do that by using another subquery to select the top two dates, and exclude the one that is equal to the previously established most recent date. This is the "summary" subquery.
Then, you get the BaseCur, ForCur, NewDate, and OldDate in the "basetbl" subquery. After that, it is two simple joins of the original table back onto those dates to get the rate that was applicable then.
Finally, you are selecting your BaseCur, ForCur, and whatever formula you want to use to calculate the rate change. I used a simple ratio in that one, but it is easy to change. You can remove the dates in the first line if you want, they are there solely as a reference point.
It doesn't look pretty, but complicated Access SQL queries never do.

SQL Server adjust each value in a column by another table

I have two tables, TblVal and TblAdj.
In TblVal I have a bunch of values that I need adjusted according to TblAdj for a given TblVal.PersonID and TblVal.Date and then returned in some ViewAdjustedValues. I must apply only those adjustments where TblAdj.Date >= TblVal.Date.
The trouble is that since all the adjustments are either a subtraction or a division, they need to be made in order. Here is the table structure:
TblVal: PersonID, Date, Value
TblAdj: PersonID, Date, SubtractAmount, DivideAmount
I want to return ViewAdjustedValues: PersonID, Date, AdjValue
Can I do this without iterating through TblAdj using a WHILE loop and an IF block to either subtract or divide as necessary? Is there some nested SELECT table magic I can perform that would be faster?
I think you can do it without a loop, but whether you want to or not is another question. A query that I think works is below (SQL Fiddle here). The key ideas are as follows:
Each SubtractAmount has the ultimate effect of subtracting SubtractAmount divided by the product of all later DivideAmounts for the same PersonID. The Date associated with the PersonID isn't relevant to this adjustment (fortunately). The CTE AdjustedAdjustments contains these adjusted SubtractAmount values.
The initial Value for a PersonID gets divided by the product of all DivideAmount values on or after that person's Date.
EXP(SUM(LOG(x))) works as an aggregate product if all values of x are positive. You should constrain your DivideAmount values to assure this, or adjust the code accordingly.
If there are no DivideAmounts, the associated product is NULL and changed to 1. Similarly, NULL sums of adjusted SubtractAmount values are changed to zero. A left join is used to preserve any values that are not subject to any adjustments.
SQL Server 2012 supports ORDER BY and ROWS framing in the OVER clause for aggregates, which was helpful here to aggregate "all later DivideAmounts."
WITH AdjustedAdjustments AS (
select
PersonID,
Date,
SubtractAmount/
EXP(
SUM(LOG(COALESCE(DivideAmount,1)))
OVER (
PARTITION BY PersonID
ORDER BY Date
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
)
) AS AdjustedSubtract,
DivideAmount
FROM TblAdj
)
SELECT
p.PersonID,
p.Value/COALESCE(EXP(SUM(LOG(COALESCE(DivideAmount,1)))),1)
-COALESCE(SUM(a.AdjustedSubtract),0) AS AmountAdjusted
FROM TblVal AS p
LEFT OUTER JOIN AdjustedAdjustments AS a
ON a.PersonID = p.PersonID
AND a.Date >= p.Date
GROUP BY p.PersonID, p.Value, p.Date;
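The algebra behind the query can be checked outside the database. The following Python sketch, with hypothetical numbers for a single person, compares applying the adjustments one at a time in date order against the closed form used above: the value divided by the product of all divisors, minus each subtraction divided by the product of the divisors at or after it. The product is computed as exp(sum(log(x))), which only works for positive divisors.

```python
import math

# Hypothetical adjustments for one person, already sorted by Date.
# Each tuple is (subtract_amount, divide_amount); None means "not applicable".
adjustments = [(10.0, None), (None, 2.0), (4.0, None), (None, 5.0)]
start_value = 110.0


def apply_in_order(value, adjustments):
    """Loop-style reference: subtract, then divide, row by row."""
    for subtract, divide in adjustments:
        if subtract is not None:
            value -= subtract
        if divide is not None:
            value /= divide
    return value


def product_via_logs(xs):
    """Aggregate product as EXP(SUM(LOG(x))); requires every x > 0."""
    return math.exp(sum(math.log(x) for x in xs))


def closed_form(value, adjustments):
    """Set-based form mirroring the query: missing divisors count as 1."""
    divides = [d if d is not None else 1.0 for _, d in adjustments]
    total = value / product_via_logs(divides)
    for i, (subtract, _) in enumerate(adjustments):
        if subtract is not None:
            # Divide by the current and all later divisors, like the
            # ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING frame.
            total -= subtract / product_via_logs(divides[i:])
    return total


sequential = apply_in_order(start_value, adjustments)
set_based = closed_form(start_value, adjustments)
print(sequential, set_based)
```

Because the product goes through floating-point logarithms, the two results should be compared with a tolerance (for example math.isclose) rather than for exact equality.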
Try something like the following:
with CTE_TblVal (PersonID,Date,Value)
as
(
select A.PersonID, A.Date, A.Value
from TblVal A
inner join TblAdj B
on A.PersonID = B.PersonID
where B.Date >= A.Date
)
update CTE_TblVal
set Date = TblAdj.Date,
Value = TblAdj.Value
from CTE_TblVal
inner join TblAdj
on CTE_Tblval.PersonID = TblAdj.PersonID
output inserted.* into ViewAdjustedValues
select * from ViewAdjustedValues