I'm looking for an SQL way to get the value from the next row.
The data I have looks like:
CUST PROD From_Qty Disc_Pct
23 Brush 1 0
23 Brush 13 1
23 Brush 52 4
77 Paint 1 0
77 Paint 22 7
What I need to end up with is this (I want to create the To_Qty column):
CUST PROD From_Qty To_Qty Disc_Pct
23 Brush 1 12 0
23 Brush 13 51 1 #13 is 12+1
23 Brush 52 99999 4 #52 is 51+1
77 Paint 1 21 0 #new CUST restarts at From_Qty 1
77 Paint 22 99999 7 #22 is 21+1
I've got 100K+ rows to do this to, and it has to be SQL because my ETL application allows SQL but not stored procedures etc.
How can I get the value from the next row so I can create To_Qty?
SELECT *,
LEAD([From_Qty], 1, 100000) OVER (PARTITION BY [CUST] ORDER BY [From_Qty]) - 1 AS To_Qty
FROM myTable
LEAD() gets you the next value based on the order of [From_Qty]; PARTITION BY [CUST] resets the window whenever [CUST] changes value.
or you can use a CTE and Row_Number.
WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [CUST] ORDER BY [From_Qty]) Rn
FROM myTable
)
SELECT t1.*,
ISNULL(t2.From_Qty - 1, 99999) To_Qty
FROM cte t1
LEFT JOIN cte t2 ON t1.Cust = t2.Cust AND t1.Rn + 1 = t2.Rn
SELECT
CUST,
PROD,
FROM_QTY ,
COALESCE(MIN(FROM_QTY) OVER (PARTITION BY CUST, PROD ORDER BY FROM_QTY DESC ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING), 100000) - 1 AS TO_QTY,
DISC_PCT
FROM <tablename>
ORDER BY CUST, PROD, FROM_QTY
If you are running SQL Server 2012 or later versions, you can use the LAG and LEAD functions for accessing prior or subsequent rows along with the current row.
You can use the LEAD and FIRST_VALUE analytic functions to generate the result you mentioned. LEAD() retrieves the next value within the customer group, and FIRST_VALUE() gives the first value within the customer group.
For example, for CUST = 23, LEAD returns 13 and FIRST_VALUE returns 1, so TO_QTY = LEAD - FIRST_VALUE = 13 - 1 = 12. (This works out because each group's From_Qty starts at 1.) In the same way, the formula below computes the value for all 100K rows in your table.
SELECT CUST,
PROD,
FROM_QTY,
CASE WHEN LEAD( FROM_QTY,1 ) OVER ( PARTITION BY CUST ORDER BY FROM_QTY ) IS NOT NULL
THEN
LEAD( FROM_QTY,1 ) OVER ( PARTITION BY CUST ORDER BY FROM_QTY ) -
FIRST_VALUE( FROM_QTY ) OVER ( PARTITION BY CUST ORDER BY FROM_QTY )
ELSE 99999
END AS TO_QTY,
DISC_PCT
FROM Yourtable;
Insert the data into a temp table with the same columns, plus an auto-increment id field. Insert them ordered; I'm assuming by cust, prod, then from_qty.
Now you can run an update statement on the temp table.
UPDATE #mytable
SET To_Qty = (SELECT From_Qty - 1 FROM #mytable AS next WHERE next.indexfield = #mytable.indexfield + 1 AND next.cust = #mytable.cust and next.prod = #mytable.prod)
and then another UPDATE to set the 99999 rows, using a NOT EXISTS clause.
Then insert the data back to your new or modified table.
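For completeness, that second UPDATE might look like the sketch below; it assumes the same #mytable and indexfield names as above, and caps the last row of each cust/prod group at 99999.

```sql
-- Sketch: rows with no following row in their cust/prod group get the 99999 cap.
UPDATE #mytable
SET To_Qty = 99999
WHERE NOT EXISTS (SELECT 1
                  FROM #mytable AS next
                  WHERE next.indexfield = #mytable.indexfield + 1
                    AND next.cust = #mytable.cust
                    AND next.prod = #mytable.prod);
```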
declare @Table table(CUST int, PROD varchar(50), From_Qty int, Disc_Pct int)
insert into @Table values
(23, 'Brush', 1, 0)
,(23, 'Brush', 13, 1)
,(23, 'Brush', 52, 4)
,(77, 'Paint', 1, 0)
,(77, 'Paint', 22, 7)
SELECT CUST, Prod, From_qty,
LEAD(From_Qty,1,100000) OVER(PARTITION BY cust ORDER BY from_qty)-1 AS To_Qty,
Disc_Pct
FROM @Table
Suppose I have this table:
CREATE TABLE #t1
(
PersonID int ,
ExamDates date,
Score varchar(50) SPARSE NULL
);
SET dateformat mdy;
INSERT INTO #t1 (PersonID, ExamDates, Score)
VALUES (1, '1.1.2018',70),
(1, '1.13.2018', 100),
(1, '1.18.2018', 85),
(2, '1.1.2018', 90),
(2, '2.1.2018', 95),
(2, '3.15.2018', 95),
(2, '7.30.2018', 100),
(3, '1.1.2018', 80),
(3, '1.2.2018', 80),
(3, '5.3.2018', 50),
(4, '2.1.2018', 90),
(4, '2.20.2018', 100);
I would like to find, for each unique ID, observations that occur at least 3 times, spanning at least 15 days but no more than 90 days.
My final table should look like this:
PersonID  ExamDates  Score
1         1/1/2018   70
1         1/13/2018  100
1         1/18/2018  85
2         1/1/2018   90
2         2/1/2018   95
2         3/15/2018  95
We have working R code for this, but would like to avoid pulling large datasets into R just to run it. The dataset is very large and we are concerned about the efficiency of the query.
Thanks!
-Peter
To start with, the common name for this situation is Gaps and Islands. That will help you as you search for answers or come up with similar problems in the future.
That out of the way, here is my solution. Start with this:
WITH Leads As (
SELECT t1.*
, datediff(day, ExamDates, lead(ExamDates, 2, NULL) over (partition by PersonID ORDER BY ExamDates)) As Diff
FROM t1
)
SELECT *
FROM Leads
WHERE Diff BETWEEN 15 AND 90
I have to use the CTE, because you can't put a windowing function in a WHERE clause. It produces this result, which is only part of what you want:
PersonID  ExamDates   Score  Diff
1         2018-01-01  70     17
2         2018-01-01  90     73
This shows the first record in each group. We can use it to join back to the original table and find all the records that meet the requirements.
But first, we have a problem. The sample data only has groups with exactly three records. However, the real data might end up with groups with more than three items. In that case this would find multiple first records from the same group.
You can see it in this updated SQL Fiddle, which adds an additional record for PersonID #1 that is still inside the date range.
PersonID  ExamDates   Score  Diff
1         2018-01-01  70     17
1         2018-01-13  100    29
2         2018-01-01  90     73
I'll be using this additional record in every step from now on.
To account for this, we also need to check that each record is not in the middle or at the end of a valid group; that is, also look a couple of records both ahead and behind.
WITH Diffs As (
SELECT #t1.*
, datediff(day, ExamDates, lead(ExamDates, 2, NULL) over (partition by PersonID ORDER BY ExamDates)) As LeadDiff2
, datediff(day, ExamDates, lead(ExamDates, 1, NULL) over (partition by PersonID ORDER BY ExamDates)) As LeadDiff1
, datediff(day, lag(ExamDates, 1, NULL) over (partition by PersonID ORDER BY ExamDates), ExamDates) as LagDiff1
, datediff(day, lag(ExamDates, 2, NULL) over (partition by PersonID ORDER BY ExamDates), ExamDates) as LagDiff2
FROM #t1
)
SELECT *
FROM Diffs
WHERE LeadDiff2 BETWEEN 15 AND 90
AND coalesce(LeadDiff1 + LagDiff1,100) > 90 /* Not in the middle of a valid group */
AND coalesce(Lagdiff2, 100) > 90 /* Not at the end of a valid group */
This code gets us back to the original results, even with the additional record. Here's the updated fiddle:
http://sqlfiddle.com/#!18/ea12ad/23
Now we can join back to the original table and find all records in each group:
WITH Diffs As (
SELECT #t1.*
, datediff(day, ExamDates, lead(ExamDates, 2, NULL) over (partition by PersonID ORDER BY ExamDates)) As LeadDiff2
, datediff(day, ExamDates, lead(ExamDates, 1, NULL) over (partition by PersonID ORDER BY ExamDates)) As LeadDiff1
, datediff(day, lag(ExamDates, 1, NULL) over (partition by PersonID ORDER BY ExamDates), ExamDates) as LagDiff1
, datediff(day, lag(ExamDates, 2, NULL) over (partition by PersonID ORDER BY ExamDates), ExamDates) as LagDiff2
FROM #t1
), FirstRecords AS (
SELECT PersonID, ExamDates, DATEADD(day, 90, ExamDates) AS FinalDate
FROM Diffs
WHERE LeadDiff2 BETWEEN 15 AND 90
AND coalesce(LeadDiff1 + LagDiff1,100) > 90 /* Not in the middle of a valid group */
AND coalesce(lagdiff2, 100) > 90 /* Not at the end of a valid group */
)
SELECT t.*
FROM FirstRecords f
INNER JOIN #t1 t ON t.PersonID = f.PersonID
AND t.ExamDates >= f.ExamDates
AND t.ExamDates <= f.FinalDate
ORDER BY t.PersonID, t.ExamDates
That gives me this, which matches your desired output and my extra record:
PersonID  ExamDates   Score
1         2018-01-01  70
1         2018-01-13  100
1         2018-01-18  85
1         2018-02-11  89
2         2018-01-01  90
2         2018-02-01  95
2         2018-03-15  95
See it work here:
http://sqlfiddle.com/#!18/ea12ad/26
Here's Eli's idea done a bit more simply, and moving all of the heavy computation to the cte, where it may possibly be more efficient:
With cte As (
Select PersonID, ExamDates
,Case When Datediff(DAY,ExamDates, Lead(ExamDates,2,Null) Over (Partition by PersonID Order by ExamDates)) Between 15 and 90
Then Lead(ExamDates,2,Null) Over (Partition by PersonID Order by ExamDates)
Else NULL End as EndDateRange
From #t1
)
Select Distinct B.*
From cte Inner Join #t1 B On B.PersonID=cte.PersonID
And B.ExamDates Between cte.ExamDates and cte.EndDateRange
The CASE expression in the CTE only returns a valid date if the entry two rows later satisfies the overall condition; that date forms a range with the current record's ExamDates. By returning NULL for non-qualifying ranges, we ensure the join in the outer part of the query is not satisfied. The DISTINCT is needed to collapse duplicates when there are 4+ consecutive observations within the 15-90 day range.
You'll need a CTE to identify the base for the conditions which you described.
This code works with your sample set, and should work even when you have a larger set, though it may require a DISTINCT if you have overlapping results, e.g. 5 exam dates in the 15-90 day range.
WITH cte AS(
SELECT
PERSONID
,EXAMDATES
,Score
,COUNT(*) OVER (PARTITION BY PERSONID ORDER BY ExamDates ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS COUNTS
,LAG(ExamDates,2,NULL) OVER (PARTITION BY PERSONID ORDER BY ExamDates) DIFFS
FROM #t1
)
SELECT B.*
FROM CTE
INNER JOIN #T1 B ON CTE.PERSONID = B.PERSONID
WHERE CTE.COUNTS >=3
AND DATEDIFF(DAY,CTE.DIFFS,CTE.EXAMDATES) BETWEEN 15 AND 90
AND B.EXAMDATES BETWEEN CTE.DIFFS AND CTE.EXAMDATES
I have a table #NumberRange with a start and end number. I have to find out which ranges are in sequence.
CREATE TABLE #NumberRange
(
Id int primary key,
ItemId int,
[start] int,
[end] int
)
INSERT INTO #NumberRange
VALUES
(1,1,1,10),
(2,1,11,20),
(3,1,21,30),
(4,1,40,50),
(5,1,51,60),
(6,1,61,70),
(7,1,80,90),
(8,1,100,200)
Expected Result:
Note: the Result column is computed from runs of continuous numbers. The ranges 1-10, 11-20, and 21-30 are continuous, so those rows all get Result 1. The next range, 40-50, is not continuous (the previous row ends at 30 and this one starts at 40), so Result becomes 2, and so on.
Row 4 ends at 50 and row 5 starts at 51, so they are continuous and stay in the same group, which is what differentiates them from the Result 1 group.
I have tried the LEAD function but did not get the expected result. Can someone help me get it?
Workaround:
select
*,
[Diff] = [Lead] - [end],
[Result] = Rank() OVER (PARTITION BY ([Lead] - [end]) ORDER BY Id)
from
(select
id, [start], [end], LEAD([start]) over (order by id) as [Lead]
from
#NumberRange) Z
order by
id
Use lag() to determine where the groups start. Then a cumulative sum to enumerate them:
select nr.*,
sum(case when startr = prev_endr + 1 then 0 else 1 end) over (partition by itemid order by startr) as grp
from (select nr.*, lag(endr) over (partition by itemid order by startr) as prev_endr
from numberrange nr
) nr;
Here is a db<>fiddle.
This answer assumes that ids 4 and 5 are continuous, which makes sense based on the rest of the question.
Your expected result is not clear, and I have the same questions that were asked in the comments, but I think what you want to do is something similar to:
select N1.*,
       case when N1.[end] + 1 = N2.[start] then 1 else 2 end as Result
from #NumberRange N1
inner join #NumberRange N2 on N1.Id = N2.Id - 1
I have a data table that looks in practice like this:
Team Shirt Number Name
1 1 Seaman
1 13 Lucas
2 1 Bosnic
2 14 Schmidt
2 23 Woods
3 13 Tubilandu
3 14 Lev
3 15 Martin
I want to remove duplicates of team by the following logic: if there is a shirt number 1, use that; if not, look for 13; if not, 14; then any.
I realise it is probably quite basic, but I'm not making any progress with sub-queries and case statements; any help gratefully received!
Using SSMS.
Since you didn't specify the DBMS, let me assume ROW_NUMBER() is available:
WITH cte AS
(
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY Team
                              ORDER BY (CASE WHEN Shirt_Number = 1
                                             THEN 1
                                             WHEN Shirt_Number = 13
                                             THEN 2
                                             WHEN Shirt_Number = 14
                                             THEN 3
                                             ELSE 4
                                        END)
                             ) AS Seq
    FROM table t
)
DELETE FROM cte
WHERE Seq > 1;
This assumes the Shirt_Number values have gaps; otherwise simply ordering by Shirt_Number would be enough.
I think you are looking for ROW_NUMBER with a PARTITION BY clause. The solution below works in SQL Server.
create table #eray
(team int, shirtnumber int, name varchar(200))
insert into #eray values
(1, 1, 'Seaman'),
(1, 13, 'Lucas'),
(2, 1, 'Bosnic'),
(2, 14, 'Schmidt')
;with cte as (
Select Team, ShirtNumber, Name,
ROW_NUMBER() OVER (PARTITION BY Team ORDER BY ShirtNumber ASC) AS rn
From #eray
where ShirtNumber in (1,13,14)
)
select * from cte where rn=1
If you have a table of teams, you can use cross apply:
select ts.*
from teams t cross apply
(select top (1) ts.*
from teamshirts ts
where ts.team = t.team
order by (case shirt_number when 1 then 1 when 13 then 2 when 14 then 3 else 4 end)
) ts;
If you have no numbers between 2 and 12, you can simplify this to:
select ts.*
from teams t cross apply
(select top (1) ts.*
from teamshirts ts
where ts.team = t.team
order by shirt_number
) ts;
This is my table:
I need to get the running difference of applied leave from the table above. Here the applied leave is 10 and the total balance is 8, so I need to display -2 on the last line (8 - 10 = -2).
My try:
;WITH x AS
(
select abs(balance-(applied))req from mytest where rno=1
UNION ALL
select abs(mytest.balance-abs(x.req)) from x join mytest on mytest.rno=x.rno+1
)
SELECT balance, balance-req
FROM x
Expected Result:
balance | applied
5.00 | 0
2.00 | 0
1.00 | -2
Can anyone help me sort this issue out? Thanks in advance.
You can use SUM and OVER combined with ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW to calculated the running difference for each row:
DECLARE #DataSource TABLE
(
[rno] INT
,[balance] DECIMAL(9,2)
,[applied] DECIMAL(9,2)
);
INSERT INTO #DataSource ([rno], [balance], [applied])
VALUES (1, 5, 10)
,(2, 2, 0)
,(3, 1, 0);
SELECT [rno]
,[balance]
,SUM([balance] - [applied]) OVER (ORDER BY [rno] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS [applied]
FROM #DataSource;
Setting the running difference to 0 on the other rows (all except the last) can be done with additional manipulation, or with IIF if you know the ID of the last row.
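A minimal sketch of that IIF variant, assuming the last row is known to be rno = 3 as in this sample data:

```sql
-- Sketch: show the running difference only on the last row, 0 elsewhere.
SELECT [rno]
      ,[balance]
      ,IIF([rno] = 3,
           SUM([balance] - [applied]) OVER (ORDER BY [rno] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
           0) AS [applied]
FROM #DataSource;
```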
You can use window functions for this. The key would be a cumulative sum of balance:
select t.rno, t.balance, t.applied,
(case when max_rno = rno then cume_balance - sum_applied
else 0
end) as new_applied
from (select t.*,
sum(balance) over (order by rno) as cume_balance,
max(rno) over () as max_rno,
sum(applied) over () as sum_applied
from mytable t
) t;
For this example say I have a table with two fields, AREA varchar(30) and OrderNumber INT.
The table has the following data
AREA | OrderNumber
Fontana | 32
Fontana | 42
Fontana | 76
Fontana | 12
Fontana | 3
Fontana | 99
RC | 32
RC | 1
RC | 8
RC | 9
RC | 4
I would like to return, for each area, the length of the longest run of increasing consecutive values. For Fontana it is 3 (32, 42, 76); for RC it is 2 (8, 9).
AREA | LongestLength
Fontana | 3
RC | 2
How would I do this on MS Sql 2005?
One way is to use a recursive CTE that steps over each row. If the row meets the criteria (increasing order number for the same area), you increase the chain length by one. If it doesn't, you start a new chain:
; with numbered as
(
select row_number() over (order by area, eventtime) rn
, *
from Table1
)
, recurse as
(
select rn
, area
, OrderNumber
, 1 as ChainLength
from numbered
where rn = 1
union all
select cur.rn
, cur.area
, cur.OrderNumber
, case
when cur.area = prev.area
and cur.OrderNumber > prev.OrderNumber
then prev.ChainLength + 1
else 1
end
from recurse prev
join numbered cur
on prev.rn + 1 = cur.rn
)
select area
, max(ChainLength)
from recurse
group by
area
Live example at SQL Fiddle.
An alternative way is to use a query to find "breaks", that is, rows that end a sequence of increasing order numbers for the same area. The number of rows between breaks is the length.
; with numbered as
(
select row_number() over (order by area, eventtime) rn
, *
from Table1 t1
)
-- Select rows that break an increasing chain
, breaks as
(
select row_number() over (order by cur.rn) rn2
, cur.rn
, cur.Area
from numbered cur
left join
numbered prev
on cur.rn = prev.rn + 1
where cur.OrderNumber <= prev.OrderNumber
or cur.Area <> prev.Area
or prev.Area is null
)
-- Add a final break after the last row
, breaks2 as
(
select *
from breaks
union all
select count(*) + 1
, max(rn) + 1
, null
from breaks
)
select series_start.area
, max(series_end.rn - series_start.rn)
from breaks2 series_start
join breaks2 series_end
on series_end.rn2 = series_start.rn2 + 1
group by
series_start.area
Live example at SQL Fiddle.
You do not explain why RC's longest sequence does not include 1 while Fontana's does include 32. I take it, the 1 is excluded because it is a decrease: it comes after 32. The Fontana's 32, however, is the first ever item in the group, and I've got two ideas how to explain why it is considered an increase. That's either exactly because it's the group's first item or because it is also positive (i.e. as if coming after 0 and, therefore, an increase).
For the purpose of this answer, I'm assuming the latter, i.e. a group's first item is an increase if it is positive. The below script implements the following idea:
Enumerate the rows in every AREA group in the order of the eventtime column you nearly forgot to mention.
Join the enumerated set to itself to link every row with it's predecessor.
Get the sign of the difference between the row and its preceding value (defaulting the latter to 0). At this point the problem turns into a gaps-and-islands one.
Partition every AREA group by the signs determined in #3 and enumerate every subgroup's rows.
Find the difference between the row numbers from #1 and those found in #4. That would be a criterion to identify individual streaks (together with AREA).
Finally, group the results by AREA, the sign from #3 and the result from #5, count the rows and get the maximum count per AREA.
I implemented the above like this:
WITH enumerated AS (
SELECT
*,
row = ROW_NUMBER() OVER (PARTITION BY AREA ORDER BY eventtime)
FROM atable
),
signed AS (
SELECT
this.eventtime,
this.AREA,
this.row,
sgn = SIGN(this.OrderNumber - COALESCE(last.OrderNumber, 0))
FROM enumerated AS this
LEFT JOIN enumerated AS last
ON this.AREA = last.AREA
AND this.row = last.row + 1
),
partitioned AS (
SELECT
AREA,
sgn,
grp = row - ROW_NUMBER() OVER (PARTITION BY AREA, sgn ORDER BY eventtime)
FROM signed
)
SELECT DISTINCT
AREA,
LongestIncSeq = MAX(COUNT(*)) OVER (PARTITION BY AREA)
FROM partitioned
WHERE sgn = 1
GROUP BY
AREA,
grp
;
A SQL Fiddle demo can be found here.
You can do some math by ROW_NUMBER() to figure out where you have consecutive items.
Here's the code sample:
;WITH rownums AS
(
SELECT [area],
ROW_NUMBER() OVER(PARTITION BY [area] ORDER BY [ordernumber]) AS rid1,
ROW_NUMBER() OVER(PARTITION BY [area] ORDER BY [eventtime]) AS rid2
FROM SomeTable
),
differences AS
(
SELECT [area],
[calc] = rid1 - rid2
FROM rownums
),
summation AS
(
SELECT [area], [calc], COUNT(*) AS lengths
FROM differences
GROUP BY [area], [calc]
)
SELECT [area], MAX(lengths) AS LongestLength
FROM summation
GROUP BY [area]
So if I do one set of row numbers ordered by my ordernumber and another set of row numbers by my event time, the difference between those two numbers will always be the same, so long as their order is the same.
You can then get a count grouped by those differences and then pull the largest count to get what you need.