Fluently adding values in t-sql - sql

I have a table like this:
Items Date Price
1 2016-01-01 10
1 2016-01-02 15
1 2016-01-03 null
1 2016-01-04 null
1 2016-01-05 8
1 2016-01-06 null
1 2016-01-07 null
1 2016-01-08 null
2 2016-01-01 14
2 2016-01-02 7
2 2016-01-03 null
2 2016-01-04 null
2 2016-01-05 16
2 2016-01-06 null
2 2016-01-07 null
2 2016-01-08 5
Now I want to update the null values. The difference between the price before and after null values must be evenly added.
Example:
1 2016-01-02 15 to
1 2016-01-05 8
15 to 8 = -7
-7 / 3 = -2,333333
1 2016-01-02 15
1 2016-01-03 12,6666
1 2016-01-04 10,3333
1 2016-01-05 8
This shouldn't be done with cursors. Helper tables would be OK.

This is really where you want the ignore nulls option on lag() and lead(). Alas.
An alternative is to use outer apply:
select t.*,
       coalesce(t.Price,
                tprev.Price +
                datediff(day, tprev.Date, t.Date) * (tnext.Price - tprev.Price) / datediff(day, tprev.Date, tnext.Date)
               ) as est_price
from t outer apply
     (select top 1 t2.*
      from t t2
      where t2.Items = t.Items and
            t2.Date <= t.Date and
            t2.Price is not null
      order by t2.Date desc
     ) tprev outer apply
     (select top 1 t2.*
      from t t2
      where t2.Items = t.Items and
            t2.Date >= t.Date and
            t2.Price is not null
      order by t2.Date asc
     ) tnext ;
The arithmetic takes the price difference between the surrounding non-null rows, divides it by the number of days between them, and multiplies that per-day step by the number of days elapsed since the previous non-null row.
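To check the arithmetic outside the database, here is a minimal Python sketch of the same linear interpolation, applied to the 15 to 8 example from the question. The function name `interpolate` and its parameters are made up for illustration and are not part of the answer above.

```python
from datetime import date

def interpolate(prev_date, prev_price, next_date, next_price, current_date):
    """Linearly interpolate a price for current_date between two known points."""
    total_days = (next_date - prev_date).days
    elapsed_days = (current_date - prev_date).days
    # Per-day step multiplied by the days elapsed since the previous known price.
    return prev_price + elapsed_days * (next_price - prev_price) / total_days

prev_d, next_d = date(2016, 1, 2), date(2016, 1, 5)
for day in (date(2016, 1, 3), date(2016, 1, 4)):
    print(day, round(interpolate(prev_d, 15, next_d, 8, day), 4))
# 2016-01-03 12.6667
# 2016-01-04 10.3333
```

Python's `/` always yields a float. In T-SQL, dividing two integers truncates, so the query only produces fractional prices when Price is a decimal or float column.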

WITH T1 AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Items ORDER BY Date) AS RN,
FORMAT(ROW_NUMBER() OVER (PARTITION BY Items ORDER BY Date),'D10') + FORMAT(Price,'0000000000.000000') AS RnPr
FROM YourTable
), T2 AS
(
SELECT *,
MAX(RnPr) OVER (PARTITION BY Items ORDER BY Date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS prev,
MIN(RnPr) OVER (PARTITION BY Items ORDER BY Date ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS next
FROM T1
), T3 AS
(
SELECT Items,
Date,
Price,
RnPr,
InterpolatedPrice = IIF(Price IS NOT NULL,prevPrice,prevPrice + (RN - prevRN) * (nextPrice - prevPrice)/NULLIF(nextRN - prevRN,0))
FROM T2
CROSS APPLY (VALUES(CAST(SUBSTRING(prev,11,17) AS decimal(16,6)),
CAST(LEFT(prev, 10) AS INT),
CAST(SUBSTRING(next,11,17) AS decimal(16,6)),
CAST(LEFT(next, 10) AS INT)
)) V(prevPrice,prevRN,nextPrice,nextRN)
)
--UPDATE T3 SET Price = InterpolatedPrice
SELECT *
FROM T3
ORDER BY Items,
Date
The row number and price are bundled together in a single column (RnPr above), so ordering by RnPr is the same as ordering by row number. MIN and MAX both ignore NULLs, so MAX(RnPr) over a frame of UNBOUNDED PRECEDING AND CURRENT ROW returns the latest NOT NULL price at or before the current row, which is the previous NOT NULL price whenever the current price is null. Similarly, MIN(RnPr) over a frame of CURRENT ROW AND UNBOUNDED FOLLOWING finds the next one.
The packed value can then be cracked apart to get the price and row number, as above.
If happy with the results the final SELECT can be removed and the UPDATE uncommented as in this demo.
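The packing trick can be illustrated outside SQL. The Python sketch below is only an analogy under stated assumptions: `last_non_null` is a hypothetical helper, `None` stands in for NULL, and the running `max` over zero-padded strings plays the role of MAX(RnPr) over the UNBOUNDED PRECEDING frame.

```python
def last_non_null(prices):
    """For each position, return (row_number, price) of the latest non-null price so far."""
    result = []
    best = None  # running MAX over packed keys; None plays the role of NULL
    for rn, price in enumerate(prices, start=1):
        if price is not None:
            # Zero-padded row number first, then zero-padded price, like RnPr.
            packed = f"{rn:010d}{price:017.6f}"
            best = packed if best is None else max(best, packed)
        if best is None:
            result.append(None)
        else:
            # Crack the packed string apart again.
            result.append((int(best[:10]), float(best[10:])))
    return result

print(last_non_null([10, 15, None, None, 8, None]))
# [(1, 10.0), (2, 15.0), (2, 15.0), (2, 15.0), (5, 8.0), (5, 8.0)]
```

Because the row number occupies the leading, fixed-width part of the string, comparing packed strings compares row numbers first, which is why the maximum is always the most recent non-null row.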

Just replace "YourTable" (4 of them) with your actual table name.
If you are happy with the results, then comment out the select and un-comment the UPDATE and WHERE.
Select A.Items,A.Date,
--Update YourTable Set
Price = IsNull(A.Price,((DateDiff(DD,B.Date,A.Date)/(DateDiff(DD,B.Date,C.Date)+0.0))*(C.Price - B.Price)) + B.Price)
From YourTable A
Outer Apply (Select Top 1 Date,Price from YourTable Where Items=A.Items and Date<A.Date and Price is not Null and A.Price is null Order by Date Desc) B
Outer Apply (Select Top 1 Date,Price from YourTable Where Items=A.Items and Date>A.Date and Price is not Null and A.Price is null Order by Date) C
--Where Price is NULL
In the results, you'll notice that the prices between 01/06 and 01/08 for Items 1 remain null. This is because there is no later non-null price to interpolate toward.

You can do this by using a series of window functions in common table expressions.
T1 assigns a row number to each row (rn)
T2 finds the first non null Price row number before and after the current row (rnb / rna)
T3 calculates the adjusted Price by looking up the prices before and after
declare @T table (Items int, Date date, Price float)
insert into @T (Items, Date, Price) values
(1, '2016-01-01', 10), (1, '2016-01-02', 15), (1, '2016-01-03', null), (1, '2016-01-04', null), (1, '2016-01-05', 8), (1, '2016-01-06', null), (1, '2016-01-07', null), (1, '2016-01-08', null), (2, '2016-01-01', 14), (2, '2016-01-02', 7), (2, '2016-01-03', null), (2, '2016-01-04', null), (2, '2016-01-05', 16), (2, '2016-01-06', null), (2, '2016-01-07', null), (2, '2016-01-08', 5)
;with T1 as
(
    select *,
           row_number() over (order by Items, Date) as rn
    from @T
),
T2 as
(
select *,
max(case when price is null then null else rn end) over (partition by Items order by Date) as rnb,
min(case when price is null then null else rn end) over (partition by Items order by Date desc) as rna
from T1
)
select Items, Date,
isnull(price,
lag(Price, rn-rnb, Price) over (order by rn) -
(
lag(Price, rn-rnb, Price) over (order by rn) -
lead(Price, rna-rn, Price) over (order by rn)
) / (rna-rnb) * (rn-rnb)
) as Price
from T2
order by Items, Date

Related

GROUP by Largest String for all the substrings

I have a table like this where some rows have the same grp but different names. I want to group them by name such that all the substrings after removing nonalphanumeric characters are aggregated together and grouped by the largest string. The null value is considered the substring of all the strings.
grp | name | value
1 | ab&c | 10
1 | abc d e | 56
1 | ab | 21
1 | a | 23
1 | xy | 34
1 | [null] | 1
2 | fgh | 87
Desired result
grp | name | value
1 | abcde | 111
1 | xy | 34
2 | fgh | 87
My query-
Select grp,
regexp_replace(name,'[^a-zA-Z0-9]+', '', 'g') name, sum(value) value
from table
group by grp,
regexp_replace(name,'[^a-zA-Z0-9]+', '', 'g');
Result
grp | name | value
1 | abc | 10
1 | abcde | 56
1 | ab | 21
1 | a | 23
1 | xy | 34
1 | [null] | 1
2 | fgh | 87
What changes should I make in my query?
To solve this problem, I did the following (all of the code below is available on the fiddle here).
CREATE TABLE test
(
grp SMALLINT NOT NULL,
name TEXT NULL,
value SMALLINT NOT NULL
);
and populate it using your data + extra for testing:
INSERT INTO test VALUES
(1, 'ab&c', 10),
(1, 'abc d e', 56),
(1, 'ab', 21),
(1, 'a', 23),
(1, NULL, 1000000),
(1, 'r*&%$s', 100), -- added for testing.
(1, 'rs__t', 101),
(1, 'rs__tu', 101),
(1, 'xy', 1111),
(1, NULL, 1000000),
(2, 'fgh', 87),
(2, 'fgh', 13), -- For Charlieface
(2, NULL, 1000000),
(2, 'x', 50),
(2, 'x', 150),
(2, 'x----y', 100);
Then, you can use this query:
WITH t1 AS
(
SELECT
grp, n_str,
LAG(n_str) OVER (PARTITION BY grp ORDER BY grp, n_str),
CASE
WHEN
LAG(n_str) OVER (PARTITION BY grp ORDER BY grp, n_str) IS NULL
OR
POSITION
(
LAG(n_str) OVER (PARTITION BY grp ORDER BY grp, n_str)
IN
n_str
) = 0
THEN 1
ELSE 0
END AS change,
value
FROM
test t1
CROSS JOIN LATERAL
(
VALUES
(
REGEXP_REPLACE(name,'[^a-zA-Z0-9]+', '', 'g')
)
) AS v(n_str)
WHERE n_str IS NOT NULL
), t2 AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY grp, s_change ORDER BY grp, n_str DESC) AS rn,
grp, n_str,
SUM(value) OVER (PARTITION BY grp, s_change) AS s_val,
MAX(LENGTH(n_str)) OVER (PARTITION BY grp) AS max_nom
FROM
(
SELECT
grp, n_str, change,
SUM(change) OVER (ORDER BY grp, n_str) AS s_change,
value
FROM
t1
ORDER BY grp, n_str DESC
) AS sub1
), t3 AS
(
SELECT
grp, SUM(value) AS null_sum
FROM
test
WHERE name IS NULL
GROUP BY grp
)
SELECT x.grp, x.n_str, x.s_val + y.null_sum
FROM t2 x
JOIN t3 y
ON x.max_nom = LENGTH(x.n_str) AND x.grp = y.grp
UNION
SELECT grp, n_str, s_val
FROM
t2 WHERE max_nom != LENGTH(n_str) AND rn = 1
ORDER BY grp, n_str;
Result:
grp n_str ?column?
1 abcde 2000110
1 rstu 302
1 xy 1111
2 fgh 1000100
2 xy 300
A few points to note:
Please always provide a fiddle when you ask questions such as this one with tables and data - it provides a single source of truth for the question and eliminates duplication of effort on the part of those trying to help you!
You haven't been very clear about what, exactly, should happen with NULLs - do the values count towards the SUM()? You can vary the CASE statement as required.
What happens when there's a tie in the number of characters in the string? I've included an example in the fiddle, where you get the draws - but you may wish to sort alphabetically (or some other method)?
There appears to be an error in your provided sums for the values (even taking account of counting or not values for NULL for the name field).
Finally, you don't want to GROUP BY the largest string - you want to GROUP BY the grp field and SUM() the values of the records in each grp, and then pick out the longest alphanumeric string in that grouping. It would be interesting to know why you want to do this?
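For readers who want to check the grouping logic outside the database, here is a minimal Python sketch of the same idea: clean the names, sort them, and start a new chain whenever the previous name is not a substring of the current one. The function name `group_by_longest` is hypothetical, and crediting NULL-name values to the longest name in each grp is an assumption taken from the question's desired result.

```python
import re
from collections import defaultdict

def group_by_longest(rows):
    """rows: (grp, name, value) tuples. Returns {(grp, longest_name): total}."""
    cleaned = defaultdict(list)     # grp -> [(clean_name, value)]
    null_totals = defaultdict(int)  # grp -> sum of values whose name is NULL
    for grp, name, value in rows:
        if name is None:
            null_totals[grp] += value
        else:
            cleaned[grp].append((re.sub(r"[^a-zA-Z0-9]+", "", name), value))

    result = {}
    for grp, items in cleaned.items():
        chains = []  # each chain is [latest_name_in_chain, running_total]
        for name, value in sorted(items):
            # Extend the chain while the previous name is a substring of this one.
            if chains and chains[-1][0] in name:
                chains[-1][0] = name
                chains[-1][1] += value
            else:
                chains.append([name, value])
        # Assumption: NULL-name values go to the longest name in the grp.
        longest = max(chains, key=lambda c: len(c[0]))
        longest[1] += null_totals[grp]
        for name, total in chains:
            result[(grp, name)] = total
    return result

rows = [(1, "ab&c", 10), (1, "abc d e", 56), (1, "ab", 21), (1, "a", 23),
        (1, "xy", 34), (1, None, 1), (2, "fgh", 87)]
print(group_by_longest(rows))
# {(1, 'abcde'): 111, (1, 'xy'): 34, (2, 'fgh'): 87}
```

Like the LAG()-based query, this compares each name only with its predecessor in sort order, so names that are substrings of each other but not adjacent after sorting end up in different chains.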

Subtract previous row value to current row

I have the following table:
id value acc_no
-----------------
1 12 1
2 14 1
3 15 1
4 10 2
5 16 2
6 19 1
7 7 3
8 24 2
Expected output
id value acc_no result
------------------------------
1 12 1 12(current row values of acc_no=1)
2 14 1 2(14 (current row values)-12(previous row value of acc_no=1))
3 15 1 1(15-14)
4 10 2 10(current row values of acc_no=2)
5 16 2 6(16 (current row values)-10(previous row value of acc_no=2))
6 19 1 4(19(current row values)-15(previous row value of acc_no=1))
7 7 3 7(current row values of acc_no=3)
8 24 2 8(24(current row values)-16(previous row value of acc_no=2))
I tried this query:
select
id, value,
acc_no,
(value - (select value from tb_acc t1 where t1.id = t.id - 1)) as result
from
tb_acc t
But I didn't get the proper output as expected
DECLARE @Test TABLE (
    id int,
    value int,
    acc_no int
)
INSERT @Test(id, value, acc_no)
VALUES
(1, 12, 1),
(2, 14, 1),
(3, 15, 1),
(4, 10, 2),
(5, 16, 2),
(6, 19, 1),
(7, 7, 3),
(8, 24, 2)
SELECT id, t.value, acc_no, t.value - ISNULL(v.value, 0) AS result
FROM @Test t
OUTER APPLY (
    SELECT TOP (1) value
    FROM @Test
    WHERE id < t.id
    AND acc_no = t.acc_no
    ORDER by id DESC
) v
You can do it like this.
Option one: Using the LAG() function (I just noticed you are using 2008, where LAG() is not available, but I am posting it for other readers as well)
SELECT *,
Value - LAG(Value, 1, 0) OVER(PARTITION BY acc_no ORDER BY ID) Result
FROM T
ORDER BY ID;
Option two: Using a CTE and a window function + ISNULL()
WITH CTE AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY acc_no ORDER BY id) RN
FROM T
)
SELECT T1.id,
T1.value,
T1.acc_no,
T1.value - ISNULL(T2.value, 0) Result
FROM CTE T1 LEFT JOIN CTE T2
ON T1.acc_no = T2.acc_no
AND
T1.RN = T2.RN + 1
ORDER BY T1.id;
Using window functions:
;WITH CTE AS
(
SELECT id, value, acc_no,
ROW_NUMBER() OVER (PARTITION BY acc_no ORDER BY id) AS seq
FROM tb_acc
)
SELECT t1.*, t1.value - COALESCE(t2.value, 0)
FROM CTE AS t1
LEFT JOIN CTE AS t2 ON t1.acc_no = t2.acc_no AND t1.seq = t2.seq + 1
You just need a windowed SUM:
SELECT
id
,value
,acc_no
,value - isnull(sum([value]) over (partition by acc_no order by id rows between 1 preceding and 1 preceding ), 0) as result
FROM tb_acc t
order by id
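The answers above all compute the same thing: the current value minus the previous value for the same acc_no, with 0 used when there is no previous row. A small Python sketch of that logic, useful for checking the expected output, might look like this (the function name `diff_from_previous` is made up for illustration):

```python
def diff_from_previous(rows):
    """rows: (id, value, acc_no) tuples ordered by id. Returns (id, result) pairs."""
    last_value = {}  # acc_no -> most recent value seen for that account
    out = []
    for row_id, value, acc_no in rows:
        # A missing previous row counts as 0, like LAG(value, 1, 0) or ISNULL(..., 0).
        out.append((row_id, value - last_value.get(acc_no, 0)))
        last_value[acc_no] = value
    return out

rows = [(1, 12, 1), (2, 14, 1), (3, 15, 1), (4, 10, 2),
        (5, 16, 2), (6, 19, 1), (7, 7, 3), (8, 24, 2)]
print(diff_from_previous(rows))
# [(1, 12), (2, 2), (3, 1), (4, 10), (5, 6), (6, 4), (7, 7), (8, 8)]
```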

combining strings from multiple rows in sql server

How can I obtain the output below in SQL Server 2012?
Table
ID | Values|
1 a
1 b
1 c
2 d
2 e
The output should be such that the first row has a fixed number of values (2) separated by commas, and the next row has the remaining values separated by commas:
ID | Values|
1 a,b
1 c
2 d,e
Each id should contain a maximum of two values in a single row. The remaining values should come in the next row.
Try to use my code:
use db_test;
create table dbo.test567
(
id int,
[values] varchar(max)
);
insert into dbo.test567
values
(1, 'a'),
(1, 'b'),
(1, 'c'),
(2, 'd'),
(2, 'e');
with cte as (
select
id,
[values],
row_number() over(partition by id order by [values] asc) % 2 as rn1,
(row_number() over(partition by id order by [values] asc) - 1) / 2 as rn2
from dbo.test567
), cte2 as (
select
id, max(case when rn1 = 1 then [values] end) as t1, max(case when rn1 = 0 then [values] end) as t2
from cte
group by id, rn2
)
select
id,
case
when t2 is not null then concat(t1, ',', t2)
else t1
end as [values]
from cte2
order by id, [values]
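The query relies on integer division of the row number to decide which output row each value lands in. As a rough illustration of that bucketing, here is a Python sketch; `pack_pairs` and its `size` parameter are hypothetical names, not part of the answer.

```python
from itertools import groupby

def pack_pairs(rows, size=2):
    """rows: (id, value) tuples. Returns (id, 'v1,v2') rows with at most `size` values each."""
    out = []
    for key, group in groupby(sorted(rows), key=lambda r: r[0]):
        values = [v for _, v in group]
        # Slicing in steps of `size` is equivalent to grouping by (row_number - 1) // size.
        for start in range(0, len(values), size):
            out.append((key, ",".join(values[start:start + size])))
    return out

print(pack_pairs([(1, "a"), (1, "b"), (1, "c"), (2, "d"), (2, "e")]))
# [(1, 'a,b'), (1, 'c'), (2, 'd,e')]
```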

Get the latest records per Group By SQL

I have the following table:
CREATE TABLE orders (
id INT PRIMARY KEY IDENTITY,
oDate DATE NOT NULL,
oName VARCHAR(32) NOT NULL,
oItem INT,
oQty INT
-- ...
);
INSERT INTO orders
VALUES
(1, '2016-01-01', 'A', 1, 2),
(2, '2016-01-01', 'A', 2, 1),
(3, '2016-01-01', 'B', 1, 3),
(4, '2016-01-02', 'B', 1, 2),
(5, '2016-01-02', 'C', 1, 2),
(6, '2016-01-03', 'B', 2, 1),
(7, '2016-01-03', 'B', 1, 4),
(8, '2016-01-04', 'A', 1, 3)
;
I want to get the most recent rows (of which there might be multiple) for each name. For the sample data, the results should be:
id | oDate | oName | oItem | oQty | ...
5 | 2016-01-02 | C | 1 | 2 |
6 | 2016-01-03 | B | 2 | 1 |
7 | 2016-01-03 | B | 1 | 4 |
8 | 2016-01-04 | A | 1 | 3 |
The query might be something like:
SELECT oDate, oName, oItem, oQty, ...
FROM orders
WHERE oDate = ???
GROUP BY oName
ORDER BY oDate, id
Besides missing the expression (represented by ???) to calculate the desired values for oDate, this statement is invalid as it selects columns that are neither grouped nor aggregates.
Does anyone know how to get this result?
The RANK window function allows you to, well, rank rows within each partition, and then you can just select the top ones:
SELECT oDate, oName, oItem, oQty, oRemarks
FROM (SELECT oDate, oName, oItem, oQty, oRemarks,
RANK() OVER (PARTITION BY oName ORDER BY oDate DESC) AS rk
FROM my_table) t
WHERE rk = 1
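RANK() gives tied rows the same rank, which is why every row on a name's latest date survives the rk = 1 filter. The Python sketch below mirrors that behaviour on the question's sample data; `latest_per_name` is a hypothetical helper, and the dates are kept as ISO strings for brevity.

```python
def latest_per_name(rows):
    """rows: (id, oDate, oName, oItem, oQty). Keep every row on each name's latest date."""
    latest = {}
    for _, o_date, o_name, _, _ in rows:
        # ISO-formatted date strings compare correctly as plain strings.
        if o_name not in latest or o_date > latest[o_name]:
            latest[o_name] = o_date
    return [r for r in rows if r[1] == latest[r[2]]]

orders = [
    (1, "2016-01-01", "A", 1, 2), (2, "2016-01-01", "A", 2, 1),
    (3, "2016-01-01", "B", 1, 3), (4, "2016-01-02", "B", 1, 2),
    (5, "2016-01-02", "C", 1, 2), (6, "2016-01-03", "B", 2, 1),
    (7, "2016-01-03", "B", 1, 4), (8, "2016-01-04", "A", 1, 3),
]
print([r[0] for r in latest_per_name(orders)])
# [5, 6, 7, 8]
```

ROW_NUMBER() would instead keep exactly one row per name, so it fits only when ties should be broken rather than returned.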
This is a generic query that does not use analytic functions.
SELECT a.*
FROM table1 a
INNER JOIN
(SELECT max(odate) modate,
oname,
oItem
FROM table1
GROUP BY oName,
oItem
)
b ON a.oname=b.oname
AND a.oitem=b.oitem
AND a.odate=b.modate
Add a primary key to the table, for example an auto-incrementing id field, and then order by id. That is the traditional way. With your current table you can only order by oDate, but the same date occurs multiple times, so that alone won't solve your problem.
I think you need a query like this:
SELECT *
FROM (SELECT *,
ROW_NUMBER() OVER (PARTITION BY oName ORDER BY oDate DESC) seq
FROM yourTable) t
WHERE (seq <= 2)
ORDER BY oDate;
You can use ROW_NUMBER as follows:
select oDate, oName, oItem, oQty, oRemarks
from (
select *, row_number() over(partition by oName, oItem order by oDate desc) rn
from #t
)x
where rn = 1
order by oDate
OUTPUT
oDate oName oItem oQty oRemarks
2016-01-01 A 001 2
2016-01-01 A 002 1 test
2016-01-02 C 001 2
2016-01-03 B 001 4
2016-01-03 B 002 1

Coalesce over Rows in MSSQL 2008

I'm trying to determine the best approach here in MSSQL 2008.
Here is my sample data
TransDate Id Active
-------------------------
1/18 1pm 5 1
1/18 2pm 5 0
1/18 3pm 5 Null
1/18 4pm 5 1
1/18 5pm 5 0
1/18 6pm 5 Null
If grouped by Id and ordered by the TransDate, I want the last Non Null Value for the Active Column, and the MAX of TransDate
SELECT MAX(TransDate) AS TransDate,
Id,
--LASTNonNull(Active) AS Active
Here would be the results:
TransDate Id Active
---------------------
1/18 6pm 5 0
It would be like a Coalesce but over the rows, instead of two values/columns.
There would be many other columns that would also have this similar method applied, so I really don't want to make a separate join for each of the columns.
Any ideas?
I'd probably use a correlated sub query.
SELECT MAX(TransDate) AS TransDate,
Id,
(SELECT TOP (1) Active
FROM T t2
WHERE t2.Id = t1.Id
AND Active IS NOT NULL
ORDER BY TransDate DESC) AS Active
FROM T t1
GROUP BY Id
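As a quick way to see what the correlated subquery returns, here is a Python sketch of "max date plus last non-null value per id". The helper name `last_non_null_per_id` is hypothetical, and the year 2011 in the sample data is an assumption, since the question only gives "1/18" and an hour.

```python
from datetime import datetime

def last_non_null_per_id(rows):
    """rows: (trans_date, id, active). Returns {id: (max_trans_date, last_non_null_active)}."""
    result = {}
    for trans_date, row_id, active in sorted(rows, key=lambda r: r[0]):
        _, last_active = result.get(row_id, (None, None))
        if active is not None:
            last_active = active
        # Rows arrive in ascending date order, so the current date is the max so far.
        result[row_id] = (trans_date, last_active)
    return result

# Assumed year; hours 13..18 correspond to 1pm..6pm in the question.
sample = [(datetime(2011, 1, 18, h), 5, a)
          for h, a in zip(range(13, 19), [1, 0, None, 1, 0, None])]
print(last_non_null_per_id(sample))
# {5: (datetime.datetime(2011, 1, 18, 18, 0), 0)}
```

The same loop extends to several columns by tracking one "last non-null" value per column, which corresponds to selecting more columns in a single CROSS APPLY rather than adding one subquery per column.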
A way without
SELECT
Id,
MAX(TransDate) AS TransDate,
CAST(RIGHT(MAX(CONVERT(CHAR(23),TransDate,121) + CAST(Active AS CHAR(1))),1) AS BIT) AS Active,
/*You can probably figure out a more efficient thing to
compare than the above depending on your data. e.g.*/
CAST(MAX(DATEDIFF(SECOND,'19500101',TransDate) * CAST(10 AS BIGINT) + Active)%10 AS BIT) AS Active2
FROM T
GROUP BY Id
Or following the comments would cross apply work better for you?
WITH T (TransDate, Id, Active, SomeOtherColumn) AS
(
select GETDATE(), 5, 1, 'A' UNION ALL
select 1+GETDATE(), 5, 0, 'B' UNION ALL
select 2+GETDATE(), 5, null, 'C' UNION ALL
select 3+GETDATE(), 5, 1, 'D' UNION ALL
select 4+GETDATE(), 5, 0, 'E' UNION ALL
select 5+GETDATE(), 5, null,'F'
),
T1 AS
(
SELECT MAX(TransDate) AS TransDate,
Id
FROM T
GROUP BY Id
)
SELECT T1.TransDate,
Id,
CA.Active AS Active,
CA.SomeOtherColumn AS SomeOtherColumn
FROM T1
CROSS APPLY (SELECT TOP (1) Active, SomeOtherColumn
FROM T t2
WHERE t2.Id = T1.Id
AND Active IS NOT NULL
ORDER BY TransDate DESC) CA
This example should help, using analytical functions Max() OVER and Row_Number() OVER
create table tww( transdate datetime, id int, active bit)
insert tww select GETDATE(), 5, 1
insert tww select 1+GETDATE(), 5, 0
insert tww select 2+GETDATE(), 5, null
insert tww select 3+GETDATE(), 5, 1
insert tww select 4+GETDATE(), 5, 0
insert tww select 5+GETDATE(), 5, null
select maxDate as Transdate, id, Active
from (
select *,
max(transdate) over (partition by id) maxDate,
ROW_NUMBER() over (partition by id
order by case when active is not null then 0 else 1 end, transdate desc) rn
from tww
) x
where rn=1
Another option, quite expensive, would be doing it through XML. For educational purposes only.
select
ID = n.c.value('@id', 'int'),
trandate = n.c.value('(data/transdate)[1]', 'datetime'),
active = n.c.value('(data/active)[1]', 'bit')
from
(select xml=convert(xml,
(select id [@id],
( select *
from tww t
where t.id=tww.id
order by transdate desc
for xml path('data'), type)
from tww
group by id
for xml path('node'), root('root'), elements)
)) x cross apply xml.nodes('root/node') n(c)
It works on the principle that the XML generated has each record as a child node of the ID. Null columns have been omitted, so the first column found using xpath (child/columnname) is the first non-null value similar to COALESCE.
You could use a subquery:
SELECT MAX(TransDate) AS TransDate
, Id
, (
SELECT TOP 1 t2.Active
FROM YourTable t2
WHERE t1.id = t2.id
and t2.Active is not null
ORDER BY
t2.TransDate desc
)
FROM YourTable t1
GROUP BY Id
I created a temp table named #temp to test my solution, and here is what I came up with:
transdate id active
1/1/2011 12:00:00 AM 5 1
1/2/2011 12:00:00 AM 5 0
1/3/2011 12:00:00 AM 5 null
1/4/2011 12:00:00 AM 5 1
1/5/2011 12:00:00 AM 5 0
1/6/2011 12:00:00 AM 5 null
1/1/2011 12:00:00 AM 6 2
1/2/2011 12:00:00 AM 6 3
1/3/2011 12:00:00 AM 6 null
1/4/2011 12:00:00 AM 6 2
1/5/2011 12:00:00 AM 6 null
This query...
select max(a.transdate) as transdate, a.id, (
select top (1) b.active
from #temp b
where b.active is not null
and b.id = a.id
order by b.transdate desc
) as active
from #temp a
group by a.id
Returns these results.
transdate id active
1/6/2011 12:00:00 AM 5 0
1/5/2011 12:00:00 AM 6 2
Assuming a table named "test1", how about using ROW_NUMBER, OVER and PARTITION BY?
SELECT transdate, id, active FROM
(SELECT transdate, ROW_NUMBER() OVER(PARTITION BY id ORDER BY transdate desc) AS rownumber, id, active
FROM test1
WHERE active is not null) a
WHERE a.rownumber = 1