How to perform calculation in a specific column with multiple conditions - sql

I am looking for a clean way to perform calculations on a single column under a few conditions and insert the results into the same table. My existing solution uses a WHILE loop with many variable declarations: it runs simple queries to store values, performs the calculation, and finally inserts the result as a new row in the table. However, it looks messy and complicated. Is there a better solution?
Original Table
Week | Indicator | Value
1 A 2
1 B 3
1 D 10
1 E 5
1 X 12
1 Y 6
2 A 4
2 B 5
2 D 7
2 E 3
2 X 4
2 Y 2
...
53
Updated Table
Week | Indicator | Value
1 A 2
1 B 3
1 C 5
1 D 10
1 E 5
1 F 5
1 X 12
1 Y 6
1 Z 2
2 A 4
2 B 5
2 C 9
2 D 7
2 E 3
2 F 4
2 X 4
2 Y 2
2 Z 2
In the updated table, every 3rd row within a week comes from a different calculation: the 3rd row (C) is an addition (A + B), the 6th row (F) is a subtraction (D - E), and the 9th row (Z) is a division (X / Y).
The calculation is not restricted to addition and could use other formulas. I am using these simple operations only as an illustration.
Here is an example of my SQL solution:
DECLARE @total_rows int;
SET @total_rows = (SELECT COUNT(*) FROM original_table);
DECLARE @wk varchar(5);
DECLARE @indicator1 char(1);
DECLARE @indicator2 char(1);
SET @indicator1 = 'A';
SET @indicator2 = 'B';
DECLARE @a_value int;
DECLARE @b_value int;
DECLARE @cal_value int;
DECLARE @iteration int;
SET @iteration = 1;
WHILE @iteration <= @total_rows
BEGIN
    IF @iteration <= 53
        SET @wk = concat('W', @iteration);
    SET @a_value = (SELECT value
                    FROM original_table
                    WHERE indicator = @indicator1 AND week = @wk);
    SET @b_value = (SELECT value
                    FROM original_table
                    WHERE indicator = @indicator2 AND week = @wk);
    SET @cal_value = (@a_value / NULLIF(@b_value, 0)) * 1000000;
    ....
    SET @iteration = @iteration + 1;
END
Not going to post the entire SQL script as it is quite lengthy but I hope you get the gist of it.

Is this not as simple as INSERT and SUM..?
INSERT INTO dbo.YourTable ([Week],Indicator,[Value])
SELECT YT.[Week],
'C' AS Indicator,
SUM(YT.[Value]) AS [Value]
FROM dbo.YourTable YT
GROUP BY YT.[Week];
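The same set-based idea can be extended to the other two formulas from the question. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, the table name `your_table` is made up, and the formulas (C = A + B, F = D - E, Z = X / Y) are inferred from the sample data in the question. Conditional aggregation pivots each week's indicators into columns, and one INSERT ... SELECT then adds a row per formula.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (week INTEGER, indicator TEXT, value REAL)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(1, "A", 2), (1, "B", 3), (1, "D", 10), (1, "E", 5), (1, "X", 12), (1, "Y", 6),
     (2, "A", 4), (2, "B", 5), (2, "D", 7), (2, "E", 3), (2, "X", 4), (2, "Y", 2)],
)

# Pivot each week's indicators into columns, then emit one new row per formula.
conn.execute("""
    WITH pivoted AS (
        SELECT week,
               SUM(CASE WHEN indicator = 'A' THEN value END) AS a,
               SUM(CASE WHEN indicator = 'B' THEN value END) AS b,
               SUM(CASE WHEN indicator = 'D' THEN value END) AS d,
               SUM(CASE WHEN indicator = 'E' THEN value END) AS e,
               SUM(CASE WHEN indicator = 'X' THEN value END) AS x,
               SUM(CASE WHEN indicator = 'Y' THEN value END) AS y
        FROM your_table
        GROUP BY week
    )
    INSERT INTO your_table (week, indicator, value)
    SELECT week, 'C', a + b FROM pivoted
    UNION ALL
    SELECT week, 'F', d - e FROM pivoted
    UNION ALL
    SELECT week, 'Z', x / NULLIF(y, 0) FROM pivoted
""")

new_rows = conn.execute(
    "SELECT week, indicator, value FROM your_table "
    "WHERE indicator IN ('C', 'F', 'Z') ORDER BY week, indicator"
).fetchall()
print(new_rows)
```

In T-SQL the CTE, CASE expressions and NULLIF would look much the same; no loop or per-week variables are needed.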

Related

How can I create cumulative string concatenation across rows with ordering of the string?

I am looking to add a column to my database which performs row-wise cumulative concatenation of the values of another column for each ID, and orders the resulting string by a different hierarchy. The dataset is very large, and the solution I designed on a smaller test dataset isn't workable at a larger scale, so I need help redesigning it.
I have written the query so far using a combination of a recursive CTE to perform the cumulative concatenation (step 1 output below), and then a slightly clunky function (step 2 output below) to order the strings according to a separate hierarchy, which also removes the '1' value. These work on a small subset of my data (n=60), but when I try to run them on a larger subset (n=500,000) the CTE runs forever (stopped without completing at 2 hours). The real dataset will be on the order of hundreds of millions of rows, so this solution isn't workable at that scale.
ID Start_Date End_Date Seg step1 step2
1 01/04/1946 31/12/1990 1 1 1
1 01/01/1991 08/01/2007 4 4 4
1 09/01/2007 04/02/2007 1 1 1
1 05/02/2007 18/10/2017 4 14 4
1 01/04/2013 18/10/2017 8 148 48
1 11/11/2014 18/10/2017 7 1487 487
2 01/05/1931 31/12/1997 1 1 1
2 01/01/1998 20/01/2014 4 4 4
2 31/01/2011 20/01/2014 6 146 46
2 21/02/2013 20/01/2014 5 1465 456
2 01/04/2013 20/01/2014 8 14658 4586
2 29/04/2013 20/01/2014 7 146587 45876
There are additional complicated logic elements, such as only starting the cumulation when the start date is earlier than the previous row end date, so a solution which allows flexibility by adding where or case when statements is key.
Examples of the recursive CTE and ordering function I have used (not adapted for the simplified table shown, but indicative of the structure) are below.
Recursive CTE (output step 1 column)
with t (ID, Segment,start_date, start_comb,updated_end_date ,rn) as (
select ID, Segment, start_date, case when Segment_end_date <> resolved_date OR Segment_end_date is null then 1 else 0 end as start_comb
,updated_end_date
,row_number() over (partition by ID order by start_date) as rn
from #test_IDs
)
,r (ID, orig_seg, Segment, rn, start_comb, start_date, updated_end_date) as (
select ID, cast(Segment as varchar(max)), cast(Segment as varchar(max)),rn, start_comb, start_date, updated_end_date
from t
where start_comb=0
union all
select r.ID, cast(t.segment as varchar(max)) as orig_seg
, Segment = cast( (concat(r.Segment,t.Segment)) as varchar(max))
, t.rn, t.start_comb, t.start_date, t.updated_end_date
from r
join t on t.ID = r.ID and t.rn = r.rn + 1 and t.start_comb <> 0
)
Ordering function (output step 2 column)
if object_id ('reformat') is not null
    drop function reformat
go
create function dbo.reformat
(
    @unordered_Segs varchar(max)
)
returns varchar(255)
as
begin
    declare @healthy int, @first int, @second int, @third int, @fourth int, @fifth int, @outtext int
    if Charindex('4',@unordered_segs) > 0
        set @first = 4
    else set @first = ''
    if Charindex('5',@unordered_segs) > 0
        set @second = 5
    else set @second = ''
    if Charindex('8',@unordered_segs) > 0
        set @third = 8
    else set @third = ''
    if Charindex('7',@unordered_segs) > 0
        set @fourth = 7
    else set @fourth = ''
    if Charindex('6',@unordered_segs) > 0
        set @fifth = 6
    else set @fifth = ''
    if Charindex('1',@unordered_segs) > 0 and len(@unordered_segs) = 1
        set @outtext = 1
    else
        set @outtext = Replace((concat(@first,@second,@third,@fourth,@fifth)),'0','')
    return @outtext
end
Thanks!
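For readers following the step 2 logic, here is a minimal sketch of what the ordering function does, written in Python purely as an illustration. The hierarchy string "45876" is read off the order of the checks in dbo.reformat; the function name and constant are made up for this example, and the sketch does not address the performance problem itself.

```python
# Order in which segments should appear, taken from the checks in dbo.reformat.
HIERARCHY = "45876"


def reformat(unordered_segs: str) -> str:
    """Reorder a cumulated segment string by HIERARCHY, dropping '1' unless it stands alone."""
    if unordered_segs == "1":
        return "1"
    present = set(unordered_segs)
    return "".join(seg for seg in HIERARCHY if seg in present)


for step1 in ["1", "14", "148", "1465", "146587"]:
    print(step1, "->", reformat(step1))
```

The step1 values in the loop are taken from the sample table above, so the printed results can be compared against its step2 column.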

SQL UPDATE Query Using FROM

While answering a test I faced the following question, which I wasn't able to solve:
Given the following table Z and query:
Table Z:
| Value |
---------
| 1 |
| 2 |
| 3 |
| 4 |
---------
Query:
UPDATE Z
SET VALUE = Y.VALUE + 1
FROM Z AS Y
WHERE Y.VALUE = Z.VALUE + 1;
SELECT SUM(VALUE) FROM Z;
The question asks for the result of executing this query. It doesn't mention a specific SQL dialect.
The CORRECT answer is 16.
I don't know how this query can achieve this result. I wasn't even able to execute it in a real environment; it complains about a syntax error near "FROM".
1 - Do you know how this query works?
2 - How can I get this query to execute?
P.S. I had a hard time finding any information about the FROM clause inside an UPDATE query.
One database where the code will work is Postgres. According to RexTester, this is indeed the answer.
The reason is that you are adding 2 to each matching Z value: z = y.value + 1 = z.value + 1 + 1. The fourth value has no match, so it is left unchanged. Postgres generates the following:
value
1 4
2 3
3 4
4 5
This is the same data just in a different order.
With a similar statement, SQL Server does the right thing:
UPDATE Z
SET val = Y.val + 1
FROM Z, Z AS Y
WHERE Y.val = Z.val + 1;
(I am using the dreaded comma in a FROM clause to keep the two statements as similar as possible.)
It returns:
val
1 3
2 4
3 5
4 4
The two result sets are the same, they are just in a different order.
I hope this helps (made on MSSQL):
The first SELECT shows the original values that the UPDATE will use.
I wrapped the UPDATE in a transaction with a rollback so it does not change the table. You can delete BEGIN TRAN and ROLLBACK TRAN if you want to change your data.
CREATE TABLE TZ (VALUE INT)
INSERT INTO TZ VALUES (1),(2),(3),(4)
SELECT Z.VALUE AS Z_VALUE, Y.VALUE AS Y_VALUE
FROM TZ Z
INNER JOIN TZ Y ON Y.VALUE=Z.VALUE +1
;
BEGIN TRAN
UPDATE Z SET VALUE=Y.VALUE+1
FROM TZ Z
INNER JOIN TZ Y ON Y.VALUE=Z.VALUE +1
;
SELECT * FROM TZ;
SELECT SUM(VALUE) AS TOT FROM TZ;
ROLLBACK TRAN
Output first SELECT:
Z_VALUE, Y_VALUE
1 , 2
2 , 3
3 , 4
Output of SELECT after UPDATE:
VALUE
3
4
5
4
So, SUM is actually 16
If this worked—and I'd not expect it to on every database—this might help illustrate what's going on. Y is just an alias of Z. This table represents the joining of the tables during the update and the final results:
Z values Y (alias) joined on Z + 1 Update Value Z following update
======== ========================= ============ ==================
1 NO MATCH
1 2 3 3
2 3 4 4
3 4 5 5
4 NO MATCH 4
(SUM: 16)
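The key point in the table above is that every row is matched against the values as they were before the update started, not against rows that have already been changed. A small Python sketch of that behaviour (an illustration of the semantics only, not of how any database engine is implemented):

```python
values = [1, 2, 3, 4]      # table Z before the UPDATE
snapshot = list(values)    # Y is an alias of Z as it was before the statement ran

updated = []
for z in snapshot:
    # Join condition from the query: Y.VALUE = Z.VALUE + 1
    matches = [y for y in snapshot if y == z + 1]
    # Matched rows become Y.VALUE + 1; unmatched rows keep their value.
    updated.append(matches[0] + 1 if matches else z)

print(updated, sum(updated))
```

Rows 1, 2 and 3 each gain 2, row 4 has no partner and stays at 4, which gives the total of 16.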

sql reset updating rows when a value changes

I have a table as below:
id code value total
==========================
1 A/01 5
2 A/01 8
3 A/01 6
1 A/02 8
2 A/02 3
3 A/02 7
1 A/03 6
2 A/03 9
3 A/03 2
I want to update total with the value of the same row plus the previous rows' values. I declared a variable as below and updated the table:
DECLARE @sum int
SET @sum = 0
UPDATE #table set total = @sum, @sum = @sum + value
It works perfectly if I select only the first code:
SELECT * FROM table WHERE code='A/01'
id code value total
==========================
1 A/01 5 5
2 A/01 8 13
3 A/01 6 19
But if I select the whole table, then it does this:
id code value total
==========================
1 A/01 5 5
2 A/01 8 13
3 A/01 6 19
1 A/02 8 27
2 A/02 3 30
3 A/02 7 37
1 A/03 6 43
2 A/03 9 52
3 A/03 2 54
How can I update the table so that the running sum resets whenever the value in the "code" column changes? I need the following result. Thanks!
id code value total
==========================
1 A/01 5 5
2 A/01 8 13
3 A/01 6 19
1 A/02 8 8
2 A/02 3 11
3 A/02 7 18
1 A/03 6 6
2 A/03 9 15
3 A/03 2 17
The UPDATE statement affects the whole set, so you can't tell it to reset @sum when the code changes.
Try using the window function SUM(value) with PARTITION BY code and ORDER BY id.
For example:
SELECT id, code, value,
    SUM(value) OVER (PARTITION BY code ORDER BY id
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total
FROM [table]
You can then use this result set to UPDATE your table.
You can use windowed SUM to calculate total:
WITH cte AS
(
SELECT a.id , a.code , a.[value] ,
total = SUM(a.[value]) OVER (PARTITION BY code ORDER BY id)
FROM #mytable a
)
UPDATE m
SET total = c.total
FROM #mytable m
JOIN cte c
ON m.id = c.id
AND m.code = c.code;
SELECT *
FROM #mytable
ORDER BY code, id;
You cannot use a windowed function directly in an UPDATE, like this:
UPDATE #mytable
SET total = SUM([value]) OVER (PARTITION BY code ORDER BY id)
so you need to use subquery/cte to calculate sum first.
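As a quick way to check the expected numbers, the sketch below reproduces the question's data in an in-memory SQLite database (Python's sqlite3 module, used here only as a stand-in for SQL Server; window functions need SQLite 3.25 or later). The table name `mytable` is an assumption. The windowed SUM is computed first and then written back in a second step, mirroring the "calculate first, update second" structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, code TEXT, value INTEGER, total INTEGER)")
conn.executemany(
    "INSERT INTO mytable (id, code, value) VALUES (?, ?, ?)",
    [(1, "A/01", 5), (2, "A/01", 8), (3, "A/01", 6),
     (1, "A/02", 8), (2, "A/02", 3), (3, "A/02", 7),
     (1, "A/03", 6), (2, "A/03", 9), (3, "A/03", 2)],
)

# Step 1: running total that restarts for every code.
running = conn.execute("""
    SELECT SUM(value) OVER (PARTITION BY code ORDER BY id) AS total, id, code
    FROM mytable
""").fetchall()

# Step 2: write the computed totals back, keyed on (id, code).
conn.executemany("UPDATE mytable SET total = ? WHERE id = ? AND code = ?", running)

totals = [row[0] for row in conn.execute("SELECT total FROM mytable ORDER BY code, id")]
print(totals)
```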
EDIT:
Your first attempt:
DECLARE @sum int = 0;
UPDATE #table set total = @sum, @sum = @sum + value;
can be dangerous and return unpredictable results. Forget for a moment about multiple codes and assume there is only 'A/01' in the table. You've assumed that the sum will be computed row by row in order of increasing id.
But there is a catch: nothing guarantees that order, for example when parallelism is involved.
If you need this kind of operation, read more about the "quirky update".

SQL Server 2012: Conditionally Incrementing a counter using ROW_NUMBER()

I am trying to apply ROW_NUMBER() to increment a counter based on particular conditions.
My data looks like this, with the target counter being the Prep column
id DSR PrepIndicator Prep
--------------------------------------
1662835 -1 1 1
1662835 14 2 2
1662835 14 2 3
1662835 20 2 4
1667321 -1 1 1
1667321 30 2 2
1667321 14 2 3
1680648 -1 1 1
1680648 14 2 2
1680648 60 1 1
1680648 14 2 2
1680648 14 2 3
1683870 -1 1 1
1683870 12 2 2
1683870 10 2 3
1683870 60 1 1
1683870 7 2 2
Ignoring the PrepIndicator column for the moment, the business logic I am trying to implement is as follows:
For each of the Id's, starting from 1, increment the Prep counter if the DSR is less than 42.
If it is 42 or greater, reset the Prep counter to 1.
The PrepIndicator, in effect, creates a flag to implement this, in that if PrepIndicator = 1 then Prep = 1. If PrepIndicator = 2, then increment Prep.
I'd prefer to achieve this without the PrepIndicator column if possible.
How would I achieve this conditional increment with ROW_NUMBER()?
I've tried
ROW_NUMBER() OVER (PARTITION BY id, PrepIndicator ORDER BY id)
but it doesn't seem to work when the DSR is >= 42.
Any suggestions or help would be great. Thanks!
First, you will need explicit ordering. "Incrementing the counter" only has meaning if you have a previous value. You can add an IDENTITY column to the table, or use ROW_NUMBER() OVER (ORDER BY /* your logic here */). In your table, you do not even have unique values for the first three columns (see 1680648, 14, 2), so I would think adding an ID is the way to go.
To do what you want to achieve, I believe you must do this in a loop. If you use ROW_NUMBER() you may wish to select into a temporary table. By the nature of your question, the term counter indicates you will have a variable.
WITH numbered AS (SELECT rowId, ROW_NUMBER() OVER (ORDER BY id, DSR, PrepIndicator) AS rn FROM TableA)
UPDATE numbered SET rowId = rn
then "conditional" seems to signal a good use of CASE
DECLARE @counter INT = 1
DECLARE @row INT = 1
DECLARE @DSR INT
UPDATE TableA SET Prep = @counter
-- MIN() picks the single next row; without it the subquery returns several rows and fails
SET @row = (SELECT MIN(rowId) FROM TableA WHERE rowId > @row)
WHILE EXISTS( SELECT TOP 1 1 FROM TableA WHERE rowId = @row )
BEGIN
    SELECT @DSR = DSR FROM TableA WHERE rowId = @row
    SET @counter = CASE WHEN @DSR < 42 THEN @counter + 1 ELSE 1 END
    UPDATE TableA SET Prep = @counter WHERE rowId = @row
    SET @row = (SELECT MIN(rowId) FROM TableA WHERE rowId > @row)
END
First, you need to add a primary key because there is no physical order in a SQL table; we can call it IdK. The following code should then give you what you want:
select *, row_number() over (partition by Id, (Select Count (*) from MyTable t2 where t2.idk <= t1.idk and t2.id = t1.id and DSR >= 42) order by idk) prep
from MyTable t1
order by idk
As to why your code doesn't work: the rows are first grouped by the partition columns before the numbering is done. With the two columns id and PrepIndicator as the partition, we get the following intermediate result for the last 5 rows before the numbering:
id DSR PrepIndicator Row_Number (Id, PrepIndicator)
1683870 -1 1 1
1683870 60 1 2
1683870 12 2 1
1683870 10 2 2
1683870 7 2 3
Notice that the line with DSR = 60 is now in the second position. This is clearly what you don't want to have. In the case with the Select count(*)..., we have the following result for the last 5 rows after the grouping is done, just before the numbering:
id DSR ...Count() Row_Number (Id, ...Count())
1683870 -1 0 1
1683870 12 0 2
1683870 10 0 3
1683870 60 1 1
1683870 7 1 2
You can notice that in this case, there is no change of position for any row.
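The grouping trick described above can also be written with a windowed running count instead of a correlated subquery. The following is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later) rather than SQL Server, and the table name `mytable` and key column `idk` follow this answer's naming. Each row with DSR >= 42 bumps a per-id group number, and ROW_NUMBER() restarts inside each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (idk INTEGER PRIMARY KEY, id INTEGER, dsr INTEGER)")
conn.executemany(
    "INSERT INTO mytable (idk, id, dsr) VALUES (?, ?, ?)",
    [(1, 1680648, -1), (2, 1680648, 14), (3, 1680648, 60), (4, 1680648, 14),
     (5, 1680648, 14), (6, 1683870, -1), (7, 1683870, 12), (8, 1683870, 10),
     (9, 1683870, 60), (10, 1683870, 7)],
)

rows = conn.execute("""
    WITH grouped AS (
        SELECT idk, id, dsr,
               -- running count of "reset" rows: a new group starts at every DSR >= 42
               SUM(CASE WHEN dsr >= 42 THEN 1 ELSE 0 END)
                   OVER (PARTITION BY id ORDER BY idk) AS grp
        FROM mytable
    )
    SELECT idk, id, dsr,
           ROW_NUMBER() OVER (PARTITION BY id, grp ORDER BY idk) AS prep
    FROM grouped
    ORDER BY idk
""").fetchall()

prep = [r[3] for r in rows]
print(prep)
```

The sample rows are the last two ids from the question, so the result can be compared with the Prep column there.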

Sum a column and get the first row in Transact Sql

I have a table MOUVEMENTS which has 3 columns :
ID IDREF NUMBER
1 1 5
2 1 3
3 1 4
4 1 2
5 2 1
I'd like to fetch the rows of this table with these constraints:
IDREF = 1
Ordered by ID ASC
Only the first rows whose cumulative SUM of NUMBER (by IDREF) reaches a given value X
I imagine that we first calculate the running SUM, and then restrict on that column:
ID IDREF NUMBER SUM
1 1 5 5
2 1 3 8
3 1 4 12
4 1 2 14
5 2 1 1
In this case, if we want a total of 11, we take the first two rows plus the third, and we change the NUMBER of the third row so that the values stay coherent.
So the expected result is:
ID IDREF NUMBER SUM
1 1 5 5
2 1 3 8
3 1 3 11
Please note the change in the third line on the NUMBER and SUM column.
Do you know how to achieve that?
This query should work from SQL Server 2000 to 2008 R2.
I've created a solution here which uses a view: http://www.sqlfiddle.com/#!3/ebb01/15
The view contains a running total column for each IDRef:
CREATE VIEW MouvementsRunningTotals
AS
SELECT
A.ID,
A.IDRef,
MAX(A.Number) Number,
SUM (B.Number) RunningTotal
FROM
Mouvements A
LEFT JOIN Mouvements B ON A.ID >= B.ID AND A.IDRef = B.IDRef
GROUP BY
A.ID,
A.IDRef
If you can't create a view then you could create this as a temporary table in tsql.
Then the query is a self join on that view, in order to determine which is the last row to be included based on the total you pass in. A CASE expression then ensures the correct value for the last row:
DECLARE @total int
DECLARE @idRef int
SELECT @total = 4
SELECT @idRef = 1
SELECT
    A.ID,
    A.IDRef,
    CASE
        WHEN A.RunningTotal <= @total THEN A.Number
        -- ISNULL covers the first row, which has no previous running total
        ELSE @total - ISNULL(B.RunningTotal, 0)
    END Number
FROM
    MouvementsRunningTotals A
    LEFT JOIN MouvementsRunningTotals B ON
        A.IDRef = B.IDRef
        AND A.RunningTotal - A.Number = B.RunningTotal
WHERE
    A.IDRef = @idRef
    AND (A.RunningTotal <= @total
        OR (A.RunningTotal > @total AND ISNULL(B.RunningTotal, 0) < @total))
You can add more data in the Build Schema box and change the number in the @total parameter in the Query box to test it.
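On versions with windowed aggregates, the running total does not need a self join. The sketch below is an illustration only: it uses SQLite through Python's sqlite3 module (3.25 or later) instead of SQL Server, and the names `rt`, `running` and `capped_sum` are made up. It keeps every row whose running total before that row is still below the target, and trims the last row's number so the total lands exactly on the target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mouvements (id INTEGER, idref INTEGER, number INTEGER)")
conn.executemany(
    "INSERT INTO mouvements VALUES (?, ?, ?)",
    [(1, 1, 5), (2, 1, 3), (3, 1, 4), (4, 1, 2), (5, 2, 1)],
)

query = """
    WITH rt AS (
        SELECT id, idref, number,
               SUM(number) OVER (PARTITION BY idref ORDER BY id) AS running
        FROM mouvements
        WHERE idref = :idref
    )
    SELECT id, idref,
           -- the last row kept is trimmed so the total does not exceed :total
           CASE WHEN running <= :total THEN number
                ELSE :total - (running - number) END AS number,
           MIN(running, :total) AS capped_sum
    FROM rt
    WHERE running - number < :total   -- total before this row is still short
    ORDER BY id
"""

result = conn.execute(query, {"idref": 1, "total": 11}).fetchall()
print(result)
```

With a target of 11 this gives the three rows from the question, with the third row's NUMBER reduced from 4 to 3.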
select id, (select top 1 number from mouvements) as number, idref
from mouvements where idref=1 order by id asc