SQL Server 2012: Conditionally incrementing a counter using ROW_NUMBER()

I am trying to apply ROW_NUMBER() to increment a counter based on particular conditions.
My data looks like this, with the target counter being the Prep column:
id DSR PrepIndicator Prep
--------------------------------------
1662835 -1 1 1
1662835 14 2 2
1662835 14 2 3
1662835 20 2 4
1667321 -1 1 1
1667321 30 2 2
1667321 14 2 3
1680648 -1 1 1
1680648 14 2 2
1680648 60 1 1
1680648 14 2 2
1680648 14 2 3
1683870 -1 1 1
1683870 12 2 2
1683870 10 2 3
1683870 60 1 1
1683870 7 2 2
Ignoring the PrepIndicator column for the moment, the business logic I am trying to implement is as follows:
For each id, starting from 1, increment the Prep counter if DSR is less than 42.
If it is 42 or greater, reset the Prep counter to 1.
In effect, PrepIndicator is a flag that implements this: if PrepIndicator = 1, then Prep = 1; if PrepIndicator = 2, then Prep is incremented.
I'd prefer to achieve this without the PrepIndicator column if possible.
How would I achieve this conditional increment with ROW_NUMBER()?
I've tried
ROW_NUMBER() OVER (PARTITION BY id, PrepIndicator ORDER BY id)
but it doesn't seem to work when the DSR is >= 42.
Any suggestions or help would be great. Thanks!

First, you will need explicit ordering. "Incrementing the counter" only has meaning if you have a previous value. You can add an IDENTITY column to the table, or use ROW_NUMBER() OVER (ORDER BY /* your logic here */). In your table, the first three columns do not even form unique rows (see 1680648, 14, 2), so I would think adding an ID is the way to go.
To achieve this, I believe you must use a loop. If you use ROW_NUMBER(), you may wish to select into a temporary table. The word "counter" in your question also suggests that you will keep the running value in a variable.
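-- Windowed functions are not allowed directly in SET, so number the rows through a CTE.
-- Assumes TableA already has an INT column rowId; use an ORDER BY that reflects the real row order.
WITH numbered AS (
SELECT rowId, ROW_NUMBER() OVER (ORDER BY id, DSR, PrepIndicator) AS rn
FROM TableA
)
UPDATE numbered SET rowId = rn
Then "conditional" seems to signal a good use of CASE:
DECLARE @counter INT = 1
DECLARE @row INT
DECLARE @id INT
DECLARE @prevId INT
DECLARE @DSR INT
SET @row = (SELECT MIN(rowId) FROM TableA)
WHILE @row IS NOT NULL
BEGIN
SELECT @id = id, @DSR = DSR FROM TableA WHERE rowId = @row
-- restart at 1 for a new id or when DSR is 42 or more; otherwise increment
SET @counter = CASE WHEN @id = @prevId AND @DSR < 42 THEN @counter + 1 ELSE 1 END
UPDATE TableA SET Prep = @counter WHERE rowId = @row
SET @prevId = @id
SET @row = (SELECT MIN(rowId) FROM TableA WHERE rowId > @row)
END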

First, you need to add a primary key, because a SQL table has no inherent row order; we can call it IdK. The following code should then give you what you want:
select *, row_number() over (partition by Id, (Select Count (*) from MyTable t2 where t2.idk <= t1.idk and t2.id = t1.id and DSR >= 42) order by idk) prep
from MyTable t1
order by idk
As to why your code doesn't work: the rows are first grouped into partitions, and only then numbered. With id and PrepIndicator as the partition columns, we get the following intermediate result for the last 5 rows before the numbering:
id DSR PrepIndicator Row_Number (Id, PrepIndicator)
1683870 -1 1 1
1683870 60 1 2
1683870 12 2 1
1683870 10 2 2
1683870 7 2 3
Notice that the line with DSR = 60 is now in the second position. This is clearly not what you want. With the Select count(*)... expression, we get the following result for the last 5 rows after the grouping is done, just before the numbering:
id DSR ...Count() Row_Number (Id, ...Count())
1683870 -1 0 1
1683870 12 0 2
1683870 10 0 3
1683870 60 1 1
1683870 7 1 2
Notice that in this case, no row changes position.
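A possible alternative for SQL Server 2012 and later, sketched here rather than taken from either answer: flag every row whose DSR is 42 or more, take a running SUM of that flag per id to get a group number, and let ROW_NUMBER() count within each (id, group). Like the answers above, it needs an ordering column; `seq` and the table name `PrepDemo` are hypothetical, and the rows are the question's sample data.

```sql
-- Hypothetical demo table: seq is an assumed column holding the real row order.
CREATE TABLE PrepDemo (seq INT PRIMARY KEY, id INT, DSR INT);

INSERT INTO PrepDemo (seq, id, DSR) VALUES
    (1, 1662835, -1), (2, 1662835, 14), (3, 1662835, 14), (4, 1662835, 20),
    (5, 1667321, -1), (6, 1667321, 30), (7, 1667321, 14),
    (8, 1680648, -1), (9, 1680648, 14), (10, 1680648, 60), (11, 1680648, 14), (12, 1680648, 14),
    (13, 1683870, -1), (14, 1683870, 12), (15, 1683870, 10), (16, 1683870, 60), (17, 1683870, 7);

WITH flagged AS (
    -- grp goes up by 1 at every row whose DSR is 42 or more,
    -- so each reset starts a new group within the id
    SELECT seq, id, DSR,
           SUM(CASE WHEN DSR >= 42 THEN 1 ELSE 0 END)
               OVER (PARTITION BY id ORDER BY seq
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
    FROM PrepDemo
)
SELECT seq, id, DSR,
       -- numbering restarts at 1 in every (id, grp) run
       ROW_NUMBER() OVER (PARTITION BY id, grp ORDER BY seq) AS Prep
FROM flagged
ORDER BY seq;
```

For these rows Prep comes out as 1, 2, 3, 4 / 1, 2, 3 / 1, 2, 1, 2, 3 / 1, 2, 3, 1, 2, matching the question's Prep column, without a correlated subquery inside PARTITION BY.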

Related

Updating Rows in a normalized table

normalized table
ID SEQ Type Value Flag
1 1 a 100 -
1 2 a 200 -
1 3 a 250 -
1 4 b 200 -
2 1 a 150 -
2 2 b 100 -
2 3 b 200 -
How do I write a single update statement such that the resulting table is populated as follows
ID SEQ Type Value Flag
1 1 a 100 valid
1 2 a 200 repeat
1 3 a 250 repeat
1 4 b 200 valid
2 1 a 150 valid
2 2 b 100 valid
2 3 b 200 repeat
Edit: included seq column
Only the first occurrence (by SEQ) of each Type within an ID group should get the valid flag.
Should it be written as two separate update statements?
Can someone clarify this for me?
Much appreciated.
Compute the rank with row_number() first, then update the table from it. In SQL Server, both steps fit into a single statement by updating through a CTE.
Option 1:
with ranked as (
select
Flag,
row_number() over (partition by ID, Type order by SEQ) as rnk
from yourTable
)
update ranked
set Flag = case
when rnk = 1 then 'valid'
else 'repeat'
end
Option 2:
You may be able to do without an update statement, as follows:
select
Id,
SEQ,
Type,
Value,
case
when rnk = 1 then 'valid'
else 'repeat'
end as flag
from
(
select
Id,
SEQ,
Type,
Value,
row_number() over (partition by ID, Type order by SEQ) as rnk
from yourTable
) val

Replace a column value with random values

I want to replace values in a column with randomized values
NO LINE
-- ----
1 1
1 2
1 3
1 4
2 1
2 2
3 1
4 1
4 2
I want to randomize column NO and replace it with random values. I have 5 million records, and a script like the one below gives me 5 million unique NO values; but as you can see, NO is not unique, and I want the same random value assigned to every row with the same NO.
UPDATE table1
SET NO= abs(checksum(NewId())) % 100000000
I want the resulting dataset to look like this:
NO LINE
------ ----
99 1
99 2
99 3
99 4
1092 1
1092 2
3456 1
41098 1
41098 2
I would recommend rand() with a seed:
UPDATE table1
SET NO = FLOOR(rand(NO) * 100000000);
This runs a slight risk of collisions, so two different NO values could end up with the same new value.
If the numbers do not need to be "random" you can give them consecutive values in an arbitrary order and avoid collisions:
with toupdate as (
select t1.*,
dense_rank() over (order by rand(NO), NO) as new_no
from table1 t1
)
update toupdate
set no = new_no;

Is there a way to update groups of rows with separate incrementing values in one query

Let's say you have the following table:
Id Index
1 3
1 1
2 1
3 3
1 5
what I would like to have is the following:
Id Index
1 0
1 1
2 0
3 0
1 2
As you might notice, the goal is to number the rows within each Id incrementally in the Index column, starting from zero.
Now, I know this is fairly simple with cursors, but out of curiosity: is there a way to do this with a single UPDATE query, perhaps combined with temp tables, common table expressions, or something similar?
Yes, assuming that you don't really care about the order of the new index values. SQL Server offers updatable CTEs and window functions that do exactly what you want:
with toupdate as (
select t.*, row_number() over (partition by id order by (select NULL)) - 1 as newindex
from yourTable t
)
update toupdate
set [index] = newindex;
If you want them in a specific order, then you need another column to specify the ordering. The existing index column doesn't work.
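To illustrate that point with the question's sample rows, here is a sketch that assumes a hypothetical `seq` column recording the order in which the rows should be numbered (the question's table has none); `IndexDemo` is likewise a made-up name. Ordering by `seq` and subtracting 1 reproduces the desired output.

```sql
-- Hypothetical table: seq is an assumed column holding the original row order.
CREATE TABLE IndexDemo (seq INT PRIMARY KEY, Id INT, [Index] INT);

INSERT INTO IndexDemo (seq, Id, [Index]) VALUES
    (1, 1, 3), (2, 1, 1), (3, 2, 1), (4, 3, 3), (5, 1, 5);

-- ROW_NUMBER() starts at 1, so subtract 1 to start each Id's sequence at 0
SELECT Id,
       [Index] AS OldIndex,
       ROW_NUMBER() OVER (PARTITION BY Id ORDER BY seq) - 1 AS NewIndex
FROM IndexDemo
ORDER BY seq;
```

NewIndex comes out as 0, 1, 0, 0, 2 in seq order, which is the table the question asks for. On SQL Server the same expression can go into the updatable CTE shown above.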
With ROW_NUMBER() - 1 and a CTE, you can write it as:
CREATE TABLE #temp1(
Id int,
[Index] int)
INSERT INTO #temp1 VALUES (1,3),(1,1),(2,1),(3,3),(1,5);
--select * from #temp1;
With CTE as
(
select t.*, row_number() over (partition by id order by (select null))-1 as newindex
from #temp1 t
)
Update CTE
set [Index] = newindex;
select * from #temp1;
I'm not sure why you would want to do this really, but I had fun figuring it out!
This solution relies on your table having a primary key for the self-join, but if none exists and this is a one-off job, you could always add an auto-increment (IDENTITY) column. That also has the benefit of making you think about the precise ordering you want, as currently there is no way of saying in which order each [ID] will get its [Index] values.
UPDATE dbo.Example
SET [Index] = b.newIndex
FROM dbo.Example a
INNER JOIN (
select
z.ID,
z.[Index],
(row_number() over (partition by ID order by (select NULL)) - 1) as newIndex
from Example z
) b ON a.ID = b.ID AND a.[Index]=b.[Index] --Is this a unique self join for your table?.. no PK provided. You might need to make an index first.
Probably, this is what you want
SELECT *,RANK() OVER(PARTITION BY Id ORDER BY [Index])-1 AS NewIndex FROM
(
SELECT 1 AS Id,3 [Index]
UNION
SELECT 1,1
UNION
SELECT 2,1
UNION
SELECT 3,3
UNION
SELECT 1,5
) AS T
and the result will be:
Id Index NewIndex
1 1 0
1 3 1
1 5 2
2 1 0
3 3 0
Now if you want to update the table then execute this script
WITH ranked AS (SELECT [Index], RANK() OVER(PARTITION BY Id ORDER BY [Index])-1 AS NewIndex FROM tblname)
UPDATE ranked SET [Index] = NewIndex
In case I am missing something or any further assistance is required please let me know.
CREATE TABLE #temp1(
Id int,
Value int)
INSERT INTO #temp1 VALUES (1,2),(1,3),(2,3),(4,5)
SELECT
Id
,Value
,ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Id) [Count]
FROM #temp1
Start with this :)
Gave me results like
Id Value Count
1 2 1
1 3 2
1 2 3
1 3 4
1 2 5
1 3 6
1 2 7
1 3 8
2 3 1
2 4 2
2 5 3
2 3 4
2 4 5
2 5 6
2 4 7
2 5 8
2 3 9
2 3 10
3 4 1
4 5 1
4 5 2
4 5 3
4 5 4

Sum a column and get the first row in Transact Sql

I have a table MOUVEMENTS which has 3 columns:
ID IDREF NUMBER
1 1 5
2 1 3
3 1 4
4 1 2
5 2 1
I'd like to fetch the rows of this table with these constraints:
IDREF = 1
Ordered by ID ASC
Only the first rows whose cumulative SUM of NUMBER (per IDREF) adds up to X
I imagine we would first calculate the running SUM, and then filter on that column:
ID IDREF NUMBER SUM
1 1 5 5
2 1 3 8
3 1 4 12
4 1 2 14
5 2 1 1
In this case, if we want a total of 11, we take the first two rows plus the third, and change the third row's NUMBER so that the total comes out to exactly 11.
So the expected result is:
ID IDREF NUMBER SUM
1 1 5 5
2 1 3 8
3 1 3 11
Please note the change in the third line on the NUMBER and SUM column.
Do you know how to achieve that?
This query should work on SQL Server 2000 through 2008 R2.
I've created a solution here which uses a view: http://www.sqlfiddle.com/#!3/ebb01/15
The view contains a running total column for each IDRef:
CREATE VIEW MouvementsRunningTotals
AS
SELECT
A.ID,
A.IDRef,
MAX(A.Number) Number,
SUM (B.Number) RunningTotal
FROM
Mouvements A
LEFT JOIN Mouvements B ON A.ID >= B.ID AND A.IDRef = B.IDRef
GROUP BY
A.ID,
A.IDRef
If you can't create a view, you could create this as a temporary table in T-SQL instead.
Then the query is a self join on that view, in order to determine which is the last row to be include based on the Number you pass in. Then a CASE statement ensures the correct value for the last row:
DECLARE @total int
DECLARE @idRef int
SELECT @total = 4
SELECT @idRef = 1
SELECT
A.ID,
A.IDRef,
CASE
WHEN A.RunningTotal <= @total THEN A.Number
-- B is NULL for the first row, so treat its running total as 0
ELSE @total - ISNULL(B.RunningTotal, 0)
END Number
FROM
MouvementsRunningTotals A
LEFT JOIN MouvementsRunningTotals B ON
A.IDRef = B.IDRef
AND A.RunningTotal - A.Number = B.RunningTotal
WHERE
A.IDRef = @idRef
AND (A.RunningTotal <= @total
OR (A.RunningTotal > @total AND ISNULL(B.RunningTotal, 0) < @total))
ORDER BY A.ID
You can add more data in the Build Schema box and change the value assigned to @total in the Query box to test it.
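On SQL Server 2012 or later, the running total that the view builds with a self-join can also be produced with a windowed SUM, which avoids joining the table to itself. This is a sketch rather than the answerer's method: it uses the question's data and hardcodes the question's target of 11 (in T-SQL you would typically pass it in a variable such as @total).

```sql
CREATE TABLE Mouvements (ID INT PRIMARY KEY, IDREF INT, NUMBER INT);

INSERT INTO Mouvements (ID, IDREF, NUMBER) VALUES
    (1, 1, 5), (2, 1, 3), (3, 1, 4), (4, 1, 2), (5, 2, 1);

WITH running AS (
    -- running total of NUMBER per IDREF, in ID order
    SELECT ID, IDREF, NUMBER,
           SUM(NUMBER) OVER (PARTITION BY IDREF ORDER BY ID
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal
    FROM Mouvements
    WHERE IDREF = 1
)
SELECT ID, IDREF,
       -- the row that crosses the target keeps only what is still needed
       CASE WHEN RunningTotal <= 11 THEN NUMBER
            ELSE 11 - (RunningTotal - NUMBER) END AS NUMBER,
       CASE WHEN RunningTotal <= 11 THEN RunningTotal ELSE 11 END AS [SUM]
FROM running
-- keep a row only if the total before it is still below the target
WHERE RunningTotal - NUMBER < 11
ORDER BY ID;
```

With the sample rows this returns IDs 1, 2 and 3 with NUMBER 5, 3, 3 and SUM 5, 8, 11, the result the question expects.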
select id, (select top 1 number from mouvements) as number, idref
from mouvements where idref=1 order by id asc

How to track how many times a column changed its value?

I have a table called crewWork as follows:
CREATE TABLE crewWork(
FloorNumber int, AptNumber int, WorkType int, simTime int )
After the table was populated, I need to know how many times a change in apt occurred and how many times a change in floor occurred. Usually I expect to find 10 rows on each apt and 40-50 on each floor.
I could just write a scalar function for that, but I was wondering if there's any way to do that in t-SQL without having to write scalar functions.
Thanks
The data will look like this:
FloorNumber AptNumber WorkType simTime
1 1 12 10
1 1 12 25
1 1 13 35
1 1 13 47
1 2 12 52
1 2 12 59
1 2 13 68
1 1 14 75
1 4 12 79
1 4 12 89
1 4 13 92
1 4 14 105
1 3 12 115
1 3 13 129
1 3 14 138
2 1 12 142
2 1 12 150
2 1 14 168
2 1 14 171
2 3 12 180
2 3 13 190
2 3 13 200
2 3 14 205
3 3 14 216
3 4 12 228
3 4 12 231
3 4 14 249
3 4 13 260
3 1 12 280
3 1 13 295
2 1 14 315
2 2 12 328
2 2 14 346
I need the information for a report, I don't need to store it anywhere.
If you use the accepted answer as written now (1/6/2023), you get correct results with the OP dataset, but I think you can get wrong results with other data.
CONFIRMED: ACCEPTED ANSWER HAS A MISTAKE (as of 1/6/2023)
I explain the potential for wrong results in my comments on the accepted answer.
In this db<>fiddle, I demonstrate the wrong results. I use a slightly modified form of accepted answer (my syntax works in SQL Server and PostgreSQL). I use a slightly modified form of the OP's data (I change two rows). I demonstrate how the accepted answer can be changed slightly, to produce correct results.
The accepted answer is clever but needs a small change to produce correct results, as demonstrated in the db<>fiddle above and described here:
Instead of this, as seen in the accepted answer: COUNT(DISTINCT AptGroup)...
you should do this: COUNT(DISTINCT CONCAT(AptGroup, '_', AptNumber))...
DDL:
SELECT * INTO crewWork FROM (VALUES
-- data from question, with a couple changes to demonstrate problems with the accepted answer
-- https://stackoverflow.com/q/8666295/1175496
--FloorNumber AptNumber WorkType simTime
(1, 1, 12, 10 ),
-- (1, 1, 12, 25 ), -- original
(2, 1, 12, 25 ), -- new, changing FloorNumber 1->2->1
(1, 1, 13, 35 ),
(1, 1, 13, 47 ),
(1, 2, 12, 52 ),
(1, 2, 12, 59 ),
(1, 2, 13, 68 ),
(1, 1, 14, 75 ),
(1, 4, 12, 79 ),
-- (1, 4, 12, 89 ), -- original
(1, 1, 12, 89 ), -- new, changing AptNumber 4->1->4
(1, 4, 13, 92 ),
(1, 4, 14, 105 ),
(1, 3, 12, 115 ),
...
DML:
;
WITH groupedWithConcats as (SELECT
*,
CONCAT(AptGroup,'_', AptNumber) as AptCombo,
CONCAT(FloorGroup,'_',FloorNumber) as FloorCombo
-- SQL Server doesn't have the TEMPORARY keyword; Postgres doesn't understand # for temp tables
-- INTO TEMPORARY groupedWithConcats
FROM
(
SELECT
-- the columns shown in Andriy's answer:
-- https://stackoverflow.com/a/8667477/1175496
ROW_NUMBER() OVER ( ORDER BY simTime) as RN,
-- AptNumber
AptNumber,
ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime) as RN_Apt,
ROW_NUMBER() OVER ( ORDER BY simTime)
- ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime) as AptGroup,
-- FloorNumber
FloorNumber,
ROW_NUMBER() OVER (PARTITION BY FloorNumber ORDER BY simTime) as RN_Floor,
ROW_NUMBER() OVER ( ORDER BY simTime)
- ROW_NUMBER() OVER (PARTITION BY FloorNumber ORDER BY simTime) as FloorGroup
FROM crewWork
) grouped
)
-- if you want to see how the groupings work:
-- SELECT * FROM groupedWithConcats
-- otherwise just run this query to see the counts of "changes":
SELECT
COUNT(DISTINCT AptCombo)-1 as CountAptChangesWithConcat_Correct,
COUNT(DISTINCT AptGroup)-1 as CountAptChangesWithoutConcat_Wrong,
COUNT(DISTINCT FloorCombo)-1 as CountFloorChangesWithConcat_Correct,
COUNT(DISTINCT FloorGroup)-1 as CountFloorChangesWithoutConcat_Wrong
FROM groupedWithConcats;
ALTERNATIVE ANSWER
The accepted answer may eventually get updated to remove the mistake. If that happens I can remove my warning, but I still want to leave you with this alternative way to produce the answer.
My approach goes like this: "check the previous row; if the value is different in the previous row vs. the current row, then there is a change." SQL doesn't have the idea of row order per se (at least not the way Excel does).
Instead, SQL has window functions. You can use the window function ROW_NUMBER plus a self-JOIN technique as seen here to combine current-row values and previous-row values so you can compare them. Here is a db<>fiddle showing my approach, which I pasted below.
The intermediate table, showing the columns which have a value of 1 if there is a change and 0 otherwise (i.e. FloorChange, AptChange), is shown at the bottom of the post...
DDL:
...same as above...
DML:
;
WITH rowNumbered AS (
SELECT
*,
ROW_NUMBER() OVER ( ORDER BY simTime) as RN
FROM crewWork
)
,joinedOnItself AS (
SELECT
rowNumbered.*,
rowNumberedRowShift.FloorNumber as FloorShift,
rowNumberedRowShift.AptNumber as AptShift,
CASE WHEN rowNumbered.FloorNumber <> rowNumberedRowShift.FloorNumber THEN 1 ELSE 0 END as FloorChange,
CASE WHEN rowNumbered.AptNumber <> rowNumberedRowShift.AptNumber THEN 1 ELSE 0 END as AptChange
FROM rowNumbered
LEFT OUTER JOIN rowNumbered as rowNumberedRowShift
ON rowNumbered.RN = (rowNumberedRowShift.RN+1)
)
-- if you want to see:
-- SELECT * FROM joinedOnItself;
SELECT
SUM(FloorChange) as FloorChanges,
SUM(AptChange) as AptChanges
FROM joinedOnItself;
Below see the first few rows of the intermediate table (joinedOnItself). This shows how my approach works. Note the last two columns, which have a value of 1 when there is a change in FloorNumber compared to FloorShift (noted in FloorChange), or a change in AptNumber compared to AptShift (noted in AptChange).
floornumber aptnumber worktype simtime rn floorshift aptshift floorchange aptchange
----------- --------- -------- ------- -- ---------- -------- ----------- ---------
1           1         12       10      1  NULL       NULL     0           0
2           1         12       25      2  1          1        1           0
1           1         13       35      3  2          1        1           0
1           1         13       47      4  1          1        0           0
1           2         12       52      5  1          1        0           1
1           2         12       59      6  1          2        0           0
1           2         13       68      7  1          2        0           0
Note that instead of using the window function ROW_NUMBER and a JOIN, you could use the window function LAG to compare values in the current row to the previous row directly (no need to JOIN). I don't have that solution here, but it is described in the Wikipedia article example:
Window functions allow access to data in the records right before and after the current record.
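A sketch of that LAG variant (LAG is available from SQL Server 2012). It loads the question's data and, like the other answers, assumes simTime defines the order of events. On the first row LAG returns NULL, so the `<>` comparison is not true and that row counts as no change.

```sql
CREATE TABLE crewWork (FloorNumber INT, AptNumber INT, WorkType INT, simTime INT);

-- the 33 rows from the question
INSERT INTO crewWork (FloorNumber, AptNumber, WorkType, simTime) VALUES
    (1, 1, 12, 10), (1, 1, 12, 25), (1, 1, 13, 35), (1, 1, 13, 47),
    (1, 2, 12, 52), (1, 2, 12, 59), (1, 2, 13, 68), (1, 1, 14, 75),
    (1, 4, 12, 79), (1, 4, 12, 89), (1, 4, 13, 92), (1, 4, 14, 105),
    (1, 3, 12, 115), (1, 3, 13, 129), (1, 3, 14, 138), (2, 1, 12, 142),
    (2, 1, 12, 150), (2, 1, 14, 168), (2, 1, 14, 171), (2, 3, 12, 180),
    (2, 3, 13, 190), (2, 3, 13, 200), (2, 3, 14, 205), (3, 3, 14, 216),
    (3, 4, 12, 228), (3, 4, 12, 231), (3, 4, 14, 249), (3, 4, 13, 260),
    (3, 1, 12, 280), (3, 1, 13, 295), (2, 1, 14, 315), (2, 2, 12, 328),
    (2, 2, 14, 346);

WITH withPrevious AS (
    -- previous row's floor and apartment, in simTime order
    SELECT FloorNumber, AptNumber,
           LAG(FloorNumber) OVER (ORDER BY simTime) AS PrevFloor,
           LAG(AptNumber)   OVER (ORDER BY simTime) AS PrevApt
    FROM crewWork
)
SELECT
    SUM(CASE WHEN FloorNumber <> PrevFloor THEN 1 ELSE 0 END) AS FloorChanges,
    SUM(CASE WHEN AptNumber <> PrevApt THEN 1 ELSE 0 END) AS AptChanges
FROM withPrevious;
```

With the question's 33 rows this should return FloorChanges = 3 and AptChanges = 9.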
If I am not missing anything, you could use the following method to find the number of changes:
determine groups of sequential rows with identical values;
count those groups;
subtract 1.
Apply the method individually for AptNumber and for FloorNumber.
The groups could be determined like in this answer, only there isn't a Seq column in your case. Instead, another ROW_NUMBER() expression could be used. Here's an approximate solution:
;
WITH marked AS (
SELECT
FloorGroup = ROW_NUMBER() OVER ( ORDER BY simTime)
- ROW_NUMBER() OVER (PARTITION BY FloorNumber ORDER BY simTime),
AptGroup = ROW_NUMBER() OVER ( ORDER BY simTime)
- ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime)
FROM crewWork
)
SELECT
FloorChanges = COUNT(DISTINCT FloorGroup) - 1,
AptChanges = COUNT(DISTINCT AptGroup) - 1
FROM marked
(I'm assuming here that the simTime column defines the timeline of changes.)
UPDATE
Below is a table that shows how the distinct groups are obtained for AptNumber.
AptNumber RN RN_Apt AptGroup (= RN - RN_Apt)
--------- -- ------ ---------
1 1 1 0
1 2 2 0
1 3 3 0
1 4 4 0
2 5 1 4
2 6 2 4
2 7 3 4
1 8 5 => 3
4 9 1 8
4 10 2 8
4 11 3 8
4 12 4 8
3 13 1 12
3 14 2 12
3 15 3 12
1 16 6 10
… … … …
Here RN is a pseudo-column that stands for ROW_NUMBER() OVER (ORDER BY simTime). You can see that this is just a sequence of rankings starting from 1.
Another pseudo-column, RN_Apt, contains values produced by the other ROW_NUMBER, namely ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime). It contains rankings within individual groups of identical AptNumber values. You can see that, for a newly encountered value, the sequence starts over, and for a recurring one, it continues where it stopped last time.
You can also see from the table that if we subtract RN_Apt from RN (it could be the other way round; that doesn't matter in this situation), we get a value that uniquely identifies every distinct group of same AptNumber values. You might as well call that value a group ID.
So, now that we've got these IDs, it only remains for us to count them (count distinct values, of course). That will be the number of groups, and the number of changes is one less (assuming the first group is not counted as a change).
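The fix proposed in the other answer (combining AptGroup with AptNumber before counting) can also be written without string concatenation, by counting distinct (AptNumber, AptGroup) pairs. Here is a minimal sketch with three made-up rows in a hypothetical table crewWorkDemo, where the apartment goes 1, then 2, then 1 (two real changes).

```sql
CREATE TABLE crewWorkDemo (FloorNumber INT, AptNumber INT, WorkType INT, simTime INT);

-- hypothetical data: apartment 1, then 2, then 1 again
INSERT INTO crewWorkDemo (FloorNumber, AptNumber, WorkType, simTime) VALUES
    (1, 1, 12, 10), (1, 2, 12, 20), (1, 1, 12, 30);

WITH marked AS (
    SELECT AptNumber,
           ROW_NUMBER() OVER (ORDER BY simTime)
         - ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime) AS AptGroup
    FROM crewWorkDemo
)
SELECT
    -- AptGroup alone: apartment 2's run and the second run of apartment 1 share group 1
    (SELECT COUNT(DISTINCT AptGroup) - 1 FROM marked) AS AptChangesGroupOnly,
    -- (AptNumber, AptGroup) pairs keep the three runs apart
    (SELECT COUNT(*) - 1
     FROM (SELECT DISTINCT AptNumber, AptGroup FROM marked) d) AS AptChangesByPair;
```

Counting AptGroup alone returns 1, because the groups are 0, 1, 1; counting pairs returns the correct 2.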
Add an extra column, changecount:
CREATE TABLE crewWork(
FloorNumber int, AptNumber int, WorkType int, simTime int, changecount int)
Increment the changecount value on each update.
If you want to know the count for each field, add a separate changecount column for each one.
Assuming that each record represents a different change, you can find changes per floor by:
select FloorNumber, count(*)
from crewWork
group by FloorNumber
And changes per apartment (assuming AptNumber uniquely identifies an apartment) by:
select AptNumber, count(*)
from crewWork
group by AptNumber
Or (assuming AptNumber and FloorNumber together uniquely identify an apartment) by:
select FloorNumber, AptNumber, count(*)
from crewWork
group by FloorNumber, AptNumber