Grouping sequence number in SQL - sql

I have a table like below.
DECLARE @Table TABLE (
[Text] varchar(100),
[Order] int,
[RequiredResult] int
);
INSERT INTO @Table
VALUES
('A',1,1),
('B',2,1),
('C',3,1),
('D',1,2),
('A',2,2),
('B',3,2),
('G',4,2),
('H',1,3),
('B',2,3);
I have used DENSE_RANK, but the results are not correct.
select [Text], [Order], RequiredResult
, DENSE_RANK() OVER (ORDER BY [text],[Order]) AS ComputedResult
from @Table;
Results:
Text  Order  RequiredResult  ComputedResult
A     1      1               1
A     2      2               2
B     2      1               3
B     2      3               3
B     3      2               4
C     3      1               5
D     1      2               6
G     4      2               7
H     1      3               8
Please help me to calculate the RequiredResult column.

It looks like the RequiredResult column is simply a running group number that increases each time the sequence in the Order column breaks, when you process the records in the order they were inserted.
This is a typical Data Island (gaps-and-islands) analysis task. In this case each island is a run of rows whose Order values are sequential, and a boundary occurs wherever the numbering resets back to 1.
1. Record the input sequence by adding an IDENTITY column to the table variable.
2. Calculate an island identifier. Because the rows within an island are in sequence on the Order column, subtracting Order from the IDENTITY column (in this case Id) gives a number that is constant within an island and increases from one island to the next.
3. Use DENSE_RANK() ordered by the island number.
Putting all that together:
DECLARE @Table TABLE (
[Id] int IDENTITY(1,1),
[Text] varchar(100),
[Order] int,
[RequiredResult] int
);
INSERT INTO @Table
VALUES
('A',1,1),
('B',2,1),
('C',3,1),
('D',1,2),
('A',2,2),
('B',3,2),
('G',4,2),
('H',1,3),
('B',2,3);
SELECT [Text],[Order]
, [Id]-[Order] as Island
, RequiredResult
, DENSE_RANK() OVER (ORDER BY [Id]-[Order]) AS CalculatedResult
FROM @Table
ORDER BY [Id];
Text  Order  Island  RequiredResult  CalculatedResult
A     1      0       1               1
B     2      0       1               1
C     3      0       1               1
D     1      3       2               2
A     2      3       2               2
B     3      3       2               2
G     4      3       2               2
H     1      7       3               3
B     2      7       3               3
The key here is that we need to record the input sequence so we can use it in the calculation. It doesn't matter what actual values the Id column has, only that they are consecutive and in insertion order. If that number sequence has gaps, you could use the ROW_NUMBER() function result to calculate the island number instead, but the specifics would depend on the initial query that provides the basic sequential dataset.
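As a rough illustration of the ROW_NUMBER() variant mentioned above, the sketch below runs the same island calculation against an in-memory SQLite database driven from Python. SQLite stands in for SQL Server only so that the example is runnable (window functions need SQLite 3.25 or newer). The table name t, the helper grouped_sequence and the gapped Id values are all made up for the illustration.

```python
import sqlite3

def grouped_sequence(rows):
    """rows: (Id, Text, Order) tuples; Id must be increasing but may have gaps."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE t (Id INTEGER, "Text" TEXT, "Order" INTEGER)')
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        WITH numbered AS (
            SELECT Id, "Text", "Order",
                   -- gap-free row position minus Order is constant within an island
                   ROW_NUMBER() OVER (ORDER BY Id) - "Order" AS Island
            FROM t
        )
        SELECT "Text", "Order",
               DENSE_RANK() OVER (ORDER BY Island) AS CalculatedResult
        FROM numbered
        ORDER BY Id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Hypothetical Ids with gaps (10, 20, 30, 45, ...) in place of a clean IDENTITY.
sample = [(10, "A", 1), (20, "B", 2), (30, "C", 3), (45, "D", 1), (50, "A", 2),
          (60, "B", 3), (70, "G", 4), (85, "H", 1), (90, "B", 2)]
for row in grouped_sequence(sample):
    print(row)
```

Subtracting Order from the raw Id would give a different value on almost every row here, which is why the gap-free ROW_NUMBER() is substituted for it.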

You seem to have an ordering in mind for the rows. SQL tables represent unordered (multi)sets. The only column in your data that has the appropriate ordering is text, but your real data might have another column with this information.
Basically, you just want a cumulative sum of the number of 1s up to each row. That would be:
select t.*,
sum(case when ord = 1 then 1 else 0 end) over (order by text)
from t
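A minimal sketch of the cumulative-sum idea, assuming the table has an Id column that records the intended row order (that column is hypothetical; it is not in the original table). It uses an in-memory SQLite database from Python purely so it can be run as-is; the helper name running_group is also made up.

```python
import sqlite3

def running_group(rows):
    """rows: (Id, Text, ord) tuples, where Id defines the row order."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE t (Id INTEGER, "Text" TEXT, ord INTEGER)')
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        SELECT "Text", ord,
               -- count how many rows with ord = 1 have been seen so far
               SUM(CASE WHEN ord = 1 THEN 1 ELSE 0 END) OVER (ORDER BY Id) AS grp
        FROM t
        ORDER BY Id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample = [(1, "A", 1), (2, "B", 2), (3, "C", 3), (4, "D", 1), (5, "A", 2),
          (6, "B", 3), (7, "G", 4), (8, "H", 1), (9, "B", 2)]
for row in running_group(sample):
    print(row)
```

Every row whose ord is 1 starts a new group, so the running count of those rows is the group number.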

Related

Finding adjacent column values from the last non-null value of a certain column in Snowflake (SQL) using partition by

Say I have the following table:
ID  T  R
1   2  NULL
1   3  Y
1   4  NULL
1   5  NULL
1   6  Y
1   7  NULL
I would like to add a column which equals the value from column T based on the last non-null value from column R. This means the following:
ID  T  R     GOAL
1   2  NULL  NULL
1   3  Y     NULL
1   4  Y     3
1   5  NULL  4
1   6  Y     4
1   7  NULL  6
I do have many ID's so I need to make use of the OVER (PARTITION BY ...) clause. Also, if possible, I would like to use a single statement, like
SELECT *
, GOAL
FROM TABLE
So without any extra select statement.
T is in ascending order so just null it out according to R and take the maximum looking backward.
select *,
max(case when R is not null then T end)
over (
partition by id
order by T
rows between unbounded preceding and 1 preceding
) as GOAL
from TBL
http://sqlfiddle.com/#!18/c927a5/5
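The windowed MAX above can be checked with a small sketch like the following, which runs the same expression against an in-memory SQLite database from Python (a stand-in used only so the example is runnable; window functions need SQLite 3.25 or newer). The rows for ID 1 follow the GOAL table in the question, the rows for ID 2 are invented to exercise the partition, and the helper name last_flagged_t is hypothetical.

```python
import sqlite3

def last_flagged_t(rows):
    """rows: (ID, T, R) tuples; R is None where the flag is missing."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TBL (ID INTEGER, T INTEGER, R TEXT)")
    conn.executemany("INSERT INTO TBL VALUES (?, ?, ?)", rows)
    query = """
        SELECT ID, T, R,
               -- keep T only on flagged rows, then look strictly backwards
               MAX(CASE WHEN R IS NOT NULL THEN T END) OVER (
                   PARTITION BY ID
                   ORDER BY T
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
               ) AS GOAL
        FROM TBL
        ORDER BY ID, T
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample = [(1, 2, None), (1, 3, "Y"), (1, 4, "Y"), (1, 5, None),
          (1, 6, "Y"), (1, 7, None), (2, 1, "Y"), (2, 2, None)]
for row in last_flagged_t(sample):
    print(row)
```

Because the frame ends at 1 PRECEDING, a row never sees its own flag, and the first row of each ID gets NULL. Taking MAX only works because T is ascending within each ID.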

Update new foreign key column of existing table with ids from another table in SQL Server

I have an existing table to which I have added a new column which is supposed to hold the Id of a record in another (new) table.
Simplified structure is sort of like this:
Customer table
[CustomerId] [GroupId] [LicenceId] <-- new column
Licence table <-- new table
[LicenceId] [GroupId]
The Licence table has a certain number of licences per group that can be assigned to customers in that same group. There are multiple groups, and each group has a variable number of customers and licences.
So say there are 100 licences available for group 1 and there are 50 customers in group 1, so each can get a licence. There are never more customers than there are licences.
Sample
Customer
[CustomerId] [GroupId] [LicenceId]
1 1 NULL
2 1 NULL
3 1 NULL
4 1 NULL
5 2 NULL
6 2 NULL
7 2 NULL
8 3 NULL
9 3 NULL
Licence
[LicenceId] [GroupId]
1 1
2 1
3 1
4 1
5 1
6 1
7 2
8 2
9 2
10 2
11 2
12 3
13 3
14 3
15 3
16 3
17 3
Desired outcome
Customer
[CustomerId] [GroupId] [LicenceId]
1 1 1
2 1 2
3 1 3
4 1 4
5 2 7
6 2 8
7 2 9
8 3 12
9 3 13
So now I have to do this one time update to give every customer a licence and I have no idea how to go about it.
I'm not allowed to use a cursor. I can't seem to do a MERGE UPDATE, because joining the Customer to the Licence table by GroupId will result in multiple hits.
How do I assign each customer the next available LicenceId within their group in one query?
Is this even possible?
You can use window functions:
with c as (
select c.*, row_number() over (partition by groupid order by newid()) as seqnum
from Customer c
),
l as (
select l.*, row_number() over (partition by groupid order by newid()) as seqnum
from Licence l
)
update c
set c.licenceid = l.licenceid
from c join
l
on c.seqnum = l.seqnum and c.groupid = l.groupid;
This assigns the licenses randomly. That is really just for fun. The most efficient method is to use:
row_number() over (partition by groupid order by (select null)) as seqnum
SQL Server often avoids an additional sort operation in this case.
But you might want to order them by something else -- for instance by the ordering of the customer ids, or by some date column, or something else.
Gordon has put it very well in his answer.
Let me break it down into simpler steps for you.
Step 1. Use the ROW_NUMBER() function to assign a SeqNum to the Customers. Use PARTITION BY GroupId so that the number starts from 1 in every group. I would ORDER BY CustomerId.
Step 2. Use the ROW_NUMBER() function to assign a SeqNum to the Licences. Use PARTITION BY GroupId so that the number starts from 1 in every group. ORDER BY LicenceId, because your ask is to "assign each customer the next available LicenceId within their group".
Now join these two queries on GroupId and SeqNum to update LicenceId in the Customer table.
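The two steps can be sketched end to end as follows, using an in-memory SQLite database from Python so the pairing can be run and inspected. This is an illustration rather than the SQL Server statement itself: on SQL Server the pairing would feed a single UPDATE ... FROM as in the first answer, whereas here the pairs are selected first and applied with a plain UPDATE to stay portable across SQLite versions. The helper name assign_licences is made up.

```python
import sqlite3

def assign_licences(customers, licences):
    """customers: (CustomerId, GroupId); licences: (LicenceId, GroupId)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customer (CustomerId INTEGER, GroupId INTEGER, LicenceId INTEGER)")
    conn.execute("CREATE TABLE Licence (LicenceId INTEGER, GroupId INTEGER)")
    conn.executemany("INSERT INTO Customer (CustomerId, GroupId) VALUES (?, ?)", customers)
    conn.executemany("INSERT INTO Licence VALUES (?, ?)", licences)
    # Steps 1 and 2: number both sides per group, then pair equal numbers.
    pairs = conn.execute("""
        WITH c AS (
            SELECT CustomerId, GroupId,
                   ROW_NUMBER() OVER (PARTITION BY GroupId ORDER BY CustomerId) AS SeqNum
            FROM Customer
        ),
        l AS (
            SELECT LicenceId, GroupId,
                   ROW_NUMBER() OVER (PARTITION BY GroupId ORDER BY LicenceId) AS SeqNum
            FROM Licence
        )
        SELECT l.LicenceId, c.CustomerId
        FROM c
        JOIN l ON l.GroupId = c.GroupId AND l.SeqNum = c.SeqNum
    """).fetchall()
    # Apply the pairing; each tuple is (LicenceId, CustomerId).
    conn.executemany("UPDATE Customer SET LicenceId = ? WHERE CustomerId = ?", pairs)
    result = conn.execute(
        "SELECT CustomerId, GroupId, LicenceId FROM Customer ORDER BY CustomerId"
    ).fetchall()
    conn.close()
    return result

customers = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2), (7, 2), (8, 3), (9, 3)]
licences = ([(i, 1) for i in range(1, 7)] + [(i, 2) for i in range(7, 12)]
            + [(i, 3) for i in range(12, 18)])
for row in assign_licences(customers, licences):
    print(row)
```

With the sample data from the question this reproduces the desired outcome: customers 1 to 4 get licences 1 to 4, customers 5 to 7 get 7 to 9, and customers 8 and 9 get 12 and 13.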

Calculate "position in run" in SQL

I have a table of consecutive ids (integers, 1 ... n), and values (integers), like this:
Input Table:
id value
-- -----
1 1
2 1
3 2
4 3
5 1
6 1
7 1
Going down the table i.e. in order of increasing id, I want to count how many times in a row the same value has been seen consecutively, i.e. the position in a run:
Output Table:
id value position in run
-- ----- ---------------
1 1 1
2 1 2
3 2 1
4 3 1
5 1 1
6 1 2
7 1 3
Any ideas? I've searched for a combination of windowing functions including lead and lag, but can't come up with it. Note that the same value can appear in the value column as part of different runs, so partitioning by value may not help solve this. I'm on Hive 1.2.
One way is to use a difference of row numbers approach to classify consecutive same values into one group. Then a row number function to get the desired positions in each group.
Query to assign groups (Running this will help you understand how the groups are assigned.)
select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
Final Query using row_number to get positions in each group assigned with the above query.
select id,value,row_number() over(partition by value,rnum_diff order by id) as pos_in_grp
from (select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
) t
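To see the intermediate values, here is a small runnable trace of the same query against an in-memory SQLite database from Python. SQLite is only a stand-in for Hive here (the window-function syntax used is the same, but that portability is an assumption), and the helper name position_in_run is made up.

```python
import sqlite3

def position_in_run(rows):
    """rows: (id, value) tuples with consecutive ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER, value INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    query = """
        SELECT id, value, rnum_diff,
               ROW_NUMBER() OVER (PARTITION BY value, rnum_diff ORDER BY id) AS pos_in_grp
        FROM (
            SELECT t.*,
                   -- overall position minus position among rows with the same value
                   ROW_NUMBER() OVER (ORDER BY id)
                   - ROW_NUMBER() OVER (PARTITION BY value ORDER BY id) AS rnum_diff
            FROM tbl t
        ) AS t
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample = [(1, 1), (2, 1), (3, 2), (4, 3), (5, 1), (6, 1), (7, 1)]
for row in position_in_run(sample):
    print(row)  # (id, value, rnum_diff, pos_in_grp)
```

In this data ids 3, 5, 6 and 7 all end up with rnum_diff = 2 even though id 3 belongs to a different run, which is why the final ROW_NUMBER() partitions by both value and rnum_diff rather than by rnum_diff alone.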

Sum a column and get the first row in Transact Sql

I have a table MOUVEMENTS which has 3 columns :
ID IDREF NUMBER
1 1 5
2 1 3
3 1 4
4 1 2
5 2 1
I'd like to fetch the rows of this table with these constraints:
IDREF = 1
Ordered by ID ASC
and only as many of the first rows as are needed for NUMBER to sum to X (by IDREF)
I imagine that we would first calculate the running SUM, and then restrict on that column:
ID IDREF NUMBER SUM
1 1 5 5
2 1 3 8
3 1 4 12
4 1 2 14
5 2 1 1
In this case, if we want a total of 11, we take the first two rows plus the third, and we reduce the third row's NUMBER so that the total comes to exactly 11.
So the expected result is:
ID IDREF NUMBER SUM
1 1 5 5
2 1 3 8
3 1 3 11
Please note the change in the third line on the NUMBER and SUM column.
Do you know how to achieve that?
This query should work on SQL Server 2000 through 2008 R2.
I've created a solution here which uses a view: http://www.sqlfiddle.com/#!3/ebb01/15
The view contains a running total column for each IDRef:
CREATE VIEW MouvementsRunningTotals
AS
SELECT
A.ID,
A.IDRef,
MAX(A.Number) Number,
SUM (B.Number) RunningTotal
FROM
Mouvements A
LEFT JOIN Mouvements B ON A.ID >= B.ID AND A.IDRef = B.IDRef
GROUP BY
A.ID,
A.IDRef
If you can't create a view then you could create this as a temporary table in T-SQL.
Then the query is a self join on that view, in order to determine which is the last row to be included based on the Number you pass in. Then a CASE expression ensures the correct value for the last row:
DECLARE @total int
DECLARE @idRef int
SELECT @total = 4
SELECT @idRef = 1
SELECT
A.ID,
A.IDRef,
CASE
WHEN A.RunningTotal <= @total THEN A.Number
ELSE @total - B.RunningTotal
END Number
FROM
MouvementsRunningTotals A
LEFT JOIN MouvementsRunningTotals B ON
A.IDRef = B.IDRef
AND A.RunningTotal - A.Number = B.RunningTotal
WHERE
A.IDRef = @idRef
AND (A.RunningTotal <= @total
OR (A.RunningTotal > @total AND B.RunningTotal < @total))
You can add more data in the Build Schema box and change the Number in the @total parameter in the Query box to test it.
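A simplified variation on the running-total idea, sketched against an in-memory SQLite database from Python so it can be run directly. It is not the view-based query above: instead of a second self join it derives the previous running total as RunningTotal - Number, and it assumes every Number is positive. The helper name first_rows_up_to and the output aliases CappedNumber and CappedTotal are made up for the illustration.

```python
import sqlite3

def first_rows_up_to(rows, idref, total):
    """rows: (ID, IDRef, Number) tuples; returns the leading rows summing to total."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Mouvements (ID INTEGER, IDRef INTEGER, Number INTEGER)")
    conn.executemany("INSERT INTO Mouvements VALUES (?, ?, ?)", rows)
    query = """
        WITH rt AS (
            -- running total per IDRef via a self join (no window functions needed)
            SELECT A.ID, A.IDRef, A.Number, SUM(B.Number) AS RunningTotal
            FROM Mouvements A
            JOIN Mouvements B ON B.IDRef = A.IDRef AND B.ID <= A.ID
            GROUP BY A.ID, A.IDRef, A.Number
        )
        SELECT ID, IDRef,
               CASE WHEN RunningTotal <= :total THEN Number
                    ELSE :total - (RunningTotal - Number) END AS CappedNumber,
               CASE WHEN RunningTotal <= :total THEN RunningTotal
                    ELSE :total END AS CappedTotal
        FROM rt
        WHERE IDRef = :idref
          AND RunningTotal - Number < :total  -- total not yet reached before this row
        ORDER BY ID
    """
    result = conn.execute(query, {"idref": idref, "total": total}).fetchall()
    conn.close()
    return result

sample = [(1, 1, 5), (2, 1, 3), (3, 1, 4), (4, 1, 2), (5, 2, 1)]
for row in first_rows_up_to(sample, idref=1, total=11):
    print(row)
```

For a total of 11 this returns the three rows from the question's expected result, with the third row's number reduced from 4 to 3.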
select id, (select top 1 number from mouvements) as number, idref
from mouvements where idref=1 order by id asc

SQL MAX(column) With Additional Criteria

I have a single table, where I want to return a list of the MAX(id) GROUPed by another identifier. However I have a third column that, when it meets a certain criteria, "trumps" rows that don't meet that criteria.
Probably easier to explain with an example. Sample table has:
UniqueId (int)
GroupId (int)
IsPriority (bit)
Raw data:
UniqueId GroupId IsPriority
-----------------------------------
1 1 F
2 1 F
3 1 F
4 1 F
5 1 F
6 2 T
7 2 T
8 2 F
9 2 F
10 2 F
So, because no row in groupId 1 has IsPriority set, we return the highest UniqueId (5). Since groupId 2 has rows with IsPriority set, we return the highest UniqueId with that value (7).
So output would be:
5
7
I can think of ways to brute force this, but I am looking to see if I can do this in a single query.
SQL Fiddle Demo
WITH T
AS (SELECT *,
ROW_NUMBER() OVER (PARTITION BY GroupId
ORDER BY IsPriority DESC, UniqueId DESC ) AS RN
FROM YourTable)
SELECT UniqueId,
GroupId,
IsPriority
FROM T
WHERE RN = 1
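As a quick check of the query above, the sketch below loads the sample rows into an in-memory SQLite database from Python and runs the same ROW_NUMBER() ranking. SQLite is used only to make the example runnable; IsPriority is stored as 0/1 in place of the bit column, and the helper name top_per_group is made up.

```python
import sqlite3

def top_per_group(rows):
    """rows: (UniqueId, GroupId, IsPriority) tuples with IsPriority as 0 or 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (UniqueId INTEGER, GroupId INTEGER, IsPriority INTEGER)")
    conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
    query = """
        WITH T AS (
            SELECT *,
                   -- priority rows sort first, then the highest UniqueId
                   ROW_NUMBER() OVER (PARTITION BY GroupId
                                      ORDER BY IsPriority DESC, UniqueId DESC) AS RN
            FROM YourTable
        )
        SELECT UniqueId, GroupId, IsPriority
        FROM T
        WHERE RN = 1
        ORDER BY GroupId
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample = ([(i, 1, 0) for i in range(1, 6)]
          + [(6, 2, 1), (7, 2, 1), (8, 2, 0), (9, 2, 0), (10, 2, 0)])
for row in top_per_group(sample):
    print(row)
```

Group 1 has no priority rows, so its highest UniqueId (5) is returned; group 2 returns 7, the highest UniqueId among its priority rows.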