How to track iterations of a value in SQL

I am trying to track usage of a blade in a manufacturing process using SSMS 2017. The blade is loaded and used on product until it is seen to dull and then taken out for sharpening while another blade replaces it. We have 30 blades that are cycled for use and sharpening.
Using a table that provides product lot number (sequential) and blade name I would like to separate each batch use of the blade name into groups.
My sql skills are pretty basic so I've been trying row_number, rank, and some attempts at utilizing the lead/lag functions. So far this has only enabled me to break down each product into order based on blade name and identify the product on which a blade change is made. I feel like that could be useful but I'm having trouble figuring out exactly how to do it.
I would like to assign an identifying number to each group of product manufactured with one iteration of a blade. For example:
LotNo BladeID Iteration
418211 BH40 1
418212 BH40 1
418213 BH40 1
418214 ES11 2
418215 ES11 2
418216 BH40 3
I'm currently able to produce these incorrect results:
Using:
SELECT b.LotNo,
b.BladeID,
ROW_NUMBER() OVER (PARTITION BY b.BladeID ORDER BY b.BladeID)
FROM blades AS b
ORDER BY b.LotNo ASC;
I get:
LotNo BladeID Iteration
418211 BH40 1
418212 BH40 2
418213 BH40 3
418214 ES11 1
418215 ES11 2
418216 BH40 4

Here's a possible solution to your problem. It first creates a series of groups to identify when the number must change. Then it gets an order to assign the correct value to each group. And finally, it assigns the value for the iteration. I'm including the sample data in a consumable way so anyone can use it for testing purposes.
CREATE TABLE #Sample(
LotNo int,
BladeID varchar(10),
Iteration int
);
INSERT INTO #Sample
VALUES
(418211, 'BH40', 1),
(418212, 'BH40', 1),
(418213, 'BH40', 1),
(418214, 'ES11', 2),
(418215, 'ES11', 2),
(418216, 'BH40', 3);
GO
WITH cteGroups AS(
SELECT *,
ROW_NUMBER() OVER(ORDER BY LotNo) - ROW_NUMBER() OVER(PARTITION BY BladeID ORDER BY LotNo) AS island
FROM #Sample
),
cteOrdering AS(
SELECT *, MIN( LotNo) OVER( PARTITION BY island, BladeID) AS OrderCol
FROM cteGroups
)
SELECT LotNo,
BladeID,
Iteration,
DENSE_RANK() OVER( ORDER BY OrderCol) AS IterationCalc
FROM cteOrdering;
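To see why the difference of the two row numbers identifies each run, it can help to look at the intermediate columns. The following is only a sketch: the inline VALUES table stands in for #Sample, and the column names rn_all and rn_blade are made up for illustration.

```sql
-- Sketch: expose the two row numbers behind the "island" value.
-- The inline VALUES table replaces #Sample; rn_all and rn_blade are
-- illustrative names that do not appear in the original query.
SELECT LotNo,
       BladeID,
       ROW_NUMBER() OVER (ORDER BY LotNo) AS rn_all,
       ROW_NUMBER() OVER (PARTITION BY BladeID ORDER BY LotNo) AS rn_blade,
       ROW_NUMBER() OVER (ORDER BY LotNo)
         - ROW_NUMBER() OVER (PARTITION BY BladeID ORDER BY LotNo) AS island
FROM (VALUES (418211, 'BH40'),
             (418212, 'BH40'),
             (418213, 'BH40'),
             (418214, 'ES11'),
             (418215, 'ES11'),
             (418216, 'BH40')) AS s(LotNo, BladeID)
ORDER BY LotNo;
```

With this sample, island is 0 for the first three BH40 lots, 3 for the two ES11 lots, and 2 for the last BH40 lot. The island values are constant within a run but not in chronological order, which is why the query above orders the groups by MIN(LotNo) before applying DENSE_RANK(). The same island number can also occur for different blades, so BladeID has to stay part of the grouping.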

You can do this with lag() and a cumulative sum:
select s.*,
sum(case when prev_BladeID = BladeId then 0 else 1 end) over (order by LotNo) as Iteration
from (select s.*,
lag(s.BladeID) over (order by s.LotNo) as prev_BladeID
from #sample s
) s;
In addition to being simpler code than the difference of row numbers, I think this is also simpler conceptually. This is simply counting the number of times that the BladeID changes from one lot to the next.
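As a rough illustration of that counting, the sketch below exposes the change flag next to the running total. The inline VALUES table stands in for #Sample, and the is_change column is added here only for illustration.

```sql
-- Sketch: show the per-row change flag and its running sum.
-- is_change is an illustrative column, not part of the original answer.
SELECT LotNo,
       BladeID,
       prev_BladeID,
       -- On the first row prev_BladeID is NULL, the comparison is not true,
       -- so the ELSE branch counts it as a change.
       CASE WHEN prev_BladeID = BladeID THEN 0 ELSE 1 END AS is_change,
       SUM(CASE WHEN prev_BladeID = BladeID THEN 0 ELSE 1 END)
           OVER (ORDER BY LotNo) AS Iteration
FROM (SELECT v.LotNo,
             v.BladeID,
             LAG(v.BladeID) OVER (ORDER BY v.LotNo) AS prev_BladeID
      FROM (VALUES (418211, 'BH40'),
                   (418212, 'BH40'),
                   (418213, 'BH40'),
                   (418214, 'ES11'),
                   (418215, 'ES11'),
                   (418216, 'BH40')) AS v(LotNo, BladeID)
     ) AS s
ORDER BY LotNo;
```

For this sample, is_change comes out as 1, 0, 0, 1, 0, 1 and the running sum gives Iteration values 1, 1, 1, 2, 2, 3, matching the desired output in the question.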

Related

ROW_NUMBER function does not start from 1

I would like to ask about strange behaviour in SQL Server when using the ROW_NUMBER() function. Typically it should start from 1 and order values by the column given in the ORDER BY clause, which works as expected in most scenarios, but I have a particular case where I use a basic SELECT statement:
SELECT
ROW_NUMBER() OVER (ORDER BY VIN) AS RN,
*
FROM dbo.RawData
and I get such result:
RN VIN
6301 JTEBR3FJ00K096082
6302 JTEBR3FJ00K096132
6303 JTEBR3FJ00K096146
6304 JTEBR3FJ00K096163
6305 JTEBR3FJ00K096180
6306 JTEBR3FJ00K096275
1801 5TDDZRFHX0S820530
1802 5TDDZRFHX0S824111
1803 5TDDZRFHX0S824500
1804 5TDDZRFHX0S825971
1805 5TDDZRFHX0S826456
and those are the first columns of the returned table. The ROW_NUMBER values appear in a seemingly random order: after the chain from 6301 to 6306, the chain from 1801 to 1940 starts, and so on.
The VIN column (the one I sort the data by) is defined as nvarchar(17).
Could you please help me solve this issue?
I would be grateful for any tips on what might be wrong.
You can use ORDER BY to order the rows in a desired way:
SELECT ROW_NUMBER() OVER (ORDER BY VIN) AS RN
,*
FROM dbo.RawData
ORDER BY RN;
Because the row number is calculated in the SELECT list, you can use its alias in the ORDER BY clause without needing a nested query.

Count half of rest of a partition by from position

I'm trying to achieve the following results:
now, the group comes from
SUM(CASE WHEN seqnum <= (0.5 * seqnum_rev) THEN i.[P&L] END) OVER(PARTITION BY i.bracket_label ORDER BY i.event_id) AS [P&L 50%],
I need that in each iteration it counts the total of rows from the end till position (seq_inv) and sum the amounts in P&L only for the half of it from that position.
for example, when
seq = 2
seq_inv will be = 13, half of it is 6 so I need to sum the following 6 positions from seq = 2.
when seq = 4 there are 11 positions till the end (seq_inv = 11), so half is 5, so I want to count 5 positions from seq = 4.
I hope this makes sense, I'm trying to come up with a rule that will be able to adapt to the case I have, since the partition by is what gives me the numbers that need to be summed.
I was also thinking if there was something to do with a partition by top 50% or something like that, but I guess that doesn't exist.
I have the advantage that I've helped him before and have a little extra context.
That context is that this is just the later stage of a very long chain of common table expressions. That means self-joins and/or correlated sub-queries are unfortunately expensive.
Preferably, this should be answerable using window functions, as the data set is already available in the appropriate ordering and partitioning.
My reading is this...
The SUM(5:9) (meaning the sum of rows 5 to row 9, inclusive) is equal to SUM(5:end) - SUM(10:end)
That leads me to this...
WITH
cumulative AS
(
SELECT
*,
SUM([P&L]) OVER (PARTITION BY bracket_label ORDER BY event_id DESC) AS cumulative_p_and_l
FROM
data
)
SELECT
*,
cumulative_p_and_l - LEAD(cumulative_p_and_l, seq_inv/2, 0) OVER (PARTITION BY bracket_label ORDER BY event_id) AS p_and_l_50_perc,
cumulative_p_and_l - LEAD(cumulative_p_and_l, seq_inv/4, 0) OVER (PARTITION BY bracket_label ORDER BY event_id) AS p_and_l_25_perc
FROM
cumulative
NOTE: Using spaces, &, or % in column names is horrendous, don't do it ;)
EDIT: Corrected the ORDER BY in the cumulative sum.
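The identity behind this can be checked on a tiny data set. The following is a minimal sketch under stated assumptions: the six rows, the column name pnl (standing in for [P&L]), the CTE names, and the single implicit partition are all invented for illustration, and seq_inv is recomputed here with ROW_NUMBER().

```sql
-- Sketch: "sum of the next n rows" as a difference of two suffix sums.
-- sample_data, numbered, pnl, suffix_sum and pnl_half are hypothetical names.
WITH sample_data AS (
    SELECT event_id, pnl
    FROM (VALUES (1, 10), (2, 20), (3, 30), (4, 40), (5, 50), (6, 60))
         AS v(event_id, pnl)
),
numbered AS (
    SELECT event_id,
           pnl,
           -- rows remaining from this row to the end, inclusive
           ROW_NUMBER() OVER (ORDER BY event_id DESC) AS seq_inv,
           -- sum from this row to the end
           SUM(pnl) OVER (ORDER BY event_id DESC) AS suffix_sum
    FROM sample_data
)
SELECT event_id,
       pnl,
       seq_inv,
       suffix_sum,
       -- suffix sum here minus the suffix sum seq_inv/2 rows further on
       suffix_sum - LEAD(suffix_sum, seq_inv / 2, 0) OVER (ORDER BY event_id) AS pnl_half
FROM numbered
ORDER BY event_id;
```

For event_id = 1 there are 6 rows to the end, so half is 3, and the result is 210 - 150 = 60, which is 10 + 20 + 30. Note that when seq_inv / 2 rounds down to 0 (the last row), LEAD returns the current row's own value and the result is 0.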
I don't think that window functions can do what you want. You could use a correlated subquery instead, with the following logic:
select
t.*,
(
select sum(t1.[P&L])
from mytable t1
where t1.seq - t.seq between 0 and t.seq_inv/2
) [P&L 50%]
from mytable t

Sql Islands and Gaps Merge Contiguous records if relevant fields hold same values

I have created a test case here for my problem https://rextester.com/ZRXSQ14415
It's much easier to show the problem than to explain what I am trying to achieve.
I have a list of records across time and I wish to merge contiguous records into a single record.
Each record has a period Date, Risk Levels and a couple of flags. When these risks and flags are the same the records should be merged when they are different then they should be a separate row.
In the Rextester example, I have almost achieved my goal; however, look at rows 3 + 4 of the result.
What I want to achieve is that rows 3 + 4 would be combined into a single row:
StartDate End Date Name ... ...
17.03.2019 20.03.2019 CPWJ40-A ... ...
As all flags and risk levels are the same.
Change the SEQ expression to
..
ROW_NUMBER() OVER (ORDER BY PeriodDate) - ROW_NUMBER() OVER (Partition BY ImplicitRisk,QCReadyRisk,IsQualityControlReady, ActivePeriod ORDER BY PeriodDate) AS SEQ
..
This way you'll get the proper grouping of islands of ImplicitRisk,QCReadyRisk,IsQualityControlReady, ActivePeriod.
This answer is purely to complement Serg's answer with the full query.
SELECT MIN(d.PeriodDate) AS StartDate,
MAX(d.PeriodDate) AS EndDate,
ImplicitRisk,
QcReadyRisk,
IsQualityControlReady,
ActivePeriod,
LocationEventName
FROM
(
SELECT c.*,
ROW_NUMBER() OVER (ORDER BY PeriodDate) - ROW_NUMBER() OVER (Partition BY LocationEventId, ImplicitRisk, QCReadyRisk, IsQualityControlReady, ActivePeriod ORDER BY PeriodDate) AS grp
FROM tab c
--order by PeriodDate
) d
group by ImplicitRisk, QcReadyRisk, IsQualityControlReady, ActivePeriod, LocationEventName, grp
order by 1

Calculate Rank Pattern without any order value

My Data is like this -
You can check 3 columns, jil_equipment_id,req_group,operand.
Based on these 3 columns I have to generate a new "Pattern" column.
The pattern column starts from 2 and increases by 1 for each repeated combination of jil_equipment_id,req_group,operand.
The final data will look like this.
Please suggest me any possible approach. I am not able to use the RANK()/DENSE_RANK() Function on this.
You can use row_number(). You want to use the partition by as well:
select t.*,
(1 + row_number() over (partition by jil_equipment_id, req_group, operand
order by content_id
)
) as pattern
from t;
select *,Row_Number() over(partition by jil_equipment_id,req_group,operand order by jil_equipment_id,req_group,operand) + 1 as pattern
from tab
you can use row_number() function for this.

Split the results of a query in half

I'm trying to export rows from one database to Excel and I'm limited to 65000 rows at a shot. That tells me I'm working with an Access database but I'm not sure since this is a 3rd party application (MRI Netsource) with limited query ability. I've tried the options posted at this solution (Is there a way to split the results of a select query into two equal halfs?) but neither of them work -- in fact, they end up duplicating results rather than cutting them in half.
One possibly related issue is that this table does not have a unique ID field. Each record's unique ID can be dynamically formed by the concatenation of several text fields.
This produces 91934 results:
SELECT * from note
This produces 122731 results:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY notedate) AS rn FROM note
) T1
WHERE rn % 2 = 1
EDIT: Likewise, this produces 91934 results, half of them with a tile_nr value of 1, the other half with a value of 2:
SELECT *, NTILE(2) OVER (ORDER BY notedate) AS tile_nr FROM note
However this produces 122778 results, all of which have a tile_nr value of 1:
SELECT bldgid, leasid, notedate, ref1, ref2, tile_nr
FROM (
SELECT *, NTILE(2) OVER (ORDER BY notedate) AS tile_nr FROM note) x
WHERE x.tile_nr = 1
I know that I could just use a COUNT to get the exact number of records, run one query using TOP 65000 ORDER BY notedate, and then another that says TOP 26934 ORDER BY notedate DESC, for example, but as this dataset changes a lot I'd prefer some way to automate this to save time.