SQL islands and gaps: merge contiguous records if relevant fields hold the same values - sql

I have created a test case here for my problem https://rextester.com/ZRXSQ14415
It's much easier to show the problem than to explain what I am trying to achieve.
I have a list of records across time and I wish to merge contiguous records into a single record.
Each record has a period date, risk levels and a couple of flags. When these risks and flags are the same, the records should be merged; when they differ, they should be separate rows.
In the Rextester example I have almost achieved my goal, but look at rows 3 and 4 of the result.
What I want is for rows 3 and 4 to be combined into a single row:
StartDate End Date Name ... ...
17.03.2019 20.03.2019 CPWJ40-A ... ...
As all flags and risk levels are the same.

Change the SEQ expression to
..
ROW_NUMBER() OVER (ORDER BY PeriodDate) - ROW_NUMBER() OVER (Partition BY ImplicitRisk,QCReadyRisk,IsQualityControlReady, ActivePeriod ORDER BY PeriodDate) AS SEQ
..
This way you'll get the proper grouping of islands of ImplicitRisk,QCReadyRisk,IsQualityControlReady, ActivePeriod.

This answer is purely to complement Serg's answer with the full query.
SELECT MIN(d.PeriodDate) AS StartDate,
       MAX(d.PeriodDate) AS EndDate,
       ImplicitRisk,
       QcReadyRisk,
       IsQualityControlReady,
       ActivePeriod,
       LocationEventName
FROM
(
    SELECT c.*,
           ROW_NUMBER() OVER (ORDER BY PeriodDate)
         - ROW_NUMBER() OVER (PARTITION BY LocationEventId, ImplicitRisk, QCReadyRisk, IsQualityControlReady, ActivePeriod ORDER BY PeriodDate) AS grp
    FROM tab c
    --order by PeriodDate
) d
GROUP BY ImplicitRisk, QcReadyRisk, IsQualityControlReady, ActivePeriod, LocationEventName, grp
ORDER BY 1
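To see why the difference of the two row numbers stays constant within an island, here is a minimal sketch that runs the same pattern against an in-memory SQLite database from Python. The sample rows and the reduced column list (one risk column, one flag) are made up for illustration, and SQLite (3.25 or later, for window functions) is only a stand-in for the original database.

```python
import sqlite3

# Hypothetical, simplified data: one risk level and one flag per day.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (PeriodDate TEXT, ImplicitRisk INTEGER, ActivePeriod INTEGER)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [
        ("2019-03-15", 1, 1),
        ("2019-03-16", 1, 1),
        ("2019-03-17", 2, 1),
        ("2019-03-18", 2, 1),
        ("2019-03-19", 2, 1),
        ("2019-03-20", 2, 1),
        ("2019-03-21", 1, 1),
    ],
)

# Overall row number minus row number within the (risk, flag) partition:
# the difference is constant while the values stay the same on consecutive rows.
query = """
SELECT MIN(PeriodDate) AS StartDate,
       MAX(PeriodDate) AS EndDate,
       ImplicitRisk,
       ActivePeriod
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (ORDER BY PeriodDate)
         - ROW_NUMBER() OVER (PARTITION BY ImplicitRisk, ActivePeriod
                              ORDER BY PeriodDate) AS grp
    FROM tab t
) d
GROUP BY ImplicitRisk, ActivePeriod, grp
ORDER BY StartDate
"""
islands = conn.execute(query).fetchall()
for row in islands:
    print(row)
```

With this data the four days from 17.03 to 20.03 collapse into one row, and the risk level 1 that reappears on 21.03 gets its own row because its grp value differs from the first island.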

Related

ROW_NUMBER function does not start from 1

I would like to ask about strange behaviour of the ROW_NUMBER() function in SQL Server. Typically it should start from 1 and number the rows in the order of the column given in its ORDER BY clause, which works as expected in most scenarios. However, I have a particular case with a basic SELECT statement:
SELECT
ROW_NUMBER() OVER (ORDER BY VIN) AS RN,
*
FROM dbo.RawData
and I get such result:
RN VIN
6301 JTEBR3FJ00K096082
6302 JTEBR3FJ00K096132
6303 JTEBR3FJ00K096146
6304 JTEBR3FJ00K096163
6305 JTEBR3FJ00K096180
6306 JTEBR3FJ00K096275
1801 5TDDZRFHX0S820530
1802 5TDDZRFHX0S824111
1803 5TDDZRFHX0S824500
1804 5TDDZRFHX0S825971
1805 5TDDZRFHX0S826456
and those are the first columns in the returned table. ROW_NUMBER seems to work randomly: after the chain from 6301 to 6306, the chain from 1801 to 1940 starts, etc.
The VIN column (the one I sort the data on) is nvarchar(17).
Could you please help me solve this issue?
I would be grateful for any tips on what might be wrong.
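The ORDER BY inside OVER() only determines how the numbers are assigned; it does not sort the result set. Add an ORDER BY to the query itself to return the rows in the desired order: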
SELECT ROW_NUMBER() OVER (ORDER BY VIN) AS RN
,*
FROM dbo.RawData
ORDER BY RN;
As the row number is calculated in the SELECT list, you can use its alias in the ORDER BY clause without needing a nested query.

How to get first row of 3 specific values of a column using Oracle SQL?

I have a table which has ID, FAMILY, ENV_XML_PATH and CREATED_DATE columns.
ID        FAMILY  ENV_XML_PATH  CREATED_DATE
15826841  CRM     path1.xml     03-09-22 6:50:34AM
15826856  SCM     path3.xml     03-10-22 7:12:20AM
15826786  IC      path4.xml     02-10-22 12:50:52AM
15825965  CRM     path5.xml     02-10-22 1:50:52AM
15653951  null    path6.xml     04-10-22 12:50:52AM
15826840  FIN     path7.xml     03-10-22 2:34:09AM
15826841  SCM     path8.xml     02-10-22 8:40:52AM
15223450  IC      path9.xml     03-09-22 5:34:09AM
15026853  SCM     path10.xml    05-10-22 4:40:59AM
Now there are 18 distinct values in the FAMILY column and each value has multiple rows associated with it (as you can see from the table above).
What I want is to get the first row for each of 3 specific values (CRM, SCM and IC) in the FAMILY column.
Something like this:
ID        FAMILY  ENV_XML_PATH  CREATED_DATE
15826841  CRM     path1.xml     date1
15826856  SCM     path3.xml     date2
15826786  IC      path4.xml     date3
I am new to this. I understand the logic, but I am not sure how to implement it. Kindly help. Thanks.
You can use RANK for that. Something like this:
WITH groupedData AS
(SELECT id, family, env_xml_path, created_date,
RANK () OVER (PARTITION BY family ORDER BY id) AS r_num
FROM yourtable
GROUP BY id, family, env_xml_path, created_date)
SELECT id, family, env_xml_path, created_date
FROM groupedData
WHERE r_num = 1
ORDER BY id;
Thus, within the first query, your data will be partitioned by family and ranked by the column you want (in my example, by id).
After that, the second query takes only the first row of each family.
Add a WHERE clause to the first query if you need to apply further restrictions on the result set.
See here a working example: db<>fiddle
You could use a window function to number the rows within each family partition, ordered by created_date, and restrict the query to the three families you are interested in:
with row_window as (
select
id,
family,
env_xml_path,
created_date,
row_number() over (partition by family order by created_date asc) as rn
from <your_table>
where family in ('CRM', 'SCM', 'IC')
)
select
id,
family,
env_xml_path,
created_date
from row_window
where rn = 1
Output:
ID        FAMILY  ENV_XML_PATH  CREATED_DATE
15826841  CRM     path1.xml     03-09-22 6:50:34
15826856  SCM     path3.xml     03-10-22 7:12:20
15826786  IC      path4.xml     02-10-22 12:50:52
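A minimal runnable sketch of the row_number() approach, using Python's sqlite3 module as a stand-in for Oracle (window functions need SQLite 3.25 or later). The table name env_files, the file names and the ISO-formatted timestamps are all made up for illustration; 'first' is assumed to mean the earliest created_date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows; timestamps are stored as ISO strings so that
# text ordering matches chronological ordering.
conn.execute(
    "CREATE TABLE env_files (id INTEGER, family TEXT, env_xml_path TEXT, created_date TEXT)"
)
conn.executemany(
    "INSERT INTO env_files VALUES (?, ?, ?, ?)",
    [
        (1, "CRM", "a.xml", "2022-10-03 06:50:34"),
        (2, "SCM", "b.xml", "2022-10-03 07:12:20"),
        (3, "IC", "c.xml", "2022-10-02 00:50:52"),
        (4, "CRM", "d.xml", "2022-10-02 01:50:52"),
        (5, "FIN", "e.xml", "2022-10-03 02:34:09"),
        (6, "SCM", "f.xml", "2022-10-02 08:40:52"),
        (7, "IC", "g.xml", "2022-09-03 05:34:09"),
    ],
)

query = """
WITH row_window AS (
    SELECT id, family, env_xml_path, created_date,
           ROW_NUMBER() OVER (PARTITION BY family ORDER BY created_date) AS rn
    FROM env_files
    WHERE family IN ('CRM', 'SCM', 'IC')
)
SELECT id, family, env_xml_path, created_date
FROM row_window
WHERE rn = 1
ORDER BY family
"""
first_rows = conn.execute(query).fetchall()
for row in first_rows:
    print(row)
```

Each family contributes exactly one row, the one with the smallest created_date in its partition; the FIN row is filtered out before numbering.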
The question doesn't really specify what 'first' means, but I assume it means the first row added to the table, i.e. the one with the oldest date. Try this code:
SELECT DISTINCT * FROM (yourTable) WHERE Family = 'CRM' OR
Family = 'SCM' OR Family = 'IC' ORDER BY Created_Date ASC FETCH FIRST (number) ROWS ONLY;
What it does:
DISTINCT - removes rows that are identical in every selected column. It does not limit the result to one row per family.
WHERE - keeps only the rows for which the condition is true.
OR - a row is kept if it matches any of the listed conditions, so rows from all three families are selected.
ORDER BY - sorts the result by the given column. If 'first' means the oldest, ordering by date with ASC puts the oldest (smallest) date at the top.
FETCH FIRST (number) ROWS ONLY - returns only the first (number) rows of the sorted result, for example FETCH FIRST 3 ROWS ONLY for 3 rows. Note that these are the oldest rows overall among the selected families, so the same family can appear more than once.

Count half of rest of a partition by from position

I'm trying to achieve the following results:
now, the group comes from
SUM(CASE WHEN seqnum <= (0.5 * seqnum_rev) THEN i.[P&L] END) OVER(PARTITION BY i.bracket_label ORDER BY i.event_id) AS [P&L 50%],
For each row, I need to count the rows from its position to the end (seq_inv) and sum the P&L amounts for only half of them, starting from that position.
For example, when seq = 2, seq_inv = 13; half of that is 6, so I need to sum the next 6 positions starting from seq = 2.
When seq = 4 there are 11 positions until the end (seq_inv = 11), so half is 5, and I want to sum 5 positions starting from seq = 4.
I hope this makes sense. I'm trying to come up with a rule that can adapt to my case, since the partition by is what gives me the numbers that need to be summed.
I was also wondering whether there is something like a partition by top 50%, but I guess that doesn't exist.
I have the advantage that I've helped him before and have a little extra context.
That context is that this is just the later stage of a very long chain of common table expressions. That means self-joins and/or correlated sub-queries are unfortunately expensive.
Preferably, this should be answerable using window functions, as the data set is already available in the appropriate ordering and partitioning.
My reading is this...
The SUM(5:9) (meaning the sum of rows 5 to row 9, inclusive) is equal to SUM(5:end) - SUM(10:end)
That leads me to this...
WITH
cumulative AS
(
    SELECT
        *,
        -- running total from the current row to the end of the partition
        SUM([P&L]) OVER (PARTITION BY bracket_label ORDER BY event_id DESC) AS cumulative_p_and_l
    FROM
        data
)
SELECT
    *,
    cumulative_p_and_l - LEAD(cumulative_p_and_l, seq_inv/2, 0) OVER (PARTITION BY bracket_label ORDER BY event_id) AS p_and_l_50_perc,
    cumulative_p_and_l - LEAD(cumulative_p_and_l, seq_inv/4, 0) OVER (PARTITION BY bracket_label ORDER BY event_id) AS p_and_l_25_perc
FROM
    cumulative
NOTE: Using spaces, &, or % in column names is horrendous, don't do it ;)
EDIT: Corrected the ORDER BY in the cumulative sum.
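The identity behind this query, SUM(i:i+k-1) = SUM(i:end) - SUM(i+k:end), can be checked outside the database. Below is a small Python sketch of the same idea for a single partition; the function name windowed_sums and the sample values are made up for illustration, and integer division mirrors seq_inv/2 on integer columns.

```python
def windowed_sums(values, divisor=2):
    """For each position i, sum the next k values starting at i,
    where k = (rows remaining from i to the end) // divisor."""
    n = len(values)

    # suffix[i] = values[i] + ... + values[n-1]; suffix[n] = 0 plays the
    # role of the default 0 passed to LEAD() when it runs past the end.
    suffix = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + values[i]

    result = []
    for i in range(n):
        remaining = n - i          # corresponds to seq_inv
        k = remaining // divisor   # corresponds to seq_inv/2
        # "cumulative from here" minus "cumulative k rows ahead"
        result.append(suffix[i] - suffix[i + k])
    return result


sums = windowed_sums([1, 2, 3, 4, 5, 6])
print(sums)
```

For the first position there are 6 rows remaining, so k = 3 and the result is 1 + 2 + 3; for the last position k = 0 and the result is 0.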
I don't think that window functions can do what you want. You could use a correlated subquery instead, with the following logic:
select
t.*,
(
select sum(t1.[P&L])
from mytable t1
where t1.seq - t.seq between 0 and t.seq_inv/2
) [P&L 50%]
from mytable t

How to track iterations of a value in sql

I am trying to track usage of a blade in a manufacturing process using SSMS 2017. The blade is loaded and used on product until it is seen to dull and then taken out for sharpening while another blade replaces it. We have 30 blades that are cycled for use and sharpening.
Using a table that provides product lot number (sequential) and blade name I would like to separate each batch use of the blade name into groups.
My sql skills are pretty basic so I've been trying row_number, rank, and some attempts at utilizing the lead/lag functions. So far this has only enabled me to break down each product into order based on blade name and identify the product on which a blade change is made. I feel like that could be useful but I'm having trouble figuring out exactly how to do it.
I would like to be able to assign an identifying number to each group of product manufactured with one iteration of a blade. For example:
LotNo BladeID Iteration
418211 BH40 1
418212 BH40 1
418213 BH40 1
418214 ES11 2
418215 ES11 2
418216 BH40 3
I'm currently able to produce these incorrect results:
Using:
SELECT b.LotNo,
b.BladeID,
ROW_NUMBER() OVER (PARTITION BY b.BladeID ORDER BY b.BladeID)
FROM blades AS b
ORDER BY b.LotNo ASC;
I get:
LotNo BladeID Iteration
418211 BH40 1
418212 BH40 2
418213 BH40 3
418214 ES11 1
418215 ES11 2
418216 BH40 4
Here's a possible solution to your problem. It first creates a series of groups to identify when the number must change. Then it gets an order to assign the correct value to each group. And finally, it assigns the value for the iteration. I'm including the sample data in a consumable way so anyone can use it for testing purposes.
CREATE TABLE #Sample(
LotNo int,
BladeID varchar(10),
Iteration int
);
INSERT INTO #Sample
VALUES
(418211, 'BH40', 1),
(418212, 'BH40', 1),
(418213, 'BH40', 1),
(418214, 'ES11', 2),
(418215, 'ES11', 2),
(418216, 'BH40', 3);
GO
WITH cteGroups AS(
SELECT *,
ROW_NUMBER() OVER(ORDER BY LotNo) - ROW_NUMBER() OVER(PARTITION BY BladeID ORDER BY LotNo) AS island
FROM #Sample
),
cteOrdering AS(
SELECT *, MIN( LotNo) OVER( PARTITION BY island, BladeID) AS OrderCol
FROM cteGroups
)
SELECT LotNo,
BladeID,
Iteration,
DENSE_RANK() OVER( ORDER BY OrderCol) AS IterationCalc
FROM cteOrdering;
You can do this with lag() and a cumulative sum:
select s.*,
sum(case when prev_BladeID = BladeId then 0 else 1 end) over (order by LotNo) as Iteration
from (select s.*,
lag(s.BladeID) over (order by s.LotNo) as prev_BladeID
from #sample s
) s;
In addition to being simpler code than the difference of row numbers, I think this is also simpler conceptually. This is simply counting the number of times that the BladeID changes from one lot to the next.
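As a quick check of the lag() plus cumulative sum idea, here is a minimal sketch that runs the same logic on the sample lots from the question, using Python's sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or later). The permanent table name blades comes from the question; everything else is only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blades (LotNo INTEGER, BladeID TEXT)")
conn.executemany(
    "INSERT INTO blades VALUES (?, ?)",
    [
        (418211, "BH40"),
        (418212, "BH40"),
        (418213, "BH40"),
        (418214, "ES11"),
        (418215, "ES11"),
        (418216, "BH40"),
    ],
)

# A row adds 1 to the running total whenever its blade differs from the
# previous lot's blade. The first row has no previous blade (NULL), so the
# comparison is not true and it also counts as a change.
query = """
SELECT LotNo,
       BladeID,
       SUM(CASE WHEN prev_BladeID = BladeID THEN 0 ELSE 1 END)
           OVER (ORDER BY LotNo) AS Iteration
FROM (
    SELECT LotNo,
           BladeID,
           LAG(BladeID) OVER (ORDER BY LotNo) AS prev_BladeID
    FROM blades
) s
ORDER BY LotNo
"""
iterations = conn.execute(query).fetchall()
for row in iterations:
    print(row)
```

The Iteration column comes out as 1, 1, 1, 2, 2, 3, matching the desired output in the question.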

Calculate Rank Pattern without any order value

My Data is like this -
You can check 3 columns: jil_equipment_id, req_group, operand.
Based on these 3 columns I have to generate a new "Pattern" column.
The pattern column starts from 2 and increases by 1 for each repeated combination of jil_equipment_id, req_group, operand.
The final data will look like this.
Please suggest any possible approach. I am not able to use the RANK()/DENSE_RANK() functions for this.
You can use row_number(). You want to use the partition by as well:
select t.*,
(1 + row_number() over (partition by jil_equipment_id, req_group, operand
order by content_id
)
) as pattern
from t;
select *,Row_Number() over(partition by jil_equipment_id,req_group,operand order by jil_equipment_id,req_group,operand) + 1 as pattern
from tab
You can use the row_number() function for this.