SQL Server function to parse docket numbers - sql

In the Cases table of our database, the Docket field stores the docket number(s) for each case. Docket numbers look like this:
AB19-1-000
CD19-1043-000
EF18-24-001
Each one consists of a "root" docket and a "sub" docket. The roots here are
AB19-1
CD19-1043
EF18-24
A root docket consists of a two-letter docket prefix, which indicates the case type, followed by a two-digit code for the fiscal year in which the case was filed, then a hyphen, then a "sequence" number (with no fixed number of digits, although none has ever had more than 4 digits) indicating the order in which the case was filed relative to other cases of the same type filed in that fiscal year.
The final three digits of the overall docket number (after the final hyphen) are the "subdocket," which allows for multiple filings within the docket. The initial filing is always docketed with a 000 subdocket. Subsequent filings within that root docket are subdocketed as 001, 002, 003, etc.
To make things more complicated, there can be multiple docket numbers listed within the Docket field, and (in the horrid legacy database design we have) multiple docket numbers are always separated with exactly one space. (I know. Don't get me started.)
I want to create a tool that will help us generate the docket number for new cases easier/quicker than our current approach (which uses a VBA loop and is very slow). Specifically, I want to write a function for use on SQL Server that will spit out the next sequence number for a new filing of a given type and fiscal year.
The steps would be roughly:
1. Accept as arguments a two-character case type and a two-digit fiscal year.
2. Identify all docket numbers entered where the alpha prefix is the specified case type and the next two characters are the specified fiscal year (ignoring the subdocket, which we don't care about here).
3. Identify the highest existing sequence number for that case type and fiscal year.
4. Add one to the identified number, and return that number.
I'm a decent programmer, but my SQL is limited to fairly ordinary queries, and I have very little experience creating functions. So any help (even a general outline of what this kind of function might look like and how to create it) is much appreciated.
Here's some code to generate some simple test data.
CREATE TABLE MyCases (
CaseId INTEGER PRIMARY KEY,
Docket VARCHAR(50) not null
);
INSERT INTO MyCases
VALUES
(1, 'XL14-204-001 TS14-1-000 PI14-1-000'),
(2, 'PI14-2-000'),
(3, 'PI14-3-000'),
(4, 'PI14-4-001 XL14-22-000'),
(5, 'PI14-6-000'),
(6, 'PI14-7-000 XL14-382-000'),
(7, 'PI15-1-000 XL15-23-000'),
(8, 'PI15-2-000 TS15-23-000'),
(9, 'PI15-3-000'),
(10, 'PI15-4-000 TS15-2-000')
;
And with the desired function, if the user entered MyFunction('PI',14), the result would be 8, because the highest existing sequential number for all PI14 docket numbers is PI14-7, and adding one to 7 gives 8. Similarly, the result for MyFunction('PI',15) would be 5.
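To pin down the expected numbers independently of the database, the four steps can be sketched outside SQL. The Python below only illustrates the logic and is not the requested T-SQL function; `ROWS` copies the test data, and `next_sequence` and `DOCKET` are hypothetical names.

```python
import re

# Docket fields copied from the MyCases test data above.
ROWS = [
    "XL14-204-001 TS14-1-000 PI14-1-000",
    "PI14-2-000",
    "PI14-3-000",
    "PI14-4-001 XL14-22-000",
    "PI14-6-000",
    "PI14-7-000 XL14-382-000",
    "PI15-1-000 XL15-23-000",
    "PI15-2-000 TS15-23-000",
    "PI15-3-000",
    "PI15-4-000 TS15-2-000",
]

# Groups: prefix (2 letters), fiscal year (2 digits), sequence, subdocket (3 digits).
DOCKET = re.compile(r"^([A-Z]{2})(\d{2})-(\d+)-(\d{3})$")

def next_sequence(rows, case_type, fiscal_year):
    """Return the highest existing sequence number plus one (1 if none exist)."""
    highest = 0
    for field in rows:
        # Multiple docket numbers are separated by exactly one space.
        for docket in field.split(" "):
            m = DOCKET.match(docket)
            if m and m.group(1) == case_type and int(m.group(2)) == fiscal_year:
                highest = max(highest, int(m.group(3)))
    return highest + 1

print(next_sequence(ROWS, "PI", 14))  # 8
print(next_sequence(ROWS, "PI", 15))  # 5
```

Returning 1 when no matching docket exists is an assumption; the question does not say what the first number of a new year should be.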

Something like this:
create or alter function MyFunction(@RootDocket char(2), @FiscalYear smallint)
returns int as
begin
    declare @NextSequenceNumber int;

    -- string_split requires SQL Server 2016+ (compatibility level 130)
    with q as
    (
        select
            c.CaseId,
            cd.Docket,
            RootDocket = left(cd.Docket,2),
            FiscalYear = cast(right(left(cd.Docket,4),2) as tinyint),
            -- sequence number: from position 6 up to the next hyphen
            SequenceNumber = cast(substring(cd.Docket,6, charindex('-',cd.Docket,7)-6) as smallint),
            SubDocket = cast(right(cd.Docket,3) as smallint)
        from dbo.mycases c
        -- one row per space-separated docket number
        cross apply (select value Docket from string_split(c.Docket,' ') ) cd(Docket)
    )
    select @NextSequenceNumber = max(SequenceNumber) + 1
    from q
    where RootDocket = @RootDocket
    and FiscalYear = @FiscalYear;

    return @NextSequenceNumber;
end
go
select dbo.MyFunction('PI',15);
select dbo.MyFunction('PI',14);
outputs
-----------
5
-----------
8

Here is an approach that uses a loop to sequentially check each possible sequence number for the given case type and year. As soon as an available sequence number is found, it is returned.
This might be a little more optimized than what you requested, in the sense that it will fill the gaps in the sequence, if there are any. This might or might not be what you need.
Code:
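CREATE FUNCTION GetNextAvailableSequence (
    @case_type VARCHAR(2),
    @fiscal_year INT
)
RETURNS INT
AS
BEGIN
    DECLARE @seq INT;
    DECLARE @done INT;
    SET @done = 0;
    SET @seq = 0;
    -- try 1, 2, 3, ... until no docket uses the candidate sequence number
    WHILE @done = 0
    BEGIN
        SET @seq = @seq + 1;
        IF (
            SELECT COUNT(*)
            FROM MyCases
            -- the leading space lets '% ' also match the first docket in the field
            WHERE ' ' + docket LIKE
                '% '
                + @case_type
                + CAST(@fiscal_year AS VARCHAR(2))
                + '-'
                + CAST(@seq AS VARCHAR(4))  -- sequence numbers can have up to 4 digits
                + '-%'                      -- closing hyphen keeps 1 from matching 10, 11, ...
        ) = 0
        BEGIN
            SET @done = 1;
        END;
    END;
    RETURN @seq;
END;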
Demo on DB Fiddle:
SELECT dbo.GetNextAvailableSequence('PI', 14);
| (No column name) |
| ---------------: |
| 5 |
This fills the first gap for PI-14.
select dbo.GetNextAvailableSequence('PI', 15);
| (No column name) |
| ---------------: |
| 5 |
There are no gaps for PI-15, so 5 is the first available sequence number.
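When matching sequence numbers by pattern, one detail to watch is that a short number can be a prefix of a longer one (1 versus 10). A minimal Python sketch of the same gap-filling idea that compares whole sequence numbers instead; `first_free_sequence` and the `sample` list are hypothetical and not part of the answer above:

```python
def first_free_sequence(docket_fields, case_type, fiscal_year):
    """Return the smallest positive sequence number not yet used."""
    root_prefix = f"{case_type}{fiscal_year:02d}-"
    used = set()
    for field in docket_fields:
        for docket in field.split(" "):
            if docket.startswith(root_prefix):
                # "PI14-10-000" -> sequence "10"; the subdocket is ignored
                sequence = docket[len(root_prefix):].split("-")[0]
                used.add(int(sequence))
    candidate = 1
    while candidate in used:
        candidate += 1
    return candidate

sample = ["PI14-1-000 XL14-2-000", "PI14-2-000", "PI14-4-001", "PI14-10-000"]
print(first_free_sequence(sample, "PI", 14))  # 3
```

Here 10 is in use but does not hide the fact that 3 is free, and a lone `PI14-10-000` would not block sequence number 1.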

Related

How can we read a varchar column, take the integer part out and add new column incrementing that integer part using script

I need to write a script for the scenario below:
We have a column X whose row values are X01, X02, X03, X04, ...
I need to add another row to this table based on the value of the last row, which is X04. I have worked out the logic I need, which is:
I need to read value X04
Take the integer part 04
Increment by 1 => 05
Save column value as X05
I can manage the first step, which is not very hard. The problem is the remaining steps. I have researched and tried quite a lot of commands, but none worked.
Any help is highly appreciated. Thanks.
You seem to be describing:
select concat(left(max(x), 1),
              right(concat('00', try_convert(int, right(max(x), 2)) + 1), 2))
from t;
This is doing the following:
Taking the left most character.
Converting the two right characters to a number and adding one.
Converting that back to a zero-padded string.
Here is a db<>fiddle.
Now: That you want to increment a string value seems broken. You should just use an identity column or sequence to assign a number. You can format the value as a string when you query the table -- or use a computed column to store that.
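The split, increment and zero-pad steps can also be checked with a few lines outside the database. This is a Python sketch of the logic only, assuming every code is one letter followed by digits; `next_code` is a hypothetical helper name.

```python
def next_code(codes):
    """Return the code that follows the highest existing one, e.g. X04 -> X05."""
    # Compare by the numeric part, not by string order.
    highest = max(codes, key=lambda c: int(c[1:]))
    prefix, number = highest[0], int(highest[1:])
    width = len(highest) - 1  # keep the zero padding of the existing code
    return f"{prefix}{number + 1:0{width}d}"

print(next_code(["X01", "X02", "X03", "X04"]))  # X05
```

Note that once the number outgrows the padding width (X99), the result simply gets longer (X100), which is one more reason to store the number separately as suggested above.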
Try the script below:
CREATE TABLE #table (x varchar(20))
INSERT INTO #table VALUES('X01'),('X02'),('X03'),('X04')
DECLARE @maxno NVARCHAR(20)
DECLARE @maxstring NVARCHAR(20)
DECLARE @finalno NVARCHAR(20)
DECLARE @loopminno INT =1 -- you can change based on the requirement
DECLARE @loopmaxno INT =10 -- loop runs while @loopminno < @loopmaxno, so this adds 9 rows
WHILE @loopminno < @loopmaxno
BEGIN
    -- split the highest value into its numeric part and its letter prefix
    select @maxno = MAX(CAST(SUBSTRING(x, PATINDEX('%[0-9]%', x), 100) as INT))
    , @maxstring = MAX(SUBSTRING(x, 1, PATINDEX('%[0-9]%',x)-1))
    from #table
    where PATINDEX('%[1-9]%',x)>0
    -- re-add the leading zero while the incremented number is a single digit
    SELECT @finalno = @maxstring + CASE WHEN CAST(@maxno AS INT)<9 THEN '0' ELSE '' END + CAST(@maxno+1 AS VARCHAR(20))
    INSERT INTO #table
    SELECT @finalno
    SET @loopminno = @loopminno+1
END

How can I add a column to an R set, containing the amount of matches of a regex

To be able to execute regular expressions in SQL Server without the use of CLR, I'm looking into using R language. I have a set of texts, and I want to count the number of matches of a regex on each row.
Note: the "R" part can be found within the @script variable in the code block shown below. This is the part where this issue is.
For the sake of example: I have a table InputData:
CREATE TABLE InputData (ID INT IDENTITY(1, 1), Text NVARCHAR(MAX))
This table contains 3 rows:
INSERT INTO InputData(Text)
VALUES ('This is the first row'),
('This is the second row'),
('This is the third row')
I'd like to run a query that returns the number of times the letter i is found in each row, by using a regex (since the actual search I need is a bit more complicated). After some googling, I came up with the following:
EXEC SP_EXECUTE_EXTERNAL_SCRIPT
@language=N'R',
@script = N'pattern = ".*i.*"
outData <- inData;
outData$MatchCount <- length(gregexpr(pattern, outData$Text));'
, @input_data_1 = N'select ID, Text, 0 MatchCount from InputData'
, @input_data_1_name = N'inData'
, @output_data_1_name=N'outData'
with result sets ((ID INT, Text NVARCHAR(MAX), MatchCount INT));
Now the above doesn't work, because the regexpr runs the expression over the entire set, and will always return "3", since the set contains 3 rows. I would like it to return the number of matches per row, and put the result in the correct row.
So the result should be:
1, 'This is the first row', 3
2, 'This is the second row', 2
3, 'This is the third row', 3
Ideally, I wouldn't even need to return the text in the resulting set. Only the ID and match count is enough
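For comparison, the per-row counting that the expected result describes looks like this in Python. This is a sketch of the intended behaviour, not a fix for the R script above; the `rows` list mirrors the InputData table and `count_matches` is a hypothetical name. Note that the pattern matches a single occurrence (`i`), whereas `.*i.*` would match a whole line once.

```python
import re

# Rows mirroring the InputData table: (ID, Text).
rows = [
    (1, "This is the first row"),
    (2, "This is the second row"),
    (3, "This is the third row"),
]

def count_matches(pattern, text):
    """Count non-overlapping matches of pattern in a single string."""
    return len(re.findall(pattern, text))

# Apply the count to each row separately instead of to the whole column.
result = [(row_id, count_matches(r"i", text)) for row_id, text in rows]
print(result)  # [(1, 3), (2, 2), (3, 3)]
```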

Needing to parse out data

I am trying to parse out certain data from a string and I am having issues.
Here is the string:
1=BETA.1.0^2=175^3=812^4=R^5=N^9=1^12=1^13=00032^14=REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR.^10=107~117~265~1114~3143~3505~3506~3513~5717^11=SA16~1~WY~WY~A~S~20100210~001~SE62^-omitted due to existing Rep Not Found
I need to return this "REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR."
Here is my query:
SELECT CONVERT(VARCHAR(5000),CHARINDEX('14=',Column)) FROM Table
If you're parsing, can we assume that you don't know what might come after the '^14=', but you need to capture whatever does? So searching for a particular string won't work because anything could come after '^14='. The best approach is to identify the longest reliable specific string that gives you a "foothold" to find the data you're looking for. What you don't want to do is accidentally capture the wrong data if the '^14=' appears more than once in your string. It looks like the '^' is your delimiter, since I don't see one at the start of the string. So you were actually on the right track, you just need to use SUBSTRING as a commenter mentioned. You also need to identify a marker for the end of the error message, which looks like it might be the next occurring '^', correct? Check several samples to be sure of this, and make sure the end marker doesn't at any point exist before your start marker or you'll get an error.
SELECT CAST((SUBSTRING(Column,CHARINDEX('14=',Column,0),CHARINDEX('^',Column,CHARINDEX('14=',Column,0) + 1) - CHARINDEX('14=',Column,0))) AS VARCHAR(5000)) FROM Table
You may need to increment or decrement the start position and end position by doing a +1 or -1 to fully capture your error message. But this should dynamically grab any length error message provided you are positive of your starting and ending markers.
I also have here a table-valued parsing function, where you would pass it the string and the '^' and it will return a table of data with not only the 14=, but everything.
CREATE function [dbo].[fn_SplitStringByDelimeter]
(
    @list nvarchar(8000)
    ,@splitOn char(1)
)
returns @rtnTable table
(
    id int identity(1,1)
    ,value nvarchar(100)
)
as
begin
    declare @index int
    declare @string nvarchar(4000)
    select @index = 1
    if len(@list) < 1 or @list is null return
    --
    while @index!= 0
    begin
        -- position of the next delimiter (0 when none is left)
        set @index = charindex(@splitOn,@list)
        if @index!=0
            set @string = left(@list,@index - 1)
        else
            set @string = @list
        if(len(@string)>0)
            insert into @rtnTable(value) values(@string)
        -- drop the piece just consumed, including its delimiter
        set @list = right(@list,len(@list) - @index)
        if len(@list) = 0 break
    end
    return
end
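The split-everything approach can be pictured as building a lookup from field number to value. A short Python sketch of that idea, assuming each piece has the form `key=value`; `parse_fields` is a hypothetical name and the sample string is a shortened version of the one in the question:

```python
def parse_fields(raw, delimiter="^"):
    """Split raw on delimiter and map each 'key=value' piece to a dict entry."""
    fields = {}
    for piece in raw.split(delimiter):
        key, sep, value = piece.partition("=")
        if sep:  # skip pieces without '=' such as trailing free-text notes
            fields[key] = value
    return fields

raw = ("1=BETA.1.0^2=175^13=00032"
       "^14=REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR."
       "^10=107~117^-omitted due to existing Rep Not Found")
print(parse_fields(raw)["14"])
# REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR.
```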
It sounds like you're trying to get the value of argument 14. This should do it:
select substring(
someData
, charindex('^14=',someData) + 4
, charindex('^',someData, charindex('^14=',someData) + 4) - charindex('^14=',someData) - 4
) errorMessage
from myData
where charindex('^14=',someData) > 0
and charindex('^',someData, charindex('^14=',someData) + 4) > 0
Try it here: http://sqlfiddle.com/#!18/22f23/2
This gets a substring of the given input.
The substring starts at the first character after the string ^14=; i.e. we get the index of ^14= in the string, then add 4 to it to skip over the matched characters themselves.
The substring ends at the first ^ character after the one in ^14=. We get the index of that character, then subtract the starting position from it to get the length of the desired output.
Caveats: If there is no parameter (^) after ^14= this will not work. Equally if there is no ^14= (even if the string starts 14=) this will not work. From the information available that's OK; but if this is a concern please say and we can provide something to handle that more complex scenario.
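The start-marker and end-marker arithmetic, including the two caveats, can be sketched in Python as follows. This illustrates the logic and is not a T-SQL replacement; `value_after` is a hypothetical name, and treating a missing end delimiter as "take the rest of the string" is an assumption.

```python
def value_after(raw, marker="^14=", delimiter="^"):
    """Return the text between marker and the next delimiter, or None."""
    start = raw.find(marker)
    if start == -1:
        return None  # marker absent (also when the string merely starts with "14=")
    start += len(marker)  # skip over the matched marker itself
    end = raw.find(delimiter, start)
    if end == -1:
        end = len(raw)  # the marker introduced the last field
    return raw[start:end]

print(value_after("1xx^14=something else.^10=xx"))  # something else.
```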
Code to create table & populate demo data
create table myData (someData nvarchar(256))
insert myData (someData)
values ('1=BETA.1.0^2=175^3=812^4=R^5=N^9=1^12=1^13=00032^14=REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR.^10=107~117~265~1114~3143~3505~3506~3513~5717^11=SA16~1~WY~WY~A~S~20100210~001~SE62^-omitted due to existing Rep Not Found')
, ('1xx^14=something else.^10=xx')
You could try to use a Case When statement with wildcards to find the value that you want.
Example:
SELECT
CASE
WHEN x LIKE '%REP Not Found%'
THEN 'REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR'
ELSE
''
END AS x
FROM
#T1
You could use this query (assuming MySQL database):
-- item is the column that contains the string
select SUBSTR(item, LOCATE('REP',item), LOCATE('REPRGR.',item) + LENGTH('REPRGR.') - LOCATE('REP', item)) info_msg from Table;
Illustration:
create table parsetest (item varchar(5000));
insert into parsetest values('1=BETA.1.0^2=175^3=812^4=R^5=N^9=1^12=1^13=00032^14=REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR.^10=107~117~265~1114~3143~3505~3506~3513~5717^11=SA16~1~WY~WY~A~S~20100210~001~SE62^-omitted due to existing Rep Not Found');
select * from parsetest;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| item |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1=BETA.1.0^2=175^3=812^4=R^5=N^9=1^12=1^13=00032^14=REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR.^10=107~117~265~1114~3143~3505~3506~3513~5717^11=SA16~1~WY~WY~A~S~20100210~001~SE62^-omitted due to existing Rep Not Found |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
select SUBSTR(item, LOCATE('REP',item), LOCATE('REPRGR.',item) + LENGTH('REPRGR.') - LOCATE('REP', item)) info_msg from parsetest;
+------------------------------------------------------+
| info_msg |
+------------------------------------------------------+
| REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR. |
+------------------------------------------------------+

Get MAX value if column has a certain format

SQL Server 2008 R2
I have a table similar to this:
Example table:
ID Column
---------------
xxx1234
xxx12345
xxx123456
20150001
I am trying to get a conditional MAX value that only considers column values matching a certain format. Using the above example, the fourth record, 20150001, represents a "good record" because it contains the current year and the start of an increment. So, in my table, records that are considered "good" (those subject to the query I am trying to write) have the format "year + increment". The first three do not follow this format, so they are bad records and should be excluded when computing the max value. In the above example, the expected result would be "20150002".
The MAX query is simple enough to write, but I am wondering about an approach that restricts the query to only those records that meet the particular format, and increments the last four digits (0001 to 0002).
TIA!
You can use the isdate function to filter out ID Columns that do not start with a valid year, and isnumeric to make sure the last 4 characters of the ID Column are valid increments. You also want the len to be 8, given this criteria. You can accomplish all this in the where clause:
-- load test data
declare @Example_Table table(ID_Column varchar(10))
insert into @Example_Table values
('xxx1234'),
('xxx12345'),
('xxx123456'),
('20150001')
-- return max valid ID_Column
select max(ID_Column) as max_ID_Column
from @Example_Table
where isdate(left(ID_Column,4)) = 1
and isnumeric(right(ID_Column,4)) = 1
and len(ID_Column) = 8
-- increment max valid ID_Column
update @Example_Table
set ID_Column = cast(ID_Column as int) + 1
where isdate(left(ID_Column,4)) = 1
and isnumeric(right(ID_Column,4)) = 1
and len(ID_Column) = 8
select * from @Example_Table
ID_Column
----------
xxx1234
xxx12345
xxx123456
20150002
You could use a LIKE pattern (a limited, regex-like syntax) to verify a correct year. The second half of the pattern is tailored to your examples of 0001 and 0002; it could be opened up by using '[0-9]' for each digit you're expecting.
DECLARE @Sample VARCHAR(30) = '20150001';
SELECT CASE WHEN (@Sample LIKE '[12][09][0-9][0-9]000[12]') THEN 'Yes' ELSE 'No' END;
SELECT
SUBSTRING(@Sample, 1, 4),
SUBSTRING(@Sample, 5, 4),
-- four character classes, one per digit of the year
CASE WHEN (SUBSTRING(@Sample, 1, 4) LIKE '[12][09][0-9][0-9]') THEN 'Yes' ELSE 'No' END,
CASE WHEN (SUBSTRING(@Sample, 5, 4) LIKE '[0-9][0-9][0-9][0-9]') THEN 'Yes' ELSE 'No' END;
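Putting the two parts together (filter by format, then take the maximum increment and add one) might look like this in Python. It is a sketch of the logic only; `next_id` is a hypothetical name, and starting at 0001 when no valid record exists is an assumption.

```python
import re

def next_id(ids, year):
    """Return year + next 4-digit increment, ignoring ids in other formats."""
    # Only values of the exact form <year><4 digits> count as "good" records.
    pattern = re.compile(rf"^{year:04d}(\d{{4}})$")
    increments = [int(m.group(1)) for m in map(pattern.match, ids) if m]
    return f"{year:04d}{max(increments, default=0) + 1:04d}"

print(next_id(["xxx1234", "xxx12345", "xxx123456", "20150001"], 2015))  # 20150002
```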

SQL query--String Permutations

I am trying to create a query using a db on OpenOffice where a string is entered in the query, and all permutations of the string are searched in the database and the matches are displayed. My database has fields for a word and its definition, so if I am looking for GOOD I will get its definition as well as the definition for DOG.
You'll need a third column as well. In this column you'll have the word, but with the letters sorted in alphabetical order. For example, you'll have the word APPLE and in the next column the word AELPP.
You would sort the word you're looking for and run some SQL code like
WHERE sorted_words = 'my_sorted_word'
For the word apple, you would get something like this:
sorted unsorted
AELPP APPLE
AELPP PEPLA
AELPP APPEL
Now, correct me if I'm wrong, but you also want all the words that can be made with any combination of the letters, meaning APPLE also returns words like LEAP and PEA.
To do this, you would have to use some programming language. You would have to write a function that performs the above recursively. For example, for the sorted word AELPP you have
ELPP
ALPP
AEPP
AELP
and so forth (each time removing one letter in every possible combination, then two letters in every possible combination, etc.).
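A minimal Python sketch of the sorted-letters idea, including the shorter letter combinations. The word list, `sorted_key`, `sub_keys` and the minimum length of 2 are all assumptions made for illustration:

```python
from itertools import combinations

def sorted_key(word):
    """Letters of word in alphabetical order, e.g. APPLE -> AELPP."""
    return "".join(sorted(word.upper()))

def sub_keys(word, min_length=2):
    """All distinct sorted keys obtainable by dropping letters from word."""
    key = sorted_key(word)
    keys = set()
    for length in range(min_length, len(key) + 1):
        # combinations() keeps input order, so every result is itself sorted
        for combo in combinations(key, length):
            keys.add("".join(combo))
    return keys

# Hypothetical word list, indexed by its sorted key (the "third column").
words = ["APPLE", "LEAP", "PEA", "PALE", "DOG"]
index = {}
for w in words:
    index.setdefault(sorted_key(w), []).append(w)

matches = sorted(w for k in sub_keys("APPLE") for w in index.get(k, []))
print(matches)  # ['APPLE', 'LEAP', 'PALE', 'PEA']
```

In a database, the generated keys would be the values to look up in the sorted column, for example through an IN list.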
Basically, you can't easily do permutations in a single SQL statement. You can easily do them in another language though; for example, here's how to do it in C#: http://msdn.microsoft.com/en-us/magazine/cc163513.aspx
Ok, corrected version that I think handles all situations. This will work in MS SQL Server, so you may need to adjust it for your RDBMS as far as using the local table and the REPLICATE function. It assumes a passed parameter called @search_string. Also, since it's using VARCHAR instead of NVARCHAR, if you're using extended characters be sure to change that.
One last point that I'm just thinking of now... it will allow duplication of letters. For example, "GOOD" would find "DODO" even though there is only one "D" in "GOOD". It will NOT find words of greater length than your original word though. In other words, while it would find "DODO", it wouldn't find "DODODO". Maybe this will give you a starting point to work from though depending on your exact requirements.
DECLARE @search_table TABLE (search_string VARCHAR(4000))
DECLARE @i INT
SET @i = 1
-- one pattern per word length: '[GOOD]', '[GOOD][GOOD]', ...
WHILE (@i <= LEN(@search_string))
BEGIN
    INSERT INTO @search_table (search_string)
    VALUES (REPLICATE('[' + @search_string + ']', @i))
    SET @i = @i + 1
END
SELECT
    W.word,
    W.definition
FROM
    My_Words W
INNER JOIN @search_table ST ON W.word LIKE ST.search_string
The original query before my edit, just to have it here:
SELECT
word,
definition
FROM
My_Words
WHERE
word LIKE REPLICATE('[' + @search_string + ']', LEN(@search_string))
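The duplicate-letter caveat is easy to reproduce with a regular expression that mirrors the repeated character class. The Python below is a sketch of that behaviour, plus a stricter count-based check as one possible refinement; both function names are hypothetical.

```python
import re
from collections import Counter

def char_class_match(search_string, word):
    """Mimic LIKE '[letters]' repeated 1..n times: letters may repeat freely."""
    pattern = rf"^[{re.escape(search_string)}]{{1,{len(search_string)}}}$"
    return re.match(pattern, word) is not None

def uses_only_available_letters(search_string, word):
    """Stricter check: each letter may be used at most as often as it occurs."""
    return not (Counter(word) - Counter(search_string))

print(char_class_match("GOOD", "DODO"))             # True  (the caveat)
print(char_class_match("GOOD", "DODODO"))           # False (too long)
print(uses_only_available_letters("GOOD", "DODO"))  # False (only one D)
print(uses_only_available_letters("GOOD", "DOG"))   # True
```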
Maybe this can help.
Suppose you have an auxiliary Numbers table with integer numbers.
DECLARE @s VARCHAR(5);
SET @s = 'ABCDE';
WITH Subsets AS (
    -- anchor: every single letter, remembering which position was used
    SELECT CAST(SUBSTRING(@s, Number, 1) AS VARCHAR(5)) AS Token,
           CAST('.'+CAST(Number AS CHAR(1))+'.' AS VARCHAR(11)) AS Permutation,
           CAST(1 AS INT) AS Iteration
    FROM dbo.Numbers WHERE Number BETWEEN 1 AND 5
    UNION ALL
    -- recursion: append any letter whose position is not yet in Permutation
    SELECT CAST(Token+SUBSTRING(@s, Number, 1) AS VARCHAR(5)) AS Token,
           CAST(Permutation+CAST(Number AS CHAR(1))+'.' AS VARCHAR(11)) AS Permutation,
           s.Iteration + 1 AS Iteration
    FROM Subsets s
    JOIN dbo.Numbers n
      ON s.Permutation NOT LIKE '%.'+CAST(Number AS CHAR(1))+'.%'
     AND s.Iteration < 5
     AND Number BETWEEN 1 AND 5
    --AND s.Iteration = (SELECT MAX(Iteration) FROM Subsets)
)
SELECT * FROM Subsets
WHERE Iteration = 5
ORDER BY Permutation
Token Permutation Iteration
----- ----------- -----------
ABCDE .1.2.3.4.5. 5
ABCED .1.2.3.5.4. 5
ABDCE .1.2.4.3.5. 5
(snip)
EDBCA .5.4.2.3.1. 5
EDCAB .5.4.3.1.2. 5
EDCBA .5.4.3.2.1. 5
(120 row(s) affected)
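As a cross-check on the row count and ordering, the same 120 permutations can be produced with Python's standard library. This is a sketch for comparison only, not part of the SQL answer:

```python
from itertools import permutations

# permutations() yields tuples in lexicographic order for sorted input.
tokens = ["".join(p) for p in permutations("ABCDE")]

print(len(tokens))            # 120
print(tokens[0], tokens[-1])  # ABCDE EDCBA
```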