SQL Long String into Substrings by a length and whitespaces - sql

I want to split a string ("Hello this is a String That is very odd")
into substrings of a defined maximum length (e.g. 8), so that the string is cut at or before that index, but always at whitespace and never inside a word.
Length: 11
("Hello this is a String That is very odd") --> ("Hello this"),("is a String"),("That is"),("very odd")
I already have an array of the indexes of the whitespaces, but I don't know how to go on from there.
I would appreciate your help.

There is no easy solution...
So the simple answer is: Do not use SQL-Server for this issue. It's just the wrong tool.
Nevertheless this can be done (if you have to):
--Some declared table to mock your scenario
DECLARE @tbl TABLE(ID INT IDENTITY, YourString NVARCHAR(1000));
INSERT INTO @tbl VALUES('Hello this is a String That is very odd')
,('blah')
,('And one withaverylongword');
--use this to define the portion's length. 8 would be too little...
DECLARE @portionLenght INT = 12;
--the query
WITH cte AS
(
SELECT t.ID
,A.[key] AS fragmentPosition
,A.[value] AS fragment
FROM @tbl t
CROSS APPLY OPENJSON(CONCAT('["',REPLACE(t.YourString,' ','","'),'"]')) A
)
,recCTE AS
(
SELECT ID,fragmentPosition,fragment
,0 AS growingIndex
,CAST(fragment AS NVARCHAR(MAX)) AS growingString
FROM cte
WHERE fragmentPosition=0
UNION ALL
SELECT cte.ID
,cte.fragmentPosition
,cte.fragment
,recCTE.growingIndex + CASE WHEN B.newLength>@portionLenght THEN 1 ELSE 0 END
,CASE WHEN B.newLength>@portionLenght THEN cte.fragment ELSE CONCAT(recCTE.growingString,N' ',cte.fragment) END
FROM recCTE
INNER JOIN cte ON cte.ID=recCTE.ID AND cte.fragmentPosition=recCTE.fragmentPosition+1
CROSS APPLY(SELECT LEN(CONCAT(recCTE.growingString,N' ',cte.fragment))) B(newLength)
)
,final AS
(
SELECT *
,ROW_NUMBER() OVER(PARTITION BY ID,growingIndex ORDER BY fragmentPosition DESC) lastGrowing
FROM recCTE
)
SELECT * FROM final
WHERE lastGrowing=1
ORDER BY ID,fragmentPosition;
The result (with length=12)
1 Hello this
1 is a String
1 That is very
1 odd
2 blah
3 And one
3 withaverylongword
The idea in short:
We use a trick with OPENJSON to transform your string into a JSON array and split it with a guaranteed sort order.
We use a recursive CTE to run through your fragments.
Each iteration calculates the total length of the former parts together with the new fragment.
Depending on this calculation, the fragment is either appended or a new portion is opened.
The final CTE adds a partitioned ROW_NUMBER() to find the last entry per portion.
And no, you should not use this... :-)
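If the splitting can happen outside the database, the same greedy logic is only a few lines in application code. Here is a minimal Python sketch of it, assuming words are separated by single spaces; the function name split_at_whitespace is made up for this illustration and is not part of the answer above.

```python
def split_at_whitespace(text, max_len):
    """Greedily pack space-separated words into portions of at most max_len characters.

    A single word longer than max_len is kept whole in its own portion,
    which is also what the recursive CTE above does.
    """
    portions = []
    current = ""
    for word in text.split(" "):
        # Length the portion would have if this word were appended to it.
        candidate = word if not current else current + " " + word
        if current and len(candidate) > max_len:
            # The word does not fit: close the portion and open a new one.
            portions.append(current)
            current = word
        else:
            current = candidate
    if current:
        portions.append(current)
    return portions


print(split_at_whitespace("Hello this is a String That is very odd", 12))
```

With a length of 12 this yields the same four portions as the query result above, and with 11 it yields the split asked for in the question.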

Related

Extract String Between Two Different Characters

I am having some trouble trying to figure out a way to extract a string between two different characters. My issue here is that the column (CONFIG_ID) contains more than 75,000 rows and the format is not consistent, so I cannot figure out a way to get the numbers between E and B.
*CONFIG_ID*
6E15B1P
999E999B1P
1E3B1P
1E30B1P
5E24B1P
23E6B1P
Another option is to use a CROSS APPLY to calculate the values only once. Another nice thing about CROSS APPLY is that you can stack calculations and use them in the top SELECT.
Notice the nullif(): rather than throwing an error when a character is not found, the expression returns NULL.
This also assumes there are no leading B's (no B before the E).
Example
Declare @YourTable Table ([CONFIG_ID] varchar(50)) Insert Into @YourTable Values
('6E15B1P')
,('999E999B1P')
,('1E3B1P')
,('1E30B1P')
,('5E24B1P')
,('23E6B1P')
,('23E6ZZZ') -- Notice No B
Select [CONFIG_ID]
,NewValue = substring([CONFIG_ID],P1,P2-P1)
From @YourTable
Cross Apply ( values (nullif(charindex('E',[CONFIG_ID]),0)+1
,nullif(charindex('B',[CONFIG_ID]),0)
) )B(P1,P2)
Results
CONFIG_ID NewValue
6E15B1P 15
999E999B1P 999
1E3B1P 3
1E30B1P 30
5E24B1P 24
23E6B1P 6
23E6ZZZ NULL -- Notice No B
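For checking expected values outside the database, the same "position of E, position of B, take what lies between" logic can be sketched in Python. This is only an illustration; the function name between and its default delimiters are assumptions made for the example. Like the nullif() version, it returns None when either delimiter is missing.

```python
def between(value, start="E", end="B"):
    """Return the text between the first `start` and the first `end`, or None if either is missing."""
    p1 = value.find(start)  # -1 when not found (CHARINDEX returns 0 instead)
    p2 = value.find(end)
    if p1 == -1 or p2 == -1:
        return None
    # Like the SQL above, this assumes `end` does not occur before `start`;
    # if it does, the slice is simply empty.
    return value[p1 + 1:p2]


for config_id in ["6E15B1P", "999E999B1P", "23E6ZZZ"]:
    print(config_id, between(config_id))
```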
SUBSTRING(config_id,PATINDEX('%E%',config_id)+1,PATINDEX('%B%',config_id)-PATINDEX('%E%',config_id)-1)
As in:
WITH dat
AS
(
SELECT config_id
FROM (VALUES ('1E30B1P')) t(config_id)
)
SELECT SUBSTRING(config_id,PATINDEX('%E%',config_id)+1,PATINDEX('%B%',config_id)-PATINDEX('%E%',config_id)-1)
FROM dat
A cased substring of a left could be enough.
select *
, CASE
WHEN [CONFIG_ID] LIKE '%E%B%'
THEN SUBSTRING(LEFT([CONFIG_ID], CHARINDEX('B',[CONFIG_ID],CHARINDEX('E',[CONFIG_ID]))),
CHARINDEX('E',[CONFIG_ID]), LEN([CONFIG_ID]))
END AS [CONFIG_EB]
from Your_Table
CONFIG_ID CONFIG_EB
6E15B1P E15B
999E999B1P E999B
1E3B1P E3B
1E30B1P E30B
5E24B1P E24B
23E6B1P E6B
23E678 null
236789 null
23B456 null

sql extract rightmost number in string and increment

I have transaction codes like
"A0004", "1B2005","20CCCCCCC21"
I need to extract the rightmost number and increment the transaction code by one
"AA0004"----->"AA0005"
"1B2005"------->"1B2006"
"20CCCCCCCC21"------>"20CCCCCCCC22"
in SQL Server 2012.
The length of the string is unknown.
The rightmost characters are always a number.
Dealing with strings and numbers of unknown length is out of my league.
Some logic is always missing in my attempts, for example:
LEFT(@a,2)+RIGHT('000'+CONVERT(NVARCHAR,CONVERT(INT,SUBSTRING(SUBSTRING(@a,2,4),2,3))+1),3)
First, I want to be clear about this: I totally agree with the comments to the question from a_horse_with_no_name and Jeroen Mostert.
You should be storing one data point per column, period.
Having said that, I do realize that a lot of times the database structure can't be changed - so here's one possible way to get that calculation for you.
First, create and populate sample table (Please save us this step in your future questions):
DECLARE @T AS TABLE
(
col varchar(100)
);
INSERT INTO @T (col) VALUES
('A0004'),
('1B2005'),
('1B2000'),
('1B00'),
('20CCCCCCC21');
(I've added a couple of strings as edge cases you didn't mention in the question)
Then, using a couple of cross apply to minimize code repetition, I came up with this:
SELECT col,
LEFT(col, LEN(col) - LastCharIndex + 1) +
REPLICATE('0', LEN(NumberString) - LEN(CAST(NumberString as int))) +
CAST((CAST(NumberString as int) + 1) as varchar(100)) As Result
FROM @T
CROSS APPLY
(
SELECT PATINDEX('%[^0-9]%', Reverse(col)) As LastCharIndex
) As Idx
CROSS APPLY
(
SELECT RIGHT(col, LastCharIndex - 1) As NumberString
) As NS
Results:
col Result
A0004 A0005
1B2005 1B2006
1B2000 1B2001
1B00 1B01
20CCCCCCC21 20CCCCCCC22
The LastCharIndex represents the position of the last non-digit char in the string, counted from the end (it is found as the first non-digit char in the reversed string).
The NumberString represents the number to increment, as a string (to preserve the leading zeroes if they exist).
From there, it's simply taking the left part of the string (that is, up until the number), and concatenate it to a newly calculated number string, using Replicate to pad the result of addition with the exact number of leading zeroes the original number string had.
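To cross-check results outside SQL Server, the same idea (isolate the trailing digits, add one, restore the zero padding) can be sketched in Python. The function name increment_trailing_number is made up for this example, and the padding rule is an assumption: the sketch pads back to the original width of the digit run and lets the number grow when it overflows that width.

```python
import re


def increment_trailing_number(code):
    """Increment the run of digits at the end of `code`, keeping its zero padding."""
    match = re.search(r"\d+$", code)
    if match is None:
        return code  # no trailing number, nothing to increment
    digits = match.group()
    # zfill pads with leading zeros up to the original width of the digit run.
    incremented = str(int(digits) + 1).zfill(len(digits))
    return code[:match.start()] + incremented


for code in ["A0004", "1B2005", "1B2000", "1B00", "20CCCCCCC21"]:
    print(code, increment_trailing_number(code))
```

For the five sample rows this gives the same values as the Results table above. A code such as 'A99' becomes 'A100', because the number no longer fits the original width.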
Try This
DECLARE @test nvarchar(1000) ='"A0004", "1B2005","20CCCCCCC21"'
DECLARE @Temp AS TABLE (ID INT IDENTITY,Data nvarchar(1000))
INSERT INTO @Temp
SELECT @test
;WITH CTE
AS
(
SELECT Id,LTRIM(RTRIM((REPLACE(Split.a.value('.' ,' nvarchar(max)'),'"','')))) AS Data
,RIGHT(LTRIM(RTRIM((REPLACE(Split.a.value('.' ,' nvarchar(max)'),'"','')))),1)+1 AS ReqData
FROM
(
SELECT ID,
CAST ('<S>'+REPLACE(Data,',','</S><S>')+'</S>' AS XML) AS Data
FROM @Temp
) AS A
CROSS APPLY Data.nodes ('S') AS Split(a)
)
SELECT CONCAT('"'+Data+'"','-------->','"'+CONCAT(LEFT(Data,LEN(Data)-1),CAST(ReqData AS VARCHAR))+'"') AS ExpectedResult
FROM CTE
Result
ExpectedResult
-----------------
"A0004"-------->"A0005"
"1B2005"-------->"1B2006"
"20CCCCCCC21"-------->"20CCCCCCC22"
STUFF(@X
,LEN(@X)-CASE PATINDEX('%[A-Z]%',REVERSE(@X)) WHEN 0 THEN LEN(@X) ELSE PATINDEX('%[A-Z]%',REVERSE(@X))-1 END+1
,LEN(((RIGHT(@X,CASE PATINDEX('%[A-Z]%',REVERSE(@X)) WHEN 0 THEN LEN(@X) ELSE PATINDEX('%[A-Z]%',REVERSE(@X))-1 END)/@N)+1)*@N)
,((RIGHT(@X,CASE PATINDEX('%[A-Z]%',REVERSE(@X)) WHEN 0 THEN LEN(@X) ELSE PATINDEX('%[A-Z]%',REVERSE(@X))-1 END)/@N)+1)*@N)
works on number-only strings
99 becomes 100
mod(@N) increments

SQL Summing digits of a number

i'm using presto. I have an ID field which is numeric. I want a column that adds up the digits within the id. So if ID=1234, I want a column that outputs 10 i.e 1+2+3+4.
I could use substring to extract each digit and sum it but is there a function I can use or simpler way?
You can combine regexp_extract_all from @akuhn's answer with the lambda support recently added to Presto. That way you don't need to unnest. The code would be really self-explanatory if it weren't for the casts to and from varchar:
presto> select
reduce(
regexp_extract_all(cast(x as varchar), '\d'), -- split into digits array
0, -- initial reduction element
(s, x) -> s + cast(x as integer), -- reduction function
s -> s -- finalization
) sum_of_digits
from (values 1234) t(x);
sum_of_digits
---------------
10
(1 row)
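If it helps to see the reduce step outside Presto, here is the same fold written as a small Python sketch: extract the digits with a regular expression, then reduce them starting from 0. The function name sum_of_digits is only illustrative.

```python
import re
from functools import reduce


def sum_of_digits(x):
    """Sum the decimal digits of x, mirroring regexp_extract_all + reduce."""
    digits = re.findall(r"\d", str(x))  # split into a list of single-digit strings
    # Fold over the digits, starting from 0, casting each one back to int.
    return reduce(lambda s, d: s + int(d), digits, 0)


print(sum_of_digits(1234))
```

Because only digit characters are extracted, a sign on a negative ID is ignored.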
If I'm reading your question correctly you want to avoid having to hardcode a substring grab for each numeral in the ID, like substring (ID,1,1) + substring (ID,2,1) + ...substring (ID,n,1). Which is inelegant and only works if all your ID values are the same length anyway.
What you can do instead is use a recursive CTE. Doing it this way works for ID fields with variable value lengths too.
Disclaimer: This does still technically use substring, but it does not do the clumsy hardcode grab
WITH recur (ID, place, ID_sum)
AS
(
SELECT ID, 1 , CAST(substring(CAST(ID as varchar),1,1) as int)
FROM SO_rbase
UNION ALL
SELECT ID, place + 1, ID_sum + substring(CAST(ID as varchar),place+1,1)
FROM recur
WHERE len(ID) >= place + 1
)
SELECT ID, max(ID_SUM) as ID_sum
FROM recur
GROUP BY ID
First use REGEXP_EXTRACT_ALL to split the string. Then use CROSS JOIN UNNEST GROUP BY to group the extracted digits by their number and sum over them.
Here,
WITH my_table AS (SELECT * FROM (VALUES ('12345'), ('42'), ('789')) AS a (num))
SELECT
num,
SUM(CAST(digit AS BIGINT))
FROM
my_table
CROSS JOIN
UNNEST(REGEXP_EXTRACT_ALL(num,'\d')) AS b (digit)
GROUP BY
num
;

Remove ASCII Extended Characters 128 onwards (SQL)

Is there a simple way to remove extended ASCII characters from a varchar(max)? I want to remove all characters with a code of 128 or above, e.g. ù, ç, Ä.
I have tried this solution and it's not working. I think it's because they are still valid ASCII characters?
How do I remove extended ASCII characters from a string in T-SQL?
Thanks
The linked solution uses a loop, which is something you should avoid if possible.
My solution is completely inlineable; it's easy to create a UDF (or maybe even better: an inline TVF) from it.
The idea: Create a set of running numbers (here it's limited by the count of objects in sys.objects, but there are tons of examples of how to create a numbers tally on the fly). In the second CTE the strings are split into single characters. The final select comes back with the cleaned string.
DECLARE @tbl TABLE(ID INT IDENTITY, EvilString NVARCHAR(100));
INSERT INTO @tbl(EvilString) VALUES('ËËËËeeeeËËËË'),('ËaËËbËeeeeËËËcË');
WITH RunningNumbers AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Nmbr
FROM sys.objects
)
,SingleChars AS
(
SELECT tbl.ID,rn.Nmbr,SUBSTRING(tbl.EvilString,rn.Nmbr,1) AS Chr
FROM @tbl AS tbl
CROSS APPLY (SELECT TOP(LEN(tbl.EvilString)) Nmbr FROM RunningNumbers) AS rn
)
SELECT ID,EvilString
,(
SELECT '' + Chr
FROM SingleChars AS sc
WHERE sc.ID=tbl.ID AND ASCII(Chr)<128
ORDER BY sc.Nmbr
FOR XML PATH('')
) AS GoodString
FROM @tbl As tbl
The result
1 ËËËËeeeeËËËË eeee
2 ËaËËbËeeeeËËËcË abeeeec
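For comparison, if the cleanup can run in application code instead, the "split into single characters, keep those below 128, glue them back together" idea is a one-liner. A minimal Python sketch, with strip_non_ascii as a made-up name:

```python
def strip_non_ascii(text):
    """Keep only characters whose code point is below 128."""
    return "".join(ch for ch in text if ord(ch) < 128)


print(strip_non_ascii("ËaËËbËeeeeËËËcË"))
```

Applied to the two sample strings above, this produces the same cleaned values as the GoodString column.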
Here is another answer from me where this approach is used to replace all special characters with secure characters to get plain latin

Select a portion of a comma delimited string in DB2/DB2400

I need to select a value within a comma delimited string using only SQL. Is this possible?
Data
A B C
1 Luigi Apple,Banana,Pineapple,,Citrus
I need to select specifically the 2nd item in column C, in this case banana. I need help. I cannot create new SQL functions, I can only use SQL. This is the as400 so the SQL is somewhat old tech.
Update..
With help from @Sandeep we were able to come up with
SELECT xmlcast(xmlquery('$x/Names/Name[2]' passing xmlparse(document CONCAT(CONCAT('<?xml version="1.0" encoding="UTF-8" ?><Names><Name>',REPLACE(ODWDATA,',','</Name><Name>')),'</Name></Names>')) as "x") as varchar(1000)) FROM ACL00
I'm getting this error
Keyword PASSING not expected. Valid tokens: ) ,.
New update. Problem solved by using UDF of Oracle's INSTR
I'm assuming db2 which I don't use, so the following syntax may not be bang on but the approach works.
In Oracle I'd use INSTR() and SUBSTR(), Google suggests LOCATE() and SUBSTR() for db2
Use LOCATE to get the position of the first comma (in Db2 the search string is the first argument), and use that value in SUBSTR to grab the end of YourColumn starting after the first comma
SUBSTR(YourColumn, LOCATE(',', YourColumn) + 1)
You started with "Apple,Banana,Pineapple,,Citrus", you should now have "Banana,Pineapple,,Citrus", so we use LOCATE and SUBSTR again on the string returned above.
SUBSTR(SUBSTR(YourColumn, LOCATE(',', YourColumn) + 1), 1, LOCATE(',', SUBSTR(YourColumn, LOCATE(',', YourColumn) + 1)) - 1)
First SUBSTR is getting the right hand side of the string so we only need a start position parameter, second SUBSTR is grabbing the left side of the string so we need two, the start position and the length to return.
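The locate-then-substring steps generalise to any position: skip n-1 commas, then take everything up to the next comma. Here is a rough Python sketch of that loop, useful for checking what a query should return. The function name nth_item is hypothetical, and find() plays the role of LOCATE with a start position.

```python
def nth_item(csv, n):
    """Return the n-th (1-based) item of a comma-delimited string, or None if there is none."""
    start = 0
    for _ in range(n - 1):
        comma = csv.find(",", start)  # like LOCATE(',', csv, start), but 0-based
        if comma == -1:
            return None  # fewer than n items
        start = comma + 1
    end = csv.find(",", start)
    # The last item has no trailing comma, so take the rest of the string.
    return csv[start:] if end == -1 else csv[start:end]


print(nth_item("Apple,Banana,Pineapple,,Citrus", 2))
```

Empty items are preserved, so position 4 of the sample string is an empty string rather than 'Citrus'.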
If you only want the 2nd item, you can cast the string to XML and pick the node by position (T-SQL):
DECLARE @TABLE TABLE
(
A INT,
B VARCHAR(100),
C VARCHAR(100)
)
DECLARE @NTH INT = 2 -- 1-based position of the item to return
INSERT INTO @TABLE VALUES (1,'Luigi','Apple,Banana,Pineapple,,Citrus')
SELECT REPLACE(REPLACE(CAST(CAST('<Name>'+ REPLACE(C,',','</Name><Name>') +'</Name>' AS XML).query('/Name[sql:variable("@NTH")]') AS VARCHAR(1000)),'<Name>',''),'</Name>','') FROM @TABLE
I am answering my own question now. It is impossible to do this with the built-in functions within AS400.
You have to create a UDF that mimics Oracle's INSTR.
Enter this within STRSQL; it will create a new function called INSTRB
CREATE FUNCTION INSTRB (C1 VarChar(4000), C2 VarChar(4000), N integer, M integer)
RETURNS Integer
SPECIFIC INSTRBOracleBase
LANGUAGE SQL
CONTAINS SQL
NO EXTERNAL ACTION
DETERMINISTIC
BEGIN ATOMIC
DECLARE Pos, R, C2L Integer;
SET C2L = LENGTH(C2);
IF N > 0 THEN
SET (Pos, R) = (N, 0);
WHILE R < M AND Pos > 0 DO
SET Pos = LOCATE(C2,C1,Pos);
IF Pos > 0 THEN
SET (Pos, R) = (Pos + 1, R + 1);
END IF;
END WHILE;
RETURN (Pos - 1)*(1-SIGN(M-R));
ELSE
SET (Pos, R) = (LENGTH(C1)+N, 0);
WHILE R < M AND Pos > 0 DO
IF SUBSTR(C1,Pos,C2L) = C2 THEN
SET R = R + 1;
END IF;
SET Pos = Pos - 1;
END WHILE;
RETURN (Pos + 1)*(1-SIGN(M-R));
END IF;
END
Then to select the nth delimited value within a comma delimited string... in this case the 14th
use this query utilizing the new function
SELECT SUBSTRING(C,INSTRB(C,',',1,13)+1,INSTRB(C,',',1,14)-INSTRB(C,',',1,13)-1) FROM TABLE
A much prettier solution IMO would be to encapsulate a Recursive Common Table Expression (recursive CTE aka RCTE) of the data from the column C to generate a result TABLE [i.e. a User Defined Table Function (a Table UDF aka UDTF)] then use a Scalar Subselect to choose which effective record\row number.
select
a
, b
, ( select S.token_vc
from table( split_tokens(c) ) as S
where S.token_nbr = 2
) as "2nd Item of column C"
from The_Table /* in OP described with columns a,b,c but no DDL */
Yet prettier would be to make the result of that same RCTE a scalar value, so as to allow being invoked simply as a Scalar UDF with the effective row number [as another argument] defining specifically which element to select.
select
a
, b
, split_tokens(c, 2) as "2nd Item of column C"
from The_Table /* in OP described with columns a,b,c but no DDL */
The latter could be more efficient, limiting the row-data produced by the RCTE, to only the desired numbered token and those preceding numbered tokens. I can not comment on the efficiency with regard to impacts on CPU and storage as contrasted with any of the other answers offered, but my own experience with the temporary-storage implementation and the overall quickness of the RCTE results has been positive especially when other row selection limits the number of derived-table results that must be produced for the overall query request.
The UDF [and\or UDTF and the RCTE that implements them] is left as an exercise for the reader; mostly, because I do not have a system on a release that has support for recursive table expressions. If asked [e.g. in a comment to this answer], I could provide untested code source.
I have found the locate_in_string function to work very well in this case.
select substr(
c,
locate_in_string(c, ',')+1,
locate_in_string(c, ',', locate_in_string(c, ',')+1) - locate_in_string(c, ',')-1
) as fruit2
from ACL00 for read only with ur;