Needing to parse out data - sql

I am trying to parse out certain data from a string and I am having issues.
Here is the string:
1=BETA.1.0^2=175^3=812^4=R^5=N^9=1^12=1^13=00032^14=REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR.^10=107~117~265~1114~3143~3505~3506~3513~5717^11=SA16~1~WY~WY~A~S~20100210~001~SE62^-omitted due to existing Rep Not Found
I need to return this "REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR."
Here is my query SELECT CONVERT(VARCHAR(5000),CHARINDEX('14=',Column))FROM Table

If you're parsing, can we assume that you don't know what might come after the '^14=', but you need to capture whatever does? So searching for a particular string won't work because anything could come after '^14='. The best approach is to identify the longest reliable specific string that gives you a "foothold" to find the data you're looking for. What you don't want to do is accidentally capture the wrong data if the '^14=' appears more than once in your string. It looks like the '^' is your delimiter, since I don't see one at the start of the string. So you were actually on the right track, you just need to use SUBSTRING as a commenter mentioned. You also need to identify a marker for the end of the error message, which looks like it might be the next occurring '^', correct? Check several samples to be sure of this, and make sure the end marker doesn't at any point exist before your start marker or you'll get an error.
SELECT CAST((SUBSTRING(Column,CHARINDEX('14=',Column,0),CHARINDEX('^',Column,CHARINDEX('14=',Column,0) + 1) - CHARINDEX('14=',Column,0))) AS VARCHAR(5000)) FROM Table
You may need to increment or decrement the start position and end position by doing a +1 or -1 to fully capture your error message. But this should dynamically grab any length error message provided you are positive of your starting and ending markers.
I also have here a table-valued parsing function, where you would pass it the string and the '^' and it will return a table of data with not only the 14=, but everything.
CREATE function [dbo].[fn_SplitStringByDelimeter]
(
#list nvarchar(8000)
,#splitOn char(1)
)
returns #rtnTable table
(
id int identity(1,1)
,value nvarchar(100)
)
as
begin
declare #index int
declare #string nvarchar(4000)
select #index = 1
if len(#list) < 1 or #list is null return
--
while #index!= 0
begin
set #index = charindex(#splitOn,#list)
if #index!=0
set #string = left(#list,#index - 1)
else
set #string = #list
if(len(#string)>0)
insert into #rtnTable(value) values(#string)
--
set #list = right(#list,len(#list) - #index)
if len(#list) = 0 break
end
return
end

It sounds like you're trying to get the value of argument 14. This should do it:
select substring(
someData
, charindex('^14=',someData) + 4
, charindex('^',someData, charindex('^14=',someData) + 4) - charindex('^14=',someData) - 4
) errorMessage
from myData
where charindex('^14=',someData) > 0
and charindex('^',someData, charindex('^14=',someData) + 4) > 0
Try it here: http://sqlfiddle.com/#!18/22f23/2
This gets a substring of the given input.
The substring starts at the first character after the string ^14=; i.e. we get the index of ^14= in the string, then add 4 to it to skip over the matched characters themselves.
The substring ends at the first ^ character after the one in ^14=. We get the index of that character, then subtract the starting position from it to get the length of the desired output.
Caveats: If there is no parameter (^) after ^14= this will not work. Equally if there is no ^14= (even if the string starts 14=) this will not work. From the information available that's OK; but if this is a concern please say and we can provide something to handle that more complex scenario.
Code to create table & populate demo data
create table myData (someData nvarchar(256))
insert myData (someData)
values ('1=BETA.1.0^2=175^3=812^4=R^5=N^9=1^12=1^13=00032^14=REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR.^10=107~117~265~1114~3143~3505~3506~3513~5717^11=SA16~1~WY~WY~A~S~20100210~001~SE62^-omitted due to existing Rep Not Found')
, ('1xx^14=something else.^10=xx')

You could try to use a Case When statement with wildcards to find the value that you want.
Example:
SELECT
CASE
WHEN x LIKE '%REP Not Found%'
THEN 'REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR'
ELSE
''
END AS x
FROM
#T1

You could use this query (assuming MySQL database):
-- item is the column that contains the string
select SUBSTR(item, LOCATE('REP',item), LOCATE('REPRGR.',item) + LENGTH('REPRGR.') - LOCATE('REP', item)) info_msg from Table;
Illustration:
create table parsetest (item varchar(5000));
insert into parsetest values('1=BETA.1.0^2=175^3=812^4=R^5=N^9=1^12=1^13=00032^14=REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR.^10=107~117~265~1114~3143~3505~3506~3513~5717^11=SA16~1~WY~WY~A~S~20100210~001~SE62^-omitted due to existing Rep Not Found');
select * from parsetest;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| item |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1=BETA.1.0^2=175^3=812^4=R^5=N^9=1^12=1^13=00032^14=REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR.^10=107~117~265~1114~3143~3505~3506~3513~5717^11=SA16~1~WY~WY~A~S~20100210~001~SE62^-omitted due to existing Rep Not Found |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
select SUBSTR(item, LOCATE('REP',item), LOCATE('REPRGR.',item) + LENGTH('REPRGR.') - LOCATE('REP', item)) info_msg from parsetest;
+------------------------------------------------------+
| info_msg |
+------------------------------------------------------+
| REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR. |
+------------------------------------------------------+

Related

How can we read a varchar column, take the integer part out and add new column incrementing that integer part using script

I need to write a SCRIPT for below scenario:
We have a column X with rows value for this column X as X01,X02,X03,X04........
The problem I am stuck with is that I needed to add another row to this table based on the value of the last row that is X04, Well I am able to identify the logic that I need to work which is given below:
I need to read value X04
Take the integer part 04
Increment by 1 => 05
Save column value as X05
I am able to pass with the 1st step which is not very hard. The problem that I am facing is the next steps. I have researched and tried quite a lot commands but none worked.
Any help is highly appreciated. Thanks.
You seem to be describing:
select concat(left(max(x), 1),
right(concat('00', try_convert(int, right(max(x), 2)) + 1), 2)
from t;
This is doing the following:
Taking the left most character.
Converting the two right characters to a number and adding one.
Converting that back to a zero-padded string.
Here is a db<>fiddle.
Now: That you want to increment a string value seems broken. You should just use an identity column or sequence to assign a number. You can format the value as a string when you query the table -- or use a computed column to store that.
Try below Script
CREATE TABLE #table (x varchar(20))
INSERT INTO #table VALUES('X01'),('X02'),('X03'),('X04')
DECLARE #maxno NVARCHAR(20)
DECLARE #maxstring NVARCHAR(20)
DECLARE #finalno NVARCHAR(20)
DECLARE #loopminno INT =1 -- you can change based on the requirement
DECLARE #loopmaxno INT =10 -- how many number we want to increment
WHILE #loopminno < #loopmaxno
BEGIN
select #maxno = MAX(CAST(SUBSTRING(x, PATINDEX('%[0-9]%', x), 100) as INT))
, #maxstring = MAX(SUBSTRING(x, 1, PATINDEX('%[0-9]%',x)-1))
from #table
where PATINDEX('%[1-9]%',x)>0
SELECT #finalno = #maxstring + CASE WHEN CAST(#maxno AS INT)<9 THEN '0' ELSE '' END + CAST(#maxno+1 AS VARCHAR(20))
INSERT INTO #table
SELECT #finalno
SET #loopminno = #loopminno+1
END

Extract number between two substrings in sql

I had a previous question and it got me started but now I'm needing help completing this. Previous question = How to search a string and return only numeric value?
Basically I have a table with one of the columns containing a very long XML string. There's a number I want to extract near the end. A sample of the number would be this...
<SendDocument DocumentID="1234567">true</SendDocument>
So I want to use substrings to find the first part = true so that Im only left with the number.
What Ive tried so far is this:
SELECT SUBSTRING(xml_column, CHARINDEX('>true</SendDocument>', xml_column) - CHARINDEX('<SendDocument',xml_column) +10087,9)
The above gives me the results but its far from being correct. My concern is that, what if the number grows from 7 digits to 8 digits, or 9 or 10?
In the previous question I was helped with this:
SELECT SUBSTRING(cip_msg, CHARINDEX('<SendDocument',cip_msg)+26,7)
and thats how I got started but I wanted to alter so that I could subtract the last portion and just be left with the numbers.
So again, first part of the string that contains the digits, find the two substrings around the digits and remove them and retrieve just the digits no matter the length.
Thank you all
You should be able to setup your SUBSTRING() so that both the starting and ending positions are variable. That way the length of the number itself doesn't matter.
From the sound of it, the starting position you want is right After the "true"
The starting position would be:
CHARINDEX('<SendDocument DocumentID=', xml_column) + 25
((adding 25 because I think CHARINDEX gives you the position at the beginning of the string you are searching for))
Length would be:
CHARINDEX('>true</SendDocument>',xml_column) - CHARINDEX('<SendDocument DocumentID=', xml_column)+25
((Position of the ending text minus the position of the start text))
So, how about something along the lines of:
SELECT SUBSTRING(xml_column, CHARINDEX('<SendDocument DocumentID=', xml_column)+25,(CHARINDEX('>true</SendDocument>',xml_column) - CHARINDEX('<SendDocument DocumentID=', xml_column)+25))
Have you tried working directly with the xml type? Like below:
DECLARE #TempXmlTable TABLE
(XmlElement xml )
INSERT INTO #TempXmlTable
select Convert(xml,'<SendDocument DocumentID="1234567">true</SendDocument>')
SELECT
element.value('./#DocumentID', 'varchar(50)') as DocumentID
FROM
#TempXmlTable CROSS APPLY
XmlElement.nodes('//.') AS DocumentID(element)
WHERE element.value('./#DocumentID', 'varchar(50)') is not null
If you just want to work with this as a string you can do the following:
DECLARE #SearchString varchar(max) = '<SendDocument DocumentID="1234567">true</SendDocument>'
DECLARE #Start int = (select CHARINDEX('DocumentID="',#SearchString)) + 12 -- 12 Character search pattern
DECLARE #End int = (select CHARINDEX('">', #SearchString)) - #Start --Find End Characters and subtract start position
SELECT SUBSTRING(#SearchString,#Start,#End)
Below is the extended version of parsing an XML document string. In the example below, I create a copy of a PLSQL function called INSTR, the MS SQL database does not have this by default. The function will allow me to search strings at a designated starting position. In addition, I'm parsing a sample XML string into a variable temp table into lines and only looking at lines that match my search criteria. This is because there may be many elements with the words DocumentID and I'll want to find all of them. See below:
IF EXISTS (select * from sys.objects where name = 'INSTR' and type = 'FN')
DROP FUNCTION [dbo].[INSTR]
GO
CREATE FUNCTION [dbo].[INSTR] (#String VARCHAR(8000), #SearchStr VARCHAR(255), #Start INT, #Occurrence INT)
RETURNS INT
AS
BEGIN
DECLARE #Found INT = #Occurrence,
#Position INT = #Start;
WHILE 1=1
BEGIN
-- Find the next occurrence
SET #Position = CHARINDEX(#SearchStr, #String, #Position);
-- Nothing found
IF #Position IS NULL OR #Position = 0
RETURN #Position;
-- The required occurrence found
IF #Found = 1
BREAK;
-- Prepare to find another one occurrence
SET #Found = #Found - 1;
SET #Position = #Position + 1;
END
RETURN #Position;
END
GO
--Assuming well formated xml
DECLARE #XmlStringDocument varchar(max) = '<SomeTag Attrib1="5">
<SendDocument DocumentID="1234567">true</SendDocument>
<SendDocument DocumentID="1234568">true</SendDocument>
</SomeTag>'
--Split Lines on this element tag
DECLARE #SplitOn nvarchar(25) = '</SendDocument>'
--Let's hold all lines in Temp variable table
DECLARE #XmlStringLines TABLE
(
Value nvarchar(100)
)
While (Charindex(#SplitOn,#XmlStringDocument)>0)
Begin
Insert Into #XmlStringLines (value)
Select
Value = ltrim(rtrim(Substring(#XmlStringDocument,1,Charindex(#SplitOn,#XmlStringDocument)-1)))
Set #XmlStringDocument = Substring(#XmlStringDocument,Charindex(#SplitOn,#XmlStringDocument)+len(#SplitOn),len(#XmlStringDocument))
End
Insert Into #XmlStringLines (Value)
Select Value = ltrim(rtrim(#XmlStringDocument))
--Now we have a table with multple lines find all Document IDs
SELECT
StartPosition = CHARINDEX('DocumentID="',Value) + 12,
--Now lets use the INSTR function to find the first instance of '">' after our search string
EndPosition = dbo.INSTR(Value,'">',( CHARINDEX('DocumentID="',Value)) + 12,1),
--Now that we know the start and end lets use substring
Value = SUBSTRING(value,(
-- Start Position
CHARINDEX('DocumentID="',Value)) + 12,
--End Position Minus Start Position
dbo.INSTR(Value,'">',( CHARINDEX('DocumentID="',Value)) + 12,1) - (CHARINDEX('DocumentID="',Value) + 12))
FROM
#XmlStringLines
WHERE Value like '%DocumentID%' --Only care about lines with a document id

Replace every alpha character with itself + wildcard in string SQL Server

My goal is to create a query that will search for results related to a specific keyword.
Say in a database we had the word cat.
Regardless of if the user types C a t, C.A.T. or Cat I want to find a result related to the search as long as the alpha numeric characters are in the correct sequence that is all that matters
Say in the database we have these 4 records
cat
c/a/t
c.a.t
c. at
If the user types in C#$*(&A T I'd like to get all 4 results.
What I have written so far in my query is a function that strips any non-alphanumeric characters from the input string.
What can I do to replace each alphanumeric character with itself and add a wildcard at the end?
For every alpha character my input would look similar to this
C%[^a-zA-Z0-9]%A%[^a-zA-Z0-9]%T%[^a-zA-Z0-9]%
Actually, that search string will return only one record from this table: the row with 'c.a.t '.
This is because the expression C%[^a-zA-Z0-9]%A does not mean there can't be any alpha-numeric chars between C and A.
What it actually means is there should be at least one non alpha-numeric value between C and A.
Moreover, it will return incorrect values as well - a value like 'c u a s e t ' will be returned.
You need to change your where clause to something like this:
WHERE column LIKE '%C%A%T%'
AND column NOT LIKE '%C%[a-zA-Z0-9]%A%[a-zA-Z0-9]%T%'
This way, if you have cat in the correct order, the first row will resolve to true, and if there are no other alpha-numeric chars between c, a, and t the second row will resolve to true.
Here is a test script, where you can see for yourself what I mean:
DECLARE #T AS TABLE
(
a varchar(20)
)
INSERT INTO #T VALUES
('cat'),
('c/a/t'),
('c.a.t '),
('c. at'),
('c u a s e t ')
-- Incorrect where clause
SELECT *
FROM #T
WHERE a LIKE 'C%[^a-zA-Z0-9]%A%[^a-zA-Z0-9]%T%[^a-zA-Z0-9]%'
-- correct where clause
SELECT *
FROM #T
WHERE a LIKE '%C%A%T%'
AND a NOT LIKE '%C%[a-zA-Z0-9]%A%[a-zA-Z0-9]%T%'
You can also see it in action in this link.
And since I had some spare time, here is a script to create both the like and the not like patterns from the input string:
DECLARE #INPUT varchar(100) = '#*# c %^&# a ^&*$&* t (*&(%!##$'
DECLARE #Index int = 1,
#CurrentChar char(1),
#Like varchar(100),
#NotLike varchar(100) = '%'
WHILE #Index < LEN(#Input)
BEGIN
SET #CurrentChar = SUBSTRING(#INPUT, #Index, 1)
IF PATINDEX('%[^a-zA-Z0-9]%', #CurrentChar) = 0
BEGIN
SET #NotLike = #NotLike + #CurrentChar + '%[a-zA-Z0-9]%'
END
SET #Index = #Index + 1
END
SELECT #NotLike = LEFT(#NotLike, LEN(#NotLike) - 12),
#Like = REPLACE(#NotLike, '%[a-zA-Z0-9]%', '%')
SELECT *
FROM #T
WHERE a LIKE #Like
AND a NOT LIKE #NotLike
You can recursively go through your (cleaned) search string and to each letter add the expression you would like. In my example #builtString should be what you would like to use further on, if I understood correctly.
declare #cleanSearch as nvarchar(10) = 'CAT'
declare #builtString as nvarchar(100) = ''
WHILE LEN(#cleanSearch) > 0 -- loop until you deplete the search string
BEGIN
SET #builtString = #builtString + substring(#cleanSearch,1,1) + '%[^a-zA-Z0-9]%' -- append the letter plus regular expression
SET #cleanSearch = right(#cleanSearch, len(#cleanSearch) - 1) -- remove first letter of the search string
END
SELECT #builtString --will look like C%[^a-zA-Z0-9]%A%[^a-zA-Z0-9]%T%[^a-zA-Z0-9]%
SELECT #cleanSearch --#cleanSearch is now empty

Extract e-mail address from string

I have table called Entities with the column CustomData.
I need to extract the email address from each row.
Also if value is null I need to to show as null.
Sample rows from CustomData:
Id CustomData Name
273 [{"Name":"Customer","Value":"test customer"},{"Name":"Address","Value":null},{"Name":"Email","Value":null},{"Name":"Company Name","Value":null},{"Name":"Other Phone","Value":null}] 2323123213
274 [{"Name":"Customer","Value":"Cash Sale"},{"Name":"Address","Value":null},{"Name":"Email","Value":"test#outlook.com"},{"Name":"Company Name","Value":null},{"Name":"Other Phone","Value":null}] 2222222222
This is the string i will be using to update my system.
I have previously achieved selecting the phone number form this same data but it was a fixed length. I can't seem to pull the e-mail address.
I will post a couple of the different methods I have tried so far once im back at my PC
Well, in such a denormalized data your only option is to parse it and try to get email. Most elegant way - is to use json parser, but it is not awailable in current versions of sql server, so you have to parse it manually.
Assuming each record for email starts with {"Name":"Email","Value":, you can do it in a few steps:
Find position of {"Name":"Email","Value": in your string.
Find first occurence of } in the right remainder of the string.
Get substring in between.
Check if it is string equals to 'null' - then return null, otherwise return string itself.
So it can be done like in this snippet:
declare #data nvarchar(max), #pattern nvarchar(max)
select #data = '[{"Name":"Customer","Value":"test customer"},
{"Name":"Address","Value":null},
{"Name":"Email","Value":null},
{"Name":"Company Name","Value":null},
{"Name":"Other Phone","Value":null}]'
select #pattern = '{"Name":"Email","Value":'
select nullif(substring(#data,
charindex(#pattern, #data, 0) + len(#pattern),
charindex('}', #data, charindex(#pattern, #data, 0))
- charindex(#pattern, #data, 0) - len(#pattern)
), 'null')

Looking for a scalar function to find the last occurrence of a character in a string

Table FOO has a column FILEPATH of type VARCHAR(512). Its entries are absolute paths:
FILEPATH
------------------------------------------------------------
file://very/long/file/path/with/many/slashes/in/it/foo.xml
file://even/longer/file/path/with/more/slashes/in/it/baz.xml
file://something/completely/different/foo.xml
file://short/path/foobar.xml
There's ~50k records in this table and I want to know all distinct filenames, not the file paths:
foo.xml
baz.xml
foobar.xml
This looks easy, but I couldn't find a DB2 scalar function that allows me to search for the last occurrence of a character in a string. Am I overseeing something?
I could do this with a recursive query, but this appears to be overkill for such a simple task and (oh wonder) is extremely slow:
WITH PATHFRAGMENTS (POS, PATHFRAGMENT) AS (
SELECT
1,
FILEPATH
FROM FOO
UNION ALL
SELECT
POSITION('/', PATHFRAGMENT, OCTETS) AS POS,
SUBSTR(PATHFRAGMENT, POSITION('/', PATHFRAGMENT, OCTETS)+1) AS PATHFRAGMENT
FROM PATHFRAGMENTS
)
SELECT DISTINCT PATHFRAGMENT FROM PATHFRAGMENTS WHERE POS = 0
I think what you're looking for is the LOCATE_IN_STRING() scalar function. This is what Info Center has to say if you use a negative start value:
If the value of the integer is less than zero, the search begins at
LENGTH(source-string) + start + 1 and continues for each position to
the beginning of the string.
Combine that with the LENGTH() and RIGHT() scalar functions, and you can get what you want:
SELECT
RIGHT(
FILEPATH
,LENGTH(FILEPATH) - LOCATE_IN_STRING(FILEPATH,'/',-1)
)
FROM FOO
One way to do this is by taking advantage of the power of DB2s XQuery engine. The following worked for me (and fast):
SELECT DISTINCT XMLCAST(
XMLQuery('tokenize($P, ''/'')[last()]' PASSING FILEPATH AS "P")
AS VARCHAR(512) )
FROM FOO
Here I use tokenize to split the file path into a sequence of tokens and then select the last of these tokens. The rest is only conversion from SQL to XML types and back again.
I know that the problem from the OP was already solved but I decided to post the following anyway to hopefully help others like me that land here.
I came across this thread while searching for a solution to my similar problem which had the exact same requirement but was for a different kind of database that was also lacking the REVERSE function.
In my case this was for a OpenEdge (Progress) database, which has a slightly different syntax. This made the INSTR function available to me that most Oracle typed databases offer.
So I came up with the following code:
SELECT
SUBSTRING(
foo.filepath,
INSTR(foo.filepath, '/',1, LENGTH(foo.filepath) - LENGTH( REPLACE( foo.filepath, '/', '')))+1,
LENGTH(foo.filepath))
FROM foo
However, for my specific situation (being the OpenEdge (Progress) database) this did not result into the desired behaviour because replacing the character with an empty char gave the same length as the original string. This doesn't make much sense to me but I was able to bypass the problem with the code below:
SELECT
SUBSTRING(
foo.filepath,
INSTR(foo.filepath, '/',1, LENGTH( REPLACE( foo.filepath, '/', 'XX')) - LENGTH(foo.filepath))+1,
LENGTH(foo.filepath))
FROM foo
Now I understand that this code won't solve the problem for T-SQL because there is no alternative to the INSTR function that offers the Occurence property.
Just to be thorough I'll add the code needed to create this scalar function so it can be used the same way like I did in the above examples.
-- Drop the function if it already exists
IF OBJECT_ID('INSTR', 'FN') IS NOT NULL
DROP FUNCTION INSTR
GO
-- User-defined function to implement Oracle INSTR in SQL Server
CREATE FUNCTION INSTR (#str VARCHAR(8000), #substr VARCHAR(255), #start INT, #occurrence INT)
RETURNS INT
AS
BEGIN
DECLARE #found INT = #occurrence,
#pos INT = #start;
WHILE 1=1
BEGIN
-- Find the next occurrence
SET #pos = CHARINDEX(#substr, #str, #pos);
-- Nothing found
IF #pos IS NULL OR #pos = 0
RETURN #pos;
-- The required occurrence found
IF #found = 1
BREAK;
-- Prepare to find another one occurrence
SET #found = #found - 1;
SET #pos = #pos + 1;
END
RETURN #pos;
END
GO
To avoid the obvious, when the REVERSE function is available you do not need to create this scalar function and you can just get the required result like this:
SELECT
SUBSTRING(
foo.filepath,
LEN(foo.filepath) - CHARINDEX('\', REVERSE(foo.filepath))+2,
LEN(foo.filepath))
FROM foo
You could just do it in a single statement:
select distinct reverse(substring(reverse(FILEPATH), 1, charindex('/', reverse(FILEPATH))-1))
from filetable