Splitting variable content in SQL - sql

I have a variable in a stored procedure that contains a string of characters like
[Tag]MESSAGE[/Tag]
I need a way to get the MESSAGE part from within the tags.
Any help would be much appreciated

Note: I have tested it on Oracle RDBMS
A more reliable approach is to use REGEXP_REPLACE.
REGEXP_REPLACE(value, pattern)
Example
SELECT REGEXP_REPLACE(
'<Tag>Message</Tag>',
'\s*</?\w+((\s+\w+(\s*=\s*(".*?"|''.*?''|[^''">\s]+))?)+\s*|\s*)/?>\s*') FROM DUAL;
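If your tags use square brackets instead, replace "<" and ">" in the pattern with "\[" and "\]". The brackets must be escaped because they are regex metacharacters.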
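For the bracket-style tags in the question, the same idea can be sketched outside the database. The Python snippet below only illustrates the pattern logic and is not Oracle syntax; the function name strip_bracket_tags and the simplified pattern are assumptions made for this example.

```python
import re

# Simplified pattern (an assumption for this sketch): an opening or closing
# tag written with square brackets, e.g. [Tag] or [/Tag]. The brackets are
# escaped because [ and ] are regex metacharacters.
BRACKET_TAG = re.compile(r"\[/?\w+\]")


def strip_bracket_tags(value: str) -> str:
    """Remove every [Tag] / [/Tag] marker and return what is left."""
    return BRACKET_TAG.sub("", value)


print(strip_bracket_tags("[Tag]MESSAGE[/Tag]"))  # prints MESSAGE
```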

What you need is this:
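SELECT SUBSTRING(ColumnName, CHARINDEX('html_tag', ColumnName) + LEN('html_tag'), CHARINDEX('html_close_tag', ColumnName) - CHARINDEX('html_tag', ColumnName) - LEN('html_tag')) FROM TableName
Replace html_tag and html_close_tag with the opening and closing tags you want to strip. The third argument is the length of the text between them: the position of the closing tag minus the position just after the opening tag.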
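The start and length arithmetic is easy to get wrong, so here is a minimal Python sketch of the same "find the opening tag, find the closing tag, take what is between" logic. The helper name between_tags is hypothetical, and returning None when a tag is missing is a choice made for this sketch rather than what T-SQL would do.

```python
from typing import Optional


def between_tags(text: str, open_tag: str, close_tag: str) -> Optional[str]:
    """Return the text between the first open_tag and the next close_tag."""
    start = text.find(open_tag)
    if start == -1:
        return None  # opening tag not present
    start += len(open_tag)  # first character after the opening tag
    end = text.find(close_tag, start)
    if end == -1:
        return None  # closing tag not present
    return text[start:end]


print(between_tags("[Tag]MESSAGE[/Tag]", "[Tag]", "[/Tag]"))  # prints MESSAGE
```

Searching for the closing tag only from the end of the opening tag onward keeps the result correct when the tag does not sit at the very start of the string.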

If the column contains only a single tag, a simple call to the substring function should be enough. Otherwise there will always be some point where a regular expression does not suffice, because you fall into the trap of parsing markup with regex (see this legendary StackOverflow answer).

Related

extracting special character from a string in oracle sql

I need a query to remove all alphanumeric characters from a string and give me only special characters.
If string is '##45gr##3' query should give me '####'.
SELECT REGEXP_REPLACE('##45gr##3','[^[:punct:]'' '']', NULL) FROM dual;
The old-fashioned way, with a translate() call:
select translate(upper(txt)
, '.1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ'
, '.') as spec_txt
from t42
/
With a translate() solution it is better to list all the characters we want to exclude. Yes, that is (probably) the larger set, but at least it is well-defined. We don't want to keep revising the code to handle new characters in the input string.
Obviously the regex solution is more compact, and who can deny the elegance of @venkatesh's solution? However, evaluating regular expressions is more expensive than more focused SQL functions, so it's worth knowing alternative ways of doing things, for those cases when performance is an issue.
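The trade-off between a character-mapping function and a regular expression can be sketched in Python as well. This is an illustration only, not Oracle code: the function names are hypothetical, and Python's str.translate takes an explicit delete list, so the '.' placeholder trick from the Oracle translate() call is not needed here.

```python
import re
import string

# Translation table that deletes ASCII letters and digits (the well-defined
# set of characters to exclude) and leaves everything else untouched.
_DELETE_ALNUM = str.maketrans("", "", string.ascii_letters + string.digits)


def specials_translate(text: str) -> str:
    """Character-mapping approach, analogous in spirit to translate()."""
    return text.translate(_DELETE_ALNUM)


def specials_regex(text: str) -> str:
    """Regex approach, analogous in spirit to regexp_replace()."""
    return re.sub(r"[A-Za-z0-9]+", "", text)


print(specials_translate("##45gr##3"))  # prints ####
print(specials_regex("a##45gr##3$."))   # prints ####$.
```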
Everything written in comments is most probably very true (especially the 5th one that talks about exceptions, special cases etc.). My feeling is that Jörg knows more about regular expressions than I'll ever know.
Anyway, to cut a long story short, in its very simple appearance, regarding the question posted ("remove all numbers and letters"), something like this might work:
SQL> select regexp_replace('a##45gr##3$.', '[[:digit:]]+|[[:alpha:]]+', '') result
2 from dual;
RESULT
------
####$.
SQL>

How to select values around .(dot) using sql

I am running below query in Teradata :
sel requesttext from dbc.tables
where tablename='old_employee_table'
Result:
alter table DB_NAME.employee_table,no fallback ;
I want to get below result using SQL:
DB_NAME.employee_table
Requesttext can be:
create set table DB_NAME.employee_table;
The DB name and table name can occur anywhere in the result. Since a . (dot) joins them, I want to split on the . (dot).
Basically I need SQL that returns the values surrounding the . (dot).
I want DBName and Tablename in result.
I'm not a Teradata person, but this should work for both strings given so far, as long as teradata's regexp_substr() supports positive look-behind and positive look-ahead assertions (I might have the Teradata syntax wrong, so a little tweaking may be needed):
SELECT REGEXP_SUBSTR(requesttext, '(?<= )(\w+\.\w+)(?=[,$]?)', 1, 1)
FROM dbc.tables
WHERE tablename='old_employee_table'
See the regex101 example. Hopefully it translates to Teradata easily.
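The regex looks for and returns the words on either side of, and including, the period, when preceded by a space. Note that the trailing lookahead (?=[,$]?) is optional and therefore always succeeds, and that inside a character class $ is a literal dollar sign rather than an end-of-line anchor, so in practice the match simply ends where the word characters end (at the comma or semicolon in the examples).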
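To check the pattern against both sample strings without a Teradata instance, here is a small Python sketch. It keeps the look-behind for the preceding space but omits the trailing look-ahead, on the assumption that it is not needed because \w+ already stops at the comma or semicolon. The function name qualified_name is hypothetical.

```python
import re
from typing import Optional

# word.word preceded by a space; \w+ stops at ',' or ';' on its own.
QUALIFIED_NAME = re.compile(r"(?<= )\w+\.\w+")


def qualified_name(request_text: str) -> Optional[str]:
    """Return the first database.table token, or None if there is none."""
    match = QUALIFIED_NAME.search(request_text)
    return match.group(0) if match else None


print(qualified_name("alter table DB_NAME.employee_table,no fallback ;"))
print(qualified_name("create set table DB_NAME.employee_table;"))
```

Both calls print DB_NAME.employee_table.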
You could do this with either regexp_substr() or strtok().
As Jamie Zawinski said:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
So I would go with the strtok() method. Also I'm lazy and regular expressions are hard.
Function strtok() takes three arguments:
The string being split
The delimiter to split the string
The number of the token to grab.
To get at the <database>.<table> from that string that is returned in your query, we can split by a space, grab the third token, then split that by a comma and grab the first token.
That would look like:
SELECT strtok(strtok(requestText,' ',3),',',1)
FROM dbc.tables
WHERE tablename='old_employee_table'
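The nested strtok() call can be traced with a small Python emulation. This is a sketch under stated assumptions: the Python strtok helper is hypothetical, tokens are numbered from 1, and runs of delimiters are treated as one separator, which may not match Teradata's behaviour in every edge case.

```python
import re
from typing import Optional


def strtok(text: Optional[str], delimiters: str, n: int) -> Optional[str]:
    """Return the n-th (1-based) token of text split on any delimiter char."""
    if text is None:
        return None
    pattern = "[" + re.escape(delimiters) + "]+"
    tokens = [tok for tok in re.split(pattern, text) if tok]
    return tokens[n - 1] if 1 <= n <= len(tokens) else None


alter_stmt = "alter table DB_NAME.employee_table,no fallback ;"
create_stmt = "create set table DB_NAME.employee_table;"

# Third space-separated token, then first comma-separated token.
print(strtok(strtok(alter_stmt, " ", 3), ",", 1))   # DB_NAME.employee_table
print(strtok(strtok(create_stmt, " ", 3), ",", 1))  # table
```

The second call shows a limitation of the fixed token position: in the "create set table ..." form the qualified name is the fourth token, so the token number would have to change for that statement.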

SQL Remove Substring From Query Results

I have a query that is returning data from a database. In a single field there is a rather long text comment with a segment, which is clearly defined with marking tags like !markerstart! and !markerend!. I would like to have a query return with the string segment between the two markers removed (and the markers removed too).
I would normally do this client-side after I get the data back. However, the problem is that the query is an INSERT query that gets its data from a SELECT statement. I don't want the text segment to be stored in the archival/reporting table (working with an OLTP application here), so I need to find a way to get the SELECT statement to return exactly what is to be inserted, which, in this case, means getting the SELECT statement to strip out the unwanted phrase instead of doing it in post-processing client-side.
My only thought is to use some convoluted combination of SUBSTRING, CHARINDEX, and CONCAT, but I'm hoping there is a better way, but, based on this, I don't see how. Anyone have ideas?
Sample:
This is a long string of text in some field in a database that has a segment that needs to be removed. !markerstart! This is the segment that is to be removed. It's length is unknown and variable. !markerend! The part of this field that appears after the marker should remain.
Result:
This is a long string of text in some field in a database that has a segment that needs to be removed. The part of this field that appears after the marker should remain.
SOLUTION USING STUFF:
I really don't like how verbose this is, but I can put it in a function if I really need to. It isn't ideal, but it is easier and faster than a CLR routine.
SELECT STUFF(CAST(Description AS varchar(MAX)), CHARINDEX('!markerstart!', Description), CHARINDEX('!markerend!', Description) + 11 - CHARINDEX('!markerstart!', Description), '') AS Description
FROM MyTable
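The position arithmetic in the STUFF call (start at the opening marker, delete up to and including the closing marker, where + 11 is the length of '!markerend!') can be sketched in Python. The function name remove_marked_segment is hypothetical, and returning the text unchanged when a marker is missing is a choice made for this sketch.

```python
def remove_marked_segment(
    text: str,
    start_marker: str = "!markerstart!",
    end_marker: str = "!markerend!",
) -> str:
    """Delete everything from start_marker through end_marker, inclusive."""
    start = text.find(start_marker)
    if start == -1:
        return text  # no opening marker: nothing to remove
    end = text.find(end_marker, start + len(start_marker))
    if end == -1:
        return text  # no closing marker: nothing to remove
    return text[:start] + text[end + len(end_marker):]


sample = "Keep this. !markerstart! Drop this. !markerend! Keep this too."
print(remove_marked_segment(sample))
```

As in the T-SQL version, the spaces on both sides of the removed segment remain, so the result contains a doubled space at the join. Also, as far as I know, STUFF returns NULL when its start position is 0, so rows that lack the marker may need a CASE guard in the SQL version.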
You may want to consider implementing a CLR user-defined function that returns the parsed data.
The following link demonstrates how to use a CLR UDF RegEx function for pattern matching and data extraction.
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
Regards,
You can use the STUFF function or the REPLACE function and replace the unwanted text with ''.
STUFF(expression, start_pos, number_of_chars, replacement_expression)
Here start_pos and number_of_chars are integers, not quoted strings.

Isolate SQL field using regex

I'm trying to isolate a specific field in a SQL dump file so I can edit it but I'm not having any luck.
The regex I'm using is:
^(?:(?:'[^\r\n']*'|[^,\r\n]*),){6}('[^\r\n']*'|[^,\r\n]*)
Which is supposed to grab the seventh field and place it inside reference 1.
The trouble is that this stumbles whenever it finds a comma inside a text field, and counts the partial match as one of the allowed matches.
Eg. (1, 'Title', 1, 3, '2006-09-29', 'Commas, the bane of my regex', 'This is the target', 2, 4) matches " the bane of my regex'" instead of "'This is the target'".
It might be easier to load the SQL into a temp database and then do a SELECT to get the data in that field.
Do you have control over the dump file, or are they historic or outside of your control?
If you can, choose a better delimiter; a comma really is a terrible choice.
[^,\r\n]*, matches
'Commas,
I suggest [^,\r\n']*, instead.
I think you will have more luck if you make the regex more specific. I haven't tested this, but I believe it should work.
Also as Paul suggests you might try a different delimiter to make this easier.
Enjoy!
\d{1,4}(,){1}('){1}[a-zA-Z0-9,]+('){1}\d{1,4}(,){1}\d{1,4}(,){1}('){1}[0-9-]+('){1}(,){1}('){1}[a-zA-Z0-9,]+('){1}(,){1}('){1}[a-zA-Z0-9,]+('){1}(,){1}\d{1,4}(,){1}\d{1,4}(\r\n){1}
Doh!
My fields weren't just split with a comma. They were split with a comma followed by a space.
Correct RegEx is
^(?:(?:'[^\r\n']*'|[^,\r\n]*), ){6}('[^\r\n']*'|[^,\r\n]*)
Now it works.
Sorry to waste your time with this one. It was Beta's response that got me thinking, as it was the second alternation that was in play for all fields. The extra space forced it to use this option rather than the option enclosed within quotes.
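The difference between the two patterns can be reproduced in Python, whose re module accepts both as written. This is only a sketch for checking the behaviour described in this thread; the names COMMA_ONLY, COMMA_SPACE and seventh_field are hypothetical.

```python
import re
from typing import Optional

# Original pattern: fields separated by a bare comma.
COMMA_ONLY = re.compile(
    r"^(?:(?:'[^\r\n']*'|[^,\r\n]*),){6}('[^\r\n']*'|[^,\r\n]*)"
)
# Corrected pattern: fields separated by a comma followed by a space.
COMMA_SPACE = re.compile(
    r"^(?:(?:'[^\r\n']*'|[^,\r\n]*), ){6}('[^\r\n']*'|[^,\r\n]*)"
)

ROW = ("(1, 'Title', 1, 3, '2006-09-29', "
       "'Commas, the bane of my regex', 'This is the target', 2, 4)")


def seventh_field(pattern: "re.Pattern[str]", line: str) -> Optional[str]:
    """Return capture group 1 (the seventh field) or None if no match."""
    match = pattern.match(line)
    return match.group(1) if match else None


print(repr(seventh_field(COMMA_ONLY, ROW)))   # " the bane of my regex'"
print(repr(seventh_field(COMMA_SPACE, ROW)))  # "'This is the target'"
```

With the bare-comma pattern every field after the first starts with a space, so the quoted alternative can never match and the unquoted alternative stops at the comma inside the text field.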

SQL Server xml string parsing in varchar field

I have a varchar column in a table that is used to store xml data. Yeah I know there is an xml data type that I should be using, but I think this was set up before the xml data type was available so a varchar is what I have to use for now. :)
The data stored looks similar to the following:
<xml filename="100100_456_484351864768.zip"
event_dt="10/5/2009 11:42:52 AM">
<info user="TestUser" />
</xml>
I need to parse the filename to get the digits between the two underscores which in this case would be "456". The first part of the file name "shouldn't" change in length, but the middle number will. I need a solution that would work if the first part does change in length (you know it will change because "shouldn't change" always seems to mean it will change).
For now, I'm using XQuery to pull out the filename because I figured this is probably better than straight string manipulation. I cast the string to xml to do this, but I'm not an XQuery expert, so of course I'm running into issues. I found a function for XQuery (substring-before), but was unable to get it to work (I'm not even sure that function will work with SQL Server). There might be an XQuery function to do this easily, but if there is I am unaware of it.
So, I get the filename from the table with a query similar to the following:
select CAST(parms as xml).query('data(/xml/@filename)') as p
from Table1
From this I'd assume that I'd be able to CAST this back to a string then do some instring or charindex function to figure out where the underscores are so that I can encapsulate all of that in a substring function to pick out the part I need. Without going too far into this I am pretty sure that I can eventually get it done this way, but I know that there has to be an easier way. This way would make a huge unreadable field in the SQL Statement which even if I moved it to a function would still be confusing to try to figure out what is going on.
I'm sure there is an easier way than this, since it seems to be simple string manipulation. Perhaps someone can point me in the right direction. Thanks
You can use XQuery for this - just change your statement to:
SELECT
CAST(parms as xml).value('(/xml/@filename)[1]', 'varchar(260)') as p
FROM
dbo.Table1
That gives you a VARCHAR(260) long enough to hold any valid file name and path - now you have a string and can work on it with SUBSTRING etc.
Marc
The straightforward way to do this is with SUBSTRING and CHARINDEX. Assuming (wise or not) that the first part of the filename doesn't change length, but that you still want to use XQuery to locate the filename, here's a short repro that does what you want:
declare @t table (
parms varchar(max)
);
insert into @t values ('<xml filename="100100_456_484351864768.zip" event_dt="10/5/2009 11:42:52 AM"><info user="TestUser" /></xml>');
with T(fName) as (
select cast(cast(parms as xml).query('data(/xml/@filename)') as varchar(100)) as p
from @t
)
)
select
substring(fName,8,charindex('_',fName,8)-8) as myNum
from T;
There are sneaky solutions that use other string functions like REPLACE and PARSENAME or REVERSE, but none is likely to be more efficient or readable. One possibility to consider is writing a CLR routine that brings regular expression handling into SQL.
By the way, if your xml is always this simple, there's no particular reason I can see to use XQuery at all. Here are two queries that will extract the number you want. The second is safer if you don't have control over extra white space in your xml string or over the possibility that the first part of the file name will change length:
select
substring(parms,23,charindex('_',parms,23)-23) as myNum
from @t;
select
substring(parms,charindex('_',parms)+1,charindex('_',parms,charindex('_',parms)+1)-charindex('_',parms)-1) as myNum
from @t;
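The "text between the first and second underscore" logic of the second query can be sketched in Python, here combined with a real XML parse of the attribute. This illustrates the logic only and is not T-SQL; the function name middle_number is hypothetical, and it assumes the stored string is well-formed XML with a filename attribute on the root element.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def middle_number(parms: str) -> Optional[str]:
    """Return the part of the filename between its first two underscores."""
    filename = ET.fromstring(parms).get("filename")
    if filename is None:
        return None  # root element has no filename attribute
    parts = filename.split("_")
    # Need at least two underscores: prefix, middle, remainder.
    return parts[1] if len(parts) >= 3 else None


PARMS = ('<xml filename="100100_456_484351864768.zip" '
         'event_dt="10/5/2009 11:42:52 AM"><info user="TestUser" /></xml>')
print(middle_number(PARMS))  # prints 456
```

Splitting on the underscore does not depend on the length of the first part of the filename, which is the same property that makes the second SQL query the safer one.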
Unfortunately, SQL Server is not a conformant XQuery implementation; rather, it implements a fairly limited subset of a draft version of the XQuery spec. Not only does it lack fn:substring-before, it also lacks fn:index-of (which would let you do it yourself using fn:substring) and fn:string-to-codepoints. So, as far as I can tell, you're stuck with SQL here.