Dynamic, Nested Replace - sql

I'm using SQL Server 2008 and need to strip out quite a bit of data within a string. Because of the nature and variability of the string, I think I'm needing to use multiple, nested REPLACE commands. The problem is each REPLACE needs to build on the previous one. Here is a sample of what I'm looking at:
<Paragraph><Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>
Essentially, I need it to return just the text outside of the <> brackets so for this example it would be:
Treatment by test.
Also, I wanted to mention that the strings inside the <> brackets can vary quite a bit for each row both by content and length, but it isn't relevant for what I'm needing other than making it more complex for replacing.
Here is what I've tried:
REPLACE(note,substring(note,patindex('<%>',note),CHARINDEX('>',note) - CHARINDEX('<',note) + 1),'')
And it returns:
<Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>
Somehow I need to keep going with replacing each of the <> brackets but don't know how to proceed. Any help or guidance would be greatly appreciated!!!

Depending on how you have that string holding the HTML fragment available you could try to use something like:
SELECT convert(xml, '<Paragraph><Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>').value('/', 'varchar(255)') as stripped
You convert it to XML and then use the built in xml parser function "value".

Related

Does SQL have a number separator (like an underscore) to split up large number literals

I'm trying to make it easier to read some SQL code where we need to hardcode in some large numbers
I'd like to do something like this:
SELECT 3_800_000
Which, according to this post, is really treated like this:
SELECT 3 _800_000
SELECT 3 AS [_800_000]
JS allows numeric separators
let x = 1000000000000
let y = 1_000_000_000_000
console.log(x==y) // true
Also, C# added digit separators in 7.0 as well
// These are equivalent.
var bigNumber = 123456789012345678;
var bigNumberSplit = 123_456_789_012_345_678;
Is something similar possible in T-SQL?
Note: I'm not looking for a way to format the output, I'm looking for a way to make the source code easier to read for big numbers
The answer is, unfortunately (and as commenters pointed out, thanks folks), that this is not currently a feature of T-SQL.
I was not aware of those JS and C#, so thanks for that.

Need to extract specific text from a column on excel using either Alteryx or Pandas

I have a column that contains a specific set of text that I need to be retained and the rest removed or moved to another column. Unfortunately, I am not able to use normal text-to-column due to the variation of the text arrangement.
For example, I need the word Issue and the id associated with it to be separated. I am struggling to figure out a way to do this with the variation of the arrangement of the text I need.
If someone can help me find a solution using Alteryx would be much appreciated, if not Pandas would also work.
Thanks all.
Use str.extract with Pattern to extract specific text from the data frame [Pandas]
df['After']=df['Before'].str.extract(pat='(ISSUE \d+|issue \d+)',expand=False)
For an Alteryx-only solution, the easiest way would be an Alteryx Formula using REGEX_Replace:
REGEX_Replace([Before],".*(issue \d+).*","?1",1)
If you don't like RegEx, basic string manipulations can do it also: basically it's a Substring...
Substring([Before], *starting index*, *length*)
The starting index is easy: it's just FindString([Before],"ISSUE")
The length isn't too hard either: it's the index (using FindString again) of the first comma in the substring that starts with "ISSUE": SubString([Before],FindString([Before],"ISSUE"))
Combining all that and spreading it out a bit:
Substring(
[Before],
FindString([Before],"ISSUE"),
FindString(
SubString(
[Before],
FindString([Before],"ISSUE")
),","
)
)

How to remove several symbols from a string in BiqQuery

I have strings which contain numbers like that:
a20cdac0_19221bdc12022bab3fe05a43df4a7dbe
I need to get only symbols after underscore symbol:
19221bdc12022bab3fe05a43df4a7dbe
Unfortunately, the amount of those symbols is always different, so I can't use just RIGHT function.
I know that probably REGEXP might help, but I can't understand how to use that exactly. Will be very grateful for the help.
Below is for BigQuery Standard SQL (using regexp)
regexp_extract(value, r'_(.*)') regexp_approach
if to apply to sample value from your question
regexp_extract('a20cdac0_19221bdc12022bab3fe05a43df4a7dbe', r'_(.*)') regexp_approach
result is
Yet, another regexp option is to use regexp_replace as in below example
regexp_replace(value, r'^.*?_', '')
Note: using split in this case is also an option unless you have more than one _ in which case you will get part between first and second _
split(value, '_')[safe_offset(1)]
Also, as you can see you need to use safe to prevent error in cases when _ is absent
You can use the split function like this
select split('a20cdac0_19221bdc12022bab3fe05a43df4a7dbe','_')[ORDINAL(2)];
https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#split

converting code to symbol

I'm using system to control the process in my company and some of it has comment table, let us say it is Opr_comm. the content inside that attribute is various sometimes it contain "<", ">", "'" and so many symbols. here is the query.
SELECT
T1.Prod_No,
T1.Proc_CD,
T1.Opr_Comm
FROM
P110 T1
I have figure it out, when we input to system < it will turned to be &lt, when > it will be &gt. it happen when i retrieve it from database.
So, how can i convert that code to be symbol again?
Thank you very much for your help
If you're just looking to convert back to symbols when you are reading from the table, and not during the write process, you can use the REPLACE function
http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions134.htm
It's bulky, but if this is a limited issue it will work. You'd replace your code:
SELECT
T1.Prod_No,
T1.Proc_CD,
replace(replace(T1.Opr_Comm,"<","<"),">",">") AS Opr_Comm
From
P110 T1
Of course if you're able to prevent this encoding from happening in the first place, you'd be better off.

When to use StringBuilder? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
String vs StringBuilder
I just revisited some of the books that I used to pick up VB.NET. I am not sure I've got this in my head, understand how/what StringBuilder is.
What is the guidance for using? Is it best to use it if you are are concatenating 2 strings or 50?
Or when the the total string length is greater than 128 characters?
Or will you see a performance benefit whenever you use it to add strings together?
In which case is it better to use a StringBuilder instance to build a SQL statement than string.format("Select * from x where y = {0}",1)?
It's always struck me that declaring another variable and including a name space is not beneficial for small string concatenations, but I am not sure now.
Sorry, lot of documentation tells you what to use, just not what's best.
I've got an article on this very topic. In summary (copied from the bottom of the page):
Definitely use StringBuilder when you're concatenating in a non-trivial loop - especially if you don't know for sure (at compile time) how many iterations you'll make through the loop. For example, reading a file a character at a time, building up a string as you go using the += operator is potentially performance suicide.
Definitely use the concatenation operator when you can (readably) specify everything which needs to be concatenated in one statement. (If you have an array of things to concatenate, consider calling String.Concat explicitly - or String.Join if you need a delimiter.)
Don't be afraid to break literals up into several concatenated bits - the result will be the same. You can aid readability by breaking a long literal into several lines, for instance, with no harm to performance.
If you need the intermediate results of the concatenation for something other than feeding the next iteration of concatenation, StringBuilder isn't going to help you. For instance, if you build up a full name from a first name and a last name, and then add a third piece of information (the nickname, maybe) to the end, you'll only benefit from using StringBuilder if you don't need the (first name + last name) string for other purpose (as we do in the example which creates a Person object).
If you just have a few concatenations to do, and you really want to do them in separate statements, it doesn't really matter which way you go. Which way is more efficient will depend on the number of concatenations the sizes of string involved, and what order they're concatenated in. If you really believe that piece of code to be a performance bottleneck, profile or benchmark it both ways.
Here is my rule of thumb:
StringBuilder is best used when the exact number of concatenations is unknown at compile time.
Coding Horror has a good article concerning this question, The Sad Tragedy of Micro-Optimization Theater.
Personally I use StringBuilder when I have more than just one or two strings to concatenate. I'm not sure if there's a real performance hit to be gained, but I've always read and been told that doing a regular concatenation with multiple strings creates an extra copy of the string each time you do it, while using StringBuilder keeps one copy until you call the final ToString() method on it.
Someone's figured out experimentally that the critical number is 6. More than 6 concatenations in a row and you should use a StringBuilder. Can't remember where I found this.
However, note that if you just write this in a line:
"qwert" + "yuiop" + "asdf" + "gh" + "jkl;" + "zxcv" + "bnm" + ",."
That gets converted into one function call (I don't know how to write it in VB.net)
String.Concat("qwert", "yuiop", "asdf", "gh", "jkl;", "zxcv", "bnm", ",.");
So if you're doing all concatenations on one line, then don't bother with StringBuilder because String.Concat effectively will do all the concatenations in one go. It's only if you're doing them in a loop or successively concatenating.
My rule - when you're adding to a string in a For or Foreach loop, use the StringBuilder.