Regex Extract ( left part of the string from 1 underscore before 1) - sql

I want to extract data from a string: the part to the left of the underscore one position before a known token (the XXX-1 underscore).
E.g.
I have a string like
A_B_C_D_E_F_G_H_I_J
I want the data from A to D.
My F is constant, so I can use the string before F.
Please help me with this.

https://regex101.com/r/3hXoQe/1/
"/([_A-Z]*?)F/"
() = capturing group
[] = allowed characters, so [_A-Z] accepts any uppercase letter and "_"
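*? = lazy quantifier, so the group matches as few characters as possible
F = the literal F, so the group captures everything before the first F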
Do you mean that?
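To see what the pattern above actually captures, here is a minimal sketch in Python's re module (used only for illustration, since regex function names differ between SQL dialects). The second pattern is an assumption about what the asker wants ("A to D"); it is not part of the original answer.

```python
import re

s = "A_B_C_D_E_F_G_H_I_J"

# Pattern from the answer: lazily capture everything before the first F.
before_f = re.search(r"([_A-Z]*?)F", s).group(1)
print(before_f)  # A_B_C_D_E_

# Assumed variant: also drop the segment directly before F and its underscores.
up_to_d = re.search(r"^([_A-Z]*?)_[A-Z]+_F", s).group(1)
print(up_to_d)  # A_B_C_D
```

Note that the answer's pattern keeps the trailing "E_", so the variant may be closer to the requested output if "A to D" is meant literally.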


Select rows which contain numeric substrings in Pandas

I need to delete rows from a dataframe in which a particular column contains strings with numeric substrings. See the shaded column of my dataframe.
Rows with values that have 0E as a prefix, 21 (any two-digit number) as a suffix, or 24A (any two-digit number followed by a letter) as a suffix should be deleted.
Any suggestions?
Thanks in advance.
You can use boolean indexing with a str.contains() regex:
^0E - starts with 0E
\d{2}$ - ends with 2 digits
\d{2}[A-Z]$ - ends with 2 digits and 1 capital letter
col = ... # target column
mask = df[col].str.contains(r'^0E|\d{2}$|\d{2}[A-Z]$')
df = df.loc[~mask]
@tdy gave a good answer, but only one place needs to be modified, if I understand it correctly.
For values that end with two digits, or with two digits and a capital letter, the regex should be:
.*\d{2}[A-Z]?$
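A quick way to compare the two suggestions is to run both patterns over a few values with the standard re module, whose search behaves like str.contains in that it looks for a match anywhere in the value. This is only a sketch: the sample values are made up, and the leading .* from the second answer is left out because a search does not need it.

```python
import re

# Hypothetical sample values standing in for the target column.
values = ["0E123X", "ABC21", "XYZ24A", "ABC", "A1B", "12AB"]

three_way = re.compile(r"^0E|\d{2}$|\d{2}[A-Z]$")  # pattern from the first answer
shortened = re.compile(r"^0E|\d{2}[A-Z]?$")        # suffix cases folded with [A-Z]?

# Keep only the values that neither pattern flags for deletion.
kept_three_way = [v for v in values if not three_way.search(v)]
kept_shortened = [v for v in values if not shortened.search(v)]

print(kept_three_way)  # ['ABC', 'A1B', '12AB']
print(kept_shortened)  # ['ABC', 'A1B', '12AB']
```

On these values both patterns keep the same rows, which suggests the shortened form is a simplification rather than a change in behaviour.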

hive:multiple string replace in a column

It's easy to replace a character one time with a function such as this:
regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)
But how to deal with multiple string replacements in a column at one time?
For example, with relation like A to #, B to #, C to Z, how would one change "ABC" into "##Z"?
Use translate(input, from, to) function, it translates the input string by replacing the characters present in the from string with the corresponding characters in the to string:
hive> select translate('initial string ABC A B C','ABC','##Z');
OK
initial string ##Z # # Z
Time taken: 0.063 seconds, Fetched: 1 row(s)
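For readers who want to check the character mapping outside Hive, Python's built-in str.translate works on the same character-by-character principle. This is an analogous sketch, not Hive code.

```python
# Map A -> #, B -> #, C -> Z in a single pass, character by character.
table = str.maketrans("ABC", "##Z")
result = "initial string ABC A B C".translate(table)
print(result)  # initial string ##Z # # Z
```

As with the Hive function, this only covers single-character replacements; multi-character substrings would still need chained replace calls.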

Find Each Occurrence of X and Insert a Carriage Return

A colleague has some data he is putting into a flat file (.txt) and needs to insert a carriage return before EACH occurrence of 'POL01', 'SUB01','VEH01','MCO01'.
I did use:
For Each line1 As String In System.IO.File.ReadAllLines(BodyFileLoc)
    If line1.Contains("POL01") Or line1.Contains("SUB01") Or line1.Contains("VEH01") Or line1.Contains("MCO01") Then
        Writer.WriteLine(Environment.NewLine & line1)
    Else
        Writer.WriteLine(line1)
    End If
Next
But unfortunately it turns out that the file is not formatted in 'lines' by SSIS but as one whole string.
How can I insert a carriage return before every occurrence of the above?
Test Text
POL01CALT302276F 332 NBPM 00101 20151113201511130001201611132359 2015111300010020151113000100SUB01CALT302276F 332 NBPMP01 Akl Abi-Khalil 19670131 M U33 Stoford Close SW19 6TJ 2015111300010020151113000100VEH01CALT302276F 332 NBPM001LV56 LEJ N 2006VAUXHALL CA 2015111300010020151113000100MCO01CALT302276F 332 NBPM0101 0 2015111300010020151113000100POL01CALT742569N
You can use regular expressions for this, specifically by using Regex.Replace to find and replace each occurrence of the strings you're looking for with a newline followed by the matching text:
Dim str As String = "xxxPOL01xxxSUB01xxxVEH01xxxMCO01xxx"
Dim output As String = Regex.Replace(str, "((?:POL|SUB|VEH|MCO)01)", Environment.NewLine & "$1")
'output contains:
'xxx
'POL01xxx
'SUB01xxx
'VEH01xxx
'MCO01xxx
There may be a better way to construct this regular expression, but this is a simple alternation on the different letters, followed by 01. This matched text is represented by the $1 in the replacement string.
If you're new to regular expressions, a number of tools can help you understand them. For example, regex101.com will show you an explanation of the one I have used here.
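The same alternation can be tried quickly in Python to confirm where the breaks land. This is a sketch on the answer's example string, with "\n" standing in for Environment.NewLine (which is "\r\n" on Windows).

```python
import re

# Single-line input, as in the answer's example string.
text = "xxxPOL01xxxSUB01xxxVEH01xxxMCO01xxx"

# \1 re-inserts the matched record tag after the inserted newline.
output = re.sub(r"((?:POL|SUB|VEH|MCO)01)", r"\n\1", text)
print(output)
```

Each record tag ends up at the start of its own line, matching the commented output in the VB example.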

Substring function does not work in Vb?

I am trying to mask an SSN and show it in a label caption.
lblSPTINTo.Caption = rsMM("SPTIN")
lblCPTINTo.Caption = rsMM("CPTIN")
I am trying to use the substring function to get the last 4 characters, but I am not able to use it because it throws a compile error.
lblSPTINTo.Caption = rsMM("SPTIN").sutbstring(4,4)
Replace sutbstring with Substring.
But it won't work that way, because the first parameter of Substring is the start index and the second is the length. If you want the last 4 characters:
Dim last4 As String = rsMM("SPTIN")
If last4.Length > 4 Then last4 = last4.Substring(last4.Length - 4)
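The "last four characters" logic, plus the masking the question mentions, can be sketched in Python as follows. The helper name mask_last4 and the choice of "*" as the mask character are assumptions for illustration, not part of the original answer.

```python
def mask_last4(value: str, mask_char: str = "*") -> str:
    """Hide all but the last four characters (hypothetical helper)."""
    visible = value[-4:]  # the whole string if it has four or fewer characters
    hidden = mask_char * (len(value) - len(visible))
    return hidden + visible

print(mask_last4("123456789"))  # *****6789
```

Like the length check in the VB snippet, the slice handles short inputs without raising an error.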

count occurrences of string in substring with condition

I need to count how often a number is present in a string. It should count EVERY occurrence with a whitespace in front, except those followed by a =.
For example:
If I need to know how many "1"s there are in this string:
this is a 1 ramdnom string with 2 numbers 1 with 1=something
it should return 2, as the third one is followed by an =.
To find the occurrences I am using this: occurences = mystring.Split(" 1").Length - 1
But how to exclude those followed by a =?
Thanks
Something like,
Dim occurrences = Regex.Matches(yourString, "\W[0-9]([^=]|$)").Count
If you'd like to do replacements, use a Regex.Replace overload.
Breaking it down, this expression matches
\W // any non-word character (use \s to match only whitespace)
[0-9] // any decimal digit (replace with 1 to count only the digit 1)
( // either
[^=] // not =
| // or
$ // the end of the string
)
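Applied to the question's own string, the idea can be sketched in Python as below. Two things here are assumptions rather than part of the answer: the helper name count_occurrences, and the use of \s with a negative lookahead (?!=) in place of \W and ([^=]|$). The lookahead does not consume the next character, so adjacent occurrences such as " 1 1" are both counted.

```python
import re

def count_occurrences(text: str, number: str) -> int:
    """Count `number` preceded by whitespace and not followed by '='."""
    # (?!=) only inspects the next character; it is not part of the match.
    pattern = r"\s" + re.escape(number) + r"(?!=)"
    return len(re.findall(pattern, text))

sample = "this is a 1 ramdnom string with 2 numbers 1 with 1=something"
print(count_occurrences(sample, "1"))  # 2
```

This sketch would also count the leading 1 of a longer number such as " 12"; if that is unwanted, a further lookahead like (?!\d) could be added.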