Replace occurrences of characters between 2 strings using mawk only - awk

Replace occurrences of characters in a string using mawk
Hi guys, I need a solution for this using mawk, please help.
I have 2 strings, STR_IN and STR_CMPR.
I want to replace every character of STR_CMPR that occurs in STR_IN with an empty string.
The main thing is: if STR_IN has some character twice and STR_CMPR has the same character only once, then only one occurrence should be replaced in STR_IN.
Can someone help using mawk only, no other method please?
If it can be achieved using gsub and a regex, or match and a regex, that would be best. I don't want to run through each character with a loop.
Below are 3 examples with expected output.
eg 1 :
STR_IN="AABBCCDD";
STR_CMPR="DBAC";
if using gsub;
gsub(STR_CMPR, "", STR_IN);
result should be, STR_IN = ABCD;
if using match,
result : STR_IN_MATCH_CNT = 4 out of 8;
eg 2 :
STR_IN="DBAC";
STR_CMPR="AABBCCDD";
if using gsub;
gsub(STR_CMPR, "", STR_IN);
result should be, STR_IN = blank;
if using match;
result : STR_IN_MATCH_CNT = 4 out of 4;
eg 3 :
STR_IN="DDBBAC";
STR_CMPR="AABCCD";
if using gsub,
gsub(STR_CMPR, "", STR_IN);
result should be, STR_IN = DB;
if using match,
result : STR_IN_MATCH_CNT = 4 out of 6;

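There is no way around it: a for loop is needed. The quickest approach is:
STR_IN_MATCH_CNT=length(STR_IN)
# remove one occurrence from STR_IN for each character of STR_CMPR
# note: sub() treats its first argument as a regex, so characters such as . or * would need escaping
for(i=1;i<=length(STR_CMPR);++i) sub(substr(STR_CMPR,i,1),"",STR_IN)
STR_IN_MATCH_CNT-=length(STR_IN)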
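
For comparison only (the question asks for mawk), here is a minimal Python sketch of the same loop, checked against the three examples from the question. The function name remove_once_each is hypothetical and not part of the original answer.

```python
def remove_once_each(str_in, str_cmpr):
    """Remove one occurrence from str_in for each character of str_cmpr.

    Returns the remaining string and the number of characters removed,
    mirroring STR_IN and STR_IN_MATCH_CNT in the awk answer.
    """
    original_length = len(str_in)
    for ch in str_cmpr:
        # A count of 1 removes only the first occurrence, like awk's sub().
        str_in = str_in.replace(ch, "", 1)
    return str_in, original_length - len(str_in)


if __name__ == "__main__":
    examples = [("AABBCCDD", "DBAC"), ("DBAC", "AABBCCDD"), ("DDBBAC", "AABCCD")]
    for s, c in examples:
        print(s, c, remove_once_each(s, c))
```

Unlike sub() in awk, str.replace works on literal text, so no escaping of regex metacharacters is needed in this version.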

Related

pandas contains regex

I would like to match all cells that begin with the number 978, but the following code also matches 397854 or nan.
an_transaction_product["kniha"] = np.where(an_transaction_product["zbozi_ean"].str.contains('^978', regex=True) , 1, 0)
What am I doing wrong?
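.str.contains checks whether the regex occurs anywhere in the string, so only the ^ anchor ties your pattern to the start. For missing values it also returns NaN rather than False, and np.where treats NaN as true, which is why nan rows get a 1; passing na=False to .str.contains avoids that.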
If you insist on using regex, .str.match does what you want.
But for this simple case .str.startswith("978") is clearer.
Apart from regex, you can use .loc to find cells that start with '978'. The code below will assign 1 to such cells in column 'A', just as an example:
df.loc[df['A'].astype(str).str[:3]=='978', 'A'] = 1
Note: astype(str) converts the number to a string, str[:3] takes the first 3 characters, and the result is compared to '978'.
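
To make the difference between "anywhere" and "at the start" concrete, here is a small sketch using Python's standard re module. As far as the pandas documentation describes it, .str.contains is based on re.search and .str.match on re.match, so the same distinction should carry over. The sample values are made up for illustration.

```python
import re

# Hypothetical sample values, not taken from the question's data.
values = ["9781234567890", "397854", "978", "12978"]

# re.search without an anchor finds the pattern anywhere in the string.
anywhere = [v for v in values if re.search("978", v)]

# With ^ the search is tied to the start of the string.
anchored = [v for v in values if re.search("^978", v)]

# re.match only ever tries to match at the start, so no anchor is needed.
at_start = [v for v in values if re.match("978", v)]

# A plain string method gives the same result without any regex.
plain = [v for v in values if v.startswith("978")]

if __name__ == "__main__":
    print(anywhere)
    print(anchored)
    print(at_start)
    print(plain)
```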

Find Each Occurrence of X and Insert a Carriage Return

A colleague has some data he is putting into a flat file (.txt) and needs to insert a carriage return before EACH occurrence of 'POL01', 'SUB01','VEH01','MCO01'.
I did use:
For Each line1 As String In System.IO.File.ReadAllLines(BodyFileLoc)
    If line1.Contains("POL01") Or line1.Contains("SUB01") Or line1.Contains("VEH01") Or line1.Contains("MCO01") Then
        Writer.WriteLine(Environment.NewLine & line1)
    Else
        Writer.WriteLine(line1)
    End If
Next
But unfortunately it turns out that the file is not formatted in 'lines' by SSIS but as one whole string.
How can I insert a carriage return before every occurrence of the above?
Test Text
POL01CALT302276F 332 NBPM 00101 20151113201511130001201611132359 2015111300010020151113000100SUB01CALT302276F 332 NBPMP01 Akl Abi-Khalil 19670131 M U33 Stoford Close SW19 6TJ 2015111300010020151113000100VEH01CALT302276F 332 NBPM001LV56 LEJ N 2006VAUXHALL CA 2015111300010020151113000100MCO01CALT302276F 332 NBPM0101 0 2015111300010020151113000100POL01CALT742569N
You can use regular expressions for this, specifically by using Regex.Replace to find and replace each occurrence of the strings you're looking for with a newline followed by the matching text:
Dim str as String = "xxxPOL01xxxSUB01xxxVEH01xxxMCO01xxx"
Dim output as String = Regex.Replace(str, "((?:POL|SUB|VEH|MCO)01)", Environment.NewLine + "$1")
'output contains:
'xxx
'POL01xxx
'SUB01xxx
'VEH01xxx
'MCO01xxx
There may be a better way to construct this regular expression, but this is a simple alternation on the different letters, followed by 01. This matched text is represented by the $1 in the replacement string.
If you're new to regular expressions, there are a number of tools that help you understand them. For example, regex101.com will show you an explanation of the one I have used here.
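
The same capture-and-reinsert idea can be sketched in Python with re.sub, where \1 plays the role of $1. This is only an illustration of the pattern, not a replacement for the VB.NET code above; the function name break_before_markers is hypothetical, and "\n" is used here instead of Environment.NewLine.

```python
import re

# One capturing group: any of the four prefixes followed by 01.
MARKER = re.compile(r"((?:POL|SUB|VEH|MCO)01)")


def break_before_markers(text):
    """Insert a newline before every POL01, SUB01, VEH01 or MCO01."""
    # \1 refers to the text captured by the group, so the marker is kept.
    return MARKER.sub(r"\n\1", text)


if __name__ == "__main__":
    print(break_before_markers("xxxPOL01xxxSUB01xxxVEH01xxxMCO01xxx"))
```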

Determining What Line Does in Awk

I'm a complete beginner with awk. I'm reading over a simple loop where split() defines the 'a' array before the loop begins and the 'b' array in each iteration of the loop.
Can someone help me with the statement below? I have included the surrounding code for context, since I know what the splits and the for loop are doing.
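n = split($2, a, ":")       # a[1..n] holds the colon-separated pieces of $2
for (i = 1; i <= n; i++) {
    split(a[i], b, " ")     # b[1] and b[2] are the space-separated parts
    #I don't know what the statement below this line does.
    #It appears to be creating a multidimensional thing?
    x[b[1]] = b[2]
}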
It looks like a single-dimension array. Say you had a text file with one line like this:
1|age 10:fname john:lname smith|12345
Assuming the pipe symbol | as the delimiter, your $2 is going to be age 10:fname john:lname smith.
Splitting that by colon : gives 3 items: age 10, fname john and lname smith.
The for loop goes through these 3 items. It takes the first item, age 10.
That item is split by space, so b[1] is now age and b[2] is now 10.
Array element x["age"] is set to 10.
Similarly, x["fname"] is set to john and x["lname"] is set to smith.
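
The walk-through above can be mirrored in Python, where the associative array x corresponds to a dict. This is only a sketch for readers more familiar with Python; the function name parse_pairs is hypothetical, and note that Python indexes from 0 where awk's split() starts at 1.

```python
def parse_pairs(line):
    """Build a key/value mapping from the second pipe-delimited field."""
    fields = line.split("|")          # fields[1] corresponds to awk's $2
    x = {}
    for item in fields[1].split(":"):  # like split($2, a, ":")
        b = item.split(" ")            # like split(a[i], b, " ")
        x[b[0]] = b[1]                 # like x[b[1]] = b[2] in awk
    return x


if __name__ == "__main__":
    print(parse_pairs("1|age 10:fname john:lname smith|12345"))
```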
x[b[1]]=b[2]
It's not creating a multidimensional array.
x is an array. The statement stores the value b[2] in the element of x whose key is b[1].

Substring function does not work in Vb?

I am trying to mask an SSN and want to show it in a label caption.
lblSPTINTo.Caption = rsMM("SPTIN")
lblCPTINTo.Caption = rsMM("CPTIN")
I am trying to use the substring function to get the last 4 characters, but I am not able to use it because it throws a compile error.
lblSPTINTo.Caption = rsMM("SPTIN").sutbstring(4,4)
Replace sutbstring with Substring.
But it still won't do what you want, because in Substring the first parameter is the start index and the second is the length. If you want the last 4 characters:
Dim last4 As String = rsMM("SPTIN")
If last4.Length > 4 Then last4 = last4.Substring(last4.Length - 4)
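
The "start at length minus 4" arithmetic is the part that is easy to get wrong, so here is the same idea as a short Python sketch. The function name last_chars is hypothetical; the length guard in the VB code corresponds to the way a negative slice simply returns the whole string when it is shorter than the requested count.

```python
def last_chars(value, count=4):
    """Return the last `count` characters of value (all of it if shorter)."""
    if count <= 0:
        return ""
    # value[-count:] starts count characters from the end, which is the
    # same position as Substring(value.Length - count) in .NET.
    return value[-count:]


if __name__ == "__main__":
    print(last_chars("123456789"))
    print(last_chars("12"))
```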

Count occurrences of a string in a substring with a condition

I need to count how often a number is present in a string. It should count EVERY occurrence with a whitespace in front, except those followed by a =.
For example:
If I need to know how many "1" there are in the string "this is a 1 ramdnom string with 2 numbers 1 with 1=something", it should return 2, as the third one is followed by an =.
To find the occurrences I am using this: occurences = mystring.Split(" 1").Length - 1
But how to exclude those followed by a =?
Thanks
Something like,
Dim occurrences = Regex.Matches(yourString, "\W[0-9]([^=]|$)").Count
If you'd like to do replacements, use a Regex.Replace overload.
Breaking it down, this expression matches
\W // any non-word character (use \s to match whitespace only)
[0-9] // any decimal digit, not just 1
( // either
[^=] // a character that is not =
| // or
$ // the end of the string
)
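
As a variation rather than the answer's own pattern, the sketch below counts one specific number and uses a negative lookahead, so the character after the number is checked without being consumed. It assumes the rule exactly as stated in the question (whitespace in front, not followed by =); the function name count_number is hypothetical.

```python
import re


def count_number(text, number):
    """Count occurrences of number preceded by whitespace and not followed by =."""
    # \s       : a whitespace character in front
    # escaped  : the literal number being counted
    # (?!=)    : negative lookahead, fails if the next character is =
    pattern = r"\s" + re.escape(str(number)) + r"(?!=)"
    return len(re.findall(pattern, text))


if __name__ == "__main__":
    sample = "this is a 1 ramdnom string with 2 numbers 1 with 1=something"
    print(count_number(sample, 1))
```

Under this literal rule a longer number such as 12 would also count as an occurrence of 1; adding (?!\d) to the lookahead would be one way to rule that out if it matters.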