Unable to match regex in string using awk

I am trying to fetch the lines in which the second part of the line contains a pattern from the first part of the line.
$ cat file.txt
String1 is a big string|big
$ awk -F'|' ' { if ($2 ~ /$1/) { print $0 } } ' file.txt
But it is not working.
I cannot figure out what the mistake is.
Can someone please help?

Two things: No slashes, and your numbers are backwards.
awk -F\| '$1~$2' file.txt

I guess what you meant is that some word from the first part should appear in the second part. If that is what you want, then:
awk -F'|' '{n=split($1,a," ");for(i=1;i<=n;i++){if($2~a[i]){print $0;break}}}' your_file

There are surprisingly many things wrong with your command line:
1) You aren't using the awk condition/action syntax but instead needlessly embedding a condition within an action.
2) You aren't using the default awk action but instead needlessly hand-coding a print $0.
3) You have your RE operands reversed.
4) You are using RE comparison but it looks like you really want to match strings.
You can fix the first 3 of the above by modifying your command to:
awk -F'|' '$1~$2' file.txt
but I think what you really want is "4" which would mean you need to do this instead:
awk -F'|' 'index($1,$2)' file.txt
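To see why point 4 matters, here is a small sketch comparing the two tests when the second field contains a regex metacharacter. The sample lines and the function names are made up for illustration; they are not from the question.

```shell
# Hypothetical sample: the second field contains ".", a regex metacharacter.
sample() {
    printf '%s\n' 'version 1.5 released|1.5' 'version 105 released|1.5'
}

# Regex match: "." matches any character, so both lines are printed.
regex_match() { awk -F'|' '$1 ~ $2'; }

# Literal substring match: only the line that really contains "1.5" is printed.
literal_match() { awk -F'|' 'index($1, $2)'; }

sample | regex_match
sample | literal_match
```

If the second field is meant to be a pattern, keep the ~ version; if it is plain text, index() avoids surprises from characters such as . or *.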

Related

How to combine these awk commands?

Can someone please explain to me how I can combine these piped awks to a single awk?
awk 'match($0, /(,|^)[^,]*shalvar[^,]*(,|$)/) {
print substr($0, RSTART, RLENGTH)}' file.txt |
awk 'gsub(",","")' | awk '{$1=$1};1'
I tried this, but it doesn't work:
awk 'match($0, /(,|^)[^,]*shalvar[^,]*(,|$)/) {
gsub(",","");$1=$1;print substr($0, RSTART, RLENGTH)}' file.txt
I understand that it shouldn't work because the characters are removed but the pointers don't change. How can I fix it now?

You need to wrap things the other way around. Collect the string you want to extract, then do the manipulations on the extracted value, just like your original script with multiple Awk scripts in a pipeline did.
awk 'match($0, /(,|^)[^,]*shalvar[^,]*(,|$)/) {
g=substr($0, RSTART, RLENGTH);
gsub(",","",g);
# $1=$1 is nice but we cannot use that here; here is a workaround
gsub(/^ *| *$/, "", g);
print g}' file.txt
The shortcut $1=$1 for trimming whitespace around a value works in an isolated Awk script if you are confident that there is only one field. Here we don't necessarily have a single field (or do we?), so I use a more general solution that explicitly trims whitespace around the extracted string. This also avoids relying on a well-known but still obscure side effect.
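The difference between the two trimming approaches can be seen in a short sketch. The input string and the function names are invented for the demonstration. Assigning to a field rebuilds the record, which also squeezes inner runs of whitespace, while the explicit gsub() only touches the ends.

```shell
# Assigning to a field makes awk rebuild $0 from its fields joined by OFS,
# so leading/trailing blanks vanish and inner runs collapse to one space.
trim_by_rebuild() { awk '{ $1 = $1; print "[" $0 "]" }'; }

# Explicit trim: only leading and trailing spaces are removed.
trim_by_gsub() { awk '{ g = $0; gsub(/^ +| +$/, "", g); print "[" g "]" }'; }

printf '  shalvar   big  \n' | trim_by_rebuild   # [shalvar big]
printf '  shalvar   big  \n' | trim_by_gsub      # [shalvar   big]
```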
If shalvar is actually a variable you want to receive from the shell, like $foo, try
awk -v field="$foo" 'match($0, "(^|,)[^,]*" field "[^,]*(,|$)") {
...
to interpolate the variable into a string which is then applied as a regular expression.

using a wildcard in awk

Using awk, I want to print all lines that have a string in the first column that starts with 22_
I tried the following, but obviously * does not work as a wildcard in awk:
awk '$1=="22_*" {print $0}' input > output
Is this possible in awk?

Let's start with a test file:
$ cat >file
22_something keep
23_other omit
To keep only lines that start with 22_:
$ awk '/^22_/' file
22_something keep
Alternatively, if you prefer to reference the first field explicitly, we could use:
$ awk '$1 ~ /^22_/' file
22_something keep
Note that we don't have to write {print $0} after the condition because that is exactly the default action that awk associates with a condition.
At the start of a regular expression, ^ matches the beginning of the string being tested: the whole line in /^22_/, or the first field in $1 ~ /^22_/. Thus, if you want 22_ to occur at the start of a line or the start of a field, you want to write ^22_.
In the condition $1 ~ /^22_/, note that the operator is ~. That operator tells awk to check if the preceding string, $1, matches the regular expression ^22_.
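As a rough illustration of the points above, the sketch below runs three conditions against the same two sample lines. The sample() helper is only a convenience for this example, and the index() variant is an extra option not mentioned in the answer: it tests for a literal prefix without using a regular expression at all.

```shell
# Same two sample lines as in the test file above.
sample() { printf '%s\n' '22_something keep' '23_other omit'; }

# String comparison: "*" is just a character, so nothing is printed
# unless a first field is literally "22_*".
sample | awk '$1 == "22_*"'

# Regex match anchored at the start of the first field.
sample | awk '$1 ~ /^22_/'

# Literal prefix test without a regex: index() returns 1 when
# the first field begins with "22_".
sample | awk 'index($1, "22_") == 1'
```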

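The chosen answer does not explain how to use a wildcard in awk. In a regular expression the wildcard is .* (instead of *), and it has to be used with the ~ operator rather than ==, which only compares strings:
awk '$1 ~ /^22_.*/ {print $0}' input > output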

Grep part of string after symbol and shuffle columns

I would like to take the number after the - sign and put it as column 2 in my matrix. I know how to grep the string but not how to print it after the text string.
in:
1-967764 GGCTGGTCCGATGGTAGTGGGTTATCAGAACT
3-425354 GCATTGGTGGTTCAGTGGTAGAATTCTCGCC
4-376323 GGCTGGTCCGATGGTAGTGGGTTATCAGAAC
5-221398 GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT
6-180339 TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT
out:
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT 967764
GCATTGGTGGTTCAGTGGTAGAATTCTCGCC 425354
GGCTGGTCCGATGGTAGTGGGTTATCAGAAC 376323
GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT 221398
TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT 180339

awk -F'[[:space:]-]+' '{print $3,$2}' file

Seems like a simple substitution should do the job:
sed -E 's/[0-9]+-([0-9]+)[[:space:]]*(.*)/\2 \1/' file
Capture the parts you're interested in and use them in the replacement.
Alternatively, using awk:
awk 'sub(/^[0-9]+-/, "") { print $2, $1 }' file
Remove the leading digits and - from the start of the line. When this is successful, sub returns true, so the action is performed, printing the second field, followed by the first.
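Because sub() returns the number of substitutions it made (0 or 1), using it as the condition also filters out lines that lack the numeric prefix. A small sketch, with one made-up malformed line added to the sample and a hypothetical wrapper name swap_columns:

```shell
# Wrapper around the awk command above so it can be reused on any input.
swap_columns() { awk 'sub(/^[0-9]+-/, "") { print $2, $1 }'; }

# Two lines from the question plus one hypothetical line without an "N-" prefix;
# the malformed line produces no output because sub() returns 0 for it.
printf '%s\n' \
    '1-967764 GGCTGGTCCGATGGTAGTGGGTTATCAGAACT' \
    'malformed line' \
    '3-425354 GCATTGGTGGTTCAGTGGTAGAATTCTCGCC' |
swap_columns
```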

Using regex ( +|-) as field separator:
$ awk -F"( +|-)" '{print $3,$2}' file
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT 967764
GCATTGGTGGTTCAGTGGTAGAATTCTCGCC 425354
GGCTGGTCCGATGGTAGTGGGTTATCAGAAC 376323
GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT 221398
TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT 180339

Here is another awk solution:
$ awk 'split($1,a,"-") {print $2,a[2]}' file

awk '{sub(/.-/,"");print $2,$1}' file
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT 967764
GCATTGGTGGTTCAGTGGTAGAATTCTCGCC 425354
GGCTGGTCCGATGGTAGTGGGTTATCAGAAC 376323
GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT 221398
TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT 180339

Combining awk search with standard awk and awk delimiter

I'm working on a set of data for which I need specific fields as output:
The data looks like this:
/home/oracle/db.log.gz:2013-1-19T00:00:25 <user.info> 1 2013-1-19T00:00:53.911 host_name RT_FLOW [junos#26.1.1.1.2.4 source-address="10.1.2.0" source-port="616" destination-address="100.1.1.2" destination-port="23" service-name="junos-telnet" nat-source-address="20x.2x.1.2" nat-source-port="3546" nat-destination-address="9x.12x.3.0"]
From above I need three things:
(I) - 2013-1-19T00:00:53.911, which is $4
(II) - source-address="10.1.2.0", which is $8, of which I need only 10.1.2.0
(III) - destination-address="100.1.1.2", which is $10, of which I need only 100.1.1.2
I cannot use a simple awk like awk '{ print $4 "\t" $8 "\t" $10 }', since there are some fields after "device_name" in the log file which are not always present in all log lines, so I have to make use of delimiters such as
awk -F 'source-address=' '{print $2}' | awk '{print $1}' -> this gives the source-address IP, which is requirement (II)
I'm not sure how to combine an awk search for (I), (II) and (III).
Can someone help?

I believe sed is better for this job
sed -r 's/([^ ]+[ ]+){3}([^ ]+).*[ ]+source-address="([^"]+)".*[ ]+destination-address="([^"]+)".*/\2\t\3\t\4/' file
Output:
2013-1-19T00:00:53.911 10.1.2.0 100.1.1.2

What exactly do you want?
1) solve the problem using any (reasonably standard) tool
2) solve this challenge using one instance of awk
3) solve the problem using just awk, no matter how many instances it costs
For the first case, you could parse the line using the scripting language of your choice (mine would be Perl), or do it the hard way using sed and a single big substitution. Or something between the two – use three regexes to get the parts you want.
For the second case, you could adapt any of the former solutions, preferably the sed one. Awk and sed solutions have already been posted.
For the third case, you could just run the obvious awk solutions you mentioned in your question and send the results to a single pipe like { awk …; awk …; awk …; } < file | consumer.
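For the single-awk case, one possible sketch uses match() to pull each quoted value out of the line regardless of which field it lands in. The names extract_flow and value_of are made up for this example. It assumes, as the question states, that the timestamp is always $4, and that each key is preceded by a space (so source-address does not match inside nat-source-address).

```shell
# extract_flow: print the 4th field plus the source and destination addresses,
# separated by tabs. Reads log lines on standard input.
extract_flow() {
    awk '
    # Return the text between the quotes of key="..." in the current line,
    # or an empty string if the key is absent.
    function value_of(key,    re, s) {
        re = " " key "=\"[^\"]*\""
        if (!match($0, re)) return ""
        s = substr($0, RSTART, RLENGTH)   # the matched key="value" text
        sub(/^[^"]*"/, "", s)             # drop everything up to the first quote
        sub(/"$/, "", s)                  # drop the closing quote
        return s
    }
    { print $4 "\t" value_of("source-address") "\t" value_of("destination-address") }
    '
}
```

It would be run as extract_flow < file. Keys that are missing from a line simply yield an empty column instead of shifting the others.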

Try this (gensub() is a GNU awk extension):
awk '{print gensub(/.*\s+([0-9]{4}-[0-9]+-[0-9]+T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+).*source-address="([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*destination-address="([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*/, "(I) \\1\n(II) \\2\n(III) \\3", "g"); }' file
Another solution using Perl:
perl -lne 'print "(", "I" x ++$c, ") $_" for m/.*?\s+(\d{4}-\d+-\d+T\d{2}:\d{2}:\d{2}\.\d+).*source-address="(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).*destination-address="(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).*/' file
Output:
(I) 2013-1-19T00:00:53.911
(II) 10.1.2.0
(III) 100.1.1.2

Modifying a number value in text

I have a text coming in as
A1:B2.C3.D4.E5
A2:B7.C10.D0.E9
A0:B1.C9.D4.E8
I wonder how to change it to
A1:B2.C1.D4.E5
A2:B7.C8.D0.E9
A0:B1.C7.D4.E8
using Awk. The first problem is the multiple delimiters. The second is how to get the C value and decrement it by 2.

awk solution:
$ awk -F"." '{$2=substr($2,1,1) (substr($2,2)-2)}1' OFS="." file
A1:B2.C1.D4.E5
A2:B7.C8.D0.E9
A0:B1.C7.D4.E8

I was wondering whether an awk regexp would do the job, but apparently awk cannot capture patterns. This is why I suggest a Perl solution:
$ cat data.txt
A1:B2.C3.D4.E5
A2:B7.C10.D0.E9
A0:B1.C9.D4.E8
$ perl -pe 's/C([0-9]+)/"C" . ($1-2)/ge;' data.txt
A1:B2.C1.D4.E5
A2:B7.C8.D0.E9
A0:B1.C7.D4.E8

Admittedly, I probably would have done this using the substr() function like Guru has shown (string positions in awk start at 1):
awk 'BEGIN { FS=OFS="." } { $2 = substr($2,1,1) (substr($2,2) - 2) }1' file
I like Aif's answer using Perl probably just a little more. Shorter is sweeter, isn't it? However, GNU awk can capture patterns. Here's how:
awk 'BEGIN { FS=OFS="." } match($2, /(C)(.*)/, a) { $2 = a[1] (a[2] - 2) }1' file
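If GNU awk is not available, a similar effect can be sketched with the standard two-argument match() and the RSTART/RLENGTH variables it sets. The function name decrement_c is made up for this example, and it assumes each line contains at most one C followed by digits.

```shell
# decrement_c: subtract 2 from the number following "C" on each input line.
decrement_c() {
    awk '
    match($0, /C[0-9]+/) {
        # RSTART points at "C"; the digits occupy the rest of the match.
        num = substr($0, RSTART + 1, RLENGTH - 1)
        $0 = substr($0, 1, RSTART) (num - 2) substr($0, RSTART + RLENGTH)
    }
    { print }
    '
}

printf '%s\n' 'A1:B2.C3.D4.E5' 'A2:B7.C10.D0.E9' | decrement_c
```

This version does not depend on the field separator, so the mix of : and . delimiters is not an issue.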
