Awk - Grep - Match the exact string in a file

I have a file that looks like this:
ON,111111,TEN000812,Super,7483747483,767,Free
ON,262762,BOB747474,SuperMan,4347374,676,Free
ON,454644,FRED84848,Super Man,65757,555,Free
I need to match the values in the fourth column exactly as they are written. So if I am searching for "Super" I need it to return the line with "Super" only.
ON,111111,TEN000812,Super,7483747483,767,Free
Likewise, if I'm looking for "Super Man" I need that exact line returned.
ON,454644,FRED84848,Super Man,65757,555,Free
I have tried using grep, but grep will match all instances that contain Super. So if I do this:
grep -i "Super" file.txt
It returns all lines, because they all contain "Super"
ON,111111,TEN000812,Super,7483747483,767,Free
ON,262762,BOB747474,SuperMan,4347374,676,Free
ON,454644,FRED84848,Super Man,65757,555,Free
I have also tried with awk, and I believe I'm close, but when I do:
awk '$4==Super' file.txt
I still get output like this:
ON,111111,TEN000812,Super,7483747483,767,Free
ON,262762,BOB747474,SuperMan,4347374,676,Free
I have been at this for hours, and any help would be greatly appreciated at this point.

You were very close: just set the field separator to a comma in your solution and you are all set.
awk 'BEGIN{FS=","} $4=="Super"' Input_file
One more thing about your attempt: when comparing the 4th field with a string value, the string must be wrapped in double quotes. Unquoted, Super is treated as an (uninitialized) awk variable.
Or, if you want to pass the value to compare as an awk variable, try the following.
awk -v value="Super" 'BEGIN{FS=","} $4==value' Input_file

You are quite close, actually. You can try:
awk -F, '$4=="Super" {print}' file.txt
I find this form easier to grasp. It is slightly longer than #RavinderSingh13's, though.
-F sets the field separator, in this case a comma
Next you have a condition followed by an action
The condition checks whether the fourth field is exactly the string Super
If it is, the line is printed
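A quick way to check the exact-match behaviour is to run the comparison against the three sample lines from the question. This is only a sketch: the shell variable names (data, super, super_man) are made up for the illustration, and -v is used so that the search string does not have to be embedded in the awk program.

```shell
# Sample data copied from the question.
data='ON,111111,TEN000812,Super,7483747483,767,Free
ON,262762,BOB747474,SuperMan,4347374,676,Free
ON,454644,FRED84848,Super Man,65757,555,Free'

# Compare the 4th comma-separated field with the whole search string.
# == is an exact, case-sensitive comparison, so SuperMan does not match Super.
super=$(printf '%s\n' "$data" | awk -F, -v value="Super" '$4 == value')
super_man=$(printf '%s\n' "$data" | awk -F, -v value="Super Man" '$4 == value')

printf '%s\n' "$super" "$super_man"
```

Note that awk processes backslash escapes in -v assignments, so a search string containing backslashes would need extra care.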

Related

Select first and last column using regex or linux command

I have a text file that looks something like this...
("oo" (set CANDRA-E-O 0) "ऊ")
("o" (set CANDRA-E-O ?ऑ) "ओ")
("oa" "ऑ")
("au" "औ")
I need to extract the first and last columns like:
"oo", "ऊ"
"o", "ओ"
"oa", "ऑ"
"au", "औ"
I have managed to extract the first column, but I am not sure how to select the last one.
\ {2}\(\".+\"\

With the samples you have shown, please try the following awk command. It was written and tested in GNU awk.
awk -v FPAT='"[^"]*"' '{for(i=1;i<=NF;i++){printf("%s%s",$i,i==NF?ORS:OFS)}}' Input_file
Explanation: FPAT is set to '"[^"]*"', which defines each field by its content rather than by a separator: a field starts at a " and runs up to and including the next ". The main program then loops over all fields of each line and prints them, putting OFS (a space by default) between fields and ORS (a newline) after the last field, so all values from one input line end up on a single output line.

With this awk solution:
awk -v OFS="," '{sub(/^\(/,"",$1);sub(/\)$/,"",$NF);print $1, $NF}' file
"oo","ऊ"
"o","ओ"
"oa","ऑ"
"au","औ"
The first sub() removes the opening parenthesis ( from the first field.
Likewise, the second sub() removes the closing parenthesis ) from the last field.
Finally, the two fields are printed separated by a comma, because OFS=",".
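If GNU awk is not available, a similar result can probably be obtained with plain awk by using the double quote itself as the field separator. This is only a sketch: it assumes that the first and last quoted strings on each line are the ones wanted and that they contain no embedded double quotes. The variable names are made up for the illustration.

```shell
# Sample lines from the question.
data='("oo" (set CANDRA-E-O 0) "ऊ")
("o" (set CANDRA-E-O ?ऑ) "ओ")
("oa" "ऑ")
("au" "औ")'

# With the double quote as separator, $2 is the first quoted string
# and $(NF-1) is the last one; the quotes are added back when printing.
pairs=$(printf '%s\n' "$data" | awk -F'"' '{ printf "\"%s\", \"%s\"\n", $2, $(NF-1) }')

printf '%s\n' "$pairs"
```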

awk - How to extract quoted string in space delimited log file

I'm hoping there might be some simple way to do this, as I'm a total novice using awk.
I have a bunch of log files from an AWS load balancer, and I want to extract entries from these logs, where a particular response code was received.
Checking the response code is easy enough, I can do the following...
$9=="403" {print $0}
However, what I really want is just the request itself, $13. This column is quoted, though, and will contain spaces. It looks like so...
"GET https://[my domain name]:443/[my path] HTTP/2.0"
If I do the following...
$9=="403" {print $13}
I just get...
"GET
So what I think I need to do is have awk (or some other appropriate utility) extract the complete column 13, and then be able to break that down into its individual fields: method, URL, etc.

Could you please try the following. I have used 443 inside the match() regex, as in your sample; adapt it to your needs. To look for 403, for example, change it to match($0,/\".*403.*\"/).
awk 'match($0,/\".*443.*\"/){print substr($0,RSTART,RLENGTH)}' Input_file
IMHO the advantage of this approach is that you do not need to hard-code any field number in your awk. One more thing: I have assumed that your Input_file has a "......403....." kind of section only once per line and that you want to print only that.
Here is one more awk, where I assume you may have multiple occurrences of "..." and want only the one that contains 403 or 443.
awk 'match($0,/\".*443[^"]*/){print substr($0,RSTART,RLENGTH+1)}' Input_file
EDIT: Or, if your Input_file has "...443..." only once per line, or this text is the first quoted section on the line (assuming any other "..." sections come later), then you could try the following.
awk -F'"' '/443/{print $2}' Input_file

Newer versions of gawk have a built-in variable, FPAT, which you can use to define fields by a regex pattern. For your logs, provided there are no other quoted fields before fields 9 and 13:
awk -v FPAT='[^[:space:]]+|"[^"]*"' '$9 == "403"{print $13}' log_file
REF: https://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html
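For the second half of the question (breaking the request down into method, URL and so on), a portable sketch using split() might look like the following. The two log lines are made up for the illustration and only mimic the layout described in the question (response code in field 9, quoted request starting at field 13); the sketch also assumes the request is the first quoted string on the line.

```shell
# Two hypothetical log lines: the first has response code 403, the second 200.
log='h2 2024-01-01T00:00:00.000000Z app/my-lb/0123456789abcdef 192.0.2.10:51234 10.0.0.5:80 0.001 0.002 0.000 403 403 34 366 "GET https://example.com:443/private/report HTTP/2.0" "curl/8.0" - -
h2 2024-01-01T00:00:01.000000Z app/my-lb/0123456789abcdef 192.0.2.11:51235 10.0.0.5:80 0.001 0.002 0.000 200 200 34 512 "GET https://example.com:443/index.html HTTP/2.0" "curl/8.0" - -'

requests=$(printf '%s\n' "$log" | awk '
  $9 == "403" {
    # Splitting the line on double quotes puts the first quoted string in quoted[2].
    split($0, quoted, "\"")
    # The request itself is space separated: method, URL, protocol.
    split(quoted[2], req, " ")
    print "method=" req[1], "url=" req[2], "protocol=" req[3]
  }')

printf '%s\n' "$requests"
```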

Need help AWK script

Could you let me know how to print the "user.%" string from the text below with awk?
The value of 'user' is not fixed, and neither is the number of strings inside '( )'.
start user1.table% NOT (%OLD, %2016%) user.% another strings
UPDATE
This is for basic SQL processing. $2 is a schema.table pattern, but here the user can use '%' and can also exclude entries with the NOT keyword; the exclusion list ends with ')'. The next item is a second schema.table, and that is the one I want to catch.
I think I should parse the string after ')' with a regular expression, but my attempts failed.
Regular expression:
[)]\s+(\S+)
I guess the expression above can be used to catch that string.
How can I apply it in an awk script (not a one-liner)?

If the structure of the query stays the same, you can use this:
awk -F'[).]' '{print $3".%"}'
I'm using the closing parenthesis or the literal dot as the delimiter. That puts the value of interest in field 3.
While this is simple, it leaves some whitespace in front of user. We can extend the field delimiter regex to fix this:
awk -F')[[:space:]]*|[.]' '{print $3".%"}'
Btw, you may use this sed command as an alternative:
sed 's/.*)[[:space:]]*\([^.]*\).*/\1.%/'
or if you have GNU grep, use this:
grep -oP '\)\s*\K[^%]*%'

Try this (GNU awk):
awk '{match($0, /[)] +([^ ]+)/, var);print var[1];}'
You need to call match() first. The three-argument form used here, which stores the capture groups in an array, is a GNU awk extension.
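Without GNU awk, roughly the same idea can be sketched with the standard two-argument match(), which sets RSTART and RLENGTH. This assumes the wanted token is the first run of non-blank characters that follows a ")" and some whitespace; the variable names are made up for the illustration.

```shell
line='start user1.table% NOT (%OLD, %2016%) user.% another strings'

result=$(printf '%s\n' "$line" | awk '
  # Look for ")", the whitespace after it, and the next run of non-blank characters.
  match($0, /[)][ \t]+[^ \t]+/) {
    token = substr($0, RSTART, RLENGTH)
    sub(/^[)][ \t]+/, "", token)   # drop the parenthesis and the whitespace
    print token
  }')

printf '%s\n' "$result"
```

Since the question asks for a script rather than a one-liner, the program between the single quotes could also be saved in a file (say, a hypothetical extract.awk) and run with awk -f extract.awk.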

Given your posted sample input, all you need is:
awk '{print $6}'
e.g.:
$ echo 'start user1.table% NOT (%OLD, %2016%) user.% another strings' |
awk '{print $6}'
user.%
If that doesn't work for you then your posted sample input isn't representative enough of your real input so edit your question to include a few lines of truly representative sample input and the expected output given that input.

if $column_A="" equals to delete $column_A in awk?

I wish to delete one column of my data in awk, but the only approach I have found is a command like $column_A="". Is column_A really deleted this way?
For example, I wish to delete the second column, and I found this solution: awk 'BEGIN{FS="\t";OFS="\t"}!($2="")', which prints a result like: $1^0^0$3. It seems that only the content of the second column is deleted, not the second column itself.

After reading dev-null's comment, I got an idea of what you are asking...
My answer is: it depends on how you define "a column is deleted".
See this example:
kent$ echo "foo,bar,blah"|awk -F, -v OFS="," '{$2="";print}'
foo,,blah
kent$ echo "foo,bar,blah"|awk -F, -v OFS="," '{print $1,$3}'
foo,blah
You see the difference? If you set $x="", the column is still there, but it becomes an empty string, so the separators before and after it stay. If this is what you wanted, that is fine. Otherwise just skip outputting the target column, as the 2nd example shows.

I would use cut for that:
cut -d$'\t' -f1,3- file
-f1,3- selects the first field, skips field 2 and then selects fields 3 to end.
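If the column has to be dropped inside awk and the number of columns is not known in advance, one possible sketch is to rebuild the line from the fields that should be kept. The variable col and the sample input are made up for the illustration.

```shell
# Hypothetical tab-separated input line with four columns; col is the column to drop.
out=$(printf 'a\tb\tc\td\n' | awk -F'\t' -v OFS='\t' -v col=2 '{
  line = ""; sep = ""
  for (i = 1; i <= NF; i++) {
    if (i == col) continue     # skip the column to be removed
    line = line sep $i
    sep = OFS                  # a separator is only added between kept fields
  }
  print line
}')

printf '%s\n' "$out"
```

Unlike $2="", this leaves no empty field behind, so the output has three columns instead of four.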

Explain this awk command

Please explain what exactly this awk command does:
awk '$0!~/^$/{print $0}'

It removes empty lines. The condition is that $0 (the whole line) does not match (!~) the regexp /^$/ (the beginning of the line immediately followed by the end of the line).
Similar to grep -v '^$'

It prints non-empty input lines. Note: "Empty" does not mean "blank", in this case.

Your example could be rewritten as simply:
awk '!/^$/'
or
sed '/^$/d'

Like Ben Jackson and the others said, it removes completely empty lines. Not the ones with one or more whitespace characters, but the zero-character-long ones. We will never know if this was the intended behaviour.
I'd like to remark that the code is at least redundant, if not triply redundant, depending on what it's used for.
What it does is print the input line to the output if the input line is not the empty line.
Since the standard behaviour of awk is to print the input line when a condition without a following action block is met, this would suffice:
awk '$0!~/^$/' or even shorter awk '$0!=""'
If you could be sure that no line evaluates to zero (for example, a line containing only 0), even
awk '$0'
could do the trick.
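The difference between these variants can be sketched on a small input. The sample lines and variable names below are made up for the illustration; awk 'NF' is a commonly used alternative, not part of the original command, that also drops whitespace-only lines.

```shell
# Four input lines: "a", an empty line, a line of three spaces, and "0".
input=$(printf 'a\n\n   \n0\n')

# Keeps every line except the zero-length one.
not_empty=$(printf '%s\n' "$input" | awk '$0 != ""')

# NF is 0 for empty and whitespace-only lines, so both are dropped.
has_fields=$(printf '%s\n' "$input" | awk 'NF')

# A bare $0 is false for the empty line and for the line "0".
truthy=$(printf 'x\n0\n\n' | awk '$0')

printf '%s\n' "$not_empty" "---" "$has_fields" "---" "$truthy"
```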

Make it readable first...
echo '$0!~/^$/{print $0}' | a2p
==>
$, = ' ';
$\ = "\n";
while (<>) {
    chomp;
    if ($_ !~ /^$/) {
        print $_;
    }
}
And then interpret it. In this case: don't print empty lines.
