awk condition for 30th column of a line is not working - awk

My Input file looks like below,
1,,B4,3000,Rushab,UNI,20130919T22:45:05+0100,20190930T23:59:59+0100,,kapeta,,6741948090816,2285917436,971078887,1283538808965528,20181102_20001,,,,,,,,,,,,,,,C
2,,B4,3000,Rushab,UNI,20130919T22:45:05+0100,20190930T23:59:59+0100,20181006T11:57:13+0100,,vsuser,6741948090816,2285917436,971078887,1283538808965528,20181102_20001,,,,,,,,,,,,,,,H
1,,F1,100000,RAWBANK,UNI,20180416T15:25:00+0100,20190416T23:59:59+0100,,enrruac,,7522609506635,3101315044,998445487,1290161608965816,20181102_20001,,,,,,,,,,,,,,,C
4,,F1,100000,RAWBANK,UNI,20180416T15:25:00+0100,20190416T23:59:59+0100,20181007T22:25:13+0100,,vsuser,7522609506635,3101315044,998445487,1290161608965816,20181102_20001,,,,,,,,,,,,,,,H
I want to print only the lines that start with '1' and end with 'C', so I am trying the command below,
awk -F, '$1=='1' && $31=='C'{print $0}' input_file.txt
but I am not getting any output.

Use double quotes. Inside the single-quoted awk program, '1' and 'C' simply close and reopen the shell's quoting, so awk actually receives $1==1 && $31==C, where C is an uninitialized (empty) variable that never matches:
awk -F, '$1=="1" && $31=="C"{print $0}' file
or
awk -F, '$1=="1" && $31=="C"' file
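If you would rather avoid the quoting question entirely, the values can also be passed in with -v (a sketch, assuming the same file):
awk -F, -v first=1 -v last=C '$1==first && $31==last' file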

As other users suggested, this can be done with a simple regex, so you can use sed as well as awk:
sed '/^1,.*,C$/!d' file
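For completeness, the same regex works directly in awk too (a sketch along the lines of the sed command above):
awk '/^1,.*,C$/' file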

Chain awk regex matches like grep

I am trying to use awk to select/remove data based on cell entries in a CSV file.
How do I chain Awk commands to build up complex searches like I have done with grep? I plan to use Awk to select rows based on matching criteria in cells in multiple columns, not just the first column as in this example.
Test data
123,line1
123a,line2
abc,line3
G-123,line4
G-123a,line5
Separate Awk statements with intermediate files
awk '$1 !~ /^[[:digit:]]/ {print $0}' file.txt > output1.txt
awk '$1 !~ /^G-[[:digit:]]/ {print $0}' output1.txt > output2.txt
mv output2.txt output.txt
cat output.txt
Chained or multi-line grep version (I think limited to first column only)
grep -v \
-e "^[[:digit:]]" \
-e "^G-[[:digit:]]" \
file.txt > output.txt
cat output.txt
How can I rewrite the Awk command to avoid the intermediate files?
Generally, awk has boolean operators available (it's better than grep! :))
awk '/match1/ || /match2/' file
awk '(/match1/ || /match2/ ) && /match3/' file
and so on ...
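For instance, negation works too; on the test data above this keeps only abc,line3 (a sketch, not from the original answer):
awk '!(/^[[:digit:]]/ || /^G-[[:digit:]]/)' file.txt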
In your example you could use something like:
awk -F, '$1 ~ /^[[:digit:]]/ || $1 ~ /G-[[:digit:]]/' input >> output
Note: This is just an example of how to use boolean operators. Also the regular expression itself could have been used here to express the alternative match:
awk -F, '$1 ~ /^(G-)?[[:digit:]]/' input >> output
In your awk commands and example, awk regards each line of file.txt as having only one field, because you have not defined FS, so the default whitespace field separator is used and no line contains whitespace.
With that said, you can easily AND your two pattern matches together like this:
awk '($1 !~ /^[[:digit:]]/) && ($1 !~ /^G-[[:digit:]]/) {print $0}' file.txt
To make awk use comma as a field separator, you can define it in a BEGIN block. In this example, the output should be just line3
awk 'BEGIN {FS=","} ($1 !~ /^[[:digit:]]/) && ($1 !~ /^G-[[:digit:]]/) {print $2}' file.txt
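The same field separator can also be given on the command line with -F, which is equivalent to setting FS in a BEGIN block:
awk -F, '($1 !~ /^[[:digit:]]/) && ($1 !~ /^G-[[:digit:]]/) {print $2}' file.txt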
I would suggest the literal translation of that grep command in awk is
awk '
/^[[:digit:]]/ {next}
/^G-[[:digit:]]/ {next}
{print}
' file.txt
But you have several examples of how to write it more concisely.
You can use
awk '$1 !~ /^(G-)?[[:digit:]]/' file.txt > output.txt
awk tries to find the following in field 1:
^ - start of string
(G-)? - an optional G- character sequence (note the regex flavor in awk is POSIX ERE, so (...) denotes a group and ? is a one-or-zero-times quantifier)
[[:digit:]] - a digit.
If the match is found, the record (=line) is not printed. Else, the line is printed.
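Run on the test data above, only the line whose first field starts with neither a digit nor G- followed by a digit is kept:
abc,line3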
To stick to your question, I would use:
awk '$1 !~ /^[[:digit:]]/ && $1 !~ /G-[[:digit:]]/' file.txt > output.txt
But I like @Wiktor Stribiżew's regex approach!
With your shown samples, this could also be done in grep with a single regexp; there is no need to chain different regexes. Adding this solution in case you or anyone else needs it.
grep -v -E '^(G-)?[[:digit:]]' Input_file
Explanation: grep's -v option omits lines that match the given pattern, and its -E option enables ERE (extended regular expressions). The regex ^(G-)?[[:digit:]] matches lines that start with a digit, optionally preceded by G-, so those lines are not printed.

awk command to read a key value pair from a file

I have a file input.txt which stores information in KEY:VALUE form. I'm trying to read GOOGLE_URL from this input.txt, but my script prints only http because the separator is :. What is the problem with my grep command and how should I print the entire URL?
SCRIPT
$> cat script.sh
#!/bin/bash
URL=`grep -e '\bGOOGLE_URL\b' input.txt | awk -F: '{print $2}'`
printf " $URL \n"
INPUT_FILE
$> cat input.txt
GOOGLE_URL:https://www.google.com/
OUTPUT
https
DESIRED_OUTPUT
https://www.google.com/
Since there are multiple : in your input, getting $2 will not work in awk because it will just give you the 2nd field. You actually need an equivalent of cut -d: -f2-, but you also need to check the key name that comes before the first :.
This awk should work for you:
awk -F: '$1 == "GOOGLE_URL" {sub(/^[^:]+:/, ""); print}' input.txt
https://www.google.com/
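For reference, the cut -d: -f2- equivalent mentioned above needs grep to select the key first (a sketch, not part of the original answer):
grep '^GOOGLE_URL:' input.txt | cut -d: -f2-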
Or this non-regex awk approach that allows you to pass the key name from the command line:
awk -F: -v k='GOOGLE_URL' '$1==k{print substr($0, length(k FS)+1)}' input.txt
Or using gnu-grep:
grep -oP '^GOOGLE_URL:\K.+' input.txt
https://www.google.com/
Could you please try the following, written and tested with the shown samples in GNU awk. It looks for the string GOOGLE_URL and captures the http or https value of the URL; in case you need only https, change http[s]? to https in the solution below.
awk '/^GOOGLE_URL:/{match($0,/http[s]?:\/\/.*/);print substr($0,RSTART,RLENGTH)}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Start the awk program from here.
/^GOOGLE_URL:/{ ##If a line starts with GOOGLE_URL: then do the following.
match($0,/http[s]?:\/\/.*/) ##Use the match function to match http or https, then ://, up to the end of the line.
print substr($0,RSTART,RLENGTH) ##Print the substring of the value matched above.
}
' Input_file ##Mention the Input_file name here.
2nd solution: In case you need everything coming after the first :, try the following.
awk '/^GOOGLE_URL:/{match($0,/:.*/);print substr($0,RSTART+1,RLENGTH-1)}' Input_file
Take your pick:
$ sed -n 's/^GOOGLE_URL://p' file
https://www.google.com/
$ awk 'sub(/^GOOGLE_URL:/,"")' file
https://www.google.com/
The above will work using any sed or awk in any shell on every UNIX box.
I would use GNU AWK the following way for that task:
Let file.txt content be:
EXAMPLE_URL:http://www.example.com/
GOOGLE_URL:https://www.google.com/
KEY:GOOGLE_URL:
Then:
awk 'BEGIN{FS="^GOOGLE_URL:"}{if(NF==2){print $2}}' file.txt
will output:
https://www.google.com/
Explanation: GNU AWK's FS may be a pattern, so I set it to GOOGLE_URL: anchored (^) to the beginning of the line, so a GOOGLE_URL: in the middle or at the end will not act as a separator (consider the 3rd line of input). With this FS there can be either 1 or 2 fields in each line; the latter happens only if the line starts with GOOGLE_URL:, so I check the number of fields (NF), and in that case print the 2nd field ($2), as the first field is then empty.
(tested in gawk 4.2.1)
Yet another awk alternative:
gawk -F'(^[^:]*:)' '/^GOOGLE_URL:/{ print $2 }' infile

changing value of first column in a file using awk

I have a .csv file that has multiple columns, and I want to change the value of the first column to "positive" for all rows except those where it is "negative".
So if I have the file
product,0 0,no way
brand,0 0 0,detergent
product,0 0 1,sugar
negative,0 0 1, sight
I want to make it
positive,0 0,no way
positive,0 0 0,detergent
positive,0 0 1,sugar
negative,0 0 1, sight
How can I accomplish this using awk?
This awk one-liner should help you:
awk -F, -v OFS="," '$1="positive"' file
Since you tagged with sed:
sed 's/[^,]*/positive/' file
Update for OP's new requirement:
awk -F, -v OFS="," '$1="negative"==$1?$1:"positive"' file
or:
awk -F, -v OFS="," '"negative"==$1||$1="positive"' file
This might work for you (GNU sed):
sed '/^negative,/!s/^[^,]*/positive/' file
If the first field is not negative, change it to positive.
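The same logic can also be written in awk with an explicit if, which some may find more readable (a sketch of an alternative, not one of the original answers):
awk -F, -v OFS="," '{if ($1 != "negative") $1 = "positive"} 1' file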

Print line modified and the line after using awk

I want to modify lines in a file using awk and print each modified line together with the line that follows it.
My file is like this
Name_Name2_ Name3_Name4
ASHRGSJFSJRGDJRG
Name5_Name6_Name7_Name8
ADGTHEGHGTJKLGRTIWRK
I want
Name-Name2
ASHRGSJFSJRGDJRG
Name5-Name6
ADGTHEGHGTJKLGRTIWRK
I have used awk to modify my file:
awk -F'_' '{print $1 "-" $2}' file > newfile
but I don't know how to tell it to also print the line just after (ABDJRH).
Is it possible with awk, something like x=NR+1 and NR<=x?
Thanks
The following awk may help you with the same.
awk -F"_" '/_/{print $1"-"$2;next} 1' Input_file
Assuming the structure in your sample (no separator within the "data" lines):
awk '$0=$1' Input_file
# or with sed
sed 's/[[:space:]].*//' Input_file

Using AWK to extract one column from a tab separated file

I know this is a simple question, but the awk command is literally melting my brain. I have a tab separated file "inputfile.gtf" and I need to extract one column from it and put it into a new file "newfile.tsv". I cannot for the life of me figure out the proper syntax to do this with awk. Here is what I've tried:
awk -F, 'BEGIN{OFS="/t"} {print $8}' inputfile.gtf > newfile.tsv
also
awk 'BEGIN{OFS="/t";FS="/t"};{print $8}' inputfile.gtf > newfile.tsv
Both of these just give me an empty file. Everywhere I search, people seem to have completely different ways of trying to achieve this simple task, and at this point I am completely lost. Any help would be greatly appreciated. Thanks.
Why not simpler:
awk -F'\t' '{print $8}' inputfile.gtf > newfile.tsv
You have specified the wrong delimiter /t; the tab character is typed as \t:
awk 'BEGIN{ FS=OFS="\t" }{ print $8 }' inputfile.gtf > newfile.tsv
Your 1st command:
awk -F, 'BEGIN{OFS="/t"} {print $8}' inputfile.gtf > newfile.tsv
You are setting -F, (a comma field separator), which is not right, as your file is not comma separated.
Next, OFS="/t": the syntax is incorrect, it should be OFS="\t", but you don't need it anyway; OFS (the output field separator) is not involved here since you're printing only a single field, and it only matters when you print at least two fields.
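For example, OFS would come into play if you printed two fields (an illustrative sketch, not part of the question):
awk 'BEGIN{FS=OFS="\t"} {print $1, $8}' inputfile.gtf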
Your 2nd command:
awk 'BEGIN{OFS="/t";FS="/t"};{print $8}' inputfile.gtf > newfile.tsv
Again, it's not /t, it should be \t. Also, FS="\t" is equivalent to -F "\t".
What you actually need is:
awk -F"\t" '{print $8}' inputfile.gtf > newfile.tsv
or
awk -v FS="\t" '{print $8}' inputfile.gtf > newfile.tsv
And if your file is separated only by tabs and your fields don't contain spaces, then you can simply use:
awk '{print $8}' inputfile.gtf > newfile.tsv
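Optionally, a quick sanity check that every row really has at least 8 tab-separated fields (a hypothetical helper, not part of the original answers):
awk -F'\t' 'NF < 8 {print "short row at line " NR}' inputfile.gtf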