Using awk command to display lines backwards - awk

Fairly new to the awk command and still playing with it, I am trying to take a range of lines from a file, let's say lines 3-5, and display each of them with its words in reverse order. So, given this file:
Hello World
How are you
I love computer science,
I am using awk,
And it is hard.
And it should output:
science, computer love I
awk, using am I
hard. is it And
Any step in the right direction would be appreciated!

The following awk may help you. It uses start and end variables to select only the lines that the OP wants printed.
awk -v start=3 -v end=5 'FNR>=start && FNR<=end{for(;NF>0;NF--){printf("%s%s",$NF,NF==1?RS:FS)}}' Input_file
The output will be as follows:
science, computer love I
awk, using am I
hard. is it And
Explanation: Here is a commented version of the solution.
awk -v start=3 -v end=5 ' ##Set variables start and end: start is the first line and end is the last line we want to print.
FNR>=start && FNR<=end{ ##Check whether FNR (a built-in awk variable holding the line number within the current file) is greater than or equal to start AND less than or equal to end. If so, do the following:
for(;NF>0;NF--){ ##Start a for loop at the current value of NF (the number of fields, another built-in awk variable) and decrement NF until it reaches 0.
printf("%s%s",$NF,NF==1?RS:FS)} ##Print the current last field ($NF), followed by a newline (RS) when it is the first field, or a space (FS) otherwise.
}
' Input_file ##Name of the input file.

$ cat tst.awk
NR>2 && NR<6 {
for (i=NF; i>0; i--) {
printf "%s%s", $i, (i>1?OFS:ORS)
}
}
$ awk -f tst.awk file
science, computer love I
awk, using am I
hard. is it And

You can use the following awk command to reach your goal:
input:
$ cat text
Hello World
How are you
I love computer science,
I am using awk,
And it is hard.
output:
$ awk 'NR<3{print}NR>=3{for(i=0; i<NF; i++){printf "%s ",$(NF-i);} printf "\n";}' text
Hello World
How are you
science, computer love I
awk, using am I
hard. is it And
Explanation:
NR<3{print} prints the first 2 lines unchanged.
NR>=3{for(i=0; i<NF; i++){printf "%s ",$(NF-i);} printf "\n";}: from the 3rd line onwards, you loop over all the fields (NF is the number of fields) and print them one after another from the last one to the first one ($NF is the last one, $1 is the first one), following each field with a space. Last but not least, after the loop you print an end-of-line character.
Now, if you do not need to print the first 2 lines use:
$ awk 'NR>=3{for(i=0; i<NF; i++){printf "%s ",$(NF-i);} printf "\n";}' text
science, computer love I
awk, using am I
hard. is it And
For files with more lines, where you only want to print a range (lines 3-5), use:
$ awk 'NR>=3 && NR<=5{for(i=0; i<NF; i++){printf "%s ",$(NF-i);} printf "\n";}' text
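One detail in the commands above: printf "%s " writes a space after every field, including the last one, so every output line ends with a trailing space. A minimal sketch that only places OFS between fields, wrapped in a shell function (reverse_words is a hypothetical name chosen for illustration, not part of awk); like the first answer, it takes the range as start and end variables and uses FNR:

```shell
# reverse_words START END [FILE...]
# Hypothetical helper: prints lines START..END of each input with their words
# in reverse order, with no trailing separator. Reads stdin when no FILE is given.
reverse_words() {
  start=$1 end=$2
  shift 2
  awk -v start="$start" -v end="$end" '
    FNR >= start && FNR <= end {
      out = ""
      for (i = NF; i >= 1; i--)
        out = out $i (i > 1 ? OFS : "")   # separator only between fields
      print out
    }' "$@"
}
```

For the sample file, reverse_words 3 5 text should print the same three reversed lines as above, minus the trailing spaces.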

Related

How to find and match an exact string in a column using AWK?

I'm having trouble matching an exact string that I want to find in a file using awk.
I have the file called "sup_groups.txt" that contains:
(the structure is: "group_name:pw:group_id:user1<,user2>...")
adm:x:4:syslog,adm1
admins:x:1006:adm2,adm12,manuel
ssl-cert:x:122:postgres
ala2:x:1009:aceto,salvemini
conda:x:1011:giovannelli,galise,aceto,caputo,haymele,salvemini,scala,adm2,adm12
adm1Group:x:1022:adm2,adm1,adm3
docker:x:998:manuel
Now, I want to extract the records whose user list contains the user "adm1" and print the first column (the group name), but as you can see there is also a user called "adm12", so when I do this:
awk -F: '$4 ~ "adm1" {print $1}' sup_groups.txt
the output is:
adm
admins
conda
adm1Group
The command of course also prints the records that contain the string "adm12", but I don't want those lines because I'm only interested in the user "adm1".
So, how can I change this command so that it prints only lines 1 and 6 (excluding 2 and 5)?
Thank you so much, and sorry for my bad English.
EDIT: Thank you for the answers, you gave me inspiration for the solution. I think this might work as well as your solutions, but more simply:
awk -F: '$4 ~ "adm,|adm1$|:adm1," {print $1}' sup_groups.txt
Basically I'm using ORs to cover all the cases while excluding "adm12".
Let me know if you think this is correct.
1st solution: Using awk's split function. With your shown samples, please try the following awk code:
awk -F':' '
{
num=split($4,arr,",")
for(i=1;i<=num;i++){
if(arr[i]=="adm1"){
print
}
}
}
' Input_file
Explanation: Adding a detailed explanation for the above.
awk -F':' ' ##Start the awk program, setting the field separator to :.
{
num=split($4,arr,",") ##Split the 4th field into array arr using , as the delimiter; num holds the number of elements.
for(i=1;i<=num;i++){ ##Loop over every element of arr, from 1 to num.
if(arr[i]=="adm1"){ ##If arr[i] is exactly adm1, do the following.
print ##Print the current line.
}
}
}
' Input_file ##Name of the input file.
2nd solution: Using regex and conditions in awk.
awk -F':' '$4~/^adm1,/ || $4~/,adm1,/ || $4~/,adm1$/' Input_file
Or, if the 4th field may have no comma at all (a single user), try the following:
awk -F':' '$4~/^adm1,/ || $4~/,adm1,/ || $4~/,adm1$/ || $4=="adm1"' Input_file
Explanation: Set the field separator to : and check whether the 4th field starts with adm1, (^adm1,), OR contains ,adm1, OR ends with ,adm1 (,adm1$); if any of these is true, print the line.
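The separate conditions above can also be folded into a single ERE with alternation: adm1 must be preceded by the start of the field or a comma, and followed by a comma or the end of the field. A sketch (the function name adm1_groups is hypothetical) that prints only the group name, as the question asks, and also covers a field holding a single user:

```shell
# adm1_groups [FILE...]
# Hypothetical helper: print the group name (field 1) when adm1 is a whole
# entry of the comma-separated user list in field 4.
adm1_groups() {
  awk -F: '$4 ~ /(^|,)adm1(,|$)/ { print $1 }' "$@"
}
```

Usage: adm1_groups sup_groups.txt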
This should do the trick:
$ awk -F: '"," $4 "," ~ ",adm1," { print $1 }' file
The idea behind this is to wrap the user-list field in commas, so that every entry, including the first and the last, is enclosed by commas. So instead of searching for adm1 you search for ,adm1,
So if your list looks like:
adm2,adm12,manuel
then, by adding commas, you convert it to:
,adm2,adm12,manuel,
and you can always search for ,adm1, and find an exact match.
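The same comma-wrapping trick can take the user name from the shell instead of hard-coding it. A minimal sketch, assuming the group:pw:gid:users layout from the question; the function name groups_of is hypothetical, and the name is looked up with index() rather than ~ so that any regex metacharacters in it are treated literally:

```shell
# groups_of USER [FILE...]
# Hypothetical helper: print the group names whose user list contains USER
# as a whole comma-separated entry.
groups_of() {
  user=$1
  shift
  # Wrap both the list and the name in commas; index() returns 0 when absent.
  awk -F: -v user="$user" 'index("," $4 ",", "," user ",") { print $1 }' "$@"
}
```

Usage: groups_of adm1 sup_groups.txt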
Once you set up FS per the task requirements, the main body becomes barely just:
NF = !_ < NF
or, even more straightforward:
{m,n,g}awk -- --NF
For example:
{m,g}awk 'NF=!_<NF' OFS= FS=':[^:]*:[^:]*:([^:]*,)?adm1(,.*)?$'
adm
adm1Group

What does this Awk expression mean

I am working with bash script that has this command in it.
awk -F ‘‘ ‘/abc/{print $3}’|xargs
What is the meaning of this command? Assume input is provided to awk.
The quick answer is it'll do different things depending on the version of awk you're running and how many fields of output the awk script produces.
I assume you meant to write:
awk -F '' '/abc/{print $3}'|xargs
not the syntactically invalid (due to "smart quotes"):
awk -F ‘’’/abc/{print $3}’|xargs
-F '' is undefined behavior per POSIX, so what it will do depends on the version of awk you're running. In some awks it'll split the current line into 1 character per field. In others it'll be ignored and the line will be split into fields at every sequence of white space. In still other awks it could do anything else.
/abc/ looks for a string matching the regexp abc on the current line and if found invokes the subsequent action, in this case {print $3}.
However it's split into fields, print $3 will print the 3rd such field.
xargs, as used here, will just print chunks of the multi-line input it's getting, joined onto 1 line (with no command given, xargs runs echo). So you could get 1 line of all-fields output if there aren't many fields being output, or several lines of multi-field output if there are.
I suspect the intent of that code was to do what the following code will actually do, in any awk, on its own:
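A quick way to see the joining behaviour of xargs: with no command it runs echo on the words it reads, so a short multi-line input comes back as a single space-separated line.

```shell
# Three lines in, one line out: xargs defaults to running echo.
printf 'b\na\nc\n' | xargs
```

This prints b a c on one line.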
awk '/abc/{printf "%s%s", sep, substr($0,3,1); sep=OFS} END{print ""}'
e.g.:
$ printf 'foo\nxabc\nyzabc\nbar\n' |
awk '/abc/{printf "%s%s", sep, substr($0,3,1); sep=OFS} END{print ""}'
b a

Can I delete a field in awk?

This is test.txt:
0x01,0xDF,0x93,0x65,0xF8
0x01,0xB0,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0xB2,0x00,0x76
If I run
awk -F, 'BEGIN{OFS=","}{$2="";print $0}' test.txt
the result is:
0x01,,0x93,0x65,0xF8
0x01,,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,,0x00,0x76
The $2 wasn't deleted, it just became empty.
I want the result, when printing $0, to be:
0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0x00,0x76
All the existing solutions are good, though this is actually a tailor-made job for cut:
cut -d, -f 1,3- file
0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0x00,0x76
If you want to remove 3rd field then use:
cut -d, -f 1,2,4- file
To remove 4th field use:
cut -d, -f 1-3,5- file
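The field lists in these cut commands follow one pattern (keep 1 to n-1, then n+1 onwards), so they can be generated for any n. A small sketch, assuming a comma delimiter; drop_field is a hypothetical helper name:

```shell
# drop_field N [FILE...]
# Hypothetical helper: print comma-separated input without field N.
drop_field() {
  n=$1
  shift
  if [ "$n" -eq 1 ]; then
    cut -d, -f 2- "$@"
  else
    # e.g. n=4 builds the list 1-3,5-
    cut -d, -f "1-$((n - 1)),$((n + 1))-" "$@"
  fi
}
```

Usage: drop_field 2 test.txt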
I believe the simplest approach would be to use the sub function to replace the first occurrence of ,, (which gets created once you set the 2nd field to an empty string) with a single ,. But this assumes that your field values don't contain commas themselves.
awk 'BEGIN{FS=OFS=","}{$2="";sub(/,,/,",");print $0}' Input_file
2nd solution: Or you could use the match function to match the regex from the first comma to the next comma, then print the parts of the line before and after the matched string, joined by a single comma.
awk '
match($0,/,[^,]*,/){
print substr($0,1,RSTART-1)","substr($0,RSTART+RLENGTH)
}' Input_file
It's a bit heavy-handed, but this moves each field after field 2 down a place, and then changes NF so the unwanted field is not present:
$ awk -F, -v OFS=, '{ for (i = 2; i < NF; i++) $i = $(i+1); NF--; print }' test.txt
0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0x00,0x76
$
Tested with both GNU Awk 4.1.3 and BSD Awk ("awk version 20070501" on macOS Mojave 10.14.6 — don't ask; it frustrates me too, but sometimes employers are not very good at forward thinking). Setting NF may or may not work on older versions of Awk — I was a little surprised it did work, but the surprise was a pleasant one, for a change.
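The same shift-then-shrink idea works for any column, not just field 2. A sketch with a hypothetical col variable passed via -v (drop_col is likewise an illustrative name); lines with fewer than col fields are printed unchanged, and, like the command above, it relies on decrementing NF, which the answer reports working in GNU Awk 4.1.3 and that BSD awk:

```shell
# drop_col COL [FILE...]
# Hypothetical helper: remove comma-separated field COL by shifting the later
# fields left one place and then dropping the last field.
drop_col() {
  col=$1
  shift
  awk -F, -v OFS=, -v col="$col" '
    col >= 1 && col <= NF {
      for (i = col; i < NF; i++) $i = $(i + 1)   # shift later fields left
      NF--                                       # drop the duplicated last field
    }
    { print }' "$@"
}
```

Usage: drop_col 2 test.txt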
If Awk is not an absolute requirement, and the input is indeed as trivial as in your example, sed might be a simpler solution.
sed 's/,[^,]*//' test.txt
This is especially elegant if you want to remove the second field. A more generic approach to removing the nth field would require a regex which matches the first n - 1 fields followed by the nth, then replacing that with just the first n - 1.
So for n = 4 you'd have
sed 's/\([^,]*,[^,]*,[^,]*,\)[^,]*,/\1/' test.txt
or more generally, if your sed dialect understands braces for specifying repetitions
sed 's/\(\([^,]*,\)\{3\}\)[^,]*,/\1/' test.txt
Some sed dialects allow you to lose all those pesky backslashes with an option like -r or -E but again, this is not universally supported or portable.
In case it's not obvious, [^,] matches a single character which is not (newline or) comma; and \1 recalls the text from first parenthesized match (back reference; \2 recalls the second, etc).
Also, this is completely unsuitable for escaped or quoted fields (though I'm not saying it can't be done). Every comma acts as a field separator, no matter what.
With GNU sed you can add a number modifier to substitute the nth match of non-comma characters followed by a comma:
sed -E 's/[^,]*,//2' file
Using awk in a regex-free way, with the option to choose which column will be deleted:
awk '{ col = 2; n = split($0,arr,","); line = ""; for (i = 1; i <= n; i++) line = line ( i == col ? "" : ( line == "" ? "" : "," ) arr[i] ); print line }' test.txt
Step by step:
{
col = 2 # defines which column will be deleted
n = split($0,arr,",") # each line is split into an array
# n is the number of elements in the array
line = "" # this will be the new line
for (i = 1; i <= n; i++) # looping through all elements in the array
line = line ( i == col ? "" : ( line == "" ? "" : "," ) arr[i] )
# appends a comma (except if line is still empty)
# and the current array element to the line (except when on the selected column)
print line # prints line
}
Another solution:
You can just pipe the output to sed and squeeze the doubled delimiters. Note that the g flag also affects any other empty field on the line.
$ awk -F, 'BEGIN{OFS=","}{$2=""}1 ' edward.txt | sed 's/,,/,/g'
0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0x00,0x76
$
Commenting on the first solution of @RavinderSingh13 using the sub() function:
awk 'BEGIN{FS=OFS=","}{$2="";sub(/,,/,",");print $0}' Input_file
The gnu-awk manual: https://www.gnu.org/software/gawk/manual/html_node/Changing-Fields.html
"It is important to note that making an assignment to an existing field changes the value of $0 but does not change the value of NF, even when you assign the empty string to a field." (4.4 Changing the Contents of a Field)
So, following the first solution of RavinderSingh13 but without using sub() in this case, "The field is still there; it just has an empty value, delimited by the two colons" (commas, in our case):
awk 'BEGIN {FS=OFS=","} {$2="";print $0}' file
0x01,,0x93,0x65,0xF8
0x01,,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,,0x00,0x76
My solution:
awk -F, '
{
regex = "^"$1","$2
sub(regex, $1, $0);
print $0;
}' test.txt
or one line code:
awk -F, '{regex="^"$1","$2;sub(regex, $1, $0);print $0;}' test.txt
I found that OFS="," was not necessary, since the script edits $0 directly and never assigns to a field, so $0 is never rebuilt with OFS.
I would do it the following way. Let the content of file.txt be:
0x01,0xDF,0x93,0x65,0xF8
0x01,0xB0,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0xB2,0x00,0x76
then
awk 'BEGIN{FS=",";OFS=""}{for(i=2;i<=NF;i+=1){$i="," $i};$2="";print}' file.txt
output
0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0x00,0x76
Explanation: I set OFS to nothing (the empty string), then add , at the start of the 2nd and every following column. Finally I set the 2nd column, which now holds a comma plus its value, to nothing. Keep in mind this solution would need rework if you wished to remove the 1st column.

AWK - get value between two strings over multiple lines

input.txt:
>block1
111111111111111111111
>block2
222222222222222222222
>block3
333333333333333333333
AWK command:
awk '/>block2.*>/' input.txt
Expected output
222222222222222222222
However, AWK is returning nothing. What am I misunderstanding?
Thanks!
If you want to print the line after the line containing >block2, then you could use:
awk '/^>block2$/ { nr=NR+1 } NR == nr { print }'
Track the record number plus 1 when you find the match; when the current record number matches the remembered one, print the current record.
If you want all the lines between the line >block2 and >block3, then you'd use:
awk '/^>block2$/,/^>block3/ {if ($0 !~ /^>block[23]$/) print }'
For all lines between the two markers, if the line doesn't match either marker, print it. The output is the same with the sample data file.
Another awk:
$ awk 'c&&c--; /^>block2/{c=1}' file
222222222222222222222
c specifies how many lines you want to print after the match. If you want the text between two markers:
$ awk '/^>block3/{exit} s; /^>block2/{s=1}' file
222222222222222222222
If there are multiple instances and you want them all, just change exit to s=0.
You probably meant:
$ awk '/>/{f=/^>block2$/;next} f' file
222222222222222222222
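The same flag idea can take the block name from the shell. A minimal sketch, assuming every header line starts with > and must match the requested name exactly; print_block and name are hypothetical names chosen for illustration. If several blocks share the header, it prints all of them:

```shell
# print_block NAME [FILE...]
# Hypothetical helper: print the body lines of every block whose header is ">NAME".
print_block() {
  name=$1
  shift
  # On each header line, set the flag only if the header is exactly ">NAME";
  # body lines are printed while the flag is set.
  awk -v name="$name" '/^>/ { f = ($0 == ">" name); next } f' "$@"
}
```

Usage: print_block block2 input.txt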

Counting the number of specific values in a column with awk

I have data (data.csv):
"1",5.1,"s"
"2",3.3,"s"
"3",2.7,"c"
and I want to count the number of lines whose 3rd element is "s" or "c" with AWK (count.awk):
BEGIN{FS=","; s_count=0; c_count=0}
($3=="s"){s_count++}
($3=="c"){c_count++}
END{print s_count; print c_count}
then
$ awk -f count.awk data.csv
but this does not work. Its output is:
0
0
This is not what I expected. Why?
$ awk -V
GNU Awk 4.1.0, API: 1.0 (GNU MPFR 3.1.2, GNU MP 5.1.2)
Note: I use Awk on cygwin.
The problem is that your target field has embedded double quotes, so you need to match them too, by including them - \-escaped - in the string to match against:
awk '
BEGIN{FS=","; s_count=0; c_count=0}
($3=="\"s\"") {s_count++}
($3=="\"c\"") {c_count++}
END{ print s_count; print c_count }
' data.csv
As an aside, you can simplify your awk program somewhat:
the parentheses are not needed (have not verified on cygwin, but given that it's awk interpreting the string, I wouldn't expect that to matter)
you don't strictly need to initialize your output variables, because awk defaults uninitialized variables to 0 in numerical contexts.
BEGIN{FS=","}
$3 == "\"s\"" {s_count++}
$3 == "\"c\"" {c_count++}
END{ print s_count; print c_count }
This is a job for an array. Here is an awk command:
awk -F, '{gsub(/\"/,"",$3);a[$3]++} END {for (i in a) print i,a[i]}' file
c 1
s 2
It counts the number of c and s occurrences, and also counts any other values that appear in the 3rd field.
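The traversal order of for (i in a) is not defined by awk, so the c and s lines may come out in either order. If a stable order matters, one simple option is to sort the result. A sketch, assuming the data.csv layout from the question; count_third is a hypothetical helper name:

```shell
# count_third [FILE...]
# Hypothetical helper: count each distinct value of the 3rd comma-separated
# field (with double quotes stripped) and print the counts in sorted order.
count_third() {
  awk -F, '{ gsub(/"/, "", $3); n[$3]++ } END { for (k in n) print k, n[k] }' "$@" | sort
}
```

Usage: count_third data.csv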