AWK: How do I subtract the final field on those lines that have numbers - awk

I have a file containing:
A......Page 23
by John Smith
B......Page 73
by Jane Doe
C......Page 131
by Alice Grey
And I want to subtract the numbers by 22, so the first line will be A......Page 1.
I have searched in many places about gsub or any other awk option with no avail. I have done it through vim editor but knowing the awk solution will be great.

Short awk approach:
$ awk '$NF ~ /^[0-9]+$/{ $NF = $NF-22 }1' file
A......Page 1
by John Smith
B......Page 51
by Jane Doe
C......Page 109
by Alice Grey
$NF ~ /^[0-9]+$/ - considering only lines which last field value $NF is a digit, as it is subjected to arithmetic operation

Could you please try following.
awk '{$NF=$NF~/[0-9]/ && $NF>22?$NF-22:$NF} 1' Input_file
I have considered that you need NOT to have negative values in last column so I have checked condition if $NF last column is greater than 22 then only perform subtraction and then 2nd condition I considered that(which is obvious) that you want to subtract 22 with digits only so have put condition where it checks if last column is having digits in it or not.

Related

Join of two files introduces extraneous newline

Update: I figured out the reason for the extraneous newline. I created file1 and file2 on a Windows machine. Windows adds <cr><newline> to the end of each line. So, for example, the first record in file1 is not this:
Bill <tab> 25 <newline>
Instead, it is this:
Bill <tab> 25 <cr><newline>
So when I set a[Bill] to $2 I am actually setting it to $2<cr>.
I used a hex editor and removed all of the <cr> symbols in file1 and file2. Now the AWK program works as desired.
I have seen the SO posts on using AWK to do a natural join of two files. I took one of the solutions and am trying to get it to work. Alas, I have been unsuccessful. I am hoping you can tell me what I am doing wrong.
Note: I appreciate other solutions, but what I really want is to understand why my AWK program doesn't work (i.e., why/how an extraneous newline is being introduced).
I want to do a join of these two files:
file1 (name, tab, age):
Bill 25
John 24
Mary 21
file2 (name, tab, marital-status)
Bill divorced
Glenn married
John married
Mary single
When joined, I expect to see this (name, tab, age, tab, marital-status):
Bill 25 divorced
John 24 married
Mary 21 single
Notice that file2 has a person named Glenn, but file1 doesn't. No record in file1 joins to it.
My AWK program almost produces that result. But, for reasons I don't understand, the marital-status value is on the next line:
Bill 25
divorced
John 24
married
Mary 21
single
Here is my AWK program:
awk 'BEGIN { OFS = '\t' }
NR == FNR { a[$1] = ($1 in a? a[$1] OFS : "")$2; next }
$1 in a { $0 = $0 OFS a[$1]; delete a[$1]; print }' file2 file1 > joined_file1_file2
You may try this awk solution:
awk 'BEGIN {FS=OFS="\t"} {sub(/\r$/, "")}
FNR == NR {m[$1]=$2; next} {print $0, m[$1]}' file2 file1
Bill 25 divorced
John 24 married
Mary 21 single
Here:
Using sub(/\r$/, "") to remove any DOS line ending
If $1 doesn't exist in mapping m then m[$1] will be an empty string so we can simplify awk processing

Awk not printing what is wanted

my attempt:
awk '$4 != "AZ" && max<$6 || NR==1{ max=$6; data=$0 } END{ print data }' USA.txt
I am trying to print the row that
does NOT have "AZ" in the 4th column
and the greatest value in the 6th column
the file has 6 colums
firstname lastname town/city state-abv. zipcode score
Shellstrop Eleanor Phoenix AZ 85023 -2920765
Shellstrop Donna Tarantula_Springs NV 89047 -5920765
Mendoza Jason Jacksonville FL 32205 -4123794
Mendoza Douglas Jacksonville FL 32209 -3193274
Peleaz Steven Jacksonville FL 32203 -3123794
Based on your attempts, please try following awk code. This checks if 4th field is NOT AZ then it compares previous value of max with current value of $6 if its greater than previous value then it assigns current $6 to max else keeps it to previous value. In END block of awk program its printing its value.
awk -v max="" '$4!="AZ"{max=(max>$6?max:$6)} END{print max}' Input_file
To print complete row for a maximum value found would be:
awk -v max="" '$4!="AZ"{max=(max>$6?max:$6);arr[$6]=$0} END{print arr[max]}' Input_file

Linux word in column ends with 'a'

I have text file. I need to print people whose names ends with 'a' and have more than 30 years.
I did this:
awk '{if($4>30)print $1,$2}' New.txt
I don't know how to finish.
New.txt:
Name Lastname City Age
John Smith Verona 12
Barney Stinson York 55
Jessica Alba London 33
Could you please try following, written as per shown samples in GNU awk.
awk '$1~/[aA]$/ && $NF>30{print $1,$2}' Input_file
Explanation: Simply checking condition if 1st field ends with a OR A AND last field is greater than 30 then print that line's first and second fields.
You can try
awk '{if($4>30 && $1 ~ /a$/) print$1, $2}' New.txt

Printing out a particular row based on condition in another row

apologies if this really basic stuff but i just started with awk
so i have an input file im piping into awk like below. format never changes (like below)
name: Jim
gender: male
age: 40
name: Joe
gender: female
age: 36
name: frank
gender: Male
age: 40
I'm trying to list all names where age is 40
I can find them like so
awk '$2 == "40" {print $2 }'
but cant figure out how to print the name
Could you please try following(I am driving as of now so couldn't test it).
awk '/^age/{if($NF==40){print val};val="";next} /^name/{val=$0}' Input_file
Explanation: 1st condition checking ^name if a line starts from it then store that line value in variable val. Then in other condition checking if a line starts from age; then checking uf that line's 2nd field is greater than 40 then print value if variable val and nullify it too.
Using gnu awk and set Record Selector to nothing makes it works with blocks.
awk -v RS="" '/age: 40/ {print $2}' file
Jim
frank
Some shorter awk versions of suspectus and RavinderSingh13 post
awk '/^name/{n=$2} /^age/ && $NF==40 {print n}' file
awk '/^name/{n=$2} /^age: 40/ {print n}' file
Jim
frank
If line starts with name, store the name in n
IF line starts with age and age is 40 print n
Awk knows the concept records and fields.
Files are split in records where consecutive records are split by the record separator RS. Each record is split in fields, where consecutive fields are split by the field separator FS.
By default, the record separator RS is set to be the <newline> character (\n) and thus each record is a line. The record separator has the following definition:
RS:
The first character of the string value of RS shall be the input record separator; a <newline> by default. If RS contains more than one character, the results are unspecified. If RS is null, then records are separated by sequences consisting of a <newline> plus one or more blank lines, leading or trailing blank lines shall not result in empty records at the beginning or end of the input, and a <newline> shall always be a field separator, no matter what the value of FS is.
So with the file format you give, we can define the records based on RS="".
So based on this, we can immediately list all records who have the line age: 40
$ awk 'BEGIN{RS="";ORS="\n\n"}/age: 40/
There are a couple of problems with the above line:
What if we have a person that is 400 yr old, he will be listed because the line /age: 400/ contains that the requested line.
What if we have a record with a typo stating age:40 or age : 40
What if our record has a line stating wage: 40 USD/min
To solve most of these problems, it is easier to work with well-defined fields in the record and build the key-value-pairs per record:
key value
---------------
name => Jim
gender => male
age => 40
and then, we can use this to select the requested information:
$ awk 'BEGIN{RS="";FS="\n"}
# build the record
{ delete rec;
for(i=1;i<=NF;++i) {
# find the first ":" and select key and value as substrings
j=index($i,":"); key=substr($i,1,j-1); value=substr($i,j+1)
# remove potential spaces from front and back
gsub(/(^[[:blank:]]*|[[:blank:]]$)/,key)
gsub(/(^[[:blank:]]*|[[:blank:]]$)/,value)
# store key-value pair
rec[key] = value
}
}
# select requested information and print
(rec["age"] == 40) { print rec["name"] }' file
This is not a one-liner, but it is robust. Furthermore, this method is fairly flexible and adaptable to make selections based on a more complex logic.
If you are not averse to using grep and the format is always the same:
cat filename | grep -B2 "age: 40" | grep -oP "(?<=name: ).*"
Jim
frank
awk -F':' '/^name/{name=$2} \
/^age/{if ($NF==40)print name}' input_file

Print rows that has numbers in it

this is my data - i've more than 1000rows . how to get only the the rec's with numbers in it.
Records | Num
123 | 7 Y1 91
7834 | 7PQ34-102
AB12AC|87 BWE 67
5690278| 80505312
7ER| 998
Output has to be
7ER| 998
5690278| 80505312
I'm new to linux programming, any help would be highly useful to me. thanks all
I would use awk:
awk -F'[[:space:]]*[|][[:space:]]*' '$2 ~ /^[[:digit:]]+$/'
If you want to print the number of lines deleted as you've been asking in comments, you may use this:
awk -F'[[:space:]]*[|][[:space:]]*' '
{
if($2~/^[[:digit:]]+$/){print}else{c++}
}
END{printf "%d lines deleted\n", c}' file
A short and simple GNU awk (gawk) script to filter lines with numbers in the second column (field), assuming a one-word field (e.g. 1234, or 12AB):
awk -F'|' '$2 ~ /\y[0-9]+\y/' file
We use the GNU extension for regexp operators, i.e. \y for matching the word boundary. Other than that, pretty straightforward: we split fields on | and look for isolated digits in the second field.
Edit: Since the question has been updated, and now explicitly allows for multiple words in the second field (e.g. 12 AB, 12-34, 12 34), to get lines with numbers and separators only in the second field:
awk -F'|' '$2 ~ /^[- 0-9]+$/' file
Alternatively, if we say only letters are forbidden in the second field, we can use:
awk -F'|' '$2 ~ /^[^a-zA-Z]+$/' file