Linux word in column ends with 'a' - awk

I have a text file. I need to print the people whose names end with 'a' and who are more than 30 years old.
I did this:
awk '{if($4>30)print $1,$2}' New.txt
I don't know how to finish.
New.txt:
Name Lastname City Age
John Smith Verona 12
Barney Stinson York 55
Jessica Alba London 33

You could try the following, written per the samples shown, in GNU awk:
awk '$1~/[aA]$/ && $NF>30{print $1,$2}' Input_file
Explanation: simply check whether the 1st field ends with a or A and the last field is greater than 30; if so, print that line's first and second fields.

You can try
awk '{if($4>30 && $1 ~ /a$/) print$1, $2}' New.txt
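With the sample New.txt from the question, both answers print a single record; a quick check (any POSIX awk should behave the same):

```shell
# Recreate the sample file from the question
printf '%s\n' 'Name Lastname City Age' \
  'John Smith Verona 12' \
  'Barney Stinson York 55' \
  'Jessica Alba London 33' > New.txt

# Name ends in a/A and age (the last field) is over 30
awk '$1~/[aA]$/ && $NF>30{print $1,$2}' New.txt
# -> Jessica Alba
```

The header line is harmless here because "Name" doesn't end in a/A; with other data you may want to skip it explicitly with FNR>1.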

Related

Join of two files introduces extraneous newline

Update: I figured out the reason for the extraneous newline. I created file1 and file2 on a Windows machine. Windows adds <cr><newline> to the end of each line. So, for example, the first record in file1 is not this:
Bill <tab> 25 <newline>
Instead, it is this:
Bill <tab> 25 <cr><newline>
So when I set a[Bill] to $2 I am actually setting it to $2<cr>.
I used a hex editor and removed all of the <cr> symbols in file1 and file2. Now the AWK program works as desired.
I have seen the SO posts on using AWK to do a natural join of two files. I took one of the solutions and am trying to get it to work. Alas, I have been unsuccessful. I am hoping you can tell me what I am doing wrong.
Note: I appreciate other solutions, but what I really want is to understand why my AWK program doesn't work (i.e., why/how an extraneous newline is being introduced).
I want to do a join of these two files:
file1 (name, tab, age):
Bill 25
John 24
Mary 21
file2 (name, tab, marital-status):
Bill divorced
Glenn married
John married
Mary single
When joined, I expect to see this (name, tab, age, tab, marital-status):
Bill 25 divorced
John 24 married
Mary 21 single
Notice that file2 has a person named Glenn, but file1 doesn't. No record in file1 joins to it.
My AWK program almost produces that result. But, for reasons I don't understand, the marital-status value is on the next line:
Bill 25
divorced
John 24
married
Mary 21
single
Here is my AWK program:
awk 'BEGIN { OFS = "\t" }
NR == FNR { a[$1] = ($1 in a? a[$1] OFS : "")$2; next }
$1 in a { $0 = $0 OFS a[$1]; delete a[$1]; print }' file2 file1 > joined_file1_file2
You may try this awk solution:
awk 'BEGIN {FS=OFS="\t"} {sub(/\r$/, "")}
FNR == NR {m[$1]=$2; next} {print $0, m[$1]}' file2 file1
Bill 25 divorced
John 24 married
Mary 21 single
Here:
Using sub(/\r$/, "") to remove any DOS line ending
If $1 doesn't exist in mapping m, then m[$1] is an empty string, which keeps the awk processing simple
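The CR problem is easy to reproduce without a hex editor; a minimal sketch (the file names here are made up for the demo):

```shell
# Simulate a file saved on Windows: lines end with CR+LF
printf 'Bill\t25\r\nJohn\t24\r\n'        > file1_dos
printf 'Bill\tdivorced\nJohn\tmarried\n' > file2_unix

# Stripping the trailing \r before anything else makes the join behave;
# without the sub() call, a[$1] would hold "25\r" and garble the output.
awk 'BEGIN{FS=OFS="\t"} {sub(/\r$/,"")}
     NR==FNR{a[$1]=$2; next}
     $1 in a{print $1, a[$1], $2}' file1_dos file2_unix
```

Here file1_dos (ages) is read first and file2_unix (statuses) second, so the columns come out name, age, marital-status as in the question.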

Awk not printing what is wanted

my attempt:
awk '$4 != "AZ" && max<$6 || NR==1{ max=$6; data=$0 } END{ print data }' USA.txt
I am trying to print the row that
does NOT have "AZ" in the 4th column
and the greatest value in the 6th column
The file has 6 columns:
firstname lastname town/city state-abv. zipcode score
Shellstrop Eleanor Phoenix AZ 85023 -2920765
Shellstrop Donna Tarantula_Springs NV 89047 -5920765
Mendoza Jason Jacksonville FL 32205 -4123794
Mendoza Douglas Jacksonville FL 32209 -3193274
Peleaz Steven Jacksonville FL 32203 -3123794
Based on your attempt, please try the following awk code. If the 4th field is not AZ, it compares the previous value of max with the current $6 and keeps whichever is greater. The END block then prints max.
awk -v max="" '$4!="AZ"{max=(max>$6?max:$6)} END{print max}' Input_file
To print the complete row for the maximum value found:
awk -v max="" '$4!="AZ"{max=(max>$6?max:$6);arr[$6]=$0} END{print arr[max]}' Input_file
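Note that arr[$6]=$0 would let two non-AZ rows with the same score overwrite each other. A variant (a sketch, with made-up variable names seen/row) tracks the winning row directly instead of indexing by score:

```shell
# First non-AZ row seeds max/row; later rows replace them only if $6 is larger
printf '%s\n' \
  'Shellstrop Eleanor Phoenix AZ 85023 -2920765' \
  'Shellstrop Donna Tarantula_Springs NV 89047 -5920765' \
  'Mendoza Jason Jacksonville FL 32205 -4123794' \
  'Mendoza Douglas Jacksonville FL 32209 -3193274' \
  'Peleaz Steven Jacksonville FL 32203 -3123794' |
awk '$4!="AZ" && (!seen++ || $6>max){max=$6; row=$0} END{print row}'
# -> Peleaz Steven Jacksonville FL 32203 -3123794
```

Seeding from the first qualifying row also avoids comparing against an empty initial max, which matters here because all the scores are negative.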

AWK: How do I subtract the final field on those lines that have numbers

I have a file containing:
A......Page 23
by John Smith
B......Page 73
by Jane Doe
C......Page 131
by Alice Grey
And I want to subtract 22 from the numbers, so the first line becomes A......Page 1.
I have searched in many places for gsub or any other awk option, to no avail. I have done it through the vim editor, but knowing the awk solution would be great.
Short awk approach:
$ awk '$NF ~ /^[0-9]+$/{ $NF = $NF-22 }1' file
A......Page 1
by John Smith
B......Page 51
by Jane Doe
C......Page 109
by Alice Grey
$NF ~ /^[0-9]+$/ - consider only lines whose last field $NF consists entirely of digits, since it is subjected to an arithmetic operation
You could try the following.
awk '{$NF=$NF~/[0-9]/ && $NF>22?$NF-22:$NF} 1' Input_file
I have assumed that you do NOT want negative values in the last column, so the condition only performs the subtraction when the last column $NF is greater than 22. The second condition checks, as is obviously wanted, that the last column actually contains digits before treating it arithmetically.
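The effect of the $NF>22 guard can be seen by adding a line whose page number is 22 or less (the D line below is a hypothetical extra, not from the question):

```shell
printf '%s\n' 'A......Page 23' 'by John Smith' 'D......Page 20' |
awk '{$NF=$NF~/[0-9]/ && $NF>22?$NF-22:$NF} 1'
# -> A......Page 1
#    by John Smith
#    D......Page 20
```

The "by ..." line passes through untouched because its last field has no digits, and Page 20 is left alone rather than going negative.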

Remove line if the second or second to last character is a space in the first column of a CSV

First of all, I apologise for not giving an example of what I've tried because with this one I really don't know where to begin. It's a job for SED or AWK, that's about as far as I can get.
I would like to remove lines if:
The second character is a space in the first column
The second to last character is a space in the first column
Example input
John Smith|Chicago|IL
J Smith|Chicago|IL
Jane Brown|New York|NY
Jane B|New York|NY
Expected Output
John Smith|Chicago|IL
Jane Brown|New York|NY
The files are | delimited; some contain 4 columns of data, others contain 5 or more (I know it's bad formatting, but it's data collected by an NGO that I'm trying to help). In each case I'd like this to happen just for the first column of the file.
I simply translated your two criteria into regexps and used grep with option -v to remove the matching lines:
The second character is a space in the first column -> ^[^|] (a non-pipe character at the start, followed by a space)
The second to last character is a space in the first column -> ^[^|]* [^|]\|
grep -Ev '(^[^|] )|(^[^|]* [^|]\|)' <input>
Result:
John Smith|Chicago|IL
Jane Brown|New York|NY
You could try the following.
awk 'BEGIN{FS=OFS="|"} substr($1,2,1)==" " || substr($1,length($1)-1,1)==" "{next} 1' Input_file
This awk should do:
awk -F\| '{s=split($1,a,"")} !(a[2]==" " || a[s-1]==" ")' file
John Smith|Chicago|IL
Jane Brown|New York|NY
It splits the first field into array a, with the number of characters in s, then tests whether the second and second-to-last characters are spaces.
Easy to read and easy to understand how it works :)
$ awk -F'|' '$1 !~ /^. | .$/' file
John Smith|Chicago|IL
Jane Brown|New York|NY
A smaller version of Corentin Limier's answer:
grep -Ev '(^. )|(^.* .\|)' filename
Result:
John Smith|Chicago|IL
Jane Brown|New York|NY
This is also possible with the sed command:
sed '/^. /d' filename | sed '/ .|/d'
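The two sed passes can be combined into a single invocation with two expressions. Note that, like the shortened grep variant, the second pattern looks at the whole line rather than only the first column, which is fine for this sample but could over-match on other data:

```shell
printf '%s\n' 'John Smith|Chicago|IL' 'J Smith|Chicago|IL' \
  'Jane Brown|New York|NY' 'Jane B|New York|NY' |
sed -e '/^. /d' -e '/ .|/d'
# -> John Smith|Chicago|IL
#    Jane Brown|New York|NY
```

In sed's basic regular expressions the | is a literal character, so / .|/ matches a space, any character, then a pipe.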

awk find out how many times columns two and three equal specific word

Lets say I have a names.txt file with the following
Bob Billy Billy
Bob Billy Joe
Bob Billy Billy
Joe Billy Billy
and using awk I want to find out how many times $2 = Billy while $3 = Billy. In this case my desired output would be 3 times.
Also, I'm testing this on a mac if that matters.
You first need to test $2==$3 then test that one of those equals "Billy". Increment a counter and then print the result at the end:
$ awk '$2==$3 && $2=="Billy"{cnt++} END{print cnt+0}' names.txt
3
Or, you could almost write just what you said:
$ awk '$2=="Billy" && $3=="Billy" {cnt++} END{print cnt+0}' names.txt
3
And if you want to use a variable so you don't need to type it several times:
$ awk -v name='Billy' '$2==name && $3==name {cnt++}
END{printf "Found \"%s\" %d times\n", name, cnt+0}' names.txt
Found "Billy" 3 times
Or, you could collect them all up and report what was found:
$ awk '{cnts[$2 "," $3]++}
END{for (e in cnts) print e ": " cnts[e]}' names.txt
Billy,Billy: 3
Billy,Joe: 1
You may also consider using grep to do that:
$ grep -c "\sBilly\sBilly" names.txt
3
-c: print a count of matching lines
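Since the question mentions a Mac, where grep is BSD grep and \s is a GNU extension, the POSIX [[:space:]] class is more portable. Anchoring with $ also keeps a "Billy Billy" pair in later fields from matching; this sketch assumes, as in the sample, that every line has exactly three fields:

```shell
printf '%s\n' 'Bob Billy Billy' 'Bob Billy Joe' \
  'Bob Billy Billy' 'Joe Billy Billy' |
grep -c '[[:space:]]Billy[[:space:]]Billy$'
# -> 3
```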