How to use Awk to create a new field but retain the original field? - awk

Can this be done in Awk?
FILE_IN (Input file)
ID_Number|Title|Name
65765765|The Cat Sat on the Mat|Dennis Smith
65765799|The Dog Sat on the Catshelf|David Jones
65765797|The Horse Sat on the Sofa|Jeff Jones
FILE_OUT (Desired Results)
ID_Number|Title|Nickname|Name
65765765|The Cat Sat on the Mat|Cat Sat|Dennis Smith
65765799|The Dog Sat on the Catshelf|Dog|David Jones
65765797|The Horse Sat on the Sofa||Jeff Jones
Logic to apply:
IF Title contains " Cat Sat " OR " cat sat " THEN Nickname = "Cat Sat" #same titlecase/text as was found#
IF Title contains " Dog " OR " dog " THEN Nickname = "Dog"
Also, is this task possible with Sed?

This might work for you (GNU sed):
sed -i '1s/|/&Nickname&/2;1b;s/|.*\b\(Cat\|Dog\)\b.*|/&\u\1|/I;t;s/|.*|/&|/' file
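This inserts the column Nickname into the headings. If the second column contains either the word Cat or Dog (matched case-insensitively), a third column holding the matched word, with its first letter upper-cased, is inserted; otherwise a blank third column is inserted. Note that it yields Cat rather than Cat Sat for the first record.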

Another awk (GNU awk is needed for the three-argument match()):
$ awk 'BEGIN{FS=OFS="|"}
{delete a;
match($2,"([Cc]at [Ss]at|[Dd]og)",a);
$NF=(NR==1?"Nickname":a[1]) OFS $NF}1' file
ID_Number|Title|Nickname|Name
65765765|The Cat Sat on the Mat|Cat Sat|Dennis Smith
65765799|The Dog Sat on the Catshelf|Dog|David Jones
65765797|The Horse Sat on the Sofa||Jeff Jones

You could try this with GNU awk:
awk -F"|" -v OFS="|" 'NR==1{$2 = $2 OFS "Nickname"}
NR>1{if($0 ~ /\s*[Cc]at [Ss]at\s+/) n="Cat Sat"; else if($0 ~ /\s*[dD]og\s+/)n="Dog";
else n=""; $2 = $2 OFS n} 1' file
-F"|" and -v OFS="|" specify the input and output delimiters, respectively.
NR==1 handles the header line.
NR>1 handles the data lines.
With the same logic, you could use this more compact code:
awk -F"|" -v OFS="|" 'NR==1{$2 = $2 OFS "Nickname"}
NR>1{n=($0 ~ /\s*[Cc]at [Ss]at\s+/) ? "Cat Sat" : ($0 ~ /\s*[dD]og\s+/) ? "Dog" : ""; $2 = $2 OFS n} 1' file
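If GNU awk is not available, the same idea can probably be expressed in POSIX awk. The following is only a sketch: it assumes the nickname should be copied exactly as it appears in the title (using match() with RSTART and RLENGTH), it tests only the Title field, and it does not insist on surrounding spaces. The variable names input and out are made up for the illustration.

```shell
# Sample data from the question, kept in a shell variable (hypothetical name).
input='ID_Number|Title|Name
65765765|The Cat Sat on the Mat|Dennis Smith
65765799|The Dog Sat on the Catshelf|David Jones
65765797|The Horse Sat on the Sofa|Jeff Jones'

out=$(printf '%s\n' "$input" | awk 'BEGIN { FS = OFS = "|" }
NR == 1 { $2 = $2 OFS "Nickname"; print; next }   # header line
{
    n = ""
    # match() sets RSTART and RLENGTH, so substr() returns the text as found.
    if (match($2, /[Cc]at [Ss]at/)) {
        n = substr($2, RSTART, RLENGTH)
    } else if (match($2, /[Dd]og/)) {
        n = substr($2, RSTART, RLENGTH)
    }
    $2 = $2 OFS n    # append the new field right after Title
    print
}')
printf '%s\n' "$out"
```

With the sample data this should reproduce the four lines shown under FILE_OUT. Stricter word matching (so that Dogma does not count as Dog, say) would need tighter patterns.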

Related

Join of two files introduces extraneous newline

Update: I figured out the reason for the extraneous newline. I created file1 and file2 on a Windows machine. Windows adds <cr><newline> to the end of each line. So, for example, the first record in file1 is not this:
Bill <tab> 25 <newline>
Instead, it is this:
Bill <tab> 25 <cr><newline>
So when I set a[Bill] to $2 I am actually setting it to $2<cr>.
I used a hex editor and removed all of the <cr> symbols in file1 and file2. Now the AWK program works as desired.
I have seen the SO posts on using AWK to do a natural join of two files. I took one of the solutions and am trying to get it to work. Alas, I have been unsuccessful. I am hoping you can tell me what I am doing wrong.
Note: I appreciate other solutions, but what I really want is to understand why my AWK program doesn't work (i.e., why/how an extraneous newline is being introduced).
I want to do a join of these two files:
file1 (name, tab, age):
Bill 25
John 24
Mary 21
file2 (name, tab, marital-status)
Bill divorced
Glenn married
John married
Mary single
When joined, I expect to see this (name, tab, age, tab, marital-status):
Bill 25 divorced
John 24 married
Mary 21 single
Notice that file2 has a person named Glenn, but file1 doesn't. No record in file1 joins to it.
My AWK program almost produces that result. But, for reasons I don't understand, the marital-status value is on the next line:
Bill 25
divorced
John 24
married
Mary 21
single
Here is my AWK program:
awk 'BEGIN { OFS = "\t" }
NR == FNR { a[$1] = ($1 in a? a[$1] OFS : "")$2; next }
$1 in a { $0 = $0 OFS a[$1]; delete a[$1]; print }' file2 file1 > joined_file1_file2
You may try this awk solution:
awk 'BEGIN {FS=OFS="\t"} {sub(/\r$/, "")}
FNR == NR {m[$1]=$2; next} {print $0, m[$1]}' file2 file1
Bill 25 divorced
John 24 married
Mary 21 single
Here:
Using sub(/\r$/, "") to remove any DOS line ending
If $1 doesn't exist in mapping m then m[$1] will be an empty string so we can simplify awk processing
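To see the effect of the carriage returns without a Windows machine, the two files can be recreated with explicit \r\n line endings. This is a minimal sketch under that assumption; the temporary directory and the array name status are made up for the illustration, and only lines that actually join are printed.

```shell
tmp=$(mktemp -d)
# Recreate the sample files with DOS (CR LF) line endings.
printf 'Bill\t25\r\nJohn\t24\r\nMary\t21\r\n' > "$tmp/file1"
printf 'Bill\tdivorced\r\nGlenn\tmarried\r\nJohn\tmarried\r\nMary\tsingle\r\n' > "$tmp/file2"

joined=$(awk 'BEGIN { FS = OFS = "\t" }
{ sub(/\r$/, "") }                       # strip a trailing CR, if present
FNR == NR { status[$1] = $2; next }      # first file: name -> marital status
$1 in status { print $0, status[$1] }    # second file: append the status
' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$joined"
rm -rf "$tmp"
```

Without the sub() call, each stored value would still end in a carriage return, which is what makes the terminal show the marital status as if it were on a line of its own.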

AWK join next 2 rows from csv

I'm using the following command to join each row with the next row, but would like to join it with the next 2 rows:
awk -v OFS=', ' 'NR==1{first=$0} NR>1{print prev, $0} {prev=$0} END{print prev, first}' test.csv
test.csv
rabbit
cat
dog
turtle
sheep
cow
Result:
rabbit, cat
cat, dog
dog, turtle
turtle, sheep
sheep, cow
cow, rabbit
Desired result:
rabbit, cat, dog
cat, dog, turtle
dog, turtle, sheep
turtle, sheep, cow
sheep, cow, rabbit
cow, rabbit, cat
Any help would be appreciated.
Could you please try the following? It was written and tested in GNU awk with the shown samples only.
awk -v RS= -v OFS=', ' '
{
for(i=1;i<=(NF-1);i++){
printf("%s%s%s%s\n",$i,OFS $(i+1),OFS $(i+2),i==(NF-1)?$1:"")
}
}
END{
print $NF,$1,$2
}
' Input_file
Explanation: Adding detailed explanation for above.
awk -v RS= -v OFS=', ' ' ##Starting awk program from here.
{
for(i=1;i<=(NF-1);i++){ ##Running a for loop from the 1st field to the 2nd-last field of the record.
printf("%s%s%s%s\n",$i,OFS $(i+1),OFS $(i+2),i==(NF-1)?$1:"") ##Printing the current field, then OFS and the next field, then OFS and the field after that; if i==NF-1 (where that field does not exist), the 1st field is appended, otherwise nothing.
}
}
END{ ##Starting END block for this code from here.
print $NF,$1,$2 ##Printing last field then 1st and 2nd field here.
}
' Input_file ##Mentioning Input_file name here.
Just read the complete file into memory and then combine the lines, wrapping around at the end:
$ awk -v OFS=", " -v n=3 '{a[NR-1]=$0}END{for(i=0;i<NR;++i) for(j=0;j<n;++j) printf "%s%s",a[(i+j)%NR], (j==n-1?ORS:OFS)}' file

Formatting file using awk

I am attempting to format two pieces of data from my awk script. Here is a piece of my raw data
Mike:James:314849866:mjames69#asu.edu:5059358554:NM:8830:Johnson:Rd:Albuquerque:87122
There are a total of nine lines like this. I have formatted it in this manner
Mike James, 314849866
8830 Johnson Rd
Albuquerque, NM 87122
mjames69#asu.edu
5059358554
using this code:
cat rawadd | awk -F: ' NR == 1,NR == 9 {print $1 " " $2 ", " $3 "\n" $7 " " $8 " " $9 "\n" $10 ", " $6 " " $11 "\n" $4 "\n" $5}'
I would like to format the phone number (the fifth line) like (505)935-8554. So I created a new variable, $tel, and used it in place of the $5 field that I extracted from the rawadd file.
Here is what that new code looks like:
tel=`"(${5:0:3}) ${5:3:3}-${5:6:4}"`
cat rawadd | awk -F: ' NR == 1,NR == 9 {print $1 " " $2 ", " $3 "\n" $7 " " $8 " " $9 "\n" $10 ", " $6 " " $11 "\n" $4 "\n" $tel "\n"}';
But my output is coming out like this
Mike James, 314849866
8830 Johnson Rd
Albuquerque, NM 87122
mjames69#asu.edu
Mike:James:314849866:mjames69#asu.edu:5059358554:NM:8830:Johnson:Rd:Albuquerque:87122
On the fifth line it just prints the actual line input, and not the formatted telephone number. I was hoping that I might add the formatting directly into the awk command, but cannot figure out a way. I would also like to format the Id number on the first line to be 314-84-9866. Any help would be great.
Thank you
Very similar to @karakfas's answer, but this will work in any awk, uses a choice of OFS that I think better represents your real output fields, and will only process the first 9 input lines, like in your original script:
$ cat tst.awk
BEGIN { FS=":"; OFS=", " }
{
print $1 " " $2, substr($3,1,3) "-" substr($3,4,2) "-" substr($3,6)
print $7 " " $8 " " $9
print $10, $6 " " $11
print $4
print "(" substr($5,1,3) ")" substr($5,4,3) "-" substr($5,7)
}
NR==9 { exit }
$ awk -f tst.awk file
Mike James, 314-84-9866
8830 Johnson Rd
Albuquerque, NM 87122
mjames69#asu.edu
(505)935-8554
The first problem you were having with your tel variable is that when you write:
tel=7
awk '{print $tel}'
you actually have 2 completely different variables, a shell variable named tel created by tel=7 in the shell script, and an awk variable also named tel created by print $tel in the awk script and completely unrelated to the shell variable of the same name.
The second problem you were having is that to access the contents of an awk variable you just use the variable name, just like you would in C, you do not prepend it with a $ like you would in shell.
The third problem you were having is that since the awk variable tel is unset it gets the value zero-or-null (all awk variables are of type numeric-string - google that) and so when you use $tel that's the same as if you said $0 whose contents are the entire input line (record).
All of those together are why you were getting the input line reproduced in your output.
The syntax to do what you were trying to do would be:
tel=7
awk -v tel="$tel" '{print tel}'
where -v tel="$tel" is initializing an awk variable named tel with the contents of the shell variable also named tel. More clearly:
shelltel=7
awk -v awktel="$shelltel" '{print awktel}'
It's very important to understand that awk is not shell: it's a completely separate tool with its own scope and a language whose syntax is much more similar to C than to shell.
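The points above can be seen side by side in a small sketch. The sample record is shortened, and the names line, unset_result, passed_result and awktel are made up for the illustration.

```shell
line='Mike:James:314849866'
tel=7   # a shell variable; awk does not see it unless it is passed in with -v

# Inside awk, tel is unset, so $tel is the same as $0: the whole record.
unset_result=$(printf '%s\n' "$line" | awk -F: '{ print $tel }')

# With -v, awk gets its own variable, initialised from the shell variable.
passed_result=$(printf '%s\n' "$line" | awk -F: -v awktel="$tel" '{ print awktel }')

printf '%s\n' "$unset_result" "$passed_result"
```

The first command echoes the input record back, which is the symptom described in the question; the second prints the value that was passed in.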
Here is one solution with gawk,
awk -F: '{
print $1, $2 ",", gensub(/(...)(..)(....)/,"\\1-\\2-\\3",1,$3);
print $7, $8, $9;
print $10 ",", $6, $11;
print $4
print gensub(/(...)(...)(....)/,"(\\1)\\2-\\3",1,$5)
}' file
will give
Mike James, 314-84-9866
8830 Johnson Rd
Albuquerque, NM 87122
mjames69#asu.edu
(505)935-8554
Your NR==1 ... didn't make any sense to me. Perhaps you have some reason for it, but it needs explaining.

Print default value if index is not in awk array

$ cat file1 #It contains ID:Name
5:John
4:Michel
$ cat file2 #It contains ID
5
4
3
I want to replace the IDs in file2 with the names from file1. Required output:
John
Michel
NO MATCH FOUND
I need to expand the code below so that it prints the text NO MATCH FOUND.
awk -F":" 'NR==FNR {a[$1]=$2;next} {print a[$1]}' file1 file2
My current result:
John
Michel
<< empty line
Thanks,
You can use a ternary operator for this: print ($1 in a)?a[$1]:"NO MATCH FOUND". That is, if $1 is in the array, print it; otherwise, print the text "NO MATCH FOUND".
All together:
$ awk -F":" 'NR==FNR {a[$1]=$2;next} {print ($1 in a)?a[$1]:"NO MATCH FOUND"}' file1 file2
John
Michel
NO MATCH FOUND
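One reason to prefer the ($1 in a) test over inspecting a[$1] directly is that merely referencing a missing element creates it, whereas the in operator does not. A minimal sketch of that behaviour follows; the keys and the variable names before, after, unused and result are made up for the illustration.

```shell
result=$(awk 'BEGIN {
    a["5"] = "John"
    before = ("3" in a) ? "yes" : "no"   # membership test: a["3"] is not created
    unused = a["3"]                       # plain reference: creates a["3"] as ""
    after  = ("3" in a) ? "yes" : "no"
    print before, after
}')
printf '%s\n' "$result"
```

This should print no yes: the key is absent before the reference and present afterwards.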
You can test whether the index occurs in the array:
$ awk -F":" 'NR==FNR {a[$1]=$2;next} $1 in a {print a[$1]; next} {print "NO MATCH FOUND"}' file1 file2
John
Michel
NO MATCH FOUND
If file2 contains only digits (no space at the end of a line):
awk -F ':' '$1 in A {print A[$1];next}{if($2~/^$/) print "NO MATCH FOUND";else A[$1]=$2}' file1 file2
If not:
awk -F '[:[:blank:]]' '$1 in A {print A[$1];next}{if($2~/^$/) print "NO MATCH FOUND";else A[$1]=$2}' file1 file2

Internal piping with awk

Let's say I have this input line:
input:
{x:y} abc det uyt llu
How can I process it to get the expected output?
output:
{x:y} abc%det%uyt%llu
The question is how to concatenate fields 2 through the end of the line and, in that string, replace each space with %, where the field separator is a space.
I need the first part, {x:y}, left unchanged, with the replacement applied only to fields 2 through the end of the line.
Here is another awk
awk '{$1=$1;sub(/%/," ")}1' OFS="%" file
echo '{x:y} abc det uyt llu' | awk '{$1=$1;sub(/%/," ")}1' OFS="%"
{x:y} abc%det%uyt%llu
This changes all spaces to % (by setting OFS and forcing the record to be rebuilt with $1=$1), then changes the first % back to a space.
You can use this awk:
s='{x:y} abc det uyt llu'
awk '{printf "%s%s", $1, OFS; for (i=2; i<=NF; i++) printf "%s%s", $i, (i==NF)?ORS:"%"}' <<< "$s"
{x:y} abc%det%uyt%llu
Another awk:
awk 'BEGIN {OFS="%"} {printf "%s ", $1; $1=""; print substr($0, 2)}' <<< "$s"
{x:y} abc%det%uyt%llu