Awk Scripting printf ignoring my sort command - awk

I am trying to run a script I set up: it should sort the contents of a file and then display the text, but the content is printed in its original order and the sort command appears to be ignored. I tried the command below using awk, and the sort is ignored, but I am not sure why.
Command I tried:
sort -t, -k4 -k3 | awk -F, '{printf "%-18s %-27s %-15s %s\n", $1, $2, $3, $4 }' c_list.txt
The output I am getting is:
Jim Girv 199 pathway rd Orlando FL
Megan Rios 205 highwind dr Sacremento CA
Tyler Scott 303 cross st Saint James NY
Tim Harding 1150 Washton ave Pasadena CA
The output I need is:
Tim Harding 1150 Washton ave Pasadena CA
Megan Rios 205 highwind dr Sacremento CA
Jim Girv 199 pathway rd Orlando FL
Tyler Scott 303 cross st Saint James NY
It just ignores the sort command but still prints the information from the file in the right format.
I need it to sort first on the fourth field (the state), then on the third field (the town), and then display the information.
An example where each field is separated by a comma.
Field 1 Field 2 Field 3 Field 4
Jim Girv, 199 pathway rd, Orlando, FL

The problem is that you're doing sort | awk 'script' file instead of sort file | awk 'script', so sort is sorting nothing (and consequently producing no output) while awk is operating on your original file and producing output from that. You should also have noticed that your sort command hangs for lack of input; that would have been worth mentioning in your question.
To demonstrate:
$ cat file
c
b
a
$ sort | awk '1' file
c
b
a
$ sort file | awk '1'
a
b
c
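Applied to the original command, the fix is simply to move c_list.txt from awk's argument list to sort's. The file contents below are reconstructed from the question:

```shell
# Recreate the sample input (assumed contents, based on the question)
printf '%s\n' \
  'Jim Girv, 199 pathway rd, Orlando, FL' \
  'Megan Rios, 205 highwind dr, Sacremento, CA' \
  'Tyler Scott, 303 cross st, Saint James, NY' \
  'Tim Harding, 1150 Washton ave, Pasadena, CA' > c_list.txt

# Give the file to sort, then pipe sort's output into awk
sort -t, -k4 -k3 c_list.txt | awk -F, '{printf "%-18s %-27s %-15s %s\n", $1, $2, $3, $4}'
```

This prints the two CA lines first (Pasadena before Sacremento), then FL, then NY, which is the desired order.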

Related

Extract text between patterns in new files

I'm trying to analyze a file with the following structure:
AAAAA
123
456
789
AAAAA
555
777
999
777
The idea is to detect the 'AAAAA' pattern and extract the two lines following it. After this is done, I would like to append the next 'AAAAA' pattern and its two following lines, so the final file will look something like this:
AAAAA
123
456
AAAAA
555
777
Take into account that the last block will not end with the 'AAAAA' pattern.
Any idea how this can be done? I've used sed, but I don't know how to select the number of lines to be retained after the pattern...
For example, with awk:
awk '/'$AAAAA'/,/'$AAAAA'/' INPUTFILE.txt
But this will only extract all the text between the two AAAAA patterns.
Thanks
With sed
sed -n '/AAAAA/{N;N;p}' file.txt
with smart counters
$ awk '/AAAAA/{n=3} n&&n--' file
AAAAA
123
456
AAAAA
555
777
The grep command has a flag that prints lines after each match. For example:
grep -A 2 AAAAA <file>
Unless I misunderstood, this should match your requirements, and is much simpler than an awk script.
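For completeness, here is that grep approach run end to end. One caveat: GNU grep inserts a "--" separator line between non-adjacent groups of context, which can be filtered out afterwards (the long form of the flag is --after-context=2):

```shell
# Rebuild the sample input from the question
printf '%s\n' AAAAA 123 456 789 AAAAA 555 777 999 777 > input.txt

# -A 2 prints each matching line plus the 2 lines after it;
# grep -v '^--$' drops GNU grep's group separator line
grep -A 2 'AAAAA' input.txt | grep -v '^--$'
```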
You may try this awk:
awk '$1 == "AAAAA" {n = NR+2} NR <= n' file
AAAAA
123
456
AAAAA
555
777
just cheat
mawk/mawk2/gawk 'BEGIN { FS = OFS = "AAAAA\n"; RS = "^$" }
END { for (x = 2; x <= NF; x++) print $(x) }'
No one says fields must be split by spaces or that records must be single lines. With FS set this way, every field after $1 contains the lines that followed one occurrence of the pattern, so multiple "rows" fit inside $2, $3, and so on.
In this example, $2 will hold 12 bytes, like this:
1 2 3 \n 4 5 6 \n 7 8 9 \n # spaced out for readability

Using awk to export a mysql table to .csv

I've run into an issue while trying to understand awk for a class. We are supposed to take a table full of names and some other information and divide each field using "," to make it easier to export to .csv. So far, what I have removes all extra characters, including the initial "," tied to the first field. I'm down to two last issues with my script. The first is adding the "," to divide each field; I know this seems basic, but I'm having a hard time wrapping my head around it. The second is that occasionally $2 is followed by an extra initial standing in for a middle name, and I have no idea how to add a placeholder field to every line that does not have an initial.
The table is the following:
+---------------------------------+------------+------+----------+
| Name | NumCourses | Year | Semester |
+---------------------------------+------------+------+----------+
| ABDULHADI, ASHRAF M | 2 | 1990 | 3 |
| ACHANTA, BALA | 2 | 1995 | 3 |
| ACHANTA, BALA | 2 | 1996 | 3 |
+---------------------------------+------------+------+----------+
My Code:
awk 'NR==3, N==6{gsub(","," "); gsub(/\|/, " "); gsub(/\+/," "); gsub(/\-/," "); print $1, $2, $3, $4, $5, $6}' awktest.txt
Output:
ABDULHADI ASHRAF M 2 1990 3
ACHANTA BALA 2 1995 3
ACHANTA BALA 2 1996 3
P.S. It should be noted that we were instructed to rip out the headers.
Expected Output:
ABDULHADI,ASHRAF,M,2,1990,3
ACHANTA,BALA,N/A,2,1995,3
ACHANTA,BALA,N/A,2,1996,3
Your approach to first remove punctuation characters is good. Keeping it, you could write:
awk -v OFS="," '{gsub(","," ");gsub(/\|/," ")}{$1=$1}NF==5{$2=$2",N/A"}NF>4' awktest.txt
Let's unwind it and understand what is happening:
awk -v OFS="," ' #Output field separator is set to comma
{
gsub(","," ") #Substitute any comma by space
gsub(/\|/," ") #Substitute any pipe by space
}
{$1=$1} #Force line rebuild so that OFS is used to separate the output fields
NF==5{$2=$2",N/A"} #If there are only 5 fields, middle-name is missing, so append ",N/A" to 2nd field
NF>4 #Print resulting lines that have at least 5 fields (this gets rid of headers)
' awktest.txt
Output:
ABDULHADI,ASHRAF,M,2,1990,3
ACHANTA,BALA,N/A,2,1995,3
ACHANTA,BALA,N/A,2,1996,3
Feel free to request further clarification if you need it.
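An alternative sketch, if you'd rather parse the table structurally than strip punctuation: treat '|' as the field separator, skip the border and header lines, and split the name field yourself. The file name awktest.txt and the padded table layout are assumed from the question:

```shell
awk -F'|' '
NF > 1 && $2 !~ /Name/ {                 # skip border lines and the header
    name = $2
    gsub(/^ +| +$/, "", name)            # trim surrounding spaces
    split(name, parts, /, */)            # "LAST, FIRST M" -> LAST / "FIRST M"
    m = split(parts[2], fm, / +/)        # "FIRST M" -> FIRST / M
    mi = (m > 1) ? fm[2] : "N/A"         # no middle initial -> N/A
    c = $3; y = $4; s = $5
    gsub(/ /, "", c); gsub(/ /, "", y); gsub(/ /, "", s)
    print parts[1] "," fm[1] "," mi "," c "," y "," s
}' awktest.txt
```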

Sed replace nth column of multiple tsv files without header

Here are multiple tsv files, where I want to add a prefix (e.g. 'X1_') in the second column only (everywhere except in the header) and save the result back to the same file.
Input:
$ ls
file1.tsv file2.tsv file3.tsv
$ head -n 4 file1.tsv
a b c
James England 25
Brian France 41
Maria France 18
Output wanted:
a b c
James X1_England 25
Brian X1_France 41
Maria X1_France 18
I tried this, but the result is not kept in the file, and a simple redirection won't work:
# this works, but doesn't save the changes
i=1
for f in *tsv
do awk '{if (NR!=1) print $2}' $f | sed "s|^|X${i}_|"
i=$((i+1))
done
# adding the '-i' option to sed: this throws an error ('sed: no input files') but would otherwise be perfect
i=1
for f in *tsv
do awk '{if (NR!=1) print $2}' $f | sed -i "s|^|T${i}_|"
i=$((i+1))
done
Some help would be appreciated.
The second column is particularly easy because you simply replace the first occurrence of the separator.
for file in *.tsv; do
sed -i '2,$s/\t/\tX1_/' "$file"
done
If your sed doesn't recognize the symbol \t, use a literal tab (in many shells, you can type it with Ctrl-V Tab). On *BSD (and hence macOS) you need -i ''.
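To get the per-file counter from your original loop (X1_ for the first file, X2_ for the second, and so on), the increment can simply wrap the sed call. This is a sketch assuming GNU sed's -i:

```shell
i=1
for f in *.tsv; do
    # replace the first tab on every line except the header (line 1)
    sed -i "2,\$s/\t/\tX${i}_/" "$f"     # on *BSD/macOS: sed -i ''
    i=$((i+1))
done
```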
AWK solution:
awk -i inplace 'BEGIN { FS=OFS="\t" } NR!=1 { $2 = "X1_" $2 } 1' file1.tsv
Input:
a b c
James England 25
Brian France 41
Maria France 18
Output:
a b c
James X1_England 25
Brian X1_France 41
Maria X1_France 18

Awk print different character on separate line

I have been searching around and could not find the answer... I wonder if anyone can help here.
Suppose I have a file containing the following:
File1:
name Joe
day Wednesday
lunch was fish
name John
dinner pie
day tuesday
lunch was noodles
name Mary
day Friday
lunch was fish pie
I wanted to grep and print only their name and what they had for lunch.
I suppose I can do
cat file1 | grep -iE 'name|lunch'
but what if i want to do a awk to just have their name and food like this output below?
Joe
fish
John
noodles
Mary
fish pie
I know awk can print, but is it possible for awk to, let's say, print $2 on one line and $3 on another?
Can I also output it in this format:
Person food
Joe fish
John noodles
Mary fish pie
Thanks
You can for example say:
$ awk '/name/ {print $2} /lunch/ {$1=$2=""; print}' file
Joe
fish
John
noodles
Mary
fish pie
Or remove the lunch was text:
awk '/name/ {print $2} /lunch/ {gsub("lunch was ",""); print}' file
To make the output in two columns:
$ awk -v OFS="\t" '/name/ {name=$2} /lunch/ {gsub("lunch was ",""); print name, $0}' a
Joe fish
John noodles
Mary fish pie
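For the "Person food" header you asked about, a small extension of the same idea works: a BEGIN block prints the header once, before any input is read. A sketch, anchoring the patterns to the start of the line so "dinner pie" is not matched:

```shell
awk -v OFS='\t' '
BEGIN        { print "Person", "food" }
/^name/      { name = $2 }
/^lunch was/ { sub(/^lunch was /, ""); print name, $0 }
' file1
```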
awk
with awk you can do it in one shot,
awk -v RS="" '{n=$2;sub(/.*lunch was\s*/,"");print n,$0}' file
Note that this one-liner requires a fixed input format: the data should be stored in blank-line-separated blocks (RS="" is awk's paragraph mode), with the lunch was line at the end of each block.
test with your example:
kent$ awk -v RS="" '{n=$2;sub(/.*lunch was\s*/,"");print n,$0}' file
Joe fish
John noodles
Mary fish pie
grep & sed
also you can do it in two steps, grep the values out, and merge lines
grep -Po 'name\s*\K.*|lunch was\s*\K.*' file|sed 'N;s/\n/ /'
with your input file, it outputs:
kent$ grep -Po 'name\s*\K.*|lunch was\s*\K.*' file|sed 'N;s/\n/ /'
Joe fish
John noodles
Mary fish pie

AWK associative array

Suppose I have 2 files
File-1 map.txt
1 tony
2 sean
3 jerry
4 ada
File-2 relation.txt
tony sean
jerry ada
ada sean
Expected-Output result.txt
1 2
3 4
4 2
My code was:
awk 'FNR==NR{map[$1]=$2;next;} {$1=map[$1]; $2=map[$2]; print $0}' map.txt relation.txt > output.txt
But I got the left column only:
1
3
4
It seems that something is wrong near $2=map[$2].
I'd appreciate any help.
You've got the mapping creation the wrong way around, it needs to be:
map[$2] = $1
Your current script maps numbers to names whereas what you seem to be after is a map from names to numbers.
The following transcript shows the corrected script:
pax> cat m.txt
1 tony
2 sean
3 jerry
4 ada
pax> cat r.txt
tony sean
jerry ada
ada sean
pax> awk 'FNR==NR{map[$2]=$1;next;}{$1=map[$1];$2=map[$2];print $0}' m.txt r.txt
1 2
3 4
4 2
Using awk:
awk 'FNR==NR{map[$2]=$1;next;}{print map[$1], map[$2]}' m.txt r.txt
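One caveat with both versions: if relation.txt contains a name that is missing from map.txt, the lookup yields an empty string. A defensive variant (a sketch, not from the original answers) falls back to printing the original name when no mapping exists:

```shell
awk '
FNR == NR { map[$2] = $1; next }   # first file: name -> number
{
    # fall back to the original name if it has no mapping
    print ($1 in map ? map[$1] : $1), ($2 in map ? map[$2] : $2)
}' map.txt relation.txt
```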