Sed replace nth column of multiple tsv files without header - awk

I have multiple tsv files, and I want to add a prefix such as 'X1_' to the second column only (everywhere except in the header) and save the result back to the same file.
Input:
$ ls
file1.tsv file2.tsv file3.tsv
$ head -n 4 file1.tsv
a b c
James England 25
Brian France 41
Maria France 18
Output wanted:
a b c
James X1_England 25
Brian X1_France 41
Maria X1_France 18
I tried this, but the result is not kept in the file, and a simple redirection won't work:
# this works, but doesn't save the changes
i=1
for f in *tsv
do awk '{if (NR!=1) print $2}' $f | sed "s|^|X${i}_|"
i=$((i+1))
done
# adding the '-i' option to sed: this would be perfect, but it throws an error ("sed: no input files")
i=1
for f in *tsv
do awk '{if (NR!=1) print $2}' $f | sed -i "s|^|X${i}_|"
i=$((i+1))
done
Some help would be appreciated.

The second column is particularly easy because you simply replace the first occurrence of the separator.
for file in *.tsv; do
sed -i '2,$s/\t/\tX1_/' "$file"
done
If your sed doesn't recognize \t, use a literal tab (in many shells you can type it with Ctrl+V followed by Tab). On *BSD (and hence macOS) you need -i ''.
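If you also want the per-file counter from the question (X1_ for the first file, X2_ for the second, and so on), a minimal sketch along the same lines, assuming GNU sed and \t support:
i=1
for file in *.tsv; do
    # prefix the second column of every non-header line with X<i>_
    sed -i "2,\$s/\t/\tX${i}_/" "$file"
    i=$((i+1))
done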

AWK solution:
awk -i inplace 'BEGIN { FS=OFS="\t" } NR!=1 { $2 = "X1_" $2 } 1' file1.tsv
Input:
a b c
James England 25
Brian France 41
Maria France 18
Output:
a b c
James X1_England 25
Brian X1_France 41
Maria X1_France 18
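Note that -i inplace needs GNU awk 4.1 or later. To get the per-file X1_, X2_, ... numbering from the question in a single pass over all files, a sketch (again assuming GNU awk):
gawk -i inplace 'BEGIN { FS = OFS = "\t" }
FNR == 1 { i++ }                  # new file: bump the counter, leave the header alone
FNR != 1 { $2 = "X" i "_" $2 }    # prefix the second column on data lines
1' *.tsv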

Related

Extract text between patterns in new files

I'm trying to analyze a file with the following structure:
AAAAA
123
456
789
AAAAA
555
777
999
777
The idea is to detect the 'AAAAA' pattern and extract the two following lines. After this is done, I would like to append the next 'AAAAA' pattern and the following two lines, so the final file will look something like this:
AAAAA
123
456
AAAAA
555
777
Taking into account that the last block will not be followed by another 'AAAAA' pattern.
Any idea how this can be done? I've used sed but I don't know how to select the number of lines to be retained after the pattern...
For example with AWK:
awk '/'$AAAAA'/,/'$AAAAA'/' INPUTFILE.txt
But this will only extract all the text between the two AAAAA patterns.
Thanks
With sed
sed -n '/AAAAA/{N;N;p}' file.txt
with smart counters
$ awk '/AAAAA/{n=3} n&&n--' file
AAAAA
123
456
AAAAA
555
777
The grep command has a flag that prints lines after each match. For example:
grep -A 2 AAAAA <file>    # long form: --after-context=2
Unless I misunderstood, this should match your requirements, and is much simpler than awk scripts.
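One detail: when matched blocks are separated by skipped lines, GNU grep prints a -- group separator between them. If you want output that looks exactly like the expected result, you can suppress it (GNU grep only) or filter it out:
grep -A 2 --no-group-separator AAAAA file.txt
# or, portably:
grep -A 2 AAAAA file.txt | grep -v '^--$'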
You may try this awk:
awk '$1 == "AAAAA" {n = NR+2} NR <= n' file
AAAAA
123
456
AAAAA
555
777
just cheat
# works with mawk, mawk2, or gawk
gawk 'BEGIN { FS = OFS = "AAAAA\n"; RS = "^$" }
      END { for (x = 2; x <= NF; x++) print $(x) }' file.txt
Nothing says the fields must be split by spaces, or that records must come one per line. With FS set like this, every field after $1 contains one block of the lines you need, so $2, $3, and so on each hold several "rows" at once.
In this example, $2 contains 12 bytes, like this:
1 2 3 \n 4 5 6 \n 7 8 9 \n   # spaced out for readability

Awk Scripting printf ignoring my sort command

I am trying to run a script I have set up, but when I sort the contents and display the text, the sort command is ignored and the information is just printed in the original file order. I tried the command below using awk, and I am not sure why the sort is being ignored.
Command I tried:
sort -t, -k4 -k3 | awk -F, '{printf "%-18s %-27s %-15s %s\n", $1, $2, $3, $4 }' c_list.txt
The output I am getting is:
Jim Girv 199 pathway rd Orlando FL
Megan Rios 205 highwind dr Sacremento CA
Tyler Scott 303 cross st Saint James NY
Tim Harding 1150 Washton ave Pasadena CA
The output I need is:
Tim Harding 1150 Washton ave Pasadena CA
Megan Rios 205 highwind dr Sacremento CA
Jim Girv 199 pathway rd Orlando FL
Tyler Scott 303 cross st Saint James NY
It just ignores the sort command but still prints the info I need, in the right format, straight from the file.
I need it to sort by the fourth field (the state) first, then by the third field (the town), and then display the information.
An example where each field is separated by a comma.
Field 1 Field 2 Field 3 Field 4
Jim Girv, 199 pathway rd, Orlando, FL
The problem is you're doing sort | awk 'script' file instead of sort file | awk 'script', so sort is sorting nothing (and consequently producing no output) while awk is operating on your original file and producing output from that. You should also have noticed that your sort command hangs for lack of input; that would have been worth mentioning in your question.
To demonstrate:
$ cat file
c
b
a
$ sort | awk '1' file
c
b
a
$ sort file | awk '1'
a
b
c
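Applied to your original command, the fix is simply to hand the file to sort and pipe its output into awk; a sketch, assuming the file really is comma-separated:
sort -t, -k4 -k3 c_list.txt | awk -F, '{printf "%-18s %-27s %-15s %s\n", $1, $2, $3, $4}'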

Awk pass variable containing columns to be printed

I want to pass a variable to awk containing which columns to print from a file.
In this trivial case file.txt contains a single line
11 22 33 44 55
This is what I've tried:
awk -v a='2/4' -v b='$2/$4' '{print a"\n"$a"\n"b"\n"$b}' file.txt
output:
2/4
22
$2/$4
11 22 33 44 55
desired output:
0.5
Is there any way to do this kind of "eval" of the variable as awk code?
Here is one method for dividing columns specified in variables:
$ awk -v num=2 -v denom=4 '{print $num/$denom}' file.txt
0.5
If you trust the person who creates the shell variable b, then here is a method that offers flexibility:
$ b='$2/$4'; awk "{print $b}" file.txt
0.5
$ b='$1*$2'; awk "{print $b}" file.txt
242
$ b='$2,$2/$4,$5'; awk "{print $b}" file.txt
22 0.5 55
The flexibility here is due to the fact that b can contain any awk code. This approach requires that you trust the creator of b.
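If, as in the title, you only need to print a variable set of columns (rather than evaluate expressions like $2/$4), a sketch that splits a list of column numbers inside awk; the cols variable name is just for illustration:
cols="2 4"
awk -v cols="$cols" '{
    n = split(cols, c, " ")                       # c[1]="2", c[2]="4"
    for (i = 1; i <= n; i++)
        printf "%s%s", $(c[i]), (i < n ? OFS : ORS)
}' file.txt
# prints: 22 44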

Counting occurrences between pattern matches in data file and generating a report

I have a file structured like this:
MATCH A and B
001
005
101
MATCH A and C
020
400
MATCH B and C
001
156
807
920
I want to generate a report that looks like:
A and B: 3
A and C: 2
B and C: 4
I imagine the tools to use are sed/awk. I know that sed can print lines between pattern matches, but the following ends up printing out the whole file.
sed -n '/^MATCH/,/^MATCH/p' file.txt | wc -l
This returns the number of lines in the whole file. Any tips on where to look next? It seems this isn't the most common task, and I haven't been able to find many other suggestions.
This awk should do:
awk -v RS= '{print $2,$3,$4":",NF-4}' file
A and B: 3
A and C: 2
B and C: 4
Since records are separated by a blank line and RS is set to the empty string (paragraph mode),
each block is one record, so we just count the fields: NF minus the four words of the MATCH header line.
This may be better:
awk -v RS= -F"\n" '{print $1":",NF-1}' file
MATCH A and B: 3
MATCH A and C: 2
MATCH B and C: 4
Or remove the MATCH word:
awk -v RS= -F"\n" '{sub("MATCH ","",$1);print $1":",NF-1}' file
A and B: 3
A and C: 2
B and C: 4
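The RS= answers above rely on the blocks being separated by blank lines. If the file looks exactly like the sample in the question, with no blank lines at all, a sketch that simply counts the lines after each MATCH header:
awk '/^MATCH/ { if (hdr) print hdr ": " cnt    # flush the previous block
                hdr = $2 " " $3 " " $4; cnt = 0; next }
     { cnt++ }                                  # count data lines in the current block
     END { if (hdr) print hdr ": " cnt }' file.txt
A and B: 3
A and C: 2
B and C: 4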

How do I print a range of data in awk?

I am reviewing my access_logs with a statement like:
cat access_log | grep 16/Sep/2012:17 | awk '{print $12 $13 $14 $15 $16}' | sort | uniq -c | sort -n | tail -40
The purpose is to see the user agent of anyone that has been hitting my server in the last hour, sorted by number of hits. My server has unusual activity, so I want to stop any unwanted spiders/etc.
But instead of the part awk '{print $12 $13 $14 $15 $16}' I would much prefer something like awk '{print $12-through-end-of-line}', so that I could see the whole user agent, which is a different length for each entry.
Is there a way to do this with awk?
Not extremely elegant, but this works:
grep 16/Sep/2012:17 access_log | awk '{for (i=12;i<=NF;++i) printf "%s ",$i;print ""}'
It has the side effect of condensing multiple spaces between fields down to one, and putting an extra one at the end of the line, though, which probably isn't critical.
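If that collapsed spacing or the trailing blank matters, one common alternative (a sketch, assuming single-space-separated leading fields and an awk with ERE interval support, such as gawk) is to strip the first 11 fields with sub() and print the rest of the line untouched:
grep 16/Sep/2012:17 access_log | awk '{ sub(/^([^ ]+ ){11}/, ""); print }'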
I've never found one; in situations like this, I use cut (assuming I don't need awk's flexible handling of field separation):
# Assuming tab-separated fields, cut's default
grep 16/Sep/2012:17 access_log | cut -f12- | sort | uniq -c | sort -n | tail -40
# For space-separated fields (single spaces, not arbitrary amounts of whitespace)
grep 16/Sep/2012:17 access_log | cut -d' ' -f12- | sort | uniq -c | sort -n | tail -40
(Clarification: I've never found a good way. I've used #twalberg's for-loop when necessary, but prefer using cut if possible.)
$ echo somefields:; cat somefields; echo from-to.awk:; \
  cat from-to.awk; echo; awk -f from-to.awk somefields
somefields:
a b c d e f g h i j k l m n o p q r s t u v w x y z
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
from-to.awk:
{ for (i=12; i<=NF; i++) { printf "%s ", $i }; print "" }
l m n o p q r s t u v w x y z
12 13 14 15 16 17 18 19 20 21
from man awk:
NF The number of fields in the current input record.
So you basically loop through fields (separated by spaces) from 12 to the last one.
why not
#!/bin/bash
awk "/$1/"'{for (i=12;i<=NF;i++) printf("%s ", $i) ;printf "\n"}' log | sort | uniq -c | sort -n | tail -40
in a script file.
Then you can call it like
myMonitor.sh 16/Sep/2012:17
Don't have a way to test this right now. Apologies for any formatting/syntax errors.
Hopefully you get the idea.
IHTH
awk '/16\/Sep\/2012:17/ {for (i=1; i<12; i++) $i=""; print}' access_log | sort | uniq -c | sort -n | tail -40