Bash command to print even-numbered columns from multiple files - awk

I have files t_1.24/data.dat, t_2.48/data.dat, t_3.72/data.dat ... and each file has two columns. I want to extract the 2nd column of each file and put them together side by side. I know I can paste them together and do awk '{print $2, $4, ..., $(2*n)}', but since I have a large number of files, that's obviously not a good way to do it and I believe there are much better solutions. Could anyone give some suggestions to solve this?
Edited: In my case, the files have the same number of lines, and the columns are separated by spaces, with no header. For example, if t_10.48/data.dat is:
9.10000000e+00 -1.14092155e-03
9.10023800e+00 -1.14131197e-03
9.10047601e+00 -1.14171327e-03
9.10071401e+00 -1.14212571e-03
t_2.14/data.dat is:
9.10000000e+00 -1.09822747e-03
9.10023800e+00 -1.09833529e-03
9.10047601e+00 -1.09844835e-03
9.10071401e+00 -1.09856643e-03
what I want is :
-1.09822747e-03 -1.14092155e-03
-1.09833529e-03 -1.14131197e-03
-1.09844835e-03 -1.14171327e-03
-1.09856643e-03 -1.14212571e-03
And I need to paste them in the order of the original file names (e.g. t_2.48 has to come before t_10.48).

Sort the file names numerically on the number after the underscore, paste the sorted files side by side, and keep only the even-numbered fields:
$ paste $(printf '%s\n' t_*/data.dat | sort -t'_' -k2,2n) |
awk '{for (i=2; i<=NF; i+=2) printf "%s%s", $i, (i<NF ? OFS : ORS)}'
-1.09822747e-03 -1.14092155e-03
-1.09833529e-03 -1.14131197e-03
-1.09844835e-03 -1.14171327e-03
-1.09856643e-03 -1.14212571e-03
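If you would rather avoid paste entirely, here is a pure-awk sketch that accumulates the 2nd columns row by row (assuming all files have the same number of lines, as stated above):
awk '{col[FNR] = (FNR in col) ? col[FNR] OFS $2 : $2}
     END {for (i=1; i<=FNR; i++) print col[i]}' \
    $(printf '%s\n' t_*/data.dat | sort -t'_' -k2,2n)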

Use cut and paste:
paste <(cut -f2 file1) <(cut -f2 file2) ...
(cut splits on tabs by default; for space-separated files like yours, add -d ' '.)
You can also generate and run the command in bash using a Perl one-liner like so:
perl -e '$cmd = join q{ }, q{paste}, map { "<(cut -f2 $_)" } @ARGV; system qq{bash -c "$cmd"};' file1 file2 ...
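If you prefer to stay in bash, here is a sketch of the same command-building idea (assuming filenames without whitespace, which holds for the t_*/data.dat layout):
cmd="paste"
for f in $(printf '%s\n' t_*/data.dat | sort -t'_' -k2,2n); do
  cmd+=" <(cut -d ' ' -f2 $f)"
done
eval "$cmd"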

Related

Removing content of a column based on number of occurrences

I have a file (;-separated) with data like this:
111111121;000-000.1;000-000.2
111111211;000-000.1;000-000.2
111112111;000-000.1;000-000.2
111121111;000-000.1;000-000.2
111211111;000-000.1;000-000.2
112111111;000-000.1;000-000.2
121111112;000-000.2;020-000.8
121111121;000-000.2;020-000.8
121111211;000-000.2;020-000.8
121113111;000-000.3;000-200.2
211111121;000-000.1;000-000.2
I would like to remove any $3 that has fewer than 3 occurrences, so the outcome would be like:
111111121;000-000.1;000-000.2
111111211;000-000.1;000-000.2
111112111;000-000.1;000-000.2
111121111;000-000.1;000-000.2
111211111;000-000.1;000-000.2
112111111;000-000.1;000-000.2
121111112;000-000.2;020-000.8
121111121;000-000.2;020-000.8
121111211;000-000.2;020-000.8
121113111;000-000.3
211111121;000-000.1;000-000.2
That is, only $3 got deleted, as it had only a single occurrence.
Sadly I am not really sure whether (and thus how) this could be done relatively easily (doing the =COUNT.IF matching and manual deletion in Excel feels quite embarrassing).
$ awk -F';' 'NR==FNR{cnt[$3]++;next} cnt[$3]<3{sub(/;[^;]+$/,"")} 1' file file
111111121;000-000.1;000-000.2
111111211;000-000.1;000-000.2
111112111;000-000.1;000-000.2
111121111;000-000.1;000-000.2
111211111;000-000.1;000-000.2
112111111;000-000.1;000-000.2
121111112;000-000.2;020-000.8
121111121;000-000.2;020-000.8
121111211;000-000.2;020-000.8
121113111;000-000.3
211111121;000-000.1;000-000.2
or if you prefer:
$ awk -F';' 'NR==FNR{cnt[$3]++;next} {print (cnt[$3]<3 ? $1 FS $2 : $0)}' file file
This awk one-liner can help; it processes the file twice. NF-- drops the last field and the always-true pattern 7 prints the (possibly shortened) record. Note that shrinking NF rebuilds the record using OFS, so set it to match FS:
awk -F';' -v OFS=';' 'NR==FNR{a[$3]++;next}a[$3]<3{NF--}7' file file
Though the awk solutions are the best in terms of performance, your goal could also be achieved with something like this:
while IFS=" " read -r a b; do
  if [[ "$a" -lt 3 ]]; then
    sed -i "s/;$b\$//" b.txt
  fi
done <<<"$(cut -d";" -f3 b.txt | sort | uniq -c)"
The operation is based on the counts from cut | sort | uniq -c (the sed pattern also strips the preceding ; so no trailing separator is left behind):
$ cut -d";" -f3 b.txt | sort | uniq -c
7 000-000.2
1 000-200.2
3 020-000.8
The above edits the source file in place, so keep a backup for testing.
You can feed the file twice to awk. On the first run you gather a statistic that you use in the second run:
script.awk
FNR == NR { stats[ $3 ]++
next
}
{ if( stats[$3] < 3 ) print $1 FS $2
  else print
}
Run it like this: awk -F\; -f script.awk yourfile yourfile.
The condition FNR == NR is true during processing of the first filename given to awk. The next statement skips the second block.
Thus the second block is only used for processing the second filename given to awk (which is here the same as the first filename).
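The same statistic can also be gathered in a single pass if you are willing to buffer the whole file in memory; a sketch:
awk -F';' '{cnt[$3]++; line[NR]=$0; key[NR]=$3}
           END {for (i=1; i<=NR; i++) {
                  if (cnt[key[i]] < 3) sub(/;[^;]+$/, "", line[i])
                  print line[i]
                }}' file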

awk first line not working when removing columns

I'm trying to remove columns beyond number 26 from all lines of a file, using this code:
awk '{ FS = ";" ; for(i=1;i<NF;i++) if (i<26) printf $i FS}{print $26}'
It works well on every line except the first, where it shows 2 more fields (and cuts the last one in two).
Is there anything wrong in my code?
Thanks a lot
This is because you set FS on every line, after the record has already been split, so the first line is still split on the default whitespace. Set it in a BEGIN{} block instead (or pass it as a parameter, as other answers correctly suggest):
awk 'BEGIN{FS=";"} {for(i=1;i<NF;i++) if (i<26) printf $i FS}{print $26}' file
In fact, to accomplish your goal it is easier to use cut:
cut -d';' -f-26 file
    ^     ^^^^^
    |     all fields up to the 26th
    delimiter
Example with 4 cols
sample file:
$ cat a
1col1;col2;col3;col4;col5;col6
2col1;col2;col3;col4;col5;col6
3col1;col2;col3;col4;col5;col6
previous code:
$ awk '{FS=";"; for(i=1;i<NF;i++) if (i<4) printf $i FS}{print $4}' a
2col1;col2;col3;col4
3col1;col2;col3;col4
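The first record comes out wrong because awk splits each record before your block runs, so the FS=";" assignment only takes effect from the second record on; the first record, having no spaces, is a single field. A quick check of NF per record (a sketch using the same file):
$ awk '{print NF; FS=";"}' a
1
6
6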
new code:
$ awk 'BEGIN{FS=";"} {for(i=1;i<NF;i++) if (i<4) printf $i FS}{print $4}' a
1col1;col2;col3;col4
2col1;col2;col3;col4
3col1;col2;col3;col4
with cut:
$ cut -d';' -f-4 a
1col1;col2;col3;col4
2col1;col2;col3;col4
3col1;col2;col3;col4
You can try this awk:
awk -F';' 'NF>26{NF=26}1' OFS=';' yourfile
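Assigning a smaller value to NF makes awk drop the extra fields and rebuild $0 with OFS. A quick check with a 4-column cutoff on the sample file a (a sketch; this works in gawk and most modern awks, though shrinking NF is not guaranteed by POSIX):
$ awk -F';' 'NF>4{NF=4}1' OFS=';' a
1col1;col2;col3;col4
2col1;col2;col3;col4
3col1;col2;col3;col4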
@fedorqui is right.
But you can also use -F to set the field separator:
awk -F";" '{for(i=1;i<NF;i++) if (i<26) printf $i FS}{print $26}' file

Awk to find duplicates across columns

I have a bunch of DNS entries in a file:
a1.us.company.com ------ DO NOT PRINT
a2.us.us.company.com ------PRINT------ ("us" is repeated)
a3.eu.a3.compamy.com ------PRINT------ ("a3" is repeated)
a4.tx.a4.tx.company.com -----PRINT------- ("a4" and "tx" is repeated)
awk 'BEGIN {FS="."; OFS="."} {if ($2==$3) print $1"."$2"."$NF}' device_list
awk 'BEGIN {FS="."; OFS="."} {if ($1==$3) print $1"."$2"."$NF}' device_list
I am using the 2 commands above.
Can someone please give me an awk command that lists duplicate columns per row?
Some of the names are crazy, with as many as 7 to 8 .-separated fields.
$ cat file
a1.us.company.com
a2.us.us.company.com
a3.eu.a3.compamy.com
a4.tx.a4.tx.company.com
$ awk -F'.' '{delete seen; for (i=1;i<=NF;i++) if (seen[$i]++) {print; next} }' file
a2.us.us.company.com
a3.eu.a3.compamy.com
a4.tx.a4.tx.company.com
Note that deleting a whole array with delete seen is GNU-awk specific; with other awks you can clear the array with split("",seen).
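Concretely, a portable sketch of the same command:
$ awk -F'.' '{split("",seen); for (i=1;i<=NF;i++) if (seen[$i]++) {print; next} }' file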
$ awk -F. '{for(i=1;i<=NF;i++)if(x[$i]++){print;delete x;next}}' file
a2.us.us.company.com
a3.eu.a3.compamy.com
a4.tx.a4.tx.company.com
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.

Strip out value from output using awk

I'm not sure how to strip out the "DST=" from these lines.
Here is my command (it's returning what it should); if there is a more efficient or better way, please feel free to criticize:
awk '{print $10}' iptables.log |sort -u
DST=96.7.49.64
DST=96.7.49.65
DST=96.7.50.64
DST=98.27.88.26
DST=98.27.88.28
DST=98.27.88.45
DST=98.27.88.50
As you can see, I need to grab the unique IPs from the iptables log.
Thanks!
If you don't mind the unsorted output, here's a better way using awk:
awk '!a[$10]++ { sub(/DST=/,"",$10); print $10 }' file
Or you can do the substitution in awk itself with its sub() function, i.e.
awk '{sub(/DST=/,"",$10); print $10}' iptables.log |sort -u
Update:
Is there any way to key just on DST=, regardless of whether it's at field 10 or 11?
awk '$10~/^DST=/{sub(/DST=/,"",$10); print $10};$11~/^DST=/{sub(/DST=/,"",$11); print $11}' iptables.log | sort -u
OR
awk '{for (i=9;i<13;i++) {
if ($i ~ /^DST=/) { sub(/DST=/, "", $i); print $i}
}
}' iptables.log | sort -u
Note that here you can change the range of fields to check and print; I'm testing fields 9-12 just as an example. Field variables in awk like $i refer to the i-th field of the current line, just like $1, $9, $87, etc.
As I don't have an iptables.log to test with, I can't do more than confirm that the awk syntax doesn't fail. If this doesn't work, please post 2-4 sample lines of simplified data.
IHTH
You could pipe the result of your output through sed to remove the DST= from each line:
awk '{print $10}' iptables.log | sed 's/^DST=//' | sort -u
awk '{split($10,a,"="); b[a[2]]} END{for (i in b) print i}' iptables.log
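If the DST= field can land anywhere, here is a position-independent sketch that scans every field; sub() returns the number of substitutions it made, so the test fires only on the matching field:
awk '{for (i=1; i<=NF; i++) if (sub(/^DST=/, "", $i)) print $i}' iptables.log | sort -u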

awk to read specific column from a file

I have a small problem and I would appreciate help with it.
In summary, I have a file:
1,5,6,7,8,9
2,3,8,5,35,3
2,46,76,98,9
I need to read specific columns from it and print them into another text document. I know I can use awk -F, '{print $2, $3}' to print the second and third columns beside each other. However, I need to use two separate statements, awk -F, '{print $2}' >> file.text and then awk -F, '{print $3}' >> file.text, but then the two columns appear under each other instead of beside each other.
How can I make them appear beside each other?
If you must extract the columns in separate processes, use paste to stitch them together. This assumes your shell is bash/zsh/ksh, for the process substitution syntax below.
paste -d, <(awk -F, '{print $2}' file) <(awk -F, '{print $3}' file)
produces
5,6
3,8
46,76
Without the process substitutions:
awk -F, '{print $2}' file > tmp1
awk -F, '{print $3}' file > tmp2
paste -d, tmp1 tmp2 > output
Update based on your answer:
On first appearance, that's a confusing setup. Does this work?
for (( x=1; x<=$number_of_features; x++ )); do
    feature_number=$(sed -n "$x {p;q}" feature.txt)
    if [[ ! -f out.txt ]]; then
        cut -d, -f$feature_number file.txt > out.txt
    else
        paste -d, out.txt <(cut -d, -f$feature_number file.txt) > tmp &&
        mv tmp out.txt
    fi
done
That has to read file.txt a number of times. It would clearly be more efficient to read it only once:
awk -F, -v numfeat="$number_of_features" '
# read the feature file into an array
NR==FNR {
colno[++i] = $0
next
}
# now, process the file.txt and emit the desired columns
{
sep = ""
for (i=1; i<=numfeat; i++) {
printf "%s%s", sep, $(colno[i])
sep = FS
}
print ""
}
' feature.txt file.txt > out.txt
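If feature.txt really is just one column number per line, a shorter sketch builds a cut field list directly; note, though, that cut always emits fields in their file order, regardless of the order in the list:
cut -d, -f"$(paste -sd, feature.txt)" file.txt > out.txt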
Thanks all for contributing answers. I believe I should have been clearer in my question, sorry for that.
My code is as follows:
for (( x = 1; x <= $number_of_features ; x++ )) # the number extracted from a text file
do
feature_number=$(awk 'FNR == "'$x'" {print}' feature.txt)
awk -F, '{print $"'$feature_number'"}' file.txt >> out.txt
done
Basically, I extract the feature number (which is the same as the column number) from a text document and then print that column. The text document may contain many feature numbers.
The thing is, each time I have different feature numbers (which reflect the column numbers), so the above solutions are not sufficient for this problem.
I hope it is clearer now.
Waiting for your comments please.
Thanks
Ahmad
Instead of using awk's file redirection, use shell redirection, e.g.
awk -F, '{print $2,$3}' file >> out.txt
The comma in the print is replaced with the value of the output field separator (a space by default).
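For example, with the comma-separated sample from this question, setting OFS changes the separator in the output:
$ awk -F, -v OFS=';' '{print $2, $3}' file
5;6
3;8
46;76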