awk printf prints whole file into one line - awk

I am using the following awk command with printf to change the number of floating-point digits in a matrix file.
while read -r line; do
awk '{ printf "%.3e ", $0}'
done < water.txt > water3.txt
It works fine, except that all the lines are merged into one long line, and I would like to keep the lines of the matrix.
Can anyone help?

You'll find that awk is printing the whole file except for the first line.
Because you're not explicitly giving awk a filename or redirecting data into its stdin, it inherits the loop's stdin and slurps up the rest of the < water.txt redirection. To feed it one line at a time:
while read -r line; do
awk '{ printf "%.3e ", $0}' <<<"$line"
done < water.txt > water3.txt
or much more simply without the shell loop
awk '{printf "%.3e ", $0}' water.txt > water3.txt

all the lines are converted into one long line, and I would like to
keep the lines in the matrix.
printf does not append the output record separator (a newline by default), so you need to add it yourself. Typical AWK usage in this case would be
awk '{ printf "%.3e\n", $0}' water.txt > water3.txt
If you must use < at any price, you could do it the following way:
awk '{ printf "%.3e\n", $0}' < water.txt > water3.txt
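One point the answers above leave implicit: $0 is the whole line, so applying %.3e to $0 formats only the leading number of each row. For a matrix with several columns, each field needs its own conversion. A minimal sketch, assuming whitespace-separated numeric columns; the sample values and the temporary directory are made up for illustration:

```shell
# Hypothetical sample data in a scratch directory (not the asker's real file).
workdir=$(mktemp -d)
printf '1.5 2.25\n3 4\n' > "$workdir/water.txt"

# Format every field, separating fields with a space and ending each row
# with a newline so the matrix layout is preserved.
awk '{
    for (i = 1; i <= NF; i++)
        printf "%.3e%s", $i, (i < NF ? " " : "\n")
}' "$workdir/water.txt" > "$workdir/water3.txt"

cat "$workdir/water3.txt"
```

Note that an empty input line has NF equal to 0 and so produces no output line in this sketch.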

Related

AWK print a CR before line number

I made a command to dynamically display how many files tar has processed:
tar zcvf some_archive.tar.gz /a/lot/of/files | \
awk 'ORS="\r"{print NR} END{print "\n"}'
In this way, I can see a growing number, as tar outputs a line for each file processed.
However, the cursor is always under the first digit. I want it to be after the last digit, so I have this:
awk 'ORS=""{print "\r"NR} END{print "\n"}'
Sadly, AWK stopped generating any output dynamically.
So how should I do it?
Not sure why, but changing to printf works for me (and then also you don't need to set ORS):
for i in {1..20}; do echo x; sleep 1; done | awk '{printf "\r" NR} END {print ""}'
This may be a more satisfying answer, adding a flush to force the output:
for i in {1..20}; do echo x; sleep 1; done | awk -v ORS="" '{print "\r" NR; fflush()} END {print "\n"}'
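The two suggestions above can be combined: printf with an explicit format string plus fflush(). The likely reason the original stalled is output buffering, though that is an assumption rather than something the thread confirms. A small sketch, where show_progress is a hypothetical helper name and the three input lines stand in for tar's output:

```shell
# show_progress is a hypothetical helper name, not part of the original answers.
show_progress() {
    # "\r%d" returns to the start of the line and prints the running count,
    # leaving the cursor after the last digit; fflush() pushes it out at once.
    awk '{ printf "\r%d", NR; fflush() } END { print "" }'
}

# Stand-in for the tar pipeline: three input lines.
printf 'a\nb\nc\n' | show_progress
```

Using "%d" as the format keeps printf from ever interpreting the data itself as a format string.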

Exact string match in awk

I have a file test.txt with the following lines
1997 100 500 2010TJ
2010TJXML 16 20 59
I'm using the following awk line to get information only about the string 2010TJ
awk -v var="2010TJ" '$0 ~ var {print $0}' test.txt
But the code prints both lines. I want to know how to get only the line containing the exact string
1997 100 500 2010TJ
the string can be placed in any column of the file.
Several options:
Use a gawk word boundary (not POSIX awk...):
$ gawk '/\<2010TJ\>/' file
An actual space or tab or what is separating the columns:
$ awk '/^2010TJ /' file
Or compare the field directly to the string:
$ awk '$1=="2010TJ"' file
You can loop over the fields to test each field if you wish:
$ awk '{for (i=1;i<=NF;i++) if ($i=="2010TJ") {print; next}}' file
Or, given your example of setting a variable, those same using a variable:
$ gawk -v s=2010TJ '$0~"\\<" s "\\>"' file
$ awk -v s=2010TJ '$0~"^" s " "' file
$ awk -v s=2010TJ '$1==s' file
Note that the first is a little different from the second and third. The first matches the standalone string 2010TJ anywhere in $0; the second and third match only when the line starts with that string as its first field.
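The field loop shown earlier in this answer can take the search string from a variable as well, which covers the "any column" requirement without gawk extensions. A minimal sketch using the question's two sample lines in a scratch file:

```shell
# Recreate the question's sample file in a temporary directory.
workdir=$(mktemp -d)
printf '1997 100 500 2010TJ\n2010TJXML 16 20 59\n' > "$workdir/test.txt"

# Print a line as soon as any whole field equals s, then move to the next line.
awk -v s="2010TJ" '{
    for (i = 1; i <= NF; i++)
        if ($i == s) { print; next }
}' "$workdir/test.txt"
```

Only the first sample line is printed, because 2010TJXML is a different field value even though it contains the search string.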
Try this (tests only column 1):
awk '$1 == "2010TJ" {print $0}' test.txt
or, grep-like (all columns):
gawk '/\<2010TJ\>/ {print $0}' test.txt
Note
\< and \> are word boundaries (a GNU awk extension).
Another option, using GNU awk's \y word boundary:
gawk '/\y2010TJ\y/' file
Note that \y matches at either the beginning or the end of a word.

awk first line not working removing columns

I'm trying to remove columns beyond number 26 from all lines of a file, using this code:
awk '{ FS = ";" ; for(i=1;i<NF;i++) if (i<26) printf $i FS}{print $26}'
It works well on all lines except the first one, where it shows 2 more fields (and cuts the last in two).
Is there anything wrong in my code?
Thanks a lot
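This is because you set FS on every line, while it should be set in a BEGIN{} block (or outside as a parameter, as other answers correctly suggest). An FS assignment inside the main block only takes effect from the next record on, because the current line has already been split using the default separator: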
awk 'BEGIN{FS=";"} {for(i=1;i<NF;i++) if (i<26) printf $i FS}{print $26}' file
In fact, to accomplish your goal it is easier to use cut:
cut -d';' -f-26 file
       ^    ^^^
       |    all fields up to the 26th
       delimiter
Example with 4 cols
sample file:
$ cat a
1col1;col2;col3;col4;col5;col6
2col1;col2;col3;col4;col5;col6
3col1;col2;col3;col4;col5;col6
previous code:
$ awk '{FS=";"; for(i=1;i<NF;i++) if (i<4) printf $i FS}{print $4}' a
2col1;col2;col3;col4
3col1;col2;col3;col4
new code:
$ awk 'BEGIN{FS=";"} {for(i=1;i<NF;i++) if (i<4) printf $i FS}{print $4}' a
1col1;col2;col3;col4
2col1;col2;col3;col4
3col1;col2;col3;col4
with cut:
$ cut -d';' -f-4 a
1col1;col2;col3;col4
2col1;col2;col3;col4
3col1;col2;col3;col4
You can try this awk,
awk -F';' 'NF>26{NF=26}1' OFS=';' yourfile
@fedorqui is right.
But you can also use this to set the field separator:
awk -F";" '{for(i=1;i<NF;i++) if (i<26) printf $i FS}{print $26}' file
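A side note on the loops above: printf $i FS uses the field itself as the format string, so a field containing a % character would be misread as a format specifier. A sketch of the same idea with an explicit "%s" format; first_fields is a hypothetical helper name and the sample rows are made up:

```shell
# first_fields N FILE: print at most the first N ';'-separated fields of each line.
# (Hypothetical helper, shown only to illustrate the explicit "%s" format.)
first_fields() {
    awk -F';' -v n="$1" '{
        for (i = 1; i <= NF && i <= n; i++)
            printf "%s%s", (i > 1 ? FS : ""), $i   # separator goes before fields 2..n
        print ""                                    # end the row
    }' "$2"
}

workdir=$(mktemp -d)
printf '1col1;50%%;col3;col4;col5\n2col1;col2\n' > "$workdir/a"
first_fields 4 "$workdir/a"
```

Rows with fewer than N fields are printed unchanged, and the literal % in the first row passes through intact.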

What does this awk script mean? `awk -F "|" '{!a[$1]++}{printf RS $1}{print FS $2}' input.txt`

I need to know what the code below means in Unix, and some help moving forward.
`awk -F "|" '{!a[$1]++}{printf RS $1}{print FS $2}' input.txt`
My sample input file looks like this:
1|Balaji 1|Kumar 3|India 3|China 3|Australia 1|Dinesh
I need output like this:
1|Balaji|Kumar|Dinesh 3|India|China|Australia
I won't explain the awk line in your question, because it doesn't make much sense:
it creates the array a[] but never uses it
it misuses RS and FS
Try this one-liner:
awk -F'[| ]' '{for(i=1;i<=NF;i++)if(i%2)a[$i]=a[$i]?a[$i]"|"$(i+1):$(i+1)}
END{for(x in a) printf x"|"a[x]" ";print ""}' file
with your example:
kent$ echo "1|Balaji 1|Kumar 3|India 3|China 3|Australia 1|Dinesh"|awk -F'[| ]' '{for(i=1;i<=NF;i++)if(i%2)a[$i]=a[$i]?a[$i]"|"$(i+1):$(i+1)}END{for(x in a) printf x"|"a[x]" ";print ""}'
1|Balaji|Kumar|Dinesh 3|India|China|Australia
Note that there will be a trailing space; it could be removed in the END loop.
Surprisingly, it can be simplified. I am not sure why !a[$1]++ is in there; it is redundant:
awk -F "|" '{printf RS $1}{print FS $2}' input.txt
It first prints the record separator (a newline), then $1 (the first field), then the field separator ("|"), then the second field $2, and finally a newline (because that last statement is print; printf would not add a newline).
Based on your comment, below should work:
awk '{
for(i=1;i<=NF;i++){split($i,a,"|");
b[a[1]]?b[a[1]]=b[a[1]]" "a[2]:b[a[1]]=a[2]
}
for(j in b)printf j"|"b[j]" ";
print"";}' your_file
Changing the record separator makes it easy to read this data. It has only one small bug that I do not see how to solve: it prints the output on two lines.
awk -F\| '{a[$1]=a[$1]?a[$1]"|"$2:$2} END{for(i in a) printf i"|"a[i]" "}' RS=" " file
1|Balaji|Kumar|Dinesh
3|India|China|Australia
New version with correct output, thanks to Birei
awk -F\| '{sub(/\n/,x, $0); a[$1]=a[$1]?a[$1]"|"$2:$2} END{for(i in a) printf i"|"a[i]" "}' RS=" "
1|Balaji|Kumar|Dinesh 3|India|China|Australia
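The answers above rely on for (x in a), whose iteration order awk does not guarantee, and they leave a trailing space. A sketch that keeps keys in first-seen order and avoids the trailing space, assuming the input is a single line of space-separated key|value pairs; group_pairs is a hypothetical helper name:

```shell
# group_pairs: read space-separated key|value pairs on stdin and print
# key|v1|v2... groups in first-seen key order. (Hypothetical helper.)
group_pairs() {
    awk -F'|' -v RS=' ' '
    {
        sub(/\n$/, "")              # the last record carries the line-ending newline
        if ($1 == "") next          # skip empty records from repeated spaces
        if ($1 in vals)
            vals[$1] = vals[$1] "|" $2
        else {
            order[++n] = $1         # remember the order keys first appear in
            vals[$1] = $2
        }
    }
    END {
        for (i = 1; i <= n; i++)
            printf "%s%s|%s", (i > 1 ? " " : ""), order[i], vals[order[i]]
        print ""
    }'
}

echo "1|Balaji 1|Kumar 3|India 3|China 3|Australia 1|Dinesh" | group_pairs
```

With multi-line input, the newline between lines would end up inside a record, so that case would need a different record separator.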

awk to read specific column from a file

I have a small problem and would appreciate some help with it.
In summary, I have a file:
1,5,6,7,8,9
2,3,8,5,35,3
2,46,76,98,9
I need to read specific columns from it and print them into another text document. I know I can use (awk '{print "$2" "$3"}') to print the second and third columns beside each other. However, I need to use two statements, (awk '{print "$2"}' >> file.text) and then (awk '{print "$3"}' >> file.text), and then the two columns appear one under the other rather than beside each other.
How can I make them appear beside each other?
If you must extract the columns in separate processes, use paste to stitch them together. I assume your shell is bash/zsh/ksh, and I assume the blank lines in your sample input should not be there.
paste -d, <(awk -F, '{print $2}' file) <(awk -F, '{print $3}' file)
produces
5,6
3,8
46,76
Without the process substitutions:
awk -F, '{print $2}' file > tmp1
awk -F, '{print $3}' file > tmp2
paste -d, tmp1 tmp2 > output
Update based on your answer:
On first appearance, that's a confusing setup. Does this work?
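for (( x=1; x<=$number_of_features; x++ )); do
    # line x of feature.txt holds the column number to extract
    feature_number=$(sed -n "$x {p;q}" feature.txt)
    if [[ ! -f out.txt ]]; then
        # first column: create out.txt
        cut -d, -f"$feature_number" file.txt > out.txt
    else
        # later columns: paste them beside what is already there
        paste -d, out.txt <(cut -d, -f"$feature_number" file.txt) > tmp &&
        mv tmp out.txt
    fi
done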
That has to read the file.txt file a number of times. It would clearly be more efficient to only have to read it once:
awk -F, -v numfeat="$number_of_features" '
    # read the feature file into an array
    NR==FNR {
        colno[++i] = $0
        next
    }
    # now, process the file.txt and emit the desired columns
    {
        sep = ""
        for (i=1; i<=numfeat; i++) {
            printf "%s%s", sep, $(colno[i])
            sep = FS
        }
        print ""
    }
' feature.txt file.txt > out.txt
Thanks, all, for contributing answers. I believe I should have been clearer in my question; sorry for that.
My code is as follows:
for (( x = 1; x <= $number_of_features ; x++ )) # the number extracted from a text file
do
feature_number=$(awk 'FNR == "'$x'" {print}' feature.txt)
awk -F, '{print $"'$feature_number'"}' file.txt >> out.txt
done
Basically, I extract the feature number (which is the same as the column number) from a text document and then print that column. The text document may contain many feature numbers.
The thing is, each time I have different feature numbers (which reflect the column numbers), so the above solutions are not sufficient for this problem.
I hope it is clearer now.
Waiting for your comments please.
Thanks
Ahmad
Instead of using awk's file redirection, use shell redirection, e.g.
awk '{print $2,$3}' >> file
The comma is replaced with the value of the output field separator (a space by default).
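For the comma-separated sample in the question, the input separator also has to be set, and OFS can be set to a comma so the output keeps the same delimiter. A minimal sketch using the question's three rows in a scratch directory; the file names follow the thread, while the directory is an assumption:

```shell
# Recreate the question's sample data in a temporary directory.
workdir=$(mktemp -d)
printf '1,5,6,7,8,9\n2,3,8,5,35,3\n2,46,76,98,9\n' > "$workdir/file.txt"

# -F, splits input on commas; OFS=, makes the comma in print emit a comma too.
awk -F, -v OFS=, '{ print $2, $3 }' "$workdir/file.txt" >> "$workdir/out.txt"

cat "$workdir/out.txt"
```

Both columns are written by a single print, so they land beside each other on every row.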