Sort and print in line - awk

Input:
54578787 -58- 1
6578999 -658- 3
1352413 -541- 11
4564564 -23- 11
654564 -65- 3
6543564 -65- 1
Desired output:
column3 = 1,3,11
Using:
a=$(awk '{print $3}' text | sort -u | paste -s -d,) && paste <(echo "column3 =") <(echo $a)
I only get:
column3 = [large blank] 1,11,3
Another issue: if I remove all the hyphens from the second column, I get
column3 = [large blank] ,1,11,3
I think it's a paste command issue.
Last but not least: why do I have 1,11,3 instead of 1,3,11?

I would just use awk:
$ awk '{a[$3]} END {printf "column3 = "; for (i in a) {printf "%d%s", i, (++v==length(a)?"\n":",")}}' file
column3 = 1,3,11
Explanation
a[$3] populates the a[] array, using the 3rd column as the index. This way, any new value creates a new index.
END {} performs commands after the whole file has been processed.
printf "column3 = " prints "column3 = ".
for (i in a) {printf "%d%s", i, (++v==length(a)?"\n":",")} loops through the stored indices and prints them comma separated, printing a newline instead of a comma after the last one. Note that the order of a for (i in a) loop is unspecified (and length() on an array is a gawk extension), so the sorted output above is not guaranteed on every awk.
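Since the for (i in a) order is not guaranteed, a more portable sketch (assuming the same input file) is to dedupe in awk and delegate the ordering to sort -n:
$ awk '!seen[$3]++ {print $3}' file | sort -n | paste -sd, | sed 's/^/column3 = /'
column3 = 1,3,11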
Your current solution would work like this:
$ paste -d" " <(echo "column3 =") <(awk '{print $3}' file | sort -u | paste -s -d,)
column3 = 1,11,3
Note there is no need to store the result in $a first. And to have just one space, use paste -d" ".
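To see where the large blank came from: paste's default delimiter is a tab. A quick check (cat -A renders the tab as ^I):
$ paste <(echo "column3 =") <(echo "1,11,3") | cat -A
column3 =^I1,11,3$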
And to have it sorted numerically? Just add -n to your sort:
$ paste -d" " <(echo "column3 =") <(awk '{print $3}' file | sort -nu | paste -s -d,)
column3 = 1,3,11
With this command you get the same output, no matter the hyphens.

You can do something like
echo "column3 = $(awk '{print $3}' test.txt |sort -nu | paste -s -d, )"
gives me
column3 = 1,3,11
One key element is to sort with the -n option to do numerical sorting.
It also works with the hyphens deleted:
echo "column3 = $(tr -d - < test.txt| awk '{print $3}' |sort -nu | paste -s -d, )"
also outputs
column3 = 1,3,11

If perl is acceptable (-n loops over the input lines, -a autosplits each line into @F, -l handles line endings, and -E enables say):
perl -lanE '
$c3{$F[2]} = 1;
END {say "column3 = ", join(",", sort {$a <=> $b} keys %c3)}
' file
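For reference, run against the sample input above it prints:
$ perl -lanE '$c3{$F[2]} = 1; END {say "column3 = ", join(",", sort {$a <=> $b} keys %c3)}' file
column3 = 1,3,11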

My gawk line looks like:
awk '{a[$3]} END{c=asorti(a,d,"@ind_num_asc"); printf "column3 = ";
for(x=1;x<=c;x++)printf "%d%s", d[x],(c==x?"\n":",")}' file
output:
column3 = 1,3,11
Note:
you need gawk to run this (asorti is a gawk extension)
it sorts ascending, as numbers ("@ind_num_asc")
the output is on a single line

Assuming you truly want the numbers sorted and not just reproduced in the order they are first seen:
$ awk '{print $3}' file | sort -nu | awk '{s=(NR>1?s",":"")$0} END{print "column3 =",s}'
column3 = 1,3,11
You were getting 1,11,3 because without the -n arg for sort you are sorting alphabetically instead of numerically and the first char of 11 (i.e. 1) comes before the first char of 3.
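A quick demonstration of the difference:
$ printf '1\n11\n3\n' | sort
1
11
3
$ printf '1\n11\n3\n' | sort -n
1
3
11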


Replace end of line with comma and put parenthesis in sed/awk

I am trying to process the contents of a file from this format:
this1,EUR
that2,USD
other3,GBP
to this format:
this1(EUR),that2(USD),other3(GBP)
The result should be a single line.
As of now I have come up with this chain of commands, which works fine:
cat myfile | sed -e 's/,/\(/g' | sed -e 's/$/\)/g' | tr '\n' , | awk '{print substr($0, 0, length($0)- 1)}'
Is there a simpler way to do the same by just an awk command?
Another awk:
$ awk -F, '{ printf "%s%s(%s)", c, $1, $2; c = ","} END { print ""}' file
this1(EUR),that2(USD),other3(GBP)
The separator variable c is empty for the first record and becomes "," afterwards, so no leading comma is printed.
The following awk may help you with the same:
awk -F, '{val=val?val OFS $1"("$2")":$1"("$2")"} END{print val}' OFS=, Input_file
It accumulates each $1($2) into val, inserting OFS (set to a comma) from the second record on.
Toying around with separators and gsub:
$ awk 'BEGIN{RS="";ORS=")\n"}{gsub(/,/,"(");gsub(/\n/,"),")}1' file
this1(EUR),that2(USD),other3(GBP)
Explained:
$ awk '
BEGIN {
RS="" # record ends in an empty line, not newline
ORS=")\n" # the last )
}
{
gsub(/,/,"(") # replace commas with (
gsub(/\n/,"),") # and newlines with ),
}1' file # output
Using paste+sed
$ # paste -s will combine all input lines to single line
$ seq 3 | paste -sd,
1,2,3
$ paste -sd, ip.txt
this1,EUR,that2,USD,other3,GBP
$ # post processing to get desired format
$ paste -sd, ip.txt | sed -E 's/,([^,]*)(,?)/(\1)\2/g'
this1(EUR),that2(USD),other3(GBP)
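To see what the sed expression does: it captures each field after a comma together with the comma that follows it (if any), and rewrites ,FIELD, as (FIELD),. On a single pair:
$ echo 'a,b' | sed -E 's/,([^,]*)(,?)/(\1)\2/g'
a(b)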

AWK how to count patterns on the first column?

I was trying to get the total number of "??", "M", "A" and "D" entries from this:
?? this is a sentence
M this is another one
A more text here
D more and more text
I have this sample line of code, but it doesn't work:
awk -v pattern="\?\?" '{$1 == pattern} END{print " "FNR}'
$ awk '{ print $1 }' file | sort | uniq -c
1 ??
1 A
1 D
1 M
If for some reason you want an awk-only solution:
awk '{ ++cnt[$1] } END { for (i in cnt) print cnt[i], i }' file
but I think that's needlessly complicated compared to using the built-in unix tools that already do most of the work.
If you just want to count one particular value:
awk -v value='??' '$1 == value' file | wc -l
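If you would rather keep the counting inside awk instead of piping to wc, a minimal variant (the n+0 makes it print 0 when nothing matches):
awk -v value='??' '$1 == value {n++} END {print n+0}' file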
If you want to count only a subset of values, you can use a regex:
$ awk -v pattern='A|D|(\\?\\?)' '$1 ~ pattern { print $1 }' file | sort | uniq -c
1 ??
1 A
1 D
Here you do need to pass a \ so that the ?s are escaped within the regular expression. And because the \ is itself a special character within the string being passed to awk, you need to escape it as well (hence the double backslash).
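A quick way to check the escaping layers: -v processes escape sequences in the assigned value, so '\\?\\?' arrives in awk as the string \?\?, which as a dynamic regex matches two literal question marks:
$ awk -v pattern='\\?\\?' 'BEGIN { print ("??" ~ pattern) }'
1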

How to merge these codes, awk then cut

I am using awk in Debian.
input
11.22.33.44#55878:
11.22.33.43#55879:
...
...
(smtp:55.66.77.88)
(smtp:55.66.77.89)
...
...
cpe-33-22-11-99.buffalo.res.rr.com[99.11.22.33]
cpe-34-22-11-99.buffalo.res.rr.com[99.11.22.34]
...
Part of my sh code (running on Debian):
awk '/#/ {print > "file1";next} \
/smtp/ {print > "file2";next} \
{print > "file7"}' input
#
if [ -s file1 ] ; then
#IP type => 11.22.33.44#55878:
cut -d'#' -f1 file1 >> output
rm -f file1
fi
#
if [ -s file2 ] ; then
#IP type => (smtp:55.66.77.88)
cut -d':' -f2 file2 | cut -d')' -f1 >> output
rm -f file2
fi
#
if [ -s file7 ] ; then
#IP type => cpe-33-22-11-99.buffalo.res.rr.com[99.11.22.33]
cut -d'[' -f2 file7 | cut -d']' -f1 >> output
rm -f file7
fi
Then the output is:
11.22.33.44
11.22.33.43
55.66.77.88
55.66.77.89
99.11.22.33
99.11.22.34
Is it possible to merge these commands using only awk, something like
awk '/#/ {print | cut -d'#' -f1 > "file1";next} \
/smtp/ {print | cut -d':' -f2 | cut -d')' -f1 > "file2";next} \
{print | cut -d'[' -f2 file7 | cut -d']' > "file7"}' input
I am a newbie and have no idea how to do this.
I searched existing questions, but still found no help.
Any hint?
Thanks.
Best regards.
$ awk -F'[][()#]|smtp:' '/#/{print $1;next} /smtp/{print $3;next} /\[/{print $2}' input
11.22.33.44
11.22.33.43
55.66.77.88
55.66.77.89
99.11.22.33
99.11.22.34
To save this in the file output:
awk -F'[][()#]|smtp:' '/#/{print $1;next} /smtp/{print $3;next} /\[/{print $2}' input >output
How it works
-F'[][()#]|smtp:'
This sets the field separator to (a) any of the characters ][()# or (b) the string smtp:.
/#/{print $1;next}
If the line contains #, then print the first field and skip to the next line.
/smtp/{print $3;next}
If the line contains smtp, then print the third field and skip to the next line.
/\[/{print $2}
If the line contains [, then print the second field.
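To see how the separator carves up one of the trickier lines, print its fields (the empty $1 and $2 are what make $3 the IP):
$ echo '(smtp:55.66.77.88)' | awk -F'[][()#]|smtp:' '{for (i=1; i<=NF; i++) print i, "[" $i "]"}'
1 []
2 []
3 [55.66.77.88]
4 []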
Variation
There is more than one way to solve this problem. For example, using a slightly different field separator, we can still get the desired output:
$ awk -F'[][()#:]' '/#/{print $1;next} /smtp/{print $3;next} /\[/{print $2}' input
11.22.33.44
11.22.33.43
55.66.77.88
55.66.77.89
99.11.22.33
99.11.22.34

match duplicate string before a specified delimiter

cat test.txt
serverabc.test.net
serverabc.qa.net
serverabc01.test.net
serverstag.staging.net
serverstag.test.net
Here I need to match the duplicate strings just before the delimiter '.'.
So the expected output would be as below, because the strings "serverabc" and "serverstag" are duplicated. Please help.
serverabc.test.net
serverabc.qa.net
serverstag.staging.net
serverstag.test.net
awk to the rescue! This counts each first field in c[] while collecting the matching lines in a[], then prints only the groups that occur more than once:
$ awk -F\. '{c[$1]++; a[$1]=a[$1]?a[$1]RS$0:$0}
END{for(k in c) if(c[k]>1) print a[k]}' file
serverabc.test.net
serverabc.qa.net
serverstag.staging.net
serverstag.test.net
If it is not going to be used a lot, I would probably just do something like this:
cut -f1 -d\. foo.txt | sort |uniq -c | grep -v " 1 " | cut -c 9-|sed 's/\(.*\)/^\1\\./' > dup.host
grep -f dup.host foo.txt
serverabc.test.net
serverabc.qa.net
serverstag.staging.net
serverstag.test.net
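Note the cut -c 9- step assumes uniq -c pads the count to a fixed width, which can vary between implementations. A slightly more robust sketch of the same idea, letting awk build the patterns:
cut -d. -f1 foo.txt | sort | uniq -c | awk '$1 > 1 {print "^" $2 "\\."}' > dup.host
grep -f dup.host foo.txt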

sum occurrence output of uniq -c

I want to sum up the occurrence counts output by the uniq -c command.
How can I do that on the command line?
For example, if I get the following output, I would need 250.
45 a4
55 a3
1 a1
149 a5
awk '{sum+=$1} END{ print sum}'
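For example, with the numbers from the question:
$ printf '45 a4\n55 a3\n1 a1\n149 a5\n' | awk '{sum+=$1} END{print sum}'
250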
This should do the trick:
awk '{s+=$1} END {print s}' file
Or just pipe it into awk with
uniq -c whatever | awk '{s+=$1} END {print s}'
For each line, add the value of the first column to SUM; at the end, print the value of SUM.
awk is a better choice
uniq -c somefile | awk '{SUM+=$1}END{print SUM}'
but you can also implement the logic in bash. Note that the while loop must not run in a pipeline (each pipeline stage runs in a subshell, so SUM would be lost); feed it with process substitution instead:
SUM=0
while read -r num other
do
let SUM+=num
done < <(uniq -c somefile)
echo $SUM
uniq -c is slow compared to awk. Like, REALLY slow.
{mawk/mawk2/gawk} 'BEGIN { OFS = "\t" }   # change $1 below to pick the column to "uniq -c" upon
{ freqL[$1]++ }
END { for (x in freqL) printf("%8s %s\n", freqL[x], x) }' file
If your input isn't large (100MB+), then gawk suffices (use gawk -b, byte mode, for speed) after adding, in the BEGIN block,
PROCINFO["sorted_in"] = "@ind_num_asc"   # gawk-specific; traverses the array in sorted key order
If it is really large, it's far faster to use mawk/mawk2 and then pipe the unsorted output to GNU sort:
{ mawk/mawk2 stuff... } | sort -t$'\t' -k 2,2
While the aforementioned answer uniq -c example-file | awk '{SUM+=$1}END{print SUM}' does sum the left column of the uniq -c output, note that this sum is simply the total number of input lines, so wc -l somefile gives the same result, as mentioned in the comments.
If what you are looking for is instead the number of unique lines in your file, then you can use this command:
sort -h example-file | uniq | wc -l
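Equivalently, sort -u combines the sorting and deduplication in a single step:
sort -u example-file | wc -l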