sum occurrence output of uniq -c - scripting

I want to sum up the occurrence counts in the output of the "uniq -c" command.
How can I do that on the command line?
For example, if I get the following output, I would need 250.
45 a4
55 a3
1 a1
149 a5

awk '{sum+=$1} END{ print sum}'

This should do the trick:
awk '{s+=$1} END {print s}' file
Or just pipe it into awk with
uniq -c whatever | awk '{s+=$1} END {print s}'

For each line, add the value of the first column to SUM, then print out the value of SUM.
awk is a better choice
uniq -c somefile | awk '{SUM+=$1}END{print SUM}'
but you can also implement the logic in plain bash. Note that each stage of a pipeline runs in a subshell, so SUM must be summed and printed inside the same compound command, otherwise it comes out empty:
uniq -c somefile | {
  SUM=0
  while read num other; do
    let SUM+=num
  done
  echo $SUM
}

uniq -c is slow compared to awk. Like REALLY slow.
With mawk, mawk2, or gawk you can emulate it directly:
awk 'BEGIN { OFS = "\t" }
     { freqL[$1]++ }    # adjust FS, and this field, for the column you want to "uniq -c" upon
     END { for (x in freqL) printf("%8s %s\n", freqL[x], x) }'
If your input isn't large (100MB+ or so), then gawk suffices after adding
PROCINFO["sorted_in"] = "#ind_num_asc"    # gawk-specific; just use gawk -b mode
to the END block to get sorted output. If it's really large, it's far faster to use mawk2 and then pipe to GNU sort:
{ mawk/mawk2 stuff... } | sort -t$'\t' -k 2,2

While the aforementioned uniq -c example-file | awk '{SUM+=$1}END{print SUM}' does sum the left column of the uniq -c output, that sum is by definition the total number of input lines, so wc -l example-file gives the same number, as mentioned in the comments.
If what you are looking for is the number of unique lines in your file, you can use this command instead:
sort example-file | uniq | wc -l
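As a quick sanity check, both approaches agree (assuming example-file is the input that produced the counts shown in the question):
$ sort example-file | uniq -c | awk '{s+=$1} END {print s}'
250
$ wc -l < example-file
250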

Related

How can I make my mathematical formula correct in bc -l or awk?

Here I want to solve a formula in bc -l or using awk.
I have some fixed numbers that I can define as below:
A=5.8506
B=200.26323
C=151.3219
D=11.9275
E=0 and 5
I want to get an answer using the mathematical formula below:
Ei = {B * (C/(E*D+C))^(1/D)}^(1/3)
The answer from my formula should be 5.8506 for E=0 and 5.7965 for E=5.
Please suggest a simple way to get the answer for the above formula; I did not find any existing code for it so far.
What I have tried:
a=$(echo "$E*$D" | bc -l)
echo "$a"
b=$(echo "$a+$C" | bc -l)
echo "$b"
d=$(echo "$C/$b" | bc -l)
echo "$d"
E=$(echo "1/$D" | bc -l)
echo "$E"
F=$(echo "$E*$d" | bc -l)
echo "$F"
The last step should give the answer for this part of my formula, (C/(E*D+C))^(1/D), which should be 1.5232201399104, while I am getting 1.
Well, now it's awk:
$ awk -v E=5 '
BEGIN{
A=5.8506
B=200.26323
C=151.3219
D=11.9275
Ei=((B)*(C/(E*D+C))^(1/D))^(1/3)
print Ei
}'
5.79653
or
$ awk -v A=5.8506 -v B=200.26323 -v C=151.3219 -v D=11.9275 -v E=0 '
BEGIN{
Ei=((B)*(C/(E*D+C))^(1/D))^(1/3)
print Ei
}'
5.8506
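For completeness, the same formula can also be evaluated in bc -l. bc's ^ operator only accepts integer exponents; with a fractional exponent bc -l warns and truncates it toward zero, so d^(1/D) silently becomes d^0 = 1, which is consistent with the 1 reported in the question. Fractional powers therefore have to be spelled out with the math library's l() (natural log) and e() (exponential) functions, using x^y = e(y*l(x)). A sketch with the question's numbers:
B=200.26323; C=151.3219; D=11.9275; E=5
echo "e( l( $B * e( l( $C/($E*$D+$C) )/$D ) )/3 )" | bc -l
This prints 5.7965... for E=5, matching the awk result above.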

How to grep the outputs of awk, line by line?

Let's say I have the following text file:
$ cat file1.txt
MarkerName Allele1 Allele2 Freq1 FreqSE P-value Chr Pos
rs2326918 a g 0.8510 0.0001 0.5255 6 130881784
rs2439906 c g 0.0316 0.0039 0.8997 10 6870306
rs10760160 a c 0.5289 0.0191 0.8107 9 123043147
rs977590 a g 0.9354 0.0023 0.8757 7 34415290
rs17278013 t g 0.7498 0.0067 0.3595 14 24783304
rs7852050 a g 0.8814 0.0006 0.7671 9 9151167
rs7323548 a g 0.0432 0.0032 0.4555 13 112320879
rs12364336 a g 0.8720 0.0015 0.4542 11 99515186
rs12562373 a g 0.7548 0.0020 0.6151 1 164634379
Here is an awk command which prints MarkerName if Pos >= 11000000
$ awk '{ if($8 >= 11000000) { print $1 }}' file1.txt
This command outputs the following:
MarkerName
rs2326918
rs10760160
rs977590
rs17278013
rs7323548
rs12364336
rs12562373
Question: I would like to feed this into a grep statement to parse another text file, textfile2.txt. Somehow, one pipes the output from the previous awk command into grep AWKOUTPUT textfile2.txt
I would like each row of the awk command above to be grepped against textfile2.txt, i.e.
grep "rs2326918" textfile2.txt
## and then
grep "rs10760160" textfile2.txt
### and then
...
Naturally, I would save all resulting rows from textfile2.txt into a final file, i.e.
$ awk '{ if($8 >= 11000000) { print $1 }}' file1.txt | grep PIPE_OUTPUT_BY_ROW textfile2.txt > final.txt
How does one grep from a pipe line by line?
EDIT: To clarify, the one constraint I have is that file1.txt is actually the output of a previous pipe. (I'm trying to simplify the question somewhat.) How would that change the answer?
awk + grep solution:
grep -f <(awk '$8 >= 11000000{ print $1 }' file1.txt) textfile2.txt > final.txt
-f file - obtain patterns from file, one per line
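If one marker name can be a prefix of another (say rs977590 and rs9775901), it is safer to match fixed strings on whole words. A variant of the same command using grep's standard -F and -w flags (NR > 1 additionally drops the MarkerName header line):
grep -wFf <(awk 'NR > 1 && $8 >= 11000000 { print $1 }' file1.txt) textfile2.txt > final.txt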
You can use bash to do this:
bash-3.1$ echo "rs2326918" > filename2.txt
bash-3.1$ (for i in `awk '{ if($8 >= 11000000) { print $1 }}' file1.txt |
grep -v MarkerName`; do grep "$i" filename2.txt; done) > final.txt
bash-3.1$ cat final.txt
rs2326918
Alternatively,
bash-3.1$ cat file1.txt | (for i in `awk '{ if($8 >= 11000000) { print $1 }}' |
grep -v MarkerName`; do grep "$i" filename2.txt; done) > final.txt
The switch grep -v tells grep to reverse its usual activity and print all lines that do not match the pattern. This switch "inVerts" the match.
Using awk alone can do this for you:
$ awk 'NR>1 && NR==FNR { if ($8 >= 11000000) a[$1]++; next } \
{ for (i in a) { if ($0 ~ i) print } }' file1.txt textfile2.txt > final.txt
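If the marker name is a whole field in textfile2.txt rather than an arbitrary substring, an exact-match variant of the same two-file idea avoids the regex loop entirely (a sketch, assuming the marker sits in the first column of textfile2.txt):
$ awk 'NR==FNR { if (FNR > 1 && $8 >= 11000000) a[$1]; next } $1 in a' file1.txt textfile2.txt > final.txt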

shell: awk to int variable

I have a speedtest-cli script and try to awk the result. I want to get the download speed as an integer, so I can compare it with other results in an if...then condition.
Part of my script:
#!/bin/sh
speedtest-cli | awk '/Download:/ {print $2} ' > /root/tmp1
read speed1 < /root/tmp1
speedtest-cli | awk '/Download:/ {print $2} ' > /root/tmp2;
read speed2 < /root/tmp2
if [ $speed1 -gt $speed2 ];then
echo "test";fi
The problem is that my awk result (75.27) isn't saved as an integer! When it comes to the if, I get an error:
sh: 75.27: bad number
I would also prefer to define the variable directly from the awk result, but that doesn't work:
speedtest-cli | var=$(awk '/Download:/ {print $2} ' > /root/tmp1)
How can I "awk" the speedtest-cli result to get a variable that can be compared in an if...then condition?
please help,
thx greetings Igor
If you want to compare floating point numbers, you can do it within awk or with bc -l.
For example:
speed1=$(speedtest-cli | awk '/Download:/{print $2}')
...
speed2=$(speedtest-cli | awk '/Download:/{print $2}')
if (( $(echo "$speed1 > $speed2" | bc -l) )); then ...
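The awk half of that suggestion looks like this (a sketch: exit !(a > b) makes awk exit with status 0 exactly when the condition holds, which is what the shell if tests):
if awk -v a="$speed1" -v b="$speed2" 'BEGIN{ exit !(a > b) }'; then
    echo "speed1 is faster"
fi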
Another alternative, if you want to compare them as integers but don't want to lose digits to rounding, is to multiply them by a large value and convert to int:
speed1000=$(speedtest-cli | awk '/Download:/{print int($2*1000)}')
...
Now bash can handle the comparison with plain integers, as in the sketch below.
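A minimal sketch putting the integer variant together (assuming speedtest-cli prints a line like "Download: 75.27 Mbit/s", as in the question):
speed1=$(speedtest-cli | awk '/Download:/{print int($2*1000)}')
speed2=$(speedtest-cli | awk '/Download:/{print int($2*1000)}')
# 75.27 becomes 75270, so the integer test no longer chokes
if [ "$speed1" -gt "$speed2" ]; then
    echo "test"
fi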

How to use awk sort by column 3

I have a file (user.csv) like this:
ip,hostname,user,group,encryption,aduser,adattr
I want to print all columns, sorted by user.
I tried awk -F ":" '{print|"$3 sort -n"}' user.csv, but it doesn't work.
How about just sort?
sort -t, -nk3 user.csv
where
-t, - defines your delimiter as ,.
-n - gives you numerical sort. Added since you used it in your
attempt. If your user field is text only then you don't need it.
-k3 - defines the field (key). user is the third field.
Use awk to put the user ID in front, sort, and then use sed to remove the duplicated user ID, assuming user IDs do not contain any spaces:
awk -F, '{ print $3, $0 }' user.csv | sort | sed 's/^.* //'
Seeing as the original question was about how to use awk, every single one of the first 7 answers uses sort instead, and this is the top hit on Google, here is how to use awk.
Sample net.csv file with headers:
ip,hostname,user,group,encryption,aduser,adattr
192.168.0.1,gw,router,router,-,-,-
192.168.0.2,server,admin,admin,-,-,-
192.168.0.3,ws-03,user,user,-,-,-
192.168.0.4,ws-04,user,user,-,-,-
And sort.awk:
#!/usr/bin/awk -f
# usage: ./sort.awk -v f=FIELD FILE
BEGIN {
FS=","
}
# each line
{
a[NR]=$0 ""
s[NR]=$f ""
}
END {
isort(s,a,NR);
for(i=1; i<=NR; i++) print a[i]
}
#insertion sort of A[1..n]
function isort(S, A, n,    i, j, hs, ha) {   # params after n are awk-style locals
for( i=2; i<=n; i++) {
hs = S[j=i]
ha = A[j=i]
while (S[j-1] > hs) {
j--;
S[j+1] = S[j]
A[j+1] = A[j]
}
S[j] = hs
A[j] = ha
}
}
To use it:
awk -f sort.awk f=3 < net.csv # OR
chmod +x sort.awk
./sort.awk f=3 net.csv
You can choose a delimiter; in this case I chose a colon and printed column number one, sorted alphabetically and de-duplicated (by sort -u):
awk -F: '{print $1 | "sort -u"}' /etc/passwd
awk -F, '{ print $3, $0 }' user.csv | sort -k1,1
and for reverse order
awk -F, '{ print $3, $0 }' user.csv | sort -rk1,1
(the prepended user field is sort key 1; add -n if it is numeric)
try this -
awk '{print $0 | "sort -t, -nk3"}' user.csv
OR
sort -t',' -nk3 user.csv
awk -F "," '{print $0}' user.csv | sort -nk3 -t ','
This should work
To exclude the first line (header) from sorting, I split it out into two buffers.
df | awk 'NR==1{header=$0; next} {body=body sep $0; sep="\n"} END{print header; print body | "sort -nk3"}'
With GNU awk:
awk -F ',' '{ a[$3]=$0 } END{ PROCINFO["sorted_in"]="#ind_str_asc"; for(i in a) print a[i] }' file
See 8.1.6 Using Predefined Array Scanning Orders with gawk for more sorting algorithms.
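One caveat: a[$3]=$0 keeps only one row per key, and in the sample net.csv above both ws rows (and even the header line) carry user in the third field, so rows silently disappear. A sketch that skips the header and preserves duplicates by making the index unique with NR (ties then sort lexically by record number, which is fine for small files):
awk -F ',' 'NR>1{ a[$3 "," NR]=$0 } END{ PROCINFO["sorted_in"]="#ind_str_asc"; for(i in a) print a[i] }' file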
I'm running Linux (Ubuntu) with mawk:
tmp$ awk -W version
mawk 1.3.4 20200120
Copyright 2008-2019,2020, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan
random-funcs: srandom/random
regex-funcs: internal
compiled limits:
sprintf buffer 8192
maximum-integer 2147483647
mawk (and gawk) has an option to redirect the output of print to a command. From man awk, chapter 9 (Input and output):
The output of print and printf can be redirected to a file or command by appending > file, >> file or | command to the end of the print statement. Redirection opens file or command only once, subsequent redirections append to the already open stream.
Below you'll find a simplified example of how | can be used to pass the wanted records to an external program that does the hard work. This also nicely encapsulates everything in a single awk file and reduces the command-line clutter:
tmp$ cat input.csv
alpha,num
D,4
B,2
A,1
E,5
F,10
C,3
tmp$ cat sort.awk
# print header line
/^alpha,num/ {
print
}
# all other lines are data lines that should be sorted
!/^alpha,num/ {
print | "sort --field-separator=, --key=2 --numeric-sort"
}
tmp$ awk -f sort.awk input.csv
alpha,num
A,1
B,2
C,3
D,4
E,5
F,10
See man sort for the details of the sort options:
-t, --field-separator=SEP
use SEP instead of non-blank to blank transition
-k, --key=KEYDEF
sort via a key; KEYDEF gives location and type
-n, --numeric-sort
compare according to string numerical value

Identifying usernames having duplicate user-id in /etc/passwd file

I am trying to find all users in my /etc/passwd that share a duplicate user-id (for example, user-id 0). It should display the username as well as the user-id. I tried the following:
awk -F: '{
count[$3]++;}END {
for (i in count)
print i, count[i];
}' passwd
It gives the duplicate user-ids and how many times they occur. I actually want the usernames as well, along with the duplicate user-ids, similar to:
zama 0
root 0
bin 100
nologin 100
It would be great if the solution uses awk associative arrays. Other methods are also fine.
Does this do what you want?
awk -F: '{
    count[$3]++
    names[$3 "," count[$3]] = $1
}
END {
    for (i in count) {
        for (j = 1; j <= count[i]; j++) {
            print names[i "," j], i, count[i]
        }
    }
}' passwd
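Note that as written this prints every user-id, including ones that occur only once; guard the outer loop with if (count[i] > 1) to restrict it to duplicates. Against a passwd file containing the duplicate ids from the question, the duplicate-only output would look like (username, user-id, occurrence count):
zama 0 2
root 0 2
bin 100 2
nologin 100 2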
This might work for you:
awk -F: '$3~/0/{if($3 in a){d[$3];a[$3]=a[$3]"\n"$3" "$1}else{a[$3]=$3" "$1}};END{for(x in d)print a[x]}' /etc/passwd
or this non-awk solution:
cut -d: -f1,3 /etc/passwd |        # keep only "name:uid"
sort -st: -k2,2n |                 # stable numeric sort on the uid field
tr ':' ' ' |                       # uniq -f skips blank-separated fields, so turn : into a space
uniq -Df1 |                        # -D: print all duplicated lines; -f1: compare from the second (uid) field
sed 's/\(.*\) \(.*\)/\2 \1/p;d'    # swap into "uid name" order
myid=$(awk -F: '{print $3}' passwd | sort | uniq -d)
for i in $myid; do
    # anchor the uid between colons so e.g. id 10 does not also match 100
    grep -E "^[^:]*:[^:]*:$i:" passwd | awk -F: '{print $1, $3}'
done
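The sort | uniq -d pass plus the per-id grep loop can also be collapsed into a single awk pass over passwd, which keeps the original file order (a sketch along the lines of the associative-array answer above):
awk -F: '{ name[NR]=$1; uid[NR]=$3; count[$3]++ }
END{ for (i=1; i<=NR; i++) if (count[uid[i]] > 1) print name[i], uid[i] }' passwd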