cut -f2 words.txt | sort | uniq -c | sort -nr > counted.txt
This command selects the second column of the file, counts each distinct entry, and writes the result to counted.txt, ranked by frequency:
  71321 good
  14945 bad
   5891 nice
   4641 pretty
   4494 slow
   3671 quick
...
However, the output delimiter is a space, and I want a tab as the delimiter. I tried the following with --output-delimiter:
cut -f2 --output-delimiter='\t' words.txt | sort | uniq -c | sort -nr > counted.txt
But it reports an "invalid option" error.
How can I make the output delimiter a tab?
Try the following:
cut -f2 -d$'\t' words.txt | sort | uniq -c | sort -nr > counted.txt
The -d option specifies the delimiter you want cut to use; in bash, write the tab as $'\t', since a plain '\t' is passed as two characters and cut rejects a multi-character delimiter.
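If your cut does not support --output-delimiter at all (it is a GNU extension, which would explain the "invalid option" error), a sketch that post-processes the uniq -c output instead, assuming the counted words contain no spaces:
cut -f2 words.txt | sort | uniq -c | sort -nr | awk '{print $1 "\t" $2}' > counted.txt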
Input:
dog
fish
elephant
...
Output:
dog |
fish |
elephant|
... |
I want to add a "|" as the 9th character of every row.
You should first space-pad the lines to the maximum line width (e.g. 8 characters, as in your example).
Then, you can use
sed 's/./&|/8' <padded.txt >output.txt
which appends a "|" after the 8th character, so it becomes the 9th.
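The padding step itself can be done with printf in awk; a minimal sketch, assuming a maximum width of 8 and file as the input name:
awk '{printf "%-8s\n", $0}' file > padded.txt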
Hard-coding the output field width:
$ awk '{printf "%-*s|\n",8,$0}' file
dog |
fish |
elephant|
... |
or specifying the output field width as an argument:
$ awk -v wid=8 '{printf "%-*s|\n",wid,$0}' file
dog |
fish |
elephant|
... |
or dynamically determining the output field width from the input field widths:
$ awk 'NR==FNR{lgth=length($0); wid=(lgth > wid ? lgth : wid); next} {printf "%-*s|\n",wid,$0}' file file
dog |
fish |
elephant|
... |
If you need to further process the records, it might be a good idea to actually make $0 nine characters wide:
$ awk '{$0=$0 sprintf("%" 9-length() "s","|")}1' file
Output:
dog |
fish |
elephant|
... |
I have data like
31 text text t text ?::"!!/
2 te text 32 +ěščřžý
43 te www ##
It is the output of uniq -c.
I need to get something like
text text t text ?::"!!/
te text 32 +ěščřžý
te www ##
I tried to use something like
a=$1;
$1=""
$0=substr($0, 2);
printf $0;
print "";
But it collapses the spaces, and I get something like
text text t text ?::"!!/
te text 32 +ěščřžý
te www ##
And I need to save the number too.
Does anyone know how to do it?
I guess you want to remove the leading digits from each line; sed will be simpler for this task:
sed -E 's/^[0-9]+ //' file
awk normalizes the whitespace when you modify a field, because of the default FS. You can get the same effect with sub() in awk if there is more processing to do.
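For instance, a sub() sketch equivalent to the sed above (the trailing 1 is an always-true pattern that prints each record):
awk '{sub(/^[0-9]+ /,"")}1' file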
Try this one:
$ echo "31 text text t text" |awk '{gsub($1FS$2,$2);print}'
text text t text
You can also try
$ echo "31 text text t text" |awk '{gsub(/^[0-9]+/,"");print}'
text text t text
But in this case you will have a leading space in front of each line.
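A sketch that avoids the leading space by letting the regex consume the blanks as well:
$ echo "31 text text t text" |awk '{gsub(/^[0-9]+ +/,"");print}'
text text t text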
$ seq 5 | uniq -c
      1 1
      1 2
      1 3
      1 4
      1 5
$ seq 5 | uniq -c | awk '{sub(/^ *[^ ]+ +/,"")}1'
1
2
3
4
5
$ seq 5 | uniq -c | sed 's/^ *[^ ]* *//'
1
2
3
4
5
I have some directories and files that look like this:
/my/directories/directory0/
|
-->File1.txt
|
-->File2.txt
/my/directories/directory1/
|
-->File1.txt
|
-->File2.txt
/my/directories/directory2/
|
-->File1.txt
|
-->File2.txt
/my/directories/directory3/
|
-->File1.txt
|
-->File2.txt
These are CSV files, and I'm trying to count the occurrences of each value in the 3rd column, sorted from highest to lowest.
Right now I'm able to accomplish this, but only within each directoryN individually. For example, if I run this:
cd /my/directories/directory0/
cat *.txt | awk -F "," '{print $3}' | sort | uniq -c | sort -nr > finalOutput.txt
Then I get exactly what I want, but only with the data in that directory. I want to cat everything from all of the /my/directories/ subdirectories into a single file.
I've tried to use ls or find to accomplish this, but I can't get it working. I know you can recursively cat this way, similar to this:
find /my/directories/ -name '*.txt' -exec cat {} \; > finalOutput.txt
But I haven't been able to get this to work with a multi-piped command. Any help is appreciated.
Try with xargs:
$ find . -name "f?" | xargs awk -F, '{ print $3 }' | sort | uniq -c | sort -nr
4 3
1 z
1 three
1 c
1 5
$ find . -name "f?" -exec echo "File: " {} \; -exec cat {} \; -exec echo "----" \;
File: ./d0/f1
1,2,3
----
File: ./d0/f2
3,4,5
----
File: ./d1/f1
8,9,3
----
File: ./d1/f2
a,b,c
----
File: ./d2/f1
x,y,z
----
File: ./d2/f2
one,two,three
----
File: ./d3/f1
red,yellow,3
----
File: ./d3/f2
1,2,3
----
find . -name "File*" | xargs cut -d, -f3 | sort | uniq -c | sort -nr
I am trying to find a way to exclude numbers in a file when I cat it, but I only want to exclude the numbers at the end of $1, and I want to keep the number that is inside the word. I have something that I thought might work, but it is not quite giving me what I want. I have also shown an example of what the file looks like. The file is separated by pipes.
cat files | awk -F '|' ' {print $1 "\t" $2}' |sed 's/0123456789//g'
Input:
b1ark45 | dog23 | brown
m2eow66| cat24 |yellow
h3iss67 | snake57 | green
Output:
b1ark dog23
m2eow cat24
h3iss snake57
try this (the trailing 7 is an always-true pattern, so each record gets printed):
awk -F'|' -v OFS='|' '{gsub(/[0-9]/,"",$1)}7' file
the output of your example would be:
bark | dog23 | brown
meow| cat24 |yellow
hiss | snake57 | green
EDIT
this outputs col1 (without the trailing numbers and spaces) and col2, separated by a <tab>:
kent$ echo "b1ark45 | dog23 | brown
m2eow66| cat24 |yellow
h3iss67 | snake57 | green"|awk -F'|' -v OFS='\t' '{gsub(/[0-9]*\s*$/,"",$1);print $1,$2}'
b1ark dog23
m2eow cat24
h3iss snake57
This might work for you (GNU sed):
sed -r 's/[0-9]*\s*\|\s*(\S*).*/ \1/' file
I am reviewing my access_logs with a statement like:
cat access_log | grep 16/Sep/2012:17 | awk '{print $12 $13 $14 $15 $16}' | sort | uniq -c | sort -n | tail -40
The purpose is to see the user agents of anyone that has been hitting my server for the last hour, sorted by number of hits. My server has unusual activity, so I want to stop any unwanted spiders, etc.
But I would much prefer the awk '{print $12 $13 $14 $15 $16}' part to be something like awk '{print $12-through-end-of-line}', so that I could see the whole user agent, which has a different length for each one.
Is there a way to do this with awk?
Not extremely elegant, but this works:
grep 16/Sep/2012:17 access_log | awk '{for (i=12;i<=NF;++i) printf "%s ",$i;print ""}'
It has the side effect of condensing multiple spaces between fields down to one, and putting an extra one at the end of the line, though, which probably isn't critical.
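If the original spacing matters, a sketch that instead deletes the first 11 fields with sub(), assuming an awk with interval-expression support (e.g. GNU awk 4.0+):
grep 16/Sep/2012:17 access_log | awk '{sub(/^([^ ]+ +){11}/,"")}1'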
I've never found one; in situations like this, I use cut (assuming I don't need awk's flexible handling of field separation):
# Assuming tab-separated fields, cut's default
grep 16/Sep/2012:17 access_log | cut -f12- | sort | uniq -c | sort -n | tail -40
# For space-separated fields (single spaces, not arbitrary amounts of whitespace)
grep 16/Sep/2012:17 access_log | cut -d' ' -f12- | sort | uniq -c | sort -n | tail -40
(Clarification: I've never found a good way. I've used @twalberg's for-loop when necessary, but prefer using cut if possible.)
$ echo somefields:; cat somefields ; echo from-to.awk: ; \
cat from-to.awk ; echo ;awk -f from-to.awk somefields
somefields:
a b c d e f g h i j k l m n o p q r s t u v w x y z
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
from-to.awk:
{ for (i=12; i<=NF; i++) { printf "%s ", $i }; print "" }
l m n o p q r s t u v w x y z
12 13 14 15 16 17 18 19 20 21
from man awk:
NF The number of fields in the current input record.
So you basically loop through fields (separated by spaces) from 12 to the last one.
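A quick illustration of NF and of $NF, the last field:
$ echo a b c | awk '{print NF, $NF}'
3 c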
why not
#!/bin/bash
awk "/$1/"'{for (i=12;i<=NF;i++) printf("%s ", $i) ;printf "\n"}' log | sort | uniq -c | sort -n | tail -40
in a script file.
Then you can call it like
myMonitor.sh 16/Sep/2012:17
I don't have a way to test this right now. Apologies for any formatting/syntax errors.
Hopefully you get the idea.
IHTH
awk '/16\/Sep\/2012:17/{for(i=1;i<12;i++){$i="";}print}' access_log | sort | uniq -c | sort -n | tail -40