I have some directories and files that look like this:
/my/directories/directory0/
|-- File1.txt
|-- File2.txt
/my/directories/directory1/
|-- File1.txt
|-- File2.txt
/my/directories/directory2/
|-- File1.txt
|-- File2.txt
/my/directories/directory3/
|-- File1.txt
|-- File2.txt
These are CSV files, and I'm trying to count the occurrences of each value in the 3rd column and sort the counts from highest to lowest.
Right now I'm able to accomplish this, but only within each directoryX individually. For example, if I run this:
cd /my/directories/directory0/
cat *.txt | awk -F "," '{print $3}' | sort | uniq -c | sort -nr > finalOutput.txt
Then I get exactly what I want, but only for the data in that directory. I want to cat everything from all of the /my/directories/ subdirectories into a single output.
I've tried to use ls or find to accomplish this, but I can't get it working. I know you can cat recursively, similar to this:
find /my/directories/ -name '*.txt' -exec cat {} \; > finalOutput.txt
But I haven't been able to get this to work with a multi-piped command. Any help is appreciated.
Try with xargs:
$ find . -name "f?" | xargs awk -F, '{ print $3 }' | sort | uniq -c | sort -nr
4 3
1 z
1 three
1 c
1 5
$ find . -name "f?" -exec echo "File: " {} \; -exec cat {} \; -exec echo "----" \;
File: ./d0/f1
1,2,3
----
File: ./d0/f2
3,4,5
----
File: ./d1/f1
8,9,3
----
File: ./d1/f2
a,b,c
----
File: ./d2/f1
x,y,z
----
File: ./d2/f2
one,two,three
----
File: ./d3/f1
red,yellow,3
----
File: ./d3/f2
1,2,3
----
For your file names, that becomes:
find . -name "File*" | xargs cut -d, -f3 | sort | uniq -c | sort -nr
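If any of the paths could contain spaces, a null-delimited variant is safer (a sketch; -print0 and -0 are GNU/BSD extensions to find and xargs):
find /my/directories/ -name '*.txt' -print0 | xargs -0 cut -d, -f3 | sort | uniq -c | sort -nr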
cut -f2 words.txt | sort | uniq -c | sort -nr > counted.txt
This command selects the second column from the file, counts each distinct entry, sorts by frequency, and writes the result to counted.txt, ranked by frequency, as follows:
71321 good
14945 bad
5891 nice
4641 pretty
4494 slow
3671 quick
...
However, the output delimiter here is whitespace, and I want a tab as the delimiter. I tried the following with --output-delimiter:
cut -f2 --output-delimiter='\t' words.txt | sort | uniq -c | sort -nr > counted.txt
But it reports an "invalid option" error.
How can I make the output delimiter a tab?
Try the following (in bash, $'\t' expands to a literal tab character; a plain '\t' makes cut complain, since the delimiter must be a single character):
cut -d$'\t' -f2 words.txt | sort | uniq -c | sort -nr > counted.txt
The -d option specifies the input delimiter you want to use.
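Note that the whitespace you are seeing actually comes from uniq -c, not from cut, so changing cut's delimiter will not remove it. A minimal sketch of one way to re-delimit the final output with awk, assuming the counted values contain no embedded whitespace:
cut -f2 words.txt | sort | uniq -c | sort -nr | awk -v OFS='\t' '{print $1, $2}' > counted.txt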
I am trying to find a way to exclude numbers from a file when I cat it, but I only want to exclude the trailing numbers in $1; I want to keep the number that is inside the word. I have something that I thought might work, but it is not quite giving me what I want. I have also included an example of what the file looks like. The file is separated by pipes.
cat files | awk -F '|' ' {print $1 "\t" $2}' |sed 's/0123456789//g'
Input:
b1ark45 | dog23 | brown
m2eow66| cat24 |yellow
h3iss67 | snake57 | green
Desired output:
b1ark dog23
m2eow cat24
h3iss snake57
Try this (the trailing 7 is just a true condition, so awk prints every line):
awk -F'|' -v OFS='|' '{gsub(/[0-9]/,"",$1)}7' file
The output for your example would be:
bark | dog23 | brown
meow| cat24 |yellow
hiss | snake57 | green
EDIT
This outputs col1 (without the trailing numbers and spaces) and col2, separated by a tab:
kent$ echo "b1ark45 | dog23 | brown
m2eow66| cat24 |yellow
h3iss67 | snake57 | green"|awk -F'|' -v OFS='\t' '{gsub(/[0-9]*\s*$/,"",$1);print $1,$2}'
b1ark dog23
m2eow cat24
h3iss snake57
This might work for you (GNU sed):
sed -r 's/[0-9]*\s*\|\s*(\S*).*/ \1/' file
I have a file with the following lines of text:
jeremy , thomas , 123
peter , paul , 456
jack , jill , 789
I would like to remove all of the data except for the center item, ending up with a file that contains:
thomas
paul
jill
I have tried so many awk patterns my brain is exploding. Any help would be appreciated.
Try awk:
awk -F '[[:space:]]*,[[:space:]]*' '{print $2}' input.txt
Try this:
cat <filepath> | tr -d ' ' | cut -d',' -f2
grep with lookarounds:
grep -Po '(?<=, ).*(?= ,)' file
Try this:
$ cat (your file) | cut -d ',' -f2 > (new file)
for instance:
$ cat /home/file1.txt | cut -d ',' -f2 > /home/file2.txt
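Note that cut alone keeps the spaces around the middle field, so you get " thomas " rather than "thomas". A sketch of one way to trim them, assuming the values themselves never contain spaces:
cut -d',' -f2 /home/file1.txt | tr -d ' ' > /home/file2.txt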
I am reviewing my access_logs with a statement like:
cat access_log | grep 16/Sep/2012:17 | awk '{print $12 $13 $14 $15 $16}' | sort | uniq -c | sort -n | tail -40
The purpose is to see the user agent of anyone that has been hitting my server in the last hour, sorted by number of hits. My server is seeing unusual activity, so I want to stop any unwanted spiders/bots.
But I would much prefer to replace the part awk '{print $12 $13 $14 $15 $16}' with something like awk '{print $12-through-end-of-line}', so that I could see the whole user agent, which has a different length on every line.
Is there a way to do this with awk?
Not extremely elegant, but this works:
grep 16/Sep/2012:17 access_log | awk '{for (i=12;i<=NF;++i) printf "%s ",$i;print ""}'
It has the side effect of condensing multiple spaces between fields down to one, and putting an extra one at the end of the line, though, which probably isn't critical.
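If you want to keep the original spacing between the user-agent fields, an alternative (a sketch; the {11} interval expression needs GNU awk or another POSIX-conforming awk) is to locate where field 12 starts with match() and print the rest of the line verbatim:
# print from field 12 to end of line, preserving original spacing
grep 16/Sep/2012:17 access_log | awk '{ if (match($0, /^[ \t]*([^ \t]+[ \t]+){11}/)) print substr($0, RLENGTH + 1) }'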
I've never found one; in situations like this, I use cut (assuming I don't need awk's flexible handling of field separation):
# Assuming tab-separated fields, cut's default
grep 16/Sep/2012:17 access_log | cut -f12- | sort | uniq -c | sort -n | tail -40
# For space-separated fields (single spaces, not arbitrary amounts of whitespace)
grep 16/Sep/2012:17 access_log | cut -d' ' -f12- | sort | uniq -c | sort -n | tail -40
(Clarification: I've never found a good way. I've used #twalberg's for-loop when necessary, but prefer using cut if possible.)
$ echo somefields:; cat somefields ; echo from-to.awk: ; \
cat from-to.awk ; echo ;awk -f from-to.awk somefields
somefields:
a b c d e f g h i j k l m n o p q r s t u v w x y z
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
from-to.awk:
{ for (i=12; i<=NF; i++) { printf "%s ", $i }; print "" }
l m n o p q r s t u v w x y z
12 13 14 15 16 17 18 19 20 21
from man awk:
NF The number of fields in the current input record.
So you basically loop through fields (separated by spaces) from 12 to the last one.
Why not put
#!/bin/bash
awk -v pat="$1" '$0 ~ pat {for (i=12; i<=NF; i++) printf("%s ", $i); printf "\n"}' log | sort | uniq -c | sort -n | tail -40
in a script file? (Passing the pattern in via -v avoids breaking the regex when the argument contains slashes, as 16/Sep/2012:17 does.)
Then you can call it like
myMonitor.sh 16/Sep/2012:17
I don't have a way to test this right now. Apologies for any formatting/syntax errors.
Hopefully you get the idea.
IHTH
awk '/16\/Sep\/2012:17/{for(i=1;i<12;i++){$i="";}print}' access_log | sort | uniq -c | sort -n | tail -40
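One caveat: assigning to the fields makes awk rebuild the record, so each output line starts with eleven blanks where fields 1-11 used to be. That does not break the counting, but a sed step can strip them (a sketch):
awk '/16\/Sep\/2012:17/{for(i=1;i<12;i++){$i="";}print}' access_log | sed 's/^ *//' | sort | uniq -c | sort -n | tail -40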
I have a line that looks like:
Feb 21 1:05:14 host kernel: [112.33000] SRC=192.168.0.1 DST=90.90.90.90 PREC=0x40 TTL=51 ....
I would like to get a list of unique IPs from SRC=.
How can I do this? Thanks
This will work, although you could probably simplify it further in a single awk script if you wanted:
awk '{print $7}' <your file> | awk -F= '{print $2}' | sort -u
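The single-awk simplification mentioned above might look like this (a sketch, assuming SRC= is always the 7th field, as in the sample line):
awk '{split($7, a, "="); print a[2]}' <your file> | sort -u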
grep -o 'SRC=\([^ ]\+\)' file | cut -d= -f2 | sort -u
cat thefile | grep SRC= | sed -r 's/^.*SRC=([^ ]+).*$/\1/' | sort | uniq
This awk script will do it:
{a[$7]=1}
END{for (i in a) print i}
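It could be invoked like this (uniq-src.awk is a hypothetical name for the script above; the printed values still carry the SRC= prefix, which cut can strip):
awk -f uniq-src.awk thefile | cut -d= -f2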
This will print the IP addresses in order, without the "SRC=" string (asort requires GNU awk):
awk '{a[$7] = $7} END {n = asort(a); for (i = 1; i <= n; i++) {split(a[i], b, "="); print b[2]}}' inputfile
Example output:
192.168.0.1
192.168.0.2
192.168.1.1
grep -Po "SRC=(.[^\s]*)" file | sed 's/SRC=//' | sort -u
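A variant that drops the sed step by using a lookbehind instead (still assuming GNU grep with PCRE support):
grep -Po '(?<=SRC=)\S+' file | sort -u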
Ruby (1.9+):
ruby -ne 'puts $_.scan(/SRC=(.[^\s]*)/)[0] if /SRC=/' file | sort -u