How to avoid leading space using cut command? - cut

Requirement: I need to use only grep/cut/join.
I have data like:
  3 abcd
 23 xyz
1234 abc
I want to pipe this data to cut and then extract columns. But when I use cut -d' ' -f 1,2, it treats each space as its own column divider, so the leading spaces produce empty fields. I would prefer the first two rows to be trimmed before they reach cut. Is there a way?
Example (tr is used here only to demonstrate where the spaces end up; it is not allowed in the solution):
$ echo '  3 abcd
 23 xyz
1234 abc' | cut -d' ' -f 1,2 | tr ' ' '_'
_
_23
1234_abc
Expected output:
3 abcd
23 xyz
1234 abc

Using just grep, you can accomplish this with the following pipe:
grep -oe "[^ ][^ ]* *[^ ][^ ]*$"
grep # a tool for matching text
-o # print only the matching part of each line
-e # the next argument is the pattern (a regex)
[^ ] # match any character that isn't a space
* # match zero or more of the previous element
$ # the end of the line
Since basic regular expressions have no + operator, [^ ][^ ]* is the idiom for "one or more non-space characters", and the ' *' between the two groups allows any run of spaces separating the fields.
Note: This does not account for trailing whitespace.
Demonstration:
$ echo '  3 abcd
 23 xyz
1234 abc' | grep -oe "[^ ][^ ]* *[^ ][^ ]*$"
3 abcd
23 xyz
1234 abc
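
If you want cut to stay in the pipeline, a minimal sketch within the same grep/cut/join constraint: let grep -o strip the leading spaces first, then hand the result to cut. This assumes, as in the sample data, that the fields themselves are separated by single spaces.
$ echo '  3 abcd
 23 xyz
1234 abc' | grep -o '[^ ].*' | cut -d' ' -f 1,2
3 abcd
23 xyz
1234 abc
The pattern [^ ].* matches from the first non-space character to the end of the line, so -o drops exactly the leading spaces.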

How to use tab as delimiter in this cut command?

cut -f2 words.txt | sort | uniq -c | sort -nr > counted.txt
This command selects the second column from the file, counts each distinct entry, sorts by frequency, and writes the result to counted.txt, ranked by frequency, as follows:
71321 good
14945 bad
5891 nice
4641 pretty
4494 slow
3671 quick
...
However, the output delimiter here is a space, and I want a tab as the delimiter. I tried the following with --output-delimiter:
cut -f2 --output-delimiter='\t' words.txt | sort | uniq -c | sort -nr > counted.txt
But it reports an "invalid option" error.
How to make the output delimiter a tab?
Try the following. Note that '\t' in quotes reaches cut as the two characters \ and t, which is not a valid single-character delimiter; in bash, $'\t' produces a literal tab:
cut -d$'\t' -f2 words.txt | sort | uniq -c | sort -nr > counted.txt
The -d option specifies the input delimiter you want to use. (Tab is already cut's default, so plain cut -f2 reads tab-separated input.)
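Note, though, that the space-padded output comes from uniq -c, not from cut, so changing cut's delimiter cannot fix it. A sketch of one post-processing approach, assuming each entry in the second column is a single word with no embedded whitespace:
cut -f2 words.txt | sort | uniq -c | sort -nr | awk '{print $1 "\t" $2}' > counted.txt
Here awk re-prints each line as the count, a literal tab, then the word, discarding the padding that uniq -c adds.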

Sed replace nth column of multiple tsv files without header

I have multiple TSV files, where I want to add a prefix such as 'X1_' to every value in the second column (everywhere except in the header) and save the result back to the same file.
Input:
$ ls
file1.tsv file2.tsv file3.tsv
$ head -n 4 file1.tsv
a b c
James England 25
Brian France 41
Maria France 18
Output wanted:
a b c
James X1_England 25
Brian X1_France 41
Maria X1_France 18
I tried this, but the result is not kept in the file, and a simple redirection won't work:
# this works, but doesn't save the changes
i=1
for f in *tsv
do awk '{if (NR!=1) print $2}' $f | sed "s|^|X${i}_|"
i=$((i+1))
done
# adding '-i' option to sed: this throws an error but would be perfect (sed no input files error)
i=1
for f in *tsv
do awk '{if (NR!=1) print $2}' $f | sed -i "s|^|T${i}_|"
i=$((i+1))
done
Some help would be appreciated.
The second column is particularly easy because you simply replace the first occurrence of the separator.
for file in *.tsv; do
sed -i '2,$s/\t/\tX1_/' "$file"
done
If your sed doesn't recognize the \t escape, use a literal tab instead (in many shells, you can type one with Ctrl-V followed by Tab). On *BSD (and hence macOS), -i requires an explicit suffix argument: -i ''.
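A sketch that extends this to number the prefix per file (X1_ for the first file, X2_ for the second, and so on), as the asker's loop suggests; the sed expression now sits in double quotes so that ${i} expands, which means the $ in the address must be escaped:
i=1
for file in *.tsv; do
  sed -i "2,\$s/\t/\tX${i}_/" "$file"
  i=$((i+1))
done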
AWK solution (the -i inplace extension requires GNU Awk 4.1 or later):
awk -i inplace 'BEGIN { FS=OFS="\t" } NR!=1 { $2 = "X1_" $2 } 1' file1.tsv
Input:
a b c
James England 25
Brian France 41
Maria France 18
Output:
a b c
James X1_England 25
Brian X1_France 41
Maria X1_France 18
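A sketch of a multi-file variant, under the same GNU Awk assumption, that advances the prefix number at the start of each file:
awk -i inplace 'BEGIN { FS=OFS="\t" }
FNR == 1 { i++ }                # first line of each file: bump the counter
FNR != 1 { $2 = "X" i "_" $2 }  # prefix column 2 on every non-header line
1' *.tsv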

AWK removes my spaces

I have data like
31 text text t text ?::"!!/
2 te text 32 +ěščřžý
43 te www ##
It is the output of uniq -c.
I need to get something like
text text t text ?::"!!/
te text 32 +ěščřžý
te www ##
I tried to use something like
{
    a = $1
    $1 = ""
    $0 = substr($0, 2)
    printf "%s", $0
    print ""
}
But it removes my spaces, squeezing runs of spaces down to one, and I get something like
text text t text ?::"!!/
te text 32 +ěščřžý
te www ##
And I need to save the number too.
Does anyone know how to do this?
I guess you want to remove the leading digits from each line; sed is simpler for this task:
sed -E 's/^[0-9]+ //' file
awk normalizes the whitespace because assigning to a field makes it rebuild $0 with the default OFS. If there is more processing to do, you can strip the count with sub() in awk instead, which leaves the rest of the line untouched.
Try this one:
$ echo "31 text text t text" |awk '{gsub($1FS$2,$2);print}'
text text t text
You can also try
$ echo "31 text text t text" |awk '{gsub(/^[0-9]+/,"");print}'
text text t text
But in this case you will have a leading space in front of each line.
$ seq 5 | uniq -c
1 1
1 2
1 3
1 4
1 5
$ seq 5 | uniq -c | awk '{sub(/^ *[^ ]+ +/,"")}1'
1
2
3
4
5
$ seq 5 | uniq -c | sed 's/^ *[^ ]* *//'
1
2
3
4
5
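To also keep the number, as the question asks, a minimal sketch (data is a hypothetical file holding the uniq -c output): reading $1 does not modify the record, and sub() on $0 strips the count without re-normalizing the spacing of what remains.
awk '{
    count = $1              # save the leading count
    sub(/^ *[0-9]+ /, "")   # strip it; spacing in the rest of the line is kept
    printf "%s\t%s\n", count, $0
}' data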

Remove all characters after matched character

I have a file with many lines
http://example.com/part-1 this number 1 one
http://example.com/part--2 this is number 21 two
http://example.com/part10 this is an number 12 ten
http://example.com/part-num-11 this is an axample number 212 eleven
How can I remove every character after "number x", as well as everything between the first column and "number x"? I want my output like this:
http://example.com/part-1 1
http://example.com/part--2 21
http://example.com/part10 12
http://example.com/part-num-11 212
Another case:
Input:
http://server1.example.com/00/part-1 this number 1 one
http://server2.example.com/1a/part--2 this is section 21 two two
http://server3.example.com/2014/5/part10 this is an Part 12 ten ten ten
http://server5.example.com/2014/7/part-num-11 this is an PARt number 212 eleven
I want the same output. And the number is always the last numeric field.
Here is one way:
awk -F"number" '{split($1,a," ");split($2,b," ");print a[1],b[1]}' file
http://example.com/part-1 1
http://example.com/part--2 21
http://example.com/part10 12
http://example.com/part-num-11 212
If the number you want is always the second-to-last field, this will do too:
awk '{print $1,$(NF-1)}' file
http://example.com/part-1 1
http://example.com/part--2 21
http://example.com/part10 12
http://example.com/part-num-11 212
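With the second data set, though, the number is not always second-to-last ("... section 21 two two" has two words after it). Since the asker notes the wanted value is always the last numeric field, a sketch that scans the fields from the right for the last all-digit one:
awk '{
    for (i = NF; i >= 2; i--)     # walk the fields right to left
        if ($i ~ /^[0-9]+$/) {    # first all-digit field from the end
            print $1, $i
            break
        }
}' file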
sed -r 's/^([^0-9]*[0-9]+)[^0-9]*([0-9]+).*/\1 \2/' file
Output:
http://example.com/part-1 1
http://example.com/part--2 21
http://example.com/part10 12
http://example.com/part-num-11 212
Try this (note the -E: + is not special in sed's basic regular expressions):
sed -E 's/ .*number ([0-9]+).*/ \1/' myfile.txt
Thanks everyone. From your comments, I have my own solution:
sed -re 's/([0-9]*[0-9]+)/#\1#/g' | sed -re 's/(^.*#).*/\1/g' | sed 's/#//g' | awk '{print $1" "$NF}'
My idea: wrap every numeric group in #...#, then keep everything from the start of the line to the last # (the greedy ^.* makes sed select the last #) and remove the remaining characters, strip the # markers, and finally print the first and last fields with awk.

How do I print a range of data in awk?

I am reviewing my access logs with a statement like:
cat access_log | grep 16/Sep/2012:17 | awk '{print $12 $13 $14 $15 $16}' | sort | uniq -c | sort -n | tail -40
The purpose is to see the user agent of anyone that has been hitting my server for the last hour, sorted by number of hits. My server is seeing unusual activity, so I want to stop any unwanted spiders/etc.
But I would much prefer the awk '{print $12 $13 $14 $15 $16}' part to be something like awk '{print $12-through-end-of-line}', so that I could see the whole user agent, which is a different length for each one.
Is there a way to do this with awk?
Not extremely elegant, but this works:
grep 16/Sep/2012:17 access_log | awk '{for (i=12;i<=NF;++i) printf "%s ",$i;print ""}'
It has the side effect of condensing multiple spaces between fields down to one, and putting an extra one at the end of the line, though, which probably isn't critical.
I've never found one; in situations like this, I use cut (assuming I don't need awk's flexible handling of field separation):
# Assuming tab-separated fields, cut's default
grep 16/Sep/2012:17 access_log | cut -f12- | sort | uniq -c | sort -n | tail -40
# For space-separated fields (single spaces, not arbitrary amounts of whitespace)
grep 16/Sep/2012:17 access_log | cut -d' ' -f12- | sort | uniq -c | sort -n | tail -40
(Clarification: I've never found a good way. I've used #twalberg's for-loop when necessary, but prefer using cut if possible.)
$ echo somefields:; cat somefields; echo from-to.awk:; \
cat from-to.awk; echo; awk -f from-to.awk somefields
somefields:
a b c d e f g h i j k l m n o p q r s t u v w x y z
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
from-to.awk:
{ for (i=12; i<=NF; i++) { printf "%s ", $i }; print "" }
l m n o p q r s t u v w x y z
12 13 14 15 16 17 18 19 20 21
from man awk:
NF The number of fields in the current input record.
So you basically loop through fields (separated by spaces) from 12 to the last one.
Why not put
#!/bin/bash
awk -v pat="$1" '$0 ~ pat { for (i=12; i<=NF; i++) printf("%s ", $i); printf "\n" }' log | sort | uniq -c | sort -n | tail -40
in a script file? The pattern is passed in with -v and matched with ~ because the date argument contains / characters, which would break a /$1/ pattern.
Then you can call it like
myMonitor.sh 16/Sep/2012:17
I don't have a way to test this right now; apologies for any formatting/syntax errors.
Hopefully you get the idea.
IHTH
awk '/16\/Sep\/2012:17/{for(i=1;i<12;i++){$i=""}print}' access_log | sort | uniq -c | sort -n | tail -40
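An alternative sketch without an explicit loop, assuming single-space-separated fields (typical for an access_log) and an awk that supports regex interval counts like {11}: delete the first 11 fields in one sub(), which also preserves the original spacing inside the user agent string.
grep '16/Sep/2012:17' access_log | awk '{sub(/^([^ ]+ ){11}/, "")} 1' | sort | uniq -c | sort -n | tail -40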