Remove all characters after a matched character - awk

I have a file with many lines
http://example.com/part-1 this number 1 one
http://example.com/part--2 this is number 21 two
http://example.com/part10 this is an number 12 ten
http://example.com/part-num-11 this is an axample number 212 eleven
How can I remove all characters after "number x", as well as everything between the first column and "number x"? I want my output to look like this:
http://example.com/part-1 1
http://example.com/part--2 21
http://example.com/part10 12
http://example.com/part-num-11 212
Another case :
Input:
http://server1.example.com/00/part-1 this number 1 one
http://server2.example.com/1a/part--2 this is section 21 two two
http://server3.example.com/2014/5/part10 this is an Part 12 ten ten ten
http://server5.example.com/2014/7/part-num-11 this is an PARt number 212 eleven
I want the same output. The number is always in the last numeric field.

Here is one way:
awk -F"number" '{split($1,a," ");split($2,b," ");print a[1],b[1]}' file
http://example.com/part-1 1
http://example.com/part--2 21
http://example.com/part10 12
http://example.com/part-num-11 212
If the number you want is always in the second-to-last field, this should work too:
awk '{print $1,$(NF-1)}' file
http://example.com/part-1 1
http://example.com/part--2 21
http://example.com/part10 12
http://example.com/part-num-11 212

sed -r 's/^([^0-9]*[0-9]+)[^0-9]*([0-9]+).*/\1 \2/' file
Output:
http://example.com/part-1 1
http://example.com/part--2 21
http://example.com/part10 12
http://example.com/part-num-11 212

Try this:
sed 's/ .*number \([0-9][0-9]*\).*/ \1/' myfile.txt

Thanks, everyone. Based on your comments, I came up with my own solution:
sed -re 's/([0-9]*[0-9]+)/#\1#/g' | sed -re 's/(^.*#).*/\1/g' | sed 's/#//g' | awk '{print $1" "$NF}'
My idea: wrap every group of digits in #...#, then keep everything from the start of the line up to the last "#" (the greedy match in sed selects the last #) and remove the remaining characters. awk then prints the first and last fields.
Thanks, everyone.
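Since the number is said to be always in the last numeric field, the same idea can probably be done in one awk call. This is only a sketch under that assumption: it walks the fields from right to left and takes the first one made up entirely of digits. The variable names input and result are just for illustration, and lines without any numeric field are silently skipped.

```shell
# Sample input taken from the second case in the question.
input='http://server1.example.com/00/part-1 this number 1 one
http://server2.example.com/1a/part--2 this is section 21 two two
http://server3.example.com/2014/5/part10 this is an Part 12 ten ten ten
http://server5.example.com/2014/7/part-num-11 this is an PARt number 212 eleven'

# Scan fields from the last one down to field 2 and stop at the first
# field that consists only of digits; print it next to the URL in $1.
result=$(printf '%s\n' "$input" | awk '{
    for (i = NF; i > 1; i--)
        if ($i ~ /^[0-9]+$/) { print $1, $i; break }
}')

printf '%s\n' "$result"
```

Because the loop stops at field 2, digits inside the URL itself (such as 2014 or part10) are never considered.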

Related

How to compare 2 files having multiple occurrences of a number and output the additional occurrence?

Currently I am using an awk script to compare two files containing numbers in non-sequential order.
It works perfectly, but there is one further condition I would like to fulfill.
Current awk script:
awk '
{
$0=$0+0
}
FNR==NR{
a[$0]
next
}
($0 in a){
b[$0]
next
}
{ print }
END{
for(j in a){
if(!(j in b)){ print j }
}
}
' compare1.txt compare2.txt
What does the script accomplish currently?
It outputs all numbers that are present in compare1.txt but not in compare2.txt, and vice versa.
If a number has leading zeros, they are ignored when comparing (the numeric values must differ to count as a mismatch). For example, 3 matches 003, 014 matches 14, 008 matches 8, and so on.
As required, it also treats a number as matched even if it is not on the same line in both files.
Required additional condition
In its current form, the script works in such a way that if one file has multiple occurrences of a number and the other file has even one occurrence of that number, it considers the number matched for all repetitions.
I need the awk script changed so that it also outputs every additional occurrence of a number.
cat compare1.txt
57
11
13
3
889
014
91
775
cat compare2.txt
003
889
13
14
57
12
90
775
775
Expected output
12
90
11
91
**775**
The number marked at the end is currently not shown in the output of my awk script (2 occurrences - 1 occurrence).
As mentioned at https://stackoverflow.com/a/62499047/1745001, this is the job that comm exists to do:
$ comm -3 <(awk '{print $0+0}' compare1.txt | sort) <(awk '{print $0+0}' compare2.txt | sort)
11
	12
	775
	90
91
and to get rid of the white space:
$ comm -3 <(awk '{print $0+0}' compare1.txt | sort) <(awk '{print $0+0}' compare2.txt | sort) |
awk '{print $1}'
11
12
775
90
91
You just need to count the occurrences and account for them when matching:
$ awk '{k=$0+0}
NR==FNR {a[k]++; next}
!(k in a && a[k]-->0);
END {for(k in a) while(a[k]-->0) print k}' file1 file2
12
90
775
11
91
Note that, as in your original script, there is no absolute-value comparison; you can add one easily by changing how k is computed in the first line.
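For readers who find the a[k]-->0 idiom hard to follow, here is a more spelled-out sketch of the same counting idea, run on the sample data from the question. The array name count, the temporary directory, and the final sort -n (added only to make the output order predictable, since for-in order in awk is unspecified) are my own choices, not part of the answer above.

```shell
# Sample data copied from the question, written to a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 57 11 13 3 889 014 91 775 > "$dir/compare1.txt"
printf '%s\n' 003 889 13 14 57 12 90 775 775 > "$dir/compare2.txt"

result=$(awk '
    { k = $0 + 0 }                      # normalize: 003 -> 3, 014 -> 14
    NR == FNR { count[k]++; next }      # first file: tally each number
    count[k] > 0 { count[k]--; next }   # second file: consume one pending match
    { print k }                         # nothing left to match: surplus in file 2
    END {                               # leftovers are surplus in file 1
        for (k in count)
            for (n = count[k]; n > 0; n--) print k
    }
' "$dir/compare1.txt" "$dir/compare2.txt" | sort -n)

rm -rf "$dir"
printf '%s\n' "$result"
```

On the sample files this should list 11, 12, 90, 91 and 775, one per line, with 775 appearing once for the extra occurrence in compare2.txt.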

AWK removes my spaces

I have data like this:
31 text text t text ?::"!!/
2 te text 32 +ěščřžý
43 te www ##
It is the output of uniq -c.
I need to get something like
text text t text ?::"!!/
te text 32 +ěščřžý
te www ##
I tried to use something like
a=$1;
$1=""
$0=substr($0, 2);
printf $0;
print "";
But it removes my spaces and I get something like
text text t text ?::"!!/
te text 32 +ěščřžý
te www ##
And I need to keep the number too.
Does anyone know how to do this?
I guess you want to remove the leading digits from each line; sed will be simpler for this task:
sed -E 's/^[0-9]+ //' file
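awk squeezes the white space because assigning to a field makes it rebuild $0 from the fields, joined by OFS (a single space by default). If you need further processing in awk, you can do the same substitution there with sub().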
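To make the difference concrete, here is a small sketch that keeps the count in a variable and strips it from $0 with sub(), so the spacing inside the text survives. The sample line (with deliberately doubled and tripled spaces) and the names count, kept and squashed are made up for illustration.

```shell
# Hypothetical line in the style of `uniq -c` output; the runs of spaces
# inside the text are what we want to preserve.
line='     31 text  text   t text'

# Editing $0 with sub() leaves the remaining spacing untouched.
kept=$(printf '%s\n' "$line" | awk '{
    count = $1                 # save the number before removing it
    sub(/^ *[0-9]+ /, "")      # strip leading blanks, the count, and one space
    print count "|" $0
}')

# Assigning to a field rebuilds $0 with single spaces between fields.
squashed=$(printf '%s\n' "$line" | awk '{ $1 = ""; print }')

printf '%s\n' "$kept" "$squashed"
```

The first command keeps the inner spacing and still has the number available in count; the second one shows the squeezing the question ran into.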
Try this one:
$ echo "31 text text t text" |awk '{gsub($1FS$2,$2);print}'
text text t text
You can also try
$ echo "31 text text t text" |awk '{gsub(/^[0-9]+/,"");print}'
 text text t text
But in this case you will have a leading space in front of each line.
$ seq 5 | uniq -c
1 1
1 2
1 3
1 4
1 5
$ seq 5 | uniq -c | awk '{sub(/^ *[^ ]+ +/,"")}1'
1
2
3
4
5
$ seq 5 | uniq -c | sed 's/^ *[^ ]* *//'
1
2
3
4
5

Compare first column of two alternate lines and, if they match, check which value is greater in column 2 and print the line with the max value using awk

Input:
anil 14
anil 25
umar 78
umar 13
umar 06
amritha 06
amritha 25
amritha 17
Output:
anil 25
umar 78
amritha 25
How can I get this output using a single awk command? Please help me with this.
If you want the largest value, try this awk:
awk '{a[$1]=$2>a[$1]?$2:a[$1]} END {for (i in a) print i,a[i]}' file
amritha 25
umar 78
anil 25
You could let sort do the sorting and then let awk do the picking:
sort -k 1,1 -k 2,2nr file | awk '{if($1!=prev){prev=$1;print $0}}'
So, that says: sort by name first, then by the second column numerically in reverse order, so that the biggest value for each name comes first. Then pass that to awk and print the line if the first column has changed relative to the previously seen one.
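Neither approach above keeps the names in their original order (for-in order in awk is unspecified, and sort reorders the names). If the order from the question's expected output matters, a possible sketch is to remember the order in which names are first seen. The array names best and order are my own; the comparison forces numeric context with + 0 so that values like 06 compare as numbers.

```shell
# Sample input copied from the question.
input='anil 14
anil 25
umar 78
umar 13
umar 06
amritha 06
amritha 25
amritha 17'

result=$(printf '%s\n' "$input" | awk '
    !($1 in best)             { order[++n] = $1; best[$1] = $2; next }  # first time this name is seen
    ($2 + 0) > (best[$1] + 0) { best[$1] = $2 }                         # a larger value for a known name
    END { for (i = 1; i <= n; i++) print order[i], best[order[i]] }
')

printf '%s\n' "$result"
```

This prints one line per name in first-seen order, and it does not rely on the names being grouped together in the input.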

Multiply every nth field...elegantly

I have a text file with a series of numbers:
1 2 4 2 2 6 3 4 7 4 4 8 2 4 6 5 5 8
I need to have every third field multiplied by 3, so output would be:
1 2 12 2 2 18 3 4 21 4 4 24 2 4 18 5 5 24
Now, I've hammered out a solution already, but I know there's a quicker, more elegant one out there. Here's what I've gotten to work:
xargs -n1 < input.txt | awk '{printf NR%3 ? "%d " : $0*3" ", $1}' > output.txt
I feel that there must be an awk one-liner that can do this?? How can I make awk look at each field (instead of each record), thus not needing the call to xargs to put every field on a different line? Or maybe sed can do it?
Try:
awk '{for (i=3;i<=NF;i+=3)$i*=3; print}' input.txt > output.txt
I have not tested this yet (posted on my iPod). The print command without parameters should print out the whole (partially modified) line. You might have to set OFS=" " in the BEGIN section to get the blank as the separator in the output.
This line would work too, with an awk that treats a multi-character RS as a regular expression (such as gawk):
awk -v RS="\\n| " -v ORS=" " '!(NR%3){$0*=3}7' file
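The field-loop idea generalizes to any step and multiplier by passing them in with -v. A minimal sketch, where the variable names n and factor are my own: the default OFS is already a single space, so no BEGIN block is needed for this input.

```shell
# Sample input copied from the question.
input='1 2 4 2 2 6 3 4 7 4 4 8 2 4 6 5 5 8'

# Multiply every n-th field by factor, then print the rebuilt record.
result=$(printf '%s\n' "$input" | awk -v n=3 -v factor=3 '{
    for (i = n; i <= NF; i += n) $i *= factor
    print
}')

printf '%s\n' "$result"
```

Keep in mind that modifying a field makes awk rebuild the line with OFS, so input separated by tabs or multiple spaces would come out single-spaced unless OFS is set accordingly.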

awk - join last column with next line first column

I want to join the last column of each line with the first column of the next line. For example:
cat FILE
12 15
22 25
32 35
42 45
to join like this:
15 22
25 32
35 42
15 (last column) joined with 22 (first column of next line).
My solution is: tr '\n' '#' < FILE | tr '\t' '\n' | grep '#' | grep -v '#$' | tr '#' '\t'
But there might be a simple awk command to do this.
awk '{
for (i = 2; i < NF; i += 2)
print $i, $(i + 1)
}' RS= OFS=\\t infile
With bash:
a=($(<infile));printf '%s\t%s\n' ${a[@]:1:${#a[@]}-2}
With zsh:
printf '%s\t%s\n' ${$(<infile):1:-1}
Got it!
$ awk 'BEGIN{OFS="\t"}{if (NR==1) {a=$2} else {print a,$1;a=$2}}' file
15 22
25 32
35 42
BEGIN{OFS="\t"} sets the output field separator to a tab.
{if (NR==1) {a=$2} for the first line, just stores the 2nd field.
else {print a,$1;a=$2}} in all other cases, prints the 2nd field of the previous row and the 1st field of the current one, then stores the current 2nd field. This way the 2nd field of the last record is never printed.
Dimitre Radoulov has the solution but if we're golfing:
awk '$1=$NF=X;1' RS= file|xargs -n2
15 22
25 32
35 42
awk 'NR!=1{print p,$1} {p=$2}' file
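The carry-over approach in the answers above assumes exactly two columns. If lines may have more columns, a possible variation is to remember $NF instead of $2. This is only a sketch on the question's sample data; the variable name prev is my own, and the tab separator follows the earlier answer.

```shell
# Sample input copied from the question.
input='12 15
22 25
32 35
42 45'

result=$(printf '%s\n' "$input" | awk -v OFS='\t' '
    NR > 1 { print prev, $1 }   # pair the saved last field with the current first field
    { prev = $NF }              # remember the last field for the next line
')

printf '%s\n' "$result"
```

The first field of the first line and the last field of the last line are never printed, which matches the expected output in the question.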