How to extract a line portion on the basis of start substring and end substring using sed or awk

I have a multiline file with text having no spaces.
Thereisacat;whichisverycute.Thereisadog;whichisverycute.
Thereisacat;whichisverycute.Thereisadog;whichisverycute.
I want to extract the string between cat and the first occurrence of cute (not the second), so the output is
;whichisvery
;whichisvery
I am close, but with the command below I end up with the string from cat to the last cute.
sed -e 's/.*cat\(.*\)cute.*/\1/'
I am getting
;whichisverycute.Thereisadog;whichisvery
;whichisverycute.Thereisadog;whichisvery
How can I get the text from cat to the first occurrence of cute not last?

Given the input you posted all you need is:
$ awk -F'cat|cute' '{print $2}' file
;whichisvery
;whichisvery

In sed:
sed 's/cute.*//;s/.*cat//' Input_file
In awk:
awk '{sub(/cute.*/,"");sub(/^.*cat/,"");print}' Input_file
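The sed attempt in the question over-matches because .* is greedy, so it runs to the last cute. As a minimal sketch that avoids regex greediness altogether, the helper below (between_cat_and_cute is a made-up name, not from the answers above) uses plain string search with index() and substr(), and skips lines that lack either marker:

```shell
# Hypothetical helper: for each input line, print the text between the
# first "cat" and the first "cute" that follows it.
between_cat_and_cute() {
  awk '{
    s = index($0, "cat")          # position of the first "cat" (0 if absent)
    if (s == 0) next
    rest = substr($0, s + 3)      # everything after that "cat"
    e = index(rest, "cute")       # first "cute" after it (0 if absent)
    if (e == 0) next
    print substr(rest, 1, e - 1)
  }' "$@"
}

printf '%s\n' 'Thereisacat;whichisverycute.Thereisadog;whichisverycute.' | between_cat_and_cute
# prints ;whichisvery
```

Because index() does literal matching, this also works unchanged if the markers contain regex metacharacters.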

Related

Special character replacement using sed or awk

I have a string in a dat file
Times-New $\Gamma$
and I want to replace it using sed or awk (I would prefer to use awk) utility with
/Symbol G
I could replace only one $ sign but failed for the remaining text.
So what I want is
sed 's/Times-New $\Gamma$/Symbol G/g' case.dat
Could you please help me?
Could you please try the following, written and tested with the shown samples. We need to escape \ and $ so that awk treats them as literal characters in the regex.
awk '{gsub(/Times-New \$\\Gamma\$/,"/Symbol G")} 1' Input_file
OR, as per OP's comment, try (thanks to anubhava sir for adding this solution):
awk '{gsub(/Times-New[ \t]+(G|\$\\Gamma\$)/, "/Symbol G")} 1' Input_file
With GNU sed try:
sed -E 's/Times-New \$\\Gamma\$/Symbol G/g' Input_file
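If escaping every $ and \ gets tedious, one alternative is to skip regexes and replace the text literally. The sketch below is only an illustration: replace_literal is a hypothetical helper, and it passes the two strings through the environment (FROM and TO are names chosen here) because awk -v would interpret backslash escapes in them.

```shell
# Hypothetical helper: replace every literal occurrence of $1 with $2 on stdin.
replace_literal() {
  FROM="$1" TO="$2" awk '
    BEGIN { from = ENVIRON["FROM"]; to = ENVIRON["TO"]; n = length(from) }
    {
      out = ""
      # index() does plain string search, so $ and \ need no escaping
      while (n > 0 && (i = index($0, from)) > 0) {
        out = out substr($0, 1, i - 1) to
        $0 = substr($0, i + n)
      }
      print out $0
    }'
}

printf '%s\n' 'Times-New $\Gamma$' | replace_literal 'Times-New $\Gamma$' '/Symbol G'
# prints /Symbol G
```

For a file, redirect it in: replace_literal 'Times-New $\Gamma$' '/Symbol G' < case.dat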

awk sed grep to extract pattern with special characters

I am trying to understand the switches and args in awk and sed.
For instance, to get the value next to nonce in this line from the file response.xml:
WWW-Authenticate: ServiceAuth realm="WinREST", nonce="1828HvF7EfPnRtzSs/h10Q=="
I use, at the suggestion of another member:
nonce=$(sed -nE 's/.*nonce="([^"]+)"/\1/p' response.xml)
To get the numbers next to the word idOperation in the line below, I was trying:
idOper=$(sed -nE 's/.*idOperation="([^"]+)"/\1/p' zarph_response.xml)
line to extract the number:
{"reqStatus":{"code":0,"message":"Success","success":true},"idOperation":"185-16-6"}
How do I get the 185-16-6?
And what if the data to extract has no quotes around it, like the 1 next to operStatus?
{"reqStatus":{"code":0,"message":"Success","success":true},"operStatus":1,"amountReceived":0,"amountDismissed":0,"currency":"EUR"}
The following awk may help:
awk -F"\"" '/idOperation/{print $(NF-1)}' Input_file
Solution 2: the following sed may help as well:
sed '/idOperation/s/\(.*:\)"\([^"]*\)\(.*\)/\2/' Input_file
EDIT: To get the digit after the string operStatus, the following may help; the offset 12 skips the 10 characters of operStatus plus the following ": pair.
awk 'match($0,/operStatus[^,]*/){print substr($0,RSTART+12,RLENGTH-12)}' Input_file
Using grep perl-style regexes
grep -oP "nonce=\"\K(.*)?(?=\")" Input_file
grep -oP "idOperation\":\"\K(.*)?(?=\")" Input_file
If the input is JSON you can use jq (-r prints the raw string without the surrounding quotes):
jq -r .idOperation Input_file
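To cover both the quoted case (idOperation) and the unquoted case (operStatus) with one sed expression, something along these lines might work. It is a rough sketch, not a JSON parser: json_scalar is a hypothetical name, and it assumes the object sits on one line, the key contains no regex metacharacters, and the value contains no quotes, commas or closing braces.

```shell
# Hypothetical helper: print the value of "key":value or "key":"value".
# Usage: json_scalar KEY [FILE...]
json_scalar() {
  key=$1; shift
  # "? makes the opening quote optional; the group stops at " , or }
  sed -nE 's/.*"'"$key"'":"?([^",}]*).*/\1/p' "$@"
}

printf '%s\n' '{"reqStatus":{"code":0,"message":"Success","success":true},"idOperation":"185-16-6"}' | json_scalar idOperation
# prints 185-16-6
printf '%s\n' '{"reqStatus":{"code":0,"message":"Success","success":true},"operStatus":1,"amountReceived":0,"amountDismissed":0,"currency":"EUR"}' | json_scalar operStatus
# prints 1
```

For anything beyond simple flat values like these, jq remains the safer choice.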

Grep part of string after symbol and shuffle columns

I would like to take the number after the - sign and put it as column 2 in my matrix. I know how to grep the string but not how to print it after the text string.
in:
1-967764 GGCTGGTCCGATGGTAGTGGGTTATCAGAACT
3-425354 GCATTGGTGGTTCAGTGGTAGAATTCTCGCC
4-376323 GGCTGGTCCGATGGTAGTGGGTTATCAGAAC
5-221398 GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT
6-180339 TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT
out:
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT 967764
GCATTGGTGGTTCAGTGGTAGAATTCTCGCC 425354
GGCTGGTCCGATGGTAGTGGGTTATCAGAAC 376323
GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT 221398
TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT 180339
awk -F'[[:space:]-]+' '{print $3,$2}' file
Seems like a simple substitution should do the job:
sed -E 's/[0-9]+-([0-9]+)[[:space:]]*(.*)/\2 \1/' file
Capture the parts you're interested in and use them in the replacement.
Alternatively, using awk:
awk 'sub(/^[0-9]+-/, "") { print $2, $1 }' file
Remove the leading digits and - from the start of the line. When this is successful, sub returns true, so the action is performed, printing the second field, followed by the first.
Using regex ( +|-) as field separator:
$ awk -F"( +|-)" '{print $3,$2}' file
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT 967764
GCATTGGTGGTTCAGTGGTAGAATTCTCGCC 425354
GGCTGGTCCGATGGTAGTGGGTTATCAGAAC 376323
GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT 221398
TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT 180339
here is another awk
$ awk 'split($1,a,"-") {print $2,a[2]}' file
awk '{sub(/^[^-]*-/,"");print $2,$1}' file
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT 967764
GCATTGGTGGTTCAGTGGTAGAATTCTCGCC 425354
GGCTGGTCCGATGGTAGTGGGTTATCAGAAC 376323
GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT 221398
TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT 180339
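The sample only shows single-digit prefixes before the -, so here is a small sketch that makes no assumption about the prefix length. swap_id_and_seq is a made-up name; it cuts the first field at its first hyphen with index() and silently skips lines whose first field has no hyphen.

```shell
# Hypothetical helper: print "sequence number" for lines of the form
# "prefix-number sequence".
swap_id_and_seq() {
  awk '{
    n = index($1, "-")                      # position of the first "-" in field 1
    if (n) print $2, substr($1, n + 1)      # sequence, then the part after "-"
  }' "$@"
}

printf '%s\n' '1-967764 GGCTGGTCCGATGGTAGTGGGTTATCAGAACT' | swap_id_and_seq
# prints GGCTGGTCCGATGGTAGTGGGTTATCAGAACT 967764
```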

AWK get specific pattern

I have lines like this:
Volume.Free_IBM_LUN59_28D: 2072083693568
I would like to get only IBM_LUN59_28D from this line using awk.
Thanks
You can use sub to do substitutions on each input line, as per the following transcript:
pax> echo 'Volume.Free_IBM_LUN59_28D: 2072083693568' | awk '
...> {
...> sub (".*Free_", "");
...> sub (":.*", "");
...> print
...> }'
IBM_LUN59_28D
That command crosses multiple lines for readability but, if you're operating on a file and not too concerned about readability, you can just use the compressed version:
awk '{sub(".*Free_","");sub(":.*","");print}' inputFile
If you're amenable to non-awk solutions, you could also use sed:
sed -e 's/.*Free_//' -e 's/:.*//' inputFile
Note that both those solutions rely on your (somewhat sparse) test data. If your definition of "like" includes preceding textual segments other than Free_ or subsequent characters other than :, some more work may be needed.
For example, if you wanted the string between the first _ and the first :, you could use:
awk '{sub("[^_]*_","");sub(":.*","");print}'
With sed:
sed 's/[^_]*_\(.*\):.*/\1/'
Search for sequence of non _ characters followed by _ (this will match Volume.Free_), then another sequence of characters (this will match IBM_LUN59_28D, we group this for future use), followed by : and any char sequence. Substitute with the saved pattern (\1). That's it.
Sample:
$ echo "Volume.Free_IBM_LUN59_28D: 2072083693568" | sed 's/[^_]*_\(.*\):.*/\1/'
IBM_LUN59_28D
Here is one awk
awk -F"Free_" 'NF>1{split($2,a,":");print a[1]}'
Example:
echo "Volume.Free_IBM_LUN59_28D: 2072083693568" | awk -F"Free_" 'NF>1{split($2,a,":");print a[1]}'
IBM_LUN59_28D
It divides the line at Free_.
If the line then has more than one field (NF>1):
split the second field by : and print the first part, a[1].
With awk:
echo "$val" | awk -F: '{print $1}' | awk -F. '{print $2}' | awk '{print substr($0,6)}'
where the given string is in $val.
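Another way to express "the text between the first _ and the first :" is a single match() call, which sets RSTART and RLENGTH for substr(). This is just a sketch under that reading of the requirement; between_underscore_and_colon is a hypothetical name.

```shell
# Hypothetical helper: print the text between the first "_" and the first ":".
between_underscore_and_colon() {
  # The match covers "_...:" so drop one character from each end.
  awk 'match($0, /_[^:]*:/) { print substr($0, RSTART + 1, RLENGTH - 2) }' "$@"
}

echo 'Volume.Free_IBM_LUN59_28D: 2072083693568' | between_underscore_and_colon
# prints IBM_LUN59_28D
```

Lines without a _ followed later by a : produce no output, which may or may not be what you want.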

awk to transpose lines of a text file

A .csv file that has lines like this:
20111205 010016287,1.236220,1.236440
It needs to read like this:
20111205 01:00:16.287,1.236220,1.236440
How do I do this in awk? Experimenting, I got this far. I need to do it in two passes I think. One sub to read the date&time field, and the next to change it.
awk -F, '{print;x=$1;sub(/.*=/,"",$1);}' data.csv
Use this awk command:
echo "20111205 010016287,1.236220,1.236440" | \
awk -F'[ ,]' '{printf "%s %s:%s:%s.%s,%s,%s\n", \
$1,substr($2,1,2),substr($2,3,2),substr($2,5,2),substr($2,7,3),$3,$4}'
Explanation:
-F'[ ,]': sets the field delimiter to either a space or a comma
printf "%s %s:%s:%s.%s,%s,%s\n": formats the output
substr($2,1,2), substr($2,3,2), ...: cut the second field ($2) into the desired pieces
Or use this sed command:
echo "20111205 010016287,1.236220,1.236440" | \
sed 's/\([0-9]\{8\}\) \([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{3\}\)/\1 \2:\3:\4.\5/g'
Explanation:
[0-9]\{8\}: first match an 8-digit pattern and save it as \1
[0-9]\{2\}...: after a space, match a 2-digit pattern three times and save the matches as \2, \3 and \4
[0-9]\{3\}: finally, match a 3-digit pattern and save it as \5
\1 \2:\3:\4.\5: format the output
sed is better suited to this job since it's a simple substitution on single lines:
$ sed -r 's/( ..)(..)(..)/\1:\2:\3./' file
20111205 01:00:16.287,1.236220,1.236440
but if you prefer here's GNU awk with gensub():
$ awk '{print gensub(/( ..)(..)(..)/,"\\1:\\2:\\3.",1)}' file
20111205 01:00:16.287,1.236220,1.236440
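gensub() is a GNU awk extension, so for other awks a portable variant could look like the sketch below. It assumes the first comma-separated field is always "date HHMMSSmmm" as in the sample; format_timestamp is a hypothetical name.

```shell
# Hypothetical helper: rewrite "YYYYMMDD HHMMSSmmm,..." as "YYYYMMDD HH:MM:SS.mmm,...".
format_timestamp() {
  awk -F, -v OFS=, '{
    split($1, p, " ")       # p[1] = date, p[2] = HHMMSSmmm
    t = p[2]
    # Assigning to $1 rebuilds the record using OFS (a comma here)
    $1 = p[1] " " substr(t, 1, 2) ":" substr(t, 3, 2) ":" substr(t, 5, 2) "." substr(t, 7)
    print
  }' "$@"
}

echo '20111205 010016287,1.236220,1.236440' | format_timestamp
# prints 20111205 01:00:16.287,1.236220,1.236440
```

Usage on the file from the question would be: format_timestamp data.csv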