How to extract the part of a line between a start substring and an end substring using sed or awk

I have a multiline file with text having no spaces.
Thereisacat;whichisverycute.Thereisadog;whichisverycute.
Thereisacat;whichisverycute.Thereisadog;whichisverycute.
I want to extract the string between cat and the first occurrence of cute (not the second), so the output should be:
;whichisvery
;whichisvery
I am close, but with the command below I end up getting the string from cat to the last cute:
sed -e 's/.*cat\(.*\)cute.*/\1/'
I am getting
;whichisverycute.Thereisadog;whichisvery
;whichisverycute.Thereisadog;whichisvery
How can I get the text from cat to the first occurrence of cute not last?

Given the input you posted all you need is:
$ awk -F'cat|cute' '{print $2}' file
;whichisvery
;whichisvery

In sed:
sed 's/cute.*//;s/.*cat//' Input_file
In awk:
awk '{sub(/cute.*/,"");sub(/^.*cat/,"");print}' Input_file
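The original sed attempt fails because .* is greedy, so \(.*\)cute runs to the last cute on the line. As a rough sketch of an alternative that sidesteps regex greediness altogether, the helper below (between_first is a hypothetical name, not from the answers above) uses awk's index() and substr() to treat both markers as plain strings. It assumes the start marker appears before the end marker you care about.

```shell
# Hypothetical helper: for each input line, print the text between the
# first occurrence of the start marker and the first end marker after it.
between_first() {
  awk -v startstr="$1" -v endstr="$2" '{
    s = index($0, startstr)                  # position of the first start marker
    if (s == 0) next                         # no start marker: skip the line
    rest = substr($0, s + length(startstr))  # everything after the start marker
    e = index(rest, endstr)                  # first end marker in the remainder
    if (e == 0) next                         # no end marker: skip the line
    print substr(rest, 1, e - 1)
  }'
}

printf '%s\n' 'Thereisacat;whichisverycute.Thereisadog;whichisverycute.' |
  between_first cat cute
```

For the sample line this prints ;whichisvery. Lines that lack either marker are skipped rather than printed unchanged, which is a design choice of this sketch.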

Related

Special character replacement using sed or awk

I have a string in a dat file
Times-New $\Gamma$
and I want to replace it using sed or awk (I would prefer to use awk) utility with
/Symbol G
I could replace only one $ sign but failed for the remaining text.
So what I want is
sed 's/Times-New $\Gamma$/Symbol G/g' case.dat
Could you please help me?

Could you please try the following, written and tested with the shown samples. We need to escape \ and $ so that awk treats them as literal characters in the regex.
awk '{gsub(/Times-New \$\\Gamma\$/,"/Symbol G")} 1' Input_file
Or, as per the OP's comment, try this (thanks to anubhava for adding this solution):
awk '{gsub(/Times-New[ \t]+(G|\$\\Gamma\$)/, "/Symbol G")} 1' Input_file
With GNU sed try (| is used as the delimiter so that the / in the replacement needs no escaping):
sed -E 's|Times-New \$\\Gamma\$|/Symbol G|g' Input_file
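A small sketch to check the escaping on the sample string, assuming the target text is /Symbol G as stated in the question. The variable name line is only for illustration. The single quotes in the shell keep $ and \ from being interpreted before awk or sed sees them.

```shell
# Sample string from the question; single quotes keep $ and \ literal in the shell.
line='Times-New $\Gamma$'

# In the awk regex, \$ is a literal dollar sign and \\ is a literal backslash.
printf '%s\n' "$line" | awk '{gsub(/Times-New \$\\Gamma\$/, "/Symbol G")} 1'

# Same escapes in sed, with | as the delimiter so the / in the
# replacement does not need escaping.
printf '%s\n' "$line" | sed 's|Times-New \$\\Gamma\$|/Symbol G|g'
```

Both commands print /Symbol G for this input.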

awk sed grep to extract pattern with special characters

I am trying to understand the switches and arguments in awk and sed.
For instance, to get the value next to nonce in this line from the file response.xml:
WWW-Authenticate: ServiceAuth realm="WinREST", nonce="1828HvF7EfPnRtzSs/h10Q=="
I use, at the suggestion of another member:
nonce=$(sed -nE 's/.*nonce="([^"]+)"/\1/p' response.xml)
To get the number next to the word idOperation in the line below, I was trying:
idOper=$(sed -nE 's/.*idOperation="([^"]+)"/\1/p' zarph_response.xml)
The line to extract the number from:
{"reqStatus":{"code":0,"message":"Success","success":true},"idOperation":"185-16-6"}
How do I get the 185-16-6?
And what if the data to extract is not in double quotes,
like the 1 next to operStatus?
{"reqStatus":{"code":0,"message":"Success","success":true},"operStatus":1,"amountReceived":0,"amountDismissed":0,"currency":"EUR"}

The following awk may help:
awk -F"\"" '/idOperation/{print $(NF-1)}' Input_file
Solution 2: the following sed may also help:
sed '/idOperation/s/\(.*:\)"\([^"]*\)\(.*\)/\2/' Input_file
EDIT: In case you want the digit after the string operStatus, the following may help. Here 12 is the length of operStatus": so the substr call skips the key and keeps only the value.
awk 'match($0,/operStatus[^,]*/){print substr($0,RSTART+12,RLENGTH-12)}' Input_file

Using grep with Perl-style regexes (-P, available in GNU grep):
grep -oP "nonce=\"\K(.*)?(?=\")" Input_file
grep -oP "idOperation\":\"\K(.*)?(?=\")" Input_file
If the input is JSON you can use jq (-r prints the raw string without the surrounding quotes):
jq -r .idOperation Input_file
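The sed attempt in the question finds nothing because the data has idOperation":" (a quote, a colon, a quote) where the pattern expects idOperation=". As a minimal sketch on the two sample lines from the question, the commands below adjust the pattern for the quoted value and capture a run of digits for the unquoted one. The variable names quoted and bare are only for illustration, and the second pattern assumes the unquoted value is a non-negative integer.

```shell
# Sample lines copied from the question.
quoted='{"reqStatus":{"code":0,"message":"Success","success":true},"idOperation":"185-16-6"}'
bare='{"reqStatus":{"code":0,"message":"Success","success":true},"operStatus":1,"amountReceived":0,"amountDismissed":0,"currency":"EUR"}'

# Quoted value: key and value are each wrapped in double quotes with a
# colon in between, so the pattern has to spell out ":" as well.
printf '%s\n' "$quoted" | sed -nE 's/.*"idOperation":"([^"]+)".*/\1/p'

# Unquoted numeric value: capture the digits that follow the colon.
printf '%s\n' "$bare" | sed -nE 's/.*"operStatus":([0-9]+).*/\1/p'
```

The first command prints 185-16-6 and the second prints 1. For anything beyond flat one-line JSON like this, a real JSON parser such as jq is the safer choice.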

Grep part of string after symbol and shuffle columns

I would like to take the number after the - sign and put it as column 2 in my matrix. I know how to grep the string but not how to print it after the text string.
in:
1-967764 GGCTGGTCCGATGGTAGTGGGTTATCAGAACT
3-425354 GCATTGGTGGTTCAGTGGTAGAATTCTCGCC
4-376323 GGCTGGTCCGATGGTAGTGGGTTATCAGAAC
5-221398 GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT
6-180339 TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT
out:
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT 967764
GCATTGGTGGTTCAGTGGTAGAATTCTCGCC 425354
GGCTGGTCCGATGGTAGTGGGTTATCAGAAC 376323
GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT 221398
TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT 180339

awk -F'[[:space:]-]+' '{print $3,$2}' file

Seems like a simple substitution should do the job:
sed -E 's/[0-9]+-([0-9]+)[[:space:]]*(.*)/\2 \1/' file
Capture the parts you're interested in and use them in the replacement.
Alternatively, using awk:
awk 'sub(/^[0-9]+-/, "") { print $2, $1 }' file
This removes the leading digits and - from the start of the line. When that succeeds, sub returns 1 (true), so the action runs, printing the second field followed by the first.
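To illustrate how the sub() condition behaves, here is a small sketch wrapped in a function (swap_columns is a hypothetical name). The first input line is a shortened version of the question's data; the other two are made up to show a multi-digit prefix and a line with no prefix at all.

```shell
# Hypothetical wrapper around the awk one-liner above.
swap_columns() {
  # sub() returns the number of substitutions made (0 or 1), so the
  # action only runs on lines that start with "<digits>-".
  awk 'sub(/^[0-9]+-/, "") { print $2, $1 }'
}

printf '%s\n' '1-967764 GGCTGGTCC' '12-425354 GCATTGGTG' 'noprefix GGAAGAGCA' |
  swap_columns
```

This prints GGCTGGTCC 967764 and GCATTGGTG 425354. The third line is dropped because sub() finds nothing to replace there.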

Using regex ( +|-) as field separator:
$ awk -F"( +|-)" '{print $3,$2}' file
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT 967764
GCATTGGTGGTTCAGTGGTAGAATTCTCGCC 425354
GGCTGGTCCGATGGTAGTGGGTTATCAGAAC 376323
GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT 221398
TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT 180339

Here is another awk:
$ awk 'split($1,a,"-") {print $2,a[2]}' file

awk '{sub(/^[0-9]+-/,"");print $2,$1}' file
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT 967764
GCATTGGTGGTTCAGTGGTAGAATTCTCGCC 425354
GGCTGGTCCGATGGTAGTGGGTTATCAGAAC 376323
GGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCT 221398
TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCT 180339

AWK get specific pattern

I have lines like this:
Volume.Free_IBM_LUN59_28D: 2072083693568
I would like to get only IBM_LUN59_28D from this line using awk.
Thanks

You can use sub to do substitutions on each input line, as per the following transcript:
pax> echo 'Volume.Free_IBM_LUN59_28D: 2072083693568' | awk '
...> {
...> sub (".*Free_", "");
...> sub (":.*", "");
...> print
...> }'
IBM_LUN59_28D
That command crosses multiple lines for readability but, if you're operating on a file and not too concerned about readability, you can just use the compressed version:
awk '{sub(".*Free_","");sub(":.*","");print}' inputFile
If you're amenable to non-awk solutions, you could also use sed:
sed -e 's/.*Free_//' -e 's/:.*//' inputFile
Note that both those solutions rely on your (somewhat sparse) test data. If your definition of "like" includes preceding textual segments other than Free_ or subsequent characters other than :, some more work may be needed.
For example, if you wanted the string between the first _ and the first :, you could use:
awk '{sub("[^_]*_","");sub(":.*","");print}'

With sed:
sed 's/[^_]*_\(.*\):.*/\1/'
Search for a sequence of non-_ characters followed by _ (this matches Volume.Free_), then another sequence of characters (this matches IBM_LUN59_28D, which we group for later use), followed by : and any sequence of characters. Substitute the whole match with the saved group (\1). That's it.
Sample:
$ echo "Volume.Free_IBM_LUN59_28D: 2072083693568" | sed 's/[^_]*_\(.*\):.*/\1/'
IBM_LUN59_28D
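One caveat worth checking: \(.*\) is greedy, so if a line contains more than one colon the group runs up to the last one. The sketch below uses a made-up input with a second colon (the variable name line and the trailing ": extra" are assumptions for illustration) and compares the greedy group with a bounded [^:]* group.

```shell
# Hypothetical input with a second colon later in the line.
line='Volume.Free_IBM_LUN59_28D: 2072083693568: extra'

# Greedy group: \(.*\) extends to the LAST colon on the line.
printf '%s\n' "$line" | sed 's/[^_]*_\(.*\):.*/\1/'

# Bounded group: \([^:]*\) stops at the FIRST colon.
printf '%s\n' "$line" | sed 's/[^_]*_\([^:]*\):.*/\1/'
```

The first command prints IBM_LUN59_28D: 2072083693568 and the second prints IBM_LUN59_28D. On the single-colon sample from the question both give the same result.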

Here is one awk:
awk -F"Free_" 'NF>1{split($2,a,":");print a[1]}'
Example:
echo "Volume.Free_IBM_LUN59_28D: 2072083693568" | awk -F"Free_" 'NF>1{split($2,a,":");print a[1]}'
IBM_LUN59_28D
It splits the line on Free_.
If the line then has more than one field (NF>1):
split the second field on : and print the first part, a[1].

With awk:
echo "$val" | awk -F: '{print $1}' | awk -F. '{print $2}' | awk '{print substr($0,6)}'
where the given string is in $val.
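The three-stage pipeline can probably be collapsed into a single awk call. A sketch, assuming the same layout as the sample (one . before Free_ and a : after the name): split on either . or : and drop the five characters of Free_ from the second field.

```shell
# Sample string from the question.
val='Volume.Free_IBM_LUN59_28D: 2072083693568'

# -F'[.:]' splits on "." or ":", so $2 is "Free_IBM_LUN59_28D";
# substr($2, 6) skips the 5-character "Free_" prefix.
printf '%s\n' "$val" | awk -F'[.:]' '{ print substr($2, 6) }'
```

This prints IBM_LUN59_28D. Like the pipeline, it depends on the prefix being exactly five characters long.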

awk to transpose lines of a text file

A .csv file that has lines like this:
20111205 010016287,1.236220,1.236440
It needs to read like this:
20111205 01:00:16.287,1.236220,1.236440
How do I do this in awk? Experimenting, I got this far. I need to do it in two passes I think. One sub to read the date&time field, and the next to change it.
awk -F, '{print;x=$1;sub(/.*=/,"",$1);}' data.csv

Use this awk command:
echo "20111205 010016287,1.236220,1.236440" | \
awk -F'[ ,]' '{printf "%s %s:%s:%s.%s,%s,%s\n", \
$1,substr($2,1,2),substr($2,3,2),substr($2,5,2),substr($2,7,3),$3,$4}'
Explanation:
-F'[ ,]': sets the field delimiter to space or ,
printf "%s %s:%s:%s.%s,%s,%s\n": formats the output
substr($2,1,2) and the following substr calls: cut the second field ($2) into the desired pieces
Or use this sed command:
echo "20111205 010016287,1.236220,1.236440" | \
sed 's/\([0-9]\{8\}\) \([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{3\}\)/\1 \2:\3:\4.\5/g'
Explanation:
[0-9]\{8\}: first match an 8-digit pattern and save it as \1
[0-9]\{2\}...: after a space, match a 2-digit pattern three times and save the matches as \2, \3 and \4
[0-9]\{3\}: finally match a 3-digit pattern and save it as \5
\1 \2:\3:\4.\5: format the output

sed is better suited to this job since it's a simple substitution on single lines:
$ sed -r 's/( ..)(..)(..)/\1:\2:\3./' file
20111205 01:00:16.287,1.236220,1.236440
but if you prefer, here's GNU awk with gensub():
$ awk '{print gensub(/( ..)(..)(..)/,"\\1:\\2:\\3.",1)}' file
20111205 01:00:16.287,1.236220,1.236440
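For awk implementations without gensub(), one possible sketch rewrites only the first comma-separated field and leaves the remaining columns alone, however many there are. The function name fix_time is hypothetical, and the code assumes the first field always has the form "YYYYMMDD HHMMSSmmm".

```shell
# Hypothetical helper: reformat the time part of the first CSV field.
fix_time() {
  awk -F, 'BEGIN { OFS = "," } {
    split($1, dt, " ")   # dt[1] = date, dt[2] = time as HHMMSSmmm
    t = dt[2]
    # Rebuild field 1 as "date HH:MM:SS.mmm"; assigning to $1 makes awk
    # rejoin the record with OFS, so the other columns are kept as they are.
    $1 = dt[1] " " substr(t, 1, 2) ":" substr(t, 3, 2) ":" substr(t, 5, 2) "." substr(t, 7)
    print
  }'
}

printf '%s\n' '20111205 010016287,1.236220,1.236440' | fix_time
```

For the sample line this prints 20111205 01:00:16.287,1.236220,1.236440.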

We Keep Coding
