removing part of a value from variable using sed - awk

I have the following value in a variable:
manager&amp;org.apache.catalina.filters.CSRF_NONCE=314C54E5671790D592A37C2C4A6B9AAF
I need to modify the variable to remove the amp that follows the & in it, so that it looks like this:
manager&;org.apache.catalina.filters.CSRF_NONCE=314C54E5671790D592A37C2C4A6B9AAF
Please suggest how to do this.

I put your variable's value in a file and was able to do what you wanted with sed. The trick to avoid touching any other occurrence of a bare amp is to include the & in the pattern and put it back in the replacement, where it must be escaped as \& because an unescaped & stands for the whole match.
$ cat /tmp/file
manager&amp;org.apache.catalina.filters.CSRF_NONCE=314C54E5671790D592A37C2C4A6B9AAF
$ sed 's/&amp/\&/g' /tmp/file
manager&;org.apache.catalina.filters.CSRF_NONCE=314C54E5671790D592A37C2C4A6B9AAF

sed 's/&/\&;/g' <<<"yourString"
The above line should help.
example:
kent$ sed 's/&/\&;/g'<<< "foo&bar&blah"
foo&;bar&;blah

In bash, you can use Parameter Expansion - Pattern substitution to remove a substring:
VAR='manager&amp;org.apache.catalina.filters.CSRF_NONCE=314C54E5671790D592A37C2C4A6B9AAF'
echo "${VAR/&amp}"
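A minimal sketch of the sed substitution applied directly to a shell variable instead of a file. It assumes the variable holds the HTML-escaped form with a literal &amp; in it; the names value and cleaned are illustrative only.

```shell
# Assumed input: the value contains the literal text "&amp;".
value='manager&amp;org.apache.catalina.filters.CSRF_NONCE=314C54E5671790D592A37C2C4A6B9AAF'

# Replace each "&amp;" with "&;". In the sed replacement, \& is a literal
# ampersand; an unescaped & would re-insert the whole matched text.
cleaned=$(printf '%s\n' "$value" | sed 's/&amp;/\&;/g')

printf '%s\n' "$cleaned"
```

Going through sed sidesteps the question of how & is treated in the replacement part of a bash pattern substitution, which differs between bash versions.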

Related

replace strings between two patterns

I would like to replace (using sed/awk/tr) all the strings between CleanAgrobacterium and _gene by ZZZ in my file A.nwk:
(((CleanAgrobacterium_fabrum_str__C58_DE0068_Scaffold_Proteins_gene-FS783_RS12830:0,CleanAgrobacterium_fabrum_str__C58_DE0067_Scaffold_Proteins_gene-FS653_RS12825:0):0.056789,(CleanAgrobacterium_fabrum_GV2260_Complete_Genome_Proteins_gene-EML4058_RS17445:0,(CleanAgrobacterium_fabrum_1D1416_Chromosome_Proteins_gene-NQG32_RS17500:0,(CleanAgrobacterium_fabrum_PDC82_Contig_Proteins_gene-BLT49_RS14090:0,(CleanAgrobacterium_fabrum_N3394_Scaffold_Proteins_gene-G6L76_RS17395:0,(CleanAgrobacterium_fabrum_12D13_Complete_Genome_Proteins_gene-At12D13_RS18010:0,(CleanAgrobacterium_fabrum_Bi46_Contig_Proteins_gene-LQ162_RS02700:0,(CleanAgrobacterium_fabrum_ARqua1_Scaffold_Proteins_gene-HI842_RS18310:0,(CleanAgrobacterium_fabrum_N4094_Scaffold_Proteins_gene-G6L42_RS17400:0,(CleanAgrobacterium_fabrum_GV3101__pMP90_Complete_Genome_Proteins_gene-EML485_RS17435:0,(CleanAgrobacterium_fabrum_Kin001_Complete_Genome_Proteins_gene-FY134_RS17775:0,(CleanAgrobacterium_fabrum_LBA645_Complete_Genome_Proteins_gene-KXJ62_RS17445:0,(CleanAgrobacterium_fabrum_Di1525a_Scaffold_Proteins_gene-G6L89_RS17735:0,(CleanAgrobacterium_fabrum_NFIX02_Scaffold_Proteins_gene-BLR22_RS16795:0,(CleanAgrobacterium_fabrum_Arqua_Contig_Proteins_gene-EXN51_RS19140:0,(CleanAgrobacterium_fabrum_str__J-07_J-07_Scaffold_Proteins_gene-AGR8A_RS20015:0,CleanAgrobacterium_fabrum_1D132_Complete_Genome_Proteins_gene-At1D132_RS18580:0):0):0):0):0):0):0):0):0):0):0):0):0):0):0):0):0,(CleanAgrobacterium_fabrum_EHA105_Complete_Genome_Proteins_gene-EML540_RS17455:0,(CleanAgrobacterium_fabrum_RIT-As-3_Contig_Proteins_gene-ORG40_RS11815:0,(CleanAgrobacterium_fabrum_2788_Contig_Proteins_gene-G6L39_RS17590:0,(CleanAgrobacterium_fabrum_BG5_Complete_Genome_Proteins_gene-F3P66_RS17495:0,(CleanAgrobacterium_fabrum_Bi05_Contig_Proteins_gene-LQV40_RS07170:0,(CleanAgrobacterium_fabrum_str__C58_C58_Complete_Genome_Proteins_gene-ATU_RS17440:0,CleanAgrobacterium_fabrum_NFIX01_Scaffold_Proteins_gene-BMY00_RS16800:0):0):0):0):0):0):0);
sed "/CleanAgrobacterium/,/gene-/d" A.nwk
Instead of using a range, you could make the pattern more specific for the example data matching 1 or more alphanumeric chars or - or _ in between using [[:alnum:]_-]\+ and replace the match(es) with zzz
sed "s/CleanAgrobacterium[[:alnum:]_-]\+_gene/zzz/g" A.nwk
Output
(((zzz-FS783_RS12830:0,zzz-FS653_RS12825:0):0.056789,(zzz-EML4058_RS17445:0,(zzz-NQG32_RS17500:0,(zzz-BLT49_RS14090:0,(zzz-G6L76_RS17395:0,(zzz-At12D13_RS18010:0,(zzz-LQ162_RS02700:0,(zzz-HI842_RS18310:0,(zzz-G6L42_RS17400:0,(zzz-EML485_RS17435:0,(zzz-FY134_RS17775:0,(zzz-KXJ62_RS17445:0,(zzz-G6L89_RS17735:0,(zzz-BLR22_RS16795:0,(zzz-EXN51_RS19140:0,(zzz-AGR8A_RS20015:0,zzz-At1D132_RS18580:0):0):0):0):0):0):0):0):0):0):0):0):0):0):0):0):0,(zzz-EML540_RS17455:0,(zzz-ORG40_RS11815:0,(zzz-G6L39_RS17590:0,(zzz-F3P66_RS17495:0,(zzz-LQV40_RS07170:0,(zzz-ATU_RS17440:0,zzz-BMY00_RS16800:0):0):0):0):0):0):0);
This replaces all the text between CleanAgrobacterium and _gene by ZZZ:
sed -E 's/(CleanAgrobacterium).*(_gene)/\1ZZZ\2/g' A.nwk
But the result is probably not what you expect, because .* is greedy: it matches up to the last _gene on the line. I assume you want non-greedy matching of the text in between. For that, use perl with .*?:
perl -pe 's/(CleanAgrobacterium).*?(_gene)/${1}ZZZ$2/g' A.nwk
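To see the difference between greedy .* and Perl's non-greedy .*?, here is a small sketch on a shortened, made-up sample with two records on one line (the sample string and the variable names are assumptions for illustration).

```shell
# Two records on one line, shortened from the question's data (hypothetical sample).
sample='CleanAgrobacterium_a_gene-1,CleanAgrobacterium_b_gene-2'

# Greedy: .* runs to the LAST _gene, so both records collapse into one match.
greedy=$(printf '%s\n' "$sample" | sed -E 's/(CleanAgrobacterium).*(_gene)/\1ZZZ\2/g')
printf '%s\n' "$greedy"   # CleanAgrobacteriumZZZ_gene-2

# Non-greedy: .*? stops at the FIRST _gene after each CleanAgrobacterium.
lazy=$(printf '%s\n' "$sample" | perl -pe 's/(CleanAgrobacterium).*?(_gene)/${1}ZZZ$2/g')
printf '%s\n' "$lazy"     # CleanAgrobacteriumZZZ_gene-1,CleanAgrobacteriumZZZ_gene-2
```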
This might work for you (GNU sed):
sed -E 's/CleanAgrobacterium/&\n/g
s/gene-/\n&/g
s/(CleanAgrobacterium)\n[^\n]*\n(gene-)/\1ZZZ\2/g
s/\n//g' file
Append a newline to CleanAgrobacterium and prepend a newline to gene-.
Replace everything that is not a newline between the desired words.
Remove any introduced newlines.
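As a worked illustration of those three steps, here is a sketch on a shortened, made-up line (the sample string and variable names are assumptions). It needs GNU sed, since it relies on \n in the replacement and inside a bracket expression.

```shell
# Shortened, hypothetical sample with two records on one line.
sample='(CleanAgrobacterium_fabrum_X_gene-A1:0,CleanAgrobacterium_fabrum_Y_gene-B2:0);'

# 1. put a newline after each CleanAgrobacterium and before each gene-
# 2. replace the newline-free text between the two markers with ZZZ
# 3. delete any leftover newlines
result=$(printf '%s\n' "$sample" | sed -E '
  s/CleanAgrobacterium/&\n/g
  s/gene-/\n&/g
  s/(CleanAgrobacterium)\n[^\n]*\n(gene-)/\1ZZZ\2/g
  s/\n//g')

printf '%s\n' "$result"   # (CleanAgrobacteriumZZZgene-A1:0,CleanAgrobacteriumZZZgene-B2:0);
```

Because a line read by sed never contains a newline, the inserted newlines act as markers that cannot clash with the data, which is what makes [^\n]* behave like a non-greedy match here.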
N.B. This does not cater for matches on separate lines. In this case use something like:
sed -E 'H;1h;$!d;x
s/\n/###NEWLINE%%%/g
s/CleanAgrobacterium/&\n/g
s/gene-/\n&/g
s/(CleanAgrobacterium)\n[^\n]*\n(gene-)/\1ZZZ\2/g
s/\n//g
s/###NEWLINE%%%/\n/g' file
This slurps the whole file into memory, replaces all newlines by a unique string, then applies the first solution and tidies up afterwards.
Try this:
sed 's/gene-/gene-\n/g' < A.nwk | sed 's/CleanAgrobacterium.*gene-/CleanAgrobacteriumZZZgene-/g' | sed -n ':a;N;$!ba;s/\n//g;p' > output.txt
This works with GNU sed 4.9 on Linux.
Yet another sed solution. Described with placeholders for readability (the commands below use your actual strings): it replaces every THIS between START and END, so that "fooSTARTTHISENDfooSTARTTHISENDfoo"
becomes "fooSTARTTHATENDfooSTARTTHATENDfoo".
$ sed -E 's/(CleanAgrobacterium)([^_]|_(_|g(_|e(_|n_)))*([^_g]|g([^_e]|e([^_n]|n[^_e]))))*(_(_|g(_|e(_|n_)))*(g(e?|en))?)?_gene/\1ZZZ_gene/g' file
The solution is non-greedy. It relies on the capturing group (CleanAgrobacterium) with its backreference \1, the literal _gene, and what lies between them,
([^_]|_(_|g(_|e(_|n_)))*([^_g]|g([^_e]|e([^_n]|n[^_e]))))*(_(_|g(_|e(_|n_)))*(g(e?|en))?)?
(a pattern that cannot contain _gene), which is replaced by ZZZ. The closing _gene is written literally in the replacement because the nested parentheses in the middle take group numbers 2 to 14, so \2 would not refer to _gene. You could also use it in, for example, GNU awk's gensub(), which supports backreferences:
$ gawk '{print gensub(/(CleanAgrobacterium)([^_]|_(_|g(_|e(_|n_)))*([^_g]|g([^_e]|e([^_n]|n[^_e]))))*(_(_|g(_|e(_|n_)))*(g(e?|en))?)?_gene/,"\\1ZZZ_gene","g",$0)}' file

Git URL - Pull out substring via Shell (awk & sed)?

I have got the following URL:
https://xcg5847@git.rz.bankenit.de/scm/smat/sma-mes-test.git
I need to pull out sma-mes-test and smat:
git config --local remote.origin.url|sed -n 's#.*/\([^.]*\)\.git#\1#p'
sma-mes-test
This works. But I also need the project name, which is smat
I am not really familiar with complex regex and sed; I found the command above in another post here. Does anyone know how I can extract the smat value as well?
With your shown samples, please try the following awk code. Briefly: it sets the field separators to / and .git, then prints the 3rd-last and 2nd-last fields of each line (the last field is the empty string after the trailing .git).
your_git_command | awk -F'/|\\.git' '{print $(NF-2),$(NF-1)}'
Your sed is pretty close. You can just extend it to capture 2 values and print them:
git config --local remote.origin.url |
sed -E 's~.*/([^/]+)/([^.]+)\.git$~\1 \2~'
smat sma-mes-test
If you want to populate shell variable using these 2 values then use this read command in bash:
read v1 v2 < <(git config --local remote.origin.url |
sed -E 's~.*/([^/]+)/([^.]+)\.git$~\1 \2~')
# check variable values
declare -p v1 v2
declare -- v1="smat"
declare -- v2="sma-mes-test"
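If no external tool is wanted, the same two values can also be obtained with shell parameter expansion alone. This is only a sketch: the URL is hard-coded here in the usual user@host form (an assumption), and the names url, path, repo and project are illustrative.

```shell
# Hard-coded sample; in practice this would come from
#   git config --local remote.origin.url
url='https://xcg5847@git.rz.bankenit.de/scm/smat/sma-mes-test.git'

path=${url%.git}      # drop a trailing ".git"
repo=${path##*/}      # text after the last slash   -> sma-mes-test
path=${path%/*}       # drop "/<repo>"
project=${path##*/}   # text after the new last slash -> smat

printf '%s %s\n' "$project" "$repo"   # smat sma-mes-test
```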
Using sed
$ sed -E 's#.*/([^/]*)/#\1 #' input_file
smat sma-mes-test.git
I would use GNU AWK for this task in the following way. Let the content of file.txt be
https://xcg5847@git.rz.bankenit.de/scm/smat/sma-mes-test.git
then
awk 'BEGIN{FS="/"}{sub(/\.git$/,"",$NF);print $(NF-1),$NF}' file.txt
gives output
smat sma-mes-test
Explanation: I tell GNU AWK that the field separator is the slash character. Then I remove .git (note that . is escaped so it means a literal dot) at the end ($) of the last field ($NF), and print the second-to-last field ($(NF-1)) and the last field ($NF). These are separated by a space, which is the default output field separator; if you wish to use a different character, set OFS (output field separator) in BEGIN. If you want to know more about NF, read 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
(tested in gawk 4.2.1)
Why not sed 's!.*/\(.*/.*\)!\1!'?
string=$(git config --local remote.origin.url | tail -c 22)  # last 22 bytes: "smat/sma-mes-test.git" plus the trailing newline
var1=$(echo "${string}" | cut -d'/' -f1)
var2=$(echo "${string}" | cut -d'/' -f2 | sed 's#\.git$##')
If you have multiple URLs of varying lengths, this will not work, but if you only have the one, it will.
var1=smat
var2=sma-mes-test
If I did have something variable, personally I would replace all of the forward slashes with newlines, write the result to a file, and then extract the last and second-to-last lines with ed, which would give me the last two segments of the URL.
Regular expressions literally give me a migraine, but as long as I can get everything on its own line, I can quite easily bypass the need for them entirely.
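A rough sketch of that one-segment-per-line idea, using tr and tail in a pipeline instead of a temporary file and ed (a swapped-in technique, not the method described above). The URL is hard-coded in the usual user@host form and the variable names are illustrative assumptions.

```shell
url='https://xcg5847@git.rz.bankenit.de/scm/smat/sma-mes-test.git'

# Put every path segment on its own line and keep the last two.
last_two=$(printf '%s\n' "$url" | tr '/' '\n' | tail -n 2)

# First of the two lines is the project; second is the repository (minus ".git").
project=$(printf '%s\n' "$last_two" | sed -n '1p')
repo=$(printf '%s\n' "$last_two" | sed -n '2s/\.git$//p')

printf '%s %s\n' "$project" "$repo"   # smat sma-mes-test
```

Unlike the fixed byte count passed to tail -c, this does not depend on the length of the project or repository name.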

print dir path after matching its name with wildcards

I have been stuck on this little puzzle. Thank you in advance for helping.
I have a directory path and would like to print the part of the path that follows a match,
like
echo /Users/user/Documents/terraform-shared-infra/services/history_book_test | awk -F "terraform-|tfRepo-" '{print $(NF)}'
echo /Users/user/Documents/tfRepo-shared-infra/services/history_book_test | awk -F "terraform-|tfRepo-" '{print $(NF)}'
output:
shared-infra/services/history_book_test
shared-infra/services/history_book_test
When I try to add a wildcard, as in terraform-*, it doesn't work.
I would like to print the path after the directory that matches terraform-* or tfRepo*.
Like:
services/history_book_test
services/history_book_test/../.. so on.
with sed:
echo /Users/user/Documents/terraform-shared-infra/services/history_book_test | sed 's|.*terraform.\([^/]*\)/.*|\1|'
shared-infra
I have tried different ways with awk and grep, but no luck. Any leads or ideas that I could try, please?
Thank you.
You're confusing regular expressions with globbing patterns. Both have wildcards and look similar but have quite different meanings and uses. regexps are used by text processing tools like grep, sed, and awk to match text in input strings while globbing patterns are used by shells to match file/directory names. For example, foo* in a regexp means fo followed by zero or more additional os while foo* in a globbing pattern means foo followed by zero or more other characters (which in a regexp would be foo.*). So never just say "wildcard", say "regexp wildcard" or "globbing wildcard" for clarity.
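A small sketch of that difference, testing the same three made-up strings against foo* once as a regexp (via grep) and once as a globbing pattern (via the shell's case statement). The variable names are illustrative only.

```shell
# Regexp: foo* is "fo" followed by zero or more "o",
# so "fo" and "foo" match in full but "foobar" does not.
regex_matches=$(printf '%s\n' fo foo foobar | grep '^foo*$' | xargs)
printf 'regexp matches: %s\n' "$regex_matches"   # fo foo

# Globbing pattern: foo* is "foo" followed by any characters,
# so "foo" and "foobar" match but "fo" does not.
glob_matches=''
for s in fo foo foobar; do
  case $s in
    foo*) glob_matches="${glob_matches:+$glob_matches }$s" ;;
  esac
done
printf 'glob matches:   %s\n' "$glob_matches"    # foo foobar
```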
This might be what you're trying to do, using a sed that has a -E arg to enable EREs, e.g. GNU or BSD sed:
$ sed -E 's:.*/(terraform|tfRepo)-[^/]*/::' file
services/history_book_test
services/history_book_test
or using any awk:
$ awk '{sub(".*/(terraform|tfRepo)-[^/]*/","")} 1' file
services/history_book_test
services/history_book_test
Regarding your attempt with sed sed 's|.*terraform.\([^/]*\)/.*|\1|' - if you're going to use a char other than / for the delimiters, don't use a char like | that's a regexp or backreference metachar as at best that obfuscates your code, pick some char that's always literal instead, e.g. :.

Replace character except between pattern using grep -o or sed (or others)

In the following file I want to replace all the ; by , with the exception that, when there is a string (delimited with two "), it should not replace the ; inside it.
Example:
Input
A;B;C;D
5cc0714b9b69581f14f6427f;5cc0714b9b69581f14f6428e;1;"5cc0714b9b69581f14f6427f;16a4fba8d13";xpto;
5cc0723b9b69581f14f64285;5cc0723b9b69581f14f64294;2;"5cc0723b9b69581f14f64285;16a4fbe3855";xpto;
5cc072579b69581f14f6428a;5cc072579b69581f14f64299;3;"5cc072579b69581f14f6428a;16a4fbea632";xpto;
output
A,B,C,D
5cc0714b9b69581f14f6427f,5cc0714b9b69581f14f6428e,1,"5cc0714b9b69581f14f6427f;16a4fba8d13",xpto,
5cc0723b9b69581f14f64285,5cc0723b9b69581f14f64294,2,"5cc0723b9b69581f14f64285;16a4fbe3855",xpto,
5cc072579b69581f14f6428a,5cc072579b69581f14f64299,3,"5cc072579b69581f14f6428a;16a4fbea632",xpto,
For sed I have: sed 's/;/,/g' input.txt > output.txt but this would replace everything.
The regex for the " delimited string: \".*;.*\" .
(A regex for hexadecimal would be better -- something like: [0-9a-fA-F]+)
My problem is combining it all to make a grep -o / sed that replaces everything except for that pattern.
The file size is on the order of tens of GB (99 GB at most), so performance is important.
Any ideas are appreciated.
sed is for doing simple s/old/new on individual strings. grep is for doing g/re/p. You're not trying to do either of those tasks so you shouldn't be considering either of those tools. That leaves the other standard UNIX tool for manipulating text - awk.
You have a ;-separated CSV that you want to make ,-separated. That's simply:
$ awk -v FPAT='[^;]*|"[^"]+"' -v OFS=',' '{$1=$1}1' file
A,B,C,D
5cc0714b9b69581f14f6427f,5cc0714b9b69581f14f6428e,1,"5cc0714b9b69581f14f6427f;16a4fba8d13",xpto,
5cc0723b9b69581f14f64285,5cc0723b9b69581f14f64294,2,"5cc0723b9b69581f14f64285;16a4fbe3855",xpto,
5cc072579b69581f14f6428a,5cc072579b69581f14f64299,3,"5cc072579b69581f14f6428a;16a4fbea632",xpto,
The above uses GNU awk for FPAT. See What's the most robust way to efficiently parse CSV using awk? for more details on parsing CSVs with awk.
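For an awk without FPAT, one possible alternative is to split each line on the double quote and only touch the pieces that lie outside quotes. This is a sketch under the assumptions that quotes are balanced on every line and never escaped inside a field; the function name semicolons_to_commas and the shortened sample rows are made up for illustration.

```shell
semicolons_to_commas() {
  # Split each line on double quotes: odd-numbered fields lie outside quotes,
  # even-numbered fields are the quoted strings and are left untouched.
  awk 'BEGIN { FS = OFS = "\"" }
       { for (i = 1; i <= NF; i += 2) gsub(/;/, ",", $i) }
       1'
}

out=$(printf '%s\n' \
  'A;B;C;D' \
  '5cc0714b9b69581f14f6427f;1;"5cc0714b9b69581f14f6427f;16a4fba8d13";xpto;' |
  semicolons_to_commas)

printf '%s\n' "$out"
```

On the two sample rows this prints A,B,C,D followed by 5cc0714b9b69581f14f6427f,1,"5cc0714b9b69581f14f6427f;16a4fba8d13",xpto, in a single pass over the input.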
If I understand your requirements correctly, one option would be a three-pass approach.
From your comment about hex, I'll assume that a character like # never appears in the input, so you can do (using GNU sed):
sed -E 's/("[^"]+);([^"]+")/\1#\2/g' original > transformed
sed -i 's/;/,/g' transformed
sed -i 's/#/;/g' transformed
The idea being to replace the ; when within quotes by something else and write it to a new file, then replace all ; by , and then set back the ; in place within the same file (-i flag of sed).
The three passes can be combined into a single command with
sed -E 's/("[^"]+);([^"]+")/\1#\2/g;s/;/,/g;s/#/;/g' original > transformed
That said, there are probably plenty of CSV parsers that already handle quoted fields, which you could use in the final use case, as I bet this is just an intermediate step for something later in the chain.
From Ed Morton's comment: if you do it in one pass, you can use \n as replacement separator as there can't be a newline in the text considered line by line.
This might work for you (GNU sed):
sed -E ':a;s/^([^"]*("[^"]*"[^"]*)*"[^";]*);/\1\n/;ta;y/;/,/;y/\n/;/' file
Replace ;'s inside double quotes with newlines, transpose ;'s to ,'s and then transpose newlines to ;'s.

Extract directory path from file path

I have a requirement for getting part of the string which should be read from end of the string. Like below:
a/b/c/d.txt
Now I want to get the output as /a/b/c/ – basically the path of the file. For this, I want the string to be read from the end and where the first / appears, it prints till the first text of the string.
If you have a single variable, how about parameter expansion?
Let's say we have the following variable A with the value you provided.
echo "$A"
a/b/c/d.txt
Then the following gives you the directory path of the file using parameter expansion.
echo "${A%/*}/"
a/b/c/
echo a/b/c/d.txt | awk -F/ '{$NF=""}1' OFS=/
a/b/c/
This should be done with parameter expansion, as in @RavinderSingh13's very good answer, or with dirname, as @aragaer suggests, but if you are gung-ho about an awk solution, you could do something like:
echo "a/b/c/d.txt" | awk -F"/" '{ for (f=1;f<NF;f++){printf "%s/", $f}; printf "\n"}'
But that's horrible overkill when you can just echo $(dirname "a/b/c/d.txt")/
Simple sed approach:
echo "a/b/c/d.txt" | sed 's~/[^/]*$~/~'
a/b/c/
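One case the parameter-expansion form does not cover is a name with no slash at all: ${A%/*} then leaves the file name unchanged. A minimal sketch of a wrapper that handles this, where the function name dir_of and the choice to print ./ for a bare file name are assumptions made for illustration:

```shell
# Print the directory part of a path, with a trailing slash.
dir_of() {
  case $1 in
    */*) printf '%s/\n' "${1%/*}" ;;   # strip the last "/component"
    *)   printf './\n' ;;              # no slash: file is in the current directory
  esac
}

dir_of 'a/b/c/d.txt'   # a/b/c/
dir_of 'd.txt'         # ./
```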