Can't replace string to multi-lined string with sed - awk

I'm trying to replace a fixed string ("replaceMe") in a text with multi-line text using sed.
My bash script goes as follows:
content=$(awk '{print $5}' < data.txt | sort | uniq)
target=$(cat install.sh)
text=$(sed "s/replaceMe/$content/" <<< "$target")
echo "${text}"
If content contains only one line, the replacement works, but if it contains several lines I get:
sed:... unterminated `s' command
I read about "fetching" multi-line content, but I couldn't find anything about inserting a multi-line string.

You'll have more problems than that depending on the contents of data.txt since sed doesn't understand literal strings (see Is it possible to escape regex metacharacters reliably with sed). Just use awk which does:
text="$( awk -v old='replaceMe' '
NR==FNR {
if ( !seen[$5]++ ) {
new = (NR>1 ? new ORS : "") $5
}
next
}
s = index($0,old) { $0 = substr($0,1,s-1) new substr($0,s+length(old)) }
{ print }
' data.txt install.sh )"
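To see the approach above run end to end, here is a minimal sketch using two small made-up files (their contents and the temporary directory are assumptions for illustration, not the OP's data). It uses only POSIX awk features.

```shell
# Made-up sample files, created only for this demo.
tmp=$(mktemp -d)
printf '%s\n' 'a b c d foo' 'a b c d bar' 'a b c d foo' > "$tmp/data.txt"
printf '%s\n' 'before' 'x replaceMe y' 'after' > "$tmp/install.sh"

text=$(awk -v old='replaceMe' '
NR==FNR {                        # first file: collect the unique 5th fields
    if ( !seen[$5]++ ) {
        new = (NR>1 ? new ORS : "") $5
    }
    next
}
s = index($0, old) {             # second file: literal search, no regex
    $0 = substr($0, 1, s-1) new substr($0, s+length(old))
}
{ print }
' "$tmp/data.txt" "$tmp/install.sh")

printf '%s\n' "$text"
rm -rf "$tmp"
```

Unlike the sort | uniq pipeline in the question, this keeps the values in first-seen order rather than sorted order.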

How to use variable with or operator | in awk

The following works:
awk '
NR==FNR { sub(/\.(png|txt|jpg|json)$/,""); a[$0]; next }
{ f=$0; sub(/\.(png|txt|jpg|json)$/,"", f) }
!(f in a)
' comp1.txt comp2.txt > result.txt
and now I want it to take the file endings that shall be ignored in the comparison as a variable, but cannot get it to work. My attempt below just compares without ignoring any file endings. I tried with $ and without, with () and without, escaping the |, but so far without success. What is the correct solution?
fileEndingsToIgnore="png|txt|jpg|json"
awk -v fileEndingsToIgnore="${fileEndingsToIgnore}" '
NR==FNR { sub(/\.(fileEndingsToIgnore)$/,""); a[$0]; next }
{ f=$0; sub(/\.(fileEndingsToIgnore)$/,"", f) }
!(f in a)
' comp1.txt comp2.txt > result.txt
GNU AWK doesn't allow you to use a variable inside a regular expression literal. You can instead use a string variable as a dynamic regexp with ~, !~, and many of the string functions, but in that case you need to double the backslashes, as explained in Using Dynamic Regexps. Consider the following example: say you want to output only the .txt and .json filenames, without their extensions, and you have file.txt with the following content:
file1.txt
file2.bmp
file3.json
then
awk 'BEGIN{s="\\.(txt|json)$"}sub(s,""){print}' file.txt
gives output
file1
file3
Observe \\. rather than \. inside the string.
(tested in GNU Awk 5.0.1)
One workaround is to dynamically build the regex and store it in a variable, then use the variable in the sub() call.
Sample input file:
$ cat test.file
abc.txt
def.jpg
ghi.exe
jkl.dat
123.json
456.ini
789.pngX
000.png
111.dat
One awk idea:
fileEndingsToIgnore="png|txt|jpg|json"
awk -v fileEndingsToIgnore="${fileEndingsToIgnore}" '
BEGIN { regex="\\.(" fileEndingsToIgnore ")$" } # need to escape the escape char, ie, "\\"
{ out=$1
sub(regex,"",out)
printf "%s => %s\n",$0,out
}
' test.file
This generates:
abc.txt => abc
def.jpg => def
ghi.exe => ghi.exe
jkl.dat => jkl.dat
123.json => 123
456.ini => 456.ini
789.pngX => 789.pngX
000.png => 000
111.dat => 111.dat
Applying this to OP's current code:
fileEndingsToIgnore="png|txt|jpg|json"
awk -v fileEndingsToIgnore="${fileEndingsToIgnore}" '
BEGIN { regex="\\.(" fileEndingsToIgnore ")$" }
NR==FNR { sub(regex,""); a[$0]; next }
{ f=$0; sub(regex,"", f) }
!(f in a)
' comp1.txt comp2.txt > result.txt
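A possible variation, offered as a sketch rather than as part of the answer above: writing the literal dot as the bracket expression [.] sidesteps the backslash doubling altogether, so the regex can be assembled in the shell and passed with -v. The three file names below are made up for the demo.

```shell
fileEndingsToIgnore="png|txt|jpg|json"

# [.] matches a literal dot without any backslashes, so nothing needs
# double escaping when the string is used as a dynamic regexp.
out=$(printf '%s\n' abc.txt 789.pngX 000.png |
      awk -v regex="[.](${fileEndingsToIgnore})\$" '{ sub(regex, ""); print }')

printf '%s\n' "$out"
```

Only abc.txt and 000.png lose their endings; 789.pngX is left alone because the regex is anchored with $.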
I think this should be generic enough:
"-v FS=..." is the list of file extensions to exclude (case-sensitive):
mawk -v FS='mp[34]|txt|sh|awk' 'BEGIN { _^= FS = "[.]("FS")$"
split("",__) } FNR==NR ? __[$_] : NF<=($_ in __)' file file

Grabbing value from piped file contents

Let's say I have the following file:
credentials:
[default]
key_id = AKIAGHJQTOP
secret_key = alcsjkf
[default2]
key_id = AKIADGHNKVP
secret_key = njprmls
I want to grab the value of [default] key_id. I'm trying to do it with an awk command, but I'm open to any other way if it's more efficient and easier. Instead of passing a file name to awk, I want to pass the file contents from the environment variable FILE_CONTENTS.
I tried the following:
$ export VAR=$(echo "$FILE_CONTENTS" | awk '/credentials.default.key_id/ {print $2}')
But it didn't work. Any help is appreciated.
You can use awk like this:
cat srch.awk
BEGIN { FS = " *= *" }
{ sub(/^[[:blank:]]+/, "") }
/:[[:blank:]]*$/ {
sub(/:[[:blank:]]*$/, "")
k = $1
}
/^[[:blank:]]*\[/ {
s = k "." $1
}
NF == 2 {
map[s "." $1] = $2
}
key in map {
print map[key]
exit
}
# then use it as
echo "$FILE_CONTENTS" |
awk -v key='credentials.[default].key_id' -f srch.awk
AKIAGHJQTOP
# or else
echo "$FILE_CONTENTS" |
awk -v key='credentials.[default].secret_key' -f srch.awk
alcsjkf
With your shown samples, please try the following awk code, written and tested in GNU awk.
awk -v RS='(^|\\n)credentials:\\n[[:space:]]+\\[default\\]\\n[[:space:]]+key_id = \\S+' '
RT && num=split(RT,arr," key_id = "){
print arr[num]
}
' Input_file
Here is the online demo for the regex used (it is slightly changed from the regex in the awk code, because the escaping is done in the program, not on the site).
Assumptions:
no spaces between labels and :
no spaces between [ the stanza name and ]
all lines with attribute/value pairs have exactly 3 space-delimited fields as shown (ie, attr = value; value has no embedded spaces)
the contents of OP's variable (FILE_CONTENTS) is an exact copy (data and format) of the sample file provided by OP
NOTE: if the input file format can differ from these assumptions then additional code must be added to address said differences; as mentioned in comments, writing your own parser is doable but you need to ensure you address all possible format variations.
One awk idea:
awk -v label='credentials' -v stanza='default' -v attr='key_id' '
/:/ { f1=0; if ($0 ~ label ":") f1=1 }
f1 && /[][]/ { f2=0; if (index($0, "[" stanza "]")) f2=1 } # literal test: as a regex, "[" stanza "]" would be a bracket expression
f1 && f2 && /=/ { if ($1 == attr) { print $3; f1=f2=0 } }
' <<< "$FILE_CONTENTS"
This generates:
AKIAGHJQTOP
$ awk 'f{print $3; exit} /\[default]/{f=1}' <<<"$FILE_CONTENTS"
AKIAGHJQTOP
If that's not all you need then edit your question to provide more truly realistic sample input/output including cases where the above doesn't work.
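If the attribute name should be checked as well, a slightly stricter sketch could look like the following. The get_value helper and its argument order are hypothetical, the indentation of the sample is assumed, and the "credentials:" label is simply ignored.

```shell
FILE_CONTENTS='credentials:
  [default]
  key_id = AKIAGHJQTOP
  secret_key = alcsjkf
  [default2]
  key_id = AKIADGHNKVP
  secret_key = njprmls'

# Hypothetical helper: get_value SECTION KEY
get_value() {
    printf '%s\n' "$FILE_CONTENTS" |
    awk -v sec="[$1]" -v key="$2" '
        $1 ~ /^\[/ { in_sec = ($1 == sec); next }   # exact string match on the stanza
        in_sec && $1 == key { print $3; exit }      # the value is the 3rd field
    '
}

VAR=$(get_value default key_id)
printf '%s\n' "$VAR"
```

Because the stanza name is compared as a string, [default] and [default2] are never confused with each other.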
open to any other way if it's more efficient and easier
I suggest taking a look at Python's configparser, which is part of the standard library. Let the FILE_CONTENTS environment variable hold
credentials:
[default]
key_id = AKIAGHJQTOP
secret_key = alcsjkf
[default2]
key_id = AKIADGHNKVP
secret_key = njprmls
then create file getkeyid.py with content as follows
import configparser
import os
config = configparser.ConfigParser()
config.read_string(os.environ["FILE_CONTENTS"].replace("credentials","#credentials",1))
print(config["default"]["key_id"])
and do
python3 getkeyid.py
to get output
AKIAGHJQTOP
Explanation: I retrieve the string from the environment variable and replace credentials with #credentials at most once in order to comment out that line (otherwise the parser will fail), then parse it and retrieve the value corresponding to the desired key.

Edit only specific lines when I find special character with awk

I have this kind of file :
>AX-89948491-minus
CTAACACATTTAGTAGATT
>AX-89940152-plus
cgtcattcagggcaggtggggcaaaA
>AX-89922107-plus
TTATAACTTGTGTATGCTCTCAGGCT
When a line starts with ">" and includes "minus", I need to reverse (rev) and translate (tr) the line that follows it. I should get:
>AX-89948491-minus
AATCTACTAAATGTGTTAG
>AX-89940152-plus
cgtcattcagggcaggtggggcaaaA
>AX-89922107-plus
TTATAACTTGTGTATGCTCTCAGGCT
I would like to do this with awk. I tried the following, but it does not work:
awk '{if(NR%2==1~/"plus"/){print;getline;print} else if (NR%2==1~/"minus"/){system("echo "$0" | rev | tr ATCGatcg TAGCtagc")} else {print;getline;print}}' file
Any help?
This gnu-awk should work for you:
awk '
p {
cmd = "rev <<< \047" $0 "\047 | tr ATCGatcg TAGCtagc"
if ((cmd |& getline var) > 0)
$0 = var
close(cmd) # close the coprocess so an identical line can be processed again later
}
{
p = /^>/ && /-minus/
} 1' file
>AX-89948491-minus
AATCTACTAAATGTGTTAG
>AX-89940152-plus
cgtcattcagggcaggtggggcaaaA
>AX-89922107-plus
TTATAACTTGTGTATGCTCTCAGGCT
Awk is a tool to manipulate text, not a tool to sequence calls to other tools. The latter is what a shell is for. There are times when you need to call other tools from awk but not when it's simple text manipulation like reversing and translating characters in a string as you want to do.
Using any awk in any shell on every Unix box without spawning a subshell once per target input line to call other Unix tools (including the non-POSIX-defined rev which won't exist on some Unix boxes):
$ cat tst.awk
BEGIN {
split("ATCGatcg TAGCtagc",tmp)
for (i=1; i<=length(tmp[1]); i++) {
tr[substr(tmp[1],i,1)] = substr(tmp[2],i,1)
}
}
f {
out = ""
for (i=1; i<=length($0); i++) {
char = substr($0,i,1)
out = (char in tr ? tr[char] : char) out
}
$0 = out
f = 0
}
/^>.*minus/ { f=1 }
{ print }
$ awk -f tst.awk file
>AX-89948491-minus
AATCTACTAAATGTGTTAG
>AX-89940152-plus
cgtcattcagggcaggtggggcaaaA
>AX-89922107-plus
TTATAACTTGTGTATGCTCTCAGGCT
I'd use perl, as it has builtin reverse and tr functions:
perl -lpe '
if (/^>/) {$rev = /minus/; next}
if ($rev) {$_ = reverse; tr/ATCGatcg/TAGCtagc/}
' file
>AX-89948491-minus
AATCTACTAAATGTGTTAG
>AX-89940152-plus
cgtcattcagggcaggtggggcaaaA
>AX-89922107-plus
TTATAACTTGTGTATGCTCTCAGGCT

sed or awk with variable and special characters

I want to use awk or sed to replace one line in my file.
My file content is:
server.modules += ( "mod_redirect" )
$SERVER["socket"] == ":8080" {
$HTTP["host"] =~ "(.*)" {
url.redirect = ( "^/(.*)" => "https://someurl.com/unauthorised" )
}
}
I want to change the line containing url.redirect.
The new line is in a variable and contains some special characters; it will be something like this: url.redirect = ( "^/(.*)" => "http://newurl.com/newpath" )
So I used the following sed command:
sed "/url\.redirect =/s/.*/$newline/" 10-redirect.conf
But I got an error related to the special characters inside the newline variable.
newline is an argument to my shell function, so I cannot change it to add escape characters.
How do I use variables with special characters in sed or awk?
With GNU sed, you can use the c command, which replaces the matched lines with the string provided. If the string starts with spaces, prefix it with \ to preserve them:
sed '/url\.redirect =/c\'"$newline"
However, the c command still interprets escape sequences, for example:
$ s=' rat\tdog\nwolf'
$ seq 3 | sed '2c\'"$s"
1
rat dog
wolf
3
To add the contents literally and robustly, use the r command:
echo "$newline" | sed -e '/url\.redirect =/ {r /dev/stdin' -e 'd}'
Here's the r command in action:
$ s=' rat\tdog\nwolf'
$ echo "$s" | sed -e '2 {r /dev/stdin' -e 'd}' <(seq 3)
1
rat\tdog\nwolf
3
Please try the following. It puts the same leading spaces in front of the new value as were present before.
newline='url.redirect = ( "^/(.*)" => "https://example.com/authorised" )'
awk -v line="$newline" '
/url\.redirect =/{
match($0,/^ +/)
print substr($0,RSTART,RLENGTH) line
next
}
1
' Input_file
You are getting an error from sed because it uses a regex, and your replacement string may contain metacharacters such as & or the / delimiter.
This awk (GNU awk, because of -i inplace) is safer to use thanks to its non-regex approach:
newline='url.redirect = ( "^/(.*)" => "https://example.com/authorised" )'
awk -i inplace -v line="$newline" 'index($0, "url.redirect =") {
sub(/[^[:blank:]].*/, "")
$0 = $0 line
} 1' file
server.modules += ( "mod_redirect" )
$SERVER["socket"] == ":8080" {
$HTTP["host"] =~ "(.*)" {
url.redirect = ( "^/(.*)" => "https://example.com/authorised" )
}
}
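Note that using ENVIRON instead of -v passes the value to awk verbatim, including backslashes and any other special characters (awk -v would interpret escape sequences such as \t or \\):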
export newline
awk -i inplace 'index($0, "url.redirect =") {
sub(/[^[:blank:]].*/, "")
$0 = $0 ENVIRON["newline"]
} 1' file
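The same idea can be tried without -i inplace, which is a GNU awk extension. The sketch below writes the result to standard output instead; the file content is taken from the question, while the replacement line (with a backslash and an ampersand added on purpose) and the temporary file are made up for the demo.

```shell
conf=$(mktemp)
cat > "$conf" <<'EOF'
server.modules += ( "mod_redirect" )
$SERVER["socket"] == ":8080" {
  $HTTP["host"] =~ "(.*)" {
    url.redirect = ( "^/(.*)" => "https://someurl.com/unauthorised" )
  }
}
EOF

# Hypothetical replacement line containing a backslash and an ampersand.
newline='url.redirect = ( "^/(.*)\.html" => "http://newurl.com/new&path" )'
export newline

result=$(awk 'index($0, "url.redirect =") {
    sub(/[^ \t].*/, "")              # keep only the leading whitespace
    $0 = $0 ENVIRON["newline"]       # append the new text verbatim
} 1' "$conf")

printf '%s\n' "$result"
rm -f "$conf"
```

The backslash and the & reach the output untouched because the value is never used as a regex, as a sub() replacement, or as a -v assignment.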

awk: non-terminated string

I'm trying to run the command below, and it's giving me the error shown. Any thoughts on how to fix it? I would rather have this be a one-line command than a script.
grep "id\": \"http://room.event.assist.com/event/room/event/" failed_events.txt |
head -n1217 |
awk -F/ ' { print $7 } ' |
awk -F\" ' { print "url \= \"http\:\/\/room\.event\.assist\.com\/event\/room\/event\/'{ print $1 }'\?schema\=1\.3\.0\&form\=json\&pretty\=true\&token\=582EVTY78-03iBkTAf0JAhwOBx\&account\=room_event\"" } '
awk: non-terminated string url = "ht... at source line 1
context is
>>> <<<
awk: giving up
source line number 2
The pipeline below outputs a single column of IDs:
grep "id\": \"http://room.event.assist.com/event/room/event/" failed_events.txt |
head -n1217 |
awk -F/ ' { print $7 } '
156512145
898545774
454658748
898432413
I'm looking to get the IDs above into a string like so:
" url = "string...'ID'string"
Take a look at what you have in the last awk:
awk -F\"
' #single start here
{ print " #double starts for print, no ends
url \= \"http\:\/\/room\.event\.assist\.com\/event\/room\/event\/
' #single ends here???
{ print $1 }'..... #single again??? ...
(rest of the code)
Do you really want to print a literal { print $1 }? I don't think so. Why are you nesting print?
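One way to untangle the quoting, as a sketch: keep the whole awk program inside a single pair of single quotes and splice $1 into the output by string concatenation. The two IDs come from the question's sample output; the trailing quote and comma on each input line are an assumption about what the previous pipeline stage emits, and the token parameter is left out.

```shell
# Two IDs from the question; the trailing quote and comma are assumed.
ids='156512145",
898545774",'

out=$(printf '%s\n' "$ids" |
      awk -F'"' '{
          # Inside the single-quoted program, \" is a literal double quote
          # and $1 is joined to the fixed text by plain concatenation.
          print "url = \"http://room.event.assist.com/event/room/event/" $1 "?schema=1.3.0&form=json&pretty=true\""
      }')

printf '%s\n' "$out"
```

None of the characters =, :, /, ., ? or & needs a backslash inside an awk string; only the embedded double quotes do.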
Most of the elements of your pipe can be expressed right inside awk.
I can't tell exactly what you want to do with the last awk script, but here are some points:
Your "grep" is really just looking for a string of text, not a regexp.
You can save time and simplify things if you use awk's index() function instead of a RE.
Output formats are almost always best handled using printf().
Since you haven't provided your input data, I can't test this code, so you'll need to adapt it if it doesn't work. But here goes:
awk -F/ '
BEGIN {
string="id\": \"http://room.event.assist.com/event/room/event/";
fmt="url = http://example.com/event/room/event/%s?schema=whatever\n";
}
count == 1217 { nextfile; }
index($0, string) {
split($7, a, "\"");
printf(fmt, a[1]); # split() fills a[1..n]; a[0] is never set
count++;
}' failed_events.txt
If you like, you can use awk's -v option to pass in the string variable from a shell script calling this awk script. Or if this is a stand-alone awk script (using #! shebang), you could refer to command line options with ARGV.
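As a rough sketch of the -v suggestion: the search string and the line limit are passed in from the shell, and the three input lines are invented stand-ins for failed_events.txt, which the question does not show. Keep in mind that awk processes backslash escapes in -v values; the string used here contains none.

```shell
# Invented sample input; the real failed_events.txt is not shown in the question.
sample='"id": "http://room.event.assist.com/event/room/event/156512145",
"name": "unrelated line",
"id": "http://room.event.assist.com/event/room/event/898545774",'

needle='id": "http://room.event.assist.com/event/room/event/'

ids=$(printf '%s\n' "$sample" |
      awk -F/ -v needle="$needle" -v max=1217 '
          n >= max { exit }            # stop after max matches
          index($0, needle) {          # literal substring test, not a regex
              split($7, a, "\"")       # $7 looks like 156512145",
              print a[1]               # split() numbers elements from 1
              n++
          }')

printf '%s\n' "$ids"
```

Using exit instead of nextfile keeps the sketch within POSIX awk, at the cost of only handling a single input stream.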