Removing delimited block of code between patterns using bash (regex, sed) - awk

I have a file with visibility declarations, each spanning one or more lines, that I want removed only if they are inside a test block.
For example, input.txt:
test(
srcs = [
"test1",
],
visibility = [
"common",
],
deps = [ "deps" ],
)
test(
srcs = [
"test2",
],
visibility = [ "common" ],
)
Expected output:
test(
srcs = [
"test1",
],
deps = [ "deps" ],
)
test(
srcs = [
"test2",
],
)
The visibility lines could also be inside other blocks, e.g. etc(...), in which case they should not be removed. For example:
etc(
src = [
"etc",
],
# should not be removed because it's not inside a test(...) block
visibility = [
"common",
],
)
This is what I have tried, however, this only matches visibility blocks spanning over a single line:
#!/bin/bash
#remove_lines.sh
remove_visibility_lines () {
start_pattern="test"
end_pattern=")"
pattern_to_remove="visibility = \[.*\],"
sed "/${start_pattern}/,/${end_pattern}/{/${pattern_to_remove}/d}" "$1"
}
remove_visibility_lines $1
$ ./remove_lines.sh input.txt
I've tried several ways to make this remove visibility blocks spanning multiple lines, e.g. (.*?) and (\_.*), but I can't get it to work.
Please help?
This question is similar to:
Using sed to delete all lines between two matching patterns. However, in my case the patterns are nested inside other patterns: I only look inside the test(...) blocks, and only inside those blocks do I remove the visibility = [...], blocks.

With awk, assuming no nested () characters:
awk '/^test\(/{f=1} $0==")"{f=0}
f && /visibility = \[/{v=1} !v; /],/{v=0}' ip.txt
/^test\(/{f=1} set flag f if a line starts with test(
$0==")"{f=0} clear flag f if a line contains ) only
f && /visibility = \[/{v=1} set flag v if f is also set and a line contains visibility = [
!v; to print input lines only if flag v is not set
/],/{v=0} clear flag v if line contains ],
To pass start and end strings:
$ cat script.awk
index($0, start) == 1 { f = 1 }
$0 == end { f = 0 }
f && /visibility = \[/ { v = 1 }
! v { print }
/],/ { v = 0 }
$ awk -v start='test(' -v end=')' -f script.awk ip.txt
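As a sketch of how the same flag logic could slot into the asker's remove_lines.sh, the function below wraps the awk program in a shell function. The flag names in_test and skipping are illustrative renames of f and v, and the same assumptions apply: no nested () inside a test( block, and the closing ) alone on its line.

```shell
#!/bin/sh
# Sketch only: remove_visibility_lines rebuilt around the awk flags above.
# in_test and skipping are illustrative names for the f and v flags.
remove_visibility_lines() {
    awk '
        /^test\(/                    { in_test = 1 }   # a test( block starts
        $0 == ")"                    { in_test = 0 }   # a lone ) ends the block
        in_test && /visibility = \[/ { skipping = 1 }  # start of a visibility list
        !skipping                    { print }         # keep everything else
        /\],/                        { skipping = 0 }  # ], ends the list
    ' "$@"
}
```

Calling remove_visibility_lines input.txt prints the cleaned file; with no argument the function reads standard input.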

Related

Bash - Update a value in a file using sed command

I am trying to update the value of key1[count] to 2 in the following file.
#file1
//some additional lines
key1 = { var1 = "1", var2 = "12", "count" = 1 }
key2 = { var1 = "32", var2 = "23", "count" = 1 }
key3 = { var1 = "22", var2 = "32", "count" = 3 }
key4 = { var1 = "32", var2 = "12", "count" = 3 }
//some additional lines
I had tried using awk - awk -i inplace '{ if ( $1 == "key1" ) $12 = 2 ; print $0 }' file1.
However, this only works with awk version 4.1.0 and higher.
With your shown samples, please try following awk code.
awk -v newVal="2" '
/^key1/ && match($0,/"count"[[:space:]]+=[[:space:]]+[0-9]+/){
value=substr($0,RSTART,RLENGTH)
sub(/[0-9]+$/,"",value)
$0=substr($0,1,RSTART-1) value newVal substr($0,RSTART+RLENGTH)
}
1
' Input_file
Explanation: For lines starting with key1, match() looks for text matching the regex "count"[[:space:]]+=[[:space:]]+[0-9]+, i.e. the count key together with its value. The matched substring is stored in the variable value, and the digits at its end are removed. The current line is then rebuilt from the text before the match, the trimmed value, the new value (newVal) and the rest of the line. Finally, 1 prints the current line.
NOTE: The above only prints the output to the terminal. Once you are happy with the results, append > temp && mv temp Input_file to the command.
You can use sed with the -r option for extended regular expressions.
For your example, you can use this command.
sed -i -r 's/(key1.*"count" = )([0-9]*)(.*)/\12\3/g' <file>
More information:
Command template:
sed -i -r 's/(<bef>)(<tar>)(<aft>)/\1<new value>\3/g' <file>
Where
<bef> is a regex that targets /key1... "count" = /
<tar> is a regex that targets /1/
<aft> is a regex that targets / }/
Together, /<bef><tar><aft>/ will expand to become:
/key1 = { var1 = "1", var2 = "12", "count" = 1 }/
Since you want to replace "1" with "2"
<new value> is 2
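Where neither gawk's inplace extension nor sed -i is available, the temp-file route from the note above can be sketched as a small helper. This is an illustration, not part of either answer: set_count is a hypothetical name, and it assumes the key starts its line and is followed by a space, and that the text "count" = N is spaced exactly as in the sample.

```shell
#!/bin/sh
# Sketch only: set_count is a hypothetical helper, not part of either answer.
# usage: set_count KEY NEW_VALUE FILE
set_count() {
    tmp=$(mktemp) || return 1
    # Only lines starting with KEY followed by a space are touched.
    if sed "/^$1 /s/\"count\" = [0-9][0-9]*/\"count\" = $2/" "$3" > "$tmp"; then
        cat "$tmp" > "$3"      # overwrite in place, keeping the file permissions
    fi
    rm -f "$tmp"
}
```

For the sample file, set_count key1 2 file1 would change only the key1 line.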

sed command to delete block of lines (above & below) matching a pattern (sed) - Resolved using Python

Refer to the link below:
Python script to delete json objects from json file
I am new to "sed".
I have a file (my_file.json) with the contents below, and I need to delete all the lines from "{" up to "}," for each block that contains my_script.py.
[
{
"use":"abcd",
"contact":"xyz",
"name":"my_script.py",
"time":"11:22:33"
},
{
"use":"abcd"
"contact":"xyz",
"name":"some_other_script.py",
"time":"11:22:33"
},
{
"use":"apqwkndf",
"contact":"xyz",
"name":"my_script.py",
"time":"11:22:33"
},
{
"use":"kjdshfjkasd",
"contact":"xyz",
"name":"my_script.py",
"time":"11:22:33"
}
]
I used the following command; it deletes the lower portion of the block, i.e. the lines after the pattern up to "},", along with the line containing the pattern and the line above it.
sed -i '/my_script.py"/I,+2 d;$!N;/my_script.py"/!P;D' my_file.json
and the output comes out as below
[
{
"use":"abcd",
{
"use":"abcd"
"contact":"xyz",
"name":"some_other_script.py",
"time":"11:22:33"
},
{
"use":"apqwkndf",
{
"use":"kjdshfjkasd",
]
The expected output is below. Note that since only one block remains, I need to remove the trailing "," as well.
[
{
"use":"abcd"
"contact":"xyz",
"name":"some_other_script.py",
"time":"11:22:33"
}
]
How can I solve this?
This might work for you (GNU sed):
sed '/{/{:a;N;/}/!ba;/my_script\.py/d}' file |
sed 'N;/]/s/},/}/;P;D'
This removes unwanted list elements then fixes up the last list delimiter.
An alternative is to store the edited file in memory and then fix up the last delimiter:
sed '/{/{:a;N;/}/!ba;/my_script\.py/d};H;$!d;x;s/.//;s/\(.*}\),\(\s*]\)/\1\2/' file
Could you please try the following single awk command? Fair warning: files like JSON should be read or edited with jq-like tools; I am adding this only because the OP says he is NOT allowed to use them. It is written entirely on the basis of the shown samples.
awk '
/{/{
found=1
if(noPrint==""){
actualVal=(actualVal?actualVal ORS:"")val
}
val=noPrint=""
}
found && /"name":"my_script.py"/{
noPrint=1
}
{
val=(val?val ORS:"")$0
}
END{
if(noPrint==""){
actualVal=(actualVal?actualVal ORS:"")val
}
sub(/},$/,"}\n]",actualVal)
print actualVal
}
' Input_file
The usual way such problems are approached:
Package the input. Typically, convert it to one self-contained item of information per line.
Filter the input.
Output.
The following script:
cat <<EOF |
[
{
"use":"abcd",
"contact":"xyz",
"name":"my_script.py",
"time":"11:22:33"
},
{
"use":"abcd",
"contact":"xyz",
"name":"some_other_script.py",
"time":"11:22:33"
},
{
"use":"apqwkndf",
"contact":"xyz",
"name":"my_script.py",
"time":"11:22:33"
},
{
"use":"kjdshfjkasd",
"contact":"xyz",
"name":"my_script.py",
"time":"11:22:33"
}
]
EOF
sed -n '
b noterror ; : error {
s/.*/ERROR: &/
q1
} ; : noterror
# remove [ ]
1d;$d;
# first line should be open braces
/{/!{b error}
# read up until closing brackets
# Note escaping is not handled
: again {
N;
$b error
/}/!b again
}
s/}.*/}/;
s/\n/ /g;
# -- one information per line --
p
' | awk '
# filter that myscript.py with a regex
!/"name" *: *"my_script.py"/{
# output with those [ ]
printf "[\n"
print # print the line
printf "]\n"
}'
outputs:
[
{ "use":"abcd", "contact":"xyz", "name":"some_other_script.py", "time":"11:22:33" }
]
You may want to restore the newlines by putting a special character in place of each newline and then replacing that character with a newline again, or by using a different delimiter for awk too.
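The package, filter, output steps can also be sketched in a single awk pass that keeps each object's line breaks. This is only an illustration for input shaped like the sample (each { and } on its own line, no nested objects); drop_objects and its needle variable are hypothetical names, and a real JSON parser remains the safer tool.

```shell
#!/bin/sh
# Sketch only: drop_objects is a hypothetical helper for sample-shaped input.
# usage: drop_objects FIXED_STRING < my_file.json
drop_objects() {
    awk -v needle="$1" '
        /^[ \t]*[{]/ { buf = ""; inobj = 1 }          # package: a new object starts
        inobj {
            buf = (buf == "" ? "" : buf "\n") $0
            if ($0 ~ /^[ \t]*[}],?[ \t]*$/) {         # closing } or },
                inobj = 0
                sub(/,[ \t]*$/, "", buf)              # drop the old separator
                if (index(buf, needle) == 0)          # filter: plain string test
                    kept[++n] = buf
            }
        }
        END {                                         # output: re-add [ ] and commas
            print "["
            for (i = 1; i <= n; i++)
                print kept[i] (i < n ? "," : "")
            print "]"
        }
    '
}
```

For the sample, drop_objects '"name":"my_script.py"' < my_file.json would keep only the some_other_script.py object. Passing the whole "name":"..." pair rather than just the file name avoids matching names that merely end in my_script.py. Note that awk -v interprets backslash escapes in the value.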

How to sort file contents by paragraph

I have a text file containing below lines.
Number: "472"
displayname: "jack holi"
Number: "392"
displayname: "david"
Number: "376"
displayname: "joly"
Number: "481"
displayname: "john doe"
....
How to sort them in ascending order by number and have output like below
Number: "376"
displayname: "joly"
Number: "392"
displayname: "david"
Number: "472"
displayname: "jack holi"
Number: "481"
displayname: "john doe"
If you are still looking for an awk solution (GNU awk, because of the array sorting), you can use this script:
script.awk
BEGIN { ORS= RS="\n\n"
FS="[\n:]"
PROCINFO["sorted_in"] = "@ind_num_asc"
}
{ gsub( /"/, "", $2)
so[ $2 + 0 ] = $0 }
END { for( k in so ) print so[k] }
Use it like this: awk -f script.awk yourfile
Explanation
The record separator RS is set to two newlines, so that Number and displayname become members of the same record.
The field separator FS is set to a newline or :, so that the labels Number and displayname end up in fields $1 and $3 and their values in $2 and $4.
The record is stored in so under the key from $2; so is traversed in ascending numeric index order (@ind_num_asc).
Only at the end is everything printed.
Perl to the rescue!
perl -e 'BEGIN { $/ = "" }
print for map $_->[1],
sort { $a->[0] <=> $b->[0] }
map [ /Number: "(\d+)"/, $_ ],
<>;' -- input.txt
The BEGIN block turns on paragraph mode, i.e. file is read by the diamond operator in paragraphs, i.e. blocks of texts separated by empty lines.
It uses a Schwartzian transform: it maps each block to a pair (Number, block), sorts the pairs by number, and maps them back to the blocks, now in the correct order.
Here's a slightly different take... read two lines at a time from your input file with GNU Parallel and put them together on a single line, sort them, then split the lines up again:
parallel -L2 -ra input.txt echo | sort -n | perl -pe 's/" /"\n/; $_.="\n"'
For earlier versions of gawk that don't have PROCINFO for array scanning order, you can do:
awk 'function cmp(i1, v1, i2, v2)
{ return (i1-i2) }
BEGIN { ORS=RS="\n\n" }
{ s=$2
gsub(/"/, "", s)
arr[s]=$0 }
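END {
# asorti returns the number of elements; walk so by position, because for (k in so) has no guaranteed order
n = asorti(arr, so, "cmp")
for (k = 1; k <= n; k++)
print arr[so[k]]}' file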
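If gawk is not available at all, the same idea can be sketched in POSIX awk with a hand-written insertion sort. This assumes paragraphs are separated by blank lines and that the first number in each paragraph is the sort key; sort_paragraphs is a hypothetical name, and the quadratic sort is only reasonable for small files.

```shell
#!/bin/sh
# Sketch only: sort_paragraphs is a hypothetical helper using POSIX awk features.
# usage: sort_paragraphs FILE   (or pipe the text in on standard input)
sort_paragraphs() {
    awk '
        BEGIN { RS = "" }                      # paragraph mode: blank lines split records
        {
            key = $0
            sub(/^[^0-9]*/, "", key)           # strip everything before the first digit
            keys[NR] = key + 0                 # leading digits become the numeric key
            rec[NR] = $0
        }
        END {
            for (i = 2; i <= NR; i++) {        # insertion sort, ascending by key
                k = keys[i]; r = rec[i]
                for (j = i - 1; j >= 1 && keys[j] > k; j--) {
                    keys[j + 1] = keys[j]
                    rec[j + 1] = rec[j]
                }
                keys[j + 1] = k
                rec[j + 1] = r
            }
            for (i = 1; i <= NR; i++)
                printf "%s%s", rec[i], (i < NR ? "\n\n" : "\n")
        }
    ' "$@"
}
```

Paragraphs are written back separated by a single blank line, with no trailing blank line after the last one.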

gsub for substituting translations not working

I have a dictionary dict with records separated by ":" and data fields by new lines, for example:
:one
1
:two
2
:three
3
:four
4
Now I want awk to substitute all occurrences of each record in the input
file, eg
onetwotwotwoone
two
threetwoone
four
My first awk script looked like this and works just fine:
BEGIN { RS = ":" ; FS = "\n"}
NR == FNR {
rep[$1] = $2
next
}
{
for (key in rep)
gsub(key,rep[key])
print
}
giving me:
12221
2
321
4
Unfortunately another dict file contains some character used by regular expressions, so I have to substitute escape characters in my script. By moving key and rep[key] into a string (which can then be parsed for escape characters), the script will only substitute the second record in the dict. Why? And how to solve?
Here's the current second part of the script:
{
for (key in rep)
orig=key
trans=rep[key]
gsub(/[\]\[^$.*?+{}\\()|]/, "\\\\&", orig)
gsub(orig,trans)
print
}
All scripts are run by awk -f translate.awk dict input
Thanks in advance!
Your fundamental problem is using strings in regexp and backreference contexts when you don't want them and then trying to escape the metacharacters in your strings to disable the characters that you're enabling by using them in those contexts. If you want strings, use them in string contexts, that's all.
You don't want this:
gsub(regexp,backreference-enabled-string)
You want something more like this:
index(...,string) substr(string)
I think this is what you're trying to do:
$ cat tst.awk
BEGIN { FS = ":" }
NR == FNR {
if ( NR%2 ) {
key = $2
}
else {
rep[key] = $0
}
next
}
{
for ( key in rep ) {
head = ""
tail = $0
while ( start = index(tail,key) ) {
head = head substr(tail,1,start-1) rep[key]
tail = substr(tail,start+length(key))
}
$0 = head tail
}
print
}
$ awk -f tst.awk dict file
12221
2
321
4
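The index/substr loop can be factored into a reusable function. The sketch below is an illustration rather than part of the answer: literal_replace, replace_all and the variable names find and repl are hypothetical. Both arguments are treated as plain strings, so regex metacharacters and & need no escaping (awk -v still interprets backslash escapes in the values).

```shell
#!/bin/sh
# Sketch only: literal_replace and replace_all are hypothetical names.
# usage: literal_replace FIND REPLACEMENT < input
literal_replace() {
    awk -v find="$1" -v repl="$2" '
        # Replace every occurrence of from in s with to, using string functions only.
        function replace_all(s, from, to,    out, pos) {
            if (from == "") return s           # nothing to search for
            out = ""
            while ((pos = index(s, from)) > 0) {
                out = out substr(s, 1, pos - 1) to
                s = substr(s, pos + length(from))
            }
            return out s
        }
        { print replace_all($0, find, repl) }
    '
}
```

For example, printf 'a.b+a.b\n' | literal_replace 'a.b' '&' replaces each literal a.b with a literal &, whereas gsub would treat the . as "any character" and the & as "the matched text".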
Never mind my asking...
Just some missing braces...?!
{
for (key in rep)
{
orig=key
trans=rep[key]
gsub(/[\]\[^$.*?+{}\\()|]/, "\\\\&", orig)
gsub(orig,trans)
}
print
}
works like a charm.

Problem with context and property

Let's say I want to generate this output:
public String toString() {
return this.getFirstName() + "," + this.getLastName() + "," + this.getAge();
}
from the template below and a custom recursive build-markup function:
template-toString: {this.get<%property%>() <%either not context.build-markup/EOB [{+ "," +}][""]%> }
build-markup/vars template-toString [property] ["FirstName" "LastName" "Age"]
My problem is how to avoid the last element being concatenated with {+ "," +}
My idea was to use a context.build-markup with an EOB property (End Of Block) that would be set to true when last element is processed. Then I could use in template-toString above either not context.build-markup/EOB [{+ "," +}][""] to concatenate or not with {+ "," +} :
context.build-markup: context [
EOB: false
set 'build-markup func [
{Return markup text replacing <%tags%> with their evaluated results.}
content [string! file! url!]
/vars block-fields block-values
/quiet "Do not show errors in the output."
/local out eval value n max i
][
out: make string! 126
either not vars [
content: either string? content [copy content] [read content]
eval: func [val /local tmp] [
either error? set/any 'tmp try [do val] [
if not quiet [
tmp: disarm :tmp
append out reform ["***ERROR" tmp/id "in:" val]
]
] [
if not unset? get/any 'tmp [append out :tmp]
]
]
parse/all content [
any [
end break
| "<%" [copy value to "%>" 2 skip | copy value to end] (eval value)
| copy value [to "<%" | to end] (append out value)
]
]
][
n: length? block-fields
self/EOB: false
actions: copy []
repeat i n [
append actions compose/only [
;set in self 'EOB (i = n)
set in system/words (to-lit-word pick (block-fields) (i)) get pick (block-fields) (i)
]
]
append actions compose/only [
append out build-markup content
]
foreach :block-fields block-values actions
if any [(back tail out) = "^/" (back tail out) = " " (back tail out) = "," (back tail out) = ";" (back tail out) = "/" (back tail out) = "\"] [
remove back tail out
]
]
out
]
]
But my attempt failed (so I commented out ;set in self 'EOB (i = n) because it doesn't work). How can I correct the code to get what I want?
I'm quite certain you could be achieving your goal in a cleaner way than this. Regardless, I can tell you why what you're doing isn't working!
Your n is the expression length? block-fields, and your repeat loop goes up to n. But block-fields contains the single parameter [property]! Hence, it loops from 1 to 1.
You presumably wanted to test against something enumerating over block-values (in this example a range from 1 to 3) and then handle it uniquely if the index reached 3. In other words, your set in self 'EOB expression needs to be part of your enumeration over block-values and NOT block-fields.
This would have given you the behavior you wanted:
n: length? block-values
i: 1
foreach :block-fields block-values compose/only [
set in self 'EOB equal? i n
do (actions)
++ i
]
This absolutely won't work:
append actions compose/only [
set in self 'EOB (i = n)
set in system/words (to-lit-word pick (block-fields) (i)) get pick (block-fields) (i)
]
...because you are dealing with a situation where i and n are both 1, for a single iteration of this loop. Which means (i = n) is true. So the meta-code you get for "actions" is this:
[
set in self 'EOB true
set in system/words 'property get pick [property] 1
]
Next you run the code with a superfluous composition (because there are no PAREN!s, you could just omit COMPOSE/ONLY):
append actions compose/only [
append out build-markup content
]
Which adds a line to your actions meta-code, obviously:
[
set in self 'EOB true
set in system/words 'property get pick [property] 1
append out build-markup content
]
As per usual I'll suggest you learn to use PROBE and PRINT to look and check your expectations at each phase. Rebol is good about dumping variables and such...
You seem to be making something simple very complicated:
>> a: make object! [
[ b: false
[ set 'c func[i n] [b: i = n]
[ ]
>> a/b
== false
>> c 1 4
== false
>> a/b
== false
>> c 1 1
== true
>> a/b
== true