awk - How to remove a pattern with brackets

I made a mistake while adding a section to an INI file (yes, we're in 2022 :D) and the section has errors.
I added a line [End[edit=true]
How could I remove this entire line using awk? (I don't have any other choice 😕)
I don't understand how to escape the [ in the awk command line.
Could you please help me?
Thanks

I don't understand how to escape the [ in the awk command line.
If the line is always literally [End[edit=true] then you do not need to escape anything; just select the lines which are not that one, in the following way. Let file.ini content be
[someline=true]
[End[edit=true]
[another=true]
then
awk '$0!="[End[edit=true]"' file.ini
gives output
[someline=true]
[another=true]
Explanation: $0 denotes the whole line; if it is not [End[edit=true] then it is printed.
(tested in GNU Awk 5.0.1)
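If the bracketed text might appear with extra surrounding text on the line (an assumption on my part, not stated in the question), another way to sidestep escaping entirely is awk's index() function, which does plain substring matching with no regex interpretation:
awk 'index($0,"[End[edit=true]")==0' file.ini
This keeps every line in which the literal string is not found.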

A couple of ideas where you escape the leading (left) brackets:
awk '/\[End\[edit=true]/ {next} 1' file
# or
awk '!/\[End\[edit=true]/' file
Once you've confirmed the results, and assuming you're using GNU awk, you can add -i inplace to update the file:
awk -i inplace '!/\[End\[edit=true]/' file
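Note that -i inplace is a gawk extension; if you're stuck with another awk (a plausible scenario given the question, but an assumption here), the usual portable sketch is to write to a temporary file and move it back:
awk '!/\[End\[edit=true]/' file > file.tmp && mv file.tmp file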

awk command works, but not in openwrt's awk

Works here: awk.js.org/
but not in openwrt's awk, which returns the error message:
awk: bad regex '^(server=|address=)[': Missing ']'
Hello everyone!
I'm trying to use an awk command I wrote which is:
'!/^(server=|address=)[/][[:alnum:]][[:alnum:]-.]+([/]|[/]#)$|^#|^\s*$/ {count++}; END {print count+0}'
which counts invalid lines in a DNS blocklist (oisd in this case).
Input would be e.g.:
server=/0--foodwarez.da.ru/anyaddress.1.1.1
serverspellerror=/0-000.store/
server=/0-24bpautomentes.hu/
server=/0-29.com/
server=/0-day.us/
server=/0.0.0remote.cryptopool.eu/
server=/0.0mail6.xmrminingpro.com/
server=/0.0xun.cryptopool.space/
Output for this should be "2" since there are two lines that don't match the criteria (correctly formed address, comments, or blank lines).
I've tried formatting the command every which way with [], but can't find anything that works. Does anyone have an idea what format/syntax/option needs adjusting?
Thanks!
To portably include - in a bracket expression, it has to be the first or last character; otherwise it means a range. Also, \s is shorthand for [[:space:]] in only some awks. This will work in any POSIX awk:
$ awk '!/^(server=|address=)[/][[:alnum:]][[:alnum:].-]+([/]|[/]#)$|^#|^[[:space:]]*$/ {count++}; END {print count+0}' file
2
Per @tripleee's comment below, if your awk is broken such that a / inside a bracket expression isn't treated as literal, then you may need this instead:
$ awk '!/^(server=|address=)\/[[:alnum:]][[:alnum:].-]+(\/|\/#)$|^#|^[[:space:]]*$/ {count++}; END {print count+0}' file
2
but get a new awk, e.g. GNU awk, as who knows what other surprises the one you're using may have in store for you!
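As a quick probe of that quirk (my own diagnostic sketch, not part of the answer), you can feed a one-line test to the awk in question:
echo 'a/b' | awk '/[/]/{print "slash inside brackets is literal here"}'
A well-behaved awk prints the message; an awk with the quirk complains about the regex instead, and then the backslash-escaped variant above is the one to use.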
'!/^(server=|address=)[/][[:alnum:]][[:alnum:]-.]+([/]|[/]#)$|^#|^\s*$/ {count++}; END {print count+0}'
- has special meaning inside [ and ]; it is used to denote a range, e.g. [A-Z] means an uppercase ASCII letter. Use the \ escape sequence to make it a literal dash. Let file.txt content be
server=/0--foodwarez.da.ru/anyaddress.1.1.1
serverspellerror=/0-000.store/
server=/0-24bpautomentes.hu/
server=/0-29.com/
server=/0-day.us/
server=/0.0.0remote.cryptopool.eu/
server=/0.0mail6.xmrminingpro.com/
server=/0.0xun.cryptopool.space/
then
awk '!/^(server=|address=)[/][[:alnum:]][[:alnum:]\-.]+([/]|[/]#)$|^#|^\s*$/ {count++}; END {print count+0}' file.txt
gives output
2
You might also consider replacing \s with [[:space:]] in order to maintain consistency.
(tested in GNU Awk 5.0.1)
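If the range behaviour of - is still unclear, here is a minimal standalone illustration (my own example, not from the question): an unescaped dash between two characters denotes a range, while a trailing (or escaped) dash is literal.
echo 'c' | awk '/[a-e]/{print "in range a-e"}'
echo 'c' | awk '/[ae-]/{print "is a, e, or a literal dash"}'
The first command prints its message; the second prints nothing, because c is not a, e, or -.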

Remove field-internal newlines in CSV file

I tried different awk methods to achieve this, but since I don't really understand how awk works, I didn't succeed.
So, I have a - large - csv-file that contains multi-line entries such as this:
"99999";"xyz";"text
that has
multiple newlines";"fdx";"xyz"
I need to get rid of those extra newlines in between the quotes.
Since every line ends with a double quote, followed by a newline, I thought I could create a command that replaces all newlines, except the ones that are prepended by a double-quote.
How would I do that?
Chances are all you need is this, using GNU awk for multi-char RS:
awk -v RS='\r\n' '{gsub(/\n/," ")}1' file
since your input is probably a CSV exported from a Windows tool like Excel and so has \r\n "line" endings but individual \ns for newlines within fields.
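To check that assumption about the line endings before relying on it (my own quick diagnostic, not from the answer), dump the first line as characters:
head -n 1 file | od -c
A Windows-style file shows \r \n at the end of the line; a Unix-style one shows only \n.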
Alternatively, again using GNU awk for multi-char RS and RT:
$ awk -v RS='"[^"]+"' -v ORS= '{gsub(/\n/," ",RT); print $0 RT}' file
"99999";"xyz";"text that has multiple newlines";"fdx";"xyz"
or if you want all the chains of newlines compressed to single blanks:
$ awk -v RS='"[^"]+"' -v ORS= '{gsub(/\n+/," ",RT); print $0 RT}' file
"99999";"xyz";"text that has multiple newlines";"fdx";"xyz"
If you need anything else, including being able to identify and use the individual fields on each input "line", see What's the most robust way to efficiently parse CSV using awk?.
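In case RT is unfamiliar: it is a gawk-specific variable holding the input text that actually matched RS for the current record. A tiny standalone sketch of the idea (made-up data, not the question's file):
printf 'a1b22c333' | gawk -v RS='[0-9]+' '{print "record [" $0 "] RT [" RT "]"}'
record [a] RT [1]
record [b] RT [22]
record [c] RT [333]
which is why the answer above can restore each quoted field by printing $0 followed by RT.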

awk to filter lines in file by removing pattern

Trying to use awk to remove the IonCode_ plus 4 digits (always 4, though the digits may differ) and leave the file extension. Is the below the best way? Thank you :).
file
1112233 ID_1234_000000-Control_z_zzzz_zz_zz_zz_zz_zz_zzz_zz-zzzz-zzz-zzz_zzzz_zzzz_zzz_zzz_zzz_zzz_zzz.txt
1112231 ID_1234_000000-Control_z_zzzz_zz_zz_zz_zz_zz_zzz_zz-zzzz-zzz-zzz_zzzz_zzzz_zzz_zzz_zzz_zzz_zzz.txt
awk
awk '/_tn_/ {next} gsub("^.*/|_.*$|IonCode_...._", "", $2)' file
current
1112233 000000-Control
1112231 000000-Control
desired
1112233 000000-Control.txt
1112231 000000-Control.txt
Split the fields on one or more spaces or an underscore, so the 4th field will be the part you're interested in.
awk -F '[[:space:]]+|_' '!/_tn_/{print $1,$4".txt"}' file
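Run against the sample file above, this gives the desired output
1112233 000000-Control.txt
1112231 000000-Control.txt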
Could you please try the following. This is the simplest I could think of; we could also do it by referring to field numbers, but that would be more like hard-coding numbers, so I went with this approach here.
awk '
{
  sub(/[^_]*_/,"",$2)     # drop the leading "ID_" chunk from the 2nd field
  sub(/[^_]*_/,"",$2)     # drop the 4-digit code and its trailing underscore
  sub(/_.*/,".txt",$2)    # replace everything from the next "_" onward with ".txt"
}
1
' Input_file
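Against the sample file above, this gives the same desired output
1112233 000000-Control.txt
1112231 000000-Control.txt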
with sed
$ sed -E 's/ID_[0-9]{4}_([^_]+).*(\..*)/\1\2/' file
1112233 000000-Control.txt
1112231 000000-Control.txt

For gawk, how to set FS and RS in the same command as an awk script?

I have an awk command that returns the duplicates in an input stream with
awk '{a[$0]++}END{for (i in a)if (a[i]>1)print i;}'
However, I want to change the field separator characters and record separator characters before I do that. The command I use for that is
FS='\n' RS='\n\n'
Yet I'm having trouble making that happen. Is there a way to effectively combine these two commands into one? Piping one to the other doesn't seem to work either.
The action of a BEGIN rule is executed before reading any input.
awk 'BEGIN{FS="\n";RS="\n\n"}{a[$0]++}END{for (i in a)if (a[i]>1)print i;}'
or you can specify them using command line options like:
awk -F '\n' -v RS='\n\n' '{a[$0]++}END{for (i in a)if (a[i]>1)print i;}'
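As a quick sanity check with a made-up file (my own sample, not from the question) containing two identical blank-line-separated records followed by a different one:
printf 'line1\nline2\n\nline1\nline2\n\nother\n' > blocks.txt
gawk -F '\n' -v RS='\n\n' '{a[$0]++} END{for (i in a) if (a[i]>1) print i}' blocks.txt
gives output
line1
line2
One caveat: with a multi-character RS the final record keeps its trailing newline, so a duplicate that happens to be the last block in the file would not compare equal to the earlier copy; gawk's paragraph mode (RS="") is a common way around that.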

Append prefix to first column of a file with awk

I have a couple of hundreds of files which I want to process with xargs. They all need a fix of their first column.
Therefore I need an awk command to append the prefix "ID_" to the first column of a file (except for the first header line). Can anyone help me with this?
Something along the lines of:
gawk -f ';' "{$1='ID_' $1; print $0}" file.csv > file_processed.csv
I am not an expert with the command, though. And I would rather have some in-place processing instead of making a copy of each file. Previously I did it in Vim, but then I only had one file:
:%s/^-/ID_/
I hope someone can help me here.
gawk 'BEGIN{FS=";"; OFS=";"} {if(NR>1) $1="ID_"$1; print}' file.csv > file_processed.csv
FS and OFS set the input and output field separators, respectively.
NR>1 checks whether current line number is larger than 1, so we don't modify the header line.
You can also modify the file in place with -i inplace option:
gawk -i inplace 'BEGIN{FS=";"; OFS=";"} {if(NR>1) $1="ID_"$1; print}' file.csv
Edit
After the original question was clarified, here's the final version:
gawk -i inplace 'BEGIN{FS=OFS=";"} NR>1{sub(/^-/,"ID_",$2)} 1' file.csv
which substitutes a - at the beginning of the second column with ID_.
The NR>1 action applies to all but the first (header) line. 1 invokes the default print action.
If you just want to do something to the first field, in particular add a prefix, it is no different from adding the prefix to the whole line, since the first field starts the line.
So you can just use awk '$0 = "ID_" $0' file.csv and it should do the job. If you want to make it "change in place", you can:
awk '$0="ID_"$0' file.csv >/tmp/foo && mv /tmp/foo file.csv
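Note that this prefixes every line, including the header; since the question wants to skip the first (header) line, a small adjustment (same idea, just guarded by NR) is:
awk 'NR>1{$0="ID_"$0} 1' file.csv >/tmp/foo && mv /tmp/foo file.csv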
You can also make use of sed:
sed -i 's/^/ID_/' file
The -i does "in-place modification"
You mentioned Vim and gave the s/^-/ID_/ command; note that it doesn't add the prefix (ID_), it replaces a leading - with ID_, which is different.
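If you also need to leave the header line untouched with sed (assuming GNU sed for -i, as above), restrict the substitution to every line but the first:
sed -i '1!s/^/ID_/' file
where 1! means "not line 1".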