Sed between blocks of some exact text - awk

I am struggling to parse a log file.
Here is what it looks like:
node_name: na2-devdb-cssx
run_id: 3c3424f3-8a62-4f4c-b97a-2096a2afc070
start_time: 2015-06-26T21:00:44Z
status: failure
node_name: eu1-devsx
run_id: f5ed13a3-1f02-490f-b518-97de9649daf5
start_time: 2015-06-26T21:00:34Z
status: success
I need to get the blocks whose last line contains "failure".
Ideally I would also like to filter on the timestamp, e.g. keep only blocks whose timestamp matches "2015-06-26T2*".
And here what I have tried so far:
sed -e '/node_name/,/failure/p' /file
sed -n '/node_name/,/failure/p' /file
awk '/node_name/,/failure/' file
sed -e 's/node_name\(.*\)failure/\1/' file
None of them works for me.
I just get everything back, not only the failure blocks...
For example:
[root@localhost chef-repo-zilliant]# sed -n '/node_name/,/failure/p' /tmp/run.txt | head
node_name: eu1-devdb-linc
run_id: e49fe64d-567d-4627-a10d-477e17fb6016
start_time: 2015-06-28T20:59:55Z
status: success
node_name: eu1-devjs1
run_id: c6c7f668-b912-4459-9d56-94d1e0788802
start_time: 2015-06-28T20:59:53Z
status: success
I have no idea why it doesn't work. It seems these methods work fine for everyone else...
Thank you in advance.

A way with Gnu sed:
sed -n ':a;/^./{H;n;ba;};x;/2015-06-26T21/{/failure$/p;};' file.txt
details:
:a;                  # define the label "a"
/^./ {               # when the line is not empty:
H;                   #   append it to the hold space
n;                   #   load the next line into the pattern space
ba;                  #   go back to label "a"
};
x;                   # swap hold space and pattern space
/2015-06-26T21/ {    # if the wanted date is in the block:
/failure$/ p;        #   and the block ends with "failure", print it
};
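The one-liner can be smoke-tested on a small blank-line-separated sample; sample.txt below is a hypothetical file built from the question's data, and GNU sed syntax is assumed:

```shell
# Build a two-block sample; only the first block should be printed.
cat > sample.txt <<'EOF'
node_name: na2-devdb-cssx
run_id: 3c3424f3-8a62-4f4c-b97a-2096a2afc070
start_time: 2015-06-26T21:00:44Z
status: failure

node_name: eu1-devsx
run_id: f5ed13a3-1f02-490f-b518-97de9649daf5
start_time: 2015-06-26T21:00:34Z
status: success
EOF

# The printed block starts with the empty line that H left in the hold space.
sed -n ':a;/^./{H;n;ba;};x;/2015-06-26T21/{/failure$/p;};' sample.txt
```

Note that the second block matches the date but not /failure$/, so it is skipped.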

I noticed you also tried awk, although you only tagged the question with sed, so I will add a solution with it.
You can play with the built-in variables that control how records and fields are split:
awk '
BEGIN { RS = ""; FS = OFS = "\n"; ORS = "\n\n" }
$NF ~ /failure/ && $(NF-1) ~ /2015-06-26T2/ { print }
' infile
RS = "" makes blank lines separate records (paragraph mode). FS and OFS split records into fields on newlines, and ORS appends a blank line after each record so the output looks like the original input.
It yields:
node_name: na2-devdb-cssx
run_id: 3c3424f3-8a62-4f4c-b97a-2096a2afc070
start_time: 2015-06-26T21:00:44Z
status: failure
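Paragraph mode can be checked in isolation; run_demo.txt is a hypothetical file with the question's blocks separated by a blank line:

```shell
# RS="" splits records on blank lines; FS="\n" makes each line a field,
# so $NF is the status line and $(NF-1) the start_time line.
cat > run_demo.txt <<'EOF'
node_name: na2-devdb-cssx
run_id: 3c3424f3-8a62-4f4c-b97a-2096a2afc070
start_time: 2015-06-26T21:00:44Z
status: failure

node_name: eu1-devsx
run_id: f5ed13a3-1f02-490f-b518-97de9649daf5
start_time: 2015-06-26T21:00:34Z
status: success
EOF

awk 'BEGIN { RS = ""; FS = OFS = "\n"; ORS = "\n\n" }
$NF ~ /failure/ && $(NF-1) ~ /2015-06-26T2/' run_demo.txt
```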

Use grep.
grep -oPz '\bnode_name:(?:(?!\n\n)[\s\S])*?2015-06-26T2(?:(?!\n\n)[\s\S])*?\bfailure\b' file
The main part here is (?:(?!\n\n)[\s\S])*?, a tempered greedy token that lazily matches any character, zero or more times, but never across a blank line.
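The tempered greedy token is easier to see on a tiny sample; blocks.txt is a hypothetical two-block file, and GNU grep with PCRE support (-P) is assumed:

```shell
# -z treats the whole file as one record, so the pattern may match
# across newlines -- but the lookahead stops it at a blank line.
printf 'node_name: a\nstatus: failure\n\nnode_name: b\nstatus: success\n' > blocks.txt
grep -oPz '\bnode_name:(?:(?!\n\n)[\s\S])*?\bfailure\b' blocks.txt | tr '\0' '\n'
```

Only the first block is printed; the second block contains "node_name:" but no "failure" before the next blank line.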

Related

Can't replace string to multi-lined string with sed

I'm trying to replace a fixed string ("replaceMe") in a text file with multi-line text using sed.
My bash script goes as follows:
content=$(awk '{print $5}' < data.txt | sort | uniq)
target=$(cat install.sh)
text=$(sed "s/replaceMe/$content/" <<< "$target")
echo "${text}"
If content contains one line only, the replacement works, but if it contains several lines I get:
sed: ... unterminated `s' command
I read about "fetching" multi-line content, but I couldn't find anything about inserting a multi-line string.
You'll have more problems than that, depending on the contents of data.txt, since sed doesn't understand literal strings (see Is it possible to escape regex metacharacters reliably with sed). Just use awk, which does:
text="$( awk -v old='replaceMe' '
NR==FNR {
if ( !seen[$5]++ ) {
new = (NR>1 ? new ORS : "") $5
}
next
}
s = index($0,old) { $0 = substr($0,1,s-1) new substr($0,s+length(old)) }
{ print }
' data.txt install.sh )"
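The quoted error is easy to reproduce: sed treats a raw newline in the replacement text as the end of the s command, so splicing a multi-line shell variable directly into s/// fails (sed would need each newline escaped). A minimal sketch:

```shell
# A multi-line value spliced into s/// produces an invalid sed program;
# sed exits with an error instead of substituting.
content='first line
second line'
if echo 'before replaceMe after' | sed "s/replaceMe/$content/" 2>/dev/null; then
    echo "substitution worked"
else
    echo "sed rejected the multi-line replacement"
fi
```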

Insert a line at the end of an ini section only if it doesn't exist

I have an smb.conf ini file which gets overwritten whenever it is edited with a certain GUI tool, wiping out a custom setting. This means I need a cron job to ensure that one particular section in the file contains a certain option=value pair, and to insert it at the end of the section if it doesn't exist.
Example
Ensure that hosts deny=192.168.23. exists within the [myshare] section:
[global]
printcap name = cups
winbind enum groups = yes
security = user
[myshare]
path=/mnt/myshare
browseable=yes
enable recycle bin=no
writeable=yes
hosts deny=192.168.23.
[Another Share]
invalid users=nobody,nobody
valid users=nobody,nobody
path=/mnt/share2
browseable=no
Long-winded solution using awk
After a long time struggling with sed, I concluded that it might not be the right tool for the job. So I moved over to awk and came up with this:
#!/bin/sh
file="smb.conf"
tmp="smb.conf.tmp"
section="myshare"
opt="hosts deny=192.168.23."
awk '
BEGIN {
this_section=0;
opt_found=0;
}
# Match the line where our section begins
/^[ \t]*\['"$section"'\][ \t]*$/ {
this_section=1;
print $0;
next;
}
# Match lines containing our option
this_section == 1 && /^[ \t]*'"$opt"'[ \t]*$/ {
opt_found=1;
}
# Match the following section heading
this_section == 1 && /^[ \t]*\[.*$/ {
this_section=0;
if (opt_found != 1) {
print "\t'"$opt"'";
}
}
# Print every line
{ print $0; }
END {
# In case our section is the very last in the file
if (this_section == 1 && opt_found != 1) {
print "\t'"$opt"'";
}
}
' $file > $tmp
# Overwrite $file only if $tmp is different
diff -q $file $tmp > /dev/null 2>&1
if [ $? -ne 0 ]; then
mv $tmp $file
# reload smb.conf here
else
rm $tmp
fi
I can't help feeling that this is a long script for such a simple task. Is there a more efficient or elegant way to insert a property into an ini file using basic shell tools like sed and awk?
Consider using Python 3's configparser:
#!/usr/bin/python3
import sys
from configparser import ConfigParser

# ConfigParser supersedes the deprecated SafeConfigParser alias
cfg = ConfigParser()
cfg.read(sys.argv[1])
cfg['myshare']['hosts deny'] = '192.168.23.'
with open(sys.argv[1], 'w') as f:
    cfg.write(f)
To be called as ./filename.py smb.conf (i.e., the first parameter is the file to change).
Note that comments are not preserved by this. However, since a GUI overwrites the config and doesn't preserve custom options, I suspect that comments are already nuked and that this is not a worry in your case.
Untested, but it should work:
awk -vT="hosts deny=192.168.23" 'x&&$0~T{x=0}x&&/^ *\[[^]]+\]/{print "\t\t"T;x=0}
/^ *\[myshare\]/{x++}1' file
This solution is a bit awkward. It uses the INI section header as the record separator. This means that there is an empty record before the first header, so when we match the header we're interested in, we have to read the next record to handle that INI section. Also, there are some printf commands because the records still contain leading and trailing newlines.
awk -v RS='[[][^]]+[]]' -v str="hosts deny=192.168.23." '
{printf "%s", $0; printf "%s", RT}
RT == "[myshare]" {
getline
printf "%s", $0
if (index($0, str) == 0) print str
printf "%s", RT
}
' smb.conf
RS is the awk variable that contains the regex to split the text into records.
RT is the awk variable that contains the actual text of the current record separator.
With GNU awk for a couple of extensions:
$ cat tst.awk
index($0,str) { found = 1 }
match($0,/^\s*\[([^]]+).*/,a) {
if ( (name == tgt) && !found ) { print indent str }
name = a[1]
found = 0
}
{ print; indent=gensub(/\S.*/,"","") }
$ awk -v tgt="myshare" -v str="hosts deny=192.168.23." -f tst.awk file
[global]
printcap name = cups
winbind enum groups = yes
security = user
[myshare]
path=/mnt/myshare
browseable=yes
enable recycle bin=no
writeable=yes
hosts deny=192.168.23.
[Another Share]
invalid users=nobody,nobody
valid users=nobody,nobody
path=/mnt/share2
browseable=no
$ awk -v tgt="myshare" -v str="fluffy bunny" -f tst.awk file
[global]
printcap name = cups
winbind enum groups = yes
security = user
[myshare]
path=/mnt/myshare
browseable=yes
enable recycle bin=no
writeable=yes
hosts deny=192.168.23.
fluffy bunny
[Another Share]
invalid users=nobody,nobody
valid users=nobody,nobody
path=/mnt/share2
browseable=no

How to rewrite a Awk script to process several files instead of one

I am writing a report tool which processes the source files of some application and produces a report table with two columns: one containing the name of the file, and the other containing the word TODO if the file contains a call to some deprecated function deprecated_function, or DONE otherwise.
I used awk to prepare this report and my shell script looks like
report()
{
find . -type f -name '*.c' \
| xargs -n 1 awk -v deprecated="$1" '
BEGIN { status = "DONE" }
$0 ~ deprecated{ status = "TODO" }
END {
printf("%s|%s\n", FILENAME, status)
}'
}
report "deprecated_function"
The output of this script looks like
./plop-plop.c|DONE
./fizz-boum.c|TODO
This works well, but I would like to rewrite the awk script so that it supports several input files instead of just one, so that I can remove the -n 1 argument to xargs. The only solutions I could figure out involve a lot of bookkeeping, because we need to track changes of FILENAME, and the END event, to catch each end-of-file event.
awk -v deprecated="$1" '
BEGIN { status = "DONE" }
oldfilename && (oldfilename != FILENAME) {
printf("%s|%s\n", oldfilename, status);
status = "DONE";
oldfilename = FILENAME;
}
$0 ~ deprecated{ status = "TODO" }
END {
printf("%s|%s\n", FILENAME, status)
}'
Maybe there is a cleaner and shorter way to handle this.
I am using FreeBSD's awk and am looking for solutions compatible with this tool.
This will work in any modern awk:
awk -v deprecated="$1" -v OFS='|' '
$0 ~ deprecated{ dep[FILENAME] }
END {
for (i=1;i<ARGC;i++)
print ARGV[i], (ARGV[i] in dep ? "TODO" : "DONE")
}
' file1 file2 ...
Any time you need to produce a report for all files and don't have GNU awk's ENDFILE, you MUST loop through ARGV[] in the END section (or loop through it in BEGIN and populate a different array for END-section processing). Anything else will fail if you have empty files.
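The empty-file pitfall is easy to demonstrate: a file with no lines never triggers a per-line rule, so FILENAME-based bookkeeping misses it, yet it still appears in ARGV[]. A minimal sketch with hypothetical file names:

```shell
# empty_demo.c triggers no per-line rule, but the ARGV[] loop
# in END still reports it as DONE.
: > empty_demo.c
printf 'deprecated_function();\n' > used_demo.c
awk -v deprecated='deprecated_function' -v OFS='|' '
$0 ~ deprecated { dep[FILENAME] }
END {
    for (i=1; i<ARGC; i++)
        print ARGV[i], (ARGV[i] in dep ? "TODO" : "DONE")
}' empty_demo.c used_demo.c
```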
Your awk script could be something like this:
awk -v deprecated="$1" '
FNR==1 {if(file) print file "|" (f?"TODO":"DONE"); file=FILENAME; f=0}
$0 ~ deprecated {f=1}
END {print file "|" (f?"TODO":"DONE")}' file1.c file2.c # etc.
The logic is fairly similar to your program, so hopefully it's all clear. FNR is the record number within the current file, which I'm using to detect the start of a new file. Admittedly there's some repetition in the END block, but I don't think it's a big deal. You could always use a function if you wanted to.
Testing it out:
$ cat f1.c
int deprecated_function()
{
// some deprecated stuff
}
$ cat f2.c
int good_function()
{
// some good stuff
}
$ find -name "f?.c" -print0 | xargs -0 awk -v deprecated="deprecated" 'FNR==1 {if(file) print file "|" (f?"TODO":"DONE"); file=FILENAME; f=0} $0 ~ deprecated {f=1} END {print file "|" (f?"TODO":"DONE")}'
./f2.c|DONE
./f1.c|TODO
I have used -print0 and the -0 switch to xargs so that both programs work with file names separated by null bytes "\0" rather than spaces. This means you won't run into problems with spaces in file names.

awk: non-terminated string

I'm trying to run the command below, and it's giving me the error shown. Any thoughts on how to fix it? I would rather have this be a one-line command than a script.
grep "id\": \"http://room.event.assist.com/event/room/event/" failed_events.txt |
head -n1217 |
awk -F/ ' { print $7 } ' |
awk -F\" ' { print "url \= \"http\:\/\/room\.event\.assist\.com\/event\/room\/event\/'{ print $1 }'\?schema\=1\.3\.0\&form\=json\&pretty\=true\&token\=582EVTY78-03iBkTAf0JAhwOBx\&account\=room_event\"" } '
awk: non-terminated string url = "ht... at source line 1
context is
>>> <<<
awk: giving up
source line number 2
The pipeline below extracts a single column of IDs:
grep "id\": \"http://room.event.assist.com/event/room/event/" failed_events.txt |
head -n1217 |
awk -F/ ' { print $7 } '
156512145
898545774
454658748
898432413
I'm looking to embed each of the IDs above in a string, like so:
url = "string...'ID'string"
Take a look at what you have in the last awk:
awk -F\"
'                  # single quote starts here
{ print "          # double quote starts for print, and never ends
url \= \"http\:\/\/room\.event\.assist\.com\/event\/room\/event\/
'                  # single quote ends here???
{ print $1 }'..... # single quote starts again??? ...
(rest of the code)
Did you really mean to print a literal { print $1 }? I don't think so. Why were you nesting print?
Most of the elements of your pipe can be expressed right inside awk.
I can't tell exactly what you want to do with the last awk script, but here are some points:
Your "grep" is really just looking for a string of text, not a regexp.
You can save time and simplify things if you use awk's index() function instead of a RE.
Output formats are almost always best handled using printf().
Since you haven't provided your input data, I can't test this code, so you'll need to adapt it if it doesn't work. But here goes:
awk -F/ '
BEGIN {
string="id\": \"http://room.event.assist.com/event/room/event/";
fmt="url = http://example.com/event/room/event/%s?schema=whatever\n";
}
count == 1217 { nextfile; }
index($0, string) {
split($7, a, "\"");
printf(fmt, a[1]); # split() arrays are 1-indexed in awk
count++;
}' failed_events.txt
If you like, you can use awk's -v option to pass in the string variable from a shell script calling this awk script. Or if this is a stand-alone awk script (using #! shebang), you could refer to command line options with ARGV.
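A sketch of the -v option: the shell hands the value to awk once, so the script body needs none of the quote gymnastics from the original command (the URL and schema parameter are taken from the question; the exact format string is illustrative):

```shell
# Pass the URL prefix in with -v instead of splicing it into the script.
echo '156512145' |
awk -v base='http://room.event.assist.com/event/room/event' \
    '{ printf("url = \"%s/%s?schema=1.3.0\"\n", base, $1) }'
```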

Fields contain field separator as string: How to apply awk correctly in this case?

I have a CSV-file similar to this test.csv file:
Header 1; Header 2; Header 3
A;B;US
C;D;US
E;F;US
G;H;FR
I;J;FR
K;L;FR
M;"String with ; semicolon";UK
N;"String without semicolon";UK
O;"String OK";
P;"String OK";
Now, I want to split this file based on header 3. So I want to end up with four separate CSV files, one for "US", "FR", "UK", and "".
With my very limited Linux command line skills (sadly :-( I used until now this line:
awk -F\; 'NR>1{ fname="country_yearly_"$3".csv"; print >>(fname); close(fname);}' test.csv
Of course, the experienced command-line users among you will notice my problem: one field in my test.csv contains rows in which the semicolon I use as a separator also appears inside fields wrapped in quotation marks (I can't guarantee that for sure because of millions of rows, but I'm happy with an answer that assumes it). So, sadly, I get an additional file named country_yearly_ semicolon".csv, which contains this row in my example.
In my venture to solve this issue, I came across this question on SO. In particular, Thor's answer seems to contain the solution of my problem by replacing all semicolons in strings. I adjusted his code accordingly as follows:
awk -F'"' -v OFS='' '
NF > 1 {
for(i=2; i<=NF; i+=2) {
gsub(";", "|", $i);
$i = FS $i FS; # reinsert the quotes
}
print
}' test.csv > test1.csv
Now, I get the following test1.csv file:
M;"String with | semicolon";UK
N;"String without semicolon";UK
O;"String OK";
P;"String OK";
As you can see, all rows with quotation marks are shown and my problem line is fixed as well, but a) I actually want all rows, not only those with quotation marks, and I can't figure out which part of his code limits the output to rows with quotation marks, and b) I think it would be more efficient to change test.csv in place instead of sending the output to a new file, but I don't know how to do that either.
EDIT in response to Birei's answer:
Unfortunately, my minimal example was too simple. Here is an updated version:
Header 1; Header 2; Header 3; Header 4
A;B;US;
C;D;US;
E;F;US;
G;H;FR;
I;J;FR;
K;L;FR;
M;"String with ; semicolon";UK;"Yet another ; string"
N;"String without semicolon";UK; "No problem here"
O;"String OK";;"Fine"
P;"String OK";;"Not ; fine"
Note that my real data has roughly 100 columns and millions of rows, and the country column, ignoring semicolons in strings, is column 13. However, as far as I can see, I can't use the fact that it's column 13 unless I get rid of the semicolons in strings first.
To split the file, you might just do:
awk -v FS=";" '{ CSV_FILE = "country_yearly_" $NF ".csv" ; print > CSV_FILE }' test.csv
which always takes the last field to construct the file name.
In your example, only lines with quotation marks are printed due to the NF > 1 pattern. The following script will print all lines:
awk -F'"' -v OFS='' '
NF > 1 {
for(i=2; i<=NF; i+=2) {
gsub(";", "|", $i);
$i = FS $i FS; # reinsert the quotes
}
}
{
# print all lines
print
}' test.csv > test1.csv
To do what you want, you could change the line in the script and reprocess it:
awk -F'"' -v OFS='' '
# Save the original line
{ ORIGINAL_LINE = LINE = $0 }
# Replace the semicolon inside quotes by a dummy character
# and put the resulting line in the LINE variable
NF > 1 {
LINE = ""
for(i=2; i<=NF; i+=2) {
gsub(";", "|", $i)
LINE = LINE $(i-1) FS $i FS # reinsert the quotes
}
# Add the end of the line after the last quote
if ( $(i+1) ) { LINE = LINE $(i+1) }
}
{
# Put the semicolon-separated fields in a table
# (the semicolon inside quotes have been removed from LINE)
split( LINE, TABLE, /;/ )
# Build the file name -- TABLE[ 3 ] is the 3rd field
CSV_FILE = "country_yearly_" TABLE[ 3 ] ".csv"
# Save the line
print ORIGINAL_LINE > CSV_FILE
}'
You were near a solution. I would use the last field to avoid the problem of quoted fields containing semicolons. Also, there is no need to close each file: awk closes all open files automatically when the script exits.
awk '
BEGIN {
FS = OFS = ";";
}
FNR > 1 {
fname = "country_yearly_" $NF ".csv";
print >>fname;
}
' infile
Check output:
head country_yearly_*
That yields:
==> country_yearly_.csv <==
O;"String OK";
P;"String OK";
==> country_yearly_FR.csv <==
G;H;FR
I;J;FR
K;L;FR
==> country_yearly_UK.csv <==
M;"String with ; semicolon";UK
N;"String without semicolon";UK
==> country_yearly_US.csv <==
A;B;US
C;D;US
E;F;US
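As a quick sanity check of the $NF approach, a minimal sketch on a tiny sample (the file names here are hypothetical, and awk closes the output files at exit):

```shell
# Split a three-line sample on the last field; one output file
# per country code, header line skipped via FNR > 1.
printf 'H1;H2;H3\nA;B;US\nG;H;FR\n' > split_demo.csv
awk -F';' 'FNR > 1 { fname = "demo_country_" $NF ".csv"; print >> fname }' split_demo.csv
cat demo_country_US.csv
```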