tips with awk with changing parameters - awk

I have several pieces of code that look like this:
for ff in `seq 3 $nlpN`;
do
npc1[$ff]=`awk 'NR=='$ff' {print $1}' p_walls.raw`;
echo ${npc1[$ff]};
npc2[$ff]=`awk 'NR=='$ff' {print $2}' p_walls.raw`;
npc3[$ff]=`awk 'NR=='$ff' {print $3}' p_walls.raw`;
npRs[$ff]=`awk 'NR=='$ff' {print $4}' p_walls.raw`;
echo $ff
done
As you can see, I'm invoking awk several times. Is there a faster way to do this, such as invoking awk once and doing the assignments with the changing parameters?
Thanks a lot in advance!
input looks like:
...
3.76023 0.79528 0.307771 8729.82
3.76024 0.814664 0.307849 8650.2
3.76026 0.845679 0.307978 8802.97
3.76025 0.826293 0.307897 8690.43
3.76017 0.65959 0.30722 8936.07
...
I'm looking for something like:
TY

That does look pretty inefficient. As written, awk is processing the input file in its entirety four times with every pass of the loop. I'm also pretty sure that cut is completely unnecessary unless you have the FS environment variable set to something strange. The following will replace the multiple awk runs with a single pass over the data file that stops after it finds the line. Then you can use cut to extract the individual fields.
for ff in `seq 3 $nlpN`
do
data=`awk 'NR=='$ff' { print $1, $2, $3, $4; exit }' p_walls.raw`
npc1[$ff]=`echo "$data" | cut -f1 -d ' '`
echo ${npc1[$ff]}
npc2[$ff]=`echo "$data" | cut -f2 -d ' '`
npc3[$ff]=`echo "$data" | cut -f3 -d ' '`
npRs[$ff]=`echo "$data" | cut -f4 -d ' '`
echo $ff
done
Note that I added an exit statement so that awk will exit after processing the line. This prevents it from reading the entire file on every pass. If all that you need to do is extract a single line from a file, then you might want to use sed instead since (IMHO) the script is easier to read and it seems to be a little faster on large files. The following sed expression is equivalent to the awk line:
data=`sed -n -e "$ff p" -e "$ff q" p_walls.raw`
The -n option tells sed to output only the lines that the script explicitly prints. Here the script is supplied as two -e parameters, each consisting of an address followed by a command. Commands in a sed script are normally separated by newlines, but they can also be given as multiple -e parameters, here both with the same address. Putting this all together, the expression 42 p tells sed to select line 42 and run the p command, which prints the pattern space (the 42nd line). The 42 q command tells sed to exit after processing the 42nd line. So our sed expression reads the first $ff lines of "p_walls.raw", prints line $ff, and exits.
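As a quick check of the sed expression above, here is a minimal sketch that feeds four made-up lines through the same two -e commands. The sample input and the variable name line are illustrative assumptions, not part of the original script.

```shell
# Illustrative input: four words stand in for the lines of p_walls.raw.
ff=3

# Print line $ff, then quit so the rest of the input is never read.
line=$(printf '%s\n' one two three four | sed -n -e "${ff}p" -e "${ff}q")

echo "$line"   # prints: three
```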

Run awk a single time and process the output on each iteration separately.
awk "(NR >= 3 && NR <= $nlpN)"' { print NR, $1, $2, $3, $4 }' p_walls.raw |
while read ff c1 c2 c3 Rs
do
npc1[$ff]=$c1
echo ${npc1[$ff]};
npc2[$ff]=$c2
npc3[$ff]=$c3
npRs[$ff]=$Rs
echo $ff
done
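One caveat, assuming the script runs under bash with default settings: each part of a pipeline runs in a subshell, so arrays filled inside the while loop are gone once the loop ends. A possible workaround is to feed the loop through process substitution instead. The sketch below uses a hypothetical temporary file with two header lines in place of p_walls.raw and passes the upper bound to awk with -v; the range starts at line 3 to mirror the seq 3 $nlpN loop in the question.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for p_walls.raw: two header lines, then data.
tmp=$(mktemp)
printf '%s\n' \
  'header a b c' \
  'header a b c' \
  '3.76023 0.79528 0.307771 8729.82' \
  '3.76024 0.814664 0.307849 8650.2' > "$tmp"
nlpN=4

# The loop runs in the current shell, so the arrays survive it.
while read -r ff c1 c2 c3 Rs
do
  npc1[$ff]=$c1
  npc2[$ff]=$c2
  npc3[$ff]=$c3
  npRs[$ff]=$Rs
done < <(awk -v last="$nlpN" 'NR >= 3 && NR <= last { print NR, $1, $2, $3, $4 }' "$tmp")

rm -f "$tmp"
echo "${npc1[3]} ${npRs[4]}"   # prints: 3.76023 8650.2
```

awk still reads the file only once; only the way its output reaches the loop changes.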

Related

How do I decrement all array indexes in a text file?

Background
I have a text file that looks like the following:
$SomeText.element_[1]="MoreText[3]";\r"
$SomeText.element_[2]="MoreText[6]";\r"
$SomeText.element_[3]="MoreText[2]";\r"
$SomeText.element_[4]="MoreText[1]";\r"
$SomeText.element_[5]="MoreText[5]";\r"
This goes on for over a thousand lines. I want to do the following:
$SomeText.element_[0]="MoreText[3]";\r"
$SomeText.element_[1]="MoreText[6]";\r"
$SomeText.element_[2]="MoreText[2]";\r"
$SomeText.element_[3]="MoreText[1]";\r"
$SomeText.element_[4]="MoreText[5]";\r"
Each line of text in the file should have the left most index reduced by one, with the rest of the text unchanged.
Attempted Solutions
So far I have tried the following...but the issue for me is I do not know how to feed it back into the file properly:
Attempt 1
I tried a double cutting technique:
cat file.txt | cut -d '[' -f2 | cut -d ']' -f1 | xargs -I {} expr {} - 1
This properly outputs all of the indices reduced by one to the command line.
Attempt 2
I tried using awk with a mix of sed, but this caused my machine to hang:
awk -F'[' '{printf("%d\n", $2-1)}' file.txt | xargs -I {} sed -i 's/\[\d+\]/{}/g' file.txt
Question
How do I properly decrement all of the array indexes in the file by one and write the decremented indexes into the right location of the text file?
A Perl one-liner makes this easy, overwriting the input file:
perl -pi -e 's/(\d+)/$1-1/e' your-file-name-here
(assuming the first number on each line is the index)
With simple awk you could try following, written and tested with shown samples.
awk '
match($0,/\[[^]]*/){
print substr($0,1,RSTART) count++ substr($0,RSTART+RLENGTH)
}
' Input_file
Or, if the numbers between [..] in your Input_file are in arbitrary order, simply subtract 1 from each of them as follows.
awk '
match($0,/\[[^]]*/){
print substr($0,1,RSTART) substr($0,RSTART+1,RLENGTH-1)-1 substr($0,RSTART+RLENGTH)
}
' Input_file
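The awk commands above print to standard output. To get the result back into the file, one common approach is to write to a temporary file and rename it over the original. The following is a minimal sketch under that assumption; the temporary file and the two sample lines are made up for illustration, and lines without a bracketed number are passed through unchanged.

```shell
# Hypothetical two-line input file in the format from the question.
f=$(mktemp)
printf '%s\n' '$SomeText.element_[1]="MoreText[3]";' \
              '$SomeText.element_[2]="MoreText[6]";' > "$f"

# Decrement the first [number] on each line, then replace the file.
awk '
match($0, /\[[0-9]+\]/) {
    n  = substr($0, RSTART + 1, RLENGTH - 2) - 1
    $0 = substr($0, 1, RSTART) n substr($0, RSTART + RLENGTH - 1)
}
{ print }
' "$f" > "$f.new" && mv "$f.new" "$f"

result=$(cat "$f")
rm -f "$f"
printf '%s\n' "$result"
```

The && ensures the original is only replaced if awk succeeded.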
With GNU sed and bash:
sed -E "s/([^[]*\[)([0-9]+)(].*)/printf '%s%d%s\n' '\1' \$((\2 - 1)) '\3'/e" file
Or, if the lines may contain a ' character:
sed -E "
/\[[0-9]+]/{
s/'/'\\\''/g
s/([^[]*\[)([0-9]+)(].*)/printf '%s%d%s\n' '\1' \$((\2 - 1)) '\3'/e
}" file

How can I print only lines that are immediately preceded by an empty line in a file using sed?

I have a text file with the following structure:
bla1
bla2

bla3
bla4

bla5
So you can see that some lines of text are preceded by an empty line.
I understand that sed has the concept of two buffers, a pattern space buffer and a hold space buffer, so I'm guessing these need to come in to play here, but I'm unclear how to specify them to accomplish what I need.
In my contrived example above, I'd expect to see the following lines outputted:
bla3
bla5
sed is for doing s/old/new on individual lines, that is all. Any time you start talking about buffers or doing anything related to multi-lines comparisons you're using the wrong tool.
You could do this with awk:
$ awk -v RS= -F'\n' 'NR>1{print $1}' file
bla3
bla5
but it would fail to print the first non-empty line if the first line(s) in the file were empty. So this may be what you want instead, if lines consisting entirely of space characters should be considered empty:
$ awk 'NF && !p{print} {p=NF}' file
bla3
bla5
and this otherwise:
$ awk '($0!="") && (p==""){print} {p=$0}' file
bla3
bla5
All of the above will work even if there are multiple empty lines preceding any given non-empty line.
To see the difference between the 3 approaches (which you won't see given the sample input in the question):
PS1> printf '\nfoo\n \nbar\n\netc\n' | cat -E
$
foo$
$
bar$
$
etc$
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk -v RS= -F'\n' 'NR>1{print $1}'
etc
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk 'NF && !p{print} {p=NF}'
foo
bar
etc
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk '($0!="") && (p==""){print} {p=$0}'
foo
etc
You can use the hold buffer easily to print the line before the blank line like this:
sed -n -e '/^$/{x; p;}' -e h input
But I don't see an easy way to use it for your use case. For your case, instead of using the hold buffer, you could do:
sed -n -e '/^$/ba' -e d -e :a -e n -e p input
But I would do this with awk.
awk 'NR!=1{print $1}' RS= FS=\\n input-file
awk 'p;{p=/^$/}' file
The command above does the following for each line:
if p is 1, print the line;
if the line is empty, set p to 1, otherwise set it to 0.
if lines consisting of one or more spaces are also considered empty:
awk 'p;{p=!NF}' file
to print non-empty lines each coming right after an empty line, you can use this:
awk 'p*!(p=/^$/)' file
if p is 1 and this line is not empty (1*!(0) = 1*1 = 1), print this line;
otherwise (1*!(1) = 1*0 = 0, 0*anything = 0), don't print anything.
Note that this one may not work with all awks. A portable version would look like:
awk 'p*(/./);{p=/^$/}' file
if lines consisting of one or more spaces are also considered empty:
awk 'p*NF;{p=!NF}' file
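As a worked example, the last variant can be run on the sample from the question. The input below is typed in with printf and assumes the empty lines sit before bla3 and bla5, as the expected output implies.

```shell
# Sample input with empty lines before bla3 and bla5 (assumed layout).
out=$(printf 'bla1\nbla2\n\nbla3\nbla4\n\nbla5\n' | awk 'p*NF; { p = !NF }')

# p is 1 only when the previous line had no fields, so only
# non-blank lines directly after a blank line are printed.
printf '%s\n' "$out"
```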
If sed/awk is not mandatory, you can do it with grep:
grep -A 1 '^$' input.txt | grep -v -E '^$|^--$'
You can use sed to match a range of lines and do sub-matches inside the matches, like so:
# - use the "-n" option to omit printing of lines
# - match lines between a blank line (/^$/) and a non-blank one (/^./),
# then print only the line that contains at least a character,
# i.e, the non-blank line.
sed -ne '
/^$/,/^./ {
/^./{ p; }
}' input.txt
Tested with GNU sed, with your data in the file 'a':
$ sed -nE '/^$/{N;s/\n(.+)/\1/p}' a
bla3
bla5
To edit the file in place, add the -i option before -n.

Why does awk not filter the first column in the first line of my files?

I've got a file with following records:
depots/import/HDN1YYAA_15102018.txt;1;CAB001
depots/import/HDN1YYAA_20102018.txt;2;CLI001
depots/import/HDN1YYAA_20102018.txt;32;CLI001
depots/import/HDN1YYAA_25102018.txt;1;CAB001
depots/import/HDN1YYAA_50102018.txt;1;CAB001
depots/import/HDN1YYAA_65102018.txt;1;CAB001
depots/import/HDN1YYAA_80102018.txt;2;CLI001
depots/import/HDN1YYAA_93102018.txt;2;CLI001
When I execute the following awk one-liner:
cat lignes_en_erreur.txt | awk 'FS=";"{ if(NR==1){print $1}}END {}'
the output is not what I expected:
depots/import/HDN1YYAA_15102018.txt;1;CAB001
whereas I expected to get only the first column.
If I run it through all the records:
cat lignes_en_erreur.txt | awk 'FS=";"{ if(NR>0){print $1}}END {}'
then it starts filtering only from the second line on, and I get the following output:
depots/import/HDN1YYAA_15102018.txt;1;CAB001
depots/import/HDN1YYAA_20102018.txt
depots/import/HDN1YYAA_20102018.txt
depots/import/HDN1YYAA_25102018.txt
depots/import/HDN1YYAA_50102018.txt
depots/import/HDN1YYAA_65102018.txt
depots/import/HDN1YYAA_80102018.txt
depots/import/HDN1YYAA_93102018.txt
Does anybody know why awk is skipping only the first line?
I tried deleting the first record, but the behaviour is the same: it skips the first line.
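First, FS must be set before the first record is read. In your program, FS=";" is used as a pattern, so the assignment only happens after the first line has already been split with the default separator. It should be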
awk 'BEGIN{FS=";"}{ if(NR==1){print $1}}END {}' filename
You can omit the END block if it is empty:
awk 'BEGIN{FS=";"}{ if(NR==1){print $1}}' filename
You can use the -F command line argument to set the field delimiter:
awk -F';' '{if(NR==1){print $1}}' filename
Furthermore, since awk programs consist of a sequence of CONDITION [{ACTIONS}] elements, you can omit the if:
awk -F';' 'NR==1 {print $1}' filename
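To see the difference, the following sketch runs both forms on two made-up lines. With awks that follow POSIX on this point (gawk, for example), the first form should print the first line unsplit, as in the question; the variable names wrong and right are only for illustration.

```shell
# Two illustrative records using ';' as the separator.
data='a;b;c
d;e;f'

# FS assigned in a pattern: takes effect only for later records.
wrong=$(printf '%s\n' "$data" | awk 'FS=";" { print $1 }')

# FS assigned in BEGIN: in effect before the first record is read.
right=$(printf '%s\n' "$data" | awk 'BEGIN { FS=";" } { print $1 }')

printf 'pattern form:\n%s\n' "$wrong"
printf 'BEGIN form:\n%s\n' "$right"
```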
You need to specify the delimiter either in a BEGIN block or as a command-line option:
awk 'BEGIN{FS=";"}{ if(NR==1){print $1}}'
awk -F ';' '{ if(NR==1){print $1}}'
cut might be better suited here, for all lines
$ cut -d';' -f1 file
to skip the first line
$ sed 1d file | cut -d';' -f1
to get the first line only
$ sed 1q file | cut -d';' -f1
however at this point it's better to switch to awk
if you have a large file and only interested in the first line, it's better to exit early
$ awk -F';' '{print $1; exit}' file

Combine grep -f and awk

I am using two commands:
awk '{ print $2 }' SomeFile.txt > Pattern.txt
grep -f Pattern.txt File.txt
With the first command I create a list of desirable patterns. With the second command I extract all lines in File.txt that match the lines in the Pattern.txt
My question is, is there a way to combine awk and grep in a pipeline so that I don't have to generate the intermediate Pattern.txt file?
Thanks!
You can do this all in one invocation of awk:
awk 'NR==FNR{a[$2];next}{for(i in a)if($0~i)print}' SomeFile.txt File.txt
Populate keys in the array a from the second column of the first file. NR==FNR identifies the first file (total record number is equal to this file's record number). next skips the second block for the first file.
In the second block, loop through all the keys in the array and if the line matches any of them, print it. To avoid printing the line more than once if it matches more than one pattern, you could add a next here too, i.e. {for(i in a)if($0~i){print;next}}.
If the "patterns" are actually fixed strings that must match the entire line, it is even simpler:
awk 'NR==FNR{a[$2];next}$0 in a' SomeFile.txt File.txt
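When the fixed strings may occur anywhere in the line (roughly what grep -F -f does), one option is index() instead of a regex match. A sketch with made-up sample files in a temporary directory; the file contents are assumptions for illustration only:

```shell
dir=$(mktemp -d)
# Hypothetical inputs: the second column of SomeFile.txt holds the strings.
printf 'x foo\ny bar\n' > "$dir/SomeFile.txt"
printf 'a foo line\nno match\nbar here\n' > "$dir/File.txt"

# Print each line of File.txt that contains any of the strings, once.
found=$(awk 'NR==FNR { a[$2]; next }
             { for (k in a) if (index($0, k)) { print; next } }' \
        "$dir/SomeFile.txt" "$dir/File.txt")

rm -rf "$dir"
printf '%s\n' "$found"
```

Because index() does a plain substring search, characters such as . or * in the strings are not treated as regex metacharacters.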
If your shell supports it, you can use process substitution:
grep -f <(awk '{ print $2 }' SomeFile.txt) File.txt
bash and zsh support that; other shells probably do too, but I haven't tested them.
Simpler than the above, and supported by all shells, is to use a pipe:
awk '{ print $2 }' SomeFile.txt | grep -f - File.txt
- is used as the argument to -f. - has a special meaning here and stands for stdin. Thanks to Tom Fenech for mentioning that!
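A small end-to-end sketch of the pipe variant, using made-up file contents in a temporary directory (the sample data is an assumption, not from the question):

```shell
dir=$(mktemp -d)
# Hypothetical inputs: patterns come from column 2 of SomeFile.txt.
printf 'x foo\ny bar\n' > "$dir/SomeFile.txt"
printf 'a foo line\nno match\nbar here\n' > "$dir/File.txt"

# awk writes the patterns to stdout; grep reads them from stdin via -f -.
matches=$(awk '{ print $2 }' "$dir/SomeFile.txt" | grep -f - "$dir/File.txt")

rm -rf "$dir"
printf '%s\n' "$matches"
```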

How to preserve spaces in input fields with awk

I'm trying to do something pretty simple, but it appears more complicated than expected...
I have lines in a text file, with fields separated by commas, that I want to output to another file without the first field.
Input:
echo file1,item, 12345678 | awk -F',' '{OFS = ";";$1=""; print $0}'
Output:
;item; 12345678
As you can see the spaces before 12345678 are kind of merged into one space only.
I also tried with the cut command:
echo file1,item, 12345678 | cut -d, -f2-
and I ended up with the same result.
Is there any workaround to handle this?
Actually my entire script is as follows:
cat myfile | while read l_line
do
l_line="'$l_line'"
v_OutputFile=$(echo $l_line | awk -F',' '{print $1}')
echo $(echo $l_line | cut -d, -f2-) >> ${v_OutputFile}
done
But still, in l_line all spaces but one are removed. I also tried putting the quotes inside the file, with the same result.
It has nothing to do with awk. Quote the string in your echo:
#with quotes
kent$ echo 'a,b, c'|awk -F, -v OFS=";" '{$1="";print $0}'
;b; c
#without quotes
kent$ echo a,b, c|awk -F, -v OFS=";" '{$1="";print $0}'
;b; c
The problem is with your invocation of the echo command you're using to feed awk the test data above. The shell is looking at this command:
echo file1,item, 12345678
and treating file1,item, and 12345678 as two separate parameters to echo. echo just prints all its parameters, separated by one space.
If you were to quote the whitespace, as follows:
echo 'file1,item, 12345678'
the shell would interpret this as a single parameter to feed to echo, so you'd get the expected result.
Update after edit to OP - having seen your full script, you could do this entirely in awk:
awk -F, '{ OFS = "," ; f = $1 ; sub("^[^,]*,","") ; print $0 >> f }' myfile
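If the shell loop is kept, the spaces can be preserved by quoting every expansion and reading with IFS= read -r. A minimal sketch, assuming the first field is a plain file name; the temporary directory and the sample line are made up for illustration:

```shell
dir=$(mktemp -d)
# Hypothetical input: the first field names the output file.
printf '%s\n' 'out1.txt,item,    12345678' > "$dir/myfile"

# IFS= keeps leading/trailing blanks, -r keeps backslashes literal.
while IFS= read -r l_line
do
  v_OutputFile=${l_line%%,*}                 # text before the first comma
  printf '%s\n' "${l_line#*,}" >> "$dir/$v_OutputFile"   # rest of the line
done < "$dir/myfile"

result=$(cat "$dir/out1.txt")
rm -rf "$dir"
printf '%s\n' "$result"
```

The parameter expansions replace the awk and cut calls, so no external command runs per line and nothing is passed through an unquoted echo.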