Awk - print the global line numbers of a selected range of lines and keep counting when the range changes after a user keypress

This question develops further the case discussed in "awk print the number of columns between a selected range from awk".
I have the following short script:
#!/bin/bash
#Buffer control
#awk on Stack Overflow
#https://stackoverflow.com/questions/74483916/awk-print-the-number-of-columns-between-a-selected-range-from-awk/74483975?noredirect=1#comment131485506_74483975
if [ $# -eq 1 ]; then
FICH="${1}"
else
FICH="donaciones"
fi
#_INIT: global values shown at the bottom, on the last line
#alto_INIT is needed to put text at the bottom, on the last line
alto_INIT=`tput lines`
alto=`expr $alto_INIT - 2`
largo=`tput cols`
TOTAL_LINEAS_INIT=`cat "$FICH" | wc -l`
TOTAL_LINEAS=$TOTAL_LINEAS_INIT
#Show only as many lines as the terminal height allows
#Number of pages to show
LINEA=`expr $TOTAL_LINEAS - $alto`
NUM_PAG=`echo "scale=1; $TOTAL_LINEAS / $alto"|bc`
#If the division for NUM_PAG leaves a remainder greater than 0, add one more page
if [ "${NUM_PAG##*.}" -gt 0 ]; then
NUM_PAG=`echo "scale=0; ${NUM_PAG}" + 1 |bc`
NUM_PAG=${NUM_PAG%%.*}
fi
buffer=`awk -v total_lineas="$TOTAL_LINEAS" -v linea="$LINEA" 'NR>=linea&&NR<=total_lineas' "$FICH"`
function Arriba(){
TOTAL_LINEAS=`expr $TOTAL_LINEAS - $alto`
DESDE=`expr $TOTAL_LINEAS - $alto`
HASTA=$TOTAL_LINEAS
if [[ $DESDE -lt 0 ]] ; then
DESDE=0
HASTA=`expr $DESDE + $alto`
fi
buffer=`awk -v desde="$DESDE" -v hasta="$HASTA" 'NR>=desde&&NR<=hasta' "$FICH"`
TOTAL_LINEAS=`expr $HASTA + 1`
}
function Abajo(){
DESDE=$TOTAL_LINEAS
HASTA=`expr $TOTAL_LINEAS + $alto`
if [[ $HASTA -gt $TOTAL_LINEAS_INIT ]] ; then
HASTA=$TOTAL_LINEAS_INIT
fi
buffer=`awk -v desde="$DESDE" -v hasta="$HASTA" 'NR>=desde&&NR<=hasta' "$FICH"`
TOTAL_LINEAS=$HASTA
}
while true; do
clear
printf "$buffer"
tput cup $alto_INIT 0 && printf "Total Lineas: $TOTAL_LINEAS_INIT | Total Pag: $NUM_PAG |Buffer: De $DESDE hasta $HASTA | $TOTAL_LINEAS | (w) Ayuda"
read -rsn1 TECLA
case $TECLA in
h) Arriba ;;
j) Abajo ;;
w) Help ;;
q) printf "\n" && break ;;
esac
done
exit 0
The goal is to show the line number of each line in a range of lines selected by awk.
The program reads the number of lines of the terminal and paginates a file accordingly, like less, but I want to show only one portion of the file at a time, changing it when the user presses the "h" or "j" key. Every time the user presses "h", the buffer (the portion of the file) changes to show the previous part of the file, sized to the number of terminal rows. When the user presses "j", the buffer moves on to the next part.
The program works, but when awk prints the buffer I want each line prefixed with its line number in the whole file. For this I have the variable $TOTAL_LINEAS, which is incremented or decremented every time the buffer changes; the buffer shows the lines from TOTAL_LINEAS up to the terminal height, and this is recomputed on every keypress. With the previous answer I was able to add the line numbers, but when the user presses a key to load a new buffer, the numbering always restarts for that buffer instead of following the real line numbers of the file. In other words, the lines are numbered relative to the current portion of text, not to the whole file.
For example, if I have:
1 1:20220413:20:Curso Astrología:5:Vicente Ferrer
2 1:10042022:0:Donación instituto Samye:103:Propia
3 14:20220428:0:Candelario Yeshe Nyimpo Inc:9:Dudjom Tersar
4 1:20220512:60:Ayuda por el Hambre y Violencia:6:Vicente Ferrer
Total Lineas: 43 | Total Pag: 2 |Buffer: De 0 hasta 26 | 27
but on the next keypress, which goes to the next page, I need:
5 1:20220413:20:111
6 1:10042022:0:22
7 14:20220428:0:33
8 1:20220512:60:44
Total Lineas: 43 | Total Pag: 2 |Buffer: De 27 hasta 43 | 43
and not:
1 1:20220413:20:111
2 1:10042022:0:22
3 14:20220428:0:33
4 1:20220512:60:44
Total Lineas: 43 | Total Pag: 2 |Buffer: De 27 hasta 43 | 43

In the end it was easy:
buffer=`echo "$buffer" |awk -v i=$TOTAL_LINEAS 'NR==i !NF{print;next} NF{print ++i, $0}'`
Thanks all !!
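An alternative sketch, not the author's method: since awk already tracks the line number of the whole file in NR, the number can be printed while the range is being selected, so no second awk pass over the buffer is needed. The helper name show_page and the sample file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print lines DESDE..HASTA of a file, each prefixed
# with its line number in the whole file (awk's NR), not in the page.
show_page() {
    awk -v desde="$1" -v hasta="$2" '
        NR > hasta  { exit }             # stop reading once the page is done
        NR >= desde { print NR, $0 }     # NR is the global line number
    ' "$3"
}

# Usage: a 6-line sample file, second "page" of 3 lines
tmp=$(mktemp)
printf '%s\n' a b c d e f > "$tmp"
show_page 4 6 "$tmp"
rm -f "$tmp"
```

In the script from the question, the body of Arriba and Abajo could then assign buffer from a call like show_page "$DESDE" "$HASTA" "$FICH".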

decoding octal escape sequences in input with awk

Updated
Let's suppose that you got octal escape sequences in a stream:
backslash \134 is escaped as \134134
single quote ' and double quote \042
linefeed `\012` and carriage return `\015`
%s &
etc...
note: The escaped characters are limited to 0x01-0x1F 0x22 0x5C 0x7F
How can you revert those escape sequences back to their corresponding character with awk?
While awk understands them out of the box when they appear in a literal string or in a parameter argument, I can't find a way to leverage this capability when the escape sequence is part of the data. For now I'm using one gsub per escape sequence, but it doesn't feel efficient.
Here's the expected output for the given sample:
backslash \ is escaped as \134
single quote ' and double quote "
linefeed `
` and carriage return `
%s &
etc...
PS: While I have the additional constraint of unescaping each line into an awk variable before printing the result, it doesn't really matter.
Using GNU awk for strtonum() and lots of meaningfully-named variables to show what each step does:
$ cat tst.awk
function octs2chars(str, head,tail,oct,dec,char) {
head = ""
tail = str
while ( match(tail,/\\[0-7]{3}/) ) {
oct = substr(tail,RSTART+1,RLENGTH-1)
dec = strtonum(0 oct)
char = sprintf("%c", dec)
head = head substr(tail,1,RSTART-1) char
tail = substr(tail,RSTART+RLENGTH)
}
return head tail
}
{ print octs2chars($0) }
$ awk -f tst.awk file
backslash \ is escaped as \134
single quote ' and double quote "
linefeed `
` and carriage return `
%s &
etc...
If you don't have GNU awk then write a small function to convert octal to decimal, e.g. oct2dec() below, and then call that instead of strtonum():
$ cat tst2.awk
function oct2dec(oct, dec) {
dec = substr(oct,1,1) * 8 * 8
dec += substr(oct,2,1) * 8
dec += substr(oct,3,1)
return dec
}
function octs2chars(str, head,tail,oct,dec,char) {
head = ""
tail = str
while ( match(tail,/\\[0-7]{3}/) ) {
oct = substr(tail,RSTART+1,RLENGTH-1)
dec = oct2dec(oct) # replaced "strtonum(0 oct)"
char = sprintf("%c", dec)
head = head substr(tail,1,RSTART-1) char
tail = substr(tail,RSTART+RLENGTH)
}
return head tail
}
{ print octs2chars($0) }
$ awk -f tst2.awk file
backslash \ is escaped as \134
single quote ' and double quote "
linefeed `
` and carriage return `
%s &
etc...
The above assumes that, as discussed in comments, the only backslashes in the input will be in the context of the start of octal numbers as shown in the provided sample input.
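As a quick check of the same idea on the trickiest sample line, here is a minimal sketch wrapped in a shell function. It assumes every escape is exactly three octal digits, as in the sample input, and spells the regex as [0-7][0-7][0-7] in case the awk in use lacks {3} interval expressions. The function name decode_octal is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: decode \NNN octal escapes read from stdin.
decode_octal() {
    LC_ALL=C awk '
    {
        out = ""
        rest = $0
        while (match(rest, /\\[0-7][0-7][0-7]/)) {
            # three octal digits to a number: d1*64 + d2*8 + d3
            code = substr(rest, RSTART + 1, 1) * 64 \
                 + substr(rest, RSTART + 2, 1) * 8 \
                 + substr(rest, RSTART + 3, 1)
            out = out substr(rest, 1, RSTART - 1) sprintf("%c", code)
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }'
}

printf '%s\n' 'backslash \134 is escaped as \134134' | decode_octal
```

Because decoded text is moved out of the string being scanned, the backslash produced from \134134 is not matched a second time.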
With GNU awk, which supports the strtonum() function, would you please try:
awk '{
while (match($0, /\\[0-7]{1,3}/)) {
printf("%s", substr($0, 1, RSTART - 1)) # print the substring before the match
printf("%c", strtonum("0" substr($0, RSTART + 1, RLENGTH - 1))) # convert the octal digits (without the backslash) to a character
$0 = substr($0, RSTART + RLENGTH) # update $0 with remaining substring
}
print
}' input_file
It processes the matched substrings (octal representations)
in the while loop one by one.
substr($0, RSTART + 1, RLENGTH - 1) skips the leading backslash and takes only the matched digits.
"0" prepended to the substring makes an octal string.
strtonum() converts the octal string to the numeric value.
The final print outputs the remaining substring.
UPDATE :: about gawk's strtonum() in unicode mode :
echo '\666' |
LC_ALL='en_US.UTF-8' gawk -e '
$++NF = "<( "(sprintf("%c", strtonum((_=_<_) substr($++_, ++_))))" )>"'
0000000 909522524 539507744 690009798 2622
\ 6 6 6 < ( ƶ ** ) > \n
134 066 066 066 040 074 050 040 306 266 040 051 076 012
\ 6 6 6 sp < ( sp ? ? sp ) > nl
92 54 54 54 32 60 40 32 198 182 32 41 62 10
5c 36 36 36 20 3c 28 20 c6 b6 20 29 3e 0a
0000016
By default, gawk in unicode mode decodes a multi-byte character instead of the single byte \266 (0xB6). To make sure a single byte is always decoded, even in gawk unicode mode, this should do the trick:
echo '\666' |
LC_ALL='en_US.UTF-8' gawk -e '$++NF = sprintf("<( %c )>",
strtonum((_=_<_) substr($++_, ++_)) + _*++_^_++*_^++_)'
0000000 909522524 539507744 1042882742 10
\ 6 6 6 < ( 266 ) > \n
134 066 066 066 040 074 050 040 266 040 051 076 012
\ 6 6 6 sp < ( sp ? sp ) > nl
92 54 54 54 32 60 40 32 182 32 41 62 10
5c 36 36 36 20 3c 28 20 b6 20 29 3e 0a
0000015
long story short : add 4^5 * 54 to output of strtonum(), which happens to be 0xD800, the starting point of UTF-16 surrogates
=================== =================== ===================
one quick note about @Gene's proposed perl-based solution :
echo 'abc \555 456' | perl -p -e 's/\\([0-7]{3})/chr(oct($1))/ge'
Wide character in print at -e line 1, <> line 1.
abc ŭ 456
octal codes wrap around, meaning \4xx = \0xx ; \6xx = \2xx etc :
printf '\n %s\n' $'\555'
m
so perl is incorrectly decoding these as multi-byte characters, when in fact \555, as confirmed by printf, is merely lowercase "m" (0x6D)
ps : my perl is version 5.34
I got my own POSIX awk solution, so I post it here for reference.
The main idea is to build a hash that translates an octal escape sequence to its corresponding character. You can then use it while splitting the line during the search for escape sequences:
LANG=C awk '
BEGIN {
for ( i = 1; i <= 255; i++ )
tr[ sprintf("\\%03o",i) ] = sprintf("%c",i)
}
{
remainder = $0
while ( match(remainder, /\\[0-7]{3}/) ) {
printf("%s%s", \
substr(remainder, 1, RSTART-1), \
tr[ substr(remainder, RSTART, RLENGTH) ] \
)
remainder = substr(remainder, RSTART + RLENGTH)
}
print remainder
}
' input.txt
backslash `\`
single quote `'` and double quote `"`
linefeed `
` and carriage return `
%s &
etc...
this separate post is made specifically to showcase how to extend the octal lookup reference tables in gawk unicode-mode to all 256 bytes without external dependencies or warning messages:
ASCII bytes reside in table o2bL
8-bit bytes reside in table o2bH
.
# gawk profile, created Fri Sep 16 09:53:26 2022
'BEGIN {
1 makeOctalRefTables(PROCINFO["sorted_in"] = "@val_str_asc" \
(ORS = ""))
128 for (_ in o2bL) {
128 print o2bL[_]
}
128 for (_ in o2bH) {
128 print o2bH[_]
}
}
function makeOctalRefTables(_,__,___,____)
{
1 _=__=___=____=""
for (_ in o2bL) {
break
}
1 if (!(_ in o2bL)) {
1 ____=_+=((_+=_^=_<_)-+-++_)^_--
128 do { o2bL[sprintf("\\%o",_)] = \
sprintf("""%c",_)
} while (_--)
1 o2bL["\\" ((_+=(_+=_^=_<_)+_)*_--+_+_)] = "\\&"
1 ___=--_*_^_--*--_*++_^_*(_^=++_)^(! --_)
128 do { o2bH[sprintf("\\%o", +_)] = \
sprintf("%c",___+_)
} while (____<--_)
}
1 return length(o2bL) ":" length(o2bH)
}'
|
\0 \1 \2 \3 \4 \5 \6 \7 \10\11 \12
\13
\14
\16 \17
\20 \21 \22 \23 \24 \25 \26 \27 \30 \31 \32 \33 34 \35 \36 \37
\40 \41 !\42 "\43 #\44 $\45 %\47 '\50 (\51 )\52 *\53 +\54 ,\55 -\56 .\57 /
\60 0\61 1\62 2\63 3\64 4\65 5\66 6\67 7\70 8\71 9\72 :\73 ;\74 <\75 =\76 >\77 ?
\100 @\101 A\102 B\103 C\104 D\105 E\106 F\107 G\110 H\111 I\112 J\113 K\114 L\115 M\116 N\117 O
\120 P\121 Q\122 R\123 S\124 T\125 U\126 V\127 W\130 X\131 Y\132 Z\133 [\134 \\46 \&\135 ]\136 ^\137 _
\140 `\141 a\142 b\143 c\144 d\145 e\146 f\147 g\150 h\151 i\152 j\153 k\154 l\155 m\156 n\157 o
\160 p\161 q\162 r\163 s\164 t\165 u\166 v\167 w\170 x\171 y\172 z\173 {\174 |\175 }\176 ~\177
\200 ?\201 ?\202 ?\203 ?\204 ?\205 ?\206 ?\207 ?\210 ?\211 ?\212 ?\213 ?\214 ?\215 ?\216 ?\217 ?
\220 ?\221 ?\222 ?\223 ?\224 ?\225 ?\226 ?\227 ?\230 ?\231 ?\232 ?\233 ?\234 ?\235 ?\236 ?\237 ?
\240 ?\241 ?\242 ?\243 ?\244 ?\245 ?\246 ?\247 ?\250 ?\251 ?\252 ?\253 ?\254 ?\255 ?\256 ?\257 ?
\260 ?\261 ?\262 ?\263 ?\264 ?\265 ?\266 ?\267 ?\270 ?\271 ?\272 ?\273 ?\274 ?\275 ?\276 ?\277 ?
\300 ?\301 ?\302 ?\303 ?\304 ?\305 ?\306 ?\307 ?\310 ?\311 ?\312 ?\313 ?\314 ?\315 ?\316 ?\317 ?
\320 ?\321 ?\322 ?\323 ?\324 ?\325 ?\326 ?\327 ?\330 ?\331 ?\332 ?\333 ?\334 ?\335 ?\336 ?\337 ?
\340 ?\341 ?\342 ?\343 ?\344 ?\345 ?\346 ?\347 ?\350 ?\351 ?\352 ?\353 ?\354 ?\355 ?\356 ?\357 ?
\360 ?\361 ?\362 ?\363 ?\364 ?\365 ?\366 ?\367 ?\370 ?\371 ?\372 ?\373 ?\374 ?\375 ?\376 ?\377 ?

Substitution from specific lines of a lookup table

I'm trying to get a script to automate some tasks using the GAMESS package, from which I'd hope to extrapolate to more complex cases later. Alas it would seem my Unix programming skills are not up to par.
I have a general GAMESS input file 'ion.inp' of the form:
$CONTRL SCFTYP=<tag4> ICHARG=<tag5> MULT=<tag6> ISPHER=1 NPRINT=-5 $END
$BASIS GBASIS=<tag> $END
$DATA
<tag1> energy
Dnh 2
<tag2> <tag3> .0 .0 .0
$END
And I have (as a MWE) a lookup table for the parameters of 'ion.inp', 'table.dat', where the values for the <tag#> placeholders are taken from each line of the table:
<tag1> | <tag2> | <tag3> | <tag4> | <tag5> | <tag6>
Hidrogen | H | 1.0 | ROHF | 0 | 2
Hidrogen cation | H | 1.0 | RHF | 1 | 1
For portability, I'd like to get a solution using POSIX sh, sed or awk, but after some trials (using sh or sed, I'm not familiar with awk at all, even though I know it is a potential solution in this case) I couldn't get it to work.
The file 'ion.inp' can be edited in place because it will be run inside a sh loop. I already got everything else working, except for this supposedly simple substitution.
Any help would be much appreciated!
Here is an example using sed and awk. The script is named script.awk. Everything is output on stdout but you could redirect it to files using > inside the AWK script. See second solution below.
The idea is to drive the process using awk and the table.dat file which contains data to make the substitutions and then for each batch of substitutions (each line of the file), we use sed to perform the actual substitutions once we have each tag and its value.
BEGIN { FS = "\\s*\\|\\s*" } changes the field separator to "optional whitespace, then |, then optional whitespace" (\s is a GNU awk extension; [[:space:]] is the POSIX spelling). That means $1, $2, ... give us the values for tags 1, 2, ...
NR == 1 { next } is used to skip the first line which is useless since tags are ordered from 1 without any gap. If it was not the case, we would have to adapt the AWK script.
{ ... } for each line we build the sed command and execute it. The output of sed becomes the output of awk for that specific line.
BEGIN { FS = "\\s*\\|\\s*" }
NR == 1 { next }
{
s = ""
for (i=1; i <= NF; i++)
s = s sprintf(";s/%s/%s/g", "<tag" i ">", $i)
system("sed '" s "' ion.inp")
}
$ cat table.dat
<tag1> | <tag2> | <tag3> | <tag4> | <tag5> | <tag6>
Hidrogen | H | 1.0 | ROHF | 0 | 2
Hidrogen cation | H | 1.0 | RHF | 1 | 1
$ cat ion.inp
$CONTRL SCFTYP=<tag4> ICHARG=<tag5> MULT=<tag6> ISPHER=1 NPRINT=-5 $END
$BASIS GBASIS=<tag> $END
$DATA
<tag1> energy
Dnh 2
<tag2> <tag3> .0 .0 .0
$END
$ awk -f script.awk table.dat
$CONTRL SCFTYP=ROHF ICHARG=0 MULT=2 ISPHER=1 NPRINT=-5 $END
$BASIS GBASIS=<tag> $END
$DATA
Hidrogen energy
Dnh 2
H 1.0 .0 .0 .0
$END
$CONTRL SCFTYP=RHF ICHARG=1 MULT=1 ISPHER=1 NPRINT=-5 $END
$BASIS GBASIS=<tag> $END
$DATA
Hidrogen cation energy
Dnh 2
H 1.0 .0 .0 .0
$END
Redirecting to files with command awk -f script2.awk table.dat. The script script2.awk is:
BEGIN { FS = "\\s*\\|\\s*" }
NR == 1 { next }
{
s = ""
for (i=1; i <= NF; i++)
s = s sprintf(";s/%s/%s/g", "<tag" i ">", $i)
system("sed '" s "' ion.inp > " sprintf("output%02d.txt", NR-1))
}
$ cat output01.txt
$CONTRL SCFTYP=ROHF ICHARG=0 MULT=2 ISPHER=1 NPRINT=-5 $END
$BASIS GBASIS=<tag> $END
$DATA
Hidrogen energy
Dnh 2
H 1.0 .0 .0 .0
$END
$ cat output02.txt
$CONTRL SCFTYP=RHF ICHARG=1 MULT=1 ISPHER=1 NPRINT=-5 $END
$BASIS GBASIS=<tag> $END
$DATA
Hidrogen cation energy
Dnh 2
H 1.0 .0 .0 .0
$END
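Since the question asks for POSIX tools, here is a sketch of the same substitution done in awk alone, without building a sed command and without the \s extension. It is an alternative under stated assumptions, not the answer's method: the template is read first and kept in memory, the table is split on a plain |, and values are trimmed and substituted literally. The names fill_template, replace_all and trim are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: fill_template TEMPLATE TABLE
# Prints one filled-in copy of TEMPLATE per data row of TABLE.
fill_template() {
    awk -F'|' '
    # Literal (non-regex) replacement, so "&" or "/" in a value is harmless
    function replace_all(s, from, to,    out, p) {
        out = ""
        while ((p = index(s, from)) > 0) {
            out = out substr(s, 1, p - 1) to
            s = substr(s, p + length(from))
        }
        return out s
    }
    function trim(s) {
        sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s)
        return s
    }
    NR == FNR { tmpl[++n] = $0; next }   # first file: remember the template
    FNR == 1  { next }                   # second file: skip the header row
    {
        for (k = 1; k <= n; k++) {
            line = tmpl[k]
            for (i = 1; i <= NF; i++)
                line = replace_all(line, "<tag" i ">", trim($i))
            print line
        }
    }' "$1" "$2"
}

# Usage: fill_template ion.inp table.dat
```

To write one file per row instead of stdout, the print could be redirected inside awk to a name built with sprintf, as script2.awk does with sed.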

awk - check if file contains certain string - if it does find and replace another one

So i have a file like this:
COD:'Anschlag 15'
LET: DimX(2240)
LET: DimZ(1193)
LET: DimS(1.25)
LET: Schenkel(96)
DIM: X DimX+0.5
Z DimZ+0.5
S DimS
STAINLESS
REF: X1 FOD-107.69
X2 FOD-107.69
Z1 FOD
Z2 FOD
N
ZPF 40
MCM: QSU 10 QSD 10
MNP_SPEED 20
BLHINH 50
ROT: S 4 ROTONBLH SPEED 20
BEN: L 15 AC -1
BEN: L Schenkel AC -1
ROT: S 2 ROTONBLH SPEED 20
BEN: L 15 AC -1
BEN: L Schenkel AC -1
ROT: S 1 ROTONBLH SPEED 20
BEN: L 15 AC -1
BEN: L Schenkel AC -1
ROT: S 3 ROTONBLH SPEED 20
BEN-: L 15 AC -1
BEN: L 107 AC -1
END: SPEED 40
I want to check whether the file contains the string "STAINLESS".
If it does, search for all occurrences of AC -1 and replace them with AC 3.
If it doesn't contain STAINLESS, keep the file as it is.
What I've tried is:
find C:/Users/user/test -type f -exec awk -i inplace -f C:/Users/user/test_skript/b.awk {} +
The file b.awk
$1 == "STAINLESS" { f = 1 }
if ( f == 1 )
{ gsub(/AC[[:blank:]]*-1/,"AC 3"); print }
else
{ print }
The gsub function itself works. But the STAINLESS check doesn't.
If STAINLESS always comes before AC -1 then following single pass awk should work:
awk '/STAINLESS/{f=1} f{gsub(/AC -1/, "AC 3")} 1' file
With your shown samples, please try the following, written and tested in GNU awk. It should work regardless of whether STAINLESS comes before or after AC -1.
awk '
FNR==NR{
if($0~/STAINLESS/){ found=1 }
next
}
found{
gsub(/AC -1/,"AC 3")
}
1
' Input_file Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition which will be TRUE when first time Input_file is being read.
if($0~/STAINLESS/){ found=1 } ##Checking condition if line contains STAINLESS then set found to 1 here.
next ##next will skip all further statements from here.
}
found{ ##Checking condition if found is SET then do following.
gsub(/AC -1/,"AC 3") ##Globally substituting AC -1 with AC 3 here.
}
1 ##Mentioning 1 will print line here.
' Input_file Input_file ##Mentioning Input_file names here.
NOTE: If you have GNU awk, you can also change if($0~/STAINLESS/){ found=1 } to if($0~/STAINLESS/){ found=1; nextfile } to make it run faster.
The following is based on the OPs requirements:
I want to check if the file contains the String "STAINLESS"
if it does search for all occurences of AC -1 and replace them with AC 3
if it doesn't contain STAINLESS keep the file as it is
and as such:
Searches the whole file for STAINLESS before replacing AC -1 with AC 3 anywhere it occurs in the file - before, after or on the same line as STAINLESS.
Will keep the file as it is if STAINLESS doesn't exist in it, i.e. does not write to it at all and so won't change the timestamp, ownership, or permissions of it.
Since you're using this in the context of a find with inplace editing, you need something like this (uses GNU awk for -i inplace, nextfile and ENDFILE):
find ... -exec awk -i inplace '
BEGIN {
tgt = "STAINLESS"
ARGV[ARGC++] = ARGV[1]
inplace::enable = 0
gotTgt = 0
}
ARGIND % 2 {
if ( $1 == tgt ) {
gotTgt = 1
nextfile
}
next
}
ENDFILE {
inplace::enable = gotTgt
gotTgt = 0
}
inplace::enable {
gsub(/AC[[:blank:]]*-1/,"AC 3")
print
}
' {} \;
The \; instead of + at the end of the find command is important so awk just gets fed one file at a time to make it easiest to do two passes of each file, first to find STAINLESS and then to do the replacement if it was found on the first pass.
Note that we need to set the enable flag for the upcoming file in the ENDFILE section of the preceding file because by the time BEGINFILE is executed for the upcoming file it's too late: the inplace editing has already been established for that file, so that if you do a print "foo" in a BEGINFILE awk knows where to direct it.
If the search string doesn't necessarily precede the replacement string, it's easier with a grep/sed pair:
$ grep -q STAINLESS file && sed 's/AC -1/AC 3/g' file
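The grep/sed pair above prints the edited text to standard output. A sketch of one way to write the result back to the file with POSIX tools only follows; it goes through a temporary file rather than relying on sed -i, which is not part of POSIX. The function name replace_if_stainless is hypothetical, and the [[:blank:]]* pattern is borrowed from the question's gsub.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a file only if it contains STAINLESS.
replace_if_stainless() {
    if grep -q 'STAINLESS' "$1"; then
        tmp=$(mktemp) || return 1
        # Write the edited copy first, then copy it back over the original
        sed 's/AC[[:blank:]]*-1/AC 3/g' "$1" > "$tmp" && cat "$tmp" > "$1"
        rm -f "$tmp"
    fi
}

# Usage: replace_if_stainless path/to/file
```

Files without STAINLESS are never opened for writing, so their timestamps stay untouched.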

Awk if else expression not printing correct results for mathematical operation

So I have an input file that looks like this:
atom Comp
C1 45.7006
H40 30.0407
N41 148.389
S44 502.263
F45 365.162
I also have some variables that I have called in from another file, which I know are defined correctly, as the correct values print when I call them using echo.
These values are
Hslope=-1.1120
Hint=32.4057
Cslope=-1.0822
Cint=196.4234
What I am trying to do is, for all lines with C in the first column, print (column 2 - Cint)/Cslope. The same goes for all lines with H in the first column, with the appropriate variables, and all lines that don't have C or H should print "NA".
The first line should be skipped.
Currently, my code reads
awk -v Hslope=$Hslope -v Hint=$Hint -v Cslope=$Cslope -v Cint=$Cint '{for(i=2; i<=NR; i++)
{
if($1 ~ /C/)
{ shift = (($2-Cint)/Cslope); print shift }
else if($1 ~ /H/)
{ shift = (($2-Hint)/Hslope); print shift }
else
{ print "NA" }
} }' avRNMR >> vgRNMR
Where avRNMR is the input file and vgRNMR is the output file, which is already created with the contents "shift" by another line.
I have also tried a version where print is just set to the mathematical expression instead using "shift" as a variable. Another attempt was putting $ in front of every variable. Neither of these have produced any different results.
The output I get is
shift
139.274
2.1268
2.1268
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
Which is not the correct answer, particularly considering that my input file only has the six lines shown above. Note that the number of lines with C, H, and other letters is variable.
What I should get is
shift
139.27
2.13
NA
NA
NA
EDIT
As suggested, exchanging "for(i=2; i<=NR; i++)" for FNR>1 gives the following output
shift
NA
C1 45.7006
139.274
H40 30.0407
2.1268
N41 148.389
NA
S44 502.263
NA
F45 365.162
NA
Which is almost the correct output for the math answers, but not in the desired format. That first NA also means that a line is getting read to print that, which, if it is truly skipping the first line, shouldn't happen.
Remove the for loop on i=2. Add pattern FNR>1 before the action. Anchor the two patterns to the beginning of the field:
awk -v Hslope=$Hslope -v Hint=$Hint -v Cslope=$Cslope -v Cint=$Cint '
FNR > 1 { # skip first record
if($1 ~ /^C/) print (($2-Cint)/Cslope)
else if($1 ~ /^H/) print (($2-Hint)/Hslope)
else print "NA"
}' avRNMR >> vgRNMR
Warning: I didn't test that code.
EDIT: I have now tested the code:
$ cat avRNMR
atom Comp
C1 45.7006
H40 30.0407
N41 148.389
S44 502.263
F45 365.162
$ awk -v Hslope=-1.1120 -v Hint=32.4057 -v Cslope=-1.0822 -v Cint=196.4234 '
> FNR > 1 { # skip first record
> if($1 ~ /^C/) print (($2-Cint)/Cslope)
> else if($1 ~ /^H/) print (($2-Hint)/Hslope)
> else print "NA"
> }' avRNMR
139.274
2.1268
NA
NA
NA
That looks to me like what you want. Please tell me what you are seeing.
Try this:
$ awk 'NR==FNR{v[$1]=$2} NR<=FNR||FNR==1{next} /^[CH]/{c=substr($0, 1, 1); print ($2-v[c"int"])/v[c"slope"];next} {print "NA"}' FS="=" vars FS=" " file
139.274
2.1268
NA
NA
NA
The first pattern/action pair reads variables from file vars into an array v. The second skips further processing and also skips the first line for the second file file. The third will match lines with C and H and do the calculations.
You'll need to change the file names and redirect the output to your outfile.
$ cat tst.awk
{ shift = "NA" }
/^C/ { shift = ($2 - Cint) / Cslope }
/^H/ { shift = ($2 - Hint) / Hslope }
NR>1 { print shift }
$ awk -v Hslope="$Hslope" -v Hint="$Hint" -v Cslope="$Cslope" -v Cint="$Cint" -f tst.awk file
139.274
2.1268
NA
NA
NA
or if this is what you really want:
$ cat tst.awk
{ shift = (NR==1 ? "shift" : "NA") }
/^C/ { shift = ($2 - Cint) / Cslope }
/^H/ { shift = ($2 - Hint) / Hslope }
{ print shift }
$ awk -v Hslope="$Hslope" -v Hint="$Hint" -v Cslope="$Cslope" -v Cint="$Cint" -f tst.awk file
shift
139.274
2.1268
NA
NA
NA
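The desired output in the question shows two decimals (139.27 and 2.13), while the answers print awk's default number format. A sketch of the same logic with printf "%.2f" is below; the variable values are the ones quoted in the question, and the function name calc_shifts is hypothetical.

```shell
#!/bin/sh
# Values from the question; in the real script they come from another file.
Hslope=-1.1120 Hint=32.4057 Cslope=-1.0822 Cint=196.4234

# Hypothetical helper: print one shift per data line, rounded to 2 decimals.
calc_shifts() {
    awk -v Hslope="$Hslope" -v Hint="$Hint" -v Cslope="$Cslope" -v Cint="$Cint" '
        FNR == 1  { next }                                   # skip the header
        $1 ~ /^C/ { printf "%.2f\n", ($2 - Cint) / Cslope; next }
        $1 ~ /^H/ { printf "%.2f\n", ($2 - Hint) / Hslope; next }
                  { print "NA" }
    ' "$1"
}

# Usage: calc_shifts avRNMR >> vgRNMR
```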

Awk - Substring comparison

Working native bash code :
while read line
do
a=${line:112:7}
b=${line:123:7}
if [[ $a != "0000000" || $b != "0000000" ]]
then
echo "$line" >> FILE_OT_YHAV
else
echo "$line" >> FILE_OT_NHAV
fi
done <$FILE_IN
I have the following file (it's a dummy; the substrings being checked are both in the 4th field, so never mind the exact numbers):
AAAAAAAAAAAAAA XXXXXX BB CCCCCCC 12312312443430000000
BBBBBBB AXXXXXX CC DDDDDDD 10101010000000000000
CCCCCCCCCC C C QWEQWEE DDD AAAAAAA A12312312312312310000
I'm trying to write an awk script that compares two specific substrings: if either one is not 0000000 it outputs the line into file A, and if both of them are 0000000 it outputs the line into file B. This is the code I have so far:
# Before first line.
BEGIN {
print "Awk Started"
FILE_OT_YHAV="FILE_OT_YHAV.test"
FILE_OT_NHAV="FILE_OT_NHAV.test"
FS=""
}
# For each line of input.
{
fline=$0
# print "length = #" length($0) "#"
print "length = #" length(fline) "#"
print "##" substr($0,112,7) "##" substr($0,123,7) "##"
if ( (substr($0,112,7) != "0000000") || (substr($0,123,7) != "0000000") )
print $0 > FILE_OT_YHAV;
else
print $0 > FILE_OT_NHAV;
}
# After last line.
END {
print "Awk Ended"
}
The problem is that when I run it, it:
a) Treats every line as having a different length
b) Therefore applies the substrings to different parts of each line (that is why I added the print length statements before the if, to check on it).
This is a sample output of the line length awk reads and the different substrings :
Awk Started
length = #130#
## ## ##
length = #136#
##0000000##22016 ##
length = #133#
##0000001##16 ##
length = #129#
##0010220## ##
length = #138#
##0000000##1022016##
length = #136#
##0000000##22016 ##
length = #134#
##0000000##016 ##
length = #137#
##0000000##022016 ##
Is there a reason why awk treats lines of the same length as having a different length? Does it have something to do with the spacing of the input file?
Thanks in advance for any help.
After the comments about cleaning the file up with sed, I got this output (and yes, now the lines have different sizes):
1 0M-DM-EM-G M-A.M-E. #DEH M-SM-TM-OM-IM-WM-EM-IM-A M-DM-V/M-DM-T/M-TM-AM-P 01022016 $
2 110000080103M-CM-EM-QM-OM-MM-TM-A M-A. 6M-AM-HM-GM-MM-A 1055801001102 0000120000012001001142 19500000120 0100M-D000000000000000000000001022016 $
3 110000106302M-TM-AM-QM-EM-KM-KM-A 5M-AM-AM-HM-GM-MM-A 1043801001101 0000100000010001001361 19500000100M-IM-SM-O0100M-D000000000000000000000001022016 $
4 110000178902M-JM-AM-QM-AM-CM-IM-AM-MM-MM-G M-KM-EM-KM-AM-S 71M-AM-HM-GM-MM-A 1136101001101 0000130000013001006061 19500000130 0100M-D000000000000000000000001022016 $
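A sketch of an awk counterpart to the bash loop, under two assumptions that the thread does not confirm. First, bash's ${line:112:7} uses a 0-based offset while awk's substr() is 1-based, so the matching calls are substr($0, 113, 7) and substr($0, 124, 7). Second, the M- sequences in the dump are bytes with the high bit set, so running awk with LC_ALL=C makes it count bytes, which may or may not be the right unit for this file. The function name split_by_counters and the output file arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split_by_counters INPUT YES_FILE NO_FILE
split_by_counters() {
    LC_ALL=C awk -v yes="$2" -v no="$3" '
    {
        a = substr($0, 113, 7)   # same span as bash ${line:112:7}
        b = substr($0, 124, 7)   # same span as bash ${line:123:7}
        if (a != "0000000" || b != "0000000")
            print > yes
        else
            print > no
    }' "$1"
}

# Usage: split_by_counters "$FILE_IN" FILE_OT_YHAV FILE_OT_NHAV
```

Unlike the bash loop, which appends with >>, awk's > truncates each output file once when it is first opened and then keeps writing to it.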