Using awk to color the output in bash - awk

I have two files.
The first one is a CSV while the other is a plain text file.
I want to print all the lines of file2 that contain column 1 of the first file, using column 2 as the font color and column 3 as the background color.
for example:
f1 contains
Basic Engineering,BLACK,WHITE
Science,RED,BLUE
f2 contains the following, with a field width of 20 for each field:
foo abc Science AA
bar cde Basic Engineering AP
baz efgh Science AB
expected output:
foo abc Science AA (Red font, Blue background)
bar cde Basic Engineering AP (Black font, White background)
baz efgh Science AB (Red font, Blue background)
I have already defined the colors in a separate file, defineColors.sh, as:
BLACK_FONT=`tput setaf 0`
RED_FONT=`tput setaf 1`
WHITE_BACKGROUND=`tput setab 7`
BLUE_BACKGROUND=`tput setab 4`
RESET_ALL=`tput sgr0`
My attempt:
awk -F, '{sed "/$1/p" f2 | sed -e 's/^\(.*\)$/'"$2_FONT"''"$3_BACKGROUND"'\1/' }' f1

$ cat tst.awk
BEGIN {
    split("BLACK RED GREEN YELLOW BLUE MAGENTA CYAN WHITE",tputColors)
    for (i in tputColors) {
        colorName = tputColors[i]
        colorNr = i-1
        cmd = "tput setaf " colorNr
        fgEscSeq[colorName] = ( (cmd | getline escSeq) > 0 ? escSeq : "<" colorName ">" )
        close(cmd)
        cmd = "tput setab " colorNr
        bgEscSeq[colorName] = ( (cmd | getline escSeq) > 0 ? escSeq : "<" colorName ">" )
        close(cmd)
    }
    cmd = "tput sgr0"
    colorOff = ( (cmd | getline escSeq) > 0 ? escSeq : "<sgr0>" )
    close(cmd)
    FS = ","
}
NR == FNR {
    key = $1
    fgColor[key] = fgEscSeq[$2]
    bgColor[key] = bgEscSeq[$3]
    next
}
{
    # change this to substr($0,41,20) for your real 20-char fields data
    key = substr($0,15,20)
    gsub(/^[[:space:]]+|[[:space:]]+$/,"",key)
    print bgColor[key] fgColor[key] $0 colorOff
}
Piping to cat -v so you can see that the color escape sequences are being output:
$ awk -f tst.awk f1 f2 | cat -v
^[[44m^[[31mfoo abc Science AA^[(B^[[m
^[[47m^[[30mbar cde Basic Engineering AP^[(B^[[m
^[[44m^[[31mbaz efgh Science AB^[(B^[[m
I see you updated your question to say you have already defined the colors in a separate file, defineColors.sh, and showed a shell script - just don't use that, it's not needed.
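Stripped of the tput plumbing, the core of that script is the standard NR==FNR two-file lookup. A minimal sketch (the scratch file names f1/f2 and the tab-separated data file are assumptions for this demo; the real script extracts the key with a fixed-width substr instead):

```shell
# Build a lookup table from the CSV, then tag matching lines of the data file.
# For the demo the data file is tab-separated and the lookup key is field 3.
printf '%s\n' 'Basic Engineering,BLACK,WHITE' 'Science,RED,BLUE' > f1
printf 'foo\tabc\tScience\tAA\nbar\tcde\tBasic Engineering\tAP\n' > f2
awk '
NR == FNR { color[$1] = $2 "/" $3; next }   # first file: key -> "FG/BG"
{ print color[$3], $0 }                     # second file: look the key up
' FS=',' f1 FS='\t' f2
```

The same lookup drives the real script; only the key extraction and the escape-sequence values differ.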

Related

awk - check if file contains certain string - if it does find and replace another one

So i have a file like this:
COD:'Anschlag 15'
LET: DimX(2240)
LET: DimZ(1193)
LET: DimS(1.25)
LET: Schenkel(96)
DIM: X DimX+0.5
Z DimZ+0.5
S DimS
STAINLESS
REF: X1 FOD-107.69
X2 FOD-107.69
Z1 FOD
Z2 FOD
N
ZPF 40
MCM: QSU 10 QSD 10
MNP_SPEED 20
BLHINH 50
ROT: S 4 ROTONBLH SPEED 20
BEN: L 15 AC -1
BEN: L Schenkel AC -1
ROT: S 2 ROTONBLH SPEED 20
BEN: L 15 AC -1
BEN: L Schenkel AC -1
ROT: S 1 ROTONBLH SPEED 20
BEN: L 15 AC -1
BEN: L Schenkel AC -1
ROT: S 3 ROTONBLH SPEED 20
BEN-: L 15 AC -1
BEN: L 107 AC -1
END: SPEED 40
I want to check if the file contains the string "STAINLESS".
If it does:
search for all occurrences of AC -1 and replace them with AC 3
If it doesn't contain STAINLESS:
keep the file as it is
What I've tried is:
find C:/Users/user/test -type f -exec awk -i inplace -f C:/Users/user/test_skript/b.awk {} +
The file b.awk
$1 == "STAINLESS" { f = 1 }
if ( f == 1 )
{ gsub(/AC[[:blank:]]*-1/,"AC 3"); print }
else
{ print }
The gsub function itself works. But the STAINLESS check doesn't.
If STAINLESS always comes before AC -1, then the following single-pass awk should work:
awk '/STAINLESS/{f=1} f{gsub(/AC -1/, "AC 3")} 1' file
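A quick sanity check of that one-liner on a scratch sample (the file name sample.txt is made up):

```shell
# STAINLESS precedes the AC -1 lines here, so setting the flag on the
# marker and substituting from then on is enough.
cat > sample.txt <<'EOF'
STAINLESS
BEN: L 15 AC -1
BEN: L 107 AC -1
EOF
awk '/STAINLESS/{f=1} f{gsub(/AC -1/, "AC 3")} 1' sample.txt
```

This prints the STAINLESS line unchanged and both AC -1 lines rewritten to AC 3.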
With your shown samples, please try the following, written and tested in GNU awk. It works irrespective of whether STAINLESS comes before AC -1 or after it.
awk '
FNR==NR{
if($0~/STAINLESS/){ found=1 }
next
}
found{
gsub(/AC -1/,"AC 3")
}
1
' Input_file Input_file
Explanation: a detailed explanation of the above.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition which will be TRUE when first time Input_file is being read.
if($0~/STAINLESS/){ found=1 } ##Checking condition if line contains STAINLESS then set found to 1 here.
next ##next will skip all further statements from here.
}
found{ ##Checking condition if found is SET then do following.
gsub(/AC -1/,"AC 3") ##Globally substituting AC -1 with AC 3 here.
}
1 ##Mentioning 1 will print line here.
' Input_file Input_file ##Mentioning Input_file names here.
NOTE: If you have GNU awk, also change if($0~/STAINLESS/){ found=1 } to if($0~/STAINLESS/){ found=1; nextfile } to make the first pass faster.
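To see why reading the file twice matters, here is a sketch (scratch file name assumed) where STAINLESS appears only after the AC -1 line, so a single forward pass would set the flag too late:

```shell
# First pass (FNR==NR) just looks for the marker; the second pass does
# the substitution everywhere once the marker is known to exist.
cat > sample2.txt <<'EOF'
BEN: L 15 AC -1
STAINLESS
EOF
awk 'FNR==NR{ if (/STAINLESS/) found=1; next } found{ gsub(/AC -1/, "AC 3") } 1' sample2.txt sample2.txt
```

The AC -1 line is rewritten even though it precedes STAINLESS in the file.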
The following is based on the OP's requirements:
I want to check if the file contains the string "STAINLESS"
if it does, search for all occurrences of AC -1 and replace them with AC 3
if it doesn't contain STAINLESS, keep the file as it is
and as such:
Searches the whole file for STAINLESS before replacing AC -1 with AC 3 anywhere it occurs in the file - before, after or on the same line as STAINLESS.
Will keep the file as it is if STAINLESS doesn't exist in it, i.e. it does not write to the file at all and so won't change its timestamp, ownership, or permissions.
Since you're using this in the context of a find with inplace editing, you need something like this (uses GNU awk for -i inplace, nextfile and ENDFILE):
find ... -exec awk -i inplace '
BEGIN {
    tgt = "STAINLESS"
    ARGV[ARGC++] = ARGV[1]
    inplace::enable = 0
    gotTgt = 0
}
ARGIND % 2 {
    if ( $1 == tgt ) {
        gotTgt = 1
        nextfile
    }
    next
}
ENDFILE {
    inplace::enable = gotTgt
    gotTgt = 0
}
inplace::enable {
    gsub(/AC[[:blank:]]*-1/,"AC 3")
    print
}
' {} \;
The \; instead of + at the end of the find command is important so that awk is fed one file at a time, making it easy to do two passes of each file: first to find STAINLESS, then to do the replacement if it was found on the first pass.
Note that we need to set the enable flag for the upcoming file in the ENDFILE section of the preceding file because by the time BEGINFILE is executed for the upcoming file it's too late - the inplace editing has already been established for that file, so if you do a print "foo" in BEGINFILE, awk already knows where to direct it.
If the search string doesn't always precede the replacement string, it's easier with a grep/sed pair:
$ grep -q STAINLESS file && sed 's/AC -1/AC 3/g' file
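If you want that pair to edit in place, a sketch (GNU sed's -i assumed; scratch file names made up) - the guard means a file without STAINLESS is never rewritten:

```shell
# Two scratch files: one contains the STAINLESS marker, one doesn't.
printf '%s\n' 'STAINLESS' 'BEN: L 15 AC -1' > has.txt
printf '%s\n' 'BEN: L 15 AC -1' > lacks.txt
for f in has.txt lacks.txt; do
    # Replace only if the marker is present; otherwise the file is never
    # opened for writing, so its timestamp and permissions are untouched.
    if grep -q STAINLESS "$f"; then
        sed -i 's/AC -1/AC 3/g' "$f"
    fi
done
cat has.txt lacks.txt
```

Note that BSD/macOS sed spells in-place editing as sed -i '' instead.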

Awk if else expression not printing correct results for mathematical operation

So I have an input file that looks like this:
atom Comp
C1 45.7006
H40 30.0407
N41 148.389
S44 502.263
F45 365.162
I also have some variables that I have called in from another file, which I know are defined correctly, as the correct values print when I call them using echo.
These values are
Hslope=-1.1120
Hint=32.4057
Cslope=-1.0822
Cint=196.4234
What I am trying to do is: for all lines with C in the first column, print (column 2 - Cint)/Cslope; do the same for all lines with H in the first column, using the appropriate variables; and print "NA" for all lines that have neither C nor H.
The first line should be skipped.
Currently, my code reads
awk -v Hslope=$Hslope -v Hint=$Hint -v Cslope=$Cslope -v Cint=$Cint '{for(i=2; i<=NR; i++)
{
if($1 ~ /C/)
{ shift = (($2-Cint)/Cslope); print shift }
else if($1 ~ /H/)
{ shift = (($2-Hint)/Hslope); print shift }
else
{ print "NA" }
} }' avRNMR >> vgRNMR
Where avRNMR is the input file and vgRNMR is the output file, which was already created with the header "shift" by an earlier command.
I have also tried a version where print is just set to the mathematical expression instead using "shift" as a variable. Another attempt was putting $ in front of every variable. Neither of these have produced any different results.
The output I get is
shift
139.274
2.1268
2.1268
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
Which is not the correct answer, particularly considering that my input file only has the six lines shown above. Note that the number of lines with C, H, and other letters is variable.
What I should get is
shift
139.27
2.13
NA
NA
NA
EDIT
As suggested, exchanging for(i=2; i<=NR; i++) for FNR>1 gives the following output:
shift
NA
C1 45.7006
139.274
H40 30.0407
2.1268
N41 148.389
NA
S44 502.263
NA
F45 365.162
NA
This is almost the correct output for the math, but not in the desired format. That first NA also means an extra line is being read and printed, which shouldn't happen if the first line were truly being skipped.
Remove the for loop on i=2. Add pattern FNR>1 before the action. Anchor the two patterns to the beginning of the field:
awk -v Hslope=$Hslope -v Hint=$Hint -v Cslope=$Cslope -v Cint=$Cint '
FNR > 1 { # skip first record
if($1 ~ /^C/) print (($2-Cint)/Cslope)
else if($1 ~ /^H/) print (($2-Hint)/Hslope)
else print "NA"
}' avRNMR >> vgRNMR
Warning: I didn't test that code.
EDIT: I have now tested the code:
$ cat avRNMR
atom Comp
C1 45.7006
H40 30.0407
N41 148.389
S44 502.263
F45 365.162
$ awk -v Hslope=-1.1120 -v Hint=32.4057 -v Cslope=-1.0822 -v Cint=196.4234 '
> FNR > 1 { # skip first record
> if($1 ~ /^C/) print (($2-Cint)/Cslope)
> else if($1 ~ /^H/) print (($2-Hint)/Hslope)
> else print "NA"
> }' avRNMR
139.274
2.1268
NA
NA
NA
That looks to me like what you want. Please tell me what you are seeing.
Try this:
$ awk 'NR==FNR{v[$1]=$2} NR<=FNR||FNR==1{next} /^[CH]/{c=substr($0, 1, 1); print ($2-v[c"int"])/v[c"slope"];next} {print "NA"}' FS="=" vars FS=" " file
139.274
2.1268
NA
NA
NA
The first pattern/action pair reads the variables from the file vars into an array v. The second skips the rest of the program for the first file and also skips the header line of the second file, file. The third matches lines starting with C or H and does the calculations.
You'll need to change the file names and redirect the output to your outfile.
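The per-file FS switch in that command line deserves a small demo of its own: var=value arguments are processed when awk reaches them between file names, so each input file can get its own field separator (the file names vars and data here are made up):

```shell
# vars is key=value, data is whitespace-separated; FS is switched per file.
printf '%s\n' 'Cint=196.4234' 'Cslope=-1.0822' > vars
printf '%s\n' 'C1 45.7006' > data
awk 'NR==FNR{ v[$1] = $2; next }              # load vars with FS="="
     { print ($2 - v["Cint"]) / v["Cslope"] } # compute with default FS
' FS='=' vars FS=' ' data
```

This prints 139.274, matching the tested output above.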
$ cat tst.awk
{ shift = "NA" }
/^C/ { shift = ($2 - Cint) / Cslope }
/^H/ { shift = ($2 - Hint) / Hslope }
NR>1 { print shift }
$ awk -v Hslope="$Hslope" -v Hint="$Hint" -v Cslope="$Cslope" -v Cint="$Cint" -f tst.awk file
139.274
2.1268
NA
NA
NA
or if this is what you really want:
$ cat tst.awk
{ shift = (NR==1 ? "shift" : "NA") }
/^C/ { shift = ($2 - Cint) / Cslope }
/^H/ { shift = ($2 - Hint) / Hslope }
{ print shift }
$ awk -v Hslope="$Hslope" -v Hint="$Hint" -v Cslope="$Cslope" -v Cint="$Cint" -f tst.awk file
shift
139.274
2.1268
NA
NA
NA

Awk - Substring comparison

Working native bash code :
while read line
do
a=${line:112:7}
b=${line:123:7}
if [[ $a != "0000000" || $b != "0000000" ]]
then
echo "$line" >> FILE_OT_YHAV
else
echo "$line" >> FILE_OT_NHAV
fi
done <$FILE_IN
I have the following file (it's a dummy); the substrings being checked are both in the 4th field, so never mind the exact numbers.
AAAAAAAAAAAAAA XXXXXX BB CCCCCCC 12312312443430000000
BBBBBBB AXXXXXX CC DDDDDDD 10101010000000000000
CCCCCCCCCC C C QWEQWEE DDD AAAAAAA A12312312312312310000
I'm trying to write an awk script that compares two specific substrings: if either one is not 0000000 it outputs the line into file A, and if both of them are 0000000 it outputs the line into file B. This is the code I have so far:
# Before first line.
BEGIN {
print "Awk Started"
FILE_OT_YHAV="FILE_OT_YHAV.test"
FILE_OT_NHAV="FILE_OT_NHAV.test"
FS=""
}
# For each line of input.
{
fline=$0
# print "length = #" length($0) "#"
print "length = #" length(fline) "#"
print "##" substr($0,112,7) "##" substr($0,123,7) "##"
if ( (substr($0,112,7) != "0000000") || (substr($0,123,7) != "0000000") )
print $0 > FILE_OT_YHAV;
else
print $0 > FILE_OT_NHAV;
}
# After last line.
END {
print "Awk Ended"
}
The problem is that when I run it:
a) it treats every line as having a different length
b) the substrings are therefore applied to different parts of it (that is why I added the print length statements before the if, to check on it)
This is a sample output of the line length awk reads and the different substrings :
Awk Started
length = #130#
## ## ##
length = #136#
##0000000##22016 ##
length = #133#
##0000001##16 ##
length = #129#
##0010220## ##
length = #138#
##0000000##1022016##
length = #136#
##0000000##22016 ##
length = #134#
##0000000##016 ##
length = #137#
##0000000##022016 ##
Is there a reason why awk treats lines of the same length as having different lengths? Does it have something to do with the spacing of the input file?
Thanks in advance for any help.
After the comments about cleaning the file up with sed, I got this output (and yes, now the lines have different sizes):
1 0M-DM-EM-G M-A.M-E. #DEH M-SM-TM-OM-IM-WM-EM-IM-A M-DM-V/M-DM-T/M-TM-AM-P 01022016 $
2 110000080103M-CM-EM-QM-OM-MM-TM-A M-A. 6M-AM-HM-GM-MM-A 1055801001102 0000120000012001001142 19500000120 0100M-D000000000000000000000001022016 $
3 110000106302M-TM-AM-QM-EM-KM-KM-A 5M-AM-AM-HM-GM-MM-A 1043801001101 0000100000010001001361 19500000100M-IM-SM-O0100M-D000000000000000000000001022016 $
4 110000178902M-JM-AM-QM-AM-CM-IM-AM-MM-MM-G M-KM-EM-KM-AM-S 71M-AM-HM-GM-MM-A 1136101001101 0000130000013001006061 19500000130 0100M-D000000000000000000000001022016 $
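For what it's worth, varying length() results on visually equal lines are the classic symptom of multibyte characters - and the M-DM-E... runs in the output above are UTF-8 Greek rendered by cat -v. length() counts characters in a UTF-8 locale but bytes in the C locale, so fixed substr() offsets only line up with a fixed byte layout under LC_ALL=C. A minimal illustration (scratch file name assumed):

```shell
# Two Greek letters, alpha and beta: 2 characters encoded as 4 bytes in UTF-8.
printf '\316\261\316\262\n' > g.txt
LC_ALL=C awk '{ print length($0) }' g.txt   # C locale counts bytes: 4
```

In a UTF-8 locale, a multibyte-aware awk would print 2 instead - exactly the kind of mismatch that makes fixed-width substr() extraction drift.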

awk to count occurrences and output count and output full record of last occurrence counted

I would like to count the occurrences in a single column and output those counts along with the full record of the last occurrence counted.
This works well for counting, but I would also like more descriptive output.
awk '{count[$1]++}END{for(j in count) print j,""count[j]""}'
If input was:
A horse pen
A dog apple
B cat house
C mouse grape
C mouse shoe
C elephant pole
Output would be:
A 2 dog apple
B 1 cat house
C 3 elephant pole
If order is critical for you, then try the following with GNU awk:
awk '{
ary[$1]++; line[$1] = $2 FS $3
}
END {
n = asorti(ary, sorted_ary)
for(i = 1; i <= n; i++) {
print sorted_ary[i], ary[sorted_ary[i]], line[sorted_ary[i]]
}
}' file
Test:
$ cat file
A horse pen
A dog apple
B cat house
C mouse grape
C mouse shoe
C elephant pole
$ awk '{
ary[$1]++; line[$1] = $2 FS $3
}
END {
n = asorti(ary, sorted_ary)
for(i = 1; i <= n; i++) {
print sorted_ary[i], ary[sorted_ary[i]], line[sorted_ary[i]]
}
}' file
A 2 dog apple
B 1 cat house
C 3 elephant pole
This should work, but beware that output order is not guaranteed:
awk '{k=$1; a[k]++; $1="";
sub(/^ +/, "", $0); b[k]=$0};
END{for (k in a) print k, a[k], b[k]}' file.txt
A 2 dog apple
B 1 cat house
C 3 elephant pole
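If you want the output in order of first appearance without relying on GNU asorti(), one portable approach (a sketch; the scratch file name is made up) is to record key order in a second array:

```shell
cat > occ.txt <<'EOF'
A horse pen
A dog apple
B cat house
C mouse grape
C mouse shoe
C elephant pole
EOF
awk '
!($1 in cnt) { order[++n] = $1 }       # remember first-seen order of keys
{ cnt[$1]++; rest[$1] = $2 " " $3 }    # count, and keep the latest remainder
END {
    for (i = 1; i <= n; i++)
        print order[i], cnt[order[i]], rest[order[i]]
}' occ.txt
```

This prints the same three lines as the answers above, but in input order, and works in any POSIX awk.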

using awk or sed extract first character of each column and store it in a separate file

I have a file like below
AT AT AG AG
GC GC GG GC
I want to extract the first and last character of every column and store them in two different files.
File1:
A A A A
G G G G
File2:
T T G G
C C G C
My input file is very large. Is there a way I can do it in awk or sed?
With GNU awk for gensub():
gawk '{
print gensub(/.( |$)/,"","g") > "file1"
print gensub(/(^| )./,"","g") > "file2"
}' file
You can do similar in any awk with gsub() and a couple of variables.
You can try this: put the following in test.awk
#!/usr/bin/awk -f
BEGIN {
# FS = "[\s]+"
outfile_head="file1"
outfile_tail="file2"
}
{
    for(i = 1; i <= NF; i++) {
        printf "%s ", substr($i, 1, 1) >> outfile_head
        printf "%s ", substr($i, length($i), 1) >> outfile_tail
    }
    print "" >> outfile_head
    print "" >> outfile_tail
}
then make it executable and run it:
chmod +x test.awk && ./test.awk file
It's easy to do in two passes:
sed 's/\([^ ]\)[^ ]/\1/g' file > file1
sed 's/[^ ]\([^ ]\)/\1/g' file > file2
Doing it in one pass is a challenge...
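For what it's worth, the one-pass version is reachable in plain awk, since awk can write to several output files in the same run (output names file1/file2 as in the question; the input file name is made up):

```shell
cat > seqs <<'EOF'
AT AT AG AG
GC GC GG GC
EOF
awk '{
    first = last = ""
    for (i = 1; i <= NF; i++) {
        first = first substr($i, 1, 1) " "           # first character of column i
        last  = last substr($i, length($i), 1) " "   # last character of column i
    }
    sub(/ $/, "", first); sub(/ $/, "", last)        # drop trailing space
    print first > "file1"
    print last  > "file2"
}' seqs
```

One read of the large input produces both output files, with no intermediate storage beyond the current line.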
Edit 1: Modified for your multiple-line edit.
You could write a perl script and pass in the file names if you plan to edit and share it. This loops through the file only once and does not require storing the whole file in memory.
File "seq.pl":
#!/usr/bin/perl
open(F1, ">>$ARGV[1]");
open(F2, ">>$ARGV[2]");
open(DATA, "$ARGV[0]");
while ($line = <DATA>) {
    $line =~ s/(\r|\n)+//g;
    @pairs = split(/\s/, $line);
    for $pair (@pairs) {
        @bases = split(//, $pair);
        print F1 $bases[0] . " ";
        print F2 $bases[-1] . " ";
    }
    print F1 "\n";
    print F2 "\n";
}
close(F1);
close(F2);
close(DATA);
Execute it like so:
perl seq.pl full.seq f1.seq f2.seq
File "full.seq":
AT AT AG AG
GC GC GG GC
AT AT GC GC
File "f1.seq":
A A A A
G G G G
A A G G
File "f2.seq":
T T G G
C C G C
T T C C