awk for multiple patterns

My file looks like:
L 0 256 * * * * *
H 0 307 100.0 + 0 0
S 30 351 * * * * *
D 8 27 * * * * 99.3
C 11 1 * * * * *
For my script, I would like to start by having awk print $0 for certain lines, selected by $1, such as
awk '{if ($1!="C") {print $0} else if ($1!="D") {print $0}}'
But, there has to be a way to combine "C" and "D" into one IF statement... right?
For example, if I want to match lines where $1 is L, H, or S, i.e. NOT C or D, how would I write this?

Your present condition is not correct as both $1!="C" and $1!="D" can't be false at the same time. Hence, it will always print the whole file.
This will do as you described:
awk '{if ($1!="C" && $1!="D") {print $0}}' file
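Given your sample input, this prints:
L 0 256 * * * * *
H 0 307 100.0 + 0 0
S 30 351 * * * * *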

Using awk, you can provide rules for specific patterns with the syntax
awk 'pattern {action}' file
see the awk manual page for the definition of a pattern. In your case, you could use a regular expression as a pattern with the syntax
awk '/regular expression/ {action}' file
and a basic regular expression which would suit your needs could be
awk '/^[^CD]/ {print $0}' file
which you can actually shorten into
awk '/^[^CD]/' file
since {print $0} is the default action, as suggested in the comments.

awk '$1 ~ /[CD]/' file      # keep lines whose first field contains C or D
awk '$1 ~ /[LHS]/' file     # keep lines whose first field contains L, H, or S
awk '$1 ~ /[^LHS]/' file    # keep lines whose first field contains a character other than L, H, or S
awk '$1 !~ /[LHS]/' file    # keep lines whose first field contains none of L, H, or S
For a single-character first field like yours, the last two are equivalent.

Along the same lines, what would be a simple awk rule to keep only the lines that don't have something in one field and, simultaneously, something else in another field? Possibly with a pattern.
e.g.:
A=1 B=2 C=3
B=2 E=5 C=3
A=1 D=4 C=3
A line like this
awk '$1$3 != "A=1""C=3" {print $0}' IN > OUT
seems to do the trick,
but the moment I start to use the && operator (in a bash script), awk tends to delete all the lines that contain A=1 OR C=3, whereas the goal is to keep the middle line.
I would like to tell awk to delete lines one and three when field one contains the substring 'A' and field three simultaneously contains the substring 'C'.
But awk doesn't want to listen to me, or I'm seriously doing something wrong (I'm still in my experimental awk phase).
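If I read the goal correctly, the whole two-field condition has to be negated as one unit; a minimal sketch:
awk '!($1 ~ /A/ && $3 ~ /C/)' IN > OUT
This keeps a line unless field one contains A and field three contains C at the same time. By De Morgan's law it is equivalent to $1 !~ /A/ || $3 !~ /C/, which is why negating the two tests separately and joining them with && deletes too much.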

awk match pattern and convert number to different unit

I have a csv file that contains this kind of values:
vm47,8,32794384Ki,16257320Ki
vm47,8,30223304245,15223080Ki
vm48,8,32794384Ki,16257312Ki
vm48,8,30223304245,15223072Ki
vm49,8,32794384Ki,16257320Ki
vm49,8,30223304245,15223080Ki
Columns 3 and 4 are memory values expressed either in bytes or in kibibytes. The problem is that the "Ki" suffix appears inconsistently throughout the CSV file, particularly in column 3.
So, to make the file consistent, I need to convert everything to bytes: any value with a trailing "Ki" needs its numeric part multiplied by 1024, with the result replacing the corresponding XXXXXKi match.
The reason I want to do it with awk is that I am already using awk to generate that CSV, but I am happy to do it with sed too.
This is my code so far, but it's obviously wrong: it multiplies every value in columns 3 and 4 by 1024, even those that do not match "Ki". I am not sure at this point how to tell awk "if you see Ki at the end, then multiply by 1024".
kubectl describe node --context=$context| sed -E '/Name:|cpu:|ephemeral-storage:|memory:/!d' | sed 's/\s//g' | awk '
BEGIN {FS = ":"; OFS = ","}
{record[$1] = $2}
$1 == "memory" {print record["Name"], record["cpu"], record["ephemeral-storage"], record["memory"]}
' | awk -F, '{print $1,$2,$3,$3*1024,$4,$4*1024}' >> describe_nodes.csv
Edit: I made a mistake, you need to multiply by 128 to convert KiB to bytes, not 1024.
"if you see Ki at the end, then multiply by 1024
You may use:
awk 'BEGIN{FS=OFS=","} $3 ~ /Ki$/ {$3 *= 1024} $4 ~ /Ki$/ {$4 *= 1024} 1' file
vm47,8,33581449216,16647495680
vm47,8,30223304245,15588433920
vm48,8,33581449216,16647487488
vm48,8,30223304245,15588425728
vm49,8,33581449216,16647495680
vm49,8,30223304245,15588433920
Or a bit shorter:
awk 'BEGIN{FS=OFS=","} {for (i=3; i<=4; ++i) ($i ~ /Ki$/) && ($i *= 1024)} 1' file
(The parentheses around the assignment are needed, since && binds tighter than *=.)
With your shown samples/attempts, please try the following awk code. A simple explanation: traverse the fields from the 3rd field onwards, and if a value ends in Ki (matched case-insensitively), multiply it by 128 (per your edit); finally, print all lines, edited or not.
awk 'BEGIN{FS=OFS=","} {for(i=3;i<=NF;i++){if($i~/[Kk][Ii]$/){$i *= 128}}} 1' Input_file
You could try numfmt:
$ numfmt -d, --field 3,4 --from=auto --to=none <<EOF
vm47,8,32794384Ki,16257320Ki
vm47,8,30223304245,15223080Ki
EOF
vm47,8,33581449216,16647495680
vm47,8,30223304245,15588433920
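This works because numfmt's --from=auto interprets the standard suffixes (Ki = 1024, Mi = 1024^2, ...) while --to=none writes the result out as a plain, unscaled number.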

awk to output two files based on match or no match

In the awk below, I am trying to split a tab-delimited input based on whether $2 contains the string FP or RFP. The lines whose $2 does not contain those keywords should be printed to one file, result, and at the same time the lines that do contain them should be printed to another file, removed. The awk has a syntax error in it when I try to print two files; if I only print one, the awk runs. Thank you :).
input
12 aaa
123 FP bbb
11 ccc
10 RFP ddd
result
12 aaa
11 ccc
removed
123 FP bbb
10 RFP ddd
awk
awk -F'\t' 'BEGIN{d["FP"];d["RFP"]}!($2 in d) {print > "removed"}; else {print > "result"}' file
awk: cmd. line:1: BEGIN{d["FP"];d["RFP"]}!($2 in d) {print > "removed"}; else {print > "result"}
awk: cmd. line:1: ^ syntax error
else goes with if. Your script didn't have an if, just an else, hence the syntax error. All you need is:
awk -F'\t' '{print > ($2 ~ /^R?FP$/ ? "removed" : "result")}' file
or if you prefer the array approach you are trying to use:
awk -F'\t' '
BEGIN{ split("FP RFP",t,/ /); for (i in t) d[t[i]] }
{ print > ($2 in d ? "removed" : "result") }
' file
Read the book Effective Awk Programming, 4th Edition, by Arnold Robbins to learn awk syntax and semantics.
Btw when writing if/else code like you show in your question:
if ( !($2 in d) ) removed; else result
THINK about the fact that you're using negative (!) logic, which makes your code harder to understand at a glance AND opens you up to potential double negatives. Always try to express every condition in a positive way; in this case that'd be:
if ($2 in d) result; else removed
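As a runnable sketch with the file mapping the question's desired output actually shows (keyword lines go to removed, as in the array version above):
awk -F'\t' '
BEGIN{ split("FP RFP",t,/ /); for (i in t) d[t[i]] }
{ if ($2 in d) print > "removed"; else print > "result" }
' file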

take out specific columns from multiple files

I have multiple files that look like the one below. They are tab-separated. For all the files, I would like to take out column 1 and the column that starts with XF:Z:. This will give me output 1.
The file names are htseqoutput*.sam.sam, where * varies. I am not sure what awk program to use, or whether the for loop is correct.
for f in htseqoutput*.sam.sam
do
awk ????? "$f" > “out${f#htseqoutput}”
done
input example
AACAGATGATGAACTTATTGACGGGCGGACAGGAACTGTGTGCTGATTGTC_11 16 chr22 39715068 24 51M * 0 0 GACAATCAGCACACAGTTCCTGTCCGCCCGTCAATAAGTTCATCATCTGTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-12 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:18T31G0 YT:Z:UU XF:Z:SNORD43
GTTTCCTTAGTGTAGCGGTTATCACATTCGCCT_0 16 chr19 4724687 40 33M * 0 0 AGGCGAATGTGATAACCGCTACACTAAGGAAAC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-6 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:26C6 YT:Z:UU XF:Z:tRNA
TCGACTCCCGGTGTGGGAACC_0 16 chr13 45492060 23 21M * 0 0 GGTTCCCACACCGGGAGTCGA IIIIIIIIIIIIIIIIIIIII AS:i:-6 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:0C20 YT:Z:UU XF:Z:tRNA
output 1:
AACAGATGATGAACTTATTGACGGGCGGACAGGAACTGTGTGCTGATTGTC_11 SNORD43
GTTTCCTTAGTGTAGCGGTTATCACATTCGCCT_0 tRNA
TCGACTCCCGGTGTGGGAACC_0 tRNA
Seems like you could just use sed for this:
sed -r 's/^([ACGT0-9_]+).*XF:Z:([[:alnum:]]+).*/\1\t\2/' file
This captures the part at the start of the line and the alphanumeric part following XF:Z: and outputs them, separated by a tab character. One potential advantage of this approach is that it will work independently of the position of the XF:Z: string.
Your loop looks OK (you can use this sed command in place of the awk part), but be careful with your quotes: plain " should be used, not typographic “ and ”.
Alternatively, if you prefer awk (and assuming that the bit you're interested in is always part of the last field), you can use a custom field separator:
awk -F'[[:space:]](XF:Z:)?' -v OFS='\t' '{print $1, $NF}' file
This optionally adds the XF:Z: part to the field separator, so that it is removed from the start of the last field.
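Putting that together with your loop (a sketch, keeping the output-name pattern from your snippet, with plain quotes):
for f in htseqoutput*.sam.sam
do
    awk -F'[[:space:]](XF:Z:)?' -v OFS='\t' '{print $1, $NF}' "$f" > "out${f#htseqoutput}"
done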
You can try the following, if the "XF:Z:" column is always at the end:
awk 'BEGIN{OFS="\t"}{n=split($NF,a,":"); print $1, a[n]}' file.sam
you get,
AACAGATGATGAACTTATTGACGGGCGGACAGGAACTGTGTGCTGATTGTC_11 SNORD43
GTTTCCTTAGTGTAGCGGTTATCACATTCGCCT_0 tRNA
TCGACTCCCGGTGTGGGAACC_0 tRNA
or, if this column is at a variable position in each file:
awk 'BEGIN{OFS="\t"}
FNR==1{
    # locate the XF:Z: column on the first line of each file
    for(i=1;i<=NF;i++){
        if($i ~ /^XF:Z:/) break
    }
}
{n=split($i,a,":"); print $1, a[n]}' file.sam

awk all lines in the input file whose last field contains at least one uppercase

I'm having a really hard time with a homework assignment here. As due diligence, I spent the last two and a half hours browsing the web and reading awk tutorials to find a solution, hopelessly.
Here is the line that I came up with:
awk '/[A-Z]/ {print $NF}' < tweedle.txt (Tweedle.txt is a poem)
current output:
Tweedledee
Tweedledee
battle;
Tweedledee
rattle.
crow,
tar-barrel;
so,
quarrel.
expected output - last fields that start with an uppercase letter:
Tweedledee
Tweedledee
Tweedledee
My current command just prints the last fields, whether they start with upper or lower case.
Need your expertise guys. Thanks in advance
Like this:
awk '$NF ~ /^[A-Z]/{print $NF}' tweedle.txt
Here you go:
awk '$NF ~ /[A-Z]/' < tweedle.txt
This reads: accept all lines whose NFth field matches the regex /[A-Z]/. Awk's default action is to just print the line, and that is what I am assuming you want to do.
And if you want to print just the last field (your question does not make this very clear) of all lines whose last field contains at least one uppercase,
awk '$NF ~ /[A-Z]/ {print $NF}' < tweedle.txt
By the way, here's how I tested this:
faiz@strange-love:/tmp$ cat tweedle.txt
a b aBoo
c D
a x y
G
j h g
faiz@strange-love:/tmp$ awk '$NF ~ /[A-Z]/ {print $NF}' tweedle.txt
aBoo
D
G
faiz@strange-love:/tmp$ awk '$NF ~ /[A-Z]/' tweedle.txt
a b aBoo
c D
G
Providing something like this would really give us a far better understanding of your problem.
Without seeing your input file, there are many possible solutions:
awk '$NF~/^[A-Z]/ && $0=$NF' file
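(A side note on the $0=$NF trick: the value of the assignment doubles as the condition, so a last field of 0 or an empty string would make it false; that cannot happen here, because the first test already requires the field to start with A-Z.)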
or (from your "current output", this line works too):
awk '/[a-z]$/' file

awk and log2 divisions

I have a tab delimited file that looks something like this:
foo 0 4
boo 3 2
blah 4 0
flah 1 1
I am trying to calculate the log2 ratio between the two columns for each row. My problem is division by zero.
What I have tried is this:
cat file.txt | awk -v OFS='\t' '{print $1, log($3/$2)/log(2)}'
When there is a zero in the denominator, awk crashes. What I want is some sort of conditional statement that prints "inf" as the result when the denominator equals 0.
I am really not sure how to go about this.
Any help would be appreciated
Thanks
You can implement that as follows (with a few additional tweaks):
awk 'BEGIN{OFS="\t"} {if ($2==0) {print $1, "inf"} else {print $1, log($3/$2)/log(2)}}' file.txt
Explanation:
if ($2==0) {print $1, "inf"} else {...} - First check to see if the 2nd field ($2) is zero. If so, print $1 and inf and move on to the next line; otherwise proceed as usual.
BEGIN{OFS="\t"} - Set OFS inside the awk script; mostly a preference thing.
... file.txt - awk can read from files when you specify it as an argument; this saves the use of a cat process. (See UUCA)
awk -F'\t' '{print $1, ($2 ? log($3/$2)/log(2) : "inf")}' file.txt
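Note that a zero in $3 is also problematic, since log(0) is undefined; a sketch that guards both fields, assuming "-inf" is an acceptable stand-in for that case:
awk 'BEGIN{FS=OFS="\t"} {print $1, ($2 == 0 ? "inf" : ($3 == 0 ? "-inf" : log($3/$2)/log(2)))}' file.txt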