awk conditional statement based on a value between colon - awk
I was just introduced to awk and I'm trying to retrieve rows from my file based on the value in column 10.
I need to filter the data based on the third value within column 10 (the last column), where ":" is used as the separator.
Here is an example of the data in column 10: 0/1:1,9:10:15:337,0,15.
I was able to extract the third value using this command:

awk '{print $10}' file.txt | awk -F ":" '/1/ {print $3}'

This returns the value 10, but how can I return the whole rows (not just the value in column 10) if this third value is less than or greater than a specific number?
I tried this:

awk '{if($10 -F ":" "/1/ ($3<10))" print $0;}' file.txt

but it returns a syntax error.
Thanks!
Your code:
awk '{print $10}' file.txt | awk -F ":" '/1/ {print $3}'
should be just 1 awk script:
awk '$10 ~ /1/ { split($10,f,/:/); print f[3] }' file.txt
but I'm not sure that code is doing what you think it does. If you want to print the 3rd value of all $10s that contain :s, as it sounds like from your text, that'd be:
awk 'split($10,f,/:/) > 1 { print f[3] }' file.txt
and to print the rows where that value is less than 7 would be:
awk '(split($10,f,/:/) > 1) && (f[3] < 7)' file.txt
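For instance, to apply the cutoff from the question itself (print whole rows whose third :-separated value in $10 is below 10) — a minimal sketch, assuming whitespace-separated columns as in the sample:

awk '(split($10,f,/:/) > 1) && (f[3] < 10)' file.txt

Note that in the example value 0/1:1,9:10:15:337,0,15 the third value is exactly 10, so that row only prints if you relax the test to f[3] <= 10.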
Related
selecting columns in awk discarding corresponding header
How to properly select columns in awk after some processing. My file here:

$ cat foo
A;B;C
9;6;7
8;5;4
1;2;3

I want to add a first column with line numbers and then extract some columns of the result. For the example let's get the new first (line numbers) and third columns. This way:

awk -F';' 'FNR==1{print "linenumber;"$0;next} {print FNR-1,$1,$3}' foo

gives me this unexpected output:

linenumber;A;B;C
1 9 7
2 8 4
3 1 3

but expected is (note B is now the third column as we added linenumber as first):

linenumber;B
1;6
2;5
3;2
To get your expected output, use:

$ awk 'BEGIN { FS=OFS=";" } { print (FNR==1?"linenumber":FNR-1),$(FNR==1?3:1) }' file

Output:

linenumber;C
1;9
2;8
3;1

To add a column with line number and extract first and last columns, use:

$ awk 'BEGIN { FS=OFS=";" } { print (FNR==1?"linenumber":FNR-1),$1,$NF }' file

Output this time:

linenumber;A;C
1;9;7
2;8;4
3;1;3
Why do you print $0 (the complete record) in your header? And, if you want only two columns in your output, why do you print three things (FNR-1, $1 and $3)? Finally, the reason why your output field separators are spaces instead of the expected ; is simply that... you did not specify the output field separator (OFS). You can do this with a command line variable assignment (OFS=\;), as shown in the second and third versions below, but also using the -v option (-v OFS=\;) or in a BEGIN block (BEGIN {OFS=";"}) as you wish (there are differences between these 3 methods but they don't matter here).

[EDIT]: see a generic solution at the end.

If the field you want to keep is the second of the input file (the B column), try:

$ awk -F\; 'FNR==1 {print "linenumber;" $2; next} {print FNR-1 ";" $2}' foo
linenumber;B
1;6
2;5
3;2

or

$ awk -F\; 'FNR==1 {print "linenumber",$2; next} {print FNR-1,$2}' OFS=\; foo
linenumber;B
1;6
2;5
3;2

Note that, as long as you don't want to keep the first field of the input file ($1), you could as well overwrite it with the line number:

$ awk -F\; '{$1=FNR==1?"linenumber":FNR-1; print $1,$2}' OFS=\; foo
linenumber;B
1;6
2;5
3;2

Finally, here is a more generic solution to which you can pass the list of indexes of the columns of the input file you want to print (1 and 3 in this example):

$ awk -F\; -v cols='1;3' '
BEGIN { OFS = ";"; n = split(cols, c); }
{
  printf("%s", FNR == 1 ? "linenumber" : FNR - 1);
  for(i = 1; i <= n; i++) printf("%s", OFS $(c[i]));
  printf("\n");
}' foo
linenumber;A;C
1;9;7
2;8;4
3;1;3
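For instance, to reproduce the linenumber;B output the question expected, you could pass cols='2' to the generic version (a quick check, assuming the same foo file as above):

$ awk -F\; -v cols='2' '
BEGIN { OFS = ";"; n = split(cols, c); }
{
  printf("%s", FNR == 1 ? "linenumber" : FNR - 1);
  for(i = 1; i <= n; i++) printf("%s", OFS $(c[i]));
  printf("\n");
}' foo
linenumber;B
1;6
2;5
3;2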
linux csv file concatenate columns into one column
I've been looking to do this with sed, awk, or cut. I am willing to use any other command-line program that I can pipe data through. I have a large set of data that is comma delimited. The rows have between 14 and 20 columns. I need to recursively concatenate column 10 with column 11 per row such that every row has exactly 14 columns. In other words, this:

a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p

will become:

a,b,c,d,e,f,g,h,i,jkl,m,n,o,p

I can get the first 10 columns. I can get the last N columns. I can concatenate columns. I cannot think of how to do it in one line so I can pass a stream of endless data through it and end up with exactly 14 columns per row. Examples (by request):

How many columns are in the row?
sed 's/[^,]//g' | wc -c

Get the first 10 columns:
cut -d, -f1-10

Get the last 4 columns:
rev | cut -d, -f1-4 | rev

Concatenate columns 10 and 11, showing columns 1-10 after that:
awk -F',' ' NF { print $1","$2","$3","$4","$5","$6","$7","$8","$9","$10$11}'
Awk solution:

awk 'BEGIN{ FS=OFS="," }
{
  diff = NF - 14;
  for (i=1; i <= NF; i++)
    printf "%s%s", $i, (diff >= 1 && i >= 10 && i < (10+diff) ? "" : (i == NF ? ORS : ","))
}' file

Here diff is the number of surplus columns; suppressing the separator after fields 10 through 10+diff-1 glues them onto the following field, leaving exactly 14 columns.

The output:

a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
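As a quick check on a longer row (18 fields, so diff is 4 and fields 10 through 14 should collapse into one):

$ echo 'a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r' | awk 'BEGIN{ FS=OFS="," } { diff = NF - 14; for (i=1; i <= NF; i++) printf "%s%s", $i, (diff >= 1 && i >= 10 && i < (10+diff) ? "" : (i == NF ? ORS : ",")) }'
a,b,c,d,e,f,g,h,i,jklmn,o,p,q,r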
With GNU awk for the 3rd arg to match() and gensub():

$ cat tst.awk
BEGIN{ FS="," }
match($0,"(([^,]+,){9})(([^,]+,){"NF-14"})(.*)",a) {
    $0 = a[1] gensub(/,/,"","g",a[3]) a[5]
}
{ print }

$ awk -f tst.awk file
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
If perl is okay - it can be used just like awk for stream processing:

$ cat ip.txt
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
1,2,3,4,5,6,3,4,2,4,3,4,3,2,5,2,3,4
1,2,3,4,5,6,3,4,2,4,a,s,f,e,3,4,3,2,5,2,3,4

$ awk -F, '{print NF}' ip.txt
16
18
22

$ perl -F, -lane '$n = $#F - 4;
        print join ",", (@F[0..8], join("", @F[9..$n]), @F[$n+1..$#F])
       ' ip.txt
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
1,2,3,4,5,6,3,4,2,43432,5,2,3,4
1,2,3,4,5,6,3,4,2,4asfe3432,5,2,3,4

-F, -lane split on , and save the results in the @F array
$n = $#F - 4 magic number, to ensure output ends with 14 columns. $#F gives the index of the last element of the array (won't work if an input line has fewer than 14 columns)
join helps to stitch array elements together with the specified string
@F[0..8] array slice with the first 9 elements
@F[9..$n] and @F[$n+1..$#F] the other slices as needed

Borrowing from Ed Morton's regex based solution:

$ perl -F, -lape '$n=$#F-13; s/^([^,]*,){9}\K([^,]*,){$n}/$&=~tr|,||dr/e' ip.txt
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
1,2,3,4,5,6,3,4,2,43432,5,2,3,4
1,2,3,4,5,6,3,4,2,4asfe3432,5,2,3,4

$n=$#F-13 magic number
^([^,]*,){9}\K matches the first 9 fields
([^,]*,){$n} the fields to change
$&=~tr|,||dr uses tr to delete the commas
e this modifier allows use of Perl code in the replacement section

This solution also has the added advantage of working even if an input line has fewer than 14 fields.
You can try this gnu sed:

sed -E '
  s/,/\n/9g
  :A
  s/([^\n]*\n)(.*)(\n)(([^\n]*\n){4})/\1\2\4/
  tA
  s/\n/,/g
' infile
First variant - with awk

awk -F, '
{
    for(i = 1; i <= NF; i++) {
        OFS = (i > 9 && i < NF - 4) ? "" : ","
        if(i == NF) OFS = "\n"
        printf "%s%s", $i, OFS
    }
}' input.txt

Second variant - with sed

sed -r 's/,/#/10g; :l; s/#(.*)((#[^#]){4})/\1\2/; tl; s/#/,/g' input.txt

or, more straightforwardly (without a loop) and probably faster:

sed -r 's/,(.),(.),(.),(.)$/#\1#\2#\3#\4/; s/,//10g; s/#/,/g' input.txt

Testing

Input

a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u

Output

a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
a,b,c,d,e,f,g,h,i,jklmn,o,p,q,r
a,b,c,d,e,f,g,h,i,jklmnopq,r,s,t,u
Solved a similar problem using csvtool. Source file, copied from one of the other answers:

$ cat input.txt
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
1,2,3,4,5,6,3,4,2,4,3,4,3,2,5,2,3,4
1,2,3,4,5,6,3,4,2,4,a,s,f,e,3,4,3,2,5,2,3,4

Concatenating columns:

$ cat input.txt | csvtool format '%1,%2,%3,%4,%5,%6,%7,%8,%9,%10%11%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22\n' -
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p,,,,,,
1,2,3,4,5,6,3,4,2,434,3,2,5,2,3,4,,,,
1,2,3,4,5,6,3,4,2,4as,f,e,3,4,3,2,5,2,3,4
awk: print each column of a file into separate files
I have a file with 100 columns of data. I want to print the first column and the i-th column in 99 separate files. I am trying to use

for i in {2..99}; do awk '{print $1" " $i }' input.txt > data${i}; done

But I am getting errors:

awk: illegal field $(), name "i"
 input record number 1, file input.txt
 source line number 1

How do I correctly use $i inside the {print}?
The following single awk command may help here (note the >> and the close() call, so that no more than one output file is open at a time):

awk -v start=2 -v end=99 '{for(i=start;i<=end;i++){print $1,$i >> ("file" i); close("file" i)}}' Input_file
An all awk solution. First test data:

$ cat foo
11 12 13
21 22 23

Then the awk:

$ awk '{for(i=2;i<=NF;i++) print $1,$i > ("data" i)}' foo

and results:

$ ls data*
data2  data3
$ cat data2
11 12
21 22

The for iterates from 2 to the last field. If there are more fields than you desire to process, change the NF to the number you'd like. If, for some reason, a hundred open files would be a problem in your system, you'd need to put the print into a block and add a close call:

$ awk '{for(i=2;i<=NF;i++){f=("data" i); print $1,$i >> f; close(f)}}' foo
If you want to do what you were trying to accomplish:

for i in {2..99}; do
    awk -v x=$i '{print $1" " $x }' input.txt > data${i}
done

Note the -v switch of awk, used to pass variables in; $x is then the i-th column, as defined in your variable x. Note 2: this is not the fastest solution (a single awk call is fastest), but I just wanted to correct your logic. Ideally, take the time to understand awk; it's never time wasted.
awk ternary operator, count fs with ,
How do I make this command line:

awk -F "," '{NF>0?$NF:$0}'

print the last field of a line if NF>0, and otherwise print the whole line?

Working data:

bogota
dept math, bogota
awk -F, '{ print ( NF ? $NF : $0 ) }' file
Actually, you don't need a ternary operator for this; just use:

awk -F, '{print $NF}' file

This will print the last field: if there is more than one field, it prints the last one, and if the line has only one field, it prints that field (the whole line).
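A quick check on the sample data (using printf here just to stand in for the file) — the ternary version above behaves identically on this input:

$ printf 'bogota\ndept math, bogota\n' | awk -F, '{print $NF}'
bogota
 bogota

The leading space on the second output line is part of the last field (" bogota"), since the text after the comma starts with a space.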
awk and log2 divisions
I have a tab delimited file that looks something like this:

foo 0 4
boo 3 2
blah 4 0
flah 1 1

I am trying to calculate the log2 ratio between the two numeric columns for each row. My problem is with division by zero. What I have tried is this:

cat file.txt | awk -v OFS='\t' '{print $1, log($3/$2)/log(2)}'

When there is a zero as the denominator, awk will crash. What I would want to do is some sort of conditional statement that would print "inf" as the result when the denominator is equal to 0.
I am really not sure how to go about this. Any help would be appreciated. Thanks
You can implement that as follows (with a few additional tweaks):

awk 'BEGIN{OFS="\t"} {if ($2==0) {print $1, "inf"} else {print $1, log($3/$2)/log(2)}}' file.txt

Explanation:

if ($2==0) {print $1, "inf"} else {...} - First check to see if the 2nd field ($2) is zero. If so, print $1 and inf and move on to the next line; otherwise proceed as usual.
BEGIN{OFS="\t"} - Set OFS inside the awk script; mostly a preference thing.
... file.txt - awk can read from files when you specify one as an argument; this saves the use of a cat process (see UUCA, the "useless use of cat award").
awk -F'\t' '{print $1,($2 ? log($3/$2)/log(2) : "inf")}' file.txt
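As a quick sanity check of that one-liner on the first and last rows of the sample data (a sketch; exact numeric formatting may vary between awk implementations):

$ printf 'foo\t0\t4\nflah\t1\t1\n' | awk -F'\t' '{print $1,($2 ? log($3/$2)/log(2) : "inf")}'
foo inf
flah 0

Note that a row whose third field is 0 (like blah) still evaluates log(0), which most awks print as -inf rather than crashing; only a zero denominator ($2) is guarded here.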