awk '{print $<number>}' but without knowing <number> beforehand
Printing a specific column per line by piping to awk is fine.
But how do I do it when I don't know the column number beforehand, and all I know is that the target column's first-row value (its header) matches a given string?
Example.
Title1 Title2 TargetTitle Title3
x y z a
b c d e
Given the table above, I want to print only:
z
d
BUT, there are two problems:
1) I don't know the exact column number.
2) I don't want the first row (not a big deal, I can just sed lines 2 to $).
Thanks.
You can build your output using awk like this (it blanks every cell equal to "b" or "d" and skips the header row):

awk -v OFS='\t' 'NR>1 { for (i=1; i<=NF; i++) {
    if ($i=="b" || $i=="d") $i=""
    printf "%s%s", $i, (i==NF ? ORS : OFS) } }' file
x y z a
c e
To filter out one column, you could use something like this:
awk -v title="TargetTitle" 'NR==1 { for (i=1;i<=NF;++i) if ($i==title) col=i }
NR>1 { for (i=1;i<=NF;++i) if (i!=col) printf "%s%s", $i, (i<NF?OFS:ORS)}' file
Output:
x y a
b c e
If you want to add more space between each column in the output, you can change the value of the OFS variable or change the first format specifier from %s to %4s, for example.
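For instance, the %4s variant of the snippet above (the width 4 is arbitrary) right-aligns each field in a four-character slot:

awk -v title="TargetTitle" 'NR==1 { for (i=1;i<=NF;++i) if ($i==title) col=i }
NR>1 { for (i=1;i<=NF;++i) if (i!=col) printf "%4s%s", $i, (i<NF?OFS:ORS) }' file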
If you want to only print one column, you can do something like this:
awk -v title="TargetTitle" 'NR==1 { for (i=1;i<=NF;++i) if ($i==title) col=i }
NR>1 { print $col }' file
Output:
z
d
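One caveat: if the title is never found, col stays unset, and $col then evaluates to $0, printing the whole line. A sketch with a guard (the error message and the /dev/stderr redirection are my additions; /dev/stderr works in gawk and on most modern systems):

awk -v title="TargetTitle" '
NR==1 { for (i=1;i<=NF;++i) if ($i==title) col=i
        if (!col) { print "column not found: " title > "/dev/stderr"; exit 1 } }
NR>1  { print $col }' file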
Related
selecting columns in awk discarding corresponding header
How to properly select columns in awk after some processing. My file here:

$ cat foo
A;B;C
9;6;7
8;5;4
1;2;3

I want to add a first column with line numbers and then extract some columns of the result. For the example, let's get the new first (line numbers) and third columns. This way:

awk -F';' 'FNR==1{print "linenumber;"$0;next} {print FNR-1,$1,$3}' foo

gives me this unexpected output:

linenumber;A;B;C
1 9 7
2 8 4
3 1 3

but expected is (note B is now the third column, as we added linenumber as the first):

linenumber;B
1;6
2;5
3;2
To get your expected output, use:

$ awk 'BEGIN { FS=OFS=";" } { print (FNR==1?"linenumber":FNR-1),$(FNR==1?3:1) }' file

Output:

linenumber;C
1;9
2;8
3;1

To add a column with the line number and extract the first and last columns, use:

$ awk 'BEGIN { FS=OFS=";" } { print (FNR==1?"linenumber":FNR-1),$1,$NF }' file

Output this time:

linenumber;A;C
1;9;7
2;8;4
3;1;3
Why do you print $0 (the complete record) in your header? And, if you want only two columns in your output, why do you print 3 values (FNR-1, $1 and $3)? Finally, the reason why your output field separators are spaces instead of the expected ; is simply that... you did not specify the output field separator (OFS). You can do this with a command-line variable assignment (OFS=\;), as shown in the second and third versions below, but also using the -v option (-v OFS=\;) or in a BEGIN block (BEGIN {OFS=";"}), as you wish (there are differences between these 3 methods but they don't matter here). [EDIT]: see a generic solution at the end.

If the field you want to keep is the second of the input file (the B column), try:

$ awk -F\; 'FNR==1 {print "linenumber;" $2; next} {print FNR-1 ";" $2}' foo
linenumber;B
1;6
2;5
3;2

or

$ awk -F\; 'FNR==1 {print "linenumber",$2; next} {print FNR-1,$2}' OFS=\; foo
linenumber;B
1;6
2;5
3;2

Note that, as long as you don't want to keep the first field of the input file ($1), you could as well overwrite it with the line number:

$ awk -F\; '{$1=FNR==1?"linenumber":FNR-1; print $1,$2}' OFS=\; foo
linenumber;B
1;6
2;5
3;2

Finally, here is a more generic solution to which you can pass the list of indexes of the columns of the input file you want to print (1 and 3 in this example):

$ awk -F\; -v cols='1;3' '
BEGIN { OFS = ";"; n = split(cols, c); }
{
  printf("%s", FNR == 1 ? "linenumber" : FNR - 1);
  for(i = 1; i <= n; i++) printf("%s", OFS $(c[i]));
  printf("\n");
}' foo
linenumber;A;C
1;9;7
2;8;4
3;1;3
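In the same generic spirit, here is an untested sketch of mine that selects columns by header name instead of index (it assumes every requested name actually appears in the header row):

$ awk -F\; -v cols='A;C' '
BEGIN { OFS = ";"; n = split(cols, want) }
FNR == 1 { for (i = 1; i <= NF; i++) idx[$i] = i }   # map header name -> column index
{
  printf("%s", FNR == 1 ? "linenumber" : FNR - 1)
  for (i = 1; i <= n; i++) printf("%s", OFS $(idx[want[i]]))
  printf("\n")
}' foo

This should reproduce the linenumber;A;C output shown just above.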
linux csv file concatenate columns into one column
I've been looking to do this with sed, awk, or cut. I am willing to use any other command-line program that I can pipe data through. I have a large set of data that is comma delimited. The rows have between 14 and 20 columns. I need to recursively concatenate column 10 with column 11 per row such that every row has exactly 14 columns. In other words, this:

a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p

will become:

a,b,c,d,e,f,g,h,i,jkl,m,n,o,p

I can get the first 10 columns. I can get the last N columns. I can concatenate columns. I cannot think of how to do it in one line so I can pass a stream of endless data through it and end up with exactly 14 columns per row. Examples (by request):

How many columns are in the row?

sed 's/[^,]//g' | wc -c

Get the first 10 columns:

cut -d, -f1-10

Get the last 4 columns:

rev | cut -d, -f1-4 | rev

Concatenate columns 10 and 11, showing columns 1-10 after that:

awk -F',' 'NF { print $1","$2","$3","$4","$5","$6","$7","$8","$9","$10$11 }'
Awk solution:

awk 'BEGIN { FS=OFS="," }
{
  diff = NF - 14                    # how many surplus fields must be absorbed
  for (i=1; i <= NF; i++)
    # print no separator after fields 10 .. 10+diff-1, so they run together
    printf "%s%s", $i, (diff >= 1 && i >= 10 && i < (10+diff) ? "" : (i == NF ? ORS : ","))
}' file

The output:

a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
With GNU awk for the 3rd arg to match() and gensub():

$ cat tst.awk
BEGIN { FS="," }
# a[1] = the first 9 fields (with trailing commas), a[3] = the NF-14
# surplus fields to merge, a[5] = the remainder of the record
match($0,"(([^,]+,){9})(([^,]+,){"NF-14"})(.*)",a) {
    $0 = a[1] gensub(/,/,"","g",a[3]) a[5]   # delete the commas inside a[3]
}
{ print }

$ awk -f tst.awk file
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
If perl is okay - it can be used just like awk for stream processing:

$ cat ip.txt
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
1,2,3,4,5,6,3,4,2,4,3,4,3,2,5,2,3,4
1,2,3,4,5,6,3,4,2,4,a,s,f,e,3,4,3,2,5,2,3,4

$ awk -F, '{print NF}' ip.txt
16
18
22

$ perl -F, -lane '$n = $#F - 4;
    print join ",", (@F[0..8], join("", @F[9..$n]), @F[$n+1..$#F])' ip.txt
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
1,2,3,4,5,6,3,4,2,43432,5,2,3,4
1,2,3,4,5,6,3,4,2,4asfe3432,5,2,3,4

-F, -lane : split on , with the results saved in the @F array
$n = $#F - 4 : magic number to ensure the output ends with 14 columns; $#F gives the index of the last element of the array (won't work if an input line has fewer than 14 columns)
join stitches array elements together with the specified string
@F[0..8] : array slice with the first 9 elements
@F[9..$n] and @F[$n+1..$#F] : the other slices as needed

Borrowing from Ed Morton's regex-based solution:

$ perl -F, -lape '$n=$#F-13; s/^([^,]*,){9}\K([^,]*,){$n}/$&=~tr|,||dr/e' ip.txt
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
1,2,3,4,5,6,3,4,2,43432,5,2,3,4
1,2,3,4,5,6,3,4,2,4asfe3432,5,2,3,4

$n=$#F-13 : magic number
^([^,]*,){9}\K : skip the first 9 fields
([^,]*,){$n} : the fields to change
$&=~tr|,||dr : use tr to delete the commas
e : this modifier allows use of Perl code in the replacement section

This solution also has the added advantage of working even if an input line has fewer than 14 fields.
You can try this GNU sed:

sed -E '
  # turn every comma from the 9th one on into a newline
  s/,/\n/9g
  :A
  # delete one inner newline, keeping the first one and the last four
  s/([^\n]*\n)(.*)(\n)(([^\n]*\n){4})/\1\2\4/
  tA
  # once only 5 newlines remain, turn them back into commas
  s/\n/,/g
' infile
First variant - with awk:

awk -F, '
{
  for(i = 1; i <= NF; i++) {
    OFS = (i > 9 && i < NF - 4) ? "" : ","   # empty separator merges the surplus fields
    if(i == NF) OFS = "\n"
    printf "%s%s", $i, OFS
  }
}' input.txt

Second variant - with sed:

sed -r 's/,/#/10g; :l; s/#(.*)((#[^#]*){4})/\1\2/; tl; s/#/,/g' input.txt

or, more straightforwardly (without a loop) and probably faster:

sed -r 's/,([^,]*),([^,]*),([^,]*),([^,]*)$/#\1#\2#\3#\4/; s/,//10g; s/#/,/g' input.txt

Testing

Input:

a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u

Output:

a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
a,b,c,d,e,f,g,h,i,jklmn,o,p,q,r
a,b,c,d,e,f,g,h,i,jklmnopq,r,s,t,u
Solved a similar problem using csvtool. Source file, copied from one of the other answers:

$ cat input.txt
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
1,2,3,4,5,6,3,4,2,4,3,4,3,2,5,2,3,4
1,2,3,4,5,6,3,4,2,4,a,s,f,e,3,4,3,2,5,2,3,4

Concatenating columns:

$ cat input.txt | csvtool format '%1,%2,%3,%4,%5,%6,%7,%8,%9,%10%11%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22\n' -
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p,,,,,,
1,2,3,4,5,6,3,4,2,434,3,2,5,2,3,4,,,,
1,2,3,4,5,6,3,4,2,4as,f,e,3,4,3,2,5,2,3,4
Print the 1st and every nth column of a text file using awk
I have a txt file that contains a total of 10177 columns and approximately 450,000 rows. The information is separated by tabs. I am trying to trim the file down using awk so that it only prints columns 1-3, the 5th column, and every 14th column after the fifth one. My file has a format that looks like:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ... 10177
A B C D E F G H I J K L M N O P Q R S T ...
X Y X Y X Y X Y X Y X Y X Y X Y X Y X Y ...

I am hoping to generate an output txt file (also tab-separated) that contains:

1 2 3 5 18 ...
A B C E R ...
X Y X X Y ...

The current awk code I have looks like this (I am using cygwin to run it):

$ awk -F"\t" '{OFS="\t"} { for(i=5;i<10177;i+=14) printf ($i) }' test2.txt > test3.txt

But the result I am getting shows something like:

123518...ABCER...XYXXY...

When opened with the Excel program, the results are all mashed into 1 single cell. In addition, when I try to include for (i=0;i<=3;i++) printf "%s ",$i in the awk to get the first 3 columns, it just prints out the original input document together with the mashed result. I am not familiar with awk, so I am not sure what causes this issue.
Awk field numbers, strings, and array indices all start at 1, not 0, so when you do:

for (i=0;i<=3;i++) printf "%s ",$i

the first iteration prints $0, which is the whole record.

You're on the right track with:

$ awk -F"\t" '{OFS="\t"} { for(i=5;i<10177;i+=14) printf ($i) }' test2.txt > test3.txt

but never do printf with input data as the only argument to printf, since then printf will treat it as a format string without data (rather than what you want, which is a plain string format with your data), and that will fail cryptically if/when your input data contains formatting characters like %s or %d. So, always use printf "%s", $i, never printf $i.

The problem you're having with Excel, I would guess, is that you're trying to double-click on the file and hoping Excel knows what to do with it (it won't, unlike if this was a CSV). You can import tab-separated files into Excel after it's opened, though - google that.

You want something like:

awk '
BEGIN { FS=OFS="\t" }
{
    for (i=1; i<=3; i++) {
        printf "%s%s", (i>1?OFS:""), $i
    }
    for (i=5; i<=NF; i+=14) {
        printf "%s%s", OFS, $i
    }
    print ""
}
' file

I highly recommend the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
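A quick way to see that printf pitfall for yourself (a contrived one-liner of mine; the %s embedded in the data is the point):

$ echo '50%sec' | awk '{printf $1}'

Here the data 50%sec is used as the format string, so awk tries to consume a string argument for the %s that isn't there; gawk aborts with an error like "not enough arguments to satisfy format string" (other awks may print garbage), while printf "%s", $1 would simply print 50%sec.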
In awk, using the conditional operator in for:

$ awk 'BEGIN { FS=OFS="\t" }
{
    for(i=1; i<=NF; i+=( i<3 ? 1 : ( i==3 ? 2 : 14 )))
        printf "%s%s", $i, ( (i+14)>NF ? ORS : OFS)
}' file
1 2 3 5 19
A B C E S
X Y X X X

In the for, if i<3, increment by one; if i==3, increment by two to get to 5; and after that, increment by 14.
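To sanity-check the column arithmetic on a toy file (my own test scaffolding; paste -s joins the lines from seq into one tab-separated row):

$ seq 1 20 | paste -s - > toy.txt
$ awk 'BEGIN { FS=OFS="\t" } { for(i=1; i<=NF; i+=(i<3?1:(i==3?2:14))) printf "%s%s", $i, ((i+14)>NF?ORS:OFS) }' toy.txt
1 2 3 5 19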
I would be tempted to solve the problem along the following lines. I think you'll find you save time by not iterating in awk.

$ cols="$( { seq 1 3; seq 5 14 10177; } | sed 's/^/$/' | paste -sd, - )"
$ awk -F'\t' -v OFS='\t' "{print $cols}" test.txt

Here cols expands to $1,$2,$3,$5,$19,$33,... so the shell assembles the entire print statement before awk ever runs.
Given 2 or more TSV files and only a terminal, how can you calculate the average difference between the values whose dates appear in both?
Let's say we have some files:

File 1 (date, value):

20130510\t50000
20130520\t3400
20130601\t4500

File 2 (date, something, value):

20130511\tx\t123
20130520\ty\t456
20130601\tz\t789

We want the average of the difference in values associated with the dates that appear in both files.

20130520 and 20130601 appear in both (need some kind of filter)
the difference in values is abs(3400-456) and abs(4500-789)
the average is (abs(3400-456)+abs(4500-789))/2.0

I can easily do this in Python, but how about with awk in the terminal?
You could try:

awk -f a.awk file1 file2

where a.awk is:

BEGIN { FS="\t" }
NR==FNR { x[$1]=$2; next }
$1 in x { y[$1]=$3 }
END {
    for (i in y) {
        s=s+abs(x[i]-y[i])
        j++
    }
    print s/j
}
function abs(x){ return ((x < 0.0) ? -x : x) }

Output:

3327.5
Using awk:

awk 'NR==FNR { a[$1]=$2; next }
$1 in a { s+=sqrt((a[$1]-$3)*(a[$1]-$3)); i++ }
END { print s/i }' file1 file2

Explanation

There is no need to set FS to "\t": the default field separator (whitespace) already covers tabs.
sqrt((x-y)*(x-y)) is an easy stand-in for an abs() function.
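For comparison, a sketch of a join-based pipeline (my own variant; it assumes bash for the <(...) process substitution and the $'\t' quoting):

$ join -t $'\t' -o 1.2,2.3 <(sort file1) <(sort file2) |
  awk '{ d=$1-$2; s += (d<0 ? -d : d); n++ } END { print s/n }'
3327.5

join keeps only the dates present in both files; the awk stage then averages the absolute differences.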
sum same column across multiple files using awk?
I want to add the 3rd column of 5 files such that the new file will have the same 2nd column and the sum of the 3rd columns of the files. I tried something like this:

$ cat freqdat044.dat | awk '{n=$3; getline <"freqdat046.dat"; print $2" " n+$3}' > freqtrial1.dat

The file names:

freqdat044.dat
freqdat045.dat
freqdat046.dat
freqdat047.dat
freqdat049.dat
freqdat050.dat

The output file should contain only $2 and the new column formed by summing the 3rd columns.
awk '{x[$2] += $3} END {for(y in x) print y, x[y]}' freqdat044.dat freqdat045.dat freqdat046.dat freqdat047.dat freqdat049.dat freqdat050.dat

This does not necessarily print lines in the order they appear in the first file. If you want to preserve that ordering, then you have to save the ordering somewhere:

awk 'FNR==NR { keys[FNR]=$2; cnt=FNR }
{ x[$2] += $3 }
END { for (i=1; i<=cnt; ++i) print keys[i], x[keys[i]] }' freqdat044.dat freqdat045.dat freqdat046.dat freqdat047.dat freqdat049.dat freqdat050.dat
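One caveat with the order-preserving version: keys[] only records $2 values seen in the first file, so a $2 that first appears in a later file is summed into x[] but never printed. A sketch of my own variation that keeps first-seen order across all files:

awk '!($2 in x) { keys[++cnt]=$2 }   # remember each key the first time it appears
{ x[$2] += $3 }
END { for (i=1; i<=cnt; ++i) print keys[i], x[keys[i]] }' freqdat044.dat freqdat045.dat freqdat046.dat freqdat047.dat freqdat049.dat freqdat050.dat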