Add new lines up to specific row - awk
Hoping somebody can help me out.
I have a large number of files, each with a different number of lines.
I would like to pad the files with empty lines up to a specific row count, say 6.
Infile.txt
text1
text2
text3
The output file I would like to have is
Outfile.txt
text1
text2
text3
\n
\n
\n
Short awk solution:
awk -v r=6 'END{ while((r--)-NR>0) print "" }1' file
-v r=6 - variable r holding the total/maximum number of rows
In awk's END block, the built-in variable NR will contain the row number of the last line of the file. From there it's easy to print the needed number of additional empty rows.
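As a quick check, using seq to stand in for a 3-line file and wc -l to count the padded result:

$ seq 3 | awk -v r=6 'END{ while((r--)-NR>0) print "" }1' | wc -l
6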
$ awk -v lines=6 '1; END {for (i=NR; i<lines; ++i) print ""}' file
text1
text2
text3
$ awk -v lines=6 '1; END {for (i=NR; i<lines; ++i) print ""}' file | wc -l
6
IMHO the clearest and most obvious way to handle this is to simply loop from the last line number plus 1 to the target number of lines:
$ seq 3 | awk -v n=6 '{print} END{for (i=NR+1; i<=n; i++) print ""}'
1
2
3
$
You can also count down if you want to save a variable:
$ seq 3 | awk -v n=6 '{print} END{while (n-- > NR) print ""}'
1
2
3
$
but IMHO that's sacrificing clarity in favor of brevity and not worthwhile.
Related
Sort a file preserving the header in first position with bash
When sorting a file, I am not preserving the header in its position:

file_1.tsv
Gene Number
a 3
u 7
b 9

sort -k1,1 file_1.tsv

Result:

a 3
b 9
Gene Number
u 7

So I am trying this code:

sed '1d' file_1.tsv | sort -k1,1 > file_1_sorted.tsv
first='head -1 file_1.tsv'
sed '1 "$first"' file_1_sorted.tsv

What I did was remove the header and sort the rest of the file, and then try to add the header back again. But I am not able to perform this last part, so I would like to know how I can copy the header of the original file and insert it as the first row of the new file without substituting its actual first row.
You can do this as well:

{ head -1; sort; } < file_1.tsv

Update - for macOS:

{ IFS= read -r header; printf '%s\n' "$header"; sort; } < file_1.tsv
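A quick sanity check of the read variant, feeding the question's sample rows through a pipe (built here with printf, since a pipe is where head tends to misbehave):

$ printf 'Gene Number\nu 7\na 3\nb 9\n' | { IFS= read -r header; printf '%s\n' "$header"; sort; }
Gene Number
a 3
b 9
u 7

The read builtin matters on a pipe because head may pull in a whole buffer and discard lines that sort then never sees, while read consumes exactly one line.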
A simpler awk:

$ awk 'NR==1{print; next} {print | "sort"}' file
$ head -1 file; tail -n +2 file | sort

Output:

Gene Number
a 3
b 9
u 7
Could you please try the following.

awk '
FNR==1{
  first=$0
  next
}
{
  val=(val?val ORS:"")$0
}
END{
  print first
  print val | "sort"
}
' Input_file

Logical explanation: the condition FNR==1 checks whether this is the first line; if so, save its value to a variable and move on to the next line with next. Every other line is appended, newline-separated, to another variable, until the last line. Control then reaches the END block, which executes once the Input_file has been read completely: print the first line's value there, then pipe the rest of the accumulated lines through the sort command.
This will work using any awk, sort, and cut in any shell on every UNIX box, works whether the input is coming from a pipe (when you can't read it twice) or from a file (when you can), and doesn't involve awk spawning a subshell:

awk -v OFS='\t' '{print (NR>1), $0}' file | sort -k1,1n -k2,2 | cut -f2-

The above uses awk to stick a 0 at the front of the header line and a 1 in front of the rest so you can sort by that number, then by whatever other field(s) you want to sort on, and then remove the added field again with cut. Here it is in stages:

$ awk -v OFS='\t' '{print (NR>1), $0}' file
0 Gene Number
1 a 3
1 u 7
1 b 9

$ awk -v OFS='\t' '{print (NR>1), $0}' file | sort -k1,1n -k2,2
0 Gene Number
1 a 3
1 b 9
1 u 7

$ awk -v OFS='\t' '{print (NR>1), $0}' file | sort -k1,1n -k2,2 | cut -f2-
Gene Number
a 3
b 9
u 7
linux csv file concatenate columns into one column
I've been looking to do this with sed, awk, or cut. I am willing to use any other command-line program that I can pipe data through.

I have a large set of data that is comma delimited. The rows have between 14 and 20 columns. I need to recursively concatenate column 10 with column 11 per row such that every row has exactly 14 columns. In other words, this:

a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p

will become:

a,b,c,d,e,f,g,h,i,jkl,m,n,o,p

I can get the first 10 columns. I can get the last N columns. I can concatenate columns. I cannot think of how to do it in one line so I can pass a stream of endless data through it and end up with exactly 14 columns per row.

Examples (by request):

How many columns are in the row?
sed 's/[^,]//g' | wc -c

Get the first 10 columns:
cut -d, -f1-10

Get the last 4 columns:
rev | cut -d, -f1-4 | rev

Concatenate columns 10 and 11, showing columns 1-10 after that:
awk -F',' ' NF { print $1","$2","$3","$4","$5","$6","$7","$8","$9","$10$11}'
Awk solution:

awk 'BEGIN{ FS=OFS="," }
{
  diff = NF - 14
  for (i=1; i <= NF; i++)
    printf "%s%s", $i, (diff > 1 && i >= 10 && i < (10+diff)? "": (i == NF? ORS : ","))
}' file

The output:

a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
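As a quick check against the longer test rows used further down, the same script run on an 18-column line (letters a through r, built with printf) gives:

$ printf 'a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r\n' | awk 'BEGIN{ FS=OFS="," } { diff = NF - 14; for (i=1; i <= NF; i++) printf "%s%s", $i, (diff > 1 && i >= 10 && i < (10+diff)? "": (i == NF? ORS : ",")) }'
a,b,c,d,e,f,g,h,i,jklmn,o,p,q,r

One caveat: because of the diff > 1 guard, a 15-column row (diff == 1) passes through unmerged; diff >= 1 looks like the intended test.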
With GNU awk for the 3rd arg to match() and gensub():

$ cat tst.awk
BEGIN{ FS="," }
match($0,"(([^,]+,){9})(([^,]+,){"NF-14"})(.*)",a) {
  $0 = a[1] gensub(/,/,"","g",a[3]) a[5]
}
{ print }

$ awk -f tst.awk file
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
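For what it's worth, running the same tst.awk against an 18-column line (again built with printf) merges five fields into one - the NF-14 = 4 comma-terminated fields captured in a[3] plus the field that follows them:

$ printf 'a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r\n' | awk -f tst.awk
a,b,c,d,e,f,g,h,i,jklmn,o,p,q,r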
If perl is okay - it can be used just like awk for stream processing:

$ cat ip.txt
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
1,2,3,4,5,6,3,4,2,4,3,4,3,2,5,2,3,4
1,2,3,4,5,6,3,4,2,4,a,s,f,e,3,4,3,2,5,2,3,4

$ awk -F, '{print NF}' ip.txt
16
18
22

$ perl -F, -lane '$n = $#F - 4; print join ",", (@F[0..8], join("", @F[9..$n]), @F[$n+1..$#F])' ip.txt
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
1,2,3,4,5,6,3,4,2,43432,5,2,3,4
1,2,3,4,5,6,3,4,2,4asfe3432,5,2,3,4

-F, -lane - split on , with results saved in the @F array
$n = $#F - 4 - magic number, to ensure output ends with 14 columns; $#F gives the index of the last element of the array (won't work if an input line has fewer than 14 columns)
join helps to stitch array elements together with the specified string
@F[0..8] - array slice with the first 9 elements
@F[9..$n] and @F[$n+1..$#F] - the other slices as needed

Borrowing from Ed Morton's regex based solution:

$ perl -F, -lape '$n=$#F-13; s/^([^,]*,){9}\K([^,]*,){$n}/$&=~tr|,||dr/e' ip.txt
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
1,2,3,4,5,6,3,4,2,43432,5,2,3,4
1,2,3,4,5,6,3,4,2,4asfe3432,5,2,3,4

$n=$#F-13 - magic number
^([^,]*,){9}\K - first 9 fields
([^,]*,){$n} - fields to change
$&=~tr|,||dr - use tr to delete the commas
e - this modifier allows use of Perl code in the replacement section

This solution also has the added advantage of working even if the input has fewer than 14 columns.
You can try this GNU sed:

sed -E '
  s/,/\n/9g
  :A
  s/([^\n]*\n)(.*)(\n)(([^\n]*\n){4})/\1\2\4/
  tA
  s/\n/,/g
' infile
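For readability, here is a sketch of the same script with each step commented (comments on their own lines, which GNU sed accepts in a multi-line script):

sed -E '
  # split: turn the 9th and later commas into newlines
  s/,/\n/9g
  # loop: delete the newline that still has exactly four
  # newline-terminated fields after it, merging the two
  # fields around it
  :A
  s/([^\n]*\n)(.*)(\n)(([^\n]*\n){4})/\1\2\4/
  tA
  # join: turn the surviving newlines back into commas
  s/\n/,/g
' infile

Because the .* in the loop is greedy, each pass removes the newline closest to the end that still has four newlines after it, so the merge always happens in the middle while the first 9 and last 5 fields keep their boundaries.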
First variant - with awk:

awk -F, '
{
  for(i = 1; i <= NF; i++) {
    OFS = (i > 9 && i < NF - 4) ? "" : ","
    if(i == NF) OFS = "\n"
    printf "%s%s", $i, OFS
  }
}' input.txt

Second variant - with sed:

sed -r 's/,/#/10g; :l; s/#(.*)((#[^#]){4})/\1\2/; tl; s/#/,/g' input.txt

or, more straightforwardly (without a loop) and probably faster:

sed -r 's/,(.),(.),(.),(.)$/#\1#\2#\3#\4/; s/,//10g; s/#/,/g' input.txt

Testing

Input:
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u

Output:
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
a,b,c,d,e,f,g,h,i,jklmn,o,p,q,r
a,b,c,d,e,f,g,h,i,jklmnopq,r,s,t,u
Solved a similar problem using csvtool.

Source file, copied from one of the other answers:

$ cat input.txt
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
1,2,3,4,5,6,3,4,2,4,3,4,3,2,5,2,3,4
1,2,3,4,5,6,3,4,2,4,a,s,f,e,3,4,3,2,5,2,3,4

Concatenating columns:

$ cat input.txt | csvtool format '%1,%2,%3,%4,%5,%6,%7,%8,%9,%10%11%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22\n' -
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p,,,,,,
1,2,3,4,5,6,3,4,2,434,3,2,5,2,3,4,,,,
1,2,3,4,5,6,3,4,2,4as,f,e,3,4,3,2,5,2,3,4

Note that the format string hardcodes which columns are merged, so only the 22-column row comes out with exactly 14 columns; shorter rows get extra trailing empty fields, as visible above.
awk: print each column of a file into separate files
I have a file with 100 columns of data. I want to print the first column and the i-th column into 99 separate files. I am trying to use

for i in {2..99}; do awk '{print $1" " $i }' input.txt > data${i}; done

But I am getting errors:

awk: illegal field $(), name "i"
 input record number 1, file input.txt
 source line number 1

How do I correctly use $i inside {print}?
The following single awk may help you here too. Note the append operator >> together with close(): since each file is closed again right after the print, the truncating > would wipe the file on every reopen and leave only the last line in it.

awk -v start=2 -v end=99 '{for(i=start;i<=end;i++){print $1,$i >> ("file"i); close("file"i)}}' Input_file
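A quick sanity check with a two-line, three-column toy input (file2 and file3 here are just the generated names; remove any stale copies first, since >> appends across runs):

$ printf '11 12 13\n21 22 23\n' | awk -v start=2 -v end=3 '{for(i=start;i<=end;i++){print $1,$i >> ("file"i); close("file"i)}}'
$ cat file2
11 12
21 22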
An all-awk solution. First, test data:

$ cat foo
11 12 13
21 22 23

Then the awk:

$ awk '{for(i=2;i<=NF;i++) print $1,$i > ("data" i)}' foo

and the results:

$ ls data*
data2 data3
$ cat data2
11 12
21 22

The for iterates from 2 to the last field. If there are more fields than you desire to process, change the NF to the number you'd like. If, for some reason, a hundred open files would be a problem on your system, you'd need to put the print into a block and add a close call:

$ awk '{for(i=2;i<=NF;i++){f=("data" i); print $1,$i >> f; close(f)}}' foo
If you want to do what you tried to accomplish:

for i in {2..99}; do
  awk -v x=$i '{print $1" " $x }' input.txt > data${i}
done

Note the -v switch of awk, used to pass shell variables in; $x is then the n-th column held in your variable x. Note 2: this is not the fastest solution - a single awk call is fastest - but it corrects your logic. Ideally, take the time to understand awk; it's never wasted time.
awk to move a file's last line up above the previous line
In the awk below I am trying to move the last line only, to just above the line before it. The problem with the below is that since my input file varies (not always 4 lines like in the below), I cannot use i=3 every time and cannot seem to fix it. Thank you :).

file

this is line 1
this is line 2
this is line 3
this is line 4

desired output

this is line 1
this is line 2
this is line 4
this is line 3

awk attempt (it seems like the last line is being moved, but to position 2, and line 2 is lost):

awk '
{lines[NR]=$0}
END{
  print lines[1], lines[NR];
  for (i=3; i<NR; i++)
    {print lines[i]}
}
' OFS=$'\n' file
this is line 1
this is line 4
this is line 3
$ seq 4 | awk 'NR>2{print p2} {p2=p1; p1=$0} END{print p1 ORS p2}'
1
2
4
3

$ seq 7 | awk 'NR>2{print p2} {p2=p1; p1=$0} END{print p1 ORS p2}'
1
2
3
4
5
7
6
Try the following awk once:

awk '{a[FNR]=$0} END{for(i=1;i<=FNR-2;i++){print a[i]};print a[FNR] ORS a[FNR-1]}' Input_file

Explanation: create an array named a whose index is FNR (the current line number) and whose value is the current line. In the END section of awk, start a for loop from i=1 to i<=FNR-2 - only up to FNR-2 because only the last 2 lines need to be swapped. Once it has printed all those lines, simply print a[FNR] (the last line) and then a[FNR-1], joined with ORS (to print the new line).

Solution 2nd: count the number of lines in the Input_file and put the count into an awk variable:

awk -v lines=$(wc -l < Input_file) 'FNR==(lines-1){val=$0;next} FNR==lines{print $0 ORS val;next} 1' Input_file
You nearly had it. You just have to change the order:

awk '
{lines[NR]=$0}
END{
  for (i=1; i<NR-1; i++)
    {print lines[i]}
  print lines[NR];
  print lines[NR-1];
}
' OFS=$'\n' file
I'd reverse the file, swap the first two lines, then re-reverse the file:

tac file | awk 'NR==1 {getline line2; print line2} 1' | tac
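For example, with seq standing in for a 4-line file:

$ seq 4 | tac | awk 'NR==1 {getline line2; print line2} 1' | tac
1
2
4
3

The awk in the middle prints line 2 first (fetched with getline) and then falls through to the 1, which prints the still-held line 1, so the first two lines of the reversed stream come out swapped.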
Combining columns within a single file using awk
I am trying to reformat a large file. The first 6 columns of each line are OK, but the rest of the columns in the line need to be combined in increments of 2 with a "/" character in between.

Example file (showing only a few columns; the actual file has many more):

1 1 0 0 1 2 A T A C

Into:

1 1 0 0 1 2 A/T A/C

So far I have been trying awk, and this is where I am at:

awk '{print $1,$2,$3,$4,$5; for(i=7; i < NF; i=i+2) print $i+"/"+$i+1}' myfile.txt > mynewfile.txt
awk '{for(i=j=7; i < NF; i+=2) {$j = $i"/"$(i+1); j++} NF=j-1}1' input
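A quick run on the question's sample line (note that POSIX leaves shrinking NF undefined, but gawk and mawk both truncate the record as intended here):

$ echo '1 1 0 0 1 2 A T A C' | awk '{for(i=j=7; i < NF; i+=2) {$j = $i"/"$(i+1); j++} NF=j-1}1'
1 1 0 0 1 2 A/T A/C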
Please try this:

awk '{print $1" "$2" "$3" "$4" "$5" "$6" "$7"/"$8" "$9"/"$10}' myfile.txt > mynewfile.txt
"+" is the arithmetic "and" operator, string concatenation is done by simply listing the strings adjacent to each other, i.e. to get the string "foobar" you'd write: "foo" "bar" not: "foo" + "bar" Anyway, try this: awk -v ORS= '{print $1,$2,$3,$4,$5,$6; for(i=7;i<=NF;i++) print (i%2?OFS:"/") $i; print "\n"}' myfile.txt > mynewfile.txt