Splitting and appending with ksh and awk/nawk - awk

I'm just having the darnedest time with this. Here is my nawk statement:
nawk -F"\t" '{print substr($1,0,4)","substr($1,5,4)","substr($1,9,4)","$2","$3","$4","$5","$6}' filename
In a nutshell, this is a tab delimited file. I want to split the first column (12 chars) into 3 columns and I do that with the substring function. Then, I'd like to print the rest of the data without the first column. It's the appending part that I'm having an issue with.
In its current iteration, the lines that don't have 6 columns will have hanging commas and the ones that have greater than 6 columns don't get printed.
Any thoughts?

Untested, but try this:
nawk -F"\t" -v OFS=, '
{$1 = substr($1,0,4) OFS substr($1,5,4) OFS substr($1,9,4)}
{print}
' filename
Update for comment -- I assume you want every field in quotes:
nawk -F"\t" -v OFS=, -v q="'" '
{
$1 = q substr($1,0,4) q OFS q substr($1,5,4) q OFS q substr($1,9,4) q
for (i=2; i<=NF; i++)
$i = q $i q
print
}
' filename
I pass a single quote into nawk as a variable because you cannot embed a single quote into a single quoted string.

Related

awk print sum of group of lines

I have a file with a column named (effect) which has rows separated by blank lines,
(effect)
1
1
1
(effect)
1
1
1
1
(effect)
1
1
I know how to print the sum of column like
awk '{sum+=$1;} END{print sum;}' file.txt
Using awk how can I print the sum of each (effect) in for loop? such that I have three lines or multiple lines in other cases like below
sum=3
sum=4
sum=2
You can check if there is an (effect) part, and print the sum when encountering either the (effect) part or when in the END block.
awk '
$1 == "(effect)" { if(seen) print "sum="sum; seen = 1; sum = 0 }
/[0-9]/ { sum += $1 }
END { if (seen) print "sum="sum }
' file
Output
sum=3
sum=4
sum=2
With your shown samples, please try following awk code. Written and tested in GNU awk.
awk -v RS='(^|\n)?\\(effect\\)[^(]*' '
RT{
gsub(/\(effect\)\n|\n+[[:space:]]*$/,"",RT)
num=split(RT,arr,ORS)
print "sum="num
}
' Input_file
Explanation: Simple explanation would be, using GNU awk. In awk program set RS as (^|\n)?\\(effect\\)[^(]* regex for whole Input_file. In main program checking condition if RT is NOT NULL then using gsub(Global substitution) function to substitute (effect)\n and \n+[[:space:]]*$(new lines followed by spaces at end of value) with NULL in RT. Then splitting value of RT into array named arr with delimiter of ORS and saving its(total contents value OR array length value) into variable named num, then printing sum= along with value of num here to get required results.
With shown samples, output will be as follows:
sum=3
sum=4
sum=2
This should work in any version of awk:
awk '{sum += $1} $0=="(effect)" && NR>1 {print "sum=" sum; sum=0}
END{print "sum=" sum}' file
sum=3
sum=4
sum=2
Similar to #Ravinder's answer, but does not depend on the name of the header:
awk -v RS='' -v FS='\n' '{
sum = 0
for (i=2; i<=NF; i++) sum += $i
printf "sum=%d\n", sum
}' file
RS='' means that sequences of 2 or more newlines separate records.
The Field Separator is newline.
The for loop omits field #1, the header.
However that means that empty lines truly need to be empty: no spaces or tabs allowed. If your data might have blank lines that contain whitespace, you can set
-v RS='\n[[:space:]]*\n'
$ awk -v RS='(effect)' 'NR>1{sum=0; for(i=1;i<=NF;i++) sum+=$i; print "sum="sum}' file
sum=3
sum=4
sum=2

Sed/awk for String to integer conversion of a csv column in shell

I need 7th column of a csv file to be converted from float to decimal. It's a huge file and I don't want to use while read for conversion. Any shortcuts with awk?
Input:
"xx","x","xxxxxx","xxx","xx","xx"," 00000001.0000"
"xx","x","xxxxxx","xxx","xx","xx"," 00000002.0000"
"xx","x","xxxxxx","xxx","xx","xx"," 00000005.0000"
"xx","x","xxxxxx","xxx","xx","xx"," 00000011.0000"
Output:
"xx","x","xxxxxx","xxx","xx","xx","1"
"xx","x","xxxxxx","xxx","xx","xx","2"
"xx","x","xxxxxx","xxx","xx","xx","5"
"xx","x","xxxxxx","xxx","xx","xx","11"
Tried these, worked. But anything simpler ?
awk 'BEGIN {FS=OFS="\",\""} {$7 = sprintf("%.0f", $7)} 1' $test > $test1
awk '{printf("%s\"\n", $0)}' $test1
With your shown samples, please try following awk program.
awk -v s1="\"" -v OFS="," '{$NF = s1 ($NF + 0) s1} 1' Input_file
Explanation: Simple explanation would be, setting OFS as , then in main program; in each line's last field keeping only digits and covering last field with ", re-shuffle the fields and printing edited/non-edited all lines.
Another simple awk solution:
awk 'BEGIN {FS=OFS="\",\""} {$NF = $NF+0 "\""} 1' file
"xx","x","xxxxxx","xxx","xx","xx","1"
"xx","x","xxxxxx","xxx","xx","xx","2"
"xx","x","xxxxxx","xxx","xx","xx","5"
"xx","x","xxxxxx","xxx","xx","xx","11"
awk 'BEGIN{FS=OFS=","} {gsub(/"/, "", $7); $7="\"" $7+0 "\""; print}' file
Output:
"xx","x","xxxxxx","xxx","xx","xx","1"
"xx","x","xxxxxx","xxx","xx","xx","2"
"xx","x","xxxxxx","xxx","xx","xx","5"
"xx","x","xxxxxx","xxx","xx","xx","11"
gsub(/"/, "", $7): removes all " from $7
$7+0: Reduces the number in $7 to minimal representation

linux csv file concatenate columns into one column

I've been looking to do this with sed, awk, or cut. I am willing to use any other command-line program that I can pipe data through.
I have a large set of data that is comma delimited. The rows have between 14 and 20 columns. I need to recursively concatenate column 10 with column 11 per row such that every row has exactly 14 columns. In other words, this:
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
will become:
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
I can get the first 10 columns. I can get the last N columns. I can concatenate columns. I cannot think of how to do it in one line so I can pass a stream of endless data through it and end up with exactly 14 columns per row.
Examples (by request):
How many columns are in the row?
sed 's/[^,]//g' | wc -c
Get the first 10 columns:
cut -d, -f1-10
Get the last 4 columns:
rev | cut -d, -f1-4 | rev
Concatenate columns 10 and 11, showing columns 1-10 after that:
awk -F',' ' NF { print $1","$2","$3","$4","$5","$6","$7","$8","$9","$10$11}'
Awk solution:
awk 'BEGIN{ FS=OFS="," }
{
diff = NF - 14;
for (i=1; i <= NF; i++)
printf "%s%s", $i, (diff > 1 && i >= 10 && i < (10+diff)?
"": (i == NF? ORS : ","))
}' file
The output:
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
With GNU awk for the 3rd arg to match() and gensub():
$ cat tst.awk
BEGIN{ FS="," }
match($0,"(([^,]+,){9})(([^,]+,){"NF-14"})(.*)",a) {
$0 = a[1] gensub(/,/,"","g",a[3]) a[5]
}
{ print }
$ awk -f tst.awk file
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
If perl is okay - can be used just like awk for stream processing
$ cat ip.txt
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
1,2,3,4,5,6,3,4,2,4,3,4,3,2,5,2,3,4
1,2,3,4,5,6,3,4,2,4,a,s,f,e,3,4,3,2,5,2,3,4
$ awk -F, '{print NF}' ip.txt
16
18
22
$ perl -F, -lane '$n = $#F - 4;
print join ",", (#F[0..8], join("", #F[9..$n]), #F[$n+1..$#F])
' ip.txt
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
1,2,3,4,5,6,3,4,2,43432,5,2,3,4
1,2,3,4,5,6,3,4,2,4asfe3432,5,2,3,4
-F, -lane split on , results saved in #F array
$n = $#F - 4 magic number, to ensure output ends with 14 columns. $#F gives the index of last element of array (won't work if input line has less than 14 columns)
join helps to stitch array elements together with specified string
#F[0..8] array slice with first 9 elements
#F[9..$n] and #F[$n+1..$#F] the other slices as needed
Borrowing from Ed Morton's regex based solution
$ perl -F, -lape '$n=$#F-13; s/^([^,]*,){9}\K([^,]*,){$n}/$&=~tr|,||dr/e' ip.txt
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
1,2,3,4,5,6,3,4,2,43432,5,2,3,4
1,2,3,4,5,6,3,4,2,4asfe3432,5,2,3,4
$n=$#F-13 magic number
^([^,]*,){9}\K first 9 fields
([^,]*,){$n} fields to change
$&=~tr|,||dr use tr to delete the commas
e this modifier allows use of Perl code in replacement section
this solution also has the added advantage of working even if input field is less than 14
You can try this gnu sed
sed -E '
s/,/\n/9g
:A
s/([^\n]*\n)(.*)(\n)(([^\n]*\n){4})/\1\2\4/
tA
s/\n/,/g
' infile
First variant - with awk
awk -F, '
{
for(i = 1; i <= NF; i++) {
OFS = (i > 9 && i < NF - 4) ? "" : ","
if(i == NF) OFS = "\n"
printf "%s%s", $i, OFS
}
}' input.txt
Second variant - with sed
sed -r 's/,/#/10g; :l; s/#(.*)((#[^#]){4})/\1\2/; tl; s/#/,/g' input.txt
or, more straightforwardly (without loop) and probably faster.
sed -r 's/,(.),(.),(.),(.)$/#\1#\2#\3#\4/; s/,//10g; s/#/,/g' input.txt
Testing
Input
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u
Output
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
a,b,c,d,e,f,g,h,i,jklmn,o,p,q,r
a,b,c,d,e,f,g,h,i,jklmnopq,r,s,t,u
Solved a similar problem using csvtool. Source file, copied from one of the other answers:
$ cat input.txt
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
1,2,3,4,5,6,3,4,2,4,3,4,3,2,5,2,3,4
1,2,3,4,5,6,3,4,2,4,a,s,f,e,3,4,3,2,5,2,3,4
Concatenating columns:
$ cat input.txt | csvtool format '%1,%2,%3,%4,%5,%6,%7,%8,%9,%10%11%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22\n' -
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p,,,,,,
1,2,3,4,5,6,3,4,2,434,3,2,5,2,3,4,,,,
1,2,3,4,5,6,3,4,2,4as,f,e,3,4,3,2,5,2,3,4
anatoly#anatoly-workstation:cbs$ cat input.txt

Convert single column into three comma separated columns using awk

I have a single long column and want to reformat it into three comma separated columns, as indicated below, using awk or any Unix tool.
Input:
Xaa
Ybb
Mdd
Tmmn
UUnx
THM
THSS
THEY
DDe
Output:
Xaa,Ybb,Mdd
Tmmn,UUnx,THM
THSS,THEY,DDe
$ awk '{printf "%s%s",$0,NR%3?",":"\n";}' file
Xaa,Ybb,Mdd
Tmmn,UUnx,THM
THSS,THEY,DDe
How it works
For every line of input, this prints the line followed by, depending on the line number, either a comma or a newline.
The key part is this ternary statement:
NR%3?",":"\n"
This takes the line number modulo 3. If that is non-zero, then it returns a comma. If it is zero, it returns a newline character.
Handling files that end before the final line is complete
The assumes that the number of lines in the file is an integer multiple of three. If it isn't, then we probably want to assure that the last line has a newline. This can be done, as Jonathan Leffler suggests, using:
awk '{printf "%s%s",$0,NR%3?",":"\n";} END { if (NR%3 != 0) print ""}' file
If the final line is short of three columns, the above code will leave a trailing comma on the line. This may or may not be a problem. If we do not want the final comma, then use:
awk 'NR==1{printf "%s",$0; next} {printf "%s%s",(NR-1)%3?",":"\n",$0;} END {print ""}' file
Jonathan Leffler offers this slightly simpler alternative to achieve the same goal:
awk '{ printf("%s%s", pad, $1); pad = (NR%3 == 0) ? "\n" : "," } END { print "" }'
Improved portability
To support platforms which don't use \n as the line terminator, Ed Morton suggests:
awk -v OFS=, '{ printf("%s%s", pad, $1); pad = (NR%3?OFS:ORS)} END { print "" }' file
There is a tool for this. Use pr
pr -3ats,
3 columns width, across, suppress header, comma as separator.
xargs -n3 < file | awk -v OFS="," '{$1=$1} 1'
xargs uses echo as default action, $1=$1 forces rebuild of $0.
Using only awk I would go with this (which is similar to what proposed by #jonathan-leffler and #John1024)
{
sep = NR == 1 ? "" : \
(NR-1)%3 ? "," : \
"\n"
printf sep $0
}
END {
printf "\n"
}

How to remove field separators in awk when printing $0?

eg, each row of the file is like :
1, 2, 3, 4,..., 1000
How can print out
1 2 3 4 ... 1000
?
If you just want to delete the commas, you can use tr:
$ tr -d ',' <file
1 2 3 4 1000
If it is something more general, you can set FS and OFS (read about FS and OFS) in your begin block:
awk 'BEGIN{FS=","; OFS=""} ...' file
You need to set OFS (the output field separator). Unfortunately, this has no effect unless you also modify the string, leading the rather cryptic:
awk '{$1=$1}1' FS=, OFS=
Although, if you are happy with some additional space being added, you can leave OFS at its default value (a single space), and do:
awk -F, '{$1=$1}1'
and if you don't mind omitting blank lines in the output, you can simplify further to:
awk -F, '$1=$1'
You could also remove the field separators:
awk -F, '{gsub(FS,"")} 1'
Set FS to the input field separators. Assigning to $1 will then reformat the field using the output field separator, which defaults to space:
awk -F',\s*' '{$1 = $1; print}'
See the GNU Awk Manual for an explanation of $1 = $1