awk: adding a semicolon if empty

I have a file where each line has a different number of values. E.g.:
a; 1; 2; 3; 4;
b; 11; 22;
c; 122; 233; 344; 45; 56;
d; 13;
e; 144; 25; 36; 47; 58; 69;
I am trying to generate a semicolon-separated file where each line has the same number of values. E.g.:
a; 1; 2; 3; 4; ; ;
b; 11; 22; ; ; ; ;
c; 122; 233; 344; 45; 56; ;
d; 13; ; ; ; ; ;
e; 144; 25; 36; 47; 58; 69;
I tried different approaches with awk, but I am too much of a newbie to get it done correctly in bulk. For example:
awk '{if( $4 == ""){print ";"}else{print $4}}' testtest.txt
I hope the swarm intelligence can help me with it.

With your shown samples, please try the following awk code. This is a generic solution: the first read of Input_file finds the highest number of fields in the whole file, and the second read touches field $nf on every line, which makes awk rebuild each record to that width and inserts ; for the newly added empty fields.
awk -v FS='; ' -v OFS='; ' '
FNR==NR{
nf=(nf>NF?nf:NF)
next
}
{
$nf=$nf
}
1
' Input_file Input_file
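To sanity-check the two-pass code, a small sketch (testtest.txt as in the question; note that with FS='; ' the sample's trailing ';' stays attached to the last field, so the padded lines differ slightly in spacing from the desired sample, but every record ends up with the same field count):

```shell
# Write the question's sample, run the two-pass padding, then count
# fields per output line; a single "7" means every record was padded
# to the width of the widest one.
printf '%s\n' 'a; 1; 2; 3; 4;' 'b; 11; 22;' \
    'c; 122; 233; 344; 45; 56;' 'd; 13;' \
    'e; 144; 25; 36; 47; 58; 69;' > testtest.txt

awk -v FS='; ' -v OFS='; ' '
FNR==NR{ nf=(nf>NF?nf:NF); next }   # first pass: remember the widest record
{ $nf=$nf }                         # second pass: touching $nf pads short records
1
' testtest.txt testtest.txt |
awk -F'; ' '{ print NF }' | sort -u
# → 7
```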

Making your records contain at least 8 fields:
awk -F '; *' -v OFS='; ' '{$8 = $8} 1'
Limitations:
The wanted number of fields is specified statically, so you need to know in advance how many fields the longest record has (see @RavinderSingh13's answer for a generic way to determine it).
If, for example, there's a record with 9 fields, the code will not strip it down to 8.
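As a quick check of the fixed-width variant (sample lines piped in; the -F '; *' regex also swallows the space after each semicolon, and a trailing ';' yields a final empty field before padding):

```shell
# Pad two of the sample records to 8 fields each; counting fields on
# the output confirms every line came out with the same width.
printf '%s\n' 'a; 1; 2; 3; 4;' 'b; 11; 22;' |
awk -F '; *' -v OFS='; ' '{$8 = $8} 1' |
awk -F'; ' '{ print NF }' | sort -u
# → 8
```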

@RavinderSingh13's answer works but requires repeating the input file name in the argument list, which can be avoided by modifying ARGC and ARGV:
awk '
BEGIN{
FS=OFS="; "
}
NR==1{
ARGV[ARGC++] = FILENAME
}
FNR==NR{
nf=(nf>NF?nf:NF)
next
}
{
NF=nf
}
1
' testtest.txt

Shorter one-liners: assigning to NF forces awk to rebuild the record with OFS between fields, and the non-zero value of the assignment NF = 8 makes the default print action fire:
gawk 'BEGIN { FS = (OFS = "; ") "*" } NF = 8'
-or-
mawk NF=8 FS='; *' OFS='; '
In the mawk call, NF=8 is the program text and the FS/OFS arguments are command-line variable assignments.
a; 1; 2; 3; 4; ; ;
b; 11; 22; ; ; ; ;
c; 122; 233; 344; 45; 56; ;
d; 13; ; ; ; ; ;
e; 144; 25; 36; 47; 58; 69;


Remove whitespace after comma using FPAT Var

Here's my code
BEGIN {
FPAT="([^,]+)|(\"[^\"]+\")"
}
{
print "NF = ", NF
for (i = 1; i <= NF; i++) {
printf("$%d = <%s>\n", i, $i)}
}
And the output are :
NF = 3
$1 = <Johny Bravo>
$2 = < Chief of Security>
$3 = < 417-555-66>
There's whitespace before the string. How do I remove that whitespace? The whitespace in the input is the space after each ",". The input .txt file contains records like:
Johny Bravo, Chief of Security, 417-555-66
Expected output
NF = 3
$1 = <Johny Bravo>
$2 = <Chief of Security>
$3 = <417-555-66>
Converting my comment to an answer so that the solution is easy to find for future visitors.
You may call gsub inside the for loop to remove leading and trailing spaces from each field.
s='Johny Bravo, Chief of Security, 417-555-66'
awk -v FPAT='("[^"]+")|[^,]+' '{
for (i = 1; i <= NF; i++) {
gsub(/^ +| +$/, "", $i)
printf("$%d = <%s>\n", i, $i)
}
}' <<< "$s"
$1 = <Johny Bravo>
$2 = <Chief of Security>
$3 = <417-555-66>

AWK: How to suppress default print

The following awk statement always prints $0. How do I stop it from doing so?
( nodeComplete && count )
{
#print $0
#print count;
for (i = 0; i < count; i++) {print array1[i];};
nodeComplete=0;
count=0;
}
Welcome to SO. Try changing the position of your opening brace { and let me know if this helps.
( nodeComplete && count ){
#print $0
#print count;
for (i = 0; i < count; i++) {print array1[i];};
nodeComplete=0;
count=0;
}
Explanation of the above change:
The logic behind this is simple: { next to the condition means the following statements are executed only when the condition is true. If you put the braces on the next line, the condition and the block become two separate statements: a bare condition prints the complete line whenever it is TRUE, and the { ... } after it is an unconditional block of its own.
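The difference is easy to reproduce with a made-up one-line input (the condition 1 here stands in for nodeComplete && count):

```shell
# Braces on the same line: one conditional action, printed once.
printf 'x\n' | awk '1 { print "got", $0 }'
# → got x

# Braces on the next line: the bare condition prints $0 by default,
# and the block below it runs as a separate, unconditional action.
printf 'x\n' | awk '1
{ print "got", $0 }'
# → x
# → got x
```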

Sort rows in csv file without header & first column

I have a CSV file containing records like the ones below.
id,h1,h2,h3,h4,h5,h6,h7
101,zebra,1,papa,4,dog,3,apple
102,2,yahoo,5,kangaroo,7,ape
I want to sort each row of this file, excluding the header and the first column. My output should look like this.
id,h1,h2,h3,h4,h5,h6,h7
101,1,3,4,apple,dog,papa,zebra
102,2,5,7,ape,kangaroo,yahoo
I tried the AWK below but don't know how to exclude the header and the first column.
awk -F"," ' {
s=""
for(i=1; i<=NF; i++) { a[i]=$i; }
for(i=1; i<=NF; i++)
{
for(j = i+1; j<=NF; j++)
{
if (a[i] >= a[j])
{
temp = a[j];
a[j] = a[i];
a[i] = temp;
}
}
}
for(i=1; i<=NF; i++){ s = s","a[i]; }
print s
}
' file
Thanks
If perl is okay:
$ perl -F, -lane 'print join ",", $.==1 ? @F : ($F[0], sort @F[1..$#F])' ip.txt
id,h1,h2,h3,h4,h5,h6,h7
101,1,3,4,apple,dog,papa,zebra
102,2,5,7,ape,kangaroo,yahoo
-F, to indicate , as input field separator, results saved in @F array
See https://perldoc.perl.org/perlrun#Command-Switches for details on other options
join "," to use , as output field separator
$.==1 ? @F for first line, print as is
($F[0], sort @F[1..$#F]) for other lines, get first field and sorted output of other fields
.. is range operator, $#F will give index of last field
you can also use (shift @F, sort @F) instead of ($F[0], sort @F[1..$#F])
For the given header, sorting the first line too would produce the same result, which simplifies the logic:
$ # can also use: perl -F, -lane 'print join ",", shift @F, sort @F'
$ perl -F, -lane 'print join ",", $F[0], sort @F[1..$#F]' ip.txt
id,h1,h2,h3,h4,h5,h6,h7
101,1,3,4,apple,dog,papa,zebra
102,2,5,7,ape,kangaroo,yahoo
$ # can also use: ruby -F, -lane 'print [$F.shift, $F.sort] * ","'
$ ruby -F, -lane 'print [$F[0], $F.drop(1).sort] * ","' ip.txt
id,h1,h2,h3,h4,h5,h6,h7
101,1,3,4,apple,dog,papa,zebra
102,2,5,7,ape,kangaroo,yahoo
if you have gawk use asort:
awk -v OFS="," 'NR>1{split($0, a, ",");
$1=a[1];
delete a[1];
n = asort(a, b);
for (i = 1; i <= n; i++){ $(i+1)=b[i]}};
1' file.csv
This splits every row except the first into array a using , as the separator. Then it assigns the first field of the row from a[1] and deletes that element from a. Finally a is sorted into b, the sorted values are assigned back starting from the 2nd column, and the row is printed.
You can use the asort() function in awk for your requirement, sorting from the second line onwards. The solution is GNU awk specific because of asort() and the length(array) function.
awk 'NR==1{ print; next }
NR>1 { finalStr=""
arrayLength=""
delete b
split( $0, a, "," )
for( i = 2; i <= length(a); i++ )
b[arrayLength++] = a[i]
asort( b )
for( i = 1; i <= arrayLength ; i++ )
finalStr = (finalStr)?(finalStr","b[i]):(b[i])
printf( "%s", a[1]","finalStr )
printf( "\n" );
}' file
The idea is to first split the entire line on the , delimiter into array a, copy the elements from the 2nd field onwards into a new array b, sort that array, and prepend the first-column element when printing the final result.

Awk average of n data in each column

"Using awk to bin values in a list of numbers" provide a solution to average each set of 3 points in a column using awk.
How is it possible to extend it to an arbitrary number of columns while maintaining the format? For example:
2457135.564106 13.249116 13.140903 0.003615 0.003440
2457135.564604 13.250833 13.139971 0.003619 0.003438
2457135.565067 13.247932 13.135975 0.003614 0.003432
2457135.565576 13.256441 13.146996 0.003628 0.003449
2457135.566039 13.266003 13.159108 0.003644 0.003469
2457135.566514 13.271724 13.163555 0.003654 0.003476
2457135.567011 13.276248 13.166179 0.003661 0.003480
2457135.567474 13.274198 13.165396 0.003658 0.003479
2457135.567983 13.267855 13.156620 0.003647 0.003465
2457135.568446 13.263761 13.152515 0.003640 0.003458
averaging values every 5 lines, should output something like
2457135.564916 13.253240 13.143976 0.003622 0.003444
2457135.567324 13.270918 13.161303 0.003652 0.003472
where the first result is the average of the first 1-5 lines, and the second result is the average of the 6-10 lines.
The accepted answer to Using awk to bin values in a list of numbers is:
awk '{sum+=$1} NR%3==0 {print sum/3; sum=0}' inFile
The obvious extension to average all the columns is:
awk 'BEGIN { N = 3 }
{ for (i = 1; i <= NF; i++) sum[i] += $i }
NR % N == 0 { for (i = 1; i <= NF; i++)
{
printf("%.6f%s", sum[i]/N, (i == NF) ? "\n" : " ")
sum[i] = 0
}
}' inFile
The extra flexibility here is that if you want to group blocks of 5 rows, you simply change one occurrence of 3 into 5. This ignores blocks of up to N-1 rows at the end of the file. If you want to, you can add an END block that prints a suitable average if NR % N != 0.
For the sample input data, the output I got from the script above was:
2457135.564592 13.249294 13.138950 0.003616 0.003437
2457135.566043 13.264723 13.156553 0.003642 0.003465
2457135.567489 13.272767 13.162732 0.003655 0.003475
You can make the code much more complex if you want to analyze what the output formats should be. I've simply used %.6f to ensure 6 decimal places.
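A sketch of that END rule, under the same sum[] layout (the leftover NR % N rows are averaged over their own count; the one-column input here is made up to keep the demo short):

```shell
# Average every N rows, then let an END rule average the leftover
# NR % N rows over their own count instead of dropping them.
printf '1\n2\n3\n4\n' |
awk 'BEGIN { N = 3 }
     { for (i = 1; i <= NF; i++) sum[i] += $i; nf = NF }
     NR % N == 0 { for (i = 1; i <= NF; i++) {
         printf("%.6f%s", sum[i]/N, (i == NF) ? "\n" : " "); sum[i] = 0 } }
     END { r = NR % N
           if (r) for (i = 1; i <= nf; i++)
               printf("%.6f%s", sum[i]/r, (i == nf) ? "\n" : " ") }'
# → 2.000000
# → 4.000000
```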
If you want N to be a command-line parameter, you can use the -v option to relay the variable setting to awk:
awk -v N="${variable:-3}" \
'{ for (i = 1; i <= NF; i++) sum[i] += $i }
NR % N == 0 { for (i = 1; i <= NF; i++)
{
printf("%.6f%s", sum[i]/N, (i == NF) ? "\n" : " ")
sum[i] = 0
}
}' inFile
When invoked with $variable set to 5, the output generated from the sample data is:
2457135.565078 13.254065 13.144591 0.003624 0.003446
2457135.567486 13.270757 13.160853 0.003652 0.003472

Maya MEL scripting sprintf equivalent

How do you print an integer into a string in Maya MEL scripting?
It turns out you can just use + to concatenate string and integer objects in Maya Embedded Language.
For example:
int $i ;
string $s ;
for( $i = 0; $i < 5; $i++ ) {
$s = "ooh" + $i ;
print $s ;
}
There is also the format command, which substitutes values into a format string.