I have a file in which the lines have different numbers of fields, e.g.:
a; 1; 2; 3; 4;
b; 11; 22;
c; 122; 233; 344; 45; 56;
d; 13;
e; 144; 25; 36; 47; 58; 69;
I am trying to generate a semicolon-separated file in which every line has the same number of values, e.g.:
a; 1; 2; 3; 4; ; ;
b; 11; 22; ; ; ; ;
c; 122; 233; 344; 45; 56; ;
d; 13; ; ; ; ; ;
e; 144; 25; 36; 47; 58; 69;
I tried different approaches with awk, but I am too new to it to get this working correctly for the whole file.
awk '{if( $4 == ""){print ";"}else{print $4}}' testtest.txt
I hope the swarm intelligence can help me with it.
With the samples you have shown, please try the following awk code. It is generic: the first read of Input_file finds the highest number of fields on any line (nf). The second read then assigns $nf to itself on every line, which extends shorter lines to nf fields and puts the output separator "; " between the newly added empty fields.
awk -v FS='; ' -v OFS='; ' '
FNR==NR{
nf=(nf>NF?nf:NF)
next
}
{
$nf=$nf
}
1
' Input_file Input_file
Making your records contain at least 8 fields:
awk -F '; *' -v OFS='; ' '{$8 = $8} 1'
Limitations:
The wanted number of fields is specified statically, so you need to know in advance how many fields the input file contains (see @RavinderSingh13's answer for a generic way to determine the number of fields).
If, for example, there's a record with 9 fields, the code will not strip it down to 8.
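The second limitation can be seen with a nine-field record. The following is only a small sketch: the record in rec and the function names pad_to_8 and force_8 are made up for illustration. Assigning $8 never removes fields, whereas assigning NF does.

```shell
# Hypothetical 9-field record, used only for this illustration.
rec='x; 1; 2; 3; 4; 5; 6; 7; 8'

# Pads short records to 8 fields but leaves longer ones alone.
pad_to_8() { awk -F '; *' -v OFS='; ' '{ $8 = $8 } 1'; }

# Forces exactly 8 fields: pads short records and truncates long ones.
force_8() { awk -F '; *' -v OFS='; ' '{ NF = 8 } 1'; }

printf '%s\n' "$rec" | pad_to_8   # all nine fields are kept
printf '%s\n' "$rec" | force_8    # the ninth field is dropped
```

Which behaviour is wanted depends on whether over-long records should be preserved or cut.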
@RavinderSingh13's answer works but requires the input file name to be repeated in the argument list. This can be avoided by appending the file name to ARGV (and incrementing ARGC) from within the script:
awk '
BEGIN{
FS=OFS="; "
}
NR==1{
ARGV[ARGC++] = FILENAME
}
FNR==NR{
nf=(nf>NF?nf:NF)
next
}
{
NF=nf
}
1
' testtest.txt
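When the input arrives on a pipe and cannot be read twice, a single-pass variant is possible. This is a minimal sketch, not taken from any of the answers: it assumes the whole input fits in memory, and the function name pad_all is made up for illustration. It uses the '; *' separator from the fixed-width answer and strips the trailing blank so the lines end in ";" as in the wanted output.

```shell
# Single-pass variant: buffer every line, then pad in the END block.
# Assumes the whole input fits in memory.
pad_all() {
    awk -F '; *' -v OFS='; ' '
    { line[NR] = $0; if (NF > max) max = NF }   # remember line, track widest record
    END {
        for (r = 1; r <= NR; r++) {
            $0 = line[r]        # re-split the stored line
            $max = $max         # create missing fields up to the widest count
            sub(/ +$/, "")      # drop the trailing blank left after the last ";"
            print
        }
    }'
}

printf 'a; 1; 2;\nb; 3;\n' | pad_all
```

The trade-off is memory use: the two-pass versions keep only one line at a time, while this one holds the entire file.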
gawk 'BEGIN { FS = (OFS = "; ") "*" } NF = 8'
-or-
mawk NF=8 FS='; *' OFS='; '
a; 1; 2; 3; 4; ; ;
b; 11; 22; ; ; ; ;
c; 122; 233; 344; 45; 56; ;
d; 13; ; ; ; ; ;
e; 144; 25; 36; 47; 58; 69;
Input (sample)
=== account ===
title,altTitle,platform,url,
title,altTitle,platform,url,
title,altTitle,platform,url,
title,altTitle,platform,url,
title,altTitle,platform,url,
__collate-by-account.awk
#! /usr/bin/awk -f
#
# Group together lines (records) by account name
BEGIN { FS = ":" }
### generate headers ###
{s = $1}
{if (s != p)
print "\n\n=== ", s " ==="
}
{p = s}
### process records ###
# print field $2 to last field
{for (i = 2; i <= NF; ++i)
# {if (i!=NF) printf $i":"; else printf $i}
{ i != NF ? printf $i":" : printf $i }
}
{printf "\n"}
This part works as intended:
{if (i!=NF) printf $i":"; else printf $i}
Why doesn't this work:
{ i != NF ? printf $i":" : printf $i }
Getting the following errors:
awk: scripts/utils/metadata/__collate-by-account.awk:18: { i != NF ? printf $i":" : printf $i }
awk: scripts/utils/metadata/__collate-by-account.awk:18: ^ syntax error
awk: scripts/utils/metadata/__collate-by-account.awk:18: { i != NF ? printf $i":" : printf $i }
awk: scripts/utils/metadata/__collate-by-account.awk:18: ^ syntax error
Solution, thanks to @James Brown:
### process records ###
# print field $2 to last field
{for (i = 2; i <= NF; ++i)
{ printf "%s%s",$i,(i!=NF?":":"") }
}
{printf "\n"}
Explanation:
First off, note that printf is a statement, not an expression, so it cannot appear inside the ternary operator: neither as the condition being evaluated nor as either of the two result branches, all of which must be expressions.
printf formats its arguments and prints the result.
%s%s is a pair of format specifiers, each replaced by the next argument formatted as a string:
https://www.gnu.org/software/gawk/manual/html_node/Format-Modifiers.html
https://en.wikipedia.org/wiki/Printf_format_string
$i is the field currently being looped over (see the for loop above).
(i!=NF?":":"")
evaluates to ":" if i is not equal to NF,
otherwise to the empty string "".
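The same idea can be tried in isolation. This is a small sketch with a made-up input record and a hypothetical function name, join_tail. It also folds the final newline into the ternary, so the separate printf "\n" is not needed.

```shell
# Join fields 2..NF of a ":"-separated record, choosing the separator with ?:.
join_tail() {
    awk -F: '{
        for (i = 2; i <= NF; i++)
            printf "%s%s", $i, (i != NF ? ":" : "\n")
    }'
}

printf 'account:title:altTitle:url\n' | join_tail
```

Note that a record with only one field prints nothing at all in this variant, not even a newline.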
I get this error:
Use of uninitialized value $index in concatenation (.) or string at getdesc.pl line 43, <OctetsIn> line 2.
The relevant part of my code is as follows:
my $select_sth = $dbh->prepare("SELECT Hid,Hostname,IP FROM Devices")
or die "$dbh->errstr";
$select_sth->execute() or die "$dbh->errstr";
while ( my $row_ref = $select_sth->fetchrow_hashref ) {
my $hostname = $row_ref->{'Hostname'};
if ( $hostname ne 'null' ) {
my $hid = $row_ref->{'Hid'};
my $ip = $row_ref->{'IP'};
my $desc = "null";
my $index = 0;
open( OctetsIn, "snmpwalk -v2c -c public $ip 1.3.6.1.2.1.18 |" )
or die "can't exec: $!";
while (<OctetsIn>) {
chomp;
print <OctetsIn> . "\n";
/IF-MIB::ifAlias.(\S+) = STRING: (\S+)/;
$index = $1;
$desc = $2;
$dbh->do(
"INSERT INTO Description (Hid,index,desc) Values ($hid,$index,'$desc')"
) or die "$dbh->errstr";
}
}
}
close(OctetsIn);
What is wrong with my code? Does anyone know how to fix the error?
The error is on the line:
$dbh->do("INSERT INTO Description (Hid,index,desc) Values ($hid,$index,'$desc')") or die "$dbh->errstr";
You should test whether the regex matched before assigning $1 to $index, i.e.:
# skip to next line if current did not match, as $1 and $2 are undefined
/IF-MIB::ifAlias.(\S+) = STRING: (\S+)/ or next;
There are three issues with your innermost while loop:
You're reading from the filehandle twice when you only mean to print the current line:
while (<OctetsIn>) {
chomp;
print <OctetsIn> . "\n"; # Should be: print "$_\n";
Always verify that your regular expression matched before using capture variables.
/IF-MIB::ifAlias.(\S+) = STRING: (\S+)/;
$index = $1; # Will be undefined if regex doesn't match
$desc = $2;
Use placeholders and bind values instead of manually including values in an SQL statement:
You should never interpolate values directly into an SQL statement, as is done here:
"INSERT INTO Description (Hid,index,desc) Values ($hid,$index,'$desc')"
To clean up these three issues, I'd transform your inner while loop to something like the following.
while (<OctetsIn>) {
chomp;
print "$_\n";
if (my ($index, $desc) = /IF-MIB::ifAlias.(\S+) = STRING: (\S+)/) {
$dbh->do(
"INSERT INTO Description (Hid,index,desc) Values (?,?,?)",
undef, $hid, $index, $desc
) or die $dbh->errstr;
}
}
$index = $1;
Your regexp doesn't match, so $1 is undef.
The following works great on my data in column 12, but I have over 70 columns that are not all in the same format. I need to output all of the columns, with the converted values replacing the scientific-notation ones.
awk -F',' '{printf "%.41f\n", $12}' $file
Thanks
This is one line:
2012-07-01T21:59:50,2012-07-01T21:59:00,1817,22901,264,283,549,1,2012-06-24T13:20:00,2.600000000000000e+001,4.152327506554059e+001,-7.893523806678388e+001,5.447572631835938e+002,2.093000000000000e+003,5.295000000000000e+003,1,194733,1.647400093078613e+001,31047680,1152540,29895140,4738,1.586914062500000e+000,-1.150000000000000e+002,0.000000000000000e+000,0.000000000000000e+000,0.000000000000000e+000,0.000000000000000e+000,0.000000000000000e+000,0.000000000000000e+000,0.000000000000000e+000,3.606000000000000e+003,0.000000000000000e+000,0.000000000000000e+000,0.000000000000000e+000,4.557073364257813e+002,0.000000000000000e+000,0.000000000000000e+000,0.000000000000000e+000,11,0.000000000000000e+000,2.000000000000000e+000,0,0,0,0,4.466836981009692e-004,0.000000000000000e+000,0.000000000000000e+000,0.000000000000000e+000,8,0,840,1,600,1,6,1,1,1,5,2,2,2,1,1,1,1,4854347,0,-
UPDATE
This works for the columns that need no conversion, but I am having trouble adding an else if branch. Everything I try gives me a syntax error, whether in a file or on the command line.
awk -F',' '{for (i=1;i<=NF;i++) {if (i <= 9||i == 16||i == 17||i == 19||i == 20||i == 21||i == 22|| i == 40|| i == 43||i == 44||i == 45||i == 46||i >= 51) printf ($i",")};}' $file
I would like to insert the following statement into the code above:
else if (i == 10) printf ("%.41f", $i)
SOLVED
Got it worked out. Thanks for all the great ideas. I can't seem to make it work in a file with awk -f but on the command line this is working great. I put this one liner in my program.
awk -F',' '{for (i=1;i<=NF;i++) {if (i <= 9||i == 16||i == 17||i >= 19&&i <= 22|| i == 40|| i >= 43&&i <= 46||i >= 51&&i <= 70) printf($i","); else if (i == 10||i == 18) printf("%.2f,", $i); else if (i == 11||i == 12) printf("%.41f,", $i); else if (i == 13) printf("%.1f,", $i); else if (i == 14||i == 15||i >= 24&&i <= 46) printf ("%d,", $i); else if (i == 23) printf("%.4f,", $i); else if (i >= 47&&i <= 50) printf("%.6f,", $i); if (i == 71) printf ($i"\n")};}'
RESULT
2012-07-01T21:59:50,2012-07-01T21:59:00,1817,22901,264,283,549,1,2012-06-24T13:20:00,26.00,41.52327506554058800247730687260627746582031,-78.93523806678388154978165403008460998535156,544.8,2093,5295,1,194733,16.47,31047680,1152540,29895140,4738,1.5869,-115,0,0,0,0,0,0,0,3606,0,0,0,455,0,0,0,11,0,2,0,0,0,0,0.000447,0.000000,0.000000,0.000000,8,0,840,1,600,1,6,1,1,1,5,2,2,2,1,1,1,1,4854347,0,-
You can do regex matching in a loop to choose the format for each field since numbers are also strings in AWK:
#!/usr/bin/awk -f
BEGIN {
d = "[[:digit:]]"
OFS = FS = ","
}
{
delim = ""
for (i = 1; i <= NF; i++) {
if ($i ~ d "e[+-]" d d d "$") {
printf "%s%.41f", delim, $i
}
else {
printf "%s%s", delim, $i
}
delim = OFS
}
printf "\n"
}
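A variation on the regex approach, as a sketch only: it uses a stricter pattern that accepts either exponent sign and any number of exponent digits, and it takes the precision as a parameter. The function name sci_to_fixed and the prec variable are made up for illustration, and the sample line is a shortened version of the one in the question.

```shell
# Rewrite fields that look like scientific notation; pass everything else through.
# prec (digits after the decimal point) is an illustrative parameter.
sci_to_fixed() {
    awk -F, -v OFS=, -v prec="${1:-6}" '
    BEGIN { fmt = "%." prec "f" }
    {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[-+]?[0-9]+(\.[0-9]*)?[eE][-+]?[0-9]+$/)
                $i = sprintf(fmt, $i)
        }
        print
    }'
}

printf '2012-07-01T21:59:50,2.600000000000000e+001,4.466836981009692e-004,11,-\n' | sci_to_fixed 6
```

Assigning to $i and then printing lets awk rebuild the record with OFS, so no delimiter bookkeeping is needed.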
Edit:
I've changed the version above so you can see how it would be used in a file as an AWK script. Save it (I'll call it "scinote"), make it executable with chmod u+x scinote, and then run it like this: ./scinote inputfile
I've also modified the latest version you added to your question to make it a little simpler and so it's ready to go into a script file as above.
#!/usr/bin/awk -f
BEGIN {
plainlist = "16 17 19 20 21 22 40 43 44 45 46"
split(plainlist, arr)
for (i in arr) {
plainfmt[arr[i]] = "%s"
}
OFS = FS = ","
}
{
delim = ""
for (i = 1; i <= NF; i++) {
printf "%s", delim
if (i <= 9 || i in plainfmt || i >= 51) {
printf "%s", $i
}
else if (i == 10) {
printf "%.41f", $i
}
else if (i == 12) {
printf "%.12f", $i
}
delim = OFS
}
printf "\n"
}
If you had more fields sharing other formats (rather than just one field per format), you could do something similar to the plainfmt array.
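One way to generalise this is a lookup table from column number to format. The following is a minimal sketch under assumptions of my own: the function name format_columns and the "column:format" spec syntax are hypothetical, and columns not listed are printed unchanged.

```shell
# Table-driven formats: spec is a space-separated list of column:format pairs.
# Columns not named in spec are printed unchanged.
format_columns() {
    awk -F, -v OFS=, -v spec="$1" '
    BEGIN {
        n = split(spec, pairs, " ")
        for (k = 1; k <= n; k++) {
            split(pairs[k], kv, ":")
            fmt[kv[1]] = kv[2]
        }
    }
    {
        for (i = 1; i <= NF; i++) {
            f = (i in fmt) ? fmt[i] : "%s"
            printf f "%s", $i, (i < NF ? OFS : "\n")
        }
    }'
}

printf 'x,1.5e+001,3.7e+000,y\n' | format_columns '2:%.2f 3:%d'
```

For the data in the question, the spec could list entries such as 10:%.2f and 11:%.41f, which keeps the per-column choices in one place instead of a chain of else if branches.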
You could always loop through all of your data fields and use them in your printf. For a simple file just to test the mechanics you could try this:
awk '{for (i=1; i<=NF; i++) printf("%d = %s\n", i, $i);}' data.txt
Note that -F is not set here, so fields will be split on whitespace.
NF is the predefined variable for the number of fields on a line. Fields are numbered from 1 ($1, $2, and so on up to $NF), and $0 is the whole line.
So for your example this may work:
awk -F',' '{for (i=1; i<=NF; i++) printf "%.41f\n", $i}' $file
Update based on the comment below (I'm not on a system where I can test the syntax):
If certain fields need to be treated differently, you may have to resort to a switch statement (a gawk extension) or an if statement to handle them. This would be easier if you stored your script in a file, let's call it so.awk, and invoked it like this:
awk -f so.awk $file
Your script might contain something along these lines:
BEGIN { FS = "," }   # awk string literals need double quotes
{ for (i=1; i<=NF; i++)
{
if (i == 20 || i == 22|| i == 30)
printf( " .. ", $i)
else if ( i == 13 || i == 24)
printf( " ....", $i)
# etc.
}
}
You can of course also use if (i > 2) ... or other ranges to avoid having to list out every single field if possible.
As an alternative to this series of if-statements see the switch statement mentioned above.