Awk for Fortran output

Sorry for the probably too easy question, but I am a beginner at this.
I have a FORTRAN output that looks somewhat like this:
xxxx 3.54D+05 yyyy
xxxx 3.89D+08 yyyy
xxxx 2.45D-04 yyyy
...
...
I would like to print the logarithm of the second column, but awk does not recognize this form of scientific notation. Any suggestions?
Thank you!

I don't have my Mac to hand to test this, but in the interests of getting you started, I'll have a try off the top of my head...
awk '{x=$2; sub(/D/,"e",x); print log(x)}' file
Hopefully that will pick up column 2 into variable x and then replace the D by e, get the log and print it... then again it may say "bailing at line 1 because Mark is clueless"....
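One thing to watch: awk's log() is the natural logarithm. If you want base 10 (common for magnitude data like this), divide by log(10). A minimal sketch, assuming the same file layout as above:
awk '{x = $2; sub(/D/, "e", x); print log(x) / log(10)}' file
sub() only replaces the first D, which is fine here since the exponent marker appears once per number.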

Difference between print, put and say?

In Perl 6, what is the difference between print, put and say?
I can see how print 5 is different, but put 5 and say 5 look the same.
put $a is like print $a.Str ~ "\n"
say $a is like print $a.gist ~ "\n"
put is more computer readable.
say is more human readable.
put 1 .. 8 # 1 2 3 4 5 6 7 8
say 1 .. 8 # 1..8
Learn more about .gist in the documentation.
———
More accurately, put and say append the value of the nl-out attribute of the output filehandle, which by default is \n. You can override it, though. Thanks Brad Gilbert for pointing that out.
Handy Perl 6 FAQ: How and why do say, put and print differ?
The most obvious difference is that say and put append a newline at the end of the output, and print does not.
But there's another difference: print and put convert their arguments to a string by calling the Str method on each item passed to them; say uses the gist method instead. The gist method, which you can also create for your own classes, is intended to create a Str for human interpretation. So it is free to leave out information about the object deemed unimportant to understanding the essence of the object.
...
So, say is optimized for casual human interpretation, dd is optimized for casual debugging output and print and put are more generally suitable for producing output.
...

Grep for Multiple instances of string between a substring and a character?

Can you please tell me how to grep for every instance of a substring that occurs multiple times on multiple lines within a file?
I've looked at
https://unix.stackexchange.com/questions/131399/extract-value-between-two-search-patterns-on-same-line
and How to use sed/grep to extract text between two words?
But my problem is slightly different - each substring will be immediately preceded by the string name"> and will be terminated by a < character immediately after the last character of the substring I want.
So one line might be
<"name">Bob<125><adje></name><"name">Dave<123><adfe></name><"name">Fred<125><adfe></name>
And I would like the output to be:
Bob
Dave
Fred
Although awk is not the best tool for XML processing, it will help if your XML structure and data are simple enough.
$ awk -F"[<>]" '{for(i=1;i<NF;i++) if($i=="\"name\"") print $(++i)}' file
Bob
Dave
Fred
I doubt that the tag is <"name"> though. If it's <name> without the quotes, change the condition in the script to $i=="name".
gawk
awk -vRS='<"name">|<' '/^[A-Z]/' file
Bob
Dave
Fred
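Since the question literally asked for grep: GNU grep can do this with -o and a Perl-compatible lookbehind (assuming the tag really is the literal <"name"> shown in the sample line):
grep -oP '(?<=<"name">)[^<]+' file
The lookbehind matches the tag without including it in the output, and [^<]+ stops at the terminating < character.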

Awk Sum skipping Special Character Row

I am trying to take the sum of a particular column in a file, i.e. column 18, using an awk command along with printf to display it in the proper decimal format.
SUM=`cat ${INF_TARGET_FILE_PATH}/${EXTRACT_NAME}_${CURRENT_DT}.txt|awk -F"" '{s+=$18}END{printf("%24.2f\n", s)}'
The above command skips the rows in the file that have a special character in column 5 (RÉPARATIONS), so those rows are not included in the sum. Please help me resolve this issue so that the sum covers all rows.
There is a missing backtick in your example; it should be:
SUM=`cat ${INF_TARGET_FILE_PATH}/${EXTRACT_NAME}_${CURRENT_DT}.txt|awk -F"" '{s+=$18}END{printf("%24.2f\n", s)}'`
But you should not use backticks; use command substitution $(code) instead.
Using cat to feed data to awk is also the wrong way to do it; pass the path to the file as an argument after the awk script:
SUM=$(awk -F"" '{s+=$18} END {printf "%24.2f\n",s}' ${INF_TARGET_FILE_PATH}/${EXTRACT_NAME}_${CURRENT_DT}.txt)
This may not resolve your problem by itself, but it gives more correct code.
If you give us your input file, it would help us to understand the problem.
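One likely culprit, though this is a guess without seeing the input file: -F"" is unusual. In gawk, an empty FS splits the record into individual characters, so $18 is the 18th character, and a multibyte character like É shifts every position after it. If the file has a real delimiter, name it explicitly; the pipe below is an assumption:
SUM=$(awk -F'|' '{s+=$18} END {printf "%24.2f\n", s}' "${INF_TARGET_FILE_PATH}/${EXTRACT_NAME}_${CURRENT_DT}.txt")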

How do I create a sub array in awk?

Given a list like:
Dog bone
Cat catnip
Human ipad
Dog collar
Dog collar
Cat collar
Human car
Human laptop
Cat catnip
Human ipad
How can I get results like this, using awk:
Dog bone 1
Dog collar 2
Cat catnip 2
Cat collar 1
Human car 1
Human laptop 1
Human ipad 2
Do I need a sub array? It seems to me like I need an array of "owners" which is populated by arrays of "things."
I'd like to use awk to do this, as this is a subscript of another program in awk, and for now, I'd rather not create a separate program.
By the way, I can already do it using sort and grep -c, and a few other pipes, but I really won't be able to do that on gigantic data files, as it would be too slow. Awk is generally much faster for this kind of thing, I'm told.
Thanks,
Kevin
EDIT: Be aware that the columns are actually not next to each other like this; in the real file they are more like columns $8 and $11. I say this because I suppose if they were next to each other I could incorporate an awk regex ~/Dog\ Collar/ or something. But I won't have that option. -thanks!
awk does not have multi-dimensional arrays, but you can manage by constructing 2D-ish array keys:
awk '{count[$1 " " $2]++} END {for (key in count) print key, count[key]}' file | sort
which, from your input, outputs
Cat catnip 2
Cat collar 1
Dog bone 1
Dog collar 2
Human car 1
Human ipad 2
Human laptop 1
Here, I use a space to separate the key values. If your data contains spaces, you can use some other character that does not appear in your input. I typically use array[$a FS $b] when I have a specific field separator, since that's guaranteed not to appear in the field values.
GNU Awk has some support for multi-dimensional arrays, but it's really just cleverly concatenating keys to form a sort of compound key.
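For illustration, here is that simulated form in action, using the real column numbers from the question's edit ($8 and $11): a comma subscript joins the keys with awk's built-in SUBSEP character, and split() recovers them for printing.
awk '{count[$8, $11]++}                 # ($8, $11) is stored as $8 SUBSEP $11
END {
    for (k in count) {
        split(k, parts, SUBSEP)         # recover the two key parts
        print parts[1], parts[2], count[k]
    }
}' file | sort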
I'd recommend learning Perl, which will be fairly familiar to you if you like awk, but Perl supports true Lists of Lists. In general, Perl will take you much further than awk.
Re your comment:
I'm not trying to be superior. I understand you asked how to accomplish a task with a specific tool, awk. I did give a link to the documentation for simulating multi-dimensional arrays in awk. But awk doesn't do that task well, and it was effectively replaced by Perl nearly 20 years ago.
If you ask how to cross a lake on a bicycle, and I tell you it'll be easier in a boat, I don't think that's unreasonable. If I tell you it'll be easier to first build a bridge, or first invent a Star Trek transporter, then that would be unreasonable.

Summing values in one-line comma delimited file

EDIT: Thanks all of you. Python solution worked lightning-fast :)
I have a file that looks like this:
132,658,165,3216,8,798,651
but it's MUCH larger (~ 600 kB). There are no newlines, except one at the end of file.
And now, I have to sum all values that are there. I expect the final result to be quite big, but if I'd sum it in C++, I possess a bignum library, so it shouldn't be a problem.
How should I do that, and in what language / program? C++, Python, Bash?
sed and awk
sed -e 's/,/\n/g' tmp.txt | awk 'BEGIN {total=0} {total += $1} END {print total}'
Assumptions
Your file is tmp.txt (you can edit this obviously)
Awk can handle numbers that large
Python
sum(map(int,open('file.dat').readline().split(',')))
The language doesn't matter, so long as you have a bignum library. A rough pseudo-code solution would be:
str = ""
sum = 0
while input
get character from input
if character is not ','
append character to back of str
else
convert str to number
add number to sum
str = ""
output sum
If all of the numbers are smaller than (2**63)/600000 (which still has 14 digits), an 8-byte signed datatype like "long long" in C will be enough. The program is pretty straightforward, use the language of your choice.
Since it's expensive to treat that large an input as a whole, I suggest you take a look at this post. It explains how to write a generator for string splitting. It's in C#, but it's well suited for crunching through that kind of input.
If you are worried that the total sum will not fit in an integer (say 32-bit), you can just as easily implement a bignum yourself, especially if you just use integers and addition. Just carry bit 31 to the next dword and keep adding.
If precision isn't important, just accumulate the result in a double. That should give you plenty of range.
http://www.koders.com/csharp/fid881E3E70CC37E480545A0C37C98BC8C208B06723.aspx?s=datatable#L12
A fast C# CSV parser. I've seen it crunch through a few thousand 1 MB files rather quickly; I have it running as part of a service that consumes about 6000 files a month.
No need to reinvent a fast wheel.
Python can handle big integers.
tr "," "\n" < file | any old script for summing
Ruby is convenient, since it automatically handles big numbers. I can't remember whether Awk does arbitrary-precision arithmetic, but if so, you could use
awk 'BEGIN {RS="," ; sum = 0 }
{sum += $1 }
END { print sum }' < file