Use an xls, csv, or other type of file to make a float array in Objective-C

Hi all, I'm *very* new to the whole programming thing, but I really like it. Sorry if I don't include enough details.
So pretty much, I have Excel files that have columns of numbers (I can't post a picture because I don't have 10 reputation yet).
I've been searching for a couple of days and I haven't really found an answer to this. I was wondering whether there is a way to create multiple arrays from the file (I know we can't make matrices like MATLAB) that hold the numbers in each column, i.e.
float numbers[] = {1.3, 1.2, 4.2};
Or create the file with Numbers (the iWork version of Excel), import that file into the Xcode project, and create the arrays from there.
The issue I have is that there are around a thousand numbers, so copying them one by one is extremely time consuming.
Sorry if this is confusing; please let me know if there's anything else I should add as information.

There are two easy ways to do this, even if you are a beginner.
Open MS Excel, input your values, and save the document as a .CSV file.
Then grab an Objective-C CSV parser, like the ones available on GitHub, and you are done.
The second way: you could declare a 2D array of numbers (for example, float matrix[5][5];) and use it however you see fit.
I've used both of these suggestions in separate projects and both work very well. I used the first method for a 15-page Excel document that I needed in my app; the second method I used in another app where I needed to constantly change the contents of the 2D array.
Once you declare float matrix[5][5], you have a 5x5 matrix (a.k.a. a table) that you can use however you want. You could have the first index be the column and the second index be the row.
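If it helps, here is a minimal sketch (in Python, since any scripting language will do) that turns a CSV export into Objective-C array declarations; the file name columns.csv and the columnN array names are just placeholders:

import csv

# Read every non-empty row of the exported spreadsheet.
with open("columns.csv", newline="") as f:
    rows = [row for row in csv.reader(f) if row]

# Transpose rows into columns so each spreadsheet column
# becomes one Objective-C array declaration.
for i, col in enumerate(zip(*rows)):
    print("float column%d[] = {%s};" % (i, ", ".join(col)))

For a two-column sheet this prints two lines such as float column0[] = {1.3, 1.2, 4.2};, which can be pasted straight into the Xcode project.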

You mean generate Objective-C source from a .csv file? One way to do this would be to use a scripting language like Perl:
#!/usr/bin/perl
use warnings;
use strict;

# Collect one number per input line.
my @numbers = ();
while (<>) {
    chomp;
    push @numbers, $_;
}

# Emit an Objective-C array literal.
my $numbers = join(', ', @numbers);
print qq(float numbers[] = {$numbers};\n);
This assumes that your .csv file (say foo.csv) has numbers in the first column and nothing else. If the file contains:
1.5
2.5
18842984
-4
And you pipe it to the script (foo.pl in this example):
cat foo.csv | perl foo.pl
It will output this:
float numbers[] = {1.5, 2.5, 18842984, -4};
Is that what you're trying to do?

Related

Referencing nested arrays in awk

I'm creating a bunch of mappings that can be indexed into using 3 keys such as below:
mappings["foo"]["bar"]["blah"][1]=0
split( "10,13,19,49", mappings["foo"]["bar"]["blah"] )
I can then index into the nested array using for example
mappings[product][format][version][i]
But this is a bit long-winded when I need to refer to the same nested array several times, so in other languages I'd create a reference to the inner array:
map=mappings[product][format][version]
map[i]
However, I can't seem to get this to work in awk (gawk 4.1.3).
I can only find one link via Google, which suggests this was impossible in previous versions of awk and that a loop setting the keys and values one by one is the only solution. Is this still the case, or does anyone have a suggestion for a better solution?
https://developer.apple.com/library/archive/documentation/OpenSource/Conceptual/ShellScripting/Howawk-ward/Howawk-ward.html
EDIT
In response to comments, a bit more background on what I'm trying to do. If there is a better approach, I'm all for using it!
I have a set of CSV files that I'm feeding into AWK. The idea is to calculate a checksum based on specific columns, after applying filtering to the rows.
The columns to checksum on, and the filtering to apply, are derived from runtime parameters passed into the script.
The runtime parameters are a triple of (product, format, version), hence my use of a 3-level nested associative array.
Another approach would be to use the triple as a single key rather than nesting, but gawk doesn't seem to natively support this, so I'd end up concatenating the values into a string. This felt a bit less structured to me, but if I'm wrong, I'm happy to change my mind on this approach.
Anyway, it is these parameters that are used to index into the array structure to retrieve the column numbers, etc.
You can then build up a tree-like structure; for example, the below shows 2 formats for product foo on version blah, and so on:
mappings["product-foo"]["format-bar"]["version-blah"][1]=0
split( "10,13,19,49", mappings["product-foo"]["format-bar"]["version-blah"] )
mappings["product-foo"]["format-moo"]["version-blah"][1]=0
split( "55,23,14,6", mappings["product-foo"]["format-moo"]["version-blah"] )
The magic happens like this; you can see how long-winded the mappings indexing becomes without referencing:
(FNR>1 && (format!="some-format" ||
           (version=="some-version" && $1=="some-filter") ||
           (version=="some-other-version" && $8=="some-other-filter"))) {
    # Loop over each supplied field, summing an absolute tally for each
    for (i=1; i <= length(mappings[product][format][version]); i++) {
        sumarr[i] += ( $mappings[product][format][version][i] < 0 ? -$mappings[product][format][version][i] : $mappings[product][format][version][i] )
    }
}
The comment from @ed-morton simplifies this as originally requested, but I'm interested in whether there is a simpler approach.
The right answer is from @ed-morton above (thanks!).
Ed - if you write it out as an answer I'll accept it; otherwise I'll accept this quote in a few days for good housekeeping.
Right, there is no array-copy functionality in awk, and there are no pointers/references, so you can't create a pointer to an array. You can of course create a function: function map(i) { return mappings[product][format][version][i] }
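For comparison, here is a short Python sketch of the referencing behaviour the question asks for; Python containers are reference types, so binding the inner container to a short name works directly (the key names and the column numbers are the question's own, while the sample row is made up for illustration):

# The mapping stores the CSV column numbers to checksum, as in the question.
mappings = {"product-foo": {"format-bar": {"version-blah": [10, 13, 19, 49]}}}

product, fmt, version = "product-foo", "format-bar", "version-blah"

# 'cols' is a reference to the nested list, not a copy, so it can
# stand in for the long triple-indexed expression.
cols = mappings[product][fmt][version]

# One parsed CSV record, padded out for illustration.
row = ["0"] * 49
row[9], row[12], row[18], row[48] = "1.5", "-2.0", "3.25", "4.0"

# Sum an absolute tally per selected column, mirroring the awk loop
# ($n in awk is 1-based, hence the -1).
sumarr = {}
for i, col in enumerate(cols, start=1):
    sumarr[i] = sumarr.get(i, 0) + abs(float(row[col - 1]))

awk has no equivalent assignment for arrays, which is why the accessor function in the answer above is the usual workaround.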

Reading Fortran binary file in Python

I'm having trouble reading an unformatted F77 binary file in Python.
I've tried the scipy.io.FortranFile method and the numpy.fromfile method, both to no avail. I have also read the file in IDL, which works, so I have a benchmark for what the data should look like. I'm hoping that someone can point out a silly mistake on my part -- there's nothing better than having an idiot moment and then washing your hands of it...
The data, bcube1, have dimensions 101x101x101x3 and are of type real*8. There are 3090903 entries in total. They are written using the following statement (not my code; copied from the source):
open (unit=21, file=bendnm, status='new'
. ,form='unformatted')
write (21) bcube1
close (unit=21)
I can successfully read it in IDL using the following (also not my code, copied from colleague):
bcube=dblarr(101,101,101,3)
openr,lun,'bcube.0000000',/get_lun,/f77_unformatted,/swap_if_little_endian
readu,lun,bcube
free_lun,lun
The returned data (bcube) are double precision, with dimensions 101x101x101x3, so the header information in the file is aware of its dimensions (not flattened).
Now I try to get the same effect using Python, but no luck. I've tried the following methods.
In [30]: f = scipy.io.FortranFile('bcube.0000000', header_dtype='uint32')
In [31]: b = f.read_record(dtype='float64')
which returns the error Size obtained (3092150529) is not a multiple of the dtypes given (8). Changing the dtype changes the size obtained but it remains indivisible by 8.
Alternatively, using fromfile results in no errors, but it returns one more value than is in the array (a footer, perhaps?) and the individual array values are wildly wrong (they should all be of order unity).
In [38]: f = np.fromfile('bcube.0000000')
In [39]: f.shape
Out[39]: (3090904,)
In [42]: f
Out[42]: array([ -3.09179121e-030, 4.97284231e-020, -1.06514594e+299, ...,
8.97359707e-029, 6.79921640e-316, -1.79102266e-037])
I've tried using byteswap to see if this makes the floating point values more reasonable but it does not.
It seems to me that the np.fromfile method is very close to working but there must be something wrong with the way it's reading the header information. Can anyone suggest how I can figure out what should be in the header file that allows IDL to know about the array dimensions and datatype? Is there a way to pass header information to fromfile so that it knows how to treat the leading entry?
I played around with it a bit, and I think I have an idea.
How Fortran stores unformatted data is not standardized, so you have to experiment, but you need three pieces of information:
1. The format of the data. You suggest that it is 64-bit reals, or 'f8' in Python.
2. The type of the header. That is an unsigned integer, but you need the length in bytes. If unsure, try 4. The header usually stores the length of the record in bytes and is repeated at the end. Then again, it is not standardized, so no guarantees.
3. The endianness, little or big. Technically this applies to both header and values, but I assume they're the same. Python defaults to little-endian, so if that were the correct setting for your data, I think you would have already solved it.
When you open the file with scipy.io.FortranFile, you need to give the data type of the header. So if the data is stored big-endian and you have a 4-byte unsigned integer header, you need this:
from scipy.io import FortranFile
ff = FortranFile('data.dat', 'r', '>u4')
When you read the data, you need the data type of the values. Again, assuming big-endian, you want type >f8:
vals = ff.read_reals('>f8')
See the NumPy documentation for a description of the dtype string syntax.
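Assuming that record reads cleanly, the flat result still needs its 4-D shape back; Fortran writes arrays in column-major order, so a reshape with order='F' should recover the layout IDL reports (file name and dimensions taken from the question):

from scipy.io import FortranFile

# Big-endian 4-byte record markers, big-endian 8-byte reals.
ff = FortranFile('bcube.0000000', 'r', '>u4')
vals = ff.read_reals('>f8')
ff.close()

# Fortran arrays are column-major; reshape accordingly.
bcube = vals.reshape((101, 101, 101, 3), order='F')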
If you have control over the program that writes the data, I strongly suggest you write them as data streams instead, which can be more easily read by Python.
Fortran has record demarcations which are poorly documented, even in binary files.
So every write to an unformatted file:
integer*4 Test1
real*4 Matrix(3,3)

open(78, form='unformatted')
write(78) Test1
write(78) Matrix
close(78)
Each should ultimately be padded by np.int32 values. (I've seen references saying that this marker gives you the record length, but I haven't verified it personally.)
The above could be read in Python via numpy as:
import numpy as np

input_file = open(file_location, 'rb')
# Each write() above produces one record wrapped in 4-byte markers (P1..P4).
datum = np.dtype([('P1', np.int32), ('Test1', np.int32), ('P2', np.int32),
                  ('P3', np.int32), ('MatrixT', (np.float32, (3, 3))),
                  ('P4', np.int32)])
data = np.fromfile(input_file, datum)
This should fully populate the data array with the individual data sets in the format above. Do note that numpy expects data to be packed in C order (row-major) while Fortran data is column-major. For square matrix shapes like the one above, this means getting the data out of the matrix requires a transpose before use. For non-square matrices, you will need to reshape and transpose:
Matrix = np.transpose(data[0]['MatrixT'])
Transposing your 4-D data structure is going to need to be done carefully. You might look into SciPy for automated ways to do so; the SciPy package seems to have Fortran-related utilities which I have not fully explored.
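Applying that record-marker layout to the file from the question, a hypothetical one-record read might look like this (it assumes 4-byte big-endian markers and big-endian doubles, matching the /swap_if_little_endian flag in the IDL code above):

import numpy as np

n = 101 * 101 * 101 * 3  # 3090903 values in the single record

# Leading marker, the payload, trailing marker.
rec = np.dtype([('head', '>i4'), ('bcube', '>f8', (n,)), ('tail', '>i4')])
data = np.fromfile('bcube.0000000', dtype=rec, count=1)

# Recover the Fortran (column-major) 4-D layout.
bcube = data['bcube'][0].reshape((101, 101, 101, 3), order='F')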

Is binary identical output possible with XlsxWriter?

With the same input, is it possible to make the output binary identical using XlsxWriter?
I tried changing the created property to a fixed date and that helped a little. I still get a lot of differences in sharedStrings.xml.
Thanks
Yes, for identical input, if you set the created date in the workbook properties:
import xlsxwriter
import datetime

for filename in ('hello1.xlsx', 'hello2.xlsx'):
    workbook = xlsxwriter.Workbook(filename)
    workbook.set_properties({'created': datetime.date(2016, 4, 25)})
    worksheet = workbook.add_worksheet()
    worksheet.write('A1', 'Hello world')
    workbook.close()
Then:
$ cmp hello1.xlsx hello2.xlsx
# No output. Files are the same.
The order in which strings are added to the file will change the layout of the sharedStrings table and thus lead to non-identical files. That is generally the case with Excel as well.
Note: This requires XlsxWriter version 1.0.4 or later to work.
Even though the author of the previous answer appears to have repudiated it, it appears to be correct, but it is not the whole story. I did my own tests on Python 3.7 and XlsxWriter 1.1.2. You won't notice the creation-time issue if your files are small, because they'll be written so fast that their default creation times of "now()" will be the same.
What's missing from the first answer is that you also need to make the same number of calls to the write_* methods. For example, if you call write followed by merge_range on the same cell for one of the workbooks, you need the same sequence of calls for the other. You can't skip the write call and just do merge_range, for instance. If you do, the sharedStrings.xml files will have different values of count even if the value of uniqueCount is the same.
If you can arrange for these things to be true, then your two workbooks should come out equal at the binary level.
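If you want to verify this from a script rather than with cmp, one quick check is to hash both outputs; a minimal sketch, reusing the file names from the example above:

import hashlib

def digest(path):
    # Hash the whole file so any byte-level difference shows up.
    with open(path, 'rb') as f:
        return hashlib.sha256(f.read()).hexdigest()

print(digest('hello1.xlsx') == digest('hello2.xlsx'))  # True if identical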

Extracting Data from an Area file

I am trying to extract information at a specific location (lat, lon) from different satellite images. The images were given to me in the AREA format, and I cooked up a simple Jython script to extract temperature values from them.
The script works; here is a small snippet from it that prints out the data value at a point.
from edu.wisc.ssec.mcidas import AreaFile as af
url="adde://localhost/imagedata?&PORT=8113&COMPRESS=gzip&USER=idv&PROJ=0& VERSION=1&DEBUG=false&TRACE=0&GROUP=FL&DESCRIPTOR=8712C574&BAND=2&LATLON=29.7276 -85.0274 E&PLACE=ULEFT&SIZE=1 1&UNIT=TEMP&MAG=1 1&SPAC=4&NAV=X&AUX=YES&DOC=X&DAY=2012002 2012002&TIME=&POS=0&TRACK=0"
a=af(url);
value=a.getData();
print value
array([[I, [array([I, [array('i', [2826, 2833, 2841, 2853])])])
So what does this mean?
Please excuse me if the question seems trivial; while I am comfortable with Python, I am really new to dealing with scientific data.
Note
Here is a link to the entire script.
After asking around, I found out that the AreaFile object returns data in multiples of four, so the very first value is what I am looking for.
Grabbing the value is as simple as:
ar[0][0][0]
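Putting the pieces together, a minimal end-to-end sketch (same AreaFile import and url string as in the snippet above; the comment about the first element follows the multiples-of-four note, and is an assumption about this particular dataset):

from edu.wisc.ssec.mcidas import AreaFile as af

a = af(url)             # 'url' is the ADDE request string from above
data = a.getData()      # nested arrays of calibrated values
value = data[0][0][0]   # first element holds the value of interest here
print(value)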

How to strip a text file into a single line, and then split that into a relevant list in python?

I'm a noob right now with pygame, and I was wondering how to load a text file and then strip it into a single line. I believe I would need to use the .rstrip('\n') method on the variable holding the opened text file's contents. But then, how do I turn this into a list? If I intentionally used two colons (::) to separate the relevant pieces of information in the text file, how do I make it into a list with each index holding the contents between two sets of ::? The purpose is to create save files for a menu GUI when the program closes, so is there a simpler way to save and restore the contents of variables from one run of the program to the next?
>>> "foo::bar::baz".split("::")
['foo', 'bar', 'baz']
If you just want to save structured data, however, you might want to look at either the pickle or json libraries. Both of them give ways to dump Python objects to files and then load them back out again.
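As a concrete sketch of both halves (the save.txt/save.json file names and the field layout are made up for illustration):

# Splitting a '::'-delimited save file into a list:
with open('save.txt') as f:
    fields = f.read().rstrip('\n').split('::')

# Or, using the json library to save and restore state directly:
import json

state = {'level': 3, 'score': 1200}
with open('save.json', 'w') as f:
    json.dump(state, f)

with open('save.json') as f:
    state = json.load(f)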