None of the samples of accord.net handles international parsing of decimals - accord.net

All the input files for the accord.net programming samples use the en-US decimal point, so in any region that uses a different decimal separator the samples cannot load their files, because parsing the decimal numbers fails.
Here is the fix:
Before the load loop, add this line:
System.Globalization.CultureInfo us = new System.Globalization.CultureInfo("en-US");
Then parse the decimals like this:
double.Parse(strs[0], us.NumberFormat);

Technically this is not a question but a bug report.

Related

Reading Fortran binary file in Python

I'm having trouble reading an unformatted F77 binary file in Python.
I've tried the scipy.io.FortranFile method and the numpy.fromfile method, both to no avail. I have also read the file in IDL, which works, so I have a benchmark for what the data should look like. I'm hoping that someone can point out a silly mistake on my part -- there's nothing better than having an idiot moment and then washing your hands of it...
The data, bcube1, has dimensions 101x101x101x3 and is of r*8 type. There are 3090903 entries in total. They are written using the following statement (not my code, copied from source).
open (unit=21, file=bendnm, status='new'
. ,form='unformatted')
write (21) bcube1
close (unit=21)
I can successfully read it in IDL using the following (also not my code, copied from colleague):
bcube=dblarr(101,101,101,3)
openr,lun,'bcube.0000000',/get_lun,/f77_unformatted,/swap_if_little_endian
readu,lun,bcube
free_lun,lun
The returned data (bcube) is double precision, with dimensions 101x101x101x3, so the header information for the file is aware of its dimensions (not flattened).
Now I try to get the same effect using Python, but no luck. I've tried the following methods.
In [30]: f = scipy.io.FortranFile('bcube.0000000', header_dtype='uint32')
In [31]: b = f.read_record(dtype='float64')
which returns the error Size obtained (3092150529) is not a multiple of the dtypes given (8). Changing the dtype changes the size obtained but it remains indivisible by 8.
Alternatively, using fromfile results in no errors but returns one more value than is in the array (a footer perhaps?), and the individual array values are wildly wrong (they should all be of order unity).
In [38]: f = np.fromfile('bcube.0000000')
In [39]: f.shape
Out[39]: (3090904,)
In [42]: f
Out[42]: array([ -3.09179121e-030, 4.97284231e-020, -1.06514594e+299, ...,
8.97359707e-029, 6.79921640e-316, -1.79102266e-037])
I've tried using byteswap to see if this makes the floating point values more reasonable but it does not.
It seems to me that the np.fromfile method is very close to working but there must be something wrong with the way it's reading the header information. Can anyone suggest how I can figure out what should be in the header file that allows IDL to know about the array dimensions and datatype? Is there a way to pass header information to fromfile so that it knows how to treat the leading entry?
I played a bit around with it, and I think I have an idea.
How Fortran stores unformatted data is not standardized, so you have to experiment a bit, but you need three pieces of information:
1. The format of the data. You suggest that it is 64-bit reals, or 'f8' in Python.
2. The type of the header. That is an unsigned integer, but you need its length in bytes. If unsure, try 4. The header usually stores the length of the record in bytes, and is repeated at the end. Then again, it is not standardized, so no guarantees.
3. The endianness, little or big. Technically this applies to both header and values, but I assume they're the same. NumPy defaults to the machine's native byte order, which is little endian on most PCs, so if that were the correct setting for your data, I think you would have already solved it.
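One way to test the header-size and endianness guesses before involving scipy is to read the first four bytes by hand and compare them with the payload size you expect. For the array in the question that is 101*101*101*3 values of 8 bytes each, or 24727224 bytes. The sketch below uses only the standard library; the name guess_marker and the tiny demo file are made up for illustration, and it assumes the common (but not guaranteed) layout of a 4-byte length marker before and after each record.

```python
import os
import struct
import tempfile

def guess_marker(path, expected_nbytes):
    """Return '>' or '<' if the first 4 bytes equal expected_nbytes in that byte order."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    for order in (">", "<"):
        (value,) = struct.unpack(order + "I", head)
        if value == expected_nbytes:
            return order
    return None  # marker is not 4 bytes, or the expected size is wrong

# Demo on a synthetic record: marker, three big-endian doubles, marker.
payload = struct.pack(">3d", 1.0, 2.0, 3.0)
marker = struct.pack(">I", len(payload))
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(marker + payload + marker)
demo_order = guess_marker(tmp.name, len(payload))
os.remove(tmp.name)
print(demo_order)  # prints >

# What a big-endian marker for the question's array looks like when it is
# misread as little-endian.
expected = 101 * 101 * 101 * 3 * 8
wrong_order_size = struct.unpack("<I", struct.pack(">I", expected))[0]
print(wrong_order_size)  # prints 3092150529
```

The second number is exactly the "Size obtained" in the scipy error quoted in the question, which supports the guess of a 4-byte big-endian header ('>u4'). Two 4-byte markers also add up to 8 bytes, which would explain the one extra float64 that np.fromfile returns.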
When you open the file with scipy.io.FortranFile, you need to give the data type of the header. So if the data is stored big-endian and you have a 4-byte unsigned integer header, you need this:
from scipy.io import FortranFile
ff = FortranFile('data.dat', 'r', '>u4')
When you read the data, you need the data type of the values. Again, assuming big-endian, you want type >f8:
vals = ff.read_reals('>f8')
Look here for a description of the syntax of the data type.
If you have control over the program that writes the data, I strongly suggest you write it with stream access (access='stream' in Fortran 2003 and later), which has no record markers and can be read more easily by Python.
Fortran has record demarcations which are poorly documented, even in binary files.
So every write to an unformatted file:
integer*4 Test1
real*4 Matrix(3,3)
open(78,form='unformatted')
write(78) Test1
write(78) Matrix
close(78)
Each record written this way ends up padded on both sides by an np.int32 value. (I've seen references saying that this value is the record length, but haven't verified it personally.)
The above could be read in Python via numpy as:
import numpy as np
input_file = open(file_location,'rb')
# P1..P4 are the 4-byte record markers written around Test1 and Matrix
datum = np.dtype([('P1',np.int32),('Test1',np.int32),('P2',np.int32),('P3',np.int32),('MatrixT',(np.float32,(3,3))),('P4',np.int32)])
data = np.fromfile(input_file,datum)
Which should fully populate the data array with the individual data sets of the format above. Do note that numpy expects data to be packed in C format (row major) while Fortran format data is column major. For square matrix shapes like that above, this means getting the data out of the matrix requires a transpose as well, before using. For non square matrices, you will need to reshape and transpose:
Matrix = np.transpose(data[0]['MatrixT'])
Transposing your 4-D data structure is going to need to be done carefully. You might look into SciPy for automated ways to do so; the SciPy package seems to have Fortran related utilities which I have not fully explored.
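For the 4-D array in the question, an explicit transpose should not be necessary if the flat values are reshaped in Fortran (column-major) order. The following is a minimal sketch, assuming 4-byte big-endian record markers and using a small made-up 2x3x4 array in place of the real 101x101x101x3 data; the bytes are built in memory rather than read from the actual file.

```python
import numpy as np

shape = (2, 3, 4)  # stand-in for (101, 101, 101, 3)
original = np.arange(24, dtype=np.float64).reshape(shape)

# Build the bytes a Fortran `write (unit) array` typically produces:
# length marker, values in column-major order, length marker.
payload = original.astype(">f8").tobytes(order="F")
marker = np.array([len(payload)], dtype=">u4").tobytes()
raw = marker + payload + marker

# Skip the leading marker, read exactly the expected number of values
# (which also leaves out the trailing marker), then reshape column-major.
count = int(np.prod(shape))
flat = np.frombuffer(raw, dtype=">f8", count=count, offset=4)
restored = flat.reshape(shape, order="F")

print(np.array_equal(restored, original))  # prints True
```

With a file on disk, np.fromfile accepts the same dtype and count arguments, and recent NumPy versions also accept offset, so the same three steps should carry over.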

GeoTIFF software generation

I have some data about the height of a set of points (for example, from the Google Elevation API). There is a task to save this data in GeoTIFF format, then to use in osgEarth (GDAL). How can this be done? It does not matter in what language.
A quick search on the Internet only gave me the answer to the reverse question (How do I open geotiff images with gdal in python?)
I would be very grateful for any help.
I would do this with GDAL from Python. (You could also use rasterio, which is a nice wrapper around GDAL for raster file handling.)
You should put your data in a numpy array; let us call it some_nparray.
Then create the tif dataset with gtiffDriver.Create(). Here you can provide the name of your file, the dimensions of your image in number of columns and rows, the number of bands (here 1), and the datatype. Here I chose float32; however, byte, int16, etc. could also work, depending on your data (you can check it with height_data_array.dtype).
Next you should set the geotransform, which is the information about the corner coordinates and pixel resolution, and you should set the projection you are using. This is done with dataset.SetGeoTransform and dataset.SetProjection. How these are created is not in the scope of this question, I believe. If you do not need it, I guess you can even skip that part.
Finally, write your array to the file with WriteArray and close the file.
Your code should look something like this. Here I use the convention that variables prefixed with some_ should be provided by you.
from osgeo import gdal
height_data_array = some_nparray
gtiffDriver = gdal.GetDriverByName('GTiff')
dataset = gtiffDriver.Create('result.tif',
                             height_data_array.shape[1],
                             height_data_array.shape[0],
                             1,
                             gdal.GDT_Float32)
dataset.SetGeoTransform(some_geotrans)
dataset.SetProjection(some_projection)
dataset.GetRasterBand(1).WriteArray(height_data_array)
dataset = None
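The answer leaves some_geotrans unspecified. For a regular, north-up grid (no rotation), GDAL's geotransform is a six-element tuple made of the top-left corner and the pixel size. The sketch below builds one in plain Python; the function names and the 0.5-degree example grid are made up for illustration, and it assumes the first row of height_data_array is the northernmost one.

```python
def north_up_geotransform(west, north, pixel_width, pixel_height):
    """Build GDAL's six-element geotransform for an unrotated, north-up grid."""
    # (x of top-left corner, pixel width, row rotation,
    #  y of top-left corner, column rotation, negative pixel height)
    return (west, pixel_width, 0.0, north, 0.0, -pixel_height)

def pixel_to_geo(geotransform, col, row):
    """Map the top-left corner of pixel (col, row) to georeferenced (x, y)."""
    x0, dx, rx, y0, ry, dy = geotransform
    return (x0 + col * dx + row * rx, y0 + col * ry + row * dy)

# Hypothetical 0.5-degree grid whose top-left corner is at 10 E, 50 N.
some_geotrans = north_up_geotransform(10.0, 50.0, 0.5, 0.5)
print(pixel_to_geo(some_geotrans, 0, 0))  # prints (10.0, 50.0)
print(pixel_to_geo(some_geotrans, 2, 4))  # prints (11.0, 48.0)
```

The pixel height is negated because row numbers increase downward while northing increases upward. For latitude/longitude points, the projection string would typically be the WKT for WGS 84 (EPSG:4326), but that depends on where the coordinates come from.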

Alt-Code Characters in F#

Edit/Update:
Thank you all for responding. I understand I was being too vague, but wasn't sure if posting naked lines of code would be useful in this case.
In my .vb file I have a pulldown control with its validation values as:
TempUnit.DataSource = {"°C", "°F", "°R", "K"}
...which is stored in a variable:
Dim unit As String = TempUnit.SelectedItem.ToString
...which gets passed into a function along with other variables:
Function xxx(..., ByVal unitT As String) As Double
... which finally calls the .fs file and gets evaluated using:
let tempConv t u =
    match u with
    |"°C" -> t * 9.0 / 5.0 + 32.0
    |"°R" -> t - 459.67
    |"K" -> t * 9.0 / 5.0 - 459.67
    |_ -> t
If any temperature unit other than Kelvin is selected, the match fails and defaults to the else case (which is Fahrenheit in this context). I ended up bypassing the degree symbol entirely by evaluating the substring instead:
Dim unit As String = TempUnit.SelectedItem.ToString.Substring(1)
The program is working again, but I have no idea what I changed, if anything, to make the string match stop working. The first thing I tried was to copy/paste from one file to the other to ensure the strings were identical, in addition to trying other symbols, but to no avail. The degree symbol is what caught my attention, but then I checked the pressure units and found the exact same issue with the micro prefix.
Thank you, Hans Passant, I had unicode in mind as a possible solution, but it didn't seem like an easy fix in the heat of the moment. I appreciate your link.
Original Post:
I have a VB program referencing a function stored in an F# library file whose arguments include unit of measure strings containing special characters (e.g. "°C" "µBar").
The strings are identical in the .vb and .fs files; and there was no issue until the F# library file stopped recognizing the Alt-Code characters for reasons unbeknownst to me.
The program works as intended if I remove the offending Alt-Code character from the string definitions in the F# and VB files.
What would cause a match to fail between two identical strings that happen to contain an Alt-Code character?
What is the proper way to handle Alt-Code characters in F# (and VB for that matter)?
The µ glyph is a bit infamous. Unicode has two codepoints that look like that: U+03BC = "Greek small letter Mu" and U+00B5 = "Micro sign". One is a letter in the Greek alphabet, the other is a symbol that often appears in math and units.
Compare μ and µ. They look almost identical in most fonts (you can see the difference with Segoe UI) and very easily fool the human eye. Typographers insist they are not the same, particularly if they are Greek, I'd imagine. A computer does not treat them as the same either, which is surely the problem you are dealing with.
Copy/paste or re-type to fix. The Charmap.exe applet in Windows is very handy to get this right.
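To see which code points a string really contains, it helps to print them by name. The sketch below is in Python purely for illustration (the helper name describe is made up); the same idea applies in VB or F# by printing each character's numeric value.

```python
import unicodedata

def describe(text):
    """List each character of text as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}" for ch in text]

micro = "\u00b5"  # MICRO SIGN
mu = "\u03bc"     # GREEK SMALL LETTER MU

print(describe(micro + "Bar")[0])  # prints U+00B5 MICRO SIGN
print(describe(mu + "Bar")[0])     # prints U+03BC GREEK SMALL LETTER MU
print(micro == mu)                 # prints False

# Compatibility normalization (NFKC) folds the micro sign into the Greek
# letter, so normalizing both sides makes the comparison succeed.
print(unicodedata.normalize("NFKC", micro) == mu)  # prints True
```

The degree sign has look-alikes of its own, so running the same check on "°C" from both source files would show whether the two strings really match.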

How Do I Convert a Byte Stream to a Text String?

I'm working on a licensing system for my application. I'd like to put all licensing information (licensee name, expiration date, and enabled features) into an object, encrypt that object with a private key, then represent the encrypted data as a single text string which I can send via email to my customers.
I've managed to get the encrypted data into a byte stream, but I don't know how to convert that byte stream into a text value -- something that contains no control characters or whitespace. Can anyone offer advice on how to do that? I've been researching the Encoding class, but I can't find a text-only encoding.
I'm using .NET 2.0 -- mostly VB, but I can do C# also.
Use a Base64 encoder to convert it to a text string that can be decoded again with a Base64 decoder (in .NET, Convert.ToBase64String and Convert.FromBase64String). It is great for representing arbitrary binary data in a text-friendly manner, using only upper- and lower-case A-Z, the digits 0-9, and the characters '+', '/' and '='.
BinHex is an example of one way to do that. It may not be exactly what you want -- for example, you might want to encode your data such that it's impossible to inadvertently spell words in your string, and you may or may not care about maximizing the density of information. But it's an example that may help you come up with your own encoding.
I've found Base32 useful for license keys before. There are some C# implementations linked from this answer. My own license code is based on this implementation, which avoids ambiguous characters to make it easier to retype the keys.
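As a rough illustration of the two encodings mentioned in these answers, here is a Python sketch that round-trips some arbitrary bytes through Base64 and Base32. The sample bytes are a made-up stand-in for real encrypted data; .NET offers the Base64 half through the Convert class, while Base32 needs a third-party or hand-written implementation there.

```python
import base64
import string

# Stand-in for encrypted bytes, including control characters and whitespace.
encrypted = bytes([0, 255, 16, 32, 10, 13, 200, 7])

b64 = base64.b64encode(encrypted).decode("ascii")
b32 = base64.b32encode(encrypted).decode("ascii")

# Standard Base64 uses letters, digits, '+', '/' and '=' padding.
b64_alphabet = set(string.ascii_letters + string.digits + "+/=")
# Standard Base32 uses upper-case letters, the digits 2-7 and '=' padding.
b32_alphabet = set(string.ascii_uppercase + "234567=")

print(set(b64) <= b64_alphabet)              # prints True
print(set(b32) <= b32_alphabet)              # prints True
print(base64.b64decode(b64) == encrypted)    # prints True
print(base64.b32decode(b32) == encrypted)    # prints True
```

Base32 output is longer than Base64 for the same input, but it is case-insensitive and so more forgiving when a key has to be retyped by hand.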

Char.ConvertFromUtf32 not available in Silverlight

I'm converting a WinForms app to Silverlight (VB.NET). What should I use instead of Char.ConvertFromUtf32 as it's not available to use in Silverlight?
UTF-32 is currently not part of Silverlight, so you have to find a way around the limitation. I think you should stop a moment and think exactly why you need to read UTF32-encoded text.
If you are reading such text from a database or a file on the server, I would perform the conversion server-side (if possible I would convert everything to UTF-8 and get rid of the UTF-32 data in one shot).
If you are parsing a user-provided file on the client side, I would detect the UTF-32 encoding and gently tell the user that the file encoding is not supported. UTF32 is pretty rare nowadays, so I guess it should not be a very common case (but I could be wrong not knowing your exact situation).
In order to detect the file encoding you have to look at the first few bytes (the byte order mark; more information here). If they are not present, the task becomes much harder and involves some kind of heuristics based on character frequency.
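The byte order mark check can be sketched as follows. This is Python for illustration only, and the function name detect_bom is made up; the byte sequences themselves are the standard Unicode BOMs. The one subtlety is ordering: the UTF-32 little-endian mark starts with the same two bytes as the UTF-16 little-endian mark, so the longer marks have to be tested first.

```python
import codecs

# Longest marks first: BOM_UTF32_LE (FF FE 00 00) begins with BOM_UTF16_LE (FF FE).
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)

def detect_bom(data):
    """Return the encoding named by a leading byte order mark, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None

sample = codecs.BOM_UTF32_LE + "hi".encode("utf-32-le")
print(detect_bom(sample))    # prints utf-32-le
print(detect_bom(b"plain"))  # prints None
```

A client could use a check like this to reject UTF-32 input with a clear message before attempting to decode it.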
From: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/types/how-to-convert-between-hexadecimal-strings-and-numeric-types
You can use a direct cast, like:
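// Get the character corresponding to the integral value.
// The linked article uses Char.ConvertFromUtf32, which Silverlight lacks:
// string stringValue = Char.ConvertFromUtf32(value);
// A direct cast works instead:
char charValue = (char)value;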
Small warning, it will only work up to 0xffff. It will not work for high range Unicode from 0x10000 to 0x10ffff.
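The cast stops at 0xffff because a .NET char is a single 16-bit UTF-16 code unit; code points from 0x10000 upward need two code units, a surrogate pair. The arithmetic is short enough to write by hand. The sketch below shows it in Python for illustration; the function name is made up, and it returns the code units as integers rather than building a string.

```python
def convert_from_utf32(code_point):
    """Return the UTF-16 code units (as ints) for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if code_point <= 0xFFFF:
        return [code_point]  # fits in a single code unit, same as the cast
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return [high, low]

print([hex(u) for u in convert_from_utf32(0x41)])     # prints ['0x41']
print([hex(u) for u in convert_from_utf32(0x1F600)])  # prints ['0xd83d', '0xde00']
```

In VB or C# the two values would then be cast to chars and concatenated into a two-character string.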
Also, if you need to parse \uXXXX, try this other question: How do I convert Unicode escape sequences to Unicode characters in a .NET string?