VTK: The data array in the element may be too short

I'm trying to visualize some data in vtr format. For this purpose I've created a couple of npy files with this library, then converted those files with PyEVTK into the vtr format (as in the lowlevel.py example). But when I try to visualize this data in ParaView, an error appears:
ERROR: In /var/tmp/portage/sci-visualization/paraview-4.0.1-r1/work/ParaView-v4.0.1-source/VTK/IO/XML/vtkXMLDataReader.cxx, line 510
vtkXMLRectilinearGridReader (0x36bb080): Cannot read point data array "Pressure" from PointData in piece 0. The data array in the element may be too short.
Can anybody explain what exactly this error message means, and what's wrong with my visualization data?
Solved:
I made a stupid mistake - the data size declared in the header was different from the actual data size, and this was the cause of the error.

This error may also come from the XML header declaration, which may not contain all the data needed. You may be missing the header_type attribute, which declares the integer type used for the size written before each block of data.
<VTKFile type="UnstructuredGrid" version="0.1" byte_order="BigEndian" header_type="UInt64">
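If you are writing the file with PyEVTK, one way to avoid this kind of header/size mismatch is to let the high-level API do the bookkeeping instead of assembling the XML sections by hand (as lowlevel.py does). A minimal sketch, assuming the pyevtk package and placeholder grid sizes and file name:

import numpy as np
from pyevtk.hl import gridToVTK  # high-level writer, keeps sizes and offsets consistent

# Rectilinear grid: coordinates along each axis
nx, ny, nz = 10, 12, 8
x = np.linspace(0.0, 1.0, nx + 1)
y = np.linspace(0.0, 1.0, ny + 1)
z = np.linspace(0.0, 1.0, nz + 1)

# Point data must have exactly one value per grid point
pressure = np.random.rand(nx + 1, ny + 1, nz + 1)

# Writes output.vtr with a header that matches the data it contains
gridToVTK("./output", x, y, z, pointData={"Pressure": pressure})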

Related

CreateML data analysis stopped

When I attempt to train a CreateML model, I get the following screen after inputting my training data:
[Screenshot: Create ML error message]
I am then unable to add my test data or train the model. Any ideas on what is going on here?
[EDIT] As mentioned in my comment below, this issue went away when I removed some of my training data. Any newcomers who are running into this issue are encouraged to try some of the solutions below and comment on whether it worked for them. I'm happy to accept an answer if it seems like it's working for people.
This happens when the first picture in the dataset has no label. If you place a labeled photo first in the dataset and in the Create ML JSON, you shouldn't get that issue.
Correct:
[{"annotations":[{"label":"Enemy","coordinates":{"y":156,"x":302,"width":26,"height":55}}],"imagefilename":"Enemy1.png"},{"annotations":[{"label":"Enemy","coordinates":{"y":213,"x":300,"width":69,"height":171}}],"imagefilename":"Enemy7.png"},{"annotations":
Incorrect:
[{"annotations":[],"imagefilename":"Enemy_v40.png"},{"annotations":[],"imagefilename":"Enemy_v41.png"},{"annotations":[],"imagefilename":"Enemy_v42.png"},{"annotations":
At a minimum you should check for these two situations, which triggered the same generic error for me (data analysis stopped) in the context of an Object Detection model; a quick check script follows the list:
One or more of the image names referenced in annotations.json is incorrect (e.g. typo in image name)
The first entry in annotations.json has an empty annotations array (i.e. an image that does not contain any of the objects to be detected)
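As a sanity check, something along these lines can catch both situations before training. This is only a sketch, assuming an object-detection layout where annotations.json sits next to the images; the folder path is a placeholder:

import json
import os

folder = "TrainingData"  # placeholder: path to the training folder
with open(os.path.join(folder, "annotations.json")) as f:
    entries = json.load(f)

# 1) every image referenced in the annotations must exist (catches typos in names)
for entry in entries:
    name = entry["imagefilename"]
    if not os.path.isfile(os.path.join(folder, name)):
        print("missing image:", name)

# 2) the first entry should not have an empty annotations array
if not entries[0]["annotations"]:
    print("first entry has no annotations:", entries[0]["imagefilename"])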
If you are using any random split or something similar, make sure it's parsing the data correctly. You can test this easily by debugging.
I suggest you check to see if your training data is consistent and all entries have all needed values. The error is likely in the section of data you removed.
That would cause the error Nate commented he is seeing when he gets that pop-up.
Getting the log would be the next step in any other evaluation.

Reading Fortran binary file in Python

I'm having trouble reading an unformatted F77 binary file in Python.
I've tried the scipy.io.FortranFile method and the numpy.fromfile method, both to no avail. I have also read the file in IDL, which works, so I have a benchmark for what the data should look like. I'm hoping that someone can point out a silly mistake on my part -- there's nothing better than having an idiot moment and then washing your hands of it...
The data, bcube1, have dimensions 101x101x101x3 and are of real*8 type. There are 3090903 entries in total. They are written using the following statement (not my code, copied from source).
      open (unit=21, file=bendnm, status='new'
     .      ,form='unformatted')
      write (21) bcube1
      close (unit=21)
I can successfully read it in IDL using the following (also not my code, copied from colleague):
bcube=dblarr(101,101,101,3)
openr,lun,'bcube.0000000',/get_lun,/f77_unformatted,/swap_if_little_endian
readu,lun,bcube
free_lun,lun
The returned data (bcube) is double precision, with dimensions 101x101x101x3, so the header information for the file is aware of its dimensions (not flattened).
Now I try to get the same effect using Python, but no luck. I've tried the following methods.
In [30]: f = scipy.io.FortranFile('bcube.0000000', header_dtype='uint32')
In [31]: b = f.read_record(dtype='float64')
which returns the error Size obtained (3092150529) is not a multiple of the dtypes given (8). Changing the dtype changes the size obtained but it remains indivisible by 8.
Alternatively, using fromfile results in no errors but returns one more value than is in the array (a footer perhaps?), and the individual array values are wildly wrong (they should all be of order unity).
In [38]: f = np.fromfile('bcube.0000000')
In [39]: f.shape
Out[39]: (3090904,)
In [42]: f
Out[42]: array([ -3.09179121e-030, 4.97284231e-020, -1.06514594e+299, ...,
8.97359707e-029, 6.79921640e-316, -1.79102266e-037])
I've tried using byteswap to see if this makes the floating point values more reasonable but it does not.
It seems to me that the np.fromfile method is very close to working but there must be something wrong with the way it's reading the header information. Can anyone suggest how I can figure out what should be in the header file that allows IDL to know about the array dimensions and datatype? Is there a way to pass header information to fromfile so that it knows how to treat the leading entry?
I played around with it a bit, and I think I have an idea.
How Fortran stores unformatted data is not standardized, so you have to experiment a bit, but you need three pieces of information:
1. The format of the data. You suggest that is 64-bit reals, or 'f8' in Python.
2. The type of the header. That is an unsigned integer, but you need the length in bytes. If unsure, try 4. The header usually stores the length of the record in bytes and is repeated at the end. Then again, it is not standardized, so no guarantees.
3. The endianness, little or big. Technically this applies to both header and values, but I assume they're the same. Python defaults to little endian, so if that were the correct setting for your data, I think you would have already solved it.
When you open the file with scipy.io.FortranFile, you need to give the data type of the header. So if the data is stored big-endian, and you have a 4-byte unsigned integer header, you need this:
from scipy.io import FortranFile
ff = FortranFile('data.dat', 'r', '>u4')
When you read the data, you need the data type of the values. Again, assuming big-endian, you want type >f8:
vals = ff.read_reals('>f8')
See the NumPy documentation on data type strings (dtype) for a description of the syntax.
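Applied to the file in the question, a possible end-to-end sketch would be the following (assuming a 4-byte big-endian record marker and big-endian real*8 data, which matches the /swap_if_little_endian flag used in IDL; adjust '>u4'/'>f8' if the file turns out to be little-endian):

import numpy as np
from scipy.io import FortranFile

# open with a 4-byte big-endian record header, then read one record of big-endian doubles
ff = FortranFile('bcube.0000000', 'r', '>u4')
vals = ff.read_reals('>f8')   # 1-D array with 101*101*101*3 values
ff.close()

# Fortran stores arrays in column-major order, so reshape with order='F'
bcube = vals.reshape((101, 101, 101, 3), order='F')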
If you have control over the program that writes the data, I strongly suggest you write them as data streams (Fortran's access='stream'), which can be read more easily from Python.
Fortran has record demarcations which are poorly documented, even in binary files.
So every write to an unformatted file:
integer*4 Test1
real*4 Matrix(3,3)
open(78, form='unformatted')
write(78) Test1
write(78) Matrix
close(78)
Should ultimately be padded with np.int32 values, one before and one after each record. (I've seen references that these tell you the record length, but haven't verified personally.)
The above could be read in Python via numpy as:
import numpy as np
input_file = open(file_location, 'rb')
# P1..P4 are the 4-byte record markers Fortran writes before and after each record
datum = np.dtype([('P1', np.int32), ('Test1', np.int32), ('P2', np.int32), ('P3', np.int32), ('MatrixT', (np.float32, (3, 3))), ('P4', np.int32)])
data = np.fromfile(input_file, datum)
This should fully populate the data array with the individual data sets in the format above. Do note that NumPy expects data to be packed in C order (row major) while Fortran data is column major. For square matrix shapes like the one above, this means getting the data out of the matrix requires a transpose before use. For non-square matrices, you will need to reshape and transpose:
Matrix = np.transpose(data[0]['MatrixT'])
Transposing your 4-D data structure is going to need to be done carefully. You might look into SciPy for automated ways to do so; the SciPy package seems to have Fortran-related utilities which I have not fully explored.
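For the 101x101x101x3 array in the question, a sketch along the same lines might look like this (again assuming 4-byte record markers and big-endian data, matching the IDL read; these are assumptions to verify against the actual file):

import numpy as np

n = 101 * 101 * 101 * 3  # 3090903 values
# '>i4' / '>f8' = big-endian marker and doubles, mirroring IDL's /swap_if_little_endian
rec = np.dtype([('head', '>i4'), ('bcube', '>f8', (n,)), ('tail', '>i4')])
raw = np.fromfile('bcube.0000000', dtype=rec, count=1)[0]

# the leading and trailing markers should both hold the record size in bytes
assert raw['head'] == raw['tail'] == n * 8

# reshape in column-major (Fortran) order to recover the original layout
bcube = raw['bcube'].reshape((101, 101, 101, 3), order='F')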

How to load CanvasSvgDocument from file?

In a C++/winrt project I have a large number of small SVG resources to be loaded from file. Since it would be slow to reload them all from disk at each CreateResources event from the CanvasVirtualControl, I have loaded them in advance and stored the data for each in an array. When CreateResources happens, my intent is to load a CanvasSvgDocument for each of these by using the CanvasSvgDocument method LoadFromXml(System.string). However, if I create an SVG document using the resourceCreator, I get an invalid argument crash when calling LoadFromXml(). The resourceCreator argument looks right (VS preview 6 now allows me to see local variables!) and the XML data string argument looks like valid SVG data, so my best guess about the crash is that the data string is the wrong format. The file data is UTF-8. If I convert that to a std::wstring, as I must for the LoadFromXml argument, can it still be understood as byte data?
For example, I create the std::wstring this way, given a pointer to unsigned char file data and its length in bytes:
m_data_string = std::wstring(data, data + dataLength);
When CreateResources is triggered that datastring is referenced this way:
m_svg = CanvasSvgDocument(resourceCreator);
m_svg.LoadFromXml(resourceCreator, m_data_string);
But LoadFromXml crashes with that invalid parameter error. I see that the length of the data string is correct, but of course that is the number of characters, not the actual size of the data. Could there be a conflict between the UTF-8 attribute in the svg and the fact that it is now recorded as 16-bit characters? If so, how would one load an xml document from such data?
[Update] Following the suggestion that I use winrt::to_hstring, I read the unsigned char data into a std::string,
std::string cstring = std::string("");
cstring.assign(data, data + dataLength);
Then I convert that:
m_data_string = winrt::to_hstring(cstring);
And finally try to load an svg as before:
m_svg.LoadFromXml(resourceCreator, m_data_string);
And it crashes as before. I notice that in the debugger the converted string in neither case appeared to be gibberish - in both cases it read as the expected SVG data. But if this hstring is wide chars, wouldn't that conflict with the attribute in the SVG that identifies it as UTF-8?
[Update] I'm starting to wonder if anyone has ever used CanvasSvgDocument.Draw() to draw an SVG loaded from a file. The files now load without crashing, without any change to their internal encoding reference. But - they won't draw. These files - 239 of them - are UTF-8, SVG 1.1, and they display nicely if opened in Edge or any browser. But if I load the file data into an hstring, create a CanvasSvgDocument, and then use CanvasSvgDocument.LoadFromXml to load them, they do not draw when called by CanvasSvgDocument's draw method. Other drawing of shapes, etc. works fine during the drawing session. Here is what could be a hint: if I call GetXml() on one of these SVGs after it is loaded, what is returned is just this:
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"></svg>
That is, the drawing information is not there. Or is this the full extent of what GetXml() is meant to return? That wouldn't seem useful. So perhaps CanvasSvgDocument.LoadFromXml(ResourceCreator, String) doesn't actually work yet?
So I'm back to asking again: is there a way to load a functional CanvasSvgDocument from file data?
My first answer here was wrong: the fault in my code above is that LoadFromXml() is a static method and, as someone pointed out to me elsewhere, I was discarding the returned result. It should be theSvg = CanvasSvgDocument::LoadFromXml(resourceCreator, m_data_string).
Having corrected that, I'm back to the problem of loading UTF-8 data into a method whose argument is a wide-character string. Changing the internal reference to UTF-16 doesn't help after all. Loading the SVG with CanvasSvgDocument::LoadAsync(filestream) works, but if I want to load these without re-accessing the disk I will need to find a way to make a RandomAccessFileStream from a buffer of bytes and then use LoadAsync. I think. Unless there is some other way to make LoadFromXml() work - at present it fails with an invalid argument error.

Setting maxJsonLength property for USQL Newtonsoft JSON extractor

I'm using U-SQL for an Azure Data Lake project. My input is in JSON format, and I'm extracting each line and converting it back to a JSON tuple. The issue is that some string lengths exceed 102400 characters, and the Newtonsoft JSON extractor defaults to a maximum length of 102400, which causes failures on those records. Is it possible to change the maxJsonLength property to a bigger value to handle these large inputs? I found a MaximumLength property in the Newtonsoft.Json.XML file inside the assemblies, but it is also not working.
Any suggestion is highly appreciated.
Please note that U-SQL currently has a string data type size limit of 128kB (in UTF-8 encoded data). So even if you increase your Newtonsoft properties, you may run into that limit. Can you provide more information on what exactly you are doing and what the exact error message is, so we can give you more concrete answers?

What structure should I follow to log exceptions in a text file in JSON format? (Objective-C)

I am trying to log exceptions in a text file in JSON format. The whole file is like a JSON object (an array of a custom model class).
It works fine the first time, but the next time I go to log into the file I have to read it, add the new object into the array, delete the previous contents, and save it again - and obviously that is not a good way to log errors.
Problems
Suppose many errors are getting logged at a single point in time and each one is reading the file, appending to the array, and writing it back to the log file; then many errors won't be logged for sure.
It consumes and wastes too much CPU and RAM.
Please suggest a way to append new objects in the existing file without overwriting it.
Many thanks for any help you may offer.
Per Apple Documentation, you can open a file (output stream) in append mode.
Given you hold a reference to a file output stream outStream, you can use the method below to append data:
[NSJSONSerialization writeJSONObject:myNewObject toStream:outStream options:NSJSONWritingPrettyPrinted error:&error];
However, I would personally use the approach you are already taking - read the data into a mutable object, modify it, and then use NSJSONSerialization to convert it back to data again. Finally, save that data to disk, replacing the original, as this keeps the JSON structure intact.