Turi Create rescales and moves my object annotations coordinates - object-detection

I created and merged an images SFrame with an Annotations SFrame.
I have verified that the coordinates of the annotation boxes matches the location of the features measured in Photoshop.
However the models I create are non-functional, so I explored the merged data set with
data['image_with_ground_truth'] =
tc.object_detector.util.draw_bounding_boxes(data['image'], data['annotations'])
and I find that all the annotations are squashed in the top-left corner in Turi Create despite them actually being widely distributed on the source image as in the second image. The annotations list column shows the coordinates get read correctly into TC, but are mapped badly into what the model sees as bounding boxes.
Where should I look to find the scaling problem in Turi Create??

the version of ml-annotate I was using output coordinates with different scale factors for each image in set, some close, some off by as much as 3.3x

Related

How do I render to multiple 3d targets in Vulkan?

I have some legacy DX11 code that renders to multiple 3d render targets. Destination target is passed via SV_TARGETxx and the slice is set via SV_RenderTargetArrayIndex in GS. Is there any way to do the same in Vulkan?
My plan is to create individual view for each slice of each 3d target and pass them all together as attachments to a single frame buffer, then in GS I can have something like gl_Layer = sliceNo + targetOffsets[xx]. Is there any better solution?
In Vulkan, the GS SV_RenderTargetArrayIndex is called Layer in SPIR-V or gl_Layer in GLSL. It behaves the same as in D3D. You create one view per 3D target, and attach that to the framebuffer. The Layer output from the GS will say which layer (of all the targets) the output primitive is drawn to.
In Vulkan there's no "true" 3D framebuffer attachments, in the sense that after projection to screen space coordinates everything exists in a 2D plane. So attachment image views can have 2D_ARRAY dimensionality, but not 3D. The Image and image view parameter compatibility requirements table says that given a 3D image, you can create a 2D_ARRAY image view with layerCount >= 1. Note that you have to create the image with the VK_IMAGE_CREATE_2D_ARRAY_COMPATIBLE_BIT flag.
So if you want to have N 3D render target images:
Create your N 3D images, with the VK_IMAGE_CREATE_2D_ARRAY_COMPATIBLE_BIT flag.
Create one image view for each image, with VK_IMAGE_VIEW_TYPE_2D_ARRAY and layerCount equal to the number of slices you want to be able to render to.
Create a VkRenderPass with one VkAttachmentDescription per 3D render target, plus whatever others you need for depth/stencil, resolve target, etc.
Create a VkFrameBuffer based on that VkRenderPass, and pass your image views in the VkFrameBufferCreateInfo::pAttachments array. Set VkFramebufferCreateInfo::layerCount to the number of layers/slices you want to be able to render to.
[Edit: Below paragraph can be ignored based on first comment. Leaving it for transparency.]
I'm confused what you're trying to do with SV_Target[n]. In both D3D and Vulkan, if you've got multiple render targets / color attachments, the fragment shader will write to all of them -- if your fragment shader doesn't provide a value for a bound target, the value written is undefined. So SV_Target[n] is used to tell which shader output variables go to which target, but they don't let you write to some without writing to others. Vulkan works similarly, using output variables gl_FragData[n] in GLSL.
If you're talking about having 1 draw call rendered from multiple points of view (but otherwise using the same pipeline) then you want VK_KHR_multiview. This is an extension in Vulkan 1.0, but core in 1.1.
There's an example of it's usage here and the corresponding shader functionality is here. It functions similar to what you seem to describe. You attach multiple images from a texture array to a single framebuffer ("rendertarget" in D3D) and then in the vertex shader you can determine which layer you're rendering to via the gl_ViewIndex variable. There's no need for a geometry shader with this approach.

How to plot 2D vector field in single picture using Hue and Brightness method with Digital Micrograph script?

I would like to to plot 2D vector field in a single picture using the Hue & brightness method, i.e., Hue to direction (or say, phase), brightness to magnitude.
Such method is often used to visualize e.g., magnetic domains, vortex etc which are reconstructed from Lorenz microscopy.
As input, I have two images of size 1024*1024, pixels contain the magnitude of X and Y component of the vector field.
Since DM does not support native HSL color scheme, I think one should first uses a group of self defined functions to convert HSL to RGB...
You can only use RGB images in DigitalMicrograph, so you will have to do the conversion from HSB to RGB in your script code, and then create the according RGB image.
Luckily, there is a demonstration script on the Gatan script resources webpage which does exactly that! You can basically use the script as it is shown there.
Gatan Script Resources
Link to script-file:
Display as HSB
Note, the script uses complex images as input - just as a convenient container to combine two images into a single one. The test function demonstrates this though.

Simple algorithm for tracking a rectangular blob

I have created an experimental fast rectangular object tracking system; it will be used for headtracking and controllling objects in 3D engine (Ogre3D).
For now I am able to show to the webcam any kind of bright colored rectangle (text markers are good objects) and system registers basic properties of this object (hue/value/lightness and initial width and height in 0 degrees rotation).
After I have registered the trackable object, I do some simple frame processing to create grayscale probabilty map.
So now I have 2 known things:
1) 4 corners for the last object position (it's always a rectangle but it may be rotated)
2) a pretty rectangular (but still far from perfect) blob which is the brightest in the frame. I can get coordinates of any point of the blob without problems, point detection is stable enough.
I can find a bounding rectangle of the object without problems, but I have a problem with detecting the object corners themselves.
I need the simplest possible (quick&dirty would be great) algorithm to scan the image starting with some known coordinates (a point inside the blob) and detect new 4 x,y coordinates of a "blobish" rectangle corners (not corners of a bounding box but corners of the rectangular blob itself).
Ready-to-use C++ function would be awesome, but somehow google doesn't like me today :(
I think that it would be overkill to use some complicated function form OpenCV library just to extract 4 points of a single rectanglular blob. But if you know a quick and efficient way how to do it using OpenCV (it must be real-time and light on CPU because I'll run the 3D engine at the same time) then I would be really grateful.
You can apply Hough transform on segmented image to detect lines. Using detected lines you can calculate their intersection to find the corner coordinates of the blob.

resolution from a PDFPage?

I have a PDF document that is created by creating NSImages with size in 72dpi pts, each has a single representation which is measured in pixels. I then put these images into PDFPages with initWithImage, and then save the document.
When I open the document, I need the resolution of the original image. However, all of the rectangles that PDFPage gives me are measured in points, not pixels.
I know that the information is in there, and I suppose I can try to parse the PDF data myself, by going through the voyeur.app example... but that's a WHOLE lot of effort to do something that should be pretty normal...
Is there an easier way to do this?
Added:
I've tried two techniques:
get the PDFRepresentation data from
the page, and use it to make a new
NSImage via initWithData. This
works, however, the image has both
size and pixel size in 72dpi.
Draw the PDFPage into a new
off-screen context, and then get a
CGImage from that. The problem is
that when I'm making the context, it
appears that I need to know the size
in pixels already, which defeats
part of the purpose...
There are a few things you need to understand about PDF:
The PDF Coordinate system is in
points (1/72 inch) by default.
The PDF Coordinate system is devoid of resolution. (this is a white lie - the resolution is effectively the limits of 32 bit floating point numbers).
Images in PDF do not inherently have any resolution attached to them (this is a white lie - images compressed with JPEG2000 still have resolution in their embedded metadata).
An Image in PDF is represented by an object that contains a series of samples that are stored using some compression filter.
Image objects can be rendered on a page multiple times at any size.
Since resolution is defined as the number of pixels (or samples) per unit distance, resolution only means something for a particular rendering of an image on a page. So if you are rendering a particular image to fill the page, then the resolution in dpi is
xdpi = image_width / (pageWidthInPoints / 72.0);
ydpi = image_height / (pageHeightInPoints / 72.0);
If the image is not being rendered to the full size of the page, a complete solution is very tricky. Adobe prescribes that images should be treated as being 1x1 and that you change the page transformation matrix to determine how to render them. The means that you would need the matrix at the point of rendering the image and you would need to push the points (0,0), (0, 1), (1,0) through the matrix. The Euclidean distance between (0, 0)' and (1, 0)' will give you the width in points and the Euclidean distance between (0, 0)' and (0, 1)' will give you the height in points.
So how do you get that matrix? Well, you need the content stream for the page and you need to write a PDF interpreter that can rip the content stream and keep track of changes to the CTM. When you reach your image, you extract the CTM for it.
To do that last step should be about an hour with a decent PDF toolkit, provided you are familiar with the toolkit. Writing that toolkit is several person years of work.

Is it possible to animate markers in ArcMap?

I'm completely new to ArcGIS and ArcMap, but someone suggested this program to me for a project I'm working on.
I would like to animate individual entities on a map, and was wondering if it is possible to do so in ArcMap. I asked this earlier here and a member directed me to a tutorial on animating in ArcGIS. The animation in the guide was over a map spread (ie. each pixel on the map displays, say, a different color to indicate population data in the area). However I realized that if I zoom in a lot, eventually the image will degenerate into pixels, which is why I need an actual object to mark a certain point. I checked some online tutorials and it seems like we can place markers on the map. Can someone tell me if it is possible to animate these markers (for example via a for-loop)? And if so, could you point me in a direction where to start?
Thanks in advance!
You can animate layers in ArcMap is the short answer. Its not as simple as using the timeline feature in Google Earth for example though. But then ArcMap is much more than just a visualization tool.
This help page on the ESRI web help looks like a good place to start.
I'm not 100% sure what you mean by the image degenerates into pixels. Are you saying that the markers were single points in the layer. Unlike Google Earth you are not confined to simply plotting points on the map. You can draw completely arbitrary shapes in ArcMap, which can be defined to cover actual areas of the map, so when you zoom-in the shape gets larger.
The way you need to load data into ArcMap to produce an animation isn't too simple. There might be other ways to do this, but the way I know of is to generate a NetCDF file. This file contains a 3D matrix of layer data, where each layer is separated through time. Because you generate a matrix, you are effectively placing a raster image over the map. Thus if you want to cover a large area, each matrix becomes large, and you multiply that by the number of time slices you wish to animate over.
Once you have a NetCDF file with your data in however, getting ArcMap to animate it and produce say a .avi file is pretty simple.
You could try just loading some of the example NetCDF datasets into ArcMap to see how/if they will work to get you started.
Hope that helps.
The upcoming v10 will have better time-aware capabilities, which will allow for animation.