TensorFlow Dataset `.map` - Is it possible to ignore errors?

Short version:
When using Dataset map operations, is it possible to specify that any 'rows' where the map invocation results in an error are quietly filtered out rather than having the error bubble up and kill the whole session?
Specifics:
I have an input pipeline set up that (more or less) does the following:
Reads a set of file paths of images stored locally (images of varying dimensions)
Reads a suggested set of 'bounding boxes' from a CSV
Produces the set of all image path to bounding box combinations
Reads and decodes the image, then produces the set of 'cropped' images for each of these combinations using tf.image.crop_to_bounding_box
My issue is that there are (very rare) instances where my suggested bounding boxes are outside the bounds of a given image so (understandably) tf.image.crop_to_bounding_box throws an error something like this:
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [width must be >= target + offset.]
which kills the session.
I'd prefer it if these errors were simply ignored and that the pipeline moved onto the next combination.
(I understand that the correct fix for this specific issue would be to commit the time to checking, in the step before, whether each bounding box actually fits within its image's dimensions, and to filter the bad combinations out with a filter operation before they reach the map with the cropping operation. I was wondering whether there is an easy way to just ignore an error and move on to the next case, both for ease of implementation in this specific case and in more general cases.)

For TensorFlow 2:
dataset = dataset.apply(tf.data.experimental.ignore_errors())
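A minimal, self-contained sketch of the behaviour (the toy dataset and the deliberately failing map function are made up for illustration):
import tensorflow as tf

# Toy dataset in which one element makes the map function fail.
dataset = tf.data.Dataset.from_tensor_slices([1.0, 2.0, 0.0, 4.0])

# check_numerics raises InvalidArgumentError for the element that divides by zero.
dataset = dataset.map(lambda x: tf.debugging.check_numerics(1.0 / x, "bad element"))

# ignore_errors drops the offending element instead of killing the iteration.
dataset = dataset.apply(tf.data.experimental.ignore_errors())

for value in dataset:
    print(value.numpy())  # prints 1.0, 0.5, 0.25 -- the element that raised is silently skipped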

For TensorFlow 1.x there is tf.contrib.data.ignore_errors. I've never tried this myself, but according to the docs the usage is simply
dataset = dataset.map(some_map_function)
dataset = dataset.apply(tf.contrib.data.ignore_errors())
It should simply pass the inputs through (i.e. return the same dataset) but drop any elements that raise an error.
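For completeness, here is a rough sketch of the up-front filtering fix the question mentions. It assumes each dataset element is a decoded image plus four crop parameters (the element structure and argument names are made up); combinations whose box does not fit inside the image are dropped before the cropping map ever sees them:
def box_fits(image, offset_h, offset_w, target_h, target_w):
    # Keep only combinations whose bounding box lies inside the image.
    # (Cast the offsets/sizes to tf.int32 if they come out of the CSV as int64.)
    shape = tf.shape(image)
    fits_height = offset_h + target_h <= shape[0]
    fits_width = offset_w + target_w <= shape[1]
    return tf.logical_and(fits_height, fits_width)

dataset = dataset.filter(box_fits)
dataset = dataset.map(
    lambda image, oh, ow, th, tw: tf.image.crop_to_bounding_box(image, oh, ow, th, tw))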

Related

What is the right way to use logic operations (VkLogicOp) in Vulkan

I have a case where I am writing to integer framebuffers, and I want to use logic operations when writing to pixels in the fragment shader. These are the steps I followed:
When creating the logical device, I set VkPhysicalDeviceFeatures.logicOp to VK_TRUE (so this feature is enabled).
When creating the pipeline, I set VkPipelineColorBlendStateCreateInfo.logicOpEnable to VK_TRUE and VkPipelineColorBlendStateCreateInfo.logicOp to VK_LOGIC_OP_COPY.
My framebuffer format is VK_FORMAT_R32G32B32A32_SINT
Once I render the frame, I see that nothing is getting updated in the frame buffer. Is there any step I am missing? (btw, I don't get any validation errors).
Thanks!

How to access net displacements in pyiron

Using pyiron, I want to calculate the mean square displacement of the ions in my system. How do I see the total displacement (i.e. not folded back by periodic boundary conditions) without dumping very frequently and checking when an atom passes over the boundary and gets wrapped?
Try to compare job['output/generic/unwrapped_positions'][-1] and job.structure.positions+job.output.total_displacements[-1]. If they deliver the same values, it's definitely fine both ways. If not, you can post the relevant lines in your notebook here.
I'd like to add a few comments to Jan's answer:
While job['output/generic/unwrapped_positions'] returns the unwrapped positions parsed from the output files, job.output.total_displacements returns the displacement of atoms calculated from each pair of consecutive snapshots. So if an atom moves more than half the box length in any direction between two snapshots, job.output.total_displacements will give wrong coordinates. Therefore, job['output/generic/unwrapped_positions'] is generally more trustworthy, but it is not available for all codes (some codes simply do not provide an output for unwrapped positions).
Moreover, if an interactive job is used, it is possible that job.structure.positions does not return the initial positions, i.e. job.structure.positions+job.output.total_displacements won't be initial positions + displacements.
So, in short, my answer to your question would be rather "Use job['output/generic/unwrapped_positions'] and if it's not available, use job.structure.positions+job.output.total_displacements but be aware of potential problems you might be running into."
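As a quick sanity check, here is a minimal sketch of the comparison Jan suggests. It assumes `pr` is an existing pyiron Project and 'md' is the name of a finished MD job (both hypothetical); the accessors themselves are the ones discussed above:
import numpy as np

job = pr.load('md')  # pr: existing pyiron Project; 'md': hypothetical finished MD job

# Route 1: unwrapped positions parsed directly from the code's output.
unwrapped = job['output/generic/unwrapped_positions'][-1]

# Route 2: initial positions plus accumulated per-snapshot displacements.
reconstructed = job.structure.positions + job.output.total_displacements[-1]

# If the two routes agree, either one can be used for the mean square displacement.
print(np.allclose(unwrapped, reconstructed))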

vkQueueSubmit() call includes a stageMask with VK_PIPELINE_STAGE_GEOMETRY_SHADER_BIT bit set when device does not have geometryShader feature enabled

First of all, I'm a total newbie with Vulkan (I'm using the bindings provided by LWJGL). I know I should paste more code, but I don't even know what would be relevant for now (so don't hesitate to ask for a specific piece of code).
I am trying to do something like this:
Use a compute shader to compute a buffer of pixels.
Use vkCmdCopyBufferToImage to directly copy this array into a framebuffer image.
So, no vertex/fragment shaders for now.
I allocated a compute pipeline and a framebuffer. I have one {Queue/CommandPool/CommandBuffer} for computation and another for rendering.
When I try to submit the graphics queue with:
vkQueueSubmit(graphicQueue, renderPipeline.getFrameSubmission().getSubmitInfo(imageIndex));
I get the following error messages from the validation layers:
ERROR OCCURED: Object: VK_NULL_HANDLE (Type = 0) | vkQueueSubmit() call includes a stageMask with VK_PIPELINE_STAGE_GEOMETRY_SHADER_BIT bit set when device does not have geometryShader feature enabled. The spec valid usage text states 'If the geometry shaders feature is not enabled, each element of pWaitDstStageMask must not contain VK_PIPELINE_STAGE_GEOMETRY_SHADER_BIT' (https://www.khronos.org/registry/vulkan/specs/1.0/html/vkspec.html#VUID-VkSubmitInfo-pWaitDstStageMask-00076)
ERROR OCCURED: Object: VK_NULL_HANDLE (Type = 0) | vkQueueSubmit() call includes a stageMask with VK_PIPELINE_STAGE_TESSELLATION_CONTROL_SHADER_BIT and/or VK_PIPELINE_STAGE_TESSELLATION_EVALUATION_SHADER_BIT bit(s) set when device does not have tessellationShader feature enabled. The spec valid usage text states 'If the tessellation shaders feature is not enabled, each element of pWaitDstStageMask must not contain VK_PIPELINE_STAGE_TESSELLATION_CONTROL_SHADER_BIT or VK_PIPELINE_STAGE_TESSELLATION_EVALUATION_SHADER_BIT' (https://www.khronos.org/registry/vulkan/specs/1.0/html/vkspec.html#VUID-VkSubmitInfo-pWaitDstStageMask-00077)
I tried changing VkSubmitInfo.pWaitDstStageMask to different values (like VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT, VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT...) but nothing changed.
So, what would be the best pWaitDstStageMask for my use case?
OK, I found my problem:
pWaitDstStageMask must be an array of the same size as pWaitSemaphores.
I had only provided 1 stage mask for 2 semaphores, so the second entry was read from memory past the end of my array, which is where the unexpected geometry/tessellation stage bits came from.

How to cache data during the first epoch correctly (Tensorflow, dataset)?

I'm trying to use the cache transformation for a dataset. Here is my current code (simplified):
dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=1)
dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=5000, count=1))
dataset = dataset.map(_parser_a, num_parallel_calls=12)
dataset = dataset.padded_batch(
    20,
    padded_shapes=padded_shapes,
    padding_values=padding_values
)
dataset = dataset.prefetch(buffer_size=1)
dataset = dataset.cache()
After the first epoch, I received the following error message:
The calling iterator did not fully read the dataset we were attempting
to cache. In order to avoid unexpected truncation of the sequence, the
current [partially cached] sequence will be dropped. This can occur if
you have a sequence similar to dataset.cache().take(k).repeat().
Instead, swap the order (i.e. dataset.take(k).cache().repeat())
Then, the code proceeded and still read data from the hard drive instead of the cache. So, where should I place dataset.cache() to avoid the error?
Thanks.
The implementation of the Dataset.cache() transformation is fairly simple: it builds up a list of the elements that pass through it as you iterate completely over it the first time, and it returns elements from that list on subsequent attempts to iterate over it. If the first pass only reads part of the data, then the list is incomplete and TensorFlow doesn't try to use the cached data, because it doesn't know whether the remaining elements will be needed, and in general it might need to reprocess all the preceding elements to compute the remaining elements.
If you modify your program to consume the entire dataset, iterating over it until tf.errors.OutOfRangeError is raised, the cache will contain a complete list of the elements in the dataset, and it will be used on all subsequent iterations.
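To make the placement concrete, here is one possible ordering using the same calls as the question; this is an illustration rather than the only valid arrangement. cache() sits right after the expensive map and before shuffling, repeating, batching and prefetching, so the first complete epoch fills the cache and subsequent epochs read from it instead of the TFRecord files:
dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=1)
dataset = dataset.map(_parser_a, num_parallel_calls=12)
dataset = dataset.cache()  # caches the parsed, un-batched examples
dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=5000, count=1))
dataset = dataset.padded_batch(
    20,
    padded_shapes=padded_shapes,
    padding_values=padding_values
)
dataset = dataset.prefetch(buffer_size=1)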

GPUImage capturePhotoAsImageProcessedUpToFilter only working for the last filter

In my application I am using a stack of 3 filters and adding that to a stillCamera. I am trying to take the image from filter1; it's an empty filter, so it returns the actual image.
[stillCamera addTarget:filter1];
[filter1 addTarget:filter2];
[filter2 addTarget:filter3];
[filter3 addTarget:cameraView];
When I call capturePhotoAsImageProcessedUpToFilter, it only ever returns an image when I pass it filter3 like below.
[stillCamera capturePhotoAsImageProcessedUpToFilter:filter3 with...
The two examples below never return images
[stillCamera capturePhotoAsImageProcessedUpToFilter:filter1 with...
[stillCamera capturePhotoAsImageProcessedUpToFilter:filter2 with...
Am I doing something wrong? As a fix I am using:
[filter1 imageFromCurrentlyProcessedOutput]
Is there any difference between calling capturePhotoAsImageProcessedUpToFilter and imageFromCurrentlyProcessedOutput?
I think this is a side effect of a memory conservation optimization I tried to put in place last year. For very large images, like photos, what I try to do is destroy the framebuffer that backs each filter as the filtered image progresses through the filter chain. The idea is to try to minimize memory spikes by only having one or two copies of the large image in memory at any point in time.
Unfortunately, that doesn't seem to work as intended much of the time, and because the framebuffers are deleted as the image progresses, only the last filter in the chain ends up having a valid framebuffer to read from. I'm probably going to yank this optimization out at some point in the near future in favor of an internal framebuffer and texture cache, but I'm not sure what can be done in the meantime to read from these intermediary filters in a chain.