Is tf.identity() creating a full deep copy? - tensorflow

I'm often using the following pattern for managing control flow:
with tf.get_default_graph().control_dependencies([c_op]):
    h_state = tf.identity(h_state)
However, I'm concerned that tf.identity() might copy the data passed to it, which is not what I want. Can somebody confirm whether or not it creates a copy?

The implementation of the tf.identity() operation will forward its input to its output without making a deep copy. However, if the tf.identity() operation is pinned to a different device from the operation that produces its input, a deep copy will occur.
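For illustration, here is a minimal sketch of the pattern from the question, assuming TF 1.x graph mode (the concrete variable and the extra op are just illustrative, not taken from the question's code):

import tensorflow as tf  # TF 1.x graph mode

x = tf.Variable([1.0, 2.0, 3.0], name="h_state")
c_op = tf.assign_add(x, [1.0, 1.0, 1.0])  # some op that must run first

# Same device as the producer of x: identity simply forwards its input,
# so no deep copy of the underlying buffer is made.
with tf.get_default_graph().control_dependencies([c_op]):
    h_state = tf.identity(x)

# If the identity were instead pinned to a different device, e.g.
#   with tf.device("/device:GPU:0"):
#       h_state = tf.identity(x)
# the forwarding would become a device-to-device copy.

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(h_state))  # runs c_op first, then returns the updated value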

Related

Zero derivative calculation during optimization (no impact to objective)

I am writing an optimization code for a finite-difference radiation solver model. I started to use "src_indices" for connecting parameters rather than promoting all the variables. But after I changed the connection style, the optimization does not calculate derivatives, gives a "no impact to objective" error, and terminates successfully after the first iteration. I could not find any clue for locating the error in the logs (the bug may be in a completely different place).
Is there any suggestion where I can start?
I uploaded the code to GitHub https://github.com/TufanAkba/opt_question
The first thing that comes to mind when you mention "design variables have no impact on objective" is that there may be a missing connection. Since this behavior only started after you changed the connection style, I think this is even more likely.
There are a couple of tools you can use to diagnose this. The first is the n2 viewer, which you can launch by typing the following at your command prompt:
openmdao n2 receiver_opt.py
This will launch a browser window that contains a graphical model viewer which is described in detail here. You can use this to explore the structure of your model. To find unconnected inputs in your model, look for any input blocks that are colored orange. These are technically connected to a hidden IndepVarComp called _auto_ivc, and will include design variables, which are set by the optimizer. You will want to look for any that should be connected to other component outputs.
OpenMDAO also has a connection viewer that just shows connections.
openmdao view_connections receiver_opt.py
You can use this tool to just focus on the connections. It is described here. If you choose to use this, just filter to see any connection to _auto_ivc in the source output string to see the unconnected inputs.
If you reach this point, and are satisfied that all the connections are correct, then there are a couple of other possibilities:
Are all of your src_indices correct? Maybe some of them are an empty set, or maybe some create a "degenerate" case. For example, if you have a set of cascading components that each multiply an incoming vector by a diagonal matrix, and if your indices are [0] in one connection, and [4] in another connection, then you've effectively severed the entire model. None of our visualization tools can pick that up, and you will need to inspect the indices manually.
It could also be a derivative problem, though what you describe sounds like connections. In that case, I recommend using check_partials to look for any missing or incorrect derivatives.
Are you computing any derivatives using complex step? It is possible that you are losing the complex part of the calculation through a complex-unsafe operation. Checking your derivatives against 'fd' can help to find these.
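As a rough, self-contained illustration of both checks (the component and connection here are made up for the example, not taken from the linked repository):

import numpy as np
import openmdao.api as om

class Doubler(om.ExplicitComponent):
    # Toy component: y = 2 * x, with a constant analytic partial.
    def setup(self):
        self.add_input('x', val=np.zeros(3))
        self.add_output('y', val=np.zeros(3))
        self.declare_partials('y', 'x', val=2.0 * np.eye(3))

    def compute(self, inputs, outputs):
        outputs['y'] = 2.0 * inputs['x']

prob = om.Problem()
prob.model.add_subsystem('a', Doubler())
prob.model.add_subsystem('b', Doubler())

# src_indices controls which entries of a.y feed b.x; if the chosen indices
# pick values that the design variables never influence, the objective
# becomes insensitive to them even though every connection "exists".
prob.model.connect('a.y', 'b.x', src_indices=[0, 1, 2])

prob.setup()
prob.run_model()

# Verify the analytic partials against finite differences.
data = prob.check_partials(method='fd', compact_print=True)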

Implement data generator in federated training

(I have also posted this question at https://github.com/tensorflow/federated/issues/793, and perhaps it belongs here as well!)
I have adapted my own data and model to the federated interfaces and the training converges. But I am confused by one issue: in an image classification task, the whole dataset is extremely large and can neither be stored in a single federated_train_data nor imported into memory in one go. So I need to load the dataset from the hard disk into memory in batches on the fly and use Keras model.fit_generator instead of model.fit during training, which is the approach people use to deal with large data.
I suppose that in the iterative_process shown in the image classification tutorial, the model is fitted on a fixed set of data. Is there any way to adjust the code to let it fit a data generator? I have looked into the source code but am still quite confused. I would be incredibly grateful for any hints.
Generally, TFF considers the feeding of data to be part of the "Python driver loop", which is a helpful distinction to make when writing TFF code.
In fact, when writing TFF, there are generally three levels at which one may be writing:
TensorFlow defining local processing (i.e., processing that will happen on the clients, or on the server, or in the aggregators, or at any other placement one may want, but only a single placement).
Native TFF defining the way data is communicated across placements. For example, writing tff.federated_sum inside of a tff.federated_computation decorator; writing this line declares "this data is moved from clients to server, and aggregated via the sum operator".
Python "driving" the TFF loop, e.g. running a single round. It is the job of this final level to do what a "real" federated learning runtime would do; one example here would be selecting the clients for a given round.
If this breakdown is kept in mind, using a generator or some other lazy-evaluation-style construct to feed data into a federated computation becomes relatively simple; it is just done at the Python level.
One way this could be done is via the create_tf_dataset_for_client method on the ClientData object; as you loop over rounds, your Python code can select from the list of client_ids, then you can instantiate a new list of tf.data.Datasets and pass them in as your new set of client data. An example of this relatively simple usage would be here, and a more advanced usage (involving defining a custom client_datasets_fn which takes client_id as a parameter, and passing it to a separately-defined training loop) would be here, in the code associated with this paper.
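As a rough sketch of that Python driver loop (assuming an iterative_process and a tff.simulation.ClientData object called train_data already exist, as in the image-classification tutorial; the names and round/client counts are just placeholders):

import random

NUM_CLIENTS = 10
NUM_ROUNDS = 100

state = iterative_process.initialize()

for round_num in range(NUM_ROUNDS):
    # Python-level client selection; nothing is loaded into memory here.
    sampled_ids = random.sample(train_data.client_ids, NUM_CLIENTS)

    # Each element is a tf.data.Dataset "recipe"; data is streamed from disk
    # only when the federated computation actually iterates over it.
    federated_train_data = [
        train_data.create_tf_dataset_for_client(cid) for cid in sampled_ids
    ]

    state, metrics = iterative_process.next(state, federated_train_data)
    print('round {:2d}, metrics={}'.format(round_num, metrics))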
One final note: instantiating a tf.data.Dataset does not actually load the dataset into memory; the dataset is only loaded in when it is iterated over. One helpful tip I have received from the lead author of tf.data.Dataset is to think of tf.data.Dataset more as a "dataset recipe" than a literal instantiation of the dataset itself. It has been suggested that perhaps a better name would have been DataSource for this construct; hopefully that may help the mental model on what is actually happening. Similarly, using the tff.simulation.ClientData object generally shouldn't really load anything into memory until it is iterated over in training on the clients; this should make some nuances around managing dataset memory simpler.

HOST_WRITE useful or validation layers wrong?

In the specs, regarding HOST_WRITE_BIT, it is written:
For host writes to be seen by subsequent command buffer operations, a pipeline barrier from a source of VK_ACCESS_HOST_WRITE_BIT and VK_PIPELINE_STAGE_HOST_BIT to a destination of the relevant device pipeline stages and access types must be performed. Note that such a barrier is performed implicitly upon each command buffer submission, so an explicit barrier is only rarely needed
However, when you transition an image (through a vkCmdPipelineBarrier, so in a command buffer) from the PREINITIALIZED layout with a srcAccessMask of 0 instead of HOST_WRITE_BIT, you get an error:
validation layer: Source AccessMask 0 [None] must have required access bit 16384 [VK_ACCESS_HOST_WRITE_BIT] when layout is VK_IMAGE_LAYOUT_PREINITIALIZED, unless the app has previously added a barrier for this transition.
Is there an error in the specs? In the validation layers? Is the barrier we are talking about a pure execution barrier rather than a memory one? Am I missing something?
My private opinion is: validation layers bug.
It simply checks layout vs access flags and does not seem to be aware of this corner case:
https://github.com/KhronosGroup/Vulkan-LoaderAndValidationLayers/blob/master/layers/core_validation.cpp#L9005
There's also the case where you reconsider and do not host-write to the VK_IMAGE_LAYOUT_PREINITIALIZED image at all (and thus need no barrier), isn't there?
I believe the layer message is a WARNING, not an ERROR. That may mean it is "only" a heuristic and some false positives are expected (until they improve the layers, which seems possible, but not so trivial for this case).
They only recently corrected the handling of 0 access flags for presentation, so it is not far-fetched that they would (with a similar mindset) overlook something like this in the layers.
I would report an issue there. They do not bite, and the worst that can happen is that some Khronos insider more knowledgeable than me will explain why you are wrong.
That being said, perhaps the VK_PIPELINE_STAGE_HOST_BIT is unnecessary too (and VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT should suffice)?

Sharing tensorflow model between processes

I have several processes running on a server, one of which is training a model in tensorflow. Periodically, I want the trainer to send the current model to the other processes. The way I do this now is with the usual Saver class that can save to and restore from disk.
However, I think this form of IPC is rather inefficient, and it is potentially causing filesystem lockups on the server. If there were a way to serialize the variables into some blob, I could send that over a zmq broadcast pipe, but I haven't found this in the docs.
Alternatively, distributed tensorflow is probably up to the task, but I don't think I need something so complicated.
You could pre-share the architecture, then use tf.get_collection(tf.GraphKeys.VARIABLES) at whatever interval of steps you like, and run it to get the values; then you can use variable.assign at the other end to load the values into the appropriate variables.
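A rough sketch of that idea in TF 1.x (the zmq PUB/SUB endpoint and the idea of building the assign ops once against placeholders are my own assumptions, not a documented recipe):

import pickle
import zmq
import tensorflow as tf

# Trainer side: pull the current values out of the session and broadcast them.
def broadcast_weights(sess, socket):
    variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
    values = sess.run(variables)  # a list of numpy arrays
    blob = pickle.dumps({v.name: val for v, val in zip(variables, values)})
    socket.send(blob)

# Consumer side: build the assign ops once, then feed new values repeatedly
# (rebuilding assign ops every update would keep growing the graph).
def make_loader(sess):
    variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
    placeholders = {v.name: tf.placeholder(v.dtype, v.shape) for v in variables}
    assign_ops = [v.assign(placeholders[v.name]) for v in variables]

    def load(blob):
        values = pickle.loads(blob)
        feed = {placeholders[name]: val for name, val in values.items()}
        sess.run(assign_ops, feed_dict=feed)

    return load

# Usage sketch: the trainer binds a PUB socket and calls broadcast_weights
# every N steps; each consumer connects a SUB socket and calls the loader
# whenever a new blob arrives.
# ctx = zmq.Context()
# pub = ctx.socket(zmq.PUB)
# pub.bind("tcp://*:5556")
# broadcast_weights(sess, pub)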

Is it possible to read from a VBO?

I'm trying to make an OpenGL renderer that mashes various shapes into one large mesh and stores these in two VBOs, one GL_ARRAY_BUFFER and one GL_ELEMENT_ARRAY_BUFFER. I'm aiming for it to work on both OpenGL ES 2 and OpenGL 3.2 core. I am currently trying to find the best way to handle deleting shapes from within this mesh and my current approach is to periodically rebuild the entire thing, possibly on a background thread.
The problem is that in order to rebuild the new and clean mesh, I need access to the vertices / indices that have been written to the buffers using glMapBuffer. According to the documentation for GL_OES_mapbuffer, WRITE_ONLY_OES is the only acceptable parameter for 'access'.
So, I don't think the data pointed at there is reliable to read from in order to create my new buffers. I know there are other functions in GL Core that allow you to copy the buffer data, but these also seem to be missing.
Can anyone verify that this is not possible on ES 2.0 or give some approach for achieving buffer reading? My current solution is to keep a shadow copy of all the data, which is obviously not ideal.
I think that keeping a shadow copy of the GPU data in main memory is much better than reading that data back from GPU memory. It is recommended to discard previous data before using glMapBuffer anyway. Read this for more information (it will not give you a direct answer to your question, but it might be useful).