Accessing intermediate results from a tensorflow graph - tensorflow

If I have a complex calculation of the form
tmp1 = tf.fun1(placeholder1,placeholder2)
tmp2 = tf.fun2(tmp1,placeholder3)
tmp3 = tf.fun3(tmp2)
ret = tf.fun4(tmp3)
and I calculate
ret_vals = sess.run(ret, feed_dict={placeholder1: vals1, placeholder2: vals2, placeholder3: vals3})
fun1, fun2 etc are possibly costly operations on a lot of data.
If I run to get ret_vals as above, is it possible to later or at the same time access the intermediate values as well without re-running everything up to that value? For example, to get tmp2, I could re-run everything using
tmp2_vals = sess.run(tmp2, feed_dict={placeholder1: vals1, placeholder2: vals2, placeholder3: vals3})
But this looks like a complete waste of computation? Is there a way to access several of the intermediate results in a graph after performing one run?
The reason why I want to do this is for debugging or testing or logging of progress when ret_vals gets calculated e.g. in an optimization loop. Every step where I run the ret_vals calculations is costly but I want to see some of the intermediate results that were calculated.
If I do something like
tmp2_vals, ret_vals = sess.run([tmp2, ret], ...)
does this guarantee that the graph will only get run once (instead of one time for tmp2 and one time for ret) like I want it?

Have you looked at tf.Print? It is an identity op that prints as a side effect. You can insert it into your graph right after tmp2 to see its value. Note that by default it prints only the first few entries of each tensor (3); you can change that number with the summarize argument. The first_n argument is something different: it limits how many times the op logs at all.


X and Y inputs in LabVIEW

I am new to LabVIEW and I am trying to read a code written in LabVIEW. The block diagram is this:
This is the program to input x and y functions into the voltage input. It is meant to feed an input voltage in different forms (sine, heart shape, etc.) to the x and y axes of a fast-steering mirror or galvano mirror.
x and y function controls are for inputting a formula for a function, and then we use "evaluation single value" function to input into a daq assistant.
I understand that { 2*(|-Mpi|)/N }*i + -Mpi*pi goes into the x value. However, I don't understand why we use this kind of formula. Why do we need to assign a negative value and then take the absolute value of -M*pi? I also don't understand why we need to divide by N and then multiply by i. And finally, why do we need to add -Mpi again? I would really appreciate any hints about this.
This is just a complicated way to write the code/formula. Given what the code looks like (unnecessary wire bends, duplicate loop-input-tunnels, hidden wires, unnecessary coercion dots, failure to use appropriate built-in 'negate' function) not much care has been given in writing it. So while it probably yields the correct results you should not expect it to do so in the most readable way.
To answer your specific questions:
Why we need to assign a negative value and then do the absolute value
We don't. We can just move the negation immediately before the last addition or change that to a subtraction:
{ 2*(|Mpi|)/N }*i - Mpi*pi
And as @yair pointed out: We are not assigning a value here, we are basically flipping the sign of whatever value the user entered.
Why we need to divide by N and then multiply by i
This gives you a fraction between 0 and 1, no matter how many steps you do in your for-loop. Think of N as a sampling rate. I.e. your mirrors will always do the same movement, but a larger N just produces more steps in between.
Why need to add -Mpi again
I would strongly assume this is some kind of quick-and-dirty workaround for a bug that has not been fixed properly. Looking at the code it seems this +Mpi*pi has been added later on in the development process. And while I don't know what the expected values are I would believe that multiplying only one of the summands by Pi is probably wrong.
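To make the ramp concrete, here is a small Python sketch (not LabVIEW) of the loop as I read it. It assumes both terms use the same amplitude a, which stands in for whatever the Mpi control evaluates to, and the function name ramp is made up for illustration. Under that assumption the last term simply shifts the ramp so that it starts at -a and climbs towards +a.

```python
import math

def ramp(a, n):
    """Return x_i = (2*|-a|/n)*i + (-a) for i = 0..n-1, like a For Loop with N = n."""
    step = 2 * abs(-a) / n          # width of one step; abs() keeps it positive
    return [step * i + (-a) for i in range(n)]

xs = ramp(math.pi, 8)
print(xs[0])    # first sample sits at -a
print(xs[-1])   # last sample is one step short of +a
```

A larger n gives the same sweep with finer steps, which matches the "sampling rate" reading of N above.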

Best way to parallelize multi-table function in Python (using Pandas)

I have this function below that iterates through every row of a data frame (using pandas apply) and determines what values are valid from a prediction-probability matrix (L2) by referencing another data frame (GST) to obtain the valid values for a given row. The function just returns the row back with the maximum valid probability assigned to the previously blank value for that row (Predicted Level 2) in the data frame passed to the function (test_x2)
Not a terribly complex function and it works fine on smaller datasets but when I scale to like 3-5 million records, it starts to take way too long. I tried using the multiprocessing module as well as dask/numba but nothing was able to improve the runtime (not sure if this is just due to the fact the function is not vectorizable).
My question is two fold:
1) Is there a better way to write this? (I'm guessing there is)
2) If not, what parallel computing strategies could work with this type of function? I've already tried a number of different python options but I'm just leaning more towards running the larger datasets on totally separate machines at this point. Feel free to provide any suggested code to parallelize something like this. Thanks in advance for any guidance provided.
l2 = MNB.predict_proba(test_x)
l2_classes = MNB.classes_
L2 = pd.DataFrame(l2, columns = MNB.classes_)
test_x2["Predicted Level 2"] = ""
def predict_2(row):
    s = row["Predicted Level 1"]
    s = GST.loc[s,:]
    s.reset_index(inplace = True)
    Valid_Level2s = s["GST Level 2"].tolist()
    # Row label assumed to be row.name (the index expression was lost in the post)
    p2 = L2.ix[row.name, Valid_Level2s]
    max2 = p2.idxmax(axis = 1)
    output = row["Predicted Level 2"] = max2
    return row
test_x2 = test_x2.apply(predict_2, axis = 1)
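One possible way to avoid the row-wise apply is to mask the whole probability matrix at once and take a single argmax. The sketch below is only an illustration under assumptions: the toy GST, L2 and test_x2 values are invented, GST is assumed to be indexed by level 1 with a "GST Level 2" column, and the rows of L2 are assumed to line up positionally with the rows of test_x2 (as they would if both come from the same test set).

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the objects in the question (all values invented).
GST = pd.DataFrame({"GST Level 2": ["a1", "a2", "b1"]}, index=["A", "A", "B"])
L2 = pd.DataFrame(
    [[0.2, 0.1, 0.7], [0.5, 0.3, 0.2], [0.1, 0.6, 0.3]],
    columns=["a1", "a2", "b1"],
)
test_x2 = pd.DataFrame({"Predicted Level 1": ["A", "B", "A"]})

# Boolean table: one row per level-1 value, one column per level-2 class.
valid = pd.crosstab(GST.index.to_numpy(), GST["GST Level 2"].to_numpy()) > 0
valid = valid.reindex(columns=L2.columns, fill_value=False)

# One mask row per record, aligned by position with the rows of L2.
mask = valid.loc[test_x2["Predicted Level 1"].to_numpy()].to_numpy()

# Invalid classes get -inf so they can never win the argmax.
probs = np.where(mask, L2.to_numpy(), -np.inf)
test_x2["Predicted Level 2"] = L2.columns.to_numpy()[probs.argmax(axis=1)]

print(test_x2["Predicted Level 2"].tolist())  # ['a1', 'b1', 'a2']
```

The only Python-level loop left is inside crosstab over the (small) GST table; everything per record is array work, so it should scale much better than apply with axis=1.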

How to cache data during the first epoch correctly (Tensorflow, dataset)?

I'm trying to use the cache transformation for a dataset. Here is my current code (simplified):
dataset =, num_parallel_reads=1)
dataset = dataset.apply(, count=1))
dataset =, num_parallel_calls=12)
dataset = dataset.padded_batch(
dataset = dataset.prefetch(buffer_size=1)
dataset = dataset.cache()
After the first epoch, I received the following error message:
The calling iterator did not fully read the dataset we were attempting
to cache. In order to avoid unexpected truncation of the sequence, the
current [partially cached] sequence will be dropped. This can occur if
you have a sequence similar to dataset.cache().take(k).repeat().
Instead, swap the order (i.e. dataset.take(k).cache().repeat())
Then, the code proceeded and still read data from the hard drive instead of the cache. So, where should I place dataset.cache() to avoid the error?
The implementation of the Dataset.cache() transformation is fairly simple: it builds up a list of the elements that pass through it as you iterate over it completely the first time, and it returns elements from that list on subsequent attempts to iterate over it. If the first pass covers only part of the data, the list is incomplete and TensorFlow doesn't try to use the cached data, because it doesn't know whether the remaining elements will be needed, and in general it might need to reprocess all the preceding elements to compute the remaining ones.
By modifying your program to consume the entire dataset, and iterate over it until tf.errors.OutOfRangeError is raised, the cache will have a complete list of the elements in the dataset, and it will be used on all subsequent iterations.
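The rule is easy to see in a toy model. The class below is plain Python and only mimics the behaviour described above; it is not how TensorFlow implements cache(), and the names ToyCache and source_reads are made up for the illustration.

```python
class ToyCache:
    """Toy model of the caching rule described above; not TensorFlow code."""

    def __init__(self, source_factory):
        self._source_factory = source_factory  # callable returning a fresh iterator
        self._cache = None                     # complete list once a full pass finishes
        self.source_reads = 0                  # how many elements came from the source

    def __iter__(self):
        if self._cache is not None:
            yield from self._cache             # later passes are served from memory
            return
        seen = []
        for item in self._source_factory():
            self.source_reads += 1
            seen.append(item)
            yield item
        self._cache = seen                     # only reached when the pass ran to the end


ds = ToyCache(lambda: iter(range(5)))
for x in ds:
    if x == 2:
        break              # partial pass: the partial list is dropped
list(ds)                   # full pass: reads the source again, then caches
list(ds)                   # served from the cache, no source reads
print(ds.source_reads)     # 3 + 5 = 8
```

A consumer that stops early never lets the generator reach the line that stores the list, which is the toy equivalent of the "partially cached sequence will be dropped" message.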

Can I run a GA to optimize wavelet transform?

I am running a wavelet transform (cmor) to estimate the damping and frequencies that exist in a signal. cmor has 2 parameters that I can change to get more accurate results: center frequency (Fc) and bandwidth frequency (Fb). If I construct a signal with a few frequencies and damping values, I can measure the error of my estimation (fig 2). But in the actual case I have a signal whose frequencies and damping I don't know, so I can't measure the error. A friend here suggested that I reconstruct the signal and find the error by measuring the difference between the original and reconstructed signal, e(t) = |x(t) − x̂(t)|.
So my questions are:
Does anyone know a better function for the error between the reconstructed and original signal than e(t) = |x(t) − x̂(t)|?
Can I use a GA to search for Fb and Fc, or do you know a better search method?
Hope this picture shows what I mean, the actual case is last one. others are for explanations
Thanks in advance
You say you don't know the error until after running the wavelet transform, but that's fine. You just run a wavelet transform for every individual the GA produces. Those individuals with lower errors are considered fitter and survive with greater probability. This may be very slow, but conceptually at least, that's the idea.
Let's define a Chromosome datatype containing an encoded pair of values, one for the frequency and another for the damping parameter. Don't worry too much about how they're encoded for now, just assume it's an array of two doubles if you like. All that's important is that you have a way to get the values out of the chromosome. For now, I'll just refer to them by name, but you could represent them in binary, as an array of doubles, etc. The other member of the Chromosome type is a double storing its fitness.
We can obviously generate random frequency and damping values, so let's create say 100 random Chromosomes. We don't know how to set their fitness yet, but that's fine. Just set it to zero at first. To set the real fitness value, we're going to have to run the wavelet transform once for each of our 100 parameter settings.
for Chromosome chr in population
    chr.fitness = run_wavelet_transform(chr.frequency, chr.damping)
Now we have 100 possible wavelet transforms, each with a computed error, stored in our set called population. What's left is to select fitter members of the population, breed them, and allow the fitter members of the population and offspring to survive into the next generation.
while not done
    offspring = new_population()
    while count(offspring) < N
        parent1, parent2 = select_parents(population)
        child1, child2 = do_crossover(parent1, parent2)
        mutate(child1)
        mutate(child2)
        child1.fitness = run_wavelet_transform(child1.frequency, child1.damping)
        child2.fitness = run_wavelet_transform(child2.frequency, child2.damping)
        add(offspring, child1, child2)
    end while
    population = merge(population, offspring)
end while
There are a bunch of different ways to do the individual steps like select_parents, do_crossover, mutate, and merge here, but the basic structure of the GA stays pretty much the same. You just have to run a brand new wavelet decomposition for every new offspring.
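For what it's worth, the loop above could look roughly like this in Python. Everything here is a sketch under assumptions: estimation_error is an invented stand-in for "run the wavelet transform with these parameters and measure the error", the two genes are called fc and fb after the question, and tournament selection, one-gene crossover, Gaussian mutation and an elitist merge are just one choice among many.

```python
import random

def estimation_error(fc, fb):
    # Placeholder for the real (expensive) wavelet run plus error measurement.
    # Invented bowl-shaped function with its minimum at fc=1.5, fb=2.0.
    return (fc - 1.5) ** 2 + (fb - 2.0) ** 2

def run_ga(error_fn, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)

    def make(fc, fb):
        # Each individual is (error, fc, fb); lower error means fitter.
        return (error_fn(fc, fb), fc, fb)

    population = [make(rng.uniform(0.1, 5.0), rng.uniform(0.1, 5.0))
                  for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            # Tournament selection: the better of two random individuals is a parent.
            p1 = min(rng.sample(population, 2))
            p2 = min(rng.sample(population, 2))
            # Crossover: the child takes fc from one parent and fb from the other.
            fc, fb = p1[1], p2[2]
            # Mutation: small Gaussian nudge, clamped to stay positive.
            fc = max(0.1, fc + rng.gauss(0, 0.1))
            fb = max(0.1, fb + rng.gauss(0, 0.1))
            offspring.append(make(fc, fb))
        # Merge: keep the best pop_size of parents and offspring (elitist).
        population = sorted(population + offspring)[:pop_size]
    return population[0]

best_error, best_fc, best_fb = run_ga(estimation_error)
print(best_error, best_fc, best_fb)
```

Because the merge is elitist, the best error can never get worse from one generation to the next. The expensive part is the error_fn call per offspring, so the population size and generation count are what you would tune against the cost of one wavelet run.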

Clearing numerical values in Mathematica

I am working on fairly large Mathematica projects and the problem arises that I have to intermittently check numerical results but want to easily revert to having all my constructs in analytical form.
The code is fairly fluid I don't want to use scoping constructs everywhere as they add work overhead. Is there an easy way for identifying and clearing all assignments that are numerical?
EDIT: I really do know that scoping is the way to do this correctly ;-). However, for my workflow I am really just looking for a dirty trick to nix all numerical assignments after the fact instead of having the foresight to put down a Block.
If your assignments are on the top level, you can use something like this:
a = 1;
b = c;
d = 3;
e = d + b;
HoldPattern[lhs_ = rhs_?NumericQ] |
HoldPattern[(lhs_ = rhs_?NumericQ;)] :> Unset[lhs],
This will work if you have a sufficient history length $HistoryLength (defaults to infinity). Note however that, in the above example, e was assigned 3+c, and 3 here was not undone. So, the problem is really ambiguous in formulation, because some numbers could make it into definitions. One way to avoid this is to use SetDelayed for assignments, rather than Set.
Another alternative would be to analyze the names in, say, the Global` context (if that is the context where your symbols live), and then the OwnValues and DownValues of the symbols, in a fashion similar to the above, and remove definitions with a purely numerical r.h.s.
But IMO neither of these approaches is robust. I'd still use scoping constructs and try to isolate numerics. One possibility is to wrap your final code in Block, and assign numerical values inside this Block. This seems a much cleaner approach. The work overhead is minimal - you just have to remember which symbols you want to assign the values to. Block will automatically ensure that outside it, the symbols will have no definitions.
Yet another possibility is to use local rules. For example, one could define rule[a] = a->1; rule[d]=d->3 instead of the assignments above. You could then apply these rules, extracting them as say
DownValues[rule][[All, 2]], whenever you want to test with some numerical arguments.
Building on Andrew Moylan's solution, one can construct a Block-like function that takes rules:
SetAttributes[BlockRules, HoldRest]
BlockRules[rules_, expr_] :=
Block @@ Append[Apply[Set, Hold@rules, {2}], Unevaluated[expr]]
You can then save your numeric rules in a variable, and use BlockRules[ savedrules, code ], or even define a function that would apply a fixed set of rules, kind of like so:
In[76]:= NumericCheck =
Function[body, BlockRules[{a -> 3, b -> 2`}, body], HoldAll];
In[78]:= a + b // NumericCheck
Out[78]= 5.
EDIT In response to Timo's comment, it might be possible to use NotebookEvaluate (new in 8) to achieve the requested effect.
SetAttributes[BlockRules, HoldRest]
BlockRules[rules_, expr_] :=
Block @@ Append[Apply[Set, Hold@rules, {2}], Unevaluated[expr]]
nb = CreateDocument[{ExpressionCell[
Defer[Plot[Sin[a x], {x, 0, 2 Pi}]], "Input"],
ExpressionCell[Defer[Integrate[Sin[a x^2], {x, 0, 2 Pi}]],
"Input"]}]
BlockRules[{a -> 4}, NotebookEvaluate[nb, InsertResults -> "True"];]
As the result of this evaluation you get a notebook with your commands evaluated while a was locally set to 4. To take it further, you would have to take the notebook with your code, open a new notebook, evaluate Notebooks[] to identify the notebook of interest, and then do:
InsertResults -> "True"]]
I hope you can make this idea work.