How to get name of streams in MinibatchSource? - cntk

How can I get the names of each of the streams in a MinibatchSource?
Can I get the names associated with with the stream information returned by stream_infos?
minibatch_source.stream_infos()
I also have a follow-up-question:
The result from:
print(reader_train.streams.keys())
is
dict_keys(['labels', 'features'
How does these names relate to the construction of the MiniBatchSource, which is done like this?
return MinibatchSource(ImageDeserializer(map_file, StreamDefs(
features = StreamDef(field='image', transforms=transforms), # first column in map file is referred to as 'image'
labels = StreamDef(field='label', shape=num_classes) # and second as 'label'
)))
I would have thought that my streams would be named ‘image’ and ‘label’, but they were named ‘labels’ and ‘features’.
I guess those names are somehow default names?

For your original question:
minibatch_source.streams.keys()
See for example this tutorial under the section "A Brief Look at Data and Data Reading".
For your followup question: The names returned by keys() are the arguments of StreamDefs(). This is all you need in your program. If you define your MinibatchSource like this
return MinibatchSource(ImageDeserializer(map_file, StreamDefs(
image = StreamDef(field='image', transforms=transforms), # first column in map file is referred to as 'image'
label = StreamDef(field='label', shape=num_classes) # and second as 'label')))
then the names will match. You can choose any names you want but the value of the field inside StreamDef() should match the source (which depends on your input data and the Deserializer you are using).

Related

How to specify which key/value pairs to exclude in spaCy's Doc.to_disk(path, exclude=['user_data'])?

My nlp pipeline has some doc extensions that store 3 items (a string for file name and two dicts which map non-serializable objects). I'd like only to exclude the non-serializable key/value pairs in the user data, but keep the filename.
doc.to_disk(path, exclude=['user_data'])
works as expected, excluding all user data. There are apparently options to instead exclude either 'user_data_keys' or 'user_data_values' but I find no explanation of their usage, and furthermore I can't think of any good reason to store either all the keys without the values or all the values without the keys!
I would like to exclude both keys and values of only certain fields in the doc.user_data. If this is possible, how is it done?
You will need to specify which keys or values you want to exclude.
https://spacy.io/api/doc#serialization-fields
data = doc.to_bytes(exclude=["text", "tensor"])
doc.from_disk("./doc.bin", exclude=["user_data"])
Per this thread here, you can try the following work around:
def remove_unserializable_results(doc):
doc.user_data = {}
for x in dir(doc._):
if x in ['get', 'set', 'has']: continue
setattr(doc._, x, None)
for token in doc:
for x in dir(token._):
if x in ['get', 'set', 'has']: continue
setattr(token._, x, None)
return doc
nlp.add_pipe(remove_unserializable_results, last=True)

Where does SimObject name get set?

I want to know where does SimObject names like mem_ctrls, membus, replacement_policy are set in gem5. After looking at the code, I understood that, these name are used in stats.txt.
I have looked into SimObject code files(py,cc,hh files). I printed all Simobject names by stepping through root descendants in Simulation.py and then searched some of the names like mem_ctrls using vscode, but could not find a place where these names are set.
for obj in root.descendants():
print("object name:%s\n"% obj.get_name())
These names are the Python variable names from the configuration/run script.
For instance, from the Learning gem5 simple.py script...
from m5.objects import *
# create the system we are going to simulate
system = System()
# Set the clock fequency of the system (and all of its children)
system.clk_domain = SrcClockDomain()
system.clk_domain.clock = '1GHz'
system.clk_domain.voltage_domain = VoltageDomain()
# Set up the system
system.mem_mode = 'timing' # Use timing accesses
system.mem_ranges = [AddrRange('512MB')] # Create an address range
The names will be system, clk_domain, mem_ranges.
Note that only the SimObjects will have a name. The other parameters (e.g., integers, etc.) will not have a name.
You can see where this is set here: https://gem5.googlesource.com/public/gem5/+/master/src/python/m5/SimObject.py#1352

Tensorboard: why is family name prefixed twice?

I'm using tf.summary.histogram(var_name, var, family='my_family') to log a histogram. In the tensorboard interface it appears as
my_family/my_family/var_name
Does anybody know what the logic is behind duplicating the family name?
It does seem intentional, as I find the following in tensorflow/tensorflow/python/ops/summary_op_util.py :
# Use family name in the scope to ensure uniqueness of scope/tag.
scope_base_name = name if family is None else '{}/{}'.format(family, name)
with ops.name_scope(scope_base_name, default_name, values) as scope:
if family is None:
tag = scope.rstrip('/')
else:
# Prefix our scope with family again so it displays in the right tab.
tag = '{}/{}'.format(family, scope.rstrip('/'))
The first time family is inserted in scope_base_name = name if family is None else '{}/{}'.format(family, name), and the second time in tag = '{}/{}'.format(family, scope.rstrip('/')), which according to the comments in the code was deliberate.
I too was frustrated by this, but in the context of using tf.summary.scalar. I've resorted to using:
tf.summary.scalar('myfamily/myname', var)
Now the variables show up in Tensorboard without the duplication of the family name.
P.S. I would have made this a "comment" instead of an answer, but my reputation is too low.

How to show Feature Names in Graphviz?

I'm building a tree in Graphviz and I can't seem to be able to get the feature names to show up, I have defined a list with the feature names like so:
names = list(df.columns.values)
Which prints:
['Gender',
'SuperStrength',
'Mask',
'Cape',
'Tie',
'Bald',
'Pointy Ears',
'Smokes']
So the list is being created, later I build the tree like so:
export_graphviz(tree, out_file=ddata, filled=True, rounded=True, special_characters=False, impurity=False, feature_names=names)
But the final image still has the feature names listed like X[]:
How can I get the actual feature names to show up? (Cape instead of X[3], etc.)
I can only imagine this has to do with passing the names as an array of the values. It works fine if you pass the columns directly:
export_graphviz(tree, out_file=ddata, filled=True, rounded=True, special_characters=False, impurity=False, feature_names=df.columns)
If needed, you can also slice the columns:
export_graphviz(tree, out_file=ddata, filled=True, rounded=True, special_characters=False, impurity=False, feature_names=df.columns[5:])

Cypher: Scope of Match Statements in which Variables are Valid

I think I have a general problem with understanding the structure of matches and the scope in which variables of the match live.
The specific piece of code where I have the problem with is this:
// S sentiment toward A goodFor/badFor T
// => S sentiment toward the idea of A goodFor/badFor T
MATCH (S:A)-[:SOURCE]->(sent1:PS {type:"sentiment"})-[:TARGET]->(gfbf:E {type:"gfbf"}) , (A)-[:SOURCE]->(gfbf)-[:TARGET]->(T) , (Writer:A {type:"writer"})
// if there is some negative belief in any of the writers private state spaces that involve gfbf then inference is blocked
WHERE NOT (Writer)-[*1..]->({type:"believesTrue" , spec:FALSE})-[*1..]->(gfbf)
// if sent1 is in some private state spaces of the writer return all of these
OPTIONAL MATCH p=(Writer)-[*]->(sent1)
WITH NODES(p)[1..-1] AS ps_nodes
WHERE ALL(x IN ps_nodes[1..] WHERE LABELS(x) = "PS")
MERGE (S)-[:SOURCE]->(sent2:PS {type:"sentiment" , spec:(sent1.spec)})-[:TARGET]->(ideaOf:I {name:"ideaOf" , type:"ideaOf"})-[:TARGET]->(gfbf)
ON CREATE SET sent2.name =
CASE sent2.spec
WHEN FALSE THEN "-S"
ELSE "+S"
END
RETURN p
I think it's not relevant to understand what this is for. It suffices to see the structure I assume, but basically what it does is: It looks for a subgraph where there is path S-->sent1-->gfbf and also a path A-->gfbf-->T. If it finds that is makes a new path A-->sent2-->ideaOf-->gfbf, all he while setting the properties of the new nodes depending on the properties of the nodes from the match. Furthermore it looks whether it also has a path writer-->...-->sent where all nodes in the ... part have label PS. If it finds that path then it returns this for further operations in a different part of the program.
The error I am getting is this:
py2neo.cypher.error.statement.InvalidSyntax: sent1 not defined (line 6, column 58 (offset: 421))
"MERGE (S)-[:SOURCE]->(sent2:PS {type:"sentiment" , spec:(sent1.spec)})-[:TARGET]->(ideaOf:I {name:"ideaOf" , type:"ideaOf"})-[:TARGET]->(g"bf)
Why is sent1 no longer defined where I use it and how would I need to restructure the code to make it valid?
sent1 in isn't in the prior WITH - change it so:
WITH NODES(p)[1..-1] AS ps_nodes, sent1