Get field names from a TFRecord - tensorflow

Given a .tfrecord file, we can define a record iterator
record_iterator = tf.python_io.tf_record_iterator(path=record)
Then parse it using
example = tf.train.SequenceExample()
for element in record_iterator:
    example.ParseFromString(element)
The question is: how do we infer all the field names in the context?
If we know the structure in advance, we can write example.context.feature["width"]. In addition, str(example.context) returns a string with the entire record structure. However, I was wondering whether there is a built-in function that returns the field names, so that we can avoid parsing this string (e.g. by searching for "key").
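In the Python protobuf API, the context's feature field is a map that behaves like a dict, so its keys can be read directly instead of being scraped from str(example.context). A minimal sketch, assuming `example` is the tf.train.SequenceExample filled in by the loop above; the helper names `context_field_names` and `sequence_field_names` are hypothetical and only used for illustration:

```python
def context_field_names(example):
    """Return the sorted feature names found in the context of a parsed SequenceExample."""
    # `context.feature` is a protobuf map field (name -> Feature) and supports keys().
    return sorted(example.context.feature.keys())


def sequence_field_names(example):
    """Return the sorted feature-list names found in the sequence part."""
    # `feature_lists.feature_list` is the matching map (name -> FeatureList).
    return sorted(example.feature_lists.feature_list.keys())
```

Calling context_field_names(example) right after example.ParseFromString(element) should list the names for that record, without any string parsing.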

Related

How to define a CombiTimeTable through a script in Dymola?

I am trying to run several simulations in sequence using a for loop in a script. From one simulation to the next, the only thing that changes is the file path of a CombiTimeTable.
I propagated the variable fileName in order to assign a new path in each iteration. However, when the model reads the extension, it changes the timeScale, and the resolution is lower than needed. I tried to propagate timeScale too, but that changed nothing. Is there a function to define the CombiTimeTable variables? Is my only alternative to merge all tables and split the results manually?
Example of the script on a single run (without the for loop):
filePath="RL_30_200g";
dymolaPath = "modelica://customTILComponents/Combitables/Combitimetable_"+filePath+".txt";
fileName= ModelicaServices.ExternalReferences.loadResource(dymolaPath);
result ="Full_Year_Simulation_"+filePath;
timeScale = 1/3600;
translateModel ("customTILComponents.MA_Santoro.FullModels.OptiHorst_FullModel_New_Year_Simulation_Batch");
simulateModel(startTime=0,stopTime=8860,numberOfIntervals=300,method="Dassl",tolerance=0.000001,resultFile=result);
I am not sure where your problem is or how you change fileName. Also, timeScale is not used anywhere in your question. Anyway, here is how I would do it: add a parameter for fileName to your model. Since it is a string, the only way to change it is via a modifier, which can be included in the model name passed to the simulateModel command.
Here is an example: In your model with the time table, propagate the parameter fileName:
model MyModel
  parameter String fileName="NoName" "File where matrix is stored";
  Modelica.Blocks.Tables.CombiTable1Ds combiTable1Ds(
    tableOnFile=true,
    tableName="x",
    fileName=fileName) annotation (Placement(transformation(extent={{-10,-10},{10,10}})));
  Modelica.Blocks.Sources.Ramp ramp(duration=1) annotation (Placement(transformation(extent={{-60,-10},{-40,10}})));
equation
  connect(ramp.y, combiTable1Ds.u) annotation (Line(points={{-39,0},{-12,0}}, color={0,0,127}));
  annotation (uses(Modelica(version="4.0.0")));
end MyModel;
Then change the value of fileName in every loop.
Here we assume that there are three .mat files available in the workspace, named First.mat, Second.mat and Third.mat.
function batchSim
  input String fileNames[:] = {"First", "Second", "Third"};
algorithm
  for f in fileNames loop
    simulateModel("MyModel(fileName=\""+f+".mat\")", stopTime=1, resultFile="Full_Year_Simulation_"+f);
  end for;
  annotation(__Dymola_interactive=true);
end batchSim;
This works quite well, but the downside is that the model will be recompiled in every iteration of the for loop, as the modifier has changed. If this is a big problem, define all file paths in a string vector in the model and add an integer parameter for the index. Then use the command simulateExtendedModel and change only the index via the parameters initialNames and initialValues.
Building on the answer by marco (so the same model and the same external files), an alternative is to write a script such as:
fileNames := {"First", "Second", "Third"};
Advanced.AllowStringParameters:=true;
translateModel("MyModel");
for f in fileNames loop
  // Append the extension so the name matches First.mat, Second.mat and Third.mat
  fileName:=f+".mat";
  simulateModel("MyModel", stopTime=1, resultFile="Full_Year_Simulation_"+f);
end for;
Unfortunately it seems you cannot turn that into a function.

Confused about Tensorflow Algorithm function

Colab notebook
Under the section on Feature Columns, there is this specific line of code
feature_columns = []
for feature_name in CATEGORICAL_COLUMNS:
    vocabulary = dftrain[feature_name].unique()
I'm struggling to understand what this is doing. I don't really know what to search for either, as I'm still quite new to programming. Why is this line needed? I understand that it outputs all unique values of the specified feature_name, but I don't get how it's linked to the next line.
When you don't understand a function, just google the module name (TensorFlow) and the function name. I found the documentation for tf.feature_column.categorical_column_with_vocabulary_list described here. To quote the documentation:
Use this when your inputs are in string or integer format, and you have an in-memory vocabulary mapping each value to an integer ID. By default, out-of-vocabulary values are ignored.
This section of code goes through each column and maps each unique string value to a unique integer (its position in the vocabulary list). Transforming a column with this type of mapping is common for categorical data. unique is needed because tf.feature_column.categorical_column_with_vocabulary_list needs a list of unique values as an argument before it can work its magic.
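To make the mapping concrete, here is a plain-Python sketch of the idea that does not use TensorFlow. The column values, the helper names `build_vocabulary` and `lookup_id`, and the choice of -1 for unseen values are assumptions made for illustration; this is not TensorFlow's implementation:

```python
def build_vocabulary(values):
    """Return the distinct values in order of first appearance."""
    return list(dict.fromkeys(values))


def lookup_id(vocabulary, value, default=-1):
    """Map a value to its position in the vocabulary, or `default` if unseen."""
    try:
        return vocabulary.index(value)
    except ValueError:
        return default


# Hypothetical column contents, standing in for dftrain[feature_name].
column = ["male", "female", "female", "male"]
vocabulary = build_vocabulary(column)
ids = [lookup_id(vocabulary, v) for v in column]
print(vocabulary)  # ['male', 'female']
print(ids)         # [0, 1, 1, 0]
```

Without the deduplication step, the same string would appear at several positions in the list, so there would be no single integer ID per category.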
In the future please put all necessary code in the question. It should not be required to visit another link to answer your question.

How to serialize data in example-in-example format for tensorflow-ranking?

I'm building a ranking model with tensorflow-ranking. I'm trying to serialize a data set in the TFRecord format and read it back at training time.
The tutorial doesn't show how to do this. There is some documentation here on an example-in-example data format but it's hard for me to understand: I'm not sure what the serialized_context or serialized_examples fields are or how they fit into examples and I'm not sure what the Serialize() function in the code block is.
Concretely, how can I write and read data in example-in-example format?
The context is a map from feature name to tf.train.Feature. The examples list is a list of maps from feature name to tf.train.Feature. Once you have these, the following code will create an "example-in-example":
context = {...}
examples = [{...}, {...}, ...]
serialized_context = tf.train.Example(features=tf.train.Features(feature=context)).SerializeToString()
serialized_examples = tf.train.BytesList()
for example in examples:
    tf_example = tf.train.Example(features=tf.train.Features(feature=example))
    serialized_examples.value.append(tf_example.SerializeToString())
example_in_example = tf.train.Example(features=tf.train.Features(feature={
    'serialized_context': tf.train.Feature(bytes_list=tf.train.BytesList(value=[serialized_context])),
    'serialized_examples': tf.train.Feature(bytes_list=serialized_examples)
}))
To read the examples back, you may call
tfr.data.parse_from_example_in_example(example_pb,
                                       context_feature_spec=context_feature_spec,
                                       example_feature_spec=example_feature_spec)
where context_feature_spec and example_feature_spec are maps from feature name to tf.io.FixedLenFeature or tf.io.VarLenFeature.
First of all, I recommend reading this article to ensure that you know how to create a tf.Example as well as tf.SequenceExample (which by the way, is the other data format supported by TF-Ranking):
Tensorflow Records? What they are and how to use them
In the second part of this article, you will see that a tf.SequenceExample has two components: 1) Context and 2) Sequence (or examples). Example-in-Example implements the same idea. Basically, the context is the set of features that are independent of the items you want to rank (a search query in the case of search, or user features in the case of a recommendation system), and the sequence part is a list of items (also known as examples). This could be a list of documents (in search) or movies (in recommendation).
Once you are comfortable with tf.Example, Example-in-Example will be easier to understand. Take a look at this piece of code for how to create an EIE instance:
https://www.gitmemory.com/issue/tensorflow/ranking/95/518480361
1) Bundle the context features together in a tf.Example object and serialize it.
2) Bundle the sequence (example) features, each of which could contain a list of values, in another tf.Example object and serialize that one too.
3) Wrap these inside a parent tf.Example.
4) If you're writing to TFRecords, serialize the parent tf.Example object and write it to your TFRecord file.

Get Text Symbol Programmatically With ID

Is there any way of programmatically getting the value of a Text Symbol at runtime?
The scenario is that I have a simple report that calls a function module. I receive an exported parameter in variable LV_MSG of type CHAR1. This indicates a certain status message created in the program, for instance F (Fail), X (Match) or E (Error). I currently use a CASE statement to switch on LV_MSG and fill another variable with a short description of the message. These descriptions are maintained as text symbols that I retrieve at compile time with text-MS# where # is the same as the possible returns of LV_MSG, for instance text-MSX has the value "Exact Match Found".
Now it seems to me that the entire CASE statement is unnecessary as I could just assign to my description variable the value of the text symbol with ID 'MS' + LV_MSG (pseudocode, would use CONCATENATE). Now my issue is how I can find a text symbol based on the String representation of its ID at runtime. Is this even possible?
If it is, my code would look cleaner and I wouldn't have to update my actual code when new messages are added in the function module, as I would simply have to add a new text symbol. But would this approach be any faster or would it in fact degrade the report's performance?
Personally, I would probably define a domain and use the fixed values of the domain to represent the values. This way, you would even get around the string concatenation. You can use the function module DD_DOMVALUE_TEXT_GET to easily access the language-dependent text of a domain value.
To access the text elements of a program, use a function module like READ_TEXT_ELEMENTS.
Be aware that generic programming like this will definitely slow down your program. Whether it would make your code look cleaner is in the eye of the beholder - if the values change rarely, I don't see why a simple CASE statement should be inferior to some generic text access.
I hope I understand you correctly, but here goes. This is possible with a little trickery: all the text symbols in a report are defined as variables in the program (with the name text-abc, where abc is the text ID). So you can use the following:
data: lt_all_text type standard table of textpool with default key,
      lsr_text    type ref to textpool.
"Load texts - you will only want to do this once
read textpool sy-repid into lt_all_text language sy-langu.
sort lt_all_text by entry.
"Find a text, the field KEY is the text ID without TEXT-
read table lt_all_text with key entry = i_wanted_text
     reference into lsr_text binary search.
If you want the address you can add:
field-symbols: <l_text> type any.
data l_name type string.
data lr_address type ref to data.
concatenate 'TEXT-' lsr_text->key into l_name.
assign (l_name) to <l_text>.
if sy-subrc = 0.
  get reference of <l_text> into lr_address.
endif.
As vwegert pointed out, this is probably not the best solution; for error handling, use message classes or exception objects instead. The technique is useful in other cases though, so now you know how.

How to read data from data bag within a PIG script

I have a DataBag in the following format:
{([ChannelName#{ (bigXML,[])} ])}
The DataBag consists of only one item, which is a Tuple.
The Tuple consists of only one item, which is a Map.
The Map maps channel names to values.
Each value is of type DataBag and consists of only one tuple.
That tuple consists of two items: one is a chararray (a very big string) and the other is a map.
I have a UDF that emits the above bag.
Now I need to invoke another UDF, passing it the only tuple within the DataBag stored for a given channel in the Map.
Assuming there were no outer data bag, just a tuple such as
([ChannelName#{ (bigXML,[])} ])
I can access the data using $0.$0#'StdOutChannel'
Now with the tuple inside a bag
{([ChannelName#{ (bigXML,[])} ])}
If I do $0.$0.$0#'StdOutChannel' (prepending $0), I get the following error:
ERROR 1052: Cannot cast bag with schema bag({bytearray}) to map
How can I access data within a data bag?
Try to break this problem down a little.
Let's say you get your inner bag:
MYBAG = $0.$0#'StdOutChannel';
First, can you ILLUSTRATE or DUMP this?
What can you do with this bag? Usually FOREACH over the tuples inside.
A = FOREACH MYBAG {
    GENERATE $0 AS MyCharArray, $1 AS MyMap
};
ILLUSTRATE A; -- or if this doesn't work
DUMP A;
Can you try this interactively, and then edit your question to add the details you find from trying these things?
Some editing hints for StackOverflow:
put backticks around your code (`ILLUSTRATE`)
indent code blocks by 4 spaces on each line