What is the purpose of the _classifierInput encoder field in the model parameters? - nupic

If you take a look at default model parameters as created by Cerebro, you see the following encoders:
{
'encoders': {
'_classifierInput': {
'classifierOnly': True,
'clipInput': True,
'fieldname': u'f',
'n': 100,
'name': '_classifierInput',
'type': 'AdaptiveScalarEncoder',
'w': 21
},
u'f': {
'clipInput': True,
'fieldname': u'f',
'n': 100,
'name': u'f',
'type': 'AdaptiveScalarEncoder',
'w': 21
}
}
}
What is the purpose of the _classifierInput encoder field? It looks like it just mirrors the encoder field that comes after it.

This is in clamodel.py:
def _getClassifierOnlyEncoder(self):
"""
Returns: sensor region's encoder that is sent only to the classifier,
not to the bottom of the network
"""
return self._getSensorRegion().getSelf().disabledEncoder
If you want the CLA to learn to predict (or "compute") a value, but not to use the value as input data, I think this is how you do it. For instance, you might have training data which includes the "answer" but this will be missing later (this is how a lot of the ML competitions work).

Related

How to evaluate the multi-class classifier prediction with TensorFlow Model Analysis library?

I'm trying to use TensorFlow Model Analysis library to analyze the prediction data from a multi-class classifier model, using the analyze_raw_data API. Currently the label contains 3 different classes [0, 1, 2], trained with SparseCategoricalCrossentropy loss. The tfma config was set as following:
eval_config = text_format.Parse("""
## Model information
model_specs {
label_key: "label",
prediction_key: "predictions"
}
## Post training metric information. These will be merged with any built-in
## metrics from training.
metrics_specs {
metrics { class_name: "ExampleCount" }
metrics { class_name: "SparseCategoricalCrossentropy" }
metrics { class_name: "SparseCategoricalAccuracy" }
metrics { class_name: "Precision" config: '"top_k": 1' }
metrics { class_name: "Precision" config: '"top_k": 3' }
metrics { class_name: "Recall" config: '"top_k": 1' }
metrics { class_name: "Recall" config: '"top_k": 3' }
metrics { class_name: "MultiClassConfusionMatrixPlot" }
}
## Slicing information
slicing_specs {} # overall slice
""", tfma.EvalConfig())
I've added ground truth label as a numerical value from [0, 1, 2] to the dataframe column "label", and predicted probabilities as a list to another column "predictions" (e.g. [0.2, 0.3, 0.5]), but I observed the error like ArrowTypeError: ("Expected bytes, got a 'numpy.ndarray' object", 'Conversion failed for column predictions with type object') when loading the data:
~/.pyenv/versions/3.7.8/lib/python3.7/site-packages/tensorflow_model_analysis/api/model_eval_lib.py in analyze_raw_data(data, eval_config, output_path, add_metric_callbacks)
1511
1512 arrow_data = table_util.CanonicalizeRecordBatch(
-> 1513 table_util.DataFrameToRecordBatch(data))
1514 beam_data = beam.Create([arrow_data])
1515
Does anyone know how to write the data for labels and predictions of multi-class classification so that we can do the model analysis with tfma?

Check which are the next layers in a tensorflow keras model

I have a keras model which has shortcuts between layers. For each layer, I would like to get the name (or index) of the next connected layers, because simply iterating through all the model.layers will not tell me whether the layer was connected to the previous one or not.
An example model could be:
model = tf.keras.applications.resnet50.ResNet50(
include_top=True, weights='imagenet', input_tensor=None,
input_shape=None, pooling=None, classes=1000)
You can extract the information in dict format in this way...
Firstly, define a utility function and get the relevant nodes as made in the model.summary() method from every Functional model (code reference)
relevant_nodes = []
for v in model._nodes_by_depth.values():
relevant_nodes += v
def get_layer_summary_with_connections(layer):
info = {}
connections = []
for node in layer._inbound_nodes:
if relevant_nodes and node not in relevant_nodes:
# node is not part of the current network
continue
for inbound_layer, node_index, tensor_index, _ in node.iterate_inbound():
connections.append(inbound_layer.name)
name = layer.name
info['type'] = layer.__class__.__name__
info['parents'] = connections
return info
Secondly, extract the information iterating through layers:
results = {}
layers = model.layers
for layer in layers:
info = get_layer_summary_with_connections(layer)
results[layer.name] = info
results is a nested dict with this format:
{
'layer_name': {'type':'the layer type', 'parents':'list of the parent layers'},
...
'layer_name': {'type':'the layer type', 'parents':'list of the parent layers'}
}
For ResNet50 it results in:
{
'input_4': {'type': 'InputLayer', 'parents': []},
'conv1_pad': {'type': 'ZeroPadding2D', 'parents': ['input_4']},
'conv1_conv': {'type': 'Conv2D', 'parents': ['conv1_pad']},
'conv1_bn': {'type': 'BatchNormalization', 'parents': ['conv1_conv']},
...
'conv5_block3_out': {'type': 'Activation', 'parents': ['conv5_block3_add']},
'avg_pool': {'type': 'GlobalAveragePooling2D', 'parents' ['conv5_block3_out']},
'predictions': {'type': 'Dense', 'parents': ['avg_pool']}
}
Also, you can modify get_layer_summary_with_connections to return all the information you are interested in
You can view the whole model and its connections with the keras's Model plotting utilities
tf.keras.utils.plot_model(model, to_file='path/to/image', show_shapes=True)

TensorFlow - how to import data with multiple labels

I'm trying to create a model in TensorFlow which predicts ideal item for a user by predicting a vector of numbers.
I have created a dataset in Spark and saved it as a TFRecord using Spark TensorFlow connector.
In the dataset, I have several hundreds of features and 20 labels in each row. For easier manipulation, I have given every column a prefix 'feature_' or 'label_'.
Now I'm trying to write input function for TensorFlow, but I can't figure out how to parse the data.
So far I have written this:
def dataset_input_fn():
path = ['data.tfrecord']
dataset = tf.data.TFRecordDataset(path)
def parser(record):
example = tf.train.Example()
example.ParseFromString(record)
# TODO: no idea what to do here
# features = parsed["features"]
# label = parsed["label"]
# return features, label
dataset = dataset.map(parser)
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.batch(32)
dataset = dataset.repeat(100)
iterator = dataset.make_one_shot_iterator()
features, labels = iterator.get_next()
return features, labels
How can I split the Example into a feature set and a label set? I have tried to split the Example into two parts, but there is no way to even access it. The only way I have managed to access it is by printing the example out, which gives me something like this.
features {
...
feature {
key: "feature_wishlist_hour"
value {
int64_list {
value: 0
}
}
}
feature {
key: "label_emb_1"
value {
float_list {
value: 0.4
}
}
}
feature {
key: "label_emb_2"
value {
float_list {
value: 0.8
}
}
}
...
}
Your parser function should be similar to how you constructed the example proto. In your case its should be something similar to:
# example proto decode
def parser(example_proto):
keys_to_features = {'feature_wishlist_hour':tf.FixedLenFeature((), tf.int64),
'label_emb_1': tf.FixedLenFeature((), tf.float32),
'label_emb_2': tf.FixedLenFeature((), tf.float32)}
parsed_features = tf.parse_single_example(example_proto, keys_to_features)
return parsed_features['feature_wishlist_hour'], (parsed_features['label_emb_1'], parsed_features['label_emb_2'])
EDIT: From the comments it seems you are encoding each of the features as key, value pair, which is not right. Check this answer: Numpy to TFrecords: Is there a more simple way to handle batch inputs from tfrecords? on how to write it in a proper way.

Is it compulsory for SVM to have two labels?

I have a question regarding Support Vector Machine. Does SVM have to have two labels? Is it possible that there will be one label and prediction will be based on that label? For example, the following testData does not fit trainingData so it won't be labeled 1 but any other integer. The dilemma is that I do not know values for worst case scenario because all values are gotten from user input.
int labels[3] = {1, 1, 1};
cv::Mat labelsMat(3, 1, CV_32S, labels);
float trainingData[3][3] = { { 25, 191, 19 }, { 24, 186, 17}, { 25, 200, 19} };
float testData[3] = {70, 500, 100};
SVM is a classification method. By using SVM you can separate data into several parts and the labels requires for this. You should train SVM by more than one label then test it by new inputs.

Trouble setting up the SimpleVector encoder

Using the commits from breznak for the encoders (I wasn't able to figure out "git checkout ..." with GitHub, so I just carefully copied over the three files - base.py, multi.py, and multi_test.py).
I ran multi_test.py without any problems.
Then I adjusted my model parameters (MODEL_PARAMS), so that the encoders portion of 'sensorParams' looks like this:
'encoders': {
'frequency': {
'fieldname': u'frequency',
'type': 'SimpleVector',
'length': 5,
'minVal': 0,
'maxVal': 210
}
},
I also adjusted the modelInput portion of my code, so it looked like this:
model = ModelFactory.create(model_params.MODEL_PARAMS)
model.enableInference({'predictedField': 'frequency'})
y = [1,2,3,4,5]
modelInput = {"frequency": y}
result = model.run(modelInput)
But I get the final error, regardless if I instantiate 'y' as a list or a numpy.ndarray
File "nta/eng/lib/python2.7/site-packages/nupic/encoders/base.py", line 183, in _getInputValue
return getattr(obj, fieldname)
AttributeError: 'list' object has no attribute 'idx0'
I also tried initializing a SimpleVector encoder inline with my modelInput, directly encoding my array, then passing it through modelInput. That violated the input parameters of my SimpleVector, because I was now double encoding. So I removed the encoders portion of my model parameters dictionary. That caused a spit up, because some part of my model was looking for that portion of the dictionary.
Any suggestions on what I should do next?
Edit: Here're the files I'm using with the OPF.
sendAnArray.py
import numpy
from nupic.frameworks.opf.modelfactory import ModelFactory
import model_params
class sendAnArray():
def __init__(self):
self.model = ModelFactory.create(model_params.MODEL_PARAMS)
self.model.enableInference({'predictedField': 'frequency'})
for i in range(100):
self.run()
def run(self):
y = [1,2,3,4,5]
modelInput = {"frequency": y}
result = self.model.run(modelInput)
anomalyScore = result.inferences['anomalyScore']
print y, anomalyScore
sAA = sendAnArray()
model_params.py
MODEL_PARAMS = {
'model': "CLA",
'version': 1,
'predictAheadTime': None,
'modelParams': {
'inferenceType': 'TemporalAnomaly',
'sensorParams': {
'verbosity' : 0,
'encoders': {
'frequency': {
'fieldname': u'frequency',
'type': 'SimpleVector',
'length': 5,
'minVal': 0,
'maxVal': 210
}
},
'sensorAutoReset' : None,
},
'spEnable': True,
'spParams': {
'spVerbosity' : 0,
'globalInhibition': 1,
'columnCount': 2048,
'inputWidth': 5,
'numActivePerInhArea': 60,
'seed': 1956,
'coincInputPoolPct': 0.5,
'synPermConnected': 0.1,
'synPermActiveInc': 0.1,
'synPermInactiveDec': 0.01,
},
'tpEnable' : True,
'tpParams': {
'verbosity': 0,
'columnCount': 2048,
'cellsPerColumn': 32,
'inputWidth': 2048,
'seed': 1960,
'temporalImp': 'cpp',
'newSynapseCount': 20,
'maxSynapsesPerSegment': 32,
'maxSegmentsPerCell': 128,
'initialPerm': 0.21,
'permanenceInc': 0.1,
'permanenceDec' : 0.1,
'globalDecay': 0.0,
'maxAge': 0,
'minThreshold': 12,
'activationThreshold': 16,
'outputType': 'normal',
'pamLength': 1,
},
'clParams': {
'regionName' : 'CLAClassifierRegion',
'clVerbosity' : 0,
'alpha': 0.0001,
'steps': '5',
},
'anomalyParams': {
u'anomalyCacheRecords': None,
u'autoDetectThreshold': None,
u'autoDetectWaitRecords': 2184
},
'trainSPNetOnlyIfRequested': False,
},
}
The problem seems to be that the SimpleVector class is accepting an array instead of a dict as its input, and then reconstructs that internally as {'list': {'idx0': 1, 'idx1': 2, ...}} (ie as if this dict had been the input). This is fine if it is done consistently, but your error shows that it's broken down somewhere. Have a word with #breznak about this.
Working through the OPF was difficult. I wanted to input an array of indices into the temporal pooler, so I opted to interface directly with the algorithms (I relied heavy on hello_tp.py). I ignored SimpleVector all together, and instead worked through the BitmapArray encoder.
Subutai has a useful email on the nupic-discuss listserve, where he breaks down the three main areas of the NuPIC API: algorithms, networks/regions, & the OPF. That helped me understand my options better.