ImageNet index to Wordnet 3.0 synsets - wordnet

Working with ImageNet Resnet-50 in Caffe the prediction gives a 1000-dimensional vector. Is there an easy way to translate the indices of this vector to Wordnet 3.0 synset identifiers? For instance, that the 415: 'bakery, bakeshop, bakehouse' is "n02776631"?
I note that a similar question, Get ImageNet label for a specific index in the 1000-dimensional output tensor in torch, has been asked about a human-readable label associated with the index and an answer pointed to an index-to-label mapping available in this URL: https://gist.github.com/maraoz/388eddec39d60c6d52d4
From the human readable label I suppose it is possible to find the Wordnet synset identifier via the label-to-synset mapping on this page: http://image-net.org/challenges/LSVRC/2015/browse-synsets but I am wondering whether this is already done?

The mapping seems to be straightforward with the data from https://gist.github.com/maraoz/388eddec39d60c6d52d4 and http://image-net.org/challenges/LSVRC/2015/browse-synsets:
{0: {'id': '01440764-n',
'label': 'tench, Tinca tinca',
'uri': 'http://wordnet-rdf.princeton.edu/wn30/01440764-n'},
1: {'id': '01443537-n',
'label': 'goldfish, Carassius auratus',
'uri': 'http://wordnet-rdf.princeton.edu/wn30/01443537-n'},
2: {'id': '01484850-n',
'label': 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias',
'uri': 'http://wordnet-rdf.princeton.edu/wn30/01484850-n'},
...
See https://gist.github.com/fnielsen/4a5c94eaa6dcdf29b7a62d886f540372 for full file.
I have not checked thoroughly whether this mapping is actually correct.
This mapping was constructed with:
import ast
from lxml import html
import requests
from pprint import pprint
url_index = ('https://gist.githubusercontent.com/maraoz/'
'388eddec39d60c6d52d4/raw/'
'791d5b370e4e31a4e9058d49005be4888ca98472/gistfile1.txt')
url_synsets = "http://image-net.org/challenges/LSVRC/2014/browse-synsets"
index_to_label = ast.literal_eval(requests.get(url_index).content)
elements = html.fromstring(requests.get(url_synsets).content).xpath('//a')
label_to_synset = {}
for element in elements:
href = element.attrib['href']
if href.startswith('http://imagenet.stanford.edu/synset?wnid='):
label_to_synset[element.text] = href[42:]
index_to_synset = {
k: {
'id': label_to_synset[v] + '-n',
'label': v,
'uri': "http://wordnet-rdf.princeton.edu/wn30/{}-n".format(
label_to_synset[v])
}
for k, v in index_to_label.items()}
pprint(index_to_synset)

Related

How can I make a DataFrame from a ColumnTransformer composed by LOO Encoder, OHE and Ordinal Encoder?

Since the feature get_feature_names() was deprecated from "native" categorical encoders in Sklearn (actually it was replaced by get_feature_names_out()), how could I make a DataFrame where the transformed variables have their proper names since inside the ColumnTransformer has encoders whose respond for get_feature_names_out() and others for get_feature_names()? Here is the situation:
features_pipe = make_column_transformer(
(OneHotEncoder(handle_unknown = 'ignore', sparse=False), ['Gender', 'Race']),
(OrdinalEncoder(), ['Age', 'Overall Work Exp.', 'Fieldwork Exp.', 'Level of Education']),
(ce.LeaveOneOutEncoder(), ['State (US)'])
).fit(X_train, y_train)
X_train_encoded = features_pipe.transform(X_train)
X_test_encoded = features_pipe.transform(X_test)
X_train_encoded_df = pd.DataFrame(X_train_encoded, columns= features_pipe.get_features_names_out())
X_train_encoded_df.head()
I got this error: AttributeError: 'ColumnTransformer' object has no attribute 'get_features_names_out'
That's because LeaveOneOutEncoder does not support get_feature_names_out(). It supports get_feature_names().
How could I overcome this issue and print my DataFrame correctly?
I have this same issue once in the past.
If you don't mind using a subclass of ColumnTransformer you can create it and just modify to call get_feature_names() when get_feature_names_out() was not available.
In this case, you should declare the class
from sklearn.compose import ColumnTransformer
from sklearn.compose._column_transformer import _is_empty_column_selection
class MyColumnTransformer(ColumnTransformer):
def __init__(self, transformers, **kwargs):
super().__init__(transformers=transformers, **kwargs)
def _get_feature_name_out_for_transformer(
self, name, trans, column, feature_names_in
):
column_indices = self._transformer_to_input_indices[name]
names = feature_names_in[column_indices]
if trans == "drop" or _is_empty_column_selection(column):
return
elif trans == "passthrough":
return names
if not hasattr(trans, "get_feature_names_out"):
return trans.get_feature_names()
return trans.get_feature_names_out(names)
Although the use of ColumnTransformer is not as simple as using make_column_transformer, it's much more customizable.
So, in this case, you also have to pass a name to each transformer using the following schema:
(name, transformer, columns)
features_pipe = MyColumnTransformer(transformers=
[
('OHE', OneHotEncoder(handle_unknown = 'ignore', sparse=False), ['Gender', 'Race']),
('OE', OrdinalEncoder(), ['Age', 'Overall Work Exp.', 'Fieldwork Exp.', 'Level of Education']),
('LOOE', ce.LeaveOneOutEncoder(), ['State (US)'])
])
features_pipe.fit(X_train, y_train)
and finally continue the code the way you suggested.
If you don't want to append the transformer names to the features names, just include verbose_feature_names_out=False when initializing MyColumnTransformer.

folium choropleth returns blank

I have a GeoDataFrame that plots nicely from Geopandas, but returns blank as Choropleth graph in Folium.
Folium 0.7.0
Geopandas 0.5.0
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [12, 12]
radiosambab.plot('situacionpromedio', antialiased=False)
As a geojson
radiosambab.__geo_interface__
returns
{'type': 'FeatureCollection',
'features': [{'id': '020130302',
'type': 'Feature',
'properties': {'situacionpromedio': 1.1173449839705998},
'geometry': {'type': 'Polygon',
'coordinates': (((-58.46738862003677, -34.53484761336359),
(-58.466080612615286, -34.53427219003239),
(-58.46379657486779, -34.53326322986549),
(-58.46165233386257, -34.530802575280035),
(-58.46133757821172, -34.530441540420355),
(-58.4588949370924, -34.527620828300144),
(-58.45884013885469, -34.52762641175383),
(-58.45875915687486, -34.527621382400326),
(-58.458732162886044, -34.52761970593736),
(-58.45867655438868, -34.52763422563783),
(-58.45856182767256, -34.52767203362345),
(-58.45850001004012, -34.52769515425145),
(-58.458440891778, -34.52771844249678),
(-58.45839257108904, -34.52774240132773),
(-58.45834357673059, -34.5277438516926),
...
Calling
radiosambab['situacionpromedio']
returns a Geoseries as expected:
COD_2010
020130302 1.117345
020131101 1.117371
020130104 1.161630
020130102 1.087263
020130101 1.268362
020120405 1.132843
020130107 1.085900
020130106 1.028195
020130109 1.056225
020130111 1.061627
020120407 1.138702
020120404 1.084368
020120402 1.078862
...
But, when invoking folium.Choropleth, it does not work:
m_2 = folium.Map(location=[-34.603722, -58.381592], tiles='openstreetmap', zoom_start=14)
folium.Choropleth(geo_data=radiosambab.__geo_interface__, data=radiosambab['situacionpromedio'], key_on='feature.id', fill_color='YlOrBr').add_to(m_2)
folium.LayerControl().add_to(m_2)
m_2
Returns
Thanks!
Problem seems to be related to lack of memory. Is actually plotting when restricting the number of polygons. But fails to do so above 2000 polygons aprox.
I had a similar problem and solved it with:
var_geodataframe = var_geodataframe.to_crs(epsg=4326)
May be must use the actual version of folium and review the Geopandas.

Run prediction from saved model in tensorflow 2.0

I have a saved model (a directory with model.pd and variables) and wanted to run predictions on a pandas data frame.
I've unsuccessfully tried a few ways to do this:
Attempt 1: Restore the estimator from the saved model
estimator = tf.estimator.LinearClassifier(
feature_columns=create_feature_cols(),
model_dir=path,
warm_start_from=path)
Where path is the directory that has a model.pd and variables folder. I got an error
ValueError: Tensor linear/linear_model/dummy_feature1/weights is not found in
gs://bucket/Trainer/output/2013/20191008T170504.583379-63adee0eaee0/serving_model_dir/export/1570554483/variables/variables
checkpoint {'linear/linear_model/dummy_feature1/weights': [1, 1], 'linear/linear_model/dummy_feature2/weights': [1, 1]
}
Attempt 2: Run prediction directly from the saved model by running
imported = tf.saved_model.load(path) # path is the directory that has a `model.pd` and variables folder
imported.signatures["predict"](example)
But has not successfully passed the argument - looks like the function is looking for a tf.example and I am not sure how to convert a data frame to tf.example.
My attempt to convert is below but got an error that df[f] is not a tensor:
for f in features:
example.features.feature[f].float_list.value.extend(df[f])
I've seen solutions on StackOverflow but they are all tensorflow 1.14. Greatly appreciate it if someone can help with tensorflow 2.0.
Considering you have your saved model present like this:
my_model
assets saved_model.pb variables
You can load your saved model using:
new_model = tf.keras.models.load_model('saved_model/my_model')
# Check its architecture
new_model.summary()
To perform prediction on a DataFrame you need to:
Wrap scalars into a list so as to have a batch dimension (models only process batches of data, not single samples)
Call convert_to_tensor on each feature
Example 1:
If you have values for the first test row as
sample = {
'Type': 'Cat',
'Age': 3,
'Breed1': 'Tabby',
'Gender': 'Male',
'Color1': 'Black',
'Color2': 'White',
'MaturitySize': 'Small',
'FurLength': 'Short',
'Vaccinated': 'No',
'Sterilized': 'No',
'Health': 'Healthy',
'Fee': 100,
'PhotoAmt': 2,
}
input_dict = {name: tf.convert_to_tensor([value]) for name, value in sample.items()}
predictions = new_model.predict(input_dict)
prob = tf.nn.sigmoid(predictions[0])
print(
"This particular pet had a %.1f percent probability "
"of getting adopted." % (100 * prob)
)
Example 2:
Or if you have multiple rows present in the same order as the train data
predict_dataset = tf.convert_to_tensor([
[5.1, 3.3, 1.7, 0.5,],
[5.9, 3.0, 4.2, 1.5,],
[6.9, 3.1, 5.4, 2.1]
])
# training=False is needed only if there are layers with different
# behavior during training versus inference (e.g. Dropout).
predictions = new_model(predict_dataset, training=False)
for i, logits in enumerate(predictions):
class_idx = tf.argmax(logits).numpy()
p = tf.nn.softmax(logits)[class_idx]
name = class_names[class_idx]
print("Example {} prediction: {} ({:4.1f}%)".format(i, name, 100*p))

How do I need to modify exporting a keras model to accept b64 string to RESTful API/Google cloud ML

The complete code for exporting the model: (I've already trained it and now loading from weights file)
def cnn_layers(inputs):
conv_base= keras.applications.mobilenetv2.MobileNetV2(input_shape=(224,224,3), input_tensor=inputs, include_top=False, weights='imagenet')
for layer in conv_base.layers[:-200]:
layer.trainable = False
last_layer = conv_base.output
x = GlobalAveragePooling2D()(last_layer)
x= keras.layers.GaussianNoise(0.3)(x)
x = Dense(1024,name='fc-1')(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.advanced_activations.LeakyReLU(0.3)(x)
x = Dropout(0.4)(x)
x = Dense(512,name='fc-2')(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.advanced_activations.LeakyReLU(0.3)(x)
x = Dropout(0.3)(x)
out = Dense(10, activation='softmax',name='output_layer')(x)
return out
model_input = layers.Input(shape=(224,224,3))
model_output = cnn_layers(model_input)
test_model = keras.models.Model(inputs=model_input, outputs=model_output)
weight_path = os.path.join(tempfile.gettempdir(), 'saved_wt.h5')
test_model.load_weights(weight_path)
export_path='export'
from tensorflow.python.saved_model import builder as saved_model_builder
from tensorflow.python.saved_model import utils
from tensorflow.python.saved_model import tag_constants, signature_constants
from tensorflow.python.saved_model.signature_def_utils_impl import build_signature_def, predict_signature_def
from tensorflow.contrib.session_bundle import exporter
builder = saved_model_builder.SavedModelBuilder(export_path)
signature = predict_signature_def(inputs={'image': test_model.input},
outputs={'prediction': test_model.output})
with K.get_session() as sess:
builder.add_meta_graph_and_variables(sess=sess,
tags=[tag_constants.SERVING],
signature_def_map={'predict': signature})
builder.save()
And the output of  (dir 1 has saved_model.pb and models dir) :
python /tensorflow/python/tools/saved_model_cli.py show --dir /1 --all   is
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:
signature_def['predict']:
The given SavedModel SignatureDef contains the following input(s):
inputs['image'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 224, 224, 3)
name: input_1:0
The given SavedModel SignatureDef contains the following output(s):
outputs['prediction'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 107)
name: output_layer/Softmax:0
Method name is: tensorflow/serving/predict
To accept b64 string:
The code was written for (224, 224, 3) numpy array. So, the modifications I made for the above code are:
_bytes should be added to input when passing as b64. So,
predict_signature_def(inputs={'image':......
          changed to
predict_signature_def(inputs={'image_bytes':.....
Earlier, type(test_model.input) is : (224, 224, 3) and dtype: DT_FLOAT. So,
signature = predict_signature_def(inputs={'image': test_model.input},.....
          changed to (reference)
temp = tf.placeholder(shape=[None], dtype=tf.string)
signature = predict_signature_def(inputs={'image_bytes': temp},.....
Edit:
Code to send using requests is : (As mentioned in the comments)
encoded_image = None
with open('/1.jpg', "rb") as image_file:
encoded_image = base64.b64encode(image_file.read())
object_for_api = {"signature_name": "predict",
"instances": [
{
"image_bytes":{"b64":encoded_image}
#"b64":encoded_image (or this way since "image" is not needed)
}]
}
p=requests.post(url='http://localhost:8501/v1/models/mnist:predict', json=json.dumps(object_for_api),headers=headers)
print(p)
I'm getting <Response [400]> error. I think there's no error in the way I'm sending. Something needs to be changed in the code for exporting the model and specifically in
temp = tf.placeholder(shape=[None], dtype=tf.string).
Looking at the docs you've provided what you're looking to do is to take the image and send it in to the API. Images are easily transferable in a text format if you encode them, base64 being pretty much the standard. So what we want to do is create a json object with the image as base64 in the right place and then send this json object into the REST api. python has the requests library which makes sending in a python dictionary as JSON very easy.
So take the image, encode it, put it in a dictionary and send it off using requests:
import requests
import base64
encoded_image = None
with open("image.png", "rb") as image_file:
encoded_image = base64.b64encode(image_file.read())
object_for_api = {"signature_name": "predict",
"instances": [
{
"image": {"b64": encoded_image}
}]
}
requests.post(url='http://localhost:8501/v1/models/mnist:predict', json=object_for_api)
You can also encode your numpy array into JSON but it doesn't seem that the API docs are looking for that.
Two side notes:
I encourage you to use tf.saved_model.simple_save
You may find model_to_estimator convenient.
While your model seems like it will work for requests (the output of saved_model_cli shows the outer dimension is None for both inputs and outputs), it's fairly inefficient to send JSON arrays of floats
To the last point, it's often easier to modify the code to do the image decoding server side so you're sending a base64 encoded JPG or PNG over the wire instead of an array of floats. Here's one example for Keras (I plan to update that answer with simpler code).

tensorflow record with float numpy array

I want to create tensorflow records to feed my model;
so far I use the following code to store uint8 numpy array to TFRecord format;
def _int64_feature(value):
return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
def _bytes_feature(value):
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
def _floats_feature(value):
return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))
def convert_to_record(name, image, label, map):
filename = os.path.join(params.TRAINING_RECORDS_DATA_DIR, name + '.' + params.DATA_EXT)
writer = tf.python_io.TFRecordWriter(filename)
image_raw = image.tostring()
map_raw = map.tostring()
label_raw = label.tostring()
example = tf.train.Example(features=tf.train.Features(feature={
'image_raw': _bytes_feature(image_raw),
'map_raw': _bytes_feature(map_raw),
'label_raw': _bytes_feature(label_raw)
}))
writer.write(example.SerializeToString())
writer.close()
which I read with this example code
features = tf.parse_single_example(example, features={
'image_raw': tf.FixedLenFeature([], tf.string),
'map_raw': tf.FixedLenFeature([], tf.string),
'label_raw': tf.FixedLenFeature([], tf.string),
})
image = tf.decode_raw(features['image_raw'], tf.uint8)
image.set_shape(params.IMAGE_HEIGHT*params.IMAGE_WIDTH*3)
image = tf.reshape(image_, (params.IMAGE_HEIGHT,params.IMAGE_WIDTH,3))
map = tf.decode_raw(features['map_raw'], tf.uint8)
map.set_shape(params.MAP_HEIGHT*params.MAP_WIDTH*params.MAP_DEPTH)
map = tf.reshape(map, (params.MAP_HEIGHT,params.MAP_WIDTH,params.MAP_DEPTH))
label = tf.decode_raw(features['label_raw'], tf.uint8)
label.set_shape(params.NUM_CLASSES)
and that's working fine. Now I want to do the same with my array "map" being a float numpy array, instead of uint8, and I could not find examples on how to do it;
I tried the function _floats_feature, which works if I pass a scalar to it, but not with arrays;
with uint8 the serialization can be done by the method tostring();
How can I serialize a float numpy array and how can I read that back?
FloatList and BytesList expect an iterable. So you need to pass it a list of floats. Remove the extra brackets in your _float_feature, ie
def _floats_feature(value):
return tf.train.Feature(float_list=tf.train.FloatList(value=value))
numpy_arr = np.ones((3,)).astype(np.float)
example = tf.train.Example(features=tf.train.Features(feature={"bytes": _floats_feature(numpy_arr)}))
print(example)
features {
feature {
key: "bytes"
value {
float_list {
value: 1.0
value: 1.0
value: 1.0
}
}
}
}
I will expand on the Yaroslav's answer.
Int64List, BytesList and FloatList expect an iterator of the underlying elements (repeated field). In your case you can use a list as an iterator.
You mentioned: it works if I pass a scalar to it, but not with arrays. And this is expected, because when you pass a scalar, your _floats_feature creates an array of one float element in it (exactly as expected). But when you pass an array you create a list of arrays and pass it to a function which expects a list of floats.
So just remove construction of the array from your function: float_list=tf.train.FloatList(value=value)
I've stumbled across this while working on a similar problem. Since part of the original question was how to read back the float32 feature from tfrecords, I'll leave this here in case it helps anyone:
If map.ravel() was used to input map of dimensions [x, y, z] into _floats_feature:
features = {
...
'map': tf.FixedLenFeature([x, y, z], dtype=tf.float32)
...
}
parsed_example = tf.parse_single_example(serialized=serialized, features=features)
map = parsed_example['map']
Yaroslav's example failed when a nd array was the input:
numpy_arr = np.ones((3,3)).astype(np.float)
I found that it worked when I used numpy_arr.ravel() as the input. But is there a better way to do it?
First of all, many thanks to Yaroslav and Salvador for their enlightening answers.
According to my experience, their methods only works when the input is a 1D NumPy array as the size of (n, ). When the input is a Numpy array with the dimension of more than 2, the following error info appears:
def _float_feature(value):
return tf.train.Feature(float_list=tf.train.FloatList(value=value))
numpy_arr = np.arange(12).reshape(2, 2, 3).astype(np.float)
example = tf.train.Example(features=tf.train.Features(feature={"bytes":
_float_feature(numpy_arr)}))
print(example)
TypeError: array([[0., 1., 2.],
[3., 4., 5.]]) has type numpy.ndarray, but expected one of: int, long, float
So, I'd like to expand on Tsuan's answer, that is, flattening the input before it was fed into the TF example. The modified code is as follows:
def _floats_feature(value):
return tf.train.Feature(float_list=tf.train.FloatList(value=value))
numpy_arr = np.arange(12).reshape(2, 2, 3).astype(np.float).flatten()
example = tf.train.Example(features=tf.train.Features(feature={"bytes":
_float_feature(numpy_arr)}))
print(example)
In addition, np.flatten() is more applicable than np.ravel().
Use tfrmaker, a TFRecord utility package. You can install the package with pip:
pip install tfrmaker
Then you could create tfrecords like this:
from tfrmaker import images
# mapping label names with integer encoding.
LABELS = {"bishop": 0, "knight": 1, "pawn": 2, "queen": 3, "rook": 4}
# specifiying data and output directories.
DATA_DIR = "datasets/chess/"
OUTPUT_DIR = "tfrecords/chess/"
# create tfrecords from the images present in the given data directory.
info = images.create(DATA_DIR, LABELS, OUTPUT_DIR)
# info contains a list of information (path: releative path, size: no of images in the tfrecord) about created tfrecords
print(info)
The package also has some cool features like:
dynamic resizing
splitting tfrecords into optimal shards
spliting training, validation, testing of tfrecords
count no of images in tfrecords
asynchronous tfrecord creation
NOTE: This package currently supports image datasets that are organised as directories with class names as sub directory names.