using k-means with a placeholder as an input - tensorflow

I'm running KMeans to cluster my data like this:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=N, random_state=0).fit(X)
Centers = kmeans.cluster_centers_
Now I need to use a placeholder, with code like the following:
X = tf.placeholder(tf.float32, shape=(None, num_features))
kmeans = KMeans(n_clusters=N, random_state=0).fit(X)
Centers = kmeans.cluster_centers_
However, it does not work. Is there an equivalent way in TensorFlow (tensorflow.compat.v1) to use both KMeans and a placeholder?

The error comes from a misunderstanding of what a placeholder is.
Roughly speaking, a placeholder is a TensorFlow object that stands in for data you promise to supply later.
Your X has a shape, but it holds no data. To fit a model you have to give it something that contains data, which X does not. If you really want to use a TensorFlow object (and I advise against it), you have to put data into it, for example with tf.Variable.
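As a minimal sketch of the point above: scikit-learn's KMeans works on concrete arrays, so the simplest route is to hand it a NumPy array. The data here is made up for illustration (two well-separated blobs), and n_init=10 is just an explicit choice. If the data were produced by a TensorFlow graph, the assumption is that you would first evaluate it with sess.run(...) and pass the resulting array to fit.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)

# Hypothetical data: two well-separated blobs with 2 features each.
X = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 2)),
    rng.normal(5.0, 0.1, size=(20, 2)),
]).astype(np.float32)

# fit() needs real values, which is why a placeholder cannot be passed here.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
centers = kmeans.cluster_centers_
print(centers.shape)
```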

Related

How to specify variables in tensorflow simple save

I am trying, unsuccessfully, to save my TensorFlow model using the simple_save method.
I have built a model using Keras and trained it successfully, with an accuracy of 88%. I am now trying to save this model so we can serve it, but the function I need, simple_save, isn't clear about how to specify the variables that get passed in.
The session and the export directory are clear enough, but the inputs and outputs are mysterious. I believe that because I've used Keras, these variables are hidden behind the Keras abstraction, and the TensorFlow documentation on simple_save offers no explanation.
As a Hail Mary, I set Z equal to y just to put something in there, but obviously that is wrong. Do I need to set up an output variable Z, and if so, what type is it?
Not sure if this is enough code to get to the bottom of this. Even getting pointed at the right docs would be a big boost.
import numpy as np
import tensorflow as tf
session = tf.keras.backend.get_session()
export_dir = "/Users/somedir/"
z = np.array([])
tf.saved_model.simple_save(session,
                           export_dir,
                           inputs={"x": X, "y": y},
                           outputs={"z": z})
X is my dataset -- an array of all independent variables. Y is the outcome (dependent variable). I don't have another candidate for z, so I set it to an empty array.
I get AttributeError: 'numpy.ndarray' object has no attribute 'get_shape'
Turns out that you can query the model itself for its inputs and outputs.
Don't forget to import the right libs:
import time
import tensorflow as tf
import tensorflow.python.saved_model
Then set an export path variable. For convenience this is timestamped, so you can run this again and again:
export_path = "/somedirectory/{}".format(time.strftime("%Y%m%d_%H%M%S"))
Then, inside a get_session() block, the following will do the trick:
with tf.keras.backend.get_session() as sess:
    tf.saved_model.simple_save(
        sess,
        export_path,
        inputs={t.name: t for t in model.inputs},
        outputs={t.name: t for t in model.outputs})

TensorFlow: Initial value without shape

I tried to implement the following code.
import tensorflow as tf
a = tf.placeholder(tf.int32)
b = tf.placeholder(tf.int32)
def initw(a,b):
    return tf.Variable(tf.sign(tf.random_uniform(shape=[a,b],minval=-1.0,maxval=1.0)))
bla = initw(a,b)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([bla], feed_dict={a:2, b:2}))
But I keep getting an error which states:
ValueError: initial_value must have a shape specified: Tensor("Sign:0",shape=(?, ?), dtype=float32)
Can someone tell me what I am doing wrong here? I really don't see what causes the error.
EDIT:
I want to use initw(a,b) to initialize the weights of a network. I want to be able to do something like:
weights = {
"h1": tf.get_variable("h1", initializer=initw(a,b).initialized_value())
}
Where a and b are the height and width of a matrix.
In my eyes the error message is actually quite precise, but I understand your confusion. You probably do not yet have a clear picture of how TensorFlow works under the hood. You might want to start reading here.
The shape of a variable must be fully known when the graph is built, before anything runs. A placeholder may leave dimensions unspecified (typically the batch dimension, which is filled in at run time), but the initial value of a tf.Variable may not.
In your case you are trying to use placeholders to specify the dimensions of a variable, which is impossible because the graph cannot be built this way.
I don't know what you are trying to do with this, but I would guess there is a way to achieve what you need. You can actually use the length of the batch dimension dynamically to draw a uniform vector of that size.
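A rough sketch of that last idea, assuming the tensorflow.compat.v1 API in graph mode. The placeholder x and its 3-feature shape are hypothetical. tf.shape(x) is evaluated at run time, so it can size a random tensor per batch without any variable needing a dynamic shape.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

with tf.Graph().as_default():
    # Hypothetical input: the batch size is unknown when the graph is built.
    x = tf.placeholder(tf.float32, shape=(None, 3))

    # tf.shape(x)[:1] is a 1-D int32 tensor holding the run-time batch size.
    signs = tf.sign(tf.random_uniform(shape=tf.shape(x)[:1],
                                      minval=-1.0, maxval=1.0))

    with tf.Session() as sess:
        out = sess.run(signs, feed_dict={x: np.zeros((5, 3), dtype=np.float32)})
        print(out.shape)
```

This produces a plain tensor that is redrawn on every run, not a persistent variable, so it does not replace weight initialization.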
Edit: After you updated the question I feel like I was right about my suspicion. There is no need for a and b to be placeholders, just make them Python variables, like this:
import tensorflow as tf
# The matrix shape must be known in advance, but it can of course still be
# specified in a settings file or at the beginning of the Python script
A = 2
B = 2
W = tf.Variable(tf.sign(tf.random_uniform(shape=(A, B), minval=-1.0,
                                          maxval=1.0)))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(W))

How to deal with large(>2GB) embedding lookup table in tensorflow?

I want to use pre-trained word vectors for classification with an LSTM, and I am wondering how to deal with an embedding lookup table larger than 2 GB in TensorFlow.
I tried to build the embedding lookup like this:
data = tf.nn.embedding_lookup(vector_array, input_data)
but got this error:
ValueError: Cannot create a tensor proto whose content is larger than 2GB
vector_array in the code is a NumPy array containing about 14 million unique tokens, with a 100-dimensional word vector for each.
Thank you for your help.
You need to copy it to a tf variable. There's a great answer to this question in StackOverflow:
Using a pre-trained word embedding (word2vec or Glove) in TensorFlow
This is how I did it:
embedding_weights = tf.Variable(tf.constant(0.0, shape=[embedding_vocab_size, EMBEDDING_DIM]),trainable=False, name="embedding_weights")
embedding_placeholder = tf.placeholder(tf.float32, [embedding_vocab_size, EMBEDDING_DIM])
embedding_init = embedding_weights.assign(embedding_placeholder)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
sess.run(embedding_init, feed_dict={embedding_placeholder: embedding_matrix})
You can then use the embedding_weights variable to perform the lookup (remember to store the word-index mapping).
Update: Using the variable is not required, but it lets you save it for future use so that you don't have to redo the whole thing (it takes a while on my laptop when loading very large embeddings). If that's not important, you can simply use placeholders as Niklas Schnelle suggested.
For me the accepted answer doesn't seem to work. While there is no error the results were terrible (when compared to a smaller embedding via direct initialization) and I suspect the embeddings were just the constant 0 the tf.Variable() is initialized with.
Using just a placeholder without an extra variable
self.Wembed = tf.placeholder(
    tf.float32, self.embeddings.shape,
    name='Wembed')
and then feeding the embedding on every session.run() of the graph seems to work however.
Using feed_dict with large embeddings was too slow for me with TF 1.8, probably due to the issue mentioned by Niklas Schnelle.
I ended up with the following code:
embeddings_ph = tf.placeholder(tf.float32, wordVectors.shape, name='wordEmbeddings_ph')
embeddings_var = tf.Variable(embeddings_ph, trainable=False, name='wordEmbeddings')
embeddings = tf.nn.embedding_lookup(embeddings_var,input_data)
.....
sess.run(tf.global_variables_initializer(), feed_dict={embeddings_ph:wordVectors})

How can I feed a numpy array to a prefetch and buffer pipeline of TensorFlow

I tried to follow the Cifar10 example. However, I want to replace the file reading with a NumPy array. There are a few benefits to doing that:
Simpler code (I want to remove the binary file parsing)
Simpler graph and visualization --> easier to explain to other audiences
Small perf improvement (due to I/O and parsing)?
What would be a simple way to do it?
You need to get hold of the tensor reshaped_image, either by:
giving it a name
finding its default name, with TensorBoard for instance
reshaped_image = tf.cast(read_input.uint8image, tf.float32, name="float_image")
Then you can feed your NumPy array using a feed_dict. Note that a tensor name includes the output index, hence the ":0" suffix:
reshaped_image = tf.get_default_graph().get_tensor_by_name("float_image:0")
sess.run(loss, feed_dict={reshaped_image: your_numpy})
The same goes for labels.
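A self-contained sketch of the same idea, assuming tensorflow.compat.v1 in graph mode. The constant raw and the sum total are hypothetical stand-ins for the file-reading op and the loss. Feeding a value for an intermediate tensor overrides whatever the graph would have computed for it.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

with tf.Graph().as_default() as graph:
    # Stand-in for the file-reading part of the pipeline (all zeros).
    raw = tf.constant(np.zeros((2, 2), dtype=np.uint8))
    float_image = tf.cast(raw, tf.float32, name="float_image")
    total = tf.reduce_sum(float_image)  # stand-in for the loss

    # Look the tensor up by name; ":0" selects the op's first output.
    fetched = graph.get_tensor_by_name("float_image:0")

    with tf.Session() as sess:
        # The fed array replaces the output of the cast op for this run.
        result = sess.run(total,
                          feed_dict={fetched: np.ones((2, 2), dtype=np.float32)})
        print(result)
```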

writing a custom cost function in tensorflow

I'm trying to write my own cost function in TensorFlow, but apparently I cannot 'slice' the tensor object?
import tensorflow as tf
import numpy as np
# Establish variables
x = tf.placeholder("float", [None, 3])
W = tf.Variable(tf.zeros([3,6]))
b = tf.Variable(tf.zeros([6]))
# Establish model
y = tf.nn.softmax(tf.matmul(x,W) + b)
# Truth
y_ = tf.placeholder("float", [None,6])
def angle(v1, v2):
    return np.arccos(np.sum(v1*v2,axis=1))
def normVec(y):
    return np.cross(y[:,[0,2,4]],y[:,[1,3,5]])
angle_distance = -tf.reduce_sum(angle(normVec(y_),normVec(y)))
# This is the example code they give for cross entropy
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
I get the following error:
TypeError: Bad slice index [0, 2, 4] of type <type 'list'>
At the time of this answer, tf.gather could only gather along the first axis (support for other axes had been requested; later releases added an axis argument).
But for what you want to do in this specific situation, you can transpose, then gather 0,2,4, and then transpose back. It won't be crazy fast, but it works:
tf.transpose(tf.gather(tf.transpose(y), [0,2,4]))
This is a useful workaround for some of the limitations in the current implementation of gather.
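To see why the transpose trick selects columns, here is a NumPy-only check of the indexing logic (not TensorFlow code; the array y is made up). Gathering rows of the transpose and transposing back gives the same result as direct column selection.

```python
import numpy as np

# Hypothetical 2 x 6 input, standing in for a batch of model outputs.
y = np.arange(12).reshape(2, 6)

# Gather rows 0, 2, 4 of the transpose, then transpose back.
via_transpose = np.take(y.T, [0, 2, 4], axis=0).T

# Direct column selection, which is what y[:, [0, 2, 4]] asks for.
direct = y[:, [0, 2, 4]]

print(np.array_equal(via_transpose, direct))
```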
But it is also correct that you can't use a NumPy slice on a TensorFlow node; you can run it and slice the output. You also need to initialize those variables before you run. :) You're mixing tf and np in a way that doesn't work.
x = tf.Something(...)
is a TensorFlow graph object. NumPy has no idea how to cope with such objects.
foo = sess.run(x)
is back to an object Python can handle.
You typically want to keep your loss calculation in pure TensorFlow, so do the cross product and the other functions in tf. If your TensorFlow version has no arccos op, you'll have to compute it the long way.
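A possible pure-TensorFlow version of the two helpers, as a sketch only. It assumes a release where tf.gather accepts an axis argument and where tf.linalg.cross and tf.math.acos exist (used here through tensorflow.compat.v1). The names norm_vec, y_true, y_pred and total_angle are illustrative, the clipping step is an addition for numerical safety, and the sign convention of the original loss is left aside.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

def norm_vec(y):
    # Cross product of columns (0, 2, 4) with columns (1, 3, 5).
    a = tf.gather(y, [0, 2, 4], axis=1)
    b = tf.gather(y, [1, 3, 5], axis=1)
    return tf.linalg.cross(a, b)

def angle(v1, v2):
    # Clip so rounding error cannot push the dot product outside [-1, 1].
    dot = tf.reduce_sum(v1 * v2, axis=1)
    return tf.math.acos(tf.clip_by_value(dot, -1.0, 1.0))

with tf.Graph().as_default():
    y_true = tf.placeholder(tf.float32, [None, 6])
    y_pred = tf.placeholder(tf.float32, [None, 6])
    total_angle = tf.reduce_sum(angle(norm_vec(y_true), norm_vec(y_pred)))

    with tf.Session() as sess:
        # Columns (0, 2, 4) give [1, 0, 0]; columns (1, 3, 5) give [0, 1, 0].
        row = np.array([[1, 0, 0, 1, 0, 0]], dtype=np.float32)
        same = sess.run(total_angle, feed_dict={y_true: row, y_pred: row})
        print(same)
```

Like the original angle function, this treats the dot product as a cosine, which only holds if the cross products are unit vectors.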
I just realized that the following fails:
cross_entropy = -tf.reduce_sum(y_*np.log(y))
You can't use NumPy functions on tf objects, and the indexing may be different too.
I think you can use TensorFlow's "Wraps a Python function" op (tf.py_func). Here's the link to the documentation.
As for the people who answered "Why don't you just use TensorFlow's built-in functions to construct it?": sometimes the cost function people are looking for cannot be expressed with tf's functions, or is extremely difficult to express that way.
This is because you have not initialized your variables, so there is no concrete tensor value there yet (you can read more in my answer here).
Just do something like this:
def normVec(y):
    print(y)
    return np.cross(y[:,[0,2,4]],y[:,[1,3,5]])
t1 = normVec(y_)
# and comment out everything after it.
You will see that you do not have a concrete value yet, only Tensor("Placeholder_1:0", shape=TensorShape([Dimension(None), Dimension(6)]), dtype=float32).
Try initializing your variables
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
and evaluate your variable with sess.run(y). P.S. You have not fed your placeholders so far.