Programmatically `git status` with dulwich - dulwich

How can I perform the equivalent of git status with dulwich?
After adding, changing and renaming some files and staging them for commit, this is what I tried:
from dulwich.repo import Repo
from dulwich.index import changes_from_tree
r = Repo('my-git-repo')
index = r.open_index()
changes = index.changes_from_tree(r.object_store, r['HEAD'].tree)
Outputs the following:
>>> list(changes)
(('Makefile', None), (33188, None), ('9b20...', None))
(('test/README.txt', 'test/README.txt'), (33188, 33188), ('484b...', '4f89...'))
((None, 'Makefile.mk'), (None, 33188), (None, '9b20...'))
((None, 'TEST.txt'), (None, 33188), (None, '2a02...'))
But this output requires that I further process it to detect:
I modified README.txt.
I renamed Makefile to Makefile.mk.
I added TEST.txt to the repository.
The functions in dulwich.diff_tree provide a much nicer interface to tree changes... is this not possible before actually committing?

You should be able to use dulwich.diff_tree.tree_changes to detect the changes between two trees.
One of the requirements for this is that you add the relevant tree objects to the object store - you can use dulwich.index.commit_index for this.

For completeness, a working sample:
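from dulwich.repo import Repo
from dulwich.diff_tree import tree_changes
repo = Repo("./")
index = repo.open_index()
try:
    head_tree = repo[repo.head()].tree  # repo.head() returns the SHA of HEAD, so look the commit up first
except KeyError:  # no commits yet, so there is no HEAD tree
    head_tree = None  # None is treated as an empty tree
changes = list(tree_changes(repo.object_store, head_tree, index.commit(repo.object_store)))
for change in changes:
    # change.new.path is None for deletions, so fall back to the old path
    print("%s: %s" % (change.type, change.new.path or change.old.path))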
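If you would rather stay with the raw tuples from the question, they can also be post-processed by hand. The following is only a minimal sketch in plain Python, not part of dulwich: the helper name classify_changes is made up for illustration, and it only detects exact renames (a deleted and an added entry that share the same blob SHA). The sample data is copied from the output in the question.

```python
def classify_changes(changes):
    """Turn ((old_path, new_path), (old_mode, new_mode), (old_sha, new_sha))
    tuples into a list of (kind, old_path, new_path) entries."""
    deleted = {}   # sha -> path, for entries that vanished from the tree
    added = []     # (path, sha) for entries that are new in the index
    result = []
    for (old_path, new_path), _modes, (old_sha, new_sha) in changes:
        if new_path is None:
            deleted[old_sha] = old_path
        elif old_path is None:
            added.append((new_path, new_sha))
        else:
            result.append(("modify", old_path, new_path))
    for path, sha in added:
        if sha in deleted:
            # Same blob under a new name: treat it as an exact rename.
            result.append(("rename", deleted.pop(sha), path))
        else:
            result.append(("add", None, path))
    for path in deleted.values():
        result.append(("delete", path, None))
    return result


# Sample data taken from the output shown in the question.
sample = [
    (("Makefile", None), (33188, None), ("9b20...", None)),
    (("test/README.txt", "test/README.txt"), (33188, 33188), ("484b...", "4f89...")),
    ((None, "Makefile.mk"), (None, 33188), (None, "9b20...")),
    ((None, "TEST.txt"), (None, 33188), (None, "2a02...")),
]
for entry in classify_changes(sample):
    print(entry)
```

A renamed file that was also edited will show up as a delete plus an add here, since its SHA changes; similarity-based rename detection is beyond this sketch.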

Related

Tf-agent Actor/Learner: TFUniform ReplayBuffer dimensionality issue - invalid shape of Replay Buffer vs. Actor update

I am trying to adapt this tf-agents actor<->learner DQN Atari Pong example to my Windows machine, using a TFUniformReplayBuffer instead of the ReverbReplayBuffer (which only works on Linux), but I am running into a dimensionality issue.
[...]
---> 67 init_buffer_actor.run()
[...]
InvalidArgumentError: {{function_node __wrapped__ResourceScatterUpdate_device_/job:localhost/replica:0/task:0/device:CPU:0}} Must have updates.shape = indices.shape + params.shape[1:] or updates.shape = [], got updates.shape [84,84,4], indices.shape [1], params.shape [1000,84,84,4] [Op:ResourceScatterUpdate]
The problem is as follows: the actor tries to initialize the replay buffer with a certain number of random samples of shape (84,84,4), following this DeepMind paper, but the replay buffer requires samples of shape (1,84,84,4).
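The shape rule quoted in the error message can be checked by hand. The snippet below is just a plain-Python illustration of that rule with the shapes from the traceback; scatter_update_shape_ok is a made-up helper, not a TensorFlow function.

```python
def scatter_update_shape_ok(updates_shape, indices_shape, params_shape):
    """Rule quoted in the error: updates.shape must equal
    indices.shape + params.shape[1:], or be a scalar ()."""
    updates_shape = tuple(updates_shape)
    expected = tuple(indices_shape) + tuple(params_shape)[1:]
    return updates_shape == () or updates_shape == expected


params = (1000, 84, 84, 4)   # buffer storage: max_length x frame shape
indices = (1,)               # one slot is written per call (batch_size=1)

print(scatter_update_shape_ok((84, 84, 4), indices, params))     # False: unbatched frame
print(scatter_update_shape_ok((1, 84, 84, 4), indices, params))  # True: frame with a batch dimension
```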
My code is as follows:
def train_pong(
        env_name='ALE/Pong-v5',
        initial_collect_steps=50000,
        max_episode_frames_collect=50000,
        batch_size=32,
        learning_rate=0.00025,
        replay_capacity=1000):
    # load atari environment
    collect_env = suite_atari.load(
        env_name,
        max_episode_steps=max_episode_frames_collect,
        gym_env_wrappers=suite_atari.DEFAULT_ATARI_GYM_WRAPPERS_WITH_STACKING)
    # create tensor specs
    observation_tensor_spec, action_tensor_spec, time_step_tensor_spec = (
        spec_utils.get_tensor_specs(collect_env))
    # create training util
    train_step = train_utils.create_train_step()
    # calculate no. of actions
    num_actions = action_tensor_spec.maximum - action_tensor_spec.minimum + 1
    # create agent
    agent = dqn_agent.DqnAgent(
        time_step_tensor_spec,
        action_tensor_spec,
        q_network=create_DL_q_network(num_actions),
        optimizer=tf.compat.v1.train.RMSPropOptimizer(learning_rate=learning_rate))
    # create uniform replay buffer
    replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
        data_spec=agent.collect_data_spec,
        batch_size=1,
        max_length=replay_capacity)
    # observer of replay buffer
    rb_observer = replay_buffer.add_batch
    # create batch dataset
    dataset = replay_buffer.as_dataset(
        sample_batch_size=batch_size,
        num_steps=2,
        single_deterministic_pass=False).prefetch(3)
    # create callable function for actor
    experience_dataset_fn = lambda: dataset
    # create random policy for buffer init
    random_policy = random_py_policy.RandomPyPolicy(collect_env.time_step_spec(),
                                                    collect_env.action_spec())
    # create initializer
    init_buffer_actor = actor.Actor(
        collect_env,
        random_policy,
        train_step,
        steps_per_run=initial_collect_steps,
        observers=[replay_buffer.add_batch])
    # initialize buffer with random samples
    init_buffer_actor.run()
(The approach uses the OpenAI Gym environment together with the corresponding wrapper functions.)
I have worked with keras-rl2 and with tf-agents without actor<->learner to create DQNs for other Atari games, and both worked quite well after a few adaptations. I guess my current code would also work after a few adaptations to the tf-agents library functions, but that would defeat the purpose of the library.
My current assumption: the actor<->learner methods cannot work with the TFUniformReplayBuffer (as I expected them to) because they lack support for the TFPyEnvironment. Or I still have gaps in my knowledge of this tf-agents approach.
Previous (successful) attempt:
from tf_agents.drivers.dynamic_step_driver import DynamicStepDriver
from tf_agents.environments.tf_py_environment import TFPyEnvironment
tf_collect_env = TFPyEnvironment(collect_env)
init_driver = DynamicStepDriver(
    tf_collect_env,
    random_policy,
    observers=[replay_buffer.add_batch],
    num_steps=200)
init_driver.run()
I would be very grateful if someone could explain what I am overlooking here.
I fixed it... partly, but the next error is (in my opinion) an architectural problem.
The problem is that the Actor/Learner setup is built on a PyEnvironment, whereas the TFUniformReplayBuffer expects a TFPyEnvironment, which leads to the failure above.
Using the PyUniformReplayBuffer with a converted py-spec solved this problem.
from tf_agents.replay_buffers import py_uniform_replay_buffer
from tf_agents.specs import tensor_spec
# convert agent spec to py-data-spec
py_collect_data_spec = tensor_spec.to_array_spec(agent.collect_data_spec)
# create replay buffer based on the py-data-spec
replay_buffer = py_uniform_replay_buffer.PyUniformReplayBuffer(
    data_spec=py_collect_data_spec,
    capacity=replay_capacity * batch_size)
This snippet solved the problem of having an incompatible buffer in the background, but it leads to another issue:
--> The add_batch function does not work.
I found this approach, which advises either using a batched environment or making the following adaptations to the replay observer (add_batch method).
from tf_agents.utils.nest_utils import batch_nested_array
#********* Adaptations add_batch method - START *********#
rb_observer = lambda x: replay_buffer.add_batch(batch_nested_array(x))
#********* Adaptations add_batch method - END *********#
# create batch dataset
dataset = replay_buffer.as_dataset(
    sample_batch_size=32,
    single_deterministic_pass=False)
experience_dataset_fn = lambda: dataset
This helped me solve the issue from this post, but now I run into another problem, for which I need to ask someone from the tf-agents team...
--> It seems that the Learner/Actor structure is not able to work with any buffer other than the ReverbReplayBuffer, because the data spec processed by the PyUniformReplayBuffer sets up a wrong buffer structure...
For anyone who has the same problem: I just created this GitHub issue report to get further answers and/or fix my lack of knowledge.
The full fix is shown below...
--> The dimensionality error was valid and indicates that the (uploaded) batched samples are not in the correct shape.
--> This happens because the "add_batch" method receives values with the wrong shape:
rb_observer = replay_buffer.add_batch
Long story short, this line should be replaced by
rb_observer = lambda x: replay_buffer.add_batch(batch_nested_array(x))
--> Afterwards the (replay buffer) inputs have the correct shape and the Learner/Actor setup starts training.
The full replay buffer is shown below:
# create buffer for storing experience
replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    agent.collect_data_spec,
    1,
    max_length=1000000)
# create batch dataset
dataset = replay_buffer.as_dataset(
    sample_batch_size=32,
    num_steps=2,
    single_deterministic_pass=False).prefetch(4)
# create batched nested array input for rb_observer
rb_observer = lambda x: replay_buffer.add_batch(batch_nested_array(x))
# create batched readout of dataset
experience_dataset_fn = lambda: dataset
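To see why wrapping the observer helps, here is a rough plain-Python illustration of the idea: every leaf of the nested step structure gets an extra outer dimension of size 1. This is not the tf-agents implementation; add_batch_dim and shape are made-up stand-ins that use nested lists instead of arrays, and the sample step is hypothetical.

```python
def add_batch_dim(nest):
    """Wrap every leaf of a nested dict/tuple structure in a length-1 list."""
    if isinstance(nest, dict):
        return {key: add_batch_dim(value) for key, value in nest.items()}
    if isinstance(nest, tuple):
        return tuple(add_batch_dim(value) for value in nest)
    return [nest]  # leaf: prepend an outer dimension of size 1


def shape(value):
    """Shape of a non-empty rectangular nested list; () for a scalar."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        value = value[0]
    return tuple(dims)


# Hypothetical, tiny stand-in for one collected step (a real Atari frame is 84x84x4).
step = {"observation": [[0, 0], [0, 0], [0, 0]], "reward": 1.0}
batched = add_batch_dim(step)
print(shape(step["observation"]), shape(batched["observation"]))  # (3, 2) (1, 3, 2)
print(shape(step["reward"]), shape(batched["reward"]))            # () (1,)
```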

Add file name and line number to tensorflow op name during debug mode

I am interested in a feature or hacky solution that allows every tensorflow (specifically tf1.x) op name to include the file name and line number where the op is defined, in an automated fashion across the entire code base. This will greatly facilitate tracking down the place where an op raises an error, such as the situation below:
File "tensorflow/contrib/distribute/python/mirrored_strategy.py", line 633, in _update
assert isinstance(var, values.DistributedVariable), var
AssertionError: Tensor("floordiv_2:0", shape=(), dtype=int64, device=/job:chief/replica:0/task:0/device:GPU:0)
Right now the best I can do is to take a wild guess where a floordiv might occur, but honestly I have no clue at the moment.
The easiest way might be to show the graph in TensorBoard and then search for the failing op by name. From the names of the parent scopes or of the preceding operations you can probably tell which layer is failing.
If that does not help, wrap your layer calls and model constructions in name scopes. Provided that the failing tensor is yours and not part of the optimizer or something else, this should show you where to look.
If you are determined to wrap every op in a file-name/line-number scope, you can try to monkey-patch the tf.Operation constructor with a scope. It should be something along these lines:
import re
from inspect import getframeinfo, stack
import tensorflow as tf
def scopify_ops(func):
    def wrapper(*args, **kwargs):
        caller = getframeinfo(stack()[1][0])
        path = "%s:%d" % (caller.filename, caller.lineno)
        print("Caller info:", path)
        # name scopes only accept a restricted character set, so sanitize the path
        scope = re.sub(r"[^A-Za-z0-9_.]", "_", path).lstrip("_")
        with tf.name_scope(scope):
            return func(*args, **kwargs)
    return wrapper
tf.Operation.__init__ = scopify_ops(tf.Operation.__init__)
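The part that derives a usable name from the call site can be tried on its own without TensorFlow. The following is a small standard-library sketch; caller_scope_name and build_layer are hypothetical names, and the set of characters kept is an assumption chosen to be conservative rather than the exact rule TensorFlow enforces.

```python
import inspect
import os
import re


def caller_scope_name(depth=1):
    """Build a name such as 'model_py_L42' from the calling frame."""
    frame = inspect.stack()[depth]  # depth=1 is the direct caller
    base = os.path.basename(frame.filename)
    raw = "%s_L%d" % (base, frame.lineno)
    # Keep only letters, digits and underscores (a conservative choice).
    safe = re.sub(r"[^A-Za-z0-9_]", "_", raw)
    # Make sure the name starts with a letter or digit.
    return safe.lstrip("_") or "unknown"


def build_layer():
    # Hypothetical user code: the name reflects this file and this line.
    return caller_scope_name()


print(build_layer())
```

Passing a larger depth walks further up the stack, which is what you would need when the helper is called from inside library code rather than directly from your model code.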

Tensorflow/Keras, How to convert tf.feature_column into input tensors?

I have the following code to average the embeddings for a list of item ids.
(The embedding is trained on review_meta_id_input and used as a lookup for priors_input to get the average embedding.)
review_meta_id_input = tf.keras.layers.Input(shape=(1,), dtype='int32', name='review_meta_id')
priors_input = tf.keras.layers.Input(shape=(None,), dtype='int32', name='priors') # array of ids
item_embedding_layer = tf.keras.layers.Embedding(
    input_dim=100, # max number
    output_dim=self.item_embedding_size,
    name='item')
review_meta_id_embedding = item_embedding_layer(review_meta_id_input)
selected = tf.nn.embedding_lookup(review_meta_id_embedding, priors_input)
non_zero_count = tf.cast(tf.math.count_nonzero(priors_input, axis=1), tf.float32)
embedding_sum = tf.reduce_sum(selected, axis=1)
item_average = tf.math.divide(embedding_sum, non_zero_count)
I also have some feature columns, such as the following.
(I just thought feature_column looked cool, but there is not much documentation to look at.)
kid_youngest_month = feature_column.numeric_column("kid_youngest_month")
kid_age_youngest_buckets = feature_column.bucketized_column(kid_youngest_month, boundaries=[12, 24, 36, 72, 96])
I'd like to define [review_meta_id_input, priors_input, (tensors from feature_columns)] as the input to a Keras Model,
something like:
inputs = [review_meta_id_input, priors_input] + feature_layer
model = tf.keras.models.Model(inputs=inputs, outputs=o)
In order to get tensors from feature columns, the closest lead I have now is
fc_to_tensor = {fc: input_layer(features, [fc]) for fc in feature_columns}
from https://github.com/tensorflow/tensorflow/issues/17170
However, I'm not sure what features is in that code.
There's no clear example on https://www.tensorflow.org/api_docs/python/tf/feature_column/input_layer either.
How should I construct the features variable for fc_to_tensor?
Or is there a way to use keras.layers.Input and feature_column at the same time?
Or is there an alternative to tf.feature_column for doing the bucketing shown above? Then I would just drop feature_column for now.
The behavior you want can be achieved with the following steps.
This works in TF 2.0.0-beta1, but it may change or even be simplified in later releases.
Please check out the issue "Unable to use FeatureColumn with Keras Functional API #27416" in the TensorFlow GitHub repository. There you will find a more general example and useful comments about tf.feature_column and the Keras Functional API.
Meanwhile, based on the code in your question, the input tensor for the feature_column can be obtained like this:
# The feature columns you have already defined
kid_youngest_month = feature_column.numeric_column("kid_youngest_month")
kid_age_youngest_buckets = feature_column.bucketized_column(kid_youngest_month, boundaries=[12, 24, 36, 72, 96])
# Then define the layer
feature_layer = tf.keras.layers.DenseFeatures(kid_age_youngest_buckets)
# The inputs for the DenseFeatures layer should be defined as a dictionary with one entry per original feature column, where
# keys - names of the feature columns
# values - tf.keras.Input with shape=(1,), name='name_of_feature_column', dtype - actual type of the original column
feature_layer_inputs = {}
feature_layer_inputs['kid_youngest_month'] = tf.keras.Input(shape=(1,), name='kid_youngest_month', dtype=tf.int8)
# Then you can collect the inputs of the other layers and feature_layer_inputs into one flat list
inputs = [review_meta_id_input, priors_input] + list(feature_layer_inputs.values())
# Then define the outputs of this DenseFeatures layer
feature_layer_outputs = feature_layer(feature_layer_inputs)
# And pass them into another layer like any other tensor
x = tf.keras.layers.Dense(256, activation='relu')(feature_layer_outputs)
# Or maybe concatenate them with the outputs of your other layers
combined = tf.keras.layers.concatenate([x, feature_layer_outputs])
# You will probably finish with a last output layer, maybe like this for classification
o = tf.keras.layers.Dense(classes_number, activation='softmax', name='sequential_output')(combined)
# So you pass to the model:
model_combined = tf.keras.models.Model(inputs=inputs, outputs=o)
Also note that in the model's fit() method you have to specify which data should be used for each input.
One way: if you use tf.data.Dataset, make sure that the feature names in the Dataset are the same as the keys in the feature_layer_inputs dictionary.
The other way is to use explicit notation, with keys that match the names of the Input layers and of the output layer:
model_combined.fit({'review_meta_id': review_meta_id_data, 'priors': priors_data, 'kid_youngest_month': kid_youngest_month_data},
    {'sequential_output': labels_data},  # labels_data: your target array (placeholder name)
    ...
)
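Regarding the last part of the question (bucketing without tf.feature_column): the bucketing itself is simple enough to do during preprocessing. Here is a minimal sketch in plain Python using the boundaries from the question; bucket_index and one_hot_bucket are made-up helper names. It follows the documented bucketized_column convention that a bucket includes its left boundary and excludes its right one.

```python
from bisect import bisect_right

BOUNDARIES = [12, 24, 36, 72, 96]  # same boundaries as in the question


def bucket_index(value, boundaries=BOUNDARIES):
    """Index of the half-open bucket [lower, upper) that contains value."""
    return bisect_right(boundaries, value)


def one_hot_bucket(value, boundaries=BOUNDARIES):
    """One-hot vector with len(boundaries) + 1 entries."""
    encoded = [0] * (len(boundaries) + 1)
    encoded[bucket_index(value, boundaries)] = 1
    return encoded


print(bucket_index(5), bucket_index(12), bucket_index(30), bucket_index(100))  # 0 1 2 5
print(one_hot_bucket(30))  # [0, 0, 1, 0, 0, 0]
```

The resulting index or one-hot vector can then be fed to the model through an ordinary keras Input.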

Tensorflow serving variable file

I'm about to set up TensorFlow Serving.
The pb file and the variables folder were created,
but no files were created under the variables folder.
I expected something like this:
└── variables
    ├── variables.data-00000-of-00001
    └── variables.index
After further experimentation, I found that the files are only created when the output is a tf.Variable.
For example:
1) z = tf.Variable(3,dtype=tf.float32)
2) z = tf.constant(3,dtype=tf.float32)
With 1) the files are created, but with 2) they are not.
z is the output variable:
signature_def_map= {
    "serving_default": tf.saved_model.signature_def_utils.predict_signature_def(
        inputs= {"egg": x, "bacon":y},
        outputs= {"spam": z})
})
Is my finding correct?
The explanation above is a test result from a simple example.
This is what I really want to do:
sIdSorted = tf.gather(sId, indices[::-1])[0:5]
sess=tf.Session()
print sess.run(sIdSorted,feed_dict={userLat:37.12,userLon:127.2})
The printed output was as follows:
['s7' 's1' 's2' 's3' 's4']
However, this way nothing is created in the variables folder.
So I tried to make the output a tf.Variable:
sIdSorted = tf.Variable(tf.gather(sId, indices[::-1])[0:5])
but this raises the following error:
initial_value must have a shape specified: Tensor("strided_slice_1:0", dtype=string)
So I tried the following:
sIdSorted = tf.Variable(tf.constant(tf.gather(sId, indices[::-1])[0:5],shape=[5]))
but this raises the following error:
List of Tensors when single Tensor expected
I need your help. Thank you for reading.
TensorFlow version: 1.3.0, Python 2.x
That is correct: only tf.Variables result in variable files being exported. Those files contain the actual values of the variables. The graph structure itself is stored in the saved_model.pb. That's where your gather (and any other ops) are. You should be able to serve the model.

How to add all variables under a scope into a certain collection

In tensorflow python APIs, tf.get_variable has a parameter collections to add the created var to the specified collections. But tf.variable_scope does not.
What's the suggested way to add all variables under a variable scope into a certain collection?
I don't believe there is a way to do this directly. You could file a feature request on Tensorflow's github issues tracker.
I can suggest two workarounds you might try though:
iterate over the result of tf.all_variables(), and extract variables whose names look like ".../scope_name/...". The scope names are encoded in the variable name, separated by / characters.
write wrappers around tf.VariableScope and tf.get_variable() that store the variables created inside the scope in a data structure.
I hope that helps!
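The name-based filtering from the first workaround can be sketched without TensorFlow. The variable names below are hypothetical and names_in_scope is a made-up helper. Matching on the prefix "scope/" rather than on a plain substring avoids picking up look-alike scopes such as ll/foobar.

```python
def names_in_scope(variable_names, scope):
    """Return the names that live directly or indirectly under `scope`."""
    prefix = scope.rstrip("/") + "/"
    return [name for name in variable_names if name.startswith(prefix)]


# Hypothetical variable names, in the "scope/.../name:0" form that TF1 uses.
names = ["ll/foo/a:0", "ll/bar/b:0", "ll/bar/baz/c:0", "ll/foo/d:0", "ll/foobar/e:0"]
print(names_in_scope(names, "ll/foo"))  # ['ll/foo/a:0', 'll/foo/d:0']
```

With real variables, the same test would be applied to each variable's name attribute before adding the variable to your collection.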
I have managed to do this:
import tensorflow as tf
def var_1():
    with tf.variable_scope("foo") as foo_scope:
        assert foo_scope.name == "ll/foo"
        a = tf.get_variable("a", [2, 2])
    return foo_scope
def var_2(foo_scope):
    with tf.variable_scope("bar"):
        b = tf.get_variable("b", [2, 2])
        with tf.variable_scope("baz") as other_scope:
            c = tf.get_variable("c", [2, 2])
            assert other_scope.name == "ll/bar/baz"
            with tf.variable_scope(foo_scope) as foo_scope2:
                d = tf.get_variable("d", [2, 2])
                assert foo_scope2.name == "ll/foo" # Not changed.
def main():
    with tf.variable_scope("ll"):
        scp = var_1()
        var_2(scp)
    all_default_global_variables = tf.get_collection_ref(tf.GraphKeys.GLOBAL_VARIABLES)
    my_collection = tf.get_collection('my_collection') # create my collection
    ll_foo_variables = []
    for variable in all_default_global_variables:
        if "ll/foo" in variable.name:
            ll_foo_variables.append(variable)
    tf.add_to_collection('my_collection', ll_foo_variables)
    variables_in_my_collection = tf.get_collection_ref("my_collection")
    print(variables_in_my_collection)
main()
You can see that, of a, b, c and d in my code, only a and d have the same scope name ll/foo.
The process:
First I get all variables that are added by default to the tf.GraphKeys.GLOBAL_VARIABLES collection, then I create a collection named my_collection, and then I add only those variables with 'll/foo' in the scope name to my_collection.
And what I get is what I expected:
[[<tf.Variable 'll/foo/a:0' shape=(2, 2) dtype=float32_ref>, <tf.Variable 'll/foo/d:0' shape=(2, 2) dtype=float32_ref>]]
import tensorflow as tf
for var in tf.global_variables(scope='model'):
    tf.add_to_collection(tf.GraphKeys.MODEL_VARIABLES, var)
Instead of using global_variables, you could also iterate over trainable_variables if that is what you're interested in. In both cases, you do not only capture the variables you created manually using get_variable() but also the ones created by e.g. any tf.layers call.
You could just get all variables within the scope instead of getting a collection:
tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='my_scope')
https://stackoverflow.com/a/36536063/9095840