I have been struggling with this error for a few hours now, but seem lost even after reading through the documentation.
I'm using H2O's Extended Isolation Forest (EIF), an unsupervised model, to detect anomalies in an unlabelled dataset. Which is working as intended, however for the project i'm working on the model explainability is extremely important. I discovered the explain function, which supposedly returns several explainablity methods for a model. I'm particularly interested in the SHAP values from this function.
The documentation states
The main functions, h2o.explain() (global explanation) and h2o.explain_row() (local explanation) work for individual H2O models, as well a list of models or an H2O AutoML object. The h2o.explain() function generates a list of explanations.
Since the H2O models link brings me to a page which covers both supervised and unsupervised models I assume the explain function would work for both types of models.
When trying to run my code the following code works just fine.
import h2o
from h2o.estimators import H2OExtendedIsolationForestEstimator
h2o.init()
df_EIF = h2o.H2OFrame(df_EIF)
predictors = df_EIF.columns[0:37]
eif = H2OExtendedIsolationForestEstimator(ntrees = 75, sample_size =500, extension_level = (len(predictors) -1) )
eif.train(x=predictors, training_frame = df_EIF)
eif_result = eif.predict(df_EIF)
df_EIF['anomaly_score_EIF') = eif_result['anomaly_score']
However when trying to call explain over the model (eif)
eif.explain(df_EIF)
Gives me the following KeyError
KeyError Traceback (most recent call last)
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx.py in <module>
----> 1 eif.explain(df_EIF)
2
3
4
5
C:\ProgramData\Anaconda3\lib\site-packages\h2o\explanation\_explain.py in explain(models, frame, columns, top_n_features, include_explanations, exclude_explanations, plot_overrides, figsize, render, qualitative_colormap, sequential_colormap)
2895 plt = get_matplotlib_pyplot(False, raise_if_not_available=True)
2896 (is_aml, models_to_show, classification, multinomial_classification, multiple_models, targets,
-> 2897 tree_models_to_show, models_with_varimp) = _process_models_input(models, frame)
2898
2899 if top_n_features < 0:
C:\ProgramData\Anaconda3\lib\site-packages\h2o\explanation\_explain.py in _process_models_input(models, frame)
2802 models_with_varimp = [model for model in models if _has_varimp(model)]
2803 tree_models_to_show = _get_tree_models(models, 1 if is_aml else float("inf"))
-> 2804 y = _get_xy(models_to_show[0])[1]
2805 classification = frame[y].isfactor()[0]
2806 multinomial_classification = classification and frame[y].nlevels()[0] > 2
C:\ProgramData\Anaconda3\lib\site-packages\h2o\explanation\_explain.py in _get_xy(model)
1790 """
1791 names = model._model_json["output"]["original_names"] or model._model_json["output"]["names"]
-> 1792 y = model.actual_params["response_column"]
1793 not_x = [
1794 y,
KeyError: 'response_column
From my understanding this response column refers to a column that you are trying to predict. However, since i'm dealing with an unlabelled dataset this response column doesn't exist. Is there a way for me to bypass this error? Is it even possible to utilize the explain() function on unsupervised models? If, so how do I do this? If it is not possible, is there another way to extract the Shap values of each variable from the model? Since the shap.TreeExplainer also doesn't seem to work on a H2O model.
TL;DR: Is it possible to use the .explain() function from h2o on an (Extended) Isolation forest? If so how?
Unfortunately, the explain method in H2O-3 is supported only for the supervised algorithms.
What you could do is to use a surrogate model and look at explanations on it.
Basically, you'd fit a GBM (or DRF as those 2 models support the TreeSHAP) on the data + the prediction of the Extended Isolation Forest which would be the response.
Here is another approach how to explain prediction of (E)IF: https://github.com/h2oai/h2o-tutorials/blob/master/tutorials/isolation-forest/interpreting_isolation-forest.ipynb
Related
I try to adapt the this tf-agents actor<->learner DQN Atari Pong example to my windows machine using a TFUniformReplayBuffer instead of the ReverbReplayBuffer which only works on linux machine but I face a dimensional issue.
[...]
---> 67 init_buffer_actor.run()
[...]
InvalidArgumentError: {{function_node __wrapped__ResourceScatterUpdate_device_/job:localhost/replica:0/task:0/device:CPU:0}} Must have updates.shape = indices.shape + params.shape[1:] or updates.shape = [], got updates.shape [84,84,4], indices.shape [1], params.shape [1000,84,84,4] [Op:ResourceScatterUpdate]
The problem is as follows: The tf actor tries to access the replay buffer and initialize the it with a certain number random samples of shape (84,84,4) according to this deepmind paper but the replay buffer requires samples of shape (1,84,84,4).
My code is as follows:
def train_pong(
env_name='ALE/Pong-v5',
initial_collect_steps=50000,
max_episode_frames_collect=50000,
batch_size=32,
learning_rate=0.00025,
replay_capacity=1000):
# load atari environment
collect_env = suite_atari.load(
env_name,
max_episode_steps=max_episode_frames_collect,
gym_env_wrappers=suite_atari.DEFAULT_ATARI_GYM_WRAPPERS_WITH_STACKING)
# create tensor specs
observation_tensor_spec, action_tensor_spec, time_step_tensor_spec = (
spec_utils.get_tensor_specs(collect_env))
# create training util
train_step = train_utils.create_train_step()
# calculate no. of actions
num_actions = action_tensor_spec.maximum - action_tensor_spec.minimum + 1
# create agent
agent = dqn_agent.DqnAgent(
time_step_tensor_spec,
action_tensor_spec,
q_network=create_DL_q_network(num_actions),
optimizer=tf.compat.v1.train.RMSPropOptimizer(learning_rate=learning_rate))
# create uniform replay buffer
replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
data_spec=agent.collect_data_spec,
batch_size=1,
max_length=replay_capacity)
# observer of replay buffer
rb_observer = replay_buffer.add_batch
# create batch dataset
dataset = replay_buffer.as_dataset(
sample_batch_size=batch_size,
num_steps = 2,
single_deterministic_pass=False).prefetch(3)
# create callable function for actor
experience_dataset_fn = lambda: dataset
# create random policy for buffer init
random_policy = random_py_policy.RandomPyPolicy(collect_env.time_step_spec(),
collect_env.action_spec())
# create initalizer
init_buffer_actor = actor.Actor(
collect_env,
random_policy,
train_step,
steps_per_run=initial_collect_steps,
observers=[replay_buffer.add_batch])
# initialize buffer with random samples
init_buffer_actor.run()
(The approach is using the OpenAI Gym Env as well as the corresponding wrapper functions)
I worked with keras-rl2 and tf-agents without actor<->learner for other atari games to create the DQN and both worked quite well afer a some adaptions. I guess my current code will also work after a few adaptions in the tf-agent libary functions, but that would obviate the purpose of the libary.
My current assumption: The actor<->learner methods are not able to work with the TFUniformReplayBuffer (as I expect them to), due to the missing support of the TFPyEnvironment - or I still have some knowledge shortcomings regarding this tf-agents approach
Previous (successful) attempt:
from tf_agents.environments.tf_py_environment import TFPyEnvironment
tf_collect_env = TFPyEnvironment(collect_env)
init_driver = DynamicStepDriver(
tf_collect_env,
random_policy,
observers=[replay_buffer.add_batch],
num_steps=200)
init_driver.run()
I would be very grateful if someone could explain me what I'm overseeing here.
I fixed it...partly, but the next error is (in my opinion) an architectural problem.
The problem is that the Actor/Learner setup is build on a PyEnvironment whereas the
TFUniformReplayBuffer is using the TFPyEnvironment which ends up in the failure above...
Using the PyUniformReplayBuffer with a converted py-spec solved this problem.
from tf_agents.specs import tensor_spec
# convert agent spec to py-data-spec
py_collect_data_spec = tensor_spec.to_array_spec(agent.collect_data_spec)
# create replay buffer based on the py-data-spec
replay_buffer = py_uniform_replay_buffer.PyUniformReplayBuffer(
data_spec= py_collect_data_spec,
capacity=replay_capacity*batch_size
)
This snippet solved the issue of having an incompatible buffer in the background but ends up in another issue
--> The add_batch function does not work
I found this approach which advises to use either a batched environment or to make the following adaptions for the replay observer (add_batch method).
from tf_agents.utils.nest_utils import batch_nested_array
#********* Adpations add_batch method - START *********#
rb_observer = lambda x: replay_buffer.add_batch(batch_nested_array(x))
#********* Adpations add_batch method - END *********#
# create batch dataset
dataset = replay_buffer.as_dataset(
sample_batch_size=32,
single_deterministic_pass=False)
experience_dataset_fn = lambda: dataset
This helped me to solve the issue regarding this post but now I run into another problem where I need to ask someone of the tf-agents-team...
--> It seems that the Learner/Actor structure is no able to work with another buffer than the ReverbBuffer, because the data-spec which is processed by the PyUniformReplayBuffer sets up a wrong buffer structure...
For anyone who has the same problem: I just created this Github-Issue report to get further answers and/or fix my lack of knowledge.
the full fix is shown below...
--> The dimensionality issue was valid and should indicate the the (uploaded) batched samples are not in the correct shape
--> This issue happens due to the fact that the "add_batch" method loads values with the wrong shape
rb_observer = replay_buffer.add_batch
Long story short, this line should be replaced by
rb_observer = lambda x: replay_buffer.add_batch(batch_nested_array(x))
--> Afterwards the (replay buffer) inputs are of correct shape and the Learner Actor Setup starts training.
The full replay buffer is shown below:
# create buffer for storing experience
replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
agent.collect_data_spec,
1,
max_length=1000000)
# create batch dataset
dataset = replay_buffer.as_dataset(
sample_batch_size=32,
num_steps = 2,
single_deterministic_pass=False).prefetch(4)
# create batched nested array input for rb_observer
rb_observer = lambda x: replay_buffer.add_batch(batch_nested_array(x))
# create batched readout of dataset
experience_dataset_fn = lambda: dataset
I would like to convert the ActorDistributionModel from a trained PPOClipAgent into a Tensorflow Lite model for deployment. How should I accomplish this?
I have tried following this tutorial (see section at bottom converting policy to TFLite), but the network outputs a single action (the policy) rather than the density function over actions that I desire.
I think perhaps something like this could work:
tf.compat.v2.saved_model.save(actor_net, saved_model_path, signature=?)
... if I knew how to set the signature parameter. That line of code executes without error when I omit the signature parameter, but I get the following error on load (I assume because the signature is not set up correctly):
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_path)
File "/home/ais/salesmentor.ai/MDPSolver/src/solver/ppo_budget.py", line 336, in train_eval
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_path)
File "/home/ais/.local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py", line 1275, in from_saved_model
raise ValueError("Only support a single signature key.")
ValueError: Only support a single signature key.
This appears to work. I won't accept the answer until I have completed an end-to-end test, though.
def export_model(actor_net, observation_spec, saved_model_path):
predict_signature = {
'action_pred':
tf.function(func=lambda x: actor_net(x, None, None)[0].logits,
input_signature=(tf.TensorSpec(shape=observation_spec.shape),)
)
}
tf.saved_model.save(actor_net, saved_model_path, signatures=predict_signature)
# Convert to TensorFlow Lite model.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_path,
signature_keys=["action_pred"])
converter.target_spec.supported_ops = [
tf.lite.OpsSet.TFLITE_BUILTINS, # enable TensorFlow Lite ops.
tf.lite.OpsSet.SELECT_TF_OPS # enable TensorFlow ops.
]
tflite_policy = converter.convert()
with open(os.path.join(saved_model_path, 'policy.tflite'), 'wb') as f:
f.write(tflite_policy)
The solution wraps the actor_net in a lambda because I was unable to figure out how to specify the signature with all three expected arguments. Through the lambda, I convert the function into using a single argument (a tensor). I expect to pass None to the other two arguments in my use case, so there is nothing lost in this approach.
I see you using CartPole as the model simulation, Agent DQN, and Model learning and Evaluation from links provided TF-Agent Checkpointer. For simple understanding, you need to understand about the distributions and your model limits ( less than 6 actions determining at a time ).
Discretes Distribution, answer the question to the points but the links is how they implement AgentDQN on TF- Agent.
temp = tf.random.normal([10], 1, 0.2, tf.float32), mean is one and the standard deviation is 0.2. Overall of result summation product is nearby one and its variance is 0.2, when they have 10 actions to determine the possibility of the result is the same action is 1 from 5 or 0.5. random normal
Coefficient is ladder steps or you understand as IF and ELSE conditions or SWITCH conditions such as at the gap of 0 to 5, 5 to 10, 10 to 15, and continue.
The matrixes product from the Matrix coefficients and randoms is selected 4 - 5 actions sorted by priority, significant and select the most effects in rows.
The ArgMax is 0 to 9 which is actions 0 - 9 that respond to the environment input co-variances.
Sample: To the points, random distributions and selective agents ( we call selective agent maybe the questioner has confused with NN DQN )
temp = tf.random.normal([10], 1, 0.2, tf.float32)
temp = np.asarray(temp) * np.asarray([ coefficient_0, coefficient_1, coefficient_2, coefficient_3, coefficient_4, coefficient_5, coefficient_6, coefficient_7, coefficient_8, coefficient_9 ])
temp = tf.nn.softmax(temp)
action = int(np.argmax(temp))
I'm trying to use Tensorflow to Machine Learning to analyze an image and return the probability if is positive or negative based on a model created (extension .h5). I couldn't found a documentation exactly for that, or repository, so even a link to read will be awesome.
Link for the application: https://share.streamlit.io/felipelx/hackathon/IDC_Detector.py
Libraries that I'm trying to use.
import numpy as np
import streamlit as st
import tensorflow as tf
from keras.models import load_model
The function to load the model.
#st.cache(allow_output_mutation=True)
def loadIDCModel():
model_idc = load_model('models/IDC_model.h5', compile=False)
model_idc.summary()
return model_idc
The function to work the image, and what I'm trying to see: model.predict - I can see but is not updating the %, independent of the image the value is always the same.
if uploaded_file is not None:
# transform image to numpy array
file_bytes = tf.keras.preprocessing.image.load_img(uploaded_file, target_size=(96,96), grayscale = False, interpolation = 'nearest', color_mode = 'rgb', keep_aspect_ratio = False)
c.image(file_bytes, channels="RGB")
Genrate_pred = st.button("Generate Prediction")
if Genrate_pred:
model = loadMetModel()
input_arr = tf.keras.preprocessing.image.img_to_array(file_bytes)
input_arr = np.array([input_arr])
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
prediction = probability_model.predict(input_arr)
dict_pred = {0: 'Benigno/Normal', 1: 'Maligno'}
result = dict_pred[np.argmax(prediction)]
value = 0
if result == 'Benigno/Normal':
value = str(((prediction[0][0])*100).round(2)) + '%'
else:
value = str(((prediction[0][1])*100).round(2)) + '%'
c.metric('Predição', result, delta=value, delta_color='normal')
Thank you in advance to any help.
The first thing I'm noticing is that your function for loading the model is named loadIDCModel, but then the function you call for loading the model is loadMetModel. When I check your source code, though, it looks like you've already addressed this issue. I'd recommend updating your question to reflect this.
Playing around with your application, I think the issue is your model itself. I tried various images — images containing carcinomas, and even a picture of a cat — and each gave me a probability around 73%. The lowest score I got was 72.74%, and the highest was 73.11% (this one was the cat). It seems that the output percentage is varying slightly, hinting that rather than something being wrong in the code, your model itself is likely at fault. You might need to retrain your model, as it seems to have learned to always return a value of approximately 0.73.
Is there any way in federated-tensorflow to make clients train the model for multiple epochs on their dataset? I found on the tutorials that a solution could be modifying the dataset by running dataset.repeat(NUMBER_OF_EPOCHS), but why should I modify the dataset?
The tf.data.Dataset is the TF2 way of setting this up. It maybe useful to think about the code as modifying the "data pipeline" rather than the "dataset" itself.
https://www.tensorflow.org/guide/data and particularly the section https://www.tensorflow.org/guide/data#processing_multiple_epochs can be useful pointers.
At a high-level, the tf.data API sets up a stream of examples. Repeats (multiple epochs) of that stream can be configured as well.
dataset = tf.data.Dataset.range(5)
for x in dataset:
print(x) # prints 0, 1, 2, 3, 4 on separate lines.
repeated_dataset = dataset.repeat(2)
for x in repeated_dataset:
print(x) # same as above, but twice
shuffled_repeat_dataset = dataset.shuffle(
buffer_size=5, reshuffle_each_iteration=True).repeat(2)
for x in repeated_dataset:
print(x) # same as above, but twice, with different orderings.
I am working on a personal machine learning project where I am attempting to classify data into binary classes when the classes are extremely imbalanced. I am initially trying to implement the approach proposed in Hierarchical Sampling for Active Learning by S Dasgupta which exploits the cluster structure of the dataset to aide the active learner.
However, I am struggling to implement the algorithm proposed in the article. I have written this so far, however am unsure how to continue:
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, dendrogram
data_dist = pdist(X) # computing the distance
data_link = linkage(data_dist) # computing the linkage
The data is stored in X and the correct clasification in y. Sample dataset:
X = np.array([[0.3,0.7],[0.5,0.5] ,[0.2,0.8], [0.1,0.9]])
y = np.array([[0], [1], [0], [1]])
(Note the actual dataset is approximately 500 times larger)
Hierarchical Sampling for Active Learning as proposed by S Dasgupta is now implemented in libact, a Python active learning libary. For the source code, see this link.
Example (from doccumentation):
from libact.query_strategies import UncertaintySampling
from libact.query_strategies.multiclass import HierarchicalSampling
sub_qs = UncertaintySampling(
dataset, method='sm', model=SVM(decision_function_shape='ovr'))
qs = HierarchicalSampling(
dataset, # Dataset object
dataset.get_num_of_labels(),
active_selecting=True,
subsample_qs=sub_qs
)