I followed the official tutorial from Microsoft: https://learn.microsoft.com/en-us/azure/synapse-analytics/machine-learning/tutorial-score-model-predict-spark-pool
When I execute:
#Bind model within Spark session
model = pcontext.bind_model(
    return_types=RETURN_TYPES,
    runtime=RUNTIME,
    model_alias="Sales",  # This alias will be used in the PREDICT call to refer to this model
    model_uri=AML_MODEL_URI,  # In case of AML, it will be AML_MODEL_URI
    aml_workspace=ws  # This is only for AML. In case of ADLS, this parameter can be removed
).register()
I get the error: No module named 'azureml.automl'
I reproduced this on my end: the code you shared works as expected, and I don't see the error you are experiencing.
I also tested the same code on a newly created Apache Spark 3.1 runtime, and it works as expected there too.
I'd suggest creating a new cluster and checking whether you can run the code there.
I solved it. In my case it works best like this:
Imports
#Import libraries
from pyspark.sql.functions import col, pandas_udf,udf,lit
from notebookutils.mssparkutils import azureML
from azureml.core import Workspace, Model
from azureml.core.authentication import ServicePrincipalAuthentication
import joblib
import pandas as pd
ws = azureML.getWorkspace("AzureMLService")
spark.conf.set("spark.synapse.ml.predict.enabled","true")
Predict function
def forecastModel():
    model_path = Model.get_model_path(model_name="modelName", _workspace=ws)
    modeljob = joblib.load(model_path + "/model.pkl")
    validation_data = spark.read.format("csv") \
        .option("header", True) \
        .option("inferSchema", True) \
        .option("sep", ";") \
        .load("abfss://....csv")
    validation_data_pd = validation_data.toPandas()
    predict = modeljob.forecast(validation_data_pd)
    return predict
I'm running code to train a PPO policy on chess using PettingZoo:
import gym.vector.utils
import supersuit as ss
import stable_baselines3.ppo
import pettingzoo.classic
if __name__ == '__main__':
    env = original_env = pettingzoo.classic.chess_v5.env()
    env = pettingzoo.utils.turn_based_aec_to_parallel(env)
    env = ss.pettingzoo_env_to_vec_env_v1(env)
    env = ss.concat_vec_envs_v1(env, 8, num_cpus=4, base_class='stable_baselines3')
    model = stable_baselines3.PPO(stable_baselines3.ppo.MultiInputPolicy, env,
                                  tensorboard_log='my_logs')
    model.learn(total_timesteps=100)
In the next-to-last line, you can see I'm writing logs to TensorBoard, where I hope to see a nice graph. However, TensorBoard doesn't show the graph I was hoping for.
I've used TensorBoard before and it worked. Why isn't it showing any progress now? Or even lack of progress?
It turns out I just needed to use a lower value for PPO's n_steps parameter.
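A rough way to see why n_steps matters, as a sketch under assumptions: Stable-Baselines3's on-policy loop collects n_steps transitions in each vectorized environment per rollout, keeps going while fewer than total_timesteps have been collected, and (with the default log_interval=1) dumps one set of logged values per rollout. PPO's default n_steps is 2048. The n_envs value below is an assumption for this chess setup (8 concatenated copies times 2 agents, if SuperSuit gives each agent its own vector slot); `estimate_rollouts` is a hypothetical helper, not an SB3 function.

```python
import math


def estimate_rollouts(total_timesteps, n_steps, n_envs):
    """Estimate how many rollout/update iterations an on-policy learn() call runs.

    Assumes each iteration collects n_steps transitions in each of n_envs
    environments and that the loop continues while fewer than total_timesteps
    have been collected, so the last rollout may overshoot the budget.
    """
    steps_per_rollout = n_steps * n_envs
    return math.ceil(total_timesteps / steps_per_rollout)


# Hypothetical n_envs: 8 concatenated copies x 2 chess agents (an assumption).
N_ENVS = 16

for n_steps in (2048, 4, 1):
    # With total_timesteps=100, as in the question:
    # 2048 -> 1 rollout, 4 -> 2 rollouts, 1 -> 7 rollouts
    print(n_steps, estimate_rollouts(100, n_steps, N_ENVS))
```

With the default n_steps, the whole 100-step budget fits inside a single rollout, so TensorBoard gets at most one point per scalar and there is little to plot. Lowering n_steps (a PPO constructor argument) or raising total_timesteps gives more points.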
I am learning how to use Pyomo in Google Colab. I created an abstract model, but I don't know how to write the code that reads the data file and solves the model. The documentation gives instructions for the command prompt, but that doesn't apply to me because I am working in Google Colab.
I would highly appreciate your help.
!pip install pyomo
from pyomo.environ import *
import matplotlib.pyplot as plt
!wget -N -q "https://ampl.com/dl/open/ipopt/ipopt-linux64.zip"
!unzip -o -q ipopt-linux64
model = AbstractModel()
model.x = Var(bounds=(0,1.2), within=Reals)
model.obj1 = Objective(expr=model.x**2, sense=maximize)
#opt = SolverFactory('ipopt')
opt=SolverFactory('ipopt', executable='/content/ipopt')
instance = model.create_instance()
results = opt.solve(instance) # solves and updates instance
print('OF= ',value(instance.obj1))
In TensorFlow examples, I can see URLs for downloading datasets in CSV format.
For example,
Iris: https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv
Titanic: https://storage.googleapis.com/tf-datasets/titanic/train.csv
However, I can't find URLs for all of the TensorFlow datasets listed here: (https://www.tensorflow.org/datasets/catalog/overview).
You don't need the URLs. TensorFlow datasets are already ready to use; check out the TFDS guide.
Titanic, for example, is available as the titanic structured dataset.
Hope this helps :)
TensorFlow Datasets provides a collection of ready-to-use datasets.
When a dataset is loaded with tfds, you get "Dataset downloaded and prepared to /root/tensorflow_datasets/iris/2.0.0. Subsequent calls will reuse this data.", which is really convenient. But if you'd rather take the dataset from a URL, you can do it like this (following the tf.data guide linked in the first comment; pipelines are convenient):
# https://www.tensorflow.org/guide/data#consuming_csv_data
import tensorflow as tf
import pandas as pd
# test_file = tf.keras.utils.get_file("temperature.csv", "https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv")
titanic_file = tf.keras.utils.get_file("train.csv", "https://storage.googleapis.com/tf-datasets/titanic/train.csv")
df = pd.read_csv(titanic_file)
df.head()
# make dataset from pandas:
myDataset = tf.data.Dataset.from_tensor_slices(dict(df))
for feature_batch in myDataset.take(1):
    for key, value in feature_batch.items():
        print(" {!r:20s}: {}".format(key, value))
titanic_lines = tf.data.TextLineDataset(titanic_file)
for line in titanic_lines.take(10):
    print(line.numpy())
That guide also covers other dataset sources and pipelines.
I'm trying to put together a demo of Neptune using Neptune workbench, but something's not working right. I've got this block set up:
from __future__ import print_function  # Python 2/3 compatibility
from gremlin_python import statics
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.strategies import *
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

graph = Graph()
cluster_url = "..."  # my cluster
remoteConn = DriverRemoteConnection(f'wss://{cluster_url}:8182/gremlin', 'g')
g = graph.traversal().withRemote(remoteConn)

import uuid
tmp = uuid.uuid4()
tmp_id = str(tmp)

def get_id(name):
    uid = uuid.uuid5(uuid.NAMESPACE_DNS, f"{name}.licensing.company.com")
    return str(uid)

def add_sku(name):
    tmp_id = get_id(name)
    g.addV('SKU').property('id', tmp_id, 'name', name)
    return name

def get_values():
    return g.V().properties().toList()
The problem is that calling add_sku doesn't result in a vertex being added to the graph. Doing the same operation in a cell with gremlin magic works, and I can retrieve values through python, but I can't add vertices. Does anyone see what I'm missing here?
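The Python code is not working because the traversal is missing a terminal step (next() or iterate()) at the end, which is what forces it to be evaluated. If you add the terminal step, it should work. Also note that extra key/value pairs passed to a single property() call are attached as meta-properties of the first property, so to store name as its own vertex property, chain a second property() call:
g.addV('SKU').property('id', tmp_id).property('name', name).next()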
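Why the terminal step matters can be illustrated with a toy stand-in. This is a hypothetical class, not the gremlin_python API: like a real traversal, calling addV() and property() only records steps, and nothing reaches the "graph" (here a plain Python list) until a terminal step such as next() runs them.

```python
class ToyTraversal:
    """Hypothetical stand-in for a Gremlin traversal: steps are only recorded."""

    def __init__(self, store):
        self.store = store  # plays the role of the remote graph
        self.steps = []     # recorded steps, not yet executed

    def addV(self, label):
        self.steps.append(("addV", label))
        return self  # chaining returns the traversal itself

    def property(self, key, value):
        self.steps.append(("property", key, value))
        return self

    def next(self):
        # Terminal step: only now are the recorded steps executed.
        vertex = {}
        for step in self.steps:
            if step[0] == "addV":
                vertex = {"label": step[1]}
            elif step[0] == "property":
                vertex[step[1]] = step[2]
        self.store.append(vertex)
        return vertex


graph_store = []
ToyTraversal(graph_store).addV("SKU").property("name", "a")         # no terminal step
print(len(graph_store))  # 0: nothing was executed
ToyTraversal(graph_store).addV("SKU").property("name", "a").next()  # terminal step
print(len(graph_store))  # 1: the vertex was added
```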
I am trying to use the SciBERT pre-trained model scibert-scivocab-uncased as follows:
!pip install pytorch-pretrained-bert
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM
import logging
import matplotlib.pyplot as plt
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
segments_ids = [1] * len(tokenized_text)
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
model = BertModel.from_pretrained('/Users/.../Downloads/scibert_scivocab_uncased-3.tar.gz')
And I get the following error:
EOFError: Compressed file ended before the end-of-stream marker was reached
I downloaded the file from the website (https://github.com/allenai/scibert)
I converted it from "tar" to gzip
Nothing worked.
Any hint on how to approach this?
Thank you!
In the newer version of pytorch-pretrained-BERT, i.e. transformers, you can load a pretrained model as follows after you un-tar the archive:
from transformers import AutoModelForTokenClassification, AutoTokenizer
model = AutoModelForTokenClassification.from_pretrained("/your/local/path/to/scibert_scivocab_uncased")
You need to unzip the package and rename the JSON file to config.json.
Then just pass the path of the folder where you unzipped the package to from_pretrained(); it should work.
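The unpack-and-rename step can be sketched with the standard library. With mode "r:*", tarfile detects whether the archive is plain tar or gzip/bz2/xz compressed, so the archive doesn't need to be converted by hand first. The config file name bert_config.json and the folder layout are assumptions (check what your download actually contains), and prepare_scibert_dir is a hypothetical helper. Only extract archives you trust.

```python
import os
import tarfile


def prepare_scibert_dir(archive_path, out_dir, old_config_name="bert_config.json"):
    """Unpack a model archive and return the folder that contains config.json.

    old_config_name is an assumed file name; adjust it to match your archive.
    """
    # "r:*" lets tarfile detect plain tar or gzip/bz2/xz compression itself.
    with tarfile.open(archive_path, "r:*") as tar:
        tar.extractall(out_dir)

    # Walk the extracted tree; rename the old config file where it is found.
    for root, _dirs, files in os.walk(out_dir):
        if "config.json" in files:
            return root
        if old_config_name in files:
            os.rename(os.path.join(root, old_config_name),
                      os.path.join(root, "config.json"))
            return root
    raise FileNotFoundError(f"no config.json or {old_config_name} under {out_dir}")
```

The returned folder is the path you would pass to from_pretrained().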