How do I set the region for BigQuery ML.FORECAST()? - google-bigquery

Using BigQuery ML in a local Jupyter notebook (%%bigquery), I create a model in an EU dataset:
CREATE OR REPLACE MODEL mydataset.mymodel...
It evaluates fine:
ML.EVALUATE(MODEL mydataset.mymodel)...
but when I try to predict:
ML.FORECAST(MODEL mydataset.mymodel,...
I get:
Dataset myproject:mydataset was not found in location US
Why is FORECAST so xenophobic and how can I make it right?

I got rid of %%bigquery and instead used
client = bigquery.Client(location=REGION)
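A minimal sketch of that approach, assuming the dataset lives in the EU multi-region (the query text and horizon value are only illustrative):
from google.cloud import bigquery

# create a client whose query jobs default to the dataset's location
client = bigquery.Client(location="EU")

sql = """
SELECT *
FROM ML.FORECAST(MODEL mydataset.mymodel,
                 STRUCT(30 AS horizon))
"""
df = client.query(sql).to_dataframe()  # the job runs in the EU, where the model lives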

Try the code below; I guess your project path is missing.
CREATE OR REPLACE MODEL myproject.mydataset.mymodel...
ML.EVALUATE(MODEL myproject.mydataset.mymodel)...
ML.FORECAST(MODEL myproject.mydataset.mymodel,...

Related

Pyomo dynamic optimisation ERROR: variable that is not attached to an active block on the submodel being written

I am trying to set up a model for a dynamic optimisation problem with pyomo.DAE.
I defined my state variable as well as its corresponding DerivativeVar (both indexed by m.Time). I then set up a simple constraint that expresses the relationship between the state and derivative variables in the simplest terms. Solving the problem with a dummy objective (so just testing the constraint), I get the following error:
ERROR: Model contains an expression (calc_my_state[0])
that contains a variable (derivative_var[0]) that is not
attached to an active block on the submodel being written
Here's an excerpt of what I wrote:
(.....)
m.state_var = Var(m.Time, initialize=0)
m.derivative_var = DerivativeVar(m.state_var, wrt=m.Time)
def calc_my_state(m, i):
    return m.derivative_var[i] == m.state_var[i] * 2
m.calc_my_state = Constraint(m.Time, rule=calc_my_state)
m.obj = Objective(expr=1)
opt = SolverFactory("glpk")
results = opt.solve(m)
I tried to reproduce the simple setup of a DAE in Pyomo, more or less copying and pasting lines from the Pyomo.DAE documentation.
I printed derivative_var.get_state_var() and it returns the right state variable without error.
I also tried solving simple DAE examples that I found on the internet and solving them with my solver settings worked fine as well.
What am I missing? I am grateful for any input!!! Thanks!
I found the missing link: I did not specify the "Discretization Transformation". Once something like the following was added, the script ran without error!
discretizer = TransformationFactory('dae.finite_difference')
discretizer.apply_to(m, wrt=m.Time)
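For context, a minimal self-contained sketch of the fixed model, assuming a (0, 1) time horizon and the same toy dynamics as above (the names and the nfe value are illustrative only):
from pyomo.environ import (ConcreteModel, Var, Constraint, Objective,
                           TransformationFactory, SolverFactory)
from pyomo.dae import ContinuousSet, DerivativeVar

m = ConcreteModel()
m.Time = ContinuousSet(bounds=(0, 1))
m.state_var = Var(m.Time, initialize=0)
m.derivative_var = DerivativeVar(m.state_var, wrt=m.Time)

def calc_my_state(m, i):
    return m.derivative_var[i] == m.state_var[i] * 2
m.calc_my_state = Constraint(m.Time, rule=calc_my_state)
m.obj = Objective(expr=1)

# the step that was missing: discretize the continuous-time model so the
# solver only ever sees algebraic variables and constraints
discretizer = TransformationFactory('dae.finite_difference')
discretizer.apply_to(m, wrt=m.Time, nfe=10)

results = SolverFactory('glpk').solve(m)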

Can we pass dataframes between different notebooks in databricks and sequentially run multiple notebooks? [duplicate]

I have a notebook which will process the file and creates a data frame in structured format.
Now I need to import the data frame created there into another notebook, but the problem is that the first notebook should run only for some scenarios, so I need to validate a condition before running it.
Usually we use %run to import all the data structures, but in my case it needs to be a combination of an if clause and then the notebook run:
if "dataset" in path: %run ntbk_path
It's giving an error "path not exist".
if "dataset" in path: dbutils.notebook.run(ntbk_path)
With this one I cannot get all the data structures.
Can someone help me to resolve this error?
To implement it correctly you need to understand how things work:
%run is a separate directive that should be put into a separate notebook cell; you can't mix it with Python code. Also, it can't accept the notebook name as a variable. What %run does is evaluate the code from the specified notebook in the context of the current Spark session, so everything that is defined in that notebook - variables, functions, etc. - is available in the caller notebook.
dbutils.notebook.run is a function that takes a notebook path plus parameters and executes it as a separate job on the current cluster. Because it's executed as a separate job, it doesn't share the context with the current notebook, and everything that is defined in it won't be available in the caller notebook (you can return a simple string as the execution result, but it has a relatively small maximum length). One of the problems with dbutils.notebook.run is that scheduling a job takes several seconds, even if the code is very simple.
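As a side note, a minimal sketch of that string-return mechanism, assuming a hypothetical called notebook at /path/to/child:
# in the called notebook: hand a small string result back to the caller
dbutils.notebook.exit("42")

# in the caller notebook: run() returns whatever was passed to exit()
result = dbutils.notebook.run("/path/to/child", 300)
print(result)  # prints "42"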
How can you implement what you need?
if you use dbutils.notebook.run, then in the called notebook you can register a temp view, and the caller notebook can read data from it (examples are adapted from this demo)
Called notebook (Code1; it requires two parameters: name for the view name and n for the number of entries to generate):
name = dbutils.widgets.get("name")
n = int(dbutils.widgets.get("n"))
df = spark.range(0, n)
df.createOrReplaceTempView(name)
Caller notebook (let's call it main):
if "dataset" in "path":
    view_name = "some_name"
    dbutils.notebook.run(ntbk_path, 300, {'name': view_name, 'n': "1000"})
    df = spark.sql(f"select * from {view_name}")
    # ... work with data
it's even possible to do something similar with %run, but it requires a kind of "magic". The foundation of it is that you can pass arguments to the called notebook using the $arg_name="value" syntax, and you can even refer to values specified in widgets. But in any case, the check of the value will happen in the called notebook.
The called notebook could look like the following:
flag = dbutils.widgets.get("generate_data")
dataframe = None
if flag == "true":
    dataframe = ...  # create the dataframe here
and the caller notebook could look like the following:
------ cell in python
if "dataset" in "path":
    gen_data = "true"
else:
    gen_data = "false"
dbutils.widgets.text("gen_data", gen_data)
------- cell for %run
%run ./notebook_name $generate_data=$gen_data
------ again in python
dbutils.widgets.remove("gen_data")  # remove widget
if dataframe:  # dataframe is defined
    do something with dataframe

Passing DataFrame from notebook to another with pyspark

I'm trying to call a DataFrame that I created in notebook1 so I can use it in notebook2, in Databricks Community Edition with pyspark, and I tried this code: dbutils.notebook.run("notebook1", 60, {"dfnumber2"})
but it shows this error.
py4j.Py4JException: Method _run([class java.lang.String, class java.lang.Integer, class java.util.HashSet, null, class java.lang.String]) does not exist
Any help, please?
The actual problem is that you pass the last parameter ({"dfnumber2"}) incorrectly: with this syntax it's a set, not a map. You need to use the syntax {"table_name": "dfnumber2"} to represent it as a dict/map.
But if you look into the documentation of dbutils.notebook.run, you will see the following phrase:
To implement notebook workflows, use the dbutils.notebook.* methods. Unlike %run, the dbutils.notebook.run() method starts a new job to run the notebook.
But jobs aren't supported on the Community Edition, so it won't work anyway.
Create a global temp view and pass the table name as an argument to your next notebook.
dfnumber2.createOrReplaceGlobalTempView("dfnumber2")
dbutils.notebook.run("notebook1", 60, {"table_name": "dfnumber2"})
In your notebook1 you can do:
table_name = dbutils.widgets.get("table_name")
dfnumber2 = spark.sql("select * from global_temp." + table_name)

How can I access a value in a sequence type?

There are the following attributes in client_output
weights_delta = attr.ib()
client_weight = attr.ib()
model_output = attr.ib()
client_loss = attr.ib()
After that, I turned client_output into a sequence through a = tff.federated_collect(client_output) and round_model_delta = tff.federated_map(selecting_fn, a), and I declared:
@tff.tf_computation()  # append
def selecting_fn(a):
    # TODO
    return round_model_delta
In the process of averaging on the server, I want to average weights_delta by selecting some of the clients with a small loss value. So I tried to access it via a.weights_delta, but it doesn't work.
tff.federated_collect returns a value of tff.SequenceType placed at tff.SERVER, which you can manipulate the same way as, for example, a client dataset is usually handled in a method decorated by tff.tf_computation.
Note that you have to use the tff.federated_collect operator in the scope of a tff.federated_computation. What you probably want to do[*] is pass it into a tff.tf_computation, using the tff.federated_map operator. Once inside the tff.tf_computation, you can think of it as a tf.data.Dataset object and everything in the tf.data module is available.
[*] I am guessing. More detailed explanation of what you would like to achieve would be helpful.
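A minimal sketch of that pattern, assuming for illustration that each client contributes a single float32 value (in reality the element type would be the structure of client_output, and names like run_round are placeholders):
import tensorflow as tf
import tensorflow_federated as tff

# inside a tf_computation the collected sequence behaves like a tf.data.Dataset,
# so tf.data transformations and reductions are available on it
@tff.tf_computation(tff.SequenceType(tf.float32))
def selecting_fn(collected):
    return collected.reduce(tf.constant(0.0), lambda acc, x: acc + x)

@tff.federated_computation(tff.FederatedType(tf.float32, tff.CLIENTS))
def run_round(client_values):
    a = tff.federated_collect(client_values)   # sequence placed at SERVER
    return tff.federated_map(selecting_fn, a)  # reduced value at SERVER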

Query a TriG file with SPARQL

I have a .trig file which I want to query without pushing to Jena Fuseki.
However, when I try to load the model using:
Model model = FileManager.get().loadModel("filepath/demo.trig");
certain links in the original TriG file are getting lost.
This is the code snippet:
FileManager.get().addLocatorClassLoader(RDFProject.class.getClassLoader());
Model model= FileManager.get().loadModel("filePath/demo.trig");
model.write(System.out);
Is there any alternate way to do this?
Use RDFDataMgr to load a dataset (not a model) and query that.
Dataset ds = RDFDataMgr.loadDataset("filepath/demo.trig");