Module aliasing in Julia - module

In python you can do something like this to let you use a shortened module name:
>>> import tensorflow as tf
From then on you can refer to tf, instead of having to type tensorflow everywhere.
Is something like this possible in Juila?

Yep, you can just assign the module to a new name.
import JSON
const J = JSON
J.print(Dict("Hello, " => "World!"))
I highly recommend to use the const because otherwise there will be a performance penalty. (With the const, there is no performance penalty.)

Julia now supports renaming with as.

I'm pushing the practical snippet #Alan don't mentioned:
At MyMod.jl:
module MyModule
plus(x, y) = x + y
myfield = 0
end
At Main.jl:
# Only if you're including the module from another file.
include("MyMod.jl")
import .MyModule as mymod
println(mymod.myfield)
println(mymod.plus(1, 1))
Main resources:
Docs https://docs.julialang.org/en/v1/manual/modules/.
Issue https://github.com/JuliaLang/julia/issues/1255.
Commit https://github.com/JuliaLang/julia/commit/40fee86878068d5c7c833655b7637820fb2cba33.

Related

Synapse Analytics Auto ML Predict No module named 'azureml.automl'

I follow the official tutotial from microsoft: https://learn.microsoft.com/en-us/azure/synapse-analytics/machine-learning/tutorial-score-model-predict-spark-pool
When I execute:
#Bind model within Spark session
model = pcontext.bind_model(
return_types=RETURN_TYPES,
runtime=RUNTIME,
model_alias="Sales", #This alias will be used in PREDICT call to refer this model
model_uri=AML_MODEL_URI, #In case of AML, it will be AML_MODEL_URI
aml_workspace=ws #This is only for AML. In case of ADLS, this parameter can be removed
).register()
I got : No module named 'azureml.automl'
My Notebook
As per the repro from my end, the above code which you have shared works as excepted and I don't see any error message which you are experiencing.
I had even tested the same code on the newly created Apache spark 3.1 runtime and it works as expected.
I would request you to create a new cluster and see if you are able to run the above code.
I solved it. In my case it works best like this:
Imports
#Import libraries
from pyspark.sql.functions import col, pandas_udf,udf,lit
from notebookutils.mssparkutils import azureML
from azureml.core import Workspace, Model
from azureml.core.authentication import ServicePrincipalAuthentication
from azureml.core.model import Model
import joblib
import pandas as pd
ws = azureML.getWorkspace("AzureMLService")
spark.conf.set("spark.synapse.ml.predict.enabled","true")
Predict function
def forecastModel():
model_path = Model.get_model_path(model_name="modelName", _workspace=ws)
modeljob = joblib.load(model_path + "/model.pkl")
validation_data = spark.read.format("csv") \
.option("header", True) \
.option("inferSchema",True) \
.option("sep", ";") \
.load("abfss://....csv")
validation_data_pd = validation_data.toPandas()
predict = modeljob.forecast(validation_data_pd)
return predict

Can't add vertex with python in neptune workbench

I'm trying to put together a demo of Neptune using Neptune workbench, but something's not working right. I've got this block set up:
from __future__ import print_function # Python 2/3 compatibility
from gremlin_python import statics
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.strategies import *
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
graph = Graph()
cluster_url = #my cluster
remoteConn = DriverRemoteConnection( f'wss://{cluster_url}:8182/gremlin','g')
g = graph.traversal().withRemote(remoteConn)
import uuid
tmp = uuid.uuid4()
tmp_id=str(id)
def get_id(name):
uid = uuid.uuid5(uuid.NAMESPACE_DNS, f"{name}.licensing.company.com")
return str(uid)
def add_sku(name):
tmp_id = get_id(name)
g.addV('SKU').property('id', tmp_id, 'name', name)
return name
def get_values():
return g.V().properties().toList()
The problem is that calling add_sku doesn't result in a vertex being added to the graph. Doing the same operation in a cell with gremlin magic works, and I can retrieve values through python, but I can't add vertices. Does anyone see what I'm missing here?
The Python code is not working because it is missing a terminal step (next() or iterate()) on the end of it which forces it to evaluate. If you add the terminal step it should work:
g.addV('SKU').property('id', tmp_id, 'name', name).next()

NameError: global name 'linear' is not defined

I am trying to run an implementation of attention mechanism by Google DeepMind. However it is based on an older version of TensorFlow and I am getting this error
from tensorflow.models.rnn.rnn_cell import RNNCell, linear
concat = linear([inputs, h, self.c], 4 * self._num_units, True)
NameError: global name 'linear' is not defined
I couldn't find the linear model/function in the new tensorflow documentation. Can anyone help me with this? Thanks!
You need to use the tf.nn.rnn_cell._linear function to make the code work. Have a look at this tutorial for a sample usage.

nesting openmdao "assemblies"/drivers - working from a 0.13 analogy, is this possible to implement in 1.X?

I am using NREL's DAKOTA_driver openmdao plugin for parallelized Monte Carlo sampling of a model. In 0.X, I was able to nest assemblies, allowing an outer optimization driver to direct the DAKOTA_driver sampling evaluations. Is it possible for me to nest this setup within an outer optimizer? I would like the outer optimizer's workflow to call the DAKOTA_driver "assembly" then the get_dakota_output component.
import pandas as pd
import subprocess
from subprocess import call
import os
import numpy as np
from dakota_driver.driver import pydakdriver
from openmdao.api import IndepVarComp, Component, Problem, Group
from mpi4py import MPI
import sys
from itertools import takewhile
sigm = .005
n_samps = 20
X_bar=[0.065 , sigm] #2.505463e+03*.05]
dacout = 'dak.sout'
class get_dak_output(Component):
mean_coe = 0
def execute(self):
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nam ='ape.net_aep'
csize = 10000
with open(dacout) as f:
for i,l in enumerate(f):
pass
numlines = i
dakchunks = pd.read_csv(dacout, skiprows=0, chunksize = csize, sep='there_are_no_seperators')
linespassed = 0
vals = []
for dchunk in dakchunks:
for line in dchunk.values:
linespassed += 1
if linespassed < 49 or linespassed > numlines - 50: continue
else:
split_line = ''.join(str(s) for s in line).split()
if len(split_line)==2:
if (len(split_line) != 2 or
split_line[0] in ('nan', '-nan') or
split_line[1] != nam):
continue
else:vals.append(float(split_line[0]))
self.coe_vals = sorted(vals)
self.mean_coe = np.mean(self.coe_vals)
class ape(Component):
def __init__(self):
super(ape, self).__init__()
self.add_param('x', val=0.0)
self.add_output('net_aep', val=0.0)
def solve_nonlinear(self, params, unknowns, resids):
print 'hello'
x = params['x']
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
outp = subprocess.check_output("python test/exampleCall.py %f"%(float(x)),
shell=True)
unknowns['net_aep'] = float(outp.split()[-1])
top = Problem()
root = top.root = Group()
root.add('ape', ape())
root.add('p1', IndepVarComp('x', 13.0))
root.connect('p1.x', 'ape.x')
drives = pydakdriver(name = 'top.driver')
drives.UQ('sampling', use_seed=False)
#drives.UQ()
top.driver = drives
#top.driver = ScipyOptimizer()
#top.driver.options['optimizer'] = 'SLSQP'
top.driver.add_special_distribution('p1.x','normal', mean=0.065, std_dev=0.01, lower_bounds=-50, upper_bounds=50)
top.driver.samples = n_samps
top.driver.stdout = dacout
#top.driver.add_desvar('p2.y', lower=-50, upper=50)
#top.driver.add_objective('ape.f_xy')
top.driver.add_objective('ape.net_aep')
top.setup()
top.run()
bak = get_dak_output()
bak.execute()
print('\n')
print('E(aep) is %f'%bak.mean_coe)
There are two different options for this situation. Both will work in parallel, and both can be currently supported. But only one of them will work when you want to use analytic derivatives:
1) Nested Problems: You create one problem class that has a DOE driver in it. You pass the list of cases you want run into that driver, and it runs them in parallel. Then you put that problem into a parent problem as a component.
The parent problem doesn't know that it has a sub-problem. It just thinks it has a single component that uses multiple processors.
This is the most similar way to how you would have done it in 0.x. However I don't recommend going this route because it won't work if you want to use ever want to use analytic derivatives.
If you use this way, the dakota driver can stay pretty much as is. But you'll have to use a special sub-problem class. This isn't an officially supported feature yet, but its very doable.
2) Using a multi-point approach, you would create a Group class that represent your model. You would then create one instance of that group for each monte-carlo run you want to do. You put all of these instances into a parallel group inside your overall problem.
This approach avoids the sub-problem messiness. Its also much more efficient for actual execution. It will have a somewhat greater setup-cost than the first method. But in my opinion its well worth the one time setup cost to get the advantage of analytic derivatives. The only issue is that it would probably require some changes to the way the dakota_driver works. You would want to get a list of evaluations from the driver, then hand them them out to the individual children groups.

How to Reload a Python3 C extension module?

I wrote a C extension (mycext.c) for Python 3.2. The extension relies on constant data stored in a C header (myconst.h). The header file is generated by a Python script. In the same script, I make use of the recently compiled module. The workflow in the Python3 myscript (not shown completely) is as follows:
configure_C_header_constants()
write_constants_to_C_header() # write myconst.h
os.system('python3 setup.py install --user') # compile mycext
import mycext
mycext.do_stuff()
This works perfectly fine the in a Python session for the first time. If I repeat the procedure in the same session (for example, in two different testcases of a unittest), the first compiled version of mycext is always (re)loaded.
How do I effectively reload a extension module with the latest compiled version?
You can reload modules in Python 3.x by using the imp.reload() function. (This function used to be a built-in in Python 2.x. Be sure to read the documentation -- there are a few caveats!)
Python's import mechanism will never dlclose() a shared library. Once loaded, the library will stay until the process terminates.
Your options (sorted by decreasing usefulness):
Move the module import to a subprocess, and call the subprocess again after recompiling, i.e. you have a Python script do_stuff.py that simply does
import mycext
mycext.do_stuff()
and you call this script using
subprocess.call([sys.executable, "do_stuff.py"])
Turn the compile-time constants in your header into variables that can be changed from Python, eliminating the need to reload the module.
Manually dlclose() the library after deleting all references to the module (a bit fragile since you don't hold all the references yourself).
Roll your own import mechanism.
Here is an example how this can be done. I wrote a minimal Python C extension mini.so, only exporting an integer called version.
>>> import ctypes
>>> libdl = ctypes.CDLL("libdl.so")
>>> libdl.dlclose.argtypes = [ctypes.c_void_p]
>>> so = ctypes.PyDLL("./mini.so")
>>> so.PyInit_mini.argtypes = []
>>> so.PyInit_mini.restype = ctypes.py_object
>>> mini = so.PyInit_mini()
>>> mini.version
1
>>> del mini
>>> libdl.dlclose(so._handle)
0
>>> del so
At this point, I incremented the version number in mini.c and recompiled.
>>> so = ctypes.PyDLL("./mini.so")
>>> so.PyInit_mini.argtypes = []
>>> so.PyInit_mini.restype = ctypes.py_object
>>> mini = so.PyInit_mini()
>>> mini.version
2
You can see that the new version of the module is used.
For reference and experimenting, here's mini.c:
#include <Python.h>
static struct PyModuleDef minimodule = {
PyModuleDef_HEAD_INIT, "mini", NULL, -1, NULL
};
PyMODINIT_FUNC
PyInit_mini()
{
PyObject *m = PyModule_Create(&minimodule);
PyModule_AddObject(m, "version", PyLong_FromLong(1));
return m;
}
there is another way, set a new module name, import it, and change reference to it.
Update: I have now created a Python library around this approach:
https://github.com/bergkvist/creload
https://pypi.org/project/creload/
Rather than using the subprocess module in Python, you can use multiprocessing. This allows the child process to inherit all of the memory from the parent (on UNIX-systems).
For this reason, you also need to be careful not to import the C extension module into the parent.
If you return a value that depends on the C extension, it might also force the C extension to become imported in the parent as it receives the return-value of the function.
import multiprocessing as mp
import sys
def subprocess_call(fn, *args, **kwargs):
"""Executes a function in a forked subprocess"""
ctx = mp.get_context('fork')
q = ctx.Queue(1)
is_error = ctx.Value('b', False)
def target():
try:
q.put(fn(*args, **kwargs))
except BaseException as e:
is_error.value = True
q.put(e)
ctx.Process(target=target).start()
result = q.get()
if is_error.value:
raise result
return result
def my_c_extension_add(x, y):
assert 'my_c_extension' not in sys.modules.keys()
# ^ Sanity check, to make sure you didn't import it in the parent process
import my_c_extension
return my_c_extension.add(x, y)
print(subprocess_call(my_c_extension_add, 3, 4))
If you want to extract this into a decorator - for a more natural feel, you can do:
class subprocess:
"""Decorate a function to hint that it should be run in a forked subprocess"""
def __init__(self, fn):
self.fn = fn
def __call__(self, *args, **kwargs):
return subprocess_call(self.fn, *args, **kwargs)
#subprocess
def my_c_extension_add(x, y):
assert 'my_c_extension' not in sys.modules.keys()
# ^ Sanity check, to make sure you didn't import it in the parent process
import my_c_extension
return my_c_extension.add(x, y)
print(my_c_extension_add(3, 4))
This can be useful if you are working in a Jupyter notebook, and you want to rerun some function without rerunning all your existing cells.
Notes
This answer might only be relevant on Linux/macOS where you have a fork() system call:
Python multiprocessing linux windows difference
https://rhodesmill.org/brandon/2010/python-multiprocessing-linux-windows/