pyspark RDDs strip attributes of numpy subclasses - numpy

I've been fighting an unexpected behavior when attempting to construct a subclass of numpy ndarray within a map call to a pyspark RDD. Specifically, the attribute that I added within the ndarray subclass appears to be stripped from the resulting RDD.
The following snippets contain the essence of the issue.
import numpy as np

class MyArray(np.ndarray):
    def __new__(cls, shape, extra=None, *args):
        obj = super().__new__(cls, shape, *args)
        obj.extra = extra
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.extra = getattr(obj, "extra", None)

def shape_to_array(shape):
    rval = MyArray(shape, extra=shape)
    rval[:] = np.arange(np.product(shape)).reshape(shape)
    return rval
If I invoke shape_to_array directly (not under pyspark), it behaves as expected:
x = shape_to_array((2,3,5))
print(x.extra)
outputs:
(2, 3, 5)
But, if I invoke shape_to_array via a map to an RDD of inputs, it goes wonky:
from pyspark.sql import SparkSession
sc = SparkSession.builder.appName("Steps").getOrCreate().sparkContext
rdd = sc.parallelize([(2,3,5),(2,4),(2,5)])
result = rdd.map(shape_to_array).cache()
print(result.map(lambda t:type(t)).collect())
print(result.map(lambda t:t.shape).collect())
print(result.map(lambda t:t.extra).collect())
Outputs:
[<class '__main__.MyArray'>, <class '__main__.MyArray'>, <class '__main__.MyArray'>]
[(2, 3, 5), (2, 4), (2, 5)]
22/10/15 15:48:02 ERROR Executor: Exception in task 7.0 in stage 2.0 (TID 23)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/usr/local/Cellar/apache-spark/3.3.0/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 686, in main
process()
File "/usr/local/Cellar/apache-spark/3.3.0/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 678, in process
serializer.dump_stream(out_iter, outfile)
File "/usr/local/Cellar/apache-spark/3.3.0/libexec/python/lib/pyspark.zip/pyspark/serializers.py", line 273, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "/usr/local/Cellar/apache-spark/3.3.0/libexec/python/lib/pyspark.zip/pyspark/util.py", line 81, in wrapper
return f(*args, **kwargs)
File "/var/folders/w7/42_p7mcd1y91_tjd0jzr8zbh0000gp/T/ipykernel_94831/2519313465.py", line 1, in <lambda>
AttributeError: 'MyArray' object has no attribute 'extra'
What happened to the extra attribute of the MyArray instances?
Thanks much for any/all suggestions
EDIT: A bit of additional info. If I add logging inside the shape_to_array function just before the return, I can verify that the extra attribute does exist on the MyArray object being returned. But when I attempt to access the elements of the RDD from the main driver, the attribute is gone.

After a night of sleeping on this, I remembered that I have often had issues with pyspark RDDs where the error message had to do with the return type not working with pickle.
I wasn't getting that error message this time because numpy.ndarray does work with pickle. BUT... the __reduce__ and __setstate__ methods of numpy.ndarray know nothing about the added extra attribute on the MyArray subclass. This is where extra was being stripped.
Adding the following two methods to MyArray solved everything.
def __reduce__(self):
    mthd, cls, args = super().__reduce__()
    return mthd, cls, args + (self.extra,)

def __setstate__(self, args):
    super().__setstate__(args[:-1])
    self.extra = args[-1]
Thank you to anyone who took some time to think about my question.


using pandas.read_csv, how can one process all errors, receive all non-error data?

Data which, for me, generates an exception instead of invoking the 'on_bad_lines' handler is at:
https://opencalaccess.org/misc/NAMES_CD.TSV
I have this:
import os
import traceback
import pandas as pd

bad_lines = list()

def bad_line_finder(x):
    bad_lines.append(str(x))
    return None

for file in os.listdir(dir):
    bad_lines = list()
    try:
        for df in pd.read_csv(f"{dir}/{file}",
                              sep='\t',
                              on_bad_lines=bad_line_finder,
                              engine='python',
                              chunksize=1000):
            print(f"\n{file}")
            df.info()
            print(f"Bad Lines: {bad_lines}")
            bad_lines = list()
    except:
        print("EXCEPTION:")
        traceback.print_exc()
and this works great. There are errors in the files and the handler deals with them so that I can keep track of them. Except, why do I still see this:
EXCEPTION:
Traceback (most recent call last):
File "/home/ray/Projects/opencalaccess-data/import.py", line 41, in <module>
for df in pd.read_csv(f"{dir}/{file}",
File "/home/ray/Projects/opencalaccess-data/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1698, in __next__
return self.get_chunk()
File "/home/ray/Projects/opencalaccess-data/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1810, in get_chunk
return self.read(nrows=size)
File "/home/ray/Projects/opencalaccess-data/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1778, in read
) = self._engine.read( # type: ignore[attr-defined]
File "/home/ray/Projects/opencalaccess-data/.venv/lib/python3.10/site-packages/pandas/io/parsers/python_parser.py", line 250, in read
content = self._get_lines(rows)
File "/home/ray/Projects/opencalaccess-data/.venv/lib/python3.10/site-packages/pandas/io/parsers/python_parser.py", line 1114, in _get_lines
new_rows.append(next(self.data))
_csv.Error: ' ' expected after '"'
What is the "on_bad_lines" option doing if it does not handle all of the bad lines? Which of them will it handle and which will it not?
This is a government data source. There are format errors in the data that cannot be corrected by the agency, because they constitute the official record. So, I must fix them myself. But which of them throw exceptions and which do not?

keras.models.load_model() gives ValueError

I have saved the trained model and the weights as below.
model, history, score = fit_model(model, train_batches, val_batches, callbacks=[callback])
model.save('./model')
model.save_weights('./weights')
Then I tried to load the saved model in the following way:
if __name__ == '__main__':
    model = keras.models.load_model('./model', compile=False, custom_objects={"F1Score": tfa.metrics.F1Score})
    test_batches, nb_samples = test_gen(dataset_test_path, 32, img_width, img_height)
    predict, loss, acc = predict_model(model, test_batches, nb_samples)
    print(predict)
    print(acc)
    print(loss)
But it gives me an error. What should I do to overcome this?
Traceback (most recent call last):
File "test_pro.py", line 34, in <module>
model = keras.models.load_model('./model',compile= False,custom_objects={"F1Score": tfa.metrics.F1Score})
File "/home/dcs2016csc007/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/save.py", line 212, in load_model
return saved_model_load.load(filepath, compile, options)
File "/home/dcs2016csc007/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/load.py", line 138, in load
keras_loader.load_layers()
File "/home/dcs2016csc007/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/load.py", line 379, in load_layers
self.loaded_nodes[node_metadata.node_id] = self._load_layer(
File "/home/dcs2016csc007/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/load.py", line 407, in _load_layer
obj, setter = revive_custom_object(identifier, metadata)
File "/home/dcs2016csc007/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/load.py", line 921, in revive_custom_object
raise ValueError('Unable to restore custom object of type {} currently. '
ValueError: Unable to restore custom object of type _tf_keras_metric currently. Please make sure that the layer implements `get_config`and `from_config` when saving. In addition, please use the `custom_objects` arg when calling `load_model()`.
Looking at the source code for Keras, the error is raised when trying to load a model with a custom object:
def revive_custom_object(identifier, metadata):
  """Revives object from SavedModel."""
  if ops.executing_eagerly_outside_functions():
    model_class = training_lib.Model
  else:
    model_class = training_lib_v1.Model

  revived_classes = {
      constants.INPUT_LAYER_IDENTIFIER: (
          RevivedInputLayer, input_layer.InputLayer),
      constants.LAYER_IDENTIFIER: (RevivedLayer, base_layer.Layer),
      constants.MODEL_IDENTIFIER: (RevivedNetwork, model_class),
      constants.NETWORK_IDENTIFIER: (RevivedNetwork, functional_lib.Functional),
      constants.SEQUENTIAL_IDENTIFIER: (RevivedNetwork, models_lib.Sequential),
  }
  parent_classes = revived_classes.get(identifier, None)

  if parent_classes is not None:
    parent_classes = revived_classes[identifier]
    revived_cls = type(
        compat.as_str(metadata['class_name']), parent_classes, {})
    return revived_cls._init_from_metadata(metadata)  # pylint: disable=protected-access
  else:
    raise ValueError('Unable to restore custom object of type {} currently. '
                     'Please make sure that the layer implements `get_config`'
                     'and `from_config` when saving. In addition, please use '
                     'the `custom_objects` arg when calling `load_model()`.'
                     .format(identifier))
The method only works with custom objects of the types defined in revived_classes. As you can see, it currently supports only input layer, layer, model, network, and sequential custom objects.
In your code, you pass a tfa.metrics.F1Score class in the custom_objects argument, which is of type METRIC_IDENTIFIER and therefore not supported (probably because it doesn't implement the get_config and from_config functions, as the error output says):
keras.models.load_model('./model', compile=False, custom_objects={"F1Score": tfa.metrics.F1Score})
It's been a while since I last worked with Keras, but maybe you can try to follow what was proposed in this other related answer and wrap the call to tfa.metrics.F1Score in a function. Something like this (adjust it to your needs):
def f1(y_true, y_pred):
    metric = tfa.metrics.F1Score(num_classes=3, threshold=0.5)
    metric.update_state(y_true, y_pred)
    return metric.result()

keras.models.load_model('./model', compile=False, custom_objects={'f1': f1})

Pandas Making multiple HTTP requests

I have the code below, which reads a number of ticker symbols from a csv file into a dataframe.
For each ticker it calls the Web API, returning a dataframe df which is then appended to the previous ones until complete. The code works, but when a large number of tickers is used, the code slows down tremendously. I understand I can use multiprocessing and threads to speed up my code, but I don't know where to start and what would be most suited to my particular case.
What code should I use to get my data into a combined dataframe in the fastest possible manner?
import pandas as pd
import numpy as np
import json

tickers = pd.read_csv("D:/verhuizen/pensioen/MULTI.csv", names=['symbol', 'company'])

# use one known symbol to get the column layout, then start from an empty frame
read_str = 'https://financialmodelingprep.com/api/v3/income-statement/AAPL?limit=120&apikey=demo'
df = pd.read_json(read_str)
df = pd.DataFrame(columns=df.columns)

for ind in range(len(tickers)):
    read_str = 'https://financialmodelingprep.com/api/v3/income-statement/' + tickers['symbol'][ind] + '?limit=120&apikey=demo'
    df1 = pd.read_json(read_str)
    df = pd.concat([df, df1], ignore_index=True)

df.set_index(['date', 'symbol'], inplace=True)
df.sort_index(inplace=True)
df.to_csv('D:/verhuizen/pensioen/MULTI_out.csv')
The code provided works fine for smaller data sets, but when I use a large number of tickers (>4,000), at some point I get the below error. Is this because the web API gets overloaded, or is there another problem?
Traceback (most recent call last):
File "D:/Verhuizen/Pensioen/Equity_Extractor_2021.py", line 43, in <module>
data = pool.starmap(download_data, enumerate(TICKERS, start=1))
File "C:\Users\MLUY\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\pool.py", line 276, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
File "C:\Users\MLUY\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\pool.py", line 657, in get
raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x00C33E30>'. Reason: 'TypeError("cannot serialize '_io.BufferedReader' object")'
Process finished with exit code 1
It keeps giving the same error (for a larger number of tickers).
The code is exactly as provided:
def download_data(pool_id, symbols):
    df = []
    for symbol in symbols:
        print("[{:02}]: {}".format(pool_id, symbol))
        # do stuff here
        read_str = BASEURL.format(symbol)
        df.append(pd.read_json(read_str))
        # df.append(pd.read_json(fake_data(symbol)))
    return pd.concat(df, ignore_index=True)
It failed again with pool.map, but I noticed one strange thing: each time it fails, it does so around 12,500 tickers (the total is around 23,000 tickers). Similar error:
Traceback (most recent call last):
File "C:/Users/MLUY/AppData/Roaming/JetBrains/PyCharmCE2020.1/scratches/Equity_naive.py", line 21, in <module>
data = pool.map(download_data, TICKERS)
File "C:\Users\MLUY\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Users\MLUY\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\pool.py", line 657, in get
raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x078D1BF0>'. Reason: 'TypeError("cannot serialize '_io.BufferedReader' object")'
Process finished with exit code 1
I also get the tickers from an API call https://financialmodelingprep.com/api/v3/financial-statement-symbol-lists?apikey=demo (I noticed it does not work without a subscription). I wanted to attach the data as a csv file, but I don't have sufficient rights. I don't think it's a good idea to paste the returned data here...
I tried adding time.sleep(0.2) before the return as suggested, but again I get the same error at ticker 12,510. Strange, every time it's around the same location. As there are multiple processes going on, I cannot see at what point it's breaking.
Traceback (most recent call last):
File "C:/Users/MLUY/AppData/Roaming/JetBrains/PyCharmCE2020.1/scratches/Equity_naive.py", line 24, in <module>
data = pool.map(download_data, TICKERS)
File "C:\Users\MLUY\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Users\MLUY\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\pool.py", line 657, in get
raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x00F32C90>'. Reason: 'TypeError("cannot serialize '_io.BufferedReader' object")'
Process finished with exit code 1
Something very, very strange is going on. I have split the data into chunks of 10,000 / 5,000 / 4,000 and 2,000, and each time the code breaks approximately 100 tickers from the end. Clearly there is something going on that's not right.
import time
import pandas as pd
import multiprocessing

# get tickers from your csv
df = pd.read_csv('D:/Verhuizen/Pensioen/All_Symbols.csv', header=None)

# setting the Dataframe to a list (in total 23,000 tickers)
df = df[0]
TICKERS = df.tolist()

# Select how many tickers I want
TICKERS = TICKERS[0:2000]

BASEURL = "https://financialmodelingprep.com/api/v3/income-statement/{}?limit=120&apikey=demo"

def download_data(symbol):
    print(symbol)
    # do stuff here
    read_str = BASEURL.format(symbol)
    df = pd.read_json(read_str)
    # time.sleep(0.2)
    return df

if __name__ == "__main__":
    with multiprocessing.Pool(multiprocessing.cpu_count()) as pool:
        data = pool.map(download_data, TICKERS)

    df = pd.concat(data).set_index(["date", "symbol"]).sort_index()
    df.to_csv('D:/verhuizen/pensioen/Income_2000.csv')
In this particular example the code breaks at position 1,903
RPAI
Traceback (most recent call last):
File "C:/Users/MLUY/AppData/Roaming/JetBrains/PyCharmCE2020.1/scratches/Equity_testing.py", line 27, in <module>
data = pool.map(download_data, TICKERS)
File "C:\Users\MLUY\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Users\MLUY\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\pool.py", line 657, in get
raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x0793EAF0>'. Reason: 'TypeError("cannot serialize '_io.BufferedReader' object")'
The first optimization is to avoid concatenating your dataframe at each iteration.
You can try something like that:
url = "https://financialmodelingprep.com/api/v3/income-statement/{}?limit=120&apikey=demo"

df = []
for symbol in tickers["symbol"]:
    read_str = url.format(symbol)
    df.append(pd.read_json(read_str))

df = pd.concat(df, ignore_index=True)
If that's not sufficient, we can look into async, threading, or multiprocessing.
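Since the work is I/O-bound (each call mostly waits on the network), a thread pool is often the simplest speed-up. This is only an illustrative sketch of the threading option, reusing the url template and tickers dataframe from the snippet above:

from concurrent.futures import ThreadPoolExecutor
import pandas as pd

def fetch(symbol):
    # one HTTP request per symbol, returned as a dataframe
    return pd.read_json(url.format(symbol))

with ThreadPoolExecutor(max_workers=8) as executor:
    frames = list(executor.map(fetch, tickers["symbol"]))

df = pd.concat(frames, ignore_index=True)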
Edit:
The code below can do the job:
import pandas as pd
import numpy as np
import multiprocessing
import time
import random

PROCESSES = 4  # number of parallel processes
CHUNKS = 6     # one process handles n symbols

# get tickers from your csv
TICKERS = ["BCDA", "WBAI", "NM", "ZKIN", "TNXP", "FLY", "MYSZ", "GASX", "SAVA", "GCE",
           "XNET", "SRAX", "SINO", "LPCN", "XYF", "SNSS", "DRAD", "WLFC", "OILD", "JFIN",
           "TAOP", "PIC", "DIVC", "MKGI", "CCNC", "AEI", "ZCMD", "YVR", "OCG", "IMTE",
           "AZRX", "LIZI", "ORSN", "ASPU", "SHLL", "INOD", "NEXI", "INR", "SLN", "RHE-PA",
           "MAX", "ARRY", "BDGE", "TOTA", "PFMT", "AMRH", "IDN", "OIS", "RMG", "IMV",
           "CHFS", "SUMR", "NRG", "ULBR", "SJI", "HOML", "AMJL", "RUBY", "KBLMU", "ELP"]

# create a list of n sublists
TICKERS = [TICKERS[i:i + CHUNKS] for i in range(0, len(TICKERS), CHUNKS)]

BASEURL = "https://financialmodelingprep.com/api/v3/income-statement/{}?limit=120&apikey=demo"

def fake_data(symbol):
    dti = pd.date_range("1985", "2020", freq="Y")
    df = pd.DataFrame({"date": dti, "symbol": symbol,
                       "A": np.random.randint(0, 100, size=len(dti)),
                       "B": np.random.randint(0, 100, size=len(dti))})
    time.sleep(random.random())  # to simulate network delay
    return df.to_json()

def download_data(pool_id, symbols):
    df = []
    for symbol in symbols:
        print("[{:02}]: {}".format(pool_id, symbol))
        # do stuff here
        # read_str = BASEURL.format(symbol)
        # df.append(pd.read_json(read_str))
        df.append(pd.read_json(fake_data(symbol)))
    return pd.concat(df, ignore_index=True)

if __name__ == "__main__":
    with multiprocessing.Pool(PROCESSES) as pool:
        data = pool.starmap(download_data, enumerate(TICKERS, start=1))

    df = pd.concat(data).set_index(["date", "symbol"]).sort_index()
In this example, I split the list of tickers into sublists so that each process retrieves data for multiple symbols; this limits the overhead of creating and destroying processes.
The delay is to simulate the response time from the network connection and highlight the multiprocess behaviour.
Edit 2: simpler but naive version for your needs
import pandas as pd
import multiprocessing

# get tickers from your csv
TICKERS = ["BCDA", "WBAI", "NM", "ZKIN", "TNXP", "FLY", "MYSZ", "GASX", "SAVA", "GCE",
           "XNET", "SRAX", "SINO", "LPCN", "XYF", "SNSS", "DRAD", "WLFC", "OILD", "JFIN",
           "TAOP", "PIC", "DIVC", "MKGI", "CCNC", "AEI", "ZCMD", "YVR", "OCG", "IMTE",
           "AZRX", "LIZI", "ORSN", "ASPU", "SHLL", "INOD", "NEXI", "INR", "SLN", "RHE-PA",
           "MAX", "ARRY", "BDGE", "TOTA", "PFMT", "AMRH", "IDN", "OIS", "RMG", "IMV",
           "CHFS", "SUMR", "NRG", "ULBR", "SJI", "HOML", "AMJL", "RUBY", "KBLMU", "ELP"]

BASEURL = "https://financialmodelingprep.com/api/v3/income-statement/{}?limit=120&apikey=demo"

def download_data(symbol):
    print(symbol)
    # do stuff here
    read_str = BASEURL.format(symbol)
    df = pd.read_json(read_str)
    return df

if __name__ == "__main__":
    with multiprocessing.Pool(multiprocessing.cpu_count()) as pool:
        data = pool.map(download_data, TICKERS)

    df = pd.concat(data).set_index(["date", "symbol"]).sort_index()
Note about pool.map: each symbol in TICKERS is dispatched to one of the pool's worker processes, which calls download_data on it.

Time Dependent 1D Schroedinger Equation using Numpy and SciPy solve_ivp

I am trying to solve the 1D time-dependent Schroedinger equation using finite difference methods; here is how the equation looks and how it undergoes discretization.
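(The equation and its discretization appear as an image in the original post and are not reproduced here. For reference, a standard form of the 1D time-dependent Schroedinger equation and a central-difference discretization of its spatial derivative, in my own notation, are:

i\hbar \frac{\partial \psi}{\partial t} = -\frac{\hbar^2}{2m} \frac{\partial^2 \psi}{\partial x^2} + V(x)\,\psi

\frac{d\psi_n}{dt} = \frac{i\hbar}{2m} \frac{\psi_{n+1} - 2\psi_n + \psi_{n-1}}{\Delta x^2} - \frac{i}{\hbar} V_n \psi_n )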
Say I have N spatial points (the index of x_i runs from 0 to N-1), and suppose my time span is K time points.
I strive to get a K-by-N matrix; each row (j) will be the function at time t_j.
I suspect that my issue is that I am defining the system of coupled equations in the wrong way.
My boundary conditions are psi = 0 (or some constant) at the sides of the box, so I set the ODEs at the edges of my x span to zero.
My Code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp

# Defining the length and the resolution of our x vector
length = 2*np.pi
delta_x = .01

# create a vector of X values, and the number of X values
def create_x_vector(length, delta_x):
    x = np.arange(-length, length, delta_x)
    N = len(x)
    return x, N

# create initial condition vector
def create_initial_cond(x, x0, Gausswidth):
    psi0 = np.exp((-(x-x0)**2)/Gausswidth)
    return psi0

# create the system of ODEs
def ode_system(psi, t, delta_x, N):
    psi_t = np.zeros(N)
    psi_t[0] = 0
    psi_t[N-1] = 0
    for i in range(1, N-1):
        psi_t[i] = (psi[i+1] - 2*psi[i] + psi[i-1])/(delta_x)**2
    return psi_t

# Create the actual time, x and initial condition vectors using the functions
t = np.linspace(0, 15, 5000)
x, N = create_x_vector(length, delta_x)
psi0 = create_initial_cond(x, 0, 1)

psi = np.zeros(N)
psi = solve_ivp(ode_system(psi, t, delta_x, N), [0, 15], psi0, method='Radau', max_step=0.1)
After running I get an error:
runfile('D:/Studies/Project/Simulation Test/Test2.py', wdir='D:/Studies/Project/Simulation Test')
Traceback (most recent call last):
File "<ipython-input-16-bff0a1fd9937>", line 1, in <module>
runfile('D:/Studies/Project/Simulation Test/Test2.py', wdir='D:/Studies/Project/Simulation Test')
File "C:\Users\Pasha\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 704, in runfile
execfile(filename, namespace)
File "C:\Users\Pasha\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "D:/Studies/Project/Simulation Test/Test2.py", line 35, in <module>
psi= solve_ivp(ode_system(psi,t,delta_x,N),[0,15],psi0,method='Radau',max_step=0.1)
File "C:\Users\Pasha\Anaconda3\lib\site-packages\scipy\integrate\_ivp\ivp.py", line 454, in solve_ivp
solver = method(fun, t0, y0, tf, vectorized=vectorized, **options)
File "C:\Users\Pasha\Anaconda3\lib\site-packages\scipy\integrate\_ivp\radau.py", line 288, in __init__
self.f = self.fun(self.t, self.y)
File "C:\Users\Pasha\Anaconda3\lib\site-packages\scipy\integrate\_ivp\base.py", line 139, in fun
return self.fun_single(t, y)
File "C:\Users\Pasha\Anaconda3\lib\site-packages\scipy\integrate\_ivp\base.py", line 21, in fun_wrapped
return np.asarray(fun(t, y), dtype=dtype)
TypeError: 'numpy.ndarray' object is not callable
On a more general note, how can I make Python solve N ODEs without manually defining each and every one of them?
I want to have a big vector called xdot where each cell in the vector is a function of some of the X[i]'s, and I seem to be failing to do that. Or maybe my approach is completely wrong?
I also have a feeling that the vectorized argument of solve_ivp may be relevant, but I do not understand the explanation in the SciPy documentation.
The problem is probably that solve_ivp expects a function as its first parameter, but you provided ode_system(psi,t,delta_x,N), which evaluates to an array instead (hence the TypeError saying a 'numpy.ndarray' object is not callable).
You need to provide solve_ivp with a function that accepts two variables, t and y (which in your case is psi). It can be done like this:
def temp_function(t, psi):
    return ode_system(psi, t, delta_x, N)
and then, your last line should be:
psi= solve_ivp(temp_function,[0,15],psi0,method='Radau',max_step=0.1)
This code solved the problem for me.
For a shorthand way of doing this, you can also just write the function inline using a lambda:
psi= solve_ivp(lambda t,psi : ode_system(psi,t,delta_x,N),[0,15],psi0,method='Radau',max_step=0.1)
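As for the more general question of solving N ODEs without writing each one out by hand: the right-hand side can be computed for all interior points at once with NumPy slicing instead of a Python loop. A minimal sketch, reusing delta_x and psi0 from the question's code and keeping the same right-hand side as ode_system (just the second difference, with zero boundaries):

def ode_system_vec(t, psi):
    # second-order central difference on the interior; boundary values stay zero
    psi_t = np.zeros_like(psi)
    psi_t[1:-1] = (psi[2:] - 2*psi[1:-1] + psi[:-2]) / delta_x**2
    return psi_t

psi = solve_ivp(ode_system_vec, [0, 15], psi0, method='Radau', max_step=0.1)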

neo4django mixin inheritance problems

Following up on my previous question, I am trying to implement what I need.
The following is the content of a django app models.py.
from neo4django.db import models
from neo4django.auth.models import User as AuthUser

class MyManager(models.manager.NodeModelManager):
    def filterLocation(self, **kwargs):
        qs = self.get_query_set()
        if 'dist' in kwargs:
            qs = qs.filter(_where_dist=kwargs['dist'])
        elif 'prov' in kwargs:
            qs = qs.filter(_where_prov=kwargs['prov'])
        elif 'reg' in kwargs:
            qs = qs.filter(_where_reg=kwargs['reg'])
        return qs

class MyMixin(object):
    _test = models.BooleanProperty(default=True)
    _where_dist = models.StringProperty(indexed=True)
    _where_prov = models.StringProperty(indexed=True)
    _where_reg = models.StringProperty(indexed=True)

    search = MyManager()

    class Meta:
        abstract = True

class Activity(MyMixin, models.NodeModel):
    name = models.StringProperty()

class User(MyMixin, AuthUser):
    info = models.StringProperty()
I have many problems. The first is that MyMixin's attributes are not inherited:
>>> joe=User.objects.create(username='joe') # OK!
>>> joe
<User: joe>
>>> bill=User.objects.create(username='bill',_test=True)
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/home/tonjo/venv/tuned/local/lib/python2.7/site-packages/neo4django/db/models/manager.py", line 43, in create
return self.get_query_set().create(**kwargs)
File "/home/tonjo/venv/tuned/local/lib/python2.7/site-packages/neo4django/db/models/query.py", line 1296, in create
return super(NodeQuerySet, self).create(**kwargs)
File "/home/tonjo/venv/tuned/local/lib/python2.7/site-packages/django/db/models/query.py", line 375, in create
obj = self.model(**kwargs)
File "/home/tonjo/venv/tuned/local/lib/python2.7/site-packages/neo4django/db/models/base.py", line 141, in __init__
super(NodeModel, self).__init__(*args, **kwargs)
File "/home/tonjo/venv/tuned/local/lib/python2.7/site-packages/django/db/models/base.py", line 367, in __init__
raise TypeError("'%s' is an invalid keyword argument for this function" % kwargs.keys()[0])
TypeError: '_test' is an invalid keyword argument for this function
But create also fails to set User's own attributes!
>>> k=User.objects.create(username='kevin',info='The Best')
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/home/tonjo/venv/tuned/local/lib/python2.7/site-packages/neo4django/db/models/manager.py", line 43, in create
return self.get_query_set().create(**kwargs)
File "/home/tonjo/venv/tuned/local/lib/python2.7/site-packages/neo4django/db/models/query.py", line 1296, in create
return super(NodeQuerySet, self).create(**kwargs)
File "/home/tonjo/venv/tuned/local/lib/python2.7/site-packages/django/db/models/query.py", line 375, in create
obj = self.model(**kwargs)
File "/home/tonjo/venv/tuned/local/lib/python2.7/site-packages/neo4django/db/models/base.py", line 141, in __init__
super(NodeModel, self).__init__(*args, **kwargs)
File "/home/tonjo/venv/tuned/local/lib/python2.7/site-packages/django/db/models/base.py", line 367, in __init__
raise TypeError("'%s' is an invalid keyword argument for this function" % kwargs.keys()[0])
TypeError: 'info' is an invalid keyword argument for this function
None of the mixin's or the User class's own attributes exist on User.
If I derive in the reverse order:
class User(AuthUser, MyMixin):
they are present, but I don't think that is good practice; shouldn't core models go to the right?
Anyway, as we see below, Activity does not have this problem, as if AuthUser removed all the attributes (intended behavior?).
However, the alternative creation method works:
>>> k=User(username='kevin',info='The Best')
>>> k.save()
>>> k
<User: kevin>
But with the other model, Activity, which inherits directly from NodeModel
(with User we have an intermediate parent, AuthUser), things are better:
>>> a=Activity.objects.create(name="AA")
>>> a
<Activity: Activity object>
Several tests made with simple NodeModel inheritance were OK;
the problems arise with multiple inheritance and mixins.
Another problem, with my NodeModelManager:
>>> User.search.filterLocation(dist="b")
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/home/tonjo/prj/tuned_prj/tuned_django/myapp/models.py", line 6, in filterLocation
qs = self.get_query_set()
File "/home/tonjo/venv/tuned/local/lib/python2.7/site-packages/neo4django/db/models/manager.py", line 31, in get_query_
set
return NodeQuerySet(self.model)
File "/home/tonjo/venv/tuned/local/lib/python2.7/site-packages/neo4django/db/models/query.py", line 1222, in __init__
self._app_label = model._meta.app_label
AttributeError: 'NoneType' object has no attribute '_meta'
This one is beyond my comprehension ;)
MyManager worked well when, in a previous test, I derived from a NodeModel child,
not from a mixin.
This is a pretty complicated question, but hopefully I can give you a pointer.
First, you need to understand that Django fields (and, by extension, neo4django properties) cooperate with the class on which they're defined. That's why they only work when defined on a Model (or, in neo4django, a NodeModel). There is no easy way to do multiple inheritance using Django models and fields - my mixin suggestion from your other question allows adding Python methods and attributes, but won't magically make Property or Field play nicely with object as a parent class.
If you really want to avoid duplication of property definitions in this situation, you have a few choices.
One is to use a shared superclass - but in this case, you can't, since one of your classes needs to inherit from neo4django.auth.models.User. This particular requirement will go away when neo4django supports Django 1.5+, which allows swappable user models.
Most metaprogramming won't work easily, since Django and neo4django make use of metaclasses. That said, I'm sure you could hack around this with a clever class decorator or child metaclass- but I'm not sure you should from a sanity standpoint :)
Let me know how it goes- maybe I'm missing an easier approach.