django-rest-framework: aserializer.data returns ReturnDict and OrderedDict instead of dict object - serialization

I have written a model serializer class for a model. When I test whether it serializes properly, I get the following error.
AssertionError: ReturnDict([(u'id', 1), ('apple', OrderedDict([(u'id', 1), ('data', u'this is a apple data.')])) ...
I tried converting it to a dict by wrapping aserializer.data in the dict function, like
dict(aserializer.data)
Then I get the following error.
AssertionError: {'status': False, 'http': OrderedDict([(u'id', 1), ...
How can I get plain dictionary data from a serializer?
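If the goal is plain built-in dicts all the way down, one possible workaround (not necessarily the canonical DRF answer) is to convert the structure recursively; a minimal sketch, where aserializer is the serializer instance from the question:

def to_plain(obj):
    # Recursively turn ReturnDict / OrderedDict / ReturnList structures
    # into plain dicts and lists.
    if isinstance(obj, dict):
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(item) for item in obj]
    return obj

plain_data = to_plain(aserializer.data)
print(type(plain_data))  # <class 'dict'>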

pyspark RDDs strip attributes of numpy subclasses

I've been fighting an unexpected behavior when attempting to construct a subclass of numpy ndarray within a map call to a pyspark RDD. Specifically, the attribute that I added within the ndarray subclass appears to be stripped from the resulting RDD.
The following snippets contain the essence of the issue.
import numpy as np

class MyArray(np.ndarray):
    def __new__(cls, shape, extra=None, *args):
        obj = super().__new__(cls, shape, *args)
        obj.extra = extra
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.extra = getattr(obj, "extra", None)

def shape_to_array(shape):
    rval = MyArray(shape, extra=shape)
    rval[:] = np.arange(np.product(shape)).reshape(shape)
    return rval
If I invoke shape_to_array directly (not under pyspark), it behaves as expected:
x = shape_to_array((2,3,5))
print(x.extra)
outputs:
(2, 3, 5)
But if I invoke shape_to_array via a map over an RDD of inputs, it goes wonky:
from pyspark.sql import SparkSession
sc = SparkSession.builder.appName("Steps").getOrCreate().sparkContext
rdd = sc.parallelize([(2,3,5),(2,4),(2,5)])
result = rdd.map(shape_to_array).cache()
print(result.map(lambda t:type(t)).collect())
print(result.map(lambda t:t.shape).collect())
print(result.map(lambda t:t.extra).collect())
Outputs:
[<class '__main__.MyArray'>, <class '__main__.MyArray'>, <class '__main__.MyArray'>]
[(2, 3, 5), (2, 4), (2, 5)]
22/10/15 15:48:02 ERROR Executor: Exception in task 7.0 in stage 2.0 (TID 23)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/usr/local/Cellar/apache-spark/3.3.0/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 686, in main
process()
File "/usr/local/Cellar/apache-spark/3.3.0/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 678, in process
serializer.dump_stream(out_iter, outfile)
File "/usr/local/Cellar/apache-spark/3.3.0/libexec/python/lib/pyspark.zip/pyspark/serializers.py", line 273, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "/usr/local/Cellar/apache-spark/3.3.0/libexec/python/lib/pyspark.zip/pyspark/util.py", line 81, in wrapper
return f(*args, **kwargs)
File "/var/folders/w7/42_p7mcd1y91_tjd0jzr8zbh0000gp/T/ipykernel_94831/2519313465.py", line 1, in <lambda>
AttributeError: 'MyArray' object has no attribute 'extra'
What happened to the extra attribute of the MyArray instances?
Thanks much for any/all suggestions
EDIT: A bit of additional info. If I add logging inside the shape_to_array function just before the return, I can verify that the extra attribute does exist on the MyArray object being returned. But when I access the MyArray elements of the RDD from the main driver, the attribute is gone.
After a night of sleeping on this, I remembered that I have often had issues with pyspark RDDs where the error message had to do with the return type not working with pickle.
I wasn't getting that error message this time because numpy.ndarray does work with pickle. BUT... the __reduce__ and __setstate__ methods of numpy.ndarray know nothing of the added extra attribute on the MyArray subclass. That is where extra was being stripped.
Adding the following two methods to MyArray solved everything.
def __reduce__(self):
    # ndarray.__reduce__ returns a 3-tuple; append extra to the state element.
    mthd, cls, args = super().__reduce__()
    return mthd, cls, args + (self.extra,)

def __setstate__(self, args):
    # Restore the base ndarray state, then re-attach extra.
    super().__setstate__(args[:-1])
    self.extra = args[-1]
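Since Spark ships task results through pickle, a plain pickle round-trip exercises the same code path; a quick local check of the fix might look like this:

import pickle

x = shape_to_array((2, 3, 5))
y = pickle.loads(pickle.dumps(x))
print(y.extra)  # (2, 3, 5) once __reduce__ / __setstate__ carry the attribute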
Thank you to anyone who took some time to think about my question.

Tensorflow 2 custom dataset Sequence

I have a dataset in a Python dictionary. The structure is as follows:
data.data['0']['input'],data.data['0']['target'],data.data['0']['length']
Both input and target are arrays of size (n,) and length is an int.
I have created a class derived from tf.keras.utils.Sequence and specified __getitem__ like this:
def __getitem__(self, idx):
    idx = str(idx)
    return {
        'input': np.asarray(self.data[idx]['input']),
        'target': np.asarray(self.data[idx]['target']),
        'length': self.data[idx]['length']
    }
How can I iterate over such a dataset using tf.data.Dataset? I get this error if I try to use from_tensor_slices:
ValueError: Attempt to convert a value with an unsupported type (<class 'dict'>) to a Tensor.
I think you should convert the dictionary to a tensor, as proposed here: convert a dictionary to a tensor,
or change the dictionary to a text file or to TFRecords. Hope this helps!
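For what it's worth, from_tensor_slices does accept a flat dict of equal-length arrays, so if every sample's input and target can be stacked (same n for all samples), one possible approach is to flatten the nested dict first. A sketch, where data is a stand-in for data.data from the question:

import numpy as np
import tensorflow as tf

# Stand-in for data.data: {'0': {'input': ..., 'target': ..., 'length': ...}, ...}
data = {
    '0': {'input': np.arange(4), 'target': np.arange(4) * 2, 'length': 4},
    '1': {'input': np.arange(4) + 1, 'target': np.arange(4) * 3, 'length': 4},
}

keys = sorted(data)
features = {
    'input': np.stack([data[k]['input'] for k in keys]),
    'target': np.stack([data[k]['target'] for k in keys]),
    'length': np.array([data[k]['length'] for k in keys]),
}

# A flat dict of stackable arrays is a supported input to from_tensor_slices.
ds = tf.data.Dataset.from_tensor_slices(features)
for example in ds:
    print(example['input'].numpy(), example['length'].numpy())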

Convert function / method to string?

I have the following forecast:
but I am not able to save it as a "functional" dataframe. I would like to rename the columns and then plot it.
But nothing works.
The whole thing looks like one "unit", not a normal dataframe.
I have tried a few things to change it, but without any result.
df18.columns = [''] * len(df18.columns)
df18.columns = ['A', 'B']
df18.iloc[0]
AttributeError: 'function' object has no attribute 'columns'
AttributeError: 'function' object has no attribute 'iloc'
df18 = df18[:-1]
TypeError: 'method' object is not subscriptable
How can I convert a function to float64? Or to anything that I can work with?
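Those AttributeError / TypeError messages usually mean df18 is bound to the forecast method itself rather than to its return value, i.e. the parentheses are missing. A hypothetical illustration of the likely cause and fix, where DummyModel stands in for whatever model produced the forecast:

import pandas as pd

class DummyModel:
    def forecast(self):
        return pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0]})

model = DummyModel()

df18 = model.forecast        # missing parentheses: df18 is the method object
# df18.columns               # AttributeError: 'method' object has no attribute 'columns'

df18 = model.forecast()      # calling it returns an actual DataFrame
df18.columns = ['A', 'B']    # renaming, slicing and plotting now work
print(df18.iloc[0])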

Init pd.Series from list of fields

I wonder why
columns=['A','B','C']
pd.Series(columns=columns)
yields
TypeError: __init__() got an unexpected keyword argument 'columns'
while
pd.DataFrame(columns=columns)
works.
Is there such an init function from a list of columns in pd.Series?
I know that this can be done with a dict in the middle:
columns=['A','B','C']
my_dict = dict.fromkeys(columns)
my_series = pd.Series(my_dict)
But I think it would be nice to have it as pd.Series(columns=columns) directly.
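Note that a Series has an index rather than columns, so the closest direct equivalent is to pass the labels as index; a minimal sketch:

import pandas as pd

columns = ['A', 'B', 'C']

# A Series has row labels (an index), not columns.
my_series = pd.Series(index=columns, dtype=object)
print(my_series)
# A    NaN
# B    NaN
# C    NaN
# dtype: object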

Getting error while passing Class_weight parameter in Random Forest

I am building a binary classifier. Since my data is unbalanced, I am using class weights. I am getting an error when passing the values.
Error: ValueError: class_weight must be dict, 'balanced', or None, got: [{0: 0.4, 1: 0.6}]
Code
rf=RandomForestClassifier(n_estimators=1000,oob_score=True,min_samples_leaf=500,class_weight=[{0:.4, 1:.6}])
fit_rf=rf.fit(X_train_res,y_train_res)
Error
\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\class_weight.py in compute_class_weight(class_weight, classes, y)
60 if not isinstance(class_weight, dict):
61 raise ValueError("class_weight must be dict, 'balanced', or None,"
---> 62 " got: %r" % class_weight)
63 for c in class_weight:
64 i = np.searchsorted(classes, c)
ValueError: class_weight must be dict, 'balanced', or None, got: [{0: 0.4, 1: 0.6}]
How can I fix this?
Per the documentation
class_weight : dict, list of dicts, “balanced”,
Therefore, the class_weight parameter accepts a dictionary, a list of dictionaries, or the string "balanced". The error message states that it wants a dictionary, and since you have only one dictionary, wrapping it in a list is not needed.
So, let's try:
rf = RandomForestClassifier(n_estimators=1000,
                            oob_score=True,
                            min_samples_leaf=500,
                            class_weight={0: .4, 1: .6})
fit_rf = rf.fit(X_train_res, y_train_res)
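A quick way to confirm that the dict form is accepted, using toy data in place of X_train_res / y_train_res:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy imbalanced data standing in for X_train_res / y_train_res.
X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)

rf = RandomForestClassifier(n_estimators=100, class_weight={0: .4, 1: .6})
rf.fit(X, y)  # fits without the ValueError: a single dict is accepted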