Boolean calculation: Do 'and' and 'And' share the same meaning in Colab?

I have two snippets of code for creating an image mask in Google Colab:
(1)
mask1 = img1.gt(0.5) and img2.lt(1.2)
(2)
mask2 = img1.gt(0.5).And(img2.lt(1.2))
Are they different?
Does mask1 equal mask2?

These are two different things. And() is a server-side operation, while and is Python's client-side boolean operator. You have to use the server-side operation for this. You can read up on client- vs. server-side operations here.
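To see the difference concretely, here is a minimal sketch, assuming the images are Earth Engine ee.Image objects (which the gt/lt/And methods suggest); the constant images are just my illustration. Python's and only checks the truthiness of the two objects on the client and returns one of them unchanged, so no per-pixel comparison ever happens, whereas .And() builds an expression the server evaluates per pixel.

import ee
ee.Initialize()  # assumes prior Earth Engine authentication

img1 = ee.Image(0.7)
img2 = ee.Image(1.0)

# Client-side: ee.Image objects are truthy, so Python's `and` simply
# returns the right-hand operand -- this is just img2.lt(1.2).
mask1 = img1.gt(0.5) and img2.lt(1.2)

# Server-side: builds a per-pixel logical AND of the two comparisons.
mask2 = img1.gt(0.5).And(img2.lt(1.2))

So mask1 and mask2 are generally not equal: mask1 ignores img1 entirely.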

Related

Transforming Python Classes to Spark Delta Rows

I am trying to transform an existing Python package to make it work with Structured Streaming in Spark.
The package is quite complex with multiple substeps, including:
Binary file parsing of metadata
Fourier Transformations of spectra
The intermediate and end results were previously stored in a SQL database using SQLAlchemy, but we need to move them to Delta.
After lots of investigation, I've made the first part work for the binary file parsing, but only by statically defining the column types in a UDF:
fileparser = F.udf(File()._parseBytes, FileDelta.getSchema())
where the _parseBytes() method takes a binary stream and outputs a dictionary of variables.
Now I'm trying to do this similarly for the spectrum generation:
spectrumparser = F.udf(lambda inputDict: vars(Spectrum(inputDict)), SpectrumDelta.getSchema())
However, the Spectrum() init method generates multiple pandas DataFrames as fields.
I'm getting errors as soon as the Executor nodes get to that part of the code.
Example error:
expected zero arguments for construction of ClassDict (for pandas.core.indexes.base._new_Index).
This happens when an unsupported/unregistered class is being unpickled that requires construction arguments.
Fix it by registering a custom IObjectConstructor for this class.
Overall, I feel like I'm spending way too much effort on building the Delta adaptation. Is there an easier way to make these work?
I read in [1] that we could switch to the pandas-on-Spark API, but to me that seems to be something to do within the package methods themselves. Is the solution perhaps to rewrite the entire package and its parsers to work natively in PySpark?
I also tried reproducing the above issue in a minimal example, but it's hard to reproduce since the package code is so complex.
After testing, it turns out that the problem lies in serialization when producing output (with the show(), display(), or save() methods).
The UDF expects an ArrayType(xxxType()), but gets a pandas.Series object and does not know how to unpickle it.
If you explicitly tell the UDF how to transform those values, the UDF works:
def getSpectrumDict(inputDict):
    # Build the Spectrum object, then convert every pandas field
    # into plain Python containers that Spark can serialize.
    spectrum = Spectrum(inputDict["filename"], inputDict["path"], dict_=inputDict)
    result = {}
    for key, value in vars(spectrum).items():
        if isinstance(value, pd.Series):
            result[key] = value.tolist()
        elif isinstance(value, pd.DataFrame):
            result[key] = value.to_dict("list")
        else:
            result[key] = value
    return result

spectrumparser = F.udf(getSpectrumDict, SpectrumDelta.getSchema())

What is the meaning of concatenation in the case of the cm operator in PDF?

I understand that "cm" concatenates two CTMs; however, it's not obvious to me what the specific definition of concatenation is. Reading through the "graphics state operators" section of the specification has not helped me.
Thus far I've looked at a whole bunch of different resources about matrix concatenation. There seem to be a number of different ways concatenation is defined for matrices. Some examples seem to show it as:
[1 2]        [5 6]   [1 2 5 6]
[3 4] concat [7 8] = [3 4 7 8]
... however that would seem to break the transform matrices, so I assume that's not it.
Another option is that they just mean matrix addition:
[1 2]   [5 6]   [ 6  8]
[3 4] + [7 8] = [10 12]
but I feel that, if it were just matrix addition, they would simply call it addition/matrix addition.
My last idea is:
[1 2]   [5 6]   [15 26]
[3 4] + [7 8] = [37 48]
but that seems like a bizarre approach, not least because it would have numbers behaving like text.
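For concreteness, here is a quick numpy sketch of those three candidate operations (numpy is just my way of illustrating them, not something the spec uses):

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Candidate 1: horizontal (block) concatenation
print(np.hstack([A, B]))   # [[1 2 5 6], [3 4 7 8]]

# Candidate 2: element-wise addition
print(A + B)               # [[ 6  8], [10 12]]

# Candidate 3: digit-wise "text" concatenation of corresponding entries
digits = [[int(f"{a}{b}") for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
print(np.array(digits))    # [[15 26], [37 48]]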
Thanks in advance

How to set the sentiment attribute on a Span?

I'm trying the Keras example from the spaCy documentation, but instead of summing the sentiment score on the Doc like this:
for sent, label in zip(sentences, ys):
    sent.doc.sentiment += label - 0.5
I would like to keep the score at the sentence level, like this:
for sent, label in zip(sentences, ys):
    sent.sentiment = float(label)
This code gives me this error:
AttributeError: attribute 'sentiment' of 'spacy.tokens.span.Span' objects is not writable
Is there a setter to call instead? I tried set_sentiment without success.
Am I missing something? Is it a bug?
You can find the implementation of Span.sentiment here. You can see it is indeed not writable, because it either looks up the value in self.doc.user_span_hooks, or takes the average of token.sentiment for the tokens in that span.
The sentiment of a Token is not context-dependent though. It uses the information present in the underlying Lexeme. That means that any word, such as "love", would have the same sentiment value in any sentence/context.
So there are two things you can do: either write to the sentiment of the lexemes, like so:
vocab["love"].sentiment = 3.0
Or implement a custom hook that allows you to define any function you want. You can do this on the span (doc.user_span_hooks) or token (doc.user_token_hooks) level:
doc.user_span_hooks["sentiment"] = lambda span: 10.0
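For example, here is a minimal sketch of the hook approach keeping one score per sentence. It assumes spaCy v2.x (where Span.sentiment consults doc.user_span_hooks, as the implementation linked above shows) and that the small English model is installed; the scores dict keyed by token offsets is my own convention, not a spaCy API:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I love this. I hate that.")

# Store one score per span, keyed by (start, end) token offsets,
# and let the hook look scores up, defaulting to 0.0.
scores = {}
doc.user_span_hooks["sentiment"] = lambda span: scores.get((span.start, span.end), 0.0)

for sent, label in zip(doc.sents, [1.0, 0.0]):
    scores[(sent.start, sent.end)] = float(label)

print([sent.sentiment for sent in doc.sents])  # [1.0, 0.0]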

Visualizing correlations for a specific class in the target

I have a dataset whose target variable has four classes (0, 1, 2, 3).
As we know, we can obtain the features most correlated with the target in pandas using this snippet:
# Find correlations with the target and sort
correlations = train.corr()['Target'].sort_values()
# Display correlations
print('Most Positive Correlations:\n', correlations.tail(15))
print('\nMost Negative Correlations:\n', correlations.head(15))
My question is: I need to obtain the features most correlated with a specific target class. For example, I want to find which features are most strongly correlated with target class 3. I have tried this:
correlations = train.corr()[(train['Target'] == 3)].sort_values()
but it gives this error:
IndexingError: Unalignable boolean Series provided as indexer (index of the
boolean Series and of the indexed object do not match).
My expected output is a ranking of the features most correlated with class 3.
You haven't given us anything to work with, but I'm assuming your problem is calling .corr() before masking. You need to call:
correlations = train[train['Target'] == 3].corr()['Target'].sort_values()
Edit:
A more elegant solution is probably groupby. Try something along the lines of:
train.groupby('Target').apply(lambda grp: grp.corr())
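Note that within any single-class subset the Target column is constant, so its correlation with anything is undefined (NaN); the groupby result is therefore most useful for feature-to-feature correlations within each class. A sketch of pulling out class 3's matrix from that result (dropping Target is my own addition):

# Within each class Target is constant, so drop it and keep the
# feature-feature correlation matrix per class.
per_class_corr = train.groupby('Target').apply(lambda grp: grp.drop(columns='Target').corr())
class3_corr = per_class_corr.loc[3]  # correlation matrix restricted to class 3
print(class3_corr)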

Pandas and Fuzzy Match

Currently I have two data frames. I am trying to get a fuzzy match of client names using fuzzywuzzy's process.extractOne function. When I run the following script on sample data I get good results and no errors, but when I run it on my current data frames I get both an AttributeError and a TypeError. I am not able to provide the data for security reasons, but if anyone can figure out why I am getting these errors based on the script provided, I would be much obliged.
names2 = list(dftr3['Common Name'])
names3 = dict(zip(names2, names2))

def get_fuzz_match(row):
    match = process.extractOne(row['CLIENT_NAME'], choices=names3.keys(), score_cutoff=80)
    if match:
        return names3[match[0]]
    return np.nan

dfmi4['Match Name'] = dfmi4.apply(get_fuzz_match, axis=1)
I know not having examples makes this more difficult to troubleshoot, so I will answer any questions and edit the post to help this process along. The specific errors are:
1. AttributeError: 'dict_keys' object has no attribute 'items'
2. TypeError: expected string or buffer
The AttributeError is straightforward and to be expected, I think. Fuzzywuzzy's process.extract function, which does most of the actual work in process.extractOne, uses a try: ... except: clause to determine whether to treat the choices parameter as dict-like or list-like. I think you are seeing both errors because the TypeError is raised inside the except: clause, so Python reports the original AttributeError alongside it.
The TypeError is trickier to pin down, but I suspect it occurs somewhere in the StringProcessor class, used in the processor module, again called by extract, which uses several string methods and doesn't catch exceptions. So it seems likely that your apply call is passing something that is not a string. Is it possible that you have any empty cells?
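If empty or non-string cells are indeed the cause, a minimal sketch of a guard, reusing names3 from the question (my own suggestion, not a confirmed fix), would be to skip anything that is not a usable string before calling fuzzywuzzy, and to pass the choices as a plain list:

import numpy as np
from fuzzywuzzy import process

def get_fuzz_match(row):
    name = row['CLIENT_NAME']
    # Skip NaN/None and anything else that is not a non-empty string.
    if not isinstance(name, str) or not name.strip():
        return np.nan
    # A plain list avoids the dict-like detection path in process.extract.
    match = process.extractOne(name, choices=list(names3), score_cutoff=80)
    if match:
        return names3[match[0]]
    return np.nan

dfmi4['Match Name'] = dfmi4.apply(get_fuzz_match, axis=1)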