spaCy's Dependency Parser

I was trying to play around with spaCy's dependency parser to extract aspects for Aspect-Based Sentiment Analysis.
I followed this link: https://remicnrd.github.io/Aspect-based-sentiment-analysis/
When I tried the following piece of code on my data, I got an error message.
import spacy
nlp = spacy.load('en')
dataset.review = dataset.review.str.lower()
aspect_terms = []
for review in nlp.pipe(dataset.review):
    chunks = [chunk.root.text for chunk in review.noun_chunks if chunk.root.pos_ == 'NOUN']
    aspect_terms.append(' '.join(chunks))
dataset['aspect_terms'] = aspect_terms
dataset.head(10)
The error message was:
TypeError: object of type 'NoneType' has no len()
The error was on this line:
for review in nlp.pipe(dataset.review):
Could someone please help me understand the issue here and how to resolve it? Thanks.

Writing the solution here in case it helps someone in the future.
I was getting the error because I had some empty rows in the review column.
I re-ran the code after removing the empty rows and rows with NaN values in the review column, and it worked fine :)
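For reference, a minimal sketch of that cleanup step, assuming dataset is the pandas DataFrame from the code above:
# Drop rows where 'review' is NaN, then drop rows that are empty after stripping whitespace
dataset = dataset.dropna(subset=['review'])
dataset = dataset[dataset['review'].str.strip() != '']
dataset = dataset.reset_index(drop=True)
With no None/NaN values left in the column, nlp.pipe(dataset.review) receives only strings and no longer raises the TypeError.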

Related

TFTransform ValueError: "Split does not exist over all example artifacts"

I'm attempting to construct a TFX pipeline, but keep running into an error during the TFTransform component step. After diving into the error message and its code on GitHub, it appears to have something to do with the function get_split_uris(). From what I can glean, there is a mismatch between the number of artifacts this function consumes at runtime and the number of URIs retrieved and matched back to that list.
It's odd because my CsvExampleGen component doesn't seem to have any problems ingesting my original data set, which is already split into two CSV files: 'target' and 'candidate'. I cannot find any documentation on this error on the TFX website, so my apologies for not having more information.
I can provide additional details if needed.
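For context, here is a minimal sketch of how a pre-split CSV input is usually declared for CsvExampleGen; the split names and file patterns are assumptions based on the description above, and the exact argument names vary between TFX versions:
from tfx.components import CsvExampleGen
from tfx.proto import example_gen_pb2

# Assumed layout: <data_root>/target/*.csv and <data_root>/candidate/*.csv
input_config = example_gen_pb2.Input(splits=[
    example_gen_pb2.Input.Split(name='target', pattern='target/*'),
    example_gen_pb2.Input.Split(name='candidate', pattern='candidate/*'),
])
example_gen = CsvExampleGen(input_base='data_root', input_config=input_config)
Downstream components look up Examples artifacts by split name, so custom names like 'target' and 'candidate' may need to be configured explicitly on components such as Transform, which otherwise assume the conventional 'train'/'eval' splits.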

CatBoost Error: bayesian bootstrap doesn't support taken fraction option

The error occurs when running GridSearchCV with CatBoostClassifier and bootstrap_type='Bayesian'. Any idea what that error means?
I had a similar issue while training my data with CatBoostClassifier. I had set the subsample parameter to 0.5. After removing this parameter, it worked for me.
Hope this is useful.
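For illustration, a minimal sketch of the working setup, assuming generic X, y training data and an arbitrary parameter grid:
from catboost import CatBoostClassifier
from sklearn.model_selection import GridSearchCV

# Bayesian bootstrapping draws per-object weights instead of sampling a fraction of rows,
# so it does not accept the 'subsample' (taken fraction) option.
model = CatBoostClassifier(bootstrap_type='Bayesian', verbose=0)

param_grid = {
    'depth': [4, 6],
    'learning_rate': [0.03, 0.1],
}
search = GridSearchCV(model, param_grid, cv=3)
# search.fit(X, y)  # X, y: your training features and labels
Passing subsample, either in the constructor or in the grid, together with bootstrap_type='Bayesian' reproduces the error above.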

TensorFlow in PyCharm value error: Failed to find data adapter that can handle input

I am following the TensorFlow Specialization on Coursera, where a certain piece of code works absolutely fine in Google Colab, but when I try to run it locally in PyCharm, it gives the following error:
Failed to find data adapter that can handle input
Any suggestions?
Can you share the code where the error occurred?
It should be available in the logs in your PyCharm console.
Looking at your comments, it seems that the model is expecting an array while you provided a list.
I was facing the same issue. It turned out the data was in the form of a list. I had to convert the fields into NumPy arrays like this:
import numpy as np

training_padded = np.array(training_padded)
training_labels = np.array(training_labels)
testing_padded = np.array(testing_padded)
testing_labels = np.array(testing_labels)
That's it!
Try it out and let me know if it works.
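For context, a small self-contained sketch of the same fix; the model and data here are stand-ins, not the Coursera exercise:
import numpy as np
import tensorflow as tf

# Stand-in padded sequences and labels, originally plain Python lists.
training_padded = np.array([[1, 2, 3, 0], [4, 5, 0, 0]])
training_labels = np.array([0, 1])

# Minimal stand-in model, just enough to exercise fit().
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10, output_dim=4),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(loss='binary_crossentropy', optimizer='adam')

# Fitting on NumPy arrays sidesteps the data adapter error that plain Python lists can trigger.
model.fit(training_padded, training_labels, epochs=1, verbose=0)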

Textacy - Vectorizer Weighting Error

I've recently found Textacy, and as I go through the API reference guide I'm running into an error for the Vectorizer. If I add any options from the API reference, I get a TypeError: unexpected keyword argument. I get this error for other options in addition to weighting.
I installed textacy using pip and I'm using Python3 on Ubuntu. Any help is appreciated. Thanks!
vectorizer = textacy.vsm.Vectorizer(weighting='tfidf')
TypeError: __init__() got an unexpected keyword argument 'weighting'
Ran into the same problem. The API documentation does not reflect the current Vectorizer keyword arguments. The Vectorizer now provides different keyword arguments to allow more control over how TF*IDF is applied.
vectorizer = textacy.Vectorizer(tf_type='linear', apply_idf=True, idf_type='smooth')
tf_type='linear' applies the standard term frequency (TF), and apply_idf=True applies the inverse document frequency (IDF). According to the comments in the repo, idf_type='smooth' adds one to each document frequency in order to avoid zero divisions.
To see more information about the options check the comment at line 182 in the repository here: https://github.com/chartbeat-labs/textacy/blob/master/textacy/vsm/vectorizers.py
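As a quick usage sketch (the tokenized documents here are made up for illustration):
import textacy.vsm

# Each document is a list of already-tokenized terms.
tokenized_docs = [
    ['aspect', 'based', 'sentiment', 'analysis'],
    ['vectorizer', 'weighting', 'error'],
]

vectorizer = textacy.vsm.Vectorizer(tf_type='linear', apply_idf=True, idf_type='smooth')
doc_term_matrix = vectorizer.fit_transform(tokenized_docs)
print(doc_term_matrix.shape)  # (number of documents, number of unique terms)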

OrientDB 2.2.4 Load Balancing

I am trying to set up a cluster for load balancing. I am using the Java Graph API. In the documentation there is this code:
final OrientGraphFactory factory = new OrientGraphFactory("remote:localhost/demo");
factory.setConnectionStrategy(OStorageRemote.CONNECTION_STRATEGY.ROUND_ROBIN_CONNECT);
OrientGraphNoTx graph = factory.getNoTx();
I copied and pasted the code exactly like this, and I get this compilation error:
"incompatible types: CONNECTION_STRATEGY cannot be converted to String"
The only relevant import I have is:
import com.orientechnologies.orient.client.remote.OStorageRemote;
Can you please help?
Has anyone tried this?
Thanks.
You could use
OStorageRemote.CONNECTION_STRATEGY.ROUND_ROBIN_CONNECT.toString()
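That is, pass the strategy as a String (the enum constant's name), since the compiler error shows that setConnectionStrategy expects a String in this version.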
Hope it helps