spacy.matcher.PhraseMatcher object has no attribute "remove" - spacy

I studied the spaCy docs at https://spacy.io/api/phrasematcher#remove and copied the example into Jupyter like this:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy
from spacy.matcher import PhraseMatcher
phraseMatcher = PhraseMatcher(nlp.vocab)
phraseMatcher.add("OBAMA", [nlp("Barack Obama")])
assert "OBAMA" in phraseMatcher
phraseMatcher.remove("OBAMA")
assert "OBAMA" not in phraseMatcher
But I got the error
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-25-a5e1de7c4514> in <module>
2 phraseMatcher.add("OBAMA", [nlp("Barack Obama")])
3 assert "OBAMA" in phraseMatcher
----> 4 phraseMatcher.remove("OBAMA")
5 assert "OBAMA" not in phraseMatcher
AttributeError: 'spacy.matcher.PhraseMatcher' object has no attribute 'remove'
I searched but couldn't find out why. Can anyone show me where my mistake is?

PhraseMatcher.remove was added in spacy v2.2 (https://v2.spacy.io/api/phrasematcher#remove). Maybe you are using an older version?
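To check whether your installation is affected, here is a small sketch that compares a spaCy version string against 2.2, the release the answer cites for PhraseMatcher.remove. The helper name phrasematcher_has_remove is hypothetical; in practice you would pass it spacy.__version__ from your own environment.

```python
import re


def phrasematcher_has_remove(version):
    """Return True if a spaCy version string is 2.2 or later.

    Hypothetical helper: PhraseMatcher.remove was added in spaCy v2.2,
    so any earlier version lacks the method.
    """
    # Only the leading "major.minor" part matters; suffixes like ".dev0" are ignored.
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (2, 2)


print(phrasematcher_has_remove("2.1.8"))  # False
print(phrasematcher_has_remove("2.2.0"))  # True
```

If it returns False for your spacy.__version__, upgrading spaCy (for example with pip install -U spacy) should make remove available. Pipelines such as en_core_web_sm are tied to a spaCy version, so they may need reinstalling as well.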

Related

import pandas error : Traceback (most recent call last) and expected string or bytes-like object

I installed ArcGIS Pro and use Jupyter, which I believe ships with the ArcGIS Pro package, to write Python code. Since yesterday, I get the following error when executing import pandas as pd:
TypeError Traceback (most recent call last)
C:\Users\AppData\Local\Temp\2/ipykernel_23172/4080736814.py in <module>
----> 1 import pandas as pd
C:\ArcGISPro28\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\__init__.py in <module>
# numpy compat
from pandas.compat import (
np_version_under1p18 as _np_version_under1p18,
is_numpy_dev as _is_numpy_dev,
C:\ArcGISPro28\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\compat\__init__.py in <module>
np_version_under1p20)
from pandas.compat.pyarrow import (
pa_version_under1p0,
pa_version_under2p0,
C:\ArcGISPro28\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\compat\pyarrow.py in <module>
_pa_version = pa.__version__
_palv = Version(_pa_version)
pa_version_under1p0 = _palv < Version("1.0.0")
pa_version_under2p0 = _palv < Version("2.0.0")
C:\ArcGISPro28\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\util\version\__init__.py in __init__(self, version)
# Validate the version and parse it into pieces
match = self._regex.search(version)
if not match:
raise InvalidVersion(f"Invalid version: '{version}'")
TypeError: expected string or bytes-like object
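The last frame of the traceback shows pandas passing pyarrow's __version__ into a regex search (self._regex.search(version)). Python's re module raises exactly this TypeError when it is given something that is not a string, so one plausible but unconfirmed explanation is that the installed pyarrow reports a non-string __version__, for example None. The sketch below reproduces the message with a simplified stand-in regex; the pattern and the function name reproduce_version_error are hypothetical, not pandas' real code.

```python
import re

# Simplified stand-in for the version regex in pandas.util.version (hypothetical pattern).
_VERSION_RE = re.compile(r"\d+(\.\d+)*")


def reproduce_version_error(version):
    """Call regex.search() the way Version.__init__ does; return the error text or None."""
    try:
        _VERSION_RE.search(version)
    except TypeError as exc:
        return str(exc)
    return None


# A normal version string is searched without error.
print(reproduce_version_error("8.0.0"))  # None
# A non-string value triggers the same TypeError message as in the traceback.
print(reproduce_version_error(None))
```

To test the hypothesis in the ArcGIS Pro environment, you could run import pyarrow as pa; print(type(pa.__version__), repr(pa.__version__)) and see whether it prints something other than a str.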

Why am I getting a method object not iterable error on an iterrows function?

I've gotten a bit of code to work, but when I make it iterate through my pandas DataFrame, it errors out. The code is supposed to open an MPO image file and save it as a JPEG. This works until I put the snippet inside an iterrows loop.
The error is as such:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-21-30128a738cdb> in <module>
----> 1 for i, row in mpo_list.iterrows:
2 im = Image.open(Path(row['location']))
3 im.save('D:\\2018_Formost\\2018-12\\Photos\\'+i, format = 'JPEG')
TypeError: 'method' object is not iterable
Code below:
import pandas as pd
from PIL import Image
from pathlib import Path
for i, row in mpo_list.iterrows:
    im = Image.open(Path(row['location']))
    im.save('D:\\2018_Formost\\2018-12\\Photos\\'+i, format = 'JPEG')
Can anyone spot what I'm doing wrong?
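Try this:
mpo_list.iterrows()
The parentheses are missing in your version. iterrows is a method, so without () the loop tries to iterate over the method object itself instead of the (index, row) pairs it returns.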
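A runnable sketch of the corrected loop, using a small hypothetical DataFrame in place of mpo_list and leaving out the PIL calls so it runs anywhere. It also converts the index to a string when building the output name: with a default integer index, '...'+i would raise a separate TypeError (an extra observation, not part of the original answer). The photos folder and .jpg extension are illustrative.

```python
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for mpo_list; only the 'location' column matters here.
mpo_list = pd.DataFrame({"location": ["a.mpo", "b.mpo"]})
out_dir = Path("photos")  # hypothetical output folder

# Without (), the for loop receives the bound method itself and fails.
error = None
try:
    for i, row in mpo_list.iterrows:
        pass
except TypeError as exc:
    error = str(exc)
print(error)  # 'method' object is not iterable

# With (), iterrows() yields (index, row) pairs.
targets = []
for i, row in mpo_list.iterrows():
    # The index is an int here, so format it into the file name instead of using +.
    targets.append((Path(row["location"]), out_dir / f"{i}.jpg"))
print(targets)
```

In the real loop, each pair would feed Image.open(source) and im.save(target, format='JPEG').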

Assertion error when making an MP4 video out of numpy arrays with OpenCV

I have this Python code that should make a video:
import cv2
import numpy as np
out = cv2.VideoWriter("/tmp/test.mp4",
cv2.VideoWriter_fourcc(*'MP4V'),
25,
(500, 500),
True)
data = np.zeros((500,500,3))
for i in xrange(500):
    out.write(data)
out.release()
I expect a black video but the code throws an assertion error:
$ python test.py
OpenCV(3.4.1) Error: Assertion failed (image->depth == 8) in writeFrame, file /io/opencv/modules/videoio/src/cap_ffmpeg.cpp, line 274
Traceback (most recent call last):
File "test.py", line 11, in <module>
out.write(data)
cv2.error: OpenCV(3.4.1) /io/opencv/modules/videoio/src/cap_ffmpeg.cpp:274: error: (-215) image->depth == 8 in function writeFrame
I tried various fourcc values but none seem to work.
According to @jeru-luke's and @dan-masek's comments:
import cv2
import numpy as np
out = cv2.VideoWriter("/tmp/test.mp4",
cv2.VideoWriter_fourcc(*'mp4v'),
25,
(1000, 500),
True)
data = np.transpose(np.zeros((1000, 500,3), np.uint8), (1,0,2))
for i in range(500):
    out.write(data)
out.release()
The problem is that you did not specify the data type of the elements when calling np.zeros. As the documentation states, NumPy uses float64 by default.
>>> import numpy as np
>>> np.zeros((500,500,3)).dtype
dtype('float64')
However, the VideoWriter implementation only supports 8 bit image depth (as the "(image->depth == 8)" part of the error message suggests).
The solution is simple: specify the appropriate data type, in this case uint8.
data = np.zeros((500,500,3), dtype=np.uint8)
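Building on both answers, a small NumPy-only helper can check the two requirements before each out.write call: 8-bit depth (from the assertion message) and a frame shape of (height, width, 3) matching the (width, height) frameSize passed to cv2.VideoWriter, which is why the first answer transposes its array. The name prepare_frame and the assumption that float frames hold values in [0, 1] are illustrative, not part of OpenCV.

```python
import numpy as np


def prepare_frame(frame, frame_size):
    """Return a uint8 (height, width, 3) frame for a writer opened with frame_size=(width, height).

    Assumption: floating-point input is in the range [0, 1] and is scaled to 0..255.
    """
    width, height = frame_size
    expected = (height, width, 3)
    if frame.shape != expected:
        raise ValueError(f"expected shape {expected}, got {frame.shape}")
    if frame.dtype == np.uint8:
        return frame
    if np.issubdtype(frame.dtype, np.floating):
        # Scale [0, 1] floats to 0..255, clipping anything outside that range.
        return (np.clip(frame, 0.0, 1.0) * 255).round().astype(np.uint8)
    # Other integer types: clip into the 8-bit range before converting.
    return np.clip(frame, 0, 255).astype(np.uint8)


# float64 zeros, as produced by np.zeros without a dtype, become a valid 8-bit frame.
frame = prepare_frame(np.zeros((500, 1000, 3)), (1000, 500))
print(frame.dtype, frame.shape)  # uint8 (500, 1000, 3)
```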

hstack csr matrix with pandas array

I am doing an exercise on Amazon reviews; below is the code.
Basically, I am not able to add a column (a pandas Series) to the CSR matrix I got after applying bag-of-words (BoW).
Even though both have the same number of rows, I can't get it to work.
import sqlite3
import pandas as pd
import numpy as np
import nltk
import string
import matplotlib.pyplot as plt
import seaborn as sns
import scipy
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn import metrics
from sklearn.metrics import roc_curve, auc
from nltk.stem.porter import PorterStemmer
from sklearn.manifold import TSNE
#Create Connection to sqlite3
con = sqlite3.connect('C:/Users/609316120/Desktop/Python/Amazon_Review_Exercise/database/database.sqlite')
filtered_data = pd.read_sql_query("""select * from Reviews where Score != 3""", con)
def partition(x):
    if x < 3:
        return 'negative'
    return 'positive'
actualScore = filtered_data['Score']
actualScore.head()
positiveNegative = actualScore.map(partition)
positiveNegative.head(10)
filtered_data['Score'] = positiveNegative
filtered_data.head(1)
filtered_data.shape
display = pd.read_sql_query("""select * from Reviews where Score !=3 and Userid="AR5J8UI46CURR" ORDER BY PRODUCTID""", con)
sorted_data = filtered_data.sort_values('ProductId', axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
final=sorted_data.drop_duplicates(subset={"UserId","ProfileName","Time","Text"}, keep='first', inplace=False)
final.shape
display = pd.read_sql_query(""" select * from reviews where score != 3 and id=44737 or id = 64422 order by productid""", con)
final=final[final.HelpfulnessNumerator<=final.HelpfulnessDenominator]
final['Score'].value_counts()
count_vect = CountVectorizer()
final_counts = count_vect.fit_transform(final['Text'].values)
final_counts.shape
type(final_counts)
positive_negative = final['Score']
#Below is giving error
final_counts = hstack((final_counts,positive_negative))
sparse.hstack combines the coo format matrices of the inputs into a new coo format matrix.
final_counts is a csr matrix, so the sparse.coo_matrix(final_counts) conversion is trivial.
positive_negative is a column of a DataFrame. Look at
sparse.coo_matrix(positive_negative)
It is probably a (1, n) sparse matrix. But to combine it with final_counts it needs to be (n, 1) shaped.
Try creating the sparse matrix, and transposing it:
sparse.hstack((final_counts, sparse.coo_matrix(positive_negative).T))
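To see the shape issue the answer describes, here is a small sketch with a hypothetical numeric column of three values: sparse.coo_matrix turns a 1-D array into a single row, and .T turns that into the (n, 1) column that hstack needs. The 3x2 counts matrix is a made-up stand-in for final_counts.

```python
import numpy as np
from scipy import sparse

# Hypothetical numeric column with n = 3 rows.
col = np.array([1, 0, 1])

row_form = sparse.coo_matrix(col)  # a 1-D input becomes a single (1, n) row
col_form = row_form.T              # transpose to get an (n, 1) column
print(row_form.shape, col_form.shape)  # (1, 3) (3, 1)

# Made-up 3x2 count matrix standing in for final_counts.
counts = sparse.csr_matrix(np.eye(3, 2, dtype=np.int64))
merged = sparse.hstack((counts, col_form))
print(merged.shape)  # (3, 3)
```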
I used the code below but I'm still getting an error:
merged_data = scipy.sparse.hstack((final_counts, scipy.sparse.coo_matrix(positive_negative).T))
Below is the error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'sparse' is not defined
>>> merged_data = scipy.sparse.hstack((final_counts, sparse.coo_matrix(positive_negative).T))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'sparse' is not defined
>>> merged_data = scipy.sparse.hstack((final_counts, scipy.sparse.coo_matrix(positive_negative).T))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python34\lib\site-packages\scipy\sparse\construct.py", line 464, in hstack
return bmat([blocks], format=format, dtype=dtype)
File "C:\Python34\lib\site-packages\scipy\sparse\construct.py", line 600, in bmat
dtype = upcast(*all_dtypes) if all_dtypes else None
File "C:\Python34\lib\site-packages\scipy\sparse\sputils.py", line 52, in upcast
raise TypeError('no supported conversion for types: %r' % (args,))
TypeError: no supported conversion for types: (dtype('int64'), dtype('O'))
I was facing the same issue with sparse matrices. You can convert the CSR matrix to a dense one with todense() and then use np.hstack((dataframe.values, converted_dense_matrix)); that works. You can't combine sparse matrices with numpy.hstack.
However, for a very large dataset, converting to a dense matrix is not a good idea. In your case scipy's hstack fails because the two inputs have different data types (int64 and object).
Try positive_negative = final['Score'].values and pass that to scipy.sparse.hstack. If it doesn't work, can you post the output of positive_negative.dtype?
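The dtype('O') in the last error most likely comes from partition() in the question: it replaced Score with the strings 'positive' and 'negative', and sparse matrices cannot store strings, so taking .values alone may still fail. One way around this, assuming you want the label as a 0/1 numeric column, is to encode it before stacking. The small matrices below are hypothetical stand-ins for final_counts and final['Score'], and the 1/0 mapping is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical stand-ins: a 3x4 bag-of-words count matrix and the string labels
# that partition() produces in the question.
final_counts = sparse.csr_matrix(np.array([[1, 0, 2, 0],
                                           [0, 1, 0, 0],
                                           [3, 0, 0, 1]], dtype=np.int64))
positive_negative = pd.Series(["positive", "negative", "positive"])

# Encode the string labels as integers (assumption: 1 = positive, 0 = negative).
labels = positive_negative.map({"positive": 1, "negative": 0}).to_numpy(dtype=np.int64)

# Reshape the 1-D labels into an (n, 1) sparse column so the row counts line up.
label_col = sparse.csr_matrix(labels.reshape(-1, 1))

merged = sparse.hstack((final_counts, label_col), format="csr")
print(merged.shape)  # (3, 5)
```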

win32com.client error

When using win32com, something puzzled me:
>>> import win32com
>>> w=win32com.client.Dispatch('Word.Application')
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
w=win32com.client.Dispatch('Word.Application')
AttributeError: 'module' object has no attribute 'client'
What's wrong?
win32com.client is a submodule of the win32com package. Importing the package does not automatically import its submodules, so you need to import win32com.client explicitly:
import win32com.client
w = win32com.client.Dispatch('Word.Application')
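win32com only exists on Windows, so the sketch below reproduces the same behaviour with a throwaway package created in a temporary directory. The names demo_pkg, client and Dispatch are hypothetical stand-ins for win32com, win32com.client and win32com.client.Dispatch; the point is that import package alone does not make package.submodule available when the package's __init__.py does not import it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package "demo_pkg" with a submodule "client",
# mirroring the win32com / win32com.client layout (hypothetical names).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")  # the package does not import its submodules
(pkg / "client.py").write_text("def Dispatch(name):\n    return 'dispatched ' + name\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()  # make sure the import system sees the new files

import demo_pkg

before = hasattr(demo_pkg, "client")
print(before)  # False: only the package itself was imported

import demo_pkg.client  # importing the submodule also binds it as an attribute of the package

after = hasattr(demo_pkg, "client")
print(after)  # True
print(demo_pkg.client.Dispatch("Word.Application"))  # dispatched Word.Application
```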