Textacy - Vectorizer Weighting Error - textacy

I've recently found Textacy and as i go through the API reference guide I'm running into an error for the Vectorizer. If i add any options from the API reference I get a TypeError: unexpected keyword argument. I get this error for other options in addition to weighting.
I installed textacy using pip and I'm using Python3 on Ubuntu. Any help is appreciated. Thanks!
vectorizer = textacy.vsm.Vectorizer(weighting='tfidf')
TypeError: __init__() got an unexpected keyword argument 'weighting'

Ran into the same problem. The API documentation does not reflect the current Vectorizer keyword arguments. The Vectorizer now provides different keyword arguments to allow more control over how TF*IDF is applied.
vectorizer = textacy.Vectorizer(tf_type='linear', apply_idf=True, idf_type='smooth')
tf_type applies standard term frequency (TF), apply_idf=True applies the inverse document frequency (IDF). From the repo comments, idf_type='smooth' adds one to each document frequency in order to avoid zero divisions.
To see more information about the options check the comment at line 182 in the repository here: https://github.com/chartbeat-labs/textacy/blob/master/textacy/vsm/vectorizers.py

Related

ValueError: Unexpected option 'height' for Points type when using the 'matplotlib' extension. No similar options found

I am trying to do this project in the link. But various errors first concerning the library installation to now this value error. I am writing the code as how it is shown in this link. But unable to run.
First error:
Unexpected option 'height' for Points type when using the 'matplotlib' extension. No similar options found.
Second error:
ValueError: zero-size array to reduction operation minimum which has no identity
Link: https://towardsdatascience.com/interactive-geospatial-data-visualization-with-geoviews-in-python-7d5335c8efd1
[![
zero-size array to reduction operation minimum which has no identity
https://i.stack.imgur.com/4I6nc.png)](https://i.stack.imgur.com/4I6nc.png)
[![
Unexpected option 'height' for Points type when using the 'matplotlib' extension. No similar options found.
](https://i.stack.imgur.com/kJrEK.png)](https://i.stack.imgur.com/kJrEK.png)
I have tried writing same code in the link which is provided. I even tried using a different dataset but everything seemed to be not working. These codes are important while doing a GIS problem.

TFTransform ValueError: "Split does not exist over all example artifacts"

I'm attempting to construct a TFX pipeline, but keep running into an error during the TFTransform component stem. After diving into the error message and its code on GitHub, it appears to have something to do with a function def get_split_uris(). From what I can glean, there is a mismatch between the number of Artifacts being consumed by this function during runtime and the number of URIs being retrieved and being matched back to this list.
It's odd because my CSVExampleGen() function doesn't seem to have any problems ingesting my original data set that's already split into two CSV files: 'target' and 'candidate'. I cannot find any documentation on this error on the TFX website so my apologies for not having more information.
I can provide additional details if needed.

CatBoost Error: bayesian bootstrap doesn't support taken fraction option

The error occur when running gridsearchcv using catboostclassifier and bootstrap_type='Bayesian'. Any idea what that error means?
I had similar issue while training my data with CatBoostClassifier. I had set the "subsample" parameter to "0.5". After removing this parameter, it worked for me.
More details at: enter link description here
Hope this is useful.

Why is the plotting-functions of package "ControlSystems" in Julia giving me a "UndefVarError: subplot not defined"?

I use Julia 0.4.3 and I have updated all packages using Pkg.update().
From the "ControlSystems" documentation, it is explicitly stated that plotting requires extra care in that the user is free to choose plotting back-end (I guess back-end means which plotting-package is used by "ControlSystems"). I have installed and like to use pyplot - hence I try the following code:
using ControlSystems
Plots.pyplot()
s = tf("s");
G = 1/(s+1);
stepplot(G);
Gives the error-message
ERROR: UndefVarError: subplot not defined
in stepplot at C:\folder\.julia\v0.4\ControlSystems\src\plotting.jl81
in stepplot at C:\folder\.julia\v0.4\ControlSystems\src\plotting.jl103
I have also tried the same code without the "Plots.pyplot()"-command.

Nonetype Error on Python Google Search Script - Is this a spam prevention tactic?

Fairly new to Python so apologies if this is a simple ask. I have browsed other answered questions but can't seem to get it functioning consistently.
I found the below script which prints the top result from google for a set of defined terms. It will work the first few times that I run it but will display the following error when I have searched 20 or so terms:
Traceback (most recent call last):
File "term2url.py", line 28, in <module>
results = json['responseData']['results']
TypeError: 'NoneType' object has no attribute '__getitem__'
From what I can gather, this indicates that one of the attributes does not have a defined value (potentially a result of google blocking me?). I attempted to solve the issue by adding in the else clause though I still run into the same problem.
Any help would be greatly appreciated; I have pasted the full code below.
Thanks!
#
# This is a quick and dirty script to pull the most likely url and description
# for a list of terms. Here's how you use it:
#
# python term2url.py < {a txt file with a list of terms} > {a tab delimited file of results}
#
# You'll must install the simpljson module to use it
#
import urllib
import urllib2
import simplejson
import sys
# Read the terms we want to convert into URL from info redirected from the command line
terms = sys.stdin.readlines()
for term in terms:
# Define the query to pass to Google Search API
query = urllib.urlencode({'q' : term.rstrip("\n")})
url = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s" % (query)
# Fetch the results and convert to JSON format
search_results = urllib2.urlopen(url)
json = simplejson.loads(search_results.read())
# Process the results by pulling the first record, which has the best match
results = json['responseData']['results']
for r in results[:1]:
if results is not None:
url = r['url']
desc = r['content'].encode('ascii', 'replace')
else:
url = "none"
desc = "none"
# Print the results to stdout. Use redirect to capture the output
print "%s\t%s" % (term.rstrip("\n"), url)
import time
time.sleep(1)
Here are some Python details for you first:
None is a valid object in Python, of the type NoneType:
print(type(None))
Produces:
< class 'NoneType' >
And the no attribute error you got is normal when you try to access some method or attribute of an object that doesn't have that attribute. In this case, you were attempting to use the __getitem__ syntax (object[item_index]), which NoneType objects don't support because it doesn't have the __getitem__ method.
The point of the previous explanation is that your assumption about what your error means is correct: your results object is essentially empty.
As for why you're hitting this in the first place, I believe you are running up against Google's API limits. It looks like you're using the old API that is now deprecated. The number of search results (not queries) used to be limited to around 64 per query, and there used to be no rate or per-day limit. However, since it's been deprecated for over 5 years now, there may be new undocumented limits.
I don't think it necessarily has anything to do with SPAM, but I do believe it is an undocumented limit.