NoneType Error on Python Google Search Script - Is this a spam prevention tactic?

Fairly new to Python so apologies if this is a simple ask. I have browsed other answered questions but can't seem to get it functioning consistently.
I found the script below, which prints the top Google result for a set of defined terms. It works the first few times that I run it but displays the following error after I have searched 20 or so terms:
Traceback (most recent call last):
File "term2url.py", line 28, in <module>
results = json['responseData']['results']
TypeError: 'NoneType' object has no attribute '__getitem__'
From what I can gather, this indicates that one of the attributes does not have a defined value (potentially a result of Google blocking me?). I attempted to solve the issue by adding in the else clause, though I still run into the same problem.
Any help would be greatly appreciated; I have pasted the full code below.
Thanks!
#
# This is a quick and dirty script to pull the most likely url and description
# for a list of terms. Here's how you use it:
#
# python term2url.py < {a txt file with a list of terms} > {a tab delimited file of results}
#
# You must install the simplejson module to use it
#
import urllib
import urllib2
import simplejson
import sys
# Read the terms we want to convert into URLs from info redirected from the command line
terms = sys.stdin.readlines()

for term in terms:
    # Define the query to pass to the Google Search API
    query = urllib.urlencode({'q': term.rstrip("\n")})
    url = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s" % (query)

    # Fetch the results and convert to JSON format
    search_results = urllib2.urlopen(url)
    json = simplejson.loads(search_results.read())

    # Process the results by pulling the first record, which has the best match
    results = json['responseData']['results']
    for r in results[:1]:
        if results is not None:
            url = r['url']
            desc = r['content'].encode('ascii', 'replace')
        else:
            url = "none"
            desc = "none"

    # Print the results to stdout. Use redirect to capture the output
    print "%s\t%s" % (term.rstrip("\n"), url)

    import time
    time.sleep(1)

Here are some Python details for you first:
None is a valid object in Python, of the type NoneType:
print(type(None))
Produces:
<class 'NoneType'>
And the "no attribute" error you got is normal when you try to access a method or attribute that an object doesn't have. In this case, you were attempting to use the __getitem__ syntax (object[item_index]), which NoneType objects don't support because NoneType doesn't define a __getitem__ method.
The point of the previous explanation is that your assumption about what your error means is correct: the object you are indexing into (json['responseData']) is None, so there are no results to pull.
As for why you're hitting this in the first place, I believe you are running up against Google's API limits. It looks like you're using the old API that is now deprecated. The number of search results (not queries) used to be limited to around 64 per query, and there used to be no rate or per-day limit. However, since it's been deprecated for over 5 years now, there may be new undocumented limits.
I don't think it necessarily has anything to do with SPAM, but I do believe it is an undocumented limit.
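Until you move to a newer API, you can at least make the script degrade gracefully instead of crashing. Here is a minimal sketch of how the middle of your loop could be rewritten (it assumes the same response shape, with a top-level responseData key; the "none" fallback strings are just illustrative):
    # Fetch the results and convert to JSON format
    search_results = urllib2.urlopen(url)
    json = simplejson.loads(search_results.read())

    # responseData can come back as None (e.g. when Google throttles you),
    # so check it before indexing into it
    response_data = json.get('responseData')
    if response_data and response_data.get('results'):
        r = response_data['results'][0]
        url = r['url']
        desc = r['content'].encode('ascii', 'replace')
    else:
        url = "none"
        desc = "none"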

Related

Intermittent authentication error when posting to a pubsub topic

We have a data pipeline built in Google Cloud Dataflow that consumes messages from a pubsub topic and streams them into BigQuery. In order to test that it works successfully, we have some tests that run in a CI pipeline; these tests post messages onto the pubsub topic and verify that the messages are written to BigQuery successfully.
This is the code that posts to the pubsub topic:
import json
import time

from google.cloud import pubsub_v1

def post_messages(project_id, topic_id, rows):
    futures = dict()
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(
        project_id, topic_id
    )

    def get_callback(f, data):
        def callback(f):
            try:
                futures.pop(data)
            except:
                print("Please handle {} for {}.".format(f.exception(), data))
        return callback

    for row in rows:
        # When you publish a message, the client returns a future. Data must be a bytestring
        # ...
        # construct a message in var json_data
        # ...
        message = json.dumps(json_data).encode("utf-8")
        future = publisher.publish(
            topic_path,
            message
        )
        futures_key = str(message)
        futures[futures_key] = future
        future.add_done_callback(get_callback(future, futures_key))

    # Wait for all the publish futures to resolve before exiting.
    while futures:
        time.sleep(1)
When we run this test in our CI pipeline it has started failing intermittently with error
21:38:55: AuthMetadataPluginCallback "<google.auth.transport.grpc.AuthMetadataPlugin object at 0x7f5247407220>" raised exception!
Traceback (most recent call last):
File "/opt/conda/envs/py3/lib/python3.8/site-packages/grpc/_plugin_wrapping.py", line 89, in __call__
self._metadata_plugin(
File "/opt/conda/envs/py3/lib/python3.8/site-packages/google/auth/transport/grpc.py", line 101, in __call__
callback(self._get_authorization_headers(context), None)
File "/opt/conda/envs/py3/lib/python3.8/site-packages/google/auth/transport/grpc.py", line 87, in _get_authorization_headers
self._credentials.before_request(
File "/opt/conda/envs/py3/lib/python3.8/site-packages/google/auth/credentials.py", line 134, in before_request
self.apply(headers)
File "/opt/conda/envs/py3/lib/python3.8/site-packages/google/auth/credentials.py", line 110, in apply
_helpers.from_bytes(token or self.token)
File "/opt/conda/envs/py3/lib/python3.8/site-packages/google/auth/_helpers.py", line 130, in from_bytes
raise ValueError("{0!r} could not be converted to unicode".format(value))
ValueError: None could not be converted to unicode
Error: The operation was canceled.
Unfortunately this only fails in our CI pipeline, and even then it fails intermittently (only on a small percentage of all CI pipeline runs). If I run the same test locally it succeeds every time. When running in the CI pipeline the code authenticates as a service account, whereas when I run it locally it authenticates as myself.
I know from the error message that it is failing on this code:
if isinstance(result, six.text_type):
    return result
else:
    raise ValueError("{0!r} could not be converted to unicode".format(value))
https://github.com/googleapis/google-auth-library-python/blob/3c3fbf40b07e090f2be7fac5b304dbf438b5cd6c/google/auth/_helpers.py#L127-L130
which is in a python library from google that we install using pip.
Clearly the expression:
isinstance(result, six.text_type)
is evaluating to False. I put a breakpoint on that code when I ran it locally and discovered that under normal circumstances (i.e. when it works) the value of result is a string that looks like some sort of auth token (value omitted here).
Given the error message:
ValueError: None could not be converted to unicode
it seems that whatever action the Google authentication libraries are undertaking, they are passing None through to the code shown above.
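To make the failure mode concrete, here is a simplified repro of that check (adapted from the library snippet above; it is not the library's actual code, and it assumes six is installed):
import six

def from_bytes(value):
    # Mirrors the shape of google.auth._helpers.from_bytes: decode bytes, otherwise pass through
    result = value.decode("utf-8") if isinstance(value, bytes) else value
    if isinstance(result, six.text_type):
        return result
    else:
        raise ValueError("{0!r} could not be converted to unicode".format(value))

from_bytes("some-auth-token")   # returns the string unchanged
from_bytes(None)                # ValueError: None could not be converted to unicode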
I am at the bounds of my knowledge here. Given this is only failing in a CI pipeline I don't have the opportunity to put a breakpoint in my code and debug it. Given the call stack in the error message this is something to do with authentication.
I'm hoping someone can advise on a course of action.
Can anyone explain a means by which I can discover why None is being passed through to the code that is raising an error?
We had the same error. We finally solved it by using a JSON Web Token for authentication, per Google's Quickstart. Like so:
import json

from google.cloud import pubsub_v1
from google.auth import jwt

def post_messages(credentials_path, topic, list_of_message_dicts):
    credentials_dict = json.load(open(credentials_path, 'r'))
    audience = "https://pubsub.googleapis.com/google.pubsub.v1.Publisher"
    credentials_ob = jwt.Credentials.from_service_account_info(
        credentials_dict, audience=audience
    )
    publisher = pubsub_v1.PublisherClient(credentials=credentials_ob)
    for message_dict in list_of_message_dicts:
        message = json.dumps(message_dict, default=str).encode("utf-8")
        future = publisher.publish(topic, message)
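For reference, calling it might look like this (the key-file path, topic path, and message contents below are placeholders, not values from the original post):
post_messages(
    "/path/to/service-account-key.json",     # placeholder key file
    "projects/my-project/topics/my-topic",   # publish() expects the full topic path
    [{"id": 1, "payload": "hello"}],
)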
We also updated our environment but it didn't fix the ValueError until we changed to jwt. Here's the environment in any case:
google-api-core==2.4.0
google-api-python-client==2.36.0
google-auth==2.3.2
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.4.6
google-cloud-core==2.1.0
google-cloud-pubsub==2.9.0
I tried the jwt solution above and, though it solved the issue, it drastically degraded my write throughput.
Here is another workaround that solved this issue for me.
My GOOGLE_APPLICATION_CREDENTIALS env var was set to the location of my key-file. Instead, unset that env variable and, at the start of your process, run
gcloud auth activate-service-account {account_name} --key-file {location_of_key_file}
This lets google auth skip the key file and use the default service account that is now set up (which is the original, intended service account). Works with normal throughput and zero errors. :)

Get Gurobi IIS using Pyomo and gurobipy

Could someone walk me through the steps to get the IIS from a Pyomo model using gurobipy?
opt = SolverFactory('gurobi',solver_io='python')
As a reference, this is what I use in JuMP
function getIIS(m::JuMP.Model)
    grb_model = m.internalModel.inner
    num_constrs = Gurobi.num_constrs(grb_model)
    Gurobi.computeIIS(grb_model)
    iis_constrs = Gurobi.get_intattrarray(grb_model, "IISConstr", 1, num_constrs)
    m.linconstr[find(iis_constrs)]
end
So, basically I need access to the internal gurobi model to run the computeIIS function, and then I need a way to map the array of rows to the actual Pyomo constraints.
thanks!
You can pass this as an option to Gurobi when you call the solve function by using options_string. Then, Gurobi's Model.write() function will write the file. In this case, you would write a .ilp file, but other file formats exist for different purposes. An example:
solver_parameters = "ResultFile=model.ilp" # write an ILP file to print the IIS
Then, you would add options_string when you call the solve function:
results = solver.solve(instance, options_string=solver_parameters)
You can also string multiple options together with the following syntax. Note the leading empty space within the quotes for the LogToConsole and ResultFile options:
solver_parameters = "TimeLimit=60" # set time limit (seconds)
solver_parameters += " LogToConsole=0" # 0 = turn off console output
solver_parameters += " ResultFile=model.ilp" # write a MIP start file to warm start
The documentation found here applies to solving a model with Gurobi, and the examples at the bottom work with any Gurobi file format:
http://www.gurobi.com/documentation/8.0/refman/solving_a_model2.html
Finally, this link explains the different file formats that Gurobi can write: http://www.gurobi.com/documentation/8.0/refman/model_file_formats.html#sec:FileFormats
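Putting these pieces together, a minimal end-to-end sketch might look like the following (the tiny model here is made up just to be infeasible so that Gurobi has an IIS to report; the exact handling of options_string may vary with Pyomo/Gurobi versions):
from pyomo.environ import ConcreteModel, Var, Constraint, NonNegativeReals
from pyomo.opt import SolverFactory

# A deliberately infeasible toy model
instance = ConcreteModel()
instance.x = Var(domain=NonNegativeReals)
instance.c1 = Constraint(expr=instance.x >= 2)
instance.c2 = Constraint(expr=instance.x <= 1)

solver = SolverFactory('gurobi', solver_io='python')
solver_parameters = "ResultFile=model.ilp"  # ask Gurobi to compute the IIS and write it to model.ilp
results = solver.solve(instance, options_string=solver_parameters)
# After the infeasible solve, model.ilp lists the constraints and bounds that form the IIS.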
See this Pyomo example using suffixes. I believe it's doing what you want.
https://github.com/Pyomo/pyomo/blob/master/examples/pyomo/suffixes/gurobi_ampl_iis.py

Textacy - Vectorizer Weighting Error

I've recently found Textacy, and as I go through the API reference guide I'm running into an error with the Vectorizer. If I add any options from the API reference I get a TypeError: unexpected keyword argument. I get this error for other options in addition to weighting.
I installed textacy using pip and I'm using Python3 on Ubuntu. Any help is appreciated. Thanks!
vectorizer = textacy.vsm.Vectorizer(weighting='tfidf')
TypeError: __init__() got an unexpected keyword argument 'weighting'
Ran into the same problem. The API documentation does not reflect the current Vectorizer keyword arguments. The Vectorizer now provides different keyword arguments to allow more control over how TF*IDF is applied.
vectorizer = textacy.Vectorizer(tf_type='linear', apply_idf=True, idf_type='smooth')
tf_type='linear' applies a standard term frequency (TF); apply_idf=True applies inverse document frequency (IDF) weighting. According to the comments in the repo, idf_type='smooth' adds one to each document frequency in order to avoid zero divisions.
To see more information about the options check the comment at line 182 in the repository here: https://github.com/chartbeat-labs/textacy/blob/master/textacy/vsm/vectorizers.py
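For context, a minimal end-to-end sketch (the toy tokenized documents here are made up; real usage would tokenize documents with spaCy/textacy first):
from textacy.vsm import Vectorizer

# Toy tokenized documents
tokenized_docs = [
    ["cat", "sat", "on", "the", "mat"],
    ["dog", "sat", "on", "the", "log"],
]

vectorizer = Vectorizer(tf_type='linear', apply_idf=True, idf_type='smooth')
doc_term_matrix = vectorizer.fit_transform(tokenized_docs)
print(doc_term_matrix.shape)  # (number of docs, number of unique terms)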

Telegram bot: a dictionary of available commands

How do I organize a dictionary of available commands for a Telegram bot? How do good programmers do it? I know that writing dozens of if statements is a bad idea, and so is a big switch statement.
For now it's implemented using switch:
The bot receives a command
Finds it in a switch
Processes the command
Sends the response to the user
But when there are dozens of commands, the switch statement becomes hard to maintain. What is the common way to solve this problem?
I'm not a Python coder, but it seems your problem should be solved with an associative array data structure, regardless of the language you use. The actual name of the structure varies from language to language: for example, in C++ it is called a map, and in Python it is... a dictionary! In fact, you used the relevant keyword several times in your question yourself (even in the original language).
Bearing the above in mind, a sketch of your program may look like this:
#!/usr/bin/python

# Command processing functions:
def func1():
    return "Response 1"

def func2():
    return "Response 2"

# Commands dictionary:
d = {"cmd1": func1, "cmd2": func2}

# Suppose this command was received by the bot:
command_received = "cmd1"

# Processing: look the command up and call its handler
try:
    response = d[command_received]()
except KeyError:
    response = "Unknown command"

# Sending response:
print response
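A common variation on the same idea, once the number of commands grows, is to let each handler register itself in the dictionary via a decorator (a sketch only, not tied to any particular bot library; all names here are made up):
commands = {}

def command(name):
    """Register a handler function under the given command name."""
    def register(func):
        commands[name] = func
        return func
    return register

@command("start")
def start():
    return "Welcome!"

@command("help")
def show_help():
    return "Available commands: " + ", ".join(sorted(commands))

def handle(command_received):
    # Unknown commands fall back to a default response
    handler = commands.get(command_received)
    return handler() if handler else "Unknown command"

print(handle("help"))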

Flask + SQLAlchemy + pytest - not rolling back my session

There are several similar questions on stack overflow, and I apologize in advance if I'm breaking etiquette by asking another one, but I just cannot seem to come up with the proper set of incantations to make this work.
I'm trying to use Flask + Flask-SQLAlchemy and then use pytest to manage the session such that when the function-scoped pytest fixture is torn down, the current transaction is rolled back.
Some of the other questions seem to advocate using the db "drop all and create all" pytest fixture at the function scope, but I'm trying to use the joined session, and use rollbacks, since I have a LOT of tests. This would speed it up considerably.
http://alexmic.net/flask-sqlalchemy-pytest/ is where I found the original idea, and Isolating py.test DB sessions in Flask-SQLAlchemy is one of the questions recommending using function-level db re-creation.
I had also seen https://github.com/mitsuhiko/flask-sqlalchemy/pull/249 , but that appears to have been released with flask-sqlalchemy 2.1 (which I am using).
My current (very small, hopefully immediately understandable) repo is here:
https://github.com/hoopes/flask-pytest-example
There are two print statements - the first (in example/__init__.py) should have an Account object, and the second (in test/conftest.py) is where I expect the db to be cleared out after the transaction is rolled back.
If you pip install -r requirements.txt and run py.test -s from the test directory, you should see the two print statements.
I'm about at the end of my rope here - there must be something I'm missing, but for the life of me, I just can't seem to find it.
Help me, SO, you're my only hope!
You might want to give pytest-flask-sqlalchemy-transactions a try. It's a plugin that exposes a db_session fixture that accomplishes what you're looking for: it allows you to run database updates that will get rolled back when the test exits. The plugin is based on Alex Michael's blog post, with some additional support for nested transactions that covers a wider array of use cases. There are also some configuration options for mocking out connectables in your app so you can run arbitrary methods from your codebase, too.
For test_accounts.py, you could do something like this:
from example import db, Account

class TestAccounts(object):

    def test_update_view(self, db_session):
        test_acct = Account(username='abc')
        db_session.add(test_acct)
        db_session.commit()

        resp = self.client.post('/update',
                                data={'a': 1},
                                content_type='application/json')
        assert resp.status_code == 200
The plugin needs access to your database through a _db fixture, but since you already have a db fixture defined in conftest.py, you can set up database access easily:
@pytest.fixture(scope='session')
def _db(db):
    return db
You can find details on setup and installation in the docs. Hope this helps!
I'm also having issues with the rollback; my code can be found here.
After reading some documentation, it seems the begin() function should be called on the session.
So in your case I would update the session fixture to this:
@pytest.yield_fixture(scope='function', autouse=True)
def session(db, request):
    """Creates a new database session for a test."""
    db.session.begin()
    yield db.session
    db.session.rollback()
    db.session.remove()
I didn't test this code, but when I try it on my code I get the following error:
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR> File "./venv/lib/python2.7/site-packages/_pytest/main.py", line 90, in wrap_session
INTERNALERROR> session.exitstatus = doit(config, session) or 0
...
INTERNALERROR> File "./venv/lib/python2.7/site-packages/_pytest/python.py", line 59, in filter_traceback
INTERNALERROR> return entry.path != cutdir1 and not entry.path.relto(cutdir2)
INTERNALERROR> AttributeError: 'str' object has no attribute 'relto'
from sqlalchemy.orm import sessionmaker
from sqlalchemy import create_engine
from unittest import TestCase

# global application scope. create Session class, engine
Session = sessionmaker()

engine = create_engine('postgresql://...')

class SomeTest(TestCase):
    def setUp(self):
        # connect to the database
        self.connection = engine.connect()

        # begin a non-ORM transaction
        self.trans = self.connection.begin()

        # bind an individual Session to the connection
        self.session = Session(bind=self.connection)

    def test_something(self):
        # use the session in tests.
        self.session.add(Foo())
        self.session.commit()

    def tearDown(self):
        self.session.close()

        # rollback - everything that happened with the
        # Session above (including calls to commit())
        # is rolled back.
        self.trans.rollback()

        # return connection to the Engine
        self.connection.close()
The SQLAlchemy documentation has a solution for this case; the code above follows its recipe for joining a session into an external transaction.
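If you prefer to stay with pytest fixtures rather than unittest's setUp/tearDown, a rough translation of the same recipe might look like this (a sketch only; engine and Session are assumed to be configured as in the snippet above):
import pytest

@pytest.fixture
def session():
    connection = engine.connect()
    trans = connection.begin()           # outer, non-ORM transaction
    session = Session(bind=connection)   # bind the ORM session to that connection
    yield session
    session.close()
    trans.rollback()                     # undoes everything, including commit() calls
    connection.close()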