Scrapy: TypeError: argument of type 'NoneType' is not iterable

With Scrapy, I receive this NoneType error when I launch my spider:
if 'Jockey' in tab_arrivee_th: TypeError: argument of type 'NoneType'
is not iterable
The code works fine in a console test with a hard-coded list, but not with the response.css call.
I think the problem comes from tab_arrivee_th, and I don't understand why, because the scrapy shell gives me a list in return, and it's the same one I use in the test.
def parse(self, response):
    tab_arrivee_th = response.css('.arrivees th::text').extract()
    # list obtained with the response.css call above in scrapy shell:
    # tab_arrivee_th = ['Cl.', 'N°', 'Cheval', 'S/A', 'Œill.', 'Poids', 'Corde', 'Ecart', 'Jockey', 'Entraîneur', 'Tx', 'Récl.', 'Rapp. Ouv.']
    if 'Jockey' in tab_arrivee_th:
        col_jockey = tab_arrivee_th.index('Jockey') + 1
    elif 'Driver' in tab_arrivee_th:
        col_jockey = tab_arrivee_th.index('Driver') + 1
    else:
        col_jockey = 0
    jockey = partant.css('td:nth-child(' + str(col_jockey) + ') > a::text').extract()
Thanks for the help.

Solved: response.css('.arrivees th::text').extract() pointed at a table built in JavaScript, so it was empty when the spider parsed the raw HTML.
I used scrapy-splash with a 0.5 second render delay, and it works fine.

The response for the line tab_arrivee_th = response.css('.arrivees th::text').extract() is empty; check the response again.
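Independent of the JavaScript-rendering issue, the column lookup itself can be made defensive so a missing or empty header row never raises. A minimal sketch of that guard (the helper name `find_jockey_column` is mine, not from the original spider):

```python
def find_jockey_column(headers):
    """Return the 1-based CSS column index of the jockey/driver header, or 0."""
    # Guard against a None or empty header list (e.g. content rendered by JS)
    if not headers:
        return 0
    for label in ('Jockey', 'Driver'):
        if label in headers:
            return headers.index(label) + 1
    return 0

print(find_jockey_column(None))                       # no crash on None
print(find_jockey_column(['Cl.', 'Jockey', 'Poids'])) # 2
```

With a guard like this, a page where the table has not rendered yet simply falls through to the `col_jockey = 0` default instead of raising a TypeError.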

Related

TypeError: 'Value' object is not iterable: iterating over a DataFrame for prediction with a GCP Natural Language model

I'm trying to iterate over a DataFrame in order to apply a predict function that calls a Natural Language model hosted on GCP. Here is the loop code:
model = 'XXXXXXXXXXXXXXXX'
barometre_df_processed = barometre_df
barometre_df_processed['theme'] = ''
barometre_df_processed['proba'] = ''
print('DEBUT BOUCLE FOR')
for ind in barometre_df.index:
    if barometre_df.verbatim[ind] is np.nan:
        barometre_df_processed.theme[ind] = "RAS"
        barometre_df_processed.proba[ind] = "1"
    else:
        print(barometre_df.verbatim[ind])
        print(type(barometre_df.verbatim[ind]))
        res = get_prediction(file_path={'text_snippet': {'content': barometre_df.verbatim[ind], 'mime_type': 'text/plain'}}, model_name=model)
        print(res)
        theme = res['displayNames']
        proba = res["classification"]["score"]
        barometre_df_processed.theme[ind] = theme
        barometre_df_processed.proba[ind] = proba
and the get_prediction function, which I took from the Natural Language AI documentation:
def get_prediction(file_path, model_name):
    options = ClientOptions(api_endpoint='eu-automl.googleapis.com:443')
    prediction_client = automl_v1.PredictionServiceClient(client_options=options)
    payload = file_path
    # Uncomment the following line (and comment the one above) to predict on PDFs.
    # payload = pdf_payload(file_path)
    parameters_dict = {}
    params = json_format.ParseDict(parameters_dict, Value())
    request = prediction_client.predict(name=model_name, payload=payload, params=params)
    print("fonction prediction")
    print(request)
    resultat = request.payload  # predict() returns a response whose .payload holds the annotations
    return resultat[0]["displayName"], resultat[0]["classification"]["score"], resultat[1]["displayName"], resultat[1]["classification"]["score"], resultat[2]["displayName"], resultat[2]["classification"]["score"]
I'm looping this way because I want each [displayNames, score] couple to create a new line in my final dataframe, to have something like this:
verbatim1, theme1, proba1
verbatim1, theme2, proba2
verbatim1, theme3, proba3
verbatim2, theme1, proba1
verbatim2, theme2, proba2
...
The if barometre_df.verbatim[ind] is np.nan test is not causing problems; I just use it to deal with NaNs, so don't worry about it.
The error that I have is this one :
TypeError: 'Value' object is not iterable
I guess the issue is with
res = get_prediction(file_path={'text_snippet': {'content': barometre_df.verbatim[ind]} },model_name=model)
but I can't figure out what's going wrong here.
I already tried removing
,'mime_type': 'text/plain'}
from my get_prediction parameters, but it doesn't change anything.
Does someone know how to deal with this issue?
Thank you in advance.
I think you are not iterating correctly.
The way to iterate through a DataFrame is:
for index, row in df.iterrows():
    print(row['col1'])
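To get one output row per (displayName, score) couple, the predictions can be collected into a list of records and turned into a new DataFrame at the end, which also avoids chained-assignment writes. A rough sketch with a dummy `fake_predict` standing in for the GCP call (both the helper and the sample data are mine):

```python
import pandas as pd

def fake_predict(text):
    # Stand-in for the GCP call: returns (displayName, score) couples
    return [('theme1', 0.9), ('theme2', 0.07), ('theme3', 0.03)]

barometre_df = pd.DataFrame({'verbatim': ['verbatim1', None, 'verbatim2']})

records = []
for index, row in barometre_df.iterrows():
    if pd.isna(row['verbatim']):  # robust NaN test (covers None and np.nan)
        records.append({'verbatim': row['verbatim'], 'theme': 'RAS', 'proba': '1'})
    else:
        for theme, proba in fake_predict(row['verbatim']):
            records.append({'verbatim': row['verbatim'], 'theme': theme, 'proba': proba})

barometre_df_processed = pd.DataFrame(records)
print(barometre_df_processed)
```

Each verbatim then expands into as many rows as the model returns couples, matching the verbatim1/theme1, verbatim1/theme2, ... layout sketched above.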

I get this error when I try to use Wolfram Alpha in VS Code (Python): ValueError: dictionary update sequence element #0 has length 1; 2 is required

This is my code:
import wolframalpha

app_id = '876P8Q-R2PY95YEXY'
client = wolframalpha.Client(app_id)
res = client.query(input('Question: '))
print(next(res.results).text)
The question I tried was 1 + 1, and when I run it I get this error:
Traceback (most recent call last):
File "c:/Users/akshi/Desktop/Xander/Untitled.py", line 9, in <module>
print(next(res.results).text)
File "C:\Users\akshi\AppData\Local\Programs\Python\Python38\lib\site-packages\wolframalpha\__init__.py", line 166, in text
return next(iter(self.subpod)).plaintext
ValueError: dictionary update sequence element #0 has length 1; 2 is required
Please help me
I was getting the same error when I tried to run the same code.
You can refer to the "Implementing Wolfram Alpha Search" section of this website for a better understanding of how the result is extracted from the dictionary that is returned:
https://medium.com/@salisuwy/build-an-ai-assistant-with-wolfram-alpha-and-wikipedia-in-python-d9bc8ac838fe
Also, I tried the following code by referring to the above website; hope it helps you :)
import wolframalpha

client = wolframalpha.Client('<your app_id>')
query = str(input('Question: '))
res = client.query(query)
if res['@success'] == 'true':
    pod0 = res['pod'][0]['subpod']['plaintext']
    print(pod0)
    pod1 = res['pod'][1]
    if (('definition' in pod1['@title'].lower())
            or ('result' in pod1['@title'].lower())
            or (pod1.get('@primary', 'false') == 'true')):
        result = pod1['subpod']['plaintext']
        print(result)
else:
    print("No answer returned")
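The pattern in the answer above, guarding every nested lookup into the parsed result dictionary, can be sketched without the Wolfram Alpha client at all. The `sample_result` dict below is a made-up stand-in for a parsed response, and `extract_plaintext` is my own helper name:

```python
def extract_plaintext(result):
    """Pull the first subpod's plaintext out of a parsed query result, or None."""
    if result.get('@success') != 'true':
        return None
    pods = result.get('pod') or []
    if not pods:
        return None
    return pods[0].get('subpod', {}).get('plaintext')

# Made-up stand-in for a parsed Wolfram Alpha response
sample_result = {'@success': 'true',
                 'pod': [{'subpod': {'plaintext': '2'}}]}
print(extract_plaintext(sample_result))           # 2
print(extract_plaintext({'@success': 'false'}))   # None
```

Guarding each level like this turns a missing pod into a None return instead of a KeyError or the dictionary-update ValueError above.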

BeautifulSoup: AttributeError: 'NoneType' object has no attribute 'text' and is not subscriptable

I'm using Beautiful Soup and I'm getting the errors "AttributeError: 'NoneType' object has no attribute 'get_text'" and "TypeError: 'NoneType' object is not subscriptable".
I know my code works when I use it to search for a single restaurant, but when I try to loop over all restaurants, I get an error.
Here is my screen recording showing the problem. https://streamable.com/pok13
The rest of the code can be found here: https://pastebin.com/wsv1kfNm
# AttributeError: 'NoneType' object has no attribute 'get_text'
restaurant_address = yelp_containers[yelp_container].find("address", {
"class": 'lemon--address__373c0__2sPac'
}).get_text()
print("restaurant_address: ", restaurant_address)
# TypeError: 'NoneType' object is not subscriptable
restaurant_starCount = yelp_containers[yelp_container].find("div", {
"class": "lemon--div__373c0__1mboc i-stars__373c0__30xVZ i-stars--regular-4__373c0__2R5IO border-color--default__373c0__2oFDT overflow--hidden__373c0__8Jq2I"
})['aria-label']
print("restaurant_starCount: ", restaurant_starCount)
# AttributeError: 'NoneType' object has no attribute 'text'
restaurant_district = yelp_containers[yelp_container].find("div", {
"class": "lemon--div__373c0__1mboc display--inline-block__373c0__25zhW border-color--default__373c0__2xHhl"
}).text
print("restaurant_district: ", restaurant_district)
You are getting the error because your selectors are too specific, and you don't check whether the tag was found. One solution is to loosen the selectors (the lemon--div-XXX... classes will probably change in the near future anyway):
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
import csv
import re

my_url = 'https://www.yelp.com/search?find_desc=Restaurants&find_loc=San%20Francisco%2C%20CA'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

bs = soup(page_html, "html.parser")
yelp_containers = bs.select('li:contains("All Results") ~ li:contains("read more")')

for idx, item in enumerate(yelp_containers, 1):
    print("--- Restaurant number #", idx)
    restaurant_title = item.h3.get_text(strip=True)
    restaurant_title = re.sub(r'^[\d.\s]+', '', restaurant_title)
    restaurant_address = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')[1]
    restaurant_numReview = item.select_one('[class*="reviewCount"]').get_text(strip=True)
    restaurant_numReview = re.sub(r'[^\d.]', '', restaurant_numReview)
    restaurant_starCount = item.select_one('[class*="stars"][aria-label]')['aria-label']
    restaurant_starCount = re.sub(r'[^\d.]', '', restaurant_starCount)
    pr = item.select_one('[class*="priceRange"]')
    restaurant_price = pr.get_text(strip=True) if pr else '-'
    restaurant_category = [a.get_text(strip=True) for a in item.select('[class*="priceRange"] ~ span a')]
    restaurant_district = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')[-1]
    print(restaurant_title)
    print(restaurant_address)
    print(restaurant_numReview)
    print(restaurant_price)
    print(restaurant_category)
    print(restaurant_district)
    print('-' * 80)
Prints:
--- Restaurant number # 1
Fog Harbor Fish House
Pier 39
5487
$$
['Seafood', 'Bars']
Fisherman's Wharf
--------------------------------------------------------------------------------
--- Restaurant number # 2
The House
1230 Grant Ave
4637
$$$
['Asian Fusion']
North Beach/Telegraph Hill
--------------------------------------------------------------------------------
...and so on.

pyautogui.center, TypeError: 'NoneType' object is not subscriptable

I'm trying to code a program to take care of some boring stuff. When I try to use pyautogui.center() I get an error. Here is an example of the code and the error:
c = pyautogui.locateOnScreen('sample.png')
d = pyautogui.center((c))

File "C:\Users\\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyscreeze\__init__.py", line 404, in center
    return (coords[0] + int(coords[2] / 2), coords[1] + int(coords[3] / 2))
TypeError: 'NoneType' object is not subscriptable
This happens when the image is not found on screen: locateOnScreen returns None, and center then fails on it. Check for None before using the result:
sample_image = pyautogui.locateOnScreen('sample.png')
if sample_image is not None:
    center_of_image = pyautogui.center(sample_image)
    pyautogui.mouseDown(center_of_image[0], center_of_image[1])
    pyautogui.mouseUp(center_of_image[0], center_of_image[1])
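As the traceback shows, pyautogui.center just computes the midpoint of a (left, top, width, height) box, which is why a None box crashes on coords[0]. A tiny stand-alone guard illustrates the idea (`safe_center` is my own name, not part of pyautogui):

```python
def safe_center(box):
    """Midpoint of a (left, top, width, height) box, or None if the box is missing."""
    if box is None:  # locateOnScreen returns None when the image is not found
        return None
    left, top, width, height = box
    return (left + width // 2, top + height // 2)

print(safe_center((10, 20, 100, 40)))  # (60, 40)
print(safe_center(None))               # None
```

Wrapping the locate/center pair this way lets the script skip a click gracefully instead of raising when the target image is absent.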

How to properly pass arguments to a scrapy spider on Scrapinghub?

I am trying to pass parameters to my spider (ideally a DataFrame or csv) with:
self.client = ScrapinghubClient(apikey)
self.project = self.client.get_project()
job = spider.jobs.run()
I tried using the *args and **kwargs argument types, but each time I only get the last element. For example:
data = ["1", "2", "3"]
job = spider.jobs.run(data=data)
When I try to print them from inside my spider I only get the element 3:
def __init__(self, **kwargs):
    for key in kwargs:
        print kwargs[key]
2018-05-17 08:39:28 INFO [stdout] 3
I think there is some easy explanation that I just can't seem to understand.
Thanks in advance!
For passing arguments and tags, you can do it like this:
priority = randint(0, 4)
job = spider.jobs.run(
    units=1,
    job_settings=setting,
    add_tag=['auto', 'test', 'somethingelse'],
    job_args={'arg1': arg1, 'arg2': arg2, 'arg3': arg3},
    priority=priority
)
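Spider arguments arrive as strings, one keyword per job_args key, which is why passing a list through a single keyword only surfaced its last element. One workaround is to serialize the list to a single string and decode it in the spider's __init__. A sketch, with a plain class standing in for the scrapy.Spider subclass and a hypothetical run call shown in a comment:

```python
import json

data = ["1", "2", "3"]
# Hypothetical run call: job = spider.jobs.run(job_args={'data': json.dumps(data)})

class MySpider:  # stand-in for a scrapy.Spider subclass
    def __init__(self, data='[]', **kwargs):
        # Each job_args entry is delivered as a single string keyword argument,
        # so decode it back into a list here
        self.data = json.loads(data)

spider = MySpider(data=json.dumps(data))
print(spider.data)  # ['1', '2', '3']
```

The same trick works for a small CSV: serialize it to one string argument, then parse it inside the spider.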