I have a soup object like:
class="js-product-discount-item product-discount__item ">
<p class="product-discount__price js-product-discount-price">
<span class="price">3 033 <span class="currency w500">₽<span class="currency_seo">руб.</span></span></span> </p>
I did
soup = BeautifulSoup(src, 'lxml')
price_2 = soup.find(class_='price-discount-value').find(class_='price').text.strip()
x = 2
Result :
3 033 ₽руб.
I'd like to make:
price_3 = price_2/x
I get: TypeError: unsupported operand type(s) for /: 'str' and 'int'
What happens?
You are extracting a string with .text, but the / operator needs a number (for example an int) on both sides.
How to fix?
First of all, strip the non-digit characters from your string:
...find(class_='price').text.split('₽')[0].replace(' ','')
Then convert it to an integer with int() before dividing:
int(price_2)/x
Example
Note: I changed the find() call for this example, because the HTML in your question is incomplete and does not contain the class you search for.
from bs4 import BeautifulSoup
html = '''
<p class="product-discount__price js-product-discount-price">
<span class="price">3 033 <span class="currency w500">₽<span class="currency_seo">руб.</span></span></span>
</p>'''
soup = BeautifulSoup(html, 'lxml')
price_2 = soup.find(class_='product-discount__price').find(class_='price').text.split('₽')[0].replace(' ','')
x = 2
price_3 = int(price_2)/x
print(price_3)
Output
1516.5
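As a slightly more defensive sketch of the same cleanup step, the digits can be extracted with a regular expression, which also copes with non-breaking spaces used as thousands separators. The helper name parse_price is hypothetical and not part of the original answer; it assumes the amount always comes before the ₽ sign.

```python
import re

def parse_price(text):
    """Return the integer amount that precedes the currency sign in a price string."""
    # Keep only the part before the currency sign, e.g. '3 033 '
    amount = text.split('₽')[0]
    # Drop every non-digit character (regular spaces, non-breaking spaces, etc.)
    digits = re.sub(r'\D', '', amount)
    if not digits:
        raise ValueError(f'no digits found in {text!r}')
    return int(digits)

price_2 = parse_price('3 033 ₽руб.')
x = 2
price_3 = price_2 / x
print(price_3)  # 1516.5
```

Raising ValueError on an empty result makes a changed page layout fail loudly instead of silently producing a wrong number.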
Related
How can I get the data-video-id attribute from the below HTML using BeautifulSoup?
<a href="/watch/36242552" class="thumbnail video vod-show play-video-trigger user-can-watch" data-video-id="36242552" data-video-type="show">
The following prints an empty list.
html_content = requests.get(url).text
soup = BeautifulSoup(html_content, "lxml")
ids = [tag['data-video-id'] for tag in soup.select('a href[data-video-id]')]
print(ids)
Output:
[]
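You are getting an empty list because soup.select('a href[data-video-id]') matches nothing: that selector looks for an <href> element inside an <a> element, but href is an attribute, not a tag. You could try the code below. Hope it helps.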
from bs4 import BeautifulSoup
html = """<a href="/watch/36242552" class="thumbnail video vod-show play-video-trigger user-can-watch" data-video-id="36242552" data-video-type="show">"""
# html_content = requests.get(url).text
soup = BeautifulSoup(html, "lxml")
print(soup.select('a href[data-video-id]'))
# Select only the <a> tags that actually carry the attribute, so the lookup cannot raise KeyError
ids = [tag['data-video-id'] for tag in soup.select('a[data-video-id]')]
print(ids)
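To make the difference between the two selectors concrete, here is a small sketch on made-up markup (the second link without the attribute is an assumption added for illustration). It uses the built-in html.parser so it does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Sample markup: one link with data-video-id, one without (hypothetical)
html = '''
<a href="/watch/36242552" class="thumbnail video" data-video-id="36242552" data-video-type="show"></a>
<a href="/about">About</a>
'''
soup = BeautifulSoup(html, 'html.parser')

# 'a href[...]' is a descendant selector: an <href> element inside an <a>. No such element exists.
no_match = soup.select('a href[data-video-id]')

# 'a[data-video-id]' is an attribute selector on the <a> element itself.
ids = [tag['data-video-id'] for tag in soup.select('a[data-video-id]')]

# tag.get() returns None instead of raising KeyError when the attribute is missing.
maybe_ids = [tag.get('data-video-id') for tag in soup.select('a')]

print(no_match)
print(ids)
print(maybe_ids)
```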
I'm scraping R6S stats like this:
from bs4 import BeautifulSoup as bs
import requests
bsURL = "https://r6.tracker.network/profile/pc/Spoit.GODSENT"
response = requests.get(bsURL)
html = bs(response.text, 'html.parser')
level = html.find_all(class_='trn-defstat__value')
print(level[0])
Printed result:
<div class="trn-defstat__value">
439
</div>
I only want to print the number,
so I tried print(level[0].text)
Result: None
How can I solve this problem?
Just use .string instead of .text like this:
print(level[0].string)
Output:
439
Hope that this helps!
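As an alternative sketch: .string only returns a value when the tag has exactly one child, and it keeps the surrounding whitespace, so get_text(strip=True) may be the more robust choice here. The markup below is a hand-written stand-in for the tracker page (the second div is a hypothetical example with two children), not the live site.

```python
from bs4 import BeautifulSoup

html = '''<div class="trn-defstat__value">
439
</div><div class="multi"><b>1</b><i>2</i></div>'''
soup = BeautifulSoup(html, 'html.parser')

value = soup.find(class_='trn-defstat__value')
# get_text(strip=True) removes the newlines around the number
level = value.get_text(strip=True)
print(level)  # 439

multi = soup.find(class_='multi')
# With more than one child, .string is None, while get_text() still joins the text
print(multi.string)                 # None
print(multi.get_text(strip=True))   # 12
```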
I want to fetch the text between the labels:
<label>A</label> class A <br/><label>B</label> class B <br/> <label>C </label> class C <br />
Expected output as a dictionary:
{'A':'class A','B':'class B','C':'class C'}
You can search for each <label> tag and then get the next text sibling after it.
For example:
from bs4 import BeautifulSoup
txt = '''<label>A</label> class A <br/><label>B</label> class B <br/> <label>C </label> class C <br />'''
soup = BeautifulSoup(txt, 'html.parser')
data = {label.get_text(strip=True): label.find_next_sibling(text=True).strip() for label in soup.select('label')}
print(data)
Prints:
{'A': 'class A', 'B': 'class B', 'C': 'class C'}
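A variant of the same idea, written as an explicit loop: it inspects only the node immediately after each label and falls back to an empty string when that node is not text. This is a sketch under the assumption that the value always directly follows its label; unlike the sibling search above, it does not skip over intermediate tags.

```python
from bs4 import BeautifulSoup, NavigableString

txt = '''<label>A</label> class A <br/><label>B</label> class B <br/> <label>C </label> class C <br />'''
soup = BeautifulSoup(txt, 'html.parser')

data = {}
for label in soup.find_all('label'):
    sibling = label.next_sibling
    # Use the sibling only if it is a text node; a tag or None means there is no value
    value = sibling.strip() if isinstance(sibling, NavigableString) else ''
    data[label.get_text(strip=True)] = value

print(data)  # {'A': 'class A', 'B': 'class B', 'C': 'class C'}
```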
I'm using beautiful soup and I'm getting the error, "AttributeError: 'NoneType' object has no attribute 'get_text'" and also "TypeError: 'NoneType' object is not subscriptable".
I know my code works when I use it to search for a single restaurant. However when I try to make a loop for all restaurants, then I get an error.
Here is my screen recording showing the problem. https://streamable.com/pok13
The rest of the code can be found here: https://pastebin.com/wsv1kfNm
# AttributeError: 'NoneType' object has no attribute 'get_text'
restaurant_address = yelp_containers[yelp_container].find("address", {
"class": 'lemon--address__373c0__2sPac'
}).get_text()
print("restaurant_address: ", restaurant_address)
# TypeError: 'NoneType' object is not subscriptable
restaurant_starCount = yelp_containers[yelp_container].find("div", {
"class": "lemon--div__373c0__1mboc i-stars__373c0__30xVZ i-stars--regular-4__373c0__2R5IO border-color--default__373c0__2oFDT overflow--hidden__373c0__8Jq2I"
})['aria-label']
print("restaurant_starCount: ", restaurant_starCount)
# AttributeError: 'NoneType' object has no attribute 'text'
restaurant_district = yelp_containers[yelp_container].find("div", {
"class": "lemon--div__373c0__1mboc display--inline-block__373c0__25zhW border-color--default__373c0__2xHhl"
}).text
print("restaurant_district: ", restaurant_district)
You are getting the errors because your selectors are too specific and you don't check whether the tag was found: find() returns None when nothing matches, so calling .get_text() or indexing the result fails. One solution is to loosen the selectors (the lemon--div-XXX... class names will probably change in the near future anyway):
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
import csv
import re
my_url = 'https://www.yelp.com/search?find_desc=Restaurants&find_loc=San%20Francisco%2C%20CA'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
bs = soup(page_html, "html.parser")
yelp_containers = bs.select('li:contains("All Results") ~ li:contains("read more")')
for idx, item in enumerate(yelp_containers, 1):
    print("--- Restaurant number #", idx)
    restaurant_title = item.h3.get_text(strip=True)
    restaurant_title = re.sub(r'^[\d.\s]+', '', restaurant_title)
    restaurant_address = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')[1]
    restaurant_numReview = item.select_one('[class*="reviewCount"]').get_text(strip=True)
    restaurant_numReview = re.sub(r'[^\d.]', '', restaurant_numReview)
    restaurant_starCount = item.select_one('[class*="stars"][aria-label]')['aria-label']
    restaurant_starCount = re.sub(r'[^\d.]', '', restaurant_starCount)
    pr = item.select_one('[class*="priceRange"]')
    restaurant_price = pr.get_text(strip=True) if pr else '-'
    restaurant_category = [a.get_text(strip=True) for a in item.select('[class*="priceRange"] ~ span a')]
    restaurant_district = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')[-1]
    print(restaurant_title)
    print(restaurant_address)
    print(restaurant_numReview)
    print(restaurant_price)
    print(restaurant_category)
    print(restaurant_district)
    print('-' * 80)
Prints:
--- Restaurant number # 1
Fog Harbor Fish House
Pier 39
5487
$$
['Seafood', 'Bars']
Fisherman's Wharf
--------------------------------------------------------------------------------
--- Restaurant number # 2
The House
1230 Grant Ave
4637
$$$
['Asian Fusion']
North Beach/Telegraph Hill
--------------------------------------------------------------------------------
...and so on.
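The "check if the tag was found" part can be factored into a small helper, sketched below. The name text_or_default and the sample markup are hypothetical and only illustrate the pattern; the real Yelp page has a different structure.

```python
from bs4 import BeautifulSoup

def text_or_default(parent, selector, default='-'):
    """Return the stripped text of the first match for selector, or default if nothing matches."""
    tag = parent.select_one(selector)
    # select_one() returns None when nothing matches, so guard before calling get_text()
    return tag.get_text(strip=True) if tag is not None else default

# Made-up sample: the second item has no price range element
html = '''
<li><h3>1. Fog Harbor Fish House</h3><span class="priceRange__x1">$$</span></li>
<li><h3>2. The House</h3></li>
'''
soup = BeautifulSoup(html, 'html.parser')

prices = [text_or_default(li, '[class*="priceRange"]') for li in soup.select('li')]
print(prices)  # ['$$', '-']
```

With a helper like this, a missing element yields a placeholder instead of an AttributeError in the middle of the loop.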
I'm trying to get the value "4" from the HTML below, taken from this website. This is just one of the values on the product list page. I want all of the values as a list so I can put them in a dataframe.
<div class="review-stars-on-hover">
<div class="product-rating">
<div class="product-rating__meter" alt="4">
<div class="product-rating__meter-btm">★★★★★</div>
<div class="product-rating__meter-top" style="width:80%;">★★★★★</div>
</div>
<div class="product-rating__count edf-font-size--xsmall nsg-text--medium-grey" alt="95">(95)</div>
</div>
</div>...
I tried:
items = soup.select('.grid-item-content')
star = [item.find('div', {'class': 'review-stars-on-hover'}).get('alt') for item in items]
Output (there are 16 products in total on the page, but only None shows up):
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]
Any advice please?
Try the following code. It finds 16 elements for the class you mentioned, but only 11 of them contain an element with the class product-rating__meter. I added a check so that the alt value is printed only when the product-rating__meter element is present.
Hope this will help.
from bs4 import BeautifulSoup
import requests
data= requests.get('https://store.nike.com/us/en_us/pw/mens-walking-shoes/7puZ9ypZoi3').content
soup = BeautifulSoup(data, 'lxml')
print("Total element count : " + str(len(soup.find_all('div',class_='grid-item-content'))))
for item in soup.find_all('div',class_='grid-item-content'):
    if item.find('div',class_='product-rating__meter'):
        print("Alt value : " + item.find('div',class_='product-rating__meter')['alt'])
Output
Total element count : 16
Alt value : 4
Alt value : 4.3
Alt value : 4.6
Alt value : 4.8
Alt value : 4.4
Alt value : 4.7
Alt value : 4.7
Alt value : 3.8
Alt value : 4.5
Alt value : 3.3
Alt value : 4.5
EDITED
from bs4 import BeautifulSoup
import requests
data= requests.get('https://store.nike.com/us/en_us/pw/mens-walking-shoes/7puZ9ypZoi3').content
soup = BeautifulSoup(data, 'lxml')
print("Total element count : " + str(len(soup.find_all('div',class_='grid-item-content'))))
itemlist=[]
for item in soup.find_all('div',class_='grid-item-content'):
    if item.find('div',class_='product-rating__meter'):
        #print("Alt value : " + item.find('div',class_='product-rating__meter')['alt'])
        itemlist.append("Alt value : " + item.find('div',class_='product-rating__meter')['alt'])
print(itemlist)
Output:
Total element count : 16
['Alt value : 4', 'Alt value : 4.3', 'Alt value : 4.6', 'Alt value : 4.8', 'Alt value : 4.4', 'Alt value : 4.7', 'Alt value : 4.7', 'Alt value : 3.8', 'Alt value : 4.5', 'Alt value : 3.3', 'Alt value : 4.5']
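If the list is meant to become a dataframe column, it may be more convenient to keep one entry per product and store the rating as a number, with None where a product has no rating. The sketch below runs on a hand-written fragment that only mimics the class names from the question; it is not the live page.

```python
from bs4 import BeautifulSoup

# Made-up fragment: three products, the second one has no rating
html = '''
<div class="grid-item-content"><div class="product-rating__meter" alt="4"></div></div>
<div class="grid-item-content"></div>
<div class="grid-item-content"><div class="product-rating__meter" alt="4.3"></div></div>
'''
soup = BeautifulSoup(html, 'html.parser')

ratings = []
for item in soup.find_all('div', class_='grid-item-content'):
    meter = item.find('div', class_='product-rating__meter')
    # Keep the list aligned with the products: None marks a missing rating
    ratings.append(float(meter['alt']) if meter is not None else None)

print(ratings)  # [4.0, None, 4.3]
```

Because the list has the same length as the product list, it can be placed next to other per-product columns without misalignment.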
You can also select only the parent elements that contain the inner class, and take the first match for that class inside each parent:
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://store.nike.com/us/en_us/pw/mens-walking-shoes/7puZ9ypZoi3')
soup = bs(r.content, 'lxml')
stars = [item.select_one('.product-rating__meter')['alt'] for item in soup.select('.grid-item-box:has(.product-rating__meter)')]
You can write something like below to retrieve all divs with "alt" attribute:
xml = bs.find_all("div", {"alt": True})
And to retrieve the value:
for x in xml:
    print(x["alt"])
Or directly like below if you only want the first "alt":
xml = bs.find("div", {"alt": True})["alt"]
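For reference, here is a self-contained sketch of the attribute filter on made-up markup (the class names only mimic the question). Passing True as the attribute value matches any div that has an alt attribute, and the CSS selector div[alt] expresses the same filter.

```python
from bs4 import BeautifulSoup

# Made-up fragment: two divs with an alt attribute, one without
html = '''
<div class="product-rating__meter" alt="4"></div>
<div class="no-alt"></div>
<div class="product-rating__count" alt="95">(95)</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# attrs={'alt': True} keeps every <div> that carries an alt attribute, whatever its value
alts = [div['alt'] for div in soup.find_all('div', attrs={'alt': True})]

# Equivalent CSS attribute selector
alts_css = [div['alt'] for div in soup.select('div[alt]')]

print(alts)      # ['4', '95']
print(alts_css)  # ['4', '95']
```

Note that this also picks up the review count (95), so on the real page a class filter is probably still needed to separate ratings from counts.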