How can I get a value from an attribute inside a tag as a int - beautifulsoup

I have a soup object like:
class="js-product-discount-item product-discount__item ">
<p class="product-discount__price js-product-discount-price">
<span class="price">3 033 <span class="currency w500">₽<span class="currency_seo">руб.</span></span></span> </p>
I did
soup = BeautifulSoup(src, 'lxml')
price_2 = soup.find(class_='price-discount-value').find(class_='price').text.strip()
x = 2
Result :
3 033 ₽руб.
I'd like to make:
price_3 = price_2/x
I have : TypeError: unsupported operand type(s) for /: 'str' and 'int'

What happens?
You are extracting a string with .text but to use the / operand it should be an int
How to fix?
First at all, clean your string from non digit characters:
...find(class_='price').text.split('₽')[0].replace(' ','')
For calculating convert it with int() to an integer:
int(price_2)/x
Example
Note Changed the find() for these example, cause your question do not provide an correct html
from bs4 import BeautifulSoup
html = '''
<p class="product-discount__price js-product-discount-price">
<span class="price">3 033 <span class="currency w500">₽<span class="currency_seo">руб.</span></span></span>
</p>'''
soup = BeautifulSoup(html, 'lxml')
price_2 = soup.find(class_='product-discount__price').find(class_='price').text.split('₽')[0].replace(' ','')
x = 2
price_3 = int(price_2)/x
print(price_3)
Output
1516.5

Related

How to find_all(data-video-id) from html using beautiful soup

How can I get the data-video-id attribute from the below HTML using BeautifulSoup?
<a href="/watch/36242552" class="thumbnail video vod-show play-video-trigger user-can-watch" data-video-id="36242552" data-video-type="show">
The following prints an empty list.
html_content = requests.get(url).text
soup = BeautifulSoup(html_content, "lxml")
ids = [tag['data-video-id'] for tag in soup.select('a href[data-video-id]')]
print(ids)
Output:
[]
You are getting empty [] because soup.select('a href[data-video-id]') is return nothing.You could try below code. Hope its help you.
from bs4 import BeautifulSoup
html = """<a href="/watch/36242552" class="thumbnail video vod-show play-video-trigger user-can-watch" data-video-id="36242552" data-video-type="show">"""
# html_content = requests.get(url).text
soup = BeautifulSoup(html, "lxml")
print(soup.select('a href[data-video-id]'))
ids = [tag['data-video-id'] for tag in soup.select('a') if tag['data-video-id']]
print(ids)

crawling result is null

im doing the crawling about r6s
like this
from bs4 import BeautifulSoup as bs
import requests
bsURL = "https://r6.tracker.network/profile/pc/Spoit.GODSENT"
respinse = requests.get(bsURL)
html = bs(respinse.text, 'html.parser')
level = html.find_all(class_='trn-defstat__value')
print(level[0])
print Result-->
<div class="trn-defstat__value">
439
</div>
I only want to print numbers.
so i did print(level[0].text)
Result -> none
how can I solve this problem?
Just use .string instead of .text like this:
print(level[0].string)
Output:
439
Hope that this helps!

Python in Beautiful Soup

I want to fetch text between
<label>A</label> class A <br/><label>B</label> class B <br/> <label>C </label> class C <br />
Expected output in Dictionary like data
{'A':'class A','B':'class B','C':'class C'}
You can search for <label> tag and then get next text sibling to it.
For example:
from bs4 import BeautifulSoup
txt = '''<label>A</label> class A <br/><label>B</label> class B <br/> <label>C </label> class C <br />'''
soup = BeautifulSoup(txt, 'html.parser')
data = {label.get_text(strip=True): label.find_next_sibling(text=True).strip() for label in soup.select('label')}
print(data)
Prints:
{'A': 'class A', 'B': 'class B', 'C': 'class C'}

Beaufiulsoup: AttributeError: 'NoneType' object has no attribute 'text' and is not subscriptable

I'm using beautiful soup and I'm getting the error, "AttributeError: 'NoneType' object has no attribute 'get_text'" and also "TypeError: 'NoneType' object is not subscriptable".
I know my code works when I use it to search for a single restaurant. However when I try to make a loop for all restaurants, then I get an error.
Here is my screen recording showing the problem. https://streamable.com/pok13
The rest of the code can be found here: https://pastebin.com/wsv1kfNm
# AttributeError: 'NoneType' object has no attribute 'get_text'
restaurant_address = yelp_containers[yelp_container].find("address", {
"class": 'lemon--address__373c0__2sPac'
}).get_text()
print("restaurant_address: ", restaurant_address)
# TypeError: 'NoneType' object is not subscriptable
restaurant_starCount = yelp_containers[yelp_container].find("div", {
"class": "lemon--div__373c0__1mboc i-stars__373c0__30xVZ i-stars--regular-4__373c0__2R5IO border-color--default__373c0__2oFDT overflow--hidden__373c0__8Jq2I"
})['aria-label']
print("restaurant_starCount: ", restaurant_starCount)
# AttributeError: 'NoneType' object has no attribute 'text'
restaurant_district = yelp_containers[yelp_container].find("div", {
"class": "lemon--div__373c0__1mboc display--inline-block__373c0__25zhW border-color--default__373c0__2xHhl"
}).text
print("restaurant_district: ", restaurant_district)
You are getting the error because your selectors are too specific, and you don't check if the tag was found or not. One solution is loosen the selectors (the lemon--div-XXX... selectors will probably change in the near future anyway):
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
import csv
import re
my_url = 'https://www.yelp.com/search?find_desc=Restaurants&find_loc=San%20Francisco%2C%20CA'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
bs = soup(page_html, "html.parser")
yelp_containers = bs.select('li:contains("All Results") ~ li:contains("read more")')
for idx, item in enumerate(yelp_containers, 1):
print("--- Restaurant number #", idx)
restaurant_title = item.h3.get_text(strip=True)
restaurant_title = re.sub(r'^[\d.\s]+', '', restaurant_title)
restaurant_address = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')[1]
restaurant_numReview = item.select_one('[class*="reviewCount"]').get_text(strip=True)
restaurant_numReview = re.sub(r'[^\d.]', '', restaurant_numReview)
restaurant_starCount = item.select_one('[class*="stars"][aria-label]')['aria-label']
restaurant_starCount = re.sub(r'[^\d.]', '', restaurant_starCount)
pr = item.select_one('[class*="priceRange"]')
restaurant_price = pr.get_text(strip=True) if pr else '-'
restaurant_category = [a.get_text(strip=True) for a in item.select('[class*="priceRange"] ~ span a')]
restaurant_district = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')[-1]
print(restaurant_title)
print(restaurant_address)
print(restaurant_numReview)
print(restaurant_price)
print(restaurant_category)
print(restaurant_district)
print('-' * 80)
Prints:
--- Restaurant number # 1
Fog Harbor Fish House
Pier 39
5487
$$
['Seafood', 'Bars']
Fisherman's Wharf
--------------------------------------------------------------------------------
--- Restaurant number # 2
The House
1230 Grant Ave
4637
$$$
['Asian Fusion']
North Beach/Telegraph Hill
--------------------------------------------------------------------------------
...and so on.

getting the alt value in the div tag using beautifulsoup

Im trying to get the value "4" from below html from this website. This is just one of the values from the product list page. I want multiple values in a list form to put it in a dataframe.
<div class="review-stars-on-hover">
<divclass="product-rating">
<divclass="product-rating__meter"alt="4">
<divclass="product-rating__meter-btm">★★★★★</div>
<divclass="product-rating__meter-top"style="width:80%;">★★★★★</div>
</div>
<divclass="product-rating__countedf-font-size--xsmallnsg-text--medium-grey"alt="95">(95)</div>
</div>
</div>...
I tried:
items = soup.select('.grid-item-content')
star = [item.find('div', {'class': 'review-stars-on-hover'}).get('alt') for item in items]
Output(there are 16 products in total in the page, but only none shows up):
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]
Any advice please?
Try the following code.However it returns 16 records based on the class you have mentioned but its only having 11 records for the class product-rating__meter.I have provided the check if product-rating__meter class available then print the alt value.
Hope this will help.
from bs4 import BeautifulSoup
import requests
data= requests.get('https://store.nike.com/us/en_us/pw/mens-walking-shoes/7puZ9ypZoi3').content
soup = BeautifulSoup(data, 'lxml')
print("Total element count : " + str(len(soup.find_all('div',class_='grid-item-content'))))
for item in soup.find_all('div',class_='grid-item-content'):
if item.find('div',class_='product-rating__meter'):
print("Alt value : " + item.find('div',class_='product-rating__meter')['alt'])
Output
Total element count : 16
Alt value : 4
Alt value : 4.3
Alt value : 4.6
Alt value : 4.8
Alt value : 4.4
Alt value : 4.7
Alt value : 4.7
Alt value : 3.8
Alt value : 4.5
Alt value : 3.3
Alt value : 4.5
EDITED
from bs4 import BeautifulSoup
import requests
data= requests.get('https://store.nike.com/us/en_us/pw/mens-walking-shoes/7puZ9ypZoi3').content
soup = BeautifulSoup(data, 'lxml')
print("Total element count : " + str(len(soup.find_all('div',class_='grid-item-content'))))
itemlist=[]
for item in soup.find_all('div',class_='grid-item-content'):
if item.find('div',class_='product-rating__meter'):
#print("Alt value : " + item.find('div',class_='product-rating__meter')['alt'])
itemlist.append("Alt value : " + item.find('div',class_='product-rating__meter')['alt'])
print(itemlist)
OutPut:
Total element count : 16
['Alt value : 4', 'Alt value : 4.3', 'Alt value : 4.6', 'Alt value : 4.8', 'Alt value : 4.4', 'Alt value : 4.7', 'Alt value : 4.7', 'Alt value : 3.8', 'Alt value : 4.5', 'Alt value : 3.3', 'Alt value : 4.5']
You can select by taking the first match only for inner class within parent class
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://store.nike.com/us/en_us/pw/mens-walking-shoes/7puZ9ypZoi3')
soup = bs(r.content, 'lxml')
stars = [item.select_one('.product-rating__meter')['alt'] for item in soup.select('.grid-item-box:has(.product-rating__meter)')]
You can write something like below to retrieve all divs with "alt" attribute:
xml = bs.find_all("div", {"alt": True})
And to retrieve the value:
for x in xml:
print(x["alt"])
Or directly like below if you only want the first "alt":
xml = bs.find("div", {"alt": True})["alt"]