I've been using Selenium to scrape data on this site, but I have this error:
matches = driver.find_element(By.XPATH,'//*[@id="mainContent"]/div[3]/div[1]/div[2]/section/div[1]/ul/li[1]')
SyntaxError: invalid syntax
So I would like to know whether my code below is correct for scraping all the seasons on this site.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
website = "https://www.premierleague.com/results?co=1&se=363&cl=-1"
driver.get(website)
sleep(10)
element = driver.find_element(By.XPATH,('//*[@class="_24Il51SkQ29P1pCkJOUO-7"]/button')).click()
sleep(10)
WebDriverWait(driver, 20).until(driver.find_element(By.ID,('//*[@id="advertClose"]')).click()
matches = driver.find_element(By.XPATH,'//*
[@id="mainContent"]/div[3]/div[1]/div[2]/section/div[1]/ul/li[1]')
matches = []
for match in matches:
    team = matches.find_element(By.XPATH,('//*
[@id="mainContent"]/div[3]/div[1]/div[2]/section/div[1]/ul/li[1]/div/span')).text
    scores = matches.find_element(By.XPATH,('//*[@id="mainContent"]/div[3]/div[1]/div[2]/section/div[1]/ul/li[1]/div/span/span[1]/span[2]')).text
    print(match.text)
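Several things are going wrong at once here. The XPath string in the matches line is split across two physical lines, which is what produces the SyntaxError; By.ID expects a bare id such as advertClose, not an XPath; the until() call is missing an expected condition and a closing parenthesis; and matches is immediately overwritten with an empty list, so the loop never runs (the loop body should also call match.find_element, not matches.find_element). A minimal corrected sketch, keeping the locators from the question; dropping the trailing li[1] so that every match in the list is selected is my assumption:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://www.premierleague.com/results?co=1&se=363&cl=-1")
wait = WebDriverWait(driver, 20)

# Accept the cookie banner (class name taken from the question)
wait.until(EC.element_to_be_clickable(
    (By.XPATH, '//*[@class="_24Il51SkQ29P1pCkJOUO-7"]/button'))).click()

# Close the advert overlay; By.ID takes the bare id, not an XPath
wait.until(EC.element_to_be_clickable((By.ID, 'advertClose'))).click()

# find_elements (plural) returns a list; keep the XPath on one line
matches = driver.find_elements(
    By.XPATH, '//*[@id="mainContent"]/div[3]/div[1]/div[2]/section/div[1]/ul/li')
for match in matches:
    print(match.text)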
I am trying to webscrape this website: https://va.betway.com/sports/category/basketball/usa/nba?tab=matches. I am unable to get the element that holds the game stats.
This is my code snippet:
s = Service("./drivers/geckodriver")
options = FirefoxOptions()
options.headless = True
browser = webdriver.Firefox(service=s,options = options)
browser.get(website_hr)
print('Title: %s' % browser.title)
player_prop0 = browser.find_elements(by='id',value = 'root')
player_prop2 = browser.find_elements(by=By.CLASS_NAME,value = 'sc-eZMymg ktcjyc')
I get no value from player_prop2 and player_prop0.
How can I get the data on this page? I tried using the ID and the class name to get the game lines for the NBA games. Thank you.
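Two things stand out. By.CLASS_NAME accepts a single class name, so the compound value 'sc-eZMymg ktcjyc' is not a valid locator, and the page content is rendered by JavaScript, so an explicit wait is needed before querying. A sketch using a CSS selector instead; note that names like sc-eZMymg look auto-generated and may change between site builds, so this locator is an assumption to verify against the live page:

from selenium import webdriver
from selenium.webdriver import FirefoxOptions
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = FirefoxOptions()
options.add_argument("-headless")
browser = webdriver.Firefox(service=Service("./drivers/geckodriver"), options=options)
browser.get("https://va.betway.com/sports/category/basketball/usa/nba?tab=matches")

# Wait for the JavaScript-rendered rows; compound classes are joined with dots
rows = WebDriverWait(browser, 20).until(EC.presence_of_all_elements_located(
    (By.CSS_SELECTOR, ".sc-eZMymg.ktcjyc")))
for row in rows:
    print(row.text)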
I'm trying to make a scraper for Capterra. I keep getting blocked, so I think I need a proxy for my driver.get calls. I am also having trouble exporting a dataframe to a CSV. The first half of my code (not attached) gets all the links and stores them in a list, which I then access with Selenium to pull the information I want; the second part is where I am having trouble.
For example, these are the types of links I am storing in the plinks list and that the driver is accessing:
https://www.capterra.com/p/212448/Blackbaud-Altru/
https://www.capterra.com/p/80509/Volgistics-Volunteer-Management/
https://www.capterra.com/p/179048/One-Earth/
for link in plinks:
    driver.get(link)
    #driver.implicitly_wait(20)
    companyProfile = bs(driver.page_source, 'html.parser')
    try:
        name = companyProfile.find("h1", class_="sm:nb-type-2xl nb-type-xl").text
    except AttributeError:
        name = "couldn't find"
    try:
        reviews = companyProfile.find("div", class_="nb-ml-3xs").text
    except AttributeError:
        reviews = "couldn't find"
    try:
        location = driver.find_element(By.XPATH, "//*[starts-with(., 'Located in')]").text
    except NoSuchElementException:
        location = "couldn't find"
    try:
        url = driver.find_element(By.XPATH, "//*[starts-with(., 'http')]").text
    except NoSuchElementException:
        url = "couldn't find"
    try:
        features = [x.get_text() for x in companyProfile.select('[id="LoadableProductFeaturesSection"] li span')]
    except AttributeError:
        features = "couldn't find"
    companyInfo.append([name, reviews, location, url, features])
companydf = pd.DataFrame(companyInfo, columns = ["Name", "Reviews", "Location", "URL", "Features"])
companydf.to_csv(wmtest.csv, sep='\t')
driver.close()
I am using Firefox for the webdriver, and I am happy to change to Chrome if that works better, but is it possible to have the webdriver pick from a random set of proxies for each get request?
Thanks!
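Plain Selenium cannot swap proxies on a live driver, but the third-party selenium-wire package (pip install selenium-wire) exposes a driver.proxy attribute that can be changed between requests. A sketch, with hypothetical proxy endpoints that you would replace with your own:

import random
from seleniumwire import webdriver  # drop-in wrapper around Selenium

# hypothetical endpoints; substitute real proxies
proxies = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
]

driver = webdriver.Firefox()
for link in plinks:
    choice = random.choice(proxies)
    # route both http and https traffic through the chosen proxy
    driver.proxy = {"http": choice, "https": choice}
    driver.get(link)

Separately, the CSV export fails because to_csv takes the filename as a string, so it should be companydf.to_csv("wmtest.csv", sep='\t').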
I am trying to extract the user id, rating, and review from the following site using Selenium, and it is showing an "invalid selector" error. I think the XPath I defined to get the review text is the reason for the error, but I am unable to resolve the issue. The site link is as below:
teslamotor review
The code that I have used is the following:
#Class for Review webscraping from consumeraffairs.com site
class CarForumCrawler():
    def __init__(self, start_link):
        self.link_to_explore = start_link
        self.comments = pd.DataFrame(columns = ['rating','user_id','comments'])
        self.driver = webdriver.Chrome(executable_path=r'C:/Users/mumid/Downloads/chromedriver/chromedriver.exe')
        self.driver.get(self.link_to_explore)
        self.driver.implicitly_wait(5)
        self.extract_data()
        self.save_data_to_file()

    def extract_data(self):
        ids = self.driver.find_elements_by_xpath("//*[contains(@id,'review-')]")
        comment_ids = []
        for i in ids:
            comment_ids.append(i.get_attribute('id'))
        for x in comment_ids:
            #Extract dates from for each user on a page
            user_rating = self.driver.find_elements_by_xpath('//*[@id="' + x +'"]/div[1]/div/img')[0]
            rating = user_rating.get_attribute('data-rating')
            #Extract user ids from each user on a page
            userid_element = self.driver.find_elements_by_xpath('//*[@id="' + x +'"]/div[2]/div[2]/strong')[0]
            userid = userid_element.get_attribute('itemprop')
            #Extract Message for each user on a page
            user_message = self.driver.find_elements_by_xpath('//*[@id="' + x +'"]/div[3]/p[2]/text()')[0]
            comment = user_message.text
            #Adding date, userid and comment for each user in a dataframe
            self.comments.loc[len(self.comments)] = [rating,userid,comment]

    def save_data_to_file(self):
        #we save the dataframe content to a CSV file
        self.comments.to_csv('Tesla_rating-6.csv', index = None, header=True)

    def close_spider(self):
        #end the session
        self.driver.quit()

try:
    url = 'https://www.consumeraffairs.com/automotive/tesla_motors.html'
    mycrawler = CarForumCrawler(url)
    mycrawler.close_spider()
except:
    raise
The error that I am getting is the following:
Also, the XPath that I tried to trace is from the following HTML:
You are seeing the classic error of...
as find_elements_by_xpath('//*[@id="' + x +'"]/div[3]/p[2]/text()')[0] would select text nodes rather than elements; instead you need to pass an XPath expression that selects elements.
You need to change it to:
user_message = self.driver.find_elements_by_xpath('//*[@id="' + x +'"]/div[3]/p[2]')[0]
References
You can find a couple of relevant detailed discussions in:
invalid selector: The result of the xpath expression "//a[contains(@href, 'mailto')]/@href" is: [object Attr] getting the href attribute with Selenium
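The same rule applies to attributes: an XPath ending in /@href returns an attribute node, which Selenium cannot wrap as a WebElement. Select the element and read the attribute through the API instead, for example:

# select the element, then read its attribute through Selenium
link = driver.find_elements_by_xpath("//a[contains(@href, 'mailto')]")[0]
href = link.get_attribute('href')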
I have made a YouTube automation bot. I am getting the error: unable to locate element (for the XPath of the subscribe button).
Here is my code:
from selenium import webdriver
from selenium import common
from selenium.webdriver.common import keys
from webdriver_manager.firefox import GeckoDriverManager
import time

class actions:
    def __init__(self, email, password):
        self.email = email
        self.password = password
        profile = webdriver.FirefoxProfile()
        profile.set_preference("dom.webdriver.enabled", False)
        profile.set_preference('useAutomationExtension', False)
        profile.update_preferences()
        driver = webdriver.Firefox(
            executable_path=GeckoDriverManager().install(), firefox_profile=profile)
        self.bot = driver
        # self.bot.maximize_window()
        self.bot.set_window_size(400, 700)
        self.is_logged_in = False

    def login(self):
        bot = self.bot
        bot.get("https://accounts.google.com/signin/v2/identifier?service=youtube&uilel=3&passive=true&continue=https%3A%2F%2Fwww.youtube.com%2Fsignin%3Faction_handle_signin%3Dtrue%26app%3Ddesktop%26hl%3Den%26next%3Dhttps%253A%252F%252Fwww.youtube.com%252F&hl=en&ec=65620&flowName=GlifWebSignIn&flowEntry=ServiceLogin")
        time.sleep(5)
        try:
            email = bot.find_element_by_name('identifier')
        except common.exceptions.NoSuchElementException:
            time.sleep(5)
            email = bot.find_element_by_name('identifier')
        email.clear()
        email.send_keys(self.email + keys.Keys.RETURN)
        time.sleep(5)
        try:
            password = bot.find_element_by_name('password')
        except common.exceptions.NoSuchElementException:
            time.sleep(5)
            password = bot.find_element_by_name('password')
        password.clear()
        password.send_keys(self.password + keys.Keys.RETURN)
        time.sleep(5)
        self.is_logged_in = True

    def kill(self):
        bot = self.bot
        bot.quit()

    def subscribe(self, url):
        if not self.is_logged_in:
            return
        bot = self.bot
        bot.get(url)
        time.sleep(4)
        try:
            value = bot.find_element_by_xpath(
                '/html/body/ytd-app/div/ytd-page-manager/ytd-watch-flexy/div[5]/div[1]/div/div[7]/div[2]/ytd-video-secondary-info-renderer/div/div/div/ytd-subscribe-button-renderer/tp-yt-paper-button').get_attribute('aria-label')
            value = value.split()
        except:
            bot.execute_script(
                'window.scrollTo(0,document.body.scrollHeight/3.5)')
            time.sleep(3)
            value = bot.find_element_by_xpath(
                '/html/body/ytd-app/div/ytd-page-manager/ytd-watch-flexy/div[5]/div[1]/div/div[7]/div[2]/ytd-video-secondary-info-renderer/div/div/div/ytd-subscribe-button-renderer/tp-yt-paper-button').get_attribute('aria-label')
            value = value.split(':')
        if value[0] == "Subscribe":
            try:
                bot.find_element_by_xpath(
                    '/html/body/ytd-app/div/ytd-page-manager/ytd-watch-flexy/div[5]/div[1]/div/div[7]/div[2]/ytd-video-secondary-info-renderer/div/div/div/ytd-subscribe-button-renderer/tp-yt-paper-button').click()
                time.sleep(3)
            except:
                bot.execute_script(
                    'window.scrollTo(0,document.body.scrollHeight/3.5)')
                time.sleep(3)
                bot.find_element_by_xpath(
                    '/html/body/ytd-app/div/ytd-page-manager/ytd-watch-flexy/div[5]/div[1]/div/div[7]/div[2]/ytd-video-secondary-info-renderer/div/div/div/ytd-subscribe-button-renderer/tp-yt-paper-button').click()
                time.sleep(3)
How can I resolve this issue? I am not able to understand where things are going wrong. Should I try finding elements by ID or in other ways instead of by XPath? Or is there a problem with some software?
Please help me out.
Always use relative XPaths in your tests. Using absolute XPaths will cause regular test failures.
Refer to this tutorial about writing relative XPaths: https://www.guru99.com/xpath-selenium.html
This extension will help you write relative XPaths: https://chrome.google.com/webstore/detail/chropath/ljngjbnaijcbncmcnjfhigebomdlkcjo
You can also write XPaths in different ways using functions like text(), starts-with(), and contains(), which lets you locate elements by their visible text as well.
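As a sketch of what a relative locator could look like for the subscribe button, reusing the bot driver from the question; the ytd-subscribe-button-renderer and tp-yt-paper-button names come from the absolute XPath above, but verify them on the live page, since YouTube's markup changes often:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait until the button is clickable instead of sleeping
subscribe = WebDriverWait(bot, 10).until(EC.element_to_be_clickable(
    (By.XPATH, "//ytd-subscribe-button-renderer//tp-yt-paper-button")))
label = subscribe.get_attribute('aria-label') or ''
if label.split(':')[0] == "Subscribe":
    subscribe.click()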
I am trying to scrape stock data, but even though I'm using "find elements by id", the result is one single block of text.
I have tried various methods, such as find elements by XPath, etc.
I also tried to make an array containing all the IDs by finding the "target" attribute so I could loop through it, but I wasn't successful, so I had to code each ID by hand.
import json
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
url = 'http://www.tsetmc.com/Loader.aspx?ParTree=15131F'
delay = 100
driver = webdriver.Chrome()
driver.get(url)
WebDriverWait(driver, delay)
zapna = driver.find_elements_by_id(id_='43479730079120887')
renik = driver.find_elements_by_id(id_='33854964748757477')
retko = driver.find_elements_by_id(id_='3823243780502959')
rampna = driver.find_elements_by_id(id_='67126881188552864')
mafakher = driver.find_elements_by_id(id_='4247709727327181')
for ii in retko:
    print(ii.text , "\n")
driver.close()
and the result is:
رتكوكنترلخوردگيتكينكو2,1512.531M63.044 B25,14523,88824,900-245-0.9724,907-238-0.9523,88825,699-749-33.2512,55324,90024,9035,4601
What I expect is:
رتكو
كنترلخوردگيتكينكو
2,151
2.531M
63.044 B
25,145
23,888
24,900
-245
-0.97
24,907
-238
-0.95
23,888
25,699
-749
-33.25
1
2,553
24,900
24,903
5,460
1
Any idea?
You just have to go one layer deeper (using, for example, xpath) and iterate through the children:
for ii in retko:
    targets = ii.find_elements_by_xpath('.//*')
    for target in targets:
        print(target.text)
Output:
رتكو
رتكو
كنترلخوردگيتكينكو
كنترلخوردگيتكينكو
3,149
3.235M
3.235M
etc.
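The duplicates in this output (each value appears once for a wrapper element and once for its inner element) come from './/*' matching every nested descendant. If the rows are table markup, selecting only the leaf cells avoids that; the td tag here is an assumption to check against the page:

for ii in retko:
    # select only the leaf table cells instead of every descendant
    for cell in ii.find_elements_by_xpath('.//td'):
        print(cell.text)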