selenium get abs url from href attribute


When I download a page with Selenium and process it with Java jsoup, I get the hrefs in the source code like this:
Technical Trading
Is there a way to get the absolute URL from this, or to force Selenium to transform it into an absolute URL? Updating the links after getting the page doesn't sound like a clean solution.

If you read the href through Selenium itself, get_attribute returns the absolute URL, as expected:
yourElement.get_attribute('href')
This is a quick sample:
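from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # note this is my webdriver
driver.implicitly_wait(10)
url = "https://www.duckduckgo.co.uk"
driver.get(url)
aList = driver.find_elements(By.TAG_NAME, 'a')
for a in aList:
    # get_attribute('href') returns the resolved (absolute) URL, not the raw attribute text
    print(a.get_attribute('href'))
driver.quit()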
Output contains:
https://duckduckgo.com/spread
https://duckduckgo.com/spread
https://duckduckgo.com/app
https://duckduckgo.com/app
https://duckduckgo.com/newsletter
https://duckduckgo.com/newsletter
In the DOM itself these href values are relative, yet get_attribute('href') returns the full URL.
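If you instead parse driver.page_source with a separate parser, as the question does with jsoup, you see the raw attribute text, so relative hrefs stay relative and must be resolved against the page URL yourself. In jsoup that means calling Jsoup.parse(html, baseUri) and then element.absUrl("href") (or attr("abs:href")). The same idea as a minimal Python sketch using only the standard library; HrefCollector and absolute_links are hypothetical names, and the sketch assumes the page has no <base href> element, which would change the base URL:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Collects the href values of <a> tags, resolved against the page URL."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url  # the URL the HTML was loaded from
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # urljoin leaves absolute URLs unchanged and resolves relative ones
            self.links.append(urljoin(self.page_url, href))


def absolute_links(html, page_url):
    """Return every <a href> in `html` as an absolute URL.

    Assumes the document has no <base href> element.
    """
    collector = HrefCollector(page_url)
    collector.feed(html)
    collector.close()
    return collector.links


print(absolute_links('<a href="/spread">Spread</a> <a href="app">App</a>',
                     "https://duckduckgo.com/"))
# ['https://duckduckgo.com/spread', 'https://duckduckgo.com/app']
```

With Selenium you would call it as absolute_links(driver.page_source, driver.current_url), so the base is the URL the browser actually ended up on after any redirects.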

Related

BeautifulSoup can't get all page source sometimes

I'm using Selenium and BeautifulSoup4 for scraping. The problem is that in my script 'result' is sometimes empty and sometimes not. I don't understand why it sometimes fails. Is it a security measure on the website, or a RAM problem? I have no idea.
from bs4 import BeautifulSoup
page_source = BeautifulSoup(driver.page_source, "html.parser")
result = page_source.find_all('div', {'class': 'pv-profile-section-pager ember-view'})

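Your class filter may be the problem: a string such as 'pv-profile-section-pager ember-view' only matches when the class attribute is exactly that string, in that order. You can match on a single class instead:
result = page_source.find_all('div', {'class': lambda x: x and 'pv-profile-section-pager' in x})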
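Why the single-class match is more robust: the class attribute is a whitespace-separated set of tokens, so order and extra classes should not matter. A stdlib-only sketch of that comparison (class_matches is a hypothetical helper, not part of BeautifulSoup). In BeautifulSoup itself, page_source.select('div.pv-profile-section-pager.ember-view') performs the same token-based check through a CSS selector:

```python
def class_matches(class_attr, wanted):
    """True if every class listed in `wanted` appears in `class_attr`,
    ignoring order and any extra classes on the element."""
    if not class_attr:
        return False
    present = set(class_attr.split())
    return set(wanted.split()) <= present


# Exact string comparison fails when the site reorders or adds classes...
print("ember-view pv-profile-section-pager" == "pv-profile-section-pager ember-view")  # False
# ...while token comparison still matches.
print(class_matches("ember-view pv-profile-section-pager artdeco",
                    "pv-profile-section-pager ember-view"))  # True
```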
Alternatively, the element may be inside an iframe, in which case you need to switch to that frame before reading the page source; see:
Select iframe using Python + Selenium

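I would suggest adding a delay: since the OP gets no error, the content has probably just not finished loading when page_source is read.
For example, put time.sleep(5) before reading driver.page_source.
If you want to do it the Selenium way, I would suggest looking at explicit waits (WebDriverWait) in the Python bindings: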
Python - Selenium - ExplicitWait
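Sample code:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
finally:
    driver.quit()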
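Conceptually, an explicit wait just polls a condition until it returns something truthy or a timeout expires (WebDriverWait polls every 0.5 seconds by default and raises TimeoutException on expiry). A simplified stdlib sketch of that loop, for illustration only; wait_until is a hypothetical helper, not Selenium's implementation:

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` until it returns a truthy value, then return that value.

    Raises TimeoutError if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # like WebDriverWait.until, hand back the truthy result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

For example, wait_until(lambda: driver.find_elements(By.CLASS_NAME, 'pv-profile-section-pager')) would return as soon as at least one matching element exists, because an empty list is falsy. Unlike a fixed time.sleep(5), it stops waiting as soon as the content is there.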

Can we get to know Browser rendering time for each page in JMeter using webDriver sampler?

I am planning a load test with around 220 users, and the client also expects the browser rendering time. So I thought I would create one script for the load test and a second JMeter script using Selenium to measure rendering time, so that running the Selenium script alongside the load test would give the browser rendering time.
But as far as I can see, with the Selenium sampler the Aggregate Report shows only the end-to-end response time. If I want to know the browser rendering time of each page, is there any way to get the breakdown?

You have 2 options:
Use a separate WebDriver Sampler per "page"; each sampler then gets its own line in the Aggregate Report.
Alternatively, you can use the WDS.sampleResult.addSubResult function to add "child" results to a single WebDriver Sampler instance. Example code would be something like:
WDS.sampleResult.sampleStart()
var seleniumDev = new org.apache.jmeter.samplers.SampleResult()
seleniumDev.setSampleLabel('Selenium main page')
seleniumDev.sampleStart()
WDS.browser.get('https://selenium.dev')
seleniumDev.setResponseCodeOK()
seleniumDev.setSuccessful(true)
seleniumDev.sampleEnd()
WDS.sampleResult.addSubResult(seleniumDev)
var jmeter = new org.apache.jmeter.samplers.SampleResult()
jmeter.setSampleLabel('JMeter main page')
jmeter.sampleStart()
WDS.browser.get('https://jmeter.apache.org')
jmeter.setResponseCodeOK()
jmeter.setSuccessful(true)
jmeter.sampleEnd()
WDS.sampleResult.addSubResult(jmeter)
WDS.sampleResult.sampleEnd()
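Each page then shows up as a separate sub-result, with its own timing, under the parent WebDriver Sampler result (for example in View Results Tree).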
More information: The WebDriver Sampler: Your Top 10 Questions Answered
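The parent/child timing idea behind addSubResult, sketched in plain Python for illustration. SampleTimer, run and sub_result are hypothetical names, this is not JMeter's API, and time.sleep stands in for the WDS.browser.get calls:

```python
import time
from contextlib import contextmanager


class SampleTimer:
    """Hypothetical helper: one parent sample that collects named child timings."""

    def __init__(self, label):
        self.label = label
        self.children = []   # (label, elapsed seconds) per sub-result, in order
        self.elapsed = None  # total time of the parent sample

    @contextmanager
    def run(self):
        start = time.perf_counter()
        try:
            yield self
        finally:
            self.elapsed = time.perf_counter() - start

    @contextmanager
    def sub_result(self, label):
        start = time.perf_counter()
        try:
            yield
        finally:
            # recorded even if the page load raises, like sampleEnd in a finally block
            self.children.append((label, time.perf_counter() - start))


timer = SampleTimer("WebDriver Sampler")
with timer.run():
    with timer.sub_result("Selenium main page"):
        time.sleep(0.02)  # stand-in for loading https://selenium.dev
    with timer.sub_result("JMeter main page"):
        time.sleep(0.01)  # stand-in for loading https://jmeter.apache.org

for label, seconds in timer.children:
    print(f"{label}: {seconds:.3f}s")
print(f"{timer.label} total: {timer.elapsed:.3f}s")
```

As with the JMeter version, the parent time covers all children plus any work between them, so it is always at least the sum of the child timings.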

Selenium - Unable to find element with xpath

I am trying to find an element on this page, specifically the bid price in the first row: 196.20p.
I am using selenium and this is my code:
from selenium import webdriver
driver = webdriver.PhantomJS()
address = 'https://www.trustnet.com/factsheets/o/g6ia/ishares-global-property-securities-equity-index-uk'
xpath = '//*[@id="factsheet-tabs"]/fund-tabs/div/div/fund-tab[3]/div/unit-details/div/div/unit-information/div/table[2]/tbody/tr[3]/td[2]'
driver.get(address)
price = driver.find_element_by_xpath(xpath)
print(price.text)
driver.close()
When executed I receive the following error
NoSuchElementException: Message: {"errorMessage":"Unable to find element with xpath '//*[@id=\"factsheet-tabs\"]/fund-tabs/div/div/fund-tab[3]/div/unit-details/div/div/unit-information/div/table[2]/tbody/tr[3]/td[2]'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"214","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:62727","User-Agent":"Python http auth"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"xpath\", \"sessionId\": \"8faaff70-af12-11e7-a17c-416247c75eb6\", \"value\": \"//*[@id=\\\"factsheet-tabs\\\"]/fund-tabs/div/div/fund-tab[3]/div/unit-details/div/div/unit-information/div/table[2]/tbody/tr[3]/td[2]\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/8faaff70-af12-11e7-a17c-416247c75eb6/element"}}
Screenshot: available via screen
I have used the same approach, with a different XPath, on Yahoo Finance and it works fine, but unfortunately the price I am looking for is not available there.

If I understood your requirement correctly, this is the price you want to scrape. I used a CSS selector here.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.trustnet.com/factsheets/o/g6ia/ishares-global-property-securities-equity-index-uk')
# Select the price row by its ng-if attribute and take the second cell
price = driver.find_element(By.CSS_SELECTOR, '[ng-if^="$ctrl.priceInformation.Mid"] td:nth-child(2)').text
print(price.split(" ")[0])
driver.quit()
Result:
196.20p/196.60p
If you want to stick with XPath, try this:
price = driver.find_element(By.XPATH, '//*[contains(@ng-if,"$ctrl.priceInformation.Mid")]//td[2]').text
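The scraped cell text combines two prices. Assuming it always looks like '196.20p/196.60p', with the bid first (the question identifies 196.20p as the bid) and the second value taken here to be the offer, a small hypothetical helper can split it into numbers; Decimal avoids floating-point rounding on prices:

```python
from decimal import Decimal, InvalidOperation


def parse_bid_offer(text):
    """Split text such as '196.20p/196.60p' into (bid, offer) Decimal pence values.

    Assumes the format '<bid>p/<offer>p'; anything else raises ValueError.
    """
    tokens = text.split()
    if not tokens:
        raise ValueError("empty price text")
    parts = tokens[0].split("/")  # e.g. ['196.20p', '196.60p']
    if len(parts) != 2 or not all(p.endswith("p") for p in parts):
        raise ValueError(f"unexpected price format: {text!r}")
    try:
        bid = Decimal(parts[0][:-1])    # strip the trailing 'p'
        offer = Decimal(parts[1][:-1])
    except InvalidOperation as exc:
        raise ValueError(f"non-numeric price in {text!r}") from exc
    return bid, offer


print(parse_bid_offer("196.20p/196.60p"))  # (Decimal('196.20'), Decimal('196.60'))
```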

Selenium's WebDriver.execute_script() returns 'None'

My program has trouble getting elements with an existing class from a webpage using Selenium. It seems that my WebDriver.execute_script call is not working.
import urllib
from selenium import webdriver
#Path to the chromedriver is definitely working fine.
path_to_chromedriver = r'C:\Users\Ben\Desktop\Coding\FreeFoodFinder\chromedriver.exe'
browser = webdriver.Chrome(executable_path = path_to_chromedriver)
url = 'http://www.maidservicetexas.com/'
browser.implicitly_wait(30)
browser.get(url)
content = browser.execute_script("document.getElementsByClassName('content')");
#Just printing the first character of the returned content's toString for now. Don't want the whole thing yet.
#Only ever prints 'N', the first letter of 'None'...so obviously it isn't finding the jsgenerated content even after waiting.
print(content)
My program returns None, which tells me that the JavaScript function is either not returning a value or not being executed. Chrome's web dev tools tell me that 'content' is certainly a valid class name. The webpage isn't even dynamically generated (my eventual goal is to scrape dynamic content, which is why I make my WebDriver wait for 30 seconds before running the script).

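Return the value. execute_script hands back whatever the script returns, so without a return statement you always get None (the returned collection arrives in Python as a list of WebElement objects):
content = browser.execute_script("return document.getElementsByClassName('content');")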

Selenium - echo Base URL

How can I read the current value of Base URL in Selenium IDE 2.8.0?
Please suggest a working version of the following Selenese:
echo ${BASEURL}

Take your IWebDriver instance and read its .Url property. This gives the current URL the driver is on. Then you can use whatever output mechanism you want. So if you go with echo:
echo ${driver.Url}
where driver is your active Selenium WebDriver instance. If you want just the root of the URL, apply a regular expression to the returned URL that finds the .com/.net/.org part and chops off everything after it.
If you are using php you might want to look here: http://forums.phpfreaks.com/topic/175838-extract-base-url-from-entire-url/
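As a swapped-in technique, you could parse the URL instead of running a regex on .com/.net/.org, which would miss hosts such as duckduckgo.co.uk or localhost:8080. A minimal Python sketch using the standard library; base_url is a hypothetical helper, and in the Python bindings the current URL is available as driver.current_url:

```python
from urllib.parse import urlsplit


def base_url(url):
    """Return 'scheme://host[:port]/' for an absolute URL such as driver.current_url."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {url!r}")
    # netloc keeps the port (and any user:password@ prefix) exactly as written
    return f"{parts.scheme}://{parts.netloc}/"


print(base_url("https://www.duckduckgo.co.uk/?q=selenium"))  # https://www.duckduckgo.co.uk/
```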

1) Create an IWebDriver instance:
IWebDriver driver = new FirefoxDriver();
2) Navigate to the URL:
driver.Navigate().GoToUrl("");
3) Print the URL:
Console.WriteLine("The base URL is " + driver.Url);
(C# bindings; this needs using OpenQA.Selenium; and using OpenQA.Selenium.Firefox;)

