I'm trying something new for me: using Playwright in Google Colab.
This combination requires async programming.
I've got a context manager called "Login" that handles logging in and out. That works great!
The internal page I'm trying to get to has datasets with no links, just divs to click on.
The locator (I believe) is working fine and should, I assume, return multiple elements when combined with .element_handles().
from playwright.async_api import async_playwright
import asyncio
from IPython.display import Image
import nest_asyncio
nest_asyncio.apply()
# browser is set to webkit in the Login() context manager
...
async def loop_over_datasets(browser=None, page=None):
    print("starting")
    datasets = page.locator("div.horizontal.clickable")
    print("continuing")
    datasets = await asyncio.gather(datasets.element_handles())
    for ds in datasets:
        print(f'inside the loop, ds is {ds}')
        print("doesn't get here in tact")
        # for each dataset I want to launch a new page where the dataset is clicked but I'll settle for sync programming at this point.
        # new_page = await ds.click()
        # ds_page = await browser.new_page(new_page)
        # ds_page.click()
async def get_all_new_info():
    async with Login() as (b,l):
        await loop_over_datasets(browser=b,page = l)
asyncio.run(get_all_new_info()) #has to be killed manually or it will run forever.
In the line datasets = await asyncio.gather(datasets.element_handles()), gather() doesn't actually do anything without await, and with await it never returns, which means I never get to "inside the loop...".
Without await I get the "ds" variable, but it's not anything I can work with.
How is this supposed to be used?
Without the full code this is a bit hard to test, but I wanted to share a few things that may help:
datasets = await asyncio.gather(datasets.element_handles())
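As far as I can see in the Playwright documentation, element_handles() resolves to a List[ElementHandle], and in the async API the call itself is a coroutine that has to be awaited. asyncio.gather() returns a list with one result per awaitable it is given, so wrapping this single call in gather() produces a one-element list whose only item is the list of handles, which is probably not what the loop expects. I would just await the call directly:
datasets = await datasets.element_handles()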
Now, I assume you'd like to go through those datasets in an asynchronous manner. You should be able to put the content of the for loop into a coroutine and based on that create tasks that will be executed by gather.
async def process_dataset(ds):
    new_page = await ds.click()
    ds_page = await browser.new_page(new_page)
    ds_page.click()
tasks = []
for ds in datasets:
    tasks.append(asyncio.create_task(process_dataset(ds)))
await asyncio.gather(*tasks)
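The difference between wrapping a single call in gather() and awaiting it directly can be tried without a browser. The sketch below is only an illustration: `fake_element_handles` and `process_dataset` are hypothetical stand-ins that return plain strings, not Playwright objects.

```python
import asyncio


async def fake_element_handles():
    # Hypothetical stand-in for `await locator.element_handles()`.
    return ["ds-1", "ds-2", "ds-3"]


async def process_dataset(ds):
    # Placeholder for the real per-dataset work (clicking, scraping, ...).
    await asyncio.sleep(0)
    return f"processed {ds}"


async def main():
    # gather() returns one result per awaitable, so a single awaitable
    # yields a one-element list that wraps the real list.
    wrapped = await asyncio.gather(fake_element_handles())
    # Awaiting the coroutine directly yields the list itself.
    direct = await fake_element_handles()
    # Fan out one coroutine per dataset; results keep the input order.
    results = await asyncio.gather(*(process_dataset(ds) for ds in direct))
    return wrapped, direct, results


wrapped, direct, results = asyncio.run(main())
print(wrapped)
print(direct)
print(results)
```

In a notebook where an event loop is already running, `asyncio.run()` raises a RuntimeError unless something like nest_asyncio is applied; using `await main()` in a cell is the usual alternative.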
Related
I'm relatively new to using WebDriverJS and trying out a simple script to begin with.
However, am facing a lot of issues and did not find any resources that were helpful.
Scenario being Tested:
Launch browser
Navigate to google.com
Capture Title of the page
Add a wait statement (driver.sleep)
Enter some text in Search box
Here is the code snippet:
var webdriver = require('selenium-webdriver'),
    By = webdriver.By,
    until = webdriver.until;
var driver = new webdriver.Builder().forBrowser('chrome').build();
driver.get("http://www.google.com");
driver.getTitle().then(function(title) {
  console.log("Title is: " + title);
});
console.log('Before sleep');
driver.sleep(10000);
console.log('After sleep');
driver.findElement(By.name('q')).sendKeys("Hello");
Here is the output:
Before sleep
After sleep
DevTools listening on ws://127.0.0.1:52449/devtools/browser/aea4d9eb-20ee-4f10-b53f-c2003c751796
Title is:
As can be seen, it is a very straight forward scenario. However none of it is working as expected.
Below are my queries/ observations:
console.log for Before/After sleep is executed first, even before the browser is launched, which is clearly not the intention.
The title is returned as an empty string. No value is printed.
driver.sleep() never waits for the specified duration. All commands are executed immediately. How do I make the driver do a hard wait when driver.sleep is not working?
I tried adding an implicit wait, but that resulted in an error as well.
What are the best practices to be followed?
I did not find very many helpful webdriver javascript resources and it is not clear how to proceed.
Any guidance is appreciated. TIA.!
I referred to the documentation as well, and similar steps are given there. Not sure if there is some issue on my end. https://github.com/SeleniumHQ/selenium/wiki/WebDriverJs
Assuming that your example is written in JavaScript and runs on Node.js, it looks as if you are not waiting for the asynchronous functions to finish. Please be aware that most functions return a promise, and you must wait for the promise to be resolved.
Consider the following example code:
const {Builder, By, Key, until} = require('selenium-webdriver');
(async function example() {
  let driver = await new Builder().forBrowser('firefox').build();
  try {
    await driver.get('http://www.google.com/ncr');
    await driver.findElement(By.name('q')).sendKeys('webdriver', Key.RETURN);
    await driver.wait(until.titleIs('webdriver - Google Search'), 1000);
  } finally {
    await driver.quit();
  }
})();
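The ordering problem is not specific to JavaScript. As a rough analogy in Python's asyncio (not Selenium code; `fake_command` is a hypothetical stand-in for a browser command that takes time), a command that has been started but not awaited finishes after the statements that follow it:

```python
import asyncio


async def fake_command(log, name, delay):
    # Hypothetical stand-in for a browser command that takes time.
    await asyncio.sleep(delay)
    log.append(name)


async def without_waiting():
    log = []
    # The command is scheduled, but nothing waits for it yet.
    task = asyncio.create_task(fake_command(log, "get page", 0.01))
    log.append("after get")  # runs before the command has finished
    await task  # the command only completes here
    return log


async def with_waiting():
    log = []
    # Awaiting the command makes the next statement wait for it.
    await fake_command(log, "get page", 0.01)
    log.append("after get")
    return log


unordered = asyncio.run(without_waiting())
ordered = asyncio.run(with_waiting())
print(unordered)
print(ordered)
```

The first list shows the same symptom as the "Before sleep / After sleep" output in the question; the second shows the order once each step is awaited.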
Tests using the 'currentURL()' started to randomly fail when the URL changes more than 2 times (in the same test case).
This started after the application was updated (ember-cli) from the version 1.X to the ~2.16.2.
I've tried (With no positive results):
Updating the ember-cli-qunit.
Using import { currentURL } from '@ember/test-helpers';.
Update:
Here is an example of one of those tests (Marked as 'Flaky' the problematic parts):
test('My test', async () => {
  await visit('/testing-page-1');
  const selectSortId = find('[data-test-select="my-select"]');
  equal(selectSortId.val(), "1", "Wrong selection,...");
  await selectSortId.val(3).trigger('change');
  equal(currentURL(), '/testing-page-2'); // All good here
  const firstEditButton = find('[data-test-button="1"]');
  await click(firstEditButton);
  equal(currentURL(), '/testing-page-3'); // Flaky
  const secondButton = find('[data-test-button="2"]');
  await click(secondButton);
  equal(currentURL(), '/testing-page-4') // Flaky
});
I finally fixed this, and it was not an issue with the Ember test suite...
There was an Ember.hash in the app, and one of its calls returned data needed to populate our URLs with some parameters.
In production, the call with the data was noticeably slower than the others, so we luckily got the data.
In testing mode, this data was being overwritten by the other calls.
i.e., it was a matter of bad design that needed to be refactored...
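The failure mode can be sketched in Python rather than Ember. This is only an illustration under assumptions: `fetch`, `load_shared`, `load_keyed`, the key names and the delays are all hypothetical and not taken from the app.

```python
import asyncio


async def fetch(name, delay):
    # Hypothetical request: resolves to its own name after `delay` seconds.
    await asyncio.sleep(delay)
    return name


async def load_shared(delays):
    # Fragile design: every call writes its result into the same slot,
    # so whichever call finishes last wins.
    state = {}

    async def fetch_into_state(name, delay):
        state["params"] = await fetch(name, delay)

    await asyncio.gather(*(fetch_into_state(n, d) for n, d in delays.items()))
    return state["params"]


async def load_keyed(delays):
    # Sturdier design: keep each result under its own key.
    names = list(delays)
    results = await asyncio.gather(*(fetch(n, delays[n]) for n in names))
    return dict(zip(names, results))


# "Production": the call carrying the URL parameters is the slowest.
slow_params = asyncio.run(load_shared({"url_params": 0.05, "other": 0.01}))
# "Testing": the same call is now the fastest and gets overwritten.
fast_params = asyncio.run(load_shared({"url_params": 0.01, "other": 0.05}))
keyed = asyncio.run(load_keyed({"url_params": 0.01, "other": 0.05}))
print(slow_params, fast_params, keyed)
```

With the shared slot, the surviving value depends only on timing; with one key per call, timing no longer matters.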
After sending keys to an input field with selenium, the result is not as expected - the keys are inserted in incorrect order.
e.g. send_keys('4242424242424242') -> result is "4224242424242424"
EDIT: On some machines I observe the issue only randomly, 1 case out of 10 attempts. On another machine it is 10/10
This happens specifically with Stripe payment form + I see this problem only in Chrome version 69 (in previous versions it worked OK)
This can be easily reproduced on sample Stripe site: https://stripe.github.io/elements-examples/
Sample python code:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://stripe.github.io/elements-examples/')
driver.switch_to.frame(driver.find_element_by_tag_name('iframe')) # First iframe
cc_input = driver.find_element_by_css_selector('input[name="cardnumber"]')
cc_input.send_keys('4242424242424242')
Result:
I am able to get past this by sending the keys one by one with a slight delay, but this is also not 100% reliable (and terribly slow).
I am not sure if this is a problem with selenium (3.14.1)/chromedriver (2.41.578737) or if I am doing something wrong.
Any ideas please?
We are having the exact same problem on MacOS and Ubuntu 18.04, as well as on our CI server with protractor 5.4.1 and the same version of selenium and chromedriver. It has only started failing since Chrome 69, worse in v70.
Update - Working (for the moment)
After much further investigation, I remembered that React tends to override change/input events, and that the values in the credit card input, ccv input etc are being rendered from the React Component State, not from just the input value. So I started looking, and found What is the best way to trigger onchange event in react js
Our tests are working (for the moment):
//Example credit input
function creditCardInput (): ElementFinder {
  return element(by.xpath('//input[contains(@name, "cardnumber")]'))
}
/// ... snippet of our method ...
await ensureCreditCardInputIsReady()
await stripeInput(creditCardInput, ccNumber)
await stripeInput(creditCardExpiry, ccExpiry)
await stripeInput(creditCardCvc, ccCvc)
await browser.wait(this.hasCreditCardZip(), undefined, 'Should have a credit card zip')
await stripeInput(creditCardZip, ccZip)
await browser.switchTo().defaultContent()
/// ... snip ...
async function ensureCreditCardInputIsReady (): Promise<void> {
  await browser.wait(ExpectedConditions.presenceOf(paymentIFrame()), undefined, 'Should have a payment iframe')
  await browser.switchTo().frame(await paymentIFrame().getWebElement())
  await browser.wait(
    ExpectedConditions.presenceOf(creditCardInput()),
    undefined,
    'Should have a credit card input'
  )
}
/**
 * SendKeys for the Stripe gateway was having issues in Chrome since version 69. Keys were coming in out of order,
 * which resulted in failed tests.
 */
async function stripeInput (inputElement: Function, value: string): Promise<void> {
  await browser.executeScript(`
    var nativeInputValueSetter = Object.getOwnPropertyDescriptor(window.HTMLInputElement.prototype, "value").set;
    nativeInputValueSetter.call(arguments[0], '${value}');
    var inputEvent = new Event('input', { bubbles: true});
    arguments[0].dispatchEvent(inputEvent);
    `, inputElement()
  )
  await browser.sleep(100)
  const typedInValue = await inputElement().getWebElement().getAttribute('value')
  if (typedInValue.replace(/\s/g, '') === value) {
    return
  }
  throw new Error(`Failed set '${typedInValue}' on ${inputElement}`)
}
Previous Idea (only worked occasionally):
I have setup a minimal repro using https://stripe.com/docs/stripe-js/elements/quickstart and it succeeds when tests are run sequentially, but not in parallel (we think due to focus/blur issues when switching to the iframes).
Our solution is similar, although we noticed from watching the tests that input.clear() wasn't working on the tel inputs which are used in the iframe.
This still fails occasionally, but far less frequently.
/**
 * Types a value into an input field, and checks if the value of the input
 * matches the expected value. If not, it attempts for `maxAttempts` times to
 * type the value into the input again.
 *
 * This works around an issue with ChromeDriver where sendKeys() can send keys out of order,
 * so a string like "0260" gets typed as "0206" for example.
 *
 * It also works around an issue with IEDriver where sendKeys() can press the SHIFT key too soon
 * and cause letters or numbers to be converted to their SHIFT variants, "6" gets typed as "^", for example.
 */
export async function slowlyTypeOutField (
  value: string,
  inputElement: Function,
  maxAttempts = 20
): Promise<void> {
  for (let attemptNumber = 0; attemptNumber < maxAttempts; attemptNumber++) {
    if (attemptNumber > 0) {
      await browser.sleep(100)
    }
    /*
      Executing a script seems to be a lot more reliable in setting these flaky fields than using the sendKeys built-in
      method. However, I struggled in finding out which JavaScript events Stripe listens to. So we send the last key to
      the input field to trigger all events we need.
    */
    const firstPart = value.substring(0, value.length - 1)
    const secondPart = value.substring(value.length - 1, value.length)
    await browser.executeScript(`
      arguments[0].focus();
      arguments[0].value = "${firstPart}";
      `,
      inputElement()
    )
    await inputElement().sendKeys(secondPart)
    const typedInValue = await inputElement().getAttribute('value')
    if (typedInValue === value) {
      return
    }
    console.log(`Tried to set value ${value}, but instead set ${typedInValue} on ${inputElement}`)
  }
  throw new Error(`Failed after ${maxAttempts} attempts to set value on ${inputElement}`)
}
I faced a similar issue in ubuntu 14.04, the following trick helped me.
Have not got any issue since.
First I used the regular send_keys method.
Then I called the execute script to update the value
input_data = "someimputdata"
some_xpath = "//*[contains(@id,'input_fax.number_')]"
element = web_driver_obj.find_element_by_xpath(some_xpath)
element.clear()
element.send_keys(input_data)
web_driver_obj.execute_script("arguments[0].value = '{0}';".format(input_data), element)
Edit
Thanks a lot to @Benno - his answer was correct.
I will just add the Python solution that worked for me, based on his JS
driver.get('https://stripe.github.io/elements-examples/')
driver.switch_to.frame(driver.find_element_by_tag_name('iframe')) # First iframe
cc_input = driver.find_element_by_css_selector('input[name="cardnumber"]')
value = "4242424242424242"
driver.execute_script('''
input = arguments[0];
var nativeInputValueSetter = Object.getOwnPropertyDescriptor(window.HTMLInputElement.prototype, "value").set;
nativeInputValueSetter.call(input, "{}");
var eventCard = new Event("input", {{bubbles: true}});
input.dispatchEvent(eventCard);
'''.format(value), cc_input)
driver.switch_to.default_content()
driver.quit()
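A variation on the same idea, offered as a sketch rather than a tested fix: instead of formatting the value into the script text, it can be passed as a second script argument, which avoids quoting problems if the value contains quotes. `set_input_value` and `FakeDriver` are hypothetical names; the only Selenium call assumed is `execute_script(script, *args)`.

```python
SET_VALUE_SCRIPT = """
var input = arguments[0];
var setter = Object.getOwnPropertyDescriptor(
    window.HTMLInputElement.prototype, "value").set;
setter.call(input, arguments[1]);
input.dispatchEvent(new Event("input", {bubbles: true}));
"""


def set_input_value(driver, element, value):
    # The element arrives in the script as arguments[0], the value as arguments[1].
    driver.execute_script(SET_VALUE_SCRIPT, element, value)


class FakeDriver:
    # Hypothetical stand-in for a WebDriver, used only for a dry run.
    def __init__(self):
        self.calls = []

    def execute_script(self, script, *args):
        self.calls.append((script, args))


fake_driver = FakeDriver()
set_input_value(fake_driver, "cc_input", "4242424242424242")
print(fake_driver.calls[0][1])
```

With a real driver, `set_input_value(driver, cc_input, value)` would take the place of the `execute_script(...format(value), cc_input)` call above.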
After a couple of hours of trying, I gave up, accepted the fact that this is really a random issue, and went with a workaround.
Where it is not necessary to update, I will stay with Chrome version < 69
In order to test latest Chrome, I will use React solution
What I've found out
The issue manifested itself mostly on MacOS, quite rarely on Windows (there are most probably other factors in play, this is just an observation)
I've run an experiment with 100 repetitions of filling the form.
Mac - 68 failures
Windows - 6 failures
The cookies/local history (as suggested in comments) do not seem to be the problem. The webdriver always spawned a "clean" instance of the browser, with no cookies or local storage.
Maybe my solution will help for somebody:
I used sendKeys(" 4242424242424242")
Same for cvc field
With a space before string, it actually works for selenide + chrome + java
You could make your own generic SendKeys method that takes the input element and the string you would like to send. The method would split the string into individual characters and then use the selenium sendkeys method on each character.
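A minimal sketch of such a helper in Python, assuming only that the element exposes Selenium's `send_keys()`; the name `send_keys_one_by_one`, the default delay and the `FakeElement` class are hypothetical:

```python
import time


def send_keys_one_by_one(element, text, delay=0.05):
    """Send `text` to `element` one character at a time."""
    for char in text:
        element.send_keys(char)
        # Short pause so each key is handled before the next one is sent.
        time.sleep(delay)


class FakeElement:
    # Hypothetical stand-in for a Selenium WebElement, used for a dry run.
    def __init__(self):
        self.calls = []

    def send_keys(self, *keys):
        self.calls.append("".join(keys))


fake_element = FakeElement()
send_keys_one_by_one(fake_element, "4242", delay=0)
print(fake_element.calls)
```

As noted earlier in the thread, typing key by key is slow and was not fully reliable for everyone, so it may be worth verifying the field's value afterwards.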
Adding in some backspaces worked for me for whatever reason:
from selenium.webdriver.common.keys import Keys
my_value = "123"
my_xpath="//input[@class='form-text']"
element = driver.find_element_by_xpath(my_xpath)
element.clear()
element.send_keys(Keys.BACKSPACE * 3, my_value)
I was having the same issue using RSelenium and got the idea to try adding spaces to the credit card number as they appear on the card from @Pavel's answer, since adding a space before the card number didn't work for me.
Using RSelenium this would be:
element$sendKeysToElement(list("4242 4242 4242 4242"))
I am using the following code, which uses the imagesLoaded package with a callback to tell me when an element with a particular csspath has finished loading all of its images:
imagesLoadedScript = "imagesLoaded( '#{csspath}', { background: true }, function(message) { console.log('PHANTOM CLIENT REPORTING: #{csspath} Images Loaded'); return message; })"
imagesLoadedScript = imagesLoadedScript.strip.gsub(/\s+/,' ')
@session.evaluate_script(imagesLoadedScript)
The timing of the console.log statement, on inspection of PhantomJS logs with debug on, indicates that Capybara/Poltergeist is not waiting for the images to load, as expected, before it moves on to the next statement. I also cannot return a true (or false) value from inside the callback as I would like.
Capybara responds with
{"command_id":"678f1e2e-4820-4631-8cd6-413ce6f4b66f","response":"(cyclic structure)"}
Anyone have any ideas on how to return a value from inside a callback in a function executed via evaluate_script?
Many thanks.
TLDR; You can't
evaluate_script doesn't support asynchronous functions - you must return the result you want from the function passed in. One way to do what you want would be to execute the imagesLoaded script and have the callback set a global variable, and then loop on an evaluate_script fetching the result of the global until it's what you want - A very basic implementation would be something like
imagesLoadedScript = "window.allImagesLoaded = false; imagesLoaded( '#{csspath}', { background: true }, function() { window.allImagesLoaded = true })"
@session.execute_script(imagesLoadedScript)
while !@session.evaluate_script('window.allImagesLoaded')
  sleep 0.05
end
Obviously this could be made more flexible with a timeout ability, etc.
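The timeout idea is language-independent. A minimal sketch in Python (not Capybara code; `wait_until` and `images_loaded` are hypothetical names, and in real use the predicate would be the call that reads the global flag from the page):

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.05):
    """Poll `predicate` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Dry run: the flag "becomes true" on the third poll.
polls = {"count": 0}


def images_loaded():
    polls["count"] += 1
    return polls["count"] >= 3


loaded = wait_until(images_loaded, timeout=1.0, interval=0.01)
timed_out = wait_until(lambda: False, timeout=0.05, interval=0.01)
print(loaded, timed_out)
```

Returning False instead of looping forever lets the caller decide whether a missing flag should fail the test or just be logged.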
A second option would be to write a custom Capybara selector type for images with a loaded filter, although with the need for background image checking it would become pretty complicated and probably too slow to be useful.
Just in case someone finds this later.
I did roughly what Thomas Walpole suggested in his answer, in a more roundabout fashion, but taking advantage of Poltergeist's inherent waiting capabilities;
#to check that the target has loaded its images, run images loaded
#after a small timeout to allow the page to get the images
#append a marker div to the dom if the images have successfully loaded
imagesLoadedScript = "var item = document.querySelector('#{csspath}');
                      window.scroll(0, item.offsetTop);
                      function imagesDone(path, fn) {
                        imagesLoaded( path, function(instance) {
                          console.log('PHANTOM CLIENT REPORTING: ' + path + ' Images Loaded');
                          fn(true);
                        })
                      }
                      setTimeout(function(){
                        imagesDone('#{csspath}', function(done) {
                          var markerDiv = document.createElement('div');
                          markerDiv.id = 'ImagesLoadedMarker';
                          document.getElementsByTagName('html')[0].appendChild(markerDiv);
                        });
                      }, 1000)"
#then we strip the new lines and spaces that we added to make it readable
imagesLoadedScript = imagesLoadedScript.strip.gsub(/\s+/,' ')
#now we just execute the script as we do not need a return value
@session.execute_script(imagesLoadedScript)
#then we check for the marker, using capybara's inbuilt waiting time
if @session.has_xpath? "//*[@id ='ImagesLoadedMarker']"
  Rails.logger.debug "!!!!! PhantomClient: Images Loaded Reporting: #{csspath} Images Loaded: Check Time #{Time.now} !!!!!"
  @session.save_screenshot(file_path, :selector => csspath)
else
  Rails.logger.debug "!!!!! PhantomClient: Images Loaded Reporting: #{csspath} Images NOT Loaded: Check Time #{Time.now} !!!!!"
  @session.save_screenshot(file_path, :selector => csspath)
end
I see that CasperJS has a "download" function and an "on resource received" callback but I do not see the contents of a resource in the callback, and I don't want to download the resource to the filesystem.
I want to grab the contents of the resource so that I can do something with it in my script. Is this possible with CasperJS or PhantomJS?
This problem has been in my way for the last couple of days. The proxy solution wasn't very clean in my environment so I found out where phantomjs's QTNetworking core put the resources when it caches them.
Long story short, here is my gist. You need the cache.js and mimetype.js files:
https://gist.github.com/bshamric/4717583
//for this to work, you have to call phantomjs with the cache enabled:
//usage: phantomjs --disk-cache=true test.js
var page = require('webpage').create();
var fs = require('fs');
var cache = require('./cache');
var mimetype = require('./mimetype');
//this is the path that QTNetwork classes uses for caching files for it's http client
//the path should be the one that has 16 folders labeled 0,1,2,3,...,F
cache.cachePath = '/Users/brandon/Library/Caches/Ofi Labs/PhantomJS/data7/';
var url = 'http://google.com';
page.viewportSize = { width: 1300, height: 768 };
//when the resource is received, go ahead and include a reference to it in the cache object
page.onResourceReceived = function(response) {
  //I only cache images, but you can change this
  if(response.contentType.indexOf('image') >= 0)
  {
    cache.includeResource(response);
  }
};
//when the page is done loading, go through each cachedResource and do something with it,
//I'm just saving them to a file
page.onLoadFinished = function(status) {
  for(index in cache.cachedResources) {
    var file = cache.cachedResources[index].cacheFileNoPath;
    var ext = mimetype.ext[cache.cachedResources[index].mimetype];
    var finalFile = file.replace("."+cache.cacheExtension,"."+ext);
    fs.write('saved/'+finalFile,cache.cachedResources[index].getContents(),'b');
  }
};
page.open(url, function () {
  page.render('saved/google.pdf');
  phantom.exit();
});
Then when you call phantomjs, just make sure the cache is enabled:
phantomjs --disk-cache=true test.js
Some notes:
I wrote this for the purpose of getting the images on a page without using the proxy or taking a low res snapshot. QT uses compression on certain text file resources and you will have to deal with the decompression if you use this for text files. Also, I ran a quick test to pull in html resources and it didn't parse the http headers out of the result. But, this is useful to me, hopefully someone else will find it so, modify it if you have problems with a specific content type.
I've found that until the phantomjs matures a bit, according to the issue 158 http://code.google.com/p/phantomjs/issues/detail?id=158 this is a bit of a headache for them.
So you want to do it anyways? I've opted to go a bit higher to accomplish this and have grabbed PyMiProxy over at https://github.com/allfro/pymiproxy, downloaded, installed, set it up, took their example code and made this in proxy.py
# Python 2 code: mimetools, StringIO and the print statement are Python 2 only
from miproxy.proxy import RequestInterceptorPlugin, ResponseInterceptorPlugin, AsyncMitmProxy
from mimetools import Message
from StringIO import StringIO
class DebugInterceptor(RequestInterceptorPlugin, ResponseInterceptorPlugin):
    def do_request(self, data):
        # ask the server not to gzip the response
        data = data.replace('Accept-Encoding: gzip\r\n', 'Accept-Encoding:\r\n', 1)
        return data
    def do_response(self, data):
        #print '<< %s' % repr(data[:100])
        request_line, headers_alone = data.split('\r\n', 1)
        headers = Message(StringIO(headers_alone))
        print "Content type: %s" %(headers['content-type'])
        if headers['content-type'] == 'text/x-comma-separated-values':
            # the with block closes the file once the data is written
            with open('data.csv', 'w') as f:
                f.write(data)
        print ''
        return data
if __name__ == '__main__':
    proxy = AsyncMitmProxy()
    proxy.register_interceptor(DebugInterceptor)
    try:
        proxy.serve_forever()
    except KeyboardInterrupt:
        proxy.server_close()
Then I fire it up
python proxy.py
Next I execute phantomjs with the proxy specified...
phantomjs --ignore-ssl-errors=yes --cookies-file=cookies.txt --proxy=127.0.0.1:8080 --web-security=no myfile.js
You may want to turn your security on or such, it was needless for me currently as I'm scraping just one source. You should now see a bunch of text flowing through your proxy console and if it lands on something with the mime type of "text/x-comma-separated-values" it'll save it as data.csv. This will also save all the headers and everything, but if you've come this far I'm sure you can figure out how to pop those off.
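For popping the headers off, one possible approach is to split the raw response at the first blank line. The sketch below is written for Python 3 (unlike the Python 2 proxy script above), and `split_http_response` and the sample bytes are hypothetical:

```python
def split_http_response(raw):
    """Split raw HTTP response bytes into (status line, headers dict, body)."""
    # Headers end at the first empty line, i.e. the first CRLF CRLF.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.split(b"\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(b":")
        # Header names are case-insensitive, so store them lower-cased.
        headers[name.strip().lower().decode("latin-1")] = value.strip().decode("latin-1")
    return status_line.decode("latin-1"), headers, body


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/x-comma-separated-values\r\n"
    b"Content-Length: 8\r\n"
    b"\r\n"
    b"a,b\r\n1,2"
)
status, headers, body = split_http_response(sample)
print(status)
print(headers["content-type"])
print(body)
```

Writing only `body` to data.csv would leave the status line and headers out of the file.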
One other detail, I've found that I've had to disable gzip encoding, I could use zlib and decompress data in gzip from my own apache webserver, but if it comes out of IIS or such the decompression will get errors and I'm not sure about that part of it.
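For reference, gzip-encoded bytes can be decompressed with zlib alone by asking it to expect a gzip wrapper. This is a Python 3 sketch with made-up sample data and a hypothetical `decompress_gzip_body` helper; it does not attempt to explain the IIS errors mentioned above.

```python
import gzip
import zlib


def decompress_gzip_body(body):
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    return zlib.decompress(body, 16 + zlib.MAX_WBITS)


original = b"date,kwh\n2013-01-01,12.5\n"
compressed = gzip.compress(original)
restored = decompress_gzip_body(compressed)
print(restored == original)
```

Only the response body should be passed in; if the headers are still attached, decompression fails because the data does not start with the gzip magic bytes.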
So my power company won't offer me an API? Fine! We do it the hard way!
Did not realize I could grab the source from the document object like this:
casper.start(url, function() {
var js = this.evaluate(function() {
return document;
});
this.echo(js.all[0].outerHTML);
});
More info here.
You can use Casper.debugHTML() to print out contents of a HTML resource:
var casper = require('casper').create();
casper.start('http://google.com/', function() {
this.debugHTML();
});
casper.run();
You can also store the HTML contents in a var using casper.getPageContent(): http://casperjs.org/api.html#casper.getPageContent (available in latest master)