I have installed SenseNet version 6.5 (code from CodePlex). I wanted to upload files into the Content Repository using the SenseNet Client API, but unfortunately it does not work for bulk uploads.
string[] fileEntries = Directory.GetFiles(@"C:\Users\conyna\Downloads\Chirag");
foreach (string fileName in fileEntries)
{
using (Stream fs = File.OpenRead(fileName))
{
string fn = Path.GetFileName(fileName);
Task<SenseNet.Client.Content> x = SenseNet.Client.Content.UploadAsync("/Root/Sites/Default_Site/workspaces/(apps)/DocumentLibrary", fn, fs);
}
}
There are two problems with the code above:
1. You have to await async methods. Currently you start the task with the UploadAsync method but do not wait for it to finish. This causes problems because the file stream is closed immediately after the upload task starts. Please upload files asynchronously (of course, you'll have to make your caller method async too, but that is the point of using an async API):
await Content.UploadAsync(...)
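The ordering problem can be illustrated outside SenseNet with a small Python asyncio sketch (Python is used only as a neutral illustration). `fake_upload` is a hypothetical stand-in for Content.UploadAsync, not a SenseNet API; the point is that the stream must stay open until the awaited upload has finished.

```python
import asyncio
import io


async def fake_upload(stream: io.BytesIO) -> int:
    """Hypothetical stand-in for Content.UploadAsync: yields once, then reads the stream."""
    await asyncio.sleep(0)  # the "upload" does not complete synchronously
    return len(stream.read())  # raises ValueError if the stream is already closed


async def start_without_await(data: bytes) -> asyncio.Task:
    # Mirrors the question: start the upload, then leave the using/with block at once.
    with io.BytesIO(data) as stream:
        task = asyncio.ensure_future(fake_upload(stream))
    return task  # the stream is closed before the task gets to run


async def upload_with_await(data: bytes) -> int:
    # The fix: await inside the block so the stream outlives the upload.
    with io.BytesIO(data) as stream:
        return await fake_upload(stream)
```

Awaiting the task returned by `start_without_await` fails with "I/O operation on closed file", which is the same failure mode as disposing the FileStream before UploadAsync completes.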
You may also consider using the Importer class in the client, it is able to import full directory structures.
2. You are trying to upload into an (apps) folder, which is not a correct target: it was designed to contain applications (mostly pages). It would be better to upload into a document library in a workspace, for example:
/Root/Sites/Default_Site/workspaces/Document/SampleWorkspace/DocumentLibrary
We created a small application with the SN Client Library. I think you can use this application/information/code.
This application can upload entire folders via the Client Library. Please check out my GitHub repository: https://github.com/marosvolgyiz/SNClientLibraryUploader
Here is the relevant upload method:
public async Task Upload()
{
try
{
Console.WriteLine("Initialization...");
ClientContext.Initialize(new[] { sctx });
Console.WriteLine("Upload Started");
// Check that the parent (target) exists
var content = await Content.LoadAsync(Target);
if (content != null)
{
//Uploading files
var tasks = new List<Task>();
foreach (var file in Files)
{
string fileTargetFolder = Target + file.DirectoryName.Replace(Source, "").Replace(BaseDirectory, "").Replace("\\", "/");
var fileTargetContentFolder = await Content.LoadAsync(fileTargetFolder);
if (fileTargetContentFolder == null)
{
if (CreateFolderPath(Target, file.DirectoryName.Replace(Source, "")))
{
fileTargetContentFolder = await Content.LoadAsync(fileTargetFolder);
Console.WriteLine("#Upload file: " + file.FullName);
tasks.Add(Content.UploadAsync(fileTargetContentFolder.Id, file.Name, file.OpenRead()));
LoggerClass.LogToCSV("File uploaded", file.Name);
}
else
{
LoggerClass.LogToCSV("File target folder does not exist or you do not have enough permission to see! File can not be uploaded. ", file.Name);
}
}
else
{
Console.WriteLine("#Upload file: " + file.FullName);
tasks.Add(Content.UploadAsync(fileTargetContentFolder.Id, file.Name, file.OpenRead()));
LoggerClass.LogToCSV("File uploaded", file.Name);
}
}
await Task.WhenAll(tasks);
}
else
{
Console.WriteLine("Target does not exist or you do not have enough permission to see!");
LoggerClass.LogToCSV("Target does not exist or you do not have enough permission to see!");
}
Console.WriteLine("Upload finished.");
}
catch (Exception ex)
{
LoggerClass.LogToCSV(ex.Message);
}
}
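The folder mapping on the `fileTargetFolder` line (Replace calls on the local path, then swapping "\\" for "/") can also be done with a relative-path computation, which cannot accidentally mangle a folder whose name contains the source path text. This is a minimal Python sketch; `target_folder_for` is a hypothetical helper, and it assumes Windows-style local paths.

```python
from pathlib import PureWindowsPath


def target_folder_for(file_path: str, source_root: str, target_root: str) -> str:
    """Map a local file to the repository folder it should be uploaded into.

    Hypothetical helper: the file's directory is taken relative to source_root
    (ValueError if it is not under it) and appended to target_root with "/".
    """
    relative_dir = PureWindowsPath(file_path).parent.relative_to(PureWindowsPath(source_root))
    # A file directly in source_root has no relative parts, so it maps to target_root itself.
    parts = [target_root.rstrip("/")] + list(relative_dir.parts)
    return "/".join(parts)
```

For example, `C:\Docs\Reports\2020\a.pdf` with source `C:\Docs` maps to `<target>/Reports/2020`.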
I hope my answer is helpful to you.
Br,
maros
Can anyone tell me how I can make Selenium wait until the page loads completely? I want something generic. I know I can configure a WebDriverWait and call something like 'find' to make it wait, but I don't want to go that far. I just need to test that the page loads successfully and then move on to the next page to test.
I found something in .NET but couldn't make it work in Java:
IWait<IWebDriver> wait = new OpenQA.Selenium.Support.UI.WebDriverWait(driver, TimeSpan.FromSeconds(30.00));
wait.Until(driver1 => ((IJavaScriptExecutor)driver).ExecuteScript("return document.readyState").Equals("complete"));
Any thoughts anyone?
Your suggested solution only waits for the DOM readyState to signal complete. But Selenium by default already tries to wait for that (and a little bit more) on page loads via the driver.get() and element.click() methods. They are blocking: they wait for the page to fully load, and that part should work fine.
The problem, obviously, is redirects via AJAX requests and running scripts: Selenium can't catch those and doesn't wait for them to finish. You also can't reliably catch them via readyState; it waits for a bit, which can be useful, but it will signal complete long before all the AJAX content is downloaded.
There is no general solution that would work everywhere and for everyone, that's why it's hard and everyone uses something a little bit different.
The general rule is to rely on WebDriver to do its part, then use implicit waits, then use explicit waits for the elements you want to assert on the page, but there are many more techniques. You should pick the one (or a combination of several) that works best in your case, on your tested page.
See my two answers regarding this for more information:
How I can check whether page is loaded completely or not in web driver
Selenium Webdriver : Wait for complex page with javascript to load
Try this code:
driver.manage().timeouts().pageLoadTimeout(10, TimeUnit.SECONDS);
The above code waits up to 10 seconds for the page to load. If loading takes longer, it throws a TimeoutException, which you can catch and handle as needed. I am not sure whether the page load is aborted after the exception is thrown; I haven't tried this code yet, so give it a try.
Like an implicit wait, this is a driver-wide setting: once set, it stays in effect until the WebDriver instance is destroyed.
See the documentation for WebDriver.Timeouts for more info.
This is a working Java version of the example you gave:
void waitForLoad(WebDriver driver) {
new WebDriverWait(driver, 30).until((ExpectedCondition<Boolean>) wd ->
((JavascriptExecutor) wd).executeScript("return document.readyState").equals("complete"));
}
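Every readyState answer in this thread follows the same pattern: poll a condition until it holds or a timeout expires. As a language-neutral sketch in Python (the helper name `wait_until` is hypothetical, not a Selenium API), that loop looks roughly like this:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]], timeout: float = 30.0,
               poll_interval: float = 0.5) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    A rough model of an explicit wait: the first truthy result is returned,
    and TimeoutError is raised if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)
```

With the Python bindings, `condition` could be `lambda: driver.execute_script("return document.readyState") == "complete"`. Selenium's own WebDriverWait adds extras (such as ignoring configured exceptions) that this sketch omits.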
Example for C#:
public static void WaitForLoad(IWebDriver driver, int timeoutSec = 15)
{
IJavaScriptExecutor js = (IJavaScriptExecutor)driver;
WebDriverWait wait = new WebDriverWait(driver, new TimeSpan(0, 0, timeoutSec));
wait.Until(wd => js.ExecuteScript("return document.readyState").ToString() == "complete");
}
Example for PHP:
final public function waitUntilDomReadyState(RemoteWebDriver $webDriver): void
{
$webDriver->wait()->until(function () use ($webDriver) {
return $webDriver->executeScript('return document.readyState') === 'complete';
});
}
Here's my attempt at a completely generic solution, in Python:
First, a generic "wait" function (use a WebDriverWait if you like, I find them ugly):
import time
def wait_for(condition_function):
    start_time = time.time()
    while time.time() < start_time + 3:
        if condition_function():
            return True
        else:
            time.sleep(0.1)
    raise Exception('Timeout waiting for {}'.format(condition_function.__name__))
Next, the solution relies on the fact that selenium records an (internal) id-number for all elements on a page, including the top-level <html> element. When a page refreshes or loads, it gets a new html element with a new ID.
So, assuming you want to click on a link with text "my link" for example:
old_page = browser.find_element_by_tag_name('html')
browser.find_element_by_link_text('my link').click()
def page_has_loaded():
    new_page = browser.find_element_by_tag_name('html')
    return new_page.id != old_page.id
wait_for(page_has_loaded)
For a more Pythonic, reusable, generic helper, you can make a context manager:
from contextlib import contextmanager
@contextmanager
def wait_for_page_load(browser):
    old_page = browser.find_element_by_tag_name('html')
    yield
    def page_has_loaded():
        new_page = browser.find_element_by_tag_name('html')
        return new_page.id != old_page.id
    wait_for(page_has_loaded)
And then you can use it on pretty much any selenium interaction:
with wait_for_page_load(browser):
    browser.find_element_by_link_text('my link').click()
I reckon that's bulletproof! What do you think?
More info in a blog post about it here.
I had a similar problem. I needed to wait until my document was ready but also until all Ajax calls had finished. The second condition proved to be difficult to detect. In the end I checked for active Ajax calls and it worked.
Javascript:
return (document.readyState == 'complete' && jQuery.active == 0)
Full C# method:
private void WaitUntilDocumentIsReady(TimeSpan timeout)
{
var javaScriptExecutor = WebDriver as IJavaScriptExecutor;
var wait = new WebDriverWait(WebDriver, timeout);
// Check if document is ready
Func<IWebDriver, bool> readyCondition = webDriver => (bool)javaScriptExecutor
.ExecuteScript("return (document.readyState == 'complete' && jQuery.active == 0)");
wait.Until(readyCondition);
}
WebDriverWait wait = new WebDriverWait(dr, 30);
wait.until(ExpectedConditions.jsReturnsValue("return document.readyState==\"complete\";"));
For C# NUnit, you need to cast the WebDriver to IJavaScriptExecutor and then execute a script to check whether document.readyState is complete. See the code below for reference:
public static void WaitForLoad(IWebDriver driver)
{
IJavaScriptExecutor js = (IJavaScriptExecutor)driver;
int timeoutSec = 15;
WebDriverWait wait = new WebDriverWait(driver, new TimeSpan(0, 0, timeoutSec));
wait.Until(wd => js.ExecuteScript("return document.readyState").ToString() == "complete");
}
This will wait until the condition is satisfied or timeout.
For the initial page load, I have noticed that maximizing the browser window effectively waits until the page load is complete (including resources).
Replace:
AppDriver.Navigate().GoToUrl(url);
With:
public void OpenURL(IWebDriver AppDriver, string Url)
{
try
{
AppDriver.Navigate().GoToUrl(Url);
AppDriver.Manage().Window.Maximize();
AppDriver.SwitchTo().ActiveElement();
}
catch (Exception e)
{
Console.WriteLine("ERR: {0}; {1}", e.TargetSite, e.Message);
throw;
}
}
Then use:
OpenURL(myDriver, myUrl);
This will load the page, wait until it has finished loading, maximize the window and focus on it. I don't know why it works like this, but it does.
If you want to wait for a page load after clicking Next or any navigation trigger other than Navigate(), Ben Dyer's answer (in this thread) will do the job.
In Node.js you can do this with promises.
If you write the following code, you can be sure that the page is fully loaded by the time you reach the then callback:
driver.get('www.sidanmor.com').then(()=> {
// here the page is fully loaded!!!
// do your stuff...
}).catch(console.log.bind(console));
If you write this code, you will navigate, and selenium will wait 3 seconds...
driver.get('www.sidanmor.com');
driver.sleep(3000);
// you can't be sure that the page is fully loaded!!!
// do your stuff... hope it will be OK...
From Selenium documentation:
this.get( url ) → Thenable
Schedules a command to navigate to the given URL.
Returns a promise that will be resolved when the document has finished loading.
Selenium Documentation (Nodejs)
Have a look at the Tapestry web framework. You can download the source code there.
The idea is to signal that the page is ready via an HTML attribute on the body element. This idea lets you ignore complicated use cases.
<html>
<head>
</head>
<body data-page-initialized="false">
<p>Write you page here</p>
<script>
$(document).ready(function () {
$(document.body).attr('data-page-initialized', 'true');
});
</script>
</body>
</html>
Then create an extension method for Selenium WebDriver (modeled on the Tapestry framework):
public static void WaitForPageToLoad(this IWebDriver driver, int timeout = 15000)
{
//wait a bit for the page to start loading
Thread.Sleep(100);
// In a limited number of cases, a "page" is a container error page or raw HTML content
// that does not include the body element and data-page-initialized attribute. In those cases,
// there will never be page initialization in the Tapestry sense and we return immediately.
if (!driver.ElementIsDisplayed("/html/body[@data-page-initialized]"))
{
return;
}
Stopwatch stopwatch = Stopwatch.StartNew();
int sleepTime = 20;
while(true)
{
if (driver.ElementIsDisplayed("/html/body[@data-page-initialized='true']"))
{
return;
}
if (stopwatch.ElapsedMilliseconds > timeout) // honor the timeout parameter instead of a hard-coded 30000
{
throw new Exception($"Page did not finish initializing after {timeout} ms.");
}
Thread.Sleep(sleepTime);
sleepTime *= 2; // geometric row of sleep time
}
}
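The loop above doubles sleepTime on every pass with no upper bound, so the sleeps grow quickly (20, 40, 80, ... ms) and a single late sleep can run well past the deadline. Here is a small Python sketch of the sleep schedule with a cap and a deadline; `max_sleep_ms` is a hypothetical addition that is not part of the Tapestry-style code, and a real loop would interleave these sleeps with the data-page-initialized check.

```python
from typing import List


def backoff_schedule(initial_ms: int, timeout_ms: int, max_sleep_ms: int) -> List[int]:
    """Sleep intervals for a doubling (geometric) poll loop that respects a deadline.

    Each sleep doubles, but is capped at max_sleep_ms (hypothetical) and trimmed
    so that the total never exceeds timeout_ms.
    """
    schedule = []
    elapsed = 0
    sleep = initial_ms
    while elapsed < timeout_ms:
        step = min(sleep, max_sleep_ms, timeout_ms - elapsed)
        schedule.append(step)
        elapsed += step
        sleep *= 2
    return schedule
```

For example, `backoff_schedule(20, 1000, 320)` gives `[20, 40, 80, 160, 320, 320, 60]`, which sums to exactly 1000 ms.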
This uses the ElementIsDisplayed extension written by Alister Scott:
public static bool ElementIsDisplayed(this IWebDriver driver, string xpath)
{
try
{
return driver.FindElement(By.XPath(xpath)).Displayed;
}
catch(NoSuchElementException)
{
return false;
}
}
Finally, create a test:
driver.Url = this.GetAbsoluteUrl("/Account/Login");
driver.WaitForPageToLoad();
Ben Dryer's answer didn't compile on my machine ("The method until(Predicate<WebDriver>) is ambiguous for the type WebDriverWait").
Working Java 8 version:
Predicate<WebDriver> pageLoaded = wd -> ((JavascriptExecutor) wd).executeScript(
"return document.readyState").equals("complete");
new FluentWait<WebDriver>(driver).until(pageLoaded);
Java 7 version:
Predicate<WebDriver> pageLoaded = new Predicate<WebDriver>() {
@Override
public boolean apply(WebDriver input) {
return ((JavascriptExecutor) input).executeScript("return document.readyState").equals("complete");
}
};
new FluentWait<WebDriver>(driver).until(pageLoaded);
I tried this code and it works for me. I call this function every time I move to another page
public static void waitForPageToBeReady()
{
JavascriptExecutor js = (JavascriptExecutor)driver;
//This loop runs up to 400 times, checking every second whether the page is ready.
//Change the loop bound if you want to increase or decrease the wait time.
for (int i=0; i<400; i++)
{
try
{
Thread.sleep(1000);
}catch (InterruptedException e) {}
//To check page ready state.
if (js.executeScript("return document.readyState").toString().equals("complete"))
{
break;
}
}
}
Waiting for document.readyState is not the entire fix to this problem, because the code still has a race condition: sometimes it runs before the click event is processed, so it returns immediately because the browser hasn't started loading the new page yet.
After some searching I found a post on Obey the Testing Goat, which has a solution for this problem. The C# code for that solution is something like this:
IWebElement page = null;
...
public void WaitForPageLoad()
{
if (page != null)
{
var waitForCurrentPageToStale = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
waitForCurrentPageToStale.Until(ExpectedConditions.StalenessOf(page));
}
var waitForDocumentReady = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
waitForDocumentReady.Until((wdriver) => (driver as IJavaScriptExecutor).ExecuteScript("return document.readyState").Equals("complete"));
page = driver.FindElement(By.TagName("html"));
}
I call this method directly after driver.Navigate().GoToUrl(), so that it gets a reference to the page as soon as possible. Have fun with it!
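The two-phase idea (first wait for the old root element to be replaced, then wait for readyState) can be sketched in Python without a browser by passing the two lookups in as functions. `wait_for_new_page` and its parameters are hypothetical names; with the Python bindings, `get_root_id` might be `lambda: driver.find_element(By.TAG_NAME, "html").id`.

```python
import time
from typing import Callable


def wait_for_new_page(get_root_id: Callable[[], str], get_ready_state: Callable[[], str],
                      old_root_id: str, timeout: float = 10.0, poll: float = 0.1) -> str:
    """Wait until the root element has been replaced, then until readyState is 'complete'.

    Checking the root element's identity first avoids the race where readyState
    still reports 'complete' for the old page because navigation has not started.
    Returns the new root id so the caller can keep it for the next navigation.
    """
    deadline = time.monotonic() + timeout
    # Phase 1: the old <html> element must go stale (be replaced).
    while get_root_id() == old_root_id:
        if time.monotonic() >= deadline:
            raise TimeoutError("old page never went stale")
        time.sleep(poll)
    # Phase 2: the new document must finish loading.
    while get_ready_state() != "complete":
        if time.monotonic() >= deadline:
            raise TimeoutError("new page never reached readyState 'complete'")
        time.sleep(poll)
    return get_root_id()
```

Both phases share one deadline, so the total wait is bounded by `timeout` rather than by twice that value.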
Normally, when Selenium opens a new page via click(), submit() or get(), it waits until the page is loaded. The problem is that when the page makes XHR (AJAX) calls, Selenium never waits for them to finish, so it is a good idea to create a new method that monitors the XHR calls and waits for them.
public boolean waitForJSandJQueryToLoad() {
WebDriverWait wait = new WebDriverWait(webDriver, 30);
// wait for jQuery to load
ExpectedCondition<Boolean> jQueryLoad = new ExpectedCondition<Boolean>() {
@Override
public Boolean apply(WebDriver driver) {
try {
Long r = (Long)((JavascriptExecutor)driver).executeScript("return $.active");
return r == 0;
} catch (Exception e) {
LOG.info("no jquery present");
return true;
}
}
};
// wait for Javascript to load
ExpectedCondition<Boolean> jsLoad = new ExpectedCondition<Boolean>() {
#Override
public Boolean apply(WebDriver driver) {
return ((JavascriptExecutor)driver).executeScript("return document.readyState")
.toString().equals("complete");
}
};
return wait.until(jQueryLoad) && wait.until(jsLoad);
}
If $.active == 0, there are no active XHR calls (this works only with jQuery).
For plain JavaScript AJAX calls, you have to create a counter variable in your project and maintain it yourself.
You can write some logic to handle this. I have written a method that returns the WebElement; if it returns null, it is retried up to three more times (you can increase the count), with a null check on the WebElement. Here is an example:
public static void main(String[] args) {
WebDriver driver = new FirefoxDriver();
driver.get("https://www.crowdanalytix.com/#home");
WebElement webElement = getWebElement(driver, "homekkkkkkkkkkkk");
int i = 1;
while (webElement == null && i < 4) {
webElement = getWebElement(driver, "homessssssssssss");
System.out.println("calling");
i++;
}
System.out.println(webElement != null ? webElement.getTagName() : "Element not found");
System.out.println("End");
driver.close();
}
public static WebElement getWebElement(WebDriver driver, String id) {
WebElement myDynamicElement = null;
try {
myDynamicElement = (new WebDriverWait(driver, 10))
.until(ExpectedConditions.presenceOfElementLocated(By
.id(id)));
return myDynamicElement;
} catch (TimeoutException ex) {
return null;
}
}
I execute JavaScript to check whether the document is ready. It saved me a lot of time debugging Selenium tests for sites that have client-side rendering. The code is Groovy:
public static boolean waitUntilDOMIsReady(WebDriver driver) {
def maxAttempts = DEFAULT_WAIT_SECONDS * 10 // one attempt every 100 ms
for (count in 1..maxAttempts) {
Thread.sleep(100)
if (isDOMReady(driver)) {
return true
}
}
return false
}
public static boolean isDOMReady(WebDriver driver){
// readyState is a string and any non-empty string is truthy in Groovy, so compare explicitly
return ((JavascriptExecutor) driver).executeScript("return document.readyState") == "complete"
}
public boolean waitForElement(String zoneName, String element, int index, int timeout) {
WebDriverWait wait = new WebDriverWait(appiumDriver, timeout/1000);
wait.until(ExpectedConditions.visibilityOfElementLocated(By.xpath(element)));
return true;
}
Following Rubanov's C# answer, here is a Java version:
public void waitForPageLoaded() {
ExpectedCondition<Boolean> expectation = new
ExpectedCondition<Boolean>() {
public Boolean apply(WebDriver driver) {
return (((JavascriptExecutor) driver).executeScript("return document.readyState").toString().equals("complete")&&((Boolean)((JavascriptExecutor)driver).executeScript("return jQuery.active == 0")));
}
};
try {
Thread.sleep(100);
WebDriverWait waitForLoad = new WebDriverWait(driver, 30);
waitForLoad.until(expectation);
} catch (Throwable error) {
Assert.fail("Timeout waiting for Page Load Request to complete.");
}
}
In Java, it would look like this:
private static boolean isloadComplete(WebDriver driver)
{
return ((JavascriptExecutor) driver).executeScript("return document.readyState").equals("loaded")
|| ((JavascriptExecutor) driver).executeScript("return document.readyState").equals("complete");
}
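The following code waits until elements matching //* are present. Note that presenceOfAllElementsLocated succeeds as soon as at least one element matches, so this is a fairly weak check: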
WebDriverWait wait = new WebDriverWait(driver, 10);
wait.until(ExpectedConditions.presenceOfAllElementsLocated(By.xpath("//*")));
If you have a slow page or network connection, chances are that none of the above will work. I have tried them all, and the only thing that worked for me is waiting for the last visible element on the page. Take the Bing web page, for example: they have placed a CAMERA icon (search by image button) next to the main search button that becomes visible only after the complete page has loaded. If every site did that, all we would have to do is use an explicit wait like in the examples above.
public void waitForPageToLoad()
{
(new WebDriverWait(driver, DEFAULT_WAIT_TIME)).until(new ExpectedCondition<Boolean>() {
public Boolean apply(WebDriver d) {
return (((org.openqa.selenium.JavascriptExecutor) driver).executeScript("return document.readyState").equals("complete"));
}
});// DEFAULT_WAIT_TIME is an integer wait time in seconds
}
Here's something similar, in Ruby:
wait = Selenium::WebDriver::Wait.new(:timeout => 10)
wait.until { @driver.execute_script('return document.readyState').eql?('complete') }
You can have the thread sleep until the page is reloaded. This is not the best solution, because you need an estimate of how long the page takes to load.
driver.get(homeUrl);
Thread.sleep(5000);
driver.findElement(By.xpath("Your_Xpath_here")).sendKeys(userName);
driver.findElement(By.xpath("Your_Xpath_here")).sendKeys(passWord);
driver.findElement(By.xpath("Your_Xpath_here")).click();
I check for page load completion like this; it works in Selenium 3.14.0:
public static void UntilPageLoadComplete(IWebDriver driver, long timeoutInSeconds)
{
Until(driver, (d) =>
{
Boolean isPageLoaded = (Boolean)((IJavaScriptExecutor)driver).ExecuteScript("return document.readyState").Equals("complete");
if (!isPageLoaded) Console.WriteLine("Document is loading");
return isPageLoaded;
}, timeoutInSeconds);
}
public static void Until(IWebDriver driver, Func<IWebDriver, Boolean> waitCondition, long timeoutInSeconds)
{
WebDriverWait webDriverWait = new WebDriverWait(driver, TimeSpan.FromSeconds(timeoutInSeconds));
webDriverWait.Timeout = TimeSpan.FromSeconds(timeoutInSeconds);
try
{
webDriverWait.Until(waitCondition);
}
catch (Exception e)
{
Console.WriteLine(e);
}
}
For people who need to wait for a specific element to show up (C#):
public static void WaitForElement(IWebDriver driver, By element)
{
WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(20));
wait.Until(ExpectedConditions.ElementIsVisible(element));
}
Then, if you want to wait, for example, until an element with class="error-message" is visible, you simply do:
WaitForElement(driver, By.ClassName("error-message"));
For an ID, it would be:
WaitForElement(driver, By.Id("yourid"));
Are you using Angular? If you are it is possible that the webdriver doesn't recognize that the async calls have finished.
I recommend looking at Paul Hammant's ngWebDriver.
The method waitForAngularRequestsToFinish() could come in handy.
We have large suites of selenium scripts and some of the tests are "unstable": On the CI-builds they fail but on the dev-machines they are ok.
We assume that the reason is in performance: the CI-builds are slower than the dev-machines and our application blocks any interaction with the web-app directly after an action until the server response comes back.
This brings me to the question:
How can the client know how long the server might take?
We can wait after each click for a long time -> but this will slow down the test-suite heavily.
Is there a trick to wait just long enough?
Since our suites are so large, I do not want to maintain/treat this in each and every test-case but generically in test-framework or on server side.
I don't have much technical experience with Selenium, but this sounds more like a conceptual problem.
This is what you can do (at a high level).
On the client you can call your API. When the API completes (whether success or fail) you can set a global variable stating that the API has completed.
$.getJSON('/some/endpoint.xhtml', {
someParam: someVar
})
.done(function (data) {
// do work
})
.fail(function (data) {
// do error handling
})
.always(function(data) {
// window.testVars will need to be initialized earlier
// like so: window.testVars = { completedEndpoint: 0 };
window.testVars.completedEndpoint++;
});
Then in the Java Selenium code you have access to those variables through a JavascriptExecutor driver.
private static Map<String, Object> getState(WebDriver driver)
{
Map<String, Object> map = (Map<String,Object>)((JavascriptExecutor)driver).executeScript("return window.testVars");
return map;
}
Then in your test you can use:
Map<String,Object> defState = initializeState(driver);
performUseCaseToCallTheEndpoint();
ExpectedCondition<Boolean> condition = new ExpectedCondition<Boolean>()
{
@Override
public Boolean apply(WebDriver driver)
{
Map<String,Object> curState = getState(driver);
if (((Number) curState.get("completedEndpoint")).longValue() > ((Number) defState.get("completedEndpoint")).longValue())
{
return true;
}
else
{
return false;
}
}
};
WebDriverWait wait = new WebDriverWait(driver, 60);
wait.until(condition);
This should get you started.
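The comparison inside apply() boils down to "has the counter grown past the value recorded before the action?". A minimal Python sketch of that condition, where `completed_since` is a hypothetical helper and `get_state` plays the role of getState(driver) reading window.testVars:

```python
from typing import Callable, Dict


def completed_since(get_state: Callable[[], Dict[str, int]], baseline: Dict[str, int],
                    key: str = "completedEndpoint") -> Callable[[], bool]:
    """Build a wait condition that is true once counter `key` exceeds its baseline value."""
    start = baseline.get(key) or 0

    def condition() -> bool:
        # A missing or null counter is treated as "nothing completed yet".
        current = get_state().get(key) or 0
        return current > start

    return condition
```

Because the baseline is captured before the action, the condition works even if earlier requests already incremented the counter; the returned function can be handed to any polling wait.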
In the class, you could have these fields:
public long startTime;
public long stopTime;
public long averageResponseTime = 2000;
Here, set averageResponseTime to the maximum time you are willing to wait for a response (in this example, 2000 milliseconds)
In your @Before/testSetup method:
// set the start time for the test
startTime = System.currentTimeMillis();
In your @After/tearDown method:
// set the stop time
stopTime = System.currentTimeMillis();
// calculate how long it took
long duration = stopTime - startTime;
// average out the response time
averageResponseTime = (averageResponseTime + duration)/2;
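In your test, instantiate a new wait, passing averageResponseTime (converted to seconds) as the timeout:
// the (WebDriver, long) constructor takes seconds, while averageResponseTime is in milliseconds
WebDriverWait wait = new WebDriverWait(driver, Math.max(1, averageResponseTime / 1000)); wait.until(expectedCondition);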
After the first test (or so) the timeout will get closer and closer to the average time the server actually takes to respond.
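The update `(average + duration) / 2` is an exponential moving average with a smoothing factor of 0.5: older measurements keep contributing, but their weight halves with each test. A small Python sketch of the arithmetic, plus a hypothetical helper that rounds up to whole seconds for wait APIs that take seconds:

```python
import math


def update_average(average_ms: float, duration_ms: float) -> float:
    """Blend the latest duration into the running average with equal weight."""
    return (average_ms + duration_ms) / 2


def wait_seconds(average_ms: float, minimum_s: int = 1) -> int:
    """Round the average up to whole seconds, never going below minimum_s."""
    return max(minimum_s, math.ceil(average_ms / 1000))
```

Starting from 2000 ms and seeing three 1000 ms runs, the average moves 1500, 1250, 1125 ms. Keep in mind that a timeout equal to the average is, by definition, exceeded by slower-than-average responses, so you may want some headroom on top of it.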
I have a working Windows 8 caching solution using DataContractSerializer that raises an XmlException "Unexpected end of file" only when the UI is used 'quickly'.
public static class CachingData<T>
{
public static async void Save(T data, string filename, StorageFolder folder = null)
{
folder = folder ?? ApplicationData.Current.LocalFolder;
try
{
StorageFile file = await ApplicationData.Current.LocalFolder.CreateFileAsync(filename, CreationCollisionOption.ReplaceExisting);
using (IRandomAccessStream raStream = await file.OpenAsync(FileAccessMode.ReadWrite))
{
using (IOutputStream outStream = raStream.GetOutputStreamAt(0))
{
DataContractSerializer serializer = new DataContractSerializer(typeof(T));
serializer.WriteObject(outStream.AsStreamForWrite(), data);
await outStream.FlushAsync();
}
}
}
catch (Exception exc)
{
throw exc;
}
}
public static async System.Threading.Tasks.Task<T> Load(string filename, StorageFolder folder = null)
{
folder = folder ?? ApplicationData.Current.LocalFolder;
T data = default(T);
StorageFile file = await folder.GetFileAsync(filename);
using (IInputStream inStream = await file.OpenSequentialReadAsync())
{
DataContractSerializer serializer = new DataContractSerializer(typeof(T));
data = (T)serializer.ReadObject(inStream.AsStreamForRead());
}
return data;
}
}
For example, the user clicks on an item in the list; CachingData.Load is called asynchronously via await, a FileNotFoundException is checked for, and the data is loaded either from disk or from the network, serialising it on completion.
After the first load, the user selects another item in the list and the cycle repeats.
The problem occurs when "after the first load" becomes "does not wait for the load" and the selected item is not yet cached.
I'm not quite sure how to proceed or even how to debug this. I'm hoping that just ignoring the exception will allow the app to continue (just without the nice speed increase of caching).
I was wondering how one can use Selenium/WebDriver to download an image from a page, assuming that the user session is required to download the image, so having the plain URL is not helpful. Any sample code is highly appreciated.
I prefer doing something like this:
1. Get the SRC attribute of the image.
2. Use ImageIO.read to read the image onto a BufferedImage
3. Save the BufferedImage using ImageIO.write function
For e.g.
String src = imgElement.getAttribute("src");
BufferedImage bufferedImage = ImageIO.read(new URL(src));
File outputfile = new File("saved.png");
ImageIO.write(bufferedImage, "png", outputfile);
I prefer like this:
WebElement logo = driver.findElement(By.cssSelector(".image-logo"));
String logoSRC = logo.getAttribute("src");
URL imageURL = new URL(logoSRC);
BufferedImage saveImage = ImageIO.read(imageURL);
ImageIO.write(saveImage, "png", new File("logo-image.png"));
Try the following:
JavascriptExecutor js = (JavascriptExecutor) driver;
String base64string = (String) js.executeScript("var c = document.createElement('canvas');"
+ " var ctx = c.getContext('2d');"
+ "var img = document.getElementsByTagName('img')[0];"
+ "c.height=img.naturalHeight;"
+ "c.width=img.naturalWidth;"
+ "ctx.drawImage(img, 0, 0,img.naturalWidth, img.naturalHeight);"
+ "var base64String = c.toDataURL();"
+ "return base64String;");
String[] base64Array = base64string.split(",");
String base64 = base64Array[base64Array.length - 1];
byte[] data = Base64.getDecoder().decode(base64); // java.util.Base64
ByteArrayInputStream memstream = new ByteArrayInputStream(data);
BufferedImage saveImage = ImageIO.read(memstream);
ImageIO.write(saveImage, "png", new File("path"));
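The split-on-comma-then-decode step above is generic: canvas.toDataURL() and FileReader.readAsDataURL() both produce strings of the form data:<mime>;base64,<payload>. A minimal Python sketch of that parsing (`decode_data_url` is a hypothetical helper name):

```python
import base64
from typing import Tuple


def decode_data_url(data_url: str) -> Tuple[str, bytes]:
    """Split a base64 'data:' URL into its MIME type and raw bytes."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)
```

Since toDataURL() defaults to PNG, the decoded bytes can be written straight to a .png file; the ImageIO read/write round trip is only needed if you want to re-encode.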
For my use case there were cookies and other issues that made the other approaches here unsuitable.
I ended up using an XMLHttpRequest to populate a FileReader (from How to convert image into base64 string using javascript), and then calling that using Selenium's ExecuteAsyncScript (as shown in Selenium and asynchronous JavaScript calls). This allowed me to get a Data URL, which was straightforward to parse.
Here's my C# code for getting the Data URL:
public string ImageUrlToDataUrl(IWebDriver driver, string imageUrl)
{
var js = new StringBuilder();
js.AppendLine("var done = arguments[0];"); // The callback from ExecuteAsyncScript
js.AppendLine(@"
function toDataURL(url, callback) {
var xhr = new XMLHttpRequest();
xhr.onload = function() {
var reader = new FileReader();
reader.onloadend = function() {
callback(reader.result);
}
reader.readAsDataURL(xhr.response);
};
xhr.open('GET', url);
xhr.responseType = 'blob';
xhr.send();
}"); // XMLHttpRequest -> FileReader -> DataURL conversion
js.AppendLine("toDataURL('" + imageUrl + "', done);"); // Invoke the function
var executor = (IJavaScriptExecutor) driver;
var dataUrl = executor.ExecuteAsyncScript(js.ToString()) as string;
return dataUrl;
}
Another mostly correct solution is to download it directly with a simple HTTP request.
You can reuse WebDriver's user session, because the browser holds the session cookies.
In my example, I'm just analyzing what status code it returns. If 200, then image exists and it is available for show or download. If you need to really download file itself - you could just get all image data from httpResponse entity (use it as simple input stream).
// just look at your cookie's content (e.g. using browser)
// and import these settings from it
private static final String SESSION_COOKIE_NAME = "JSESSIONID";
private static final String DOMAIN = "domain.here.com";
private static final String COOKIE_PATH = "/cookie/path/here";
protected boolean isResourceAvailableByUrl(String resourceUrl) {
HttpClient httpClient = new DefaultHttpClient();
HttpContext localContext = new BasicHttpContext();
BasicCookieStore cookieStore = new BasicCookieStore();
// apply jsessionid cookie if it exists
cookieStore.addCookie(getSessionCookie());
localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
// resourceUrl - is url which leads to image
HttpGet httpGet = new HttpGet(resourceUrl);
try {
HttpResponse httpResponse = httpClient.execute(httpGet, localContext);
return httpResponse.getStatusLine().getStatusCode() == HttpStatus.SC_OK;
} catch (IOException e) {
return false;
}
}
protected BasicClientCookie getSessionCookie() {
Cookie originalCookie = webDriver.manage().getCookieNamed(SESSION_COOKIE_NAME);
if (originalCookie == null) {
return null;
}
// just build new apache-like cookie based on webDriver's one
String cookieName = originalCookie.getName();
String cookieValue = originalCookie.getValue();
BasicClientCookie resultCookie = new BasicClientCookie(cookieName, cookieValue);
resultCookie.setDomain(DOMAIN);
resultCookie.setExpiryDate(originalCookie.getExpiry());
resultCookie.setPath(COOKIE_PATH);
return resultCookie;
}
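The same idea in Python needs no extra HTTP library: the Python bindings' driver.get_cookies() returns a list of dicts with "name" and "value" keys, which can be joined into a Cookie header for urllib. This is a sketch; the helper names are hypothetical, and unlike the Java version it forwards every cookie the driver returns rather than only the session cookie.

```python
import urllib.request
from typing import Dict, Iterable


def cookie_header(selenium_cookies: Iterable[Dict[str, str]]) -> str:
    """Join cookies shaped like driver.get_cookies() output into a Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def build_request(url: str, selenium_cookies: Iterable[Dict[str, str]]) -> urllib.request.Request:
    """Prepare a GET request that reuses the browser session's cookies."""
    return urllib.request.Request(url, headers={"Cookie": cookie_header(selenium_cookies)})
```

Calling `urllib.request.urlopen(request).read()` then returns the image bytes, and the response status can be checked the same way as the Java example does.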
The only way I found to avoid downloading the image twice is to use the Chrome DevTools Protocol (here through the pychrome library).
In Python, this gives:
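import base64
import os

import pychrome
from selenium import webdriver

SAVE_DIR = r"C:\Crawler\temp"

# Start Chrome with a DevTools port so pychrome can attach to the same browser
options = webdriver.ChromeOptions()
options.add_argument("--remote-debugging-port=9222")
driver = webdriver.Chrome(options=options)
browser = pychrome.Browser(url="http://127.0.0.1:9222")
tab = browser.list_tab()[0]  # assumes the first page target is the one Selenium drives

def save_image(body, base64_encoded, file_name):
    try:
        # binary images arrive base64-encoded; text ones (e.g. SVG) do not
        data = base64.b64decode(body) if base64_encoded else body.encode("utf-8")
        with open(os.path.join(SAVE_DIR, file_name), "wb") as f:
            f.write(data)
    except Exception as e:
        print(str(e))

# pychrome passes the event parameters as keyword arguments; **kwargs absorbs
# parameters that newer Chrome versions add (e.g. hasExtraInfo)
def response_received(requestId, loaderId, timestamp, type, response, frameId=None, **kwargs):
    if type == 'Image':
        url = response.get('url')
        print(f"Image loaded: {url}")
        response_body = tab.Network.getResponseBody(requestId=requestId)
        file_name = url.split('/')[-1].split('?')[0]
        if file_name:
            save_image(response_body['body'], response_body['base64Encoded'], file_name)

tab.Network.responseReceived = response_received
# start the tab
tab.start()
# enable network events
tab.Network.enable()
# load the target page through Selenium
driver.get("https://www.realtor.com/ads/forsale/TMAI112283AAAA")
# keep listening for up to 50 seconds
tab.wait(50)
tab.stop()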
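The expression url.split('/')[-1].split('?')[0] keeps any #fragment and percent-escapes in the saved file name. A slightly sturdier way to derive the name, sketched with only the standard library (file_name_from_url is a hypothetical helper, not part of pychrome):

```python
import posixpath
from urllib.parse import unquote, urlsplit


def file_name_from_url(url):
    """Return the last path segment of a URL, without query or fragment."""
    path = urlsplit(url).path  # urlsplit separates ?query and #fragment from the path
    name = unquote(posixpath.basename(path))  # decode %20 and similar escapes
    # a decoded %2F or backslash must not turn into a directory separator
    return name.replace("/", "_").replace("\\", "_")
```

Inside response_received you would then write file_name = file_name_from_url(url); an empty result (e.g. for a URL ending in "/") is still skipped by the existing if file_name check.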
Other solutions here don't work across all browsers, don't work across all websites, or both.
This solution should be far more robust. It uses the browser to view the image, resizes the browser to fit the image size, takes a screenshot, and finally resizes the browser back to the original size.
Python:
import logging

def get_image(driver, img_url):
    '''Given an image's URL, return a PNG screenshot of it as bytes.'''
    driver.get(img_url)
    # Get the dimensions of the browser and image.
    orig_h = driver.execute_script("return window.outerHeight")
    orig_w = driver.execute_script("return window.outerWidth")
    margin_h = orig_h - driver.execute_script("return window.innerHeight")
    margin_w = orig_w - driver.execute_script("return window.innerWidth")
    new_h = driver.execute_script('return document.getElementsByTagName("img")[0].height')
    new_w = driver.execute_script('return document.getElementsByTagName("img")[0].width')
    # Resize the browser window.
    logging.info("Getting Image: orig %sX%s, marg %sX%s, img %sX%s - %s" % (
        orig_w, orig_h, margin_w, margin_h, new_w, new_h, img_url))
    driver.set_window_size(new_w + margin_w, new_h + margin_h)
    # Get the image by taking a screenshot of the page.
    img_val = driver.get_screenshot_as_png()
    # Set the window size back to what it was.
    driver.set_window_size(orig_w, orig_h)
    # Go back to where we started.
    driver.back()
    return img_val
One disadvantage of this solution is that if the image is very small, the browser will not resize that small, and you may get a black border around it.
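The resize step in get_image boils down to simple arithmetic: the browser chrome (outer window size minus inner viewport size) is added to the image size. Isolated as a pure function for clarity (window_size_for_image is a hypothetical helper, not a Selenium API):

```python
def window_size_for_image(outer, inner, image):
    """Compute the outer window size whose viewport exactly fits an image.

    outer, inner and image are (width, height) tuples, e.g. taken from
    window.outerWidth/outerHeight, window.innerWidth/innerHeight and the
    <img> element's width/height.
    """
    margin_w = outer[0] - inner[0]  # horizontal browser chrome
    margin_h = outer[1] - inner[1]  # vertical browser chrome (toolbars, tab strip)
    return image[0] + margin_w, image[1] + margin_h
```

With a 1200x900 window, a 1184x800 viewport and a 640x480 image, the window should be set to 656x580. If the browser refuses to shrink below its minimum window size, the viewport stays larger than the image, which is where the border comes from.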
Use Selenium to get the image's src:
elemImg.get_attribute('src')
Then download it with your programming language's own HTTP tools. For Python, check this answer:
How to save an image locally using Python whose URL address I already know?
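A minimal standard-library sketch of the second step, assuming src is the absolute http(s) URL that get_attribute('src') returned (download_image is a hypothetical helper name). Note that this request does not carry the browser's cookies, so it only works for images that don't need a logged-in session.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlsplit


def download_image(src, directory):
    """Download the resource at `src` into `directory` and return the file path.

    The file is named after the last segment of the URL path.
    """
    name = posixpath.basename(urlsplit(src).path) or "image"
    target = os.path.join(directory, name)
    with urllib.request.urlopen(src, timeout=30) as response, open(target, "wb") as out:
        out.write(response.read())
    return target
```

Usage: download_image(elemImg.get_attribute('src'), r"C:\images").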
If you only need to check that an image exists and is reachable, you can do this:
protected boolean isResourceAvailableByUrl(String resourceUrl) {
// back up the current URL so we can return to it afterwards
String currentUrl = webDriver.getCurrentUrl();
try {
// try to open the image by its URL
webDriver.get(resourceUrl);
// RESOURCE_NOT_FOUND is a By locator for an element that only appears on your site's "not found" page;
// if no such element appeared, the image exists
return webDriver.findElements(RESOURCE_NOT_FOUND).isEmpty();
} finally {
// back to page
webDriver.get(currentUrl);
}
}
However, make sure that navigating to currentUrl really takes you back to the page you were on before this method ran. In my case it did. If it doesn't, you can try:
webDriver.navigate().back()
Unfortunately, there also seems to be no way to read the response status code through WebDriver. That's why you need to find a web element that is specific to the NOT_FOUND page, check whether it appeared, and decide from that whether the image exists.
It's just a workaround, because I found no official way to solve it.
NOTE:
This solution is helpful when you use an authorized session to get the resource and can't simply download it with ImageIO or plain HttpClient.
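The same check can be sketched in Python with Selenium's driver.get, driver.current_url and driver.find_elements. This is an illustrative translation, not a tested recipe: is_resource_available is a hypothetical name, and not_found_locator stands for whatever (by, value) pair identifies your site's "not found" page, for instance a hypothetical (By.CSS_SELECTOR, "h1.error-404").

```python
def is_resource_available(driver, resource_url, not_found_locator):
    """Open resource_url in the current tab and report whether it exists.

    not_found_locator is a (by, value) pair for an element that only appears
    on the site's "not found" page.
    """
    current_url = driver.current_url  # remember where we were
    try:
        driver.get(resource_url)
        # No "not found" marker on the page: assume the resource exists.
        return not driver.find_elements(*not_found_locator)
    finally:
        driver.get(current_url)  # go back to the original page
```

As with the Java version, this relies on re-opening current_url to restore the page state.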
Here is a JavaScript solution.
It's a tad silly, and I'm wary of hitting the source image's server with too many requests. Can someone tell me whether fetch() uses the browser's cache? I don't want to spam the source server.
It attaches a FileReader to the window, fetches the image, converts it to base64 and stores that string on the window.
The driver can then return that window variable.
import { By, until } from 'selenium-webdriver'

export async function scrapePic(driver) {
  let img64 = null
  try {
    console.log("waiting for that profile piccah")
    let rootEl = await driver.findElement(By.css('.your-root-element'));
    let imgEl = await rootEl.findElement(By.css('img'))
    // the timeout belongs to driver.wait, not to the condition
    await driver.wait(until.elementIsVisible(imgEl), 10000);
    console.log('profile piccah found')
    let img = await imgEl.getAttribute('src')
    // attach a reader to the page's window
    await driver.executeScript(`window.myFileReader = new FileReader();`)
    await driver.executeScript(`
      window.myFileReader.onloadend = function() {
        window['profileImage'] = this.result
      }
      fetch( arguments[0] ).then( res => res.blob() ).then( blob => window.myFileReader.readAsDataURL(blob) )
    `, img)
    // give fetch() and the FileReader time to finish
    await driver.sleep(5000)
    img64 = await driver.executeScript(`return window.profileImage`)
    console.log(img64)
  } catch (e) {
    console.log(e)
  }
  return img64
}
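The string returned in img64 is a data URL (for example data:image/png;base64,iVBOR...), not raw image bytes. If you run the same fetch/FileReader script from Python (e.g. through driver.execute_script), turning that string into file contents is a small standard-library step; a sketch with a hypothetical helper name:

```python
import base64


def data_url_to_bytes(data_url):
    """Decode a base64 data URL such as FileReader.readAsDataURL produces."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    return base64.b64decode(payload)
```

Usage: open("profile.png", "wb").write(data_url_to_bytes(img64)).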
Works for me:
from time import sleep

# open the image in a new tab (the URL is passed as a script argument, not pasted into the script)
driver.execute_script('window.open(arguments[0], "_blank");', wanted_url)
sleep(2)
driver.switch_to.window(driver.window_handles[1])
sleep(2)
# take a screenshot; Selenium always writes PNG data, so use a .png extension
driver.save_screenshot("C:/Folder/" + photo_name + ".png")
sleep(2)
# close the new tab
driver.close()
sleep(2)
# back to the original tab
driver.switch_to.window(driver.window_handles[0])
Although @aboy021's JS code is syntactically correct, I couldn't get it running (using Chrome v83.xx).
However, this code worked (Java):
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Base64;
import org.openqa.selenium.JavascriptExecutor;

String url = "/your-url-goes.here.jpg";
// executeAsyncScript passes url as arguments[0] and appends its callback as the last argument
String imageData = (String) ((JavascriptExecutor) driver).executeAsyncScript(
        "var url = arguments[0];" +
        "var callback = arguments[arguments.length - 1];" +
        "var xhr = new XMLHttpRequest();" +
        "xhr.onreadystatechange = function() {" +
        "  if (xhr.readyState == 4) {" +
        "    var reader = new FileReader();" +
        "    reader.onloadend = function() { callback(reader.result); };" +
        "    reader.readAsDataURL(xhr.response);" +
        "  }" +
        "};" +
        "xhr.open('GET', url, true);" +
        "xhr.responseType = 'blob';" +
        "xhr.send();", url);
// the result is a data URL such as "data:image/jpeg;base64,..."; keep only the payload
String base64Data = imageData.split(",")[1];
byte[] decodedBytes = Base64.getDecoder().decode(base64Data);
try (OutputStream stream = new FileOutputStream("c:\\dev\\tmp\\output.jpg")) {
    stream.write(decodedBytes);
} catch (IOException e) {
    e.printStackTrace();
}
How to download to a file, taking URL from element text or attribute
The complete extension code can be found here:
https://github.com/gravity-api/gravity-core/blob/master/src/csharp/Gravity.Core/Gravity.Core/Extensions/WebElementExtensions.cs
If you want to use this method without writing the code, use the NuGet https://www.nuget.org/packages/Gravity.Core/
Install-Package Gravity.Core -Version 2020.7.5.3
Usage
using OpenQA.Selenium.Extensions;
...
var driver = new ChromeDriver();
// from element attribute
var imageElement = driver.FindElement(By.XPath("//img[@id='my_img']")).DownloadResource(path: @"C:\images\cap_image_01.png", attribute: "src");
// from element text
var textElement = driver.FindElement(By.XPath("//div[1]")).DownloadResource(path: @"C:\images\cap_image_01.png");
Using the NuGet package is recommended, since it contains many more tools and extensions for Selenium.
To use it without the NuGet package (implement it yourself):
Extension Class
using System.IO;
using System.Net.Http;
using System.Text.RegularExpressions;
using OpenQA.Selenium;
namespace Extensions
{
public static class WebElementExtensions
{
public static IWebElement DownloadResource(this IWebElement element, string path)
{
return DoDownloadResource(element, path, "");
}
public static IWebElement DownloadResource(this IWebElement element, string path, string attribute)
{
return DoDownloadResource(element, path, attribute);
}
private static IWebElement DoDownloadResource(this IWebElement element, string path, string attribute)
{
// get resource address
var resource = (string.IsNullOrEmpty(attribute))
? element.Text
: element.GetAttribute(attribute);
// download resource
using (var client = new HttpClient())
{
// get response for the current resource
var httpResponseMessage = client.GetAsync(resource).GetAwaiter().GetResult();
// exit condition
if (!httpResponseMessage.IsSuccessStatusCode) return element;
// create directories path
Directory.CreateDirectory(path);
// get absolute file name
var fileName = Regex.Match(resource, @"[^/\\&\?]+\.\w{3,4}(?=([\?&].*$|$))").Value;
path = (path.LastIndexOf(@"\") == path.Length - 1)
? path + fileName
: path + $@"\{fileName}";
// write the file
File.WriteAllBytes(path, httpResponseMessage.Content.ReadAsByteArrayAsync().GetAwaiter().GetResult());
}
// keep the fluent
return element;
}
}
}
Usage
using Extensions;
...
var driver = new ChromeDriver();
// from element attribute ("path" is the target folder; the file name is taken from the URL)
var imageElement = driver.FindElement(By.XPath("//img[@id='my_img']")).DownloadResource(path: @"C:\images", attribute: "src");
// from element text
var textElement = driver.FindElement(By.XPath("//div[1]")).DownloadResource(path: @"C:\images");
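DoDownloadResource names the saved file with a regular expression that keeps the last URL segment ending in a 3 or 4 character extension. To see what it extracts from a given URL, the same pattern can be tried in Python (extract_file_name is a hypothetical helper; like Regex.Match(...).Value, it returns an empty string when nothing matches):

```python
import re

# Same pattern as in DoDownloadResource: a name with a 3-4 character
# extension, followed by the end of the URL or by a ?/& query part.
FILE_NAME_PATTERN = re.compile(r"[^/\\&?]+\.\w{3,4}(?=([?&].*$|$))")


def extract_file_name(resource):
    """Return the file name the extension would derive, or '' if none."""
    match = FILE_NAME_PATTERN.search(resource)
    return match.group(0) if match else ""
```

Note that the path argument is treated as a folder: the method calls Directory.CreateDirectory(path) and then appends the extracted name to it.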