HtmlUnit - lazy loading of images

I am using HtmlUnit to download a URL, and the web page uses lazy loading (I think) to load some of its images. Which settings should I use in HtmlUnit so that I can get those images?
For example, this is one of the URLs I am trying to download-
http://www.ebay.com.au/sch/i.html?_from=R40&_trksid=p2050601.m570.l1313.TR10.TRC0.A0.H0.Xiphone6s.TRS0&_nkw=iphone6s&_sacat=0
The product images (after the first few) have a dummy src value: the src attribute holds a placeholder and the actual image URL is stored in the imgurl attribute. I think the web page uses some JavaScript to replace the src attribute with the correct value once we scroll down.
This is my sample code-
webClient = new WebClient(BrowserVersion.FIREFOX_38);
webClient.getOptions().setActiveXNative(false);
webClient.getOptions().setAppletEnabled(false);
webClient.getOptions().setDoNotTrackEnabled(true);
webClient.getOptions().setPopupBlockerEnabled(true);
webClient.getOptions().setPrintContentOnFailingStatusCode(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.setCssErrorHandler(new SilentCssErrorHandler());
Page page = webClient.getPage(url);
I have tried the following-
1) Increase window height-
webClient.getCurrentWindow().setInnerHeight(60000);
webClient.getCurrentWindow().setInnerWidth(60000);
2) Try to scroll down after page is downloaded
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setCssEnabled(true);
webClient.waitForBackgroundJavaScript(10 * 1000);
HtmlPage page = (HtmlPage) webClient.getPage(url);
page.getBody().type(KeyboardEvent.DOM_VK_PAGE_DOWN);
Thread.sleep(3000);
String html = page.asXml();
But so far, I have not been able to get the correct src URL.
If anyone has successfully fixed this lazy loading issue, please suggest some workarounds.
thank you!
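Not an HtmlUnit setting, but since the real URL already sits in the imgurl attribute, one possible workaround is to read it straight out of the string returned by page.asXml() instead of waiting for the script to swap src. The following is only a minimal sketch using the standard library: the class name LazyImageUrls and the sample markup are made up for illustration, it assumes the imgurl attribute survives in the serialized output, and a regex is not a full HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyImageUrls {
    // Matches an <img ...> tag and captures the value of its imgurl attribute
    // (single- or double-quoted). Hypothetical helper, not part of HtmlUnit.
    private static final Pattern IMGURL = Pattern.compile(
            "<img\\b[^>]*?\\bimgurl\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Returns the imgurl value of every <img> tag that has one, in document order.
    static List<String> extract(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMGURL.matcher(html);
        while (m.find()) {
            urls.add(m.group(2));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up markup: the first image is lazy-loaded, the second is not.
        String html = "<img src=\"dummy.gif\" imgurl=\"http://example.com/a.jpg\"/>"
                + "<img src=\"http://example.com/b.jpg\"/>";
        System.out.println(extract(html)); // prints [http://example.com/a.jpg]
    }
}
```

With the code from the question, the input would be the html string obtained from page.asXml(); images that already have a real src are simply skipped by this sketch.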

Related

Is that possible to save pictures with a name from page?

I want to save the images of every product on a site, using each product's name as written on the same page. Is it possible to do that on a site with the logic below?
The main product page links to every product, so I think I can reach each product from there. Each product page has sub-menus such as "General", "Gallery", etc. I want to get the product name from the General section, then go to the Gallery section and save the images under this name, like ProductName1.jpg, ProductName2.jpg ...
Is this possible to do with Selenium?
Product Page: http://www.laboory.com/products
Here a sample link for product:
http://www.laboory.com/product/laboory-water-soluble-m/3937
Yes, we can do this. Since you only used the selenium tag, I assume you are using Java.
Go to the product page.
Get the image source URL and the product name.
Save the image to the desired location using the BufferedImage and ImageIO classes.
Code:
driver = new ChromeDriver(options);
driver.manage().deleteAllCookies();
driver.get("http://laboory.com/product/laboory-water-soluble-m/3937");
WebElement logo = driver.findElement(By.xpath("(//span//img[@class='imgin' and @src])[1]"));
String logoSRC = logo.getAttribute("src");
String productName = driver.findElement(By.xpath("//div/h1")).getText();
URL imageURL = new URL(logoSRC);
BufferedImage saveImage = ImageIO.read(imageURL);
ImageIO.write(saveImage, "png", new File(productName+".png"));
Output: the image CAPSULE GC 510.png is saved in the project directory.
Note: You can change the location as well.
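The question asks for numbered names such as ProductName1.jpg, ProductName2.jpg, and product names taken from a page may contain characters that are not allowed in file names. A small sketch of how such names could be built is shown here; the class ImageFileNames and its methods are hypothetical helpers, not part of Selenium or ImageIO.

```java
class ImageFileNames {
    // Replaces characters that are not allowed in Windows file names
    // (and '/' on Unix) with an underscore, after trimming whitespace.
    static String sanitize(String productName) {
        return productName.trim().replaceAll("[\\\\/:*?\"<>|]", "_");
    }

    // Builds names such as "ProductName1.jpg"; index is expected to be 1-based.
    static String forImage(String productName, int index, String extension) {
        return sanitize(productName) + index + "." + extension;
    }

    public static void main(String[] args) {
        // Product name taken from the output mentioned above.
        for (int i = 1; i <= 3; i++) {
            System.out.println(forImage("CAPSULE GC 510", i, "jpg"));
        }
    }
}
```

Inside a loop over the elements returned by driver.findElements(...), the result could be passed to new File(...) in the ImageIO.write call, with the format name matching the chosen extension.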
You can capture a screenshot of the image by using the dimensions of the image element and save it under the desired name. See this question for reference:
How to capture the screenshot of a specific element rather than entire page using Selenium Webdriver?

How to upload an image on Pentaho dashboard ? ( variable, condition, local path)

I want to display an image on a dashboard depending on a parameter called "ca_code".
All the images are in a folder of my current theme (I know that you can also upload images to the server, which must be easier, but I need to keep it this way). Here's an example of an image path:
D:\pentaho\pentaho-8-2\pentaho-server\pentaho-solutions\system\common-ui\resources\themes\sapphire\img_project\CA120.jpg
Here, 120 is a ca_code. I get this ca_code as a variable from a query component.
Here's what I tried on Post Fetch of query component:
function f(ca_code) {
var ca_code=ca_code.resultset;
var img = '<img src="../common-ui/resources/themes/sapphire/img_project/CA'+ca_code.resultset+'.jpg/content"/>';
var img_default='<img src="../common-ui/resources/themes/sapphire/img_project/CA000.jpg/content"/>';
document.getElementById('ca_logo').innerHTML=img;
}
It doesn't work, and I think the path is the problem.
When I used HTML in the Layout Panel, the path was fine and the image was displayed, but I can't do it in HTML because I need the variable ca_code. I want to do it in the Post Fetch of the query component.
Also, how can I check whether the image exists? If it doesn't exist, I want to display img_default.
Any help would be nice!

Switching to a new tab/window which is having xml-style-view instead of web-view in selenium

I have a scenario where clicking a button on a page opens a new page in a separate tab. The new page is not a regular page, and when I use the normal switchTo().window() operations, it fails with "Web view not found, target window closed".
How should I handle this scenario in Selenium?
A screenshot of the result xml-viewer-style page
What is the resulting page's complete path? Does it end with .xml? And why do you want that page? I believe that page is an XML file opening in a new tab. If you have data to retrieve from that page, you need to first download it as an XML file, then use a parser to retrieve the data from it.
You can use DOM to parse an XML file like so:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(response));
Document doc = builder.parse(is);
NodeList nList = doc.getElementsByTagName("item");
Node namedItem = nList.item(0).getAttributes().getNamedItem("uid");
System.out.println(namedItem.getNodeValue());
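For reference, a self-contained version of the snippet above with its imports and a guard for a missing element could look like the following. The class name XmlUidReader and the sample XML are assumptions for illustration only; the real file may use different element and attribute names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlUidReader {
    // Returns the uid attribute of the first <item> element,
    // or null if there is no such element or attribute.
    static String firstItemUid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList items = doc.getElementsByTagName("item");
            if (items.getLength() == 0) {
                return null;
            }
            Element first = (Element) items.item(0);
            return first.hasAttribute("uid") ? first.getAttribute("uid") : null;
        } catch (Exception e) {
            // Wraps ParserConfigurationException, SAXException and IOException.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up payload standing in for the downloaded file's content.
        String response = "<items><item uid=\"42\">first</item><item uid=\"43\">second</item></items>";
        System.out.println(firstItemUid(response)); // prints 42
    }
}
```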
Before you can do that, you have to get the file onto your local system.
You can do it using a dummy argument in the href for your file, like so:
<a href="http://link/to/file.xml?dummy=dummy" download>Download Now</a>

Images in Html to PDF using wkhtmltopdf in mvc 4

I am using wkhtmltopdf to convert HTML to PDF in MVC 4. The conversion itself works; the only problem is that images do not render. There is a small rectangle where each image should appear. My images are stored in a database, so when I get the HTML string in my controller, this is how an image looks right before I pass the string to the converter:
<img src="/Images/Image/GetImageThumbnail?idImage=300" alt=""/>
So I am thinking that this approach is not working because I pass a string to the converter, so the image cannot be rendered. Any ideas how to solve this problem if the images are in the db?
I solved a similar issue by replacing src="/img/derp.png" with src="http://localhost/img/derp.png". I get the host part from the request that my controller receives.
// Here I'm actually processing with HtmlAgilityPack but you get the idea
string host = request.Headers["host"];
string src = node.Attributes["src"].Value;
node.Attributes["src"].Value = "http://" + host + src;
This means that the server must also be able to serve images directly from URLs like that.
I guess it could be done with string.Replace as well if your HTML is in a string:
string host = request.Headers["host"];
html = html.Replace("src=\"/", "src=\"http://"+host+"/"); // not tested

Need a Hyperlink control to do several things at once

On my site I have a DataList full of image thumbnails. The thumbnails are HyperLink controls that, when clicked, offer an enlarged view of the source image (stored in my database).
My client wants a Facebook Like button on each image and I was hoping to put that in the lightbox window that appears when you click on a thumbnail.
My challenge here is that to generate the info for the Like, I need to create meta tags, and each image should, preferably, create its own meta tags on the fly.
What I can't figure out is how to make the HyperLink click open the lightbox AND create the meta tags at the same time.
Any help will be greatly appreciated.
For a live view of the site, go to http://www.dossier.co.za
The way that we approach similar problems is to hook the onclick event of the href in javascript.
Depending on exactly what you need to do, you can even prevent the standard browser behavior for the hyperlink from executing by returning false from the javascript method.
And in some cases, we just use the hyperlink for "show" by setting the href to "#".
Here is an example that combines all of these concepts:
File Name
In this case, the specified javascript is executed, there is no real hyperlink, and the browser doesn't try to navigate to the specified URL because we return false in the javascript.
Add a class name to the opening table tag, like class="tbl_images", so we can use jQuery to access it. Capture the click on the td and pick up the id of the item. Pass that id to your code as required to generate your meta tags. In the following, when the user clicks on an anchor in a td, a function will run.
I use this all the time to access attributes in the td so I can run a function. You could use something like this to pick up values from your image/anchor and create something...
$(".tbl_images > tbody > tr").each(function () {
    // get the id of the tr (if required)
    var id = $(this).attr("id");
    var ImageTitle = $(this).find("img.Image_Class_Name").attr("title");
    // on click of an anchor with class name lighthouse, run the function,
    // passing in our id or other data in the row
    $(this).find("td > a.lighthouse").click(function () {
        // update script for this record
        MyFunction(id, ImageTitle);
    });
});