I have been asked if I can use some form data and present it in HTML.
I need to add up the LOADS values and multiply them by the RATE values. I thought this would work, but it's not showing anything in my browser.
Where am I going wrong?
I'm new to JavaScript and I have HTML/CSS skills. I know jQuery is probably the best way to do this sort of stuff but I don't know it.
<html>
<script>
if (window.XMLHttpRequest)
{// code for IE7+, Firefox, Chrome, Opera, Safari
xmlhttp=new XMLHttpRequest();
}
else
{// code for IE6, IE5
xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
}
xmlhttp.open("GET","xmltest.xml",false);
xmlhttp.send();
xmlDoc=xmlhttp.responseXML;
document.write("<table border='1'>");
var k=xmlDoc.x[1].getElementsByTagName("LOADS");
var J=xmlDoc.x[1].getElementsByTagName("RATE");
{
document.write("<tr ><td>");
document.write(k*J);
document.write("</td><tr>");
}
document.write("</table>");
</script>
</html>
XML file
<MULTILOAD_TICKET>
<TICKET>
<DATE>12/11/12</DATE>
<ADDRESS>123 FAKE STREET</ADDRESS>
<RATE>300</RATE>
<LOADS>3</LOADS>
<CUSTOMER>Columbia Ales</CUSTOMER>
<ORDERID>BBKHJ1001</ORDERID>
<DRIVER>BOB</DRIVER>
<VEHICAL_REG>UJ78 JHE</VEHICAL_REG>
<MATERIAL>SPOIL</MATERIAL>
<SIG>URL</SIG>
</TICKET>
<TICKET>
<DATE>12/11/12</DATE>
<ADDRESS>123 FAKE STREET</ADDRESS>
<RATE>300</RATE>
<LOADS>6</LOADS>
<CUSTOMER>Columbia Ales</CUSTOMER>
<ORDERID>BBKHJ1001</ORDERID>
<DRIVER>JACK</DRIVER>
<VEHICAL_REG>EU78 JHD</VEHICAL_REG>
<MATERIAL>SPOIL</MATERIAL>
<SIG>URL</SIG>
</TICKET>
<TICKET>
<DATE>15/11/12</DATE>
<ADDRESS>123 FAKE STREET</ADDRESS>
<RATE>300</RATE>
<LOADS>5</LOADS>
<CUSTOMER>Columbia Ales</CUSTOMER>
<ORDERID>BBKHJ1001</ORDERID>
<DRIVER>BOB</DRIVER>
<VEHICAL_REG>UJ78 JHE</VEHICAL_REG>
<MATERIAL>SPOIL</MATERIAL>
<SIG>URL</SIG>
</TICKET>
</MULTILOAD_TICKET>
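k and J are meant to refer to collections of elements, so you can't just multiply them together. Note also that xmlDoc has no x property, so xmlDoc.x[1] throws an error before any row is written; call getElementsByTagName on xmlDoc itself (or on each TICKET element).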
You need to loop through the collections, access their textContent (or innerText depending on browser version), convert to numbers (parseInt or parseFloat as needed), multiply them and then add them to a running total. Finally, at the end, you can output the total.
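A minimal sketch of that loop, assuming xmlDoc is the parsed XML document from the question and that every TICKET contains exactly one LOADS and one RATE element. The function name ticketTotal is made up for illustration, and very old IE versions may need a different property than textContent:

```javascript
// Sum LOADS * RATE over every <TICKET> element.
// Assumption: `xmlDoc` is the parsed XML document (xmlhttp.responseXML).
function ticketTotal(xmlDoc) {
  var tickets = xmlDoc.getElementsByTagName("TICKET");
  var total = 0;
  for (var i = 0; i < tickets.length; i++) {
    // Each lookup returns a collection, so take the first match and read its text.
    var loadsText = tickets[i].getElementsByTagName("LOADS")[0].textContent;
    var rateText = tickets[i].getElementsByTagName("RATE")[0].textContent;
    // Convert the text to numbers before doing any arithmetic.
    total += parseInt(loadsText, 10) * parseFloat(rateText);
  }
  return total;
}
```

In the page you would then write document.write(ticketTotal(xmlDoc)); inside the table cell. For the three tickets in the sample file that is 3*300 + 6*300 + 5*300 = 4200.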
I am creating a video scraper (for the Rumble website) and I am trying to get the src attribute of the video using HTMLUnit, because the element is added to the page dynamically (I am a beginner with these APIs):
val webClient = WebClient()
webClient.options.isThrowExceptionOnFailingStatusCode = false
webClient.options.isThrowExceptionOnScriptError = false
webClient.options.isJavaScriptEnabled = true
val myPage: HtmlPage? = webClient.getPage("https://rumble.com/v1m9oki-our-first-automatic-afk-farms-locals-minecraft-server-smp-ep3-live-stream.html")
Thread.sleep(10000)
val document: Document = Jsoup.parse(myPage!!.asXml())
println(document)
The issue is, the output for the <video> element is the following:
<video muted playsinline="" hidefocus="hidefocus" style="width:100% !important;height:100% !important;display:block" preload="metadata"></video>
Whereas -- if you navigate to the page itself and let the JS load -- it should be:
<video muted="" playsinline="" hidefocus="hidefocus" style="width:100% !important;height:100% !important;display:block" preload="metadata" poster="https://sp.rmbl.ws/s8/1/I/6/v/1/I6v1f.OvCc-small-Our-First-Automatic-AFK-Far.jpg" src="blob:https://rumble.com/91372f42-30cf-46b3-8850-805ee634e2e8"></video>
Some attributes are missing, which are crucial for my scraper to work. I need the src value so that ExoPlayer can play the video.
I am not totally sure, but I was wondering whether it had to do with the fact that the crossOrigin attribute is anonymous in the JavaScript:
<video muted playsinline hidefocus="hidefocus" style="width:100% !important;height:100% !important;display:block" preload="'+t+'"'+(a.vars.opts.cc?' crossorigin="anonymous"':"")+'>
I tried to play around with the different HTMLUnit options, as well as look online but I still haven't been able to extract the right attributes I need so that it can work.
How would I be able to bypass this and get the appropriate element values (src) that I need for the scraper using HTMLUnit? Is this even possible to do with HTMLUnit? I was also suspecting that maybe the site owners added this cross origin anonymous statement because it can bypass scrapers, though I am not sure.
How to reproduce my issue
Navigate to this link with a GUI browser.
Press 'Inspect Element' until you find the <video> HTML tag and observe that it contains an src attribute as you would expect to the mp4 file:
<video muted="" playsinline="" hidefocus="hidefocus" style="width:100% !important;height:100% !important;display:block" preload="metadata" src="https://sp.rmbl.ws/s8/2/I/6/v/1/I6v1f.caa.rec.mp4?u=3&b=0" poster="https://sp.rmbl.ws/s8/1/I/6/v/1/I6v1f.OvCc-small-Our-First-Automatic-AFK-Far.jpg"></video>
Now, let's simulate this with a headless browser, so add the following code to IntelliJ or any IDE (add a dependency to HTMLUnit and JSoup):
To gradle (Kotlin):
implementation(group = "net.sourceforge.htmlunit", name = "htmlunit", version = "2.64.0")
implementation("org.jsoup:jsoup:1.15.3")
To gradle (Groovy):
implementation group = 'net.sourceforge.htmlunit', name = 'htmlunit', version = '2.64.0'
implementation 'org.jsoup:jsoup:1.15.3'
Then in Main function:
val webClient = WebClient()
webClient.options.isThrowExceptionOnFailingStatusCode = false
webClient.options.isThrowExceptionOnScriptError = false
webClient.options.isJavaScriptEnabled = true
val myPage: HtmlPage? = webClient.getPage("https://rumble.com/v1m9oki-our-first-automatic-afk-farms-locals-minecraft-server-smp-ep3-live-stream.html")
Thread.sleep(10000)
val document: Document = Jsoup.parse(myPage!!.asXml())
println(".....................")
println(document.getElementsByTag("video").first())
If it throws an exception add this:
LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");
java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(Level.OFF);
java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);
java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter").setLevel(Level.OFF);
java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject").setLevel(Level.OFF);
java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument").setLevel(Level.OFF);
java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit.html.HtmlScript").setLevel(Level.OFF);
java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit.javascript.host.WindowProxy").setLevel(Level.OFF);
java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
java.util.logging.Logger.getLogger("org.apache").setLevel(Level.OFF);
We are simply fetching the page with the headless browser and then using JSoup to parse the HTML output and finding the first video element.
Observe that the output does not contain any 'src' attribute as you saw in the GUI browser:
<video muted playsinline="" hidefocus="hidefocus" style="width:100% !important;height:100% !important;display:block" preload="metadata"></video>
This is the major issue I am having: the src attribute of the <video> element has seemingly disappeared in the headless browser, and I am unsure why, although I suspect it's related to some sort of mp4 codec issue.
Correct, the js support for the video element was not sufficient for this case.
Have done a bunch of fixes/improvements and the upcoming version 2.66.0 will be able to support this.
Btw: there is no need to parse the page a second time using jsoup - HtmlUnit has all the methods to deeply look inside the dom tree of the current page.
String url = "https://rumble.com/v1m9oki-our-first-automatic-afk-farms-locals-minecraft-server-smp-ep3-live-stream.html";
try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
webClient.getOptions().setThrowExceptionOnScriptError(false);
HtmlPage page = webClient.getPage(url);
webClient.waitForBackgroundJavaScript(10_000);
HtmlVideo video = (HtmlVideo) page.getElementsByTagName("video").get(0);
System.out.println(video.getSrc());
}
This code prints https://sp.rmbl.ws/s8/2/I/6/v/1/I6v1f.caa.rec.mp4?u=3&b=0 - the same as the source attribute in the browser.
But there are still two JS errors reported when running this code. This is because some other JS (I guess some tracking stuff) provokes these errors. You can fix this by ignoring the JS code from these two locations; this will also make the code a bit faster.
String url = "https://rumble.com/v1m9oki-our-first-automatic-afk-farms-locals-minecraft-server-smp-ep3-live-stream.html";
try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
webClient.getOptions().setThrowExceptionOnScriptError(false);
// ignore some js
new WebConnectionWrapper(webClient) {
public WebResponse getResponse(WebRequest request) throws IOException {
WebResponse response = super.getResponse(request);
if (request.getUrl().toExternalForm().contains("sovrn_standalone_beacon.js")
|| request.getUrl().toExternalForm().contains("r2.js")) {
WebResponseData data = new WebResponseData("".getBytes(response.getContentCharset()),
response.getStatusCode(), response.getStatusMessage(), response.getResponseHeaders());
response = new WebResponse(data, request, response.getLoadTime());
}
return response;
}
};
HtmlPage page = webClient.getPage(url);
webClient.waitForBackgroundJavaScript(10_000);
HtmlVideo video = (HtmlVideo) page.getElementsByTagName("video").get(0);
System.out.println(video.getSrc());
}
Thanks for this report - will inform on https://twitter.com/htmlunit about the new release.
I have tried various approaches to embed a PDF blob in HTML in IE in order to display it.
1) Creating an object URL and passing it to the embed or iframe tag. This works fine in Chrome but not in IE.
</head>
<body>
<input type="file" onchange="previewFile()">
<iframe id="test_iframe" style="width:100%;height:500px;"></iframe>
<script>
function previewFile() {
var file = document.querySelector('input[type=file]').files[0];
var downloadUrl = URL.createObjectURL(file);
console.log(downloadUrl);
var element = document.getElementById('test_iframe');
element.setAttribute('src',downloadUrl);
}
</script>
</body>
2) I have also tried wrapping the blob URL in encodeURIComponent().
Any pointers on how I can solve this?
IE doesn't support iframe with data url as src attribute. You could check it in caniuse. It shows that the support is limited to images and linked resources like CSS or JS in IE. Please also check this documentation:
Data URIs are supported only for the following elements and/or
attributes.
object (images only)
img
input type=image
link
CSS declarations that accept a URL, such as background, backgroundImage, and so on.
Besides, IE doesn't have an embedded PDF viewer, so you can't display PDFs directly in IE 11. You can only use msSaveOrOpenBlob to handle blobs in IE, and then choose to open or save the PDF file:
if(window.navigator.msSaveOrOpenBlob) {
//IE11
window.navigator.msSaveOrOpenBlob(blobData, fileName);
}
else{
//Other browsers
window.URL.createObjectURL(blobData);
...
}
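Applied to the file-preview case from the question, the same feature check could look roughly like the sketch below. The function name showPdfBlob and the env parameter are made up for illustration (env only exists so the browser objects can be swapped out when testing), and the return values are just labels, not part of any browser API:

```javascript
// Show a PDF blob: hand it to IE's open/save dialog when msSaveOrOpenBlob
// exists, otherwise point the iframe at an object URL.
// Assumption: `env` defaults to the real browser objects when omitted.
function showPdfBlob(blob, fileName, iframe, env) {
  env = env || { navigator: window.navigator, URL: window.URL };
  if (env.navigator.msSaveOrOpenBlob) {
    // IE: no inline viewer, so let the user open or save the file.
    env.navigator.msSaveOrOpenBlob(blob, fileName);
    return "dialog";
  }
  // Other browsers: an object URL can be used as the iframe source.
  iframe.setAttribute("src", env.URL.createObjectURL(blob));
  return "iframe";
}
```

In previewFile() you would call something like showPdfBlob(file, file.name, document.getElementById('test_iframe')) instead of setting src directly.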
I'm using the awesome GoSquared API to get the number of current visitors on my Site.
I have built a jQuery script that automatically updates the number every two seconds with jQuery's .get, but this doesn't seem to work in IE and Firefox.
JSFiddle
Thanks :)
In Firefox data is a string for some reason. You can specify the data type of the response explicitly:
$.get('url', function(){}, "json");
Otherwise you can turn it into an object like this:
if (typeof data === "string"){
data = JSON.parse(data);
}
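If you don't want to repeat that check in every callback, it can be wrapped in a small helper. This is only a sketch; the name toObject is made up, and it assumes that any string response is valid JSON:

```javascript
// Return the response as an object whether the browser delivered
// it as a raw JSON string or as an already-parsed object.
function toObject(data) {
  return typeof data === "string" ? JSON.parse(data) : data;
}
```

Inside the $.get callback you would then start with data = toObject(data); and use it the same way in every browser.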
I'm trying to add an id to an element using dojo.query. I'm not sure if it's possible, though. I'm trying to use the code below to add the id, but it's not working.
dojo.query('div[style=""]').attr("id","main-body");
<div style="">
content
</div>
If this is not possible, is there another way to do it? Using javascript or jquery? Thanks.
Your way of adding an id to an element is correct.
The code runs fine for me in Firefox 17 and Chrome 23 but I have an issue in IE9. I suspect you may have the same issue.
In IE9 the query div[style=""] returns no results. The funny thing is, it works fine in compatibility mode!
It seems that in IE9 in normal mode, if an HTML element has an empty inline style attribute, that attribute is not preserved when the element is added to the DOM.
So a solution would be to use a different query to find the divs you want.
You could try to find the divs with an empty style attribute OR with no style attribute at all.
A query like this should work:
div[style=""], div:not([style])
Take a look at the following example:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Test Page</title>
<script type="text/javascript" src="//ajax.googleapis.com/ajax/libs/dojo/1.8.2/dojo/dojo.js"></script>
<script type="text/javascript">
dojo.require("dojo.NodeList-manipulate");//just for the innerHTML() function
dojo.addOnLoad(function () {
var nodeListByAttr = dojo.query('div[style=""], div:not([style])');
alert('Search by attribute nodeList length:' + nodeListByAttr.length);
nodeListByAttr.attr("id", "main-body");
var nodeListByID = dojo.query('#main-body');
alert('Search by id nodeList length:' + nodeListByID.length);
nodeListByID.innerHTML('Content set after finding the element by ID');
});
</script>
</head>
<body>
<div style="">
</div>
</body>
</html>
Hope this helps
@Nikanos' answer covers the query issue. I would like to add that any query returns an array of elements; in the case of Dojo it is a dojo/NodeList.
The problem is that you are about to assign the same id to multiple DOM nodes, especially with a query containing div:not([style]). I recommend using a more specific query, like the first div child of body:
var nodes = dojo.query('body > div:first-child');
nodes.attr("id", "main-body");
To make it more robust, do not manipulate all the nodes, just the first node (even though there should be just one):
dojo.query('body > div:first-child')[0].id = "main-body";
This also works in IE9; see it in action: http://jsfiddle.net/phusick/JN4cz/
The same example written in Modern Dojo: http://jsfiddle.net/phusick/BReda/
For some reason that I'm sure the folks at DivX think is important, there is no straightforward way to prevent their plugin from replacing all video elements on your page with their fancy logo.
What I need is a workaround for this, telling the plugin to skip some videos, i.e. not replace them with their playable content.
I got around this by putting an empty HTML 5 video tag, then putting in the video source tags in a JavaScript function in the body onload event. The video then comes up in the normal HTML 5 player and not the DivX web player.
e.g.
This would give the DivX player:
<video width="320" height="240" controls="controls">
<source src="movie.mp4" type="video/mp4" />
</video>
But this would give the normal html 5 player:
<head>
<script type="text/javascript">
function changevid() {
document.getElementById('vid').innerHTML = '<source src="inc/videos/sample1.mp4" type="video/mp4" />';
document.getElementById('vid').load();
}
</script>
</head>
<body onload="changevid()">
<video id="vid" width="800" height="450" controls="controls">
</video>
</body>
At this time, there is no API or means to block the divx plugin from replacing video elements with their placeholder. :-(
I started reverse-engineering the DivX plugin to find out what can be done to hack a way into disabling it. An example, including the complete source code of the DivX plugin, can be found here: http://jsfiddle.net/z4JPB/1/
It currently appears to me that a possible solution could work like this:
Create a "clean" backup of the methods appendChild, replaceChild and insertBefore. This has to happen before the content script from the Chrome extension is executed.
The content script then executes, overrides the methods mentioned above and adds event listeners for the DOMNodeInsertedIntoDocument and DOMNodeInserted events.
After that, the event listeners can be removed and the original DOM methods restored. You should now be able to replace the embed elements created by the plugin with the video elements.
It seems that the plugin only replaces the video when there are source elements within the video tag. For me it worked to first add the video tag and then, in a second step after a short timeout, add the source tags. However, this doesn't work in IE, but IE had no problem with inserting the complete video tag at once.
So following code worked for me in all browsers (of course, jQuery required):
var $container = $('#video_container'); // assumes the container element has id="video_container"
var video = 'my-movie';
var videoSrc = '<source src="video/'+video+'.mp4" type="video/mp4"></source>' +
'<source src="video/'+video+'.webm" type="video/webm"></source>' +
'<source src="video/'+video+'.ogv" type="video/ogg"></source>';
if(!$.browser.msie) {
$container.html('<video autoplay loop></video>');
// this timeout avoids divx player to be triggered
setTimeout(function() {
$container.find('video').html(videoSrc);
}, 50);
}
else {
// IE has no problem with divx player, so we add the src in the same thread
$container.html('<video autoplay loop>' + videoSrc + '</video>');
}