A WinForm application. I want to scrape a part of an HTML web page and save it into a local html file.
I have one local file, "empty.htm" (containing just "I'm empty" in the body), one remote web page, and two WebBrowser controls. WebBrowser1 navigates to the remote page, WebBrowser2 to the local file. Both display their content appropriately.
Now I try:
string rootIDToCopy = "InterestingDivID";
HtmlDocument htmlDocument = webBrowser1.Document;
HtmlElement rootElementToCopy =
htmlDocument.GetElementById(rootIDToCopy);
if (rootElementToCopy != null)
{
HtmlDocument dest = webBrowser2.Document;
if (dest != null)
{
HtmlElement destBody = dest.Body; // Point 1
destBody.AppendChild(rootElementToCopy); // Point 2
}
}
Now, at Point 1, I see that destBody exists, has no children and has an InnerHtml of "I'm empty". rootElementToCopy appears valid (it has three children and a reasonable InnerHtml). However, at Point 2 I get "Value does not fall within the expected range" (probably from Windows.Forms.UnsafeNativeMethods.IHTMLElement2.InsertAdjacentElement).
Help will be appreciated!
You may not be allowed to: see WRONG_DOCUMENT_ERR and ownerDocument in the DOM specification.
Instead I think you might have to serialize the subtree to a flat string format before you try to insert it into a different document.
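The serialize-then-insert idea can be sketched as follows. This is only an illustration in JavaScript using stand-in objects rather than real elements; the helper name copySubtreeAsHtml is hypothetical. In the WinForms code from the question, the equivalent would presumably be reading rootElementToCopy.OuterHtml and appending that string to destBody.InnerHtml.

```javascript
// Copy a subtree between documents by serializing it to markup first.
// sourceElement and destBody are any objects exposing outerHTML / innerHTML.
function copySubtreeAsHtml(sourceElement, destBody) {
  var html = sourceElement.outerHTML; // flat string, owned by no document
  destBody.innerHTML += html;         // re-parsed inside the destination
  return html;
}

// Stand-ins for real elements (hypothetical data, for illustration only).
var source = { outerHTML: '<div id="InterestingDivID"><p>one</p></div>' };
var dest = { innerHTML: "I'm empty" };
copySubtreeAsHtml(source, dest);
```

Because only markup is copied, event handlers attached from script and relative URLs in the source page will not carry over automatically.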
I am new to HTAs. I just read https://msdn.microsoft.com/en-us/library/ms536496%28v=vs.85%29.aspx and am a bit confused.
Can I use HTAs to automate browsing? Say I want to download a web page and fill in a form automatically, i.e. from a script. How would an HTA help me do this, if at all? It's important that the JavaScript code in the downloaded page is run as usual. I should be able to enter somehow and fill in the form after it has finished initializing, just as if I were a human agent.
First, you need to open an IE window, as follows:
var IE = new ActiveXObject("InternetExplorer.Application");
Then navigate the IE window to the webpage you want:
IE.Navigate("www.example.com");
Whether your IE window is visible or invisible is up to you. Use the Visible property to make it visible:
IE.Visible = true;
Then wait until the web page has completely loaded and run a function that takes your desired actions. To do so, get the HTML document object from the web page using the Document property of the IE object, then repeatedly check the readyState property of the document object. The code below assumes you have a function named myFunc that takes your desired actions on the web page (for example, modifying its contents).
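var doc = null;
var interval = setInterval(function() {
    try
    {
        // Re-read the document on every tick; it may not exist right after Navigate.
        doc = IE.Document;
        if (doc.readyState == "complete")
        {
            // Stop polling first so an error in myFunc cannot trigger a second run.
            clearInterval(interval);
            myFunc();
        }
    }
    catch (e) {}
}, 1000);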
In the function myFunc, you can do anything you want with the web page, since you have the HTML document object stored in the doc variable. You can also use its parentWindow property to get the HTML window object.
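As a rough idea of what myFunc might do for the form-filling case in the question, here is a minimal sketch. The helper name fillAndSubmit and the field id "q" are assumptions for illustration, not taken from any real page, and the document is replaced by a stand-in object so the snippet runs on its own.

```javascript
// Hypothetical helper: fill in one field and submit the first form.
// fieldId is an assumed element id; adapt it to the real page.
function fillAndSubmit(doc, fieldId, value) {
  var field = doc.getElementById(fieldId);
  if (!field) {
    return false; // the page layout differs from what we assumed
  }
  field.value = value;
  doc.forms.item(0).submit();
  return true;
}

// Stand-in document object (illustration only).
var submitted = 0;
var fakeField = { value: "" };
var fakeDoc = {
  getElementById: function (id) { return id === "q" ? fakeField : null; },
  forms: { item: function (i) { return { submit: function () { submitted++; } }; } }
};
var ok = fillAndSubmit(fakeDoc, "q", "hello");
```

With the real browser object you would pass the document obtained from IE.Document instead of fakeDoc.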
This is the first time I'm posting a question here. I have searched here and elsewhere and cannot find an answer. I'm using Visual Basic 2015 in Visual Studio 2015. QUESTION: I need a modal window/popup from a particular website to remain INSIDE the WebBrowser control on my form (WebBrowser1). Currently, when a particular link is clicked, the modal window/popup jumps out of the form and opens directly on the user's screen. I have to keep this popup inside the form because it contains other links that need to be clicked, and once it leaves the WebBrowser control my code can no longer reach it. The code I have found is for older versions, not 2015. If necessary I can add a WebBrowser2 and have the popups/modal windows appear there, as long as I can keep clicking links from code inside the form. PLEASE HELP! THANK YOU!
window.open (and a click on <a target="_blank"> etc.) can be handled via the NewWindow2 event. Hans already pointed out how to do that in the comments. NewWindow3 works too, but needs at least Windows XP SP2.
window.showModalDialog is a bit trickier. IE implements IDispatchEx (wrapped as IExpando in .NET) on scripting objects, so you can replace their methods and properties with your own implementation. But window.showModalDialog shows a dialog that takes arguments and returns a value, so you need to override those properties in the modal dialog you create too. The code looks roughly like this:
void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Skip events raised by frames; act only once the whole page is complete.
    if (WebBrowserReadyState.Complete != webBrowser1.ReadyState) return;
    if (FindLoginFormOnPage()) { DoLogin(); return; }
    if (IsWelcomePage()) { NavigateToPage1(); return; }
    if (IsPage1()) { SubmitFormOnPage1(); return; }
    if (IsPage1FormResult())
    {
        var document = webBrowser1.Document.DomDocument as mshtml.IHTMLDocument2;
        var expando = (IExpando)document.parentWindow;
        // Replace the script-visible showModalDialog with our own implementation.
        expando.RemoveMember(expando.GetMethod("showModalDialog",
            BindingFlags.Instance | BindingFlags.Public));
        expando.AddMethod("showModalDialog",
            new ShowModalDialogDelegate(this.MyShowModalDialog));
    }
    ......
}
object MyShowModalDialog(string url, object varArgIn, object options)
{
    // MyShowModalDialog is also the name of the form class that hosts the dialog page.
    using (MyShowModalDialog myShowModalDialog = new MyShowModalDialog())
    {
        myShowModalDialog.StartupUrl = url;
        myShowModalDialog.DialogArguments = varArgIn;
        // omit the code to parse options
        // and set dialog height/width/top-left location etc.
        if (myShowModalDialog.ShowDialog() == DialogResult.OK)
        {
            // do something on the return value before passing it to the scripts
            ......
            return myShowModalDialog.ReturnValue;
        }
        return null;
    }
}
and in the Load event handler of MyShowModalDialog you call something like webBrowser1.Navigate to show the page requested by the parent page.
Now you need to pass the arguments to the webbrowser control on the new form. Do the same as above but replace another property this time.
expando.RemoveMember(expando.GetProperty("dialogArguments",
    BindingFlags.Instance | BindingFlags.Public));
expando.AddProperty("dialogArguments")
    .SetValue(expando, this.DialogArguments, null);
This will let the web page access the value passed from MyShowModalDialog and stored in this.DialogArguments.
The earliest you can access the DOM is in webBrowser1_DocumentCompleted. By that time the scripts on the page that read window.dialogArguments have probably already executed and got nothing. After overriding window.dialogArguments, you need to study the script on the page to find out how to correct for that. For example, if the page has
<head>
<script>
var oMyObject = window.dialogArguments;
var sFirstName = oMyObject.firstName;
var sLastName = oMyObject.lastName;
</script>
...
<span style="color: 00ff7f">
<script>
document.write(sFirstName);
</script>
</span>
you need to change the values of sFirstName and sLastName, then change the innerText property of the span, which you would probably identify via its relationship with a named div or table cell. You can write the necessary changes in a script and call it via HtmlDocument.InvokeScript.
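A script of that kind might look roughly like the sketch below. The function name applyDialogArguments and the container id "nameCell" are hypothetical, and the window and document are passed in as parameters (and replaced by stand-ins here) so the snippet can run outside a browser.

```javascript
// Re-apply dialog arguments after the page's own scripts have already run.
// containerId is an assumed id of the element that contains the target span.
function applyDialogArguments(win, doc, args, containerId) {
  // Top-level "var" declarations in page scripts are properties of window.
  win.sFirstName = args.firstName;
  win.sLastName = args.lastName;
  var container = doc.getElementById(containerId);
  var span = container ? container.getElementsByTagName("span")[0] : null;
  if (span) {
    span.innerText = args.firstName; // redo what document.write printed
  }
  return Boolean(span);
}

// Stand-in objects (illustration only).
var fakeSpan = { innerText: "undefined" };
var fakeWin = {};
var fakePage = {
  getElementById: function (id) {
    if (id !== "nameCell") { return null; }
    return { getElementsByTagName: function () { return [fakeSpan]; } };
  }
};
var patched = applyDialogArguments(
  fakeWin, fakePage, { firstName: "Ada", lastName: "Lovelace" }, "nameCell");
```

In a real page, win and doc would simply be window and document.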
If the page returns a value to its parent, you need to pass it on to your parent form too. Override window.returnValue so when the script writes to window.returnValue it writes to a variable you provided
......
expando.RemoveMember(expando.GetProperty("returnValue",
    BindingFlags.Instance | BindingFlags.Public));
expando.AddProperty("returnValue").SetValue(expando, this.ReturnValue, null);
Problem: I routinely receive PDF reports and annotate (highlight etc.) some of them. I had the bad habit of saving the annotated PDFs together with the non-annotated PDFs. I now have hundreds of PDF files in the same folder, some annotated and some not. Is there a way to check every PDF file for annotations and copy only the annotated ones to a new folder?
Thanks a lot!
I'm on Win 7 64bit, I have Adobe Acrobat XI installed and I'm able to do some beginner coding in Python and Javascript
Please ignore the following suggestion, since the answers already solved the problem.
EDIT: Following Mr. Wyss' suggestion, I created the following code for Acrobat's Javascript console to be run only once at the beginning:
counter = 1;
// Open a new report
var rep = new Report();
rep.size = 1.2;
rep.color = color.blue;
rep.writeText("Files WITH Annotations");
Then this code should be applied to all PDFs:
this.syncAnnotScan();
annots = this.getAnnots();
path = this.path;
if (annots) {
rep.color = color.black;
rep.writeText(" ");
rep.writeText(counter.toString()+"- "+path);
rep.writeText(" ");
if (counter% 20 == 0) {
rep.breakPage();
}
counter++;
}
And, at last, one code to be run only once at the end:
//Now open the report
var docRep = rep.open("files_with_annots.pdf");
There are two problems with this solution:
1. The "Action Wizard" seems to apply the same code afresh to each PDF. That means the "counter" variable, for instance, is meaningless, because it will always be 1. More importantly, the variable "rep" will be unassigned when the middle code runs on a different PDF.
2. How can I make the code that should run only once run only at the beginning or at the end, instead of running every time for every single PDF (as it does by default)?
Thank you very much again for your help!
This would be possible using the Action Wizard to put together an action.
The function to determine whether there are annotations in the document would be done in Acrobat JavaScript. Roughly, the core function would look like this:
this.syncAnnotScan() ; // updates all annots
var myAnnots = this.getAnnots() ;
if (myAnnots != null) {
// do something if there are annots
} else {
// do something if there are no annots
}
And that should get you there.
I am not completely positive, but I think there is also a Preflight check which tells you whether there are annotations in the document. If so, you would create a Preflight droplet, which would sort out the annotated and not annotated documents.
Mr. Wyss is right, here's a step-by-step guide:
In Acrobat XI Pro, go to the 'Tools' panel on the right side
Click on the 'Action Wizard' tab (you must first make it visible, though)
Click on 'Create New Action...', choose 'More tools' > 'Execute Javascript' and add it to right-hand pane > click on 'Execute Javascript' > 'Specify Settings' (uncheck 'prompt user' if you want) > paste this code:
this.syncAnnotScan();
var annots = this.getAnnots();
var fname = this.documentFileName;
// Replace every comma, not just the first one (a string pattern matches only once).
fname = fname.replace(/,/g, ";");
var errormsg = "";
if (annots) {
    try {
        this.saveAs({
            cPath: "/c/folder/" + fname,
            bPromptToOverwrite: false // make this 'true' if you want to be prompted on overwrites
        });
    } catch (e) {
        // Collect all properties of the error object for the alert below.
        for (var i in e) {
            errormsg += (i + ": " + e[i] + " / ");
        }
        app.alert({
            cMsg: "Error! Unable to save the file under this name ('" + fname + "'- possibly an unicode string?) See this: " + errormsg,
            cTitle: "Damn you Acrobat"
        });
    }
}
annots = 0;
Save and run it! All your annotated PDFs will be saved to 'c:\folder' (but only if this folder already exists!)
Be sure to first enable JavaScript under 'Edit' > 'Preferences...' > 'Javascript' > 'Enable Acrobat Javascript'.
VERY IMPORTANT: Acrobat's JS has a bug that doesn't allow Docs to be saved with commas (",") in their names (e.g., "Meeting with suppliers, May 11th.pdf" - this will get an error). Therefore, I substitute in the code above all "," for ";".
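One detail worth checking when substituting the commas: in JavaScript, replace with a string pattern only swaps the first match, so file names with several commas need a global regular expression. A minimal sketch, using a hypothetical helper name and a made-up file name:

```javascript
// Replace every comma in a file name with a semicolon.
// sanitizeFileName is a hypothetical helper, not part of the Acrobat API.
function sanitizeFileName(fname) {
  return fname.replace(/,/g, ";");
}

var firstOnly = "Meeting, suppliers, May 11th.pdf".replace(",", ";");
var cleaned = sanitizeFileName("Meeting, suppliers, May 11th.pdf");
// firstOnly is "Meeting; suppliers, May 11th.pdf"
// cleaned   is "Meeting; suppliers; May 11th.pdf"
```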
I have to run the following JavaScript from one of my methods, but it's not running.
What's wrong with the code?
private void fillGrid1()
{
GridView1.DataSource = myDocCenter.GetDsWaitingForMe(Session["UserID"].ToString());
HiddenField1.Value = (myDocCenter.GetDsWaitingForMe(Session["UserID"].ToString()).Tables[0].Rows.Count).ToString();
GridView1.DataBind();
String csname1 = "PopupScript1";
String csname2 = "ButtonClickScript1";
Type cstype = this.GetType();
// Get a ClientScriptManager reference from the Page class.
ClientScriptManager cs = Page.ClientScript;
// Check to see if the client script is already registered.
if (!cs.IsClientScriptBlockRegistered(cstype, csname2))
{
StringBuilder cstext2 = new StringBuilder();
cstext2.Append("<script type=\"text/javascript\"> ");
// You can add JavaScript by using "cstext2.Append()".
cstext2.Append("var count = document.getElementById('ctl00_ContentPlaceHolder1_HiddenField2');");
cstext2.Append("var count = '100';");
cstext2.Append("document.getElementById('sp2').innerHTML = count;");
cstext2.Append("script>");
cs.RegisterClientScriptBlock(cstype, csname2, cstext2.ToString(), false);
}
}
Your script tag is not properly closed.
Change
cstext2.Append("script>");
to
cstext2.Append("</script>");
On top of what adamantium said, your JS looks a bit strange. You seem to declare and set the count variable twice. Did you mean to do this?
Following that, the best thing to do is render the page and then view the source. Is your JS getting rendered to the page? Try sticking an alert in there. Is it firing?
> cstext2.Append("var count =
> document.getElementById('ctl00_ContentPlaceHolder1_HiddenField2');");
I would use the ClientID property here. HiddenField2.ClientID
RegisterClientScriptBlock emits the script just after the opening <form> tag. The browser also executes the script at that point, but the referenced elements have not been parsed yet, so the browser cannot find them.
The RegisterStartupScript method emits the script just before the closing </form> tag. Nearly all page elements have been processed by the browser at that point, so getElementById can find them.
See http://jakub-linhart.blogspot.com/2012/03/script-registration-labyrinth-in-aspnet.html for more details.
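Pulling the suggestions in this thread together, the client-side part could be sketched like this. The function name showCount is hypothetical, both ids are passed in rather than hard-coded (on the server they would come from ClientID), and a stand-in document is used so the snippet runs by itself.

```javascript
// Read the hidden field's value and show it in the target element.
// Returns false when either element cannot be found, for example because
// the script ran before the elements were parsed.
function showCount(doc, hiddenFieldId, spanId) {
  var hidden = doc.getElementById(hiddenFieldId);
  var target = doc.getElementById(spanId);
  if (!hidden || !target) {
    return false;
  }
  target.innerHTML = hidden.value; // use the field's value, not the element itself
  return true;
}

// Stand-in document (illustration only).
var elements = {
  ctl00_ContentPlaceHolder1_HiddenField2: { value: "100" },
  sp2: { innerHTML: "" }
};
var pageDoc = { getElementById: function (id) { return elements[id] || null; } };
var shown = showCount(pageDoc, "ctl00_ContentPlaceHolder1_HiddenField2", "sp2");
```

The false return value gives a simple way to notice the timing problem described above instead of failing silently.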
I've got a VB.NET class that is invoked with a context menu extension in Internet Explorer.
The code has access to the object model of the page, and reading data is not a problem. This is the code of a test function...it changes the status bar text (OK), prints the page HTML (OK), changes the HTML by adding a text and prints again the page HTML (OK, in the second pop-up my added text is in the HTML)
But the Internet Explorer window doesn't show it. What am I doing wrong?
Public Sub CallingTest(ByRef Source As Object)
Dim D As mshtml.HTMLDocument = Source.document
Source.status = "Working..."
Dim H As String = D.documentElement.innerHTML()
MsgBox(H)
D.documentElement.insertAdjacentText("beforeEnd", "ThisIsATest")
H = D.documentElement.outerHTML()
MsgBox(H)
Source.status = ""
End Sub
The function is called like this from JavaScript:
<script>
var EB = new ActiveXObject("MyObject.MyClass");
EB.CallingTest(external.menuArguments);
</script>
To the best of my understanding, in order to use insertAdjacentText or any of the other editing methods, the document object should be in the design mode.
In design mode you can edit the document freely, and so can the user.
Check this site for more details
I do not think that Alex is right, something else is the matter.
When I tried to do something like that, insertBefore would not work for me, but appendChild worked just fine, so adding an element is possible.
I worked in Javascript, but I don't expect that makes a difference.
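For reference, the appendChild variant mentioned above might look like the sketch below when written in JavaScript. It appends the text to the body rather than to the document element; whether that is what makes the difference in the original VB.NET case is only a guess, not a confirmed diagnosis. The helper name appendTextToBody is hypothetical and a stand-in document is used for illustration.

```javascript
// Append a text node to the end of the document body.
function appendTextToBody(doc, text) {
  var node = doc.createTextNode(text);
  doc.body.appendChild(node);
  return node;
}

// Stand-in document (illustration only).
var appended = [];
var testDoc = {
  createTextNode: function (t) { return { nodeValue: t }; },
  body: { appendChild: function (n) { appended.push(n); return n; } }
};
var added = appendTextToBody(testDoc, "ThisIsATest");
```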