Run pageMod in background thread in firefox addon sdk? - highlighting

I have to make a firefox addon that searches the loaded page against a list of words (potentially 6500 words) and highlight matches and show synonyms on hover.
So i am using HightlightRegex.js that traverses the dom and searches based on a regex which is using the regex \bMyWord\b.
The main problem is when testing the addon on a page that has many occurrences of the search word, Firefox hangs for a while (5-6 sec) and then highlights are shown. This is happening for 1 word so one can just imagine what will happen if i search 6500 words.
So is there any way that i can run the pageMod in a background thread or asynchronously and highlight words as they are matched without freezing the UI?
You can have a look at the add-on at https://builder.addons.mozilla.org/addon/1042263/latest/
Currently the add-on is not tied to separate tabs and run as a whole on the browser but i doubt that would cause Firefox to hang.
I need to do this as efficiently as possible so suggestion are very welcome.

DOM is generally not thread-safe and you cannot access it from anything other than the main thread. The only solution would be breaking up the work into smaller chunks and using setTimeout(..., 0) to run next chunk asynchronously, without blocking everything.

One thing you could try is useing the page-worker module to load the page and process it:
https://addons.mozilla.org/en-US/developers/docs/sdk/1.6/packages/addon-kit/page-worker.html
And, as Wladimir suggested, only use asynchronous code to search the document text to avoid locking up firefox.

As canuckistani hinted, the better solution requires only two synced DOM operations: reading and writing. Rip the entire page (or, even better, only its <body>) and send it through to an async worker or thread which will perform the highlighting. When it's done, the worker emits an event and passes the highlighted content, which the addon can now insert back into the page.
That way, the only synchronous operations done are fast, inexpensive ones, while the rest is done asynchronously, away from the main thread. However, canuckistani suggested to load the page in a page-worker: there is no need to do that, as the page is already loaded in a tab. Just load up a fake page and insert the actual content.

Related

a wait that stops the test till nothing is happening in the web page anymore

I am using Java and Selenium to write a test. Many times in my test i have to wait till the page or some part of it loads something. I want to wait till the load is finished. I found these two:
1) pageLoadTimeout
2) setScriptTimeout
But I don't know the exact difference between them or even if they are what I need.
The document says:
setScriptTimeout: Sets the amount of time to wait for an asynchronous script to finish execution before throwing an error. If the timeout is negative, then the script will be allowed to run indefinitely.
and
pageLoadTimeout: Sets the amount of time to wait for a page load to complete before throwing an error. If the timeout is negative, page loads can be indefinite.
but I don't really get it.
what I need is a wait that stops the test till nothing is happening in the web page anymore.
Define "nothing is happening in the web page." Now define it in such a way that will satisfy every web page ever created, or that ever will be created. Did your definition include that no more network traffic is coming over the wire? Did your definition include that the onload event has fired? How about document.readyState returning a value meaning "complete?" Did you take JavaScript execution into account? How about JavaScript scheduled using window.setTimeout()? WebSockets? Service Workers? What about <frame> or <iframe> elements?
Now you know why WebDriver doesn't have a "wait for page complete" type of wait. The items mentioned above don't even begin to scratch the surface of the list of things a user could legitimately use in a definition of "nothing is happening on the page anymore." The correct approach would be to wait for the element you want to interact with on the next page to be available in the DOM. The WebDriverWait class is ideally suited for exactly this.

WebKitGTK about webkit_web_view_load_uri

I have a question about WebktGTK.
These days I am making a program which is can analysis web page if has suspicious web content.
When "load-failed" "load-changed" signal is emitted with WEBKIT_LOAD_FINISHED,
The program anlaysis the next page continuously by calling webkit_web_view_load_uri again again.
(http://webkitgtk.org/reference/webkit2gtk/stable/WebKitWebView.html#webkit-web-view-load-uri)
The question want to ask you is memory problem.
The more the program analsysis the webpages, The more WebKitWebProcess is bigger.
webkit_back_forward_list_get_length() return value also increased by analysising web pages. Where shoud I free memory?
Do you know how Can I solve this problem or Could give me any advice where Can I get advice?
Thank you very much :-) Have a nice day ^^
In theory, what you're doing is perfectly fine, and you shouldn't need to change your code at all. In practice, WebKit has a lot of memory leaks, and programatically loading many new URIs in the same web view is eventually going to be problematic, as you've found.
My recommendation is to periodically, every so many page loads, create a new web view that uses a separate web process, and destroy the original web view. (That will also reset the back/forward list to stop it from growing, though I suspect the memory lost to the back/forward list is probably not significant compared to memory leaks when rendering the page.) I filed Bug 151203 - [GTK] Start a new web process when calling webkit_web_view_load functions? to consider having this happen automatically; your issue indicates we may need to bump the priority on that. In the meantime, you'll have to do it manually:
Before doing anything else in your application, set the process model to WEBKIT_PROCESS_MODEL_MULTIPLE_SECONDARY_PROCESSES using webkit_web_context_set_process_model(). (If you are not creating your own web contexts, you'll need to use the default web context webkit_web_context_get_default().)
Periodically destroy your web view with gtk_widget_destroy(), then create a new one using webkit_web_view_new() et. al. and attach it somewhere in your widget hierarchy. (Be sure NOT to use webkit_web_view_new_with_related_view() as that's how you get two web views to use the same web process.)
If you have trouble getting that solution to work, an extreme alternative would be to periodically send SIGTERM to your web process to get a new one. Connect to WebKitWebView::web-process-crashed, and call webkit_web_view_load_uri() from there. That will result in the same web view using a new web process.

Asynchronous Pluggable Protocol for CID: (email), how to handle duplicate URLs

This is somewhat a duplicate of this question, but that question has no (valid) answer and is 1.5 years old so asking my own with hopes people have more info now.
If you are using multiple instances of a WebBrowser control, MSHTML, IHTMLDocument, or whatever... from inside the APP instance, mostly IInternetProtocol::Start, is there a way to know which instance is loading the resource? Or is there a way to use a different APP for each instance of the control, maybe by providing one via IDocHostUIHandler or ICustomDoc or otherwise? I'm currently using IInternetSession::RegisterNameSpace to make it process wide.
Optional reading below, don't feel you need to read it unless above isn't clear.
I'm working on a legacy (Win32 C++) email client that uses the MS ActiveX WebBrowser control (MSHTML or other names it goes by) to display HTML emails. It was saving everything to temp files, updating the cid: URLs, and then having the control load that. Now I want to do it the correct way, using APP. I've got it all working with some test code that just uses static variables/globals and loads one email.
My problem now is, the app might have several instances of the control all loading different emails (and other stuff) at the same time... not really multiple threads so much, just the asynchronous nature of the control. I can give each instance of the control a unique URL to load the email, say, cid:email-GUID, and then in my APP code I can use that URL to know which email to load. However, when it comes to loading any content inside the email, like attached images using src="cid:", those will not always be unique so I will not always know which image it is, for which email. I'd like to avoid having to modify the URLs of the HTML before displaying it (I'm doing that now for the temp file thing, but want to do it a better way).
IInternetBindInfo::GetBindString can return the referrer, BINDSTRING_XDR_ORIGIN, or the root URL, BINDSTRING_ROOTDOC_URL, but those require newer versions of IE and my legacy app must support older XP installs that might even have IE6 or IE7, so I'd rather not use these.
Tagged as TWebBrowser because that is actually what I'm using (Borland Builder 6 C++), but don't need answers specific to that platform.
As the Asynchronous Pluggable Protocol Handler us very low level, you cannot attach handlers individually to different rendering controls.
Here is a way to get the referrer:
Obtain BINDSTRING_HEADERS
Extract the referrer by parsing the line Referer: http://....
See also How can I add an extra http header using IHTTPNegotiate?
Here is another crazy way:
Create another Asynchronous Pluggable Protocol Handler by calling RegisterMimeFilter.
Monitor text/plain and text/html
Scan the incoming email source (content comes incrementally) and parse store all image links in a dictionary
In NameSpaceHandler you can use this dictionary to find the reference of any image resources.

Best multithreading approach in Objective C?

I'm developing an iPad-app and I'm currently struggling with finding the best approach to multithreading. Let me illustrate this with a simplified example:
I have a view with 2 subviews, a directory picker and a gallery with thumbnails of all the images in the selected directory. Since 'downloading' and generating these thumbnails can take quite a while I need multithreading so the interaction and updating of the view doesn't get blocked.
This is what I already tried:
[self performSelectorInBackground:#selector(displayThumbnails:) withObject:currentFolder];
This worked fine because the users interactions didn't get blocked, however it miserably fails when the user taps on another folder while the first folder is still loading. Two threads are trying to access the same view and variables which results in messing up each others proper execution. When the users taps another folder, the displayThumbnails of the currently loading folder should get aborted. I didn't find any way to do this..
NSThreads
I tried this but struggled with almost the same problems as with the first method, I didn't find a (easy) way to cancel the ongoing method. (Yes, I know about [aThread cancel] but didn't find a way to 'resume' the thread). Maybe I should subclass NSThread and implement my own isRunning etc methods? But isn't there any better way or a third (or even fourth and fifth) option I'm overlooking?
I think this is a fairly simple example and I think there is perhaps a better solution without subclassing NSThread. So, what would you do? Your opinions please!
NSOperationQueue should work well for this task.
Another option would be plain GCD, however, if you've never worked with it, NSOperationQueue is probably the better choice since it pretty much automatically guides you to implementing things "the right way", has obvious ways for cancellation, etc.
You want to use Concurrent NSOperations to download and process images in the background. These would be managed by an NSOperationsQueue. Essentially these operations would be configured to fetch one image per operation, process it, save it in the file system, then message back to the main app in the main thread that the image is available.
There are several projects on github that you can look at that show how to do this - just search github using "Concurrent" or "NSOperation".
iOS has a really nice facility for doing background work. Grand Central Dispatch (GCD) and Blocks, but those don't let you have an object using delegate callbacks - thus NSOperation.
So you need to read up on blocks, GCD, and then look at some open source Concurrent NSOperations code. Using Concurrent NSOperations is not as simple as using blocks.
If I had this problem, I would probably go for an approach like this:
a single thread that will load the images, and causes the main thread to display results (I'm not a big fan of having thread mess around with GUI objects)
when a new directory is requested... well, it depends on how you want to manage things. Basically, a standard queue construct (condition variable and array) could be used for the main thread to tell the thread that "this directory will be needed" by passing it the path name; the thread will check the queue even when it's loading images (like after every image or so), and switch to the new directory whenever one shows up
you could make a directory-reader object that keeps all the state, and store this indexed by the path into a dictionary. When a new directory is requested, check that dictionary first, and only create a new object if there's none for this directory. That way, partially loaded directories would stick around until they are needed again, and can continue to load instead of having to start from scratch.
Pseudocode for the thread:
while (forever)
new element = nil
if we have an active directory loader
tell directory loader to load one image
if false then make directory loader inactive
lock queue condition
if queue has elements
new element = retrieve LAST element (we aren't interested in the others)
empty queue
unlock with status "empty"
else
unlock queue
else
lock queue on condition "has elements"
new element = retrieve last element
empty queue
unlock with status "empty"
if new element != nil
if directory loader for new path does not exist
setup new directory loader for new path
store in dictionary
make it the "active" one
else
make the current one the "active"
As for the directory loader, it might look something like this:
read one image:
if there are still images to read:
read, process and store one
return true
else
performSelectorOnMainThread with an "update GUI" method and the image list as parameter
return false;
This is just a quick sketch; there's some code duplication in the thread, and the way I wrote it will only update the GUI after all images have been read, instead of making them appear as we read them. You'll have to copy the current image list, or add synchronization if you want to do that.

Google Gears hangs when capturing (unmanaged resource store)

Our code was written based on the example from Google Gears' own docs. We're using an unmanaged resource store. So we declare the files in an array, create the store, and capture all the files.
Trouble is, the capturing process hangs. It always hangs on a random file (no discernible pattern has emerged), and when you reload the page, it always successfully captures.
We're capturing 48 files. It seems to have nothing to do with the files themselves, as it hangs on every file type. I've seen it hang on the 6th file or the 47th. Windows and Mac. FF, IE, and Safari.
We are not using a WorkerPool, and I'm thinking this may be necessary. Any other ideas why it would hang?
I discovered that the problem was in the scope of the variable. The code we had used from Google's own example had the creation of the store, and the capture happening in separate functions, and since we were downloading so many files along the way, the object was getting destroyed by the browser's own trash collector.
That's why the callback was producing no errors, and it was instead just hanging.