How to capture screenshots of 1000 web pages concurrently in C# - webbrowser-control

I need to get screenshots of 1000 URLs using Parallel.ForEach in a Windows service. I tried to use the WebBrowser control, but it throws an error since it runs only in an STA. Kindly tell me how to achieve this task using Parallel.ForEach...
Edit: I am using a third-party trial-version DLL in the code below to process it...
Parallel.ForEach(webpages, webPage =>
{
    GetScreenShot(webPage);
});

public void GetScreenShot(string webPage)
{
    WebsitesScreenshot.WebsitesScreenshot _Obj;
    _Obj = new WebsitesScreenshot.WebsitesScreenshot();
    WebsitesScreenshot.WebsitesScreenshot.Result _Result;
    _Result = _Obj.CaptureWebpage(webPage);
    if (_Result == WebsitesScreenshot.WebsitesScreenshot.Result.Captured)
    {
        _Obj.ImageFormat = WebsitesScreenshot.WebsitesScreenshot.ImageFormats.PNG;
        _Obj.SaveImage(somePath);
    }
    _Obj.Dispose();
}
Most of the time this code runs fine up to about 80 URLs, but after that some tasks are blocked. I don't know why...
Sometimes the error is ContextSwitchDeadlock, as given below...
ContextSwitchDeadlock was detected
Message: The CLR has been unable to transition from COM context 0x44d3a8 to COM context 0x44d5d0 for 60 seconds. The thread that owns the destination context/apartment is most likely either doing a non pumping wait or processing a very long running operation without pumping Windows messages. This situation generally has a negative performance impact and may even lead to the application becoming non responsive or memory usage accumulating continually over time. To avoid this problem, all single threaded apartment (STA) threads should use pumping wait primitives (such as CoWaitForMultipleHandles) and routinely pump messages during long running operations.

This error indicates that a CLR thread has not pumped any Windows messages for an extended period of time. If a process is resource-starved, causing extended waits during processing, this error can occur.
Given that you are trying to process 1000 web pages simultaneously, it would be no surprise if at least some of the threads become resource-starved. Personally, I am surprised that you can hit 80 websites without seeing errors.
Back off the number of websites you are trying to process in parallel and your problems will likely disappear. Since you are running the trial version, there is little else you can do. If you licensed the commercial version, you might be able to get support from the vendor, but at a guess they would simply tell you to do the same thing.
The WebsitesScreenshot library can be quite resource-intensive depending on the web page, especially if the pages have Flash. Think of it as being logically equivalent to opening 80 tabs simultaneously in a web browser.
You don't mention whether you are using the 32-bit or the 64-bit version, but the 64-bit version is likely to have fewer resource constraints, especially around memory. IMHO the .NET Framework does a poor job of minimizing memory usage, so memory problems can crop up earlier than you would think.
ADDED
Please try limiting the number of threads first, e.g.
Parallel.ForEach(
    webpages,
    new ParallelOptions { MaxDegreeOfParallelism = 10 }, // 10 thread limit
    webPage => { GetScreenShot(webPage); }
);
Without access to the source code, you may not be able to change the threading model at all. You might also try setting the timeout to a higher value.
I don't have this control personally and am not willing to install it on my machine to answer a question re: changing the threading model. Unless it is a documented feature, you probably won't be able to do it without changing or at least inspecting the source.
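For what it's worth, below is a minimal sketch (not tested against the trial DLL) of how the throttled loop could be combined with per-page error handling, so a single page that hangs or fails doesn't block the rest of the batch. It reuses the GetScreenShot method and webpages collection from the question; the degree of parallelism is just a starting point to tune.
var failedPages = new System.Collections.Concurrent.ConcurrentBag<string>();

Parallel.ForEach(
    webpages,
    new ParallelOptions { MaxDegreeOfParallelism = 10 }, // starting point; tune for your hardware
    webPage =>
    {
        try
        {
            GetScreenShot(webPage); // method from the question
        }
        catch (Exception ex)
        {
            // Record the failure instead of letting one page take down the batch;
            // replace the Console call with your service's logging.
            failedPages.Add(webPage);
            Console.WriteLine("Failed to capture {0}: {1}", webPage, ex.Message);
        }
    });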

Related

Understanding Cro request/response cycle and memory use

I'm a bit confused about how Cro handles client requests and, specifically, why some requests seem to cause Cro's memory usage to balloon.
A minimal example of this shows up in the literal "Hello world!" Cro server.
use Cro::HTTP::Router;
use Cro::HTTP::Server;
my $application = route {
    get -> {
        content 'text/html', 'Hello Cro!';
    }
}

my Cro::Service $service = Cro::HTTP::Server.new:
    :host<localhost>, :port<10000>, :$application;
$service.start;

react whenever signal(SIGINT) {
    $service.stop;
    exit;
}
All that this server does is respond to GET requests with 'Hello Cro!', which certainly shouldn't be taxing. However, if I navigate to localhost:10000 and then rapidly refresh the page, I notice Cro's memory use start to climb (and then stay elevated).
This only seems to happen when the refreshes are rapid, which suggests that the issue might be related either to not properly closing connections or to a concurrency issue (a maybe-slightly-related prior question).
Is there some performance technique or best practice that this "Hello world" server has omitted for simplicity? Or am I missing something else about how Cro is designed to work?
The Cro request processing pipeline is a chain of supply blocks that requests and, later, responses pass through. Decisions about the optimal number of processing threads to create are left to the Raku ThreadPoolScheduler implementation.
So far as connection lifetime goes, it's up to the client - that is, the web browser - as to how eagerly connections are closed; if the browser uses a keep-alive HTTP/1.1 connection or retains an HTTP/2.0 connection, Cro respects that request.
Regarding memory use, growth up to a certain point isn't surprising; it's only a problem if it doesn't eventually level out. Causes include:
The scheduler determining more threads are required to handle the load. Each OS thread comes with some overhead inside the VM, the majority of it being that the GC nursery is per thread to allow simple bump-the-pointer allocation.
The MoarVM optimizer using memory for specialized bytecode and JIT-compiled machine code, which it produces in the background as the application runs, and is driven by certain bits of code having been executed enough times.
The GC trying to converge on a full collection threshold.

Is there any internal timeout in Microsoft UIAutomation?

I am using the UI Automation COM-to-.NET Adapter to read the contents of a target Google Chrome browser that plays Flash content on Windows 7. It works.
I succeeded in getting the content and elements. Everything works fine for some time, but after a few hours the elements become inaccessible.
The (AutomationElement).FindAll() returns 0 children.
Is there any internal, undocumented timeout used by UIAutomation?
According to this IUIAutomation2 interface documentation, there are two timeouts, but they are not accessible from the IUIAutomation interface, and IUIAutomation2 is supported only on Windows 8 (desktop apps only).
So I believe there is some timeout.
I made a workaround that restarts the searching and monitoring of elements from the root of the desktop tree, but the elements are still not available. After some time (I'm not sure how much) the elements become available again.
My requirement is to read the values continuously, as fast as possible, but this behavior damages the whole architecture.
I read somewhere that there is a timeout of 3 minutes, but I'm not sure.
If there is a timeout, is it possible to change it?
Is it possible to restart something, or release/dispose something?
I can't find anything on MSDN.
Does anybody have any idea what is happening and how to resolve it?
Thanks for this nicely put question. I have a similar issue with a much different setup. I'm on Win7, using UIAutomationCore.dll directly from C# to test our application under development. After running my sequence of actions and event subscriptions and all the other things, I intermittently observe that the UIA interface stops working (after about 8-10 minutes in my case, but I'm using the UIA interface heavily).
Many different things failed, including dispatching the COM interface and sleeping at various places. The funny revelation was that I managed to run AccEvent.exe (part of the SDK, like inspect.exe) during the test and saw that events stopped flowing to AccEvent too. So it wasn't my client's interface that stopped; rather, it was the COM server (or whatever UIAutomationCore does) that stopped responding.
As a solution (that seems to work most of the time, or at least improves the situation a lot), I decided I should give the application under test some breathing room, since using UIA puts additional load on it. This could be smartly placed sleep points in your client, but instead of sleeping for a set time, I'm monitoring the processor load of the application and waiting until it settles down.
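As a rough illustration only (not my exact code), waiting for the application under test to settle down might look something like the C# sketch below; the process name, sampling interval and the 5% threshold are placeholder assumptions.
using System;
using System.Diagnostics;
using System.Threading;

static class IdleWait
{
    // Wait until the target process's CPU usage drops below a threshold
    // before issuing the next batch of UIA calls.
    public static void WaitUntilIdle(string processName, double maxCpuPercent = 5.0)
    {
        Process target = Process.GetProcessesByName(processName)[0];
        TimeSpan previous = target.TotalProcessorTime;

        while (true)
        {
            Thread.Sleep(500);               // sampling interval
            target.Refresh();
            TimeSpan current = target.TotalProcessorTime;

            // Approximate CPU% over the interval, normalized across all cores.
            double cpuPercent = (current - previous).TotalMilliseconds
                                / (500.0 * Environment.ProcessorCount) * 100.0;
            if (cpuPercent < maxCpuPercent)
                return;

            previous = current;
        }
    }
}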
One of the intermittent errors I receive when the problem manifests itself is "... was unable to call any of the subscribers..", and my search turned up an MSDN page saying they have improved things in the CUIAutomation8 interface, but as this is Windows 8 specific, I haven't had the chance to try it yet.
I should also add that I reduced the number of calls to UIA by incorporating more UI caching (FindAllBuildCache), as the less back-and-forth there is, the better it is for UIA. Thanks to Guy's answer in another question: UI Automation events stop being received after a while monitoring an application and then restart after some time
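For reference, here is a minimal sketch of what that caching can look like with the managed/adapter API, where CacheRequest plays the role of the COM FindAllBuildCache call; which properties you cache depends on what your client actually reads, and the Name property here is just an example.
using System;
using System.Windows.Automation;

// Prefetch properties with a CacheRequest so that later reads of
// element.Cached.* do not trigger extra cross-process UIA calls.
var cacheRequest = new CacheRequest();
cacheRequest.Add(AutomationElement.NameProperty);
cacheRequest.TreeScope = TreeScope.Element | TreeScope.Children;

using (cacheRequest.Activate())
{
    // One round trip builds the cache for every returned element.
    AutomationElementCollection children = AutomationElement.RootElement
        .FindAll(TreeScope.Children, Condition.TrueCondition);

    foreach (AutomationElement child in children)
    {
        // Served from the cache built during FindAll, not a live call.
        Console.WriteLine(child.Cached.Name);
    }
}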

Calling a long-running VB6 COM object in classic ASP - timeout error

I have a long-running VB6 COM object called from a classic ASP page.
It works perfectly when there is not a lot to do, but it times out if it has to do a lot.
Is there a way of calling it asynchronously so it won't time out, or
could I show a progress bar that keeps refreshing the client so it wouldn't time out?
Set objQReport = Server.CreateObject("ReportGenerator")
mainRpt = objQReport.GenerateReport(MySessionRef) ' times out here sometimes
Set objQReport = nothing
Any tips would be helpful.
Web technology is not really suited for long-running tasks, but you have several options:
One option is to do an AJAX call to a second ASP page. As soon as your ASP page is running, the server will finish the process, even if the client (the browser/AJAX that did the actual call) is no longer connected.
This method does use web technology to process a long-running task, and the downside is that you are burdening your IIS machine with this long-running task, leaving less performance for the thing IIS is good at: serving web pages.
So in your landing page (say default.asp), do an AJAX call to your (long-running) report page. How to do an AJAX call depends on what (if any) JavaScript library you use. In jQuery it would be something like this:
<script type="text/javascript">
    /* AJAX call to start the report generation */
    $(document).ready(function() {
        $.get("[URL_OF_YOUR_LONG_RUNNING_PROCESS]", function(data) {
            alert(data);
        });
    });
</script>
As you can see I am alerting any data that is returned from this URL, but in your case that is probably not what you want. You want your visitor to keep browsing while the long running process keeps working.
This way, the URL is called asynchronously. The server will start processing the URL and the browser doesn't have to wait for it. The server will continue and finish the long running task in the background.
Please note that you will still have to increase Server.ScriptTimeout on the ASP page that runs the long process. The AJAX call only makes sure the user can continue browsing; the server will still respect the configured Server.ScriptTimeout setting and fail the background request if it takes too long.
A widely used second option is to use a message queue. A message queue accepts messages and guarantees delivery of these messages, even if the computer or network goes down.
Microsoft Windows has MSMQ built in (you'll have to enable it in the software settings), and you can use it from classic ASP. The queue will store messages and deliver them to a consumer. The consumer is something you need to write yourself: an application that reads the queue and processes the messages inside.
What you do is have ASP write a message to the MSMQ, containing information on what task to perform and its parameters.
Your consumer application will have to poll the MSMQ, read the message and start the long-running process. This will then run completely independently of IIS, and can even run on a totally different computer (MSMQ can work across networks).
The downside of this second method is that you will have to write a consumer, most likely in a somewhat lower-level language like VB or C# (though you might be able to use Python, for example), and preferably write it as a service. I don't know how comfortable you are in (one of) these languages, but if you wrote the COM object yourself, it would be trivial to write an executable in VB6 that polls an MSMQ and calls the COM object; a rough C# sketch of such a consumer is shown below.
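As an illustration only, a minimal C# consumer might look like this. The queue path and the idea of passing the session reference as a plain string are assumptions for the example; "ReportGenerator" is the ProgID from the ASP snippet above.
using System;
using System.Messaging; // requires a reference to System.Messaging (.NET Framework)

class ReportQueueConsumer
{
    static void Main()
    {
        // Hypothetical private queue name; create it if it does not exist yet.
        const string queuePath = @".\Private$\reportRequests";
        if (!MessageQueue.Exists(queuePath))
            MessageQueue.Create(queuePath);

        using (var queue = new MessageQueue(queuePath))
        {
            // Assume the ASP page sends the session reference as a plain string.
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });

            while (true)
            {
                // Blocks until a message arrives.
                Message message = queue.Receive();
                string sessionRef = (string)message.Body;

                // Call the long-running COM object outside of IIS.
                // "ReportGenerator" is the ProgID used in the ASP code above.
                dynamic reportGenerator = Activator.CreateInstance(
                    Type.GetTypeFromProgID("ReportGenerator"));
                reportGenerator.GenerateReport(sessionRef);
            }
        }
    }
}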
Below are some links to get you started.
http://support.microsoft.com/kb/173339
http://technosock.blogspot.nl/2007/07/microsoft-message-queue-from-classical.html
http://www.informit.com/articles/article.aspx?p=131272&seqNum=6
Hope this helps,
Erik

Mule and memory (RAM) usage

I've tried to run Mule in three scenarios in order to test its memory usage:
One case is where I had a Quartz generator create an event that a filter (right after it in the flow) always stopped (returned false) - meaning the flow did absolutely nothing.
In another case I did not use the filter, but just used that flow to send a custom object to a WCF service running on another computer (using a CXF endpoint).
Also, I've checked what happens when I leave the flow as is but drop the WCF service (meaning a lot of socket connection exceptions were thrown).
I did this because I am building a large app that needs this bus to work at all times (weeks at a time).
In all of those cases, the memory usage kept rising (getting as high as 200 MB of RAM after a few hours).
Any specific reasons this could happen? What is causing Mule to take more memory in all of these cases?
Off the top of my head, I'll stick with thread pool lazy initialization as the explanation for this behavior. As time goes on and usage gets higher, the thread pools get fully initialized.
If you want supporting evidence, take a look at this approach, or this one (with enableStatistics).

SharePoint 2010 Sandbox Solution Timeout

Is there a way to adjust the timeout value for a SharePoint 2010 sandbox solution? I think it defaults to 30 seconds. I have a web part that occasionally runs a little longer than that. I really would prefer not to fall back to a farm solution if I can avoid it.
Finding the documentation on this was a little difficult, but I found it here. The relevant parts are these:
Per Request, with the Request Penalized: There is a hard limit to how long a sandboxed solution can take to be completed. By default, this is 30 seconds. If a sandboxed solution exceeds the limit, the application domain that handles the request (but not the sandboxed worker process) is terminated. This limit is configurable, but only through custom code against the object model. The relevant parts of the object model cannot be accessed by sandboxed solutions, so no sandboxed solution can change the limit.
CPU Execution Time: The absolute limit of this resource is not applicable as long as it is set higher than the Per Request, with the Request Penalized limit described above. Normally, administrators will want to keep it higher so that the slow request is terminated before it causes a termination of the whole sandboxed worker process, including even the well-behaved sandboxed solutions running in it.
The following code can be used to adjust Per Request timeout:
SPUserCodeService.Local.WorkerProcessExecutionTimeout = 40;
SPUserCodeService.Local.Update();
You should be able to adjust the CPU Execution Time with something like the following:
SPUserCodeService.Local.ResourceMeasures["CPUExecutionTime"].AbsoluteLimit = 50.0;
SPUserCodeService.Local.Update();
You have to restart the Microsoft SharePoint Foundation Sandboxed Code Service for the changes to take effect.
In PowerShell, you can adjust the timeouts using the following commands:
$uc=[Microsoft.SharePoint.Administration.SPUserCodeService]::Local
$uc.WorkerProcessExecutionTimeout = 60
$uc.ResourceMeasures["CPUExecutionTime"].AbsoluteLimit = 120
$uc.Update()