I am writing a native YARN application following the model of the distributed shell application. In my application master I am requesting two containers using the usual loop, as follows:
for (int i = 0; i < appContainerList.size(); ++i)
{
    ContainerRequest containerAsk = setupContainerAskForRM(i);
    amRMClient.addContainerRequest(containerAsk);
    appContainerList.setStatus(i, "requested");
}
As long as the two containers request the same amount of memory, say either 512 MB or 1000 MB, then shortly after this loop runs I get a callback to the onContainersAllocated method of my AMRMClientAsync.CallbackHandler with a list of the two containers that were allocated. This also happens if I ask for more than two containers with the same resource capability, but I am keeping it to two here to keep the demonstration of the issue simple.
However, if I make the requests with different capabilities, say one for 512 MB and the other for 1000 MB, then I also get a callback but only one container is allocated, and I never get a callback for the second container request.
I know that the communication between the AMRMClientAsync and the RM rides on top of heartbeats that are sent every second, so I tried inserting a sleep between the two container requests and now I get two callbacks, each with one allocated container.
Here is my code with a sleep.
for (int i = 0; i < appContainerList.size(); ++i)
{
    ContainerRequest containerAsk = setupContainerAskForRM(i);
    amRMClient.addContainerRequest(containerAsk);
    appContainerList.setStatus(i, "requested");
    try
    {
        Thread.sleep(5000);
    }
    catch (InterruptedException ex)
    {
        LOG.info("sleep interrupted " + ex);
    }
}
Is this correct: is it impossible to request containers with different resource capabilities in a tight loop? Do requests for containers with different resource capabilities require a sleep in between so they don't end up riding the same heartbeat communication to the RM?
If so, this seems to mean that if I have many different container types, with different resource capabilities, I have to group them and make sure that requests for different types have at least one heartbeat between them. This is much more complicated than simply requesting the containers in a tight loop without regard to the resource capabilities each one is requesting.
I found a related post here: post by yihee and a JIRA here: YARN-314.
The answer to my question appears to be, as it says in YARN-314:
"Currently, resource requests for the same container and locality are expected to all be the same size." Therefore to request containers of different resource requirements in a tight loop they must have different priorities if the resources requested are different.
Answering my own question. Based on the other references I mentioned, especially YARN-314, I changed the priorities of the containers I requested and now I can request the containers in a tight loop and I get both containers allocated in the same call to my onContainersAllocated callback handler.
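For reference, a minimal sketch of what that change looks like (the variable names are mine; the ContainerRequest constructor is the standard AMRMClient one that takes a Priority):

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

// Each distinct capability gets its own priority, so both asks can ride
// the same heartbeat without being collapsed into a single request.
Priority smallPriority = Priority.newInstance(0);
Priority largePriority = Priority.newInstance(1);
Resource small = Resource.newInstance(512, 1);    // 512 MB, 1 vcore
Resource large = Resource.newInstance(1000, 1);   // 1000 MB, 1 vcore

amRMClient.addContainerRequest(new ContainerRequest(small, null, null, smallPriority));
amRMClient.addContainerRequest(new ContainerRequest(large, null, null, largePriority));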
I'm using pipelining with Lettuce, and I have a design question. When trying to send a block of commands to Redis using the sendBlock method below, I'm thinking about 2 options:
(1) Having one instance of the connection already established in the class, and reuse it:
private void sendBlock()
{
    this.conn.setAutoFlushCommands(false);
    (...)
    this.conn.flushCommands();
}
(2) Every time I send a block of commands, get a connection from Redis, perform the action and close it:
private void sendBlock()
{
    StatefulRedisModulesConnection<String, String> conn = RedisClusterImpl.connect();
    conn.setAutoFlushCommands(false);
    (...)
    conn.flushCommands();
    conn.close();
}
Since established connections seem to be shared between all threads in Lettuce, I'm not sure if option 1 is correct. If not, I have to go with option 2, and in that case I don't know how costly it is to obtain a connection from Redis, so I'm wondering whether I need to use pooling (something the Lettuce docs recommend against). In our use case the sendBlock method can be called simultaneously hundreds of times, so it is used intensively by a lot of different threads.
Any help would be really appreciated.
Joan.
Lettuce connections are thread-safe and can be shared if you don't use Redis-blocking commands (e.g. BLPOP) or transactions.
Those should be performed on separate connections, as the transaction will apply to the entire connection, and blocking operations will block the connection until they're complete.
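As an illustration (a sketch, assuming a RedisClient named client created elsewhere), a blocking call can be given its own short-lived connection:

import io.lettuce.core.KeyValue;
import io.lettuce.core.api.StatefulRedisConnection;

// A dedicated connection for the blocking call keeps it from stalling
// commands issued by other threads on the shared connection.
try (StatefulRedisConnection<String, String> dedicated = client.connect()) {
    KeyValue<String, String> item = dedicated.sync().blpop(30, "jobs");
    // ... handle item ...
}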
Whether you should share a manually-flushed connection depends only on the number of operations you perform between flushes. E.g. if each block is 10k commands and you have 10 threads, you could queue 100k commands to send at once where you expected 10k. Whether this matters will depend on your application, and you should check the performance of your individual case.
If each block is not sending many commands you may not even need to flush manually as Lettuce pipelines with auto-flush enabled (see this answer).
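For the auto-flush route, a minimal sketch (reusing the shared connection conn from the question; the keys and values are made up): async commands are written out as they are issued, and awaiting the futures at the end gives pipeline-like batching without any shared flush state.

import io.lettuce.core.LettuceFutures;
import io.lettuce.core.RedisFuture;
import io.lettuce.core.api.async.RedisAsyncCommands;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

RedisAsyncCommands<String, String> async = conn.async();
List<RedisFuture<String>> futures = new ArrayList<>();
for (int i = 0; i < 1000; i++) {
    futures.add(async.set("key:" + i, "value:" + i)); // queued and auto-flushed
}
// Wait for all queued commands to complete (5 second timeout).
LettuceFutures.awaitAll(5, TimeUnit.SECONDS, futures.toArray(new RedisFuture[0]));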
I'm a bit confused about how Cro handles client requests and, specifically, why some requests seem to cause Cro's memory usage to balloon.
A minimal example of this shows up in the literal "Hello world!" Cro server.
use Cro::HTTP::Router;
use Cro::HTTP::Server;

my $application = route {
    get -> {
        content 'text/html', 'Hello Cro!';
    }
}

my Cro::Service $service = Cro::HTTP::Server.new:
    :host<localhost>, :port<10000>, :$application;
$service.start;

react whenever signal(SIGINT) {
    $service.stop;
    exit;
}
All that this server does is respond to GET requests with "Hello Cro!" – which certainly shouldn't be taxing. However, if I navigate to localhost:10000 and then rapidly refresh the page, I notice Cro's memory use start to climb (and then stay elevated).
This only seems to happen when the refreshes are rapid, which suggests that the issue might be related either to not properly closing connections or to a concurrency issue (a maybe-slightly-related prior question).
Is there some performance technique or best practice that this "Hello world" server has omitted for simplicity? Or am I missing something else about how Cro is designed to work?
The Cro request processing pipeline is a chain of supply blocks that requests and, later, responses pass through. Decisions about the optimal number of processing threads to create are left to the Raku ThreadPoolScheduler implementation.
So far as connection lifetime goes, it's up to the client - that is, the web browser - as to how eagerly connections are closed; if the browser uses a keep-alive HTTP/1.1 connection or retains an HTTP/2.0 connection, Cro respects that request.
Regarding memory use, growth up to a certain point isn't surprising; it's only a problem if it doesn't eventually level out. Causes include:
The scheduler determining that more threads are required to handle the load. Each OS thread comes with some overhead inside the VM, the largest part being that the GC nursery is per thread, to allow simple bump-the-pointer allocation.
The MoarVM optimizer using memory for specialized bytecode and JIT-compiled machine code, which it produces in the background as the application runs; this is driven by certain bits of code having been executed enough times.
The GC trying to converge on a full collection threshold.
I need to take screenshots of 1000 URLs using Parallel.ForEach in a Windows service. I tried to use the WebBrowser control, but it throws an error since it runs only in a single-threaded apartment (STA). Kindly tell me how to achieve this task using Parallel.ForEach...
Edit: I am using a trial version of a third-party DLL in the code below to process it...
Parallel.ForEach(webpages, webPage =>
{
    GetScreenShot(webPage);
});

public void GetScreenShot(string webPage)
{
    WebsitesScreenshot.WebsitesScreenshot _Obj;
    _Obj = new WebsitesScreenshot.WebsitesScreenshot();
    WebsitesScreenshot.WebsitesScreenshot.Result _Result;
    _Result = _Obj.CaptureWebpage(webPage);
    if (_Result == WebsitesScreenshot.WebsitesScreenshot.Result.Captured)
    {
        _Obj.ImageFormat = WebsitesScreenshot.WebsitesScreenshot.ImageFormats.PNG;
        _Obj.SaveImage(somePath);
    }
    _Obj.Dispose();
}
Most of the time this code runs fine for up to about 80 URLs, but after that some tasks become blocked. I don't know why...
Sometimes the error is ContextSwitchDeadlock, as given below:
ContextSwitchDeadlock was detected
Message: The CLR has been unable to transition from COM context 0x44d3a8 to COM context 0x44d5d0 for 60 seconds. The thread that owns the destination context/apartment is most likely either doing a non pumping wait or processing a very long running operation without pumping Windows messages. This situation generally has a negative performance impact and may even lead to the application becoming non responsive or memory usage accumulating continually over time. To avoid this problem, all single threaded apartment (STA) threads should use pumping wait primitives (such as CoWaitForMultipleHandles) and routinely pump messages during long running operations.
This error indicates that a CLR thread has not pumped any messages for an extended period of time. If a process is resource-starved, causing extended waits during processing, this error can occur.
Given that you are trying to process 1000 web pages simultaneously, it would be no surprise that at least some of the threads will become resource starved. Personally, it is surprising to me that you can hit 80 websites without seeing errors.
Back off the number of websites you are trying to process in parallel and your problems will likely disappear. Since you are running the trial version, there is little else you can do. If you licensed the commercial version you might be able to get support from the vendor, but at a guess, they would simply tell you to do the same thing.
The WebsitesScreenshot library can be quite resource-intensive depending upon the web page, especially if the pages have Flash. Think of it as being logically equivalent to opening 80 tabs simultaneously in a web browser.
You don't mention whether you are using the 32-bit or the 64-bit version, but the 64-bit version is likely to have fewer resource constraints, especially memory. IMHO the .NET Framework does a poor job of minimizing memory usage, so memory problems can crop up earlier than you would think.
ADDED
Please try limiting the number of threads first, e.g.
Parallel.ForEach(
    webpages,
    new ParallelOptions { MaxDegreeOfParallelism = 10 }, // 10 thread limit
    webPage => { GetScreenShot(webPage); }
);
Without access to the source code, you may not be able to change the threading model at all. You might also try setting the timeout to a higher value.
I don't have this control personally and am not willing to install it on my machine to answer a question re: changing the threading model. Unless it is a documented feature, you probably won't be able to do it without changing or at least inspecting the source.
How to design a parallel processing workflow
I have a scenario involving data analysis.
There are basically four steps:
pick up a task: either read from a queue or receive a message through an API (maybe a web service) to trigger the service
submit a request to a remote service based on the parameters from step 1
wait for the remote service to finish and download the result
process the data downloaded in step 3
The four steps above look like a sequential workflow.
My question is: how can I scale it out?
Every day I might need to perform hundreds to thousands of these tasks.
If I can run them in parallel, that will help a lot,
e.g. run 20 tasks at a time.
So can Windows Workflow Foundation be configured to run tasks in parallel?
Thanks.
You may want to use PFX (http://www.albahari.com/threading/part5.aspx); it lets you control how many threads to create for fetching, and I find PLINQ helpful for this.
So you loop over the list of URLs, perhaps reading from a file or database, and then in your Select you can call a function to do the processing.
If you can go into more detail as to whether you want to have the fetching and processing be on different threads, for example, it may be easier to give a more complete answer.
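For example, a rough PLINQ shape for this (Download and Process are hypothetical stand-ins for your own fetch and processing functions):

// Fetch and process in parallel with a bounded number of threads.
var results = urls
    .AsParallel()
    .WithDegreeOfParallelism(20)
    .Select(url => Download(url))   // fetch step
    .Select(data => Process(data))  // processing step
    .ToList();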
UPDATE:
This is how I would approach this, but I am also using ConcurrentQueue (http://www.codethinked.com/net-40-and-system_collections_concurrent_concurrentqueue) so I can be putting data into the queue while reading from it.
This way each thread can dequeue safely, without worrying about having to lock your collection.
Parallel.For(0, queue.Count, new ParallelOptions() { MaxDegreeOfParallelism = 20 },
    (j) =>
    {
        String i;
        if (queue.TryDequeue(out i)) // TryDequeue can fail, so check the result
        {
            // call out to URL
            // process data
        }
    });
You may want to put the data into another concurrent collection and have that be processed separately, it depends on your application needs.
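One possible shape for that (a sketch; FetchUrl and ProcessData are hypothetical): downloads feed a bounded BlockingCollection while a separate consumer task drains it.

var downloaded = new BlockingCollection<string>(boundedCapacity: 100);

// Consumer: processes items as they arrive, on its own task.
var consumer = Task.Factory.StartNew(() =>
{
    foreach (var data in downloaded.GetConsumingEnumerable())
        ProcessData(data);
});

// Producers: fetch in parallel and hand the results to the consumer.
Parallel.ForEach(urls, new ParallelOptions { MaxDegreeOfParallelism = 20 },
    url => downloaded.Add(FetchUrl(url)));

downloaded.CompleteAdding(); // lets GetConsumingEnumerable finish
consumer.Wait();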
Depending on the way your tasks and workflow are modeled, you can use a Parallel activity and create different branches for the different tasks to be performed. Each branch has its own logic, and the WF runtime will start a second WCF request to retrieve data as soon as it is waiting for the first to respond. This requires you to model the number of branches explicitly, but allows for different activities in each branch.
But from your description it sounds like you have the same steps for each task, and in that case you could model it using a ParallelForEach activity and have it iterate over a collection of tasks. Each task object would need to contain all the information used for the request. This requires each task to have the same steps, but you can put in as many tasks as you want; a rough sketch follows.
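Something along these lines (TaskInfo, LoadTasks and ProcessTask are hypothetical; ProcessTask would be a custom activity performing steps 2-4 for one task):

var task = new DelegateInArgument<TaskInfo>("task");
Activity workflow = new ParallelForEach<TaskInfo>
{
    Values = new InArgument<IEnumerable<TaskInfo>>(ctx => LoadTasks()),
    Body = new ActivityAction<TaskInfo>
    {
        Argument = task,
        Handler = new ProcessTask { Task = task } // one branch per task
    }
};

Note that ParallelForEach branches only truly overlap while the body is idle, e.g. waiting on the remote service to respond, which matches this scenario.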
What works best really depends on your scenario.
I have a number of data containers that can send a signal when there are updates in them. The structure looks similar to this:
typedef struct {
    int data;
    /*...*/
    pthread_cond_t *onHaveUpdate;
} Container;
The onHaveUpdate member is a pointer to a global condition variable that is shared by all the containers.
In my application, I have a number of these structures and they can concurrently be updated by different threads.
Now, is it possible for me to have a single thread that listens to the condition and can perform some action on the container that sent the notification?
I know that this could be solved by using one thread per container, but that feels like a waste of resources, so I was wondering whether it can be done using only one thread for all containers.
The problem is that your condition variable is shared by all containers, so when you signal that something has changed, you have no idea what actually changed. Since the architecture of your application is not 100% clear from the context, why not implement a queue that holds events (with a pointer to the updated container pushed onto the queue) and a worker thread that takes events from that queue and performs the work? That way the worker thread only needs to wait on a condition signalling that the queue is non-empty (or run in a non-aggressive while-true fashion), and you can remove the condition variable from your containers altogether. A minimal sketch of this design follows.
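Something like this (a sketch; all names besides the pthread calls are illustrative, and error handling is omitted):

#include <pthread.h>
#include <stdlib.h>

/* Illustrative handler, to be supplied by the application. */
void handle_update(Container *c);

typedef struct Node {
    Container *container;
    struct Node *next;
} Node;

static Node *head = NULL, *tail = NULL;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t qfilled = PTHREAD_COND_INITIALIZER;

/* Called by updating threads instead of signalling the shared condition. */
void enqueue_update(Container *c)
{
    Node *n = malloc(sizeof *n);
    n->container = c;
    n->next = NULL;
    pthread_mutex_lock(&qlock);
    if (tail) tail->next = n; else head = n;
    tail = n;
    pthread_cond_signal(&qfilled); /* wake the single worker */
    pthread_mutex_unlock(&qlock);
}

/* The one worker thread: waits until the queue is filled, then works. */
void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&qlock);
        while (head == NULL)
            pthread_cond_wait(&qfilled, &qlock);
        Node *n = head;
        head = n->next;
        if (head == NULL) tail = NULL;
        pthread_mutex_unlock(&qlock);
        handle_update(n->container); /* now we know exactly what changed */
        free(n);
    }
    return NULL;
}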