Work manager thread constraints and "page cannot be displayed" in WebLogic

We have memory-intensive processing for certain functionality, and we would like to limit the number of parallel requests to it. We were able to configure this using "Work Managers" in WebLogic, putting a limit on the number of threads for that servlet.
For example, if we set the maximum thread limit to 3 and 10 parallel requests arrive, 7 requests wait in the queue. There can be situations where the requests waiting in the queue take 30-40 minutes to be processed. We did some simple testing: the browser timed out with "page cannot be displayed" after 15 minutes, and in another case the message only appeared after an hour.
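For reference, the configuration looks roughly like this in weblogic.xml (the names here are illustrative, not our real ones); the servlet is then pointed at the work manager via a wl-dispatch-policy init-param in web.xml:

<weblogic-web-app xmlns="http://xmlns.oracle.com/weblogic/weblogic-web-app">
  <work-manager>
    <name>HeavyProcessingWorkManager</name>
    <max-threads-constraint>
      <name>HeavyProcessingMaxThreads</name>
      <count>3</count>
    </max-threads-constraint>
  </work-manager>
</weblogic-web-app>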
Does anyone know if there is a setting in WebLogic to increase or decrease this timeout and avoid the "page cannot be displayed" error?
I'd appreciate any thoughts on this.

"Does anyone know if there is a setting in WebLogic to increase or decrease this timeout and avoid the page cannot be displayed error?"
There might be, but I didn't actually check, because it would be bad advice anyway. By looking for this setting, you are trying to solve the wrong problem. A browser is simply not made for a long-running process like the one you are describing (>30 min), even if you don't mind the user waiting (not to mention that they could refresh the page and queue more and more jobs).
So, in my opinion, the right answer here is: use asynchronous processing; this is the perfect use case for it. When the user clicks the button, send a JMS message to a queue (or create a Quartz job) and return a page with a request ID, telling the user to check back later. When the processing is done, update the status somewhere and make the status/result available to the user. Really, the user experience will be better this way, and you'll face fewer problems than with a browser.
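For illustration, a minimal sketch of the enqueue side in plain JMS; the JNDI names are assumptions, not anything your setup mandates:

import java.util.UUID;
import javax.jms.*;
import javax.naming.InitialContext;

public class HeavyJobSubmitter {
    public String submit(String payload) throws Exception {
        InitialContext ctx = new InitialContext();
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory"); // assumed JNDI name
        Queue queue = (Queue) ctx.lookup("jms/HeavyJobQueue");                          // assumed JNDI name
        String requestId = UUID.randomUUID().toString();
        Connection con = cf.createConnection();
        try {
            Session session = con.createSession(false, Session.AUTO_ACKNOWLEDGE);
            TextMessage msg = session.createTextMessage(payload);
            msg.setStringProperty("requestId", requestId);
            session.createProducer(queue).send(msg);
        } finally {
            con.close();
        }
        return requestId; // show this ID on the "check back later" page
    }
}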

1) Use some other tool (not a browser), such as wget, where you can control the timeout parameter (--timeout).
2) Why use HTTP at all? Use message-driven beans: send a JMS message to them and stop caring about timeouts.
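A minimal sketch of what such a message-driven bean could look like, consuming from the same assumed queue as the producer sketch above:

import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

@MessageDriven(mappedName = "jms/HeavyJobQueue") // assumed queue name
public class HeavyJobMDB implements MessageListener {
    public void onMessage(Message message) {
        try {
            String payload = ((TextMessage) message).getText();
            String requestId = message.getStringProperty("requestId");
            // run the 30-40 minute processing here, then persist the result
            // keyed by requestId so the user can fetch it later
        } catch (Exception e) {
            throw new RuntimeException(e); // container redelivers per its policy
        }
    }
}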

Perhaps Quartz can do what you need? Start a job and check in on it as needed.
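If you go the Quartz route, a minimal Quartz 2.x sketch might look like this (the job class and identity are illustrative):

import org.quartz.*;
import org.quartz.impl.StdSchedulerFactory;

public class JobLauncher {
    public static void launch(String requestId) throws SchedulerException {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        scheduler.start();
        JobDetail job = JobBuilder.newJob(HeavyJob.class).withIdentity(requestId).build();
        scheduler.scheduleJob(job, TriggerBuilder.newTrigger().startNow().build());
        // later, check in via scheduler.getCurrentlyExecutingJobs() or your own status table
    }

    public static class HeavyJob implements Job {
        public void execute(JobExecutionContext ctx) {
            // the memory-intensive processing goes here
        }
    }
}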


Can I send an API response before successful persistence of data?

I am currently developing a microservice that interacts with other microservices.
The problem is that those interactions are really time-consuming. I already implemented concurrent calls via Uni and use caching where useful. Some calls still need a few seconds to respond, so I thought of another thing I could do to improve performance:
Is it possible to send a response before the successful persistence of data? I send requests to the other microservices, which persist the results of my methods. Can I already send the user the result in a first response, and send a second response once persistence has succeeded?
That way, the front end could already begin working even though my API is not 100% finished.
I saw that there is a possible status code 207, but it's rather meant for multi-status responses (WebDAV). Is there another possibility? Thanks in advance.
"Is it possible to send a response before the sucessfull persistence of data? Can I already send the user the result in a first response and make a second response if the persistence process was sucessfull? With that, the front-end could already begin working even though my API is not 100% finished."
You can, and should, but it is a philosophy change in your API, and you may have to consider some edge cases and techniques to deal with them.
In the case of a long-running API call, you can issue an "ack" response, a traditional 200, where the answer just means the operation is asynchronous and will complete in the future, something like { id: 49584958, apicall: "create", status: "queued", result: true }
Then you can
poll your API with the returned ID to see if the operation is still ongoing, has succeeded, or has failed
have an SSE channel (real-time server-sent events) where your server can push status messages as pending operations finish
maybe, using persistent connections and keep-alives, or flushing the response mid-way, you can achieve what you describe, i.e. a segmented response. I am not familiar with that approach, as I normally go for the suggestions above.
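A minimal JAX-RS sketch of the ack-then-poll variant; the paths, the in-memory status map, and the executor are illustrative stand-ins for real infrastructure (a 202 Accepted is used here, but a plain 200 with a queued status works just as well):

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.Response;

@Path("/operations")
public class OperationResource {
    private static final Map<String, String> STATUS = new ConcurrentHashMap<>();
    private static final ExecutorService POOL = Executors.newFixedThreadPool(4);

    @POST
    public Response create() {
        String id = UUID.randomUUID().toString();
        STATUS.put(id, "queued");
        POOL.submit(() -> {
            STATUS.put(id, "running");
            // ... slow calls to the other microservices, persistence, etc. ...
            STATUS.put(id, "done");
        });
        return Response.accepted("{\"id\":\"" + id + "\",\"status\":\"queued\"}").build();
    }

    @GET
    @Path("{id}")
    public String poll(@PathParam("id") String id) {
        return "{\"id\":\"" + id + "\",\"status\":\"" + STATUS.getOrDefault(id, "unknown") + "\"}";
    }
}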
But in any case, the same edge cases apply: for example, what happens if, through your API, a user issues calls that depend on the success of an ongoing (or not yet started) previous command? Like, for example, getting information about something that is still being persisted?
You will have to deal with these situations using mechanisms like:
Reject related operations until the pending call is resolved server-side: the API could return, e.g., a BUSY error to signal that operations are still ongoing when the client tries to, for example, delete something that is still being created (see the sketch after this list).
Queue all operations so the server executes them sequentially.
Allow some simultaneous operations if you find they will not collide (e.g., creating 2 unrelated items).
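A minimal sketch of the first mechanism; the in-flight registry and the 409 CONFLICT status standing in for "BUSY" are assumptions:

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import javax.ws.rs.DELETE;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.Response;

@Path("/items")
public class ItemResource {
    // hypothetical registry of resource ids with operations still in flight
    private static final Set<String> PENDING = ConcurrentHashMap.newKeySet();

    @DELETE
    @Path("{id}")
    public Response delete(@PathParam("id") String id) {
        if (PENDING.contains(id)) {
            // still being created: reject with a BUSY-style error
            return Response.status(Response.Status.CONFLICT)
                           .entity("{\"error\":\"BUSY\",\"id\":\"" + id + "\"}")
                           .build();
        }
        // ... perform the delete ...
        return Response.noContent().build();
    }
}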

Understanding why you would want to process Message Queues at a future time

So I'm trying to understand what practical problems queues solve. From reading all the information Google turns up, I get the high-level idea:
Push message to Queue for processing at a later time
So I'm looking at an architecture from Company A, and they have different use cases for job queueing, for example:
chat messages
file conversion
searching
heavy SQL queries
Why process it at a later time?
Here's my best guess...
Let's say I have an application that can process 10 "things" at a time.
My application then maxes out its processing capacity.
An 11th request comes in, so the app puts it in the queue for later processing.
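To make that concrete, a small Java sketch (the pool size and task are just for illustration):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CapacityDemo {
    public static void main(String[] args) {
        ExecutorService app = new ThreadPoolExecutor(
                10, 10,                        // process at most 10 "things" at a time
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());  // the 11th request waits here
        for (int i = 1; i <= 11; i++) {
            final int n = i;
            app.submit(() -> System.out.println("processing request " + n));
        }
        app.shutdown();
    }
}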
Assuming this is a valid use case, wouldn't adding more servers to process more "things" make sense? Or is it more costly to add servers than to employ a queue and sacrifice a little response time?
Given my Use Case examples, what other problems would Queues solve for them?
Have you ever lined up at a bank when it is busy? You would have waited in a queue.
"But," you could say, "wouldn't adding more staff to process more customers make sense? Is it because it's more costly to add more staff than employ a Queue and sacrifice response time a little bit?"
That would be correct. It can be quite costly to staff a bank based on the peak number of customers who would arrive each day. It is cheaper to staff below this level and have some customers wait in a queue.
Also, the number of customers arriving each day is not 100% predictable. A queue allows excess demand to wait without breaking the system.
Queues enable decoupling.
For example, imagine an online store where customers purchase an item. They select the item, provide a credit card number and click 'Purchase'. If the credit card is declined, the online store can immediately prompt them to re-enter the number. This interaction has to take place immediately while the customer is still online.
However, there is no need to have the customer wait while an invoice is generated, a record is added to the accounting system and inventory is pulled off the shelf. This can be decoupled from the ordering process. A good way to do it is to push the order into a queue, which can be handled by the next system.
If that 'next system' happens to be offline at the moment, there is no reason to cancel the whole sale. The transaction can be processed when the 'next system' comes back online. This is much better than failing the whole process just because one component (which is not required immediately) has a failure.
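A minimal sketch of that decoupling, with an in-process queue standing in for a real message broker and a trivial stand-in for the card check:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class Checkout {
    // invoicing, accounting, and inventory consume from this queue later
    static final BlockingQueue<String> ORDERS = new LinkedBlockingQueue<>();

    static boolean purchase(String orderId, String cardNumber) throws InterruptedException {
        if (!chargeCard(cardNumber)) {
            return false;          // immediate: the customer can re-enter the number
        }
        ORDERS.put(orderId);       // the 'next system' picks this up whenever it is online
        return true;               // the customer stops waiting here
    }

    static boolean chargeCard(String cardNumber) { return !cardNumber.isEmpty(); } // stand-in
}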
Bottom line: queues are excellent. They enable better handling of failures. They make things more resilient (just wait a few minutes and try again!). They should be used whenever the process is compatible with a queuing architecture.
Let's look at two scenarios.
Scenario 1, without a queue:
You request an endpoint /blabla/do-everything/. That request:
downloads an image from a very slow FTP server: e.g. 1.5 sec (can fail; retry? add +X sec)
attaches the image to an email and sends it (3 sec): e.g. 1 sec (can fail; retry? add +X sec)
when the confirmation is received, stores it with a third-party tracking service: e.g. 1.5 sec (can fail; retry? add +X sec)
when tracking confirms, updates your data at another third party for big-data purposes: e.g. 2 sec (can fail; retry? add +X sec)
... you get the idea
The response returns e.g. 11 sec later (this is too slow), or more, or times out when everything fails.
The end user says: the internet was faster 20 years ago, maybe I need to change my connection, or my 16 threads.
Scenario 2, queue everything you can:
You request an endpoint /blabla/do-everything/. That request queues a "DO_EVERYTHING" job: e.g. 0.02 sec.
The response returns in less than 0.250 sec.
The end user says: this website/app is so fast, I can keep my 56K connection.
In a queue/event system, a failed job can be retried later without affecting the end user.
You can pause jobs, and add an unlimited number of tasks/steps after the original message.
Better fault tolerance.
Working with queues also allows a better micro/nano-service architecture and better testing, because you can test a single job instead of a full controller that does everything...
Yes, it's maybe more work and more thinking, but in the end there's no need to think about work while on holiday.

Queue Fairness and Messaging Servers

I'm looking to solve a problem I have with the FIFO nature of messaging servers and queues. In some cases, I'd like to distribute the messages in a queue to the pool of consumers based on criteria other than the order in which they were delivered. Ideally, this would prevent users from hogging shared resources in the system. Take this overly simplified scenario:
There is a feature within an application where a user can empty their trash can.
This event dispatches a DELETE message for each item in the trash can.
The consumers for this queue invoke a web service that has a rate-limited API.
Given that each user can have a very large number of messages in their trash can, what options do we have to allow concurrent processing of each trash can without regard to enqueue time? It seems to me that there are a few obvious solutions:
Create a separate queue and pool of consumers for each user
Randomize the message delivery from a single queue to a single pool of consumers
In our case, creating a separate queue and managing the consumers for each user really isn't practical. It can be done, but I think I'd prefer the second option if it's reasonable. We're using RabbitMQ, but we're not necessarily tied to it if there is a technology better suited to this task.
I'm entertaining the idea of using Rabbit's message priorities to help randomize delivery. By randomly assigning each message a priority between 1 and 10, this should help distribute the messages. The problem with this method is that the messages with the lowest priority may be stuck in the queue forever if the queue is never completely emptied. I thought I could use a TTL on the message and then re-queue it with an escalated priority, but I noticed this in the docs:
"Messages which should expire will still only expire from the head of the queue. This means that unlike with normal queues, even per-queue TTL can lead to expired lower-priority messages getting stuck behind non-expired higher priority ones. These messages will never be delivered, but they will appear in queue statistics."
I fear that I may be heading down a rabbit hole with this approach. I wonder how others are solving this problem. Any feedback on creative routing, messaging patterns, or alternative solutions would be appreciated.
So I ended up taking a page out of the network router handbook. This is a problem routers need to solve to allow fair traffic patterns. This video has a good breakdown of the problem and the solution.
The translation of the problem into my domain: [diagram]
And the solution: [diagram]
The load balancer is a wrapper around a channel and a known number of queues that uses a weighted algorithm to balance between messages received on each queue. We found a really interesting article/implementation that seems to be working well so far.
With this solution, I can also prioritize workspaces after messages have been published to increase their throughput. That's a really nice feature.
The biggest challenge ahead of me is management of the queues. There will be too many queues to leave bound to the exchange for an extended period of time. I'm working on some tools to manage their lifecycle.
One solution could be to interpose a Resequencer. The principle is outlined in the diagram at that link. In your case, something like this:
The app dispatches its DELETE messages into the DELETE queue, as it does originally.
The Resequencer (a new component you write) is interposed between the original publishers and the original consumers. It:
pulls messages off the DELETE queue into memory
places them into (in-memory) queues-by-user
republishes them to a new queue (e.g. FairPriorityDeleteQueue), round-robinning to fairly interleave messages from different original users
limits its republish rate into FairPriorityDeleteQueue, either so that the length of FairPriorityDeleteQueue (obtainable by polling the RabbitMQ management API periodically) never exceeds some integer N you choose, or to some rate related to the rate-limited delete API the consumers use
doesn't ack any message it pulled off the original DELETE queue until it has republished it to FairPriorityDeleteQueue (so you never lose a message)
The original consumers subscribe instead to FairPriorityDeleteQueue.
You set the prefetch count on these consumers fairly low (<10), to prevent them, in turn, from bulk-buffering the contents of FairPriorityDeleteQueue in memory.
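A minimal sketch of such a Resequencer with the RabbitMQ Java client; the userId header, queue names, prefetch value, and the crude sleep-based rate limit are all assumptions you would tune or replace:

import com.rabbitmq.client.*;
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

public class Resequencer {
    private final Map<String, Queue<Delivery>> byUser = new LinkedHashMap<>();

    public static void main(String[] args) throws Exception {
        new Resequencer().run();
    }

    void run() throws Exception {
        Connection conn = new ConnectionFactory().newConnection();
        Channel in = conn.createChannel();
        Channel out = conn.createChannel();
        out.queueDeclare("FairPriorityDeleteQueue", true, false, false, null);

        in.basicQos(100); // bounds how many unacked messages we buffer in memory
        in.basicConsume("DELETE", false, (tag, delivery) -> {
            // assumes publishers set a userId header on every message
            String user = delivery.getProperties().getHeaders().get("userId").toString();
            synchronized (byUser) {
                byUser.computeIfAbsent(user, u -> new ArrayDeque<>()).add(delivery);
            }
        }, tag -> { });

        while (true) { // round-robin republish, one message per user per pass
            synchronized (byUser) {
                for (Iterator<Queue<Delivery>> it = byUser.values().iterator(); it.hasNext(); ) {
                    Delivery d = it.next().poll();
                    if (d == null) { it.remove(); continue; }
                    out.basicPublish("", "FairPriorityDeleteQueue", d.getProperties(), d.getBody());
                    in.basicAck(d.getEnvelope().getDeliveryTag(), false); // ack only after republish
                }
            }
            Thread.sleep(20); // crude rate limit; replace with management-API queue-length polling
        }
    }
}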
--
Some points to watch:
Rate- or length-limiting publishing into and/or drawing messages out of FairPriorityDeleteQueue is essential. If you don't limit it, the Resequencer may just hand messages on as fast as it receives them, limiting the potential for resequencing.
The Resequencer of course acts as a kind of in-memory buffer while resequencing. If the original publishers can suddenly publish very large numbers of messages into the queue, you may need to memory-limit the Resequencer process so that it doesn't ingest more than it can hold.
Your particular scenario is greatly helped by the fact that you have an external factor (the final delete API) limiting throughput. Without such an extrinsic limiting factor, it is much harder to choose the optimum parameters for such a resequencer, balancing throughput versus resequencing in a particular environment.
I don't think a resequencer is needed in this case. Maybe it is, if you need to ensure the items are deleted in a specific order; but that only comes into play when you send multiple messages at roughly the same time and need to guarantee order on the consumer end.
You should also avoid the TTL scenario, for the reasons you've mentioned. A TTL is meant to tell RabbitMQ that a message doesn't need to be processed, or that it needs to be routed to a dead-letter queue so that it can be processed by some other code. While you might be able to make TTLs work, I don't think they're a good choice.
Priorities may solve part of the problem, but they could introduce a scenario where files never get processed. If you have a priority 1 message sitting back in the queue somewhere and you keep putting priority 2, 3, 5, 10, etc. messages into the queue, the 1 might never be processed. The TTL doesn't solve this, as you've noted.
For my money, I would suggest a different approach: sending delete requests serially, one file at a time.
That is, send one message to delete one file. Wait for a response saying it's done. Then send the next message to delete the next file.
Here's why I think that will work, and how to manage it:
Long-Running Workflow, Single File Delete Requests
In this scenario, I would suggest taking a multi-step approach to the problem, using the idea of a "saga" (a.k.a. a long-running workflow object).
When a user requests to empty their trash can, you send a single message through RabbitMQ to the service that handles the delete process. That service creates an instance of the saga for that user's trash can.
The saga gathers a list of all the files in the trash can that need to be deleted. Then it starts sending the requests to delete the individual files, one at a time.
With each request to delete a single file, the saga waits for the response saying the file was deleted.
When the saga receives the message saying the previous file has been deleted, it sends out the next request to delete the next file.
Once all the files are deleted, the saga updates itself and any other part of the system to say the trash can is empty.
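A minimal sketch of that saga; MessageBus is a hypothetical wrapper around your RabbitMQ channel, and the topic names are illustrative:

import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Queue;

public class EmptyTrashSaga {
    interface MessageBus { void publish(String topic, String body); } // hypothetical

    private final Queue<String> remaining = new ArrayDeque<>();
    private final MessageBus bus;

    public EmptyTrashSaga(MessageBus bus, Collection<String> filesInTrash) {
        this.bus = bus;
        remaining.addAll(filesInTrash);
    }

    public void start() { sendNext(); }

    // invoked when a "file deleted" confirmation arrives for this saga
    public void onFileDeleted(String fileId) { sendNext(); }

    private void sendNext() {
        String next = remaining.poll();
        if (next != null) {
            bus.publish("delete.file", next);    // one file per message, one in flight at a time
        } else {
            bus.publish("trashcan.emptied", ""); // tell the rest of the system we're done
        }
    }
}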
Handling Multiple Users
When a single user requests a delete, things will happen fairly quickly for them; their trash will be emptied soon.
u1 = User 1 Trashcan Delete Request
|u1|u1|u1|u1|u1|u1|u1|u1|u1|u1done|
When you have multiple users requesting deletes, sending one file-delete request at a time means each user has an equal chance of getting the next file-delete slot.
u1 = User 1 Trashcan Delete Request
u2 = User 2 Trashcan Delete Request
|u1|u2|u1|u1|u2|u2|u1|u2|u1|u2|u2|u1|u1|u1|u2|u2|u1|u2|u1|u1done|u2|u2done|
This way, the resources for deleting files are shared. Overall, it will take a little longer for each person's trash can to be emptied, but they will see progress sooner, and that's an important aspect of people perceiving the system as fast and responsive.
Optimizing Small File Set vs Large File Set
In a scenario where you have a small number of users with a small number of files, the above solution may prove slower than deleting all the files at once. After all, more messages will be sent across RabbitMQ: at least two for every file that needs to be deleted (one delete request, one delete-confirmation response).
To optimize this further, you could do a couple of things:
Have a minimum trash can size before you split up the work like this; below that minimum, just delete everything at once.
Chunk the work into groups of files instead of one at a time; maybe 10 or 100 files would be a better group size than a single file.
Either (or both) of these would help improve the overall performance of the process by reducing the number of messages being sent and batching the work a bit.
You would need to do some testing in your real scenario to see which of these (or maybe both) would help and at what settings.
Many Users Problem
There's one additional problem you may face - many users. If you have 2 or 3 users requesting deletes, it won't be a big deal.
But if you have 100 or 1000 users requesting deletes, it could take a very long time for an individual to get their trashcan emptied.
You may need a higher-level controlling process for this situation, where all requests to empty trash cans are managed by yet another saga. This saga would rate-limit the number of active trash-can-deletion sagas.
For example, if you have 10 active requests for deleting trashcans, the rate-limiting saga would only start 3 of them and it would wait for one to finish before starting the next one.
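A minimal sketch of that limiter; a fixed-size pool gives the "only 3 active, the rest wait for a free slot" behaviour (the limit of 3 matches the example above):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TrashcanSagaLimiter {
    // at most 3 sagas run at once; further requests queue until one finishes
    private final ExecutorService active = Executors.newFixedThreadPool(3);

    public void requestEmpty(Runnable saga) {
        active.submit(saga);
    }
}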
Again, you would need to test your actual scenario to see if this is needed and see what the limits should be, for performance reasons.
There may be additional cases to consider in your actual system, but I hope this gets you down the path! :)

Long polling blocking multiple windows?

Long polling has solved 99% of my problems. There is now just one other problem. Imagine a penny-auction site where people bid; on the front page, there are several auctions.
If the user opens three of these auctions, and given that JavaScript is not multithreaded, how would you get the other pages to ever load? Won't they always get bogged down and fail to load because they are waiting for long polling to end? In practice I've experienced this, and I can't think of a way around it. Any ideas?
There are two ways that JavaScript gets around some of this.
While JavaScript is conceptually single-threaded, it does its I/O in separate threads using completion handlers. This means other pieces of JavaScript can run while you are waiting for your network request to complete.
The JavaScript for each page (or even each frame in each page) is isolated from the JavaScript on other pages/frames. This means each copy of JavaScript can run in its own thread.
A bigger issue for you is likely to be that browsers often limit the number of concurrent connections to a given site, and it sounds like you want to make many concurrent connections to the same site. In that case you will get a lock-up.
If you control both the server and the client, you will need to combine the multiple long-poll requests from the client into a single long-poll request to the server.

Use Redis to track concurrent outbound HTTP requests

I'm a little new to Redis, but I'd like to see if it can be used to keep track of how many concurrent HTTP connections I'm making.
Here's the high level plan:
INCR requests
// request begins
HTTP.get(...)
// request ends
DECR requests
Then at any point, just call GET requests to see how many are currently open.
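For illustration, the same plan in Java with Jedis (the client choice and key name are assumptions; note the crash pitfall discussed in the answers below):

import redis.clients.jedis.Jedis;

public class RequestCounter {
    public static void trackedGet(Jedis jedis, String url) {
        jedis.incr("requests");        // request begins
        try {
            httpGet(url);              // the outbound call being counted
        } finally {
            jedis.decr("requests");    // request ends; never runs if the process dies mid-call
        }
    }

    public static long open(Jedis jedis) {
        String v = jedis.get("requests");
        return v == null ? 0 : Long.parseLong(v); // how many are currently open
    }

    private static void httpGet(String url) { /* stand-in for HTTP.get(...) */ }
}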
The ultimate goal here is to throttle my HTTP requests to stay below some arbitrary limit, say 50 concurrent requests.
Is this the right way to do it? Are there any pitfalls?
As for pitfalls, the only one I can see is that a server that goes down or loses its connection to Redis mid-request may never call DECR.
Since you don't know which server issued which request, you can never reset the count to the correct value without bringing the system to a halt and resetting it to 0.
I'm not clear what you'd gain by using Redis in this situation. It seems to me it would be more suitable to just use a global variable in your server. If your server goes down, so does your counter, so you don't have to put complicated things in place to deal with disconnections, inconsistencies, etc.