Why Alamofire is using dispatch_sync() function when creating a dataTask? - alamofire

The code below is from source code of Alamofire
let queue = dispatch_queue_create(nil, DISPATCH_QUEUE_SERIAL)
public func request(URLRequest: URLRequestConvertible) -> Request {
var dataTask: NSURLSessionDataTask!
dispatch_sync(queue) { dataTask = self.session.dataTaskWithRequest(URLRequest.URLRequest) }
let request = Request(session: session, task: dataTask)
self.delegate[request.delegate.task] = request.delegate
if startRequestsImmediately {
request.resume()
}
return request
}
It seems like every time it creates a dataTask, it dispatch that creating process to a serial queue. Would this measure protect the program from any kind of multi-thread trap?
I can't figure out what's the difference without that queue.

The reason why we implemented that check is due to Alamofire Issue #393. We were seeing duplicate task identifiers without the serial queue when creating data and upload tasks in parallel from multiple threads. It appears that Apple has a thread safety issue when incrementing the task identifiers. Therefore in Alamofire, we eliminate the issue by creating the tasks on a serial queue.
Cheers. 🍻

Related

Usage of WorkspaceJob

I have an eclipse plugin which has some performance issues. Looking into the progress view sometimes there are multiple jobs waiting and from the code most of it's arhitecture is based on classes which extend WorkspaceJobs mixed with Guava EventBus events. The current solution involves also nested jobs...
I read the documentation, I understand their purpose, but I don't get it why would I use a workspace job when I could run syncexec/asyncexec from methods which get triggered when an event is sent on the bus?
For example instead of creating 3 jobs which wait one for another, I could create an event which triggers what would have executed Job 1, then when the method is finished, it would have sent a different event type which will trigger a method that does what Job 2 would have done and so on...
So instead of:
WorkspaceJob Job1 = new WorkspaceJob("Job1");
Job1.schedule();
WorkspaceJob Job2 = new WorkspaceJob("Job2");
Job2.schedule();
WorkspaceJob Job1 = new WorkspaceJob("Job3");
Job3.schedule();
I could use:
#Subsribe
public replaceJob1(StartJob1Event event) {
//do what runInWorkspace() of Job1 would have done
com.something.getStaticEventBus().post(new Job1FinishedEvent());
}
#Subsribe
public replaceJob2(Job1FinishedEvent event) {
//do what `runInWorkspace()` of Job2 would have done
com.something.getStaticEventBus().post(new Job2FinishedEvent());
}
#Subsribe
public replaceJob3(Job2FinishedEvent event) {
//do what `runInWorkspace()` of Job3 would have done
com.something.getStaticEventBus().post(new Job3FinishedEvent());
}
I didn't tried it yet because I simplified the ideas as much as I could and the problem is more complex than that, but I think that the EventBus would win in terms of performance over the WorkspaceJobs.
Can anyone confirm my idea or tell my why this I shouldn't try this( except for the fact that I must have a good arhitecture of my events)?
WorkspaceJob delays resource change events until the job finishes. This prevents components listening for resource changes receiving half completed changes. This may or may not be important to your application.
I can't comment on the Guava code as I don't know anything about it - but note that if your code is long running you must make sure it runs in a background thread (which WorkbenchJob does).

How to handle Not authorized to access topic ... in Kafka Streams

Situation is the following.
We have setup SSL + ACLs in Kafka Broker.
We are setting up stream, which reads messages from two topics:
KStream<String, String> stringInput
= kBuilder.stream( STRING_SERDE, STRING_SERDE, inTopicName );
stringInput
.filter( streamFilter::passOrFilterMessages )
.map( processor )
.to( outTopicName );
It is done like two times (in the loop).
Then we are setting general error handler:
streams.setUncaughtExceptionHandler( ( Thread t, Throwable e ) -> {
synchronized ( this ) {
LOG.fatal( ... );
this.stop();
}
}
);
Problem is the following. If for example in one topic certificate is no more valid. The stream is throwing exception Not authorized to access topics ...
So far so good.
But the exception is handled by general error handler, so the complete application stops even if the second topic has no problems.
The question is, how to handle this exception per topic?
How to avoid situation that at some moment complete application stops due to the problem that one single topic has problems with authorization?
I understand that if Broker is not available, then complete app may stop. But if only one topic is not available, then single stream shall stop, and not complete application, or?
By design, Kafka Streams treats the topology a one and cannot distinguish between both parts. For your specific case, as you loop and build to independent pipelines, you could run two KafkaStreams instances in parallel (within the same application/JVM) to isolate both from each other. Thus, if one fails, the other one is not affected. You would need to use two different application.id for both instances.

Execute multiple downloads and wait for all to complete

I am currently working on an API service that allows 1 or more users to download 1 or more items from an S3 bucket and return the contents to the user. While the downloading is fine, the time taken to download several files is pretty much 100-150 ms * the number of files.
I have tried a few approaches to speeding this up - parallelStream() instead of stream() (which, considering the amount of simultaneous downloads, is at a serious risk of running out of threads), as well as CompleteableFutures, and even creating an ExecutorService, doing the downloads then shutting down the pool. Typically I would only want a few concurrent tasks e.g. 5 at the same time, per request to try and cut down on the number of active threads.
I have tried integrating Spring #Cacheable to store the downloaded files to Redis (the files are readonly) - while this certainly cuts down response times (several ms to retrieve files compared to 100-150 ms), the benefits are only there once the file has been previously retrieved.
What is the best way to handle waiting on multiple async tasks to finish then getting the results, also considering I don't want (or don't think I could) have hundreds of threads opening http connections and downloading all at once?
You're right to be concerned about tying up the common fork/join pool used by default in parallel streams, since I believe it is used for other things like sort operations outside of the Stream api. Rather than saturating the common fork/join pool with an I/O-bound parallel stream, you can create your own fork/join pool for the Stream. See this question to find out how to create an ad hoc ForkJoinPool with the size you want and run a parallel stream in it.
You could also create an ExecutorService with a fixed-size thread pool that would also be independent of the common fork/join pool, and would throttle the requests, using only the threads in the pool. It also lets you specify the number of threads to dedicate:
ExecutorService executor = Executors.newFixedThreadPool(MAX_THREADS_FOR_DOWNLOADS);
try {
List<CompletableFuture<Path>> downloadTasks = s3Paths
.stream()
.map(s3Path -> completableFuture.supplyAsync(() -> mys3Downloader.downloadAndGetPath(s3Path), executor))
.collect(Collectors.toList());
// at this point, all requests are enqueued, and threads will be assigned as they become available
executor.shutdown(); // stops accepting requests, does not interrupt threads,
// items in queue will still get threads when available
// wait for all downloads to complete
CompletableFuture.allOf(downloadTasks.toArray(new CompletableFuture[downloadTasks.size()])).join();
// at this point, all downloads are finished,
// so it's safe to shut down executor completely
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
} finally {
executor.shutdownNow(); // important to call this when you're done with the executor.
}
Following #Hank D's lead, you can encapsulate the creation of the executor service to ensure that you do, indeed, call ExecutorService::shutdownNow after using said executor:
private static <VALUE> VALUE execute(
final int nThreads,
final Function<ExecutorService, VALUE> function
) {
ExecutorService executorService = Executors.newFixedThreadPool(nThreads);
try {
return function.apply(executorService);
} catch (final InterruptedException | ExecutionException exception) {
exception.printStackTrace();
} finally {
executorService .shutdownNow(); // important to call this when you're done with the executor service.
}
}
public static void main(final String... arguments) {
// define variables
final List<CompletableFuture<Path>> downloadTasks = execute(
MAX_THREADS_FOR_DOWNLOADS,
executor -> s3Paths
.stream()
.map(s3Path -> completableFuture.supplyAsync(
() -> mys3Downloader.downloadAndGetPath(s3Path),
executor
))
.collect(Collectors.toList())
);
// use downloadTasks
}

RabbitMQ Wait for a message with a timeout

I'd like to send a message to a RabbitMQ server and then wait for a reply message (on a "reply-to" queue). Of course, I don't want to wait forever in case the application processing these messages is down - there needs to be a timeout. It sounds like a very basic task, yet I can't find a way to do this. I've now run into this problem with Java API.
The RabbitMQ Java client library now supports a timeout argument to its QueueConsumer.nextDelivery() method.
For instance, the RPC tutorial uses the following code:
channel.basicPublish("", requestQueueName, props, message.getBytes());
while (true) {
QueueingConsumer.Delivery delivery = consumer.nextDelivery();
if (delivery.getProperties().getCorrelationId().equals(corrId)) {
response = new String(delivery.getBody());
break;
}
}
Now, you can use consumer.nextDelivery(1000) to wait for maximum one second. If the timeout is reached, the method returns null.
channel.basicPublish("", requestQueueName, props, message.getBytes());
while (true) {
// Use a timeout of 1000 milliseconds
QueueingConsumer.Delivery delivery = consumer.nextDelivery(1000);
// Test if delivery is null, meaning the timeout was reached.
if (delivery != null &&
delivery.getProperties().getCorrelationId().equals(corrId)) {
response = new String(delivery.getBody());
break;
}
}
com.rabbitmq.client.QueueingConsumer has a nextDelivery(long timeout) method, which will do what you want. However, this has been deprecated.
Writing your own timeout isn't so hard, although it may be better to have an ongoing thread and a list of in-time identifiers, rather than adding and removing consumers and associated timeout threads all the time.
Edit to add: Noticed the date on this after replying!
There is similar question. Although it's answers doesn't use java, maybe you can get some hints.
Wait for a single RabbitMQ message with a timeout
I approached this problem using C# by creating an object to keep track of the response to a particular message. It sets up a unique reply queue for a message, and subscribes to it. If the response is not received in a specified timeframe, a countdown timer cancels the subscription, which deletes the queue. Separately, I have methods that can be synchronous from my main thread (uses a semaphore) or asynchronous (uses a callback) to utilize this functionality.
Basically, the implementation looks like this:
//Synchronous case:
//Throws TimeoutException if timeout happens
var msg = messageClient.SendAndWait(theMessage);
//Asynchronous case
//myCallback receives an exception message if there is a timeout
messageClient.SendAndCallback(theMessage, myCallback);

How to write a transactional, multi-threaded WCF service consuming MSMQ

I have a WCF service that posts messages to a private, non-transactional MSMQ queue. I have another WCF service (multi-threaded) that processes the MSMQ messages and inserts them in the database.
My issue is with sequencing. I want the messages to be in certain order. For example MSG-A need to go to the database before MSG-B is inserted. So my current solution for that is very crude and expensive from database perspective.
I am reading the message, if its MSG-B and there is no MSG-A in the database, I throw it back on the message queue and I keep doing that till MSG-A is inserted in the database. But this is a very expensive operation as it involves table scan (SELECT stmt).
The messages are always posted to the queue in sequence.
Short of making my WCF Queue Processing service Single threaded (By setting the service behavior attribute InstanceContextMode to Single), can someone suggest a better solution?
Thanks
Dan
Instead of immediately pushing messages to the DB after taking them out of the queue, keep a list of pending messages in memory. When you get an A or B, check to see if the matching one is in the list. If so, submit them both (in the right order) to the database, and remove the matching one from the list. Otherwise, just add the new message to that list.
If checking for a match is too expensive a task to serialize - I assume you are multithreading for a reason - the you could have another thread process the list. The existing multiple threads read, immediately submit most messages to the DB, but put the As and Bs aside in the (threadsafe) list. The background thread scavenges through that list finding matching As and Bs and when it finds them it submits them in the right order (and removes them from the list).
The bottom line is - since your removing items from the queue with multiple threads, you're going to have to serialize somewhere, in order to ensure ordering. The trick is to minimize the number of times and length of time you spend locked up in serial code.
There might also be something you could do at the database level, with triggers or something, to reorder the entries when it detects this situation. I'm afraid I don't know enough about DB programming to help there.
UPDATE: Assuming the messages contain some id that lets you associate a message 'A' with the correct associated message 'B', the following code will make sure A goes in the database before B. Note that it does not make sure they are adjacent records in the database - there could be other messages between A and B. Also, if for some reason you get an A or B without ever receiving the matching message of the other type, this code will leak memory since it hangs onto the unmatched message forever.
(You could extract those two 'lock'ed blocks into a single subroutine, but I'm leaving it like this for clarity with respect to A and B.)
static private object dictionaryLock = new object();
static private Dictionary<int, MyMessage> receivedA =
new Dictionary<int, MyMessage>();
static private Dictionary<int, MyMessage> receivedB =
new Dictionary<int, MyMessage>();
public void MessageHandler(MyMessage message)
{
MyMessage matchingMessage = null;
if (IsA(message))
{
InsertIntoDB(message);
lock (dictionaryLock)
{
if (receivedB.TryGetValue(message.id, out matchingMessage))
{
receivedB.Remove(message.id);
}
else
{
receivedA.Add(message.id, message);
}
}
if (matchingMessage != null)
{
InsertIntoDB(matchingMessage);
}
}
else if (IsB(message))
{
lock (dictionaryLock)
{
if (receivedA.TryGetValue(message.id, out matchingMessage))
{
receivedA.Remove(message.id);
}
else
{
receivedB.Add(message.id, message);
}
}
if (matchingMessage != null)
{
InsertIntoDB(message);
}
}
else
{
// not A or B, do whatever
}
}
If you're the only client of those queues, you could very easy add a timestamp as a message header (see IDesign sample) and save the Sent On field (kinda like an outlook message) in the database as well. You could process them in the order they were sent (basically you move the sorting logic at the time of consumption).
Hope this helps,
Adrian