Chronicle Queue - reader/tailer latency when reading while writing

I'm setting up a market data back-testing environment using Chronicle Queue (CQ): I read data from a binary file, write it into a single CQ, and simultaneously read it back from that CQ and dump statistics. This is a POC to replace the worker queue in our existing real-time market data feed handler.
While doing basic read/write tests on a Linux/SSD setup, I see reads lagging behind writes - in fact the latency keeps accumulating. Both the appender and the tailer run as separate processes on the same host.
Is there any issue in the code I am using?
Below is the code snippet -
Writer -
In the constructor -
myQueue = SingleChronicleQueueBuilder.binary(queueName).build();
myAppender = myQueue.acquireAppender();
In the data callback -
myAppender.writeDocument(myDataPacket);
myQueue.close();
where myDataPacket is a Java object wrapping the byte[] and other fields.
Tailer -
In the constructor -
myQueue = SingleChronicleQueueBuilder.binary(queueName).build();
myTailer = myQueue.createTailer();
In the read method -
while (notLastRecord)
{
    if (myTailer.readDocument(myDataPacket))
    {
        notLastRecord = !myDataPacket.isLastRecord(); // the original assignment here was incomplete; isLastRecord() is a stand-in
        //do stuff
    }
}
myQueue.close();
Any help is highly appreciated.
Thanks,
Pavan

First of all, I assume that by "reads are lagging behind writes - in fact latency is accumulating" you mean that for every subsequent message, the time at which the message is read from the queue drifts further from the time at which it was written.
If you see latency accumulating like that, most likely the data is produced much quicker than you can consume it, which is very plausible for the use case you described: if all the write side does is parse a simple text line and dump it into a queue file, that's quick, but if you do some processing when you read the entry from the queue, that can be slower.
From the code it's not clear what or how much work it is doing, but it looks OK to me, except that you probably shouldn't call queue.close() after each appender.writeDocument() call - though most likely you are not doing that, otherwise it would blow up.
Without seeing actual code or test case it's impossible to say more.
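For reference, a minimal sketch of the intended lifecycle - build the queue once, write many times, close only on shutdown (queuePath is a placeholder, and myDataPacket is assumed to be Marshallable; this is an illustration, not code from the question):
// Open once, e.g. in the constructor.
ChronicleQueue queue = SingleChronicleQueueBuilder.binary(queuePath).build();
ExcerptAppender appender = queue.acquireAppender();
// Per-message data callback: write only, never close here.
appender.writeDocument(myDataPacket);
// Close exactly once, on shutdown.
queue.close();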

Related

Vulkan Queue submission synchronization - vkWaitForFences vs vkQueueWaitIdle [duplicate]

I have a function that copies data from one buffer to another, and I need to synchronize its execution.
Here is one option, which I know is bad:
void MainWindow::copyBuffer(VkBuffer srcBuffer, VkBuffer dstBuffer, VkDeviceSize size)
{
VkCommandBuffer commandBuffer;
vkAllocateCommandBuffers(logicalDevice, &allocInfo, &commandBuffer);
//Start recording
vkBeginCommandBuffer(commandBuffer, &beginInfo);
vkCmdCopyBuffer(commandBuffer, srcBuffer, dstBuffer, 1, &copyRegion);
vkEndCommandBuffer(commandBuffer);
//Run command buffer
vkQueueSubmit(graphicsQueue, 1, &submitInfo, VK_NULL_HANDLE);
//Waiting for completion
vkQueueWaitIdle(graphicsQueue);
vkFreeCommandBuffers(logicalDevice, commandPool, 1, &commandBuffer);
}
This option is bad because if I want to execute copyBuffer() several times, all the buffers will be copied strictly one at a time.
I want to use a fence for each function call so that multiple calls can run in parallel.
So far, this is the only solution I have:
void MainWindow::copyBuffer(VkBuffer srcBuffer, VkBuffer dstBuffer, VkDeviceSize size)
{
VkCommandBuffer commandBuffer;
vkAllocateCommandBuffers(logicalDevice, &allocInfo, &commandBuffer);
//Create fence
VkFenceCreateInfo fenceInfo{};
fenceInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;
fenceInfo.flags = VK_FENCE_CREATE_SIGNALED_BIT;
VkFence executionCompleteFence = VK_NULL_HANDLE;
if (vkCreateFence(logicalDevice, &fenceInfo, VK_NULL_HANDLE, &executionCompleteFence) != VK_SUCCESS) {
throw MakeErrorInfo("Failed to create fence");
}
//Start recording
vkBeginCommandBuffer(commandBuffer, &beginInfo);
vkCmdCopyBuffer(commandBuffer, srcBuffer, dstBuffer, 1, &copyRegion);
vkEndCommandBuffer(commandBuffer);
//Run command buffer
vkQueueSubmit(graphicsQueue, 1, &submitInfo, VK_NULL_HANDLE);
vkWaitForFences(logicalDevice, 1, &executionCompleteFence, VK_TRUE, UINT64_MAX);
vkResetFences(logicalDevice, 1, &executionCompleteFence);
vkFreeCommandBuffers(logicalDevice, commandPool, 1, &commandBuffer);
vkDestroyFence(logicalDevice, executionCompleteFence, VK_NULL_HANDLE);
}
Which of these options is better?
Is the second option written correctly?
Both functions are bad in the same way. They both block the CPU from doing anything until the transfer is done. And they both could be used to potentially submit multiple CBs to the same queue in the same frame, but with different submit commands.
Neither is desirable if performance is something you care about.
Ultimately, what you need to do is have your copyBuffer function not actually perform the copy. You should have a function which builds a command buffer to do a copy. That CB is then stored in a place to be submitted later with other copying CBs. Or better yet, you can have just one copying CB that each command adds to (the first one called in a frame will create the CB).
At some point in the future, before you've submitted the work that will use this data, you need to submit the transfer operations. And the way this works depends on if you're submitting the transfer operations on the same queue as the work that will consume them or not.
If they're on the same queue, then all you need to do is have an event in a command buffer at the end of your batch that synchronizes the transfer operations with their receivers. If you want to be more clever, each transfer operation can have its own event, which the receiving operations will wait on.
And in same-queue transfers, you also want to make sure that you submit the transfers in the same vkQueueSubmit call as the rest of your work. Or to put it another way, you should never make more than one call to vkQueueSubmit for a particular queue in a particular frame.
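To make that concrete, a rough sketch of the same-queue case (recording of the command buffers and creation of the handles are assumed done elsewhere; this shows the shape of the submission, not a drop-in implementation):
// transferCmd records the vkCmdCopyBuffer calls plus the barrier/event that
// synchronizes them with their receivers; renderCmd records the consuming work.
VkCommandBuffer cmdBuffers[2] = { transferCmd, renderCmd };

VkSubmitInfo submitInfo{};
submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
submitInfo.commandBufferCount = 2;
submitInfo.pCommandBuffers = cmdBuffers;

// The one and only vkQueueSubmit on this queue for the frame.
vkQueueSubmit(graphicsQueue, 1, &submitInfo, frameFence);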
If you're dealing with separate queues, then things change. A bit. If timeline semaphores aren't an option, you'll need to submit your transfer work before you submit the receiving operations. This is because the transfer batch will need to signal a semaphore that the receiving operation will wait on. And a binary semaphore cannot be waited on until the operation that signals it has been submitted to a queue.
But otherwise, everything else stays the same. Of course, you don't need events since you're synchronizing by semaphore.
The two functions are semantically identical and exhibit exactly the same blocking behavior.
The second is slightly better. vkQueueWaitIdle is kind of a debug and out-of-hotspot feature; it might incur a hidden second submit to signal the implicit fence.
You don't need to reset a fence that you subsequently destroy anyway. You are creating it pre-signaled, which is a bug, and you also forgot to pass it to vkQueueSubmit.
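Putting those fixes together, the fence part of the second function would look roughly like this (allocInfo, beginInfo, copyRegion and submitInfo set up as in the question):
VkFenceCreateInfo fenceInfo{};
fenceInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;
// No VK_FENCE_CREATE_SIGNALED_BIT: the fence must start unsignaled,
// otherwise vkWaitForFences returns immediately.
VkFence fence = VK_NULL_HANDLE;
vkCreateFence(logicalDevice, &fenceInfo, nullptr, &fence);

// Pass the fence to the submit so it is signaled on completion.
vkQueueSubmit(graphicsQueue, 1, &submitInfo, fence);
vkWaitForFences(logicalDevice, 1, &fence, VK_TRUE, UINT64_MAX);

// No vkResetFences needed, since the fence is destroyed right away.
vkDestroyFence(logicalDevice, fence, nullptr);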

How to postpone an event in Spark/RabbitMQ

We are designing a system that will stream events using RabbitMQ (maybe later Kafka) and Spark Streaming. Some of our events have been broken into several event types, so that no single event is too big. This means that certain events have to wait for the other events with the same id; we cannot proceed with processing until all events for a specific id have arrived.
Is there a way to delay the processing of an event until the next processing window in Spark Streaming (if the other events have not yet arrived)?
Thanks,
Nir
From an architectural perspective, there are questions to consider:
how do you determine that all events have arrived?
what happens if one event gets lost?
what happens if events arrive out of order, e.g. the last part first?
In principle it would seem that breaking into parts an event that was originally formed as a whole increases the complexity and affects the reliability of the system.
To answer the question in any case: since Spark 1.6.x a new stateful streaming function has been available: mapWithState. mapWithState allows you to keep state information per key and to emit zero or more events, of the same or a different type, in response to an incoming event.
Applied to this case, we could model the state as State[PartialEvent]: as events come in, they are assembled into a PartialEvent object. Once the criterion that an event is complete has been fulfilled, mapWithState can emit a WholeEvent object to be processed downstream.
The process would roughly(*) look like this:
// mapWithState requires a keyed stream, hence the (id, event) pairs
val sourceEventDStream: DStream[(String, Event)] = ???
def stateUpdateFunction(eventId: String, event: Option[Event], partialEventState: State[PartialEvent]): Option[WholeEvent] = {
  val eventState = partialEventState.getOption() // current partial state for this id, if any
  val updatedEvent = merge(eventState, event)    // merge is left to the reader
  if (updatedEvent.isComplete) {
    partialEventState.remove()
    Some(WholeEvent(updatedEvent))
  } else {
    partialEventState.update(updatedEvent)
    None
  }
}
val wholeEventDStream: DStream[WholeEvent] =
  sourceEventDStream.mapWithState(StateSpec.function(stateUpdateFunction _)).flatMap(_.toList)
//do stuff with wholeEventDStream ...
As you can see, with this approach any PartialEvent that never completes would stay in the state forever. We also need a unique key to identify the events that belong together. There are timeout options that must be considered to cover the failure cases, but the bottom line is that preserving a whole event through the pipeline would be a better approach, if technically possible.
(*) not compiled or tested. Provided only to illustrate the idea.
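On the timeout point: a state timeout can be declared on the StateSpec itself (again a sketch; the duration is arbitrary):
val spec = StateSpec.function(stateUpdateFunction _).timeout(Minutes(30))
val wholeEventDStream = sourceEventDStream.mapWithState(spec).flatMap(_.toList)
With a timeout in place, idle PartialEvent entries are eventually evicted, and the update function is invoked one last time for them with state.isTimingOut() == true and no new event, so the never-completing case can be handled explicitly. The update function above would need a small guard for that call, since the state can no longer be updated then.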

How to handle GSM buffer on the Microcontroller?

I have a GSM module hooked up to a PIC18F87J11 and they communicate just fine. I can send an AT command from the microcontroller and read the response back. However, I have to know how many characters are in the response so I can have the PIC wait for that many characters. But if an error occurs, the response length will change. What is the best way to handle such a scenario?
For Example:
AT+CMGF=1
Will result in the following response.
\r\nOK\r\n
So I have to tell the PIC to wait for 6 characters. However, if the response is an error message, it would be something like this:
\r\nERROR\r\n
And if I already told the PIC to wait for only 6 characters, it will miss the rest of the characters; as a result, they might appear the next time I tell the PIC to read the response to a new AT command.
What is the best way to find the end of the line automatically and handle any error messages?
Thanks!
In a single line
There is no single best way, only trade-offs.
In detail
The problem can be divided into two related subproblems.
1. Receiving messages of arbitrary finite length
The trade-offs:
available memory vs implementation complexity;
bandwidth overhead vs implementation complexity.
In the simplest case, the amount of available RAM is not restricted. We just use a buffer wide enough to hold the longest possible message and keep receiving the messages bytewise. Then, we have to determine somehow that a complete message has been received and can be passed to further processing. That essentially means analyzing the received data.
2. Parsing the received messages
Analyzing the data in search of its syntactic structure is parsing, by definition. And that is where the subtasks are related. Parsing in general is a very complex topic, and dealing with it is expensive, both computationally and in implementation effort. It's often possible to reduce the costs if we limit the genericity of the data: the simpler the data structure, the easier it is to parse. That limitation is what a "transport layer protocol" provides.
Thus, we have to read the data to parse it, and parse the data to read it. This kind of interlocked problem is generally solved with coroutines.
In your case we have to deal with the AT protocol. It is old and it is human-oriented by design. That's bad news, because parsing it correctly can be challenging despite how simple it sometimes looks. It has some terribly inconvenient features, such as the '+++' escape timing!
Things become worse when you're short of memory. In that situation we can't defer parsing until the end of the message, because the message might not even fit in the available RAM; we have to parse it chunkwise.
...And we are not even close to opening TCP connections or making calls! You'll meet some unexpected troubles there as well, such as the dreaded "unsolicited result codes". The matter is wide enough for a whole book. Please have a look at least here:
http://en.wikibooks.org/wiki/Serial_Programming/Modems_and_AT_Commands. The wikibook discusses many more problems with the Hayes protocol and describes some approaches to solving them.
Let's break the problem down into some layers of abstraction.
At the top layer is your application. The application layer deals with the response message as a whole and understands the meaning of a message. It shouldn't be mired down with details such as how many characters it should expect to receive.
The next layer is responsible for framing a message from the stream of characters. Framing means extracting the message from the stream by identifying its beginning and end.
The bottom layer is responsible for reading individual characters from the port.
Your application could call a function such as GetResponse(), which implements the framing layer. And GetResponse() could call GetChar(), which implements the bottom layer. It sounds like you've got the bottom layer under control and your question is about the framing layer.
A good pattern for framing a stream of characters into a message is to use a state machine. In your case the state machine includes states such as BEGIN_DELIM, MESSAGE_BODY, and END_DELIM. For more complex serial protocols other states might include MESSAGE_HEADER and MESSAGE_CHECKSUM, for example.
Here is some very basic code to give you an idea of how to implement the state machine in GetResponse(). You should add various kinds of error checking to prevent buffer overflows and to handle dropped characters and the like.
#include <stdbool.h>

/* Framing states for a response of the form \r\n<body>\r\n */
enum { BEGIN_DELIM1, BEGIN_DELIM2, MESSAGE_BODY, END_DELIM };

char GetChar(void); /* bottom layer, implemented elsewhere */

void GetResponse(char *message_buffer)
{
    unsigned int state = BEGIN_DELIM1;
    bool is_message_complete = false;
    while (!is_message_complete)
    {
        char c = GetChar();
        switch (state)
        {
        case BEGIN_DELIM1:
            if (c == '\r')
                state = BEGIN_DELIM2;
            break;
        case BEGIN_DELIM2:
            if (c == '\n')
                state = MESSAGE_BODY;
            break;
        case MESSAGE_BODY:
            if (c == '\r')
                state = END_DELIM;
            else
                *message_buffer++ = c;   /* accumulate the body */
            break;
        case END_DELIM:
            if (c == '\n')
                is_message_complete = true;
            break;
        }
    }
    *message_buffer = '\0';              /* terminate the assembled message */
}

How to go about testing goroutines?

An example of this problem: a user creates a resource or deletes a resource. We perform the operation and also increment (or decrement) a counter cache.
In testing, there is sometimes a race condition where the counter cache has not yet been updated by the goroutine.
EDIT: Sorry about the confusion; to clarify, the counter cache is not in memory, it is actually a field in the database. The race is not on a variable in memory; it is that the goroutine might be slow to write to the database itself!
I currently use a 1-second sleep after the operation to ensure that the counter cache has been updated before asserting on it. Is there another way to test a goroutine, without an arbitrary 1-second sleep waiting for it to finish?
Cheers
In testing, there is sometimes a race condition where the counter cache has not been updated by the goroutine. I currently use a 1 second sleep after the operation to ensure that the counter cache has been updated before testing the counter cache.
Yikes, I hate to say it, but you're doing it wrong. Go has first-class features to make concurrency easy! If you use them correctly, it's impossible to have race conditions.
In fact, there's a tool, the built-in race detector (go test -race), that will detect races for you. I'll bet it complains about your program.
One simple solution (sketched in code after this list):
Have the main routine create a goroutine that keeps track of the counter.
The goroutine just does a select, receiving messages that increment/decrement or read the counter. (For a read, it is passed a channel on which to return the number.)
When you create/delete resources, send an appropriate message to the counter goroutine via its channel.
When you want to read the counter, send a read message, and then read from the return channel.
(Another alternative would be to use locks. It would be a tiny bit more performant, but much more cumbersome to write and to prove correct.)
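A minimal, self-contained sketch of that idea (all names are illustrative, not from the question):
package main

import "fmt"

// counter owns the value; all access goes through channels,
// so no lock is needed and the race detector stays quiet.
func counter(ops <-chan int, reads <-chan chan int) {
	value := 0
	for {
		select {
		case delta := <-ops:
			value += delta // increment or decrement
		case reply := <-reads:
			reply <- value // hand the current value back
		}
	}
}

func main() {
	ops := make(chan int)
	reads := make(chan chan int)
	go counter(ops, reads)

	ops <- 1  // resource created
	ops <- -1 // resource deleted

	reply := make(chan int)
	reads <- reply
	fmt.Println(<-reply) // prints 0
}
Because the channels are unbuffered, once ops <- 1 returns, the counter goroutine has received the delta and applies it before serving the next request, so a subsequent read observes every prior update without sleeping.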
One solution is to let your counter offer a channel which is updated as soon as the value changes. In Go it is common practice to synchronize by communicating the result. For example, your Counter could look like this:
type Counter struct {
	value       int
	ValueChange chan int // must be created with make(chan int) before use
}

func (c *Counter) Change(n int) {
	c.value += n
	c.ValueChange <- c.value // blocks until someone receives the new value
}
Whenever Change is called, the new value is passed through the channel, and whoever is waiting for the value unblocks and continues execution, thereby synchronizing with the counter. With this code you can listen on ValueChange for changes like this:
v := <-c.ValueChange
Concurrently calling c.Change is no problem anymore.
There is a runnable example on play.
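The playground link isn't preserved here, but a self-contained version of the example would look roughly like this (reusing the Counter type above):
package main

import "fmt"

func main() {
	c := &Counter{ValueChange: make(chan int)}
	go func() {
		c.Change(1)  // resource created
		c.Change(-1) // resource deleted
	}()
	fmt.Println(<-c.ValueChange) // 1
	fmt.Println(<-c.ValueChange) // 0
}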

VB.NET, best practice for sorting concurrent results of threadpooling?

In short, I'm trying to "sort" the incoming results of threadpooling as they finish. I have a functional solution, but there's no way in the world it's the best way to do it (it's prone to huge pauses). So here I am! I'll try to hit the bullet points of what's going on and what needs to happen, and then post my current solution.
The intent of the code is to get information about files in a directory, and then write that to a text file.
I have a list (Counter.ListOfFiles) that is a list of the file paths sorted in a particular way. This is the guide that dictates the order I need to write to the text file.
I'm using a threadpool to collect the information about each file and create a stringbuilder with all of the text ready to write to the text file. I then call a procedure (SyncUpdate, included below), sending the stringbuilder (strBld) from that thread along with the path of the file that particular thread just wrote about (Xref).
The procedure includes a SyncLock to hold all the other threads until it finds a thread passing the correct information - "correct" meaning that the Xref passed by the thread matches the first item in my list (FirstListItem). When that happens, I write to the text file, delete the first item in the list, and do it again with the next thread.
The way I'm using the monitor is probably not great; in fact, I have little doubt I'm using it in an offensively wanton manner. Basically, while the Xref (from the thread) <> the first item in my list, I do a PulseAll on the monitor. I originally used Monitor.Wait, but it would eventually just give up trying to sort through the list, even when using a Pulse elsewhere. I may have just coded something awkwardly. Either way, I don't think it changes anything.
Basically the problem comes down to the fact that the monitor will pulse through all of the items in its queue, when there's a good chance the item I'm looking for was passed to it earlier in the queue, and it now sorts through all of the items again before looping back around to find a match. The result is that my code hits one of these and takes a huge amount of time to complete.
I'm open to believing I'm just using the wrong tool for the job, or just not using the tool I have correctly. I would strongly prefer some sort of threaded solution (unsurprisingly, it's much faster!). I've been messing around a bit with the Parallel Task functionality today, and a lot of it looks promising, but I have even less experience with that than with threadpools, and you can see how I'm abusing those! Maybe something with a queue? You get the idea. I am directionless. Anything someone could suggest would be much appreciated. Thanks! Let me know if you need any additional information.
Private Sub SyncUpdateResource(strBld As Object, Xref As String)
SyncLock (CType(strBld, StringBuilder))
Dim FirstListitem As String = counter.ListOfFiles.First
Do While Xref <> FirstListitem
FirstListitem = Counter.ListOfFiles.First
'This makes the code much faster for reasons I can only guess at.
Thread.Sleep(5)
Monitor.PulseAll(CType(strBld, StringBuilder))
Loop
Dim strVol As String = Form1.Volname
Dim strLFPPath As String = Form1.txtPathDir
My.Computer.FileSystem.WriteAllText(strLFPPath & "\" & strVol & ".txt", strBld.ToString, True)
Counter.ListOfFiles.Remove(Xref)
End SyncLock
End Sub
This is a pretty typical multiple producer, single consumer application. The only wrinkle is that you have to order the results before they're written to the output. That's not difficult to do. So let's let that requirement slide for a moment.
The easiest way in .NET to implement a producer/consumer relationship is with BlockingCollection, which is a thread-safe FIFO queue. Basically, you do this: the producer threads get items, do whatever processing they need to, and then put each item onto the queue. There's no need for any explicit synchronization - the BlockingCollection implementation does that for you.
Your consumer pulls things from the queue and outputs them. You can see a really simple example of this in my article Simple Multithreading, Part 2. (Scroll down to the third example if you're just interested in the code.) That example just uses one producer and one consumer, but you can have N producers if you want.
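For example, each producer amounts to little more than this (a sketch; ProcessFile and ItemType are stand-ins for your per-file work and item type):
var sharedQueue = new BlockingCollection<ItemType>();

// Each producer thread:
foreach (var path in myAssignedPaths)
{
    var item = ProcessFile(path);  // build the output text for one file
    sharedQueue.Add(item);         // thread-safe; no explicit locking needed
}

// Once all producers have finished:
sharedQueue.CompleteAdding();      // lets GetConsumingEnumerable() complete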
Your requirements have a little wrinkle in that the consumer can't just write items to the file as it gets them. It has to make sure that it's writing them in the proper order. As I said, that's not too difficult to do.
What you want is a priority queue of some sort onto which you can place an item if it comes in out of order. Your priority queue can be a sorted list or even just a sequential list if the number of items you expect to get out of order isn't very large. That is, if you typically have only a half dozen items at a time that could be out of order, then a sequential list could work just fine.
I'd use a heap because it performs well. The .NET Framework doesn't supply a heap, but I have a simple one that works well for jobs like this. See A Generic BinaryHeap Class.
So here's how I'd write the consumer (the code is in pseudo-C#, but you can probably convert it easily enough).
The assumption here is that you have a BlockingCollection called sharedQueue that contains the items. The producers put items on that queue. Consumers do this:
var heap = new BinaryHeap<ItemType>();
int expectedSequenceKey = 0; // key of the next item to output (assuming keys start at 0)
foreach (var item in sharedQueue.GetConsumingEnumerable())
{
if (item.SequenceKey == expectedSequenceKey)
{
// output this item
// then check the heap to see if other items need to be output
expectedSequenceKey = expectedSequenceKey + 1;
while (heap.Count > 0 && heap.Peek().SequenceKey == expectedSequenceKey)
{
var heapItem = heap.RemoveRoot();
// output heapItem
expectedSequenceKey = expectedSequenceKey + 1;
}
}
else
{
// item is out of order
// put it on the heap
heap.Insert(item);
}
}
// if the heap contains items after everything is processed,
// then some error occurred.
One glaring problem with this approach as written is that the heap could grow without bound if one of your producers crashes or goes into an infinite loop, since the item it was supposed to deliver never arrives. But then, your other approach probably suffers from that as well. If you think that's an issue, you'll have to add some way to skip an item that you think won't ever be forthcoming. Or kill the program. Or something.
If you don't have a binary heap or don't want to use one, you can do the same thing with a SortedList<ItemType>. SortedList will be faster than List, but slower than BinaryHeap if the number of items in the list is even moderately large (a couple dozen). Fewer than that and it's probably a wash.
I know that's a lot of info. I'm happy to answer any questions you might have.