vulkan-tutorial - how rendering and presentation are synced - vulkan

https://vulkan-tutorial.com/Drawing_a_triangle/Drawing/Rendering_and_presentation
While reading above tutorial, I have found a scenario where multiple items pile up in presentation queue.
The tutorial has a loop that runs the code below repeatedly.
void drawFrame() {
    vkWaitForFences(device, 1, &inFlightFence, VK_TRUE, UINT64_MAX);
    vkResetFences(device, 1, &inFlightFence);

    uint32_t imageIndex;
    vkAcquireNextImageKHR(device, swapChain, UINT64_MAX, imageAvailableSemaphore, VK_NULL_HANDLE, &imageIndex);

    vkResetCommandBuffer(commandBuffer, /*VkCommandBufferResetFlagBits*/ 0);
    recordCommandBuffer(commandBuffer, imageIndex);

    VkSubmitInfo submitInfo{};
    submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;

    VkSemaphore waitSemaphores[] = {imageAvailableSemaphore};
    VkPipelineStageFlags waitStages[] = {VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT};
    submitInfo.waitSemaphoreCount = 1;
    submitInfo.pWaitSemaphores = waitSemaphores;
    submitInfo.pWaitDstStageMask = waitStages;
    submitInfo.commandBufferCount = 1;
    submitInfo.pCommandBuffers = &commandBuffer;

    VkSemaphore signalSemaphores[] = {renderFinishedSemaphore};
    submitInfo.signalSemaphoreCount = 1;
    submitInfo.pSignalSemaphores = signalSemaphores;

    if (vkQueueSubmit(graphicsQueue, 1, &submitInfo, inFlightFence) != VK_SUCCESS) {
        throw std::runtime_error("failed to submit draw command buffer!");
    }

    VkPresentInfoKHR presentInfo{};
    presentInfo.sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;
    presentInfo.waitSemaphoreCount = 1;
    presentInfo.pWaitSemaphores = signalSemaphores;

    VkSwapchainKHR swapChains[] = {swapChain};
    presentInfo.swapchainCount = 1;
    presentInfo.pSwapchains = swapChains;
    presentInfo.pImageIndices = &imageIndex;

    vkQueuePresentKHR(presentQueue, &presentInfo);
}
There are two semaphores: one for rendering and one for presentation.
Similarly, there are two queues, one for rendering and one for presentation.
Here is a scenario I found that can happen.
After the first iteration, each queue has one item to process.
At the second iteration, none of the items in the queues have been processed yet, so the loop blocks at vkWaitForFences.
The first item in the graphics queue is processed.
It signals the blocking fence and the rendering semaphore.
The second iteration continues from vkWaitForFences.
The graphics queue receives the second item; it now holds one item in total.
The present queue also receives the second item. It has not processed the first item yet, so it holds two items in total.
The graphics queue processes the second item.
It signals the rendering semaphore again. The rendering semaphore has now received two signals without being waited on in between.
Now, the present queue will only process one item and then do nothing until the next iteration.
If this keeps happening in subsequent iterations, unprocessed items will pile up in the present queue.
Hence, if the graphics queue happens to process items faster than the present queue, there will be a starvation problem.
The tutorial does not explain how this issue can be solved.
Is there something in Vulkan that prevents this issue from occurring, or have I actually found a flaw in the tutorial code?

vkAcquireNextImageKHR signals the image semaphore only when the swapchain image at the index it returns becomes presentable.
The image at the index returned by vkAcquireNextImageKHR becomes presentable again only once the presentation item for that index has been processed in the present queue.
Hence, if the items in the present queue are not processed, vkAcquireNextImageKHR will either not signal the image semaphore or will block, stopping the next render.
So the number of items sitting in the present queue at once cannot grow without bound; it stops increasing once it equals the number of swapchain images.
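To see how the tutorial itself bounds this on the CPU side, its later "Frames in flight" chapter duplicates the synchronization objects per frame slot. Below is a minimal sketch of that pattern in C against the Vulkan C API; MAX_FRAMES_IN_FLIGHT, the handle arrays, and drawFrameWithFramesInFlight are illustrative names rather than code from the question.
#define MAX_FRAMES_IN_FLIGHT 2

/* One fence and one pair of semaphores per frame slot; device and swapChain
   are assumed to exist as in the question's code. */
VkSemaphore imageAvailableSemaphores[MAX_FRAMES_IN_FLIGHT];
VkSemaphore renderFinishedSemaphores[MAX_FRAMES_IN_FLIGHT];
VkFence     inFlightFences[MAX_FRAMES_IN_FLIGHT];
uint32_t    currentFrame = 0;

void drawFrameWithFramesInFlight(void) {
    /* Block until this slot's previous submission has fully finished. */
    vkWaitForFences(device, 1, &inFlightFences[currentFrame], VK_TRUE, UINT64_MAX);
    vkResetFences(device, 1, &inFlightFences[currentFrame]);

    uint32_t imageIndex;
    /* If no swapchain image is free, this call is what throttles the CPU,
       so work cannot pile up in the present queue indefinitely. */
    vkAcquireNextImageKHR(device, swapChain, UINT64_MAX,
                          imageAvailableSemaphores[currentFrame],
                          VK_NULL_HANDLE, &imageIndex);

    /* ... record and submit the command buffer, waiting on
       imageAvailableSemaphores[currentFrame], signaling
       renderFinishedSemaphores[currentFrame], then present ... */

    currentFrame = (currentFrame + 1) % MAX_FRAMES_IN_FLIGHT;
}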

Related

GPS data blocked by other tasks in the while loop

I am trying to parse GPS data while reading a pressure sensor and an IMU sensor and writing some data to an SD card. Since reading the pressure sensor and IMU sensor and writing to the SD card take some time, and the GPS does not wait for my command to send its data, I lose some GPS data, so my parser cannot find a meaningful message. I use the UART receive interrupt to capture the GPS data and a circular buffer to store it; afterwards I parse it. Since I don't know how many bytes come from the GPS, I read them one by one. I tried FreeRTOS but it did not work. How can I prevent the other tasks from blocking the GPS data? I am using an STM32F401CC.
Here is my FreeRTOS task;
void StartDefaultTask(void* argument)
{
    IMU_setParameters(&imu, &hi2c1, imu_ADD_LOW, GPIOB, GPIOB,
                      GPIO_PIN_1, GPIO_PIN_2);
    IMU_init(&imu, &htim3);
    while ((calState.accel != 3 || calState.system != 3 || calState.gyro != 3 || calState.mag != 3) && calibFlg)
    {
        IMU_getCalibrationState(&imu, &calState);
    }
    preSensor_init_default_params(&preSensor.params);
    preSensor.addr = preSensor_I2C_ADDRESS_1;
    preSensor.i2c = &hi2c1;
    preSensor_init(&preSensor, &preSensor.params);
    initSD_CARD(&USERFatFS, USERPath);
    samplePacket(&telemetry);
    controlRecoveryFile(&recoveryFile, "recoveryFile.txt", &telemetry);
    for (;;)
    {
        IMU_getDatas(&imu, &calState, &linearAccel, &IMU, &imuFlg, &offsetFlg, &calibCount);
        preSensor_force_measurement(&preSensor);
        preSensor_read_float(&preSensor, &temperature, &pressure, &humidty);
        preSensor_get_height(pressure, &height);
        telemetry.Altitude_PL = height;
        telemetry.Pressure_PL = pressure;
        telemetry.Temperature = temperature;
        telemetry.YRP[0] = IMU.yaw;
        telemetry.YRP[1] = IMU.roll;
        telemetry.YRP[2] = IMU.pitch;
        if (calibCount % 10 == 0)
        {
            writoToTelemetryDatas(&logFile, "tulparLog.txt", &telemetry, 0);
            if (!writeToRecoveryDatas(&recoveryFile, "recoveryFile.txt", &telemetry))
                connectionFlg = 1;
        }
        osDelay(1);
    }
}

void StartTask02(void* argument)
{
    arrangeCircularBuffer(&gpsCircular, buffer, BUFFER_LENGTH);
    initGPS(&huart1, &rDATA, &gps);
    for (;;)
    {
        getGPSdata(&huart1, &gpsCircular, &gps, &rDATA);
        osDelay(1);
    }
}
Here is my solution to the problem.
First of all, I do not use FreeRTOS at all; I do everything in the main loop. The problem is a race condition. My GPS data parser has four states: MSG_ID, Finish, Check, Parse. These four states do not take four loop iterations to find a meaningful message; it depends on the message length and can take up to 103 iterations. Meanwhile, in my main loop the IMU sensor, pressure sensor, and SD card together take approximately 80 ms. As you know, the GPS works independently of our code; it does not wait for our command, and it sends its data every second. Now, imagine that your GPS sends data every second and your circular buffer holds 200 bytes. Your parser starts to parse a message, but it needs at least 30+ iterations to find it: 30 * 80 ms = 2400 ms (2.4 s). By the time you find meaningful data, the GPS has sent two more batches and the buffer has overflowed. To fix this, I call my GPS parser in a for loop inside the main loop, and I send a command to the GPS so that it only outputs GPGGA and GPRMC sentences (for GPS commands you can look here). I use the UART receive interrupt to store data into my circular buffer. After receiving two '\n' characters, I stop taking data and let my parser parse what was received. At the end I restart the UART operation to take meaningful data. The important thing here is calling the parser in a for loop, as sketched below (it can be 8, 16, or 24 iterations depending on your other tasks' duration).
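A rough sketch of that structure in C follows; the helper names (readSensorsAndLog, gpsBufferHasData, gpsParserStep) and PARSER_STEPS_PER_LOOP are hypothetical stand-ins for the poster's own functions, not code from the post.
#define PARSER_STEPS_PER_LOOP 16  /* tune against how long the other work takes */

int main(void)
{
    /* ... HAL init, UART receive interrupt armed, circular buffer prepared ... */
    while (1)
    {
        /* Slow work: IMU + pressure sensor + SD card, roughly 80 ms in total. */
        readSensorsAndLog();

        /* Run the GPS parser state machine several steps per pass so it can
           finish a sentence before the circular buffer overflows. */
        for (int i = 0; i < PARSER_STEPS_PER_LOOP; i++)
        {
            if (!gpsBufferHasData())
                break;
            gpsParserStep();  /* advances the MSG_ID / Parse / Check / Finish states */
        }
    }
}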

amqp_basic_qos not having any effect

I am trying to code a simple consumer using librabbitmq. It works, but when I execute amqp_basic_consume, it consumes the entire queue.
What I want is for it to get a single message, process it and repeat.
I tried using a basic_qos to have the consumer prefetch 1 at a time, but that seems to have no effect at all.
The basic setup and loop:
// set qos of 1 message at a time
if (!amqp_basic_qos(conn, channel, 0, 1, 0)) {
    die_on_amqp_error(amqp_get_rpc_reply(conn), "basic.qos");
}

// Consuming the message
amqp_basic_consume(conn, channel, queue, amqp_empty_bytes, no_local, no_ack, exclusive, amqp_empty_table);
while (run) {
    amqp_rpc_reply_t result;
    amqp_envelope_t envelope;

    amqp_maybe_release_buffers(conn);
    result = amqp_consume_message(conn, &envelope, &timeout, 0);

    if (AMQP_RESPONSE_NORMAL == result.reply_type) {
        strncpy(message, envelope.message.body.bytes, envelope.message.body.len);
        message[envelope.message.body.len] = '\0';
        printf("Received message size: %d\nbody: -%s-\n", (int) envelope.message.body.len, message);
        if (strncmp(message, "DONE", 4) == 0)
        {
            printf("XXXXXXXXXXXXXXXXXX Cease message received. XXXXXXXXXXXXXXXXXXXXX\n");
            run = 0;
        }
        amqp_destroy_envelope(&envelope);
    } else {
        printf("Timeout.\n");
        run = 0;
    }
}
I expect to have a queue filled that I can start processing and if I hit ^C, the remaining messages are still in the queue. Instead, even if I have only processed one message, the entire queue is emptied.
This is the behavior when no_ack is true: messages are pushed to the connected consumer as fast as the broker can send them, because the broker assumes the consumer can accept them, as they are acknowledged immediately upon delivery.
You want to change no_ack to false and then explicitly ack each message back to the broker in this case.
Alternatively, you could use a basic.get to pull messages from the broker one at a time as opposed to using a push-based consumer (there are folks out there who don't like this idea). Your use case will determine what is most appropriate, but based on the fact that you seem to have a full queue and fairly process-intensive messages, I would assume a basic.get would be just fine in this scenario. The question then would be to decide how often to poll when the queue is empty.
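A minimal sketch of the ack-based variant follows (C, librabbitmq), assuming the connection, channel, queue, and timeout are set up as in the question; only the consume flags and the explicit amqp_basic_ack call differ from the original loop.
/* Prefetch one message at a time; only meaningful when no_ack is false. */
amqp_basic_qos(conn, channel, 0, 1, 0);

/* no_ack = 0: the broker keeps each message until we acknowledge it. */
amqp_basic_consume(conn, channel, queue, amqp_empty_bytes,
                   0 /* no_local */, 0 /* no_ack */, 0 /* exclusive */,
                   amqp_empty_table);

while (run) {
    amqp_envelope_t envelope;
    amqp_maybe_release_buffers(conn);
    amqp_rpc_reply_t result = amqp_consume_message(conn, &envelope, &timeout, 0);
    if (AMQP_RESPONSE_NORMAL == result.reply_type) {
        /* ... process envelope.message.body here ... */

        /* Ack only after successful processing; unacknowledged messages are
           redelivered if the consumer dies or the channel closes. */
        amqp_basic_ack(conn, channel, envelope.delivery_tag, 0 /* multiple */);
        amqp_destroy_envelope(&envelope);
    } else {
        run = 0;  /* timeout or error */
    }
}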

Unable to exit while loop in UVM monitor

This might be a silly mistake on my side that I have overlooked, but I'm fairly new to UVM and tried tinkering with my code for a while before asking. I'm trying to send a stream of 8-bit data within a packet, using a data valid/stall protocol, from my UVM driver to the DUT. I'm facing an issue with my input monitor not being able to pick up these driven transactions.
I have a while loop with a condition that the valid bit must be high and the stall bit must be low. As long as this condition holds, the monitor picks up the data byte and pushes it into a queue. I know for a fact that the data is being picked up and pushed to the queue, as I used $display statements along the way. The problem arises once all the data bytes are received and the valid bit goes low. Ideally this should cause an exit from the while loop, but it doesn't. Any help here would be appreciated. I have attached a snippet of the code below. Thanks in advance.
virtual task main_phase(uvm_phase phase);
    $display("Run phase of input monitor");
    collect_transfer();
endtask: main_phase

virtual task collect_transfer();
    fork
        forever begin
            wait_for_valid_transaction_cycle();
            create_and_populate_pkt();
            broadcast_pkt();
            #(iP0_vif.cb_iP0_MON);
        end
    join_none
endtask: collect_transfer

virtual task wait_for_valid_transaction_cycle();
    wait(iP0_vif.cb_iP0_MON.ip_valid && ~iP0_vif.cb_iP0_MON.ip_stall);
endtask: wait_for_valid_transaction_cycle

virtual task create_and_populate_pkt();
    pkt = Router_seq_item::type_id::create("pkt");
    pkt.valid = iP0_vif.cb_iP0_MON.ip_valid;
    pkt.sop = iP0_vif.cb_iP0_MON.ip_sop;
    $display("before data collection");
    while (iP0_vif.cb_iP0_MON.ip_valid === `HIGH && iP0_vif.cb_iP0_MON.ip_stall === `LOW) begin
        $display("After checking for stall");
        pkt.data = iP0_vif.cb_iP0_MON.ip_data;
        $display(pkt.data);
        pkt.data_q.push_front(pkt.data);
        pkt.eop = iP0_vif.cb_iP0_MON.ip_eop;
        $display("print check in input monitor # time = %0t", $time);
        #(iP0_vif.cb_iP0_MON);
    end
    $display("before printing input packet from monitor");
    Check_for_port_route_and_populate_packet_field(pkt);
    print_packet(pkt);
endtask: create_and_populate_pkt
The $display statement "before printing input packet from monitor" is not being displayed.
HIGH is defined as a binary 1 and LOW is defined as a binary 0.
The output of the code in terms of display statements is as below.
before data collection
before checking for stall
After checking for stall
2
print check in input monitor # time = 105
before checking for stall
After checking for stall
1
print check in input monitor # time = 115
before checking for stall
After checking for stall
3
print check in input monitor # time = 125
It's possible that the main-phase objection is being dropped elsewhere in your environment. UVM automatically kills any threads that were spawned during a phase when that phase ends.
To fix this, do not raise an objection to the main phase in your monitor; objecting to that phase is the responsibility of the threads creating the stimulus. Instead, launch this monitor during the run_phase, which ensures that your loop is not killed until the end of simulation.
Also, during the shutdown phase, you will want your monitor to raise an objection whenever it is in the middle of collecting a packet. This ensures that simulation doesn't end as soon as the stimulus has been sent in, giving your other monitors time to collect responses from the DUT.

Tensorflow: Dequeue and then enqueue

I have a queue (called queue_A) and populate it with 100 elements. I would like to do the following two things:
1. Dequeue one element from queue_A, do some processing on it, and enqueue the result into another queue (queue_B). The enqueuing op is called op_B.
2. Enqueue this element (before processing) back into queue_A; this enqueuing op is called op_A.
For achieving 1, I can write:
anElement = queue_A.dequeue()
result = proc(anElement)
op_B = queue_B.enqueue(result)
queue_runner = tf.train.QueueRunner(queue_B, [op_B] * 4)
For achieving 2, I can write:
anElement = queue_A.dequeue()
op_A = queue_A.enqueue(anElement)
queue_runner = tf.train.QueueRunner(queue_A, [op_A] * 4)
However, I don't know how can I do these two things at once.
Now, I use the following code:
anElement = queue_A.dequeue()
op_A = queue_A.enqueue(anElement)
result = proc(anElement)
op_B = queue_B.enqueue(result)
queue_runner = tf.train.QueueRunner(queue_B, [op_A, op_B] * 4)
I expect the size of queue_A to remain constant, but when I use session.run(queue_A.size()) to check it, the size gradually decreases.
What is wrong with that code? And how to achieve what I want?
The code in your example has two types of "queue runner":
One that runs op_A: it dequeues an element from queue_A and enqueues it back to queue_A.
Another that runs op_B: it dequeues an element from queue_A, processes it via proc(), and enqueues the result into queue_B.
The problem is that when op_A and op_B run separately (e.g. in different queue runners, or in different calls to sess.run()), they dequeue distinct elements from queue_A. The elements dequeued by running op_B are never re-enqueued to queue_A, which explains why its size gradually decreases.
To solve this problem, as Andrei suggests, you need to create an op that runs a single TensorFlow subgraph that performs both op_A and op_B. The following example should work:
anElement = queue_A.dequeue()
op_A = queue_A.enqueue(anElement)
result = proc(anElement)
op_B = queue_B.enqueue(result)
# Creates a single op that enqueues the original element back to queue_A and the
# processed element to queue_B.
op = tf.group(op_A, op_B)
queue_runner = tf.train.QueueRunner(queue_B, [op] * 4)
Unfortunately I can't explain why your code doesn't work, but it looks like op_A doesn't execute because it doesn't depend on queue_B, and I suggest you use a control-flow op (for example tf.group) to achieve what you want.
op = tf.group(op_A, op_B)
queue_runner = tf.train.QueueRunner(queue_B, [op] * 4)

MSMQ queue with multiple processes reading

I had an MSMQ application set up where data was being pushed into one queue. Initially I had only one process reading from it and processing the messages. Since the volume has increased, I have started multiple processes reading from it, each basically a new instance of my original process. I do not see any errors, but the performance has really dropped. My understanding is that each process will receive a new message that has not yet been processed and continue with it. Is this correct, or is it possible that multiple processes could end up processing the same message?
Dim q As MessageQueue

If MessageQueue.Exists(".\private$\MsgsIQueue") Then
    q = New MessageQueue(".\private$\MsgsIQueue")
Else
    'GS - If there is no queue then we're done here
    Console.WriteLine("Queue has not been created!")
    Return
End If

While True
    Dim message As Message
    counter += 1
    Try
        If q.Transactional = True Then
            Thread.Sleep(2000)
        End If
        q.MessageReadPropertyFilter.ArrivedTime = True
        message = q.Peek(TimeSpan.FromSeconds(20.0))
        message.UseJournalQueue = True
        message = q.Receive(New TimeSpan(0, 0, 60))
        message.Formatter = New XmlMessageFormatter(New [String]() {"System.String"})
        ProcessMessage(message)
        ....
OK, are you sure that it is the queue reading that is actually causing the performance degradation? I would suspect there is some other bottleneck in your pipeline, as MSMQ is really good at handling reads from multiple processes/threads.
If I take a look at your code, I would suggest the following changes:
Why sleep for 2 seconds if it is a transactional queue? Always use transactional queues, and move the call to Sleep into the catch block so it acts as a wait interval when the queue is empty.
Move the setting of the MessageReadPropertyFilter outside of the loop.
Remove the call to Peek, as it accomplishes nothing of value here.
UseJournalQueue is only of use when sending messages, so remove it.
Set the formatter on the queue instead, and it will be used for all reads.
You should also wrap the calls to Receive and ProcessMessage within a TransactionScope, with ProcessMessage in its own try/catch block. That way you can commit the read if everything went OK in ProcessMessage, or otherwise abort the read or move the message to a dead-letter queue.