SIGSEGV when calling vrapi_SubmitFrame2() - vulkan

I'm porting a game to Quest, so part of my work is to interface the engine's Vulkan renderer with the Oculus Mobile SDK.
I believe I'm setting up the SDK correctly (I'm following the examples and the guidelines from Oculus' docs) but still I'm getting a nasty error when trying to submit a frame.
Here's a high-level list of the things I'm currently doing:
- I initialize the API.
- I create a Vulkan instance and device with the expected extensions.
- I acquire per-eye swapchains and get Vulkan handles for each of their images.
- I set up framebuffers and render passes using those images.
- I acquire a native Android window.
- I enter VR mode (making sure the app is resumed).
Then, at the end of my render loop, I set up an ovrSubmitFrameDesc and call vrapi_SubmitFrame2(). I also make sure I only call vrapi_SubmitFrame2() after all work has been submitted to the GPU (I'm currently using a fence on my work queues).
However, as I mentioned before, the call to vrapi_SubmitFrame2() fails. It currently raises a SIGSEGV inside Quest's Vulkan driver:
backtrace:
#00 pc 000000000010b2d8 /vendor/lib64/hw/vulkan.kona.so (!!!0000!b78ad09fc24eab751708d0a80613cf!09c6a36!+24) (BuildId: cc478ff923cc27b87607fb1f1a3b87ef)
#01 pc 00000000000c3b04 /vendor/lib64/hw/vulkan.kona.so (qglinternal::vkQueueSubmit(VkQueue_T*, unsigned int, VkSubmitInfo const*, VkFence_T*)+4468) (BuildId: cc478ff923cc27b87607fb1f1a3b87ef)
#02 pc 000000000018a608 /system/priv-app/VrDriver/VrDriver.apk!libvrapiimpl.so (offset 0x8cd000) (BuildId: aa2c28d3d4127c2e2e9a5125be000207dcc27ebd)
#03 pc 0000000000160a2c /system/priv-app/VrDriver/VrDriver.apk!libvrapiimpl.so (offset 0x8cd000) (BuildId: aa2c28d3d4127c2e2e9a5125be000207dcc27ebd)
#04 pc 0000000000162b6c /system/priv-app/VrDriver/VrDriver.apk!libvrapiimpl.so (offset 0x8cd000) (vrapi_SubmitFrame2+7564) (BuildId: aa2c28d3d4127c2e2e9a5125be000207dcc27ebd)
#05 pc 00000000048a85fc /data/app/myapp-Y6tT_vtGWj8JJ1PwgxheNA==/base.apk!libgrid.so (offset 0x6a9b000) (MyEngine::endVrFrame(unsigned int)+160) (BuildId: c9933f7ea0ad0c36a592bc4316e499e9db767d60)
The fact that the error happens inside an internal vkQueueSubmit() call makes me think it is somehow related to the way I'm using command queues. But even if I set a separate queue as the synchronization queue and don't do anything with it (i.e., don't submit any commands to it), I still get the same error.
Does anybody have an idea of what I could be doing wrong?
PS (1), I've tried to use a blank layer, instead of a proper projection layer, just to see if I could get past that point, but that didn't help.
PS (2), I'm getting no errors from the validation layer.
PS (3), the thread in which I enter VR mode is the same thread in which I'm calling vrapi_SubmitFrame2().

Let me start this by asking: who do you think would win, 18 years of experience in software development or this bad boi here '&'?
The mystery of SIGSEGV being raised from vrapi_SubmitFrame2() was nothing more than a stupid mistake when setting OVR's synchronization queue:
vrapi_DefaultModeParmsVulkan(&m_java, (long long)&queueHandle);
That line passes the address of the variable holding the queue handle rather than the handle itself. It should have been written as:
vrapi_DefaultModeParmsVulkan(&m_java, (long long)queueHandle);
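To illustrate why the stray '&' is fatal, here is a minimal sketch, assuming the handle is an opaque pointer type (as VkQueue_T* is in the backtrace above). FakeQueue, packHandle, packAddressOfHandle and unpackQueue are hypothetical names made up for this illustration; none of them are part of the Oculus SDK or Vulkan.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an opaque, pointer-sized handle such as VkQueue.
struct FakeQueue_T;
using FakeQueue = FakeQueue_T*;

// Hypothetical stand-in for an API that receives the handle packed into an
// integer and later casts it back to the handle type.
inline FakeQueue unpackQueue(unsigned long long packed) {
    return reinterpret_cast<FakeQueue>(static_cast<std::uintptr_t>(packed));
}

// Correct: pack the handle value itself.
inline unsigned long long packHandle(FakeQueue queue) {
    return static_cast<unsigned long long>(reinterpret_cast<std::uintptr_t>(queue));
}

// Buggy: pack the address of the variable that holds the handle.
inline unsigned long long packAddressOfHandle(FakeQueue& queue) {
    return static_cast<unsigned long long>(reinterpret_cast<std::uintptr_t>(&queue));
}

// Shows that only the first form round-trips to the original handle.
inline void demoQueuePacking() {
    FakeQueue queue = reinterpret_cast<FakeQueue>(static_cast<std::uintptr_t>(0x1000));
    assert(unpackQueue(packHandle(queue)) == queue);
    // The receiver ends up with a pointer to our local variable, not a queue.
    assert(unpackQueue(packAddressOfHandle(queue)) != queue);
}
```

Calling demoQueuePacking() from main() runs both checks. In the buggy form the receiver treats the memory around a local variable as if it were a queue object, which is consistent with a crash deep inside the driver's vkQueueSubmit().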

Can I poll my USB HID device without first sending a command

I was able to make a working HID USB stack on my "StartUSB for PIC" board for the 18F2550 microcontroller. I based it on one of the MLA libraries, which was made for the 18F45K50 (MLA 2018_11_26, hid_custom, picdem_fs_usb_k50.x), and converted it to work with the 18F2550 (there might have been easier ways, but I only learned to work with PIC about 1 month ago). On the host side, I'm using LibUsbDotNet on a Windows 10 machine (here too, there might be easier ways; the documentation on this library really sucks).
I'm using the HID class, full speed, and all seems to work. Although, I get some random errors on the host PC (see below), but doing one close/re-open cycle on the host side when getting the error is kind of solving it. Dirty, but it works. So I kind of ignore this now.
Win32Error:Win32Error:GetOverlappedResult Ep 0x01
995:The I/O operation has been aborted because of either a thread exit or an application request.
I'm not an expert on USB (yet). But all the examples I'm seeing follow the same pattern: 1) you first send a command to the device and 2) you then retrieve the answer from the device. I did some performance tests and measured about 500 cycles/second. I think that is correct, because each cycle consists of sending a command and retrieving an answer, and each of those takes 1 msec.
But do I really need to send a command? Can't I just keep reading endlessly, so that when the device has something to say it sends the data in an IN transaction, and when it has nothing to say it doesn't answer, which shows up as a timeout on the host side? Would that mean I can poll at 1000 cycles/second? Unfortunately, I have tried this by changing my implementation on the PIC, but I get very weird results. I think I have issues with suspend mode. That brings me to another question: how can I make the device get out of suspend mode (meaning that the device, not the host, triggers this event)? I have searched the MLA library for commands such as "wakeup", "resume", ... but couldn't find anything.
So, to summarize, 2 questions:
Conceptual: Can I send data from device to host without being requested for it by a command from the host?
For PIC experts: How can I have the device trigger a wakeup from suspend mode?
And indeed, the answer is Yes on the first question.
In the meantime, I found another link on the web that contains a Visual Studio C# implementation of a USB library including all the source files.
If you're interested, this is the link
This C# host implementation works like a charm. Without sending a command to the device, I get notified immediately if a button is pressed. Great!
It also proves that my earlier device implementation, based on the original Microchip MLA, is 100% correct. I stress tested the implementation by sending a "toggle LED" command as fast as I could, and I reach 1000 commands/second. Again, great!
I think that LibUsbDotNet isn't that perfect after all. As I wrote above, I get rather unstable communication with it (Win32Error). But with this implementation, I don't get a single error, even after running for half an hour at 1000 commands/second.
So for me, case closed.
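For readers wondering what "reading without sending a command" looks like conceptually, here is a minimal simulation sketch. On USB the host still issues the IN tokens at the endpoint's polling interval; the device answers with data when it has a report queued and with a NAK otherwise. InEndpoint and hostRead are hypothetical names for this illustration only; they are not part of the MLA, LibUsbDotNet, or the C# library mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical model of a device-side interrupt IN endpoint.
class InEndpoint {
public:
    // Firmware queues a report whenever it has something to say
    // (for example, a button press). No host command is involved.
    void queueReport(std::vector<std::uint8_t> report) {
        pending_.push(std::move(report));
    }

    // Called once per host IN token. Returns true and fills `out` if a
    // report is pending; returns false to model a NAK.
    bool onInToken(std::vector<std::uint8_t>& out) {
        if (pending_.empty()) return false;
        out = std::move(pending_.front());
        pending_.pop();
        return true;
    }

private:
    std::queue<std::vector<std::uint8_t>> pending_;
};

// Host side: issue up to maxPolls IN tokens. Returning false models a read
// timeout, which is expected and harmless when the device has nothing to say.
bool hostRead(InEndpoint& endpoint, int maxPolls, std::vector<std::uint8_t>& out) {
    assert(maxPolls > 0);
    for (int i = 0; i < maxPolls; ++i) {
        if (endpoint.onInToken(out)) return true;
    }
    return false;
}
```

In this model the host simply keeps calling hostRead() in a loop and treats a timeout as "no news" rather than as an error.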

Issues communicating with devices over usb hub

I'm facing issues when communicating with devices over a USB hub. Enumerating devices attached directly to the host port works, but some devices behind a USB hub have issues.
Setup: STM32F103C8 - MAX3421E - LUFA (usb stack) (ported to MAX3421E (host) and STM32F103C8T6 (device)) - USB Full-Speed setup
Scenario:
When I attach a device directly to the host, I can enumerate almost all devices without issues (some devices seem to be faulty and show weird/nonstandard behavior). But when I try to enumerate over a USB hub, devices start to behave very strangely. I receive many more NAKs from devices than when they are connected directly to the host. Some devices are able to return the Device Descriptor, but retrieving the Configuration Descriptor fails. Some devices return a Toggle Error after several NAKs; so far this could be remedied by delaying the retry of the IN token.
Devices also behave differently when connected over different hubs. For example, one device has no problems when connected to HUB1 but has issues when connected to HUB2. Then I have HUB3 (7 ports), which internally acts as a hub within a hub. On HUB3, the device works fine on a port behind the secondary internal hub, but not on the primary ports exposed by the "root" hub.
I suspect that the hub's TT (Transaction Translator) could somehow be interfering with USB communication, but according to the information I have found, the TT should not be active in a Full-Speed setup.
I have checked (many times) that I'm using the correct device address assigned during the SetAddress phase (which is confirmed by the Device Descriptor being returned). When I step through in the debugger, it seems that I can get the Configuration Descriptor as well, but in a normal system run it isn't retrieved, and only when going through a hub.
Does anyone have any ideas about what to look for? I've run out of ideas after a week of trying to find the root cause.
Thanks
So...
- As usual, after days of searching for the root cause, the solution comes naturally right after asking somewhere (this always happens to me, but I always do try before asking).
- When using hubs, make sure you don't suspend SOF generation during control transfers. LUFA only resumes the bus inside its control transfer routines, so make sure your port doesn't stop and re-enable SOF generation within them (my fault, as I'm using a version ported to the MAX3421E).
- If you have a tight main loop, make sure you don't reinitialize a USB transfer before the previous attempt has completed. If you do, check that you don't overwrite data which hasn't been fully processed yet (especially when using interrupt-driven transfer-complete processing). [Things seem to work when you have quite a lot of debug output, as it delays those time-critical transfers.]
Enumeration issues over the hub are now next to none. The remaining small glitches are a matter of tweaking.
Unfortunately, as I suspected electrical issues, I unsoldered the USB host shield and soldered on another one, which in light of the new information seems to have been unnecessary. Never mind, I have trained my soldering skills.
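The "don't restart a transfer before the previous one has completed and been processed" advice can be sketched as a small guard around the transfer buffer. This is only an illustration under assumptions: TransferSlot, tryStartTransfer, onTransferComplete and consume are hypothetical names, not LUFA or MAX3421E API, and a real port would call onTransferComplete from its transfer-complete interrupt.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical bookkeeping for a single transfer (not LUFA API).
struct TransferSlot {
    std::atomic<bool> inFlight{false};   // set by main loop, cleared on completion
    std::atomic<bool> dataReady{false};  // set on completion, cleared by consumer
    std::array<std::uint8_t, 64> buffer{};
    std::size_t length = 0;
};

// Main loop: start a transfer only if none is in flight and the previous
// result has already been consumed. Returns false if the caller must wait.
bool tryStartTransfer(TransferSlot& slot) {
    if (slot.inFlight || slot.dataReady) return false;
    slot.inFlight = true;
    return true;
}

// Completion handler: store the received bytes and mark them as unprocessed.
void onTransferComplete(TransferSlot& slot, const std::uint8_t* data, std::size_t len) {
    assert(slot.inFlight);  // a completion without a started transfer is a bug
    if (len > slot.buffer.size()) len = slot.buffer.size();
    std::memcpy(slot.buffer.data(), data, len);
    slot.length = len;
    slot.dataReady = true;
    slot.inFlight = false;
}

// Consumer: copy the data out and release the slot for the next transfer.
// Returns the number of bytes copied, or 0 if nothing is ready.
std::size_t consume(TransferSlot& slot, std::uint8_t* out, std::size_t capacity) {
    if (!slot.dataReady) return 0;
    const std::size_t n = slot.length < capacity ? slot.length : capacity;
    std::memcpy(out, slot.buffer.data(), n);
    slot.dataReady = false;
    return n;
}
```

With a guard like this, a tight main loop that calls tryStartTransfer() on every iteration simply gets false back until the previous data has been consumed, instead of clobbering the buffer.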

Core Bluetooth immediate disconnect Peripheral after connected

In Core Bluetooth, after connecting to a device, it gets automatically disconnected after 5 to 10 seconds. It gives an error like this:
Error Domain=CBErrorDomain Code=7 "The specified device has disconnected from us."
It started showing up suddenly. What could be the reason, and how can I resolve it?
Sounds like it could be one of two issues: either a release problem or a firmware problem. For the first, ensure that you are retaining the peripheral after connection. Do this by assigning it to a strong CBPeripheral property or by adding it to a strongly held array. The second problem would involve your firmware expecting a certain command to be read/written after connection, which you are not sending. Assuming this firmware was written by someone else, developers sometimes add extra security checks like this to prevent others from using their peripherals for other purposes. If it is your own firmware, I suggest consulting your chip manufacturer's starter kit.

Debugging an intermittently unresponsive USB device

My app communicates with a simple USB device as follows:
The app sends commands (2 or 3 bytes each) to the USB device by using WriteFile (kernel32.dll).
For each command that is sent, the USB device sends a short response, which the PC receives using ReadFile (kernel32.dll).
Reading and writing is done asynchronously, using GetOverlappedResult to check the status of an operation.
Testing on 2 out of 3 PCs, the app and device function perfectly: all responses are received 100% reliably.
Under identical tests on the third PC, approximately 50% of the ReadFile requests do not return any data - the status remains as pending (ERROR_IO_INCOMPLETE) forever.
In other words, approximately for every 2 commands sent, one response is received.
Because the device functions perfectly with the other PCs, I believe the problem might be occurring inside Windows, in the underlying code which is called by ReadFile (I presume some lower-level USB driver code).
Question:
Please could you advise what debugging tool is most useful to investigate this? With my current knowledge, the internal workings of ReadFile are quite opaque.
The PC which is experiencing the issue is running Windows 8.0
You could try DebugView. Run as admin.
Go to "Capture", enable "Capture Kernel", enable "Enable Verbose Kernel Output".
This might help to investigate errors at the kernel level, if any occurred.

"Could not stop Cortex-M device" error when attempting to debug STM32F205ZG

I am having trouble running the debugger on a STM32F205ZG using µVision4 and the ULINK2. I keep getting the error message "Could not stop Cortex-M device! Please check the JTAG cable." I am using the SW port. Any help with this would be greatly appreciated.
In my own experience I have usually seen this error when either the ULINK2 is disconnected and reconnected while in the middle of a debug session or if you have some external hardware, outside of the control of the debugger, that is acting on your processor.
If the ULINK2 was disconnected mid-debug, then usually cycling power to your device will fix the problem.
If you have something like a watchdog timer that is trying to reset the processor while you are in the middle of debugging, then you will have to disable the watchdog before you can start a debug session.
I've seen the same problem with my NXP uC.
The problem was that the code loaded in flash was faulty and was placing the CPU into a busy loop branching back to the same address; this prevented the debugger from accessing the bus.
The ULINK worked if I put the device into ISP mode, as it then never got to the user code.
It seems that the ULINK takes too long to halt the device after reset (the spec tells you this somewhere), so by the time the ULINK tries to halt the CPU it is too late: it cannot access the bus and locks up.
I had this problem on an LPC4337. I tried all the solutions people are talking about, but the only one that worked for me was using a lower processor clock, so that the JTAG/SWD interface can catch up with the processor before it has gone too far into executing user code. In my case I set the JTAG/SWD clock in Keil uVision 5 to 10MHz and changed the processor clock divisors to give 36MHz. With these settings, it never failed to halt the processor on reset when I began a debug session.
This happens for ULink2 but ULINK Pro and ULINK Pro-D support a JTAG/SWD <= 50MHz. See this link for more comparisons:
ulink comparisons
Just another cause of this message:
We had the same error message, but the problem was the wrong state of the RESET line.