Repeated NAK seem to overwrite payload - usb

I'm new to driver programming in general and also to USB. However, I managed to write a driver for Windows CE (6.0) and I also had access to an USB-Sniffer to read all traffic between the host and the device.
The problem now occurs on some boards (2 out of the 3 I have):
When the device has no data to send and I issue an Interrupt-In-Transfer the device sends an ACK.
So far this is expected. However, something (I guess either the USB-Controller or WinCE) seems to automatically issue more IN-Transfers (3 on one board, 4 on another) and I get subsequent ACK. This isn't a problem so far either.
But the next IN-Transfer will also result in an ACK, no matter if there is data to send or not, I receive zero bytes in the driver.
Yet, when I look at the USB-Sniffer the proper telegram was send, however 2 more IN-Transfers are automatically issued and are responded with an ACK. So it seems like the data is overwritten by the ACK.
I tried everything that came to my mind so far: Reset the pipe, close and reopen the connection, but nothing seems to work out properly. Resetting the Pipe solves the problem in about half of the cases though. I really ran out of ideas for solving the problem.
Is there a way to tell the USB-Controller (or WinCE or whatever causes this behaviour) to always only issue one single transfer?
EDIT
Turns out it was a threading issue. Unfortunately I wasn't the one who fixed it and I have no access to the working solution, thus I cannot give further details.

Related

Error Domain=NSPOSIXErrorDomain Code=28 “No space left on device” UserInfo={_kCFStreamErrorCodeKey=28, _kCFStreamErrorDomainKey=1}

Lately numerous network requests with Alamofire made from our iOS device fail with the following error:
Error Domain=NSPOSIXErrorDomain Code=28 "No space left on device"
UserInfo={_NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask
.<3>,
_kCFStreamErrorDomainKey=1, _NSURLErrorRelatedURLSessionTaskErrorKey=( "LocalDataTask .<3>" ),
_kCFStreamErrorCodeKey=28}
Our app has a mechanism to send a network request if the user has moved +- 10 meters. This is checked every 5 seconds, so in theory every five seconds a call can be made. The network request fails occasionally with this message, returning no status code and the above error.
The message implies the error has to do with available disk/memory space on the device. However, after checking both there is no link to be found since there is plenty of space available. Also, the error occurs on multiple devices, all running iOS 14.4 or higher.
Is there information available regarding error code 28 and what could be the culprit on iOS devices? Even better; how can this error be prevented?
To answer the occurrence of the error itself:
NSPOSIXErrorDomain Code=28 "No space left on device"
With logs in the Xcode terminal:
2021-05-07 15:56:50.873428+0200 MYAPP[21757:7406020] [] nw_path_evaluator_create_flow_inner NECP_CLIENT_ACTION_ADD_FLOW 05CD829A-810D-412F-B86E-7524369359E8 [28: No space left on device]
2021-05-07 15:56:50.877243+0200 MYAPP[21757:7400322] Task <5504BCDF-7DFE-4045-BD4B-E75054636D5B>.<1> finished with error [28] Error Domain=NSPOSIXErrorDomain Code=28 "No space left on device" UserInfo={_NSURLErrorFailingURLSessionTaskErrorKey=LocalUploadTask <5504BCDF-7DFE-4045-BD4B-E75054636D5B>.<1>, _kCFStreamErrorDomainKey=1, _NSURLErrorRelatedURLSessionTaskErrorKey=(
"LocalUploadTask <5504BCDF-7DFE-4045-BD4B-E75054636D5B>.<1>"
), _kCFStreamErrorCodeKey=28}
It appears to get called when there are too many NSURLSessions created, reaching a limit of (in our tests) 600-700 sessions, which are not maintained or closed properly. The error started to get thrown since iOS 14, so it is interesting to see if there was a limit introduced.
Linked is a github issue raised stating the same issues on the ktor microservices framework by JetBrains, pointing in the same direction, mentioning the invalidation of sessions to prevent this issue:
https://github.com/ktorio/ktor/issues/1341
In our own project the origin of the problem turned out to be our implementation of the StarScream websocket library. This might not be relevant for the issues others are having, but explained anyways to create a complete picture of the problem. It is the cause and fix of our specific situation.
At first we assumed it had something to do with the URLSession created by Alamofire (networking library used) since POST requests started to get cancelled, and a kill of the app seemed the only solution to do requests again.
However, we also make use of websocket connections using the StarScream library, which attempts to connect to an socket, and if failed retry to connect every two seconds for a max time of two hours. This would mean for two hours, every two seconds, we connect to the socket -> receive a failure to connect -> disconnect the socket -> connect again. Using a singleton of the socket it was thought there was no possibility of creating multiple URLSessions, since the socket was only initiated once. However calling the connect to the socket again would create a new nw_connection object every single time, since the library did not handle the disconnect properly.
image of NWConcrete_nw_connection objects generated in socket connection
The way this was validated was using the instruments app to check for the creation of new nw_connection objects. Logged as a "memory leak" there, the creation of the nw_connection objects was getting logged and the solution was to make sure we disconnect the socket (invalidate the session) properly before connecting again.
I hope to answer a big part of the issue here, and I will mark my own question answered since this was the solution to the problem at hand. I think Apple should consider giving accurate reports on the number of objects created being limited, instead of giving an error "No space left on device".
Just wanted to chime in with more info, since we're experiencing the same issue.
Based on our analytics, this issue only started happening since iOS 14. We've verified it happening on 14.2, 14.4 and 14.5. Naturally the most straightforward cause for this error would be low memory or disk storage. We've excluded this option with additional logging, as you seem to have done as well.
A possibly related SO post has attributed the issue to a network inspecting framework that was enabled in their release build. It's worth checking if you use a similar tool.
Another report of this issue, this time on the Github of AFNetworking (predecessor to the Alamofire library you use), says they were able to fix it by limiting the creation of URLSession objects.
For us personally, neither of these did the trick. We created a support ticket with Apple, but this hasn't lead to a solution. They requested a small sample project that reproduces the issue, but the error only manifested after 7 days of continuous use in our app. If you have a faster way to reproduce this, it may be worth it to submit your own support ticket.
Hopefully this helps you find a solution, if you do please add this to your post to help others!

Can I poll my USB HID device without first sending a command

I was able to make a working HID USB stack on my "StartUSB for PIC" board for the 18F2550 microcontroller. I based it on one of the MLA libraries, which was made for the 18F45K50 (MLA 2018_11_26, hid_custom, picdem_fs_usb_k50.x), but I converted it to work with the 18F2550 (there might have been easier ways, but only learned to work with PIC about 1 month ago). On the host side, I'm using LibUsbDotNet (also here, there might be easier ways - the documentation on this library really sucks) on a Windows 10 machine.
I'm using the HID class, full speed, and all seems to work. Although, I get some random errors on the host PC (see below), but doing one close/re-open cycle on the host side when getting the error is kind of solving it. Dirty, but it works. So I kind of ignore this now.
Win32Error:Win32Error:GetOverlappedResult Ep 0x01
995:The I/O operation has been aborted because of either a thread exit or an application request.
I'm not an expert on USB (yet). But all examples I'm seeing are based on 1) you send first a command to the device and 2) then you retrieve the answer from the device. I did some performance tests, and see that this indeed shows that I can do about 500 cycles/second. I think that is correct, because each cycle, sending command and retrieving answer, each takes 1 msec.
But do I really need to send a command? Can't I just keep reading endlessly, and when the device has somthing to say, it does send the data in an IN transaction, and when not it ignores which creates a timeout on the host side. That would mean that I can poll at 1000 cycles/second? Unfortunately, I have tried it by changing my implementation on the PIC, but I get very weird results. I think I have issues with suspend mode. That brings me to another question - how can I make the device get out of suspend mode (means that not the host, but the device should be triggering this event). I have searched the MLA library for command such as "wakeup", "resume", ... but couldn't find anything.
So, to summarize, 2 questions:
Conceptual: Can I send data from device to host without being requested for it by a command from the host?
For PIC experts: How can I have a device trigger for a wakeup from suspend mode?
And indeed, the answer is Yes on the first question.
In the meantime, I found another link on the web that contains a Visual Studio C# implementation of a USB library including all the source files.
If you're interested, this is the link
This C# host implementation works as a charm. Without sending a command to the device, I get notified immediately if a button is pressed. Great!
It also proofs that my earlier device implementation based on the original MicroChip MLA, is 100% correct. I stress tested the implementation by sending a "toggle LED command" as fast as I could, and I reach 1000 commands/second. Again great!
I think that LibUsbDotNet isn't that perfect after all. As I wrote above, I get rather unstable communication (Win32Error). But with this implementation, I don't get a single error, even after running for half an hour # 1000 commands/second.
So for me, case closed.

Continuous device and connection issues with routed Tokbox session

We’ve been using the Tokbox platform for several months now with a Javascript web-client as well as an Android phone client, where sessions and connections are managed by a Python server. While integration and bring-up went well on both ends (client and server), we continue to encounter problems with the in-session audio and video experience.
Sessions are always routed and always between two participants only, with much use of a collaborative editor.
The in-session experience is like a coin toss: we never know how it’s going to go, and that’s becoming a business threat.
Web-Client: A/V Resources
The most common problem is the acquisition of audio and/or video: at the beginning of a session, one or the other participants may have problems hearing or seeing the other. Allocating a new connection to establish new streams does not fix that, nor does restarting the browser.
Question: What’s the recommended way to detect possible resource locks (e.g. does another application hog the camera/microphone)?
Web-Client: Network
Bandwidth and packet loss are a challenge, for example this inspector graph:
Audio and video of both participants is all over the place, and while we can not control the network connections the web-client should be able to reliably give useful information.
Question: Other than continuous connection monitoring with getStats() and maybe the experimental navigator.connection property, how can the web-client monitor network connectivity?
Pre-Call Test
We recommend to customers to run a pre-call test and have implemented it on our site as well. However, results of that test often times do not reflect the in-session connectivity. Worse, a pre-call test may detect a low (no video) bandwidth while Skype works just fine.
Question: How can that be?
I'm a member of the TokBox development team. I remember you reported an issue with the Python SDK, thanks for that!
Web-Client: A/V Resources
Most acquisition issues are detected by the JS SDK and if they aren't then we'd really like to hear about it! Please report reproduction steps or affected session IDs to TokBox support (referencing this StackOverflow question): https://support.tokbox.com/hc/en-us/requests/new
Most acquisition errors appear as OT_HARDWARE_UNAVAILABLE or OT_MEDIA_ERR_ABORTED errors. Are you detecting and surfacing these errors to your users? There is also the special OT_CHROME_MICROPHONE_ACQUISITION_ERROR error which is due to a known issue with Chrome that has been mostly fixed since Chrome 63 (see https://bugs.chromium.org/p/webrtc/issues/detail?id=4799).
Web-Client: Network
Unfortunately this is one of the more difficult issues to troubleshoot. Yes, Subscriber#getStats() is the best tool we have at our disposal and is a wrapper around the native RTCPeerConnection#getStats() function. Unfortunately we don't have much control over the values returned by the native function and if you think our SDK is returning incorrect values when compared with values from RTCPeerConnection#getStats() then please let us know!
It would be worthwhile confirming whether the issue is reproducible in all browsers or only a particular one. If you have detailed data regarding the inaccuracy of the native RTCPeerConnection#getStats() function then we could work together to report it to the browser vendor(s).
Fortunately we have just released the new Publisher#getStats() function which lets you get the publisher side of the stats. This should help you narrow down the cause of a connectivity issue to either a publisher or subscriber side. Please let us know if this helps with tracking down these issues.
Pre-Call Test
Again, these tests are based on Subscriber#getStats() which in turn are based on RTCPeerConnection#getStats(), the accuracy of which is out of our hands, but we'd love any reproduction steps to either fix a bug in our client SDK or report a bug to the browser vendors.
Just to confirm though, when you say you've implemented a pre-call test in your site, did you use the official JavaScript network test module? https://github.com/opentok/opentok-network-test-js This is actually what's used by the TokBox pre-call test.
#Aiham, thanks for responding, I've been looking at the the new Publisher#getStats() you linked to (thank you!), so we too can give our users some way of visibly seeing the network conditions that might be affected the quality of their call (and who's causing it). However, it seems as though bytes / packets sent goes up sharply as the number of subscribers increases, even though we're in a routed session.
Am I wrong to expect the Publisher#getStats() statistics to stay fairly stable regardless of the number of subscribers then receiving that stream in a routed session? I expected the nature of a routed call to mean it's sent once to the OpenTok Media Servers, and the statistics would end there.

USB CDC device stalling

I'm writing a simple virtual serial port device to report an older serial port. By this point I'm able to enumerate the device and send/receive characters.
After a varying number of bulk-out transmissions from the host to the device the endpoint appears to give up and stop transferring data. On the PC side I receive a write error, and judging from a USBlyzer trace the music stops on a stall (USBD_STATUS_STALL_PID). However my code never intentionally issues a STALL condition on that endpoint and the status flag for having generated one never gets set though.
Given the short amount of time elapsed (<300 µs) between issuing the request and the STALL it would appear to be an invalid response of some sort, and not a time-out. On the device side the output endpoint is ready to go, with data in the buffer and proper DATA0/1 synchronization, but nothing further ever happens.
Note that the device appears to work fine even for long periods of time until I start sending "large" quantities of data. As near as I can tell the device enumeration/configuration also appears to complete successfully. Oh, and the bulk-in endpoint continues to work just fine after this.
For the record I'm using the standard Windows usbser.sys driver and an XMega128A4U µP. I'm also seeing the same behaviour across multiple Windows Vista and 7 machines.
Any ideas what I'm doing wrong or what further tests I might run to narrow things down?
USBlyzer log,
USB CDC stack,
test project
For the record this eventually turned out to be an oscillator problem. (Apparently the FLL's reference is always 1,024 Hz even when the 1,000 Hz USB frames are chosen. The slight clock error meant that a packet occasionally got rejected if it happened to contain one too many 1-bits in a row.)
I guess the moral of the story is to check the basics before assuming you've got a problem with the higher-level protocol. Also in retrospect a hardware USB analyzer would have been a worthwhile investment, the software alternatives mostly seems to spit out a generic error code or nothing at all when something goes awry.
Stalling the out-endpoint may happen on an overflow of the output buffer on the host side. Are you sure that the device does fetch the data it receives via out-endpoint - and if so does it fetch the data at least as fast as data is sent to the device?
Note that the device appears to work fine even for long periods of
time until I start sending "large" quantities of data.
This seems to be a hint for an overflow of the output-buffer.

Is the GameKit's communication reliable with GKMatchSendDataReliable?

I'm working with GameKit.framework and I'm trying to create a reliable communication between two iPhones.
I'm sending packages with the GKMatchSendDataReliable mode.
The documentation says:
GKMatchSendDataReliable
The data is sent continuously until it is successfully received by the intended recipients or the connection times out.
Reliable transmissions are delivered in the order they were sent. Use this when you need to guarantee delivery.
Available in iOS 4.1 and later. Declared in GKMatch.h.
I have experienced some problems on a bad WiFi connection. The GameKit does not declare the connection lost, but some packages never arrive.
Can I count on a 100% reliable communication when using GKMatchSendDataReliable or is Apple just using fancy names for something they didn't implement?
My users also complain that some data may be accidentally lost during the game. I wrote a test app and figured out that GKMatchSendDataReliable is not really reliable. On weak internet connection (e.g. EDGE) some packets are regularly lost without any error from the Game Center API.
So the only option is to add an extra transport layer for truly reliable delivery.
I wrote a simple lib for this purpose: RoUTP. It saves all sent messages until acknowledgement for each received, resends lost and buffers received messages in case of broken sequence.
In my tests combination "RoUTP + GKMatchSendDataUnreliable" works even beter than "RoUTP + GKMatchSendDataReliable" (and of course better than pure GKMatchSendDataReliable which is not really reliable).
It nearly 100% reliable but maybe not what you need sometimes… For example you dropped out of network all the stuff that you send via GKMatchSendDataReliable will be sent in the order you've send them.
This is brilliant for turn-based games for example, but if fast reaction is necessary a dropout of the network would not just forget the missed packages he would get all the now late packages till he gets to realtime again.
The case GKMatchSendDataReliable doesn't send the data is a connection time out.
I think this would be also the case when you close the app