USB CDC device stalling - embedded

I'm writing a simple virtual serial port device to replace an older serial port. By this point I'm able to enumerate the device and send/receive characters.
After a varying number of bulk-out transmissions from the host to the device, the endpoint appears to give up and stop transferring data. On the PC side I receive a write error, and judging from a USBlyzer trace the music stops on a stall (USBD_STATUS_STALL_PID). However, my code never intentionally issues a STALL condition on that endpoint, and the status flag for having generated one never gets set.
Given the short amount of time elapsed (<300 µs) between issuing the request and the STALL it would appear to be an invalid response of some sort, and not a time-out. On the device side the output endpoint is ready to go, with data in the buffer and proper DATA0/1 synchronization, but nothing further ever happens.
Note that the device appears to work fine even for long periods of time until I start sending "large" quantities of data. As near as I can tell the device enumeration/configuration also appears to complete successfully. Oh, and the bulk-in endpoint continues to work just fine after this.
For the record I'm using the standard Windows usbser.sys driver and an XMega128A4U µP. I'm also seeing the same behaviour across multiple Windows Vista and 7 machines.
Any ideas what I'm doing wrong or what further tests I might run to narrow things down?
USBlyzer log,
USB CDC stack,
test project

For the record this eventually turned out to be an oscillator problem. (Apparently the FLL's reference is always 1,024 Hz even when the 1,000 Hz USB frames are chosen. The slight clock error meant that a packet occasionally got rejected if it happened to contain one too many 1-bits in a row.)
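For anyone hitting the same issue, here is a minimal sketch of the corrected auto-calibration setup. The register and bit names are taken from avr-libc's XMEGA headers, and the compare value follows from the note above (computed against a 1.024 kHz reference even though the USB start-of-frame source ticks at 1 kHz); treat it as an illustration rather than drop-in code.

#include <avr/io.h>

static void rc32m_autocalibrate_from_sof(void)
{
    /* Target 48 MHz for USB, but divide by 1024 (not 1000) per the note above. */
    uint16_t comp = 48000000UL / 1024;

    /* Select USB SOF as the DFLL reference (A4U-specific group configuration). */
    OSC.DFLLCTRL = (OSC.DFLLCTRL & ~OSC_RC32MCREF_gm) | OSC_RC32MCREF_USBSOF_gc;

    DFLLRC32M.COMP1 = comp & 0xFF;   /* low byte of the compare value  */
    DFLLRC32M.COMP2 = comp >> 8;     /* high byte of the compare value */
    DFLLRC32M.CTRL  = DFLL_ENABLE_bm;
}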
I guess the moral of the story is to check the basics before assuming you've got a problem with the higher-level protocol. In retrospect a hardware USB analyzer would also have been a worthwhile investment; the software alternatives mostly seem to spit out a generic error code or nothing at all when something goes awry.

Stalling the OUT endpoint may happen on an overflow of the output buffer on the host side. Are you sure that the device actually fetches the data it receives via the OUT endpoint, and if so, does it fetch the data at least as fast as it is sent to the device?
"Note that the device appears to work fine even for long periods of time until I start sending "large" quantities of data."
This seems to be a hint of an output-buffer overflow.

Related

Is there a protocol or well-defined procedure for instruments to send their measurement results to control PCs over GPIB?

With a control PC, I am addressing an R&S ESPI receiver to perform a frequency scan and return the measurement results, using BAT-EMC control software and an NI GPIB-USB controller in between. My goal is to trace the binary measurement data (Definite Length Block Data according to IEEE 488.2) sent to the control PC, to understand how the device decides on the byte size of each binary block it sends.
The trace shows that binary blocks are sent with no consistent pattern or rule!
For example, running the same scan with the same frequency range and step twice may result in a different distribution of the measurement bytes across binary blocks (and possibly a different total number of blocks sent), although the amount of data delivered is the same.
Can anyone help me figure out how the device and the control software are communicating the measurement data?
PS: The NI trace at the GPIB-controller level does not show the control software specifying a byte size when it queries for the next block, nor does the instrument send this information when it issues a service request asking to be queried for more available data (according to the trace).
Make sure that you are giving the instrument enough time to respond. Possibly you are sending commands from the PC which assert the ATN line and interrupt the response. You should be able to configure the instrument to send one result: set it up as a listener and talker and configure it to send only one response per trigger. Then send the Group Execute Trigger (GET) and read the results off the bus. When it's done, measure how long it took for that packet to get sent. If you are sending triggers before the full response arrives, you will be terminating the output stream; I suspect this because the streams are randomly different.
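If it helps, a minimal sketch of that sequence with the NI-488.2 C API might look like the following. The board index, GPIB address, and SCPI strings are placeholders (check the ESPI manual for the actual commands); the point is only the configure / trigger / read order.

#include <stdio.h>
#include <string.h>
#include "ni488.h"

int main(void)
{
    char buf[65536];
    /* board 0, primary address 20 (placeholder), 10 s timeout, assert EOI */
    int dev = ibdev(0, 20, NO_SAD, T10s, 1, 0);
    if (dev < 0 || (ibsta & ERR)) {
        fprintf(stderr, "ibdev failed\n");
        return 1;
    }

    const char *arm = "INIT:CONT OFF\n";     /* placeholder: one sweep per trigger */
    ibwrt(dev, (void *)arm, (long)strlen(arm));

    ibtrg(dev);                              /* send Group Execute Trigger (GET) */

    const char *query = "TRAC? TRACE1\n";    /* placeholder: ask for the result  */
    ibwrt(dev, (void *)query, (long)strlen(query));
    ibrd(dev, buf, (long)sizeof buf);        /* read one response off the bus    */
    if (!(ibsta & ERR))
        printf("got %ld bytes\n", ibcntl);

    ibonl(dev, 0);                           /* take the device handle offline   */
    return 0;
}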
I’m just starting to learn GPIB so please write back what happened.

Can I poll my USB HID device without first sending a command

I was able to make a working HID USB stack on my "StartUSB for PIC" board for the 18F2550 microcontroller. I based it on one of the MLA libraries, which was made for the 18F45K50 (MLA 2018_11_26, hid_custom, picdem_fs_usb_k50.x), but I converted it to work with the 18F2550 (there might have been easier ways, but I only learned to work with PIC about a month ago). On the host side, I'm using LibUsbDotNet (here too there might be easier ways; the documentation on this library really sucks) on a Windows 10 machine.
I'm using the HID class at full speed, and all seems to work. I do get some random errors on the host PC (see below), but doing one close/re-open cycle on the host side when the error occurs more or less solves it. Dirty, but it works, so I'm ignoring this for now.
Win32Error:Win32Error:GetOverlappedResult Ep 0x01
995:The I/O operation has been aborted because of either a thread exit or an application request.
I'm not an expert on USB (yet). But all the examples I'm seeing are based on 1) first sending a command to the device and 2) then retrieving the answer from the device. I did some performance tests, which indeed show that I can do about 500 cycles/second. I think that is correct, because each cycle consists of sending the command and retrieving the answer, each taking 1 ms.
But do I really need to send a command? Can't I just keep reading endlessly, so that when the device has something to say it sends the data in an IN transaction, and when it doesn't it ignores the request, which creates a timeout on the host side? That would mean that I can poll at 1000 cycles/second. Unfortunately, I have tried it by changing my implementation on the PIC, but I get very weird results. I think I have issues with suspend mode. That brings me to another question: how can I make the device get out of suspend mode (meaning that the device, not the host, should trigger this event)? I have searched the MLA library for commands such as "wakeup", "resume", ... but couldn't find anything.
So, to summarize, 2 questions:
Conceptual: Can I send data from device to host without being requested for it by a command from the host?
For PIC experts: How can I have the device trigger a wakeup from suspend mode?
And indeed, the answer to the first question is yes.
In the meantime, I found another link on the web that contains a Visual Studio C# implementation of a USB library including all the source files.
If you're interested, this is the link
This C# host implementation works like a charm. Without sending a command to the device, I get notified immediately if a button is pressed. Great!
It also proves that my earlier device implementation, based on the original Microchip MLA, is 100% correct. I stress tested the implementation by sending a "toggle LED" command as fast as I could, and I reach 1000 commands/second. Again great!
I think that LibUsbDotNet isn't that perfect after all. As I wrote above, I get rather unstable communication (Win32Error). But with this implementation, I don't get a single error, even after running for half an hour at 1000 commands/second.
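For completeness, the host side of this "just keep reading" pattern is nothing more than a continuous interrupt-IN read with a timeout. Here is a rough libusb-1.0 sketch of the idea (the VID/PID and the 0x81 endpoint address are placeholders, and on Windows a HID interface is normally owned by the HID class driver, so take this purely as an illustration of the polling loop):

#include <stdio.h>
#include <libusb-1.0/libusb.h>

int main(void)
{
    libusb_context *ctx = NULL;
    libusb_init(&ctx);

    /* placeholder VID/PID */
    libusb_device_handle *h = libusb_open_device_with_vid_pid(ctx, 0x04D8, 0x003F);
    if (!h)
        return 1;
    libusb_claim_interface(h, 0);

    unsigned char buf[64];
    for (;;) {
        int got = 0;
        /* 100 ms timeout; a timeout just means the device had nothing to say */
        int rc = libusb_interrupt_transfer(h, 0x81, buf, sizeof buf, &got, 100);
        if (rc == 0)
            printf("IN report, %d bytes\n", got);
        else if (rc != LIBUSB_ERROR_TIMEOUT)
            break;                           /* a real error: stop polling */
    }

    libusb_release_interface(h, 0);
    libusb_close(h);
    libusb_exit(ctx);
    return 0;
}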
So for me, case closed.

Having difficulty sending small lwip packets immediately using the lwip API

I am creating a server on a ST Cortex M3 device. I am using the lwip API and FreeRTOS. All is working, but the response time is way off. I am currently using lwip 1.3.2 and FreeRTOS 7.3.
A single client connects to the server and must have some time-critical data sent frequently. These packets are on the order of 6 or so bytes. Other times, I am sending upwards of 20K.
The problem I am having is that these smaller packets seem to take forever to be sent. I assume this is because lwip is waiting for more data to be enqueued to make transmissions more efficient. I cannot wait around for 2 or 3 seconds for the data to be sent; the client is expecting the data nominally within a few microseconds or milliseconds.
I have tried using lwip_send and lwip_write. (I understand that one is the same as the other with a flag passed at the end. Just had to try...) I have tried setting TCP_NODELAY on the socket to no avail. I tried to set SO_SNDLOWAT to '1', but this always returned -1, so I do not think it is supported.
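For reference, disabling Nagle through the lwIP socket API normally looks like the sketch below (the descriptor is whatever lwip_socket()/lwip_accept() returned; netconn/raw-API users would set the TF_NODELAY flag on the PCB instead). If the call succeeds but the latency doesn't change, the delay is probably coming from somewhere other than Nagle's algorithm.

#include "lwip/sockets.h"

/* sock is an already-connected TCP socket descriptor */
static void disable_nagle(int sock)
{
    int flag = 1;
    if (lwip_setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag)) < 0) {
        /* option not compiled in or not supported by this lwIP version */
    }
}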
I do not want to redo all of my code using TCP RAW. Is there a way to invoke the tcp_output() function outside of TCP RAW mode? Is there any way to speed things up or is this just how slow lwip TCP with small packets is?
Any and all suggestions are welcome. Thanks.
--EDIT--
I would also like to add that once I am ready to transmit, I make sure that my TX task in FreeRTOS is at the highest priority. There are no other tasks running up to the point at which I call lwip_send/write.
I'm fairly experienced with bare-metal lwIP on Xilinx parts, and lwip does not wait to send things out. It will pump packets out as fast as your interrupts are acknowledged, limited only by the Ethernet hardware. I've been using UDP only. What comes to mind, though, is that your problem might be on the receive end: if you are doing TCP, maybe those small packets are coming out late because you are having receive issues.
What you need to do is find the lowest-level point in the code at which an Ethernet frame is transmitted and put a general-purpose output toggle there. Then also put a general-purpose output toggle where an Ethernet packet is received. Look at the signals on a scope. If it confirms your hypothesis, then move the output toggles around to narrow down the issue. Wash, rinse and repeat until you are down to where the issue is. It's crude and time consuming, but this brute-force approach often solves "impossible" embedded software problems through pure determination. Good luck!
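Purely as an illustration of that toggle-and-scope approach, something like the following could be dropped into the lwIP port's ethernetif.c. The STM32F1 register names and the choice of PB0/PB1 are assumptions; use whatever spare pins the board exposes, and remember to enable the GPIO clock first.

#include "stm32f10x.h"      /* assumption: ST standard peripheral header */

#define DBG_TX_PIN  0       /* PB0, placeholder: marks "frame handed to MAC" */
#define DBG_RX_PIN  1       /* PB1, placeholder: marks "frame received"      */

static inline void dbg_mark(unsigned pin, int high)
{
    /* BSRR: the low 16 bits set the pin, the high 16 bits reset it */
    GPIOB->BSRR = high ? (1U << pin) : (1U << (pin + 16));
}

/* Wrap the lowest-level transmit, e.g. in low_level_output():
 *     dbg_mark(DBG_TX_PIN, 1);  ...hand the pbuf to the MAC...  dbg_mark(DBG_TX_PIN, 0);
 * and do the same with DBG_RX_PIN in the receive interrupt handler,
 * then measure the gap between the two pins on a scope.                     */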

Repeated NAK seem to overwrite payload

I'm new to driver programming in general and also to USB. However, I managed to write a driver for Windows CE (6.0), and I also had access to a USB sniffer to read all traffic between the host and the device.
The problem now occurs on some boards (2 out of the 3 I have):
When the device has no data to send and I issue an Interrupt-In transfer, the device sends a NAK.
So far this is expected. However, something (I guess either the USB controller or WinCE) seems to automatically issue more IN transfers (3 on one board, 4 on another) and I get subsequent NAKs. This isn't a problem so far either.
But the next IN transfer will also result in a NAK, no matter whether there is data to send or not, and I receive zero bytes in the driver.
Yet, when I look at the USB sniffer, the proper telegram was sent; however, 2 more IN transfers are automatically issued and are responded to with a NAK. So it seems like the data is overwritten by the NAK.
I tried everything that came to my mind so far: Reset the pipe, close and reopen the connection, but nothing seems to work out properly. Resetting the Pipe solves the problem in about half of the cases though. I really ran out of ideas for solving the problem.
Is there a way to tell the USB-Controller (or WinCE or whatever causes this behaviour) to always only issue one single transfer?
EDIT
Turns out it was a threading issue. Unfortunately I wasn't the one who fixed it and I have no access to the working solution, thus I cannot give further details.

Compact Framework serial port and balance

So, to open up a serial port and successfully transmit data from the balance through the serial port, I need to make sure that the settings on the serialPort object match the actual settings of the balance.
Now, the question is: how do I detect that the connection hasn't been established because the settings are different? serialPort.Open throws no exception to indicate whether the connection has been established. Yes, the settings are valid, but if they don't match the device (balance) settings, I am in the dark as to why the weight from the balance is not being captured.
Any input here?
Without knowing any more about the format of the data you expect from your balance, only general techniques for detecting a serial port settings mismatch are applicable.
If the UART settings are significantly incorrect, you'll likely see a lot of framing errors: when the UART expects a stop bit (logic 1), it will in fact see a 0. You can detect this with the ErrorReceived event on the port.
private void OnErrorReceived(object sender, SerialErrorReceivedEventArgs e)
{
    if ((e.EventType & SerialError.Frame) == SerialError.Frame)
    {
        // your settings don't match, try something else
    }
}
If things are close, but still incorrect, the .NET serial port object may not even give you an error (that is, until something catastrophic occurs).
My most common serial port communication failure occurs due to mismatched baud rates. If you have a message that you know you can get an 'echo' for, try that as part of a handshaking effort. Perhaps the device you're connecting to has a 'status' message. No harm will come from requesting it, and you will find out if communication is flowing correctly.
For software handshaking (XON/XOFF), there's very little you can do to detect whether or not it's configured right. The serial port object can do anything from ignoring it completely to throwing exceptions on a worker thread, depending on the underlying serial port driver implementation. I've had serial port drivers that completely ignore XON/XOFF and pass the characters straight into the program - yikes!
For hardware handshaking, the basic echo strategy for baud rate may work, depending on how your device works. If you know that it will do hardware handshaking, you may be able to detect it and turn it on. If the device requires hardware handshaking and it's not on, you may get nothing, and vice versa.
Another setting that's more rarely used is the DTR pin (data terminal ready). Some serial devices require that this be asserted (i.e., set to true) to indicate that it's time to start sending data. It's set to false by default; give toggling it a whirl.
Note that the serial port object is ... finicky. While not necessarily required, I would consider closing the port before you make any changes.
Edit:
Thanks to your comments, it looks like this is your device. It says the default settings should be:
1200 baud
Odd parity
1 stop bit
Hardware handshaking
It doesn't specify how many data bits, but the device says it supports 7 and 8. I'd try both of those. It also says it supports 600, 1200, 2400, 4800, 9600, and 19200 baud.
If you've turned on hardware handshaking, enabled DTR (different things) and cycled through all the different baud rates, there's a good chance that it's not your settings. It could be that the serial cable being used is wired incorrectly for your device. Some serial cables are 'passthrough' cables, where pins 1-9 on one side match exactly with pins 1-9 on the other. Then you have 'crossover' (null modem) cables, where the TX and RX lines are swapped so that when one side transmits, the other side receives - a very handy cable.
Consider looking at the command table in the back of the manual there; there's a "print software version" command you could issue to get some type of echo back.
Serial ports use a very, very old communications technology based on a very, very old standard called RS-232. This is pretty much as simple as it gets: the two endpoints agree on a clock rate, and each samples the line voltage once per bit period to see if it is high or low (with high meaning 0 and low meaning 1, which is the opposite of most conventions - again an artifact of the protocol's age). The receiver resynchronizes its sampling on the start bit of each byte; the stop bits are really just guaranteed rest time in between bytes so that the next start bit can be recognized. There are also a few other things thrown into the more advanced uses of the protocol, such as parity bits, XON/XOFF, etc., but those all ride on top of this very basic communication layer.
Detecting a mismatch of the clocks on each end of the serial line is going to be nearly impossible - you'll just get incorrect data on the receiving end. The protocol itself has no built-in way to identify this situation, and I am unaware of any serial driver that is smart enough to notice the input data being clocked at an inappropriate frequency. If you're using one of the error detection schemes such as parity bits, probabilistically almost every byte will be flagged as an error. In short, the best you can do is check the incoming data for errors: parity errors should be detected by your driver/software layer, whereas errors in the data your app receives from that layer will need to be checked by your program (the latter can be assisted by the use of checksums).
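To illustrate that last point about application-level checking: if the balance's frames ended with, say, a one-byte additive checksum (an assumption for the example, not something this particular balance is documented to do), the receiving program could validate each frame like this, and a run of failures would be a strong hint that the port settings are wrong.

#include <stdint.h>
#include <stddef.h>

/* Returns 1 if the last byte equals the modulo-256 sum of the preceding bytes. */
static int frame_is_valid(const uint8_t *frame, size_t len)
{
    if (len < 2)
        return 0;
    uint8_t sum = 0;
    for (size_t i = 0; i + 1 < len; i++)
        sum += frame[i];
    return sum == frame[len - 1];
}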