Windows CE USB stack performance issue

We are developing a USB driver for an Ethernet device on WinCE 6.0.
We are seeing performance issues and, by profiling the code, have narrowed them down to the USB stack. 95% of the time in the Tx path is spent in IssueBulkTransfer, which causes the driver to queue packets internally. The TX-complete routine is not called in sync with IssueBulkTransfer.
We used a USB analyzer to check bus bandwidth usage and found it to be only 20-30% of the total, so the hardware is fast enough to move data across the interface.
Given these findings, the bottleneck appears to be in the USB bus driver or the USB host controller (HCD) driver.
Is there any known performance limitation in the WinCE 6 USB stack?
What is the maximum throughput we can get from a high-speed (USB 2.0) device using the WinCE 6.0 USB stack?

Are you using synchronous transfers? With asynchronous ones you can queue multiple packets for Tx or Rx, so the host driver does not have to wait for your driver to receive the completion notification before it issues a new Tx or Rx request. This may allow you to use more of the bandwidth. You can also allocate buffers using HalAllocateCommonBuffer, or by reserving a physical memory range for buffers; this way you may avoid copies in the driver if the hardware can use DMA.
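For illustration, here is a minimal sketch of keeping several asynchronous bulk OUT transfers in flight at once. It assumes the CE 6.0 USBD interface for IssueBulkTransfer and CloseTransfer (in a real client driver these are reached through the USBD function table); the TX_SLOT structure and QueueNextPacket helper are hypothetical bookkeeping:

    // Hypothetical per-transfer bookkeeping for the driver's Tx queue.
    typedef struct {
        USB_TRANSFER hTransfer;   // handle returned by IssueBulkTransfer
        PBYTE        pVirtAddr;   // buffer from HalAllocateCommonBuffer
        DWORD        dwPhysAddr;  // its physical address (enables DMA)
        DWORD        dwLength;
    } TX_SLOT;

    void QueueNextPacket(TX_SLOT *pSlot);   // hypothetical refill helper

    // Completion callback: runs when the HCD finishes one transfer.
    static DWORD TxCompleteNotify(LPVOID lpvNotifyParameter)
    {
        TX_SLOT *pSlot = (TX_SLOT *)lpvNotifyParameter;
        CloseTransfer(pSlot->hTransfer);
        QueueNextPacket(pSlot);   // immediately refill this slot
        return 0;
    }

    void StartTx(USB_PIPE hBulkOutPipe, TX_SLOT slots[], int nSlots)
    {
        for (int i = 0; i < nSlots; i++) {
            // USB_NO_WAIT returns immediately; completion arrives via
            // TxCompleteNotify, so the bus never idles while the driver
            // waits to notice a finished transfer.
            slots[i].hTransfer = IssueBulkTransfer(
                hBulkOutPipe, TxCompleteNotify, &slots[i],
                USB_OUT_TRANSFER | USB_NO_WAIT,
                slots[i].dwLength, slots[i].pVirtAddr,
                slots[i].dwPhysAddr);
        }
    }

With three or four slots outstanding, the HCD always has the next buffer ready, which is usually what closes the gap between IssueBulkTransfer and the TX-complete notification.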
You did not provide details about your HW architecture, so it is difficult to estimate the level of performance you can expect.

Related

Arduino Due to PC High Speed USB communication

I am working on a project that uses the Arduino Due (microcontroller ATSAM3X8E). My goal is to be able to track the values of some of the key variables I am using in my firmware in real time. The fact is that I need to plot the change in the values of the variables over time.
In order to do this, I have decided to send the data to my PC through the native USB port. My real-time constraint is that I need to send the values of 20 variables (each of them 8 bytes long) within 0.1 ms. There is a native USB port on the Arduino Due, connected to the USB peripheral of the chip. I have tried using UART over USB by setting up the Due in USB device mode, but I can only get up to 115200 baud using Serial (UART) communication (any higher speed keeps the Due or my host PC from sending the data correctly).
So, I did some homework and found that USB devices have different classifications based on what they do. I want to know if there is a high-speed protocol, with a speed of at least 2 Mbits/sec, that I can use on top of USB to send data across to my PC from the Due, and if there is an equivalent driver I can use on my Windows PC to successfully capture that high-speed data - any recommendations would be greatly helpful.
Thanks in advance!
Subramanian
The Arduino Due's native USB port is capable of high-speed USB (480 Mbps), and by default it will appear to the computer as a USB virtual COM port. Because this is a virtual serial port, you can send data as fast as the USB drivers will allow; you are not limited by the virtual "baud rate" of the COM port, which is an irrelevant setting. I think the virtual COM port will be fast enough for you, and you should try it before doing something more complicated.
To use this port, use the SerialUSB object in your Arduino program. It has the same interface as Serial. You should already have drivers for it if you installed the Arduino IDE or you are running Windows 10.
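As a minimal example (updateVars() is a placeholder for however your firmware refreshes the 20 tracked values):

    // Due sketch: stream 20 x 8-byte variables over the native USB port.
    double vars[20];                  // the values being tracked

    void updateVars() { /* refresh vars from the firmware under test */ }

    void setup() {
      SerialUSB.begin(0);             // baud rate argument is ignored
      while (!SerialUSB) { }          // wait until the host opens the port
    }

    void loop() {
      updateVars();
      // Raw binary write: 160 bytes every 0.1 ms is ~1.6 MB/s, far below
      // what the virtual COM port can carry.
      SerialUSB.write((uint8_t *)vars, sizeof(vars));
    }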
Please note that USB virtual COM ports usually use USB bulk endpoints, which don't guarantee any particular latency or throughput. If your computer is busy talking to other devices on the bus, you might get less throughput than you were hoping for, but you have a lot of margin here, so I don't think it will be a problem in practice. To be safe, just make sure you can buffer a few milliseconds of data on the device side so you aren't losing anything. You might have to look at the internals of the Arduino core code to see how big its buffers are.

setting usb communication speed

I would like to implement USB communication at a speed of 30 Mbit/s. My hardware supports "high speed USB", so the hardware platform will not limit me.
Can I implement this speed using the USB CDC class or the mass storage class, or are these USB classes speed-limited?
In the USB protocol, who determines the bit rate - is it the device?
The USB CDC and mass storage classes do not have any kind of artificial speed limiting, so you can probably get a throughput of 30 Mbps on a high-speed USB connection (which signals bits on the wire at 480 Mbps). The throughput you actually get will be determined by how much bus bandwidth other devices are using and by how efficiently your device-side firmware, host-side driver, and host-side software operate.
The bit rate is mostly determined by the device. The device basically signals to the host what USB speeds it supports, and the host picks one. The full story is a little bit more complicated, and there are a lot more details about how that works in the USB specification.
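As a back-of-the-envelope check of the 30 Mbps target: the USB 2.0 specification allows at most 13 bulk data packets of 512 bytes in each 125 µs microframe, which puts the theoretical ceiling far above what you need.

    #include <cstdio>

    int main() {
        // USB 2.0 high-speed bulk: at most 13 x 512-byte data packets
        // per 125 us microframe (8000 microframes per second).
        const double bytes_per_sec = 13 * 512 * 8000.0;
        std::printf("Max bulk throughput: %.1f MB/s (%.0f Mbps)\n",
                    bytes_per_sec / 1e6, bytes_per_sec * 8 / 1e6);
        // ~53.2 MB/s (~426 Mbps): a 30 Mbps (3.75 MB/s) target leaves
        // plenty of headroom for protocol and software overhead.
        return 0;
    }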

USB in an embedded system without RTOS

I have no experience with embedded USB stacks, so my question is: can I run one without an OS?
Of course it must be possible to run without an OS, but will things be MUCH easier if I have one?
I want to use it to save data to an attached USB mass storage device.
If your USB device is on-chip, your chip vendor will almost certainly have example code for USB that may include mass storage. You won't need an OS, but you will need interrupt handling and a file system.
Your USB controller will need host or OTG capability - if it is only device capable, then you cannot connect to another USB device, only a host.
The benefit of an OS - or at least a simple RTOS kernel - is that you can schedule file system activity concurrently with other processing tasks. The OS in that case would not necessarily make things easier, but it may make your system more responsive to critical tasks and events.
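As a rough sketch of the no-OS case: most bare-metal designs end up as a "super loop" that services the USB stack and the file system in turn. Here usb_host_task(), msd_ready(), and the data-source helpers stand in for whatever your vendor stack and application provide; the f_* calls are the real FatFs API.

    #include "ff.h"                    // FatFs file system (by ChaN)

    // Placeholders for the vendor USB host stack and the data source:
    extern void usb_host_task(void);   // hypothetical stack state machine
    extern int  msd_ready(void);       // hypothetical: drive enumerated?
    extern int  data_available(void);
    extern const void *get_data(void);
    extern unsigned get_len(void);

    int main(void)
    {
        FATFS fs;
        FIL   log_file;
        int   mounted = 0;

        for (;;) {                     // bare-metal "super loop", no OS
            usb_host_task();           // poll the stack (ISRs feed it)

            if (!mounted && msd_ready()) {
                f_mount(&fs, "", 1);   // mount the attached USB drive
                f_open(&log_file, "log.bin", FA_WRITE | FA_OPEN_APPEND);
                mounted = 1;
            }
            if (mounted && data_available()) {
                UINT written;
                f_write(&log_file, get_data(), get_len(), &written);
                f_sync(&log_file);     // flush so an unplug loses little
            }
            // Other processing tasks go here; with an RTOS they would
            // get their own threads instead of waiting behind f_write().
        }
    }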
I have used USB stacks in the past with PIC18F2550 (8-bit) and LPC1343 (32-bit ARM Cortex-M3) microcontrollers without any problems.

What is the minimum latency of USB 3.0

First up, I don't know much about USB, so apologies in advance if my question is wrong.
In USB 2.0 the polling interval was 0.125ms, so the best possible latency for the host to read some data from the device was 0.125ms. I'm hoping for reduced latency in USB 3.0 devices, but I'm finding it hard to learn what the minimum latency is. The USB 3.0 spec says, "USB 2.0 style polling has been replaced with asynchronous notifications", which implies the 0.125ms polling interval may no longer be a limit.
I found some benchmarks for a USB 3.0 SSD that suggest data can be read from the device in just slightly less than 0.125 ms, and that includes all the time spent in the host OS and the device's flash controller.
http://www.guru3d.com/articles_pages/ocz_enyo_usb_3_portable_ssd_review,8.html
Can someone tell me what the lowest possible latency is? A theoretical answer is fine. An answer including the practical limits of the various versions of Linux and Windows USB stacks would be awesome.
To head off the "tell me what you're trying to achieve" question: I'm creating a debug interface for the ASICs my company designs, i.e. a PC connects to one of our ASICs via a debug dongle. One possible use case is to implement conditional breakpoints when the ASIC hardware only implements simple breakpoints. To do so, I need to detect that a simple breakpoint has been hit, evaluate the condition, and if it is false, set the processor running again. The simple breakpoint may be hit millions of times before the condition becomes true. We might implement the debug dongle on an FPGA or on an off-the-shelf USB 3.0 enabled microcontroller.
Answering my own question...
I've come to realise that this question kind of misses the point of USB 3.0. Unlike 2.0, it is not a shared-bus system. Instead it uses a point-to-point link between the host and each device (I'm oversimplifying but the gist is true). With USB 2.0, the 125 µs polling interval was critical to how the bus was time-division multiplexed between devices. However, because 3.0 uses point-to-point links, there is no multiplexing to be done and thus the polling interval no longer exists. As a result, the latency on packet delivery is much less than with USB 2.0.
In my experiments with a Cypress FX-3 devkit, I found it easy enough to get a round trip from a Windows application to the device and back with an average latency of 30 µs. I suspect the vast majority of that time is spent in various OS delays, e.g. the user-space to kernel-space mode switch and the DPC latency within the driver.
I've got a couple of resources for you. One I've just downloaded is the complete spec - several PDFs zipped up for USB 3.0. Here is a short excerpt from pages 58-59 (USB 3_r1.0_06_06_2011.pdf):
USB 2.0 transmits SOF/uSOF at fixed 1 ms/125 µs intervals. A device driver may change the interval with small finite adjustments depending on the implementation of host and system software. USB 3.0 adds a mechanism for devices to send a Bus Interval Adjustment Message that is used by the host to adjust its 125 µs bus interval by up to +/-13.333 µs.
In addition, the host may send an Isochronous Timestamp Packet (ITP) within a relaxed timing window from a bus interval boundary.
Here is one more resource which looked interesting, dealing with calculating latency.
You make a good point about operating system latency issues, especially in not real time operating systems.
I might suggest that you check on SuperUser too, maybe someone has other ideas. CHEERS
I dispute the marked answer.
On Windows there is no way to achieve the stated round-trip latency over USB, SuperSpeed (3.0) or not. The documentation states:
The number of isochronous packets must be a multiple of the number of packets per frame.
https://learn.microsoft.com/en-us/windows-hardware/drivers/usbcon/transfer-data-to-isochronous-endpoints
The packets per frame is given by bInterval, which also determines the polling interval. E.g. if you want to achieve a transfer every microframe (125 µs), you will need to submit 8 packets per URB (USB Request Block), which means a scheduling service interval of 1 ms.
Anything else requires your own kernel-mode driver or is out-of-spec.
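To make the constraint concrete, here is the arithmetic for a hypothetical high-speed isochronous endpoint polled every microframe (bInterval = 1):

    #include <cstdio>

    int main() {
        // bInterval = 1 on a high-speed endpoint -> one packet every
        // 125 us microframe, i.e. 8 packets per 1 ms USB frame.
        const int packets_per_frame = 8;

        // Per the cited docs, the number of isochronous packets in a
        // transfer must be a multiple of packets_per_frame, so the
        // smallest legal user-mode submission spans a whole frame:
        std::printf("Minimum transfer: %d packets = %d us\n",
                    packets_per_frame, packets_per_frame * 125);
        // => submissions are serviced on 1 ms boundaries even though
        // the endpoint itself moves data every 125 us.
        return 0;
    }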
On RT Linux I can confirm round trips of 2 × 125 µs plus some overhead.
Excerpts from embedded.com: "USB 3.0 vs USB 2.0: A quick reference summary for the busy engineer"
Communication architecture differences
USB 2.0 employs a communication architecture where the data transaction must be initiated by the host. The host will frequently poll the device and ask for data, and the device may only transmit data once it has been requested by the host. The high polling frequency not only increases power consumption, it increases transmission latency because the data can only be transmitted when the device is polled by the host. USB 3.0 improves upon this communication model and reduces transmission latency by minimizing polling and also allowing devices to transmit data as soon as it is ready.
...
Timestamp enhancements
Unlike USB 2.0 cameras, which can range in accuracy from 0 to 125 us, the timestamp originating from USB 3.0 cameras is more precise, and mimics the accuracy of the 1394 cycle timer of FireWire cameras.
...
USB 3.0 -- or SuperSpeed USB -- overcomes key limitations of other specifications with six (over IEEE 1394b) to nine (over USB 2.0) times higher bandwidth, better error management, higher power supply, ... and lower latency and jitter times.
P.S. It also mentions "longer cable lengths" for USB 3.0, but another paragraph contradicts this, saying up to 5 m for USB 2.0 and up to 3 m for USB 3.0.

How to watch/change the Windows buffer size for RS232 (COM)?

I'm using USB for communication. Our device sends 100 KB/s of data (ARM7, very small memory), and the PC needs to receive and process all of it.
My previous design implemented the device as a mass storage device and extended it with a custom command for the communication protocol. The PC software runs a thread with a loop to receive the data.
The issue is: sometimes it loses data.
So we switched to another solution: USB simulating a COM port (RS232).
But I don't know whether the OS can buffer that much data before I read it using MFC (or pyserial). How can I get/set the buffer size?
We regularly push about 100 KB/s through our USB CDC implementation; the PC is fast enough to receive all the data. But the built-in limits seem to be lower with usb-serial (CDC) than with the mass storage protocol (in our case ~600 KB/s mass storage versus ~100 KB/s CDC).
The PC receive thread should have a buffer that's "big enough".
Edit: I don't know Windows' buffer sizes, or how to get them, though.
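For what it's worth, on Windows you can at least request driver queue sizes with the Win32 SetupComm call; there is no documented call to read the current sizes back, so treat the values you pass as a hint to the driver. A minimal sketch (the port name is an example):

    #include <windows.h>
    #include <cstdio>

    int main() {
        // Open the virtual COM port created by the usb-serial driver.
        HANDLE hPort = CreateFileA("\\\\.\\COM3",
                                   GENERIC_READ | GENERIC_WRITE,
                                   0, NULL, OPEN_EXISTING, 0, NULL);
        if (hPort == INVALID_HANDLE_VALUE) {
            std::printf("CreateFile failed: %lu\n", GetLastError());
            return 1;
        }
        // Recommend a 1 MB receive queue and 64 KB transmit queue to
        // the serial driver; it may round or ignore these values.
        if (!SetupComm(hPort, 1 << 20, 1 << 16))
            std::printf("SetupComm failed: %lu\n", GetLastError());
        CloseHandle(hPort);
        return 0;
    }

If I remember right, pyserial wraps the same call on Windows as Serial.set_buffer_size(); in the end, though, the receive thread reading frequently into its own "big enough" buffer matters more than the driver queue size.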