How is it possible to send HID packets longer than 64 bytes? - usb

While researching the HID specification for a current project, I stumbled across the following phrase:
Using USB terminology, a device may send or receive a transaction every USB frame (1 millisecond). A transaction may be made up of multiple packets (token, data, handshake) but is limited in size to 8 bytes for low-speed devices and 64 bytes for high-speed devices.
This leads me to believe that the maximum size for a HID packet is 64 bytes.
However, examining the report descriptors for some other devices I see that packets of more than 500 bytes are being used. How is this possible? Are these devices violating the USB spec? If so, what might that imply in terms of compatibility across different platforms?

As far as I know, the last HID specification is much older than the recent USB specs; IIRC the latest version is 1.11 which dates back to 2001.
There is low-speed, full-speed, and high-speed USB these days, and I think the HID spec was never changed to reflect this.
The maximum packet size for high-speed is 64 bytes for control transfers, 1024 bytes for interrupt transfers and isochronous transfers, and 512 bytes for bulk transfers.
See USB in a NutShell, which I think is more up-to-date. There are other sources, of course.
I am not entirely sure if all this applies to HID devices as well these days, what with the HID spec not having changed, but I assume high-speed HID devices exist now that use larger packets as described in the newer USB specs.

Related

What is the maximum speed of the STM32 USB CDC?

I'm using an STM32L151 to communicate with a PC using USB CDC. I used the STM32 HAL libraries to create my project.
I found that the USB sends data in 1 ms intervals, and each time 64 bytes are being sent. So, is the maximum speed of the USB CDC 64 kbyte/s? That is much lower than the USB Full Speed data rate of 12 Mbit/s. How can I reach this speed, or at least a fraction of this speed?
Nope. If your code is "fast enough", the maximum CDC speed is about 1MByte/sec. This may require a big (>1KB) FIFO on the device side. Oh, and the PC side must be able to read the data fast enough, e.g. with big buffers.
The 64KByte/s limit applies for USB HID which uses interrupt endpoints. USB CDC interface uses faster bulk endpoints.
USB FS frame is 1ms so if you put 64 bytes to the buffer (using the HAL function) - it will send those 64 bytes in the next frame. And it will not send any more data until another 1ms frame
How to increase this speed -> aggregate your data in larger chunks and send more data in the one transaction (up to 8kB using HAL libraries).

USB performance issues

I am trying to transfer data over USB. These are packets of at most 64 bytes which get sent at a frequency of 4KHz. This gives a bit rate of about 2Mb/s.
The software task that picks up this data runs at 2.5 KHz.
Ideally we never want packets to get there slower than 2.5 KHz (so 2 KHz isn't very good).
Is anyone aware of any generic limits on what USB can achieve?
We are running on a main board which has a 1.33 GHz running QNX and a daughter board which is a TWR K60F120M tower system running MQX.
Apart from the details of the system, is USB supposed to be used in this kind of data transfers, i.e., high frequency and short packet sizes?
Many Thanks for your help
MG
USB, even at its slowest spec (1.1), can transfer data at up to 12MB/sec, provided you use the proper transfer mode. USB will process 1000 "frames" per second. The frames contain control and data information, and various portions of each frame are used for various purposes, and thus the total information content is "multiplexed" amongst these competing requirements.
Low speed devices will use just a few bytes in a frame to send or receive their data. Examples are modems, mice, keyboards, etc. The so-called Full Speed devices (in USB 1.1) can achieve up to 12 MB/sec by using isochronous mode transfers, meaning they get carved out a nice big chunk of each frame, and can send that much data (a fixed size) each time a frame comes along. This is the mode audio devices use to stream the relatively data-intensive music to USB speakers, for example.
If you are able to do a little bit of local buffering, you may be able to use isochronous mode to get your 64 bytes of data sent at 1 KHz, but with 2 or 3 periods (at 2.5KHz) worth of data in the USB frame transfer. You'd want to reserve 64 x 3 = 192 bytes of data (plus maybe a few extra bytes for control info, such as how many chunks are present: 2 or 3?). Then, as the USB frames come by, you'd put your 2 chunks or 3 chunks of data onto the wire, and the receiving end would then get that data, albeit in a more bursty way than just smoothly at a precise 2.5KHz rate. However, this way of transferring the data would more than keep up, even with USB 1.1, and still only use a fraction of the total available USB bandwidth.
The problem, as I see it, is whether your system design can tolerate a data delivery rate that is "bursty"... in other words, instead of getting 64 bytes at a rate of 2.5KHz, you'll be getting (on average) 160 bytes at a 1 KHz rate. You'll actually get something like this:
So, I think with USB that will be the best you can do -- get a somewhat bursty delivery of either 2 or 3 of your device's data packet per 1 mSec USB frame rep rate.
I am not an expert in USB, but I have done some work with it, including debugging a device-to-host tunneling protocol which used USB "interrupts", so I have seen this kind of implementation on other systems, to solve the problem of matching the USB frame rate to the device's data rate.

Configuration registers for LPC bus in Poulsbo System Controller Hub (US15W)

We have a system based around an Atom Z510/Intel SCH US15W Q7 card (running Debian Linux.) We need to transfer blocks of data from a device on the Low Pin Count Bus. As far as I know this chipset does not provide DMA facilities, meaning the processor has to read the data out a byte at a time in a software loop. (The device driver actually implements this using the "rep insb" x86 instructions so the loop is actually implemented by the CPU if I understand correctly.)
This is far from optimal, but it should be possible to hit a transfer rate of 14Mb/s. Instead we can barely manage 4Mb/s with transactions on the bus no closer than 2us apart even though each read to the slave device is is done in 560ns. I don't believe other traffic on the bus is to blame, but am still investigating.
My question is:
Does any one know if there are any configuration registers on the SCH that could affect the LPC bus timing?
I cannot find any useful information on the device on the Intel website, nor have I spotted anything in the Linux Kernel code that appears to be fiddling with any such registers (but I'm a noob when it come to Linux Kernel stuff.)
I'm not an x86 expert so any other factors that might come into play or any other 'war stories' relating to this device would be good to know about too.
Edit: I have found the datasheet. I've not seen anything in it that explains this behaviour, but I am investigating the possibility of mapping our device as a firmware device as the firmware bus cycles don't seem to suffer the same delays.
For the record, the solution was to modify the FPGA firmware such that the chip's data in/out register was mapped to four adjacent addresses and the driver modified to do 32 bit inb/outb instructions. Although the SCH does not implement 32 bit LPC read/write operations, the result is 4 back-to-back 8 bit operations followed by the same dead time as I was getting previously with a single byte, meaning it averages about 1us per byte. Not ideal, but still a doubling in throughput.
It transpires the firmware cycles were quicker because the SCH transfers 64 bytes at a time from the firmware flash - after 64 bytes there is the same 1.4us gap, indicating this is the per-transaction latency of the device. Exploiting this may have been slightly quicker than the above solution however the trade-off is that it is limited to 64 bytes chunks and each byte takes longer (680ns IIRC) due to the additional cycles required to do a firmware read.

Synopsys USB OTG Controller (2.65a) occasionally truncates isochronous IN in USB device mode

I'm using a Synopsys OTG core in device mode. Programming an isochronous IN high speed endpoint (USB 2.0) for the maximum transfer per microframe (3 packets of 1024 bytes) using a periodic FIFO dedicated to this endpoint. It works 99+% of the time. But occasionally the transfer is truncated. For example, the first 1024 bytes will go onto the bus with the DATA0 PID (instead of the correct DATA2 PID) and the remaining 2048 bytes will not be sent. Since I've programmed the packet count, multicount, max packet size and transfer size correctly I'm not sure what is causing this.
Obviously this is a very specific question and I don't have much hope of getting an answer, but I figured a shot in the dark was worth a try. Thanks in advance.
Isochronous transfers does not guarantee packet delivery. So if host controller has other active transfers, it will silently drop isochronous packets. If you need guaranteed packed delivery, you should use bulk transfers (but then it will not guarantee delivery time).
Isochronous is ideal for applications, like sound or video streaming, where you need constant delivery time, but loss of some frames is ok.
The specification places limits on the bus, allowing no more than 90% of any frame to be allocated for periodic transfers (Interrupt and Isochronous) on a full speed bus. On high speed buses this limitation gets reduced to no more than 80% of a microframe can be allocated for periodic transfers. (c) http://www.beyondlogic.org/usbnutshell/usb4.shtml
Answering my own question in case it may help someone else. It appears this OTG controller has a bug where the TX FIFO does not always empty properly. I found a successful workaround is to flush the FIFO after each TX. It's quick and the truncation symptom goes away.

Implications of using many USB web cameras

I'm looking into connecting multiple low resolution USB webcams to a single computer. What implications might this have on performance? How does, for example, four 320x240 cameras fare against a single 640x480 camera? I'm not well versed in the architecture of the USB interface, what are the performance caveats? By performance I mean how would it affect the time to read the image data from multiple cameras compared to a single one.
Each webcam is connected to a different USB port? If so, its good.
Even if its just 1 port with 4 connected webcams. I dont think 4 320x240 will have any problem either. USB 2.0 = 320Mbps. Streaming a 320x240 video wouldn't be over 1mbps. Worst case scenario, putting a 320x240 at 2mbps + 1mb of other data. That would be 12mbps bandwidth between your usb port and the device.
So from the above, the 1 USB port can handle 4 webcams connected by a splitter just as fast as 1 640x480 webcam.,
Processing these images depends on your computer speed and how you write your algorithm.
The maximum data rate of USB is way higher than what you will actually get.
Webcams will probably use isochronous transfer, which under USB 2 can only get about 40% (if I recall correctly) of the bus time, and this also has a good bit of overhead.
I don't know for sure, but I suspect that this is why usb webcam resolutions and data rates seem to have hit a ceiling several years ago. They may start to increase again with the use of USB 3.
I'd suggest that you attach each of your cameras to it's own USB 2 port, as the 40% is shared among all isochronous connections.
One of those connections sharing bandwidth with a keyboard or even a usb mass storage device should be ok, because they would only use parts of the remainder of the bandwidth.
Wrong.
First, USB 2.0 is 480mbps theoretical, and you should be able to get up to about 80% of that with a direct connection.
Second, to calculate the bandwidth used by a camera, image bit depth must be taken into account, therefore:
BW = hresolution() * vresolution() * imagebitdepth(bit) * framerate(frame/s) (in bit/sec)
imagebitdepth can be, for webcams, 8, 16, 24, or 32 bits (ranging from Y800 monochrome to RGBA/RGBT colour full, check spec)
Therefore, a typical webcam # 640*480 resolution, 30fps, 16bit RGB bit-masked RGBA image bit depth will require 147.456 Mbps, and consequently, one of similar spec but # 320*240 resolution would require 36.864 Mbps, as opposed to the major BS stated by Shawn above with his 1mbps which then is also inconsistent with just about all of his other, also wrong data.
Simulatenous operation is nevertheless largely driver dependent, it is up to the manufacturer to take the otherwise minimal effort and expose unique device IDs to DirectShow.