USB performance issues

USB performance issues - usb

I am trying to transfer data over USB. These are packets of at most 64 bytes which get sent at a frequency of 4KHz. This gives a bit rate of about 2Mb/s.
The software task that picks up this data runs at 2.5 KHz.
Ideally we never want packets to get there slower than 2.5 KHz (so 2 KHz isn't very good).
Is anyone aware of any generic limits on what USB can achieve?
We are running on a main board which has a 1.33 GHz running QNX and a daughter board which is a TWR K60F120M tower system running MQX.
Apart from the details of the system, is USB supposed to be used in this kind of data transfers, i.e., high frequency and short packet sizes?
Many Thanks for your help
MG

USB, even at its slowest spec (1.1), can transfer data at up to 12MB/sec, provided you use the proper transfer mode. USB will process 1000 "frames" per second. The frames contain control and data information, and various portions of each frame are used for various purposes, and thus the total information content is "multiplexed" amongst these competing requirements.
Low speed devices will use just a few bytes in a frame to send or receive their data. Examples are modems, mice, keyboards, etc. The so-called Full Speed devices (in USB 1.1) can achieve up to 12 MB/sec by using isochronous mode transfers, meaning they get carved out a nice big chunk of each frame, and can send that much data (a fixed size) each time a frame comes along. This is the mode audio devices use to stream the relatively data-intensive music to USB speakers, for example.
If you are able to do a little bit of local buffering, you may be able to use isochronous mode to get your 64 bytes of data sent at 1 KHz, but with 2 or 3 periods (at 2.5KHz) worth of data in the USB frame transfer. You'd want to reserve 64 x 3 = 192 bytes of data (plus maybe a few extra bytes for control info, such as how many chunks are present: 2 or 3?). Then, as the USB frames come by, you'd put your 2 chunks or 3 chunks of data onto the wire, and the receiving end would then get that data, albeit in a more bursty way than just smoothly at a precise 2.5KHz rate. However, this way of transferring the data would more than keep up, even with USB 1.1, and still only use a fraction of the total available USB bandwidth.
The problem, as I see it, is whether your system design can tolerate a data delivery rate that is "bursty"... in other words, instead of getting 64 bytes at a rate of 2.5KHz, you'll be getting (on average) 160 bytes at a 1 KHz rate. You'll actually get something like this:
So, I think with USB that will be the best you can do -- get a somewhat bursty delivery of either 2 or 3 of your device's data packet per 1 mSec USB frame rep rate.
I am not an expert in USB, but I have done some work with it, including debugging a device-to-host tunneling protocol which used USB "interrupts", so I have seen this kind of implementation on other systems, to solve the problem of matching the USB frame rate to the device's data rate.

Related

USB 3.0(super speed) frame and packet sizes

I am working with hardware which support USB3.0 interface. According to USB standard spec it support 5Gbps data rates.
Now I want to know what is best way of effective utilization of USB bandwidth.
What's the recommended maximum packet size and frame sizes?
In my scenario I need to send huge amount of data (60GB) as fast as possible. How do form these data into packets.
As per USB spec we can send 1frame/mille second, so to get 5Gbps data rate what is the number of packets/frame required?
Give some brief explanation on USB3.0 protocol.
Thank you.

What is the maximum speed of the STM32 USB CDC?

I'm using an STM32L151 to communicate with a PC using USB CDC. I used the STM32 HAL libraries to create my project.
I found that the USB sends data in 1 ms intervals, and each time 64 bytes are being sent. So, is the maximum speed of the USB CDC 64 kbyte/s? That is much lower than the USB Full Speed data rate of 12 Mbit/s. How can I reach this speed, or at least a fraction of this speed?

Nope. If your code is "fast enough", the maximum CDC speed is about 1MByte/sec. This may require a big (>1KB) FIFO on the device side. Oh, and the PC side must be able to read the data fast enough, e.g. with big buffers.
The 64KByte/s limit applies for USB HID which uses interrupt endpoints. USB CDC interface uses faster bulk endpoints.

USB FS frame is 1ms so if you put 64 bytes to the buffer (using the HAL function) - it will send those 64 bytes in the next frame. And it will not send any more data until another 1ms frame
How to increase this speed -> aggregate your data in larger chunks and send more data in the one transaction (up to 8kB using HAL libraries).

Logging 16-bit data to an SD card at the rate of 44 kHz

I am using the STM32F4 microcontroller with a microSD card. I am capturing analogue data via DMA.
I am using a double buffer, taking 1280 (10*128 - 10 FFTs) samples at a time.
When one buffer is full I am setting a flag and I then look at 128 samples at a time and run an FFT calculation on it. All of this is running well.
The data is being sampled at the rate I want and FFT calculation is as I would expect. If I just let the program run for one second, I see that it runs the FFT approximately 343 times (44000/128).
But the problem is I would like to save 64 values from this FFT to the SD card.
I am using the HCC fat file system library.
Each loop of the FFT calculation I am copy the 64 values into an array.
After every 10 calculations I write the contents of this array to file and start again.
The array stores 640 float_32 values (10*64).
This works perfectly for a one-second test run. I get 22,000 values stored to the SD card.
But as I increase the time I start losing samples as it take the SD card longer to write. I need the SD card to store over 87 kbit/s (4 bytes * 64 * 343 = 87808) consistently. I have tried increasing the DMA buffer sample size and then the number of times it writes, but didn't find it helped.
I am using an 8G microSD card, class 4. I formatted the SD card to the default FAT32 allocation unit size 2048.
How should I organize the buffering of data to allow for this? I thought using fewer writes might help. Would a queue help? How would I implement this and would anyone have an example?
I saw that clifford had a similar problem and he was using a queue, How can I use an SD card for logging 16-bit data at 48 ksamples/s?.

In my case I got it to work by trying a large number of different cards - they vary a great deal. If I had enough RAM available for a longer buffer that would have worked too.
If you are not using an RTOS, the queue buffering option may not be available to you, or at least would be non-trivial to implement.
Using an RTOS queue, I suggest that you create a queue of messages each of length 64*sizeof(float_32), the number of messages in the queue will be determined by the ammount of card latency you need to deal with; a length of 343 for example, will sustain a card stall of 1 second, and will require 87Kb of RAM. The application will then have a high priority thread performing the FFT and placing data in the queue, while a low priority thread takes data from the queue and writes to the file.
You might improve performance further by accumulating multiple message blocks in your DMA buffer before initiating a write, and there may be some benefit in carefully selecting an optimum DMA buffer length.

Flash is very, very sensitive to overwrites. Writing 3kB and then a further 3kB may count as an overwrite of the first 4 kB. In your case, there's no good reason why you'd want such small writes anyway. I'd advise 16 kB writes (32 frames/write * 64 samples/frame * 4 bytes/sample). You'd need 5 or 6 writes per second, which should be well in spec of any old SD card.
Now it's quite likely that you'd get another 1280 samples it while writing; you'll have to deal with that on another thread. Should be no problem as the writing should block without using CPU (it's a low-level Flash delay)

The most probable cause of the problem might be the way you are interfacing the card through the library.
SD cards over the SPI protocol (which I assume being used here) can be read or written in 512 byte sector units, some SD commands making it possible to stream (to perform sequential sector access faster). An important element of the SD card SPI protocol are various delays, where you have to poll the card whether you could start an operation (such as writing data to a sector).
You should read the library's API to discover how its writing process might work. You will need to perform some regular action which in the end would poll the card to know whether the writing process could continue. Some cards might require a set number of accesses before becoming ready for an operation, some others might use timeouts for state transitions. It might not work well to have the function called relatively rarely (such as once in 2-3 milliseconds) anticipating the card getting ready meanwhile. You have to keep on nagging it whether it completed already.
Just from own experiences with SD interfacing.

Control stepper motors via USB

I'm doing a USB device is to control stepper motors. I've done this before using a parallel port. because these ports do not exist in current motherboards, I decided to implement a USB communication between my device and the PC (host).
To achieve My objective, I endowed the freescale microcontroller the device with that has a USB module 12Mbps.
My USB device must receive 4 bytes (one for each motor driver) at a given time, because every byte is a step that should move the engine.
In the PC (Host) an application of user processes a text file with information and make the trajectory coordinates sending bytes at a certain rate for each motor (time is trivial to achieve the acceleration and speed of the motors) .
Using the parallel port was an easy the task because each byte is sent sequentially to a time determined by the user app.
doing a little research about full speed USB protocol understood that the frame is sent every 1ms.
then you can send 4 byte or many more every 1ms but I can not manage time like I did with the parallel port.
My microcontroller can send up to 64 bytes per frame (Based on transfer papers type Control, Bulk, Int, Iso ..).
question 1:
I want to know in what way I can send 4-byte packets faster than every 1 ms?
question 2:
What type of transfer can advise me for these type of devices?
Thanks.

Like Ricardo said, USB-serial will suffice.
As for the type of transfer, try implementing a CDC stack and use your SCI receiver to listen for PC commands. That will give you a receive buffer which will meet your needs.
Initialize your SCI (baud, etc)
Enable receiver and interrupt
On data receive, move it to your 4-byte command buffer
Clear receive buffer, wait for more
When you have all 4 bytes, fire off the steppers! Four bytes should take µs.
Check with Freescale to see if your processor is supported.
http://cache.freescale.com/files/microcontrollers/doc/support_info/USB_STACK_RELEASE_NOTES_V4.1.1.pdf?fpsp=1
There might even be some sample code to get you started.
-Cheers

I am achieving the same goal (driving/control CNC machines) like this:
the USB device is just synchronous I/O parallel port. Using continuous bulk transfer one pipe as input and one as output. This way I was able to achieve synchronous 64bit parallel communication with ~70KHz sample rate. It uses traffic around (i)4.27+(o)4.27 MBit/s that is limit for mine MCU and code. Bigger speeds cause jitter on the output due to USB events interrupts.
How to do it (on MCU side)
I have 2 FIFO's one for ingoing and one for outgoing data. I have timer interrupt occurring with sample rate frequency. In it I read the inputs and feed it to the first FIFO and read data from the other FIFO and send it to the outputs.
On top of that the USB task is called (inside the same interrupt) checking FIFO for sending to and incoming data from USB handling the transfer itself
I choose ATMEL AT32UC3A chips for this task. After a long and pain full research I decided these MCU's because they have enough memory for both FIFO's and program so no need for additional IC. It has FPGA package which can be used (BGA is not an option). It has HS USB (most USB MCU's have only FS like yours). It runs at 66MHz. It supports many interesting features (did interesting projects with it in the past) and of coarse I have experience with ATMEL MCU's from past
So if you want to achieve something similar then
start with bulk transfer (PC -> USB -> MCU -> output)
add FIFO if needed
do not know the sample rate you need. The old LPT's could handle from 80-196KHz depend on the manufactor. The modern ones are much much slower (which is silly and sad).
measure the critical sample rate
you need oscilloscope or very good hearing for this. The output data must be synchronous so no holes in it, no jitter, etc...
if any of these are present you have to lower the sample rate. Mine setup could handle even 1MHz sample rate but the USB jitter was present (sometimes USB event froze the sending for longer that one sample...) so I achieve only 70KHz of stable output.
if needed also inputs then add them
but only if the output is working as it should. Do not forget to lower the sample rate after this too ... Use separate bulk pipes and FIFOs for input and output.

Implications of using many USB web cameras

I'm looking into connecting multiple low resolution USB webcams to a single computer. What implications might this have on performance? How does, for example, four 320x240 cameras fare against a single 640x480 camera? I'm not well versed in the architecture of the USB interface, what are the performance caveats? By performance I mean how would it affect the time to read the image data from multiple cameras compared to a single one.

Each webcam is connected to a different USB port? If so, its good.
Even if its just 1 port with 4 connected webcams. I dont think 4 320x240 will have any problem either. USB 2.0 = 320Mbps. Streaming a 320x240 video wouldn't be over 1mbps. Worst case scenario, putting a 320x240 at 2mbps + 1mb of other data. That would be 12mbps bandwidth between your usb port and the device.
So from the above, the 1 USB port can handle 4 webcams connected by a splitter just as fast as 1 640x480 webcam.,
Processing these images depends on your computer speed and how you write your algorithm.

The maximum data rate of USB is way higher than what you will actually get.
Webcams will probably use isochronous transfer, which under USB 2 can only get about 40% (if I recall correctly) of the bus time, and this also has a good bit of overhead.
I don't know for sure, but I suspect that this is why usb webcam resolutions and data rates seem to have hit a ceiling several years ago. They may start to increase again with the use of USB 3.
I'd suggest that you attach each of your cameras to it's own USB 2 port, as the 40% is shared among all isochronous connections.
One of those connections sharing bandwidth with a keyboard or even a usb mass storage device should be ok, because they would only use parts of the remainder of the bandwidth.

Wrong.
First, USB 2.0 is 480mbps theoretical, and you should be able to get up to about 80% of that with a direct connection.
Second, to calculate the bandwidth used by a camera, image bit depth must be taken into account, therefore:
BW = hresolution() * vresolution() * imagebitdepth(bit) * framerate(frame/s) (in bit/sec)
imagebitdepth can be, for webcams, 8, 16, 24, or 32 bits (ranging from Y800 monochrome to RGBA/RGBT colour full, check spec)
Therefore, a typical webcam # 640*480 resolution, 30fps, 16bit RGB bit-masked RGBA image bit depth will require 147.456 Mbps, and consequently, one of similar spec but # 320*240 resolution would require 36.864 Mbps, as opposed to the major BS stated by Shawn above with his 1mbps which then is also inconsistent with just about all of his other, also wrong data.
Simulatenous operation is nevertheless largely driver dependent, it is up to the manufacturer to take the otherwise minimal effort and expose unique device IDs to DirectShow.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas