PCIe devices can read or write to memory, i.e. can do DMA without requiring a device driver.
If I remember correctly, if you flash a device's firmware (let's say an FPGA device) and input 0xFFFF as device and vendor ID, the device won't be enumerated by BIOS.
I am wondering, if a PCIe device can conduct DMA operations (memory read and write) by bus mastering even when it is not enumerated by BIOS.
A PCIe device can only do DMA if Bus Master Enable (BME) is set in the command register. BME will only be set when a driver is active.
Related
I am currently using Xilinx ZCU106 board, and I am curious about How does JTAG support flash memory programming. I can upload the boot images or hardware logic just by connecting JTAG USB cable to USB connector at the ZCU106 Board, and press the flash button on the host PC. By doing this it looks like the QSPI Flash holds the boot images or hardware logic image and the system initializes itself by using this image. It seems like a magic, and I want to know about the detail.
So far I understood is that I can access JTAG interface via USB cable (thanks to the FTDI chip?), and JTAG boundary-scans the devices connected to it.
However the problem is here, I couldn't find the link between JTAG and QSPI Flash (MT25QU512ABB8ESF-0SIT). I searched several references, including manual of QSPI Flash (MT25QU512ABB8ESF-0SIT, https://media-www.micron.com/-/media/client/global/documents/products/data-sheet/nor-flash/serial-nor/mt25q/die-rev-b/mt25q_qlkt_u_512_abb_0.pdf) but there seems no port for JTAG (such as TDI, TDO, and TCK), but only SPI.
So the first assumption, I thought on the JTAG chain maybe there is a SPI controller to flash the memory. However I couldn't find any clue on the references that I've read so far that a SPI controller exists on the chain or even a SPI controller that controls the QSPI flash really exists. And the second assumption, I thought PS- ARM Cortex A53- TAP (?) does the magic. PS TAP receives JTAG signal and processes the JTAG command. To be precise, let's say about flash memory programming case, then there is like a JTAG command for write data on the flash memory via SPI.
Among these two, I wonder if there is the answer. One thing for the last, if the second one suppose to be the answer, then the processor (Cortex A53) should implement the functionalities by hardware logic to parse Flash write/erase command signal from JTAG interface and let the SPI controller that the system includes perform the write/erase job? Somewhere I read that the JTAG itself supports writing data on the Flash memory but everything I could find was just TDI, TCK, TDO, TMS, and TRST ports which does not fit in any port of QSPI Flash.
Probably I am confusing a lot of concepts very much, but I want to know about the exact mechanism behind the scene on JTAG flash memory programming.
Normally, it will do with the following sequence:
Send some software to target processor in RAM
Send data to target processor (in RAM)
Trigger execute command to store data into the flash memory.
Now the expected data are available in flash memory.
There are a few scenarios I'm curious about:
a transfer from GPU1 memory to GPU2 memory over the PCI bus
a transfer from GPU1 to main memory with DMA
a transfer from GPU1 to main memory without DMA
Will all these scenarios be limited to the total number of PCIe lanes supported by the CPU? For Intel systems, ARM systems?
Will all these scenarios be limited to the total number of PCIe lanes supported by the CPU?
PCIe is not precisely a bus -- certainly not in the way that PCI or ISA were, for instance. It's a set of point-to-point connections between peripherals and the PCIe root complex (which is usually the CPU itself). Any given root complex will support some fixed number of PCIe lanes, each of which is connected to one device. (Often in sets. For instance, it's typical to connect 16 PCIe lanes to most GPUs.)
So, yes. Any communications between PCIe devices, or between devices and memory, must pass through the CPU, and will be limited by the number of PCIe lanes the device (or devices) have connecting them to the bus master.
I would like to implement usb communication at a speed of 30Mbit/sec. My hardware support "high speed usb" so the hardware platform will not limit me.
Can I implement this speed using USB CDC class, or Mass storage class, or are these usb classes speed limited?
In USB protocol who determines the bit rate, is it the device?
The USB CDC and mass storage classes do not have any kind of artificial speed limiting, so you can probably get a throughput of 30 Mbps on a high-speed USB connection (which uses 480 Mbps per second for timing bits on the wire). The throughput you get will be determined by how much bus bandwidth is being used by other devices and how efficiently your device-side firmware, host-side driver, and host-side software operate.
The bit rate is mostly determined by the device. The device basically signals to the host what USB speeds it supports, and the host picks one. The full story is a little bit more complicated, and there are a lot more details about how that works in the USB specification.
We are developing a USB Driver for Ethernet device for WinCE 6.
We are finding performance issues and could narrow down to them USB Stack, using profiling of code. 95% of the time in Tx path is taken in IssueBulkTransfer, which causes the driver to queue packets internally. TX-COMPLETE routine call is not in sync with IssueBulkTransfer.
We have used USB analyzer to check the USB bandwidth usage and found it as 20-30% of total bandwidth. So hardware is fast enough to transfer data across the interface.
With above findings bottleneck seems like in the USB bus Driver and USB HCD Driver.
Is there any known performance limitation with WinCE 6 USB Stack?
What is the maximum speed we can get with High speed device (USB 2.0) using WinCE 6.0 USB stack?
Are you using sync transfers? If you use async ones you may be able to queue multiple packets for tx or rx and the host driver will not have to wait till your driver receives the completition notification to issue a new tx or rx request. This may allow you to use more bandwidth. You may also allocate buffers using HalAllocateCommonBuffer or by reserving some physical memory range for buffers. In this way you may avoid copies in the driver if the driver can use DMA.
You did not provide details about your HW architecture, it's difficult to estimate the level of performances you may expect.
I have no experience of embedded USB stacks so my question is, can I run it without an OS?
Of course it must be possible to run without OS, but will things be MUCH easier if I have one?
I want to use it to save data to a attached USB Mass Storage Device.
If your USB device is on-chip, your chip vendor will almost certainly have example code for USB that may include mass storage. You won't need an OS, but interrupt handling will be necessary and a file system too.
Your USB controller will need host or OTG capability - if it is only device capable, then you cannot connect to another USB device, only a host.
The benefit of an OS - or at least a simple RTOS kernel - is that you can schedule file system activity concurrently with other processing tasks. The OS in that case would not necessarily make things easier, but it may make your system more responsive to critical tasks and events.
I have used usb stacks in the past with PIC18F2550 (8 bits) and LPC1343 (32 bits ARM-Cortex M3) microcontrollers without any problems.