By buffer streaming I mean that the vertex data is updated every single frame.
I can understand how a staging buffer is fast for static data, but what if the input vertices are constantly changing? Does a staging buffer still have an advantage over host-visible, host-coherent memory?
Or should I just use a staging buffer as a fallback when host-coherent memory isn't available?
Also, I have another question: can host visible, host coherent and device local all be used together? What is the difference between specifying device local and not specifying it in both cases?
Staging buffers give you the ability to transfer data into device visible, host non-visible memory (i.e. discrete card GRAM).
Shared host-device buffer access without staging needs memory to be both device visible and host visible.
The impact of these two approaches is really system dependent.
For integrated graphics and mobile graphics, there is generally no discrete GRAM. Nearly all available memory is shared between the CPU and GPU, and is therefore both host visible and device visible. On these devices there is no point using staging buffers unless you need layout changes (like texture upload/readback).
For discrete graphics, you have a discrete GRAM. If you want resources to live in GRAM then use staging, because GRAM is generally not host visible. If you want resources to live in system RAM then use memory that is both host visible and device visible. GPU access to system RAM will be slower than access to GRAM (it has to travel over PCIe). The host visible pool is also normally smaller than GRAM (it depends on the PCIe aperture size exposed by the device).
TLDR: YMMV, benchmark both speed and capacity on the devices you care about.
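To make the flag combinations above concrete, here is a minimal sketch of the usual memory type search, written as a plain Python model rather than real Vulkan calls. The three flag values are the ones the Vulkan specification defines for VkMemoryPropertyFlagBits; the function name and the two example device layouts are hypothetical. Asking for device local as a preference rather than a requirement is what lets the same code pick shared memory on an integrated GPU and fall back to system RAM on a discrete one.

```python
# Flag values as defined by the Vulkan specification (VkMemoryPropertyFlagBits).
DEVICE_LOCAL = 0x1
HOST_VISIBLE = 0x2
HOST_COHERENT = 0x4


def find_memory_type(type_props, allowed_type_bits, required, preferred=0):
    """Return the index of a suitable memory type, or None if there is none.

    type_props        -- one property flag mask per memory type
    allowed_type_bits -- bitmask of type indices the buffer may use
                         (memoryTypeBits from the buffer's memory requirements)
    required          -- flags the type must have
    preferred         -- flags that are nice to have on top of 'required'
    """
    fallback = None
    for index, props in enumerate(type_props):
        if not allowed_type_bits & (1 << index):
            continue  # the buffer cannot live in this type at all
        if props & required != required:
            continue  # a required property is missing
        if props & preferred == preferred:
            return index  # has everything we asked for
        if fallback is None:
            fallback = index  # acceptable, but keep looking for a better one
    return fallback


# Hypothetical layouts, for illustration only.
integrated = [DEVICE_LOCAL | HOST_VISIBLE | HOST_COHERENT]
discrete = [DEVICE_LOCAL, HOST_VISIBLE | HOST_COHERENT]

mappable = HOST_VISIBLE | HOST_COHERENT
shared = find_memory_type(integrated, 0b1, mappable, preferred=DEVICE_LOCAL)   # 0
sysram = find_memory_type(discrete, 0b11, mappable, preferred=DEVICE_LOCAL)    # 1
vram = find_memory_type(discrete, 0b11, DEVICE_LOCAL)                          # 0
```

On the hypothetical discrete layout, streaming directly means writing into type 1 (system RAM), while the staging path writes into type 1 and then copies into type 0. Which of the two wins for per-frame vertex data is exactly what needs benchmarking.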
I am trying to read particular parts of SRAM within a microcontroller using MicroPython. I must read the SRAM before it is modified by my program flow. The goal is to build an SRAM PUF on a Pycom FiPy. I need to find a way to access the external QSPI RAM located on the FiPy. Any help?
I am using an STM32 with FATFS, SDMMC and eMMC, and have created a FATFS volume on the eMMC.
I have also created a FATFS volume on the USBH (host mode). This also works fine.
The eMMC FATFS works fine, but I need to copy all files from the eMMC to the USB drive. Copying file by file from the eMMC via FATFS takes too long.
I think it would be faster to blindly copy the raw data block by block (512 bytes) from the eMMC to the USBH, so I implemented the routines needed to do so. The problem is that the copy fails after a few hundred blocks have been copied. The failure seems to be caused by the USBH not responding.
My questions are:
1- "Is it possible to copy raw data block by block from eMMC to USBH like I am trying to do?"
2- Has anyone successfully done so?
Yes it is perfectly normal to blindly copy all the blocks of one storage device to another and to expect it to work.
The only catch is that the devices have to either have the same block size, or else you have to at least pretend they do (e.g. treat each 4kB physical block as eight 512-byte blocks). This is because many filesystem drivers always assume the block size is 512 bytes.
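As a rough illustration of the block size point, here is a small Python model, not STM32 code. It shows the arithmetic for treating a larger physical block as several 512-byte logical blocks, and a raw copy loop over two file-like objects standing in for the devices. The function names are hypothetical, and moving several blocks per transfer is an assumption of this sketch, not something the answer prescribes.

```python
import io

LOGICAL_BLOCK = 512  # block size most filesystem drivers assume


def logical_to_physical(lba, physical_block_size):
    """Map a 512-byte logical block number to (physical block, byte offset)."""
    if physical_block_size % LOGICAL_BLOCK:
        raise ValueError("physical block size must be a multiple of 512")
    return divmod(lba * LOGICAL_BLOCK, physical_block_size)


def raw_copy(src, dst, total_blocks, blocks_per_transfer=64):
    """Copy total_blocks logical blocks from src to dst; return blocks copied."""
    copied = 0
    while copied < total_blocks:
        count = min(blocks_per_transfer, total_blocks - copied)
        chunk = src.read(count * LOGICAL_BLOCK)
        if len(chunk) != count * LOGICAL_BLOCK:
            # Stop at the first short read instead of writing a partial block.
            raise IOError(f"short read at block {copied}")
        dst.write(chunk)
        copied += count
    return copied


# In-memory stand-ins for the two devices (4 logical blocks of test data).
source = io.BytesIO(bytes(range(256)) * 8)
target = io.BytesIO()
raw_copy(source, target, total_blocks=4, blocks_per_transfer=3)

# Logical block 9 on a device with 4 kB physical blocks:
where = logical_to_physical(9, 4096)  # (1, 512)
```

Reporting the block number at which a transfer fails, as the loop does on a short read, is also a cheap way to see whether a failure always happens at the same place.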
One other problem I have encountered in doing this is that devices can overheat (but this isn't a software problem).
SUMMARY: Cannot copy more than 32GB of files to a 128GB memory stick formatted under FAT32 or exFAT despite the fact that I can format the stick and ChkDsk is showing the correct results after formatting (and also when less than 32GB of files are on the stick). I cannot use NTFS because this stick is designed to transfer files to an iPhone and the app will not handle NTFS. See below for details.
DETAILS:
I have a 128GB memory stick which is designed to quickly transfer files between a computer and an iPhone. One end is a USB and the other plugs into the iPhone's lightning port. This particular type is extremely common and looks like a "T" when you unfold it (Amazon link: https://www.amazon.com/gp/product/B07SB12JHG ).
While this stick is not especially fast when I copy Windows data to it, the transfer rate to my iPhone is much better than the wireless alternatives.
Normally I'd format a large memory stick or USB drive in NTFS, but the app used to transfer files to my iPhone ("CooDisk") will only handle exFAT and FAT32. I've tried both. For exFAT formatting, I've tried both Windows 7 and 10, and for FAT32 I used a free product from RidgeCrop consulting (I can give you the link if you want).
As with all USB storage devices, my stick is formatted as a single active partition.
I do not have a problem formatting. After formatting, ChkDsk seems happy with both FAT32 and exFAT. The CooDisk app works fine with either. After formatting, all the space is ostensibly available for files.
My problem arises when populating the stick with files.
Whenever I get beyond 32GB in total space, I have various problems. Either the copy will fail, or ChkDsk will fail. (After running ChkDsk in 'fix' mode, every file created beyond the 32GB limit will be clobbered.) Interestingly, when I use the DOS copy command with "/v" (verify) it will flag an error for files beyond the 32GB limit, although DOS XCopy with "/v" keeps on going. GUI methods also die at 32GB.
Out of sheer desperation, I wrote a script that uses GNU's cp for Windows. Now I can copy more than 32GB of files and ChkDsk flags no errors. However files beyond the 32GB limit end up being filled with binary zeros despite the fact that they appear as they should in a directory or Windows file explorer listing. (Weird, isn't it?)
I have also tried various allocation unit sizes from 4K all the way up to 64K and attempted this with three different Windows OSs (XP, Win7, and Win10).
Let me emphasize: there is no problem with the first 32GB of files copied to the stick regardless of: whether I use exFAT or FAT32; my method of copying; and my choice of AU size.
Finally, there is nothing in these directories that would bother a FAT32 or exFAT system: (a) file and directory names are short (well under 100 characters); (b) directory nesting is minimal (no more than 5 levels); (c) files are small (nothing close to a GB); and (d) directories have relatively few files (nowhere close to 200, for those of you who recall the old FAT limit of 512 files per directory :)
The only platform I haven't yet tried is using an aging MacBook that someone gave to me. I'm not terribly good with Macs, but I would rather not be dependent on it (it's 13 years old, although MacBooks are built like tanks).
Also, is it possible that FAT32 and exFAT don't allow more than 32GB on an active partition (I can find no such limitation documented anywhere, in fact in my experience USB storage devices are always bootable - as was the original version of my stick)?
Any ideas??
I am working on the U-Boot bootloader. I have some basic questions about the functionality of a bootloader and the application it is going to handle:
Q1: As far as I know, a bootloader is used to download the application into memory. On the internet I also found that the bootloader copies the application to RAM and then the application runs from RAM. I am confused about how the bootloader works. When the application is provided to the bootloader through serial or TFTP, what happens next: does the bootloader copy it to RAM first, or does it write it directly to flash?
Q2: Why is there a need for the bootloader to copy the application to RAM and then run it from RAM? What difficulties will we face if our application runs from flash?
Q3: What is the meaning of the statement "My application is running from RAM/FLASH"? Does it mean that our application's .text or .code segment is in RAM/flash? And are we not concerned about the .bss section because it is designed to be in RAM?
Thanks
Phogat
When any hardware system is designed, the designer must consider where the executable code will be located. The answer depends on the microcontroller, the included memory types, and the system requirements. So the answer varies from system to system. Some systems execute code located in RAM. Other systems execute code located in flash. You didn't tell us enough about your system to know what it is designed to do.
A system might be designed to execute code from RAM because RAM access times are faster than flash so code can execute faster. A system might be designed to execute code from flash because flash is plentiful and RAM may not be. A system might be designed to execute code from flash so that it boots more quickly. These are just some examples and there are other considerations as well.
RAM is volatile so it does not retain code through a power cycle. If the system executes code located in RAM then a bootloader is required to obtain and write the code to RAM at powerup. Flash is non-volatile so execution can start right away at powerup and a bootloader is not necessary (but may still be useful).
Regarding Q3, the answer is yes. If the system is running from RAM then the .text section will be located in RAM (but not until after the bootloader has copied it there). If the system is running from flash then the .text section will be located in flash. The .bss section holds variables and will be in RAM regardless of where the .text section is.
Yes, in general a bootloader boots the system, but it might also provide a mechanism for interrupting the default boot path and allow alternate firmware to be downloaded and run instead, as well as other features (like flashing).
Traditional ROM had a traditional RAM-like interface: address, data, chip select, read/write, etc. You can still buy ROM that way, but it is cheaper from a pin real estate perspective to use something SPI or I2C based, which is slower. That is not desirable to run from, but tolerable to read once and then run from RAM. Newer flash technologies can have (and have had) problems with read disturb, where if your code is in a tight loop reading the same instructions, or for any other reason the flash is being read too fast, the charge can drop such that a read returns the wrong data, potentially causing the program to change course or crash. Also, your PC and other Linux platforms are used to copying the kernel from non-volatile storage (hard disk) to RAM and then running from there, so copying from flash to RAM and running from RAM has a comfort level, and is often faster than running from flash. So there are many potential reasons not to run from flash, but depending on the system it may be possible to run from flash just fine. (On some systems the flash in question is not directly accessible and not executable; of course SOME ROM in that system needs to be executable/bootable.)
It simplifies the coding challenges if you program the flash with something that is already in RAM. You can create and debug, one time, the code that reads from RAM and writes to flash, and reads from flash and writes to RAM. DONE. Now you can work on separate code that receives data from serial to RAM, or sends it from RAM to serial. DONE. Then work on code that does the same over Ethernet or USB or whatever. DONE. You don't have to deal with inventing a protocol or solving the problem of timing.
Flash writing is very slow, and even xmodem at a moderate speed can be way too fast, so you have to buffer that data in RAM anyway. You might as well make the tasks completely separate: instead of an xmodem or any other serial based flash loader with a big RAM based FIFO, just move the data to RAM, then separately go from RAM to flash. Same for other interfaces. It is technically possible to buffer the data and give the illusion of going from the download interface straight to flash, and depending on the protocol it is technically possible to hold off the sender so that as little as one flash page is required in RAM before programming flash.
With the older parallel flashes you could do something pretty cool which I don't think most people figured out. When you stop writing to the flash page for some known period of time, the flash automatically starts to program that page, and you have to wait 10 ms or something like that before it is done. What folks assumed was that you had to program sequential addresses and had to get the new data for the next address within that period of time, which would demand high serial port speeds, etc. The reality is that you can pound the same address over and over again with the same data and the flash won't start to program the page, so the download interface can be infinitely slow. Serial flashes work differently and either don't need tricks or have different tricks.
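The "download to RAM first, then program flash separately" split described above can be sketched as two independent stages. This is only a Python model: the flash is a bytearray, the 256-byte page size is a hypothetical value, and the function names are made up for illustration. A real driver would issue erase and program commands and wait for each page to finish.

```python
PAGE_SIZE = 256  # hypothetical flash page size in bytes


def receive_to_ram(chunks):
    """Stage 1: collect incoming chunks (serial, TFTP, ...) into a RAM buffer."""
    image = bytearray()
    for chunk in chunks:
        image += chunk
    return image


def program_flash(flash, image, base=0, page_size=PAGE_SIZE):
    """Stage 2: write the buffered image to flash one page at a time.

    Returns True if reading the flash back matches the image.
    """
    if base + len(image) > len(flash):
        raise ValueError("image does not fit in flash")
    for offset in range(0, len(image), page_size):
        page = image[offset:offset + page_size]
        flash[base + offset:base + offset + len(page)] = page
        # A real driver would wait here for the page program to complete.
    return flash[base:base + len(image)] == image


# Stage 1 can be as slow or as bursty as the link likes ...
image = receive_to_ram([b"abc", b"defg", bytes(300)])

# ... because stage 2 only starts once the whole image is in RAM.
flash = bytearray(b"\xff" * 1024)  # erased flash reads back as 0xFF
verified = program_flash(flash, image, base=256)
```

Because neither stage knows about the other, the receive side can be swapped for a different interface without touching the flash code, which is the point the answer makes.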
RAM/FLASH is not some industry term. It likely means that .text is in ROM (flash) and .data and .bss are in RAM. A copy of the initial state of .data will probably be in flash as well and copied to RAM before main() is called; likewise .bss will be zeroed before main() is called. Look at crt0.S for most platforms in the GNU sources (glibc, or is it gcc, I don't know) to get the gist of how the bootstrap works in a generic fashion.
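The .data copy and .bss clear described above usually amount to two small loops in the startup code. The following is a Python model of that idea, not a transcription of any real crt0.S: flash and RAM are bytearrays, and the addresses and sizes (which a linker script would normally supply as symbols) are hypothetical parameters.

```python
def c_startup(flash, ram, data_load, data_start, data_size, bss_start, bss_size):
    """Model of what startup code does before it calls main()."""
    # .data: initial values are stored in flash and copied to their RAM address.
    ram[data_start:data_start + data_size] = flash[data_load:data_load + data_size]
    # .bss: has no stored image, so it is simply filled with zeros.
    ram[bss_start:bss_start + bss_size] = bytes(bss_size)


# Hypothetical image: 4 bytes of "code" followed by the .data initial values.
flash = bytearray(b"CODE" + b"\x01\x02\x03\x04")
ram = bytearray(b"\xaa" * 16)  # RAM contents are arbitrary after power-up

c_startup(flash, ram, data_load=4, data_start=0, data_size=4,
          bss_start=4, bss_size=8)
```

After the call, the first four bytes of ram hold the initial values from flash, the next eight are zero, and the rest is untouched, while .text never left flash.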
A bootloader is not required to run Linux or other operating systems. You don't NEED U-Boot, but it is quite useful. Linux is pretty easy: you copy the kernel and root file system, either set some registers or some tags in memory or both, then branch to the entry point in the kernel and Linux takes over from there. Because Linux is so complicated it is desirable to have a complicated bootloader that can capitalize on high speed interfaces like Ethernet (rather than being limited to serial or slower).
I would add something regarding your question Q2.
Q2: Why there is a need for Bootloader to copy application to RAM and then run the application from RAM? What difficulties we will face if our application runs from FLASH?
It is not only about having SPI or similar serial external code memory (which is not that common anyway).
Even an external ROM/flash/EPROM connected to the usual high speed parallel bus will prevent a system from running at its maximum clock (with zero wait states), even on relatively slow MCUs, because of the external memory access time. You would need a 10 ns flash access time for a 100 MHz clock, which is not so easy to get (if economically possible at all). And you would agree that 100 MHz is not such a mind-boggling speed any more :-)
That is why many MCU/CPU architectures do tricks such as reading multiple instructions at once, having an internal cache, or doing whatever is needed to compensate for slow external code memory. For the most part, only older 8-bit architectures can execute code directly from flash memory ('in place').
Even if your only code memory is the internal flash, something needs to be done to speed it up. Take a look, for example, at this article:
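The 10 ns figure follows directly from the clock period, and the same arithmetic gives a rough wait state count for slower memory. Below is a small sketch under a simplifying assumption: one access must fit in a whole number of clock cycles, and bus setup/hold times and address decoding are ignored. The function names and the 50 ns example access time are hypothetical.

```python
def clock_period_ps(clock_hz):
    """Clock period in picoseconds (rounded down for clocks that don't divide evenly)."""
    return 10**12 // clock_hz


def wait_states(clock_hz, access_time_ns):
    """Extra cycles needed so that one memory access fits the access time."""
    period_ps = clock_period_ps(clock_hz)
    cycles = -(-access_time_ns * 1000 // period_ps)  # ceiling division
    return max(0, cycles - 1)


period = clock_period_ps(100_000_000)   # 10000 ps, i.e. 10 ns
fast = wait_states(100_000_000, 10)     # 0: a 10 ns memory keeps up
slow = wait_states(100_000_000, 50)     # 4: a 50 ns memory costs 4 extra cycles
relaxed = wait_states(20_000_000, 50)   # 0: the same memory at 20 MHz
```

The last line shows why lowering the clock is one of the possible workarounds: at 20 MHz the period is 50 ns, so the hypothetical 50 ns memory needs no wait states at all.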
http://www.iqmagazineonline.com/magazine/pdf/v_3_2_pdf/pg14-15-18-19-9Q6Phillips-Z.pdf
It describes how the ARM7 incorporated something they called the MAM (Memory Accelerator Module). It is a good read, and you will find some measures there to speed up code memory access for the specific ARM7 architecture (the same goes for most others):
Limit maximum clock frequency (from 80 MHz to about 20 MHz for the example in the article)
Insert wait-cycles during flash accesses
Use an instruction cache
Copy the program code from flash to RAM
Obviously, if the instruction cache is not an option (too small, or the clock too high) you are really left only with execution from RAM, after relocating the code there at startup.
There is also the option of running only specific sections of code from RAM, which can be specified to the linker. For DSP (Digital Signal Processing) systems, there was really no option to run from EPROM/flash even in the old days with clocks of only a few tens of MHz, let alone now.
Another issue is debugging: the options for debugging code placed in ROM, or even flash, are very limited (on most systems you have to move a section of the code to RAM to be able to set a breakpoint).
Regarding Q2, one of the difficulties you may face when executing from flash is performing a code update. If you are executing from the same block of flash you are trying to update, the system will crash. This depends on your system architecture (how your application and bootloader are organized in flash) but may be particularly hard to avoid if you are trying to update the bootloader itself.
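One way to guard against this is to check, before erasing anything, that the region to be updated does not overlap the region the updater itself is running from. The sketch below is a Python model of that check only. The (start, size) layout values are hypothetical, and on real parts the granularity that matters may be a whole erase block or flash bank rather than a byte range.

```python
def regions_overlap(a_start, a_size, b_start, b_size):
    """True if the two [start, start + size) ranges share at least one byte."""
    return a_start < b_start + b_size and b_start < a_start + a_size


def safe_to_update(executing, target):
    """True if 'target' can be rewritten while code runs from 'executing'."""
    return not regions_overlap(*executing, *target)


# Hypothetical layout: (start address, size in bytes).
bootloader = (0x0000, 0x4000)
application = (0x4000, 0x1C000)

app_update = safe_to_update(executing=bootloader, target=application)   # True
self_update = safe_to_update(executing=bootloader, target=bootloader)   # False
```

The second case is the bootloader updating itself, which is the situation the answer singles out as hard to avoid; running the updater from RAM or from a different flash region makes the check pass.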