How does DMA work? What is the workflow of DMA? [closed] - embedded

I am trying to learn the basics of DMA and have watched a few YouTube videos on the topic.
I have a few questions:
Can we set/reset bits of registers using DMA? For example, if I want to set the 4th bit of GPIO_ODR, can I do it using DMA?
Does DMA use a polling method or an interrupt method?
If I want to set and reset bits in the registers of the GPIO (general-purpose input/output) peripheral, what would the DMA workflow be?
Will it be:
CPU->DMA->Peripheral->Register
and then for reverting back
Register->Peripheral->DMA->CPU
Is this workflow correct?
Please help me with this. Also, it would be great if you could explain it in simple words because I am completely new to this topic.
Thanks!
-Aditya Ubarhande

Disclaimer: My answer is based on my experience on DMA hardware of STM32 microcontrollers.
If the DMA you're using has access to the memory region where hardware registers reside (like GPIO), then yes, you can move data into these registers and change the bits. But be aware that this doesn't give you bit-wise read-modify-write access: DMA writes (or reads) the memory region (which can be 8, 16 or 32 bits wide, etc.) all at once. On STM32, timer-triggered, DMA-driven GPIO access can be used for synchronous parallel port implementations. On the other hand, DMA is generally used for event-triggered bulk memory transfers, so using it for one-time manipulation of hardware registers makes little sense.
In general, you arm the DMA and it generates an interrupt when its job is done (or half complete) or when some error occurs. DMA has its own control & status registers, so you can poll them instead of enabling & using interrupts. But most of the time, using interrupts is a better idea. It's also an option (probably a bad one) to fire & forget it, if you don't need to be notified when the transfer is complete.
In general, for any DMA transfer you configure source address, destination address, data length & width and the triggering condition (unless it's a memory-to-memory transfer). Of course, there can be additional settings like enabling interrupts etc.
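For instance, a bare-register sketch of such a configuration on an STM32F1-class DMA channel could look like the code below (a memory-to-memory transfer, so no trigger is needed). The register and bit names follow the STM32F1 CMSIS device header and differ on other families, so treat this purely as an illustration, not as drop-in code:

    /* Sketch of the "configure source, destination, length/width, arm it,
       then wait for the interrupt" flow, assuming an STM32F1-class part and
       its CMSIS header. Check your own part's reference manual. */
    #include "stm32f1xx.h"   /* assumption: STM32F1 CMSIS device header */

    static uint32_t src[16];
    static uint32_t dst[16];
    static volatile int transfer_done;

    void start_mem_to_mem_dma(void)
    {
        RCC->AHBENR |= RCC_AHBENR_DMA1EN;        /* clock the DMA controller      */

        DMA1_Channel1->CCR   = 0;                /* channel off while configuring */
        DMA1_Channel1->CPAR  = (uint32_t)src;    /* source ("peripheral" side)    */
        DMA1_Channel1->CMAR  = (uint32_t)dst;    /* destination (memory side)     */
        DMA1_Channel1->CNDTR = 16;               /* number of data items          */

        NVIC_EnableIRQ(DMA1_Channel1_IRQn);      /* we want the completion IRQ    */

        DMA1_Channel1->CCR = DMA_CCR_MEM2MEM     /* memory-to-memory: no trigger  */
                           | DMA_CCR_PSIZE_1     /* 32-bit items on source side   */
                           | DMA_CCR_MSIZE_1     /* 32-bit items on dest. side    */
                           | DMA_CCR_PINC        /* increment source address      */
                           | DMA_CCR_MINC        /* increment destination address */
                           | DMA_CCR_TCIE        /* interrupt on transfer complete*/
                           | DMA_CCR_EN;         /* arm it: transfer starts now   */
    }

    void DMA1_Channel1_IRQHandler(void)
    {
        if (DMA1->ISR & DMA_ISR_TCIF1) {
            DMA1->IFCR = DMA_IFCR_CTCIF1;        /* acknowledge the flag */
            transfer_done = 1;                   /* tell the application */
        }
    }

For a peripheral transfer (e.g. memory to GPIO_ODR), the structure is the same but you would point CPAR at the peripheral register, clear MEM2MEM, and route the trigger (e.g. a timer update event) to that DMA channel.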

Related

I/O Data Transfer Modes and I/O Address Access [closed]

I've learned that there are 3 ways to make an I/O connection:
1- Programmed I/O (polling)
2- Interrupt-Driven I/O
3- Direct Memory Access (DMA)
Now, I need to relate this to how I/O addresses are actually accessed (isolated I/O or memory-mapped I/O):
DMA
Memory mapping does not affect the direct memory access (DMA) for a device, because, by definition, DMA is a memory-to-device communication method that bypasses the CPU.
This is all the information I have.
Now, what about interrupt-driven and programmed I/O: which addressing modes are used in those cases?
Can a microcontroller use both addressing modes (isolated/memory-mapped), or only one of them?
Am I understanding these topics correctly, or do I have any misconceptions?
Port mapped vs memory mapped (Communication)
This is how the IO access is performed, i.e. how the CPU communicates with the device.
With port mapped IO the CPU uses special instructions (e.g. x86's in and out) to read/write from a device in a special IO address space you can't access with load/store instructions.
With memory mapped IO the CPU performs normal memory loads and stores to communicate with a device.
The latter is usually more granular and uniform when it comes to security permissions and code generation.
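To make the distinction concrete, here is a small C sketch. The memory-mapped register address is a made-up example, and the x86 part uses the usual GCC inline-assembly idiom for the in/out instructions:

    #include <stdint.h>

    /* Memory-mapped IO: the device register is just an address, reached with
       ordinary load/store instructions. The address below is hypothetical. */
    #define DEV_STATUS_REG  (*(volatile uint32_t *)0x40001000u)

    uint32_t mmio_read_status(void)
    {
        return DEV_STATUS_REG;                   /* plain memory load */
    }

    /* Port-mapped IO (x86): a separate IO address space reachable only through
       the dedicated in/out instructions, here via GCC inline assembly. */
    static inline uint8_t pio_inb(uint16_t port)
    {
        uint8_t value;
        __asm__ volatile ("inb %1, %0" : "=a"(value) : "Nd"(port));
        return value;
    }

    static inline void pio_outb(uint16_t port, uint8_t value)
    {
        __asm__ volatile ("outb %0, %1" : : "a"(value), "Nd"(port));
    }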
Polling vs Interrupt driven (Notification)
This is how notifications from the devices are received by the CPU.
With polling the CPU will repeatedly read a status register from the device and check if a completion bit (or equivalent) is set.
With interrupt driven notifications the device will raise an interrupt without the need for the CPU to do any periodic work.
Polling hogs the CPU but has lower latency for some workloads.
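A polling loop is literally a busy wait on that status register. A minimal sketch, with a hypothetical register address and flag bit:

    #include <stdint.h>

    #define DEV_STATUS    (*(volatile uint32_t *)0x40001000u)  /* hypothetical register */
    #define DEV_DONE_BIT  (1u << 0)                            /* hypothetical flag     */

    void wait_for_completion(void)
    {
        /* The CPU does nothing else while spinning here (the "hogs the CPU"
           part), but it reacts to the bit flipping with minimal latency. */
        while ((DEV_STATUS & DEV_DONE_BIT) == 0) {
            /* busy wait */
        }
    }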
DMA vs non-DMA (Transfer)
This is how the data is transferred from the device to the CPU.
With DMA the device will write directly into memory.
Without DMA the CPU will have to read the data repeatedly (either with port or memory mapped IO).
All these three dimensions are independent of each other, you can combine them however you like (e.g. port mapped, interrupt driven, DMA).
Note, however, that the nomenclature is not consistent in the literature.
Also, different devices have different interfaces that may not need all of the three dimensions (e.g. a very simple output-only GPIO pin may have a single write-only register, so it makes no sense to talk about polling or DMA in this case).

How to write sensor libraries from scratch [closed]

Can someone explain to me how I can write a sensor library from scratch? I read the datasheet and some Arduino libraries, but I did not understand how they were written.
It's not a trivial task to write a library for embedded projects. Most of the time, it's almost impossible to write a completely generic one that can satisfy everyone's needs.
Don't let Arduino library examples fool you. Most of them are not designed and optimized for real world applications with strict timing constraints. They are useful when reading that sensor is the only thing your embedded system does. You can also use them sequentially in a master loop when blocking read operations are not a concern.
Complex embedded applications don't fit into this scheme. Most of the time, you need to execute more than one task at the same time, and you use interrupts and DMA to handle your sensor streams. Sometimes you need to use an RTOS. Timing constraints can be satisfied by using the advanced capabilities of STM32 Timer modules.
Connecting timers, DMAs, interrupts and communication (or GPIO) modules together so that they work in harmony is not easy (also add an RTOS, if you use one), and it's almost impossible to generalize. Here is a list of examples that come to mind:
You need to allocate channels for DMA usage. Your library must be aware of the channel usage of other libraries to avoid conflicts.
TIM modules are not all the same. They may have different numbers of I/O pins. Some specific peripherals (like the ADC) can be triggered by some TIM modules but not by others. There are constraints if you want to chain them; you can't just take one timer and connect it to some other one.
The library user may want to use DMAs or interrupts. Maybe even an RTOS. You need to create different API calls for all possible situations.
If you use an RTOS, you must consider different flavors. Although the RTOS concepts are similar, their approaches to these concepts are not the same.
HW pin allocation is a problem. In Arduino libraries, the library user just says "Use pins 1, 2, 3 for the SPI". You can't do this in a serious application. You need to use pins which are connected to the relevant hardware modules, but you also need to avoid conflicts with other modules.
Devices like STM32 have a clock tree, which affects the clocks of each peripheral module. Your library must be aware of the clock frequency of the module it uses. Low-power modes can change these settings and break a library which isn't flexible enough for such changes. Some communication modules have more complicated timing settings, like the CAN bus module for example, which needs a complex calculation for both bit rate and bit sampling position.
[And probably many more reasons...]
This is probably why the uC vendors provide offline configuration & code generation tools, like CubeMX for STM32s. Personally, I don't like them and I don't use them. But I must admit that I still use the CubeMX GUI to determine HW pin allocations, even though I don't use the code it generates.
It's not all hopeless if you only want to create libraries for your own use and your own programming style, because you can define the constraints precisely from the start. I think creating libraries is easier in C++ than in C. While working on different projects, you slowly create and accumulate your own code snippets, and with some experience these can evolve into easily configurable libraries. But don't expect that someone else will benefit from them as much as you do.
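One common way to keep such a library at least somewhat portable despite all of the above is to inject the bus access through callbacks, so the library itself never touches SPI/I2C registers, DMA channels or pin allocations. A rough C sketch of that idea, with all names made up for illustration:

    #include <stddef.h>
    #include <stdint.h>

    /* The application supplies the bus transport; the library only knows
       about these callbacks, not about SPI/I2C registers, DMA channels or
       pin allocations. */
    typedef struct {
        int  (*write)(uint8_t reg, const uint8_t *data, size_t len, void *ctx);
        int  (*read) (uint8_t reg,       uint8_t *data, size_t len, void *ctx);
        void (*delay_ms)(uint32_t ms);
        void *ctx;                  /* e.g. a bus handle owned by the caller */
    } sensor_bus_t;

    typedef struct {
        sensor_bus_t bus;
    } sensor_t;

    int sensor_init(sensor_t *s, const sensor_bus_t *bus)
    {
        s->bus = *bus;
        /* the chip-specific init sequence would go here, via s->bus.write() */
        return 0;
    }

    int sensor_read_raw(sensor_t *s, uint16_t *out)
    {
        uint8_t buf[2];
        /* 0x00 is a placeholder for the chip's data register */
        int err = s->bus.read(0x00, buf, sizeof buf, s->bus.ctx);
        if (err != 0)
            return err;
        *out = (uint16_t)((buf[0] << 8) | buf[1]);
        return 0;
    }

Whether those callbacks block, use interrupts or use DMA is then the application's decision, which is exactly the flexibility that is hard to bake into the library itself.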

About embedded firmware development [closed]

In the past few days I have realized how important the RTOS layer on top of the embedded hardware is.
My question is:
Is there any bifurcation between a device driver (written in C and flashed directly onto the microcontroller) and a Linux device driver?
This question is a little broad, but an answer, a little broad itself, can be given.
The broadness comes from the fact that "embedded hardware" is not a precise term. That hardware ranges from 4-bit or 8-pin microcontrollers up to big CPUs which have many points in common with the processors typically used in Linux machines (desktops and servers). Linux itself can be tailored to the point that it no longer resembles a normal operating system.
Anyway, a few generally acceptable points can be made. Linux is not, in its "plain" version, a real-time operating system - with the term RTOS, by contrast, the "real time" part is implied. So this can be one bifurcation. But the most important thing, I think, is that embedded firmware tries to address the hardware and the task to be done without anything else added. A Linux O.S. instead is general purpose - it offers a lot of services and functionalities that, in many cases, are not needed and only add cost, lower performance and more complication.
Often, in a small or medium embedded system, there is not even a "driver": the hardware and the application talk directly to each other. Of course, when the hardware is (more or less) standard (like a USB port, an Ethernet controller, a serial port), the programming framework can provide ready-to-use software that is sometimes called a "driver" - but very often it is not a driver, just a library with a set of functions to initialize the device and exchange data. The application uses those library routines to manage the device directly. The O.S. layer is not present or, if the programmer wants to use an RTOS, they must check that there are no conflicts.
A Linux driver is not targeted at the application, but at the kernel. And the application seldom talks to the driver - it instead uses a uniform language (typically the "file system idiom") to talk to the kernel, which in turn calls the driver on behalf of the application.
A simple example I know very well is a serial port. Under Linux you open a file (maybe /dev/ttyS0), use some IOCTLs and the like to set it up, and then start to read and write to the file. You don't even care that there is a driver in the middle, and the driver was written without knowledge of the application - the driver only interacts with the kernel.
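For example, that flow looks roughly like this with the standard POSIX termios API (error handling mostly omitted; the device path and baud rate are just examples):

    #include <fcntl.h>
    #include <termios.h>
    #include <unistd.h>

    int open_serial(const char *path)            /* e.g. "/dev/ttyS0" */
    {
        int fd = open(path, O_RDWR | O_NOCTTY);
        if (fd < 0)
            return -1;

        struct termios tio;
        tcgetattr(fd, &tio);                     /* read current settings     */
        cfmakeraw(&tio);                         /* raw mode: no line editing */
        cfsetispeed(&tio, B115200);              /* 115200 baud, both ways    */
        cfsetospeed(&tio, B115200);
        tcsetattr(fd, TCSANOW, &tio);            /* apply immediately         */

        return fd;                               /* from here on: read()/write() */
    }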
In many embedded cases instead, you set up the serial port by writing directly to the hardware registers; you then write two interrupt routines which read from and write to the serial port, getting and putting data from/into RAM buffers. The application reads and writes data directly to those buffers. Special events (or not so special ones) can be signaled directly from the interrupt handlers to the application. Sometimes I implement the serial protocol (checksum, packets, sequences) directly in the interrupt routine. It is faster and simpler, and uses fewer resources. But clearly this piece of software is no longer a "driver" in the common sense.
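The receive side of that bare-metal approach could be sketched like this (the UART register names and addresses are hypothetical placeholders for whatever your MCU actually provides):

    #include <stdint.h>

    /* Hypothetical UART registers -- substitute your MCU's actual ones. */
    #define UART_STATUS  (*(volatile uint32_t *)0x40010000u)
    #define UART_DATA    (*(volatile uint32_t *)0x40010004u)
    #define UART_RXNE    (1u << 0)     /* "receive register not empty" flag */

    #define RX_BUF_SIZE 128u
    static volatile uint8_t  rx_buf[RX_BUF_SIZE];
    static volatile uint32_t rx_head, rx_tail;

    /* Interrupt handler: move each received byte into a RAM ring buffer and
       return. The application drains the buffer at its own pace. */
    void uart_rx_irq_handler(void)
    {
        while (UART_STATUS & UART_RXNE) {
            rx_buf[rx_head % RX_BUF_SIZE] = (uint8_t)UART_DATA;
            rx_head++;
        }
    }

    /* Application side: non-blocking read from the ring buffer. */
    int uart_read_byte(uint8_t *out)
    {
        if (rx_tail == rx_head)
            return 0;                        /* nothing buffered yet */
        *out = rx_buf[rx_tail % RX_BUF_SIZE];
        rx_tail++;
        return 1;
    }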
Hope this answer explains at least a part of the whole picture, which is very large.

When should I use I2C and when should I use SPI? [closed]

Out of the SPI and I2C communication protocols, which one is faster? I have read in an article that SPI is faster, but it did not explain why. Is it because of less overhead in SPI compared to I2C (like start, ACK, stop)? Which of the two is better? I have seen that SPI is mostly preferred for ADCs, but why? For flash memory I have also mostly seen SPI being used, but for sensors both SPI and I2C. Now, what makes us decide that for one peripheral I should go with SPI and for another I2C is preferred?
A better question is when should I use I2C and when should I use SPI. As always in engineering, each protocol has its own pros and cons. I compared them below so you will be able to assess which is a better match for your requirements.
Quick comparison of my own:
Additional remarks:
SPI is usually used for slave devices where speed matters, e.g. ADC peripherals, flash memories, etc.
I2C is usually used for slave devices which are fine with the I2C speed constraints, or which are rather slow, like sensors that can take a longer time to complete a measurement, e.g. the popular temperature and humidity sensor HTU21-D, which performs a measurement in 3-16 ms over I2C (the time depends on the selected measurement resolution).
Post explaining I2C bus length constraints.
Post explaining why SPI is faster than I2C
PS:
The fastest ADC peripherals use neither I2C nor SPI; they use parallel I/O.
Keep in mind that for simple (hobby) projects it usually doesn’t matter and you will be fine with either of them.
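To illustrate the timing point about slow I2C sensors above, a read of the HTU21-D might look like the sketch below. The i2c_write()/i2c_read()/delay_ms() helpers are placeholders for whatever your platform provides, not a real API; the 0x40 address and 0xF5 no-hold humidity command match my reading of the HTU21-D datasheet, but verify them against your part:

    #include <stdint.h>

    /* Placeholders for your platform's own I2C and timing API. */
    extern int  i2c_write(uint8_t addr, const uint8_t *data, uint32_t len);
    extern int  i2c_read (uint8_t addr,       uint8_t *data, uint32_t len);
    extern void delay_ms(uint32_t ms);

    #define HTU21D_ADDR     0x40   /* 7-bit I2C address                          */
    #define HTU21D_TRIG_RH  0xF5   /* trigger humidity measurement, no-hold mode */

    int htu21d_read_humidity_raw(uint16_t *out)
    {
        uint8_t cmd = HTU21D_TRIG_RH;
        uint8_t buf[3];

        if (i2c_write(HTU21D_ADDR, &cmd, 1) != 0)
            return -1;

        delay_ms(16);              /* worst-case conversion time noted above */

        if (i2c_read(HTU21D_ADDR, buf, 3) != 0)   /* MSB, LSB, checksum */
            return -1;

        *out = (uint16_t)((buf[0] << 8) | buf[1]);
        return 0;
    }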

Static or dynamic width access to a computer bus? [closed]

Suppose we have a simple processor, perhaps an embedded system, with one system bus - for the sake of argument, a 32-bit bus.
Now, if we have a couple of peripherals attached to the bus, one named PER0 for example, we can do two things:
Allow it to have fixed-width access to the main bus, for example 8 bits, so that PER0 will always communicate with the bus in 8-bit packages. This we can call static-width access.
Allow it to choose how it will communicate with the bus in terms of data size, by using signals with which it tells the processor the access mode it wants to use. For example, we create two signals, A1 and A0, between the processor and PER0, whose values will say:
00 - wait
01 - 8bit
10 - 16bit
11 - 32bit
and so the processor will know whether to send 8-bit or 32-bit data on its bus, based on the values of A1 and A0. This we can call dynamic-width access to the bus.
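As a toy illustration only (real width negotiation happens in bus hardware, not in C), the A1/A0 encoding above could be modeled like this:

    #include <stdint.h>

    /* Encoding of the A1/A0 width-select signals described above. */
    typedef enum {
        ACCESS_WAIT  = 0x0,   /* 00 - peripheral not ready      */
        ACCESS_8BIT  = 0x1,   /* 01 - transfer 8 bits per beat  */
        ACCESS_16BIT = 0x2,   /* 10 - transfer 16 bits per beat */
        ACCESS_32BIT = 0x3    /* 11 - transfer 32 bits per beat */
    } access_width_t;

    /* Number of bytes the processor should drive on the bus for a given code. */
    static inline uint32_t access_width_bytes(access_width_t w)
    {
        switch (w) {
        case ACCESS_8BIT:  return 1;
        case ACCESS_16BIT: return 2;
        case ACCESS_32BIT: return 4;
        default:           return 0;   /* ACCESS_WAIT: no transfer this cycle */
        }
    }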
Question:
In your experience, which of these two methods is preferred, and why? Also, in which cases should each be implemented? And finally, considering embedded systems, which method is more widespread?
EDIT: I would like to expand on this topic, so I'm not asking for personal preferences, but for further information about these two methods, and their applications in computer systems. Therefore, I believe that this qualifies as a legitimate stackoverflow question.
Thanks!
There are multiple considerations. Naturally, the dynamic width would allow better utilization of bandwidth in case you have transactions of multiple sizes. On the other hand, if you transfer some 8 bytes, and then the next 8, you double the overhead compared to the baseline (transferring the full block in one go, assuming you can cache it until it's fully consumed). So basically you need to know how well you can tell in advance which chunks you're going to need.
There's an interesting paper about the possibility of using such dynamically sized transactions between the CPU and the DRAM:
Adaptive granularity memory systems: a tradeoff between storage efficiency and throughput
There you can see the conflict since it's very hard to tell which transactions you'll need in the future and whether bringing only partial data may cause a degradation. They went to the effort of implementing a predictor to try and speculate that. Note that this is applicable to you only if you're dealing with coherent memory.