Bit packing blocks in Simulink?

Could anyone please explain to me what the bit packing blocks in Simulink do? I am currently learning programming, Simulink and control theory, so I am not very proficient. I tried using the help windows in MATLAB and also Googling, but I haven't found anything that explains it well.
Based on my online research, bit packing is used to condense data packets before they are sent to another block, so that the program runs faster?
Also, in Simulink, what is the "Bit patterns:" box used for? For example, if I type {[0:7]} what would that mean?
Edit: Where should I go to learn more about this? Are there any good documents online?

Byte packing and unpacking are used mostly for communication between an xPC Target and another device, over Ethernet for example. If bytes are coming in, they can be condensed (packed) into other data types such as singles or doubles. Conversely, if you are sending out doubles or singles, you often need to split them into bytes first.
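This isn't Simulink-generated code, just a minimal plain-C sketch of what packing and unpacking amount to at the byte level, assuming both ends agree on the byte order:

```c
#include <stdint.h>
#include <string.h>

/* Unpack: reassemble 8 received bytes into a double.
   Assumes sender and receiver use the same byte order. */
double unpack_double(const uint8_t bytes[8])
{
    double value;
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Pack: split a double into 8 bytes for transmission. */
void pack_double(double value, uint8_t bytes[8])
{
    memcpy(bytes, &value, sizeof value);
}
```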

How could I have known ETH won't work with AXI SRAM? Is it the same for SDMMC2? STM32H745

A while ago I wrote my own ethernet driver for fun, and most of the time was spent banging my head on the keyboard because it wasn't working - as is tradition.
The problem ended up being that Ethernet simply couldn't read from or write to AXI SRAM.
I made a very long post about it (along with a couple of other matters) on ST's community forum, which went unanswered, and I eventually forgot about it.
The reason I ask here how I should have known is that it doesn't really seem to be mentioned anywhere. The bus interconnect table and diagrams don't seem to show any potential problem:
And the block diagram:
And maybe the ART isn't what I interpret from this:
but what I take from it is that it serves as an accelerator for pre-fetching instructions to be executed by (presumably) the M4 processor from D1 memory, and that it establishes a connection to D1 memory in general.
Is this just me not knowing the meaning of the word "access"? English isn't my first language but I'm pretty sure when you "gain access" to something, that means you get to play around with it, so reading and writing.
This has come to mind after so long because now I want to use an SD card for something I'm doing, and I find it necessary to write from D2 memory into the card, and then from the card into D1 memory.
SDMMC1 is out of the question since it can't even interact with D2 memory at all, and for SDMMC2 I'm afraid I'll have the same scenario as I did with Ethernet.
I realize I can still regular-dma things around, but that's quite a bit of extra complexity and extra memory use.
So - what did I miss that would have let me know I can't ethernet-dma into axi sram? And should I expect it to prevent me from sdmmc2-dma-ing into axi sram as well?
Thank you!
I didn't write my own driver, but used the Cyclone TCP/IP stack on the CM7 core. I had the same problem when trying to put my buffers/descriptors into SRAM4.
I observed that the REB, TEB and FBE bits in ETH_DMACSR were all set to 1. After switching to SRAM3, everything worked like a charm.
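For reference, moving the descriptors/buffers usually comes down to pinning them to a named section that your linker script maps to a RAM region the Ethernet DMA can reach. A sketch for GCC is below; the section names are examples rather than anything standard, and you still need the matching linker-script entries plus an MPU configuration that makes the region non-cacheable (or explicit cache maintenance):

```c
#include <stdint.h>

#define ETH_RX_DESC_CNT 4
#define ETH_RX_BUF_SIZE 1536

/* Simplified view of one ETH DMA descriptor (4 words, DES0..DES3). */
typedef struct { volatile uint32_t des[4]; } eth_dma_desc_t;

/* Place descriptors and buffers in a region the ETH DMA can reach
   (e.g. SRAM3 in the D2 domain on the STM32H745). The sections
   ".eth_descriptors" / ".eth_buffers" must exist in your linker script. */
__attribute__((section(".eth_descriptors"), aligned(32)))
static eth_dma_desc_t rx_descriptors[ETH_RX_DESC_CNT];

__attribute__((section(".eth_buffers"), aligned(32)))
static uint8_t rx_buffers[ETH_RX_DESC_CNT][ETH_RX_BUF_SIZE];
```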
Unfortunately, I am unable to answer your question and say how this will work with SDMMC :(

How to make an embedded system configurable without updating the whole firmware

I'm totally a newbie in embedded software. Currently, I'm working on a project that implements an image processing pipeline on an ARM Cortex-M4 based MCU (STM32F446RE).
I would like to be able to configure the parameters of the pipeline on the fly without actually updating the entire firmware, since we're using LoRa, which has low bandwidth.
I have googled for several hours and could not find any valid solution. So could you please point me in a direction? Thank you very much.
BTW, I don't know if this is relevant, but I'm using FreeRTOS kernel with CMSIS RTOS API v2.
If you are asking this question, I would hope that either:
The board is still under design or
You have a board that was designed by someone who has thought about these issues.
If #2, speak to whoever designed the board, and find out what resources were put in, to handle these issues.
If #1, presumably you have input into the design.
Necessary resources:
Non-volatile storage: flash, eeprom, etc.
One or more ways to write parameters to that non-volatile storage
Desirable resource: communication line for input/output while running (serial is often used).
Once you have these resources, you do the following:
Design the variables, data structures, etc. to hold the parameters
Design your non-volatile storage, taking into account:
a. The features/limitations of your media (for example, flash memory generally requires an erase before writing; the erase takes time and must be done by sector, not by individual bytes).
b. Verification: your program should have a way to verify that the non-volatile storage holds valid values (not garbage, not all 0xFFs) and should either fail or fall back to defaults if it does not.
Then you can write a program using this.
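As a minimal sketch of what such a parameter block might look like in C (the field names, the CRC helper and the flash address are placeholders, not anything prescribed):

```c
#include <stddef.h>
#include <stdint.h>

#define PARAM_MAGIC   0x50524D31u   /* "PRM1": distinguishes a programmed block from erased flash */
#define PARAM_VERSION 1u

typedef struct {
    uint32_t magic;       /* identifies a valid block vs. 0xFF-filled erased flash */
    uint32_t version;     /* layout version, so firmware can migrate old blocks */
    uint32_t threshold;   /* example pipeline parameters */
    uint32_t gain_q8;
    uint32_t crc32;       /* CRC over everything above, written last */
} param_block_t;

/* Placeholders: supply your own CRC routine and the address of the reserved sector. */
extern uint32_t param_crc32(const void *data, size_t len);
#define FLASH_PARAMS ((const param_block_t *)0x08060000u)   /* example address */

static const param_block_t defaults = {
    .magic = PARAM_MAGIC, .version = PARAM_VERSION,
    .threshold = 100, .gain_q8 = 256, .crc32 = 0
};

/* Copy parameters into RAM, falling back to defaults if the stored copy is invalid. */
void params_load(param_block_t *out)
{
    const param_block_t *p = FLASH_PARAMS;
    uint32_t crc = param_crc32(p, offsetof(param_block_t, crc32));

    if (p->magic == PARAM_MAGIC && p->version == PARAM_VERSION && crc == p->crc32)
        *out = *p;
    else
        *out = defaults;   /* erased or corrupted sector: use safe defaults */
}
```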
You need to consider how you will write the values to the non-volatile memory
during development
in production
They are not likely to be the same.
During development, you want to be able to easily change values. You may have a way to burn your flash chip via JTAG. You may have a communications port which runs some kind of simple CLI, accepts commands via some protocol, or asks questions and reads the answers via a terminal emulator. The program can then write the values to the non-volatile memory.
In production, you will likely want to burn the 'correct' values once, when setting up the system, without too much operator involvement.
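Either way, on an STM32F4 with ST's HAL the write path itself might look roughly like this. The sector choice (the last 128 KB sector at 0x08060000 on an F446RE) is an assumption; reserve it in the linker script so code never ends up there:

```c
#include "stm32f4xx_hal.h"
#include <stddef.h>

#define PARAM_ADDR   0x08060000u     /* assumed parameter sector base */
#define PARAM_SECTOR FLASH_SECTOR_7

/* Erase the parameter sector and program 'len' bytes (len must be a multiple of 4). */
HAL_StatusTypeDef params_store(const void *blk, size_t len)
{
    FLASH_EraseInitTypeDef erase = {
        .TypeErase    = FLASH_TYPEERASE_SECTORS,
        .Sector       = PARAM_SECTOR,
        .NbSectors    = 1,
        .VoltageRange = FLASH_VOLTAGE_RANGE_3,
    };
    uint32_t sector_error;
    const uint32_t *src = blk;
    HAL_StatusTypeDef st;

    HAL_FLASH_Unlock();
    st = HAL_FLASHEx_Erase(&erase, &sector_error);       /* erase before write */

    for (size_t i = 0; st == HAL_OK && i < len / 4; i++)
        st = HAL_FLASH_Program(FLASH_TYPEPROGRAM_WORD,
                               PARAM_ADDR + 4u * i, src[i]);

    HAL_FLASH_Lock();
    return st;
}
```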
This is just a starting guideline...as mentioned in the comments, your question is very general.

How Do You Profile & Optimize CUDA Kernels?

I am somewhat familiar with the CUDA visual profiler and the occupancy spreadsheet, although I am probably not leveraging them as well as I could. Profiling & optimizing CUDA code is not like profiling & optimizing code that runs on a CPU. So I am hoping to learn from your experiences about how to get the most out of my code.
There was a post recently looking for the fastest possible code to identify self numbers, and I provided a CUDA implementation. I'm not satisfied that this code is as fast as it can be, but I'm at a loss as to how to figure out both what the right questions are and what tool I can get the answers from.
How do you identify ways to make your CUDA kernels perform faster?
If you're developing on Linux, then the CUDA Visual Profiler gives you a whole load of information, but knowing what to do with it can be a little tricky. On Windows you can also use the CUDA Visual Profiler, or (on Vista/7/2008) you can use Nexus, which integrates nicely with Visual Studio and gives you combined host and GPU profile information.
Once you've got the data, you need to know how to interpret it. The Advanced CUDA C presentation from GTC has some useful tips. The main things to look out for are:
Optimal memory accesses: you need to know what you expect your code to do and then look for exceptions. So if you are always loading floats, and each thread loads a different float from an array, then you would expect to see only 64-byte loads (on current h/w). Any other loads are inefficient. The profiling information will probably improve in future h/w.
Minimise serialization: the "warp serialize" counter indicates that you have shared memory bank conflicts or constant serialization; the presentation goes into more detail on what to do about this, as does the SDK (e.g. the reduction sample)
Overlap I/O and compute: this is where Nexus really shines (you can get the same info manually using cudaEvents), if you have a large amount of data transfer you want to overlap the compute and the I/O
Execution configuration: the occupancy calculator can help with this, but simple methods like commenting out the compute to measure expected vs. measured bandwidth are really useful (and vice versa for compute throughput)
This is just a start, check out the GTC presentation and the other webinars on the NVIDIA website.
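For the cudaEvents approach mentioned above, a minimal host-side timing sketch (C, CUDA runtime API) looks like this; the kernel launch itself is left as a placeholder since it depends on your code:

```c
#include <stdio.h>
#include <cuda_runtime.h>

void time_gpu_work(void)
{
    cudaEvent_t start, stop;
    float ms = 0.0f;

    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    /* ... launch your kernel (and/or async memcpys) here ... */
    cudaEventRecord(stop, 0);

    cudaEventSynchronize(stop);              /* wait for the GPU work to finish */
    cudaEventElapsedTime(&ms, start, stop);  /* elapsed GPU time in milliseconds */

    /* Divide bytes read+written by this time and compare against the device's
       peak bandwidth to judge whether the kernel is memory-bound. */
    printf("GPU time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
}
```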
If you are using Windows... Check Nexus:
http://developer.nvidia.com/object/nexus.html
The CUDA profiler is rather crude and doesn't provide a lot of useful information. The only way to seriously micro-optimize your code (assuming you have already chosen the best possible algorithm) is to have a deep understanding of the GPU architecture, particularly with regard to using shared memory, external memory access patterns, register usage, thread occupancy, warps, etc.
Maybe you could post your kernel code here and get some feedback?
The NVIDIA CUDA developer forum is also a good place to go for help with this kind of problem.
I hung back because I'm no CUDA expert, and the other answers are pretty good IF the code is already pretty near optimal. In my experience, that's a big IF, and there's no harm in verifying it.
To verify it, you need to find out if the code is for sure not doing anything it doesn't really have to do. Here are ways I can see to verify that:
Run the same code on the vanilla processor, and either take stackshots of it, or use a profiler such as Oprofile or RotateRight/Zoom that can give you equivalent information.
Run it on a CUDA processor, and do the same thing, if possible.
What you're looking for are lines of code that have high occupancy on the call stack, as shown by the fraction of stack samples containing them. Those are your "bottlenecks". It does not take a very large number of samples to locate them.

What microcontroller (and other components) would I need to create a timer device?

As a hobby project to keep myself out of trouble, I'd like to build a little programmable timer device. It will basically accept a program, which is a list of times, and then count down from each time.
I'd like to use a microcontroller that I can program in C or Java. I have used BASIC in the past to make a little autonomous robot, so this time around I'd like something different.
What microcontroller and display would you recommend? I am looking to keep it simple, so the program would be loaded into memory via a computer (serial is OK, but USB would make it easier).
Just use a PIC like 16F84 or 16F877 for this. It is more than enough.
For the display, use a 16 x 2 character LCD. It is easy to use and will give a nice look to your project.
The language doesn't really matter. You can use PIC C, Micro C or anything you like. The LCD interface is really easy to drive.
As for other components, you will just need a crystal and two capacitors for the oscillator, plus a pull-up resistor. The rest of the components depend on the input method you are going to use to set the times.
If you are using a computer to load the list, then you will need an additional circuit to convert the signal levels; use a MAX232 for that. If you want to use USB, go ahead and use a PIC with USB support (the 18F series).
The tutorials at sodoityourself.com are a nice set you can use, and you can purchase the parts from them as well; I have purchased from them before.
I would go with the MSP430. An eZ430 is $20, and you can get them at Digi-Key or from TI directly, then sets of 3 microcontroller boards for $10 after that. There is llvm and gcc (and binutils) compiler support. It is super simple to program, extremely small and extremely low power.
There are many ways to do this, and a number of people have already given pretty good suggestions. AVR or PIC are good starting points for a microcontroller that doesn't require too much in the way of complicated setup (hardware & software) or expense (these micros are very cheap). Honestly, I'm somewhat surprised that nobody has mentioned Arduino here yet. It has the advantage of being pretty easy to get started with, provides a USB connection (USB->Serial, really), and if you don't like the board the ATmega MCU is plugged into, you can later move the chip wherever you might want it. Also, while the provided programming environment offers some high-level tools to easily prototype things, you're still free to tweak the registers on the device and write any C code you might want to run on it.
As for an LCD display, I would recommend looking for anything that's either based on an HD44780 or emulates the behavior of one. These will typically use a set of parallel lines for talking to the display, but there are tons of code examples for interfacing with them. In Arduino's case, you can find examples for this type of display, and many others, on the Arduino Playground here: http://www.arduino.cc/playground/Code/LCD
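If you end up driving an HD44780 directly rather than through a library, the 4-bit write path is small. A platform-neutral sketch is below; gpio_write, delay_us and the pin names are placeholders for your own board support, and the (longer) init sequence is omitted:

```c
#include <stdint.h>

/* Placeholders for your platform's pin and delay routines. */
extern void gpio_write(int pin, int level);
extern void delay_us(uint32_t us);

enum { PIN_RS, PIN_EN, PIN_D4, PIN_D5, PIN_D6, PIN_D7 };   /* example pin ids */

static void lcd_write_nibble(uint8_t nibble)
{
    gpio_write(PIN_D4, nibble & 0x1);
    gpio_write(PIN_D5, (nibble >> 1) & 0x1);
    gpio_write(PIN_D6, (nibble >> 2) & 0x1);
    gpio_write(PIN_D7, (nibble >> 3) & 0x1);
    gpio_write(PIN_EN, 1);      /* the controller latches on the falling EN edge */
    delay_us(1);
    gpio_write(PIN_EN, 0);
    delay_us(50);               /* most commands finish in ~40 us; clear/home need ~2 ms */
}

static void lcd_write(uint8_t byte, int is_data)
{
    gpio_write(PIN_RS, is_data);    /* RS = 0 for commands, 1 for character data */
    lcd_write_nibble(byte >> 4);    /* high nibble first in 4-bit mode */
    lcd_write_nibble(byte & 0x0F);
}

void lcd_command(uint8_t cmd) { lcd_write(cmd, 0); }
void lcd_putc(char c)         { lcd_write((uint8_t)c, 1); }
```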
As far as a clock is concerned, you can use the built-in clock that many 8-bit micros provide these days, although it is not always ideal in terms of precision. You can find an example of doing this sort of thing on an Arduino here: http://www.arduino.cc/playground/Code/DateTime. If you want something a little more precise, you can get a DS1307 RTC (Arduino example: http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1191209057/0).
I don't necessarily mean to push you towards an Arduino, since there are a huge number of ways to do this sort of thing. Lately I've been working with 32-bit ARM micros (don't go that route first; the learning curve is much steeper, but they have many benefits) and I might use something in that ecosystem these days. The Arduino is easy to recommend, though, because it's relatively inexpensive, there's a large community of people using it, and chances are you can find a code example for at least part of what you're trying to do. When you need something with more horsepower, configuration options, or RAM, there are options out there.
Here are a few places where you can find some neat hardware (Arduino-related and otherwise) for projects like the one you're describing:
SparkFun Electronics
Adafruit Industries
DigiKey (this is a general electronics supplier, they have a bit of everything)
There are certainly tons more, though :-)
I agree with the other answers about using a PIC.
The PIC16F family does have C compilers available, though it is not ideally suited for C code. If performance is an issue, the 18F family would be better.
Note also that some PICs have internal RC oscillators. These aren't as precise as external crystals, but if that doesn't matter, then it's one less component (or three with its capacitors) to put on your board.
Microchip's ICD PIC programmer (for downloading and debugging your PIC software) plugs into the PC's USB port, and connects to the microcontroller via an RJ-11 connector.
Separately, if you want the software on the microcontroller to send data to the PC (e.g. to print messages in HyperTerminal), you can use a USB to RS232/TTL converter. One end goes into your PC's USB socket, and appears as a normal serial port; the other comes out to 5 V or 3.3 V signals that can be connected directly to your processor's UART, with no level-shifting required.
We've used the TTL-232R-3V3 from FTDI Chip, which works perfectly for this kind of application.
There are several ways to do this, and there is a lot of information on the net. If you are going to use micro controllers then you might need to invest in some programming equipment for them. This won't cost you much though.
The simplest way is to use the sine wave from the power grid. In Europe the AC power has a frequency of 50 Hz, and you can use that as the basis for your clock signal.
I've used Atmel's ATtiny and ATmega, which are great for programming simple and advanced projects. You can program it with C or Assembly, there are lots of great projects for it on the net, and the programmers available are very cheap.
Here is a project I found by Googling AVR 7 segment clock.
A second vote for PIC. Also, I recommend the magazine Circuit Cellar Ink. Some technical bookstores carry it, or you can subscribe: http://www.circellar.com/
A PIC will be good for this. Since you are creating a timer, I recommend C or assembly (assembly is good), and MPLAB as the development environment. You can check how accurate your timer is with the 'Stopwatch' feature in MPLAB. The PIC16F877 has a built-in hardware serial port, and so does the PIC16F628, but the PIC16F877 has more ports. For more accurate timers, using a higher-frequency oscillator is recommended.
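Whatever part you pick, the countdown logic itself stays small. Here is a hardware-agnostic C sketch under the assumption that a hardware timer interrupt calls timer_tick_1ms() once per millisecond; display_seconds() and buzzer_on() are placeholders for your LCD/output code:

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_STEPS 16

/* Placeholders for your own display and signalling routines. */
extern void display_seconds(uint32_t seconds);
extern void buzzer_on(void);

static uint32_t program_s[MAX_STEPS];        /* the loaded list of times, in seconds */
static uint8_t  step_count;
static volatile uint32_t remaining_ms;
static volatile bool step_done;

/* Call this from a 1 kHz hardware timer interrupt. */
void timer_tick_1ms(void)
{
    if (remaining_ms > 0 && --remaining_ms == 0)
        step_done = true;
}

/* Run through every programmed time, counting each one down to zero. */
void run_program(void)
{
    for (uint8_t i = 0; i < step_count; i++) {
        step_done = false;
        remaining_ms = program_s[i] * 1000u;

        while (!step_done)
            display_seconds(remaining_ms / 1000u);   /* refresh the display */

        buzzer_on();                                 /* signal the end of this step */
    }
}
```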

ZyXEL ADPCM codec

I have a ZyXEL USB Omni56K Duo modem and want to send and receive voice streams on it, but to reach adequate quality I probably need to implement some "ZyXEL ADPCM" encoding, because plain PCM provides too low a sampling rate to transmit even medium-quality voice, and it doesn't work over USB either (probably because even this bitrate is too high for the USB-serial converter in it).
This mysterious codec appears in all Microsoft WAV-related libraries as one of the many codecs they theoretically support, but I have found no implementations.
Can someone offer an implementation in any language or maybe some documentation? Writing a custom mu-law decoding algorithm won't be a problem for me.
Thanks.
I'm not sure how ZyXEL ADPCM varies from other flavors of ADPCM, but various ADPCM implementations can be found with some google searches.
However, the real reason for my post is the choice of ADPCM. ADPCM is adaptive differential pulse-code modulation. This means that the data being passed is the difference between samples, not the current value (which is also why you see such great compression). In a clean environment with no bit loss (i.e. a disk drive), this is fine. However, in a streaming environment, it's generally assumed that bits may be periodically mangled. With any bit damage to the data, you'll be hearing static or other audio artifacts very quickly and, usually, fairly badly.
ADPCM's reset mechanism isn't frame-based, which means the audio problems can go on for an extended period of time, depending on the encoder. The reset code is usually a run of 0s (16 comes to mind, but it's been years since I wrote my own ports).
ADPCM in the telephony environment usually converts a 12-bit PCM sample to a 4-bit ADPCM sample (not bad). As for audio quality... not bad for phone conversations and the spoken word, but most people, in a blind test, can easily detect the quality drop.
In your last sentence, you throw a curveball into the question: you start mentioning muLaw. muLaw is a PCM encoding that takes a 12-bit sample and transforms it using a logarithmic scale into an 8-bit sample. This is the typical compression mechanism for TDM (phone) networks in North America (most of the rest of the world uses a similar algorithm called ALaw).
So, I'm confused what you are actually trying to find.
You also mentioned Microsoft and WAV implementations. You probably know, but just in case: WAV is just a wrapper around the audio data that provides format, sampling information, channel count, size and other useful information. Without WAV, AU or other wrappers involved, muLaw and ADPCM are usually presented as raw data.
One other tip if you are implementing ADPCM: as I indicated, it uses 4 bits to represent a 12-bit sample. It gets away with this because both sides share a multiplier (step size) table. Your position in the table changes based on the 4-bit value (in other words, the value is both multiplied against a step size and used to figure out the new step size). I've seen a variety of algorithms use slightly different tables (no idea why, but you typically see the sent and received signals slowly stray off the bias). One of the older, popular sound packages was different from what I typically saw from the telephony hardware vendors.
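To make that table-walk concrete, here is a sketch of the decode step for one common 4-bit flavor (IMA/DVI ADPCM, with its standard tables). It is not necessarily what ZyXEL's variant does, but the structure is the same idea: reconstruct a difference from the nibble and the current step size, then move to a new step size based on the nibble.

```c
#include <stdint.h>

/* Standard IMA ADPCM tables. */
static const int8_t index_adjust[16] = {
    -1, -1, -1, -1, 2, 4, 6, 8, -1, -1, -1, -1, 2, 4, 6, 8
};
static const int16_t step_table[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, 157, 173, 190, 209, 230,
    253, 279, 307, 337, 371, 408, 449, 494, 544, 598, 658, 724, 796, 876, 963,
    1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066, 2272, 2499, 2749, 3024, 3327,
    3660, 4026, 4428, 4871, 5358, 5894, 6484, 7132, 7845, 8630, 9493, 10442,
    11487, 12635, 13899, 15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794,
    32767
};

typedef struct { int16_t predictor; int8_t index; } adpcm_state_t;

/* Decode one 4-bit code into a 16-bit PCM sample, updating the decoder state. */
int16_t adpcm_decode_nibble(adpcm_state_t *s, uint8_t nibble)
{
    int step = step_table[s->index];

    /* Reconstruct the difference from the 4-bit code and the current step size. */
    int diff = step >> 3;
    if (nibble & 1) diff += step >> 2;
    if (nibble & 2) diff += step >> 1;
    if (nibble & 4) diff += step;
    if (nibble & 8) diff = -diff;

    /* Accumulate onto the predictor, clamping to the 16-bit PCM range. */
    int pred = s->predictor + diff;
    if (pred > 32767)  pred = 32767;
    if (pred < -32768) pred = -32768;
    s->predictor = (int16_t)pred;

    /* Walk the step-size table based on the code: the "multiplier table" idea. */
    s->index += index_adjust[nibble & 0x0F];
    if (s->index < 0)  s->index = 0;
    if (s->index > 88) s->index = 88;

    return s->predictor;
}
```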
And, for more useless trivia, there are multiple flavors of ADPCM. The variances involve the table, the source sample size and the destination sample size, but I've never had a need to work with them; they're just documented flavors that I found when I searched for specifications for the various audio formats used in telephony.
Piping your pcm through ffmpeg -f u16le -i - -f wav -acodec adpcm_ms - will likely work.
http://ffmpeg.org/