how to perform floating point calculation in tmote sky (contiki) - embedded

I have the following code snippet:
#include "contiki.h"
#include <stdio.h> /* For printf() */
PROCESS(calc_process, "calc process");
AUTOSTART_PROCESSES(&calc_process);
PROCESS_THREAD(calc_process, ev, data)
{
double dec=13.2, res=0, div=3.2;
PROCESS_BEGIN();
res=dec+div;
printf("%f",res);
PROCESS_END();
}
After uploading the above code in Tmote sky platform using the command
make TARGET=sky calc.upload, the program will be loaded to the mote (there is no error). Then login to the mote using make login TARGET=sky, the following output is displayed....
OUPUT:
**Rime started with address 4.0
MAC 04:00:00:00:00:00:00:00 Contiki 2.7 started. Node id is set to 4.
CSMA ContikiMAC, channel check rate 8 Hz, radio channel 26
Starting 'calc process'
%f**
How can I get the correct value?
Thanks

It is not floating point calculation support that you need - you have that already. What is missing is floating point support within printf(). That is to say that res will be calculated correctly, but printf() does not support its display.
Because it requires a relatively large amount of code, many microcontroller targeted libraries omit floating point support in stdio. There may be a library build option to include floating point support - refer to the library documentation.
You might do well to ask a question about the specific calculation necessary, and how it might be done using integer or fixed point arithmetic. Alternatively you might write your own floating point display as described here: How to print floating point value using putchar? for example.

Related

STM32CubeMX I2C code writing to reserved register bits

I'm developing an I2C driver on the STM32F74 family processors. I'm using the STM32CubeMX Low Level drivers and I can't make sense of the generated defines for I2C start and stop register values (CR2).
The code is generated in stm32f7xx_ll_i2c.h and is as follows.
/** #defgroup I2C_LL_EC_GENERATE Start And Stop Generation
* #{
*/
#define LL_I2C_GENERATE_NOSTARTSTOP 0x00000000U
/*!< Don't Generate Stop and Start condition. */
#define LL_I2C_GENERATE_STOP (uint32_t)(0x80000000U | I2C_CR2_STOP)
/*!< Generate Stop condition (Size should be set to 0). */
#define LL_I2C_GENERATE_START_READ (uint32_t)(0x80000000U | I2C_CR2_START | I2C_CR2_RD_WRN)
/*!< Generate Start for read request. */
My question is why is bit 31 included in these defines? (0x80000000U). The reference manual (RM0385) states "Bits 31:27 Reserved, must be kept at reset value.". I can't decide between modifying the generated code or keeping the 31 bit. I'll happily take recommendations simply whether its more likely that this is something needed or that I'm going to break things by writing to a reserved bit.
Thanks in advance!
I am guessing here because who knows what was on the minds of the library authors? (Not a lot if you look at the source code!). But I would guess that it is a "dirty-trick" to check that when calling LL functions you are using the specified macros.
However it is severely flawed because the "trick" is only valid for Cortex-M3/4 STM32 variants (e.g. F1xx, F2xx, F4xx) where the I2C peripheral is very different and registers such as I2C_CR2 are only 15 bits wide.
The trick is that the library functions have parameter checking asserts such as:
assert_param(IS_TRANSFER_REQUEST(Request));
Where the IS_TRANSFER_REQUEST is defined thus:
#define IS_TRANSFER_REQUEST(REQUEST) (((REQUEST) == I2C_GENERATE_STOP) || \
((REQUEST) == I2C_GENERATE_START_READ) || \
((REQUEST) == I2C_GENERATE_START_WRITE) || \
((REQUEST) == I2C_NO_STARTSTOP))
This forces you to use the LL defined macros as parameters and not some self-defined or calculated mask because they all have that "unused" check bit in them.
If that truly is the the reason, it is an ill-advised practice that did not envisage the newer I2C peripheral. You might think that the bit was stripped from the parameter before being written to the register. I have checked, it is not. And if did you would be paying for that overhead on every call, which is also undesirable.
As an error detection technique if that is what it is, it is not even applied consistently; for example all the GPIO_PIN_xx macros are 16 bits wide and since they are masks not pin numbers, using bit 31 could for example guard against passing a literal pin-number 10 where the mask 1<<10 is in fact required. Passing 10 would refer to pins 3 and 1 not 10. And to be honest that mistake is far more likely than, passing an incorrect I2C transfer request type.
In the end however "Reserved" generally means "unused but may be used in future implementations", and requiring you to use the "reset value" is a way of ensuring forward binary compatibility. If you had such a device no doubt there would be a corresponding library update to support it - but it would require re-compilation of the code. The risk is low and probably only a problem if you attempt to run this binary on a newer incompatible part that used this bits.
I agree with Clifford, the ST CubeMC / HAL / LL library code is, in places, some of the worst written code imaginable. I have a particular issue with lines such as "TIMx->CCER &= ~TIM_CCER_CC1E" where TIM_CCER_CC1e is defined as 0x0001 and the CCER register contains reserved bits that should remain at 0. There are hundreds of such examples all throughout the library code. ST remain silent to my request for advice.

Demodulating GFSK

I'm trying to demodulate a GFSK signal coming from an nRF24L01+ transceiver chip (hooked up to my Arduino). I've followed this guide so far:
https://www.bitcraze.io/2015/06/sniffing-crazyflies-radio-with-hackrf-blue/#comment-38046
..and managed to manually demodulate a package (the address and the message I sent 'martijn' are clearly recoverable):
https://drive.google.com/open?id=0B9CJ42CGPiF2TWoyelRmWldZcU0
However, now I want to receive packets and decode them as they come in. Someone already made a decoder for this job, but somehow it fails to find my nRF24 packets:
https://wiki.bitcraze.io/misc:hacks:hackrf
My Arduino code for sending the packets is as followed:
#include <SPI.h>
#include <nRF24L01.h>
#include <RF24.h>
#include <RF24_config.h>
RF24 radio(9,10);
const uint64_t pipe = 0xe7e7e7e7e7;
char package[] = "martijn";
void setup() {
Serial.begin(9600);
radio.begin();
radio.setDataRate(RF24_1MBPS);
radio.setChannel(95);
radio.openWritingPipe(pipe);
radio.enableDynamicPayloads();
radio.setAutoAck(true);
radio.powerUp();
}
void loop() {
radio.write(&package, strlen(package));
delay(1);
}
Basically I just want to use GNU Radio Companion to obtain the nRF24 packets, and send their binary data into a file. I'm fine with writing my own decoder. However, I have no clue on how to get this binary data from the incoming signals.
(The comments at the bitcraze site are also mine)
I've be very happy if someone could help me (or even point me in the right direction). Thanks in advance!
After the Quadrature Demod you have to use a Clock recovery block. The M&M Clock Recovery of GNU Radio should do the job. This block will dramatically increase the performance of the decoding.
However you have to take care some parameters that this block requires. The most important is the 'omega'. 'Omega' roughly speaking corresponds to the number of samples per symbol. For example, if your GFSK baudrate is 9600 and your incoming signal from the hardware is 96000, each symbol corresponds to 10 samples. The omega can be any float number. Note however, that clock recovery does not work for large omega values. So try to keep the omega up to 8.0. To do that, either adjust properly the hardware sampling rate or do some resampling.
After the Clock Recover just use a 'Binary Slicer' block. This will convert the floats to bits of 0's and 1's. Using the Pack K bits block you can convert the bit stream into byte stream, that can easily saved to file with a 'File Sink'.
Here is a good step-by-step tutorial for an FSK receiver. GFSK adds only a Gaussian filter so the procedure is quite the same for both of them.

Arduino + OV7670 - Without FIFO - Reading Snapshot

I know that there is a lot in internet (http://forum.arduino.cc/index.php?topic=159557.0 for example) about OV7670 and I read a lot about it, but seems something is missing.
First of all I took a look into the way how can we read pixel by pixel from the camera to build the rectangular 600 X 480 image, and this was quite easy to understand considering HREF, VSYNCH and PCLOCK described on documentation here: http://www.voti.nl/docs/OV7670.pdf. I understand XCLOCK as an input I need to give to OV7670 as a kind of cycle controller and RESET would be something to reset it.
So at this point I thought that the functionality of such camera would be covered by wiring the following pins:
D0..D7 - for data (pixel) connected to arduino digital pins 0 to 7 as INPUT on arduino board
XCLK - for camera clock connected to arduino digital pin 8 as OUTPUT from arduino board
PCLK - for pixel clock connected to arduino digital pin 9 as INPUT on arduino board
HREF - to define when a line starts / ends connected to arduino digital pin 10 as INPUT on arduino board
VSYCH - to define when a frame starts / ends connected to arduino digital pin 11 as INPUT on arduino board
GRD - groud connected to arduino GRD
3V3 - 3,3 INPUT connected to arduino 3,3v
RESET - connected to arduino RESET
PWDN - connected to arduino GRD
The implementation for such approach from my point of view would be something like:
Code:
for each loop function do
write high to XCLK
if VSYNCH is HIGH
return;
if HREF is LOW
return;
if lastPCLOCK was HIGH and currentPCLOCK is LOW
readPixelFromDataPins();
end for
My readPixelFromDataPins() basically read just the first byte (as I'm just testing if I can even read something from the camera), and it is written as follows:
Code:
byte readPixelFromDataPins() {
byte result = 0;
for (int i = 0; i < 8; i++) {
result = result << 1 | digitalRead(data_p[i]);
}
return result;
}
In order to check if something is being read from the camera I just print it to the Serial 9600, the byte read from data pins as a number. But currently I'm receiving only zero values. The code I'm using to retrieve an image is stored here: https://gist.github.com/franciscospaeth/8503747.
Did somebody that makes OV7670 work with Arduino already figure out what am I doing wrong? I suppose I'm using the XCLOCK wrongly right? What shall I do to get it working?
I searched a lot and I didn't found any SSCCE (http://sscce.org/) for this camera using arduino, if somebody have it please let me know.
This question is present on arduino forum (http://forum.arduino.cc/index.php?topic=211741.0) too.
your idea is not bad but ...
the xclock need to be a clock (in your program is just a transition from 0 to 1 and is freezing there)
you need also to use I2C with SIOC and SIOD for configuring the camera (or you can use the default settings, but I am not sure if is the correct output format for you, 30F/s,VGA, YUV format ....)
your code execution is slower using the serial output in the same loop with reading data
I will recommend you to toggle the xclock pin and to move the pixel print in a if(). Also you will be able to read Data only in a very precise time, if you want to read only one byte, than after a transition from 0 to 1 of HREF you need to wait for a new transition from 0 to 1 of PCLK (you will be able to see only one 0-1 transition of HREF after 784x2 transitions of PCLK, (640 active pixels + 144 dead time for each line) x 2 (for YUV or RGB are 2 bytes received for each pixel) )
Hello I am Mr_Arduino from the arduino forums. Your issue is that you are reading pixels too slow please do not use digital read to do such a thing. Also if you insist on using a separate function just to read a byte make sure the function is being inlined. You can do this by declaring your function as static inline. Also as mentioned above how are you generating the clock. You can generate the XCLK using PWM on the arduino.
I have created a working example here:
https://github.com/ComputerNerd/arduino-camera-tft/blob/master/captureimage.c
Edit: a 3rd party has copied part but not all of the code from the above link into the answer here. However, the link must remain as the code posted below requires additional files from that source to actually work.
Edit 2: Removed irrelevant code. You will need to modify what you do with the data.
void capImg(void){
cli();
uint8_t w,ww;
uint8_t h;
w=160;
h=240;
tft_setXY(0,0);
CS_LOW;
RS_HIGH;
RD_HIGH;
DDRA=0xFF;
//DDRC=0;
#ifdef MT9D111
while (PINE&32){}//wait for low
while (!(PINE&32)){}//wait for high
#else
while (!(PINE&32)){}//wait for high
while (PINE&32){}//wait for low
#endif
while (h--){
ww=w;
while (ww--){
WR_LOW;
while (PINE&16){}//wait for low
PORTA=PINC;
WR_HIGH;
while (!(PINE&16)){}//wait for high
WR_LOW;
while (PINE&16){}//wait for low
PORTA=PINC;
WR_HIGH;
while (!(PINE&16)){}//wait for high
WR_LOW;
while (PINE&16){}//wait for low
PORTA=PINC;
WR_HIGH;
while (!(PINE&16)){}//wait for high
WR_LOW;
while (PINE&16){}//wait for low
PORTA=PINC;
WR_HIGH;
while (!(PINE&16)){}//wait for high
}
}
CS_HIGH;
sei();
}
You can also find it on github.
You can use my instruction: how to retrieve image from ov7670 It contains all the steps you need. There is also instuction to setup FrameGrabber: how to run framegrabber

ARM NEON assembly and floating point rounding

I'm working on code optimization for ARM processors using NEON. However I have a problem: my algorithm contains the following floating point computation:
round(x*b - y*a)
Where results can be both positive and negative.
Actually I'm using 2 VMUL and 1 VSUB to make parallel computation (4 values per operation using Q registers and 32bit floats).
There is a way I can handle this problem? If the results were all the same sign I know I can simply add or subtract 0.5
First, NEON suffers from long latency especially after float multiplications.
You won't gain very much with two vmuls and one vsub due to this compared to vfp programming.
Therefore, your code should look like :
vmul.f32 result, x, b
vmls.f32 result, y, a
Those multiply-accumulate/substract instructions are issued back-to-back with the previous multiply instruction without any latency. (9 cycles saved in this case)
Unfortunately however, I don't understand your actual question. Why would someone want to round float values? Apparently you intend to extract the integer part rounded, and there are several ways to do this, and I cannot tell you anything more cause your question is as always too vague.
I've been following your questions in this forum for quite some time, and I simply cannot get rid of the feeling that you're lacking something very fundamental.
I suggest you to read the assembly reference guide pdf from ARM first.
I have no knowledge in assembly, but using the NEON intrinsics in C (I mention their assembly equivalents to help you browse the documentation, even though I would not be able to use them myself), the algorithm for a round function could be:
// Prepare 3 vectors filled with all 0.5, all -0.5, and all 0
// Corresponding assembly instruction is VDUP
float32x4_t plus = vdupq_n_f32(0.5);
float32x4_t minus = vdupq_n_f32(-0.5);
float32x4_t zero = vdupq_n_f32(0);
// Assuming the result of x*a-y*b is stored in the following vector:
float32x4_t xa_yb;
// Compare vector with 0
// Corresponding assembly instruction is VCGT
uint32x4_t more_than_zero = vcgtq_f32(xa_yb, zero);
// Resulting vector will be set to all 1-bits for values where the comparison
// is true, all 0-bits otherwise.
// Use bit select to choose if you have to add or substract 0.5
// Corresponding assembly instruction is VBSL, its syntax is quite alike
// `more_than_zero ? plus : minus`.
float32x4_t to_add = vbslq_f32(more_than_zero, plus, minus);
// Add this vector to the vector to round
// Corresponding assembly instruction is VADD,
// but I guess you knew this one :D
float32x4_t rounded = vaddq_f32(xa_yb, to_add);
// Then cast to integers!
I guess you'll be able to convert this to assembly (I'm not, anyway)
Note that I have no idea if this is really more efficient than standard code, non-SIMD code!

Assembly code for optimized bitshifting of a vector

i'm trying to write a routine that will logically bitshift by n positions to the right all elements of a vector in the most efficient way possible for the following vector types: BYTE->BYTE, WORD->WORD, DWORD->DWORD and WORD->BYTE (assuming that only 8 bits are present in the result). I would like to have three routines for each type depending on the type of processor (SSE2 supported, only MMX suppported, only standard instruction se supported). Therefore i need 12 functions in total.
I have already found by myself how to backup and restore the registers that i need, how to make a loop, how to copy data into regular registers or MMX registers and how to shift by 1 position logically.
Because i'm not familiar with assembly language that's about it.
Which registers should i use for each instruction set?
How will the availability of the large vector (an image) in L1 cache be optimized?
How do i find the next element of the vector (a pointer kind of thing), i know i can make a mov by address and i assume i have to increment the address by 1, 2 or 4 depending on my type of data?
Although i have all the ideas, writing the code is a bit difficult at this point.
Thank you.
Arnaud.
Edit:
Here is what i'm trying to do for MMX for a shift by 1 on a DWORD:
__asm("push mm"); // backup register
__asm("push cx"); // backup register
__asm("mov %cx, length"); // initialize loop
__asm("loopstart_shift1:"); // start label
__asm("movd %xmm0, r/m32"); // get 32 bits data
__asm("psrlq %xmm0, 1"); // right shift 32 bits data logically (stuffs 0 on the left) by 1
__asm("mov r/m32,%xmm0"); // set 32 bits data
__asm("dec %cx"); // decrement index
__asm("cmp %cx,0");
__asm("jnz loopstart_shift1");
__asm("pop cx"); // restore register
__asm("pop mm"); // restore register
__asm("emms"); // leave MMX state
I strongly suggest you pause and take a look at using intrinsics with C or C++ instead of trying to write raw asm - that way the C/C++ compiler will take care of all the register allocation, instruction scheduling and general housekeeping tasks and you can just focus on the important parts, e.g. instead of using psrlq see _m_psrlq in mmintrin.h. (Better yet, look at using 128 bit SSE intrinsics.)
Sounds like you'd benefit from either using or looking into BitMagic's source. its entirely intrinsics based too, which makes its far more portable (though from the looks of it your using GCC, so it might have to get an MSVC to GCC intrinics mapping).