How to receive strings from HC05 Bluetooth module using ATmega16 microcontroller - embedded
I am having a problem receiving strings from an HC05 on an ATmega16. I am able to receive single characters, but not strings.
I want to control a DC motor wirelessly using an ATmega16 and a Bluetooth module (HC05). I am sending the timer OCR1A values from a serial monitor app to the ATmega16 through the HC05, but without success.
#define F_CPU 16000000UL
#include<string.h>
#include <avr/io.h>
#include <util/delay.h>
#include <stdlib.h>
#include <stdio.h>
void UART_init()
{
UCSRB |= (1 << RXEN) | (1 << TXEN);
UCSRC |= (1 << URSEL) | (1 << UCSZ0) | (1 << UCSZ1);
UBRRL = 0x67;
}
unsigned char UART_RxChar()
{
while( (UCSRA & (1 << RXC)) == 0 );
return(UDR);
}
void UART_TxChar( char ch )
{
while( !(UCSRA & (1 << UDRE)) ); /* Wait for empty transmit buffer*/
UDR = ch ;
}
void UART_SendString( char* str )
{
unsigned char j = 0;
while( j <= 2 )
{
UART_TxChar( str[j] );
j++;
}
}
int main( void )
{
char buff[3];
char j;
int i = 0, k = 0;
DDRD = (1 << PD5);
UART_init();
while( 1 )
{
buff[0] = UART_RxChar();
buff[1] = UART_RxChar();
buff[2] = UART_RxChar();
j = UART_RxChar();
if( j == '!' )
{
UART_SendString( buff ); // this is to check whether the atmega16 received correct values for timer or not.
UART_SendString( "\n" );
}
}
}
The expected result is when I enter the number in serial monitor app, I should get back the same number on serial monitor app.
In the actual result I am getting different characters sometimes and empty some times.
The string buff is unterminated, so UART_SendString( buff ); will send whatever junk follows the received three characters until a NUL (0) byte is found.
char buff[4] = {0};
Will have room for the NUL and the initialisation will ensure that buff[3] is a NUL terminator.
Alternatively, send the three characters individually since without the terminator they do not constitute a valid C (ASCIIZ) string.
Apart from the lack of NUL termination, your code requires input of exactly the form nnn!nnn!nnn!.... If the other end is in fact sending lines with CR or CR+LF terminators - nnn!<newline>nnn!<newline>nnn!<newline>... - your receive loop will get out of sync.
A safer solution is to use the previously received three characters whenever a '!' character is received. This can be done in a number of ways - for long buffers a ring-buffer would be advised, but for just three characters it is probably efficient enough to simply shift characters left when inserting a new character - for example:
char buff[4] = {0} ; // buff[3] stays NUL; needs <ctype.h> and <string.h>
for(;;)
{
memset( buff, '0', sizeof(buff) - 1 ) ;
char ch = 0 ;
while( ch != '!' )
{
ch = UART_RxChar() ;
if( isdigit(ch) )
{
// Shift left one digit
memmove( buff, &buff[1], sizeof(buff) - 2 ) ;
// Insert new digit at the right
buff[sizeof(buff) - 2] = ch ;
}
else if( ch != '!' )
{
// Unexpected character, reset buffer
memset( buff, '0', sizeof(buff) - 1 ) ;
}
}
UART_SendString( buff ) ;
UART_SendString( "\n" ) ;
}
This also has the advantage that it will work when the number entered is less than three digits, and will discard any sequence containing non-digit characters.
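The same shift-and-insert idea can be tried on a PC before it goes near the UART. The following is only a sketch: digit_window, window_reset(), window_feed() and parse_stream() are names made up for this illustration, and the conversion with atoi() assumes the three digits are a decimal value intended for OCR1A.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define DIGITS 3

/* Hypothetical helper: the last DIGITS digits seen, NUL-terminated. */
typedef struct {
    char buff[DIGITS + 1];
} digit_window;

static void window_reset(digit_window *w)
{
    memset(w->buff, '0', DIGITS);
    w->buff[DIGITS] = '\0';
}

/* Feed one received character.
 * Returns the parsed value (0..999) when '!' completes a frame,
 * or -1 while a frame is still being collected. */
static int window_feed(digit_window *w, char ch)
{
    if (isdigit((unsigned char)ch)) {
        memmove(w->buff, &w->buff[1], DIGITS - 1); /* shift left one digit */
        w->buff[DIGITS - 1] = ch;                  /* insert at the right  */
        return -1;
    }
    if (ch == '!') {
        int value = atoi(w->buff);
        window_reset(w);
        return value;
    }
    window_reset(w); /* unexpected character, e.g. CR or LF */
    return -1;
}

/* Run a whole string through the window.
 * Returns the last completed value, or -1 if no frame completed. */
static int parse_stream(const char *s)
{
    digit_window w;
    int last = -1;

    window_reset(&w);
    for (; *s != '\0'; s++) {
        int v = window_feed(&w, *s);
        if (v >= 0)
            last = v;
    }
    return last;
}
```

On the target, each byte returned by UART_RxChar() would be passed to window_feed(), and a non-negative result could then be written to OCR1A.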
Related
Questions about this serial communication code? [Cortex-M4]
I'm looking at the following code from STMicroelectronics on implementing USART communication with interrupts #include <stm32f10x_lib.h> // STM32F10x Library Definitions #include <stdio.h> #include "STM32_Init.h" // STM32 Initialization /*---------------------------------------------------------------------------- Notes: The length of the receive and transmit buffers must be a power of 2. Each buffer has a next_in and a next_out index. If next_in = next_out, the buffer is empty. (next_in - next_out) % buffer_size = the number of characters in the buffer. *----------------------------------------------------------------------------*/ #define TBUF_SIZE 256 /*** Must be a power of 2 (2,4,8,16,32,64,128,256,512,...) ***/ #define RBUF_SIZE 256 /*** Must be a power of 2 (2,4,8,16,32,64,128,256,512,...) ***/ /*---------------------------------------------------------------------------- *----------------------------------------------------------------------------*/ #if TBUF_SIZE < 2 #error TBUF_SIZE is too small. It must be larger than 1. #elif ((TBUF_SIZE & (TBUF_SIZE-1)) != 0) #error TBUF_SIZE must be a power of 2. #endif #if RBUF_SIZE < 2 #error RBUF_SIZE is too small. It must be larger than 1. #elif ((RBUF_SIZE & (RBUF_SIZE-1)) != 0) #error RBUF_SIZE must be a power of 2. #endif /*---------------------------------------------------------------------------- *----------------------------------------------------------------------------*/ struct buf_st { unsigned int in; // Next In Index unsigned int out; // Next Out Index char buf [RBUF_SIZE]; // Buffer }; static struct buf_st rbuf = { 0, 0, }; #define SIO_RBUFLEN ((unsigned short)(rbuf.in - rbuf.out)) static struct buf_st tbuf = { 0, 0, }; #define SIO_TBUFLEN ((unsigned short)(tbuf.in - tbuf.out)) static unsigned int tx_restart = 1; // NZ if TX restart is required /*---------------------------------------------------------------------------- USART1_IRQHandler Handles USART1 global interrupt request. 
*----------------------------------------------------------------------------*/ void USART1_IRQHandler (void) { volatile unsigned int IIR; struct buf_st *p; IIR = USART1->SR; if (IIR & USART_FLAG_RXNE) { // read interrupt USART1->SR &= ~USART_FLAG_RXNE; // clear interrupt p = &rbuf; if (((p->in - p->out) & ~(RBUF_SIZE-1)) == 0) { p->buf [p->in & (RBUF_SIZE-1)] = (USART1->DR & 0x1FF); p->in++; } } if (IIR & USART_FLAG_TXE) { USART1->SR &= ~USART_FLAG_TXE; // clear interrupt p = &tbuf; if (p->in != p->out) { USART1->DR = (p->buf [p->out & (TBUF_SIZE-1)] & 0x1FF); p->out++; tx_restart = 0; } else { tx_restart = 1; USART1->CR1 &= ~USART_FLAG_TXE; // disable TX interrupt if nothing to send } } } /*------------------------------------------------------------------------------ buffer_Init initialize the buffers *------------------------------------------------------------------------------*/ void buffer_Init (void) { tbuf.in = 0; // Clear com buffer indexes tbuf.out = 0; tx_restart = 1; rbuf.in = 0; rbuf.out = 0; } /*------------------------------------------------------------------------------ SenChar transmit a character *------------------------------------------------------------------------------*/ int SendChar (int c) { struct buf_st *p = &tbuf; // If the buffer is full, return an error value if (SIO_TBUFLEN >= TBUF_SIZE) return (-1); p->buf [p->in & (TBUF_SIZE - 1)] = c; // Add data to the transmit buffer. 
p->in++; if (tx_restart) { // If transmit interrupt is disabled, enable it tx_restart = 0; USART1->CR1 |= USART_FLAG_TXE; // enable TX interrupt } return (0); } /*------------------------------------------------------------------------------ GetKey receive a character *------------------------------------------------------------------------------*/ int GetKey (void) { struct buf_st *p = &rbuf; if (SIO_RBUFLEN == 0) return (-1); return (p->buf [(p->out++) & (RBUF_SIZE - 1)]); } /*---------------------------------------------------------------------------- MAIN function *----------------------------------------------------------------------------*/ int main (void) { buffer_Init(); // init RX / TX buffers stm32_Init (); // STM32 setup printf ("Interrupt driven Serial I/O Example\r\n\r\n"); while (1) { // Loop forever unsigned char c; printf ("Press a key. "); c = getchar (); printf ("\r\n"); printf ("You pressed '%c'.\r\n\r\n", c); } // end while } // end main My questions are the following: In the handler function, when does the statement ((p->in - p->out) & ~(RBUF_SIZE-1)) ever evaluate to a value other than zero? If RBUF_SIZE is a power of 2 as indicated, then ~(RBUF_SIZE-1) should always be zero. Is it checking if p->in > p->out? Even if this isn't true, the conditional should evaluate to zero anyway, right? In the line following, the statement p->buf [p->in & (RBUF_SIZE-1)] = (USART1->DR & 0x1FF); is made. Why does the code AND p->in with RBUF_SIZE-1? What kind of buffer are we using in this code? FIFO?
Not so. For example, assuming 32-bit arithmetic, if RBUF_SIZE == 0x00000100 then RBUF_SIZE-1 == 0x000000FF and ~(RBUF_SIZE-1) == 0xFFFFFF00 (it's a bitwise NOT, not a logical NOT). The check you refer to is therefore effectively the same as (p->in - p->out) < RBUF_SIZE, and it's not clear why it is superior. ARM GCC 7.2.1 produces identical length code for the two (-O1). p->in & (RBUF_SIZE-1) is the same as p->in % RBUF_SIZE when p->in is unsigned. Again, not sure why the former would be used when the latter is clearer; sure, it effectively forces the compiler to compute the modulo using an AND operation, but given that RBUF_SIZE is known at compile time to be a power of two my guess is that most compilers could figure this out (again, ARM GCC 7.2.1 certainly can, I've just tried it - it produces the same instructions either way). Looks like it. FIFO implemented as a circular buffer.
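To convince yourself of the two equivalences, here is a small sketch you can compile on a PC. The function names are invented for this illustration, and RBUF_SIZE is forced to a 32-bit unsigned type so that the complement really is 0xFFFFFF00 as described above.

```c
#include <stdint.h>

#define RBUF_SIZE ((uint32_t)256) /* must be a power of two */

/* Room test as written in the ISR: non-zero when one more byte fits. */
static int has_room_mask(uint32_t in, uint32_t out)
{
    return ((in - out) & ~(RBUF_SIZE - 1u)) == 0;
}

/* The same test written as a plain comparison. */
static int has_room_cmp(uint32_t in, uint32_t out)
{
    return (in - out) < RBUF_SIZE;
}

/* Slot index by masking, and by modulo. */
static uint32_t slot_mask(uint32_t in)
{
    return in & (RBUF_SIZE - 1u);
}

static uint32_t slot_mod(uint32_t in)
{
    return in % RBUF_SIZE;
}
```

Because the indexes are unsigned and only ever incremented, in - out still gives the fill level after the counters wrap around, which is why the code never has to reset them.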
how to select row and column in LCD display
I want to display letter on specific row and column in 16x2 LCD display with 8051 MCU. For Example: Display "R" at 2nd column in first row Display "W" at 3rd column in second row I use these routines for the LCD: #include<reg51.h> /* Data pins connected to port P1 of 8051 */ #define Data_Port_Pins (P1) sbit Register_Select_Pin = P2^0; /* Register Pin of LCD connected to Pin 0 of Port P2 */ sbit Read_Write_Pin = P2^1; /* Read/Write Pin of LCD connected to Pin 1 of Port P2 */ sbit Enable_Pin = P2^2; /* EN pin connected to pin 2 of port P2 */ /* Function for creating delay in milliseconds */ void Delay(unsigned int wait) { volatile unsigned i, j; for(i = 0; i < wait; i++) for(j = 0; j < 1200; j++); } /* Function to send command instruction to LCD */ void LCD_Command (unsigned char command) { Data_Port_Pins = command; Register_Select_Pin =0; Read_Write_Pin=0; Enable_Pin =1; Delay (2); Enable_Pin =0; } /* Function to send display data to LCD */ void LCD_Data (unsigned char Data) { Data_Port_Pins = Data; Register_Select_Pin=1; Read_Write_Pin=0; Enable_Pin =1; Delay(2); Enable_Pin =0; } /* Function to prepare the LCD and get it ready */ void LCD_Initialization() { LCD_Command (0x38); LCD_Command (0x0e); LCD_Command (0x01); LCD_Command (0x81); } And this is my attempt: Does it make any sense? void LCD_Position( char row, char column) { unsigned char cmd = 0x80 ; /* Start address */ if( row != 0 ) /*If second row selected ...*/ { cmd += 0x40 ; /*add start address of second row */ } cmd += row & 0x0f ; LCD_Command (cmd); }
Refer to the data sheet for the LCD device in question. For the common 1602 type module (which the initialisation sequence shown suggests is what you are using) you set the position for the next data write using the Set DDRAM address instruction. In 2-line display mode the 1st line starts at address 0x00, and the 2nd line starts at 0x40.
void LCD_Position( int row, int pos )
{
    LCD_Command( 0x80 |                        // Set DDRAM Address
                 ((row == 0) ? 0x00 : 0x40) |  // Row selector
                 (pos & 0x0f) ) ;              // Position in row
}
According to the data sheet, setting DB7 to 1 (0x80) indicates the Set DDRAM Address instruction. The other bits are address bits, but there are more locations in the display RAM than the width of the display, so only 0x00 to 0x0f and 0x40 to 0x4f refer to visible display locations. So if the second row is selected, 0x40 is masked in (((row == 0) ? 0x00 : 0x40)), then the character position is masked in ((pos & 0x0f)). The parentheses around the conditional expression are required, because ?: has lower precedence than both | and +.
Although I have used bit-wise manipulation, the expression could equally be performed arithmetically:
0x80 + ((row == 0) ? 0x00 : 0x40) + (pos & 0x0f)
In both cases the & 0x0f ensures the command bits are not modified and that the character is placed on the display even if the position is out of range.
Less succinctly, but perhaps easier to follow:
// Set DDRAM Address command - start of row 0
unsigned char cmd = 0x80 ;
// If second row selected ...
if( row != 0 )
{
    // ... add start address of second row
    cmd += 0x40 ;
}
// Add position offset. Masked to protect other
// bits from change if pos is out of range.
cmd += pos & 0x0f ;
// Write command
LCD_Command( cmd ) ;
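To check the command bytes without a display attached, the address calculation can be pulled out into a pure function. This is just a sketch; lcd_ddram_cmd() is a name invented here, and it assumes the 16x2 layout described above (rows starting at 0x00 and 0x40).

```c
#include <stdint.h>

/* Build the "Set DDRAM address" command byte for a 16x2 display.
 * row: 0 = first line, anything else = second line.
 * pos: 0..15, masked so an out-of-range value cannot touch other bits. */
static uint8_t lcd_ddram_cmd(int row, int pos)
{
    uint8_t cmd = 0x80;           /* DB7 = 1: Set DDRAM address */

    if (row != 0)
        cmd |= 0x40;              /* start address of the second line */
    cmd |= (uint8_t)(pos & 0x0f); /* column within the line */
    return cmd;
}
```

For the two cases in the question: "R" at the 2nd column of the first row is lcd_ddram_cmd(0, 1), which gives 0x81, and "W" at the 3rd column of the second row is lcd_ddram_cmd(1, 2), which gives 0xC2. Send the result with LCD_Command() and then write the character with LCD_Data().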
How to validate an ASCII string with a two digit hexadecimal checksum appended?
I am using a Renesas 16-bit MCU with the HEW (High-performance Embedded Workbench) compiler. The system receives ASCII data of the form: <data><cc> where <cc> comprises two ASCII hex digits corresponding to the 8-bit bitwise XOR of all the preceding characters. The maximum length of the string including <cc> is 14. Here is my attempt: #pragma INTERRUPT Interrupt_Rx0 void Interrupt_Rx0 (void) { unsigned char rx_byte, rx_status_byte,hex; char buffer[15],test[5]; int r,k[15]; char * pEnd; unsigned char dat,arr[14],P3; unsigned int i,P1[10]; rx_byte = u0rbl; //get rx data rx_status_byte = u0rbh; if ((rx_status_byte & 0x80) == 0x00) //if no error { if ((bf_rx0_start == 0) && (rx_byte == '?') && (bf_rx0_ready == 0)) { byte_rx0_buffer[0]=rx_byte; bf_rx0_start = 1; byte_rx0_ptr = 1; } else { if (rx_byte == '?') { bf_rx0_start = 1; byte_rx0_ptr = 0; } if(bf_rx0_start == 1) { byte_rx0_buffer[byte_rx0_ptr++] = rx_byte; sprintf(buffer,"%X",rx_byte); //ASCII CONVERSION dat=strtol(buffer,&pEnd,16); // P1=(int)dat; // sprintf(P1,"%s",dat); delay_ms(2000); k[byte_rx0_ptr++]=dat; } if ((byte_rx0_ptr == 14)) bf_rx0_start = 0;//end further rx until detect new STX } } }
convert this value to hexadec value & xor it ie(3F^30^31^53^52^57=68), if i can do this calculation in program You fundamentally don't understand the difference between values and encodings. Two plus three is five whether you represent the two as "2", "two", or "X X". Addition operates on values, not representations. So to "convert to hexadecimal & xor it" makes no sense. You XOR values, not representations. Hexadecimal is a representation. To maintain a running XOR, just do something like int running_xor=0; at the top and then running_xor ^= rx_byte; each time you receive a byte. It will contain the correct value when you are finished. Set it to zero to reset it. Get hexadecimal completely out of your head. That is just how those values are being printed for your consumption. That has nothing to do with the internal logic of your program which deals only in values.
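As a concrete sketch of the running XOR, here it is applied to the six bytes from the example quoted above (0x3F, 0x30, 0x31, 0x53, 0x52, 0x57 are the characters "?01SRW"). The function name xor_bytes() is made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR of the first len bytes of data. No hexadecimal anywhere:
 * the bytes are just values. */
static uint8_t xor_bytes(const char *data, size_t len)
{
    uint8_t running_xor = 0;
    size_t i;

    for (i = 0; i < len; i++)
        running_xor ^= (uint8_t)data[i];
    return running_xor;
}
```

Calling xor_bytes("?01SRW", 6) yields the value that prints as 68 in hexadecimal (104 in decimal). It is the same number either way; only the printed form differs.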
You would do well to separate out the data validation from the data reception, even to the extent that you don't do it in the interrupt handler; it is likely to be better to buffer the data in the ISR unchecked and defer the data validation to the main code thread or a task-thread if you are using an RTOS. You certainly don't want to be calling heavy-weight library functions such as sprintf() or strtol() in an ISR!
Either way, here is a function that takes a pointer to a received string and its length (to avoid an unnecessary strlen() call since you already know how many characters were received), and returns true if the checksum validates, and false otherwise. It has no restriction on data length - that would be performed by the calling function.
If you know that your checksum hex digits will always be either upper or lower-case, you can simplify the decodeHexNibble() function.
#include <stdint.h>
#include <stdbool.h>

uint8_t decodeHexByte( char* hexbyte ) ;
uint8_t decodeHexNibble( char hexdigit ) ;

bool checkData( char* data, int length )
{
    int data_len = length - 2 ;
    char* bcc_ptr = &data[data_len] ;
    uint8_t rx_bcc_val = 0 ;
    uint8_t actual_bcc_val = 0 ;
    int i = 0 ;

    // Convert <cc> string to integer
    rx_bcc_val = decodeHexByte( bcc_ptr ) ;

    // Calculate XOR of <data>
    for( i = 0; i < data_len; i++ )
    {
        actual_bcc_val ^= data[i] ;
    }

    return actual_bcc_val == rx_bcc_val ;
}

uint8_t decodeHexNibble( char hexdigit )
{
    uint8_t nibble ;

    if( hexdigit >= '0' && hexdigit <= '9' )
    {
        nibble = hexdigit - '0' ;
    }
    else if( hexdigit >= 'a' && hexdigit <= 'f' )
    {
        nibble = hexdigit - 'a' + 10 ;
    }
    else if( hexdigit >= 'A' && hexdigit <= 'F' )
    {
        nibble = hexdigit - 'A' + 10 ;
    }
    else
    {
        // Do something 'sensible' with invalid digits
        nibble = 0 ;
    }

    return nibble ;
}

uint8_t decodeHexByte( char* hexbyte )
{
    // Decode each ASCII hex digit to its 4-bit value before combining
    uint8_t byte = decodeHexNibble( hexbyte[0] ) << 4 ;
    byte |= decodeHexNibble( hexbyte[1] ) ;
    return byte ;
}
Determine Position of Most Significant Set Bit in a Byte
I have a byte I am using to store bit flags. I need to compute the position of the most significant set bit in the byte. Example Byte: 00101101 => 6 is the position of the most significant set bit Compact Hex Mapping: [0x00] => 0x00 [0x01] => 0x01 [0x02,0x03] => 0x02 [0x04,0x07] => 0x03 [0x08,0x0F] => 0x04 [0x10,0x1F] => 0x05 [0x20,0x3F] => 0x06 [0x40,0x7F] => 0x07 [0x80,0xFF] => 0x08 TestCase in C: #include <stdio.h> unsigned char check(unsigned char b) { unsigned char c = 0x08; unsigned char m = 0x80; do { if(m&b) { return c; } else { c -= 0x01; } } while(m>>=1); return 0; //never reached } int main() { unsigned char input[256] = { 0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f, 0x10,0x11,0x12,0x13,0x14,0x15,0x16,0x17,0x18,0x19,0x1a,0x1b,0x1c,0x1d,0x1e,0x1f, 0x20,0x21,0x22,0x23,0x24,0x25,0x26,0x27,0x28,0x29,0x2a,0x2b,0x2c,0x2d,0x2e,0x2f, 0x30,0x31,0x32,0x33,0x34,0x35,0x36,0x37,0x38,0x39,0x3a,0x3b,0x3c,0x3d,0x3e,0x3f, 0x40,0x41,0x42,0x43,0x44,0x45,0x46,0x47,0x48,0x49,0x4a,0x4b,0x4c,0x4d,0x4e,0x4f, 0x50,0x51,0x52,0x53,0x54,0x55,0x56,0x57,0x58,0x59,0x5a,0x5b,0x5c,0x5d,0x5e,0x5f, 0x60,0x61,0x62,0x63,0x64,0x65,0x66,0x67,0x68,0x69,0x6a,0x6b,0x6c,0x6d,0x6e,0x6f, 0x70,0x71,0x72,0x73,0x74,0x75,0x76,0x77,0x78,0x79,0x7a,0x7b,0x7c,0x7d,0x7e,0x7f, 0x80,0x81,0x82,0x83,0x84,0x85,0x86,0x87,0x88,0x89,0x8a,0x8b,0x8c,0x8d,0x8e,0x8f, 0x90,0x91,0x92,0x93,0x94,0x95,0x96,0x97,0x98,0x99,0x9a,0x9b,0x9c,0x9d,0x9e,0x9f, 0xa0,0xa1,0xa2,0xa3,0xa4,0xa5,0xa6,0xa7,0xa8,0xa9,0xaa,0xab,0xac,0xad,0xae,0xaf, 0xb0,0xb1,0xb2,0xb3,0xb4,0xb5,0xb6,0xb7,0xb8,0xb9,0xba,0xbb,0xbc,0xbd,0xbe,0xbf, 0xc0,0xc1,0xc2,0xc3,0xc4,0xc5,0xc6,0xc7,0xc8,0xc9,0xca,0xcb,0xcc,0xcd,0xce,0xcf, 0xd0,0xd1,0xd2,0xd3,0xd4,0xd5,0xd6,0xd7,0xd8,0xd9,0xda,0xdb,0xdc,0xdd,0xde,0xdf, 0xe0,0xe1,0xe2,0xe3,0xe4,0xe5,0xe6,0xe7,0xe8,0xe9,0xea,0xeb,0xec,0xed,0xee,0xef, 0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7,0xf8,0xf9,0xfa,0xfb,0xfc,0xfd,0xfe,0xff }; unsigned char truth[256] = { 
0x00,0x01,0x02,0x02,0x03,0x03,0x03,0x03,0x04,0x04,0x04,0x04,0x04,0x04,0x04,0x04, 0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05, 0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06, 0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06, 0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07, 0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07, 0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07, 0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07, 0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08, 0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08, 0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08, 0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08, 0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08, 0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08, 0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08, 0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08}; int i,r; int f = 0; for(i=0; i<256; ++i) { r=check(input[i]); if(r !=(truth[i])) { printf("failed %d : 0x%x : %d\n",i,0x000000FF & ((int)input[i]),r); f += 1; } } if(!f) { printf("passed all\n"); } else { printf("failed %d\n",f); } return 0; } I would like to simplify my check() function to not involve looping (or branching preferably). Is there a bit twiddling hack or hashed lookup table solution to compute the position of the most significant set bit in a byte?
Your question is about an efficient way to compute log2 of a value. And because you seem to want a solution that is not limited to the C language I have been slightly lazy and tweaked some C# code I have. You want to compute log2(x) + 1 and for x = 0 (where log2 is undefined) you define the result as 0 (e.g. you create a special case where log2(0) = -1). static readonly Byte[] multiplyDeBruijnBitPosition = new Byte[] { 7, 2, 3, 4, 6, 1, 5, 0 }; public static Byte Log2Plus1(Byte value) { if (value == 0) return 0; var roundedValue = value; roundedValue |= (Byte) (roundedValue >> 1); roundedValue |= (Byte) (roundedValue >> 2); roundedValue |= (Byte) (roundedValue >> 4); var log2 = multiplyDeBruijnBitPosition[((Byte) (roundedValue*0xE3)) >> 5]; return (Byte) (log2 + 1); } This bit twiddling hack is taken from Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup where you can see the equivalent C source code for 32 bit values. This code has been adapted to work on 8 bit values. However, you may be able to use an operation that gives you the result using a very efficient built-in function (on many CPU's a single instruction like the Bit Scan Reverse is used). An answer to the question Bit twiddling: which bit is set? has some information about this. A quote from the answer provides one possible reason why there is low level support for solving this problem: Things like this are the core of many O(1) algorithms such as kernel schedulers which need to find the first non-empty queue signified by an array of bits.
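Since the question's test case is in C, here is a sketch of the same 8-bit multiply-and-lookup written in C. It is a direct port of the C# above, not an independently derived method; the names log2_plus1() and msb_loop() are invented for this illustration, and msb_loop() exists only as a reference to compare against.

```c
#include <stdint.h>

/* Same table as the C# version above. */
static const uint8_t debruijn_pos[8] = { 7, 2, 3, 4, 6, 1, 5, 0 };

/* Position (1..8) of the most significant set bit, 0 for value == 0. */
static uint8_t log2_plus1(uint8_t value)
{
    uint8_t v = value;

    if (v == 0)
        return 0;
    /* Smear the top set bit downwards: v becomes 2^k - 1. */
    v |= (uint8_t)(v >> 1);
    v |= (uint8_t)(v >> 2);
    v |= (uint8_t)(v >> 4);
    /* Multiply modulo 256 and use the top three bits as the table index. */
    return (uint8_t)(debruijn_pos[(uint8_t)(v * 0xE3u) >> 5] + 1);
}

/* Straightforward loop, used only to cross-check the result. */
static uint8_t msb_loop(uint8_t b)
{
    uint8_t n = 0;

    while (b != 0) {
        n++;
        b >>= 1;
    }
    return n;
}
```

Note that the zero check is still a branch. If a completely branch-free and loop-free version is the goal, the truth[] array from the question can itself be used as a 256-entry lookup table indexed by the byte.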
That was a fun little challenge. I don't know if this one is completely portable since I only have VC++ to test with, and I certainly can't say for sure if it's more efficient than other approaches. This version was coded with a loop but it can be unrolled without too much effort.
static unsigned char check(unsigned char b)
{
    unsigned char r = 8;
    unsigned char sub = 1;
    unsigned char s = 7;
    for (char i = 0; i < 8; i++)
    {
        // (bit - 1) is zero for a set bit and all ones for a clear bit,
        // so sub drops to 0 at the first set bit and stays there
        sub = sub & (((b >> s) & 1) - 1);
        s--;  // decremented in its own statement; reading and modifying s in one expression is undefined
        r -= sub;
    }
    return r;
}
I'm sure everyone else has long since moved on to other topics but there was something in the back of my mind suggesting that there had to be a more efficient branch-less solution to this than just unrolling the loop in my other posted solution. A quick trip to my copy of Warren put me on the right track: Binary search. Here's my solution based on that idea:
Pseudo-code:
// see if there's a bit set in the upper half
if ((b >> 4) != 0) { offset = 4; b >>= 4; }
else offset = 0;
// see if there's a bit set in the upper half of what's left
if ((b & 0x0C) != 0) { offset += 2; b >>= 2; }
// see if there's a bit set in the upper half of what's left
if ((b & 0x02) != 0) { offset++; b >>= 1; }
return b + offset;
Branch-less C++ implementation:
static unsigned char check(unsigned char b)
{
    unsigned char adj = 4 & ((((unsigned char) - (b >> 4) >> 7) ^ 1) - 1);
    unsigned char offset = adj;
    b >>= adj;
    adj = 2 & (((((unsigned char) - (b & 0x0C)) >> 7) ^ 1) - 1);
    offset += adj;
    b >>= adj;
    adj = 1 & (((((unsigned char) - (b & 0x02)) >> 7) ^ 1) - 1);
    return (b >> adj) + offset + adj;
}
Yes, I know that this is all academic :)
It is not possible in plain C. The best I would suggest is the following implementation of check. Although it is quite "ugly", I think it runs faster than the check version in the question.
int check(unsigned char b)
{
    if(b&128) return 8;
    if(b&64) return 7;
    if(b&32) return 6;
    if(b&16) return 5;
    if(b&8) return 4;
    if(b&4) return 3;
    if(b&2) return 2;
    if(b&1) return 1;
    return 0;
}
Edit: I found a link to the actual code: http://www.hackersdelight.org/hdcodetxt/nlz.c.txt The algorithm below is named nlz8 in that file. You can choose your favorite hack. /* From last comment of: http://stackoverflow.com/a/671826/315052 > Hacker's Delight explains how to correct for the error in 32-bit floats > in 5-3 Counting Leading 0's. Here's their code, which uses an anonymous > union to overlap asFloat and asInt: k = k & ~(k >> 1); asFloat = > (float)k + 0.5f; n = 158 - (asInt >> 23); (and yes, this relies on > implementation-defined behavior) - Derrick Coetzee Jan 3 '12 at 8:35 */ unsigned char check (unsigned char b) { union { float asFloat; int asInt; } u; unsigned k = b & ~(b >> 1); u.asFloat = (float)k + 0.5f; return 32 - (158 - (u.asInt >> 23)); } Edit -- not exactly sure what the asker means by language independent, but below is the equivalent code in python. import ctypes class Anon(ctypes.Union): _fields_ = [ ("asFloat", ctypes.c_float), ("asInt", ctypes.c_int) ] def check(b): k = int(b) & ~(int(b) >> 1) a = Anon(asFloat=(float(k) + float(0.5))) return 32 - (158 - (a.asInt >> 23))
how to change the input of this program? (C Language)
How can I replace the use of (FILE) and (fopen) with (scanf) to get the input values and send in these 2 functions? I want to use this function in Objective-c code. For more info you can see the whole code here link static void stemfile(FILE * f) { while(TRUE) { int ch = getc(f); if (ch == EOF) return; if (LETTER(ch)) { int i = 0; while(TRUE) { if (i == i_max) increase_s(); ch = tolower(ch); /* forces lower case */ s[i] = ch; i++; ch = getc(f); if (!LETTER(ch)) { ungetc(ch,f); break; } } s[stem(s,0,i-1)+1] = 0; /* the previous line calls the stemmer and uses its result to zero-terminate the string in s */ printf("%s",s); } else putchar(ch); } } int main(int argc, char * argv[]) { int i; s = (char *) malloc(i_max+1); for (i = 1; i < argc; i++) { FILE * f = fopen(argv[i],"r"); if (f == 0) { fprintf(stderr,"File %s not found\n",argv[i]); exit(1); } stemfile(f); } free(s); return 0; }
The scanf() function cannot be a direct replacement for the existing code. The existing code (which is not very well written IMO) splits up the input character stream into letters (defined by the LETTER() macro to be either uppercase or lowercase characters) and non-letters, and converts these letter sequences into lowercase before applying the stem() function to them. The scanf() function, on the other hand, extracts primitive types (int, char, double, etc.) and explicitly delimited strings from the input stream. The delimiters in the given code (i.e. anything that is not LETTER()) are too vague for scanf() (though not for a regular expression). scanf() needs a specific character on each end of a substring to look for. Also, scanf() cannot convert to lowercase automatically.
Assuming your input continues to be files, I think the easiest solution might be to leave the code as-is and use it, convoluted as it may be. There is nothing about it that shouldn't run as part of a larger Objective-C program. Objective-C, after all, still provides access to the C standard library, at least within the limits that the operating system sets (iOS is far more limiting than MacOS, if you are on an Apple platform).
The general problem here is that of tokenization: breaking an input sequence of unclassified symbols (like characters) into a sequence of classified tokens (like words and spaces). A common approach to the problem is to use a finite state machine/automaton (FSA/FSM) to apply parsing logic to the input sequence and extract the tokens as they are encountered. An FSA can be a bit hard to set up, but it is very robust and general.
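For this particular input the state machine is tiny: two states, "in a word" and "not in a word". The sketch below is only an illustration of the idea; tokenize(), token_fn, count_cb() and count_tokens() are names invented here, and isalpha() stands in for the LETTER() macro. A real version would call stem() from the callback instead of counting.

```c
#include <ctype.h>
#include <stddef.h>

/* The two states of the machine: inside a run of letters, or not. */
typedef enum { IN_OTHER, IN_WORD } tok_state;

/* Hypothetical callback: receives each token as (start, length, is_word). */
typedef void (*token_fn)(const char *start, size_t len, int is_word, void *ctx);

static void tokenize(const char *src, token_fn emit, void *ctx)
{
    tok_state state = IN_OTHER;
    const char *start = src;
    const char *p = src;

    for (;;) {
        int at_end = (*p == '\0');
        tok_state next = (!at_end && isalpha((unsigned char)*p))
                             ? IN_WORD : IN_OTHER;

        /* A state change, or the end of input, closes the current token. */
        if (at_end || next != state) {
            if (p > start)
                emit(start, (size_t)(p - start), state == IN_WORD, ctx);
            start = p;
            state = next;
        }
        if (at_end)
            break;
        p++;
    }
}

/* Example consumer: count word tokens and non-word tokens. */
typedef struct { int words; int others; } token_counts;

static void count_cb(const char *start, size_t len, int is_word, void *ctx)
{
    token_counts *c = ctx;

    (void)start;
    (void)len;
    if (is_word)
        c->words++;
    else
        c->others++;
}

static token_counts count_tokens(const char *src)
{
    token_counts c = { 0, 0 };

    tokenize(src, count_cb, &c);
    return c;
}
```

Because the tokenizer works on a string rather than a FILE *, the caller is free to get that string from scanf(), from an Objective-C NSString, or from anywhere else.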
I'm still not sure why you would want to use scanf() in main(). It would presumably mean changing the interface of stemfile() (including the name since it would no longer be processing a file) to take a character string as input. And scanf() is going to make life difficult; it will read strings separated by blanks, which may be part of its attraction, but it will include any punctuation that is included in the 'word'. As Randall noted, the code in the existing function is a little obsure; I think it could be written more simply as follows: #include <stdio.h> #include <ctype.h> #define LETTER(x) isalpha(x) extern int stem(char *s, int lo, int hi); static void stemfile(FILE * f) { int ch; while ((ch = getc(f)) != EOF) { if (LETTER(ch)) { char s[1024]; int i = 0; s[i++] = ch; while ((ch = getc(f)) != EOF && LETTER(ch)) s[i++] = ch; if (ch != EOF) ungetc(ch, f); s[i] = '\0'; s[stem(s, 0, i-1)+1] = 0; /* the previous line calls the stemmer and uses its result to zero-terminate the string in s */ printf("%s", s); } else putchar(ch); } } I've slightly simplified things by making s into a simple local variable (it appears to have been a global, as does imax), removing imax and the increase_s() function. Those are largely incidental to the operation of the function. If you want this to process a (null-terminated) string instead, then: static void stemstring(const char *src) { char ch; while ((ch = *src++) != '\0') { if (LETTER(ch)) { int i = 0; char s[1024]; s[i++] = ch; while ((ch = *src++) != '\0' && LETTER(ch)) s[i++] = ch; if (ch != '\0') src--; s[i-1] = '\0'; s[stem(s,0,i-1)+1] = 0; /* the previous line calls the stemmer and uses its result to zero-terminate the string in s */ printf("%s",s); } else putchar(ch); } } This systematically changes getc(f) into *src++, EOF into \0, and ungetc() into src--. It also (safely) changes the type of ch from int (necessary for I/O) to char. 
If you are worried about buffer overflow, you have to work a bit harder in the function, but few words in practice will be even 1024 bytes (and you could use 4096 as easily as 1024, with correspondingly smaller - infinitesimal - chance of real data overflowing the buffer. You need to judge whether that is a 'real' risk for you. The main program can become quite simply: int main(void) { char string[1024]; while (scanf("%1023s", string) == 1) stemstring(string); return(0); } Clearly, because of the '1023' in the format, this will never overflow the inner buffer. (NB: Removed the . from "%.1023s" in first version of this answer; scanf() is not the same as printf()!). Challenged: does this work? Yes - this code below (adding a dummy stem() function and slightly modifying the printing) works reasonably well for me: #include <stdio.h> #include <ctype.h> #include <assert.h> #define LETTER(x) isalpha(x) #define MAX(x, y) (((x) > (y)) ? (x) : (y)) static int stem(const char *s, int begin, int end) { assert(s != 0); return MAX(end - begin - 3, 3); } static void stemstring(const char *src) { char ch; while ((ch = *src++) != '\0') { if (LETTER(ch)) { int i = 0; char s[1024]; s[i++] = ch; while ((ch = *src++) != '\0' && LETTER(ch)) s[i++] = ch; if (ch != '\0') src--; s[i-1] = '\0'; s[stem(s,0,i-1)+1] = 0; /* the previous line calls the stemmer and uses its result to zero-terminate the string in s */ printf("<<%s>>\n",s); } else putchar(ch); } putchar('\n'); } int main(void) { char string[1024]; while (scanf("%1023s", string) == 1) stemstring(string); return(0); } Example dialogue H: assda23 C: <<assd>> C: 23 H: 3423///asdrrrf12312 C: 3423///<<asdr>> C: 12312 H: 12//as//12 C: 12//<<a>> C: //12 The lines marked H: are human input (the H: was not part of the input); the lines marked C: are computer output. Next attempt The trouble with concentrating on grotesquely overlong words (1023-characters and more) is that you can overlook the simple. 
With scanf() reading data, you automatically get single 'words' with no spaces in them as input. Here's a debugged version of stemstring() with debugging printing code in place. The problem was two off-by-one errors. One was in the assignment s[i-1] = '\0'; where the -1 was not needed. The other was in the handling of the end of a string of letters; the while ((ch = *src++) != '\0') left src one place too far, which led to interesting effects with short words entered after long words (when the difference in length was 2 or more). There's a fairly detailed trace of the test case I devised, using words such as 'great' and 'book' which you diagnosed (correctly) as being mishandled. The stem() function here simply prints its inputs and outputs, and returns the full length of the string (so there is no stemming occurring).
#include <stdio.h>
#include <ctype.h>
#include <assert.h>

#define LETTER(x) isalpha(x)
#define MAX(x, y) (((x) > (y)) ? (x) : (y))

static int stem(const char *s, int begin, int end)
{
    int len = end - begin + 1;
    assert(s != 0);
    printf("ST (%d,%d) <<%*.*s>> RV %d\n", begin, end, len, len, s, len);
    // return MAX(end - begin - 3, 3);
    return len;
}

static void stemstring(const char *src)
{
    char ch;
    printf("-->> stemstring: <<%s>>\n", src);
    while ((ch = *src++) != '\0')
    {
        if (ch != '\0')
            printf("LP <<%c%s>>\n", ch, src);
        if (LETTER(ch))
        {
            int i = 0;
            char s[1024];
            s[i++] = ch;
            while ((ch = *src++) != '\0' && LETTER(ch))
                s[i++] = ch;
            src--;
            s[i] = '\0';
            printf("RD (%d) <<%s>>\n", i, s);
            s[stem(s, 0, i-1)+1] = '\0';
            /* the previous line calls the stemmer and uses its result to
               zero-terminate the string in s */
            printf("RS <<%s>>\n", s);
        }
        else
            printf("NL <<%c>>\n", ch);
    }
    //putchar('\n');
    printf("<<-- stemstring\n");
}

int main(void)
{
    char string[1024];
    while (scanf("%1023s", string) == 1)
        stemstring(string);
    return(0);
}
The debug-laden output is shown (the first line is the typed input; the rest is the output from the program):
what a great book this is!
What.hast.thou.done? -->> stemstring: <<what>> LP <<what>> RD (4) <<what>> ST (0,3) <<what>> RV 4 RS <<what>> <<-- stemstring -->> stemstring: <<a>> LP <<a>> RD (1) <<a>> ST (0,0) <<a>> RV 1 RS <<a>> <<-- stemstring -->> stemstring: <<great>> LP <<great>> RD (5) <<great>> ST (0,4) <<great>> RV 5 RS <<great>> <<-- stemstring -->> stemstring: <<book>> LP <<book>> RD (4) <<book>> ST (0,3) <<book>> RV 4 RS <<book>> <<-- stemstring -->> stemstring: <<this>> LP <<this>> RD (4) <<this>> ST (0,3) <<this>> RV 4 RS <<this>> <<-- stemstring -->> stemstring: <<is!>> LP <<is!>> RD (2) <<is>> ST (0,1) <<is>> RV 2 RS <<is>> LP <<!>> NL <<!>> <<-- stemstring -->> stemstring: <<What.hast.thou.done?>> LP <<What.hast.thou.done?>> RD (4) <<What>> ST (0,3) <<What>> RV 4 RS <<What>> LP <<.hast.thou.done?>> NL <<.>> LP <<hast.thou.done?>> RD (4) <<hast>> ST (0,3) <<hast>> RV 4 RS <<hast>> LP <<.thou.done?>> NL <<.>> LP <<thou.done?>> RD (4) <<thou>> ST (0,3) <<thou>> RV 4 RS <<thou>> LP <<.done?>> NL <<.>> LP <<done?>> RD (4) <<done>> ST (0,3) <<done>> RV 4 RS <<done>> LP <<?>> NL <<?>> <<-- stemstring The techniques shown - printing diagnostic information at key points in the program - is one way of debugging a program such as this. The alternative is stepping through the code with a source code debugger - gdb or its equivalent. I probably more often use print statements, but I'm an old fogey who finds IDE's too hard to use (because they don't behave like the command line I'm used to). Granted, it isn't your code any more, but I do think you should have been able to do most of the debugging yourself. I'm grateful that you reported the trouble with my code. However, you also need to learn how to diagnose problems in other people's code; how to instrument it; how to characterize and locate the problems. You could then report the problem with precision - "you goofed with your end of word condition, and ...".