How to change the deflate stream output format (raw, zlib, gzip) when using zlib? - gzip

zlib can output three formats. I searched the docs and zlib.h but couldn't find a clear explanation of the options. Does anyone have any ideas?

From the zlib.h documentation of deflateInit2():
windowBits can also be -8..-15 for raw deflate. In this case, -windowBits
determines the window size. deflate() will then generate raw deflate data
with no zlib header or trailer, and will not compute a check value.
windowBits can also be greater than 15 for optional gzip encoding. Add
16 to windowBits to write a simple gzip header and trailer around the
compressed data instead of a zlib wrapper.

Fill in the blanks
int get_file_format(int n) {
    if (n == 0) return 31;
    else if (n == 1) return 15;
    else if (n == 2) return -15;
    else if (n >= 9 && n <= 15) return n;   /* zlib with window size 2^9 to 2^15 */
    else if (n >= 25 && n <= 31) return n;  /* gzip with window size 2^9 to 2^15 */
    else if (n >= -15 && n <= -9) return n; /* raw with window size 2^9 to 2^15 */
    else return Z_ERRNO;
}
z_hist_sz = get_file_format(n);
ret = deflateInit2(&strm, COMPRESS_LEVEL, Z_DEFLATED, z_hist_sz ...)
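The rule quoted from zlib.h can also be written as a small mapping from format to windowBits. This is only a sketch: `stream_format` and `window_bits_for` are hypothetical names invented here, not part of zlib, and the window range is restricted to 9..15 as in the answer above.

```c
/* Hypothetical names for illustration; zlib itself defines none of these. */
enum stream_format { FORMAT_RAW, FORMAT_ZLIB, FORMAT_GZIP };

/* Return a windowBits value for deflateInit2(), or 0 if log2_window is
   outside 9..15 (the range used in the answer above). */
static int window_bits_for(enum stream_format fmt, int log2_window) {
    if (log2_window < 9 || log2_window > 15)
        return 0;
    switch (fmt) {
    case FORMAT_RAW:  return -log2_window;     /* negative: no header or trailer */
    case FORMAT_ZLIB: return log2_window;      /* plain value: zlib wrapper */
    case FORMAT_GZIP: return log2_window + 16; /* add 16: gzip wrapper */
    }
    return 0;
}
```

The result would be passed as the fourth argument, for example `deflateInit2(&strm, level, Z_DEFLATED, window_bits_for(FORMAT_GZIP, 15), 8, Z_DEFAULT_STRATEGY)`.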

Related

Are the compressed bytes inside GZIP and PKZIP files compatible?

This question is a follow-up to "How are zlib, gzip and zip related? What do they have in common and how are they different?" The answers are very detailed but they never quite answer my specific question.
Given a valid GZIP file, should I always be able to extract the deflate-bytes inside and use those bytes to construct a valid PKZIP file with the same contents, without decompressing and recompressing that byte stream?
For example, imagine I have a collection of GZIP files. Could I write a program that quickly (by avoiding deflate/inflate) constructs an equivalent PKZIP file of those files by cutting the GZIP headers off the source files and building a PKZIP structure around the byte streams? (Also the same in reverse by taking any valid PKZIP file and quickly convert them into many GZIP files?)
Both file formats appear to use the same "deflate" algorithm, but is it exactly the same deflate algorithm?
Yes. It is exactly the same deflate format.
(The deflate algorithm can be, and in fact often is different, producing different deflate streams. However that is irrelevant to your application. The format is compatible, and any compliant inflator will be able to decompress the gzip deflate data transplanted into a zip file.)
I forgot why I wrote this, but the C code below will convert a gzip file to a single-entry zip file, with some constraints on the gzip file.
/*
gz2zip.c version 1.0, 31 July 2018
Copyright (C) 2018 Mark Adler
This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any damages
arising from the use of this software.
Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:
1. The origin of this software must not be misrepresented; you must not
claim that you wrote the original software. If you use this software
in a product, an acknowledgment in the product documentation would be
appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be
misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.
Mark Adler
madler#alumni.caltech.edu
*/
// Convert gzip (.gz) file to a single entry zip file. See the comments before
// gz2zip() for more details and caveats.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#if defined(MSDOS) || defined(OS2) || defined(WIN32) || defined(__CYGWIN__)
# include <fcntl.h>
# include <io.h>
# define SET_BINARY_MODE(file) setmode(fileno(file), O_BINARY)
#else
# define SET_BINARY_MODE(file)
#endif
#define local static
// Exit on error.
local void bail(char *why) {
fprintf(stderr, "gz2zip abort: %s\n", why);
exit(1);
}
// Type to track number of bytes written.
typedef struct {
FILE *out;
off_t off;
} tally_t;
// Write len bytes at dat to t.
local void put(tally_t *t, void const *dat, size_t len) {
size_t ret = fwrite(dat, 1, len, t->out);
if (ret != len)
bail("write error");
t->off += len;
}
// Write 16-bit integer n in little-endian order to t.
local void put2(tally_t *t, unsigned n) {
unsigned char dat[2];
dat[0] = n;
dat[1] = n >> 8;
put(t, dat, 2);
}
// Write 32-bit integer n in little-endian order to t.
local void put4(tally_t *t, unsigned long n) {
put2(t, n);
put2(t, n >> 16);
}
// Write n zeros to t.
local void putz(tally_t *t, unsigned n) {
unsigned char const buf[1] = {0};
while (n--)
put(t, buf, 1);
}
// Convert the Unix time unix to DOS time in the four bytes at *dos. If there
// is a conversion error for any reason, store the current time in DOS format
// at *dos. The Unix time in seconds is rounded up to an even number of
// seconds, since the DOS time can only represent even seconds. If the Unix
// time is before 1980, the minimum DOS time of Jan 1, 1980 is used.
local void unix2dos(unsigned char *dos, time_t unix) {
unix += unix & 1;
struct tm *s = localtime(&unix);
if (s == NULL) {
unix = time(NULL); // on error, use current time
unix += unix & 1;
s = localtime(&unix);
if (s == NULL)
bail("internal error"); // shouldn't happen
}
if (s->tm_year < 80) { // no DOS time before 1980
dos[0] = 0; dos[1] = 0; // use midnight,
dos[2] = (1 << 5) + 1; dos[3] = 0; // Jan 1, 1980
}
else {
dos[0] = (s->tm_min << 5) + (s->tm_sec >> 1);
dos[1] = (s->tm_hour << 3) + (s->tm_min >> 3);
dos[2] = ((s->tm_mon + 1) << 5) + s->tm_mday;
dos[3] = ((s->tm_year - 80) << 1) + ((s->tm_mon + 1) >> 3);
}
}
// Chunk size for reading and writing raw deflate data.
#define CHUNK 16384
// Read the gzip file from in and write it as a single-entry zip file to out.
// This assumes that the gzip file has a single member, that it has no junk
// after the gzip trailer, and that it contains less than 4GB of uncompressed
// data. The gzip file is not decompressed or validated, other than checking
// for the proper header format. The modification time from the gzip header is
// used for the zip entry, unless it is not present, in which case the current
// local time is used for the zip entry. The file name from the gzip header is
// used for the zip entry, unless it is not present, in which case "-" is used.
// This does not use the Zip64 format, so the offsets in the resulting zip file
// must be less than 4GB. If name is not NULL, then the zero-terminated string
// at name is used as the file name for the single entry. Whether the file name
// comes from the gzip header or from name, it is truncated to 64K-1 characters
// if necessary.
//
// It is recommended that unzip -t be used on the resulting file to verify its
// integrity. If the gzip files do not obey the constraints above, then the zip
// file will not be valid.
local void gz2zip(FILE *in, FILE *out, char *name) {
// zip file constant headers for local, central, and end record
unsigned char const loc[] = {'P', 'K', 3, 4, 20, 0, 8, 0, 8, 0};
unsigned char const cen[] = {'P', 'K', 1, 2, 20, 0, 20, 0, 8, 0, 8, 0};
unsigned char const end[] = {'P', 'K', 5, 6, 0, 0, 0, 0, 1, 0, 1, 0};
// gzip header
unsigned char head[10];
// zip file modification date, CRC, and sizes -- initialize to zero for the
// local header (the actual CRC and sizes follow the compressed data)
unsigned char desc[16] = {0};
// name from gzip header to use for the zip entry (the maximum size of the
// name is 64K-1 -- if the gzip name is longer, then it is truncated)
unsigned name_len;
char save[65535];
// read and interpret the gzip header, bailing if it is invalid or has an
// unknown compression method or flag bits set
size_t got = fread(head, 1, sizeof(head), in);
if (got < sizeof(head) ||
head[0] != 0x1f || head[1] != 0x8b || head[2] != 8 || (head[3] & 0xe0))
bail("input not gzip");
if (head[3] & 4) { // extra field (ignore)
unsigned extra = getc(in);
int high = getc(in);
if (high == EOF)
bail("premature end of gzip input");
extra += (unsigned)high << 8;
fread(save, 1, extra, in); // skip the extra field, using save as scratch space
}
if (head[3] & 8) { // file name (save)
name_len = 0;
int ch;
while ((ch = getc(in)) != 0 && ch != EOF)
if (name_len < sizeof(save))
save[name_len++] = ch;
}
else { // no file name
name_len = 1;
save[0] = '-';
}
if (head[3] & 16) { // comment (ignore)
int ch;
while ((ch = getc(in)) != 0 && ch != EOF)
;
}
if (head[3] & 2) { // header crc (ignore)
getc(in);
getc(in);
}
// use name from argument if present, otherwise from gzip header
if (name == NULL)
name = save;
else {
name_len = strlen(name);
if (name_len > 65535)
name_len = 65535;
}
// set modification time and date in descriptor from gzip header
time_t mod = head[4] + (head[5] << 8) + ((time_t)(head[6]) << 16) +
((time_t)(head[7]) << 24);
unix2dos(desc, mod ? mod : time(NULL));
// initialize tally of output bytes
tally_t zip = {out, 0};
// write zip local header
off_t locoff = zip.off;
put(&zip, loc, sizeof(loc));
put(&zip, desc, sizeof(desc));
put2(&zip, name_len);
putz(&zip, 2);
put(&zip, name, name_len);
// copy raw deflate stream, saving eight-byte gzip trailer
unsigned char buf[CHUNK + 8];
if (fread(buf, 1, 8, in) != 8)
bail("premature end of gzip input");
off_t comp = 0;
while ((got = fread(buf + 8, 1, CHUNK, in)) != 0) {
put(&zip, buf, got);
comp += got;
memmove(buf, buf + got, 8);
}
// write descriptor based on gzip trailer and compressed count
memcpy(desc + 4, buf, 4);
desc[8] = comp;
desc[9] = comp >> 8;
desc[10] = comp >> 16;
desc[11] = comp >> 24;
memcpy(desc + 12, buf + 4, 4);
put(&zip, desc + 4, sizeof(desc) - 4);
// write zip central directory
off_t cenoff = zip.off;
put(&zip, cen, sizeof(cen));
put(&zip, desc, sizeof(desc));
put2(&zip, name_len);
putz(&zip, 12);
put4(&zip, locoff);
put(&zip, name, name_len);
// write zip end-of-central-directory record
off_t endoff = zip.off;
put(&zip, end, sizeof(end));
put4(&zip, endoff - cenoff);
put4(&zip, cenoff);
putz(&zip, 2);
}
// Convert the gzip file on stdin to a zip file on stdout. If present, the
// first argument is used as the file name in the zip entry.
int main(int argc, char **argv) {
// avoid end-of-line conversions on evil operating systems
SET_BINARY_MODE(stdin);
SET_BINARY_MODE(stdout);
// convert .gz on stdin to .zip on stdout -- error returns use exit()
gz2zip(stdin, stdout, argc > 1 ? argv[1] : NULL);
return 0;
}
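The header-skipping logic in gz2zip() can be isolated into a function that works on a buffer in memory and reports where the raw deflate data starts. This is a minimal sketch, not part of the program above: `gzip_deflate_offset` is a hypothetical name, and it assumes the whole header is already in `buf`.

```c
#include <stddef.h>

/* Return the offset of the first deflate byte of the gzip member held in
   buf[0..len-1], or 0 if the header is invalid or truncated. */
static size_t gzip_deflate_offset(const unsigned char *buf, size_t len) {
    if (len < 10 || buf[0] != 0x1f || buf[1] != 0x8b || buf[2] != 8 ||
        (buf[3] & 0xe0))
        return 0;
    unsigned flags = buf[3];
    size_t pos = 10;                    /* fixed part of the header */
    if (flags & 4) {                    /* extra field: 2-byte length + data */
        if (len - pos < 2)
            return 0;
        size_t extra = buf[pos] | ((size_t)buf[pos + 1] << 8);
        pos += 2;
        if (len - pos < extra)
            return 0;
        pos += extra;
    }
    if (flags & 8) {                    /* file name, zero-terminated */
        while (pos < len && buf[pos] != 0)
            pos++;
        if (pos == len)
            return 0;
        pos++;
    }
    if (flags & 16) {                   /* comment, zero-terminated */
        while (pos < len && buf[pos] != 0)
            pos++;
        if (pos == len)
            return 0;
        pos++;
    }
    if (flags & 2) {                    /* header CRC, 2 bytes */
        if (len - pos < 2)
            return 0;
        pos += 2;
    }
    return pos;
}
```

Everything from that offset up to the last eight bytes of the member (CRC-32 and uncompressed length) is the deflate stream that gets copied into the zip entry.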

Understanding RiscV objdump

I am examining the objdump of a C file that I have compiled using the following commands:
riscv64-unknown-elf-gcc -O0 -o maxmul.o maxmul.c
riscv64-unknown-elf-objdump -d maxmul.o > maxmul.dump
Strangely (or not), the addresses appear to be aligned on 16-bit boundaries rather than on 32-bit words.
Can anyone please explain why?
Thanks.
objdump excerpt:
00000000000101da <main>:
101da: 7155 addi sp,sp,-208
101dc: e586 sd ra,200(sp)
101de: e1a2 sd s0,192(sp)
101e0: 0980 addi s0,sp,208
...
C-code:
int main()
{
int first[3][3], second[3][3], multiply[3][3];
int golden[3][3];
int sum = 0;
first[0][0] = 1; first[0][1] = 2; first[0][2] = 3;
first[1][0] = 4; first[1][1] = 5; first[1][2] = 6;
first[2][0] = 7; first[2][1] = 8; first[2][2] = 9;
second[0][0] = 9; second[0][1] = 8; second[0][2] = -7;
second[1][0] = -6; second[1][1] = 5; second[1][2] = 4;
second[2][0] = 3; second[2][1] = 2; second[2][2] = -1;
golden[0][0] = 6; golden[0][1] = 24; golden[0][2] = -2;
golden[1][0] = 24; golden[1][1] = 69; golden[1][2] = -14;
golden[2][0] = 42; golden[2][1] = 114; golden[2][2] = -26;
int i, ii, iii;
for (i = 0; i < 3; i++) {
for (ii = 0; ii < 3; ii++) {
for (iii = 0; iii < 3; iii++) {
//printf("first[%d][%d] * second[%d][%d] \n", i, iii, iii, ii);
//printf("%d * %d (%d,%d)\n", first[i][ii], second[ii][i], i, ii);
sum += first[i][iii] * second[iii][ii];
}
//printf("sum = %d\n", sum);
multiply[i][ii] = sum;
sum = 0;
}
}
int c, d;
int err = 0;
for ( c = 0; c < 3; c++) {
for ( d = 0; d < 3; d++) {
//printf("%d\t", multiply[c][d]);
if (multiply[c][d] != golden[c][d]) {
fail(golden[c][d], multiply[c][d]);
err++;
}
}
//printf("\n");
}
if (err == 0) {
pass();
}
return 0;
}
I suspect that your gcc compiles by default with the compressed instruction format, in which 16-bit and 32-bit instructions can be intermixed. In that case, instructions are aligned on 16-bit boundaries, as you can see in the disassembled code.
objdump prints the address, the encoding, and the mnemonic. In your excerpt the encoding is always 16 bits wide, which means that the compiler has selected 16-bit instructions where possible.
By enabling verbose mode (-v), you can see that the defaults are -march=rv64imafdc and -mabi=lp64d. The default target ISA includes the compressed extension, and the target ABI requires the double-precision floating-point extension.
By setting -march=rv64imafd and leaving the ABI unchanged, gcc compiles using only 32-bit instructions, because the compressed extension is no longer enabled.
The instruction addresses are then always 32-bit aligned.
When compiling (or assembling) to RV64GC or RV32GC (or another target that enables the "C" Standard Extension Compressed Instructions), the compiler (or assembler) automatically replaces some instructions with compressed ones.
Non-compressed instructions are encoded in 32 bit, while compressed instructions are encoded in 16 bit.
When a compressed instruction is emitted, it changes the alignment of the next instruction, either from 32 bit to 16 bit or from 16 bit to 32 bit. That means that not only 16-bit instructions but also 32-bit ones may be aligned to a 16-bit address. In other words, both types of instructions (compressed and normal) are tightly packed side by side.
By default, objdump -d doesn't explicitly indicate that an instruction is compressed, because it uses the same mnemonic as for the uncompressed variant, although the number of bytes in the displayed raw instruction gives it away (4 vs. 2 bytes).
However, you can tell objdump to use separate mnemonics for compressed instructions such that they are more easily recognizable (those start with c. then), e.g.:
$ riscv64-unknown-elf-objdump -d -M no-aliases rotate
[..]
101e4: 00d66533 or a0,a2,a3
101e8: 8082 c.jr ra
00000000000101ea <rotr>:
101ea: 00b55633 srl a2,a0,a1
[..]
Note that with the switch -M no-aliases pseudo-instructions aren't displayed anymore, but the corresponding instruction(s) instead.
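The 2-byte versus 4-byte distinction can also be read directly from the encoding: in the base RISC-V encoding scheme, a 32-bit instruction has its two lowest bits set to 11, and any other value marks a 16-bit compressed instruction. A minimal sketch follows; `rv_insn_length` is a hypothetical helper name, and the reserved encodings longer than 32 bits are deliberately ignored.

```c
#include <stdint.h>

/* Length in bytes of the instruction whose lowest 16 bits are low16.
   Only the 16-bit and 32-bit encodings are handled; longer reserved
   encodings are ignored in this sketch. */
static int rv_insn_length(uint16_t low16) {
    return (low16 & 0x3) == 0x3 ? 4 : 2;
}
```

Applied to the listings above, 0x7155 and 0x8082 come out as 2 bytes, while 0x00d66533 (low half 0x6533) comes out as 4 bytes.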

java method: java.lang.Integer.numberOfLeadingZeros(int) can be optimized

The original code is:
public static int numberOfLeadingZeros(int i) {
// HD, Figure 5-6
if (i == 0)
return 32;
int n = 1;
if (i >>> 16 == 0) { n += 16; i <<= 16; }
if (i >>> 24 == 0) { n += 8; i <<= 8; }
if (i >>> 28 == 0) { n += 4; i <<= 4; }
if (i >>> 30 == 0) { n += 2; i <<= 2; }
n -= i >>> 31;
return n;
}
I think it can be optimized by adding the following condition:
if (i < 0)
return 0;
The fully optimized code is:
public static int numberOfLeadingZeros(int i) {
if(i<=0) {
return i < 0 ? 0 : 32;
}
int n = 1;
if (i >>> 16 == 0) { n += 16; i <<= 16; }
if (i >>> 24 == 0) { n += 8; i <<= 8; }
if (i >>> 28 == 0) { n += 4; i <<= 4; }
if (i >>> 30 == 0) { n += 2; i <<= 2; }
n -= i >>> 31;
return n;
}
In theory yes, your suggestion makes sense.
In practice, unless you use an exotic JVM, it will not make any difference because the method is intrinsic, so the code that is executed is not the code you can find in the Java class.
For example, on x86/64 CPUs the intrinsic uses the bsrl CPU instruction, which is as fast as you can hope for.
Besides the fact that this method will likely get replaced by an intrinsic operation in hot spots, this check for negative numbers is only an improvement if the number is negative. For positive numbers, it is just an additional condition to be evaluated.
So the worth of this optimization depends on the likelihood of negative arguments to this function. When I consider typical use cases of this function, I'd consider negative values a corner case rather than a typical argument.
Note that the special handling of zero at the beginning is not an optimization but a requirement, as the algorithm wouldn't return the correct result for zero without that special handling.
Since your bug report led to an alternative (also shown in your updated question) that improves the negative number case without affecting the performance of the positive number case, because it fuses the required zero test and the test for negative numbers into a single pre-test, there is nothing preventing the suggested optimization.
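The point about zero can be checked with a direct C port of the method body. This is a sketch rather than the JDK code: `uint32_t` stands in for Java's int so that `>>` behaves like `>>>`, and the names `nlz_unchecked` and `nlz` are made up for the illustration.

```c
#include <stdint.h>

/* The shift-and-test body without any pre-test. */
static int nlz_unchecked(uint32_t i) {
    int n = 1;
    if (i >> 16 == 0) { n += 16; i <<= 16; }
    if (i >> 24 == 0) { n += 8;  i <<= 8; }
    if (i >> 28 == 0) { n += 4;  i <<= 4; }
    if (i >> 30 == 0) { n += 2;  i <<= 2; }
    n -= (int)(i >> 31);
    return n;
}

/* With the pre-tests: zero is required for correctness, while the sign-bit
   test (Java's i < 0) is only a shortcut. */
static int nlz(uint32_t i) {
    if (i == 0)
        return 32;
    if (i >> 31)
        return 0;
    return nlz_unchecked(i);
}
```

Without the pre-test, an argument of zero yields 31 instead of 32, whereas a value with the top bit set already yields 0, so only the zero test changes the result.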
A bug has been filed in the Oracle bug database: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8189230

openssl accepts a key of any size

How does openssl handle the key, given that it accepts a key of any size (from 1 byte upward)? What is the procedure for arriving at the actual key here?
openssl enc -d -des-ecb -in cipher.txt -out text.out -K '530343412312345445123345677812345678812324'
how does openssl works with key ... What is the procedure...
It depends on the program, but procedures are usually consistent across the library. In your example, you are running openssl enc -d, so you are using the enc sub-program in decrypt mode. The source code is available in <openssl dir>/apps/enc.c (encryption and decryption are both handled by enc.c).
Here's the relevant parts:
unsigned char key[EVP_MAX_KEY_LENGTH],iv[EVP_MAX_IV_LENGTH];
unsigned char salt[PKCS5_SALT_LEN];
...
char *hkey=NULL,*hiv=NULL,*hsalt = NULL;
The argument to -K is stored in hkey:
else if (strcmp(*argv,"-K") == 0)
{
if (--argc < 1) goto bad;
hkey= *(++argv);
}
Then, around line 580:
if ((hkey != NULL) && !set_hex(hkey,key,sizeof key))
{
/* Handle failure */
}
set_hex is shown below; it hex-decodes the argument passed in through -K. It back-fills the unused length with 0's via the memset. The unused length is EVP_MAX_KEY_LENGTH minus the length of the -K argument (after hex decoding).
Finally, around line 610:
if (!EVP_CipherInit_ex(ctx, NULL, NULL, key, iv, enc))
{
/* Handle failure */
}
Note: -k (small k) takes a different code path and uses EVP_BytesToKey to derive the key.
int set_hex(char *in, unsigned char *out, int size)
{
int i,n;
unsigned char j;
n=strlen(in);
if (n > (size*2))
{
BIO_printf(bio_err,"hex string is too long\n");
return(0);
}
memset(out,0,size);
for (i=0; i<n; i++)
{
j=(unsigned char)*in;
*(in++)='\0';
if (j == 0) break;
if ((j >= '0') && (j <= '9'))
j-='0';
else if ((j >= 'A') && (j <= 'F'))
j=j-'A'+10;
else if ((j >= 'a') && (j <= 'f'))
j=j-'a'+10;
else
{
BIO_printf(bio_err,"non-hex digit\n");
return(0);
}
if (i&1)
out[i/2]|=j;
else
out[i/2]=(j<<4);
}
return(1);
}
My observations of this case led to the following conclusions:
It takes a hex value.
If the size is less than 8 bytes, it pads with 0.
It takes the first 8 bytes as the key.
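The decode-and-pad behaviour can be reproduced without OpenSSL. The sketch below is modeled on set_hex() but is not an OpenSSL function: `hex_to_key` is a hypothetical name, and unlike set_hex() it does not overwrite the input string.

```c
#include <stddef.h>
#include <string.h>

/* Decode hex into out[0..size-1], zero-filling the unused bytes.
   Returns 1 on success, 0 if the string is too long or contains a
   non-hex digit. */
static int hex_to_key(const char *hex, unsigned char *out, size_t size) {
    size_t n = strlen(hex);
    if (n > size * 2)
        return 0;
    memset(out, 0, size);
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)hex[i];
        unsigned char v;
        if (c >= '0' && c <= '9')
            v = (unsigned char)(c - '0');
        else if (c >= 'A' && c <= 'F')
            v = (unsigned char)(c - 'A' + 10);
        else if (c >= 'a' && c <= 'f')
            v = (unsigned char)(c - 'a' + 10);
        else
            return 0;
        if (i & 1)
            out[i / 2] |= v;                      /* low nibble */
        else
            out[i / 2] = (unsigned char)(v << 4); /* high nibble */
    }
    return 1;
}
```

With a 32-byte buffer (an assumed stand-in for EVP_MAX_KEY_LENGTH), the -K value from the question decodes to 21 bytes followed by zeros. DES uses an 8-byte key, so only 53 03 43 41 23 12 34 54 would matter, which matches the observations above.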

Determine Position of Most Significant Set Bit in a Byte

I have a byte I am using to store bit flags. I need to compute the position of the most significant set bit in the byte.
Example Byte: 00101101 => 6 is the position of the most significant set bit
Compact Hex Mapping:
[0x00] => 0x00
[0x01] => 0x01
[0x02,0x03] => 0x02
[0x04,0x07] => 0x03
[0x08,0x0F] => 0x04
[0x10,0x1F] => 0x05
[0x20,0x3F] => 0x06
[0x40,0x7F] => 0x07
[0x80,0xFF] => 0x08
TestCase in C:
#include <stdio.h>
unsigned char check(unsigned char b) {
unsigned char c = 0x08;
unsigned char m = 0x80;
do {
if(m&b) { return c; }
else { c -= 0x01; }
} while(m>>=1);
return 0; // reached only when b == 0
}
int main() {
unsigned char input[256] = {
0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f,
0x10,0x11,0x12,0x13,0x14,0x15,0x16,0x17,0x18,0x19,0x1a,0x1b,0x1c,0x1d,0x1e,0x1f,
0x20,0x21,0x22,0x23,0x24,0x25,0x26,0x27,0x28,0x29,0x2a,0x2b,0x2c,0x2d,0x2e,0x2f,
0x30,0x31,0x32,0x33,0x34,0x35,0x36,0x37,0x38,0x39,0x3a,0x3b,0x3c,0x3d,0x3e,0x3f,
0x40,0x41,0x42,0x43,0x44,0x45,0x46,0x47,0x48,0x49,0x4a,0x4b,0x4c,0x4d,0x4e,0x4f,
0x50,0x51,0x52,0x53,0x54,0x55,0x56,0x57,0x58,0x59,0x5a,0x5b,0x5c,0x5d,0x5e,0x5f,
0x60,0x61,0x62,0x63,0x64,0x65,0x66,0x67,0x68,0x69,0x6a,0x6b,0x6c,0x6d,0x6e,0x6f,
0x70,0x71,0x72,0x73,0x74,0x75,0x76,0x77,0x78,0x79,0x7a,0x7b,0x7c,0x7d,0x7e,0x7f,
0x80,0x81,0x82,0x83,0x84,0x85,0x86,0x87,0x88,0x89,0x8a,0x8b,0x8c,0x8d,0x8e,0x8f,
0x90,0x91,0x92,0x93,0x94,0x95,0x96,0x97,0x98,0x99,0x9a,0x9b,0x9c,0x9d,0x9e,0x9f,
0xa0,0xa1,0xa2,0xa3,0xa4,0xa5,0xa6,0xa7,0xa8,0xa9,0xaa,0xab,0xac,0xad,0xae,0xaf,
0xb0,0xb1,0xb2,0xb3,0xb4,0xb5,0xb6,0xb7,0xb8,0xb9,0xba,0xbb,0xbc,0xbd,0xbe,0xbf,
0xc0,0xc1,0xc2,0xc3,0xc4,0xc5,0xc6,0xc7,0xc8,0xc9,0xca,0xcb,0xcc,0xcd,0xce,0xcf,
0xd0,0xd1,0xd2,0xd3,0xd4,0xd5,0xd6,0xd7,0xd8,0xd9,0xda,0xdb,0xdc,0xdd,0xde,0xdf,
0xe0,0xe1,0xe2,0xe3,0xe4,0xe5,0xe6,0xe7,0xe8,0xe9,0xea,0xeb,0xec,0xed,0xee,0xef,
0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7,0xf8,0xf9,0xfa,0xfb,0xfc,0xfd,0xfe,0xff };
unsigned char truth[256] = {
0x00,0x01,0x02,0x02,0x03,0x03,0x03,0x03,0x04,0x04,0x04,0x04,0x04,0x04,0x04,0x04,
0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,
0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,
0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,
0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,
0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,
0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,
0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08};
int i,r;
int f = 0;
for(i=0; i<256; ++i) {
r=check(input[i]);
if(r !=(truth[i])) {
printf("failed %d : 0x%x : %d\n",i,0x000000FF & ((int)input[i]),r);
f += 1;
}
}
if(!f) { printf("passed all\n"); }
else { printf("failed %d\n",f); }
return 0;
}
I would like to simplify my check() function to not involve looping (or branching preferably). Is there a bit twiddling hack or hashed lookup table solution to compute the position of the most significant set bit in a byte?
Your question is about an efficient way to compute log2 of a value. Because you seem to want a solution that is not limited to the C language, I have been slightly lazy and tweaked some C# code I have.
You want to compute log2(x) + 1, and for x = 0 (where log2 is undefined) you define the result as 0 (i.e. you create a special case where log2(0) = -1).
static readonly Byte[] multiplyDeBruijnBitPosition = new Byte[] {
7, 2, 3, 4,
6, 1, 5, 0
};
public static Byte Log2Plus1(Byte value) {
if (value == 0)
return 0;
var roundedValue = value;
roundedValue |= (Byte) (roundedValue >> 1);
roundedValue |= (Byte) (roundedValue >> 2);
roundedValue |= (Byte) (roundedValue >> 4);
var log2 = multiplyDeBruijnBitPosition[((Byte) (roundedValue*0xE3)) >> 5];
return (Byte) (log2 + 1);
}
This bit twiddling hack is taken from Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup where you can see the equivalent C source code for 32 bit values. This code has been adapted to work on 8 bit values.
However, you may be able to use a very efficient built-in function that gives you the result directly (on many CPUs a single instruction such as Bit Scan Reverse is used). An answer to the question Bit twiddling: which bit is set? has some information about this. A quote from the answer provides one possible reason why there is low-level support for solving this problem:
Things like this are the core of many O(1) algorithms such as kernel schedulers which need to find the first non-empty queue signified by an array of bits.
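Since the question's test case is in C, here is a C port of the De Bruijn lookup from the C# code above. It is a straightforward translation offered as a sketch; the function name `msb_position` is made up, and the table and the 0xE3 multiplier are taken unchanged from the C# version.

```c
#include <stdint.h>

/* Position (1..8) of the most significant set bit of value, or 0 if no
   bit is set. */
static uint8_t msb_position(uint8_t value) {
    static const uint8_t table[8] = {7, 2, 3, 4, 6, 1, 5, 0};
    if (value == 0)
        return 0;
    uint8_t v = value;
    v |= v >> 1;    /* smear the top set bit downward ... */
    v |= v >> 2;
    v |= v >> 4;    /* ... so v is now one of 1, 3, 7, ..., 255 */
    /* the top three bits of the 8-bit product index the table */
    return (uint8_t)(table[(uint8_t)(v * 0xE3u) >> 5] + 1);
}
```

For the example byte 00101101 (0x2D) this returns 6. The zero test remains as a branch, as in the C# original.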
That was a fun little challenge. I don't know if this one is completely portable since I only have VC++ to test with, and I certainly can't say for sure if it's more efficient than other approaches. This version was coded with a loop but it can be unrolled without too much effort.
static unsigned char check(unsigned char b)
{
unsigned char r = 8;
unsigned char sub = 1;
unsigned char s = 7;
for (char i = 0; i < 8; i++)
{
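sub = sub & (((b >> s) & 1) - 1); // stays 1 until the first set bit is seen
s--; // decremented separately; reading s and s-- in one expression is undefined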
r -= sub;
}
return r;
}
I'm sure everyone else has long since moved on to other topics but there was something in the back of my mind suggesting that there had to be a more efficient branch-less solution to this than just unrolling the loop in my other posted solution. A quick trip to my copy of Warren put me on the right track: Binary search.
Here's my solution based on that idea:
Pseudo-code:
// see if there's a bit set in the upper half
if ((b >> 4) != 0)
{
offset = 4;
b >>= 4;
}
else
offset = 0;
// see if there's a bit set in the upper half of what's left
if ((b & 0x0C) != 0)
{
offset += 2;
b >>= 2;
}
// see if there's a bit set in the upper half of what's left
if ((b & 0x02) != 0)
{
offset++;
b >>= 1;
}
return b + offset;
Branch-less C++ implementation:
static unsigned char check(unsigned char b)
{
unsigned char adj = 4 & ((((unsigned char) - (b >> 4) >> 7) ^ 1) - 1);
unsigned char offset = adj;
b >>= adj;
adj = 2 & (((((unsigned char) - (b & 0x0C)) >> 7) ^ 1) - 1);
offset += adj;
b >>= adj;
adj = 1 & (((((unsigned char) - (b & 0x02)) >> 7) ^ 1) - 1);
return (b >> adj) + offset + adj;
}
Yes, I know that this is all academic :)
It is not possible in plain C. The best I would suggest is the following implementation of check. Despite being quite "ugly", I think it runs faster than the check version in the question.
int check(unsigned char b)
{
if(b&128) return 8;
if(b&64) return 7;
if(b&32) return 6;
if(b&16) return 5;
if(b&8) return 4;
if(b&4) return 3;
if(b&2) return 2;
if(b&1) return 1;
return 0;
}
Edit: I found a link to the actual code: http://www.hackersdelight.org/hdcodetxt/nlz.c.txt
The algorithm below is named nlz8 in that file. You can choose your favorite hack.
/*
From last comment of: http://stackoverflow.com/a/671826/315052
> Hacker's Delight explains how to correct for the error in 32-bit floats
> in 5-3 Counting Leading 0's. Here's their code, which uses an anonymous
> union to overlap asFloat and asInt: k = k & ~(k >> 1); asFloat =
> (float)k + 0.5f; n = 158 - (asInt >> 23); (and yes, this relies on
> implementation-defined behavior) - Derrick Coetzee Jan 3 '12 at 8:35
*/
unsigned char check (unsigned char b) {
union {
float asFloat;
int asInt;
} u;
unsigned k = b & ~(b >> 1);
u.asFloat = (float)k + 0.5f;
return 32 - (158 - (u.asInt >> 23));
}
Edit -- not exactly sure what the asker means by language independent, but below is the equivalent code in python.
import ctypes
class Anon(ctypes.Union):
    _fields_ = [
        ("asFloat", ctypes.c_float),
        ("asInt", ctypes.c_int)
    ]
def check(b):
    k = int(b) & ~(int(b) >> 1)
    a = Anon(asFloat=(float(k) + float(0.5)))
    return 32 - (158 - (a.asInt >> 23))