SMHasher setup? - testing

The SMHasher test suite for hash functions is touted as the best of the lot. But the latest version I've got (from rurban) gives absolutely no clue on how to test your own hash function. It does include an impressive battery of hash functions, but some of interest, if only for historic value, are missing. On top of that, I'm a complete CMake newbie.

It's actually quite simple. You just need to install CMake.
Building SMHasher
To build SMHasher on a Linux/Unix machine:
git clone https://github.com/rurban/smhasher
cd smhasher/
git submodule init
git submodule update
cmake .
make
Adding a new hash function
To add a new function, you can edit just three files: Hashes.cpp, Hashes.h and main.cpp.
For example, I will add the ElfHash:
unsigned long ElfHash(const unsigned char *s)
{
    unsigned long h = 0, high;
    while (*s)
    {
        h = (h << 4) + *s++;
        if (high = h & 0xF0000000)
            h ^= high >> 24;
        h &= ~high;
    }
    return h;
}
First, you need to modify it slightly so that it takes a seed and a length:
uint32_t ElfHash(const void *key, int len, uint32_t seed)
{
    // uint32_t keeps the state at 32 bits even where unsigned long is 64 bits wide
    uint32_t h = seed, high;
    const uint8_t *data = (const uint8_t *)key;
    for (int i = 0; i < len; i++)
    {
        h = (h << 4) + *data++;
        if ((high = h & 0xF0000000) != 0)
            h ^= high >> 24;
        h &= ~high;
    }
    return h;
}
Add this function definition to Hashes.cpp. Also add the following to Hashes.h:
uint32_t ElfHash(const void *key, int len, uint32_t seed);
inline void ElfHash_test(const void *key, int len, uint32_t seed, void *out) {
*(uint32_t *) out = ElfHash(key, len, seed);
}
In file main.cpp add the following line into array g_hashes:
{ ElfHash_test, 32, 0x0, "ElfHash", "ElfHash 32-bit", POOR, {0x0} },
(The third value is the self-verification code. You will only learn the correct value after running the test once.)
Finally, rebuild and run the test:
make
./SMHasher ElfHash
It will show you all the tests that this hash function fails. (It is very bad.)
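Before wiring the function into SMHasher, it can help to check the adapted hash on its own. The following is a minimal standalone sketch, not part of SMHasher: it repeats the seeded ElfHash from above and a wrapper with the same (key, len, seed, out) shape as ElfHash_test, using memcpy for the store (an assumption made here to avoid alignment concerns).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Seeded, length-based ElfHash, same logic as the adapted version above. */
uint32_t ElfHash(const void *key, int len, uint32_t seed)
{
    const uint8_t *data = (const uint8_t *)key;
    uint32_t h = seed;

    assert(key != NULL || len == 0);
    for (int i = 0; i < len; i++) {
        h = (h << 4) + data[i];
        uint32_t high = h & 0xF0000000u;   /* top nibble of the state */
        if (high != 0)
            h ^= high >> 24;               /* fold it back into the low bits */
        h &= ~high;                        /* and clear it */
    }
    return h;
}

/* Wrapper with the (key, len, seed, out) shape used in Hashes.h above. */
void ElfHash_test(const void *key, int len, uint32_t seed, void *out)
{
    uint32_t h = ElfHash(key, len, seed);
    memcpy(out, &h, sizeof h);             /* no alignment assumption about out */
}
```

Because the top nibble is cleared on every iteration, the result for any non-empty key always has its upper four bits at zero, which is one reason the function does badly in the test suite.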


Addressing pins of Register in microcontrollers

I'm working with Keil software and an LM3S316 microcontroller. Usually we address registers in microcontrollers in the form:
#define GPIO_PORTC_DATA_R (*((volatile uint32_t *)0x400063FC))
My question is: how can I access a single pin of the register? For example, if I have this function:
char process_key(int a)
{ PC_0 = a ;}
How can I get PC_0 and how to define it?
Thank you
Given say:
#define PIN0 (1u<<0)
#define PIN1 (1u<<1)
#define PIN2 (1u<<2)
// etc...
Then:
char process_key(int a)
{
if( a != 0 )
{
// Set bit
GPIO_PORTC_DATA_R |= PIN0 ;
}
else
{
// Clear bit
GPIO_PORTC_DATA_R &= ~PIN0 ;
}
}
A generalisation of this idiomatic technique is presented at How do you set, clear, and toggle a single bit?
However the read-modify-write implied by |= / &= can be problematic if the register might be accessed in different thread/interrupt contexts, as well as adding a possibly undesirable overhead. Cortex-M3/4 parts have a feature known as bit-banding that allows individual bits to be addressed directly and atomically. Given:
volatile uint32_t* getBitBandAddress( volatile const void* address, int bit )
{
__IO uint32_t* bit_address = 0;
uint32_t addr = reinterpret_cast<uint32_t>(address);
// This bit manipulation makes the function valid for RAM
// and Peripheral bitband regions
uint32_t word_band_base = addr & 0xf0000000u;
uint32_t bit_band_base = word_band_base | 0x02000000u;
uint32_t offset = addr - word_band_base;
// Calculate bit band address
bit_address = reinterpret_cast<__IO uint32_t*>(bit_band_base + (offset * 32u) + (static_cast<uint32_t>(bit) * 4u));
return bit_address ;
}
Then you can have:
char process_key(int a)
{
static volatile uint32_t* PC0_BB_ADDR = getBitBandAddress( &GPIO_PORTC_DATA_R, 0 ) ;
*PC0_BB_ADDR = a ;
}
You could of course determine and hard-code the bit-band address; for example:
#define PC0 (*((volatile uint32_t *)0x420C7F80u))
Then:
char process_key(int a)
{
PC0 = a ;
}
Details of the bit-band address calculation can be found in the ARM Cortex-M Technical Reference Manual, and there is an on-line calculator here.
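The alias address arithmetic can also be checked on a host machine. This is a small sketch of the same calculation as getBitBandAddress, written as plain integer arithmetic in C; the function name bitband_alias is hypothetical, and it assumes the address lies in one of the two Cortex-M3/M4 bit-band regions (SRAM at 0x20000000 or peripherals at 0x40000000).

```c
#include <assert.h>
#include <stdint.h>

/* Compute the bit-band alias address for a given bit of the byte at addr.
   alias = alias_base + byte_offset * 32 + bit * 4 */
uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    uint32_t region_base = addr & 0xF0000000u;        /* 0x20000000 or 0x40000000 */
    uint32_t alias_base  = region_base | 0x02000000u; /* 0x22000000 or 0x42000000 */
    uint32_t byte_offset = addr - region_base;

    assert(bit < 32u);
    return alias_base + byte_offset * 32u + bit * 4u;
}
```

For GPIO_PORTC_DATA_R at 0x400063FC, the byte offset 0x63FC times 32 is 0xC7F80, so bit 0 maps to 0x420C7F80, bit 1 to 0x420C7F84 and bit 2 to 0x420C7F88.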

How to link the libsvm library in google colab when executing CUDA? What is the proper linking flag for libsvm?

I am working on Google Colab and I want to use the libsvm library in my project. I downloaded libsvm and installed it. Now when I compile with !nvcc -o and run the code using CUDA, I get errors like:
undefined reference to `svm_get_nr_class'
undefined reference to `svm_predict_probability'
undefined reference to `svm_free_and_destroy_model'
I guess the problem is that libsvm is not properly linked. I pass the -l flags for the other libraries when compiling with nvcc, but I don't know what to pass to -l to link libsvm properly.
I downloaded libsvm using
!git clone https://github.com/cjlin1/libsvm
%cd libsvm/
!make && make install
%cd /content/libsvm/python/
!make
import sys
sys.path.append('/content/libsvm/python')
%cd /content
Now when I run this program
%%cuda --name Blind_Deblurring_Cuda.cu
#include <iostream>
#include <fstream>
#include <iostream>
#include <fstream>
#include "/content/brisque.h"
#include "/content/libsvm/svm.h"
#include <vector>
#include <stdio.h>
#include "fstream"
#include "iostream"
#include <algorithm>
#include <iterator>
#include <cmath>
#include<stdlib.h>
#include <math.h>
#include <curand.h>
#include <opencv2/core/cuda.hpp>
#include <opencv2/core.hpp>
#include "opencv2/imgproc.hpp"
#include "opencv2/imgcodecs.hpp"
#include <opencv2/core/core.hpp>
#include <iostream>
#include "opencv2/highgui.hpp"
#include <opencv2/core/utility.hpp>
//rescaling based on training data i libsvm
float rescale_vector[36][2];
using namespace std;
using namespace cv;
float computescore(string imagename);
void ComputeBrisqueFeature(Mat& orig, vector<double>& featurevector);
int read_range_file() {
//check if file exists
char buff[100];
int i;
string range_fname = "allrange";
FILE* range_file = fopen(range_fname.c_str(), "r");
if(range_file == NULL) return 1;
//assume standard file format for this program
fgets(buff, 100, range_file);
fgets(buff, 100, range_file);
//now we can fill the array
for(i = 0; i < 36; ++i) {
float a, b, c;
fscanf(range_file, "%f %f %f", &a, &b, &c);
rescale_vector[i][0] = b;
rescale_vector[i][1] = c;
}
return 0;
}
int main(int argc, char** argv)
{
if(argc < 2) {
cout << "Input Image argument not given." << endl;
return -1;
}
//read in the allrange file to setup internal scaling array
if(read_range_file()) {
cerr<<"unable to open allrange file"<<endl;
return -1;
}
float qualityscore;
qualityscore = computescore(argv[1]);
cout << "Quality Score: " << qualityscore << endl;
}
float computescore(string imagename) {
// pre-loaded vectors from allrange file
float min_[36] = {0.336999 ,0.019667 ,0.230000 ,-0.125959 ,0.000167 ,0.000616 ,0.231000 ,-0.125873 ,0.000165 ,0.000600 ,0.241000 ,-0.128814 ,0.000179 ,0.000386 ,0.243000 ,-0.133080 ,0.000182 ,0.000421 ,0.436998 ,0.016929 ,0.247000 ,-0.200231 ,0.000104 ,0.000834 ,0.257000 ,-0.200017 ,0.000112 ,0.000876 ,0.257000 ,-0.155072 ,0.000112 ,0.000356 ,0.258000 ,-0.154374 ,0.000117 ,0.000351};
float max_[36] = {9.999411, 0.807472, 1.644021, 0.202917, 0.712384, 0.468672, 1.644021, 0.169548, 0.713132, 0.467896, 1.553016, 0.101368, 0.687324, 0.533087, 1.554016, 0.101000, 0.689177, 0.533133, 3.639918, 0.800955, 1.096995, 0.175286, 0.755547, 0.399270, 1.095995, 0.155928, 0.751488, 0.402398, 1.041992, 0.093209, 0.623516, 0.532925, 1.042992, 0.093714, 0.621958, 0.534484};
double qualityscore;
int i;
struct svm_model* model; // create svm model object
Mat orig = imread(imagename, 1); // read image (color mode)
vector<double> brisqueFeatures; // feature vector initialization
ComputeBrisqueFeature(orig, brisqueFeatures); // compute brisque features
// use the pre-trained allmodel file
string modelfile = "allmodel";
//if((model=svm_load_model(modelfile.c_str()))==0) {
//fprintf(stderr,"can't open model file allmodel\n");
// exit(1);
//}
// float min_[37];
// float max_[37];
struct svm_node x[37];
// rescale the brisqueFeatures vector from -1 to 1
// also convert vector to svm node array object
for(i = 0; i < 36; ++i) {
float min = min_[i];
float max = max_[i];
x[i].value = -1 + (2.0/(max - min) * (brisqueFeatures[i] - min));
x[i].index = i + 1;
}
x[36].index = -1;
int nr_class=svm_get_nr_class(model);
double *prob_estimates = (double *) malloc(nr_class*sizeof(double));
// predict quality score using libsvm class
qualityscore = svm_predict_probability(model,x,prob_estimates);
free(prob_estimates);
svm_free_and_destroy_model(&model);
return qualityscore;
}
void ComputeBrisqueFeature(Mat& orig, vector<double>& featurevector)
{
Mat orig_bw_int(orig.size(), CV_64F, 1);
// convert to grayscale
cvtColor(orig, orig_bw_int, COLOR_BGR2GRAY);
// create a copy of original image
Mat orig_bw(orig_bw_int.size(), CV_64FC1, 1);
orig_bw_int.convertTo(orig_bw, 1.0/255);
orig_bw_int.release();
// orig_bw now contains the grayscale image normalized to the range 0,1
int scalenum = 2; // number of times to scale the image
for (int itr_scale = 1; itr_scale<=scalenum; itr_scale++)
{
// resize image
Size dst_size(orig_bw.cols/cv::pow((double)2, itr_scale-1), orig_bw.rows/pow((double)2, itr_scale-1));
Mat imdist_scaled;
resize(orig_bw, imdist_scaled, dst_size, 0, 0, INTER_CUBIC); // INTER_CUBIC
imdist_scaled.convertTo(imdist_scaled, CV_64FC1, 1.0/255.0);
// calculating MSCN coefficients
// compute mu (local mean)
Mat mu(imdist_scaled.size(), CV_64FC1, 1);
GaussianBlur(imdist_scaled, mu, Size(7, 7), 1.166);
Mat mu_sq;
cv::pow(mu, double(2.0), mu_sq);
//compute sigma (local sigma)
Mat sigma(imdist_scaled.size(), CV_64FC1, 1);
cv::multiply(imdist_scaled, imdist_scaled, sigma);
GaussianBlur(sigma, sigma, Size(7, 7), 1.166);
cv::subtract(sigma, mu_sq, sigma);
cv::pow(sigma, double(0.5), sigma);
add(sigma, Scalar(1.0/255), sigma); // to avoid DivideByZero Error
Mat structdis(imdist_scaled.size(), CV_64FC1, 1);
subtract(imdist_scaled, mu, structdis);
divide(structdis, sigma, structdis); // structdis is MSCN image
// Compute AGGD fit to MSCN image
double lsigma_best, rsigma_best, gamma_best;
structdis = AGGDfit(structdis, lsigma_best, rsigma_best, gamma_best);
featurevector.push_back(gamma_best);
featurevector.push_back((lsigma_best*lsigma_best + rsigma_best*rsigma_best)/2);
// Compute paired product images
// indices for orientations (H, V, D1, D2)
int shifts[4][2]={{0,1},{1,0},{1,1},{-1,1}};
for(int itr_shift=1; itr_shift<=4; itr_shift++)
{
// select the shifting index from the 2D array
int* reqshift = shifts[itr_shift-1];
// declare shifted_structdis as pairwise image
Mat shifted_structdis(imdist_scaled.size(), CV_64F, 1);
// create copies of the images using BwImage constructor
// utility constructor for better subscript access (for pixels)
BwImage OrigArr(structdis);
BwImage ShiftArr(shifted_structdis);
// create pair-wise product for the given orientation (reqshift)
for(int i=0; i<structdis.rows; i++)
{
for(int j=0; j<structdis.cols; j++)
{
if(i+reqshift[0]>=0 && i+reqshift[0]<structdis.rows && j+reqshift[1]>=0 && j+reqshift[1]<structdis.cols)
{
ShiftArr[i][j]=OrigArr[i + reqshift[0]][j + reqshift[1]];
}
else
{
ShiftArr[i][j]=0;
}
}
}
// Mat structdis_pairwise;
shifted_structdis = ShiftArr.equate(shifted_structdis);
// calculate the products of the pairs
multiply(structdis, shifted_structdis, shifted_structdis);
// fit the pairwise product to AGGD
shifted_structdis = AGGDfit(shifted_structdis, lsigma_best, rsigma_best, gamma_best);
double constant = sqrt(tgamma(1/gamma_best))/sqrt(tgamma(3/gamma_best));
double meanparam = (rsigma_best-lsigma_best)*(tgamma(2/gamma_best)/tgamma(1/gamma_best))*constant;
// push the calculated parameters from AGGD fit to pair-wise products
featurevector.push_back(gamma_best);
featurevector.push_back(meanparam);
featurevector.push_back(cv::pow(lsigma_best,2));
featurevector.push_back(cv::pow(rsigma_best,2));
}
}
}
// function to compute best fit parameters from AGGDfit
Mat AGGDfit(Mat structdis, double& lsigma_best, double& rsigma_best, double& gamma_best)
{
// create a copy of an image using BwImage constructor (brisque.h - more info)
BwImage ImArr(structdis);
long int poscount=0, negcount=0;
double possqsum=0, negsqsum=0, abssum=0;
for(int i=0;i<structdis.rows;i++)
{
for (int j =0; j<structdis.cols; j++)
{
double pt = ImArr[i][j]; // BwImage provides [][] access
if(pt>0)
{
poscount++;
possqsum += pt*pt;
abssum += pt;
}
else if(pt<0)
{
negcount++;
negsqsum += pt*pt;
abssum -= pt;
}
}
}
lsigma_best = cv::pow(negsqsum/negcount, 0.5);
rsigma_best = cv::pow(possqsum/poscount, 0.5);
double gammahat = lsigma_best/rsigma_best;
long int totalcount = (structdis.cols)*(structdis.rows);
double rhat = cv::pow(abssum/totalcount, static_cast<double>(2))/((negsqsum + possqsum)/totalcount);
double rhatnorm = rhat*(cv::pow(gammahat,3) +1)*(gammahat+1)/pow(pow(gammahat,2)+1,2);
double prevgamma = 0;
double prevdiff = 1e10;
float sampling = 0.001;
for (float gam=0.2; gam<10; gam+=sampling) //possible to coarsen sampling to quicken the code, with some loss of accuracy
{
double r_gam = tgamma(2/gam)*tgamma(2/gam)/(tgamma(1/gam)*tgamma(3/gam));
double diff = abs(r_gam-rhatnorm);
if(diff> prevdiff) break;
prevdiff = diff;
prevgamma = gam;
}
gamma_best = prevgamma;
return structdis.clone();
}
And then try to compile using
!nvcc -o /content/src/Blind_Deblurring_Cuda /content/src/Blind_Deblurring_Cuda.cu -lopencv_core -lopencv_imgcodecs -lopencv_imgproc -lopencv_highgui -lopencv_ml
It gives the following error
/tmp/tmpxft_00003d8d_00000000-10_Blind_Deblurring_Cuda.o: In function `computescore(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)':
tmpxft_00003d8d_00000000-5_Blind_Deblurring_Cuda.cudafe1.cpp:(.text+0x9bc): undefined reference to `svm_get_nr_class'
tmpxft_00003d8d_00000000-5_Blind_Deblurring_Cuda.cudafe1.cpp:(.text+0x9fd): undefined reference to `svm_predict_probability'
tmpxft_00003d8d_00000000-5_Blind_Deblurring_Cuda.cudafe1.cpp:(.text+0xa27): undefined reference to `svm_free_and_destroy_model'
collect2: error: ld returned 1 exit status

How to calculate CRC32 over a large data set that is split into buffered blocks?

Let's say I have 1024 kB of data, which is buffered in 1 kB blocks and transferred 1024 times from a transmitter to a receiver.
The last buffer contains a calculated CRC32 value as the last 4 bytes.
However, the receiver has to calculate the CRC32 buffer by buffer, because of RAM constraints.
I wonder how to combine the per-buffer CRC32 calculations linearly so that the result matches the CRC32 of the total.
I looked at the CRC calculation and its distributive property, but the calculation and its linearity are not clear enough for me to implement.
So, is there a mathematical expression for adding the CRC32s calculated over the buffers so that the result matches the CRC32 calculated over the whole data?
Such as:
int CRC32Total = 0;
int CRC32[1024];
for(int i = 0; i < 1024; i++){
CRC32Total = CRC32Total + CRC32[i];
}
Kind Regards
You did not provide any clues as to what implementation or even what language for which you "looked at CRC calculation". However every implementation I've seen is designed to compute CRCs piecemeal, exactly like you want.
For the crc32() routine provided in zlib, it is used thusly (in C):
crc = crc32(0, NULL, 0); // initialize CRC value
crc = crc32(crc, firstchunk, 1024); // update CRC value with first chunk
crc = crc32(crc, secondchunk, 1024); // update CRC with second chunk
...
crc = crc32(crc, lastchunk, 1024); // complete CRC with the last chunk
Then crc is the CRC of the concatenation of all of the chunks. You do not need a function to combine the CRCs of individual chunks.
If for some other reason you do want a function to combine CRCs, e.g. if you need to split the CRC calculation over multiple CPUs, then zlib provides the crc32_combine() function for that purpose.
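To see the piecemeal behaviour without depending on zlib, here is a minimal self-contained sketch. The names crc32_update and crc32_chunked are made up for this illustration; the calling convention mirrors the zlib one described above (start from 0, feed the previous result back in), and the bitwise loop uses the standard reflected CRC-32 polynomial 0xEDB88320.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32. Pass 0 for the first chunk, then the previous return value. */
uint32_t crc32_update(uint32_t crc, const void *buf, size_t len)
{
    const uint8_t *p = (const uint8_t *)buf;

    assert(buf != NULL || len == 0);
    crc = ~crc;                         /* undo the final inversion of the last call */
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;                        /* apply the final inversion again */
}

/* CRC of a buffer consumed in fixed-size chunks, as a RAM-limited receiver would. */
uint32_t crc32_chunked(const uint8_t *data, size_t total, size_t chunk)
{
    uint32_t crc = 0;

    assert(chunk > 0);
    for (size_t off = 0; off < total; off += chunk) {
        size_t n = (total - off < chunk) ? total - off : chunk;
        crc = crc32_update(crc, data + off, n);
    }
    return crc;
}
```

Feeding the nine ASCII bytes "123456789" in one call or in chunks of any size gives the same value, 0xCBF43926, which is the usual check value for this CRC. The chunks must be fed in order.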
When you start the transfer, reset the CrcChecksum to its initial value with the OnFirstBlock method. For every block received, call the OnBlockReceived to update the checksum. Note that the blocks must be processed in the correct order. When the final block has been processed, the final CRC is in the CrcChecksum variable.
// In crc32.c
// Crc32Lookup is the usual 256-entry CRC-32 table, assumed to be defined elsewhere.
uint32_t UpdateCrc(uint32_t crc, const void *data, size_t length)
{
    const uint8_t *current = data;
    while (length--)
        crc = (crc >> 8) ^ Crc32Lookup[(crc & 0xFF) ^ *current++];
    return crc;
}
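// In your block processing application
static uint32_t CrcChecksum;
void OnFirstBlock(void) {
    // Starting from 0 with no final inversion is a non-standard variant; a zlib-compatible
    // CRC-32 starts from 0xFFFFFFFF and inverts the final value, so match the transmitter.
    CrcChecksum = 0;
}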
void OnBlockReceived(const void *data, size_t length) {
CrcChecksum = UpdateCrc(CrcChecksum, data, length);
}
To complement my comment on your question, I have added code here that goes through the whole process: data generation as a linear array, CRC32 appended to the transmitted data, injection of errors, and reception in 'chunks' with the CRC32 computed and errors detected. You're probably only interested in the 'reception' part, but I think a complete example makes it clearer.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
// ---------------------- buildCRC32table ------------------------------
static const uint32_t CRC32_POLY = 0xEDB88320;
static const uint32_t CRC32_XOR_MASK = 0xFFFFFFFF;
static uint32_t CRC32TABLE[256];
void buildCRC32table (void)
{
uint32_t crc32;
for (uint16_t byte = 0; byte < 256; byte++)
{
crc32 = byte;
// iterate thru all 8 bits
for (int i = 0; i < 8; i++)
{
uint8_t feedback = crc32 & 1;
crc32 = (crc32 >> 1);
if (feedback)
{
crc32 ^= CRC32_POLY;
}
}
CRC32TABLE[byte] = crc32;
}
}
// -------------------------- myCRC32 ----------------------------------
uint32_t myCRC32 (uint32_t previousCRC32, uint8_t *pData, int dataLen)
{
uint32_t newCRC32 = previousCRC32 ^ CRC32_XOR_MASK; // remove last XOR mask (or add first)
// add new data to CRC32
while (dataLen--)
{
uint32_t crc32Top24bits = newCRC32 >> 8;
uint8_t crc32Low8bits = newCRC32 & 0x000000FF;
uint8_t data = *pData++;
newCRC32 = crc32Top24bits ^ CRC32TABLE[crc32Low8bits ^ data];
}
newCRC32 ^= CRC32_XOR_MASK; // put XOR mask back
return newCRC32;
}
// ------------------------------ main ---------------------------------
int main()
{
// build CRC32 table
buildCRC32table();
uint32_t crc32;
// use a union so we can access the same data linearly (TX) or by chunks (RX)
// static storage: a 1 MiB object may not fit on the stack on every platform
static union
{
uint8_t array[1024*1024];
uint8_t chunk[1024][1024];
} data;
// use time to seed randomizer so we have different data every run
srand((unsigned int)time(NULL));
/////////////////////////////////////////////////////////////////////////// Build data to be transmitted
////////////////////////////////////////////////////////////////////////////////////////////////////////
// populate array with random data sparing space for the CRC32 at the end
for (int i = 0; i < (sizeof(data.array) - sizeof(uint32_t)); i++)
{
data.array[i] = (uint8_t) (rand() & 0xFF);
}
// now compute array's CRC32
crc32 = myCRC32(0, data.array, sizeof(data.array) - sizeof(uint32_t));
printf ("array CRC32 = 0x%08X\n", crc32);
// to store the CRC32 into the array, we want to remove the XOR mask so we can compute the CRC32
// of all received data (including the CRC32 itself) and expect the same result all the time,
// regardless of the data, when no errors are present
crc32 ^= CRC32_XOR_MASK;
// load CRC32 at the very end of the array
data.array[sizeof(data.array) - 1] = (uint8_t)((crc32 >> 24) & 0xFF);
data.array[sizeof(data.array) - 2] = (uint8_t)((crc32 >> 16) & 0xFF);
data.array[sizeof(data.array) - 3] = (uint8_t)((crc32 >> 8) & 0xFF);
data.array[sizeof(data.array) - 4] = (uint8_t)((crc32 >> 0) & 0xFF);
/////////////////////////////////////////////// At this point, data is transmitted and errors may happen
////////////////////////////////////////////////////////////////////////////////////////////////////////
// to make things interesting, let's add one bit error with 1/8 probability
if ((rand() % 8) == 0)
{
uint32_t index = rand() % sizeof(data.array);
uint8_t errorBit = 1 << (rand() & 0x7);
// add error
data.array[index] ^= errorBit;
printf("Error injected on byte %u, bit mask = 0x%02X\n", index, errorBit);
}
else
{
printf("No error injected\n");
}
/////////////////////////////////////////////////////// Once received, the data is processed in 'chunks'
////////////////////////////////////////////////////////////////////////////////////////////////////////
// now we access the data and compute its CRC32 one chunk at a time
crc32 = 0; // initialize CRC32
for (int i = 0; i < 1024; i++)
{
crc32 = myCRC32(crc32, data.chunk[i], sizeof data.chunk[i]);
}
printf ("Final CRC32 = 0x%08X\n", crc32);
// because the CRC32 algorithm applies an XOR mask at the end, when we have no errors, the computed
// CRC32 will be the mask itself
if (crc32 == CRC32_XOR_MASK)
{
printf ("No errors detected!\n");
}
else
{
printf ("Errors detected!\n");
}
}

What does PKCS5_PBKDF2_HMAC_SHA1 return value mean?

I'm attempting to use OpenSSL's PKCS5_PBKDF2_HMAC_SHA1 method. I gather that it returns 0 if it succeeds, and some other value otherwise. My question is, what does a non-zero return value mean? Memory error? Usage error? How should my program handle it (retry, quit?)?
Edit: A corollary question is, is there any way to figure this out besides reverse-engineering the method itself?
is there any way to figure this out besides reverse-engineering the method itself?
PKCS5_PBKDF2_HMAC_SHA1 looks like one of those undocumented functions because I can't find it in the OpenSSL docs. OpenSSL has a lot of them, so you should be prepared to study the sources if you are going to use the library.
I gather that it returns 0 if it succeeds, and some other value otherwise.
Actually, it's reversed. Here's how I know...
$ grep -R PKCS5_PBKDF2_HMAC_SHA1 *
crypto/evp/evp.h:int PKCS5_PBKDF2_HMAC_SHA1(const char *pass, int passlen,
crypto/evp/p5_crpt2.c:int PKCS5_PBKDF2_HMAC_SHA1(const char *pass, int passlen,
...
So, you find the function's implementation in crypto/evp/p5_crpt2.c:
int PKCS5_PBKDF2_HMAC_SHA1(const char *pass, int passlen,
const unsigned char *salt, int saltlen, int iter,
int keylen, unsigned char *out)
{
return PKCS5_PBKDF2_HMAC(pass, passlen, salt, saltlen, iter,
EVP_sha1(), keylen, out);
}
Following PKCS5_PBKDF2_HMAC:
$ grep -R PKCS5_PBKDF2_HMAC *
...
crypto/evp/evp.h:int PKCS5_PBKDF2_HMAC(const char *pass, int passlen,
crypto/evp/p5_crpt2.c:int PKCS5_PBKDF2_HMAC(const char *pass, int passlen,
...
And again, from crypto/evp/p5_crpt2.c:
int PKCS5_PBKDF2_HMAC(const char *pass, int passlen,
const unsigned char *salt, int saltlen, int iter,
const EVP_MD *digest,
int keylen, unsigned char *out)
{
unsigned char digtmp[EVP_MAX_MD_SIZE], *p, itmp[4];
int cplen, j, k, tkeylen, mdlen;
unsigned long i = 1;
HMAC_CTX hctx_tpl, hctx;
mdlen = EVP_MD_size(digest);
if (mdlen < 0)
return 0;
HMAC_CTX_init(&hctx_tpl);
p = out;
tkeylen = keylen;
if(!pass)
passlen = 0;
else if(passlen == -1)
passlen = strlen(pass);
if (!HMAC_Init_ex(&hctx_tpl, pass, passlen, digest, NULL))
{
HMAC_CTX_cleanup(&hctx_tpl);
return 0;
}
while(tkeylen)
{
if(tkeylen > mdlen)
cplen = mdlen;
else
cplen = tkeylen;
/* We are unlikely to ever use more than 256 blocks (5120 bits!)
* but just in case...
*/
itmp[0] = (unsigned char)((i >> 24) & 0xff);
itmp[1] = (unsigned char)((i >> 16) & 0xff);
itmp[2] = (unsigned char)((i >> 8) & 0xff);
itmp[3] = (unsigned char)(i & 0xff);
if (!HMAC_CTX_copy(&hctx, &hctx_tpl))
{
HMAC_CTX_cleanup(&hctx_tpl);
return 0;
}
if (!HMAC_Update(&hctx, salt, saltlen)
|| !HMAC_Update(&hctx, itmp, 4)
|| !HMAC_Final(&hctx, digtmp, NULL))
{
HMAC_CTX_cleanup(&hctx_tpl);
HMAC_CTX_cleanup(&hctx);
return 0;
}
HMAC_CTX_cleanup(&hctx);
memcpy(p, digtmp, cplen);
for(j = 1; j < iter; j++)
{
if (!HMAC_CTX_copy(&hctx, &hctx_tpl))
{
HMAC_CTX_cleanup(&hctx_tpl);
return 0;
}
if (!HMAC_Update(&hctx, digtmp, mdlen)
|| !HMAC_Final(&hctx, digtmp, NULL))
{
HMAC_CTX_cleanup(&hctx_tpl);
HMAC_CTX_cleanup(&hctx);
return 0;
}
HMAC_CTX_cleanup(&hctx);
for(k = 0; k < cplen; k++)
p[k] ^= digtmp[k];
}
tkeylen-= cplen;
i++;
p+= cplen;
}
HMAC_CTX_cleanup(&hctx_tpl);
return 1;
}
So it looks like 0 on failure, and 1 on success. You should not see other values. And if you get a 0, then all the OUT parameters are junk.
Memory error? Usage error?
Well, sometimes you can call ERR_get_error. If you call it and it makes sense, then the error code is good. If the error code makes no sense, then it's probably not good.
Sadly, that's the way I handle it because the library is not consistent with setting error codes. For example, here's the library code to load the RDRAND engine.
Notice the code clears the error code on failure if it's a 3rd generation Ivy Bridge (that's the capability being tested), and does not clear or set an error otherwise!!!
void ENGINE_load_rdrand (void)
{
extern unsigned int OPENSSL_ia32cap_P[];
if (OPENSSL_ia32cap_P[1] & (1<<(62-32)))
{
ENGINE *toadd = ENGINE_rdrand();
if(!toadd) return;
ENGINE_add(toadd);
ENGINE_free(toadd);
ERR_clear_error();
}
}
How should my program handle it (retry, quit?)?
It looks like a hard failure.
Finally, that's exactly how I navigate the sources in this situation. If you don't like grep you can try ctags or another source code browser.

Determine Position of Most Significant Set Bit in a Byte

I have a byte I am using to store bit flags. I need to compute the position of the most significant set bit in the byte.
Example Byte: 00101101 => 6 is the position of the most significant set bit
Compact Hex Mapping:
[0x00] => 0x00
[0x01] => 0x01
[0x02,0x03] => 0x02
[0x04,0x07] => 0x03
[0x08,0x0F] => 0x04
[0x10,0x1F] => 0x05
[0x20,0x3F] => 0x06
[0x40,0x7F] => 0x07
[0x80,0xFF] => 0x08
TestCase in C:
#include <stdio.h>
unsigned char check(unsigned char b) {
unsigned char c = 0x08;
unsigned char m = 0x80;
do {
if(m&b) { return c; }
else { c -= 0x01; }
} while(m>>=1);
return 0; //never reached
}
int main() {
unsigned char input[256] = {
0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f,
0x10,0x11,0x12,0x13,0x14,0x15,0x16,0x17,0x18,0x19,0x1a,0x1b,0x1c,0x1d,0x1e,0x1f,
0x20,0x21,0x22,0x23,0x24,0x25,0x26,0x27,0x28,0x29,0x2a,0x2b,0x2c,0x2d,0x2e,0x2f,
0x30,0x31,0x32,0x33,0x34,0x35,0x36,0x37,0x38,0x39,0x3a,0x3b,0x3c,0x3d,0x3e,0x3f,
0x40,0x41,0x42,0x43,0x44,0x45,0x46,0x47,0x48,0x49,0x4a,0x4b,0x4c,0x4d,0x4e,0x4f,
0x50,0x51,0x52,0x53,0x54,0x55,0x56,0x57,0x58,0x59,0x5a,0x5b,0x5c,0x5d,0x5e,0x5f,
0x60,0x61,0x62,0x63,0x64,0x65,0x66,0x67,0x68,0x69,0x6a,0x6b,0x6c,0x6d,0x6e,0x6f,
0x70,0x71,0x72,0x73,0x74,0x75,0x76,0x77,0x78,0x79,0x7a,0x7b,0x7c,0x7d,0x7e,0x7f,
0x80,0x81,0x82,0x83,0x84,0x85,0x86,0x87,0x88,0x89,0x8a,0x8b,0x8c,0x8d,0x8e,0x8f,
0x90,0x91,0x92,0x93,0x94,0x95,0x96,0x97,0x98,0x99,0x9a,0x9b,0x9c,0x9d,0x9e,0x9f,
0xa0,0xa1,0xa2,0xa3,0xa4,0xa5,0xa6,0xa7,0xa8,0xa9,0xaa,0xab,0xac,0xad,0xae,0xaf,
0xb0,0xb1,0xb2,0xb3,0xb4,0xb5,0xb6,0xb7,0xb8,0xb9,0xba,0xbb,0xbc,0xbd,0xbe,0xbf,
0xc0,0xc1,0xc2,0xc3,0xc4,0xc5,0xc6,0xc7,0xc8,0xc9,0xca,0xcb,0xcc,0xcd,0xce,0xcf,
0xd0,0xd1,0xd2,0xd3,0xd4,0xd5,0xd6,0xd7,0xd8,0xd9,0xda,0xdb,0xdc,0xdd,0xde,0xdf,
0xe0,0xe1,0xe2,0xe3,0xe4,0xe5,0xe6,0xe7,0xe8,0xe9,0xea,0xeb,0xec,0xed,0xee,0xef,
0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7,0xf8,0xf9,0xfa,0xfb,0xfc,0xfd,0xfe,0xff };
unsigned char truth[256] = {
0x00,0x01,0x02,0x02,0x03,0x03,0x03,0x03,0x04,0x04,0x04,0x04,0x04,0x04,0x04,0x04,
0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,
0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,
0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,
0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,
0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,
0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,
0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08};
int i,r;
int f = 0;
for(i=0; i<256; ++i) {
r=check(input[i]);
if(r !=(truth[i])) {
printf("failed %d : 0x%x : %d\n",i,0x000000FF & ((int)input[i]),r);
f += 1;
}
}
if(!f) { printf("passed all\n"); }
else { printf("failed %d\n",f); }
return 0;
}
I would like to simplify my check() function to not involve looping (or branching preferably). Is there a bit twiddling hack or hashed lookup table solution to compute the position of the most significant set bit in a byte?
Your question is about an efficient way to compute log2 of a value. And because you seem to want a solution that is not limited to the C language I have been slightly lazy and tweaked some C# code I have.
You want to compute log2(x) + 1, and for x = 0 (where log2 is undefined) you define the result as 0 (i.e. you create a special case where log2(0) = -1).
static readonly Byte[] multiplyDeBruijnBitPosition = new Byte[] {
7, 2, 3, 4,
6, 1, 5, 0
};
public static Byte Log2Plus1(Byte value) {
if (value == 0)
return 0;
var roundedValue = value;
roundedValue |= (Byte) (roundedValue >> 1);
roundedValue |= (Byte) (roundedValue >> 2);
roundedValue |= (Byte) (roundedValue >> 4);
var log2 = multiplyDeBruijnBitPosition[((Byte) (roundedValue*0xE3)) >> 5];
return (Byte) (log2 + 1);
}
This bit twiddling hack is taken from Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup where you can see the equivalent C source code for 32 bit values. This code has been adapted to work on 8 bit values.
However, you may be able to use an operation that gives you the result using a very efficient built-in function (on many CPUs a single instruction such as Bit Scan Reverse is used). An answer to the question Bit twiddling: which bit is set? has some information about this. A quote from the answer provides one possible reason why there is low level support for solving this problem:
Things like this are the core of many O(1) algorithms such as kernel schedulers which need to find the first non-empty queue signified by an array of bits.
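For readers who want the same thing in C, here is a sketch that ports the C# routine above and adds a built-in variant. The function names are made up for this example. The built-in version assumes GCC or Clang, where __builtin_clz is available; on other compilers it simply falls back to a loop.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Table from the C# answer above: maps (smeared * 0xE3) >> 5 to floor(log2). */
static const uint8_t debruijn_log2[8] = { 7, 2, 3, 4, 6, 1, 5, 0 };

/* Position (1..8) of the most significant set bit, or 0 for an input of 0. */
uint8_t msb_pos_debruijn(uint8_t value)
{
    if (value == 0)
        return 0;
    uint8_t v = value;
    v |= (uint8_t)(v >> 1);             /* smear the top set bit downwards */
    v |= (uint8_t)(v >> 2);
    v |= (uint8_t)(v >> 4);
    return (uint8_t)(debruijn_log2[(uint8_t)(v * 0xE3u) >> 5] + 1);
}

/* Straightforward reference: count shifts until the value is exhausted. */
uint8_t msb_pos_loop(uint8_t value)
{
    uint8_t pos = 0;
    while (value) {
        pos++;
        value >>= 1;
    }
    return pos;
}

/* Built-in variant (GCC/Clang only); other compilers use the loop. */
uint8_t msb_pos_builtin(uint8_t value)
{
#if defined(__GNUC__) || defined(__clang__)
    if (value == 0)
        return 0;                       /* __builtin_clz(0) is undefined */
    return (uint8_t)(sizeof(unsigned) * CHAR_BIT - (unsigned)__builtin_clz(value));
#else
    return msb_pos_loop(value);
#endif
}
```

All three agree on every byte value; for the example byte 00101101 (0x2D) each returns 6. Whether the table or the built-in is faster depends on the compiler and target, so it is worth measuring.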
That was a fun little challenge. I don't know if this one is completely portable since I only have VC++ to test with, and I certainly can't say for sure if it's more efficient than other approaches. This version was coded with a loop but it can be unrolled without too much effort.
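static unsigned char check(unsigned char b)
{
    unsigned char r = 8;
    unsigned char sub = 1;
    unsigned char s = 7;
    for (char i = 0; i < 8; i++)
    {
        // sub stays 1 until the first set bit is seen, then drops to 0 for good
        sub = sub & (((b & (1 << s)) >> s) - 1);
        s--; // decrement separately: reading s and s-- in one expression is undefined
        r -= sub;
    }
    return r;
}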
I'm sure everyone else has long since moved on to other topics but there was something in the back of my mind suggesting that there had to be a more efficient branch-less solution to this than just unrolling the loop in my other posted solution. A quick trip to my copy of Warren put me on the right track: Binary search.
Here's my solution based on that idea:
Pseudo-code:
// see if there's a bit set in the upper half
if ((b >> 4) != 0)
{
offset = 4;
b >>= 4;
}
else
offset = 0;
// see if there's a bit set in the upper half of what's left
if ((b & 0x0C) != 0)
{
offset += 2;
b >>= 2;
}
// see if there's a bit set in the upper half of what's left
if ((b & 0x02) != 0)
{
offset++;
b >>= 1;
}
return b + offset;
Branch-less C++ implementation:
static unsigned char check(unsigned char b)
{
unsigned char adj = 4 & ((((unsigned char) - (b >> 4) >> 7) ^ 1) - 1);
unsigned char offset = adj;
b >>= adj;
adj = 2 & (((((unsigned char) - (b & 0x0C)) >> 7) ^ 1) - 1);
offset += adj;
b >>= adj;
adj = 1 & (((((unsigned char) - (b & 0x02)) >> 7) ^ 1) - 1);
return (b >> adj) + offset + adj;
}
Yes, I know that this is all academic :)
It is not possible in plain C. The best I would suggest is the following implementation of check. Although it is quite "ugly", I think it runs faster than the check version in the question.
int check(unsigned char b)
{
if(b&128) return 8;
if(b&64) return 7;
if(b&32) return 6;
if(b&16) return 5;
if(b&8) return 4;
if(b&4) return 3;
if(b&2) return 2;
if(b&1) return 1;
return 0;
}
Edit: I found a link to the actual code: http://www.hackersdelight.org/hdcodetxt/nlz.c.txt
The algorithm below is named nlz8 in that file. You can choose your favorite hack.
/*
From last comment of: http://stackoverflow.com/a/671826/315052
> Hacker's Delight explains how to correct for the error in 32-bit floats
> in 5-3 Counting Leading 0's. Here's their code, which uses an anonymous
> union to overlap asFloat and asInt: k = k & ~(k >> 1); asFloat =
> (float)k + 0.5f; n = 158 - (asInt >> 23); (and yes, this relies on
> implementation-defined behavior) - Derrick Coetzee Jan 3 '12 at 8:35
*/
unsigned char check (unsigned char b) {
union {
float asFloat;
int asInt;
} u;
unsigned k = b & ~(b >> 1);
u.asFloat = (float)k + 0.5f;
return 32 - (158 - (u.asInt >> 23));
}
Edit -- not exactly sure what the asker means by language independent, but below is the equivalent code in python.
import ctypes
class Anon(ctypes.Union):
_fields_ = [
("asFloat", ctypes.c_float),
("asInt", ctypes.c_int)
]
def check(b):
k = int(b) & ~(int(b) >> 1)
a = Anon(asFloat=(float(k) + float(0.5)))
return 32 - (158 - (a.asInt >> 23))