Java - PBKDF2 with HMACSHA256 as the PRF - authentication

I've been given the task of creating a Login API for our project, and I'm supposed to use PBKDF2 with HMACSHA256 as the PRF. The plain-text password is hashed using MD5 and then fed into PBKDF2 to generate a derived key. The problem is that I'm not able to get the same output that the project documentation says I should.
Here's the PBKDF2 Implementation in Java:
public class PBKDF2
{
public static byte[] deriveKey( byte[] password, byte[] salt, int iterationCount, int dkLen )
throws java.security.NoSuchAlgorithmException, java.security.InvalidKeyException
{
SecretKeySpec keyspec = new SecretKeySpec( password, "HmacSHA256" );
Mac prf = Mac.getInstance( "HmacSHA256" );
prf.init( keyspec );
// Note: hLen, dkLen, l, r, T, F, etc. are horrible names for
// variables and functions in this day and age, but they
// reflect the terse symbols used in RFC 2898 to describe
// the PBKDF2 algorithm, which improves validation of the
// code vs. the RFC.
//
// dklen is expressed in bytes. (16 for a 128-bit key)
int hLen = prf.getMacLength(); // 32 for HmacSHA256
int l = Math.max( dkLen, hLen); // 1 for 128bit (16-byte) keys
int r = dkLen - (l-1)*hLen; // 16 for 128bit (16-byte) keys
byte T[] = new byte[l * hLen];
int ti_offset = 0;
for (int i = 1; i <= l; i++) {
F( T, ti_offset, prf, salt, iterationCount, i );
ti_offset += hLen;
}
if (r < hLen) {
// Incomplete last block
byte DK[] = new byte[dkLen];
System.arraycopy(T, 0, DK, 0, dkLen);
return DK;
}
return T;
}
private static void F( byte[] dest, int offset, Mac prf, byte[] S, int c, int blockIndex ) {
final int hLen = prf.getMacLength();
byte U_r[] = new byte[ hLen ];
// U0 = S || INT (i);
byte U_i[] = new byte[S.length + 4];
System.arraycopy( S, 0, U_i, 0, S.length );
INT( U_i, S.length, blockIndex );
for( int i = 0; i < c; i++ ) {
U_i = prf.doFinal( U_i );
xor( U_r, U_i );
}
System.arraycopy( U_r, 0, dest, offset, hLen );
}
private static void xor( byte[] dest, byte[] src ) {
for( int i = 0; i < dest.length; i++ ) {
dest[i] ^= src[i];
}
}
private static void INT( byte[] dest, int offset, int i ) {
dest[offset + 0] = (byte) (i / (256 * 256 * 256));
dest[offset + 1] = (byte) (i / (256 * 256));
dest[offset + 2] = (byte) (i / (256));
dest[offset + 3] = (byte) (i);
}
// ctor
private PBKDF2 () {}
}
I used the PBKDF2-HMAC-SHA2 test vectors found here to verify the correctness of the implementation, and it all checked out. I'm not sure why I couldn't get the same results with an MD5-hashed password.
Parameters:
Salt: 000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F
Iterations Count: 1000
DKLen: 16 (128-bit derived key)
Using "foobar" as the plaintext password, the expected results are:
PWHash = MD5(PlaintextPassword) = 3858f62230ac3c915f300c664312c63f
PWKey = PBKDF2(PWHash, Salt, IterationsCount, DKLen) = 33C37758EFA6780C5E52FAB3B50F329C
What I get:
PWHash = 3858f62230ac3c915f300c664312c63f
PWKey = 0bd0c7d8339df2c66ce4b6e1e91ed3f1
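Roughly, the call sequence looks like this (assuming the raw 16-byte MD5 digest is what gets passed as the password bytes; hexToBytes is a hypothetical helper, not shown):
byte[] pwHash = java.security.MessageDigest.getInstance("MD5")
    .digest("foobar".getBytes(java.nio.charset.StandardCharsets.UTF_8)); // 3858f62230ac3c915f300c664312c63f
byte[] salt = hexToBytes("000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F");
byte[] pwKey = PBKDF2.deriveKey(pwHash, salt, 1000, 16); // compared against the documented 33C37758EFA6780C5E52FAB3B50F329C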

The iteration count was supposed to be 4096, not 1000.

The generation of int l seems wrong. You have specified the maximum of dkLen and hLen, but the spec says l = CEIL(dkLen / hLen), where
CEIL(x) is the "ceiling" function, i.e. the smallest integer greater than, or equal to, x.
I think l would be more accurately defined as l = (int)Math.ceil( (double)dkLen / (double)hLen )
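In the posted code, the two length calculations would then read (using integer arithmetic for the same ceiling):
int l = (dkLen + hLen - 1) / hLen; // l = CEIL(dkLen / hLen)
int r = dkLen - (l - 1) * hLen;    // length of the last block, as in RFC 2898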

Related

Android specific frequency AudioTrack infinite duration

I made a specific-frequency sound using AudioTrack:
private static final int duration = 10; // seconds
private static final int sampleRate = 8000;
private static final int numSamples = duration * sampleRate;
private static final double sample[] = new double[numSamples];
private static double freqOfTone = 0; // hz
final byte generatedSnd[] = new byte[2 * numSamples];
final AudioTrack p7 = new AudioTrack(AudioManager.STREAM_MUSIC,
sampleRate, AudioFormat.CHANNEL_CONFIGURATION_MONO,
AudioFormat.ENCODING_PCM_16BIT, generatedSnd.length,
AudioTrack.MODE_STATIC);
for (int i = 0; i < numSamples; ++i) {
sample[i] = Math.sin(2 * Math.PI * i / (sampleRate/freqOfTone));
}
// convert to 16 bit pcm sound array
// assumes the sample buffer is normalised.
int idx = 0;
for (final double dVal : sample) {
// scale to maximum amplitude
final short val = (short) ((dVal * 32767));
// in 16 bit wav PCM, first byte is the low order byte
generatedSnd[idx++] = (byte) (val & 0x00ff);
generatedSnd[idx++] = (byte) ((val & 0xff00) >>> 8);
}
p7.write(generatedSnd, 0, generatedSnd.length);
p7.play();
Like this; the first line sets the duration.
But I want an infinite duration (not a loop).
Is it possible?
Please help me.
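One way that is often suggested (a sketch only, not tested here) is to open the AudioTrack in MODE_STREAM and keep writing freshly generated samples from a background thread, so playback lasts as long as the loop runs instead of a fixed duration; keepPlaying stands in for whatever stop condition you use:
int sampleRate = 8000;
double freqOfTone = 440; // example frequency, substitute your own
int minBuf = AudioTrack.getMinBufferSize(sampleRate,
        AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);
final AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
        AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT,
        minBuf, AudioTrack.MODE_STREAM);
track.play();
short[] chunk = new short[minBuf / 2];
long n = 0; // running sample index keeps the sine phase continuous across chunks
while (keepPlaying) { // e.g. a volatile boolean you clear when the tone should stop
    for (int i = 0; i < chunk.length; i++, n++) {
        chunk[i] = (short) (Math.sin(2 * Math.PI * n * freqOfTone / sampleRate) * 32767);
    }
    track.write(chunk, 0, chunk.length); // blocks until the chunk is queued, then loops again
}
track.stop();
track.release();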

Calculation of hash value of a matrix

As part of my final year project, I am testing the Bouncycastle library on SHA-3.
I have found the source code to calculate the hash value of a string:
String input = "hello" ;
SHA3.DigestSHA3 digestSHA3 = new SHA3.Digest256();
byte[] digest = digestSHA3.digest(input.getBytes());
System.out.println("SHA3-256 = " + Hex.toHexString(digest));
but I want to calculate the hash value of a matrix. Can anyone help me with this?
You need to uniquely convert the matrix to a byte array. One possible solution:
private static byte[] intToBytes(int value) {
return new byte[] {
(byte)(value >>> 24),
(byte)(value >>> 16),
(byte)(value >>> 8),
(byte)value
};
}
public static void main(String[] args) throws Exception {
int[][] matrix = new int[3][5];
SHA3.DigestSHA3 sha3 = new SHA3.Digest256();
int height = matrix.length;
int width = matrix[0].length;
sha3.update(intToBytes(height)); // add height of the matrix
sha3.update(intToBytes(width)); // add width
for (int i = 0; i < height; i++) {
for (int j = 0; j < width; j++) {
sha3.update(intToBytes(matrix[i][j])); // add all values
}
}
byte[] digest = sha3.digest();
}
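The digest can then be printed the same way as in the string example above. Hashing the height and width first is what makes the conversion unique: two matrices with the same values but different shapes feed different bytes into the digest.
System.out.println("SHA3-256 = " + Hex.toHexString(digest));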

Convert short array to byte array and vice versa, and int to byte array and vice versa, in Objective-C

I am confused about how to convert a byte array to a short array and vice versa, and also an int to a byte array and vice versa, in Objective-C.
I have seen it done in Java like the following:
public static short byteArrayToShort(byte[] b) {
if (b.length > 1) {
return (ByteBuffer.wrap(b)).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get();
} else {
return b[0];
}
}
/**
* Short to byte array.
*
* @param value the value
* @return the byte[]
*/
public static byte[] shortToByteArray(short value) {
return ByteBuffer.allocate(2).order(ByteOrder.LITTLE_ENDIAN).putShort(value).array();
}
/**
* Int to byte array.
*
* @param value the value
* @return the byte[]
*/
public static byte[] intToByteArray(int value) {
return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
}
/**
* Convert the byte array to an int.
*
* @param b The byte array
* @return The integer
*/
public static int byteArrayToInt(byte[] b) {
if (b.length > 1) {
return (ByteBuffer.wrap(b)).order(ByteOrder.LITTLE_ENDIAN).asIntBuffer().get();
} else {
return b[0];
}
}
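For reference, this is what those Java helpers produce with little-endian byte order, which is the layout the Objective-C code has to match (values shown in the comments):
byte[] sb = shortToByteArray((short) 0x1234);          // {0x34, 0x12}: low-order byte first
short s   = byteArrayToShort(new byte[] {0x34, 0x12}); // 0x1234
byte[] ib = intToByteArray(0x12345678);                // {0x78, 0x56, 0x34, 0x12}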
In Objective-C I have tried the following:
//Byte to Short array
- (uint16_t*) byte2short:(uint8_t *)bytes size:(int)size{
uint16_t*shorts = (uint16_t*)malloc(size/2);
for (int i=0; i < size/2; i++){
shorts[i] = (bytes[i*2+1] << 8) | bytes[i*2];
}
return shorts;
}
//Short to Byte array
- (uint8_t *) short2byte:(uint16_t*)shorts size:(int)size{
uint8_t *bytes = (uint8_t *)malloc(size*2);
for (int i = 0; i < size; i++)
{
bytes[i * 2] = (uint16_t) (shorts[i] & 0x00FF);
bytes[(i * 2) + 1] = (uint16_t) (shorts[i] >> 8);
shorts[i] = 0;
}
return bytes;
}
I have tried it like this, but I also have no idea how to convert an int to a byte array in Objective-C.
Please suggest something.
The problem with your code is that you are assuming that malloc somehow "knows" about the size of whatever is being allocated, in the same way that Java's array new knows the difference between allocating 5 ints and 5 shorts. Well, malloc does not. Unless you tell it otherwise, it allocates the required number of bytes. That's why when you do this
uint16_t*shorts = (uint16_t*)malloc(size/2);
and then write size/2 uint16_t values (that is, size bytes) into it, you overrun the buffer.
A proper way of allocating an array of primitives in C (and in Objective-C, which is a superset of C) is as follows:
size_t count = (size+1)/2; // Do not assume that size is even
uint16_t *shorts = malloc(sizeof(uint16_t)*count);
Now you have enough memory to fit all your shorts.
In your other function you should use
uint8_t *bytes = malloc(sizeof(uint8_t)*size*2);
Note that the cast is unnecessary in both cases. The type of the bytes variable matters, though, because that's what determines the actual addresses written to by the bytes[i * 2] and bytes[(i * 2) + 1] expressions:
for (int i = 0; i < size; i++)
{
bytes[i * 2] = (uint8_t) (shorts[i] & 0xFF);
bytes[(i * 2) + 1] = (uint8_t) (shorts[i] >> 8);
shorts[i] = 0;
}

How to increment an IPv6 address based on a mask in Java?

I am trying to increment an IPv6 address based on a mask.
I am getting a problem when there is an F in the position being incremented.
Could anyone please check this?
public String IncrementIPV6ForPrefixLength (String IPv6String, int times) throws UnknownHostException
{
int result , carry = 0, i;
int bits;
int mask=0;
int index=IPv6String.indexOf("/");
mask=Integer.parseInt(IPv6String.substring(index+1, IPv6String.length()));
IPv6String=IPv6String.substring(0, index);
InetAddress iaddr=InetAddress.getByName(IPv6String);
byte[] IPv6Arr=iaddr.getAddress();
if(mask > 128 || mask < 0)
return null;
i = mask/8;
bits = mask%8;
if(bits>0)
{
result = ((int)(IPv6Arr[i]>>(8-bits))) + times;
IPv6Arr[i] =(byte) ((result << (8-bits)) | (IPv6Arr[i] & (0xff >> (bits))));
carry = (result << (8-bits))/256;
times /= 256;
}
i--;
for(;i>=0;i--)
{
result = ((int)IPv6Arr[i]) + ((times + carry)& 0xFF);
IPv6Arr[i] = (byte)(result % 256);
carry = result / 256;
if(carry == 0)
{
iaddr=InetAddress.getByAddress(IPv6Arr);
String s=iaddr.toString();
if(s.indexOf('/') != -1){
s = s.substring(1, s.length()).toUpperCase();
}
StringBuffer buff =new StringBuffer("");
String[] ss = s.split(":");
for(int k=0;k<ss.length;k++){
int Differ = 4 - ss[k].length();
for(int j = 0; j<Differ;j++){
buff.append("0");
}
buff.append(ss[k]);
if(k!=7)buff=buff.append(":");
}
return buff.toString()+"/"+mask;
}
times /= 256;
}
return null;
}
input like this:
FD34:4FB7:FFFF:A13F:1325:2252:1525:325F/48
FD34:41B7:FFFF::/48
FD34:4FBF:F400:A13E:1325:2252:1525:3256/35
output like this:
if incremented by 1:
FD34:4FB8:0000:A13F:1325:2252:1525:325F/48
FD34:41B8:0000::/48
FD34:4FC0:0400:A13E:1325:2252:1525:3256/35
if incremented by 2:
FD34:4FB8:0001:A13F:1325:2252:1525:325F/48
FD34:41B8:0001::/48
FD34:4FC0:1400:A13E:1325:2252:1525:3256/35
Can you please find where I am going wrong?
Disregarding the posted code, try to model the operation as a direct numerical operation on the 128-bit number that the IPv6 address really is. Convert to BigInteger and use BigInteger.add.
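A minimal sketch of that approach (the names are mine; it assumes one increment step means adding 2^(128 - prefixLength), i.e. stepping the prefix part by one, and it does not reproduce the zero-padded formatting shown above):
public static String incrementPrefix(String cidr, long times) throws java.net.UnknownHostException {
    int slash = cidr.indexOf('/');
    int prefixLength = Integer.parseInt(cidr.substring(slash + 1));
    byte[] addr = java.net.InetAddress.getByName(cidr.substring(0, slash)).getAddress();
    // Treat the 16 address bytes as one unsigned 128-bit number.
    java.math.BigInteger value = new java.math.BigInteger(1, addr);
    // One step spans the whole host part, so carries (the "F" case) are handled for free.
    java.math.BigInteger step = java.math.BigInteger.ONE.shiftLeft(128 - prefixLength);
    value = value.add(step.multiply(java.math.BigInteger.valueOf(times)));
    // Copy the low 16 bytes back (toByteArray() may be shorter, or carry a leading sign byte).
    byte[] out = value.toByteArray();
    byte[] result = new byte[16];
    int copy = Math.min(out.length, 16);
    System.arraycopy(out, out.length - copy, result, 16 - copy, copy);
    return java.net.InetAddress.getByAddress(result).getHostAddress().toUpperCase() + "/" + prefixLength;
}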

How to quickly find an image in another image using CUDA?

In my current project I need to find the pixel-exact position of an image contained in another, larger image. The smaller image is never rotated or stretched (so it should match pixel by pixel), but it may have different brightness and some pixels in it may be distorted. My first attempt was to do it on the CPU, but it was too slow. The calculations are very parallel, so I decided to use the GPU. I just started to learn CUDA and wrote my first CUDA app. My code works, but it is still too slow even on the GPU. When the larger image has a dimension of 1024x1280 and the smaller is 128x128, the program performs the calculations in 2000ms on a GeForce GTX 560 Ti. I need to get results in less than 200ms. In the future I'll probably need a more complex algorithm, so I'd rather have even more computational headroom in reserve. The question is how I can optimise my code to achieve that speedup.
CUDAImageLib.dll:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <cutil.h>
//#define SUPPORT_ALPHA
__global__ void ImageSearch_kernel(float* BufferOut, float* BufferB, float* BufferS, unsigned int bw, unsigned int bh, unsigned int sw, unsigned int sh)
{
unsigned int bx = threadIdx.x + blockIdx.x * blockDim.x;
unsigned int by = threadIdx.y + blockIdx.y * blockDim.y;
float diff = 0;
for (unsigned int y = 0; y < sh; ++y)
{
for (unsigned int x = 0; x < sw; ++x)
{
unsigned int as = (x + y * sw) * 4;
unsigned int ab = (x + bx + (y + by) * bw) * 4;
#ifdef SUPPORT_ALPHA
diff += ((abs(BufferS[as] - BufferB[ab]) + abs(BufferS[as + 1] - BufferB[ab + 1]) + abs(BufferS[as + 2] - BufferB[ab + 2])) * BufferS[as + 3] * BufferB[ab + 3]);
#else
diff += abs(BufferS[as] - BufferB[ab]);
diff += abs(BufferS[as + 1] - BufferB[ab + 1]);
diff += abs(BufferS[as + 2] - BufferB[ab + 2]);
#endif
}
}
BufferOut[bx + (by * (bw - sw))] = diff;
}
extern "C" int __declspec(dllexport) __stdcall ImageSearchGPU(float* BufferOut, float* BufferB, float* BufferS, int bw, int bh, int sw, int sh)
{
int aBytes = (bw * bh) * 4 * sizeof(float);
int bBytes = (sw * sh) * 4 * sizeof(float);
int cBytes = ((bw - sw) * (bh - sh)) * sizeof(float);
dim3 threadsPerBlock(32, 32);
dim3 numBlocks((bw - sw) / threadsPerBlock.x, (bh - sh) / threadsPerBlock.y);
float *dev_B = 0;
float *dev_S = 0;
float *dev_Out = 0;
unsigned int timer = 0;
float sExecutionTime = 0;
cudaError_t cudaStatus;
// Choose which GPU to run on, change this on a multi-GPU system.
cudaStatus = cudaSetDevice(0);
if (cudaStatus != cudaSuccess) {
fprintf(stderr, "cudaSetDevice failed! Do you have a CUDA-capable GPU installed?");
goto Error;
}
// Allocate GPU buffers for three vectors (two input, one output) .
cudaStatus = cudaMalloc((void**)&dev_Out, cBytes);
if (cudaStatus != cudaSuccess) {
fprintf(stderr, "cudaMalloc failed!");
goto Error;
}
cudaStatus = cudaMalloc((void**)&dev_B, aBytes);
if (cudaStatus != cudaSuccess) {
fprintf(stderr, "cudaMalloc failed!");
goto Error;
}
cudaStatus = cudaMalloc((void**)&dev_S, bBytes);
if (cudaStatus != cudaSuccess) {
fprintf(stderr, "cudaMalloc failed!");
goto Error;
}
// Copy input vectors from host memory to GPU buffers.
cudaStatus = cudaMemcpy(dev_B, BufferB, aBytes, cudaMemcpyHostToDevice);
if (cudaStatus != cudaSuccess) {
fprintf(stderr, "cudaMemcpy failed!");
goto Error;
}
cudaStatus = cudaMemcpy(dev_S, BufferS, bBytes, cudaMemcpyHostToDevice);
if (cudaStatus != cudaSuccess) {
fprintf(stderr, "cudaMemcpy failed!");
goto Error;
}
cutCreateTimer(&timer);
cutStartTimer(timer);
// Launch a kernel on the GPU with one thread for each element.
ImageSearch_kernel<<<numBlocks, threadsPerBlock>>>(dev_Out, dev_B, dev_S, bw, bh, sw, sh);
// cudaDeviceSynchronize waits for the kernel to finish, and returns
// any errors encountered during the launch.
cudaStatus = cudaDeviceSynchronize();
if (cudaStatus != cudaSuccess) {
fprintf(stderr, "cudaDeviceSynchronize returned error code %d after launching addKernel!\n", cudaStatus);
goto Error;
}
cutStopTimer(timer);
sExecutionTime = cutGetTimerValue(timer);
// Copy output vector from GPU buffer to host memory.
cudaStatus = cudaMemcpy(BufferOut, dev_Out, cBytes, cudaMemcpyDeviceToHost);
if (cudaStatus != cudaSuccess) {
fprintf(stderr, "cudaMemcpy failed!");
goto Error;
}
Error:
cudaFree(dev_Out);
cudaFree(dev_B);
cudaFree(dev_S);
return (int)sExecutionTime;
}
extern "C" int __declspec(dllexport) __stdcall FindMinCPU(float* values, int count)
{
int minIndex = 0;
float minValue = 3.4e+38F;
for (int i = 0; i < count; ++i)
{
if (values[i] < minValue)
{
minValue = values[i];
minIndex = i;
}
}
return minIndex;
}
C# test app:
using System;
using System.Collections.Generic;
using System.Text;
using System.Diagnostics;
using System.Drawing;
namespace TestCUDAImageSearch
{
class Program
{
static void Main(string[] args)
{
using(Bitmap big = new Bitmap("Big.png"), small = new Bitmap("Small.png"))
{
Console.WriteLine("Big " + big.Width + "x" + big.Height + " Small " + small.Width + "x" + small.Height);
Stopwatch sw = new Stopwatch();
sw.Start();
Point point = CUDAImageLIb.ImageSearch(big, small);
sw.Stop();
long t = sw.ElapsedMilliseconds;
Console.WriteLine("Image found at " + point.X + "x" + point.Y);
Console.WriteLine("total time=" + t + "ms kernel time=" + CUDAImageLIb.LastKernelTime + "ms");
}
Console.WriteLine("Hit key");
Console.ReadKey();
}
}
}
//#define SUPPORT_HSB
using System;
using System.Collections.Generic;
using System.Text;
using System.Runtime.InteropServices;
using System.Drawing;
using System.Drawing.Imaging;
namespace TestCUDAImageSearch
{
public static class CUDAImageLIb
{
[DllImport("CUDAImageLib.dll")]
private static extern int ImageSearchGPU(float[] bufferOut, float[] bufferB, float[] bufferS, int bw, int bh, int sw, int sh);
[DllImport("CUDAImageLib.dll")]
private static extern int FindMinCPU(float[] values, int count);
private static int _lastKernelTime = 0;
public static int LastKernelTime
{
get { return _lastKernelTime; }
}
public static Point ImageSearch(Bitmap big, Bitmap small)
{
int bw = big.Width;
int bh = big.Height;
int sw = small.Width;
int sh = small.Height;
int mx = (bw - sw);
int my = (bh - sh);
float[] diffs = new float[mx * my];
float[] b = ImageToFloat(big);
float[] s = ImageToFloat(small);
_lastKernelTime = ImageSearchGPU(diffs, b, s, bw, bh, sw, sh);
int minIndex = FindMinCPU(diffs, diffs.Length);
return new Point(minIndex % mx, minIndex / mx);
}
public static List<Point> ImageSearch(Bitmap big, Bitmap small, float maxDeviation)
{
int bw = big.Width;
int bh = big.Height;
int sw = small.Width;
int sh = small.Height;
int mx = (bw - sw);
int my = (bh - sh);
int nDiff = mx * my;
float[] diffs = new float[nDiff];
float[] b = ImageToFloat(big);
float[] s = ImageToFloat(small);
_lastKernelTime = ImageSearchGPU(diffs, b, s, bw, bh, sw, sh);
List<Point> points = new List<Point>();
for(int i = 0; i < nDiff; ++i)
{
if (diffs[i] < maxDeviation)
{
points.Add(new Point(i % mx, i / mx));
}
}
return points;
}
#if SUPPORT_HSB
private static float[] ImageToFloat(Bitmap img)
{
int w = img.Width;
int h = img.Height;
float[] pix = new float[w * h * 4];
int i = 0;
for (int y = 0; y < h; ++y)
{
for (int x = 0; x < w; ++x)
{
Color c = img.GetPixel(x, y);
pix[i] = c.GetHue() / 360;
pix[i + 1] = c.GetSaturation();
pix[i + 2] = c.GetBrightness();
pix[i + 3] = c.A;
i += 4;
}
}
return pix;
}
#else
private static float[] ImageToFloat(Bitmap bmp)
{
int w = bmp.Width;
int h = bmp.Height;
int n = w * h;
float[] pix = new float[n * 4];
System.Diagnostics.Debug.Assert(bmp.PixelFormat == PixelFormat.Format32bppArgb);
Rectangle r = new Rectangle(0, 0, w, h);
BitmapData bmpData = bmp.LockBits(r, ImageLockMode.ReadOnly, bmp.PixelFormat);
System.Diagnostics.Debug.Assert(bmpData.Stride > 0);
int[] pixels = new int[n];
System.Runtime.InteropServices.Marshal.Copy(bmpData.Scan0, pixels, 0, n);
bmp.UnlockBits(bmpData);
int j = 0;
for (int i = 0; i < n; ++i)
{
pix[j] = (pixels[i] & 255) / 255.0f;
pix[j + 1] = ((pixels[i] >> 8) & 255) / 255.0f;
pix[j + 2] = ((pixels[i] >> 16) & 255) / 255.0f;
pix[j + 3] = ((pixels[i] >> 24) & 255) / 255.0f;
j += 4;
}
return pix;
}
#endif
}
}
It looks like what you are talking about is a well-known problem: template matching. The easiest way forward is to convolve the image (the bigger image) with the template (the smaller image). You could implement the convolution in one of two ways.
1) Modify the convolutions example from the CUDA SDK (similar to what you are doing anyway).
2) Use FFTs to implement the convolution (see the convolution theorem). You will need to remember:
% MATLAB format
L = size(A) + size(B) - 1;
conv2(A, B) = IFFT2(FFT2(A, L) .* FFT2(B, L));
You could use CUFFT to implement the 2-dimensional FFTs (after padding the inputs appropriately). You will need to write a kernel that does element-wise multiplication and then normalizes the result (because CUFFT does not normalize) before performing the inverse FFT.
For the sizes you mention (1024 x 1280 and 128 x 128), the inputs must be padded to at least (1024 + 128 - 1) x (1280 + 128 - 1) = 1151 x 1407. But FFTs are fastest when the (padded) inputs are powers of 2. So you will need to pad both the large and small images to size 2048 x 2048.
You could speed up your calculations by using faster memory access, for example by using
Texture Cache for the big image
Shared Memory or Constant Cache for the small image or parts of it.
But your real problem is the whole approach of your comparison. Comparing the images pixel by pixel at every possible location will never be efficient. There is just too much work to do. First you should think about finding ways to
Select the interesting image regions in the big image where the small image might be contained and only search in these
Find a faster comparison mechanism, using some representation of the images other than their raw pixel values. You should be able to compare the images by computing a representation with less data, e.g. a color histogram or integral images (a small sketch of the latter follows below).
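As an illustration of the integral-image idea, here is a plain Java sketch (the names are mine, single channel): after one pass over the big image, the sum of any rectangular region costs four lookups, so coarse candidate windows can be scored far more cheaply than a full pixel-by-pixel difference.
// Summed-area table: integral[y][x] holds the sum of all pixels above and to the left of (x, y).
static long[][] integralImage(int[][] gray, int w, int h) {
    long[][] integral = new long[h + 1][w + 1];
    for (int y = 1; y <= h; y++) {
        for (int x = 1; x <= w; x++) {
            integral[y][x] = gray[y - 1][x - 1]
                    + integral[y - 1][x] + integral[y][x - 1] - integral[y - 1][x - 1];
        }
    }
    return integral;
}
// Sum of the rectangle with corner (x0, y0) inclusive and (x1, y1) exclusive, in O(1).
static long rectSum(long[][] integral, int x0, int y0, int x1, int y1) {
    return integral[y1][x1] - integral[y0][x1] - integral[y1][x0] + integral[y0][x0];
}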