HMAC Implementation - Pseudo code - cryptography

I have to implement my own HMAC-SHA256 for use in an embedded project, and I am having trouble getting it to work. I can't even get the pseudocode to work when I calculate it by hand, so I know I am doing something wrong!
My pseudocode calculations, following the description on Wikipedia:
function hmac (key, message)
    if (length(key) > blocksize) then
        // keys longer than blocksize are shortened
        key = hash(key)
    end if
    if (length(key) < blocksize) then
        // keys shorter than blocksize are zero-padded
        key = key ∥ zeroes(blocksize - length(key))
    end if

    // Where blocksize is that of the underlying hash function
    o_key_pad = [0x5c * blocksize] ⊕ key
    i_key_pad = [0x36 * blocksize] ⊕ key // Where ⊕ is exclusive or (XOR)
    // Where ∥ is concatenation
    return hash(o_key_pad ∥ hash(i_key_pad ∥ message))
end function
When I do by-hand calculations for key="mykey" and message="helloworld" I get the following:
key = 0x6d796b6579000000000000000000000000000000000000000000000000000000
o_key_pad = 0x31253739255c5c5c5c5c5c5c5c5c5c5c5c5c5c5c5c5c5c5c5c5c5c5c5c5c5c5c
i_key_pad = 0x5b4f5d534f363636363636363636363636363636363636363636363636363636
hash(i_key_pad ∥ message) = 6fb2e91de7b8b5ec6283846ff7245cd6eb4a4fd26056b529bd42d99fcf3314d2
and the overall hmac of 0d76a16089f85cd2169bb64b6f2c818e6a404a218896483fcd97fee5cce185ae

When fixing the key length and calculating the inner and outer padding, you need to use the blocksize of the underlying hash function, which is not the same as its output size. The blocksize is the size of the input blocks that the function operates on. In the case of SHA256, the blocksize is 512 bits (64 bytes) and the output size is 256 bits (32 bytes).
Your results are what you get if you use 32 as the blocksize.
Using the correct length blocksize the key, o_key_pad and i_key_pad are basically the same, only twice as long with trailing 00, 5c or 36 bytes respectively.
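The padding step can be checked on its own before any hashing is involved. The sketch below is Python rather than the C of the embedded target and is meant only as a cross-check for hand calculations; `key_pads` and its `block_size` parameter are names made up for this illustration.

```python
import hashlib

def key_pads(key: bytes, block_size: int = 64):
    """Return (padded_key, o_key_pad, i_key_pad) as used by HMAC-SHA256."""
    if len(key) > block_size:
        # Keys longer than the block size are replaced by their hash.
        key = hashlib.sha256(key).digest()
    # Zero-pad on the right up to the block size (64 bytes for SHA-256).
    padded = key.ljust(block_size, b"\x00")
    o_key_pad = bytes(b ^ 0x5C for b in padded)
    i_key_pad = bytes(b ^ 0x36 for b in padded)
    return padded, o_key_pad, i_key_pad

padded, o_pad, i_pad = key_pads(b"mykey")
print(len(padded))   # number of bytes in each pad
print(o_pad.hex())
print(i_pad.hex())
```

Calling `key_pads(b"mykey", block_size=32)` instead reproduces the 32-byte values from the question, which makes the difference between the two block sizes easy to see side by side.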
The result of the inner hash (i.e. hash(i_key_pad ∥ message)) is:
8bf029764919f9e35249d0d55ffb8fd6c62fe23a85c1515e0120c5005aa813d5
and the final value (hash(o_key_pad ∥ hash(i_key_pad ∥ message))) is:
7fdfaa9c9c0931f52d9ebf2538bc99700f2e771f3af1c1d93945c2256c11aedd
which matches the result I get from OpenSSL’s HMAC implementation.
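A quick way to test a hand-rolled HMAC is to compare it against a library implementation, much like the OpenSSL comparison above. The following is a minimal Python sketch of the pseudocode, assuming SHA-256 as the hash; the function name `hmac_sha256` and the `block_size` parameter are illustrative and not taken from the original code.

```python
import hashlib
import hmac

def hmac_sha256(key: bytes, message: bytes, block_size: int = 64) -> bytes:
    """HMAC-SHA256 written out step by step, following the pseudocode."""
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()      # shorten long keys
    key = key.ljust(block_size, b"\x00")        # zero-pad short keys
    o_key_pad = bytes(b ^ 0x5C for b in key)
    i_key_pad = bytes(b ^ 0x36 for b in key)
    inner = hashlib.sha256(i_key_pad + message).digest()
    return hashlib.sha256(o_key_pad + inner).digest()

# Reference value from Python's standard library implementation.
reference = hmac.new(b"mykey", b"helloworld", hashlib.sha256).digest()

# Compare the step-by-step version with the reference, once with the
# SHA-256 block size (64) and once with the output size (32) used by mistake.
print(hmac_sha256(b"mykey", b"helloworld") == reference)
print(hmac_sha256(b"mykey", b"helloworld", block_size=32) == reference)
```

The first comparison should hold and the second should not, which is the same mistake the question ran into. Printing `inner.hex()` inside the function gives an intermediate value to compare against the embedded implementation.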

This is the code I came up with:
/**
* This function takes in a key, the length of that key, a message (null terminated) and a pointer to a char[32] or greater array.
* It calculates the HMAC-SHA256 of the given key/message combination and writes the resulting code to that array in binary form (32 bytes).
* It follows the HMAC pseudocode shown in the question.
* #param key the secret key bytes
* #param length the length of the key in bytes
* #param message the null-terminated message to authenticate
* #param hmac output buffer of at least 32 bytes that receives the binary HMAC
*/
#include <string.h> //for strlen

void hmac(char key[], int length, char message[], char *hmac){
    int msgLen = strlen(message); //get the length of the message to be authenticated
    char keyFinal[BLOCK_SIZE] = {0}; //setup array for the key, pre-filled with 0x00
    int i;
    if(length > BLOCK_SIZE){ //if the given key is too long, hash it; the bytes after the digest stay 0x00
        shaHash_len(key, length, keyFinal); //pass the length explicitly, since a binary key may contain 0x00
    }else{ //otherwise copy the key; the remaining bytes stay 0x00 (zero padding)
        for(i = 0; i < length; i++){
            keyFinal[i] = key[i];
        }
    }
    char oKeyPad[BLOCK_SIZE] = {0}; //setup the oKeyPad
    char iKeyPad[BLOCK_SIZE] = {0}; //setup the iKeyPad
    for(i = 0; i < BLOCK_SIZE; i++){ //xor each byte of keyFinal with O_KEY_PAD and I_KEY_PAD
        oKeyPad[i] = keyFinal[i] ^ O_KEY_PAD;
        iKeyPad[i] = keyFinal[i] ^ I_KEY_PAD;
    }
    char iandmesg[BLOCK_SIZE+MAX_SHA]; //inner hash input: iKeyPad concatenated with message (msgLen must not exceed MAX_SHA)
    char hash_iandmesg[HASH_LEN] = {0}; //receives the bytes back from the hashing function
    //make the message to be hashed, iKeyPad concatenated with message
    for(i = 0; i < BLOCK_SIZE; i++){ //read in iKeyPad
        iandmesg[i] = iKeyPad[i];
    }
    for(i = BLOCK_SIZE; i < (msgLen + BLOCK_SIZE); i++){ //read in message
        iandmesg[i] = message[i-BLOCK_SIZE];
    }
    shaHash_len(iandmesg, (msgLen+BLOCK_SIZE), hash_iandmesg); //create the inner hash (iKeyPad + message)
    char oandihash[(BLOCK_SIZE + HASH_LEN)]; //outer hash input: oKeyPad + (hash of iKeyPad + message)
    //make the message to be hashed, oKeyPad concatenated with the hash of (iKeyPad + message)
    for(i = 0; i < BLOCK_SIZE; i++){ //read in oKeyPad
        oandihash[i] = oKeyPad[i];
    }
    for(i = BLOCK_SIZE; i < (BLOCK_SIZE + HASH_LEN); i++){ //read in hash of iKeyPad + message
        oandihash[i] = hash_iandmesg[i-BLOCK_SIZE];
    }
    //return the result of the hash of (oKeyPad + hash(iKeyPad + message))
    shaHash_len(oandihash, (BLOCK_SIZE + HASH_LEN), hmac);
}

Related

Android specific frequency AudioTrack infinite duration

I generated a sound at a specific frequency using AudioTrack:
private static final int duration = 10; // seconds
private static final int sampleRate = 8000;
private static final int numSamples = duration * sampleRate;
private static final double sample[] = new double[numSamples];
private static double freqOfTone = 0; // hz
final byte generatedSnd[] = new byte[2 * numSamples];
final AudioTrack p7 = new AudioTrack(AudioManager.STREAM_MUSIC,
        sampleRate, AudioFormat.CHANNEL_CONFIGURATION_MONO,
        AudioFormat.ENCODING_PCM_16BIT, numSamples,
        AudioTrack.MODE_STATIC);
// freqOfTone must be set to a non-zero frequency before this loop runs
for (int i = 0; i < numSamples; ++i) {
    sample[i] = Math.sin(2 * Math.PI * i / (sampleRate / freqOfTone));
}
// convert to 16 bit pcm sound array
// assumes the sample buffer is normalised.
int idx = 0;
for (final double dVal : sample) {
    // scale to maximum amplitude
    final short val = (short) ((dVal * 32767));
    // in 16 bit wav PCM, first byte is the low order byte
    generatedSnd[idx++] = (byte) (val & 0x00ff);
    generatedSnd[idx++] = (byte) ((val & 0xff00) >>> 8);
}
p7.write(generatedSnd, 0, generatedSnd.length);
p7.play();
That is how I do it; the duration is set in the first line.
However, I want the tone to play for an unlimited duration (without looping).
Is that possible?
Please help me.
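No answer is recorded for this question. As a language-neutral illustration of the underlying idea, and not a tested Android solution, the tone can be produced in small chunks instead of one fixed-length buffer, as long as the sample index is carried over between chunks so the phase never jumps. The Python sketch below shows only that generation step; `pcm_chunks` and its parameters are hypothetical names. On Android the chunks would presumably be written to an AudioTrack created with `AudioTrack.MODE_STREAM` rather than `MODE_STATIC`.

```python
import math
import struct

def pcm_chunks(freq_hz: float, sample_rate: int = 8000, chunk_samples: int = 800):
    """Yield endless 16-bit little-endian mono PCM chunks of a sine tone."""
    n = 0  # absolute sample index, carried across chunks to keep the phase continuous
    while True:
        samples = [
            int(32767 * math.sin(2 * math.pi * freq_hz * (n + k) / sample_rate))
            for k in range(chunk_samples)
        ]
        n += chunk_samples
        # "<h" = little-endian signed 16 bit, low order byte first as in the Java code
        yield struct.pack("<%dh" % chunk_samples, *samples)

gen = pcm_chunks(440.0)
first, second = next(gen), next(gen)
print(len(first), len(second))  # prints "1600 1600" (800 samples * 2 bytes each)
```

Because the generator never ends, the caller decides when to stop pulling chunks, which is what removes the fixed `duration` constant.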

Constructing bitmask ? bitwise packet

I have been wanting to experiment with this project Axon with an iOS app connecting over a tcp connection. Towards the end of the doc the protocol is explained as so
The wire protocol is simple and very much zeromq-like, where <length> is a BE 24 bit unsigned integer representing a maximum length of roughly ~16mb. The <meta> data byte is currently only used to store the codec, for example "json" is simply 1, in turn JSON messages received on the client end will then be automatically decoded for you by selecting this same codec.
With the diagram
 octet:     0      1      2      3      <length>
        +------+------+------+------+------------------...
        | meta | <length>           | data ...
        +------+------+------+------+------------------...
I have had experience working with binary protocols creating a packet such as:
NSUInteger INT_32_LENGTH = sizeof(uint32_t);
uint32_t length = [data length]; // data is an NSData object
NSMutableData *packetData = [NSMutableData dataWithCapacity:length + (INT_32_LENGTH * 2)];
[packetData appendBytes:&requestType length:INT_32_LENGTH];
[packetData appendBytes:&length length:INT_32_LENGTH];
[packetData appendData:data];
So my question is: how would you create the data packet for an Axon request? I assume it needs some bit shifting, which I am not too clued up on.
Allocate one array of char or unsigned char with size == packet_size.
Declare constants:
const int metaFieldPos = 0;
const int sizeofMetaField = sizeof(char);
const int lengthPos = metaFieldPos + sizeofMetaField;
const int sizeofLengthField = sizeof(char) * 3;
const int dataPos = lengthPos + sizeofLengthField;
If you have the data and can recognize the beginning of the packet, you can use the constants above to navigate it with pointers.
Maybe these functions will help you. (They use Qt, but you can easily translate them to whatever library you use.)
// bytesIn24Bits (3) and bitsInByte (8) are constants defined elsewhere in the class
quint32 Convert::uint32_to_uint24(const quint32 value){
    return value & (quint32)(0x00FFFFFFu);
}
qint32 Convert::int32_to_uint24(const qint32 value){
    return value & (qint32)(0x00FFFFFF);
}
// reads 3 bytes, least significant byte first (little-endian)
quint32 Convert::bytes_to_uint24(const char* from){
    quint32 result = 0;
    quint8 shift = 0;
    for (int i = 0; i < bytesIn24Bits; i++) {
        result |= static_cast<quint32>(*reinterpret_cast<const quint8 *>(from + i)) << shift;
        shift+=bitsInByte;
    }
    return result;
}
// writes 3 bytes, least significant byte first (little-endian);
// a big-endian field such as <length> above needs the opposite byte order
void Convert::uint32_to_uint24Bytes(const quint32 value, char* from){
    quint8 shift = 0;
    for (int i = 0; i < bytesIn24Bits; i++) {
        const quint32 buf = (value >> shift) & 0xFFu;
        *(from + i) = *reinterpret_cast<const char *>(&buf);
        shift+=bitsInByte;
    }
}
// copies the value in host byte order and drops the last byte,
// so this only keeps the low 24 bits on a little-endian host
QByteArray Convert::uint32_to_uint24QByteArray (const quint32 value){
    QByteArray bytes;
    bytes.resize(sizeof(value));
    *reinterpret_cast<quint32 *>(bytes.data()) = value;
    bytes.chop(1);
    return bytes;
}
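To make the layout from the diagram concrete, here is a small Python sketch of packing and unpacking one frame: one meta byte, then the length as three big-endian bytes, then the data. It is based only on the protocol description quoted in the question; `pack_frame` and `unpack_frame` are illustrative names, not part of Axon.

```python
def pack_frame(meta: int, data: bytes) -> bytes:
    """Build one frame: 1 meta byte, 24-bit big-endian length, then the data."""
    if not 0 <= meta <= 0xFF:
        raise ValueError("meta must fit in one byte")
    length = len(data)
    if length > 0xFFFFFF:
        raise ValueError("data does not fit in a 24-bit length field")
    header = bytes([
        meta,
        (length >> 16) & 0xFF,  # most significant length byte first (big-endian)
        (length >> 8) & 0xFF,
        length & 0xFF,
    ])
    return header + data

def unpack_frame(frame: bytes):
    """Split one frame back into (meta, data)."""
    meta = frame[0]
    length = (frame[1] << 16) | (frame[2] << 8) | frame[3]
    return meta, frame[4:4 + length]

frame = pack_frame(1, b'{"a":1}')   # meta 1 is the "json" codec per the quoted doc
print(frame.hex())                  # prints "010000077b2261223a317d"
print(unpack_frame(frame))          # prints (1, b'{"a":1}')
```

The same shifts translate directly to Objective-C: write the meta byte, then `(length >> 16) & 0xFF`, `(length >> 8) & 0xFF` and `length & 0xFF` as three single bytes, then append the data.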

How to increase an ipv6 address based on mask in java?

I am trying to increment an IPv6 address based on its mask.
I run into a problem when there is an F in the position being incremented.
Could anyone please check this?
public String IncrementIPV6ForPrefixLength (String IPv6String, int times) throws UnknownHostException
{
int result , carry = 0, i;
int bits;
int mask=0;
int index=IPv6String.indexOf("/");
mask=Integer.parseInt(IPv6String.substring(index+1, IPv6String.length()));
IPv6String=IPv6String.substring(0, index);
InetAddress iaddr=InetAddress.getByName(IPv6String);
byte[] IPv6Arr=iaddr.getAddress();
if(mask > 128 || mask < 0)
return null;
i = mask/8;
bits = mask%8;
if(bits>0)
{
result = ((int)(IPv6Arr[i]>>(8-bits))) + times;
IPv6Arr[i] =(byte) ((result << (8-bits)) | (IPv6Arr[i] & (0xff >> (bits))));
carry = (result << (8-bits))/256;
times /= 256;
}
i--;
for(;i>=0;i--)
{
result = ((int)IPv6Arr[i]) + ((times + carry)& 0xFF);
IPv6Arr[i] = (byte)(result % 256);
carry = result / 256;
if(carry == 0)
{
iaddr=InetAddress.getByAddress(IPv6Arr);
String s=iaddr.toString();
if(s.indexOf('/') != -1){
s = s.substring(1, s.length()).toUpperCase();
}
StringBuffer buff =new StringBuffer("");
String[] ss = s.split(":");
for(int k=0;k<ss.length;k++){
int Differ = 4 - ss[k].length();
for(int j = 0; j<Differ;j++){
buff.append("0");
}
buff.append(ss[k]);
if(k!=7)buff=buff.append(":");
}
return buff.toString()+"/"+mask;
}
times /= 256;
}
return null;
}
Input like this:
FD34:4FB7:FFFF:A13F:1325:2252:1525:325F/48
FD34:41B7:FFFF::/48
FD34:4FBF:F400:A13E:1325:2252:1525:3256/35
Output like this, if incremented by 1:
FD34:4FB8:0000:A13F:1325:2252:1525:325F/48
FD34:41B8:0000::/48
FD34:4FC0:0400:A13E:1325:2252:1525:3256/35
If incremented by 2:
FD34:4FB8:0001:A13F:1325:2252:1525:325F/48
FD34:41B8:0001::/48
FD34:4FC0:1400:A13E:1325:2252:1525:3256/35
Can you please find where I am going wrong?
Disregarding the posted code, try to model the operation as a direct numerical operation on the 128-bit number that the IPv6 address really is. Convert to BigInteger and use BigInteger.add.
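The same idea, treating the address as one 128-bit integer, can be sketched in Python with the standard `ipaddress` module in place of Java's BigInteger. The function name `increment_prefix` is made up for this example, and it assumes that "increment by the mask" means adding `times` to the number formed by the first `mask` bits while leaving the remaining bits untouched.

```python
import ipaddress

def increment_prefix(cidr: str, times: int) -> str:
    """Add `times` to the prefix part of an IPv6 address given as 'addr/len'."""
    addr_text, prefix_text = cidr.split("/")
    prefix_len = int(prefix_text)
    if not 0 <= prefix_len <= 128:
        raise ValueError("prefix length must be between 0 and 128")
    value = int(ipaddress.IPv6Address(addr_text))
    # One step of the prefix equals 2**(128 - prefix_len) in the full address.
    value = (value + (times << (128 - prefix_len))) % (1 << 128)
    return "%s/%d" % (ipaddress.IPv6Address(value).exploded.upper(), prefix_len)

print(increment_prefix("FD34:4FB7:FFFF:A13F:1325:2252:1525:325F/48", 1))
# prints FD34:4FB8:0000:A13F:1325:2252:1525:325F/48
```

Carries across byte and group boundaries are handled by the integer addition itself, so an F in the incremented position needs no special case. Note that the result is always printed in fully expanded form, so an input such as `FD34:41B7:FFFF::/48` comes back with all eight groups written out.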

How to calculate CRC-16 from HEX values?

In my code I need to calculate a 16-bit CRC-16 value for the hex values stored as NSData. Below is the code snippet that calculates CRC-16 in C.
#include <stdbool.h> // for bool

void UpdateCRC(unsigned short int *CRC, unsigned char x)
{
    // This function uses the initial CRC value passed in the first
    // argument, then modifies it using the single character passed
    // as the second argument, according to a CRC-16 polynomial
    // Arguments:
    // CRC -- pointer to starting CRC value
    // x -- new character to be processed
    // Returns:
    // The function does not return any values, but updates the variable
    // pointed to by CRC
    static int const Poly = 0xA001; // CRC-16 polynomial (bit-reversed form of 0x8005)
    int i;
    bool flag;
    *CRC ^= x;
    for (i=0; i<8; i++)
    {
        flag = ((*CRC & 1) == 1);
        *CRC = (unsigned short int)(*CRC >> 1);
        if (flag)
            *CRC ^= Poly;
    }
    return;
}
The NSData holds the hex values, like below:
const char connectByteArray[] = {
0x21,0x01,0x90,0x80,0x5F
};
NSData* data = [NSData dataWithBytes: connectByteArray length:sizeof(connectByteArray)];
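To get a CRC for the whole buffer, the update step is applied to each byte in turn, starting from an initial CRC value. The Python sketch below mirrors `UpdateCRC` for cross-checking. The question does not say which starting value to use, so the default of 0 here is an assumption (it corresponds to the variant usually called CRC-16/ARC, while 0xFFFF gives the Modbus variant); `update_crc` and `crc16` are illustrative names.

```python
def update_crc(crc: int, byte: int, poly: int = 0xA001) -> int:
    """One UpdateCRC step: fold a single byte into the running 16-bit CRC."""
    crc ^= byte
    for _ in range(8):
        low_bit_set = crc & 1
        crc >>= 1
        if low_bit_set:
            crc ^= poly
    return crc & 0xFFFF

def crc16(data: bytes, initial: int = 0x0000) -> int:
    """Run the update step over every byte; `initial` is an assumed start value."""
    crc = initial
    for byte in data:
        crc = update_crc(crc, byte)
    return crc

connect_bytes = bytes([0x21, 0x01, 0x90, 0x80, 0x5F])
print(hex(crc16(connect_bytes)))           # starting from 0x0000
print(hex(crc16(connect_bytes, 0xFFFF)))   # starting from 0xFFFF
```

In Objective-C the equivalent loop would walk `[data bytes]` for `[data length]` bytes and call `UpdateCRC(&crc, byte)` on each one.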
I solved it using the following C program. I hope it helps someone. Cheers!
#include <string.h>
#include <stdio.h>
const int order = 16;
const unsigned long polynom = 0x8005;
const int direct = 1;
const unsigned long crcinit = 0;
const unsigned long crcxor = 0;
const int refin = 1;
const int refout = 1;
// 'order' [1..32] is the CRC polynom order, counted without the leading '1' bit
// 'polynom' is the CRC polynom without leading '1' bit
// 'direct' [0,1] specifies the kind of algorithm: 1=direct, no augmented zero bits
// 'crcinit' is the initial CRC value belonging to that algorithm
// 'crcxor' is the final XOR value
// 'refin' [0,1] specifies if a data byte is reflected before processing (UART) or not
// 'refout' [0,1] specifies if the CRC will be reflected before XOR
// Data character string
const unsigned char string[] = {0x05,0x0f,0x01,0x00,0x00,0x99};
// internal global values:
unsigned long crcmask;
unsigned long crchighbit;
unsigned long crcinit_direct;
unsigned long crcinit_nondirect;
unsigned long crctab[256];
// subroutines
unsigned long reflect (unsigned long crc, int bitnum) {
// reflects the lower 'bitnum' bits of 'crc'
unsigned long i, j=1, crcout=0;
for (i=(unsigned long)1<<(bitnum-1); i; i>>=1) {
if (crc & i) crcout|=j;
j<<= 1;
}
return (crcout);
}
void generate_crc_table() {
// make CRC lookup table used by table algorithms
int i, j;
unsigned long bit, crc;
for (i=0; i<256; i++) {
crc=(unsigned long)i;
if (refin) crc=reflect(crc, 8);
crc<<= order-8;
for (j=0; j<8; j++) {
bit = crc & crchighbit;
crc<<= 1;
if (bit) crc^= polynom;
}
if (refin) crc = reflect(crc, order);
crc&= crcmask;
crctab[i]= crc;
}
}
unsigned long crctablefast (unsigned char* p, unsigned long len) {
// fast lookup table algorithm without augmented zero bytes, e.g. used in pkzip.
// only usable with polynom orders of 8, 16, 24 or 32.
unsigned long crc = crcinit_direct;
if (refin) crc = reflect(crc, order);
if (!refin) while (len--) crc = (crc << 8) ^ crctab[ ((crc >> (order-8)) & 0xff) ^ *p++];
else while (len--) crc = (crc >> 8) ^ crctab[ (crc & 0xff) ^ *p++];
if (refout^refin) crc = reflect(crc, order);
crc^= crcxor;
crc&= crcmask;
return(crc);
}
unsigned long crctable (unsigned char* p, unsigned long len) {
// normal lookup table algorithm with augmented zero bytes.
// only usable with polynom orders of 8, 16, 24 or 32.
unsigned long crc = crcinit_nondirect;
if (refin) crc = reflect(crc, order);
if (!refin) while (len--) crc = ((crc << 8) | *p++) ^ crctab[ (crc >> (order-8)) & 0xff];
else while (len--) crc = ((crc >> 8) | (*p++ << (order-8))) ^ crctab[ crc & 0xff];
if (!refin) while (++len < order/8) crc = (crc << 8) ^ crctab[ (crc >> (order-8)) & 0xff];
else while (++len < order/8) crc = (crc >> 8) ^ crctab[crc & 0xff];
if (refout^refin) crc = reflect(crc, order);
crc^= crcxor;
crc&= crcmask;
return(crc);
}
unsigned long crcbitbybit(unsigned char* p, unsigned long len) {
// bit by bit algorithm with augmented zero bytes.
// does not use lookup table, suited for polynom orders between 1...32.
unsigned long i, j, c, bit;
unsigned long crc = crcinit_nondirect;
for (i=0; i<len; i++) {
c = (unsigned long)*p++;
if (refin) c = reflect(c, 8);
for (j=0x80; j; j>>=1) {
bit = crc & crchighbit;
crc<<= 1;
if (c & j) crc|= 1;
if (bit) crc^= polynom;
}
}
for (i=0; i<order; i++) {
bit = crc & crchighbit;
crc<<= 1;
if (bit) crc^= polynom;
}
if (refout) crc=reflect(crc, order);
crc^= crcxor;
crc&= crcmask;
return(crc);
}
unsigned long crcbitbybitfast(unsigned char* p, unsigned long len) {
// fast bit by bit algorithm without augmented zero bytes.
// does not use lookup table, suited for polynom orders between 1...32.
unsigned long i, j, c, bit;
unsigned long crc = crcinit_direct;
for (i=0; i<len; i++) {
c = (unsigned long)*p++;
if (refin) c = reflect(c, 8);
for (j=0x80; j; j>>=1) {
bit = crc & crchighbit;
crc<<= 1;
if (c & j) bit^= crchighbit;
if (bit) crc^= polynom;
}
}
if (refout) crc=reflect(crc, order);
crc^= crcxor;
crc&= crcmask;
return(crc);
}
int main() {
// test program for checking four different CRC computing types that are:
// crcbitbybit(), crcbitbybitfast(), crctable() and crctablefast(), see above.
// parameters are at the top of this program.
// Result will be printed on the console.
int i;
unsigned long bit, crc;
// at first, compute constant bit masks for whole CRC and CRC high bit
crcmask = ((((unsigned long)1<<(order-1))-1)<<1)|1;
crchighbit = (unsigned long)1<<(order-1);
// check parameters
if (order < 1 || order > 32) {
printf("ERROR, invalid order, it must be between 1..32.\n");
return(0);
}
if (polynom != (polynom & crcmask)) {
printf("ERROR, invalid polynom.\n");
return(0);
}
if (crcinit != (crcinit & crcmask)) {
printf("ERROR, invalid crcinit.\n");
return(0);
}
if (crcxor != (crcxor & crcmask)) {
printf("ERROR, invalid crcxor.\n");
return(0);
}
// generate lookup table
generate_crc_table();
// compute missing initial CRC value
if (!direct) {
crcinit_nondirect = crcinit;
crc = crcinit;
for (i=0; i<order; i++) {
bit = crc & crchighbit;
crc<<= 1;
if (bit) crc^= polynom;
}
crc&= crcmask;
crcinit_direct = crc;
}
else {
crcinit_direct = crcinit;
crc = crcinit;
for (i=0; i<order; i++) {
bit = crc & 1;
if (bit) crc^= polynom;
crc >>= 1;
if (bit) crc|= crchighbit;
}
crcinit_nondirect = crc;
}
// call CRC algorithms using the CRC parameters above and print result to the console
printf("\n");
printf("CRC tester v1.1 written on 13/01/2003 by Sven Reifegerste (zorc/reflex)\n");
printf("-----------------------------------------------------------------------\n");
printf("\n");
printf("Parameters:\n");
printf("\n");
printf(" polynom : 0x%lx\n", polynom);
printf(" order : %d\n", order);
printf(" crcinit : 0x%lx direct, 0x%lx nondirect\n", crcinit_direct, crcinit_nondirect);
printf(" crcxor : 0x%lx\n", crcxor);
printf(" refin : %d\n", refin);
printf(" refout : %d\n", refout);
printf("\n");
// the data is binary and contains 0x00 bytes, so use sizeof instead of strlen and print it as hex
printf(" data bytes :");
for (i=0; i<(int)sizeof(string); i++) printf(" %02x", string[i]);
printf(" (%d bytes)\n", (int)sizeof(string));
printf("\n");
printf("Results:\n");
printf("\n");
printf(" crc bit by bit : 0x%lx\n", crcbitbybit((unsigned char *)string, sizeof(string)));
printf(" crc bit by bit fast : 0x%lx\n", crcbitbybitfast((unsigned char *)string, sizeof(string)));
if (!(order&7)) printf(" crc table : 0x%lx\n", crctable((unsigned char *)string, sizeof(string)));
if (!(order&7)) printf(" crc table fast : 0x%lx\n", crctablefast((unsigned char *)string, sizeof(string)));
return(0);
}
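The parameter set at the top of the program (order 16, polynom 0x8005, crcinit 0, crcxor 0, refin and refout both 1) describes the same right-shifting CRC as the 0xA001 routine in the question, because 0xA001 is 0x8005 with its 16 bits reversed. The following Python sketch is a compact port of the "bit by bit fast" variant that can be used to check this; the function names are illustrative, and the parameter names are borrowed from the C program.

```python
def reflect(value: int, bit_count: int) -> int:
    """Reverse the order of the lowest `bit_count` bits of `value`."""
    out = 0
    for _ in range(bit_count):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out

def crc_bitwise(data: bytes, order: int = 16, polynom: int = 0x8005,
                crcinit: int = 0, crcxor: int = 0,
                refin: bool = True, refout: bool = True) -> int:
    """Bit-by-bit CRC without augmented zero bits (direct algorithm)."""
    mask = (1 << order) - 1
    high_bit = 1 << (order - 1)
    crc = crcinit
    for byte in data:
        if refin:
            byte = reflect(byte, 8)
        for bit_mask in (0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01):
            top = crc & high_bit
            crc = (crc << 1) & mask
            if byte & bit_mask:
                top ^= high_bit
            if top:
                crc ^= polynom
    if refout:
        crc = reflect(crc, order)
    return (crc ^ crcxor) & mask

print(hex(reflect(0x8005, 16)))          # prints 0xa001
print(hex(crc_bitwise(b"123456789")))    # prints 0xbb3d
```

The string "123456789" is the conventional check input for CRC catalogues, so the second line is a convenient sanity test before feeding in real packet bytes.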

Java - PBKDF2 with HMACSHA256 as the PRF

I've been given the task of creating a Login API for our project and I'm supposed to use PBKDF2 with HMACSHA256 as the PRF. The plain text password is hashed using MD5 and then fed into the PBKDF2 to generate a derived key. The problem is, I'm not able to get the same output as what the project documentation is telling me.
Here's the PBKDF2 Implementation in Java:
public class PBKDF2
{
public static byte[] deriveKey( byte[] password, byte[] salt, int iterationCount, int dkLen )
throws java.security.NoSuchAlgorithmException, java.security.InvalidKeyException
{
SecretKeySpec keyspec = new SecretKeySpec( password, "HmacSHA256" );
Mac prf = Mac.getInstance( "HmacSHA256" );
prf.init( keyspec );
// Note: hLen, dkLen, l, r, T, F, etc. are horrible names for
// variables and functions in this day and age, but they
// reflect the terse symbols used in RFC 2898 to describe
// the PBKDF2 algorithm, which improves validation of the
// code vs. the RFC.
//
// dklen is expressed in bytes. (16 for a 128-bit key)
int hLen = prf.getMacLength(); // 20 for SHA1
int l = Math.max( dkLen, hLen); // 1 for 128bit (16-byte) keys
int r = dkLen - (l-1)*hLen; // 16 for 128bit (16-byte) keys
byte T[] = new byte[l * hLen];
int ti_offset = 0;
for (int i = 1; i <= l; i++) {
F( T, ti_offset, prf, salt, iterationCount, i );
ti_offset += hLen;
}
if (r < hLen) {
// Incomplete last block
byte DK[] = new byte[dkLen];
System.arraycopy(T, 0, DK, 0, dkLen);
return DK;
}
return T;
}
private static void F( byte[] dest, int offset, Mac prf, byte[] S, int c, int blockIndex ) {
final int hLen = prf.getMacLength();
byte U_r[] = new byte[ hLen ];
// U0 = S || INT (i);
byte U_i[] = new byte[S.length + 4];
System.arraycopy( S, 0, U_i, 0, S.length );
INT( U_i, S.length, blockIndex );
for( int i = 0; i < c; i++ ) {
U_i = prf.doFinal( U_i );
xor( U_r, U_i );
}
System.arraycopy( U_r, 0, dest, offset, hLen );
}
private static void xor( byte[] dest, byte[] src ) {
for( int i = 0; i < dest.length; i++ ) {
dest[i] ^= src[i];
}
}
private static void INT( byte[] dest, int offset, int i ) {
dest[offset + 0] = (byte) (i / (256 * 256 * 256));
dest[offset + 1] = (byte) (i / (256 * 256));
dest[offset + 2] = (byte) (i / (256));
dest[offset + 3] = (byte) (i);
}
// ctor
private PBKDF2 () {}
}
I used the test vectors found here (PBKDF2-HMAC-SHA2 test vectors) to verify the correctness of the implementation, and it all checked out. I'm not sure why I couldn't get the same results with an MD5-hashed password.
Parameters:
Salt: 000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F
Iterations Count: 1000
DKLen: 16 (128-bit derived key)
Using "foobar" as the plaintext password, the expected results are:
PWHash = MD5(PlaintextPassword) = 3858f62230ac3c915f300c664312c63f
PWKey = PBKDF2(PWHash, Salt, IterationsCount, DKLen) = 33C37758EFA6780C5E52FAB3B50F329C
What I get:
PWHash = 3858f62230ac3c915f300c664312c63f
PWKey = 0bd0c7d8339df2c66ce4b6e1e91ed3f1
The iteration count was supposed to be 4096, not 1000.
The computation of int l seems wrong. You have taken the maximum of dkLen and hLen, but the spec says l = CEIL (dkLen / hLen), where
CEIL (x) is the "ceiling" function, i.e. the smallest integer greater than, or equal to, x.
I think l would be more accurately defined as l = (int)Math.ceil( (double)dkLen / (double)hLen )
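For comparison, here is a short Python sketch of PBKDF2 with HMAC-SHA256 that uses the ceiling formula for l and checks itself against the standard library's `hashlib.pbkdf2_hmac`. As far as I can tell, the Math.max version in the question still returns the right bytes, because it truncates T to the first dkLen bytes; it just computes many more blocks than needed, which would explain why the test vectors passed. The function name is illustrative, and feeding the raw 16 MD5 bytes (rather than the hex string) as the password is an assumption, since the project documentation is not quoted.

```python
import hashlib
import hmac

def pbkdf2_hmac_sha256(password: bytes, salt: bytes, iterations: int, dk_len: int) -> bytes:
    """PBKDF2 as described in RFC 2898, with HMAC-SHA256 as the PRF."""
    h_len = hashlib.sha256().digest_size   # 32 bytes for SHA-256
    l = -(-dk_len // h_len)                # integer form of CEIL(dkLen / hLen)
    derived = b""
    for block_index in range(1, l + 1):
        # U_1 = PRF(P, S || INT(i)), with INT(i) as a 4-byte big-endian integer
        u = hmac.new(password, salt + block_index.to_bytes(4, "big"), hashlib.sha256).digest()
        t = u
        for _ in range(iterations - 1):
            u = hmac.new(password, u, hashlib.sha256).digest()
            t = bytes(a ^ b for a, b in zip(t, u))
        derived += t
    return derived[:dk_len]

salt = bytes.fromhex("000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F")
pw_hash = hashlib.md5(b"foobar").digest()  # assumption: raw MD5 bytes are used as the password
key = pbkdf2_hmac_sha256(pw_hash, salt, 4096, 16)
print(key.hex())
print(key == hashlib.pbkdf2_hmac("sha256", pw_hash, salt, 4096, 16))
```

If the printed key does not match the documented PWKey, trying the ASCII hex string of the MD5 hash as the password is the next thing to check.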