What would be a good way to send a Java Card RSAPublicKey via APDU?
Should I get the exponent and modulus and pack them into a byte array?
Yes, you need to send both the exponent and the modulus, serialized together in a byte array. These two methods solve your issue:
// Writes the key into buffer at offset as [expLen (2 bytes)][exponent][modLen (2 bytes)][modulus]
// and returns the number of bytes written
private final short serializeKey(RSAPublicKey key, byte[] buffer, short offset) {
    short expLen = key.getExponent(buffer, (short) (offset + 2));
    Util.setShort(buffer, offset, expLen);
    short modLen = key.getModulus(buffer, (short) (offset + 4 + expLen));
    Util.setShort(buffer, (short) (offset + 2 + expLen), modLen);
    return (short) (4 + expLen + modLen);
}
// Reads a key in the same format from buffer at offset into the key object
// and returns the number of bytes read
private final short deserializeKey(RSAPublicKey key, byte[] buffer, short offset) {
    short expLen = Util.getShort(buffer, offset);
    key.setExponent(buffer, (short) (offset + 2), expLen);
    short modLen = Util.getShort(buffer, (short) (offset + 2 + expLen));
    key.setModulus(buffer, (short) (offset + 4 + expLen), modLen);
    return (short) (4 + expLen + modLen);
}
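On the receiving (host) side the response data has to be split up again. The following is a minimal C sketch, assuming the card returns exactly the bytes written by serializeKey (Java Card's Util.setShort stores the two length fields big-endian); rsa_pub_view, read_be16 and parse_rsa_pub are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side view of a public key received from the card;
   the pointers refer into the caller's buffer, nothing is copied. */
typedef struct {
    const uint8_t *exponent;
    size_t exponent_len;
    const uint8_t *modulus;
    size_t modulus_len;
} rsa_pub_view;

/* Java Card's Util.setShort writes the high byte first. */
static size_t read_be16(const uint8_t *p)
{
    return ((size_t)p[0] << 8) | p[1];
}

/* Parses [expLen (2)][exponent][modLen (2)][modulus].
   Returns the number of bytes consumed, or 0 if the buffer is too short. */
static size_t parse_rsa_pub(const uint8_t *buf, size_t len, rsa_pub_view *out)
{
    size_t pos;
    if (len < 2)
        return 0;
    out->exponent_len = read_be16(buf);
    pos = 2;
    /* need the exponent plus the 2-byte modulus length */
    if (len - pos < out->exponent_len + 2)
        return 0;
    out->exponent = buf + pos;
    pos += out->exponent_len;
    out->modulus_len = read_be16(buf + pos);
    pos += 2;
    if (len - pos < out->modulus_len)
        return 0;
    out->modulus = buf + pos;
    return pos + out->modulus_len;
}
```

With a 2048-bit key and the common exponent 65537 (typically returned as 3 bytes), the serialized form is 4 + 3 + 256 = 263 bytes, which is more than the 256 bytes a short APDU response can carry, so extended-length APDUs or splitting the data over several commands would be needed.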
I found the following page in the web:
https://users.ece.cmu.edu/~koopman/crc/crc64.html
It lists the performance of a handful of 64-bit CRC polynomials. The optimal payload for a Hamming distance (HD) of 3 is listed as 18446744073709551551 bits. A polynomial providing that HD 3 payload is 0xd6c9e91aca649ad4 (Koopman notation).
On the same website there is also some basic "HDLen" C code that can compute the performance of any polynomial (https://users.ece.cmu.edu/~koopman/crc/hdlen.html). I checked that code and the HD 3 optimized loop is very simple, similar to this:
Poly_t accum = cPoly;
Length_t len = 0;
while (accum != cTopBitSet)
{
    accum = (accum & 1) ? ((accum >> 1) ^ cPoly) : (accum >> 1);
    len++;
}
18446744073709551551 is a huge number: it is almost the full range of a 64-bit integer. Even that simple loop would run for centuries on the most powerful CPU core available.
It also appears to me that this loop cannot be parallelized, since each iteration depends on the previous one.
It is claimed that this payload is optimal among all possible 64-bit polynomials, which means that every possible 64-bit polynomial would have had to be checked for its individual HD 3 performance. That task can be parallelized, but the huge number of candidate polynomials still makes it seem infeasible.
I can't see a way to compute even a single (good) polynomial's HD 3 performance, let alone that of all possible 64-bit polynomials.
So I wonder: how was this number found? What kind of code or method (in contrast to the simple HDLen software) was used to find the stated optimal HD 3 payload?
It is a primitive polynomial, and it can be shown that the HD=3 length of any primitive polynomial over GF(2) is 2^n - (n + 1) bits, where n is the degree of the polynomial.
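The claim follows from the order of x modulo p. A short derivation (standard reasoning, sketched here rather than quoted from a source): a single-bit error x^j is never divisible by p, because p has at least two terms; since gcd(p(x), x) = 1, a two-bit error is undetected exactly when p divides x^(i-j) + 1, and for a primitive p of degree n the order of x is 2^n - 1.

```latex
\begin{aligned}
&e(x) = x^{i} + x^{j} = x^{j}\,\bigl(x^{i-j} + 1\bigr), \quad i > j \\
&p(x) \mid e(x) \iff p(x) \mid x^{i-j} + 1 \iff \operatorname{ord}_p(x) = 2^n - 1 \text{ divides } i - j \\
&\text{codeword} \le 2^n - 1 \text{ bits} \;\Rightarrow\; 0 < i - j \le 2^n - 2 \;\Rightarrow\; \text{all two-bit errors detected} \\
&L_{\text{data}} = (2^n - 1) - n = 2^n - (n + 1), \qquad n = 64:\; 2^{64} - 65 = 18446744073709551551
\end{aligned}
```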
It can be shown pretty quickly whether a polynomial over a finite field is primitive or not.
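One standard way to do this: p is primitive if and only if the multiplicative order of x modulo p is exactly 2^n - 1, which only needs O(log) exponentiations once the prime factors of 2^n - 1 are known (2^64 - 1 = 3 * 5 * 17 * 257 * 641 * 65537 * 6700417). A minimal C sketch, assuming the same representation as the POLY constant in the answer's code (the x^64 term omitted); gf2_mulmod, gf2_xpow and is_primitive64 are hypothetical names, and this is not necessarily the method Koopman used.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOP_BIT UINT64_C(0x8000000000000000)

/* a * b mod p over GF(2); p = x^64 + poly. a and b must already be reduced. */
static uint64_t gf2_mulmod(uint64_t a, uint64_t b, uint64_t poly)
{
    uint64_t prod = 0;
    while (a != 0) {
        if (a & 1)
            prod ^= b;
        a >>= 1;
        b = (b & TOP_BIT) ? (b << 1) ^ poly : b << 1; /* b *= x (mod p) */
    }
    return prod;
}

/* x^e mod p by square-and-multiply: O(log e) multiplications. */
static uint64_t gf2_xpow(uint64_t e, uint64_t poly)
{
    uint64_t result = 1, base = 2; /* base = x */
    while (e != 0) {
        if (e & 1)
            result = gf2_mulmod(result, base, poly);
        base = gf2_mulmod(base, base, poly);
        e >>= 1;
    }
    return result;
}

/* Primitive iff the order of x mod p is exactly 2^64 - 1: x^(2^64-1) == 1,
   and x^((2^64-1)/q) != 1 for every prime factor q of 2^64 - 1. */
static bool is_primitive64(uint64_t poly)
{
    static const uint64_t prime_factors[] = {3, 5, 17, 257, 641, 65537, 6700417};
    const uint64_t order = UINT64_MAX; /* 2^64 - 1 */
    if ((poly & 1) == 0)
        return false; /* p is divisible by x */
    if (gf2_xpow(order, poly) != 1)
        return false; /* order of x does not divide 2^64 - 1 */
    for (size_t i = 0; i < sizeof prime_factors / sizeof prime_factors[0]; i++)
        if (gf2_xpow(order / prime_factors[i], poly) == 1)
            return false; /* order of x is a proper divisor of 2^64 - 1 */
    return true;
}
```

For example, is_primitive64(0xad93d23594c935a9) checks the polynomial used below, while x^64 + 1 (poly = 1) is rejected because x^64 = 1 modulo it.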
Also, it is possible to compute the CRC of a very sparse codeword of n bits in O(log n) time instead of O(n) time. Here is an example in C, demonstrating the case mentioned for the provided CRC:
#include <stdio.h>
#include <stdint.h>
// Jones' 64-bit primitive polynomial (the constant excludes the x^64 term):
// 1 + x^3 + x^5 + x^7 + x^8 + x^10 + x^12 + x^13 + x^16 + x^19 + x^22 + x^23 +
// x^26 + x^28 + x^31 + x^32 + x^34 + x^36 + x^37 + x^41 + x^44 + x^46 + x^47 +
// x^48 + x^49 + x^52 + x^55 + x^56 + x^58 + x^59 + x^61 + x^63 + x^64
#define POLY 0xad93d23594c935a9
#define HIGH 0x8000000000000000 // high bit set
// Return polynomial a times polynomial b modulo p (POLY). a must be non-zero.
static uint64_t multmodp(uint64_t a, uint64_t b) {
uint64_t prod = 0;
for (;;) {
if (a & 1) {
prod ^= b;
if (a == 1)
break;
}
a >>= 1;
b = b & HIGH ? (b << 1) ^ POLY : b << 1;
}
return prod;
}
// x2n_table[n] is x^2^n mod p.
static uint64_t x2n_table[64];
// Initialize x2n_table[].
static void x2n_table_init(void) {
uint64_t p = 2; // first entry is x^2^0 == x^1
x2n_table[0] = p;
for (size_t n = 1; n < 64; n++)
x2n_table[n] = p = multmodp(p, p);
}
// Compute x^n modulo p. This takes O(log n) time.
static uint64_t xtonmodp(uintmax_t n) {
uint64_t x = 1;
int k = 0;
for (;;) {
if (n & 1)
x = multmodp(x2n_table[k], x);
n >>= 1;
if (n == 0)
break;
k++;
}
return x;
}
// Feed n zero bits into the CRC, taking O(log n) time.
static uint64_t crc64zeros(uint64_t crc, uint64_t n) {
return multmodp(xtonmodp(n), crc);
}
// Feed one one bit into the CRC.
static uint64_t crc64one(uint64_t crc) {
return crc & HIGH ? crc << 1 : (crc << 1) ^ POLY;
}
// Return the CRC-64 of one one bit, followed by n zero bits, followed by one
// more one bit.
static uint64_t crc64_one_zeros_one(uint64_t n) {
return crc64one(crc64zeros(crc64one(0), n));
}
int main(void) {
x2n_table_init();
uint64_t n = -2; // code word with 2^64 bits: a 1, 2^64-2 0's, and a 1
printf("%llx\n", (unsigned long long)crc64_one_zeros_one(n)); // prints 0
return 0;
}
That calculation completes in about 7.4 µs on my machine, whereas the bit-at-a-time calculation would take about 560 years on the same machine.
I am trying to read the WebP image header, following the Extended File Format section of the WebP Container Specification.
fun get24bit(data: ByteArray, index: Int): Int {
return ((data[0 + index].toInt()) or (data[1 + index].toInt() shl 8) or (data[2 + index].toInt() shl 16))
}
fun get32bit(data: ByteArray, index: Int): Int {
return get24bit(data, index) or (data[3 + index].toInt() shl 24)
}
// data -> File(fileName).readBytes() for testing purpose
fun webpExtract(data: ByteArray) {
println(String(data.copyOfRange(0, 4)))
println("Size: ${get32bit(data, 4)}")
println(String(data.copyOfRange(8, 12)))
println(String(data.copyOfRange(12, 16)))
// 16, 17, 18, 19 reserved
val width = 1 + get24bit(data, 20)
val height = 1 + get24bit(data, 23)
println("Width: $width, Height: $height")
}
And the outputs are:
RIFF
Size: -52
WEBP
VP8X
Width: 17, Height: 32513
The String outputs are fine, but the Size comes out negative and the Width and Height are wrong: they should be 128 and 128 respectively (for the test image I used).
Is there something wrong in the code? I am not able to figure out what the problem is.
I've also checked the actual C++ implementation on GitHub. My code does the same bit shifting, but the results are not correct. As far as I know, left shifting has nothing to do with signed vs. unsigned, right?
I don't know whether the spec is incomplete or something, but I logged the byte values, found a pattern, and found that the dimensions are at indexes 24-26 and 27-29.
val width = 1 + (get24bit(data, 24))
val height = 1 + (get24bit(data, 27))
This does the trick! Hopefully this note is helpful until the documentation is updated.
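For reference, per the container spec bytes 16-19 are the VP8X chunk size, not reserved bytes; they are followed by one flags byte (20) and three reserved bytes (21-23), which is why the 24-bit "width minus one" and "height minus one" fields start at 24 and 27. The negative Size comes from Kotlin's Byte.toInt() sign-extending bytes of 0x80 and above, so the sign bits of the low byte are ORed into the result. A minimal C sketch of the same reads with unsigned bytes, assuming a buffer holding at least the first 30 bytes of the file (le24, le32 and webp_vp8x_size are hypothetical names):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unsigned little-endian reads: uint8_t never sign-extends,
   unlike Kotlin's Byte.toInt(). */
static uint32_t le24(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}

/* e.g. for the RIFF size at offset 4 */
static uint32_t le32(const uint8_t *p)
{
    return le24(p) | ((uint32_t)p[3] << 24);
}

/* VP8X layout: 0-3 "RIFF", 4-7 size, 8-11 "WEBP", 12-15 "VP8X",
   16-19 chunk size, 20 flags, 21-23 reserved, 24-26 width-1, 27-29 height-1.
   Returns 0 on success, -1 if the data is not a VP8X WebP header. */
static int webp_vp8x_size(const uint8_t *data, size_t len,
                          uint32_t *width, uint32_t *height)
{
    if (len < 30 || memcmp(data, "RIFF", 4) != 0 ||
        memcmp(data + 8, "WEBP", 4) != 0 || memcmp(data + 12, "VP8X", 4) != 0)
        return -1;
    *width = 1 + le24(data + 24);
    *height = 1 + le24(data + 27);
    return 0;
}
```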
The accepted answer only works for certain WebP files (Extended format, VP8X), but there are two other formats (lossy VP8 and lossless VP8L) that don't work with that answer.
The 3 formats have different ways to get the dimensions.
import java.io.File

// Returns (width, height), or null if the file is not a VP8, VP8L or VP8X WebP file
fun getWebPDimensions(imgFile: File): Pair<Int, Int>? {
    // 30 bytes are enough to reach the dimension fields of all three formats
    val data = imgFile.inputStream().use { it.readNBytes(30) }
    if (data.size < 30) return null
    // All formats consist of a file header (12 bytes) and a ChunkHeader (8 bytes)
    // The first four ChunkHeader bytes contain the 4 characters of the format (12 to 15):
    val imageFormat = String(data.copyOfRange(12, 16)) // exclusive range
    val width: Int
    val height: Int
    when (imageFormat) {
        "VP8 " -> { // last character is a space
            // Simple File Format (Lossy)
            // The data is in the VP8 specification and the decoding guide explains how to get the dimensions: https://datatracker.ietf.org/doc/html/rfc6386#section-19.1
            // The format consists of the frame_tag (3 bytes), start code (3 bytes), horizontal_size_code (2 bytes) and vertical_size_code (2 bytes)
            // Each size is 14 bits; the mask removes the top two bits, which hold the scaling factor
            width = get16bit(data, 26) and 0x3FFF
            height = get16bit(data, 28) and 0x3FFF
        }
        "VP8X" -> {
            // Extended File Format, size position specified here: https://developers.google.com/speed/webp/docs/riff_container#extended_file_format
            // The width starts 4 bytes after the ChunkHeader with a size of 3 bytes, the height comes after.
            width = 1 + get24bit(data, 24)
            height = 1 + get24bit(data, 27)
        }
        "VP8L" -> {
            // Simple File Format (Lossless), specification here: https://developers.google.com/speed/webp/docs/webp_lossless_bitstream_specification#3_riff_header
            // The format consists of a signature (1 byte), then 14 bits of width and 14 bits of height
            // The width and height are in consecutive bits
            val firstBytes = get16bit(data, 21)
            width = 1 + (firstBytes and 0x3FFF)
            // The top 2 bits of firstBytes are the lowest 2 bits of (height - 1)
            val lastTwoDigits = (firstBytes and 0xC000) shr 14
            // Extract the remaining 12 bits and shift them left to make room for those two bits
            height = 1 + (((get16bit(data, 23) and 0xFFF) shl 2) or lastTwoDigits)
        }
        else -> return null
    }
    return Pair(width, height)
}
private fun get16bit(data: ByteArray, index: Int): Int {
    // The mask (0xFF) converts the byte from signed (this is how the JVM stores a Byte) to unsigned
    return (data[index].toInt() and 0xFF) or ((data[index + 1].toInt() and 0xFF) shl 8)
}
private fun get24bit(data: ByteArray, index: Int): Int {
    return get16bit(data, index) or ((data[index + 2].toInt() and 0xFF) shl 16)
}
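For VP8L, an alternative to assembling the height from two 16-bit reads is to read the four bytes after the signature as one little-endian 32-bit value; both 14-bit fields then come out with a single shift. A minimal C sketch, assuming at least the first 30 bytes of the file are in memory (webp_vp8l_size is a hypothetical name):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* VP8L: chunk data starts at offset 20 with the signature byte 0x2f,
   followed by a little-endian bit stream: 14 bits width-1, 14 bits height-1,
   1 alpha bit, 3 version bits.
   Returns 0 on success, -1 if the data is not a VP8L WebP header. */
static int webp_vp8l_size(const uint8_t *data, size_t len,
                          uint32_t *width, uint32_t *height)
{
    if (len < 25 || memcmp(data, "RIFF", 4) != 0 ||
        memcmp(data + 8, "WEBP", 4) != 0 || memcmp(data + 12, "VP8L", 4) != 0 ||
        data[20] != 0x2f)
        return -1;
    /* bytes 21..24 as one little-endian 32-bit value */
    uint32_t bits = (uint32_t)data[21] | ((uint32_t)data[22] << 8) |
                    ((uint32_t)data[23] << 16) | ((uint32_t)data[24] << 24);
    *width = 1 + (bits & 0x3FFF);          /* bits 0-13 */
    *height = 1 + ((bits >> 14) & 0x3FFF); /* bits 14-27 */
    return 0;
}
```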
Let's say I have 1024 kB of data, which is buffered 1 kB at a time and transferred in 1024 buffers from a transmitter to a receiver.
The last buffer contains the calculated CRC32 value as its last 4 bytes.
However, because of RAM constraints, the receiver has to calculate the CRC32 buffer by buffer.
I wonder how to combine the per-buffer CRC32 calculations so that the result matches the CRC32 of the whole data.
I looked at how the CRC is calculated and at its distributive (linear) property, but it is not clear to me how to implement this.
So, is there a mathematical expression for adding up the CRC32s calculated over the individual buffers so that the result matches the CRC32 calculated over the total?
Such as:
int CRC32Total = 0;
int CRC32[1024];
for(int i = 0; i < 1024; i++){
CRC32Total = CRC32Total + CRC32[i];
}
You did not give any clue as to which implementation, or even which language, you were looking at when you "looked at CRC calculation". However, every implementation I've seen is designed to compute CRCs piecemeal, exactly as you want.
For the crc32() routine provided in zlib, it is used like this (in C):
crc = crc32(0, NULL, 0); // initialize CRC value
crc = crc32(crc, firstchunk, 1024); // update CRC value with first chunk
crc = crc32(crc, secondchunk, 1024); // update CRC with second chunk
...
crc = crc32(crc, lastchunk, 1024); // complete CRC with the last chunk
Then crc is the CRC of the concatenation of all of the chunks. You do not need a function to combine the CRCs of individual chunks.
If for some other reason you do want a function to combine CRCs, e.g. if you need to split the CRC calculation over multiple CPUs, then zlib provides the crc32_combine() function for that purpose.
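To illustrate both points, here is a minimal C sketch, assuming the standard CRC-32 that zlib computes (reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF). crc32_update follows the same calling convention as zlib's crc32(); crc32_combine_slow is a hypothetical, deliberately simple O(len2) version of what crc32_combine() computes in O(log len2). It works because the CRC register update is linear over GF(2): CRC(A || B) equals CRC(A) run through len2 zero bytes of the unconditioned register, XORed with CRC(B).

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY_REFLECTED 0xEDB88320u

/* Bitwise CRC-32 with zlib's convention: start with 0, feed the chunks
   in order, and the result is the CRC of their concatenation. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *buf, size_t len)
{
    crc = ~crc; /* undo the final XOR of the previous call (or apply the initial one) */
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ CRC32_POLY_REFLECTED : crc >> 1;
    }
    return ~crc;
}

/* CRC(A || B) from crc1 = CRC(A), crc2 = CRC(B) and len2 = length of B:
   shift crc1 through len2 zero bytes of the raw register, then XOR crc2. */
static uint32_t crc32_combine_slow(uint32_t crc1, uint32_t crc2, size_t len2)
{
    for (size_t i = 0; i < 8 * len2; i++)
        crc1 = (crc1 & 1u) ? (crc1 >> 1) ^ CRC32_POLY_REFLECTED : crc1 >> 1;
    return crc1 ^ crc2;
}
```

For example, feeding "1234" and then "56789" through crc32_update gives the same value as feeding "123456789" in one call, namely the well-known CRC-32 check value 0xCBF43926.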
When you start the transfer, reset CrcChecksum to its initial value with OnFirstBlock(). For every block received, call OnBlockReceived() to update the checksum. Note that the blocks must be processed in the correct order. When the final block has been processed, the final CRC is in the CrcChecksum variable.
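// In crc32.c
#include <stddef.h>
#include <stdint.h>
extern uint32_t Crc32Lookup[256]; // 256-entry CRC lookup table, defined elsewhere
uint32_t UpdateCrc(uint32_t crc, const void *data, size_t length) {
    const uint8_t *current = data;
    while (length--)
        crc = (crc >> 8) ^ Crc32Lookup[(crc & 0xFF) ^ *current++];
    return crc;
}
// In your block processing application
#include <stddef.h>
#include <stdint.h>
uint32_t UpdateCrc(uint32_t crc, const void *data, size_t length);
static uint32_t CrcChecksum;
void OnFirstBlock(void) {
    CrcChecksum = 0;
}
void OnBlockReceived(const void *data, size_t length) {
    CrcChecksum = UpdateCrc(CrcChecksum, data, length);
}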
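The Crc32Lookup table used by UpdateCrc is not shown. A minimal sketch of how it could be built, assuming the reflected CRC-32 polynomial 0xEDB88320 (InitCrc32Lookup is a hypothetical name; call it once at startup). Note that with an initial value of 0 and no final XOR, as in OnFirstBlock, the result is not the standard CRC-32; to get the value zlib's crc32() computes, CrcChecksum would start at 0xFFFFFFFF and the final value would be XORed with 0xFFFFFFFF.

```c
#include <stdint.h>

/* 256-entry table for the reflected CRC-32 polynomial 0xEDB88320:
   entry n is the register after shifting byte n through 8 bit steps. */
uint32_t Crc32Lookup[256];

void InitCrc32Lookup(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        Crc32Lookup[n] = c;
    }
}
```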
To complement my comment on your question, I have added code here that goes through the whole process: generating the data as a linear array, appending the CRC32 to the transmitted data, injecting errors, and receiving the data in 'chunks' while computing the CRC32 and detecting errors. You're probably only interested in the 'reception' part, but I think a complete example makes it easier to understand.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
// ---------------------- buildCRC32table ------------------------------
static const uint32_t CRC32_POLY = 0xEDB88320;
static const uint32_t CRC32_XOR_MASK = 0xFFFFFFFF;
static uint32_t CRC32TABLE[256];
void buildCRC32table (void)
{
uint32_t crc32;
for (uint16_t byte = 0; byte < 256; byte++)
{
crc32 = byte;
// iterate thru all 8 bits
for (int i = 0; i < 8; i++)
{
uint8_t feedback = crc32 & 1;
crc32 = (crc32 >> 1);
if (feedback)
{
crc32 ^= CRC32_POLY;
}
}
CRC32TABLE[byte] = crc32;
}
}
// -------------------------- myCRC32 ----------------------------------
uint32_t myCRC32 (uint32_t previousCRC32, uint8_t *pData, int dataLen)
{
uint32_t newCRC32 = previousCRC32 ^ CRC32_XOR_MASK; // remove last XOR mask (or add first)
// add new data to CRC32
while (dataLen--)
{
uint32_t crc32Top24bits = newCRC32 >> 8;
uint8_t crc32Low8bits = newCRC32 & 0x000000FF;
uint8_t data = *pData++;
newCRC32 = crc32Top24bits ^ CRC32TABLE[crc32Low8bits ^ data];
}
newCRC32 ^= CRC32_XOR_MASK; // put XOR mask back
return newCRC32;
}
// ------------------------------ main ---------------------------------
int main()
{
// build CRC32 table
buildCRC32table();
uint32_t crc32;
// use a union so we can access the same data linearly (TX) or by chunks (RX);
// it is static because 1 MiB is too large for the stack on many platforms
static union
{
uint8_t array[1024*1024];
uint8_t chunk[1024][1024];
} data;
// use time to seed randomizer so we have different data every run
srand((unsigned int)time(NULL));
/////////////////////////////////////////////////////////////////////////// Build data to be transmitted
////////////////////////////////////////////////////////////////////////////////////////////////////////
// populate array with random data sparing space for the CRC32 at the end
for (int i = 0; i < (sizeof(data.array) - sizeof(uint32_t)); i++)
{
data.array[i] = (uint8_t) (rand() & 0xFF);
}
// now compute array's CRC32
crc32 = myCRC32(0, data.array, sizeof(data.array) - sizeof(uint32_t));
printf ("array CRC32 = 0x%08X\n", crc32);
// to store the CRC32 into the array, we want to remove the XOR mask so we can compute the CRC32
// of all received data (including the CRC32 itself) and expect the same result all the time,
// regardless of the data, when no errors are present
crc32 ^= CRC32_XOR_MASK;
// load CRC32 at the very end of the array
data.array[sizeof(data.array) - 1] = (uint8_t)((crc32 >> 24) & 0xFF);
data.array[sizeof(data.array) - 2] = (uint8_t)((crc32 >> 16) & 0xFF);
data.array[sizeof(data.array) - 3] = (uint8_t)((crc32 >> 8) & 0xFF);
data.array[sizeof(data.array) - 4] = (uint8_t)((crc32 >> 0) & 0xFF);
/////////////////////////////////////////////// At this point, data is transmitted and errors may happen
////////////////////////////////////////////////////////////////////////////////////////////////////////
// to make things interesting, let's add one bit error with 1/8 probability
if ((rand() % 8) == 0)
{
uint32_t index = rand() % sizeof(data.array);
uint8_t errorBit = 1 << (rand() & 0x7);
// add error
data.array[index] ^= errorBit;
printf("Error injected on byte %u, bit mask = 0x%02X\n", index, errorBit);
}
else
{
printf("No error injected\n");
}
/////////////////////////////////////////////////////// Once received, the data is processed in 'chunks'
////////////////////////////////////////////////////////////////////////////////////////////////////////
// now we access the data and compute its CRC32 one chunk at a time
crc32 = 0; // initialize CRC32
for (int i = 0; i < 1024; i++)
{
crc32 = myCRC32(crc32, data.chunk[i], sizeof data.chunk[i]);
}
printf ("Final CRC32 = 0x%08X\n", crc32);
// because the CRC32 algorithm applies an XOR mask at the end, when we have no errors, the computed
// CRC32 will be the mask itself
if (crc32 == CRC32_XOR_MASK)
{
printf ("No errors detected!\n");
}
else
{
printf ("Errors detected!\n");
}
}
Can anyone share sample code showing how to implement CBCBlockCipherMac in Objective-C? Here is how far I got, and it gives a different result from the Java implementation.
const unsigned char key[16] = "\x1\x2\x3\x4\x5\x6\x7\x8\x9\x0\x1\x2\x3\x4\x5\x6";
const unsigned char data[14] = "\x54\x68\x69\x73\x69\x73\x6d\x79\x73\x74\x72\x69\x6e\x67";
CMAC_CTX *ctx = CMAC_CTX_new();
ret = CMAC_Init(ctx, key, sizeof(key), EVP_des_ede3(), 0);
printf("CMAC_Init = %d\n", ret);
ret = CMAC_Update(ctx, data, sizeof(data));
printf("CMAC_Update = %d\n", ret);
size_t size;
//unsigned int size;
unsigned char tag[4];
ret = CMAC_Final(ctx, tag, &size);
printf("CMAC_Final = %d, size = %u\n", ret, size);
CMAC_CTX_free(ctx);
printf("expected: 391d1520\n"
"got: ");
size_t index;
for (index = 0; index < sizeof(tag) - 1; ++index) {
printf("%02x", tag[index]);
if ((index + 1) % 4 == 0) {
printf(" ");
}
}
printf("%02x\n", tag[sizeof(tag) - 1]);
And my Java code looks like this:
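import java.nio.charset.StandardCharsets;
import org.bouncycastle.crypto.engines.DESedeEngine;
import org.bouncycastle.crypto.macs.CBCBlockCipherMac;
import org.bouncycastle.crypto.params.DESedeParameters;
import org.bouncycastle.util.encoders.Hex;
String data = "Thisismystring";
String keyString = "1234567890123456";
byte[] mac = new byte[4];
// For a 64-bit block cipher such as DESede, CBCBlockCipherMac defaults to a 32-bit (4-byte) MAC with zero padding
CBCBlockCipherMac macCipher = new CBCBlockCipherMac(new DESedeEngine());
DESedeParameters keyParameter = new DESedeParameters(keyString.getBytes(StandardCharsets.US_ASCII));
macCipher.init(keyParameter);
byte[] dataBytes = data.getBytes(StandardCharsets.US_ASCII);
macCipher.update(dataBytes, 0, dataBytes.length);
macCipher.doFinal(mac, 0);
byte[] macBytesEncoded = Hex.encode(mac);
String macString = new String(macBytesEncoded, StandardCharsets.US_ASCII);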
The Java code gives me "391d1520", but the Objective-C code gives me "01000000".
CMAC is not the same as CBC-MAC. CMAC has an additional step at the beginning and the end of the calculation. If possible, I would suggest you upgrade your Java code to use CMAC, e.g. org.bouncycastle.crypto.macs.CMac, as plain CBC-MAC is not as secure (in particular for variable-length messages).
OpenSSL does not seem to implement CBC-MAC directly (at least, I cannot find any reference to it). So if you need it, you need to implement it yourself.
You can use CBC mode encryption with a zero IV and take the last ciphertext block (8 bytes for triple DES, 16 bytes for AES) as the MAC, truncated to the length you need (your Java code keeps 4 bytes). Of course, this means you need to store the rest of the ciphertext in a buffer somewhere, or you need to use the update functions smartly (reusing the same buffer over and over again for the ciphertext).
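A minimal C sketch of CBC-MAC along those lines, assuming zero padding of the last partial block and truncation to 4 bytes, which (as far as I know) matches the defaults of Bouncy Castle's CBCBlockCipherMac with a 64-bit block cipher. The block cipher is passed in as a function pointer so the chaining is shown without tying it to a particular OpenSSL API; cbc_mac, block_encrypt_fn and toy_encrypt are hypothetical names, and toy_encrypt is NOT a cipher, only a placeholder that lets the chaining be checked by hand.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_BLOCK 8 /* block size of DES / triple DES, in bytes */

/* Encrypts one MAC_BLOCK-byte block in place. In real code this would wrap
   a single-block triple-DES (ECB) encryption from your crypto library. */
typedef void (*block_encrypt_fn)(const void *key, uint8_t *block);

/* CBC-MAC: zero IV, last partial block zero-padded, result truncated to
   mac_len bytes (mac_len <= MAC_BLOCK). An empty message MACs one zero block. */
static void cbc_mac(block_encrypt_fn encrypt, const void *key,
                    const uint8_t *msg, size_t len,
                    uint8_t *mac, size_t mac_len)
{
    uint8_t chain[MAC_BLOCK] = {0}; /* zero IV */
    size_t off = 0;
    do {
        size_t n = (len - off < MAC_BLOCK) ? len - off : MAC_BLOCK;
        /* XOR the next block into the chaining value; bytes past the end
           of the message stay unchanged, i.e. they are XORed with zero padding */
        for (size_t i = 0; i < n; i++)
            chain[i] ^= msg[off + i];
        encrypt(key, chain);
        off += n;
    } while (off < len);
    memcpy(mac, chain, mac_len);
}

/* Toy stand-in for a block cipher (adds a key byte to every byte).
   Insecure; it only makes the chaining easy to verify. */
static void toy_encrypt(const void *key, uint8_t *block)
{
    uint8_t k = *(const uint8_t *)key;
    for (int i = 0; i < MAC_BLOCK; i++)
        block[i] = (uint8_t)(block[i] + k);
}
```

In real code, encrypt would wrap a two-key triple-DES ECB encryption of one block (for example OpenSSL's EVP interface with EVP_des_ede_ecb() and padding disabled). Note also that the Java key "1234567890123456".getBytes() consists of the ASCII bytes 0x31 0x32 ..., not the bytes 0x01 0x02 ... used in the Objective-C key array, and that CMAC_Final writes a full cipher block (8 bytes for triple DES) into its output, so a 4-byte tag array is too small.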
I have implemented the following code on an embedded platform that attempts to communicate with an XBee. The embedded platform that executes the code below is not an xbee:
void sendByteOfData(char *payload, int len);
int main()
{
    char payload[12] = {0x61,0x88,0x00,0x64,0x00,0x00,0x00,0x00,0x00,0xEC,0x00,0x00};
    payload[2] = 0x10;
    payload[9] = 0x01;
    char data = 'H'; // Send simple ASCII character to XBee
    payload[11] = data;
    while (1)
        sendByteOfData(payload, 12);
}
void sendByteOfData(char *payload, int len)
{
    int x;
    for (x = 0; x < 4; x++)
    {
        // This function sends IEEE 802.15.4 frames, and I know it
        // works because they are detected in the sniffer.
        send_IEEE_802_15_4_frame(payload, len);
    }
    payload[2] = payload[2] % 256 + 1;
    payload[9] = payload[9] % 256 + 1;
    if (payload[9] % 256 == 0)
        payload[9] = 0x01;
    else
        payload[9] %= 256;
}
To my surprise, the above code actually sent one byte from the embedded platform to the XBee successfully. However, the infinite loop at the end of main() should have produced a stream of bytes.
My suspicion is that I need to set payload[2] and payload[9] correctly, and that there is probably a flaw in the incremental modulo-256 algorithm shown above.
How do I get a continuous stream of bytes?
A few thoughts...
Make your payload an array of unsigned char or, even better, uint8_t.
To update payload[2] and payload[9], simplify your code:
++payload[2];
++payload[9];
if (payload[9] == 0) payload[9] = 1;
Add a delay between sends. You might even need to wait for a response before sending the next character.
Since it's a payload of unsigned 8-bit values, they'll automatically roll from 255 to 0. I assume your special case code for payload[9] is trying to roll from 255 to 1 (instead of 0).
Make sure your payload doesn't need to include a checksum of some sort. Updating those two bytes would have an effect on a checksum byte.