Question regarding IP checksum code - header

unsigned short /* this function generates header checksums */
csum (unsigned short *buf, int nwords)
{
    unsigned long sum;
    for (sum = 0; nwords > 0; nwords--) // add the 16-bit words together
    {
        sum += *buf++;
    }
    sum = (sum >> 16) + (sum & 0xffff); // add carry over
    sum += (sum >> 16); // MY question: what exactly does this step do???
                        // Add a possible left-over byte? But hasn't it
                        // already been added in the loop (if any)?
    return ((unsigned short) ~sum);
}
I assume nwords is the number of 16-bit words, not 8-bit bytes (if there is an odd byte, nwords is rounded up to the next whole word), is that correct? Say the IP header has 27 bytes in total; then nwords will be 14 instead of 13, right?
The line sum = (sum >> 16) + (sum & 0xffff) adds the carry back in to form the 16-bit one's-complement sum.
sum += (sum >> 16); What's the purpose of this step? Adding a left-over byte? But any left-over byte has already been added in the loop?
Thanks!

You're correct. Step 3 condenses sum, a 32-bit long, into a 16-bit unsigned short, which is the width of the checksum. This is done for performance: it lets you accumulate the sum without tracking overflow until the end. The fold happens in both step 2 and step 3 because the addition in step 2 can itself carry into bit 16. The function then returns the inverted lower 16 bits of sum.
This is a bit clearer:
http://www.sysnet.ucsd.edu/~cfleizac/iptcphdr.html
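To make the double fold concrete: suppose the loop leaves sum = 0x2FFFE. The first fold gives (0x2FFFE >> 16) + (0x2FFFE & 0xffff) = 0x2 + 0xFFFE = 0x10000, which has itself carried into bit 16. The second fold adds that carry back in: 0x10000 + 0x1 = 0x10001, whose low 16 bits are 0x0001. Truncating after only one fold would give 0x0000, which is wrong.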

Related

Implementing a side channel timing attack

I'm working on a project implementing a side channel timing attack in C on HMAC. I've done so by computing the hex-encoded tag and brute-forcing it byte by byte, taking advantage of strcmp's early exit on the first mismatching byte. So for every digit in my test tag, I measure how long each hex char takes to verify. I take the hex char with the highest measured time, infer that it is the correct char in the tag, and move on to the next byte. However, strcmp's timing is very unpredictable. Although it is easy to see the timing difference between comparing two equal strings and two totally different strings, I'm having difficulty finding the char that takes the most time to compare when every other string I'm comparing against is very similar (differing by only one byte).
The changeByte method below takes in customTag, which is the tag that has been computed up to that point, and attempts to find the correct byte for position index. changeByte is called n times, where n is the length of the tag. hexTag is a global variable holding the correct tag. timeCompleted stores the average time taken to compare the test tag for each of the 16 hex characters at a given position. Any help would be appreciated; thank you for your time.
// Checks whether the byte at the given index is correct or not
void changeByte(unsigned char *k, unsigned char *m, unsigned char *algorithm, unsigned char *customTag, int index)
{
    long iterations = 50000;
    // used for every byte sequence to test the timing
    unsigned char *tempTag = (unsigned char *) malloc(sizeof(unsigned char) * (strlen(customTag) + 1));
    sprintf(tempTag, "%s", customTag);
    int timeIndex = 0;
    // stores the time taken for every respective ascii char
    double *timeCompleted = (double *) malloc(sizeof(double) * 16);
    // iterates through hex chars 0-9, a-f
    for (int i = 48; i <= 102; i++) {
        if (i >= 58 && i <= 96) continue;
        double total = 0;
        for (long j = 0; j < iterations; j++) {
            // measures the time taken to compare with this char in this position
            tempTag[index] = (unsigned char) i;
            struct rusage usage;
            struct timeval start, end;
            getrusage(RUSAGE_SELF, &usage);
            start = usage.ru_stime;
            for (int k = 0; k < 50000; k++) externalStrcmp(tempTag, hexTag); // this just calls strcmp in another file
            getrusage(RUSAGE_SELF, &usage);
            end = usage.ru_stime;
            double startTime = ((double) start.tv_sec + (double) start.tv_usec) / 10000;
            double endTime = ((double) end.tv_sec + (double) end.tv_usec) / 10000;
            total += endTime - startTime;
        }
        double val = total / iterations;
        timeCompleted[timeIndex] = val;
        timeIndex++;
    }
    // sets the char at this position to the hex char that took longest
    customTag[index] = getCorrectChar(timeCompleted);
    free(timeCompleted);
    free(tempTag);
}
// finds the highest time. The hex char corresponding to the highest time the
// verify function took to complete is the correct one
unsigned char getCorrectChar(double *timeCompleted)
{
    double high = -1;
    int index = 0;
    for (int i = 0; i < 16; i++) {
        if (timeCompleted[i] > high) {
            high = timeCompleted[i];
            index = i;
        }
    }
    return (index + 48) <= 57 ? (unsigned char)(index + 48) : (unsigned char)(index + 87);
}
I'm not sure if it's the main problem, but you add seconds to microseconds directly, as though 1 us == 1 s. That gives wrong results whenever the number of seconds in startTime and endTime differs.
Also, the scaling factor between usec and sec is 1 000 000 (thx zaph). So this should work better:
double startTime=(double)start.tv_sec + (double)start.tv_usec/1000000;
double endTime=(double)end.tv_sec + (double)end.tv_usec/1000000;
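If you want to avoid floating point altogether, the subtraction can also be done in whole microseconds. This helper is a sketch of mine (the name elapsed_usec is made up), not part of the original code:

#include <sys/time.h>

/* Elapsed time between two struct timeval values, in whole
   microseconds, using integer arithmetic only. */
long elapsed_usec(struct timeval start, struct timeval end)
{
    return (end.tv_sec - start.tv_sec) * 1000000L
         + (end.tv_usec - start.tv_usec);
}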

Sum of Hamming Distances

I started preparing for an interview and came across this problem:
An array of integers is given
Now calculate the sum of Hamming distances of all pairs of integers in the array in their binary representation.
Example:
given {1,2,3} or {001,010,011} (used 3 bits just to simplify)
result= HD(001,010)+HD(001,011)+HD(010,011)= 2+1+1=4;
The only optimization over a purely brute-force solution that I know I can use here is in the calculation of the individual Hamming distances, as seen here:
int hamming_distance(unsigned x, unsigned y)
{
    int dist = 0;
    unsigned val = x ^ y; // XOR: differing bits are set
    // Count the number of bits set
    while (val != 0)
    {
        // A bit is set, so increment the count and clear the lowest set bit
        dist++;
        val &= val - 1;
    }
    // Return the number of differing bits
    return dist;
}
What's the best way to go about solving this problem?
Here is my C++ implementation, with O(n) time (the 32-iteration inner loop is only a constant factor) and O(1) extra space.
int sumOfHammingDistance(vector<unsigned>& nums) {
    int n = sizeof(unsigned) * 8;
    int len = nums.size();
    vector<int> countOfOnes(n, 0);
    for (int i = 0; i < len; i++) {
        for (int j = 0; j < n; j++) {
            countOfOnes[j] += (nums[i] >> j) & 1;
        }
    }
    int sum = 0;
    for (int count : countOfOnes) {
        sum += count * (len - count);
    }
    return sum;
}
You can consider the bit-positions separately. That gives you 32 (or some other number) of easier problems, where you still have to calculate the sum of all pairs of hamming distances, except now it's over 1-bit numbers.
The hamming distance between two 1-bit numbers is their XOR.
And now it has become the easiest case of this problem - it's already split per bit.
So to reiterate the answer to that question: for each bit position, count the number of 0s and the number of 1s, and multiply those counts to get that position's contribution. Sum the contributions over all bit positions. It's even simpler than the linked problem, because here the weight of every bit's contribution is 1.
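To check this against the example above: for {1, 2, 3} = {001, 010, 011}, bit 0 has two 1s and one 0 (contribution 2 * 1 = 2), bit 1 has two 1s and one 0 (contribution 2), and bit 2 has no 1s (contribution 0), for a total of 4, matching HD(001,010) + HD(001,011) + HD(010,011) = 2 + 1 + 1.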

Efficient algorithm to convert(sum) 128-bit data in q-register to 16-bit data

I have 128-bit data in a q-register. I want to sum the individual 16-bit blocks in this q-register to end up with a single 16-bit sum (any carry beyond 16 bits should be taken and added to the LSB of this 16-bit number).
What I want to achieve is:
VADD.U16 (some 16-bit variable) {q0[0] q0[1] q0[2] ......... q0[7]}
but using intrinsics. I would appreciate it if someone could give me an algorithm for this.
I tried using pairwise addition, but I'm ending up with a rather clumsy solution. Here's how it looks:
int convert128to16(uint16x8_t data128){
    uint16_t data16 = 0;
    uint16x4_t ddata;
    print16(data128);
    uint32x4_t data = vpaddlq_u16(data128); // pairwise add into 32-bit lanes
    print32(data);
    // reinterpret needed: vget_high_u16 expects a uint16x8_t
    uint16x4_t data_hi = vget_high_u16(vreinterpretq_u16_u32(data));
    print16x4(data_hi);
    uint16x4_t data_low = vget_low_u16(vreinterpretq_u16_u32(data));
    print16x4(data_low);
    ddata = vpadd_u16(data_hi, data_low);
    print16x4(ddata);
    return data16; // placeholder: reducing ddata to one lane is still missing
}
It's still incomplete and a bit clumsy. Any help would be much appreciated.
You can use the horizontal add instructions:
Here is a fragment:
uint16x8_t input = /* load your data128 here */;
uint64x2_t temp = vpaddlq_u32 (vpaddlq_u16 (input));
uint64x1_t result = vadd_u64 (vget_high_u64 (temp),
                              vget_low_u64 (temp));
// result now contains the sum of all 16-bit unsigned words
// stored in data128.
// To add the values that overflow from 16 bits, just do another 16-bit
// horizontal addition and return the lowest 16 bits as the final result:
uint16x4_t w = vpadd_u16 (
    vreinterpret_u16_u64 (result),
    vreinterpret_u16_u64 (result));
uint16_t wrappedResult = vget_lane_u16 (w, 0);
If your goal is to sum the 16-bit chunks (modulo 2^16), the following fragment would do (this treats data128 as a single 128-bit integer, so it is pseudocode rather than NEON intrinsics):
uint16_t convert128to16(uint16x8_t data128){
    data128 += (data128 >> 64);
    data128 += (data128 >> 32);
    data128 += (data128 >> 16);
    return data128 & 0xffff;
}
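For reference, here is the first fragment packaged as a compilable function. This is a sketch of mine (the name sum128to16 is made up), assuming an ARM target with arm_neon.h available; like the fragment above, it folds the overflowing bits back in only once:

#include <arm_neon.h>
#include <stdint.h>

/* Sum all eight 16-bit lanes of data128, then fold the overflow back
   into the low 16 bits (end-around carry, applied once). */
uint16_t sum128to16(uint16x8_t data128)
{
    uint64x2_t temp = vpaddlq_u32(vpaddlq_u16(data128)); // widen and pair-add
    uint64x1_t total = vadd_u64(vget_high_u64(temp), vget_low_u64(temp));
    // total now holds the full sum (at most 19 bits for 8 lanes)
    uint16x4_t lanes = vreinterpret_u16_u64(total);
    uint16x4_t folded = vpadd_u16(lanes, lanes); // lane 0 = low 16 bits + carry bits
    return vget_lane_u16(folded, 0);
}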

Non repeating random numbers in Objective-C

I'm using
for (int i = 1; i < 100; i++)
    int value = arc4random() % [array count];
but I'm getting repeats every time. How can I draw int values from the range so that the loop never produces a duplicate?
It sounds like you want shuffling of a set rather than "true" randomness. Simply create an array where all the positions match the numbers and initialize a counter:
num[ 0] = 0
num[ 1] = 1
: :
num[99] = 99
numNums = 100
Then, whenever you want a random number, use the following method:
idx = rnd (numNums);       // return value 0 through numNums-1
val = num[idx];            // get the number at that position
num[idx] = num[numNums-1]; // remove it from the pool by overwriting it with the highest
numNums--;                 // and remove the highest position from the pool
return val;                // give it back to the caller
This will return a random value from an ever-decreasing pool, guaranteeing no repeats. You will have to beware of the pool running down to zero size of course, and intelligently re-initialize the pool.
This is a more deterministic solution than keeping a list of used numbers and looping until you find one not in that list; the performance of that sort of algorithm degrades as the pool shrinks.
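For a sense of scale (my arithmetic, not from the original answer): with a pool of 100, rejection sampling needs about 100/k attempts on average when k unused numbers remain, so the last draw takes roughly 100 attempts, and the expected total over all 100 draws is about 100 * H(100), around 519 attempts, versus exactly 100 draws for the shuffling approach.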
A C function using static values, something like the one below, should do the trick. Call it with
int i = myRandom (200);
to set the pool up (any number zero or greater specifies the size), or
int i = myRandom (-1);
to get the next number from the pool (any negative number will suffice). If the function can't allocate enough memory, it returns -2. If there are no numbers left in the pool, it returns -1 (at which point you could re-initialize the pool if you wish). Here's the function, with a unit-testing main for you to try out:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>   /* for time(), used to seed rand() */

#define ERR_NO_NUM -1
#define ERR_NO_MEM -2

int myRandom (int size) {
    int i, n;
    static int numNums = 0;
    static int *numArr = NULL;

    // Initialize with a specific size.
    if (size >= 0) {
        if (numArr != NULL)
            free (numArr);
        if ((numArr = malloc (sizeof(int) * size)) == NULL)
            return ERR_NO_MEM;
        for (i = 0; i < size; i++)
            numArr[i] = i;
        numNums = size;
    }

    // Error if no numbers left in pool.
    if (numNums == 0)
        return ERR_NO_NUM;

    // Get a random number from the pool and remove it (rand() % numNums
    // gives a number between 0 and numNums-1 inclusive).
    n = rand() % numNums;
    i = numArr[n];
    numArr[n] = numArr[numNums-1];
    numNums--;
    if (numNums == 0) {
        free (numArr);
        numArr = NULL;
    }

    return i;
}

int main (void) {
    int i;

    srand (time (NULL));
    i = myRandom (20);
    while (i >= 0) {
        printf ("Number = %3d\n", i);
        i = myRandom (-1);
    }
    printf ("Final  = %3d\n", i);

    return 0;
}
And here's the output from one run:
Number = 19
Number = 10
Number = 2
Number = 15
Number = 0
Number = 6
Number = 1
Number = 3
Number = 17
Number = 14
Number = 12
Number = 18
Number = 4
Number = 9
Number = 7
Number = 8
Number = 16
Number = 5
Number = 11
Number = 13
Final = -1
Keep in mind that, because it uses statics, it's not safe for calling from two different places if they want to maintain their own separate pools. If that were the case, the statics would be replaced with a buffer (holding count and pool) that would "belong" to the caller (a double-pointer could be passed in for this purpose).
And, if you're looking for the "multiple pool" version, I include it here for completeness.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>   /* for time(), used to seed rand() */

#define ERR_NO_NUM -1
#define ERR_NO_MEM -2

int myRandom (int size, int *ppPool[]) {
    int i, n;

    // Initialize with a specific size.
    if (size >= 0) {
        if (*ppPool != NULL)
            free (*ppPool);
        if ((*ppPool = malloc (sizeof(int) * (size + 1))) == NULL)
            return ERR_NO_MEM;
        (*ppPool)[0] = size;   // element 0 holds the pool count
        for (i = 0; i < size; i++) {
            (*ppPool)[i+1] = i;
        }
    }

    // Error if no numbers left in pool.
    if (*ppPool == NULL || (*ppPool)[0] == 0)
        return ERR_NO_NUM;

    // Get a random number from the pool and remove it (rand() % count
    // gives a number between 0 and count-1 inclusive).
    n = rand() % (*ppPool)[0];
    i = (*ppPool)[n+1];
    (*ppPool)[n+1] = (*ppPool)[(*ppPool)[0]];
    (*ppPool)[0]--;
    if ((*ppPool)[0] == 0) {
        free (*ppPool);
        *ppPool = NULL;
    }

    return i;
}

int main (void) {
    int i;
    int *pPool;

    srand (time (NULL));
    pPool = NULL;
    i = myRandom (20, &pPool);
    while (i >= 0) {
        printf ("Number = %3d\n", i);
        i = myRandom (-1, &pPool);
    }
    printf ("Final  = %3d\n", i);

    return 0;
}
As you can see from the modified main(), you first initialise an int pointer to NULL and then pass its address to the myRandom() function. This allows each client (each location in the code) to have its own pool, which is automatically allocated and freed, although you could still share pools if you wish.
You could use Format-Preserving Encryption to encrypt a counter. Your counter just goes from 0 upwards, and the encryption uses a key of your choice to turn it into a seemingly random value of whatever radix and width you want.
Block ciphers normally have a fixed block size of e.g. 64 or 128 bits. But Format-Preserving Encryption allows you to take a standard cipher like AES and make a smaller-width cipher, of whatever radix and width you want (e.g. radix 2, width 16), with an algorithm which is still cryptographically robust.
It is guaranteed to never have collisions (because cryptographic algorithms create a 1:1 mapping). It is also reversible (a 2-way mapping), so you can take the resulting number and get back to the counter value you started with.
AES-FFX is one proposed standard method to achieve this. I've experimented with some basic Python code which is based on the AES-FFX idea, although not fully conformant--see Python code here. It can e.g. encrypt a counter to a random-looking 7-digit decimal number, or a 16-bit number.
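To illustrate just the 1:1-mapping idea, here is a toy sketch of mine. It is not AES-FFX and is not cryptographically strong (the mixing function is arbitrary), but a Feistel network over a 16-bit block is a bijection by construction, so encrypting the counters 0, 1, 2, ... yields a repeat-free sequence of 16-bit values:

#include <stdint.h>

/* Toy 4-round Feistel cipher on a 16-bit block. Bijective by
   construction, so feeding it 0, 1, 2, ... permutes 0..65535. */
static uint8_t round_fn(uint8_t half, uint8_t key)
{
    uint8_t x = (uint8_t)(half ^ key);
    x = (uint8_t)(x * 167u + 13u);          /* arbitrary mixing */
    return (uint8_t)((x << 3) | (x >> 5));  /* rotate left by 3 */
}

uint16_t feistel16(uint16_t counter, const uint8_t key[4])
{
    uint8_t left  = (uint8_t)(counter >> 8);
    uint8_t right = (uint8_t)(counter & 0xFF);
    for (int r = 0; r < 4; r++) {           /* swap halves each round */
        uint8_t tmp = right;
        right = (uint8_t)(left ^ round_fn(right, key[r]));
        left  = tmp;
    }
    return (uint16_t)(((uint16_t)left << 8) | right);
}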
You need to keep track of the numbers you have already used (for instance, in an array). Get a random number, and discard it if it has already been used.
Without relying on external stochastic processes, like radioactive decay or user input, computers will always generate pseudorandom numbers, that is, numbers which have many of the statistical properties of random numbers but eventually repeat in sequences.
This explains the suggestions to randomise the computer's output by shuffling.
Discarding previously used numbers may lengthen the sequence artificially, but at a cost to the statistics that give the impression of randomness.
The best way to do this is to create an array of the numbers already used. After a random number has been generated, add it to the array. Then, when you generate another random number, check that it is not in the array of used numbers.
In addition to using a secondary array to store the already-generated random numbers, invoking the seeding function before every call to the random number generator might help to produce a different sequence of random numbers in every run.

Functions to compress and uncompress array of integers

I was recently asked to complete a task for a C++ role; however, as my application was not progressed any further, I thought I would post it here for some feedback / advice / improvements / a reminder of concepts I've forgotten.
The task was:
The following data is a time series of integer values
int timeseries[32] = {67497, 67376, 67173, 67235, 67057, 67031, 66951,
66974, 67042, 67025, 66897, 67077, 67082, 67033, 67019, 67149, 67044,
67012, 67220, 67239, 66893, 66984, 66866, 66693, 66770, 66722, 66620,
66579, 66596, 66713, 66852, 66715};
The series might be, for example, the closing price of a stock each day
over a 32 day period.
As stored above, the data will occupy 32 x sizeof(int) bytes = 128 bytes
assuming 4 byte ints.
Using delta encoding, write a function to compress, and a function to
uncompress, data like the above.
Ok, so before this point I had never looked at compression, so my solution is far from perfect. The way I approached the problem is to compress the array of integers into an array of bytes. When representing an integer as bytes, I calculate its most significant byte (msb) and keep everything up to that point, throwing the rest away. This is then added to the byte array. For negative values I increment the msb by 1 so that we can differentiate between positive and negative values when decoding, by keeping the leading 1 bits.
When decoding, I parse this jagged byte array and simply reverse the actions performed when compressing. As mentioned, I had never looked at compression before this task, so I came up with my own method to compress the data. I had been looking at C++/CLI recently, having not really used it before, so I decided to write it in that language, for no particular reason. Below is the class, with a unit test at the very bottom. Any advice / improvements / enhancements would be much appreciated.
Thanks.
array<array<Byte>^>^ CDeltaEncoding::CompressArray(array<int>^ data)
{
    int temp = 0;
    int original;
    int size = 0;
    array<int>^ tempData = gcnew array<int>(data->Length);
    data->CopyTo(tempData, 0);
    array<array<Byte>^>^ byteArray = gcnew array<array<Byte>^>(tempData->Length);
    for (int i = 0; i < tempData->Length; ++i)
    {
        original = tempData[i];
        tempData[i] -= temp; // store the delta, not the value
        temp = original;
        int msb = GetMostSignificantByte(tempData[i]);
        byteArray[i] = gcnew array<Byte>(msb);
        System::Buffer::BlockCopy(BitConverter::GetBytes(tempData[i]), 0, byteArray[i], 0, msb);
        size += byteArray[i]->Length;
    }
    return byteArray;
}

array<int>^ CDeltaEncoding::DecompressArray(array<array<Byte>^>^ buffer)
{
    System::Collections::Generic::List<int>^ decodedArray = gcnew System::Collections::Generic::List<int>();
    int temp = 0;
    for (int i = 0; i < buffer->Length; ++i)
    {
        int retrievedVal = GetValueAsInteger(buffer[i]);
        decodedArray->Add(retrievedVal);
        decodedArray[i] += temp; // add the running total back onto the delta
        temp = decodedArray[i];
    }
    return decodedArray->ToArray();
}

int CDeltaEncoding::GetMostSignificantByte(int value)
{
    array<Byte>^ tempBuf = BitConverter::GetBytes(Math::Abs(value));
    int msb = tempBuf->Length;
    for (int i = tempBuf->Length - 1; i >= 0; --i)
    {
        if (tempBuf[i] != 0)
        {
            msb = i + 1;
            break;
        }
    }
    if (!IsPositiveInteger(value))
    {
        // We need an extra byte to differentiate the negative integers
        msb++;
    }
    return msb;
}

bool CDeltaEncoding::IsPositiveInteger(int value)
{
    return value >= 0; // zero counts as positive: no extra sign byte needed
}

int CDeltaEncoding::GetValueAsInteger(array<Byte>^ buffer)
{
    array<Byte>^ tempBuf;
    if (buffer->Length % 2 == 0)
    {
        // With even lengths there is no need to allocate a new byte array
        tempBuf = buffer;
    }
    else
    {
        tempBuf = gcnew array<Byte>(4);
        System::Buffer::BlockCopy(buffer, 0, tempBuf, 0, buffer->Length);
        unsigned int val = buffer[buffer->Length - 1] &= 0xFF;
        if (val == 0xFF)
        {
            // We have a negative integer compressed into 3 bytes.
            // Copy over this last byte as well so we keep the negative pattern.
            System::Buffer::BlockCopy(buffer, buffer->Length - 1, tempBuf, buffer->Length, 1);
        }
    }
    switch (tempBuf->Length)
    {
    case sizeof(short):
        return BitConverter::ToInt16(tempBuf, 0);
    case sizeof(int):
    default:
        return BitConverter::ToInt32(tempBuf, 0);
    }
}
And then in a test class I had:
void CTestDeltaEncoding::TestCompression()
{
    array<array<Byte>^>^ byteArray = CDeltaEncoding::CompressArray(m_testdata);
    array<int>^ decompressedArray = CDeltaEncoding::DecompressArray(byteArray);
    int totalBytes = 0;
    for (int i = 0; i < byteArray->Length; i++)
    {
        totalBytes += byteArray[i]->Length;
    }
    Assert::IsTrue(m_testdata->Length * sizeof(int) > totalBytes, "Expected the total bytes to be less than the original array!!");
    // Expected totalBytes = 53
}
This smells a lot like homework to me. The crucial phrase is: "Using delta encoding."
Delta encoding means you encode the delta (difference) between each number and the next:
67497, 67376, 67173, 67235, 67057, 67031, 66951, 66974, 67042, 67025, 66897, 67077, 67082, 67033, 67019, 67149, 67044, 67012, 67220, 67239, 66893, 66984, 66866, 66693, 66770, 66722, 66620, 66579, 66596, 66713, 66852, 66715
would turn into:
[Base: 67497]: -121, -203, +62
and so on. Assuming 8-bit bytes, the original numbers require 3 bytes apiece (and given the number of compilers with 3-byte integer types, you're normally going to end up with 4 bytes apiece). From the looks of things, the differences will fit quite easily in 2 bytes apiece, and if you can ignore one (or possibly two) of the least significant bits, you can fit them in one byte apiece.
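As a concrete illustration of plain delta encoding, here is a sketch of mine in C (rather than the C++/CLI of the question): every difference in the series above fits in an int16_t, so the 32 values shrink from 128 bytes to 4 + 31 * 2 = 66 bytes.

#include <stdint.h>

/* Store the first value in full, then each later value as a 16-bit
   difference from its predecessor. */
void delta_encode(const int *in, int n, int *base, int16_t *deltas)
{
    *base = in[0];
    for (int i = 1; i < n; i++)
        deltas[i - 1] = (int16_t)(in[i] - in[i - 1]);
}

void delta_decode(int base, const int16_t *deltas, int n, int *out)
{
    out[0] = base;
    for (int i = 1; i < n; i++)
        out[i] = out[i - 1] + deltas[i - 1];
}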
Delta encoding is most often used for things like sound encoding where you can "fudge" the accuracy at times without major problems. For example, if you have a change from one sample to the next that's larger than you've left space to encode, you can encode a maximum change in the current difference, and add the difference to the next delta (and if you don't mind some back-tracking, you can distribute some to the previous delta as well). This will act as a low-pass filter, limiting the gradient between samples.
For example, in the series you gave, a simple delta encoding requires ten bits to represent all the differences. By dropping the LSB, however, nearly all the samples (all but one, in fact) can be encoded in 8 bits. That one has a difference (right shifted one bit) of -173, so if we represent it as -128, we have 45 left. We can distribute that error evenly between the preceding and following sample. In that case, the output won't be an exact match for the input, but if we're talking about something like sound, the difference probably won't be particularly obvious.
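A sketch (again mine, with a made-up function name) of the lossy variant just described: by computing each delta against the value the decoder will actually reconstruct, any remainder clamped off one delta automatically carries into the next.

#include <stdint.h>

/* Clamp each delta to the int8_t range; the truncation error flows
   into the next delta because deltas are taken against the
   reconstructed value rather than the original. */
void delta_encode_clamped(const int *in, int n, int *base, int8_t *deltas)
{
    *base = in[0];
    int recon = in[0]; // what the decoder will have reconstructed so far
    for (int i = 1; i < n; i++) {
        int d = in[i] - recon;
        if (d > 127)  d = 127;
        if (d < -128) d = -128;
        deltas[i - 1] = (int8_t)d;
        recon += d;
    }
}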
I did mention that it was an exercise I had to complete, and the solution I submitted was deemed not good enough, so I wanted some constructive feedback, seeing as actual companies never tell you what you did wrong.
When the array is compressed I store the differences, not the original values (except for the first), as this was my understanding of the task. If you look at my code, I have provided a full solution; my question was: how bad is it?