Here are two similar constraint blocks, one written using decimal notation, and the other using hexadecimal notation. The first works as expected, but the second only generates positive values (including 0) out of the 5 available values:
-- positive and negative values generated as expected
var rnd_byte : int(bits: 8);
for i from 0 to 9 {
    gen rnd_byte keeping {
        soft it == select {
            90 : [-1, -128, 127, 1];
            10 : 0x00;
        };
    };
    print rnd_byte;
};
-- only positive values (including 0) generated!!!
var rnd_byte : int(bits: 8);
for i from 0 to 9 {
    gen rnd_byte keeping {
        soft it == select {
            90 : [0xFF, 0x80, 0x7F, 0x01];
            10 : 0x00;
        };
    };
    print rnd_byte;
};
How can I make the second example behave like the first one while keeping the hexadecimal notation? I don't want to write large decimal numbers.
Some more about this issue: in procedural code there is auto-casting, so you can write
var rnd_byte : int(bits: 8);
rnd_byte = 0xff;
and it will result in rnd_byte == -1.
Constraints, however, respect the int(bits:8) range of the field (the literal 0xff is read as 255, which is out of range), so this code would fail:
var rnd_byte : int(bits: 8);
gen rnd_byte keeping {it == 0xff};
As suggested, to get 0xff, define the field as unsigned.
0xff and 0x80 are not in the range of the rnd_byte data type. You need to declare rnd_byte as uint(bits:8).
Alternatively, try to typecast the literals (I could not verify the syntax):
(0xff).as_a(int(bits:8))
In procedural code, automatic casting between numeric types takes care of the vast majority of cases. In generation, however, numbers are viewed by their natural values, as in int(bits:*) semantics, and hex notation means the value is unsigned.
Imagine this piece of code:
void Function(int16_t *src, int *indices, float *dst, int cnt, float mul)
{
    for (int i = 0; i < cnt; i++) dst[i] = float(src[indices[i]]) * mul;
}
This really asks for gather intrinsics, e.g. _mm_i32gather_epi32. I had great success with these when loading floats, but are there any for 16-bit ints? Another problem here is that I need to go from 16 bits on the input to 32 bits (float) on the output.
There is indeed no instruction to gather 16bit integers, but (assuming there is no risk of memory-access violation) you can just load 32bit integers starting at the corresponding addresses, and mask out the upper halves of each value.
For uint16_t this would be a simple bit-and, for signed integers you can shift the values to the left in order to have the sign bit at the most-significant position. You can then (arithmetically) shift back the values before converting them to float, or, since you multiply them anyway, just scale the multiplication factor accordingly.
Alternatively, you could load from two bytes earlier and arithmetically shift to the right. Either way, your bottleneck will likely be the load ports (vpgatherdd requires 8 load uops; together with the load for the indices you have 9 loads distributed over two ports, which should result in 4.5 cycles for 8 elements).
Untested possible AVX2 implementation (it does not handle the last elements; if cnt is not a multiple of 8, just execute your original loop at the end):
#include <immintrin.h>
#include <cstdint>
#include <cstddef>

void Function(int16_t const *src, int const *indices, float *dst, size_t cnt, float mul_)
{
    __m256 mul = _mm256_set1_ps(mul_ * float(1.0f / 0x10000));
    for (size_t i = 0; i + 8 <= cnt; i += 8) { // todo: handle last elements
        // load indices:
        __m256i idx = _mm256_loadu_si256(reinterpret_cast<__m256i const*>(indices + i));
        // load 16-bit integers in the lower halves + garbage in the upper halves:
        __m256i values = _mm256_i32gather_epi32(reinterpret_cast<int const*>(src), idx, 2);
        // shift each value to the upper half (removes garbage, puts the sign bit at the right place)
        // values are now too large by a factor of 0x10000
        values = _mm256_slli_epi32(values, 16);
        // convert to float, scale and multiply:
        __m256 fvalues = _mm256_mul_ps(_mm256_cvtepi32_ps(values), mul);
        // store result at the corresponding output position
        _mm256_storeu_ps(dst + i, fvalues);
    }
}
Porting this to AVX-512 should be straightforward.
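One way to handle the leftover elements, as suggested above, is to simply run the original scalar loop for the tail. A minimal sketch (it belongs at the end of Function and reuses its parameters):

    // handle the remaining cnt % 8 elements with the original scalar loop
    for (size_t i = cnt - cnt % 8; i < cnt; ++i)
        dst[i] = float(src[indices[i]]) * mul_;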
I have the following sequence:
extend CONFIG_ADC_CLK ocp_master_sequence_q {
    divide_by : uint(bits:4);
    align_by : uint(bits:4);

    body() @driver.clock is {
        var div : uint(bits:3);
        case divide_by {
            1  : { div = 0; };
            2  : { div = 1; };
            4  : { div = 2; };
            8  : { div = 3; };
            16 : { div = 4; };
            default : { dut_error(divide_by, " is not a legal Clock division for ADC"); };
        };
        gad_regs.gad_clk_gen.clk_algn = align_by;
        gad_regs.gad_clk_gen.clk_dev = div;
        do WR_REG seq keeping {.reg == gad_regs.gad_clk_gen;};
    };
}; // extend CONFIG_ADC_CLK ocp_master_sequence_q
In the test I use the sequence:
do CONFIG_ADC_CLK seq keeping {.divide_by == 3; .align_by == 0;};
For some reason the compiler refers to the value of the field divide_by as a hex number instead of decimal.
How can I ensure that it is treated as decimal?
This is not related to sequences and not related to how numbers are assigned to fields. It's just about how numeric values are formatted in printing and string operations. The actual value of a field has nothing to do with how it is printed.
By default, dut_error(), message(), out(), append() and other string formatting routines use the current setting of config print -radix. So, you probably have it set to HEX in your environment.
If you need this specific dut_error() to always use decimal format, no matter what the config setting is, you can use dec(), like this:
dut_error(dec(divide_by)," is not a legal Clock division for ADC");
By the way, when using the formatted variants of those routines, such as dut_errorf() or appendf(), you can determine the radix by providing the right % parameter, e.g. %d for decimal or %x for hexadecimal. For example, the above dut_error() might be rewritten as:
dut_errorf("%d is not a legal Clock division for ADC", divide_by);
Here, you can also use %s, in which case the config radix setting is still used.
I have the following code:
#include <stdio.h>

int main(void) {
    char list[3][7] = { "One", "Two", "Three" };
    char item[7]; // originally I had posted "char item[3];" by mistake
    int i;

    for (i = 0; i < 2; i++) {
        sprintf(item, "%-7s", list[i]);
        printf("%d %s", i, item);
    }
    printf("\n\r");

    for (i = 0; i < 2; i++) {
        sprintf(item, "%-7s", list[i]);
        printf("%d %s", i, item);
    }
    printf("\n\r");

    return 0;
}
I expect the following output
0 One 1 Two
0 One 1 Two
However, instead I get:
0 One 1 Two
0 1 Two
Note the missing text "One" the second time it prints.
Can someone explain what's happening here?
Thanks!
With item declared as:
char item[7];
the code exposes undefined behaviour because sprintf(item, "%-7s", ...) attempts to write at least 8 characters (7 padded characters plus the terminating null) into item.
The documentation of sprintf() explains (the emphasis is mine):
Writes the results to a character string buffer. The behavior is undefined if the string to be written (plus the terminating null character) exceeds the size of the array pointed to by buffer.
-7 in the format string "%-7s" is interpreted as:
(optional) integer value or * that specifies minimum field width. The result is padded with space characters (by default), if required, on the left when right-justified, or on the right if left-justified. In the case when * is used, the width is specified by an additional argument of type int. If the value of the argument is negative, it results with the - flag specified and positive field width. (Note: This is the minimum width: The value is never truncated.)
In order to avoid the undefined behaviour, the size of item must be at least 8. Keep in mind, though, that if the string to format is longer than 7 characters it is not truncated: the result becomes longer than 8 characters and overflows item again.
Why do you get the output you get?
The calls to sprintf(item, "%-7s", list[i]); in the first loop write 8 characters into a buffer of 7 characters. The extra character (which is \0) happens to overwrite the first character of list[0], changing it into an empty string. This is just one possible behaviour; compiling the code with a different compiler or with different compiler options could produce a different result.
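A minimal fix along these lines (my own sketch, not part of the original post): size the buffer for the 7 padded characters plus the terminating null, and use snprintf() so a longer string can never overflow it:

#include <stdio.h>

int main(void) {
    char list[3][7] = { "One", "Two", "Three" };
    char item[8];                                     /* 7 padded chars + '\0' */
    for (int i = 0; i < 2; i++) {
        snprintf(item, sizeof item, "%-7s", list[i]); /* truncates instead of overflowing */
        printf("%d %s", i, item);
    }
    printf("\n");
    return 0;
}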
When you do the sprintf(item, "%-7s", list[i]); you are, essentially, copying the string from list[i] into your char array item.
So list[2] -> "three" is 5 chars plus the nul terminator, but item is only 3 chars long -- you are overflowing item and writing over some other memory, which could very well be part of list.
Change item to be char item[7] so it matches the length of 7 declared in your 2nd dimension in list[3][7]. When I did that I got your expected output.
(I used https://repl.it/languages/C to test)
I have 128-bit data in a q-register. I want to sum the individual 16-bit blocks in this q-register to finally have a 16-bit sum (any carry beyond 16 bits should be taken and added to the LSB of this 16-bit number).
What I want to achieve is:
VADD.U16 (some 16-bit variable) {q0[0] q0[1] q0[2] ......... q0[7]}
but using intrinsics.
I would appreciate it if someone could give me an algorithm for this.
I tried using pairwise addition, but I'm ending up with a rather clumsy solution.
Here's how it looks:
int convert128to16(uint16x8_t data128){
    uint16_t data16 = 0;
    uint16x4_t ddata;
    print16(data128);
    uint32x4_t data = vpaddlq_u16(data128);
    print32(data);
    uint16x4_t data_hi = vget_high_u16(data);
    print16x4(data_hi);
    uint16x4_t data_low = vget_low_u16(data);
    print16x4(data_low);
    ddata = vpadd_u16(data_hi, data_low);
    print16x4(ddata);
}
It's still incomplete and a bit clumsy.. Any help would be much appreciated.
You can use the horizontal add instructions:
Here is a fragment:
#include <arm_neon.h>

uint16x8_t input = /* load your data128 here */;
uint64x2_t temp = vpaddlq_u32(vpaddlq_u16(input));
uint64x1_t result = vadd_u64(vget_high_u64(temp),
                             vget_low_u64(temp));
// result now contains the sum of all 16 bit unsigned words
// stored in data128.
// to add the values that overflow from 16 bit, just do another 16 bit
// horizontal addition and return the lowest 16 bit as the final result:
uint16x4_t w = vpadd_u16(
    vreinterpret_u16_u64(result),
    vreinterpret_u16_u64(result));
uint16_t wrappedResult = vget_lane_u16(w, 0);
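Put together as a complete function matching the question's convert128to16() (my own consolidation of the fragment above):

#include <arm_neon.h>
#include <stdint.h>

// sum all eight 16-bit lanes; the carry above 16 bits is folded back into the low 16 bits
uint16_t convert128to16(uint16x8_t data128)
{
    uint64x2_t temp = vpaddlq_u32(vpaddlq_u16(data128));
    uint64x1_t sum  = vadd_u64(vget_high_u64(temp), vget_low_u64(temp));
    uint16x4_t w    = vpadd_u16(vreinterpret_u16_u64(sum),
                                vreinterpret_u16_u64(sum));
    return vget_lane_u16(w, 0);
}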
If your goal is to sum the 16-bit chunks (modulo 16 bit), the following fragment would do:
// note: this treats data128 as a single 128-bit integer (e.g. unsigned __int128 on GCC/Clang)
uint16_t convert128to16(uint16x8_t data128){
    unsigned __int128 v;
    memcpy(&v, &data128, sizeof v); // needs <string.h>
    v += (v >> 64);
    v += (v >> 32);
    v += (v >> 16);
    return (uint16_t)(v & 0xffff);
}
I was recently asked to complete a task for a C++ role; however, as the application was not progressed any further, I thought I would post here for some feedback / advice / improvements / reminders of concepts I've forgotten.
The task was:
The following data is a time series of integer values
int timeseries[32] = {67497, 67376, 67173, 67235, 67057, 67031, 66951,
66974, 67042, 67025, 66897, 67077, 67082, 67033, 67019, 67149, 67044,
67012, 67220, 67239, 66893, 66984, 66866, 66693, 66770, 66722, 66620,
66579, 66596, 66713, 66852, 66715};
The series might be, for example, the closing price of a stock each day
over a 32 day period.
As stored above, the data will occupy 32 x sizeof(int) bytes = 128 bytes
assuming 4 byte ints.
Using delta encoding, write a function to compress, and a function to
uncompress data like the above.
Ok, so before this point I had never looked at compression, so my solution is far from perfect. The manner in which I approached the problem is by compressing the array of integers into an array of bytes. When representing an integer as bytes I calculate its most significant byte (msb) and keep everything up to that point, throwing the rest away. This is then added to the byte array. For negative values I increment the msb by 1 so that we can differentiate between positive and negative values when decoding, by keeping the leading 1 bits.
When decoding I parse this jagged byte array and simply reverse my previous actions performed when compressing. As mentioned, I had never looked at compression prior to this task, so I came up with my own method to compress the data. I was looking at C++/CLI recently; I had not really used it before, so I just decided to write it in this language, for no particular reason. Below is the class, and a unit test at the very bottom. Any advice / improvements / enhancements will be much appreciated.
Thanks.
array<array<Byte>^>^ CDeltaEncoding::CompressArray(array<int>^ data)
{
    int temp = 0;
    int original;
    int size = 0;
    array<int>^ tempData = gcnew array<int>(data->Length);
    data->CopyTo(tempData, 0);
    array<array<Byte>^>^ byteArray = gcnew array<array<Byte>^>(tempData->Length);

    for (int i = 0; i < tempData->Length; ++i)
    {
        original = tempData[i];
        tempData[i] -= temp;
        temp = original;

        int msb = GetMostSignificantByte(tempData[i]);
        byteArray[i] = gcnew array<Byte>(msb);
        System::Buffer::BlockCopy(BitConverter::GetBytes(tempData[i]), 0, byteArray[i], 0, msb);
        size += byteArray[i]->Length;
    }
    return byteArray;
}
array<int>^ CDeltaEncoding::DecompressArray(array<array<Byte>^>^ buffer)
{
    System::Collections::Generic::List<int>^ decodedArray = gcnew System::Collections::Generic::List<int>();
    int temp = 0;

    for (int i = 0; i < buffer->Length; ++i)
    {
        int retrievedVal = GetValueAsInteger(buffer[i]);
        decodedArray->Add(retrievedVal);
        decodedArray[i] += temp;
        temp = decodedArray[i];
    }
    return decodedArray->ToArray();
}
int CDeltaEncoding::GetMostSignificantByte(int value)
{
    array<Byte>^ tempBuf = BitConverter::GetBytes(Math::Abs(value));
    int msb = tempBuf->Length;

    for (int i = tempBuf->Length - 1; i >= 0; --i)
    {
        if (tempBuf[i] != 0)
        {
            msb = i + 1;
            break;
        }
    }

    if (!IsPositiveInteger(value))
    {
        // We need an extra byte to differentiate the negative integers
        msb++;
    }
    return msb;
}

bool CDeltaEncoding::IsPositiveInteger(int value)
{
    return value / Math::Abs(value) == 1;
}
int CDeltaEncoding::GetValueAsInteger(array<Byte>^ buffer)
{
    array<Byte>^ tempBuf;

    if (buffer->Length % 2 == 0)
    {
        // With even integers there is no need to allocate a new byte array
        tempBuf = buffer;
    }
    else
    {
        tempBuf = gcnew array<Byte>(4);
        System::Buffer::BlockCopy(buffer, 0, tempBuf, 0, buffer->Length);

        unsigned int val = buffer[buffer->Length - 1] &= 0xFF;
        if (val == 0xFF)
        {
            // We have a negative integer compressed into 3 bytes
            // Copy over this last byte as well so we keep the negative pattern
            System::Buffer::BlockCopy(buffer, buffer->Length - 1, tempBuf, buffer->Length, 1);
        }
    }

    switch (tempBuf->Length)
    {
    case sizeof(short):
        return BitConverter::ToInt16(tempBuf, 0);
    case sizeof(int):
    default:
        return BitConverter::ToInt32(tempBuf, 0);
    }
}
And then in a test class I had:
void CTestDeltaEncoding::TestCompression()
{
    array<array<Byte>^>^ byteArray = CDeltaEncoding::CompressArray(m_testdata);
    array<int>^ decompressedArray = CDeltaEncoding::DecompressArray(byteArray);
    int totalBytes = 0;

    for (int i = 0; i < byteArray->Length; i++)
    {
        totalBytes += byteArray[i]->Length;
    }

    Assert::IsTrue(m_testdata->Length * sizeof(m_testdata) > totalBytes, "Expected the total bytes to be less than the original array!!");
    // Expected totalBytes = 53
}
This smells a lot like homework to me. The crucial phrase is: "Using delta encoding."
Delta encoding means you encode the delta (difference) between each number and the next:
67497, 67376, 67173, 67235, 67057, 67031, 66951, 66974, 67042, 67025, 66897, 67077, 67082, 67033, 67019, 67149, 67044, 67012, 67220, 67239, 66893, 66984, 66866, 66693, 66770, 66722, 66620, 66579, 66596, 66713, 66852, 66715
would turn into:
[Base: 67497]: -121, -203, +62
and so on. Assuming 8-bit bytes, the original numbers require 3 bytes apiece (and given the number of compilers with 3-byte integer types, you're normally going to end up with 4 bytes apiece). From the looks of things, the differences will fit quite easily in 2 bytes apiece, and if you can ignore one (or possibly two) of the least significant bits, you can fit them in one byte apiece.
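For illustration only (my own sketch, not part of the answer): plain delta encoding with 16-bit deltas, which is enough here since every difference in the series fits in a signed 16-bit value:

#include <cstdint>
#include <vector>

// keep the first value as a base, then store each difference as a 16-bit delta
std::vector<int16_t> delta_encode(const int* data, int n) {
    std::vector<int16_t> deltas;
    for (int i = 1; i < n; ++i)
        deltas.push_back(static_cast<int16_t>(data[i] - data[i - 1]));
    return deltas; // the caller also keeps data[0] as the base
}

// rebuild the series by adding the differences back onto the running value
void delta_decode(int base, const std::vector<int16_t>& deltas, int* out) {
    out[0] = base;
    for (std::size_t i = 0; i < deltas.size(); ++i)
        out[i + 1] = out[i] + deltas[i];
}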
Delta encoding is most often used for things like sound encoding where you can "fudge" the accuracy at times without major problems. For example, if you have a change from one sample to the next that's larger than you've left space to encode, you can encode a maximum change in the current difference, and add the difference to the next delta (and if you don't mind some back-tracking, you can distribute some to the previous delta as well). This will act as a low-pass filter, limiting the gradient between samples.
For example, in the series you gave, a simple delta encoding requires ten bits to represent all the differences. By dropping the LSB, however, nearly all the samples (all but one, in fact) can be encoded in 8 bits. That one has a difference (right shifted one bit) of -173, so if we represent it as -128, we have 45 left. We can distribute that error evenly between the preceding and following sample. In that case, the output won't be an exact match for the input, but if we're talking about something like sound, the difference probably won't be particularly obvious.
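A rough sketch of that clamping idea (again my own illustration, not the answer's code): clamp each delta to 8 bits and carry the leftover error into the following delta:

#include <algorithm>
#include <cstdint>
#include <vector>

// clamp each delta to [-128, 127] and push the uncoded remainder into the next delta
std::vector<int8_t> encode_clamped(const int* data, int n) {
    std::vector<int8_t> deltas;
    int carry = 0; // error that has not been encoded yet
    for (int i = 1; i < n; ++i) {
        int d = data[i] - data[i - 1] + carry;
        int clamped = std::max(-128, std::min(127, d));
        carry = d - clamped; // leftover is distributed to the following delta
        deltas.push_back(static_cast<int8_t>(clamped));
    }
    return deltas;
}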
I did mention that it was an exercise I had to complete, and the solution I submitted was deemed not good enough, so I wanted some constructive feedback, seeing as companies rarely tell you what you did wrong.
When the array is compressed I store the differences, not the original values (except the first), as this was my understanding of delta encoding. If you look at my code, you'll see I have provided a full solution; my question was how bad it was.