This performs the Or8Way function. Why is or1out[1] 0? - hdl

Shouldn't it be 1? (1 or 1 = 1; 1 or 0 = 1)

Or8Way takes an 8-bit bus as input and outputs a single bit. That bit is 1 if any of the bits in the input are 1, and 0 if all of the bits in the input are 0. Thus, there really is no or1out[1]; there is only or1out, a single-bit signal.
You can confirm this by looking at the definition comment for Or8Way:
/**
* 8-way Or:
* out = (in[0] or in[1] or ... or in[7])
*/
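In C terms the behaviour is just an OR-reduction of the eight input bits into one output bit. This is only a rough analogy (HDL describes wired gates, not sequential code), and or8way here is a hypothetical helper, not part of the chip set:

#include <stdint.h>

/* Rough C analogy of Or8Way: OR-reduce eight input bits into one. */
static int or8way(uint8_t in)
{
    return in != 0;  /* 1 if any of in[7:0] is set, 0 if all are clear */
}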

Related

Difference between 1 and 1'b1 in Verilog

What is the difference between just giving 1 and giving 1'b1 in Verilog code?
The 1 is 32 bits wide and is thus equivalent to 32'b00000000_00000000_00000000_00000001.
The 1'b1 is one bit wide.
There are several places where you should be aware of the difference in length, but the one most likely to catch you out is in concatenations ({}).
wire [ 7:0] A;
wire [ 8:0] B;
assign A = 8'b10100101;
assign B = {1'b1,A}; // B is 9'b110100101
assign B = {1,A};    // B is 9'b110100101
assign B = {A,1'b1}; // B is 9'b101001011
assign B = {A,1};    // B is 9'b000000001 !!!!
So, what's the difference between, say,
logic [7:0] count;
...
count <= count + 1'b1;
and
logic [7:0] count;
...
count <= count + 1;
Not a lot. In the first case your simulator/synthesiser will do this:
i) expand the 1'b1 to 8'b1 (because count is 8 bits wide)
ii) do all the maths using 8 bits (because now everything is 8 bits wide).
In the second case your simulator/synthesiser will do this:
i) do all the maths using 32 bits (because 1 is 32 bits wide)
ii) truncate the 32-bit result to 8 bits wide (because count is 8 bits wide)
The behaviour will be the same. However, that is not always the case. This:
count <= (count * 8'd255) >> 8;
and this:
count <= (count * 255) >> 8;
will behave differently. In the first case, 8 bits will be used for the multiplication (the width of the 8 in the >> 8 is irrelevant) and so the multiplication will overflow; in the second case, 32 bits will be used for the multiplication and so everything will be fine.
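If it helps, the same width effect can be sketched in C. This is an illustration only, not Verilog semantics; the casts are needed because C promotes 8-bit operands to int before doing arithmetic:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t count = 100;

    /* 32-bit maths, then truncate: behaves like the `255` case */
    uint8_t wide = (uint8_t)(((uint32_t)count * 255u) >> 8);

    /* forced 8-bit maths: the product wraps modulo 256 before the
       shift, like the `8'd255` case, so the result is useless */
    uint8_t narrow = (uint8_t)((uint8_t)(count * 255u) >> 8);

    printf("wide=%d narrow=%d\n", wide, narrow);  /* wide=99 narrow=0 */
    return 0;
}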
1'b1 is a binary, unsigned, 1-bit wide integral value. In the original Verilog specification, 1 had the same type as integer. It was signed, but its width was unspecified: a tool could choose the width based on its host implementation of the int type.
Since Verilog 2001 and SystemVerilog 2005, the width of integer and int has been fixed at 32 bits. However, because of this originally unspecified width, and the fact that so many people write 0 or 1 without realizing that it is now 32 bits wide, the standard does not allow you to use an unsized literal inside a concatenation: {A,1} is illegal.

BigQuery UDF using BYTES datatype

I am currently trying to calculate the Hamming distance between two binary strings in BigQuery using User defined functions in Javascript, my schema is quite simple:
row_id STRING
descriptors BYTES REPEATED
phash BYTES
What I find a bit confusing is that you apparently deal with BYTES in BigQuery as Base64 strings. I imported the functions atob() and btoa() so I would be able to work with the binary form of the byte strings instead of the Base64 representation.
My Query currently looks like this:
CREATE TEMP FUNCTION f_PHASH_distance(ph1 BYTES, ph2 BYTES)
RETURNS INT64
LANGUAGE js AS
"""
return HammingDistance(ph1, ph2);
"""
OPTIONS (
library=["gs://test.appspot.com/HammingDistance.js",
"gs://test.appspot.com/btoa_atob.js"]
);
SELECT f_PHASH_distance(phash, CAST("9Slp3g9OgVI=" AS BYTES))
FROM ims.images WHERE row_id = "2333USX"
And the phash of the row with id = "2333USX" is equal to "9Slp3g9OgVI=" in base64, which means that the Hamming distance is 0. But instead of 0 I am currently getting 35 in BigQuery.
HammingDistance.js has the following content:
function HammingDistance(a, b){
  var count = 0;
  for(var i = 0; i < a.length; i++){
    // calculate XOR between the two chars
    var xor = a.charCodeAt(i) ^ b.charCodeAt(i);
    // count the number of 1's in the result (charCodeAt returns a
    // 16-bit UTF-16 code unit, hence the 16 iterations)
    for(var j = 0; j < 16; j++){
      // add if the LSB is 1
      count += xor % 2;
      // right-shift the variable
      xor = xor >> 1;
    }
  }
  return count;
}

/**
 * Calculates the distance between two perceptual hashes of two images
 * encoded in base 64.
 */
function PHASHDistance(a, b){
  return HammingDistance(atob(a), atob(b));
}
And testing it in the JS console of my browser I do get the expected result. So I assume that I am doing something wrong with the casts but the documentation is very scarce on UDFs with BYTE parameters.
Any help would be much appreciated.
It looks like the problem is that you are casting "9Slp3g9OgVI=" to bytes rather than converting it to bytes from base64. I think you want this instead:
SELECT f_PHASH_distance(phash, FROM_BASE64("9Slp3g9OgVI="))
FROM ims.images WHERE row_id = "2333USX"
You might be better off using SQL functions rather than JavaScript functions, though, since JavaScript normally isn't as fast. Here's a Hamming distance implementation in SQL, assuming that the bytes have equal lengths:
#standardSQL
CREATE TEMP FUNCTION HammingDistance(b1 BYTES, b2 BYTES) AS (
BIT_COUNT(b1 ^ b2)
);
WITH Input AS (
SELECT b'defdef' AS bytes UNION ALL
SELECT b'123de4' UNION ALL
SELECT b'abc123'
)
SELECT HammingDistance(b'abcdef', bytes)
FROM Input;
It takes the bitwise XOR of the two byte values, then checks how many bits are not the same.
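For reference, here is the same XOR-and-popcount idea as a minimal C sketch (my illustration, assuming both buffers have equal length, just as the SQL version does):

#include <stddef.h>
#include <stdint.h>

/* Hamming distance: XOR each byte pair, then count the set bits. */
static int hamming_distance(const uint8_t *a, const uint8_t *b, size_t len)
{
    int count = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t x = a[i] ^ b[i];  /* differing bits become 1 */
        while (x) {
            count += x & 1;       /* count the lowest bit */
            x >>= 1;
        }
    }
    return count;
}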
In case someone is looking for a solution for comparing regular strings (not binary ones as in this question), look at my answer here.

MathProg (AMPL) - Variable Array Sized by Another Variable

I am writing my first GNU MathProg (AMPL) program to find the minimum switch (vertex) count instances of a HyperX topology (graph) for a given radix, number of hosts, and bisection bandwidth. This is a simple first program because all of the equations have been described in the following paper: http://cal.snu.ac.kr/files/2009.sc.hyperx.pdf
I have read the specification and example programs, but I am stuck on a very simple syntax error. I need to have the following two variables: L, the number of dimensions in the network, and an array S of length L, where each element of S is the number of switches in each dimension. In my MathProg program, I express this as:
var L >= 1, integer;
var S{1 .. L} >= 2, integer;
However, when I run $ glpsol --check --math hyperx.mod, I get the following error:
hyperx.mod:28: operand following .. has invalid type
Context: ...isec ; param radix ; var L >= 1 , integer ; var S { 1 .. L }
If anybody can help explain how I should properly express this relationship, I will be grateful. Also, I am including the entire program I have written for reference and extra help. I expect there to be many syntax errors in my program, but until I fix the first one, I have no way of finding the rest.
/*
* A MathProg linear program to find an optimal HyperX topology of a
* given network size, switch radix, and bisection bandwidth. Optimal
* is simplistically defined as minimum switch count network.
*
* A HyperX topology is a multi-dimensional network (graph) where, in
* each dimension, the switches are fully connected. Every switch
* (vertex) is a point in an L-dimensional integer lattice. Each switch
* is identified by a multi-index I = (I_1, ..., I_L) where 0 <= I_k <
* S_k for each k = 1..L, where S_k is the number of switches in each
* dimension. A switch connects to all others whose multi-index is the
* same in all but one coordinate.
*/
/* Network size in number of hosts. */
param hosts;
/* Desired bisection bandwidth. */
param bisec;
/* Maximum switch radix. */
param radix;
/* The number of dimensions in the HyperX. */
var L >= 1, integer;
/* The number of switches in each dimension. */
var S{1 .. L} >= 2, integer;
/*
* Relative bandwidth of the dimension, i.e., the number of links in a
* given dimension.
*/
var K{1 .. L} >= 1, integer;
/* The number of terminals (hosts) per switch. */
var T >= 1, integer;
/* Minimize the total number of switches. */
minimize cost: prod{i in 1..L} S[i];
/* The total number of links must be less than the switch radix. */
s.t. Radix: T + sum{i in 1..L} K[i] * (S[i] - 1) <= radix;
/* There must be enough hosts in the network. */
s.t. Hosts: T * prod{i in 1..L} S[i] >= hosts;
/* There must be enough bandwidth. */
s.t. Bandwidth: min{K[i]*S[i]} / (2 * T) >= bisec;
/* The order of the dimensions doesn't matter, so constrain them */
s.t. SwitchDimen: forall{i in 1..(L-1)} S[i] <= S[i+1];
/*
* Bisection bandwidth depends on the smallest S_i * K_i, so we know
* that the smallest switch count dimension needs the most links.
*/
s.t. LinkDimen: forall{i in 1..(L-1)} K[i] >= K[i+1];
# TODO: I would like to constrain the search such that the number of
# terminals, T, is bounded to T >= (hosts / O), where O is the switch
# count of the smallest switch count topology discovered so far, but I
# don't know how to do this.
/* Data section */
data;
param hosts := 32
param bisec := 0.5
param radix := 64
end;
A fixed number of variables is a common assumption in solvers and algebraic modelling languages, including AMPL/MathProg. Therefore you can only use constant expressions, in particular parameters, not variables, in indexing expressions. One possible solution is to make L a parameter, re-solve your problem for different values of L, and select the one that gives the best objective value. This can be done with a simple AMPL script.

Getting and setting single bits in a byte-array using vb.net

I have a byte array with 512 Elements and need to get and set a single bit of a byte in this array.
The operation must not change any other bits, only the specified one.
So if I have a byte like &B00110011 and would like to change the third bit to 1, it should become &B00110111.
Like this:
Dim myarray(511) as byte
myarray(3).2 = 1 ---> This would change the third bit (start counting at 0) of the third byte to 1
I know it should be easily possible using bit-masking but I don't have the time to try for days to get it working.
Thanks for help!!!
Jan
A simple way to do this is using shifts. If you want to set the nth bit of a number to 1:
Dim mask As Byte = CByte(1 << n) ' if n is 3, mask is &B00001000
bytevalue = bytevalue Or mask
To set a bit to 0:
Dim mask As Byte = CByte(255 - (1 << n)) ' if n is 3, mask is &B11110111
bytevalue = bytevalue And mask
In both examples, bytevalue is the byte you want to alter and mask is also a byte.
EDIT: Retrieving the state of a bit is a lot like setting one, where IsSet is a Boolean:
Dim mask As Byte = CByte(1 << n) ' just as above
Dim IsSet As Boolean = (bytevalue And mask) <> 0
Why don't you use the BitArray class?

What's the fastest way to divide an integer by 3?

int x = n / 3; // <-- make this faster
// for instance
int a = n * 3; // <-- normal integer multiplication
int b = (n << 1) + n; // <-- potentially faster multiplication
The guy who said "leave it to the compiler" was right, but I don't have the "reputation" to mod him up or comment. I asked gcc to compile int test(int a) { return a / 3; } for an ix86 and then disassembled the output. Just for academic interest: what it's doing is roughly multiplying by 0x55555556 and then taking the top 32 bits of the 64-bit result. You can demonstrate this to yourself with, e.g.:
$ ruby -e 'puts(60000 * 0x55555556 >> 32)'
20000
$ ruby -e 'puts(72 * 0x55555556 >> 32)'
24
$
The Wikipedia page on Montgomery division is hard to read, but fortunately the compiler guys have done it so you don't have to.
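In C, the transformation gcc makes looks roughly like this. This is a sketch for non-negative inputs; for negative inputs the compiler additionally subtracts n >> 31 to fix the rounding direction:

#include <stdint.h>

/* Divide by 3 via multiply-high: 0x55555556 is about 2^32 / 3, so the
   top 32 bits of the 64-bit product are about n / 3. */
static int32_t div3(int32_t n)
{
    return (int32_t)(((int64_t)n * 0x55555556LL) >> 32);
}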
This is the fastest option, as the compiler will optimize it if it can, depending on the target processor.
int a;
int b;
a = some value;
b = a / 3;
There is a faster way to do it if you know the ranges of the values. For example, if you are dividing a signed integer by 3 and you know the range of the value to be divided is 0 to 768, then you can multiply it by a power of 2 divided by 3, and then shift right by that power of 2.
eg.
Range 0 -> 768
You could use a shift of 10 bits, which corresponds to multiplying by 1024; you want to divide by 3, so your multiplier should be 1024 / 3 = 341,
so you can now use (x * 341) >> 10
(Make sure the shift is a signed shift if using signed integers, and also make sure the shift is an actual shift and not a bit roll.)
This will effectively divide the value by 3, and it runs at about 1.6 times the speed of a hardware divide by 3 on a standard x86 / x64 CPU.
Of course, the only reason you can make this optimization when the compiler can't is that the compiler does not know the maximum range of x and therefore cannot make this determination, but you as the programmer can.
Sometimes it may even be more beneficial to move the value into a larger type and then do the same thing, i.e. if you have an int of full range, you could make it a 64-bit value and then do the multiply and shift instead of dividing by 3.
I had to do this recently to speed up image processing: I needed to find the average of the 3 color channels (red, green and blue), each with a byte range (0 - 255).
At first I simply used:
avg = (r + g + b) / 3;
(So r + g + b has a maximum of 768 and a minimum of 0, because each channel is a byte 0 - 255)
After millions of iterations the entire operation took 36 milliseconds.
I changed the line to:
avg = (r + g + b) * 341 >> 10;
And that took it down to 22 milliseconds. It's amazing what can be done with a little ingenuity.
This speed-up occurred in C#, even though I had optimisations turned on and was running the program natively, without debugging info and not through the IDE.
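One caveat worth noting: 341/1024 is slightly below 1/3, so (x * 341) >> 10 comes out one too low whenever x is an exact multiple of 3 (for example, 3 * 341 = 1023, and 1023 >> 10 = 0, not 1). For the averaging use above that may be acceptable; if you need exact results over 0..768, a multiplier of 683 with an 11-bit shift works, as this quick brute-force check (my sketch, not part of the original answer) confirms:

#include <stdio.h>

int main(void)
{
    /* verify (x * 683) >> 11 == x / 3 over the whole 0..768 range */
    for (int x = 0; x <= 768; x++)
        if ((x * 683) >> 11 != x / 3)
            printf("mismatch at %d\n", x);  /* never prints */
    return 0;
}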
See How To Divide By 3 for an extended discussion of more efficiently dividing by 3, focused on doing FPGA arithmetic operations.
Also relevant:
Optimizing integer divisions with Multiply Shift in C#
Depending on your platform and depending on your C compiler, a native solution like just using
y = x / 3
can be fast or it can be awfully slow (even if division is done entirely in hardware: if it is done using a DIV instruction, this instruction is about 3 to 4 times slower than a multiplication on modern CPUs). Very good C compilers with optimization flags turned on may optimize this operation, but if you want to be sure, you are better off optimizing it yourself.
For optimization it is important to have integer numbers of a known size. In C, int has no guaranteed size (it can vary by platform and compiler!), so you are better off using C99 fixed-size integers. The code below assumes that you want to divide an unsigned 32-bit integer by three and that your C compiler knows about 64-bit integer numbers (NOTE: even on a 32-bit CPU architecture, most C compilers can handle 64-bit integers just fine):
#include <stdint.h>

static inline uint32_t divby3(uint32_t divideMe)
{
    return (uint32_t)(((uint64_t)0xAAAAAAABULL * divideMe) >> 33);
}
As crazy as this might sound, the method above indeed does divide by 3. All it needs is a single 64-bit multiplication and a shift (like I said, multiplications might be 3 to 4 times faster than divisions on your CPU). In a 64-bit application this code will be a lot faster than in a 32-bit application (in a 32-bit application, multiplying two 64-bit numbers takes 3 multiplications and 3 additions on 32-bit values); however, it might still be faster than a division on a 32-bit machine.
On the other hand, if your compiler is a very good one and knows the trick of optimizing integer division by a constant (the latest GCC does, I just checked), it will generate the code above anyway (GCC will create exactly this code for "/3" if you enable at least optimization level 1). For other compilers you cannot rely on or expect tricks like that, even though this method is very well documented and mentioned everywhere on the Internet.
The problem is that this only works for constant divisors, not variable ones. You always need to know the magic number (here 0xAAAAAAAB) and the correct operations after the multiplication (shifts and/or additions in most cases); both are different depending on the number you want to divide by, and both take too much CPU time to calculate on the fly (that would be slower than hardware division). However, it's easy for a compiler to calculate these at compile time (where one second more or less of compile time hardly plays a role).
For 64 bit numbers:
uint64_t divBy3(uint64_t x)
{
    return x * 12297829382473034411ULL;
}
However, this isn't the truncating integer division you might expect.
It works correctly if the number is already divisible by 3, but it returns a huge number if it isn't.
For example, if you run it on 11, it returns 6148914691236517209. This looks like garbage, but it's in fact the correct answer: multiply it by 3 and you get back the 11!
If you are looking for the truncating division, then just use the / operator. I highly doubt you can get much faster than that.
Theory:
64-bit unsigned arithmetic is arithmetic modulo 2^64.
This means that for each integer which is coprime with the modulus 2^64 (essentially all odd numbers) there exists a multiplicative inverse, which you can multiply by instead of dividing. This magic number can be obtained by solving the equation 3*x + 2^64*y = 1 using the extended Euclidean algorithm.
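A quick sketch (my addition) verifying the magic number: it is the multiplicative inverse of 3 modulo 2^64, so multiplying by it undoes multiplying by 3:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t inv3 = 12297829382473034411ULL;
    printf("%" PRIu64 "\n", 3 * inv3);   /* prints 1: 3 * inv3 wraps mod 2^64 */
    printf("%" PRIu64 "\n", 27 * inv3);  /* prints 9: 27 / 3 = 9 */
    return 0;
}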
What if you really don't want to multiply or divide? Here is an approximation I just invented. It works because x/3 = x/4 + x/12. But since x/12 = (x/4)/3, we just have to repeat the process until it's good enough.
#include <stdio.h>

int main(void)
{
    int n = 1000;
    int a, b;

    a = n >> 2;    /* a = n/4 */
    b = (a >> 2);  /* each new term is the previous one divided by 4 */
    a += b;        /* so a accumulates n/4 + n/16 + n/64 + ... -> n/3 */
    b = (b >> 2);
    a += b;
    b = (b >> 2);
    a += b;
    b = (b >> 2);
    a += b;
    printf("a=%d\n", a);
    return 0;
}
The result is 330. It could be made more accurate using b = ((b+2)>>2); to account for rounding.
If you are allowed to multiply, just pick a suitable approximation for (1/3), with a power-of-2 divisor. For example, n * (1/3) ~= n * 43 / 128 = (n * 43) >> 7.
This technique is most useful in Indiana.
I don't know if it's faster, but if you want to use a bitwise operator to perform binary division, you can use the shift-and-subtract method described at this page:
Set quotient to 0
Align leftmost digits in dividend and divisor
Repeat:
If that portion of the dividend above the divisor is greater than or equal to the divisor:
Then subtract divisor from that portion of the dividend and
Concatenate 1 to the right hand end of the quotient
Else concatenate 0 to the right hand end of the quotient
Shift the divisor one place right
Until dividend is less than the divisor:
quotient is correct, dividend is remainder
STOP
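Here is that recipe as a small C function (my sketch of the standard restoring-division loop; it works for any non-zero divisor, 3 included):

#include <stdint.h>

static uint32_t shift_sub_div(uint32_t dividend, uint32_t divisor)
{
    uint32_t quotient = 0;
    uint32_t remainder = 0;

    for (int i = 31; i >= 0; i--) {
        /* bring down the next dividend bit */
        remainder = (remainder << 1) | ((dividend >> i) & 1);
        quotient <<= 1;
        /* subtract the divisor and append a 1 when it fits */
        if (remainder >= divisor) {
            remainder -= divisor;
            quotient |= 1;
        }
    }
    return quotient;  /* remainder now holds dividend % divisor */
}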
For really large integer division (e.g. numbers bigger than 64 bits) you can represent your number as an int[] of decimal digits and perform the division quite fast by taking two digits at a time (the carried remainder and the next digit) and dividing them by 3. The remainder becomes part of the next two digits, and so forth.
eg. for 11004 / 3 you say
11/3 = 3, remainder = 2 (from 11-3*3)
20/3 = 6, remainder = 2 (from 20-6*3)
20/3 = 6, remainder = 2 (from 20-6*3)
24/3 = 8, remainder = 0
hence the result 3668
internal static List<int> Div3(int[] a)
{
    int remainder = 0;
    var res = new List<int>();
    for (int i = 0; i < a.Length; i++)
    {
        var val = remainder + a[i]; // remainder already carries the *10
        var div = val / 3;
        remainder = 10 * (val % 3);
        if (div > 9)
        {
            res.Add(div / 10);
            res.Add(div % 10);
        }
        else
            res.Add(div);
    }
    if (res[0] == 0) res.RemoveAt(0);
    return res;
}
If you really want to, see this article on integer division, but it only has academic merit; it would be interesting to see an application that actually needed, and benefited from, that kind of trick.
Easy computation ... at most n iterations where n is your number of bits:
#include <stdint.h>

uint8_t divideby3(uint8_t x)
{
    /* sums the alternating series x/2 - x/4 + x/8 - ... = x/3;
       a signed accumulator is used so the negation terminates
       (assumes the usual arithmetic right shift of negative ints) */
    int v = x, answer = 0;
    do
    {
        v >>= 1;
        answer += v;
        v = -v;
    } while (v);
    /* the truncated series can be off by a little; fix up exactly */
    while (3 * answer > x) answer--;
    while (3 * (answer + 1) <= x) answer++;
    return (uint8_t)answer;
}
A lookup table approach would also be faster on some architectures.
#include <stdint.h>

uint8_t DivBy3LU(uint8_t u8Operand)
{
    /* entry i holds i/3; the remaining entries continue the same pattern */
    static const uint8_t ai8Div3[256] = { 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, /* .... */ };
    return ai8Div3[u8Operand];
}