Eigenvectors do not match between Eigen, Numpy, LAPACKE (Ubuntu), Intel MKL

I have the following matrix M of doubles:
55.774375 61.0225 62.805625
-122.045 -125.61125 -122.045
62.805625 61.0225 55.774375
(the matrix is part of an algorithm to estimate the parameters of an ellipse from a 2D point cloud, see https://autotrace.sourceforge.net/WSCG98.pdf for reference).
And now comes the interesting part. Determining the eigenvalues and (right) eigenvectors of the matrix with different packages leads to different results for the (right) eigenvectors. The eigenvalues are the same for every package:
[-5.09420041e-13 -7.03125000e+00 -7.03125000e+00]
For numpy with python:
import numpy as np

M = np.array([[55.774375, 61.0225, 62.805625],
              [-122.045, -125.61125, -122.045],
              [62.805625, 61.0225, 55.774375]])
eval, evec = np.linalg.eig(M)
I get:
[[ 0.41608575 0.37443021 -0.80942954]
[-0.80854518 -0.82367703 0.34119147]
[ 0.41608575 0.42586167 0.47792489]]
With Eigen C++, the code looks as follows
#include <Eigen/Dense>

Eigen::Matrix3d M;
M << 55.774375, 61.0225, 62.805625,
     -122.045, -125.61125, -122.045,
     62.805625, 61.0225, 55.774375;
Eigen::EigenSolver<Eigen::MatrixXd> solver;
solver.compute(M);
I get for the eigenvectors
0.416086 0.376456 -0.462421
-0.808545 -0.823758 0.820878
0.416086 0.423914 -0.335151
With LAPACKE (apt install liblapack-dev lapacke lapacke-dev)
#include <stdio.h>
#include <stdlib.h>
#include <lapacke.h>

double Marr[] = {55.774375, 61.0225, 62.805625,
                 -122.045, -125.61125, -122.045,
                 62.805625, 61.0225, 55.774375};
char jobvl = 'N'; /* left eigenvectors are not needed */
char jobvr = 'V'; /* compute the right eigenvectors */
int n = 3;
int lda = n;
int ldvl = n;
int ldvr = n;
int info;
double wr[n], wi[n], vl[ldvl*n], vr[ldvr*n];
/* the high-level LAPACKE interface allocates its own workspace, so no lwork query is needed */
info = LAPACKE_dgeev( LAPACK_ROW_MAJOR, jobvl, jobvr, n, Marr, lda, wr, wi,
                      vl, ldvl, vr, ldvr );
if( info > 0 ) {
    printf( "The algorithm failed to compute eigenvalues.\n" );
    exit( 1 );
}
I get for the eigenvectors
0.416086 0.376456 -0.788993
-0.808545 -0.823758 0.565975
0.416086 0.423914 0.239087
The results for Intel MKL are similar.
I checked the determinant of M and it is close to zero (-4.0031989907207254e-05).
What I would like to understand is
Why do the eigenvectors for the same eigenvalues differ so much between the libraries? Is this because of the different numerical methods used to approximate them?
I understand that an eigenvalue d has many associated eigenvectors, i.e. if v is an eigenvector for d then q * v (q being a scalar) is also an eigenvector of d. Since the second and the third eigenvalue are the same, I would assume that there is some scalar that transforms one into the other, but this doesn't seem to be the case.
My algorithm fails in C++ (in Python it works) due to the different eigenvectors. Is there a way out?
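Since the eigenvalue -7.03125 appears twice, its eigenspace here is two-dimensional, so each library is free to return any two independent vectors from that plane; the results only need to agree up to a linear combination, not up to a single scalar factor. A quick numpy check of this, using the matrices quoted above (my own sketch, not output from any of the libraries):

import numpy as np

M = np.array([[55.774375, 61.0225, 62.805625],
              [-122.045, -125.61125, -122.045],
              [62.805625, 61.0225, 55.774375]])
numpy_vecs = np.array([[ 0.41608575,  0.37443021, -0.80942954],
                       [-0.80854518, -0.82367703,  0.34119147],
                       [ 0.41608575,  0.42586167,  0.47792489]])
eigen_vecs = np.array([[ 0.416086,  0.376456, -0.462421],
                       [-0.808545, -0.823758,  0.820878],
                       [ 0.416086,  0.423914, -0.335151]])
lam = -7.03125

# both libraries' columns 2 and 3 really are eigenvectors for lam
for name, V in [("numpy", numpy_vecs), ("Eigen", eigen_vecs)]:
    for col in (1, 2):
        v = V[:, col]
        print(name, col, np.allclose(M.dot(v), lam * v, atol=1e-3))

# all four of those vectors lie in the same 2-D eigenspace (rank 2),
# even though no single pair is related by a scalar factor
stacked = np.column_stack([numpy_vecs[:, 1], numpy_vecs[:, 2],
                           eigen_vecs[:, 1], eigen_vecs[:, 2]])
print(np.linalg.matrix_rank(stacked, tol=1e-3))

If the C++ side of the algorithm assumes a particular basis for that eigenspace, it may be more robust to construct one deterministically (for example by orthonormalising the two returned vectors) than to rely on whichever basis a given library happens to return.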

Can Cplex prioritize some variables so that they're likely to be chosen together?

So my problem contains a vehicle that moves from one node to the next. I have a bunch of nodes that may or may not be related to each other. I want the nodes that are similar to each other to be visited together by the vehicle as much as possible.
Is there any possible way that I can prioritize the related nodes so that they're more likely to be grouped together? I thought of creating sets or tuples that represent the different groups, and having a variable X[i][j] = 1 if the vehicle moves from node i to node j, but I'm stuck at the "prioritize i and j if they come from the same set" part. Is it the boolean value that makes it impossible to express that? Should I modify my formulation somehow?
This is my code for the problem for now; I still haven't worked out the priority part:
int nNode = 20;
range N = 1..nNode; //set of locations to visit
range V = 0..nNode; //set of locations plus the depot
range Vehicle = 1..6; //there are six vehicles
range boxType = 1..3; //three types of boxes to be transported
int demand[V][boxType] =...; //demand for a location in terms of different boxes
int timeBox[boxType] =...; //time associated with the actions on a type of box
dvar int+ totalLoad[Vehicle];
dvar int+ load[Vehicle][boxType]; //load in terms of box type
dvar boolean X[V][V][Vehicle]; /*1 if the vehicle Vehicle goes from node V to the
next node, 0 if not*/
dvar int+ t[Vehicle]; //total time a vehicle spends
dvar int time[Vehicle]; /*equals |t[vehicle] - target cycle time|, this is to make sure
each vehicle spends as close to target cycle time as possible*/
minimize sum (v in Vehicle)time[v];
subject to
{
forall (i in V)
sum (j in V, k in Vehicle)X[i][j][k] == 1; /* so that each starting node will have
exactly one destination node, i.e it will belong to exactly 1 route only*/
forall (j in V)
sum (i in V, k in Vehicle)X[i][j][k] == 1; // similar but for ending node
forall (k in Vehicle)
totalLoad[k] == sum(i in V, j in V)X[i][j][k]* (sum(b in boxType)demand[j][b]); /*total
load of a vehicle equals the total boxes collected at each stop on its path */
forall (b in boxType, k in Vehicle)
load[k][b] == sum(i in V, j in V) X[i][j][k]*(sum(j in Vehicle)demand[j][b]); /* calculate
separate number of boxes for each route*/
forall (k in Vehicle)
{
time[k] >= t[k] - 1.5;
time[k] >= - t[k] + 1.5;
time[k] <= t[k] + 1.5;
time[k] <= 2 - t[k] - 1.5; // breakdown of time[k] = |t[k]-1.5|, 1.5 is target cycle time
t[k] == sum(b in boxType) load[k][b]*timeBox[b]; // calculate the total time involved in a route
}
}
You could try adding a term into your objective that penalises giving different values to those sets of variables. Easy enough if there are only two of them but more fiddly if there are bigger subsets and/or lots of subsets to coordinate.
I would do something along the lines of what Tim suggested. Here is a little bit more meat on the bones:
x[i,j,k] = 1  =>  L[g] ≤ k   ∀i∈g, ∀j,k    (a lower bound on the route index k for group g)
x[i,j,k] = 1  =>  U[g] ≥ k   ∀i∈g, ∀j,k    (an upper bound on the route index k for group g)
U[g] - L[g] ≥ 1  =>  δ[g] = 1              (δ[g] = 1 if group g ends up on different routes)
min Σ_g δ[g]                               (objective)
δ[g] ∈ {0,1}                               (δ[g] is a binary variable)
One way to implement the first 3 equations is:
L[g] ≤ k⋅x[i,j,k] + M(1-x[i,j,k]) ∀i ∈ g, ∀j,k
U[g] ≥ k⋅x[i,j,k] ∀i ∈ g, ∀j,k
M⋅δ[g] ≥ U[g]-L[g]
Here g indicates a group and M is a sufficiently large constant (big-M). This makes the problem a multi-objective problem, so you can choose from a few possible approaches to handle that.
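If it helps to see those linking constraints concretely, here is a small sketch in Python with PuLP; the tiny instance, the names L, U and delta, and the use of PuLP instead of OPL/CPLEX are all just for illustration:

from pulp import LpProblem, LpVariable, LpMinimize, LpBinary

nodes = [0, 1, 2, 3]
routes = [1, 2]
group = [1, 2]                      # the nodes that should share a route
bigM = max(routes)                  # large enough here: the largest route index

prob = LpProblem("keep_group_together", LpMinimize)
x = LpVariable.dicts("x", [(i, j, k) for i in nodes for j in nodes for k in routes],
                     cat=LpBinary)
L = LpVariable("L_g", lowBound=min(routes), upBound=max(routes))
U = LpVariable("U_g", lowBound=min(routes), upBound=max(routes))
delta = LpVariable("delta_g", cat=LpBinary)

for i in group:
    for j in nodes:
        for k in routes:
            # x[i,j,k] = 1 forces L_g <= k and U_g >= k
            prob += L <= k * x[(i, j, k)] + bigM * (1 - x[(i, j, k)])
            prob += U >= k * x[(i, j, k)]

# delta_g is forced to 1 whenever the group spans more than one route
prob += bigM * delta >= U - L

# in the full model this penalty term would be added to the existing objective
prob += delta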
You could use priorities if you do not want to change the model from a logical point of view.
See https://github.com/AlexFleischerParis/zooopl/blob/master/zoopriorities.mod
int nbKids=300;
float costBus40=500;
float costBus30=400;
dvar int+ nbBus40;
dvar int+ nbBus30;
execute
{
nbBus40.priority=100;
nbBus30.priority=0;
}
minimize
costBus40*nbBus40 +nbBus30*costBus30;
subject to
{
40*nbBus40+nbBus30*30>=nbKids;
}
(from Making Optimization Simple)
If you want to change the model from a logical point of view, you can change the objective or add a second objective:
int nbKids=350;
float costBus40=400;
float costBus30=300;
dvar int+ nbBus40;
dvar int+ nbBus30;
dexpr float absdistancebetweennumbers=abs(nbBus40-nbBus30);
minimize
staticLex(costBus40*nbBus40 +nbBus30*costBus30,absdistancebetweennumbers);
subject to
{
40*nbBus40+nbBus30*30>=nbKids;
}

PyCUDA large nonuniform matrix operations

I am working with large, nonuniform matrices and am having problems with what I believe to be mismatched indexing of the elements.
In example.py, get_simulated_ipp() builds echo and tx, two linear arrays of size 250000 and 25000 respectively. The code also hardcodes sr=25.
My code is attempting to complex multiply tx into echo along different stretches, depending on the specified ranges and the value of sr. This will then be stored in an array S.
After searching through some other people's examples, I found a way of building blocks and grids here that I thought would work well. I'm unfamiliar with C code, but have been trying to learn over the past week. Here is my code:
#!/usr/bin/python
#This iteration only works on the first and last elements, mismatching after that.
# However, this doesn't result in any empty elements in S
import numpy as np
import example as ex
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
#pull simulated data and get info about it
((echo,tx)) = ex.get_simulated_ipp()
ranges = np.arange(4000,6000).astype(np.int32)
S = np.zeros([len(ranges),len(tx)],dtype=np.complex64)
sr = ex.sr
#copying input to gpu
# will try this explicitly if in/out (in the function call) don't work
block_dim_x = 8 #thread number is product of block dims,
block_dim_y = 8 # want a multiple of 32 (warp multiple)
blocks_x = np.ceil(len(ranges)/block_dim_x).astype(np.int32).item()
blocks_y = np.ceil(len(tx)/block_dim_y).astype(np.int32).item()
kernel_code="""
#include <cuComplex.h>
__global__ void complex_mult(cuFloatComplex *tx, cuFloatComplex *echo, cuFloatComplex *result,
int *ranges, int sr)
{
unsigned int block_num = blockIdx.x + blockIdx.y * gridDim.x;
unsigned int thread_num = threadIdx.x + threadIdx.y * blockDim.x;
unsigned int threads_in_block = blockDim.x * blockDim.y;
unsigned long int idx = threads_in_block * block_num + thread_num;
//aligning the i,j to idx, something is mismatched?
int i = ((idx % (threads_in_block * gridDim.x)) % blockDim.x) +
((block_num % gridDim.x) * blockDim.x);
int j = ((idx - (threads_in_block * block_num)) / blockDim.x) +
((block_num / gridDim.x) * blockDim.y);
result[idx] = cuCmulf(echo[j+ranges[i]*sr], tx[j]);
}
"""
## want something to work like this:
## result[i][j] = cuCmulf(echo[j+ranges[i]*sr], tx[j]);
#includes directory of where cuComplex.h is located
mod = SourceModule(kernel_code, include_dirs=['/usr/local/cuda-7.0/include/'])
complex_mult = mod.get_function("complex_mult")
complex_mult(cuda.In(tx), cuda.In(echo), cuda.Out(S), cuda.In(ranges), np.int32(sr),
block=(block_dim_x,block_dim_y,1),
grid=(blocks_x,blocks_y))
compare = np.zeros_like(S) #built to compare CPU vs GPU calcs
txidx = np.arange(len(tx))
for ri,r in enumerate(ranges):
compare[ri,:] = echo[txidx+r*sr]*tx
print np.subtract(S, compare)
At the bottom here, I've put in a CPU implementation of what I'm attempting to accomplish and put in a subtraction. The result is that the very first and very last elements come out as 0+0j, but the rest do not. The kernel is attempting to align an i and j to the idx so that I can traverse echo, ranges, and tx more easily.
Is there a better way to implement something like this? Also, why might the result not come out as all 0+0j as I intend?
Edit:
Trying a little example to get a better grasp of how the arrays are being indexed with this block/grid configuration, I stumbled upon something very strange. Before trying to index the elements, I just wanted to run a little test multiplication. It seems like my block/grid covers all of the ary_in matrix, but the result ends up only doubling the top half of ary_in, and the bottom half returns whatever was left over from the previous calculation.
If I change blocks_x to 4 so that I cover more space than needed, however, the doubling works fine. If I then run it with a 4x4 grid, with * 3 instead, it works out fine with ary_out as ary_in tripled. When I run it again with a 2x4 grid and only doubling, the top half of ary_out returns the doubled ary_in, but the bottom half returns the previous result in memory, a tripled value instead. I would take this to mean something in my index/block/grid mapping is matching wrongly to the values, but I can't figure out what.
ary_in = np.arange(128).reshape((8,16))
print ary_in
ary_out = np.zeros_like(ary_in)
block_dim_x = 4
block_dim_y = 4
blocks_x = 2
blocks_y = 4
limit = block_dim_x * block_dim_y * blocks_x * blocks_y
mod = SourceModule("""
__global__ void indexing_order(int *ary_in, int *ary_out, int n)
{
unsigned int block_num = blockIdx.x + blockIdx.y * gridDim.x;
unsigned int thread_num = threadIdx.x + threadIdx.y * blockDim.x;
unsigned int threads_in_block = blockDim.x * blockDim.y;
unsigned int idx = threads_in_block * block_num + thread_num;
if (idx < n) {
// ary_out[idx] = thread_num;
ary_out[idx] = ary_in[idx] * 2;
}
}
""")
indexing_order = mod.get_function("indexing_order")
indexing_order(drv.In(ary_in), drv.Out(ary_out), np.int32(limit),
block=(block_dim_x,block_dim_y,1),
grid=(blocks_x,blocks_y))
print ary_out
FINAL EDIT:
I figured out the problems. In the edit just above, ary_in is by default an int64, mismatching the int initialization in the C code, which is an int32. This only allocated half the amount of data needed on the GPU for the entire array, so only the top half was moved over and operated on. Adding a .astype(np.int32) solved this problem.
This allowed me to figure out the ordering of the indexing in my case and fix the main code with:
int i = idx / row_len;
int j = idx % row_len;
I still don't understand how to get this working with a non-even division of the block dimensions into the output array (e.g. 16x16), even with an if (idx < n) guard.
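Putting the two fixes together, a sketch of what the corrected kernel could look like (the extra row_len and n parameters and the bounds guard are my own additions, so treat this as untested):

kernel_code = """
#include <cuComplex.h>
__global__ void complex_mult(cuFloatComplex *tx, cuFloatComplex *echo, cuFloatComplex *result,
                             int *ranges, int sr, int row_len, int n)
{
    unsigned int block_num = blockIdx.x + blockIdx.y * gridDim.x;
    unsigned int thread_num = threadIdx.x + threadIdx.y * blockDim.x;
    unsigned int threads_in_block = blockDim.x * blockDim.y;
    unsigned long int idx = threads_in_block * block_num + thread_num;
    if (idx < n) {
        int i = idx / row_len;    // row index into ranges
        int j = idx % row_len;    // column index into tx
        result[idx] = cuCmulf(echo[j + ranges[i]*sr], tx[j]);
    }
}
"""
# called with the two extra int32 arguments, e.g.
# complex_mult(cuda.In(tx), cuda.In(echo), cuda.Out(S), cuda.In(ranges), np.int32(sr),
#              np.int32(len(tx)), np.int32(len(ranges)*len(tx)),
#              block=(block_dim_x, block_dim_y, 1), grid=(blocks_x, blocks_y))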

Collision Angle Detection

I have some questions regarding collision angles. I am trying to code physics for a game and I do not want to use any third party library, actually I want to code each and every thing by myself. I know how to detect collisions between two spheres but I can't figure out, how to find the angle of collision/repulsion between the two spherical objects. I've tried reversing the direction of the objects, but no luck. It would be very nice if you link me to an interesting .pdf file teaching physics programming.
There are a lot of ways to deal with collisions.
Impulse
To model an impulse, you can act directly on the velocity of each object: using the law of reflection, you "reflect" each velocity about the normal at the impact point,
so: v1 = v1 - 2 * (v1 . n2) * n2
and v2 = v2 - 2 * (v2 . n1) * n1
where v1 and v2 are the velocities of spheres s1 and s2,
and n1 and n2 are the normals at the collision point.
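A minimal sketch of that reflection formula in Python (the function name and the normalisation step are my own additions):

import numpy as np

def reflect(v, n):
    # reflect velocity v about the collision normal n (n is normalised first)
    n = n / np.linalg.norm(n)
    return v - 2.0 * np.dot(v, n) * n

# example: a sphere moving along +x hits a surface whose normal points along -x
v1 = np.array([3.0, 1.0, 0.0])
n = np.array([-1.0, 0.0, 0.0])
print(reflect(v1, n))   # -> [-3.  1.  0.]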
Penalty
Here, the two objects are interpenetrating, and we model the fact that they push apart, so you create a force proportional to the penetration depth, like a spring force.
I haven't covered all the approaches, but these are the two simplest I know.
The angle between two vectors in 2D or 3D space can be found from
A · B = |A||B| cos θ
where A and B are vectors and θ is the angle between them.
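For instance, a quick numpy check of that formula (my own sketch):

import numpy as np

a = np.array([1.0, 0.0, 0.0])
b = np.array([1.0, 1.0, 0.0])
cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))   # clip guards against rounding error
print(np.degrees(theta))                           # -> 45.0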
The class below can be used for basic vector calculations in games:
#include <cmath>

class Vector3D   // renamed: a C++ identifier cannot start with a digit
{
private:
    float x, y, z;
public:
    // purpose: Our constructor
    // input:   ex  - our vector's i component
    //          why - our vector's j component
    //          zee - our vector's k component
    // output:  no explicit output
    Vector3D(float ex = 0, float why = 0, float zee = 0)
    {
        x = ex; y = why; z = zee;
    }

    // purpose: Our destructor
    // input:   none
    // output:  none
    ~Vector3D() { }

    // purpose: calculate the magnitude of our invoking vector
    // input:   no explicit input
    // output:  the magnitude of our invoking object
    float getMagnitude()
    {
        return sqrtf(x * x + y * y + z * z);
    }

    // purpose: multiply our vector by a scalar value
    // input:   num - the scalar value being multiplied
    // output:  our newly created vector
    Vector3D operator*(float num) const
    {
        return Vector3D(x * num, y * num, z * num);
    }

    // purpose: multiply our vector by a scalar value
    // input:   num - the scalar value being multiplied
    //          vec - the vector we are multiplying to
    // output:  our newly created vector
    friend Vector3D operator*(float num, const Vector3D &vec)
    {
        return Vector3D(vec.x * num, vec.y * num, vec.z * num);
    }

    // purpose: Adding two vectors
    // input:   vec - the vector being added to our invoking object
    // output:  our newly created sum of the two vectors
    Vector3D operator+(const Vector3D &vec) const
    {
        return Vector3D(x + vec.x, y + vec.y, z + vec.z);
    }

    // purpose: Subtracting two vectors
    // input:   vec - the vector being subtracted from our invoking object
    // output:  our newly created difference of the two vectors
    Vector3D operator-(const Vector3D &vec) const
    {
        return Vector3D(x - vec.x, y - vec.y, z - vec.z);
    }

    // purpose: Normalize our invoking vector *this changes our vector*
    // input:   no explicit input
    // output:  none
    void normalize3Dvector(void)
    {
        float mag = sqrtf(x * x + y * y + z * z);
        x /= mag; y /= mag; z /= mag;
    }

    // purpose: Dot Product two vectors
    // input:   vec - the vector being dotted with our invoking object
    // output:  the dot product of the two vectors
    float dot3Dvector(const Vector3D &vec) const
    {
        return x * vec.x + y * vec.y + z * vec.z;
    }

    // purpose: Cross product two vectors
    // input:   vec - the vector being crossed with our invoking object
    // output:  our newly created resultant vector
    Vector3D cross3Dvector(const Vector3D &vec) const
    {
        return Vector3D(y * vec.z - z * vec.y,
                        z * vec.x - x * vec.z,
                        x * vec.y - y * vec.x);
    }
};
I shouldn't be answering my own question, but I found what I needed, I guess. It may help other people too. I was just browsing Wikipedia's physics section and found this.
This link answered my question.
The angle in a Cartesian system can be found this way:
arctan((Ya-Yb)/(Xa-Xb))
This works because you have a right triangle where you know the two legs (the differences in heights and widths), which gives you the tangent, so arctan gives you the angle that has this tangent.
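A quick Python check of the same idea (my own sketch; math.atan2 also handles Xa = Xb and picks the correct quadrant, which plain arctan of the ratio does not):

import math

xa, ya = 4.0, 3.0
xb, yb = 1.0, 1.0
angle = math.atan2(ya - yb, xa - xb)   # same as arctan((ya-yb)/(xa-xb)) here, but safe for all quadrants
print(math.degrees(angle))             # -> about 33.69 degrees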
I hope I was helpful.

Discrete Wavelet Transform on images and watermark embedding in LL band coefficients, data is lost when IDWT-DWT is performed again?

I'm writing an image watermarking system to hide a watermark in an image's low frequency band by transforming the image's luminance channel with a Discrete Wavelet Transform, then modifying coefficients in the LL band of the DWT output. I then do an Inverse DWT and rebuild my image.
The problem I'm having is that when I modify coefficients in the DWT output, then inverse-DWT, and then DWT again, the modified coefficients are radically different.
For example, one of the output coefficients in the LL band of the 2-scale DWT was -0.10704, I modified this coefficient to be 16.89, then performed the IDWT on my data. I then took the output of the IDWT and performed a DWT on it again, and my coefficient which was modified to be 16.89 became 0.022.
I'm fairly certain that the DWT and IDWT code is correct because I've tested it against other libraries and the output from each transform matches when the filter coefficients and other parameters are the same. (Within what can be expected due to rounding error)
The main problem may be that I don't understand the DWT all that well; I thought DWT and IDWT were supposed to be reasonably lossless (aside from rounding error and such), yet this doesn't seem to be the case here.
I'm hoping someone more familiar with the transform can point me at a possible issue. Is it possible that, because the coefficients in my other subbands (LH, HL, HH) for that position are insignificant, I'm losing data? If so, how can I determine which coefficients this may happen to?
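One quick experiment (a sketch using PyWavelets rather than the custom DWT code used here, so treat it only as an illustration) is to check whether a modified LL coefficient survives the round trip in pure floating point, and whether it stops surviving once the reconstructed image is rounded and clipped to 8-bit pixel values:

import numpy as np
import pywt

img = np.random.rand(64, 64) * 255.0
coeffs = pywt.wavedec2(img, 'haar', level=2)   # coeffs[0] is the coarsest LL band
coeffs[0][0, 0] = 16.89                        # modify one LL-band coefficient

rec_float = pywt.waverec2(coeffs, 'haar')
back_float = pywt.wavedec2(rec_float, 'haar', level=2)
print(back_float[0][0, 0])                     # ~16.89: the float round trip is lossless

rec_pixels = np.clip(np.round(rec_float), 0, 255)   # what storing an 8-bit image does
back_pixels = pywt.wavedec2(rec_pixels, 'haar', level=2)
print(back_pixels[0][0, 0])                    # differs: rounding/clipping the pixels perturbs the coefficient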
My embedding function is below. Coefficients are chosen in the LL band; "strong" is true if the absolute value of the LH, HH, or HL coefficient at the selected location is larger than the mean of the corresponding subband.
//If this evaluates to true, then the texture is considered strong.
if ((Math.Abs(LH[i][w]) >= LHmean) || (Math.Abs(HL[i][w]) >= HLmean) || (Math.Abs(HH[i][w]) >= HHmean))
static double MarkCoeff(int index, double coeff, bool strong)
{
    int q1 = 16;
    int q2 = 8;
    int quantizestep = 0;
    byte watermarkbit = binaryWM[index];
    if (strong)
        quantizestep = q1;
    else
        quantizestep = q2;
    coeff /= (double)quantizestep;
    double coeffdiff = 0;
    if (coeff > 0.0)
        coeffdiff = coeff - (int)coeff;
    else
        coeffdiff = coeff + (int)coeff;
    if (1 == ((int)coeff % 2))
    {
        //odd
        if (watermarkbit == 0)
        {
            if (Math.Abs(coeffdiff) > 0.5)
                coeff += 1.0;
            else
                coeff -= 1.0;
        }
    }
    else
    {
        //even
        if (watermarkbit == 1)
        {
            if (Math.Abs(coeffdiff) > 0.5)
                coeff += 1.0;
            else
                coeff -= 1.0;
        }
    }
    coeff *= (double)quantizestep;
    return coeff;
}

How do I set output flags for ALU in "Nand to Tetris" course?

Although I tagged this homework, it is actually for a course which I am doing on my own for free. Anyway, the course is called "From Nand to Tetris" and I'm hoping someone here has seen or taken the course so I can get some help. I am at the stage where I am building the ALU with the supplied hdl language. My problem is that I can't get my chip to compile properly. I am getting errors when I try to set the output flags for the ALU. I believe the problem is that I can't subscript any intermediate variable, since when I just try setting the flags to true or false based on some random variable (say an input flag), I do not get the errors. I know the problem is not with the chips I am trying to use since I am using all builtin chips.
Here is my ALU chip so far:
/**
* The ALU. Computes a pre-defined set of functions out = f(x,y)
* where x and y are two 16-bit inputs. The function f is selected
* by a set of 6 control bits denoted zx, nx, zy, ny, f, no.
* The ALU operation can be described using the following pseudocode:
* if zx=1 set x = 0 // 16-bit zero constant
* if nx=1 set x = !x // Bit-wise negation
* if zy=1 set y = 0 // 16-bit zero constant
* if ny=1 set y = !y // Bit-wise negation
* if f=1 set out = x + y // Integer 2's complement addition
* else set out = x & y // Bit-wise And
* if no=1 set out = !out // Bit-wise negation
*
* In addition to computing out, the ALU computes two 1-bit outputs:
* if out=0 set zr = 1 else zr = 0 // 16-bit equality comparison
* if out<0 set ng = 1 else ng = 0 // 2's complement comparison
*/
CHIP ALU {
IN // 16-bit inputs:
x[16], y[16],
// Control bits:
zx, // Zero the x input
nx, // Negate the x input
zy, // Zero the y input
ny, // Negate the y input
f, // Function code: 1 for add, 0 for and
no; // Negate the out output
OUT // 16-bit output
out[16],
// ALU output flags
zr, // 1 if out=0, 0 otherwise
ng; // 1 if out<0, 0 otherwise
PARTS:
// Zero the x input
Mux16( a=x, b=false, sel=zx, out=x2 );
// Zero the y input
Mux16( a=y, b=false, sel=zy, out=y2 );
// Negate the x input
Not16( in=x, out=notx );
Mux16( a=x, b=notx, sel=nx, out=x3 );
// Negate the y input
Not16( in=y, out=noty );
Mux16( a=y, b=noty, sel=ny, out=y3 );
// Perform f
Add16( a=x3, b=y3, out=addout );
And16( a=x3, b=y3, out=andout );
Mux16( a=andout, b=addout, sel=f, out=preout );
// Negate the output
Not16( in=preout, out=notpreout );
Mux16( a=preout, b=notpreout, sel=no, out=out );
// zr flag
Or8way( in=out[0..7], out=zr1 ); // PROBLEM SHOWS UP HERE
Or8way( in=out[8..15], out=zr2 );
Or( a=zr1, b=zr2, out=zr );
// ng flag
Not( in=out[15], out=ng );
}
So the problem shows up when I am trying to send a subscripted version of 'out' to the Or8Way chip. I've tried using a different variable than 'out', but with the same problem. Then I read that you are not able to subscript intermediate variables. I thought maybe if I sent the intermediate variable to some other chip, and that chip subscripted it, it would solve the problem, but it has the same error. Unfortunately I just can't think of a way to set the zr and ng flags without subscripting some intermediate variable, so I'm really stuck!
Just so you know, if I replace the problematic lines with the following, it will compile (but not give the right results since I'm just using some random input):
// zr flag
Not( in=zx, out=zr );
// ng flag
Not( in=zx, out=ng );
Anyone have any ideas?
Edit: Here is the appendix of the book for the course which specifies how the hdl works. Specifically look at section 5 which talks about buses and says: "An internal pin (like v above) may not be subscripted".
Edit: Here is the exact error I get: "Line 68, Can't connect gate's output pin to part". The error message is sort of confusing though, since that does not seem to be the actual problem. If I just replace "Or8way( in=out[0..7], out=zr1 );" with "Or8way( in=false, out=zr1 );" it will not generate this error, which is what led me to look in the appendix and find that the out variable, since it was derived as an intermediate, could not be subscripted.
For anyone else interested, the solution the emulator supports is to use multiple outputs
Something like:
Mux16( a=preout, b=notpreout, sel=no, out=out,out=preout2,out[15]=ng);
This is how I did the ALU:
CHIP ALU {
IN // 16-bit inputs:
x[16], y[16],
// Control bits:
zx, // Zero the x input
nx, // Negate the x input
zy, // Zero the y input
ny, // Negate the y input
f, // Function code: 1 for add, 0 for and
no; // Negate the out output
OUT // 16-bit output
out[16],
// ALU output flags
zr, // 1 if out=0, 0 otherwise
ng; // 1 if out<0, 0 otherwise
PARTS:
Mux16(a=x, b=false, sel=zx, out=M16x);
Not16(in=M16x, out=Nx);
Mux16(a=M16x, b=Nx, sel=nx, out=M16M16x);
Mux16(a=y, b=false, sel=zy, out=M16y);
Not16(in=M16y, out=Ny);
Mux16(a=M16y, b=Ny, sel=ny, out=M16M16y);
And16(a=M16M16x, b=M16M16y, out=And16);
Add16(a=M16M16x, b=M16M16y, out=Add16);
Mux16(a=And16, b=Add16, sel=f, out=F16);
Not16(in=F16, out=NF16);
Mux16(a=F16, b=NF16, sel=no, out=out, out[15]=ng, out[0..7]=zout1, out[8..15]=zout2);
Or8Way(in=zout1, out=zr1);
Or8Way(in=zout2, out=zr2);
Or(a=zr1, b=zr2, out=zr3);
Not(in=zr3, out=zr);
}
The solution as Pax suggested was to use an intermediate variable as input to another chip, such as Or16Way. Here is the code after I fixed the problem and debugged:
CHIP ALU {
IN // 16-bit inputs:
x[16], y[16],
// Control bits:
zx, // Zero the x input
nx, // Negate the x input
zy, // Zero the y input
ny, // Negate the y input
f, // Function code: 1 for add, 0 for and
no; // Negate the out output
OUT // 16-bit output
out[16],
// ALU output flags
zr, // 1 if out=0, 0 otherwise
ng; // 1 if out<0, 0 otherwise
PARTS:
// Zero the x input
Mux16( a=x, b=false, sel=zx, out=x2 );
// Zero the y input
Mux16( a=y, b=false, sel=zy, out=y2 );
// Negate the x input
Not16( in=x2, out=notx );
Mux16( a=x2, b=notx, sel=nx, out=x3 );
// Negate the y input
Not16( in=y2, out=noty );
Mux16( a=y2, b=noty, sel=ny, out=y3 );
// Perform f
Add16( a=x3, b=y3, out=addout );
And16( a=x3, b=y3, out=andout );
Mux16( a=andout, b=addout, sel=f, out=preout );
// Negate the output
Not16( in=preout, out=notpreout );
Mux16( a=preout, b=notpreout, sel=no, out=preout2 );
// zr flag
Or16Way( in=preout2, out=notzr );
Not( in=notzr, out=zr );
// ng flag
And16( a=preout2, b=true, out[15]=ng );
// Get final output
And16( a=preout2, b=preout2, out=out );
}
Have you tried:
// zr flag
Or8way(
in[0]=out[ 0], in[1]=out[ 1], in[2]=out[ 2], in[3]=out[ 3],
in[4]=out[ 4], in[5]=out[ 5], in[6]=out[ 6], in[7]=out[ 7],
out=zr1);
Or8way(
in[0]=out[ 8], in[1]=out[ 9], in[2]=out[10], in[3]=out[11],
in[4]=out[12], in[5]=out[13], in[6]=out[14], in[7]=out[15],
out=zr2);
Or( a=zr1, b=zr2, out=zr );
I don't know if this will work but it seems to make sense from looking at this document here.
I'd also think twice about using out as a variable name since it's confusing trying to figure out the difference between that and the keyword out (as in "out=...").
Following your edit, if you cannot subscript intermediate values, then it appears you will have to implement a separate "chip" such as IsZero16 which will take a 16-bit value as input (your intermediate out) and return one bit indicating its zero-ness that you can load into zr. Or you could make an IsZero8 chip but you'd have to then call it in two stages as you're currently doing with Or8Way.
This seems like a valid solution since you can subscript the input values to a chip.
And, just looking at the error, this may be a different problem to the one you suggest. The phrase "Can't connect gate's output pin to part" would mean to me that you're unable to connect signals from the output parameter back into the chips processing area. That makes sense from an electrical point of view.
You may find you have to store the output into a temporary variable and use that to both set zr and out (since once the signals have been "sent" to the chips output pins, they may no longer be available).
Can we try:
CHIP SetFlags16 {
IN inpval[16];
OUT zflag,nflag;
PARTS:
Or8way(in=inpval[0.. 7],out=zr0);
Or8way(in=inpval[8..15],out=zr1);
Or(a=zr0,b=zr1,out=zflag);
Not(in=inpval[15],out=nflag);
}
and then, in your ALU chip, use this at the end:
// Negate the output
Not16( in=preout, out=notpreout );
Mux16( a=preout, b=notpreout, sel=no, out=tempout );
// flags
SetFlags16(inpval=tempout,zflag=zr,nflag=ng);
// Transfer tempout to out (may be a better way).
Or16(a=tempout,b=tempout,out=out);
Here's another one, also with a new chip, but it feels cleaner:
/**
* Negator16 - negates the input 16-bit value if the selection flag is lit
*/
CHIP Negator16 {
IN sel,in[16];
OUT out[16];
PARTS:
Not16(in=in, out=negateIn);
Mux16(a=in, b=negateIn, sel=sel, out=out);
}
CHIP ALU {
// IN and OUT go here...
PARTS:
//Zero x and y if needed
Mux16(a=x, b[0..15]=false, sel=zx, out=x1);
Mux16(a=y, b[0..15]=false, sel=zy, out=y1);
//Create x1 and y1 negations if needed
Negator16(in=x1, sel=nx, out=x2);
Negator16(in=y1, sel=ny, out=y2);
//Create x&y and x+y
And16(a=x2, b=y2, out=andXY);
Add16(a=x2, b=y2, out=addXY);
//Choose between And/Add according to selection
Mux16(a=andXY, b=addXY, sel=f, out=res);
// negate if needed and also set negative flag
Negator16(in=res, sel=no, out=res1, out=out, out[15]=ng);
// set zero flag (or all bits and negate)
Or16Way(in=res1, out=nzr);
Not(in=nzr, out=zr);
}