PCoA (Principal *Coordinate* Analysis) in Accord.net

I've been trying to use PCA (Principal Component Analysis) in Accord.net, but it does not give me the correct results for PCoA.
Is there a way to achieve this without writing the algorithm myself?
var pca = new PrincipalComponentAnalysis()
{
    Method = PrincipalComponentMethod.Standardize,
    Whiten = true
};
MultivariateLinearRegression transform = pca.Learn(distances);
pca.NumberOfOutputs = 2;
double[][] output = pca.Transform(distances);
Note that the "distances" matrix is an NxN (1 - correlation) matrix of the N time series I get as input.
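For reference, classical PCoA (also known as classical multidimensional scaling) does not run PCA on the distance matrix directly. It double-centres the squared distances and eigendecomposes the result. The following is a minimal NumPy sketch of those steps, not Accord.NET code; the function name `pcoa` and the example data are hypothetical, and negative eigenvalues (which can occur for non-Euclidean distances) are simply clipped to zero.

```python
import numpy as np

def pcoa(distances, n_components=2):
    """Classical PCoA (metric MDS) of a symmetric distance matrix."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J, J = I - 11^T / n.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # B is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Assumption of this sketch: negative eigenvalues are clipped to zero.
    return eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))

# Hypothetical example: four corners of a unit square, given only their distances.
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
distances = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = pcoa(distances, n_components=2)
```

The returned coordinates reproduce the input distances up to rotation and reflection. The same three steps (square, double-centre, eigendecompose) could be ported to C# with any symmetric eigenvalue routine.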

Related

How to use the 'sphereize data' option with PCA in TensorFlow

I have used PCA with the 'Sphereize data' option on the following page successfully: https://projector.tensorflow.org/
I wonder how to run the same computation locally using the TensorFlow API. I found the PCA documentation in the API documentation, but I am not sure if sphereizing the data is available somewhere in the API too?
The "sphereize data" option normalizes the data by shifting each point by the centroid and scaling it to unit norm.
Here is the code used in TensorBoard (in TypeScript):
normalize() {
  // Compute the centroid of all data points.
  let centroid = vector.centroid(this.points, (a) => a.vector);
  if (centroid == null) {
    throw Error('centroid should not be null');
  }
  // Shift all points by the centroid and make them unit norm.
  for (let id = 0; id < this.points.length; ++id) {
    let dataPoint = this.points[id];
    dataPoint.vector = vector.sub(dataPoint.vector, centroid);
    if (vector.norm2(dataPoint.vector) > 0) {
      // If we take the unit norm of a vector of all 0s, we get a vector of
      // all NaNs. We prevent that with a guard.
      vector.unit(dataPoint.vector);
    }
  }
}
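You can reproduce that normalization using the following Python function:
import tensorflow as tf
def sphereize_data(x):
    """
    x is a 2D Tensor of shape (num_vectors, dim_vectors)
    """
    # Shift every vector by the centroid of all vectors.
    centered = x - tf.reduce_mean(x, axis=0, keepdims=True)
    # Scale each vector (row) to unit norm, matching the per-point loop above.
    # div_no_nan returns 0 for rows whose norm is 0.
    return tf.math.div_no_nan(centered, tf.norm(centered, axis=1, keepdims=True))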
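As a quick way to check the behaviour described above (shift by the centroid, then scale each point to unit norm, leaving zero vectors untouched), here is a small NumPy sketch of the same normalization. It is an illustration under those assumptions, not TensorBoard's code; the function name `sphereize` and the sample data are hypothetical.

```python
import numpy as np

def sphereize(x):
    """Centre the rows of x on their centroid, then scale each row to unit norm."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    # Rows that coincide with the centroid have zero norm and are left as zeros.
    return np.divide(centered, norms, out=np.zeros_like(centered), where=norms > 0)

# Hypothetical data whose centroid is (2, 0); the last point sits on the centroid.
data = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0], [2.0, -3.0], [2.0, 0.0]])
sphered = sphereize(data)
row_norms = np.linalg.norm(sphered, axis=1)
```

Every row of `sphered` has norm 1 except the row that sat exactly on the centroid, which stays at zero instead of becoming NaN.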

How to read parameters of layers of .tflite model in python

I am trying to read a .tflite model and extract the parameters of all its layers.
My steps:
I generated the FlatBuffers Python representation of the model schema by running (build flatc first):
flatc --python tensorflow/tensorflow/lite/schema/schema.fbs
The result is a tflite/ folder that contains the layer description files (*.py) and some utility files.
I successfully loaded the model:
# In case of an ImportError, set PYTHONPATH to point to the folder that contains tflite/
from tflite.Model import Model
def read_tflite_model(file):
    # Read the whole flatbuffer file into memory.
    with open(file, "rb") as f:
        buf = bytearray(f.read())
    model = Model.GetRootAsModel(buf, 0)
    return model
I extracted some of the model and node parameters, but got stuck while iterating over the nodes:
Model part:
def print_model_info(model):
    version = model.Version()
    print("Model version:", version)
    description = model.Description().decode('utf-8')
    print("Description:", description)
    subgraph_len = model.SubgraphsLength()
    print("Subgraph length:", subgraph_len)
Nodes part:
def print_nodes_info(model):
    # what does this 0 mean? should it always be zero?
    subgraph = model.Subgraphs(0)
    operators_len = subgraph.OperatorsLength()
    print('Operators length:', operators_len)
    from collections import deque
    nodes = deque(subgraph.InputsAsNumpy())
    STEP_N = 0
    MAX_STEPS = operators_len
    print("Nodes info:")
    while len(nodes) != 0 and STEP_N <= MAX_STEPS:
        print("MAX_STEPS={} STEP_N={}".format(MAX_STEPS, STEP_N))
        print("-" * 60)
        node_id = nodes.pop()
        print("Node id:", node_id)
        tensor = subgraph.Tensors(node_id)
        print("Node name:", tensor.Name().decode('utf-8'))
        print("Node shape:", tensor.ShapeAsNumpy())
        # which type is it? what does it mean?
        type_of_tensor = tensor.Type()
        print("Tensor type:", type_of_tensor)
        quantization = tensor.Quantization()
        min = quantization.MinAsNumpy()
        max = quantization.MaxAsNumpy()
        scale = quantization.ScaleAsNumpy()
        zero_point = quantization.ZeroPointAsNumpy()
        print("Quantization: ({}, {}), s={}, z={}".format(min, max, scale, zero_point))
        # I do not understand it again. what is j, that I set to 0 here?
        operator = subgraph.Operators(0)
        for i in operator.OutputsAsNumpy():
            nodes.appendleft(i)
        STEP_N += 1
    print("-"*60)
Please point me to documentation or an example of using this API.
My problems are:
I cannot find any documentation for this API.
Iterating over Tensor objects does not seem possible, because they have no Inputs or Outputs methods. I also do not understand what j means in subgraph.Operators(j=0). Because of that, my loop visits only two nodes: the input (once) and then the next one over and over again.
Iterating over Operator objects is certainly possible.
Here we iterate over all of them, but I do not understand how to map between Operator and Tensor:
def print_in_out_info_of_all_operators(model):
    # what does this 0 mean? should it always be zero?
    subgraph = model.Subgraphs(0)
    for i in range(subgraph.OperatorsLength()):
        operator = subgraph.Operators(i)
        print('Outputs', operator.OutputsAsNumpy())
        print('Inputs', operator.InputsAsNumpy())
I do not understand how to pull the parameters out of an Operator object. The BuiltinOptions method gives me a Table object, and I do not know what to map it to.
subgraph = model.Subgraphs(0)
What does this 0 mean? Should it always be zero? Obviously not, but what is it? The id of the subgraph? If so, I'm happy. If not, please explain it.
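As far as the TFLite schema (schema.fbs) goes, the argument of Subgraphs(...) and Operators(...) is a position in the corresponding list, and the values returned by an operator's InputsAsNumpy() and OutputsAsNumpy() are indices into the tensors of the same subgraph, with -1 marking an absent optional input. Under those assumptions, the operator-to-tensor mapping can be sketched as below. It uses only the generated accessors that already appear in this question; the function name `describe_operators` is hypothetical.

```python
def describe_operators(subgraph):
    """Return (operator index, input tensor names, output tensor names) per operator."""
    def tensor_names(indices):
        # Operator inputs/outputs are indices into subgraph.Tensors(...);
        # -1 is assumed to mark an optional input that is absent.
        return [subgraph.Tensors(int(i)).Name().decode("utf-8")
                for i in indices if i >= 0]

    rows = []
    for j in range(subgraph.OperatorsLength()):
        operator = subgraph.Operators(j)  # j is the position in the operator list
        rows.append((j,
                     tensor_names(operator.InputsAsNumpy()),
                     tensor_names(operator.OutputsAsNumpy())))
    return rows
```

A call such as `describe_operators(model.Subgraphs(0))` would then list, for every operator, which tensors it reads and writes, which replaces the queue-based traversal with a plain loop over operator positions.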

Keras .predict with word embeddings back to string

I'm following the tutorial here: https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html, using a different data set. I'm trying to predict the label for a new random string.
I'm doing the labelling a bit differently:
encoder = LabelEncoder()
encoder.fit(labels)
encoded_Y = encoder.transform(labels)
dummy_y = np_utils.to_categorical(encoded_Y)
And then trying to predict like:
string = "I am a cat"
query = tokenizer.texts_to_sequences(string)
query = pad_sequences(query, maxlen=50)
prediction = model.predict(query)
print(prediction)
I get back an array of arrays like below (perhaps the word embeddings?). What are those and how can I translate them back to a string?
[[ 0.03039312 0.02099193 0.02320454 0.02183384 0.01965107 0.01830118
0.0170384 0.01979697 0.01764384 0.02244077 0.0162186 0.02672437
0.02190582 0.01630476 0.01388928 0.01655456 0.011678 0.02256939
0.02161663 0.01649982 0.02086013 0.0161493 0.01821378 0.01440909
0.01879989 0.01217389 0.02032642 0.01405699 0.01393504 0.01957162
0.01818203 0.01698637 0.02639499 0.02102267 0.01956343 0.01588933
0.01635705 0.01391534 0.01587612 0.01677094 0.01908684 0.02032183
0.01798265 0.02017053 0.01600159 0.01576616 0.01373934 0.01596323
0.01386674 0.01532488 0.01638312 0.0172212 0.01432543 0.01893282
0.02020231]
Save the fitted labels in the encoder:
encoder = LabelEncoder()
encoder = encoder.fit(labels)
encoded_Y = encoder.transform(labels)
dummy_y = np_utils.to_categorical(encoded_Y)
predict_classes gives you a vector of class indices, and inverse_transform maps those indices back to the labels from your original input:
prediction = model.predict_classes(query)
label = encoder.inverse_transform(prediction)
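The array printed in the question is most likely one score per class (55 values, presumably softmax probabilities), not word embeddings. In Keras versions that do not provide predict_classes, the same result can be obtained by taking the argmax of each row of model.predict and mapping the index back to a label. Here is a minimal sketch with NumPy only; the labels and scores are made up, and `decode_predictions` is a hypothetical helper.

```python
import numpy as np

def decode_predictions(probabilities, classes):
    """Pick the most probable class per row and map its index back to a label."""
    probabilities = np.asarray(probabilities)
    class_indices = np.argmax(probabilities, axis=-1)
    return [classes[i] for i in class_indices]

# Hypothetical labels; LabelEncoder keeps the sorted unique labels in encoder.classes_.
classes = sorted({"cat", "dog", "bird"})
# Hypothetical output of model.predict for two inputs and three classes.
prediction = np.array([[0.2, 0.7, 0.1],
                       [0.6, 0.3, 0.1]])
labels = decode_predictions(prediction, classes)
```

With a fitted LabelEncoder, `encoder.inverse_transform(np.argmax(prediction, axis=-1))` performs the same index-to-label mapping.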

org.apache.commons.math3.transform FastFourierTransformer returns different value when input is Complex[] and Double[]

For this question, I'm using the Apache Commons Math library.
My aim is to get my input back after performing an inverse Fourier transform on the absolute values of the results of the forward Fourier transform of the input.
When I instead perform the inverse Fourier transform on the Complex results of the forward transform, I get the correct output.
What could I be doing wrong?
import org.apache.commons.math3.complex.Complex;
import org.apache.commons.math3.transform.DftNormalization;
import org.apache.commons.math3.transform.FastFourierTransformer;
import org.apache.commons.math3.transform.TransformType;
public void fourierTestTemp() {
    double[] input = new double[]{1,0,0,0,0,0,0,66,888,0,0,0,0,0,0,0}; // Length = 16
    double[] result = new double[input.length]; // Holds the absolute values of the transform results
    FastFourierTransformer transformer = new FastFourierTransformer(DftNormalization.UNITARY); // The FastFourierTransformer class by Apache
    // Apply the forward Fourier transform to the input array.
    Complex[] complx = transformer.transform(input, TransformType.FORWARD);
    // Go through the Complex results and obtain their absolute values.
    for (int i = 0; i < complx.length; i++) {
        result[i] = complx[i].abs();
    }
    // Perform the inverse transform on the absolute values from the forward transform.
    complx = transformer.transform(result, TransformType.INVERSE);
    // Go through the Complex results and obtain their absolute values.
    for (int i = 0; i < complx.length; i++) {
        result[i] = complx[i].abs();
    }
    // Print the results.
    for (int i = 0; i < result.length; i++) {
        System.out.print(result[i] + ",");
    }
}
ifft(abs(fft(x))) is only the identity if x is strictly symmetric (can be constructed out of only cosine basis vectors of the DFT). Your test vector is not.
Cosines are symmetric functions. Sines are anti-symmetric.
If x is not symmetric, fft(x) is not purely real, so the abs() function discards the phase of the complex results, which distorts the ifft output waveform.
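The effect is easy to reproduce outside Java. The sketch below uses NumPy's FFT rather than Apache Commons Math (the normalization convention differs, but that does not affect the round trip), with the test vector from the question and a hypothetical symmetric vector whose spectrum is real and non-negative.

```python
import numpy as np

def roundtrip_through_magnitude(x):
    """Forward FFT, keep only the magnitudes, then inverse FFT."""
    spectrum = np.fft.fft(x)
    return np.fft.ifft(np.abs(spectrum)).real

# The asymmetric input from the question.
asymmetric = np.array([1, 0, 0, 0, 0, 0, 0, 66, 888, 0, 0, 0, 0, 0, 0, 0], dtype=float)
# A symmetric input (x[n] == x[N - n]); its spectrum is 4 + 2*cos(2*pi*k/8) > 0.
symmetric = np.array([4, 1, 0, 0, 0, 0, 0, 1], dtype=float)

# Keeping the complex spectrum recovers the input.
full_roundtrip_ok = np.allclose(np.fft.ifft(np.fft.fft(asymmetric)).real, asymmetric)
# Dropping the phase only works for the symmetric vector.
asymmetric_ok = np.allclose(roundtrip_through_magnitude(asymmetric), asymmetric)
symmetric_ok = np.allclose(roundtrip_through_magnitude(symmetric), symmetric)
```

Here `full_roundtrip_ok` and `symmetric_ok` are true while `asymmetric_ok` is false: once the phase is discarded, the inverse transform of the question's vector is a symmetric sequence that no longer matches the input.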

How to get an importance of a class using random forest?

I am using the randomForest package on my dataset to do a classification, but with the importance command I only get the importance of variables. What if I want the importance of specific categories of a variable? For example, for a specific location in a region variable, how much does that location contribute to the total? I thought of transforming every class into a dummy variable, but I don't know if this is really a good idea.
I think you mean "variable importance by specific categories of variables". That has not been implemented, but I guess it would be possible, meaningful and perhaps useful. Of course it would not be meaningful for variables with only two categories.
I would implement it something like:
Train model -> compute out-of-bag prediction performance (OOB-cv1) -> permute a specific category of a specific variable (reassign this category randomly to other categories, weighted by other category prevalence) -> re-compute out-of-bag prediction performance (OOB-cv2) -> subtract OOB-cv1 from OOB-cv2
I then wrote a function implementing category-specific variable importance.
library(randomForest)
#Create some classification problem, with mixed categorical and numeric vars
#Cat A of var 1, cat B of var 2 and Cat C of var 3 influence class the most.
X.cat = replicate(3,sample(c("A","B","C"),600,rep=T))
X.val = replicate(2,rnorm(600))
y.cat = 3*(X.cat[,1]=="A") + 3*(X.cat[,2]=="B") + 3*(X.cat[,3]=="C")
y.cat.err = y.cat+rnorm(600)
y.lim = quantile(y.cat.err,c(1/3,2/3))
y.class = apply(replicate(2,y.cat.err),1,function(x) sum(x>y.lim)+1)
y.class = factor(y.class,labels=c("ann","bob","chris"))
X.full = data.frame(X.cat,X.val)
X.full[1:3] = lapply(X.full[1:3],as.factor)
#train forest
rf=randomForest(X.full,y.class,keep.inbag=T,replace=T)
#make function to compute cross-validated classification error
oobErr = function(rf,X) {
  preds = predict(rf,X,type="vote",predict.all = T)$individual
  preds[rf$inbag!=0]=NA
  oob.pred = apply(preds,1,function(x) {
    tabx=sort(table(x),dec=T)
    majority.vote = names(tabx)[1]
  })
  return(mean(as.character(rf$y)!=oob.pred))
}
#make function to iterate all categories of categorical variables
#and compute change of OOB class error due to permutation of category
catVar = function(rf,X,nPerm=2) {
  ref = oobErr(rf,X)
  catVars = which(rf$forest$ncat>1)
  lapply(catVars, function(iVar) {
    catImp = replicate(nPerm,{
      sapply(levels(X[[iVar]]), function(thisCat) {
        thisCat.ind = which(thisCat==X[[iVar]])
        X[thisCat.ind,iVar] = head(sample(X[[iVar]]),length(thisCat.ind))
        varImp = oobErr(rf,X)-ref
      })
    })
    if(nPerm==1) catImp else apply(catImp,1,mean)
  })
}
#try it out
out = catVar(rf,X.full,nPerm=4)
print(out) #seems like it works as it should
$X1
A B C
0.14000 0.07125 0.06875
$X2
A B C
0.07458333 0.16083333 0.07666667
$X3
A B C
0.05333333 0.08083333 0.15375000