Federated learning with tensorflowjs - tensorflow

I am implementing federated learning with TensorFlow.js, but I am stuck on the federated averaging step. The idea is simple: collect updated weights from multiple clients and average them on the server.
I have trained a model in the browser, obtained the updated weights via model.getWeights(), and sent them to the server for averaging.
// get weights from multiple clients (happens client-side)
w1 = model.getWeights(); //weights from client 1
w2 = model.getWeights(); //weights from client 2
//calculate average of the weights(server-side)
var mean_weights = [];
let length = w1.length; // all weight arrays have the same length
for (var i = 0; i < length; i++) {
  let sum = w1[i].add(w2[i]);
  let mean = sum.divide(2); // got confused here, how to calculate mean of tensors ??
  mean_weights.push(mean);
}
//apply updates to the model(both client-side and server-side)
model.setWeights(mean_weights);
So my questions are:
How do I calculate the mean of an array of tensors?
Also, is this the right approach to federated averaging with TensorFlow.js?

Yes, but be careful. You can average two tensors with tf.mean, as https://stackoverflow.com/users/5069957/edkeveked said. However, JavaScript has no keyword arguments, so axis=0 should be written as just 0.
Here is his code rewritten another way:
const x = tf.tensor([1, 2, 3, 2, 3, 4], [2, 3]);
x.mean(0).print()
However, you asked whether you're doing it right, and that depends on whether you're averaging as you go. A rolling average has a problem.
Example:
If you average (10, 20) and then average the result with 30, you get 22.5. If you average (20, 30) and then 10, you get 17.5. Both differ from averaging all three at once, which gives 20.
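The arithmetic above can be checked with a few lines of plain Python (a minimal sketch; pairwise_average is just an illustrative helper name, not part of any library):

```python
def pairwise_average(a, b):
    """Plain unweighted average of two values."""
    return (a + b) / 2

# Averaging as you go: the latest value always gets half of the total weight.
rolling_1 = pairwise_average(pairwise_average(10, 20), 30)  # (15 + 30) / 2
rolling_2 = pairwise_average(pairwise_average(20, 30), 10)  # (25 + 10) / 2

# Averaging everything at once: each value gets weight 1/3.
all_at_once = sum([10, 20, 30]) / 3

print(rolling_1, rolling_2, all_at_once)  # 22.5 17.5 20.0
```

The rolling results differ because whichever value arrives last always receives weight 1/2, no matter how many values came before it.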
Once an average has been calculated, it no longer records how many values went into it, so averaging averages is order-dependent. It's the division at each step that removes the associative property. So you'll need to either:
A: Store all model weights and calculate a new average each time based on all previous models
or
B: Add a weighting system to your federated average so more recent models do not significantly affect the system.
Which makes sense?
I recommend B in the situation that you:
Don't want to or cannot store every model and weight ever submitted.
You know some models have seen more valid data, and should be weighted appropriately compared to blind models.
You can compute a weighted average by adjusting the denominator for your existing model vs. your incoming model.
In JavaScript you can do something simple like this to compute a weighted average between two values:
const modelVal1 = 0
const modelVal2 = 1
const weight1 = 0.5
const weight2 = 1 - weight1
const average = (modelVal1 * weight1) + (modelVal2 * weight2)
The above code is your common evenly weighted average, but as you adjust the weight1, you are rebalancing the scales to significantly adjust the outcome in favor of modelVal1 or modelVal2.
Obviously, you'll need to convert the JavaScript I have shown into tensor mathematical functions, but that's trivial.
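One possible weighting scheme, sketched here in plain Python rather than TensorFlow.js, gives the incoming model weight 1 / (n + 1) when n models have already been folded in. Under that assumption the running result equals the mean of everything seen so far, whatever the arrival order. The function names and the flat-list layout of the weights are hypothetical, chosen only for illustration:

```python
def running_mean_update(current_avg, incoming, num_seen):
    """Fold one more weight vector into an average of `num_seen` earlier ones.

    The incoming vector gets weight 1 / (num_seen + 1) and the existing
    average keeps num_seen / (num_seen + 1).
    """
    incoming_weight = 1.0 / (num_seen + 1)
    return [avg + (new - avg) * incoming_weight
            for avg, new in zip(current_avg, incoming)]

def federated_mean(client_weights):
    """Average a sequence of equally sized weight vectors one at a time."""
    avg = list(client_weights[0])
    for num_seen, weights in enumerate(client_weights[1:], start=1):
        avg = running_mean_update(avg, weights, num_seen)
    return avg

# The same three single-value "models" in two different arrival orders.
print(federated_mean([[10.0], [20.0], [30.0]]))
print(federated_mean([[30.0], [10.0], [20.0]]))
```

Replacing 1 / (num_seen + 1) with a fixed or decaying factor would give other weighting schemes, for example one that limits how much a single new model can move the average.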
Iterate averaging (or a weighted average) with decaying weights is often used in federated learning. See "Iterate averaging as regularization for stochastic gradient descent" and "Server Averaging for Federated Learning".

To calculate the mean of 2 tensors, you can use tf.mean
const x = tf.tensor1d([1, 2, 3]);
const y = tf.tensor1d([2, 3, 4]);
tf.stack([x, y]).print()
const mean = tf.stack([x, y]).mean(axis=0)
mean.print();

Related

How can I find a standard method of predicting next values of a stock market using Tensorflow?

Thank you for reading. English is not my first language.
I am wondering how to predict future time series values after training a model. I would like to get the values N steps ahead.
I also wonder whether the time series has been properly learned and predicted.
How do I correctly get the next value?
I want to get the next value using something like model.predict.
I have x_test with x_test[-1] == t, so the next values are t+1, t+2, ..., t+n.
In this example I want to get predictions for t+1, t+2, ..., t+n.
First
I tried using stock index data:
inputs = total_data[len(total_data) - forecast - look_back:]
inputs = scaler.transform(inputs)
X_test = []
for i in range(look_back, inputs.shape[0]):
    X_test.append(inputs[i - look_back:i])
X_test = np.array(X_test)
predicted = model.predict(X_test)
but the result is as shown below.
The results from X_test[-20:] and the following 20 predictions look the same.
I'm wondering whether the training and the predicted values are correct.
full source
The first method I tried did not work correctly.
Second
I realized something was wrong, so I tried another, official dataset.
I used the time series from the TensorFlow tutorial to practice predicting with the model.
a = y_val[-look_back:]
for i in range(n_steps):  # n_steps is a placeholder name: number of future values to predict
    tmp = model.predict(a.reshape(-1, look_back, num_feature))  # predicted value
    a = a[1:]  # remove first
    a = np.append(a, tmp)  # insert predicted value
The predictions came out as a straight line, like a linear regression, very different from the real data.
Output: a straight line that is independent of the real data:
full source (My code starts after line 25.)
I'm really curious what the standard method of predicting the next values of a stock market is.
Thank you for reading this long question. I would appreciate your advice.
Q : "How can I find a standard method of predicting next values of a stock market...?"
First - salutes to C64 practitioner!
Next, let me say, there is no standard method - there cannot be ( one ).
In principle - to draw on your field of shared experience - one can easily predict the near-future flow of laminar fluids ( a technically "working" market instrument is such a model A, for which one can derive a better or worse predictive tool ).
That will never work, however, for turbulent states of fluids ( just read about the complexity of the attempts to formulate the many-dimensional high-order PDE for turbulence, which still only approximates it ) -- and this is the fundamentally "working" market: after some expected fundamental factor was released ( read NFP or CPI ) or some flash news was announced ( read a Swiss release of currency-bonding of CHF to some USD parity, or the Cyprus one-time state tax on all speculative deposits ), the financial Big Bangs follow.
So, please, do not expect one model, let alone a simple one, to give reasonably precise predictions for both the laminar and the turbulent regimes - the real world is for sure way more complex than this :o)
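As for the mechanics of the multi-step loop the question attempts (feeding each prediction back into the input window), here is a minimal sketch in plain Python. It says nothing about whether such forecasts are accurate for a market. recursive_forecast and linear_extrapolation are hypothetical names, and the toy predictor only stands in for something like model.predict:

```python
def recursive_forecast(predict_next, window, n_steps):
    """Predict n_steps values by feeding each prediction back into the window.

    predict_next: any callable mapping a list of look_back values to one value
                  (a stand-in for a trained model).
    window:       the last look_back observed values.
    """
    window = list(window)
    predictions = []
    for _ in range(n_steps):
        next_value = predict_next(window)
        predictions.append(next_value)
        # Slide the window: drop the oldest value, append the new prediction.
        window = window[1:] + [next_value]
    return predictions

# Toy stand-in predictor: continue the last observed difference.
def linear_extrapolation(window):
    return window[-1] + (window[-1] - window[-2])

forecast = recursive_forecast(linear_extrapolation, [1.0, 2.0, 3.0], 3)
print(forecast)
```

Because every step after the first consumes earlier predictions instead of real observations, errors can accumulate as the horizon grows.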

Normalized Mutual Information in Tensorflow

Is it possible to implement normalized mutual information in TensorFlow? I was wondering whether I can do that and whether I will be able to differentiate it. Let's say that I have predictions P and labels Y in two different tensors. Is there an easy way to use normalized mutual information?
I want to do something similar to this:
https://course.ccs.neu.edu/cs6140sp15/7_locality_cluster/Assignment-6/NMI.pdf
Assume your clustering method gives probability predictions/membership functions p(c|x), e.g., p(c=1|x) is the probability of x in the first cluster. Assume y is the ground truth class label for x.
The normalized mutual information is $\mathrm{NMI}(Y, C) = \frac{2\, I(Y; C)}{H(Y) + H(C)}$.
The entropy H(Y) can be estimated following this thread: https://stats.stackexchange.com/questions/338719/calculating-clusters-entropy-python
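By definition, the entropy $H(C)$ is $H(C) = -\sum_{c} p(c) \log p(c)$, where $p(c) = \int p(c|x)\, p(x)\, dx$.
The mutual information is $I(Y; C) = H(C) - H(C|Y)$, where $H(C|Y) = -\sum_{y} p(y) \sum_{c} p(c|y) \log p(c|y)$, and $p(c|y) = \int p(c|x)\, p(x|y)\, dx$.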
All terms involving integral can be estimated using sampling, i.e., average over training samples. The overall NMI is differentiable.
I did not misunderstand your question. I was assuming you used a neural network model which outputs logits as you did not provide any info. Then you need to normalise the logits to get p(c|x).
There may be other ways to estimate NMI, but if you discretize the output of whatever model you use, you cannot differentiate them.
TensorFlow code
Assume we have label matrix p_y_on_x and cluster predictions p_c_on_x. Each row of them corresponds to an observation x; each column corresponds to the probability of x in each class and cluster (so each row sums up to one). Further assume uniform probability for p(x) and p(x|y).
NMI can then be estimated as below:
p_y = tf.reduce_sum(p_y_on_x, axis=0, keepdims=True) / num_x # 1-by-num_y
h_y = -tf.reduce_sum(p_y * tf.math.log(p_y))
p_c = tf.reduce_sum(p_c_on_x, axis=0, keepdims=True) / num_x # 1-by-num_c
h_c = -tf.reduce_sum(p_c * tf.math.log(p_c))
p_x_on_y = p_y_on_x / num_x / p_y # num_x-by-num_y
p_c_on_y = tf.matmul(p_c_on_x, p_x_on_y, transpose_a=True) # num_c-by-num_y
h_c_on_y = -tf.reduce_sum(tf.reduce_sum(p_c_on_y * tf.math.log(p_c_on_y), axis=0) * p_y)
i_y_c = h_c - h_c_on_y
nmi = 2 * i_y_c / (h_y + h_c)
In practice, please be very careful that the probabilities are strictly positive: for a zero probability tf.math.log returns -inf, and 0 * -inf evaluates to NaN.
Please comment if you find any mistakes.
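To sanity-check the estimate on small inputs, the same computation can be sketched in plain Python. This is not the TensorFlow code above: it assumes uniform p(x) as the answer does, and it adopts the convention 0 * log(0) = 0 so that hard (one-hot) labels do not produce NaN. The helper names xlogx and nmi are hypothetical:

```python
import math

def xlogx(p):
    # Convention 0 * log(0) = 0, so zero probabilities do not produce NaN.
    return p * math.log(p) if p > 0.0 else 0.0

def nmi(p_y_on_x, p_c_on_x):
    """NMI = 2 * I(Y; C) / (H(Y) + H(C)) from per-sample probability rows."""
    num_x = len(p_y_on_x)
    num_y = len(p_y_on_x[0])
    num_c = len(p_c_on_x[0])
    # Marginals p(y) and p(c) under uniform p(x) = 1 / num_x.
    p_y = [sum(row[j] for row in p_y_on_x) / num_x for j in range(num_y)]
    p_c = [sum(row[k] for row in p_c_on_x) / num_x for k in range(num_c)]
    h_y = -sum(xlogx(p) for p in p_y)
    h_c = -sum(xlogx(p) for p in p_c)
    # H(C|Y) = -sum_y p(y) sum_c p(c|y) log p(c|y),
    # with p(c|y) = sum_x p(c|x) p(y|x) / (num_x * p(y)).
    h_c_on_y = 0.0
    for j in range(num_y):
        if p_y[j] == 0.0:
            continue  # a class that never occurs contributes nothing
        for k in range(num_c):
            joint = sum(pc[k] * py[j] for pc, py in zip(p_c_on_x, p_y_on_x))
            h_c_on_y -= p_y[j] * xlogx(joint / (num_x * p_y[j]))
    i_y_c = h_c - h_c_on_y
    return 2.0 * i_y_c / (h_y + h_c)

labels = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
uninformative = [[0.5, 0.5]] * 4
print(nmi(labels, labels), nmi(labels, uninformative))
```

Clusters that reproduce the labels exactly should score close to 1, and predictions that ignore x should score close to 0. The function divides by H(Y) + H(C), so it is undefined when both entropies are zero.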

K-Fold Cross-Validation How Many Folds?

Working with K-Fold Cross-Validation I commonly see 5 folds and 10 folds employed. A 1995 paper recommends 10 fold cv. However that conclusion was based on small datasets using models of that time.
I'm just wondering if the current use of 5 and 10 folds still harks back to that paper as a convention? Or are there other good reasons to use 5 or 10 folds rather than, say, 6, 8, 12, etc.?
This is just tradition. These are just nice numbers that people like and that divide many things evenly. They work out to nice fractions like 10% and 20% per fold. If you used 8, that would be 12.5% each. Not as nice a number, right?
It's possible that for your dataset another number works better, but it isn't worth the trouble to figure that out. If you tried to publish with 7-fold cross-validation, people would give you funny looks and become suspicious. Stick to the standards.
K-Fold Cross Validation is helpful when the performance of your model shows significant variance based on your train-test split.
Using 5 or 10 is neither a norm nor a rule. You can use as many folds as you like (K = 2, 3, 4, or an educated guess).
K-fold cross validation is used to solve problems where training data is limited.
I came across an example in a book (Francois Chollet's book, example shared below) where K = 4, so it depends on your requirements.
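import numpy as np

# data, test_data and get_model() are assumed to be defined elsewhere;
# train() and evaluate() are placeholders from the book's example.
k = 4
num_validation_samples = len(data) // k
np.random.shuffle(data)
validation_scores = []
for fold in range(k):
    validation_data = data[num_validation_samples * fold:
                           num_validation_samples * (fold + 1)]
    # Join the parts before and after the validation fold
    # (with NumPy arrays, + would add elementwise instead of joining).
    training_data = np.concatenate([
        data[:num_validation_samples * fold],
        data[num_validation_samples * (fold + 1):]])
    model = get_model()  # fresh, untrained model for each fold
    model.train(training_data)
    validation_score = model.evaluate(validation_data)
    validation_scores.append(validation_score)
validation_score = np.average(validation_scores)
model = get_model()
model.train(data)
test_score = model.evaluate(test_data)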
Three-fold validation Pictorial Description

weighted regression in SQL

I'm new to SQL, so I'm hoping someone can shed some light on this. We have a stored procedure in place that uses simple linear regression. Now I want to apply some weighting using a discount factor lambda, i.e. weights 1, lambda, lambda^2, ..., lambda^(n-1), where n is the length of the original series.
How should I generate the discounted weight series and apply to the current code structure below?
...
SUM((OASSpline-OASPriorSpline) * (AdjOASDolDur-AdjOASPriorDolDur))/SUM(SQUARE((AdjOASDolDur-AdjOASPriorDolDur))) as Beta, /* Beta = Sxy/Sxx */
SUM(SQUARE((AdjOASDolDur-AdjOASPriorDolDur))) as Sxx,
SUM((OASSpline-OASPriorSpline) * (AdjOASDolDur-AdjOASPriorDolDur)) as Sxy
...
e.g.
If I set the discount factor (lambda) = 0.99, my weighting array should be generated automatically using the length of my series (10):
OASSpline = [1.11,1.45,1.79, 2.14, 2.48, 2.81,3.13,3.42,3.70,5.49]
AdjOASDolDur = [0.75,1.06,1.39, 1.73, 2.10, 2.48,2.85,3.20,3.52,3.61]
OASPriorSpline = 5.49
AdjOASPriorDolDur = 5.61
Weight = [1,0.99,0.9801,0.970299,0.96059601,0.9509900, 0.941480149,0.932065348,0.922744694,0.913517247]
The weighted linear regression should return a beta of 0.81243398, while the current simple linear regression should return a beta of 0.81164174.
Thanks much in advance!
I'll take a stab.
You could look at this article dealing with generating sequence numbers, and then use the generated row number as an exponent. Does that work? I think a fair few are bamboozled by the request.
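To make the row-number-as-exponent idea concrete, here is a minimal sketch in plain Python rather than SQL. It assumes that "weighted" means multiplying each term of Sxy and Sxx by its weight, i.e. Beta = SUM(w*x*y) / SUM(w*x*x), which the question does not spell out. The function names and the toy data are hypothetical, and the series from the question is deliberately not reproduced here:

```python
def discounted_weights(discount, n):
    """Weights 1, discount, discount**2, ..., discount**(n - 1)."""
    return [discount ** i for i in range(n)]

def weighted_beta(xs, ys, weights):
    """Slope of a weighted least-squares line through the origin: Sxy / Sxx."""
    sxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * x * x for w, x in zip(weights, xs))
    return sxy / sxx

# Toy data: y is exactly 2 * x, so any positive weighting gives a slope of 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

weights = discounted_weights(0.99, len(xs))
beta_weighted = weighted_beta(xs, ys, weights)
beta_simple = weighted_beta(xs, ys, [1.0] * len(xs))  # all weights 1 = unweighted
print(weights, beta_weighted, beta_simple)
```

In the stored procedure, xs would correspond to AdjOASDolDur - AdjOASPriorDolDur and ys to OASSpline - OASPriorSpline. The per-row weight could presumably be computed as POWER(lambda, ROW_NUMBER() OVER (ORDER BY ...) - 1) and multiplied into both SUM expressions, with the ordering column depending on your table.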

Tensorflow: opt.compute_gradients() returns values different from the weight difference of opt.apply_gradients()

Question: What is the most efficient way to get the delta of my weights in a TensorFlow network?
Background: I've got the operators hooked up as follows (thanks to this SO question):
self.cost = `the rest of the network`
self.rmsprop = tf.train.RMSPropOptimizer(lr,rms_decay,0.0,rms_eps)
self.comp_grads = self.rmsprop.compute_gradients(self.cost)
self.grad_placeholder = [(tf.placeholder("float", shape=grad[1].get_shape(), name="grad_placeholder"), grad[1]) for grad in self.comp_grads]
self.apply_grads = self.rmsprop.apply_gradients(self.grad_placeholder)
Now, to feed in information, I run the following:
feed_dict = `training variables`
grad_vals = self.sess.run([grad[0] for grad in self.comp_grads], feed_dict=feed_dict)
feed_dict2 = `feed_dict plus gradient values added to self.grad_placeholder`
self.sess.run(self.apply_grads, feed_dict=feed_dict2)
The command run(self.apply_grads) updates the network weights, but when I compute the difference between the starting and ending weights (run(self.w1)), those numbers are different from what is stored in grad_vals[0]. I figure this is because the RMSPropOptimizer does more to the raw gradients, but I'm not sure what, or where to find out what it does.
So back to the question: How do I get the delta on my weights in the most efficient way? Am I stuck running self.w1.eval(sess) multiple times to get the weights and calculate the difference? Is there something that I'm missing with the tf.train.RMSPropOptimizer function?
Thanks!
RMSprop does not simply subtract the gradient from the parameters; it uses a more complicated formula involving a combination of:
a momentum term, if the corresponding parameter is not 0
a gradient step, rescaled non-uniformly (on each coordinate) by the square root of a running average of the squared gradient.
For more information you can refer to these slides or this recent paper.
The delta is first computed in memory by tensorflow in the slot variable 'momentum' and then the variable is updated (see the C++ operator).
Thus, you should be able to access it and construct a delta node with delta_w1 = self.rmsprop.get_slot(self.w1, 'momentum'). (I have not tried it yet.)
You can add the weights to the list of things to fetch each run call. Then you can compute the deltas outside of TensorFlow since you will have the iterates. This should be reasonably efficient, although it might incur an extra elementwise difference, but to avoid that you might have to hack around in the guts of the optimizer and find where it puts the update before it applies it and fetch that each step. Fetching the weights each call shouldn't do wasteful extra evaluations of part of the graph at least.
RMSProp does complicated scaling of the learning rate for each weight. Basically it divides the learning rate for a weight by a running average of the magnitudes of recent gradients of that weight.
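To see why the weight delta differs from the raw gradient, here is a minimal sketch of a single RMSProp step for one scalar weight in plain Python. It follows the commonly documented update rule (a running mean of squared gradients, then a momentum accumulator). The function name, hyperparameters and initial accumulator values are assumptions chosen for illustration, not values read from the question's optimizer:

```python
import math

def rmsprop_step(grad, mean_square, momentum_buf,
                 lr=0.1, decay=0.9, momentum=0.0, eps=1e-10):
    """One RMSProp update for a single scalar weight.

    Returns (delta, new_mean_square, new_momentum_buf), where delta is the
    amount added to the weight.
    """
    # Running average of the squared gradient.
    mean_square = decay * mean_square + (1.0 - decay) * grad * grad
    # Rescaled gradient step, accumulated with momentum.
    momentum_buf = momentum * momentum_buf + lr * grad / math.sqrt(mean_square + eps)
    return -momentum_buf, mean_square, momentum_buf

grad = 2.0
# Initial accumulator values are assumptions chosen for illustration.
delta, ms, mom = rmsprop_step(grad, mean_square=1.0, momentum_buf=0.0)
print(delta, -0.1 * grad)  # RMSProp delta vs. a plain gradient-descent delta
```

Even with momentum set to 0, the delta is the gradient divided by the root of the running mean square and scaled by the learning rate, so it will not match the values returned by compute_gradients.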