Training the global and local model in federated learning - tensorflow

While studying Federated Learning, some questions popped into my mind that need clarification.
We first define the clients, and each client's data is split into training and testing sets. The training data are used to train the local models. Now, what are the testing data used for? Are they used to test the global model, or to test each local model?
When training the global model, we first compute the resulting weights of each local model and then send them to the global model. When modeling the local clients, is there any validity check on a local model before it is sent to the global model, or is it sent anyway and then updated by the global model?
Are there any papers explaining these points?

Testing data is used to check your model's accuracy. This can be useful for both the local models and the global model. However, since the objective of federated learning is to build a single global model, I would use the test data with the global model. There are, however, some approaches in which each local model's accuracy against a test set is used to give a weight to that local model before the "fusion" into the global model. This is sometimes referred to as weighted FedAvg (federated averaging).
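For illustration, here is a minimal sketch of that weighted federated averaging step in plain NumPy; the per-client scores (e.g. local test accuracy or sample counts) and the toy shapes are hypothetical placeholders, not part of any particular framework:

```python
import numpy as np

def weighted_fedavg(local_weights, client_scores):
    """Average per-layer weights from all clients, weighted by a per-client score.

    local_weights: one entry per client, each a list of NumPy arrays (one per layer).
    client_scores: one score per client, e.g. local test accuracy or sample count.
    """
    scores = np.asarray(client_scores, dtype=np.float64)
    scores = scores / scores.sum()                      # normalize so weights sum to 1
    n_layers = len(local_weights[0])
    global_weights = []
    for layer in range(n_layers):
        stacked = np.stack([client[layer] for client in local_weights])
        global_weights.append(np.tensordot(scores, stacked, axes=1))
    return global_weights

# Hypothetical usage: three clients with identical two-layer models,
# weighted by their accuracy on a held-out test set.
clients = [[np.random.randn(4, 2), np.random.randn(2)] for _ in range(3)]
accuracies = [0.81, 0.74, 0.90]
global_model = weighted_fedavg(clients, accuracies)
```

With uniform scores this reduces to plain FedAvg; in practice the scores are often the clients' sample counts rather than test accuracies.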
In a "controlled" Federated Learning scenario, there is no reason to check each local model before being sent to the master. However, in realistic scenario, there are a lot of considerations regarding security that should be considered, therefore you might need something more robust than a simple "validity check"

Serving hundreds of models with Tensorflow serving

I would like to serve ~600 models with TensorFlow Serving.
I am trying to find a solution to eventually reduce the number of models:
My models have the same architecture; only the weights change.
Is it possible to load only one model and just change the weights?
Would it be possible to aggregate all those models together so that, effectively, the first layer of the model would be an ID plus the input features for that model?
Has anyone tried running a couple of hundred models on one machine? I have found this Cortex solution, but wanted to avoid using another tech.
https://towardsdatascience.com/how-to-deploy-1-000-models-on-one-cpu-with-tensorflow-serving-ec4297bff54b
If the models have the same architecture but different weights, you can try merging all those models into a "super model". However, I would need to know more about the task to see if that's possible.
To serve 600 models, you would need a very powerful machine and a lot of memory (depending on how big your models are and how much you use them in parallel).
You can either run TensorFlow Serving yourself, or use a provider such as Inferrd.com/Google/AWS.
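If the serving setup allows it, the weight-swapping idea from the question can also work: keep a single architecture in memory and load a different weight file per model ID. A minimal tf.keras sketch, assuming all models really share the architecture; the builder function, paths, and IDs are hypothetical:

```python
import tensorflow as tf

def build_model():
    # The one shared architecture; saved models differ only in their weights.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])

model = build_model()                      # a single graph kept in memory
weight_files = {                           # hypothetical: one weight file per model ID
    "model_001": "/weights/model_001.h5",
    "model_002": "/weights/model_002.h5",
}

def predict(model_id, features):
    model.load_weights(weight_files[model_id])   # swap weights in place
    return model.predict(features)
```

Swapping weights on every request adds latency and is not thread-safe, so grouping requests by model ID (or keeping a small cache of hot models) usually pays off.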

How to deploy a live-learning TensorFlow model in the cloud?

How do I deploy a TensorFlow model in the cloud that can keep learning and update its weights when given new input? Most of the deployment methods I have seen involve model freezing, which implies freezing the weights as well. Is live learning possible, or is freezing the only way?
Freezing the model gives you the most compact form and lets you have a smaller inference node that you can call for prediction only; it contains just the information necessary to do that.
If you want a model that can both learn online and make inference, you could keep the full graph loaded with the newest weights. For safety, save the weights from time to time. Of course, you could also have two programs: one for inference with the latest frozen model, and another one that you bring up from time to time to run a new round of training, starting from the last saved weights. I recommend the second option. Hope it helps!
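A minimal sketch of that two-program setup with tf.train.Checkpoint: the training job periodically writes checkpoints, and the inference job restores the latest one before predicting. The model, directory, and step logic here are hypothetical placeholders:

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
ckpt = tf.train.Checkpoint(model=model)
manager = tf.train.CheckpointManager(ckpt, directory="/tmp/live_model", max_to_keep=3)

# --- training program: update weights and save them from time to time ---
def training_step(x, y, optimizer, loss_fn=tf.keras.losses.MeanSquaredError()):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    manager.save()                         # checkpoint after the step (or every N steps)

# --- inference program: pick up the newest weights before predicting ---
def predict(x):
    if manager.latest_checkpoint:
        ckpt.restore(manager.latest_checkpoint).expect_partial()
    return model(x, training=False)
```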

executing multiple models in tensorflow with a single session

I'm trying to run several neural network models in TensorFlow in parallel; each model is independent of the rest. Is it necessary to create a session for each execution I launch with TensorFlow, or could I reuse the same session for all of the models? Thank you.
A session is linked to a specific TensorFlow Graph instance. If you want to have one session for all of them, you need to put all your models in the same graph. This may cause naming problems for tensors and is IMO generally a bad idea (you should keep things that are not related to each other separate).
Having everything in the same graph also raises your models' resource requirements (you always load everything even if you run only a sub-graph), which is another reason to split things into independent graphs. With independent graphs, you'll have to use multiple sessions.
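A minimal TF1-style sketch of the independent-graph approach (via tf.compat.v1; the toy layer sizes and inputs are made up): each model lives in its own Graph with its own Session, so they can be run separately:

```python
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

def build_model():
    # Each model gets its own graph, placeholder, output, and init op.
    graph = tf.Graph()
    with graph.as_default():
        x = tf.placeholder(tf.float32, [None, 4], name="x")
        out = tf.layers.dense(x, 1, name="out")
        init = tf.global_variables_initializer()
    return graph, x, out, init

models = []
for _ in range(3):                         # three independent models
    graph, x, out, init = build_model()
    sess = tf.Session(graph=graph)         # one session per graph
    sess.run(init)
    models.append((sess, x, out))

features = np.random.randn(2, 4).astype(np.float32)
predictions = [sess.run(out, feed_dict={x: features}) for sess, x, out in models]
```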

Deep Learning with TensorFlow on Compute Engine VM

I'm actually new to Machine Learning, but this topic is very interesting to me, so I'm using TensorFlow to classify some images from the MNIST dataset. I run this code on a Compute Engine VM at Google Cloud, because my computer is too weak for it. The code actually runs well, but the problem is that each time I log into my VM and run the same code, I have to wait while my model trains on the CNN; only after that can I run some tests, experiment with my data, plot results, or import some external images to improve my accuracy, etc.
Is there some way to save the result of training my model just once, somewhere, so that when I decide, for example, to log into the same VM tomorrow, I don't have to wait for training anymore? Is that possible to do?
Or is there maybe another way to do something similar?
You can save a trained model in TensorFlow and then use it later by loading it; that way you only have to train your model once and can use it as many times as you want. To do that, you can follow the TensorFlow documentation on that topic, where you can find information on how to save and load the model. In short, you will have to use the SavedModelBuilder class to define the type and location of your saved model, and then add the MetaGraphs and variables you want to save. Loading the saved model for later use is even easier, as you will only have to run a command pointing to the location of the file in which the model was exported.
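A minimal sketch of that SavedModelBuilder save/load flow using the TF1-style API via tf.compat.v1; the export directory, the toy graph, and the restored tensor names are placeholders that depend on how your own graph is built:

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

export_dir = "/tmp/mnist_model"            # hypothetical export location

# --- after training: export the graph and variables once ---
x = tf.placeholder(tf.float32, [None, 784], name="x")
logits = tf.layers.dense(x, 10, name="logits")
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())   # (train here instead)
    builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
    builder.add_meta_graph_and_variables(sess, [tf.saved_model.tag_constants.SERVING])
    builder.save()

# --- on a later login: load the exported model and reuse it, no retraining ---
with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], export_dir)
    x_restored = sess.graph.get_tensor_by_name("x:0")
    logits_restored = sess.graph.get_tensor_by_name("logits/BiasAdd:0")
```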
On the other hand, I would strongly recommend that you change your working environment to something more productive for you. In Google Cloud you have the Cloud ML Engine service, which might be a good fit for the type of work you are doing. It allows you to train your models and perform predictions without needing an instance running all the required software. I happen to have worked a little bit with TensorFlow recently; at first I was also working with a virtualized instance, but after following some tutorials I was able to save some money by migrating my work to ML Engine, as you are only charged for the usage. If you are using your VM only for that purpose, take a look at it.
You can of course consult all the available documentation, but as a first quickstart, if you are interested in ML Engine, I recommend you to have a look at how to train your models and how to get your predictions.

How can I change the network dynamically in tensorflow?

I have a deep fully connected network.
I want to be able to change the structure of middle layers of the network dynamically.
What is the best way of doing that?
What I did right now is create an output placeholder for my network. I thought I would create the network dynamically by using feed_dict. However, when I run it, it says:
`ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ... `
TensorFlow won't make this easy for you. Once you define the graph and open a session, it's fixed. I believe you need to define a new graph, copy over your variables, and move on from there every time you want to alter the architecture. Kinda annoying for experimenting with this kind of stuff.
I have a friend/fellow researcher who's been experimenting with dynamic neural network architectures and is tackling this in PyTorch, which has specific support for dynamically altering network architectures.
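As an illustration of the "define a new graph, copy over your variables" approach, here is a minimal TF1-style sketch via tf.compat.v1 (the toy layer sizes are made up). Only variables whose names and shapes still match are carried over into the rebuilt graph:

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

def build_graph(hidden_units):
    # Rebuild the whole network, changing only the middle layer size.
    graph = tf.Graph()
    with graph.as_default():
        x = tf.placeholder(tf.float32, [None, 8], name="x")
        hidden = tf.layers.dense(x, hidden_units, activation=tf.nn.relu, name="hidden")
        out = tf.layers.dense(hidden, 1, name="out")
        init = tf.global_variables_initializer()
    return graph, init

old_graph, old_init = build_graph(hidden_units=32)
with tf.Session(graph=old_graph) as sess:
    sess.run(old_init)                               # (train here instead)
    old_values = {v.name: sess.run(v)
                  for v in old_graph.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)}

new_graph, new_init = build_graph(hidden_units=64)   # altered architecture
with tf.Session(graph=new_graph) as sess:
    sess.run(new_init)
    with new_graph.as_default():
        # Copy only the variables whose name and shape still match.
        assign_ops = [v.assign(old_values[v.name])
                      for v in tf.global_variables()
                      if v.name in old_values
                      and old_values[v.name].shape == tuple(v.shape.as_list())]
    sess.run(assign_ops)
```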