I am building a federated learning model using Tensorflow Federated.
Based on what I have read in the tutorials and papers, I understand that the state-of-the-art method (FedAvg) works by selecting a random subset of clients at each round.
My concern is:
I have a small number of clients: 8 in total, of which I use 6 for training and keep 2 for testing.
All of the data is available on my local device, so I am using TFF as a simulation environment.
If I use all 6 clients in every federated communication round, would this be an incorrect execution of the FedAvg method?
Note that I am also planning to use the same experimental setup as this paper, which compares the performance of different server optimization methods. So, would the all-clients-participating procedure work here or not?
Thanks in advance
This is certainly a valid application of FedAvg and of the variants proposed in the linked paper, though one that is only studied empirically in a subset of the literature. On the other hand, many theoretical analyses of FedAvg assume a situation similar to the one you're describing; at the bottom of page 4 of that linked paper, you will see that the analysis is performed in this so-called 'full participation' regime, where every client participates in every round.
The setting you describe is often called 'cross-silo'; see, e.g., section 7.5 of Advances and Open Problems in Federated Learning, which also contains many useful pointers into the cross-silo literature.
Finally, depending on the application, consider that it may be more natural to literally train on all clients, reserving portions of each client's data for validation and test. Questions around natural partitions of data to model the 'setting we care about' are often thorny in the federated setting.
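If it helps, here is a minimal sketch of what full participation looks like in a TFF simulation: the same six client datasets are passed on every round. The toy Keras model and random data are placeholders for your own setup, and the API names (tff.learning.build_federated_averaging_process, tff.learning.from_keras_model) are those of the TFF releases current at the time of writing.

```python
import tensorflow as tf
import tensorflow_federated as tff

# Toy client datasets standing in for your 6 training clients.
def make_client_dataset(seed):
    x = tf.random.stateless_normal([20, 10], seed=[seed, 0])
    y = tf.random.stateless_uniform([20], seed=[seed, 1],
                                    maxval=2, dtype=tf.int32)
    return tf.data.Dataset.from_tensor_slices((x, y)).batch(5)

train_clients = [make_client_dataset(i) for i in range(6)]

def model_fn():
    keras_model = tf.keras.Sequential(
        [tf.keras.layers.Dense(2, activation="softmax", input_shape=(10,))])
    return tff.learning.from_keras_model(
        keras_model,
        input_spec=train_clients[0].element_spec,
        loss=tf.keras.losses.SparseCategoricalCrossentropy())

process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.1))

state = process.initialize()
for round_num in range(10):
    # Full participation: every training client, every round.
    state, metrics = process.next(state, train_clients)
```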
I'm topic modeling a corpus of English 20th-century correspondence using LDA, and I've been using topic coherence (as well as silhouette scores) to evaluate my topics. I use gensim's CoherenceModel with c_v coherence, and the highest score I've ever gotten was 0.35 across all the models I've tested, even for the topics that make the most sense to me in qualitative evaluation, and even after extensive pre-processing and hyperparameter comparison.
So I had basically accepted that that's the best I'd get, but now that I have to write about it, I've been reading up on topic coherence, and I understand it's a pipeline that models human judgement. One thing I can't seem to find clear information on, though: is it based exclusively on calculations made on my corpus, or does it rely on some external data as well, such as external corpora that might have nothing to do with my domain? Should I use u_mass instead?
Yes: except for u_mass, they all use external reference datasets. However, that may not be a bad thing, as those reference datasets provide richer information.
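If you want to compare both measures on your own data, here is a minimal runnable sketch using gensim (the tiny tokenized texts below are just stand-ins for your correspondence corpus):

```python
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

# Toy tokenized documents standing in for the real corpus.
texts = [["letter", "dear", "friend"],
         ["war", "news", "front"],
         ["letter", "news", "dear"],
         ["war", "front", "friend"]]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=5)

# c_v requires the tokenized texts (sliding-window co-occurrence statistics).
cv = CoherenceModel(model=lda, texts=texts, dictionary=dictionary,
                    coherence="c_v").get_coherence()
# u_mass works from document co-occurrence counts in the corpus alone.
umass = CoherenceModel(model=lda, corpus=corpus, dictionary=dictionary,
                       coherence="u_mass").get_coherence()
print(cv, umass)
```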
Part of federated learning research is based on operations performed on the communications between the server and clients, such as dropping part of the updates (dropping some of the gradients describing a model) exchanged between clients and server, or discarding the update from a specific client in a certain communication round. I want to know if such capabilities are supported by the TensorFlow Federated (TFF) framework and how, because at first glance it seems to me that the level of abstraction of the TFF API does not allow such operations. Thank you.
TFF's language design intentionally avoids a notion of client identity; there is a deliberate desire to avoid making a "Client X" addressable, discarding its update, or sending it different data.
However, there may be a way to run simulations of the kinds of computations mentioned. TFF does support expressing the following:
Computations that condition on properties of tensors, for example ignoring an update that contains NaN values. One way this could be accomplished is by writing a tff.tf_computation that conditionally zeros out the weight of an update before tff.federated_mean; this technique is used in tff.learning.build_federated_averaging_process() (see the sketch after this list).
Simulations that run different computations on different sets of clients (where a set may be a single client). Since the reference executor parameterizes clients by the data they possess, a TFF user could write two tff.federated_computations, apply them to different simulation data, and combine the results.
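As a rough illustration of the first point, here is a minimal sketch (with a toy model type of a single float32 vector; the exact type-construction names vary a little between TFF releases) that gives zero averaging weight to any client update containing NaNs, so those updates drop out of the weighted federated mean:

```python
import tensorflow as tf
import tensorflow_federated as tff

MODEL_TYPE = tff.TensorType(tf.float32, [10])

@tff.tf_computation(MODEL_TYPE)
def mask_update(update):
  # Replace an update containing NaNs with zeros.
  has_nan = tf.reduce_any(tf.math.is_nan(update))
  return tf.cond(has_nan, lambda: tf.zeros_like(update), lambda: update)

@tff.tf_computation(MODEL_TYPE)
def update_weight(update):
  # Weight 0.0 removes the update from the weighted mean entirely.
  has_nan = tf.reduce_any(tf.math.is_nan(update))
  return tf.cond(has_nan, lambda: 0.0, lambda: 1.0)

@tff.federated_computation(tff.FederatedType(MODEL_TYPE, tff.CLIENTS))
def robust_mean(client_updates):
  return tff.federated_mean(
      tff.federated_map(mask_update, client_updates),
      weight=tff.federated_map(update_weight, client_updates))
```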
I'm looking into designing a software platform that will aid linguists and anthropologists in their study of previously unstudied languages. Statistics show that around 1,000 languages exist that have never been studied by anyone outside of their respective speaker groups.
My goal is to utilize TensorFlow to make a platform that will allow linguists to study and document these languages more efficiently, and to help them create writing systems for the ones that don't already have one. One of their current methods of accomplishing such a task is three-fold: 1) record a native speaker conversing in the language, 2) listen to that recording and try to transcribe it into the IPA, 3) from the phonetics, analyze the phonemics and phonotactics of the language to eventually create a writing system for the speakers.
My proposed platform would cut that research time down from a minimum of a year to a maximum of six months. Before I start, I have some questions...
What would be required to train TensorFlow to transcribe live audio into the IPA? Has this already been done? And if so, how could I utilize a previous solution for this project? Is a project like this even possible with TensorFlow? If not, what would you recommend using instead?
My apologies for the magnitude of this question. I don't have much experience in the realm of machine learning, as I am just beginning the research process for this project. Any help is appreciated!
I guess I will take a first shot at answering this. Since the question is pretty general, my answer will have to be pretty general as well.
What would be required? At the very least, you would need a large dataset of pre-transcribed data: ideally a large amount of spoken-language audio mapped to characters of the phonetic alphabet, so the system could learn the sound of individual characters rather than whole transcribed words. If such a dataset doesn't exist, a less granular dataset could be used, mapping single words to their transcriptions. Then you would need a model, that is, the actual neural network architecture implemented in code. And lastly, you would need some computing resources. This is not something you can train casually; you would either have to buy time on a cloud-based machine learning platform (like Google Cloud ML) or build a fairly expensive machine to train at home.
Has this been done? I don't know; I don't think so. There have been published papers reporting various degrees of success at training systems to transcribe speech. Here is one example: http://deeplearning.stanford.edu/lexfree/lexfree.pdf. Since the alphabet you want to transcribe to is specifically designed to capture the way words sound, rather than just write the words down, you might have more success training such a model.
Is it possible with TensorFlow? Yes, most likely. TensorFlow is well suited to implementing most modern deep learning architectures. Unless you end up designing some really weird and very original model for this purpose, TensorFlow should work just fine.
Edit: after some more thought on part 1, you would have to use a dataset mapping spoken words to their transcriptions, since I expect the same sound pronounced in isolation to differ from that sound as it occurs within a word.
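To make the "model" part concrete, here is a minimal sketch of the kind of architecture commonly used for this task: a recurrent network trained with CTC loss to map audio features to IPA symbol indices. The feature dimensions and inventory size below are assumptions, not taken from any existing system.

```python
import tensorflow as tf

NUM_IPA_SYMBOLS = 100  # assumed size of the target phone inventory
NUM_MEL_BINS = 80      # assumed log-mel spectrogram feature size

# Variable-length sequence of audio feature frames in; per-frame
# logits over IPA symbols (plus one CTC blank label) out.
inputs = tf.keras.Input(shape=(None, NUM_MEL_BINS))
x = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(256, return_sequences=True))(inputs)
logits = tf.keras.layers.Dense(NUM_IPA_SYMBOLS + 1)(x)
model = tf.keras.Model(inputs, logits)

def ctc_loss(labels, logits, label_length, logit_length):
    # CTC aligns unsegmented label sequences to the frame sequence,
    # so no per-frame transcription is needed in the training data.
    return tf.nn.ctc_loss(
        labels=labels,
        logits=logits,
        label_length=label_length,
        logit_length=logit_length,
        logits_time_major=False,
        blank_index=NUM_IPA_SYMBOLS)
```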
This has actually been done, albeit in PyTorch, by a group at CMU: https://github.com/xinjli/allosaurus
I am currently working on an existing infrastructure where I have about 1,000 customer sites connected to about 5 different hubs. A customer site may connect to one or two hubs to ensure reliability, but each customer site is connected to at least one hub. I want to check whether the current system is optimal, or whether it can be optimised to give better connections from customer sites to hubs, improving connectivity and reliability. Can you suggest good optimisation algorithms to look into? Thank you.
Sounds like you're dealing with some variation of the facility location problem.
This is a well-known problem, and while there are algorithms that can solve for the global optimum (Dijkstra's Algorithm, or other variants of dynamic programming), they do not scale well (i.e. you run into the curse of dimensionality). You could try this, but 1,000 sites already sounds pretty big (though it depends on your exact problem formulation).
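For a sense of what an exact formulation looks like, here is a minimal sketch of the (uncapacitated) facility location problem as a mixed-integer program using the PuLP library, with toy costs standing in for your real site-to-hub data. This is a different exact approach from the algorithms named above, just to illustrate the problem structure:

```python
import pulp

sites = range(6)   # toy instance; yours has ~1,000 sites
hubs = range(3)    # toy instance; yours has ~5 hubs
open_cost = {h: 10.0 for h in hubs}                            # cost of using a hub
conn_cost = {(s, h): abs(s - h) for s in sites for h in hubs}  # toy distances

prob = pulp.LpProblem("facility_location", pulp.LpMinimize)
y = pulp.LpVariable.dicts("open", hubs, cat="Binary")
x = pulp.LpVariable.dicts("assign", [(s, h) for s in sites for h in hubs],
                          cat="Binary")

# Objective: hub opening costs plus connection costs.
prob += (pulp.lpSum(open_cost[h] * y[h] for h in hubs)
         + pulp.lpSum(conn_cost[s, h] * x[s, h] for s in sites for h in hubs))

for s in sites:
    prob += pulp.lpSum(x[s, h] for h in hubs) == 1  # each site gets one hub
    for h in hubs:
        prob += x[s, h] <= y[h]                     # only open hubs can serve

prob.solve()
```

Your redundancy requirement (one or two hubs per site) would change the assignment constraint, e.g. to a lower bound of 1 with an optional, costed second link.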
I'd recommend taking a look at the Coursera MOOC Discrete Optimization. You don't have to take the whole course, but in the "Assignments" section of the video lectures, the instructor also explains a variant of the facility location problem and some possible approaches to consider; once you've decided which one you want to use, you can look deeper into that particular approach.
Can you give an example of such tasks?
I'm particularly interested in tasks relevant to quite a large number of people that could be solved using distributed computing (not global projects such as SETI@home, Folding@home, etc.).
As an example, take rendering and the http://www.renderfarm.fi community.
Cryptocurrency mining is not relevant.
Thank you!
Well, I don't know much about rendering, but when talking about tasks that can be solved by distributed computing, you will probably want to take a look at Bag-of-Tasks (BoT) applications.
"Bag-of-Tasks applications (those parallel applications whose tasks are
independent) are both relevant and amendable for execution on computational grids. In fact, one can argue that Bag-of-Tasks applications
are the applications most suited for grids, where communication can
easily become a bottleneck for tightly-coupled parallel applications."
This was taken from a paper that deals precisely with Bag-of-Tasks applications on computational grids. You can read the full paper here.
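The defining property, tasks that never talk to each other, is easy to see in code. Here is a minimal local sketch of the bag-of-tasks pattern, using Python's concurrent.futures as a stand-in for a real grid or BOINC scheduler:

```python
from concurrent.futures import ProcessPoolExecutor

def run_task(params):
    # Placeholder work unit: any self-contained computation goes here.
    return sum(i * i for i in range(params))

if __name__ == "__main__":
    bag = [10_000, 20_000, 30_000, 40_000]  # the "bag" of independent tasks
    # No communication between tasks, so they spread across workers trivially.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(run_task, bag))
    print(results)
```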
Now finding a task relevant to users is a matter of creativity. This list of distributed computing projects might give you some insights.
Setting up the BOINC server and, mainly, programming the BOINC applications will be the hard parts here. This BOINC wiki helps you get a notion of what is needed in the "background" of a BOINC project.
Old question, but fresh answer.
I have my own Distributed Computing Library written completely in C++ (search for gridman raspberry pi).
I am using it for:
- Distributed Neural Networks training / validation
- Distributed raytracing (for fun)
- Distributed MD5 crunching (for fun)
- Distributed WEP crunching (for fun)
- Distributed WPA crunching (for fun)
And in general, I always think of it this way: if something takes too long for me, I split it across several PCs. Real-world examples?
Take investment banking, for example: all those models have to be calculated a million times with different parameters.
Take neural networks: a good example, since training takes ages (depending on the data). If you split the work across 10 PCs, you get your results roughly 10 times faster.
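The splitting itself can be as simple as sharding a parameter sweep so that each machine owns an independent slice. A minimal sketch, with hypothetical pricing parameters:

```python
def shard(params, num_machines, machine_id):
    """Return the slice of the parameter list owned by one machine."""
    return params[machine_id::num_machines]

# Hypothetical parameter grid: every combination must be evaluated once.
params = [{"rate": r, "horizon": h}
          for r in (0.01, 0.02, 0.03)
          for h in (1, 5, 10)]

# Machine 1 of 3 evaluates only its own shard; results are merged later.
for p in shard(params, num_machines=3, machine_id=1):
    print("would run the model with", p)  # stand-in for the real computation
```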