Is TFX (TensorFlow Extended) being used in big companies?

I've recently stumbled upon TensorFlow Extended, and I've finally started to understand the need for it and its uses. It makes the whole machine learning pipeline a lot easier and looks like it can be used to automate many of the tasks involved. I wanted to ask the seniors in this field: are big companies using it? Is it being used extensively, or will it become obsolete?
Thank you

From the article Towards ML Engineering: A Brief History Of TensorFlow Extended (TFX):
TFX is used by thousands of users within Alphabet, and it powers hundreds of popular Alphabet products, including Cloud AI services on Google Cloud Platform (GCP). On any given day there are thousands of TFX pipelines running, which are processing exabytes of data and producing tens of thousands of models, which in turn are performing hundreds of millions of inferences per second.
From the presentation TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow:
On the TFX case studies page, you can see companies like:
Airbnb
Airbus
ARM
Coca Cola
Intel
Lenovo
PayPal
Spotify
Twitter
...

Related

Kubernetes + TF Serving - how to serve hundreds of ML models without keeping hundreds of idle pods running?

I have hundreds of models, based on categories, projects, etc. Some of the models are heavily used, while others are used infrequently.
How can I trigger a scale-up operation only when needed (for the models that are not frequently used), instead of running hundreds of pods serving hundreds of models while most of them sit idle, which is a huge waste of computing resources?
What you are trying to do is scale deployments to zero when they are not used.
K8s does not provide such functionality out of the box.
You can achieve it using Knative Pod Autoscaler.
Knative is probably the most mature solution available at the moment of writing this answer.
There are also some more experimental solutions, like Osiris or zero-pod-autoscaler, that you may find interesting and that may be a good fit for your use case.
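For illustration, here is a hedged sketch of a Knative Service wrapping TensorFlow Serving, created through the official Kubernetes Python client. The service name, image tag, and model path are assumptions; Knative's autoscaler scales such a Service down to zero pods when it receives no traffic and back up on demand.

```python
# Hedged sketch: a Knative Service wrapping TensorFlow Serving, created via the
# official Kubernetes Python client. Knative scales the Service's pods to zero
# when there is no traffic and back up when requests arrive.
# The service name, image and model path below are assumptions for illustration.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster

knative_service = {
    "apiVersion": "serving.knative.dev/v1",
    "kind": "Service",
    "metadata": {"name": "my-model"},  # hypothetical model/service name
    "spec": {
        "template": {
            "spec": {
                "containers": [{
                    "image": "tensorflow/serving:latest",
                    "args": [
                        "--model_name=my_model",                      # assumption
                        "--model_base_path=gs://my-bucket/my_model",  # assumption
                    ],
                    "ports": [{"containerPort": 8501}],  # TF Serving REST port
                }]
            }
        }
    },
}

# Knative Services are custom resources, so they are created through the
# CustomObjectsApi rather than the core Deployments/Services API.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.knative.dev",
    version="v1",
    namespace="default",
    plural="services",
    body=knative_service,
)
```

One such Service per model gives you per-model scale-to-zero, so rarely used models consume no compute until a request actually arrives.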

What is needed for a recommendation engine based on word/text input

I'm new to machine learning (AI) technology. I'm developing a messenger app for Android/iOS where I would like to recommend to users, based on the text/words of their conversation, a product from a relatively small product portfolio.
Example 1:
If the user of the messenger writes a sentence including the words "wine", "dinner", "date", the AI should recommend a bottle of wine to the user.
Example 2:
If the user of the app writes that they had a good coffee this morning, the AI should recommend a mug to the user.
Example 3:
If the user writes something about a cute boy she met the other day, the AI should recommend a teddy bear to the user.
I have been a software developer for almost 20 years, with experience developing C/C++/Java-based applications (Android and iOS apps) as well as some experience with Google Cloud Platform. ML/AI technology is completely new to me. I know the basics (input data is needed to train the ML/AI system, etc.), but I wonder if there is already a framework that could help me develop a system that solves the use case described above.
I would appreciate it if you could give me some hints on where and how to start.
Thank you and regards
It is definitely possible to implement such an application; if you want to do it on Google Cloud, you will need some understanding of TensorFlow.
First of all, I recommend doing the Machine Learning Crash Course for a good introduction to machine learning and to start familiarizing yourself with TensorFlow. Afterwards, I recommend taking a look at the TensorFlow tutorials, which give a more practical introduction to TensorFlow and include various examples of building/training/testing models.
Once you are familiar with TensorFlow, you can jump into learning how to run jobs in the Machine Learning Engine; you can start by following the quickstart. The documentation includes detailed guides on how to use the ML Engine, plus multiple samples and tutorials.
Since I believe your application falls into the recommender-system category, here you can see an example model, in Google Cloud ML Engine, of how to recommend items to users based on their previous searches. In your case, you would have to build a model that recommends items to users based on the words in their previous sentences.
The second option, in case you don't want to go through the hassle of building a new model from scratch, would be to use the Google Cloud Natural Language API, which you can think of as a set of pre-trained models built on Google's (incredibly large) data. In your case, I believe the content classification API would help you achieve what your application intends to do. However, the outputs (which you can see here) are limited to what the model was trained on and might not be specific enough for your application. Still, it is an easy solution, and you can also use this API to extract labels/information and feed them as input to another model.
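As a rough illustration of that second option, here is a hedged sketch using the google-cloud-language client's classify_text call. The example message and the idea of mapping returned categories to products are assumptions, and very short texts may not return any categories.

```python
# Hedged sketch: classifying a chat message with the Cloud Natural Language API.
# Assumes the google-cloud-language client library is installed and credentials
# are configured; mapping categories to products is an assumption, not shown here.
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

text = "We should open a nice bottle of wine for our dinner date tonight."
document = language_v1.Document(
    content=text, type_=language_v1.Document.Type.PLAIN_TEXT)

# Note: the API needs a reasonable amount of text; very short messages may
# return no categories at all.
response = client.classify_text(request={"document": document})
for category in response.categories:
    # e.g. something like "/Food & Drink/Beverages/..." with a confidence score
    print(category.name, category.confidence)
```

You could then map the returned category paths to items in your product portfolio, or feed them as features into your own recommender model.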
I hope these links give you some foundation on what is possible with TensorFlow in the ML Engine and that they are useful to you.

IPA (International Phonetic Alphabet) Transcription with Tensorflow

I'm looking into designing a software platform that will aid linguists and anthropologists in their study of previously unstudied languages. Statistics show that around 1,000 languages exist that have never been studied by a person outside of their respective speaker groups.
My goal is to utilize TensorFlow to make a platform that will allow linguists to study and document these languages more efficiently, and to help them create writing systems for the ones that don't already have one. One of their current methods of accomplishing such a task is three-fold: 1) record a native speaker conversing in the language, 2) listen to that recording and try to transcribe it into the IPA, 3) from the phonetics, analyze the phonemics and phonotactics of the language to eventually create a writing system for the speakers.
My proposed platform would cut that research time down from a minimum of a year to a maximum of six months. Before I start, I have some questions...
What would be required to train TensorFlow to transcribe live audio into the IPA? Has this already been done? And if so, how would I utilize a previous solution for this project? Is a project like this even possible with TensorFlow? If not, what would you recommend using instead?
My apologies for the magnitude of this question. I don't have much experience in the realm of machine learning, as I am just beginning the research process for this project. Any help is appreciated!
I guess I will take a first shot at answering this. Since the question is pretty general, my answer will have to be pretty general as well.
What would be required? At the very least, you would need a large dataset of pre-transcribed data: ideally a large amount of spoken-language audio mapped to characters in the phonetic alphabet, so the system could learn the sound of individual characters rather than whole transcribed words. If such a dataset doesn't exist, a less granular dataset could be used, mapping single words to their transcriptions. Then you would need a model, that is, the actual neural network architecture implemented in code. And lastly, you would need some computing resources. This is not something you can train casually; you would either have to buy some time on a cloud-based machine learning platform (like Google Cloud ML) or build a fairly expensive machine to train at home.
Has this been done? I don't know; I don't think so. There have been published papers reporting various degrees of success at training systems to transcribe speech; here is one example: http://deeplearning.stanford.edu/lexfree/lexfree.pdf
Since the alphabet you want to transcribe into is specifically designed to capture how words sound, rather than just how they are written, you might have more success at training such a model.
Is it possible with TensorFlow? Yes, most likely. TensorFlow is well suited to implementing most modern deep learning architectures. Unless you end up designing a really unusual and very original model for this purpose, TensorFlow should work just fine.
Edit: after some thought on part 1, you would have to use a dataset mapping spoken words to their transcriptions, since I expect that the same sound pronounced in isolation would differ from the same sound used within a word.
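As a rough illustration of the "model" ingredient above, here is a minimal, hedged sketch of the kind of architecture commonly used for speech transcription: a bidirectional recurrent network trained with CTC loss on audio feature frames. The feature and symbol-inventory sizes are assumptions, and the data pipeline and decoding step are omitted entirely.

```python
# Minimal sketch, not a working transcriber: a bidirectional-LSTM acoustic model
# trained with CTC loss to map audio feature frames (e.g. MFCCs) to IPA symbol IDs.
# NUM_FEATURES and NUM_IPA_SYMBOLS are assumptions; data loading is omitted.
import tensorflow as tf

NUM_FEATURES = 40      # e.g. 40 log-mel / MFCC features per frame (assumption)
NUM_IPA_SYMBOLS = 100  # size of your IPA symbol inventory (assumption)

def build_model():
    # Variable-length input: (time, features)
    inputs = tf.keras.Input(shape=(None, NUM_FEATURES))
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(128, return_sequences=True))(inputs)
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(128, return_sequences=True))(x)
    # One extra output class for the CTC "blank" label.
    logits = tf.keras.layers.Dense(NUM_IPA_SYMBOLS + 1)(x)
    return tf.keras.Model(inputs, logits)

def ctc_loss(labels, logits, label_length, logit_length):
    # labels: dense int tensor of IPA symbol IDs; logits: (batch, time, classes)
    return tf.reduce_mean(tf.nn.ctc_loss(
        labels=labels,
        logits=logits,
        label_length=label_length,
        logit_length=logit_length,
        logits_time_major=False,
        blank_index=NUM_IPA_SYMBOLS))

model = build_model()
model.summary()
```

CTC is what lets the model align a long sequence of audio frames with a much shorter sequence of IPA symbols without frame-level labels, which is why it (or an attention-based equivalent) shows up in most speech transcription setups.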
This has actually been done, albeit in PyTorch, by a group at CMU: https://github.com/xinjli/allosaurus
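For reference, if I recall the allosaurus README correctly, basic usage looks roughly like the sketch below; treat the exact function names as an assumption and check the repository for the current API.

```python
# Rough usage sketch for allosaurus, from memory of its README; the function
# names and the audio file name are assumptions - check the repository.
from allosaurus.app import read_recognizer

model = read_recognizer()               # loads the pretrained universal phone recognizer
phones = model.recognize("sample.wav")  # "sample.wav" is a placeholder
print(phones)                           # space-separated IPA phones
```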

Which OCR was used by Google for making Google Books?

And, especially, did they release an API that we can use in our own development?
I think that an OCR system trained on millions of books should have become nearly perfect, and I've also heard that Google improved its algorithm a lot with neural networks for recognizing house numbers in Street View.
So I really wonder whether such an API is available.
Basically, for Google Books, Google customized the OCRopus open-source OCR engine with a lot of additional techniques.
The funniest one is crowdsourcing via CAPTCHAs composed of two words: one generated word ensuring security, and the second an unrecognized word from Google Books or Street View (building numbers)...

Examples of distributed computing tasks relatively common among users

Can you give an example of such tasks?
I'm particularly interested in tasks, relevant to quite a large number of people, which could be solved by using distributed computing (not global projects such as SETI@home, Folding@home, etc.).
As an example, we can take rendering and the http://www.renderfarm.fi community.
Cryptocurrency mining is not relevant.
Thank you!
Well, I don't know much about rendering, but when talking about tasks that can be solved by distributed computing, you will probably want to take a look at Bag-of-Tasks (BoT) applications.
"Bag-of-Tasks applications (those parallel applications whose tasks are
independent) are both relevant and amendable for execution on computational grids. In fact, one can argue that Bag-of-Tasks applications
are the applications most suited for grids, where communication can
easily become a bottleneck for tightly-coupled parallel applications."
This was taken from a paper that talks exactly about Bag-of-Tasks applications with grid computing. You can read the full paper here.
Now finding a task relevant to users is a matter of creativity. This list of distributed computing projects might give you some insights.
Setting up the BOINC server and, mainly, programming the BOINC applications will be the hard parts here. This BOINC wiki helps you get a notion of what is needed in the "background" of a BOINC project.
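To make the Bag-of-Tasks idea concrete, here is a hedged, single-machine sketch in Python: many independent work units (the same computation run with different parameters) fanned out to worker processes. The same pattern is what you would map onto BOINC work units or any other distributed framework; the "model" below is a stand-in, not a real workload.

```python
# Hedged sketch of a Bag-of-Tasks workload: many independent runs of the same
# computation with different parameters, fanned out to worker processes. On one
# machine this uses multiprocessing; distributed frameworks apply the same
# pattern across many machines. run_model is a made-up stand-in computation.
from multiprocessing import Pool
import random

def run_model(params):
    # One independent work unit: depends only on its own parameters.
    rate, volatility, seed = params
    rng = random.Random(seed)
    return sum(rng.gauss(rate, volatility) for _ in range(100_000))

if __name__ == "__main__":
    tasks = [(0.03, 0.2, seed) for seed in range(64)]  # 64 independent work units
    with Pool() as pool:                               # one worker per CPU core
        results = pool.map(run_model, tasks)
    print(len(results), "tasks completed")
```

Because every task is independent, there is no inter-worker communication, which is exactly why BoT workloads tolerate slow or unreliable links between volunteer machines.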
Old question, but fresh answer.
I have my own Distributed Computing Library written completely in C++ (search for gridman raspberry pi).
I am using it for:
- Distributed Neural Networks training / validation
- Distributed raytracing (for fun)
- Distributed MD5 crunching (for fun)
- Distributed WEP crunching (for fun)
- Distributed WPA crunching (for fun)
And in general, I always think of it this way: if something takes too long for me, then I split it across several PCs. Real-world examples?
Take investment banking, for example: all those models have to be calculated millions of times with different parameters.
Take neural networks - a good example: training takes ages (depending on the data) - if you split it across 10 PCs, you get your results 10 times faster.