I am trying to develop a model that detects the items a user picks up from a basket. Is this achievable using TensorFlow? My doubt is this: since the basket contains the same kinds of items the user picks up (say, fruits), is it possible to report, in real time, only the product that is in the user's hand rather than the items still in the basket? Please advise on what would be a good starting point.
I have read about and watched general object detection methods using TensorFlow and various models, but nothing seems to deal with a similar problem, or I am unable to relate it to mine. If there are any tutorials on achieving this, links would be even more helpful. Thanks in advance. Please bear with me if my question is naive; I am still a newbie at ML and TensorFlow.
It is achievable using TensorFlow. First consider how you would approach the problem. Since you need to distinguish between the objects inside the basket, you will need an artificial neural network (ANN) with some basic classification and object detection.
Here are some links to get you started:
https://www.tensorflow.org/tutorials/keras/basic_classification
https://www.edureka.co/blog/tensorflow-object-detection-tutorial/
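One possible way to separate "in hand" from "in basket" once an object detector is running is to post-filter its boxes by position. The sketch below is only an illustration under assumptions that are not part of the answer: the basket occupies a fixed, known rectangle in the frame, detections arrive as (label, (x1, y1, x2, y2)) pairs, and the names overlap_fraction, picked_up_items and max_inside are hypothetical.

```python
def overlap_fraction(box, region):
    """Fraction of `box` area that lies inside `region`; both are (x1, y1, x2, y2)."""
    x1 = max(box[0], region[0])
    y1 = max(box[1], region[1])
    x2 = min(box[2], region[2])
    y2 = min(box[3], region[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    return inter / area if area > 0 else 0.0


def picked_up_items(detections, basket_box, max_inside=0.5):
    """Keep labels whose box lies mostly outside the (assumed fixed) basket region."""
    return [
        label
        for label, box in detections
        if overlap_fraction(box, basket_box) < max_inside
    ]


# Hypothetical detector output for one frame.
detections = [("apple", (10, 10, 20, 20)), ("banana", (100, 100, 120, 120))]
basket_box = (0, 0, 50, 50)
print(picked_up_items(detections, basket_box))
```

A real system would probably also need a hand detector or tracking across frames; this only shows the filtering idea.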
Good luck!
Related
I recently learned how to build ML models using scikit-learn, TensorFlow and Keras, and I applied what I learned to create a model that predicts NBA games. My model ended up doing pretty well, so much so that I'd like to create a self-updating database that regularly scrapes basketball-reference.com/leagues/ to update game data. That way, any time I want a prediction for an upcoming NBA game, the parameters are ready without my needing to do anything.
For this to work I need to know how to scrape basketball-reference.com/leagues/. I thought I might try Selenium, but it turned out to be more difficult than I expected. Most of my knowledge of programming/compsci is self-taught, so I'm not sure where to turn or what to do to learn how to use Selenium.
Does anyone have any resources they may suggest or maybe some 'pre-requisites' that I should know before I try using Selenium?
I was thinking about checking out a course on Udemy!
I have a task: to determine the sound source location.
I have some experience working with TensorFlow, building predictions on some simple features and datasets. I assume that for this task it would be necessary to analyze the sound frequencies, and probably other related data, during the training and prediction steps. The sound comes from a headset, so the human ear is able to detect the direction.
1) Has anybody already done this? (Unfortunately I couldn't find any similar project.)
2) What kind of pitfalls could I run into while trying to achieve this?
3) Can I do this with this technology? Are there any other sound processing frameworks / technologies / open source projects that could help me?
I am asking that here, since my research on google, github, stackoverflow didn't show me any relevant results on that specific topic, so any help is highly appreciated!
This is typically done with more traditional DSP using multiple sensors. You might want to look into time difference of arrival (TDOA) and direction of arrival (DOA). Algorithms such as GCC-PHAT and MUSIC will be helpful.
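To make the TDOA idea concrete, here is a minimal NumPy sketch of GCC-PHAT for a two-sensor pair. It is an illustration under assumptions, not a reference implementation: the function name gcc_phat, its parameters, and the small constant that guards against division by zero are all choices made here.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref` using GCC-PHAT."""
    n = len(sig) + len(ref)  # zero-pad so the circular correlation acts like a linear one
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross = cross / (np.abs(cross) + 1e-12)  # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / float(fs)


# Synthetic check: white noise delayed by a known number of samples.
fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(2048)
delay = 7
sig = np.concatenate((np.zeros(delay), ref[:-delay]))
print(gcc_phat(sig, ref, fs) * fs)  # estimated delay in samples
```

A positive result means sig lags ref. In a reverberant room the correlation peak becomes less distinct, which is where the accuracy issues come from.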
Issues that you might encounter: DOA accuracy is a function of the direct-to-reverberant ratio of the source, i.e. the more reverberant the environment, the harder it is to determine the source location.
Also, you might want to consider the number of location dimensions you want to resolve. A point in 3D space is much more difficult to estimate than a direction relative to the sensors.
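For the simpler case of a single direction relative to two sensors, a time difference can be turned into an angle with the far-field relation sin(theta) = c * tau / d. The sketch below assumes a far-field source, a speed of sound of 343 m/s, and an angle measured from the broadside of the sensor pair; the name tdoa_to_angle is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def tdoa_to_angle(tau, mic_distance, c=SPEED_OF_SOUND):
    """Far-field direction of arrival in degrees from broadside, for a two-sensor pair.

    tau: time difference of arrival in seconds
    mic_distance: spacing between the two sensors in metres
    """
    ratio = c * tau / mic_distance
    ratio = max(-1.0, min(1.0, ratio))  # clamp: noise can push the ratio past +/-1
    return math.degrees(math.asin(ratio))


print(tdoa_to_angle(0.0, 0.1))  # zero delay means the source is straight ahead
```

Note that two sensors only resolve one angle, with a front/back ambiguity; locating a point in 3D needs more sensors.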
Using ML as an approach to this is not entirely without merit, but you will have to consider what it is you would be learning, i.e. you probably don't want to learn the test room's reverberant properties but rather the sensors' spatial properties.
TensorBoard seems to arbitrarily select which nodes belong to the main graph and which do not. During graph visualization I can manually add or remove nodes, but it is tedious to do that on every run.
Is there a way to programmatically embed this information (which nodes belong on the main graph) when writing the graph summary?
According to this GitHub issue, it's not feasible at the moment.
And according to this quote:
Thanks #lightcatcher for the suggestion. #danmane, please take a look
at this issue. If it is something we will not do in the short-term
maybe mark it contributions welcome. If it is something you are
planning to include in your plugin API anyways, please close the issue
to keep the backlog clear.
, and the status of the issue (contributions:welcomed), it's not something that is to be expected in the short term.
This year Google produced 5 different packages for seq2seq:
seq2seq (claimed to be general purpose but inactive)
nmt (active but supposed to be just about NMT probably)
legacy_seq2seq (clearly legacy)
contrib/seq2seq (not complete probably)
tensor2tensor (similar purpose, also active development)
Which package is actually worth using for an implementation? They all seem to take different approaches, but none of them seems stable enough.
I've also had a headache over the same issue: which framework to choose? I want to implement OCR using an encoder-decoder with attention. I tried to implement it using legacy_seq2seq (it was the main library at that time), but the whole process was hard to understand, and it certainly should not be used any more.
https://github.com/google/seq2seq: to me it looks like an attempt to provide a command-line training script so that you do not have to write your own code. If you want to train a translation model, this should work, but in other cases it may not (as with my OCR task), because there is not enough documentation and the user base is too small.
https://github.com/tensorflow/tensor2tensor: this is very similar to the implementation above, but it is maintained and you can add more of your own code, e.g. for reading your own dataset. The basic use case is again translation, but it also enables tasks such as image captioning, which is nice. So if you want to try a ready-to-use library and your problem is text->text or image->text, you could try this. It should also work for OCR. I'm just not sure whether there is enough documentation for each case (such as using a CNN as the feature extractor).
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/seq2seq: unlike the above, this is just a pure library, which is useful when you want to build a seq2seq model yourself using TF. It has functions for adding attention, sequence loss, etc. In my case I chose this option because it gives me much more freedom in choosing each step of the pipeline: the CNN architecture, the RNN cell type, bidirectional or unidirectional RNN, the type of decoder, etc. But you will need to spend some time getting familiar with the ideas behind it.
https://github.com/tensorflow/nmt: another translation framework, based on the tf.contrib.seq2seq library.
From my perspective you have two options:
If you want to test an idea quickly and be sure that you are using efficient code, use the tensor2tensor library. It should help you get early results or even a very good final model.
If you want to do research, are not sure exactly how the pipeline should look, or want to learn the ideas behind seq2seq, use the tf.contrib.seq2seq library.
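For anyone going the build-it-yourself route, it may help to see what a single attention step computes before wiring it into a framework. The sketch below is plain NumPy and is not the tf.contrib.seq2seq API; it assumes simple dot-product scoring, and the function name and array shapes are illustrative choices.

```python
import numpy as np


def dot_product_attention(decoder_state, encoder_outputs):
    """One attention step with dot-product scoring.

    decoder_state:   shape (H,)   current decoder hidden state
    encoder_outputs: shape (T, H) one encoder output per input position
    Returns (context, weights) with shapes (H,) and (T,).
    """
    scores = encoder_outputs @ decoder_state  # similarity per input position, shape (T,)
    scores = scores - scores.max()            # subtract the max for numerical stability
    exp_scores = np.exp(scores)
    weights = exp_scores / exp_scores.sum()   # softmax over input positions
    context = weights @ encoder_outputs       # weighted sum of encoder outputs, shape (H,)
    return context, weights


# Toy example: three input positions (e.g. CNN feature columns in OCR), hidden size 3.
encoder_outputs = np.eye(3)
decoder_state = np.array([10.0, 0.0, 0.0])
context, weights = dot_product_attention(decoder_state, encoder_outputs)
print(weights)
```

The decoder would typically combine the context vector with its own state to predict the next output symbol.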
I'm currently using a Processing Kinect library which supplies a depth map. I was wondering how I could take that and use it to create a 2D skeleton, if possible. Not looking for any code here, just a general process I could use to achieve those results.
Also, given that we've seen this in several of the Kinect games so far, would it be difficult to have multiple skeletons running at once?
Disclaimer: the reason you have not gotten an answer to this question yet is probably that it is a current research problem. So I can't give you a direct answer, but I will try to help with some information and useful resources on the topic.
There are mainly 2 different approaches to create a skeleton from a depth map. The first one is to use machine learning, the second is purely algorithmic.
For the machine learning one, you'd need many samples of people doing a predetermined move, and you'd use those samples to train your favorite learning algorithm. That's the approach that was taken and implemented by Microsoft in the Xbox (source). It works really well, BUT you need millions of samples to make it reliable... quite a drawback.
The "algorithmic" approach (that is, one that does not use a training set) can be done in many different ways and is a research problem. It's often based on modeling the possible body postures and trying to match them against the depth image received. That's the approach that was chosen by PrimeSense (the guys behind the Kinect depth camera technology) for their skeleton tracking tool NITE.
The OpenKinect community maintains a wiki where they list some interesting research material about this topic. You might also be interested in this thread on the OpenNI mailing list.
If you're looking for an implementation of a skeleton tracking tool, PrimeSense released NITE (closed source), the one they made: it's part of the OpenNI framework. That's what's used in most of the videos you might have seen that involve skeleton tracking. I think it's able to handle up to 2 skeletons at the same time, but that requires confirmation.
The best solution is to use FAAST (http://projects.ict.usc.edu/mxr/faast/) which requires OpenNI. I have struggled to get OpenNI to work on my computer. I have not seen an approach yet using Code Laboratories' CL NUI.
An algorithmic approach is http://code.google.com/p/skeletonization/, but you may have a problem because your depth map only represents surfaces, not closed objects.