training images? Considerations for selection - tensorflow

I'm relatively new and am still learning the basics. I've used NVIDIA DIGITS in the past, and am now looking at Tensorflow. While I've been able to fumble my way around creating some models for a few projects I'm working on, I really want to start diving deeper into what I'm doing, how I'm doing it, and ultimately a better understanding of why.
One area that I would like to start with is the images that I'm using for training and testing. Can anyone point me to a blog, an article, or a paper, or give me some insight into what I need to consider when selecting images to train a new model on? Up until recently, I've been using datasets that have already been selected and that are available for download. Let's say I'm going to start working on a project that involves object detection of ships from a variety of distances and angles.
So my thoughts would be:
1) I need a large quantity of images.
2) The images need to contain ships of the different types I would like to detect. (Let's just say one class, ships; I don't care what type of ship.)
3) I also need images that show those ships from a wide variety of distances and perspectives.
Ultimately, my thoughts are that the images need to reflect the distance, perspective, and types of ships I would ideally want to identify from the video. Seems simple enough.
However, there are a number of questions:
Do the images need to be the same or a similar resolution as the camera I'll be using, for best results?
Do the images all need to be the same resolution?
Can I use a single image and just digitally zoom out on it to give the illusion of different distances?
I'm sure there are a number of other questions that I'm not asking, or should be asking. Are there any guidelines available for building a solid collection of images for training and validation?

I recommend thinking the project through end to end first. For example, would you need to classify ship models as a next step? I also recommend going through well-known public datasets and actually working with their structure: how the data and labels are stored, how preprocessing is handled, and so on.
More importantly, what are you trying to achieve? Talking to experts in the topic helps greatly while preparing your own dataset.
Use open source images if you can, e.g. from Flickr, Google, or ImageNet.
No, you don't need them to be the same resolution.
It is not ideal to zoom images in or out by hand to cover different distances. Preprocessing and data augmentation already do this, creating nearer and more distant representations of the same class. This is why I would recommend a hands-on approach with an existing dataset first.
Yes, what you need is many different representations of each class, and a roughly balanced dataset across classes. If you define your data structure well in the beginning, it will save you a ton of time, as you won't have to make changes often.
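As a rough illustration of the kind of zoom-out augmentation mentioned above, the sketch below shrinks an image onto a canvas of the same size and moves the ship's bounding box along with it. It is only a toy under stated assumptions: the image is a plain nested list of pixel values, `zoom_out` and its integer `factor` are hypothetical names, and a real pipeline would most likely use an image library's resize instead of the crude subsampling shown here.

```python
def zoom_out(image, box, factor, fill=0):
    """Shrink `image` by an integer `factor` and centre it on a same-sized canvas.

    `image` is a list of rows of pixel values; `box` is (x0, y0, x1, y1) in pixels.
    Returns the new image and the bounding box moved to match.
    """
    height, width = len(image), len(image[0])

    # Nearest-neighbour subsampling: keep every `factor`-th row and column.
    small = [row[::factor] for row in image[::factor]]
    small_h, small_w = len(small), len(small[0])

    # Paste the shrunken image in the middle of a blank canvas.
    top, left = (height - small_h) // 2, (width - small_w) // 2
    canvas = [[fill] * width for _ in range(height)]
    for y in range(small_h):
        canvas[top + y][left:left + small_w] = small[y]

    # The label has to follow the pixels, otherwise the example is mislabelled.
    x0, y0, x1, y1 = box
    new_box = (left + x0 / factor, top + y0 / factor,
               left + x1 / factor, top + y1 / factor)
    return canvas, new_box


image = [[1] * 8 for _ in range(8)]            # stand-in for a real photo
augmented, new_box = zoom_out(image, (2, 2, 6, 6), factor=2)
```

The point of the sketch is that one source photo can yield several apparent distances at training time, so there is no need to store hand-zoomed copies in the dataset itself.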

Is There a Quick, Efficient Way to Add Large Numbers of Labels in Either ArcGIS or QGIS?

In 2007, when I was young and foolish and before I knew about OpenStreetMap, I started an urban historical map project. I was working in Illustrator, it was going to be an interactive Flash piece, and my process was to draw the maps first, with the thought that I'd label some, but not all, of the streets later on.
As we know, Flash began to die around 2010, and I put the project away for a number of years. I picked it up again a couple of years ago and continued my earlier practice of just drawing streets and water features, this time with the intention of making it a conventional web map. Now I'm pretty close to finishing the drawing of a five-layer (1871, 1903, 1932, 1952 and 2016) historical map of a medium-sized city, though it still lacks labels.
My problem now is how to add large numbers of labels, many of them duplicates. There could be as many as 10,000 for all five layers, though as a practical matter I may have to settle for a smallish fraction of that number. Based on web searches I gather my workflow is unusual and that mine is therefore an unusual problem.
I've exported my maps and brought them into QGIS and played with the software a little. The process of adding labels to objects doesn't seem terribly efficient or user-friendly, but that's probably due to my unfamiliarity with the program.
So my question is this: Are there any tricks to speed up the painful process of adding large numbers of duplicate labels in either QGIS or ArcGIS? Since so many of the streets exist in all five layers, functionality like the ability to select multiple objects in different layers and edit their attributes simultaneously in the Attribute Table would be a godsend. (Doesn't seem possible.) So would the ability to copy the attributes from one object and paste them onto other objects. Or the ability to do either of these things in Illustrator via a plugin and then export the data along with the shapes to a GIS program.
Thanks for your help!
If I understand the issue correctly, I think there are several different solutions. When you say that you
Typically, for a spatial layer in ArcGIS or QGIS, you define the labelling once per layer: a single label scheme is applied to every feature in it, whether there is 1 or 1 million. This assumes that each feature in the layer has one or more attributes in the layer's associated table.
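Because labels are driven by attributes, the bulk of the work becomes filling in a name attribute, and that can be scripted if the same street carries a shared identifier in every layer. The sketch below shows the idea with plain Python dictionaries standing in for attribute tables; it does not use the QGIS or ArcGIS APIs, and `street_id`, `name` and `apply_names` are hypothetical names chosen for illustration.

```python
def apply_names(layers, names_by_street_id, field="name"):
    """Copy a street name onto every feature, in every layer, that shares its id.

    `layers` maps a layer name to a list of feature attribute dicts.
    Returns how many features were updated.
    """
    updated = 0
    for features in layers.values():
        for feature in features:
            name = names_by_street_id.get(feature.get("street_id"))
            if name is not None:
                feature[field] = name
                updated += 1
    return updated


# Two of the five year layers, with the same street drawn in both.
layers = {
    "1871": [{"street_id": 7}],
    "1903": [{"street_id": 7}, {"street_id": 9}],
}
count = apply_names(layers, {7: "Main Street"})
```

With the attribute filled in this way, each street name is typed once and the per-layer label scheme takes care of drawing it.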
How are you converting the Illustrator vectors to a spatial layer? DXF?
You will likely get better and faster responses to this question by posting it to the GIS Stack Exchange: https://gis.stackexchange.com/

Tensorflow: how to detect audio direction

I have a task: to determine the sound source location.
I have some experience working with TensorFlow, making predictions from some simple features and datasets. I assume that for this task it would be necessary to analyze the sound frequencies, and probably other related data, during the training and prediction steps. The sound comes from a headset, so the human ear is able to detect the direction.
1) Has anybody already done this? (Unfortunately I couldn't find any similar project.)
2) What kind of pitfalls could I run into while trying to achieve it?
3) Can I do this with this technology? Are there any other sound processing frameworks / technologies / open source projects that could help me?
I am asking that here, since my research on google, github, stackoverflow didn't show me any relevant results on that specific topic, so any help is highly appreciated!
This is typically done with more traditional DSP and multiple sensors. You might want to look into time difference of arrival (TDOA) and direction of arrival (DOA). Algorithms such as GCC-PHAT and MUSIC will be helpful.
Issues that you might encounter: DOA accuracy is a function of the direct-to-reverberant ratio of the source, i.e. the more reverberant the environment, the harder it is to determine the source location.
Also, you might want to consider the number of location dimensions you want to resolve. A point in 3D space is much more difficult than a direction relative to the sensors.
Using ML as an approach to this is not entirely without merit, but you will have to consider what it is you would be learning, i.e. you probably don't want to learn the test room's reverberant properties but instead the sensors' spatial properties.
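To make the TDOA idea concrete, here is a minimal sketch that estimates the delay between two microphone signals and turns it into a bearing. It deliberately uses plain time-domain cross-correlation, which is simpler than GCC-PHAT (no phase weighting, so it is less robust to reverberation). The two-microphone far-field geometry, the function names, and the example sample rate and spacing are all assumptions made for illustration.

```python
import math


def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag in samples at which sig_b best lines up with sig_a.

    A positive lag means the sound reached microphone B later than microphone A.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, value in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += value * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_of_arrival(lag, sample_rate, mic_distance, speed_of_sound=343.0):
    """Convert a lag into an angle in degrees from broadside (far-field model)."""
    tau = lag / sample_rate
    ratio = max(-1.0, min(1.0, speed_of_sound * tau / mic_distance))
    return math.degrees(math.asin(ratio))


# A short burst, and the same burst arriving 3 samples later at the second mic.
sig_a = [0.0] * 10 + [1.0, -2.0, 3.0, -1.0, 0.5] + [0.0] * 15
sig_b = [0.0] * 3 + sig_a[:-3]

lag = estimate_delay(sig_a, sig_b, max_lag=8)
angle = direction_of_arrival(lag, sample_rate=48000, mic_distance=0.2)
```

Note that this resolves only a single angle relative to the microphone pair, which is the easier of the two cases mentioned above; locating a point in 3D needs more sensors.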

In Unity Combine Meshes Vs Instance Objects the Difference

I am in serious need of optimizing some of my Unity projects. I have many objects that come from 3ds Max, so I am wondering whether combining the meshes would have any effect on memory/performance, or whether I should leave the objects as instances of each other, since that would save me some space.
This raises the question of what the difference is between combined mesh objects and instanced objects. Knowing the difference, and which is better, would save a lot of memory and hassle.
Looking forward to some brief information about the two.
Thanks
Combining is useful if you have a lot of unique assets that only appear once or twice in a scene, e.g. unique buildings in a 3D FPS, but not cloned houses in a SimCity-style game. If you have a model that appears many times in a scene, it's more performant to have Unity batch them automatically, which is Unity's default behaviour. For example, let's say your scene is in an art gallery: if the gallery contains a dozen distinct sculptures, then combine them. If it contains a dozen of the same sculpture, don't bother; Unity will batch them for you.
However, you should be wary of using different materials, as each material adds to the draw count. So, if you had 10 of the same model but using 5 different materials, it's going to be expensive. The way round this is to use a texture atlas with a single material, with different UV mapping for each model. This means you can have a lot of different models but still save on render time thanks to the single material.
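The UV remapping behind a texture atlas is simple arithmetic, sketched below in Python for illustration rather than as Unity code. It assumes a square grid atlas in which every tile has the same size and tile 0 sits at the UV origin; `atlas_uv`, `tile_index` and `tiles_per_row` are hypothetical names, and a real atlas may be laid out differently or need padding between tiles.

```python
def atlas_uv(u, v, tile_index, tiles_per_row):
    """Map a UV coordinate from a model's own texture into a grid atlas.

    (u, v) is in [0, 1] on the original texture; the result is in [0, 1]
    on the shared atlas, inside the tile assigned to that model.
    """
    scale = 1.0 / tiles_per_row
    column = tile_index % tiles_per_row
    row = tile_index // tiles_per_row
    return (column + u) * scale, (row + v) * scale


# A 2x2 atlas: the centre of the fourth texture lands in the last tile.
centre = atlas_uv(0.5, 0.5, tile_index=3, tiles_per_row=2)
```

Each model keeps its own look by pointing at a different tile, while every model shares the one atlas material.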
Also, be aware that transparent shaders are much more expensive than opaque ones; if you have three semi-transparent objects in front of each other, that's at least 4 render passes.
As you probably know this is a complex subject with a lot of variables (many more than I can describe here) and is best judged by using the profiler.
Here are some general rules of thumb I've learned while creating a game for mobile, which naturally is performance critical:
Use as few materials as possible.
Use as few textures as possible, and share textures between materials.
Recycle models as often as possible. Often a model oriented at a different angle or in a different material can look like a whole new model to the player, particularly if their attention is elsewhere in the game.
Use LODs extensively.
Ensure your models are clean; remove all unnecessary vertices before importing.
After importing, think about whether anything in the model could be represented with fewer vertices.
Good use of normal mapping can pay off, depending on the platform. If you can trade in 1000 verts for a 256 px normal map and 50 verts, then do it; otherwise don't bother normal mapping just to save a few verts.
I created a tutorial that explains draw calls, static batching, lightmapping etc.
https://www.youtube.com/watch?v=x0t2xylbTo8&t=253s

Using tensorflow to identify lego bricks?

Having read this article about a guy who uses TensorFlow to sort cucumbers into nine different classes, I was wondering if this type of process could be applied to a large number of classes. My idea would be to use it to identify Lego parts.
At the moment, a site like Bricklink describes more than 40,000 different parts, so it would be a bit different from the cucumber example, but I am wondering if it sounds suitable. There is no easy way to get hundreds of pictures for each part, but does the following process sound feasible:
take pictures of a part;
try to identify the part using TensorFlow;
if it does not identify the correct part, take more pictures and feed the neural network with them;
go on with the next part.
That way, each time we encounter a new piece we "teach" the network its reference so that it can be better recognized the next time. After hundreds of such iterations monitored by a human, could we expect TensorFlow to be able to recognize the parts? At least the most common ones?
My question might sound stupid but I am not into neural networks so any advice is welcome. At the moment I have not found any way to identify a lego part based on pictures and this "cucumber example" sounds promising so I am looking for some feedback.
Thanks.
You can read about the work of Jacques Mattheij; he actually uses a customized version of Xception [1] running on Keras (https://keras.io/).
The introduction is "Sorting 2 Metric Tons of Lego".
In "Sorting 2 Tons of Lego, The Software Side" you can read:
"The hard challenge to deal with next was to get a training set large enough to make working with 1000+ classes possible. At first this seemed like an insurmountable problem. I could not figure out how to make enough images and to label them by hand in acceptable time, even the most optimistic calculations had me working for 6 months or longer full-time in order to make a data set that would allow the machine to work with many classes of parts rather than just a couple.
"In the end the solution was staring me in the face for at least a week before I finally clued in: it doesn’t matter. All that matters is that the machine labels its own images most of the time and then all I need to do is correct its mistakes. As it gets better there will be fewer mistakes. This very rapidly expanded the number of training images. The first day I managed to hand-label about 500 parts. The next day the machine added 2000 more, with about half of those labeled wrong. The resulting 2500 parts where the basis for the next round of training 3 days later, which resulted in 4000 more parts, 90% of which were labeled right! So I only had to correct some 400 parts, rinse, repeat… So, by the end of two weeks there was a dataset of 20K images, all labeled correctly.
"This is far from enough, some classes are severely under-represented so I need to increase the number of images for those, perhaps I’ll just run a single batch consisting of nothing but those parts through the machine. No need for corrections, they’ll all be labeled identically."
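The correct-the-machine loop described in that passage can be sketched roughly as follows. This is a toy outline, not the code from the quoted project: `bootstrap_labels` and the `predict`, `review` and `retrain` callables are hypothetical names, and the "model" in the usage part is just a dictionary lookup standing in for a real network.

```python
def bootstrap_labels(batches, predict, review, retrain):
    """Grow a labelled dataset: the model proposes labels, a human fixes them.

    Returns the verified dataset and the number of corrections needed per round.
    """
    dataset = []                 # (image, label) pairs, all checked by the reviewer
    corrections_per_round = []
    for batch in batches:
        proposed = [(image, predict(image)) for image in batch]
        verified = [(image, review(image, label)) for image, label in proposed]
        corrections = sum(1 for (_, guess), (_, truth) in zip(proposed, verified)
                          if guess != truth)
        corrections_per_round.append(corrections)
        dataset.extend(verified)
        retrain(dataset)         # the next round starts from a better model
    return dataset, corrections_per_round


# Toy stand-ins: an "image" is a (feature, true_label) pair.
known = {}

def predict(image):
    return known.get(image[0], "unknown")

def review(image, proposed_label):
    return image[1]              # stand-in for a human who knows the right answer

def retrain(dataset):
    for image, label in dataset:
        known[image[0]] = label

batches = [
    [("f1", "brick_2x4"), ("f2", "plate_1x2")],
    [("f1", "brick_2x4"), ("f2", "plate_1x2"), ("f3", "tile_1x1")],
]
dataset, corrections = bootstrap_labels(batches, predict, review, retrain)
```

In the toy run, the reviewer has to fix every label in the first round but only the previously unseen part in the second, which is the effect the quote describes.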
A recent update is Sorting 2 Tons of Lego, Many Questions, Results.
[1] Chollet, François. Xception: Deep Learning with Depthwise Separable Convolutions. arXiv preprint arXiv:1610.02357, 2016.
I have started on this using IBM Watson's Visual Recognition.
I had six different bricks to be recognized against the background of the transport belt.
I am now thinking about TensorFlow, since I can run it locally.
The codelab TensorFlow for Poets describes almost exactly what you want to achieve.
For a demo of the Watson version:
https://www.ibm.com/developerworks/community/blogs/ibmandgoogle/entry/Lego_bricks_recognition_with_Watosn_lego_and_raspberry_pi?lang=en

Insert skeleton in 3D model programmatically

Background
I'm working on a project where a user gets scanned by a Kinect (v2). The result will be a generated 3D model which is suitable for use in games.
The scanning aspect is going quite well, and I've generated some good user models.
Example:
Note: This is just an early test model. It still needs to be cleaned up, and the stance needs to change to properly read skeletal data.
Problem
The problem I'm currently facing is that I'm unsure how to place skeletal data inside the generated 3D model. I can't seem to find a program that will let me insert the skeleton in the 3D model programmatically. I'd like to do this either via a program that I can control programmatically, or adjust the 3D model file in such a way that skeletal data gets included within the file.
What have I tried
I've been looking around for similar questions on Google and StackOverflow, but they usually refer to either motion capture or skeletal animation. I know Maya has the option to insert skeletons in 3D models, but as far as I could find that is always done by hand. Maybe there is a more technical term for the problem I'm trying to solve, but I don't know it.
I do have a train of thought on how to achieve the skeleton insertion. I imagine it to go like this:
1. Scan the user and generate a 3D model with Kinect;
1.2. Clean user model, getting rid of any deformations or unnecessary information. Close holes that are left in the clean up process.
2. Scan user skeletal data using the Kinect.
2.2. Extract the skeleton data.
2.3. Get joint locations and store as xyz-coordinates for 3D space. Store bone length and directions.
3. Read 3D skeleton data in a program that can create skeletons.
4. Save the new model with inserted skeleton.
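The step that stores joint locations, bone lengths and directions is plain vector arithmetic, so it can be sketched independently of any particular tool. In the sketch below the joint names, the parent mapping and the `bones_from_joints` helper are hypothetical; the real joint set and hierarchy would come from the Kinect skeleton data.

```python
import math


def bones_from_joints(joints, parents):
    """Derive each bone's length and unit direction from joint positions.

    `joints` maps a joint name to an (x, y, z) position.
    `parents` maps a child joint name to its parent joint name.
    """
    bones = {}
    for child, parent in parents.items():
        px, py, pz = joints[parent]
        cx, cy, cz = joints[child]
        dx, dy, dz = cx - px, cy - py, cz - pz
        length = math.sqrt(dx * dx + dy * dy + dz * dz)
        # Guard against two joints reported at the same position.
        direction = (dx / length, dy / length, dz / length) if length else (0.0, 0.0, 0.0)
        bones[(parent, child)] = {"length": length, "direction": direction}
    return bones


joints = {"spine_base": (0.0, 0.0, 0.0), "spine_mid": (0.0, 3.0, 4.0)}
parents = {"spine_mid": "spine_base"}
bones = bones_from_joints(joints, parents)
```

Keeping the parent mapping alongside the positions preserves the hierarchy that a rigging tool would need to rebuild the skeleton.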
Question
Can anyone recommend (I know, this is perhaps "opinion based") a program to read the skeletal data and insert it into a 3D model? Is it possible to utilize Maya for this purpose?
Thanks in advance.
Note: I opted to post the question here and not on Graphics Design Stack Exchange (or other Stack Exchange sites) because I feel it's more coding related, and perhaps more useful for people who will search here in the future. Apologies if it's posted on the wrong site.
A tricky part of your question is what you mean by "inserting the skeleton". Typically bone data is very separate from your geometry, and stored in different places in your scene graph (with the bone data being hierarchical in nature).
There are file formats you can export to where you might establish some association between your geometry and skeleton, but that's very format-specific as to how you associate the two together (ex: FBX vs. Collada).
Probably the closest thing to "inserting" or, more appropriately, "attaching" a skeleton to a mesh is skinning. There you compute weight assignments, basically determining how much each bone influences a given vertex in your mesh.
This is a tough part to get right (both programmatically and artistically). For the highest quality needs (commercial games, films, etc.) it is often a semi-automatic solution at best, with artists laboring over tweaks to the resulting weight assignments and/or skeleton.
There are algorithms for determining these weight assignments that range from simple heuristics, like assigning weights based on nearest line distance (very crude, and will often fall apart near tricky areas like the pelvis or shoulder), to quite sophisticated ones that actually consider the mesh as a solid volume (using voxels or tetrahedral representations). Example: http://blog.wolfire.com/2009/11/volumetric-heat-diffusion-skinning/
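For a feel of the crude end of that range, here is a minimal sketch of distance-based weighting: each bone is treated as a line segment and a vertex is weighted towards the bones it lies closest to. The inverse-square falloff, the `eps` guard and the function names are assumptions made for illustration, and the result shares the weaknesses of the nearest-line heuristic described above.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b (3D tuples)."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    denom = sum(c * c for c in ab)
    # Parameter of the closest point along the segment, clamped to its ends.
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(ap[i] * ab[i] for i in range(3)) / denom))
    closest = [a[i] + t * ab[i] for i in range(3)]
    return math.dist(p, closest)


def skin_weights(vertex, bones, eps=1e-9):
    """Weight a vertex by inverse squared distance to each bone, normalised to 1."""
    raw = {name: 1.0 / (point_segment_distance(vertex, a, b) ** 2 + eps)
           for name, (a, b) in bones.items()}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


bones = {
    "upper_arm": ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    "forearm": ((1.0, 0.0, 0.0), (2.0, 0.0, 0.0)),
}
mid_arm = skin_weights((0.5, 0.1, 0.0), bones)   # dominated by the upper arm
elbow = skin_weights((1.0, 0.1, 0.0), bones)     # shared between both bones
```

Because it only looks at straight-line distance, a vertex on one leg can pick up weight from the other leg's bone, which is why the volume-aware methods exist.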
However, you might be able to get decent results using an algorithm like delta mush which allows you to get a bit sloppy with weight assignments but still get reasonably smooth deformations.
Now if you want to do this externally, pretty much any 3D animation software will do, including free ones like Blender. However, skinning and character animation in general is something that tends to take quite a bit of artistic skill and a lot of patience, so it's worth noting that it's not quite as easy as it might seem to make characters leap and dance and crouch and run and still look good even when you have a skeleton in advance. That weight association from skeleton to geometry is the toughest part. It's often the result of many hours of artists laboring over the deformations to get them to look right in a wide range of poses.