Allocating categories to a comment in pandas - pandas

My task is to allocate a broad and a fine category to each text in a pandas DataFrame.
My df is something like this:
Text
I like this pen
this is the worst light bulb ever
these pants fit me just fine
Desired output:
Text | Broad_cat | Fine_cat
I like this pen | Stationery | Pen
this is the worst light bulb ever | Electrical | Light bulb
these pants fit me just fine | Clothing | Pants
The text could be from any category, so I can't use a prepared dictionary. These are reviews that I can get from any source. I was hoping there is an open-source Python package that can help with the specific task of categorizing a comment. I have already tried YAKE, RAKE, Summa and KeyBERT, and while each of them gives me keywords, the keywords don't always turn out to be the category. Is this even possible? Any help in this regard is much appreciated.

I presume you have a list of allowed categories?
This is a multiclass classification problem.
A fiddly approach is to embed the sentences into some sort of vector space, then use something like the softmax function to select the class, and train the model on labeled training data. This post discusses this.
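The embed-then-softmax approach can be sketched in a few lines of NumPy. This is a minimal illustration, not the method from the linked post: it assumes a small hand-labeled training set (the examples and category names below are hypothetical) and uses a plain bag-of-words count vector as a crude stand-in for a real sentence embedding. The classifier is multinomial logistic regression, i.e. a linear layer followed by softmax, trained by gradient descent on cross-entropy.

```python
import numpy as np

# Hypothetical toy training data: (review text, category) pairs.
TRAIN = [
    ("I like this pen", "Stationery"),
    ("the pencil broke after a day", "Stationery"),
    ("this notebook has great paper", "Stationery"),
    ("this is the worst light bulb ever", "Electrical"),
    ("the lamp flickers constantly", "Electrical"),
    ("the bulb burned out quickly", "Electrical"),
    ("these pants fit me just fine", "Clothing"),
    ("the shirt shrank in the wash", "Clothing"),
    ("nice jacket, warm and light", "Clothing"),
]

def tokenize(text):
    return text.lower().replace(",", " ").split()

def build_vocab(texts):
    vocab = sorted({tok for t in texts for tok in tokenize(t)})
    return {tok: i for i, tok in enumerate(vocab)}

def embed(texts, vocab):
    """Bag-of-words counts: a crude stand-in for a real sentence embedding."""
    X = np.zeros((len(texts), len(vocab)))
    for row, text in enumerate(texts):
        for tok in tokenize(text):
            if tok in vocab:  # unseen words are simply ignored
                X[row, vocab[tok]] += 1.0
    return X

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def train(X, y, n_classes, lr=0.5, epochs=500):
    """Softmax regression fitted by plain gradient descent on mean cross-entropy."""
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        P = softmax(X @ W + b)
        grad = (P - Y) / len(X)  # gradient of the loss w.r.t. the logits
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

texts, labels = zip(*TRAIN)
classes = sorted(set(labels))
vocab = build_vocab(texts)
X = embed(texts, vocab)
y = np.array([classes.index(label) for label in labels])
W, b = train(X, y, len(classes))

def predict(new_texts):
    P = softmax(embed(new_texts, vocab) @ W + b)
    return [classes[i] for i in P.argmax(axis=1)]

print(predict(["my new pen writes smoothly", "the bulb is too dim"]))
```

With a real embedding model, only the embed function would change; the softmax layer on top stays the same. Note that this needs labeled examples for every category you want to predict.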
I think you might be more interested in zero-shot text classification. Hugging Face has a pipeline (a way of using models for certain tasks) for this, which takes a candidate_labels argument. So you should be able to use it with an appropriate model and specify your candidate labels, though the underlying model has to support this in some way; presumably some do. cross-encoder/nli-distilroberta-base appears to support it.
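A minimal sketch of the zero-shot route applied to the question's DataFrame, assuming the transformers and pandas packages are installed. The CATEGORIES hierarchy is hypothetical and would be replaced by your own list of allowed categories. The two-step idea (pick a broad category first, then a fine category only among that category's children) is one possible design, not something the pipeline does for you. The classifier is passed in as an argument, so anything with the same call signature can be substituted.

```python
import pandas as pd

# Hypothetical label hierarchy: broad category -> allowed fine categories.
CATEGORIES = {
    "Stationery": ["pen", "pencil", "notebook"],
    "Electrical": ["light bulb", "lamp", "cable"],
    "Clothing": ["pants", "shirt", "jacket"],
}

def build_zero_shot_classifier(model="cross-encoder/nli-distilroberta-base"):
    """Create a Hugging Face zero-shot pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily so the rest works without it
    return pipeline("zero-shot-classification", model=model)

def top_label(classifier, text, labels):
    # The zero-shot pipeline returns a dict whose "labels" are sorted best-first.
    return classifier(text, candidate_labels=labels)["labels"][0]

def categorize(df, classifier, categories=CATEGORIES, text_col="Text"):
    """Return a copy of df with Broad_cat and Fine_cat columns added."""
    broad, fine = [], []
    for text in df[text_col]:
        b = top_label(classifier, text, list(categories))
        broad.append(b)
        # Restrict the fine label to children of the chosen broad category.
        fine.append(top_label(classifier, text, categories[b]))
    return df.assign(Broad_cat=broad, Fine_cat=fine)
```

Once the model is available, usage would be roughly categorize(df, build_zero_shot_classifier()). The pipeline scores every candidate label for every text, so large DataFrames or long label lists will be slow on a CPU.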

Related

How to predict the next item based on the list

I am working on a task where I need to predict the next element of a list.
Real example:
I have a list of BLUE and PINK triangles of different sizes, and the sizes can alternate.
What I need is to predict the color of the next triangle based on the list so far (predicting its size as well would be great).
I tried searching Watson AI, TensorFlow, etc., but I couldn't find anything.
I am new to this area and want to learn.
Could someone point me to something, or a set of things, that I can use to get closer to that result?
If possible, please send me some example code or something similar.
If the shapes and colours are random, then it's just guesswork, and a random number generator is probably the best you are going to get.
If there is a limited set, like a pack of cards, then you could use probabilities to guess what is next.
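For the limited-set case, the probability of the next colour follows directly from what is left. A minimal sketch, assuming the full contents of the "deck" are known in advance and items are drawn without replacement; the deck and draws below are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def next_colour_probabilities(deck, seen):
    """Exact P(next colour) when drawing without replacement from a known, finite deck."""
    remaining = Counter(deck)
    remaining.subtract(Counter(seen))  # remove what has already been drawn
    if any(n < 0 for n in remaining.values()):
        raise ValueError("seen contains more of some colour than the deck holds")
    total = sum(remaining.values())
    if total == 0:
        return {}  # deck exhausted: nothing left to predict
    return {colour: Fraction(n, total) for colour, n in remaining.items() if n > 0}

# Hypothetical deck: 5 blue and 5 pink triangles, 4 of them already drawn.
deck = ["BLUE"] * 5 + ["PINK"] * 5
seen = ["BLUE", "BLUE", "PINK", "BLUE"]
print(next_colour_probabilities(deck, seen))  # BLUE 1/3, PINK 2/3
```

The best guess is simply the colour with the highest remaining probability.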
If there is an unlimited set and always a defined pattern from which the next colour can be predicted, then you might be able to use a machine learning model. To get started, search the web for a getting-started tutorial on machine learning. Most will be Python-biased, and scikit-learn is probably all you need.
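Before reaching for scikit-learn, a simple frequency baseline is worth trying when there is a pattern: count which triangle followed each run of k previous triangles, then predict the most common successor of the latest run (a k-th order Markov-style model). This is a swapped-in technique, not something the answer prescribes, and the history below is hypothetical. Treating each triangle as a (colour, size) pair predicts the size along with the colour.

```python
from collections import Counter, defaultdict

def fit_order_k(sequence, k=2):
    """Count which item follows each length-k context in the observed sequence."""
    table = defaultdict(Counter)
    for i in range(len(sequence) - k):
        context = tuple(sequence[i:i + k])
        table[context][sequence[i + k]] += 1
    return table

def predict_next(table, sequence, k=2):
    """Most frequent successor of the last k items, or None if that context is unseen."""
    counts = table.get(tuple(sequence[-k:]))  # .get does not insert into a defaultdict
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical history of (colour, size) triangles following a repeating pattern.
history = [("BLUE", "S"), ("PINK", "L"), ("PINK", "S")] * 4
model = fit_order_k(history, k=2)
print(predict_next(model, history, k=2))
```

If predict_next returns None, that context was never observed; a fuller version would back off to a shorter context or to the overall most common triangle. If this baseline already predicts well, a heavier model may not be needed.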

Methods of labeling human muscles with TensorFlow

I want to be able to label all of the muscles on an athlete's body. I have a lot of images in which the athletes are in almost the same pose, but the issue I am running into is that drawing a box around each muscle is inaccurate, as the box ends up overlapping other muscles. Drawing exact outlines is difficult because there are a lot of smaller muscles, and the results are inconsistent across 20-30 images. I was wondering if there is a way to feed in a human anatomy reference and then have TensorFlow label all of the muscles in the given pictures.
Alternatively, I was wondering if you have a different idea on how to approach this problem.
I don't have anybody else to ask and I've been researching this for a while, so if I missed or overlooked something, please forgive me.
The way I see it, you need to combine this with some preprocessing steps to normalize your target object in the image, such as:
identify the human,
identify the pose or skeleton (many open-source tools do this nowadays, such as openpose-plus),
use the pose estimation results, which label the limbs or parts of the body, as the starting point for either hand-crafted image processing or another segmentation model.
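Once a pose estimator has produced keypoints, a limb region can be cut out with simple geometry: every pixel within some radius of the segment joining two joints. A sketch in NumPy, assuming keypoints arrive as (x, y) pixel coordinates; the keypoint names, coordinates and radius are hypothetical, and openpose-plus's actual output format may differ. The resulting mask narrows the area in which individual muscles would then be labelled by hand or by a segmentation model.

```python
import numpy as np

def limb_mask(shape, p1, p2, radius):
    """Boolean mask of pixels within `radius` of the segment p1-p2 (points as (x, y))."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs, ys], axis=-1).astype(float)  # (h, w, 2) pixel coordinates
    a = np.asarray(p1, dtype=float)
    b = np.asarray(p2, dtype=float)
    ab = b - a
    denom = float(ab @ ab)
    # Position of each pixel's projection along the segment, clamped to [0, 1].
    t = np.zeros((h, w)) if denom == 0 else np.clip(((pts - a) @ ab) / denom, 0.0, 1.0)
    closest = a + t[..., None] * ab  # nearest point on the segment for every pixel
    dist = np.linalg.norm(pts - closest, axis=-1)
    return dist <= radius

# Hypothetical keypoints (x, y) from a pose estimator; names are illustrative only.
keypoints = {"r_shoulder": (40, 30), "r_elbow": (40, 70)}
image = np.random.default_rng(0).random((100, 80))  # stand-in for a grayscale photo
upper_arm = limb_mask(image.shape, keypoints["r_shoulder"], keypoints["r_elbow"], radius=8)
crop = np.where(upper_arm, image, 0.0)  # keep only the upper-arm region
```

Because every athlete is in roughly the same pose, masks built this way give each image a comparable region to label, which should help with the inconsistency across images.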

Cloud Vision API poorly recognizes 7-segment numbers

The simplest example of what I'm trying to recognize:
I use DOCUMENT_TEXT_DETECTION, but the response contains gibberish characters ("hieroglyphics").
If I pass Eng as a language hint in the ImageContext (via the addAllLanguageHints method), I get 111 as the result. Better, but still wrong.
Is there any way to indicate that the text consists of digits, or otherwise improve the results?
Also, how is the setRepeatedField option in ImageContext used? I could not find any examples of its use.
Thanks in advance.
Even if it doesn't work out of the box, what you'd need is to classify images using custom labels when the default labels won't suffice. Cloud AutoML Vision (select Vision from that blue drop-down menu) lets you train custom models, which can be used to recognize that font. And since the number of possible shapes on a 7-segment display is quite limited, it shouldn't be too difficult to train. A calculator with a better display might also work better: the LCD above looks a little cheap, with those huge gaps and cut-off segment ends. Nevertheless, a model can be trained to read it.
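Because a 7-segment display can only show a handful of shapes, each digit is fully determined by which of its seven segments are dark. As an alternative to training a custom model (a swapped-in rule-based technique, not what the answer proposes), the sketch below decodes digits by lookup. It assumes each digit has already been cropped and is upright and roughly aligned; the sampling positions and threshold are hypothetical and would need tuning for a slanted or low-quality LCD.

```python
import numpy as np

# Conventional segment names:   aaa
#                              f   b
#                               ggg
#                              e   c
#                               ddd
SEGMENTS_TO_DIGIT = {
    "abcdef": "0", "bc": "1", "abdeg": "2", "abcdg": "3", "bcfg": "4",
    "acdfg": "5", "acdefg": "6", "abc": "7", "abcdefg": "8", "abcdfg": "9",
    # Common variants: 6 without the top bar, 7 with f, 9 without the bottom bar.
    "cdefg": "6", "abcf": "7", "abcfg": "9",
}
LOOKUP = {frozenset(k): v for k, v in SEGMENTS_TO_DIGIT.items()}

# Hypothetical sample points for an upright digit, as (x, y) fractions of its box.
SAMPLE_POINTS = {
    "a": (0.50, 0.08), "b": (0.85, 0.28), "c": (0.85, 0.72), "d": (0.50, 0.92),
    "e": (0.15, 0.72), "f": (0.15, 0.28), "g": (0.50, 0.50),
}

def lit_segments(digit_img, dark_threshold=128):
    """Segments whose sample pixel is dark in a grayscale crop of a single digit."""
    h, w = digit_img.shape
    return {
        seg for seg, (fx, fy) in SAMPLE_POINTS.items()
        if digit_img[int(fy * h), int(fx * w)] < dark_threshold
    }

def decode(segment_sets):
    """Map each set of lit segments to a digit, using '?' for unknown patterns."""
    return "".join(LOOKUP.get(frozenset(s), "?") for s in segment_sets)

print(decode([set("abdeg"), set("bcfg")]))  # prints 24
```

Sampling single pixels is fragile; averaging a small window around each sample point, or deskewing the digit first, would make this more robust.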

Using Haar training for post-it note recognition

I need to be able to detect a variety of coloured post-it notes in a Microsoft Kinect video stream. I have tried using Emgu CV for edge detection, but it doesn't seem to locate the vertices/edges. I have also tried colour segmentation/detection, but considering the variety of colours, that may not be robust enough.
I am attempting to use Haar classification. Can anyone suggest the best variety of positive/negative images to use? For example, for the positive images, should I take pictures of many different coloured post-it notes in various lighting conditions and orientations? Seeing as it is quite a simple shape (a square), is using Haar classification over-complicating things?
Haar classifiers are typically used on grayscale images and trigger primarily on morphological, edge-like features. If you want to find post-it notes in an image, the easiest method would seem to be to look at colors (since they come in very distinct colors). Have you tried training an SVM or random forest classifier to detect post-it notes based on color alone? Once you've identified areas in the image that are probably post-it notes, you could start looking at things like the shape as additional validation that you are indeed looking at a post-it note.
Take a look at the following for an example of how to find rectangles in an image using the Hough transform:
https://opencv-code.com/tutorials/automatic-perspective-correction-for-quadrilateral-objects/
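A rough sketch of the colour-first, shape-second idea in plain NumPy. Instead of a trained SVM or random forest, it uses a fixed saturation threshold as a stand-in colour classifier (assuming the notes are far more saturated than their surroundings), groups matching pixels into 4-connected blobs, and keeps blobs that fill most of their bounding box. The thresholds are hypothetical, the input is assumed to be an RGB frame already converted to a NumPy array, and the fill test only suits roughly axis-aligned notes; rotated notes would need something like OpenCV's minAreaRect or the Hough approach linked above.

```python
from collections import deque
import numpy as np

def saturation(rgb):
    """HSV-style saturation per pixel for an (h, w, 3) RGB image."""
    rgb = rgb.astype(float)
    mx = rgb.max(axis=-1)
    mn = rgb.min(axis=-1)
    return np.where(mx > 0, (mx - mn) / np.maximum(mx, 1e-9), 0.0)

def components(mask):
    """4-connected components of a boolean mask, as lists of (y, x) pixels (BFS)."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    out = []
    for y0, x0 in zip(*np.nonzero(mask)):
        if seen[y0, x0]:
            continue
        seen[y0, x0] = True
        queue, pixels = deque([(y0, x0)]), []
        while queue:
            y, x = queue.popleft()
            pixels.append((y, x))
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        out.append(pixels)
    return out

def find_notes(rgb, min_sat=0.5, min_area=50, min_fill=0.85):
    """Bounding boxes (x, y, w, h) of strongly coloured, roughly rectangular blobs."""
    boxes = []
    for pixels in components(saturation(rgb) >= min_sat):
        ys, xs = zip(*pixels)
        x, y = int(min(xs)), int(min(ys))
        bw, bh = int(max(xs)) - x + 1, int(max(ys)) - y + 1
        # Shape check: an axis-aligned note fills most of its bounding box.
        if len(pixels) >= min_area and len(pixels) / (bw * bh) >= min_fill:
            boxes.append((x, y, bw, bh))
    return boxes
```

A trained colour classifier would replace the saturation(rgb) >= min_sat line; the blob grouping and shape validation would stay the same. For real-time Kinect frames, a compiled connected-components routine would be much faster than this pure-Python BFS.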

Suggestions for optimizing a fractal visualization method

I've written up a variation on Melinda Green's Buddhabrot method for visualizing the Mandelbrot set. Here it is:
http://pastebin.com/RH6dD77F
To create an animation, I rendered hundreds of individual images with slight variations. The variation is a transformation of the coefficients of the generating function, treating them as a vector in an abstract space of coefficients. All of that produced incredible structures in the video...
http://www.youtube.com/watch?v=S2uMAvL_5Fo
The problem? As you can tell, the quality of each image is rather low, because the method I came up with takes forever (the copies on my computer are a little better, but still look like old reel-to-reel movies). I'm hoping to find a few methods for increasing quality or reducing rendering time.
Thanks for any suggestions. I would really like to produce more detailed versions of these. Obviously there is much more structure in the graininess of these images.
You can try something like box counting: http://imagej.nih.gov/ij/plugins/fraclac/FLHelp/BoxCounting.htm. Since the Buddhabrot is derived from the Mandelbrot set, you can skip some empty boxes. You can also use a kd-tree, as in lightmap packing, to subdivide the surface.
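The box-skipping idea can be sketched without the original pastebin code (which isn't reproduced here, so this is a generic Buddhabrot, not the author's variation): a cheap coarse pass probes a few random points per grid box, and the expensive sampling then draws only from boxes where something escaped. It also skips points in the main cardioid and period-2 bulb, which never escape and so never contribute, using their closed-form membership tests. Box counts, probe counts and iteration limits are hypothetical, and the box pass is a heuristic: a box whose probes all miss can still contain thin escaping filaments.

```python
import random

def in_main_cardioid_or_bulb(c):
    """Points here never escape, so they add nothing to a Buddhabrot and can be skipped."""
    x, y = c.real, c.imag
    q = (x - 0.25) ** 2 + y * y
    return q * (q + (x - 0.25)) <= 0.25 * y * y or (x + 1) ** 2 + y * y <= 1 / 16

def escape_orbit(c, max_iter):
    """Orbit of z -> z*z + c if it escapes |z| > 2 within max_iter steps, else None."""
    z, orbit = 0j, []
    for _ in range(max_iter):
        z = z * z + c
        orbit.append(z)
        if abs(z) > 2.0:
            return orbit
    return None

def active_boxes(n_boxes, max_iter, probes=4, lo=-2.0, hi=2.0, rng=random):
    """Coarse pass: keep grid boxes where at least one random probe point escaped."""
    size = (hi - lo) / n_boxes
    keep = []
    for i in range(n_boxes):
        for j in range(n_boxes):
            x0, y0 = lo + i * size, lo + j * size
            for _ in range(probes):
                c = complex(x0 + rng.random() * size, y0 + rng.random() * size)
                if not in_main_cardioid_or_bulb(c) and escape_orbit(c, max_iter):
                    keep.append((x0, y0, size))
                    break
    return keep

def buddhabrot(width, samples, max_iter=200, n_boxes=32, lo=-2.0, hi=2.0, seed=0):
    """Square histogram of escaping orbits, sampled only inside active boxes."""
    rng = random.Random(seed)
    hist = [[0] * width for _ in range(width)]
    boxes = active_boxes(n_boxes, max_iter, lo=lo, hi=hi, rng=rng)
    scale = width / (hi - lo)
    for _ in range(samples):
        x0, y0, size = rng.choice(boxes)  # equal-size boxes keep sampling uniform
        c = complex(x0 + rng.random() * size, y0 + rng.random() * size)
        if in_main_cardioid_or_bulb(c):
            continue
        for z in escape_orbit(c, max_iter) or ():
            px, py = int((z.real - lo) * scale), int((z.imag - lo) * scale)
            if 0 <= px < width and 0 <= py < width:
                hist[py][px] += 1
    return hist
```

Raising probes makes the coarse pass less likely to skip a box that does contain escaping points, at the cost of a slower first pass; the saved time goes into more samples per frame, which is what reduces the graininess.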