Customizing Stanford NER for detecting actor, director, and production company names in movie reviews

I want to train my model to detect the names of actors, directors, and production companies in movie reviews. I have tokenized and tagged my dataset; now I have to set the feature properties. I searched a lot about NERFeatureFactory, but I couldn't find any reasonable way to completely understand this class.
My question is: how can we define our own features in NERFeatureFactory in order to train the model the right way? Please help me understand NERFeatureFactory, the properties file, and how to define custom features.
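For context on how the pieces fit together: you normally do not edit NERFeatureFactory itself. Instead, it implements a large set of feature families (word identity, character n-grams, neighboring words, word shapes, etc.) that you switch on or off via flags in a properties file, which you then pass to CRFClassifier for training. Below is a minimal sketch in Python that writes such a properties file and launches training; the file names are placeholders, and the flag set mirrors the widely cited example configuration from the Stanford NER documentation, so treat it as a starting point rather than a recipe.

```python
import subprocess
from textwrap import dedent

# Placeholder file names; the training file is tab-separated with one
# token per line and labels such as ACTOR, DIRECTOR, COMPANY, O.
props = dedent("""\
    # Training data and output model
    trainFile = movie_reviews.tsv
    serializeTo = movie-ner-model.ser.gz
    # Column layout of the training file: token, then gold label
    map = word=0,answer=1

    # Each flag below toggles a feature family inside NERFeatureFactory
    useClassFeature = true
    useWord = true
    useNGrams = true
    noMidNGrams = true
    maxNGramLeng = 6
    usePrev = true
    useNext = true
    useSequences = true
    usePrevSequences = true
    maxLeft = 1
    useTypeSeqs = true
    useTypeSeqs2 = true
    useTypeySequences = true
    wordShape = chris2useLC
    useDisjunctive = true
""")

with open("movie.prop", "w") as f:
    f.write(props)

# Train the CRF; stanford-ner.jar must be on the classpath.
subprocess.run([
    "java", "-cp", "stanford-ner.jar",
    "edu.stanford.nlp.ie.crf.CRFClassifier",
    "-prop", "movie.prop",
], check=True)
```

If the built-in feature families are genuinely not enough, you can subclass FeatureFactory in Java and point the featureFactory property at your class. For entity types like actors and directors, though, tuning the flags, plus gazetteer features with lists of known names, usually goes a long way.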

Related

Estimating the people and book capacity for a new library using analysis

A friend was asked this question, and I would like to know what an ideal response would be.
If you were asked to analyze how many people and books a new public library in your area should accommodate, and where such a library should be located: what inputs would you need to perform the analysis, what key factors would need to be considered, and how would you test whether your analysis was accurate?
The analysis I did was as follows, for the location as well as the book and people capacity:
1. For the location, I would ask questions such as: Where are the city's most populous educational institutes located? Where do most college students go after their college or tuition classes? Which locations are entertainment hotspots that attract people, for example malls and shopping centers? Which location is best connected to other parts of the city via public transport routes such as bus, train, and metro?
I think answering these would give an idea of where the library should be established.
2. Coming to the people and book capacity: including the students present in the given area, could one estimate an average of the student population as a reference number? (Though I am not sure this qualifies as a correct method.) One could also take an existing library elsewhere as an example, find out how many people visit it, and get a rough estimate (the same can be done for the book capacity). Finally, I think we could count how many working professionals, residents, and students inhabit the location and estimate a range for how many people would visit the library.
For the book capacity: a share of the books would be educational, depending on the educational backgrounds of the majority of students, which could be estimated from the number of students residing there. Besides educational books, the number of fiction and non-fiction books could be based on the average number of best sellers in a given month. And lastly, we could again compare with existing libraries in the vicinity. A rough worked example of these estimates follows.
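To make the estimation part concrete, here is a back-of-envelope sketch; every number in it is a made-up placeholder that would be replaced with real census data, survey results, or figures from a comparable existing library:

```python
# Back-of-envelope capacity estimate; all inputs are illustrative placeholders.
students = 12_000       # students in the catchment area
professionals = 30_000  # working professionals nearby
residents = 50_000      # other residents

# Assumed fraction of each group visiting in a given week, loosely
# anchored to what a comparable existing library reports.
visit_rate = {"students": 0.15, "professionals": 0.03, "residents": 0.02}

weekly_visits = (students * visit_rate["students"]
                 + professionals * visit_rate["professionals"]
                 + residents * visit_rate["residents"])

# Seats needed: visits spread over ~60 opening hours per week at ~2 hours
# per visit, scaled up for the busiest hour.
peak_factor = 2.0
seats = weekly_visits * 2 / 60 * peak_factor

# Book capacity: assume ~1.5 books per person actually served (placeholder
# rule of thumb), serving ~20% of the surrounding population.
books = 1.5 * 0.2 * (students + professionals + residents)

print(f"~{weekly_visits:.0f} visits/week, ~{seats:.0f} seats, ~{books:.0f} books")
```

Checking these placeholder rates against footfall counts at an existing library would be one way to validate the analysis beyond surveys.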
For testing the hypothesis, the only way I can think of is conducting surveys or asking people first-hand in a Q&A interview format near college areas. Any other suggestions?

How can I train a TensorFlow model based on partial objects? (book cover + book edge)

I would like to be able to detect books by pointing a phone camera at their covers. I found several TensorFlow models matching that requirement. However, in my case I sometimes also know what the edge of the book associated with a cover looks like, which can help when two books have the same cover but a different edge.
How can I use this additional information to detect book covers more precisely, especially when the camera's field of view includes part of the cover and part of the book edge?
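One common way to exploit both views is a two-branch ("late fusion") network: embed the cover and the edge with separate convolutional branches, concatenate the embeddings, and classify the book identity from the fused vector. Here is a minimal Keras sketch under that assumption; the input size, layer widths, and NUM_BOOKS are placeholders:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_BOOKS = 500  # placeholder: number of distinct books in the catalog

def image_branch(name):
    # A small CNN that embeds one view (cover or edge) of the book.
    inp = layers.Input(shape=(224, 224, 3), name=name)
    x = layers.Conv2D(32, 3, activation="relu")(inp)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    return inp, x

cover_in, cover_emb = image_branch("cover")
edge_in, edge_emb = image_branch("edge")

# Late fusion: concatenate the two embeddings, then classify the book.
fused = layers.Concatenate()([cover_emb, edge_emb])
out = layers.Dense(NUM_BOOKS, activation="softmax")(fused)

model = Model(inputs=[cover_in, edge_in], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

When only the cover is visible, you could feed a blank edge crop (or train with random edge dropout) so the model degrades gracefully; the edge branch then mainly serves to disambiguate books whose covers are identical.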

Speech Synthesis based on a multi-person corpus

As part of a project, we want to experiment with synthetic voices that do not have a singular geographic origin, body, age, or gender. We have our own dataset, but I thought of doing initial experiments with VCTK and building a voice using Tacotron 2 or something similar. Does anyone know whether a similar project has been done, where the physical body we imagine connected to a voice is intentionally ambiguous? Or other projects where TTS has been trained on a multi-person corpus? Additionally, does anyone know of any caveats or potential problems with this approach? Maybe there are ways of working with transfer learning that could be beneficial.
Thanks!
You can check https://github.com/r9y9/deepvoice3_pytorch.
Multi-speaker samples are available, as well as a pretrained model you can try.

How to encode inputs like artist or actor

I am currently developing a neural network that tries to make a suggestion for a specific user based on their recent activities. I will try to illustrate my problem with an example.
Let's say I'm trying to suggest new music to a user based on the music they recently listened to. Since people often listen to artists they know, one input of such a neural network might be the artists they recently listened to.
The problem is the encoding of this feature. As the artist's ID in the database has no meaning for the neural network, the only other option that comes to mind is one-hot encoding every artist, but that doesn't sound too promising either, given the thousands of different artists out there.
My question is: how can I encode such a feature?
The approach you describe is called content-based filtering. The intuition is to recommend to customer A items similar to items A liked previously. An advantage of this approach is that you only need data about one user, which tends to result in a "personalized" recommendation. But disadvantages include the construction of features (the problem you're dealing with now) and the difficulty of building an interesting profile for new users; it will also never recommend items outside a user's content profile. As for the difficulty of representation, features are usually handcrafted and abstracted afterwards. For music specifically, features would be things like 'artist', 'genre', etc., and abstraction into informative keywords (where necessary) is widely done using tf-idf; a small sketch of this follows.
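To make the tf-idf step concrete, here is a minimal sketch using scikit-learn, assuming each track is described by a short metadata string of artist and genre keywords (the tracks and keywords below are made up for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical item metadata: artist/genre keywords per track.
tracks = [
    "radiohead alternative rock electronic",
    "portishead trip-hop electronic downtempo",
    "miles davis jazz trumpet bebop",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(tracks)  # one tf-idf vector per track

# Similarity of every track to track 0 under cosine similarity;
# the nearest items would be the content-based recommendations.
print(cosine_similarity(X[0], X).ravel())
```

The resulting vectors weight rare, informative keywords higher than ubiquitous ones, which is exactly the abstraction step described above.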
This may go outside the scope of the question, but I think it is also worth mentioning an alternative approach: collaborative filtering. Rather than looking for similar items, here we try to find users with similar tastes and recommend products that they liked. The only data you need are user ratings, or some measure of how much users (dis)liked items, which eliminates the need for feature design. Furthermore, since we analyze similar people rather than items, this approach can also work for new users once they have rated a few items. The general flow of collaborative filtering looks like this:
1. Measure the similarity between the user of interest and all other users.
2. (Optional) Select a smaller subset consisting of the most similar users.
3. Predict ratings as a weighted combination of these "nearest neighbors".
4. Return the highest-rated items.
A popular choice for the similarity weighting in this algorithm is the Pearson correlation coefficient; a sketch of the whole flow follows.
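Here is a minimal sketch of those four steps for user-based collaborative filtering, assuming a small user-item rating matrix with NaN for unrated items (the users, items, and ratings are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: rows = users, columns = items, NaN = not rated.
ratings = pd.DataFrame(
    [[5, 4, 1, np.nan, 3],
     [4, 5, 1, 2, 3],
     [1, 2, 5, 5, np.nan]],
    index=["alice", "bob", "carol"],
    columns=["item_a", "item_b", "item_c", "item_d", "item_e"],
)

target = "alice"
# Step 1: Pearson similarity between the target and every other user
# (pandas correlates over the items both users have actually rated).
sims = ratings.T.corr(method="pearson")[target].drop(target)
# Step 2: keep only the most similar, positively correlated neighbors.
neighbors = sims[sims > 0].nlargest(2)
# Step 3: predict unrated items as a similarity-weighted average of
# the neighbors' ratings.
predictions = {}
for item in ratings.columns[ratings.loc[target].isna()]:
    neighbor_ratings = ratings.loc[neighbors.index, item].dropna()
    if not neighbor_ratings.empty:
        weights = neighbors[neighbor_ratings.index]
        predictions[item] = np.average(neighbor_ratings, weights=weights)
# Step 4: return the highest-predicted items.
print(sorted(predictions.items(), key=lambda kv: -kv[1]))
```

In practice you would also mean-center each user's ratings before averaging, since different users calibrate rating scales differently.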
Finally, something to consider here is performance and scalability: computing pairwise similarities for millions of users is not exactly lightweight on a normal computer.

Audio analysis for voice, gender diarization/recognition

Does anyone know of a library, program, project, etc. that tries to determine how many speakers were active in an audio file, label each speaker, label their gender, and so on?
So far I found the following:
Identifying segments when a person is speaking?
Audio analysis to detect human voice, gender, age and emotion — any prior open-source work done?
The task of identifying how many people there are and assigning segments to speakers in an audio file is known as speaker diarization. Searching with this keyword, you can find lots of research papers and some Python libraries. Most current research uses deep learning models, typically RNNs, to generate embeddings and then cluster them into chunks that ideally belong to different speakers. It is a difficult task, especially if your files are noisy. I haven't found any library or tool that is very accurate; even IBM's API is not that accurate. A sketch with one of the Python libraries follows.
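As one concrete example of such a Python library, here is a minimal sketch with the open-source pyannote.audio diarization pipeline; the audio file name is a placeholder, and depending on the version you may need a Hugging Face access token to download the pretrained pipeline:

```python
from pyannote.audio import Pipeline

# Load a pretrained speaker diarization pipeline; recent versions may
# require use_auth_token=... to download the model.
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")

# Run diarization on a local file (placeholder name).
diarization = pipeline("meeting.wav")

# Print who speaks when; speakers get anonymous labels like SPEAKER_00.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
```

Note that gender and emotion labels are not part of diarization itself; they are usually produced by running a separate classifier on each speaker's segments.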
We have developed some deep learning models of our own for this task, which are exposed through APIs. You can take a look at https://developers.deepaffects.com/ for more info. We also have gender and emotion recognition APIs.
Disclosure: I work at DeepAffects.