How can I use training in a project for selecting objects in images? - training-data

Training is a pool of free tasks that performers can complete as practice. Training tasks include correct responses and hints. I created some multiple-choice training tasks, but now I'm stuck on using training for a project that involves selecting objects in images. Is it possible to create such a training pool?

You can't use a training pool here: for the system to accept an assignment as correct, the object selected by the user would have to match the control object exactly, which is almost impossible for free-form selections. To select performers, create a regular pool with non-automatic acceptance instead. Review assignments manually and assign a skill based on the percentage of accepted responses. To allow only skilled performers to label the main pool, set a filter by skill in the pool, e.g. "my skill" >= 70. https://yandex.com/support/toloka-requester/concepts/reviewing-assignments.html
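As a sketch of that review-then-skill workflow (the data structures here are hypothetical; in practice you would pull reviewed assignments from the Toloka interface or API), the skill value is just the per-worker acceptance percentage:

```python
from collections import defaultdict

def skill_values(reviews, min_reviewed=5):
    # reviews: iterable of (worker_id, accepted) pairs from manual review.
    # Skill = percentage of accepted assignments; workers with too few
    # reviewed assignments get no skill value yet.
    counts = defaultdict(lambda: [0, 0])  # worker -> [accepted, total]
    for worker, accepted in reviews:
        counts[worker][0] += int(accepted)
        counts[worker][1] += 1
    return {w: round(100 * acc / total)
            for w, (acc, total) in counts.items() if total >= min_reviewed}
```

Workers whose value passes the threshold (here, >= 70) are then the ones admitted to the main pool by the skill filter.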

Related

Client participation in the federated computation rounds

I am building a federated learning model using Tensorflow Federated.
Based on what I have read in the tutorials and papers, I understood that the state-of-the-art method (FedAvg) is working by selecting a random subset of clients at each round.
My concern is:
I have a small number of clients: 8 in total, of which I select 6 for training and keep 2 for testing.
All of the data is on my local device, so I am using TFF as a simulation environment.
If I use all 6 clients in every federated communication round, would this be an incorrect execution of the FedAvg method?
Note that I am also planning to run the same experiment used in this paper, which uses different server optimization methods and compares their performance. So, would the all-clients-participating procedure work here or not?
Thanks in advance
This is certainly a valid application of FedAvg and the variants proposed in the linked paper, though one that is only studied empirically in a subset of the literature. On the other hand, many theoretical analyses of FedAvg assume a similar situation to the one you're describing; at the bottom of page 4 of that linked paper, you will see that the analysis is performed in this so-called 'full participation' regime, where every client participates on every round.
Often the setting you describe is called 'cross silo'; see, e.g., section 7.5 of Advances and Open Problems in Federated Learning, which will also contain many useful pointers for the cross-silo literature.
Finally, depending on the application, consider that it may be more natural to literally train on all clients, reserving portions of each client's data for validation and test. Questions around natural partitions of data to model the 'setting we care about' are often thorny in the federated setting.
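To make the 'full participation' regime concrete, here is a minimal NumPy sketch of FedAvg (plain Python rather than TFF, on a toy least-squares task): every one of the training clients runs local gradient descent each round, and the server averages the resulting weights, weighted by client dataset size:

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, epochs=5):
    # a few epochs of full-batch gradient descent on one client's data
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def fedavg_round(w_global, clients):
    # full participation: every client starts from the same global
    # weights; the server averages results weighted by dataset size
    updates = [local_sgd(w_global.copy(), X, y) for X, y in clients]
    sizes = [len(y) for _, y in clients]
    return np.average(updates, axis=0, weights=sizes)
```

Sampling a random subset of clients per round would just mean averaging over a random sub-list of `clients`; with 6 clients, using all of them every round is exactly the full-participation setting analyzed in the paper.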

How to design the models in OptaPlanner in my case

I started learning OptaPlanner some time ago, and I am trying to figure out a model design for my use case so I can progress to the solution calculation. Here is my real-world case from a manufacturing production line:
A work order involves a list of sequential processes
Each kind of machine can handle a fixed set of process types (assume machine quantities are sufficient)
The team has a number of available employees; each employee has the skills for a set of processes, each with their own working cost time
The production line has a fixed number of stations available
Each station holds one machine/employee pair or is left empty
Question: how should the model be designed to calculate the maximum output of completed product in one day?
Confusion: in this case, a single station has one employee and one machine assigned, plus a dynamically specified process to work on, but the input factors reference each other and are dynamic: employee => process skills, process skill => machines.
Can someone please help guide me on how to design the models?
Some of the examples may be close to your requirements; see their docs here. Specifically, look at the task assignment, cheap time scheduling, or project job scheduling examples.
Otherwise, follow the domain modeling guidelines.
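As a rough starting point for the domain model (plain Python dataclasses here purely for illustration; in OptaPlanner these would be Java classes, with Station as the planning entity whose machine/employee/process assignments the solver changes), one way to express the entities and a hard-constraint check is:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Process:
    name: str

@dataclass
class Machine:
    kind: str
    supported: frozenset            # process types this machine kind handles

@dataclass
class Employee:
    name: str
    cost_minutes: dict              # Process -> this employee's working time

@dataclass
class Station:
    # planning entity: the solver decides machine, employee and process
    index: int
    machine: Optional[Machine] = None
    employee: Optional[Employee] = None
    process: Optional[Process] = None

def station_is_feasible(s: Station) -> bool:
    # hard-constraint sketch: an empty station is fine; otherwise the
    # process must be supported by the machine AND be in the employee's
    # skill set -- this captures the employee => skill => machine coupling
    if s.process is None:
        return s.machine is None and s.employee is None
    return (s.machine is not None and s.process in s.machine.supported
            and s.employee is not None and s.process in s.employee.cost_minutes)
```

The "maximum output per day" goal would then be the score's soft component (e.g. summing completed units given each employee's `cost_minutes`), while checks like the one above become hard constraints.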

How to write a custom policy in tf_agents

I want to use the contextual bandit agents (Linear Thompson Sampling agent) in tf_agents.
I am using a custom environment, and my rewards are delayed by 3 days. Hence for training, the observations are generated from saved historical tables (predictions generated 3 days ago) along with their corresponding rewards (also in the table).
Given this, how do I make the policy output an action from the historical tables for a given observation, during training only? During evaluation I want the policy to behave the usual way, generating actions from the policy it has learned.
It looks like I need to write a custom policy that behaves one way during training and behaves as its usual self (LinearThompsonSampling policy) during evaluation. Unfortunately I couldn't find any examples or documentation for this use case. Can someone please explain how to code this? An example would be very useful.
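One way to sketch the idea (framework-agnostic Python; in tf_agents you would typically subclass `TFPolicy` and override `_action`, but the dispatch logic is the same) is a wrapper that replays the actions logged in the historical table while training and delegates to the learned policy otherwise. All class and attribute names below are hypothetical:

```python
class ReplayOrDelegatePolicy:
    """Hypothetical wrapper: replay logged actions during training,
    delegate to the learned policy during evaluation."""

    def __init__(self, base_policy, history, training=True):
        self.base_policy = base_policy  # e.g. the LinearThompsonSampling policy
        self.history = history          # observation -> action logged 3 days ago
        self.training = training

    def action(self, observation):
        if self.training and observation in self.history:
            # training: emit the action that was actually taken, so the
            # delayed reward stored in the table matches the action
            return self.history[observation]
        # evaluation (or an unseen observation): act from the learned policy
        return self.base_policy.action(observation)
```

The key point is that during training the action fed to the agent must be the one the reward in the table actually corresponds to, which is why it comes from the log rather than from fresh sampling.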

How to encode inputs like artist or actor

I am currently developing a neural network that tries to make a suggestion for a specific user based on his recent activities. I will try to illustrate my problem with an example.
Now, let's say I'm trying to suggest new music to a user based on the music he recently listened to. Since people often listen to artists they know, one input of such a neural network might be the artists he recently listened to.
The problem is the encoding of this feature. Since the artist's ID in the database has no meaning for the neural network, the only other option that comes to my mind is one-hot encoding every artist, but that doesn't sound too promising either, given the thousands of different artists out there.
My question is: how can I encode such a feature?
The approach you describe is called content-based filtering. The intuition is to recommend items to customer A similar to previous items liked by A. An advantage of this approach is that you only need data about one user, which tends to result in a "personalized" approach to recommendation. But disadvantages include the construction of features (the problem you're dealing with now), the difficulty of building an interesting profile for new users, and the fact that it will never recommend items outside a user's content profile. As for the difficulty of representation, features are usually handcrafted and abstracted afterwards. For music specifically, features would be things like 'artist', 'genre', etc., and abstraction into informative keywords (if necessary) is widely done using tf-idf.
This may go outside the scope of the question, but I think it is also worth mentioning an alternative approach to this: collaborative filtering. Rather than similar items, here we instead try to find users with similar tastes and recommend products that they liked. The only data you need here are some sort of user ratings or values of how much they (dis)liked some data - eliminating the need for feature design. Furthermore, since we analyze similar persons rather than items for recommendation, this approach tends to also work well for new users. The general flow for collaborative filtering looks like:
Measure similarity between user of interest and all other users
(optional) Select a smaller subset consisting of most similar users
Predict ratings as a weighted combination of "nearest neighbors"
Return the highest rated items
A popular approach for the similarity weighting in the algorithm is based on the Pearson correlation coefficient.
Finally, something to consider here is the need for performance/scalability: calculating pairwise similarities for millions of users is not really light-weight on a normal computer.
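The four steps above can be sketched in a few lines of NumPy (a toy user-item rating matrix, with 0 meaning "unrated"; Pearson similarity is computed over co-rated items only):

```python
import numpy as np

def pearson_sim(a, b):
    # Pearson correlation over items both users have rated (0 = unrated)
    mask = (a != 0) & (b != 0)
    if mask.sum() < 2:
        return 0.0
    da = a[mask] - a[mask].mean()
    db = b[mask] - b[mask].mean()
    denom = np.sqrt((da ** 2).sum() * (db ** 2).sum())
    return float(da @ db / denom) if denom > 0 else 0.0

def predict_ratings(ratings, user, k=2):
    # steps 1-3: similarity to all other users, keep the k most similar,
    # predict as a similarity-weighted combination of their ratings
    sims = np.array([0.0 if i == user else pearson_sim(ratings[user], r)
                     for i, r in enumerate(ratings)])
    top = np.argsort(sims)[::-1][:k]
    weights = sims[top]
    if weights.sum() <= 0:
        return ratings[user].astype(float)
    return ratings[top].T @ weights / weights.sum()
```

Step 4 is then just sorting the predicted ratings, restricted to items the user has not rated yet. This brute-force pairwise version illustrates the scalability point above: real systems replace it with approximate neighbor search or matrix factorization.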

Mechanical Turk: how to allow each turker to do just one hit while allow more than one turker to do the same hit?

Hi, I am new to Mechanical Turk. I have 10k images, and I want to ask Turkers to write a short summary of each image in Mechanical Turk. Since all images in my image set are similar, once a Turker does the summarization task more than 10 times, he'll discover tricks in the task and write similar summaries for the following images.
To increase diversity and randomness, I want as many different people to do the task as possible. The ideal strategy is that one unique Turker is only allowed to label just one image (or fewer than 10 images), while one image can be summarized by more than one Turker. My experiment aims to collect different textual summaries from different people, covering a rich vocabulary set.
If I understand you correctly, you have 10K unique images in total to label. It sounds like you're looking to have each task (HIT) ask a Worker to label 10 unique images. This will result in 1K HITs with 10 images per HIT. If you'd like just one unique Worker to label each image, set the Number of Assignments Requested to 1. If you'd like multiple Workers to work on the same image (say, to ensure quality, or just to broaden the number and type of labels you might get), set Number of Assignments Requested to the number of unique Workers you'd like to work on each task (HIT).
If I've misunderstood what you're looking to do, just clarify and I'll be happy to revise my answer.
You can learn more about these concepts and more here:
http://docs.aws.amazon.com/AWSMechTurk/latest/RequesterUI/mechanical-turk-concepts.html
Good luck!
Yes, this is possible. According to documentation:
http://docs.aws.amazon.com/AWSMechTurk/latest/AWSMechanicalTurkRequester/amt-dg.pdf (page 4)
You can specify the maximum number of assignments that any Worker can accept for your HITs. You can set two types of limits:
The maximum number of assignments any Worker can accept for a specific HIT type you've created
The maximum number of assignments any Worker can accept for all your HITs that don't otherwise have a HIT-type-specific limit already assigned
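For reference, here is a sketch of how the per-HIT side of this looks with the boto3 MTurk client (the title, reward, and timing values below are placeholders; `MaxAssignments` is the number of distinct Workers who may complete one HIT, since MTurk never gives the same Worker two assignments of the same HIT):

```python
def build_hit_params(question_xml, workers_per_image=3):
    # parameters for MTurk's create_hit call; MaxAssignments controls
    # how many different Workers summarize the same image
    return {
        "Title": "Write a one-sentence summary of an image",
        "Description": "Look at the image and write a short summary.",
        "Keywords": "image, summary, writing",
        "Reward": "0.05",                      # USD, passed as a string
        "MaxAssignments": workers_per_image,
        "AssignmentDurationInSeconds": 600,
        "LifetimeInSeconds": 7 * 24 * 3600,
        "Question": question_xml,
    }

# Submitting would then look like (untested sketch):
# import boto3
# client = boto3.client("mturk", region_name="us-east-1")
# hit = client.create_hit(**build_hit_params(my_question_xml))
```

The per-Worker cap described in the quoted documentation is a separate account-level setting and is not part of the `create_hit` call itself.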