Mechanical Turk: how to prevent HITs from different batches from being collapsed

I have a problem with the distribution of HITs from different batches.
The situation is the following: I have 3 batches of 17 HITs each, and I prepared 3 different templates.
What I would like is that whenever a worker accepts my HITs, they are shown the 17 HITs connected to one template, and only those (template 1, batch 1).
Then, if they choose to do another 17, they are shown the next 17 HITs (template 2, batch 2), and so on.
What actually happens is that workers see more than 17 HITs in one sequence (batch 1, then part of batch 2): how can I prevent the batches from being collapsed? I thought publishing the batches via different templates would have been enough.
Many thanks in advance!
Gabriella

They'll be collapsed in the system if nothing differs between the HITType properties of the batches. So, in order to keep them separate, change one of those properties (e.g., the title, description, or keywords). This will assign each batch a distinct HITTypeId and keep the batches separated in the system.
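If you publish via the API rather than the requester UI, a minimal sketch of this with boto3 might look like the following (region, titles, reward and duration values are all hypothetical; any property that feeds into the HITType would do):

import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

# Because the Title differs per batch, each call returns a distinct
# HITTypeId, so the three batches will not be grouped together.
for title in ["My task (batch 1)", "My task (batch 2)", "My task (batch 3)"]:
    response = mturk.create_hit_type(
        Title=title,                         # the differing property
        Description="Answer 17 short questions.",
        Reward="0.50",
        AssignmentDurationInSeconds=600,
        Keywords="survey, categorization",
    )
    print(title, "->", response["HITTypeId"])
    # HITs created via create_hit_with_hit_type(HITTypeId=...) are then
    # grouped per batch rather than collapsed into one.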

Related

AnyLogic fluid batches, changing batch properties (size of batch, insert a batch) dynamically

I have a long pipeline (a few hundred kilometers). Batches of different types of fluid are injected into the pipeline, one after the other. Somehow I want to access the batches in the pipeline and (1) change the size of a particular batch, (2) insert a new batch between two batches and update its size, (3) link a parameter to a batch (such as an ID) and look up the properties of this batch (access the batch, search for the corresponding ID, then look up the properties).
The problem is this: where two fluid batches touch, they mix, creating a new batch which is a mixture of the two fluids. So at the injection end (entry side) I might have injected two fluids (fluid 1 & 2), say 50,000 cubic meters each. On the exit side, where the fluid arrives, there will be three products: fluid 1 of say 47,000 m3; then a new fluid, a mixture whose composition is 60% fluid 1 and 40% fluid 2, of size 5,000 m3; then fluid 2 of size 48,000 m3.
So overall the mass balance is maintained (100k went in, 100k went out), but three fluid batches come out where only two went in, and by reading their IDs I am able to determine the composition (for example "Fluid 3", which got inserted between "Fluid 1" and "Fluid 2", has composition 60% / 40%).
Thus somehow, at some point in time, I need to access the fluids in the pipe, insert a new batch, set the size and composition of this batch and update the size of the remaining batches.
Is there a way of doing this dynamically, or do you have to interrupt the process somewhere along the line, "capture" the content and re-inject it?
It's not possible with the Fluid Library to modify anything while it's in the pipe.
I think the only way to do this is to change the batch to whatever you want before it enters the pipe.
This will not look good if you use different colors for different batches, since ideally you want to actually see the mix happening, but it might be your only way to achieve this.
Another way I see this working is by connecting pipes together and customizing the initial batch during the connection. You will need some Java magic for this.
Neither is ideal compared to what you want to do, but I think these are your only options.

How to trim a list/set in ScyllaDB to a specific size?

Is there a way to trim a list/set to a specific size (in terms of number of elements)?
Something similar to LTRIM command on Redis (https://redis.io/commands/ltrim).
The goal is to insert an element to a list/set but ensuring that its final size is always <= X (discarding old entries).
Example of what I would like to be able to do:
CREATE TABLE images (
    name text PRIMARY KEY,
    owner text,
    tags set<text>  // a set of text values
);
-- single command
UPDATE images SET tags = ltrim(tags + { 'gray', 'cuddly' }, 10) WHERE name = 'cat.jpg';
-- two commands (Redis style)
UPDATE images SET tags = tags + { 'gray', 'cuddly' } WHERE name = 'cat.jpg';
UPDATE images SET tags = ltrim(tags, 10) WHERE name = 'cat.jpg';
No, there is no such operation in Scylla (or in Cassandra).
The first reason is efficiency: as you may be aware, one reason why writes in Scylla are so efficient is that they do not do a read. Appending an element to a list just writes this single item to a sequential file (a so-called "sstable"); it does not need to read the existing list and check what elements it already has. The operation you propose would need to read the existing item before writing, slowing it down significantly.
The second reason is consistency: what happens if multiple operations like the one you propose are done in parallel, reaching different coordinators and replicas in different orders? What happens if, after earlier problems, one of the replicas is missing one of the values? There is no magic way to solve these problems, and the general solution that Scylla offers for concurrent read-modify-write operations is LWT (Lightweight Transactions). You can emulate your ltrim operation using LWT, but it will be significantly slower than ordinary writes: you will need to read the list into the client, modify it (append, ltrim, etc.) and then write it back with an LWT (with the extra condition that it still has its old value, or using an additional "version number" column).
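For completeness, a client-side sketch of what that emulation could look like with the Python driver (keyspace name and retry loop are illustrative; it also assumes tags is a list<text> rather than a set, since a set has no order to trim by):

from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("mykeyspace")

def append_and_trim(name, new_tags, max_len=10):
    # Read-modify-write guarded by an LWT; retry if a concurrent
    # writer changed the list between our read and our write.
    while True:
        row = session.execute(
            "SELECT tags FROM images WHERE name = %s", (name,)).one()
        old = list(row.tags) if row and row.tags else []
        trimmed = (old + list(new_tags))[-max_len:]  # keep newest entries
        result = session.execute(
            "UPDATE images SET tags = %s WHERE name = %s IF tags = %s",
            (trimmed, name, old or None)).one()
        if result.applied:
            return trimmed

As the answer says, expect this to be much slower than a plain write: every call pays for a read plus the Paxos round trips of the LWT.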

Presenting only a subset of conditions

I have a simple experiment with 3 blocks. Instead of putting the stimuli (words) into three different files, I put them in one file and want to use 'Selected rows' to assign them to different tasks (blocks). Below is the flow.
[Selected rows: np.random.choice(15, size=5, replace=False)]
The problem is that after 5 trials (first block-level condition), all the words will be reshuffled so that the words appeared in block 1 may also appear in block 2/block 3.
Are there any solutions to achieve that if a word has been used in a block, then it will not appear again in the following blocks? Many thanks!
The issue probably arises because you are effectively randomising multiple times: both in the selection of the subset of rows and in the randomisation of the loops themselves. That is, your loops should be set to sequential rather than random, because you are handling the randomisation yourself by selecting a subset of rows.
Even if you do that, you now have a second problem: if you choose a subset of 5 rows for each block via np.random.choice(), those selections are independent and so it is quite likely that some variable number of rows will be selected multiple times. So what you need to do is ensure that rows are selected without replacement, across the entire experiment.
I'd suggest that you randomly shuffle all 15 row indices in a list, and then apply subsets of that list in each block. That way you can ensure that there will be no multiple selection of rows. e.g:
row_order = list(range(15))  # a bare range can't be shuffled in place
np.random.shuffle(row_order)
Then in each of the three blocks, you would use these subsets:
row_order[0:5]
row_order[5:10]
row_order[10:15]
This gives you a randomised selection in each block but with no duplication of rows.
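Put together, a minimal sketch for the Begin Experiment tab of a code component (the variable name row_order is of course just illustrative):

import numpy as np

# Shuffle all 15 row indices once, so rows are drawn without
# replacement across the whole experiment
row_order = list(range(15))
np.random.shuffle(row_order)

Then each loop's 'Selected rows' field would contain row_order[0:5], row_order[5:10] and row_order[10:15] respectively, with the loops themselves set to sequential.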

Does batching queue (tf.train.batch) not preserve order?

I've set up a filename-producing queue using tf.train.string_input_producer with the shuffle option set to False, coupled to a batching queue using tf.train.batch (i.e. non-shuffling). Looking at the list of examples being read, while the ordering is almost perfectly preserved, it is not strictly so. For example, the first few samples are 4, 2, 1, 3, 5, 6, 7, 8, 9, 11, 10, ..., where the number corresponds to the position of the sample within the first file read. After that the ordering is almost perfect for several hundred samples, but it occasionally swaps adjacent samples. Is this expected behavior? Is there some way to enforce that the ordering is preserved, so that one does not have to keep track of which file got read when, etc.?
I should say that I conditionally discard some samples by enqueuing either 0 or 1 samples at a time, and setting enqueue_many to True in the batching queue. None of the samples above are being skipped, however, so this shouldn't in principle be an issue.
As Yaroslav has mentioned in the comments, a single thread should do the trick. In addition to a single thread, you should set num_epochs = 1; if you don't, the pipeline will keep producing batches, and it may seem like order is not preserved as it loops back to the start. I hope this works.
Having said that, I hope someone can come up with a better answer to this!
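To illustrate, a sketch of that pipeline under the TF 1.x queue-runner API (file names are hypothetical):

import tensorflow as tf

# shuffle=False + num_epochs=1 reads the files once, in order
filename_queue = tf.train.string_input_producer(
    ["data_0.tfrecords", "data_1.tfrecords"],
    shuffle=False, num_epochs=1)

reader = tf.TFRecordReader()
_, serialized = reader.read(filename_queue)

# num_threads=1: with several enqueuing threads, adjacent examples
# can race each other into the queue and swap positions
batch = tf.train.batch([serialized], batch_size=32,
                       num_threads=1, capacity=128)

with tf.Session() as sess:
    # num_epochs creates a local variable, so initialize those too
    sess.run([tf.global_variables_initializer(),
              tf.local_variables_initializer()])
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    # ... sess.run(batch) until tf.errors.OutOfRangeError ...
    coord.request_stop()
    coord.join(threads)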

Implementing round-robin insertions to an Oracle RAC DB with the help of a sequence

Problem
My system inserts records into an Oracle RAC DB at a rate of 600 tps. During the insertion procedure call, each record is assigned a sequence value, so that the records get distributed among 20 different batch ids (an implementation of a round-robin mechanism).
These are the steps for selecting a batch:
1) A record comes in. Assign it the next value from a sequence.
2) Do MOD(sequence, 20). This gives values from 0 to 19.
Issue:
3 records arrive at the DB simultaneously and hit 3 different nodes in the RAC.
They come out with sequence values 2, 102 and 1002.
MOD(2, 20), MOD(102, 20) and MOD(1002, 20) all happen to be 2.
All three try to get into the same batch.
Round robin fails here.
Please help to resolve the issue.
This is due to the implementation of sequences on RAC. When a node is first asked for the next value of a sequence, it gets a bunch of them (e.g., 100 to 119) and then hands them out until it needs a new lot, at which point it gets another bunch (e.g., 160 to 179). While node 1 is handing out 100 then 101, node 2 will be handing out 120, 121, etc.
The size of the 'bunch' is controlled, as I remember, by the CACHE size defined on the sequence. If you set a cache size of 0, you will get no caching, and the sequence values will be handed out sequentially. However, doing that involves the nodes in management overhead while they work out what the next value actually is, and with 600 tps this might not be a good idea: you'd have to try it and see.
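For what it's worth, a sketch of the uncached variant from Python (connection details are made up; the ORDER option is an extra assumption that additionally forces cluster-wide ordering on RAC, at a similar coordination cost):

import oracledb

conn = oracledb.connect(user="app", password="secret", dsn="racdb")
cur = conn.cursor()

# NOCACHE stops each node from grabbing its own bunch of values;
# both options add inter-node coordination, which matters at 600 tps
cur.execute("CREATE SEQUENCE batch_seq NOCACHE ORDER")

# Round-robin batch selection, as in the question
cur.execute("SELECT MOD(batch_seq.NEXTVAL, 20) FROM dual")
(batch_id,) = cur.fetchone()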