AnyLogic fluid batches: changing batch properties (batch size, inserting a batch) dynamically

I have a long pipeline (a few hundred kilometers). Batches of different types of fluid are injected into the pipeline one after the other. I want to access the batches in the pipeline and (1) change the size of a particular batch, (2) insert a new batch between two batches and set its size, and (3) link a parameter (such as an ID) to each batch and look up its properties (access the batch, search for the corresponding ID, then read its properties).
The problem is this: where two fluid batches touch, they mix, creating a new batch that is a mixture of the two fluids. Thus at the injection end (entry side) I might have injected two fluids (fluid 1 and fluid 2), say 50,000 cubic meters each. At the exit side, where the fluid arrives, there will be three products: fluid 1 of say 47,000 m3; then a new fluid, a mixture whose composition is 60% fluid 1 / 40% fluid 2, of size 5,000 m3; then fluid 2 of size 48,000 m3.
So overall the mass balance is maintained: 100k went in and 100k came out, but three fluid batches come out where only two went in, and by reading their IDs I am able to determine the composition (for example "Fluid 3", which got inserted between "Fluid 1" and "Fluid 2", has composition 60% / 40%).
Thus, at some point in time, I need to access the fluids in the pipe, insert a new batch, set the size and composition of this batch, and update the sizes of the remaining batches.
Is there a way of doing this dynamically, or do I have to interrupt the process somewhere along the line, "capture" the content, and re-inject it?

It's not possible with the Fluid Library to modify anything in a batch while it's in the pipe.
I think the only way to do this is to change the batch to whatever you want before it enters the pipe.
This will not look good if you use different colors for different batches, since ideally you want to actually see the mix happening, but it might be your only way to achieve this.
Another way I see this working is by connecting pipes together and customizing the initial batch at the connection. You will need some Java magic for this.
Neither is ideal compared to what you want to do, but I think they are your only options.
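Whichever workaround you use, the volume bookkeeping the question describes (50,000 + 50,000 in; 47,000 + 5,000 + 48,000 out) is straightforward to sketch outside the Fluid Library. The helper below is plain Python and entirely hypothetical; `insert_mixture` and the dict layout are illustrative, not any AnyLogic API:

```python
def insert_mixture(batches, i, from_a, from_b):
    """Replace the interface between batches[i] and batches[i+1] with a
    new mixture batch, conserving total volume.
    from_a / from_b: volume drawn into the mix from each neighbor."""
    a, b = batches[i], batches[i + 1]
    a["volume"] -= from_a
    b["volume"] -= from_b
    total = from_a + from_b
    mixture = {
        "id": f"mix({a['id']},{b['id']})",
        "volume": total,
        # composition as fractions of the two source fluids
        "composition": {a["id"]: from_a / total, b["id"]: from_b / total},
    }
    batches.insert(i + 1, mixture)
    return batches

# Numbers from the question: 3,000 m3 of Fluid 1 mixes with 2,000 m3 of Fluid 2
pipe = [{"id": "Fluid 1", "volume": 50_000},
        {"id": "Fluid 2", "volume": 50_000}]
insert_mixture(pipe, 0, 3_000, 2_000)
```

This yields batches of 47,000 / 5,000 / 48,000 m3, with the middle batch at 60% / 40% composition, so the mass balance still sums to 100,000 m3.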

Related

2D Chunked Grid Fast Access

I am working on a program that stores some data in cells (small structs) and processes each one individually. The processing step accesses the 4 neighbors of the cell (2D). I also need them partitioned in chunks, because the cells might be distributed randomly through a very large surface, and having a large grid with mostly empty cells would be a waste. I also use the chunks for some other optimizations (skipping the processing of chunks based on some conditions).
I currently have a hashmap from "chunk positions" to chunks (which are the actual fixed-size grids). The position is calculated based on the chunk size (like Minecraft). The issue is that, when processing the cells in every chunk, I lose a lot of time doing a lookup to get the chunk of the neighbor. Most of the time the neighbor is in the same chunk we are processing, so I added a check to avoid looking up a chunk when the neighbor is in the same chunk we are processing.
Is there a better solution to this?
This lacks some details, but hopefully you can employ a solution such as this:
Process the interior of a chunk (i.e., excluding the edges) separately. During this phase, the neighbours are guaranteed to be in the same chunk, so you can do this with zero chunk lookups. The difference between this and checking whether a chunk lookup is necessary is that there is not even a check: the guarantee is implicit in the loop bounds.
For edges, you can do a few chunk lookups and reuse the result across the edge.
This approach gets worse with smaller chunk sizes, or if you need access to neighbours further than 1 step away. It breaks down entirely for random access to cells. If you need to maintain a strict ordering for the processing of cells, this approach can still be used with minor modifications by rearranging it (there wouldn't be a strict "process the interior" phase, but you would still have a tight inner loop with zero chunk lookups).
Such techniques are common in general in cases where the boundary has different behaviour than the interior.
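A minimal sketch of the interior/edge split, assuming chunks stored as a dict from chunk coordinates to CHUNK×CHUNK grids (all names here are illustrative, and the "processing" is just a 4-neighbor sum):

```python
CHUNK = 4  # chunk side length (illustrative)

def sum_4_neighbors(chunks, cpos, local_grid):
    """Return a grid of 4-neighbor sums for one chunk.

    Interior cells: all four neighbors are inside local_grid, so the
    loop body does zero chunk lookups -- the guarantee is implicit in
    the loop bounds. Edge cells fall back to a lookup-based accessor;
    missing neighbor chunks are treated as empty (value 0)."""
    cx, cy = cpos

    def cell(x, y):  # lookup-based accessor, used only for edges
        qx, qy = cx + x // CHUNK, cy + y // CHUNK
        chunk = chunks.get((qx, qy))
        return 0 if chunk is None else chunk[y % CHUNK][x % CHUNK]

    out = [[0] * CHUNK for _ in range(CHUNK)]
    # interior: no chunk lookups at all
    g = local_grid
    for y in range(1, CHUNK - 1):
        for x in range(1, CHUNK - 1):
            out[y][x] = g[y-1][x] + g[y+1][x] + g[y][x-1] + g[y][x+1]
    # edges: use the lookup-based accessor
    for y in range(CHUNK):
        for x in range(CHUNK):
            if 0 < x < CHUNK - 1 and 0 < y < CHUNK - 1:
                continue  # interior already done above
            out[y][x] = (cell(x, y - 1) + cell(x, y + 1)
                         + cell(x - 1, y) + cell(x + 1, y))
    return out
```

A further refinement, as the answer notes, is to look up each neighboring chunk once per edge and reuse it across that whole edge, rather than calling the accessor per cell.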

How can I recode 53k unique addresses (saved as objects) w/o One-Hot-Encoding in Pandas?

My data frame has 3.8 million rows and 20 or so features, many of which are categorical. After paring down the number of features, I can "dummy up" one critical column with 20 or so categories and my Colab session with (allegedly) a TPU running won't crash.
But there's another column with about 53,000 unique values. Trying to "dummy up" this feature crashes my session. I can't ditch this column.
I've looked up target encoding, but the data set is very imbalanced and I'm concerned about target leakage. Is there a way around this?
EDIT: My target variable is a simple binary one.
Without knowing more details of the problem/feature, there's no obvious way to do this. This is the part of Data Science/Machine Learning that is an art, not a science. A couple of ideas:
One-hot encode everything, then use a dimensionality reduction algorithm to remove some of the columns (PCA, SVD, etc.).
Only one-hot encode some values (say, limit it to the 10 or 100 most frequent categories rather than all 53,000), and lump the rest into an "other" category.
If it's possible to construct an embedding for this variable (not always possible), you can explore that.
Group/bin the values in the column by some underlying feature. E.g., if the feature is something like days_since_X, bin it by 100s; or if it's names of animals, group them by type instead (mammal, reptile, etc.).
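The "top k plus other" idea can be done with pandas alone. A minimal sketch, where `limit_categories` is a hypothetical helper and the toy series stands in for the 53k-address column:

```python
import pandas as pd

def limit_categories(s: pd.Series, k: int, other_label: str = "OTHER") -> pd.Series:
    """Keep only the k most frequent categories; map the rest to other_label."""
    top = s.value_counts().nlargest(k).index
    return s.where(s.isin(top), other_label)

# Toy stand-in for the 53,000-value address column
addresses = pd.Series(["a", "a", "a", "b", "b", "c", "d"])
reduced = limit_categories(addresses, k=2)
dummies = pd.get_dummies(reduced)  # now only k + 1 dummy columns
```

With k in the tens or hundreds, `get_dummies` on the reduced column stays well within memory, at the cost of collapsing all rare addresses into one bucket.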

NetLogo: how to read values from a data set, assigning values at each tick?

I'm modelling salmon population dynamics and I have a real data set about temperature and flow. I would like to assign a daily value of these two parameters during each tick, setting the first tick as the first day in the dataset and making it keep reading the file.
How can I do that?
Jacopo
NetLogo has fairly extensive IO capabilities for text files (and thus for CSV). You apparently have your data in a simple CSV file, so you will need to use these capabilities. For simple IO examples, see the file-based IO section at https://subversion.american.edu/aisaac/notes/netlogo-intro.xhtml#file-based-io (there are also lots of examples of reading CSV files on the web, e.g., http://netlogoabm.blogspot.com/2014/01/reading-from-csv-file.html). Unfortunately, NetLogo does not provide a CSV reader.
You suggest you would like to read from the file repeatedly. You will then have to leave the file open for the entire simulation, and each tick you can read one line from each open file.
Unless it is a very large dataset, I would rather read all the data into two global lists (e.g., temperatures and flows) at the very beginning. Since you say you want to update the values each tick, use the current tick value to index into these lists, e.g., set temp item ticks temperatures. (Here I assume you only use tick to advance the tick counter, so that you get successive integers. Also, if you tick before you start reading data, you'll need to use ticks - 1.)
hth
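A minimal NetLogo sketch of the read-everything-up-front approach, assuming a whitespace-separated file named "data.txt" (a hypothetical name) with alternating temperature and flow values:

```
globals [ temperatures flows ]

to setup
  clear-all
  set temperatures []
  set flows []
  file-open "data.txt"                 ;; hypothetical file name
  while [ not file-at-end? ] [
    set temperatures lput file-read temperatures
    set flows lput file-read flows
  ]
  file-close
  reset-ticks
end

to go
  let temp item ticks temperatures     ;; this day's temperature
  let flow item ticks flows            ;; this day's flow
  ;; ... salmon population dynamics using temp and flow ...
  tick
end
```

You would also want go to stop (or wrap around) once ticks reaches length temperatures, so the item lookup never runs past the end of the data.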

Mechanical Turk: how to prevent HITs from different batches from being collapsed

I have a problem with the distribution of HITs from different batches.
The situation is the following: I have 3 batches with 17 HITs each, and I prepared 3 different templates.
What I would like to do is that whenever a worker accepts my HITs, he is shown the 17 HITs connected to a template, and only those (template 1, batch 1).
Then, if he chooses to do another 17, he is shown the other 17 HITs (template 2, batch 2), etc.
What seems to happen is that they see more than 17 HITs in a sequence (batch 1, then part of batch 2): how can I prevent the batches from being collapsed? I thought it would be enough to publish the different batches via different templates.
Many thanks in advance!
Gabriella
They'll be collapsed in the system if nothing differs between the HITType characteristics of the batch. So, in order to keep them separate, change one of those properties (e.g., the title, description, keywords, etc.). This will assign each batch a distinct HITTypeId and keep them separated in the system.

Using multiple threads for faster execution

Approximate program behavior:
I have a map image with data associated with the map, indicated by RGB index. The data has been populated into an MS Access database. I imported the information from the database into my program as an array and sorted it into the order I want the program to run.
I want the program to find the nearest pixel that has a different color from the incumbent pixel being compared. (Colors are stored as string attributes of object Pixel)
First question: Should I use integers to represent my colors instead of string? Would this make the comparison function run significantly faster?
In order to find the nearest pixel of a different color, the program begins with all 8 adjacent pixels around the incumbent. If a nonMatch is not found, it then continues on to the next "degree", and in this fashion it spirals out from the incumbent pixel until it hits a nonMatch. When found, the color of the nonMatch is saved as an attribute of the incumbent. After I find the nonMatch for each of the pixels, the data is re-inserted into the database.
The program accomplishes what I want in the manner I've written it, but it is very very slow. After 24 hours, I am only about 3% through with execution.
Question Two: Does my program behavior sound about right? Is this algorithm you would use if you had to accomplish this task?
Question Three: Would it be appropriate for me to use threads in order to finish execution of the program faster? How exactly does that work? (I am brand new to threads, but know a little of the syntax)
Question Four: Would it be more "intelligent" for my program to find the nonMatch for each pixel and insert it into the database immediately after finding it? (I'm guessing this would be good with multithreading, because while one thread is accessing the database (to insert), another can be accessing the array of pixels (a shared global variable in the program).)
Question Five: If threading is a good idea, I'm guessing I would split the records up into more manageable chunks (i.e. quarters), and have each thread run the same functions for their specified number of records? Am I close at all?
Please let me know if I can clarify or provide code samples, I just figured that this is more of a conceptual topic so do not want to overburden the post.
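For reference, the spiral ("degree") search described above can be sketched with integer colors as follows (the function name and grid layout are illustrative, not the asker's code; "nearest" here means the lowest degree, i.e., Chebyshev distance, matching the question's definition):

```python
def nearest_nonmatch(grid, x, y):
    """Spiral outward in square rings ("degrees", as in the question)
    until a pixel with a different integer color is found.
    Returns that pixel's (x, y), or None if the image is one color."""
    h, w = len(grid), len(grid[0])
    color = grid[y][x]
    for degree in range(1, h + w):
        for dy in range(-degree, degree + 1):
            for dx in range(-degree, degree + 1):
                if max(abs(dx), abs(dy)) != degree:
                    continue  # visit only the ring at this degree
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h and grid[ny][nx] != color:
                    return (nx, ny)
    return None
```

Comparing `grid[ny][nx] != color` on integers is exactly the cheap comparison the first question asks about; with string colors, every one of these ring visits would pay for a character-by-character comparison instead.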
1.) Yes, integers compare much faster than strings. Additionally, they use much less memory.
2.) I would adapt the algorithm in this way:
E.g. #1: Let's say for pixel (87,23) you found the nearest nonMatch to be (88,24) at degree=1. You can immediately invert the relation and record that the nearest nonMatch to (88,24) is (87,23). At degree=1 you finished 2 pixels with 1 search.
E.g. #2: Let's say for pixel (17,18) you found the nearest nonMatch to be (17,20) at degree=2. You can immediately record that the pixels (16,19), (17,19), and (18,19) have nearest nonMatch (17,20) at degree=1, and that one of them is the nearest nonMatch to (17,20). At degree=2 (or higher), you finished 5 pixels with 1 search.
3.) Using threads is a double-edged sword: you can do searches in parallel, but you need locking if you write to your array. So this depends on how many CPU cores you can throw at the problem; with 3 or more, threads will surely speed up the search.
4.) The results from 2.) make it necessary to mark a pixel as "done" in your array, as you might have finished up to 5 pixels with 1 search. I recommend you put those into a queue and use a dedicated thread to write the queue back into the database: MS Access can't handle concurrent updates, so a single database writer thread looks like a good idea.
5.) I recommend you NOT chunk up the array: you will run into problems with pixels on the edges of a chunk having their nearest nonMatch in a different chunk. Instead, if you use e.g. 4 threads, let them run:
1. from the NW corner E, then S
2. from the SE corner W, then N
3. from the NE corner S, then W
4. from the SW corner N, then E
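The queue plus dedicated database-writer thread from point 4.) can be sketched generically; everything here is illustrative (the list `database` stands in for the MS Access table, and the worker's payload stands in for the spiral search result):

```python
import queue
import threading

results = queue.Queue()
database = []        # stand-in for the MS Access table
DONE = object()      # sentinel telling the writer to stop

def writer():
    # The single dedicated writer: the only thread touching `database`,
    # so no concurrent updates ever reach the (single-writer) store.
    while True:
        item = results.get()
        if item is DONE:
            break
        database.append(item)

def search_worker(pixels):
    # Each search thread only *reads* shared pixel data and puts its
    # results on the queue; it never writes to the database itself.
    for p in pixels:
        results.put((p, "nonmatch-for-%s" % (p,)))  # stand-in result

w = threading.Thread(target=writer)
w.start()
workers = [threading.Thread(target=search_worker, args=(part,))
           for part in ([(0, 0), (0, 1)], [(1, 0), (1, 1)])]
for t in workers:
    t.start()
for t in workers:
    t.join()
results.put(DONE)   # all searches finished; let the writer drain and exit
w.join()
```

The design choice is that `queue.Queue` is already thread-safe, so the search threads need no locking of their own, and serializing all writes through one thread sidesteps the database's concurrency limits.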
Yes, using an integer would make it much faster.
You can reuse the work you have done for the previous pixel. E.g., if (a,b) is the nearest non-equal pixel of (x,y), it is likely that points around (x,y) also have (a,b) as their nearest non-equal pixel.
You can use different threads to work on different pixels, instead of dividing up the search for one pixel.
IMHO, steps 1 & 2 should make your program much faster, and you might not need multithreading at all.
Yes, I'd convert colour strings to Integers for speed, or even Color structures if you intend to display them on the screen.
Don't work directly with the database if you can avoid it. Copy the necessary data out of the database into an array before you start, and copy your results back when you're finished.