OptaPlanner dynamic state changes

I want to model a problem in which there are multiple different machines and jobs consisting of tasks/steps on certain machines. Each job also has an amount of pieces that will be processed.
After a machine has processed a specified amount of pieces it must be maintained (for this example maintenance is automated and always takes a specific time). But the maintenance only triggers when the remaining count reaches 0 or below: if the remaining pieces for machine 2 is 10, it can still process a job task with 50 pieces; the remaining pieces after the task will be -40, which triggers the maintenance. The order of the tasks within a job is predefined and cannot be changed. Also, between consecutive tasks of a job there must be no (time) gap.
All times are in seconds.
The job tasks are defined as tuples (machine, duration),
e.g.:
Machines:
m1:
remaining pieces: 25
maintenance time: 180
m2:
remaining pieces: 10
maintenance time: 100
m3:
remaining pieces: 55
maintenance time: 160
Jobs:
j1:
pieces: 10
tasks: [(m1, 200), (m3, 100)]
j2:
pieces: 25
tasks: [(m1, 100), (m2, 120), (m3, 100)]
j3:
pieces: 5
tasks: [(m2, 180), (m3, 100), (m1, 100)]
So the order in which the tasks are processed can change the overall makespan a lot, which should be minimized.
Here is a visualization with two possibilities:
So how can I achieve this? I have the entity for the machine with the member remaining pieces, which I can update via shadow variables, but how can I "disable" the machine for the maintenance time?
Edit: I forgot that each machine also has a member that holds the value to which the remaining pieces are reset after a maintenance, e.g. 600.
Edit 2: Maybe I can rephrase my question a bit: what is the best approach for such a problem? Keep the state in separate arrays that contain the state for each second, or just calculate the states in the constraints?
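For the last question, one common pattern is to derive the machine state on the fly instead of keeping per-second arrays: walk the ordered task list of each machine once, tracking the remaining-pieces counter and inserting the maintenance downtime when it drops to 0 or below. A minimal sketch of that walk (plain Python with illustrative names, not OptaPlanner API):

```python
# Sketch of the "calculate the state in the constraints" approach (plain
# Python, illustrative names - not OptaPlanner API). Given the ordered
# tasks assigned to one machine, walk them once, tracking the remaining
# pieces and inserting the maintenance downtime whenever the counter
# drops to 0 or below.

def machine_timeline(tasks, remaining, maintenance_time, reset_pieces):
    """tasks: list of (pieces, duration) in processing order.
    Returns the (start, end) window per task and the machine's finish time."""
    clock = 0
    windows = []
    for pieces, duration in tasks:
        windows.append((clock, clock + duration))
        clock += duration
        remaining -= pieces
        if remaining <= 0:            # maintenance triggers after the task
            clock += maintenance_time
            remaining = reset_pieces  # counter restored after maintenance
    return windows, clock

# m2 from the example: 10 pieces left, 100 s maintenance, resets to 600.
# A 50-piece task drives the counter to -40 and triggers maintenance.
windows, finish = machine_timeline(
    [(50, 120)], remaining=10, maintenance_time=100, reset_pieces=600)
```

In OptaPlanner this walk would typically live in a shadow-variable listener or a custom/incremental score calculator; per-second state arrays are usually avoided because their size scales with the planning horizon instead of the number of tasks.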

Related

How are reserved slots re-allocated between reservations/projects if idle slots are used?

The documentation on Introduction to Reservations: Idle Slots states that idle slots from reservations can be used by other reservations if required:
By default, queries running in a reservation automatically use idle slots from other reservations. That means a job can always run as long as there's capacity. Idle capacity is immediately preemptible back to the original assigned reservation as needed, regardless of the priority of the query that needs the resources. This happens automatically in real time.
However, I'm wondering if this can have a negative effect on other reservations in a scenario where idle slots are used but are shortly after required by the "owning" reservation.
To be concrete, I would like to understand whether I can regard assigned slots as a guarantee OR as best effort.
Example:
Reserved slots: 100
Reservation A: 50 Slots
Reservation B: 50 Slots
"A" starts a query at 14:00:00 and the computation takes 300 seconds if 100 slots are used.
All slots are idle at the start of the query, thus all 100 slots are made available to A.
5 seconds later at 14:00:05 "B" starts a query that takes 30 seconds if 50 slots are used.
Note:
For the sake of simplicity let's assume that both queries have exactly 1 stage and each computation unit ("job") in the stage takes the full time of the query, i.e. the stage is divided into 100 jobs and once a slot starts the computation it takes the full 300 seconds to finish successfully.
I'm fairly certain that with multiple stages or shorter computation times (e.g. if the computation can be broken down into 1000 jobs) GBQ would be smart enough to dynamically re-assign the freed-up slots to the reservation they belong to.
Questions:
Does "B" now have to wait until a slot in "A" finishes?
This would mean ~5 min waiting time.
I'm not sure how realistic the 5 minutes are, but I feel this is an important variable, since I wouldn't worry about a couple of seconds - but I would worry about a couple of minutes!
Or might an already started computation of "A" also be killed mid-flight?
The documentation on Introduction to Reservations: Slot Scheduling seems to suggest something like this:
The goal of the scheduler is to find a medium between being too aggressive with evicting running tasks (which results in wasting slot time) and being too lenient (which results in jobs with long running tasks getting a disproportionate share of the slot time).
Answer via Reddit
A stage may run for quite some time (minutes, even hours in really bad cases), but a stage is run by many workers, and most workers complete their work within a very short time, e.g. milliseconds or seconds. Hence rebalancing, i.e. reallocating slots from one job to another, is very fast.
So if a rebalancing happens and a job loses a large part of its slots, it will run a lot slower, and the one that gains slots will run fast - and this change is quick.
So in the above example: as job B starts 5 seconds in, within a second or so it would have acquired most of its slots.
So bottom line:
a query is broken up into "a lot" of units of work
each unit of work finishes pretty fast
this gives GBQ the opportunity to re-assign slots
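The bottom line can be made concrete with a toy model (illustrative only, not BigQuery internals): a slot is only handed back when its current unit of work finishes, so B's wait is roughly one unit duration, not the whole query duration:

```python
# Toy model of slot rebalancing (illustrative, not BigQuery internals).
# A borrowed all 100 slots; B is owed 50 back. A slot can only be handed
# over when its current unit of work finishes, so B's wait is roughly one
# unit duration rather than the whole query duration.

def seconds_until_b_has_slots(unit_seconds, slots_owed, slots_borrowed):
    """All borrowed slots started a unit at t=0; each frees up after one
    unit and is immediately returned until B holds what it is owed."""
    if slots_borrowed < slots_owed:
        raise ValueError("cannot reclaim more slots than were borrowed")
    return unit_seconds

# The question's worst case: 300 s units -> B waits ~5 minutes.
worst = seconds_until_b_has_slots(300, slots_owed=50, slots_borrowed=100)
# The answer's typical case: ~1 s units -> B waits ~1 second.
typical = seconds_until_b_has_slots(1, slots_owed=50, slots_borrowed=100)
```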

Snakemake: Combine cluster profile with resources: attempt

Let's say I have a rule that for 90% of my data needs 1 h, but occasionally needs 3 h. In a busy cluster environment, however, I do not want to submit all jobs with a time limit of 3 h just to be safe, as this would slow down the scheduling of my jobs.
Hence, I played around with the attempt variable:
resources:
    # Increase the time limit in multiples of 1 h if the job fails due to the time limit.
    time = lambda wildcards, input, threads, attempt: 60 * attempt
(one could be even smarter and use powers of 2 to amortize better...).
But this approach forces me to put the base time (1 h) directly into the rule. How can I combine this approach with cluster profiles, where the base time is in some cluster_config.yaml file?
Thanks and so long
Lucas
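One possible pattern (a sketch, assuming you are willing to load the cluster config yourself in the Snakefile, e.g. with `yaml.safe_load`): keep the per-rule base times in a mapping and build the attempt-scaled callable from it. All names below (`cluster_config`, `time_min`, `my_rule`) are illustrative:

```python
# Sketch: keep per-rule base times where your cluster profile keeps them
# (in practice loaded from cluster_config.yaml, e.g. via yaml.safe_load)
# and build the attempt-scaled resource callable from that mapping.
# All names below (cluster_config, time_min, my_rule) are illustrative.

cluster_config = {"my_rule": {"time_min": 60}}  # stands in for the YAML file

def scaled_time(rule_name, default_min=60):
    base = cluster_config.get(rule_name, {}).get("time_min", default_min)
    # Snakemake calls this with (wildcards, input, threads, attempt);
    # 'attempt' starts at 1 and increases on each restart.
    return lambda wildcards, input, threads, attempt: base * attempt

# Inside the rule in the Snakefile you would then write:
#   resources:
#       time = scaled_time("my_rule")
time_fn = scaled_time("my_rule")
```

This keeps the rule itself free of hard-coded base times while preserving the attempt-based scaling.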

Routing solver: Unknown Fleet size/ Fleet size optimization/ Infinite fleet size

I'm trying to solve the CVRPTW with multiple vehicle types (~42). In my case the fleet size/number of vehicles of each type is unknown. I expect the solver to find the most cost-optimal fleet size and the corresponding routes.
Initially I tried to model this problem by creating a large number of vehicles of each type, each with a fixed cost. I expected the solver to minimize the fixed cost of the vehicles plus the distance costs. But in that case the solver is unable to find an initial solution within a reasonable duration. I think that is because a large number of vehicles increases the number of possible insertions significantly, and hence the solver fails to explore all possible insertions. I.e., if vehicle type A has 100 vehicles, during the insertion phase the solver tries to insert a job into all 100 vehicles. But since all the empty routes/vehicles are identical, it should suffice to check the insertion cost for all filled vehicles plus 1 empty vehicle.
The jsprit library provides an option to define an INFINITE fleet size. In this case the solver assumes each vehicle type has infinitely many copies. I think during the insertion phase the solver then tries to add a job to the already created routes or to 1 empty route of each vehicle type.
EDIT 1: I have kept all the jobs (~240) optional. When I use 10 vehicles of each type (420 vehicles in total), the solver returns a solution with all jobs unassigned after 1 hour. When I reduce the number of vehicles of each type to 1 (42 vehicles in total), the solver returns a feasible solution with 152 unassigned jobs within 60 seconds.
EDIT 2: When I tried to solve it with 1 vehicle type and 200 copies, the solver returns a solution with all jobs unassigned after 1 hour. When I reduce the number of vehicles to 60, I get a solution with all jobs assigned within 2 minutes.
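The symmetry-breaking idea described above can be sketched in a few lines (plain Python, not jsprit code): identical empty vehicles yield identical insertion costs, so per vehicle type it suffices to evaluate every non-empty route plus exactly one empty route.

```python
# Sketch of the symmetry-breaking idea from the question (plain Python,
# not jsprit code): identical empty vehicles yield identical insertion
# costs, so per vehicle type it suffices to evaluate every non-empty
# route plus exactly ONE empty route.

def candidate_routes(routes_by_type):
    """routes_by_type: {vehicle_type: [route, ...]}, a route is a job list.
    Returns the routes worth evaluating for the next insertion."""
    candidates = []
    for vtype, routes in routes_by_type.items():
        candidates.extend(r for r in routes if r)   # all filled routes
        empties = [r for r in routes if not r]
        if empties:
            candidates.append(empties[0])           # just one empty route
    return candidates

# Type A: 1 filled route + 3 identical empty ones; type B: 2 empty ones.
fleet = {"A": [["j1"], [], [], []], "B": [[], []]}
cands = candidate_routes(fleet)   # 3 candidate routes instead of 6
```

This is exactly what an infinite fleet size achieves implicitly: the candidate set grows with the number of used vehicles, not with the number of declared copies.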

How to setup Arrivals Thread Group(Custom Thread Groups)

I am new to JMeter and don't have much of an idea about it yet. I want to use the JMeter plugin Custom Thread Group -> Arrivals Thread Group, available at https://jmeter-plugins.org/wiki/ArrivalsThreadGroup/, for arrival-rate simulation. I searched a lot for these properties but didn't find a clear definition or explanation, so I only have a vague idea of the configuration properties. I wrote down what I know about each property as a comment:
Target Rate (arrivals/min): 60
Ramp-Up Time (min): 1  // how long to take to "ramp up" to the full arrival rate
Ramp-Up Steps Count: 10  // divides the ramp-up time into the specified number of parts and ramps up the rate accordingly
Hold Target Rate Time (min): 2  // holds the target rate for the next two minutes
Thread Iterations Limit:
Can anybody help me to understand clearly what is the significance of all these properties?
According to above settings:
Target Rate: 60 arrivals in a minute means there will be one arrival per second. Each second JMeter will kick off a virtual user which will execute the samplers.
Ramp-up time: the time taken to reach the target rate, i.e. JMeter starts from zero arrivals per minute and increases the arrival rate to 60 arrivals per minute over 60 seconds.
Ramp-up steps: here you can set the "granularity" of increasing the arrival rate; more steps give a smoother pattern, fewer steps give "spikes".
Hold Target Rate: it will keep the arrival rate steady for the duration specified; in your case it will hold 60 arrivals per minute until the end of the run, as explained in your comment.
So according to these settings, JMeter will ramp up from 0 to 1 arrival per second in one minute and then run the test for 2 more minutes.
If I have 1 sampler in the Test Plan it will be something like 153 executions; with 2 samplers, 153 executions per sampler, 306 executions in total. The approximate request rate will be ~50 requests/minute.
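The ~153 figure can be reproduced with simple arithmetic (a model of the schedule, not JMeter's exact scheduler): the ramp-up minute is split into 10 steps of 6 s each, running at 6, 12, ..., 60 arrivals/min, followed by 2 minutes held at 60 arrivals/min.

```python
# Rough arithmetic behind the ~153 figure (a simple model of the
# schedule, not JMeter's exact scheduler): 10 ramp-up steps of 6 s each
# at rates 6, 12, ..., 60 arrivals/min, then 2 min at 60 arrivals/min.

ramp_steps = 10
step_seconds = 60 / ramp_steps                            # 6 s per step
step_rates = [6 * i for i in range(1, ramp_steps + 1)]    # arrivals/min
ramp_arrivals = sum(step_rates) * step_seconds / 60       # 33 arrivals
hold_arrivals = 60 * 2                                    # 120 arrivals
total = ramp_arrivals + hold_arrivals                     # 153 arrivals
```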

Cloud DataFlow performance - are our times to be expected?

Looking for some advice on how best to architect/design and build our pipeline.
After some initial testing, we're not getting the results that we were expecting. Maybe we're just doing something stupid, or our expectations are too high.
Our data/workflow:
Google DFP writes our adserver logs (CSV compressed) directly to GCS (hourly).
A day's worth of these logs has in the region of 30-70 million records, and about 1.5-2 billion for the month.
Perform transformation on 2 of the fields, and write the row to BigQuery.
The transformation involves performing 3 REGEX operations (due to increase to 50 operations) on 2 of the fields, which produces new fields/columns.
What we've got running so far:
Built a pipeline that reads the files from GCS for a day (31.3m), and uses a ParDo to perform the transformation (we thought we'd start with just a day, but our requirements are to process months & years too).
DoFn input is a String, and its output is a BigQuery TableRow.
The pipeline is executed in the cloud with instance type "n1-standard-1" (1 vCPU), as we think 1 vCPU per worker is adequate given that the transformation is not overly complex nor CPU-intensive, i.e. just a mapping of Strings to Strings.
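For reference, the per-row work of such a ParDo can be sketched outside of Dataflow as a plain function (the regex patterns and field layout below are illustrative, not the real DFP log schema); in the pipeline this logic would sit in the DoFn, with the patterns compiled once per DoFn instance rather than per row:

```python
import re

# Sketch of the per-row transformation (illustrative patterns and fields,
# not the real DFP log schema): apply a few compiled regexes to two CSV
# fields and append the derived columns.

PATTERNS = [
    re.compile(r"(\d+)x(\d+)"),          # e.g. a creative size like "300x250"
    re.compile(r"^https?://([^/]+)"),    # host name out of a URL
    re.compile(r"utm_source=([^&]+)"),   # a query-string parameter
]

def transform_row(row):
    fields = row.split(",")
    derived = []
    for field in (fields[0], fields[1]):     # the 2 transformed fields
        for pat in PATTERNS:
            m = pat.search(field)
            derived.append(m.group(1) if m else "")
    return fields + derived

out = transform_row("http://example.com/ad?utm_source=mail,300x250,click")
```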
We've run the job using a few different worker configurations to see how it performs:
5 workers (5 vCPUs) took ~17 mins
5 workers (10 vCPUs) took ~16 mins (in this run we bumped up the instance to "n1-standard-2" to get double the cores to see if it improved performance)
50 min and 100 max workers with autoscale set to "BASIC" (50-100 vCPUs) took ~13 mins
100 min and 150 max workers with autoscale set to "BASIC" (100-150 vCPUs) took ~14 mins
Would those times be in line with what you would expect for our use case and pipeline?
You can also write the output to files and then load them into BigQuery via the command line/console. You'd probably save some dollars on instance uptime. This is what I've been doing after running into issues with the Dataflow/BigQuery interface. Also, from my experience there is some overhead in bringing instances up and tearing them down (could be 3-5 minutes). Do you include this time in your measurements as well?
BigQuery has a write limit of 100,000 rows per second per table, or 6M rows per minute. At 31M rows of input that would take ~5 minutes of just flat-out writes. When you add back the discrete processing time per element and then the synchronization time (read from GCS -> dispatch -> ...) of the graph, this looks about right.
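That ~5 minute floor is straightforward arithmetic on the numbers quoted above:

```python
# The streaming-write floor: ~31.3 M input rows against the quoted
# 100,000 rows/s per-table limit.

rows = 31_300_000
limit_rows_per_second = 100_000
floor_seconds = rows / limit_rows_per_second   # 313 s
floor_minutes = floor_seconds / 60             # ~5.2 min: the "~5 minutes"
```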
We are working on a table sharding model so you can write across a set of tables and then use table wildcards within BigQuery to aggregate across the tables (common model for typical BigQuery streaming use case). I know the BigQuery folks are also looking at increased table streaming limits, but nothing official to share.
Net-net increasing instances is not going to get you much more throughput right now.
Another approach - in the mean time while we work on improving the BigQuery sync - would be to shard your reads using pattern matching via TextIO and then run X separate pipelines targeting X number of tables. Might be a fun experiment. :-)
Make sense?