Iterating over multiple model/data pairs in AMPL

Is there a good way to iterate over model/dataset pairs in AMPL in a .run file?
Say you have two different models for the same optimization problem, and four datasets. What I have done up to now is make a .run file for each model/dataset pair and run each individually, or keep one script per model and manually solve for each dataset by editing the data (file); command. But this is obviously tedious and unworkable for larger projects.
So is there any good way to do this? What I've tried is something like the following (just a skeleton for clarity):
# Some global options
for {model_ in {'1.mod', '2.mod'}} {
    reset;
    model (model_);
    for {dat in {'1.dat', '2.dat', '3.dat', '4.dat'}} {
        update data;
        data (dat);
        solve;
        # Do some post-processing
    }
}
But AMPL complains about using the for loop's dummy variable in the model and data commands. I've tried declaring symbolic parameters to store the names of the model and data files, but that didn't work either.
Of course this would only be sensible if the models were similar enough, insofar as they could be post-processed in the same way. But there should at least be a way to iterate through the data files within one model without having to go like
update data;
data 1.dat;
solve;
update data;
data 2.dat;
solve;
update data;
data 3.dat;
solve;
[...]
update data;
data 427598.dat;
solve;
right?

Your example code has a reset inside a loop. This will reset the loop parameter, which is going to cause problems. For instance, if you run this:
for {modelname in {"itertest1.mod","itertest2.mod"}}{
    display (modelname);
    for {dataname in {"itertest_a.dat","itertest_b.dat"}}{
        display (dataname);
    }
}
it will print out the file names as we might hope:
modelname = itertest1.mod
dataname = itertest_a.dat
dataname = itertest_b.dat
modelname = itertest2.mod
dataname = itertest_a.dat
dataname = itertest_b.dat
But if we add a reset statement:
for {modelname in {"itertest1.mod","itertest2.mod"}}{
    reset; ### this is the only line I changed ###
    display (modelname);
    for {dataname in {"itertest_a.dat","itertest_b.dat"}}{
        display (dataname);
    }
}
then we get an error: modelname is not defined (because we just reset it).
However, even without that issue, we'll still get a complaint about using the loop variable.
Solution 1: commands
Depending on exactly what you ran, you may have seen an error message advising use of the commands statement, like so:
reset;
for {scriptname in {"script1.run", "script2.run"}}{
    commands (scriptname);
}
This will then run whatever commands are in the listed .run files, and you should be able to nest that to call separate scripts to define the models and the data. Since you can't use a blanket reset without killing your loop parameter, you will need to use other options for updating the model files. AMPL offers the option to define multiple "problems" and switch between them, which may or may not be helpful here; I haven't explored it.
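For instance, here is a rough sketch of how that nesting might look (untested, and every file name below is hypothetical): the outer script only switches between per-model scripts, and each of those loops over per-dataset scripts in the same way.
# outer.run (hypothetical): switch models by running one script per model
for {modelscript in {"run_model1.run", "run_model2.run"}} {
    commands (modelscript);
}
# run_model1.run (hypothetical) might then contain:
#   model 1.mod;
#   for {datascript in {"load_1.run", "load_2.run", "load_3.run", "load_4.run"}} {
#       commands (datascript);   # each load_N.run does: update data; data N.dat; solve;
#   }
# There is still no blanket reset here, so switching to 2.mod would need the
# "problem" facilities (or non-clashing declarations) to avoid conflicts.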
The commands statement is documented in the AMPL book's chapter on command scripts, although TBH I find the documentation a bit difficult to follow.
Solution 2: code generation
If all else fails, you can write an iterated AMPL script that generates (and optionally runs) a second script containing all the model/data combinations you want to run, like so:
param outfile symbolic := "runme_temp.run";
for{i in 1..3}{
    for{j in 1..4}{
        printf "\nreset;" > (outfile);
        printf "\nmodel model%s;",i > (outfile);
        printf "\ndata data%s;",j > (outfile);
        printf "\nsolve;" > (outfile);
        # add post-processing here as desired
    }
}
close (outfile);
include runme_temp.run;
This will create and then call "runme_temp.run" which looks like this:
reset;
model model1;
data data1;
solve;
reset;
model model1;
data data2;
solve;
and so on. Since this approach does allow a blanket reset, it might simplify the process of cleaning up previous models. Not the most dignified solution, but it works and can be adapted to a wide range of things.
This could be improved by de-nesting the loops:
param outfile symbolic := "runme_temp.run";
for{i in 1..3, j in 1..4}{
    printf "\nreset;" > (outfile);
    printf "\nmodel model%s;",i > (outfile);
    printf "\ndata data%s;",j > (outfile);
    printf "\nsolve;" > (outfile);
    # add post-processing here as desired
}
close (outfile);
include runme_temp.run;
In this particular example it doesn't make much difference, but multiply-nested loops can run slowly in AMPL; using a single multi-index for loop can make a big difference to performance, so it's a good habit to get into.

Related

Terraform, how to call a module 100 times with separate values

I have a problem with a Terraform configuration that I really don't know how to resolve. I have written a module for policy assignments; the module takes as a parameter an object with 5 attributes. The question is whether it is possible to split the tfvars file into a folder structure, e.g.:
I have a main folder subscriptions -> folder_subscription_name -> some number of tfvars files, one with the configuration for each policy assignment.
Example of each file:
testmap = {
    var1 = "test1"
    var2 = "test2"
    var3 = "test3"
    var4 = "test4"
    var5 = "test5"
}
In the module I would like to iterate over all of the maps combined into a list of maps. Is that a good approach? How do I achieve that, or should I use something else like Terragrunt?
Please give me some tips on the best way to achieve this. Basically the goal is to avoid one insanely big tfvars file with a list of 100 maps, and instead split it into 100 configuration files, one per assignment.
Based on the comment on the original question:
The question is pretty simple: how to keep the input variables for each resource in a separate file instead of keeping all of them in one very big file
I would say you want different .tfvars files. For example, you could have a dev.tfvars and a prod.tfvars file.
To plan your deployment, you can pass those files with:
terraform plan --var-file=dev.tfvars -out planoutput
To apply those changes
terraform apply planoutput

ImageJ batch processing - opening a series of images containing a specific name and doing stuff on them

I have 25K tif files (please don't ask why) that I want to organize into stacks in ImageJ. Basically, for each region of interest (ROI) there are 50 images, which break down into 25 z-planes for two channels. I want everything in a single stack, and I'd like to batch process the whole folder without opening 50 images 500 times at a time. I've attached a picture of what the file names look like:
Folder organization
r01c01f01p01-ch1.tif - the first 10 characters are a unique ID for each ROI, then the plane number (p01), then the channel - ch1 or ch2 - then the file extension.
Here's what I have so far (which I cobbled together based on other macros, so this may not make sense...). This is using the ImageJ macro language.
//Processing loop to process each file in the folder.
for (i=0; i<list.length; i++) {
    showProgress(i+1, list.length);
    if (endsWith(list[i], ".tif")) { // skip the subfolder (I create a subfolder earlier in the macros)
        print("-- Processing file: " + list[i] + " --");
        open(dir+list[i]);
        imageTitle= getTitle();
        newTitle = substring(imageTitle, 0, lengthOf(imageTitle)-10); // r01c01f01p, cutting off plane number and then the rest to just get the ROI ID
        //This is where I'm stuck:
        // find all files containing newTitle and open them (which would be 50 at a time), then run the following macros on them
        run("Images to Stack", "name=Ch1 title=[] use");
        run("Duplicate...", "title=Ch2 duplicate");
        selectWindow("Ch1");
        run("Slice Remover", "first=1 last=50 increment=2");
        selectWindow("Ch2");
        run("Slice Remover", "first=2 last=50 increment=2");
        run("Merge Channels...", "c1=Ch1 c2=Ch2 create");
        saveAs("tiff", dirNew + newTitle + "_Stack.tif");
        //Close(All)?
    }
}
print("-- Done --");
showStatus("Finished.");
setBatchMode(false); // Exit batch mode
run("Collect Garbage");
Thank you!
You could do something like:
for (plane=1; plane<51; plane++) {
    open(newTitle+plane+"-ch1.tif");
    open(newTitle+plane+"-ch2.tif");
}
Which would take care of the opening. I would be inclined to have a loop prior to this which collates the unique "newTitle" values, as your current setup would end up doing something like opening the first item, assembling the combined TIF, and then repeating the process 25K times, if I understand it correctly.
Given that you know the number of unique "r01c01f01p" values, in principle you could do a set of stacked loops akin to:
newTitleArray = newArray();
for (r=1; r<50; r++) {
    titleBit = "r0" + toString(r);
    for (c=1; c<501; c++) {
        titleBit = titleBit + "f0"...
Alternatively, you could set up a loop where you check for unique "r01c01f01p" values and add them to an array. In any case, you'd replace the for "list" loop with the for "newTitleArray" loop, and then continue onto the opener I listed above, instead of your existing one.
If I am understanding correctly, it seems like you might do well to stack by channel first, then merge the two. I am not 100% sure, but I think you could potentially use a macro I have already created to do that. It was originally meant to batch process terabytes of 5D data, so it should be very comfortable handling your volume of images. It is not exactly what you are looking for, but it should be super easy to modify (I went a little overboard with the commenting in the code), and I think the only thing it does that you might not want is producing max projections from the inputs. I'll throw a link here and look for your reply. If it's of interest, let me know and we can work to make it suit your needs together :-) Otherwise, if you could provide a little more detail about where you're getting stuck and/or where I may have misunderstood, I will do my very best to help!
https://github.com/evanjkiely/FIJIMacros

Many inputs to one output, access wildcards in input files

Apologies if this is a straightforward question, I couldn't find anything in the docs.
Currently my workflow looks something like this: I'm taking a number of input files created as part of this workflow and summarizing them.
Is there a way to avoid this manual regex step to parse the wildcards in the filenames?
I thought about an "expand" of cross_ids and config["chromosomes"], but I'm unsure how to guarantee a consistent order.
rule report:
    output:
        table="output/mendel_errors.txt"
    input:
        files=expand("output/{chrom}/{cross}.in", chrom=config["chromosomes"], cross=cross_ids)
    params:
        req="h_vmem=4G",
    run:
        df = pd.DataFrame(index=range(len(input.files)), columns=["stat", "chrom", "cross"])
        for i, fn in enumerate(input.files):
            # open fn / make calculations etc // stat =
            # manual regex of filename to get chrom cross // chrom, cross =
            df.loc[i] = stat, chrom, cross
This seems a bit awkward when this information must be in the environment somewhere.
(via Johannes Köster on the google group)
To answer your question:
expand() uses itertools.product from the standard library. Hence, you could write:
from itertools import product
product(config["chromosomes"], cross_ids)
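Putting that together inside the run block: since product() yields the (chrom, cross) pairs in the same order expand() used to build input.files (assuming the chrom/cross ordering shown in the rule above), you can zip them and skip the regex. A rough sketch, where compute_stat is a hypothetical stand-in for your per-file calculation:
from itertools import product
import pandas as pd

rows = []
# pairs come out of product() in the same order expand() generated the file names
for fn, (chrom, cross) in zip(input.files, product(config["chromosomes"], cross_ids)):
    stat = compute_stat(fn)            # hypothetical: open fn / make calculations
    rows.append((stat, chrom, cross))
df = pd.DataFrame(rows, columns=["stat", "chrom", "cross"])
df.to_csv(output.table, sep="\t", index=False)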

Odoo - Understanding recordsets in the new api

Please look at the following code block:
class hr_payslip:
    ...
    #api.multi # same behavior with or without method decorators
    def do_something(self):
        print "self------------------------------->",self
        for r in self:
            print "r------------------------------->",r
As you can see, I'm overriding the 'hr.payslip' model and I need to access some fields inside this method. The problem is that what gets printed doesn't make sense to me:
self-------------------------------> hr.payslip(hr.payslip(1,),)
r-------------------------------> hr.payslip(hr.payslip(1,),)
Why is it the same thing inside and outside of the for loop? If it's always a recordset, how would one access a single record's fields?
My lack of understanding is probably connected to this question also:
Odoo - Cannot loop trough model records
Working on recordsets always means working on recordsets. When you loop over a recordset, the loop variable is itself a recordset. But you can only access fields directly when the length of the recordset is 1 or 0. You can test it fairly easily (with more than one payslip in the database!):
slips = self.env['hr.payslip'].search([])
# Exception because you cannot access the field directly on multi entries
print slips.id
# Working
print slips.ids
for slip in slips:
    print slip.id
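So to read a field you either loop as above or make sure you are holding a single record first. A short sketch (Odoo 8-style new API, as in the question; employee_id is a standard hr.payslip field):
# indexing a recordset returns a recordset of length 1
one_slip = slips[0]
print one_slip.id                 # direct field access works now
print one_slip.employee_id.name

# or, inside a method that is only ever meant to receive one record:
self.ensure_one()                 # raises an exception if len(self) != 1
print self.employee_id.name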

How to evenly distribute data in apache pig output files?

I've got a Pig Latin script that takes in some XML, uses the XPath UDF to pull out some fields and then stores the resulting fields:
REGISTER udf-lib-1.0-SNAPSHOT.jar;
DEFINE XPath com.blah.udfs.XPath();
docs = LOAD '$input' USING com.blah.storage.XMLLoader('root') as (content:chararray);
results = FOREACH docs GENERATE XPath(content, 'root/id'), XPath(content, 'root/otherField'), content;
store results into '$output';
Note that we're using pig-0.12.0 on our cluster, so I ripped the XPath/XMLLoader classes out of pig-0.14.0 and put them in my own jar so that I could use them in 0.12.
The above script works fine and produces the data that I'm looking for. However, it generates over 1,900 part files with only a few MB in each file. I learned about the default_parallel option, so I set that to 128 to try to get 128 part files. I ended up having to add a piece to force a reduce phase to achieve this. My script now looks like:
set default_parallel 128;
REGISTER udf-lib-1.0-SNAPSHOT.jar;
DEFINE XPath com.blah.udfs.XPath();
docs = LOAD '$input' USING com.blah.storage.XMLLoader('root') as (content:chararray);
results = FOREACH docs GENERATE XPath(content, 'root/id'), XPath(content, 'root/otherField'), content;
forced_reduce = FOREACH (GROUP results BY RANDOM()) GENERATE FLATTEN(results);
store forced_reduce into '$output';
Again, this produces the expected data, and I now get 128 part files. My problem now is that the data is not evenly distributed among the part files. Some have 8 GB, others have 100 MB. I should have expected this when grouping them by RANDOM() :).
My question is what would be the preferred way to limit the number of part-files yet still have them evenly-sized? I'm new to pig/pig latin and assume I'm going about this in the completely wrong way.
p.s. the reason I care about the number of part-files is because I'd like to process the output with spark and our spark cluster seems to do a lot better with a smaller number of files.
I'm still looking for a way to do this directly from the pig script but for now my "solution" is to repartition the data within the spark process that works on the output of the pig script. I use the RDD.coalesce function to rebalance the data.
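For reference, a rough PySpark sketch of that workaround (the paths and the partition count here are made up):
from pyspark import SparkContext

sc = SparkContext(appName="rebalance-pig-output")
lines = sc.textFile("hdfs:///path/to/pig/output")        # hypothetical path
# shuffle=True forces a full shuffle so the 128 partitions come out roughly even
lines.coalesce(128, shuffle=True).saveAsTextFile("hdfs:///path/to/rebalanced")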
From the first code snippet, I am assuming it is a map-only job, since you are not using any aggregates.
Instead of using reducers, set the property pig.maxCombinedSplitSize:
REGISTER udf-lib-1.0-SNAPSHOT.jar;
DEFINE XPath com.blah.udfs.XPath();
docs = LOAD '$input' USING com.blah.storage.XMLLoader('root') as (content:chararray);
results = FOREACH docs GENERATE XPath(content, 'root/id'), XPath(content, 'root/otherField'), content;
store results into '$output';
exec;
set pig.maxCombinedSplitSize 1000000000; -- 1 GB(given size in bytes)
x = load '$output' using PigStorage();
store x into '$output2' using PigStorage();
pig.maxCombinedSplitSize - setting this property will make sure each mapper reads around 1 GB of data, and the above code works as an identity-mapper job, which helps you write the data in 1 GB part-file chunks.