hyperopt with manual data source - optimization

I would like to optimize my parameters with hyperopt.
As I understand it, I can specify a range for each parameter; hyperopt then selects test values within this range and learns from the return value, for example by minimizing the result.
The problem is that I cannot let hyperopt start the test program itself. I need to start the program manually with the test parameters and collect the result value in each iteration, perhaps in a CSV file acting as an external data source for hyperopt, where one line would contain testvalue1,testvalue2,result.
Is this kind of manual saving and data import into hyperopt possible?
Can someone provide an example? That would be very helpful.
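One possible sketch, assuming two parameters named x and y and a log file called results.csv (both names are just examples): hyperopt's fmin drives the loop, and inside the objective function you run the test program by hand and type the measured result back in, logging every trial to the CSV along the way.
import csv
from hyperopt import fmin, tpe, hp, Trials

def objective(params):
    # Run the test program manually with the suggested parameters,
    # then enter the measured result here.
    print("Run the test program with:", params)
    result = float(input("Measured result: "))
    # Append the trial to a CSV file as an external record: testvalue1,testvalue2,result
    with open("results.csv", "a", newline="") as f:
        csv.writer(f).writerow([params["x"], params["y"], result])
    return result  # fmin minimizes this value

space = {
    "x": hp.uniform("x", 0.0, 10.0),
    "y": hp.uniform("y", -5.0, 5.0),
}

trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=20, trials=trials)
print("Best parameters:", best)
The Trials object keeps hyperopt's own record of what has been tried, so the CSV is only needed as your external log.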

Access snakemake expand-ed variables in a script

I often have a snakemake rule like the following:
rule aggregate:
    input: expand("samples/{sample}/data.txt", sample=samples)
    script:
        "scripts/aggregate.py"
This gives aggregate.py the correct list of sample data files in snakemake.input, but it loses the association between samples and their files. I usually need the sample -> sample file mapping in aggregate.py, and to get it I either (A) recreate the list of files or (B) recreate the list of sample IDs in the same order as the files. Both are unsatisfying because they duplicate data and require two places in the code to be kept in sync if either changes.
If, as in this example, only one variable is being expanded, then adding it to params is fine, i.e. params: samples, and then zipping that together with the inputs. But with more than one expanded variable there is a real risk of listing the variables in different orders in the Snakefile and in aggregate.py, which causes a silent error where all the data is mislabeled.
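For example, a sketch of that single-variable case (the dict name in the script is purely illustrative):
rule aggregate:
    input:
        expand("samples/{sample}/data.txt", sample=samples)
    params:
        samples=samples
    script:
        "scripts/aggregate.py"
and then, inside aggregate.py:
file_for_sample = dict(zip(snakemake.params.samples, snakemake.input))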
Is there a canonical or recommended way to handle this?
I would rather rework the aggregate.py script and call it from a shell section. The script should not know that it is being called from Snakemake; it should get all relevant information from the command line. A clean interface between the caller and the script is crucial, and it also helps you rethink the task itself.
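A minimal sketch of that approach, where the output path and the command-line interface of aggregate.py are only illustrative:
rule aggregate:
    input:
        expand("samples/{sample}/data.txt", sample=samples)
    output:
        "aggregated/summary.txt"
    shell:
        "python scripts/aggregate.py --out {output} {input}"
aggregate.py then parses its own arguments (e.g. with argparse) and can recover each sample ID from its file path (samples/<sample>/data.txt), so the sample-to-file association never has to be passed separately or kept in sync with the Snakefile.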

How to save variables from Uppaal created during the modeling process

I've created a model with Uppaal in which several integer variables change over the course of time. Now I would like to save the values of these variables during the modelling process somewhere (ideally in XML or a text file). In the Uppaal documentation (https://www.it.uu.se/research/group/darts/uppaal/documentation.shtml) I found the method in point 13 (How do I export and interpret the traces from Uppaal?) and have already tried the Java API approach, in the hope that it could output the variables as well as the traces. Unfortunately, this method seems to be limited to traces. Does anyone know a way to save the variable values from Uppaal?
Hopeful greetings,
Josi
Solution from the comments.
To export the variable value trajectory over time, one may use an SMC query in the verifier.
For example:
Typeset the following query: simulate 1 [<=300] { Gate.len }
Click Check
Right-click on the query, and from the popup menu choose Simulations (1)
Observe a new window popup with a plot
Right-click on the plot and choose Export Comma Separated Values
Follow the save-file dialog and observe that the resulting file contains the time and value sequences.
Note that SMC assumes that all channels are broadcast and there are no deadlocks.

Use Pentaho Variable to Dynamically name EXCEL file

I am trying to dynamically name an excel file after processing it for archiving purposes.
If I process Logistics.xlsx I want to save it as U:\Archive\${varDP}.xlsx
Resulting file name U:\Archive\20190709.xlsx
I have tried Get system variable to get the date, and this works fine. I have created the field (DateProcessed). However, I am unable to use Set Variables to set varDP to DateProcessed.
Thank you
You cannot set and use a variable in the same transformation. If you want to use a variable you should have a job with two transformations: first transformation gets the date and sets the variable; second transformation can then use the variable.
The main reason for that is that all steps initialise at the same time. Therefore, when the variable is read by the step that is using it, it's probably not set yet.
For these cases of variable usage and passing parameters, I've been forwarding this previous answer; it has a link to another answer of mine where I go step by step through how to pass parameters to another transformation without 'Set Variables', and in the linked answer I have included a downloadable example.

Is binary identical output possible with XlsxWriter?

With the same input is it possible to make the output binary identical using XlsxWriter?
I tried changing the created property to the same date and that helped a little, but I still get a lot of differences in sharedStrings.xml.
Thanks
Yes, for identical input, if you set the created date in the document properties:
import xlsxwriter
import datetime

for filename in ('hello1.xlsx', 'hello2.xlsx'):
    workbook = xlsxwriter.Workbook(filename)
    workbook.set_properties({'created': datetime.date(2016, 4, 25)})
    worksheet = workbook.add_worksheet()
    worksheet.write('A1', 'Hello world')
    workbook.close()
Then:
$ cmp hello1.xlsx hello2.xlsx
# No output. Files are the same.
The order in which strings are added to the file will change the layout of the sharedStrings table and thus lead to non-identical files. That is generally the case with Excel as well.
Note: This requires XlsxWriter version 1.0.4 or later to work.
Even though the author of the previous answer appears to have repudiated it, it seems to be correct, just not the whole story. I did my own tests on Python 3.7 with XlsxWriter 1.1.2. You won't notice the creation-time issue if your files are small, because they will be written so quickly that their default creation times of "now()" will be the same.
What's missing from the first answer is that you need to make the same number of calls to the write_* methods. For example, if you call write followed by merge_range on the same cell for one of the workbooks, you need to have the same sequence of calls for the other. You can't skip the write call and just do merge_range, for instance. If you do this, the sharedStrings.xml files will have different values of count even if the value of uniqueCount is the same.
If you can arrange for these things to be true, then your two workbooks should come out as equal at the binary level.
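To illustrate the point about matching write_* calls, here is a small sketch (file names and cell contents are arbitrary): both workbooks show the same merged text, but the first makes one extra write call, so the count attribute in its sharedStrings.xml differs and the two files are not binary identical even with a fixed created date.
import datetime
import xlsxwriter

props = {'created': datetime.date(2016, 4, 25)}

workbook1 = xlsxwriter.Workbook('a.xlsx')
workbook1.set_properties(props)
worksheet1 = workbook1.add_worksheet()
worksheet1.write('A1', 'Hello')            # extra string reference ...
worksheet1.merge_range('A1:B1', 'Hello')   # ... followed by the merge
workbook1.close()

workbook2 = xlsxwriter.Workbook('b.xlsx')
workbook2.set_properties(props)
worksheet2 = workbook2.add_worksheet()
worksheet2.merge_range('A1:B1', 'Hello')   # merge only: one fewer write_* call
workbook2.close()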

Dynamically set input and output paths in pig using UDFs

I would like to create a no-arg pig script that dynamically creates input and output paths.
The script itself should determine an input file glob based on the current date and similarly determine an output file path based on the current date. While I know that one can easily pass in parameters, I was hoping to have a no-arg script and use a couple of simple Jython UDFs to compute these paths.
How do I do that? I can't seem to set variables by calling a UDF. For instance,
%default OUTPUTPATH myfn();
or
path = myfn();
don't seem to work.
Any ideas?
(Why no-args? Because I would like to have a single static Amazon Data Pipeline config that runs the same script each day but, under the hood, processes the last day's or last week's worth of log files each time.)
Sadly, to my knowledge, there is no way to do this in pure Pig. However, you can define these changing variables in a Python wrapper. In your case, you would just define the dict of args like:
d = {
    'OUTPATH': myfn(),
}
And then pass that dict like:
from org.apache.pig.scripting import Pig

P = Pig.compile(path_to_my_script)
Q = P.bind(d)
results = Q.run()
Of course there is a little more to add to the wrapper, but it should be pretty clear from the docs.
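For completeness, a sketch of what such a wrapper might look like when run as an embedded Jython script with pig wrapper.py; the bucket layout, parameter names, and pig script name are illustrative only:
from datetime import date, timedelta
from org.apache.pig.scripting import Pig

yesterday = date.today() - timedelta(days=1)

params = {
    'INPUTGLOB': 's3://my-bucket/logs/%s/*' % yesterday.strftime('%Y/%m/%d'),
    'OUTPATH': 's3://my-bucket/output/%s' % yesterday.strftime('%Y%m%d'),
}

# The pig script refers to these values as $INPUTGLOB and $OUTPATH.
P = Pig.compileFromFile('my_script.pig')
Q = P.bind(params)
result = Q.runSingle()
if not result.isSuccessful():
    raise RuntimeError('pig job failed')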