Encog One Of - decode - encog

I am using Encog in one of the project and i got stuck while deocding One-Of class.
One of the field's Normalization Action is NormalizationAction.OneOf which have three output. When i evaluate, i want to decode the predicted value. how to decode...?
var eq = new Encog.MathUtil.Equilateral(classCount, normalizationHigh, normalizationLow);
var predictedClassInt = eq.Decode(output);
The above code is for Equilateral. How can i do the same for One-Of.
Thanks,
Kans

Here is sample code (in C#) for decoding one-of-n encoded classes.
var outputIndex = EngineArray.MaxIndex(output);
var classOutput = analyst.Script.Normalize.NormalizedFields[index].Classes[outputIndex].Name;
Means,you get the output array using Network.Compute() first.Then you try to find out, which element in the output array has the maximum value (The Winner). Then you can use that index and the analyst information to get the class name.
So you can use your analyst class. If you have persisted you analyst file, then you can load it to memory using
var analyst = new EncogAnalyst();
analyst.Load(AnalystFilePath.ToString());

Related

How can I access value in sequence type?

There are the following attributes in client_output
weights_delta = attr.ib()
client_weight = attr.ib()
model_output = attr.ib()
client_loss = attr.ib()
After that, I made the client_output in the form of a sequence through
a = tff.federated_collect(client_output) and round_model_delta = tff.federated_map(selecting_fn,a)in here . and I declared
`
#tff.tf_computation() # append
def selecting_fn(a):
#TODO
return round_model_delta
in here. In the process of averaging on the server, I want to average the weights_delta by selecting some of the clients with a small loss value. So I try to access it via a.weights_delta but it doesn't work.
The tff.federated_collect returns a tff.SequenceType placed at tff.SERVER which you can manipulate the same way as for example client dataset is usually handled in a method decorated by tff.tf_computation.
Note that you have to use the tff.federated_collect operator in the scope of a tff.federated_computation. What you probably want to do[*] is pass it into a tff.tf_computation, using the tff.federated_map operator. Once inside the tff.tf_computation, you can think of it as a tf.data.Dataset object and everything in the tf.data module is available.
[*] I am guessing. More detailed explanation of what you would like to achieve would be helpful.

Retrieving Values from Model Object

Im using following code to train my model
trip_model = sm.OLS(x_dependent, y_variables).fit()
and print summary as
trip_model.summary()
I just want to take only the following values out of Summary
F-statistic , coef
how to get it?
The value returned by the fit function is a RegressionResults structure. You can check the documentation to see how to access each particular value:
f_statistic = trip_model.fvalue
coef = trip_model.params

How to evenly distribute data in apache pig output files?

I've got a pig-latin script that takes in some xml, uses the XPath UDF to pull out some fields and then stores the resulting fields:
REGISTER udf-lib-1.0-SNAPSHOT.jar;
DEFINE XPath com.blah.udfs.XPath();
docs = LOAD '$input' USING com.blah.storage.XMLLoader('root') as (content:chararray);
results = FOREACH docs GENERATE XPath(content, 'root/id'), XPath(content, 'root/otherField'), content;
store results into '$output';
Note that we're using pig-0.12.0 on our cluster, so I ripped the XPath/XMLLoader classes out of pig-0.14.0 and put them in my own jar so that I could use them in 0.12.
This above script works fine and produces the data that I'm looking for. However, it generates over 1,900 partfiles with only a few mbs in each file. I learned about the default_parallel option, so I set that to 128 to try and get 128 partfiles. I ended up having to add a piece to force a reduce phase to achieve this. My script now looks like:
set default_parallel 128;
REGISTER udf-lib-1.0-SNAPSHOT.jar;
DEFINE XPath com.blah.udfs.XPath();
docs = LOAD '$input' USING com.blah.storage.XMLLoader('root') as (content:chararray);
results = FOREACH docs GENERATE XPath(content, 'root/id'), XPath(content, 'root/otherField'), content;
forced_reduce = FOREACH (GROUP results BY RANDOM()) GENERATE FLATTEN(results);
store forced_reduce into '$output';
Again, this produces the expected data. Also, I now get 128 part-files. My problem now is that the data is not evenly distributed among the part-files. Some have 8 gigs, others have 100 mb. I should have expected this when grouping them by RANDOM() :).
My question is what would be the preferred way to limit the number of part-files yet still have them evenly-sized? I'm new to pig/pig latin and assume I'm going about this in the completely wrong way.
p.s. the reason I care about the number of part-files is because I'd like to process the output with spark and our spark cluster seems to do a lot better with a smaller number of files.
I'm still looking for a way to do this directly from the pig script but for now my "solution" is to repartition the data within the spark process that works on the output of the pig script. I use the RDD.coalesce function to rebalance the data.
From the first code snippet, I am assuming it is map only job since you are not using any aggregates.
Instead of using reducers, set the property pig.maxCombinedSplitSize
REGISTER udf-lib-1.0-SNAPSHOT.jar;
DEFINE XPath com.blah.udfs.XPath();
docs = LOAD '$input' USING com.blah.storage.XMLLoader('root') as (content:chararray);
results = FOREACH docs GENERATE XPath(content, 'root/id'), XPath(content, 'root/otherField'), content;
store results into '$output';
exec;
set pig.maxCombinedSplitSize 1000000000; -- 1 GB(given size in bytes)
x = load '$output' using PigStorage();
store x into '$output2' using PigStorage();
pig.maxCombinedSplitSize - setting this property will make sure each mapper reads around 1 GB data and above code works as identity mapper job, which helps you write data in 1GB part file chunks.

Pull info from datapool, increment a value and store the datapool

The application I test has some areas where it requires unique data. Specifically, the application will generate a request number that can only be used once. After my test runs I must manually update my datapool reference for this number. Is there any way using java, that I can get the information stored in my datapool, increase the value by one, and then save the data back to the datapool. This way I can keep rft in sync with my application in regard to this number.
Here is an example how to read a value from the datapool, increment it by 1, and save it back to the datapool. It is an adapted example from the book Software Test Engineering with IBM Rational Functional Tester. The original source code is from chapter 5 (and can be downloaded from the book's homepage).
// some imports
import org.eclipse.hyades.edit.datapool.IDatapoolCell;
import org.eclipse.hyades.edit.datapool.IDatapoolEquivalenceClass;
import org.eclipse.hyades.execution.runtime.datapool.IDatapool;
import org.eclipse.hyades.execution.runtime.datapool.IDatapoolRecord;
int value = dpInt("value");
value++;
java.io.File dpFile = new java.io.File((String) getOption(IOptionName.DATASTORE), "SomeDatapool.rftdp");
IDatapool dp = dpFactory().load(dpFile, true);
IDatapoolEquivalenceClass equivalenceClass = (IDatapoolEquivalenceClass) dp.getEquivalenceClass(dp
.getDefaultEquivalenceClassIndex());
IDatapoolRecord record = equivalenceClass.getRecord(0);
IDatapoolCell cell = (IDatapoolCell) record.getCell(0);
cell.setCellValue(value);
DatapoolFactory factory = DatapoolFactory.get();
factory.save((org.eclipse.hyades.edit.datapool.IDatapool) dp);
I think it is quite a lot of code to simply change one value—maybe it is easier to use some other method like writing the value to a normal text file.

D3 Graph Example Using In Memory Object

This seems like it should be simple, but I have spent literally hours without any success.
Take the D3 graph example at http://bl.ocks.org/mbostock/950642. The example uses a local file called graph.json. I have set up a Rails app to serve a similar graph, however I don't want to write a file of the JSON. Rather, I generate the nodes and links into an object such as:
{"nodes":[{"node_type":"Person","name":"Damien","id":"damien_person"}, {"node_type":"Person","name":"Grant","id":"grant_person"}}],
"links":[{"source":"damien_person","target":"grant_person","label":"Friends"}}
Now when I render the D3, I need to update the call d3.json("graph.json", function(json) {...}); to reference my in-memory object rather than the local file (or url). However, everything I've tried breaks my html/javascript. For example I tried setting the var dataset = <%= raw(#myInMemoryObject) %>;, and that works for assignment (I did an alert on the dataset), however I can't get the D3 code to use it.
How can I replace the d3.json call in order to use my in-memory object?
Thank you,
Damien
Your idea of using, for example, var dataset = <%= raw(#myInMemoryObject) %>; is the right way to go but you need to prep your object to be in the right format.
The nodes specified in the links need to either be numeric references to nodes in the nodes array eg. 0 for first, 1 for second
var json ={
"nodes":[{"name":"Damien","id":"a"}, {"name":"Bob","id":"b"}],
"links":[{"source":0, "target":1,"value":1}]
}
or links to the actual objects which make the nodes themselves:
var a = {"name":"Damien","id":"a"};
var b = {"name":"Bob","id":"b"}
var json ={
"nodes":[a,b],
"links":[{"source":a,"target":b,"value":1}]
};
Relevant discussion is here: https://groups.google.com/forum/?fromgroups=#!topic/d3-js/LWuhBeEipz4
Example here: http://jsfiddle.net/5A9eV/1/