I have two questions related to ensemble learning for data streams.
Question 1: If you have used the MOA framework, could you please tell me how to change the base learner for a given algorithm? For instance, I want to change it for OnlineAdaBoost, OnlineAdaC2, OnlineSMOTEBagging, and OnlineRUSBoost. The base learner for all of these is Adaptive Random Forest, and I want it to be a Hoeffding Tree. When I click EDIT to change the baseLearner, nothing happens.
Question 2: The algorithms in Question 1 use ARF as the base learner by default, while ARF itself uses a Hoeffding Tree as its base learner. Can we say that these algorithms indirectly use a Hoeffding Tree as the base learner? For my comparison, I must use a Hoeffding Tree as the base learner for all of them.
https://javadoc.io/static/nz.ac.waikato.cms.moa/moa/2020.12.0/moa/classifiers/meta/imbalanced/OnlineRUSBoost.html
https://javadoc.io/static/nz.ac.waikato.cms.moa/moa/2020.12.0/moa/classifiers/meta/imbalanced/OnlineSMOTEBagging.html
https://javadoc.io/static/nz.ac.waikato.cms.moa/moa/2020.12.0/moa/classifiers/meta/imbalanced/OnlineAdaC2.html
https://javadoc.io/static/nz.ac.waikato.cms.moa/moa/2020.12.0/moa/classifiers/meta/imbalanced/OnlineAdaBoost.html
If more information is required, I can provide it.
Thanks.
If you are using the MOA GUI, you can right-click in the Configure text box to edit the command directly. This option was hard to find, but it lets you change the base learner manually.
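For illustration, and assuming the baseLearner option uses the usual MOA flag -l (the task wrapper, stream, and file name below are placeholders, not from the original post), the edited configure text for a prequential evaluation might look roughly like this:

EvaluatePrequential -l (meta.imbalanced.OnlineAdaBoost -l trees.HoeffdingTree) -s (ArffFileStream -f dataset.arff)

If you drive MOA from code rather than the GUI, a minimal sketch of the same change, assuming the option is exposed as a public field named baseLearnerOption, could look like this:

import moa.classifiers.meta.imbalanced.OnlineAdaBoost;

public class BaseLearnerConfig {
    public static void main(String[] args) {
        OnlineAdaBoost boost = new OnlineAdaBoost();
        // Point the baseLearner option at a Hoeffding Tree instead of the
        // default Adaptive Random Forest (the field name here is an assumption).
        boost.baseLearnerOption.setValueViaCLIString("trees.HoeffdingTree");
        boost.prepareForUse();
    }
}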
I have some doubts. I'm building a BI solution for my company, and I needed to develop a data converter in the ETL because the PostgreSQL database it connects to is returning negative values in the time column of the CSV. It doesn't make much sense for the database (to which we don't have much access) to return negative time values like 00:00:-50.
The solution I found, so that I don't have to rely on dealing directly with the database, would be to perform the conversion inside CloudConnect. From my research, the component that seemed to fit best is the Normalizer, but there aren't many explanations available. Could you give me a hand? I couldn't figure out how to parameterize the Normalizer to convert this data from 00:00:-50 to 00:00:50.
It might help you to review our CC documentation: https://help.gooddata.com/cloudconnect/manual/normalizer.html
However, I am not sure whether the Normalizer would be able to process timestamps.
The Normalizer is basically a generic transform component with a normalization template. You might as well use the Reformat component, which is more universal.
Either way, what you are trying to do would require a fairly custom transform script, written in CTL (CloudConnect Transformation Language) or Java.
You can find some templates and examples in the documentation: https://help.gooddata.com/cloudconnect/manual/ctl-templates-for-transformers.html
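Just to illustrate the core of that conversion, here is a minimal sketch in plain Java (class, method, and field names are made up; in CloudConnect you would put the equivalent logic inside a Reformat component or a CTL transform):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimeSignFix {
    // Matches values like "00:00:-50" and captures the parts around the stray minus sign.
    private static final Pattern NEGATIVE_SECONDS = Pattern.compile("^(\\d{2}:\\d{2}:)-(\\d{1,2})$");

    // Turns "00:00:-50" into "00:00:50"; anything else passes through unchanged.
    static String stripNegativeSeconds(String raw) {
        if (raw == null) {
            return null;
        }
        Matcher m = NEGATIVE_SECONDS.matcher(raw.trim());
        return m.matches() ? m.group(1) + m.group(2) : raw;
    }

    public static void main(String[] args) {
        System.out.println(stripNegativeSeconds("00:00:-50")); // prints 00:00:50
    }
}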
I'm trying to build an intelligent field mapper and would like to know whether OptaPlanner is the correct fit. The requirement is as follows:
I have a UI which accepts source and target XML field schemas.
The source and target schemas contain multiple business fields, which can be further classified into multiple business groups. There are certain rules (which we can treat as constraints, in OptaPlanner terms) that need to be considered during the field mapping. The objective of the tool is to find the field mapping: each source field needs to find its best-fit target field.
Can OptaPlanner be used to solve this problem? I'm confused about whether this is a mathematical optimization problem or a machine-learning predictive-model problem (for the latter to work, I would need to build a sufficiently large set of labelled mapping data).
Any help will be much appreciated.
Off the top of my head, there are three potential situations to be in, going from easy to hard:
A) You can use a decision table to figure out the mapping decision. Don't use OptaPlanner, use Drools.
B) Given two mapping proposals, you can score which one is better through a formal scoring function. Use OptaPlanner.
C) You don't know your scoring function in detail yet; you only have some vague ideas. You want to use training data of historical decisions to build one. Don't use OptaPlanner, use machine learning.
I am currently working with GraphDB to visualize some data that is naturally graph-shaped. I have imported the RDF data into GraphDB, and the resulting graph is actually pretty nice. The only downside is that every single node is orange.
I was wondering, then, whether GraphDB has some mechanism by which the color of certain nodes could be changed based on a semantic relationship between them. For example:
<Berners_Lee> <created> <web> .
<Berners_Lee> <works_as_a> <teacher> .
If I were to load this into GraphDB, all nodes would appear orange by default. Is there any way I can specify that nodes pointed to by the created relationship appear in blue?
I hope everything is clear. Any help would be much appreciated.
The colors are generated automatically, and their main purpose is to differentiate the types within one graph. Also, we do not yet handle the case of a node with multiple types properly, but we have it in mind. The problem with your data is that none of the subjects, predicates, and objects has a type (which effectively makes them all the same type). Here is a small example, based on your data, which will produce the desired effect.
<Berners_Lee> <created> <www> ;
    <works_as_a> <teacher> ;
    a <Person> .
<teacher> a <Occupation> .
You can use ft.dfs to get back feature definitions to use as input to ft.calculate_feature_matrix, or you can just use ft.dfs to compute the feature matrix directly. Is there a recommended best practice for using ft.dfs and ft.calculate_feature_matrix?
If you're in a situation where you might use either, the answer is to use ft.dfs to create both features and a feature matrix. If you're starting with a blank slate, you'll want to be able to examine and use a feature matrix for data analysis and feature selection. For that purpose, you're better off doing both at once with ft.dfs.
There are times when calculate_feature_matrix is the tool to use as well, though you'll often be able to tell if you're in that situation. The main cases are:
You've loaded in features that were previously saved
You want to rebuild the same features on new data
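As a rough illustration of both cases (this uses featuretools' built-in mock customer data; the file name and the pre-1.0 target_entity argument are assumptions that may differ in your version):

import featuretools as ft

# Build an example entity set; in practice this would be your own data.
es = ft.demo.load_mock_customer(return_entityset=True)

# One ft.dfs call produces both the feature matrix and the feature definitions.
feature_matrix, feature_defs = ft.dfs(entityset=es, target_entity="customers")

# Case 1: save the definitions so they can be reloaded later.
ft.save_features(feature_defs, "features.json")
saved_features = ft.load_features("features.json")

# Case 2: rebuild the same features on (new) data with calculate_feature_matrix.
new_matrix = ft.calculate_feature_matrix(saved_features, entityset=es)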
Requirement:
I am trying to develop a language application using ANTLR4. The language in question is not important. The important thing is that the grammar is very large (easily more than 2000 rules!). I want to do a number of operations:
Extract a bunch of information. This can include call graphs, variable names, constant expressions, etc.
Apply any number of transformations:
If a loop can be expanded, we go ahead and expand it.
If we can eliminate dead code, we might choose to do that.
We might choose to rename all variables to conform to some naming convention.
Each of these operations can be applied independently of the others. And after applying these steps, I want to rewrite the input as close as possible to the original input.
For example, we might want to eliminate loops and rename the variables, and then output the result in the original language format.
Questions:
I see a need to build a custom tree (read: AST) for this, so that I can modify the tree with each of the transformations. However, when I want to generate the output, I lose the nice abilities of the TokenStreamRewriter: I have to specify how to write each node of the tree, and I lose the original input formatting in the places where I didn't apply any transformation. Does ANTLR4 provide a good way to get around this problem?
Is an AST the best way to go, or should I build my own object representation? If so, how do I create that representation efficiently? Creating an object representation is a very big pain for such a vast language, but it may be better in the long run. Again, how do I get back the original formatting?
Is it possible to work just on the parse tree?
Are there similar language applications which do the same thing? If so, what strategy do they use?
Any input is welcome.
Thanks in advance.
In general, what you want is called a Program Transformation System (PTS).
PTSs generally have parsers, build ASTs, and can pretty-print the ASTs to recover compilable source text. More importantly, they have standard ways to navigate/inspect/modify the ASTs so that you can change them programmatically.
Many offer these capabilities in the form of pattern-matching code fragments written in the surface syntax of the language being transformed; this avoids forever having to know excruciatingly fine details about which nodes are in your AST and how they relate to their children. This is incredibly useful when you have big, complex grammars, as most of our modern (and legacy) languages seem to have.
More sophisticated PTSs (very few) provide additional facilities for teasing out the semantics of the source code. It is pretty hard to analyze/transform most code without knowing which scope individual symbols belong to, what their types are, and many other details such as data flow. Full disclosure: I build one of these.