RapidMiner - range by occurrence - frequency

I want to filter my results (document occurrences) to keep only the most frequent ones (the 10 most popular). How do I do that?

One way is to use a Sort operator followed by a Filter Example Range operator. For example, this process sorts the sample Iris dataset by a1 in decreasing order and keeps the first 10 examples:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<process expanded="true" height="417" width="675">
<operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="sort" compatibility="5.2.008" expanded="true" height="76" name="Sort" width="90" x="179" y="30">
<parameter key="attribute_name" value="a1"/>
<parameter key="sorting_direction" value="decreasing"/>
</operator>
<operator activated="true" class="filter_example_range" compatibility="5.2.008" expanded="true" height="76" name="Filter Example Range" width="90" x="313" y="30">
<parameter key="first_example" value="1"/>
<parameter key="last_example" value="10"/>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Sort" to_port="example set input"/>
<connect from_op="Sort" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/>
<connect from_op="Filter Example Range" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
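The same sort-then-truncate idea is sketched below in Python for illustration. It assumes the occurrence counts are already available as a mapping from term to count. `top_n` and the sample counts are hypothetical and are not part of RapidMiner.

```python
def top_n(occurrences: dict[str, int], n: int = 10) -> list[tuple[str, int]]:
    """Return the n entries with the highest count.

    Mirrors Sort (sorting_direction = decreasing) followed by
    Filter Example Range (first_example = 1, last_example = n).
    """
    # Sort every (term, count) pair by count, largest first ...
    ranked = sorted(occurrences.items(), key=lambda item: item[1], reverse=True)
    # ... then keep rows 1..n (slicing is safe if there are fewer than n rows).
    return ranked[:n]


if __name__ == "__main__":
    # Hypothetical document-occurrence counts.
    counts = {"data": 42, "mining": 17, "rapid": 30, "model": 8, "text": 25}
    print(top_n(counts, n=3))
```

When there are no tied counts, `collections.Counter(counts).most_common(3)` returns the same ranking.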


TableAdapter: Number of query values and destination fields are not the same

For a couple of days now I have been scratching my head over why this TableAdapter gives me the error "Number of query values and destination fields are not the same" when I call Update. It looks right to me, with 5 parameters everywhere (except where pointed out earlier in the comments), but obviously I am missing something.
Here are the relevant portions of the XML backing it:
<TableAdapter BaseClass="System.ComponentModel.Component" DataAccessorModifier="AutoLayout, AnsiClass, Class, Public" DataAccessorName="UomListTableAdapter" GeneratorDataComponentClassName="UomListTableAdapter" Name="UomList" UserDataComponentName="UomListTableAdapter">
<MainSource>
<DbSource ConnectionRef="DataConnectionString (MySettings)" DbObjectName="UomList" DbObjectType="Table" FillMethodModifier="Public" FillMethodName="Fill" GenerateMethods="Both" GenerateShortCommands="true" GeneratorGetMethodName="GetData" GeneratorSourceName="Fill" GetMethodModifier="Public" GetMethodName="GetData" QueryType="Rowset" ScalarCallRetval="System.Object, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" UseOptimisticConcurrency="true" UserGetMethodName="GetData" UserSourceName="Fill">
<DeleteCommand>
<DbCommand CommandType="Text" ModifiedByUser="true">
<CommandText>DELETE FROM UomList
WHERE (SetID = ?) AND (Name = ?)</CommandText>
<Parameters>
<Parameter AllowDbNull="true" AutogeneratedName="Param1" ColumnName="" DataSourceName="" DataTypeServer="unknown" DbType="String" Direction="Input" ParameterName="SetID" Precision="0" ProviderType="VarWChar" Scale="0" Size="50" SourceColumn="SetID" SourceColumnNullMapping="false" SourceVersion="Original" />
<Parameter AllowDbNull="false" AutogeneratedName="Param2" ColumnName="" DataSourceName="" DataTypeServer="unknown" DbType="String" Direction="Input" ParameterName="Name" Precision="0" ProviderType="VarWChar" Scale="0" Size="31" SourceColumn="Name" SourceColumnNullMapping="false" SourceVersion="Original" />
</Parameters>
</DbCommand>
</DeleteCommand>
<InsertCommand>
<DbCommand CommandType="Text" ModifiedByUser="true">
<CommandText>INSERT INTO UomList
(SetID, Name, Abbr, Qty, IsBase)
VALUES (?, ?, ?, ?)</CommandText>
<Parameters>
<Parameter AllowDbNull="true" AutogeneratedName="Param3" ColumnName="SetID" DataSourceName="" DataTypeServer="unknown" DbType="String" Direction="Input" ParameterName="SetID" Precision="0" ProviderType="VarWChar" Scale="0" Size="50" SourceColumn="SetID" SourceColumnNullMapping="false" SourceVersion="Current" />
<Parameter AllowDbNull="false" AutogeneratedName="Param4" ColumnName="Name" DataSourceName="" DataTypeServer="unknown" DbType="String" Direction="Input" ParameterName="Name" Precision="0" ProviderType="VarWChar" Scale="0" Size="31" SourceColumn="Name" SourceColumnNullMapping="false" SourceVersion="Current" />
<Parameter AllowDbNull="false" AutogeneratedName="Param5" ColumnName="Abbr" DataSourceName="" DataTypeServer="unknown" DbType="String" Direction="Input" ParameterName="Abbr" Precision="0" ProviderType="VarWChar" Scale="0" Size="31" SourceColumn="Abbr" SourceColumnNullMapping="false" SourceVersion="Current" />
<Parameter AllowDbNull="false" AutogeneratedName="Param6" ColumnName="Qty" DataSourceName="" DataTypeServer="unknown" DbType="Double" Direction="Input" ParameterName="Qty" Precision="0" ProviderType="Double" Scale="0" Size="0" SourceColumn="Qty" SourceColumnNullMapping="false" SourceVersion="Current" />
<Parameter AllowDbNull="false" AutogeneratedName="Param1" ColumnName="IsBase" DataSourceName="" DataTypeServer="unknown" DbType="Boolean" Direction="Input" ParameterName="IsBase" Precision="0" ProviderType="Boolean" Scale="0" Size="0" SourceColumn="IsBase" SourceColumnNullMapping="false" SourceVersion="Current" />
</Parameters>
</DbCommand>
</InsertCommand>
<SelectCommand>
<DbCommand CommandType="Text" ModifiedByUser="true">
<CommandText>SELECT SetID, Name, Abbr, Qty, IsBase
FROM UomList</CommandText>
<Parameters />
</DbCommand>
</SelectCommand>
<UpdateCommand>
<DbCommand CommandType="Text" ModifiedByUser="true">
<CommandText>UPDATE UomList
SET Abbr = ?, Qty = ?, IsBase = ?
WHERE (SetID = ?) AND (Name = ?)</CommandText>
<Parameters>
<Parameter AllowDbNull="false" AutogeneratedName="Param1" ColumnName="Abbr" DataSourceName="" DataTypeServer="unknown" DbType="String" Direction="Input" ParameterName="Abbr" Precision="0" ProviderType="VarWChar" Scale="0" Size="31" SourceColumn="Abbr" SourceColumnNullMapping="false" SourceVersion="Current" />
<Parameter AllowDbNull="false" AutogeneratedName="Param2" ColumnName="Qty" DataSourceName="" DataTypeServer="unknown" DbType="Double" Direction="Input" ParameterName="Qty" Precision="0" ProviderType="Double" Scale="0" Size="0" SourceColumn="Qty" SourceColumnNullMapping="false" SourceVersion="Current" />
<Parameter AllowDbNull="false" AutogeneratedName="Param3" ColumnName="IsBase" DataSourceName="" DataTypeServer="unknown" DbType="Boolean" Direction="Input" ParameterName="IsBase" Precision="0" ProviderType="Boolean" Scale="0" Size="0" SourceColumn="IsBase" SourceColumnNullMapping="false" SourceVersion="Current" />
<Parameter AllowDbNull="true" AutogeneratedName="Param4" ColumnName="SetID" DataSourceName="" DataTypeServer="unknown" DbType="String" Direction="Input" ParameterName="Original_SetID" Precision="0" ProviderType="VarWChar" Scale="0" Size="50" SourceColumn="SetID" SourceColumnNullMapping="false" SourceVersion="Original" />
<Parameter AllowDbNull="false" AutogeneratedName="Param5" ColumnName="Name" DataSourceName="" DataTypeServer="unknown" DbType="String" Direction="Input" ParameterName="Original_Name" Precision="0" ProviderType="VarWChar" Scale="0" Size="31" SourceColumn="Name" SourceColumnNullMapping="false" SourceVersion="Original" />
</Parameters>
</DbCommand>
</UpdateCommand>
</DbSource>
</MainSource>
<Mappings>
<Mapping SourceColumn="SetID" DataSetColumn="SetID" />
<Mapping SourceColumn="Name" DataSetColumn="Name" />
<Mapping SourceColumn="Abbr" DataSetColumn="Abbr" />
<Mapping SourceColumn="Qty" DataSetColumn="Qty" />
<Mapping SourceColumn="IsBase" DataSetColumn="IsBase" />
</Mappings>
<Sources />
</TableAdapter>
<xs:element name="UomList" msprop:Generator_UserTableName="UomList" msprop:Generator_RowDeletedName="UomListRowDeleted" msprop:Generator_RowChangedName="UomListRowChanged" msprop:Generator_RowClassName="UomListRow" msprop:Generator_RowChangingName="UomListRowChanging" msprop:Generator_RowEvArgName="UomListRowChangeEvent" msprop:Generator_RowEvHandlerName="UomListRowChangeEventHandler" msprop:Generator_TableClassName="UomListDataTable" msprop:Generator_TableVarName="tableUomList" msprop:Generator_RowDeletingName="UomListRowDeleting" msprop:Generator_TablePropName="UomList">
<xs:complexType>
<xs:sequence>
<xs:element name="SetID" msprop:Generator_UserColumnName="SetID" msprop:Generator_ColumnPropNameInRow="SetID" msprop:Generator_ColumnVarNameInTable="columnSetID" msprop:Generator_ColumnPropNameInTable="SetIDColumn" minOccurs="0">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:maxLength value="50" />
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="Name" msprop:Generator_UserColumnName="Name" msprop:Generator_ColumnPropNameInRow="Name" msprop:Generator_ColumnVarNameInTable="columnName" msprop:Generator_ColumnPropNameInTable="NameColumn">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:maxLength value="31" />
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="Abbr" msprop:Generator_UserColumnName="Abbr" msprop:Generator_ColumnPropNameInRow="Abbr" msprop:Generator_ColumnVarNameInTable="columnAbbr" msprop:Generator_ColumnPropNameInTable="AbbrColumn" minOccurs="0">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:maxLength value="31" />
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="Qty" msprop:Generator_UserColumnName="Qty" msprop:Generator_ColumnPropNameInRow="Qty" msprop:Generator_ColumnVarNameInTable="columnQty" msprop:Generator_ColumnPropNameInTable="QtyColumn" type="xs:double" minOccurs="0" />
<xs:element name="IsBase" msprop:Generator_UserColumnName="IsBase" msprop:Generator_ColumnPropNameInRow="IsBase" msprop:Generator_ColumnVarNameInTable="columnIsBase" msprop:Generator_ColumnPropNameInTable="IsBaseColumn" type="xs:boolean" minOccurs="0" />
<xs:element name="ListDisplay" msdata:ReadOnly="true" msdata:Expression="IIF(SetID = '', Name, Name + ' ' + Qty)" msprop:Generator_UserColumnName="ListDisplay" msprop:Generator_ColumnPropNameInRow="ListDisplay" msprop:Generator_ColumnVarNameInTable="columnListDisplay" msprop:Generator_ColumnPropNameInTable="ListDisplayColumn" type="xs:string" minOccurs="0" />
<xs:element name="SelDisplay" msdata:ReadOnly="true" msdata:Expression="IIF(SetID = '', Qty, Abbr + '(' + Qty + ')')" msprop:Generator_UserColumnName="SelDisplay" msprop:Generator_ColumnPropNameInRow="SelDisplay" msprop:Generator_ColumnVarNameInTable="columnSelDisplay" msprop:Generator_ColumnPropNameInTable="SelDisplayColumn" type="xs:string" minOccurs="0" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:unique name="UomKey1" msdata:PrimaryKey="true">
<xs:selector xpath=".//mstns:UomList" />
<xs:field xpath="mstns:SetID" />
<xs:field xpath="mstns:Name" />
</xs:unique>
Where is the mismatch in the number of parameters?
Reviewing the XML, the Insert command has only 4 question marks but 5 parameters (and 5 columns in its column list). It needs a fifth placeholder: VALUES (?, ?, ?, ?, ?).
<InsertCommand>
<DbCommand CommandType="Text" ModifiedByUser="true">
<CommandText>INSERT INTO UomList (SetID, Name, Abbr, Qty, IsBase) VALUES (?, ?, ?, ?)
</CommandText>
The error mentioning "Update" most likely comes from TableAdapter.Update, and not necessarily from the UPDATE command. TableAdapter.Update calls the Insert, Update, and Delete commands as needed, depending on the RowState of each row in the DataTable. As a result, a newly added row runs the broken Insert command.
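To catch this kind of mismatch in every command at once, one option is to compare the number of `?` placeholders with the number of `<Parameter>` elements for each command. The Python sketch below is a hypothetical helper and is not part of Visual Studio or ADO.NET. It matches element names regardless of XML namespace, so it should also work on the whole dataset .xsd. The count is naive: it treats every `?` in CommandText as a placeholder, so a literal `?` inside a quoted string would be miscounted.

```python
import xml.etree.ElementTree as ET

COMMANDS = ("SelectCommand", "InsertCommand", "UpdateCommand", "DeleteCommand")


def local_name(tag: str) -> str:
    """Strip a namespace prefix such as '{urn:...}InsertCommand'."""
    return tag.rsplit("}", 1)[-1]


def check_placeholders(table_adapter_xml: str) -> dict[str, tuple[int, int]]:
    """Map each command name to (number of '?' placeholders, number of <Parameter>s)."""
    root = ET.fromstring(table_adapter_xml)
    result: dict[str, tuple[int, int]] = {}
    for element in root.iter():
        name = local_name(element.tag)
        if name not in COMMANDS:
            continue
        placeholders = parameters = 0
        for child in element.iter():
            child_name = local_name(child.tag)
            if child_name == "CommandText" and child.text:
                placeholders += child.text.count("?")
            elif child_name == "Parameter":
                parameters += 1
        result[name] = (placeholders, parameters)
    return result


def mismatches(table_adapter_xml: str) -> list[str]:
    """Names of commands whose placeholder and parameter counts differ."""
    return [name for name, (q, p) in check_placeholders(table_adapter_xml).items() if q != p]


# Trimmed-down version of the InsertCommand from the question.
SAMPLE = """<TableAdapter><MainSource><DbSource>
<InsertCommand><DbCommand>
<CommandText>INSERT INTO UomList (SetID, Name, Abbr, Qty, IsBase) VALUES (?, ?, ?, ?)</CommandText>
<Parameters>
<Parameter ParameterName="SetID"/><Parameter ParameterName="Name"/>
<Parameter ParameterName="Abbr"/><Parameter ParameterName="Qty"/>
<Parameter ParameterName="IsBase"/>
</Parameters></DbCommand></InsertCommand>
</DbSource></MainSource></TableAdapter>"""

if __name__ == "__main__":
    print(mismatches(SAMPLE))  # ['InsertCommand']
```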

OrientDB Cluster Configuration Using VM

I am trying to form an OrientDB cluster with orientdb-enterprise-2.2.3 on VMs hosted on a local server. The VMs run Fedora 18. I have attached the orientdb-server-config.xml and hazelcast.xml files.
orientdb-server-config.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<orient-server>
<handlers>
<handler class="com.orientechnologies.orient.graph.handler.OGraphServerHandler">
<parameters>
<parameter value="true" name="enabled"/>
<parameter value="50" name="graph.pool.max"/>
</parameters>
</handler>
<handler class="com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin">
<parameters>
<parameter value="${distributed}" name="enabled"/>
<parameter value="${ORIENTDB_HOME}/config/default-distributed-db-config.json" name="configuration.db.default"/>
<parameter value="${ORIENTDB_HOME}/config/hazelcast.xml" name="configuration.hazelcast"/>
</parameters>
</handler>
<handler class="com.orientechnologies.orient.server.handler.OJMXPlugin">
<parameters>
<parameter value="false" name="enabled"/>
<parameter value="true" name="profilerManaged"/>
</parameters>
</handler>
<handler class="com.orientechnologies.orient.server.handler.OAutomaticBackup">
<parameters>
<parameter value="false" name="enabled"/>
<parameter value="${ORIENTDB_HOME}/config/automatic-backup.json" name="config"/>
</parameters>
</handler>
<handler class="com.orientechnologies.orient.server.handler.OServerSideScriptInterpreter">
<parameters>
<parameter value="true" name="enabled"/>
<parameter value="SQL" name="allowedLanguages"/>
</parameters>
</handler>
</handlers>
<network>
<sockets>
<socket implementation="com.orientechnologies.orient.server.network.OServerTLSSocketFactory" name="ssl">
<parameters>
<parameter value="false" name="network.ssl.clientAuth"/>
<parameter value="config/cert/orientdb.ks" name="network.ssl.keyStore"/>
<parameter value="password" name="network.ssl.keyStorePassword"/>
<parameter value="config/cert/orientdb.ks" name="network.ssl.trustStore"/>
<parameter value="password" name="network.ssl.trustStorePassword"/>
</parameters>
</socket>
<socket implementation="com.orientechnologies.orient.server.network.OServerTLSSocketFactory" name="https">
<parameters>
<parameter value="false" name="network.ssl.clientAuth"/>
<parameter value="config/cert/orientdb.ks" name="network.ssl.keyStore"/>
<parameter value="password" name="network.ssl.keyStorePassword"/>
<parameter value="config/cert/orientdb.ks" name="network.ssl.trustStore"/>
<parameter value="password" name="network.ssl.trustStorePassword"/>
</parameters>
</socket>
</sockets>
<protocols>
<protocol implementation="com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary" name="binary"/>
<protocol implementation="com.orientechnologies.orient.server.network.protocol.http.ONetworkProtocolHttpDb" name="http"/>
</protocols>
<listeners>
<listener protocol="binary" socket="default" port-range="2424-2430" ip-address="0.0.0.0"/>
<listener protocol="http" socket="default" port-range="2480-2490" ip-address="0.0.0.0">
<commands>
<command implementation="com.orientechnologies.orient.server.network.protocol.http.command.get.OServerCommandGetStaticContent" pattern="GET|www GET|studio/ GET| GET|*.htm GET|*.html GET|*.xml GET|*.jpeg GET|*.jpg GET|*.png GET|*.gif GET|*.js GET|*.css GET|*.swf GET|*.ico GET|*.txt GET|*.otf GET|*.pjs GET|*.svg GET|*.json GET|*.woff GET|*.woff2 GET|*.ttf GET|*.svgz" stateful="false">
<parameters>
<entry value="Cache-Control: no-cache, no-store, max-age=0, must-revalidate\r\nPragma: no-cache" name="http.cache:*.htm *.html"/>
<entry value="Cache-Control: max-age=120" name="http.cache:default"/>
</parameters>
</command>
<command implementation="com.orientechnologies.orient.graph.server.command.OServerCommandGetGephi" pattern="GET|gephi/*" stateful="false"/>
</commands>
<parameters>
<parameter value="utf-8" name="network.http.charset"/>
<parameter value="true" name="network.http.jsonResponseError"/>
</parameters>
</listener>
</listeners>
</network>
<storages/>
<users>
<user resources="*" password="root" name="root"/>
<user resources="connect,server.listDatabases,server.dblist" password="guest" name="guest"/>
</users>
<properties>
<entry value="1" name="db.pool.min"/>
<entry value="50" name="db.pool.max"/>
<entry value="true" name="profiler.enabled"/>
</properties>
<isAfterFirstTime>true</isAfterFirstTime>
</orient-server>
hazelcast.xml
<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-3.3.xsd"
xmlns="http://www.hazelcast.com/schema/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<group>
<name>orientdb</name>
<password>orientdb</password>
</group>
<network>
<port auto-increment="true">2434</port>
<join>
<multicast enabled="true">
<multicast-group>235.1.1.1</multicast-group>
<multicast-port>2434</multicast-port>
</multicast>
</join>
</network>
<executor-service>
<pool-size>16</pool-size>
</executor-service>
</hazelcast>
However, the two nodes do not connect to each other to form a cluster, even though each server works fine on its own. I am using multicast join for clustering.
What might be the issue with the multicast clustering?
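Hi Deep Mistry, check your iptables firewall configuration: the ports used for clustering might be blocked by the firewall. With the hazelcast.xml above, that includes port 2434. Hazelcast uses it both for multicast discovery (UDP, group 235.1.1.1) and for member-to-member connections (TCP, with auto-increment).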
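A quick way to test the firewall theory is to try opening TCP connections from one VM to the other. The Python sketch below is a hypothetical diagnostic and is not part of OrientDB or Hazelcast; the IP address is a placeholder. If a port accepts connections locally but not from the other VM, a firewall rule is a likely culprit. This only tests TCP; the UDP multicast traffic used for discovery cannot be verified this way.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    A refused connection or a timeout (e.g. packets dropped by an iptables rule)
    returns False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError and socket.timeout
        return False


if __name__ == "__main__":
    other_node = "192.168.1.20"  # hypothetical address of the other VM
    # 2434: Hazelcast member port (auto-increment may move it to 2435, ...);
    # 2424 and 2480: first ports of the OrientDB binary and HTTP ranges.
    for port in (2434, 2424, 2480):
        state = "open" if tcp_port_open(other_node, port) else "closed or filtered"
        print(f"{other_node}:{port} {state}")
```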

How to apply a batch model to non-batched data in RapidMiner

I have created a model using batch validation. Is there a way to apply this model to non-batched data?
Here is the sample process I created:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.0.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.0.001" expanded="true" height="68" name="Retrieve distmodel3" width="90" x="45" y="136">
<parameter key="repository_entry" value="../data/distmodel3"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.0.001" expanded="true" height="82" name="Set Role" width="90" x="246" y="187">
<parameter key="attribute_name" value="batchid"/>
<parameter key="target_role" value="batch"/>
<list key="set_additional_roles">
<parameter key="Letter" value="label"/>
<parameter key="Frame" value="batch"/>
<parameter key="Feat1" value="regular"/>
<parameter key="Feat2" value="regular"/>
<parameter key="Feat3" value="regular"/>
<parameter key="Feat4" value="regular"/>
<parameter key="Feat5" value="regular"/>
<parameter key="Feat6" value="regular"/>
<parameter key="Feat7" value="regular"/>
<parameter key="Feat8" value="regular"/>
<parameter key="Gender" value="regular"/>
</list>
</operator>
<operator activated="true" class="batch_x_validation" compatibility="7.0.001" expanded="true" height="124" name="Validation" width="90" x="380" y="85">
<parameter key="create_complete_model" value="false"/>
<parameter key="average_performances_only" value="true"/>
<process expanded="true">
<operator activated="false" class="weka:W-J48" compatibility="7.0.000" expanded="true" height="82" name="W-J48" width="90" x="112" y="34">
<parameter key="U" value="true"/>
<parameter key="C" value="0.25"/>
<parameter key="M" value="2.0"/>
<parameter key="R" value="false"/>
<parameter key="B" value="true"/>
<parameter key="S" value="false"/>
<parameter key="L" value="false"/>
<parameter key="A" value="false"/>
</operator>
<operator activated="true" class="k_nn" compatibility="7.0.001" expanded="true" height="82" name="k-NN" width="90" x="112" y="187">
<parameter key="k" value="3"/>
<parameter key="weighted_vote" value="false"/>
<parameter key="measure_types" value="MixedMeasures"/>
<parameter key="mixed_measure" value="MixedEuclideanDistance"/>
<parameter key="nominal_measure" value="NominalDistance"/>
<parameter key="numerical_measure" value="EuclideanDistance"/>
<parameter key="divergence" value="GeneralizedIDivergence"/>
<parameter key="kernel_type" value="radial"/>
<parameter key="kernel_gamma" value="1.0"/>
<parameter key="kernel_sigma1" value="1.0"/>
<parameter key="kernel_sigma2" value="0.0"/>
<parameter key="kernel_sigma3" value="2.0"/>
<parameter key="kernel_degree" value="3.0"/>
<parameter key="kernel_shift" value="1.0"/>
<parameter key="kernel_a" value="1.0"/>
<parameter key="kernel_b" value="0.0"/>
</operator>
<connect from_port="training" to_op="k-NN" to_port="training set"/>
<connect from_op="k-NN" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="7.0.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="7.0.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
<parameter key="main_criterion" value="first"/>
<parameter key="accuracy" value="true"/>
<parameter key="classification_error" value="true"/>
<parameter key="kappa" value="true"/>
<parameter key="weighted_mean_recall" value="false"/>
<parameter key="weighted_mean_precision" value="false"/>
<parameter key="spearman_rho" value="false"/>
<parameter key="kendall_tau" value="false"/>
<parameter key="absolute_error" value="false"/>
<parameter key="relative_error" value="false"/>
<parameter key="relative_error_lenient" value="false"/>
<parameter key="relative_error_strict" value="false"/>
<parameter key="normalized_absolute_error" value="false"/>
<parameter key="root_mean_squared_error" value="false"/>
<parameter key="root_relative_squared_error" value="false"/>
<parameter key="squared_error" value="false"/>
<parameter key="correlation" value="false"/>
<parameter key="squared_correlation" value="false"/>
<parameter key="cross-entropy" value="false"/>
<parameter key="margin" value="false"/>
<parameter key="soft_margin_loss" value="false"/>
<parameter key="logistic_loss" value="false"/>
<parameter key="skip_undefined_labels" value="true"/>
<parameter key="use_example_weights" value="true"/>
<list key="class_weights"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="legacy:write_model" compatibility="7.0.001" expanded="true" height="68" name="Write Model" width="90" x="514" y="187">
<parameter key="model_file" value="C:\Users\Hans\Documents\ModelFile.mod"/>
<parameter key="overwrite_existing_file" value="true"/>
<parameter key="output_type" value="XML Zipped"/>
</operator>
<connect from_op="Retrieve distmodel3" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="model" to_op="Write Model" to_port="input"/>
<connect from_op="Validation" from_port="training" to_port="result 1"/>
<connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
The Batch Validation operator uses an attribute to split the training example set into batches. Because this attribute is explicitly given the batch role, it is a special attribute, and special attributes are not used when building the model. Classification models use only regular attributes to predict the class label. The model should therefore work on an example set that does not contain the batch attribute at all. If the model is applied to an example set in which the batch attribute is present but has the regular role, the predictions will not depend on it. However, some models may refuse to work with such data at all; this depends on the model.
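To illustrate the point about roles, here is a tiny pure-Python k-NN with k = 3, matching the k-NN operator in the process. Like the learners described above, it reads only the regular attributes. This is a sketch, not RapidMiner's implementation. The attribute names mirror the process, but the roles mapping is trimmed to two features and the data is made up.

```python
import math
from collections import Counter

# Hypothetical roles mirroring the Set Role operator above (trimmed to two
# features): "batchid" has the batch role and "Letter" is the label.
ROLES = {"batchid": "batch", "Letter": "label", "Feat1": "regular", "Feat2": "regular"}
REGULAR = [name for name, role in ROLES.items() if role == "regular"]


def features(example: dict) -> list[float]:
    """Return only the regular attributes; batch and label values are never read."""
    return [float(example[name]) for name in REGULAR]


def knn_predict(training: list[dict], example: dict, k: int = 3) -> str:
    """k-NN with a majority vote over Euclidean distance on regular attributes."""
    target = features(example)
    nearest = sorted(training, key=lambda row: math.dist(features(row), target))[:k]
    votes = Counter(row["Letter"] for row in nearest)
    return votes.most_common(1)[0][0]


# Made-up training data; batchid would only matter for splitting during validation.
TRAINING = [
    {"batchid": 1, "Letter": "a", "Feat1": 0.0, "Feat2": 0.1},
    {"batchid": 1, "Letter": "a", "Feat1": 0.2, "Feat2": 0.0},
    {"batchid": 2, "Letter": "a", "Feat1": 0.1, "Feat2": 0.2},
    {"batchid": 2, "Letter": "b", "Feat1": 5.0, "Feat2": 5.1},
    {"batchid": 3, "Letter": "b", "Feat1": 5.2, "Feat2": 4.9},
    {"batchid": 3, "Letter": "b", "Feat1": 4.8, "Feat2": 5.0},
]

if __name__ == "__main__":
    # A new, non-batched example: there is no "batchid" key at all.
    print(knn_predict(TRAINING, {"Feat1": 0.1, "Feat2": 0.1}))  # a
```

An example with or without a batchid value produces the same feature vector, so the prediction cannot depend on it.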

Extract multiple date strings from a single varchar(MAX) column SQL Server

I've inherited a report spec subsystem that needs to be tweaked. The task is to add a date column to the tReports table and populate it with the CreateDate that is (supposed to be) contained in the XML code spec. The problem is that some of the older reports don't have the CREATEDATE attribute. In other cases, as in one example below, the XML parses but is mangled, so the CREATEDATE cannot be retrieved with the XQuery that works for most of the other reports. Since those specs include no explicit creation date, I'm using interpolation to estimate a reasonable date. One input to the interpolation is the set of date strings contained in the report spec; some will be useful, others not.
There are too many reports (over 1,200) to skim each report spec visually for date strings. These date strings can appear anywhere in the report spec, and a very large number of element and attribute combinations can contain one.
The ideal solution would be a list of ReportIDs and date strings ready to use in an UPDATE. Because the date format varies (m/d/yy, mm/dd/yy, m/dd/yy, ...), I'd be happy to get a few spurious characters around each date string that I could clean up later.
All the date strings will be from 2000 or later, so the search string I've been using is '/20', which has given good results.
I've looked at many sites that discuss this kind of issue and found only one solution, by Mikael Eriksson, that resembles what I'm describing. However, I couldn't make it work after several hours of experimenting: How to extract multiple strings from single rows in SQL Server
Is there a way to extract these embedded dates without using a cursor or a WHILE loop?
-- Some representative data: (I'm using SQL Server 2008 R2)
CREATE TABLE #ReportSpecs (ReportID INT, ReportSpec VARCHAR(MAX))
INSERT INTO #ReportSpecs
( ReportID, ReportSpec )
VALUES
(136,
'<ReportID>136</ReportID>
<EmpIDCreator>23816</EmpIDCreator>
<EmpName>Blanc, Melvin J</EmpName>
<ReportType>0</ReportType>
<ReportName>PSST Sys Spec</ReportName>
<ReportData>
<REPORT>
<COLUMNS>
<Column Name="JobNumber" Position="1" />
<Column Name="TaskType" Position="2" />
<Column Name="Assignees" Position="3" />
<Column NAME="JobDueDate" Position="4" />
<Column Name="ReferenceNumber" Position="5" />
<Column Name="Deliverable" Position="6" />
<Column Name="Priority" Position="7" />
</COLUMNS>
<FILTERS>
<FILTER NAME="TYPE" VALUE="To_Me" />
<FILTER NAME="Status" VALUE="All" />
<FILTER NAME="DateOptions" VALUE="DateRange" From="8/16/2002" To="8/23/2002" />
<FILTER NAME="FromDate" VALUE="8/16/2002" />
<FILTER NAME="ToDate" VALUE="8/23/2002" />
<FILTER NAME="Role" VALUE="All" />
</FILTERS>
<parameters>
<PARAMETER NAME="#Cascading" TYPE="integer" VALUE="0" />
<PARAMETER NAME="#EmpID" SYSTEM="true" TYPE="integer" VALUE="#Request.EmployeeIDAlias#" />
<PARAMETER NAME="#FromOrgs" TYPE="varchar(250)" VALUE="" />
<PARAMETER NAME="#ToOrgs" TYPE="varchar(250)" VALUE="" />
</parameters>
<NAME>PSST Sys Spec</NAME>
<OWNER>
<ID>23816</ID>
</OWNER>
<source id="8" useinternalid="True" />
</REPORT>
</ReportData>'),
(311,
'<ReportID>311</ReportID>
<EmpIDCreator>7162</EmpIDCreator>
<EmpName>Potter, Harry J</EmpName>
<ReportType>0</ReportType>
<ReportName>CPVC Synch Test</ReportName>
<ReportData>
<REPORT>
<COLUMNS>
<Column Name="JobNumber" Position="1" />
<Column Name="TaskType" Position="2" />
<Column Name="Subject" Position="3" />
<Column Name="CurrentAssignee" Position="4" />
<Column NAME="JobDueDate" Position="5" />
<Column Name="Deliverable" Position="6" />
<Column Name="Category" Position="7" />
<Column Name="Priority" Position="8" />
</COLUMNS>
<FILTERS>
<FILTER NAME="TYPE" VALUE="By_Orgs_6098,By_Orgs_6123" />
<FILTER NAME="Status" VALUE="Open" />
<FILTER NAME="DateOptions" VALUE="DateRange" From="3/25/2002" To="4/4/2002" />
<FILTER NAME="ReviewFromDate" VALUE="3/25/2002" />
<FILTER NAME="ReviewToDate" VALUE="4/4/2002" />
<FILTER NAME="Role" VALUE="All" />
</FILTERS>
<parameters>
<PARAMETER NAME="#Act" TYPE="integer" VALUE="0" />
<PARAMETER NAME="#MgrID" SYSTEM="true" TYPE="integer" VALUE="#Request.EmployeeIDAlias#" />
<PARAMETER NAME="#MgrIDActing" TYPE="integer" VALUE="" />
<PARAMETER NAME="#FromDept" TYPE="varchar(250)" VALUE="" />
<PARAMETER NAME="#FromEmp" TYPE="varchar(250)" VALUE="" />
<PARAMETER NAME="#ToDept" TYPE="varchar(250)" VALUE="" />
</parameters>
<NAME>CPVC Synch Test</NAME>
<OWNER>
<ID>7162</ID>
</OWNER>
<source id="17" useinternalid="True" />
</REPORT>
</ReportData>'),
(1131,
'<ReportID>1131</ReportID>
<EmpIDCreator>13185</EmpIDCreator>
<EmpName>Reed, Alan</EmpName>
<ReportType>0</ReportType>
<ReportName>
''"><script>alert(''hello'')</script>
</ReportName>
<ReportData>
<Report NAME="''">
<script>alert(''hello'')</script>" CREATEDATE="12/7/2009">
<DESCRIPTION>sfasf</DESCRIPTION>
<OWNER ID="13185"/>
<SOURCE ID="1" USEINTERNALID="TRUE"/>
<COLUMNS>
<COLUMN NAME="JobNumber" POSITION="1" SORTORDER="asc"/>
</COLUMNS>
<FILTERS>
<FILTER NAME="TYPE" VALUE="By_Me,To_Me" />
<FILTER NAME="ASGSTATUS" VALUE="Open" />
<FILTER NAME="DATEOPTIONS" VALUE="All" />
<FILTER NAME="STATUS" VALUE="Open" />
<FILTER NAME="ASGDATEOPTIONS" VALUE="All" />
<FILTER NAME="ROLE" VALUE="All" />
</FILTERS>
<PARAMETERS>
<PARAMETER NAME="#Me" TYPE="integer" VALUE="3" />
<PARAMETER NAME="#FromCost" TYPE="varchar(250)" VALUE=""/>
<PARAMETER NAME="#ToCost" TYPE="varchar(250)" VALUE="" />
</PARAMETERS>
<ADVANCEDSORT SortByA="JobNumber" SortOrderA="asc" SortByB="" SortOrderB="" SortByC="" SortOrderC="" />
</Report>
</ReportData>');
/*
Desired Output (A DISTINCT list would be better, but just getting this output would be GREAT.)
ReportID DateString
-------- ----------
136 8/16/2002
136 8/23/2002
136 8/16/2002
136 8/23/2002
311 3/25/2002
311 4/4/2002
311 3/25/2002
311 4/4/2002
1131 12/7/2009
*/
DROP TABLE #ReportSpecs
Thanks for your time.
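One way is to cast the spec to XML, shred every attribute and text node with nodes(), and keep the values that contain '/20':
select R.ReportID,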
D.V as DateString
from #ReportSpecs as R
cross apply (select cast(R.ReportSpec as xml)) as X(R)
cross apply X.R.nodes('//@*, //*/text()') as T(X)
cross apply (select T.X.value('.', 'varchar(max)')) as D(V)
where charindex('/20', D.V) > 0
Result:
ReportID DateString
----------- --------------------------
136 8/16/2002
136 8/23/2002
136 8/16/2002
136 8/23/2002
311 3/25/2002
311 4/4/2002
311 3/25/2002
311 4/4/2002
1131 " CREATEDATE="12/7/2009">
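If post-processing outside SQL Server is an option, a regular expression can strip the spurious characters (such as `" CREATEDATE="12/7/2009">` above) and de-duplicate in one pass. T-SQL in SQL Server 2008 R2 has no native regular-expression functions, only LIKE/PATINDEX wildcards, so this is an alternative rather than a drop-in replacement. The Python sketch assumes the ReportSpec values have been exported into a dict keyed by ReportID; `extract_dates` and `SPECS` are hypothetical names.

```python
import re

# One or two digits for month and day, and a four-digit year from 2000-2099,
# matching the question's '/20' heuristic. Two-digit years (m/d/yy) are not
# matched; the pattern would need extending for those.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/20\d{2}\b")


def extract_dates(specs: dict[int, str]) -> list[tuple[int, str]]:
    """Return distinct (ReportID, date string) pairs in order of first appearance."""
    seen: set[tuple[int, str]] = set()
    rows: list[tuple[int, str]] = []
    for report_id, spec in specs.items():
        for match in DATE_PATTERN.finditer(spec):
            row = (report_id, match.group())
            if row not in seen:  # DISTINCT per report
                seen.add(row)
                rows.append(row)
    return rows


# Hypothetical excerpts of two ReportSpec values, keyed by ReportID.
SPECS = {
    136: '<FILTER NAME="DateOptions" VALUE="DateRange" From="8/16/2002" To="8/23/2002" />'
         '<FILTER NAME="FromDate" VALUE="8/16/2002" />',
    1131: '<Report NAME="x">" CREATEDATE="12/7/2009"><DESCRIPTION>sfasf</DESCRIPTION>',
}

if __name__ == "__main__":
    for report_id, date_string in extract_dates(SPECS):
        print(report_id, date_string)
```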

RapidMiner: where/how should the Store (Model) operator be connected in this process flow?

I have created a process flow in RapidMiner that uses some loops. I'm not sure where my Store (Model) operator should be connected in order to save the model produced by this process, so that I can use it in a new process.
In the attached example I have replaced my data with sample data; the rest of the process is the same as the one I use for my actual data set.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.012">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.012" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="5.3.012" expanded="true" height="60" name="Retrieve Sonar" width="90" x="45" y="30">
<parameter key="repository_entry" value="//Samples/data/Sonar"/>
</operator>
<operator activated="true" class="numerical_to_binominal" compatibility="5.3.012" expanded="true" height="76" name="Numerical to Binominal" width="90" x="179" y="30">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="20_OV_COVER"/>
</operator>
<operator activated="true" class="set_role" compatibility="5.3.012" expanded="true" height="76" name="Set Role" width="90" x="45" y="120">
<parameter key="attribute_name" value="class"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="normalize" compatibility="5.3.012" expanded="true" height="94" name="Normalize" width="90" x="179" y="120"/>
<operator activated="true" class="nominal_to_numerical" compatibility="5.3.012" expanded="true" height="94" name="Nominal to Numerical (2)" width="90" x="45" y="210">
<list key="comparison_groups"/>
</operator>
<operator activated="true" class="replace_missing_values" compatibility="5.3.012" expanded="true" height="94" name="Replace Missing Values" width="90" x="179" y="210">
<list key="columns"/>
</operator>
<operator activated="true" class="independent_component_analysis" compatibility="5.3.012" expanded="true" height="94" name="ICA" width="90" x="313" y="210">
<parameter key="number_of_components" value="700"/>
</operator>
<operator activated="true" class="optimize_selection_forward" compatibility="5.3.012" expanded="true" height="94" name="Forward Selection" width="90" x="514" y="75">
<parameter key="maximal_number_of_attributes" value="100"/>
<parameter key="speculative_rounds" value="10"/>
<process expanded="true">
<operator activated="true" class="x_validation" compatibility="5.3.012" expanded="true" height="112" name="Validation" width="90" x="112" y="30">
<parameter key="number_of_validations" value="5"/>
<process expanded="true">
<operator activated="true" class="naive_bayes" compatibility="5.3.012" expanded="true" height="76" name="Naive Bayes" width="90" x="112" y="30"/>
<connect from_port="training" to_op="Naive Bayes" to_port="training set"/>
<connect from_op="Naive Bayes" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="5.3.012" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="5.3.012" expanded="true" height="76" name="Performance" width="90" x="276" y="30"/>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_port="example set" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve Sonar" from_port="output" to_op="Numerical to Binominal" to_port="example set input"/>
<connect from_op="Numerical to Binominal" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Normalize" to_port="example set input"/>
<connect from_op="Normalize" from_port="example set output" to_op="Nominal to Numerical (2)" to_port="example set input"/>
<connect from_op="Nominal to Numerical (2)" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
<connect from_op="Replace Missing Values" from_port="example set output" to_op="ICA" to_port="example set input"/>
<connect from_op="ICA" from_port="example set output" to_op="Forward Selection" to_port="example set"/>
<connect from_op="ICA" from_port="original" to_port="result 1"/>
<connect from_op="ICA" from_port="preprocessing model" to_port="result 2"/>
<connect from_op="Forward Selection" from_port="example set" to_port="result 3"/>
<connect from_op="Forward Selection" from_port="attribute weights" to_port="result 4"/>
<connect from_op="Forward Selection" from_port="performance" to_port="result 5"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="18"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
<portSpacing port="sink_result 6" spacing="0"/>
</process>
</operator>
</process>
The Forward Selection operator outputs a set of attribute weights, and these can be used to select the attributes that the operator found to give the best performance. The first thing to do, therefore, is to feed these weights into the Select by Weights operator to get the example set that was used to build the model.
From there you can simply rebuild the model on this example set outside the Forward Selection operator. If you also want an estimate of the performance on unseen data, you could use a Validation block on all the data; if not, simply applying the learner (Naive Bayes here) to the reduced example set creates the model you need, and that model output is what you connect to the Store operator.
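The two steps above (keep only the attributes Forward Selection chose, then train once on all of the reduced data) can be sketched in plain Python. This is not RapidMiner code: select_by_weights and train_class_means are hypothetical helpers, the sketch assumes Forward Selection reports weight 1 for kept attributes and 0 otherwise, and the stand-in learner is a per-class mean, not Naive Bayes.

```python
from statistics import mean

def select_by_weights(rows, weights, label="class", threshold=0.0):
    """Keep the label plus every attribute whose weight is above threshold.

    Assumes Forward Selection-style weights: 1.0 for kept attributes, 0.0 otherwise.
    """
    keep = [name for name, w in weights.items() if w > threshold]
    return [{**{name: row[name] for name in keep}, label: row[label]} for row in rows]

def train_class_means(rows, label="class"):
    """Stand-in learner (not Naive Bayes): mean of each attribute per class."""
    attrs = [name for name in rows[0] if name != label]
    classes = sorted({row[label] for row in rows})
    return {c: {a: mean(row[a] for row in rows if row[label] == c) for a in attrs}
            for c in classes}

# Made-up data and weights
rows = [
    {"a1": 0.2, "a2": 0.9, "a3": 0.4, "class": "R"},
    {"a1": 0.7, "a2": 0.1, "a3": 0.5, "class": "M"},
]
weights = {"a1": 1.0, "a2": 0.0, "a3": 1.0}

reduced = select_by_weights(rows, weights)
model = train_class_means(reduced)  # built once, on all of the reduced data
print(reduced)  # [{'a1': 0.2, 'a3': 0.4, 'class': 'R'}, {'a1': 0.7, 'a3': 0.5, 'class': 'M'}]
print(model)    # {'M': {'a1': 0.7, 'a3': 0.5}, 'R': {'a1': 0.2, 'a3': 0.4}}
```

In the process itself the rough equivalent is Forward Selection's "attribute weights" output feeding Select by Weights, followed by the Naive Bayes learner whose model goes to Store.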
When I tried it, the estimated performance I got was different to the one produced by the Forward Selection operator, because the validation block partitions the data differently owing to a different random number seed. Also, the Validation block inside Forward Selection reports an average performance over 5 models built on 5 partitions of the data (number_of_validations is set to 5 in this process). These models could all be different, so there is no one true model that can be saved.
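To see why cross-validation leaves no single model to save, here is a toy k-fold sketch in Python: each fold trains its own model, only the accuracy is averaged, and the shuffle seed decides the partitions. The nearest-centroid learner, the helper names and the data are made up; RapidMiner's Validation operator follows the same principle, but this is not its implementation.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed):
    """Shuffle 0..n-1 with a seeded RNG and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_centroids(xs, ys):
    """Toy 1-D nearest-centroid learner: the mean x of each class label."""
    return {lab: mean(x for x, y in zip(xs, ys) if y == lab) for lab in sorted(set(ys))}

def predict(model, x):
    # Pick the label whose centroid is closest to x
    return min(model, key=lambda lab: abs(x - model[lab]))

def cross_validate(xs, ys, k, seed):
    """Train one model per fold; return all k models and the averaged accuracy."""
    models, accuracies = [], []
    for test in k_fold_indices(len(xs), k, seed):
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = train_centroids([xs[i] for i in train], [ys[i] for i in train])
        models.append(model)
        hits = sum(predict(model, xs[i]) == ys[i] for i in test)
        accuracies.append(hits / len(test))
    return models, mean(accuracies)

# Made-up one-attribute data with Sonar-style labels (R = rock, M = mine)
xs = [0.1, 0.4, 0.35, 0.8, 0.9, 0.75, 0.2, 0.6, 0.55, 0.95]
ys = ["R", "R", "R", "M", "M", "M", "R", "M", "R", "M"]
models, acc = cross_validate(xs, ys, k=5, seed=1)
print(len(models))  # 5: one model per fold, none of them "the" model
print(acc)          # one averaged estimate; another seed may give another value
```

Retraining once on all the data, as described above, is what yields a single model worth storing.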
Hope that helps.
regards
Andrew