WEKA instances from SQL query data

I am wondering how to create an instance in WEKA without actually using a CSV or ARFF file. I mean, if I have the result of an SQL query with four string fields, how can I use that result to create an instance?

Assuming you are not using the Weka Explorer and are coding this yourself, something like this should help you make an instance (in this code, str1-str4 are your SQL result strings). Note: this code requires you to first load an .arff file, with at least a couple of instances shaped like your new ones, into an Instances object (here called instanceList) so that the attributes attr1-attr4 are defined.
Instance inst = new DenseInstance(4);
// Attach the instance to a dataset first, so Weka can resolve the
// (string) attributes when the values are set
inst.setDataset(instanceList);
inst.setValue(attr1, str1);
inst.setValue(attr2, str2);
inst.setValue(attr3, str3);
inst.setValue(attr4, str4);
System.out.println("New instance: " + inst);


Is there a way to execute a text Gremlin query with PartitionStrategy?

I'm looking for a way to run a text query, e.g. "g.V().limit(1).toList()", while using PartitionStrategy in Apache TinkerPop.
I'm attempting to build a REST interface to run queries on selected graph partitions only. I know how to run a raw query using Client, but I'm looking for an implementation where I can create a multi-tenant graph (https://tinkerpop.apache.org/docs/current/reference/#partitionstrategy) and query only selected tenants using a raw text query instead of a GLV. I'm able to query only selected partitions using gremlin-python, but I could not find a reference implementation for running a text query against a single tenant.
Here is the tenant query implementation:
connection = DriverRemoteConnection('ws://megamind-ws:8182/gremlin', 'g')
g = traversal().withRemote(connection)
partition = PartitionStrategy(partition_key="partition_key",
                              write_partition="tenant_a",
                              read_partitions=["tenant_a"])
partitioned_g = g.withStrategies(partition)
x = partitioned_g.V().limit(1).next()  # query on the selected partition only
Here is how I execute a raw query on the entire graph; I'm looking for a way to run text-based queries on selected partitions only:
from gremlin_python.driver import client
client = client.Client('ws://megamind-ws:8182/gremlin', 'g')
results = client.submitAsync("g.V().limit(1).toList()").result().one()  # runs on the entire graph
print(results)
client.close()
Any suggestions appreciated. TIA
It depends on how the backend store handles text-mode queries, but for the query itself you essentially just need to use the Groovy/Java-style formulation. This works with Gremlin Server and Amazon Neptune; for other backends you will need to make sure this syntax is supported. From Python you would use something like:
client.submit("""
g.withStrategies(new PartitionStrategy(partitionKey: "_partition",
                                       writePartition: "b",
                                       readPartitions: ["b"])).V().count()""")
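For a JVM-based REST service, the same text query can be submitted through the TinkerPop Java driver: the strategy is built inside the submitted script, so the server applies it before running the traversal. A minimal sketch, assuming the host, port, partition key, and tenant names from the question:
import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.Result;

public class PartitionedTextQuery {
    public static void main(String[] args) {
        // Host, port, and tenant names taken from the question; adjust for your setup
        Cluster cluster = Cluster.build("megamind-ws").port(8182).create();
        Client client = cluster.connect();
        try {
            // Only vertices in "tenant_a" are visible to this query
            String script = "g.withStrategies(new PartitionStrategy("
                    + "partitionKey: 'partition_key', "
                    + "writePartition: 'tenant_a', "
                    + "readPartitions: ['tenant_a'])).V().limit(1).toList()";
            Result result = client.submit(script).one();
            System.out.println(result);
        } finally {
            client.close();
            cluster.close();
        }
    }
}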

Using Apache Beam JsonTimePartitioning to create time-partitioned tables in BigQuery

I have tried using the JsonTimePartitioning class in the Apache Beam Java SDK to write data to dynamic tables in BigQuery, but I get "cannot find symbol" for the class JsonTimePartitioning.
This is how I try to import the class:
import com.google.api.services.bigquery.model.JsonTimePartitioning;
And this is how I try to use it in my pipeline:
.withWriteDisposition(WriteDisposition.WRITE_APPEND)
.withJsonTimePartitioningTo(new JsonTimePartitioning().setType("DAY")));
I can't seem to find JsonTimePartitioning anywhere. Can you point to the example you are trying to follow? The existing methods on BigQueryIO either accept an instance of TimePartitioning, or a value provider for a String that is a JSON-serialized instance of that same TimePartitioning. In fact, when you call the TimePartitioning version of the method, it still just gets serialized into a string internally. You can find an example of how it's used here:
Loading historical data into time-partitioned BigQuery tables: to load historical data into a time-partitioned BigQuery table, specify BigQueryIO.Write.withTimePartitioning(com.google.api.services.bigquery.model.TimePartitioning) with a field used for column-based partitioning. For example:
PCollection<Quote> quotes = ...;
quotes.apply(BigQueryIO.write()
    .withSchema(schema)
    .withFormatFunction(quote -> new TableRow()
        .set("timestamp", quote.getTimestamp())
        .set(..other columns..))
    .to("my-project:my_dataset.my_table")
    .withTimePartitioning(new TimePartitioning().setField("time")));
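In other words, the class to import is TimePartitioning from the BigQuery model package, not JsonTimePartitioning, and it is passed to withTimePartitioning. A minimal sketch under that assumption, reusing the write disposition from the question; rows, schema, and the table name are placeholders for your own pipeline:
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.api.services.bigquery.model.TimePartitioning;

import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.values.PCollection;

PCollection<TableRow> rows = ...;  // assumed: produced earlier in the pipeline
TableSchema schema = ...;          // assumed: the target table's schema

rows.apply(BigQueryIO.writeTableRows()
    .to("my-project:my_dataset.my_table")
    .withSchema(schema)
    // day partitioning on the "time" column, matching the quoted example
    .withTimePartitioning(new TimePartitioning().setType("DAY").setField("time"))
    .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
    .withWriteDisposition(WriteDisposition.WRITE_APPEND));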

How to add an array of DataRows into an existing table inside my database

I'm a newbie so don't laugh :#
I'm working with a 2002-2003 Microsoft Access database.
Now I want to add an array of DataRow objects into an existing table in my database. Is there a way to do that? Because right now I'm just adding the rows with a foreach loop.
Thank you.
I think that the foreach loop actually is the best way to do it, but note that a DataRow that already belongs to another table must be imported rather than added directly:
foreach (DataRow row in yourRowArray)
{
    // ImportRow copies the row, including its values, into the target table
    dataTable.ImportRow(row);
}
If you are using .NET Framework 3.5+ you can also use the DataRow extension method CopyToDataTable().
But watch out: CopyToDataTable() builds a new DataTable from the array, so the existing table reference is replaced rather than appended to.
DataTable table = yourDataTable;
DataRow[] yourRowArray = ...;
if (yourRowArray.Length > 0)
{
    // creates a brand-new table containing copies of the rows
    table = yourRowArray.CopyToDataTable();
}
I would recommend using the foreach-loop.
What you describe as an array must be a saved file of some type, i.e. Excel or CSV. Be sure it is a clean grid of data without extraneous, misaligned rows.
Then you can link to that file from Access as a table. This is a manual step using the Access interface; in the ribbon it is under the External Data area. The link remains good, allowing you to replace the Excel/CSV file with a new one, as long as the location path and structure of the file do not change.
Then you create an append query to write all the records from the linked table into the table in your Access database.

Export Data from SQL to CSV

I'm using Entity Framework to access a SQL Server database and return data. The data needs to be formatted into a tab-delimited file, and I then want to compress it before returning it to the user.
I can do the select and then iterate over the EF objects, formatting all the data into one big string, but this takes forever (I'm returning about 800k rows). The query itself is quite fast; it's just the creation of the CSV file in memory that is killing it.
I found this post that describes how to use sqlcmd to do this directly as an export (but with CSV) from SQL, which seems very promising, but I'm unclear how to pass -E and the other parameters to ExecuteSqlCommand()... or whether it is even meant for this.
I tried to do something like this:
var test = context.Database.ExecuteSqlCommand(
    "select Chromosome c, StartLocation sl, Endlocation el, GeneName gn " +
    "from Gencode where c = chr1",
    "-E", "-Q", new SqlParameter("-s", "\t"));
But of course that didn't work...
Any suggestions as to how to go about this? I'm using EF 6.1 if that matters.
An alternate option using a simple manual method: run the query in SQL Server Management Studio (F5), then store the result grid to a file and keep the file name.

Dynamic querystring in JRXML [closed]

I'm trying to build a report that is smart enough to slightly modify its SQL query based on an input parameter of some sort.
For example, if that special modifying parameter value is "1", it adds a field to the select and adds a group-by clause to the query.
I've looked into Java expressions, but they don't seem to be supported in the queryString tag of the JRXML. I also tried to make a variable containing the Java expression and use that variable in the queryString tag... That didn't work either!
Right now I'm thinking of maybe having a stored procedure with all that logic and simply having the JRXML call that stored procedure with the modifying input parameter, but the project I'm working on doesn't seem to use a whole lot of stored procs, so I'd like to see if there are other solutions before I go down that path.
Thanks for your help.
Thank you guys for your help, much appreciated. However, I found another way to go about it, and posted it for information: here
JasperDesign actually lets you modify portions of your JRXML document. So say you have a package "reports" where you store your reports, built either by hand or with a tool like iReport. As long as your query is defined in the <queryString> tag, the following will work, allowing you to change the query on the fly:
try {
    String fileName = getClass().getClassLoader().getResource("com/foo/myproject/reports/TestReport.jrxml").getFile();
    File theFile = new File(fileName);
    JasperDesign jasperDesign = JRXmlLoader.load(theFile);

    // Build a new query
    String theQuery = "SELECT * FROM myTable WHERE ...";

    // Swap the new query into the report design
    JRDesignQuery newQuery = new JRDesignQuery();
    newQuery.setText(theQuery);
    jasperDesign.setQuery(newQuery);

    // Compile and fill the modified design
    JasperReport jasperReport = JasperCompileManager.compileReport(jasperDesign);
    Connection conn = MyDatabaseClass.getConnection();
    JasperPrint jasperPrint = JasperFillManager.fillReport(jasperReport, null, conn);
    JasperViewer.viewReport(jasperPrint);
} catch (Exception ex) {
    String connectMsg = "Could not create the report " + ex.getMessage() + " " + ex.getLocalizedMessage();
    System.out.println(connectMsg);
}
With something like this you can create a member variable in your class that holds the new query and build it with whatever user constraints are desired. Then at view time just modify the design.
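For the question's concrete scenario (a modifier value of "1" adds a field and a GROUP BY clause), theQuery could be assembled with a small helper before being handed to JRDesignQuery. A minimal sketch; the table and column names are hypothetical:
// Table/column names ("sales", "region", "product", "amount") are hypothetical
static String buildQuery(String modifier) {
    StringBuilder sql = new StringBuilder("SELECT region, SUM(amount) AS total FROM sales");
    if ("1".equals(modifier)) {
        // the modifying parameter adds a column and a matching GROUP BY clause
        sql.insert("SELECT".length(), " product,");
        sql.append(" GROUP BY region, product");
    } else {
        sql.append(" GROUP BY region");
    }
    return sql.toString();
}
Calling newQuery.setText(buildQuery(modifier)) then slots straight into the code above.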
-Jeff
I've done it using stored procedures, which are just fine for this kind of thing. Otherwise you may switch to Java: grab the data from the database, filter and group it according to the user-provided parameters, and send it as a collection of beans to the Jasper report, which will do the rendering.
JasperDesign helped me solve the problem of building a dynamic query for the JRXML file.
To build the dynamic SQL, I was using Squiggle (Google Code) to construct the SQL dynamically.
Thanks Jeff