I'm trying to update a series of BLOB values using the following code. Note how we're not using Blob objects, but just calling setBytes.
try (PreparedStatement p = c.prepareStatement("UPDATE table SET data = ? WHERE id = ?")) {
    for (Map.Entry<String, byte[]> d : updates.entrySet()) {
        p.setBytes(1, d.getValue());
        p.setString(2, d.getKey());
        p.addBatch();
    }
    p.executeBatch();
}
However, this is running really slowly (updating about 100 rows a second).
Is this the 'correct' way of doing these updates? Should we be using Blob objects? If so, how would we use them for these updates and for inserting new data?
Is there any reason this might be running slowly?
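For reference, the Blob-based variant we'd be comparing against would look roughly like this (just a sketch; I don't know whether createBlob plus setBlob is actually faster than setBytes, and I suspect committing the whole batch in one transaction matters more):
// Rough sketch of the java.sql.Blob variant (JDBC 4 createBlob).
// Auto-commit is turned off so the whole batch is a single transaction.
c.setAutoCommit(false);
try (PreparedStatement p = c.prepareStatement("UPDATE table SET data = ? WHERE id = ?")) {
    for (Map.Entry<String, byte[]> d : updates.entrySet()) {
        java.sql.Blob blob = c.createBlob();
        blob.setBytes(1, d.getValue());   // Blob positions are 1-based
        p.setBlob(1, blob);
        p.setString(2, d.getKey());
        p.addBatch();
    }
    p.executeBatch();
    c.commit();
}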
I am trying to create a dynamic pivot in PostgreSQL. My query includes three dimensions, one of which uses a CASE statement, and one measure. I have tried various solutions found on the web, but none of them works.
I am looking for a script which converts a normal query result into a pivoted table. Please help me with this.
You have 3 options, basically:
write your query in MDX, which can easily return a pivoted table; this requires a Mondrian schema first;
use a Kettle datasource and denormalise your data with PDI;
write a denormalisation function in the postFetch method of the table component: it gets the data coming from the query and can manipulate it before passing it to the table renderer.
Code snippet to guide you through the process of denormalising in the postFetch of the component:
function(data){
    var resultset = data.resultset;
    var metadata = data.metadata;
    // doStuff - this is the bit you'll have to code according to your needs.
    // The resultset array has 1 element for each row of data; the metadata
    // array holds metadata info such as colName, colType and output index.
    var newResultset = someFunction(resultset);
    var newMetadata = someOtherFunction(metadata);
    // set data to the new values
    data.resultset = newResultset;
    data.metadata = newMetadata;
    // return the modified data object to the component
    return data;
}
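As a purely illustrative example of the "doStuff" part, assuming each resultset row comes in as [rowDimension, columnDimension, measure], someFunction could pivot the rows like this:
function someFunction(resultset) {
    // Hypothetical pivot: one output row per distinct rowDimension,
    // one output column per distinct columnDimension.
    var byRow = {}, seenCol = {}, colList = [];
    for (var i = 0; i < resultset.length; i++) {
        var r = resultset[i][0], c = resultset[i][1], v = resultset[i][2];
        if (!byRow[r]) { byRow[r] = {}; }
        byRow[r][c] = v;
        if (!seenCol[c]) { seenCol[c] = true; colList.push(c); }
    }
    var out = [];
    for (var key in byRow) {
        var line = [key];
        for (var j = 0; j < colList.length; j++) {
            line.push(byRow[key][colList[j]] !== undefined ? byRow[key][colList[j]] : null);
        }
        out.push(line);
    }
    return out;
}
someOtherFunction would then have to rebuild the metadata array to match: one entry for the row dimension column plus one entry per distinct column dimension value.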
Scenario:
I have created a transformation to load data into a table from a CSV file, and the CSV file has the following columns:
Customer_Id
Company_Id
Employee_Name
But the user may provide an input file with the columns in a different (random) order, such as:
Employee_Name
Company_Id
Customer_Id
So, if I try to load a file whose columns are in a different order, will Kettle load the correct column values based on the column names?
Using ETL Metadata Injection you can use a transformation like this to either normalize the data or to store it in your database:
Then you just need to send the correct data to that transformation. You can read the header line from the CSV, and use Row Normaliser to convert to the format used by ETL Metadata Injection.
I have included a quick example here: csv_inject on Dropbox. If you build something like this and run it from something that runs it once per CSV file, it should work.
Ooh, that's some nasty JavaScript!
The way to do this is with metadata injection. Look at the samples, but basically you need a template which reads the file and writes it back out. You then use another parent transformation to figure out the headings, configure that template, and then execute it.
There are samples in the PDI samples folder, and also take a look at the "figuring out file format" example in Matt Casters' blueprints project on GitHub.
You could try something like this as your JavaScript:
//Script here
var seen;
trans_Status = CONTINUE_TRANSFORMATION;
var col_names = ['Customer_Id','Company_Id','Employee_Name'];
var col_pos;
if (!seen) {
    // First line
    trans_Status = SKIP_TRANSFORMATION;
    seen = 1;
    col_pos = [-1, -1, -1];
    for (var i = 0; i < col_names.length; i++) {
        for (var j = 0; j < row.length; j++) {
            if (row[j] == col_names[i]) {
                col_pos[i] = j;
                break;
            }
        }
        if (col_pos[i] === -1) {
            writeToLog("e", "Cannot find " + col_names[i]);
            trans_Status = ERROR_TRANSFORMATION;
            break;
        }
    }
}
var Customer_Id = row[col_pos[0]];
var Company_Id = row[col_pos[1]];
var Employee_Name = row[col_pos[2]];
Here is the .ktr I tried: csv_reorder.ktr
(Edit: here are the test CSV files.)
1.csv:
Customer_Id,Company_Id,Employee_Name
cust1,comp1,emp1
2.csv:
Employee_Name,Company_Id,Customer_Id
emp2,comp2,cust2
Assuming rejecting the input file is not an option, you basically have four solutions.
Reorder the fields in an external editor (don't use Excel if the file contains dates).
Use code within your transformation to detect the column headers and reorder the file.
Use metadata injection as proposed by bolav.
Create a job. This needs to:
a. load the file into a temporary database
b. use an SQL statement to retrieve the fields in the correct column order (list the columns explicitly in the SELECT; see the sketch below)
c. output the file in the correct order
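For step b, the query can be as simple as an explicit column list, e.g. in a Table input step (the staging table name is a placeholder for whatever step a creates):
SELECT Customer_Id, Company_Id, Employee_Name
FROM tmp_csv_staging;   -- hypothetical temp table loaded in step a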
Given the code:
var q = db.TranslationProjectKeys.Where(c => c.ProjectID == forProject.ID);
foreach (var key in q)
{
    key.Live = false;
}
db.SubmitChanges();
If I run SQL Server Profiler, it shows dozens of UPDATE ... SQL statements. I believe this would be considerably faster if the query were executed as one UPDATE statement:
UPDATE TranslationProjectKeys SET Live = 0 WHERE ProjectID = x
Is there a way to make the query execute in this way?
I'm wary of using db.ExecuteCommand("..."), as I am aware the context caches data, which can cause some bugs.
You can use EntityFramework.Extended, which lets you execute a bulk update.
For example:
var q = db.TranslationProjectKeys.Where(c => c.ProjectID == forProject.ID).Update(c => new TranslationProjectKey { Live = false });
LINQ to SQL does not have a built-in mass update function (unless something has changed). You could just execute the command manually, like:
dataContext.ExecuteCommand("UPDATE TranslationProjectKeys SET Live = 0 WHERE ProjectID = {0}", x);
Or, if you wanted to get fancy, you could try implementing your own mass update functionality as this article shows, but it is MUCH more in-depth and complicated (though it IS possible to do).
I'm new to Apache Spark and Scala (also a beginner with Hadoop in general).
I completed the Spark SQL tutorial: https://spark.apache.org/docs/latest/sql-programming-guide.html
I tried to perform a simple query on a standard csv file to benchmark its performance on my current cluster.
I used data from https://s3.amazonaws.com/hw-sandbox/tutorial1/NYSE-2000-2001.tsv.gz, converted it to csv and copy/pasted the data to make it 10 times as big.
I loaded it into Spark using Scala:
// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// createSchemaRDD is used to implicitly convert an RDD to a SchemaRDD.
import sqlContext.createSchemaRDD
Define classes:
case class datum(exchange: String,stock_symbol: String,date: String,stock_price_open: Double,stock_price_high: Double,stock_price_low: Double,stock_price_close: Double,stock_volume: String,stock_price_adj_close: Double)
Read in data:
val data = sc.textFile("input.csv").map(_.split(";")).filter(line => "exchange" != "exchange").map(p => datum(p(0).trim.toString, p(1).trim.toString, p(2).trim.toString, p(3).trim.toDouble, p(4).trim.toDouble, p(5).trim.toDouble, p(6).trim.toDouble, p(7).trim.toString, p(8).trim.toDouble))
Convert to table:
data.registerAsTable("data")
Define query (list all rows with 'IBM' as stock symbol):
val IBMs = sqlContext.sql("SELECT * FROM data WHERE stock_symbol ='IBM'")
Perform count so query actually runs:
IBMs.count()
The query runs fine, but returns res: 0 instead of 5000 (which is what it returns using Hive with MapReduce).
filter(line => "exchange" != "exchange")
Since "exchange" is always equal to "exchange", that predicate is always false, so the filter returns a collection of size 0. And since there is no data, querying for any result will return 0. You need to rewrite your logic so the predicate looks at the actual line.
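For example, after the split each element is an Array[String], so the header row can be dropped by checking the first field instead of comparing two identical literals. A sketch of the corrected line (only the filter predicate changes; the rest follows your code):
// Compare the first field of each split line with the header value "exchange".
val data = sc.textFile("input.csv").map(_.split(";")).filter(line => line(0) != "exchange").map(p => datum(p(0).trim, p(1).trim, p(2).trim, p(3).trim.toDouble, p(4).trim.toDouble, p(5).trim.toDouble, p(6).trim.toDouble, p(7).trim, p(8).trim.toDouble))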
I have two servers, dbh1 and dbh2. I query dbh1 and pull data via the fetchall_arrayref method. Once I execute the query, I want to insert the output from dbh1 into a temp table on server dbh2.
I am able to establish access to both servers at the same time and am able to pull data from both.
1. I pull data from dbh1:
while($row = shift(@$rowcache) || shift(@{$rowcache=$sth1->fetchall_arrayref(undef, $max_rows)})) {
    # call to sub insert2tempData
    &insert2tempData(values @{$row});
}
2. Then on dbh2 I have an insert query:
INSERT INTO ##population (someid, Type, anotherid)
VALUES ('123123', 'blah', '634234');
Question:
How can I insert the bulk result of the fetchall_arrayref from dbh1 into the temp table on server dbh2 (without looping through individual records)?
OK, so I was able to resolve this issue and implement the following code:
my $max_rows = 38;
my $rowcache = [];
my $sum = 0;
if($fldnames eq "ALL"){ $fldnames = join(',', @{ $sth1->{NAME} });}
my $ins = $dbh2->prepare("insert into $database2.dbo.$tblname2 ($fldnames) values $fldvalues");
my $fetch_tuple_sub = sub { shift(@$rowcache) || shift(@{$rowcache=$sth1->fetchall_arrayref(undef, $max_rows)}) };
my @tuple_status;
my $rc;
$rc = $ins->execute_for_fetch($fetch_tuple_sub, \@tuple_status);
my @errors = grep { ref $_ } @tuple_status;
The transfer works, but it is still slower than if I were to transfer the data manually through the SQL Server export/import wizard. The issue I notice is that the data flows row by row into the destination, and I was wondering if it is possible to increase the bulk transfer size. It downloads the data extremely fast, but when I combine download and upload the speed decreases dramatically, and it takes up to 10 minutes to transfer a 5,000-row table between servers.
It would be better if you said what your goal was (speed?) rather than asking a specific question on avoiding looping.
For a Perl/DBI way:
Look at DBI's execute_array and execute_for_fetch; however, as you've not told us which DBD you are using, it is impossible to say more. Not all DBDs support bulk insert, and when they don't, DBI emulates it. DBD::Oracle does, and DBD::ODBC does (in recent versions; see odbc_array_operations), but in the latter it is off by default.
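If the destination handle is DBD::ODBC, switching the real array operations on is a one-liner (a sketch only; the attribute exists only in recent DBD::ODBC releases, as noted above):
# Assumes $dbh2 is a recent DBD::ODBC handle; without this, DBI emulates
# execute_for_fetch by executing one row at a time.
$dbh2->{odbc_array_operations} = 1;
With that set, the existing $ins->execute_for_fetch($fetch_tuple_sub, \@tuple_status) call should send rows in batches instead of one by one.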
You didn't mention which version of SQL Server you are using. First, I would look into the "BULK INSERT" support of that version.
You also didn't mention how many rows are involved. I'll assume that they fit into memory, otherwise a bulk insert won't work.
From there it's up to you to translate the output of fetchall_arrayref into the syntax needed for the "BULK INSERT" operation.
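For example, if the fetched rows were first written out to a delimited file that the SQL Server instance can read, the load itself would be something like this (path and terminators are placeholders):
BULK INSERT ##population
FROM 'C:\exports\population.csv'   -- placeholder path, must be visible to the server
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');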