Groovy parallel SQL queries

I'm trying to run parallel SQL queries using GPars, but somehow it isn't working as I expected. Since I'm relatively new to Groovy/Java concurrency, I'm not sure how to solve my issue.
I have the following code:
def rows = this.sql.rows(
"SELECT a_id, b_id FROM data_ids LIMIT 10 OFFSET 10"
)
With this code I get a list of ids. Now I want to load the data for each of those ids, and this should happen in parallel to improve performance, because I have a large database.
To get the detail data I use the following code:
GParsPool.withPool() {
    result = rows.collectParallel {
        // 2. Get the data for each source and save it in an array.
        def tmpData = [:]
        def row = it
        sql.withTransaction {
            if (row.a_id != null) {
                tmpData.a = sql.firstRow("SELECT * FROM data_a WHERE id = '" + row.a_id + "'")
            }
            if (row.b_id != null) {
                tmpData.b = sql.firstRow("SELECT * FROM data_b WHERE id = '" + row.b_id + "'")
            }
        }
        return tmpData
    }
}
// 3. Return the loaded data.
return result
Now when I run the code, everything works fine except that it isn't executed in parallel. Using JProfiler I can see that I have blocked threads and waiting threads, but zero runnable threads.
Thanks for any help. If you need more information, I will provide it :)
Daniel
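One common cause of this symptom, though it is not confirmed in the thread, is that every worker shares the single sql instance and therefore a single JDBC connection, so the queries end up serialized. Below is a minimal sketch, assuming a javax.sql.DataSource called dataSource is available, of giving each worker task its own Sql; otherwise the logic mirrors the question's code (with parameterized queries).

import groovy.sql.Sql
import groovyx.gpars.GParsPool

def result
GParsPool.withPool() {
    result = rows.collectParallel { row ->
        // Sketch only: dataSource is an assumed javax.sql.DataSource.
        def workerSql = new Sql(dataSource)   // one connection per worker task
        def tmpData = [:]
        try {
            if (row.a_id != null) {
                tmpData.a = workerSql.firstRow("SELECT * FROM data_a WHERE id = ?", [row.a_id])
            }
            if (row.b_id != null) {
                tmpData.b = workerSql.firstRow("SELECT * FROM data_b WHERE id = ?", [row.b_id])
            }
        } finally {
            workerSql.close()
        }
        tmpData
    }
}
return result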

Related

Entity Framework, update multiple fields more efficiently

Using Entity Framework, I am updating about 300 rows, and 9 columns about every 30 seconds. Below is how I am currently doing it. My question is, how can I make the code more efficient?
Every once in a while I can feel the impact on the database, and I just want to make it as efficient as possible.
// FOREACH OF MY 300 ROWS
var original = db.MarketDatas.FirstOrDefault(x => x.BBSymbol == targetBBsymbol);
if (original != null)
{
//if (original.BBSymbol.ToUpper() == "NOH7 INDEX")
//{
// var x1 = 1;
//}
original.last_price = marketDataItem.last_price;
original.bid = marketDataItem.bid;
original.ask = marketDataItem.ask;
if (marketDataItem.px_settle_last_dt_rt != null)
{
original.px_settle_last_dt_rt = marketDataItem.px_settle_last_dt_rt;
}
if (marketDataItem.px_settle_actual_rt != 0)
{
original.px_settle_actual_rt = marketDataItem.px_settle_actual_rt;
}
original.chg_on_day = marketDataItem.chg_on_day;
if (marketDataItem.prev_close_value_realtime != 0)
{
original.prev_close_value_realtime = marketDataItem.prev_close_value_realtime;
}
if (marketDataItem.px_settle_last_dt_rt != null)
{
DateTime d2 = (DateTime)marketDataItem.px_settle_last_dt_rt;
if (d1.Day == d2.Day)
{
//market has settled
original.settled = "yes";
}
else
{
//market has NOT settled
original.settled = "no";
}
}
if (marketDataItem.updateTime.Year != 1)
{
original.updateTime = marketDataItem.updateTime;
}
db.SaveChanges();
}
Watching what is being hit in the debugger...
SELECT TOP (1)
[Extent1].[MarketDataID] AS [MarketDataID],
[Extent1].[BBSymbol] AS [BBSymbol],
[Extent1].[Name] AS [Name],
[Extent1].[fut_Val_Pt] AS [fut_Val_Pt],
[Extent1].[crncy] AS [crncy],
[Extent1].[fut_tick_size] AS [fut_tick_size],
[Extent1].[fut_tick_val] AS [fut_tick_val],
[Extent1].[fut_init_spec_ml] AS [fut_init_spec_ml],
[Extent1].[last_price] AS [last_price],
[Extent1].[bid] AS [bid],
[Extent1].[ask] AS [ask],
[Extent1].[px_settle_last_dt_rt] AS [px_settle_last_dt_rt],
[Extent1].[px_settle_actual_rt] AS [px_settle_actual_rt],
[Extent1].[settled] AS [settled],
[Extent1].[chg_on_day] AS [chg_on_day],
[Extent1].[prev_close_value_realtime] AS [prev_close_value_realtime],
[Extent1].[last_tradeable_dt] AS [last_tradeable_dt],
[Extent1].[fut_notice_first] AS [fut_notice_first],
[Extent1].[updateTime] AS [updateTime]
FROM [dbo].[MarketDatas] AS [Extent1]
WHERE ([Extent1].[BBSymbol] = #p__linq__0) OR (([Extent1].[BBSymbol] IS NULL) AND (#p__linq__0 IS NULL))
It seems it updates the same thing multiple times, if I am understanding it correctly.
UPDATE [dbo].[MarketDatas]
SET [last_price] = #0, [chg_on_day] = #1, [updateTime] = #2
WHERE ([MarketDataID] = #3)
UPDATE [dbo].[MarketDatas]
SET [last_price] = #0, [chg_on_day] = #1, [updateTime] = #2
WHERE ([MarketDataID] = #3)
You can reduce this to 2 round trips.
Don't call SaveChanges() inside the loop. Move it outside and call it after you are done processing everything.
Write the select in such a way that it retrieves all the originals in one go and pushes them to a memory collection, then retrieve from that for each item you are updating/inserting.
Code:
// use this as your source
// to retrieve an item later use TryGetValue
var originals = db.MarketDatas
    .Where(x => arrayOftargetBBsymbol.Contains(x.BBSymbol))
    .ToDictionary(x => x.BBSymbol, y => y);
// iterate over changes you want to make
foreach(var change in changes){
MarketData original = null;
// is there an existing entity
if(originals.TryGetValue(change.targetBBsymbol, out original)){
// update your original
}
}
// save changes all at once
db.SaveChanges();
You could execute db.SaveChanges() only after your foreach loop. I think that would do exactly what you are asking for.
It seems it updates the same thing multiple times, if I am
understanding it correctly.
Entity Framework performs a database round-trip for every entity to update.
Just check the parameter values; they will be different for each statement.
how can I make the code more efficient
The major problem is your current solution is not scalable.
It works well when you only have a few entities to update, but it will get worse and worse as the number of items to update in a batch increases.
It's often better to make this kind of logic all in the database, but perhaps you cannot do it.
Disclaimer: I'm the owner of the project Entity Framework Extensions
This library can make your code more efficient by allowing you to save multiple entities at once. All bulk operations are supported:
BulkSaveChanges
BulkInsert
BulkUpdate
BulkDelete
BulkMerge
BulkSynchronize
Example:
// Easy to use
context.BulkSaveChanges();
// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);
// Perform Bulk Operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);
// Customize Primary Key
context.BulkMerge(customers, operation => {
operation.ColumnPrimaryKeyExpression =
customer => customer.Code;
});

Sqlite3 - node js insert multiple rows in two tables, lastID not working

I know there are many solutions provided regarding multiple insertion in sqlite3, but I am looking for an efficient method in which data is inserted into two tables and the data of the second table depends on the first table. This is a Node.js application.
I have two tables, programs and tests, in sqlite3. The tests table contains the id of a program, i.e. one program can contain multiple tests.
The suggested method on official page of sqlite3 module is as follows:
var sqlite3 = require('sqlite3').verbose();
var db = new sqlite3.Database(':memory:');
db.serialize(function() {
db.run("CREATE TABLE lorem (info TEXT)");
var stmt = db.prepare("INSERT INTO lorem VALUES (?)");
for (var i = 0; i < 10; i++) {
stmt.run("Ipsum " + i);
}
stmt.finalize();
db.each("SELECT rowid AS id, info FROM lorem", function(err, row) {
console.log(row.id + ": " + row.info);
});
});
db.close();
Since in my case I have to insert data into two tables, I am using the following code:
var programData = resp.program_info; // contains complete data of programs and tests
db1.run("INSERT INTO programs (`parent_prog_id`, `prog_name`, `prog_exercises`, `prog_orgid`, `prog_createdby`, `prog_created`, `prog_modified`, `prog_status`) VALUES (?,?,?,?,?,?,?,?)",prog.parent_program_id,prog.programName, JSON.stringify(prog), req.session.org_id, prog.created_by, prog.created_at, prog.updated_at, prog.program_status,function(err){
if(err){
throw err;
}else{
var count = 1;
var step2PostedData = prog;
for (i in step2PostedData.testsInfo) {
var c = 0;
for(j in step2PostedData.testsInfo[i]){
var obj = Object.keys(step2PostedData.testsInfo[i])[c];
db1.prepare("INSERT INTO `tests` ( `parent_name`,`test_name`, `test_alias`, `sequences`, `duration`, `prog_id`, `org_id`, `test_createdby`, `test_created`, `test_modified`, `test_status`) VALUES(?,?,?,?,?,?,?,?,?,?,?)")
.run(count, obj,
step2PostedData.testsInfo[i][j].alias,
step2PostedData.testsInfo[i][j].sequences,
step2PostedData.testsInfo[i][j].duration,
this.lastID, // using this I am getting program id
req.session.org_id,
prog.created_by,
prog.created_at,
prog.updated_at,
prog.program_status);
c++;
count++;
}
}
Now, my question is: if I use the suggested method, I do not get the last inserted program id from the programs table, because there is no callback.
E.g. if I use the following code:
var stmt = db1.prepare("INSERT INTO programs (`parent_prog_id`, `prog_name`, `prog_exercises`, `prog_orgid`, `prog_createdby`, `prog_created`, `prog_modified`, `prog_status`) VALUES (?,?,?,?,?,?,?,?)");
var stmt2 = db1.prepare("INSERT INTO `tests` ( `parent_name`,`test_name`, `test_alias`, `sequences`, `duration`, `prog_id`, `org_id`, `test_createdby`, `test_created`, `test_modified`, `test_status`) VALUES(?,?,?,?,?,?,?,?,?,?,?)")
programData.forEach(function(prog){
// Inserts data in programs table
stmt.run(prog.parent_program_id,prog.programName, JSON.stringify(prog), req.session.org_id, prog.created_by, prog.created_at, prog.updated_at, prog.program_status);
for (i in step2PostedData.testsInfo) {
var c = 0;
for(j in step2PostedData.testsInfo[i]){
var obj = Object.keys(step2PostedData.testsInfo[i])[c];
stmt2.run(
count,
obj,
step2PostedData.testsInfo[i][j].alias,
step2PostedData.testsInfo[i][j].sequences,
step2PostedData.testsInfo[i][j].duration,
'what should be there',// How to get last program inserted ID there
req.session.org_id,
prog.created_by,
prog.created_at,
prog.updated_at,
prog.program_status);
} // inner for loop ends
} // outer for loop ends
stmt.finalize();
stmt2.finalize();
});
If I use this.lastID, it returns null, obviously because there is no callback now.
If I use sqlite3_last_insert_rowid(), I get a "sqlite3_last_insert_rowid is not defined" error.
If I use last_insert_rowid(), I get "last_insert_rowid() is not defined".
Query:
How can I insert the last inserted program id there? Currently I am getting the last program id as null.
Edit:
If using a callback is the only way to get the last ID, then I will keep my code as it currently is. In that case, can anyone please suggest how I can increase the speed of the insertions?
Thank you!
sqlite3_last_insert_rowid() is a part of the C API.
last_insert_rowid() is an SQL function.
If the documentation tells you that lastID is valid inside the callback, then you must use the callback.
Just move all the child INSERTs into the completion callback of the parent INSERT.
(The suggested code is not intended to show how to get the last inserted ID; it just demonstrates that the values actually have been inserted.)

Salesforce Apex: Error ORA-01460

I've developed an Apex API on Salesforce which performs a SOQL query on a list of CSV data. It had been working smoothly until yesterday: after making a few changes to the code that follows the SOQL query, I started getting a strange 500 error:
[{"errorCode":"APEX_ERROR","message":"System.UnexpectedException:
common.exception.SfdcSqlException: ORA-01460: unimplemented or
unreasonable conversion requested\n\n\nselect /SampledPrequery/
sum(term0) \"cnt0\",\nsum(term1) \"cnt1\",\ncount(*)
\"totalcount\",\nsum(term0 * term1) \"combined\"\nfrom (select /*+
ordered use_nl(t_c1) /\n(case when (t_c1.deleted = '0') then 1 else 0
end) term0,\n(case when (upper(t_c1.val18) = ?) then 1 else 0 end)
term1\nfrom (select /+ index(sampleTab AKENTITY_SAMPLE)
*/\nentity_id\nfrom core.entity_sample sampleTab\nwhere organization_id = '00Dq0000000AMfz'\nand key_prefix = ?\nand rownum <=
?) sampleTab,\ncore.custom_entity_data t_c1\nwhere
t_c1.organization_id = '00Dq0000000AMfz'\nand t_c1.key_prefix = ?\nand
sampleTab.entity_id =
t_c1.custom_entity_data_id)\n\nClass.labFlows.queryContacts: line 13,
column 1\nClass.labFlows.fhaQuery: line 6, column
1\nClass.zAPI.doPost: line 10, column 1"}]
zAPI.doPost() is simply our router, which takes in the POST payload as well as the requested operation. It then calls whatever function the operation requests. In this case, the call is to labFlows.queryContacts():
Public static Map<string,List<string>> queryContacts(string[] stringArray){
//First get the id to get to the associative entity, Contact_Deals__c id
List<Contact_Deals__c> dealQuery = [SELECT id, Deal__r.id, Deal__r.FHA_Number__c, Deal__r.Name, Deal__r.Owner.Name
FROM Contact_Deals__c
Where Deal__r.FHA_Number__c in :stringArray];
//Using the id in the associative entity, grab the contact information
List<Contact_Deals__c> contactQuery = [Select Contact__r.Name, Contact__r.Id, Contact__r.Owner.Name, Contact__r.Owner.Id, Contact__r.Rule_Class__c, Contact__r.Primary_Borrower_Y_N__c
FROM contact_deals__c
WHERE Id in :dealQuery];
//Grab all deal id's
Map<string,List<string>> result = new Map<string,List<string>>();
for(Contact_Deals__c i:dealQuery){
List<string> temp = new list<string>();
temp.add(i.Deal__r.Id);
temp.add(i.Deal__r.Owner.Name);
temp.add(i.Deal__r.FHA_Number__c);
temp.add(i.Deal__r.Name);
for(Contact_Deals__c j:contactQuery){
if(j.id == i.id){
//This doesn't really help if there are multiple primary borrowers on a deal - but that should be a SF worflow rule IMO
if(j.Contact__r.Primary_Borrower_Y_N__c == 'Yes'){
temp.add(j.Contact__r.Owner.Id);
temp.add(j.Contact__r.Id);
temp.add(j.Contact__r.Name);
temp.add(j.Contact__r.Owner.Name);
temp.add(j.Contact__r.Rule_Class__c);
break;
}
}
}
result.put(i.Deal__r.id, temp);
}
return result;
}
The only thing I've changed is moving the temp list to add elements before the inner-loop (previously temp would only capture things from the inner-loop). The error above is referencing line 13, which is specifically the first SOQL call:
List<Contact_Deals__c> dealQuery = [SELECT id, Deal__r.id, Deal__r.FHA_Number__c, Deal__r.Name, Deal__r.Owner.Name
FROM Contact_Deals__c
Where Deal__r.FHA_Number__c in :stringArray];
I've tested this function in the Apex anonymous window and it worked perfectly:
string a = '00035398,00035401';
string result = zAPI.doPost(a, 'fhaQuery');
system.debug(result);
Results:
13:36:54:947 USER_DEBUG
[5]|DEBUG|{"a09d000000HRvBAD":["a09d000000HRvBAD","Contacta","11111111","Plaza
Center
Apts"],"a09d000000HsVAD":["a09d000000HsVAD","Contactb","22222222","The
Garden"]}
So this is working. The next place to look is maybe my Python script that is calling the API:
def origQuery(file_name, operation):
    csv_text = ""
    with open(file_name) as csvfile:
        reader = csv.reader(csvfile, dialect='excel')
        for row in reader:
            csv_text += row[0]+','
            csv_text = csv_text[:-1]
    data = json.dumps({
        'data' : csv_text,
        'operation' : operation
    })
    results = requests.post(url, headers=headers, data=data)
    print results.text

origQuery('myfile.csv', 'fhaQuery')
I've tried looking up this ORA-01460 apex error, but I can't find anything that will help me fix this issue.
Can anyone shed more light on what this error is telling me?
Thank you all so much!
It turns out the error was in the Python script. The following code wasn't functioning as it was supposed to:
with open(file_name) as csvfile:
    reader = csv.reader(csvfile, dialect='excel')
    for row in reader:
        csv_text += row[0]+','
        csv_text = csv_text[:-1]
This was returning one very long string with zero delimiters, because the final line, sitting inside the loop, was cutting off the delimiter after every row. What I needed instead was:
with open(file_name) as csvfile:
    reader = csv.reader(csvfile, dialect='excel')
    for row in reader:
        csv_text += row[0]+','
csv_text = csv_text[:-1]
This cuts off only the final ','.
The error was occurring because the single long string was above 4,000 characters (Oracle's limit for a VARCHAR2 bind value), which is what triggered the ORA-01460.

Make selecting and writing into file faster in Groovy

I have this groovy code which is pretty simple:
sql.eachRow("""SELECT
LOOP_ID,
FLD_1,
... 20 more fields
FLD_20
FROM MY_TABLE ORDER BY LOOP_ID"""){ res->
if(oldLoopId != res.loop_id){
oldLoopId = res.loop_id
fileToWrite = new File("MYNAME_${type}_${res.loop_id}_${today.format('YYYYmmDDhhMM')}.txt")
fileToWrite.append("20 fields header\n")
}
fileToWrite.append("${res.FLD_1}|${res.FLD_2}| ... |${res.FLD_20}\n");
}
}
It selects rows from a table and writes them out to files; for each new loop_id it creates a new file. The problem is that it takes about 15 minutes to write a 50 MB file.
How do I make it faster?
Try writing to a BufferedWriter instead of using append directly:
sql.eachRow("""SELECT
LOOP_ID,
FLD_1,
... 20 more fields
FLD_20
FROM MY_TABLE ORDER BY LOOP_ID""") { res ->
def writer
if (oldLoopId != res.loop_id) {
oldLoopId = res.loop_id
def fileToWrite = new File("MYNAME_${type}_${res.loop_id}_${today.format('YYYYmmDDhhMM')}.txt")
if (writer != null) { writer.close() }
writer = fileToWrite.newWriter()
writer.append("20 fields header\n")
}
writer.append("${res.FLD_1}|${res.FLD_2}| ... |${res.FLD_20}\n");
File::withWriter automatically closes the resource, but to use it here you'd need to make many more trips to the DB, first getting all the loop_ids and then fetching the data for each one.
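For illustration, a rough sketch of that withWriter-per-loop_id approach; the distinct-id query and the shortened column list are assumptions, not taken from the thread:

// Assumed shape: one query for the distinct loop ids, then one query per id.
sql.rows("SELECT DISTINCT LOOP_ID FROM MY_TABLE ORDER BY LOOP_ID").each { idRow ->
    def loopId = idRow.LOOP_ID
    def file = new File("MYNAME_${type}_${loopId}_${today.format('YYYYmmDDhhMM')}.txt")
    file.withWriter { w ->                    // writer is closed automatically
        w.writeLine("20 fields header")
        sql.eachRow("SELECT FLD_1, FLD_2, FLD_20 FROM MY_TABLE WHERE LOOP_ID = ?", [loopId]) { res ->
            w.writeLine("${res.FLD_1}|${res.FLD_2}|${res.FLD_20}")
        }
    }
}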
For comparison, the following script appends with File.append, which opens and closes the file on every call:
f=new File("b.txt")
f.write ""
(10 * 1024 * 1024).times { f.append "b" }
Execution:
$ time groovy Appends.groovy
real 1m9.217s
user 0m45.375s
sys 0m31.902s
And using a BufferedWriter:
w = new File("/tmp/a.txt").newWriter()
(10 * 1024 * 1024).times { w.write "a" }
Execution:
$ time groovy Writes.groovy
real 0m1.774s
user 0m1.688s
sys 0m0.872s

Grails filter stats insert performance

I have a filter in Grails to capture all controller requests and insert a row into the database with the controllerName, actionName, userId, date, and a GUID. This works fine, but I would like to find a way to increase the performance. Right now it takes ~100 milliseconds to do all this, with 70-80 ms of that spent creating a statement. I've tried a domain object insert, Groovy Sql, and a raw Java connection/statement. Is there any faster way to insert a single record within a filter? Alternatively, is there a different pattern that can be used for the inserts? Code (using Groovy Sql) below:
class StatsFilters {
def grailsApplication
def dataSource
def filters =
{
logStats(controller:'*', action:'*')
{
before = {
if(controllerName == null || actionName == null)
{
return true
}
def logValue = grailsApplication.config.statsLogging
if(logValue.equalsIgnoreCase("on") && session?.user?.uid != null && session?.user?.uid != "")
{
try{
def start = System.currentTimeMillis()
Sql sql = new Sql(dataSource)
def userId = session.user.uid
final String uuid = "I" + UUID.randomUUID().toString().replaceAll("-","");
String insert = "insert into STATS(ID, CONTROLLER, ACTION, MODIFIED_DATE, USER_ID) values ('${uuid}','${controllerName}','${actionName}',SYSDATE,'${userId}')"
sql.execute(insert)
sql.close()
def end = System.currentTimeMillis()
def total = end - start
println("total " + total)
}
catch(e)
{
log.error("Stats failed to save with exception " + e.getStackTrace())
return true
}
}
return true
}
}
}
}
And my current data source configuration:
dataSource {
pooled = true
dialect="org.hibernate.dialect.OracleDialect"
properties {
maxActive = 50
maxIdle = 10
initialSize = 10
minEvictableIdleTimeMillis = 1800000
timeBetweenEvictionRunsMillis = 1800000
maxWait = 10000
validationQuery = "select * from resource_check"
testWhileIdle = true
numTestsPerEvictionRun = 3
testOnBorrow = true
testOnReturn = true
}
//loggingSql = true
}
----------------------Solution-------------------------
The solution was to simply spawn a thread and do the stats save. This way user response time isn't impacted, but the save is done in near real time. The number of users in this application (corporate internal, limited user group) doesn't merit anything more robust.
void saveStatData(def controllerName, def actionName, def userId)
{
Thread.start{
Sql sql = new Sql(dataSource)
final String uuid = "I" + UUID.randomUUID().toString().replaceAll("-","");
String insert = "insert into STATS(ID, CONTROLLER, ACTION, MODIFIED_DATE, USER_ID) values ('${uuid}','${controllerName}','${actionName}',SYSDATE,'${userId}')"
sql.execute(insert)
sql.close()
}
}
A better pattern is not to insert the row in the filter at all; instead, just add a record to a list and flush that list into the database regularly with an asynchronous job (using the Quartz plugin, for example), as in the sketch below.
You could lose some data if the application crashes, but if you schedule the job to run often (say, every x minutes), that should not be an issue.
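A minimal sketch of that queue-and-flush idea; the service and property names are assumptions, not from the thread. The filter only enqueues a small map, and a scheduled Quartz job drains the queue and writes one batch using groovy.sql.Sql's withBatch:

import groovy.sql.Sql
import java.util.concurrent.ConcurrentLinkedQueue

class StatsQueueService {
    def dataSource   // injected by Grails

    private final ConcurrentLinkedQueue<Map> queue = new ConcurrentLinkedQueue<Map>()

    // Called from the filter instead of inserting directly.
    void enqueue(String controller, String action, String userId) {
        queue.add([controller: controller, action: action, userId: userId])
    }

    // Called by a scheduled Quartz job, e.g. every minute.
    void flush() {
        def batch = []
        def item = queue.poll()
        while (item != null) {
            batch << item
            item = queue.poll()
        }
        if (!batch) return

        def sql = new Sql(dataSource)
        try {
            sql.withBatch(100,
                    "insert into STATS(ID, CONTROLLER, ACTION, MODIFIED_DATE, USER_ID) values (?, ?, ?, SYSDATE, ?)") { ps ->
                batch.each { row ->
                    def uuid = "I" + UUID.randomUUID().toString().replaceAll("-", "")
                    ps.addBatch([uuid, row.controller, row.action, row.userId])
                }
            }
        } finally {
            sql.close()
        }
    }
}

The filter then just calls statsQueueService.enqueue(controllerName, actionName, session.user.uid), so the request thread never touches the database.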