I have this Groovy code, which is pretty simple:
sql.eachRow("""SELECT
LOOP_ID,
FLD_1,
... 20 more fields
FLD_20
FROM MY_TABLE ORDER BY LOOP_ID"""){ res->
if(oldLoopId != res.loop_id){
oldLoopId = res.loop_id
fileToWrite = new File("MYNAME_${type}_${res.loop_id}_${today.format('YYYYmmDDhhMM')}.txt")
fileToWrite.append("20 fields header\n")
}
fileToWrite.append("${res.FLD_1}|${res.FLD_2}| ... |${res.FLD_20}\n");
}
}
It selects rows from a table and writes them to files; for each new loop_id it creates a new file. The problem is that it takes about 15 minutes to write a 50 MB file.
How do I make it faster?
Try writing to a BufferedWriter instead of using append directly:
sql.eachRow("""SELECT
LOOP_ID,
FLD_1,
... 20 more fields
FLD_20
FROM MY_TABLE ORDER BY LOOP_ID""") { res ->
def writer
if (oldLoopId != res.loop_id) {
oldLoopId = res.loop_id
def fileToWrite = new File("MYNAME_${type}_${res.loop_id}_${today.format('YYYYmmDDhhMM')}.txt")
if (writer != null) { writer.close() }
writer = fileToWrite.newWriter()
writer.append("20 fields header\n")
}
writer.append("${res.FLD_1}|${res.FLD_2}| ... |${res.FLD_20}\n");
File.withWriter closes the writer automatically, but to use it here you'd need more trips to the DB: first get all the loop_id values, then fetch the data for each one.
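A rough sketch of that alternative (untested; it reuses the table, column, and variable names from the question): query the distinct loop ids first, then write one file per id with withWriter so each writer gets closed for you.

def loopIds = sql.rows('SELECT DISTINCT LOOP_ID FROM MY_TABLE ORDER BY LOOP_ID')*.LOOP_ID
loopIds.each { loopId ->
    new File("MYNAME_${type}_${loopId}_${today.format('YYYYmmDDhhMM')}.txt").withWriter { w ->
        w.append("20 fields header\n")
        sql.eachRow('SELECT FLD_1, FLD_2, /* ... */ FLD_20 FROM MY_TABLE WHERE LOOP_ID = ?', [loopId]) { res ->
            w.append("${res.FLD_1}|${res.FLD_2}| ... |${res.FLD_20}\n")
        }
    }
}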
The following script appends 10 MB to a file one character at a time:
f=new File("b.txt")
f.write ""
(10 * 1024 * 1024).times { f.append "b" }
Execution:
$ time groovy Appends.groovy
real 1m9.217s
user 0m45.375s
sys 0m31.902s
And using a BufferedWriter:
w = new File("/tmp/a.txt").newWriter()
(10 * 1024 * 1024).times { w.write "a" }
w.close()
Execution:
$ time groovy Writes.groovy
real 0m1.774s
user 0m1.688s
sys 0m0.872s
Delete multiple entries from DB using Groovy in SoapUI
I am able to execute one SQL statement, but when I do a few it just hangs.
How can I delete multiple rows?
def sql = Sql.newInstance('jdbc:oracle:thin:@jack:1521:test1', 'test', 'test', 'oracle.jdbc.driver.OracleDriver')
log.info("SQL connected")
sql.connection.autoCommit = false
try {
log.info("inside try")
log.info("before")
String Que =
"""delete from table name where user in (select user from user where ID= '123' and type= 262);
delete from table name where user in (select user from user where ID= '1012' and type= 28)
delete from table name where user in (select user from user where ID= '423' and type= 27)
"""
log.info (Que)
def output = sql.execute(Que);
log.info(sql)
log.info(output)
log.info("after")
sql.commit()
println("Successfully committed")
}catch(Exception ex) {
sql.rollback()
log.info("Transaction rollback"+ex)
}
sql.close()
Here is what you are looking for.
I feel the following is a more effective way if you want to delete a bulk number of records:
Create a map of the data that needs to be removed, i.e., id and type as key-value pairs.
Use a closure to execute the query, iterating through the map.
Comments are added inline.
//Closure to execute the query with parameters
def runQuery = { entry ->
def output = sql.execute("delete from table name where user in (select user from user where ID=:id and type=:type)", [id:entry.key, type:entry.value] )
log.info(output)
}
//Added below two statements
//Create the data that you want to remove as a map of id to type
def deleteData = ['123':262, '1012':28, '423':27]
def sql = Sql.newInstance('jdbc:oracle:thin:@jack:1521:test1', 'test', 'test', 'oracle.jdbc.driver.OracleDriver')
log.info("SQL connected")
sql.connection.autoCommit = false
try {
log.info(sql)
log.info("inside try")
log.info("before")
//Added below two statements
//Call the above closure and pass key value pair in each iteration
deleteData.each { runQuery(it) }
log.info("after")
sql.commit()
println("Successfully committed")
}catch(Exception ex) {
sql.rollback()
log.info("Transaction rollback"+ex)
}
sql.close()
If you are just looking for a way to execute multiple queries in a single statement, then you may look here, though I am not sure whether your database supports that.
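If your JDBC driver supports batching, another option is Groovy's Sql.withBatch, which sends the deletes in batches instead of one round trip per statement. A minimal sketch, reusing the deleteData map and the anonymized table/column names from the question:

sql.withBatch(20, "delete from table name where user in (select user from user where ID=:id and type=:type)") { ps ->
    deleteData.each { id, type -> ps.addBatch(id: id, type: type) }
}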
I'm trying to run parallel SQL queries using GPars, but somehow it isn't working as I expected. Since I'm relatively new to Groovy/Java concurrency, I'm not sure how to solve my issue.
I have the following code:
def rows = this.sql.rows(
"SELECT a_id, b_id FROM data_ids LIMIT 10 OFFSET 10"
)
With this code I get a list of ids. Now I want to load the data for each id, and this should happen in parallel to improve performance, because I have a large database.
To get the detail data I use the following code:
GParsPool.withPool() {
result = rows.collectParallel {
// 2. Get the data for each source and save it in an array.
def tmpData = [:]
def row = it
sql.withTransaction {
if (row.a_id != null) {
tmpData.a = sql.firstRow("SELECT * FROM data_a WHERE id = '" + row.a_id + "'")
}
if (row.b_id != null) {
tmpData.b = sql.firstRow("SELECT * FROM data_b WHERE id = '" + row.b_id + "'")
}
}
return tmpData
}
// 3. Return the loaded data.
return result
}
Now I run the code and everything works fine, except that the code isn't executed in parallel. Using JProfiler I can see that I have blocked threads and waiting threads, but 0 runnable threads.
Thanks for any help. If you need more information, I will provide it :)
Daniel
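One possible cause: if this.sql wraps a single connection, every parallel task ends up blocking on that one connection, which would match the blocked/waiting threads seen in JProfiler. A rough sketch in which each task gets its own connection from a pooled javax.sql.DataSource (the dataSource variable below is assumed to exist in your configuration):

import groovy.sql.Sql
import groovyx.gpars.GParsPool

GParsPool.withPool {
    result = rows.collectParallel { row ->
        def tmpData = [:]
        def taskSql = new Sql(dataSource)   // each task borrows its own connection from the pool
        try {
            if (row.a_id != null) {
                tmpData.a = taskSql.firstRow('SELECT * FROM data_a WHERE id = ?', [row.a_id])
            }
            if (row.b_id != null) {
                tmpData.b = taskSql.firstRow('SELECT * FROM data_b WHERE id = ?', [row.b_id])
            }
        } finally {
            taskSql.close()
        }
        tmpData
    }
}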
I have a filter in Grails to capture all controller requests and insert a row into the database with the controllerName, actionName, userId, date, and a guid. This works fine, but I would like to find a way to increase the performance. Right now it takes ~100 milliseconds to do all this, with 70-80 ms of that spent creating a statement. I've tried a domain object insert, groovy Sql, and a raw java connection/statement. Is there any faster way to improve the performance of inserting a single record within a filter? Alternatively, is there a different pattern that can be used for the inserts? Code (using groovy Sql) below:
class StatsFilters {
def grailsApplication
def dataSource
def filters =
{
logStats(controller:'*', action:'*')
{
before = {
if(controllerName == null || actionName == null)
{
return true
}
def logValue = grailsApplication.config.statsLogging
if(logValue.equalsIgnoreCase("on") && session?.user?.uid != null && session?.user?.uid != "")
{
try{
def start = System.currentTimeMillis()
Sql sql = new Sql(dataSource)
def userId = session.user.uid
final String uuid = "I" + UUID.randomUUID().toString().replaceAll("-","");
String insert = "insert into STATS(ID, CONTROLLER, ACTION, MODIFIED_DATE, USER_ID) values ('${uuid}','${controllerName}','${actionName}',SYSDATE,'${userId}')"
sql.execute(insert)
sql.close()
def end = System.currentTimeMillis()
def total = end - start
println("total " + total)
}
catch(e)
{
log.error("Stats failed to save with exception " + e.getStackTrace())
return true
}
}
return true
}
}
}
}
And my current data source
dataSource {
pooled = true
dialect="org.hibernate.dialect.OracleDialect"
properties {
maxActive = 50
maxIdle = 10
initialSize = 10
minEvictableIdleTimeMillis = 1800000
timeBetweenEvictionRunsMillis = 1800000
maxWait = 10000
validationQuery = "select * from resource_check"
testWhileIdle = true
numTestsPerEvictionRun = 3
testOnBorrow = true
testOnReturn = true
}
//loggingSql = true
}
----------------------Solution-------------------------
The solution was to simply spawn a thread and do the stats save. This way user response time isn't impacted, but the save is done in near real time. The number of users in this application (corporate internal, limited user group) doesn't merit anything more robust.
void saveStatData(def controllerName, def actionName, def userId)
{
Thread.start{
Sql sql = new Sql(dataSource)
final String uuid = "I" + UUID.randomUUID().toString().replaceAll("-","");
String insert = "insert into STATS(ID, CONTROLLER, ACTION, MODIFIED_DATE, USER_ID) values ('${uuid}','${controllerName}','${actionName}',SYSDATE,'${userId}')"
sql.execute(insert)
sql.close()
}
}
A better pattern is not to insert the row in the filter; instead, just add a record to some list and flush the list into the database regularly with an asynchronous job (using the Quartz plugin, for example).
You could lose some data if the application crashes, but if you schedule the job to run often (like every x minutes), it should not be an issue.
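A rough sketch of that buffer-and-flush idea (the class and method names here are made up for illustration; the STATS insert is the one from the question):

import groovy.sql.Sql
import java.util.concurrent.ConcurrentLinkedQueue

class StatsBuffer {
    private static final def QUEUE = new ConcurrentLinkedQueue()

    // Called from the filter; just enqueues and returns immediately
    static void record(String controller, String action, String userId) {
        QUEUE << [id: "I" + UUID.randomUUID().toString().replaceAll("-", ""),
                  controller: controller, action: action, userId: userId]
    }

    // Called by a scheduled job (e.g. a Quartz job firing every minute)
    static void flush(def dataSource) {
        def pending = []
        def entry
        while ((entry = QUEUE.poll()) != null) { pending << entry }
        if (!pending) return
        def sql = new Sql(dataSource)
        try {
            sql.withBatch(100, 'insert into STATS(ID, CONTROLLER, ACTION, MODIFIED_DATE, USER_ID) values (?, ?, ?, SYSDATE, ?)') { ps ->
                pending.each { ps.addBatch([it.id, it.controller, it.action, it.userId]) }
            }
        } finally {
            sql.close()
        }
    }
}

The filter then just calls StatsBuffer.record(controllerName, actionName, userId) and returns, so the user never waits on the insert.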
In SQL we can do something like this:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
Is there any way to do multiple/bulk/batch inserts or updates in Slick?
Can we do something similar, at least using plain SQL queries?
For inserts, as Andrew answered, you use insertAll.
def insertAll(items: Seq[MyCaseClass])(implicit session: Session) = {
  items.size match {
    case s if s > 0 =>
      try {
        // baseQuery is the TableQuery object
        baseQuery.insertAll(items: _*)
      } catch {
        case e: Exception => e.printStackTrace()
      }
      Some(items(0))
    case _ => None
  }
}
For updates, you're SOL. Check out Scala slick 2.0 updateAll equivalent to insertALL? for what I ended up doing. To paraphrase, here's the code:
private def batchUpdateQuery = "update table set value = ? where id = ?"
/**
 * Dropping to JDBC because Slick doesn't support this batched update
 */
def batchUpdate(batch: List[MyCaseClass])(implicit session: Session) = {
val pstmt = session.conn.prepareStatement(batchUpdateQuery)
batch map { myCaseClass =>
pstmt.setString(1, myCaseClass.value)
pstmt.setString(2, myCaseClass.id)
pstmt.addBatch()
}
session.withTransaction {
pstmt.executeBatch()
}
}
In Slick, you are able to use the insertAll method for a Table. An example of insertAll is given in the Getting Started page on Slick's website.
http://slick.typesafe.com/doc/0.11.1/gettingstarted.html
I've tried two methods and both fall flat...
BULK INSERT TEMPUSERIMPORT1357081926
FROM 'C:\uploads\19E0E1.csv'
WITH (FIELDTERMINATOR = ',',ROWTERMINATOR = '\n')
You do not have permission to use the bulk load statement.
but you cannot enable that SQL role with Amazon RDS?
So I tried using OPENROWSET, but it requires Ad Hoc Distributed Queries to be enabled, which I don't have permission to do!
I know this question is really old, but it was the first result that came up when I searched for bulk inserting into an AWS SQL Server RDS instance. Things have changed: you can now do it after integrating the RDS instance with S3. I answered this in more detail on another question, but the overall gist is that you set up the instance with the proper role, put your file on S3, and then you can copy the file over to RDS with the following commands:
exec msdb.dbo.rds_download_from_s3
@s3_arn_of_file='arn:aws:s3:::bucket_name/bulk_data.csv',
@rds_file_path='D:\S3\seed_data\data.csv',
@overwrite_file=1;
Then BULK INSERT will work:
BULK INSERT YourTableName  -- target table name
FROM 'D:\S3\seed_data\data.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
AWS doc
You can enable ad hoc distributed queries by heading to your Amazon Management Console, navigating to the RDS menu, creating a DB Parameter Group with ad hoc distributed queries set to 1, and then attaching this parameter group to your DB instance.
Don't forget to reboot your DB once you have made these changes.
Here is the source of my information:
http://blogs.lessthandot.com/index.php/datamgmt/dbadmin/turning-on-optimize-for-ad/
Hope this helps you.
2022
I'm adding this for anyone like me who wants to quickly insert data into RDS from C#.
While RDS allows CSV bulk uploads directly from S3, there are times when you just want to upload data straight from your program.
I've written a C# utility method which uses a StringBuilder to concatenate statements into batches of 2000 inserts per call, which is way faster than an ORM like Dapper doing one insert per call.
This method should handle date, int, double, and varchar fields, but I haven't had to use it for character escaping or anything like that.
//call as
FastInsert.Insert(MyDbConnection, new object[] { new { someField = "someValue" } }, "my_table");
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Reflection;
using System.Text;
using Dapper; // assumed: connection.Execute below is Dapper's extension method

class FastInsert
{
static int rowSize = 2000;
internal static void Insert(IDbConnection connection, object[] data, string targetTable)
{
var props = data[0].GetType().GetProperties();
var names = props.Select(x => x.Name).ToList();
foreach(var batch in data.Batch(rowSize))
{
var sb = new StringBuilder($"insert into {targetTable} ({string.Join(",", names)})");
string lastLine = "";
foreach(var row in batch)
{
sb.Append(lastLine);
var values = props.Select(prop => CreateSQLString(row, prop));
lastLine = $"select '{string.Join("','", values)}' union all ";
}
lastLine = lastLine.Substring(0, lastLine.Length - " union all".Length) + " from dual";
sb.Append(lastLine);
var fullQuery = sb.ToString();
connection.Execute(fullQuery);
}
}
private static string CreateSQLString(object row, PropertyInfo prop)
{
var value = prop.GetValue(row);
if (value == null) return "null";
if (prop.PropertyType == typeof(DateTime))
{
return $"'{((DateTime)value).ToString("yyyy-MM-dd HH:mm:ss")}'";
}
//if (prop.PropertyType == typeof(string))
//{
return $"'{value.ToString().Replace("'", "''")}'";
//}
}
}
static class Extensions
{
public static IEnumerable<T[]> Batch<T>(this IEnumerable<T> source, int size) //split an IEnumerable into batches
{
T[] bucket = null;
var count = 0;
foreach (var item in source)
{
if (bucket == null)
bucket = new T[size];
bucket[count++] = item;
if (count != size)
continue;
yield return bucket;
bucket = null;
count = 0;
}
// Return the last bucket with all remaining elements
if (bucket != null && count > 0)
{
Array.Resize(ref bucket, count);
yield return bucket;
}
}
}
}