I have a filter in grails to capture all controller requests and insert a row into the database with the controllerName, actionName, userId, date, and guid. This works fine, but I would like to find a way to increase the performance. Right now it takes ~100 milliseconds to do all this, with 70-80 ms of that spent creating a statement. I've tried a domain object insert, groovy Sql, and a raw java connection/statement. Is there any faster way to insert a single record within a filter? Alternatively, is there a different pattern that can be used for the inserts? Code (using groovy Sql) below:
class StatsFilters {
def grailsApplication
def dataSource
def filters =
{
logStats(controller:'*', action:'*')
{
before = {
if(controllerName == null || actionName == null)
{
return true
}
def logValue = grailsApplication.config.statsLogging
if(logValue.equalsIgnoreCase("on") && session?.user?.uid != null && session?.user?.uid != "")
{
try{
def start = System.currentTimeMillis()
Sql sql = new Sql(dataSource)
def userId = session.user.uid
final String uuid = "I" + UUID.randomUUID().toString().replaceAll("-","");
String insert = "insert into STATS(ID, CONTROLLER, ACTION, MODIFIED_DATE, USER_ID) values ('${uuid}','${controllerName}','${actionName}',SYSDATE,'${userId}')"
sql.execute(insert)
sql.close()
def end = System.currentTimeMillis()
def total = end - start
println("total " + total)
}
catch(e)
{
log.error("Stats failed to save with exception " + e.getStackTrace())
return true
}
}
return true
}
}
}
}
And my current data source
dataSource {
pooled = true
dialect="org.hibernate.dialect.OracleDialect"
properties {
maxActive = 50
maxIdle = 10
initialSize = 10
minEvictableIdleTimeMillis = 1800000
timeBetweenEvictionRunsMillis = 1800000
maxWait = 10000
validationQuery = "select * from resource_check"
testWhileIdle = true
numTestsPerEvictionRun = 3
testOnBorrow = true
testOnReturn = true
}
//loggingSql = true
}
----------------------Solution-------------------------
The solution was to simply spawn a thread and do the stats save. This way user response time isn't impacted, but the save is done in near real time. The number of users in this application (corporate internal, limited user group) doesn't merit anything more robust.
void saveStatData(def controllerName, def actionName, def userId)
{
Thread.start{
Sql sql = new Sql(dataSource)
final String uuid = "I" + UUID.randomUUID().toString().replaceAll("-","");
String insert = "insert into STATS(ID, CONTROLLER, ACTION, MODIFIED_DATE, USER_ID) values ('${uuid}','${controllerName}','${actionName}',SYSDATE,'${userId}')"
sql.execute(insert)
sql.close()
}
}
A better pattern is not to insert the row in the filter; instead, just add each record to some list and flush the list into the database regularly with an asynchronous job (using the Quartz plugin, for example). A sketch of this pattern follows below.
You could lose some data if the application crashes, but if you schedule the job to run often (say, every x minutes), it should not be an issue.
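A minimal sketch of that queue-and-flush pattern, assuming the Grails Quartz plugin is installed; the StatsService and StatsFlushJob names and the queue wiring are illustrative assumptions, while the STATS insert mirrors the columns from the original code:

import groovy.sql.Sql
import java.util.concurrent.ConcurrentLinkedQueue

// Buffers stat entries in memory; the filter calls record() instead of touching the DB.
class StatsService {
    def dataSource
    private final ConcurrentLinkedQueue<Map> pending = new ConcurrentLinkedQueue<>()

    void record(String controller, String action, String userId) {
        pending.add([controller: controller, action: action, userId: userId])
    }

    // Drains the buffer and writes everything collected so far in one JDBC batch.
    void flush() {
        def batch = []
        def entry = pending.poll()
        while (entry != null) {
            batch << entry
            entry = pending.poll()
        }
        if (!batch) return
        def sql = new Sql(dataSource)
        try {
            sql.withBatch("insert into STATS(ID, CONTROLLER, ACTION, MODIFIED_DATE, USER_ID) " +
                          "values (?, ?, ?, SYSDATE, ?)") { ps ->
                batch.each { e ->
                    def uuid = "I" + UUID.randomUUID().toString().replaceAll("-", "")
                    ps.addBatch([uuid, e.controller, e.action, e.userId])
                }
            }
        } finally {
            sql.close()
        }
    }
}

// Quartz plugin job that flushes the buffer once a minute.
class StatsFlushJob {
    def statsService
    static triggers = { simple repeatInterval: 60000L }
    def execute() { statsService.flush() }
}

The filter's before closure then only calls statsService.record(controllerName, actionName, session.user.uid), which is a cheap in-memory operation.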
Related
Delete multiple entries from DB using Groovy in SoapUI
I am able to execute one SQL statement, but when I do a few it just hangs.
How can I delete multiple rows?
def sql = Sql.newInstance('jdbc:oracle:thin:@jack:1521:test1', 'test', 'test', 'oracle.jdbc.driver.OracleDriver')
log.info("SQL connected")
sql.connection.autoCommit = false
try {
log.info("inside try")
log.info("before")
String Que =
"""delete from table name where user in (select user from user where ID= '123' and type= 262);
delete from table name where user in (select user from user where ID= '1012' and type= 28)
delete from table name where user in (select user from user where ID= '423' and type= 27)
"""
log.info (Que)
def output = sql.execute(Que);
log.info(sql)
log.info(output)
log.info("after")
sql.commit()
println("Successfully committed")
}catch(Exception ex) {
sql.rollback()
log.info("Transaction rollback"+ex)
}
sql.close()
Here is what you are looking for.
I feel this is a more effective way if you want to remove a bulk number of records:
Create a map of the data, i.e., id and type as key-value pairs, that need to be removed in your case.
Use a closure to execute the query, iterating through the map.
Comments are added appropriately.
//Closure to execute the query with parameters
def runQuery = { entry ->
def output = sql.execute("delete from table name where user in (select user from user where ID=:id and type=:type)", [id:entry.key, type:entry.value] )
log.info(output)
}
//Added below two statements
//Create the data that you want to remove in the form of map id, and type
def deleteData = ['123':262, '1012':28, '423':27]
def sql = Sql.newInstance('jdbc:oracle:thin:@jack:1521:test1', 'test', 'test', 'oracle.jdbc.driver.OracleDriver')
log.info("SQL connected")
sql.connection.autoCommit = false
try {
log.info(sql)
log.info("inside try")
log.info("before")
//Added below two statements
//Call the above closure and pass key value pair in each iteration
deleteData.each { runQuery(it) }
log.info("after")
sql.commit()
println("Successfully committed")
}catch(Exception ex) {
sql.rollback()
log.info("Transaction rollback"+ex)
}
sql.close()
If you are just looking for a way to execute multiple queries in a single call, then you may look here; I am not sure whether your database supports the same.
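For reference, groovy.sql.Sql can also send several statements over one connection as a batch with withBatch; a rough sketch, keeping the "table name" placeholder and the ids/types from the question:

sql.withBatch { stmt ->
    // Each addBatch queues one statement; the whole batch is executed when the closure returns.
    stmt.addBatch("delete from table name where user in (select user from user where ID = '123' and type = 262)")
    stmt.addBatch("delete from table name where user in (select user from user where ID = '1012' and type = 28)")
    stmt.addBatch("delete from table name where user in (select user from user where ID = '423' and type = 27)")
}
sql.commit()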
I'm trying to run parallel sql queries using GPars. But somehow it isn't working as I expected. Since I'm relatively new to groovy/java concurrency I'm not sure how to solve my issue.
I have the following code:
def rows = this.sql.rows(
"SELECT a_id, b_id FROM data_ids LIMIT 10 OFFSET 10"
)
With this code I get a list of ids. Now I want to load the data for each id, and this should happen in parallel to improve performance, because I have a large database.
To get the detail data I use the following code:
GParsPool.withPool() {
result = rows.collectParallel {
// 2. Get the data for each source and save it in an array.
def tmpData = [:]
def row = it
sql.withTransaction {
if (row.a_id != null) {
tmpData.a = sql.firstRow("SELECT * FROM data_a WHERE id = '" + row.a_id + "'")
}
if (row.b_id != null) {
tmpData.b = sql.firstRow("SELECT * FROM data_b WHERE id = '" + row.b_id + "'")
}
}
return tmpData
}
// 3. Return the loaded data.
return result
}
Now I run the code and everything works fine, except that the code isn't executed in parallel. Using JProfiler I can see that I have blocked threads and waiting threads, but 0 runnable threads.
Thanks for any help. If you need more information, I will provide it :)
Daniel
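A sketch of one commonly suggested change for this symptom, under the assumption that the shared sql object wraps a single JDBC connection (which would serialize every statement on it): give each parallel task its own Sql instance built from a pooled DataSource (named dataSource here as an assumption).

import groovy.sql.Sql
import groovyx.gpars.GParsPool

def result
GParsPool.withPool(8) {
    result = rows.collectParallel { row ->
        def tmpData = [:]
        // A separate Sql per task, so each borrows its own connection from the pool.
        def threadSql = new Sql(dataSource)
        try {
            if (row.a_id != null) {
                tmpData.a = threadSql.firstRow("SELECT * FROM data_a WHERE id = ?", [row.a_id])
            }
            if (row.b_id != null) {
                tmpData.b = threadSql.firstRow("SELECT * FROM data_b WHERE id = ?", [row.b_id])
            }
        } finally {
            threadSql.close()
        }
        tmpData
    }
}
return result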
I wanted to take advantage of the new BigQuery functionality of time partitioned tables, but am unsure whether this is currently possible in the 1.6 version of the Dataflow SDK.
Looking at the BigQuery JSON API, to create a day partitioned table one needs to pass in a
"timePartitioning": { "type": "DAY" }
option, but the com.google.cloud.dataflow.sdk.io.BigQueryIO interface only allows specifying a TableReference.
I thought that maybe I could pre-create the table, and sneak in a partition decorator via a BigQueryIO.Write.toTableReference lambda..? Is anyone else having success with creating/writing partitioned tables via Dataflow?
This seems like a similar issue to setting the table expiration time which isn't currently available either.
As Pavan says, it is definitely possible to write to partition tables with Dataflow. Are you using the DataflowPipelineRunner operating in streaming mode or batch mode?
The solution you proposed should work. Specifically, if you pre-create a table with date partitioning set up, then you can use a BigQueryIO.Write.toTableReference lambda to write to a date partition. For example:
/**
* A Joda-time formatter that prints a date in format like {@code "20160101"}.
* Threadsafe.
*/
private static final DateTimeFormatter FORMATTER =
DateTimeFormat.forPattern("yyyyMMdd").withZone(DateTimeZone.UTC);
// This code generates a valid BigQuery partition name:
Instant instant = Instant.now(); // any Joda instant in a reasonable time range
String baseTableName = "project:dataset.table"; // a valid BigQuery table name
String partitionName =
String.format("%s$%s", baseTableName, FORMATTER.print(instant));
The approach I took (works in the streaming mode, too):
Define a custom window for the incoming record
Convert the window into the table/partition name
p.apply(PubsubIO.Read
.subscription(subscription)
.withCoder(TableRowJsonCoder.of())
)
.apply(Window.into(new TablePartitionWindowFn()) )
.apply(BigQueryIO.Write
.to(new DayPartitionFunc(dataset, table))
.withSchema(schema)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
);
The window is set based on the incoming data; the end Instant can be ignored, as the start value is used for setting the partition:
public class TablePartitionWindowFn extends NonMergingWindowFn<Object, IntervalWindow> {
private IntervalWindow assignWindow(AssignContext context) {
TableRow source = (TableRow) context.element();
String dttm_str = (String) source.get("DTTM");
DateTimeFormatter formatter = DateTimeFormat.forPattern("yyyy-MM-dd").withZoneUTC();
Instant start_point = Instant.parse(dttm_str,formatter);
Instant end_point = start_point.withDurationAdded(1000, 1);
return new IntervalWindow(start_point, end_point);
};
@Override
public Coder<IntervalWindow> windowCoder() {
return IntervalWindow.getCoder();
}
@Override
public Collection<IntervalWindow> assignWindows(AssignContext c) throws Exception {
return Arrays.asList(assignWindow(c));
}
@Override
public boolean isCompatible(WindowFn<?, ?> other) {
return false;
}
@Override
public IntervalWindow getSideInputWindow(BoundedWindow window) {
if (window instanceof GlobalWindow) {
throw new IllegalArgumentException(
"Attempted to get side input window for GlobalWindow from non-global WindowFn");
}
return null;
}
}
Setting the table partition dynamically:
public class DayPartitionFunc implements SerializableFunction<BoundedWindow, String> {
String destination = "";
public DayPartitionFunc(String dataset, String table) {
this.destination = dataset + "." + table+ "$";
}
@Override
public String apply(BoundedWindow boundedWindow) {
// The cast below is safe because CalendarWindows.days(1) produces IntervalWindows.
String dayString = DateTimeFormat.forPattern("yyyyMMdd")
.withZone(DateTimeZone.UTC)
.print(((IntervalWindow) boundedWindow).start());
return destination + dayString;
}}
Is there a better way of achieving the same outcome?
I believe it should be possible to use the partition decorator when you are not using streaming. We are actively working on supporting partition decorators through streaming. Please let us know if you are seeing any errors today with non-streaming mode.
Apache Beam version 2.0 supports sharding BigQuery output tables out of the box.
I have written data into BigQuery partitioned tables through Dataflow. These writes are dynamic, as in: if the data in that partition already exists, then I can either append to it or overwrite it.
I have written the code in Python. It is a batch mode write operation into bigquery.
from google.cloud import bigquery

client = bigquery.Client(project=projectName)
dataset_ref = client.dataset(datasetName)
table_ref = dataset_ref.table(bqTableName)
job_config = bigquery.LoadJobConfig()
job_config.skip_leading_rows = skipLeadingRows
job_config.source_format = bigquery.SourceFormat.CSV
if tableExists(client, table_ref):
    job_config.autodetect = autoDetect
    previous_rows = client.get_table(table_ref).num_rows
    #assert previous_rows > 0
    if allowJaggedRows is True:
        job_config.allowJaggedRows = True
    if allowFieldAddition is True:
        job_config._properties['load']['schemaUpdateOptions'] = ['ALLOW_FIELD_ADDITION']
    if isPartitioned is True:
        job_config._properties['load']['timePartitioning'] = {"type": "DAY"}
    if schemaList is not None:
        job_config.schema = schemaList
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
else:
    job_config.autodetect = autoDetect
    job_config._properties['createDisposition'] = 'CREATE_IF_NEEDED'
    job_config.schema = schemaList
    if isPartitioned is True:
        job_config._properties['load']['timePartitioning'] = {"type": "DAY"}
    if schemaList is not None:
        table = bigquery.Table(table_ref, schema=schemaList)
load_job = client.load_table_from_uri(gcsFileName, table_ref, job_config=job_config)
assert load_job.job_type == 'load'
load_job.result()
assert load_job.state == 'DONE'
It works fine.
If you pass the table name in table_name_YYYYMMDD format, then BigQuery will treat it as a sharded table, which can simulate partition table features.
Refer to the documentation: https://cloud.google.com/bigquery/docs/partitioned-tables
I have this groovy code which is pretty simple:
sql.eachRow("""SELECT
LOOP_ID,
FLD_1,
... 20 more fields
FLD_20
FROM MY_TABLE ORDER BY LOOP_ID"""){ res->
if(oldLoopId != res.loop_id){
oldLoopId = res.loop_id
fileToWrite = new File("MYNAME_${type}_${res.loop_id}_${today.format('YYYYmmDDhhMM')}.txt")
fileToWrite.append("20 fields header\n")
}
fileToWrite.append("${res.FLD_1}|${res.FLD_2}| ... |${res.FLD_20}\n");
}
}
It selects rows from a table and writes them out to files. For each new loop_id it creates a new file. The problem is that it takes about 15 minutes to write a 50 MB file.
How do I make it faster?
Try writing to a BufferedWriter instead of using append directly:
sql.eachRow("""SELECT
LOOP_ID,
FLD_1,
... 20 more fields
FLD_20
FROM MY_TABLE ORDER BY LOOP_ID""") { res ->
def writer
if (oldLoopId != res.loop_id) {
oldLoopId = res.loop_id
def fileToWrite = new File("MYNAME_${type}_${res.loop_id}_${today.format('YYYYmmDDhhMM')}.txt")
if (writer != null) { writer.close() }
writer = fileToWrite.newWriter()
writer.append("20 fields header\n")
}
writer.append("${res.FLD_1}|${res.FLD_2}| ... |${res.FLD_20}\n");
File.withWriter closes the resources automatically, but to use it you would need to make many more trips to the DB, getting all the loop_ids first and then fetching the data for each one; a rough sketch of that variant follows.
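For completeness, a minimal sketch of the withWriter-per-loop_id variant, assuming the MY_TABLE schema, type and today from the question (the date pattern is spelled out here as year/month/day/hour/minute):

sql.eachRow("SELECT DISTINCT LOOP_ID FROM MY_TABLE") { idRow ->
    def file = new File("MYNAME_${type}_${idRow.LOOP_ID}_${today.format('yyyyMMddHHmm')}.txt")
    // withWriter flushes and closes the writer automatically, even if an exception is thrown
    file.withWriter { w ->
        w.write("20 fields header\n")
        sql.eachRow("SELECT FLD_1, FLD_2, /* ... */ FLD_20 FROM MY_TABLE WHERE LOOP_ID = ?", [idRow.LOOP_ID]) { res ->
            w.write("${res.FLD_1}|${res.FLD_2}|${res.FLD_20}\n")
        }
    }
}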
For comparison, the following script appends one character at a time with File.append:
f=new File("b.txt")
f.write ""
(10 * 1024 * 1024).times { f.append "b" }
Execution:
$ time groovy Appends.groovy
real 1m9.217s
user 0m45.375s
sys 0m31.902s
And using a BufferedWriter:
w = new File("/tmp/a.txt").newWriter()
(10 * 1024 * 1024).times { w.write "a" }
Execution:
$ time groovy Writes.groovy
real 0m1.774s
user 0m1.688s
sys 0m0.872s
I have a reporting application written in grails. It fires SQL at a production back office database, and simply lists the resultant rows back to the user.
The bit which executes the sql is this:
class ReportService {
static transactional = false
def dataSource_target
def runReport(sql) {
def rows = null
def start
def con
start = System.currentTimeMillis()
try {
con = new Sql(dataSource_target)
rows = con.rows(sql)
} finally {
con.close()
}
def time = System.currentTimeMillis() - start
def response = [rows: rows, time:time,]
response
}
}
When this runs, it takes say 60 seconds (60000 ms). If I run the exact same SQL using MySQL Workbench or similar, it comes back in 30 seconds. That's a big difference!
Typically only 30 or so rows are returned, so there is not much network overhead.
Any ideas why grails should be so much slower to run the query and read the results?
The time variable in your code measures the total of:
1. time to create the connection
2. time to execute the query
3. time to destroy the connection
To more accurately measure just (2), change your code to
def runReport(sql) {
def rows = null
def time
def con
try {
con = new Sql(dataSource_target)
def start = System.currentTimeMillis()
rows = con.rows(sql)
time = System.currentTimeMillis() - start
} finally {
con.close()
}
def response = [rows: rows, time:time,]
response
}