ScalaQuery's query/queryNA several times slower than JDBC? - scalaquery

In the following performance test of many queries, the timed JDBC code takes 500-600 ms:
val ids = queryNA[String]("select id from account limit 1000").list
val stmt = session.conn.prepareStatement("select * from account where id = ?")
debug.time() {
for (id <- ids) {
stmt.setString(1, id)
stmt.executeQuery().next()
}
}
However, when using ScalaQuery, the time goes to >2s:
val ids = queryNA[String]("select id from account limit 1000").list
implicit val gr = GetResult(r => ())
val q = query[String,Unit]("select * from account where id = ?")
debug.time() {
for (id <- ids) {
q.first(id)
}
}
Debugging with server logs shows that this is because the PreparedStatements are being re-prepared on every execution rather than reused.
This is in fact a performance issue that we've been hitting in our application code, so we're wondering if we're missing something regarding how to reuse prepared statements properly in ScalaQuery, or if dropping down to JDBC is the suggested workaround.

Got an answer from the scalaquery mailing list. This is just how ScalaQuery is designed - it assumes that you're using something underneath it that provides statement pooling:
Nowadays ScalaQuery always requests a new PreparedStatement from the Connection. There used to be a cache for PreparedStatements in early versions but I removed it because there are already good solutions for this problem. Every decent connection pool should have an option for PreparedStatement pooling. If you're using a Java EE server, it should have an integrated connection pool. For standalone applications, you can use something like http://sourceforge.net/projects/c3p0/
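For reference, here is a minimal sketch of that setup with c3p0 supplying the statement cache (assuming ScalaQuery's Database.forDataSource; the driver class and JDBC URL are placeholders for whatever the application really uses):
import com.mchange.v2.c3p0.ComboPooledDataSource
import org.scalaquery.session.Database
val ds = new ComboPooledDataSource
ds.setDriverClass("org.h2.Driver")       // placeholder driver
ds.setJdbcUrl("jdbc:h2:mem:test")        // placeholder URL
ds.setMaxStatements(200)                 // turns on c3p0's global PreparedStatement cache
ds.setMaxStatementsPerConnection(20)     // optional per-connection cap
val db = Database.forDataSource(ds)
// Sessions obtained from db now draw pooled connections, and repeated executions of
// the same SQL hit c3p0's statement cache instead of re-preparing on every call.
With a pool like this underneath, the ScalaQuery loop above should stop re-preparing the statement on every iteration.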

Related

Data is not properly stored to hsqldb when using pooled data source by dbcp

I'm using hsqldb to create cached tables and indexed tables.
The data being stored arrives at a pretty high frequency, so I need to use a connection pool.
Also, because there is a lot of data, I do not call checkpoint on every commit, but rather expect the data to be flushed after every 50,000 rows inserted.
So the thing is that I can see the .data file growing, but when I connect with an HSQLDB client I don't see the tables or the data.
So I ran two simple tests: one inserted a single row and one inserted 60,000 rows into a new table. In neither case could I see the result in any HSQLDB client.
(Note that I use shutdown=true)
When I add a checkpoint after each commit, it solves the problem.
Also, if I specify in the connection string to use the log, it solves the problem (I don't want the log in production though). Not using a pooled connection also solved the problem, and so did using the pooled data source but explicitly closing it before shutdown.
So I guess that some connections in the connection pool are not being closed, preventing the db from committing the changes and making them available to the client. But then, why couldn't I see the result even with 60,000 rows?
I also would expect the pool to be closed automatically...
What am I doing wrong? What is happening behind the scene?
The code to get the data source looks like this:
Class.forName("org.hsqldb.jdbcDriver");
String url = "jdbc:hsqldb:" + m_dbRoot + dbName + "/db" + ";hsqldb.log_data=false;shutdown=true;hsqldb.nio_data_file=false";
ConnectionFactory connectionFactory = new DriverManagerConnectionFactory(url, user, password);
GenericObjectPool connectionPool = new GenericObjectPool();
KeyedObjectPoolFactory stmtPool = new GenericKeyedObjectPoolFactory(null);
new PoolableConnectionFactory(connectionFactory, connectionPool, stmtPool, null, false, true);
DataSource ds = new PoolingDataSource(connectionPool);
And I'm using this pooled data source to create a table:
Connection c = m_dataSource.getConnection();
Statement st = c.createStatement();
String script = String.format("CREATE CACHED TABLE IF NOT EXISTS %s (id %s NOT NULL, entity %s NOT NULL, PRIMARY KEY (id));", m_tableName, m_idGenerator.getIdType(), TABLE_ENTITY_TYPE);
st.execute(script);
st.close();
c.close();
And to insert rows:
Connection c = m_dataSource.getConnection();
c.setAutoCommit(false);
PreparedStatement stmt = c.prepareStatement(m_sqlInsert);
stmt.setObject(1, id);
stmt.setBinaryStream(2, Serializer.Helper.serialize(m_serializer, entity));
stmt.executeUpdate();
stmt.close();
stmt = null;
c.commit();
c.close();
So the above seems to add data, but it cannot be seen.
When I explicitly called
connectionPool.close();
then, and only then, could I see the result.
I also tried to use JDBCDataSource and it worked as well.
So what is going on? And what is the right way to do this?
Your method of accessing the database from outside your application process is simply wrong.
Only one Java process at a time is supposed to connect to a file: database.
In order to achieve your aim, launch an HSQLDB server within your application, using exactly the same JDBC URL. Then connect to this server from the external client.
See the Guide:
http://www.hsqldb.org/doc/2.0/guide/listeners-chapt.html#lsc_app_start
Update: The OP commented that the external client was used after the application had stopped. Because you have turned the log off with hsqldb.log_data=false, nothing is persisted permanently. You need to perform an explicit CHECKPOINT or SHUTDOWN when your application completes its work. You cannot rely on shutdown=true at all, even without connection pooling.
See the Guide:
http://www.hsqldb.org/doc/2.0/guide/deployment-chapt.html#dec_bulk_operations
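A minimal sketch of that last point, reusing the m_dataSource from the question (run this once, just before the application process exits):
// Explicitly flush/close the embedded database before the JVM goes away.
try (Connection c = m_dataSource.getConnection();
     Statement st = c.createStatement()) {
    st.execute("SHUTDOWN"); // or "CHECKPOINT" to flush the .data/.log files without closing the database
}
After this, an external client (started while the application is stopped) will see the committed data.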

Weird timeout issues with Dapper.net

I started to use dapper.net a while ago for performance reasons, and because I really like the named-parameters feature compared to just running "ExecuteQuery" in LINQ to SQL.
It works great for most queries, but I get some really weird timeouts from time to time. The strangest thing is that this timeout only happens when the SQL is executed via Dapper. If I take the executed query copied from the profiler and just run it in Management Studio, it's fast and works perfectly. And it's not just a temporary issue. The query consistently times out via Dapper and consistently works fine in Management Studio.
exec sp_executesql N'SELECT Item.Name,dbo.PlatformTextAndUrlName(Item.ItemId) As PlatformString,dbo.MetaString(Item.ItemId) As MetaTagString, Item.StartPageRank,Item.ItemRecentViewCount,
NAME_SRCH.RANK as NameRank,
DESC_SRCH.RANK As DescRank,
ALIAS_SRCH.RANK as AliasRank,
Item.itemrecentviewcount,
(COALESCE(ALIAS_SRCH.RANK, 0)) + (COALESCE(NAME_SRCH.RANK, 0)) + (COALESCE(DESC_SRCH.RANK, 0) / 20) + Item.itemrecentviewcount / 4 + ((CASE WHEN altrank > 60 THEN 60 ELSE altrank END) * 4) As SuperRank
FROM dbo.Item
INNER JOIN dbo.License on Item.LicenseId = License.LicenseId
LEFT JOIN dbo.Icon on Item.ItemId = Icon.ItemId
LEFT OUTER JOIN FREETEXTTABLE(dbo.Item, name, @SearchString) NAME_SRCH ON
Item.ItemId = NAME_SRCH.[KEY]
LEFT OUTER JOIN FREETEXTTABLE(dbo.Item, namealiases, @SearchString) ALIAS_SRCH ON
Item.ItemId = ALIAS_SRCH.[KEY]
INNER JOIN FREETEXTTABLE(dbo.Item, *, @SearchString) DESC_SRCH ON
Item.ItemId = DESC_SRCH.[KEY]
ORDER BY SuperRank DESC OFFSET @Skip ROWS FETCH NEXT @Count ROWS ONLY',N'@Count int,@SearchString nvarchar(4000),@Skip int',@Count=12,@SearchString=N'box,com',@Skip=0
That is the query that I copy-pasted from SQL Profiler. I execute it like this in my code:
using (var connection = new SqlConnection(ConfigurationManager.ConnectionStrings["Conn"].ToString())) {
connection.Open();
var items = connection.Query<MainItemForList>(query, new { SearchString = searchString, PlatformId = platformId, _LicenseFilter = licenseFilter, Skip = skip, Count = count }, buffered: false);
return items.ToList();
}
I have no idea where to start here. I suppose there must be something going on with Dapper, since the query works fine when I execute it directly.
As you can see in this screenshot, this is the same query executed first via code and then via Management Studio.
I can also add that this only happens (I think) when I have two or more words, or when I have a "stop" character in the search string. So it may have something to do with the full-text search, but I can't figure out how to debug it since it works perfectly from Management Studio.
And to make matters even worse, it works fine on my localhost with an almost identical database, both from code and from Management Studio.
Dapper is nothing more than a utility wrapper over ado.net; it does not change how ado.net operates. It sounds to me like the problem here is "works in ssms, fails in ado.net". This is not unique; it is pretty common to run into this occasionally. Likely candidates:
"set" options: these have different defaults in ado.net - and can impact performance especially if you have things like calculated+persisted+indexed columns - if the "set" options aren't compatible it can decide it can't use the stored value, hence not the index - and instead table-scan and recompute. There are other similar scenarios (see the snippet after this list).
system load / transaction isolation-level / blocking; running something in ssms does not reproduce the entire system load at that moment in time
cached query plans: sometimes a duff plan gets cached and used; running from ssms will usually force a new plan - which will naturally be tuned for the parameters you are using in your test. Update all your index stats etc, and consider adding the "optimise for" query hint
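As a quick check of the first bullet, something like this (a diagnostic sketch, not a permanent fix) shows whether a "set" option mismatch is involved; ARITHABORT is the usual difference, since SSMS has it ON by default while ADO.NET does not:
using (var connection = new SqlConnection(ConfigurationManager.ConnectionStrings["Conn"].ToString()))
{
    connection.Open();
    // Align this session's SET options with SSMS before running the query.
    // If the timeout disappears, the plan (or a cached plan) is sensitive to SET options.
    connection.Execute("SET ARITHABORT ON;");
    var items = connection.Query<MainItemForList>(query, new { SearchString = searchString, PlatformId = platformId, _LicenseFilter = licenseFilter, Skip = skip, Count = count }, buffered: false);
    return items.ToList();
}
If that makes the difference, fix the underlying plan (statistics, OPTIMIZE FOR, recompile hints) rather than leaving the SET statement in place.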
In ADO.NET the default value for CommandTimeout is 30 seconds; in Management Studio it is infinite. Adjust the command timeout when calling Query<>, as shown below.
var param = new { SearchString = searchString, PlatformId = platformId, _LicenseFilter = licenseFilter, Skip = skip, Count = count };
var queryTimeoutInSeconds = 120;
using (var connection = new SqlConnection(ConfigurationManager.ConnectionStrings["Conn"].ToString()))
{
connection.Open();
var items = connection.Query<MainItemForList>(query, param, commandTimeout: queryTimeoutInSeconds, buffered: false);
return items.ToList();
}
See also
SqlCommand.CommandTimeout Property on MSDN
For Dapper, the default timeout is 30 seconds, but we can increase it as follows. Here we are increasing the timeout to 240 seconds (4 minutes).
public DataTable GetReport(bool isDepot, string fetchById)
{
int? queryTimeoutInSeconds = 240;
using (IDbConnection _connection = DapperConnection)
{
var parameters = new DynamicParameters();
parameters.Add("#IsDepot", isDepot);
parameters.Add("#FetchById", fetchById);
var res = this.ExecuteSP<dynamic>(SPNames.SSP_GetSEPReport, parameters, queryTimeoutInSeconds);
return ToDataTable(res);
}
}
In the repository layer, we can call our custom ExecuteSP method for the stored procedures with the additional parameter "queryTimeoutInSeconds".
And below is the "ExecuteSP" method for Dapper:
public virtual IEnumerable<TEntity> ExecuteSP<TEntity>(string spName, object parameters = null, int? parameterForTimeout = null)
{
using (IDbConnection _connection = DapperConnection)
{
_connection.Open();
return _connection.Query<TEntity>(spName, parameters, commandTimeout: parameterForTimeout, commandType: CommandType.StoredProcedure);
}
}
Could be a matter of setting the command timeout in Dapper. Here's an example of how to adjust the command timeout in Dapper:
Setting Command Timeout in Dapper

Allocating integers from sequence with transaction in NHibernate

I have a sequence in the database, represented by a range. It has three fields: beginId, nextId and endId.
The task is to acquire the nextId from this range and to ensure that it is unique. The code may be run in a highly parallel environment, with many threads, on many machines.
What I need to do:
lock(database)
{
var seq = GetSequence()
var acquiredId = seq.NextId;
seq.NextId++
Save(seq)
}
So I use this code:
using (ISession session = GetSessionFactory().OpenSession())
using (ITransaction transaction = session.BeginTransaction(IsolationLevel.RepeatableRead))
{
var sequence = session.CreateCriteria<Sequence>().Single(); // This line is simplified
var allocatedId = sequence.NextId;
sequence.NextId++;
session.SaveOrUpdate(sequence);
transaction.Commit();
return allocatedId;
}
But for some reason, when I run this code multi-threaded for testing, I get the same id assigned several times. I'm using a transaction with the RepeatableRead isolation level, but that doesn't help.
P.S. Id doesn't mean the Id of the table - it's just the naming convention we use.
I used .Lock().Upgrade when the data is read from the DB, and everything worked. Thanks to hazzik for the link.
Now my code has this:
session.QueryOver<Sequence>().Lock().Upgrade.DoSomeFiltering().Single()
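For completeness, the same pessimistic lock can also be expressed in the CreateCriteria style used in the question (a sketch, assuming NHibernate's ICriteria.SetLockMode; the filtering is still simplified):
session.CreateCriteria<Sequence>().SetLockMode(LockMode.Upgrade).UniqueResult<Sequence>();
Either way the SELECT is issued with an update/row lock, so two concurrent transactions can no longer read the same NextId before one of them commits.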

Transaction timeout expired while using Linq2Sql DataContext.SubmitChanges()

Please help me resolve this problem:
There is an ambient MSMQ transaction. I'm trying to use a new transaction for logging, but I get the following error when attempting to submit changes: "Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding." Here is the code:
public static void SaveTransaction(InfoToLog info)
{
using (TransactionScope scope =
new TransactionScope(TransactionScopeOption.RequiresNew))
{
using (TransactionLogDataContext transactionDC =
new TransactionLogDataContext())
{
transactionDC.MyInfo.InsertOnSubmit(info);
transactionDC.SubmitChanges();
}
scope.Complete();
}
}
Please help me.
Thx.
You could consider increasing the timeout or eliminating it altogether.
Something like:
using(TransactionLogDataContext transactionDC = new TransactionLogDataContext())
{
transactionDC.CommandTimeout = 0; // No timeout.
}
Be careful
You said:
thank you. but this solution makes new question - if transaction scope was changed why submit operation becomes so time consuming? Database and application are on the same machine
That is because you are creating a new DataContext right there:
TransactionLogDataContext transactionDC = new TransactionLogDataContext();
With a new data context, ADO.NET opens up a new connection (even if the connection strings are the same, unless you do some clever connection pooling).
When you try to work with more than one connection instance within a transaction context (which you just did),
ADO.NET automatically promotes the transaction to a distributed transaction and tries to enlist it in MSDTC. Enlisting the very first transaction per connection into MSDTC takes time (for me it takes 30+ seconds); consecutive transactions are fast, however (in my case 60 ms). Take a look at this: http://support.microsoft.com/Default.aspx?id=922430
What you can do is reuse the transaction and connection (if possible) when you create the new DataContext.
TransactionLogDataContext tempDataContext =
new TransactionLogDataContext(ExistingDataContext.Transaction.Connection);
tempDataContext.Transaction = ExistingDataContext.Transaction;
Where ExistingDataContext is the one which started the ambient transaction.
Or attempt to speed up your MS DTC.
Also, do use SQL Profiler as suggested by billb and look at the SessionId across the different commands (save and savelog in your case). If the SessionId changes, you are in fact using two different connections, and in that case you will have to reuse the transaction (if you don't want it to be promoted to MS DTC).

why would a SQLCLR proc run slower than the same code client side

I am writing a stored procedure that, when completed, will be used to scan staging tables for bogus data on a column-by-column basis.
Step one in the exercise was just to scan the table --- which is what the code below does. The issue is that this code runs in 5:45 (minutes:seconds) --- however the same code run as a console app (changing the connection string of course) runs in about 44 seconds.
using (SqlConnection sqlConnection = new SqlConnection("context connection=true"))
{
sqlConnection.Open();
string sqlText = string.Format("select * from {0}", source_table.Value);
int count = 0;
using (SqlCommand sqlCommand = new SqlCommand(sqlText, sqlConnection))
{
SqlDataReader reader = sqlCommand.ExecuteReader();
while (reader.Read())
count++;
SqlDataRecord record = new SqlDataRecord(new SqlMetaData("rowcount", SqlDbType.Int));
SqlContext.Pipe.SendResultsStart(record);
record.SetInt32(0, count);
SqlContext.Pipe.SendResultsRow(record);
SqlContext.Pipe.SendResultsEnd();
}
}
However, the same code (different connection string of course) runs in a console app in about 44 seconds, which is closer to what I was expecting on the client side.
What am I missing on the SP side that would cause it to run so slowly?
Please note: I fully understand that if I wanted a count of rows, I should use the count(*) aggregation --- that's not the purpose of this exercise.
The type of code you are writing is highly susceptible to SQL Injection. Rather than processing the reader like you are, you could just use the RecordsAffected Property to find the number of rows in the reader.
EDIT:
After doing some research, it appears the difference you are seeing is a by-design difference between the context connection and a regular connection. Peter Debetta blogged about this and writes:
"The context connection is written such that it only fetches a row at a time, so for each of the 20 million some odd rows, the code was asking for each row individually. Using a non-context connection, however, it requests 8K worth of rows at a time."
http://sqlblog.com/blogs/peter_debetta/archive/2006/07/21/context-connection-is-slow.aspx
Well it would seem the answer is in the connection string after all.
context connection=true
versus
server=(local); database=foo; integrated security=true
For some bizarre reason, using the "external" connection the SP runs almost as fast as the console app (still not as fast, mind you! -- 55 seconds).
Of course now the assembly has to be deployed as External rather than Safe --- and that introduces more frustration.