Memory leaks Symfony2 Doctrine2 / exceed memory limit - orm

I have a lot of trouble with the combination of symfony2 and doctrine2. I have to deal with huge datasets (around 2-3 million write and read) and have to do a lot of additional effort to avoid running out of memory.
I figgured out 2 main points, that "leak"ing memory (they are actually not really leaking, but allocating a lot).
The Entitymanager entity storage (I don't know the real name of this one) it seems like it keeps all processed entities and you have to clear this storage regularly with
$entityManager->clear()
The Doctrine QueryCache - it caches all used Queries and the only configuration I found was, that you are able to decide what kind of Cache you wanna use. I didn't find a global disable neither a useful flag for each query to disable it.
So usually I disable it for every query object with the function
$qb = $repository->createQueryBuilder($a);
$query = $qb->getQuery();
$query->useQueryCache(false);
$query->execute();
So.. that's all I figured out right now..
My questions are:
Is there a easy way to deny some objects from the Entitymanagerstorage?
Is there a way to set the querycache use in the entitymanager?
Can I configure this caching behaviors somewhere in the Symfony/doctrine configuration?
Would be very cool if someone has some nice tips for me.. otherwise this may help some rookie..
cya

As stated by the Doctrine Configuration Reference by default logging of the SQL connection is set to the value of kernel.debug, so if you have instantiated AppKernel with debug set to true the SQL commands get stored in memory for each iteration.
You should either instantiate AppKernel to false, set logging to false in you config YML, or either set the SQLLogger manually to null before using the EntityManager
$em->getConnection()->getConfiguration()->setSQLLogger(null);

Try running your command with --no-debug. In debug mode the profiler retains informations about every single query in memory.

1. Turn off logging and profiling in app/config/config.yml
doctrine:
dbal:
driver: ...
...
logging: false
profiling: false
or in code
$this->entityManager->getConnection()->getConfiguration()->setSQLLogger(null);
2. Force garbage collector. If you actively use CPU then garbage collector waits and you can find yourself with no memory soon.
At first enable manual garbage collection managing. Run gc_enable() anywhere in the code. Then run gc_collect_cycles() to force garbage collector.
Example
public function execute(InputInterface $input, OutputInterface $output)
{
gc_enable();
// I'm initing $this->entityManager in __construct using DependencyInjection
$customers = $this->entityManager->getRepository(Customer::class)->findAll();
$counter = 0;
foreach ($customers as $customer) {
// process customer - some logic here, $this->em->persist and so on
if (++$counter % 100 == 0) {
$this->entityManager->flush(); // save unsaved changes
$this->entityManager->clear(); // clear doctrine managed entities
gc_collect_cycles(); // PHP garbage collect
// Note that $this->entityManager->clear() detaches all managed entities,
// may be you need some; reinit them here
}
}
// don't forget to flush in the end
$this->entityManager->flush();
$this->entityManager->clear();
gc_collect_cycles();
}
If your table is very large, don't use findAll. Use iterator - http://doctrine-orm.readthedocs.org/projects/doctrine-orm/en/latest/reference/batch-processing.html#iterating-results

Set SQL logger to null
$em->getConnection()->getConfiguration()->setSQLLogger(null);
Manually call function gc_collect_cycles() after $em->clear()
$em->clear();
gc_collect_cycles();
Don't forget to set zend.enable_gc to 1, or manually call gc_enable() before use gc_collect_cycles()
Add --no-debug option if you run command from console.

got some "funny" news from doctrine developers itself on the symfony live in berlin - they say, that on large batches, us should not use an orm .. it is just no efficient to build stuff like that in oop
.. yeah.. maybe they are right xD

As per the standard Doctrine2 documentation, you'll need to manually clear or detatch entities.
In addition to that, when profiling is enabled (as in the default dev environment). The DoctrineBundle in Symfony2 configures a several loggers use quite a bit of memory. You can disable logging completely, but it is not required.
An interesting side effect, is the loggers affect both Doctrine ORM and DBAL. One of loggers will result in additional memory usage for any service that uses the default logger service. Disabling all of these would be ideal in commands-- since the profiler isn't used there yet.
Here is what you can do to disable the memory-intense loggers while keeping profiling enabled in other parts of Symfony2:
$c = $this->getContainer();
/*
* The default dbalLogger is configured to keep "stopwatch" events for every query executed
* the only way to disable this, as of Symfony 2.3, Doctrine Bundle 1.2, is to reinistiate the class
*/
$dbalLoggerClass = $c->getParameter('doctrine.dbal.logger.class');
$dbalLogger = new $dbalLoggerClass($c->get('logger'));
$c->set('doctrine.dbal.logger', $dbalLogger);
// sometimes you need to configure doctrine to use the newly logger manually, like this
$doctrineConfiguration = $c->get('doctrine')->getManager()->getConnection()->getConfiguration();
$doctrineConfiguration->setSQLLogger($dbalLogger);
/*
* If profiling is enabled, this service will store every query in an array
* fortunately, this is configurable with a property "enabled"
*/
if($c->has('doctrine.dbal.logger.profiling.default'))
{
$c->get('doctrine.dbal.logger.profiling.default')->enabled = false;
}
/*
* When profiling is enabled, the Monolog bundle configures a DebugHandler that
* will store every log messgae in memory.
*
* As of Monolog 1.6, to remove/disable this logger: we have to pop all the handlers
* and then push them back on (in the correct order)
*/
$handlers = array();
try
{
while($handler = $logger->popHandler())
{
if($handler instanceOf \Symfony\Bridge\Monolog\Handler\DebugHandler)
{
continue;
}
array_unshift($handlers, $handler);
}
}
catch(\LogicException $e)
{
/*
* As of Monolog 1.6, there is no way to know if there's a handler
* available to pop off except for the \LogicException that's thrown.
*/
if($e->getMessage() != 'You tried to pop from an empty handler stack.')
{
/*
* this probably doesn't matter, and will probably break in the future
* this is here for the sake of people not knowing what they're doing
* so than an unknown exception is not silently discarded.
*/
// remove at your own risk
throw $e;
}
}
// push the handlers back on
foreach($handlers as $handler)
{
$logger->pushHandler($handler);
}

Try disabling any Doctrine caches that exist. (If you're not using APC / other as a cache then memory is used).
Remove Query Cache
$qb = $repository->createQueryBuilder($a);
$query = $qb->getQuery();
$query->useQueryCache(false);
$query->useResultCache(false);
$query->execute();
There's no way to globally disable it
Also this is an alternative to clear that might help (from here)
$connection = $em->getCurrentConnection();
$tables = $connection->getTables();
foreach ( $tables as $table ) {
$table->clear();
}

I just posted a bunch of tips for using Symfony console commands with Doctrine for batch processing here.

Related

Elastic APM show total number of SQL Queries executed on .Net Core API Endpoint

Currently have Elastic Apm setup with: app.UseAllElasticApm(Configuration); which is working correctly. I've just been trying to find a way to record exactly how many SQL Queries are run via Entity Framework for each transaction.
Ideally when viewing the Apm data in Kibana the metadata tab could just include an EntityFramework.ExecutedSqlQueriesCount.
Currently on .Net Core 2.2.3
One thing you can use is the Filter API for this.
With that you have access to all transactions and spans before they are sent to the APM Server.
You can't run through all the spans on a given transaction, so you need some tweaking - for this I use a Dictionary in my sample.
var numberOfSqlQueries = new Dictionary<string, int>();
Elastic.Apm.Agent.AddFilter((ITransaction transaction) =>
{
if (numberOfSqlQueries.ContainsKey(transaction.Id))
{
// We make an assumption here: we assume that all SQL requests on a given transaction end before the transaction ends
// this in practice means that you don't do any "fire and forget" type of query. If you do, you need to make sure
// that the numberOfSqlQueries does not leak.
transaction.Labels["NumberOfSqlQueries"] = numberOfSqlQueries[transaction.Id].ToString();
numberOfSqlQueries.Remove(transaction.Id);
}
return transaction;
});
Elastic.Apm.Agent.AddFilter((ISpan span) =>
{
// you can't relly filter whether if it's done by EF Core, or another database library
// but you have all sorts of other info like db instance, also span.subtype and span.action could be helpful to filter properly
if (span.Context.Db != null && span.Context.Db.Instance == "MyDbInstance")
{
if (numberOfSqlQueries.ContainsKey(span.TransactionId))
numberOfSqlQueries[span.TransactionId]++;
else
numberOfSqlQueries[span.TransactionId] = 1;
}
return span;
});
Couple of thing here:
I assume you don't do "fire and forget" type of queries, if you do, you need to handle those extra
The counting isn't really specific to EF Core queries, but you have info like db name, database type (mssql, etc.) - hopefully based on that you'll be able filter the queries you want.
With transaction.Labels["NumberOfSqlQueries"] we add a label to the given transction, and you'll be able to see this data on the transaction in Kibana.

Changing the GemFire query ResultSender batch size

I am experiencing a performance issue related to the default batch size of the query ResultSender using client/server config. I believe the default value is 100.
If I run a simple query to get keys (with some order by columns due to the PARTITION Region type), this default batch size causes too many chunks being sent back for even 1000 records. In my tests, even the total query time is only less than 100 ms, however, the app takes more than 10 seconds to process those chunks.
Reading between the lines in your problem statement, it seems you are:
Executing an OQL query on a PARTITION Region (PR).
Running the query inside a Function as recommended when executing queries on a PR.
Sending batch results (as opposed to streaming the results).
I also assume since you posted exclusively in the #spring-data-gemfire channel, that you are using Spring Data GemFire (SDG) to:
Execute the query (e.g. by using the SDG GemfireTemplate; Of course, I suppose you could also be using the GemFire Query API inside your Function directly, too)?
Implemented the server-side Function using SDG's Function annotation support?
And, are possibly (indirectly) using SDG's BatchingResultSender, as described in the documentation?
NOTE: The default batch size in SDG is 0, NOT 100. Zero means stream the results individually.
Regarding #2 & #3, your implementation might look something like the following:
#Component
class MyApplicationFunctions {
#GemfireFunction(id = "MyFunction", batchSize = "1000")
public List<SomeApplicationType> myFunction(FunctionContext functionContext) {
RegionFunctionContext regionFunctionContext =
(RegionFunctionContext) functionContext;
Region<?, ?> region = regionFunctionContext.getDataSet();
if (PartitionRegionHelper.isPartitionRegion(region)) {
region = PartitionRegionHelper.getLocalDataForContext(regionFunctionContext);
}
GemfireTemplate template = new GemfireTemplate(region);
String OQL = "...";
SelectResults<?> results = template.query(OQL); // or `template.find(OQL, args);`
List<SomeApplicationType> list = ...;
// process results, convert to SomeApplicationType, add to list
return list;
}
}
NOTE: Since you are most likely executing this Function "on Region", the FunctionContext type will actually be a RegionFunctionContext in this case.
The batchSize attribute on the SDG #GemfireFunction annotation (used for Function "implementations") allows you to control the batch size.
Of course, instead of using SDG's GemfireTemplate to execute queries, you can, of course, use the GemFire Query API directly, as mentioned above.
If you need even more fine grained control over "result sending", then you can simply "inject" the ResultSender provided by GemFire to the Function, even if the Function is implemented using SDG, as shown above. For example you can do:
#Component
class MyApplicationFunctions {
#GemfireFunction(id = "MyFunction")
public void myFunction(FunctionContext functionContext, ResultSender resultSender) {
...
SelectResults<?> results = ...;
// now process the results and use the `resultSender` directly
}
}
This allows you to "send" the results however you see fit, as required by your application.
You can batch/chunk results, stream, whatever.
Although, you should be mindful of the "receiving" side in this case!
The 1 thing that might not be apparent to the average GemFire user is that GemFire's default ResultCollector implementation collects "all" the results first before returning them to the application. This means the receiving side does not support streaming or batching/chunking of the results, allowing them to be processed immediately when the server sends the results (either streamed, batched/chunked, or otherwise).
Once again, SDG helps you out here since you can provide a custom ResultCollector on the Function "execution" (client-side), for example:
#OnRegion("SomePartitionRegion", resultCollector="myResultCollector")
interface MyApplicationFunctionExecution {
void myFunction();
}
In your Spring configuration, you would then have:
#Configuration
class ApplicationGemFireConfiguration {
#Bean
ResultCollector myResultCollector() {
return ...;
}
}
Your "custom" ResultCollector could return results as a stream, a batch/chunk at a time, etc.
In fact, I have prototyped a "streaming" ResultCollector implementation that will eventually be added to SDG, here.
Anyway, this should give you some ideas on how to handle the performance problem you seem to be experiencing. 1000 results is not a lot of data so I suspect your problem is mostly self-inflicted.
Hope this helps!
John,
Just to clarify, I use client/server topology(actually wan, but that is not important in here). My client is a spring boot web app which has kendo grid as ui. Users can filter/sort on any combination of the columns, which will be passed to the spring boot app for generating dynamic OQL and create the pagination. Till now, except for being dynamic, my OQL queries are quite straight forward. I do not want to introduce server side functions due to the complexity of our global deployment process. But I can if you think that is something I have to do.
Again, thanks for your answers.

Read SQL Server Broker messages and publish them using NServiceBus

I am very new to NServiceBus, and in one of our project, we want to accomplish following -
Whenever table data is modified in Sql server, construct a message and insert in sql server broker queue
Read the broker queue message using NServiceBus
Publish the message again as another event so that other subscribers
can handle it.
Now it is point 2, that I do not have much clue, how to get it done.
I have referred the following posts, after which I was able to enter the message in broker queue, but unable to integrate with NServiceBus in our project, as the NServiceBus libraries are of older version and also many methods used are deprecated. So using them with current versions is getting very troublesome, or if I was doing it in improper way.
http://www.nullreference.se/2010/12/06/using-nservicebus-and-servicebroker-net-part-2
https://github.com/jdaigle/servicebroker.net
Any help on the correct way of doing this would be invaluable.
Thanks.
I'm using the current version of nServiceBus (5), VS2013 and SQL Server 2008. I created a Database Change Listener using this tutorial, which uses SQL Server object broker and SQLDependency to monitor the changes to a specific table. (NB This may be deprecated in later versions of SQL Server).
SQL Dependency allows you to use a broad selection of all the basic SQL functionality, although there are some restrictions that you need to be aware of. I modified the code from the tutorial slightly to provide better error information:
void NotifyOnChange(object sender, SqlNotificationEventArgs e)
{
// Check for any errors
if (#"Subscribe|Unknown".Contains(e.Type.ToString())) { throw _DisplayErrorDetails(e); }
var dependency = sender as SqlDependency;
if (dependency != null) dependency.OnChange -= NotifyOnChange;
if (OnChange != null) { OnChange(); }
}
private Exception _DisplayErrorDetails(SqlNotificationEventArgs e)
{
var message = "useful error info";
var messageInner = string.Format("Type:{0}, Source:{1}, Info:{2}", e.Type.ToString(), e.Source.ToString(), e.Info.ToString());
if (#"Subscribe".Contains(e.Type.ToString()) && #"Invalid".Contains(e.Info.ToString()))
messageInner += "\r\n\nThe subscriber says that the statement is invalid - check your SQL statement conforms to specified requirements (http://stackoverflow.com/questions/7588572/what-are-the-limitations-of-sqldependency/7588660#7588660).\n\n";
return new Exception(messageMain, new Exception(messageInner));
}
I also created a project with a "database first" Entity Framework data model to allow me do something with the changed data.
[The relevant part of] My nServiceBus project comprises two "Run as Host" endpoints, one of which publishes event messages. The second endpoint handles the messages. The publisher has been setup to IWantToRunAtStartup, which instantiates the DBListener and passes it the SQL statement I want to run as my change monitor. The onChange() function is passed an anonymous function to read the changed data and publish a message:
using statements
namespace Sample4.TestItemRequest
{
public partial class MyExampleSender : IWantToRunWhenBusStartsAndStops
{
private string NOTIFY_SQL = #"SELECT [id] FROM [dbo].[Test] WITH(NOLOCK) WHERE ISNULL([Status], 'N') = 'N'";
public void Start() { _StartListening(); }
public void Stop() { throw new NotImplementedException(); }
private void _StartListening()
{
var db = new Models.TestEntities();
// Instantiate a new DBListener with the specified connection string
var changeListener = new DatabaseChangeListener(ConfigurationManager.ConnectionStrings["TestConnection"].ConnectionString);
// Assign the code within the braces to the DBListener's onChange event
changeListener.OnChange += () =>
{
/* START OF EVENT HANDLING CODE */
//This uses LINQ against the EF data model to get the changed records
IEnumerable<Models.TestItems> _NewTestItems = DataAccessLibrary.GetInitialDataSet(db);
while (_NewTestItems.Count() > 0)
{
foreach (var qq in _NewTestItems)
{
// Do some processing, if required
var newTestItem = new NewTestStarted() { ... set properties from qq object ... };
Bus.Publish(newTestItem);
}
// Because there might be a number of new rows added, I grab them in small batches until finished.
// Probably better to use RX to do this, but this will do for proof of concept
_NewTestItems = DataAccessLibrary.GetNextDataChunk(db);
}
changeListener.Start(string.Format(NOTIFY_SQL));
/* END OF EVENT HANDLING CODE */
};
// Now everything has been set up.... start it running.
changeListener.Start(string.Format(NOTIFY_SQL));
}
}
}
Important The OnChange event firing causes the listener to stop monitoring. It basically is a single event notifier. After you have handled the event, the last thing to do is restart the DBListener. (You can see this in the line preceding the END OF EVENT HANDLING comment).
You need to add a reference to System.Data and possibly System.Data.DataSetExtensions.
The project at the moment is still proof of concept, so I'm well aware that the above can be somewhat improved. Also bear in mind I had to strip out company specific code, so there may be bugs. Treat it as a template, rather than a working example.
I also don't know if this is the right place to put the code - that's partly why I'm on StackOverflow today; to look for better examples of ServiceBus host code. Whatever the failings of my code, the solution works pretty effectively - so far - and meets your goals, too.
Don't worry too much about the ServiceBroker side of things. Once you have set it up, per the tutorial, SQLDependency takes care of the details for you.
The ServiceBroker Transport is very old and not supported anymore, as far as I can remember.
A possible solution would be to "monitor" the interesting tables from the endpoint code using something like a SqlDependency (http://msdn.microsoft.com/en-us/library/62xk7953(v=vs.110).aspx) and then push messages into the relevant queues.
.m

How can I count how many SQL queries hibernate does in one Grails request?

I need to debug a Grails application with one really slow request. I have SQL logging but would like to see the amount of SQL-queries without counting them manually.
debug 'org.hibernate.SQL'
trace 'org.hibernate.type'
For eaxmple to have following line after each request (where x is the amount of all queries made to SQL server):
[2012-10-04 13:41:45,049][LoggingFilters] INFO - Request finished in 8296 ms and made x SQL statements
After some googling this doesn't seem to be possible with Grails so maybe MySQL could provide the information?
You can do it by using Filters and Hibernate statistics.
Create class ExampleFilters.groovy in conf folder. This is the content of the class:
import org.hibernate.stat.Statistics
class ExampleFilters {
def sessionFactory
def filters = {
// your filters here
logHibernateStats(controller: '*', action: '*') {
before = {
Statistics stats = sessionFactory.statistics;
if(!stats.statisticsEnabled) {stats.setStatisticsEnabled(true)}
}
afterView = {
Statistics stats = sessionFactory.getStatistics()
double queryCacheHitCount = stats.getQueryCacheHitCount();
double queryCacheMissCount = stats.getQueryCacheMissCount();
double queryCacheHitRatio = (queryCacheHitCount / ((queryCacheHitCount + queryCacheMissCount) ?: 1))
println """
######################## Hibernate Stats ##############################################
Transaction Count:${stats.transactionCount}
Flush Count:${stats.flushCount}
Total Collections Fetched:${stats.collectionFetchCount}
Total Collections Loaded:${stats.collectionLoadCount}
Total Entities Fetched:${stats.entityFetchCount}
Total Entities Loaded:${stats.entityFetchCount}
Total Queries:${stats.queryExecutionCount}
queryCacheHitCount:${queryCacheHitCount}
queryCacheMissCount:${queryCacheMissCount}
queryCacheHitRatio:${queryCacheHitRatio}
######################## Hibernate Stats ##############################################
"""
stats.clear()
}
}
}
}
For reading more about various Hibernate statistics read this article:
http://www.javalobby.org/java/forums/t19807.html
Also note that there is a performance impact when using this, so it should really be used only in development environment.
Have you considered some low tech solutions like running the output through wc -l?
in grails use loggingSql
dataSource {
dbCreate = "update" // one of 'create', 'create-drop','update'
url = "jdbc:postgresql://localhost:5432/demodb"
loggingSql = true
}
you’ll notice that all SQL statements Grails utilize will be logged.
As Burt Beckwith said in his blog post "Stuff I Learned Consulting" http://burtbeckwith.com/blog/?p=1570
SQL Logging
There are two ways to view SQL output from queries; adding logSql = true in DataSource.groovy and configuring Log4j loggers. The Log4j approach is a lot more flexible since it doesn’t just dump to stdout, and can be routed to a file or other appender and conveniently enabled and disabled. But it turns out it’s easy to toggle logSql SQL console logging. Get a reference to the sessionFactory bean (e.g. using dependency injection with def sessionFactory) and turn it on with
sessionFactory.settings.sqlStatementLogger.logToStdout = true
and off with
sessionFactory.settings.sqlStatementLogger.logToStdout = false

Flushing in NHibernate

This question is a bit of a dupe, but I still don't understand the best way to handle flushing.
I am migrating an existing code base, which contains a lot of code like the following:
private void btnSave_Click()
{
SaveForm();
ReloadList();
}
private void SaveForm()
{
var foo = FooRepository.Get(_editingFooId);
foo.Name = txtName.Text;
FooRepository.Save(foo);
}
private void ReloadList()
{
fooRepeater.DataSource = FooRepository.LoadAll();
fooRepeater.DataBind();
}
Now that I am changing the FooRepository to Nhibernate, what should I use for the FooRepository.Save method? Should the FooRepository always flush the session when the entity is saved?
I'm not sure if I understand your question, but here is what I think:
Think in "putting objects to the session" instead of "getting and storing data". NH will store all new and changed objects in the session without any special call to it.
Consider this scenarios:
Data change:
Get data from the database with any query. The entities are now in the NH session
Change entities by just changing property values
Commit the transaction. Changes are flushed and stored to the database.
Create a new object:
Call a constructor to create a new object
Store it to the database by calling "Save". It is in the session now.
You still can change the object after Save
Commit the changes. The latest state will be stored to the database.
If you work with detached entities, you also need Update or SaveOrUpdate to put detached entities to the session.
Of course you can configure NH to behave differently. But it works best if you follow this default behaviour.
It doesn't matter whether or not you explicitly flush the session between modifying a Foo entity and loading all Foos from the repository. NHibernate is smart enough to auto-flush itself if you have made changes in the session that may affect the results of the query you are trying to run.
Ideally I try to use one session per "unit of work". This means one cohesive piece of work which may involve several smaller steps. If you feel that you do not have a seam in your architecture where you can achieve this, then managing the session inside the repository will also work. Just be aware that you are missing out on some of the power that NHibernate provides you.
I'd vote up Stefan Moser's answer if I could - I'm still getting to grips with Nh myself but I think it's nice to be able to write code like this:
private void SaveForm()
{
using (var unitofwork = UnitOfWork.Start())
{
var foo = FooRepository.Get(_editingFooId);
var bar = BarRepository.Get(_barId);
foo.Name = txtName.Text;
bar.SomeOtherProperty = txtBlah.Text;
FooRepository.Save(foo);
BarRepository.Save(bar);
UnitOfWork.CommitChanges();
}
}
so this way either the whole action succeeds or it fails and rolls back, keeping flushing/transaction management outside of the Repositories.