Setting MaxPageSize for EmbeddableDocumentStore - ravendb

Seems that setting the Configuration for MaxPageSize inside the progrram does not work.
I know that it is bad practice, but i want to retrieve all distinct artists inside a Music Database collection and store it in a GUI Combo for selection and don't like to loop through bunch of 128 documents. The system is used standalone, so the performance impact isthe sameas a select * from a sql db.
i do this in my code:
_store = new EmbeddableDocumentStore
{
Configuration = new RavenConfiguration()
{
DataDirectory = #"Database\MusicDatabase\",
RunInMemory = false,
MaxPageSize = 300000
}
};
_store.Initialize();
But i still get only 128 documents returned.
Tried to add a key to my App config file and also added a web.config file with Raven/MaxPageSize, but it seems to be ignored.
Any idea?
thanks,
Helmut

MaxPageSize controls the max limit. And raising it is generally a bad idea.
If you don't specify how many items you want, however, we'll default to 128.

Related

How to access files stored in SQL Server's FileTable?

As I know SQL Server since version 2012 has a new feature, FileTable. It allows us to store files in the file system and to use them from T-SQL.
I am trying to use this feature and I have no idea how to do it properly.
Generally, I don't know how to access files stored in the file table. Let's suppose I have asp.net MVC app and there are a lot of images which I show on web pages in img tags. I would like to store these images in Filetable and access them as files from the filesystem. But I don't know where these files are stored and how to use them as files. Now my images are stored in web application directory in folder images and I write something like this:
<img src='/images/mypicture.png' />
And if I move my images to file table what I should write in src?
<img src='path-toimage-in-filetable' />
I don't think you still need this, anyways I'll post my answer for anyone else interested.
First, a filetable still being a table, so, if you want to access to data from it you need to use a Select SQL statement. So you'd need something like:
select name, file_stream from filetable_name
where
name = 'file_name',
file_type = 'file_extension'
just execute an statement like this in your asp.net app, then fetch the results and use the file_stream column to get the binary data of the stored file. If you want to retrieve the file from HTML, first you need to create an action in your controller, which will return the retrieved file:
public ActionResult GetFile(){
..
return File(file.file_stream,file.file_type);
}
After this, put in you HTML tag something like:
<img src="/controller/GetFile" />
hope this could help!
If you want to know the schema of a filetable see
here
I assume by FileTable you actually mean FileStream. A couple notes about that:
This feature is best used if your files are actually files
The files should be, on average, greater than 1mb - although there can be exceptions to this rule, if they're smaller than 1mb on average, you may be better off using a VARBINARY(MAX) or XML data type as appropriate. If your images are very small on average (only a few KB), consider using a VARBINARY(MAX) column.
Accessing these files will require an open transaction and that the database is properly configured for FILESTREAM
You can get some significant advantages bypassing the normal SQL engine/database file method of data access by telling SQL Server that you want to access the file directly, however it's not meant for directly accessing the file on the file system and attempting to do so can break SQL's management of these files (transactional consistency, tracking, locking, etc.).
It's pretty likely that your use case here would be better served by using a CDN and storing image URLs in the table if you really need SQL for this. You can use FILESTREAM to do this (see code sample below for one implementation), but you'll be hammering your SQL server for every request unless you store the images somewhere else anyway that the browser can properly cache (my example doesn't do that) - and if you store them somewhere else for rendering int he browser you might as well store them there to begin with (you won't have transactional consistency for those images once they're copied to some other drive/disk/location anyway).
With all that said, here's an example of how you'd access the FILESTREAM data using ADO.NET:
public static string connectionString = ...; // get your connection string from encrypted config
// assumes your FILESTREAM data column is called Img in a table called ImageTable
const string sql = #"
SELECT
Img.PathName(),
GET_FILESTREAM_TRANSACTION_CONTEXT()
FROM ImageTagble
WHERE ImageId = #id";
public string RetreiveImage(int id)
{
string serverPath;
byte[] txnToken;
string base64ImageData = null;
using (var ts = new TransactionScope())
{
using (var conn = new SqlConnection(connectionString))
{
conn.Open();
using (SqlCommand cmd = new SqlCommand(sql, conn))
{
cmd.Parameters.Add("#id", SqlDbType.Int).Value = id;
using (SqlDataReader rdr = cmd.ExecuteReader())
{
rdr.Read();
serverPath = rdr.GetSqlString(0).Value;
txnToken = rdr.GetSqlBinary(1).Value;
}
}
using (var sfs = new SqlFileStream(serverPath, txnToken, FileAccess.Read))
{
// sfs will now work basically like a FileStream. You can either copy it locally or return it as a base64 encoded string
using (var ms = new MemoryStream())
{
sfs.CopyTo(ms);
base64ImageData = Convert.ToBase64String(ms.ToArray());
}
}
}
ts.Complete();
// assume this is PNG image data, replace PNG with JPG etc. as appropraite. Might store in table if it will vary...
return "data:img/png;base64," + base64ImageData;
}
}
Obviously, if you have lots of images to handle like this this is not an ideal method - don't try to make an instance of SQL server into what you should be using a CDN for.... However, if you have other really good reasons, you should try to grab as many images as possible in a single request/transaction (e.g. if you know you're displaying 50 images on a page, get all 50 with a single transaction scope, don't use 50 transaction scopes - this code won't handle that).

servicestack redis, when using SetEntry, it will automatic generate a set with key "ids:+objectName" in redis db, how can I disable it?

when using SetEntry, it will automatic generate a set with key "ids:+ objectName" in redis db.
For example:
typedClient.SetEntry("famyly:username:jhon",new Family {FatherName="Jhon",...});
a set with key name of "ids:Family" and a member like "2343443" will be automatic created in redis db,
and each time I update or modify the same key with SetEntry, the set of "ids:Family" will increment with an new auto generated member. And this set will grow extremely large if I update the key frequently.
How can I disable the auto generated set? this set seems useless for the current circumstances.
thanks
I ran into this same problem - I discovered that our database contained a couple dozen of these "ids:XXX" sets, each containing tens of millions of items, which were consuming significant amounts of memory.
The solution is to switch to untyped clients. You can still use typed methods on the client so you're really not giving up any type safety or automatic serialization at all. There's a couple ways to create clients; we tend to use the get-in-get-out Exec shortcuts on RedisClientsManager. You should be able to adapt this to the way you do it.
Typed client - creates "ids" sets:
// set:
redis.ExecAs<T>(c => c.SetEntry(key, value));
// get:
T value = redis.ExecAs<T>(c => c.GetValue(key));
Untyped client - no "ids" sets created:
// set:
redis.Exec(c => c.Set(key, value));
// get:
using (var cli = _redis.GetClient())
{
T value = cli.Get<T>(key);
}
The inferred auto-generated id's are when you use the high-level Redis Typed Client. Use the IRedisClient.SetEntry on the string-based RedisClient API instead.

RavenDB, RavenHQ and Appharbor - document size error with very first document

I have a completely empty RavenHQ database that's linked to my Appharbor application. The amount of space the database is currently using is 1.1mb out of an available 25mb for my bronze account. The database previously had records in it, but I have deleted them using "delete collection" in the management studio.
The very first time I call session.Store(myobject), and BEFORE I call .SaveChanges(), I get the following error.
System.InvalidOperationException: Url: "/docs/Raven/Hilo/AccItems"
Raven.Database.Exceptions.OperationVetoedException: PUT vetoed by Raven.Bundles.Quotas.Triggers.DatabaseSizeQoutaForDocumetsPutTrigger because: Database size is 45,347 KB, which is over the allowed quota of 25,600 KB. No more documents are allowed in.
Now, the document is definitely not that big, so I don't know what this error can mean, especially as I don't think I've even hit the database at that point since I haven't closed the session by calling SaveChanges(). Any ideas? Here's the code itself.
XDocument doc = XDocument.Parse(rawXml);
var accItems = ExtractItemsFromFeed(doc);
using (IDocumentSession session = _store.OpenSession())
{
var dbItems = session.Query<AccItem>().ToList();
foreach (var item in accItems)
{
var existingRecord = dbItems.SingleOrDefault(x => x.Source == x.SourceId == cottage.SourceId);
if (existingRecord == null)
{
session.Store(item);
_logger.Info("Saved new item {0}.", item.ShortName);
}
else
{
existingRecord.ShortName = item.ShortName;
_logger.Info("Updated item {0}.", item.ShortName);
}
session.SaveChanges();
}
}
Any other comments about the style of this code would be most welcome, as I was unsure of the best way to approach the "update existing item or create if it isn't there" scenario.
The answer here was as follows.
RavenHQ support found that the database was indeed oversized, but it seemed that the size reported in the Appharbor-branded RavenHQ control panel was incorrect. I had filled up the database way over the limit with a previous faulty version of the code posted above, so the error message I received was actually correct.
Fixing this problem without paying to upgrade the database wasn't straightforward, as it's not possible to shrink the database. As I also wasn't able to delete my single Appharbor/RavenHQ database or create another one that left me with the choice of creating an entirely new Appharbor application, or registering directly with RavenHQ for a new account. I chose the latter. The RavenHQ-branded control panel is slightly different to the Appharbor one, in that it has the ability to create and delete databases.
So to summarize: there doesn't seem to be any benefit to using RavenHQ as an add-on to Appharbor - you might as well go and get a proper free RavenHQ account.

Memory leaks Symfony2 Doctrine2 / exceed memory limit

I have a lot of trouble with the combination of symfony2 and doctrine2. I have to deal with huge datasets (around 2-3 million write and read) and have to do a lot of additional effort to avoid running out of memory.
I figgured out 2 main points, that "leak"ing memory (they are actually not really leaking, but allocating a lot).
The Entitymanager entity storage (I don't know the real name of this one) it seems like it keeps all processed entities and you have to clear this storage regularly with
$entityManager->clear()
The Doctrine QueryCache - it caches all used Queries and the only configuration I found was, that you are able to decide what kind of Cache you wanna use. I didn't find a global disable neither a useful flag for each query to disable it.
So usually I disable it for every query object with the function
$qb = $repository->createQueryBuilder($a);
$query = $qb->getQuery();
$query->useQueryCache(false);
$query->execute();
So.. that's all I figured out right now..
My questions are:
Is there a easy way to deny some objects from the Entitymanagerstorage?
Is there a way to set the querycache use in the entitymanager?
Can I configure this caching behaviors somewhere in the Symfony/doctrine configuration?
Would be very cool if someone has some nice tips for me.. otherwise this may help some rookie..
cya
As stated by the Doctrine Configuration Reference by default logging of the SQL connection is set to the value of kernel.debug, so if you have instantiated AppKernel with debug set to true the SQL commands get stored in memory for each iteration.
You should either instantiate AppKernel to false, set logging to false in you config YML, or either set the SQLLogger manually to null before using the EntityManager
$em->getConnection()->getConfiguration()->setSQLLogger(null);
Try running your command with --no-debug. In debug mode the profiler retains informations about every single query in memory.
1. Turn off logging and profiling in app/config/config.yml
doctrine:
dbal:
driver: ...
...
logging: false
profiling: false
or in code
$this->entityManager->getConnection()->getConfiguration()->setSQLLogger(null);
2. Force garbage collector. If you actively use CPU then garbage collector waits and you can find yourself with no memory soon.
At first enable manual garbage collection managing. Run gc_enable() anywhere in the code. Then run gc_collect_cycles() to force garbage collector.
Example
public function execute(InputInterface $input, OutputInterface $output)
{
gc_enable();
// I'm initing $this->entityManager in __construct using DependencyInjection
$customers = $this->entityManager->getRepository(Customer::class)->findAll();
$counter = 0;
foreach ($customers as $customer) {
// process customer - some logic here, $this->em->persist and so on
if (++$counter % 100 == 0) {
$this->entityManager->flush(); // save unsaved changes
$this->entityManager->clear(); // clear doctrine managed entities
gc_collect_cycles(); // PHP garbage collect
// Note that $this->entityManager->clear() detaches all managed entities,
// may be you need some; reinit them here
}
}
// don't forget to flush in the end
$this->entityManager->flush();
$this->entityManager->clear();
gc_collect_cycles();
}
If your table is very large, don't use findAll. Use iterator - http://doctrine-orm.readthedocs.org/projects/doctrine-orm/en/latest/reference/batch-processing.html#iterating-results
Set SQL logger to null
$em->getConnection()->getConfiguration()->setSQLLogger(null);
Manually call function gc_collect_cycles() after $em->clear()
$em->clear();
gc_collect_cycles();
Don't forget to set zend.enable_gc to 1, or manually call gc_enable() before use gc_collect_cycles()
Add --no-debug option if you run command from console.
got some "funny" news from doctrine developers itself on the symfony live in berlin - they say, that on large batches, us should not use an orm .. it is just no efficient to build stuff like that in oop
.. yeah.. maybe they are right xD
As per the standard Doctrine2 documentation, you'll need to manually clear or detatch entities.
In addition to that, when profiling is enabled (as in the default dev environment). The DoctrineBundle in Symfony2 configures a several loggers use quite a bit of memory. You can disable logging completely, but it is not required.
An interesting side effect, is the loggers affect both Doctrine ORM and DBAL. One of loggers will result in additional memory usage for any service that uses the default logger service. Disabling all of these would be ideal in commands-- since the profiler isn't used there yet.
Here is what you can do to disable the memory-intense loggers while keeping profiling enabled in other parts of Symfony2:
$c = $this->getContainer();
/*
* The default dbalLogger is configured to keep "stopwatch" events for every query executed
* the only way to disable this, as of Symfony 2.3, Doctrine Bundle 1.2, is to reinistiate the class
*/
$dbalLoggerClass = $c->getParameter('doctrine.dbal.logger.class');
$dbalLogger = new $dbalLoggerClass($c->get('logger'));
$c->set('doctrine.dbal.logger', $dbalLogger);
// sometimes you need to configure doctrine to use the newly logger manually, like this
$doctrineConfiguration = $c->get('doctrine')->getManager()->getConnection()->getConfiguration();
$doctrineConfiguration->setSQLLogger($dbalLogger);
/*
* If profiling is enabled, this service will store every query in an array
* fortunately, this is configurable with a property "enabled"
*/
if($c->has('doctrine.dbal.logger.profiling.default'))
{
$c->get('doctrine.dbal.logger.profiling.default')->enabled = false;
}
/*
* When profiling is enabled, the Monolog bundle configures a DebugHandler that
* will store every log messgae in memory.
*
* As of Monolog 1.6, to remove/disable this logger: we have to pop all the handlers
* and then push them back on (in the correct order)
*/
$handlers = array();
try
{
while($handler = $logger->popHandler())
{
if($handler instanceOf \Symfony\Bridge\Monolog\Handler\DebugHandler)
{
continue;
}
array_unshift($handlers, $handler);
}
}
catch(\LogicException $e)
{
/*
* As of Monolog 1.6, there is no way to know if there's a handler
* available to pop off except for the \LogicException that's thrown.
*/
if($e->getMessage() != 'You tried to pop from an empty handler stack.')
{
/*
* this probably doesn't matter, and will probably break in the future
* this is here for the sake of people not knowing what they're doing
* so than an unknown exception is not silently discarded.
*/
// remove at your own risk
throw $e;
}
}
// push the handlers back on
foreach($handlers as $handler)
{
$logger->pushHandler($handler);
}
Try disabling any Doctrine caches that exist. (If you're not using APC / other as a cache then memory is used).
Remove Query Cache
$qb = $repository->createQueryBuilder($a);
$query = $qb->getQuery();
$query->useQueryCache(false);
$query->useResultCache(false);
$query->execute();
There's no way to globally disable it
Also this is an alternative to clear that might help (from here)
$connection = $em->getCurrentConnection();
$tables = $connection->getTables();
foreach ( $tables as $table ) {
$table->clear();
}
I just posted a bunch of tips for using Symfony console commands with Doctrine for batch processing here.

How do I set a dynamic datasource for ORM?

ORM settings in Coldfusion application.cfc run before anything else runs (onapplicationstart, etc). So how do you set a dynamic datasource (code before the ORM init) in application.cfc? we can set it after and it re-points the ORM to a dynamic datasource, but that requires that the hardcoded datasource must be valid as well. This is tenuous at best.
Here is an example:
<cfscript>
this.name = "someapp_#hash(cgi.http_host)#";
this.ormenabled = "true";
this.ormsettings = { cfclocation = "config/definitions", eventhandling = "true",datasource="STATICDATASOURCE" };
</cfscript>
If it's not specified in application.cfc scope then you get errors like "ORM is not configured for the current application."
We need to be able to get the datasource from a text file on the server.
this.datasource="YourDatasourceName";
Well, if you wanted to store a file, for this example we'll call it "datasource.xml" consisting of:
<dataSourceName>Name goes here</dataSourceName>
You can read it in with:
dataFile = fileRead("pathToFile/datasource.xml");
data = xmlParse(dataFile);
dataSourceName = data.dataSourceName.xmlText;
this.datasource=dataSourceName;
ORM datasource just uses the default datasource if not defined.
Having said that, if you want to add / remove datasource dynamically, see Administrator API at: http://help.adobe.com/en_US/ColdFusion/9.0/Admin/WSc3ff6d0ea77859461172e0811cbf364104-7fcf.html (available since CF8)
I'm not sure if you can re-set the this.ormsettings.datasource to something else at runtime (i.e. onApplicationStart()? or onServerStart()?), but many of the settings can be set again. You may want to try it out.