Is NHibernate Shards production ready?

At the company I work for, we have a single database schema, but each of our clients uses their own dedicated database. One central database stores client contact details and which database each client uses, so we can connect to the appropriate database. I've looked at using NHibernate Shards, but it seems to have gone very quiet and doesn't look complete.
Does anyone know the status of this project? Has anyone used it in production?
If it's not yet at a point where it is considered usable in production, what are the alternatives? The two main ones seem to be:
1. Create a session factory per database and then a wrapper to select the appropriate factory to generate the correct session - this seems to me to involve redundant session factories and not be very efficient.
2. Create just one session factory, but pass an IDbConnection when calling OpenSession - which would allow the session to have a different database connection.
My concern with option 2 is how NHibernate will cope with the second-level cache, as I believe it is controlled by the session factory - and I believe the HiLo generator uses the session factory as well. In these cases, will having sessions attached to different databases cause problems? For example, we will end up with a MyCompany.Model.User class that has an id of 2 in both databases - will this cause conflicts within the cache?

You could have a look at Enzo SQL Shard, a sharding library for SQL Server. If you are already using NHibernate, there might be a few changes required in your code though.

NHibernate Shards is up to date with the latest NHibernate API changes and now supports all of NHibernate's query models, including Linq. Complex scalar queries are currently not supported.
We use it in production for a multi-tenant environment, but there are a few things to be mindful of.
NHibernate Shards has a session factory for each shard, but only uses a single NHibernate Configuration instance to generate the session factories. This approach likely won't scale well to large numbers of shards.
Querying across shards does not play well with paging. It works, but can involve considerable client-side processing. It is best to keep result sets as small as possible and lock queries to single shards where feasible.

Related

How to cache connections to different Postgres/MySQL databases in Golang?

I have an application where different users may connect to different databases (either MySQL or Postgres). What might be the best way to cache those connections across different databases? I have seen some connection pools, but they seem designed for one database with multiple connections rather than multiple databases with multiple connections.
PS:
For added context, I am designing a multi-tenant architecture where each tenant connects to one or multiple databases. One option is to use a map[string]*sql.DB where the key is the URL of the database, but that is hard to scale when there are a large number of databases. Or should we have a sharding layer that shards each incoming request by connection URL, so that each machine holds just the right number of database connections in the form of a map[string]*sql.DB?
An example of the kind of software I want to build is https://www.sigmacomputing.com/ where the user can connect to multiple databases to work with different tables.
Neither MySQL nor Postgres allows connection sharing between multiple database users; a single database user is specified in the connection credentials. If you mean that your different users have their own database credentials, then it is not possible to share connections between them.
If by "different users" you mean your application users, and they share a single database user to access the DB deeper in the app, then you don't need to do anything in particular to "cache" connections. sql.DB keeps and reuses open connections in its pool by default.
Go automatically opens, closes and reuses DB connections through a *database/sql.DB. By default it keeps up to 2 idle connections open and opens an unlimited number of new connections under concurrency when all open connections are already busy.
If you need some fine-tuning of pool efficiency versus database load, you may want to alter the sql.DB configuration with its Set* methods, for example SetMaxOpenConns.
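To make the pool tuning concrete, here is a minimal sketch in Go; the Postgres driver, the DSN and the specific limits are assumptions made up for the example, not something from the question.

package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // hypothetical driver choice for this sketch
)

func main() {
	// sql.DB is itself a pool; Open validates arguments but does not dial yet.
	db, err := sql.Open("postgres", "postgres://user:pass@localhost:5432/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Tune the pool rather than managing individual connections yourself.
	db.SetMaxOpenConns(20)                  // cap concurrent connections to this database
	db.SetMaxIdleConns(5)                   // keep a few warm connections around
	db.SetConnMaxIdleTime(5 * time.Minute)  // close connections that sit idle too long
	db.SetConnMaxLifetime(30 * time.Minute) // recycle long-lived connections

	if err := db.Ping(); err != nil { // actually dial once to verify the DSN
		log.Fatal(err)
	}
}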
You seem to have too many unknowns. In cases like this I would apply good old agile: start with a prototype of what you want to achieve using tools you already know, and then benchmark the performance. I think you might be surprised how much Go can handle.
Since you understand how to use map[string]*sql.DB for that purpose, I would go with that. If you reach some limit, add another machine behind HAProxy. Solving a scaling problem doesn't necessarily mean writing a new DB pool in Go. Obviously, if you need that kind of power you can always do it - the pgx Postgres driver has its own pool implementation, so you can get inspiration there. However, doing this right now seems to be premature optimization - solving a problem you don't have yet. Building a prototype with map[string]*sql.DB is easy; test it, benchmark it, and you will see whether you need more.
P.S. BTW, you will most likely hit the file descriptor limit before you are able to exhaust memory.
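To make the suggested prototype concrete, here is a minimal sketch of a mutex-guarded map[string]*sql.DB registry; the type and method names, the Postgres driver and the pool limit are assumptions made up for the example.

package dbcache

import (
	"database/sql"
	"sync"

	_ "github.com/lib/pq" // hypothetical driver choice for this sketch
)

// dbRegistry lazily opens one *sql.DB (itself a connection pool) per DSN and reuses it.
type dbRegistry struct {
	mu  sync.RWMutex
	dbs map[string]*sql.DB
}

func newDBRegistry() *dbRegistry {
	return &dbRegistry{dbs: make(map[string]*sql.DB)}
}

// get returns the pool for a DSN, opening it on first use.
func (r *dbRegistry) get(dsn string) (*sql.DB, error) {
	r.mu.RLock()
	db, ok := r.dbs[dsn]
	r.mu.RUnlock()
	if ok {
		return db, nil
	}

	r.mu.Lock()
	defer r.mu.Unlock()
	if db, ok := r.dbs[dsn]; ok { // another goroutine may have opened it meanwhile
		return db, nil
	}
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		return nil, err
	}
	db.SetMaxOpenConns(5) // keep the per-tenant footprint small
	r.dbs[dsn] = db
	return db, nil
}

Benchmarking something this simple against realistic tenant counts should show whether anything fancier is needed.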
Assuming you have multiple users and multiple databases with an N-to-N relation, you could have a map from database URL to database details (explained below).
Which users have access to which databases should be handled separately anyway, using a ConfigMap or a core database. For the database details, we could have a struct like this:
type DBDetail struct {
	sync.RWMutex
	connection *sql.DB
}
The map would go from database URL to the database's details (dbDetail), and when a user writes, it calls:
dbDetail.Lock()
defer dbDetail.Unlock()
and for reads, just use RLock/RUnlock instead.
As vearutop said, the connections could be a pain, but with this approach you could have a single connection, or set a limit by incrementing and decrementing another counter after taking the lock.
There isn’t necessarily a correct architectural answer here. It depends on some of the constraints of the system.
I have an option of using map[string]*sql.DB where the key is the URL of the database, but it can hardly be scaled when we have a large number of databases.
Whether this will scale sufficiently depends on how numerous the databases are expected to be. If there are expected to be tens or hundreds of concurrent users in the near future, a map is probably sufficient. Often a good next step after using a map is to transition to a more fully featured cache (for example https://github.com/dgraph-io/ristretto).
A factor in the decision of whether to use a map or a cache is how you imagine the lifecycle of a database connection. Once a connection is opened, can it remain open for the remainder of the lifetime of the process, or do connections need to be closed after minutes of no use to free up resources?
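If connections do need to be closed after a period of no use, one minimal sketch of that lifecycle (the names, the janitor interval and the idle threshold are all assumptions made up for the example) is to record when each pool was last used and sweep periodically:

package dbcache

import (
	"database/sql"
	"sync"
	"time"
)

// entry pairs a pool with the last time it was handed out.
type entry struct {
	db       *sql.DB
	lastUsed time.Time
}

type expiringRegistry struct {
	mu      sync.Mutex
	entries map[string]*entry
	maxIdle time.Duration
}

func newExpiringRegistry(maxIdle time.Duration) *expiringRegistry {
	r := &expiringRegistry{entries: make(map[string]*entry), maxIdle: maxIdle}
	go r.janitor()
	return r
}

// touch records use of a DSN; callers would pair this with lazy opening as above.
func (r *expiringRegistry) touch(dsn string, db *sql.DB) {
	r.mu.Lock()
	r.entries[dsn] = &entry{db: db, lastUsed: time.Now()}
	r.mu.Unlock()
}

// janitor periodically closes pools that have not been used recently.
func (r *expiringRegistry) janitor() {
	for range time.Tick(time.Minute) {
		r.mu.Lock()
		for dsn, e := range r.entries {
			if time.Since(e.lastUsed) > r.maxIdle {
				e.db.Close() // frees the file descriptors held by idle tenants
				delete(r.entries, dsn)
			}
		}
		r.mu.Unlock()
	}
}

A real implementation would also need to avoid closing a pool that is still mid-query; a cache with an eviction callback gives you the same shape with less bookkeeping.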
Should we have a sharding layer for each incoming request, sharded by connection URL, so each machine will contain just the right number of database connections in the form of a map[string]*sql.DB?
The right answer here depends on how many processing nodes are expected and whether there are additional benefits to be gained from routing requests to specific machines. For example, row-level caching and isolating users from each other's requests are advantages that would be gained by sharding users across the pool. But a disadvantage is that you might end up with "hot" nodes, because a single user might generate the majority of the traffic.
Usually, a good strategy in situations like this is to be really explicit about the constraints of the problem. Jeff Dean coined a rule of thumb for such situations:
Ensure your design works if scale changes by 10X or 20X but the right solution for X [is] often not optimal for 100X
https://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf
So, if in the near future the system only needs to support tens of concurrent users, build the simplest thing that will support tens to hundreds of concurrent users (a map or cache with no user sharding is probably sufficient). That design will have to change before the system can support thousands of concurrent users, but scaling a system is often a good problem to have, because it usually indicates a successful project.

Redis: Using Lua and concurrent transactions

Two issues
Do Lua scripts really solve all cases for Redis transactions?
What are best practices for asynchronous transactions from one client?
Let me explain. First issue:
Redis transactions are limited, with an inability to unwatch specific keys, and all keys being unwatched upon EXEC; we are limited to a single ongoing transaction on a given client.
I've seen threads where many Redis users claim that Lua scripts are all they need. Even the official Redis docs state they may remove transactions in favour of Lua scripts. However, there are cases where this is insufficient, such as the most standard case: using Redis as a cache.
Let's say we want to cache some data from a persistent data store in Redis. Here's a quick process:
Check cache -> miss
Load data from database
Store in Redis
However, what if, between step 2 (loading the data) and step 3 (storing it in Redis), the data is updated by another client?
The data stored in Redis would be stale. So... we use a Redis transaction, right? We watch the key before loading from the DB, and if the key is updated somewhere else before storage, the storage would fail. Great! However, within an atomic Lua script we cannot load data from an external database, so Lua cannot be used here. Hopefully I'm simply missing something, or there is something wrong with our process.
Moving on to the 2nd issue (asynchronous transactions)
Let's say we have a socket.io cluster which processes various messages, and requests for a game, for high speed communication between server and client. This cluster is written in node.js with appropriate use of promises and asynchronous concepts.
Say two requests hit a server in our cluster, both of which require data to be loaded and cached in Redis. Using our transaction from above, multiple keys could be watched, and multiple MULTI->EXEC transactions would run in overlapping order on one Redis connection. Once the first EXEC runs, all watched keys are unwatched, even if the other transaction is still running. This may allow the second transaction to succeed when it should have failed.
These overlaps could happen in totally separate requests handled by the same server, or sometimes even in the same request if multiple data types need to load at the same time.
What is best practice here? Do we need to create a separate Redis connection for every individual transaction? It seems like we would lose a lot of speed, and we would see many connections created from just one server if that is the case.
As an alternative we could use Redlock / mutex locking instead of Redis transactions, but that is slow by comparison.
Any help appreciated!
I have received the following after my query was escalated to the Redis engineers:
Hi Jeremy,
Your method using multiple backend connections would be the expected way to handle the problem. We do not see anything wrong with multiple backend connections, each using an optimistic Redis transaction (WATCH/MULTI/EXEC) - there is no chance that the “second transaction will succeed where it should have failed”.
Using LUA is not a good fit for this problem.
Best Regards,
The Redis Labs Team
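To make the confirmed approach concrete, here is a minimal sketch of the WATCH/MULTI/EXEC cache fill running on its own connection. It is written in Go with the go-redis client purely as an illustration (the question's stack is Node.js), and the key handling and the loadFromDatabase helper are assumptions made up for the example.

package cachefill

import (
	"context"
	"errors"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// loadFromDatabase stands in for the real persistent-store lookup (hypothetical).
func loadFromDatabase(id string) (string, error) {
	return "value-for-" + id, nil
}

func cacheFill(ctx context.Context, rdb *redis.Client, key string) error {
	// Watch checks out a dedicated connection from the pool for the callback,
	// so overlapping transactions never share WATCH state.
	err := rdb.Watch(ctx, func(tx *redis.Tx) error {
		// 1. Check the cache.
		if _, err := tx.Get(ctx, key).Result(); err == nil {
			return nil // hit: nothing to do
		} else if !errors.Is(err, redis.Nil) {
			return err
		}

		// 2. Miss: load from the database while the key is WATCHed.
		val, err := loadFromDatabase(key)
		if err != nil {
			return err
		}

		// 3. MULTI/EXEC: aborts with redis.TxFailedErr if the key changed meanwhile.
		_, err = tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
			pipe.Set(ctx, key, val, 10*time.Minute)
			return nil
		})
		return err
	}, key)

	if errors.Is(err, redis.TxFailedErr) {
		fmt.Println("key was modified during the fill; leaving the newer value in place")
		return nil
	}
	return err
}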

Azure SQL Replication

I have an application that, for performance reasons, will have completely independent standalone instances in several Azure data centers. The stack of Azure IaaS and PaaS components at each data center will be exactly the same. Primarily, there will be a front end application and a database.
So let's say I have the application hosted in 4 data centers. I would like the data coming into each Azure SQL database to be replicated asynchronously to the other 3 databases, in an eventually consistent manner. Each of these databases needs to be updatable.
Does anyone know if Active Geo-Replication can handle this scenario? I know I can do this using a VM and IaaS, but would prefer to use SQL Azure.
Thanks...
Peer-to-peer transaction replication supports what you're asking for, to some extent - I'm assuming that's what you're referring to when you mention setting it up in IaaS - but it seems like it would be self-defeating if you're looking to it for a boost in write performance (and it goes against their recommendations):
From https://msdn.microsoft.com/en-us/library/ms151196.aspx
Although peer-to-peer replication enables scaling out of read operations, write performance for the topology is like that for a single node. This is because ultimately all inserts, updates, and deletes are propagated to all nodes. Replication recognizes when a change has been applied to a given node and prevents changes from cycling through the nodes more than one time. We strongly recommend that write operations for each row be performed at only one node, for the following reasons:
If a row is modified at more than one node, it can cause a conflict or even a lost update when the row is propagated to other nodes.
There is always some latency involved when changes are replicated. For applications that require the latest change to be seen immediately, dynamically load balancing the application across multiple nodes can be problematic.
This makes me think that you'd be better off using Active Geo-Replication - you get the benefits of PaaS: not having to manage your own VMs and not having to manage transactional replication, which gets messy. And if the application is built to deal with "eventual consistency" in the UI, you might be able to get away with slight delays in the secondaries being up to date.

Can I use NHibernate's AdoNetTransactionFactory with distributed transactions?

I am dealing with a strange issue related to NHibernate and distributed transactions in a WCF service. See Deadlocks causing 'Server failed to resume the transaction' with NHibernate and distributed transactions for more details.
One thing that seems to solve my problem is using NHibernate's AdoNetTransactionFactory, instead of AdoNetWithDistributedTransactionsFactory.
I believe that the AdoNetWithDistributedTransactionsFactory is involved with making NHibernate's second-level caching mechanism work right, but we're not using that. What (if any) other problems exist with using AdoNetTransactionFactory with distributed transactions?
Thanks for your time!
I notice that you mentioned in your other question/answer:
SqlConnection class is not thread-safe, and that includes closing the connection
on a separate thread. Based on this response we have filed a
bug report for NHibernate.
However, from NHibernate's documentation:
11.2. Threads and connections
You should observe the following practices when creating NHibernate Sessions:
Never create more than one concurrent ISession or ITransaction instance per database connection.
Be extremely careful when creating more than one ISession per database per transaction. The ISession itself keeps track of updates made to loaded objects, so a different ISession might see stale data.
The ISession is not threadsafe! Never access the same ISession in two concurrent threads. An ISession is usually only a single unit-of-work!
If you are trying to multi-thread the connection with NHibernate perhaps it is just not going to work. Have you considered a different ORM such as Entity Framework?
No matter what ORM you choose though, the database connection will not be thread safe. This is universal.
"many DB drivers are not thread safe. Using a singleton means that if you have many threads, they will all share the same connection. The singleton pattern does not give you thread saftey. It merely allows many threads to easily share a "global" instance." - https://stackoverflow.com/a/6507820/1026459
Using AdoNetTransactionFactory with distributed system transactions will cause those transactions to be ignored by NHibernate, which has the following consequences:
ConnectionReleaseMode.AfterTransaction will not be honored. Instead, NHibernate will release the connection after each statement, and so will re-acquire a connection from the pool for the next one. Depending on your data provider, this may trigger escalation of the transaction to distributed.
FlushMode.Commit will not be honored. Explicit flushes will be required instead. (Auto flushes before queries may still occur.)
Work that needs to be isolated from the current system transaction will still be included inside it (unless the connection string's Enlist property is false). Such work may include id generator queries, such as retrieving the next high value for a table hilo generator. If the transaction gets rolled back, NHibernate may then use conflicting ids.
The NHibernate session will not be able to correctly track the locks it holds on entities. Considering itself outside of a transaction, it will assume it holds no locks on them. So it may try (at the request of user code, for example) to re-lock them with a lower lock level than the one the transaction already holds on them in the database. I am not sure what outcome could result from that. (At best, ignored; at worst...)
The second-level cache will be disabled as soon as you start modifying data. NHibernate sort of "invalidates" cache entries in that situation and re-enables them, updated, only on transaction completion. But since it will not be aware of the transaction...
Some extensions (maybe Envers) may rely on NHibernate transaction events, and will no longer work as expected.
I strongly recommend upgrading to NHibernate 3.2 (or a version close to it). Why? Since 2.1, there have been significant improvements (read: a rewrite) to the AdoNetWithDistributedTransactionFactory. As a matter of fact, it now handles TransactionScopes/ambient transactions and the like correctly. When we ran 2.1 in production we encountered many issues related to distributed transactions. We pretty much had to fix a ton of stuff ourselves and recompile NHibernate. 3.2 seems to have fixed many issues around the subject.
I don't have the source near me, but if memory doesn't fail me, the AdoNetTransactionFactory doesn't check for or handle ambient transactions. So you are down to NHibernate starting transactions itself when one is not present in the session (by means of ISession.BeginTransaction()).

Is this a good use-case for Redis on a ServiceStack REST API?

I'm creating a mobile app and it requires an API service backend to get/put information for each user. I'll be developing the web service with ServiceStack, but was wondering about the storage. I love the idea of a fast in-memory caching system like Redis, but I have a few questions:
I created a sample schema of what my data store should look like. Does this seems like it's a good case for using Redis as opposed to a MySQL DB or something like that?
schema http://www.miles3.com/uploads/redis.png
How difficult is the setup for persisting the Redis store to disk or is it kind of built-in when you do writes to the store? (I'm a newbie on this NoSQL stuff)
I currently have my setup on AWS using a Linux micro instance (because it's free for a year). I know many factors go into this answer, but in general will this be enough for my web service and Redis? Since Redis is in-memory will that be enough? I guess if my mobile app skyrockets (hey, we can dream right?) then I'll start hitting the ceiling of the instance.
What to think about when designing a NoSQL Redis application
1) To develop correctly in Redis you should be thinking more about how you would structure the relationships in your C# program, i.e. with the C# collection classes, rather than a relational model meant for an RDBMS. The better mindset is to think of data storage more like a document database than RDBMS tables. Essentially everything gets blobbed in Redis via a key (index), so you just need to work out which are your primary entities (i.e. aggregate roots), which get kept in their own 'key namespace', and which are non-primary entities, i.e. simply metadata that should just get persisted with its parent entity.
Examples of Redis as a primary Data Store
Here is a good article that walks through creating a simple blogging application using Redis:
http://www.servicestack.net/docs/redis-client/designing-nosql-database
You can also look at the source code of RedisStackOverflow for another real world example using Redis.
Basically you would need to store and fetch the items of each type separately.
var redisUsers = redis.As<User>();
var user = redisUsers.GetById(1);
var userIsWatching = redisUsers.GetRelatedEntities<Watching>(user.Id);
The way you store relationships between entities is by making use of Redis's sets, e.g. you can store the Users/Watchers relationship conceptually with:
SET["ids:User>Watcher:{UserId}"] = [{watcherId1},{watcherId2},...]
Redis is schema-less and idempotent
Storing ids in Redis sets is idempotent, i.e. you can add watcherId1 to the same set multiple times and it will only ever have one occurrence. This is nice because it means you never need to check whether the relationship already exists and can freely keep adding related ids as if they had never been added before.
Related: writing to or reading from a Redis collection (e.g. a list) that does not exist is the same as working with an empty collection, i.e. a list gets created on the fly when you add an item to it, whilst accessing a non-existent list simply returns 0 results. This is friction-free and a productivity win, since you don't have to define your schemas up front in order to use them. Although, should you need to, Redis provides the EXISTS operation to determine whether a key exists and the TYPE operation so you can determine its type.
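The same set semantics can be seen with any Redis client; here is a tiny sketch in Go using the go-redis client, chosen purely for illustration since this thread is about the ServiceStack C# client. The key simply follows the conceptual convention above.

package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	key := "ids:User>Watcher:1" // hypothetical key following the convention above

	// Adding the same member twice still leaves a single occurrence in the set.
	rdb.SAdd(ctx, key, "42")
	added, _ := rdb.SAdd(ctx, key, "42").Result() // 0: the member was already present
	members, _ := rdb.SMembers(ctx, key).Result()
	fmt.Println(added, members) // 0 [42]

	// The set was created on the fly by the first SADD; EXISTS and TYPE can inspect it.
	exists, _ := rdb.Exists(ctx, key).Result()
	keyType, _ := rdb.Type(ctx, key).Result()
	fmt.Println(exists, keyType) // 1 set
}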
Create your relationships/indexes on your writes
One thing to remember is that, because there are no implicit indexes in Redis, you will generally need to set up the indexes/relationships needed for reading yourself during your writes. Basically you need to think about all your query requirements up front and ensure you set up the necessary relationships at write time. The RedisStackOverflow source code above is a good example of this.
Note: the ServiceStack.Redis C# provider assumes you have a unique field called Id that is its primary key. You can configure it to use a different field with the ModelConfig.Id() config mapping.
Redis Persistence
2) Redis supports 2 persistence modes out of the box, RDB and Append Only File (AOF). RDB writes routine snapshots, whilst the Append Only File acts like a transaction journal recording all the changes in between snapshots - I recommend enabling both until you're comfortable with what each does and what your application needs. You can read all about Redis persistence at http://redis.io/topics/persistence.
Note: Redis also supports trivial replication, which you can read more about at http://redis.io/topics/replication.
Redis loves RAM
3) Since Redis operates predominantly in memory, the most important requirement is having enough RAM to hold your entire dataset in memory, plus a buffer for when it snapshots to disk. Redis is very efficient, so even a small AWS instance will be able to handle a lot of load - what you want to look out for is having enough RAM.
Visualizing your data with the Redis Admin UI
Finally if you're using the ServiceStack C# Redis Client I recommend installing the Redis Admin UI which provides a nice visual view of your entities. You can see a live demo of it at:
http://servicestack.net/RedisAdminUI/AjaxClient/