Sharing variables across multiple sessions - apache

I know I cannot have a global variable in my backend code (Java, PHP, or something else) and have different users (and hence sessions) see the same value. If I need to share some values across these user sessions, I need to write them to a DB and read them back every time. This seems awfully wasteful to me.
I understand that an Apache process (or the app server) will fork, so global values will not work, but if I am looking at a specialized application, is there a web server that lets me do this? This should be possible in a web server that uses threads instead of forking processes. But if I share global memory, I will need some kind of locks to access it properly. I understand that it could (and mostly will) get really buggy, but will it degrade performance compared to a DB?
Thoughts?
Pav

I'm not sure that's entirely true. Apache will handle each user connection individually - correct. However, I know that in Java it is possible to have a Singleton object that exists for the life of the application, in which you could potentially store values to be used across all user sessions.
When handling each user connection on the server side, each access to this Singleton will access the same object - therefore the same values.
You might want to do some more research into application scope objects as well. I'm not sure exactly what you're trying to achieve due to lack of a use case, but you may find that Java web apps can do more than you expect in this area.
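The same idea can be sketched outside Java. The example below uses Go rather than Java, since Go's net/http serves every request from a goroutine inside one process, so a single object can be shared by all users as long as access is locked. SharedCounter and hitsHandler are hypothetical names made up for this sketch, and the "users" are simulated with httptest instead of a real listening port.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// SharedCounter is a hypothetical application-scoped value: one instance
// lives for the whole process and every request handler sees it.
type SharedCounter struct {
	mu sync.Mutex
	n  int
}

// Incr adds one under the lock and returns the new value.
func (c *SharedCounter) Incr() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
	return c.n
}

// Value reads the current value under the same lock.
func (c *SharedCounter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// hitsHandler returns a handler that bumps the shared counter on each request.
func hitsHandler(c *SharedCounter) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "hits=%d\n", c.Incr())
	}
}

func main() {
	counter := &SharedCounter{}
	h := hitsHandler(counter)

	// Simulate 50 concurrent "users" without opening a real port.
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			h(httptest.NewRecorder(), httptest.NewRequest("GET", "/hits", nil))
		}()
	}
	wg.Wait()

	fmt.Println("total hits:", counter.Value()) // total hits: 50
}
```

The value is lost when the process restarts and is not visible to other server instances, which is where a database or external cache still earns its place.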

Related

How to cache connections to different Postgres/MySQL databases in Golang?

I have an application where different users may connect to different databases (either MySQL or Postgres). What is the best way to cache those connections across different databases? I have seen some connection pools, but they seem to be designed for many connections to one database rather than many connections to many databases.
PS:
To add more context, I am designing a multi-tenant architecture where each tenant connects to one or more databases. I have the option of using map[string]*sql.DB where the key is the URL of the database, but it can hardly scale when we have a large number of databases. Or should we have a sharding layer for each incoming request, sharded by connection URL, so each machine will contain just the right amount of database connections in the form of map[string]*sql.DB?
An example of the software that I want to build is https://www.sigmacomputing.com/ where the user can connect to multiple databases to work with different tables.
Neither MySQL nor Postgres allows a connection to be shared between multiple database users; a single database user is specified in the connection credentials. If you mean that your different users have their own database credentials, then it is not possible to share connections between them.
If by "different users" you mean your application users, and they share a single database user to access the DB deeper in the app, then you don't need to do anything in particular to "cache" connections. sql.DB keeps and reuses open connections in its pool by default.
Go automatically opens, closes and reuses DB connections with a *database/sql.DB. By default it keeps up to 2 idle connections open and opens an unlimited number of new connections under concurrency when all open connections are already busy.
If you need some fine tuning on pool efficiency vs database load, you may want to alter sql.DB config with .Set* methods, for example SetMaxOpenConns.
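As a rough sketch of that tuning, the snippet below applies three of the Set* methods to a pool. PoolLimits, applyLimits, stubConnector and newStubDB are hypothetical names for this example, the numbers are illustrative rather than recommendations, and the stub connector exists only so the code runs without a real MySQL or Postgres driver.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"time"
)

// PoolLimits groups the pool settings; the values used in main are
// illustrative assumptions, not recommendations.
type PoolLimits struct {
	MaxOpen     int           // cap on open connections (<= 0 means unlimited)
	MaxIdle     int           // idle connections kept for reuse (<= 0 keeps none)
	MaxIdleTime time.Duration // close connections idle longer than this (<= 0 means never)
}

// applyLimits sets the open-connection cap first, then the idle settings.
func applyLimits(db *sql.DB, l PoolLimits) {
	db.SetMaxOpenConns(l.MaxOpen)
	db.SetMaxIdleConns(l.MaxIdle)
	db.SetConnMaxIdleTime(l.MaxIdleTime)
}

// stubConnector is a stand-in so the sketch runs without a real driver.
// It never produces a usable connection.
type stubConnector struct{}

func (stubConnector) Connect(context.Context) (driver.Conn, error) {
	return nil, errors.New("stub connector: no real database")
}

func (stubConnector) Driver() driver.Driver { return nil }

// newStubDB returns a pool that never connects; real code would call
// sql.Open with a registered driver name and a DSN instead.
func newStubDB() *sql.DB { return sql.OpenDB(stubConnector{}) }

func main() {
	db := newStubDB()
	defer db.Close()

	applyLimits(db, PoolLimits{MaxOpen: 10, MaxIdle: 5, MaxIdleTime: 5 * time.Minute})
	fmt.Println("max open connections:", db.Stats().MaxOpenConnections)
}
```

In a multi-tenant setup the cap applies per *sql.DB, so the total number of connections a process can hold is roughly the cap multiplied by the number of databases it talks to.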
You seem to have too many unknowns. In cases like this I would apply good old agile: start with a prototype of what you want to achieve using tools that you already know, and then benchmark the performance. I think you might be surprised how much Go can handle.
Since you understand how to use map[string]*sql.DB for that purpose, I would go with that. You reach some limits? Add another machine behind haproxy. Solving a scaling problem doesn't necessarily mean writing a new DB pool in Go. Obviously, if you need this kind of power you can always do it; the pgx Postgres driver has its own pool implementation, so you can get your inspiration there. However, doing this right now seems to be premature optimization, solving a problem you don't have yet. Building a prototype with map[string]*sql.DB is easy: test it, benchmark it, and you will see if you need more.
P.S. You will most likely hit the file descriptor limit before you are able to exhaust memory.
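A minimal sketch of such a prototype follows, assuming one *sql.DB per database URL that is opened lazily and then reused. Registry, Opener, stubOpen and the example URLs are hypothetical names for illustration; the stub connector only lets the code run without a real driver, and real code would pass an Opener that wraps sql.Open.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"sync"
)

// Opener turns a database URL into a pool. Real code would wrap sql.Open
// with the right driver name; that mapping is left to the caller.
type Opener func(url string) (*sql.DB, error)

// Registry caches one *sql.DB (itself a connection pool) per database URL.
type Registry struct {
	mu   sync.Mutex
	open Opener
	dbs  map[string]*sql.DB
}

func NewRegistry(open Opener) *Registry {
	return &Registry{open: open, dbs: make(map[string]*sql.DB)}
}

// Get returns the cached pool for url, opening it on first use. Holding the
// lock while opening is acceptable here because sql.Open does not connect.
func (r *Registry) Get(url string) (*sql.DB, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if db, ok := r.dbs[url]; ok {
		return db, nil
	}
	db, err := r.open(url)
	if err != nil {
		return nil, err
	}
	r.dbs[url] = db
	return db, nil
}

// Len reports how many distinct databases currently have a pool.
func (r *Registry) Len() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.dbs)
}

// CloseAll closes every pool and returns the first error encountered.
func (r *Registry) CloseAll() error {
	r.mu.Lock()
	defer r.mu.Unlock()
	var first error
	for url, db := range r.dbs {
		if err := db.Close(); err != nil && first == nil {
			first = err
		}
		delete(r.dbs, url)
	}
	return first
}

// stubConnector is a stand-in so the sketch runs without a real driver.
type stubConnector struct{}

func (stubConnector) Connect(context.Context) (driver.Conn, error) {
	return nil, errors.New("stub connector: no real database")
}

func (stubConnector) Driver() driver.Driver { return nil }

// stubOpen ignores the URL and returns a pool that never connects.
func stubOpen(url string) (*sql.DB, error) { return sql.OpenDB(stubConnector{}), nil }

func main() {
	reg := NewRegistry(stubOpen)
	defer reg.CloseAll()

	a, _ := reg.Get("postgres://tenant-a.example/db")
	b, _ := reg.Get("postgres://tenant-a.example/db") // reuses the first pool
	reg.Get("mysql://tenant-b.example/db")

	fmt.Println(a == b, reg.Len()) // true 2
}
```

Because *sql.DB is safe for concurrent use, the mutex here only protects the map, not the queries that run on the pools it hands out.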
Assuming you have multiple users and multiple databases in an N-to-N relation, you could have a map from database URL to database details (explained below).
Which users have access to which databases has to be tracked anyway, for example in a ConfigMap or a core database. For the database details, we could have a struct like this:
// DBDetail pairs a connection pool with a lock that guards access to it.
type DBDetail struct {
	sync.RWMutex
	connection *sql.DB
}
The map would go from database URL to that database's details (DBDetail), and a user who is writing calls this:
dbDetail.Lock()
defer dbDetail.Unlock()
and for reads, use RLock and RUnlock instead.
As vearutop said, the number of connections could be a pain, but with this you could have a single connection or enforce a limit by incrementing and decrementing another variable after Lock.
There isn’t necessarily a correct architectural answer here. It depends on some of the constraints of the system.
I have the option of using map[string]*sql.DB where the key is the URL of the database, but it can hardly scale when we have a large number of databases.
Whether this will scale sufficiently depends on how numerous the databases are expected to be. If there are expected to be tens or hundreds of concurrent users in the near future, a map is probably sufficient. Often a good next step after using a map is to transition over to a more full-featured cache (for example https://github.com/dgraph-io/ristretto).
A factor in the decision of whether to use a map or a cache is how you imagine the lifecycle of a database connection. Once a connection is opened, can it remain open for the remainder of the lifetime of the process, or do connections need to be closed after minutes of no use to free up resources?
Should we have a sharding layer for each incoming request sharded by connection url, so each machine will contain just the right amount of database connections in the form of map[string]*sql.DB?
The right answer here depends on how many processing nodes are expected and whether there will be additional benefits from routing requests to specific machines. For example, row-level caching and isolating users from each other’s requests are advantages that would be gained by sharding users across the pool. But a disadvantage is that you might end up with “hot” nodes because a single user might generate a majority of the traffic.
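To make the routing idea concrete, here is a minimal sketch that maps a connection URL to one of n nodes by hashing it. shardFor and the example URLs are hypothetical; the choice of FNV-1a and plain modulo is an assumption made for brevity, not something the question prescribes.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a database URL to a node index in [0, n) using FNV-1a.
// n must be greater than zero.
func shardFor(url string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(url)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(n))
}

func main() {
	urls := []string{
		"postgres://tenant-a.example/db",
		"mysql://tenant-b.example/db",
		"postgres://tenant-c.example/db",
	}
	// The same URL always lands on the same node, so each node only needs
	// pools for the URLs routed to it.
	for _, u := range urls {
		fmt.Printf("%s -> node %d\n", u, shardFor(u, 4))
	}
}
```

Plain modulo hashing moves most URLs to a different node whenever n changes, and it does nothing about the hot-node problem described above, so it is a starting point rather than a finished routing layer.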
Usually, a good strategy for situations like this is to be really explicit about the constraints of the problem. A rule of thumb was coined by Jeff Dean for situations like this:
Ensure your design works if scale changes by 10X or 20X but the right solution for X [is] often not optimal for 100X
https://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf
So if, in the near future, the system needs to support tens of concurrent users, build the simplest design that will support tens to hundreds of concurrent users (a map or cache with no user sharding is probably sufficient). That design will have to change before the system can support thousands of concurrent users. Scaling a system is often a good problem to have because it usually indicates a successful project.

Double way API?

For the moment, all my customers are in the same db, same domain etc… on my majestic monolith on https://www.mystartup.com.
Let’s say I want to deploy an instance of my rails app for one big customer. And let’s say I may deploy other instances of this rails app in the future.
The thing is that I am fetching and computing some heavy data, and I want to do it once instead of in every instance. So I guess I should do that in https://secret-api.mystartup.com, and each of the instances should make requests to it with a secret access token.
But my issue is this: is there a way for https://secret-api.mystartup.com to trigger calls to each of the domains when needed? Is this what we call “webhooks”? Or is there some two-way API concept that I am missing?
One question you need to answer is what happens if this secret-api server needs to be restarted. You lose all that heavy computation.
Another problem with the above solution is that it goes against microservices architecture in a way, because you have a single server for secret-api. If it goes down, your whole system goes down. With microservices, for high availability you should always have multiple servers for the same API.
For such scenarios, when there is heavy lifting to be done, one solution could be to put an in-memory layer such as memcached or redis in between. Keep your results in this in-memory server and NOT inside a cache maintained inside your secret-api server. This would address both problems mentioned above.

Multiple application on network with same SQL database

I will have multiple computers on the same network with the same C# application running, connecting to a SQL database.
I am wondering if I need to use the service broker to ensure that if I update record A in table B on Machine 1, the change is pushed to Machine 2. I have seen applications that need to use messaging servers to accomplish this before but I was wondering why this is necessary, surely if they connect to the same database, any changes from one machine will be reflected on the other?
Thanks :)
This is mostly about consistency and latency.
If your applications always perform atomic operations on the database, and they always read whatever they need with no caching, everything will be consistent.
In practice, this is seldom the case. There are plenty of hidden opportunities for caching, like when you have an edit form: it holds the values the entity had before you started the edit process, but what if someone modified those in the meantime? You'd just overwrite their changes with your data.
Solving this is a bunch of architectural decisions. Different scenarios require different approaches.
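One common approach to the edit-form problem is optimistic concurrency: each row carries a version number, and a save only succeeds if the version is still the one that was read. The sketch below is written in Go with an in-memory map standing in for the table so that it runs on its own; Store, Record and ErrStale are hypothetical names. Against a real SQL database the same check is usually an UPDATE with the version in the WHERE clause, followed by a look at the affected row count.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrStale means someone else saved the record after the caller read it.
var ErrStale = errors.New("record changed since it was read")

// Record is a row plus the version counter used for the check.
type Record struct {
	Value   string
	Version int
}

// Store is an in-memory stand-in for a table; a missing row has version 0.
type Store struct {
	mu   sync.Mutex
	rows map[string]Record
}

func NewStore() *Store { return &Store{rows: make(map[string]Record)} }

// Read returns the current record and whether it exists.
func (s *Store) Read(id string) (Record, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	rec, ok := s.rows[id]
	return rec, ok
}

// Update writes value only if readVersion still matches the stored version,
// then bumps the version. Otherwise it leaves the row alone.
func (s *Store) Update(id, value string, readVersion int) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur := s.rows[id]
	if cur.Version != readVersion {
		return ErrStale
	}
	s.rows[id] = Record{Value: value, Version: cur.Version + 1}
	return nil
}

func main() {
	s := NewStore()
	s.Update("A", "original", 0) // create the row; it now has version 1

	// Two edit forms load the same record at the same version.
	form1, _ := s.Read("A")
	form2, _ := s.Read("A")

	fmt.Println("first save:", s.Update("A", "edit from machine 1", form1.Version))
	fmt.Println("second save:", s.Update("A", "edit from machine 2", form2.Version))
}
```

The second save fails with ErrStale instead of silently overwriting the first, and the application can then reload the record and let the user decide what to do.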
Once data is committed in the database, everyone reading it will see the same thing - but only if they actually get around to reading it, and the two reads aren't separated by another commit.
Update notifications are mostly concerned with invalidating caches, and perhaps some push-style processing (e.g. IM client might show you a popup saying you got a new message). However, SQL Server notifications are not reliable - there is no guarantee that you'll get the notification, and even less so that you'll get it in time. This means that to ensure consistency, you must not depend on the cached data, and you have to force an invalidation once in a while anyway, even if you didn't get a change notification.
Remember, even if you're actually using a database that's close enough to ACID, it's usually not the default setting (for performance and availability, mostly). You need to understand what kind of guarantees you're getting, and how to write code to handle this. Even the most perfect ACID database isn't going to help your consistency if your application introduces those inconsistencies :)

Advantage of LDAP over RDBMS?

I have an application with a database as its backend.
The application is sort of PUB-SUB model where users post changes to the application and other peers subscribe to those changes. These changes may happen very frequently or periodically and all the changes have to be written to database.
Now, I am being asked to find the possibility of replacing this RDBMS with LDAP. Probably they want unified DB for all applications but anyways I have to find the advantage/disadvantages of both approaches.
I cannot directly compare an RDBMS with LDAP, as I have almost no idea of LDAP, though I have tried to learn some.
I understand that LDAP is designed for directory access and is optimized for read access, so it is write once, read many. I have read that frequent writes will reduce the performance of an LDAP server, as each write triggers the indexing process.
Just to give a scenario with regard to indexing in LDAP: my table will have a few columns, say two, namely Name and Desc. In LDAP I suppose these would become two attributes, Name and Desc. In my scenario it's Desc which will be frequently updated. I assume Name will be indexed, so even if Desc changes frequently it won't trigger the indexing process.
One point worth mentioning is that the database will be hosted on some cloud platform.
I tried to find out the differences but could not find anything conclusive.
LDAP is a protocol; REST is a service based on HTTP (a protocol). So when the LDAP server is not supposed to be exposed to the internet, how do you want to get the data from it? As LDAP is the protocol, you would need direct access to the LDAP server. It's like a database server that you would not expose directly to the internet. You would build an interface to encapsulate it, and that might as well be a REST interface.
I'd try to get the point across that one is the transfer protocol and a storage backend, and the other is the public interface to its data. It's a bit like asking why MySQL is better than a web interface. You'd never make the MySQL server publicly available; you would encapsulate its protocol in an application.
REST is an interface. It doesn't matter how you organize your data behind that interface. When you decide that you want to organize it differently, you can do so without the consumer of your API noticing any change. And you can provide different versions of your API depending on improvements to your service.
LDAP, on the other hand, is an implementation. You can't change the way your data is handled without the consumer noticing it. So there's no way to rearrange your backend without affecting the consumer.
With REST you can therefore change the backend from MySQL to PostgreSQL or even to LDAP without the consumer noticing, which you cannot do when consumers talk to LDAP directly.
Hope that helps
Now that we finally know what you're actually asking, which has nothing to do with your title, the body of your question, or REST, the simple answer is that there is no particular reason to believe that an LDAP server will perform significantly better than an RDBMS in this application, with two riders:
it may not even be feasible, due to the schema issue, and
if it is feasible it may not be semantically suitable, due to the lack of ACID properties, lack of JOINs, and the other issues mentioned in comments.
I will state that this is one of the worst formulated questions I have seen here for some considerable time, and the difficulty of extracting the actual question was extreme.

Creating a different user for each concern of my application!

I want to create my site so that the forum pages use the forum MySQL user, which has privileges on mydb.forum_table, mydb_forum_table2,
and the profile page uses the profile user, which has access to mydb.users and mydb.profiefields,
and so on with the photo gallery, blog, chat and so on.
Is this the right way to do it? I'm thinking of the principle of least privilege, but I wonder why I haven't seen other big, well-known CMSs do it.
One of the critical resources for a database is connections. Generally databases are configured with a maximum number of connections, and each time a process needs to make a query, it needs a connection to do so. Database connections are expensive objects to create -- they take time and memory, and most importantly, connections are established for a specific user. The generally accepted 'best practice' for web applications is for the application, when it needs a database connection, to check a pool for an available connection. If there's a free connection in the pool, the web app will pull that connection, use it as necessary, and then return it to the pool for reuse. If there are no free connections, the app will create a new one, use it, and then place it in the pool for reuse.
If you're dealing with an application that uses multiple database users (for privilege management) and you need to use connection pooling, your application will need to establish many pools (one for each user), which will usually result in your application acquiring at least one connection for each database user it is using. This is inefficient, error prone, and needlessly complex.
If you're truly intent on limiting your application's access to data, then you should probably investigate how much support your database has for views. If views are well-supported, then you can create a view (or views) customized to the needs of any given portion of your application.
My recommendation would be to stick to a single database user, and then use the time you just freed up to do more debugging of your application. You'll get better results, and will aggravate fewer DBAs.
If I understand correctly, the question is about implementing module access control based on the permissions on the tables that are used by the module.
I think it would be complicated to maintain (the link between modules, and tables), and slow to have to check the permissions on each table accessed by the module.