Difference between @Multitenant(SINGLE_TABLE) and @Multitenant(VPD) in EclipseLink

Despite searching a lot on the internet and reading many articles, I am still unable to understand the difference between the SINGLE_TABLE strategy and the VPD strategy in EclipseLink.
At first I thought that VPD was the implementation of the "separate database" strategy, but then I discovered that the same table is used for all tenants in both strategies.
Could anyone clarify the difference between the two strategies please?

The main difference is the level where the filtering is done.
When using SINGLE_TABLE multitenancy, EclipseLink is responsible for including tenant_id in all generated queries.
When using VPD, filtering is done at the database level, so EclipseLink generates SQL queries that do not include tenant_id and the database takes care of the filtering.
Quote from the documentation:
VPD allows users to identify themselves as a specific user, and will be able to 'see' data specific to that user. All result limiting is done at the database level, removing the need to send special SQL containing an additional comparison.
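For illustration, here is a minimal sketch of how the two strategies are declared on an entity (the entity and column names below are made up, and the database-side Oracle VPD policy setup required for the second variant is not shown):

import javax.persistence.Entity;
import javax.persistence.Id;
import org.eclipse.persistence.annotations.Multitenant;
import org.eclipse.persistence.annotations.MultitenantType;
import org.eclipse.persistence.annotations.TenantDiscriminatorColumn;

// SINGLE_TABLE: EclipseLink appends "TENANT_ID = ?" to every query it generates.
@Entity
@Multitenant(MultitenantType.SINGLE_TABLE)
@TenantDiscriminatorColumn(name = "TENANT_ID", contextProperty = "eclipselink.tenant-id")
class Account {
    @Id
    private Long id;
    // ...
}

// VPD: EclipseLink generates plain SQL with no tenant predicate; the VPD policy
// defined on the table filters the rows for the current database user/context.
@Entity
@Multitenant(MultitenantType.VPD)
class VpdAccount {
    @Id
    private Long id;
    // ...
}

In both cases the tenant is supplied at runtime through the "eclipselink.tenant-id" persistence property (for example in the properties map passed when creating the EntityManagerFactory); the difference is only who applies the filter.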

Related

Apache Ignite cache to SQL and vice versa

I'm working on a system which will fetch data from a service and put pieces of the response in to a cache and/or into a SQL table.
The cache is needed for consumption by other Java services directly. These services require a more direct connection than the SQL abstraction, so we need to connect directly to the cache.
The table is needed for a JDBC SQL connection to external SQL clients e.g. SQL Workbench, DBeaver, Tableau, 3rd party systems.
My question is how Ignite works regarding caches vs. tables. I know it stores its caches as maps, similar to other IMDGs. What I don't understand is how that gets turned into a table, or what APIs are available to set/get between the two.
So the question is, how can I take an INSERT from the JDBC/SQL side and query it via the Cache? How can I add() into the Cache and SELECT it from the JDBC/SQL side? If I have a table named "foo", does that also create a cache named "foo"?
Or am I supposed to use one or the other and not bleed between the two? I haven't found many good examples of this, so it seems to be either you use caches or you use tables.
It would be extremely advantageous to have a bridge between the two. We're migrating to Ignite from an H2 implementation where we mushed a Hazelcast cache and H2's SQL together and are hoping Ignite, being built atop H2, has done something similar already.
In particular, I was hoping to use DataStreamers but I'm not finding much in the way of how it relates to the SQL/table side of things.
An Ignite cache is a key-value NoSQL store, but you can run SQL-like queries against Ignite caches from Java code, since Ignite supports that. For example,
SELECT _KEY, _VAL from "foo".val
Here, foo is your cache name and val is the value part of the key-value pair. Since this is all NoSQL, mapping it onto relational SQL is not a perfect fit, but you can think of the non-primary-key columns of a SQL table as the fields of your value object and the primary key as the key part.
So, with a data streamer, you can construct a collection of key-value objects and stream it; internally this is nothing more than put operations on the cache.
To select in SQL fashion, you can run a query like the one below:
SqlFieldsQuery query = new SqlFieldsQuery(queryString); // e.g. "SELECT name, age FROM Person"
FieldsQueryCursor<List<?>> cursor = cache.query(query); // cache is your IgniteCache instance
There are multiple ways to do this; SqlFieldsQuery is one of them.
This has been answered a couple of times already: basically you need to use query entities, indexed types, or the KEY_TYPE/VALUE_TYPE parameters of CREATE TABLE to make it work. That is, every entry of the correct type in the cache becomes a row of the table, and vice versa.
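To make that bridge concrete, here is a minimal self-contained sketch (the cache name "foo", the Person class, and its fields are illustrative, not taken from the question): a cache configured with indexed types is exposed as a SQL table, so a put() is visible to SELECT and a SQL INSERT is visible to get().

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.cache.query.annotations.QuerySqlField;
import org.apache.ignite.configuration.CacheConfiguration;

public class CacheSqlBridge {
    // Value class: @QuerySqlField exposes the fields as SQL columns.
    static class Person {
        @QuerySqlField
        String name;
        @QuerySqlField
        int age;
        Person(String name, int age) { this.name = name; this.age = age; }
    }

    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            CacheConfiguration<Long, Person> cfg = new CacheConfiguration<>("foo");
            cfg.setIndexedTypes(Long.class, Person.class); // exposes SQL table Person in schema "foo"
            IgniteCache<Long, Person> cache = ignite.getOrCreateCache(cfg);

            // Key-value side: a put() ...
            cache.put(1L, new Person("Alice", 30));

            // ... is immediately visible on the SQL side.
            SqlFieldsQuery qry = new SqlFieldsQuery("SELECT name, age FROM Person WHERE age > ?").setArgs(20);
            try (var cursor = cache.query(qry)) {
                for (List<?> row : cursor) {
                    System.out.println(row);
                }
            }

            // And the SQL side writes back into the same cache.
            cache.query(new SqlFieldsQuery("INSERT INTO Person (_key, name, age) VALUES (?, ?, ?)")
                    .setArgs(2L, "Bob", 40)).getAll();
            System.out.println(cache.get(2L).name); // prints "Bob"
        }
    }
}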

NHibernate, Caching and custom SQL queries

We're using NHibernate with Memcache as the second level cache. Occasionally there is a need for more advanced queries or bulk query operations. From the book Nhibernate in Action they recommend the following:
"It’s our view that ORM isn’t suitable for mass-update (or mass-delete) operations. If
you have a use case like this, a different strategy is almost always better: call a stored
procedure in the database, or use direct SQL UPDATE and DELETE statements for that
particular use case."
My concern is that queries run directly against the underlying database are not reflected in the cache (at least until cache expiry), and I was wondering if anyone has come up with effective strategies for mixing and matching NHibernate with custom SQL statements?
Is there any way of getting, say, a bulk UPDATE statement (executed with custom SQL) to be reflected in the second-level cache? I am aware of being able to manually evict, but this removes the items from the cache and therefore increases hits on the database.
Does the community have any solutions that have been found to be effective in dealing with this problem?
As far as I know there is no method to keep the second-level cache up to date with mass updates. But you can partially evict the cache as described in: http://www.nhforge.org/doc/nh/en/index.html#performance-sessioncache.
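In other words, the usual pattern is: run the bulk statement outside the ORM, then evict only the regions it touched. Purely as an illustration, and keeping to Java, here is roughly the shape of that pattern using Hibernate's API (which NHibernate closely mirrors); the Account table and column are invented, and exact method names vary between versions:

import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class BulkUpdateWithEviction {
    // Sketch: execute the bulk SQL directly, then drop the stale second-level
    // cache regions so subsequent reads go back to the database once.
    static void bulkUpdateAndEvict(SessionFactory sessionFactory, Class<?> entityClass) {
        Session session = sessionFactory.openSession();
        try {
            session.getTransaction().begin();
            session.createNativeQuery("UPDATE Account SET Active = 0").executeUpdate();
            session.getTransaction().commit();
        } finally {
            session.close();
        }
        // Evict just what the statement touched, not the whole cache.
        sessionFactory.getCache().evictEntityRegion(entityClass); // cached entities of that class
        sessionFactory.getCache().evictQueryRegions();            // cached query result sets
    }
}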

Allow some SQL in public API?

I'm exposing a more or less public API that allows the user to query datasets from a database. Since the user will need to filter out specific datasets I'm tempted to accept the WHERE-part of the SELECT statement as an API parameter. Thus the user could perform queries as complex as she'd like without worrying about a cluttered API interface.
I'm aware of the fact that I would have to catch SQL-injection attempts.
Do you think that this would circumvent the purpose of an API wrapping a database too much or would you consider this a sane approach?
In general, I'd recommend against letting them embed actual SQL in their requests.
You can allow them to submit where conditions in their request pretty easily:
<where>
  <condition field="name" operator="equal" value="Fred"/>
</where>
or something similar.
The value of doing this is multi-fold:
You can parse each condition and make sure it's correct before running it
You can create 'fake' fields, such as "full_name", that may not exist in the actual tables
You can limit the columns they can put conditions on
You can isolate the users from actual changes in your underlying database.
I think the last point is actually the most important. The day will come when you'll need to make changes to the underlying schema of the database. At that point you'll appreciate having a 'translation' layer between what the users send in and the queries that actually run.
The API should present an 'abstracted' version of the actual tables themselves that meet the users needs and isolate them from changes to the actual underlying database.
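As a sketch of that translation layer (the request format, field names, and operator set below are illustrative only), each submitted condition is mapped onto a whitelisted column and a parameterized comparison, so no user-supplied SQL ever reaches the database:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FilterApi {
    // A parsed <condition field=".." operator=".." value=".."/> element.
    record Condition(String field, String operator, String value) {}

    // Whitelist: API field name -> real column or expression. "full_name" is a 'fake' field.
    private static final Map<String, String> FIELDS = Map.of(
            "name", "customer_name",
            "full_name", "first_name + ' ' + last_name",
            "created", "created_at");

    private static final Map<String, String> OPERATORS = Map.of(
            "equal", "=", "greater", ">", "less", "<");

    static List<String> query(Connection con, List<Condition> conditions) throws SQLException {
        StringBuilder sql = new StringBuilder("SELECT customer_name FROM customers WHERE 1=1");
        List<String> args = new ArrayList<>();
        for (Condition c : conditions) {
            String column = FIELDS.get(c.field());
            String op = OPERATORS.get(c.operator());
            if (column == null || op == null) {
                throw new IllegalArgumentException("Unknown field or operator: " + c.field());
            }
            sql.append(" AND ").append(column).append(" ").append(op).append(" ?");
            args.add(c.value()); // the value only ever travels as a bind parameter
        }
        List<String> result = new ArrayList<>();
        try (PreparedStatement ps = con.prepareStatement(sql.toString())) {
            for (int i = 0; i < args.size(); i++) {
                ps.setString(i + 1, args.get(i));
            }
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    result.add(rs.getString(1));
                }
            }
        }
        return result;
    }
}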
I would recommend limiting your users' accounts by modifying the permissions to only allow SELECT on tables. Don't allow updating, inserting, or deleting recordsets. Lock down the user as much as possible, possibly at the table level.
If the WHERE clause is limited to only a few columns and the comparator is limited to >, = or < then perhaps you could just have the user pass in some extra parameters to represent columns and comparators. You then build the WHERE safely on your server side.
If this is too messy then by all means let them pass a full WHERE clause - it's not too hard to sanitise and if you combine that with running the query under a locked-down account (SELECT only), then any potential damage is limited.
Personally I would not want to allow users to be able to pass in SQL directly to my database, the risks are too great.
If you fail to catch all injection attempts you risk data theft, someone destroying your database, or it being hijacked for some other use that you really don't want.

CouchDB read authentication

How can I handle read authentication in CouchDB? I know roles can be defined in separate databases, but I want to implement read authentication at the document level. I am thinking about using node.js, but it does not seem an elegant solution because CouchDB also has an HTTP server and I don't want to add one more (or another application server like Ruby or Python). Is there anyone working on this?
Thanks.
In the recent O'Reilly web cast on CouchDB, J. Chris Anderson mentioned that read authentication was best handled by a combination of partial replication and multiple databases per reader group. Each database would contain only the documents pertaining to that specific group.
It makes the most sense when you think of each reader's CouchDB as a filtered instance of an authority database.
That's basically the correct answer. What I'd add is that document-level read control is hard to get right, especially in the presence of views. Filtering map rows at read time is doable, but not very IO-efficient. Generating reduction values based on filtered map rows, however, is prohibitively expensive.
For those reasons we encourage you to operate something like a database per access group, and make the entire database readable by all users.
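A common way to build those per-group databases is filtered replication from the authority database. Purely as an illustration (the database names, design document, and filter name are invented; the filter function itself is a small JavaScript function stored in a design document on the source), kicking off such a replication over CouchDB's _replicate HTTP API from Java might look like this:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FilteredReplication {
    public static void main(String[] args) throws Exception {
        // Replicate only the documents the "sales" group may read from the
        // authority database into that group's own database.
        String body = """
                {
                  "source": "http://localhost:5984/authority",
                  "target": "http://localhost:5984/readers_sales",
                  "create_target": true,
                  "filter": "auth/by_group",
                  "query_params": { "group": "sales" }
                }
                """;

        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:5984/_replicate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // replication status document
    }
}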

Ideas for Combining Thousand Databases into One Database

We have a SQL Server instance that has a database for each client, and we have hundreds of clients. So imagine the following: database001, database002, database003, ..., database999. We want to combine all of these databases into one database.
Our thoughts are to add a siteId column, 001, 002, 003, ..., 999.
We are exploring options to make this transition as smoothly as possible. And we would LOVE to hear any ideas you have. It's proving to be a VERY challenging problem.
I've heard of a technique that would create a view that would match and then filter.
Any ideas guys?
Create a client database id for each of the client databases. You will use this id to keep the data logically separated. This is the "site id" concept, but you can use a derived key (identity field) instead of manually creating these numbers. Create a table that has database name and id, with any other metadata you need.
The next step would be to create an SSIS package that gets the ID for the database in question and adds it to the tables whose data has to be separated out logically. You can then run that same package over each database, using the lookup to get the ID for the database in question.
After you have a unique id for each client's data and have imported the data, you will have to alter your apps to fit the new schema (actually before, or you are pretty much screwed).
If you want to do this in steps, you can create views or functions in the different "databases" so the old client can still hit the client's data, even though it has been moved. This step may not be necessary if you deploy with some downtime.
The method I propose is fairly flexible and can be applied to one client at a time, depending on your client application deployment methodology.
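If SSIS is not a hard requirement, the same per-client tagging step can also be scripted. Here is a rough JDBC sketch of that idea (the database, table, and column names are placeholders, and only one table is shown): it walks the mapping table and stamps each client's rows with their SiteId.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

public class TagClientData {
    public static void main(String[] args) throws Exception {
        // "ClientAdmin" holds the mapping table created beforehand:
        // CREATE TABLE ClientMap (SiteId INT IDENTITY PRIMARY KEY, DatabaseName SYSNAME)
        try (Connection con = DriverManager.getConnection(
                "jdbc:sqlserver://localhost;databaseName=ClientAdmin;integratedSecurity=true")) {

            Map<Integer, String> sites = new LinkedHashMap<>();
            try (Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery("SELECT SiteId, DatabaseName FROM ClientMap")) {
                while (rs.next()) {
                    sites.put(rs.getInt("SiteId"), rs.getString("DatabaseName"));
                }
            }

            for (Map.Entry<Integer, String> site : sites.entrySet()) {
                con.setCatalog(site.getValue()); // switch to this client's database
                try (Statement st = con.createStatement()) {
                    // Add the discriminator column; repeat for every table being consolidated.
                    st.executeUpdate("ALTER TABLE dbo.Orders ADD SiteId INT NULL");
                } catch (SQLException columnAlreadyExists) {
                    // Column was added on an earlier run; just stamp the rows below.
                }
                try (PreparedStatement upd = con.prepareStatement(
                        "UPDATE dbo.Orders SET SiteId = ? WHERE SiteId IS NULL")) {
                    upd.setInt(1, site.getKey());
                    upd.executeUpdate();
                }
            }
        }
    }
}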
Why do you want to do that?
You can read about Multi-Tenant Data Architecture and also listen to SO #19 (around 40-50 min) about this design.
The "site-id" solution is what's done.
Another possibility that may not work out as well (but is still appealing) is multiple schemas within a single database. You can pull common tables into a "common" schema, and leave the customer-specific stuff in a customer-specific schema. In some database products, however, each schema is -- effectively -- a separate database. In other products (Oracle and DB2, for example) you can easily write queries that work across multiple schemas.
Also note that -- as an optimization -- you may not need to add siteId column to EVERY table.
Sometimes you have a "contains" relationship. It's a master-detail FK, often defined with a cascade delete so that detail cannot exist without the parent. In this case, the children don't need siteId because they don't have an independent existence.
Your first step will be to determine if these databases even have the same structure. Even if you think they do, you need to compare them to make sure they do. Chances are there will be some that are customized or missed an upgrade cycle or two.
Now depending on the number of clients and the number of records per client, your tables may get huge. Are you sure this will not create a performance problem? At any rate, you may need to take a fresh look at indexing. You may need a much more powerful set of servers and may also need to partition by client anyway for performance.
Next, yes each table will need a site id of some sort. Further, depending on your design, you may have primary keys that are now no longer unique. You may need to redefine all primary keys to include the siteid. Always index this field when you add it.
Now all your queries, stored procs, views, and UDFs will need to be rewritten to ensure that the siteid is part of them. Pay particular attention to any dynamic SQL. Otherwise you could be showing client A's information to client B. Clients don't tend to like that.
We brought a client from a separate database into the main application one time (when they decided they didn't want to keep paying for a separate server). The developer missed just one place where client_id had to be added. Unfortunately, that sent emails to every client containing this client's proprietary information, and to make matters worse, it was a nightly process that ran in the middle of the night, so it wasn't known about until the next day. (The developer was very lucky not to get fired.)
The point is: be very, very careful when you do this and test, test, test, and test some more. Make sure to test all the automated behind-the-scenes stuff as well as the UI.
What I was explaining in Florence towards the end of last year applies if you have to keep the database names and the logical layer of the database the same for the application. In that case you'd do the following:
Collapse all the data into consolidated tables in one master, consolidated database (hereafter referred to as the consolidated DB).
Those tables would have to have an identifier like SiteID.
Create the new databases with the existing names.
Create views with the old table names that query the tables in the consolidated DB, using the SiteID to filter (row-level security).
Set up the databases for cross-database ownership chaining so that the service accounts can't "accidentally" query the base tables in the consolidated DB. Access must happen through the views or through stored procedures and other constructs that will enforce row-level security. Now, if it's the same service account for all sites, you can avoid the cross DB ownership chaining and assign the rights on the objects in the consolidated DB.
Rewrite the stored procedures to either handle the change (since they now refer to views and don't know to hit the base tables and include SiteID) or use INSTEAD OF triggers on the views to intercept update requests and put the appropriate site-specific information into the base tables.
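As a rough sketch of the "views with the old table names" step for a single client (the database names, table, columns, and SiteID value are placeholders), the shell-database view keeps the old name but reads the consolidated table filtered to that client's rows; it is issued over JDBC here since each shell database needs its own copy:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateShellViews {
    public static void main(String[] args) throws Exception {
        // Shell database "database042" belongs to the client whose SiteID is 42
        // in the consolidated DB. The view keeps the old name dbo.Orders so the
        // application keeps working, but it reads the consolidated table and
        // filters to this client's rows only.
        try (Connection con = DriverManager.getConnection(
                "jdbc:sqlserver://localhost;databaseName=database042;integratedSecurity=true");
             Statement st = con.createStatement()) {
            st.executeUpdate(
                "CREATE VIEW dbo.Orders AS " +
                "SELECT OrderId, CustomerId, OrderDate " + // everything except SiteID
                "FROM ConsolidatedDB.dbo.Orders " +
                "WHERE SiteID = 42");
            // Writes against the view would be handled by INSTEAD OF triggers or
            // rewritten stored procedures, as described above.
        }
    }
}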
If the data is large you could look at using a partitioned view. This would simplify your access code, as all you'd have to maintain is the view; however, if the data is not large, just add a column to identify the customer.
Depending on what the data is and your security requirements the threat of cross contamination may be a show stopper.
Assuming you have considered this and deem it "safe enough", you may need/want to create views or impose some other access control to prevent customers from seeing each other's data.
IIRC a product called "Trusted Oracle" had the ability to partition data based on such a key (about the time Oracle 7 or 8 was out). The idea was that any given query would automagically have "and sourceKey = #userSecurityKey" (or some such) appended. The feature may have been rolled into later versions of the popular commercial product.
To expand on Gregory's answer, you can also make a parent SSIS package that calls the package doing the actual moving inside a Foreach Loop container.
The parent package queries a config table and puts this in an object variable. The foreach loop then uses this recordset to pass variables to the package, such as your database name and any other details the package might need.
Your table could list all of your client databases and have a flag to mark when you are ready to move them. This way you are not sitting around running the SSIS package on 32,767 databases. I'm hooked on the Foreach Loop in SSIS.