Apache Ignite cache to SQL and vice versa

I'm working on a system which will fetch data from a service and put pieces of the response into a cache and/or into a SQL table.
The cache is needed for consumption by other Java services directly. These services require a more direct connection than the SQL abstraction, so we need to connect directly to the cache.
The table is needed for a JDBC SQL connection to external SQL clients e.g. SQL Workbench, DBeaver, Tableau, 3rd party systems.
My question is how Ignite works regarding caches vs tables. I know it stores its caches as maps, similar to other IMDGs. What I don't understand is how that gets turned into a table, or what APIs are available to set/get between the two.
So the question is, how can I take an INSERT from the JDBC/SQL side and query it via the Cache? How can I add() into the Cache and SELECT it from the JDBC/SQL side? If I have a table named "foo", does that also create a cache named "foo"?
Or am I supposed to use one or the other and not bleed between the two? I haven't found many good examples of this, so it seems to be either you use caches or you use tables.
It would be extremely advantageous to have a bridge between the two. We're migrating to Ignite from an H2 implementation where we mushed a Hazelcast cache and H2's SQL together and are hoping Ignite, being built atop H2, has done something similar already.
In particular, I was hoping to use DataStreamers but I'm not finding much in the way of how it relates to the SQL/table side of things.

An Ignite cache is a key-value store, like other NoSQL databases. You can still fire SQL-like queries at Ignite caches from Java code, since Ignite supports SQL. For example,
SELECT _KEY, _VAL FROM "foo".val
Here, foo is your cache name and val is the value type of the key-value pair (the table name defaults to the value class name). Since this is all NoSQL, the mapping to an RDBMS table is not entirely one-to-one, but roughly: every non-primary-key column of the SQL table maps to a field of your value object, and the primary key maps to the key part.
So, with a DataStreamer, you can construct a collection of key-value objects and stream it in. Internally this is nothing but put operations on the cache.
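For example, a minimal sketch of that (the cache name "foo", the Val class, and the sample entries are made up for illustration, not taken from the question) might look like this:

import java.util.HashMap;
import java.util.Map;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class StreamerSketch {
    // Hypothetical value class; your real value object would carry the fields of the service response.
    static class Val {
        String name;
        Val(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            ignite.getOrCreateCache("foo"); // ensure the cache exists for this sketch

            // Stream key-value pairs into the cache; internally this boils down to cache puts.
            try (IgniteDataStreamer<Long, Val> streamer = ignite.dataStreamer("foo")) {
                Map<Long, Val> fromService = new HashMap<>();
                fromService.put(1L, new Val("first"));
                fromService.put(2L, new Val("second"));
                fromService.forEach(streamer::addData);
            }
        }
    }
}

Note that for the streamed values to show up on the SQL side, the cache also needs query metadata (Query Entities or Indexed Types), which the next answer touches on.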
To select in SQL fashion, you can fire a query like the one below:
SqlFieldsQuery query = new SqlFieldsQuery(queryString); // queryString could be e.g. SELECT _KEY, _VAL FROM "foo".val
FieldsQueryCursor<List<?>> cursor = cache.query(query);
There are multiple ways to do this; SqlFieldsQuery is just one of them.

This has been answered a couple of times already: basically, you need to configure Query Entities, Indexed Types, or the KEY_TYPE/VALUE_TYPE parameters of CREATE TABLE to make it work. That is, every cache entry of the correct type becomes a row of the table, and vice versa.
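To make that concrete, here is a minimal sketch of the round trip, assuming a single local node with default settings; the table name Person, cache name personCache, and sample data are all invented for illustration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class CacheSqlBridgeSketch {
    public static void main(String[] args) throws Exception {
        // Start a local node; in your setup the node(s) would already be running.
        Ignite ignite = Ignition.start();

        // JDBC/SQL side: CREATE TABLE names the backing cache and the key/value types.
        Class.forName("org.apache.ignite.IgniteJdbcThinDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/");
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("CREATE TABLE IF NOT EXISTS Person (" +
                "id BIGINT PRIMARY KEY, name VARCHAR) " +
                "WITH \"CACHE_NAME=personCache,KEY_TYPE=java.lang.Long,VALUE_TYPE=Person\"");
            stmt.executeUpdate("INSERT INTO Person (id, name) VALUES (1, 'Alice')");
        }

        // Cache side: the row inserted over JDBC is visible as a cache entry
        // (kept in binary form, since there is no Person class on the classpath).
        IgniteCache<Long, Object> cache = ignite.cache("personCache").withKeepBinary();
        System.out.println("cache.get(1L) -> " + cache.get(1L));

        // And a cache put shows up on the SQL side.
        cache.put(2L, ignite.binary().builder("Person").setField("name", "Bob").build());
        List<List<?>> rows = cache.query(
            new SqlFieldsQuery("SELECT id, name FROM PUBLIC.Person")).getAll();
        rows.forEach(System.out::println);

        ignite.close();
    }
}

If you leave out CACHE_NAME, Ignite generates a cache name for the table (along the lines of SQL_PUBLIC_PERSON), so naming it explicitly tends to keep the cache-side code readable.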

Related

Degenerate a single SQL table into multiple domain tables

In short: I have a client who wishes to be able to add domain tables without adding SQL tables.
I am working with an application in which data is organized and made available through a PostgreSQL catalogue. What I mean by catalogue is that the database holds the path to the actual data file(s) as well as some metadata.
Adding a new table means that the (Java class of the) client application has to be updated. This is a costly process for the client, who wants us to find a way to let them add new kinds of data to the catalogue without having to change the schema.
I don't have many more specifics about the DB itself and its configuration, as I'm usually mostly a client of said DB.
My idea to solve this was to have a generic table with the most often used columns (like date, comment, etc.) and a column containing a domain key. The domain key would be used by the client application to request the kind of generic data it needs (and would have no meaning whatsoever to the DB provider). Adding metadata could be done with a companion file within the catalogue, and further filtering would have to be done on the client side.
Question: as I am by no means an SQL expert, I would like to know if this is an acceptable solution and what limitations I could be facing. I'm thinking of performance, data volume, etc. Or maybe a different approach is advisable?
Regarding expected volume: for a single domain data type, it could be around 30 new entries per day.
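For what it's worth, a rough sketch of that generic table (all table, column, and domain-key names, plus the connection details, are invented for illustration) could look like this, with the client filtering by its domain key:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class GenericCatalogueSketch {
    public static void main(String[] args) throws Exception {
        // Connection details are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/catalogue", "user", "password")) {

            try (Statement stmt = conn.createStatement()) {
                // One generic table instead of one table per domain.
                stmt.execute("CREATE TABLE IF NOT EXISTS generic_entry (" +
                    "  id BIGSERIAL PRIMARY KEY," +
                    "  domain_key VARCHAR(64) NOT NULL," + // meaningful only to the client app
                    "  entry_date DATE," +
                    "  comment TEXT," +
                    "  file_path TEXT)");                  // path to the actual data file
                stmt.execute("CREATE INDEX IF NOT EXISTS idx_generic_domain " +
                    "ON generic_entry (domain_key)");
            }

            // The client asks for one kind of data by its domain key and filters further itself.
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT entry_date, comment, file_path FROM generic_entry WHERE domain_key = ?")) {
                ps.setString(1, "BATHYMETRY"); // hypothetical domain key
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("file_path"));
                    }
                }
            }
        }
    }
}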

SQL Server and Oracle terminology

In SQL Server, if I have two applications and want to keep the databases completely separate, I can simply create one database for each application, so I end up with two databases.
If I wanted to do the same thing in Oracle, what do I need to create?
Do I create a new "Database", "Instance", "Schema", or "Tablespace" per application?
(Note: these two applications are the same application used by two different companies that do not share data!)
Reference: http://www.codeproject.com/Tips/492342/Concept-mapping-between-SQL-Server-and-Oracle
Having worked with SQL Server a lot in the past, I have sympathy with trying to figure out how Oracle organizes things, as I struggled with the same thing. My comments below are from SQL Server 2000 and 2003, so forgive me if things have changed since then.
Previous responders have been helpful. I think one problematic assumption here is that there is an exact "level" equivalency between SQL Server and Oracle. By "level" I mean something that occupies the same place in the hierarchies you have diagrammed above (which, by the way, I think are a good place to start but might need a bit of editing in a couple of places; for example, in how you have diagrammed "user" and "schema" in the Oracle hierarchy, I might put them side by side). I do not think these conceptual "levels" match exactly between the two DB platforms.
A schema in Oracle is somewhat equivalent to a separate database in SQL Server but not entirely.
I would say that the "walls" -- not an exact technical term but oh well -- between databases in SQL server are a bit higher than the "walls" between schemas in Oracle. Others might disagree but here is my reasoning:
a. A schema in Oracle is a purely logical construct. It denotes who has ownership of objects; it has nothing to do with the physical location or layout of the objects. A tablespace (an orthogonal concept, as noted by a previous poster) indicates the physical location of objects, and a tablespace can hold objects from multiple schemas and vice versa. In SQL Server these two concepts are more or less merged into one: a database is both tablespace and schema, although within a DB in SQL Server you can then have multiple owners with various object ownership. This can get a bit confusing because, as I remember (it's been a couple of years), if not using NT Authentication the users are defined at the server level and then have to be "linked" to the users in the individual DBs.
b. I remember finding it easier, or at least a bit simpler, to assure myself that the users of two separate DBs in SQL Server had no access to each other's DB than I have found it to be in Oracle.
c. Because a DB in SQL Server represents both physical storage and logical ownership, you can detach the DB, move it to another SQL Server instance, and attach it. You can't do this with a schema in Oracle. You can Data Pump the data out, or back it up and restore it to another server and another schema, but that takes at least some scripting, or at least a fair amount of clicking in Enterprise Manager. It doesn't give you the one-click "Detach DB" option that you have in SQL Server, which makes it much easier to think of SQL Server DBs as units that you can more or less move back and forth between servers.
To sum things up, I think either option would work. That is, 1) Create two separate instances of Oracle with one schema in each instance for each application, or 2) Create two separate schemas in one Oracle instance.
There are pros and cons for each option. Option 1 is probably going to be more work to set up and configure but will also give you more separation, independence, ability to have separate hardware, etc., for each DB. Option 2 will be quite a bit simpler but gives you less separation between the data and greater risk of configuration screw-ups or other things allowing users of one schema to access the other. It also means you have to be a bit more careful that someone writing a query accessing data in one schema doesn't use all the CPU and IO resources and starve a user on the other schema.
Also, yes, you could use pluggable databases in 12c. However, the fact that you need to ask these questions (no shame, just pointing out where you're at) makes me hesitant about recommending what can easily be a more complex setup.
TL;DR -- SQL Server isn't Oracle and Oracle isn't SQL Server. Either option works and there are pros and cons to each.
If you're using 12.1 or later with the multitenant option, you could create separate pluggable databases in a single container database. The other option, which works in any version of Oracle, would be to create a separate schema. It would be possible, as well, to create a separate database, though that is generally not the preferred approach unless you have a particular need to do things like upgrade the database that one application is using without affecting the other.
Creating a Database
If you create a separate database, you'd end up with complete separate memory structures (i.e. the SGA and PGA for each database would be separate) as well as a completely separate set of background processes (each database would have its own log writer process(es) for example). That is a very heavyweight option-- you can't have too many databases on a single server before you start having a lot of contention for RAM, for scheduling all the background processes, etc. It does provide for the maximum separation between different applications-- each database can be running a different version of Oracle with a different set of initialization parameters-- but this also tends to increase the complexity of managing the environment. This generally only makes sense when you have third party applications that require a specific version of the database or a specific set of initialization parameters.
Creating a Schema
If you create a separate schema, you still have a single database, so the two schemas share the same memory structures (competing with each other for space in the SGA's buffer cache, for example), initialization parameters, etc. You have to exercise a modicum of planning to ensure that the two don't interfere with each other; you'd probably want to make sure that neither application creates public synonyms, or at least that they won't want to create the same public synonym as the other application, but this is generally pretty trivial.
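Purely as an illustration of the separate-schema option (user names, password, connection URL, and the "users" tablespace are placeholders), creating two application schemas in one Oracle instance is just a matter of creating two users:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class TwoSchemasSketch {
    public static void main(String[] args) throws Exception {
        // Connect as a privileged user; URL, user, and password are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/ORCLPDB1", "system", "manager");
             Statement stmt = conn.createStatement()) {

            // In Oracle, "creating a schema" for an application means creating a user.
            for (String app : new String[] {"APP_COMPANY_A", "APP_COMPANY_B"}) {
                stmt.execute("CREATE USER " + app + " IDENTIFIED BY \"ChangeMe_123\"");
                stmt.execute("GRANT CREATE SESSION, CREATE TABLE, CREATE VIEW TO " + app);
                stmt.execute("ALTER USER " + app + " QUOTA UNLIMITED ON users"); // assumes the default USERS tablespace
            }
            // Neither user is granted anything on the other's objects, so the data stays separate.
        }
    }
}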
Creating a Pluggable Database
This only works in 12.1 and only if you have the multitenant option. This is the most similar to the SQL Server concept of creating a new database for each application.
You should create a new schema in the same database; the schema in Oracle is the closest equivalent of the SQL Server database.

How to synchronize between a custom KnowledgeSyncProvider and a SqlSyncProvider?

So, I need to synchronize two data stores, one of which is a SQL database, and I felt it was natural to use the built-in provider for that side. Unfortunately, I started running into trouble because the SqlSyncProvider doesn't use the ChangeDataRetriever and NotifyingChangeApplier, but instead communicates through a DbSyncContext object. Therefore, I had to derive from SqlSyncProvider and override mainly the GetChangeBatch and ProcessChangeBatch methods so they become compatible with the rest of the Sync Framework.
The trouble is that I believe I'm missing something in that transformation. The result is that when I create a row in the SQL database, synchronize to the other store, and then delete (or update) the row in the other store, the changes don't appear in the SQL database after syncing. The problem is probably caused by the bulkdelete stored procedure, which filters the delete table and separates rows that were created locally from rows created elsewhere.
Does anybody know what could cause this problem? I would really like to see some samples or documentation regarding synchronization between a SQL provider and a custom provider.

Is there downside to creating and dropping too many tables on SQL Server

I would like to know if there is an inherent flaw with the following way of using a database...
I want to create a reporting system with a web front end, whereby I query a database for the relevant data and send the results of the query to a new data table using "SELECT INTO". The program would then query that table to show a "page" of the report. This has the advantage that if there is a lot of data, it can be presented to the user a little at a time, as pages. The same data table can be accessed over and over while the user requests different pages of the report. When the web session ends, the tables can be dropped.
I am prepared to program around issues such as tracking the tables and ensuring they are dropped when needed.
I have a vague concern that over a long period of time, the database itself might develop some form of maintenance problem due to having created and dropped so many tables. Even day by day, let's say perhaps 1000 such tables are created and dropped.
Does anyone see any cause for concern?
Thanks for any suggestions/concerns.
Before you start implementing your solution consider using SSAS or simply SQL Server with a good model and properly indexed tables. SQL Server, IIS and the OS all perform caching operations that will be hard to beat.
The cause for concern is that you're trying to write code that will try and outperform SQL Server and IIS... This is a classic example of premature optimization. Thousands and thousands of programmer hours have been spent on making sure that SQL Server and IIS are as fast and efficient as possible and it's not likely that your strategy will get better performance.
First of all: +1 to @Paul Sasik's answer.
Now, to answer your question (if you still want to go with your approach).
Possible cause for concern if you use VARBINARY(MAX) columns (from MSDN):
If you drop a table that contains a VARBINARY(MAX) column with the FILESTREAM attribute, any data stored in the file system will not be removed.
If you do decide to go with your approach, I would use global temporary tables. They should get dropped automatically when there are no more connections using them, but you can still DROP them explicitly.
In your query you can check whether they exist and create them if they don't (any longer):
IF OBJECT_ID('tempdb..##temp') IS NULL
    -- create the global temp table (e.g. with SELECT ... INTO ##temp) and perform your query
This way, you keep most of the logic for performing your queries and managing the temporary tables together, which should make it more maintainable. Plus, temp tables are built to be created and dropped, so it's quite safe to assume SQL Server would not be impacted in any way by creating and dropping a lot of them.
1000 per day should not be a concern if you talk about small tables.
I don't know SQL Server, but in Oracle you have the concept of a temporary table. The data inserted into this type of table is available only in the current session; when the session ends, the data "disappears". In this case you don't need to drop anything. Every user inserts into the same table, and their data is not visible to others. Advantage: less maintenance.
You may want to check whether you have something similar in SQL Server.
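For illustration only (connection details and the source table/column names are made up), an Oracle global temporary table is created once and then each session sees only its own rows:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class TempTableSketch {
    public static void main(String[] args) throws Exception {
        // Connection details are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/ORCLPDB1", "report_app", "secret");
             Statement stmt = conn.createStatement()) {

            // DDL run once (not per session); rows live only for the current session.
            stmt.execute("CREATE GLOBAL TEMPORARY TABLE report_page (" +
                "  row_no NUMBER, payload VARCHAR2(4000)) ON COMMIT PRESERVE ROWS");

            // Each web session fills and reads its own private copy; no DROP needed afterwards.
            // some_big_table / some_column are placeholders for the real report source.
            stmt.execute("INSERT INTO report_page SELECT ROWNUM, some_column FROM some_big_table");
        }
    }
}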

Ideas for Combining Thousand Databases into One Database

We have a SQL server that has a database for each client, and we have hundreds of clients. So imagine the following: database001, database002, database003, ..., database999. We want to combine all of these databases into one database.
Our thoughts are to add a siteId column, 001, 002, 003, ..., 999.
We are exploring options to make this transition as smoothly as possible. And we would LOVE to hear any ideas you have. It's proving to be a VERY challenging problem.
I've heard of a technique that would create a view that would match and then filter.
Any ideas guys?
Create a client database id for each of the client databases. You will use this id to keep the data logically separated. This is the "site id" concept, but you can use a derived key (an identity field) instead of manually creating these numbers. Create a table that holds the database name and id, along with any other metadata you need.
The next step would be to create an SSIS package that gets the ID for the database in question and adds it to the tables whose data has to be logically separated. You can then run that same package over each database, looking up the ID for the database in question.
After you have a unique ID for the data and have imported it, you will have to alter your apps to fit the new schema (actually before, or you are pretty much screwed).
If you want to do this in steps, you can create views or functions in the different "databases" so the old client can still hit the client's data, even though it has been moved. This step may not be necessary if you deploy with some downtime.
The method I propose is fairly flexible and can be applied to one client at a time, depending on your client application deployment methodology.
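As a rough sketch of those first steps (the lookup table, the Orders table, and the connection string are all invented for illustration), the id table and the extra column might look like this:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SiteIdSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection to the consolidated SQL Server database.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sqlserver://localhost;databaseName=Consolidated;user=sa;password=secret");
             Statement stmt = conn.createStatement()) {

            // Lookup table: one row per original client database, id generated as an identity.
            stmt.execute("CREATE TABLE ClientDatabase (" +
                "  SiteId INT IDENTITY(1,1) PRIMARY KEY," +
                "  DatabaseName SYSNAME NOT NULL UNIQUE," +
                "  ImportedAt DATETIME2 NULL)");
            stmt.execute("INSERT INTO ClientDatabase (DatabaseName) VALUES ('database001')");

            // Every table whose data must stay logically separated gets the SiteId column.
            // 'Orders' is a hypothetical client table.
            stmt.execute("ALTER TABLE Orders ADD SiteId INT NOT NULL " +
                "CONSTRAINT DF_Orders_SiteId DEFAULT 0");
            stmt.execute("CREATE INDEX IX_Orders_SiteId ON Orders (SiteId)");
        }
    }
}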
Why do you want to do that?
You can read about Multi-Tenant Data Architecture and also listen to SO #19 (around 40-50 min) about this design.
The "site-id" solution is what's done.
Another possibility that may not work out as well (but is still appealing) is multiple schemas within a single database. You can pull common tables into a "common" schema and leave the customer-specific stuff in customer-specific schemas. In some database products, however, each schema is effectively a separate database. In other products (Oracle and DB2, for example) you can easily write queries that work across multiple schemas.
Also note that, as an optimization, you may not need to add a siteId column to EVERY table.
Sometimes you have a "contains" relationship. It's a master-detail FK, often defined with a cascade delete so that the detail cannot exist without the parent. In this case, the children don't need a siteId because they don't have an independent existence.
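Purely as an illustration (table names and connection details invented), the parent carries the SiteId and the contained detail rides along through its FK:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ContainsRelationshipSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sqlserver://localhost;databaseName=Consolidated;user=sa;password=secret");
             Statement stmt = conn.createStatement()) {

            // Parent carries the SiteId...
            stmt.execute("CREATE TABLE OrderHeader (" +
                "  OrderId INT PRIMARY KEY, SiteId INT NOT NULL)");

            // ...the contained detail does not need one: it cannot exist without its parent.
            stmt.execute("CREATE TABLE OrderLine (" +
                "  OrderLineId INT PRIMARY KEY," +
                "  OrderId INT NOT NULL REFERENCES OrderHeader (OrderId) ON DELETE CASCADE," +
                "  Quantity INT NOT NULL)");

            // To filter by site you join through the parent.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT l.OrderLineId, l.Quantity FROM OrderLine l " +
                    "JOIN OrderHeader h ON h.OrderId = l.OrderId WHERE h.SiteId = 1")) {
                while (rs.next()) {
                    System.out.println(rs.getInt("OrderLineId"));
                }
            }
        }
    }
}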
Your first step will be to determine if these databases even have the same structure. Even if you think they do, you need to compare them to make sure they do. Chances are there will be some that are customized or missed an upgrade cycle or two.
Now, depending on the number of clients and the number of records per client, your tables may get huge. Are you sure this will not create a performance problem? At any rate, you may need to take a fresh look at indexing. You may need a much more powerful set of servers and may also need to partition by client anyway for performance.
Next, yes, each table will need a site id of some sort. Further, depending on your design, you may have primary keys that are no longer unique. You may need to redefine all primary keys to include the siteid. Always index this field when you add it.
Now all your queries, stored procs, views, and UDFs will need to be rewritten to ensure that the siteid is part of them. Pay particular attention to any dynamic SQL. Otherwise you could be showing client A's information to client B. Clients don't tend to like that. We brought a client from a separate database into the main application one time (when they decided they didn't want to keep paying for a separate server). The developer missed just one place where client_id had to be added. Unfortunately, that sent emails concerning this client's proprietary information to every client, and to make matters worse, it was a nightly process that ran in the middle of the night, so it wasn't known about until the next day. (The developer was very lucky not to get fired.) The point is: be very, very careful when you do this, and test, test, test, and test some more. Make sure to test all the automated behind-the-scenes stuff as well as the UI.
What I was explaining in Florence towards the end of last year applies if you have to keep the database names and the logical layer of the database the same for the application. In that case you'd do the following:
Collapse all the data into consolidated tables in one master, consolidated database (hereafter referred to as the consolidated DB).
Those tables would have to have an identifier like SiteID.
Create new databases with the existing names.
Create views with the old table names which query the tables in the consolidated DB but use the SiteID to filter, effectively giving you row-level security (a sketch of such a view follows this list).
Set up the databases for cross-database ownership chaining so that the service accounts can't "accidentally" query the base tables in the consolidated DB. Access must happen through the views, or through stored procedures and other constructs that will enforce row-level security. Now, if it's the same service account for all sites, you can avoid the cross-DB ownership chaining and assign the rights on the objects in the consolidated DB directly.
Rewrite the stored procedures to either handle the change (since they now refer to views and don't know to hit the base tables and include SiteID), or use INSTEAD OF triggers on the views to intercept update requests and put the appropriate site-specific information into the base tables.
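Here is a rough sketch of one such per-client view (all names and the SiteID value are invented; the real setup would derive the filter from row-level security or the connecting account rather than hard-coding it):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PerClientViewSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection to the per-client database that kept its original name.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sqlserver://localhost;databaseName=database001;user=sa;password=secret");
             Statement stmt = conn.createStatement()) {

            // The view keeps the old table name the client app expects, but reads the
            // consolidated DB and pins this client's SiteID (1 here, purely illustrative).
            stmt.execute(
                "CREATE VIEW dbo.Customers AS " +
                "SELECT c.CustomerId, c.Name " +
                "FROM Consolidated.dbo.Customers AS c " +
                "WHERE c.SiteID = 1");
        }
    }
}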
If the data is large you could look at using a partitioned view. This would simplify your access code, as all you'd have to maintain is the view; however, if the data is not large, just add a column to identify the customer.
Depending on what the data is and your security requirements, the threat of cross-contamination may be a showstopper.
Assuming you have considered this and deem it "safe enough", you may need/want to create VIEWs or impose some other access control to prevent customers from seeing each other's data.
IIRC a product called "Trusted Oracle" had the ability to partition data based on such a key (about the time Oracle 7 or 8 was out). The idea was that any given query would automagically have "and sourceKey = #userSecurityKey" (or some such) appended. The feature may have been rolled into later versions of the popular commercial product.
To expand on Gregory's answer, you can also make a parent SSIS package that calls the package doing the actual moving inside a Foreach Loop container.
The parent package queries a config table and puts the result in an object variable. The Foreach Loop then uses this recordset to pass variables to the child package, such as the database name and any other details the package might need.
Your config table could list all of your client databases and have a flag to mark when you are ready to move them. This way you are not sitting around running the SSIS package on 32,767 databases. I'm hooked on the Foreach Loop in SSIS.