GridGain SQL queries without data model + other GridGain SQL questions

I have been checking out GridGain for a while and came across some features of its SQL capabilities that raised a few questions I couldn't find a firm answer to in the docs.
From the examples, there is always an explicit data model. I am using Java, so that means there's always a class definition of the model to be queried. The examples in the API docs (http://atlassian.gridgain.com/wiki/display/GG60/SQL,+Scan,+And+Full+Text+Queries) begin by showing how properties must be annotated for SQL querying, for example with "@GridCacheQuerySqlField", which suggests to me that an explicit model is always required. Is an explicit data model always required? Ideally, I would like a way to avoid stating the model explicitly, as my use case changes often and has complex relations.
What subset of SQL queries can be performed through GridGain's SQL API? My use cases often require very complex queries. For example, the docs (same link as above) state that "Continuous Queries cannot be used with SQL. Only predicate-based queries are supported." Where can I find which subset of SQL is supported, and under what conditions? In the quoted example, continuous queries work only when they are predicate-based.
Thanks in advance for the insight

GridGain supports a non-fixed data model in the Enterprise edition, namely portable objects. Portable objects let you represent the data model as a map-like nested structure, which allows dynamic structure changes, indexing, and portability across different languages (Java, C#, .NET). You can take a look at the portable objects examples in the GridGain Enterprise edition and read the documentation here: http://entdoc.gridgain.org/latest/Portable+Cross+Platform+Objects In the open-source version, an explicit class definition is always required.
The SQL limitations are described in GridCacheQuery javadoc: http://gridgain.com/sdk/6.5.0/javadoc/org/gridgain/grid/cache/query/GridCacheQuery.html
Group by and sort by statements are applied separately on each node, so the result set will likely be incorrectly grouped or sorted once results from multiple remote nodes are combined.
Aggregation functions like sum, max, avg, etc. are also applied on each node. Therefore you will get several results containing aggregated values, one for each node.
Joins will work correctly only if joined objects are stored in collocated mode or at least one side of the join is stored in REPLICATED cache.
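
To make the per-node behaviour concrete, here is a minimal sketch in plain Python of what a client would have to do with such partial results. It does not use the GridGain API; the node data, `merge_sorted` and `merge_aggregates` are hypothetical names made up for illustration.

```python
import heapq

# Hypothetical rows returned by two nodes for an ORDER BY query.
# Each node's rows are sorted, but simply concatenating them is not.
node_rows = [
    [1000, 3000, 5000],  # node A
    [2000, 6000],        # node B
]

# Hypothetical per-node aggregates as (sum, count, max) tuples.
node_aggregates = [
    (9000, 3, 5000),  # node A
    (8000, 2, 6000),  # node B
]


def merge_sorted(per_node_rows):
    """Merge individually sorted per-node lists into one sorted list."""
    return list(heapq.merge(*per_node_rows))


def merge_aggregates(per_node_aggregates):
    """Combine per-node (sum, count, max) tuples into global values."""
    total = sum(s for s, _, _ in per_node_aggregates)
    count = sum(c for _, c, _ in per_node_aggregates)
    maximum = max(m for _, _, m in per_node_aggregates)
    # The global average must come from the global sum and count;
    # averaging the per-node averages would give 3500 here, not 3400.
    return {"sum": total, "avg": total / count, "max": maximum}


merged_rows = merge_sorted(node_rows)
merged_stats = merge_aggregates(node_aggregates)
```

Sums and maxima combine directly, but an average only combines correctly if each node also reports its row count.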

Can I use straight SQL in Django models?

And, if I can, does that mean I lose my advantage of treating the results as objects? I find complex queries confusing in many ORMs, not just Django's. But, it is probably because I have never really used an ORM. Does anyone use straight up SQL anymore?
edit: Am I defeating the purpose of having a framework if I bypass the ORM completely? They all have a "nifty" ORM, but when it comes to queries with lots of subqueries, derived tables, it doesn't look pretty.
Using Django's QuerySet API you have different possibilities:
You can use extra(), which returns a queryset that evaluates to model objects. It is therefore, as the name says, somewhat limited, because returning model instances requires, for example, querying the model's table. But you can add additional SQL, e.g. for the WHERE or ORDER BY clause. Querysets that use extra() can still use the features of the ORM, such as chaining multiple filter() calls.
raw() returns a RawQuerySet, which can also be iterated over to get model instances, but you lose a lot of the features that the ORM would normally provide.
And of course you can execute SQL directly, using the low-level connection cursor API (no model instances, of course).
Study the documentation on raw queries. It contains a lot of information on, e.g., how to map a model's fields onto the data coming from a raw query, and it documents a few gotchas when passing parameters into the query.
To also answer your edited question: I wouldn't use raw SQL when you can do it with the ORM, but of course the ORM is limited, and if you need to do more complex things you will have to switch to SQL (though sometimes extra() is enough, so you can still use the advantages of the ORM). Don't forget that the ORM works with every DB backend, while custom SQL might not work with every database.
You can use raw SQL to either return objects; or if you want you can bypass the ORM completely.

Why is dynamical selection of column & table names so difficult in SQL?

I figure there has to be a specific design reason why you can't write a query like the following one:
select
(select column_name
from information_schema
where column_name not like '%rate%'
and table_name = 'Fixed_Income')
from Fixed_Income
and instead have to resort to dynamic SQL.
Does anyone know what that reason is? I tried Googling it, but all the hits were cries for help in solving the problem -- meaning it's a pretty widespread need and not well understood.
The reason is that the query optimizer needs to know the exact schema objects you are referring to at compile time. It needs them to optimize the query. You wouldn't believe how slow the RDBMS would be without having this information available to the query optimizer.
It's a little like the performance difference of static vs. dynamic typing in practice: There is usually a non-trivial difference (I'm thinking just about mainstream languages here). The compiler can exploit the static information to generate great code.
Even if this feature was present, it would be implemented by first computing the table and column names and then doing a standard "static" query planning.
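
A minimal sketch of that two-step approach, using Python's built-in sqlite3 module rather than the asker's database. The table contents and the helpers `quote_identifier` and `select_columns_not_like` are hypothetical, and SQLite's `PRAGMA table_info` stands in for an information_schema lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Fixed_Income "
    "(id INTEGER, issuer TEXT, coupon_rate REAL, yield_rate REAL)"
)
conn.execute("INSERT INTO Fixed_Income VALUES (1, 'ACME', 0.05, 0.04)")


def quote_identifier(name):
    """Quote an identifier for SQLite by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def select_columns_not_like(conn, table, fragment):
    """Step 1: read column names from the catalog. Step 2: run a static query."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    info = conn.execute(f"PRAGMA table_info({quote_identifier(table)})").fetchall()
    columns = [row[1] for row in info if fragment not in row[1].lower()]
    if not columns:
        raise ValueError("no columns left to select")
    column_list = ", ".join(quote_identifier(c) for c in columns)
    # By the time this statement reaches the engine, every name is fixed,
    # so it is planned like any other static query.
    sql = f"SELECT {column_list} FROM {quote_identifier(table)}"
    return columns, conn.execute(sql).fetchall()


columns, rows = select_columns_not_like(conn, "Fixed_Income", "rate")
```

Identifiers cannot be passed as bound parameters, which is why the names are quoted and spliced into the statement text before it is compiled.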
You ask a very interesting question.
The "relational" in "relational algebra" refers to name-value pairs, not to relationships between tables. In relational algebra, there is no requirement that all records in a set (table) have the same columns.
My best guess is that this is where the idea of entity-relationship diagrams comes into play. A database is designed around tables, and these tables have relationships to each other. A relational database was chosen for data storage and access specifically when the data could be stored this way. Knowing the entities and their attributes suggests a static form of the data and hence static references in queries.
In addition, SQL as a language is a declarative language rather than a procedural language. This suggests -- but does not impose -- a compilation step separate from the running of the query. In general, the SQL engine does the following (at a very high level):
1. Compiles the query, generally into some sort of data flow process.
2. Optimizes the data flow process. (Typically part of the compilation process.)
3. Runs the query.
The first two result in what is called "the query plan". You really cannot do optimization, though, unless you know about the objects you are operating on. So, dynamically choosing tables and columns means that optimization would be part of running the query rather than compiling it.
Finally, some databases like SQL Server support dynamic SQL. This allows you to build strings that get compiled and run at the same time. This is very useful for complex decision support queries. It is not recommended when you need fast transaction throughput, because the overhead for compilation is too high relative to the query.

Confused about the role of a query language

So, I haven't had any luck finding any articles or forum posts that explain how exactly a query language works in conjunction with a general-purpose programming language like C++ or VB. So I guess it won't hurt to ask >.<
Basically, I've been having a hard time understanding what the roles of the query language are (we'll use SQL as the example query language and VB6 as the normal language) if I'm creating a simple database query that fills a table with ordinary information (first name, last name, address, etc.). I somewhat know the steps in setting up a program like this using ADO objects for the connection and whatnot, but how do we decide which of the two languages gets used for certain things? Does VB6 specifically handle the basics like loops, if/else, and variable declarations, while SQL specifically handles things like connecting to the database and doing the searching, filtering and sorting? Is it possible to do certain general-purpose VB6 actions (loops or conditionals) in SQL syntax instead? Any help would be GREATLY appreciated.
SQL is a language for querying a database. SQL is an ISO standard; relational database vendors implement the ISO standard and then add their own customizations. For example, in SQL Server the dialect is called T-SQL and in Oracle it is called PL/SQL. Both implement the ISO standard, so a simple select looks identical in each, for example
select columnname from tablename where columnname=1
However, each has different syntax for string functions, date functions, etc.
The ISO SQL standard is by design not a full procedural language with looping, subroutines, etc., the way a language like VB is.
However, each vendor has extended its version to add some of this functionality.
For example, both T-SQL and PL/SQL can "loop" through records using various constructs in their language.
There is also a difference in how you work with data that many developers are not well attuned to: set-based operations vs. procedural operations.
Databases can work with procedural constructs but are often more performant with set-based ones. A developer who is not versed in this concept may end up creating a very inefficient query. Here's an example of this discussion.
In any situation you have to weigh the pros and cons of where it is best to do this work.
I tend to favor using procedural constructs such as loops in the language I am using over SQL. I find it easier to maintain and the language I am using offers more powerful syntax for me to get the job done.
However, I keep both options as a tool in the toolbox. For example, I have written data conversion scripts in SQL and in this case I have used the looping constructs in SQL.
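
The set-based vs. procedural distinction can be sketched with Python's built-in sqlite3 module. The `products` table and the function names below are hypothetical, chosen only for illustration; Python plays the role of the host language here instead of VB6.

```python
import sqlite3


def make_db():
    """Create a small in-memory table to run both variants against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL)")
    conn.executemany(
        "INSERT INTO products (id, price) VALUES (?, ?)",
        [(1, 10.0), (2, 20.0), (3, 30.0)],
    )
    return conn


def raise_prices_procedural(conn, factor):
    """Row by row: fetch every row, then issue one UPDATE per row."""
    rows = conn.execute("SELECT id, price FROM products").fetchall()
    for product_id, price in rows:
        conn.execute(
            "UPDATE products SET price = ? WHERE id = ?",
            (price * factor, product_id),
        )


def raise_prices_set_based(conn, factor):
    """Set-based: a single statement describes the change for all rows."""
    conn.execute("UPDATE products SET price = price * ?", (factor,))


def prices(conn):
    """Return all prices ordered by id."""
    return [p for (p,) in conn.execute("SELECT price FROM products ORDER BY id")]
```

Both functions leave the table in the same state, but the set-based version sends one statement and lets the database decide how to apply it.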
Usually programming languages are executed on the client side (or on an app server), and query languages are executed on the DB server, so in the end it depends on where you want to put the work. Sometimes you can put a lot of the work on the client side by doing all the calculations in the programming language; other times you want to lean more on the DB server, and you end up using the query language or, even better, tsql/psql or whatever.
Relational databases are designed to manage data. In particular, they provide an efficient mechanism for managing memory, disk, and processors for large quantities of data. In addition, relational databases can handle multiple clients, guarantee transactional integrity, security, backups, persistence, and numerous other functions.
In general, if you are using an RDBMS with another language, you want to design the data structure first and then think about the API (applications programming interface) between the two. This is particularly true when you have an app/server relationship.
For a "simple" type of application, which uses a lot of data but with minimal or batch changes to it, you want to move as much of the processing into the database as is reasonable. Here are things you do not want to do:
Use queries to load things into arrays, and then do array manipulations at the language level. SQL provides joins for this.
Load data into an array and do manipulations and summaries on the array. SQL provides aggregations for this.
Save data into a file to have a backup. Databases provide backup mechanisms.
If your data fits into an array or an Excel spreadsheet, it is often sufficient to get started with the data stored there. Only when your needs start to expand (multiple clients, security, integration with other data) do the advantages of a database become more apparent.
These are just for guidance and to give you some ideas.
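
As a rough illustration of the first two points in the list above, the sketch below lets the database do the join and the summary. It uses Python's built-in sqlite3 module; the `customers` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 10.0), (2, 1, 15.0), (3, 2, 7.5)],
)

# One query does the join and the aggregation on the database side,
# so only the small summary travels back to the application.
SUMMARY_SQL = """
SELECT c.name, COUNT(*) AS order_count, SUM(o.amount) AS total
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.name
ORDER BY c.name
"""

summary = conn.execute(SUMMARY_SQL).fetchall()
```

The alternative of loading both tables into arrays and matching them up in application code produces the same summary with more code and more data transferred.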
In terms of doing what where, do as much as is sensible in SQL (given it runs on a server) as you can.
So for instance don't do stuff like this (pseudocode)
    foreach(row in "Select * from Orders")
        if (row[CustomerID] = 876)
            Display(row)
Do
    foreach(row in "Select * from Orders where CustomerId = 876")
        Display(row)
First, it's likely Orders is indexed by CustomerID, so the database will find all of customer 876's orders much more quickly.
Second, to do the first one you have just pulled every record in that table into the client's memory space, probably across your network.
Which language is used is essentially irrelevant; you could invent your own DBMS with its own language.
It's where you do what processing that matters. It's a rule with exceptions, but the essential idea is to let your backend do as much as it can.
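
A runnable version of the Orders comparison, sketched with Python's built-in sqlite3 module. The table layout, the index name and the two function names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER)")
conn.execute("CREATE INDEX idx_orders_customer ON Orders (CustomerID)")
conn.executemany(
    "INSERT INTO Orders (OrderID, CustomerID) VALUES (?, ?)",
    [(1, 876), (2, 12), (3, 876), (4, 99)],
)


def orders_filtered_on_client(conn, customer_id):
    """Fetch every row, then throw most of them away in application code."""
    all_rows = conn.execute("SELECT OrderID, CustomerID FROM Orders ORDER BY OrderID")
    return [row for row in all_rows if row[1] == customer_id]


def orders_filtered_in_sql(conn, customer_id):
    """Let the database filter, so it can use the index on CustomerID."""
    return conn.execute(
        "SELECT OrderID, CustomerID FROM Orders WHERE CustomerID = ? ORDER BY OrderID",
        (customer_id,),
    ).fetchall()
```

Both functions return the same rows; the difference is how many rows leave the database to get there.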

searching DBMS with hierarchical structure

Is there any open-source hierarchical database or emulation atop of existing RDBMS?
I am looking for a DBMS (or a plugin for an existing RDBMS) which can store hierarchical data and allows queries on hierarchical data (something like "SELECT LEVEL ... CONNECT BY ...", "SELECT PARENT ..." for example). I know there is some support in Oracle, but is there a more complete solution?
There isn't a standardized plugin for doing this. I've looked more than once. However, there are a number of options. See from my earlier question on the same topic:
What are the options for storing hierarchical data in a relational database?
In short, if you're using a table with ID and ParentID (a.k.a. adjacency list), you use Common Table Expressions with most databases (Oracle's CONNECT BY being one of the most notable exceptions). On the other hand, something like materialized path or nested sets may be a better fit for your situation, for instance if you need to easily find "lineage", which is an expensive operation with an adjacency list.
Usually what ends up happening with a system that needs to work extensively with hierarchical data, for instance a CMS, is that it implements more than one of these solutions. The assumption is reads heavily outweigh writes.
Have you tried the Nested Set model? http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
Relational data doesn't directly support hierarchies in the way that an inherently hierarchical structure like XML does. You have to use a data model such as nested sets or a straight self-join to model the hierarchy.
Depending on the type of system you have, Common Table Expressions will let you run hierarchical queries on data. CTEs are supported by SQL Server versions since 2005, recent versions of DB2 and PostgreSQL, and probably some other systems. CTEs are a bit more fiddly than CONNECT BY, but they do run on a fair variety of platforms.
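
For reference, a minimal sketch of a recursive CTE over an adjacency list, run here through Python's built-in sqlite3 module (SQLite also supports WITH RECURSIVE). The `category` table, its rows and the `subtree` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?)",
    [
        (1, None, "Electronics"),
        (2, 1, "Televisions"),
        (3, 1, "Portable"),
        (4, 3, "MP3 Players"),
        (5, None, "Books"),
    ],
)

# The anchor member selects the root row; the recursive member repeatedly
# joins children onto rows already found, tracking depth as "level".
SUBTREE_SQL = """
WITH RECURSIVE subtree(id, name, level) AS (
    SELECT id, name, 1 FROM category WHERE id = ?
    UNION ALL
    SELECT c.id, c.name, s.level + 1
    FROM category AS c
    JOIN subtree AS s ON c.parent_id = s.id
)
SELECT id, name, level FROM subtree ORDER BY level, id
"""


def subtree(conn, root_id):
    """Return (id, name, level) for the root and all of its descendants."""
    return conn.execute(SUBTREE_SQL, (root_id,)).fetchall()
```

The `level` column plays roughly the role of LEVEL in Oracle's CONNECT BY syntax.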

Is SQL the ''assembler'' of the NoSQL database world?

I recently came across http://www.fossil-scm.org/index.html/doc/tip/www/theory1.wiki by D. Richard Hipp, the developer responsible for SQLite.
It got me thinking: is Fossil the only NoSQL database that uses SQL?
Do others use SQL as a 'High Level Scripting Language'?
From the article, it sounds like Fossil isn't a database any more than git is a database. Yes, it's a thing that contains data, and yes, it's backed by a database, but it seems pretty far from a database itself. So the first part of your question basically relies on a faulty assumption. There is a database called Friendly which uses MySQL to store schema-less models, but it seems like an awkward band-aid sort of solution at best.
I'm certainly not familiar with all of the NoSQL options out there, but, to my knowledge, none of the well-thought-of ones use SQL for anything. MongoDB and CouchDB, the two I'm most familiar with, both use JavaScript as part of their query interface, though in very different ways. MongoDB has queries more like what you'd expect from a relational database: you can write an arbitrary query for all documents that match a certain set of attributes. However, unlike a relational database, there's no such thing as a join (you'll only ever get a list of distinct documents back, not compound documents) and you can write arbitrary JavaScript code to select documents. CouchDB, on the other hand, does not allow arbitrary queries. Instead, you create views (which are essentially simpler key-value stores) using map/reduce functions written in JavaScript and then query those views from a start key to an end key.
In both cases, the type of information being transmitted to the server to perform the query isn't well-suited for the type of problem that SQL is good at solving. The trade-off to SQL being so high-level (to use the logic of the author of the paper) is that it's only suitable for a very narrow set of problems.
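
To illustrate the view idea described above, here is a toy sketch in plain Python. It is not CouchDB's API (CouchDB views are written in JavaScript and queried over HTTP), and it shows only the map step, without a reduce. The documents and the names `map_posts_by_author`, `build_view` and `query_view` are hypothetical.

```python
import bisect

documents = [
    {"_id": "a", "type": "post", "author": "alice", "title": "Hello"},
    {"_id": "b", "type": "post", "author": "bob", "title": "Views"},
    {"_id": "c", "type": "comment", "author": "carol"},
    {"_id": "d", "type": "post", "author": "dave", "title": "Joins"},
]


def map_posts_by_author(doc):
    """Map function: emit (key, value) pairs for the documents of interest."""
    if doc.get("type") == "post":
        yield doc["author"], doc["title"]


def build_view(docs, map_fn):
    """Run the map function over every document and keep the output sorted by key."""
    return sorted(pair for doc in docs for pair in map_fn(doc))


def query_view(view, start_key, end_key):
    """Return the rows whose key lies in [start_key, end_key] via binary search."""
    keys = [key for key, _ in view]
    lo = bisect.bisect_left(keys, start_key)
    hi = bisect.bisect_right(keys, end_key)
    return view[lo:hi]


view = build_view(documents, map_posts_by_author)
```

The only question the view can answer is a key range lookup, which is the sense in which arbitrary queries are not available.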
The creator of Fossil / SQLite is working on and pushing UnQL as the NoSQL standard:
UnQL means Unstructured Query Language.
It's an open query language for JSON, semi-structured and document databases.
It looks like a stripped down version of SQL.