Limitation of SQL in BigQuery vs Cloud Spanner

While comparing Cloud Spanner and BigQuery, I am trying to figure out what limitations BigQuery's SQL has compared to ANSI SQL (SELECT part only).
Does BigQuery support all the complex joins of ANSI SQL?
Additionally, is there anything that Cloud Spanner can do and BigQuery cannot?

BigQuery Standard SQL is compliant with the SQL 2011 standard and has extensions that support querying nested and repeated data.
You can read about SELECT, JOINs and other details of BigQuery Standard SQL in the Query Syntax documentation.
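To picture what the nested and repeated data extensions do, here is a minimal plain-Python sketch of the effect of a query such as `SELECT order_id, item FROM orders, UNNEST(items) AS item`. It does not call BigQuery; the `orders` rows, the `items` field and the `unnest_rows` helper are hypothetical and exist only for illustration.

```python
def unnest_rows(rows, repeated_field, alias):
    """Mimic a comma join against UNNEST(repeated_field): emit one output
    row per array element, dropping rows whose array is empty or missing."""
    flattened = []
    for row in rows:
        for value in row.get(repeated_field) or []:
            # Copy the scalar columns and add the element under its alias.
            flat = {k: v for k, v in row.items() if k != repeated_field}
            flat[alias] = value
            flattened.append(flat)
    return flattened


# Hypothetical table with a repeated (array) column named "items".
orders = [
    {"order_id": 1, "items": ["pen", "ink"]},
    {"order_id": 2, "items": []},
    {"order_id": 3, "items": ["paper"]},
]

flat = unnest_rows(orders, "items", "item")
for row in flat:
    print(row)
```

Order 2 disappears from the result because its array is empty, which is how a plain comma (cross) join against UNNEST behaves.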
Additionally, is there anything that Spanner can do and BigQuery cannot?
Main difference between BigQuery and Spanner:
BigQuery - Large scale data warehouse service with append-only tables
Spanner - A horizontally scalable, globally consistent, relational database service
Foreign Keys, Transaction support, Indexes - are good examples of what is supported in Spanner but not in BigQuery
Note: the unsupported features above are by design and reflect the respective purposes of the two products. What is a "must" feature for one does not even exist conceptually in the other. If it helps to picture it, comparing BigQuery and Spanner is close to comparing Hadoop and MySQL.
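For readers unfamiliar with what foreign keys, transactions and indexes give you in a transactional database, here is a small sketch. It uses SQLite from Python's standard library purely as a stand-in; it is not Spanner, and the table names and the `demo_oltp_features` helper are made up for illustration.

```python
import sqlite3


def demo_oltp_features():
    """Show a foreign key, an index and a transaction rollback (SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id),"
        " total REAL)"
    )
    # Secondary index to speed up lookups by customer.
    conn.execute("CREATE INDEX orders_by_customer ON orders(customer_id)")
    conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
    conn.commit()

    rejected = False
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute("INSERT INTO orders VALUES (1, 1, 9.5)")
            conn.execute("INSERT INTO orders VALUES (2, 99, 1.0)")  # no customer 99
    except sqlite3.IntegrityError:
        rejected = True

    remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    conn.close()
    return rejected, remaining


rejected, remaining = demo_oltp_features()
print(rejected, remaining)
```

The second insert violates the foreign key, so the whole transaction is rolled back and the valid first insert is undone as well.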
I suggest reading the respective documentation and then asking specific questions:
cloud.google.com/bigquery/docs
cloud.google.com/spanner/docs

Related

BigQuery - how is data distributed by partition key?

I come from a Teradata and Netezza background in Data Warehousing in MPP technologies.
I would like to ask how Google BigQuery distributes data by partition key on a simple table? I am really trying to understand the logic in how the BigQuery engine works if this makes sense?
Teradata and Netezza had a well documented technical page from recollection which described the processes used (like a step by step walkthrough).
Thanks,
Simon
BigQuery's partitioned tables are also very well documented here:
https://cloud.google.com/bigquery/docs/partitioned-tables
I think I don't understand what you want to know. Please rephrase your question after reading all the above.

Does jOOQ translate support translating Teradata queries to BigQuery queries?

Does https://www.jooq.org/translate/ support SQL query translation from Teradata DML queries to GCP BigQuery queries?
jOOQ 3.13 does not yet support BigQuery: https://github.com/jOOQ/jOOQ/issues/2620, hence the translator on the website cannot do that for you right now. It is definitely among the top candidate dialects to add support for, in the near future.

Is SQL Server support for JSON a replacement for NoSql solutions like MongoDB?

After many years working with SQL databases, it feels uncomfortable working with a database that doesn't rely on a schema to model the data.
I understand that SQL and NoSQL solutions have their places for different business needs and goals, but I don't have any experience with NoSQL databases.
But since I discovered that Microsoft SQL Server has support to also work with JSON data (https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-2017), I wonder:
Can I always default to SQL Server for any (new) application I might need to create and use this flexibility of JSON querying when needed?
That would mean I don't have to wrap my head around considering between SQL Server OR MongoDB OR both. I could just use SQL Server always and be good to go.
A similar consideration of mine is about graph-databases. SQL Server vs Neo4j for graph databases.
(https://learn.microsoft.com/en-us/sql/relational-databases/graphs/sql-graph-architecture?view=sql-server-2017).
Sure SQL Server support for graph is inferior compared to Neo4j which is specialized for that task, but it seems that Microsoft is trying to create a one-for-all database solution that every project could rely on.
Nowadays almost every database offers a JSON data type for table columns.
However, a relational database with JSON support still does not provide everything a NoSQL database does.

Standard or Legacy SQL for Google Analytics Data in BigQuery?

We are just starting to use Google Analytics data in BigQuery and previously used only MS SQL Server in our work environment. We would like to move some of the analysis to GCP and BigQuery, but could not decide which is the better option to use: standard or legacy SQL?
In both cases we would have to adjust to the new language version, but the real question is which is the best choice for Google Analytics data analysis. Is there something that, from a technical point of view, should make us choose legacy over standard, or the other way around?
It is very confusing for us that there are two versions, because legacy seems to be more developed now, but perhaps standard will be the main SQL version in BQ in the future?
BigQuery Standard SQL is the way to go. It has many more features than Legacy SQL.
Note: it is not a binary choice. You can always use Legacy SQL if you find something easier to express with it. In my experience it is mostly the opposite, with very few exceptions. The most prominent one for me is table decorators: support for table decorators in standard SQL is planned but not yet implemented.
I would recommend looking into Migrating from legacy SQL, not from a migration point of view, since you are new to BigQuery, but because it is a good place to see and compare the features of both dialects in one place.
I also recommend checking the BigQuery Issue Tracker for some extra insight.
Standard SQL is the preferred SQL dialect for use in BigQuery, as stated in the migration guide. While legacy SQL has been around for quite some time--and is still the default at the time of this writing--there is no active development work on it. If you are evaluating which to use, you should pick standard SQL, since in addition to being more similar to T-SQL (SQL Server's dialect) it is more expressive, has fewer surprising edge cases, and generally has more features.
Go with Standard SQL, as that's on the long-term roadmap.
From experience some queries are faster under Legacy SQL, but this is changing as Standard SQL is the one that is actively developed.
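One visible difference between the two dialects is how tables are referenced: legacy SQL writes `[project:dataset.table]`, while standard SQL writes the name in backticks with dots. The helper below is a rough sketch of that rewrite for the simple case only; `legacy_to_standard_ref` is a hypothetical name, not part of any BigQuery tooling, and it deliberately rejects table decorators and does not handle domain-scoped project IDs.

```python
import re

# [project:dataset.table] or [dataset.table]; the project part is optional.
_LEGACY_REF = re.compile(
    r"^\[(?:(?P<project>[^:\]]+):)?(?P<dataset>[^.\]]+)\.(?P<table>[^\]]+)\]$"
)


def legacy_to_standard_ref(ref):
    """Rewrite a legacy SQL table reference into standard SQL backtick form."""
    match = _LEGACY_REF.match(ref.strip())
    if match is None:
        raise ValueError("not a legacy table reference: %r" % ref)
    if "@" in match.group("table"):
        # Table decorators have no direct backtick equivalent.
        raise ValueError("table decorators are not handled: %r" % ref)
    parts = [match.group("project"), match.group("dataset"), match.group("table")]
    return "`" + ".".join(p for p in parts if p) + "`"


print(legacy_to_standard_ref("[bigquery-public-data:samples.shakespeare]"))
print(legacy_to_standard_ref("[samples.shakespeare]"))
```

Real queries differ in more than table names (functions, joins, flattening), so a rewrite like this is only a starting point when comparing the dialects.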

Are all SQL Geospatial implementations database specific?

My team is looking into geospatial features offered by different database platforms.
Are all of the implementations database specific, or is there an ANSI SQL standard, or a similar type of standard, which is being offered or will be offered in the future?
I ask, because I would like the implemented code to be as database agnostic as possible (our project is written to be ANSI SQL standard).
Is there any known plan for standardization of this functionality in the future?
Currently, more than one specification is followed by popular proprietary and open source implementations of spatial databases:
The OpenGIS - Simple Features for SQL
ISO SQL Multimedia Specification for Spatial - ISO/IEC 13249-3:2006 - Information technology -- Database languages -- SQL multimedia and application packages -- Part 3: Spatial
PostGIS, Oracle, Microsoft SQL Server and, to a limited degree, MySQL all implement the standard interfaces for manipulating spatial data. However, despite these fairly standardized features, the databases usually differ at the plain SQL level, which may make a database-agnostic implementation of your solution tricky. You will likely need to survey the features you are interested in and compare what the various vendors provide.
For example GIS extensions for MySQL and for PostgreSQL both follow OpenGIS "Simple Features Specification for SQL" standard.
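As a small illustration of how the SQL surface differs even for the same operation, the sketch below builds a distance expression per dialect. The function names in the templates are the ones those products expose as far as I know; the `distance_expr` helper, the dialect keys and the default tolerance value are assumptions made for this example.

```python
# Distance between two geometry expressions, per dialect.
DISTANCE_TEMPLATES = {
    "postgis": "ST_Distance({a}, {b})",
    "mysql": "ST_Distance({a}, {b})",
    "sqlserver": "{a}.STDistance({b})",  # method-call syntax on the spatial type
    "oracle": "SDO_GEOM.SDO_DISTANCE({a}, {b}, {tolerance})",
}


def distance_expr(dialect, a, b, tolerance=0.005):
    """Return a dialect-specific SQL fragment for the distance between a and b.

    The tolerance is only used by the Oracle template; 0.005 is an arbitrary
    illustrative default, not a recommendation.
    """
    try:
        template = DISTANCE_TEMPLATES[dialect]
    except KeyError:
        raise ValueError("unsupported dialect: %r" % dialect)
    return template.format(a=a, b=b, tolerance=tolerance)


for name in DISTANCE_TEMPLATES:
    print(name, "->", distance_expr(name, "s.geom", "t.geom"))
```

Keeping such fragments behind one function is one way to confine vendor differences to a single place in otherwise portable code.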
I haven't tried it, but Google tells me FDO is "an open-source API for manipulating, defining and analyzing geospatial information regardless of where it is stored". It's listed on osgeo.org - a point in its favour in my opinion.
There are providers for MySQL and Oracle. Disappointingly, though, SQL Server and PostGIS aren't listed on the FDO providers page.
The only standard I know of is http://www.opengeospatial.org/standards/sfs and I don't know how well all the spatial database extensions implement it.
There are a number of geo-databases which are accessible with Hibernate Spatial:
Oracle10g
PostgreSQL
MySQL
Using an abstraction layer like Hibernate is a good idea anyway if you plan to write a database-agnostic application. Hibernate Spatial fills this gap for geo features.