Standard or Legacy SQL for Google Analytics Data in BigQuery? - google-bigquery

We are just starting to use Google Analytics data in BigQuery and previously used only MS SQL Server in our work environment. We would like to move some of the analysis to GCP and BigQuery, but cannot decide which is the better option to use: standard or legacy SQL?
In both cases we would have to adjust to a new language version, but the real question is which is the best choice when it comes to Google Analytics data analysis. Is there something that, from a technical point of view, should make us choose legacy over standard, or the other way around?
We find it confusing that there are two versions: legacy seems to be more developed now, but perhaps standard will be the main SQL dialect in BigQuery in the future?

BigQuery Standard SQL is the way to go. It has many more features than Legacy SQL.
Note: it is not a binary choice. You can always use Legacy SQL if there is something you find easier to express with it. In my experience it is mostly the opposite, with very few exceptions. The most prominent one for me is Table Decorators: support for table decorators in standard SQL is planned but not yet implemented.
I would recommend looking into Migrating from legacy SQL, not from a migration point of view, as you are new to BigQuery, but because it is a good place to see and compare the features of both dialects in one place.
I also recommend checking the BigQuery Issue Tracker for some extra insight.

Standard SQL is the preferred SQL dialect for use in BigQuery, as stated in the migration guide. While legacy SQL has been around for quite some time--and is still the default at the time of this writing--there is no active development work on it. If you are evaluating which to use, you should pick standard SQL, since in addition to being more similar to T-SQL (SQL Server's dialect) it is more expressive, has fewer surprising edge cases, and generally has more features.
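To make the difference concrete for Google Analytics data, here is a minimal sketch that writes the same pageviews-per-page report in both dialects. It assumes the usual GA export layout (daily ga_sessions_YYYYMMDD tables with a repeated hits field). The helper name ga_pageviews_by_path and the project and dataset IDs are made up for illustration, and the queries are only built as strings here, not run against BigQuery.

```python
def ga_pageviews_by_path(project: str, dataset: str, day: str) -> dict:
    """Build one report in both BigQuery dialects (day is 'YYYYMMDD').

    Hypothetical helper: it only returns query text and never calls BigQuery.
    """
    iso_day = f"{day[:4]}-{day[4:6]}-{day[6:]}"

    # Standard SQL: backtick-quoted table name, wildcard table filtered by
    # _TABLE_SUFFIX, and an explicit UNNEST of the repeated hits field.
    standard = f"""
        SELECT h.page.pagePath AS page_path, COUNT(*) AS pageviews
        FROM `{project}.{dataset}.ga_sessions_*`, UNNEST(hits) AS h
        WHERE _TABLE_SUFFIX = '{day}' AND h.type = 'PAGE'
        GROUP BY page_path
        ORDER BY pageviews DESC
    """

    # Legacy SQL: bracket-quoted table prefix, TABLE_DATE_RANGE, and
    # implicit flattening of the repeated hits field.
    legacy = f"""
        SELECT hits.page.pagePath AS page_path, COUNT(*) AS pageviews
        FROM TABLE_DATE_RANGE([{project}:{dataset}.ga_sessions_],
                              TIMESTAMP('{iso_day}'), TIMESTAMP('{iso_day}'))
        WHERE hits.type = 'PAGE'
        GROUP BY page_path
        ORDER BY pageviews DESC
    """
    return {"standard": standard, "legacy": legacy}


queries = ga_pageviews_by_path("my-project", "123456", "20170801")
print(queries["standard"])
```

The standard version reads much like T-SQL apart from UNNEST, which is probably the main new concept when coming from SQL Server.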

Go with Standard SQL, as that's on the long-term roadmap.
In my experience some queries are faster under Legacy SQL, but this is changing, as Standard SQL is the one that is actively developed.

Related

Does MSSQL currently have any features similar to "PERIOD" columns in SQL2011?

I have a project where the PERIOD columns defined in the SQL2011 spec are the perfect solution. Unfortunately, I am forced to use MSSQL 2008R2 (or possibly MSSQL 2012) as my database, which does not support this feature.
Is there any proprietary feature that resembles the PERIOD features in SQL2011 currently in MSSQL? If not, any advice for the best way to try to implement something resembling it?
Take a look at Anchor Modelling. I know it's not exactly what you're looking for (a PERIOD equivalent), but databases implemented as an Anchor model can include bi-temporal aspects. The generated SQL code when exporting the model primarily supports MS SQL. Oracle is available too, but a lot of work went into optimizing the schema, trigger and view SQL code for MS SQL. Maybe it'll help; maybe you can see how they implemented bi-temporal data in a way that works really well with MS SQL.
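If a full Anchor model is more than you need, a common manual approach is to emulate a period with two ordinary date columns treated as a half-open interval (start inclusive, end exclusive). The sketch below is only an illustration of that idea, not a SQL:2011 feature. It uses Python with an in-memory SQLite database so it can run anywhere, and the table and function names (price_history, add_period, price_on) are invented. The same two-column layout and predicates should carry over to T-SQL, where the overlap check would more likely live in a trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE price_history (
        product_id INTEGER NOT NULL,
        price      REAL    NOT NULL,
        valid_from TEXT    NOT NULL,   -- inclusive start, ISO date
        valid_to   TEXT    NOT NULL,   -- exclusive end, ISO date
        CHECK (valid_from < valid_to)
    )
""")


def add_period(conn, product_id, price, valid_from, valid_to):
    """Insert a row unless its period overlaps an existing one for the product."""
    # Two half-open intervals overlap when each starts before the other ends.
    clash = conn.execute(
        """SELECT 1 FROM price_history
           WHERE product_id = ? AND valid_from < ? AND ? < valid_to""",
        (product_id, valid_to, valid_from),
    ).fetchone()
    if clash:
        raise ValueError("period overlaps an existing row")
    conn.execute(
        "INSERT INTO price_history VALUES (?, ?, ?, ?)",
        (product_id, price, valid_from, valid_to),
    )


def price_on(conn, product_id, day):
    """Return the price valid on the given ISO date, or None."""
    row = conn.execute(
        """SELECT price FROM price_history
           WHERE product_id = ? AND valid_from <= ? AND ? < valid_to""",
        (product_id, day, day),
    ).fetchone()
    return row[0] if row else None
```

Because the end date is exclusive, one period can end on the same date the next one starts without the two counting as overlapping.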

How to maintain SQL scripts when developing an application working against many databases

Imagine an application which is supposed to work with different database vendors. As we all know, SQL syntax (especially DDL) is not portable. How do you deal with maintaining the SQL scripts?
So far I see three options:
to store the SQL in the format of one of the databases and have a tool which automatically converts from one syntax to another (do you know of such tools?)
to store the SQL in some artificial language and have a tool which is able to generate vendor-specific SQL on demand (any recommendations here?)
to store the SQL in many database formats, accepting the redundancy (this is the worst one, isn't it?)
Do you recommend any of them? Do you have a better idea?
The development environment tries to follow the continuous integration principles, so automation is a key feature here.
Have a look at Liquibase (that's essentially your second item on the list)
http://www.liquibase.org
It's not perfect (e.g. it does not support check constraints) but it is quite useful
This video shows a solution using the Subsonic project http://subsonicproject.com/docs/Using_SimpleRepository and its data migration capabilities. The strategy is to use a general language and apply it to different databases.
Hope this is what you were looking for
Use some kind of ORM framework with schema generation capability.
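For a feel of how a changelog-driven approach fits continuous integration, here is a minimal sketch. It is not how Liquibase itself is implemented. It only illustrates the general idea of ordered changesets, optional per-vendor SQL variants, and a tracking table so each change runs once. The names CHANGESETS, migrate and schema_changelog are invented, and it runs against SQLite through Python's sqlite3 module (the ? placeholders are specific to that driver).

```python
import sqlite3

# Each changeset has a stable id and SQL variants keyed by dialect.
# "default" is used when a dialect has no variant of its own.
CHANGESETS = [
    ("001-create-customer", {
        "default": "CREATE TABLE customer (id INTEGER PRIMARY KEY, name VARCHAR(100) NOT NULL)",
    }),
    ("002-add-email", {
        "default": "ALTER TABLE customer ADD COLUMN email VARCHAR(255)",
        "mssql": "ALTER TABLE customer ADD email VARCHAR(255)",  # T-SQL has no COLUMN keyword here
    }),
]


def migrate(conn, changesets, dialect="sqlite"):
    """Apply changesets that are not yet recorded; return the ids that ran."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_changelog (id TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT id FROM schema_changelog")}
    ran = []
    for change_id, variants in changesets:
        if change_id in applied:
            continue  # already applied on an earlier run
        conn.execute(variants.get(dialect, variants["default"]))
        conn.execute("INSERT INTO schema_changelog (id) VALUES (?)", (change_id,))
        ran.append(change_id)
    conn.commit()
    return ran


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    print(migrate(db, CHANGESETS))  # first run applies both changesets
    print(migrate(db, CHANGESETS))  # second run finds nothing left to apply
```

Keeping only the vendor-specific exceptions next to a shared default limits the redundancy of the third option while still allowing a per-database override where the syntax really differs.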

Cross platform SQL? (sqlite+mysql+tsql)

Is there a cross-platform solution for SQL? My prototype was in SQLite. I am switching to a server that offers T-SQL, and I was considering MySQL in the past for my web servers (maybe I should stick to T-SQL and SQLite). I am wondering if there's a .NET lib that allows me to write SQL compatible with all of them.
One annoyance I had was in CREATE TABLE. I thought primary keys auto-incremented, but they don't; I have to write identity(1,1). When I ported my SQLite code to MySQL I also had issues with CREATE TABLE, and I am sure there will be other places once my SQL statements get more complex. So I thought trying a lib may be a good idea.
I personally use SubSonic as an ORM. It supports both SQL Server and MySQL, as well as SQLite. Database creation is always a tricky question, but you generally don't perform it often. I have chosen SQLite, as it is very portable, and should also be available when I port code using Mono. T-SQL (SQL Server) is a Windows-only product, and is not portable. My databases also won't be very large (< 100 MB), and this may also need to figure in your choice.
I would recommend (for the moment, until VS 2010 is out) that you don't use Entity Framework, as it is a traditional v1.0 MS product, and very problematic. Essential features are not present, and artificial restrictions make it less useful than other ORMs. No doubt the next version will be better, but the current version is not worth your time compared to something like SubSonic.
An ORM like NHibernate or Entity Framework will support all those sql backends, and more. It will also save you from writing all your SQL queries yourself.
You might want to take a look at ADO.NET Entity Framework. It supports a lot of different backends.
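The auto-increment annoyance from the question is a good example of what such a library hides. As a rough sketch of the idea (this is not how SubSonic, NHibernate or Entity Framework actually generate DDL), a small lookup table can render the key column per dialect. The sketch is in Python rather than .NET so that the SQLite variant can be executed directly; the names AUTO_INCREMENT_PK and create_table_sql are invented for illustration.

```python
import sqlite3

# Auto-incrementing integer primary key, spelled per dialect.
AUTO_INCREMENT_PK = {
    "sqlite": "INTEGER PRIMARY KEY AUTOINCREMENT",
    "mysql": "INT NOT NULL AUTO_INCREMENT PRIMARY KEY",
    "tsql": "INT IDENTITY(1,1) NOT NULL PRIMARY KEY",
}


def create_table_sql(table, columns, dialect):
    """Render CREATE TABLE with an auto-increment id column.

    columns is a list of (name, sql_type) pairs for the non-key columns.
    """
    parts = [f"id {AUTO_INCREMENT_PK[dialect]}"]
    parts += [f"{name} {sql_type}" for name, sql_type in columns]
    return f"CREATE TABLE {table} ({', '.join(parts)})"


if __name__ == "__main__":
    for dialect in AUTO_INCREMENT_PK:
        print(create_table_sql("users", [("name", "VARCHAR(50)")], dialect))
```

Only the SQLite output is exercised here; the MySQL and T-SQL strings follow the documented syntax but should be checked against the actual server.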

Are all SQL Geospatial implementations database specific?

My team is looking into geospatial features offered by different database platforms.
Are all of the implementations database specific, or is there a ANSI SQL standard, or similar type of standard, which is being offered, or will be offered in the future?
I ask, because I would like the implemented code to be as database agnostic as possible (our project is written to be ANSI SQL standard).
Is there any known plan for standardization of this functionality in the future?
Currently, there is more than one specification followed by popular proprietary and open source implementations of spatial databases:
The OpenGIS - Simple Features for SQL
ISO SQL Multimedia Specification for Spatial - ISO/IEC 13249-3:2006 - Information technology -- Database languages -- SQL multimedia and application packages -- Part 3: Spatial
PostGIS, Oracle, Microsoft SQL Server and, to a limited degree, MySQL all implement the standard interfaces for manipulating spatial data. However, despite these fairly standardized features, the databases usually differ at the plain SQL level, which may make a database-agnostic implementation of your solution tricky. You will likely need to survey the features you are interested in and compare what the various vendors provide.
For example GIS extensions for MySQL and for PostgreSQL both follow OpenGIS "Simple Features Specification for SQL" standard.
I haven't tried it, but Google tells me FDO is "an open-source API for manipulating, defining and analyzing geospatial information regardless of where it is stored". It's listed on osgeo.org - a point in its favour in my opinion.
There are providers for MySQL & Oracle. Disappointingly, though, SQL Server and PostGIS aren't listed on the FDO providers page.
The only standard I know of is http://www.opengeospatial.org/standards/sfs and I don't know how well all the spatial database extensions implement it.
There are a number of geo-databases which are accessible with Hibernate Spatial:
Oracle 10g
PostgreSQL
MySQL
Using an abstraction layer like Hibernate is a good idea anyway if you plan to write a database-agnostic application. Hibernate Spatial fills this gap for geo features.

Can you recommend a good source for Teradata Best Practices?

Looks like my data warehouse project is moving to Teradata next year (from SQL Server 2005).
I'm looking for resources about best practices on Teradata - from limitations of its SQL dialect to idioms and conventions for getting queries to perform well - particularly if they highlight things which are significantly different from SQL Server 2005. Specifically tips similar to those found in The Art of SQL (which is more Oracle-focused).
My business processes are currently in T-SQL stored procedures and rely fairly heavily on SQL Server 2005 features like PIVOT, UNPIVOT, and Common Table Expressions to produce about 27m rows of output a month from a 4TB data warehouse.
One place to start is here: http://www.teradataforum.com/
This might be a little late, but there are a few things which I can warn you about Teradata which I have learned.
Use the most recent version as often as possible.
For V12 the optimizer was re-written and the database performs much better now.
Try to realize that SQL Server and Teradata are very different beasts; most of the concepts will not transition well.
Do not underestimate the importance of a primary index.
The locks that Teradata uses are very primitive when compared to other databases.
Do NOT use TERA mode. You do not have any legacy code, and ANSI mode is far superior and widely encouraged.
Join indexes are very helpful tools, but they do not provide all the answers.
Parallelism: take the time to understand how FASTLOAD, MULTILOAD, and TPUMP work and find out how you can leverage them in your ETL strategy.
If you are attempting to run a query which needs to be performant, do not use any casts, because with them the optimizer will not use statistics to generate the best execution plan.
Working with dates is going to be a pain, just a warning.
Teradata is very DDL oriented; try to understand all the syntax involved in creating a table.
Compression is a wonderful tool, if you have any values which are repeated in a table, make use of it.
There are not many tools available with Teradata, be prepared to build a lot. The tools that exist are very expensive.
Unfortunately, I do not know much about SQL Server, so I cannot say what tools in SQL Server appear in Teradata.
Hope this helps
I would also look into the recently launched Teradata Developer Exchange as well as the TeradataForum and forums on Teradata's main website.
I don't know of any good references available online. Teradata has some design manuals that are available for download, but they're more instruction manuals and not "best practices" as such. Check them out here: http://www.info.teradata.com/DataWarehouse/eTeradata-BrowseBy.cfm?page=Teradata%20Database
Alternatively, you need to find a friendly Teradata expert to bounce ideas off. Try Teradata themselves, or find a local consultant with Teradata experience.
Best Practices on Teradata isn't a topic that gets lots of discussions and most of the best tricks tend to be proprietary knowledge of the person/people who discovered them.
Sorry,
David Stewardson
Satyam Computer Services
Top of the list on a Google search for "Teradata Best Practices" gave me TERADATA ADVISORY GROUP SETS BEST PRACTICES FOR BUSINESS OBJECTS AND TERADATA CUSTOMERS
EDIT: Seeing as that's just advertising, as you've pointed out, see how you go with these. Please bear in mind that I don't have a clue what Teradata is and can't see myself using it any time this side of the 22nd century AD.
Teradata Discussion Forums
Best Practices for Teradata Deployments
Best Study Guides For NCR Teradata Certifications
The middle one looks promising with its nice long link tree at the top
Oracle® Business Intelligence Applications Installation and Configuration Guide > Preinstallation and Predeployment Considerations for Oracle BI Applications > Teradata-Specific Database Guidelines for Oracle Business Analytics Warehouse >
and the first link, to the forums, should put you in touch with the right people.