Does a transactional database that is being migrated to an analytical one, to be used with BigQuery and eventually Looker, need to be denormalized for faster queries and higher concurrency?
Thanks!
I can say "yes"; here is the best practice recommended by Google: https://cloud.google.com/bigquery/docs/best-practices-performance-input#denormalizing_data
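As a rough illustration of what that denormalization can look like in BigQuery, here is a minimal sketch that folds a child table into its parent as nested, repeated fields so downstream Looker queries need no join. The datasets, tables and columns (transactional.orders, transactional.line_items, analytics.orders_denorm) are made up for the example:

    -- Hypothetical normalized source: orders(order_id, customer_id, order_date)
    -- and line_items(order_id, sku, quantity, price).
    -- Denormalized table: one row per order, line items nested as an ARRAY of
    -- STRUCTs, so BI queries read a single table.
    CREATE TABLE analytics.orders_denorm AS
    SELECT
      o.order_id,
      o.customer_id,
      o.order_date,
      ARRAY_AGG(STRUCT(li.sku, li.quantity, li.price)) AS line_items
    FROM transactional.orders AS o
    JOIN transactional.line_items AS li
      ON li.order_id = o.order_id
    GROUP BY o.order_id, o.customer_id, o.order_date;

    -- Querying the nested data later touches only one table:
    SELECT order_id, item.sku, item.quantity * item.price AS line_total
    FROM analytics.orders_denorm, UNNEST(line_items) AS item;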
I have a very high-level question.
Could indexes on a SQL Server table improve the loading performance of a Tableau dashboard?
If so, is there any best practice or guideline we could follow?
Thanks a lot.
An index will speed up the extraction of the data into Tableau's database structure, but it will not speed up Tableau as you interact with it. There is a Tableau community website where you can find best practices and other guidance.
Yes, it will speed up Tableau. Just as indexes speed up any query (when applied properly), Tableau is no different: it is simply querying the data and displaying the results.
Best practice? As with everything else, analyse the usage to see where it is appropriate to apply indexes. Too few is bad; too many is bad.
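For what it's worth, a covering nonclustered index on the columns the dashboard filters and aggregates on is the usual starting point on the SQL Server side. A minimal sketch, with a hypothetical dbo.sales table and columns standing in for your own:

    -- Hypothetical fact table behind the dashboard: sales(sale_date, region, amount, ...).
    -- An index keyed on the filter columns, covering the aggregated column,
    -- lets SQL Server answer the dashboard's query from the index alone.
    CREATE NONCLUSTERED INDEX IX_sales_saledate_region
    ON dbo.sales (sale_date, region)
    INCLUDE (amount);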
In the past, using reporting engines like Crystal Reports, SSRS and others, the best practice was to hand the raw data to the report engine and have it crunch the numbers and logic. Usually that logic was aggregation, pivot tables and sometimes some basic arithmetic and if statements.
Today, with BigQuery and Data Studio, I'm told the opposite, on the premise that BigQuery will be more performant at computing the aggregations.
So, does the old best practice still stand, or is putting the computations in the underlying BigQuery statement really the way to go with the "BigQuery and Data Studio" combination?
Today, with BigQuery and Data Studio, I'm told the opposite, on the premise that BigQuery will be more performant at computing the aggregations.
Data Studio and BigQuery use the same engine to compute aggregations, so performance is not a factor.
You should decide whether to put calculations into SQL or into Data Studio formulas based on other factors such as maintenance, access control, etc.
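If you do decide to push the work into BigQuery, the typical pattern is to use a custom query as the Data Studio data source, so the aggregation happens before the data reaches the report. A small sketch, with a made-up project, dataset and table:

    -- Hypothetical custom-query data source for Data Studio: the aggregation
    -- is done in BigQuery, so the report receives one row per day and region
    -- instead of the raw transactional rows.
    SELECT
      DATE(order_timestamp) AS order_day,
      region,
      SUM(amount) AS total_amount,
      COUNT(*) AS order_count
    FROM `my-project.transactional.orders`
    GROUP BY order_day, region;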
I've been reading through the excellent Developing Multi-tenant Applications for the Cloud, 3rd Edition and am trying to understand how partitioning affects query performance in Windows Azure SQL Database.
If you use the shared-schema approach and put all tenants' records in a single table, separating their data using partitions, is that always going to be slower than a separate-schema approach because of the larger number of records in the table, or does the partitioning effectively make each partition act like its own table?
(I appreciate that query execution speed is only one of many factors to consider when choosing a multi-tenancy strategy; we're not basing our decisions on performance alone.)
The approach that uses different schemas for different tenants has its problems, too. For instance, it is very easy to bloat the plan cache, since each tenant gets its own set of query plans, and you may end up with more recompiles (and lower performance) because of that.
I would recommend taking a look at an approach where you place each tenant in its own database. That provides great isolation and, in combination with Elastic Database Pools in Azure SQL DB, it actually becomes quite affordable. A good entry point into the documentation is this: https://azure.microsoft.com/en-us/documentation/articles/sql-database-elastic-scale-introduction/.
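For completeness, if you do stay with the shared-schema approach, a common mitigation is to make the tenant key the leading column of the clustered index, so each tenant's queries become narrow range seeks rather than scans across all tenants' rows. A minimal sketch with a hypothetical Orders table:

    -- Hypothetical shared-schema table: TenantId leads the clustered index, so
    -- every per-tenant query seeks directly into that tenant's rows, regardless
    -- of how many other tenants share the table.
    CREATE TABLE dbo.Orders
    (
        TenantId  int           NOT NULL,
        OrderId   bigint        NOT NULL,
        OrderDate datetime2     NOT NULL,
        Amount    decimal(18,2) NOT NULL,
        CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (TenantId, OrderId)
    );

    -- Queries always filter by tenant, e.g.:
    -- SELECT * FROM dbo.Orders WHERE TenantId = @TenantId AND OrderDate >= @Since;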
What are the best practices for designing a scalable database? If the database or its tables span multiple servers, how can I join them?
Where can I get more information on this?
Here is a white paper on how to scale out MySQL: "Scale Out MySQL".
Although, IMO, if you are looking to scale a database across multiple servers, you should take a look at (and use) NoSQL databases. Here is a place where you can start your research: nosql-database.org.
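On the "how can I join them" part: if you stay relational, some engines can reference a table on another server directly (for example via SQL Server linked servers or MySQL's FEDERATED engine), but such joins ship rows across the network and are usually slow, which is why sharded designs try to keep related data on the same shard and join in the application instead. A minimal linked-server sketch for SQL Server, with made-up server and table names:

    -- Hypothetical: 'RemoteSrv' is a linked server registered via sp_addlinkedserver.
    -- Four-part names let a query join a local table with a remote one, but the
    -- remote rows travel over the network, so this does not scale well and is
    -- best kept for occasional cross-server lookups.
    SELECT c.CustomerId, c.Name, o.OrderId, o.Amount
    FROM dbo.Customers AS c
    JOIN RemoteSrv.SalesDb.dbo.Orders AS o
      ON o.CustomerId = c.CustomerId;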
In an environment where you have a relational database handling all business transactions, is it a good idea to use SimpleDB for all data queries in order to get faster, more lightweight search?
So the master data store would be a relational DB that is "replicated"/"transformed" into SimpleDB to provide very fast read-only queries, since no JOINs or complicated subselects are needed.
What you're considering smells of premature optimization ...
Have you benchmarked your application? Have you identified your search queries as a performance bottleneck? Have you correctly implemented indexes into your database?
IF (and that's a big if) there's no way for a relational database to offer decent search times to your users, going NoSQL might be something worth considering ... but not before!
SimpleDB is a good technology but its claim to fame is not faster queries than a relational database. Offloading queries to replicated SimpleDB is not likely to significantly improve your query response time.
I still find it hard to believe, but our experiments show that a round trip from an EC2 instance to SimpleDB averages out to 300 milliseconds or so, on a good day! On a bad day we've seen it degrade to 1.5 seconds. This is for a single insert. I'd love to see somebody replicate the experiment to verify these results, but as it stands... SimpleDB is no solution for anything but post-processing; in the request/response cycle it would just be way too slow.
If the data is largely read-only, try using indexed views. Otherwise, cache the data in the application.
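On the indexed-views suggestion: the idea is that SQL Server materializes the view's result and keeps it up to date, so read-mostly search and reporting queries hit precomputed rows instead of re-aggregating the base table on every request. A minimal sketch with a hypothetical dbo.Sales table (indexed views require SCHEMABINDING, COUNT_BIG(*) when grouping, and a unique clustered index):

    -- Hypothetical base table: Sales(ProductId, Amount, ...), Amount assumed NOT NULL.
    CREATE VIEW dbo.v_SalesByProduct
    WITH SCHEMABINDING
    AS
    SELECT
        ProductId,
        SUM(Amount)  AS TotalAmount,
        COUNT_BIG(*) AS RowCnt   -- required for indexed views with GROUP BY
    FROM dbo.Sales
    GROUP BY ProductId;
    GO

    -- The unique clustered index is what actually materializes the view.
    CREATE UNIQUE CLUSTERED INDEX IX_v_SalesByProduct
    ON dbo.v_SalesByProduct (ProductId);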