I have a very high-level question.
Could indexes on a SQL Server table improve the loading performance of a Tableau dashboard?
If so, is there any best practice / guideline we could follow?
Thanks a lot.
An index will speed up the extraction of the data into Tableau's database structure, but it will not speed up Tableau as you interact with it. There is a Tableau community website where you can find best practices and other guidance.
Yes, it will speed up Tableau. Just as indexes speed up any query (if applied properly), Tableau works the same way: it is just querying the data and displaying the results.
Best practice? Like everything else, analyse the usage to see where it is appropriate to apply indexes. Too few is bad, too many is bad.
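As a rough illustration (the table and column names here are made up, not anything specific to your schema): if a dashboard on a live connection mostly filters on an order date and a region, an index along these lines is the kind of thing to look at:

```sql
-- Hypothetical example: the dashboard filters on OrderDate and Region,
-- so an index keyed on those columns lets SQL Server seek instead of scan,
-- and INCLUDE carries the measure so the query is covered by the index.
CREATE NONCLUSTERED INDEX IX_Sales_OrderDate_Region
    ON dbo.Sales (OrderDate, Region)
    INCLUDE (SalesAmount);
```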
I am planning to create a de-normalized reporting database that will only be used by the reporting service. (We don't need live data for reporting, so a scheduled nightly agent job will be sufficient.)
I have a few questions and would like some help in order to proceed with the implementation.
Is this approach good or bad? That is, maintaining a de-normalized reporting database fed from the live normalized database.
What method should be used to replicate/transfer data from the normalized database to the de-normalized database? I want to transfer specific data, not all tables.
What is best practice (for data replication/transfer) to get good speed and performance out of SQL Server with minimal memory utilization?
If there is any other good alternative, please add references and articles so that I can read through and understand them.
I haven't tried anything yet; I'm still searching for the most optimized solution.
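To illustrate what I mean by transferring only specific data (nothing is built yet; the table names below are just placeholders), I'm picturing a nightly SQL Agent job step roughly like this:

```sql
-- Hypothetical nightly refresh: rebuild one de-normalized reporting table
-- from a join over the normalized live tables.
TRUNCATE TABLE Reporting.dbo.OrderSummary;

INSERT INTO Reporting.dbo.OrderSummary (OrderId, CustomerName, OrderDate, TotalAmount)
SELECT  o.OrderId,
        c.Name,
        o.OrderDate,
        SUM(ol.Quantity * ol.UnitPrice)
FROM    Live.dbo.Orders     AS o
JOIN    Live.dbo.Customers  AS c  ON c.CustomerId = o.CustomerId
JOIN    Live.dbo.OrderLines AS ol ON ol.OrderId   = o.OrderId
GROUP BY o.OrderId, c.Name, o.OrderDate;
```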
The question title may seem irrelevant, but I couldn't find one that better fits my intent.
I am working on an e-commerce project and using a relational database for storage, but I have an issue with the performance of my product domain queries. What we want to accomplish is a little bit complex, so the database schema we designed is also a little bit complex. We have over 40 tables for the product domain, and we are having performance problems querying the DB to make our product pages work. To view even simple information about a product we need to query at least 5-7 tables, and this causes a huge penalty for the website since we have hundreds of thousands of requests per day.
By the way, our data is not big. The problem is that we have many tables containing little data, and we have to join many of them at a time.
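To make that concrete, a typical product-page query currently looks roughly like this (table and column names are simplified and invented for illustration):

```sql
-- Simplified, invented version of the kind of join a single product page needs today.
DECLARE @ProductId int = 12345;

SELECT p.ProductId,
       p.Name,
       b.BrandName,
       c.CategoryName,
       pr.Amount     AS Price,
       i.StockLevel,
       m.Url         AS ImageUrl
FROM   dbo.Products     AS p
JOIN   dbo.Brands       AS b  ON b.BrandId    = p.BrandId
JOIN   dbo.Categories   AS c  ON c.CategoryId = p.CategoryId
JOIN   dbo.Prices       AS pr ON pr.ProductId = p.ProductId
JOIN   dbo.Inventory    AS i  ON i.ProductId  = p.ProductId
JOIN   dbo.ProductMedia AS m  ON m.ProductId  = p.ProductId
WHERE  p.ProductId = @ProductId;
```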
Is there a better way or place to store such data? I have looked at NoSQL DBs, but as I understand it they might not be the best fit for my solution. Could a graph DB like Neo4j be helpful in my case?
Thanks for any help...
I'm currently researching a very large table (~100 million rows, 35 columns). It's currently stored in a SQL database, but the queries I'm running (and they vary) run very, very slowly.
So I take it I should probably move to a NoSQL DB. The question is:
How can I tell which (NoSQL) DB is best for me?
How can I move my current SQL table to the new NoSQL schema?
OR should I stay with SQL and just fine-tune it?
A few more details: rows will not be added or removed; this is historical data, and all of the analysis will be done on that table. I plan to run various queries on it. The data is numerical.
I routinely work with a SQL Server 2012 table that has 900 million rows. This table has rows being added to it about every 2 minutes with a total of about 200K per day. I can query this table and get rows back in a couple seconds (using the clustered index / PK). I can also query on one of the other indexes and get results back in seconds or less.
So, it's all a matter of making sure your indexes are set up correctly, AND BEING USED!! Check your queries against the query plan being generated and make sure seeks are being done.
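As a quick, generic illustration (the table and columns below are hypothetical, not from the poster's schema), compare the plan before and after adding a supporting index, and check that you see an Index Seek rather than a Scan:

```sql
-- Hypothetical: a slow query filtering one sensor's readings by date.
SET STATISTICS IO ON;   -- shows logical reads, a rough proxy for work done

SELECT SensorId, ReadingDate, Value
FROM   dbo.Readings
WHERE  SensorId = 42
  AND  ReadingDate >= '2015-01-01';

-- If the plan shows a Clustered Index Scan, a narrow supporting index
-- usually turns it into an Index Seek:
CREATE NONCLUSTERED INDEX IX_Readings_SensorId_ReadingDate
    ON dbo.Readings (SensorId, ReadingDate)
    INCLUDE (Value);
```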
There could be good reasons for moving to NoSQL, or something similar. But moving to NoSQL because you think you can't get good performance in SQL Server, before making sure you've done everything you can to improve performance, is not a good reason.
Some food for thought:
100M rows is well within SQL's "sweet spot". You can grow by x10 and still be assured that SQL will be able to support you with fairly trivial effort.
NoSQL is not a silver bullet for solving performance problems at scale. It offers a set of tradeoffs which, with careful planning, can provide better results. But it sounds like you don't fully understand your performance issues in SQL, and without that understanding your chances of making the correct design decisions in a NoSQL environment are slim.
One of the common tradeoffs in NoSQL systems is that they typically provide less flexibility in querying, in return for greater flexibility in schema management. You mentioned your queries are "various": if they are truly varied or, more importantly, frequently changing, then moving to a NoSQL system can put you in a world of pain. Especially if you are not yet familiar with the technology.
Bottom line- You aren't doing anything which is clearly "beyond" the capabilities of SQL, and your problems are probably caused more by inefficient implementation than by any inherent platform limitations. Moving to a NoSQL system won't magically solve any of your problems, and will probably introduce new ones.
If you are running queries on columns that are not indexed, they will be very slow. You can add more indexes to speed them up. If your DB is static, this should work.
One major speed-up is the use of map-reduce queries, where aggregations are carried out by multiple processes or computers. NoSQL databases like MongoDB can be used in this way. But even MySQL has cluster capabilities nowadays: http://www.mysql.de/products/cluster/scalability.html. SQL Server can be clustered as well.
So I guess the best first shot would be to optimize the table's indexes for your queries. Each column the query filters or aggregates on (comparisons, counts, etc.) should be indexed.
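For instance (names purely illustrative), a query that counts rows per category under a fixed filter benefits from an index keyed on the filtering and grouping columns:

```sql
-- Hypothetical aggregate query over the large historical table:
--   SELECT Category, COUNT(*) FROM dbo.History WHERE Yr = 2014 GROUP BY Category;
-- An index keyed on the filtered and grouped columns lets the engine
-- answer it from the index instead of scanning all ~100M rows.
CREATE INDEX IX_History_Yr_Category
    ON dbo.History (Yr, Category);
```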
If this doesn't improve things, you probably count and calculate a lot, and you should use map-reduce jobs and a DB which can handle them, like MongoDB: http://docs.mongodb.org/manual/aggregation/
I hope this helps
What are the best docs/articles out there that show how different queries perform on large data sets? I'm trying to get a feel for which approaches are better than others when building a dashboard, where the number and type of queries easily become large, complex, and slow.
Suggest you take the time to get a book on performance tuning for the database backend you are using. Performance tuning is very database-specific, and techniques that are fast on one database may be dog slow on another (cursors on Oracle versus SQL Server come to mind). If you are going to be writing complex queries, you need to understand performance tuning in depth before trying to write them. I will give you a hint: joins are usually faster than correlated subqueries, which developers often use because they seem more straightforward.
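A rough illustration of that hint, with invented table names: both queries below return each customer's order count, but the join form is usually easier for the optimizer to turn into a single pass over Orders.

```sql
-- Correlated subquery: the inner SELECT is logically re-evaluated for every customer row.
SELECT c.CustomerId,
       c.Name,
       (SELECT COUNT(*)
        FROM   dbo.Orders AS o
        WHERE  o.CustomerId = c.CustomerId) AS OrderCount
FROM   dbo.Customers AS c;

-- Equivalent join + GROUP BY: one pass over Orders that the optimizer can
-- typically satisfy with a single hash or merge join.
SELECT c.CustomerId,
       c.Name,
       COUNT(o.OrderId) AS OrderCount
FROM   dbo.Customers AS c
LEFT JOIN dbo.Orders AS o ON o.CustomerId = c.CustomerId
GROUP BY c.CustomerId, c.Name;
```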
It is not easy to compare the actual performance of one database architecture against another. I think the most common things you'll find are benchmarks, which will vary, or theoretical proposals.
It's certainly harder to compare SQL vs NoSQL, because you need a lot of data in the first place for NoSQL to be beneficial.
Minimizing joins, indexing properly, and having enough memory/CPU available are going to be the three dominating factors.
In an environment where you have a relational database that handles all business transactions, is it a good idea to use SimpleDB for all data queries to get faster and more lightweight search?
So the master data store would be a relational DB that is "replicated"/"transformed" into SimpleDB to provide very fast read-only queries, since no JOINs or complicated subselects would be needed.
What you're considering smells of premature optimization ...
Have you benchmarked your application? Have you identified your search queries as a performance bottleneck? Have you correctly implemented indexes in your database?
IF (and that's a big if) there's no way for a relational database to offer decent search times to your users, going NoSQL might be something worth considering ... but not before!
SimpleDB is a good technology but its claim to fame is not faster queries than a relational database. Offloading queries to replicated SimpleDB is not likely to significantly improve your query response time.
I am still finding it hard to believe, but our experiments show that a round trip from an EC2 instance to SimpleDB averages out to 300 milliseconds or so, on a good day! On a bad day we've seen it degrade to 1.5 seconds. This is for a single insert. I'd love to see somebody replicate the experiment to verify these results, but as it is... SimpleDB is no solution for anything but post-processing; in the request/response cycle it would just be way too slow.
If the data is largely read-only, try using indexed views. Otherwise, cache the data in the application.
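A minimal sketch of the indexed-view approach (schema and names invented): SQL Server materializes the view once you put a unique clustered index on it, so read queries can hit the precomputed join instead of re-joining the base tables.

```sql
-- A schema-bound view over the hot join path (hypothetical tables).
CREATE VIEW dbo.vProductSearch
WITH SCHEMABINDING
AS
SELECT p.ProductId,
       p.Name,
       c.CategoryName,
       p.Price
FROM   dbo.Products   AS p
JOIN   dbo.Categories AS c ON c.CategoryId = p.CategoryId;
GO

-- The unique clustered index is what actually materializes the view on disk;
-- after this, reads can be served from the stored result instead of the join.
CREATE UNIQUE CLUSTERED INDEX IX_vProductSearch_ProductId
    ON dbo.vProductSearch (ProductId);
```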