Changing the order that tables are linked in SQL

Question: In the Crystal Reports report creation wizard there is the option of changing the order in which more than two tables are linked. The link between Table A and Table B could be made before Table C is linked, or vice versa. It says "The order may affect the resulting data set."
How could that be true? I could see it affecting performance but 3 linked tables should always return the same dataset, shouldn't they?

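It depends on what type of join Crystal is using when "linking".
Inner joins return only rows that match on both sides of the "link", so their order does not change the result. Left and right outer joins also keep the unmatched rows from one side, padded with nulls, and a later join in the chain can discard those rows again. Depending on the link order, you can lose data that you wanted to keep.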
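One way to check this is to run both link orders against toy data. The sketch below uses Python's built-in sqlite3 module rather than Crystal Reports, and the tables a, b, c and their rows are made up for illustration. It combines a left outer join (A to B) with an inner join (B to C) in two different orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy tables: a has one row (id 2) with no partner in b.
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    CREATE TABLE c (id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1);
    INSERT INTO c VALUES (1);
""")

# Link A to B first (left outer), then link the result to C (inner).
# The inner join on b.id drops the null-padded row for a.id = 2.
a_b_then_c = conn.execute("""
    SELECT a.id, b.id, c.id
    FROM a
    LEFT JOIN b ON a.id = b.id
    INNER JOIN c ON b.id = c.id
    ORDER BY a.id
""").fetchall()

# Link B to C first (inner), then attach that result to A (left outer).
# Every row of A survives, matched or not.
b_c_then_a = conn.execute("""
    SELECT a.id, bc.b_id, bc.c_id
    FROM a
    LEFT JOIN (
        SELECT b.id AS b_id, c.id AS c_id
        FROM b
        INNER JOIN c ON b.id = c.id
    ) AS bc ON a.id = bc.b_id
    ORDER BY a.id
""").fetchall()

print(a_b_then_c)  # [(1, 1, 1)]
print(b_c_then_a)  # [(1, 1, 1), (2, None, None)]
```

With inner joins only, both orders would return the same single row, which is why the warning only matters once outer joins are involved.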

I haven't used Crystal Reports so far, but your underlying question is about query performance, so I can comment on that. Which database and version are you using? In Oracle 10g the rule-based optimizer (RBO) was replaced with the cost-based optimizer (CBO). The RBO takes the order of tables into consideration; the CBO does not. So on a CBO-based version the table order should not affect performance.

Related

Order of tables in INNER JOIN

I am going through the book Learning SQL by Alan Beaulieu. On the topic of inner joins, it says that whatever the order of tables in an INNER JOIN, the results are the same, and gives the following reason:
If you are confused about why all three versions of the account/employee/customer query
yield the same results, keep in mind that SQL is a nonprocedural language, meaning
that you describe what you want to retrieve and which database objects need to be
involved, but it is up to the database server to determine how best to execute your
query. Using statistics gathered from your database objects, the server must pick one
of three tables as a starting point (the chosen table is thereafter known as the driving
table), and then decide in which order to join the remaining tables. Therefore, the order
in which tables appear in your from clause is not significant.
So does it imply that if statistics gathered from database objects change, then results would also change?
No. The same query will always produce the same results (provided, of course, that the underlying data is the same). What the author is explaining is that the database may choose one strategy or another to process the query (starting from one table or another, using this or that algorithm to join the rows, and so on). That decision is made based on many factors, some of them being based on information that is available in the statistics.
The key point is that SQL is a declarative language, not a procedural language: you don't get to choose how the database handles the query, you just tell it what result you want.
However, regardless of the algorithm that the database chooses, the result is guaranteed to be consistent.
Note that there are edge cases where the database does not guarantee that results are the same for consecutive executions of the same query (like a query with a row-limiting clause but without an order by): it's the responsibility of the client to provide a query whose results are properly defined (the language does give you enough rope to hang yourself, if you really want to).
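To see this for yourself, here is a minimal sketch using Python's built-in sqlite3 module. The table names follow the account/employee/customer query mentioned in the book, but the columns and rows are assumptions made up for this example. It runs the same inner join with three different table orders in the from clause and compares the results as sets of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, loosely modeled on the book's example.
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER, name TEXT);
    CREATE TABLE employee (emp_id INTEGER, emp_name TEXT);
    CREATE TABLE account (account_id INTEGER, cust_id INTEGER, open_emp_id INTEGER);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO employee VALUES (10, 'Eve'), (11, 'Dan');
    INSERT INTO account VALUES (100, 1, 10), (101, 2, 11), (102, 1, 11);
""")

select_list = "SELECT a.account_id, c.name, e.emp_name "
# Same joins, three different table orders in the FROM clause.
orderings = [
    "FROM account a"
    " JOIN customer c ON a.cust_id = c.cust_id"
    " JOIN employee e ON a.open_emp_id = e.emp_id",
    "FROM customer c"
    " JOIN account a ON a.cust_id = c.cust_id"
    " JOIN employee e ON a.open_emp_id = e.emp_id",
    "FROM employee e"
    " JOIN account a ON a.open_emp_id = e.emp_id"
    " JOIN customer c ON a.cust_id = c.cust_id",
]

# Sort in Python: without ORDER BY the row order is not defined,
# only the set of rows is.
results = [sorted(conn.execute(select_list + frm).fetchall()) for frm in orderings]
same = results[0] == results[1] == results[2]
print(same)
```

The comparison sorts the rows first because, as noted above, a query without an order by promises which rows come back, not the order they come back in.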

Merge into CTE vs Merge into Table

I have a question regarding performance of Merge into CTE against Merge into Table.
My scenario is as below:
I am facing an issue in an existing stored procedure that is responsible for merging data into a table from a staging source table. The target table has around 20 million records.
The merge is performed into a CTE that is a subset of the target table, filtered on a date range. This date range matches the date range of the source table. However, due to an incorrect logic implementation, the source table sometimes contains data outside the date range as well. This results in unwanted and duplicate rows being inserted into the target table.
The logic in the stored procedure has now been corrected, and this change is expected to fix the issue. But to be sure that this kind of issue never occurs again, is it advisable to merge into the table instead of the CTE? Considering the size of the target table (~20M rows), what will the performance trade-off be? What are the pros and cons of merging into the table from a performance point of view?
Microsoft advises against using CTEs as targets.
Use the WITH clause to filter out rows from
the source or target tables. This method is similar to specifying
additional search criteria in the ON clause and may produce incorrect
results. We recommend that you avoid using this method or test
thoroughly before implementing it.
See for more details: https://learn.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=sql-server-ver15#join-best-practices
From a performance standpoint I do not think it would make any difference. I have never tested this, though.
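The duplicate-insert problem described in the question can be reproduced on a small scale. The sketch below is an emulation, not real T-SQL: it uses Python's built-in sqlite3 module, which has no MERGE statement, so the WHEN NOT MATCHED THEN INSERT branch is approximated with INSERT ... WHERE NOT EXISTS. The table names, columns, dates, and the helper duplicates_after_load are all hypothetical.

```python
import sqlite3

SETUP = """
    CREATE TABLE target (id INTEGER, day TEXT, amount INTEGER);
    CREATE TABLE staging (id INTEGER, day TEXT, amount INTEGER);
    INSERT INTO target VALUES (1, '2020-01-05', 10), (2, '2020-02-10', 20);
    -- Row 2 is outside the intended January range but is in staging anyway.
    INSERT INTO staging VALUES (1, '2020-01-05', 10), (2, '2020-02-10', 20);
"""

def duplicates_after_load(target_predicate):
    """Insert unmatched staging rows, matching only against the target rows
    that satisfy target_predicate; return ids that end up duplicated."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.execute(f"""
        WITH filtered_target AS (
            SELECT * FROM target WHERE {target_predicate}
        )
        INSERT INTO target
        SELECT s.id, s.day, s.amount
        FROM staging s
        WHERE NOT EXISTS (
            SELECT 1 FROM filtered_target t WHERE t.id = s.id
        )
    """)
    return conn.execute(
        "SELECT id, COUNT(*) FROM target GROUP BY id HAVING COUNT(*) > 1"
    ).fetchall()

# Matching against a date-filtered subset: row 2 looks "new" and is duplicated.
dup_filtered = duplicates_after_load("day BETWEEN '2020-01-01' AND '2020-01-31'")
# Matching against the whole table: nothing is duplicated.
dup_full = duplicates_after_load("1 = 1")

print(dup_filtered)  # [(2, 2)]
print(dup_full)      # []
```

This only illustrates the correctness side of the trade-off. It says nothing about how SQL Server would perform on a 20M-row target, which would need to be measured there.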

Star Schema Query - do you have to include all dimensions(joins) in the query?

Hi I have a question regarding star schema query in MS SQL datawarehouse.
I have a fact table and 8 dimensions, and I am confused: to get the metrics from the fact table, do I have to join all the dimensions to it, even though I am not getting data from them? Is this required to get the right metrics?
My fact table is huge, so that's why I am wondering for performance purposes and the right way to query.
Thanks!
No, you do not have to join all 8 dimensions. You only need to join the dimensions that contain data you need for analyzing the metrics in the fact table. Also, to increase performance, make sure to include only the columns from the dimension tables that are needed for the analysis. Including all columns from the dimensions you join will decrease performance.
It is not necessary to include all the dimensions. Indeed, while exploring fact tables, it is very important to be able to select only some dimensions to join and drop the others. Performance issues must not be an excuse to give up on this capability.
You have a bunch of different techniques to solve performance issues, depending on the database you are using. Some common ways:
Aggregate tables: this is one of the best ways to solve performance issues. If you have a huge fact table, you can create an aggregate version of it, using only the most frequently queried columns. This way, it should be much smaller. Then users (or the ad hoc query application) have to use the aggregate table instead of the original fact table whenever possible. The good news is that many databases can manage aggregate tables automatically (materialized views, for example). Queries that initially target the original fact table are transparently redirected to the aggregate table whenever possible.
Indexing: bitmap indexing, for example, can be an efficient way to increase performance on a star schema fact table.
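Both points can be sketched in a few lines. The example below uses Python's built-in sqlite3 module, not MS SQL, and the tables fact_sales, dim_product, dim_store and agg_sales_by_product are hypothetical. It joins only the one dimension the query needs, then answers the same question from a hand-built aggregate table. SQLite has no materialized views, so the aggregate is created manually here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical star schema: one fact table, two dimensions.
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fact_sales (product_id INTEGER, store_id INTEGER, amount INTEGER);
    INSERT INTO dim_product VALUES (1, 'toys'), (2, 'books');
    INSERT INTO dim_store VALUES (1, 'north'), (2, 'south');
    INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 15), (2, 1, 7);
""")

# Sales by category: only dim_product is needed, so dim_store is not joined.
by_category = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

# A manually maintained aggregate table, pre-summed per product.
conn.execute("""
    CREATE TABLE agg_sales_by_product AS
    SELECT product_id, SUM(amount) AS amount
    FROM fact_sales
    GROUP BY product_id
""")

# The same question answered from the smaller aggregate table.
from_aggregate = conn.execute("""
    SELECT p.category, SUM(a.amount)
    FROM agg_sales_by_product a
    JOIN dim_product p ON p.product_id = a.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

print(by_category)     # [('books', 7), ('toys', 25)]
print(from_aggregate)  # [('books', 7), ('toys', 25)]
```

The totals are correct without the store dimension because every fact row is still counted once. Leaving out a dimension only removes the ability to group or filter by it.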

Best way to compare contents of two tables in Teradata?

When you need to compare two tables to see what the differences are, are there any tools or shortcuts you use, or do you handcode the SQL to compare the two tables?
Basically the core features of a product like Red Gate SQL Data Compare (schemas for my tables typically always match).
Background: In my SQL Server environment, I created a stored procedure which inspects the metadata of the two tables/views, creates a query (as dynamic SQL) which joins the two tables on the specified key columns, and compares data in the compare columns, reporting key differences and data differences. The query can either be printed and modified/copied or just executed as is. We are not allowed to create stored procedures in our Teradata environment, unfortunately.
Sounds like a data profiling tool such as Talend's Open Profiler would make the most sense at that point.
You could write a BTEQ statement that builds the query similar to your SQL Server stored procedure and then export the dynamically built SQL. You can then in turn run that inside of your BTEQ. It might get cumbersome, but with enough determination you could probably mock something up.
I don't know if this is the answer you are searching for.
sel * from database_name1.table_name1
minus
sel * from database_name2.table_name2;
You can do the same by selecting specific columns. This returns the rows of table1 that do not exist in table2.
If you were not looking for this type of answer, please ignore this and continue.
You can also select as below.
select
table1.keycol1,
table2.keycol2,
(table1.factcol1 - table2.factcol2) as diff
from table1
inner join
table2
on table1.keycol1 = table2.keycol1
and table1.keycol2 = table2.keycol2
where diff <> 0
This is only a sketch to give you an idea; please excuse any syntax or logic errors.
Hope this helps.
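The set-difference idea can be tried out locally before running it on Teradata. This is a sketch using Python's built-in sqlite3 module, where the operator is spelled EXCEPT instead of MINUS. The tables table1 and table2, their columns k and v, and the sample rows are made up. Running the difference in both directions shows rows missing on either side as well as rows whose values differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables with the same schema, as in the question.
conn.executescript("""
    CREATE TABLE table1 (k INTEGER, v TEXT);
    CREATE TABLE table2 (k INTEGER, v TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (1, 'a'), (2, 'x'), (4, 'd');
""")

# Rows in table1 with no identical row in table2.
only_in_1 = conn.execute("""
    SELECT * FROM table1
    EXCEPT
    SELECT * FROM table2
    ORDER BY 1
""").fetchall()

# Rows in table2 with no identical row in table1.
only_in_2 = conn.execute("""
    SELECT * FROM table2
    EXCEPT
    SELECT * FROM table1
    ORDER BY 1
""").fetchall()

print(only_in_1)  # [(2, 'b'), (3, 'c')]
print(only_in_2)  # [(2, 'x'), (4, 'd')]
```

Key 2 shows up in both outputs because its value differs between the tables, while keys 3 and 4 each exist in only one table.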

MySQL Views - When to use & when not to

The MySQL certification guide suggests that views can be used for:
creating a summary that may involve calculations
selecting a set of rows with a WHERE clause, hiding irrelevant information
presenting the result of a join or union
allowing changes to be made to a base table via a view that preserves the schema of the original table, to accommodate other applications
But from the question "how to implement search for 2 different table data?":
And maybe you're right that it doesn't
work since mysql views are not good
friends with indexing. But still. Is
there anything to search for in the
shops table?
I learned that views don't work well with indexing, so will there be a big performance hit for the convenience they may provide?
A view can be simply thought of as a SQL query stored permanently on the server. Whatever indices the query optimizes to will be used. In that sense, there is no difference between the SQL query or a view. It does not affect performance any more negatively than the actual SQL query. If anything, since it is stored on the server, and does not need to be evaluated at run time, it is actually faster.
It does afford you these additional advantages:
reusability
a single source for optimization
This MySQL forum thread about indexing views gives a lot of insight into what MySQL views actually are.
Some key points:
A view is really nothing more than a stored select statement.
The data of a view is the data of the tables referenced by the view.
Creating an index on a view will not work as of the current version.
If the merge algorithm is used, then indexes of the underlying tables will be used.
The underlying indices are not visible, however. DESCRIBE on a view will show no indexed columns.
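The first two points, and the idea of a view query being merged into a query on the base table, can be illustrated with a small sketch. It uses Python's built-in sqlite3 module, not MySQL, so it shows the general behavior of views as stored selects rather than MySQL's own optimizer. The shops table, the active_shops view and the index idx_shops_city are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shops (id INTEGER, city TEXT, active INTEGER);
    CREATE INDEX idx_shops_city ON shops (city);
    INSERT INTO shops VALUES (1, 'Oslo', 1), (2, 'Bergen', 1), (3, 'Oslo', 0);
    -- The view stores only the SELECT text, not a copy of the rows.
    CREATE VIEW active_shops AS
        SELECT id, city FROM shops WHERE active = 1;
""")

query = "SELECT id FROM active_shops WHERE city = 'Oslo' ORDER BY id"
before = conn.execute(query).fetchall()

# Changing the base table changes what the view returns.
conn.execute("INSERT INTO shops VALUES (4, 'Oslo', 1)")
after = conn.execute(query).fetchall()

# Ask SQLite how it would run a query against the view. The last column
# of each plan row is a human-readable description of the step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM active_shops WHERE city = 'Oslo'"
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)

print(before)  # [(1,)]
print(after)   # [(1,), (4,)]
print(plan_text)
```

In SQLite the plan text names idx_shops_city, meaning the index on the base table is used even though the query only mentions the view. In MySQL, the equivalent check would be to run EXPLAIN on a query against the view.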
MySQL views, according to the official MySQL documentation, are stored queries that when invoked produce a result set.
A database view is nothing but a virtual or logical table (commonly consisting of a SELECT query with joins). Because a database view, like a database table, consists of rows and columns, you can query data against it.
Views should be used when:
Simplifying complex queries (like IF ELSE and JOIN, or working with triggers and such)
Adding an extra layer of security to limit or restrict data access (since views are merely virtual tables, they can be set to be read-only for a specific set of DB users, restricting INSERT)
Providing backward compatibility and query reusability
Working with computed columns. Computed columns should NOT be stored in DB tables, because that would be bad schema design.
Views should not be used when:
The associated table(s) are tentative or subject to frequent structural change.
According to http://www.mysqltutorial.org/introduction-sql-views.aspx
A database table should not have calculated columns however a database view should.
I tend to use a view when I need to calculate totals, counts etc.
Hope that helps!
One more downside of views is that they don't work well with MySQL replication, and they can cause the master to be a bit behind the slave.
http://bugs.mysql.com/bug.php?id=30998