SQL cost compare INNER JOIN vs SELECT twice [closed] - sql

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
There are two tables with a common field id. What I want to do is select all attributes for a specific id, and I'm wondering which way is more efficient.
Use an INNER JOIN, so a single SELECT * operation does all the work.
Select from the smaller table first; if the id exists, then select from the larger table.

In most databases, you want to do the join:
select *
from bigtable b
join smalltable s on b.id = s.id
where b.id = #id;
SQL engines have an optimizer to determine the best execution plan for a query. As mentioned in the comment, having an index would often speed this up.
By selecting from one table and then the other, you are forcing a particular execution plan.
In general, you should trust the SQL engine to produce the best execution plan. In some cases, it may be better to do one and then the other, but generally that is not true.

This will vary based on the circumstances. You can't make a generic statement saying one will always be better.
To compare, you can look at the execution plans, or simply run both and compare execution times.
For example, if data is rarely found in the second table, then over time it might be better to query the smaller table on its own first.
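To make the comparison concrete, here is a minimal sketch in Python using SQLite as a stand-in (table names and data are invented); SQLite's EXPLAIN QUERY PLAN plays the role of the execution plan you would inspect in a full server:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE small (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE big   (id INTEGER PRIMARY KEY, payload TEXT);
    INSERT INTO small VALUES (1, 'a'), (2, 'b');
    INSERT INTO big   VALUES (1, 'x'), (2, 'y');
""")

# Option 1: one query, let the optimizer plan the join.
join_row = conn.execute(
    "SELECT s.name, b.payload FROM small s JOIN big b ON s.id = b.id "
    "WHERE s.id = ?", (1,)).fetchone()

# Option 2: two round trips, probing the small table first.
two_step = None
if conn.execute("SELECT 1 FROM small WHERE id = ?", (1,)).fetchone():
    name = conn.execute(
        "SELECT name FROM small WHERE id = ?", (1,)).fetchone()[0]
    payload = conn.execute(
        "SELECT payload FROM big WHERE id = ?", (1,)).fetchone()[0]
    two_step = (name, payload)

print(join_row, two_step)   # same data either way

# Inspect the plan the optimizer chose for the join.
for step in conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT * FROM small s JOIN big b ON s.id = b.id WHERE s.id = 1"):
    print(step)
```

Both strategies return the same row; the difference is that the join leaves the plan choice to the engine, while the two-step version hard-codes one.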

I suggest you take the second way.
It is good practice to keep the main/primary info in an indexed table, and put the extra/detail info in another, bigger table.
Dividing the info into two parts (main/primary vs. extra/detail) helps because most of the time we only need the first part, which saves the cost of large queries, large data transfers, and network bandwidth.


SQL Join creating duplication in output [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 5 years ago.
So I have a query which, due to my database, requires a lot of tables to function. Unfortunately, this is an older database, which is a little problematic to navigate.
This query originally checked whether a membership was to be 40 or 50 years in length, which was easily solved. However, now I have to see whether an attribute is present telling me they are reinstated members, which would invalidate this timeline.
I have managed to show and exclude records, but the error I believe I have made is in using an outer join: I was attempting to check multiple records in the joined table for this condition, which has led to duplication in several cases, as multiple different attributes can be contained in the same column.
I am just trying to make sense of this currently, but I believe I have used the incorrect join, because where an individual has a different attribute from this result it returns an additional record.
My question is: which join should I be using, or should I in this instance be looking at writing a subquery within the WHERE condition?
You might want to use a LEFT/RIGHT or FULL OUTER JOIN, depending on which side's unmatched rows you need. Or you might use DISTINCT, which forces the result to contain no duplicate rows.
You can use DISTINCT with the column names spelled out. Maybe by using that you won't get the duplicate rows.
e.g.
select distinct ed.id, ed.name, ed.dob, ea.address, ed.emailid
from Employeedetails ed
join Employeeaddress ea on ed.id = ea.id
So I solved this myself by using a subquery.
The nature of the requirement meant that it was difficult for me to produce a list which eliminated the duplicates, because I was looking in a joined table for results which may or may not appear.
So I used a subquery in the WHERE clause to look for the instances where what I was trying to exclude (which was producing the false positives) did appear, and used those results to eliminate those primary keys from the original table, using a NOT IN on the primary key.
Probably not the most efficient, but it does work within a couple of seconds.
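A minimal sketch of that pattern, using SQLite via Python with a made-up member/attribute schema (all names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE member_attr (member_id INTEGER, attr TEXT);
    INSERT INTO member VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    -- Bob has two attributes: a plain join on this table would duplicate him.
    INSERT INTO member_attr VALUES (1, 'Founder'),
                                   (2, 'Reinstated'), (2, 'Donor'),
                                   (3, 'Donor');
""")

# Subquery in the WHERE clause: one row per member, no duplication,
# because the many-rows-per-member table is never joined directly.
rows = conn.execute("""
    SELECT name FROM member
    WHERE id NOT IN (SELECT member_id FROM member_attr
                     WHERE attr = 'Reinstated')
    ORDER BY name
""").fetchall()
print([r[0] for r in rows])   # Bob is excluded once, not returned twice
```

Note that NOT IN behaves surprisingly if the subquery can return NULLs; NOT EXISTS avoids that edge case.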

CTE over self join and sub queries [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
What are the advantages of CTEs over self joins and subqueries?
Whatever we can implement using self joins/subqueries can also be implemented using CTEs. I'm wondering what the benefits of using a CTE over those methods are, apart from the simpler CTE syntax.
CTEs allow easy creation of recursive queries that navigate parent-child structures.
When a query is truly complex (think 5-6 derived tables, roughly 20 joins, and 7 or more WHERE conditions - you know, the average reporting query in an Enterprise environment), then CTEs help make it easier to understand what is happening and thus make the query more maintainable. They may also make it much simpler to get the actual correct result when things work together in a complex manner.
So say I need a report of all the technical information about a bunch of orders that meet a certain criteria. Since the information would be summed, the end result should be the same number of records that you would get if you just queried the order table. So you start with a CTE that just gets the orders you want. And you do a simple select from it and find there are 37 orders that meet the criteria. Now you know what your final results should be, and as you make the query more and more complex, you can easily check that you didn't go wrong.
So now I need the sums of the individual items in each order. And maybe because they are stored in more than one table (say if services and goods were in separate tables) then you have a union that you need to get the sum for. So that becomes the second CTE.
Now you need the point of contact for the order but some orders have multiple people associated and you have to find the main contact by adding some criteria. So now you have a third CTE.
Then you have two shipping addresses, so do you need both concatenated together, or should each shipment be a separate record? At this point it is easier to keep adding chunks of data through the use of multiple CTEs, and when you get to the final query, it is relatively simple because all the complexity was in the individual chunks.
Then a year from now when someone needs to change the business rules for determining one chunk of information, you can change just that one part and be much more confident it didn't affect how the other pieces work together. So much easier to maintain.
And of course what #TabAllerman said about recursion.
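As a small illustration of the recursion point, here is a sketch using SQLite via Python (a hypothetical staff table; the WITH RECURSIVE syntax shown is SQLite's, while SQL Server writes the same query without the RECURSIVE keyword):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO staff VALUES (1, 'CEO', NULL),
                             (2, 'VP',  1),
                             (3, 'Dev', 2);
""")

# Walk the parent-child chain to arbitrary depth: the anchor member picks
# the root, the recursive member repeatedly joins children onto the result.
rows = conn.execute("""
    WITH RECURSIVE chain(id, name, depth) AS (
        SELECT id, name, 0 FROM staff WHERE manager_id IS NULL
        UNION ALL
        SELECT s.id, s.name, c.depth + 1
        FROM staff s JOIN chain c ON s.manager_id = c.id
    )
    SELECT name, depth FROM chain ORDER BY depth
""").fetchall()
print(rows)   # [('CEO', 0), ('VP', 1), ('Dev', 2)]
```

A self join can only reach a fixed number of levels (one join per level); the recursive CTE handles any depth with one query.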

SQL Server - Derived Table Data Storage [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 7 years ago.
In SQL Server, while writing a query, I noticed that joining a derived table (an inner query) with another table takes a long time. The join to the outer table is on the primary key. So I was surprised, since the derived table only had about 10,000 records and 15 columns.
But if we store the data from the derived table in a temp table and then join, the query runs in less than 2 seconds. It made me wonder what the reason would be.
First, you should edit your question and show your query . . . or at least the structure of the query.
Your issue is probably due to optimization of the query. When you create a temporary table, the resulting query has accurate statistics about the table during the compilation phase.
When you use a derived table, SQL Server has to guess at the size of the intermediate table and decide on an execution plan before knowing the actual size. This would appear to be a situation where the guess is wrong.
If you don't want to use a temporary table, you can probably get the same effect using hints, probably forcing the join to use either a hash or merge join algorithm (in my experience, the nested loops algorithm is usually the cause of poor performance).
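A rough illustration of the temp-table rewrite, using SQLite in Python as a stand-in (SQLite doesn't expose SQL Server's statistics machinery, so this only sketches the restructuring itself; schema and data are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL);
    CREATE TABLE lines  (order_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 30.0), (2, 5.0);
    INSERT INTO lines  VALUES (1, 10.0), (1, 20.0), (2, 5.0);
""")

# Derived-table form: the engine must estimate the subquery's size
# before it has materialised anything.
derived = conn.execute("""
    SELECT o.id, d.s FROM orders o
    JOIN (SELECT order_id, SUM(amount) AS s FROM lines GROUP BY order_id) d
      ON d.order_id = o.id
""").fetchall()

# Temp-table form: once the rows are inserted, the engine knows the exact
# row count (and, in SQL Server, gathers real statistics) before planning.
conn.executescript("""
    CREATE TEMP TABLE sums AS
        SELECT order_id, SUM(amount) AS s FROM lines GROUP BY order_id;
""")
temped = conn.execute(
    "SELECT o.id, t.s FROM orders o JOIN sums t ON t.order_id = o.id"
).fetchall()

print(sorted(derived) == sorted(temped))   # same rows either way
```

The two forms are semantically identical; the difference the question describes is purely in how well the planner can estimate the intermediate result.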

Sql server: internal workings [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 9 years ago.
Some of these might make little sense, but:
Is SQL code interpreted or compiled (and to what)?
What are joins translated into - I mean, into some loops or what?
Is algorithm complexity analysis applicable to a query? For example, is it possible to write a really bad SELECT, exponential in time in the number of rows selected? And if so, how does one analyze queries?
Well ... quite general questions, so some very general answers
1) Is SQL code interpreted or compiled (and to what)?
SQL code is compiled into execution plans.
2) What are joins translated into - I mean into some loops or what?
Depends on the join and the tables you're joining (as far as I know). SQL Server has some join primitives (hash join, merge join, nested loop join); depending on the objects involved in your SQL code, the query optimizer tries to choose the best option.
3) Is algorithm complexity analysis applicable to a query, for example is it possible to write a really bad select, exponential in time in the number of rows selected? And if so, how to analyze queries?
Not really sure what you mean by that. But there are cases where you can do really bad things, for example using
SELECT TOP 1 col FROM Table ORDER BY col DESC
on a table without an index on col to find the largest value for col, instead of
SELECT MAX(col) FROM Table
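The same pair of queries, transcribed to SQLite via Python as a sketch (TOP 1 becomes LIMIT 1; the table and data are made up). Without an index on col, the ORDER BY form may have to sort the whole table, while MAX() can find the answer in a single scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in (3, 41, 7, 12)])

# Potentially a full sort of the table when col is unindexed.
top1 = conn.execute(
    "SELECT col FROM t ORDER BY col DESC LIMIT 1").fetchone()[0]

# A single pass over the table, no sort needed.
mx = conn.execute("SELECT MAX(col) FROM t").fetchone()[0]

print(top1, mx)   # 41 41: same answer, potentially very different cost
```

With an index on col, both forms become cheap, which is exactly the kind of thing the execution plan reveals.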
You should get your hands on some/all of the books from the SQL Server internals series. They are really excellent and cover many things in great detail.
You'd get a lot of these answers by reading one of Itzik Ben-Gan's books. He covers these topics you mention in some detail.
http://tsql.solidq.com/books/index.htm

Good resources for learning database optimization part [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I am good at the database (SQL) programming part, but I want to move ahead into the database optimization part: where and when to use indexes, how to decide which query is better than another, how to optimize a database. Can you point me to some good resources or books that can lead me to this?
Inside Microsoft SQL Server 2005: Query Tuning and Optimization,
Inside Microsoft SQL Server 2005: T-SQL Querying
Inside Microsoft SQL Server 2005: The Storage Engine
have very deep and thorough explanations of optimizing SQL Server querying.
SQL Server Query Performance Tuning Distilled, Second Edition
I've recently been focusing on this for my company, and I've learned some interesting things about specifically query optimization.
I've run SQL Profiler for a half hour at a time and logged queries that required 1000 reads or more (then later ones that required 50 CPU or more).
I originally focused on individual queries with the highest reads and CPU. However, having written the logs to a database, I was able to query aggregate results to see which queries required the most aggregate reads and CPU. Targeting these actually helped a lot more than only targeting the most expensive queries.
The most expensive query might be run once a day, so it's good to optimize that. However, if the 10th most expensive query is run 100 times an hour, it's much more helpful to optimize that first.
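The aggregate-ranking idea can be sketched with a toy profiler log in SQLite via Python (all query names and numbers here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trace (query TEXT, reads INTEGER)")
conn.executemany("INSERT INTO trace VALUES (?, ?)", [
    ("big nightly report", 900),              # expensive, but runs once
    ("hot lookup", 300), ("hot lookup", 300), # cheap per run,
    ("hot lookup", 300), ("hot lookup", 300), # but runs constantly
])

# Rank by total reads across all executions, not by single worst run.
rows = conn.execute("""
    SELECT query, COUNT(*) AS runs, SUM(reads) AS total_reads
    FROM trace
    GROUP BY query
    ORDER BY total_reads DESC
""").fetchall()
print(rows)
```

Here the single most expensive execution belongs to the nightly report, but the aggregate view puts the hot lookup first, which matches the point above about where optimization effort pays off.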
Here's a summary of what I've learned so far, which can help you get started in identifying queries for optimization:
A Beginner's Guide to Database Query Optimization
Highly Inefficient Linq Queries that Break Database Indexing
An Obscure Performance Pitfall for Test Accounts and Improperly Indexed Database Tables
Please find some tips for database/query optimization.
Apply functions to parameters, not columns
One of the most common mistakes seen when looking at database queries is the improper use of functions against database columns. Whenever we need to apply a function to a column and validate the result against a value, it's worth checking whether there is a reverse function we can apply to the parameter instead. That way, the database engine can use an index on that column, and there is no need to define a function-based index.
Against a 60-row table with no indexes whatsoever, the following query
SELECT ticker.SYMBOL,
ticker.TSTAMP,
ticker.PRICE
FROM ticker
WHERE TO_CHAR(ticker.TSTAMP, 'YYYY-MM-DD') = '2011-04-01'
executes in 0.006s, whereas the "reverse" query
SELECT ticker.SYMBOL,
       ticker.TSTAMP,
       ticker.PRICE
FROM ticker
WHERE ticker.TSTAMP = TO_DATE('2011-04-01', 'YYYY-MM-DD')
executes in 0.004s.
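The same sargability idea can be sketched in SQLite via Python (TO_CHAR/TO_DATE above are Oracle functions; SQLite's date() stands in here, and the ticker table contents are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticker (symbol TEXT, tstamp TEXT, price REAL);
    CREATE INDEX idx_tstamp ON ticker(tstamp);
    INSERT INTO ticker VALUES ('ACME', '2011-04-01', 10.0),
                              ('ACME', '2011-04-02', 11.0);
""")

# Function wrapped around the column: the index on tstamp is unusable,
# so every row must be evaluated.
slow = conn.execute(
    "SELECT price FROM ticker WHERE date(tstamp) = '2011-04-01'").fetchall()

# Bare column compared to a pre-computed constant: the index can be used.
fast = conn.execute(
    "SELECT price FROM ticker WHERE tstamp = '2011-04-01'").fetchall()

print(slow == fast)   # True: same rows, but only one form is indexable
```

On 60 rows the difference is invisible; on millions of rows, the indexable form is the one that scales.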
Exists clause instead of IN (subquery)
Another observed pattern in database development is that people choose the easy and the most convenient solution and for this tip, we will take a look at finding an element in a list. The easiest and most convenient solution is using the IN operator.
SELECT symbol, tstamp, price
FROM ticker
WHERE price IN (3,4,5);
--or
SELECT symbol, tstamp, price
FROM ticker
WHERE price IN (SELECT price FROM threshold WHERE action = 'Buy');
This approach is OK when we have a small, manageable list. When the list becomes extensively large, or when the list is dynamic (it will be generated based on parameters that we'll have only at runtime), this approach tends to become quite costly for the database. The alternative solution is to use the EXISTS operator, as shown in the code snippet below:
SELECT symbol, tstamp, price
FROM ticker t
WHERE EXISTS (SELECT 1 FROM threshold m WHERE t.price = m.price AND m.action = 'Buy');
This approach will be faster because once the engine has found a hit, it quits looking, as the condition has been proved true. With IN, the engine collects all the results from the subquery before further processing.
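Both forms from the snippets above can be run side by side in SQLite via Python to confirm they return the same rows (the table contents here are made up; relative performance depends on the engine, and modern optimizers often rewrite one form into the other):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticker (symbol TEXT, tstamp TEXT, price REAL);
    CREATE TABLE threshold (price REAL, action TEXT);
    INSERT INTO ticker VALUES ('A', '2011-04-01', 3.0),
                              ('B', '2011-04-01', 9.0);
    INSERT INTO threshold VALUES (3.0, 'Buy'), (9.0, 'Sell');
""")

# IN (subquery): conceptually builds the full list of Buy prices first.
with_in = conn.execute("""
    SELECT symbol FROM ticker
    WHERE price IN (SELECT price FROM threshold WHERE action = 'Buy')
""").fetchall()

# EXISTS: conceptually probes per row and stops at the first match.
with_exists = conn.execute("""
    SELECT symbol FROM ticker t
    WHERE EXISTS (SELECT 1 FROM threshold m
                  WHERE t.price = m.price AND m.action = 'Buy')
""").fetchall()

print(with_in == with_exists)   # True: only symbol 'A' qualifies
```

Since the two forms are logically equivalent here, the honest way to choose between them on a given engine is to compare their execution plans, not to apply the rule blindly.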