SQL Server: internal workings [closed]

Some of these might make little sense, but:
Sql code interpreted or compiled (and to what)?
What are joins translated into - I mean into some loops or what?
Is algorithm complexity analysis applicable to a query, for example is it possible to write really bad select - exponential in time by number of rows selected? And if so how to analyze queries?

Well ... quite general questions, so some very general answers
1) Sql code interpreted or compiled (and to what)?
SQL code is compiled into execution plans.
2) What are joins translated into - I mean into some loops or what?
Depends on the join and the tables you're joining (as far as I know). SQL Server has some join primitives (hash join, nested loop join, merge join); depending on the objects involved in your SQL code, the query optimizer tries to choose the best option.
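A rough sketch of two of those primitives, in Python for illustration only (this is the shape of the algorithms, not how SQL Server literally implements them; the data is made up):

```python
# Toy versions of two join primitives. This shows only the shape of the
# algorithms, not SQL Server's actual implementation (which works on
# pages and indexes, not Python lists).

def nested_loop_join(outer, inner, key):
    # For each outer row, scan the whole inner input: O(n * m).
    return [(o, i) for o in outer for i in inner if o[key] == i[key]]

def hash_join(build, probe, key):
    # Build a hash table on one input, then probe it: roughly O(n + m).
    table = {}
    for b in build:
        table.setdefault(b[key], []).append(b)
    return [(b, p) for p in probe for b in table.get(p[key], [])]

# Made-up rows for illustration.
orders = [{"cust_id": 1, "total": 10}, {"cust_id": 2, "total": 20}]
custs = [{"cust_id": 1, "name": "ann"}, {"cust_id": 3, "name": "bob"}]

print(nested_loop_join(orders, custs, "cust_id"))  # one matching pair
print(hash_join(custs, orders, "cust_id"))         # same match, other order
```

This is why the optimizer's choice matters: for large inputs a nested loop can be quadratic, while a hash join stays roughly linear, at the cost of building the hash table first.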
3) Is algorithm complexity analysis applicable to a query, for example is it possible to write really bad select - exponential in time by number of rows selected? And if so how to analyze queries?
Not really sure what you mean by that, but there are cases where you can do really bad things, for example using
SELECT TOP 1 col FROM Table ORDER BY col DESC
on a table without an index on col to find the largest value for col instead of
SELECT MAX(col) FROM Table
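A small SQLite sketch of the two forms (names invented): without an index both must visit every row, while with an index on col the engine can read the largest value straight off the end of the index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

# Both forms return the same value; without an index, both must
# examine every row (ORDER BY even has to sort them).
top1 = conn.execute("SELECT col FROM t ORDER BY col DESC LIMIT 1").fetchone()[0]
mx = conn.execute("SELECT MAX(col) FROM t").fetchone()[0]
assert top1 == mx == 999

# With an index on col, the engine can read the maximum directly
# from one end of the index instead of scanning the table.
conn.execute("CREATE INDEX idx_col ON t (col)")
plan = conn.execute("EXPLAIN QUERY PLAN SELECT MAX(col) FROM t").fetchall()
print(plan)  # SQLite now reports a search of idx_col, not a table scan
```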
You should get your hands on some/all of the books from the SQL Server internals series. They are really excellent and cover many things in great detail.

You'd get a lot of these answers by reading one of Itzik Ben-Gan's books. He covers the topics you mention in some detail.
http://tsql.solidq.com/books/index.htm

Related

Does the SQL NOT IN operator scale well? [closed]

I'm writing an app where users take quizzes, so my purpose is to show quizzes the user hasn't tackled before. For this reason I'm using SELECT id, name, problem FROM quizzes WHERE id NOT IN (...).
Imagine that there will be thousands of ids and quizzes.
Is this OK? How does it scale? Do I need to redesign something, use a database better suited for this, or use another technique to achieve my purpose?
If you have a fixed list, then it should be fine.
If you have a subquery, then I strongly encourage not exists:
from foo f
where not exists (select 1 from <bar> b where b.quiz_id = f.quiz_id)
I recommend this based on the semantics of not exists versus not in: not exists handles NULL values more intuitively.
That said, with appropriate indexes, not exists often has the better performance in most databases as well.
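The NULL difference is easy to demonstrate (a SQLite sketch with invented table names): a single NULL in the subquery's result makes NOT IN return no rows at all, while NOT EXISTS behaves as expected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE quizzes (id INTEGER);
    CREATE TABLE attempts (quiz_id INTEGER);
    INSERT INTO quizzes VALUES (1), (2), (3);
    INSERT INTO attempts VALUES (1), (NULL);  -- note the NULL
""")

not_in = conn.execute(
    "SELECT id FROM quizzes WHERE id NOT IN (SELECT quiz_id FROM attempts)"
).fetchall()
not_exists = conn.execute(
    "SELECT id FROM quizzes WHERE NOT EXISTS "
    "(SELECT 1 FROM attempts a WHERE a.quiz_id = quizzes.id)"
).fetchall()

print(not_in)      # [] -- the NULL makes every NOT IN comparison unknown
print(not_exists)  # the unattempted quizzes, 2 and 3
```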
You should consider that there are limits to SQL statement length imposed by each database engine. Though I haven't tested those limits, 1k values in an IN operator should still work well for most databases; if you scale up to 10k or more you could hit some databases' limits and your statements will fail.
I would suggest rethinking this solution unless you can verify the worst possible case (with maximum parameters) still works well.
Usually a subquery can do the job, instead of manually sending 1k parameters or assembling a big SQL statement by concatenating strings.

Can converting a SQL query to PL/SQL improve performance in Oracle 12c? [closed]

I have been given an 800-line SQL query which is taking around 20 hours to fetch around 400 million records.
There are 13 tables which are partitioned by month.
The tables have records ranging from 10k to 400 million in each partition.
The tables are indexed on primary keys.
The query uses many inline views and outer joins and a few GROUP BY operations.
DBAs say we cannot add more indexes as it would slow down the performance since it is an OLTP system.
I have been asked to convert the query logic to PL/SQL, populate a table in chunks, and then do a SELECT * from that table.
My end result should be a query which can be fed to my application.
So even after I use PL/SQL to populate a table in chunks, ultimately I need to fetch the data from that table as a query.
My question is: since PL/SQL would require both a select and an insert, is there any chance PL/SQL can be faster than SQL?
Are there any cases where PL/SQL is faster for a result which is also achievable by SQL?
I will be happy to provide more information if the given info doesn't suffice.
Implementing it as a stored procedure could be faster because the SQL will already be parsed and compiled when the procedure is created. However, given the volume of data you are describing, it's unclear whether this will make a significant difference. All you can do is try it and see.
I think you really need to identify where the performance problem is, i.e. where the time is being spent. For example (and I have seen examples of this many times), the majority of the time might be in fetching the 400M rows to whatever the "client" is. In that case, rewriting the query, whether in SQL or PL/SQL, will make no difference.
Anyway, once you can enumerate the problem, you have a better chance of getting sound answers rather than guesses.

SQL Server - Derived Table Data Storage [closed]

In SQL Server, while writing a query, I noticed that joining a derived table (an inner query) with another table was taking a long time, even though the join to the outer table is on the primary key. This surprised me, since the derived table had only about 10,000 records and 15 columns.
But if we store the data from the derived table in a temp table and then join, the query runs in less than 2 seconds. It made me wonder what the reason would be.
First, you should edit your question and show your query . . . or at least the structure of the query.
Your issue is probably due to optimization of the query. When you create a temporary table, then the resulting query has accurate statistics about the table during the compilation phase.
When you use a derived table, SQL Server has to guess at the size of the intermediate table and decide on an execution plan before knowing the actual. This would appear to be a situation where the guess is wrong.
If you don't want to use a temporary table, you can probably get the same effect using hints, probably forcing the join to use either a hash or merge join algorithm (in my experience, the nested loops algorithm is usually the cause of poor performance).
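A sketch of the two shapes in SQLite (names invented; the statistics argument is SQL Server specific, this only shows the mechanics and that the results match):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cust INTEGER, total REAL);
    INSERT INTO orders VALUES (1, 7, 10.0), (2, 7, 20.0), (3, 8, 5.0);
""")

# Derived-table form: the engine must guess the size of the inner
# result while planning the outer join.
derived = conn.execute("""
    SELECT o.id, d.spent
    FROM orders o
    JOIN (SELECT cust, SUM(total) AS spent FROM orders GROUP BY cust) d
      ON d.cust = o.cust
""").fetchall()

# Temp-table form: materialize first, so the optimizer sees real row
# counts (and, in SQL Server, real statistics) when planning the join.
conn.executescript("""
    CREATE TEMP TABLE d AS
        SELECT cust, SUM(total) AS spent FROM orders GROUP BY cust;
""")
materialized = conn.execute(
    "SELECT o.id, d.spent FROM orders o JOIN d ON d.cust = o.cust"
).fetchall()

assert sorted(derived) == sorted(materialized)  # same rows, different planning
```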

SQL cost compare INNER JOIN vs SELECT twice [closed]

There are two tables with a common field id. What I want to do is select all attributes for a specific id, and I'm wondering which way is more efficient:
1) Using an INNER JOIN, so a single SELECT * operation is done.
2) Selecting from the smaller table first; if the id exists, then selecting from the larger table.
In most databases, you want to do the join:
select *
from bigtable b join
smalltable s
on b.id = s.id
where b.id = #id;
SQL engines have an optimizer to determine the best execution plan for a query. As mentioned in the comment, having an index would often speed this up.
By selecting from one table and then the other, you are forcing a particular execution plan.
In general, you should trust the SQL engine to produce the best execution plan. In some cases, it may be better to do one and then the other, but generally that is not true.
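The two strategies can be sketched side by side (SQLite here, with invented table contents); both produce the same data, but the second fixes the plan in application code and costs an extra round trip:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE smalltable (id INTEGER PRIMARY KEY, a TEXT);
    CREATE TABLE bigtable   (id INTEGER PRIMARY KEY, b TEXT);
    INSERT INTO smalltable VALUES (1, 'x');
    INSERT INTO bigtable   VALUES (1, 'y');
""")

the_id = 1

# Strategy 1: one join, letting the optimizer pick the plan.
joined = conn.execute("""
    SELECT b.id, s.a, b.b
    FROM bigtable b JOIN smalltable s ON b.id = s.id
    WHERE b.id = ?
""", (the_id,)).fetchone()

# Strategy 2: probe the small table first, then the big one.
# Two round trips, and the plan is hard-coded by the application.
small = conn.execute("SELECT a FROM smalltable WHERE id = ?", (the_id,)).fetchone()
big = None
if small is not None:
    big = conn.execute("SELECT b FROM bigtable WHERE id = ?", (the_id,)).fetchone()

assert joined == (1, 'x', 'y')
assert (small[0], big[0]) == ('x', 'y')
```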
This will vary based on the circumstances; you can't make a generic statement saying one will always be better.
To compare, you can look at execution plans, or simply run both and compare execution times.
For example, if it's rare to find data in the second table, then over time it might be better to run the single query, etc.
I suggest you take the 2nd way.
It is a good practice to keep the main/primary info in an indexed table, and put the extra/detail info in another big table.
Dividing the info into two parts (main/primary vs. extra/detail) makes sense because most of the time we only need the first part; it can save the cost of a large query, a large data transfer, and network bandwidth.

Good resources for learning database optimization part [closed]

I am good at the database (SQL) programming part, but I want to move ahead into database optimization: where and when to use indexes, how to decide which query is better than another, how to optimize a database. Can you point me to some good resources or books on this?
Inside Microsoft SQL Server 2005: Query Tuning and Optimization,
Inside Microsoft SQL Server 2005: T-SQL Querying
Inside Microsoft SQL Server 2005: The Storage Engine
have very deep and thorough explanations of optimizing SQL Server querying.
SQL Server Query Performance Tuning Distilled, Second Edition
I've recently been focusing on this for my company, and I've learned some interesting things about specifically query optimization.
I've run SQL Profiler for a half hour at a time and logged queries that required 1000 reads or more (then later ones that required 50 CPU or more).
I originally focused on individual queries with the highest reads and CPU. However, having written the logs to a database, I was able to query aggregate results to see which queries required the most aggregate reads and CPU. Targeting these actually helped a lot more than only targeting the most expensive queries.
The most expensive query might be run once a day, so it's good to optimize that. However, if the 10th most expensive query is run 100 times an hour, it's much more helpful to optimize that first.
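The aggregation step can be sketched like this (the trace rows are made up; real SQL Profiler output has many more columns):

```python
# Made-up trace rows: (query_text, logical_reads, cpu_ms).
log = [("SELECT big_report", 5000, 300)] + [("SELECT hot_lookup", 200, 10)] * 30

totals = {}
for query, reads, cpu in log:
    t = totals.setdefault(query, {"count": 0, "reads": 0, "cpu": 0})
    t["count"] += 1
    t["reads"] += reads
    t["cpu"] += cpu

# Rank by aggregate reads rather than per-execution reads: the cheap
# query that runs 30 times outweighs the single expensive one.
ranked = sorted(totals.items(), key=lambda kv: kv[1]["reads"], reverse=True)
print(ranked[0][0])  # SELECT hot_lookup (6000 aggregate reads vs 5000)
```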
Here's a summary of what I've learned so far, which can help you get started in identifying queries for optimization:
A Beginner's Guide to Database Query Optimization
Highly Inefficient Linq Queries that Break Database Indexing
An Obscure Performance Pitfall for Test Accounts and Improperly Indexed Database Tables
Here are some tips for database/query optimization.
Apply functions to parameters, not columns
One of the most common mistakes seen in database queries is the improper use of functions against table columns. Whenever we need to apply a function to a column and compare the result against a value, it's worth checking whether there is a reverse function that we can apply to the given value instead. That way, the database engine can use an index on that column, and there is no need to define a function-based index.
Against a 60-row table with no indexes whatsoever, the following query
SELECT ticker.SYMBOL,
ticker.TSTAMP,
ticker.PRICE
FROM ticker
WHERE TO_CHAR(ticker.TSTAMP, 'YYYY-MM-DD') = '2011-04-01'
executes in 0.006s, whereas the "reverse" query
SELECT ticker.SYMBOL,
ticker.TSTAMP,
ticker.PRICE
FROM ticker
WHERE
ticker.TSTAMP = TO_DATE('2011-04-01', 'YYYY-MM-DD')
-- executes in 0.004s
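The same effect is easy to reproduce in SQLite (used here only because it is easy to run; strftime plays the role of TO_CHAR, and the table is invented): the function-on-column form forces a scan, while the plain comparison can use the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticker (symbol TEXT, tstamp TEXT, price REAL);
    CREATE INDEX idx_tstamp ON ticker (tstamp);
    INSERT INTO ticker VALUES ('ACME', '2011-04-01', 10.0),
                              ('ACME', '2011-04-02', 11.0);
""")

# Function applied to the column: the index on tstamp cannot be used.
plan_fn = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM ticker WHERE strftime('%Y-%m-%d', tstamp) = '2011-04-01'
""").fetchall()

# Plain comparison on the column: the index is usable.
plan_col = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM ticker WHERE tstamp = '2011-04-01'
""").fetchall()

print(plan_fn)   # a SCAN of ticker
print(plan_col)  # a SEARCH using idx_tstamp
```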
Exists clause instead of IN (subquery)
Another pattern observed in database development is that people choose the easiest and most convenient solution. For this tip, we will look at finding an element in a list. The easiest and most convenient solution is using the IN operator.
SELECT symbol, tstamp, price
FROM ticker
WHERE price IN (3,4,5);
--or
SELECT symbol, tstamp, price
FROM ticker
WHERE price IN (SELECT price FROM threshold WHERE action = 'Buy');
This approach is fine when we have a small, manageable list. When the list becomes very large, or when it is dynamic (generated from parameters that we'll only have at runtime), this approach tends to become quite costly for the database. The alternative solution is the use of the EXISTS operator, as shown in the below code snippet:
SELECT symbol, tstamp, price
FROM ticker t
WHERE EXISTS (SELECT 1 FROM threshold m WHERE t.price = m.price AND m.action = 'Buy');
This approach will often be faster because once the engine has found a hit, it can quit looking, as the condition has been proved true. With IN, it will collect all the results from the subquery before further processing.
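As a runnable sanity check (SQLite, made-up data), both forms return the same rows; whether EXISTS is actually faster depends on the engine, since many modern optimizers rewrite IN (subquery) into the same semi-join plan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticker (symbol TEXT, tstamp TEXT, price REAL);
    CREATE TABLE threshold (price REAL, action TEXT);
    INSERT INTO ticker VALUES ('A', '2011-04-01', 3.0), ('B', '2011-04-01', 9.0);
    INSERT INTO threshold VALUES (3.0, 'Buy'), (9.0, 'Sell');
""")

rows_in = conn.execute("""
    SELECT symbol FROM ticker
    WHERE price IN (SELECT price FROM threshold WHERE action = 'Buy')
""").fetchall()

rows_exists = conn.execute("""
    SELECT symbol FROM ticker t
    WHERE EXISTS (SELECT 1 FROM threshold m
                  WHERE t.price = m.price AND m.action = 'Buy')
""").fetchall()

assert rows_in == rows_exists  # identical result sets
```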