I understand what a subquery is (aka inner query / nested query).
A subquery, also known as a nested query or subselect, is a SELECT query embedded within the WHERE or HAVING clause of another SQL query.
Example -
SELECT * FROM customers WHERE cust_id IN (SELECT DISTINCT cust_id FROM orders
WHERE order_value > 5000);
What I really want to understand is the kind (name) of the query below:
SELECT ta.col_a1, ta.col_a2, temp.col_c1 FROM table_a ta, (
SELECT tb.col_b1, tb.col_b2, tc.col_c1 FROM table_b tb, table_c tc
WHERE tb.col_a1 = tc.col_c2 ) AS temp
WHERE temp.col_b1 = ta.col_a1
If I'm right, the above query could be written with inner joins for better performance.
But performance is not my concern; I just want to know the name of this kind of query.
If someone knows the name, please answer.
That type of query is called a "table expression", also known as a "derived table" or an "inline view", depending on the terminology of the database documentation. It takes the place of a table or view in a query.
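The sketch below runs the question's derived-table query and an inner-join version against a tiny in-memory SQLite database, and checks that both return the same rows. All table contents are hypothetical. SQLite is used only because it ships with Python, so this shows logical equivalence and says nothing about performance in other engines. The question's outer query selects `temp.col_tmp_a`, but the derived table has no such column, so `col_c1` is selected instead. The alias `temp` is renamed `tmp` to avoid any confusion with SQLite's built-in `temp` schema.

```python
import sqlite3

# Tiny hypothetical data set shaped like the question's table_a / table_b / table_c.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (col_a1 INTEGER, col_a2 TEXT);
    CREATE TABLE table_b (col_b1 INTEGER, col_b2 TEXT, col_a1 INTEGER);
    CREATE TABLE table_c (col_c1 TEXT, col_c2 INTEGER);
    INSERT INTO table_a VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO table_b VALUES (1, 'b1', 10), (2, 'b2', 20), (4, 'b4', 10);
    INSERT INTO table_c VALUES ('c10', 10), ('c20', 20);
""")

# Derived table: the parenthesised SELECT, aliased "tmp", stands in for a table.
derived = conn.execute("""
    SELECT ta.col_a1, ta.col_a2, tmp.col_c1
    FROM table_a ta,
         (SELECT tb.col_b1, tb.col_b2, tc.col_c1
          FROM table_b tb, table_c tc
          WHERE tb.col_a1 = tc.col_c2) AS tmp
    WHERE tmp.col_b1 = ta.col_a1
    ORDER BY ta.col_a1
""").fetchall()

# The same result written with explicit inner joins.
joined = conn.execute("""
    SELECT ta.col_a1, ta.col_a2, tc.col_c1
    FROM table_a ta
    JOIN table_b tb ON tb.col_b1 = ta.col_a1
    JOIN table_c tc ON tc.col_c2 = tb.col_a1
    ORDER BY ta.col_a1
""").fetchall()

print(derived)
```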
Just to complement the question, the types of subqueries I've identified so far are:
Scalar subquery: a query that returns a single value and takes the place of a scalar, e.g. in a SELECT list.
Table Expression/Derived Table/Inline View: described above.
Independent [Recursive] CTE: A query definition specified before the main query itself.
Dependent [Recursive] CTE: A query definition specified before the main query itself that depends on another CTE(s).
Non-correlated subquery: A subquery that can be run independently of the rest of the query.
Correlated subquery: A subquery that depends on values from the outer query and therefore (logically) has to be evaluated for each outer row.
Lateral subquery: A query placed in the same position as a table expression, but correlated to the tables that precede it.
See "How many types of SQL subqueries are there?".
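Here is a minimal sketch of several types from the list above, using SQLite through Python's sqlite3 module. The customers/orders data is hypothetical and loosely follows the question's first example. SQLite does not support lateral subqueries, so that type is left out.

```python
import sqlite3

# Hypothetical customers/orders data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, order_value REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 1, 7000), (2, 1, 100), (3, 2, 300);
""")

def rows(sql):
    return conn.execute(sql).fetchall()

# Non-correlated scalar subquery: can run on its own and yields one value.
scalar = rows("SELECT name, (SELECT COUNT(*) FROM orders) FROM customers ORDER BY cust_id")

# Correlated scalar subquery: refers to c.cust_id from the outer query.
correlated = rows("""
    SELECT c.name,
           (SELECT COUNT(*) FROM orders o WHERE o.cust_id = c.cust_id)
    FROM customers c ORDER BY c.cust_id
""")

# Derived table (table expression / inline view): a subquery in FROM with an alias.
derived = rows("""
    SELECT t.cust_id, t.total
    FROM (SELECT cust_id, SUM(order_value) AS total FROM orders GROUP BY cust_id) AS t
    ORDER BY t.cust_id
""")

# CTE: the same query defined up front with WITH, then used like a table.
cte = rows("""
    WITH totals AS (SELECT cust_id, SUM(order_value) AS total FROM orders GROUP BY cust_id)
    SELECT c.name FROM customers c JOIN totals t ON t.cust_id = c.cust_id
    WHERE t.total > 1000
""")

print(scalar, correlated, derived, cte)
```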
It is called a derived table. Here are the details:
Derived tables are tables created on the fly by a SELECT statement, and a derived table expression appears in the FROM clause of a query. The server creates and populates the derived table for you (whether it is actually materialized in memory is up to the optimizer), so you can use it directly and don't need to drop it afterwards. However, a derived table's scope is limited to the outer SELECT query that defines it; it can't be used outside that query.
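The scope point can be checked in SQLite (via Python's sqlite3). In this sketch the derived table `big`, a hypothetical name, works inside its own statement. A later statement cannot see it, because nothing was ever created under that name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (cust_id INTEGER, order_value REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 7000), (1, 100), (2, 300)])

# The derived table "big" exists only while this one statement runs.
inside = conn.execute("""
    SELECT big.cust_id FROM (
        SELECT cust_id FROM orders WHERE order_value > 5000
    ) AS big
""").fetchall()

# A separate statement cannot refer to it; there is also nothing to DROP.
try:
    conn.execute("SELECT * FROM big")
    outside_error = None
except sqlite3.OperationalError as exc:
    outside_error = str(exc)

print(inside, outside_error)
```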
It is still just called a subquery. Instead of being used to get a specific value for a clause that filters the list, it is effectively used as a table, so you can select columns from the nested query just as you would from a table. Hope that answers your question.
I came across a SQL practice question. The revealed answer is
SELECT ROUND(ABS(a - c) + ABS(b - d), 4) FROM (
SELECT MIN(lat_n) AS a, MIN(long_w) AS b, MAX(lat_n) AS c, MAX(long_w) AS d
FROM station);
Normally, I would encounter
select[] from[] where [] (select...)
which implies that the value selected by the inner query in the WHERE clause determines what the outer query returns. As mentioned at the beginning, this time the SELECT comes after FROM, so I'm curious how that works. Is it creating an imaginary table?
The piece in parentheses:
(SELECT MIN(lat_n) AS a, MIN(long_w) AS b, MAX(lat_n) AS c, MAX(long_w) AS d FROM station)
is a subquery.
What's important here is that the result of a subquery looks like a regular table to the outer query. In some SQL flavors, an alias is necessary immediately following the closing parenthesis (i.e. a name by which to refer to the table-like result).
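As a sketch, the practice query can be run in SQLite (via Python's sqlite3) against a hypothetical three-row `station` table, with an alias (`extremes`, also hypothetical) added after the closing parenthesis. SQLite accepts the query with or without the alias. MySQL, for one, rejects a derived table that has no alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station (lat_n REAL, long_w REAL)")
conn.executemany("INSERT INTO station VALUES (?, ?)",
                 [(10.0, 20.0), (13.5, 25.25), (11.0, 21.0)])

# The inner query yields one row with four columns (a, b, c, d);
# the outer query reads that row as if it came from a table named "extremes".
distance = conn.execute("""
    SELECT ROUND(ABS(a - c) + ABS(b - d), 4)
    FROM (
        SELECT MIN(lat_n) AS a, MIN(long_w) AS b,
               MAX(lat_n) AS c, MAX(long_w) AS d
        FROM station
    ) AS extremes
""").fetchone()[0]

print(distance)  # |10.0 - 13.5| + |20.0 - 25.25| = 8.75
```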
Whether this is technically a "temporary table" is a detail: its result isn't stored outside the scope of the query, and there is also a separate thing called a temporary table, which is stored.
Additionally (and this might be the source of confusion), subqueries can also be used in the WHERE clause with an operator (e.g. IN) like this:
SELECT student_name
FROM students
WHERE student_school IN (SELECT school_name FROM schools WHERE location='Springfield')
As discussed in the comments and the other answer, this is a subquery.
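To contrast the two placements, this sketch runs the same subquery once in WHERE (feeding `IN`) and once in FROM (joined like a table). It uses SQLite via Python's sqlite3, and the school and student rows are hypothetical. The two forms agree here only because `school_name` is unique. With duplicate matches, the JOIN form could repeat students while `IN` would not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schools (school_name TEXT PRIMARY KEY, location TEXT);
    CREATE TABLE students (student_name TEXT, student_school TEXT);
    INSERT INTO schools VALUES ('North High', 'Springfield'), ('East High', 'Shelbyville');
    INSERT INTO students VALUES ('Alice', 'North High'), ('Bob', 'North High'), ('Carol', 'East High');
""")

# Subquery in WHERE: its single column is the list for the IN operator.
in_where = conn.execute("""
    SELECT student_name FROM students
    WHERE student_school IN (SELECT school_name FROM schools WHERE location = 'Springfield')
    ORDER BY student_name
""").fetchall()

# Same subquery in FROM: its result is used like a table under the alias sp.
in_from = conn.execute("""
    SELECT s.student_name
    FROM students s
    JOIN (SELECT school_name FROM schools WHERE location = 'Springfield') AS sp
      ON sp.school_name = s.student_school
    ORDER BY s.student_name
""").fetchall()

print(in_where, in_from)
```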
Logically, such a subquery (when it appears in the FROM clause) is executed "first", and its results are then treated as a table[1]. Importantly, though, that is not required by the SQL language[2]. The entire query (including any subqueries) is optimized as a whole.
This can include the optimizer pushing a predicate from the outer WHERE clause (admittedly, your query doesn't have one) down into the subquery, if it's better to evaluate that predicate earlier rather than later.
Similarly, if your query contained two subqueries that both access the same base table, that does not necessarily mean the database system will actually read that base table exactly twice.
In any case, whether the database system chooses to materialize the results (store them somewhere) is also decided during the optimization phase. So without knowing your exact RDBMS and the decisions its optimizer makes for this particular query, it's impossible to say whether anything will actually be stored.
[1] Note that there is no standard terminology for this "result set as a table" produced by a subquery. Some people have mentioned "temporary tables", but since that term has a specific meaning in SQL, I won't use it here. I generally use the term "result set" for any set of data consisting of both columns and rows, whether it is the result of the overall query or of a smaller section within a query.
[2] Provided that the final results are the same "as if" the query had been executed in its logical processing order, implementations are free to perform processing in any order they choose.
As there are so many terms involved, I thought I'd throw in another answer.
In a relational database we deal with tables. A query reads from tables and its result again is a table (albeit not a stored one).
So in the FROM clause we can access query results just like any stored table:
select * from (select * from t) x;
This makes the inner query a subquery of our main query. We could also call it an ad-hoc view, because "view" is the word we use for named queries that we read data from. We can move it to the beginning of our main query to improve readability and possibly use it multiple times:
with x as (select * from t) select * from x;
We can even store such queries for later access:
create view v as select * from t;
select * from v;
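The three forms above can be compared side by side in SQLite (via Python's sqlite3). The two rows in `t` are hypothetical. The last query references the CTE `x` twice in one statement, which is the "use it multiple times" case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, val TEXT);   -- base table
    INSERT INTO t VALUES (1, 'x'), (2, 'y');
    CREATE VIEW v AS SELECT * FROM t;        -- viewed table
""")

derived = conn.execute("SELECT * FROM (SELECT * FROM t) x ORDER BY id").fetchall()
cte = conn.execute("WITH x AS (SELECT * FROM t) SELECT * FROM x ORDER BY id").fetchall()
viewed = conn.execute("SELECT * FROM v ORDER BY id").fetchall()

# A CTE can be referenced more than once in the same statement (self-join on x).
pairs = conn.execute("""
    WITH x AS (SELECT * FROM t)
    SELECT a.id, b.id FROM x a JOIN x b ON b.id = a.id + 1
""").fetchall()

print(derived, cte, viewed, pairs)
```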
In the SQL standard these terms are used:
BASE TABLE is a stored table we create with CREATE TABLE .... t in the examples above is assumed to be a base table.
VIEWED TABLE is a view we create with CREATE VIEW .... v in the examples above is a viewed table.
DERIVED TABLE is an ad-hoc view, such as x in the examples above.
When using subqueries in clauses other than FROM (e.g. in the SELECT clause or the WHERE clause), we don't use the term "derived table". This is because in these clauses we don't access tables (something like WHERE mytable = ... does not exist), but columns and expression results. So the term "subquery" is more general than the term "derived table". In those clauses we still use various terms for subqueries, though: there are correlated and non-correlated subqueries, and scalar and non-scalar ones.
And to make things even more complicated, we can use correlated subqueries in the FROM clause in modern DBMSs that support lateral joins (sometimes implemented as CROSS APPLY and OUTER APPLY). The standard calls these LATERAL DERIVED TABLES.
So I've written a simple query that gives me the ID #s for properties that show up only once in the property_usage table, along with the code for their associated usage type. Since I didn't want to include a column that shows the count of how many times each property ID shows up in the property_usage table, I wrote two subqueries to get a list of all the property IDs that only show up once. I then use the result of those subqueries (a single column of propertyIDs) to filter out those properties that show up more than once in the table.
Here's the query:
select pu.property_id, pu.usage_type_id
from acres_final_40.property_usage pu
where pu.property_id not in
(select multiple_use_properties
from
(select pu.property_id multiple_use_properties, count(pu.property_id)
from acres_final_40.property_usage pu
group by pu.property_id having count(pu.property_id) > 1))
order by pu.property_id;
My question is: is that innermost subquery correlated or noncorrelated with the outermost query?
I have the following thoughts (see the paragraph below), but I'd like to know for sure whether I'm right about this. I'm learning all this stuff on my own and don't have anyone I can ask about this in person!
My feeling is that it's not, because the pu.property_id column from the outermost query doesn't seem to be a value that's passed into the innermost query. The innermost query may technically be a derived table, in which case my code is sloppy because I don't give it an alias in the FROM clause where it appears.
In a SQL database query, a correlated subquery (also known as a synchronized subquery) is a subquery (a query nested inside another query) that uses values from the outer query.
(Wikipedia, emph. mine.)
Yours does not, so it's not a correlated subquery. Basically, if you can cut away the subquery and run it as an independent query, without the outer context, it's definitely uncorrelated. It can be done in your case.
BTW, you could probably rewrite it with a NOT EXISTS clause that checks whether another record with the same property_id (but a different primary key) exists, and get a better query plan than with count(). This is my speculation, though; only an explain plan would show whether there's a benefit.
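A minimal sketch of both points, using SQLite through Python's sqlite3. First, the innermost query runs on its own, which confirms it is uncorrelated. Second, a NOT EXISTS rewrite returns the same rows as the original. The schema prefix `acres_final_40` is dropped, and the primary key column `id` is a hypothetical surrogate key, since the question does not name one. Note that the NOT EXISTS subquery *is* correlated: it refers to `pu` from the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows; "id" is an assumed primary key.
conn.executescript("""
    CREATE TABLE property_usage (id INTEGER PRIMARY KEY, property_id INTEGER, usage_type_id INTEGER);
    INSERT INTO property_usage VALUES (1, 100, 1), (2, 100, 2), (3, 200, 1), (4, 300, 3);
""")

# Innermost query: runs fine with no outer context, so it is not correlated.
innermost = """
    SELECT p.property_id AS multiple_use_properties, COUNT(p.property_id)
    FROM property_usage p
    GROUP BY p.property_id HAVING COUNT(p.property_id) > 1
"""
standalone = conn.execute(innermost).fetchall()

# The question's query, with an alias ("multi") added to the derived table.
original = conn.execute(f"""
    SELECT pu.property_id, pu.usage_type_id
    FROM property_usage pu
    WHERE pu.property_id NOT IN (
        SELECT multiple_use_properties FROM ({innermost}) AS multi
    )
    ORDER BY pu.property_id
""").fetchall()

# NOT EXISTS rewrite: look for another row with the same property_id
# but a different primary key. Caveat: a row with a NULL property_id
# would be kept here but dropped by the NOT IN version.
rewrite = conn.execute("""
    SELECT pu.property_id, pu.usage_type_id
    FROM property_usage pu
    WHERE NOT EXISTS (
        SELECT 1 FROM property_usage other
        WHERE other.property_id = pu.property_id AND other.id <> pu.id
    )
    ORDER BY pu.property_id
""").fetchall()

print(standalone, original, rewrite)
```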
I have seen a few queries where the alias of the derived table is also used in the query that makes up the derived table. Can anyone confirm if this is allowable or not?
Here is a sample query. Pay attention to how alias "st" is used twice:
SELECT ft.ThisColumn, st.OtherID
FROM FirstTable ft
INNER JOIN
(SELECT st.CommonID,st.OtherID,DateEntered,DateExited,row_number() OVER (PARTITION BY OtherID ORDER BY DateEntered DESC) stRank
FROM SecondTable st
WHERE (@StartDate BETWEEN DateEntered and DateExited)
) st
ON ft.CommonID=st.CommonID AND st.stRank=1
Is it OK to use the same alias "st" in these two different places?
The st inside the derived table is accessible only inside that query, and only in the logical processing phases that run after its FROM clause. That is fine, because it is not visible in the outer context.
The second st is an alias for the whole derived table's result. It is used in the outer context, again only in the phases that run after the outer FROM clause, and that is fine too.
Logically, the outer query's FROM clause is processed first, which evaluates the derived table. Its (relational) result then gets the alias st and participates in your join.
Additional note: keep in mind that SQL Server databases are closely related to mathematical relations and sets. Every set in mathematical theory needs a valid name so that we can refer to it, and in the same way every relation in SQL Server (table, view, table expression such as a derived table or CTE, etc.) needs a valid name too.
That said, I advise you not to use the same alias twice in one query, even if the two belong to different logical processing phases, because it reduces the readability of your query.
In short, your query is correct and valid.
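This simplified sketch runs in SQLite (via Python's sqlite3) with hypothetical rows. It drops the window function and the date filter and keeps only the alias reuse: `st` names `SecondTable` inside the derived table and names the derived table itself outside. This only shows that SQLite accepts the pattern. The answer above covers SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FirstTable (CommonID INTEGER, ThisColumn TEXT);
    CREATE TABLE SecondTable (CommonID INTEGER, OtherID INTEGER);
    INSERT INTO FirstTable VALUES (1, 'one'), (2, 'two');
    INSERT INTO SecondTable VALUES (1, 10), (2, 20), (3, 30);
""")

result = conn.execute("""
    SELECT ft.ThisColumn, st.OtherID
    FROM FirstTable ft
    INNER JOIN (
        SELECT st.CommonID, st.OtherID   -- inner st: SecondTable
        FROM SecondTable st
        WHERE st.OtherID < 30
    ) st                                 -- outer st: the derived table
        ON ft.CommonID = st.CommonID
    ORDER BY ft.CommonID
""").fetchall()

print(result)
```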
I am new to DB2 and I have a question about the with clause.
For example in the following query:
WITH values AS
(
SELECT user_id, user_data FROM USER WHERE user_age < 20
)
SELECT avg(values.user_data) FROM values
UNION
SELECT sum(values.user_data) FROM values
How many times will the common table expression be executed? Will the result of the WITH clause be stored in a temporary table, or will the subselect run twice?
(I use WITH and UNION here only as an example. Sorry for my poor English.)
As @Vladimir Oselsky has mentioned, only looking at the execution plan will give you a definite answer. In this contrived example, the CTE subselect will likely run twice.
In DB2, common table expressions should create a Common Table Expression node in the execution plan (see the DB2 documentation for that node). The documentation explicitly says:
They serve as intermediate tables. Traditionally, a nested table expression also serves this purpose. However, a common table expression can be referenced multiple times after it is instantiated; nested table expressions cannot.
I read this as saying that the CTE is only evaluated once, instantiated, and then used multiple times. Also, if the CTE is referenced only one time, the "instantiation" is optimized away.
Note that this is the way Postgres handles CTEs (materialized subqueries), and not the way SQL Server handles them.
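The evaluation count is engine-specific, but it can be observed empirically. This sketch does so in SQLite (via Python's sqlite3), not DB2, so it says nothing about DB2's plan. It wraps the CTE's column in a hypothetical pass-through function `track` that counts its calls. The table and CTE are renamed `users` and `young` to avoid the keywords USER and VALUES, and the rows are hypothetical. A label column keeps UNION from collapsing the two result rows if avg and sum happened to be equal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, user_data REAL, user_age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, 10.0, 15), (2, 30.0, 18), (3, 99.0, 40)])

calls = [0]  # mutable counter shared with the callback

def track(x):
    """Pass-through function that counts how often SQLite evaluates it."""
    calls[0] += 1
    return x

conn.create_function("track", 1, track)

result = dict(conn.execute("""
    WITH young AS (
        SELECT user_id, track(user_data) AS user_data FROM users WHERE user_age < 20
    )
    SELECT 'avg', AVG(young.user_data) FROM young
    UNION
    SELECT 'sum', SUM(young.user_data) FROM young
""").fetchall())

# Two qualifying rows: 2 calls suggests the CTE was computed once, 4 that it
# was evaluated once per reference. The outcome depends on the SQLite version
# and optimizer, so treat it as an observation, not a guarantee.
print(result, "track() calls:", calls[0])
```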
The CUST table below will be joined with ~10 tables.
For this subquery in particular, am I better off simply joining up directly with the Customer table and moving the subquery's 4-part WHERE clause to the main query's WHERE clause?
I'm primarily wondering if it is possible to cut down on the amount of processing that SQL Server has to do if we localize portions of the master WHERE Clause by creating subqueries as below.
select * From
(select CKey, CID, CName from MainDB.dbo.Customer
where
LOC = 'ARK'
and Status = 1
and CID not like 'KAN%'
and CID not like 'MIS%') as CUST
In older versions, yes: I've seen huge improvements from using derived tables (not a subquery) rather than putting everything in one JOIN/WHERE. It's less relevant since SQL Server 2005.
However, why not try both and see what happens?
Based on what you provided, there's no need for the subquery. Without the ~10 joins to the derived table CUST, it's extremely difficult to say what should or should not be done.
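The two shapes being compared, the filters inside the CUST derived table versus the same filters in the main WHERE clause, are logically equivalent, which is why an optimizer is free to treat them identically. This sketch checks the equivalence in SQLite (via Python's sqlite3). The `MainDB.dbo` prefix is dropped, the customer rows are hypothetical, and `Invoice` is a hypothetical table standing in for the ~10 joins. Whether SQL Server produces the same plan for both forms is something only its execution plan can show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CKey INTEGER, CID TEXT, CName TEXT, LOC TEXT, Status INTEGER);
    CREATE TABLE Invoice (CKey INTEGER, Amount INTEGER);
    INSERT INTO Customer VALUES
        (1, 'ARK01', 'Acme', 'ARK', 1), (2, 'KAN01', 'Kanco', 'ARK', 1),
        (3, 'ARK02', 'Idle', 'ARK', 0), (4, 'TEX01', 'Texo', 'TEX', 1),
        (5, 'MIS01', 'Misco', 'ARK', 1);
    INSERT INTO Invoice VALUES (1, 50), (1, 75), (2, 10), (3, 20), (4, 30), (5, 40);
""")

# Filters localized in a derived table, as in the question.
derived = conn.execute("""
    SELECT CUST.CName, i.Amount
    FROM (SELECT CKey, CID, CName FROM Customer
          WHERE LOC = 'ARK' AND Status = 1
            AND CID NOT LIKE 'KAN%' AND CID NOT LIKE 'MIS%') AS CUST
    JOIN Invoice i ON i.CKey = CUST.CKey
    ORDER BY i.Amount
""").fetchall()

# Same filters moved to the main query's WHERE clause.
flat = conn.execute("""
    SELECT c.CName, i.Amount
    FROM Customer c
    JOIN Invoice i ON i.CKey = c.CKey
    WHERE c.LOC = 'ARK' AND c.Status = 1
      AND c.CID NOT LIKE 'KAN%' AND c.CID NOT LIKE 'MIS%'
    ORDER BY i.Amount
""").fetchall()

print(derived, flat)
```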
I suggest that if you are joining that many tables it would be better to build a view.