Suppose I have a table with lots of rows and columns (alias: bigtable), and a table that always has exactly one row but multiple columns (alias: 1rowtable). The 1rowtable has nothing to do with bigtable; it just holds settings for my script that are modified dynamically. So I cannot use static SQLCMD variables for them, and I cannot use normal variables either, because my script has GO statements.
Now I want to write a select statement that accesses BOTH tables.
If I do:
SELECT ... FROM bigtable, 1rowtable
it does a CROSS JOIN so that is bad, can't go that route.
If I use a CTE for 1rowtable, I have to access its fields with
SELECT field FROM 1rowtable
So that is bad too. Same with a table valued function like:
CREATE FUNCTION getSetting(@name nvarchar(40))
RETURNS TABLE
AS
RETURN (SELECT name FROM 1rowtable WHERE name = @name)
Obviously I cannot use a scalar function at all because it only returns a specific datatype, but the settings have different datatypes. Yet, obviously I would like to use it LIKE a scalar function of course without doing the 'SELECT .. FROM dbo.getfieldfrom1rowtable(..)' stuff, since I am using the 1rowtable rather often in queries.
I also tried doing:
SELECT
(SELECT
<expression involving bigtable and 1rowtable>,
<expression involving bigtable and 1rowtable>,
<expression involving bigtable and 1rowtable>,
...
FROM 1rowtable)
FROM bigtable
But of course a subselect cannot select more than one item if it does not begin with exists...
So what should I do? It seems I will have to continue using 'SELECT .. FROM dbo.getfieldfrom1rowtable(..)' every time? Just curious :)
PS. ms sql server 2008r2
There is nothing wrong with using a cross join to bring together rows from tables, particularly when one only has one row.
Use the syntax:
select bt.*, ort.*
from bigtable bt cross join
onerowtable ort
There is nothing inherently "wrong" with cross joins, when they are used correctly. The problem is when they are used inadvertently. If you cross join two tables with a million rows . . . well, your temp space is going to fill up, your processor(s) will be very busy, and the query will eventually crash due to a lack of resources.
However, cross joining a table with one row to another table poses no problems at all.
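To make the point concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table contents and the column names (amount, tax_rate, currency) are made up for illustration. It shows that cross joining against a one-row table leaves the row count of the big table unchanged and simply makes the settings columns available on every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for bigtable and the one-row settings table.
    CREATE TABLE bigtable (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE onerowtable (tax_rate REAL, currency TEXT);
    INSERT INTO bigtable (id, amount) VALUES (1, 100.0), (2, 250.0), (3, 40.0);
    INSERT INTO onerowtable (tax_rate, currency) VALUES (0.5, 'EUR');
""")

# Every bigtable row is paired with the single settings row,
# so the result has exactly as many rows as bigtable.
rows = conn.execute("""
    SELECT bt.id, bt.amount * ort.tax_rate AS tax, ort.currency
    FROM bigtable bt
    CROSS JOIN onerowtable ort
    ORDER BY bt.id
""").fetchall()

for row in rows:
    print(row)
```

The same SELECT should carry over to SQL Server as written, since CROSS JOIN is standard syntax; only the setup around it is SQLite-specific.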
I came across a SQL practice question. The revealed answer is
SELECT ROUND(ABS(a - c) + ABS(b - d), 4) FROM (
SELECT MIN(lat_n) AS a, MIN(long_w) AS b, MAX(lat_n) AS c, MAX(long_w) AS d
FROM station);
Normally, I would encounter
select[] from[] where [] (select...)
which implies that the value selected by the inner query in the WHERE clause determines what is queried in the outer one. As mentioned at the beginning, this time the SELECT comes after FROM, and I'm curious what that does. Is it creating an imaginary table?
The piece in parentheses:
(SELECT MIN(lat_n) AS a, MIN(long_w) AS b, MAX(lat_n) AS c, MAX(long_w) AS d FROM station)
is a subquery.
What's important here is that the result of a subquery looks like a regular table to the outer query. In some SQL flavors, an alias is necessary immediately following the closing parenthesis (i.e. a name by which to refer to the table-like result).
Whether this is technically a "temporary table" is a bit of a detail, as its result isn't stored outside the scope of the query; there is also a separate thing called a temporary table, which is stored.
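As a runnable illustration, the sketch below executes the query from the question against a tiny station table in SQLite (via Python's sqlite3 module). The three sample rows and the alias name bounds are assumptions made for the example. The alias after the closing parenthesis is optional in SQLite but required in some other systems, so it is included here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE station (lat_n REAL, long_w REAL);
    -- Made-up sample data for illustration.
    INSERT INTO station VALUES (10.0, 20.0), (12.5, 26.0), (11.0, 23.0);
""")

# The inner SELECT produces one row with columns a, b, c, d;
# the outer SELECT reads from it exactly as it would from a table.
distance = conn.execute("""
    SELECT ROUND(ABS(a - c) + ABS(b - d), 4)
    FROM (
        SELECT MIN(lat_n) AS a, MIN(long_w) AS b,
               MAX(lat_n) AS c, MAX(long_w) AS d
        FROM station
    ) AS bounds
""").fetchone()[0]

print(distance)  # 2.5 + 6.0 = 8.5 for the sample rows
```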
Additionally (and this might be the source of confusion), subqueries can also be used in the WHERE clause with an operator (e.g. IN) like this:
SELECT student_name
FROM students
WHERE student_school IN (SELECT school_name FROM schools WHERE location='Springfield')
This is, as discussed in the comments and the other answer, a subquery.
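For comparison with the FROM-clause form, here is a small sketch of the IN form, run in SQLite through Python's sqlite3 module. The schools and students rows are invented for the example; only the table and column names come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schools (school_name TEXT, location TEXT);
    CREATE TABLE students (student_name TEXT, student_school TEXT);
    -- Made-up sample data for illustration.
    INSERT INTO schools VALUES
        ('Springfield Elementary', 'Springfield'),
        ('Shelbyville High', 'Shelbyville');
    INSERT INTO students VALUES
        ('Ann', 'Springfield Elementary'),
        ('Bob', 'Shelbyville High'),
        ('Cy', 'Springfield Elementary');
""")

# The subquery yields a list of school names; the outer query keeps
# only the students whose school appears in that list.
names = [row[0] for row in conn.execute("""
    SELECT student_name
    FROM students
    WHERE student_school IN (
        SELECT school_name FROM schools WHERE location = 'Springfield'
    )
    ORDER BY student_name
""")]

print(names)  # ['Ann', 'Cy']
```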
Logically, such a subquery (when it appears in the FROM clause) is executed "first", and then the results are treated as a table [1]. Importantly though, that is not required by the SQL language [2]. The entire query (including any subqueries) is optimized as a whole.
This can include the optimizer doing things like pushing a predicate from the outer WHERE clause (which, admittedly, your query doesn't have) down into the subquery, if it's better to evaluate that predicate earlier rather than later.
Similarly, if you had two subqueries in your query that both access the same base table, that does not necessarily mean that the database system will actually query that base table exactly twice.
In any case, whether the database system chooses to materialize the results (store them somewhere) is also decided during the optimization phase. So without knowing your exact RDBMS and the decisions that the optimizer takes to optimize this particular query, it's impossible to say whether it will result in something actually being stored.
[1] Note that there is no standard terminology for this "result set as a table" produced by a subquery. Some people have mentioned "temporary tables" but since that is a term with a specific meaning in SQL, I shall not be using it here. I generally use the term "result set" to describe any set of data consisting of both columns and rows. This can be used both as a description of the result of the overall query and to describe smaller sections within a query.
[2] Provided that the final results are the same "as if" the query had been executed in its logical processing order, implementations are free to perform processing in any order they choose.
As there are so many terms involved, I just thought I'll throw in another answer ...
In a relational database we deal with tables. A query reads from tables and its result again is a table (albeit not a stored one).
So in the FROM clause we can access query results just like any stored table:
select * from (select * from t) x;
This makes the inner query a subquery to our main query. We could also call this an ad-hoc view, because view is the word we use for queries we access data from. We can move it to the beginning of our main query in order to enhance readability and possibly use it multiple times in it:
with x as (select * from t) select * from x;
We can even store such queries for later access:
create view v as select * from t;
select * from v;
In the SQL standard these terms are used:
BASE TABLE is a stored table we create with CREATE TABLE .... t in the examples above is supposed to be a base table.
VIEWED TABLE is a view we create with CREATE VIEW .... v in the examples above is a viewed table.
DERIVED TABLE is an ad-hoc view, such as x in the examples above.
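The three ways of reading a query result shown above (derived table, WITH clause, stored view) can be tried side by side. This is a minimal sketch in SQLite via Python's sqlite3 module; the columns of t and its two rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, label TEXT);   -- base table
    INSERT INTO t VALUES (1, 'a'), (2, 'b');   -- made-up rows
    CREATE VIEW v AS SELECT * FROM t;          -- viewed table
""")

# Derived table: the subquery is written inline in FROM.
derived = conn.execute(
    "SELECT * FROM (SELECT * FROM t) x ORDER BY id").fetchall()

# Same subquery, named up front in a WITH clause.
cte = conn.execute(
    "WITH x AS (SELECT * FROM t) SELECT * FROM x ORDER BY id").fetchall()

# Same query again, stored under the name v.
viewed = conn.execute("SELECT * FROM v ORDER BY id").fetchall()

print(derived == cte == viewed)  # True: all three return the same rows
```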
When using subqueries in other clauses than FROM (e.g. in the SELECT clause or the WHERE clause), we don't use the term "derived table". This is because in these clauses we don't access tables (i.e. something like WHERE mytable = ... does not exist), but columns and expression results. So the term "subquery" is more general than the term "derived table". In those clauses we still use various terms for subqueries, though. There are correlated and non-correlated subqueries and scalar and non-scalar ones.
And to make things even more complicated we can use correlated subqueries in the FROM clause in modern DBMS that feature lateral joins (sometimes implemented as CROSS APPLY and OUTER APPLY). The standard calls these LATERAL DERIVED TABLES.
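The subquery kinds named above for clauses other than FROM can be illustrated with a small sketch, again in SQLite through Python's sqlite3 module. The emp table and its rows are hypothetical. Lateral joins are left out because SQLite does not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table, for illustration only.
    CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO emp VALUES
        ('Ann', 'ops', 50), ('Bob', 'ops', 70),
        ('Cy',  'dev', 90), ('Di',  'dev', 60);
""")

# Non-correlated scalar subquery in the SELECT clause:
# it returns one value and does not depend on the outer row.
with_avg = conn.execute("""
    SELECT name, salary - (SELECT AVG(salary) FROM emp) AS diff
    FROM emp
    ORDER BY name
""").fetchall()

# Correlated scalar subquery in the WHERE clause:
# it refers to e.dept, so its value depends on the outer row.
top_earners = [row[0] for row in conn.execute("""
    SELECT name
    FROM emp e
    WHERE salary = (SELECT MAX(salary) FROM emp i WHERE i.dept = e.dept)
    ORDER BY name
""")]

print(with_avg)
print(top_earners)
```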
I would like to ask about three aspects of performance (Oracle 11g).
1./ If I define a temporary table with the keyword "WITH", like
WITH tbl AS (
SELECT [columns from both tables...]
FROM table_with_inexes
JOIN other_table ...
)
SELECT ...
FROM tbl
JOIN xxx ON tbl.column = xxx.column
is a subsequent select on that temporary table able to use the indexes that were defined on table_with_inexes and other_table?
2./ Is it possible to add indexes to a temporary table created by "WITH" within a single SQL command like the one above?
3./ When I have a construct such as this:
...
LEFT JOIN (
SELECT indexedColumn, otherColumns
FROM table
JOIN other_table
GROUP BY ...
) C
ON (outerTable.indexedColumn = C.indexedColumn)
in which cases could Oracle use indexes on indexedColumn? I assume that the select inside the LEFT JOIN is only a "projection" that does not maintain indexes, so the join's ON clause is evaluated without using indexes?
The WITH clause (or subquery factoring, as it's also known) is just a means of creating aliases for subqueries. It's most useful when you have multiple copies of the same subquery in your query, in which case Oracle may or may not choose to create a temporary table for it behind the scenes (aka "materialize" it). You should read up on this - here's a good link.
To answer your questions:
1) If the indexes are available to be used (no functions on the columns involved, selecting a small percentage of the data etc, etc) then they'll be used, just like in any other query.
2) You can't add indexes to the subquery. Not even to the temporary table that Oracle might create behind the scenes; you have no control over that.
3) I suggest you read up about when indexes might or might not be used. Try http://www.orafaq.com/node/1403 or http://www.orafaq.com/tuningguide/not%20using%20index.html, or perform your own google search.
A WITH clause might be either inlined or materialized. It's up to Oracle to decide which approach is better. In your case, most probably both queries will have the same execution plan (the WITH clause will be inlined).
PS: even if the table is materialized, indexes cannot be added; Oracle cannot do that. On the other hand, in most cases it is not even necessary: the table can be materialized as a hash table (not a heap table), or a full table scan is used on it.
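The idea that an inlined WITH clause still lets the optimizer reach the base table's indexes can be observed in a small experiment. Note the loud assumption: this sketch uses SQLite through Python's sqlite3 module, not Oracle, so it only illustrates the general principle; Oracle's optimizer makes its own decisions and its plans should be checked with its own tooling. The orders table and the index name idx_orders_customer are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and index, for illustration only.
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE INDEX idx_orders_customer ON orders (customer_id);
""")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 50, float(i)) for i in range(1000)],
)

# Ask SQLite how it would run a query that filters on the CTE.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    WITH tbl AS (SELECT id, customer_id, total FROM orders)
    SELECT * FROM tbl WHERE customer_id = 7
""").fetchall()

# The last column of each plan row is a human-readable description.
details = [row[-1] for row in plan]
for detail in details:
    print(detail)
```

In SQLite the plan text names the index on the base table, which shows the filter on the CTE being answered through that index. The exact wording of the plan differs between SQLite versions.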
I maintain an application where I am trying to optimize an Oracle SQL query in which multiple IN clauses are used. This query is now a blocker, as it takes nearly 3 minutes to execute and severely affects application performance. The query is called from Java code (JDBC) and looks like this:
Select distinct col1,col2,col3,.. colN from Table1
where 1=1 and not(col1 in (idsetone1,idsetone2,... idsetoneN)) or
(col1 in(idsettwo1,idsettwo2,...idsettwoN))....
(col1 in(idsetN1,idsetN2,...idsetNN))
The ID sets are retrieved from a different schema and therefore a JOIN between column1 of table 1 and ID sets is not possible. ID sets have grown over time with use of the application and currently they number more than 10,000 records.
How can I start with optimizing this query ?
I really doubt the statement "The ID sets are retrieved from a different schema and therefore a JOIN between column1 of table 1 and ID sets is not possible." Of course you can join the tables, provided you have select privileges on them.
Anyway, let's assume it is not possible for whatever reason. One solution could be to insert all entries into a Nested Table first and then use that one:
CREATE OR REPLACE TYPE NUMBER_TABLE_TYPE AS TABLE OF NUMBER;
Select distinct col1,col2,col3,.. colN from Table1
where 1=1
and not (col1 MEMBER OF NUMBER_TABLE_TYPE(idsetone1,idsetone2,... idsetoneN))
OR
(col1 MEMBER OF NUMBER_TABLE_TYPE(idsettwo1,idsettwo2,...idsettwoN))
Regarding the max. number of elements Oracle Documentation says: Because a nested table does not have a declared size, you can put as many elements in the constructor as necessary.
I don't know how seriously you can take this statement.
You should put all the items into one temporary table and do an explicit join:
Select your cols
from Table1
left join table_with_items
on table_with_items.id = Table1.col1
where table_with_items.id is null;
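The temporary-table approach can be sketched end to end as follows. This uses SQLite through Python's sqlite3 module instead of Oracle over JDBC, and the sample data and the excluded_ids list are invented for the example; the point is only the pattern of bulk-loading the IDs once and then anti-joining against them instead of sending thousands of literals in IN lists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (col1 INTEGER, col2 TEXT);
    -- Temporary table that will hold the IDs to exclude.
    CREATE TEMP TABLE table_with_items (id INTEGER PRIMARY KEY);
""")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1, 11)],
)

# Hypothetical ID set; in the real application this would be the
# list fetched from the other schema.
excluded_ids = [2, 4, 6, 8, 10]
conn.executemany(
    "INSERT INTO table_with_items (id) VALUES (?)",
    [(i,) for i in excluded_ids],
)

# Anti-join: keep only the rows that found no match in the ID table.
kept = [row[0] for row in conn.execute("""
    SELECT Table1.col1
    FROM Table1
    LEFT JOIN table_with_items
        ON table_with_items.id = Table1.col1
    WHERE table_with_items.id IS NULL
    ORDER BY Table1.col1
""")]

print(kept)  # [1, 3, 5, 7, 9]
```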
Also, that distinct suggests a problem in your business logic or in the architecture of the application. Why do you have duplicate ids? You should get rid of that distinct.
I want to select a set of rows and return them to the client, but I would also like to insert just the primary keys (integer id) from the result set into a temporary table for use in later joins in the same transaction.
This is for sync, where subsequent queries tend to involve a join on the results from earlier queries.
What's the most efficient way to do this?
I'm reluctant to execute the query twice, although it may well be fast if it was added to the query cache. An alternative is to store the entire result set in the temporary table and then select from the temporary table afterward. That also seems wasteful (I only need the integer id in the temp table). I'd be happy if there was a SELECT INTO TEMP that also returned the results.
Currently the technique used is to construct an array of the integer ids on the client side and use that in subsequent queries with IN. I'm hoping for something more efficient.
I'm guessing it could be done with stored procedures? But is there a way without that?
I think you can do this with a Postgres feature that allows data modification steps in CTEs. The more typical reason to use this feature is, say, to delete records from a table and then insert them into a log table. However, it can be adapted to this purpose. Here is one possible method (I don't have Postgres on hand to test this):
with q as (
<your query here>
),
t as (
insert into temptable(pk)
select pk
from q
)
select *
from q;
Usually, you use the returning clause with the data modification queries in order to capture the data being modified.
Morning,
I have a stored procedure that returns 'SELECT *' from a table.
Whenever I add a new column to the table, the 'SELECT *' often returns some data in the wrong columns.
Is this an optimization or caching problem? How do I solve this without having to explicitly define the return column names in my stored procedure?
Thanks!
Regardless of the exact nature of your problem, or a solution to it, I would recommend that you don't use Select * From Table. It ties the shape of the result to the current table definition, so every column added or dropped later silently changes what callers receive, and it returns columns the caller may not need.
The reason for your problem probably depends on the client stack you're using.
Speaking in very general terms, SQL Server will return the columns in the order they're added to the table if you perform a SELECT *. If there is more than one table involved in the query, it will return the columns from each table in the order they appear in the query.
Neither caching nor optimisation should affect this server-side if the table columns have changed, so it may be something happening between your code and the server, in whatever data access stack you happen to be using.
This is one of the reasons it's generally recommended not to use "SELECT *" in client code.
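The column-order behaviour described above is easy to observe. The sketch below uses SQLite through Python's sqlite3 module, not SQL Server, so treat it as an illustration of the general behaviour only; the person table is hypothetical. It also shows why client code that reads columns by name is less fragile than code that reads them by position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Ann')")


def star_columns():
    """Return the column names that SELECT * currently produces."""
    cursor = conn.execute("SELECT * FROM person")
    return [column[0] for column in cursor.description]


before = star_columns()
conn.execute("ALTER TABLE person ADD COLUMN email TEXT")
after = star_columns()

print(before)  # ['id', 'name']
print(after)   # ['id', 'name', 'email'] (the new column is appended)

# Reading by name keeps working whatever the column positions are.
conn.row_factory = sqlite3.Row
row = conn.execute("SELECT * FROM person").fetchone()
name = row["name"]
print(name)  # Ann
```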
The best way to avoid this problem is to have the stored procedure return a predefined set of columns (SELECT a, b instead of SELECT *) and use a table join to retrieve the rest of the columns. Because stored procedures cannot be part of a query, you could refactor the stored procedure into a table-valued user-defined function and perform a join on it:
SELECT f.a, f.b, t.*
FROM dbo.fn_YourFunction('a', 'b') f
INNER JOIN YourTable t ON f.id = t.id