Is it possible to get multiple values from a subquery? - sql

Is there any way to have a subquery return multiple columns in Oracle DB? (I know this specific SQL will result in an error, but it sums up what I want pretty well.)
select
  a.x,
  ( select b.y, b.z from b where b.v = a.v )
from a
I want a result like this:
a.x | b.y | b.z
---------------
1 | 2 | 3
I know it is possible to solve this problem through joins, but that is not what I am asking for.
My question is simply: is there any way to get two or more values out of a subquery? Maybe some workaround using dual? So that there is NO actual join, but a new subquery for each row?
EDIT: This is a question of principle. You can solve all these problems using joins, I know. You do not need subqueries like this at all (not even for one column). But they exist. So can I use them in that way, or is it simply impossible?

A subquery in the SELECT clause, as in your case, is also known as a scalar subquery, which is a form of expression: it can only return one value.
I'm afraid you can't return multiple columns from a single Scalar Subquery, no.
Here's more about Oracle Scalar Subqueries:
http://docs.oracle.com/cd/B19306_01/server.102/b14200/expressions010.htm#i1033549
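To illustrate, a hedged sketch against the tables from the question (not taken from the linked docs): a scalar subquery with a single column works, while listing two columns raises an error.
-- works: the subquery returns a single column (and at most one row per outer row)
select
  a.x,
  ( select b.y from b where b.v = a.v ) as y
from a;
-- fails (in Oracle, typically "ORA-00913: too many values"): two columns in a scalar subquery
select
  a.x,
  ( select b.y, b.z from b where b.v = a.v ) as yz
from a;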

Your query as written is invalid, but you can try this instead:
select
  a.x,
  ( select b.y from b where b.v = a.v ) as b_y,
  ( select b.z from b where b.v = a.v ) as b_z
from a
You can also use a subquery in a join:
select
  a.x,
  b.y,
  b.z
from a
left join ( select v, y, z from b where ... ) b on b.v = a.v
or
select
  a.x,
  b.y,
  b.z
from a
left join b on b.v = a.v

Here are two methods to get more than one column from a scalar subquery (or inline subquery) while querying the lookup table only once. It is a bit convoluted, but it can be very efficient in some special cases.
You can use concatenation to get several columns at once:
SELECT x,
       regexp_substr(yz, '[^^]+', 1, 1) y,
       regexp_substr(yz, '[^^]+', 1, 2) z
  FROM (SELECT a.x,
               (SELECT b.y || '^' || b.z
                  FROM b
                 WHERE b.v = a.v) yz
          FROM a)
You would need to make sure that no column in the list contains the separator character.
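One hedged variation (not part of the original answer): concatenate with a character that is very unlikely to occur in the data, such as CHR(30), instead of '^':
SELECT x,
       regexp_substr(yz, '[^' || CHR(30) || ']+', 1, 1) y,
       regexp_substr(yz, '[^' || CHR(30) || ']+', 1, 2) z
  FROM (SELECT a.x,
               (SELECT b.y || CHR(30) || b.z
                  FROM b
                 WHERE b.v = a.v) yz
          FROM a);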
You could also use SQL objects:
CREATE OR REPLACE TYPE b_obj AS OBJECT (y number, z number);

SELECT x,
       v.yz.y y,
       v.yz.z z
  FROM (SELECT a.x,
               (SELECT b_obj(y, z)
                  FROM b
                 WHERE b.v = a.v) yz
          FROM a) v

Can't you use a JOIN like this?
SELECT
a.x , b.y, b.z
FROM a
LEFT OUTER JOIN b ON b.v = a.v
(I don't know Oracle-specific syntax, so I wrote standard SQL syntax.)

You can use CROSS APPLY (available in SQL Server, and in Oracle 12c and later):
select
  a.x,
  bb.y,
  bb.z
from a
cross apply
( select b.y, b.z
  from b
  where b.v = a.v
) bb
If there is no row in b matching a row from a, CROSS APPLY won't return that row. If you need such rows, use OUTER APPLY.
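A hedged sketch of that OUTER APPLY variant (not part of the original answer); rows from a with no match in b are kept, with NULLs for bb.y and bb.z:
select
  a.x,
  bb.y,
  bb.z
from a
outer apply
( select b.y, b.z
  from b
  where b.v = a.v
) bb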
If you need to find only one specific row for each row from a, try:
cross apply
( select top 1 b.y, b.z
  from b
  where b.v = a.v
  order by b.[order]
) bb
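Note that TOP 1 is SQL Server syntax. A hedged sketch of the Oracle equivalent (assuming Oracle 12c or later, which also added CROSS APPLY; the ordering column is just illustrative):
cross apply
( select b.y, b.z
  from b
  where b.v = a.v
  order by b.y  -- or whichever column defines "the one specific row"
  fetch first 1 row only
) bb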

In Oracle, the query
select a.x
      ,(select b.y || ',' || b.z
          from b
         where b.v = a.v
           and rownum = 1) as multiple_columns
  from a
can be transformed to:
select a.x, b1.y, b1.z
from a, b b1
where b1.rowid = (
select b.rowid
from b
where b.v = a.v
and rownum = 1
)
This is useful when we want to prevent duplicate rows for table a.
Similarly, we can increase the number of tables:
.... where (b1.rowid,c1.rowid) = (select b.rowid,c.rowid ....

See this page:
http://www.w3resource.com/sql/subqueries/multiplee-row-column-subqueries.php
Example:
select ord_num, agent_code, ord_date, ord_amount
from orders
where (agent_code, ord_amount) IN
      (SELECT agent_code, MIN(ord_amount)
       FROM orders
       GROUP BY agent_code);

Related

Why does an INSERT take time when using subqueries

INSERT INTO TableA
SELECT x,
       y,
       z
FROM TableB
WHERE x IN (SELECT DISTINCT x
            FROM TableC
            WHERE x NOT IN (SELECT DISTINCT x FROM TableD))
This query takes forever and never completes.
When I run each SELECT query on its own it works fine, but when I run the whole statement it takes forever. Can you see the reason?
Try this query:
insert into TableA
select b.x, b.y, b.z
from TableB b --with(nolock)
left outer join TableC c --with(nolock)
  on b.x = c.x
left outer join TableD d --with(nolock)
  on c.x = d.x
where c.x is not null and d.x is null
If it also runs forever, uncomment WITH (NOLOCK) and try again. If that does not help, check the estimated execution plan.
First, you need to look at the execution plan for the query; it might tell you where the bottlenecks are, or whether there are missing indexes that would significantly speed up your query. I think missing indexes are likely, as your query is simple, so I don't see why it would otherwise take so long.
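For reference, a hedged sketch of one way to get the estimated plan in SQL Server without actually executing the statement (assuming you are running this from SSMS or sqlcmd, where GO is the batch separator):
SET SHOWPLAN_XML ON;
GO
-- while SHOWPLAN_XML is ON, the statement below is not executed;
-- the estimated plan is returned as XML instead
INSERT INTO TableA
SELECT x, y, z
FROM TableB
WHERE x IN (SELECT DISTINCT x
            FROM TableC
            WHERE x NOT IN (SELECT DISTINCT x FROM TableD));
GO
SET SHOWPLAN_XML OFF;
GO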
I think you could also restructure your query so that it uses joins instead of the NOT IN. It would help if I knew the data, to check that this produces the same results, but I think it should:
SELECT B.x,
       B.y,
       B.Z
FROM TableB B
INNER JOIN --where in
(
    SELECT DISTINCT x
    FROM TableC c
    LEFT JOIN TableD d
        ON c.x = d.x
    WHERE d.x IS NULL -- c x not in d x
) sub
    ON B.x = sub.x
Subqueries and DISTINCT when not needed are notoriously poor for performance. You can accomplish what you need using JOINs.
SELECT b.x, b.y, b.z
FROM TableB b
INNER JOIN TableC c ON c.x=b.x
LEFT JOIN TableD d ON d.x=b.x
WHERE d.x IS NULL
GROUP BY b.x, b.y, b.z -- only if you have duplicates and need unique records
The INNER JOIN on TableC fixes your 1st "IN", then the LEFT JOIN and d.x IS NULL fixes your "NOT IN" clause.
Lastly, make sure that you have indexes on the "x" column in each table.
CREATE INDEX IX_TableB_X ON TableB (X);
CREATE INDEX IX_TableC_X ON TableC (X);
CREATE INDEX IX_TableD_X ON TableD (X);

T-SQL: Wild Card as table name alias in select statement possible?

This is more of a curiosity than an actual applied question. Say you have a statement with multiple joins such as:
SELECT
a.name,
b.salary,
c.x
FROM
[table1] a
INNER JOIN [table2] b
ON a.key = b.key
INNER JOIN [table3] c
ON b.key = c.key
Now, say you were to make several more joins to other tables whose schemas are unfamiliar; however, you know:
the keys on which to make the join
that several of those tables have a column with the name 'x'.
Is it possible to select 'x' from all tables that contain it, without explicitly referring to each table alias? So it would have a similar result to this (if it were possible):
SELECT
a.name,
b.salary,
*.x
...
No, this isn't possible.
You can use a.* to get all columns from a, but it is not valid to use a wildcard as the table name.
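Separately, if the goal is just to discover which of the unfamiliar tables actually contain a column named x, a hedged sketch against the standard catalog views (INFORMATION_SCHEMA is available in SQL Server) may help:
SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME = 'x';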
@Martin Smith is correct that you can't use *.x to refer to columns from multiple tables. There is, however, a way to write a query that shows all columns x from the tables where they exist, without breaking if one or more of the tables do not have such a column. It's a rather complicated way that (mis)uses scope resolution.
Let's say that some of the tables (b and d in the example) have a column named x, while some others (c here) do not. Then you can replace INNER joins with CROSS APPLY and LEFT joins with OUTER APPLY, so that a query like:
SELECT
a.name,
a.salary,
b.x AS bx,
'WITHOUT column x' AS cx,
d.x AS dx
FROM
a
INNER JOIN b
ON a.aid = b.aid
LEFT JOIN c
ON a.aid = c.aid
LEFT JOIN d
ON a.aid = d.aid ;
would be written as:
SELECT
a.name,
a.salary,
bx,
cx,
dx
FROM
( SELECT a.*,
'WITHOUT column x' AS x
FROM a
) a
CROSS APPLY
( SELECT x AS bx
FROM b
WHERE a.aid = b.aid
) b
OUTER APPLY
( SELECT x AS cx
FROM c
WHERE a.aid = c.aid
) c
OUTER APPLY
( SELECT x AS dx
FROM d
WHERE a.aid = d.aid
) d ;
Tested at SQL-Server 2008: SQL-Fiddle

SQL Case With Many Columns

Hi, I have looked through the "CASE with multiple columns" questions and don't see anything quite the same as this, so I think I should ask.
Basically I have two tables (both are the result of a subquery) which I want to join. They have the same column names. If I join them on their ids and SELECT * I get each row being something like this:
A.id, A.x, A.y, A.z, A.num, B.id, B.x, B.y, B.z, B.num
What I want is a way to only select the columns of the table with the lower value of num. So in this case the result table would always have 5 columns, id, x, y, z, num, and I don't care which table id, x, y, z, num came from after the fact. Also either table result is fine if they are equal.
SELECT CASE WHEN A.num < B.num THEN A.* ELSE B.* END FROM A JOIN B ON A.id=B.id
would be perfect, but you can only return one column from a CASE statement. I could use a CASE for every column, but that seems so wasteful (there are 8 in each table in my actual database, so I would have 8 CASE statements).
This is SQLite btw. Any help would be appreciated!
Edit for more info on A and B:
A and B come from queries like this:
SELECT "thought case statement might go here" FROM
(SELECT id, x, y, z, num FROM Table1 a JOIN Table2 b ON a.id=b.id AND (y BETWEEN (53348574-3593) AND (53348574+3593)) AND (z BETWEEN (-6259973-6027) AND (-6259973+6027)) JOIN Table3 c ON c.id= b.id GROUP BY a.id, c.r) A
JOIN
(SELECT id, x, y, z, num FROM Table1 a JOIN Table2 b ON a.id=b.id AND (y BETWEEN (53401007-3593) AND (53401007+3593)) AND (z BETWEEN (-6397286-6027) AND (-6397286+6027)) JOIN Table3 c ON c.id= b.id GROUP BY a.id, c.r ) B ON A.id=B.id
So it joins two tables built from geolocation data (hence the big numbers, if you're wondering), and it needs to decide which of the tables to take its data from, based on attributes of what it finds at either of the locations.
Try:
select A.id, A.x, A.y, A.z, A.num from A join B on A.id = B.id where A.num < B.num
union
select B.id, B.x, B.y, B.z, B.num from A join B on A.id = B.id where B.num <= A.num
I don't know of any RDBMS supporting what you want. You will have to write 8 CASE statements. But why is it so wasteful? Or are you just lazy? :)
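For what it's worth, a hedged sketch of the per-column CASE version for the five columns in the simplified example (your real tables would need one CASE per column, eight in your case):
SELECT CASE WHEN A.num < B.num THEN A.id  ELSE B.id  END AS id,
       CASE WHEN A.num < B.num THEN A.x   ELSE B.x   END AS x,
       CASE WHEN A.num < B.num THEN A.y   ELSE B.y   END AS y,
       CASE WHEN A.num < B.num THEN A.z   ELSE B.z   END AS z,
       CASE WHEN A.num < B.num THEN A.num ELSE B.num END AS num
FROM A JOIN B ON A.id = B.id;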
Edit:
See, when you write SELECT * ..., what your RDBMS does is query the system tables (information_schema and so on) to get the list of columns in the table.
So when you write
SELECT CASE WHEN A.num < B.num THEN A.* ELSE B.* END ...
you basically write
SELECT CASE WHEN A.num < B.num THEN A.num, A.whatever, A.more, ... ELSE B.num, B.whatever, B.more... END FROM A JOIN B ON A.id=B.id
and that is, unfortunately, invalid syntax.

Using a single SQL correlated sub-query to get two columns

My problem is represented by the following query:
SELECT
b.row_id, b.x, b.y, b.something,
(SELECT a.x FROM my_table a WHERE a.row_id = (b.row_id - 1), a.something != 42 ) AS source_x,
(SELECT a.y FROM my_table a WHERE a.row_id = (b.row_id - 1), a.something != 42 ) AS source_y
FROM
my_table b
I'm using the same subquery statement twice, for getting both source_x and source_y.
That's why I'm wondering if it's possible to do it using one subquery only?
Because once I run this query on my real data (millions of rows), it seems to never finish and takes hours, if not days (my connection hangs up before the end).
I am using PostgreSQL 8.4
I think you can use this approach:
SELECT b.row_id
, b.x
, b.y
, b.something
, a.x
, a.y
FROM my_table b
left join my_table a on a.row_id = (b.row_id - 1)
and a.something != 42
@DavidEG posted the best syntax for the query.
However, your problem is definitely not just with the query technique. A JOIN instead of two subqueries can speed up things by a factor of two at best. Most likely less. That doesn't explain "hours". Even with millions of rows, a decently set up Postgres should finish the simple query in seconds, not hours.
First thing that stands out is the syntax error in your query:
... WHERE a.row_id = (b.row_id - 1), a.something != 42
AND or OR is needed here, not a comma.
Next thing to check are indexes. If row_id is not the primary key, you may not have an index on it. For optimum performance of this particular query create a multi-column index on (row_id, something) like this:
CREATE INDEX my_table_row_id_something_idx ON my_table (row_id, something)
If the filter excludes the same value every time (something != 42), you can also use a partial index instead for an additional speed-up:
CREATE INDEX my_table_row_id_something_idx ON my_table (row_id)
WHERE something != 42
This will only make a substantial difference if 42 is a common value or if something is a bigger column than just an integer. (An index with two integer columns normally occupies the same size on disk as an index with just one, due to data alignment.) See:
Calculating and saving space in PostgreSQL
Is a composite index also good for queries on the first field?
When performance is an issue, it is always a good idea to check your settings. Standard settings in Postgres use minimal resources in many distributions and are not up to handling "millions of rows".
Depending on your actual version of Postgres, an upgrade to a current version (9.1 at the time of writing) may help a lot.
Ultimately, hardware is always a factor, too. Tuning and optimizing can only get you so far.
Old-fashioned syntax:
SELECT
    b.row_id, b.x, b.y, b.something
  , a.x AS source_x
  , a.y AS source_y
FROM my_table b
   , my_table a
WHERE a.row_id = b.row_id - 1
  AND a.something != 42
;
Join-syntax:
SELECT
    b.row_id, b.x, b.y, b.something
  , a.x AS source_x
  , a.y AS source_y
FROM my_table b
JOIN my_table a
  ON (a.row_id = b.row_id - 1)
WHERE a.something != 42
;
SELECT b.row_id, b.x, b.y, b.something, a.x, a.y
FROM my_table b
LEFT JOIN (
    SELECT row_id + 1 AS row_id, x, y
    FROM my_table
    WHERE something != 42
) AS a ON a.row_id = b.row_id;
Postgres (note that LATERAL requires PostgreSQL 9.3 or later, so it is not available on the 8.4 mentioned in the question):
SELECT
  b.row_id, b.x, b.y, b.something,
  source_x,
  source_y
FROM
  my_table b,
  LATERAL (SELECT a.x AS source_x, a.y AS source_y
             FROM my_table a
            WHERE a.row_id = (b.row_id - 1)
              AND a.something != 42) a
SQL Server:
SELECT
  b.row_id, b.x, b.y, b.something,
  source_x,
  source_y
FROM
  my_table b
OUTER APPLY (SELECT a.x AS source_x, a.y AS source_y
               FROM my_table a
              WHERE a.row_id = (b.row_id - 1)
                AND a.something != 42) a

Locally symmetric difference in sql

I have a problem similar to this StackOverflow question, except that I need to exclude certain fields from the comparison but still include them in the result set.
I'm terming the problem a locally symmetric difference.
For example, tables A and B have columns X, Y, Z, and I want to compare only Y and Z for differences, but I still want the result set to include X.
Sounds like this is basically what you want. Match rows between two tables on columns Y and Z, find the unmatched rows, and output the values of columns X, Y, and Z.
SELECT a.x, a.y, a.z, b.x, b.y, b.z
FROM a FULL OUTER JOIN b ON a.y = b.y AND a.z = b.z
WHERE a.y IS NULL OR b.y IS NULL
Old-style SQL for a full join: A concatenated with B, excluding rows in B that are also in A (the middle):
-- all rows in A with or without matching B
select a.x, a.y, a.z
from a
left join b
on a.x = b.x
and a.y = b.y
union all
-- all rows in B with no match in A to "exclude the middle"
select b.x, b.y, null as z
from b
where not exists (select null
from a
where b.x = a.x
and b.y = a.y)
ANSI Style:
select coalesce(a.x, b.x) as x,
coalesce(a.y, b.y) as y,
a.z
from a
full outer join b
on a.x = b.x
and a.y = b.y
The COALESCE calls are there for safety; I've never actually had cause to write a full outer join in the real world.
If what you really want is to find out whether two tables are identical, here's how:
SELECT COUNT(*)
FROM ((SELECT list_of_columns
       FROM one_of_the_tables
       MINUS
       SELECT list_of_columns
       FROM the_other_table)
      UNION ALL
      (SELECT list_of_columns
       FROM the_other_table
       MINUS
       SELECT list_of_columns
       FROM one_of_the_tables))
If that returns a non-zero result, then there is a difference. It doesn't tell you which table it's in, but it's a start.
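A hedged extension of the same idea (keeping the placeholder names from the snippet above) that also labels which table each mismatched row comes from:
SELECT 'one_of_the_tables' AS source_table, list_of_columns
FROM (SELECT list_of_columns FROM one_of_the_tables
      MINUS
      SELECT list_of_columns FROM the_other_table)
UNION ALL
SELECT 'the_other_table' AS source_table, list_of_columns
FROM (SELECT list_of_columns FROM the_other_table
      MINUS
      SELECT list_of_columns FROM one_of_the_tables)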