SQL joined subquery problem / performance - sql

Is there a way to write a query to obtain the same result set as the following "imaginary" query?
CREATE OR REPLACE VIEW v_report AS
SELECT
meta.refnum AS refnum,
codes.svc_codes AS svc_codes
FROM
t_bill AS meta
JOIN (SELECT
string_agg(p.service_code, ':') AS svc_codes
FROM
t_bill_service_services AS p
WHERE
p.refnum = meta.refnum
) AS codes
ON meta.refnum = codes.refnum
This query is imaginary because it won't run: it fails with an error saying that meta.refnum in the WHERE clause cannot be referenced from that part of the query.
Note 1: many columns from a variety of other tables which are also JOINed are omitted in the interest of brevity. This may preclude some simpler solutions which eliminate the subquery.
Note 2: it is possible to make this work (for some definitions of "work") by adding the p.refnum column to the subquery and doing a GROUP BY p.refnum and removing the WHERE altogether, but this of course means the entire t_bill_service_services table gets scanned and sorted -- very very slow for my situation, as the table is reasonably large.
(The SQL flavor is Postgres, but should be irrelevant, as only the string_agg() call should be non-std SQL.)

Rather than JOINing to a derived table, you can place the subquery in the SELECT list. A subquery there can reference columns from the outer table, so it only aggregates the relevant rows of the other table. For example:
select meta.refnum,
(SELECT string_agg(p.service_code, ':')
FROM t_bill_service_services AS p
WHERE p.refnum = meta.refnum
) AS svc_codes
from t_bill meta
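To check that the correlated subquery gives the same rows as the GROUP BY variant from Note 2, here is a minimal sketch. It makes some assumptions: an in-memory SQLite database stands in for Postgres, group_concat() stands in for string_agg(), the sample rows are made up, and the derived table is LEFT JOINed so that bills without services still appear.

```python
import sqlite3

# SQLite stands in for Postgres; group_concat() plays the role of string_agg().
# Table and column names follow the question; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_bill (refnum INTEGER PRIMARY KEY);
    CREATE TABLE t_bill_service_services (refnum INTEGER, service_code TEXT);
    INSERT INTO t_bill VALUES (1), (2), (3);
    INSERT INTO t_bill_service_services VALUES (1, 'A'), (1, 'B'), (2, 'C');
""")

# Correlated scalar subquery in the SELECT list: only the rows matching the
# current meta.refnum are aggregated.
correlated = conn.execute("""
    SELECT meta.refnum,
           (SELECT group_concat(p.service_code, ':')
              FROM t_bill_service_services AS p
             WHERE p.refnum = meta.refnum) AS svc_codes
      FROM t_bill AS meta
     ORDER BY meta.refnum
""").fetchall()

# GROUP BY variant: aggregate the whole service table first, then join.
grouped = conn.execute("""
    SELECT meta.refnum, codes.svc_codes
      FROM t_bill AS meta
      LEFT JOIN (SELECT p.refnum,
                        group_concat(p.service_code, ':') AS svc_codes
                   FROM t_bill_service_services AS p
                  GROUP BY p.refnum) AS codes
        ON codes.refnum = meta.refnum
     ORDER BY meta.refnum
""").fetchall()

print(correlated)
print(grouped)
```

Both queries return one row per bill, with NULL for a bill that has no services. The order of codes inside the aggregated string is not guaranteed without an explicit ordering.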

What you are describing is a lateral join -- and Postgres supports these. You can write the query as:
SELECT meta.refnum AS refnum,
codes.svc_codes AS svc_codes
FROM t_bill meta CROSS JOIN LATERAL
(SELECT string_agg(p.service_code, ':') AS svc_codes
FROM t_bill_service_services p
WHERE p.refnum = meta.refnum
) codes;
In this case, a lateral join is a lot like a correlated subquery (Nick's answer). However, a lateral join is much more powerful, because it allows the subquery to return multiple columns and multiple rows.

Related

SQL subselect statement very slow on certain machines

I've got an sql statement where I get a list of all Ids from a table (Machines).
Then I need the latest instance of a row in another table (Events) where the ids match, so I have been doing a subselect.
I need the latest instance of quite a few fields that match the id, so I have these subselects one after another within this single statement and end up with results similar to this...
This works and the results are spot on, it's just becoming very slow as the Events Table has millions of records. The Machine table would have on average 100 records.
Is there a better solution than subselects? Maybe doing inner joins or a stored procedure?
Help appreciated :)
You can use apply. You don't specify how "latest instance" is defined. Let me assume it is based on the time column:
Select a.id, b.*
from TableA a outer apply
(select top(1) b.Name, b.time, b.weight
from b
where b.id = a.id
order by b.time desc
) b;
Both APPLY and the correlated subquery need an ORDER BY to do what you intend.
APPLY is a lot like a correlated subquery in the FROM clause -- with two convenient enhancements: a lateral join -- technically what APPLY does -- can return multiple rows and multiple columns.
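SQLite has no APPLY, but the same "latest row per id" idea can be sketched there by joining on a correlated subquery that picks the newest event's key. This is only an illustration under assumptions: the machines/events tables, the event_id key, the index, and the sample rows are all hypothetical, and "latest" is taken to mean the highest time value.

```python
import sqlite3

# Hypothetical schema: one row per machine, many events per machine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE machines (id INTEGER PRIMARY KEY);
    CREATE TABLE events (event_id INTEGER PRIMARY KEY, machine_id INTEGER,
                         name TEXT, time TEXT, weight REAL);
    -- An index on (machine_id, time) lets each per-machine lookup avoid a full scan.
    CREATE INDEX idx_events_machine_time ON events (machine_id, time);
    INSERT INTO machines VALUES (1), (2), (3);
    INSERT INTO events (machine_id, name, time, weight) VALUES
        (1, 'start', '2020-01-01', 10.0),
        (1, 'stop',  '2020-01-02', 11.5),
        (2, 'start', '2020-01-03', 7.0);
""")

# For each machine, find the key of its newest event and join to that one row.
# LEFT JOIN keeps machines without events, like OUTER APPLY would.
latest = conn.execute("""
    SELECT m.id, e.name, e.time, e.weight
      FROM machines AS m
      LEFT JOIN events AS e
        ON e.event_id = (SELECT e2.event_id
                           FROM events AS e2
                          WHERE e2.machine_id = m.id
                          ORDER BY e2.time DESC
                          LIMIT 1)
     ORDER BY m.id
""").fetchall()

print(latest)
```

All the wanted columns come from a single lookup per machine, instead of one subselect per column.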

Avoid repeated information when having multiple joins?

I have the following query that uses joins to join multiple tables
select DISTINCT
tblArticles.Article_Title,
tblArticles.Article_img,
tblArticles.Article_Content,
tblArticles.Article_Date_Created,
tblArticles.Article_Sequence,
tblWriters.Writer_Name,
tblTypes.Article_Type_Name,
tblimages.image_path as "Extra images"
from tblArticles inner join tblWriters
on tblArticles.Writer_ID_Fkey = tblWriters.Writer_ID inner join
tblArticleType on tblArticles.Article_ID = tblArticleType.Article_ID_Fkey inner join
tblTypes on tblArticleType.Article_Type_ID_Fkey = tblTypes.Article_Type_ID left outer join tblExtraImages
on tblArticles.Article_ID = tblExtraImages.Article_ID_Fkey left outer join tblimages
on tblExtraImages.image_id_fkey = tblimages.image_id
order by tblArticles.Article_Sequence, tblArticles.Article_Date_Created;
And I get the following results:
If an article has more than one type_name then I will get repeated columns for the rest of the records. Is there another way of joining these tables that would prevent that from happening?
The simplest method is to just remove column Article_Type_Name from the select clause. This allows SELECT DISTINCT to identify the rows as duplicates, and eliminate them.
Another option is to use an aggregation function on the column. In recent SQL Server versions, STRING_AGG() comes in handy (you can also use MIN() or MAX()):
select
tblArticles.Article_Title,
tblArticles.Article_img,
tblArticles.Article_Content,
tblArticles.Article_Date_Created,
tblArticles.Article_Sequence,
tblWriters.Writer_Name,
string_agg(tblTypes.Article_Type_Name, ',')
within group(order by tblTypes.Article_Type_Name) Article_Type_Name_List,
tblimages.image_path as Extra_Images
from ..
group by
tblArticles.Article_Title,
tblArticles.Article_img,
tblArticles.Article_Content,
tblArticles.Article_Date_Created,
tblArticles.Article_Sequence,
tblWriters.Writer_Name,
tblimages.image_path
What you're seeing here is a Cartesian product; you've joined tables in such a way that multiple rows from one side match with rows from the other.
If you don't care about the article type, then group by the other columns and take max(article_type), or omit it by using a subquery that selects distinct records, not including the article type column, from the table that contains the article type. If your SQL Server is recent enough and you want to know all the article types, you could STRING_AGG them into a CSV list.
Ultimately what you choose to do depends on what you want them for: filter the rows out, or group them down.
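A small sketch of the row multiplication and of grouping it back down. The assumptions: SQLite stands in for SQL Server, group_concat() stands in for STRING_AGG(), and the two simplified tables and their rows are hypothetical stand-ins for the article and type tables in the question.

```python
import sqlite3

# Simplified, hypothetical versions of the article and type tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (article_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE article_types (article_id INTEGER, type_name TEXT);
    INSERT INTO articles VALUES (1, 'Intro'), (2, 'Advanced');
    INSERT INTO article_types VALUES (1, 'News'), (1, 'Sports'), (2, 'News');
""")

# Plain join: an article with two types comes back twice.
joined = conn.execute("""
    SELECT a.title, t.type_name
      FROM articles AS a
      JOIN article_types AS t ON t.article_id = a.article_id
""").fetchall()

# Grouping by the article and aggregating the type names collapses the
# duplicates into one row per article.
collapsed = conn.execute("""
    SELECT a.title, group_concat(t.type_name, ',') AS type_list
      FROM articles AS a
      JOIN article_types AS t ON t.article_id = a.article_id
     GROUP BY a.article_id, a.title
     ORDER BY a.article_id
""").fetchall()

print(len(joined), len(collapsed))
```

The plain join yields three rows for two articles; the grouped query yields one row per article with its types in a single column.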

SQL Correlated subquery

I am trying to execute this query but am getting ORA-00904:"QM"."MDL_MDL_ID":invalid identifier. What is more confusing to me is the main query has two sub queries which only differ in the where clause. However, the first query is running fine but getting error for the second one. Below is the query.
select (
select make_description
from make_colours@dblink1
where makc_id = (
select makc_makc_id
from model_colours@dblink1
where to_char(mdc_id) = md.allocate_vehicle_colour_id
)
) as colour,
(
select make_description
from make_colours@dblink1
where makc_id = (
select makc_makc_id
from model_colours@dblink1
where mdl_mdl_id = qm.mdl_mdl_id
)
) as vehicle_colour
from schema1.web_order wo,
schema1.tot_order tot,
suppliers@dblink1 sp,
external_accounts@dblink1 ea,
schema1.location_contact_detail lcd,
quotation_models@dblink1 qm,
schema1.manage_delivery md
where wo.reference_id = tot.reference_id
and sp.ea_c_id = ea.c_id
and sp.ea_account_type = ea.account_type
and sp.ea_account_code = ea.account_code
and lcd.delivery_det_id = tot.delivery_detail_id
and sp.sup_id = tot.dealer_id
and wo.qmd_id = qm.qmd_id
and wo.reference_id = md.web_reference_id(+)
and supplier_category = 'dealer'
and wo.order_type = 'tot'
and trunc(wo.confirmdeliverydate - 3) = trunc(sysdate)
Oracle usually doesn't recognise table aliases (or anything else) more than one level down in a nested subquery; from the documentation:
Oracle performs a correlated subquery when a nested subquery references a column from a table referred to a parent statement one level above the subquery. [...] A correlated subquery conceptually is evaluated once for each row processed by the parent statement.
Note the 'one level' part. So your qm alias isn't being recognised where it is, in the nested subquery, as it is two levels away from the definition of the qm alias. (The same thing would happen with the original table name if you hadn't aliased it - it isn't specifically to do with aliases).
When you modified your query to just have select qm.mdl_mdl_id as Vehicle_colour - or a valid version of that, maybe (select qm.mdl_mdl_id from dual) as Vehicle_colour - you removed the nesting, and the qm was now only one level down from its definition in the main body of the query, so it was recognised.
Your reference to md in the first nested subquery probably won't be recognised either, but the parser tends to sort of work backwards, so it's seeing the qm problem first; although it's possible a query rewrite would make it valid:
However, the optimizer may choose to rewrite the query as a join or use some other technique to formulate a query that is semantically equivalent.
You could also add hints to encourage that but it's better not to rely on that.
But you don't need nested subqueries, you can join inside each top level subquery:
select (
select mc2.make_description
from model_colours@dblink1 mc1,
make_colours@dblink1 mc2
where mc2.makc_id = mc1.makc_makc_id
and to_char(mc1.mdc_id) = md.allocate_vehicle_colour_id
) as colour,
(
select mc2.make_description
from model_colours@dblink1 mc1,
make_colours@dblink1 mc2
where mc2.makc_id = mc1.makc_makc_id
and mc1.mdl_mdl_id = qm.mdl_mdl_id
) as vehicle_colour
from schema1.web_order wo,
...
I've stuck with old-style join syntax to match the main query, but you should really consider rewriting the whole thing with modern ANSI join syntax. (I've also removed the rogue comma @Serg mentioned, but you may just have left out other columns in your real select list when posting the question.)
You could probably avoid subqueries altogether by joining to the make and model colour tables in the main query, either twice to handle the separate filter conditions, or once with a bit of logic in the column expressions. One step at a time though...
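As a rough check that the flattened subquery is equivalent to the nested one, here is a sketch using SQLite with local, hypothetical copies of the tables and made-up rows (no database link). Note the limits of this stand-in: SQLite resolves outer references at any nesting depth, so it does not reproduce the ORA-00904 error; it only shows that the two forms return the same values.

```python
import sqlite3

# Local stand-ins for the remote tables; only the columns used here are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE make_colours (makc_id INTEGER PRIMARY KEY, make_description TEXT);
    CREATE TABLE model_colours (mdl_mdl_id INTEGER, makc_makc_id INTEGER);
    CREATE TABLE quotation_models (qmd_id INTEGER PRIMARY KEY, mdl_mdl_id INTEGER);
    INSERT INTO make_colours VALUES (1, 'Red'), (2, 'Blue');
    INSERT INTO model_colours VALUES (100, 1), (200, 2);
    INSERT INTO quotation_models VALUES (1, 100), (2, 200), (3, 300);
""")

# Nested form: qm is referenced two levels below where it is defined.
nested = conn.execute("""
    SELECT qm.qmd_id,
           (SELECT make_description
              FROM make_colours
             WHERE makc_id = (SELECT makc_makc_id
                                FROM model_colours
                               WHERE mdl_mdl_id = qm.mdl_mdl_id)) AS vehicle_colour
      FROM quotation_models qm
     ORDER BY qm.qmd_id
""").fetchall()

# Flattened form: a join inside a single subquery, so qm is only one level up.
flattened = conn.execute("""
    SELECT qm.qmd_id,
           (SELECT mc2.make_description
              FROM model_colours mc1, make_colours mc2
             WHERE mc2.makc_id = mc1.makc_makc_id
               AND mc1.mdl_mdl_id = qm.mdl_mdl_id) AS vehicle_colour
      FROM quotation_models qm
     ORDER BY qm.qmd_id
""").fetchall()

print(nested == flattened)
```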

How does Subquery in select statement work in oracle

I have looked all over for an explanation of how a subquery in a select statement works, and I still cannot grasp the concept because the explanations are very vague.
I would like to know how to use a subquery in a select statement in Oracle and what exactly it outputs.
For example, if I had a query that needed to display the names of employees and the number of profiles they manage from these tables
Employee(EmpName, EmpId)
Profile(ProfileId, ..., EmpId)
how do I use the subquery?
I was thinking a subquery is needed in the select statement to implement the group by function to count the number of profiles being managed for each employee, but I am not too sure.
It's simple:
SELECT empname,
empid,
(SELECT COUNT (profileid)
FROM profile
WHERE profile.empid = employee.empid)
AS number_of_profiles
FROM employee;
It is even simpler when you use a table join like this:
SELECT e.empname, e.empid, COUNT (p.profileid) AS number_of_profiles
FROM employee e LEFT JOIN profile p ON e.empid = p.empid
GROUP BY e.empname, e.empid;
Explanation for the subquery:
Essentially, a subquery in a select list gets a scalar value and passes it to the main query. Such a subquery is not allowed to return more than one row or more than one column, which is a restriction. Here, we are passing a count to the main query, which, as we know, is always a single number: a scalar value. If no row is found, a scalar subquery returns null to the main query (an aggregate such as COUNT still returns a value, 0). Moreover, a subquery can access columns from the from clause of the main query, as shown in my query where employee.empid is passed from the outer query to the inner query.
Edit:
When you use a subquery in a select clause, Oracle essentially treats it as a left join (you can see this in the explain plan for your query), with the cardinality of the rows being just one on the right for every row in the left.
Explanation for the left join
A left join is very handy, especially when you want to replace the select subquery due to its restrictions. There are no restrictions here on the number of rows of the tables in either side of the LEFT JOIN keyword.
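To see the two forms side by side, here is a minimal sketch. It runs the queries above against an in-memory SQLite database rather than Oracle, and the two sample employees and their profiles are made up. The third query is an extra illustration (not from the answer) of a non-aggregate scalar subquery returning NULL when nothing matches.

```python
import sqlite3

# Tables follow the question: Employee(EmpName, EmpId), Profile(ProfileId, EmpId).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (empid INTEGER PRIMARY KEY, empname TEXT);
    CREATE TABLE profile (profileid INTEGER PRIMARY KEY, empid INTEGER);
    INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO profile VALUES (10, 1), (11, 1);
""")

# Scalar subquery in the select list: one count per employee row.
via_subquery = conn.execute("""
    SELECT empname, empid,
           (SELECT COUNT(profileid)
              FROM profile
             WHERE profile.empid = employee.empid) AS number_of_profiles
      FROM employee
     ORDER BY empid
""").fetchall()

# The same result with a LEFT JOIN and GROUP BY.
via_join = conn.execute("""
    SELECT e.empname, e.empid, COUNT(p.profileid) AS number_of_profiles
      FROM employee e LEFT JOIN profile p ON e.empid = p.empid
     GROUP BY e.empname, e.empid
     ORDER BY e.empid
""").fetchall()

# A non-aggregate scalar subquery yields NULL when no row matches.
first_profile = conn.execute("""
    SELECT empname,
           (SELECT profileid
              FROM profile
             WHERE profile.empid = employee.empid
             ORDER BY profileid
             LIMIT 1) AS first_profile
      FROM employee
     ORDER BY empid
""").fetchall()

print(via_subquery, via_join, first_profile)
```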
For more information read Oracle Docs on subqueries and left join or left outer join.
In the Oracle RDBMS, it is possible to use a multi-row subquery in the select clause as long as the (sub-)output is encapsulated as a collection. In particular, a multi-row select clause subquery can output each of its rows as an xmlelement that is encapsulated in an xmlforest.

I would like a simple example of a sub-query using T-SQL 2008

Can anyone give me a good example of a subquery using TSQL 2008?
Maximilian Mayer believes that, due to referencing MS documentation, my assertion that there is a difference between a subquery and a subSelect is incorrect. Frankly, I'd consider MSDN's "Subquery Fundamentals" a better choice. Quote:
You are making distinctions between terms that actually mean the same.
O RLY?
A subQUERY...
IE:
WHERE id IN (SELECT n.id FROM TABLE n)
OR id = (SELECT MAX(m.id) FROM TABLE m)
OR EXISTS(SELECT 1/0 FROM TABLE) --won't return a math error for division by zero
...affects the WHERE or HAVING clauses -- the filtering of data -- for a SELECT, INSERT, UPDATE or DELETE statement. The value from a subquery is never directly visible in the SELECT clause.
A subSELECT...
IE:
SELECT t.column,
(SELECT x.col FROM TABLE x) AS col2
FROM TABLE t
...does not affect the filtering of data in the main query, and the value is exposed directly in the SELECT clause. But it's only one value - you can't return two or more columns into a single column in the outer query.
A subselect is a consistent means of performing a LEFT JOIN in ANSI-89 join syntax - if there is no supporting row, the column will be null. Additionally, a non-correlated subselect will return the same value for every row of the main query.
Correlation
If a subquery or subselect is correlated, that query runs once for every record of the main query returned -- which doesn't scale well as the number of rows in the result set increases.
Derived Table/Inline View
IE:
SELECT x.*,
y.max_date,
y.num
FROM TABLE x
JOIN (SELECT t.id,
t.num,
MAX(t.date) AS max_date
FROM TABLE t
GROUP BY t.id, t.num) y ON y.id = x.id
...is a JOIN to a derived table (AKA inline view).
"Inline view" is a better term, because that is all that happens when you reference a non-materialized view -- a view is just a prepared SQL statement. There's no performance or efficiency difference if you create a view with a query like the one in the example, and reference the view name in place of the SELECT statement within the brackets of the JOIN. The example has the same information as a correlated subquery, but the performance benefit of using a join and none of the subquery detriments. And you can return more than one column, because it is a view/derived table.
Conclusion
It should be obvious why I and others make distinctions. The concept of relying on the word "subquery" to categorize any SELECT statement that isn't the main clause is fatally flawed, because it's also a specific case under a categorization of the same word (IE: subquery-subselect, subquery-subquery, subquery-join...). Now think of helping someone who says "I've got a problem with a subquery..."
Maximilian Mayer's idea of "official" documentation was written by technical writers, who often have no experience in the subject and are only summarizing what they've been told to from knowledgeable people who have simplified things. Ultimately, it's just text on a page or screen -- like what you're reading now -- and the decision is up to you if the details I've laid out make sense to you.
For variety's sake, here's one in the where clause:
select
a.firstname,
a.lastname
from
employee a
where
a.companyid in (
select top 10
c.companyid
from
company c
where
c.num_employees > 1000
)
...returns all employees in the top ten companies with over 1000 employees.
SELECT
*,
(SELECT TOP 1 SomeColumn FROM dbo.SomeOtherTable)
FROM
dbo.MyTable
SELECT a.*, b.*
FROM TableA AS a
INNER JOIN
(
SELECT *
FROM TableB
) as b
ON a.id = b.id
That's a normal subquery, running once for the whole result set.
On the other hand
SELECT a.*, (SELECT b.somecolumn FROM TableB AS b WHERE b.id = a.id)
FROM TableA AS a
is a correlated subquery, running once for every row in the result set.
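The visible difference can be sketched with a small example. The assumptions: SQLite stands in for SQL Server, and the rows in TableA and TableB are made up. A non-correlated subselect gives every outer row the same value, while a correlated one depends on the current outer row and yields NULL where nothing matches.

```python
import sqlite3

# TableA / TableB as in the answer; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER PRIMARY KEY);
    CREATE TABLE TableB (id INTEGER PRIMARY KEY, somecolumn TEXT);
    INSERT INTO TableA VALUES (1), (2), (3);
    INSERT INTO TableB VALUES (1, 'x'), (2, 'y');
""")

# Non-correlated: the inner query does not mention the outer row.
uncorrelated = conn.execute("""
    SELECT a.id, (SELECT MAX(b.somecolumn) FROM TableB AS b)
      FROM TableA AS a
     ORDER BY a.id
""").fetchall()

# Correlated: the inner query is filtered by the outer row's id.
correlated = conn.execute("""
    SELECT a.id, (SELECT b.somecolumn FROM TableB AS b WHERE b.id = a.id)
      FROM TableA AS a
     ORDER BY a.id
""").fetchall()

print(uncorrelated)
print(correlated)
```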