Can I modify this query using a WITH clause?

I have written the following query in order to achieve the following:
1) Select all regulatory languages that do not have a specified ID.
2) Link those regulatory languages based on a hierarchy field (RL_ID_DEFINED - this field is the ID of the parent regulatory language).
My first variation used NOT IN, but after looking into it I decided that NOT EXISTS would be a more efficient approach. Additionally, I was thinking that adding a WITH clause might make it run a bit faster, since my current code runs the nested SELECT statement for each ID in the iteration. Would it be possible to rewrite this using a WITH clause for that nested SELECT?
SELECT
    T1.ID
FROM
    REGULATORY_LANGUAGES T1
WHERE
    T1.INACTIVE_DATE IS NULL
    AND NOT EXISTS (
        SELECT
            NULL
        FROM
            REGULATORY_LANGUAGES T2,
            REVIEW_REGULATIONS T3
        WHERE
            T3.RVWTYPYR_ID = ?
            AND T3.RL_ID = T2.ID
            AND T1.ID = T2.ID)
START WITH
    RL_ID_DEFINED IS NULL
    AND INACTIVE_DATE IS NULL
CONNECT BY
    PRIOR ID = RL_ID_DEFINED
The problem I'm running into is that when I look at the structure of a WITH clause, I would be creating it prior to my main SELECT. However, that would require me to have defined my T1 table already. Any thoughts?
(Note - this is being called in a Java method, hence the ? in the line T3.RVWTYPYR_ID = ?. When I test this in the database editor via Toad, I just hard-code a value for the ?.)

While speed is important, so is accuracy. You mentioned that you switched from not in to not exists for efficiency. They do different things. There is another way to speed up the logic of not in. Instead of this:
where someField not in
(select someField
from etc
)
Do this
where someField in
(select someField
from etc
where whatever
minus
select someField
from etc
where whatever
and more filters that identify records to exclude
)
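To illustrate the remark that NOT IN and NOT EXISTS "do different things": the two differ as soon as the subquery can return a NULL. The following is a minimal sketch run through Python's built-in sqlite3 module; the tables parent and child and their data are made up for the demonstration and are not part of the original question.

```python
import sqlite3

# Hypothetical tables, only for demonstrating NULL handling.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER);
    CREATE TABLE child (parent_id INTEGER);
    INSERT INTO parent VALUES (1), (2), (3);
    INSERT INTO child VALUES (1), (NULL);
""")

# NOT IN: a single NULL in the subquery makes the predicate UNKNOWN
# for every non-matching row, so no rows come back.
not_in_rows = conn.execute("""
    SELECT id FROM parent
    WHERE id NOT IN (SELECT parent_id FROM child)
    ORDER BY id
""").fetchall()

# NOT EXISTS: the NULL row simply never matches, so 2 and 3 are returned.
not_exists_rows = conn.execute("""
    SELECT id FROM parent p
    WHERE NOT EXISTS (SELECT NULL FROM child c WHERE c.parent_id = p.id)
    ORDER BY id
""").fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [(2,), (3,)]
```

If the compared column is declared NOT NULL on both sides, the two forms return the same rows.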
Now for the with keyword. It speeds up performance when you want to run the exact same subquery more than once. So, instead of this:
where field1 in
(sql for subquery)
and field2 in
(exact same sql as above)
you do this:
with temp as (sql for subquery)
select etc
where field1 in
(select something from temp)
and field2 in
(select something from temp)
However, that's not your situation. What you probably want to do is to investigate ways to send a list of parameters from java so that your query looks like this:
T3.RVWTYPYR_ID in (?,?,etc)
Then you wouldn't have to repeat the subquery.
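For the literal question (can the nested SELECT be moved into a WITH clause?), one possible shape is sketched below. The trick is that the WITH block holds only the part that does not depend on T1, and the correlation to T1 stays in the NOT EXISTS. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module, the sample rows and the name REVIEWED are invented, the START WITH / CONNECT BY part is left out because SQLite has no such syntax, and the join to T2 is dropped on the assumption that it only re-reads the T1 row. Whether it is faster on Oracle would need to be checked against the actual execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE REGULATORY_LANGUAGES (ID INTEGER, RL_ID_DEFINED INTEGER, INACTIVE_DATE TEXT);
    CREATE TABLE REVIEW_REGULATIONS (RL_ID INTEGER, RVWTYPYR_ID INTEGER);
    -- Invented sample data: 4 is inactive, 2 and 3 are children of 1.
    INSERT INTO REGULATORY_LANGUAGES VALUES
        (1, NULL, NULL), (2, 1, NULL), (3, 1, NULL), (4, 2, '2014-01-01');
    INSERT INTO REVIEW_REGULATIONS VALUES (2, 10), (3, 20);
""")

query = """
    WITH REVIEWED AS (
        -- The uncorrelated part of the original subquery.
        SELECT T3.RL_ID
        FROM REVIEW_REGULATIONS T3
        WHERE T3.RVWTYPYR_ID = ?
    )
    SELECT T1.ID
    FROM REGULATORY_LANGUAGES T1
    WHERE T1.INACTIVE_DATE IS NULL
      AND NOT EXISTS (SELECT NULL FROM REVIEWED R WHERE R.RL_ID = T1.ID)
    ORDER BY T1.ID
"""

# Bind the parameter the same way the Java code binds the "?".
ids = [row[0] for row in conn.execute(query, (10,))]
print(ids)  # [1, 3]
```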

Much thanks to Tom H for his insight. I've rewritten the query using JOIN:
SELECT
    T1.ID
FROM
    REGULATORY_LANGUAGES T1
LEFT JOIN (
    SELECT
        T2.ID ID
    FROM
        REGULATORY_LANGUAGES T2
    INNER JOIN
        REVIEW_REGULATIONS T3
    ON
        T3.RVWTYPYR_ID = ?
        AND T3.RL_ID = T2.ID) T_JOIN
ON T1.ID = T_JOIN.ID
WHERE
    T1.INACTIVE_DATE IS NULL
    AND T_JOIN.ID IS NULL
START WITH
    T1.RL_ID_DEFINED IS NULL
    AND T1.INACTIVE_DATE IS NULL
CONNECT BY
    PRIOR T1.ID = T1.RL_ID_DEFINED


Why does Oracle SQL update query return "invalid identifier" on existing column?

I have an update query for an Oracle SQL db. Upon execution the query returns ORA-00904: "t1"."sv_id": invalid identifier
So, why do I get an "invalid identifier" error message although the column exists?
Here is the complete query (replaced actual table and column names by dummies in np++)
UPDATE table_1 t1 SET (type) =
CASE
WHEN
((SELECT COUNT(dateCheck.id) FROM table_2 dateCheck
WHERE dateCheck.s_id = t1.s_id AND dateCheck.sv_id = t1.sv_id) = 0)
THEN
(SELECT sv.type FROM table_3 sv WHERE sv.id = t1.sv_id)
ELSE
(SELECT type FROM
(SELECT d.type as type FROM table_2 d
WHERE d.s_id = t1.s_id AND d.sv_id = t1.sv_id
ORDER BY d.creationTimestamp ASC)
WHERE ROWNUM = 1)
END
Now I don't understand why that error occurs.
Here is what I already know:
The Queries in the CASE statement work when executed separately, provided they are wrapped into a query that provides table_1 t1 for sure.
t1.s_id seems to work since Oracle doesn't complain about that. When I change it to a column that really doesn't exist, Oracle starts complaining about that nonexistent column before returning something about t1.sv_id. So somehow the alias might work, although I'm not sure about it.
I'm 100% sure that the column t1.sv_id exists and no typo was made. I executed a query on t1 directly and double-checked everything in notepad by marking all occurrences.
A (completely unrelated) update query like the following works as well (note the alias t1 is used in the select query). Don't assume table_1/2 to be the same as in the update query above; I just reused the names. This should just illustrate that I have successfully used an alias in an update query before.
update table_1 t1 set (t2_id) = (select id from table_2 t2 where t1.id = t2.t1_id)
UPDATE
Thanks a lot for pointing me to the "you don't have access to aliases in deeper subquery layers" issue. That got me on track again pretty fast.
So here is the query I ended up with. This seems to work fine. It eliminates the access to t1 in the deeper layers and selects the oldest row, so it should return the same result I expected from the ELSE part of the original query.
UPDATE table_1 t1 SET (type) =
CASE
WHEN
((SELECT COUNT(dateCheck.id) FROM table_2 dateCheck
WHERE dateCheck.s_id = t1.s_id AND dateCheck.sv_id = t1.sv_id) = 0)
THEN
(SELECT sv.type FROM table_3 sv WHERE sv.id = t1.sv_id)
ELSE
(SELECT d.type as type FROM table_2 d
WHERE d.s_id = t1.s_id
AND d.sv_id = t1.sv_id
AND d.creation = (SELECT MIN(id.creation) FROM table_2 id
WHERE d.s_id = id.s_id AND d.sv_id = id.sv_id))
END
You can't reference a table alias in a subquery of a subquery; the alias doesn't apply (or doesn't exist, or isn't in scope, depending on how you prefer to look at it). With the code you posted the error is reported against line 11 character 24, which is:
(SELECT type FROM
(SELECT d.type as type FROM table_2 d
WHERE d.s_id = t1.s_id AND d.sv_id = t1.sv_id
^^^^^^^^
If you change the t1.s_id reference on the same line to something invalid then the error doesn't change and is still reported as ORA-00904: "T1"."SV_ID": invalid identifier. But if you change the same reference on line 5 instead to something like
((SELECT COUNT(dateCheck.id) FROM table_2 dateCheck
WHERE dateCheck.s_id = t1.s_idXXX AND dateCheck.sv_id = t1.sv_id) = 0)
... then the error changes to ORA-00904: "T1"."S_IDXXX": invalid identifier. This is down to how the statement is being parsed. In your original version the subquery in the WHEN clause is valid, and you only break it by changing that identifier. The subquery in the ELSE is also OK. But the nested subquery in the ELSE has the problem, and changing the t1.s_id in that doesn't make any difference because the parser reads that part of the statement backwards (I don't know, or can't remember, why!).
So you have to eliminate the nested subquery. A general approach would be to make the whole CASE an inline view which you can then join using s_id and sv_id, but that's complicated as there may be no matching table_2 record (based on your count); and there may be no s_id value to match against as that isn't being checked in table_3.
It isn't clear if there will always be a table_3 record even when there is a table_2 record, or if they're mutually exclusive. If I've understood what the CASE is doing then I think you can use an outer join between those two tables and compare the combined data with the row you're updating, but because of that ambiguity it needs to be a full outer join. I think.
Here's a stab at using that construct with a MERGE instead of an update.
MERGE INTO table_1 t1
USING (
SELECT t2.s_id,
coalesce(t2.sv_id, t3.id) as sv_id,
coalesce(t2.type, t3.type) as type,
row_number() over (partition by t2.s_id, t2.sv_id
order by t2.creationtimestamp) as rn
FROM table_2 t2
FULL OUTER JOIN table_3 t3
ON t3.id = t2.sv_id
) tmp
ON ((tmp.s_id is null OR tmp.s_id = t1.s_id) AND tmp.sv_id = t1.sv_id AND tmp.rn = 1)
WHEN MATCHED THEN UPDATE SET t1.type = tmp.type;
If there will always be a table_3 record then you could use that as the driver and have a left outer join to table_2 instead, but hard to tell which might be appropriate. So this is really just a starting point.
SQL Fiddle with some made-up data that I believe would have hit both branches of your case. More realistic data would expose the flaws and misunderstandings, and suggest a more robust (or just more correct) approach...
Your query and your analysis seem sound to me. I have no solution, but here are a few things you can try that might trigger something that explains this odd behavior:
Quote the column (just in case it happens to be a SQL keyword).
Use table_1.sv_id - this works as long as the whole query contains this table only once.
Make sure that the alias t1 exists only once
Run the query with a query tool like SQuirrel SQL - the tool can examine the exact position where Oracle reports the problem. Maybe it's in a different place of the query than you think
Check () and make sure they are around the parts where they should be.
Swap the order of expressions around =

SQL Query - Select Value from T1 where second value fully met in T2

I can do this in an ugly stored procedure with temp tables and whatnot, but I know an experienced developer could do this SO much more elegantly than what I've come up with. In fact, I'd kind of rather not have to call the sproc at all, but just have one query that gives me what I need.
I'm working with two tables:
T1 BillingDirectivesNeeded
T2 BillingDirectives.
T1 Has two fields relevant to this task -
PKey
WBS1.
There will be many PKeys associated with each WBS1.
T2 has only one field of interest
PKey.
The task I'm trying to address is getting a list of WBS1s from T1 that have ALL of their needed directives in T2 before I enable their import.
We want to import a WBS1 ONLY when all of the PKeys for that WBS1 are found in T2. If not, I'll just leave them grayed out.
I've tried a dozen different ways to get this to happen over the last few hours, and I seem to have a mental block. The pseudo-code would look something like this:
select T1.WBS1 from BillingDirectiveNeeded T1
where [all the T1.PKeys for T1.WBS1 can be found in BillingDirectives T2]
You can try using a Where Exists clause:
Select T1.WBS1
From BillingDirectiveNeeded T1
Where Exists
(
Select 1
From BillingDirectives T2
Where T2.PKey = T1.PKey
)
select DISTINCT T1.WBS1 from BillingDirectiveNeeded T1 where T1.PKey in (SELECT T2.PKey FROM BillingDirectives T2)
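Note that both queries above return a WBS1 as soon as any one of its PKeys is found in T2, whereas the question asks for WBS1s where all PKeys are found. One common way to express "all" is a double NOT EXISTS: keep a WBS1 only if there is no needed PKey for it that is missing from T2. The sketch below runs on SQLite through Python's sqlite3 module with invented sample data; the alias N is hypothetical, and the table name follows the spelling used in the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BillingDirectiveNeeded (PKey INTEGER, WBS1 TEXT);
    CREATE TABLE BillingDirectives (PKey INTEGER);
    -- Invented data: 'A' needs keys 1 and 2, 'B' needs keys 3 and 4.
    INSERT INTO BillingDirectiveNeeded VALUES (1, 'A'), (2, 'A'), (3, 'B'), (4, 'B');
    -- Key 4 is missing, so only 'A' is complete.
    INSERT INTO BillingDirectives VALUES (1), (2), (3);
""")

# "Any PKey found": the EXISTS form from the answer above.
any_match = [row[0] for row in conn.execute("""
    SELECT DISTINCT T1.WBS1
    FROM BillingDirectiveNeeded T1
    WHERE EXISTS (SELECT 1 FROM BillingDirectives T2 WHERE T2.PKey = T1.PKey)
    ORDER BY T1.WBS1
""")]

# "All PKeys found": no needed row for this WBS1 may lack a match in T2.
all_match = [row[0] for row in conn.execute("""
    SELECT DISTINCT T1.WBS1
    FROM BillingDirectiveNeeded T1
    WHERE NOT EXISTS (
        SELECT 1
        FROM BillingDirectiveNeeded N
        WHERE N.WBS1 = T1.WBS1
          AND NOT EXISTS (SELECT 1 FROM BillingDirectives T2 WHERE T2.PKey = N.PKey)
    )
    ORDER BY T1.WBS1
""")]

print(any_match)  # ['A', 'B']
print(all_match)  # ['A']
```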

ORA-01446 occurs if I try to select random rows using SAMPLE clause in Oracle

Since the number of rows in the table is too large I switched from "ORDER BY dbms_random.value" construction for getting 1000 random rows to SAMPLE clause. It takes less than a second instead of 3 minutes to complete. But on some tables I get this error
ORA-01446: cannot select ROWID from view with DISTINCT, GROUP BY, etc
My query looks like this:
SELECT t1.columnA FROM
(SELECT columnA FROM table1 sample(1) where rownum <= 1000) t1
JOIN table2 t2
ON (t1.columnA = t2.columnA)
WHERE t2.columnB IS NOT NULL
and it works fine on some tables, but fails on others. I gave up googling, could you please advise any workaround in my situation.
As I expected SAMPLE clause works faster than all other solutions (Here you can see some of them)
Because I'm new to Oracle DBs generally and Oracle SQL Developer in particular, I mistakenly called a view a "table". After I found that out, the solution was clear.
SOLUTION: I had to look at the SQL query that forms a view and replace view name with that query.
For example my table1 was actually a view whose name I replaced with SELECT query that forms that view:
SELECT t1.columnA FROM
(SELECT columnA FROM (select distinct tt1.columnA, tt2.columnC
from table22 tt2, table11 tt1
where tt2.columnC = tt1.columnA) sample(1) where rownum <= 1000) t1
JOIN table2 t2
ON (t1.columnA = t2.columnA)
WHERE t2.columnB IS NOT NULL
After that I could work with tables and apply SAMPLE to them! Thank you everybody, great website! =)
PS: sorry for my English and ugly code.

PostgreSQL - Correlated Sub-Query Fail?

I have a query like this:
SELECT t1.id,
(SELECT COUNT(t2.id)
FROM t2
WHERE t2.id = t1.id
) as num_things
FROM t1
WHERE num_things = 5;
The goal is to get the id of all the elements that appear 5 times in the other table. However, I get this error:
ERROR: column "num_things" does not exist
SQL state: 42703
I'm probably doing something silly here, as I'm somewhat new to databases. Is there a way to fix this query so I can access num_things? Or, if not, is there any other way of achieving this result?
A few important points about using SQL:
You cannot use column aliases in the WHERE clause, but you can in the HAVING clause. That's the cause of the error you got.
You can do your count better using a JOIN and GROUP BY than by using correlated subqueries. It'll be much faster.
Use the HAVING clause to filter groups.
Here's the way I'd write this query:
SELECT t1.id, COUNT(t2.id) AS num_things
FROM t1 JOIN t2 USING (id)
GROUP BY t1.id
HAVING num_things = 5;
I realize this query can skip the JOIN with t1, as in Charles Bretana's solution. But I assume you might want the query to include some other columns from t1.
Re: the question in the comment:
The difference is that the WHERE clause is evaluated on rows, before GROUP BY reduces groups to a single row per group. The HAVING clause is evaluated after groups are formed. So you can't, for example, change the COUNT() of a group by using HAVING; you can only exclude the group itself.
SELECT t1.id, COUNT(t2.id) as num
FROM t1 JOIN t2 USING (id)
WHERE t2.attribute = <value>
GROUP BY t1.id
HAVING num > 5;
In the above query, WHERE filters for rows matching a condition, and HAVING filters for groups whose count is greater than five.
The point that causes most people confusion is when they don't have a GROUP BY clause, so it seems like HAVING and WHERE are interchangeable.
WHERE is evaluated before expressions in the select-list. This may not be obvious because SQL syntax puts the select-list first. So you can save a lot of expensive computation by using WHERE to restrict rows.
SELECT <expensive expressions>
FROM t1
HAVING primaryKey = 1234;
If you use a query like the above, the expressions in the select-list are computed for every row, only to discard most of the results because of the HAVING condition. However, the query below computes the expression only for the single row matching the WHERE condition.
SELECT <expensive expressions>
FROM t1
WHERE primaryKey = 1234;
So to recap, queries are run by the database engine according to series of steps:
Generate set of rows from table(s), including any rows produced by JOIN.
Evaluate WHERE conditions against the set of rows, filtering out rows that don't match.
Compute expressions in select-list for each in the set of rows.
Apply column aliases (note this is a separate step, which means you can't use aliases in expressions in the select-list).
Condense groups to a single row per group, according to GROUP BY clause.
Evaluate HAVING conditions against groups, filtering out groups that don't match.
Sort result, according to ORDER BY clause.
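The relative order of WHERE, GROUP BY and HAVING in the steps above can be checked on a small data set. This is a minimal sketch using Python's sqlite3 module; the rows and the attribute values 'x' and 'y' are invented for illustration. The HAVING condition repeats the aggregate instead of using the alias so that it does not depend on engine-specific alias rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2 (id INTEGER, attribute TEXT);
    -- id 1: three 'x' rows and one 'y' row; id 2: one 'x' row and three 'y' rows.
    INSERT INTO t2 VALUES
        (1, 'x'), (1, 'x'), (1, 'x'), (1, 'y'),
        (2, 'x'), (2, 'y'), (2, 'y'), (2, 'y');
""")

rows = conn.execute("""
    SELECT id, COUNT(*) AS num
    FROM t2
    WHERE attribute = 'x'      -- applied to rows, before grouping
    GROUP BY id
    HAVING COUNT(*) >= 2       -- applied to groups, after counting
    ORDER BY id
""").fetchall()

# id 2 has four rows in total, but only one survives the WHERE,
# so its group fails the HAVING condition.
print(rows)  # [(1, 3)]
```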
All the other suggestions would work, but to answer your basic question it would be sufficient to write
SELECT id From T2
Group By Id
Having Count(*) = 5
I'd like to mention that in PostgreSQL an aliased column cannot be used in the HAVING clause.
i.e.
SELECT usr_id AS my_id FROM user HAVING my_id = 1
won't work.
Another example that is not going to work:
SELECT su.usr_id AS my_id, COUNT(*) AS val FROM sys_user AS su GROUP BY su.usr_id HAVING val >= 1
There will be the same error: val column is not known.
I'm highlighting this because Bill Karwin wrote something that is not really true for Postgres:
"You cannot use column aliases in the WHERE clause, but you can in the HAVING clause. That's the cause of the error you got."
I think you could just rewrite your query like so:
SELECT t1.id
FROM t1
WHERE (SELECT COUNT(t2.id)
FROM t2
WHERE t2.id = t1.id
) = 5;
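Two other forms avoid referring to the alias in WHERE or HAVING: repeating the aggregate in HAVING, and wrapping the original query in a derived table so that num_things becomes an ordinary column of the outer query. The sketch below runs them on SQLite through Python's sqlite3 module with invented data; SQLite is more lenient about aliases than PostgreSQL, so these forms were chosen because they use only standard SQL and should carry over. The derived-table alias counted is a made-up name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    -- id 1 appears five times, id 2 twice, id 3 never.
    INSERT INTO t2 VALUES (1), (1), (1), (1), (1), (2), (2);
""")

# Form 1: repeat the aggregate in HAVING instead of using the alias.
having_rows = conn.execute("""
    SELECT t1.id, COUNT(t2.id) AS num_things
    FROM t1 JOIN t2 ON t2.id = t1.id
    GROUP BY t1.id
    HAVING COUNT(t2.id) = 5
""").fetchall()

# Form 2: keep the correlated subquery, but filter in an outer query
# where num_things is a real column of the derived table.
derived_rows = conn.execute("""
    SELECT id, num_things
    FROM (SELECT t1.id,
                 (SELECT COUNT(t2.id) FROM t2 WHERE t2.id = t1.id) AS num_things
          FROM t1) AS counted
    WHERE num_things = 5
""").fetchall()

print(having_rows)   # [(1, 5)]
print(derived_rows)  # [(1, 5)]
```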
try this (num_things is NULL for every id that does not have exactly 5 matches)
SELECT t1.id,
(SELECT COUNT(t2.id)
FROM t2
WHERE t2.id = t1.id
HAVING COUNT(t2.id) = 5
) as num_things
FROM t1

Best way to perform dynamic subquery in MS Reporting Services?

I'm new to SQL Server Reporting Services, and was wondering the best way to do the following:
Query to get a list of popular IDs
Subquery on each item to get properties from another table
Ideally, the final report columns would look like this:
[ID] [property1] [property2] [SELECT COUNT(*)
FROM AnotherTable
WHERE ForeignID=ID]
There may be ways to construct a giant SQL query to do this all in one go, but I'd prefer to compartmentalize it. Is the recommended approach to write a VB function to perform the subquery for each row? Thanks for any help.
I would recommend using a SubReport. You would place the SubReport in a table cell.
Depending on how you want the output to look, a subreport could do, or you could group on ID, property1, property2 and show the items from your other table as detail items (assuming you want to show more than just count).
Something like
select t1.ID, t1.property1, t1.property2, t2.somecol, t2.someothercol
from table t1 left join anothertable t2 on t1.ID = t2.ID
@Carlton Jenke I think you will find an outer join a better performer than the correlated subquery in the example you gave. Remember that the subquery needs to be run for each row.
Simplest method is this:
select *,
(select count(*) from tbl2 t2 where t2.tbl1ID = t1.tbl1ID) as cnt
from tbl1 t1
here is a workable version (using table variables):
declare @tbl1 table
(
    tbl1ID int,
    prop1 varchar(1),
    prop2 varchar(2)
)
declare @tbl2 table
(
    tbl2ID int,
    tbl1ID int
)
select *,
    (select count(*) from @tbl2 t2 where t2.tbl1ID = t1.tbl1ID) as cnt
from @tbl1 t1
Obviously this is just a raw example - standard rules apply like don't select *, etc ...
UPDATE from Aug 21 '08 at 21:27:
@AlexCuse - Yes, totally agree on the performance.
I started to write it with the outer join, but then saw in his sample output the count and thought that was what he wanted, and the count would not return correctly if the tables are outer joined. Not to mention that joins can cause your records to be multiplied (1 entry from tbl1 that matches 2 entries in tbl2 = 2 returns) which can be unintended.
So I guess it really boils down to the specifics on what your query needs to return.
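The concern about counts and outer joins can be made concrete. With a LEFT JOIN plus GROUP BY, COUNT(*) counts the NULL-extended row of an unmatched tbl1 entry as 1, while counting a column from tbl2 gives 0 and matches the correlated subquery. This is a small sketch on SQLite via Python's sqlite3 module; the sample rows are invented, and regular tables stand in for the table variables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (tbl1ID INTEGER, prop1 TEXT, prop2 TEXT);
    CREATE TABLE tbl2 (tbl2ID INTEGER, tbl1ID INTEGER);
    INSERT INTO tbl1 VALUES (1, 'a', 'aa'), (2, 'b', 'bb');
    -- tbl1ID 1 has two related rows, tbl1ID 2 has none.
    INSERT INTO tbl2 VALUES (10, 1), (11, 1);
""")

# Correlated subquery, as in the answer above.
subquery_counts = conn.execute("""
    SELECT t1.tbl1ID,
           (SELECT COUNT(*) FROM tbl2 t2 WHERE t2.tbl1ID = t1.tbl1ID) AS cnt
    FROM tbl1 t1
    ORDER BY t1.tbl1ID
""").fetchall()

# Outer join, counting a tbl2 column so unmatched rows count as 0.
join_counts = conn.execute("""
    SELECT t1.tbl1ID, COUNT(t2.tbl2ID) AS cnt
    FROM tbl1 t1 LEFT JOIN tbl2 t2 ON t2.tbl1ID = t1.tbl1ID
    GROUP BY t1.tbl1ID
    ORDER BY t1.tbl1ID
""").fetchall()

# Outer join with COUNT(*): the unmatched row is counted once.
naive_counts = conn.execute("""
    SELECT t1.tbl1ID, COUNT(*) AS cnt
    FROM tbl1 t1 LEFT JOIN tbl2 t2 ON t2.tbl1ID = t1.tbl1ID
    GROUP BY t1.tbl1ID
    ORDER BY t1.tbl1ID
""").fetchall()

print(subquery_counts)  # [(1, 2), (2, 0)]
print(join_counts)      # [(1, 2), (2, 0)]
print(naive_counts)     # [(1, 2), (2, 1)]
```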
UPDATE from Aug 21 '08 at 22:07:
To answer the other parts of your question - is a VB function the way to go? No. Absolutely not. Not for something this simple.
Functions are very bad for performance: the function is executed for every row in the result set.
If you want to "compartmentalize" the different parts of the query you have to approach it more like a stored procedure. Build a temp table, do part of the query and insert the results into the table, then do any further queries you need and update the original temp table (or insert into more temp tables).