Oracle slow performance using subquery inside IN clause - sql

I have query like this:
SELECT
xmlelement("objects",
xmlagg(
xmlelement("object",
xmlelement("accountId", ACCOUNTS.accountId),
xmlelement("address", ACCOUNTS.ADDRESS)
)
)
INTO obj_info_xml
FROM
ACCOUNTS
WHERE account_code IN (SELECT EXTRACTVALUE(VALUE(accountCodes), '/accountCode/text()') as accountCode
FROM TABLE(XMLSEQUENCE(EXTRACT(X, '//accountCodes/accountCode'))) accountCodes);
When I hardcode the values inside the IN clause the query executes fast, but when I use a subquery to select them from the XML it runs so slowly that I never get results. Do you have any suggestions?

Assuming that x is a PL/SQL variable containing the list of your account codes, something like this ought to speed things up a little bit:
select xmlelement("objects",
xmlagg(xmlelement("object",
xmlelement("accountId", a.accountid),
xmlelement("address", a.address))
)
) account_xml
into obj_info_xml
from accounts a
inner join xmltable('//accountCodes/accountCode'
passing x
columns account_code varchar2(30) path '.') xdata -- amend datatype as appropriate
on a.account_code = xdata.account_code;
N.B. untested, due to lack of sample data.
Ok, what does the following give you?
select xmlelement("objects",
xmlagg(xmlelement("object",
xmlelement("accountId", a.accountid),
xmlelement("address", a.address))
)
) account_xml
into obj_info_xml
from accounts a
where a.account_code in (select /*+ dynamic_sampling(xdata 10) */
account_code
from xmltable('//accountCodes/accountCode'
passing x
columns account_code varchar2(30) path '.') xdata); -- amend datatype as appropriate
Another suggestion would be to replace /*+ dynamic_sampling(xdata 10) */ with /*+ cardinality(xdata <roughly expected number of rows>) */ (substituting the relevant number, of course).
Also, can you edit your question to provide the execution plans for the query with and without the hardcoded variables, please?
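For anyone unsure how to capture those plans, one common approach in Oracle is EXPLAIN PLAN followed by DBMS_XPLAN.DISPLAY. The sketch below is a stand-in for the real statement and rests on a few assumptions: the inline XML literal replaces the PL/SQL variable x, the INTO clause is dropped because this runs as plain SQL, and the cardinality figure of 10 is a placeholder. The cardinality hint is undocumented, so its effect should be checked in the resulting plan rather than taken for granted.

```sql
-- Sketch only: the XML literal stands in for the PL/SQL variable x,
-- and 10 is a placeholder for the roughly expected number of codes.
EXPLAIN PLAN FOR
SELECT a.accountid, a.address
FROM accounts a
WHERE a.account_code IN (
    SELECT /*+ cardinality(xdata 10) */ xdata.account_code
    FROM XMLTABLE('//accountCodes/accountCode'
                  PASSING XMLTYPE('<accountCodes><accountCode>A1</accountCode></accountCodes>')
                  COLUMNS account_code VARCHAR2(30) PATH '.') xdata
);

-- Show the plan that was just stored in PLAN_TABLE.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Running the same two statements for the hardcoded IN list gives the second plan to compare against.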

Conversion failed when converting the varchar value 'Collect_Date' to data type int

I am struggling to apply a date filter to my query. I keep getting this error message:
Conversion failed when converting the varchar value 'Collect_Date' to data type int
Here is my code:
SELECT
Location_ID,
CONVERT(Date,CONVERT(varchar(10),Collect_Month_Key,101)) as 'Collect_Date',
Calc_Gross_Totals, Loc_Country,
CONVERT(varchar(8),Collect_Month_Key)+'-'+Location_ID as 'Unique Key'
FROM
FT_GPM_NPM_CYCLES,
LU_Location,
LU_Loc_Country
WHERE
LU_Location.LU_Loc_Country_Key=LU_Loc_Country.LU_Loc_Country_Key
AND FT_GPM_NPM_CYCLES.Lu_Loc_Key= LU_Location.LU_Loc_Key
AND Collect_Month_Key<>-1
AND 'Collect_Date'>=2016-1-1
ORDER BY
Location_ID,
Collect_Date;
If someone could help, that would be appreciated. I am also getting a different error when I try to use Month(Collect_Date), so if anyone knows why, I would appreciate that too. I have attached a picture with the code and the results I am getting.
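I see what's going on: you are trying to use the column alias from the SELECT list in the WHERE clause, and you can't do that. On top of that, 'Collect_Date' in single quotes is a string literal rather than a column reference, and 2016-1-1 without quotes is integer arithmetic (2014), so SQL Server tries to convert the string 'Collect_Date' to int and fails. There are a few other issues that have been covered in the comments, but here is the immediate answer to the question: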
Select Location_ID
, Convert(Date,CONVERT(varchar(10),Collect_Month_Key,101)) as Collect_Date
, Calc_Gross_Totals
, Loc_Country
, CONVERT(varchar(8),Collect_Month_Key)+'-'+Location_ID as [Unique Key]
From FT_GPM_NPM_CYCLES
, LU_Location
, LU_Loc_Country
Where LU_Location.LU_Loc_Country_Key=LU_Loc_Country.LU_Loc_Country_Key
and FT_GPM_NPM_CYCLES.Lu_Loc_Key= LU_Location.LU_Loc_Key
and Collect_Month_Key <> -1
and Convert(Date,CONVERT(varchar(10),Collect_Month_Key,101)) >= '2016-1-1'
Order By Location_ID, Collect_Date;
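As a side note on where the original error message comes from, the two statements below (T-SQL, no tables needed) are a small sketch of what the predicate 'Collect_Date'>=2016-1-1 actually asks SQL Server to do. The column aliases are made up for illustration.

```sql
-- Unquoted 2016-1-1 is integer subtraction, not a date.
SELECT 2016-1-1 AS not_a_date;          -- 2014

-- A quoted name is a string literal. Comparing it with an int forces an
-- implicit varchar-to-int conversion, which raises
-- "Conversion failed when converting the varchar value 'Collect_Date' to data type int".
SELECT CASE WHEN 'Collect_Date' >= 2016-1-1 THEN 1 ELSE 0 END AS will_fail;
```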
Here is an updated query that brings the following modifications:
As commented by Robert Sheahan, you cannot use a resultset column alias in the WHERE clause
As commented by Larnu, since you are storing dates as strings, you could simply use string comparison to filter records (and return string values). With this technique, you do not need the additional condition Collect_Month_Key <> -1, since the string '-1' is not greater than the string '20160101'.
use explicit joins instead of implicit joins (comment by Gordon Linoff)
I added table aliases: they make the query easier to read (and make it possible to self-join a table...)
I would also recommend prefixing all columns used in the query with their table alias. This clearly indicates which table each column comes from, and makes the query easier to understand and maintain. NB: if Collect_Month_Key belongs to a table other than FT_GPM_NPM_CYCLES, you will want to move the condition from the WHERE clause to the ON clause of the relevant JOIN.
Query:
SELECT
Location_ID,
Collect_Month_Key AS Collect_Date,
Calc_Gross_Totals,
Loc_Country,
CONVERT(varchar(8),Collect_Month_Key) + '-' + Location_ID AS Unique_Key
FROM
FT_GPM_NPM_CYCLES AS cyc
INNER JOIN LU_Location AS loc
ON cyc.Lu_Loc_Key = loc.LU_Loc_Key
INNER JOIN LU_Loc_Country AS cty
ON loc.LU_Loc_Country_Key = cty.LU_Loc_Country_Key
WHERE
Collect_Month_Key > '20160101'
ORDER BY
Location_ID,
Collect_Month_Key
To answer your comment "So if I don't put the collect_Date in the WHERE, where should I put it for something like this in the future?", I suggest Common Table Expressions. Functionally they are equivalent to defining a derived table in the FROM clause, but they move it "above" so it feels more like "before" and I think they make it much easier to read. To convert GMB's excellent solution to using a CTE:
--Leading ; because a CTE requires the previous statement to be terminated explicitly
;WITH cteWithDates as ( --cteWithDates becomes a virtual temporary table
SELECT
cyc.* --Keep all the original columns of FT_GPM_NPM_CYCLES
, Collect_Month_Key AS Collect_Date --and add Collect_Date and Unique_Key
, CONVERT(varchar(8),Collect_Month_Key) + '-' + Location_ID AS Unique_Key
FROM FT_GPM_NPM_CYCLES AS cyc
) --you could add more CTEs with the following format,
--all become available at the end
--, cteMore as (SELECT ... FROM ...)
--the first statement after the closing ) has access to all CTEs, but ONLY that statement
SELECT Location_ID,
Collect_Date,
Calc_Gross_Totals,
Loc_Country,
Unique_Key
FROM
cteWithDates AS cyc --Use the CTE as you would your original table,
--but the added fields are now available EVERYWHERE in your query!
INNER JOIN LU_Location AS loc
ON cyc.Lu_Loc_Key = loc.LU_Loc_Key
INNER JOIN LU_Loc_Country AS cty
ON loc.LU_Loc_Country_Key = cty.LU_Loc_Country_Key
WHERE
Collect_Date > '20160101' --NOW you can use Collect_Date!
ORDER BY
Location_ID,
Collect_Date --And here too
Note that this is much more efficient than defining an actual temporary table with #TableName, because the query optimizer can drop unused records from the CTE but it has to put them all into the #temporary table, a huge performance difference if your table is large and the matching subset small.
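For comparison, here is roughly what the derived-table form mentioned above would look like. It is only a sketch that mirrors the CTE version: like that version, it assumes Collect_Month_Key and Location_ID can be resolved from FT_GPM_NPM_CYCLES, and the inner alias cyc0 is made up for illustration.

```sql
SELECT Location_ID,
       Collect_Date,
       Calc_Gross_Totals,
       Loc_Country,
       Unique_Key
FROM (
        -- same projection as the CTE, but defined inline in FROM
        SELECT cyc0.*,
               Collect_Month_Key AS Collect_Date,
               CONVERT(varchar(8), Collect_Month_Key) + '-' + Location_ID AS Unique_Key
        FROM FT_GPM_NPM_CYCLES AS cyc0
     ) AS cyc
INNER JOIN LU_Location AS loc
        ON cyc.Lu_Loc_Key = loc.LU_Loc_Key
INNER JOIN LU_Loc_Country AS cty
        ON loc.LU_Loc_Country_Key = cty.LU_Loc_Country_Key
WHERE Collect_Date > '20160101'   -- the alias is visible here because it was defined one level down
ORDER BY Location_ID, Collect_Date;
```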

How to do a Select in another Select with Postgresql

I need to run this query inside another SELECT in PostgreSQL:
SELECT COUNT(tn.autoship_box_transaction_id)
FROM memberships.autoship_box_transaction tn
WHERE tn.autoship_box_id = b.autoship_box_id
Must I use the WITH clause?
As long as the query produces a single data element, you can use it in place of an attribute:
SELECT (
SELECT COUNT(tn.autoship_box_transaction_id)
FROM memberships.autoship_box_transaction tn
WHERE tn.autoship_box_id = b.autoship_box_id
) AS cnt
, other_column
FROM wherever
;
Have a look at this SQL fiddle demonstrating the use case.
This method often comes with a performance penalty if the db engine actually iterates over the result set and performs the query on each record encountered.
The db engine's optimizer may be smart enough to avoid the extra cost (and it should in the fiddle's toy example), but you have to look at the explain plan to be sure.
Note that it's mostly an issue with 'correlated subqueries', i.e. queries embedded as shown that depend on the embedding query. Your example appears to be of this kind, as you use a table alias b which isn't defined anywhere.
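To check what the planner actually does with the scalar subquery, EXPLAIN can be run on the statement. This is a sketch: memberships.autoship_box as the table behind alias b is an assumption, since the question never shows it.

```sql
-- EXPLAIN ANALYZE executes the query and reports actual row counts and timings.
-- A SubPlan node whose loops figure equals the outer row count means the
-- count is being evaluated once per outer row.
EXPLAIN (ANALYZE, BUFFERS)
SELECT b.autoship_box_id,
       (SELECT COUNT(tn.autoship_box_transaction_id)
          FROM memberships.autoship_box_transaction tn
         WHERE tn.autoship_box_id = b.autoship_box_id) AS cnt
  FROM memberships.autoship_box b;  -- hypothetical table name
```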
There might be the option of moving the subselect to the FROM clause (beware: this statement is for explanatory purposes only; you must adapt it to your use case, as I am just guessing at your schema here):
SELECT stats.cnt
, b.other_column
FROM b_table b
JOIN (
SELECT COUNT(tn.autoship_box_transaction_id) cnt
, tn.autoship_box_id
FROM memberships.autoship_box_transaction tn
GROUP BY tn.autoship_box_id
) stats
ON (stats.autoship_box_id = b.autoship_box_id)
;
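One difference worth keeping in mind: the scalar subquery returns 0 for a box that has no transactions, whereas the inner join above drops that box entirely. A sketch that keeps those rows, still using the guessed b_table name from the query above:

```sql
SELECT COALESCE(stats.cnt, 0) AS cnt   -- 0 when the box has no transactions
     , b.other_column
  FROM b_table b                       -- placeholder name, as in the query above
  LEFT JOIN (
        SELECT COUNT(tn.autoship_box_transaction_id) AS cnt
             , tn.autoship_box_id
          FROM memberships.autoship_box_transaction tn
         GROUP BY tn.autoship_box_id
       ) stats
    ON stats.autoship_box_id = b.autoship_box_id
;
```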
There are two options. You can either use the with clause, like so:
WITH some_count AS (
SELECT COUNT(tn.autoship_box_transaction_id)
FROM memberships.autoship_box_transaction tn
WHERE tn.autoship_box_id = b.autoship_box_id
)
SELECT * FROM some_count;
Or the second option is to use a sub-query, like so:
SELECT
*
FROM
(
SELECT COUNT(tn.autoship_box_transaction_id)
FROM memberships.autoship_box_transaction tn
WHERE tn.autoship_box_id = b.autoship_box_id
) AS some_count; -- a FROM subquery needs an alias before PostgreSQL 16

How to use Join with like operator and then casting columns

I have 2 tables with these columns:
CREATE TABLE #temp
(
Phone_number varchar(100) -- example data: "2022033456"
)
CREATE TABLE orders
(
Addons ntext -- example data: "Enter phone:2022033456<br>Thephoneisvalid"
)
I have to join these two tables using LIKE because the phone numbers are not in the same format. A little background: I am joining the #temp table on its phone number with the orders table on its Addons value, and then in the WHERE condition I am trying to match them again to get some results. Here is my code. The results are not accurate: the query does not return any data, and I don't know what I am doing wrong. I am using SQL Server.
select
*
from
order_no as n
join
orders as o on n.order_no = o.order_no
join
#temp as t on t.phone_number like '%'+ cast(o.Addons as varchar(max))+'%'
where
t.phone_number = '%' + cast(o.Addons as varchar(max)) + '%'
A LIKE predicate is allowed in a JOIN condition, but it is usually the wrong tool for this. Please provide more information on your tables. The more reliable approach is to convert one of the phone fields to the format of the other so that you can join on equality.
I think your join condition is in the wrong order. Because your question explicitly mentions two tables, let's stick with those:
select *
from orders o JOIN
#temp t
on cast(o.Addons as varchar(max)) like '%' + t.phone_number + '%';
It has been so long since I dealt with the text data type (in SQL Server), that I don't remember if the cast() is necessary or not.
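A quick way to check the direction of the LIKE is a self-contained test with the sample values from the question. This sketch uses table variables and nvarchar(max) instead of #temp and ntext purely to keep it runnable in one batch; those substitutions are assumptions, not the real schema.

```sql
DECLARE @phones TABLE (Phone_number varchar(100));
DECLARE @orders TABLE (Addons nvarchar(max));

INSERT INTO @phones VALUES ('2022033456');
INSERT INTO @orders VALUES (N'Enter phone:2022033456<br>Thephoneisvalid');

-- The long text goes on the left of LIKE, the short value inside the wildcards.
SELECT o.Addons, t.Phone_number
FROM @orders AS o
JOIN @phones AS t
  ON o.Addons LIKE '%' + t.Phone_number + '%';   -- matches the sample row

-- Reversed operands, as in the question: the pattern is longer than the
-- value being searched, so nothing can match.
SELECT o.Addons, t.Phone_number
FROM @orders AS o
JOIN @phones AS t
  ON t.Phone_number LIKE '%' + o.Addons + '%';   -- returns no rows
```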
Instead of trying to do everything in a single top-level query, you should apply a transformation projection to your orders table and use that as a subquery, which will make the query easier to understand.
Using the CHARINDEX function will make this a lot easier. However, it does not support ntext, so you will need to change your schema to use nvarchar(max) instead, which you should be doing anyway as ntext is deprecated. In the meantime you can use CONVERT( nvarchar(max), someNTextValue ), though this will reduce performance as you won't be able to use any indexes on your ntext values (but this query will run slowly anyway).
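SELECT
orders2.*,
CASE WHEN orders2.PhoneStart > 0 AND orders2.PhoneEnd > 0 THEN
-- skip the 12 characters of 'Enter phone:' so that only the number itself is returned
SUBSTRING( orders2.AddonsText, orders2.PhoneStart + 12, orders2.PhoneEnd - orders2.PhoneStart - 12 )
ELSE
NULL
END AS ExtractedPhoneNumber
FROM
(
SELECT
orders.*, -- never use `*` in production, so replace this with the actual columns in your orders table
CONVERT( nvarchar(max), Addons ) AS AddonsText, -- CHARINDEX does not accept ntext, so convert first
CHARINDEX('Enter phone:', CONVERT( nvarchar(max), Addons )) AS PhoneStart,
CHARINDEX('<br>Thephoneisvalid', CONVERT( nvarchar(max), Addons ), CHARINDEX('Enter phone:', CONVERT( nvarchar(max), Addons )) ) AS PhoneEnd
FROM
orders
) AS orders2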
I suggest converting the above into a VIEW or CTE so you can directly query it in your JOIN expression:
CREATE VIEW ordersWithPhoneNumbers AS
-- copy and paste the above query here, then execute the batch to create the view, you only need to do this once.
Then you can use it like so:
SELECT
* -- again, avoid the use of the star selector in production use
FROM
ordersWithPhoneNumbers AS o2 -- this is the above query as a VIEW
INNER JOIN order_no ON o2.order_no = order_no.order_no
INNER JOIN #temp AS t ON o2.ExtractedPhoneNumber = t.phone_number
Actually, I take back my previous remark about performance - if you add an index to the ExtractedPhoneNumber column of the ordersWithPhoneNumbers view then you'll get good performance.

Static in clause sql

I'm trying to put more than one static value in an IN clause in SQL (Oracle) and it's not working. Does anyone have an idea or a workaround? Below is what I'm trying to do:
select *
from Table
where ('1', '2') in ('1', '2', '3')
I know it can be done using OR clause but I don't want to use it as there are too many arguments.
I'm guessing your query is intended to flexibly handle a bunch of input options and dynamically adjust the output in some way. Perhaps this approach will work. Usually these BI systems still want a syntactically valid query, so unfortunately you can't build the whole query dynamically.
with list(val) as (
select val from master_list_of_values where val in (? /* BI_Parameter */)
)
select * from Table
where
A in (select val from list)
or B in (select val from list)
UPDATE: Based on your edit, this will work although I don't know if it will always generate great plans:
with security_test(passed) as (
select count(*) as passed
from security_groups
where group_id in (? /* BI_Parameter */) and group_id in (/* hard-coded list */)
)
select * from Table
where (select passed from security_test) > 0
You could also hard-code the list in a values expression instead of using a "security_groups" table.
with security_test(passed) as (
select count(*) as passed
from (values (1), (2)) security_groups(group_id)
where group_id in (? /* BI_Parameter */)
)
select * from Table
where (select passed from security_test) > 0
Here's another thought...
with security_test(passed) as (
select count(*) as passed
from (values (1), (2)) security_groups(group_id)
where group_id in (? /* BI_Parameter */)
having count(*) > 0 /* will this collapse to zero rows? */
)
select * from Table, security_test
It can't be done how you want it done, because of ambiguity. For instance, would that syntax mean "1 and 2 are in 1,2,3" or would it mean "1 is in 1,2,3 or 2 is in 1,2,3".
I think it would really help to see what is generating this static in clause and what the actual end goal is... Then I'll edit my answer to help more.
Edit
I think there is just a simple logical flaw in your application design. Would it not also work to remove the (1,2,3) clause altogether and then, rather than consider your '1,2' as static, just reference the column? For example, this achieves what you want with a little tweaking of your (incorrectly developed) BI report:
select *
from Table
where myCharColumn in ('1', '2')
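If the static values really must stay on the left-hand side, the two readings described above can each be spelled out explicitly. The sketch below is one way to do so in Oracle, using the built-in collection type sys.odcivarchar2list to hold the left-hand values; some_table is a placeholder name.

```sql
-- Reading 1: at least one of ('1','2') is in ('1','2','3')
select *
from some_table
where exists (select 1
              from table(sys.odcivarchar2list('1', '2')) p
              where p.column_value in ('1', '2', '3'));

-- Reading 2: every one of ('1','2') is in ('1','2','3')
select *
from some_table
where not exists (select 1
                  from table(sys.odcivarchar2list('1', '2')) p
                  where p.column_value not in ('1', '2', '3'));
```

Either predicate is constant for the whole statement, so it acts as an all-or-nothing switch on the rows of some_table.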

Solution to avoid non-sargable argument in where clause

In the code_list CTE in this query I have a row constructor that will eventually take any number of arguments. The column icd in the patient_codes CTE is a five-digit identifier that is more descriptive than the three-digit codes that the row constructor has. The table icd_patient has 100 million rows, so for performance's sake I would like to filter the rows of this table before I do any further work. I have
;with code_list(code_list)
as
(
select x.code_list
from (values ('70700'),('25002')) as x(code_list)
),patient_codes
as
(
select distinct icd,pat_id,id
from icd_patient
where icd in (select code_list from code_list)
)
select distinct pat_id from patient_codes
The problem, however, is that in the icd_patient table all of the icd values are five digits and more descriptive. If I look at the execution plan of this query it's pretty streamlined. If I do
;with code_list(code_list)
as
(
select x.code_list
from (values ('70700'),('25002')) as x(code_list)
),patient_codes
as
(
select substring(icd,1,3) as icd,pat_id
from icd_patient2
where substring(icd,1,3) in (select * from code_list)
)
select * from patient_codes
this of course has a large performance impact because of the substring expression in the where clause. Does something akin to "like in" exist so I can take advantage of my indexes?
Index on icd_patient
CREATE NONCLUSTERED INDEX [ix_icd_patient] ON [dbo].[icd_patient2]
(
[pat_id] ASC
)
INCLUDE ( [id],
This much simpler query should be better than (or, at worst, the same as) your existing query.
select pat_id
FROM dbo.icd_patient
where icd LIKE '707%'
OR icd LIKE '250%'
GROUP BY pat_id;
Note that sargability only matters if there is actually an index on this column.
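For instance, an index along these lines would give the LIKE '707%' predicates something to seek on. The index name is made up, and whether pat_id belongs in INCLUDE depends on the rest of the workload.

```sql
-- Hypothetical index: icd as the key so a prefix LIKE can seek,
-- pat_id included so the query does not have to touch the base table.
CREATE NONCLUSTERED INDEX ix_icd_patient_icd
    ON dbo.icd_patient (icd)
    INCLUDE (pat_id);
```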
An alternative (since OR can sometimes give the optimizer fits):
SELECT pat_id FROM
(
SELECT pat_id
FROM dbo.icd_patient
WHERE icd LIKE '707%'
UNION ALL
SELECT pat_id
FROM dbo.icd_patient
WHERE icd LIKE '250%'
) AS x
GROUP BY pat_id;
To make this extensible beyond a handful of OR conditions, I would use a table-valued parameter (TVP).
CREATE TYPE dbo.StringPatterns AS TABLE(s VARCHAR(3) PRIMARY KEY);
Then your stored procedure could say:
CREATE PROCEDURE dbo.whatever
@sp dbo.StringPatterns READONLY
AS
BEGIN
SET NOCOUNT ON;
SELECT p.pat_id
FROM dbo.icd_patient AS p
INNER JOIN @sp AS sp
ON p.icd LIKE sp.s + '%'
GROUP BY p.pat_id;
END
Then you can pass in your set of three-character substrings from a DataTable or other collection in C#. From T-SQL just as an example:
DECLARE @p dbo.StringPatterns;
INSERT @p VALUES('707'),('250');
EXEC dbo.whatever @sp = @p;
Something like "like in" does not exist. The following is sargable:
select *
from icd_patient
where icd like '70700%' or
icd like '25002%'
This works because LIKE with a constant initial substring is a special case for SQL Server. It does not work when the strings on the right are variables.
One solution is to create an indexed view on the icd_patient table with an index on the first five characters of the icd code.
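A related technique, which is not the indexed view suggested above but a swapped-in alternative, is a persisted computed column holding the three-character prefix, with its own index. The column name icd3 and the index name are assumptions, and GO is the batch separator understood by SSMS and sqlcmd rather than T-SQL itself.

```sql
-- Store the three-character prefix alongside the full code.
ALTER TABLE dbo.icd_patient
    ADD icd3 AS LEFT(icd, 3) PERSISTED;
GO  -- new batch, so the statements below can see the new column

CREATE NONCLUSTERED INDEX ix_icd_patient_icd3
    ON dbo.icd_patient (icd3)
    INCLUDE (pat_id);
GO

-- The three-digit codes can now be matched with a plain, sargable IN.
SELECT pat_id
FROM dbo.icd_patient
WHERE icd3 IN ('707', '250')
GROUP BY pat_id;
```

On a table of this size, adding a persisted column and building the index is itself an expensive operation, so it is worth trying on a copy first.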
Using "IN" makes that part of a command non-sargable on both sides. End of discussion.
Fixing it with substring completely changes what the query returns, and the predicate remains non-sargable.
Any "fix" should return exactly the same results. The actual fix is to join to the CTE so that the five characters match, or to put three characters in the CTE and match those in a join, or to put four characters in the CTE, where the fourth is "%", and join using LIKE.
Using a "like" that starts with "%" increases the complexity of the search, but it would still use the index to find the value, because scanning the index should require less reading: the full table row is fetched only when a search is successful.