Are these two SQL statements equivalent? - sql

They return the same count of rows, but I'm not sure if one is an accident waiting to happen, or one is simply "the preferred method":
SELECT duckbill.id, duckbill.pack_size, duckbill.description, duckbill.platypus_id, duckbill.department, duckbill.subdepartment, duckbill.unit_cost, duckbill.unit_list, duckbill.open_qty, duckbill.UPC_code, duckbill.UPC_pack_size, duckbill.crv_id, duckbill_platypuss.platypus_item
FROM duckbill
INNER JOIN duckbill_platypuss ON duckbill.platypus_id = duckbill_platypuss.platypus_id
SELECT duckbill.id, duckbill.pack_size, duckbill.description, duckbill.platypus_id, duckbill.department, duckbill.subdepartment, duckbill.unit_cost, duckbill.unit_list, duckbill.open_qty, duckbill.UPC_code, duckbill.UPC_pack_size, duckbill.crv_id, duckbill_platypuss.platypus_item
FROM duckbill, duckbill_platypuss
WHERE (duckbill.platypus_id = duckbill_platypuss.platypus_id)

So the reason there's a difference is due to the order of execution. You may not notice the difference on some table combinations, but on very large tables, you generally want to filter with the JOIN first.
This post gives some good information toward that end it looks like: Order Of Execution of the SQL query

Related

ORA-01841 happens on one environment but not all

I have the following SQL-code in my (SAP IdM) Application:
Select mcmskeyvalue as MKV,v1.searchvalue as STARTDATE, v2.avalue as Running_Changes_flag
from idmv_entry_simple
inner join idmv_value_basic_active v1 on mskey = mcmskey and attrname = 'Start_of_company_change'
and mcentrytype = 'MX_PERSON' and to_date(v1.searchvalue,'YYYY-MM-DD')<= sysdate+3
left join idmv_value_basic v2 on v2.mskey = mcmskey and v2.attrname = 'Running_Changes_flag'
where mcmskey not in (Select mskey from idmv_value_basic_active where attrname = 'Company_change_running_flag')
I already found the solution for the ORA-01841 problem, as it could either be a solution similar to MSSQLs try_to_date as mentioned here: How to handle to_date exceptions in a SELECT statment to ignore those rows?
or a solution where I change the code to something like this, to work soly on strings:
Select mcmskeyvalue as MKV,v1.searchvalue as STARTDATE, v2.avalue as Running_Changes_flag
from idmv_entry_simple
inner join idmv_value_basic_active v1 on mskey = mcmskey and attrname = 'Start_of_company_change'
and mcentrytype = 'MX_PERSON' and v1.searchvalue<= to_char(sysdate+3,'YYYY-MM-DD')
left join idmv_value_basic v2 on v2.mskey = mcmskey and v2.attrname = 'Running_Changes_flag'
where mcmskey not in (Select mskey from idmv_value_basic_active where attrname = 'Company_change_running_flag')
So for the actually problem I have a solution.
But now I came into discussion with my customers and teammates why the error happens at all.
Basically for all entries of idmv_value_basic_activ that comply to the requirement of "attrname = 'Start_of_company_change'" we can be sure that those are dates. In addition, if we execute the query to check all values that would be delivered, all are in a valid format.
I learned in university that the DB-Engine could decide in which order it will run individual segments of a query. So for me the most logical explanation would be that, on the development environment (where we face the problem), the section " to_date(v1.searchvalue,'YYYY-MM-DD')<= sysdate+3” is executed before the section “attrname = 'Start_of_company_change'”
Whereas on the productive environment, where everything works like a charm, the segments are executed in the order that is descripted by the SQL Statement.
Now my Question is:
First: do I remember that right, since the teacher said that only once and at that time I could not really make sense out of it
And Second: Is this assumption of mine correct or is there another reason for the problem?
Borderinformation:
The Tool uses a kind of shifted data structure which is why there can be quite a few different types in the actual “Searchvalue” column of the idmv_value_basic_activ view. The datatype on the database layer is always a varchar one.
"the DB-Engine could decide in which order it will run individual segments of a query"
This is correct. A SQL query is just a description of the data you want and where it's stored. Oracle will calculate an execution plan to retrieve that data as best it can. That plan will vary based on any number of factors, like the number of actual rows in the table and the presence of indexes, so it will vary from environment to environment.
So it sounds like you have an invalid date somewhere in your table, so to_date raises an exception. You can use validate_conversion to find it.

SQL Query not searching data if record contains zero

I have two tables and with column paperNo and some data regarding that paper. I am trying to search all data based on paper no. from both the tables. I have successfully written the query and it is retrieving the data successfully. but I have noticed that. If my paperNo contains zero(0) then the query is not searching for that data. And for the non zero contains paperNo it is retrieving the same record twice.
I don't understand what is going wrong. tried every thing.
Here is my Query .-
SELECT PaperDate.paperNo,
PaperDate.RAW_PAPER,
PaperDate.EDGE_SEALED,
PaperDate.HYDRO_120,
PaperDate.HYDRO_350,
PaperDate.CATALYST_1ST,
PaperDate.CATALYST_2ND,
PaperDate.SIC_350,
tblThicknessPaperDate.rawThickness,
tblThicknessPaperDate.catThickness,
tblThicknessPaperDate.sicThickness,
tblThicknessPaperDate.rejectedThickness
FROM tblThicknessPaperDate
FULL OUTER JOIN PaperDate ON PaperDate.paperNo =tblThicknessPaperDate.paperNo
WHERE (tblThicknessPaperDate.paperNo = #paperNo)
I would try:
FROM tblThicknessPaperDate
RIGHT JOIN PaperDate ON PaperDate.paperNo =tblThicknessPaperDate.paperNo
WHERE (PaperDate.paperNo = #paperNo)
The two changes are: swapping to a right join so even if a record isn't in tblThicknessPaperDate we will still see the record in PaperDate. The other change is to use PapterDate.paperNo in the where clause. Since tblThicknessPaperDate.paperNo could be null we don't want to use that in the where if we can avoid it.
SELECT PaperDate.paperNo,
PaperDate.RAW_PAPER,
PaperDate.EDGE_SEALED,
PaperDate.HYDRO_120,
PaperDate.HYDRO_350,
PaperDate.CATALYST_1ST,
PaperDate.CATALYST_2ND,
PaperDate.SIC_350,
tblThicknessPaperDate.rawThickness,
tblThicknessPaperDate.catThickness,
tblThicknessPaperDate.sicThickness,
tblThicknessPaperDate.rejectedThickness
FROM tblThicknessPaperDate
FULL OUTER JOIN PaperDate ON PaperDate.paperNo =tblThicknessPaperDate.paperNo
WHERE (tblThicknessPaperDate.paperNo = #papNo | PaperDate.paperNo = #paperNo)

Oracle Sub-select taking a long time

I have a SQL Query that comprise of two level sub-select. This is taking too much time.
The Query goes like:
select * from DALDBO.V_COUNTRY_DERIV_SUMMARY_XREF
where calculation_context_key = 130205268077
and DERIV_POSITION_KEY in
(select ctry_risk_derivs_psn_key
from DALDBO.V_COUNTRY_DERIV_PSN
where calculation_context_key = 130111216755
--and ctry_risk_derivs_psn_key = 76296412
and CREDIT_PRODUCT_TYPE = 'SWP OP'
and CALC_OBLIGOR_COUNTRY_OF_ASSETS in
(select ctry_cd
from DALDBO.V_PSN_COUNTRY
where calculation_context_key = 130134216755
--and ctry_risk_derivs_psn_key = 76296412
)
)
These tables are huge! Is there any optimizations available?
Without knowing anything about your table or view definitions, indexing, etc. I would start by looking at the sub-selects and ensuring that they are performing optimally. I would also want to know how many values are being returned by each sub-select as this can impact performance.
How is calculation_context_key used to retrieve rows from V_COUNTRY_DERIV_PSN and V_PSN_COUNTRY? Is it an optimal execution plan?
How is DERIV_POSITION_KEY and CALC_OBLIGOR_COUNTRY_OF_ASSETS used in V_COUNTRY_DERIV_SUMMARY_XREF to retrieve rows? Again, look at the explain plan.
first of all, can you write this query using inner joins (and not subselect) ??
select A.*
from DALDBO.V_COUNTRY_DERIV_SUMMARY_XREF a,
DALDBO.V_COUNTRY_DERIV_PSN b,
DALDBO.V_PSN_COUNTRY c
where calculation_context_key = 130205268077
and a.DERIV_POSITION_KEY = b.ctry_risk_derivs_psn_key
and b.calculation_context_key = 130111216755
--and b.ctry_risk_derivs_psn_key = 76296412
and b.CREDIT_PRODUCT_TYPE = 'SWP OP'
and b.CALC_OBLIGOR_COUNTRY_OF_ASSETS = c.ctry_cd
and c.calculation_context_key = 130134216755
--and c.ctry_risk_derivs_psn_key = 76296412
second, best practice says that when you don't query any data from the tables in the subselect you better of using an EXISTS instead of IN. new versions of oracle does that automatically and actually rewrite the whole thing as an inner join.
last, without any knowledge on you data and of what you are trying to do i would suggest you to try and use views as less as you can - if you can query the underling tables it would be best and you will probably see immediate performance improvement.

relationships produce duplicate records on query, DISTINCT does not work, any other solutions available?

I am new to this portal. I have a very simple problem to be solved. It is related to the ANSI SQL. I am writing a reports using BIRT and I am fetching the data from several tables. I understand how the SQL joins work but maybe not fully. I researched google for hours and I could not find relevant answer.
My problem is that one of the relationships in the code produce a duplicate result (the same row is copied - duplicated). I was so determined to solve it I used every type of join available. Some of this SQL was produced already. I shall post my code below. I know that one of the solutions to my problem is use of the 'DISTINCT' keyword. I have used it and it does not solve my problem.
Can anyone propose any solution to that?
Sample code:
SELECT DISTINCT
partmaster.partdesc,
partmaster.uom,
traders.name AS tradername,
worksorders.id AS worksorderno,
worksorders.partid,
worksorders.quantity,
worksorders.duedate,
worksorders.traderid,
worksorders.orderid,
routingoperations.partid,
routingoperations.methodid,
routingoperations.operationnumber,
routingoperations.workcentreid,
routingoperations.settime,
routingoperations.runtime,
routingoperations.perquantity,
routingoperations.description,
routingoperations.alternativeoperation,
routingoperations.alternativeoperationpreference,
machines.macdesc,
machines.msection,
allpartmaster.partnum,
allpartmaster.nbq,
allpartmaster.partdesc,
routingoperationtools.toolid,
tools.tooldesc,
CAST (emediadetails.data as VARCHAR(MAX)) AS cplandata
FROM worksorders
INNER JOIN partmaster ON worksorders.partid = partmaster.partnum
INNER JOIN traders traders ON worksorders.traderid = traders.id
INNER JOIN routingoperations routingoperations ON worksorders.partid = routingoperations.partid
AND worksorders.routingmethod = routingoperations.methodid
INNER JOIN allpartmaster allpartmaster ON routingoperations.partid = allpartmaster.partnum
LEFT OUTER JOIN machines machines ON routingoperations.workcentreid = machines.macid
LEFT OUTER JOIN routingoperationtools routingoperationtools ON routingoperationtools.partid = routingoperations.partid
AND routingoperationtools.routingmethod = routingoperations.methodid
AND routingoperationtools.operationnumber = routingoperations.operationnumber
LEFT OUTER JOIN tools tools ON tools.toolid = routingoperationtools.toolid
LEFT OUTER JOIN emediadetails ON emediadetails.keyvalue1 = worksorders.id
AND emediadetails.keyvalue2 = routingoperations.operationnumber
AND emediadetails.emediaid = 'worksorderoperation'
I do not have too much of the test data but I know that one row is copied twice as the result of the query below even tho I used DISTINCT keyword. I know that my problem is rather specific and not general but the solution that someone will propose may help others with the similar problem.
I can't solve your problem for you without some test data, but I have some helpful hints.
In principle, you should be really careful with DISTINCT - its a great way of hiding bugs in your query. Only use DISTINCT if you are confident that the underlying data contains legitimate duplicates. If your joins are wrong, and you're getting a cartesian product, you can remove the duplicates from the results with DISTINCT - but that doesn't stop the cartesian product being generated. You'll get very poor performance, and possibly incorrect data.
Secondly, I am pretty sure that DISTINCT works properly - you are almost certainly not getting duplicates, but it may be hard to spot the difference between two rows. Leading or trailing spaces in text columns, for instance could be to blame.
Finally, to work through this problem, I'd recommend building the query up join by join, and seeing where you get the duplicate - that's the join that's to blame.
So, start with:
SELECT
traders.name AS tradername,
worksorders.id AS worksorderno,
worksorders.partid,
worksorders.quantity,
worksorders.duedate,
worksorders.traderid,
worksorders.orderid
FROM worksorders
INNER JOIN traders traders ON
worksorders.traderid = traders.id
and build up to the next join.
Are you sure the results are exact duplicates? Makes sure there isn't one column that actually has a different value.

MySQL is returning an "column ambiguous" error when it used to work before I upgraded

I recently upgraded MySQL to 5.1.41. Before the upgrade the following SQL worked (or at least I thought I remembered it working...it has been a few weeks since designing this...). Now the SQL gives me an error stating that the "archived" column is ambiguous. How can I write this differently, or is there a different problem I'm not aware of?
I simply want to return the "unit_id", "lease_count" (stored in another table with a unit_id that should correspond with the "a.unit_id"), and "lease_archived_count (stored in another table with a unit_id that should correspond with the "a.unit_id").
SELECT a.unit_id,
(SELECT count(*) FROM o_leases WHERE unit_id = a.unit_id AND archived = 0) as lease_count,
(SELECT count(*) FROM o_leases WHERE unit_id = a.unit_id AND archived = 1) as lease_archive_count
FROM p_unit a, properties b, portfolio c
WHERE a.property_id = b.properties_id
AND b.portfolio_id = c.portfolio_id
AND a.archived = 0
Thanks for your help.
There can only be one place where this error is referring to. I suggest to give the tables in the sub select also an alias:
(SELECT count(*) FROM o_leases o WHERE o.unit_id = a.unit_id AND o.archived = 0) as lease_count,
(SELECT count(*) FROM o_leases o WHERE o.unit_id = a.unit_id AND o.archived = 1) as lease_archive_count
It seems that it collides with the archived field of p_unit.
I would guess the error is because o_leases is not the only table that has an archived field. p_unit Or properties or portfolio must have that column as well.
However you have several other things that should also be fixed. First in most databases correlated subqueries are poor performers and should be rewritten as joins (although in this case you probably only need a case statement and a join). You should notgenerally even consider writing correlated subqueries.
Second, you should NEVER use implicit joins. Extremely bad idea that leads to poor maintainability and accidental cross joins as well as a generally poor understanding of joins. Bad bad bad. This kind of code was replaced 18 years ago, would you still be using C# or Java code relaced witha better method even five years ago?