Two almost identical queries returning different results - sql

I am getting different results for the following two queries and I have no idea why. The only difference is one has an IN and one has an equals.
Before I go into the queries you should know that I found a better way to do it by moving the subquery into a common table expression, but this is still driving me crazy! I really want to know what caused the issue in the first place, I am asking out of curiosity
Here's the first query:
use [DB.90_39733]
Select distinct x.uniqproducer, cn.Firstname,cn.lastname,e.code,
ecn.FirstName, ecn.LastName, ecn.entid, x.uniqline
from product x
join employ e on e.EmpID=x.uniqproducer
join contactname cn on cn.uniqentity=e.uniqentity
join [ETL_GAWR92]..idlookupentity ide on ide.enttype='EM'
and ide.UniqEntity=e.UniqEntity
left join [ETL_GAWR92]..EntConName ecn on ecn.entid=ide.empid
and ecn.opt='Y'
Where x.UniqProducer =(SELECT TOP 1 idl.UniqEntity
FROM [ETL_GAWR92]..IDLookupEntity idl
LEFT JOIN [ETL_GAWR92]..Employ e2 ON e2.ProdID = ''
WHERE idl.empID = e2.EmpID AND
idl.EntType = 'EM')
And the second one:
use [DB.90_39733]
Select distinct x.uniqproducer, cn.Firstname,cn.lastname,e.code,
ecn.FirstName, ecn.LastName, ecn.entid, x.uniqline
from product x
join employ e on e.EmpID=x.uniqproducer
join contactname cn on cn.uniqentity=e.uniqentity
join [ETL_GAWR92]..idlookupentity ide on ide.enttype='EM'
and ide.UniqEntity=e.UniqEntity
left join [ETL_GAWR92]..EntConName ecn on ecn.entid=ide.empid
and ecn.opt='Y'
Where x.UniqProducer IN (SELECT TOP 1 idl.UniqEntity
FROM [ETL_GAWR92]..IDLookupEntity idl
LEFT JOIN [ETL_GAWR92]..Employ e2 ON e2.ProdID = ''
WHERE idl.empID = e2.EmpID AND
idl.EntType = 'EM')
The first query returns 0 rows while the second query returns 2 rows.The only difference is x.UniqProducer = versus x.UniqProducer IN for the last where clause.
Thanks for your time

SELECT TOP 1 doesn't guarantee that the same record will be returned each time.
Add an ORDER BY to your select to make sure the same record is returned.
(SELECT TOP 1 idl.UniqEntity
FROM [ETL_GAWR92]..IDLookupEntity idl
LEFT JOIN [ETL_GAWR92]..Employ e2 ON e2.ProdID = ''
WHERE idl.empID = e2.EmpID AND
idl.EntType = 'EM' ORDER BY idl.UniqEntity)

I would guess (with strong emphasis on the word “guess”) that the reason is based on how equals and in are processed by the query engine. For equals, SQL knows it needs to do a comparison with a specific value, where for in, SQL knows it needs to build a subset, and find if the "outer" value is in that "inner" subset. Yes, the end results should be the same as there’s only 1 row returned by the subquery, but as #RickS pointed out, without any ordering there’s no guarantee of which value ends up “on top” – and the (sub)query plan used to build the in - driven subquery might differ from that used by the equals pull.
A follow-up question: which is the correct dataset? When you analyze the actual data, should you have gotten zero, two, or a different number of rows?

Related

Should I use an SQL full outer join for this?

Consider the following tables:
Table A:
DOC_NUM
DOC_TYPE
RELATED_DOC_NUM
NEXT_STATUS
...
Table B:
DOC_NUM
DOC_TYPE
RELATED_DOC_NUM
NEXT_STATUS
...
The DOC_TYPE and NEXT_STATUS columns have different meanings between the two tables, although a NEXT_STATUS = 999 means "closed" in both. Also, under certain conditions, there will be a record in each table, with a reference to a corresponding entry in the other table (i.e. the RELATED_DOC_NUM columns).
I am trying to create a query that will get data from both tables that meet the following conditions:
A.RELATED_DOC_NUM = B.DOC_NUM
A.DOC_TYPE = "ST"
B.DOC_TYPE = "OT"
A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999
A.DOC_TYPE = "ST" represents a transfer order to transfer inventory from one plant to another. B.DOC_TYPE = "OT" represents a corresponding receipt of the transferred inventory at the receiving plant.
We want to get records from either table where there is an ST/OT pair where either or both entries are not closed (i.e. NEXT_STATUS < 999).
I am assuming that I need to use a FULL OUTER join to accomplish this. If this is the wrong assumption, please let me know what I should be doing instead.
UPDATE (11/30/2021):
I believe that #Caius Jard is correct in that this does not need to be an outer join. There should always be an ST/OT pair.
With that I have written my query as follows:
SELECT <columns>
FROM A LEFT JOIN B
ON
A.RELATED_DOC_NUM = B.DOC_NUM
WHERE
A.DOC_TYPE IN ('ST') AND
B.DOC_TYPE IN ('OT') AND
(A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999)
Does this make sense?
UPDATE 2 (11/30/2021):
The reality is that these are DB2 database tables being used by the JD Edwards ERP application. The only way I know of to see the table definitions is by using the web site http://www.jdetables.com/, entering the table ID and hitting return to run the search. It comes back with a ton of information about the table and its columns.
Table A is really F4211 and table B is really F4311.
Right now, I've simplified the query to keep it simple and keep variables to a minimum. This is what I have currently:
SELECT CAST(F4211.SDDOCO AS VARCHAR(8)) AS SO_NUM,
F4211.SDRORN AS RELATED_PO,
F4211.SDDCTO AS SO_DOC_TYPE,
F4211.SDNXTR AS SO_NEXT_STATUS,
CAST(F4311.PDDOCO AS VARCHAR(8)) AS PO_NUM,
F4311.PDRORN AS RELATED_SO,
F4311.PDDCTO AS PO_DOC_TYPE,
F4311.PDNXTR AS PO_NEXT_STATUS
FROM PROD2DTA.F4211 AS F4211
INNER JOIN PROD2DTA.F4311 AS F4311
ON F4211.SDRORN = CAST(F4311.PDDOCO AS VARCHAR(8))
WHERE F4211.SDDCTO IN ( 'ST' )
AND F4311.PDDCTO IN ( 'OT' )
The other part of the story is that I'm using a reporting package that allows you to define "virtual" views of the data. Virtual views allow the report developer to specify the SQL to use. This is the application where I am using the SQL. When I set up the SQL, there is a validation step that must be performed. It will return a limited set of results if the SQL is validated.
When I enter the query above and validate it, it says that there are no results, which makes no sense. I'm guessing the data casting is causing the issue, but not sure.
UPDATE 3 (11/30/2021):
One more twist to the story. The related doc number is not only defined as a string value, but it contains leading zeros. This is true in both tables. The main doc number (in both tables) is defined as a numeric value and therefore has no leading zeros. I have no idea why those who developed JDE would have done this, but that is what is there.
So, there are matching records between the two tables that meet the criteria, but I think I'm getting no results because when I convert the numeric to a string, it does not match, because one value is, say "12345", while the other is "00012345".
Can I pad the numeric -> string value with zeros before doing the equals check?
UPDATE 4 (12/2/2021):
Was able to finally get the query to work by converting the numeric doc num to a left zero padded string.
SELECT <columns>
FROM PROD2DTA.F4211 AS F4211
INNER JOIN PROD2DTA.F4311 AS F4311
ON F4211.SDRORN = RIGHT(CONCAT('00000000', CAST(F4311.PDDOCO AS VARCHAR(8))), 8)
WHERE F4211.SDDCTO IN ( 'ST' )
AND F4311.PDDCTO IN ( 'OT' )
AND ( F4211.SDNXTR < 999
OR F4311.PDNXTR < 999 )
You should write your query as follows:
SELECT <columns>
FROM A INNER JOIN B
ON
A.RELATED_DOC_NUM = B.DOC_NUM
WHERE
A.DOC_TYPE IN ('ST') AND
B.DOC_TYPE IN ('OT') AND
(A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999)
LEFT join is a type of OUTER join; LEFT JOIN is typically a contraction of LEFT OUTER JOIN). OUTER means "one side might have nulls in every column because there was no match". Most critically, the code as posted in the question (with a LEFT JOIN, but then has WHERE some_column_from_the_right_table = some_value) runs as an INNER join, because any NULLs inserted by the LEFT OUTER process, are then quashed by the WHERE clause
See Update 4 for details of how I resolved the "data conversion or mapping" error.

COUNT is outputting more than one row

I am having a problem with my SQL query using the count function.
When I don't have an inner join, it counts 55 rows. When I add the inner join into my query, it adds a lot to it. It suddenly became 102 rows.
Here is my SQL Query:
SELECT COUNT([fmsStage].[dbo].[File].[FILENUMBER])
FROM [fmsStage].[dbo].[File]
INNER JOIN [fmsStage].[dbo].[Container]
ON [fmsStage].[dbo].[File].[FILENUMBER] = [fmsStage].[dbo].[Container].[FILENUMBER]
WHERE [fmsStage].[dbo].[File].[RELATIONCODE] = 'SHIP02'
AND [fmsStage].[dbo].[Container].DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08'
GROUP BY [fmsStage].[dbo].[File].[FILENUMBER]
Also, I have to do TOP 1 at the SELECT statement because it returns 51 rows with random numbers inside of them. (They are probably not random, but I can't figure out what they are.)
What do I have to do to make it just count the rows from [fmsStage].[dbo].[file].[FILENUMBER]?
First, your query would be much clearer like this:
SELECT COUNT(f.[FILENUMBER])
FROM [fmsStage].[dbo].[File] f INNER JOIN
[fmsStage].[dbo].[Container] c
ON v.[FILENUMBER] = c.[FILENUMBER]
WHERE f.[RELATIONCODE] = 'SHIP02' AND
c.DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08';
No GROUP BY is necessary. Otherwise you'll just one row per file number, which doesn't seem as useful as the overall count.
Note: You might want COUNT(DISTINCT f.[FILENUMBER]). Your question doesn't provide enough information to make a judgement.
Just remove GROUP BY Clause
SELECT COUNT([fmsStage].[dbo].[File].[FILENUMBER])
FROM [fmsStage].[dbo].[File]
INNER JOIN [fmsStage].[dbo].[Container]
ON [fmsStage].[dbo].[File].[FILENUMBER] = [fmsStage].[dbo].[Container].[FILENUMBER]
WHERE [fmsStage].[dbo].[File].[RELATIONCODE] = 'SHIP02'
AND [fmsStage].[dbo].[Container].DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08'

how to join multiple tables without showing repeated data?

I pop into a problem recently, and Im sure its because of how I Join them.
this is my code:
select LP_Pending_Info.Service_Order,
LP_Pending_Info.Pending_Days,
LP_Pending_Info.Service_Type,
LP_Pending_Info.ASC_Code,
LP_Pending_Info.Model,
LP_Pending_Info.IN_OUT_WTY,
LP_Part_Codes.PartCode,
LP_PS_Codes.PS,
LP_Confirmation_Codes.SO_NO,
LP_Pending_Info.Engineer_Code
from LP_Pending_Info
join LP_Part_Codes
on LP_Pending_Info.Service_order = LP_Part_Codes.Service_order
join LP_PS_Codes
on LP_Pending_Info.Service_Order = LP_PS_Codes.Service_Order
join LP_Confirmation_Codes
on LP_Pending_Info.Service_Order = LP_Confirmation_Codes.Service_Order
order by LP_Pending_Info.Service_order, LP_Part_Codes.PartCode;
For every service order I have 5 part code maximum.
If the service order have only one value it show the result correctly but when it have more than one Part code the problem begin.
for example: this service order"4182134076" has only 2 part code, first'GH81-13601A' and second 'GH96-09938A' so it should show the data 2 time but it repeat it for 8 time. what seems to be the problem?
If your records were exactly the same the distinct keyword would have solved it.
However in rows 2 and 3 which have the same Service_Order and Part_Code if you check the SO_NO you see it is different - that is why distinct won't work here - the rows are not identical.
I say you have some problem in one of the conditions in your joins. The different data is in the SO_NO column so check the raw data in the LP_Confirmation_Codes table for that Service_Order:
select * from LP_Confirmation_Codes where Service_Order = 4182134076
I assume you are missing an and with the value from the LP_Part_Codes or LP_PS_Codes (but can't be sure without seeing those tables and data myself).
By this sentence If the service order have only one value it show the result correctly but when it have more than one Part code the problem begin. - probably you are missing and and with the LP_Part_Codes table
Based on your output result, here are the following data that caused multiple output.
Service Order: 4182134076 has :
2 PartCode which are GH81-13601A and GH96-09938A
2 PS which are U and P
2 SO_NO which are 1.00024e+09 and 1.00022e+09
Therefore 2^3 returns 8 rows. I believe that you need to check where you should join your tables.
Use DINTINCT
select distinct LP_Pending_Info.Service_Order,LP_Pending_Info.Pending_Days,
LP_Pending_Info.Service_Type,LP_Pending_Info.ASC_Code,LP_Pending_Info.Model,
LP_Pending_Info.IN_OUT_WTY, LP_Part_Codes.PartCode,LP_PS_Codes.PS,
LP_Confirmation_Codes.SO_NO,LP_Pending_Info.Engineer_Code
from LP_Pending_Info
join LP_Part_Codes on LP_Pending_Info.Service_order = LP_Part_Codes.Service_order
join LP_PS_Codes on LP_Part_Codes.Service_Order = LP_PS_Codes.Service_Order
join LP_Confirmation_Codes on LP_PS_Codes.Service_Order = LP_Confirmation_Codes.Service_Order
order by LP_Pending_Info.Service_order, LP_Part_Codes.PartCode;
distinct will not return duplicates based on your select. So if a row is same, it will only return once.

Two identical queries that give different results

I have 2 queries: one is written in ANSI SQL, another is written using oracle dialect.
I think that they both must give the same resultset, but it is no true. First query gives 385 rows and the second - only 25
First:
SELECT idclient, cl.surname, sum(sub1.s)
FROM client cl JOIN incomestatement incst USING(idclient)
JOIN (SELECT c.idincome ID, sum(inst.total) AS s
FROM instalment inst JOIN credit c USING(idcredit)
WHERE inst.paydate > c.paydate AND c.isloaned = 1
GROUP BY c.idincome) sub1 ON incst.idincome = sub1.ID
GROUP BY idclient, cl.surname;
Second:
SELECT c.idclient, c.surname, sum(sub.s)
FROM client c, incomestatement inc,
(SELECT sum(inst.total) as s, cr.idincome as id
FROM instalment inst, credit cr
WHERE inst.paydate > cr.paydate AND cr.isloaned = 1 AND cr.idcredit = inst.idcredit
GROUP BY cr.idincome
) sub
WHERE c.idclient = inc.idclient AND inc.income = sub.ID
group by c.idclient, c.surname;
So why they don't give the same result?
I'd approach the problem in steps.
Do the two sub-queries produce the same data sets?
If they do, proceed to step 2.
If not, then you have two simpler queries to analyze and dissect.
Given that the pair of sub-queries produce the same answer, you can then establish whether the Client and IncomeStatement joins give the same results (treat it as another sub-query)
If they do, proceed to step 3.
If not, then you have a pair of queries (one with JOIN, one with classic SQL notation) to analyze and dissect.
Given that the pair of joins and the pair of subqueries each produce the same result, analyze why the join of these does not work correctly.
Have you made a commit?
It's possible that you don´t commited some transactions, so the results can be differents.

Subselect a Summed col in Oracle

this is an attempted fix to a crystal reports use of 2 sub reports!
I have a query that joins 3 tables, and I wanted to use a pair of sub selects that bring in the same new table.
Here is the first of the two columns in script:
SELECT ea."LOC_ID", lo."DESCR", ea."PEGSTRIP", ea."ENTITY_OWNER"
, ea."PCT_OWNERSHIP", ea."BEG_BAL", ea."ADDITIONS", ea."DISPOSITIONS"
, ea."EXPLANATION", ea."END_BAL", ea."NUM_SHARES", ea."PAR_VALUE"
, ag."DESCR", ea."EOY", ea."FAKEPEGSTRIP",
(select sum(htb.END_FNC_CUR_US_GAAP)
from EQUITY_ACCOUNTS ea , HYPERION_TRIAL_BALANCE htb
where
htb.PEGSTRIP = ea.PEGSTRIP and
htb.PRD_NBR = 0 and
htb.LOC_ID = ea.LOC_ID and
htb.PRD_YY = ea.EOY
) firstHyp
FROM ("TAXPALL"."ACCOUNT_GROUPING" ag
INNER JOIN "TAXP"."EQUITY_ACCOUNTS" ea
ON (ag."ACCT_ID"=ea."PEGSTRIP") AND (ag."EOY"=ea."EOY"))
INNER JOIN "TAXP"."LOCATION" lo ON ea."LOC_ID"=lo."LOC_ID"
WHERE ea."EOY"=2009
ORDER BY ea."LOC_ID", ea."PEGSTRIP"
When this delivers the dataset the value of "firstHyp" fails to change by pegstrip value. It returns a single total for the join and fails to put the proper by value by pegstrip.
I thought that the where clause would have picked up the joins line by line.
I don't do Oracle syntax often so what am I missing here?
TIA
Your SQL is equivilent to the following:
SELECT ea."LOC_ID", lo."DESCR", ea."PEGSTRIP",
ea."ENTITY_OWNER" , ea."PCT_OWNERSHIP",
ea."BEG_BAL", ea."ADDITIONS", ea."DISPOSITIONS" ,
ea."EXPLANATION", ea."END_BAL", ea."NUM_SHARES",
ea."PAR_VALUE" , ag."DESCR", ea."EOY", ea."FAKEPEGSTRIP",
(select sum(htb.END_FNC_CUR_US_GAAP)
from EQUITY_ACCOUNTS iea
Join HYPERION_TRIAL_BALANCE htb
On htb.PEGSTRIP = iea.PEGSTRIP
and htb.LOC_ID = iea.LOC_ID
and htb.PRD_YY = iea.EOY
where htb.PRD_NBR = 0 ) firstHyp
FROM "TAXPALL"."ACCOUNT_GROUPING" ag
JOIN "TAXP"."EQUITY_ACCOUNTS" ea
ON ag."ACCT_ID"=ea."PEGSTRIP"
AND ag."EOY"=ea."EOY"
JOIN "TAXP"."LOCATION" lo
ON ea."LOC_ID"=lo."LOC_ID"
WHERE ea."EOY"=2009
ORDER BY ea."LOC_ID", ea."PEGSTRIP"
Notice that the subquery that generates firstHyp is not in any way dependant on the tables in the outer query... It is therefore not a Correllated SubQuery... meaning that the value it generates will NOT be different for each row in the outer query's resultset, it will be the same for every row. You need to somehow put something in the subquery that makes it dependant on the value of some row in the outer query so that it will become a correllated subquery and run over and over once for each outer row....
Also, you mention a pair of subselects, but I only see one. Where is the other one ?
Use:
SELECT ea.LOC_ID,
lo.DESCR,
ea.PEGSTRIP,
ea.ENTITY_OWNER,
ea.PCT_OWNERSHIP,
ea.BEG_BAL,
ea.ADDITIONS,
ea.DISPOSITIONS,
ea.EXPLANATION,
ea.END_BAL,
ea.NUM_SHARES,
ea.PAR_VALUE,
ag.DESCR,
ea.EOY,
ea.FAKEPEGSTRIP,
NVL(SUM(htb.END_FNC_CUR_US_GAAP), 0) AS firstHyp
FROM TAXPALL.ACCOUNT_GROUPING ag
JOIN TAXP.EQUITY_ACCOUNTS ea ON ea.PEGSTRIP = ag.ACCT_ID
AND ea.EOY = ag.EOY
AND ea.EOY = 2009
JOIN TAXP.LOCATION lo ON lo.LOC_ID = ea.LOC_ID
LEFT JOIN HYPERION_TRIAL_BALANCE htb ON htb.PEGSTRIP = ea.PEGSTRIP
AND htb.LOC_ID = ea.LOC_ID
AND htb.PRD_YY = ea.EOY
AND htb.PRD_NBR = 0
GROUP BY ea.LOC_ID,
lo.DESCR,
ea.PEGSTRIP,
ea.ENTITY_OWNER,
ea.PCT_OWNERSHIP,
ea.BEG_BAL,
ea.ADDITIONS,
ea.DISPOSITIONS,
ea.EXPLANATION,
ea.END_BAL,
ea.NUM_SHARES,
ea.PAR_VALUE,
ag.DESCR,
ea.EOY,
ea.FAKEPEGSTRIP,
ORDER BY ea.LOC_ID, ea.PEGSTRIP
I agree with Charles Bretana's assessment that the original SELECT in the SELECT clause was not correlated, which is why the value never changed per row. But the sub SELECT used the EQUITY_ACCOUNTS table, which is the basis for the main query. So I removed the join, and incorporated the HYPERION_TRIAL_BALANCE table into the main query, using a LEFT JOIN. I wrapped the SUM in an NVL rather than COALESCE because I didn't catch what version of Oracle this is for.