This is my SELECT statement; it returns duplicate rows (see screenshot).
How can I prevent the duplicated rows?
SELECT
A.TOTAL_PRESENT,
A."LIMIT",
A.COST_CENTER,
A.ID,
A.PLANT,
A.BUDGET_YEAR,
A."VERSION",
B.BUDGET_YEAR,
B."VERSION",
B.PLANT,
B.CHARGE_CC,
B.YEAR_DATE_USD
FROM
CMS.SUM_REPANDMAINT A,
CMS.V_SUM_REPANDMAINT B
WHERE
(A.BUDGET_YEAR = B.BUDGET_YEAR(+)) AND
(A."VERSION" = B."VERSION"(+)) AND
(A.PLANT = B.PLANT(+)) AND
(A.COST_CENTER = B.CHARGE_CC(+)) AND
(B.USERNAME = '[usr_name]')
Output
Duplicate entries mean the filter criteria are not precise enough. One of your data sources produces multiple rows and the WHERE clause doesn't offer sufficient restriction.
You haven't posted any raw data, so we can't tell you what additional criteria you need. However, you should look at your use of outer joins. An outer join still returns rows from the left-hand table when no row in the right-hand table matches the join criteria. Why are you doing that?
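To make both points concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables a and b, their contents, and the user name 'alice' are made up for illustration, and SQLite's LEFT JOIN stands in for Oracle's (+) syntax. It suggests that one left-hand row matching two right-hand rows comes back twice, and that filtering the outer-joined table in WHERE drops the unmatched rows the outer join was supposed to keep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical, simplified stand-ins for the two sources in the question
    CREATE TABLE a (cost_center TEXT, total_present INTEGER);
    CREATE TABLE b (charge_cc TEXT, username TEXT, year_date_usd INTEGER);
    INSERT INTO a VALUES ('CC1', 10), ('CC2', 20);
    -- two rows in b match CC1, none match CC2
    INSERT INTO b VALUES ('CC1', 'alice', 100), ('CC1', 'alice', 150);
""")

# Plain outer join: CC1 appears twice (two matches), CC2 once with NULL.
outer_rows = conn.execute("""
    SELECT a.cost_center, b.year_date_usd
    FROM a LEFT JOIN b ON a.cost_center = b.charge_cc
    ORDER BY a.cost_center, b.year_date_usd
""").fetchall()

# Filter on b in WHERE: the NULL-extended CC2 row fails the test and vanishes.
filtered_in_where = conn.execute("""
    SELECT a.cost_center, b.year_date_usd
    FROM a LEFT JOIN b ON a.cost_center = b.charge_cc
    WHERE b.username = 'alice'
    ORDER BY a.cost_center, b.year_date_usd
""").fetchall()

# Same filter moved into the join condition: CC2 is kept.
filtered_in_on = conn.execute("""
    SELECT a.cost_center, b.year_date_usd
    FROM a LEFT JOIN b
      ON a.cost_center = b.charge_cc AND b.username = 'alice'
    ORDER BY a.cost_center, b.year_date_usd
""").fetchall()

print(outer_rows)
print(filtered_in_where)
print(filtered_in_on)
```

If the duplicates come from several rows in the right-hand source matching the same key, the fix is an extra join criterion or an aggregate on that source, not a change to the outer join itself.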
I've run a query to calculate the difference between values of two columns from two tables using a common key. The query is:
Select a.GPID, a.StartDate-b.StartDate as Discrepancy FROM Difftable1 a
INNER JOIN Difftable2 b
ON a.GPID= b.GPID;
and the results are here:
Results
But I want to filter the results to only include differences which equal -10000. Every attempt results in a syntax error. I'm new to SQL.
If you want to filter out -10000 from the result set, you can use
SELECT a.GPID, a.StartDate-b.StartDate as Discrepancy
FROM Difftable1 a
INNER JOIN Difftable2 b ON a.GPID= b.GPID
WHERE a.StartDate-b.StartDate != -10000;
If you want to have records in the result set with only -10000, then replace != with = at the end of the above statement.
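As a rough check of the statement above, here is a small sketch with Python's sqlite3 module. The table contents are invented and StartDate is stored as a plain integer, which is an assumption about the real schema. It compares two ways of keeping only the -10000 rows: repeating the expression in WHERE, and wrapping the query in a derived table so the Discrepancy alias can be filtered directly (in many databases a column alias cannot be referenced in the WHERE clause of the same SELECT).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- made-up sample data; StartDate as an integer is an assumption
    CREATE TABLE Difftable1 (GPID INTEGER, StartDate INTEGER);
    CREATE TABLE Difftable2 (GPID INTEGER, StartDate INTEGER);
    INSERT INTO Difftable1 VALUES (1, 20100101), (2, 20110101), (3, 20120101);
    INSERT INTO Difftable2 VALUES (1, 20110101), (2, 20110101), (3, 20130101);
""")

# Variant 1: repeat the expression in the WHERE clause.
repeated = conn.execute("""
    SELECT a.GPID, a.StartDate - b.StartDate AS Discrepancy
    FROM Difftable1 a
    INNER JOIN Difftable2 b ON a.GPID = b.GPID
    WHERE a.StartDate - b.StartDate = -10000
    ORDER BY a.GPID
""").fetchall()

# Variant 2: compute the alias in a derived table, then filter on it.
wrapped = conn.execute("""
    SELECT GPID, Discrepancy
    FROM (
        SELECT a.GPID, a.StartDate - b.StartDate AS Discrepancy
        FROM Difftable1 a
        INNER JOIN Difftable2 b ON a.GPID = b.GPID
    ) d
    WHERE Discrepancy = -10000
    ORDER BY GPID
""").fetchall()

print(repeated)
print(wrapped)
```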
I read many threads but didn't get the right solution to my problem. It's comparable to this Thread
I have a query, which gathers data and writes it per shell script into a csv file:
SELECT
'"Dose History ID"' = d.dhs_id,
'"TxFieldPoint ID"' = tp.tfp_id,
'"TxFieldPointHistory ID"' = tph.tph_id,
...
FROM txfield t
LEFT JOIN txfielpoint tp ON t.fld_id = tp.fld_id
LEFT JOIN txfieldpoint_hst tph ON fh.fhs_id = tph.fhs_id
...
WHERE d.dhs_id NOT IN ('1000', '10000')
AND ...
ORDER BY d.datetime,...;
This is based on a very big database with lots of tables and machine values. I picked my columns of interest and linked them by their built-in table IDs. Now I have to reduce my result, because I get many rows with the same values where only the IDs change. I just need one (the first) row of "tph.tph_id", with a mechanism like
WHERE "Rownumber" is 1
or something like this. So far I couldn't implement a proper subquery or use the ROW_NUMBER() SQL function. Your help would be much appreciated. The result looks like this and, based on the last ID, I just need one row for each of these numbers (the IDs are not strictly consecutive).
A01";261511;2843119;714255;3634457;
A01";261511;2843113;714256;3634457;
A01";261511;2843113;714257;3634457;
A02";261512;2843120;714258;3634464;
A02";261512;2843114;714259;3634464;
....
I think "GROUP BY" may suit your needs.
You can group rows with the same values for a set of columns into a single row
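A minimal sketch of that idea, using Python's sqlite3 module and the five sample rows from the question. The table name result_rows and the column names label, dhs_id, tfp_id, tph_id and last_id are hypothetical, since the question does not say which column is which. The inner query groups by the last ID and picks the smallest tph_id per group; joining back to it keeps the full first row of each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical table holding the joined result shown in the question
    CREATE TABLE result_rows (
        label TEXT, dhs_id INTEGER, tfp_id INTEGER,
        tph_id INTEGER, last_id INTEGER
    );
    INSERT INTO result_rows VALUES
        ('A01', 261511, 2843119, 714255, 3634457),
        ('A01', 261511, 2843113, 714256, 3634457),
        ('A01', 261511, 2843113, 714257, 3634457),
        ('A02', 261512, 2843120, 714258, 3634464),
        ('A02', 261512, 2843114, 714259, 3634464);
""")

# GROUP BY finds the first (smallest) tph_id per last_id;
# the join back returns the complete row that carries it.
first_rows = conn.execute("""
    SELECT r.label, r.dhs_id, r.tfp_id, r.tph_id, r.last_id
    FROM result_rows r
    JOIN (
        SELECT last_id, MIN(tph_id) AS first_tph_id
        FROM result_rows
        GROUP BY last_id
    ) f ON r.last_id = f.last_id AND r.tph_id = f.first_tph_id
    ORDER BY r.last_id
""").fetchall()

for row in first_rows:
    print(row)
```

Here "first" is taken to mean the lowest tph_id within each group; if another ordering defines the first row, the MIN() would need to change accordingly.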
Why are these two SQL queries not equivalent? One uses a correlated subquery, the other uses GROUP BY. The first produces a little over 51000 rows from my database, the second nearly 66000. In both cases, I am simply trying to return all the parts meeting the stated condition, current revision only. A comparison of the output files shows that method #1 (oracle_test1.txt) fails to return quite a few values. Based on that, I can only assume that method #2 is correct. I have some code that has used method #1 for a long time, but it appears I will have to change it. My reasoning concerning the correlated subquery was that, as the inner select compares the columns in the self join, it will find the max value of prev for all matches and then return that max prev value for use in the outer query. I designed that query long ago, before becoming familiar with the use of GROUP BY. Any insights would be appreciated.
Query #1
select pobj_name, prev
from pfmc_part
where pmodel in ('PN-DWG', 'NO-DWG') and pstatus = 'RELEASED'
and prev = (select max(prev) from pfmc_part a where a.pobj_name = pfmc_part.pobj_name)
order by pobj_name, prev
Query #2
select pobj_name, max(prev) prev
from pfmc_part
where pmodel in ('PN-DWG', 'NO-DWG') and pstatus = 'RELEASED'
group by pobj_name
order by pobj_name, prev
Sample output:
Query #2      Query #1
P538512 B     P538512 B
P538513 A     P538513 A
P538514 C     P538514 C
P538520 B
P538522 B     P538522 B
P538525 A     P538525 A
P538531 C     P538531 C
P538533 A     P538533 A
P538538 B
P538541 B
P538542 B
P538553 A     P538553 A
P538569 A     P538569 A
Query 1 is returning each of the max ids and then those that have a pmodel of the type specified within your where clause.
Whereas query 2 is selecting all items with a pmodel of the type specified in your where clause and each of the max ids of that.
You may have data which isn't the max id which satisfies your where clause in query 2 which is why it's being omitted in query 1
There are two differences and the rest of the answers focus on one. The "easy" difference is that the max() in the group by is affected by the filter clause. The max() in the other query has no filter, and so it might return no rows (when max(prev) is on a row otherwise filtered out by the where conditions).
In addition, the where version of the query might return duplicate rows when there are multiple rows with the same value of max(prev) for a given pobj_name. The group by will never return duplicate rows.
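Both differences can be reproduced on a tiny data set. The sketch below uses Python's sqlite3 module; the column names come from the question, but the six rows and the 'WORKING' status value are invented for illustration. The outer table is aliased as p so that both queries stay unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pfmc_part (pobj_name TEXT, prev TEXT, pmodel TEXT, pstatus TEXT);
    INSERT INTO pfmc_part VALUES
        ('P1', 'A', 'PN-DWG', 'RELEASED'),
        ('P1', 'B', 'PN-DWG', 'RELEASED'),
        ('P2', 'A', 'PN-DWG', 'RELEASED'),
        ('P2', 'B', 'PN-DWG', 'WORKING'),   -- highest prev is filtered out
        ('P3', 'A', 'PN-DWG', 'RELEASED'),
        ('P3', 'A', 'NO-DWG', 'RELEASED');  -- two rows share the max prev
""")

# Query #1: max(prev) is computed over ALL rows of the part, unfiltered.
query1 = conn.execute("""
    SELECT p.pobj_name, p.prev
    FROM pfmc_part p
    WHERE p.pmodel IN ('PN-DWG', 'NO-DWG') AND p.pstatus = 'RELEASED'
      AND p.prev = (SELECT MAX(a.prev) FROM pfmc_part a
                    WHERE a.pobj_name = p.pobj_name)
    ORDER BY p.pobj_name, p.prev
""").fetchall()

# Query #2: the filter is applied first, then max(prev) per part.
query2 = conn.execute("""
    SELECT pobj_name, MAX(prev) AS prev
    FROM pfmc_part
    WHERE pmodel IN ('PN-DWG', 'NO-DWG') AND pstatus = 'RELEASED'
    GROUP BY pobj_name
    ORDER BY pobj_name
""").fetchall()

print(query1)  # P2 is missing, P3 appears twice
print(query2)  # one row per part, P2 falls back to revision A
```

On this data, Query #1 loses P2 because its highest revision is not RELEASED, and it returns P3 twice, while Query #2 returns exactly one row per part.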
This query
select pobj_name, prev
from pfmc_part
where pmodel in ('PN-DWG', 'NO-DWG') and pstatus = 'RELEASED'
and prev = (select max(prev) from pfmc_part a where a.pobj_name = pfmc_part.pobj_name)
order by pobj_name, prev
has a WHERE condition that causes it to return fewer rows: specifically, only rows where prev = (subquery). That "and prev = ..." condition makes it entirely different. Note that it compares prev with the subquery result; it does not assign a value to the prev selected in the first line.
If you wanted them to be more similar, you'd need to modify it like so:
select p.pobj_name, p.prev, maxes.max_prev
from pfmc_part p
JOIN (select pobj_name, max(prev) as max_prev from pfmc_part group by pobj_name) maxes
ON maxes.pobj_name = p.pobj_name
where p.pmodel in ('PN-DWG', 'NO-DWG') and p.pstatus = 'RELEASED'
order by p.pobj_name, p.prev
In query 1 you are ONLY selecting the rows whose prev field is equal to the max(prev), whereas in query 2 you are selecting all records that meet the conditions in the WHERE clause, grouped by the GROUP BY clause, ALONG WITH their max(prev).
Basically, query 1 and query 2 have completely different where clauses. Hope this explains the missing records from query 1.
Your query #1 will certainly fail to return a row for a given pobj_name where maximum prev for that name does not correspond to a revision currently in the database. That could perhaps happen if a revision was skipped or if its row was deleted.
Your Query #2 does not suffer Query #1's limitation, and it may perform better on account of avoiding a correlated subquery. It would be inappropriate, however, if you wanted more data than just pobj_name and aggregate functions of the groups. And by the way, there's no point in including prev in the ORDER BY clause, since pobj_name will already be unique to each result row.
Overall, if the two queries happen to return similar results then that is a matter of the details of the data, not of the queries. They arrive at their results completely differently.
I have two tables, and I want to compare two columns from those tables. The column reflow in table f_product must be greater than or equal to the column lreflow in table f_line. The code that I used is
SELECT f_product.oiv,f_product.product,f_product.passive,f_product.pitch,f_product.reflow,f_line.lreflow,f_product.spi,f_product.scomp,f_product.pallet,f_product.printer,f_line.line
FROM f_product,f_line
WHERE f_product.passive=f_line.passive
AND f_product.pitch=f_line.pitch
AND f_product.spi=f_line.spi
AND f_product.pallet=f_line.pallet
AND f_product.printer=f_line.printer
AND f_product.reflow >= f_line.lreflow
AND oiv='PMLE4720A'
However, the result does not seem to apply the comparison between f_product.reflow and f_line.lreflow. For example, it still lists rows with reflow=8 and lreflow=10, where reflow is less than lreflow.
Is there an error in my SQL code?
I'm guessing this is Oracle? Sometimes it gets confused by the ambiguity between real WHERE conditions and an implicit join expressed in the WHERE clause. I would recast it into ANSI SQL joins:
SELECT
.....
FROM
f_product a INNER JOIN f_line b ON
(a.passive = b.passive AND
a.pitch =b.pitch AND
a.spi=b.spi AND
a.pallet=b.pallet)
where oiv='PMLE4720A'
and a.reflow >= b.lreflow
Assuming the relationship between product and line is such that it makes sense to join on these four fields...
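For comparison, here is a small sketch of the implicit and the explicit join side by side, run through Python's sqlite3 module rather than Oracle. Only a few of the columns are kept, the sample rows are made up, and reflow and lreflow are declared as INTEGER, which is an assumption about the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- reduced, hypothetical versions of the two tables; numeric columns assumed
    CREATE TABLE f_product (oiv TEXT, passive INTEGER, pitch INTEGER, reflow INTEGER);
    CREATE TABLE f_line (line TEXT, passive INTEGER, pitch INTEGER, lreflow INTEGER);
    INSERT INTO f_product VALUES ('PMLE4720A', 1, 2, 8);
    INSERT INTO f_line VALUES ('L1', 1, 2, 10), ('L2', 1, 2, 6);
""")

# Implicit join: join conditions and filters all live in WHERE.
implicit_rows = conn.execute("""
    SELECT p.oiv, p.reflow, l.lreflow, l.line
    FROM f_product p, f_line l
    WHERE p.passive = l.passive AND p.pitch = l.pitch
      AND p.reflow >= l.lreflow AND p.oiv = 'PMLE4720A'
    ORDER BY l.line
""").fetchall()

# Explicit join: equality conditions in ON, filters in WHERE.
explicit_rows = conn.execute("""
    SELECT p.oiv, p.reflow, l.lreflow, l.line
    FROM f_product p
    INNER JOIN f_line l ON p.passive = l.passive AND p.pitch = l.pitch
    WHERE p.oiv = 'PMLE4720A' AND p.reflow >= l.lreflow
    ORDER BY l.line
""").fetchall()

print(implicit_rows)
print(explicit_rows)
```

On this sample both forms return only the line with lreflow=6. If the rewritten query still lists reflow=8 against lreflow=10 on the real tables, the cause would more likely lie in the data or the column types than in the join syntax.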
I need to join 6 MDX queries that fetch results from an OLAP cube.
The problem is that all the queries have different WHERE conditions, and I want to join them on the basis of rows. The query is
WITH
MEMBER MEASURES.CONSTANTVALUE AS 0
SELECT
Union(MEASURES.CONSTANTVALUE,[Measures].[Totalresult]) on 0,
NON EMPTY {Hierarchize(Filter ({[keyword].[All keywords]},([Measures].[Totalresult]=0)))} ON 1
FROM [Advancedsearch]
WHERE {[Path].[/Search]}
In the query above, the filter changes from one query to the next.
How can we join these?
I would think that a cross product between the list of filters and your existing set on the rows should either already give you what you want, or be a starting point for further refinement of requirements not stated so far in your question:
This would mean something like
NON EMPTY
{[Path].[/Search], [Path].[/Search2]}
*
{Hierarchize(Filter ({[keyword].[All keywords]}, ([Measures].[Totalresult]=0)))}
ON 1
(guessing your second filter would be [Path].[/Search2]) instead of your original
NON EMPTY
{Hierarchize(Filter ({[keyword].[All keywords]}, ([Measures].[Totalresult]=0)))}
ON 1
and omitting the WHERE.