How to do a LEFT JOIN in MS Access without duplicates?

I have two tables, and one column contains duplicate values. I'd like to do a LEFT JOIN that skips the extra rows where that column's value is duplicated.
For example, I have table X:
id Value
A 2
B 4
C 5
and table Y:
id Value
A 2
A 8
B 2
I'm doing a LEFT JOIN:
SELECT *
FROM X LEFT JOIN Y ON X.id = Y.id;
I would like to get something like:
X.id X.Value Y.id Y.Value
A 2 A 2
B 4 B 2
C 5
so that the duplicate id row (A 8) from table Y is not included.

You can do it with GROUP BY:
SELECT X.id, X.value, MIN(Y.value)
FROM X
LEFT JOIN Y ON X.id = Y.id
GROUP BY X.id, X.value
Note that it is not necessary to bring Y.id into the mix, because it is either null or equal to X.id.
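To check the GROUP BY answer, here is a minimal sketch that runs the same query on the question's sample data through Python's built-in sqlite3 module. SQLite only stands in for Access here. The y_value alias and the ORDER BY are additions that make the output readable and deterministic.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the Access tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (id TEXT, Value INTEGER);
    CREATE TABLE Y (id TEXT, Value INTEGER);
    INSERT INTO X VALUES ('A', 2), ('B', 4), ('C', 5);
    INSERT INTO Y VALUES ('A', 2), ('A', 8), ('B', 2);
""")

# GROUP BY collapses all matching Y rows into one row per (X.id, X.Value);
# MIN keeps one Y value, and C gets NULL because Y has no row for it.
rows = conn.execute("""
    SELECT X.id, X.Value, MIN(Y.Value) AS y_value
    FROM X
    LEFT JOIN Y ON X.id = Y.id
    GROUP BY X.id, X.Value
    ORDER BY X.id
""").fetchall()

for row in rows:
    print(row)
```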

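You are looking for GROUP BY to aggregate the Y table records, which collapses them to one row per id. I have chosen MIN, but because the values are numeric you could use another aggregate such as MAX or SUM. Be aware that SUM adds the duplicate values together (10 for id A) rather than keeping one of them.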
SELECT
x.id, x.Value, y.id, MIN(y.value)
FROM
X LEFT JOIN Y ON X.id = Y.id
GROUP BY
x.id, x.value, y.id;
This gives exactly what you asked for. In my opinion, though, y.id is unnecessary in both the SELECT and GROUP BY lists.

This isn't what you're asking for in particular, but if you don't need them joined you could use a union.
SELECT ID, Value FROM (
SELECT ID, Value FROM X
UNION ALL
SELECT ID, Value FROM Y) AS U
GROUP BY ID, Value;
You would end up with:
id Value
A 2
A 8
B 2
B 4
C 5
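This small sketch runs the union query on the question's two sample tables to show which (id, Value) rows survive the GROUP BY. It again uses SQLite through Python's sqlite3 as a stand-in for Access. The derived-table alias u and the ORDER BY are additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (id TEXT, Value INTEGER);
    CREATE TABLE Y (id TEXT, Value INTEGER);
    INSERT INTO X VALUES ('A', 2), ('B', 4), ('C', 5);
    INSERT INTO Y VALUES ('A', 2), ('A', 8), ('B', 2);
""")

# UNION ALL stacks both tables (six rows, with ('A', 2) twice);
# GROUP BY then keeps one row per distinct (id, Value) pair.
rows = conn.execute("""
    SELECT id, Value
    FROM (SELECT id, Value FROM X
          UNION ALL
          SELECT id, Value FROM Y) AS u
    GROUP BY id, Value
    ORDER BY id, Value
""").fetchall()

for row in rows:
    print(row)
```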

Hmmm, I think you can do this in Access with correlated subqueries:
select x.*,
(select top 1 y.id
from y
where y.id = x.id
) as y_id,
(select top 1 y.value
from y
where y.id = x.id
) as y_value
from x;
This doesn't guarantee that the values come from the same row, but that's not a big deal in this case, because the y.id is either present (and the same as x.id) or it is NULL. The y.value comes from an arbitrary matching row.
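SQLite has no TOP, so this sketch uses LIMIT 1 in its place to run the correlated-subquery idea on the sample data. It approximates the Access query rather than reproducing it. The aliases xt, yt, y_id and y_value are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (id TEXT, Value INTEGER);
    CREATE TABLE Y (id TEXT, Value INTEGER);
    INSERT INTO X VALUES ('A', 2), ('B', 4), ('C', 5);
    INSERT INTO Y VALUES ('A', 2), ('A', 8), ('B', 2);
""")

# One correlated subquery per output column; LIMIT 1 plays the role of TOP 1.
# Without ORDER BY, which matching Y row is picked is up to the engine.
rows = conn.execute("""
    SELECT xt.id, xt.Value,
           (SELECT yt.id FROM Y AS yt WHERE yt.id = xt.id LIMIT 1) AS y_id,
           (SELECT yt.Value FROM Y AS yt WHERE yt.id = xt.id LIMIT 1) AS y_value
    FROM X AS xt
    ORDER BY xt.id
""").fetchall()

for row in rows:
    print(row)
```

Neither subquery has an ORDER BY, so the value returned for A may be 2 or 8. That is the arbitrariness the answer describes. If a specific row is needed, add an ORDER BY inside each subquery.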

Related

Optimising where clause with x in y or z in y

I'm just wondering if there is any way to optimise this query:
select * from table_x where buyer_id in (select id from table_y) x or
seller_id in (select id from table_y) y
The two subqueries in my WHERE clause are identical, and I suspect the engine runs them separately.
Thanks!
Your query is essentially:
select x.*
from table_x x
where x.buyer_id in (select y.id from table_y y) or
x.seller_id in (select y.id from table_y y);
This construct should be fine. In some databases, you might use exists instead of in, but I think Hive will be fine with this.
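To avoid running the subquery twice in Hive, use LEFT SEMI JOIN:
SELECT x.*
FROM table_x x
LEFT SEMI JOIN table_y b ON (x.buyer_id = b.id)
LEFT SEMI JOIN table_y c ON (x.seller_id = c.id)
Note that each LEFT SEMI JOIN filters the rows further. This version therefore keeps only rows where both buyer_id and seller_id appear in table_y (AND), whereas the original query uses OR.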
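The rows in this sketch are hypothetical. SQLite (via Python's sqlite3) stands in for Hive, so LEFT SEMI JOIN cannot run directly, and EXISTS is used to emulate it. The sketch compares three forms:

- the original OR of two IN subqueries,
- a single correlated EXISTS that checks both columns at once,
- two filters applied one after the other, which is how chained semi joins behave.

Whether an engine evaluates the IN subqueries once or twice depends on its optimizer. The sketch only shows which rows each form returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_x (id INTEGER, buyer_id INTEGER, seller_id INTEGER);
    CREATE TABLE table_y (id INTEGER);
    INSERT INTO table_x VALUES
        (1, 10, 20), (2, 30, 40), (3, 50, 10), (4, 60, 70), (5, 10, 40);
    INSERT INTO table_y VALUES (10), (40);
""")

def ids(sql):
    """Run a query and return the first column of every row."""
    return [r[0] for r in conn.execute(sql)]

# The original form: two IN subqueries combined with OR.
or_in = ids("""
    SELECT x.id FROM table_x x
    WHERE x.buyer_id IN (SELECT id FROM table_y)
       OR x.seller_id IN (SELECT id FROM table_y)
    ORDER BY x.id
""")

# One correlated EXISTS that checks both columns against table_y at once.
one_exists = ids("""
    SELECT x.id FROM table_x x
    WHERE EXISTS (SELECT 1 FROM table_y y
                  WHERE y.id IN (x.buyer_id, x.seller_id))
    ORDER BY x.id
""")

# Emulation of two chained semi joins: both filters must pass (AND).
both = ids("""
    SELECT x.id FROM table_x x
    WHERE EXISTS (SELECT 1 FROM table_y y WHERE y.id = x.buyer_id)
      AND EXISTS (SELECT 1 FROM table_y y WHERE y.id = x.seller_id)
    ORDER BY x.id
""")

print(or_in, one_exists, both)
```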

In sql, can you join to a select statement that references the outer tables in other joins?

What I want to do is to transform the following sql
SELECT X
FROM Y LEFT JOIN Z ON Y.Id=Z.id
WHERE Y.Fld='P'
into
SELECT Y
FROM Y LEFT JOIN (SELECT TOP 1 Id FROM Z WHERE Z.Id=Y.Id ORDER BY Z.PrimaryKey DESC) ON 1=1
WHERE Y.Fld='P'
The reason is that Z has multiple rows that can be joined to each Y row. They cannot be told apart, except that the one we need is the latest, and we only need that one record. Is this possible? I tried it, but SQL Server complained that I cannot reference Y.Id from within the subquery.
How about a CTE approach:
;WITH CTE
AS
(
SELECT Id,
PrimaryKey,
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY PrimaryKey DESC) AS RN
FROM Z
)
SELECT X
FROM Y
LEFT JOIN CTE
ON CTE.Id = Y.Id
AND CTE.RN = 1 -- filter here, not in WHERE, so Y rows without a Z match are kept
WHERE Y.Fld = 'P'
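This sketch shows why the placement of the RN = 1 test matters. It builds a tiny hypothetical Y/Z data set in SQLite, which needs version 3.25 or newer for window functions. It then runs the ROW_NUMBER CTE twice: once with RN = 1 in the ON clause and once in the WHERE clause. SQLite replaces SQL Server here, so the leading semicolon is dropped, and the SELECT X placeholder becomes concrete columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Y (Id INTEGER, Fld TEXT);
    CREATE TABLE Z (PrimaryKey INTEGER PRIMARY KEY, Id INTEGER);
    INSERT INTO Y VALUES (1, 'P'), (2, 'P'), (3, 'Q');
    INSERT INTO Z VALUES (100, 1), (101, 1), (102, 1), (103, 3);
""")

# Number the Z rows per Id, newest (highest PrimaryKey) first.
cte_sql = """
    WITH CTE AS (
        SELECT Id, PrimaryKey,
               ROW_NUMBER() OVER (PARTITION BY Id ORDER BY PrimaryKey DESC) AS RN
        FROM Z
    )
"""

# RN = 1 inside ON: Y rows without any Z match are kept, with NULLs.
in_on = conn.execute(cte_sql + """
    SELECT Y.Id, CTE.PrimaryKey
    FROM Y LEFT JOIN CTE ON CTE.Id = Y.Id AND CTE.RN = 1
    WHERE Y.Fld = 'P'
    ORDER BY Y.Id
""").fetchall()

# RN = 1 in WHERE: a NULL RN fails the test, so unmatched Y rows drop out.
in_where = conn.execute(cte_sql + """
    SELECT Y.Id, CTE.PrimaryKey
    FROM Y LEFT JOIN CTE ON CTE.Id = Y.Id
    WHERE Y.Fld = 'P' AND CTE.RN = 1
    ORDER BY Y.Id
""").fetchall()

print(in_on, in_where)
```

With the filter in WHERE, Y row 2 disappears because its RN is NULL, so the LEFT JOIN behaves like an INNER JOIN. Keeping the filter in ON preserves every Y row with Fld = 'P'.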

one to one distinct restriction on selection

I have run into the following problem. There are two tables (the x values are sorted in increasing order):
Table A
id x
1 1
1 3
1 4
1 7
Table B
id x
1 2
1 5
I want to join these two tables:
1) on equality of id, and
2) so that each row of A is matched to at most one row of B and vice versa (a one-to-one relationship), choosing pairs by the absolute difference of their x values (a pair with a smaller difference has higher priority).
The description above is ambiguous on its own. If two pairs of rows that share a row in one of the tables have the same difference, there is no way to decide which one goes first. So define A as the "main" table: the row in table A with the smaller line number always goes first.
Expected result for the demo data:
id A.x B.x abs_diff
1 1 2 1
1 4 5 1
(The two remaining rows in A are not matched, because of the one-to-one rule.)
I am using PostgreSQL, so I tried DISTINCT ON, but it does not solve the problem:
select distinct on (A.x) A.id, A.x, B.x, greatest(A.x,B.x) - least(A.x,B.x) as abs_diff
from
(A join B
on A.id = B.id)
order by A.x, greatest(A.x,B.x) - least(A.x,B.x)
Do you have any ideas? It seems tricky in plain SQL.
Try:
select a.id, a.x as ax, b.x as bx, x.min_abs_diff
from table_a a
join table_b b
on a.id = b.id
join (select a.id, min(abs(a.x - b.x)) as min_abs_diff
from table_a a
join table_b b
on a.id = b.id
group by a.id) x
on x.id = a.id
and abs(a.x - b.x) = x.min_abs_diff
fiddle: http://sqlfiddle.com/#!15/ab5ae/5/0
Although it doesn't match your expected output, I think the output is correct based on what you described, as you can see each pair has a difference with an absolute value of 1.
Edit: try the following, which keeps, for each b.x, only the matching row of a with the smallest x:
select *
from (select a.id,
a.x as ax,
b.x as bx,
x.min_abs_diff,
row_number() over(partition by a.id, b.x order by a.id, a.x) as rn
from table_a a
join table_b b
on a.id = b.id
join (select a.id, min(abs(a.x - b.x)) as min_abs_diff
from table_a a
join table_b b
on a.id = b.id
group by a.id) x
on x.id = a.id
and abs(a.x - b.x) = x.min_abs_diff) x
where x.rn = 1
Fiddle: http://sqlfiddle.com/#!15/ab5ae/19/0
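Here is a minimal sketch that runs both versions of this answer on the question's demo data. It uses SQLite (3.25 or newer, for ROW_NUMBER) via Python's sqlite3 instead of PostgreSQL. The inner derived table is aliased m rather than x and the outer one ranked. ORDER BY ax is added so the output order is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, x INTEGER);
    CREATE TABLE table_b (id INTEGER, x INTEGER);
    INSERT INTO table_a VALUES (1, 1), (1, 3), (1, 4), (1, 7);
    INSERT INTO table_b VALUES (1, 2), (1, 5);
""")

# Per-id minimum absolute difference, shared by both queries below.
min_diff = """
    SELECT a.id, MIN(ABS(a.x - b.x)) AS min_abs_diff
    FROM table_a a JOIN table_b b ON a.id = b.id
    GROUP BY a.id
"""

# First answer: every pair whose difference equals that minimum.
all_pairs = conn.execute(f"""
    SELECT a.id, a.x AS ax, b.x AS bx, m.min_abs_diff
    FROM table_a a
    JOIN table_b b ON a.id = b.id
    JOIN ({min_diff}) m ON m.id = a.id AND ABS(a.x - b.x) = m.min_abs_diff
    ORDER BY ax
""").fetchall()

# Edited answer: number those pairs per (id, b.x) by a.x and keep the first.
one_per_b = conn.execute(f"""
    SELECT id, ax, bx, min_abs_diff
    FROM (SELECT a.id, a.x AS ax, b.x AS bx, m.min_abs_diff,
                 ROW_NUMBER() OVER (PARTITION BY a.id, b.x ORDER BY a.id, a.x) AS rn
          FROM table_a a
          JOIN table_b b ON a.id = b.id
          JOIN ({min_diff}) m ON m.id = a.id AND ABS(a.x - b.x) = m.min_abs_diff) ranked
    WHERE rn = 1
    ORDER BY ax
""").fetchall()

print(all_pairs)
print(one_per_b)
```

The first query returns all three pairs with a difference of 1. The edited one keeps only the first pair for each b.x, which on this data gives the expected two rows.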
One possible solution for your currently ambiguous question:
SELECT *
FROM (
SELECT id, x AS a, lead(x) OVER (PARTITION BY id, grp ORDER BY x) AS b
FROM (
SELECT *, count(tbl) OVER (PARTITION BY id ORDER BY x) AS grp
FROM (
SELECT TRUE AS tbl, * FROM table_a
UNION ALL
SELECT NULL, * FROM table_b
) x
) y
) z
WHERE b IS NOT NULL
ORDER BY 1,2,3;
This way, every a.x is assigned the next bigger (or same) b.x, unless there is another a.x that is still smaller than the next b.x (or the same).
Produces the requested result for the demo case. Not sure about various ambiguous cases.
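The same running-count idea is sketched below in SQLite via Python's sqlite3, on the demo data. Two adjustments are my own assumptions:

- The marker column uses 1 instead of TRUE.
- LEAD is partitioned by id as well as grp, because grp numbering restarts for every id and would otherwise mix rows from different ids.

The subquery aliases are also renamed to u, g and p.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, x INTEGER);
    CREATE TABLE table_b (id INTEGER, x INTEGER);
    INSERT INTO table_a VALUES (1, 1), (1, 3), (1, 4), (1, 7);
    INSERT INTO table_b VALUES (1, 2), (1, 5);
""")

rows = conn.execute("""
    SELECT *
    FROM (
        -- Pair each row with the next row (by x) inside the same id and group.
        SELECT id, x AS a, LEAD(x) OVER (PARTITION BY id, grp ORDER BY x) AS b
        FROM (
            -- Running count of table_a rows: every a-row starts a new group.
            SELECT *, COUNT(tbl) OVER (PARTITION BY id ORDER BY x) AS grp
            FROM (
                SELECT 1 AS tbl, * FROM table_a   -- non-NULL marker: counted
                UNION ALL
                SELECT NULL, * FROM table_b       -- NULL marker: not counted
            ) u
        ) g
    ) p
    WHERE b IS NOT NULL
    ORDER BY 1, 2, 3
""").fetchall()

print(rows)
```

On the demo data each a.x that starts a group is followed by the b.x in that group. Groups that contain no b row produce no pair, so the result is (1, 1, 2) and (1, 4, 5).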

Help required with a complex self join sql query

myTable has a composite key formed by columns A and B (the columns are A, B, C, D and E).
I want to exclude records where the value of D (say, an order number) is the same but E (say, a decision) is Y in one record and N or NULL in the other. In other words, all twin records with the same order number (equal D) that were first ordered (E = Y) and then cancelled (E = N) should be ignored.
So I pulled out A, B for all records where D is the same but E is Y in one and N in the other:
SELECT *
FROM myTable A, myTable B
WHERE
(A.D=B.D)
AND
((A.E ='Y' AND (B.E ='N' OR B.E IS NULL)) OR (B.E='Y' AND (A.E='N' OR A.E IS NULL)))
Now my final output should be all records from myTable except the records found above.
I wrote a join query, but it's not working as it should. Basically, the issue is how to compare two composite keys.
Sample Data:
A B C D E
=========================
1 A xyz ONE Y
2 B pqr TWO Y
3 C lmn ONE N
4 D abc THREE Y
5 E ijk FOUR Y
=========================
So my output should be records 2, 4 and 5. Records 1 and 3 are ignored because 1.D = 3.D, 1.E is Y and 3.E is N.
Thanks,
Nik
I want to exclude records where value of D is "XYZ".
Why not simply query like this directly?
select *
from myTable
where D <> 'XYZ'
To exclude rows from Temp, you could:
select *
from myTable
where not exists
(
select *
from temp
where myTable.A = temp.A
and myTable.B = temp.B
)
Or with an exclusive left join:
select *
from myTable
left join
temp
on myTable.A = temp.A
and myTable.B = temp.B
where temp.A is null
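Here is a minimal sketch of the NOT EXISTS variant on the question's sample data, run in SQLite via Python's sqlite3. Instead of a separate Temp table it uses a CTE, here called pairs (a hypothetical name), built from the self-join in the question. The composite key is compared by matching both A and B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (A INTEGER, B TEXT, C TEXT, D TEXT, E TEXT,
                          PRIMARY KEY (A, B));
    INSERT INTO myTable VALUES
        (1, 'A', 'xyz', 'ONE', 'Y'),
        (2, 'B', 'pqr', 'TWO', 'Y'),
        (3, 'C', 'lmn', 'ONE', 'N'),
        (4, 'D', 'abc', 'THREE', 'Y'),
        (5, 'E', 'ijk', 'FOUR', 'Y');
""")

rows = conn.execute("""
    WITH pairs AS (
        -- Keys (A, B) of rows that have a twin with the same D and opposite E.
        SELECT t1.A, t1.B
        FROM myTable t1
        JOIN myTable t2 ON t1.D = t2.D
        WHERE (t1.E = 'Y' AND (t2.E = 'N' OR t2.E IS NULL))
           OR (t2.E = 'Y' AND (t1.E = 'N' OR t1.E IS NULL))
    )
    SELECT m.A
    FROM myTable m
    WHERE NOT EXISTS (
        SELECT 1 FROM pairs p
        WHERE p.A = m.A AND p.B = m.B   -- compare both parts of the composite key
    )
    ORDER BY m.A
""").fetchall()

kept = [r[0] for r in rows]
print(kept)
```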
If I've correctly understood you, what you need is this:
select x.*
from mytable x left outer join
( select mt1.a, mt1.b
from mytable mt1 inner join
mytable mt2 on mt1.d = mt2.d
where ((mt1.E ='Y' AND (mt2.E ='N' OR mt2.E IS NULL)) OR (mt2.E='Y' AND (mt1.E='N' OR mt1.E IS NULL)))
) y on x.a = y.a and x.b = y.b
where y.a is NULL
You need something like
select A.*
from myTable A
WHERE (SELECT COUNT(*) FROM myTable B WHERE B.D = A.D AND (B.E IS NULL OR B.E = 'N')) = 0

Multiple NOT distinct

I have an MS Access database and need an SQL query that selects all the non-distinct entries in one column while still keeping all the rows.
In this case, more than ever, an example is worth a thousand words:
Table:
A B C
1 x q
2 y w
3 y e
4 z r
5 z t
6 z y
SQL magic
Result:
B C
y w
y e
z r
z t
z y
Basically, it removes the rows whose value in column B is unique but keeps all rows whose B value is repeated. I can GROUP BY B with COUNT > 1 to get the non-distinct values, but the result then lists only one row per B value, not the two or more rows that I need.
Any help?
Thanks.
Select B, C
From [Table]
Where B In
(Select B From [Table]
Group By B
Having Count(*) > 1)
Another way of returning the results you want would be this:
select *
from
my_table
where
B in
(select B from my_table group by B having count(*) > 1)
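Here is a quick sketch of the IN-subquery answers on the question's example table, using SQLite via Python's sqlite3 as a stand-in for Access. ORDER BY A is added only to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (A INTEGER, B TEXT, C TEXT);
    INSERT INTO my_table VALUES
        (1, 'x', 'q'), (2, 'y', 'w'), (3, 'y', 'e'),
        (4, 'z', 'r'), (5, 'z', 't'), (6, 'z', 'y');
""")

# The subquery lists the B values that occur more than once ('y' and 'z');
# the outer query returns every row whose B is in that list.
rows = conn.execute("""
    SELECT B, C
    FROM my_table
    WHERE B IN (SELECT B FROM my_table GROUP BY B HAVING COUNT(*) > 1)
    ORDER BY A
""").fetchall()

print(rows)
```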
select distinct
t1.B, t1.C
from
my_table t1,
my_table t2
where
t1.B = t2.B
and
t1.C <> t2.C
Access needs <> rather than != (thanks, Dave!). DISTINCT stops a row from being listed once for every other row it pairs with.
Something like that?
Join the B values you found with GROUP BY B and COUNT > 1 back to the original table to retrieve the C values.
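The join-back approach is sketched below in SQLite via Python's sqlite3, on the same example table. The alias dup for the derived table of repeated B values is a hypothetical name, and Access may need minor syntax changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (A INTEGER, B TEXT, C TEXT);
    INSERT INTO my_table VALUES
        (1, 'x', 'q'), (2, 'y', 'w'), (3, 'y', 'e'),
        (4, 'z', 'r'), (5, 'z', 't'), (6, 'z', 'y');
""")

# dup holds one row per repeated B value; joining it back to my_table
# returns every original row (with its C value) for those B values.
rows = conn.execute("""
    SELECT t.B, t.C
    FROM my_table AS t
    INNER JOIN (SELECT B FROM my_table GROUP BY B HAVING COUNT(*) > 1) AS dup
        ON t.B = dup.B
    ORDER BY t.A
""").fetchall()

print(rows)
```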