SQL SELECT multiple SUM() error - sql

I am having a problem involving multiple SUM() functions in a SQL SELECT statement using JOINs.
Whenever I sum two values in the same query, the value inside the other SUM() function doubles. How do I prevent this?
Example: SQL Fiddle (all X and Y values should be 2).
I am using SQLite.

You can use UNION ALL for this:
SELECT id, SUM(bamount) AS BAmount, SUM(camount) AS CAmount
FROM
(
SELECT a.id, SUM(b.amount) AS bamount, 0 AS camount
FROM a
LEFT JOIN b ON a.id = b.a_id
GROUP BY a.id
UNION ALL
SELECT a.id, 0, SUM(c.amount) AS camount
FROM a
LEFT JOIN c ON a.id = c.a_id
GROUP BY a.id
) AS t
GROUP BY id;
This will give you:
| id | BAmount | CAmount |
|----|---------|---------|
| 1 | 2 | 2 |
| 2 | 2 | 2 |
| 3 | 2 | 2 |

You can try performing the aggregations in separate subqueries. This is one way to get around the problem of double (or triple, etc.) counting rows as the result of a join.
SELECT
a.id,
t1.b_sum AS x,
t2.c_sum AS y
FROM a
LEFT JOIN
(
SELECT a_id, SUM(amount) AS b_sum
FROM b
GROUP BY a_id
) t1
ON a.id = t1.a_id
LEFT JOIN
(
SELECT a_id, SUM(amount) AS c_sum
FROM c
GROUP BY a_id
) t2
ON a.id = t2.a_id;
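To see the double counting and the fix side by side, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are an assumption (two rows of amount 1 per id in both b and c), chosen so that every correct sum is 2 as the question expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (a_id INTEGER, amount INTEGER);
    CREATE TABLE c (a_id INTEGER, amount INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    -- Hypothetical data: two rows of amount 1 per id in both b and c.
    INSERT INTO b VALUES (1, 1), (1, 1), (2, 1), (2, 1), (3, 1), (3, 1);
    INSERT INTO c VALUES (1, 1), (1, 1), (2, 1), (2, 1), (3, 1), (3, 1);
""")

# Joining both child tables first pairs every b row with every c row,
# so each amount is summed once per matching row of the other table.
naive = conn.execute("""
    SELECT a.id, SUM(b.amount) AS x, SUM(c.amount) AS y
    FROM a
    LEFT JOIN b ON a.id = b.a_id
    LEFT JOIN c ON a.id = c.a_id
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

# Aggregating each child table on its own leaves at most one row per a_id
# before the join, so nothing is multiplied.
fixed = conn.execute("""
    SELECT a.id, t1.b_sum AS x, t2.c_sum AS y
    FROM a
    LEFT JOIN (SELECT a_id, SUM(amount) AS b_sum FROM b GROUP BY a_id) t1
           ON a.id = t1.a_id
    LEFT JOIN (SELECT a_id, SUM(amount) AS c_sum FROM c GROUP BY a_id) t2
           ON a.id = t2.a_id
    ORDER BY a.id
""").fetchall()

print(naive)  # [(1, 4, 4), (2, 4, 4), (3, 4, 4)]
print(fixed)  # [(1, 2, 2), (2, 2, 2), (3, 2, 2)]
```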

Related

How exactly do aliases work in Oracle databases?

I've been dabbling around in sql code and recently was reading up on aliases. I am kind of confused why the following statement does not work:
select id, data from table1 a
inner join
(
select id, data from table2 b,
(
select id, data from table3 e
where b.id = e.id
) c
where b.id = a.id
)d on a.id = d.id
What I want is something like this to work:
select id, data from table1 a
inner join
(
select id, data from
(
select id, data from table3 e
where a.id = e.id
) c
)d on a.id = d.id
Currently my solution doesn't have the WHERE clause at the end, meaning the whole table gets fetched.
...
where a.id = e.id
...
My point here is to use an ID present in table A when querying table E. I'm open to suggestions for changing the structure, but unfortunately I think the structure will have to stay the same, since the actual query is much more complex. This is just an excerpt from the full query.
EDIT:
I'll try to elaborate as to why I have the current structure.
I have table 1 which contains ID's and text and other columns.
| id | data |
| -------- | ------ |
| table1_1 | text |
| table1_2 | text |
...
The second table contains multiple entries for an ID of table1.
| id | refid | data |
| -------- | -------- | ------ |
| table2_1 | table1_1 | proj1 |
| table2_1 | table1_1 | proj2 |
| table2_2 | table1_1 | proj1 |
| table2_3 | table1_2 | proj5 |
| table2_3 | table1_2 | proj1 |
What I now do is join the entries from table2 to a list of entries with:
LISTAGG(table2.refid, ',') WITHIN GROUP( ORDER BY table2.refid) list_of_projects,
To use this, I need to use GROUP BY.
My problem was that I couldn't use table1.ID in table2.refid.
For a better understanding of how sub-queries work, just imagine that the database processes them separately from each other.
It means the sub-query
select id, data
from table3 e
where b.id = e.id
will be executed first. There is no alias b in this context -> an error
The next sub-query has the same problem
select id, data from table2 b,
(
select id, data from table3 e
where b.id = e.id
) c
where b.id = a.id
There is no data source called "a" -> another error
And to be honest, using sub-queries in this case is a bad idea. A join is what you need here.
I believe something like this will help you out.
select a.id, a.data
from table1 a
inner join table3 e
on a.id = e.id;
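The scoping rule is easy to reproduce outside Oracle. The sketch below uses Python's built-in sqlite3 module as a stand-in (an assumption: SQLite, like Oracle without LATERAL, resolves a derived table in FROM without seeing its sibling aliases). The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, data TEXT);
    CREATE TABLE table3 (id INTEGER, data TEXT);
    -- Hypothetical rows, only for illustration.
    INSERT INTO table1 VALUES (1, 'one'), (2, 'two');
    INSERT INTO table3 VALUES (1, 'A');
""")

# The derived table d is resolved on its own, so the alias "a" from the
# enclosing FROM clause is not visible inside it.
try:
    conn.execute("""
        SELECT a.id, d.data
        FROM table1 a
        INNER JOIN (SELECT e.id, e.data FROM table3 e WHERE a.id = e.id) d
                ON a.id = d.id
    """).fetchall()
    derived_table_error = None
except sqlite3.OperationalError as exc:
    derived_table_error = str(exc)

# Moving the condition into the join's ON clause needs no outer reference.
joined = conn.execute("""
    SELECT a.id, a.data
    FROM table1 a
    INNER JOIN table3 e ON a.id = e.id
""").fetchall()

print(derived_table_error)
print(joined)
```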
From Oracle 12c, you can use CROSS APPLY or a LATERAL join to pass the outer scope into the inner sub-query:
SELECT id, data
FROM table1 a
CROSS APPLY
(
SELECT data
FROM table2 b
CROSS APPLY (
SELECT data
FROM table3 e
WHERE a.id = e.id
) c
WHERE b.id = a.id
) d
or:
SELECT a.id, data
FROM table1 a
INNER JOIN LATERAL
(
SELECT b.id,
c.data
FROM table2 b
INNER JOIN LATERAL (
SELECT e.id,
e.data
FROM table3 e
) c
ON ( a.id = c.id )
) d
ON ( d.id = a.id )
Which, for the sample data:
CREATE TABLE table1 ( id ) AS
SELECT 1 FROM DUAL;
CREATE TABLE table2 ( id ) AS
SELECT 1 FROM DUAL;
CREATE TABLE table3 ( id, data ) AS
SELECT 1, 'A' FROM DUAL;
Both output:
ID | DATA
-: | :---
1 | A

T-SQL - Get a list of all As which have the same set of Bs

I'm struggling with a tricky SQL query that I'm trying to write. Have a look at the following table:
+---+---+
| A | B |
+---+---+
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
| 3 | 2 |
| 3 | 3 |
| 4 | 2 |
| 4 | 3 |
| 4 | 4 |
+---+---+
Now, from this table, I essentially want a list of all As which have the exact same set of Bs and give each set an incrementing ID.
Hence, the output set for the above would be:
+---+----+
| A | ID |
+---+----+
| 1 | 1 |
| 3 | 1 |
| 2 | 2 |
| 4 | 2 |
+---+----+
Thanks.
Edit: If it helps, I have a list of all distinct values of B that are possible in another table.
Edit: Thank you so much for all the innovative answers. Was able to learn a lot indeed.
Here is a mathematical trick to solve your tricky select:
with pow as(select *, b * power(10, row_number()
over(partition by a order by b)) as rn from t)
select a, dense_rank() over( order by sum(rn)) as rn
from pow
group by a
order by rn, a
Fiddle http://sqlfiddle.com/#!3/6b98d/11
Of course, this only works for a limited number of distinct B values per A, because the powers of 10 eventually overflow. Here is a more general solution with strings:
select a,
dense_rank() over(order by (select '.' + cast(b as varchar(max))
from t t2 where t1.a = t2.a
order by b
for xml path(''))) rn
from t t1
group by a
order by rn, a
Fiddle http://sqlfiddle.com/#!3/6b98d/29
Something like this:
select a, dense_rank() over (order by g) as id_b
from (
select a,
(select b from MyTable s where s.a=a.a order by b FOR XML PATH('')) g
from MyTable a
group by a
) a
order by id_b,a
Or maybe using a CTE (I avoid them when possible)
As a side note, this is the output of the inner query using the sample data in the question:
a g
1 <b>2</b><b>3</b>
2 <b>2</b><b>3</b><b>4</b>
3 <b>2</b><b>3</b>
4 <b>2</b><b>3</b><b>4</b>
Here's a long-winded approach: find sets with the same elements (using EXCEPT in both directions over a half-diagonal Cartesian product), pair the equal sets up, stamp each pair with a ROW_NUMBER(), and then unpivot the pairs of As into your final output, where equivalent sets are projected as rows that have the same Id.
WITH joinedSets AS
(
SELECT t1.A as t1A, t2.A AS t2A
FROM MyTable t1
INNER JOIN MyTable t2
ON t1.B = t2.B
AND t1.A < t2.A
),
equalSets AS
(
SELECT js.t1A, js.t2A, ROW_NUMBER() OVER (ORDER BY js.t1A) AS Id
FROM joinedSets js
GROUP BY js.t1A, js.t2A
HAVING NOT EXISTS ((SELECT mt.B FROM MyTable mt WHERE mt.A = js.t1A)
EXCEPT (SELECT mt.B FROM MyTable mt WHERE mt.A = js.t2A))
AND NOT EXISTS ((SELECT mt.B FROM MyTable mt WHERE mt.A = js.t2A)
EXCEPT (SELECT mt.B FROM MyTable mt WHERE mt.A = js.t1A))
)
SELECT A, Id
FROM equalSets
UNPIVOT
(
A
FOR ACol in (t1A, t2A)
) unp;
As it stands, this solution will only work with pairs of sets, not triples etc. A general NTuple type solution is probably possible (but beyond my brain right now).
Here is a very simple, fast, but approximate solution.
It is possible that CHECKSUM_AGG returns the same checksum for different sets of B.
DECLARE @T TABLE (A int, B int);
INSERT INTO @T VALUES
(1, 2),(1, 3),(2, 2),(2, 3),(2, 4),(3, 2),(3, 3),(4, 2),(4, 3),(4, 4);
SELECT
A
,CHECKSUM_AGG(B) AS CheckSumB
,ROW_NUMBER() OVER (PARTITION BY CHECKSUM_AGG(B) ORDER BY A) AS GroupNumber
FROM @T
GROUP BY A
ORDER BY A, GroupNumber;
Result set
A CheckSumB GroupNumber
-----------------------------
1 1 1
2 5 1
3 1 2
4 5 2
For an exact solution, group by A and concatenate all B values into one long (binary) string using FOR XML, CLR, or a T-SQL function. Then you can partition ROW_NUMBER by that concatenated string to assign numbers to the groups, as shown in other answers.
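The exact approach (build one key per A from its sorted B values, then rank the keys) can be sketched outside the database as well. This is plain Python rather than T-SQL, and the function name group_ids is hypothetical; the rows are the sample data from the question.

```python
from collections import defaultdict

# Sample (A, B) rows from the question.
rows = [(1, 2), (1, 3), (2, 2), (2, 3), (2, 4),
        (3, 2), (3, 3), (4, 2), (4, 3), (4, 4)]


def group_ids(rows):
    """Return (A, ID) pairs where As with the same set of Bs share an ID."""
    b_sets = defaultdict(set)
    for a, b in rows:
        b_sets[a].add(b)
    # The sorted tuple plays the role of the concatenated string key.
    keys = {a: tuple(sorted(bs)) for a, bs in b_sets.items()}
    # Dense rank: each distinct key gets the next ID in sorted key order.
    rank = {key: i for i, key in enumerate(sorted(set(keys.values())), start=1)}
    ordered = sorted((rank[key], a) for a, key in keys.items())
    return [(a, group_id) for group_id, a in ordered]


print(group_ids(rows))  # [(1, 1), (3, 1), (2, 2), (4, 2)]
```

Unlike a checksum, a key built from the full sorted list of values cannot collide for different sets.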
EDIT
I am changing the code, although it gets bigger now. For concatenating the strings I took help from
"Concatenate many rows into a single text string?"
Select [A],
Left(M.[C],Len(M.[C])-1) As [D] into #tempSomeTable
From
(
Select distinct T2.[A],
(
Select Cast(T1.[B] as VARCHAR) + ',' AS [text()]
From sometable T1
Where T1.[A] = T2.[A]
ORDER BY T1.[B]
For XML PATH ('')
) [C]
From sometable T2
)M
SELECT t.A, DENSE_RANK() OVER(ORDER BY t.[D]) [ID] FROM
#tempSomeTable t
inner join
(SELECT [D] FROM(
SELECT [D], COUNT([A]) [D_A] from
#tempSomeTable t
GROUP BY [D] )P where [D_A]>1)t1 on t1.[D]=t.[D]
Here is an exact, rather than approximate, solution. It uses nothing more advanced than INNER JOIN and GROUP BY (and, of course, the DENSE_RANK() to get the ID you want).
It is also general, in that it allows for B values to be repeated within an A group.
SELECT A,
DENSE_RANK() OVER (ORDER BY MIN_EQUIVALENT_A) AS ID
FROM (
SELECT MATCHES.A1 AS A,
MIN(MATCHES.A2) AS MIN_EQUIVALENT_A
FROM (
SELECT T1.A AS A1,
T2.A AS A2,
COUNT(*) AS NUM_B_VALS_MATCHED
FROM (
SELECT A,
B,
COUNT(*) AS B_VAL_FREQ
FROM MyTable
GROUP BY A,
B
) AS T1
INNER JOIN
(
SELECT A,
B,
COUNT(*) AS B_VAL_FREQ
FROM MyTable
GROUP BY A,
B
) AS T2
ON T1.B = T2.B
AND T1.B_VAL_FREQ = T2.B_VAL_FREQ
GROUP BY T1.A,
T2.A
) AS MATCHES
INNER JOIN
(
SELECT A,
COUNT(DISTINCT B) AS NUM_B_VALS_TOTAL
FROM MyTable
GROUP BY A
) AS CHECK_TOTALS_A1
ON MATCHES.A1 = CHECK_TOTALS_A1.A
AND MATCHES.NUM_B_VALS_MATCHED
= CHECK_TOTALS_A1.NUM_B_VALS_TOTAL
INNER JOIN
(
SELECT A,
COUNT(DISTINCT B) AS NUM_B_VALS_TOTAL
FROM MyTable
GROUP BY A
) AS CHECK_TOTALS_A2
ON MATCHES.A2 = CHECK_TOTALS_A2.A
AND MATCHES.NUM_B_VALS_MATCHED
= CHECK_TOTALS_A2.NUM_B_VALS_TOTAL
GROUP BY MATCHES.A1
) AS EQUIVALENCE_TABLE
ORDER BY 2,1
;

Get count of foreign key from multiple tables

I have 3 tables, with Table B & C referencing Table A via Foreign Key. I want to write a query in PostgreSQL to get all ids from A and also their total occurrences from B & C.
a | b | c
-----------------------------------
id | txt | id | a_id | id | a_id
---+---- | ---+----- | ---+------
1 | a | 1 | 1 | 1 | 3
2 | b | 2 | 1 | 2 | 4
3 | c | 3 | 3 | 3 | 4
4 | d | 4 | 4 | 4 | 4
Output desired (just the id from A & total count in B & C) :
id | Count
---+-------
1 | 2 -- twice in B
2 | 0 -- occurs nowhere
3 | 2 -- once in B & once in C
4 | 4 -- once in B & thrice in C
SQL so far (SQL Fiddle):
SELECT a_id, COUNT(a_id)
FROM
( SELECT a_id FROM b
UNION ALL
SELECT a_id FROM c
) AS union_table
GROUP BY a_id
The query I wrote fetches from B & C and counts the occurrences. But if the key doesn't occur in B or C, it doesn't show up in the output (e.g. id=2 in the desired output). How can I start my selection from table A and join/union B & C to get the desired output?
If the query involves large parts of b and / or c it is more efficient to aggregate first and join later.
I expect these two variants to be considerably faster:
SELECT a.id
, COALESCE(b.ct, 0) + COALESCE(c.ct, 0) AS bc_ct
FROM a
LEFT JOIN (SELECT a_id, count(*) AS ct FROM b GROUP BY 1) b ON b.a_id = a.id
LEFT JOIN (SELECT a_id, count(*) AS ct FROM c GROUP BY 1) c ON c.a_id = a.id;
You need to account for the possibility that some a_id are not present at all in b and / or c. count() never returns NULL, but that's cold comfort in the face of LEFT JOIN, which leaves you with NULL values for missing rows nonetheless. You must prepare for NULL. Use COALESCE().
Or UNION ALL a_id from both tables, aggregate, then JOIN:
SELECT a.id
, COALESCE(ct.bc_ct, 0) AS bc_ct
FROM a
LEFT JOIN (
SELECT a_id, count(*) AS bc_ct
FROM (
SELECT a_id FROM b
UNION ALL
SELECT a_id FROM c
) bc
GROUP BY 1
) ct ON ct.a_id = a.id;
Probably slower, but still faster than the solutions presented so far. You could do without COALESCE() and still not lose any rows; in that case you would get NULL values for bc_ct wherever an id has no matches.
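The effect of COALESCE() after the LEFT JOINs can be checked against the question's sample data. This is a minimal sketch that uses Python's built-in sqlite3 module instead of PostgreSQL (an assumption made so the example runs anywhere); the aliases b_agg and c_agg are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, txt TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER);
    CREATE TABLE c (id INTEGER PRIMARY KEY, a_id INTEGER);
    -- Sample data from the question.
    INSERT INTO a VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO b VALUES (1, 1), (2, 1), (3, 3), (4, 4);
    INSERT INTO c VALUES (1, 3), (2, 4), (3, 4), (4, 4);
""")

# Aggregate first, join later; {b_ct} and {c_ct} are filled in below.
template = """
    SELECT a.id, {b_ct} + {c_ct} AS bc_ct
    FROM a
    LEFT JOIN (SELECT a_id, count(*) AS ct FROM b GROUP BY a_id) b_agg
           ON b_agg.a_id = a.id
    LEFT JOIN (SELECT a_id, count(*) AS ct FROM c GROUP BY a_id) c_agg
           ON c_agg.a_id = a.id
    ORDER BY a.id
"""

# With COALESCE, a missing aggregate row counts as 0.
with_coalesce = conn.execute(
    template.format(b_ct="COALESCE(b_agg.ct, 0)", c_ct="COALESCE(c_agg.ct, 0)")
).fetchall()

# Without it, NULL + anything is NULL, so ids missing from b or c get NULL.
without_coalesce = conn.execute(
    template.format(b_ct="b_agg.ct", c_ct="c_agg.ct")
).fetchall()

print(with_coalesce)     # [(1, 2), (2, 0), (3, 2), (4, 4)]
print(without_coalesce)  # [(1, None), (2, None), (3, 2), (4, 4)]
```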
Another option:
SELECT
a.id,
(SELECT COUNT(*) FROM b WHERE b.a_id = a.id) +
(SELECT COUNT(*) FROM c WHERE c.a_id = a.id)
FROM
a
Use left join with a subquery:
SELECT a.id, COUNT(x.id)
FROM a
LEFT JOIN (
SELECT id, a_id FROM b
UNION ALL
SELECT id, a_id FROM c
) x ON (a.id = x.a_id)
GROUP BY a.id;

Count/sum rows in multiple related tables

I have a complex select that - when simplified - looks like this:
select m.ID,
(select sum(AMOUNT) from A where M_ID = m.ID) sumA,
(select sum(AMOUNT) from B where M_ID = m.ID) sumB,
.....
from M;
The tables A,B,... have a foreign key M_ID pointing into table M.
The problem is that this select is very slow. I'd like to rewrite it using table joins, but I don't know how, because
select m.ID,
sum(a.AMOUNT),
sum(b.AMOUNT),
.....
from M
join A on a.M_ID = m.ID
join B on b.M_ID = m.ID
....
group by m.ID;
gives incorrect (much higher) sum results, as each row in A or B can be counted multiple times.
Is there a way how to write that select optimally using e.g. analytical functions or some other ways?
Edit:
The explain plan for the original (not simplified) select looks like this:
| 0 | SELECT STATEMENT | |
| 1 | SORT AGGREGATE | |
|* 2 | FILTER | |
|* 3 | TABLE ACCESS BY INDEX ROWID| WORKITEM |
|* 4 | INDEX SKIP SCAN | WORKITEM_U01 |
|* 5 | FILTER | |
|* 6 | TABLE ACCESS FULL | RPRODUCT_INVENTORY_MASTER |
.....
| 31 | SORT AGGREGATE | |
|* 32 | FILTER | |
|* 33 | TABLE ACCESS BY INDEX ROWID| WORKITEM |
|* 34 | INDEX SKIP SCAN | WORKITEM_U01 |
|* 35 | FILTER | |
|* 36 | TABLE ACCESS FULL | RPRODUCT_INVENTORY_MASTER |
| 37 | SORT GROUP BY | |
| 38 | TABLE ACCESS FULL | RPRODUCT |
That's why I want to optimize it. Moreover, the AWR report shows that this select has 50000 gets/exec.
Edit2,3:
The whole select looks like this:
SELECT rprd.ID,
rprd.NAME,
(select sum(AMOUNT) from WORKITEM
where ACTION='REMOVE'
and trunc(CREATED_DATE) = to_date(:1,'DDMMYYYY')
and PAYEE_ID in
(select rim.RPRODUCT_ID from RPRODUCT_INVENTORY_MASTER rim
where rprd.ID = rim.RPRODUCT_ID
and rim.INVENTORY_DATE = to_date(:2,'DDMMYYYY')),
.....
(select sum(AMOUNT) from WORKITEM
where ACTION='COLLECT'
and trunc(CREATED_DATE) < to_date(:11,'DDMMYYYY')
and PAYEE_ID in
(select rim.RPRODUCT_ID from RPRODUCT_INVENTORY_MASTER rim
where rprd.ID = rim.RPRODUCT_ID
and rim.INVENTORY_DATE < to_date(:12,'DDMMYYYY'))
FROM RPRODUCT rprd
GROUP BY rprd.ID, rprd.NAME
ORDER BY rprd.ID
;
I didn't write it :-), I'm about to re-write it. Note, there are differences in comparison operators, in ACTION values, in dates to compare INVENTORY_DATE to.
Edit4:
I tried to rewrite the query like this (and the exec plan looks better), but have run into the "row multiplicity" issues described above:
with RPRODUCT_INVENTORY_MASTER# as (
select RPRODUCT_ID, min(INVENTORY_DATE) INVENTORY_DATE
from RPRODUCT_INVENTORY_MASTER
group by RPRODUCT_ID),
WORKITEM# as (
select AMOUNT, PAYEE_ID, ACTION, trunc(CREATED_DATE) CREATED_DATE
from WORKITEM
where ACTION in ('REMOVE','ADD','COLLECT')
)
select rprd.ID,
rprd.NAME,
-- sum(wip2.AMOUNT), -- this is singular because of '=' in inventory_date comparison
sum(abs(wip4.AMOUNT)),
.....
sum(wip12.AMOUNT)
from RPRODUCT rprd
left join RPRODUCT_INVENTORY_MASTER# rim4 on rim4.RPRODUCT_ID = rprd.ID
and rim4.INVENTORY_DATE <= to_date(:4 ,'DDMMYYYY')
left join WORKITEM# wip4 on wip4.PAYEE_ID = rim4.RPRODUCT_ID
and wip4.ACTION='REMOVE'
and wip4.CREATED_DATE = to_date(:3 ,'DDMMYYYY')
.....
left join RPRODUCT_INVENTORY_MASTER# rim12 on rim12.RPRODUCT_ID = rprd.ID
and rim12.INVENTORY_DATE < to_date(:12 ,'DDMMYYYY')
left join WORKITEM# wip12 on wip12.PAYEE_ID = rim12.RPRODUCT_ID
and wip12.ACTION='COLLECT'
and wip12.CREATED_DATE < to_date(:11 ,'DDMMYYYY')
group by rprd.ID, rprd.NAME
order by rprd.ID
;
RPRODUCT_INVENTORY_MASTER# always gives at most one row for each rprd.ID. WORKITEM# can have any number of rows for each RPRODUCT_ID = rprd.ID.
Yes, this is a typical problem. I like your original query for its clarity. However, if you are running into performance issues, you have to think of other options.
Here is one option. As A and B get multiplied you could simply divide the sum by the related count. Well, admittedly this looks kind of strange though.
select m.ID,
sum(a.AMOUNT) / count(distinct b.id),
sum(b.AMOUNT) / count(distinct a.id),
.....
from M
join A on a.M_ID = m.ID
join B on b.M_ID = m.ID
....
group by m.ID;
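Here is a small check of the divide-by-count idea, written with Python's built-in sqlite3 module rather than Oracle (an assumption for runnability). The rows are hypothetical, with integer amounts chosen so that the division is exact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE m (id INTEGER PRIMARY KEY);
    CREATE TABLE a (id INTEGER PRIMARY KEY, m_id INTEGER, amount INTEGER);
    CREATE TABLE b (id INTEGER PRIMARY KEY, m_id INTEGER, amount INTEGER);
    -- Hypothetical data: m.id 1 has two A rows and three B rows.
    INSERT INTO m VALUES (1), (2);
    INSERT INTO a VALUES (1, 1, 10), (2, 1, 20), (3, 2, 4);
    INSERT INTO b VALUES (1, 1, 5), (2, 1, 5), (3, 1, 5), (4, 2, 6);
""")

# Plain join: every A row is repeated once per B row and vice versa.
naive = conn.execute("""
    SELECT m.id, SUM(a.amount), SUM(b.amount)
    FROM m
    JOIN a ON a.m_id = m.id
    JOIN b ON b.m_id = m.id
    GROUP BY m.id
    ORDER BY m.id
""").fetchall()

# Dividing by the number of distinct rows of the other table undoes
# the repetition.
divided = conn.execute("""
    SELECT m.id,
           SUM(a.amount) / COUNT(DISTINCT b.id),
           SUM(b.amount) / COUNT(DISTINCT a.id)
    FROM m
    JOIN a ON a.m_id = m.id
    JOIN b ON b.m_id = m.id
    GROUP BY m.id
    ORDER BY m.id
""").fetchall()

print(naive)    # [(1, 90, 30), (2, 4, 6)]
print(divided)  # [(1, 30, 15), (2, 4, 6)]
```

The division only undoes the multiplication for two joined child tables; with a third one, each sum would have to be divided by the product of the other two counts.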
The other option, which I would prefer is to build groups, so as not to have multiple A and B per m.id in the first place:
select m.ID,
a_agg.SUM_AMOUNT,
b_agg.SUM_AMOUNT,
.....
from M
join (select M_ID, sum(AMOUNT) as SUM_AMOUNT from A group by M_ID) a_agg
on a_agg.M_ID = m.ID
join (select M_ID, sum(AMOUNT) as SUM_AMOUNT from B group by M_ID) b_agg
on b_agg.M_ID = m.ID
EDIT: In case an M_ID might not have any A or any B, you would have to replace the joins with LEFT JOIN in both queries. Then in the first query select:
nvl(sum(a.AMOUNT), 0) / greatest(count(distinct b.id), 1),
nvl(sum(b.AMOUNT), 0) / greatest(count(distinct a.id), 1),
And in the second query:
nvl(a_agg.SUM_AMOUNT, 0),
nvl(b_agg.SUM_AMOUNT, 0),
EDIT: Here is your query modified. The trick is to join with distinct rims.
SELECT
rprd.ID,
rprd.NAME,
nvl(same_date.SUM_AMOUNT, 0),
.....
nvl(earlier_date.SUM_AMOUNT, 0)
FROM RPRODUCT rprd
LEFT JOIN
(
select rim.RPRODUCT_ID, sum(w.AMOUNT) as SUM_AMOUNT
from
(
select distinct RPRODUCT_ID
from RPRODUCT_INVENTORY_MASTER
where INVENTORY_DATE = to_date(:2,'DDMMYYYY')
) rim
left join WORKITEM w
on w.PAYEE_ID = rim.RPRODUCT_ID
and w.ACTION = 'REMOVE'
and trunc(w.CREATED_DATE) = to_date(:1,'DDMMYYYY')
group by rim.RPRODUCT_ID
) same_date on same_date.RPRODUCT_ID = rprd.ID
LEFT JOIN
(
select rim.RPRODUCT_ID, sum(w.AMOUNT) as SUM_AMOUNT
from
(
select distinct RPRODUCT_ID
from RPRODUCT_INVENTORY_MASTER
where INVENTORY_DATE < to_date(:12,'DDMMYYYY')
) rim
left join WORKITEM w
on w.PAYEE_ID = rim.RPRODUCT_ID
and w.ACTION = 'COLLECT'
and trunc(w.CREATED_DATE) < to_date(:11,'DDMMYYYY')
group by rim.RPRODUCT_ID
) earlier_date on earlier_date.RPRODUCT_ID = rprd.ID
GROUP BY rprd.ID, rprd.NAME
ORDER BY rprd.ID
;
This should work:
select m.ID,
a.aamount,
b.bamount
from M
inner join
(
select M_ID,sum(AMOUNT) as aamount
from A group by M_ID
) a
on a.M_ID = m.ID
inner join
(
select M_ID,sum(AMOUNT) as bamount
from B group by M_ID
) b
on b.M_ID = m.ID;
This should work regardless of the number of m_id rows in the A, B, C, ... tables:
select
M.id,
sum(decode(u.src, 'A', u.sumx, 0)) sum_a,
sum(decode(u.src, 'B', u.sumx, 0)) sum_b,
sum(decode(u.src, 'C', u.sumx, 0)) sum_c,
...
from M,
(select 'A' src, m_id, sum(amount) sumx from A group by m_id
union all
select 'B', m_id, sum(amount) from B group by m_id
union all
select 'C', m_id, sum(amount) from C group by m_id
...
) u
where
M.id=u.m_id
group by
M.id;
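The same UNION ALL pattern can be tried with Python's built-in sqlite3 module. This is a sketch under two assumptions: SQLite has no decode(), so a CASE expression stands in for it, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE m (id INTEGER PRIMARY KEY);
    CREATE TABLE a (m_id INTEGER, amount INTEGER);
    CREATE TABLE b (m_id INTEGER, amount INTEGER);
    CREATE TABLE c (m_id INTEGER, amount INTEGER);
    -- Hypothetical data; m.id 3 has no child rows at all.
    INSERT INTO m VALUES (1), (2), (3);
    INSERT INTO a VALUES (1, 10), (1, 20);
    INSERT INTO b VALUES (1, 5), (1, 5), (1, 5);
    INSERT INTO c VALUES (2, 7);
""")

# Each branch of the UNION ALL is aggregated on its own and tagged with its
# source, so the child tables are stacked rather than multiplied.
result = conn.execute("""
    SELECT m.id,
           SUM(CASE WHEN u.src = 'A' THEN u.sumx ELSE 0 END) AS sum_a,
           SUM(CASE WHEN u.src = 'B' THEN u.sumx ELSE 0 END) AS sum_b,
           SUM(CASE WHEN u.src = 'C' THEN u.sumx ELSE 0 END) AS sum_c
    FROM m
    JOIN (
        SELECT 'A' AS src, m_id, SUM(amount) AS sumx FROM a GROUP BY m_id
        UNION ALL
        SELECT 'B', m_id, SUM(amount) FROM b GROUP BY m_id
        UNION ALL
        SELECT 'C', m_id, SUM(amount) FROM c GROUP BY m_id
    ) u ON m.id = u.m_id
    GROUP BY m.id
    ORDER BY m.id
""").fetchall()

print(result)  # [(1, 30, 15, 0), (2, 0, 0, 7)]
```

Because this is an inner join, an id without rows in any child table (3 here) does not appear; a LEFT JOIN from m would keep it.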

Select first record in a One-to-Many relation using left join

I'm trying to join two tables using a left-join. And the result set has to include only the first record from the "right" joined table.
Let's say I have two tables A and B as below:
Table "A"
code | emp_no
101 | 12222
102 | 23333
103 | 34444
104 | 45555
105 | 56666
Table "B"
code | city | county
101 | Glen Oaks | Queens
101 | Astoria | Queens
101 | Flushing | Queens
102 | Ridgewood | Brooklyn
103 | Bayside | New York
Expected Output:
code | emp_no | city | county
101 | 12222 | Glen Oaks | Queens
102 | 23333 | Ridgewood | Brooklyn
103 | 34444 | Bayside | New York
104 | 45555 | NULL | NULL
105 | 56666 | NULL | NULL
Notice that my result has only one matched record from table B per code after the left join (it doesn't matter which record is matched), even though it is a one-to-many mapping.
I need to pick the first matched record from table B and ignore all other rows.
Please help!
Thanks
After playing around a bit, this turns out to be trickier than I'd expected! Assuming that table_b has some single column that is unique (say, a single-field primary key), it looks like you can do this:
SELECT table_a.code,
table_a.emp_no,
table_b.city,
table_b.county
FROM table_a
LEFT
JOIN table_b
ON table_b.code = table_a.code
AND table_b.field_that_is_unique =
( SELECT TOP 1
field_that_is_unique
FROM table_b
WHERE table_b.code = table_a.code
)
;
Another option: OUTER APPLY
If supported by the database, OUTER APPLY is an efficient and terse option.
SELECT *
FROM
Table_A a
OUTER APPLY
(SELECT TOP 1 *
FROM Table_B b_1
WHERE b_1.code = a.code
) b
;
This results in a left join to the indeterminate first matched record. My tests show it to be quicker than any other posted solution (on MS SQL Server 2012).
The highest voted answer does not seem correct to me, and seems overcomplicated.
Just group by the code field on table B in your subquery and select the maximum Id per grouping.
SELECT
table_a.code,
table_a.emp_no,
table_b.city,
table_b.county
FROM
table_a
LEFT JOIN
table_b
ON table_b.code = table_a.code
AND table_b.field_that_is_unique IN
(SELECT MAX(field_that_is_unique)
FROM table_b
GROUP BY table_b.code)
If you are on SQL Server 2005 or later version, you could use ranking to achieve what you want. In particular, ROW_NUMBER() seems to suit your needs nicely:
WITH B_ranked AS (
SELECT
*,
rnk = ROW_NUMBER() OVER (PARTITION BY code ORDER BY city)
FROM B
)
SELECT
A.code,
A.emp_no,
B.city,
B.county
FROM A
LEFT JOIN B_ranked AS B ON A.code = B.code AND b.rnk = 1
Or:
WITH B_unique_code AS (
select * from(
SELECT
*,
rnk = ROW_NUMBER() OVER (PARTITION BY code ORDER BY city)
FROM B
) AS s
where rnk = 1
)
SELECT
A.code,
A.emp_no,
B.city,
B.county
FROM A
LEFT JOIN B_unique_code AS B ON A.code = B.code
I modified the answer from ruakh and this seems to work perfectly with MySQL.
SELECT
a.code,
a.emp_no,
b.city,
b.county
FROM table_a a
LEFT JOIN table_b b
ON b.code = a.code
AND b.id = ( SELECT id FROM table_b
WHERE table_b.code = a.code
LIMIT 1
)
;
This is how:
Select * From TableA a
Left Join TableB b
On b.Code = a.Code
And [Here put criteria predicate that 'defines' what the first record is]
Hey, if the city and county are unique, then use them
Select * From TableA a
Left Join TableB b
On b.Code = a.Code
And b.City + b.county =
(Select Min(city + county)
From TableB
Where Code = b.Code)
But the point is you have to put some expression in there to tell the query processor what it means to be first.
In Oracle you can do:
WITH first_b AS (SELECT code, min(rowid) AS rid FROM b GROUP BY code)
SELECT a.code, a.emp_no, b.city, b.county
FROM a
LEFT JOIN first_b
ON first_b.code = a.code
LEFT JOIN b
ON b.rowid = first_b.rid
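The rowid idea carries over to SQLite, which also exposes an implicit rowid on ordinary tables. The sketch below uses Python's built-in sqlite3 module with the question's sample data; it assumes that "first" means the lowest rowid, which on a freshly filled table follows insertion order, and it uses LEFT JOINs so that codes without a match are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (code INTEGER, emp_no INTEGER);
    CREATE TABLE b (code INTEGER, city TEXT, county TEXT);
    -- Sample data from the question.
    INSERT INTO a VALUES (101, 12222), (102, 23333), (103, 34444),
                         (104, 45555), (105, 56666);
    INSERT INTO b VALUES (101, 'Glen Oaks', 'Queens'),
                         (101, 'Astoria', 'Queens'),
                         (101, 'Flushing', 'Queens'),
                         (102, 'Ridgewood', 'Brooklyn'),
                         (103, 'Bayside', 'New York');
""")

# first_b picks one rowid per code; joining back on that rowid yields
# exactly one b row per code, and the LEFT JOINs keep unmatched a rows.
first_match = conn.execute("""
    WITH first_b AS (SELECT code, MIN(rowid) AS rid FROM b GROUP BY code)
    SELECT a.code, a.emp_no, b.city, b.county
    FROM a
    LEFT JOIN first_b ON first_b.code = a.code
    LEFT JOIN b ON b.rowid = first_b.rid
    ORDER BY a.code
""").fetchall()

for row in first_match:
    print(row)
```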