SQL - How to pick the best available value for each column for each ID from multiple tables?

SQL - How to pick the best available value for each column for each ID from multiple tables? - sql

I have two tables with the same variables referring to attributes of a person.
How can I combine data from two such tables picking the best available value for each column from each table for each field?
Requirements:
For each field, I would like to fill it with a value from either one of the tables, giving a preference to table 1.
Values can be NULL in either table
In the combined table, the value for column 1 could come from table 2 (in case table 1 is missing a value for that person) and the value for column 2 could from table 1 (because both tables had a value, but the value from table 1 is preferred).
In my real example, I have many columns, so an elegant solution with less code duplication would be preferred.
Some users may exist in only one of the tables.
Example:
Table 1:
user_id | age | income
1 | NULL| 58000
2 | 22 | 60000
4 | 19 | 35000
Table 2:
user_id | age | income
1 | 55 | 55000
2 | 19 | NULL
3 | 22 | 33200
Desired output:
user_id | age | income
1 | 55 | 58000
2 | 22 | 60000
3 | 22 | 33200
4 | 19 | 35000

I think that's a full join and priorization logic with colaesce():
select user_id,
coalesce(t1.age, t2.age) as age,
coalesce(t1.income, t2.income) as income
from table1 t1
full join table2 t2 using(user_id)

Use full outer join if user_id in each table is unique.
SELECT
COALESCE(t1.user_id, t2.user_id) AS user_id,
GREATEST(t1.age, t2.age) AS age,
GREATEST(t1.income, t2.income) AS income
FROM t1
FULL OUTER JOIN t2 ON t1.user_id = t2.user_id

try like below using coalesce()
select t1.user_id, coalesce(t1.age,t2.age),
t1.income>t2.income then t1.income else t2.income end as income
table1 t1 join table2 t2 on t1.usesr_id=t2.user_id

You can use below code:
With TableA(Id,age,income) as
( --Select Common Data
select table_1.id,
--Select MAX AGE
case
when table_1.age> table_2.age or table_2.age is null then table_1.age else table_2.age
end,
--Select MAX Income
case
when table_1.income>table_2.income or table_2.income is null then table_1.income else table_2.income
end
from table_1 inner join table_2 on table_2.id=table_1.id
union all
-- Select Specific Data of Table 2
select table_2.id,table_2.age,table_2.income
from table_2
where table_2.id not in (select table_1.id from table_1)
union all
-- Select Specific Data of Table 1
select table_1.id,table_1.age,table_1.income
from table_1
where table_1.id not in (select table_2.id from table_2)
)select * from TableA

Related

How to create a calculated column in access that is based in a conditional sum of another table

It is possible in MS ACCESS 2016 to create a column in a table that is a conditional SUM of another table?
Example
Table 1 - Columns
ID, NAME, TOTAL
Table 2 - Columns
ID, NAME, IDREF, CUSTO
Data:
Table 1
ID | Name | Total
---+-------+----------------------------------------------------------------
35 | Test | "SUM(CUSTO) of ALL ELEMENTS OF TABLE 2 WHERE table2.IDREF = table1.ID"
Table 2
ID | Name | IDREF | CUSTO
---+-------+-------+--------
1 | Test | 35 | 50
2 | Test | 35 | 30
3 | abcd | 12 | 30
4 | Test | 35 | 10
The result should be:
table 1
ID | Name | Total
---+------+------
35 | Test | 90 (50 + 30 + 10 from table 2 where idref = 35)

You can use a subquery:
select t1.*,
(select sum(t2.CUSTO)
from table2 as t2
where t2.idref = t1.id
) as total
from table1 as t1;

More efficiently, consider an aggregate subquery in JOIN to run only once and not SELECT subquery that runs for every row.
SELECT t1.*, agg.Total
FROM table1 as t1
INNER JOIN
( SELECT t2.idref, SUM(t2.CUSTO) AS Total
FROM table2 as t2
GROUP BY t2.idref
) AS agg
ON agg.idref = t1.id
Alternatively, replace subquery with exact saved query for even more efficiency per Allen Browne's optimizing query tips in MS Access:
A subquery will be considerably faster than a domain aggregate function. In most cases, a stacked query will be faster yet (i.e. another saved query that you include as a "table" in this query.)
SELECT t1.*, q.Total
FROM table1 as t1
INNER JOIN mySavedAggQuery q
ON q.idref = t1.id

How to join two tables and show source?

How to join values from two tables ...
Table_1:
ID | Value
----------
10 | Dog
27 | Cat
Table_2:
ID | Value
----------
27 | Cat
My SQL... (Microsoft Access 2016)
SELECT ID, VALUE , "YES" AS Table_1, NULL AS Table_2
FROM Table_1
UNION
SELECT ID, VALUE, NULL AS Table_1, "YES" AS Table_2
FROM Table_2
...returns this result:
ID | Value | Table_1 | Table_2
------------------------------
10 | Dog | YES |
27 | Cat | YES |
27 | Cat | | YES
But I would like to get a result like this:
ID | Value | Table_1 | Table_2
------------------------------
10 | Dog | YES |
27 | Cat | YES | YES

You can use aggregation and union all:
select id, value, max(table_1) as table_1, max(table_2) as table_2
from (select ID, VALUE , "YES" AS Table_1, NULL AS Table_2
from Table_1
union
select ID, VALUE, NULL AS Table_1, "YES" AS Table_2
from Table_2
) t
group by id, value;
The alternative in SQL is a FULL JOIN, but MS Access does not support full joins.

You can make a list of table1 + table2 values and then check if the values exists in table2 or 2 using simple jeft join.
select
base.*,
iif(isnull(t.value), null, 'YES') table1,
iif(isnull(t2.value), null, 'YES') table2
from
(
Select value from Table_1 -- if you have duplicate values add groupby here
union
select value from Table_2
) base -- Make a collection of values from table1 and 2
left join Table_1 T on base.value = T.value
left join Table_2 T2 on base.value = T2.value;
Not sure if you can open above query in design view but the query should work in access. to be design view friendly, you need to make a query object for the base and then you can simply do left joins and iifs

SQL Join On Columns of Different Length

I'm trying to join two tables together in SQL where the columns contain a different number of unique entries.
When I use a full join the additional entries in the column joined on are missing.
The code I'm using is (in a SAS proc SQL):
proc sql;
create table table3 as
select table1.*, table2.*
from table1 full join table2
on table1.id = table2.id;
quit;
Visual example of problem (can't show actual tables as contain sensitive data)
Table 1
id | count1
1 | 2
2 | 3
3 | 2
Table 2
id | count2
1 | 4
2 | 5
3 | 6
4 | 2
Table 3
id | counta | countb
1 | 2 | 4
2 | 3 | 5
3 | 2 | 6
- | - | 2 <----- I want don't want the id column to be blank in this row
I hope I've explained my problem clearly enough, thanks in advance for your help.

The id from table 1 is blank because the row from table2 has no match in table 1. Try looking at the output from this query:
select coalesce(table1.id, table2.id) as id, table1.count1, table2.count2
from table1 full join table2
on table1.id = table2.id;
Coalesce works from left to right returning the first non null value (it can take more than 2 arguments). If the id in table 1 is null it uses the id from table 2 instead
I recommend also to alias all tables in queries, so I’d have written this:
SELECT
COALESCE(t1.id, t2.id) as id,
t1.count1,
t2.count2
FROM
table1 t1
FULL OUTER JOIN
table2 t2
ON
t1.id = t2.id;

Simply select coalesce(t1.id, t2.id), will return the first non-null id value.

Access VBA: Select only multiple values

Say, I have a table that looks like this:
ID | PNo | MM | CP |
---|-----|------|----|
1 | 13 | True | 4 |
2 | 92 | True | 3 |
3 | 1 | True | 3 |
4 | 13 | False| 2 |
5 | 13 | True | 3 |
6 | 1 | True | 3 |
I want to go through all PNos and compare all rows with that PNo and only select those that have different values in field MM.
My plan was to create a table with the distinct values of PNo, iterate through that table by using the usual record set and write an SQL query for each PNo.
Now my problem is the construction of the SQL query.
I can select all rows with Table.PNo = rs("PNo") but I have no idea how to formulate the query to catch the rows with varying values.

You can use a subquery:
Select *
From YourTable
Where PNo IN
(Select T.PNo
From YourTable
Group By PNo, MM
Having Count(*) = 2)

I think this should work.
This will create a cartesian product on y our PNo field. i.e. Every record connected to every record (but just on that PNo).
SELECT *
FROM Table1 T1 INNER JOIN Table1 T2 ON T1.PNo = T2.PNo
You'll end up with 9 instances of PNo 13, 4 of 1 and 1 of 92. Now we just want to return the ones where MM is different, so add that to the WHERE clause.
SELECT *
FROM Table1 T1 INNER JOIN Table1 T2 ON T1.PNo = T2.PNo
WHERE T1.MM <> T2.MM
ORDER BY T1.ID
This will return four records. PNo 1 and 92 will have vanished as the MM result was the same for those. ID number 4 will be returned twice as the MM value is different from that in ID 1 and ID 5.
To remove the duplicate value you could then use DISTINCT:
SELECT DISTINCT T1.ID, T1.PNo, T1.MM, T1.CP
FROM Table1 T1 INNER JOIN Table1 T2 ON T1.PNo = T2.PNo
WHERE T1.MM <> T2.MM
ORDER BY T1.ID
Note: One of the differences between the answer given by #Jonathan and mine is that his query is updateable and mine isn't.

The following should do what you want:
SELECT * FROM MyTable WHERE PNo in
(SELECT t.PNo FROM MyTable t
INNER join MyTable f
ON t.PNo = f.PNo
WHERE t.MM = true and f.MM = false)
The inner join ensures that only those PNos that have both MM false and MM true are included.

Oracle Efficiently joining tables with subquery in FROM

Table 1:
| account_no | **other columns**...
+------------+-----------------------
| 1 |
| 2 |
| 3 |
| 4 |
Table 2:
| account_no | TX_No | Balance | History |
+------------+-------+---------+------------+
| 1 | 123 | 123 | 12.01.2011 |
| 1 | 234 | 2312 | 01.03.2011 |
| 3 | 232 | 212 | 19.02.2011 |
| 4 | 117 | 234 | 24.01.2011 |
I have multiple join query, one of the tables(Table 2) inside a query is problematic as it is a view which computes many other things, that is why each query to that table is costly. From Table 2, for each account_no in Table 1 I need the whole row with the greatest TX_NO, this is how I do it:
SELECT * FROM TABLE1 A LEFT JOIN
( SELECT
X.ACCOUNT_NO,
HISTORY,
X.BALANCE
FROM TABLE2 X INNER JOIN
(SELECT
ACCOUNT_NO,
MAX(TX_NO) AS TX_NO
FROM TABLE2
GROUP BY ACCOUNT_NO) Y ON X.ACCOUNT_NO = Y.ACCOUNT_NO) B
ON B.ACCOUNT_NO = A.ACCOUNT_NO
As I understand at first it will make the inner join for all the rows in Table2 and after that left join needed account_no's with Table1 which is what I would like to avoid.
My question: Is there a way to find the max(TX_NO) for only those accounts that are in Table1 instead of going through all? I think it will help to increase the speed of the query.

I think you are on the right track, but I don't think that you need to, and would not myself, nest the subqueries the way you have done. Instead, if you want to get each record from table 1 and the matching max record from table 2, you can try the following:
SELECT * FROM TABLE1 t1
LEFT JOIN
(
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY account_no ORDER BY TX_No DESC) rn
FROM TABLE2 t
) t2
ON t1.account_no = t2.account_no AND
t2.rn = 1
If you want to continue with your original approach, this is how I would do it:
SELECT *
FROM TABLE1 t1
LEFT JOIN TABLE2 t2
ON t1.account_no = t2.account_no
INNER JOIN
(
SELECT account_no, MAX(TX_No) AS max_tx_no
FROM TABLE2
GROUP BY account_no
) t3
ON t2.account_no = t3.account_no AND
t2.TX_No = t3.max_tx_no
Instead of using a window function to find the greatest record per account in TABLE2, we use a second join to a subquery instead. I would expect the window function approach to perform better than this double join approach, and once you get used to it can even easier to read.

If table1 is comparatiely less expensive then you could think of doing a left outer join first which would considerable decrease the resultset and from that pick the latest transaction id records alone
select <required columns> from
(
select f.<required_columns),row_number() over (partition by account_no order by tx_id desc ) as rn
from
(
a.*,b.tx_id,b.balance,b.History
from table1 a left outer join table2 b
on a.account_no=b.account_no
)f
)g where g.rn=1

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL - How to pick the best available value for each column for each ID from multiple tables? - sql

I think that's a full join and priorization logic with colaesce(): select user_id, coalesce(t1.age, t2.age) as age, coalesce(t1.income, t2.income) as income from table1 t1 full join table2 t2 using(user_id)

Use full outer join if user_id in each table is unique. SELECT COALESCE(t1.user_id, t2.user_id) AS user_id, GREATEST(t1.age, t2.age) AS age, GREATEST(t1.income, t2.income) AS income FROM t1 FULL OUTER JOIN t2 ON t1.user_id = t2.user_id

try like below using coalesce() select t1.user_id, coalesce(t1.age,t2.age), t1.income>t2.income then t1.income else t2.income end as income table1 t1 join table2 t2 on t1.usesr_id=t2.user_id

Related

How to create a calculated column in access that is based in a conditional sum of another table

How to join two tables and show source?

SQL Join On Columns of Different Length

Access VBA: Select only multiple values

Oracle Efficiently joining tables with subquery in FROM

Categories

Resources