Merge two tables with summing up values

I have two tables,
Table1:
Key, Value
----------
1 10
2 20
3 30
Table2:
Key, Value
----------
3 30
4 40
5 50
How can I merge them and get the following using SQL?
Desired output:
Key, Value
----------
1 10
2 20
3 60
4 40
5 50
In Python terminology, I guess it is called summing up two dictionaries.
PS: I am using Oracle 12c.
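To make the dictionary analogy concrete, here is a small Python sketch. It assumes both tables fit in memory as {key: value} dicts and uses `collections.Counter`, whose `+` operator adds the values key by key.

```python
from collections import Counter

# The two tables from the question, viewed as {key: value} dictionaries
table1 = {1: 10, 2: 20, 3: 30}
table2 = {3: 30, 4: 40, 5: 50}

# Counter addition sums values for keys present in both and keeps the others.
# Caveat: Counter "+" drops keys whose summed value is zero or negative.
merged = Counter(table1) + Counter(table2)

for key in sorted(merged):
    print(key, merged[key])
```

The SQL answers below do the same thing inside the database. Unlike `Counter`, they keep keys whose sum is zero or negative.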

Use UNION ALL to combine the two SELECT queries, then SUM the values with GROUP BY on the key column:
select key,sum(value) value
from
(
select Key, Value from table1
union all
select Key, Value from table2
) a
group by key
Or use a FULL OUTER JOIN:
SELECT COALESCE(a.Key, b.Key) AS Key,
COALESCE(a.Value, 0) + COALESCE(b.Value, 0) AS Value
FROM table1 a
FULL OUTER JOIN table2 b
ON a.Key = b.Key
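To check the UNION ALL query above against the sample data, you can run it through Python's built-in `sqlite3` module. This is a sketch only: an in-memory SQLite database stands in for Oracle 12c, and `"key"` is double-quoted defensively because KEY is an SQL keyword.

```python
import sqlite3

# In-memory SQLite database loaded with the question's sample data
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 ("key" INTEGER, value INTEGER);
    CREATE TABLE table2 ("key" INTEGER, value INTEGER);
    INSERT INTO table1 VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO table2 VALUES (3, 30), (4, 40), (5, 50);
""")

# UNION ALL keeps every row from both tables; GROUP BY then sums per key
rows = conn.execute("""
    SELECT "key", SUM(value) AS value
    FROM (
        SELECT "key", value FROM table1
        UNION ALL
        SELECT "key", value FROM table2
    ) a
    GROUP BY "key"
    ORDER BY "key"
""").fetchall()

for row in rows:
    print(row)
```

UNION ALL (not UNION) matters here. Plain UNION would collapse duplicate (key, value) rows, such as the two `(3, 30)` rows, before summing.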

To be more efficient in terms of speed, avoid UNION/UNION ALL as much as possible. Try my answer below:
SELECT
NVL(A.Key, B.Key) AS Key,
NVL(A.Value, 0) + NVL(B.Value, 0) AS Value
FROM Table1 A
FULL JOIN Table2 B ON A.Key = B.Key

Related

how to execute query for each row result of another query

I have 2 tables: one stores IDs and the other stores logs for each ID. I would like to get the sum of LOG for each ID, together with the ID number, from these 2 tables.
A:
ID
--
1
2
3
4
5
B:
ID_C LOG
--------
1 15
1 30
4 44
2 14
3 88
3 10
2 10
To get the sum for a single ID, the query is:
SELECT SUM(LOG) FROM B WHERE ID_C = '2';
Note that ID and ID_C hold the same values but have different names in the two tables.
To get all available IDs, the query is:
SELECT ID FROM A;
I would like to get the following result:
result
--------------------
ID SUM
1 45
4 44
2 24
3 98
I tried:
SELECT SUM(LOG) FROM B WHERE ID_C IN (SELECT ID FROM A);
but it returns a single sum over all IDs.

It looks like you just need a join aggregation here:
SELECT a.ID, SUM(b.LOG) AS SUM
FROM A a
INNER JOIN B b
ON b.ID_C = a.ID
GROUP BY a.ID
ORDER BY a.ID;
Note that the inner join will also remove ID values from the A table that have no entries whatsoever in the B table, which seems to be the behavior you want.
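Here is a quick way to check the join aggregation, including the point that ID 5 (which has no logs) drops out. The sketch runs the query unchanged on an in-memory SQLite database with the question's data. SQLite is only a stand-in for whatever database the asker uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER);
    CREATE TABLE B (ID_C INTEGER, LOG INTEGER);
    INSERT INTO A VALUES (1), (2), (3), (4), (5);
    INSERT INTO B VALUES (1, 15), (1, 30), (4, 44), (2, 14),
                         (3, 88), (3, 10), (2, 10);
""")

# Join each ID to its log rows, then sum the logs per ID
rows = conn.execute("""
    SELECT a.ID, SUM(b.LOG) AS SUM
    FROM A a
    INNER JOIN B b ON b.ID_C = a.ID
    GROUP BY a.ID
    ORDER BY a.ID
""").fetchall()
print(rows)
```

If you wanted ID 5 listed with no sum, a LEFT JOIN instead of the INNER JOIN would keep it, with SUM returning NULL.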

You should use an INNER JOIN and GROUP BY:
SELECT A.ID as ID, SUM(LOG) AS SumLOG
FROM A inner join B ON A.ID = B.ID_C
GROUP BY A.ID
If needed, you can add a WHERE clause to filter by ID.

Selecting max value on subset of data based on other column's value

I'm looking to left join a value from a subset of data from another table, based on a specific value from the first table. Here are example tables:
table1
-----------------
key date
1 2020-01-02
2 2020-03-02
table2
-----------------
key date value
1 2019-12-13 a
1 2019-12-29 b
1 2020-01-14 c
1 2020-02-02 d
2 2019-11-01 e
2 2019-12-02 f
2 2020-04-29 g
Based on the date for each key in table1, I want to select the value of the most recent row (MAX(date)) in table2, considering only the table2 rows for that key whose date is on or before the date from table1.
So, the resulting table would look like this:
key date value
1 2020-01-02 b
2 2020-03-02 f
I'm thinking I could use some type of logic that would create temp tables for each key value where temp.date <= table1.date, then select MAX(temp.date) from the temp table and left join the value. For example, the temp table for key = 1 would be:
key date value
1 2019-12-13 a
1 2019-12-29 b
Then it would left join the value b for key = 1, since MAX(date) = 2019-12-29. I'm not sure if this is the right logic to go about my problem; any help would be greatly appreciated!

You can use a correlated subquery:
select t1.*,
(select t2.value
from table2 t2
where t2.key = t1.key and t2.date <= t1.date
order by t2.date desc
fetch first 1 row only
) as value
from table1 t1;
Note that not all databases support the standard fetch first clause. You may need to use limit or select top (1) or something else depending on your database.
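For a database without FETCH FIRST, here is a sketch of the same correlated subquery on SQLite (via Python's `sqlite3`), using `LIMIT 1` instead. The `"key"` and `"date"` identifiers are quoted defensively, and the dates are stored as ISO-8601 text, which compares correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 ("key" INTEGER, "date" TEXT);
    CREATE TABLE table2 ("key" INTEGER, "date" TEXT, value TEXT);
    INSERT INTO table1 VALUES (1, '2020-01-02'), (2, '2020-03-02');
    INSERT INTO table2 VALUES
        (1, '2019-12-13', 'a'), (1, '2019-12-29', 'b'),
        (1, '2020-01-14', 'c'), (1, '2020-02-02', 'd'),
        (2, '2019-11-01', 'e'), (2, '2019-12-02', 'f'),
        (2, '2020-04-29', 'g');
""")

# For each table1 row, pick the latest table2 row for the same key
# whose date is on or before the table1 date (LIMIT 1 replaces FETCH FIRST)
rows = conn.execute("""
    SELECT t1."key", t1."date",
           (SELECT t2.value
            FROM table2 t2
            WHERE t2."key" = t1."key" AND t2."date" <= t1."date"
            ORDER BY t2."date" DESC
            LIMIT 1) AS value
    FROM table1 t1
    ORDER BY t1."key"
""").fetchall()
print(rows)
```

If no table2 row qualifies, the scalar subquery yields NULL, so the result still behaves like the left join the question asks for.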

Is it possible to left join two tables and have the right table supply each row no more than once?

Given this table structure:
Table A
ID AGE EDUCATION
1 23 3
2 25 6
3 22 5
Table B
ID AGE EDUCATION
1 26 4
2 24 6
3 21 3
I want to find all matches between the two tables where the age is within 2 and the education is within 2. However, I do not want to select any row from TableB more than once. Each row in B should be selected 0 or 1 times and each row in A should be selected one or more times (standard left join).
SELECT *
FROM TableA as A LEFT JOIN TableB as B ON
abs(A.age - B.age) <= 2 AND
abs(A.education - B.education) <= 2
A.ID A.AGE A.EDUCATION B.ID B.AGE B.EDUCATION
1 23 3 3 21 3
2 25 6 1 26 4
2 25 6 2 24 6
3 22 5 2 24 6
3 22 5 3 21 3
As you can see, the last two rows in the output have duplicated B.ID of 2 and 3 when compared to the entire result set. I'd like those rows to return as a single null match with A.ID = 3 since they were both matched to previous A values.
Desired output:
(note that for A.ID = 3, there is no match in B because all rows in B have already been joined to rows in A.)
A.ID A.AGE A.EDUCATION B.ID B.AGE B.EDUCATION
1 23 3 3 21 3
2 25 6 1 26 4
2 25 6 2 24 6
3 22 5 null null null
I can do this with a short program, but I'd like to solve the problem using a SQL query because it is not for me and I will not have the luxury of ever seeing the data or manipulating the environment.
Any ideas? Thanks

As @Joel Coehoorn said earlier, there has to be a mechanism that decides which of the (a,b) pairs with the same b are filtered out of the output. SQL is not great at letting you select ONE row when several match, so you need an aggregating subquery that filters out the records you don't want. Here the filtering is done by reducing all matching B IDs for each A to the smallest one (or the largest; it doesn't really matter). Any function that returns one value from a set would work; min() and max() are just the most convenient. Once you have reduced the result to the (a,b) pairs you care about, join against that result to pull in the rest of the table data.
select a.id a_id, a.age a_age, a.education a_e,
b.id b_id, b.age b_age, b.education b_e
from a left join
(
SELECT
a.id a_id, min(b.id) b_id from a,b where
abs(A.age - B.age) <= 2 AND
abs(A.education - B.education) <= 2
group by a.id
) g on a.id = g.a_id
left join b on b.id = g.b_id;
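Here is the min() reduction above run on the question's sample data. The sketch uses SQLite via Python's `sqlite3` as a stand-in database; the query itself is unchanged apart from explicit `AS` aliases and an ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, age INTEGER, education INTEGER);
    CREATE TABLE b (id INTEGER, age INTEGER, education INTEGER);
    INSERT INTO a VALUES (1, 23, 3), (2, 25, 6), (3, 22, 5);
    INSERT INTO b VALUES (1, 26, 4), (2, 24, 6), (3, 21, 3);
""")

# g keeps, for each A row, only the smallest matching B id;
# the outer joins then pull the full rows back in
rows = conn.execute("""
    SELECT a.id AS a_id, a.age AS a_age, a.education AS a_e,
           b.id AS b_id, b.age AS b_age, b.education AS b_e
    FROM a
    LEFT JOIN (
        SELECT a.id AS a_id, MIN(b.id) AS b_id
        FROM a, b
        WHERE abs(a.age - b.age) <= 2
          AND abs(a.education - b.education) <= 2
        GROUP BY a.id
    ) g ON a.id = g.a_id
    LEFT JOIN b ON b.id = g.b_id
    ORDER BY a.id
""").fetchall()

for row in rows:
    print(row)
```

This keeps at most one B row per A row. On this sample each B row also ends up used only once, but in general two A rows can share the same minimum B id.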

To my knowledge something like this is not possible with a simple select statement and joins because you need to know what has already been selected in order to eliminate duplicates.
You can however try something a little more like this:
DECLARE @JoinResults TABLE
(A_ID INT, A_Age INT, A_Education INT, B_ID INT, B_Age INT, B_Education INT)
INSERT INTO @JoinResults (A_ID, A_Age, A_Education)
SELECT ID, AGE, EDUCATION
FROM TableA
DECLARE @i INT
SET @i = 1
--Assume that A_ID is incremental and no values missed
WHILE (@i <= (SELECT MAX(A_ID) FROM @JoinResults))
BEGIN
--Attach the lowest-ID TableB row that matches A row @i and is not used yet
UPDATE jr
SET B_ID = SQ.ID,
B_Age = SQ.AGE,
B_Education = SQ.EDUCATION
FROM @JoinResults AS jr
CROSS JOIN (
SELECT TOP (1) ID, AGE, EDUCATION
FROM TableB b
WHERE (
abs((SELECT A_Age FROM @JoinResults WHERE A_ID = @i) - AGE) <= 2
AND abs((SELECT A_Education FROM @JoinResults WHERE A_ID = @i) - EDUCATION) <= 2
) AND (SELECT B_ID FROM @JoinResults WHERE B_ID = b.ID) IS NULL
ORDER BY ID
) AS SQ
WHERE jr.A_ID = @i
SET @i = @i + 1
END
SELECT * FROM @JoinResults
NOTE: I do not currently have access to a database, so this is untested, and I am wary of 2 potential issues with it:
I am not sure how the update will react if there are no results.
I am unsure whether the IS NULL check is correct for eliminating the duplicates.
If these issues do arise, let me know and I can help troubleshoot.
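You can check the greedy idea behind the loop in plain Python before touching a database. In this sketch, "the lowest-id unused B row wins" is an assumed tie-break, and the variable names are illustrative. Because the loop hands out at most one B row per A row, it pairs A.ID = 3 with B.ID = 2 rather than reproducing the desired output shown in the question.

```python
# Sample rows from the question as (id, age, education) tuples
table_a = [(1, 23, 3), (2, 25, 6), (3, 22, 5)]
table_b = [(1, 26, 4), (2, 24, 6), (3, 21, 3)]


def matches(a, b):
    """Join condition from the question: age and education both within 2."""
    return abs(a[1] - b[1]) <= 2 and abs(a[2] - b[2]) <= 2


used_b = set()  # ids of B rows already handed out
results = []    # (A row, chosen B row or None), like the results table
for a in sorted(table_a):  # walk A in id order, like the @i counter
    # Assumed tie-break: the lowest-id B row that matches and is still unused
    chosen = next((b for b in sorted(table_b)
                   if b[0] not in used_b and matches(a, b)), None)
    if chosen is not None:
        used_b.add(chosen[0])
    results.append((a, chosen))

for a, b in results:
    print(a, b)
```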

In SQL Server, you can use the CROSS APPLY syntax:
SELECT
a.id, a.age, a.education,
b.id AS b_id, b.age AS b_age, b.education AS b_education
FROM tableB AS b
CROSS APPLY
( SELECT TOP (1) a.*
FROM tableA AS a
WHERE ABS(a.age - b.age) <= 2
AND ABS(a.education - b.education) <= 2
ORDER BY a.id -- your choice here
) AS a ;
Depending on the order you choose in the subquery, different rows from tableA will be selected.
Edit (after your update): But the above query will not show rows from A that have no matching rows in B, or even some that do have matches but were not selected.
It could also be done with window functions but Access does not have them. Here is a query that I think will work in Access:
SELECT
a.id, a.age, a.education,
s.id AS b_id, s.age AS b_age, s.education AS b_education
FROM tableA AS a
LEFT JOIN
( SELECT
b.id, b.age, b.education, MIN(a.id) AS a_id
FROM tableB AS b
INNER JOIN tableA AS a
ON ABS(a.age - b.age) <= 2
AND ABS(a.education - b.education) <= 2
GROUP BY b.id, b.age, b.education
) AS s
ON a.id = s.a_id ;
I'm not sure if Access allows such a subquery but if it doesn't, you can define it as a "Query" and then use it in another.
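The sketch below runs this query on SQLite (via Python's `sqlite3`) instead of Access, so it cannot confirm Access-specific behavior. The outer query reads from tableA, which is what the `ON a.id = s.a_id` condition requires, and the join is spelled INNER JOIN as Access expects. Each B row is assigned to the lowest matching A id, which reproduces the desired output from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, age INTEGER, education INTEGER);
    CREATE TABLE tableB (id INTEGER, age INTEGER, education INTEGER);
    INSERT INTO tableA VALUES (1, 23, 3), (2, 25, 6), (3, 22, 5);
    INSERT INTO tableB VALUES (1, 26, 4), (2, 24, 6), (3, 21, 3);
""")

# s: every B row with the smallest A id it matches, so each B appears once
rows = conn.execute("""
    SELECT a.id, a.age, a.education,
           s.id AS b_id, s.age AS b_age, s.education AS b_education
    FROM tableA AS a
    LEFT JOIN (
        SELECT b.id AS id, b.age AS age, b.education AS education,
               MIN(a.id) AS a_id
        FROM tableB AS b
        INNER JOIN tableA AS a
            ON ABS(a.age - b.age) <= 2
           AND ABS(a.education - b.education) <= 2
        GROUP BY b.id, b.age, b.education
    ) AS s ON a.id = s.a_id
    ORDER BY a.id, s.id
""").fetchall()

for row in rows:
    print(row)
```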

Use SELECT DISTINCT
SELECT DISTINCT A.id, A.age, A.education, B.age, B.education
FROM TableA as A LEFT JOIN TableB as B ON
abs(A.age - B.age) <= 2 AND
abs(A.education - B.education) <= 2

Get ID pairs between 2 tables with matching child records

I have 2 tables with the same structure.
FIELD 1 INT
FIELD 2 VARCHAR(32) -- an MD5 hash
The query has to return the matching FIELD 1 pairs for records that have exactly the same combination of FIELD 2 values in both TABLE 1 and TABLE 2.
These tables are pretty large (1 million records between the two) but are reduced down to an ID and a hash.
Example data:
TABLE 1
1 A
1 B
2 A
2 D
2 E
3 G
3 H
4 E
4 D
4 C
5 E
5 D
TABLE 2
8 A
8 B
9 E
9 D
9 C
10 F
11 G
11 H
12 B
12 D
13 A
13 B
14 E
14 A
The results of the query should be
8 1
9 4
11 3
13 1
I have tried creating a concatenated string of FIELD 2 using a correlated sub-query and the FOR XML PATH string trick I read about here, but that is very slow.

You can also try the following query:
SELECT t_2.Field_1, t_1.Field_1 --1
FROM table_1 t_1, table_2 t_2 --2
WHERE t_1.Field_2 = t_2.Field_2 --3
GROUP BY t_1.Field_1, t_2.Field_1 --4
HAVING COUNT(*) = (SELECT COUNT(*) --5
FROM Table_1 t_1_1 --6
WHERE t_1_1.Field_1 = t_1.Field_1) --7
AND COUNT(*) = (SELECT COUNT(*) --8
FROM Table_2 t_2_1 --9
WHERE t_2_1.Field_1 =t_2.Field_1) --10
Edit
First, the requested result is the combination of Field1 from both tables where the respective Field2 values are exactly the same, and the query above is one way to get it.
Here:
The query takes the data from both tables, joined on the Field2 values (lines 1 to 3).
It then groups the data by Field1 from table1 and Field1 from table2 (line 4).
At this point you have every (table1 Field1, table2 Field1) pair that shares at least one Field2 value.
After this you just need to keep the pairs whose Field2 values match exactly, which you can do with a condition on the row count.
My assumption here is that neither table contains duplicate Field1/Field2 combinations, meaning rows like the following will not be present:
1 b
1 b
in either table.
If that holds, a pair matches exactly when its number of shared rows equals both the number of rows in table1 for that Field1 value and the number of rows in table2 for that Field1 value.
That is what the count(*) conditions in the HAVING clause check (lines 5 to 10).
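Here is the COUNT(*) approach run on the example data from the question. The sketch uses SQLite through Python's `sqlite3` as a stand-in for SQL Server, drops the `--1`…`--10` line markers, and adds an ORDER BY so the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (Field_1 INTEGER, Field_2 TEXT)")
conn.execute("CREATE TABLE table_2 (Field_1 INTEGER, Field_2 TEXT)")
conn.executemany("INSERT INTO table_1 VALUES (?, ?)", [
    (1, 'A'), (1, 'B'), (2, 'A'), (2, 'D'), (2, 'E'), (3, 'G'), (3, 'H'),
    (4, 'E'), (4, 'D'), (4, 'C'), (5, 'E'), (5, 'D')])
conn.executemany("INSERT INTO table_2 VALUES (?, ?)", [
    (8, 'A'), (8, 'B'), (9, 'E'), (9, 'D'), (9, 'C'), (10, 'F'), (11, 'G'),
    (11, 'H'), (12, 'B'), (12, 'D'), (13, 'A'), (13, 'B'), (14, 'E'), (14, 'A')])

# Pairs sharing some hashes, kept only if the shared count equals
# the total number of hashes on both sides
rows = conn.execute("""
    SELECT t_2.Field_1, t_1.Field_1
    FROM table_1 t_1, table_2 t_2
    WHERE t_1.Field_2 = t_2.Field_2
    GROUP BY t_1.Field_1, t_2.Field_1
    HAVING COUNT(*) = (SELECT COUNT(*) FROM table_1 t_1_1
                       WHERE t_1_1.Field_1 = t_1.Field_1)
       AND COUNT(*) = (SELECT COUNT(*) FROM table_2 t_2_1
                       WHERE t_2_1.Field_1 = t_2.Field_1)
    ORDER BY t_2.Field_1
""").fetchall()
print(rows)
```

Pair (9, 5) shows why both HAVING conditions are needed. It shares all of 5's hashes (E, D), but 9 also has C, so the second count check rejects it.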

Let me try to explain this version of the query:
select t1.field1 as t1field1, t2.field1 as t2field1
from (select t1.*,
count(*) over (partition by field1) as NumField2
from table1 t1
) t1 full outer join
(select t2.*,
count(*) over (partition by field1) as NumField2
from table2 t2
) t2
on t1.field2 = t2.field2
where t1.NumField2 = t2.NumField2
group by t1.Field1, t2.Field1
having count(t1.field2) = max(t1.NumField2) and
count(t2.field2) = max(t2.NumField2)
(which is here at SQLFiddle).
The idea is to compare the following counts for each pair of field1 values.
The number of field2 values on each.
The number of field2 values that they share.
All of these have to be equal.
Each subquery counts the number of values of field2 on each field1 value. For the first rows of your data, this produces:
1 A 2
1 B 2
2 A 3
2 D 3
2 E 3
. . .
And for the second table
8 A 2
8 B 2
9 E 3
9 D 3
9 C 3
Next, the full outer join is applied, matching on the field2 value (the where clause then requires the counts to match as well). This multiplies the data, producing rows such as:
1 A 2 8 A 2
1 B 2 8 B 2
2 A 3 NULL NULL NULL
2 D 3 9 D 3
2 E 3 9 E 3
NULL NULL NULL 9 C 3
And so on for all the possible combinations. Note that the NULLs appear due to the full outer join.
Note that when you have a pair, such as 1 and 8 that match, there are no rows with NULL values. When you have a pair with the same counts but they don't match, then you have NULL values. When you have a pair with different counts, they are filtered out by the where clause.
The filtering aggregation step applies these rules to get pairs that meet the first condition but not the second.
The having clause essentially removes any pair that has NULL values. When you count() a column, NULL values are not included, so in that case the count() on the column is less than the number of values expected (NumField2).
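Here is the window-function version run on the example data, again using SQLite through Python's `sqlite3` as a stand-in. The where clause discards every row in which either side is NULL, so swapping the full outer join for an inner join does not change the result. The sketch uses the inner join because SQLite only added FULL OUTER JOIN in 3.39.0; window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (field1 INTEGER, field2 TEXT)")
conn.execute("CREATE TABLE table2 (field1 INTEGER, field2 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [
    (1, 'A'), (1, 'B'), (2, 'A'), (2, 'D'), (2, 'E'), (3, 'G'), (3, 'H'),
    (4, 'E'), (4, 'D'), (4, 'C'), (5, 'E'), (5, 'D')])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [
    (8, 'A'), (8, 'B'), (9, 'E'), (9, 'D'), (9, 'C'), (10, 'F'), (11, 'G'),
    (11, 'H'), (12, 'B'), (12, 'D'), (13, 'A'), (13, 'B'), (14, 'E'), (14, 'A')])

# NumField2 = number of hashes per field1; a pair survives only if the
# counts agree and every hash on each side found a partner
rows = conn.execute("""
    SELECT t1.field1 AS t1field1, t2.field1 AS t2field1
    FROM (SELECT t1.*, COUNT(*) OVER (PARTITION BY field1) AS NumField2
          FROM table1 t1) t1
    INNER JOIN
         (SELECT t2.*, COUNT(*) OVER (PARTITION BY field1) AS NumField2
          FROM table2 t2) t2
      ON t1.field2 = t2.field2
    WHERE t1.NumField2 = t2.NumField2
    GROUP BY t1.field1, t2.field1
    HAVING COUNT(t1.field2) = MAX(t1.NumField2)
       AND COUNT(t2.field2) = MAX(t2.NumField2)
    ORDER BY t2.field1
""").fetchall()
print(rows)
```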

Merging columns in a join of two tables

I have the following tables in a Hive database:
table1:
id t X
1 1 a
1 4 a
2 5 a
3 10 a
table2:
id t Y
1 3 b
2 6 b
2 8 b
3 15 b
And I would like to merge them to have a table like:
id t Z
1 1 a
1 3 b
1 4 a
2 5 a
2 6 b
2 8 b
3 10 a
3 15 b
Basically what I want to do is :
a join on the column id (that part is easy)
merge the columns table1.t and table2.t into a new column t
have the variable Z that is equal to table1.X if the corresponding t comes from table1.t, and table2.Y if it comes from table2.t
order the table by id and then by t (that shouldn't be too hard)
I have no idea how to do parts 2 and 3. I tried an outer join on table1.id = table2.id and table1.t = table2.t, but it doesn't merge the two t columns.
Any pointer would be appreciated. Thanks!

CREATE TABLE table3 AS SELECT * FROM (SELECT id, t, X AS Z FROM table1 UNION ALL SELECT id, t, Y AS Z FROM table2) u1 ORDER BY id, t;
Although not always required, using a subquery for the UNION'd queries helps keep them organized, and you can then reference the fields from the union (e.g. u1.id) in other parts of the query.
You'll need the alias on the 3rd column to make the schemas match. If the value you want were not already a column, you could use a literal instead:
select * from (select id, t, 'a' AS Z from table1 UNION ALL select id, t, 'b' AS Z from table2) u1;
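You can check the UNION ALL idea against the expected table using SQLite through Python's `sqlite3`. SQLite stands in for Hive here, so the sketch only runs the SELECT part and skips the CREATE TABLE ... AS step. It uses the question's table names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, t INTEGER, X TEXT);
    CREATE TABLE table2 (id INTEGER, t INTEGER, Y TEXT);
    INSERT INTO table1 VALUES (1, 1, 'a'), (1, 4, 'a'), (2, 5, 'a'), (3, 10, 'a');
    INSERT INTO table2 VALUES (1, 3, 'b'), (2, 6, 'b'), (2, 8, 'b'), (3, 15, 'b');
""")

# Stack both tables under common column names (X and Y both become Z),
# then sort by id and t
rows = conn.execute("""
    SELECT id, t, Z FROM (
        SELECT id, t, X AS Z FROM table1
        UNION ALL
        SELECT id, t, Y AS Z FROM table2
    ) u1
    ORDER BY id, t
""").fetchall()

for row in rows:
    print(row)
```

No join is needed. Each output row comes from exactly one source table, so stacking the rows and sorting gives the merged t column directly.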

Try this one. It will insert into table3 all the values from the other 2 tables:
INSERT INTO table3 (id, t, Z)
SELECT id, t, X
FROM table1
UNION ALL
SELECT id, t, Y
FROM table2

We Keep Coding
