joining multiple tables on common attributes with disjoint rows


I have multiple tables that need to be joined on multiple common attributes so that the different attributes can be shown in a single table.
table1
+--------+---------+-------+--------+
| make | model | year | kms |
+--------+---------+-------+--------+
| toyota | corolla | 1999 | 25000 |
| toyota | camry | 2002 | 50000 |
+--------+---------+-------+--------+
table2
+--------+---------+-------+---------+
| make | model | year | mileage |
+--------+---------+-------+---------+
| toyota | corolla | 1999 | 20 |
| toyota | qualis | 2004 | 25 |
+--------+---------+-------+---------+
table3
+--------+----------+-------+-------+
| make | model | year | color |
+--------+----------+-------+-------+
| toyota | camry | 2002 | blue |
| toyota | rav4 | 2006 | green |
+--------+----------+-------+-------+
I'm doing the following to join the results:
select
* from table1 as a
full join table2 as b
using (make, model, year)
full join table3 as c
using (make, model, year)
What I need is a table like the one below:
+--------+---------+-------+-------+----------+--------+
| make | model | year | kms | mileage | color |
+--------+---------+-------+-------+----------+--------+
| toyota | corolla | 1999 | 25000 | 20 | |
| toyota | camry | 2002 | 50000 | | blue |
| toyota | qualis | 2004 | | 25 | |
| toyota | rav4 | 2006 | | | green |
+--------+---------+-------+-------+----------+--------+
However, I get results in which make, model and year are duplicated, with some of those columns empty in some rows.
How do I get the required output? Note that for the real data set I'm working with, there are 5 common attributes per table and around 20-40 different attributes per table.

The duplicates are probably caused by the full joins you are using. Instead, build the complete list of keys with UNION and LEFT JOIN each table to it:
SELECT
A.MAKE, A.MODEL, A.YEAR, T1.KMS, T2.MILEAGE, T3.COLOR
FROM
(SELECT MAKE, MODEL, YEAR FROM TABLE1 UNION
SELECT MAKE, MODEL, YEAR FROM TABLE2 UNION
SELECT MAKE, MODEL, YEAR FROM TABLE3) A
LEFT JOIN TABLE1 T1
ON T1.MAKE = A.MAKE AND T1.MODEL = A.MODEL AND T1.YEAR = A.YEAR
LEFT JOIN TABLE2 T2
ON T2.MAKE = A.MAKE AND T2.MODEL = A.MODEL AND T2.YEAR = A.YEAR
LEFT JOIN TABLE3 T3
ON T3.MAKE = A.MAKE AND T3.MODEL = A.MODEL AND T3.YEAR = A.YEAR;
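A minimal sketch of the union-of-keys approach, run against an in-memory SQLite database through Python's standard sqlite3 module (an assumption: the question does not name its DBMS). The data is copied from the question; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database standing in for the asker's DBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (make TEXT, model TEXT, year INTEGER, kms INTEGER);
CREATE TABLE table2 (make TEXT, model TEXT, year INTEGER, mileage INTEGER);
CREATE TABLE table3 (make TEXT, model TEXT, year INTEGER, color TEXT);
INSERT INTO table1 VALUES ('toyota', 'corolla', 1999, 25000), ('toyota', 'camry', 2002, 50000);
INSERT INTO table2 VALUES ('toyota', 'corolla', 1999, 20), ('toyota', 'qualis', 2004, 25);
INSERT INTO table3 VALUES ('toyota', 'camry', 2002, 'blue'), ('toyota', 'rav4', 2006, 'green');
""")

# UNION (not UNION ALL) removes duplicate keys, so each (make, model, year)
# appears once; every source table is then LEFT JOINed to that key list.
query = """
SELECT a.make, a.model, a.year, t1.kms, t2.mileage, t3.color
FROM (SELECT make, model, year FROM table1
      UNION SELECT make, model, year FROM table2
      UNION SELECT make, model, year FROM table3) AS a
LEFT JOIN table1 t1 ON t1.make = a.make AND t1.model = a.model AND t1.year = a.year
LEFT JOIN table2 t2 ON t2.make = a.make AND t2.model = a.model AND t2.year = a.year
LEFT JOIN table3 t3 ON t3.make = a.make AND t3.model = a.model AND t3.year = a.year
ORDER BY a.year
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each key appears exactly once in the output as long as (make, model, year) is unique within each source table; missing attributes come back as NULL (None in Python).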

Related

How to join a grouped table in sql?

I'm a novice in SQL, but hopefully someone can help. I have two tables; for simplicity, here is how they are structured.
Table 1:
+------------+-------+-----------+------------+
| department | sales | date | sales_code |
+------------+-------+-----------+------------+
| 1 | 50 | 5/26/2021 | A |
+------------+-------+-----------+------------+
| 2 | 150 | 5/26/2021 | B |
+------------+-------+-----------+------------+
| 1 | 200 | 5/25/2021 | C |
+------------+-------+-----------+------------+
| 2 | 250 | 5/24/2021 | D |
+------------+-------+-----------+------------+
Table 2:
+------+------------+-------+-----------+-----------------------+
| item | department | sales | date | column I want to join |
+------+------------+-------+-----------+-----------------------+
| 31 | 1 | 50 | 5/26/2021 | x |
+------+------------+-------+-----------+-----------------------+
| 30 | 2 | 150 | 5/26/2021 | x |
+------+------------+-------+-----------+-----------------------+
| 29 | 1 | 200 | 5/25/2021 | x |
+------+------------+-------+-----------+-----------------------+
| 28 | 2 | 250 | 5/24/2021 | x |
+------+------------+-------+-----------+-----------------------+
I need to join table 2 to table 1, but it needs to be aggregated by department sales first, because table 2 is already aggregated by department sales. Here is what I was thinking, but I cannot seem to get it to work:
SELECT t1.*, t2.*
FROM table1 as t1
JOIN (
SELECT department, date, column_i_want, sum(sales)
FROM table2
GROUP BY department ) as t2
ON t2.department = t1.department AND t1.date = t2.date
Desired Output:
+------------+-------+-----------+------------+-----------------------+
| department | sales | date | sales_code | column I want to join |
+------------+-------+-----------+------------+-----------------------+
| 1 | 50 | 5/26/2021 | A | x |
+------------+-------+-----------+------------+-----------------------+
| 2 | 150 | 5/26/2021 | B | x |
+------------+-------+-----------+------------+-----------------------+
| 1 | 200 | 5/25/2021 | C | x |
+------------+-------+-----------+------------+-----------------------+
| 2 | 250 | 5/24/2021 | D | x |
+------------+-------+-----------+------------+-----------------------+
Any help would be appreciated.

There are several ways to go about this; the easiest is to create a view:
CREATE VIEW t2 AS
SELECT department, date, column_i_want, sum(sales) AS total_sales
FROM table2
GROUP BY department, date, column_i_want;
Then it's easier to join them (you can also use a WITH clause instead of a view, but it can get messy):
SELECT *
FROM table1 NATURAL JOIN t2;
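A sketch of the view approach in SQLite through Python's sqlite3 module (assumed stand-in for the asker's DBMS). It assumes every non-aggregated column is listed in GROUP BY and that SUM(sales) gets its own name (total_sales, a hypothetical alias) so that NATURAL JOIN matches only on department and date. A WITH clause version is included for comparison.

```python
import sqlite3

# "column I want to join" is spelled column_i_want, as in the queries above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (department INTEGER, sales INTEGER, date TEXT, sales_code TEXT);
CREATE TABLE table2 (item INTEGER, department INTEGER, sales INTEGER,
                     date TEXT, column_i_want TEXT);
INSERT INTO table1 VALUES (1, 50, '5/26/2021', 'A'), (2, 150, '5/26/2021', 'B'),
                          (1, 200, '5/25/2021', 'C'), (2, 250, '5/24/2021', 'D');
INSERT INTO table2 VALUES (31, 1, 50, '5/26/2021', 'x'), (30, 2, 150, '5/26/2021', 'x'),
                          (29, 1, 200, '5/25/2021', 'x'), (28, 2, 250, '5/24/2021', 'x');

-- All non-aggregated columns appear in GROUP BY; the aggregate is renamed so
-- that NATURAL JOIN matches on department and date only.
CREATE VIEW t2 AS
SELECT department, date, column_i_want, SUM(sales) AS total_sales
FROM table2
GROUP BY department, date, column_i_want;
""")

# Join through the view.
view_rows = conn.execute(
    "SELECT * FROM table1 NATURAL JOIN t2 ORDER BY sales_code"
).fetchall()

# The same aggregation written as a WITH clause instead of a view.
cte_rows = conn.execute("""
WITH grouped AS (
    SELECT department, date, column_i_want, SUM(sales) AS total_sales
    FROM table2
    GROUP BY department, date, column_i_want
)
SELECT * FROM table1 NATURAL JOIN grouped ORDER BY sales_code
""").fetchall()

for row in view_rows:
    print(row)
```

NATURAL JOIN silently joins on every column name the two sides share, which is why the aggregate is renamed here; an explicit ON clause avoids that pitfall.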

Here is what you want:
select t2.*, t1.sales_code
from table2 t2
join table1 t1
on t1.department = t2.department
and t1.date = t2.date
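A sketch of this plain join in SQLite through Python's sqlite3 module (assumed stand-in DBMS), using the question's sample data. It assumes each (department, date) pair occurs at most once in each table, which holds for the sample; under that assumption no aggregation is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (department INTEGER, sales INTEGER, date TEXT, sales_code TEXT);
CREATE TABLE table2 (item INTEGER, department INTEGER, sales INTEGER,
                     date TEXT, column_i_want TEXT);
INSERT INTO table1 VALUES (1, 50, '5/26/2021', 'A'), (2, 150, '5/26/2021', 'B'),
                          (1, 200, '5/25/2021', 'C'), (2, 250, '5/24/2021', 'D');
INSERT INTO table2 VALUES (31, 1, 50, '5/26/2021', 'x'), (30, 2, 150, '5/26/2021', 'x'),
                          (29, 1, 200, '5/25/2021', 'x'), (28, 2, 250, '5/24/2021', 'x');
""")

# Join on department and date; each table2 row picks up its sales_code.
rows = conn.execute("""
SELECT t2.*, t1.sales_code
FROM table2 t2
JOIN table1 t1
  ON t1.department = t2.department
 AND t1.date = t2.date
ORDER BY t2.item DESC
""").fetchall()

for row in rows:
    print(row)
```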

SQL: Return value based on relationship of relationship

Consider three tables in a SQL Server database, STOCK, BINS and VENDORS:
:: STOCK :: BINS :: VENDORS
+------+-----+-------+ +-----+--------+ +----+---------+
| SKU | BIN | COUNT | | BIN | VENDOR | | ID | NAME |
+------+-----+-------+ +-----+--------+ +----+---------+
| 1000 | A01 | 3 | | A01 | 1 | | 1 | Apples |
| 2000 | A02 | 4 | | A02 | 1 | | 2 | Oranges |
| 1000 | B01 | 6 | | B01 | 2 | +----+---------+
+------+-----+-------+ +-----+--------+
How would I return a result set that includes all columns from the STOCK table along with the vendor name from the VENDORS table, filtered to a specific SKU? The vendor name needs to be determined from the ID relationship between the BINS and VENDORS tables.
The desired output:
+------+-----+-------+---------+
| SKU | BIN | COUNT | VENDOR |
+------+-----+-------+---------+
| 1000 | A01 | 3 | Apples |
| 1000 | B01 | 6 | Oranges |
+------+-----+-------+---------+
I have tried left outer joins as well as nested selects, for example this query:
SELECT [stock].*,
(
SELECT [vendors].[name]
FROM [vendors], [bins]
WHERE [vendors].[id] = [bins].[vendor]
AND [bins].[bin] = [stock].[bin]
) AS [vendor]
FROM [stock]
WHERE [stock].[sku] = '1000'
I am getting this result (the issue being the NULL):
+------+-----+-------+--------+
| SKU | BIN | COUNT | VENDOR |
+------+-----+-------+--------+
| 1000 | A01 | 3 | Apples |
| 1000 | B01 | 6 | NULL |
+------+-----+-------+--------+
How should I write my query to get the desired output above as efficiently as possible?

Simple joins should do what you want:
select
s.*,
v.name AS vendor
from stock s
inner join bins b on b.bin = s.bin
inner join vendors v on v.id = b.vendor
where s.sku = 1000
If there is a possibility of unknown bins or vendors, you can use left joins instead of inner joins.
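A sketch of both variants in SQLite through Python's sqlite3 module (assumed stand-in for SQL Server). The stock row in bin C01 is hypothetical, added only to show the difference: the inner joins drop a stock row whose bin is unknown, while the left joins keep it with a NULL vendor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (sku INTEGER, bin TEXT, "count" INTEGER);
CREATE TABLE bins (bin TEXT, vendor INTEGER);
CREATE TABLE vendors (id INTEGER, name TEXT);
INSERT INTO stock VALUES (1000, 'A01', 3), (2000, 'A02', 4), (1000, 'B01', 6);
INSERT INTO bins VALUES ('A01', 1), ('A02', 1), ('B01', 2);
INSERT INTO vendors VALUES (1, 'Apples'), (2, 'Oranges');
""")

inner_sql = """
SELECT s.*, v.name AS vendor
FROM stock s
INNER JOIN bins b ON b.bin = s.bin
INNER JOIN vendors v ON v.id = b.vendor
WHERE s.sku = ?
ORDER BY s.bin
"""
# Same query with LEFT JOINs: stock rows with an unknown bin or vendor are kept.
left_sql = inner_sql.replace("INNER JOIN", "LEFT JOIN")

inner_rows = conn.execute(inner_sql, (1000,)).fetchall()

# Hypothetical stock row in a bin that has no entry in bins.
conn.execute("INSERT INTO stock VALUES (1000, 'C01', 1)")
inner_after = conn.execute(inner_sql, (1000,)).fetchall()
left_rows = conn.execute(left_sql, (1000,)).fetchall()

print(inner_rows)
print(left_rows)
```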

joining wide tables (10s of unique cols)

I have multiple tables that need to be joined on multiple common attributes so that the different attributes can be shown in a single table.
table1
+--------+---------+-------+
| make | model | r_yr |
+--------+---------+-------+
| toyota | corolla | 1999 |
| toyota | camry | 2002 |
| toyota | qualis | 2004 |
| toyota | rav4 | 2006 |
+--------+---------+-------+
table2
+--------+---------+--------+
| make | model | kms |
+--------+---------+--------+
| toyota | corolla | 25000 |
| toyota | camry | 50000 |
+--------+---------+--------+
table3
+--------+---------+---------+
| make | model | mileage |
+--------+---------+---------+
| toyota | corolla | 20 |
| toyota | qualis | 25 |
+--------+---------+---------+
table4
+--------+----------+-------+
| make | model | colr |
+--------+----------+-------+
| toyota | camry | blue |
| toyota | rav4 | green |
+--------+----------+-------+
I'm doing the following to join the results:
select a.make, a.model,a.r_yr,b.kms,c.mileage,d.colr
from table1 as a
left join table2 as b
on b.make=a.make and b.model=a.model and b.r_yr=a.r_yr
left join table3 as c
on c.make=a.make and c.model=a.model and c.r_yr=a.r_yr
left join table4 as d
on d.make=a.make and d.model=a.model and d.r_yr=a.r_yr
This gives a table like the one below:
+--------+---------+-------+-------+----------+--------+
| make | model | r_yr | kms | mileage | colr |
+--------+---------+-------+-------+----------+--------+
| toyota | corolla | 1999 | 25000 | 20 | |
| toyota | camry | 2002 | 50000 | | blue |
| toyota | qualis | 2004 | | 25 | |
| toyota | rav4 | 2006 | | | green |
+--------+---------+-------+-------+----------+--------+
However, the issue is that in the real data set I'm working with, there are 5 common columns per table and around 20-40 unique attributes per table, so I have to list 20-40 column names in the query in the form b.kms, ..., c.mileage, ..., d.colr, .... Is there a workaround that avoids specifying those unique columns, for example by selecting all columns except the common ones, or some other way?

You cannot write something like SELECT all except x, y, z ... in standard SQL.
But you can simplify this query by using a USING clause instead of JOIN ... ON:
Demo: http://sqlfiddle.com/#!17/fa97a/6
select *
from table1 as a
left join table2 as b
USING (make, model)
left join table3 as c
USING (make, model)
left join table4 as d
USING (make, model)
| make | model | r_yr | kms | mileage | colr |
|--------|---------|------|--------|---------|--------|
| toyota | camry | 2002 | 50000 | (null) | blue |
| toyota | corolla | 1999 | 25000 | 20 | (null) |
| toyota | qualis | 2004 | (null) | 25 | (null) |
| toyota | rav4 | 2006 | (null) | (null) | green |
Note: in the example above I am using only two common columns (make, model), because in your example r_yr is not a common column: it exists only in table1.
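A sketch of the USING version in SQLite through Python's sqlite3 module (assumed stand-in; the linked demo runs on PostgreSQL). Table names follow the answer's query (table1 to table4). In SQLite, SELECT * over a USING join lists each shared column once, so the unique columns come through without being named.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (make TEXT, model TEXT, r_yr INTEGER);
CREATE TABLE table2 (make TEXT, model TEXT, kms INTEGER);
CREATE TABLE table3 (make TEXT, model TEXT, mileage INTEGER);
CREATE TABLE table4 (make TEXT, model TEXT, colr TEXT);
INSERT INTO table1 VALUES ('toyota', 'corolla', 1999), ('toyota', 'camry', 2002),
                          ('toyota', 'qualis', 2004), ('toyota', 'rav4', 2006);
INSERT INTO table2 VALUES ('toyota', 'corolla', 25000), ('toyota', 'camry', 50000);
INSERT INTO table3 VALUES ('toyota', 'corolla', 20), ('toyota', 'qualis', 25);
INSERT INTO table4 VALUES ('toyota', 'camry', 'blue'), ('toyota', 'rav4', 'green');
""")

# SELECT * with USING: make and model appear only once in the result,
# so no per-table column list is needed.
cur = conn.execute("""
SELECT *
FROM table1 AS a
LEFT JOIN table2 AS b USING (make, model)
LEFT JOIN table3 AS c USING (make, model)
LEFT JOIN table4 AS d USING (make, model)
ORDER BY a.model
""")
columns = [col[0] for col in cur.description]
rows = cur.fetchall()

print(columns)
for row in rows:
    print(row)
```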

Access Multiple SQL Connection

I have two queries in Access that return two tables like these (both tables have about 1000 rows):
SELECT
(select count(*)
from Table1 T2
where T1.Name=T2.Name and T1.Variable1 >= T2.Variable1) as Rank,
T1.Name,
T1.Variable1
FROM Table1 T1
Results:
+-------+---------+------------+
| Rank | Name | Variable1 |
+-------+---------+------------+
| 1 | Tim | x |
| 2 | Tim | y |
| 3 | Tim | z |
| 1 | Susan | x |
| 2 | Susan | w |
+-------+---------+------------+
Second query:
SELECT (select count(*)
from Table2 T2
where T1.Name=T2.Name and T1.Variable2 >= T2.Variable2) as Rank,
T1.Name,T1.Variable2
FROM Table2 T1
Results:
+--------+---------+------------+
| Rank | Name | Variable2 |
+--------+---------+------------+
| 1 | Tim | a |
| 2 | Tim | b |
| 3 | Tim | c |
| 1 | Susan | a |
| 2 | Susan | c |
+--------+---------+------------+
I want to link them:
Select distinct Table1.Name, Table1.Variable1, Table2.Variable2
from Table1, Table2
where Table1.Name=Table2.Name and Table1.Rank=Table2.Rank
Results:
+-----------+---------+-------------+------------+
| Rank | Name | Variable1 | Variable2 |
+-----------+---------+-------------+------------+
| 1 | Tim | x | a |
| 2 | Tim | y | b |
| 3 | Tim | z | c |
| 1 | Susan | x | a |
| 2 | Susan | w | b |
+-----------+---------+-------------+------------+
But that link doesn't perform well in Access.
I also tried linking them with a JOIN, but the performance didn't improve.

These ranking queries are expensive (the subquery has to be executed for each row of the main table).
Stacking / cascading expensive queries in Access often performs badly.
Your best option is to turn your first and second queries into make-table (SELECT ... INTO) queries, storing the results in intermediate tables.
E.g.
SELECT
(select count(*)
from Table1 T2
where T1.Name=T2.Name and T1.Variable1 >= T2.Variable1) as Rank,
T1.Name,
T1.Variable1
INTO Result1
FROM Table1 T1
Then use these tables (Result1, Result2) as input for the JOIN.
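A sketch of the materialize-then-join idea in SQLite through Python's sqlite3 module (assumed stand-in for Access). SQLite has no SELECT ... INTO, so CREATE TABLE ... AS plays the role of the make-table query; the index on Result2 is a hypothetical extra step, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Name TEXT, Variable1 TEXT);
CREATE TABLE Table2 (Name TEXT, Variable2 TEXT);
INSERT INTO Table1 VALUES ('Tim', 'x'), ('Tim', 'y'), ('Tim', 'z'), ('Susan', 'x'), ('Susan', 'w');
INSERT INTO Table2 VALUES ('Tim', 'a'), ('Tim', 'b'), ('Tim', 'c'), ('Susan', 'a'), ('Susan', 'c');

-- The expensive ranking subqueries run once each, while the tables are built.
CREATE TABLE Result1 AS
SELECT (SELECT COUNT(*) FROM Table1 T2
        WHERE T1.Name = T2.Name AND T1.Variable1 >= T2.Variable1) AS Rank,
       T1.Name, T1.Variable1
FROM Table1 T1;

CREATE TABLE Result2 AS
SELECT (SELECT COUNT(*) FROM Table2 T2
        WHERE T1.Name = T2.Name AND T1.Variable2 >= T2.Variable2) AS Rank,
       T1.Name, T1.Variable2
FROM Table2 T1;

-- Hypothetical: index the join keys of the intermediate table.
CREATE INDEX idx_result2_name_rank ON Result2 (Name, Rank);
""")

# The final join now reads plain stored tables instead of re-running subqueries.
rows = conn.execute("""
SELECT r1.Name, r1.Rank, r1.Variable1, r2.Variable2
FROM Result1 r1
JOIN Result2 r2 ON r1.Name = r2.Name AND r1.Rank = r2.Rank
ORDER BY r1.Name, r1.Rank
""").fetchall()

for row in rows:
    print(row)
```

Ranks follow the sort order of the values, so with this sample Susan's w ranks before x.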

Transposing Data SQL

The data looks similar to this:
+----+------+-----------+-------+---------+---------+--------+
| ID | Unit | Floorplan | Sq Ft | Name | Amenity | Charge |
+----+------+-----------+-------+---------+---------+--------+
| 1 | 110 | A1 | 750 | Alan | GARAGE | 50 |
| 2 | | | | | RENT | 850 |
| 3 | | | | | PEST | 2 |
| 4 | | | | | TRASH | 15 |
| 5 | | | | | TOTAL | 20 |
| 6 | 111 | A2 | 760 | Bill | STORAGE | 35 |
| 7 | | | | | GARAGE | 50 |
| 8 | | | | | RENT | 850 |
| 9 | | | | | PEST | 2 |
| 10 | | | | | TOTAL | 15 |
| 11 | 112 | A3 | 770 | Charlie | PETRENT | 20 |
| 12 | | | | | STORAGE | 35 |
| 13 | | | | | GARAGE | 50 |
| 14 | | | | | RENT | 850 |
| 15 | | | | | TOTAL | 2 |
+----+------+-----------+-------+---------+---------+--------+
I am new to SQL and trying my best using Microsoft Access, but I need help.
The data needs to look like this:
My first step is to separate the units from the rest with:
SELECT * FROM table WHERE Unit IS NOT NULL;
and after that I've usually just typed in the rest by hand.
My idea was as follows:
INSERT INTO table
VALUES (NULL,NULL,...,'Pest',$2)
FROM table
WHERE NOT EXIST 'Pest' BETWEEN x AND y
/* where x = Total 1 and y = Total 2*/
Am I on the right track? I probably need a loop or a join, but I'm not at that level yet.

You can use a crosstab query, though it is a bit convoluted:
TRANSFORM
Sum(TableUnit.Charge) AS SumOfCharge
SELECT
S.Unit,
S.Floorplan,
S.SqFt,
S.Name,
S.Amenity
FROM
TableUnit,
(SELECT
Q.Id,
Val(DMax("Id","TableUnit","Id<=" & Q.[Id] & " And Unit Is Not Null")) AS ParentId
FROM TableUnit As Q) AS T,
(SELECT
TableUnit.Id,
TableUnit.Unit,
TableUnit.Floorplan,
TableUnit.SqFt,
TableUnit.Name,
TableUnit.Amenity
FROM
TableUnit
WHERE
TableUnit.Unit Is Not Null) AS S
WHERE
TableUnit.Id=[T].[Id]
AND
T.ParentId=[S].[Id]
GROUP BY
T.ParentId,
S.Unit,
S.Floorplan,
S.SqFt,
S.Name,
S.Amenity
PIVOT
TableUnit.Amenity In
("Garage","Pest","Trash","PetRent","Storage","Rent");
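A sketch of the same idea in SQLite through Python's sqlite3 module (assumed stand-in for Access). SQLite has no TRANSFORM/PIVOT, so SUM(CASE ...) conditional aggregation replaces the crosstab, and a correlated MAX subquery replaces the DMax call. Assumptions: blank cells are NULL, the column is named SqFt as in the answer, and the literals are uppercase because SQLite's = is case-sensitive by default. As with the In list above, TOTAL rows are not pivoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE TableUnit (Id INTEGER PRIMARY KEY, Unit INTEGER,
    Floorplan TEXT, SqFt INTEGER, Name TEXT, Amenity TEXT, Charge INTEGER)""")

def detail(row_id, amenity, charge):
    """Row with blank unit columns, stored as NULL (assumption)."""
    return (row_id, None, None, None, None, amenity, charge)

conn.executemany("INSERT INTO TableUnit VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, 110, "A1", 750, "Alan", "GARAGE", 50), detail(2, "RENT", 850),
    detail(3, "PEST", 2), detail(4, "TRASH", 15), detail(5, "TOTAL", 20),
    (6, 111, "A2", 760, "Bill", "STORAGE", 35), detail(7, "GARAGE", 50),
    detail(8, "RENT", 850), detail(9, "PEST", 2), detail(10, "TOTAL", 15),
    (11, 112, "A3", 770, "Charlie", "PETRENT", 20), detail(12, "STORAGE", 35),
    detail(13, "GARAGE", 50), detail(14, "RENT", 850), detail(15, "TOTAL", 2),
])

# ParentId = largest Id at or before this row that has a Unit (what DMax computes);
# the SUM(CASE ...) columns then spread each unit's amenities into columns.
rows = conn.execute("""
WITH T AS (
    SELECT Q.Id,
           (SELECT MAX(P.Id) FROM TableUnit P
            WHERE P.Id <= Q.Id AND P.Unit IS NOT NULL) AS ParentId
    FROM TableUnit Q
)
SELECT S.Unit, S.Floorplan, S.SqFt, S.Name,
       SUM(CASE WHEN U.Amenity = 'GARAGE'  THEN U.Charge END) AS Garage,
       SUM(CASE WHEN U.Amenity = 'PEST'    THEN U.Charge END) AS Pest,
       SUM(CASE WHEN U.Amenity = 'TRASH'   THEN U.Charge END) AS Trash,
       SUM(CASE WHEN U.Amenity = 'PETRENT' THEN U.Charge END) AS PetRent,
       SUM(CASE WHEN U.Amenity = 'STORAGE' THEN U.Charge END) AS Storage,
       SUM(CASE WHEN U.Amenity = 'RENT'    THEN U.Charge END) AS Rent
FROM TableUnit U
JOIN T ON T.Id = U.Id
JOIN TableUnit S ON S.Id = T.ParentId
GROUP BY S.Id, S.Unit, S.Floorplan, S.SqFt, S.Name
ORDER BY S.Id
""").fetchall()

for row in rows:
    print(row)
```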
Your test data differs a little from your expected output, so:

My MS Access is rather rusty, but something like this should work:
SELECT t0.Unit, t0.Floorplan, t0.[Sq Ft], t0.Name, t0.Amenity
, SUM(IIF(tM.Amenity = 'GARAGE', tM.Charge, 0)) AS [Garage]
, SUM(IIF(tM.Amenity = 'PEST', tM.Charge, 0)) AS [Pest]
FROM (
SELECT t1.id AS id0, MIN(t2.id) AS idN
FROM t AS t1
INNER JOIN t AS t2 ON t1.id < t2.id
WHERE t1.Unit <> '' AND t2.Unit <> ''
GROUP BY t1.id
) AS groups
INNER JOIN t AS t0 ON t0.id = groups.id0
LEFT JOIN t AS tM ON tM.id > groups.id0 AND tM.id < groups.idN
GROUP BY t0.Unit, t0.Floorplan, t0.[Sq Ft], t0.Name, t0.Amenity
;
Though, if I remember correctly (and it hasn't changed in newer versions), you can't have true subqueries, and you will need to save groups as a separate query that you can join to as if it were a table/view.
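A sketch of the groups approach in SQLite through Python's sqlite3 module (assumed stand-in for Access), with the subquery written as a CTE. Assumptions and departures: CASE replaces Access's IIF, blank cells are NULL, the column is named SqFt, and the unit_rows/ranges names are hypothetical. The COALESCE fallback is an addition so the last unit (which has no following unit row) is not dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE t (id INTEGER PRIMARY KEY, Unit INTEGER,
    Floorplan TEXT, SqFt INTEGER, Name TEXT, Amenity TEXT, Charge INTEGER)""")

def detail(row_id, amenity, charge):
    """Row with blank unit columns, stored as NULL (assumption)."""
    return (row_id, None, None, None, None, amenity, charge)

conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, 110, "A1", 750, "Alan", "GARAGE", 50), detail(2, "RENT", 850),
    detail(3, "PEST", 2), detail(4, "TRASH", 15), detail(5, "TOTAL", 20),
    (6, 111, "A2", 760, "Bill", "STORAGE", 35), detail(7, "GARAGE", 50),
    detail(8, "RENT", 850), detail(9, "PEST", 2), detail(10, "TOTAL", 15),
    (11, 112, "A3", 770, "Charlie", "PETRENT", 20), detail(12, "STORAGE", 35),
    detail(13, "GARAGE", 50), detail(14, "RENT", 850), detail(15, "TOTAL", 2),
])

rows = conn.execute("""
WITH unit_rows AS (
    SELECT id FROM t WHERE Unit IS NOT NULL
),
ranges AS (
    -- id0: the unit's own row; idN: the next unit row, or one past the
    -- last row for the final unit.
    SELECT u1.id AS id0,
           COALESCE((SELECT MIN(u2.id) FROM unit_rows u2 WHERE u2.id > u1.id),
                    (SELECT MAX(id) + 1 FROM t)) AS idN
    FROM unit_rows u1
)
SELECT t0.Unit, t0.Floorplan, t0.SqFt, t0.Name, t0.Amenity,
       SUM(CASE WHEN tM.Amenity = 'GARAGE' THEN tM.Charge ELSE 0 END) AS Garage,
       SUM(CASE WHEN tM.Amenity = 'PEST' THEN tM.Charge ELSE 0 END) AS Pest
FROM ranges g
JOIN t AS t0 ON t0.id = g.id0
LEFT JOIN t AS tM ON tM.id > g.id0 AND tM.id < g.idN
GROUP BY t0.id, t0.Unit, t0.Floorplan, t0.SqFt, t0.Name, t0.Amenity
ORDER BY t0.id
""").fetchall()

for row in rows:
    print(row)
```

As in the answer, the unit row's own amenity stays in the Amenity column and only the following detail rows are summed into Garage and Pest.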
