Outer Join Based On Concatenated SELECT - sql

So perhaps the data framework is flawed from the start, but I need to do an outer join on two tables, and I need to do it based on a concatenation of 2 columns in the second table.
For instance, table one
title | key
-------+-------
foo | Bar1
table two
subcat | pt1 | pt2
--------+-----+-----
kitty | Bar | 1
I basically need to use pt1+pt2 combined as the foreign key.
This is largely academic, as I can add a column to the dataset (not my original creation) that is the concatenation; however, I wanted to know if this was possible.
Postgres version 8.4.8
cheers.bo

A join condition can be pretty much any expression; in particular, you can include string concatenation:
select ...
from t1 left outer join t2 on t1.key = t2.pt1 || t2.pt2
where ...

You can always create a subquery and perform the join against the subquery:
SELECT t1.title, t1.key, t3.subcat FROM table1 AS t1
LEFT JOIN (SELECT t2.pt1 || t2.pt2 AS ptjoined, t2.subcat
FROM tabletwo AS t2) AS t3
ON t3.ptjoined = t1.key
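To see both forms side by side, here is a minimal sketch that runs them against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for Postgres here only because it needs no server; the table names table_one and table_two and the extra unmatched row are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_one (title TEXT, key TEXT);
    CREATE TABLE table_two (subcat TEXT, pt1 TEXT, pt2 TEXT);
    INSERT INTO table_one VALUES ('foo', 'Bar1');
    INSERT INTO table_one VALUES ('baz', 'Qux9');  -- hypothetical row with no partner
    INSERT INTO table_two VALUES ('kitty', 'Bar', '1');
""")

# Join condition written directly as a concatenation expression.
direct = conn.execute("""
    SELECT t1.title, t1.key, t2.subcat
    FROM table_one AS t1
    LEFT OUTER JOIN table_two AS t2 ON t1.key = t2.pt1 || t2.pt2
    ORDER BY t1.title
""").fetchall()

# Same join, with the concatenation computed in a derived table first.
via_subquery = conn.execute("""
    SELECT t1.title, t1.key, t3.subcat
    FROM table_one AS t1
    LEFT OUTER JOIN (SELECT pt1 || pt2 AS ptjoined, subcat FROM table_two) AS t3
        ON t3.ptjoined = t1.key
    ORDER BY t1.title
""").fetchall()

print(direct)
print(via_subquery)
```

Both queries should return the same rows, with a NULL subcat for the row that has no match in the second table.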

Related

How to do FULL OUTER JOIN in MS SQL Server when columns don't share any parameter including primary key?

I have many tables (let's call them single-parameter-tables), each of which has an ID (primary key) and one parameter (2 columns in each table). I wish to join all of them in a joined_table consisting of ID and param_1, param_2, ...., param_n columns. The joined_table is NOT NULL for the ID column (primary key) and nullable for the other columns.
When the parameters share the ID value, I can do the FULL OUTER JOIN normally and there's no problem. But when one parameter doesn't share primary key with any of the other parameters, I face a problem.
Simply speaking, assume for ID 124 there is some value for param_3 from the third single_param-table but no other occurrence and value in other single-parameter-tables.
My code is as follows:
Insert into [joined_table]
(ID, param_1,param_2,param_3)
SELECT
ID
,param1
,param2
,param3
FROM
(
SELECT
-- here if I write just "A.ID as ID" I will receive an error about an unfilled primary key column
COALESCE( A.ID, B.ID, C.ID) as ID
, A.param_1 as param1
, B.param_2 as param2
, C.param_3 as param3
FROM
(
(SELECT ID, param_1 FROM single_param_table_1) A
FULL OUTER JOIN
(SELECT ID, param_2 FROM single_param_table_2) B on A.ID= B.ID
FULL OUTER JOIN
(SELECT ID, param_3 FROM single_param_table_3) C on A.ID = C.ID
-- or:
-- ISNULL(A.ID, B.ID)= C.ID
)
) as joined ;
The error message that I receive is as follows:
Violation of PRIMARY KEY constraint 'PK_joined_table'. Cannot insert duplicate key in object 'joined_table'.
It seems that parameter 3 is not completely separate from the other parameters, and when it shares the key, the query tries to insert a repeated row into the table.
Ideally I wish to have the result joined_table as this:
ID | param 1 | param 2 | param 3
=======================================
123 | 11 | 12 | NULL
---------------------------------------
124 | NULL | NULL | 23
Your problem is that one or more tables have duplicates.
In the meantime, your FULL JOIN logic is filtering out rows that you seem to want. You can simplify and improve the logic:
select coalesce(t1.id, t2.id, t3.id, t4.id, . . . ) as id,
t1.param as param1,
t2.param as param2,
t3.param as param3,
t4.param as param4,
. . .
from single_param_table_1 t1 full join
single_param_table_2 t2
on t2.id = t1.id full join
single_param_table_3 t3
on t3.id = coalesce(t1.id, t2.id) full join
single_param_table_4 t4
on t4.id = coalesce(t1.id, t2.id, t3.id) full join
. . .
That is, you need lots of use of coalesce() so the ids match across the tables.
I should note that standard SQL and most databases support the using clause which simplifies this logic. However, SQL Server does not support using.
That simplifies your logic. However, your issue is that one or more tables have duplicate ids.
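If you want to experiment with the expected result without a SQL Server instance, here is a rough sketch in Python's sqlite3. Note that it does not use the FULL JOIN chain from this answer: older SQLite builds lack FULL JOIN, so it swaps in a different formulation (collect every id with UNION, then LEFT JOIN each table back). The sample values, including id 125 which appears only in tables 2 and 3, are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE single_param_table_1 (id INTEGER PRIMARY KEY, param_1 INTEGER);
    CREATE TABLE single_param_table_2 (id INTEGER PRIMARY KEY, param_2 INTEGER);
    CREATE TABLE single_param_table_3 (id INTEGER PRIMARY KEY, param_3 INTEGER);
    INSERT INTO single_param_table_1 VALUES (123, 11);
    INSERT INTO single_param_table_2 VALUES (123, 12);
    INSERT INTO single_param_table_2 VALUES (125, 52);  -- hypothetical: id only in tables 2 and 3
    INSERT INTO single_param_table_3 VALUES (124, 23);
    INSERT INTO single_param_table_3 VALUES (125, 53);
""")

# UNION removes duplicates, so every id appears exactly once in "ids";
# each LEFT JOIN then adds at most one row per id because id is a primary key.
joined = conn.execute("""
    SELECT ids.id, t1.param_1, t2.param_2, t3.param_3
    FROM (SELECT id FROM single_param_table_1
          UNION
          SELECT id FROM single_param_table_2
          UNION
          SELECT id FROM single_param_table_3) AS ids
    LEFT JOIN single_param_table_1 AS t1 ON t1.id = ids.id
    LEFT JOIN single_param_table_2 AS t2 ON t2.id = ids.id
    LEFT JOIN single_param_table_3 AS t3 ON t3.id = ids.id
    ORDER BY ids.id
""").fetchall()

for row in joined:
    print(row)
```

With unique ids per table this yields one row per id, which is the shape the coalesce() chain above is meant to produce. If any single table contains the same id twice, both approaches would still produce repeated ids.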

How to combine two columns one below the other in SQLite?

I have a query in SQLite to get all the stations in a train route that, in the end, leaves me with two columns named ORI_STATION_ID and DEST_STATION_ID.
This is my current query (ORDER_COL is not shown in the result table, it is only used for ordering):
SELECT table_3.ORI_STATION_ID, table_3.DEST_STATION_ID
FROM table_1
INNER JOIN table_2
ON table_2.ID = table_1.ID
INNER JOIN table_3
ON table_3.ID = table_2.ID
WHERE table_1.CODE = "DINO"
ORDER BY ORDER_COL ASC
I want to get all the stations in the train route, from the first to the last one. Here, the first is the one with ID 1 and the last has ID 19.
I could take the first column and ready to go, but then I would miss the last station. If I took the second column, I would miss the first station.
What I want to do: combine the two columns in one putting one below the other and remove the duplicated stations, so my ALL_STATIONS_COLUMN would look like:
| ALL_STATIONS_COLUMN |
-----------------------
| 1 |
| 2 |
| 9 |
| 10 |
| 11 |
| 19 |
I have seen in other posts how to use joins to combine tables or CONCAT (which doesn't seem to be available in SQLite), but I didn't find the proper way.
A union query would be one option here:
SELECT t3.ORI_STATION_ID AS ALL_STATIONS_COLUMN
FROM table_1 t1
INNER JOIN table_2 t2 ON t2.ID = t1.ID
INNER JOIN table_3 t3 ON t3.ID = t2.ID
WHERE t1.CODE = 'DINO'
UNION
SELECT t3.DEST_STATION_ID
FROM table_1 t1
INNER JOIN table_2 t2 ON t2.ID = t1.ID
INNER JOIN table_3 t3 ON t3.ID = t2.ID
WHERE t1.CODE = 'DINO'
ORDER BY ALL_STATIONS_COLUMN;
This kills two birds at the same time (splat), because the union brings together the origin and destination station IDs, while also removing duplicate values should they occur between the two sets.
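A quick way to check the deduplication is to run the same idea from Python's sqlite3. This is only a sketch: it collapses the three-table join into a single hypothetical route table holding the origin/destination pairs, with legs chosen to match the station IDs in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE route (ori_station_id INTEGER, dest_station_id INTEGER)")
conn.executemany(
    "INSERT INTO route VALUES (?, ?)",
    [(1, 2), (2, 9), (9, 10), (10, 11), (11, 19)],
)

# UNION stacks the two columns and removes duplicate station IDs.
union_rows = conn.execute("""
    SELECT ori_station_id AS all_stations_column FROM route
    UNION
    SELECT dest_station_id FROM route
    ORDER BY all_stations_column
""").fetchall()
stations = [row[0] for row in union_rows]

# UNION ALL keeps the duplicates, for comparison.
union_all_count = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT ori_station_id FROM route
        UNION ALL
        SELECT dest_station_id FROM route
    )
""").fetchone()[0]

print(stations)
print(union_all_count)
```

Keep in mind that the ORDER BY sorts by station ID, which matches the travel order only when the IDs happen to increase along the route.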
Sounds like a case for a recursive CTE (Example just uses your final route table; I'm not going to try to come up with three tables worth of data; easy to extend the following with the query that computes that route as another CTE):
WITH cte AS
(SELECT 1 AS all_stations_column, 1 AS stop
UNION
SELECT r.dest_station_id, cte.stop + 1
FROM route AS r
JOIN cte ON r.ori_station_id = cte.all_stations_column)
SELECT all_stations_column FROM cte ORDER BY stop;
giving
all_stations_column
-------------------
1
2
9
10
11
19
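The recursive version matters mostly when station IDs do not increase along the route. Here is a small sketch in Python's sqlite3 with a made-up route 1, 5, 3, 8, assuming (as the query above does) that the starting station is known to be 1 and that the route has no loops.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE route (ori_station_id INTEGER, dest_station_id INTEGER)")
# Hypothetical legs, deliberately inserted out of order.
conn.executemany("INSERT INTO route VALUES (?, ?)", [(5, 3), (1, 5), (3, 8)])

# Walk the route leg by leg, counting stops so the result can be ordered by travel order.
travel_order = [row[0] for row in conn.execute("""
    WITH RECURSIVE cte(all_stations_column, stop) AS (
        SELECT 1, 1
        UNION
        SELECT r.dest_station_id, cte.stop + 1
        FROM route AS r
        JOIN cte ON r.ori_station_id = cte.all_stations_column
    )
    SELECT all_stations_column FROM cte ORDER BY stop
""")]

# The plain UNION approach can only sort by station ID.
sorted_by_id = [row[0] for row in conn.execute("""
    SELECT ori_station_id AS s FROM route
    UNION
    SELECT dest_station_id FROM route
    ORDER BY s
""")]

print(travel_order)
print(sorted_by_id)
```

The first list follows the legs from station 1 onward, while the second is merely sorted numerically.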

SQL Server : removing duplicate column while joining tables

I have 4 tables with one column that is common to all of them. Is there a way to create a view where I can join all tables on that column and see the common column only once?
Let's say I have table1
Cust ID | Order ID | Product_Name
Table2
Cust_ID | Cust_Name | Cust_Address
Table3
Cust_ID | Cust_Acc | Acc_Type
Table4
Cust_ID | Contact_Phone | Cust_Total_Ord
Here is the code I use to join tables;
SELECT *
FROM table1
LEFT JOIN table2 ON table1.Cust_ID = table2.Cust_ID
LEFT JOIN table3 ON table2.Cust_ID = table3.Cust_ID
LEFT JOIN table4 ON table3.Cust_ID = table4.Cust_ID
I get all tables joined, but I see Cust_ID from each table, as below:
Cust ID| Order ID|Product_Name| Cust_ID| Cust_Name|Cust_Address| Cust_ID| Cust_Acc| Acc_Type|Cust_ID|Contact_Phone|Cust_Total_Ord
Is there a way to remove duplicate Cust_ID columns or do I need to write each column name in the SELECT? I have more than 50 columns in total so will be difficult to write all.
Sorry if it is a really dumb question, I have checked previous similar questions but couldn't figure out and thanks for help.
You have a common column on all tables, so you could use using(common_column) to remove the duplicated columns.
SELECT *
FROM table1
LEFT JOIN table2 using(Cust_ID)
LEFT JOIN table3 using(Cust_ID)
LEFT JOIN table4 using(Cust_ID)
I hope that is useful.
You need to select the columns from the three tables explicitly and then inner join them, like below:
select
t1.cust_id, t1.col1, t1.col2,
t2.col1_table2, t2.col2_table2,
t3.col1_table3, t3.col2_table3
from
table1 t1
inner join
table2 t2 on t1.cust_id = t2.cust_id
join table3 t3 on t1.cust_id = t3.cust_id
No, you cannot easily do what you want in SQL Server. In other databases, you can use the using clause.
One thing you can do is select the columns explicitly from all but the first table:
SELECT t1.*, . . .
FROM table1 t1 LEFT JOIN
table2 t2
ON t1.Cust_ID = t2.Cust_ID LEFT JOIN
table3
ON t1.Cust_ID = table3.Cust_ID LEFT JOIN
table4
ON t1.Cust_ID = table4.Cust_ID;
Perhaps more important than the column issue, I changed the join conditions. You are using LEFT JOIN, so the first table is the "driving" table. When you say t2.Cust_ID = t3.Cust_Id, this returns true only when there was a match to table2. In general, you want to use the columns from table1, because it is the first one in the chain of LEFT JOINs.
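The difference between chaining the conditions and anchoring them on the first table is easy to reproduce. Below is a minimal sketch using Python's sqlite3 rather than SQL Server; the tables are cut down to a few hypothetical columns, and customer 2 deliberately has no row in table2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (cust_id INTEGER, order_id INTEGER);
    CREATE TABLE table2 (cust_id INTEGER, cust_name TEXT);
    CREATE TABLE table3 (cust_id INTEGER, acc_type TEXT);
    INSERT INTO table1 VALUES (1, 100);
    INSERT INTO table1 VALUES (2, 200);
    INSERT INTO table2 VALUES (1, 'Ann');      -- customer 2 is missing here
    INSERT INTO table3 VALUES (1, 'gold');
    INSERT INTO table3 VALUES (2, 'basic');
""")

# Chained: table3 is matched through table2, so a miss in table2 also hides table3.
chained = conn.execute("""
    SELECT t1.cust_id, t2.cust_name, t3.acc_type
    FROM table1 AS t1
    LEFT JOIN table2 AS t2 ON t1.cust_id = t2.cust_id
    LEFT JOIN table3 AS t3 ON t2.cust_id = t3.cust_id
    ORDER BY t1.cust_id
""").fetchall()

# Anchored: every join condition refers back to the driving table.
anchored = conn.execute("""
    SELECT t1.cust_id, t2.cust_name, t3.acc_type
    FROM table1 AS t1
    LEFT JOIN table2 AS t2 ON t1.cust_id = t2.cust_id
    LEFT JOIN table3 AS t3 ON t1.cust_id = t3.cust_id
    ORDER BY t1.cust_id
""").fetchall()

print(chained)
print(anchored)
```

Only the anchored version picks up the account type for customer 2.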

SQL(ite) JOIN with regex within JOIN

I have two tables both with a column called Name. Sometimes the names begin with an uppercase letter whereas other times not. I would like to join the two tables on the names so that bob's match the Bob's. I assume that this could be possible with regular expressions, so how can one construct an SQL JOIN query that does do this match and what is the correct regex?
For example: Say I have table1 as :
Name| col1
----------
Bob | a
Jon | b
and table 2 as:
Name| col2
----------
bob| c
Jon| d
I would join them as follows (with xx being the missing regex and yy being the correct selection)
SELECT * , yy as NameWithCapAtFront
FROM table1 as t1
LEFT JOIN table2 as t2
ON xx(t1.Name)=xx(t2.Name)
but this misses the bob match with Bob.
Further, how would one always select capitalised version of the Name.
How about just using the lower() function?
SELECT *, upper(substr(t1.name, 1, 1)) || lower(substr(t1.name, 2)) as NameWithCapAtFront
FROM table1 as t1 LEFT JOIN
table2 as t2
ON lower(t1.Name) = lower(t2.Name);
Admittedly, this lower cases the whole name, but that seems reasonable in this case.
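Here is a small sketch that runs this kind of query through Python's sqlite3 on the sample data from the question. The row 'aLICE' is an extra, hypothetical one added to show the capitalisation and the unmatched case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (Name TEXT, col1 TEXT);
    CREATE TABLE table2 (Name TEXT, col2 TEXT);
    INSERT INTO table1 VALUES ('Bob', 'a');
    INSERT INTO table1 VALUES ('Jon', 'b');
    INSERT INTO table1 VALUES ('aLICE', 'e');  -- hypothetical row with no match
    INSERT INTO table2 VALUES ('bob', 'c');
    INSERT INTO table2 VALUES ('Jon', 'd');
""")

# Compare lower-cased names in the ON clause; build the capitalised name
# from the first character (upper) and the rest of the string (lower).
rows = conn.execute("""
    SELECT upper(substr(t1.Name, 1, 1)) || lower(substr(t1.Name, 2)) AS NameWithCapAtFront,
           t1.col1,
           t2.col2
    FROM table1 AS t1
    LEFT JOIN table2 AS t2 ON lower(t1.Name) = lower(t2.Name)
    ORDER BY NameWithCapAtFront
""").fetchall()

for row in rows:
    print(row)
```

One caveat: SQLite's built-in lower() and upper() only fold ASCII letters by default, so names with accented characters would need extra care.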

Three table join with joins other than INNER JOIN

I am learning SQL and am trying to learn JOINs this week.
I have gotten to the level where I can do three table joins, similar to a lot of examples I've seen. I'm still trying to figure out the tiny details of how things work. All the examples I've seen of three table joins use INNER JOINS only. What about LEFT and RIGHT JOINs? Do you ever use these in three table joins? What would it mean?
SELECT ~some columns~ FROM ~table name~
LEFT JOIN ~table 2~ ON ~criteria~
INNER JOIN ~table 3~ ON ~criteria~
or
SELECT ~some columns~ FROM ~table name~
INNER JOIN ~table 2~ ON ~criteria~
LEFT JOIN ~table 3~ ON ~criteria~
or
SELECT ~some columns~ FROM ~table name~
LEFT JOIN ~table 2~ ON ~criteria~
LEFT JOIN ~table 3~ ON ~criteria~
or
???
Just trying to explore the space as much as possible
Yes, I do use all three of those JOINs, although I tend to stick to using just LEFT (OUTER) JOINs instead of inter-mixing LEFT and RIGHT JOINs. I also use FULL OUTER JOINs and CROSS JOINs.
In summary, an INNER JOIN restricts the resultset only to those records satisfied by the JOIN condition. Consider the following tables
EDIT: I've renamed the tables and prefixed them with # so that temporary tables can be used by anyone reading this answer and wanting to experiment.
If you'd also like to experiment with this in the browser, I've set this all up on SQL Fiddle too;
#Table1
id | name
---------
1 | One
2 | Two
3 | Three
4 | Four
#Table2
id | name
---------
1 | Partridge
2 | Turtle Doves
3 | French Hens
5 | Gold Rings
SQL code
CREATE TABLE #Table1 (id INT PRIMARY KEY CLUSTERED, [name] VARCHAR(25));
INSERT INTO #Table1 VALUES(1, 'One');
INSERT INTO #Table1 VALUES(2, 'Two');
INSERT INTO #Table1 VALUES(3, 'Three');
INSERT INTO #Table1 VALUES(4, 'Four');
CREATE TABLE #Table2 (id INT PRIMARY KEY CLUSTERED, [name] VARCHAR(25));
INSERT INTO #Table2 VALUES(1, 'Partridge');
INSERT INTO #Table2 VALUES(2, 'Turtle Doves');
INSERT INTO #Table2 VALUES(3, 'French Hens');
INSERT INTO #Table2 VALUES(5, 'Gold Rings');
An INNER JOIN SQL Statement, joined on the id field
SELECT
t1.id,
t1.name,
t2.name
FROM
#Table1 t1
INNER JOIN
#Table2 t2
ON
t1.id = t2.id
Results in
id | name | name
----------------
1 | One | Partridge
2 | Two | Turtle Doves
3 | Three| French Hens
A LEFT JOIN will return a resultset with all records from the table on the left hand side of the join (if you were to write out the statement as a one liner, the table that appears first) and fields from the table on the right side of the join that match the join expression and are included in the SELECT clause. Missing details will be populated with NULL
SELECT
t1.id,
t1.name,
t2.name
FROM
#Table1 t1
LEFT JOIN
#Table2 t2
ON
t1.id = t2.id
Results in
id | name | name
----------------
1 | One | Partridge
2 | Two | Turtle Doves
3 | Three| French Hens
4 | Four | NULL
A RIGHT JOIN is the same logic as a LEFT JOIN but will return all records from the right-hand side of the join and fields from the left side that match the join expression and are included in the SELECT clause.
SELECT
t1.id,
t1.name,
t2.name
FROM
#Table1 t1
RIGHT JOIN
#Table2 t2
ON
t1.id = t2.id
Results in
id | name | name
----------------
1 | One | Partridge
2 | Two | Turtle Doves
3 | Three| French Hens
NULL| NULL| Gold Rings
Of course, there is also the FULL OUTER JOIN, which includes records from both joined tables and populates any missing details with NULL.
SELECT
t1.id,
t1.name,
t2.name
FROM
#Table1 t1
FULL OUTER JOIN
#Table2 t2
ON
t1.id = t2.id
Results in
id | name | name
----------------
1 | One | Partridge
2 | Two | Turtle Doves
3 | Three| French Hens
4 | Four | NULL
NULL| NULL| Gold Rings
And a CROSS JOIN (also known as a CARTESIAN PRODUCT), which pairs every row from one table with every row from the other table. Notice that there is no join expression in a CROSS JOIN
SELECT
t1.id,
t1.name,
t2.name
FROM
#Table1 t1
CROSS JOIN
#Table2 t2
Results in
id | name | name
------------------
1 | One | Partridge
2 | Two | Partridge
3 | Three | Partridge
4 | Four | Partridge
1 | One | Turtle Doves
2 | Two | Turtle Doves
3 | Three | Turtle Doves
4 | Four | Turtle Doves
1 | One | French Hens
2 | Two | French Hens
3 | Three | French Hens
4 | Four | French Hens
1 | One | Gold Rings
2 | Two | Gold Rings
3 | Three | Gold Rings
4 | Four | Gold Rings
EDIT:
Imagine there is now a Table3
#Table3
id | name
---------
2 | Prime 1
3 | Prime 2
5 | Prime 3
The SQL code
CREATE TABLE #Table3 (id INT PRIMARY KEY CLUSTERED, [name] VARCHAR(25));
INSERT INTO #Table3 VALUES(2, 'Prime 1');
INSERT INTO #Table3 VALUES(3, 'Prime 2');
INSERT INTO #Table3 VALUES(5, 'Prime 3');
Now all three tables joined with INNER JOINS
SELECT
t1.id,
t1.name,
t2.name,
t3.name
FROM
#Table1 t1
INNER JOIN
#Table2 t2
ON
t1.id = t2.id
INNER JOIN
#Table3 t3
ON
t1.id = t3.id
Results in
id | name | name | name
-------------------------------
2 | Two | Turtle Doves | Prime 1
3 | Three| French Hens | Prime 2
It might help to understand this result by noting that ids 2 and 3 are the only ones common to all 3 tables, and id is the field we are joining each table on.
Now all three with LEFT JOINS
SELECT
t1.id,
t1.name,
t2.name,
t3.name
FROM
#Table1 t1
LEFT JOIN
#Table2 t2
ON
t1.id = t2.id
LEFT JOIN
#Table3 t3
ON
t1.id = t3.id
Results in
id | name | name | name
-------------------------------
1 | One | Partridge | NULL
2 | Two | Turtle Doves | Prime 1
3 | Three| French Hens | Prime 2
4 | Four | NULL | NULL
Joel's answer is a good explanation for explaining this resultset (Table1 is the base/origin table).
Now with an INNER JOIN and a LEFT JOIN
SELECT
t1.id,
t1.name,
t2.name,
t3.name
FROM
#Table1 t1
INNER JOIN
#Table2 t2
ON
t1.id = t2.id
LEFT JOIN
#Table3 t3
ON
t1.id = t3.id
Results in
id | name | name | name
-------------------------------
1 | One | Partridge | NULL
2 | Two | Turtle Doves | Prime 1
3 | Three| French Hens | Prime 2
Although we do not know the order in which the query optimiser will perform the operations, we will look at this query from top to bottom to understand the resultset. The INNER JOIN on ids between Table1 and Table2 will restrict the resultset to only those records satisfied by the join condition i.e. the three rows that we saw in the very first example. This temporary resultset will then be LEFT JOINed to Table3 on ids between Table1 and Table3. There are records in Table3 with id 2 and 3, but not id 1, so the t3.name field will have details in for 2 and 3 but not 1.
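For anyone without SQL Server to hand, the three-table combinations above can be replayed with Python's sqlite3. This is just a sketch: the tables are ordinary SQLite tables named table1 to table3 (no # prefix), and the helper three_table_rows is a hypothetical convenience that swaps the join keywords in and out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table3 (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table1 VALUES (1, 'One'), (2, 'Two'), (3, 'Three'), (4, 'Four');
    INSERT INTO table2 VALUES (1, 'Partridge'), (2, 'Turtle Doves'),
                              (3, 'French Hens'), (5, 'Gold Rings');
    INSERT INTO table3 VALUES (2, 'Prime 1'), (3, 'Prime 2'), (5, 'Prime 3');
""")


def three_table_rows(first_join, second_join):
    """Run the three-table query with the given join keywords (INNER or LEFT)."""
    sql = f"""
        SELECT t1.id, t2.name, t3.name
        FROM table1 AS t1
        {first_join} JOIN table2 AS t2 ON t1.id = t2.id
        {second_join} JOIN table3 AS t3 ON t1.id = t3.id
        ORDER BY t1.id
    """
    return conn.execute(sql).fetchall()


inner_inner = three_table_rows("INNER", "INNER")
left_left = three_table_rows("LEFT", "LEFT")
inner_left = three_table_rows("INNER", "LEFT")

for label, rows in [("INNER/INNER", inner_inner),
                    ("LEFT/LEFT", left_left),
                    ("INNER/LEFT", inner_left)]:
    print(label, rows)
```

The row counts (2, 4 and 3) line up with the three result tables shown above.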
Joins are just ways of combining tables. Joining three tables is no different than joining 2... or 200. You can mix and match INNER, [LEFT/RIGHT/FULL] OUTER, and even CROSS joins as much as you want. The only difference is which results are kept: INNER joins only keep rows where both sides match the expression. OUTER joins pick an "origin" table depending on the LEFT/RIGHT/FULL specification, always keep all rows from the origin table, and supply NULL values for rows from the other side that don't match the expression. CROSS joins return all possible combinations of both sides.
The trick is that because you're working with declarative code rather than more-familiar iterative, the temptation is to try to think of it as if everything happens at once. When you do that, you try to wrap your head around the entire query and it can get confusing.
Instead, you want to think of it as if the joins happen in order, from the first table listed to the last. This actually is not how it works, because the query optimizer can re-order things to make them run faster. But it makes building the query easier for the developer.
So with three tables, you start with your base table, then join in the values you need from the next table, and the next, and so on, just like adding lines of code to a function to produce the required output.
As for using the different join types, I've used all the different types I listed here: INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER, and CROSS. But most of those you only need to use occasionally. INNER JOIN and LEFT JOIN will cover probably 95% or more of what you want to do.
Now let's talk about performance. Often times the order you list tables is dictated to you: you start from TableA and you need to list TableB first in order to have access to columns required to join in TableC. But sometimes both TableB and TableC only depend on TableA, and you could list them in either order. When that happens the query optimizer will usually pick the best order for you, but sometimes it doesn't know how. Even if it did, it helps to have a good system for listing tables so you can always look at a query and know that it's "right".
With that in mind, you should think of a query in terms of the working set currently in memory as the query builds. When you start with TableA, the database looks at all the columns from TableA in the select list or anywhere else (like WHERE or ORDER BY clauses, or potential indexes) in the query, factors in relevant conditions from the WHERE clause, and loads the smallest portion of that table into memory that it can get away with. It does this for each table in turn, always loading as little as possible. And that's the key: you want to keep this working set as small as possible for as long as possible.
So, going back to our three-table join, we want to list the tables in the order that will keep the working set smaller for longer. This means listing the smaller table above the larger one. Another good rule of thumb is that INNER joins tend to shrink result sets, while OUTER joins tend to grow result sets, and so you want to list your INNER joins first. However, this is not a requirement for a query to work, nor is it always true; sometimes the reverse can happen as well.
Finally, I want to point out again that this isn't how it really works. The query optimizer and execution plans are a very complex topic, and there are lots of tricks the database can take that break this model from time to time. It's just one model that you as a developer can use to help understand what the server is doing and help you write better queries.
Selecting from three tables is no different from selecting from only two (or as many as a hundred, though that would be a fairly ugly query to read).
For EACH join you write, having INNER indicates that you only want rows that successfully join those two tables together. If other tables were joined earlier in the query, those results are now completely irrelevant, except to the extent your own join conditions call on them.
For example:
SELECT person.*
FROM person
LEFT JOIN vehicle ON (person.person_id = vehicle.owner_id)
LEFT JOIN house ON (person.person_id = house.owner_id)
Here I want a list of all people, and (if available) all the vehicles and houses they own.
Alternatively:
SELECT person.*
FROM person
INNER JOIN vehicle ON (person.person_id = vehicle.owner_id)
LEFT JOIN house ON (person.person_id = house.owner_id)
Here I want all people who own vehicles (they must own a vehicle to get results in my query), and (if available) all the houses they own.
Each join is completely separate here.
Of course, by varying what you put in the ON clause, you can make joins interrelate tables any way you want.
This really depends on what you are doing. I've written many 3+ table queries that will have an outer join in them. It just depends on the data you are querying and what you are trying to follow.
The same general logic applies when selecting the join type when you have multiples as with single join queries.
For the sake of this example, let's say we have a table "employees" with ID, NAME and MANAGER_ID fields.
Here is a simple query:
SELECT E.ID, E.NAME, M.NAME AS MANAGER
FROM EMPLOYEES E
JOIN EMPLOYEES M ON E.MANAGER_ID = M.ID
This will return the employees, each with their manager's name. But what happens for the boss, who has no manager? A database null would actually prevent that row from being returned, as no matching record can be found to join on. Thus you would use an OUTER join (left or right depending on how you write the query).
The same logic would hold for writing a query with 2+n joins. If you are possibly going to have rows that don't have matches in your join clause, and want those rows to come back (albeit with nulls), then use an outer join and you are golden.
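The boss case is quick to try out. A minimal sketch in Python's sqlite3, with three made-up employees where Ann is the boss and has a NULL manager_id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES (1, 'Ann', NULL);  -- the boss: no manager
    INSERT INTO employees VALUES (2, 'Bob', 1);
    INSERT INTO employees VALUES (3, 'Cy', 1);
""")

# Inner self-join: the boss disappears because NULL never equals any id.
inner_rows = conn.execute("""
    SELECT e.id, e.name, m.name AS manager
    FROM employees AS e
    JOIN employees AS m ON e.manager_id = m.id
    ORDER BY e.id
""").fetchall()

# Left outer self-join: the boss is kept, with NULL as the manager name.
left_rows = conn.execute("""
    SELECT e.id, e.name, m.name AS manager
    FROM employees AS e
    LEFT JOIN employees AS m ON e.manager_id = m.id
    ORDER BY e.id
""").fetchall()

print(inner_rows)
print(left_rows)
```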
Read this great article on outer joins by a well known expert Terry Purcell
also a great write up by Plamen Ratchev
On some SQL engines there's an issue when you chain further joins onto a left join.
If you join A->B->C and the row in B doesn't exist then the join column from B is NULL.
A few I've used require that the join from B->C must be a left join if the join from A->B is a left join.
This is ok
select a.*, b.*, c.*
from a
left join b on b.id = a.id
left join c on c.id = b.id
this is not
select a.*, b.*, c.*
from a
left join b on b.id = a.id
inner join c on c.id = b.id
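To see what happens to the unmatched row, here is a small sketch in Python's sqlite3 with made-up one-column tables. SQLite accepts both queries without complaint, so this only illustrates the effect on the result, not any engine-specific restriction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    CREATE TABLE c (id INTEGER);
    INSERT INTO a VALUES (1);
    INSERT INTO a VALUES (2);  -- no matching row in b
    INSERT INTO b VALUES (1);
    INSERT INTO c VALUES (1);
""")

# Left join all the way down: the row for a.id = 2 survives with NULLs.
left_then_left = conn.execute("""
    SELECT a.id, b.id, c.id
    FROM a
    LEFT JOIN b ON b.id = a.id
    LEFT JOIN c ON c.id = b.id
    ORDER BY a.id
""").fetchall()

# Inner join onto b: b.id is NULL for a.id = 2, so c.id = b.id never matches
# and the row is dropped, which cancels out the earlier left join.
left_then_inner = conn.execute("""
    SELECT a.id, b.id, c.id
    FROM a
    LEFT JOIN b ON b.id = a.id
    INNER JOIN c ON c.id = b.id
    ORDER BY a.id
""").fetchall()

print(left_then_left)
print(left_then_inner)
```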
For the sake of completeness and standards evangelism, I'll chime in with the ANSI-92 nested join syntax:
select t1.*
,t2.*
,t3.*
from table1 t1
left outer join (
table2 t2 left outer join table3 t3 on (t2.b = t3.b)
) on (t1.a = t2.a)
Your SQL engine of choice may optimize for them.