How to get matching data from another SQL table for two different columns: Inner Join and/or Union?

I've got two tables in MS Access that keep track of class facilitators and the classes they facilitate. The two tables are structured as follows:
tbl_facilitators
facilID -> a unique autonumber to keep track of individual teachers
facilLname -> the Last name of the facilitator
facilFname -> the First name of the facilitator
tbl_facilitatorClasses
classID -> a unique autonumber to keep track of individual classes
className -> the name of the class (science, math, etc)
primeFacil -> the facilID from the first table of a teacher who is primary facilitator
secondFacil -> the facilID from the first table of another teacher who is backup facilitator
I cannot figure out how to write an Inner Join that pulls up the results in this format:
Column 1: Class Name
Column 2: Primary Facilitator's Last Name
Column 3: Primary Facilitator's First Name
Column 4: Secondary Facilitator's Last Name
Column 5: Secondary Facilitator's First Name
I am able to pull up and get the correct results if I only request the primary facilitator by itself or only request the secondary facilitator by itself. I cannot get them both to work out, though.
This is my working Inner Join:
SELECT tbl_facilitatorClasses.className,
tbl_facilitators.facilLname, tbl_facilitators.facilFname
FROM tbl_facilitatorClasses
INNER JOIN tbl_facilitators
ON tbl_facilitatorClasses.primeFacil = tbl_facilitators.facilID;
Out of desperation I also tried a Union, but it didn't work out as I had hoped. Your help is greatly appreciated. I'm really struggling to make any progress at this point. I don't often work with SQL.
SOLUTION
Thanks to @philipxy, I came up with the following query, which ended up working:
SELECT tblCLS.className,
tblP.facilLname, tblP.facilFname, tblS.facilLname, tblS.facilFname
FROM (tbl_facilitatorClasses AS tblCLS
INNER JOIN tbl_facilitators AS tblP
ON tblCLS.primeFacil=tblP.facilID)
INNER JOIN tbl_facilitators AS tblS
ON tblCLS.secondFacil=tblS.facilID;
When performing multiple inner joins in MS Access, parentheses are needed, as described in this other post.
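The solution query can be tried outside Access with a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database. The table layout follows the question; the rows are made up for illustration. SQLite does not require the parentheses that Access needs, so they are omitted here.

```python
import sqlite3

# In-memory database with the question's table layout; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_facilitators (
    facilID    INTEGER PRIMARY KEY,
    facilLname TEXT,
    facilFname TEXT
);
CREATE TABLE tbl_facilitatorClasses (
    classID     INTEGER PRIMARY KEY,
    className   TEXT,
    primeFacil  INTEGER,
    secondFacil INTEGER
);
INSERT INTO tbl_facilitators VALUES (1, 'Doe', 'Jane'), (2, 'Roe', 'Rick'), (3, 'Poe', 'Ann');
INSERT INTO tbl_facilitatorClasses VALUES (10, 'math', 1, 2), (11, 'science', 3, 1);
""")

# tbl_facilitators is joined twice under two aliases:
# tblP resolves primeFacil, tblS resolves secondFacil.
rows = conn.execute("""
SELECT tblCLS.className,
       tblP.facilLname, tblP.facilFname,
       tblS.facilLname, tblS.facilFname
FROM tbl_facilitatorClasses AS tblCLS
INNER JOIN tbl_facilitators AS tblP ON tblCLS.primeFacil = tblP.facilID
INNER JOIN tbl_facilitators AS tblS ON tblCLS.secondFacil = tblS.facilID
ORDER BY tblCLS.className
""").fetchall()

for row in rows:
    print(row)
# ('math', 'Doe', 'Jane', 'Roe', 'Rick')
# ('science', 'Poe', 'Ann', 'Doe', 'Jane')
```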

(The following applies when every row is SQL DISTINCT, and outside SQL code similarly treats NULL like just another value.)
Every base table has a statement template, aka predicate, parameterized by column names, by which we put a row in or leave it out. We can use a (standard predicate logic) shorthand for the predicate that is like its SQL declaration.
-- facilitator [facilID] is named [facilFname] [facilLname]
facilitator(facilID, facilLname, facilFname)
-- class [classID] named [className] has prime [primeFacil] & backup [secondFacil]
class(classID, className, primeFacil, secondFacil)
Plugging a row into a predicate gives a statement aka proposition. The rows that make a true proposition go in a table and the rows that make a false proposition stay out. (So a table states the proposition of each present row and states NOT the proposition of each absent row.)
-- facilitator f1 is named Jane Doe
facilitator(f1, 'Doe', 'Jane')
-- class c1 named CSC101 has prime f1 & backup f8
class(c1, 'CSC101', f1, f8)
But every table expression value has a predicate per its expression. SQL is designed so that if tables T and U hold the (NULL-free non-duplicate) rows where T(...) and U(...) (respectively) then:
T CROSS JOIN U holds rows where T(...) AND U(...)
T INNER JOIN U ON condition holds rows where T(...) AND U(...) AND condition
T LEFT JOIN U ON condition holds rows where (for U-only columns U1,...)
    T(...) AND U(...) AND condition
OR T(...)
    AND NOT there EXISTS values for U1,... where [U(...) AND condition]
    AND U1 IS NULL AND ...
T WHERE condition holds rows where T(...) AND condition
T INTERSECT U holds rows where T(...) AND U(...)
T UNION U holds rows where T(...) OR U(...)
T EXCEPT U holds rows where T(...) AND NOT U(...)
SELECT DISTINCT * FROM T holds rows where T(...)
SELECT DISTINCT columns to keep FROM T holds rows where
there EXISTS values for columns to drop where T(...)
VALUES (C1, C2, ...)((v1,v2, ...), ...) holds rows where
C1 = v1 AND C2 = v2 AND ... OR ...
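The correspondence between set operators and logical connectives can be checked on a toy example. This is a minimal sketch using Python's sqlite3 module with two hypothetical one-column tables T and U (no NULLs, no duplicates, as assumed above); each query's result is compared with a set built directly from the predicate.

```python
import sqlite3

# Two hypothetical one-column tables with distinct, non-NULL rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (x INTEGER);
CREATE TABLE U (x INTEGER);
INSERT INTO T VALUES (1), (2), (3);
INSERT INTO U VALUES (2), (3), (4);
""")

def query(sql):
    """Run a one-column query and return its rows as a set."""
    return {r[0] for r in conn.execute(sql)}

t = query("SELECT x FROM T")
u = query("SELECT x FROM U")
domain = t | u  # every value that could satisfy either predicate

# UNION ~ OR, INTERSECT ~ AND, EXCEPT ~ AND NOT
union_ok = query("SELECT x FROM T UNION SELECT x FROM U") == \
    {v for v in domain if v in t or v in u}
intersect_ok = query("SELECT x FROM T INTERSECT SELECT x FROM U") == \
    {v for v in domain if v in t and v in u}
except_ok = query("SELECT x FROM T EXCEPT SELECT x FROM U") == \
    {v for v in domain if v in t and v not in u}

print(union_ok, intersect_ok, except_ok)  # True True True
```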
Also:
(...) IN T means T(...)
scalar = T means T(scalar)
T(..., X, ...) AND X = Y means T(..., Y, ...) AND X = Y
So to query we find a way of phrasing the predicate for the rows that we want in natural language using base table predicates, then in shorthand using base table predicates, then in shorthand using aliases in column names except for output columns, then in SQL using base table names plus ON & WHERE conditions etc. If we need to mention a base table twice then we give it aliases.
-- natural language
there EXISTS values for classID, primeFacil & secondFacil where
class [classID] named [className]
has prime [primeFacil] & backup [secondFacil]
AND facilitator [primeFacil] is named [pf.facilFname] [pf.facilLname]
AND facilitator [secondFacil] is named [sf.facilFname] [sf.facilLname]
-- shorthand
there EXISTS values for classID, primeFacil & secondFacil where
class(classID, className, primeFacil, secondFacil)
AND facilitator(pf.facilID, pf.facilLname, pf.facilFname)
AND pf.facilID = primeFacil
AND facilitator(sf.facilID, sf.facilLname, sf.facilFname)
AND sf.facilID = secondFacil
-- shorthand using aliases everywhere but result
-- use # to distinguish same-named result columns in specification
there EXISTS values for c.*, pf.*, sf.* where
className = c.className
AND facilLname#1 = pf.facilLname AND facilFname#1 = pf.facilFname
AND facilLname#2 = sf.facilLname AND facilFname#2 = sf.facilFname
AND class(c.classID, c.className, c.primeFacil, c.secondFacil)
AND facilitator(pf.facilID, pf.facilLname, pf.facilFname)
AND pf.facilID = c.primeFacil
AND facilitator(sf.facilID, sf.facilLname, sf.facilFname)
AND sf.facilID = c.secondFacil
-- table names & SQL (with MS Access parentheses)
SELECT c.className, pf.facilLname, pf.facilFname, sf.facilLname, sf.facilFname
FROM (tbl_facilitatorClasses AS c INNER JOIN tbl_facilitators AS pf ON pf.facilID = c.primeFacil)
INNER JOIN tbl_facilitators AS sf ON sf.facilID = c.secondFacil;
An OUTER JOIN would be used when a class doesn't always have both facilitators or a facilitator doesn't always have both names (i.e. if a column can be NULL). But you haven't given the specific predicates for your base tables and query, or the business rules about when things might be NULL, so I have assumed no NULLs.
Is there any rule of thumb to construct SQL query from a human-readable description?
(Re MS Access JOIN parentheses see this from SO and this from MS.)

Just do an extra join for the secondary facilitator (and please use table aliases!):
SELECT fc.className, f1.facilLname, f1.facilFname, f2.facilLname, f2.facilFname
FROM (tbl_facilitatorClasses AS fc
INNER JOIN tbl_facilitators AS f1 ON fc.primeFacil = f1.facilID)
INNER JOIN tbl_facilitators AS f2 ON fc.secondFacil = f2.facilID;

I would do it as above, by joining to the tbl_facilitators table twice, but you might want to make sure that every class really does require a second facilitator; if not, the second join should be an outer (LEFT) join instead. Indeed, it might be safer to assume that it's not a required field.
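A sketch of the difference, again with Python's sqlite3 as a stand-in and made-up rows in which one class has no backup facilitator: the INNER JOIN version drops that class, while the LEFT JOIN version keeps it with NULL (None) names. In MS Access the first join would still need to be wrapped in parentheses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_facilitators (facilID INTEGER PRIMARY KEY, facilLname TEXT, facilFname TEXT);
CREATE TABLE tbl_facilitatorClasses (classID INTEGER PRIMARY KEY, className TEXT,
                                     primeFacil INTEGER, secondFacil INTEGER);
INSERT INTO tbl_facilitators VALUES (1, 'Doe', 'Jane'), (2, 'Roe', 'Rick');
-- 'art' has no backup facilitator
INSERT INTO tbl_facilitatorClasses VALUES (10, 'math', 1, 2), (11, 'art', 2, NULL);
""")

# Only the join type of the second join varies between the two queries.
QUERY_TEMPLATE = """
SELECT fc.className, f1.facilLname, f1.facilFname, f2.facilLname, f2.facilFname
FROM tbl_facilitatorClasses AS fc
INNER JOIN tbl_facilitators AS f1 ON fc.primeFacil = f1.facilID
{second_join} tbl_facilitators AS f2 ON fc.secondFacil = f2.facilID
ORDER BY fc.className
"""

inner_rows = conn.execute(QUERY_TEMPLATE.format(second_join="INNER JOIN")).fetchall()
left_rows = conn.execute(QUERY_TEMPLATE.format(second_join="LEFT JOIN")).fetchall()

print(inner_rows)  # 'art' is missing
print(left_rows)   # 'art' is kept, with None for the backup's names
```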

Related

JOIN of 4 tables, how to restrict SELECT columns to one table only?

I am working on an ABAP program: the user inputs a value for column ANLAGE, and the output should be all records from table EADZ (and only the fields of EADZ) for that ANLAGE.
Statement and joins should work like this:
Input ANLAGE, find in table EASTL, gets LOGIKNR
Input LOGIKNR, find in table EGERR, gets EQUNR
Input EQUNR, find in table ETDZ, gets LOGIKZW
Input LOGIKZW, find in table EADZ, gets all records (this is the final output)
Here is the code I tried:
DATA: gt_cas_rezy TYPE STANDARD TABLE OF eadz,
lv_dummy_eanl LIKE eanl-anlage.
SELECT-OPTIONS: so_anl FOR lv_dummy_eanl NO INTERVALS NO-EXTENSION.
SELECT * FROM eadz
INNER JOIN etdz ON eadz~logikzw EQ etdz~logikzw
INNER JOIN egerr ON etdz~equnr EQ egerr~equnr
INNER JOIN eastl ON egerr~logiknr EQ eastl~logiknr
INTO CORRESPONDING FIELDS OF TABLE @gt_cas_rezy
WHERE eastl~anlage IN @so_anl.
I got the records from table EADZ, except that the date fields are empty (even though they are filled in the database table). I assume the problem is with the JOINs: in a statement like this I join all the fields of all 4 tables into one "record" and then move it into the corresponding fields of the internal table.
How to get the values of date fields?
You can find the answer in the documentation.
If a column name appears multiple times and no alternative column name was granted, the last column listed is assigned.
In your case, at least two tables share the same column name, so the values from the table listed last are the ones assigned to those fields.
You can solve this by listing the columns explicitly (or eadz~* in your case), giving an alias if required.
SELECT EADZ~* FROM EADZ INNER JOIN ETDZ ON EADZ~LOGIKZW = ETDZ~LOGIKZW
INNER JOIN EGERR ON ETDZ~EQUNR = EGERR~EQUNR
INNER JOIN EASTL ON EGERR~LOGIKNR = EASTL~LOGIKNR
INTO CORRESPONDING FIELDS OF TABLE @gt_cas_rezy
WHERE EASTL~ANLAGE IN @SO_ANL.
If you require additional fields, you can add them explicitly, e.g. EADZ~*, EASTL~A.
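The mechanism behind the empty date fields can be illustrated outside ABAP. This Python/sqlite3 analogy (not ABAP code; the tables head and link and the column ab are hypothetical) maps each result row to a dictionary by column name, so, as with INTO CORRESPONDING FIELDS, a duplicated column name keeps only the last value. Restricting the select list to one table's columns avoids the collision.

```python
import sqlite3

# Both hypothetical tables have a column "ab"; only head's value is wanted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE head (id INTEGER, ab TEXT);
CREATE TABLE link (id INTEGER, ab TEXT);
INSERT INTO head VALUES (1, '2020-01-01');
INSERT INTO link VALUES (1, NULL);
""")

def corresponding_fields(cursor, row):
    """Map a result row to {column name: value}; with a duplicated name,
    the later column overwrites the earlier one."""
    names = [d[0] for d in cursor.description]
    return dict(zip(names, row))

# SELECT * returns id, ab, id, ab: link's ab (NULL) wins.
cur = conn.execute("SELECT * FROM head INNER JOIN link ON head.id = link.id")
all_columns = corresponding_fields(cur, cur.fetchone())

# Selecting only head's columns keeps the filled-in value.
cur = conn.execute("SELECT head.* FROM head INNER JOIN link ON head.id = link.id")
head_only = corresponding_fields(cur, cur.fetchone())

print(all_columns)  # {'id': 1, 'ab': None}
print(head_only)    # {'id': 1, 'ab': '2020-01-01'}
```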

How can I make a SQL query that returns null if there is no record or return the value if there is?

I am trying to do a query on three different tables.
The variable table
The variable table carries information about what "area", "rounds" and
"days" the variable belongs to. The variable table also holds a pk column.
The pk is used to determine which variable a record belongs to.
The area table
The area table carries information about the "name" of the area as well as
the "role" the area belongs to. A user is assigned a role and then has
access to specific areas.
The record table
The record table carries information about the record that was entered. It
contains the "value", "alarmed", and "alarmType" columns. You can search
for a record based on the variable, round and day.
I am trying to query all of the variables in a certain round and day for a user.
I want to display all the variables whether or not there is a record found. Currently I have a query that only returns the variables that have records, but not the ones that don't.
If there is no record, then the value, alarmed, and alarmType columns should be null.
This is the query that I have so far constructed:
SELECT DISTINCT variable.name, area.name AS "areaName", variable.pk, CAST(record.value AS TEXT) AS "value", record.alarmed, record.alarmType
FROM variable, area, record
WHERE variable.round LIKE '%,1,%'
AND variable.day LIKE '%,3,%'
AND variable.readOnly = 0
AND variable.area IN (SELECT pk
FROM area
WHERE role = (SELECT role
FROM user
WHERE userName LIKE 'Leo'))
AND variable.area = area.pk
AND record.value = (SELECT CASE WHEN COUNT() < 1 THEN NULL
ELSE CAST(value AS TEXT) END
FROM record
WHERE round = 1
AND day = "11-14-2018"
AND variable = variable.pk)
ORDER BY variable.area, variable.position ASC;
Currently it returns something like this:
There are a lot more variables and I want to know how to display them even if there are no records.
I think I see what you're trying to do. The key is using joins (specifically OUTER joins) instead of trying to mash all the tables together and then find similarities. Joins come in INNER, LEFT, RIGHT and FULL flavors (read more about these here and here); which one you need depends on what you consider the "complete" or "master" data set, i.e. the starting point of your query.
Here's how I understand your relationships (let me know if I have this wrong):
record.variable --> variable.pk
variable.area --> area.pk
area.role --> user.role
In your case, you stated that you need all records from the variable table, so I would start with this:
SELECT v.*
FROM variable v;
Then, you might find all the AREA records related to a particular USER. Use an INNER join to find only records that exist on BOTH sides of the join:
SELECT a.*, u.*
FROM area a
INNER JOIN user u -- Define the table to join
ON a.role = u.role -- Which columns contain keys to match on
WHERE u.userName = 'Leo';
The WHERE filter applies to the user table, but because we are ONLY asking for records from the area table that have a match in user, it also limits the results from the area table.
The next step is to join these two extracts together using another INNER join, again, to find the intersection - matches that exist on BOTH sides of the join(s):
SELECT v.*, a.*, u.*
FROM variable v -- New starting point
INNER JOIN area a
ON a.pk = v.area
INNER JOIN user u
ON a.role = u.role
WHERE u.userName = 'Leo';
Now, we find all the records for a certain round and day by adding conditions to the WHERE clause:
SELECT v.*, a.*, u.*
FROM variable v
INNER JOIN area a
ON a.pk = v.area
INNER JOIN user u
ON a.role = u.role
WHERE u.userName = 'Leo'
AND v.round = 1 -- Add filters for "round"
AND v.day = '11-14-2018'; -- and "day" columns
Next, we use a LEFT join to give us all the records from the table on the "left" plus any matches we find on the "right" side (the "record" table) or NULL if no match is made:
SELECT v.name
,a.name as "areaName"
,CAST(r.value as TEXT) as "value"
,r.alarmed
,r.alarmType
FROM variable v
INNER JOIN area a
ON v.area = a.pk
INNER JOIN user u
ON a.role = u.role
LEFT JOIN record r -- LEFT is important here
ON v.pk = r.variable
WHERE u.userName = 'Leo'
AND v.round = 1
AND v.day = '11-14-2018'
ORDER BY v.area, v.position;
The result from INNER joins (variable + area + user) becomes the "left" side of this join, and the record becomes the "right" side. Using the LEFT join declares that we want ALL records from the left, whether they have a match on the right or not.
I don't have a dataset to test this with, so please excuse any errors I've made.
Hopefully, this illustrates how joins can be used both to eliminate rows and to add data (columns) to the result, without having to make individual queries or resort to subqueries (using IN or EXISTS).
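One detail worth checking: in the question, round and day are filtered on the record table (round = 1, day = "11-14-2018"). If they are record columns, those conditions belong in the LEFT JOIN's ON clause; putting them in WHERE throws away the NULL-extended rows and turns the LEFT JOIN back into an inner one. A minimal sketch with Python's sqlite3 and a simplified, hypothetical schema (area and user left out):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE variable (pk INTEGER PRIMARY KEY, name TEXT, position INTEGER);
CREATE TABLE record (variable INTEGER, round INTEGER, day TEXT,
                     value REAL, alarmed INTEGER, alarmType TEXT);
INSERT INTO variable VALUES (1, 'pressure', 1), (2, 'temperature', 2), (3, 'flow', 3);
-- only 'pressure' has a record for round 1 on 11-14-2018
INSERT INTO record VALUES (1, 1, '11-14-2018', 4.2, 0, NULL),
                          (2, 2, '11-14-2018', 7.0, 1, 'high');
""")

# Record filters in ON: every variable is kept; unmatched ones get NULLs.
in_on = conn.execute("""
SELECT v.name, r.value, r.alarmed, r.alarmType
FROM variable v
LEFT JOIN record r
  ON r.variable = v.pk AND r.round = 1 AND r.day = '11-14-2018'
ORDER BY v.position
""").fetchall()

# Same filters in WHERE: rows where r.round is NULL fail the test,
# so only variables with a matching record survive.
in_where = conn.execute("""
SELECT v.name, r.value, r.alarmed, r.alarmType
FROM variable v
LEFT JOIN record r ON r.variable = v.pk
WHERE r.round = 1 AND r.day = '11-14-2018'
ORDER BY v.position
""").fetchall()

print(in_on)     # pressure with its record, temperature and flow with None
print(in_where)  # [('pressure', 4.2, 0, None)]
```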

PostgreSQL - copy column from related table

So I have three tables: companies, addresses and company_address.
For optimization reasons I need to copy the city column from the addresses table to the companies table. The relation between companies and addresses is many-to-one (many companies can occupy the same address). They are connected through the company_address table, which consists of address_id and company_id columns.
I found this solution for case without intermediate table: How to copy one column of a table into another table's column in PostgreSQL comparing same ID
Trying to modify query I came up with:
UPDATE company SET company.city=foo.city
FROM (
SELECT company_address.company_id, company_address.address_id, address.city
FROM address LEFT JOIN company_address
ON address.id=company_address.address_id
) foo
WHERE company.id=foo.company_id;
but it gives error:
ERROR: column "company" of relation "company" does not exist
I can't figure out what is going on. I'll be grateful for any ideas.
You don't need a subquery for that. Also, in the SET clause, refer to the target table's columns without prefixing them with the table name.
I believe that since your WHERE condition references the joined table, it should be an INNER JOIN instead of a LEFT JOIN.
UPDATE company c
SET city = a.city
FROM address a
INNER JOIN company_address ca ON a.id = ca.address_id
WHERE c.id = ca.company_id;
Note how using aliases for table names shortens the code and makes it readable at the very first glance.
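SQLite only supports UPDATE ... FROM from version 3.33.0 on, so this sketch swaps in a correlated subquery, which is standard SQL and also works in PostgreSQL. It uses Python's sqlite3 and made-up rows; the EXISTS guard keeps companies without a linked address from having their city overwritten with NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address (id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE company_address (company_id INTEGER, address_id INTEGER);
INSERT INTO address VALUES (1, 'Berlin'), (2, 'Hamburg');
INSERT INTO company VALUES (10, 'Acme', NULL), (11, 'Globex', NULL), (12, 'Initech', 'Munich');
INSERT INTO company_address VALUES (10, 1), (11, 2);  -- Initech has no link
""")

# For each linked company, look up the city through the link table.
# The EXISTS guard leaves unlinked companies (Initech) untouched.
conn.execute("""
UPDATE company
SET city = (SELECT a.city
            FROM company_address ca
            INNER JOIN address a ON a.id = ca.address_id
            WHERE ca.company_id = company.id)
WHERE EXISTS (SELECT 1 FROM company_address ca WHERE ca.company_id = company.id)
""")

cities = conn.execute("SELECT name, city FROM company ORDER BY id").fetchall()
print(cities)  # [('Acme', 'Berlin'), ('Globex', 'Hamburg'), ('Initech', 'Munich')]
```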
You're almost right syntactically; you just mustn't qualify the target column with the table name in the SET clause:
UPDATE company SET city=foo.city
FROM (
SELECT company_address.company_id, company_address.address_id, address.city
FROM address LEFT JOIN company_address
ON address.id=company_address.address_id
) foo
WHERE company.id=foo.company_id;

Subquery that matches column with several ranges defined in table

I've got a pretty common setup for an address database: a person is tied to a company with a join table, the company can have an address and so forth.
All pretty normalized and easy to use. But for search performance, I'm creating a materialized, rather denormalized view. I only need a very limited set of information and quick queries. Most of everything that's usually done via a join table is now in an array. Depending on the query, I can either search it directly or join it via unnest.
As a complement to my zipcodes column (varchar[]), I'd like to add a states column that has the (German federal) states already precomputed, so that I don't have to transform a query to include all kinds of range comparisons.
My mapping data is in a table like this:
CREATE TABLE zip2state (
state TEXT NOT NULL,
range_start CHARACTER VARYING(5) NOT NULL,
range_end CHARACTER VARYING(5) NOT NULL
)
Each state has several ranges, and ranges can overlap (one zip code can be for two different states). Some ranges have range_start = range_end.
Now I'm a bit at wit's end on how to get that into a materialized view all at once. Normally, I'd feel tempted to just do it iteratively (via trigger or on the application level).
Or as we're just talking about 5 digits, I could create a big table mapping zip to state directly instead of doing it via a range (my current favorite, yet something ugly enough that it prompted me to ask whether there's a better way)
Is there any way to do that in SQL, with a table like the above (or something similar)? I'm on PostgreSQL 9.3; all features are allowed.
For completeness' sake, here's the subquery for the zip codes:
(select array_agg(distinct address.zipcode)
from affiliation
join company
on affiliation.ins_id = company.id
join address
on address.com_id = company.id
where affiliation.per_id = person.id) AS zipcodes,
I suggest a LATERAL join instead of the correlated subquery to conveniently compute both columns at once. Could look like this:
SELECT p.*, z.*
FROM person p
LEFT JOIN LATERAL (
SELECT array_agg(DISTINCT d.zipcode) AS zipcodes
, array_agg(DISTINCT z.state) AS states
FROM affiliation a
-- JOIN company c ON a.ins_id = c.id -- suspect you don't need this
JOIN address d ON d.com_id = a.ins_id -- c.id
LEFT JOIN zip2state z ON d.zipcode BETWEEN z.range_start AND z.range_end
WHERE a.per_id = p.id
) z ON true;
If referential integrity is guaranteed, you don't need to join to the table company at all. I took the shortcut.
Be aware that varchar and text compare character by character rather than numerically. For example: '333' > '0999'. If all zip codes have exactly 5 digits (leading zeros included), you are fine.
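SQLite has neither arrays nor LATERAL, so this sketch (Python's sqlite3, hypothetical ranges and state names A, B, C) only demonstrates the BETWEEN range join and collects the states per zip code in Python. It also shows the text-comparison caveat: an unpadded '1067' falls inside '10000'..'20999'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE zip2state (
    state       TEXT NOT NULL,
    range_start VARCHAR(5) NOT NULL,
    range_end   VARCHAR(5) NOT NULL
);
-- Hypothetical ranges: A and B overlap on 20000..20999; C is a single zip code.
INSERT INTO zip2state VALUES ('A', '10000', '20999'),
                             ('B', '20000', '29999'),
                             ('C', '01067', '01067');
CREATE TABLE address (zipcode VARCHAR(5));
INSERT INTO address VALUES ('20500'), ('15000'), ('01067');
""")

# Range join: a zip code matches every range containing it, so overlapping
# ranges produce one row per state.
pairs = conn.execute("""
SELECT d.zipcode, z.state
FROM address d
LEFT JOIN zip2state z ON d.zipcode BETWEEN z.range_start AND z.range_end
""").fetchall()

states = {}
for zipcode, state in pairs:
    states.setdefault(zipcode, set()).add(state)
print(states)  # '20500' -> {'A', 'B'}, '15000' -> {'A'}, '01067' -> {'C'}

# Text comparison is character by character: an unpadded '1067'
# wrongly lands inside the '10000'..'20999' range.
unpadded_in_range = conn.execute(
    "SELECT '1067' BETWEEN '10000' AND '20999'").fetchone()[0]
print(unpadded_in_range)  # 1
```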
Related:
What is the difference between LATERAL and a subquery in PostgreSQL?

Some SQL Questions

I have been using SQL for years, but have mostly been using the query designer within SQL Studio (etc.) to put together my queries. I've recently found some time to actually "learn" what everything is doing and have set myself the following fairly simple tasks. Before I begin, I'd like to ask the SOF community their thoughts on the questions, possible answers and any tips they may have.
The questions are;
Find all records w/ a duplicate in a particular column (e.g. a linking id is in more than 1 record throughout table)
SUM price from a linked table within the same query (select within a select?)
Explain the difference between the 4 joins; LEFT, RIGHT, OUTER, INNER
Copy data from one table to another based on SELECT and WHERE criteria
Input welcomed & appreciated.
Chris
I recommend that you start by following some tutorials on this topic. Your questions are not uncommon questions for someone moving from a beginner to intermediate level in SQL. SQLZoo is an excellent resource for learning SQL so consider following that.
In response to your questions:
1) Find all records with a duplicate in a particular column
There are two steps here: find duplicate records and select those records. To find the duplicate records you should be doing something along the lines of:
select possible_duplicate_field, count(*)
from table
group by possible_duplicate_field
having count(*) > 1
What we're doing here is selecting everything from a table, then grouping it by the field we want to check for duplicates. The count function then gives me a count of the number of items within that group. The HAVING clause indicates that we want to filter AFTER the grouping to only show the groups which have more than one entry.
This is all fine in itself but it doesn't give you the actual records that have those values on them. If you knew the duplicate values then you'd write this:
select * from table where possible_duplicate_field = 'known_duplicate_value'
We can use a SELECT within a SELECT (a subquery) to get all the matching records:
select *
from table
where possible_duplicate_field in (
select possible_duplicate_field
from table
group by possible_duplicate_field
having count(*) > 1
)
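Both steps can be checked on a small, hypothetical table items(id, link_id) using Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, link_id INTEGER);
INSERT INTO items VALUES (1, 100), (2, 200), (3, 100), (4, 300), (5, 200), (6, 100);
""")

# Step 1: which link_id values occur more than once, and how often.
dup_counts = conn.execute("""
SELECT link_id, COUNT(*)
FROM items
GROUP BY link_id
HAVING COUNT(*) > 1
ORDER BY link_id
""").fetchall()

# Step 2: the full records carrying one of those duplicated values.
dup_rows = conn.execute("""
SELECT *
FROM items
WHERE link_id IN (
    SELECT link_id FROM items GROUP BY link_id HAVING COUNT(*) > 1
)
ORDER BY id
""").fetchall()

print(dup_counts)  # [(100, 3), (200, 2)]
print(dup_rows)    # [(1, 100), (2, 200), (3, 100), (5, 200), (6, 100)]
```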
2) SUM price from a linked table within the same query
This is a simple JOIN between two tables with a SUM over columns from both:
select sum(tableA.X + tableB.Y)
from tableA
join tableB on tableA.keyA = tableB.keyB
What you're doing here is joining two tables together where those two tables are linked by a key field. In this case this is an inner join (not a NATURAL JOIN, which would match on all identically named columns): it returns each row of the left table paired with every matching record in the right table. Note that if a tableA row matches several tableB rows, its X is added to the sum once per match.
3) Explain the difference between the 4 joins; LEFT, RIGHT, OUTER, INNER
Consider two tables A and B. The concept of "LEFT" and "RIGHT" in this case are slightly clearer if you read your SQL from left to right. So, when I say:
select x from A join B ...
The left table is "A" and the right table is "B". When you explicitly say LEFT (or RIGHT), you are declaring which of the two tables is the primary (preserved) table, i.e. the one whose rows are all kept in the result even when they have no match. Incidentally, if you omit LEFT or RIGHT, SQL uses an INNER join, not a LEFT join.
With INNER and OUTER you are declaring what to do when matches don't exist in one of the tables. An INNER join returns only the rows that have a matching record in both tables; LEFT and RIGHT do not apply to it. Hence, if one table contains keys "X", "Y" and "Z", and the other contains keys "X" and "Z", then an INNER join will only return "X" and "Z" records from the two tables.
When an OUTER (LEFT or RIGHT) join is used, we're saying: give me everything from the primary table and anything that matches from the secondary table. Hence, in the previous example, we'd get "X", "Y" and "Z" records in the output record set. However, the fields that would have come from the secondary table will be NULL for key value "Y", as it doesn't exist in the secondary table.
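The X/Y/Z example above as a runnable sketch (Python's sqlite3, hypothetical table names primary_tbl and secondary_tbl). It also shows that a plain JOIN with no LEFT or RIGHT keyword returns the same rows as INNER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE primary_tbl (k TEXT);
CREATE TABLE secondary_tbl (k TEXT, info TEXT);
INSERT INTO primary_tbl VALUES ('X'), ('Y'), ('Z');
INSERT INTO secondary_tbl VALUES ('X', 'x-info'), ('Z', 'z-info');
""")

def run(join_keyword):
    """Join primary_tbl to secondary_tbl using the given join keyword."""
    sql = f"""
    SELECT p.k, s.info
    FROM primary_tbl p
    {join_keyword} secondary_tbl s ON p.k = s.k
    ORDER BY p.k
    """
    return conn.execute(sql).fetchall()

inner = run("INNER JOIN")  # [('X', 'x-info'), ('Z', 'z-info')]
left = run("LEFT JOIN")    # [('X', 'x-info'), ('Y', None), ('Z', 'z-info')]
plain = run("JOIN")        # same rows as INNER JOIN
```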
4) Copy data from one table to another based on SELECT and WHERE criteria
This is pretty trivial and I'm surprised you've never encountered it. It's a simple nested SELECT in an INSERT statement:
insert into new_table select * from old_table where x = y
This assumes the tables have the same structure. If you have different structures then you'll need to specify the columns:
insert into new_table (list, of, fields)
select list, of, fields from old_table where x = y
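A minimal sketch of both forms using Python's sqlite3 module; the table names and the status filter are placeholders standing in for old_table, new_table and x = y.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE old_table (id INTEGER, name TEXT, status TEXT);
CREATE TABLE new_table (id INTEGER, name TEXT, status TEXT);  -- same structure
CREATE TABLE name_table (name TEXT);                          -- different structure
INSERT INTO old_table VALUES (1, 'a', 'open'), (2, 'b', 'closed'), (3, 'c', 'closed');
""")

# Same structure: SELECT * lines up column by column.
conn.execute("INSERT INTO new_table SELECT * FROM old_table WHERE status = 'closed'")

# Different structure: name the target and source columns explicitly.
conn.execute("INSERT INTO name_table (name) SELECT name FROM old_table WHERE status = 'closed'")

copied = conn.execute("SELECT * FROM new_table ORDER BY id").fetchall()
names = conn.execute("SELECT name FROM name_table ORDER BY name").fetchall()
print(copied)  # [(2, 'b', 'closed'), (3, 'c', 'closed')]
print(names)   # [('b',), ('c',)]
```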
Let's say you have 2 tables:
[OrderLine] with the columns [Id, OrderId, ProductId, Qty, Status]
[Product] with [Id, Name, Price]
1) All orders having more than one line (technically the same as looking for duplicates on OrderId):
select OrderId, count(*)
from OrderLine
group by OrderId
having count(*) > 1
2) Total price of all order lines of order 1000:
select sum(p.Price * ol.Qty) as Price
from OrderLine ol
inner join Product p on ol.ProductId = p.Id
where ol.OrderId = 1000
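The order total can be tried on made-up data with Python's sqlite3 module. OrderLine and Product follow the tables described above; the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (Id INTEGER PRIMARY KEY, Name TEXT, Price REAL);
CREATE TABLE OrderLine (Id INTEGER PRIMARY KEY, OrderId INTEGER, ProductId INTEGER,
                        Qty INTEGER, Status TEXT);
INSERT INTO Product VALUES (1, 'pen', 2.5), (2, 'pad', 4.0);
INSERT INTO OrderLine VALUES (1, 1000, 1, 4, 'Open'),
                             (2, 1000, 2, 1, 'Open'),
                             (3, 1001, 1, 10, 'Closed');
""")

# Each order line contributes its quantity times the product's price.
total = conn.execute("""
SELECT SUM(p.Price * ol.Qty) AS Price
FROM OrderLine ol
INNER JOIN Product p ON ol.ProductId = p.Id
WHERE ol.OrderId = 1000
""").fetchone()[0]

print(total)  # 14.0 (4 * 2.5 + 1 * 4.0)
```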
3) Difference between joins:
a inner join b => take every a that has a match in b; if no b is found, a is not returned
a left join b => take every a, match it with b, and include a even if no b is found
a right join b => b left join a
a full outer join b => (a left join b) union (a right join b)
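SQLite only gained RIGHT and FULL OUTER JOIN in version 3.39.0, so a full outer join is often emulated. This sketch (Python's sqlite3, hypothetical tables a and b) uses a slightly stricter variant of the union above: a LEFT JOIN plus the unmatched rows of the reversed LEFT JOIN, combined with UNION ALL so that genuine duplicate rows are not collapsed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (k TEXT);
CREATE TABLE b (k TEXT);
INSERT INTO a VALUES ('X'), ('Y');
INSERT INTO b VALUES ('X'), ('Z');
""")

# All rows of a (matched or not), plus the rows of b that have no match in a.
full = conn.execute("""
SELECT a.k, b.k FROM a LEFT JOIN b ON a.k = b.k
UNION ALL
SELECT a.k, b.k FROM b LEFT JOIN a ON a.k = b.k
WHERE a.k IS NULL
""").fetchall()

print(sorted(full, key=repr))  # X matched, Y only in a, Z only in b
```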
4) Copy order lines to a history table:
insert into OrderLinesHistory
(CopiedOn, OrderLineId, OrderId, ProductId, Qty)
select
getDate(), Id, OrderId, ProductId, Qty
from
OrderLine
where
status = 'Closed'
To answer #4, and perhaps to show at least some understanding of SQL and that this isn't homework (just me trying to learn best practice):
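SET NOCOUNT ON;
-- @what, @productid and @colorid are presumably parameters of the enclosing procedure (header not shown)
DECLARE @rc int
if @what = 1
BEGIN
select id from color_mapper where product = @productid and color = @colorid;
select @rc = @@rowcount
if @rc = 0
BEGIN
exec doSavingSPROC @colorid, @productid;
END
END
END -- closes the procedure body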