Multiple table join with condition from one table - sql

I apologize in advance if something like this has already been discussed, but if it has, I was unable to find it (I'm not even sure how to search for such a thing). I'm trying to join two tables, "employees" and "leave". I want to list every employee from the "employees" table AND populate the report with leave data from the "leave" table where the leave date (the bdate column in the leave table) is on or after January 1st, 2014 (or the start of the current year). The problem is that not every employee has leave data, so a normal join only returns the employees who actually have leave data. I think what I want is a left join, but I'm still only getting rows for employees who have data in both tables (hope that makes sense).
Select bunch_of_columns, leave.bdate, SUM(leave.Vhours) as TotalVacationHours, SUM(leave.shours) as TotalSickHours
from employees
left join leave on employees.id=leave.id
where employees.user_active ='1' AND leave.BDate >= '2014-01-01'
group by employees.id
Order by employees.user_last
This produces only the records of employees who have a leave record on or after "2014-01-01". I want a complete list of employee records from the employees table, with the available data from the leave table (and blanks where there is none) for leave whose "bdate" is on or after New Year's Day.
I want this:
+-----+-----------+--------------+----------+------------+
| ID  | Name      | Vacation Hrs | Sick Hrs | Date       |
+-----+-----------+--------------+----------+------------+
| 1   | Bob       | 5            | 8        | 2014-01-01 |
| 2   | Lucy      | NULL         | NULL     | NULL       |
| 3   | Jerry     | NULL         | NULL     | NULL       |
| 4   | Dieter    | 3            | 5        | 2014-01-08 |
| 5   | Sprockets | NULL         | NULL     | NULL       |
+-----+-----------+--------------+----------+------------+
Not this:
+-----+-----------+--------------+----------+------------+
| ID  | Name      | Vacation Hrs | Sick Hrs | Date       |
+-----+-----------+--------------+----------+------------+
| 1   | Bob       | 5            | 8        | 2014-01-01 |
| 4   | Dieter    | 3            | 5        | 2014-01-08 |
+-----+-----------+--------------+----------+------------+

It's because of your WHERE condition.
leave.BDate >= '2014-01-01'
If you do a LEFT JOIN and then filter a column from the right table with a condition that NULL can't satisfy, the unmatched rows (where that column is NULL) are discarded, so it's equivalent to doing an INNER JOIN.
If there's no leave date then the record doesn't fit the criteria. You should check instead that:
(leave.BDate >= '2014-01-01' OR leave.BDate IS NULL)
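The sketch below shows why the WHERE filter turns the LEFT JOIN into an inner join. It uses Python's built-in sqlite3 module as a stand-in database. The table layout and rows are hypothetical, and Jerry is an invented employee whose only leave is from 2013. The example also shows a caveat of the OR ... IS NULL fix. An employee whose leave rows all predate 2014 still matches in the join, so the IS NULL test never fires for them and they are dropped.

```python
import sqlite3

# In-memory SQLite database with hypothetical sample data:
# Bob has 2014 leave, Lucy has no leave at all, Jerry only has 2013 leave.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE leave (id INTEGER, bdate TEXT);
    INSERT INTO employees VALUES (1, 'Bob'), (2, 'Lucy'), (3, 'Jerry');
    INSERT INTO leave VALUES (1, '2014-01-01'), (3, '2013-12-20');
""")


def names_matching(where_clause):
    """Run the LEFT JOIN with the given WHERE clause; return employee names."""
    sql = (
        "SELECT e.name FROM employees AS e "
        "LEFT JOIN leave AS l ON e.id = l.id "
        f"WHERE {where_clause} ORDER BY e.id"
    )
    return [name for (name,) in conn.execute(sql)]


# Filtering the right-hand table in WHERE discards the NULL-extended rows.
where_only = names_matching("l.bdate >= '2014-01-01'")
# Adding OR ... IS NULL restores Lucy (no leave rows), but not Jerry,
# whose 2013 row matched the join and then failed the date test.
with_is_null = names_matching("l.bdate >= '2014-01-01' OR l.bdate IS NULL")

print(where_only)    # ['Bob']
print(with_is_null)  # ['Bob', 'Lucy']
```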

Another way to write it (as pointed out by OGHaza) is to apply the date condition in the JOIN's ON clause:
Select
    bunch_of_columns,
    MAX(leave.bdate) as LastLeaveDate,  -- aggregated, so it is valid with GROUP BY
    COALESCE( SUM(leave.Vhours), 0 ) as TotalVacationHours,
    COALESCE( SUM(leave.shours), 0 ) as TotalSickHours
from
    employees
    left join leave
        on employees.id = leave.id
        AND leave.BDate >= '2014-01-01'  -- date filter in ON keeps employees without leave
where
    employees.user_active = '1'
group by
    employees.id
Order by
    employees.user_last
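Here is a minimal sketch of the ON-clause version. It again uses Python's sqlite3 as a stand-in for the real database. The sample rows are hypothetical: Lucy has no leave, and Jerry only has leave from 2013. The bdate column is left out of the SELECT for brevity. All three employees come back, with 0 hours where no qualifying leave exists.

```python
import sqlite3

# Hypothetical sample data: only Bob has leave on or after 2014-01-01.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, user_active TEXT);
    CREATE TABLE leave (id INTEGER, bdate TEXT, vhours INTEGER, shours INTEGER);
    INSERT INTO employees VALUES (1, 'Bob', '1'), (2, 'Lucy', '1'), (3, 'Jerry', '1');
    INSERT INTO leave VALUES (1, '2014-01-01', 5, 8), (3, '2013-12-20', 4, 2);
""")

rows = conn.execute("""
    SELECT e.id,
           e.name,
           COALESCE(SUM(l.vhours), 0) AS TotalVacationHours,
           COALESCE(SUM(l.shours), 0) AS TotalSickHours
    FROM employees AS e
    LEFT JOIN leave AS l
           ON e.id = l.id
          AND l.bdate >= '2014-01-01'  -- date filter in ON keeps unmatched employees
    WHERE e.user_active = '1'          -- filters only the left-hand table
    GROUP BY e.id, e.name
    ORDER BY e.id
""").fetchall()

for row in rows:
    print(row)
# (1, 'Bob', 5, 8)
# (2, 'Lucy', 0, 0)
# (3, 'Jerry', 0, 0)
```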

Try using a FULL OUTER JOIN for this kind of condition.
From MSDN
The full outer join or full join returns all rows from both tables, matching up the rows wherever a match can be made and placing NULLs in the places where no matching row exists.

Related

SQL Join to the latest record in MS ACCESS

I want to join tables in MS Access in such a way that it fetches only the latest record from one of the tables. I've looked at the other solutions available on the site, but discovered that they only work for other versions of SQL. Here is a simplified version of my data:
PatientInfo Table:
+----+------+
| ID | Name |
+----+------+
| 1  | John |
| 2  | Tom  |
| 3  | Anna |
+----+------+
Appointments Table:
+----+-----------+
| ID | Date      |
+----+-----------+
| 1  | 5/5/2001  |
| 1  | 10/5/2012 |
| 1  | 4/20/2018 |
| 2  | 4/5/1999  |
| 2  | 8/8/2010  |
| 2  | 4/9/1982  |
| 3  | 7/3/1997  |
| 3  | 6/4/2015  |
| 3  | 3/4/2017  |
+----+-----------+
And here is a simplified version of the results that I need after the join:
+----+------+-----------+
| ID | Name | Date      |
+----+------+-----------+
| 1  | John | 4/20/2018 |
| 2  | Tom  | 8/8/2010  |
| 3  | Anna | 3/4/2017  |
+----+------+-----------+
Thanks in advance for reading and for your help.
You can use aggregation and JOIN:
select pi.id, pi.name, max(a.date)
from appointments as a inner join
patientinfo as pi
on a.id = pi.id
group by pi.id, pi.name;
Something like this:
select P.ID, P.name, max(A.Date) as Dt
from PatientInfo P inner join Appointments A
on P.ID=A.ID
group by P.ID, P.name
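The aggregation approach can be checked with a minimal sketch that uses Python's sqlite3 in place of Access. The rows are the question's sample data, but the dates are rewritten as ISO 'YYYY-MM-DD' text, which is an assumption. Text in the m/d/yyyy form would not sort chronologically, whereas an Access Date/Time column compares correctly as-is.

```python
import sqlite3

# Sample data from the question; dates stored as ISO 'YYYY-MM-DD' text
# (an assumption) so that MAX() compares them chronologically.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PatientInfo (ID INTEGER, Name TEXT);
    CREATE TABLE Appointments (ID INTEGER, Date TEXT);
    INSERT INTO PatientInfo VALUES (1, 'John'), (2, 'Tom'), (3, 'Anna');
    INSERT INTO Appointments VALUES
        (1, '2001-05-05'), (1, '2012-10-05'), (1, '2018-04-20'),
        (2, '1999-04-05'), (2, '2010-08-08'), (2, '1982-04-09'),
        (3, '1997-07-03'), (3, '2015-06-04'), (3, '2017-03-04');
""")

# One row per patient with the latest appointment date.
latest = conn.execute("""
    SELECT P.ID, P.Name, MAX(A.Date) AS Dt
    FROM PatientInfo AS P
    INNER JOIN Appointments AS A ON P.ID = A.ID
    GROUP BY P.ID, P.Name
    ORDER BY P.ID
""").fetchall()

print(latest)
# [(1, 'John', '2018-04-20'), (2, 'Tom', '2010-08-08'), (3, 'Anna', '2017-03-04')]
```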
Both Bing's and Gordon's answers work if your summary only needs one field from the joined table (the Max(Date)). It gets trickier if you also want to report other fields from that table, since you would need to either aggregate them or add them to the GROUP BY as well.
E.g., if you want your summary to also include the assessment given at the patient's last appointment, GROUP BY is not the way to go. Max(Assessment) would return the highest assessment value across all of that patient's appointments, not the one from the latest appointment.
A more versatile structure may be something like:
SELECT Patient.ID, Patient.Name, Appointment.Date, Appointment.Assessment
FROM Patient INNER JOIN Appointment ON Patient.ID=Appointment.ID
WHERE Appointment.Date = (SELECT Max(Appointment.Date) FROM Appointment WHERE Appointment.ID = Patient.ID)
;
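The sketch below illustrates the difference using Python's sqlite3 as a stand-in. The Patient/Appointment rows and the Assessment values are hypothetical. The correlated subquery keeps the assessment from the latest visit. The GROUP BY version pairs the latest date with whichever assessment sorts highest.

```python
import sqlite3

# Hypothetical Patient/Appointment tables with an invented Assessment column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Patient (ID INTEGER, Name TEXT);
    CREATE TABLE Appointment (ID INTEGER, Date TEXT, Assessment TEXT);
    INSERT INTO Patient VALUES (1, 'John'), (2, 'Tom');
    INSERT INTO Appointment VALUES
        (1, '2001-05-05', 'stable'), (1, '2018-04-20', 'follow-up'),
        (2, '1999-04-05', 'referred'), (2, '2010-08-08', 'discharged');
""")

# Correlated subquery: keep only the appointment on that patient's latest date.
correlated = conn.execute("""
    SELECT Patient.ID, Patient.Name, Appointment.Date, Appointment.Assessment
    FROM Patient INNER JOIN Appointment ON Patient.ID = Appointment.ID
    WHERE Appointment.Date = (SELECT MAX(A2.Date) FROM Appointment AS A2
                              WHERE A2.ID = Patient.ID)
    ORDER BY Patient.ID
""").fetchall()

# For comparison: aggregating Assessment independently returns the
# alphabetically largest assessment, not the one from the latest visit.
grouped = conn.execute("""
    SELECT Patient.ID, Patient.Name, MAX(Appointment.Date), MAX(Appointment.Assessment)
    FROM Patient INNER JOIN Appointment ON Patient.ID = Appointment.ID
    GROUP BY Patient.ID, Patient.Name
    ORDER BY Patient.ID
""").fetchall()

print(correlated)  # [(1, 'John', '2018-04-20', 'follow-up'), (2, 'Tom', '2010-08-08', 'discharged')]
print(grouped)     # [(1, 'John', '2018-04-20', 'stable'), (2, 'Tom', '2010-08-08', 'referred')]
```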
As an aside, you may want to consider whether you should use a field named 'ID' to refer to the ID of another table (in this case, the Appointment.ID field refers to Patient.ID). Your database may be more readable if you keep the 'ID' field as an identifier specific to its own table, and name the referring field in other tables OtherTableID or similar, i.e. PatientID in this case. Or go all the way and include the table's name in its own ID field.
Edited after comment:
Not quite sure why it would crash. I just ran an equivalent query on 2 tables I have, which are about 10,000 records each, and it was pretty much instantaneous. Are your ID fields (i) unique numbers and (ii) indexed?
Another structure which should do the same thing (adapted for your field names and assuming that there is an ID field in Appointments which is unique) would be something like:
SELECT PatientInfo.UID, PatientInfo.Name, Appointments.StartDateTime, Appointments.Assessment
FROM PatientInfo INNER JOIN Appointments ON PatientInfo.UID = Appointments.PatientFID
WHERE Appointments.ID = (SELECT TOP 1 A2.ID FROM Appointments AS A2 WHERE A2.PatientFID = PatientInfo.UID ORDER BY A2.StartDateTime DESC, A2.ID DESC)
;
But that is starting to look a bit contrived. On my data they both produce the same result (as they should!) and are both almost instantaneous.
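The TOP 1 variant can be sketched the same way. SQLite has no TOP, so this sketch uses the equivalent ORDER BY ... LIMIT 1 in the subquery. The table and column names (UID, PatientFID, StartDateTime, Assessment) follow the answer, and the rows are hypothetical. A secondary sort on the appointment's ID serves as a tie-breaker. This matters in Access, where TOP 1 returns every row tied on the ORDER BY value, so the subquery could otherwise return more than one record.

```python
import sqlite3

# Hypothetical rows; Appointments.ID is assumed to be a unique key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PatientInfo (UID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Appointments (
        ID INTEGER PRIMARY KEY, PatientFID INTEGER,
        StartDateTime TEXT, Assessment TEXT);
    INSERT INTO PatientInfo VALUES (1, 'John'), (2, 'Tom');
    INSERT INTO Appointments VALUES
        (10, 1, '2001-05-05 09:00', 'stable'),
        (11, 1, '2018-04-20 14:30', 'follow-up'),
        (12, 2, '2010-08-08 10:00', 'discharged'),
        (13, 2, '1999-04-05 08:00', 'referred');
""")

# The subquery picks the single most recent appointment ID per patient
# (ORDER BY ... LIMIT 1 plays the role of Access's TOP 1).
rows = conn.execute("""
    SELECT p.UID, p.Name, a.StartDateTime, a.Assessment
    FROM PatientInfo AS p
    INNER JOIN Appointments AS a ON p.UID = a.PatientFID
    WHERE a.ID = (SELECT a2.ID FROM Appointments AS a2
                  WHERE a2.PatientFID = p.UID
                  ORDER BY a2.StartDateTime DESC, a2.ID DESC
                  LIMIT 1)
    ORDER BY p.UID
""").fetchall()

print(rows)
# [(1, 'John', '2018-04-20 14:30', 'follow-up'), (2, 'Tom', '2010-08-08 10:00', 'discharged')]
```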
Always difficult to troubleshoot Access when it crashes - I guess you see no error codes or similar? Is this against a native .accdb database or another server?

Select rows from a filtered portion of Table A where a column matches a relationship with a column from the row in Table B that matches by ID

I want to get all rows in one table where a column satisfies a condition relative to a column in the row of a different table that has the same ID.
Concretely, I have two tables, orders and product_info, that I'm accessing through Amazon Redshift.
Orders
| ID | Date     | Amount | Region |
| -- | -------- | ------ | ------ |
| 1  | 2019/4/1 | $120   | A      |
| 1  | 2019/4/4 | $100   | A      |
| 2  | 2019/4/2 | $50    | A      |
| 3  | 2019/4/6 | $70    | B      |
The partition keys of orders are region and date.
Product Information
| ID | Release Date | Region |
| -- | ------------ | ------ |
| 1  | 2019/4/2     | A      |
| 2  | 2019/4/3     | A      |
| 3  | 2019/4/5     | B      |
The primary key of product information is id, and the partition key is region.
I want to get all rows from Orders in region A where the date of the row is greater than the release date value in product information for that ID.
So in this case it should return just one row:
| 1  | 2019/4/4 | $100   | A      |
I tried:
select *
from orders
INNER JOIN product_info ON orders.date>product_info.release_date
AND orders.id=product_info.id
AND orders.region='A'
AND product_info.region='A'
limit 10
The problem is that this query was absurdly slow (cancelled it after 10 minutes). The tables are extremely large, and I have a feeling it was scanning the entire table without restricting it to region first (in reality I have other filters in addition to region that I want to apply to the list of IDs before I do the inner join, but I've limited it to only region for the sake of simplifying the question).
How can I efficiently write this type of query?
The best way to make an SQL query faster is to exclude rows as soon as possible.
So, rather than putting conditions like orders.region = 'A' in the JOIN clause, you should move them to the WHERE clause. This will eliminate rows before they are joined.
Also, make the JOIN condition as simple as possible so that the database can optimize the comparison.
Try something like this:
SELECT *
FROM orders
INNER JOIN product_info ON orders.id = product_info.id
WHERE orders.region = 'A'
AND product_info.region = 'A'
AND orders.date > product_info.release_date
Any further optimization would require consideration of the DISTKEY and SORTKEY on the Redshift tables (preferably a DISTKEY of id and a SORTKEY of date).
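To check the logic (not the performance), here is a sketch that runs the suggested query against the question's sample rows in SQLite via Python's sqlite3. Dates are rewritten as ISO text and amounts as whole dollars, both assumptions. SQLite has no DISTKEY or SORTKEY, so this shows nothing about how Redshift will execute the query. It only confirms that the filters return the expected single row.

```python
import sqlite3

# Sample data from the question; dates rewritten as ISO text and amounts
# as whole dollars (assumptions for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, date TEXT, amount INTEGER, region TEXT);
    CREATE TABLE product_info (id INTEGER, release_date TEXT, region TEXT);
    INSERT INTO orders VALUES
        (1, '2019-04-01', 120, 'A'), (1, '2019-04-04', 100, 'A'),
        (2, '2019-04-02', 50, 'A'), (3, '2019-04-06', 70, 'B');
    INSERT INTO product_info VALUES
        (1, '2019-04-02', 'A'), (2, '2019-04-03', 'A'), (3, '2019-04-05', 'B');
""")

# Equality-only join condition; region and date comparisons in WHERE.
rows = conn.execute("""
    SELECT orders.id, orders.date, orders.amount, orders.region
    FROM orders
    INNER JOIN product_info ON orders.id = product_info.id
    WHERE orders.region = 'A'
      AND product_info.region = 'A'
      AND orders.date > product_info.release_date
""").fetchall()

print(rows)  # [(1, '2019-04-04', 100, 'A')]
```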

Left join returning both more and less rows after query

I am a newbie to SQL and I would like to ask for help. I have 2 tables which I want to join and I would like to generate the same number of rows that table 1 has.
Here are the tables:
Table 1
+----------+------------+---------+-------+
| ENTRY_ID | ROUTE_NAME | STATION | BOUND |
+----------+------------+---------+-------+
| 1        | 1A         | ABCC    | 1     |
| 2        | 2C         | CBDD    | 1     |
| 3        | 5          | AAAA    | 2     |
| 4        | 1A         | EEEE    | 1     |
| 5        | 2B         | ASFA    | 2     |
| 6        | 5          | DSAS    | 1     |
| 7        | 3          | QWEA    | 2     |
| 8        | 4          | ASDA    | 1     |
+----------+------------+---------+-------+
Table 2
+------------+-------+---------+---------------+
| ROUTE_NAME | BOUND | STATION | STOP_SEQUENCE |
+------------+-------+---------+---------------+
| 1A         | 1     | AAA     | 1             |
| 1A         | 1     | ABC     | 2             |
| 1A         | 1     | CDA     | 3             |
| 1A         | 2     | ABC     | 1             |
| 1A         | 2     | ADC     | 2             |
| 1A         | 2     | ACA     | 3             |
+------------+-------+---------+---------------+
(repeated for other routes)
Short description of the tables:
Table 1 contains certain transit trips, with the transit route to be taken as ROUTE_NAME, the departure stop as STATION, and the transit bound as BOUND (only 1 or 2).
Table 2 contains a set of transit route data, with fields similar to Table 1 plus the order of stops as STOP_SEQUENCE.
What I would like to do is use STATION, BOUND, and ROUTE_NAME in Table 1 to look up STOP_SEQUENCE in Table 2. The code that I have used is:
SELECT t1.ENTRY_ID, t1.ROUTE_NAME, t1.STATION, t1.BOUND, t2.STOP_SEQUENCE
FROM T1
LEFT JOIN t2 ON
(t1.STATION LIKE '*' & t2.STATION & '*') AND
(t1.BOUND = t2.BOUND) AND
(t1.ROUTE_NAME = t2.ROUTE_NAME);
The LIKE is required because the STATION strings in the two tables don't always match exactly, and LIKE handles that mismatch.
The first question is: why does the LEFT JOIN not return all rows from TABLE 1? I have similar code that works on other similar tables. For rows that don't match (with the LIKE condition), NULL is returned for that particular row. However, this query returns fewer rows.
The second question is: with the LIKE statement I am returning one or more rows from table 2 for each row of table 1 that matches my criteria (in my code, 2+ rows with the same ENTRY_ID have been returned). How can I keep only the minimum? I.e., if two STOP_SEQUENCE values are found, return the lower one.
Have struggled for this for a long time so many thanks for your help!
UPDATE
I have found that the condition t1.STATION LIKE '*' & t2.STATION & '*' is causing the missing rows from the first question. When I replaced it with =, all rows came back. However, I still need this LIKE condition. What can I do?
Why does the LEFT JOIN not return all rows from TABLE 1?
I can only suspect this is an issue with your testing. The SQL code posted in your question will return the values of t1.ENTRY_ID, t1.ROUTE_NAME, t1.STATION, and t1.BOUND for every record in table t1. Alongside them, it returns the value of t2.STOP_SEQUENCE for each record in table t2 that fulfils the join criteria for that t1 record.
Note that this will return multiple records from table t2 if more than one record fulfils the join criteria for a given record in table t1. Which leads to your next question:
With the LIKE statement I am returning one or more rows from table 2 for each row of table 1 that matches my criteria (in my code, 2+ rows with the same ENTRY_ID have been returned). How can I keep only the minimum? I.e., if two STOP_SEQUENCE values are found, return the lower one.
You can achieve this with simple aggregation using the min function:
select
t1.entry_id,
t1.route_name,
t1.station,
t1.bound,
min(t2.stop_sequence) as stopseq
from
t1 left join t2 on
t1.station like '*' & t2.station & '*' and
t1.bound = t2.route_bound and
t1.route_name = t2.route_name
group by
t1.entry_id,
t1.route_name,
t1.station,
t1.bound
This will return the minimum value held by the field t2.stop_sequence within the group defined by each combination of values held by t1.entry_id, t1.route_name, t1.station, & t1.bound.
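The MIN aggregation can be sketched in SQLite through Python's sqlite3. SQLite writes Access's '*' & x & '*' pattern as '%' || x || '%'. The join uses t2.bound, the column name in the question's sample table, rather than t2.route_bound. The data is a hypothetical subset of the question's rows, plus an invented entry 9 whose station text contains two route-1A station codes. In this engine, at least, every t1 row survives the LEFT JOIN, with NULL where nothing matches.

```python
import sqlite3

# Hypothetical subset of the question's data, plus an invented entry 9
# whose station text contains two route-1A station codes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (entry_id INTEGER, route_name TEXT, station TEXT, bound INTEGER);
    CREATE TABLE t2 (route_name TEXT, bound INTEGER, station TEXT, stop_sequence INTEGER);
    INSERT INTO t1 VALUES
        (1, '1A', 'ABCC', 1), (2, '2C', 'CBDD', 1),
        (4, '1A', 'EEEE', 1), (9, '1A', 'CDAAA', 1);
    INSERT INTO t2 VALUES
        ('1A', 1, 'AAA', 1), ('1A', 1, 'ABC', 2), ('1A', 1, 'CDA', 3),
        ('1A', 2, 'ABC', 1), ('1A', 2, 'ADC', 2), ('1A', 2, 'ACA', 3);
""")

# SQLite spells the Access pattern '*' & x & '*' as '%' || x || '%'.
join = """
    FROM t1 LEFT JOIN t2
      ON t1.station LIKE '%' || t2.station || '%'
     AND t1.bound = t2.bound
     AND t1.route_name = t2.route_name
"""

# Without aggregation, entry 9 appears twice (it matches 'AAA' and 'CDA').
raw_count = conn.execute("SELECT COUNT(*) " + join).fetchone()[0]

# MIN() collapses each t1 row to its lowest matching stop_sequence (NULL if none).
rows = conn.execute(
    "SELECT t1.entry_id, t1.route_name, t1.station, t1.bound, "
    "MIN(t2.stop_sequence) AS stopseq " + join +
    "GROUP BY t1.entry_id, t1.route_name, t1.station, t1.bound "
    "ORDER BY t1.entry_id"
).fetchall()

print(raw_count)  # 5
for row in rows:
    print(row)
# (1, '1A', 'ABCC', 1, 2)
# (2, '2C', 'CBDD', 1, None)
# (4, '1A', 'EEEE', 1, None)
# (9, '1A', 'CDAAA', 1, 1)
```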
As an aside, note that the sample table t2 in your question does not contain the field route_bound referenced by the code above.

Find the missing dates - vba Sql

I am trying to spot which students didn't submit their task, and for which dates. I want to check every student, whether current or not. I don't mind if the answer is in SQL or VBA code. More details below:
Task Table
-------------------------------------
SubID | ID | Task    | Date
-------------------------------------
1     | 1  | Dance   | 01-01-2014
2     | 1  | Sing    | 02-01-2014
3     | 1  | shout   | 05-01-2014
4     | 2  | try     | 02-01-2014
5     | 3  | Okay    | 01-01-2014
6     | 2  | random  | 06-01-2014
8     | 4  | Jumping | 01-01-2014
9     | 4  | try     | 02-01-2014
10    | 4  | Piano   | 03-01-2014
11    | 4  | try     | 04-01-2014
12    | 4  | guitar  | 05-01-2014
13    | 4  | try     | 06-01-2014
Student table (note: the Date column in the task table is in dd-mm-yyyy format and is a Date/Time data type):
ID | Name  | Current
--------------------
1  | Ron   | YES
2  | sqlJ  | YES
3  | jque  | NO
4  | holy  | YES
5  | htdoc | YES
Desired Result:
Who didn't submit their task between 01-01-2014 and 06-01-2014:
ID | Date
---------------
1 | 03-01-2014
1 | 04-01-2014
1 | 06-01-2014
2 | 01-01-2014
2 | 03-01-2014
2 | 04-01-2014
2 | 05-01-2014
3 | 02-01-2014
3 | 03-01-2014
3 | 04-01-2014
3 | 05-01-2014
3 | 06-01-2014
What I have tried:
SELECT w.ID, w.Date, student.[first name], student.[last name], student.[id]
FROM tasktbl AS w
right join student
on w.id = student.[id];
I was thinking of using a VBA For loop to iterate over the date range, store the dates in an array, and spot every ID that doesn't have a date, but it didn't work out well.
Any help, ranging from pseudo-code to SQL to VBA (basically any hint towards my quest), will be appreciated.
I used a calendar table which contains one row for each date. Cross joining that table with student, and restricting the date range to your 6 values gave me 30 rows (6 dates times 5 students):
SELECT sub.*
FROM
(
SELECT c.the_date, s.ID
FROM tblCalendar AS c, student AS s
WHERE c.the_date Between #1/1/2014# And #1/6/2014#
) AS sub
Then I used that as a subquery and LEFT JOINed it to tasktbl. The rows where the "right side" is Null are those where a student did not complete a task on the date in question.
SELECT sub.ID AS student_id, sub.the_date
FROM
(
SELECT c.the_date, s.ID
FROM tblCalendar AS c, student AS s
WHERE c.the_date Between #1/1/2014# And #1/6/2014#
) AS sub
LEFT JOIN tasktbl AS t
ON (sub.ID = t.ID) AND (sub.the_date = t.Date)
WHERE t.ID Is Null
ORDER BY sub.ID, sub.the_date;
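The calendar-table approach can be tried in SQLite via Python's sqlite3. This sketch fills a hypothetical tblCalendar with every day of January 2014 and stores the question's dates as ISO text (an assumption; Access would use Date/Time values and #m/d/yyyy# literals). The result also includes student 5 (htdoc) on all six dates. That student has no task rows at all, although they are not listed in the question's desired result.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical calendar table covering January 2014, plus the question's data
# with dates as ISO text (an assumption; Access would use Date/Time values).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblCalendar (the_date TEXT PRIMARY KEY);
    CREATE TABLE student (ID INTEGER, Name TEXT);
    CREATE TABLE tasktbl (SubID INTEGER, ID INTEGER, Task TEXT, Date TEXT);
    INSERT INTO student VALUES (1, 'Ron'), (2, 'sqlJ'), (3, 'jque'), (4, 'holy'), (5, 'htdoc');
    INSERT INTO tasktbl VALUES
        (1, 1, 'Dance', '2014-01-01'), (2, 1, 'Sing', '2014-01-02'),
        (3, 1, 'shout', '2014-01-05'), (4, 2, 'try', '2014-01-02'),
        (5, 3, 'Okay', '2014-01-01'), (6, 2, 'random', '2014-01-06'),
        (8, 4, 'Jumping', '2014-01-01'), (9, 4, 'try', '2014-01-02'),
        (10, 4, 'Piano', '2014-01-03'), (11, 4, 'try', '2014-01-04'),
        (12, 4, 'guitar', '2014-01-05'), (13, 4, 'try', '2014-01-06');
""")
conn.executemany(
    "INSERT INTO tblCalendar VALUES (?)",
    [((date(2014, 1, 1) + timedelta(days=i)).isoformat(),) for i in range(31)],
)

# Cross join calendar x students, then keep combinations with no task row.
missing = conn.execute("""
    SELECT sub.ID AS student_id, sub.the_date
    FROM (
        SELECT c.the_date, s.ID
        FROM tblCalendar AS c, student AS s
        WHERE c.the_date BETWEEN '2014-01-01' AND '2014-01-06'
    ) AS sub
    LEFT JOIN tasktbl AS t
        ON sub.ID = t.ID AND sub.the_date = t.Date
    WHERE t.ID IS NULL
    ORDER BY sub.ID, sub.the_date
""").fetchall()

print(len(missing))  # 18
print([d for (sid, d) in missing if sid == 1])
# ['2014-01-03', '2014-01-04', '2014-01-06']
```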
You should be able to do this without any VBA.
select s.id, t.date
from
( --this will populate every possible date, for every student
select distinct s.id
from students s
) s
cross join
(
select distinct t.date
from taskTable t
) t
left join taskTable tt on s.id = tt.id and t.date = tt.date
where tt.date is null
--(optional) order by clause, if needed
Essentially, this pulls one entry for each possible student (the "s" derived table) and one entry for each possible date, i.e. each date that appears in taskTable (the "t" derived table). It then cross joins them, which generates a result set with a row for each possible date for each possible student. Left joining this back to taskTable and keeping only the rows where taskTable's date is null returns just the entries from the cross join that have no task associated with them.
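As a minimal sketch, the derived-table version runs in SQLite via Python's sqlite3 with the question's data. Dates are stored as ISO text, which is an assumption, and the table names students and taskTable follow this answer. The date list comes only from taskTable, so a day on which nobody submitted anything would not appear. Here all six days are present because student 4 submitted every day.

```python
import sqlite3

# The question's data (dates as ISO text, an assumption), using the
# table names from this answer: students and taskTable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER, name TEXT);
    CREATE TABLE taskTable (subid INTEGER, id INTEGER, task TEXT, date TEXT);
    INSERT INTO students VALUES (1, 'Ron'), (2, 'sqlJ'), (3, 'jque'), (4, 'holy'), (5, 'htdoc');
    INSERT INTO taskTable VALUES
        (1, 1, 'Dance', '2014-01-01'), (2, 1, 'Sing', '2014-01-02'),
        (3, 1, 'shout', '2014-01-05'), (4, 2, 'try', '2014-01-02'),
        (5, 3, 'Okay', '2014-01-01'), (6, 2, 'random', '2014-01-06'),
        (8, 4, 'Jumping', '2014-01-01'), (9, 4, 'try', '2014-01-02'),
        (10, 4, 'Piano', '2014-01-03'), (11, 4, 'try', '2014-01-04'),
        (12, 4, 'guitar', '2014-01-05'), (13, 4, 'try', '2014-01-06');
""")

missing = conn.execute("""
    SELECT s.id, t.date
    FROM (SELECT DISTINCT id FROM students) AS s            -- every student
    CROSS JOIN (SELECT DISTINCT date FROM taskTable) AS t   -- every date seen in taskTable
    LEFT JOIN taskTable AS tt ON s.id = tt.id AND t.date = tt.date
    WHERE tt.date IS NULL                                   -- no task for that student/date
    ORDER BY s.id, t.date
""").fetchall()

print(len(missing))  # 18
print([d for (sid, d) in missing if sid == 3])
# ['2014-01-02', '2014-01-03', '2014-01-04', '2014-01-05', '2014-01-06']
```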
I would also recommend that you rethink the design of the TaskTable. Instead of tracking only completed tasks, track all tasks and add a "completed" field that stores whether the task has been completed (Yes) or not (No).

SQL display status for records with null values

I am trying to create a query that lists records from table 1 along with a status based on corresponding records in table 2 that have null values in one or more of the fields. The problem I am running into is how to include records from table 1 which have no corresponding record in table 2.
In my example, I want to list the names of all students in tblStudent along with a field indicating the status of their schedule in tblStudentSchedule. If either course or teacher field in tblStudentSchedule is Null or no corresponding record in tblStudentSchedule is found then I want to display "Incomplete". Otherwise, I want to display "Complete".
Desired result:
Name | Schedule Status
-----------------------
Bob | Incomplete
Sally | Incomplete
Jane | Incomplete
Matt | Incomplete
Jim | Complete
I'm working in Access. I would post my query attempts but I think they would just confuse the issue. This is probably very basic but I am having a mental block trying to wrap my brain around this one.
tblStudent
studentID | studentName
-----------------------
1 | Bob
2 | Sally
3 | Jane
4 | Matt
5 | Jim
tblStudentSchedule
studentID | period | course  | teacher
-------------------------------------
1         | 1      | math    | Jones
1         | 2      | <null>  | Watson
2         | 1      | reading | <null>
4         | 1      | <null>  | Crick
5         | 1      | math    | Jones
select s.studentName as Name
, iif(sum(iif(ss.course is null or ss.teacher is null, 1, 0)) = 0,
'Complete', 'Incomplete')
as [Schedule Status]
from tblStudent s
left join
tblStudentSchedule ss
on ss.studentID = s.studentID
group by
s.studentName
A left join returns a single row with nulls in the right table's columns when no match is found. So the ss.course is null check will also trigger when the student is absent from the schedule table.
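To see the LEFT JOIN behaviour described above, here is a sketch that uses Python's sqlite3 with the question's sample tables. A CASE WHEN expression stands in for Access's iif(), so this is SQLite SQL rather than Access SQL. Jane has no schedule rows, so the join produces one all-NULL row for her, and that row counts as incomplete.

```python
import sqlite3

# The question's sample tables (NULL written where the question shows <null>).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblStudent (studentID INTEGER, studentName TEXT);
    CREATE TABLE tblStudentSchedule (
        studentID INTEGER, period INTEGER, course TEXT, teacher TEXT);
    INSERT INTO tblStudent VALUES
        (1, 'Bob'), (2, 'Sally'), (3, 'Jane'), (4, 'Matt'), (5, 'Jim');
    INSERT INTO tblStudentSchedule VALUES
        (1, 1, 'math', 'Jones'), (1, 2, NULL, 'Watson'),
        (2, 1, 'reading', NULL), (4, 1, NULL, 'Crick'),
        (5, 1, 'math', 'Jones');
""")

# Count incomplete schedule rows per student; the all-NULL row that the
# LEFT JOIN produces for Jane also counts as incomplete.
rows = conn.execute("""
    SELECT s.studentName AS Name,
           CASE WHEN SUM(CASE WHEN ss.course IS NULL OR ss.teacher IS NULL
                              THEN 1 ELSE 0 END) = 0
                THEN 'Complete' ELSE 'Incomplete'
           END AS "Schedule Status"
    FROM tblStudent AS s
    LEFT JOIN tblStudentSchedule AS ss ON ss.studentID = s.studentID
    GROUP BY s.studentID, s.studentName
    ORDER BY s.studentID
""").fetchall()

for name, status in rows:
    print(name, status)
```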
If no corresponding record in tblStudentSchedule is found, you can still get a row for it, with null columns, by using a left or right join. Read below:
http://office.microsoft.com/en-us/access-help/left-join-right-join-operations-HP001032251.aspx
Then, to convert the null columns, use the IsNull function:
http://www.techonthenet.com/access/functions/advanced/isnull.php
Or use a case statement to check for null:
http://www.techonthenet.com/access/functions/advanced/case.php
http://www.techonthenet.com/access/functions/advanced/case.php