I am trying to create a query that lists records from table 1 along with a status based on corresponding records in table 2 that have null values in one or more of the fields. The problem I am running into is how to include records from table 1 which have no corresponding record in table 2.
In my example, I want to list the names of all students in tblStudent along with a field indicating the status of their schedule in tblStudentSchedule. If either course or teacher field in tblStudentSchedule is Null or no corresponding record in tblStudentSchedule is found then I want to display "Incomplete". Otherwise, I want to display "Complete".
desired result
Name | Schedule Status
-----------------------
Bob | Incomplete
Sally | Incomplete
Jane | Incomplete
Matt | Incomplete
Jim | Complete
I'm working in Access. I would post my query attempts but I think they would just confuse the issue. This is probably very basic but I am having a mental block trying to wrap my brain around this one.
tblStudent
studentID | studentName
-----------------------
1 | Bob
2 | Sally
3 | Jane
4 | Matt
5 | Jim
tblStudentSchedule
studentID | period | course | teacher
-------------------------------------
1 | 1 | math | Jones
1 | 2 | <null> | Watson
2 | 1 | reading| <null>
4 | 1 | <null> | Crick
5 | 1 | math | Jones
select s.studentName as Name
, iif(sum(iif(ss.course is null or ss.teacher is null, 1, 0)) = 0,
'Complete', 'Incomplete')
as [Schedule Status]
from tblStudent s
left join
tblStudentSchedule ss
on ss.studentID = s.studentID
group by
s.studentName
A left join returns a single row with null when a match is not found. So the check on ss.course is null will also trigger when the student is absent from the schedule table.
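To see the left join and the conditional sum working together, here is a minimal sketch that loads the sample data into an in-memory SQLite database from Python. SQLite is only a stand-in for Access here (an assumption for the sake of a runnable example), so the nested iif calls are written as CASE expressions; the logic is otherwise the same as the query above.

```python
import sqlite3

# In-memory SQLite database standing in for the Access tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblStudent (studentID INTEGER PRIMARY KEY, studentName TEXT);
CREATE TABLE tblStudentSchedule (
    studentID INTEGER, period INTEGER, course TEXT, teacher TEXT);
INSERT INTO tblStudent VALUES
    (1,'Bob'),(2,'Sally'),(3,'Jane'),(4,'Matt'),(5,'Jim');
INSERT INTO tblStudentSchedule VALUES
    (1,1,'math','Jones'),(1,2,NULL,'Watson'),(2,1,'reading',NULL),
    (4,1,NULL,'Crick'),(5,1,'math','Jones');
""")

# Count the "bad" schedule rows per student. A student with no schedule
# rows still gets one joined row of NULLs, which counts as bad too.
status_sql = """
SELECT s.studentName,
       CASE WHEN SUM(CASE WHEN ss.course IS NULL OR ss.teacher IS NULL
                          THEN 1 ELSE 0 END) = 0
            THEN 'Complete' ELSE 'Incomplete' END AS schedule_status
FROM tblStudent AS s
LEFT JOIN tblStudentSchedule AS ss ON ss.studentID = s.studentID
GROUP BY s.studentID, s.studentName
ORDER BY s.studentID
"""
statuses = dict(conn.execute(status_sql).fetchall())
for name, status in statuses.items():
    print(name, "|", status)
```

Jane has no schedule rows at all, yet she comes out as Incomplete because the left join supplies a row whose course and teacher are both null.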
If no corresponding record in tblStudentSchedule is found, you can still get a row for the student, with that table's columns returned as null, by using a left or right join. Read below:
http://office.microsoft.com/en-us/access-help/left-join-right-join-operations-HP001032251.aspx
Then test the column for null with the IsNull function (or substitute a default value with Nz):
http://www.techonthenet.com/access/functions/advanced/isnull.php
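Or use a conditional expression to check for null. Note that Access queries do not support SQL CASE expressions (the Case statement linked below is VBA); inside a query, use IIf or Switch instead: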
http://www.techonthenet.com/access/functions/advanced/case.php
EDIT
I've edited this question to make it a little more concise, if you see my edit history you will see my effort and 'what I've tried' but it was adding a lot of unnecessary noise and causing confusion so here is a summary of input and output:
People:
ID | FullName
--------------------
1 | Jimmy
2 | John
3 | Becky
PeopleJobRequirements:
ID | PersonId | Title
--------------------
1 | 1 | Some Requirement
2 | 1 | Another Requirement
3 | 2 | Some Requirement
4 | 3 | Another Requirement
Output:
FullName | RequirementTitle
---------------------------
Jimmy | Some Requirement
Jimmy | Another Requirement
John | Some Requirement
John | null
Becky | null
Becky | Another Requirement
Each person has 2 records, because that's how many distinct requirements there are in the table (distinct based on 'Title').
Assume there is no third table - the 'PeopleJobRequirements' is unique to each person (one person to many requirements), but there will be duplicate Titles in there (some people have the same job requirements).
Sincere apologies for any confusion caused by the recent updates.
CROSS JOIN to get equal record for each person and LEFT JOIN for matching records.
Following query should work in your scenario
select p.Id, p.FullName,r.Title
FROM People p
cross join (select distinct title from PeopleJobRequirements ) pj
left join PeopleJobRequirements r on p.id=r.personid and pj.Title=r.Title
order by fullname
Output
+----+----------+---------------------+
| Id | FullName | Title |
+----+----------+---------------------+
| 3 | Becky | Another Requirement |
+----+----------+---------------------+
| 3 | Becky | NULL |
+----+----------+---------------------+
| 1 | Jimmy | Some Requirement |
+----+----------+---------------------+
| 1 | Jimmy | Another Requirement |
+----+----------+---------------------+
| 2 | John | NULL |
+----+----------+---------------------+
| 2 | John | Some Requirement |
+----+----------+---------------------+
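The CROSS JOIN plus LEFT JOIN approach can be tried out with a short sketch. It assumes an in-memory SQLite database driven from Python in place of the asker's actual database, loaded with the People and PeopleJobRequirements sample rows from the question.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (ID INTEGER PRIMARY KEY, FullName TEXT);
CREATE TABLE PeopleJobRequirements (
    ID INTEGER PRIMARY KEY, PersonId INTEGER, Title TEXT);
INSERT INTO People VALUES (1,'Jimmy'),(2,'John'),(3,'Becky');
INSERT INTO PeopleJobRequirements VALUES
    (1,1,'Some Requirement'),(2,1,'Another Requirement'),
    (3,2,'Some Requirement'),(4,3,'Another Requirement');
""")

# CROSS JOIN builds every (person, distinct title) pair; the LEFT JOIN then
# fills in the title only where that person actually has the requirement.
rows = conn.execute("""
SELECT p.FullName, r.Title
FROM People AS p
CROSS JOIN (SELECT DISTINCT Title FROM PeopleJobRequirements) AS pj
LEFT JOIN PeopleJobRequirements AS r
       ON r.PersonId = p.ID AND r.Title = pj.Title
ORDER BY p.ID, pj.Title
""").fetchall()

for full_name, title in rows:
    print(full_name, "|", title)
```

Each person gets one row per distinct title, so three people and two titles give six rows, with None (NULL) where the person lacks that requirement.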
Use a left join; no subquery is needed.
select p.*, jr.*, jrr.*
from People p
left join PeopleJobRequirements jr on p.Id = jr.PersonId
left join JobRoleRequirements jrr on p.Id = jrr.PersonId
According to the explanation, the People and PeopleJobRequirements tables have a many-to-many (n to n) relationship.
So first of all you'll need another table to relate these two tables.
Do this first, and then a full join will give the right result.
I'm pretty new to SQL Server (using ssms). I need some help to insert and organize data from one table into multiple tables (which are connected to each other by PK/FK).
The source table has the following columns:
Email, UserName, Phone
It's a messy table with lots of duplicates: same email with different username and so on..
My data tables are:
Person - PersonID(PK, int, not null)
Email - Email (nvarchar, null) , PersonID (FK, int, not null)
Phone - PhoneNumber (int, null) , PersonID (FK, int, not null)
UserName - UserName (nvarchar, null) , PersonID (FK, int, not null)
For each row in the source table, I need to check if the person already exists (by Email); if it does exist, I need to add the new data (if any), else I need to create a new person and add the data.
I searched here for solutions and found recommendations to use a CURSOR.
I tried it, but it takes a really long time to execute (hours, and still going).
Thanks for any help!
example:
from>
EMAIL | USERNAME | PHONE
------------------------
a#a.a | john | 956165
b#b.b | smith | 123456
c#c.c | bob | 654321
d#d.d | mike | 986514
a#a.a | dan | 658732
e#e.e | dave | 147258
f#f.f | harry | 951962
b#b.b | emmy | 456789
g#g.g | kelly | 789466
h#h.h | kelly | 258369
a#a.a | ana | 852369
to>
EMAIL | PERSONID
----------------
a#a.a | 1
b#b.b | 2
c#c.c | 3
d#d.d | 4
e#e.e | 5
f#f.f | 6
g#g.g | 7
h#h.h | 8
USERNAME | PERSONID
-------------------
john | 1
smith | 2
bob | 3
mike | 4
dan | 1
dave | 5
harry | 6
emmy | 2
kelly | 7
kelly | 8
ana | 1
PHONE | PERSONID
----------------
956165 | 1
123456 | 2
654321 | 3
986514 | 4
658732 | 1
147258 | 5
951962 | 6
456789 | 2
789466 | 7
258369 | 8
852369 | 1
Cursors will generally be slower because they operate row by row. Set-based operations, such as joins, will usually perform better. It's somewhat older, but this article further details the implications of cursors as opposed to set operations.
I wasn't entirely sure which columns you want to use to verify matches, or what data to add, so a basic example is below and you can fill in the columns as necessary. The Email table is used in the example.
The first statement is an UPDATE that updates existing rows from the corresponding rows in the source table. Because it uses an INNER JOIN, only rows with matches on both sides are affected. The second statement is an INSERT that uses only rows from the source table that don't already exist in the Email table.
The same functionality could also be accomplished with the MERGE statement, but there are a number of known issues with it, including problems with deadlocks and key violations.
Update Existing Rows:
UPDATE E
SET E.ColumnA = SRC.ColumnA,
E.ColumnB = SRC.ColumnB
FROM YourDatabase.YourSchema.Email E
INNER JOIN YourDatabase.YourSchema.SourceTable SRC
ON E.Email = SRC.Email
Add New Rows:
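INSERT INTO YourDatabase.YourSchema.Email (ColumnA, ColumnB)
SELECT
SRC.ColumnA,
SRC.ColumnB
FROM YourDatabase.YourSchema.SourceTable SRC
-- NOT EXISTS rather than NOT IN: Email is nullable, and NOT IN returns no rows at all if the subquery yields a NULL
WHERE NOT EXISTS (SELECT 1 FROM YourDatabase.YourSchema.Email E WHERE E.Email = SRC.Email)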
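As a rough illustration of how the set-based approach could map onto the asker's sample data, here is a sketch using an in-memory SQLite database from Python rather than SQL Server. Several things here are assumptions: the staging table is called Source, the Email table's auto-assigned PersonID stands in for the separate Person table, and a hypothetical load() helper wraps the two statements. The Phone table would follow the same pattern as UserName.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Source (Email TEXT, UserName TEXT, Phone INTEGER);
-- PersonID is auto-assigned here; it stands in for the Person table.
CREATE TABLE Email (PersonID INTEGER PRIMARY KEY, Email TEXT UNIQUE);
CREATE TABLE UserName (UserName TEXT, PersonID INTEGER REFERENCES Email(PersonID));
""")
source_rows = [
    ("a#a.a", "john", 956165), ("b#b.b", "smith", 123456),
    ("c#c.c", "bob", 654321), ("d#d.d", "mike", 986514),
    ("a#a.a", "dan", 658732), ("e#e.e", "dave", 147258),
    ("f#f.f", "harry", 951962), ("b#b.b", "emmy", 456789),
    ("g#g.g", "kelly", 789466), ("h#h.h", "kelly", 258369),
    ("a#a.a", "ana", 852369),
]
conn.executemany("INSERT INTO Source VALUES (?, ?, ?)", source_rows)

def load(conn):
    """Hypothetical helper: set-based load, safe to run more than once."""
    # One new person per email that is not already known.
    conn.execute("""
        INSERT INTO Email (Email)
        SELECT s.Email FROM Source AS s
        WHERE NOT EXISTS (SELECT 1 FROM Email AS e WHERE e.Email = s.Email)
        GROUP BY s.Email""")
    # Attach each user name to the person found via the email.
    conn.execute("""
        INSERT INTO UserName (UserName, PersonID)
        SELECT DISTINCT s.UserName, e.PersonID
        FROM Source AS s
        JOIN Email AS e ON e.Email = s.Email
        WHERE NOT EXISTS (SELECT 1 FROM UserName AS u
                          WHERE u.UserName = s.UserName
                            AND u.PersonID = e.PersonID)""")
    conn.commit()

load(conn)
load(conn)  # second run adds nothing new

email_count = conn.execute("SELECT COUNT(*) FROM Email").fetchone()[0]
username_count = conn.execute("SELECT COUNT(*) FROM UserName").fetchone()[0]
names_for_a = {row[0] for row in conn.execute("""
    SELECT u.UserName FROM UserName AS u
    JOIN Email AS e ON e.PersonID = u.PersonID
    WHERE e.Email = 'a#a.a'""")}
print(email_count, username_count, sorted(names_for_a))
```

Two statements per target table replace the per-row cursor loop, and the NOT EXISTS guards keep a repeated load from creating duplicates.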
I'm working with a SQLite database that receives large data dumps on a regular basis from several sources. Unfortunately, those sources aren't intelligent about what they dump, and I end up with a lot of repeated records from one time to the next. I'm looking for a way to remove these repeated records without affecting the records that have legitimately changed from the past dump to this one.
Here's the general structure of the data (_id is the primary key):
| _id | _dateUpdated | _dateEffective | _dateExpired | name | status | location |
|-----|--------------|----------------|--------------|------|--------|----------|
| 1 | 2016-05-01 | 2016-05-01 | NULL | Fred | Online | USA |
| 2 | 2016-05-01 | 2016-05-01 | NULL | Jim | Online | USA |
| 3 | 2016-05-08 | 2016-05-08 | NULL | Fred | Offline| USA |
| 4 | 2016-05-08 | 2016-05-08 | NULL | Jim | Online | USA |
| 5 | 2016-05-15 | 2016-05-15 | NULL | Fred | Offline| USA |
| 6 | 2016-05-15 | 2016-05-15 | NULL | Jim | Online | USA |
I'd like to be able to reduce this data to something like this:
| _id | _dateUpdated | _dateEffective | _dateExpired | name | status | location |
|-----|--------------|----------------|--------------|------|--------|----------|
| 1 | 2016-05-01 | 2016-05-01 | 2016-05-07 | Fred | Online | USA |
| 2 | 2016-05-15 | 2016-05-01 | NULL | Jim | Online | USA |
| 3 | 2016-05-15 | 2016-05-08 | NULL | Fred | Offline| USA |
The idea here is that rows 4, 5, and 6 exactly duplicate rows 2 and 3 except for the timestamps (I'd need to compare by all three fields - name, status, location). However, row 3 does not duplicate row 1 (status changed from Online to Offline), so the _dateExpired field is set in row 1, and row 3 becomes the most recent record.
I'm querying this table with something like this:
SELECT * FROM Data WHERE
date(_dateEffective) <= date('now')
AND (_dateExpired IS NULL OR date(_dateExpired) > date('now'))
Is this sort of reduction possible in SQLite?
I am still a beginner to SQL and database design in general, so it's possible that I haven't structured the database in the best way. I'm open to suggestions there as well...I'm going for the ability to query data at a given point in time - for example, "what was Jim's status around 2016-05-06?"
Thanks in advance!
Consider using a staging table where the dump file goes into a DumpTable (regularly cleaned out before each dump) and then an INSERT...SELECT query migrates to your final table.
The SELECT portion uses a correlated subquery (to calculate the new [_dateExpired] for the rows that need it) and a derived table (to keep only the first row of each group and drop the duplicates according to your criteria). Finally, the LEFT JOIN ... IS NULL against FinalTable ensures that records already loaded are not appended again, assuming [_id] is a unique identifier. Below is the routine:
Clean Out DumpTable
DELETE FROM DumpTable;
Run Dump Routine to be appended into DumpTable
Append Records to FinalTable
INSERT INTO FinalTable ([_id], [_dateUpdated], [_dateEffective], [_dateExpired],
[name], status, location)
SELECT d.[_id], d.[_dateUpdated], d.[_dateEffective],
-- day before the next row for the same name with a different status
(SELECT Min(date(sub.[_dateEffective], '-1 day'))
FROM DumpTable sub
WHERE sub.[name] = d.[name]
AND sub.[_dateEffective] > d.[_dateEffective]
AND sub.status <> d.status) As calcExpired,
d.name, d.status, d.location
FROM DumpTable d
INNER JOIN
(SELECT Min(DumpTable.[_id]) AS min_id,
DumpTable.name, DumpTable.status
FROM DumpTable
GROUP BY DumpTable.name, DumpTable.status) AS c
ON (c.name = d.name)
AND (c.min_id = d.[_id])
AND (c.status = d.status)
LEFT JOIN FinalTable f
ON d.[_id] = f.[_id]
WHERE f.[_id] IS NULL;
-- INSERTED RECORDS:
-- _id _dateUpdated _dateEffective _dateExpired name status location
-- 1 2016-05-01 2016-05-01 2016-05-07 Fred Online USA
-- 2 2016-05-01 2016-05-01 Jim Online USA
-- 3 2016-05-08 2016-05-08 Fred Offline USA
Is this sort of reduction possible in SQLite?
The answer to any "reduction" question in SQL is always Yes. The trick is to find what axes you're reducing along.
Here's a partial solution to illustrate; it gives the first Online date for each name & location.
select min(_dateEffective) as start_date
, name
, location
from Data
where status = 'Online'
group by
name
, location
With an outer join back to the table (on name & location) where the status is 'Offline' and the _dateEffective is greater than start_date, you get your _dateExpired.
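The outer join described above could look roughly like the following sketch, run against the question's sample rows in an in-memory SQLite database from Python. The alias names (o, x) and the date_expired column are illustrative assumptions, and subtracting one day from the first Offline date follows the asker's desired output rather than anything stated in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Data (
    _id INTEGER PRIMARY KEY, _dateUpdated TEXT, _dateEffective TEXT,
    _dateExpired TEXT, name TEXT, status TEXT, location TEXT);
INSERT INTO Data VALUES
    (1,'2016-05-01','2016-05-01',NULL,'Fred','Online','USA'),
    (2,'2016-05-01','2016-05-01',NULL,'Jim','Online','USA'),
    (3,'2016-05-08','2016-05-08',NULL,'Fred','Offline','USA'),
    (4,'2016-05-08','2016-05-08',NULL,'Jim','Online','USA'),
    (5,'2016-05-15','2016-05-15',NULL,'Fred','Offline','USA'),
    (6,'2016-05-15','2016-05-15',NULL,'Jim','Online','USA');
""")

# Inner query: first Online date per name and location (the partial solution).
# Outer join: later Offline rows for the same name and location, if any.
expiry_sql = """
SELECT o.name, o.location, o.start_date,
       MIN(date(x._dateEffective, '-1 day')) AS date_expired
FROM (SELECT MIN(_dateEffective) AS start_date, name, location
      FROM Data
      WHERE status = 'Online'
      GROUP BY name, location) AS o
LEFT JOIN Data AS x
       ON x.name = o.name
      AND x.location = o.location
      AND x.status = 'Offline'
      AND x._dateEffective > o.start_date
GROUP BY o.name, o.location, o.start_date
ORDER BY o.name
"""
expiries = conn.execute(expiry_sql).fetchall()
for row in expiries:
    print(row)
```

Fred's Online period ends the day before his first Offline row, while Jim never goes Offline, so his expiry stays NULL.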
_id is the primary key
There is a commonly held misunderstanding that every table needs some kind of sequential "ID" number as a primary key. The key you really care about is known as a natural key, 1 or more columns in the data that uniquely identify the data. In your case, it looks to me like that's _dateEffective, name, status, and location. At the very least, declare them unique to prevent accidental duplication.
I apologize in advance if something like this has already been discussed elsewhere, but if it has, I was unable to find it (I'm not even sure how to search such a thing). I'm trying to join two tables, "employees" and "leave." I want to list every employee from the "employees" table AND populate the report with leave data from the "leave" table where the 'leave date' (bdate column in the leave table) is greater than January 1st, 2014 (or current year). The problem is that not every employee has leave data, so I'm finding that a normal join only fetches data from those employees who actually have leave data. I think what I want is a left join, but I'm only getting records from both tables where there is actually data for that employee in both tables (hope that makes sense).
Select bunch_of_columns, leave.bdate, SUM(leave.Vhours) as TotalVacationHours, SUM(leave.shours) as TotalSickHours
from employees
left join leave on employees.id=leave.id
where employees.user_active ='1' AND leave.BDate >= '2014-01-01'
group by employees.id
Order by employees.user_last
This produces ONE record of an individual who has a leave record after "2014-01-01." I want a complete list of employee records from the employee table with available data from the leave table (and blank if there is none) where the "bdate" column in the leave table is greater than new years day.
I want this:
+-----+----------+---------------+---------------+--------------+
|ID | Name | Vacation Hrs | Sick Hrs | Date |
+-----+----------+---------------+---------------+--------------+
| 1 | Bob | 5 | 8 | 2014-01-01 |
| 2 | Lucy | NULL | NULL | NULL |
| 3 | Jerry | NULL | NULL | NULL |
| 4 | Dieter | 3 | 5 | 2014-01-08 |
| 5 | Sprockets| NULL | NULL | NULL |
+-----+----------+---------------+---------------+--------------+
Not this:
+-----+----------+---------------+---------------+--------------+
| row | Name | Vacation Hrs | Sick Hrs | Date |
+-----+----------+---------------+---------------+--------------+
| 1 | Bob | 5 | 8 | 2014-01-01 |
| 4 | Dieter | 3 | 5 | 2014-01-08 |
+-----+----------+---------------+---------------+--------------+
It's because of your WHERE condition.
leave.BDate >= '2014-01-01'
If you do a LEFT JOIN and then filter a column in the right table to something that can't be NULL, it's equivalent to doing an INNER JOIN.
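If there's no leave date then the record doesn't fit the criteria. You could check instead that:
(leave.BDate >= '2014-01-01' OR leave.BDate IS NULL)
Note that this still drops an employee whose only leave records are dated before 2014, because those rows exist and their date is not NULL. A more robust way to write it (as pointed out by OGHaza) is to apply the date condition in the JOIN's ON clause: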
Select
bunch_of_columns,
leave.bdate,
COALESCE( SUM(leave.Vhours), 0 ) as TotalVacationHours,
COALESCE( SUM(leave.shours), 0 ) as TotalSickHours
from
employees
left join leave
on employees.id=leave.id
AND leave.BDate >= '2014-01-01'
where
employees.user_active ='1'
group by
employees.id
Order by
employees.user_last
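The difference between filtering in WHERE and filtering in ON is easy to check with a small experiment. The sketch below assumes an in-memory SQLite database driven from Python, a leave table renamed to the hypothetical emp_leave, and one invented extra row (Lucy with leave from 2013) that is not in the question but exposes the edge case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE emp_leave (id INTEGER, bdate TEXT, vhours INTEGER, shours INTEGER);
INSERT INTO employees VALUES
    (1,'Bob'),(2,'Lucy'),(3,'Jerry'),(4,'Dieter'),(5,'Sprockets');
-- Lucy's 2013 row is an invented edge case: leave exists, but only before 2014.
INSERT INTO emp_leave VALUES
    (1,'2014-01-01',5,8),(4,'2014-01-08',3,5),(2,'2013-06-01',2,0);
""")

base_sql = """
SELECT e.name, SUM(l.vhours), SUM(l.shours)
FROM employees AS e
LEFT JOIN emp_leave AS l ON l.id = e.id {on_extra}
{where}
GROUP BY e.id, e.name
ORDER BY e.id
"""

def run(on_extra="", where=""):
    """Run the report with the date filter placed in ON or in WHERE."""
    return conn.execute(base_sql.format(on_extra=on_extra, where=where)).fetchall()

where_only = run(where="WHERE l.bdate >= '2014-01-01'")
where_or_null = run(where="WHERE l.bdate >= '2014-01-01' OR l.bdate IS NULL")
on_clause = run(on_extra="AND l.bdate >= '2014-01-01'")

print("WHERE only:   ", [r[0] for r in where_only])
print("WHERE OR NULL:", [r[0] for r in where_or_null])
print("ON clause:    ", [r[0] for r in on_clause])
```

With the filter in WHERE only Bob and Dieter survive; adding OR ... IS NULL brings back Jerry and Sprockets but still loses Lucy; with the filter in ON all five employees are listed, and Lucy's totals are NULL.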
Try using a FULL OUTER JOIN for such a condition.
From MSDN
The full outer join or full join returns all rows from both tables, matching up the rows wherever a match can be made and placing NULLs in the places where no matching row exists.
What is the most efficient way to get data from two tables set up in the following way:
Table 1:
ID(PK) | Name | Age
--------------------------
1 | Jim | 44
2 | Jane | 35
3 | John | 22
Table 2
Name(PK) | Pet(PK)
-----------------
Jim | Cat
Jim | Dog
Jane | Fish
There is a constraint on "Name" with the FK in Table 2
Results
I want the age and all the pets for a specific person.
Name | Age | Pet
---------------------
Jim | 44 | Cat
Jim | 44 | Dog
As I see it these are my options:
1) Left join table 2 on Name and end up with redundant data in my resulting array for Name and Age (as above).
2) Use a function that turns the pets into a comma separated list.
3) Use 2 separate selects.
My question is relating to performance of the 3 options above. I don't need SQL (specifically, unless you want to suggest another method).
Thanks!
select
tb01.name, tb01.age, tb02.pet
from
table01 tb01
left join table02 tb02 on tb02.name = tb01.name
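For comparison, options 1 and 2 from the question can be sketched side by side. This assumes an in-memory SQLite database driven from Python, with table01 and table02 built from the sample rows and SQLite's group_concat standing in for whatever list-aggregation function the real database offers. It makes no claim about which option is faster; that would need measuring on the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table01 (id INTEGER PRIMARY KEY, name TEXT UNIQUE, age INTEGER);
CREATE TABLE table02 (
    name TEXT REFERENCES table01(name), pet TEXT,
    PRIMARY KEY (name, pet));
INSERT INTO table01 VALUES (1,'Jim',44),(2,'Jane',35),(3,'John',22);
INSERT INTO table02 VALUES ('Jim','Cat'),('Jim','Dog'),('Jane','Fish');
""")

# Option 1: plain left join; name and age repeat once per pet.
joined = conn.execute("""
SELECT tb01.name, tb01.age, tb02.pet
FROM table01 AS tb01
LEFT JOIN table02 AS tb02 ON tb02.name = tb01.name
WHERE tb01.name = ?
ORDER BY tb02.pet
""", ("Jim",)).fetchall()

# Option 2: one row per person, pets collapsed into a comma-separated list.
listed = conn.execute("""
SELECT tb01.name, tb01.age, group_concat(tb02.pet, ',') AS pets
FROM table01 AS tb01
LEFT JOIN table02 AS tb02 ON tb02.name = tb01.name
WHERE tb01.name = ?
GROUP BY tb01.name, tb01.age
""", ("Jim",)).fetchone()

print(joined)
print(listed)
```

Both variants are a single round trip to the database; the join repeats name and age per pet, while the aggregated form returns one row and leaves the splitting to the caller.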