Query Design, Strange Results - New to SQL-Server - sql

I've written a query that's producing ghost records. Here are the statements, which produce correct results on one table JOINed to a second table to grab the student's LAST_ATTEND_DATE. Notice that the LAST_ATTEND_DATE column is commented out, so it won't display:
SELECT DISTINCT TOP 500
SAC.STC_PERSON_ID AS CCID#,
SAC.STC_COURSE_NAME AS CourseName,
SAC.STC_TITLE AS Title,
SAC.STC_VERIFIED_GRADE AS Grade,
--CONVERT(varchar(10),SCS.SCS_LAST_ATTEND_DATE,101) AS LastAttended,
SAC.STC_REPORTING_TERM AS Term,
SAC.STC_ACAD_LEVEL AS AcadLevel
FROM STUDENT_ACAD_CRED SAC
JOIN STUDENT_COURSE_SEC SCS ON SAC.STC_PERSON_ID = SCS.SCS_STUDENT
WHERE (SAC.STC_ACAD_LEVEL = 'UG') AND (SCS.SCS_LAST_ATTEND_DATE IS NOT NULL)
ORDER BY SAC.STC_PERSON_ID;
This produces what I need, except that I also need to display each student's Last Attended Date in the results. If I un-comment the statement above to display the LAST_ATTEND_DATE, 4 records appear, of which 2 are ghost records. For example, student ID = '0000002' took English 1010 once in the Fall of 1992 and made a D, then retook the course in the Fall of 1993 and made a B.
0000002 ENGL*1010 English I D 92/FA UG
0000002 ENGL*1010 English I B 93/FA UG
With the LAST_ATTEND_DATE statement (CONVERT(varchar(10),SCS.SCS_LAST_ATTEND_DATE,101) AS LastAttended) un-commented to display the date, then 3 additional records appear...
I've tried changing the query between the 2 tables from JOIN, to LEFT JOIN, FULL JOIN and RIGHT JOIN. I always get 3 additional records that don't exist.
0000002 ENGL*1010 English I B 01/19/1995 93/FA UG
0000002 ENGL*1010 English I B 07/18/1996 93/FA UG
0000002 ENGL*1010 English I B 09/25/1992 93/FA UG
0000002 ENGL*1010 English I D 01/19/1995 92/FA UG
0000002 ENGL*1010 English I D 07/18/1996 92/FA UG
Would anyone know the correct syntax to JOIN these 2 tables correctly to display the data correctly?
Thanks so much for sharing your knowledge,
Donald, Casper College

Most likely the Student_Course_Sec table contains more than one record per student, which your join statement is not accounting for.
For example, if the SCS table consists of:
SCS_Student SCS_CourseName SCS_LastAttendDate
1 English 1/1/2014
1 Calculus 2/1/2014
2 English 3/1/2014
2 Philosophy 4/1/2014
And your SAC table consists of:
STC_PERSON_ID STC_COURSE_NAME etc.
1 English
1 Calculus
2 English
2 Philosophy
then when you SELECT * FROM SAC JOIN SCS ON SAC.STC_PERSON_ID = SCS.SCS_STUDENT, your result set looks like this:
(row) STC_ID STC_Course SCS_ID SCS_Course SCS_Date
1 1 English 1 English 1/1/2014
2 1 English 1 Calculus 2/1/2014
3 1 Calculus 1 English 1/1/2014
4 1 Calculus 1 Calculus 2/1/2014
5 2 English 2 English 3/1/2014
6 2 English 2 Philosophy 4/1/2014
7 2 Philosophy 2 English 3/1/2014
8 2 Philosophy 2 Philosophy 4/1/2014
Your WHERE clause then filters out all the rows where STC_COURSE is not "English", leaving you with 4 rows (row numbers 1,2,5,6) instead of just the 2 you really want (rows 1 and 5). (And, because you're not reporting any of the other fields, it just looks like "phantom records" appear out of nowhere.)
To fix it, you need additional conditions on your JOIN specifying what else besides the ID needs to match up. In my contrived case, you would need to say
JOIN STUDENT_COURSE_SEC SCS ON SAC.STC_PERSON_ID = SCS.SCS_STUDENT AND SAC.STC_COURSE_NAME = SCS.SCS_COURSE_NAME
which selects only rows where both the student and the course are a proper match.
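To see the fan-out for yourself, here is a minimal sketch that loads the contrived rows above into an in-memory SQLite database (via Python's sqlite3 module) and compares the two join conditions. The cut-down table names SAC and SCS, and the SCS_COURSE_NAME column, are assumptions taken from my made-up example, not from the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Contrived tables from the example above; not the real schema.
    CREATE TABLE SAC (STC_PERSON_ID INTEGER, STC_COURSE_NAME TEXT);
    CREATE TABLE SCS (SCS_STUDENT INTEGER, SCS_COURSE_NAME TEXT,
                      SCS_LAST_ATTEND_DATE TEXT);
    INSERT INTO SAC VALUES (1,'English'),(1,'Calculus'),
                           (2,'English'),(2,'Philosophy');
    INSERT INTO SCS VALUES (1,'English','1/1/2014'),(1,'Calculus','2/1/2014'),
                           (2,'English','3/1/2014'),(2,'Philosophy','4/1/2014');
""")

# Joining on the student ID alone pairs every SAC row with every SCS row
# belonging to the same student.
id_only = conn.execute("""
    SELECT COUNT(*)
    FROM SAC
    JOIN SCS ON SAC.STC_PERSON_ID = SCS.SCS_STUDENT
""").fetchone()[0]

# Adding the course to the join condition keeps one SCS row per SAC row.
id_and_course = conn.execute("""
    SELECT SAC.STC_PERSON_ID, SAC.STC_COURSE_NAME, SCS.SCS_LAST_ATTEND_DATE
    FROM SAC
    JOIN SCS ON SAC.STC_PERSON_ID = SCS.SCS_STUDENT
            AND SAC.STC_COURSE_NAME = SCS.SCS_COURSE_NAME
    ORDER BY SAC.STC_PERSON_ID, SCS.SCS_LAST_ATTEND_DATE
""").fetchall()

print(id_only)             # 8
print(len(id_and_course))  # 4
```

In the real tables the second matching column will have a different name (probably some course-section key), so treat the ON clause as a pattern rather than something to paste in.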

The 'ghost' records are actually the true result set. The reason that they don't display when you comment out SCS.SCS_LAST_ATTEND_DATE is that you are creating duplicate records since the date is the only differentiator, and your DISTINCT is suppressing the duplicates.
If you remove the DISTINCT, and leave SCS.SCS_LAST_ATTEND_DATE commented out, you should then get the same number of rows as when you uncomment the date.
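A quick way to check this is to count rows with and without DISTINCT. The sketch below uses an in-memory SQLite database through Python's sqlite3 module and made-up minimal data loosely modelled on student 0000002 (two credit rows, three section rows); the reduced column lists and the ISO date format are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the real tables (assumed columns and rows).
    CREATE TABLE SAC (STC_PERSON_ID TEXT, STC_VERIFIED_GRADE TEXT,
                      STC_REPORTING_TERM TEXT);
    CREATE TABLE SCS (SCS_STUDENT TEXT, SCS_LAST_ATTEND_DATE TEXT);
    INSERT INTO SAC VALUES ('0000002','D','92/FA'),('0000002','B','93/FA');
    INSERT INTO SCS VALUES ('0000002','1992-09-25'),('0000002','1995-01-19'),
                           ('0000002','1996-07-18');
""")

join = "FROM SAC JOIN SCS ON SAC.STC_PERSON_ID = SCS.SCS_STUDENT"
cols = "SAC.STC_PERSON_ID, SAC.STC_VERIFIED_GRADE, SAC.STC_REPORTING_TERM"

def count(select_list):
    """Count the rows produced by the join for a given select list."""
    sql = f"SELECT COUNT(*) FROM (SELECT {select_list} {join})"
    return conn.execute(sql).fetchone()[0]

distinct_no_date = count("DISTINCT " + cols)   # duplicates collapsed
plain_no_date = count(cols)                    # what the join really returns
distinct_with_date = count("DISTINCT " + cols + ", SCS.SCS_LAST_ATTEND_DATE")

print(distinct_no_date, plain_no_date, distinct_with_date)  # 2 6 6
```

The join always produces 2 x 3 = 6 rows here; DISTINCT only hides that fact while the date column is left out.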
Playing around with the JOIN types implies that you don't really know what you are trying to query. As @MarkD said in the comments, we would need to see your data model in order to help you further.

Related

Table Join issue

Right now I've got a Main table in which I am uploading data. Because the Main table has many different duplicates, I Append various data out of the Main table into other tables such as, username, phone number, and locations in order to keep things optimized. Once I have everything stripped down from the Main table, I then append what's left into a final optimized Main table. Before this happens though, I run a select query joining all the stripped tables with the original Main table in order to connect the IDs from each table, with the correct data. For example:
Original Main Table
--Name---------Number------Due Date-------Location-------Charges Monthly-----Charges Total--
John Smith 111-1111 4/3 Chicago 234.56 500.23
Todd Jones 222-2222 4/3 New York 174.34 323.56
John Smith 111-1111 4/3 Chicago 274.56 670.23
Bill James 333-3333 4/3 Orlando 100.00 100.00
This gets split into 3 tables (name, number, location) and then there is a date table with all the dates for the year:
Name Table Number Table Location Table Due Date Table
--ID---Name------ -ID--Number--------- ---ID---Location---- --Date---
1 John Smith 1 111-1111 1 Chicago 4/1
2 Todd Jones 2 222-2222 2 New York 4/2
3 Bill James 3 333-3333 3 Orlando 4/3
Before The Original table gets stripped, I run a select query that grabs the ID from the 3 new tables, and joins them based on the connection they have with the original Main table.
Select Output
--Name ID----Number ID---Location ID---Due Date--
1 1 1 4/3
2 2 2 4/3
1 1 1 4/3
3 3 3 4/3
My issue comes when I need to introduce a new table that isn't able to be tied into the Original Main Table. I have an inventory table that, much like the original Main table, has duplicates and needs to be optimized. I do this by creating a secondary table that takes all the duplicated devices out and put them in their own table, and then strips the username and number out and puts them into their tables. I would like to add the IDs from this new device table into the select output that I have above. Resulting in:
Select Output
--Name ID----Number ID---Location ID---Due Date--Device ID---
1 1 1 4/3 1
2 2 2 4/3 1
1 1 1 4/3 2
3 3 3 4/3 1
Unlike the previous tables, the device table has no relationship to the original Main table, which is what is causing me so much headache. I can't seem to find a way to make this happen. Is there any way to accomplish this?
Any two tables can be joined. A table represents an application relationship. In some versions (not the original) of Entity-Relationship Modelling (notice that the "R" in E-R stands for "(application) relationship"!) a foreign key is sometimes called a "relationship". You do not need other tables or FKs to join any two tables.
Explain, in terms of its column names and the values for those names, exactly when a row should turn up in the result. Maybe you want:
SELECT *
FROM the stripped-and-ID'd version of the Original AS o
JOIN the stripped-and-ID'd version of the Device AS d
USING (NameID, NumberID, LocationID, DueDate)
That is:
SELECT *
FROM the stripped-and-ID'd version of the Original AS o
JOIN the stripped-and-ID'd version of the Device AS d
ON o.NameID=d.NameID AND o.NumberID=d.NumberID
AND o.LocationID=d.LocationID AND o.DueDate=d.DueDate
Suppose p(a,...) is some statement parameterized by a,... .
If o holds the rows where o(NameID,NumberID,LocationID,DueDate) and d holds the rows where d(NameID,NumberID,LocationID,DueDate,DeviceID) then the above holds the rows where o(NameID, NumberID, LocationID, DueDate) AND d(NameID,NumberID,LocationID,DueDate,DeviceID). But you really have not explained what rows you want.
The only way to "join" tables that have no relation is by unioning them together:
select attribute1, attribute2, ... , attributeN
from table1
where <predicate>
union -- or union all
select attribute1, attribute2, ... , attributeN
from table2
where <predicate>
The WHERE clauses are optional.
EDIT
Alternatively, you could join the tables with ON true, which acts like a cross product.
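For what it's worth, here is a small sketch of the difference between the two options, run against an in-memory SQLite database via Python's sqlite3 module. The names table reuses the names from the question; the devices table and its two rows are purely hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE names (id INTEGER, name TEXT);
    CREATE TABLE devices (id INTEGER, device TEXT);  -- hypothetical table
    INSERT INTO names VALUES (1,'John Smith'),(2,'Todd Jones'),(3,'Bill James');
    INSERT INTO devices VALUES (1,'Phone'),(2,'Tablet');  -- made-up rows
""")

# Cross product: every name paired with every device (3 x 2 rows).
cross_rows = conn.execute("""
    SELECT n.name, d.device
    FROM names n
    CROSS JOIN devices d
""").fetchall()

# UNION ALL: rows stacked on top of each other (3 + 2 rows), one column.
union_rows = conn.execute("""
    SELECT name FROM names
    UNION ALL
    SELECT device FROM devices
""").fetchall()

print(len(cross_rows), len(union_rows))  # 6 5
```

Neither result assigns a particular device to a particular person; for that, the device table needs some column (name, number, etc.) that can appear in an ON clause.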

Query to JOIN / *overwrite* field

I'm not sure if I'm using the correct terminology.
SELECT movies.*, actors.`First Name`, actors.`Last Name`
From movies
Inner Join actors on movies.`actor1` Where movies.`actor1` = actors.`indexActors`;
#Inner Join actors on movies.`actor2` Where movies.`actor2` = actors.`indexActors`;
I have the 2nd line commented out, each one works individually, and I'm wondering how to combine them.
Secondly, when I execute the query, I get the results:
ID Title Runtime Rating Actor1 Actor2 First Name Last Name
1 Se7en 127 R 1 2 Morgan Freeman
2 Bruce Almighty 101 PG-13 1 3 Morgan Freeman
3 Mr. Popper's Penguins 94 PG 3 4 Jim Carrey
4 Superbad 113 R 4 5 Emma Stone
5 Crazy, Stupid, Love. 118 PG-13 4 Null Emma Stone
Is there a way to add the results from the 2nd join to the rightmost columns?
Also, is it possible to combine the strings/VARCHARs from First Name and Last Name, and then have that value show up under the corresponding Actor Field?
(aka the field under Actor 1 for row 1 would be "Morgan Freeman" instead of "1")
Thanks.
Your sql is not valid, but you can achieve your goal by joining to the same table twice, with different aliases. This sort of thing
select blah blah blah
from table1 t1 join table2 t2 on t1.field1 = t2.field1
join table2 t2_again on t1.field1 = t2_again.field2
etc
As far as joining first and last names in a single field, most databases have a way to concatenate strings, but they are not all the same. You'll have to specify your db engine.
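As a rough illustration of the double join, here is a sketch using an in-memory SQLite database through Python's sqlite3 module. It loads only the movies whose actors are named in the question's output (so no actor names are invented), and it uses SQLite's || operator for concatenation, which is an assumption about the engine; MySQL, which the backticks in the question suggest, has CONCAT() for the same job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actors (indexActors INTEGER, "First Name" TEXT, "Last Name" TEXT);
    CREATE TABLE movies (ID INTEGER, Title TEXT, actor1 INTEGER, actor2 INTEGER);
    -- Subset of the question's data: only actors whose names were shown.
    INSERT INTO actors VALUES (1,'Morgan','Freeman'),(3,'Jim','Carrey'),
                              (4,'Emma','Stone');
    INSERT INTO movies VALUES (2,'Bruce Almighty',1,3),
                              (3,'Mr. Popper''s Penguins',3,4),
                              (5,'Crazy, Stupid, Love.',4,NULL);
""")

rows = conn.execute("""
    SELECT m.Title,
           a1."First Name" || ' ' || a1."Last Name" AS Actor1,
           a2."First Name" || ' ' || a2."Last Name" AS Actor2
    FROM movies m
    JOIN actors a1      ON m.actor1 = a1.indexActors  -- first alias
    LEFT JOIN actors a2 ON m.actor2 = a2.indexActors  -- same table, second alias
    ORDER BY m.ID
""").fetchall()

for row in rows:
    print(row)
```

The second join is a LEFT JOIN so that a movie with a NULL actor2 still shows up, with an empty Actor2 column.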

SQL Join When Dates Codes are Involved

I was curious about something. Let's say I have two tables, one with sales and promo codes, the other with only promo codes and attributes. Promo codes can be turned on and off, but promo attributes can change. Here is the table structure:
tblSales
sale promo_cd date
2 AAA 1/1/2013
3 AAA 6/2/2013
8 BBB 2/2/2013
9 BBB 2/3/2013
10 BBB 8/1/2013
tblPromo
promo_cd attribute active_dt inactive_dt
AAA "fun" 1/1/2013 1/1/3001
BBB "boo" 1/1/2013 6/1/2013
BBB "green" 6/2/2013 1/1/3001
Please note, this is not my table/schema/design. I don't understand why they don't just make new promo_cd's for each change in attribute, especially when attribute is what we want to measure. Anyway, I'm trying to make a table that looks like this:
sale promo_cd attribute
2 AAA fun
3 AAA fun
8 BBB boo
9 BBB boo
10 BBB green
The only thing I have done so far is just create an inner join (which causes duplicate records) and then filter by comparing the sale date to the promo active/inactive dates. Is there a better way to do this, though? I was really curious since this is a pretty big set of data and I'd love to keep it efficient.
This is one of those cases where I like to put the filtering conditions right into the JOIN clause. At least in my brain, the duplicate records never make it into the result set. That leaves the WHERE clause for actual filtering conditions.
Select s.sale, s.promo_cd, p.attribute
From tblSales s
Inner Join tblPromo p
on s.promo_cd=p.promo_cd
and s.date between p.active_dt and p.inactive_dt
Assuming I understand you correctly, you can use:
SELECT s.sale, s.promo_cd, p.attribute
FROM tblSales s
JOIN tblPromo p ON p.promo_cd = s.promo_cd AND s.date BETWEEN p.active_dt AND p.inactive_dt
This assumes that tblPromo dates will never overlap (which seems likely given the schema they chose)
Just add the date to your JOIN criteria:
SELECT a.sale, a.promo_cd, b.attribute
FROM tblSales a
JOIN tblPromo b
ON a.promo_cd = b.promo_cd
AND a.date BETWEEN b.active_dt AND b.inactive_dt
Demo: SQL Fiddle
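For anyone who wants to try the date-range join locally, here is a minimal sketch using Python's sqlite3 module with the sample rows from the question. One assumption to note: the dates are stored as ISO 'YYYY-MM-DD' text, because SQLite has no date type and BETWEEN on text only orders correctly in that format; on an engine with real date columns that rewrite would not be needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblSales (sale INTEGER, promo_cd TEXT, date TEXT);
    CREATE TABLE tblPromo (promo_cd TEXT, attribute TEXT,
                           active_dt TEXT, inactive_dt TEXT);
    -- Question's sample data, with dates rewritten as ISO strings.
    INSERT INTO tblSales VALUES (2,'AAA','2013-01-01'),(3,'AAA','2013-06-02'),
                                (8,'BBB','2013-02-02'),(9,'BBB','2013-02-03'),
                                (10,'BBB','2013-08-01');
    INSERT INTO tblPromo VALUES ('AAA','fun','2013-01-01','3001-01-01'),
                                ('BBB','boo','2013-01-01','2013-06-01'),
                                ('BBB','green','2013-06-02','3001-01-01');
""")

rows = conn.execute("""
    SELECT s.sale, s.promo_cd, p.attribute
    FROM tblSales s
    JOIN tblPromo p
      ON p.promo_cd = s.promo_cd
     AND s.date BETWEEN p.active_dt AND p.inactive_dt
    ORDER BY s.sale
""").fetchall()

for row in rows:
    print(row)
```

Each sale matches exactly one promo row because the active/inactive ranges for a given promo_cd do not overlap; if they ever did, that sale would come back once per overlapping range.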

How can I avoid using multiple SQL calls to get data?

I have a MySQL database, in which I have a table of monkeys:
id name
1 Alice
2 Bill
3 Donkey Kong
4 Edna
5 Feefee
I also have a table of bananas and where they were picked from.
id where_from
1 USA
2 Botswana
3 Banana-land
4 USA
Finally, I have a table matches that describes which bananas belong to which monkeys. Each monkey can only have one banana, and no monkeys can share a banana. Some monkeys may lack a banana.
id monkey_id banana_id
1 3 4
2 4 1
3 5 2
How can I use a single SQL statement to retrieve all the matches? For each match, I want the name of the monkey as well as where the banana is from.
I have tried the following 3 SQL statements, which work:
SELECT * FROM matches
SELECT * FROM monkeys WHERE id=[monkey_id from 1st SQL query]
SELECT * FROM bananas WHERE id=[banana_id from 1st SQL query]
I feel that 3 SQL statements is cumbersome though. Any ideas on how I can just use a single SQL statement? I am just learning SQL and am monkeying around with the basics. Thanks!
Since some monkeys may lack a banana, that implies a LEFT JOIN between matches and monkeys. That will ensure all monkeys are listed, even if they have no bananas in matches.
SELECT
monkeys.name,
bananas.where_from
FROM
monkeys
/* List all monkeys, even if they have no match */
LEFT JOIN matches ON monkeys.id = matches.monkey_id
/* And another LEFT JOIN to link matches to bananas */
LEFT JOIN bananas ON bananas.id = matches.banana_id
Here is an example on SQLfiddle.com
I very highly recommend reading over Jeff Atwood's (co-founder of Stack Overflow) excellent article explaining SQL joins.
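If it helps, here is a small sketch that runs the query against the sample data from the question, using an in-memory SQLite database via Python's sqlite3 module rather than MySQL (an assumption made only so the example is self-contained). It also shows the INNER JOIN variant, which returns just the existing matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monkeys (id INTEGER, name TEXT);
    CREATE TABLE bananas (id INTEGER, where_from TEXT);
    CREATE TABLE matches (id INTEGER, monkey_id INTEGER, banana_id INTEGER);
    INSERT INTO monkeys VALUES (1,'Alice'),(2,'Bill'),(3,'Donkey Kong'),
                               (4,'Edna'),(5,'Feefee');
    INSERT INTO bananas VALUES (1,'USA'),(2,'Botswana'),(3,'Banana-land'),(4,'USA');
    INSERT INTO matches VALUES (1,3,4),(2,4,1),(3,5,2);
""")

# Every monkey, with NULL (None in Python) where there is no banana.
all_monkeys = conn.execute("""
    SELECT monkeys.name, bananas.where_from
    FROM monkeys
    LEFT JOIN matches ON monkeys.id = matches.monkey_id
    LEFT JOIN bananas ON bananas.id = matches.banana_id
    ORDER BY monkeys.id
""").fetchall()

# Only the rows that exist in matches.
matched_only = conn.execute("""
    SELECT monkeys.name, bananas.where_from
    FROM matches
    JOIN monkeys ON monkeys.id = matches.monkey_id
    JOIN bananas ON bananas.id = matches.banana_id
    ORDER BY matches.id
""").fetchall()

print(all_monkeys)
print(matched_only)
```

Which of the two you want depends on whether banana-less monkeys should appear in the output at all.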

Multiple JOIN (SQL)

My problem is Play! Framework / JPA specific. But I think it's applicable to general SQL syntax.
Here is a sample query with a simple JOIN:
return Post.find(
"select distinct p from Post p join p.tags as t where t.name = ?", tag
).fetch();
It's simple and works well.
My question is: What if I want to JOIN on more values in the same table?
Example (Doesn't work. It's a pseudo-syntax I created):
return Post.find(
"select distinct p from Post p join p.tags1 as t, p.tags2 as u, p.tags3 as v where t.name = ?, u.name = ?, v.name = ?", tag1, tag2, tag3,
).fetch();
Your programming logic seems okay, but the SQL statement needs some work. Seems you're new to SQL, and as you pointed out, you don't seem to understand what a JOIN is.
You're trying to select data from 4 tables named POST, TAG1, TAG2, and TAG3.
I don't know what's in these tables, and it's hard to give sample SQL statements without that information. So, I'm going to make something up, just for the purposes of discussion. Let's say that table POST has 6 columns, and there are 8 rows of data in it.
P Fname Lname Country Color Headgear
- ----- ----- ------- ----- --------
1 Alex Andrews 1 1 0
2 Bob Barker 2 3 0
3 Chuck Conners 1 5 0
4 Don Duck 3 6 1
5 Ed Edwards 2 4 2
6 Frank Farkle 4 2 1
7 Geoff Good 1 1 0
8 Hank Howard 1 3 0
We'll say that TAG1, TAG2, and TAG3 are lookup tables, with only 2 columns each. Table TAG1 has 4 country codes:
C Name
- -------
1 USA
2 France
3 Germany
4 Spain
Table TAG2 has 6 Color codes:
C Name
- ------
1 Red
2 Orange
3 Yellow
4 Green
5 Blue
6 Violet
Table TAG3 has 4 Headgear codes:
C Name
- -------
0 None
1 Glasses
2 Hat
3 Monocle
Now, when you select data from these 4 tables, for P=6, you're trying to get something like this:
Fname Lname Country Color Headgear
----- ------ ------- ------ -------
Frank Farkle Spain Orange Glasses
First thing, let's look at your WHERE clause:
where t.name = ?, u.name = ?, v.name = ?
Sorry, but using commas like this is a syntax error. Normally you only want to find data where all 3 conditions are true; you do this by using AND:
where t.name=? AND u.name=? AND v.name=?
Second, why are you joining tables together? Because you need more information. Table POST says that Frank's COUNTRY value is 4; table TAG1 says that 4 means Spain. So we need to "join" these tables together.
The ancient way to join tables (it predates the explicit JOIN syntax standardized in SQL-92) is to list more than one table name in the FROM clause, separated by commas. This gives us:
SELECT P.FNAME, P.LNAME, T.NAME As Country, U.NAME As Color, V.NAME As Headgear
FROM POST P, TAG1 T, TAG2 U, TAG3 V
The trouble with this query is that you're not telling it WHICH rows you want, or how they relate to each other. So the database generates something called a "Cartesian Product". It's extremely rare that you want a Cartesian Product - normally this is a HUGE MISTAKE. Even though your database only has 22 rows in it, this SELECT statement is going to return 768 rows of data:
Alex Andrews USA Red None
Alex Andrews USA Red Glasses
Alex Andrews USA Red Hat
Alex Andrews USA Red Monocle
Alex Andrews USA Orange None
Alex Andrews USA Orange Glasses
...
Hank Howard Spain Violet Monocle
That's right, it returns every possible combination of data from the 4 tables. Imagine for a second that the POST table eventually grows to 20000 rows, and the three TAG tables have 100 rows each. The whole database would be less than a megabyte, but the Cartesian Product would have 20,000,000,000 rows of data -- probably about 120 GB of data. Any database engine would choke on that.
So if you want to use the Ancient way of specifying tables, it is VERY IMPORTANT to make sure that your WHERE clause shows the relationship between every table you're querying. This makes a lot more sense:
SELECT P.FNAME, P.LNAME, T.NAME As Country, U.NAME As Color, V.NAME As Headgear
FROM POST P, TAG1 T, TAG2 U, TAG3 V
WHERE P.Country=T.C AND P.Color=U.C AND P.Headgear=V.C
This only returns 8 rows of data.
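The row counts are easy to verify. Here is a sketch that builds the made-up POST and TAG tables in an in-memory SQLite database (through Python's sqlite3 module, which is my choice for a self-contained demo, not anything specific to Play! or JPA) and counts the rows with and without the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE POST (P INTEGER, Fname TEXT, Lname TEXT,
                       Country INTEGER, Color INTEGER, Headgear INTEGER);
    CREATE TABLE TAG1 (C INTEGER, Name TEXT);  -- countries
    CREATE TABLE TAG2 (C INTEGER, Name TEXT);  -- colors
    CREATE TABLE TAG3 (C INTEGER, Name TEXT);  -- headgear
    INSERT INTO POST VALUES
        (1,'Alex','Andrews',1,1,0),(2,'Bob','Barker',2,3,0),
        (3,'Chuck','Conners',1,5,0),(4,'Don','Duck',3,6,1),
        (5,'Ed','Edwards',2,4,2),(6,'Frank','Farkle',4,2,1),
        (7,'Geoff','Good',1,1,0),(8,'Hank','Howard',1,3,0);
    INSERT INTO TAG1 VALUES (1,'USA'),(2,'France'),(3,'Germany'),(4,'Spain');
    INSERT INTO TAG2 VALUES (1,'Red'),(2,'Orange'),(3,'Yellow'),
                            (4,'Green'),(5,'Blue'),(6,'Violet');
    INSERT INTO TAG3 VALUES (0,'None'),(1,'Glasses'),(2,'Hat'),(3,'Monocle');
""")

# No join conditions: every combination of rows, 8 * 4 * 6 * 4.
cartesian = conn.execute(
    "SELECT COUNT(*) FROM POST P, TAG1 T, TAG2 U, TAG3 V"
).fetchone()[0]

# Join conditions in the WHERE clause: one row per POST row.
related = conn.execute("""
    SELECT COUNT(*)
    FROM POST P, TAG1 T, TAG2 U, TAG3 V
    WHERE P.Country = T.C AND P.Color = U.C AND P.Headgear = V.C
""").fetchone()[0]

print(cartesian, related)  # 768 8
```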
Using the Ancient way, it's easy to accidentally create Cartesian Products, which are almost always bad. So they revised SQL to make it harder to do. That's the JOIN keyword. Now, when you specify additional tables you can specify how they relate at the same time. The New Way is:
SELECT P.FNAME, P.LNAME, T.NAME As Country, U.NAME As Color, V.NAME As Headgear
FROM POST P
INNER JOIN TAG1 T ON P.Country=T.C
INNER JOIN TAG2 U ON P.Color=U.C
INNER JOIN TAG3 V ON P.Headgear=V.C
You can still use a WHERE clause, too.
SELECT P.FNAME, P.LNAME, T.NAME As Country, U.NAME As Color, V.NAME As Headgear
FROM POST P
INNER JOIN TAG1 T ON P.Country=T.C
INNER JOIN TAG2 U ON P.Color=U.C
INNER JOIN TAG3 V ON P.Headgear=V.C
WHERE P.P=?
If you call this and pass in the value 6, you get only one row back:
Fname Lname Country Color Headgear
----- ------ ------- ------ --------
Frank Farkle Spain Orange Glasses
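To try the JOIN version with a bound parameter, here is a sketch using Python's sqlite3 module and the same made-up tables. The `?` placeholder works the same way as in the query string above, though the surrounding API is sqlite3's, not Play!'s. Note that Frank's Headgear code is 1, which TAG3 maps to Glasses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE POST (P INTEGER, Fname TEXT, Lname TEXT,
                       Country INTEGER, Color INTEGER, Headgear INTEGER);
    CREATE TABLE TAG1 (C INTEGER, Name TEXT);
    CREATE TABLE TAG2 (C INTEGER, Name TEXT);
    CREATE TABLE TAG3 (C INTEGER, Name TEXT);
    INSERT INTO POST VALUES
        (1,'Alex','Andrews',1,1,0),(2,'Bob','Barker',2,3,0),
        (3,'Chuck','Conners',1,5,0),(4,'Don','Duck',3,6,1),
        (5,'Ed','Edwards',2,4,2),(6,'Frank','Farkle',4,2,1),
        (7,'Geoff','Good',1,1,0),(8,'Hank','Howard',1,3,0);
    INSERT INTO TAG1 VALUES (1,'USA'),(2,'France'),(3,'Germany'),(4,'Spain');
    INSERT INTO TAG2 VALUES (1,'Red'),(2,'Orange'),(3,'Yellow'),
                            (4,'Green'),(5,'Blue'),(6,'Violet');
    INSERT INTO TAG3 VALUES (0,'None'),(1,'Glasses'),(2,'Hat'),(3,'Monocle');
""")

query = """
    SELECT P.Fname, P.Lname, T.Name AS Country, U.Name AS Color, V.Name AS Headgear
    FROM POST P
    INNER JOIN TAG1 T ON P.Country = T.C
    INNER JOIN TAG2 U ON P.Color = U.C
    INNER JOIN TAG3 V ON P.Headgear = V.C
    WHERE P.P = ?
"""

# The tuple supplies the value for the single ? placeholder.
rows = conn.execute(query, (6,)).fetchall()
print(rows)  # [('Frank', 'Farkle', 'Spain', 'Orange', 'Glasses')]
```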
As was mentioned in the comments, you are looking for an ON clause.
SELECT * FROM TEST1
INNER JOIN TEST2 ON TEST1.A = TEST2.A AND TEST1.B = TEST2.B ...
See example usage of join here:
http://en.wikibooks.org/wiki/Java_Persistence/Relationships#Join_Fetching