SELECT DISTINCT. Please explain? - sql

Wondering if someone could please explain the difference between these two queries and advise why one works and the other doesn't.
This one works. Gives me two records of the distinct GantryRtn value and their corresponding SSD value.
SELECT DISTINCT GantryRtn as Gantry, ROUND(Field.SSD,1) as SSD
FROM Field, PlanSetup, Course, Patient, Radiation
WHERE Field.RadiationSer=Radiation.RadiationSer
AND Radiation.PlanSetupSer=PlanSetup.PlanSetupSer
AND PlanSetup.CourseSer=Course.CourseSer
AND Course.PatientSer=Patient.PatientSer
AND Patient.PatientId='ZZZ456'
AND PlanSetup.PlanSetupId='F T1 R CHEST'
However there is a foreign key in the Field table that links to the primary key of another table that contains a plain text name for each field. I'd also like to extract that name (in a separate query if I have to) by pulling out this foreign key RadiationSer. But as soon as I put RadiationSer into the query, I lose my DISTINCT result.
SELECT DISTINCT GantryRtn as Gantry, ROUND(Field.SSD,1) as SSD, Field.RadiationSer
FROM Field, PlanSetup, Course, Patient, Radiation
WHERE Field.RadiationSer=Radiation.RadiationSer
AND Radiation.PlanSetupSer=PlanSetup.PlanSetupSer
AND PlanSetup.CourseSer=Course.CourseSer
AND Course.PatientSer=Patient.PatientSer
AND Patient.PatientId='ZZZ456'
AND PlanSetup.PlanSetupId='F T1 R CHEST'
This second query gives me 7 records with non-distinct GantryRtn values.
Why does this happen??
I have investigated using GROUP BY but this slows the query down and appears to pull ALL GantryRtn's out of the database (100s of records).
Thanks
Greg

The DISTINCT keyword applys to a result set (all fields) and not just to the first field.
In your case:
SELECT DISTINCT GantryRtn as Gantry, ROUND(Field.SSD,1) as SSD, Field.RadiationSer
will return any records that are distinct (not the same) when taken together with Gantry, SSD, and RadiationSer
So, you may have 7 records for the same Gantry and with different values for RadiationSer.
If you'd like to first filter by distinct Gantry values you can accomplish that with a sub-query and an inner join but somehow you must settle on which RadiationSer value to use.

Related

Return query results where two fields are different (Access 2010)

I'm working in a large access database (Access 2010) and am trying to return records where two locations are different.
In my case, I have a large number of birds that have been observed on multiple dates and potentially on different islands. Each bird has a unique BirdID and also an actual physical identifier (unfortunately that may have changed over time). [I'm going to try addressing the changing physical identifier issue later]. I currently want to query individual birds where one or more of their observations is different than the "IslandAlpha" (the first island where they were observed). Something along the lines of a criteria for BirdID: WHERE IslandID [doesn't equal] IslandAlpha.
I then need a separate query to find where all observations DO equal where they were first observed. So where IslandID = IslandAlpha
I'm new to Access, so let me know if you need more information on how my tables/relationships are set up! Thanks in advance.
Assuming the following tables:
Birds table in which all individual birds have records with a unique BirdID and IslandAlpha.
Sightings table in which individual sightings are recorded, including IslandID.
Your first query would look something like this:
SELECT *
FROM Birds
INNER JOIN Sightings ON Birds.BirdID=Sightings.BirdID
WHERE Sightings.IslandID <> Birds.IslandAlpha
You second query would be the same but with = instead of <> in the WHERE clause.
Please provide us information about the tables and columns you are using.
I will presume you are asking this question because a simple join of tables and filtering where IslandAlpha <> ObsLoc is not possible because IslandAlpha is derived from first observation record for each bird. Pulling first observation record for each bird requires a nested query. Need a unique record identifier in Observations - autonumber should serve. Assuming there is an observation date/time field, consider:
SELECT * FROM Observations WHERE ObsID IN
(SELECT TOP 1 ObsID FROM Observations AS Dupe
WHERE Dupe.ObsBirdID = Observations.ObsBirdID ORDER BY Dupe.ObsDateTime);
Now use that query for subsequent queries.
SELECT * FROM Observations
INNER JOIN Query1 ON Observations.ObsBirdID = Query1.ObsBirdID
WHERE Observations.ObsLocID <> Query1.ObsLocID;

Difference in where clause and join

im very new to SQL and currently working with joins the first time in my life. What I am trying to figure out right now is trying to get the difference between to queries.
Query 1:
SELECT name
FROM actor
JOIN casting ON id = actorid
where (SELECT COUNT(ord) FROM casting join actor on actorid = actor.id AND ord=1) >= 30
GROUP BY name
Query 2:
SELECT name
FROM actor
JOIN casting ON id = actorid
AND (SELECT COUNT(ord) FROM casting WHERE actorid = actor.id AND ord=1)>=30)
GROUP BY name
So I would think that doing
FROM casting join actor on actorid = actor.id
in the subquery is the same as
FROM casting WHERE actorid = actor.id.
But apparently it is not. Could anyone help me out and explain why?
Edit: If anyone is wondering: The queries are based on question 13 from http://sqlzoo.net/wiki/More_JOIN_operations
Actually, the part that really looks like a "where" statement is only what's after the keyword ON. We sometimes fall on queries performing some data filtering directly at this stage, but its actual purpose is to specify the criteria used
A "join" is a very common operation that consists of associating the rows of two distinct tables according to a common criteria. For example, if you have, on one side, a table containing a client list in which each of them has a unique client number, and on a other side a order list table in which each order contains the client's number, then you may want to "resolve" the number of the latter table into its name, address, and so on.
Before SQL92 (26 years ago), the only way to achieve this was to write something like this :
SELECT client.name,
client.adress,
orders.product,
orders.totalprice
FROM client,orders
WHERE orders.clientNumber = client.clientNumber
AND orders.totalprice > 100.00
Selecting something from two (or more) tables induces a "cartesian product" which actually consists of associating every row from the first set which every row of the second one. This means that if your first table contains 3 rows and the second one 8 rows, the resulting set would be 24-row wide. And out of these, you use the WHERE clause to exclude basically everything and retain only rows in which the client number is the same on both side.
We understand that the size of the resulting set before filtering can grow exponentially if the contents of the different tables exceed a few rows (which is always the case) and it can get even worse if you imply more than two tables. Also, on the programmer's side, it rapidly becomes rather unreadable.
Therefore, if this is what you actually want to do, you now can explicitly tell the server about it, and specify the criteria at first, which will avoid unnecessary growing temporary subsets, while still letting you filter the results with WHERE if needed.
SELECT client.name,
client.adress,
orders.product,
orders.totalprice
FROM client
JOIN orders
ON orders.clientNumber = client.clientNumber
WHERE orders.totalprice > 100.00
It becomes critical when performing multiple JOIN in a single query, especially when performing both INNER and OUTER joins.
In the 2nd query your nested query takes the actor.id from its root query and only counts the results from that. In the 1st query your nested query counts results from all actors instead of only the specified one.

DISTINCT in a simple SQL query

When executing SQL queries I have been trying to figure out the following:
In this example:
SELECT DISTINCT AL.id, AL.name
FROM albums AL
why is there a need to specify distinct? I thought that the Id being a primary key was enough to avoid duplicate results.
When you specify distinct you are specifying that you want the whole row to be distinct. For example if you have two rows:
ID=1 and Name='Joe Smith'
ID=2 and Name='Joe Smith'
then your query is going to return both rows because the different ID values make the rows distinct.
However, if you are selecting only the ID column (and it's your primary key) then the distinct is pointless.
If you're trying to find all of the unique names then you'd want to:
SELECT DISTINCT AL.name
FROM albums AL
You are right, in your case there should be no need for the word distinct because you are asking for the id and the name. Now, for sake of example where distinct is necessary, say you had multiple id's with the same name. Let It Be is an album by both the Beatles and the Replacements. And let's say you were using your database to write out labels that only included the names of the albums. The query you would want would be:
select distinct al.name
from albums al;
Sometimes your database is not perfect and it ends up with a bunch of junk data. If the id has not been designated as unique, you might end up with duplicate records, and then you might want to avoid seeing the duplicates in your query results.

How do you JOIN tables to a view using a Vertica DB?

Good morning/afternoon! I was hoping someone could help me out with something that probably should be very simple.
Admittedly, I’m not the strongest SQL query designer. That said, I’ve spent a couple hours beating my head against my keyboard trying to get a seemingly simple three way join working.
NOTE: I'm querying a Vertica DB.
Here is my query:
SELECT A.CaseOriginalProductNumber, A.CaseCreatedDate, A.CaseNumber, B.BU2_Key as BusinessUnit, C.product_number_desc as ModelNumber
FROM pps_sfdc.v_Case A
INNER JOIN reference_data.DIM_PRODUCT_LINE_HIERARCHY B
ON B.PL_Key = A.CaseOriginalProductLine
INNER JOIN reference_data.DIM_PRODUCT C
ON C.product_line_code = A.CaseOriginalProductLine
WHERE B.BU2_Key = 'XWT'
LIMIT 20
I have a view (v_Case) that I’m trying to join to two other tables so I can lookup a value from each of them. The above query returns identical data on everything EXCEPT the last column (see below). It's like it's iterating through the last column to pull out the unique entries, sort of like a "GROUP BY" clause. What SHOULD be happening is that I get unique rows with specific "BusinessUnit" and "ModelNumber" for that record.
DUMEPRINT 5/2/2014 8:56:27 AM 3002845327 JJT Product 1
DUMEPRINT 5/2/2014 8:56:27 AM 3002845327 JJT Product 2
DUMEPRINT 5/2/2014 8:56:27 AM 3002845327 JJT Product 3
DUMEPRINT 5/2/2014 8:56:27 AM 3002845327 JJT Product 4
I modeled my solution after this post:
How to deal with multiple lookup tables for beginners of SQL?
What am I doing wrong?
Thank you for any help you can provide.
Data issue. General rule in trouble shooting these is the column that is distinct (in this case C.product_number_desc as ModelNumber) for each record is generally where the issue is going to be...and why I pointed you towards dim_product.
If you receive duplicates, this query below will help identify if this table is giving you the issues. Remember key in this statement can be multiple fields...whatever you are joining the table on:
Select key,count(1) from table group by key having count(1)>1
Other options for the future...don't assume it's your code, duplicates like this almost always point towards dirty data (other option is you are causing cross joins because keys are not correct). If you comment out the 'c' table and the column referred to in the select clause, you would have received one row...hence your dupes were coming from the 'c' table here.
Good luck with it

Deleting Duplicate Rows in VIew in SQL Server 2008

I have two tables, TABLE1 and TABLE2. TABLE1 Has 4 columns: Name, Client, Position, and ID. TABLE2 has 3 columns: Amount, Time, and ID. For every ID in TABLE1 there are one or more entries in TABLE2, all with identical ID and Time values but varying Amount values.
In a view, I have a concatenated string of Name, Client, Position, and ID from TABLE1, but I also need to concatenate the Time for each ID from TABLE2 to this string. If I do a join as I create the view, I get a ton of duplicate strings in the view as it lists the same ID multiple times for each value of Amount in TABLE2.
I need to get rid of the duplicates, so I either need to avoid the replication that occurs from the join, or find a way to simply delete all the duplicates from the view.
Hopefully this is all clear enough. Thank you for reading and for any help you can provide!
DISTINCT could be appropriate:
CREATE VIEW [dbo].[View1]
AS
SELECT distinct
dbo.Table1.Id,
dbo.Table1.Name,
dbo.Table1.Client,
dbo.Table1.Position,
dbo.Table2.[Time]
FROM dbo.Table1
LEFT OUTER JOIN dbo.Table2
ON dbo.Table1.Id = dbo.Table2.Id
SqlFiddle: http://sqlfiddle.com/#!3/65651/1
On a side note, depending on what your table structures really are, you might consider having a surrogate primary key on the table2 if you don't have a natural one. I have not put one in the example, because chances are that you already have one - just making sure.