SQL select statement with except - sql

I'm trying to generate a SQL statement where I need to get rid of all 'users' that have a certain trait attributed to them. Here is the example here
+------+-------+
| User | Trait |
+------+-------+
| A | Fire |
| A | Water |
| A | Air |
| B | Water |
| B | Air |
| C | Water |
| C | Fire |
+------+-------+
With SQL I'd like to remove all users who have the trait fire associated with them.
So basically, afterwards, we'd be left with
+------+-------+
| User | Trait |
+------+-------+
| B | Water |
| B | Air |
+------+-------+
If I was able to use something in excel to filter it out instead of through SQL, this would work as well. I've been looking through various ways, but from what I've tried, most will only remove the single row with the trait, but not the user along with it.
I need sql to translate something in the lines of
For (i = table.length; i++)
If Trait = Fire
getVal(User(i))
DeleteRows(User(i))
I'm looked into sql except, but the table I'm using is quite a bit more complex, so some help using a basic example would be nice to lead me in the right direction.
Thanks

You can use a sub-select and discard userids with NOT IN
SELECT *
FROM mytable
WHERE userid NOT IN (SELECT userid FROM mytable WHERE Trait = 'Fire')

The simplest way of doing this is using not exists.
select *
from tablename t
where not exists (select 1 from tablename where userid = t.userid and Trait = 'Fire')

Try:
select a.user, a.trait
from the_table a
where not exists
( select null
from the_table b
where b.user = a.user
and b.trait = 'Fire'
);

Related

SQL Join to the latest record in MS ACCESS

I want to join tables in MS Access in such a way that it fetches only the latest record from one of the tables. I've looked at the other solutions available on the site, but discovered that they only work for other versions of SQL. Here is a simplified version of my data:
PatientInfo Table:
+-----+------+
| ID | Name |
+-----+------+
| 1 | John |
| 2 | Tom |
| 3 | Anna |
+-----+------+
Appointments Table
+----+-----------+
| ID | Date |
+----+-----------+
| 1 | 5/5/2001 |
| 1 | 10/5/2012 |
| 1 | 4/20/2018 |
| 2 | 4/5/1999 |
| 2 | 8/8/2010 |
| 2 | 4/9/1982 |
| 3 | 7/3/1997 |
| 3 | 6/4/2015 |
| 3 | 3/4/2017 |
+----+-----------+
And here is a simplified version of the results that I need after the join:
+----+------+------------+
| ID | Name | Date |
+----+------+------------+
| 1 | John | 4/20/2018 |
| 2 | Tom | 8/8/2010 |
| 3 | Anna | 3/4/2017 |
+----+------+------------+
Thanks in advance for reading and for your help.
You can use aggregation and JOIN:
select pi.id, pi.name, max(a.date)
from appointments as a inner join
patientinfo as pi
on a.id = pi.id
group by pi.id, pi.name;
something like this:
select P.ID, P.name, max(A.Date) as Dt
from PatientInfo P inner join Appointments A
on P.ID=A.ID
group by P.ID, P.name
Both Bing and Gordon's answers work if your summary table only needs one field (the Max(Date)) but gets more tricky if you also want to report other fields from the joined table, since you would need to include them either as an aggregated field or group by them as well.
Eg if you want your summary to also include the assessment they were given at their last appointment, GROUP BY is not the way to go.
A more versatile structure may be something like
SELECT Patient.ID, Patient.Name, Appointment.Date, Appointment.Assessment
FROM Patient INNER JOIN Appointment ON Patient.ID=Appointment.ID
WHERE Appointment.Date = (SELECT Max(Appointment.Date) FROM Appointment WHERE Appointment.ID = Patient.ID)
;
As an aside, you may want to think whether you should use a field named 'ID' to refer to the ID of another table (in this case, the Apppintment.ID field refers to the Patient.ID). You may make your db more readable if you leave the 'ID' field as an identifier specific to that table and refer to that field in other tables as OtherTableID or similar, ie PatientID in this case. Or go all the way and include the name of the actual table in its own ID field.
Edited after comment:
Not quite sure why it would crash. I just ran an equivalent query on 2 tables I have which are about 10,000 records each and it was pretty instanteneous. Are your ID fields (i) unique numbers and (ii) indexed?
Another structure which should do the same thing (adapted for your field names and assuming that there is an ID field in Appointments which is unique) would be something like:
SELECT PatientInfo.UID, PatientInfo.Name, Appointments.StartDateTime, Appointments.Assessment
FROM PatientInfo INNER JOIN Appointments ON PatientInfo_UID = Appointments.PatientFID
WHERE Appointments.ID = (SELECT TOP 1 ID FROM Appointments WHERE Appointments.PatientFID = PatientInfo_UID ORDER BY StartDateTime DESC)
;
But that is starting to look a bit contrived. On my data they both produce the same result (as they should!) and are both almost instantaneous.
Always difficult to troubleshoot Access when it crashes - I guess you see no error codes or similar? Is this against a native .accdb database or another server?

Oracle SQL query comparing multiple rows with same identifier

I'm honestly not sure how to title this - so apologies if it is unclear.
I have two tables I need to compare. One table contains tree names and nodes that belong to that tree. Each Tree_name/Tree_node combo will have its own line. For example:
Table: treenode
| TREE_NAME | TREE_NODE |
|-----------|-----------|
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 1 | E |
| 2 | A |
| 2 | B |
| 2 | D |
| 3 | C |
| 3 | D |
| 3 | E |
| 3 | F |
I have another table that contains names of queries and what tree_nodes they use. Example:
Table: queryrecord
| QUERY | TREE_NODE |
|---------|-----------|
| Alpha | A |
| Alpha | B |
| Alpha | D |
| BRAVO | A |
| BRAVO | B |
| BRAVO | D |
| CHARLIE | A |
| CHARLIE | B |
| CHARLIE | F |
I need to create an SQL where I input the QUERY name, and it returns any ‘TREE_NAME’ that includes all the nodes associated with the query. So if I input ‘ALPHA’, it would return TREE_NAME 1 & 2. If I ask it for CHARLIE, it would return nothing.
I only have read access, and don’t believe I can create temp tables, so I’m not sure if this is possible. Any advice would be amazing. Thank you!
You can use group by and having as follows:
Select t.tree_name
From tree_node t
join query_record q
on t.tree_node = q.tree_node
WHERE q.query = 'ALPHA'
Group by t.tree_name
Having count(distinct t.tree_node)
= (Select count(distinct q.tree_node) query_record q WHERE q.query = 'ALPHA');
Using an IN condition (a semi-join, which saves time over a join):
with prep (tree_node) as (select tree_node from queryrecord where query = :q)
select tree_name
from treenode
where tree_node in (select tree_node from prep)
group by tree_name
having count(*) = (select count(*) from prep)
;
:q in the prep subquery (in the with clause) is the bind variable to which you will assign the various QUERY values at runtime.
EDIT
I don't generally set up the test case on online engines; but in a comment below this answer, the OP said the query didn't work for him. So, I set up the example on SQLFiddle, here:
http://sqlfiddle.com/#!4/b575e/2
A couple of notes: for some reason, SQLFiddle thinks table names should be at most eight characters, so I had to change the second table name to queryrec (instead of queryrecord). I changed the name in the query, too, of course. And, second, I don't know how I can give bind values on SQLFiddle; I hard-coded the name 'Alpha'. (Note also that in the OP's sample data, this query value is not capitalized, while the other two are; of course, text values in SQL are case sensitive, so one should pay attention when testing.)
You can do this with a join and aggregation. The trick is to count the number of nodes in query_record before joining:
select qr.query, t.tree_name
from (select qr.*,
count(*) over (partition by query) as num_tree_node
from query_record qr
) qr join
tree_node t
on t.tree_node = qr.tree_node
where qr.query = 'ALPHA'
group by qr.query, t.tree_name, qr.num_tree_node
having count(*) = qr.num_tree_node;
Here is a db<>fiddle.

How do I structure my SQL query to prevent the return of duplicate rows with related data?

I need some help with an SQL Query. I have a database table that has related data with other tables. When I query the table it returns the duplicate rows for every row of related data i.e.
|-------------| |-------------| |-------------|
| Cars | | Options | | Value |
|-------------| ------> |-------------| ------> |-------------|
| CarId | | OptionsId | | ValueId |
| CarMake | | OptionName | | CostValue |
| CarModel | | Confirmed | | CarId |
|-------------| | CarId | | OptionsId |
|-------------| |-------------|
|
|
---------------> |-------------|
| Warranty |
|-------------|
| WarrantyId |
| WarrantyType|
| CarId |
|-------------|
The query that I have made, which was designed in the query builder of SSMS (because of this it is not using aliases and has the 3 stage naming convention, this will be changed) is as follows:
SELECT dbo.Cars.CarId,
dbo.Cars.Make,
dbo.Cars.Model,
dbo.Options.OptionName,
dbo.Warranty.WarrantyType,
dbo.Value.CostValue
FROM dbo.Cars
LEFT JOIN dbo.Options ON dbo.Cars.CarId = dbo.Options.CarId
LEFT JOIN Value ON Options.OptionsId = Value.OptionsId
LEFT JOIN dbo.Warranty on dbo.Cars.CarId = dbo.Warranty.CarId
Executing this query as it stands returns my data, however, for cars with multiple options I receive duplicate rows i.e.
Id | Make | Model | Option Name | Warranty Type | Value
27 | Ford | Fiesta | Heated Seats | Static | 500
27 | Ford | Fiesta | Front Fog Lights | Static | 400
I've been looking around for possible answers to this question and found that the proposed solution is to use the keyword DISTINCT or to create a subquery. I added DISTINCT to my query but the same data was returned, probably because the options are both distinct in their own right, I don't know I'm guessing.
I'm happy to use a subquery but not sure how to apply that to my above query code. All I want to do here is return one single row for each car with the highest option value i.e.
27 | Ford | Fiesta | Heated Seats | Static | 500
Can anyone help me write this query? I think I've included everything in this question but if I can offer more, please let me know.
Instead of joining the table Value which gives you multiple rows,
you must join this query:
SELECT
dbo.Value.CarId,
dbo.Value.OptionsId,
MAX(dbo.Value.CostValue) AS CostValue
FROM dbo.Value
GROUP BY dbo.Value.CarId, dbo.Value.OptionsId
which you will give you from the table Value for each car the option with the max value.
So try this:
SELECT dbo.Cars.CarId,
dbo.Cars.Make,
dbo.Cars.Model,
dbo.Options.OptionName,
v.CostValue,
dbo.Warranty.WarrantyType
FROM dbo.Cars
LEFT JOIN dbo.Options ON dbo.Cars.CarId = dbo.Options.CarId
INNER JOIN (
SELECT
dbo.Value.CarId,
dbo.Value.OptionsId,
MAX(dbo.Value.CostValue) AS CostValue
FROM dbo.Value
GROUP BY dbo.Value.CarId, dbo.Value.OptionsId
) AS v ON Options.OptionsId = v.OptionsId
LEFT JOIN dbo.Warranty on dbo.Cars.CarId = dbo.Warranty.CarId
you can try like below by using window function
with cte as(
SELECT dbo.Cars.CarId,
dbo.Cars.Make,
dbo.Cars.Model,
dbo.Options.OptionName,
Value.CostValue,
row_number() over(partition by dbo.Cars.CarId,
dbo.Cars.Make,
dbo.Cars.Model order by Value.CostValue desc) rn
FROM dbo.Cars
LEFT JOIN dbo.Options ON dbo.Cars.CarId = dbo.Options.CarId
LEFT JOIN Value ON Options.OptionsId = Value.OptionsId
LEFT JOIN dbo.Warranty on dbo.Cars.CarId = dbo.Warranty.CarId
) select * from cte where rn=1

Aggregate ENTIRE rows based on single field without querying source twice or using CTEs?

Assume I have the following table:
+--------+--------+--------+
| field1 | field2 | field3 |
+--------+--------+--------+
| a | a | 1 |
| a | b | 2 |
| a | c | 3 |
| b | a | 1 |
| b | b | 2 |
| c | b | 2 |
| c | b | 3 |
+--------+--------+--------+
I want to select only the rows where field3 is the minimum value, so only these rows:
+--------+--------+--------+
| field1 | field2 | field3 |
+--------+--------+--------+
| a | a | 1 |
| b | a | 1 |
| c | b | 2 |
+--------+--------+--------+
The most popular solution is to query the source twice, once directly and then joined to a subquery where the source is queried again and then aggregated. However, since my data source is actually a derived table/subquery itself, I'd have to duplicate the subquery in my SQL which is ugly. The other option is to use the WITH CTE and reuse the subquery which would be nice, but Teradata, the database I am using, doesn't support CTEs in views, though it does in macros which is not an option for me now.
So is it possible in standard SQL to group multiple records into a single record by using only a single field in the aggregation without querying the source twice or using a CTE?
This is possible using a window function:
select *
from (
select column_1, column_2, column_3,
min(column_3) over (partition by column_1) as min_col_3
from the_table
) t
where column_3 = min_col_3;
The above is standard SQL and I believe Teradata also supports window functions.
The derived table is necessary because you can't refer to a column alias in the where clause - at least not in standard SQL.
I think Teradata actually allows that using the qualify operator, but as I have never used it, I am not sure:
select *
from the_table
qualify min(column_3) over (partition by column_1) = column_3;
Use NOT EXISTS to return a row if there are no other row with same field1 value but a lower field3 value:
select * from table t1
where not exists (select 1 from table t2
where t2.field1 = t1.field1
and t2.field3 < t1.field3)

Aggregate function in Tuple Relational Calculus

How do you translate a COUNT or a GROUP BY or any other aggregate function you find in SQL into TRC, i can't find any way on internet.
So I have a table User
+----+----------+--+
| | User | |
+----+----------+--+
| pk | email | |
| | password | |
| | ... | |
+----+----------+--+
And a table frienship
+----+-------------+--+
| | FriendShip | |
+----+-------------+--+
| pk | user1_email | |
| pk | user2_email | |
| | date | |
| | accepted | |
+----+-------------+--+
And the following query in SQL:
SELECT *
FROM user u
LEFT OUTER JOIN friendship f ON (f.user1_email = u.email
OR f.user2_email = u.email)
GROUP BY u.email
HAVING COUNT(u.email) < 3
I would like to transform this query into tuple relational Calculus, the JOIN and the SELECT are pretty straightforward, but for the GROUP BY and the COUNT I don't know.
Thanks,
As Lennart says, it's not possible to express those functions so, I decided to transform the count in another way.
First let's assert the following predicate:
Then we can say that having 2 or less friends, is having 0 friends, 1, or 2. To have 1 friend is like saying that there exists a friend (friend1) for wich Friends(me, friend1) is true.
To have 2 friends, you must have 1 friend and another, different. And finally you must not have any more friend.
All this can be express like this:
I don't think you can express aggregate functions in neither TRC nor RA. However, there have been proposals to extend them, see for example:
http://cis.csuohio.edu/~matos/notes/cis-612/NestedRelations/Extending%20Relational%20Algebra%20and%20Relational%20Calculus%20with%20Se.pdf