SQL Query for finding values that do not exist in one table, with WHERE clause - sql

I'm struggling to compile a query for the following and wonder if anyone can please help (I'm a SQL newbie).
I have two tables:
(1) student_details, which contains the columns: student_id (PK), firstname, surname (and others, but not relevant to this query)
(2) membership_fee_payments, which contains details of monthly membership payments for each student and contains the columns: membership_fee_payments_id (PK), student_id (FK), payment_month, payment_year, amount_paid
I need to create the following query:
which students have not paid fees for March 2012?
The query could be for any month/year, March is just an example. I want to return in the query firstname, surname from student_details.
I can query successfully who has paid for a certain month and year, but I can't work out how to query who has not paid!
Here is my query for finding out who has paid:
SELECT student_details.firstname, student_details.surname,
FROM student_details
INNER JOIN membership_fee_payments
ON student_details.student_id = membership_fee_payments.student_id
WHERE membership_fee_payments.payment_month = "March"
AND membership_fee_payments.payment_year = "2012"
ORDER BY student_details.firstname
I have tried a left join and left outer join but get the same result. I think perhaps I need to use NOT EXISTS or IS NULL but I haven't had much luck writing the right query yet.
Any help much appreciated.

I'm partial to using WHERE NOT EXISTS Typically that would look something like this
SELECT D.firstname, D.surname
FROM student_details D
WHERE NOT EXISTS (SELECT * FROM membership_fee_payments P
WHERE P.student_id = D.student_id
AND P.payment_year = '2012'
AND P.payment_month = 'March'
)
This is know an a correlated subquery as it contains references to the outer query. This allows you to include your join criteria in the subquery without necessarily writing a JOIN. Also, most RDBMS query optimizers will implement this as a SEMI JOIN which does not typically do as much 'work' as a complete join.

You could use a left join. When the payment is missing, all the columns in the left join table will be null:
SELECT student_details.firstname, student_details.surname,
FROM student_details
LEFT JOIN membership_fee_payments
ON student_details.student_id = membership_fee_payments.student_id
AND membership_fee_payments.payment_month = "March"
AND membership_fee_payments.payment_year = "2012"
WHERE membership_fee_payments.student_id is null
ORDER BY student_details.firstname

You can also write following query. This will gives your expected output.
SELECT student_details.firstname,
student_details.surname,
FROM student_details
Where
student_details.student_id Not in
(SELECT membership_fee_payments.student_id
from membership_fee_payments
WHERE
membership_fee_payments.payment_year = '2012'
AND membership_fee_payments.payment_month = 'March'
)

Related

SQL dividing a count from one table by a number from a different table

I am struggling with taking a Count() from one table and dividing it by a correlating number from a different table in Microsoft SQL Server.
Here is a fictional example of what I'm trying to do
Lets say I have a table of orders. One column in there is states.
I have a second table that has a column for states, and second column for each states population.
I'd like to find the order per population for each sate, but I have struggled to get my query right.
Here is what I have so far:
SELECT Orders.State, Count(*)/
(SELECT StatePopulations.Population FROM Orders INNER JOIN StatePopulations
on Orders.State = StatePopulations.State
WHERE Orders.state = StatePopulations.State )
FROM Orders INNER JOIN StatePopulations
ON Orders.state = StatePopulations.State
GROUP BY Orders.state
So far I'm contending with an error that says my sub query is returning multiple results for each state, but I'm newer to SQL and don't know how to overcome it.
If you really want a correlated sub-query, then this should do it...
(You don't need to join both table in either the inner or outer query, the correlation in the inner query's where clause does the 'join'.)
SELECT
Orders.state,
COUNT(*) / (SELECT population FROM StatePopulation WHERE state = Orders.state)
FROM
Orders
GROUP BY
Orders.state
Personally, I'd just join them and use MAX()...
SELECT
Orders.state,
COUNT(*) / MAX(StatePopulation.population)
FROM
Orders
INNER JOIN
StatePopulation
StatePopulation.state = Orders.state
GROUP BY
Orders.state
Or aggregate your orders before you join...
SELECT
Orders.state,
Orders.order_count / StatePopulation.population
FROM
(
SELECT
Orders.state,
COUNT(*) AS order_count
FROM
Orders
GROUP BY
Orders.state
)
Orders
INNER JOIN
StatePopulation
StatePopulation.state = Orders.state
(Please forgive typos and smelling pistakes, I'm doing this on a phone.)

How to count total crossover from two tables each with specific conditions

I am working from two tables in a dataset. Let's call the first one 'Demographic_Info', the other 'Study_Info'. The two tables both have a Subject_ID column. How can I run a query that will return all of the Subject_IDs where Sex = Male (from Demographic_Info) but also where the Study Case = Case (from Study_Info)?
Is this an inner join? Do I need to make a combined table?
I just don't know what function to use. I know how to select for each of these conditions in each table individually, but not how to run them against eachother.
Yes, you will want to inner join and then use the where clause to filter on both tables.
select
s.Subject_ID
from `Study_info` s
inner join `Demographic_info` d on s.Subject_ID = d.Subject_ID
where d.Sex = 'Male'
and s.Study_Case = 'Case' -- Unclear from your question about the actual field name
The aliases s and d will be useful for organizing which table each field comes from (or if the same field occurs in both tables).
Similarly, you could filter first and then perform the join.
with study as (select * from `Study_info` where Study_Case = 'Case'),
demographics as (select * from `Demographic_info` where Sex = 'Male')
select s.Subject_ID
from study s
inner join demographics d on s.Subject_ID = d.Subject_ID

SQL Server 2016 Sub Query Guidance

I am currently working on an assignment for my SQL class and I am stuck. I'm not looking for full code to answer the question, just a little nudge in the right direction. If you do provide full code would you mind a small explanation as to why you did it that way (so I can actually learn something.)
Here is the question:
Write a SELECT statement that returns three columns: EmailAddress, ShipmentId, and the order total for each Client. To do this, you can group the result set by the EmailAddress and ShipmentId columns. In addition, you must calculate the order total from the columns in the ShipItems table.
Write a second SELECT statement that uses the first SELECT statement in its FROM clause. The main query should return two columns: the Client’s email address and the largest order for that Client. To do this, you can group the result set by the EmailAddress column.
I am confused on how to pull in the EmailAddress column from the Clients table, as in order to join it I have to bring in other tables that aren't being used. I am assuming there is an easier way to do this using sub Queries as that is what we are working on at the time.
Think of SQL as working with sets of data as opposed to just tables. Tables are merely a set of data. So when you view data this way you immediately see that the query below returns a set of data consisting of the entirety of another set, being a table:
SELECT * FROM MyTable1
Now, if you were to only get the first two columns from MyTable1 you would return a different set that consisted only of columns 1 and 2:
SELECT col1, col2 FROM MyTable1
Now you can treat this second set, a subset of data as a "table" as well and query it like this:
SELECT
*
FROM (
SELECT
col1,
col2
FROM
MyTable1
)
This will return all the columns from the two columns provided in the inner set.
So, your inner query, which I won't write for you since you appear to be a student, and that wouldn't be right for me to give you the entire answer, would be a query consisting of a GROUP BY clause and a SUM of the order value field. But the key thing you need to understand is this set thinking: you can just wrap the ENTIRE query inside brackets and treat it as a table the way I have done above. Hopefully this helps.
You need a subquery, like this:
select emailaddress, max(OrderTotal) as MaxOrder
from
( -- Open the subquery
select Cl.emailaddress,
Sh.ShipmentID,
sum(SI.Value) as OrderTotal -- Use the line item value column in here
from Client Cl -- First table
inner join Shipments Sh -- Join the shipments
on Sh.ClientID = Cl.ClientID
inner join ShipItem SI -- Now the items
on SI.ShipmentID = Sh.ShipmentID
group by C1.emailaddress, Sh.ShipmentID -- here's your grouping for the sum() aggregation
) -- Close subquery
group by emailaddress -- group for the max()
For the first query you can join the Clients to Shipments (on ClientId).
And Shipments to the ShipItems table (on ShipmentId).
Then group the results, and count or sum the total you need.
Using aliases for the tables is usefull, certainly when you select fields from the joined tables that have the same column name.
select
c.EmailAddress,
i.ShipmentId,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
order by i.ShipmentId, c.EmailAddress;
Using that grouped query in a subquery, you can get the Maximum total per EmailAddress.
select EmailAddress,
-- max(TotalShipItems) as MaxTotalShipItems,
max(TotalPriceDiscounted) as MaxTotalPriceDiscounted
from (
select
c.EmailAddress,
-- i.ShipmentId,
-- count(*) as TotalShipItems,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
) q
group by EmailAddress
order by EmailAddress
Note that an ORDER BY is mostly meaningless inside a subquery if you don't use TOP.

Where SQL column value not found in other column

I've read similar questions and I think I am doing it correct, but I just wanted to make sure my SQL is correct. (Still new to SQL)
I have 2 different tables
Students
id, name, address
Staff
id, name, address
I need to find the total number of students (who are not also staff)
SO I have the following SQL
create view nstudents as
select students.id
from students
LEFT JOIN staff ON staff.id = students.id;
Then I run the count(*) on the view.
Can someone confirm my SQL is correct or is there better way to do it?
Your LEFT JOIN doesn't eliminate students who are also staff, but it could be useful in achieving your goal. LEFT JOIN provides you with all results from the left table and matching results from the right table, or NULL results if the right table doesn't have a match. If you do this:
select count(*)
from students
LEFT JOIN staff ON staff.id = students.id
WHERE staff.id IS NULL;
I expect you'll get what you're looking for.
You might find it more natural to do something like this:
create view nstudents as
select s.id
from students s
where not exists (select 1 from staff st where st.id = s.id) ;
This should have the same performance as the left join.

Three tables join given me the all combination of records

When i written the query like the following.. It's written the combination of all the records.
What's the mistake in the query?
SELECT ven.vendor_code, add.address1
FROM vendor ven INNER JOIN employee emp
ON ven.emp_fk = emp.id
INNER JOIN address add
ON add.emp_name = emp.emp_name;
Using inner join, you've to put all the links (relations) between two tables in the ON clause.
Assuming the relations are good, you may test the following queries to see if they really make the combination of all records:
SELECT count(*)
from vendor ven
inner join employee emp on ven.emp_fk = emp.id
inner join address add on add.emp_name = emp.emp_name;
SELECT count(*)
add.address1
from vendor ven, employee emp, address add
If both queries return the same result (which I doubt), you really have what you say.
If not, as I assume, maybe you are missing a relation or a restriction to filter the number of results.