Group by a field not in select - sql

I want to find how many modules a lecturer taught in a specific year and want to select name of the lecturer and the number of modules for that lecturer.
Problem is that because I am selecting Name, and I have to group it by name to make it work. But what if there are two lecturers with same name? Then sql will make them one and that would be wrong output.
So what I really want to do is select name but group by id, which sql is not allowing me to do. Is there a way around it?
Below are the tables:
Lecturer(lecturerID, lecturerName)
Teaches(lecturerID, moduleID, year)
This is my query so far:
SELECT l.lecturerName, COUNT(moduleID) AS NumOfModules
FROM Lecturer l , Teaches t
WHERE l.lecturerID = t.lecturerID
AND year = 2011
GROUP BY l.lecturerName --I want lectureID here, but it doesn't run if I do that

SELECT a.lecturerName, b.NumOfModules
FROM Lecturer a,(
SELECT l.lecturerID, COUNT(moduleID) AS NumOfModules
FROM Lecturer l , Teaches t
WHERE l.lecturerID = t.lecturerID
AND year = 2011
GROUP BY l.lecturerID) b
WHERE a.lecturerID = b.lecturerID

You should probably just group by lecturerID and include it in the select column list. Otherwise, you're going to end up with two rows containing the same name with no way to distinguish between them.
You raise the problem of "wrong output" when grouping just by name but "undecipherable output" is just as big a problem. In other words, your desired output (grouping by ID but giving name):
lecturerName Module
------------ ------
Bob Smith 1
Bob Smith 2
is no better than your erroneous output (grouping by, and giving, name):
lecturerName Module
------------ ------
Bob Smith 3
since, while you now know that one of the lecturers taught two modules and the other taught one, you have no idea which is which.
The better output (grouping by ID and displaying both ID and name) would be:
lecturerId lecturerName Module
---------- ------------ ------
314159 Bob Smith 1
271828 Bob Smith 2
And, yes, I'm aware this doesn't answer your specific request but sometimes the right answer to "How do I do XYZZY?" is "Don't do XYZZY, it's a bad idea for these reasons ...".
Things like writing operating systems in COBOL, accounting packages in assembler, or anything in Pascal come to mind instantly :-)

You could subquery your count statement.
SELECT lecturername,
(SELECT Count(*)
FROM teaches t
WHERE t.lecturerid = l.lecturerid
AND t.year = 2011) AS NumOfModules
FROM lecturer l
Note there are other ways of doing this. If you also wanted to elimiate the rows with no modules you can then try.
SELECT *
FROM (SELECT lecturername,
(SELECT Count(*)
FROM teaches t
WHERE t.lecturerid = l.lecturerid
AND t.year = 2011) AS NumOfModules
FROM lecturer l) AS temp
WHERE temp.numofmodules > 0

Related

Oracle SQL - Joins with condition

I'm sorry for asking these questions but I am really new and studying Oracle SQL for my university and I am having a hard time with a few things. Anyways, for this one, I am supposed to retrieve data from 2 different tables. One is from 'STAFF' and one is from 'BRANCH' I want the output to display the staff name (SNAME), the start date (STARTDATE) and the area (AREA). SNAME and STARTDATE are from the table 'STAFF', and the AREA is from the table 'Branch', how do I access that? Also, I ONLY want to display the names and start dates of those who are in the STOKE areea.
This is my code
SELECT SNAME, STARTDATE, BRANCH.AREA
FROM STAFF CROSS JOIN BRANCH
WHERE STAFF.BRANCHID = 20;
Note: The STAFF.BRANCHID = 20 is because the area 'STOKE' has a BRANCHID of 20.
This is what I get:
SNAME STARTDATE AREA
---------- --------- -------------
SMITH 15-NOV-00 ECCLESHALL
JONES 02-MAR-01 ECCLESHALL
SONG 03-JAN-02 ECCLESHALL
SMITH 15-NOV-00 STOKE
SONG 03-JAN-02 STOKE
SMITH 15-NOV-00 STAFFORD
As you can see, the WHERE clause is not working because it outputs every area instead of stoke only.
I know I am supposed to use the JOIN function but I don't get which one and why, any useful links would be appreciated :)
Thank you
You need to use the INNER JOIN and proper join condition as follows:
SELECT S.SNAME, S.STARTDATE, B.AREA
FROM STAFF S INNER JOIN BRANCH B
ON S.BRANCHID = B.BRANCHID
WHERE S.AREA = 'STOKE';

Querying for swapped columns in SQL database

We've had a few cases of people entering in first names where last names should be and vice versa. So I'm trying to come up with a SQL search to match the swapped columns. For example, someone may have entered the record as first_name = Smith, last_name = John by accident. Later, another person may see that John Smith is not in the database and enter a new user as first_name = John, last_name = Smith, when in fact it is the same person.
I used this query to help narrow my search:
SELECT person_id, first_name, last_name
FROM people
WHERE first_name IN (
SELECT last_name FROM people
) AND last_name IN (
SELECT first_name FROM people
);
But if we have people named John Allen, Allen Smith, and Smith John, they would all be returned even though none of those are actually duplicates. In this case, it's actually good enough that I can see the duplicates in my particular data set, but I'm wondering if there's a more precise way to do this.
I would do a self join like this:
SELECT p1.person_id, p1.first_name, p1.last_name
FROM people p1
join people p2 on p1.first_name = p2.last_name and p1.last_name = p2.first_name
To also find typos on names I recommend this:
SELECT p1.person_id, p1.first_name, p1.last_name
FROM people p1
join people p2 on soundex(p1.first_name) = soundex(p2.last_name) and
soundex(p1.last_name) = soundex(p2.first_name)
soundex is a neat function that "hashes" words in a way that two words that sound the same get the same hash. This means Anne and Ann will have the same soundex. So if you had an Anne Smith and a Smith Ann the query above would find them as a match.
Interesting. This is a problem that I cover in Data Analysis Using SQL and Excel (note: I only very rarely mention books in my answers or comments).
The idea is to summarize the data to get a likelihood of a mismatch. So, look at the number of times a name appears as a first name and as a last name and then combine these. So:
with names as (
select first_name as name, 1.0 as isf, 0.0 as isl
from people
union all
select last_name, 0, 1
from people
),
nl as (
select name, sum(isf) as numf, sum(isl) as numl,
avg(isf) as p_f, avg(isl) as p_l
from names
group by name
)
select p.*
from people p join
nl nlf
on p.first_name = nlf.name join
nl nll
on p.last_name = nll.name
order by (coalesce(nlf.p_l, 0) + coalesce(nll.p_f, 0));
This orders the records by a measure of mismatch of the names -- the sum of the probabilities of the first name used by a last name and a last name used as a first name.

SSRS query and WHERE with multiple

Being new with SQL and SSRS and can do many things already, but I think I must be missing some basics and therefore bang my head on the wall all the time.
A report that is almost working, needs to have more results in it, based on conditions.
My working query so far is like this:
SELECT projects.project_number, project_phases.project_phase_id, project_phases.project_phase_number, project_phases.project_phase_header, project_phase_expensegroups.projectphase_expense_total, invoicerows.invoicerow_total
FROM projects INNER JOIN
project_phases ON projects.project_id = project_phases.project_id
LEFT OUTER JOIN
project_phase_expensegroups ON project_phases.project_phase_id = project_phase_expensegroups.project_phase_id
LEFT OUTER JOIN
invoicerows ON project_phases.project_phase_id = invoicerows.project_phase_id
WHERE ( projects.project_number = #iProjectNumber )
AND
( project_phase_expensegroups.projectphase_expense_total >0 )
The parameter is for selectionlist that is used to choose a project to the report.
How to have also records that have
( project_phase_expensegroups.projectphase_expense_total ) with value 0 but there might be invoices for that project phase?
Tried already to add another condition like this:
WHERE ( projects.project_number = #iProjectNumber )
AND
( project_phase_expensegroups.projectphase_expense_total > 0 )
OR
( invoicerows.invoicerow_total > 0 )
but while it gives some results - also the one with projectphase_expense_total with value 0, but the report is total mess.
So my question is: what am I doing wrong here?
There is a core problem with your query in that you are left joining to two tables, implying that rows may not exist, but then putting conditions on those tables, which will eliminate NULLs. That means your query is internally inconsistent as is.
The next problem is that you're joining two tables to project_phases that both may have multiple rows. Since these data are not related to each other (as proven by the fact that you have no join condition between project_phase_expensegroups and invoicerows, your query is not going to work correctly. For example, given a list of people, a list of those people's favorite foods, and a list of their favorite colors like so:
People
Person
------
Joe
Mary
FavoriteFoods
Person Food
------ ---------
Joe Broccoli
Joe Bananas
Mary Chocolate
Mary Cake
FavoriteColors
Person Color
------ ----------
Joe Red
Joe Blue
Mary Periwinkle
Mary Fuchsia
When you join these with links between Person <-> Food and Person <-> Color, you'll get a result like this:
Person Food Color
------ --------- ----------
Joe Broccoli Red
Joe Bananas Red
Joe Broccoli Blue
Joe Bananas Blue
Mary Chocolate Periwinkle
Mary Chocolate Fuchsia
Mary Cake Periwinkle
Mary Cake Fuchsia
This is essentially a cross-join, also known as a Cartesian product, between the Foods and the Colors, because they have a many-to-one relationship with each person, but no relationship with each other.
There are a few ways to deal with this in the report.
Create ExpenseGroup and InvoiceRow subreports, that are called from the main report by a combination of project_id and project_phase_id parameters.
Summarize one or the other set of data into a single value. For example, you could sum the invoice rows. Or, you could concatenate the expense groups into a single string separated by commas.
Some notes:
Please, please format your query before posting it in a question. It is almost impossible to read when not formatted. It seems pretty clear that you're using a GUI to create the query, but do us the favor of not having to format it ourselves just to help you
While formatting, please use aliases, Don't use full table names. It just makes the query that much harder to understand.
You need an extra parentheses in your where clause in order to get the logic right.
WHERE ( projects.project_number = #iProjectNumber )
AND (
(project_phase_expensegroups.projectphase_expense_total > 0)
OR
(invoicerows.invoicerow_total > 0)
)
Also, you're using a column in your WHERE clause from a table that is left joined without checking for NULLs. That basically makes it a (slow) inner join. If you want to include rows that don't match from that table you also need to check for NULL. Any other comparison besides IS NULL will always be false for NULL values. See this page for more information about SQL's three value predicate logic: http://www.firstsql.com/idefend3.htm
To keep your LEFT JOINs working as you intended you would need to do this:
WHERE ( projects.project_number = #iProjectNumber )
AND (
project_phase_expensegroups.projectphase_expense_total > 0
OR project_phase_expensegroups.project_phase_id IS NULL
OR invoicerows.invoicerow_total > 0
OR invoicerows.project_phase_id IS NULL
)
I found the solution and it was kind easy after all. I changed the only the second LEFT OUTER JOIN to INNER JOIN and left away condition where the query got only results over zero. Also I used SELECT DISTINCT
Now my report is working perfectly.

MS Access: Selecting the first item according to a rank

Imagine I have a query called QueryA that returns stuff like this:
Employee Description Rank
John Happy 1
John Depressed 3
James Happy 1
James Confused 2
Mark Depressed 3
I am trying to make a query that grabs the Employee and the Description, but only one description -- the one with the best "rank." (the lower the rank the better). I sort QueryA by Employee then by Rank (descending).
So I'd want my new query QueryB to show that John as Happy, James as Happy, and Mark as Depressed.
However I try selecting Employee and then First of Description and it doesn't always work.
I'm not able to check this for Access, but it should work fine. Check my SQL Fiddle
select
r.employee, d.description
from
table1 as d
inner join (select min(rank) as rank, employee
from
table1
group by employee) r on d.rank = r.rank
and d.employee = r.employee

Use Access SQL to do a grouped ranking

How do I rank salespeople by # customers grouped by department (with ties included)?
For example, given this table, I want to create the Rank column on the right. How should I do this in Access?
SalesPerson Dept #Customers Rank
Bill DeptA 20 1
Ted DeptA 30 2
Jane DeptA 40 3
Bill DeptB 50 1
Mary DeptB 60 2
I already know how to do a simple ranking with this SQL code. But I don't know how to rework this to accept grouping.
Select Count(*) from [Tbl] Where [#Customers] < [Tblx]![#Customers] )+1
Also, there's plenty of answers for this using SQL Server's Rank() function, but I need to do this in Access. Suggestions, please?
SELECT *, (select count(*) from tbl as tbl2 where
tbl.customers > tbl2.customers and tbl.dept = tbl2.dept) + 1 as rank from tbl
Just add the dept field to the subquery...
Great solution with subquery! Except for huge recordsets, the subquery solution gets very slow. Its better(quicker) to use a Self JOIN, look at the folowing solution: self join
SELECT tbl1.SalesPerson , count(*) AS Rank
FROM tbl AS tbl1 INNER JOIN tbl AS tbl2 ON tbl1.DEPT = tbl2.DEPT
AND tbl1.#Customers < tbl2.#Customers
GROUP BY tbl1.SalesPerson
I know this is an old thread. But since I spent a great deal of time on a very similar problem and was greatly helped by the former answers given here, I would like to share what I have found to be a MUCH faster way. (Beware, it is more complicated.)
First make another table called "Individualizer". This will have one field containing a list of numbers 1 through the-highest-rank-that-you-need.
Next create a VBA module and paste this into it:
'Global Declarations Section.
Option Explicit
Global Cntr
'*************************************************************
' Function: Qcntr()
'
' Purpose: This function will increment and return a dynamic
' counter. This function should be called from a query.
'*************************************************************
Function QCntr(x) As Long
Cntr = Cntr + 1
QCntr = Cntr
End Function
'**************************************************************
' Function: SetToZero()
'
' Purpose: This function will reset the global Cntr to 0. This
' function should be called each time before running a query
' containing the Qcntr() function.
'**************************************************************
Function SetToZero()
Cntr = 0
End Function
Save it as Module1.
Next, create Query1 like this:
SELECT Table1.Dept, Count(Table1.Salesperson) AS CountOfSalesperson
FROM Table1
GROUP BY Table1.Dept;
Create a MakeTable query called Query2 like this:
SELECT SetToZero() AS Expr1, QCntr([ID]) AS Rank, Query1.Dept,
Query1.CountOfSalesperson, Individualizer.ID
INTO Qtable1
FROM Query1
INNER JOIN Individualizer
ON Query1.CountOfSalesperson >= Individualizer.ID;
Create another MakeTable query called Query3 like this:
SELECT SetToZero() AS Expr1, QCntr([Identifier]) AS Rank,
[Salesperson] & [Dept] & [#Customers] AS Identifier, Table1.Salesperson,
Table1.Dept, Table1.[#Customers]
INTO Qtable2
FROM Table1;
If you have another field already that uniquely identifies every row you wouldn't need to create an Identifier field.
Run Query2 and Query3 to create the tables.
Create a fourth query called Query4 like this:
SELECT Qtable2.Salesperson, Qtable2.Dept, Qtable2.[#Customers], Qtable1.ID AS Rank
FROM Qtable1
INNER JOIN Qtable2 ON Qtable1.Rank = Qtable2.Rank;
Query4 returns the result you are looking for.
Practically, you would want to write a VBA function to run Query2 and Query3 and then call that function from a button placed in a convenient location.
Now I know this sounds ridiculously complicated for the example you gave. But in real life, I am sure your table is more complicated than this. Hopefully my examples can be applied to your actual situation. In my database with over 12,000 records this method is by FAR the fastest (as in: 6 seconds with 12,000 records compared to over 1 minute with 262 records ranked with the subquery method).
The real secret for me was the MakeTable query because this ranking method is useless unless you immediately output the results to a table. But, this does limit the situations that it can be applied to.
P.S. I forgot to mention that in my database I was not pulling results directly from a table. The records had already gone through a string of queries and multiple calculations before they needed to be ranked. This probably contributed greatly to the huge difference in speed between the two methods in my situation. If you are pulling records directly from a table, you might not notice nearly as big an improvement.
You need to do some math. I typically take advantage of the combination of a counter field and an "offset" field. You're aiming for a table which looks like this (#Customers isn't necessary, but will give you a visual that you're doing it properly):
SalesPerson Dept #Customers Ctr Offset
Bill DeptA 20 1 1
Ted DeptA 30 2 1
Jane DeptA 40 3 1
Bill DeptB 50 4 4
Mary DeptB 60 5 4
So, to give rank, you'd do [Ctr]-[Offset]+1 AS Rank
build a table with SalesPerson, Dept, Ctr, and Offset
insert into that table, ordered by Dept and #Customers (so that they're all sorted properly)
Update Offset to be the MIN(Ctr), grouping on Dept
Perform your math calculation to determine Rank
Clear out the table so you're ready to use it again next time.
To add to this and any other related Access Ranking or Rank Tie Breaker how-tos for other versions of Access, ranking should not be performed on crosstab queries if your FROM clause happens to NOT contain a table but a query that is either a crosstab query or a query that contains within it elsewhere a crosstab query.
The code referenced above where a SELECT statement within a SELECT statment is used (sub query),
"SELECT *, (select count(*) from tbl as tbl2 where tbl.customers > tbl2.customers and tbl.dept = tbl2.dept) + 1 as rank from tbl"
will not work and will always fail expressing a error on portion of the code where "tbl.customers > tbl2.customers" cannot be found.
In my situation on a past project, I was referencing a query instead of a table and within that query I had referenced a crosstab query thus failing and producing an error. I was able to resolve this by creating a table from the crosstab query first, and when I referenced the newly created table in the FROM clause, it started working for me.
So in final, normally you can reference a query or table in the FROM clause of the SELECT statement as what was shared previously above to do ranking, but be carefull as to if you are referencing a query instead of a table, that query must Not be a crosstab query or reference another query that is a crosstab query.
Hope this helps anyone else that may have had problems looking for a possible reason if you happen to reference the statements above and you are not referencing a table in your FROM clause within your own project. Also, performing subqueries on aliases with crosstab queries in Access probably isn't good idea or best practice either so stray away from that if/when possible.
If you found this useful, and wish that Access would allow the use of a scrolling mouse in a passthru query editor, give me a like please.
I normally pick tips and ideas from here and sometimes end up building amazing things from it!
Today, (well let’s say for the past one week), I have been tinkering with Ranking of data in Access and to the best of my ability, I did not anticipate what I was going to do something so complex as to take me a week to figure it out! I picked titbits from two main sites:
https://usefulgyaan.wordpress.com/2013/04/23/ranking-in-ms-access/ (seen that clever ‘>=’ part, and the self joins? Amazing… it helped me to build my solution from just one query, as opposed to the complex method suggested above by asonoftheMighty (not discrediting you… just didn’t want to try it for now; may be when I get to large data I might want to try that as well…)
Right here, from Paul Abott above ( ‘and tbl.dept = tbl2.dept’)… I was lost after ranking because I was placing AND YearID = 1, etc, then the ranking would end up happening only for sub-sets, you guessed right, when YearID = 1! But I had a lot of different scenarios…
Well, I gave that story partly to thank the contributors mentioned, because what I did is to me one of the most complex of the ranking that I think can help you in almost any situation, and since I benefited from others, I would like to share here what I hope may benefit others as well.
Forgive me that I am not able to post my table structures here, it is a lot of related tables. I will only post the query, so if you need to you may develop your tables to end up with that kind of query. But here is my scenario:
You have students in a school. They go through class 1 to 4, can either be in stream A or B, or none when the class is too small. They each take 4 exams (this part is not important now), so you get the total score for my case. That’s it. Huh??
Ok. Lets rank them this way:
We want to know the ranking of
• all students who ever passed through this school (best ever student)
• all students in a particular academic year (student of the year)
• students of a particular class (but remember a student will have passed through all classes, so basically his/her rank in each of those classes for the different years) this is the usual ranking that appears in report cards
• students in their streams (above comment applies)
• I would also like to know the population against which we ranked this student in each category
… all in one table/query. Now you get the point?
(I normally like to do as much of my 'programming' in the database/queries to give me visuals and to reduce the amount of code I will later have to right. I actually won't use this query in my application :), but it let's me know where and how to send my parameters to the query it came from, and what results to expect in my rdlc)
Don't you worry, here it is:
SELECT Sc.StudentID, Sc.StudentName, Sc.Mark,
(SELECT COUNT(Sch.Mark) FROM [StudentScoreRankTermQ] AS Sch WHERE (Sch.Mark >= Sc.Mark)) AS SchoolRank,
(SELECT Count(s.StudentID) FROM StudentScoreRankTermQ AS s) As SchoolTotal,
(SELECT COUNT(Yr.Mark) FROM [StudentScoreRankTermQ] AS Yr WHERE (Yr.Mark >= Sc.Mark) AND (Yr.YearID = Sc.YearID) ) AS YearRank,
(SELECT COUNT(StudentID) FROM StudentScoreRankTermQ AS Yt WHERE (Yt.YearID = Sc.YearID) ) AS YearTotal,
(SELECT COUNT(Cl.Mark) FROM [StudentScoreRankTermQ] AS Cl WHERE (Cl.Mark >= Sc.Mark) AND (Cl.YearID = Sc.YearID) AND (Cl.TermID = Sc.TermID) AND (Cl.ClassID=Sc.ClassID)) AS ClassRank,
(SELECT COUNT(StudentID) FROM StudentScoreRankTermQ AS C WHERE (C.YearID = Sc.YearID) AND (C.TermID = Sc.TermID) AND (C.ClassID = Sc.ClassID) ) AS ClassTotal,
(SELECT COUNT(Str.Mark) FROM [StudentScoreRankTermQ] AS Str WHERE (Str.Mark >= Sc.Mark) AND (Str.YearID = Sc.YearID) AND (Str.TermID = Sc.TermID) AND (Str.ClassID=Sc.ClassID) AND (Str.StreamID = Sc.StreamID) ) AS StreamRank,
(SELECT COUNT(StudentID) FROM StudentScoreRankTermQ AS St WHERE (St.YearID = Sc.YearID) AND (St.TermID = Sc.TermID) AND (St.ClassID = Sc.ClassID) AND (St.StreamID = Sc.StreamID) ) AS StreamTotal,
Sc.CalendarYear, Sc.Term, Sc.ClassNo, Sc.Stream, Sc.StreamID, Sc.YearID, Sc.TermID, Sc.ClassID
FROM StudentScoreRankTermQ AS Sc
ORDER BY Sc.Mark DESC;
You should get something like this:
+-----------+-------------+------+------------+-------------+----------+-----------+-----------+------------+------------+-------------+------+------+-------+--------+
| StudentID | StudentName | Mark | SchoolRank | SchoolTotal | YearRank | YearTotal | ClassRank | ClassTotal | StreamRank | StreamTotal | Year | Term | Class | Stream |
+-----------+-------------+------+------------+-------------+----------+-----------+-----------+------------+------------+-------------+------+------+-------+--------+
| 1 | Jane | 200 | 1 | 20 | 2 | 12 | 1 | 9 | 1 | 5 | 2017 | I | 2 | A |
| 2 | Tom | 199 | 2 | 20 | 1 | 12 | 3 | 9 | 1 | 4 | 2016 | I | 1 | B |
+-----------+-------------+------+------------+-------------+----------+-----------+-----------+------------+------------+-------------+------+------+-------+--------+
Use the separators | to reconstruct the result table
Just an idea about the tables, each student will be related to a class. Each class relates to years. Each stream relates to a class. Each term relates to a year. Each exam relates to a term and student and a class and a year; a student can be in class 1A in 2016 and moves on to class 2b in 2017, etc…
Let me also add that this a beta result, I have not tested it well enough and I do not yet have an opportunity to create a lot of data to see the performance. My first glance at it told me that it is good. So if you find reasons or alerts you want to point my way, please do so in comments so I may keep learning!