Complex SQL query with group by and two rows in one - sql

Okay, I need help. I'm usually pretty good at SQL queries but this one baffles me. By the way, this is not a homework assignment, it's a real situation in an Access database and I've written the requirements below myself.
Here is my table layout. It's in Access 2007 if that matters; I'm writing the query using SQL.
Id (primary key)
PersonID (foreign key)
EventDate
NumberOfCredits
SuperCredits (boolean)
There are events that people go to. They can earn normal credits, or super credits, or both at one event. The SuperCredits column is true if the row represents a number of super credits earned at the event, or false if it represents normal credits.
So for example, if there is an event which person 174 attends, and they earn 3 normal credits and 1 super credit at the event, the following two rows would be added to the table:
ID PersonID EventDate NumberOfCredits SuperCredits
1 174 1/1/2010 3 false
2 174 1/1/2010 1 true
It is also possible that the person could have done two separate things at the event, so there might be more than two columns for one event, and it might look like this:
ID PersonID EventDate NumberOfCredits SuperCredits
1 174 1/1/2010 1 false
2 174 1/1/2010 2 false
3 174 1/1/2010 1 true
Now we want to print out a report. Here will be the columns of the report:
PersonID
LastEventDate
NumberOfNormalCredits
NumberOfSuperCredits
The report will have one row per person. The row will show the latest event that the person attended, and the normal and super credits that the person earned at that event.
What I am asking of you is to write, or help me write, the SQL query to SELECT the data and GROUP BY and SUM() and whatnot. Or, let me know if this is for some reason not possible, and how to organize my data to make it possible.
This is extremely confusing and I understand if you do not take the time to puzzle through it. I've tried to simplify it as much as possible, but definitely ask any questions if you give it a shot and need clarification. I'll be trying to figure it out but I'm having a real hard time with it, this is grouping beyond my experience...

Create a query/view of the following (say its called PersonLastEvents):
select PersonId, max(EventDate) as LastEventDate
from Events
group by PersonId
Then you can get the data you need with the following. I'm not fully familiar with Access, so you may need to modify some of the syntax, but hopefully this will give you an approach.
select l.PersonId, l.LastEventDate,
sum(case when e.SuperCredits = 'false' then e.NumberOfCredits end)
as NumberOfNormalCredits
sum(case when e.SuperCredits = 'true' then e.NumberOfCredits end)
as NumberOfSuperCredits
from PersonLastEvents l
join Events e on l.PersonId = e.PersonId and l.LastEventDate = l.EventDate
group by l.PersonId, l.LastEventDate
As an aside, it may be easier to change your table to have two number columns (NormalCredits, SuperCredits) as it allows you to simply sum() the columns as required.

Related

SQL - Selecting Records with an odd number of a given attribute

I'm just brushing up on some SQL - in other words, I'm really rusty - and am a bit stuck at the moment. It's probably something trivial, but we'll see.
I'd like to select all people that possess an odd number of a certain attribute that isn't an integer ( in this example, TransactionType). So, for example, take the following test/not real info where these people are buying a car or some similarly big purchase.
Name TransactionType Date
John Buy 5/1
John Cancel 5/1
John Buy 5/2
Joseph Buy 5/25
Joseph Cancel 5/25
Tanya Buy 5/28
I would like it to return the people who had an odd number of transactions; in other words, they ended up purchasing the item. So, in this case, John and Tanya would be selected and Joseph would not.
I know I can use the modulus operand here, but I'm a bit lost how to utilize it correctly.
I thought of using
count(TransactionType) % 2 != 0
in the where clause but that's obviously a no-go. Any pointers in the right direction would be very helpful. Let me know if this is unclear, and thanks!
You are close. You need a having clause instead of a where clause.
select Name
from table
group by Name
having count(TransactionType) % 2 != 0
Wouldn't you be better off getting the latest status by the transaction date and using that rather than relying on counting TransactionType to determine the latest status:
Something like this:
SELECT b.Name, b.TransactionType, b.[Date]
FROM (
SELECT Name, MAX(t1.[DATE]) latestDate
FROM [Transactions] t1
GROUP BY t1.Name
) a
INNER JOIN [Transactions] b ON b.Name = a.Name AND a.latestDate = b.[Date]
WHERE b.TransactionType = 'Buy'
Assuming your dates are valid dates with times included, this should work.
Sample SQL Fiddle
If you only store the date portion the max date would be the same for people that Buy and Cancel on the same date, therefore it would return more data and some incorrect records.

using MS Access 2010, editing a form, 3 tables, one table is "one to many" and want to have query across one row

Example:
(using a comma, to show columns & "A_", "B_", "C_" to show tables)
I currently have...
A_John, A_Doe, B_MemStartDate, C_Date,
A_John, A_Doe, B_MemStartDate, C_Date,
A_John, A_Doe, B_MemStartDate, C_Date,
I would like Table "C" columns to be pulled by Course but shown as the "Date" column... in other words, example, where Course = 1, use Date... etc...
The table would have each member's course history.
MemberID, CourseID, Date
JohnDoe, Course1, 10-10-13
JohnDoe, Course2, 10-11-13
JohnDoe, Course3, 10-12-13
The table C is the one to many of course, with the goal to show the date that the course was taken as I would title the columns with the 3 different courses I want to show... (I want to pull only 3 different courses)
I would like to have them in a row...
A_John, A_Doe, B_Start Date, C_Course1Date, C_Course2Date, C_Course3Date
Sorry for the lack of experience in my question, but I usually get by with "copy/paste"...LOL
Keeping in mind I am using Access... can I do this?
Clarifying... sorry don't know how to do tables in basic html so have commas
Have this. Pulling from 3 different tables where the member# is unique and joined.
MEMBER, STARTDATE, STATUS, COURSE, COURSEDATE
JohnSmith, 08-01-2013, Active, Workshop1, 10-20-2013
JohnSmith, 08-01-2013, Active, Workshop2, 10-13-2013
JohnSmith, 08-01-2013, Active, Workshop3, 10-28-2013
LaraBentt, 12-01-2012, Inactive, Workshop1, 02-20-2012
LaraBentt, 12-01-2012, Inactive, Workshop2, 02-13-2012
LaraBentt, 12-01-2012, Inactive, Workshop3, 02-28-2012
Want this...
MEMBER, STARTDATE, STATUS, WORKSHOP1, WORKSHOP2, WORKSHOP3
JohnSmith, 08-01-2013, Active, 10-20-2013, 10-13-2013, 10-28-2013
LaraBentt, 12-01-2012, Inactive, 02-20-2012, 02-13-2012, 02-28-2012
Tables columns are basically like this...
Table 1 - tblMember (one - one)
MEMBER, STARTDATE
Table 2 - tblRegStatus (one - one)
MEMBER, STATUS
Table 3 - tblCourses (one to many)
MEMBER, COURSE, COURSEDATE
Hope this explains it better!
Your amendment to your question made it much more clear.
One of the problems you are going to encounter is if you have someone that has registered for the same course, twice. There has to be the understanding that you are only showing the most recent registration per person. Now that we understand that caveat - lets look at how you can build this.
The simplest way of building this is to change the query type from Select to Crosstab. This is done through the Access interface.
For each of the fields in the query, set the Total and Crosstab values to:
MemberName: Total = *Group By* Crosstab = *Row Heading*
StartDate: Total = *Group By* Crosstab = *Row Heading*
Status: Total = *Group By* Crosstab = *Row Heading*
CourseName: Total = *Group By* Crosstab = *Column Heading*
CourseDate: Total = *Max* Crosstab = *Value*
This will result in query results such as:
MemberName StartDate Status Biology English Math
---------- --------- ------ ------- ------- ----
John 10/1/2013 Active 11/14/2013 12/1/2013
Matthew 9/1/2013 Inactive 1/1/2013
Peter 8/7/2013 Active 1/1/2013 4/1/2013
Sam 1/14/2013 Inactive 11/14/2013
William 5/19/2013 Active 1/1/2013 4/1/2013
The advantage to the crosstab is that it will automatically add columns as there is data.
You have to construct your query (the joins) to give you the data that you want however. For example - do you want to show members that have no courses on the list? Do you want to show courses that have no enrollment?
Microsoft has a page on using crosstab queries:
http://office.microsoft.com/en-us/access-help/make-summary-data-easier-to-read-by-using-a-crosstab-query-HA010229577.aspx
If you want to hard code your three columns, you can also manually pivot the data. Again - keep in mind that we are using the Max aggregate, so only the latest courses will show. With a manual pivot you also need to remember that it will be limited to the data you are requesting.
Take your select query, right click and click "Totals" to show the Totals row. You can also click on the Totals icon in the ribbon under Query Tools / Design.
Set up your three columns:
MemberName: Total = *Group By*
StartDate: Total = *Group By*
Status: Total = *Group By*
Now add in a separate field for each course you wish to include. In the "Field" definition enter the following.
Science: IIf([CourseName]="Science",[CourseDate],Null)
Biology: IIf([CourseName]="Biology",[CourseDate],Null)
Math: IIf([CourseName]="Math",[CourseDate],Null)
English: IIf([CourseName]="English",[CourseDate],Null)
Set the "Total" definition value to "Max".
That should get you the results you are looking for.

How should you separate dimension tables from fact tables if you are not building a data warehouse?

I realize that referring to these as dimension and fact tables is not exactly appropriate. I am at a lost for better terminology, so please excuse this categorization that I use in the post.
I am building an application for employee record keeping.
The database will contain organizational information. The information is mostly defined in three tables: Locations, Divisions, and Departments. However, there are others with similar problems. First, I need to store the available values for these tables. This will allow for available values in the application when managing an employee and for management of these values when adding/deleting departments and such. For instance, the Locations table may look like,
LocationId | LocationName | LocationStatus
1 | New York | Active
2 | Denver | Inactive
3 | New Orleans | Active
I then need to store these values for each employee and keep their history. My first thought was to create LocationHistory, DivisionHistory, and DepartmentHistory tables. I cannot pinpoint why, but this struck me as poor design. My next inclination was to create a DimLocation/FactLocation, DimDivision/FactDivision, DimDepartment/FactDepartment set of tables. I do not believe this makes sense either. I have also considered naming them as a combination of Employee, i.e. EmployeeLocations, EmployeeDivisions, etc. Regardless of the naming convention for these tables, I imagine that data would look similar to a simplified version I have below:
EmployeeId | LocationId | EffectiveDate | EndDate
1 | 3 | 2008-07-01 | NULL
1 | 2 | 2007-04-01 | 2008-06-30
I realize any of the imagined solutions I described above could work, but I am really looking to create a design that will be easy for others to maintain with an intuitive, familiar structure. I would like to receive this community's help, opinions, and experience with this matter. I am open to and would welcome any suggestion to consider. For instance, should I even store the available values for these three tables in the database? Should they be maintained in the application code/business logic layer? Do I just need to get over seeing the word History repeating three times?
Thanks!
Firstly, I see no issue in describing these as Dimension and Fact tables outside of a warehouse :)
In terms of conceptualising and understanding the relationships, I personally see the use of start/end dates perfectly easy for people to understand. Allowing Agent and Location fact tables, and then time dependant mapping tables such as Agent_At_Location, etc. They do, however, have issues worthy of taking note.
If EndDate is 2008-08-30, was the employee in that location UP TO 30th August, or UP TO and INCLUDING 30th August.
Dealing with overlapping date periods in queries can give messy queries, but more importantly, slow queries.
The first one seems simply a matter of convention, but it can have certain implications when dealign with other data. For example, consider that an EndDate of 2008-08-30 means that they ARE at that location UP TO and INCLUDING 30th August. Then you join on to their Daily Agent Data for that day (Such as when they Actually arrived at work, left for breaks, etc). You need to join ON AgentDailyData.EventTimeStamp < '2008-08-30' + 1 in order to include all the events that happened during that day.
This is because the data's EventTimeStamp isn't measured in days, but probably minutes or seconds.
If you consider that the EndDate of '2008-08-30' means that the Agent was at that Location UP TO but NOT INCLDUING 30th August, the join does not need the + 1. In fact you don't need to know if the date is DAY bound, or can include a time component or not. You just need TimeStamp < EndDate.
By using EXCLUSIVE End markers, all of your queries simplify and never need + 1 day, or + 1 hour to deal with edge conditions.
The second one is much harder to resolve. The simplest way of resolving an overlapping period is as follows:
SELECT
CASE WHEN TableA.InclusiveFrom > TableB.InclusiveFrom THEN TableA.InclusiveFrom ELSE TableB.InclusiveFrom END AS [NetInclusiveFrom],
CASE WHEN TableA.ExclusiveFrom < TableB.ExclusiveFrom THEN TableA.ExclusiveFrom ELSE TableB.ExclusiveFrom END AS [NetExclusiveFrom],
FROM
TableA
INNER JOIN
TableB
ON TableA.InclusiveFrom < TableB.ExclusiveFrom
AND TableA.ExclusiveFrom > TableB.InclusiveFrom
-- Where InclusiveFrom is the StartDate
-- And ExclusiveFrom is the EndDate, up to but NOT including that date
The problem with that query is one of indexing. The first condition TableA.InclusiveFrom < TableB.ExclusiveFrom could be be resolved using an index. But it could give a Massive range of dates. And then, for each of those records, the ExclusiveDates could all be just about anything, and certainly not in an order that could help quickly resolve TableA.ExclusiveFrom > TableB.InclusiveFrom
The solution I have previously used for that is to have a maximum allowed gap between InclusiveFrom and ExclusiveFrom. This allows something like...
ON TableA.InclusiveFrom < TableB.ExclusiveFrom
AND TableA.InclusiveFrom >= TableB.InclusiveFrom - 30
AND TableA.ExclusiveFrom > TableB.InclusiveFrom
The condition TableA.ExclusiveFrom > TableB.InclusiveFrom STILL can't benefit from indexes. But instead we've limitted the number of rows that can be returned by searching TableA.InclusiveFrom. It's at most only ever 30 days of data, because we know that we restricted the duration to a maximum of 30 days.
An example of this is to break up the associations by calendar month (max duration of 31 days).
EmployeeId | LocationId | EffectiveDate | EndDate
1 | 2 | 2007-04-01 | 2008-05-01
1 | 2 | 2007-05-01 | 2008-06-01
1 | 2 | 2007-06-01 | 2008-06-25
(Representing Employee 1 being in Location 2 from 1st April to (but not including) 25th June.)
It's effectively a trade off; using Disk Space to gain performance.
I've even seen this pushed to the extreme of not actually storing date Ranges, but storing the actual mapping for each and every day. Essentially, it's like restricting the maximum duration to 1 day...
EmployeeId | LocationId | EffectiveDate
1 | 2 | 2007-06-23
1 | 2 | 2007-06-24
1 | 3 | 2007-06-25
1 | 3 | 2007-06-26
Instinctively I initially rebelled against this. But in subsequent ETL, Warehousing, Reporting, etc, I actually found it Very powerful, adaptable, and maintainable. I actually saw people making fewer coding mistakes, writing code in less time, the code ending up running faster, and being much more able to adapt to clients' changing needs.
The only two down sides were:
1. More disk space taken (But trival compared to the size of fact tables)
2. Inserts and Updates to this mapping was slower
The actual slow down for Inserts and Updates only actually matter Once, where this model was being used to represent a constantly changing process net; where the app wanted to change the mapping about 30 times a second. Even then it worked, it just chomped up more CPU time than was ideal.
If you want to be efficient and keep a history, do these things. There are multiple solutions to this problem, but this is the one that I keep going back to:
Remember that each row represents a single entity, if you make corrections that entity, that's fine, but don't re-use and ID for a new Location. Set it up so that instead of deleting a Location, you mark it as deleted with a bit and hide it from the interface, that way when it's referenced historically, it's still there.
Create a history table that includes the current value, or no records if a value isn't currently set. Have the foreign key tie back to the employee and tie to the location.
Create a column in the employee table that points to the current active location in the history. When you need to get the employees location, you join in the history table based on this ID. When you need to get all of the history for an employee you join from the history table.
This structure keeps it all normalized, and gives you an easy way to find the current value without having to do any date comparisons.
As far as using the word history, think of it in different terms: since it contains the current item as well as historical items, it's really just a junction table that keeps around the old item. As such you can name it something like EmployeeLocations.

Use Access SQL to do a grouped ranking

How do I rank salespeople by # customers grouped by department (with ties included)?
For example, given this table, I want to create the Rank column on the right. How should I do this in Access?
SalesPerson Dept #Customers Rank
Bill DeptA 20 1
Ted DeptA 30 2
Jane DeptA 40 3
Bill DeptB 50 1
Mary DeptB 60 2
I already know how to do a simple ranking with this SQL code. But I don't know how to rework this to accept grouping.
Select Count(*) from [Tbl] Where [#Customers] < [Tblx]![#Customers] )+1
Also, there's plenty of answers for this using SQL Server's Rank() function, but I need to do this in Access. Suggestions, please?
SELECT *, (select count(*) from tbl as tbl2 where
tbl.customers > tbl2.customers and tbl.dept = tbl2.dept) + 1 as rank from tbl
Just add the dept field to the subquery...
Great solution with subquery! Except for huge recordsets, the subquery solution gets very slow. Its better(quicker) to use a Self JOIN, look at the folowing solution: self join
SELECT tbl1.SalesPerson , count(*) AS Rank
FROM tbl AS tbl1 INNER JOIN tbl AS tbl2 ON tbl1.DEPT = tbl2.DEPT
AND tbl1.#Customers < tbl2.#Customers
GROUP BY tbl1.SalesPerson
I know this is an old thread. But since I spent a great deal of time on a very similar problem and was greatly helped by the former answers given here, I would like to share what I have found to be a MUCH faster way. (Beware, it is more complicated.)
First make another table called "Individualizer". This will have one field containing a list of numbers 1 through the-highest-rank-that-you-need.
Next create a VBA module and paste this into it:
'Global Declarations Section.
Option Explicit
Global Cntr
'*************************************************************
' Function: Qcntr()
'
' Purpose: This function will increment and return a dynamic
' counter. This function should be called from a query.
'*************************************************************
Function QCntr(x) As Long
Cntr = Cntr + 1
QCntr = Cntr
End Function
'**************************************************************
' Function: SetToZero()
'
' Purpose: This function will reset the global Cntr to 0. This
' function should be called each time before running a query
' containing the Qcntr() function.
'**************************************************************
Function SetToZero()
Cntr = 0
End Function
Save it as Module1.
Next, create Query1 like this:
SELECT Table1.Dept, Count(Table1.Salesperson) AS CountOfSalesperson
FROM Table1
GROUP BY Table1.Dept;
Create a MakeTable query called Query2 like this:
SELECT SetToZero() AS Expr1, QCntr([ID]) AS Rank, Query1.Dept,
Query1.CountOfSalesperson, Individualizer.ID
INTO Qtable1
FROM Query1
INNER JOIN Individualizer
ON Query1.CountOfSalesperson >= Individualizer.ID;
Create another MakeTable query called Query3 like this:
SELECT SetToZero() AS Expr1, QCntr([Identifier]) AS Rank,
[Salesperson] & [Dept] & [#Customers] AS Identifier, Table1.Salesperson,
Table1.Dept, Table1.[#Customers]
INTO Qtable2
FROM Table1;
If you have another field already that uniquely identifies every row you wouldn't need to create an Identifier field.
Run Query2 and Query3 to create the tables.
Create a fourth query called Query4 like this:
SELECT Qtable2.Salesperson, Qtable2.Dept, Qtable2.[#Customers], Qtable1.ID AS Rank
FROM Qtable1
INNER JOIN Qtable2 ON Qtable1.Rank = Qtable2.Rank;
Query4 returns the result you are looking for.
Practically, you would want to write a VBA function to run Query2 and Query3 and then call that function from a button placed in a convenient location.
Now I know this sounds ridiculously complicated for the example you gave. But in real life, I am sure your table is more complicated than this. Hopefully my examples can be applied to your actual situation. In my database with over 12,000 records this method is by FAR the fastest (as in: 6 seconds with 12,000 records compared to over 1 minute with 262 records ranked with the subquery method).
The real secret for me was the MakeTable query because this ranking method is useless unless you immediately output the results to a table. But, this does limit the situations that it can be applied to.
P.S. I forgot to mention that in my database I was not pulling results directly from a table. The records had already gone through a string of queries and multiple calculations before they needed to be ranked. This probably contributed greatly to the huge difference in speed between the two methods in my situation. If you are pulling records directly from a table, you might not notice nearly as big an improvement.
You need to do some math. I typically take advantage of the combination of a counter field and an "offset" field. You're aiming for a table which looks like this (#Customers isn't necessary, but will give you a visual that you're doing it properly):
SalesPerson Dept #Customers Ctr Offset
Bill DeptA 20 1 1
Ted DeptA 30 2 1
Jane DeptA 40 3 1
Bill DeptB 50 4 4
Mary DeptB 60 5 4
So, to give rank, you'd do [Ctr]-[Offset]+1 AS Rank
build a table with SalesPerson, Dept, Ctr, and Offset
insert into that table, ordered by Dept and #Customers (so that they're all sorted properly)
Update Offset to be the MIN(Ctr), grouping on Dept
Perform your math calculation to determine Rank
Clear out the table so you're ready to use it again next time.
To add to this and any other related Access Ranking or Rank Tie Breaker how-tos for other versions of Access, ranking should not be performed on crosstab queries if your FROM clause happens to NOT contain a table but a query that is either a crosstab query or a query that contains within it elsewhere a crosstab query.
The code referenced above where a SELECT statement within a SELECT statment is used (sub query),
"SELECT *, (select count(*) from tbl as tbl2 where tbl.customers > tbl2.customers and tbl.dept = tbl2.dept) + 1 as rank from tbl"
will not work and will always fail expressing a error on portion of the code where "tbl.customers > tbl2.customers" cannot be found.
In my situation on a past project, I was referencing a query instead of a table and within that query I had referenced a crosstab query thus failing and producing an error. I was able to resolve this by creating a table from the crosstab query first, and when I referenced the newly created table in the FROM clause, it started working for me.
So in final, normally you can reference a query or table in the FROM clause of the SELECT statement as what was shared previously above to do ranking, but be carefull as to if you are referencing a query instead of a table, that query must Not be a crosstab query or reference another query that is a crosstab query.
Hope this helps anyone else that may have had problems looking for a possible reason if you happen to reference the statements above and you are not referencing a table in your FROM clause within your own project. Also, performing subqueries on aliases with crosstab queries in Access probably isn't good idea or best practice either so stray away from that if/when possible.
If you found this useful, and wish that Access would allow the use of a scrolling mouse in a passthru query editor, give me a like please.
I normally pick tips and ideas from here and sometimes end up building amazing things from it!
Today, (well let’s say for the past one week), I have been tinkering with Ranking of data in Access and to the best of my ability, I did not anticipate what I was going to do something so complex as to take me a week to figure it out! I picked titbits from two main sites:
https://usefulgyaan.wordpress.com/2013/04/23/ranking-in-ms-access/ (seen that clever ‘>=’ part, and the self joins? Amazing… it helped me to build my solution from just one query, as opposed to the complex method suggested above by asonoftheMighty (not discrediting you… just didn’t want to try it for now; may be when I get to large data I might want to try that as well…)
Right here, from Paul Abott above ( ‘and tbl.dept = tbl2.dept’)… I was lost after ranking because I was placing AND YearID = 1, etc, then the ranking would end up happening only for sub-sets, you guessed right, when YearID = 1! But I had a lot of different scenarios…
Well, I gave that story partly to thank the contributors mentioned, because what I did is to me one of the most complex of the ranking that I think can help you in almost any situation, and since I benefited from others, I would like to share here what I hope may benefit others as well.
Forgive me that I am not able to post my table structures here, it is a lot of related tables. I will only post the query, so if you need to you may develop your tables to end up with that kind of query. But here is my scenario:
You have students in a school. They go through class 1 to 4, can either be in stream A or B, or none when the class is too small. They each take 4 exams (this part is not important now), so you get the total score for my case. That’s it. Huh??
Ok. Lets rank them this way:
We want to know the ranking of
• all students who ever passed through this school (best ever student)
• all students in a particular academic year (student of the year)
• students of a particular class (but remember a student will have passed through all classes, so basically his/her rank in each of those classes for the different years) this is the usual ranking that appears in report cards
• students in their streams (above comment applies)
• I would also like to know the population against which we ranked this student in each category
… all in one table/query. Now you get the point?
(I normally like to do as much of my 'programming' in the database/queries to give me visuals and to reduce the amount of code I will later have to right. I actually won't use this query in my application :), but it let's me know where and how to send my parameters to the query it came from, and what results to expect in my rdlc)
Don't you worry, here it is:
SELECT Sc.StudentID, Sc.StudentName, Sc.Mark,
(SELECT COUNT(Sch.Mark) FROM [StudentScoreRankTermQ] AS Sch WHERE (Sch.Mark >= Sc.Mark)) AS SchoolRank,
(SELECT Count(s.StudentID) FROM StudentScoreRankTermQ AS s) As SchoolTotal,
(SELECT COUNT(Yr.Mark) FROM [StudentScoreRankTermQ] AS Yr WHERE (Yr.Mark >= Sc.Mark) AND (Yr.YearID = Sc.YearID) ) AS YearRank,
(SELECT COUNT(StudentID) FROM StudentScoreRankTermQ AS Yt WHERE (Yt.YearID = Sc.YearID) ) AS YearTotal,
(SELECT COUNT(Cl.Mark) FROM [StudentScoreRankTermQ] AS Cl WHERE (Cl.Mark >= Sc.Mark) AND (Cl.YearID = Sc.YearID) AND (Cl.TermID = Sc.TermID) AND (Cl.ClassID=Sc.ClassID)) AS ClassRank,
(SELECT COUNT(StudentID) FROM StudentScoreRankTermQ AS C WHERE (C.YearID = Sc.YearID) AND (C.TermID = Sc.TermID) AND (C.ClassID = Sc.ClassID) ) AS ClassTotal,
(SELECT COUNT(Str.Mark) FROM [StudentScoreRankTermQ] AS Str WHERE (Str.Mark >= Sc.Mark) AND (Str.YearID = Sc.YearID) AND (Str.TermID = Sc.TermID) AND (Str.ClassID=Sc.ClassID) AND (Str.StreamID = Sc.StreamID) ) AS StreamRank,
(SELECT COUNT(StudentID) FROM StudentScoreRankTermQ AS St WHERE (St.YearID = Sc.YearID) AND (St.TermID = Sc.TermID) AND (St.ClassID = Sc.ClassID) AND (St.StreamID = Sc.StreamID) ) AS StreamTotal,
Sc.CalendarYear, Sc.Term, Sc.ClassNo, Sc.Stream, Sc.StreamID, Sc.YearID, Sc.TermID, Sc.ClassID
FROM StudentScoreRankTermQ AS Sc
ORDER BY Sc.Mark DESC;
You should get something like this:
+-----------+-------------+------+------------+-------------+----------+-----------+-----------+------------+------------+-------------+------+------+-------+--------+
| StudentID | StudentName | Mark | SchoolRank | SchoolTotal | YearRank | YearTotal | ClassRank | ClassTotal | StreamRank | StreamTotal | Year | Term | Class | Stream |
+-----------+-------------+------+------------+-------------+----------+-----------+-----------+------------+------------+-------------+------+------+-------+--------+
| 1 | Jane | 200 | 1 | 20 | 2 | 12 | 1 | 9 | 1 | 5 | 2017 | I | 2 | A |
| 2 | Tom | 199 | 2 | 20 | 1 | 12 | 3 | 9 | 1 | 4 | 2016 | I | 1 | B |
+-----------+-------------+------+------------+-------------+----------+-----------+-----------+------------+------------+-------------+------+------+-------+--------+
Use the separators | to reconstruct the result table
Just an idea about the tables, each student will be related to a class. Each class relates to years. Each stream relates to a class. Each term relates to a year. Each exam relates to a term and student and a class and a year; a student can be in class 1A in 2016 and moves on to class 2b in 2017, etc…
Let me also add that this a beta result, I have not tested it well enough and I do not yet have an opportunity to create a lot of data to see the performance. My first glance at it told me that it is good. So if you find reasons or alerts you want to point my way, please do so in comments so I may keep learning!

SQL Query with multiple values in one column

I've been beating my head on the desk trying to figure this one out. I have a table that stores job information, and reasons for a job not being completed. The reasons are numeric,01,02,03,etc. You can have two reasons for a pending job. If you select two reasons, they are stored in the same column, separated by a comma. This is an example from the JOBID table:
Job_Number User_Assigned PendingInfo
1 user1 01,02
There is another table named Pending, that stores what those values actually represent. 01=Not enough info, 02=Not enough time, 03=Waiting Review. Example:
Pending_Num PendingWord
01 Not Enough Info
02 Not Enough Time
What I'm trying to do is query the database to give me all the job numbers, users, pendinginfo, and pending reason. I can break out the first value, but can't figure out how to do the second. What my limited skills have so far:
select Job_number,user_assigned,SUBSTRING(pendinginfo,0,3),pendingword
from jobid,pending
where
SUBSTRING(pendinginfo,0,3)=pending.pending_num and
pendinginfo!='00,00' and
pendinginfo!='NULL'
What I would like to see for this example would be:
Job_Number User_Assigned PendingInfo PendingWord PendingInfo PendingWord
1 User1 01 Not Enough Info 02 Not Enough Time
Thanks in advance
You really shouldn't store multiple items in one column if your SQL is ever going to want to process them individually. The "SQL gymnastics" you have to perform in those cases are both ugly hacks and performance degraders.
The ideal solution is to split the individual items into separate columns and, for 3NF, move those columns to a separate table as rows if you really want to do it properly (but baby steps are probably okay if you're sure there will never be more than two reasons in the short-medium term).
Then your queries will be both simpler and faster.
However, if that's not an option, you can use the afore-mentioned SQL gymnastics to do something like:
where find ( ',' |fld| ',', ',02,' ) > 0
assuming your SQL dialect has a string search function (find in this case, but I think charindex for SQLServer).
This will ensure all sub-columns begin and start with a comma (comma plus field plus comma) and look for a specific desired value (with the commas on either side to ensure it's a full sub-column match).
If you can't control what the application puts in that column, I would opt for the DBA solution - DBA solutions are defined as those a DBA has to do to work around the inadequacies of their users :-).
Create two new columns in that table and make an insert/update trigger which will populate them with the two reasons that a user puts into the original column.
Then query those two new columns for specific values rather than trying to split apart the old column.
This means that the cost of splitting is only on row insert/update, not on _every single select`, amortising that cost efficiently.
Still, my answer is to re-do the schema. That will be the best way in the long term in terms of speed, readable queries and maintainability.
I hope you are just maintaining the code and it's not a brand new implementation.
Please consider to use a different approach using a support table like this:
JOBS TABLE
jobID | userID
--------------
1 | user13
2 | user32
3 | user44
--------------
PENDING TABLE
pendingID | pendingText
---------------------------
01 | Not Enough Info
02 | Not Enough Time
---------------------------
JOB_PENDING TABLE
jobID | pendingID
-----------------
1 | 01
1 | 02
2 | 01
3 | 03
3 | 01
-----------------
You can easily query this tables using JOIN or subqueries.
If you need retro-compatibility on your software you can add a view to reach this goal.
I have a tables like:
Events
---------
eventId int
eventTypeIds nvarchar(50)
...
EventTypes
--------------
eventTypeId
Description
...
Each Event can have multiple eventtypes specified.
All I do is write 2 procedures in my site code, not SQL code
One procedure converts the table field (eventTypeIds) value like "3,4,15,6" into a ViewState array, so I can use it any where in code.
This procedure does the opposite it collects any options your checked and converts it in
If changing the schema is an option (which it probably should be) shouldn't you implement a many-to-many relationship here so that you have a bridging table between the two items? That way, you would store the number and its wording in one table, jobs in another, and "failure reasons for jobs" in the bridging table...
Have a look at a similar question I answered here
;WITH Numbers AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS N
FROM JobId
),
Split AS
(
SELECT JOB_NUMBER, USER_ASSIGNED, SUBSTRING(PENDING_INFO, Numbers.N, CHARINDEX(',', PENDING_INFO + ',', Numbers.N) - Numbers.N) AS PENDING_NUM
FROM JobId
JOIN Numbers ON Numbers.N <= DATALENGTH(PENDING_INFO) + 1
AND SUBSTRING(',' + PENDING_INFO, Numbers.N, 1) = ','
)
SELECT *
FROM Split JOIN Pending ON Split.PENDING_NUM = Pending.PENDING_NUM
The basic idea is that you have to multiply each row as many times as there are PENDING_NUMs. Then, extract the appropriate part of the string
While I agree with DBA perspective not to store multiple values in a single field it is doable, as bellow, practical for application logic and some performance issues. Let say you have 10000 user groups, each having average 1000 members. You may want to have a table user_groups with columns such as groupID and membersID. Your membersID column could be populated like this:
(',10,2001,20003,333,4520,') each number being a memberID, all separated with a comma. Add also a comma at the start and end of the data. Then your select would use like '%,someID,%'.
If you can not change your data ('01,02,03') or similar, let say you want rows containing 01 you still can use " select ... LIKE '01,%' OR '%,01' OR '%,01,%' " which will insure it match if at start, end or inside, while avoiding similar number (ie:101).