Order enrollment records by date per student - sql

I essentially have a table in MS Access that has student enrollment records like this:
Student ID, Enrollment Date, Enrollment Code
12345, 8/25/2014, E01
12345, 9/5/2014, WD02
12345, 10/3/2014, E01
23456, 8/25/2014, E01
34567, 8/25/2014, E01
34567, 10/01/2014, WD03
The above basically means that student 12345 enrolled on 8/25, withdrew on 9/5, and re-enrolled on 10/3; student 23456 enrolled on 8/25 and is still enrolled; student 34567 enrolled on 8/25, withdrew on 10/1, and is still withdrawn.
I need to check the order of these records and make sure that we don't have two enrollment records without a withdrawal in between, or other similar logical errors, since those clearly don't make sense.
Here's the issue: I can't figure out for the life of me how to rank these records in Access! Here's what I would like to end up with:
Student ID, Enrollment Date, Enrollment Code, Rank
12345, 8/25/2014, E01, 1
12345, 9/5/2014, WD02, 2
12345, 10/3/2014, E01, 3
23456, 8/25/2014, E01, 1
34567, 8/25/2014, E01, 1
34567, 10/01/2014, WD03, 2
So the rank should start over for every student. This way I can check that every record with an odd rank is an E01 (since that is the only valid entry code) and that every record with an even rank is like "WD*", etc. It's not that hard to check right now because, as of October, we don't have much movement, but as kiddos start transferring and coming in and out, this starts to take hours if you need to look at every student who has more than one record (which is what I'm currently doing).
Any help would be greatly appreciated. The eventual goal is to automate this in a macro so that it just spits out any crazy records each week and we just fix and move on without having to review every kid that moved.

Try this: You may have to change the field names/table name.
SELECT s.StudentID, s.EnrollmentDate, s.EnrollmentCode,
(
SELECT Count(*)
FROM students AS t
WHERE t.EnrollmentDate < s.EnrollmentDate
AND t.StudentID = s.StudentID
)+1 AS EnrollmentRank
FROM students AS s
ORDER BY s.StudentID, s.EnrollmentDate;
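We basically just use the Enrollment Date to order the records: for each row, the subquery counts how many records the same student has with an earlier date, and adding 1 gives the row number. (I don't think anyone can withdraw before they enroll.) Note that two records for the same student on the same date would receive the same rank.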
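To sanity-check the ranking together with the odd/even rule from the question, here is a minimal sketch in Python using the standard sqlite3 module rather than Access. It assumes the table and column names from the query above and stores dates as ISO text (an assumption of this sketch) so that string comparison is chronological. The second query wraps the ranking to flag records whose code doesn't fit the E01/WD* pattern.

```python
import sqlite3

# Sample rows from the question. Dates are stored as ISO text (an assumption for
# this sketch) so that string comparison orders them chronologically.
rows = [
    (12345, "2014-08-25", "E01"),
    (12345, "2014-09-05", "WD02"),
    (12345, "2014-10-03", "E01"),
    (23456, "2014-08-25", "E01"),
    (34567, "2014-08-25", "E01"),
    (34567, "2014-10-01", "WD03"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (StudentID INTEGER, EnrollmentDate TEXT, EnrollmentCode TEXT)"
)
conn.executemany("INSERT INTO students VALUES (?, ?, ?)", rows)

# Rank = 1 + number of earlier records for the same student.
RANKED_SQL = """
SELECT s.StudentID, s.EnrollmentDate, s.EnrollmentCode,
       (SELECT COUNT(*) FROM students AS t
         WHERE t.StudentID = s.StudentID
           AND t.EnrollmentDate < s.EnrollmentDate) + 1 AS EnrollmentRank
FROM students AS s
"""

# Odd ranks must be E01 and even ranks must start with WD; anything else is flagged.
BAD_SQL = f"""
SELECT * FROM ({RANKED_SQL}) AS r
WHERE (r.EnrollmentRank % 2 = 1 AND r.EnrollmentCode <> 'E01')
   OR (r.EnrollmentRank % 2 = 0 AND r.EnrollmentCode NOT LIKE 'WD%')
"""

ranked = conn.execute(RANKED_SQL + "ORDER BY s.StudentID, s.EnrollmentDate").fetchall()
bad = conn.execute(BAD_SQL).fetchall()

for row in ranked:
    print(row)
print("records to review:", bad)
```

In Access, the same filter would use `Mod` instead of `%` and `Like "WD*"` instead of `LIKE 'WD%'`.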

Related

How to get the values corresponding to another table?

I'm new to SQL and am a bit confused about how to write a query that gets the count of states from a different table.
I.e., I have this table [student]:
id, school_code
0, 0123
1, 2345
2, 2345
And this other table [school]:
school_code, name, State
0123, xxyy, New Jersey
2345, xyxy, Washington
3456, yxyx, Colorado
I want to find out how to get the following table, which checks each student, counts how often each state occurs, and orders the states by the most occurrences in the student table:
State, No. of times occurred (iterating through student)
Washington, 2
New Jersey, 1
SELECT school.state, count(school.state)
FROM student, school
WHERE student.school_code = school.school_code
GROUP BY school.state
ORDER BY count(school.state)
I'm not sure whether this iterates through each student and counts them, or just natural-joins student and school and then counts all the states.
When I run this on the data supplied, the number of times occurred is a really low number, which doesn't seem right.
We can simply JOIN the two tables, COUNT the school code in the student table with GROUP BY state, and then sort by that count in descending order:
SELECT
sc.state, COUNT(st.school_code) AS times_occurred
FROM
school sc
JOIN student st
ON sc.school_code = st.school_code
GROUP BY sc.state
ORDER BY COUNT(st.school_code) DESC;
You can try it out here: db<>fiddle
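A quick check of the join-and-count idea on the sample data, sketched with Python's built-in sqlite3 module. `school_code` is stored as text here (an assumption) so that '0123' keeps its leading zero. Each joined row stands for one student, so the count per state is a count of students.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER, school_code TEXT);
CREATE TABLE school (school_code TEXT, name TEXT, state TEXT);
INSERT INTO student VALUES (0, '0123'), (1, '2345'), (2, '2345');
INSERT INTO school VALUES ('0123', 'xxyy', 'New Jersey'),
                          ('2345', 'xyxy', 'Washington'),
                          ('3456', 'yxyx', 'Colorado');
""")

# One joined row per student; COUNT per state = number of students in that state.
result = conn.execute("""
SELECT sc.state, COUNT(st.school_code) AS times_occurred
FROM school sc
JOIN student st ON sc.school_code = st.school_code
GROUP BY sc.state
ORDER BY times_occurred DESC
""").fetchall()
print(result)
```

Colorado does not appear because the inner join drops schools with no students; a LEFT JOIN from school to student would list it with a count of 0.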

Query that will identify rows that should be identical (with 2 exceptions)

We have a customer table, and some customers appear multiple times if they are available for multiple offices. If someone adds a customer multiple times, I want to be able to run a report to check that they have done it correctly for each location, but I don't really have a clue where to start. Here is an example:
UID, Office, CustomerCode, CUSTOMERNAME, CUSTOMERHEADOFFICEPHONE
001, Manchester, 123, 123 Supplied Ltd, 0161 123456
002, London, 123, 123 Suppls Ltd, 0161 123446
So we can sell stuff to customer '123' from both our Manchester and London offices. The report should focus on the CustomerCode and make sure that whenever '123' appears, all the CAPITALIZED columns are the same, with the exception of UID (always different) and Office (naturally different). So the report would see that the CustomerNames are not identical and that a mistake has also been made in the CustomerHeadOfficePhone. If I had thousands of rows and all the other 'multi-customers' matched up, I'd want my report to show just these 2 rows. How would I go about that, please?
I would try:
select CustomerCode, CustomerName, CustomerHeadOfficePhone, count(*)
from TableName
where CustomerCode in (select CustomerCode
    from (select distinct CustomerCode, CustomerName, CustomerHeadOfficePhone from TableName) as d
    group by CustomerCode having count(*) > 1)
group by CustomerCode, CustomerName, CustomerHeadOfficePhone
That will list, for every CustomerCode that has more than one distinct combination of name and phone, each variant along with how many rows use it, which is what you would need if you were going to fix or delete them. You could present the UIDs in a better way with an additional query around this one. But it seems that you have only asked for the list of things to fix, and I believe this will give it to you.
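A minimal sketch of the report in Python's sqlite3, assuming a table named `customers` (hypothetical) and adding a hypothetical customer 456 that was entered consistently for both offices. A customer code is flagged when it has more than one distinct (CustomerName, CustomerHeadOfficePhone) combination, and every row for that code is returned with its UID and Office, so only the two inconsistent '123' rows come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customers (UID TEXT, Office TEXT, CustomerCode TEXT,
                CustomerName TEXT, CustomerHeadOfficePhone TEXT)""")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", [
    ("001", "Manchester", "123", "123 Supplied Ltd", "0161 123456"),
    ("002", "London", "123", "123 Suppls Ltd", "0161 123446"),
    # Hypothetical customer entered consistently for both offices.
    ("003", "Manchester", "456", "456 Widgets Ltd", "020 7000000"),
    ("004", "London", "456", "456 Widgets Ltd", "020 7000000"),
])

# A code is inconsistent if it has more than one distinct (name, phone) pair;
# return every row for such codes so the UIDs to fix are visible.
report = conn.execute("""
SELECT UID, Office, CustomerCode, CustomerName, CustomerHeadOfficePhone
FROM customers
WHERE CustomerCode IN (
    SELECT CustomerCode
    FROM (SELECT DISTINCT CustomerCode, CustomerName, CustomerHeadOfficePhone
          FROM customers) AS variants
    GROUP BY CustomerCode
    HAVING COUNT(*) > 1)
ORDER BY CustomerCode, UID
""").fetchall()
print(report)
```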

match tables with intermediate mapping table (fuzzy joins with similar strings)

I'm using BigQuery.
I have two simple tables with "bad" data quality from our systems. One represents revenue and the other production rows for bus journeys.
I need to match every journey to a revenue transaction, but I only have a set of fields and no key, and I don't really know how to do this matching.
This is a sample of the data:
Revenue
Year, Agreement, Station_origin, Station_destination, Product
2020, 123123, London, Manchester, Qwerty
Journeys
Year, Agreement, Station_origin, Station_destination, Product
2020, 123123, Kings Cross, Piccadilly Gardens, Qwer
2020, 123123, Kings Cross, Victoria Station, Qwert
2020, 123123, London, Manchester, Qwerty
Every station has a maximum of 9 alternative names and these are stored in a "station" table.
Stations
Station Name, Station Name 2, Station Name 3,...
London, Kings Cross, Euston,...
Manchester, Piccadilly Gardens, Victoria Station,...
I would like to test matching (joining) the tables first with the original fields. This will generate some matches, but many journeys will remain unmatched. For the unmatched revenue rows, I would like to change the product name (shorten it to two letters, which could return many matches from the production table) and then the station names, first changing station_origin and then station_destination. When using a shorter product name I could get many matches, but I want the row from the production table with the most common product.
Something like this:
1. Do a direct match. That is, I can use the fields as they are in the tables.
2. Do a match where the revenue.product is changed by shortening it to two letters. substr(product,0,2)
3. Change the rev.station_origin to the first alternative, Station Name 2, and then try a join. The product or other station are not changed.
4. Change the rev.station_origin to the first alternative, Station Name 2, and then try a join. The product is changed as above with a substr(product,0,2) but rev.station_destination is not changed.
5. Change the rev.station_destination to the first alternative, Station Name 2, and then try a join. The product or other station are not changed.
I was told that maybe I should create an intermediate table with all combinations of stations and products and let a rank column decide the order. The station names in the stations table are in order of importance, so "Station Name" is more important than "Station Name 2", and so on.
I started to do a query with a subquery per rank and do a UNION ALL but there are so many combinations that there must be another way to do this.
Don't know if this makes any sense but I would appreciate any help or ideas to do this in a better way.
Cheers,
Cris
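The "intermediate table with all combinations plus a rank column" idea can be sketched outside SQL first. This is plain Python, not BigQuery, and everything in it is an assumption for illustration: the alias lists, the ranking rule (total alias position, then changing the origin before the destination, then the full product before its two-letter prefix, which reproduces the order of steps 1 to 5 in the question), and reading "most common product" as the product that occurs most often among the matching journeys. In BigQuery, the same candidate list could be materialized as a table and joined once, keeping the lowest rank per revenue row.

```python
import itertools
from collections import Counter

# Hypothetical alias table: one list per station, most important name first.
STATIONS = [
    ["London", "Kings Cross", "Euston"],
    ["Manchester", "Piccadilly Gardens", "Victoria Station"],
]

def aliases(name):
    """The given name first, then the station's other names in order of importance."""
    for names in STATIONS:
        if name in names:
            return [name] + [n for n in names if n != name]
    return [name]

def ranked_candidates(origin, dest):
    """(origin, destination, prefix_len) combinations, best first.

    Sort key: total alias position, then destination alias position (so the
    origin is changed before the destination), then full product before the
    2-letter prefix. prefix_len None means "compare the whole product".
    """
    combos = []
    for (i, o), (j, d), (k, plen) in itertools.product(
        enumerate(aliases(origin)), enumerate(aliases(dest)), enumerate((None, 2))
    ):
        combos.append(((i + j, j, k), o, d, plen))
    combos.sort(key=lambda c: c[0])
    return [(o, d, plen) for _, o, d, plen in combos]

def match(rev, journeys):
    """First hit in rank order; among hits, prefer the most common product."""
    year, agreement, origin, dest, prod = rev
    for o, d, plen in ranked_candidates(origin, dest):
        hits = [
            j for j in journeys
            if j[:4] == (year, agreement, o, d)
            and (j[4] == prod if plen is None else j[4][:plen] == prod[:plen])
        ]
        if hits:
            most_common = Counter(j[4] for j in hits).most_common(1)[0][0]
            return next(j for j in hits if j[4] == most_common)
    return None

journeys = [
    (2020, 123123, "Kings Cross", "Piccadilly Gardens", "Qwer"),
    (2020, 123123, "Kings Cross", "Victoria Station", "Qwert"),
    (2020, 123123, "London", "Manchester", "Qwerty"),
]
print(match((2020, 123123, "London", "Manchester", "Qwerty"), journeys))
```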
To implement a complex joining strategy with approximate matching, it might make more sense to define the strategy in JavaScript and call the function from a BigQuery SQL query.
For example, the following query:
1. Takes the top 200 male names in the US.
2. Checks whether one of the top 200 female names matches exactly.
3. If not, looks for the most similar female name among the options.
Note that the logic to choose the closest option is encapsulated within the JS UDF fhoffa.x.fuzzy_extract_one(). See https://medium.com/@hoffa/new-in-bigquery-persistent-udfs-c9ea4100fd83 to learn more about this.
WITH data AS (
SELECT name, gender, SUM(number) c
FROM `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY 1,2
), top_men AS (
SELECT * FROM data WHERE gender='M'
ORDER BY c DESC LIMIT 200
), top_women AS (
SELECT * FROM data WHERE gender='F'
ORDER BY c DESC LIMIT 200
)
SELECT name male_name,
COALESCE(
(SELECT name FROM top_women WHERE name=a.name)
, fhoffa.x.fuzzy_extract_one(name, ARRAY(SELECT name FROM top_women))
) female_version
FROM top_men a
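The COALESCE(exact match, closest match) pattern in the query can be illustrated in Python. This sketch swaps in the standard library's `difflib.get_close_matches` for the `fhoffa.x.fuzzy_extract_one` UDF, so the similarity scores will not be the same; the name list is a small hypothetical sample, not the BigQuery public data.

```python
import difflib

def exact_or_closest(name, options):
    """Mimic COALESCE(exact match, fuzzy match) from the query above.

    difflib.get_close_matches stands in for fhoffa.x.fuzzy_extract_one and uses
    a different similarity measure, so results may differ from the UDF.
    """
    if name in options:
        return name
    # cutoff=0.0 always returns the single best candidate if there is one.
    close = difflib.get_close_matches(name, options, n=1, cutoff=0.0)
    return close[0] if close else None

women = ["Mary", "Patricia", "Jennifer", "Linda", "Jean"]
print(exact_or_closest("Jean", women))     # exact hit
print(exact_or_closest("Patrick", women))  # falls back to the most similar name
```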

Get the row with the max date value with criteria - access 2007/2010

My main table, from which I take all the data, is "RequestTable" (I reduced it to make it easier to follow). In it I have:
ID_student
ID_professor
Date (the three together form the primary key)
changeprofessor-note - if the student wants to change the professor, he/she should write in this field why he/she wants to make the change
professor-reject-note - if the professor is not happy with the student's work, he can choose not to mentor that student anymore, leaving the student without a mentor; the student should then choose another mentor later
ID-seminar - after choosing a mentor, the student can choose the seminar they want to work on
changeofSeminar-note - if the student wants to change the seminar, they need to write the reason here (the ID of the new seminar should then also be written in the ID seminar field)
IDapprove-reject - all approving or rejecting goes through this field
My initial theory was that the students could choose the mentor and the seminar in one row, but it seems too complicated now because I have no idea how to make everything work after changing mentors, declined mentoring, changing seminars and so on.
I settled on a simpler approach: all students need to choose a mentor first, so that I can get the mentoring data more easily when needed. I also set "Is Null" criteria in the query on "ID_seminar" and "changeofseminar-note", because changes to the seminar alone can't affect the rows where students have chosen their mentors/professors and been approved.
I implemented your code and got this:
SELECT [requesttable].ID_Student, Max([requesttable].Datum) AS MaxOfDatum, First([requesttable].ID_Profesor) AS ID_Profesor, [requesttable].[IDapprove-reject]
FROM [requesttable]
WHERE ((([requesttable].ID_Student) Not In (SELECT [ID_Student]
FROM [requesttable]
WHERE [IDapprove-reject] IS NOT NULL )))
GROUP BY [requesttable].ID_Student, [requesttable].[IDapprove-reject], [requesttable].[changeseminar-note], [requesttable].ID_seminar
HAVING ((([requesttable].[IDapprove-reject])=1) AND (([requesttable].[changeseminar-note]) Is Null) AND (([requesttable].Id_seminar) Is Null))
ORDER BY [requesttable].ID_Student, Max([requesttable].Datum), First([requesttable].ID_Profesor), [requesttable].[IDapprove-reject];
And I get:
3 12 1
15 11 1
55 5 1
And I need:
3 6 1
15 6 1
52 5 1 - after being rejected by mentor 10, the student chose another mentor (ID 5) and got approved.
55 5 1
Old info below:
I got my query to this point, with two other fields set to show only rows with null values, and got this:
ID student, ID professor, Date, professor-reject-note, ID accept/reject
3, 12, 12.11.2012, null, 1
3, 6, 13.11.2012, null, 1
52, 10, 12.11.2012, null, 1
52, 10, 15.11.2012, NOT null, 1
55, 5, 12.11.2012, null, 1
I want my results to be:
3, 6, 12.10.2013, null, 1
15, 6, 7.1.2013, null, 1
55, 5, 12.11.2012, null, 1
Student ID 52 should be excluded entirely because of the professor-reject-note, which means the professor doesn't want to mentor the student anymore. I also have a doubt about the ID accept/reject value in that case; maybe I could set it to 2 instead of 1 to make it easier (1 would mean accepted, 2 rejected). But even if I set it to 2 and exclude that entire row, I still can't get rid of the other ID 52 row. I'm a bit confused about it and have no clue how to make it work.
If I set Date to Max and ID professor to First in the grouping, I almost get what I want: all the data is right, except that student ID 52 is still there (both rows).
You could use:
SELECT t.[id student],
t.[id professor],
t.[DATE],
t.[professor-reject-note],
t.[id accept/reject]
FROM atable t
WHERE t.[id student] NOT IN
(SELECT [id student]
FROM atable
WHERE [professor-reject-note] IS NOT NULL)
Your field / column names could do with some work.
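The NOT IN filter above removes every student who has a reject note. To also keep only each student's latest row (the "max date" part of the question), one option is a correlated MAX subquery. This is a sketch using Python's sqlite3 with the answer's placeholder table `atable`; dates are stored as ISO text (an assumption) so that MAX picks the chronologically latest value, whereas an Access Date/Time field compares correctly on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite accepts the same [bracketed] identifiers as Access.
conn.execute("""CREATE TABLE atable ([id student] INTEGER, [id professor] INTEGER,
    [date] TEXT, [professor-reject-note] TEXT, [id accept/reject] INTEGER)""")
conn.executemany("INSERT INTO atable VALUES (?, ?, ?, ?, ?)", [
    (3, 12, "2012-11-12", None, 1),
    (3, 6, "2012-11-13", None, 1),
    (52, 10, "2012-11-12", None, 1),
    (52, 10, "2012-11-15", "hypothetical reject note", 1),
    (55, 5, "2012-11-12", None, 1),
])

latest = conn.execute("""
SELECT t.[id student], t.[id professor], t.[date]
FROM atable t
WHERE t.[id student] NOT IN (SELECT [id student] FROM atable
                             WHERE [professor-reject-note] IS NOT NULL)
  -- keep only each student's most recent row
  AND t.[date] = (SELECT MAX(m.[date]) FROM atable m
                  WHERE m.[id student] = t.[id student])
ORDER BY t.[id student]
""").fetchall()
print(latest)
```

Remove the `--` comment before pasting the SQL into Access, which does not accept comments in queries. Also note that the NOT IN filter excludes a student permanently, so a student who is later re-approved with a new mentor (like 52 in the updated requirement) would still be dropped.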

Group by a field not in select

I want to find how many modules a lecturer taught in a specific year and want to select name of the lecturer and the number of modules for that lecturer.
The problem is that because I am selecting the name, I have to group by name to make it work. But what if there are two lecturers with the same name? Then SQL will merge them into one, which would be the wrong output.
So what I really want to do is select the name but group by ID, which SQL does not allow me to do. Is there a way around it?
Below are the tables:
Lecturer(lecturerID, lecturerName)
Teaches(lecturerID, moduleID, year)
This is my query so far:
SELECT l.lecturerName, COUNT(moduleID) AS NumOfModules
FROM Lecturer l , Teaches t
WHERE l.lecturerID = t.lecturerID
AND year = 2011
GROUP BY l.lecturerName --I want lecturerID here, but it doesn't run if I do that
SELECT a.lecturerName, b.NumOfModules
FROM Lecturer a,(
SELECT l.lecturerID, COUNT(moduleID) AS NumOfModules
FROM Lecturer l , Teaches t
WHERE l.lecturerID = t.lecturerID
AND year = 2011
GROUP BY l.lecturerID) b
WHERE a.lecturerID = b.lecturerID
You should probably just group by lecturerID and include it in the select column list. Otherwise, you're going to end up with two rows containing the same name with no way to distinguish between them.
You raise the problem of "wrong output" when grouping just by name, but "undecipherable output" is just as big a problem. In other words, your desired output (grouping by ID but giving only the name):
lecturerName Module
------------ ------
Bob Smith 1
Bob Smith 2
is no better than your erroneous output (grouping by, and giving, name):
lecturerName Module
------------ ------
Bob Smith 3
since, while you now know that one of the lecturers taught two modules and the other taught one, you have no idea which is which.
The better output (grouping by ID and displaying both ID and name) would be:
lecturerId lecturerName Module
---------- ------------ ------
314159 Bob Smith 1
271828 Bob Smith 2
And, yes, I'm aware this doesn't answer your specific request but sometimes the right answer to "How do I do XYZZY?" is "Don't do XYZZY, it's a bad idea for these reasons ...".
Things like writing operating systems in COBOL, accounting packages in assembler, or anything in Pascal come to mind instantly :-)
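Following the suggestion to group by ID and show it alongside the name, here is a sketch in Python's sqlite3 with two hypothetical lecturers who share a name. Putting both lecturerID and lecturerName in the GROUP BY list lets the name appear in the SELECT list, and since each ID has exactly one name, adding the name doesn't split any group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Lecturer (lecturerID INTEGER PRIMARY KEY, lecturerName TEXT);
CREATE TABLE Teaches (lecturerID INTEGER, moduleID INTEGER, year INTEGER);
INSERT INTO Lecturer VALUES (314159, 'Bob Smith'), (271828, 'Bob Smith');
INSERT INTO Teaches VALUES (314159, 1, 2011),
                           (271828, 2, 2011), (271828, 3, 2011),
                           (271828, 4, 2010);
""")

# Group by the ID so namesakes stay separate; the name rides along in GROUP BY.
rows = conn.execute("""
SELECT l.lecturerID, l.lecturerName, COUNT(t.moduleID) AS NumOfModules
FROM Lecturer l
JOIN Teaches t ON l.lecturerID = t.lecturerID
WHERE t.year = 2011
GROUP BY l.lecturerID, l.lecturerName
ORDER BY l.lecturerID DESC
""").fetchall()
print(rows)
```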
You could subquery your count statement.
SELECT lecturername,
(SELECT Count(*)
FROM teaches t
WHERE t.lecturerid = l.lecturerid
AND t.year = 2011) AS NumOfModules
FROM lecturer l
Note there are other ways of doing this. If you also wanted to eliminate the rows with no modules, you could try:
SELECT *
FROM (SELECT lecturername,
(SELECT Count(*)
FROM teaches t
WHERE t.lecturerid = l.lecturerid
AND t.year = 2011) AS NumOfModules
FROM lecturer l) AS temp
WHERE temp.numofmodules > 0
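The subquery approach can be checked the same way. This sketch (Python's sqlite3, with a hypothetical third lecturer who taught nothing in 2011) shows that the plain version returns 0 for that lecturer and the wrapped version filters the row out, while the two namesakes are never merged because the correlated subquery runs once per lecturer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lecturer (lecturerid INTEGER PRIMARY KEY, lecturername TEXT);
CREATE TABLE teaches (lecturerid INTEGER, moduleid INTEGER, year INTEGER);
INSERT INTO lecturer VALUES (1, 'Bob Smith'), (2, 'Bob Smith'), (3, 'Ann Lee');
INSERT INTO teaches VALUES (1, 10, 2011), (2, 20, 2011), (2, 21, 2011), (3, 30, 2010);
""")

# Correlated subquery: one count per lecturer row, so namesakes stay separate.
PER_LECTURER = """
SELECT lecturername,
       (SELECT COUNT(*) FROM teaches t
         WHERE t.lecturerid = l.lecturerid AND t.year = 2011) AS NumOfModules
FROM lecturer l
"""

all_rows = conn.execute(PER_LECTURER + "ORDER BY l.lecturerid").fetchall()
# Wrapping it in a derived table allows filtering on the computed column.
taught = conn.execute(
    f"SELECT * FROM ({PER_LECTURER}) AS temp WHERE temp.NumOfModules > 0"
).fetchall()
print(all_rows)
print(taught)
```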