Left Join with same table and group by is returning duplicated tuples with reverse order

I'm trying to query a database (not owned by me) that contains the following columns:
NumEpoca (epoch), Turma (class), Dia (day of the week),
Hora (hour; each value represents a 30-minute slot, so a 3-hour class generates 6 tuples),
Disciplina (course), TipoAula (type of class, theoretical or practical),
Sala (classroom)
This is basically a class schedule, so for a given class, on the same day of the week, I can have a practical class one week and a theoretical class the next.
Now, for a given day, I want to get the minimum and maximum hour (so I can calculate the start and end time of the class), but I also want the classrooms, and I don't know a priori whether it's going to be a theoretical or a practical class.
Also, certain classes are only practical and some are only theoretical, in which case I just want one classroom.
The query I'm running gives me basically everything I need, but it returns:
LectureDetails(beginDate=2020-05-26 15:30:00, endDate=2020-05-26 18:30:00, classroom=L_H1/G.0.08)
LectureDetails(beginDate=2020-05-26 15:30:00, endDate=2020-05-26 18:30:00, classroom=G.0.08/L_H1)
As you can see, I get the same class (starting and ending at the same hour) twice, once for each ordering of the two classrooms (theoretical and practical). I only need one tuple for that day, but I'm getting both L_H1/G.0.08 and G.0.08/L_H1.
"SELECT a1.Dia, MIN(a1.Hora), MAX(a1.Hora), a1.Sala, a2.Sala FROM Aulas AS a1 LEFT JOIN Aulas AS a2 " +
"ON a1.Sala <> a2.Sala AND a1.Disciplina = a2.Disciplina AND a1.NumEpoca = a2.NumEpoca AND a1.Turma = a2.Turma " +
"AND a1.Dia = a2.Dia AND a1.Hora = a2.Hora " +
"WHERE a1.NumEpoca = ? AND a1.Turma = ? AND a1.Disciplina = ? GROUP BY a1.Dia, a1.Sala, a2.Sala"
Thanks in advance.

You could have eliminated the mis-ordered results by using a1.Sala < a2.Sala instead of the inequality.
But that's not really the approach you want anyway. Try something like this:
SELECT Dia, MIN(Hora), MAX(Hora),
MIN(Sala),
CASE WHEN MIN(Sala) = MAX(Sala) THEN NULL ELSE MAX(Sala) END
FROM Aulas
WHERE NumEpoca = ? AND Turma = ? AND Disciplina = ?
GROUP BY Dia
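To see how the single-pass aggregate behaves, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. The sample rows, the text format of Hora, and the added ORDER BY Dia (only there to make the output order deterministic) are assumptions for illustration, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Aulas (NumEpoca INTEGER, Turma TEXT, Dia INTEGER, "
    "Hora TEXT, Disciplina TEXT, TipoAula TEXT, Sala TEXT)"
)

rows = []
# Day 2: the same slots exist for a theoretical and a practical class (made-up data).
for hora in ("15:30", "16:00", "16:30"):
    rows.append((1, "T1", 2, hora, "SQL", "T", "G.0.08"))
    rows.append((1, "T1", 2, hora, "SQL", "P", "L_H1"))
# Day 4: only a theoretical class, so only one classroom.
for hora in ("10:00", "10:30"):
    rows.append((1, "T1", 4, hora, "SQL", "T", "G.0.08"))
conn.executemany("INSERT INTO Aulas VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

query = """
SELECT Dia, MIN(Hora), MAX(Hora),
       MIN(Sala),
       CASE WHEN MIN(Sala) = MAX(Sala) THEN NULL ELSE MAX(Sala) END
FROM Aulas
WHERE NumEpoca = ? AND Turma = ? AND Disciplina = ?
GROUP BY Dia
ORDER BY Dia
"""
result = conn.execute(query, (1, "T1", "SQL")).fetchall()
for row in result:
    print(row)
```

Each day yields exactly one row; the second classroom column is NULL (None in Python) when the day only uses one room. Note that this pattern only captures two distinct rooms per day, which matches the theoretical/practical case in the question.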

Related

Defaulting missing data

I have a complex schema that I am trying to pull data out of for a report. The query joins a bunch of tables together, and I am specifically looking to pull a subset of data where everything might be null. The original relations for the tables look like this:
Location.DeptFK
Dept.PK
Section.DeptFK
Subsection.SectionFK
Question.SubsectionFK
Answer.QuestionFK, SubmissionFK
Submission.PK, LocationFK
From here my problems begin to compound a little.
SELECT Section.StepNumber + '-' + Question.QuestionNumber AS QuestionNumberVar,
Question.Question,
Subsection.Name AS Subsection,
Section.Name AS Section,
SUM(CASE WHEN (Answer.Answer = 0) THEN 1 ELSE 0 END) AS NA,
SUM(CASE WHEN (Answer.Answer = 1) THEN 1 ELSE 0 END) AS AnsNo,
SUM(CASE WHEN (Answer.Answer = 2) THEN 1 ELSE 0 END) AS AnsYes,
(select count(distinct Location.Abbreviation) from Department inner join Location on Location.DepartmentFK = Department.PK WHERE (Department.Name = 'insertParameter'))
as total
FROM Department inner join
section on Department.PK = section.DepartmentFK inner JOIN
subsection on Subsection.SectionFK = Section.PK INNER JOIN
question on Question.SubsectionFK = Subsection.PK INNER JOIN
Answer on Answer.QuestionFK = question.PK inner JOIN
Submission on Submission.PK = Answer.SubmissionFK inner join
Location on Location.DepartmentFK = Department.PK AND Location.PK = Submission.LocationFK
WHERE (Department.Name = 'InsertParameter') AND (Submission.MonthTested = '1/1/2017')
GROUP BY Question.Question, QuestionNumberVar, Subsection.Name, Section.Name, Section.StepNumber
ORDER BY QuestionNumberVar;
There are 15 locations in total, but with this query I get 12. If I remove the Location condition from the join, I get all 15 locations, but my answer data gets multiplied by 15. My issue is that not all locations are required to test at the same time, so their answers should default to NA. They don't get records placed in the DB, so the relationship between Location and Submission is absent.
I have a workaround almost in place via the select count distinct, but the second part is a query for finding what each location answered instead of a sum, which brings the problem right back around. It also has to be dynamic, because the input parameters for a department won't bring back a static number of locations each time.
I am still learning SQL, so any additional material to look at for building this query would also be appreciated. So I guess the big question here is: how would I go about creating default data in this query any time the Location/Submission relation has a null value?
Edit: Dummy Data
QuestionNumberVar | Section | Subsection | Question | AnsYes | AnsNo | NA (expected)
1-1.1 | Math | Algebra | Did you do your homework? | 10 | 1 | 1 (4)
1-1.2 | Math | Algebra | Did your dog eat it? | 9 | 3 | 0 (3)
2-1.1 | English | Greek | Did you do your homework? | 8 | 0 | 4 (7)
I have tried making left joins at various applicable portions of the code to no avail. All attempts at left joins have ended with no effect on info output. This query feeds into the Dataset for an SSRS report. There are a couple workarounds for this particular section via an expression to take total Locations and subtract AnsYes and AnsNo to get the true NA value but as explained above doesn't help with my next query.
Edit: SQL Server 2012 for those who asked
Edit: My attempt at an isnull() on the missing data returns nothing, I suspect because the query already eliminates the "null/missing" data. Left joining while doing this has also failed. The point of failure is on Submission. If we bind it to Location, there are locations missing, but if we don't bind it, there are multiplied duplicates, because Department has a one-to-many relationship with Location and not vice versa. I am unable to make any schema changes to improve this process.
There is a previous report that I am trying to emulate/update. It used C# logic to process data and run multiple queries to attain the same data. I don't have this luxury. (previous report exports to excel directly instead of SSRS). Here is the previous logic used.
select PK from Department where Name = 'InsertParameter';
select PK from Submission where LocationFK = 'Location.PK_var' and MonthTested = '1/1/2017'
Then it runs those into a loop where it processes nulls into NA using C# logic.
EDIT (Mediocre Solution): I ended up doing the workaround of making a calculated field that subtracts Yes and No from the total number of Locations that have that Dept. This is a mediocre solution because I didn't solve my original problem and made 3 datasets that should have been displayed as a single dataset: one for question info, one for each location's answer, and one for locations that didn't participate. If a true answer comes up I will check its validity, but for now, problem pseudo-solved.
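One possible way to generate the missing default rows in a single query is to build every (question, location) pair first and only then LEFT JOIN the submissions and answers, so that a location without a submission still contributes a row that can be counted as NA. The sketch below runs that idea against SQLite from Python. The tables are a heavily simplified, hypothetical version of the schema above (no Department, Section or Subsection), the sample data is made up, and it assumes at most one submission per location per month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Location (PK INTEGER PRIMARY KEY, Abbreviation TEXT);
CREATE TABLE Question (PK INTEGER PRIMARY KEY, Question TEXT);
CREATE TABLE Submission (PK INTEGER PRIMARY KEY, LocationFK INTEGER, MonthTested TEXT);
CREATE TABLE Answer (QuestionFK INTEGER, SubmissionFK INTEGER, Answer INTEGER);
INSERT INTO Location VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC');
INSERT INTO Question VALUES (1, 'Did you do your homework?'), (2, 'Did your dog eat it?');
-- Location 3 did not test this month, so it has no Submission row.
INSERT INTO Submission VALUES (10, 1, '2017-01-01'), (11, 2, '2017-01-01');
-- Answer codes as in the question: 0 = NA, 1 = No, 2 = Yes.
INSERT INTO Answer VALUES (1, 10, 2), (2, 10, 1), (1, 11, 2), (2, 11, 0);
""")

query = """
SELECT q.Question,
       SUM(CASE WHEN COALESCE(a.Answer, 0) = 0 THEN 1 ELSE 0 END) AS NA,
       SUM(CASE WHEN a.Answer = 1 THEN 1 ELSE 0 END) AS AnsNo,
       SUM(CASE WHEN a.Answer = 2 THEN 1 ELSE 0 END) AS AnsYes
FROM Question q
CROSS JOIN Location l                      -- every question for every location
LEFT JOIN Submission s
       ON s.LocationFK = l.PK AND s.MonthTested = ?
LEFT JOIN Answer a
       ON a.SubmissionFK = s.PK AND a.QuestionFK = q.PK
GROUP BY q.PK, q.Question
ORDER BY q.PK
"""
summary = conn.execute(query, ("2017-01-01",)).fetchall()
for row in summary:
    print(row)
```

The important detail is that the month filter sits in the ON clause rather than in WHERE; a WHERE condition on Submission would discard the NULL rows again and turn the LEFT JOIN back into an inner join. Dropping the GROUP BY and selecting l.Abbreviation with COALESCE(a.Answer, 0) would give the per-location view from the same join.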

filtering where start & end Dates between selected Date

I have a fact table that contains people and their qualifications. The fact table has two dates, StartStudyDate and EndStudyDate, which represent the period during which the students were studying.
I then have a Person dimension, a Qualification dimension, a Grouping dimension and one Date dimension.
I'm trying to find a count of students who were actively studying on a particular date.
In SQL it's relatively simple:
select a.PaygroupDescription, a.[Qualification], count(a.[PersonID])
from (
select distinct p.[PersonID], PaygroupDescription, q.[Qualification]
from [hr].[Appointment Detail] ad
join hr.Paygroup pg on ad.PaygroupID = pg.PaygroupID
join hr.Qualification q on q.QualificationID = ad.QualificationID
join hr.Person p on p.PersonID = ad.PersonID
join dimdate sd on sd.DateID = ad.StartDateID
join dimDate ed on ed.DateID = ad.EndDateID
where sd.date <= 20150101 and ed.date >= 20150101
) as a
group by a.PaygroupDescription, a.[Qualification]
The problem is I can't figure out how to do this in DAX.
I started out by adding two columns to the fact table in the tabular model:
ActualStartDate
=LOOKUPVALUE(
'Date'[Date],
'Date'[DateID],
'Appointment Detail'[StartDateID])
ActualEndDate
=LOOKUPVALUE(
'Date'[Date],
'Date'[DateID],
'Appointment Detail'[EndDateID])
I then wrote a measure that checks whether a single date is selected from DimDate and, if so, gets all distinct rows where the selected date falls between ActualStartDate and ActualEndDate.
The problem is that this performs terribly. If I try to add any attributes for breaking the data down, I run out of memory (at least in 32-bit Excel). I know I could try 64-bit Excel, but my dataset is small, so memory should not be an issue. This is before I even add filters to the calculation for specific qualifications:
EmployeeCount:=if(HASONEVALUE('Date'[Date]),
CALCULATE
(distinctcount('Appointment Detail'[PersonID]),
DATESBETWEEN('Date'[Date], min('Appointment Detail'[ActualStartDate]),Max( 'Appointment Detail'[ActualEndDate]))
)
,BLANK())
I'd appreciate help in understanding the problem correctly, as I'm obviously missing something here and my DAX experience is also very light.

I would remove the relationships to DimDate and then use something like the following pattern for your measure:
EmployeeCount := CALCULATE( COUNTROWS ( 'Person' ),
FILTER('Appointment Detail',
'Appointment Detail'[ActualStartDate] <= MAX(DimDate[Date])
&& 'Appointment Detail'[ActualEndDate] >= MIN(DimDate[Date])
))
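The filter in that measure is an interval test: a row counts when its start is on or before the latest selected date and its end is on or after the earliest selected date. As a rough illustration of the same logic outside DAX, here is a small Python sketch. The sample rows and the function name employee_count are made up, and it counts distinct person IDs directly rather than relying on filter propagation to the Person table.

```python
from datetime import date

# (PersonID, ActualStartDate, ActualEndDate); made-up sample rows.
appointments = [
    (1, date(2014, 9, 1), date(2015, 6, 30)),
    (1, date(2014, 9, 1), date(2015, 6, 30)),   # same person, second qualification
    (2, date(2015, 2, 1), date(2015, 12, 31)),
    (3, date(2013, 1, 1), date(2014, 12, 31)),
]

def employee_count(rows, selected_min, selected_max):
    """Count distinct people whose study period overlaps the selected dates."""
    active = {
        person_id
        for person_id, start, end in rows
        if start <= selected_max and end >= selected_min
    }
    return len(active)

# A single selected date is just a range whose min and max coincide.
print(employee_count(appointments, date(2015, 1, 1), date(2015, 1, 1)))
print(employee_count(appointments, date(2015, 1, 1), date(2015, 3, 1)))
```

With a single selected date, MIN and MAX of the date column are the same value, so the condition reduces to the `sd.date <= X and ed.date >= X` test from the SQL version in the question.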

Find whether something booked in a time range

I'm having difficulty trying to find whether a room is booked within a certain time range. I store the starttime and endtime of a room in a table called periodInformation.
I can't quite get my head around time/range in sql. I need my query below to interrogate pi.StartTime and pi.EndTime and see whether the room is booked. In my example below user has typed in they want a room from 08:20 to 09:20. I want to ensure whilst searching pi.StartTime and pi.EndTime I also get results back if room A1 was booked from 8:10 - 8:30 or 8:10-9:30 etc.
I think I've selected the appropriate joins, but I can only find examples that join two tables, whereas here I'm joining 4 tables to figure out whether a room is booked or not.
ExecuteSQL ( "
SELECT lr.pk_LessonRoomID
FROM LESSONROOM AS lr
LEFT JOIN Lesson AS l ON lr.fk_LessonID = l.pk_LessonID
LEFT JOIN PeriodInformation AS pi ON l.fk_PeriodListID = pi.fk_PeriodListID
LEFT JOIN PeriodList AS pl ON pi.fk_PeriodListID = pl.pk_PeriodListID
WHERE lr.fk_RoomID = ? AND pl.DayShort = ? AND pi.StartTime >=? AND pi.EndTime <=?" ;
"";
"" ;
"A1" ; "Mon" ; "08:20" ; "09:20")

For a room to be BOOKED between 8:20 and 9:20, it would have to have a lesson that ENDS after 8:20, and STARTS before 9:20.
So if you change your last two conditions to this:
AND pi.EndTime >=? AND pi.StartTime <=?
That should cover it.
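To check the swapped conditions against the cases mentioned in the question, here is a small sketch using an in-memory SQLite table from Python. The table is reduced to a hypothetical id plus the two time columns, the bookings are made up, and times are assumed to be stored as zero-padded 'HH:MM' text so that string comparison matches time order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PeriodInformation (id INTEGER, StartTime TEXT, EndTime TEXT)")
conn.executemany(
    "INSERT INTO PeriodInformation VALUES (?, ?, ?)",
    [
        (1, "07:00", "08:00"),  # ends before the requested slot
        (2, "08:10", "08:30"),  # overlaps the start of the slot
        (3, "08:10", "09:30"),  # completely covers the slot
        (4, "08:30", "09:00"),  # lies inside the slot
        (5, "09:30", "10:00"),  # starts after the slot
    ],
)

wanted = ("08:20", "09:20")

# Condition from the question: only finds bookings fully inside the slot.
contained = [r[0] for r in conn.execute(
    "SELECT id FROM PeriodInformation "
    "WHERE StartTime >= ? AND EndTime <= ? ORDER BY id", wanted)]

# Condition from the answer: finds every booking that overlaps the slot.
overlapping = [r[0] for r in conn.execute(
    "SELECT id FROM PeriodInformation "
    "WHERE EndTime >= ? AND StartTime <= ? ORDER BY id", wanted)]

print(contained, overlapping)
```

With >= and <=, a lesson that ends exactly at 08:20 also counts as a clash; if back-to-back bookings should be allowed, strict > and < would be the variant to use.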

Calculating Dates

I have this problem: I need a list of customers with their next scheduled recurring appointment, which is either yearly, monthly, or quarterly.
The tables\columns I have are:
customer
customer_ID
service
customer_ID
service_RecID
Resource
service_RecID
Recurrence_RecID
Date_Time_Start
Recurrence
Recurrence_RecID
RecurType
RecurInterval
DaysOfWeek
AbsDayNbr
SelectInterval
It is modeled such that when the schedule is set up, Date_Time_Start is the date on which the first recurring appointment took place. Ex.
Recurrence_RecID = 10
RecurType = m (could be y, or d as well for yearly or daily)
RecurInterval = 6 (if recurType = y, this would mean every 6 years)
Given that the system generates these nightly, how would I write a query to calculate the next scheduled appointment for each customer? I originally thought of using Resource.Date_Time_Start and just cycling through until a variable nextAppointment >= today(), but is it good practice to run loops in SQL?
If any more info is needed, let me know. Thank you very much!
Edit: I will make a sqlfiddle.

I would suggest using a sub-query as opposed to looping. More efficient that way. This may not be exact but something like...
SELECT
*
FROM
(
SELECT
customer.customer_id,
service.service_RecID,
Resource.Date_Time_Start,
Recurrence.Recurrence_RecID,
RecurType,
RecurInterval,
DaysOfWeek,
AbsDayNbr,
SelectInterval,
NextAppointmentDate=
CASE
WHEN RecurType='m' THEN DATEADD(MONTH,RecurInterval,Resource.Date_Time_Start)
WHEN RecurType='y' THEN DATEADD(YEAR,RecurInterval,Resource.Date_Time_Start)
ELSE
NULL
END
FROM
Recurrence
INNER JOIN Resource ON Resource.Recurrence_RecID=Recurrence.Recurrence_RecID
INNER JOIN service ON service.service_RecID=Resource.service_RecID
INNER JOIN customer ON customer.customer_ID=service.customer_ID
)AS X
WHERE
NextAppointmentDate>=GETDATE()
ORDER BY Fields...
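Note that adding RecurInterval once only gives the second occurrence; if the series started several intervals ago, that date is already in the past. The number of whole intervals elapsed can be computed arithmetically, so no loop is needed. Here is a rough Python sketch of that calculation, assuming RecurType 'm' and 'y' behave as described in the question; the function names are hypothetical, other recurrence types return None (mirroring the ELSE NULL above), and days that don't exist in the target month are clamped to the month's last day.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))

def next_appointment(start, recur_type, interval, today):
    """First occurrence on or after `today`, computed without looping."""
    if recur_type == "m":
        step = interval
    elif recur_type == "y":
        step = interval * 12
    else:
        return None  # other recurrence types are not handled in this sketch
    if start >= today:
        return start
    # Whole months between the two dates, ignoring the day of month.
    elapsed = (today.year - start.year) * 12 + (today.month - start.month)
    k = elapsed // step
    candidate = add_months(start, k * step)
    if candidate < today:
        # Same or earlier month but the day has passed: take the next one.
        candidate = add_months(start, (k + 1) * step)
    return candidate

print(next_appointment(date(2015, 1, 15), "m", 6, date(2017, 3, 1)))
print(next_appointment(date(2010, 5, 10), "y", 2, date(2017, 5, 10)))
```

The same arithmetic can be expressed in T-SQL with DATEDIFF and integer division inside the subquery, which keeps the set-based approach suggested above.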

MySQL to return only last date / time record

We have a database that stores vehicle's gps position, date, time, vehicle identification, lat, long, speed, etc., every minute.
The following select pulls each vehicle's position and info, but the problem is that it returns the first record, and I need the last record (the current position), based on date (datagps.Fecha) and time (datagps.Hora). This is the select:
SELECT configgps.Fichagps,
datacar.Ficha,
groups.Nombre,
datagps.Hora,
datagps.Fecha,
datagps.Velocidad,
datagps.Status,
datagps.Calleune,
datagps.Calletowo,
datagps.Temp,
datagps.Longitud,
datagps.Latitud,
datagps.Evento,
datagps.Direccion,
datagps.Provincia
FROM asigvehiculos
INNER JOIN datacar ON (asigvehiculos.Iddatacar = datacar.Id)
INNER JOIN configgps ON (datacar.Configgps = configgps.Id)
INNER JOIN clientdata ON (asigvehiculos.Idgroup = clientdata.group)
INNER JOIN groups ON (clientdata.group = groups.Id)
INNER JOIN datagps ON (configgps.Fichagps = datagps.Fichagps)
Group by Fichagps;
I need the same result I'm getting, but instead of the oldest record I need the most recent one
(the LAST datagps.Fecha / datagps.Hora).
How can I accomplish this?

Add ORDER BY datagps.Fecha DESC, datagps.Hora DESC LIMIT 1 to your query.

I'm not sure why you are having any problems with this, as Lex's answers seem good.
I would start by putting an ORDER BY in your query so the rows come back in a defined order; once the record you want shows up first in the list, add the LIMIT.
If you want the most recent, then the following should be good enough:
ORDER BY datagps.Fecha DESC, datagps.Hora DESC
If you simply want the record that was added to the database most recently (regardless of the date/time fields), and assuming you have an auto-incrementing primary key in the datagps table (I assume it's called dataID for this example), then you could use:
ORDER BY datagps.dataID DESC
If these aren't showing the data you want, then there is something missing from your example. Maybe the columns aren't DATETIME fields? If not, a CONVERT to change them from their current type before the ORDER BY would be a good idea.
EDIT:
I've seen the screenshot and I'm confused as to what the issue is still. That appears to be showing everything in order. Are you implying that there are many more than 5 records? How many are you expecting?

Do you mean: for each record returned, you want the one row from the table datagps with the latest date and time attached to the result? If so, how about this:
# EXPLAIN shows how the query will be executed;
# comment it out to return actual results
EXPLAIN
SELECT
configgps.Fichagps, datacar.Ficha, groups.Nombre, datagps.Hora, datagps.Fecha,
datagps.Velocidad, datagps.Status, datagps.Calleune, datagps.Calletowo,
datagps.Temp, datagps.Longitud, datagps.Latitud, datagps.Evento,
datagps.Direccion, datagps.Provincia
FROM asigvehiculos
INNER JOIN datacar ON (asigvehiculos.Iddatacar = datacar.Id)
INNER JOIN configgps ON (datacar.Configgps = configgps.Id)
INNER JOIN clientdata ON (asigvehiculos.Idgroup = clientdata.group)
INNER JOIN groups ON (clientdata.group = groups.Id)
INNER JOIN datagps ON (configgps.Fichagps = datagps.Fichagps)
########### Add this section
LEFT JOIN datagps b ON (
configgps.Fichagps = b.Fichagps
# wrong condition
#AND datagps.Hora < b.Hora
#AND datagps.Fecha < b.Fecha)
# might prevent indexes from being used
AND (datagps.Fecha < b.Fecha OR (datagps.Fecha = b.Fecha AND datagps.Hora < b.Hora)))
WHERE b.Fichagps IS NULL
###########
Group by configgps.Fichagps;
There is a similar question here, except that one uses outer joins.
Edit (again):
The conditions were wrong, so I corrected them. Can you show us the output of the above EXPLAIN query so we can pinpoint where the bottleneck is?
As hurikhan77 said, it would be better if you could convert both of the columns into a single datetime field, though I'm guessing this would not be possible in your case (since your database is already being used?).
Though if you can convert it, the condition (on the join) would become:
AND datagps.FechaHora < b.FechaHora
After that, add an index for datagps.FechaHora and the query would be fast(er).
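The self LEFT JOIN with an IS NULL check keeps exactly those rows for which no later row exists for the same vehicle. A minimal sketch of the pattern, run against an in-memory SQLite table from Python, is shown here; the table is cut down to four columns, the readings are made up, and Fecha and Hora are assumed to be stored as sortable text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datagps (Fichagps TEXT, Fecha TEXT, Hora TEXT, Velocidad INTEGER)"
)
conn.executemany(
    "INSERT INTO datagps VALUES (?, ?, ?, ?)",
    [
        ("A", "2010-01-01", "23:59", 40),
        ("A", "2010-01-02", "00:01", 42),
        ("A", "2010-01-02", "00:05", 45),  # latest reading for vehicle A
        ("B", "2010-01-01", "10:00", 10),
        ("B", "2010-01-01", "10:01", 12),  # latest reading for vehicle B
    ],
)

query = """
SELECT d.Fichagps, d.Fecha, d.Hora, d.Velocidad
FROM datagps d
LEFT JOIN datagps b ON (
    d.Fichagps = b.Fichagps
    -- b is strictly later than d: later date, or same date and later time
    AND (d.Fecha < b.Fecha OR (d.Fecha = b.Fecha AND d.Hora < b.Hora))
)
WHERE b.Fichagps IS NULL   -- no later row exists, so d is the latest
ORDER BY d.Fichagps
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

The first row for vehicle A shows why comparing Fecha and Hora independently is wrong: 23:59 on the earlier day is not "less than" 00:01 on the later day in both columns, so that stale row would never find a later match and would survive the IS NULL filter.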

What you probably want is the maximum of (Fecha, Hora) per grouped dataset. This is a little complicated to accomplish with your column types. You should combine Fecha and Hora into one column of type DATETIME. Then it's easy to just SELECT MAX(FechaHora) ... GROUP BY Fichagps.
It would have helped if you had posted your table structure, to make the problem easier to understand.
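MAX(FechaHora) on its own only returns the timestamp, so to get the other columns of that latest row one common approach is to join the per-vehicle maximum back to the table. The following Python sketch shows this with SQLite; the combined FechaHora column is the hypothetical one proposed above, stored here as ISO-formatted text, and the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datagps (Fichagps TEXT, FechaHora TEXT, Velocidad INTEGER)")
conn.executemany(
    "INSERT INTO datagps VALUES (?, ?, ?)",
    [
        ("A", "2010-01-01 23:59", 40),
        ("A", "2010-01-02 00:05", 45),
        ("B", "2010-01-01 10:00", 10),
        ("B", "2010-01-01 10:01", 12),
    ],
)

query = """
SELECT d.Fichagps, d.FechaHora, d.Velocidad
FROM datagps d
JOIN (
    -- latest timestamp per vehicle
    SELECT Fichagps, MAX(FechaHora) AS UltimaFechaHora
    FROM datagps
    GROUP BY Fichagps
) m ON m.Fichagps = d.Fichagps AND m.UltimaFechaHora = d.FechaHora
ORDER BY d.Fichagps
"""
current = conn.execute(query).fetchall()
for row in current:
    print(row)
```

If two readings for the same vehicle share the exact same timestamp, both are returned, so a tie-breaker column would be needed in that case.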
