Am I doing the grouping by right?

I have a system that logs errors. This is how I select from my error table:
SELECT message, personid, count(*)
FROM errorlog
WHERE time BETWEEN TO_DATE(foo) AND TO_DATE(foo) AND substr(message,0,3) = 'ERR'
GROUP BY personid, message
ORDER BY 3
What I want is to see whether any user is "producing" more errors than others. For instance, for ERROR FOO, if user A has 4 errors and user B has 4000, then logic tells me that user B is doing something wrong.
But can I group the way I do? This is a modified version of a query that grouped only by message and counted it, so that ERROR FOO came out as 4004 in my example above.

With your query, if the messages are different, then you will get multiple records per person.
If you only want one record per person, you need to put an aggregate function around message and group by personid alone.
For example you could do:
SELECT MIN(message), personid, count(*)
FROM errorlog
WHERE time BETWEEN TO_DATE(foo) AND TO_DATE(foo) AND substr(message,0,3) = 'ERR'
GROUP BY personid
ORDER BY 3
Here I've changed message to MIN(message), which returns the first message for this person alphabetically, and removed message from the GROUP BY so that all of a person's errors fall into one group.
However, if you are happy to return multiple records per person, then I see no problem with your script. It lists each personid and message combination with its count, sorted by that count in ascending order (so the most frequent combinations come last; use ORDER BY 3 DESC to put them first), and it includes only records whose message starts with ERR.
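If the goal is to spot a user who produces far more of a given error than everyone else, it may help to show each per-user count next to the total for that message. The following is only a sketch, assuming Oracle (which TO_DATE and substr suggest); the bind variables :start_time and :end_time and the column aliases are made up for illustration and stand in for the foo placeholders in the question.

```sql
-- Sketch only: per-user error counts alongside the total per message.
-- :start_time and :end_time are hypothetical bind variables.
SELECT message,
       personid,
       COUNT(*) AS errors_for_person,
       -- analytic SUM over the grouped counts: total for this message across all users
       SUM(COUNT(*)) OVER (PARTITION BY message) AS errors_for_message
FROM errorlog
WHERE time BETWEEN :start_time AND :end_time
  AND substr(message, 1, 3) = 'ERR'   -- Oracle treats position 0 as 1, so this matches the original filter
GROUP BY message, personid
ORDER BY message, errors_for_person DESC;
```

With the numbers from the question, user B would show 4000 out of 4004 for ERROR FOO and user A 4 out of 4004, which makes the outlier easy to see.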

Related

Stream Analytics Left outer join Not Producing Rows

What I am trying to do:
I want to "throttle" an input stream to its output. Specifically, as I receive multiple similar inputs, I only want to produce an output if one hasn't already been produced in the last N hours.
For example, the input could be thought of as "send an email", but I will get dozens/hundreds of those events. I only want to send an email if I haven't already sent one in the last N hours (or have never sent one).
See the final example here: https://learn.microsoft.com/en-us/stream-analytics-query/join-azure-stream-analytics#examples for something similar to what I am trying to do
What my setup looks like:
There are two inputs to my query:
Ingress: this is the "raw" input stream
Throttled-Sent: this is just a consumer group off of my output stream
My query is as follows:
WITH
AllEvents as (
/* This CTE is here only because we can't seem to use GetMetadataPropertyValue in a join clause, so "materialize" it here for use later */
SELECT
*,
GetMetadataPropertyValue([Ingress], '[User].[Type]') AS Type,
GetMetadataPropertyValue([Ingress], '[User].[NotifyType]') AS NotifyType,
GetMetadataPropertyValue([Ingress], '[User].[NotifyEntityId]') AS NotifyEntityId
FROM
[Ingress]
),
UseableEvents as (
SELECT *
FROM AllEvents
WHERE NotifyEntityId IS NOT NULL
),
AlreadySentEvents as (
/* These are the events that would have been previously output (referenced here via a consumer group). We want to capture these to make sure we are not sending new events when an older "already-sent" event can be found */
SELECT
*,
GetMetadataPropertyValue([Throttled-Sent], '[User].[Type]') AS Type,
GetMetadataPropertyValue([Throttled-Sent], '[User].[NotifyType]') AS NotifyType,
GetMetadataPropertyValue([Throttled-Sent], '[User].[NotifyEntityId]') AS NotifyEntityId
FROM
[Throttled-Sent]
)
SELECT i.*
INTO Throttled
FROM UseableEvents i
/* Left join our sent events, looking for those within a particular time frame */
LEFT OUTER JOIN AlreadySentEvents s
ON i.Type = s.Type
AND i.NotifyType = s.NotifyType
AND i.NotifyEntityId = s.NotifyEntityId
AND DATEDIFF(hour, i, s) BETWEEN 0 AND 4
WHERE s.Type IS NULL /* The is null here is for only returning those Ingress rows that have no corresponding AlreadySentEvents row */
The results I'm seeing:
This query is producing no rows to the output. However, I believe it should be producing something because the Throttled-Sent input has zero rows to begin with. I have validated that my Ingress events are showing up (by simply adjusting the query to remove the left join and checking the results).
I feel like my problem is probably linked to one of the following areas:
I can't have an input that is a consumer group off of the output (but I don't know why that wouldn't be allowed)
My datediff usage/understanding is incorrect
Appreciate any help/guidance/direction!

For throttling, I would recommend looking at the ISFIRST function; it might be an easier solution that does not require reading from the output.
For the current query, I think the order of the DATEDIFF parameters needs to be changed, since the already-sent event s occurs before the incoming event i: DATEDIFF(hour, s, i) BETWEEN 0 AND 4

GROUP BY/SELECT DISTINCT issues

I have gone through dozens of questions on this site and others, but I am still having issues adding a GROUP BY to my code. One came close, but I'm not sure I fully understand it:
SQL/mysql - Select distinct/UNIQUE but return all columns?
I'm working on a program that tracks tracking numbers for orders shipped, but need each sales order only listed once as I need to assign a status to each order. PackageReferenceNo1 contains the sales order numbers, but I only need one tracking number for each order. The following code pulls all tracking numbers and their sales orders. My thought was if I could SELECT DISTINCT, or GROUP BY, I could resolve this issue but haven't had any luck with either option.
My working code is:
SELECT tblImportUPS.ManifestDate,
tblImportUPS.TrackingNumber,
tblImportUPS.PackageReferenceNo1,
tblImportUPS.PackageReferenceNo2,
tblImportUPS.STATUS,
tblTrackingLinks.Notes,
tblImportUPS.ShipToName,
tblTrackingLinks.PGId,
tblTrackingLinks.BBReleased,
tblTrackingLinks.BBReleaseDate,
tblTrackingLinks.SRFNumber,
tblImportUPS.ShipToCity,
tblImportUPS.[ShipToState/Province],
tblImportUPS.ShipToCountry,
tblImportUPS.ShipperName,
tblImportUPS.NumberofPieces,
tblImportUPS.Weight,
tblImportUPS.ScheduledDelivery,
tblImportUPS.DateDelivered
FROM tblTrackingLinks
INNER JOIN tblImportUPS ON (tblTrackingLinks.TrackingNumber = tblImportUPS.TrackingNumber)
AND (tblTrackingLinks.TrackingNumber = tblImportUPS.TrackingNumber)
WHERE (((tblImportUPS.PackageReferenceNo1) LIKE "*56*")
AND ((tblImportUPS.STATUS) <> "Void"))
ORDER BY tblImportUPS.ManifestDate DESC;
This works fine, but it repeats multiple tracking numbers for each order. When I add GROUP BY tblImportUPS.PackageReferenceNo1 before the ORDER BY line, I receive an aggregate error.
Can anyone advise the correct way to proceed and why? I'd prefer to understand and not just receive the solution.
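
As for the why: once a query has a GROUP BY, every column in the SELECT list has to either appear in the GROUP BY or be wrapped in an aggregate function, because the database collapses each group into a single row and needs to be told which value to keep for the other columns. The following is a rough sketch, assuming this is MS Access (the "*56*" wildcard and double-quoted strings suggest so). It keeps one tracking number per sales order by taking the smallest one; the aliases OneTrackingNumber and LatestManifestDate are made up for illustration. Access SQL does not accept comments, so the block has none.

```sql
SELECT tblImportUPS.PackageReferenceNo1,
       Min(tblImportUPS.TrackingNumber) AS OneTrackingNumber,
       Max(tblImportUPS.ManifestDate) AS LatestManifestDate
FROM tblTrackingLinks
INNER JOIN tblImportUPS
    ON tblTrackingLinks.TrackingNumber = tblImportUPS.TrackingNumber
WHERE tblImportUPS.PackageReferenceNo1 LIKE "*56*"
  AND tblImportUPS.STATUS <> "Void"
GROUP BY tblImportUPS.PackageReferenceNo1
ORDER BY Max(tblImportUPS.ManifestDate) DESC;
```

Any other column you want in the output would need the same treatment: either add it to the GROUP BY (which splits the groups further) or wrap it in an aggregate such as Min, Max, or First.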

sql-server-2008 : get the last status of subjects of a student

Salam, (Greetings) to all.
Intro:
I am working on a Student Examination System, where students appear for exams and pass, fail, or are absent.
Problem:
I am tasked with fetching their summary of status: you could call it a result card, which should print the very last status of each subject.
Below is a sample of the data where a student has appeared many times, in different sessions. I have highlighted one subject in which a student has appeared three times.
Now, I have written the following query, which extracts the same result as the picture above:
SELECT DISTINCT
gr.STUDKEY,gr.SUBJECT_ID, gr.SUBJECT_DESC,gr.MARKS,
gr.PASSFAIL, gr.GRADE,max(gr.SESSION_ID), gr.LEVEL_ID
FROM RESULT gr
WHERE gr.STUDKEY = '0100106524'
GROUP BY gr.STUDKEY,gr.SUBJECT_ID, gr.SUBJECT_DESC,gr.MARKS,
gr.PASSFAIL, gr.GRADE, gr.LEVEL_ID
Desired:
I want to get only the last status of a subject in which a student has appeared.
Help is requested. Thanks in advance.
Regards
I am using sql-server-2008.

This won't work because you include fields like gr.MARKS and gr.GRADE in both the GROUP BY and the SELECT, which means the query returns more than one record per subject whenever the marks or grade differ between sessions. Instead, find the latest session per student and subject in a subquery and join back to it:
SELECT
gr.STUDKEY,gr.SUBJECT_ID, gr.SUBJECT_DESC,
gr.PASSFAIL, gr.GRADE,gr.SESSION_ID, gr.LEVEL_ID
FROM RESULT gr
JOIN (SELECT STUDKEY, SUBJECT_ID, MAX(SESSION_ID) AS SESSION_ID
FROM RESULT
GROUP BY STUDKEY, SUBJECT_ID ) gr1 ON gr1.SESSION_ID = gr.SESSION_ID AND gr1.STUDKEY = gr.STUDKEY AND gr1.SUBJECT_ID = gr.SUBJECT_ID

Hopefully there is a date field, or something else that indicates the order of the student's appearances in this class. Use that to sort your query in descending order, so that the most recent occurrence is the first record, then specify "TOP 1", which gives you only the most recent record for that student, including his most recent status.
SELECT TOP 1
gr.STUDKEY,gr.SUBJECT_ID, gr.SUBJECT_DESC,gr.MARKS,
gr.PASSFAIL, gr.GRADE,gr.SESSION_ID, gr.LEVEL_ID
FROM RESULT gr
WHERE gr.STUDKEY = '0100106524'
ORDER BY gr.Date DESC -- swap "Date" out for your field indicating the sequence.
Or use a GROUP BY with MAX(Date) if you're looking for multiple classes for the same student at the same time.
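Another way to get the last row per subject, rather than a single row for the whole student, is ROW_NUMBER(), which is available in SQL Server 2008. This is only a sketch: it assumes that a higher SESSION_ID means a later session (the question's own max(gr.SESSION_ID) suggests so), and the CTE name ranked and the column alias rn are made up for illustration.

```sql
-- Sketch only: latest row per student and subject, assuming a higher SESSION_ID is more recent.
WITH ranked AS (
    SELECT gr.STUDKEY, gr.SUBJECT_ID, gr.SUBJECT_DESC, gr.MARKS,
           gr.PASSFAIL, gr.GRADE, gr.SESSION_ID, gr.LEVEL_ID,
           -- number each subject's rows from the newest session (1) to the oldest
           ROW_NUMBER() OVER (PARTITION BY gr.STUDKEY, gr.SUBJECT_ID
                              ORDER BY gr.SESSION_ID DESC) AS rn
    FROM RESULT gr
    WHERE gr.STUDKEY = '0100106524'
)
SELECT STUDKEY, SUBJECT_ID, SUBJECT_DESC, MARKS,
       PASSFAIL, GRADE, SESSION_ID, LEVEL_ID
FROM ranked
WHERE rn = 1;   -- keep only the most recent row per subject
```

If there is a real date column, ordering by it inside the OVER clause instead of SESSION_ID would be the safer choice.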

Returning unique values in a MS Access query based upon a set of rules

I have a query that returns a dataset with UserID, CourseID, Course Completion date, Course Registration Date and Status (successful, not attempted, not finished).
However people are allowed to do a course multiple times.
For 1 report I need to get a unique recordset (based upon the Combination UserID and CourseID) back according to the following rules:
If a course is completed successfully by a learner take only that value
If a course is completed successfully by a learner multiple times, take the first completion date.
If a course is not completed successfully by a learner, take the last registration date.
I know how to create a query that only returns unique (Distinct) values, but not how to do it with a set of rules.

I see that this is a bit older. I don't know if this directly answers your question because I couldn't think of a way to write a single query to do this. I did however come up with a solution that I tested myself.
Using 3 tables and 4 queries, I think that I have produced the results you're looking for.
The first query I wrote (saved as qryFirstCompleted) was to get the "First Completed" dates. If a person completes a course only once, that is also their first completion, so your criteria 1 and 2 are effectively identical.
I did this by grouping on the UserID and the CourseID and then taking the aggregate MIN of the CourseCompletionDate where the StatusID is equal to 1.
SELECT tblRegistrations.UserID, tblRegistrations.CourseID, Min(tblRegistrations.CourseCompletionDate) AS FirstCompleted
FROM tblRegistrations
WHERE ((tblRegistrations.StatusID)=1)
GROUP BY tblRegistrations.UserID, tblRegistrations.CourseID;
The second query I wrote (saved as qryLastNotCompleted) was to get the "Last Not Completed" registration dates. Again I grouped on the UserID and the CourseID, but this time I took the aggregate MAX of the CourseRegistrationDate to get the last time this course was attempted and not completed (with a StatusID equal to 3).
SELECT tblRegistrations.UserID, tblRegistrations.CourseID, Max(tblRegistrations.CourseRegistrationDate) AS LastAttempt
FROM tblRegistrations
WHERE (((tblRegistrations.StatusID)=3))
GROUP BY tblRegistrations.UserID, tblRegistrations.CourseID;
The third query I wrote (saved as qryUserCourses) was to get the unique UserID/CourseID combinations, simply by grouping on them.
SELECT tblRegistrations.CourseID, tblRegistrations.UserID
FROM tblRegistrations
GROUP BY tblRegistrations.CourseID, tblRegistrations.UserID;
Finally, I wrote my result query, which pulls every UserID/CourseID combination and then shows whether it was completed or not, along with the requested date.
SELECT qryUserCourses.UserID, qryUserCourses.CourseID, Nz([FirstCompleted],[LastAttempt]) AS ResultDate, IIf(IsNull([FirstCompleted]),"Not Completed","Completed") AS Result
FROM (qryUserCourses
LEFT JOIN qryFirstCompleted ON (qryUserCourses.CourseID = qryFirstCompleted.CourseID) AND (qryUserCourses.UserID = qryFirstCompleted.UserID))
LEFT JOIN qryLastNotCompleted ON (qryUserCourses.CourseID = qryLastNotCompleted.CourseID) AND (qryUserCourses.UserID = qryLastNotCompleted.UserID);
This produced, for me, a list of UserIDs, CourseIDs, a date, and completion status.
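For what it's worth, the same rules can probably be folded into a single query with conditional aggregation. This is only a sketch against the same assumed tblRegistrations table, with StatusID 1 meaning completed successfully. Note one difference from the queries above: for a learner who never completed the course, it takes the last registration date across all non-successful statuses rather than only StatusID 3. Access SQL does not accept comments, so the block has none.

```sql
SELECT UserID,
       CourseID,
       Nz(Min(IIf(StatusID = 1, CourseCompletionDate, Null)),
          Max(CourseRegistrationDate)) AS ResultDate,
       IIf(Sum(IIf(StatusID = 1, 1, 0)) > 0, "Completed", "Not Completed") AS Result
FROM tblRegistrations
GROUP BY UserID, CourseID;
```

Min ignores Null values, so the inner IIf yields the first successful completion date when one exists, and Nz falls back to the latest registration date otherwise.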

Logic to use COUNT() to return number of matching values

I'm trying to do something with SQL, teaching it to myself, and I'm unsure what to do next with what I want.
I'm trying to write a query into my DB (using just an "employee info" DB) that returns their name, a job they've worked on, but most importantly I want each job to have a "completion code" that tells about how the job was completed, and an int that shows how many times they've done a job with that specific completion code.
Right now, I can return all the info, sorted by person, job, then code, but I do not know how to get the count of each individual completion code (per employee). Here's what I have:
SELECT crew.EMPLOYEE_NAME, o.WORK_TYPE, oc.COMPLETION_CODE, COUNT(oc.COMPLETION_CODE)
FROM CREW_WORK_SCHEDULE crew, ORDERS o, ORDER_COMPLETION oc
WHERE crew.CREW_ID = o.ASSIGNED_TO_USER_ID
AND oc.ORDER_ID = o.ORDER_ID
ORDER BY (crew.EMPLOYEE_NAME, o.WORK_TYPE, oc.COMPLETION_CODE)
But that COUNT in the select statement would just return the total number of completion codes, not the one for each code of an employee's job type.
Sorry if it's not perfectly clear, but does anyone know?

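Look into the GROUP BY clause. Add this before your ORDER BY so that COUNT is calculated per employee, work type, and completion code:
GROUP BY crew.EMPLOYEE_NAME, o.WORK_TYPE, oc.COMPLETION_CODE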
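Put together, the full query might look something like the sketch below. It uses the table and column names from the question, switches to explicit JOIN syntax, and drops the parentheses around the ORDER BY list; the alias times_completed is made up for illustration.

```sql
-- Sketch only: one row per employee, work type, and completion code, with its count.
SELECT crew.EMPLOYEE_NAME,
       o.WORK_TYPE,
       oc.COMPLETION_CODE,
       COUNT(oc.COMPLETION_CODE) AS times_completed   -- counted within each group, not overall
FROM CREW_WORK_SCHEDULE crew
JOIN ORDERS o
    ON crew.CREW_ID = o.ASSIGNED_TO_USER_ID
JOIN ORDER_COMPLETION oc
    ON oc.ORDER_ID = o.ORDER_ID
GROUP BY crew.EMPLOYEE_NAME, o.WORK_TYPE, oc.COMPLETION_CODE
ORDER BY crew.EMPLOYEE_NAME, o.WORK_TYPE, oc.COMPLETION_CODE;
```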

SELECT crew.EMPLOYEE_NAME, o.WORK_TYPE, oc.COMPLETION_CODE, (SELECT COUNT(*) FROM ...)
And then replace the ellipsis with your condition. It is difficult for me to write it out since I don't know your table structures.
