Can a nested Group By be done in a single Select? - sql

Using T-SQL (we're on 2008, but if it can be done in 2012 using some new function/extension, please note)
This is purely out of curiosity...I ended up just going with a GROUP BY within a GROUP BY. But I'm curious to see if there is a way to do this in a single query, maybe there's some fancy shmancy functions or extensions I haven't learned yet....It's more of a challenge than it is a need to get the job done, as it's already done.
I tried building an example table on here, but it's too large to build, so here's the concept. The table has three columns, UserID, UserGroupID and Minutes. In one hour increments, we log how much time a user spends within an application. So, for example, UserID 1 spent 10 min during the hour of 04/28/2014 10:00:00, and then 15 minutes during the hour of 04/28/2014 11:00:00...and so on. (for this example, please ignore any time constraints as far as per day or per month, etc)
I wanted to see the number of users per group that have used the application for at least 30 minutes. This is the logic that was used:
SELECT UserGroupID, COUNT(*)
FROM (
SELECT UserGroupID, UserID
FROM Example
GROUP BY UserGroupID, UserID
HAVING SUM([Minutes]) >= 30
) AS x
GROUP BY UserGroupID
The question is, can this be done in a single query? Not looking for efficiency here, I'm just curious.

I don't think so, but a negative is quite hard to prove.
The following query (without the having clause) can be simplified. So:
SELECT UserGroupID, COUNT(*)
FROM (
SELECT UserGroupID, UserID
FROM Example
GROUP BY UserGroupID, UserID
) AS x
GROUP BY UserGroupID;
Is pretty much the same as:
SELECT UserGroupId, COUNT(DISTINCT UserId)
FROM Example
GROUP BY UserGroupId;
(These are not exactly equivalent if UserId can be NULL, but that case could also be handled.)
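As a sketch of how the NULL case might be handled (this assumes UserId is nullable in Example, which the question does not state): the grouped subquery counts all NULL users as one extra group, while COUNT(DISTINCT ...) skips them, so one option is to add that group back by hand. The alias UserCount is made up for illustration.

```sql
-- COUNT(DISTINCT UserId) ignores NULLs, but GROUP BY UserGroupId, UserId
-- puts all NULL UserIds into one group of their own. Adding 1 whenever the
-- group contains a NULL UserId reproduces the count from the subquery form.
SELECT UserGroupId,
       COUNT(DISTINCT UserId)
       + MAX(CASE WHEN UserId IS NULL THEN 1 ELSE 0 END) AS UserCount
FROM Example
GROUP BY UserGroupId;
```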
I don't think there is a way to do your full query, though. You need to aggregate by UserGroupId, UserId to get the sum() condition. Then you need to aggregate just by UserGroupId. Nothing comes to mind.
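For what it's worth, one candidate that does fit in a single SELECT is to combine the aggregate with a windowed COUNT. This is a sketch rather than a settled answer: it relies on window functions being evaluated after GROUP BY and HAVING, so the COUNT(*) OVER (...) below counts the surviving (UserGroupID, UserID) groups, not the raw rows. The alias UsersOver30 is made up for illustration.

```sql
-- Inner level: GROUP BY UserGroupID, UserID with the HAVING filter.
-- Outer level: the windowed COUNT(*) counts the remaining groups per
-- UserGroupID; DISTINCT collapses the repeated result rows.
SELECT DISTINCT
       UserGroupID,
       COUNT(*) OVER (PARTITION BY UserGroupID) AS UsersOver30
FROM Example
GROUP BY UserGroupID, UserID
HAVING SUM([Minutes]) >= 30;
```

DISTINCT is needed because the windowed count is repeated once per qualifying user. Whether this counts as avoiding the nesting is a matter of taste, since both aggregation levels are still there; they are just written in one query block.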

Related

SQL Server: I have multiple records per day and I want to return only the first of the day

I have some records that track inquiries by DATETIME. There is a glitch in the system, and sometimes a record is entered multiple times on the same day. I have a query with a bunch of correlated subqueries attached to these, but the numbers are off because the leads affected by the glitch show up multiple times. I need the first entry of the day. I tried fooling around with MIN but couldn't quite get it to work.
I currently have this, I am not sure if I am on the right track though.
SELECT SL.UserID, MIN(SL.Added) OVER (PARTITION BY SL.UserID)
FROM SourceLog AS SL
Here's one approach using row_number():
select *
from (
select *,
row_number() over (partition by userid, cast(added as date) order by added) rn
from sourcelog
) t
where rn = 1
You could use GROUP BY along with MIN to accomplish this.
How you do it depends on how your data is structured. If each record is assigned a unique sequential number when it is created, you can just return the lowest number created per day. Otherwise you would need to return the ID of the record with the earliest DATETIME value per day.
--Assumes sequential IDs
select
    min(Id)
from
    [YourTable]
group by
    --the conversion is used to strip the time value out of the date/time
    convert(date, [YourDateTime])
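For the other case described above, where IDs are not sequential, a minimal sketch is to find the earliest DATETIME per user per day and join back to the full rows. It uses the SourceLog, UserID and Added names from the question; the aliases FirstOfDay and FirstAdded are made up, and CAST(... AS date) needs SQL Server 2008 or later.

```sql
-- Earliest Added value per user per calendar day, joined back to the
-- complete rows. No sequential Id column is assumed here.
SELECT SL.*
FROM SourceLog AS SL
INNER JOIN (
    SELECT UserID, MIN(Added) AS FirstAdded
    FROM SourceLog
    GROUP BY UserID, CAST(Added AS date)
) AS FirstOfDay
    ON  FirstOfDay.UserID     = SL.UserID
    AND FirstOfDay.FirstAdded = SL.Added;
```

If two rows for the same user share exactly the same earliest Added value, both come back; the row_number() approach avoids that by numbering them.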

SQL Server - how to select x number of task per x number of workers

I need some help writing a query in SQL Server 2012 to select a specific number of tasks for each of a selected number of workers. If I were doing this in a traditional programming language I would use something like a foreach. However, I can't find a nice way to implement a foreach in SQL. I'm sure there is a simpler way to do this.
For example, let's say I have 3 tables:
MonthlyReview
    DateReviewed,
    WorkerID
Workers
    WorkerID,
    WorkerName
Tasks
    TaskID,
    WorkerID
First I select the workers I want to audit (they are filtered on some other data such as name or org, not pictured). I thought it would make things easier to put them in a temp table:
CREATE TABLE #WorkersToAudit (
    WorkerID varchar(45),
    DateReviewed datetime)
INSERT INTO #WorkersToAudit(WorkerID, DateReviewed)
SELECT TOP (4) Workers.WorkerID, MIN(MonthlyReview.DateReviewed) AS DateReviewed
FROM Workers
LEFT JOIN MonthlyReview ON Workers.WorkerID = MonthlyReview.WorkerID
WHERE Workers.WorkerName LIKE '%Browne%'
GROUP BY Workers.WorkerID
DROP TABLE #WorkersToAudit
I was thinking I could then grab the (4) WorkerID's in the results and find (4) TaskID's for each, but I haven't found a nice way to do this despite a lot of searching. The number of WorkerID's searched for and the number of TaskID's returned for each one can be anywhere from 1-10.
Any help would be greatly appreciated.
The easiest way to do this in one query is to use WINDOWING functions. In this case ROW_NUMBER() will work:
select WorkerID, TaskID
FROM (
    SELECT Workers.WorkerID, Tasks.TaskID,
        row_number() over (partition by Workers.WorkerID order by Tasks.TaskID) AS rn
    FROM Workers
    LEFT JOIN Tasks ON Workers.WorkerID = Tasks.WorkerID
    WHERE Workers.WorkerName LIKE '%Browne%'
) AS t -- SQL Server requires an alias on a derived table
where rn <= 4
Notice the key parts of the over clause of the row_number() function - the partition by essentially "groups" the counting of rows by WorkerID, and the order by specifies, well, the ordering of rows that are counted.
You can change how the row_number does its grouping and ordering, and you can include whatever columns you want in the select clause, but the key part is indeed the use of the row_number() function itself.
Good luck!
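Since the question wants both counts to vary between 1 and 10, here is a hedged variation that takes the per-worker limit from a variable and reads the workers from the question's #WorkersToAudit temp table. It assumes the temp table is still populated (that is, it runs before the DROP TABLE); @TasksPerWorker and the aliases are made-up names.

```sql
-- Hypothetical variable: how many tasks to return for each worker.
DECLARE @TasksPerWorker int = 4;

SELECT x.WorkerID, x.TaskID
FROM (
    SELECT w.WorkerID, t.TaskID,
           ROW_NUMBER() OVER (PARTITION BY w.WorkerID
                              ORDER BY t.TaskID) AS rn
    FROM #WorkersToAudit AS w
    INNER JOIN Tasks AS t ON t.WorkerID = w.WorkerID
) AS x
WHERE x.rn <= @TasksPerWorker;
```

The number of workers is controlled separately by the TOP (n) used when filling #WorkersToAudit. The INNER JOIN drops workers that have no tasks; switch to LEFT JOIN if they should still appear with a NULL TaskID.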

Make table ID appear as a column and select across all tables

I've been requested by my superiors to write a query that will search every table in a database (each table represents a road and holds its traffic counts) and take the total counts by hour of motorcycles. Here's what I have so far whilst testing on one table:
WITH
totalCount AS
(
SELECT DATEDIFF(dd,0,event_time) AS DaySerial,
DATEPART(dd,event_time) AS theDay,
DATEDIFF(mm,0,event_time) AS MonthSerial,
DATEPART(mm,event_time) AS MonthofYear,
DATEDIFF(hh,0,event_time) AS HourSerial,
DATEPART(hh,event_time) AS Hour,
COUNT(*) AS HourlyCount,
DATEDIFF(yy,0,event_time) AS YearSerial,
DATEPART(yy,event_time) AS theYear
FROM [RUD].dbo.[10011E]
WHERE length <='1.7'
GROUP BY DATEDIFF(hh,0,event_time),
DATEPART(hh,event_time),
DATEDIFF(dd,0,event_time),
DATEPART(dd,event_time),
DATEDIFF(mm,0,event_time),
DATEPART(mm,event_time),
DATEDIFF(yy,0,event_time),
DATEPART(yy,event_time)
)
SELECT
theYear,
MonthofYear,
theDay,
Hour,
AVG(HourlyCount) AS Avg_Count
FROM
totalCount
GROUP BY
theYear,
MonthofYear,
theDay,
Hour
ORDER BY
theYear,
MonthofYear,
theDay,
Hour
Now I'm sure some of this is redundant or not needed, that's ok for now (I'm new to SQL btw, which is why some of this will be redundant). Basically as it stands, I list the year, month, date, hour and hourly count of motorcycles for one road. Now my two questions:
How do I take this query and make it so that it searches across every single table in the RUD database? Do I just need to list them all and UNION them, or is there a quicker way?
I realise if I search through every table gathering only the above (year, month, day, hour, hourly count) I will end up with the right data but with no way to distinguish which road all the counts are coming from. Is there a way to select the table ID (in this example, 10011E is the ID, and is the assigned name for a specific road) and place it in a column next to the rows that were selected from it?
If anyone needs clarification on what I mean, please let me know! Thanks!
One option would be to use UNION ALL and add an additional column for which source. You'll have to write out each of your tables in this case, but it's perhaps your fastest option:
SELECT ID, 'YourTable' TableName
FROM YourTable
UNION ALL
SELECT ID, 'YourOtherTable'
FROM YourOtherTable
....
Alternatively, dynamic SQL could produce the same results. You might not have to type out all your table names, but it comes with a performance hit.
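A rough sketch of the dynamic SQL route, under some loud assumptions: it is run inside the database that holds the road tables (RUD in the question), every user table there is a road table with the same columns, and the per-table query is only a placeholder row count that you would swap for the real hourly query. The names @sql, TableName and TotalRows are made up.

```sql
-- Build one "SELECT '<table>' AS TableName, ... FROM <table>" per table
-- and glue the pieces together with UNION ALL.
DECLARE @sql nvarchar(max);

SELECT @sql = STUFF((
    SELECT N' UNION ALL SELECT '
           + QUOTENAME(t.name, '''') + N' AS TableName, COUNT(*) AS TotalRows FROM '
           + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name)
    FROM sys.tables AS t
    INNER JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    FOR XML PATH(''), TYPE
).value('.', 'nvarchar(max)'), 1, 11, N'');  -- strip the leading ' UNION ALL '

PRINT @sql;                    -- inspect the generated statement first
EXEC sys.sp_executesql @sql;   -- then run it
```

QUOTENAME is used both to bracket the table names (needed for names such as 10011E that start with a digit) and to turn each name into a quoted string literal for the TableName column.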

MySQL/Ms SQL latest records with multiple id's

I'm no sql-expert, but came across this problem:
I have to retrieve data from Microsoft SQL 2008 server. It holds different measurement data from different probes, that don't have any recording intervals. Meaning that some probe can transfer data in the database once every week, another once every second. Probes are identified by id's (not unique), and the point is to retrieve only the last record from each id (probe). Table looks like this (last 5, order by SampleDateTime desc):
TagID  SampleDateTime      SampleValue  QualityID
13     634720670797944946  112          192
23     634720670797944946  38.1         192
17     634720670797944946  107.5        192
14     634720670748012090  110.6        192
19     634720670748012090  99.7         192
I CAN'T modify the server or even the settings, am only authorized to do queries. And I'd need to retrieve the requested data on even intervals (say once every minute or so). There are over 100 probes (with different id's) of which about 40 need to be read. So I am guessing that if this could be done in a single query it could be way more efficient than to get each row in a separate query.
Using MySQL and a similar table got the desired result this way (suggestions for a better way highly appreciated!):
SELECT TagID,SampleDateTime,SampleValue FROM
(
SELECT TagID,SampleDateTime,SampleValue FROM measurements
WHERE TagID IN(101,102,103) ORDER BY SampleDateTime DESC
)
AS table1 GROUP BY TagID;
Thought that would do the trick (didn't manage with MAX() or DISTINCT or no matter what I tried), as it did, with the correct data even. But naturally it doesn't work in Ms SQL because of 'GROUP BY'.
Column 'table1.SampleValue' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I'm extremely stuck with this and so any insight would be more than welcome.
I am slightly confused as you have tagged MySQL and SQL-Server. For SQL-Server, I would use the ROW_NUMBER function to assist:
SELECT m.TagID, m.SampleDateTime, m.SampleValue, m.QualityID
FROM ( SELECT *, ROW_NUMBER() OVER(PARTITION BY TagID ORDER BY SampleDateTime DESC) AS [RowNumber]
    FROM Measurements
) m
WHERE m.RowNumber = 1
The ROW_NUMBER function does exactly what it says on the tin: it gives each row a number based on criteria you provide. In the example above, PARTITION BY TagID tells ROW_NUMBER to start again at 1 each time a new TagID is encountered. ORDER BY SampleDateTime DESC tells ROW_NUMBER to start numbering each TagID at the latest entry and work back to the earliest entry.
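To make the numbering concrete, here is a tiny self-contained illustration. The three rows are made up (they are not the question's data) and are supplied through a VALUES list, which needs SQL Server 2008 or later.

```sql
-- Made-up rows: two samples for TagID 13, one for TagID 14.
SELECT TagID, SampleDateTime, SampleValue,
       ROW_NUMBER() OVER (PARTITION BY TagID
                          ORDER BY SampleDateTime DESC) AS RowNumber
FROM (VALUES (13, 100, 110.6),
             (13, 200, 112.0),
             (14, 150,  99.7)
     ) AS Measurements (TagID, SampleDateTime, SampleValue)
ORDER BY TagID, RowNumber;

-- TagID  SampleDateTime  SampleValue  RowNumber
-- 13     200             112.0        1
-- 13     100             110.6        2
-- 14     150              99.7        1
```

Keeping only the rows where RowNumber is 1 leaves the latest sample for each TagID.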
Your query works in MySQL because MySQL allows an implicit GROUP BY: since you have only specified GROUP BY TagID, any fields in the select list that are not contained within an aggregate function get the values of an arbitrary row in each group (the latest row in your case, apparently because of the ORDER BY SampleDateTime DESC in the subquery, although MySQL does not guarantee which row is used). SQL Server does not allow this, which is why the query fails there.
Just in case it is required the following should work in most DBMS and is a better way of producing a similar query to the one you have been running in MySQL:
SELECT m.TagID, m.SampleDateTime, m.SampleValue, m.QualityID
FROM Measurements m
INNER JOIN
( SELECT TagID, MAX(SampleDateTime) AS SampleDateTime
FROM Measurements
GROUP BY TagID
) MaxTag
ON MaxTag.TagID = m.TagID
AND MaxTag.SampleDateTime = m.SampleDateTime

How to produce a distinct count of records that are stored by day by month

I have a table with several "ticket" records in it. Each ticket is stored by day (e.g. 2011-07-30 00:00:00.000). I would like to count the unique records in each month by year. I have used the following SQL statement:
SELECT DISTINCT
YEAR(TICKETDATE) as TICKETYEAR,
MONTH(TICKETDATE) AS TICKETMONTH,
COUNT(DISTINCT TICKETID) AS DAILYTICKETCOUNT
FROM
NAT_JOBLINE
GROUP BY
YEAR(TICKETDATE),
MONTH(TICKETDATE)
ORDER BY
YEAR(TICKETDATE),
MONTH(TICKETDATE)
This does produce a count but it is wrong as it picks up the unique tickets for every day. I just want a unique count by month.
Try combining Year and Month into one field, and grouping on that new field.
You may have to cast them to varchar to ensure that they don't simply get added together. Or you could multiply the year by 100 and add the month:
SELECT
(YEAR(TICKETDATE) * 100) + MONTH(TICKETDATE),
count(*) AS DAILYTICKETCOUNT
FROM NAT_JOBLINE GROUP BY
(YEAR(TICKETDATE) * 100) + MONTH(TICKETDATE)
Presuming that TICKETID is not a primary or unique key, but does appear multiple times in table NAT_JOBLINE, that query should work. If it is unique (does not occur in more than 1 row per value), you will need to select on a different column, one that uniquely identifies the "entity" that you want to count, rather than each occurrence/instance/reference of that entity.
(As ever, it is hard to tell without working with the actual data.)
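The varchar variant mentioned above might look like the following sketch. It builds a 'YYYY-MM' text key instead of a number; the aliases TICKETYEARMONTH and MONTHLYTICKETCOUNT are made up, and COUNT(DISTINCT TICKETID) is used on the assumption that the same TICKETID can appear on several rows within a month.

```sql
-- Text key such as 2011-07; RIGHT('0' + ..., 2) left-pads the month so
-- that the keys also sort correctly as strings.
SELECT
    CAST(YEAR(TICKETDATE) AS varchar(4)) + '-'
        + RIGHT('0' + CAST(MONTH(TICKETDATE) AS varchar(2)), 2) AS TICKETYEARMONTH,
    COUNT(DISTINCT TICKETID) AS MONTHLYTICKETCOUNT
FROM NAT_JOBLINE
GROUP BY
    CAST(YEAR(TICKETDATE) AS varchar(4)) + '-'
        + RIGHT('0' + CAST(MONTH(TICKETDATE) AS varchar(2)), 2)
ORDER BY TICKETYEARMONTH;
```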
I think you need to remove the first DISTINCT. You already have the GROUP BY. If I were that first DISTINCT, I would be confused as to what I was supposed to do.
SELECT
YEAR(TICKETDATE) as TICKETYEAR,
MONTH(TICKETDATE) AS TICKETMONTH,
COUNT(DISTINCT TICKETID) AS DAILYTICKETCOUNT
FROM NAT_JOBLINE
GROUP BY YEAR(TICKETDATE), MONTH(TICKETDATE)
ORDER BY YEAR(TICKETDATE), MONTH(TICKETDATE)
From what I understand from your comments to Phillip Kelley's solution:
SELECT TICKETDATE, COUNT(*) AS DAILYTICKETCOUNT
FROM NAT_JOBLINE
GROUP BY TICKETDATE
should do the trick, but I suggest you update your question.