Order By clause on an aggregated SQL request - sql

Let's say my table schema is like below (it's only a simplified example):
MyTable (table name)
ID - int (unique, auto increment)
Message - string
Timestamp - Datetime
I want to count the IDs, grouped by message and ordered by timestamp, so I wrote something like this:
SELECT count (ID), Message FROM MyTable
GROUP BY (Message)
ORDER BY Timestamp desc
However, SQL Server management studio throws me this error:
Column 'Timestamp ' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
The problem is that if I put Timestamp in the Group By statement with Message, it messes up my grouping. The other suggestion to put Timestamp in an aggregate function doesn't make sense (ordering by say, count(Timestamp) doesn't mean anything...)
Any idea on how to do this?
Thanks a lot!

Looking for something like this?
SELECT Message, count (ID), max(Timestamp) as maxDate FROM MyTable
GROUP BY (Message)
ORDER BY maxDate desc

When you do aggregation, you are GROUPing rows together based on certain criteria. This means that each row of your result set actually represents multiple rows in the raw data.
When you want to ORDER BY Timestamp, there will be MULTIPLE timestamp values for each row in the result set, since each one of those rows represents several rows of data.
So, you need to decide which timestamp you want for each set. The MAX? The MIN? You will need to aggregate that field as well to get accurate or meaningful results.

Let's say the same message is in your table multiple times:
1|the mackerel likes frying|1/1/1917
2|at night all cats are grey|12/15/1956
3|the mackerel likes frying|2/2/1918
And you want to group by the message-string, counting the number of times the message appears in the table:
the mackerel likes frying|2
at night all cats are grey|1
The timestamp column is NOT part of the aggregation aka the grouping, but is part of the detail row. It CANNOT appear in the grouping, because timestamp is not "it" (singular) but timestamps, they, plural. There are two different timestamps in the example above for the mackerel message. Which one would you choose? How would the query know which one it was? All you have at your disposal are the aggregate functions:
min(timestamp)
max(timestamp)
count(timestamp)
and if it were other than a datetime, you'd also have AVG(timestamp).
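To make the mackerel example concrete, here is a minimal sketch that loads those three rows and orders the groups by their latest timestamp. It assumes an in-memory SQLite database driven from Python's sqlite3 module purely so it can be run anywhere (the question is about SQL Server), and it stores the dates as ISO strings so that MIN and MAX compare them correctly. The aliases n, first_seen and last_seen are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (ID INTEGER PRIMARY KEY, Message TEXT, Timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable (ID, Message, Timestamp) VALUES (?, ?, ?)",
    [
        (1, "the mackerel likes frying", "1917-01-01"),
        (2, "at night all cats are grey", "1956-12-15"),
        (3, "the mackerel likes frying", "1918-02-02"),
    ],
)

# Each output row stands for a whole group, so the timestamp has to be
# reduced to one value per group (MIN or MAX) before it can be used to sort.
rows = conn.execute(
    """
    SELECT Message,
           COUNT(ID)      AS n,
           MIN(Timestamp) AS first_seen,
           MAX(Timestamp) AS last_seen
    FROM MyTable
    GROUP BY Message
    ORDER BY MAX(Timestamp) DESC
    """
).fetchall()

for row in rows:
    print(row)
```

The grey-cats message sorts first because its only timestamp (1956) is later than the latest mackerel timestamp (1918).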

If you want to order the messages based on the max timestamp within the Group then try:
SELECT count (ID), Message
FROM MyTable
GROUP BY (Message)
ORDER BY MAX(Timestamp) DESC

The problem here is that you probably have multiple messages that are the same, but with different timestamps, because you're grouping by message. If you have two messages 'hello' with different timestamps, which should it use for the order by?
This is one way. You could also do a trick with cross apply or row_number.
SELECT count(ID), Message FROM MyTable
GROUP BY (Message)
ORDER BY Max(Timestamp) desc

Related

Having an issue with selecting max rows by date

I am trying to select the max timestamped records from table 1 based on some data from table 2. I am getting the correct records based on the where limits I have put on the query, but I am still getting duplicate entries rather than only the max timestamped entries. Any ideas on what is wrong with the query?
Basically the ID 901413368 has access to certain leveltypes and I'm trying to find out what the max dated requests were that were put in for that same person for the leveltypes that person manages.
SELECT
MAX(timestamp) AS maxtime, Leveltype, assign_ID
FROM
WHERE
(leveltype IN
(SELECT leveltype FROM dbo.idleveltypes WHERE (id = 901413368)))
GROUP BY timestamp, assign_ID, leveltype
HAVING (assign_ID = '901413368')
UPDATE: The issue has been resolved by WEI_DBA's response below:
Remove the timestamp column from your Group By. Also put the assign_ID in the Where Clause and remove the Having clause
The following may be what you want. It should also be a simpler way to write the query:
SELECT MAX(a.timestamp) AS maxtime, a.Leveltype, a.assign_ID
FROM dbo.q_Archive a JOIN
dbo.idleveltypes lt
ON a.leveltype = lt.leveltype AND
a.assign_ID = lt.id
WHERE a.assign_ID = 901413368
GROUP BY a.assign_ID, a.leveltype;
Notes:
Filter on assign_ID before doing the group by. That is much more efficient.
A JOIN is the more typical way to represent the relationship between two tables.
The JOIN condition should be on all the columns needed for matching; there appear to be two.
I don't understand why the leveltype table would have a column called id, but this is your data structure.
The GROUP BY does not need timestamp.
Decide on the type for the id column that should be 901413368. Is it a number or a string? Only use single quotes for string and date constants.
Remove timestamp from the GROUP BY clause, because you're selecting MAX(timestamp).
You should not add aggregated fields to the GROUP BY clause.
SELECT
MAX(timestamp) AS maxtime,
Leveltype,
assign_ID
FROM
dbo.q_Archive
WHERE
(leveltype IN (SELECT leveltype
FROM dbo.idleveltypes
WHERE (id = 901413368)))
GROUP BY assign_ID, leveltype
HAVING (assign_ID = '901413368')
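The advice in these answers (drop timestamp from the GROUP BY and filter assign_ID in the WHERE clause) can be checked with a small sketch. It assumes SQLite via Python's sqlite3 instead of SQL Server, so the dbo. prefix is dropped; the sample rows, the second id 555 and the leveltype values 'A', 'B', 'C' are invented for illustration, and the final ORDER BY is only there to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE q_Archive (timestamp TEXT, leveltype TEXT, assign_ID INTEGER)")
conn.execute("CREATE TABLE idleveltypes (id INTEGER, leveltype TEXT)")

# Hypothetical data: 901413368 manages leveltypes A and B only.
conn.executemany(
    "INSERT INTO idleveltypes (id, leveltype) VALUES (?, ?)",
    [(901413368, "A"), (901413368, "B"), (555, "C")],
)
conn.executemany(
    "INSERT INTO q_Archive (timestamp, leveltype, assign_ID) VALUES (?, ?, ?)",
    [
        ("2020-01-01", "A", 901413368),
        ("2020-03-01", "A", 901413368),  # latest request for A
        ("2020-02-01", "B", 901413368),
        ("2020-05-01", "C", 901413368),  # leveltype not managed by this id
        ("2020-06-01", "A", 555),        # request assigned to someone else
    ],
)

# timestamp is aggregated, so it stays out of GROUP BY;
# the assign_ID filter runs in WHERE, before any grouping happens.
latest = conn.execute(
    """
    SELECT MAX(timestamp) AS maxtime, leveltype, assign_ID
    FROM q_Archive
    WHERE assign_ID = 901413368
      AND leveltype IN (SELECT leveltype FROM idleveltypes WHERE id = 901413368)
    GROUP BY assign_ID, leveltype
    ORDER BY leveltype
    """
).fetchall()

for row in latest:
    print(row)
```

With timestamp left in the GROUP BY, leveltype A would come back twice (once per distinct timestamp), which is the duplication described in the question.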

How to get the order distinct rows were last inserted into a database?

This is a more to the point follow-up of my other question:
How does DISTINCT interact with ORDER BY?
Given a table:
CREATE TABLE events (
"order" TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
value INT NOT NULL
);
At every insertion, value will be taken from a finite set.
How do I get the order all the elements of my set were last inserted into the database? I thought of doing SELECT DISTINCT value FROM event ORDER BY "order" DESC;, but according to answer in my other question, this won't work.
Answer from your other post.
Using MAX and grouping on value, you get the most recent timestamp for each value.
SELECT
MAX([order]) AS MaxOrd
, value
FROM Event
GROUP BY value
ORDER BY MaxOrd DESC
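As a rough check of the GROUP BY approach, the sketch below inserts a few values with explicit timestamps and reads them back in order of last insertion. It assumes SQLite through Python's sqlite3, quotes the column as "order" because ORDER is a reserved word, and uses made-up timestamps and the alias last_inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "order" must be quoted because ORDER is a reserved word.
conn.execute('CREATE TABLE events ("order" TEXT, value INTEGER NOT NULL)')
conn.executemany(
    'INSERT INTO events ("order", value) VALUES (?, ?)',
    [
        ("2024-01-01 10:00:00", 1),
        ("2024-01-01 10:01:00", 2),
        ("2024-01-01 10:02:00", 3),
        ("2024-01-01 10:03:00", 1),  # value 1 is inserted again, last of all
    ],
)

# One row per value, sorted by the most recent time that value was inserted.
last_seen = conn.execute(
    """
    SELECT value, MAX("order") AS last_inserted
    FROM events
    GROUP BY value
    ORDER BY last_inserted DESC
    """
).fetchall()

values_in_order = [value for value, _ in last_seen]
print(values_in_order)
```

Value 1 comes out first because its second insertion is the newest row, even though it was also the oldest value in the table.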
Firstly I'm assuming you mean where instead of were.
So if you're looking to order by insertion date descending, then it would be just as you have:
SELECT DISTINCT value FROM event ORDER BY "order" DESC;
I don't know why that wouldn't work. I tested it on one of my tables containing a timestamp and it works fine.

Group by required even though I don't want to group by

The following table has Key, StartTime and EndTime. I want my query to return my key, the number of records with this Key, and the total minutes of all records with the same Key. I can't run it without StartTime and EndTime in my GROUP BY; unfortunately, this puts each row with a different start or stop time into its own group.
SELECT sn.Key,
COUNT(*) as SessonNoteCount,
sum( dbo.fnCalcTime( sn.StartTime, sn.EndTime)) as min
FROM SessionNote sn
group by sn.Key, sn.StartTime, sn.EndTime
order by sn.Key
You should group by all non-aggregated selected columns, which in your case is just sn.Key:
group by sn.Key
As an aside, IMHO the group by clause should be optional because it is entirely determined by the selected columns and could easily be generated by the query parser, but that boat has sailed...
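Here is a small sketch of grouping by the key alone. It assumes SQLite via Python's sqlite3, and since dbo.fnCalcTime is the asker's own function and its definition is not shown, a plain difference in whole minutes computed with strftime('%s', ...) stands in for it. The sample sessions and the aliases note_count and total_minutes are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE SessionNote ("Key" TEXT, StartTime TEXT, EndTime TEXT)')
conn.executemany(
    'INSERT INTO SessionNote ("Key", StartTime, EndTime) VALUES (?, ?, ?)',
    [
        ("A", "2024-01-01 09:00:00", "2024-01-01 09:30:00"),  # 30 minutes
        ("A", "2024-01-02 14:00:00", "2024-01-02 14:45:00"),  # 45 minutes
        ("B", "2024-01-01 10:00:00", "2024-01-01 11:00:00"),  # 60 minutes
    ],
)

# StartTime and EndTime appear only inside SUM(...), so they do not belong
# in GROUP BY. The strftime arithmetic is a stand-in for dbo.fnCalcTime.
totals = conn.execute(
    """
    SELECT sn."Key",
           COUNT(*) AS note_count,
           SUM((strftime('%s', sn.EndTime) - strftime('%s', sn.StartTime)) / 60)
               AS total_minutes
    FROM SessionNote sn
    GROUP BY sn."Key"
    ORDER BY sn."Key"
    """
).fetchall()

for row in totals:
    print(row)
```

Adding StartTime and EndTime back to the GROUP BY would split key A into two groups of one row each, which is the behaviour the question complains about.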

How to produce a distinct count of records that are stored by day by month

I have a table with several "ticket" records in it. Each ticket is stored by day (i.e. 2011-07-30 00:00:00.000). I would like to count the unique records in each month of each year. I have used the following SQL statement:
SELECT DISTINCT
YEAR(TICKETDATE) as TICKETYEAR,
MONTH(TICKETDATE) AS TICKETMONTH,
COUNT(DISTINCT TICKETID) AS DAILYTICKETCOUNT
FROM
NAT_JOBLINE
GROUP BY
YEAR(TICKETDATE),
MONTH(TICKETDATE)
ORDER BY
YEAR(TICKETDATE),
MONTH(TICKETDATE)
This does produce a count but it is wrong as it picks up the unique tickets for every day. I just want a unique count by month.
Try combining Year and Month into one field, and grouping on that new field.
You may have to cast them to varchar to ensure that they don't simply get added together. Or you could multiply the year by 100...
SELECT
(YEAR(TICKETDATE) * 100) + MONTH(TICKETDATE),
count(*) AS DAILYTICKETCOUNT
FROM NAT_JOBLINE GROUP BY
(YEAR(TICKETDATE) * 100) + MONTH(TICKETDATE)
Presuming that TICKETID is not a primary or unique key, but does appear multiple times in table NAT_JOBLINE, that query should work. If it is unique (does not occur in more than 1 row per value), you will need to select on a different column, one that uniquely identifies the "entity" that you want to count, rather than each occurrence/instance/reference of that entity.
(As ever, it is hard to tell without working with the actual data.)
I think you need to remove the first DISTINCT. You already have the GROUP BY. If I were the first DISTINCT, I would be confused as to what I was supposed to do.
SELECT
YEAR(TICKETDATE) as TICKETYEAR,
MONTH(TICKETDATE) AS TICKETMONTH,
COUNT(DISTINCT TICKETID) AS DAILYTICKETCOUNT
FROM NAT_JOBLINE
GROUP BY YEAR(TICKETDATE), MONTH(TICKETDATE)
ORDER BY YEAR(TICKETDATE), MONTH(TICKETDATE)
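The difference between COUNT(DISTINCT TICKETID) and COUNT(*) per month can be seen in a short sketch. It assumes SQLite through Python's sqlite3, where YEAR() and MONTH() do not exist, so strftime('%Y', ...) and strftime('%m', ...) are used in their place. The rows and the aliases distinct_tickets and line_rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NAT_JOBLINE (TICKETID INTEGER, TICKETDATE TEXT)")
conn.executemany(
    "INSERT INTO NAT_JOBLINE (TICKETID, TICKETDATE) VALUES (?, ?)",
    [
        (1, "2011-07-30"),
        (1, "2011-07-30"),  # same ticket, second job line on the same day
        (1, "2011-07-31"),  # same ticket again on another day
        (2, "2011-07-30"),
        (3, "2011-08-01"),
    ],
)

# Group by year and month only; DISTINCT inside COUNT removes repeated
# ticket ids within each month, while COUNT(*) counts every row.
per_month = conn.execute(
    """
    SELECT CAST(strftime('%Y', TICKETDATE) AS INTEGER) AS TICKETYEAR,
           CAST(strftime('%m', TICKETDATE) AS INTEGER) AS TICKETMONTH,
           COUNT(DISTINCT TICKETID) AS distinct_tickets,
           COUNT(*)                 AS line_rows
    FROM NAT_JOBLINE
    GROUP BY TICKETYEAR, TICKETMONTH
    ORDER BY TICKETYEAR, TICKETMONTH
    """
).fetchall()

for row in per_month:
    print(row)
```

July has four rows but only two distinct tickets, so which count is "right" depends on whether a ticket or a job line is the thing being counted.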
From what I understand from your comments to Phillip Kelley's solution:
SELECT TICKETDATE, COUNT(*) AS DAILYTICKETCOUNT
FROM NAT_JOBLINE
GROUP BY TICKETDATE
should do the trick, but I suggest you update your question.

Using a DISTINCT clause to filter data but still pull other fields that are not DISTINCT

I am trying to write a query in Postgresql that pulls a set of ordered data and filters it by a distinct field. I also need to pull several other fields from the same table row, but they need to be left out of the distinct evaluation. example:
SELECT DISTINCT(user_id) user_id,
created_at
FROM creations
ORDER BY created_at
LIMIT 20
I need the user_id to be DISTINCT, but don't care whether the created_at date is unique or not. Because the created_at date is being included in the evaluation, I am getting duplicate user_id in my result set.
Also, the data must be ordered by the date, so using DISTINCT ON is not an option here. It requires that the DISTINCT ON field be the first field in the ORDER BY clause, and that does not deliver the results that I seek.
How do I properly use the DISTINCT clause but limit its scope to only one field while still selecting other fields?
As you've discovered, standard SQL treats DISTINCT as applying to the whole select-list, not just one column or a few columns. The reason for this is that it's ambiguous what value to put in the columns you exclude from the DISTINCT. For the same reason, standard SQL doesn't allow you to have ambiguous columns in a query with GROUP BY.
But PostgreSQL has a nonstandard extension to SQL to allow for what you're asking: DISTINCT ON (expr).
SELECT DISTINCT ON (user_id) user_id, created_at
FROM creations
ORDER BY user_id, created_at
LIMIT 20
You have to include the distinct expression(s) as the leftmost part of your ORDER BY clause.
See the manual on DISTINCT Clause for more information.
If you want the most recent created_at for each user then I suggest you aggregate like this:
SELECT user_id, MAX(created_at)
FROM creations
WHERE ....
GROUP BY user_id
ORDER BY MAX(created_at) DESC
This will return the most recent created_at for each user_id
If you only want the top 20, then append
LIMIT 20
EDIT: This is basically the same thing Unreason said above... define from which row you want the data by aggregation.
The GROUP BY should ensure distinct values of the grouped columns, this might give you what you are after.
(Note I'm putting in my 2 cents even though I am not familiar with PostgreSQL, but rather MySQL and Oracle)
In MySql
SELECT user_id, created_at
FROM creations
GROUP BY user_id
ORDER BY user_id
In Oracle sqlplus
SELECT user_id, FIRST(created_at)
FROM creations
GROUP BY user_id
ORDER BY user_id
These will give you the user_id followed by the first created_at associated with that user_id. If you want a different created_at, you have the option to substitute FIRST with other functions like AVG, MIN, MAX, or LAST in Oracle. You can also try adding ORDER BY on other columns (including ones that are not returned) to give you a different created_at.
Your question is not well defined - when you say you need also other data from the same row you are not defining which row.
You do say you need to order the results by created_at, so I will assume that you want values from the row with min created_at (earliest).
This now becomes one of the most common SQL questions on SO: retrieving rows containing some aggregate value (MIN, MAX).
For example
SELECT user_id, MIN(created_at) AS created_at
FROM creations
GROUP BY user_id
ORDER BY MIN(created_at)
LIMIT 20
This approach will not let you (easily) pick other values from the same row.
One approach that will let you pick other values is
SELECT c.user_id, c.created_at, c.other_columns
FROM creations c LEFT JOIN creations c_help
ON c.user_id = c_help.user_id AND c.created_at > c_help.created_at
WHERE c_help.user_id IS NULL
ORDER BY c.created_at
LIMIT 20
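The self-join above can be sketched as follows: a row survives only if no other row for the same user has an earlier created_at. This assumes SQLite via Python's sqlite3 instead of PostgreSQL, and the title column is a hypothetical stand-in for other_columns; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE creations (user_id INTEGER, created_at TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO creations (user_id, created_at, title) VALUES (?, ?, ?)",
    [
        (1, "2020-01-05", "a"),
        (1, "2020-01-01", "b"),  # earliest row for user 1
        (2, "2020-01-03", "c"),  # earliest row for user 2
        (2, "2020-01-09", "d"),
    ],
)

# c_help matches any earlier row for the same user. Where no such row
# exists, the joined columns are NULL, so c is that user's earliest row.
earliest = conn.execute(
    """
    SELECT c.user_id, c.created_at, c.title
    FROM creations c
    LEFT JOIN creations c_help
           ON c.user_id = c_help.user_id
          AND c.created_at > c_help.created_at
    WHERE c_help.user_id IS NULL
    ORDER BY c.created_at
    LIMIT 20
    """
).fetchall()

for row in earliest:
    print(row)
```

Note that if a user has two rows sharing the same earliest created_at, both rows come back, so user_id is only guaranteed to be unique when the timestamps are.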
Using a sub-query was suggested by someone on the irc #postgresql channel. It worked:
SELECT user_id
FROM (SELECT DISTINCT ON (user_id) * FROM creations) ss
ORDER BY created_at DESC
LIMIT 20;