Having an issue with selecting max rows by date - sql

I am trying to select the records with the max timestamp from table 1 based on some data from table 2. The WHERE conditions return the correct records, but I am still getting duplicate entries rather than only the entries with the max timestamp. Any ideas on what is wrong with the query?
Basically the ID 901413368 has access to certain leveltypes and I'm trying to find out what the max dated requests were that were put in for that same person for the leveltypes that person manages.
SELECT
MAX(timestamp) AS maxtime, Leveltype, assign_ID
FROM dbo.q_Archive
WHERE
(leveltype IN
(SELECT leveltype FROM dbo.idleveltypes WHERE (id = 901413368)))
GROUP BY timestamp, assign_ID, leveltype
HAVING (assign_ID = '901413368')
UPDATE: The issue has been resolved by WEI_DBA's response below:
Remove the timestamp column from your Group By. Also put the assign_ID in the Where Clause and remove the Having clause

The following may be what you want. It should also be a simpler way to write the query:
SELECT MAX(a.timestamp) AS maxtime, a.Leveltype, a.assign_ID
FROM dbo.q_Archive a JOIN
dbo.idleveltypes lt
ON a.leveltype = lt.leveltype AND
a.assign_ID = lt.id
WHERE a.assign_ID = 901413368
GROUP BY a.assign_ID, a.leveltype;
Notes:
Filter on assign_ID before doing the group by. That is much more efficient.
A JOIN is the more typical way to represent the relationship between two tables.
The JOIN condition should be on all the columns needed for matching; there appear to be two.
I don't understand why the leveltype table would have a column called id, but this is your data structure.
The GROUP BY does not need timestamp.
Decide on the type for the id column that should be 901413368. Is it a number or a string? Only use single quotes for string and date constants.
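To see why leaving timestamp in the GROUP BY produces the duplicates, here is a minimal sketch using Python's built-in sqlite3 module. The q_archive table and its rows are made up for illustration (the real table is dbo.q_Archive on SQL Server), and the leveltype filter against dbo.idleveltypes is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for dbo.q_Archive
    CREATE TABLE q_archive (assign_id INTEGER, leveltype TEXT, timestamp TEXT);
    INSERT INTO q_archive VALUES
        (901413368, 'A', '2020-01-01 10:00:00'),
        (901413368, 'A', '2020-03-01 10:00:00'),
        (901413368, 'B', '2020-02-01 10:00:00');
""")

# Grouping by timestamp as well: every distinct timestamp forms its own group,
# so MAX(timestamp) is taken over a single row each time.
with_ts = conn.execute("""
    SELECT MAX(timestamp), leveltype, assign_id
    FROM q_archive
    WHERE assign_id = 901413368
    GROUP BY timestamp, assign_id, leveltype
""").fetchall()

# Grouping only by the non-aggregated columns: one row per leveltype.
without_ts = conn.execute("""
    SELECT MAX(timestamp), leveltype, assign_id
    FROM q_archive
    WHERE assign_id = 901413368
    GROUP BY assign_id, leveltype
    ORDER BY leveltype
""").fetchall()

print(len(with_ts))   # 3
print(without_ts)     # one row for 'A' (the March one) and one for 'B'
```

With timestamp in the GROUP BY, leveltype 'A' comes back twice; without it, only the latest 'A' row remains.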

Remove timestamp from the GROUP BY clause, because you are already taking MAX(timestamp).
You should not add aggregated fields to the GROUP BY clause.
SELECT
MAX(timestamp) AS maxtime,
Leveltype,
assign_ID
FROM
dbo.q_Archive
WHERE
(leveltype IN (SELECT leveltype
FROM dbo.idleveltypes
WHERE (id = 901413368)))
GROUP BY assign_ID, leveltype
HAVING (assign_ID = '901413368')

Related

Order by date, while grouping matches by another column

I have this query
SELECT *, COUNT(app.id) AS totalApps FROM users JOIN app ON app.id = users.id
GROUP BY app.id ORDER BY app.time DESC LIMIT ?
which is supposed to get all results from "users" ordered by another column (time) in a related table (the id from the app tables references the id from the users table).
The issue I have is that the grouping is done before the ordering by date, so I get very old results. But I need the grouping in order to get distinct users, because each user can have multiple 'apps'... Is there a different way to achieve this?
Table users:
id TEXT PRIMARY KEY
Table app:
id TEXT
time DATETIME
FOREIGN KEY(id) REFERENCES users(id)
In my SELECT query I want to get a list of users, ordered by the app.time column. But because one user can have multiple app records associated, I could get duplicate users; that's why I used GROUP BY. But then the order is messed up.
The underlying issue is that the SELECT is an aggregate query as it contains a GROUP BY clause :-
There are two types of simple SELECT statement - aggregate and
non-aggregate queries. A simple SELECT statement is an aggregate query
if it contains either a GROUP BY clause or one or more aggregate
functions in the result-set.
SQL As Understood By SQLite - SELECT
And thus a column that is neither grouped nor aggregated will hold an arbitrary value from one of the rows in that group (the first according to the scan/search, I suspect, hence the lower values) :-
If the SELECT statement is an aggregate query without a GROUP BY
clause, then each aggregate expression in the result-set is evaluated
once across the entire dataset. Each non-aggregate expression in the
result-set is evaluated once for an arbitrarily selected row of the
dataset. The same arbitrarily selected row is used for each
non-aggregate expression. Or, if the dataset contains zero rows, then
each non-aggregate expression is evaluated against a row consisting
entirely of NULL values.
So in short you cannot rely upon the column values that aren't part of the group/aggregation, when it's an aggregate query.
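Therefore you have to retrieve the required values using an aggregate expression, such as max(app.time). However, ordering by this value may not give the order you expect. In the example below the time column is declared as TEXT, so the values are compared as strings rather than as numbers, which is why the final query casts to INTEGER before sorting.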
HOWEVER
What you can do is use the query to build a CTE and then sort without aggregates involved.
Consider the following, which I think mimics your problem:-
DROP TABLE IF EXISTS users;
DROP TABLE IF EXISTS app;
CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, username TEXT);
INSERT INTO users (username) VALUES ('a'),('b'),('c'),('d');
CREATE TABLE app (the_id INTEGER PRIMARY KEY, id INTEGER, appname TEXT, time TEXT);
INSERT INTO app (id,appname,time) VALUES
(4,'app9',721),(4,'app10',7654),(4,'app11',11),
(3,'app1',1000),(3,'app2',7),
(2,'app3',10),(2,'app4',101),(2,'app5',1),
(1,'app6',15),(1,'app7',7),(1,'app8',212),
(4,'app9',721),(4,'app10',7654),(4,'app11',11),
(3,'app1',1000),(3,'app2',7),
(2,'app3',10),(2,'app4',101),(2,'app5',1),
(1,'app6',15),(1,'app7',7),(1,'app8',212)
;
SELECT * FROM users;
SELECT * FROM app;
SELECT username
,count(app.id)
, max(app.time) AS latest_time
, min(app.time) AS earliest_time
FROM users JOIN app ON users.id = app.id
GROUP BY users.id
ORDER BY max(app.time)
;
This results in :-
Where although the latest time for each group has been extracted the final result hasn't been sorted as you would think.
Wrapping it into a CTE can fix that e.g. :-
WITH cte1 AS
(
SELECT username
,count(app.id)
, max(app.time) AS latest_time
, min(app.time) AS earliest_time
FROM users JOIN app ON users.id = app.id
GROUP BY users.id
)
SELECT * FROM cte1 ORDER BY cast(latest_time AS INTEGER) DESC;
and now :-
Note simple integers have been used instead of real times for my convenience.
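One thing worth checking in the example above is the declared type of the time column. The following sketch (Python's sqlite3 module, with two hypothetical tables app_text and app_int that differ only in the column type) suggests that ORDER BY max(time) does sort the groups, but a TEXT column compares the values as strings, so both max() and the ordering can look wrong for numeric data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Same data, different declared type for the time column (illustrative tables)
    CREATE TABLE app_text (id INTEGER, time TEXT);
    CREATE TABLE app_int  (id INTEGER, time INTEGER);
    INSERT INTO app_text VALUES (3, 1000), (3, 7), (4, 721), (4, 7654), (4, 11);
    INSERT INTO app_int  VALUES (3, 1000), (3, 7), (4, 721), (4, 7654), (4, 11);
""")

query = "SELECT id, max(time) FROM {} GROUP BY id ORDER BY max(time) DESC"

# TEXT affinity stores the values as strings, so '7' > '1000'.
text_rows = conn.execute(query.format("app_text")).fetchall()
# INTEGER affinity compares numerically.
int_rows = conn.execute(query.format("app_int")).fetchall()

print(text_rows)  # [(4, '7654'), (3, '7')]
print(int_rows)   # [(4, 7654), (3, 1000)]
```

Note that for user 3 the TEXT version reports '7' as the latest time, so casting only in the outer ORDER BY would not repair the max() itself. Storing real timestamps in a sortable form (for example ISO 8601 strings or integers) avoids the problem.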
Since you need the newest date in every group, you could just MAX them:
SELECT
*,
COUNT(app.id) AS totalApps,
MAX(app.time) AS latestDate
FROM users
JOIN app ON app.id = users.id
GROUP BY app.id
ORDER BY latestDate DESC
LIMIT ?
You could use windowed COUNT:
SELECT *, COUNT(app.id) OVER(PARTITION BY app.id) AS totalApps
FROM users
JOIN app
ON app.id = users.id
ORDER BY app.time DESC
LIMIT ?
Maybe you could use?
SELECT DISTINCT
Read more here: https://www.w3schools.com/sql/sql_distinct.asp
Try grouping by id and time and then ordering by time.
select ...
group by app.id desc, app.time
I assume that id is unique in the app table.
Also, how do you assign the ID? Maybe it is enough to order by id desc.

SELECT list expression references column user_id which is neither grouped nor aggregated at [8:5]

I have 2 data sets. One of all patients who got ill (endo-2) and one of a special group of patients that also exists in endo-2 called "xp-56"
I've been trying to run this query and I'm not sure why it isn't working. I want to do counts of 3 columns in endo-2 of those patients that belong in the xp-56 table.
This is the code I've been using, and it gives the following error:
SELECT list expression references column user_id which is neither grouped nor aggregated at [8:5]
How do I fix this so I never make the same mistake again?
SELECT
Virus_Exposure,
Medical_Delivery,
Number_of_Site
FROM
(
SELECT
medical_id,
COUNT(DISTINCT Virus_id) AS Virus_Exposure,
COUNT(EndoCrin_id) AS Medical_Delivery,
COUNT (site_id_clinic) AS Number_of_Site
FROM
`endo-2`
WHERE
_PARTITIONTIME BETWEEN TIMESTAMP("2017-12-15")
AND TIMESTAMP("2018-01-10")) AS a
RIGHT JOIN
(
SELECT
medical_id
FROM
`xp-56`
ORDER BY
medical_id DESC) AS b
ON
a.medical_id=b.medical_id
GROUP BY
medical_id
Why doesn't the medical_id in table a work?
Why not just do this?
SELECT e.medical_id,
COUNT(DISTINCT e.Virus_id) AS Virus_Exposure,
COUNT(e.EndoCrin_id) AS Medical_Delivery,
COUNT(e.site_id_clinic) AS Number_of_Site
FROM `endo-2` e JOIN
`xp-56` x
ON x.medical_id = e.medical_id
WHERE e._PARTITIONTIME BETWEEN TIMESTAMP("2017-12-15") AND TIMESTAMP("2018-01-10")
GROUP BY e.medical_id;
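The rule behind the error is that every column in the SELECT list must either appear in the GROUP BY or sit inside an aggregate. In the original query the inner subquery selects medical_id next to three COUNTs without any GROUP BY of its own. Here is a small sketch of the join-then-group shape, run through Python's sqlite3 module. The table names endo_2 and xp_56 and all rows are made up for illustration, and the _PARTITIONTIME filter is omitted because it is specific to BigQuery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for `endo-2` and `xp-56`
    CREATE TABLE endo_2 (medical_id INTEGER, virus_id INTEGER,
                         endocrin_id INTEGER, site_id_clinic INTEGER);
    CREATE TABLE xp_56 (medical_id INTEGER);
    INSERT INTO endo_2 VALUES
        (1, 10, 100, 5), (1, 10, 101, 5), (1, 11, NULL, 6),
        (2, 12, 102, 7),
        (3, 13, 103, 8);
    INSERT INTO xp_56 VALUES (1), (3);
""")

# medical_id is the only non-aggregated column, and it is in the GROUP BY.
rows = conn.execute("""
    SELECT e.medical_id,
           COUNT(DISTINCT e.virus_id) AS virus_exposure,
           COUNT(e.endocrin_id)       AS medical_delivery,  -- NULLs are not counted
           COUNT(e.site_id_clinic)    AS number_of_site
    FROM endo_2 e
    JOIN xp_56 x ON x.medical_id = e.medical_id
    GROUP BY e.medical_id
    ORDER BY e.medical_id
""").fetchall()

print(rows)  # [(1, 2, 2, 3), (3, 1, 1, 1)]
```

Patient 2 is not in xp_56, so the inner join drops it before the counts are taken.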

Join results from distinct query with other results in oracle

I have a requirement where I need to select the most recent occurrences of unique_customer_id and message_id, with no more than one customer row per message id. I also need to return the associated data for channel and reason_code, but these can and will be duplicate data. I also have a unique_row_id I can use if I need to. How can I add those 2 fields to my current query (or do it some other way altogether)?
SELECT DISTINCT unique_customer_id, message_id, MAX(date)
FROM Table1
GROUP BY unique_customer_id, message_id
If you need to add two more columns, and you know they will duplicate the rows (because for a unique_customer_id, message_id combination there will be more than 1 value in channel and reason_code), you have to use an aggregate function on those columns. The question is, which one? How will you determine which channel and which reason code should be selected?
Example:
SELECT DISTINCT unique_customer_id, message_id, MAX(channel), MAX(reason_code), MAX(date)
FROM Table1
GROUP BY unique_customer_id, message_id;
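Keep in mind that MAX(channel) and MAX(reason_code) are evaluated independently, so the result can mix values from different rows. If the channel and reason_code should come from the most recent row, one option is to join the table back to the grouped result. Below is a sketch using Python's sqlite3 module; the column name msg_date (standing in for date) and the sample rows are assumptions for illustration, and the same SQL shape should carry over to Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (unique_row_id INTEGER, unique_customer_id TEXT,
                         message_id TEXT, channel TEXT, reason_code TEXT,
                         msg_date TEXT);  -- msg_date is a hypothetical name
    INSERT INTO table1 VALUES
        (1, 'c1', 'm1', 'email', 'R1', '2021-01-01'),
        (2, 'c1', 'm1', 'app',   'R0', '2021-02-01'),
        (3, 'c2', 'm1', 'push',  'R2', '2021-01-15');
""")

# Independent aggregates: channel and reason_code may come from an older row.
mixed = conn.execute("""
    SELECT unique_customer_id, message_id,
           MAX(channel), MAX(reason_code), MAX(msg_date)
    FROM table1
    GROUP BY unique_customer_id, message_id
    ORDER BY unique_customer_id
""").fetchall()

# Join back to the per-group maximum date to pick the whole latest row.
latest = conn.execute("""
    SELECT t.unique_customer_id, t.message_id,
           t.channel, t.reason_code, t.msg_date
    FROM table1 t
    JOIN (SELECT unique_customer_id, message_id, MAX(msg_date) AS max_date
          FROM table1
          GROUP BY unique_customer_id, message_id) m
      ON m.unique_customer_id = t.unique_customer_id
     AND m.message_id = t.message_id
     AND m.max_date = t.msg_date
    ORDER BY t.unique_customer_id
""").fetchall()

print(mixed[0])   # ('c1', 'm1', 'email', 'R1', '2021-02-01')
print(latest[0])  # ('c1', 'm1', 'app', 'R0', '2021-02-01')
```

If two rows in a group share the same latest date, the join returns both; unique_row_id could be used as a tie-breaker in that case.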

Order By clause on an aggregated SQL request

Let's say my table schema is like below (it's only a simplified example):
MyTable (table name)
ID - int (unique, auto increment)
Message - string
Timestamp - Datetime
I want to select the number of ID, group them by message and order them by timestamp, so I'll do something like this:
SELECT count (ID), Message FROM MyTable
GROUP BY (Message)
ORDER BY Timestamp desc
However, SQL Server management studio throws me this error:
Column 'Timestamp ' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
The problem is that if I put Timestamp in the Group By statement with Message, it messes up my grouping. The other suggestion to put Timestamp in an aggregate function doesn't make sense (ordering by say, count(Timestamp) doesn't mean anything...)
Any idea on how to do this?
Thanks a lot!
Looking for something like this?
SELECT Message, count (ID), max(Timestamp) as maxDate FROM MyTable
GROUP BY (Message)
ORDER BY maxDate desc
When you do aggregation, you are GROUPing rows together based on certain criteria. This means that each row of your result set actually represents multiple rows in the raw data.
When you want to ORDER BY Timestamp, there will be MULTIPLE timestamp values for each row in the result set, since each one of those rows represents several rows of data.
So, you need to decide which timestamp you want for each set. The MAX? The MIN? You will need to aggregate that field as well to get accurate or meaningful results.
Let's say the same message is in your table multiple times:
1|the mackerel likes frying|1/1/1917
2|at night all cats are grey|12/15/1956
3|the mackerel likes frying|2/2/1918
And you want to group by the message-string, counting the number of times the message appears in the table:
the mackerel likes frying|2
at night all cats are grey|1
The timestamp column is NOT part of the aggregation aka the grouping, but is part of the detail row. It CANNOT appear in the grouping, because timestamp is not "it" (singular) but timestamps, they, plural. There are two different timestamps in the example above for the mackerel message. Which one would you choose? How would the query know which one it was? All you have at your disposal are the aggregate functions:
min(timestamp)
max(timestamp)
count(timestamp)
and if it were other than a datetime, you'd also have AVG(timestamp).
If you want to order the messages based on the max timestamp within the Group then try:
SELECT count (ID), Message
FROM MyTable
GROUP BY (Message)
ORDER BY MAX(Timestamp) DESC
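Here is a quick way to try this on the mackerel rows from the example above, sketched with Python's sqlite3 module rather than SQL Server. The table name my_table is a stand-in, and the dates are rewritten as ISO strings so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id INTEGER PRIMARY KEY, message TEXT, timestamp TEXT);
    INSERT INTO my_table (message, timestamp) VALUES
        ('the mackerel likes frying',  '1917-01-01'),
        ('at night all cats are grey', '1956-12-15'),
        ('the mackerel likes frying',  '1918-02-02');
""")

# One row per message, ordered by the newest timestamp inside each group.
rows = conn.execute("""
    SELECT COUNT(id), message, MAX(timestamp) AS max_date
    FROM my_table
    GROUP BY message
    ORDER BY MAX(timestamp) DESC
""").fetchall()

for row in rows:
    print(row)
# (1, 'at night all cats are grey', '1956-12-15')
# (2, 'the mackerel likes frying', '1918-02-02')
```

The mackerel message is counted twice but sorted by its later timestamp, 1918, which is the choice that MAX makes explicit.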
The problem here is that you probably have multiple messages that are the same, but with different timestamps, because you're grouping by message. If you have two messages 'hello' with different timestamps, which should it use for the order by?
This is one way. You could also do a trick with cross apply or row_number.
SELECT count(ID), Message FROM MyTable
GROUP BY (Message)
ORDER BY Max(Timestamp) desc

Using a DISTINCT clause to filter data but still pull other fields that are not DISTINCT

I am trying to write a query in Postgresql that pulls a set of ordered data and filters it by a distinct field. I also need to pull several other fields from the same table row, but they need to be left out of the distinct evaluation. example:
SELECT DISTINCT(user_id) user_id,
created_at
FROM creations
ORDER BY created_at
LIMIT 20
I need the user_id to be DISTINCT, but don't care whether the created_at date is unique or not. Because the created_at date is being included in the evaluation, I am getting duplicate user_id in my result set.
Also, the data must be ordered by the date, so using DISTINCT ON is not an option here. It required that the DISTINCT ON field be the first field in the ORDER BY clause and that does not deliver the results that I seek.
How do I properly use the DISTINCT clause but limit its scope to only one field while still selecting other fields?
As you've discovered, standard SQL treats DISTINCT as applying to the whole select-list, not just one column or a few columns. The reason for this is that it's ambiguous what value to put in the columns you exclude from the DISTINCT. For the same reason, standard SQL doesn't allow you to have ambiguous columns in a query with GROUP BY.
But PostgreSQL has a nonstandard extension to SQL to allow for what you're asking: DISTINCT ON (expr).
SELECT DISTINCT ON (user_id) user_id, created_at
FROM creations
ORDER BY user_id, created_at
LIMIT 20
You have to include the distinct expression(s) as the leftmost part of your ORDER BY clause.
See the manual on DISTINCT Clause for more information.
If you want the most recent created_at for each user then I suggest you aggregate like this:
SELECT user_id, MAX(created_at)
FROM creations
WHERE ....
GROUP BY user_id
ORDER BY MAX(created_at) DESC
This will return the most recent created_at for each user_id
If you only want the top 20, then append
LIMIT 20
EDIT: This is basically the same thing Unreason said above... define from which row you want the data by aggregation.
The GROUP BY should ensure distinct values of the grouped columns, this might give you what you are after.
(Note I'm putting in my 2 cents even though I am not familiar with PostgreSQL, but rather MySQL and Oracle)
In MySql
SELECT user_id, created_at
FROM creations
GROUP BY user_id
ORDER BY user_id
In Oracle sqlplus
SELECT user_id, FIRST(created_at)
FROM creations
GROUP BY user_id
ORDER BY user_id
These will give you the user_id followed by the first created_at associated with that user_id. If you want a different created_at, you have the option to substitute FIRST with other functions like AVG, MIN, MAX, or LAST in Oracle. You can also try adding ORDER BY on other columns (including ones that are not returned) to give you a different created_at.
Your question is not well defined - when you say you need also other data from the same row you are not defining which row.
You do say you need to order the results by created_at, so I will assume that you want values from the row with min created_at (earliest).
This now becomes one of the most common SQL questions on SO: retrieving rows containing some aggregate value (MIN, MAX).
For example
SELECT user_id, MIN(created_at) AS created_at
FROM creations
GROUP BY user_id
ORDER BY MIN(created_at)
LIMIT 20
This approach will not let you (easily) pick other values from the same row.
One approach that will let you pick other values is
SELECT c.user_id, c.created_at, c.other_columns
FROM creations c LEFT JOIN creations c_help
ON c.user_id = c_help.user_id AND c.created_at > c_help.created_at
WHERE c_help IS NULL
ORDER BY c.created_at
LIMIT 20
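The self-join above keeps a row only when no earlier row exists for the same user. A minimal sketch of that idea follows, run through Python's sqlite3 module instead of PostgreSQL. The title column is a hypothetical stand-in for other_columns, the rows are made up, and the NULL check is written against c_help.user_id so that it is portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE creations (user_id INTEGER, created_at TEXT, title TEXT);
    INSERT INTO creations VALUES
        (1, '2020-01-05', 'a'),
        (1, '2020-01-01', 'b'),
        (2, '2020-01-03', 'c'),
        (2, '2020-01-04', 'd');
""")

# c_help matches any earlier row of the same user; if none exists,
# the LEFT JOIN fills c_help with NULLs and the row is the user's earliest.
rows = conn.execute("""
    SELECT c.user_id, c.created_at, c.title
    FROM creations c
    LEFT JOIN creations c_help
           ON c.user_id = c_help.user_id
          AND c.created_at > c_help.created_at
    WHERE c_help.user_id IS NULL
    ORDER BY c.created_at
    LIMIT 20
""").fetchall()

print(rows)  # [(1, '2020-01-01', 'b'), (2, '2020-01-03', 'c')]
```

Each user appears once, the other columns come from that user's earliest row, and the result is still ordered by date. Flipping the comparison to < would select the latest row per user instead.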
Using a sub-query was suggested by someone on the irc #postgresql channel. It worked:
SELECT user_id
FROM (SELECT DISTINCT ON (user_id) * FROM creations) ss
ORDER BY created_at DESC
LIMIT 20;