Running a query without GROUP BY and aggregation functions - sql

Let's say that you are limited to this syntax (you can't use aggregate functions like MAX or MIN, and you can't use GROUP BY clauses either; please don't ask me why):
{SQL query} ::= SELECT [DISTINCT | ALL] [TOP {integer}]
{select_list}
FROM {table_reference}
[WHERE {search_condition}]
[ORDER BY {orderby} { ',' {orderby} } ]
Let's say that we have an ITEM table, where the identifier is called ITEM_ID. For a given ITEM_ID you could have many rows with the same ITEM_ID but different SHIP_DATE. How would you write a query to return only the ITEMS with the most recent SHIP_DATE given the previous syntax?
I already tried using TOP N (to retrieve the first row in the result set) combined with an ORDER BY (to sort from the max SHIP_DATE to the min SHIP_DATE). Any ideas or suggestions?
What I tried is something like this:
SELECT TOP N * FROM ITEM WHERE ITEM_ID='X' ORDER BY SHIP_DATE DESC
Actually the previous query seems to be working, but I'm wondering if there is a better way to do it.
This is not homework, I need to create a query using the supported FileNet P8 syntax: http://publib.boulder.ibm.com/infocenter/p8docs/v4r5m1/index.jsp?topic=/com.ibm.p8.doc/developer_help/content_engine_api/guide/query_sql_syntax_ref.htm

Assuming that ITEM.ship_date doesn't fall on the same date for any other ITEM record:
SELECT TOP x
i.item_id,
i.ship_date
FROM ITEM i
ORDER BY i.ship_date DESC
...where x is the number of unique/distinct ITEM.item_id records. How you'd get that without being able to use COUNT, though...
You have to use ORDER BY i.ship_date DESC in order to have the most recent ship_date records at the top of the result set.

You just want the one item with the absolute latest ship_date? If so then your query is exactly right:
SELECT TOP 1 * FROM ITEM WHERE ITEM_ID='X' ORDER BY SHIP_DATE DESC
The only problem might be that if SHIP_DATE is something coarse (like a calendar date) rather than fine (like a datetime), then you could have ties. In that case, your database engine is just going to pick one of them, and you won't know in advance which one it will pick. So you might want to consider adding another column to your ORDER BY just to make it deterministic.
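A minimal sketch of the tie-breaker idea, run against an in-memory SQLite database from Python. The row_id column and the sample rows are assumptions made up for illustration, and SQLite has no TOP, so LIMIT 1 stands in for TOP 1 here.

```python
import sqlite3

# Hypothetical ITEM table; row_id is an assumed unique column used as the tie-breaker.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (row_id INTEGER PRIMARY KEY, item_id TEXT, ship_date TEXT)"
)
conn.executemany(
    "INSERT INTO item (row_id, item_id, ship_date) VALUES (?, ?, ?)",
    [
        (1, "X", "2010-01-05"),
        (2, "X", "2010-03-01"),
        (3, "X", "2010-03-01"),  # ties with row 2 on ship_date
        (4, "Y", "2010-04-01"),
    ],
)

# LIMIT 1 plays the role of TOP 1. The second sort key (row_id DESC)
# decides between the two rows that share the latest ship_date.
latest_x = conn.execute(
    "SELECT row_id, item_id, ship_date FROM item "
    "WHERE item_id = ? "
    "ORDER BY ship_date DESC, row_id DESC "
    "LIMIT 1",
    ("X",),
).fetchone()

print(latest_x)
```

With only ship_date in the ORDER BY, either row 2 or row 3 could legitimately come back; the extra key removes that ambiguity.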

select distinct item_id from item
where ship_date =
(select top 1 ship_date from item order by ship_date desc)

Is your syntax as strict as it seems there, or can distinct versus all be specified on a per-column basis? For instance, could you do something like this:
SELECT ship_date, DISTINCT item_id FROM tablename ORDER BY ship_date;
and if so, does that work?
If it doesn't, are you able to nest queries? (i.e. use a subquery in place of table_reference)

Related

getting same top 1 result in sql server

I have this query:
SELECT
IT_approvaldate
FROM
t_item
WHERE
IT_certID_fk_ind = (SELECT DISTINCT TOP 1 IT_certID_fk_ind
FROM t_item
WHERE IT_rfileID_fk = '4876')
ORDER BY
IT_typesort
Result when running this query:
I need to get the first result (2013-04-27 00:00:00). The problem is that when I select TOP 1, I get the second result instead.
I believe the reason is that the ORDER BY column has the same value in both rows.
However, I need only the top 1 IT_approvaldate value as the result of my query.
How can I do this? Can anyone help me solve this?
Use the query below and check:
SELECT IT_approvaldate FROM t_item WHERE IT_certID_fk_ind =(SELECT DISTINCT top 1 IT_certID_fk_ind FROM t_item WHERE IT_rfileID_fk ='4876' ) and IT_approvaldate is not null ORDER BY IT_typesort
This will remove NULL values from the result.
If you want NULL to be the last value in the sorted list, you can use ISNULL in the ORDER BY clause to replace NULL with the maximum DATETIME value.
The code below might help:
SELECT TOP 1 IT_approvaldate
FROM t_item
WHERE IT_certID_fk_ind = (SELECT DISTINCT top 1 IT_certID_fk_ind FROM t_item WHERE IT_rfileID_fk ='4876' )
ORDER BY IT_typesort ASC, ISNULL(IT_approvaldate,'12-31-9999 23:59:59') ASC;
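To see the effect of replacing NULL in the sort key, here is a small sketch using SQLite from Python. The table layout and rows are invented for illustration, and the standard COALESCE function stands in for T-SQL's ISNULL (SQLite, like SQL Server, sorts NULL before other values in ascending order).

```python
import sqlite3

# Hypothetical cut-down t_item table with two rows tied on it_typesort,
# one of which has a NULL approval date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_item (it_typesort INTEGER, it_approvaldate TEXT)")
conn.executemany(
    "INSERT INTO t_item VALUES (?, ?)",
    [(1, None), (1, "2013-04-27 00:00:00"), (2, "2013-05-01 00:00:00")],
)

# Sorting on the raw column: NULL sorts first, so the NULL row wins.
naive = conn.execute(
    "SELECT it_approvaldate FROM t_item "
    "ORDER BY it_typesort ASC, it_approvaldate ASC LIMIT 1"
).fetchone()

# Replacing NULL with a far-future value pushes the NULL row to the end
# of its it_typesort group.
nulls_last = conn.execute(
    "SELECT it_approvaldate FROM t_item "
    "ORDER BY it_typesort ASC, "
    "COALESCE(it_approvaldate, '9999-12-31 23:59:59') ASC LIMIT 1"
).fetchone()

print(naive, nulls_last)
```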
T-SQL SELECT queries are not inherently deterministic. You must add a tie-breaker, or order by another column whose values are not tied.
The theory is that SQL Server will not presume that the NULL value is greater or lesser than your row, and because your SELECT list is not logically evaluated until after your HAVING clause, the order depends on how the database is set up.
Understand that SQL Server may not necessarily choose the same path twice unless it thinks it is absolutely better. This is the reason for the ORDER BY clause, which will treat NULLs consistently (assuming there is a unique grouping).
UPDATE:
It seemed a good idea to add a link to MSDN's documentation on the ORDER BY. Truly, it is good practice to start from the Standard/MSDN. ORDER BY Clause - MSDN

Select Distinct On while Order By a different column

I'm coming from a MySQL background, where GROUP BY worked very differently than in Postgres. In Postgres - and apparently any standards-based SQL database - you have to group by all selected columns, while in MySQL you can handpick which ones to group by.
I read that you can get an equivalent effect with DISTINCT ON, and for the most part that's the case. The hitch is that you have to ORDER BY all the distinct columns, and this ordering has to be the left-most ordering. That's a problem when I want to order primarily by another column.
Right now my query looks like this:
SELECT
DISTINCT ON (eventable_id, eventable_type)
events.eventable_id, events.eventable_type, events.*
FROM events
WHERE <query>
ORDER BY eventable_id, eventable_type, events.created_at DESC
I would like to swap around the order by to look like this:
ORDER BY events.created_at, eventable_id, eventable_type DESC
Any advice for getting this to work?
Since you are selecting events.*, you shouldn't add eventable_id, and eventable_type to the output columns redundantly. Would result in duplicate column names. You know that you don't have to include the columns in the DISTINCT ON clause in target list, right?
Also, it's probably faster to use eventable_type DESC right away, since you have that in your final sort order. That's allowed, too.
SELECT DISTINCT ON (eventable_id, eventable_type)
*
FROM events
WHERE <condition>
ORDER BY eventable_id, eventable_type DESC, created_at DESC
@Denis already covers the rest: make that a subquery and order as you like in the outer query.
The alternative would be a subselect with GROUP BY and max(), but that yields multiple rows per group when the latest created_at per group is not unique. (May or may not be desirable.) And it's probably still slower than DISTINCT ON with an additional ORDER BY step. Test with EXPLAIN ANALYZE.
SELECT e.*
FROM events e
JOIN (
SELECT eventable_id, eventable_type, max(created_at) AS created_at
FROM events
WHERE <condition>
GROUP BY 1, 2
) sub USING (eventable_id, eventable_type, created_at) -- maybe not unique
WHERE <repeat condition if dupes may be eliminated>
ORDER BY e.created_at, e.eventable_id, e.eventable_type DESC
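The GROUP BY and max() alternative above can be tried out in SQLite from Python. This is only a sketch: the id column and the sample rows are assumptions, the WHERE conditions are left out, and the join is written with an explicit ON instead of USING. It shows the caveat mentioned above: a tie on the latest created_at yields more than one row for that group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "id INTEGER PRIMARY KEY, eventable_id INTEGER, "
    "eventable_type TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Post", "2020-01-01"),
        (2, 1, "Post", "2020-01-03"),
        (3, 1, "Post", "2020-01-03"),  # ties with id 2 on the latest created_at
        (4, 2, "Post", "2020-01-02"),
    ],
)

# Join each event to its group's latest created_at, then sort the
# survivors by created_at in the outer query.
ids = [
    row[0]
    for row in conn.execute(
        "SELECT e.id FROM events e "
        "JOIN (SELECT eventable_id, eventable_type, max(created_at) AS created_at "
        "      FROM events GROUP BY eventable_id, eventable_type) sub "
        "  ON e.eventable_id = sub.eventable_id "
        " AND e.eventable_type = sub.eventable_type "
        " AND e.created_at = sub.created_at "
        "ORDER BY e.created_at, e.id"
    )
]

print(ids)
```

Group (1, 'Post') contributes both id 2 and id 3, whereas DISTINCT ON would keep exactly one row per group.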
If Postgres complains, use a subselect:
select * from ( ... ) q order by ...
(If it does, though, I'd take it as a hint that the query plan will suck.)

opposite to top

I'm working with sql server 2005.
I have a view which sorts its columns according to order date. I call:
SELECT TOP 1 [OrderDate]
FROM [ordersview]
to get the latest time. How do I get the earliest time?
SELECT TOP 1 OrderDate FROM ordersview ORDER BY OrderDate ASC
Also:
SELECT MIN(OrderDate) FROM ordersview
Use an ascending ordering to get the earliest date (a descending one returns the latest):
select top 1 OrderDate from ordersview order by OrderDate asc
I think this is a slightly tricky question. Everybody will say that the opposite of white is black, and that the opposite of first is last. But when you don't specify an initial order, what really is the first?
I think it is an internal, vendor-specific thing, so both answers are right but don't really answer your question. I'm not really an MSSQL guy, but I think your select will return a random row (maybe depending on insertion sequence, or some internal DB thing like a row ID). And what is the opposite of random?
One more thing: ordering is a pretty demanding operation in terms of resources and performance, so for something like this you should have an index on the column. And basically, when you write a select like that, you should think about real paging, not just a single item. But then the result will have a different order than the original one (so ...)
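As a quick check of how the explicit orderings relate to MIN, here is a sketch in Python with SQLite. The orders table and its rows are made up for illustration (the question uses a view called ordersview), and LIMIT 1 stands in for TOP 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders (order_date) VALUES (?)",
    [("2005-03-10",), ("2005-01-15",), ("2005-07-22",)],
)

# Ascending order puts the earliest date first, descending the latest.
earliest_by_sort = conn.execute(
    "SELECT order_date FROM orders ORDER BY order_date ASC LIMIT 1"
).fetchone()[0]
latest_by_sort = conn.execute(
    "SELECT order_date FROM orders ORDER BY order_date DESC LIMIT 1"
).fetchone()[0]

# MIN gives the same answer as the ascending sort without relying on row order.
earliest_by_min = conn.execute("SELECT MIN(order_date) FROM orders").fetchone()[0]

print(earliest_by_sort, latest_by_sort, earliest_by_min)
```

Without any ORDER BY, which row comes first is up to the engine, so stating the direction explicitly is the safer habit.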

GROUP BY / aggregate function confusion in SQL

I need a bit of help straightening something out. I know it's a very easy question, but it's something that is slightly confusing me in SQL.
This SQL query throws a 'not a GROUP BY expression' error in Oracle. I understand why, as I know that once I group by an attribute of a tuple, I can no longer access any other attribute.
SELECT *
FROM order_details
GROUP BY order_no
However this one does work
SELECT SUM(order_price)
FROM order_details
GROUP BY order_no
Just to cement my understanding of this: assuming that there are multiple tuples in order_details for each order that is made, once I group the tuples according to order_no, I can still access the order_price attribute for each individual tuple in the group, but only using an aggregate function?
In other words, aggregate functions when used in the SELECT clause are able to drill down into the group to see the 'hidden' attributes, where simply using 'SELECT order_no' will throw an error?
In standard SQL (but not MySQL), when you use GROUP BY, you must list all the result columns that are not aggregates in the GROUP BY clause. So, if order_details has 6 columns, then you must list all 6 columns (by name - you can't use * in the GROUP BY or ORDER BY clauses) in the GROUP BY clause.
You can also do:
SELECT order_no, SUM(order_price)
FROM order_details
GROUP BY order_no;
That will work because all the non-aggregate columns are listed in the GROUP BY clause.
You could do something like:
SELECT order_no, order_price, MAX(order_item)
FROM order_details
GROUP BY order_no, order_price;
This query isn't really meaningful (or most probably isn't meaningful), but it will 'work'. It will list each separate order number and order price combination, and will give the maximum order item (number) associated with that price. If all the items in an order have distinct prices, you'll end up with groups of one row each. OTOH, if there are several items in the order at the same price (say £0.99 each), then it will group those together and return the maximum order item number at that price. (I'm assuming the table has a primary key on (order_no, order_item) where the first item in the order has order_item = 1, the second item is 2, etc.)
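The two working queries above can be run on a small made-up order_details table, here using SQLite from Python. The rows are assumptions, and prices are stored as whole pence to keep the sums exact. Note that SQLite is more lenient than Oracle about non-grouped columns, so this sketch only shows the valid forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_details ("
    "order_no INTEGER, order_item INTEGER, order_price INTEGER, "
    "PRIMARY KEY (order_no, order_item))"
)
conn.executemany(
    "INSERT INTO order_details VALUES (?, ?, ?)",
    [(1, 1, 99), (1, 2, 99), (1, 3, 500), (2, 1, 250)],
)

# One row per order: the only non-aggregate column is the grouping column.
totals = conn.execute(
    "SELECT order_no, SUM(order_price) FROM order_details "
    "GROUP BY order_no ORDER BY order_no"
).fetchall()

# One row per (order, price) pair, with the highest item number at that price.
per_price = conn.execute(
    "SELECT order_no, order_price, MAX(order_item) FROM order_details "
    "GROUP BY order_no, order_price ORDER BY order_no, order_price"
).fetchall()

print(totals)
print(per_price)
```

The two 99p items in order 1 collapse into a single group whose MAX(order_item) is 2, which matches the behaviour described above.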
The order in which SQL is written is not the same order it is executed.
Normally, you would write SQL like this:
SELECT
FROM
JOIN
WHERE
GROUP BY
HAVING
ORDER BY
Under the hood, SQL is executed like this:
FROM
JOIN
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
The reason you need to put every non-aggregate SELECT column in the GROUP BY is this top-down order of evaluation: you cannot refer to something that has not been declared yet.
Read more: https://sqlbolt.com/lesson/select_queries_order_of_execution
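One way to observe the logical order listed above is to combine WHERE, GROUP BY, HAVING and ORDER BY in one query. The sketch below uses SQLite from Python with an invented order_details table (prices in whole pence): WHERE drops rows before grouping, HAVING filters the finished groups, and ORDER BY can use the SELECT alias because it runs last.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_details (order_no INTEGER, order_item INTEGER, order_price INTEGER)"
)
conn.executemany(
    "INSERT INTO order_details VALUES (?, ?, ?)",
    [(1, 1, 99), (1, 2, 99), (1, 3, 500), (2, 1, 250)],
)

rows = conn.execute(
    "SELECT order_no, SUM(order_price) AS total "
    "FROM order_details "
    "WHERE order_price > 100 "        # row filter, applied before grouping
    "GROUP BY order_no "
    "HAVING SUM(order_price) > 300 "  # group filter, applied after grouping
    "ORDER BY total DESC"             # alias from SELECT is visible here
).fetchall()

print(rows)
```

Order 1 totals 500 rather than 698 because the two 99p rows were removed by WHERE before the SUM was computed, and order 2 (250) is then dropped by HAVING.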
SELECT *
FROM order_details
GROUP BY order_no
In the above query you are selecting all the columns, which is why it throws the 'not a GROUP BY expression' error.
To avoid that, every column in the SELECT list must also appear in the GROUP BY clause:
SELECT *
FROM order_details
GROUP BY order_no,order_details,etc
Here etc stands for all the remaining columns of the order_details table.
To use a GROUP BY clause, you have to list every column from the SELECT statement in the GROUP BY clause, except the columns used inside aggregate functions.
Instead of GROUP BY, you can use a PARTITION BY clause, which lets you partition by just one column.
You can also write it as PARTITION BY 1.
Use a common table expression (CTE) to avoid this issue.
Multiple CTEs also come in handy; here is a case where I have used them, which may be helpful:
with ranked_cte1 as
( select r.mov_id,DENSE_RANK() over ( order by r.rev_stars desc )as rankked from ratings r ),
ranked_cte2 as ( select * from movie where mov_id=(select mov_id from ranked_cte1 where rankked=7 ) ) select * from ranked_cte2
select * from movie where mov_id=902
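A minimal sketch of the PARTITION BY suggestion, using SQLite from Python. It assumes a SQLite build with window-function support (3.25 or later) and an invented order_details table with prices in whole pence. Unlike GROUP BY, the window form keeps every row and attaches the per-order total to each one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_details (order_no INTEGER, order_item INTEGER, order_price INTEGER)"
)
conn.executemany(
    "INSERT INTO order_details VALUES (?, ?, ?)",
    [(1, 1, 99), (1, 2, 99), (1, 3, 500), (2, 1, 250)],
)

# SUM(...) OVER (PARTITION BY order_no) computes one total per order
# but does not collapse the rows, so the other columns stay selectable.
rows = conn.execute(
    "SELECT order_no, order_item, order_price, "
    "       SUM(order_price) OVER (PARTITION BY order_no) AS order_total "
    "FROM order_details "
    "ORDER BY order_no, order_item"
).fetchall()

for row in rows:
    print(row)
```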

Can I group by something that isn't in the SELECT line?

Given a command in SQL;
SELECT ...
FROM ...
GROUP BY ...
Can I group by something that isn't in the SELECT line?
Yes.
This is often used in the superaggregate queries like this:
SELECT AVG(cnt)
FROM (
SELECT COUNT(*) AS cnt
FROM sales
GROUP BY
product
HAVING COUNT(*) > 10
) q
, which aggregate the aggregates.
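The same shape of query can be run on a toy sales table in SQLite from Python. The rows are made up, and the HAVING threshold is lowered from 10 to 2 so that the small sample still has qualifying groups. The product column is grouped on but never selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?)",
    [("a",)] * 3 + [("b",)] * 5 + [("c",)],
)

# Inner query: one count per product, keeping only products with more
# than 2 sales. Outer query: average of those counts.
avg_cnt = conn.execute(
    "SELECT AVG(cnt) FROM ("
    "  SELECT COUNT(*) AS cnt FROM sales "
    "  GROUP BY product "
    "  HAVING COUNT(*) > 2"
    ") q"
).fetchone()[0]

print(avg_cnt)
```

Products a and b qualify with counts 3 and 5, so the average of the counts is 4.0; product c is filtered out by HAVING.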
Yes of course e.g.
select
count(*)
from
some_table_with_updated_column
group by
trunc(updated, 'MM.YYYY')
Yes you can do it, but if you do that you won't be able to tell which result is for which group.
As a result, you almost always want to return the columns you've grouped by in the select clause. But you don't have to.
Yes, you can. Example:
select count(1)
from sales
group by salesman_id
What you can't do, of course, is have something in your select clause (other than aggregate functions) that is not part of the group by clause.
Hmm, I think the question should have been asked the other way round:
Can I SELECT something that is not in the GROUP BY?
It's alright to write code like:
SELECT customerId, count(orderId) FROM orders
GROUP BY customerId, orderedOn
If you want to find out the number of orders done by a customer datewise.
But you cannot do it the other way round:
SELECT customerId, orderedOn, count(orderId) FROM orders
GROUP BY customerId
You can apply an aggregate function to a column that is not in the GROUP BY, but you cannot put that column in the select line without an aggregate function, because it would not make much sense. Take the query above: you group by customerId alone to count orders, yet you also want the date printed in the output. If the date plays no part in the grouping used for counting, what would a date in the result mean?
I don't know about other DBMS' but DB2/z, for one, does this just fine. It's not required to have the column in the select portion but, of course, it does have to extract the data from the table in order to aggregate so you're probably not saving any time by leaving it off. You should only select the columns that you need, aggregation of the data is a separate task from that.
I'm pretty certain the SQL standard allows this (although that's only based on the knowledge that the mainframe DB2 product follows it pretty closely).