Partition Over Ordering Effect - sql

Does the Oracle clause OVER(PARTITION BY SUM(some_field))
have an implicit ordering effect and will my result data be sorted by SUM(some_field) without an additional
ORDER BY SUM(some_field) clause?

No. An analytic function in the SELECT statement does not imply any particular ordering of the result. Remember that you can have multiple analytic functions in your query, each of which looks at rows in a different order, so it wouldn't make sense for there to be an implied ordering of the result. If you want your results returned in a specific order, use an ORDER BY clause.
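
A minimal sketch of that point, assuming a hypothetical table sales(region, amount) that is not part of the question. Each analytic function looks at the rows in its own order, and only the final ORDER BY decides how the result comes back:

```sql
-- Hypothetical table: sales(region, amount)
SELECT region,
       amount,
       SUM(amount)  OVER (PARTITION BY region)                 AS region_total,
       RANK()       OVER (ORDER BY amount DESC)                AS amount_rank,
       ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount) AS rn_in_region
FROM   sales
-- Without this clause the output order is whatever the optimizer happens to produce.
ORDER  BY region_total DESC, region, amount;
```

The three OVER clauses disagree about row order, so none of them could sensibly define the order of the whole result.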

Related

Does adding ROW_NUMBER to query get sorted automatically?

It seems that if I add ROW_NUMBER to a simple select query, the results are sorted automatically by the ROW_NUMBER column, even without an ORDER BY at the end of the query. I tried this:
without ROW_NUMBER - results are in random order
with ROW_NUMBER over (order by some_col) - results automatically ordered by this ROW_NUMBER column
with ROW_NUMBER over (order by some_col desc) - results again automatically ordered by this ROW_NUMBER column, reflecting the new direction
Why is it behaving like this? Is there an implicit order by when using ROW_NUMBER?
If this is vendor specific, I was testing on MSSQL2014
The ORDER BY Clause in the window function only controls the order of the rows considered for the window function. It does not guarantee the final result set order.
Over clause
Note: The ORDER BY clause in the OVER clause only controls the order that the rows in the partition will be utilized by the window
function. It does not control the order of the final result set.
Without an ORDER BY clause on the query itself, the order of the rows
is not guaranteed. You may notice that your query may be returning in
the order of the last specified OVER clause – this is due to the way
that this is currently implemented in SQL Server. If the SQL Server
team at Microsoft changes the way that it works, it may no longer
order your results in the manner that you are currently observing. If
you need a specific order for the result set, you must provide an
ORDER BY clause against the query itself.
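
As a sketch of what that looks like in practice, assuming a hypothetical table dbo.Orders(OrderId, OrderDate) that does not appear in the question: the OVER clause numbers the rows, and a separate ORDER BY on the query itself fixes the order of the result set.

```sql
-- Hypothetical table: dbo.Orders(OrderId, OrderDate)
SELECT OrderId,
       OrderDate,
       ROW_NUMBER() OVER (ORDER BY OrderDate DESC) AS rn  -- only controls the numbering
FROM   dbo.Orders
ORDER  BY rn;                                             -- controls the result order
```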

If you do a simple SELECT-WHERE on a CTE that is already sorted, are your results guaranteed to still be in that same order, just filtered?

Wondering about expected/deterministic ordering output from Oracle 11g for queries based on sorted CTEs.
Consider this example SQL query (extremely oversimplified for the sake of the example). Again, note how the CTE has an ORDER BY clause in it.
WITH SortedArticles as (
SELECT *
FROM Articles
ORDER BY DatePublished
)
SELECT *
FROM SortedArticles
WHERE Author = 'Joe';
Can it be assumed that the outputted rows are guaranteed to be in the same order as the CTE, or do I have to re-sort them a second time?
Again, this is an extremely over-simplified example but it contains the important parts of what I'm asking. They are...
The CTE is sorted
The final SELECT statement selects only against the CTE, nothing else (no joins, etc.), and
The final SELECT statement only specifies a WHERE clause. It is purely a filtering statement.
The short answer is no. The only way to guarantee ordering is with an ORDER BY clause on your outer query. But there is no need to sort the results in the CTE in that situation.
However, if the sort expression is complex, and you need sorting in the derived CTEs (e.g. because of using OFFSET/FETCH or ROWNUM), you could simplify the subsequent sorting by adding a row number field to the original CTE based on its sort criteria and then just sorting the derived CTEs by that row number. For your example:
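WITH SortedArticles AS (
-- Oracle does not accept a bare * followed by other columns, so qualify it with a table alias
SELECT a.*,
ROW_NUMBER() OVER (ORDER BY DatePublished) AS rn
FROM Articles a
)
SELECT *
FROM SortedArticles
WHERE Author = 'Joe'
ORDER BY rn;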
No, the results are not guaranteed to be in the same order as in the subquery. They never were and never will be. You may observe a certain behaviour, especially if the CTE is materialized, which you can try to influence with optimizer hints like /*+ MATERIALIZE */ and /*+ INLINE */. However, the behaviour of the query optimizer also depends on data volume, I/O versus CPU speed, and most importantly on the database version. For instance, Oracle 12.2 introduces a feature called "In-Memory Cursor Duration Temp Table" that tries to speed up queries like yours, without preserving the order in the subquery.
I'd go along with @Nick's suggestion of adding a row number field in the subquery.

exclude a column from group by statement

I would like to exclude a column from group by statement, because it results in some redundant records. Are there any recommendations?
I use Oracle and have a complex query which joins 6 tables together. I want to use the SQL aggregate function COUNT without duplicate results.
You can't.
When using aggregate functions every column/column expression which is not an aggregate must be in the GROUP BY.
This is completely logical. If you're not aggregating the column, then excluding it from the GROUP BY would force Oracle to choose an arbitrary value, which is not very useful.
If you don't want this column in your GROUP BY then you must decide what aggregation to apply to this column in order to return the appropriate data for your situation. You can't hand this responsibility off to the database engine.
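
A hedged sketch of that choice, assuming hypothetical tables customers(customer_id, name) and orders(order_id, customer_id, order_date), since the question does not show its schema. The order_date column is kept out of the GROUP BY by deciding explicitly which value to return for it:

```sql
-- Hypothetical tables: customers(customer_id, name),
--                      orders(order_id, customer_id, order_date)
SELECT c.customer_id,
       c.name,
       COUNT(o.order_id) AS order_count,
       MAX(o.order_date) AS last_order_date  -- aggregated instead of grouped
FROM   customers c
       LEFT JOIN orders o ON o.customer_id = c.customer_id
GROUP  BY c.customer_id, c.name;
```

MAX is only one possible choice here. MIN, or COUNT(DISTINCT ...) when the joins multiply rows, may fit other situations better.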

how to resolve this - group by changes the Order of items in SQL Server

I'm using SQL Server 2014 and fetching data from a view. The order of the items changes once I use GROUP BY. How can I get the original order back after using GROUP BY? There is one date column, but it does not store any time, so I can't sort on the date either.
How can I display the data in the same order as it was displayed before using GROUP BY? If anyone has any idea, please help.
Thanks
Tables and views are essentially unordered sets. To get rows in a specific order, you should always add an ORDER BY clause on the columns you wish to order on.
I'm assuming you previously selected from the VIEW without an ORDER BY clause. The order in which rows are returned from a SELECT statement without an ORDER BY clause is undefined. The order you are getting them in can change for any number of reasons (e.g. some are listed here).
Your problem stems from the mistake of relying on the order of a SELECT from a VIEW without an ORDER BY. You should have had an ORDER BY clause in your SELECT statement to begin with.
How can I display the data in the same order as it displayed before using Group by?
The answer: You can't if your initial statement did not have an ORDER BY clause.
The resolution: Determine the order you want the resultset in and add an ORDER BY clause on those columns, both in your initial query and the version with the GROUP BY clause.
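
A minimal sketch of that resolution, assuming a hypothetical view dbo.vSales(SaleDate, Product, Amount), because the question does not show the real one. Both queries name the same ordering columns, so their row order no longer depends on how the engine executes them:

```sql
-- Hypothetical view: dbo.vSales(SaleDate, Product, Amount)

-- Initial query, now with an explicit order
SELECT SaleDate, Product, Amount
FROM   dbo.vSales
ORDER  BY SaleDate, Product;

-- Grouped version, ordered on the same columns
SELECT SaleDate, Product, SUM(Amount) AS TotalAmount
FROM   dbo.vSales
GROUP  BY SaleDate, Product
ORDER  BY SaleDate, Product;
```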
Maybe you can use the row_number() function without any OVER and ORDER BY keywords? This should be done in a sub-select, and when you group the data in the outer SELECT, use the AVG() function on the numbered column and ORDER the result by this. The problem is that when you group rows, the original rows disappear. That's kind of the purpose of GROUP BY. ;) Depending on what you GROUP BY, what you're asking might be logically impossible.
EDIT:
Found this solution Googling: http://blog.sqlauthority.com/2015/05/05/sql-server-generating-row-number-without-ordering-any-columns/
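So you can number rows like this, without naming an ordering column, before you GROUP BY. Note that the numbers then follow whatever order the engine happens to read the rows in, which is not guaranteed: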
row_number() OVER (ORDER BY (SELECT 1))
The only way you can enforce a specific order is to explicitly use an ORDER BY clause. Otherwise the order of rows is not guaranteed (take a look at this article for more details), and the database engine will return the rows based on an "as fast as it can" or "as fast as it can retrieve them from disk" rule. So the order can also vary between executions of the same query within a span of a few seconds.
When doing a DISTINCT, GROUP BY or ORDER BY, SQL Server may sort the data, often based on an index it uses for that query.
Looking at the execution plan of your query will show you what index (and implicitly columns in that index) is being used to sort the data.

Any reason for GROUP BY clause without aggregation function?

I'm (thoroughly) learning SQL at the moment and came across the GROUP BY clause.
GROUP BY aggregates or groups the resultset according to the argument(s) you give it. If you use this clause in a query you can then perform aggregate functions on the resultset to find statistical information on the resultset like finding averages (AVG()) or frequency (COUNT()).
My question is: is the GROUP BY statement in any way useful without an accompanying aggregate function?
Update
Using GROUP BY as a synonym for DISTINCT is (probably) a bad idea because I suspect it is slower.
is the GROUP BY statement in any way useful without an accompanying aggregate function?
Using DISTINCT would be a synonym in such a situation, but the reason you'd want/have to define a GROUP BY clause would be in order to be able to define HAVING clause details.
If you need to define a HAVING clause, you have to define a GROUP BY - you can't do it in conjunction with DISTINCT.
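
As a sketch of that case, assuming a hypothetical table orders(order_id, customer_id) not taken from the question: the SELECT list contains no aggregate, yet the GROUP BY is still needed so that the HAVING clause can filter whole groups.

```sql
-- Hypothetical table: orders(order_id, customer_id)
-- Customers with more than one order; nothing is aggregated in the SELECT list.
SELECT customer_id
FROM   orders
GROUP  BY customer_id
HAVING COUNT(*) > 1;
```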
You can perform a DISTINCT select by using a GROUP BY without any AGGREGATES.
GROUP BY is mainly used in two ways:
1) in conjunction with SQL aggregate functions
2) to eliminate duplicate rows from a result set
So the answer to your question lies in the second of the uses described above.
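
To illustrate the second use, here is a small sketch assuming a hypothetical table articles(author, title). Both queries return each author exactly once:

```sql
-- Hypothetical table: articles(author, title)
SELECT DISTINCT author
FROM   articles;

-- Same set of rows, expressed with GROUP BY and no aggregate
SELECT author
FROM   articles
GROUP  BY author;
```

Neither form guarantees an output order unless an ORDER BY is added.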
Note: everything below only applies to MySQL
GROUP BY returns results in sorted order (in MySQL versions before 8.0, which removed this implicit sort); DISTINCT does not.
GROUP BY along with ORDER BY NULL has the same efficiency as DISTINCT (and is implemented in the same way). If there is an index on the field being aggregated (or distinctified), both clauses use a loose index scan over this field.
In GROUP BY, you can return non-grouped and non-aggregated expressions. MySQL will pick an arbitrary value from the corresponding group to calculate the expression.
With GROUP BY, you can omit the GROUP BY expressions from the SELECT clause. With DISTINCT, you can't. Every row returned by a DISTINCT is guaranteed to be unique.
It is used for more than just aggregate functions.
For example, consider the following code:
SELECT product_name, MAX(last_purchased) FROM products GROUP BY product_name
This will return only one row per product, with the latest last_purchased value among that product's records.