Order Fields priority - SQL

I am using this query:
select o.orderno,
o.day,
a.name,
o.description,
a.adress,
o.quantity,
a.orderType,
o.status,
a.Line,
a.price,
a.serial
from orders o
inner join account a
on o.orderid=a.orderid
order by o.day
I am ordering by day. After sorting the results by day, which field is considered next? For rows with the same day, what order applies?

There is no further sorting. Within each day you'll get the results in whatever order Oracle happened to retrieve them, which is not guaranteed in any way and can differ between runs of the same query. It depends on many things under the hood that you generally have no control over, or even visibility of. You may see the results in an apparent order that suits you at the moment, but it could change on a future execution. Changing data will affect the execution plan, for example, which can affect the order in which you see the results.
If you need a specific order, or just want them returned in a consistent order every time you run the query, you must specify it in the order by clause.
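For example, to make rows within the same day come back in a deterministic order, add a second sort key to your query; orderno is just one plausible choice here, use whatever column reflects the tiebreak you actually want:
order by o.day, o.orderno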
This is something Tom Kyte often stresses; for example in this often-quoted article.

It tries to order by the unique/primary key; in this case orderno, if that is your primary key.
However, your query is laden with errors; for example, the table aliases are used in the SELECT clause but are not specified in the FROM clause.


GROUP BY clause order omitting results in Oracle 11g query

I have a simple query that appears to give the desired result:
select op.opr, op.last, op.dept, count(*) as counter
from DWRVWR.BCA_M_OPRIDS1 op
where op.opr = '21B'
group by op.opr, op.last, op.dept;
My original query returns no results. The only difference was the order of the group by clause:
select op.opr, op.last, op.dept, count(*) as counter
from DWRVWR.BCA_M_OPRIDS1 op
where op.opr = '21B'
group by op.opr, op.dept, op.last;
In actuality, this was part of a much larger, more complicated query, but I narrowed the problem down to this. All documentation I was able to find states that the order of the group by clause doesn't matter. I really want to understand why I am getting different results, since if there is a potential issue I would have to review all of my queries that use the group by clause. I'm using SQL Developer, if it matters.
Also, if the order of the group by clause did not matter and every field not used in an aggregate function is required to be listed in the group by clause, wouldn't the group by clause simply be redundant and seemingly unnecessary?
All documentation I was able to find states that the order of the group by clause doesn't matter
That's not entirely true; it depends.
The grouping itself is not affected by the order of the columns in the GROUP BY clause: it produces the same groups regardless of the order. Perhaps that is what the documentation you found was referring to. However, the order does matter in other respects.
Before Oracle 10g, GROUP BY performed an implicit ORDER BY, so the order of the columns in the GROUP BY clause did matter: the groups were the same, just ordered differently. Starting with Oracle 10g, if you want the result set in a specific order, you must add an ORDER BY clause. Other databases have a similar history.
Another case where the order matters is when you have indexes on the table. A multi-column index can only be used when the columns match those specified in the GROUP BY or ORDER BY clause, so if you change the order, your query may no longer use the index and will perform differently. The result is the same, but the performance is not.
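For illustration, take a hypothetical sales table with a two-column index (whether the optimizer can still use the index after reordering depends on your database version and statistics):
create index ix_sales_region_product on sales (region, product);
-- Matches the index column order, so the index can drive the grouping:
select region, product, count(*) from sales group by region, product;
-- Reversed column order; the plan, and therefore the performance, may differ:
select region, product, count(*) from sales group by product, region;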
Also the order of the columns in the GROUP BY clause becomes important if you use some features like ROLLUP. This time the results themselves will not be the same.
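For example, with the same hypothetical sales table (adding an amount column), ROLLUP produces different subtotal rows depending on the column order, so the results themselves differ:
select region, product, sum(amount)
from sales
group by rollup (region, product);
-- one subtotal row per region, plus a grand total
select region, product, sum(amount)
from sales
group by rollup (product, region);
-- one subtotal row per product instead, plus a grand total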
As a best practice, list the fields in the GROUP BY clause in the order of the hierarchy. This makes the query more readable and easier to maintain.
Also, if the order of the group by clause did not matter and every field not used in an aggregate function is required to be listed in the group by clause, wouldn't the group by clause simply be redundant and seemingly unnecessary?
No. The GROUP BY clause is mandatory in standard SQL and in Oracle whenever the SELECT list mixes aggregated and non-aggregated expressions. The only case in which you can omit it is when you want the aggregate functions to apply to the entire result set; then your SELECT list must consist only of aggregate expressions.
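For example, this is valid without a GROUP BY because the SELECT list consists only of aggregates; it returns exactly one row for the whole table:
select count(*) as counter, max(op.last) as last_name
from DWRVWR.BCA_M_OPRIDS1 op;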

SQL query multiple columns in SELECT - one needs to be DISTINCT

This is my SQL code:
SELECT
T.Template_Id, T.TemplateName,
CONVERT(NVARCHAR(MAX), T.CatalogDescription),
T.MasterNo, T.Customer_Id,
O.Quantity, O.Cost
FROM
Template as T
INNER JOIN
[Order] AS O ON T.Template_Id = O.Template_Id
ORDER BY
O.Cost
My problem is that none of the fields I'm selecting is unique, and I couldn't find a way to make T.Template_Id DISTINCT. The other columns don't matter, as long as they're present and the T.Template_Id column contains no duplicates.
Other fields don't matter.
If this is really true*, you can do it like this:
SELECT T.Template_Id, MAX(T.TemplateName) As TemplateName,
CONVERT(NVARCHAR(MAX),MAX(T.CatalogDescription)) As CatalogDescription,
MAX(T.MasterNo) As MasterNo, MAX(T.Customer_Id) As CustomerId,
MAX(O.Quantity) As Quantity, MAX(O.Cost) As Cost
FROM Template as T
INNER JOIN [Order] as O ON T.Template_Id=O.Template_Id
GROUP BY T.Template_Id
ORDER BY MAX(O.Cost)
It's somewhat less unusual to see queries where it doesn't matter which Order row the other fields come from, as long as they all come from the same Order record. In that case, you can do it like this:
SELECT T.Template_Id, T.TemplateName,
CONVERT(NVARCHAR(MAX),T.CatalogDescription),
T.MasterNo, T.Customer_Id, O.Quantity, O.Cost
FROM Template as T
CROSS APPLY (SELECT TOP 1 * FROM [Order] WHERE T.Template_Id=[Order].Template_Id) As O
ORDER BY O.Cost
Assuming, of course, that the records within the Template table, at least, are already unique by that ID. This approach has the nice benefit of making it easy to control which order is chosen, simply by adding an ORDER BY clause inside the nested query.
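For instance, to always pick the cheapest matching order for each template (assuming Cost is the column that should decide), a minimal variation:
SELECT T.Template_Id, T.TemplateName,
CONVERT(NVARCHAR(MAX),T.CatalogDescription),
T.MasterNo, T.Customer_Id, O.Quantity, O.Cost
FROM Template as T
CROSS APPLY (SELECT TOP 1 * FROM [Order]
WHERE T.Template_Id=[Order].Template_Id
ORDER BY [Order].Cost) As O
ORDER BY O.Cost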
* Tip: It turns out this is rarely the case. You'll pretty much always find out that it does matter at some point, for at least one of the fields.
Typically you'd GROUP BY that column, but that requires specifying an aggregate function for all the other columns. In your case that may work since you say the other columns don't matter (which makes me wonder why they're being returned).
SELECT T.Template_Id, Max(T.TemplateName),
Max(CONVERT(NVARCHAR(MAX),T.CatalogDescription)), Max(T.MasterNo), Max(T.Customer_Id),
Max(O.Quantity), Max(O.Cost)
FROM Template as T INNER JOIN [Order] as O ON T.Template_Id=O.Template_Id
GROUP BY T.Template_Id
ORDER BY Max(O.Cost)
SQL will not let you aggregate only some of the non-grouped fields in a result set; they must all be aggregated (or appear in the GROUP BY). Engines that do allow it, such as VFP, simply pick an arbitrary row to fill in the other values.
If you are trying to achieve what I believe you are, then you want a list of all distinct values for the one field and just a sample of the other fields.
I have done this before using window functions such as RANK and ROW_NUMBER, depending on exactly what I was trying to accomplish. They also let you choose which sample row you get, for example by ordering by OrderDate DESC to take the sample fields from a customer's most recent order.
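A minimal sketch of that approach against the question's tables (picking the cheapest order per template is an assumption; the ORDER BY inside OVER is whatever defines the sample row you want):
SELECT Template_Id, TemplateName, CatalogDescription, MasterNo, Customer_Id, Quantity, Cost
FROM (SELECT T.Template_Id, T.TemplateName,
CONVERT(NVARCHAR(MAX), T.CatalogDescription) AS CatalogDescription,
T.MasterNo, T.Customer_Id, O.Quantity, O.Cost,
ROW_NUMBER() OVER (PARTITION BY T.Template_Id ORDER BY O.Cost) AS rn
FROM Template AS T
INNER JOIN [Order] AS O ON T.Template_Id = O.Template_Id) AS x
WHERE rn = 1
ORDER BY Cost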

How are records sorted when using a join in psql?

How are records sorted when using a join in psql?
In some cases they seem to be sorted by one of the columns; in other cases they seem to come back in the order in which rows matched the ON condition, but I'm not sure how it's done, especially with right and left joins.
Simply put: what determines the order in which records are displayed?
In general there is no internal order to the records in a SQL table. From the Postgres documentation on ORDER BY:
After a query has produced an output table (after the select list has been processed) it can optionally be sorted. If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on. A particular output ordering can only be guaranteed if the sort step is explicitly chosen.
If you want to have a certain order in your result set you need to specify one using ORDER BY. For example, if you were joining two tables A and B you could use:
select
a.col, a.col2, b.col3
from A a
inner join B b
on a.col = b.col
order by
a.col2 -- or replace with whatever column/logic you want to use
Even if there appears to be some order to your current query, you should not rely on it. Adding more data, adding a column, or doing a vacuum all could cause this "order" to change.

Time based accumulation based on type: Speed considerations in SQL

Based on surfing the web, I came up with two methods of producing a running count of the records in a table "Table1". The counter increments according to a date field "TheDate": it is computed by summing the records with a TheDate value no later than the current record's. Furthermore, records with different values of the compound field (Field1, Field2) are counted with separate counters. Field3 is just an informational field included for added awareness; it does not affect the counting or how records are grouped for counting.
Method 1: Use correlated subquery
SELECT MainQuery.Field1,
MainQuery.Field2,
MainQuery.Field3,
MainQuery.TheDate,
(
SELECT SUM(1) FROM Table1 InnerQuery
WHERE InnerQuery.Field1 = MainQuery.Field1 AND
InnerQuery.Field2 = MainQuery.Field2 AND
InnerQuery.TheDate <= MainQuery.TheDate
) AS RunningCounter
FROM Table1 MainQuery
ORDER BY MainQuery.Field1,
MainQuery.Field2,
MainQuery.TheDate,
MainQuery.Field3
Method 2: Use join and group-by
SELECT MainQuery.Field1,
MainQuery.Field2,
MainQuery.Field3,
MainQuery.TheDate,
SUM(1) AS RunningCounter
FROM Table1 MainQuery INNER JOIN Table1 InnerQuery
ON InnerQuery.Field1 = MainQuery.Field1 AND
InnerQuery.Field2 = MainQuery.Field2 AND
InnerQuery.TheDate <= MainQuery.TheDate
GROUP BY MainQuery.Field1,
MainQuery.Field2,
MainQuery.Field3,
MainQuery.TheDate
ORDER BY MainQuery.Field1,
MainQuery.Field2,
MainQuery.TheDate,
MainQuery.Field3
There is no inner query per se in Method 2, but I use the table alias InnerQuery so that a ready parallel with Method 1 can be drawn. The role is the same: the second instance of Table1 accumulates the counts of records whose TheDate is no later than that of a given record in MainQuery (the first instance of Table1) with the same Field1 and Field2 values.
Note that in Method 2, Field3 is included in the GROUP BY clause even though I said it does not affect how the records are grouped for counting. This is still true, since the counting is done using the matching records in InnerQuery, whereas the GROUP BY applies to Field3 in MainQuery.
I found that Method 1 is noticeably faster. I'm surprised by this because it uses a correlated subquery. The way I think of a correlated subquery is that it is executed once for each record in MainQuery (whether or not that is what happens in practice after optimization). On the other hand, Method 2 doesn't run an inner query over and over again. However, the inner join still matches multiple InnerQuery records to each record in MainQuery, so in a sense it deals with a similar order of complexity.
Is there a decent intuitive explanation for this speed difference, as well as best practices or considerations in choosing an approach for time-based accumulation?
I've posted this to Microsoft Answers and to Stack Exchange.
In fact, I think the easiest way is to do this:
SELECT MainQuery.Field1,
MainQuery.Field2,
MainQuery.Field3,
MainQuery.TheDate,
COUNT(*)
FROM Table1 MainQuery
GROUP BY MainQuery.Field1,
MainQuery.Field2,
MainQuery.Field3,
MainQuery.TheDate
ORDER BY MainQuery.Field1,
MainQuery.Field2,
MainQuery.TheDate,
MainQuery.Field3
(The order by isn't required to get the same data, just to order it. In other words, removing it will not change the number or contents of each row returned, just the order in which they are returned.)
You only need to specify the table once. Doing a self-join (joining a table to itself as both your queries do) is not required. The performance of your two queries will depend on a whole load of things which I don't know - what the primary keys are, the number of rows, how much memory is available, and so on.
First, your experience makes a lot of sense; I'm not sure why you need more intuition. I imagine you learned, somewhere along the way, that correlated subqueries are evil. Well, just as some of the things we teach kids as being really bad ("don't cross the street when the walk sign is not green") turn out to be not so bad, the same is true of correlated subqueries.
The easiest intuition is that the join-and-GROUP-BY version has to aggregate all the data in the table. The correlated version only has to aggregate the matching rows, although it has to do this over and over.
To put numbers to it, say you have 1,000 rows with 10 rows per group. The output is 100 rows. The first version does 100 aggregations of 10 rows each. The second does one aggregation of 1,000 rows. Well, aggregation generally scales in a super-linear fashion (O(n log n), technically). That means that 100 aggregations of 10 records takes less time than 1 aggregation of 1000 records.
You asked for intuition, so the above is to provide some intuition. There are a zillion caveats that go both ways. For instance, the correlated subquery might be able to make better use of indexes for the aggregation. And, the two queries are not equivalent, because the correct join would be LEFT JOIN.
Actually, I was wrong in my original post. The inner join is way, way faster than the correlated subquery. However, the correlated subquery is able to display its results records as they are generated, so it appears faster.
As a side curiosity, I'm finding that if the correlated sub-query approach is modified to use sum(-1) instead of sum(1), the number of returned records seems to vary from N-3 to N (where N is the correct number, i.e., the number of records in Table1). I'm not sure if this is due to some misbehaviour in Access's rush to display initial records or what-not.
While it seems that the INNER JOIN wins hands down, there is a major insidious caveat. If the GROUP BY fields do not uniquely distinguish each record in Table1, then you will not get an individual SUM for each record of Table1. Imagine that a particular combination of GROUP BY field values matches (say) three records in Table1. You will then get a single SUM covering all of them. The problem is that each of those 3 records in MainQuery also matches all 3 of the same records in InnerQuery, so those InnerQuery instances get counted multiple times. Very insidious (I find).
So it seems that the subquery may be the way to go, which is awfully disturbing in view of the repeatability problem described two paragraphs above. That is a serious problem that should send shivers down any spine. Another possible solution I'm looking at is to turn MainQuery into a subquery by SELECTing the fields of interest and DISTINCTifying them before INNER JOINing the result with InnerQuery.
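A sketch of that last idea, using the same made-up field names (the derived table collapses MainQuery to distinct rows before the join, so each combination gets exactly one SUM; whether your Access version accepts a derived table written this way is worth checking):
SELECT MainQuery.Field1,
MainQuery.Field2,
MainQuery.Field3,
MainQuery.TheDate,
SUM(1) AS RunningCounter
FROM (SELECT DISTINCT Field1, Field2, Field3, TheDate FROM Table1) AS MainQuery
INNER JOIN Table1 AS InnerQuery
ON InnerQuery.Field1 = MainQuery.Field1 AND
InnerQuery.Field2 = MainQuery.Field2 AND
InnerQuery.TheDate <= MainQuery.TheDate
GROUP BY MainQuery.Field1,
MainQuery.Field2,
MainQuery.Field3,
MainQuery.TheDate
ORDER BY MainQuery.Field1,
MainQuery.Field2,
MainQuery.TheDate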

dense_rank filling up tempdb on SQL server?

I've got a query that uses dense_rank to number groups in order to select only the first group. It works, but it's slow, and tempdb (SQL Server) grows so large that the disk fills up. Is it normal for dense_rank to be such a heavy operation? And how else could this be done without resorting to application code?
select
a,b,c,d
from
(select a,b,c,d,
dense_rank() over (order by s.[time] desc) as gn
from [Order] o
JOIN Scan s ON s.OrderId = o.OrderId
JOIN PriceDetail p ON p.ScanId = s.ScanId) as p
where p.OrderNumber = #OrderNumber
and p.Number = #Number
and p.Time > getdate() - 20
and p.gn = 1
group by a,b,c,d,p.gn
Any operation that has to sort a large data set may fill tempdb; dense_rank is no exception, just like rank, row_number, ntile and so on.
You are asking for what appears to be a global, complete sort of every scan entry since the database started. The way you expressed the query, the join must occur before the sort, so the sort will be both big and wide. After all is said and done, having consumed a lot of I/O, CPU and tempdb space, you restrict the result to a small subset for a single order and a few conditions (which mention columns not present in the subquery's projection, so this must be a made-up example rather than the real code).
You have a filter WHERE gn = 1 followed by a GROUP BY that includes gn. This is unnecessary: the predicate already fixes gn to a single value, so it cannot contribute anything to the grouping.
You compute the dense_rank over every order's scans and then filter by p.OrderNumber = #OrderNumber AND p.gn = 1. This makes even less sense: the query will only return results if #OrderNumber happens to contain the scan ranked 1 across all orders. It cannot possibly be correct.
Your query makes no sense as written; the fact that it is slow is almost incidental. Post your actual requirements.
If you want to learn about performance investigation, read How to analyse SQL Server performance.
PS. As a rule, computing ranks and selecting rank = 1 can always be expressed as a TOP(1) correlated subquery, usually with much better results. Indexes help, obviously.
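For illustration only, a guess at what that rewrite could look like here (the original column names don't line up, so these are assumptions; TOP (1) WITH TIES keeps every row tied for the newest [time], matching dense_rank() = 1):
select top (1) with ties a, b, c, d
from [Order] o
JOIN Scan s ON s.OrderId = o.OrderId
JOIN PriceDetail p ON p.ScanId = s.ScanId
where o.OrderNumber = #OrderNumber -- guessing this column lives on [Order]
and s.[time] > getdate() - 20
order by s.[time] desc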
PPS. Use of GROUP BY without any aggregate function is yet another serious code smell.