I have a simple query that appears to give the desired result:
select op.opr, op.last, op.dept, count(*) as counter
from DWRVWR.BCA_M_OPRIDS1 op
where op.opr = '21B'
group by op.opr, op.last, op.dept;
My original query returned no results. The only difference is the order of the columns in the group by clause:
select op.opr, op.last, op.dept, count(*) as counter
from DWRVWR.BCA_M_OPRIDS1 op
where op.opr = '21B'
group by op.opr, op.dept, op.last;
In reality, this was part of a much larger, more complicated query, but I narrowed the problem down to this. All the documentation I was able to find states that the order of the group by clause doesn't matter. I really want to understand why I am getting different results, because if there is a potential issue, I would have to review all of my queries that use the group by clause. I'm using SQL Developer, if it matters.
Also, if the order of the group by clause did not matter and every field not used in an aggregate function is required to be listed in the group by clause, wouldn't the group by clause simply be redundant and seemingly unnecessary?
All documentation I was able to find states that the order of the group by clause doesn't matter
That's not entirely true; it depends.
The grouping itself is not affected by the order of the columns in the GROUP BY clause. It produces the same groups regardless of the order. Perhaps that's what the documentation you found was referring to. However, the order does matter in other respects.
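Here is a minimal sketch of that point. It uses Python's built-in sqlite3 module as a stand-in for Oracle, which is an assumption. The table is a hypothetical miniature of DWRVWR.BCA_M_OPRIDS1, with LAST renamed to last_name to avoid keyword clashes. Reordering the GROUP BY columns yields the same set of groups and counts:

```python
import sqlite3

# Hypothetical stand-in for DWRVWR.BCA_M_OPRIDS1 (LAST renamed to last_name).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oprids (opr TEXT, last_name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO oprids VALUES (?, ?, ?)",
    [("21B", "Smith", "D1"), ("21B", "Smith", "D1"), ("21B", "Jones", "D2")],
)

def groups(group_by):
    # Same SELECT list every time; only the GROUP BY column order changes.
    sql = (
        "SELECT opr, last_name, dept, COUNT(*) AS counter FROM oprids "
        f"WHERE opr = '21B' GROUP BY {group_by}"
    )
    # Compare as a set: without ORDER BY, row order carries no meaning.
    return set(conn.execute(sql).fetchall())

a = groups("opr, last_name, dept")
b = groups("opr, dept, last_name")
print(a == b)  # True
print(sorted(a))
```

If two such queries really return different rows, the cause lies somewhere other than the GROUP BY column order.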
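Before Oracle 10g, GROUP BY was usually implemented with a sort, so the results typically came back ordered by the GROUP BY columns, and the order of those columns affected the order of the rows. The groups were the same; only their order differed. That ordering was a side effect, never a guarantee. Starting with Oracle 10g, which can also group by hashing, rows may come back in any order, so if you want the result set in a specific order, you must add an ORDER BY clause. Other databases have a similar history.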
Another case where the order can matter is when the table has indexes. The optimizer may be able to use a multi-column index to avoid a sort only when the column order in the GROUP BY or ORDER BY clause lines up with the index's column order. If you change the order, the query might not use the index that way and may perform differently. The result is the same, but the performance may not be.
The order of the columns in the GROUP BY clause also matters if you use features such as ROLLUP. In that case, the results themselves differ.
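SQLite has no ROLLUP, so here is a pure-Python emulation of the standard definition. ROLLUP(c1, ..., cn) groups by every prefix of the column list, from all columns down to the empty set (the grand total). The sales data is hypothetical. Swapping the two columns leaves the detail rows and the grand total unchanged, but produces different subtotal rows:

```python
from collections import defaultdict

def rollup(rows, cols, measure):
    """Emulate SUM(measure) ... GROUP BY ROLLUP(cols): one grouping set per
    prefix of cols, from all columns down to () (the grand total).
    Rolled-up columns are reported as None, as SQL reports them as NULL."""
    result = []
    for k in range(len(cols), -1, -1):
        kept = cols[:k]
        totals = defaultdict(int)
        for row in rows:
            totals[tuple(row[c] for c in kept)] += row[measure]
        for key, total in sorted(totals.items()):
            out = {c: None for c in cols}
            out.update(zip(kept, key))
            out[measure] = total
            result.append(out)
    return result

def subtotals(result, cols):
    # Rows where exactly one grouping column was rolled up (the middle level).
    return [r for r in result if sum(r[c] is None for c in cols) == 1]

# Hypothetical sales data.
rows = [
    {"region": "East", "product": "A", "amount": 10},
    {"region": "East", "product": "B", "amount": 20},
    {"region": "West", "product": "A", "amount": 5},
]

by_region = rollup(rows, ["region", "product"], "amount")   # like ROLLUP(region, product)
by_product = rollup(rows, ["product", "region"], "amount")  # like ROLLUP(product, region)
print(subtotals(by_region, ["region", "product"]))   # per-region subtotals
print(subtotals(by_product, ["product", "region"]))  # per-product subtotals
```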
A common best practice is to list the columns in the GROUP BY clause in hierarchical order (for example, region before city). This makes the query more readable and easier to maintain.
Also, if the order of the group by clause did not matter and every field not used in an aggregate function is required to be listed in the group by clause, wouldn't the group by clause simply be redundant and seemingly unnecessary?
No. When the SELECT list mixes aggregates with non-aggregated columns, the GROUP BY clause is mandatory in standard SQL and in Oracle. The only case in which you can omit it is when you want the aggregate functions to apply to the entire result set; in that case, your SELECT list must consist only of aggregate expressions. The clause is also not redundant because you can group by columns that do not appear in the SELECT list.
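Here is a small sketch of both cases, again using Python's sqlite3 as a stand-in with a hypothetical emp table. SQLite is more permissive than Oracle here: it accepts non-aggregated columns without GROUP BY. For that reason the sketch only shows forms that both databases accept:

```python
import sqlite3

# Hypothetical employee table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("Ann", "D1", 100), ("Bob", "D1", 200), ("Cy", "D2", 300)],
)

# No GROUP BY: only aggregates in the SELECT list, so the whole table is one group.
whole = conn.execute("SELECT COUNT(*), SUM(sal) FROM emp").fetchall()

# GROUP BY a column that is not in the SELECT list: one row per department,
# even though dept itself is not displayed.
per_dept = conn.execute(
    "SELECT COUNT(*), SUM(sal) FROM emp GROUP BY dept ORDER BY dept"
).fetchall()

print(whole)     # [(3, 600)]
print(per_dept)  # [(2, 300), (1, 300)]
```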
Related
I am trying to group and join the tables below, but I get an ORA-00918: column ambiguously defined error.
Any ideas how to fix it?
SELECT staffn, job, COUNT(*)"staffcount", AVG(sal)"AverageSal"
FROM staff, shop
WHERE staff.shopno= shop.shopno
GROUP BY shopno, job;
You should use proper aliases, and your GROUP BY clause must include all the non-aggregated columns, as follows:
SELECT s.staffn, sh.job,
COUNT(*) "staffcount",
AVG(s.sal) "AverageSal"
FROM staff s JOIN shop sh
ON s.shopno = sh.shopno
GROUP BY s.staffn, sh.job;
Did you mean to include staffn in your SELECT list? I would guess that it is unique to each row in staff, which would make selecting the average sal (or any other aggregate) rather pointless. If you did want it, you'd need to include it in the GROUP BY clause. I think you really meant to select the same column you group by.
Your error is telling you that Oracle doesn't know which table a column should be taken from, because more than one row source in your query could provide it. The full error output should point to the offending column. We can already see that at least shopno is shared, so we can arbitrarily take it from staff:
SELECT staff.shopno, job, COUNT(*)"staffcount", AVG(sal)"AverageSal"
FROM staff, shop
WHERE staff.shopno= shop.shopno
GROUP BY staff.shopno, job;
Both tables you used have at least one column with the same name. You must specify which table each such column comes from.
Never use commas in the FROM clause. Always use proper, explicit, standard, readable JOIN syntax.
In a query that references multiple tables, qualify all column references.
You haven't shown the layout of your tables, but you presumably want something like this:
SELECT st.staffn, st.job, COUNT(*) as staffcount,
AVG(st.sal) as AverageSal
FROM staff st JOIN
shop sh
ON st.shopno = sh.shopno
GROUP BY st.staffn, st.job;
This assumes that all the columns come from the staff table, which seems reasonable enough in the absence of other information.
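Here is a minimal reproduction using Python's sqlite3 as a stand-in for Oracle, with hypothetical miniature staff and shop tables. SQLite raises its own "ambiguous column name" error, which is its counterpart of ORA-00918. Qualifying every column together with explicit JOIN ... ON syntax makes the query run:

```python
import sqlite3

# Hypothetical minimal versions of the staff and shop tables; both have shopno.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shop  (shopno INTEGER, city TEXT);
    CREATE TABLE staff (staffn INTEGER, shopno INTEGER, job TEXT, sal INTEGER);
    INSERT INTO shop  VALUES (1, 'Leeds'), (2, 'York');
    INSERT INTO staff VALUES (10, 1, 'clerk', 100), (11, 1, 'clerk', 300),
                             (12, 2, 'manager', 500);
""")

# Unqualified shopno: both tables could supply it.
try:
    conn.execute("""
        SELECT shopno, job, COUNT(*) FROM staff JOIN shop
          ON staff.shopno = shop.shopno
        GROUP BY shopno, job
    """)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)  # SQLite's counterpart of ORA-00918

# Qualified column references plus explicit JOIN ... ON syntax.
rows = conn.execute("""
    SELECT st.shopno, st.job, COUNT(*) AS staffcount, AVG(st.sal) AS avgsal
    FROM staff st JOIN shop sh ON st.shopno = sh.shopno
    GROUP BY st.shopno, st.job
    ORDER BY st.shopno, st.job
""").fetchall()

print(error)
print(rows)  # [(1, 'clerk', 2, 200.0), (2, 'manager', 1, 500.0)]
```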
I am new to SQL. Could anyone help me figure out why the GROUP BY clause isn't working in this SQL query? I get this error:
ERROR at line 3:
ORA-00979: not a GROUP BY expression
The code I am using is:
CREATE OR REPLACE VIEW CUSTOMER_LINE_ITEM AS
SELECT CUSTOMER_ORDER_CART_INFO.loginName,CUSTOMER_ORDER_CART_INFO.FirstName,
CUSTOMER_ORDER_CART_INFO.LastName,CUSTOMER_ORDER_CART_INFO.orderCartID,(lineItems.orderPrice*lineItems.qtyOrdered) AS TOTAL_ORDER
FROM CUSTOMER_ORDER_CART_INFO
INNER JOIN lineItems
ON CUSTOMER_ORDER_CART_INFO.orderCartID = lineItems.orderCartID
GROUP BY CUSTOMER_ORDER_CART_INFO.loginName,CUSTOMER_ORDER_CART_INFO.FirstName,
CUSTOMER_ORDER_CART_INFO.LastName,CUSTOMER_ORDER_CART_INFO.orderCartID
ORDER BY orderCartID;
Without the GROUP BY clause, I get the view shown below. I thought GROUP BY would just remove the duplicates and give me one row per order cart ID. Could anyone help me understand what I am doing wrong here?
VIEW of CUSTOMER_LINE_ITEM without 'group by'
The error is in the GROUP BY clause. Remember a simple rule of thumb: every selected column must either appear in the GROUP BY clause or be wrapped in an aggregate function such as MAX, MIN, SUM or AVG.
Try the following query, which should run without error. I can't vouch for its logical correctness, though; you need to check that against your requirements.
CREATE OR REPLACE VIEW customer_line_item AS
SELECT cac.loginName,
cac.FirstName,
cac.LastName,
cac.orderCartID,
SUM(li.orderPrice * li.qtyOrdered) AS TOTAL_ORDER
FROM customer_order_cart_info cac
INNER JOIN lineItems li
ON cac.orderCartID = li.orderCartID
GROUP BY cac.loginName,
cac.FirstName,
cac.LastName,
cac.orderCartID
ORDER BY cac.orderCartID;
The thing to note here is that in your original query, lineItems.orderPrice and lineItems.qtyOrdered were selected but were neither in the GROUP BY clause nor inside an aggregate function.
The columns in the GROUP BY clause are used to group your data logically. Here your data is grouped by loginName, FirstName, LastName and orderCartID. But each group may contain several orderPrice and qtyOrdered values, and SQL cannot decide which single value to return for the group. From your query, I guess the requirement is to find the total value of a customer's cart, which is why you multiply orderPrice by qtyOrdered. To achieve this, you need to multiply orderPrice by qtyOrdered for each line item and then add up those products for each cart. That is, you need SUM(orderPrice * qtyOrdered) grouped by the cart columns.
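To make the difference concrete, here is a minimal sketch using Python's sqlite3 as a stand-in for Oracle, with a hypothetical two-line cart. Summing the per-line products gives the cart total. Multiplying the two separate sums does not:

```python
import sqlite3

# Hypothetical line items for a single cart.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineItems (orderCartID INTEGER, orderPrice REAL, qtyOrdered INTEGER)"
)
conn.executemany(
    "INSERT INTO lineItems VALUES (?, ?, ?)",
    [(1, 2.0, 3), (1, 10.0, 1)],  # 2.00 x 3 + 10.00 x 1 = 16.00
)

row = conn.execute("""
    SELECT orderCartID,
           SUM(orderPrice * qtyOrdered)      AS line_totals_summed,
           SUM(orderPrice) * SUM(qtyOrdered) AS sums_multiplied
    FROM lineItems
    GROUP BY orderCartID
""").fetchone()

print(row)  # (1, 16.0, 48.0)
```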
The cause of the error message is that you don't aggregate (lineItems.orderPrice*lineItems.qtyOrdered).
The Oracle documentation (the Java DB reference, though Oracle Database enforces the same rule) tells us:
SelectItems in the SelectExpression with a GROUP BY clause must contain only aggregates or grouping columns.
That means you should aggregate TOTAL_ORDER, for example with sum(lineItems.orderPrice*lineItems.qtyOrdered), or however your requirement dictates.
Based on searching the web, I came up with two methods of counting the records in a table, Table1. The counter increments according to a date field, TheDate. For each record, it counts the records whose TheDate is earlier than or equal to that record's TheDate. Furthermore, records with different values for the compound field (Field1, Field2) are counted using separate counters. Field3 is just an informational field that is included for added awareness and does not affect the counting or how records are grouped for counting.
Method 1: Use correlated subquery
SELECT MainQuery.Field1,
MainQuery.Field2,
MainQuery.Field3,
MainQuery.TheDate,
(
SELECT SUM(1) FROM Table1 InnerQuery
WHERE InnerQuery.Field1 = MainQuery.Field1 AND
InnerQuery.Field2 = MainQuery.Field2 AND
InnerQuery.TheDate <= MainQuery.TheDate
) AS RunningCounter
FROM Table1 MainQuery
ORDER BY MainQuery.Field1,
MainQuery.Field2,
MainQuery.TheDate,
MainQuery.Field3
Method 2: Use join and group-by
SELECT MainQuery.Field1,
MainQuery.Field2,
MainQuery.Field3,
MainQuery.TheDate,
SUM(1) AS RunningCounter
FROM Table1 MainQuery INNER JOIN Table1 InnerQuery
ON InnerQuery.Field1 = MainQuery.Field1 AND
InnerQuery.Field2 = MainQuery.Field2 AND
InnerQuery.TheDate <= MainQuery.TheDate
GROUP BY MainQuery.Field1,
MainQuery.Field2,
MainQuery.Field3,
MainQuery.TheDate
ORDER BY MainQuery.Field1,
MainQuery.Field2,
MainQuery.TheDate,
MainQuery.Field3
There is no inner query per se in Method 2, but I use the table alias InnerQuery so that a ready parallel with Method 1 can be drawn. The role is the same: the second instance of Table1 accumulates the counts of the records whose TheDate is less than or equal to that of each record in MainQuery (the first instance of Table1) with the same Field1 and Field2 values.
Note that in Method 2, Field3 is included in the GROUP BY clause even though I said that it does not affect how the records are grouped for counting. This is still true, since the counting is done using the matching records in InnerQuery, whereas the GROUP BY applies to Field3 in MainQuery.
I found that Method 1 is noticeably faster. I'm surprised by this because it uses a correlated subquery. The way I think of a correlated subquery is that it is executed for each record in MainQuery (whether or not that is done in practice after optimization). On the other hand, Method 2 doesn't run an inner query over and over again. However, the inner join still has multiple records in InnerQuery matching each record in MainQuery, so in a sense, it deals with a similar order of complexity.
Is there a decent intuitive explanation for this speed difference? Are there best practices or considerations for choosing an approach to time-based accumulation?
I've cross-posted this to Microsoft Answers and Stack Exchange.
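Here is a sketch of both methods run side by side. It uses Python's sqlite3 rather than Access, so the dialect is an assumption, and the rows are hypothetical. Dates are stored as ISO-8601 text so that string comparison matches date order. When (Field1, Field2, Field3, TheDate) is unique, the two methods return identical results:

```python
import sqlite3

# Hypothetical Table1 with no duplicate (Field1, Field2, Field3, TheDate) keys.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Field1 TEXT, Field2 TEXT, Field3 TEXT, TheDate TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", [
    ("a", "x", "info1", "2024-01-01"),
    ("a", "x", "info2", "2024-01-05"),
    ("a", "x", "info3", "2024-02-01"),
    ("b", "y", "info4", "2024-01-03"),
])

# Method 1: correlated subquery.
METHOD_1 = """
SELECT MainQuery.Field1, MainQuery.Field2, MainQuery.Field3, MainQuery.TheDate,
       (SELECT SUM(1) FROM Table1 InnerQuery
         WHERE InnerQuery.Field1 = MainQuery.Field1
           AND InnerQuery.Field2 = MainQuery.Field2
           AND InnerQuery.TheDate <= MainQuery.TheDate) AS RunningCounter
FROM Table1 MainQuery
ORDER BY MainQuery.Field1, MainQuery.Field2, MainQuery.TheDate, MainQuery.Field3
"""

# Method 2: self-join and GROUP BY.
METHOD_2 = """
SELECT MainQuery.Field1, MainQuery.Field2, MainQuery.Field3, MainQuery.TheDate,
       SUM(1) AS RunningCounter
FROM Table1 MainQuery INNER JOIN Table1 InnerQuery
  ON InnerQuery.Field1 = MainQuery.Field1
 AND InnerQuery.Field2 = MainQuery.Field2
 AND InnerQuery.TheDate <= MainQuery.TheDate
GROUP BY MainQuery.Field1, MainQuery.Field2, MainQuery.Field3, MainQuery.TheDate
ORDER BY MainQuery.Field1, MainQuery.Field2, MainQuery.TheDate, MainQuery.Field3
"""

m1 = conn.execute(METHOD_1).fetchall()
m2 = conn.execute(METHOD_2).fetchall()
print(m1 == m2)             # True
print([r[-1] for r in m1])  # [1, 2, 3, 1]
```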
In fact, I think the easiest way is to do this:
SELECT MainQuery.Field1,
MainQuery.Field2,
MainQuery.Field3,
MainQuery.TheDate,
COUNT(*)
FROM Table1 MainQuery
GROUP BY MainQuery.Field1,
MainQuery.Field2,
MainQuery.Field3,
MainQuery.TheDate
ORDER BY MainQuery.Field1,
MainQuery.Field2,
MainQuery.TheDate,
MainQuery.Field3
(The order by isn't required to get the same data, just to order it. In other words, removing it will not change the number or contents of each row returned, just the order in which they are returned.)
You only need to specify the table once. Doing a self-join (joining a table to itself as both your queries do) is not required. The performance of your two queries will depend on a whole load of things which I don't know - what the primary keys are, the number of rows, how much memory is available, and so on.
First, your experience makes a lot of sense. I'm not sure why you need more intuition. I imagine you learned, somewhere along the way, that correlated subqueries are evil. Well, just as some of the things we teach kids to treat as really bad ("don't cross the street when the walk sign is not green") turn out to be not so bad, correlated subqueries are not always as bad as their reputation.
The easiest intuition is that the join-and-GROUP BY version has to aggregate all the data in the table at once. The correlated version only has to aggregate the matching rows, although it has to do this over and over.
To put numbers to it, say you have 1,000 rows with 10 rows per group, so the output is 100 rows. The correlated version does 100 aggregations of 10 rows each. The join version does one aggregation of 1,000 rows. Sort-based aggregation generally scales super-linearly, as O(n log n). That means 100 aggregations of 10 records take less time than one aggregation of 1,000 records.
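Here are those numbers worked through roughly. This uses the simplified n log n cost model from the paragraph above, ignoring constant factors and hash-based aggregation, which is closer to linear:

```latex
% Cost of sort-based aggregation of n rows modelled as n log2 n (simplified assumption)
\[
\underbrace{100 \times \left(10 \log_2 10\right)}_{\text{100 groups of 10}} \approx 100 \times 33.2 \approx 3322
\qquad\text{vs.}\qquad
\underbrace{1000 \log_2 1000}_{\text{one set of 1000}} \approx 1000 \times 9.97 \approx 9966
\]
```

Under this model, the many small aggregations cost roughly a third of the single large one.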
You asked for intuition, so the above is meant to provide some. There are a zillion caveats that go both ways. For instance, the correlated subquery might be able to make better use of indexes for the aggregation. Also, the two queries are not equivalent, because the correct join would be a LEFT JOIN.
Actually, I was wrong in my original post. The inner join is way, way faster than the correlated subquery. However, the correlated subquery is able to display its result records as they are generated, so it appears faster.
As a side curiosity, I'm finding that if the correlated sub-query approach is modified to use sum(-1) instead of sum(1), the number of returned records seems to vary from N-3 to N (where N is the correct number, i.e., the number of records in Table1). I'm not sure if this is due to some misbehaviour in Access's rush to display initial records or what-not.
While it seems that the INNER JOIN wins hands down, there is a major, insidious caveat. If the GROUP BY fields do not uniquely distinguish each record in Table1, then you will not get an individual SUM for each record of Table1. Imagine that a particular combination of GROUP BY field values matches (say) THREE records in Table1. You will then get a single SUM for all of them. The problem is that each of these 3 records in MainQuery also matches all 3 of the same records in InnerQuery, so those InnerQuery instances get counted multiple times (3 x 3 = 9 joined rows). Very insidious (I find).
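Here is a minimal reproduction of that caveat, using Python's sqlite3 as a stand-in for Access and a hypothetical Table1 containing three identical records. Method 2 collapses them into one output row whose counter is 9 rather than 3:

```python
import sqlite3

# Hypothetical Table1 where three records share the same grouping values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Field1 TEXT, Field2 TEXT, Field3 TEXT, TheDate TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [("a", "x", "same", "2024-01-01")] * 3,
)

# Method 2: each of the 3 MainQuery rows joins to all 3 InnerQuery rows (3 x 3 = 9),
# and GROUP BY then folds those 9 joined rows into a single output row.
joined = conn.execute("""
    SELECT MainQuery.Field1, MainQuery.TheDate, SUM(1) AS RunningCounter
    FROM Table1 MainQuery INNER JOIN Table1 InnerQuery
      ON InnerQuery.Field1 = MainQuery.Field1
     AND InnerQuery.Field2 = MainQuery.Field2
     AND InnerQuery.TheDate <= MainQuery.TheDate
    GROUP BY MainQuery.Field1, MainQuery.Field2, MainQuery.Field3, MainQuery.TheDate
""").fetchall()

print(joined)  # [('a', '2024-01-01', 9)]: one row, and the count is 9, not 3
```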
So it seems that the sub-query may be the way to go, which is awfully disturbing in view of the above problem with repeatability (2nd paragraph above). That is a serious problem that should send shivers down any spine. Another possible solution that I'm looking at is to turn MainQuery into a subquery by SELECTing the fields of interest and DISTINCTifying them before INNER JOINing the result with InnerQuery.
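Here is a sketch of that DISTINCT-then-join idea, again using Python's sqlite3 as a stand-in for Access. It assumes the distinct key is (Field1, Field2, Field3, TheDate); that choice and the data are hypothetical. Each distinct combination gets a correct running count. Exact duplicates still share one output row rather than getting one row each:

```python
import sqlite3

# Hypothetical Table1: three identical records plus one later record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Field1 TEXT, Field2 TEXT, Field3 TEXT, TheDate TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", [
    ("a", "x", "same", "2024-01-01"),
    ("a", "x", "same", "2024-01-01"),
    ("a", "x", "same", "2024-01-01"),
    ("a", "x", "later", "2024-01-02"),
])

# MainQuery is de-duplicated first, so each InnerQuery match is counted once.
rows = conn.execute("""
    SELECT MainQuery.Field1, MainQuery.Field2, MainQuery.Field3, MainQuery.TheDate,
           COUNT(*) AS RunningCounter
    FROM (SELECT DISTINCT Field1, Field2, Field3, TheDate FROM Table1) AS MainQuery
    INNER JOIN Table1 InnerQuery
      ON InnerQuery.Field1 = MainQuery.Field1
     AND InnerQuery.Field2 = MainQuery.Field2
     AND InnerQuery.TheDate <= MainQuery.TheDate
    GROUP BY MainQuery.Field1, MainQuery.Field2, MainQuery.Field3, MainQuery.TheDate
    ORDER BY MainQuery.TheDate
""").fetchall()

print(rows)  # [('a', 'x', 'same', '2024-01-01', 3), ('a', 'x', 'later', '2024-01-02', 4)]
```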
I am using this query:
select o.orderno,
o.day,
a.name,
o.description,
a.adress,
o.quantity,
a.orderType,
o.status,
a.Line,
a.price,
a.serial
from orders o
inner join account a
on o.orderid=a.orderid
order by o.day
I am ordering by day. Within the same day, which field is used next for sorting? In other words, in what order are rows with the same day returned?
There is no further sorting. You'll get the results within each day in whatever order Oracle happened to retrieve them. That order is not guaranteed in any way and can differ for the same query run multiple times. It depends on many things under the hood that you generally have no control over or even visibility of. You may see the results in an apparent order that suits you at the moment, but it could change in a future execution. For example, changing data can affect the execution plan, which can affect the order in which you see the results.
If you need a specific order, or just want them returned in a consistent order every time you run the query, you must specify it in the order by clause.
This is something Tom Kyte often stresses; for example in this often-quoted article.
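Here is a minimal sketch of adding a tie-breaker, using Python's sqlite3 as a stand-in for Oracle with a hypothetical orders table. It assumes orderno is unique, so that adding it after day fully determines the order:

```python
import sqlite3

# Hypothetical orders table with two orders on the same day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderno INTEGER, day TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(3, "2024-05-01"), (1, "2024-05-02"), (2, "2024-05-01")],
)

# ORDER BY day alone leaves orders 3 and 2 in an unspecified relative order.
# Adding orderno as a tie-breaker makes the order fully determined
# (assuming orderno is unique).
rows = conn.execute("SELECT orderno, day FROM orders ORDER BY day, orderno").fetchall()
print(rows)  # [(2, '2024-05-01'), (3, '2024-05-01'), (1, '2024-05-02')]
```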
It tries to order by the unique/primary key. In this case, orderno if it is your primary key.
However, your query is laden with errors.
For example, the table aliases are used in the SELECT clause but are not specified in the FROM clause.
Here I have an SQL statement that retrieves all of the right data, but I need it to be DISTINCT.
So, for WEEK_NUMBER it's returning week_number = 1,1,1,1,1,1 etc.
I want it to condense into 1. It is a three-table query, and I'm not sure how I could include the SELECT DISTINCT feature or an alternative. Any ideas?
SELECT WEEKLY_TIMECARD.*,DAILY_CALCULATIONS.*,EMPLOYEE_PROFILES.EMPLOYEE_NUMBER
FROM WEEKLY_TIMECARD, DAILY_CALCULATIONS, EMPLOYEE_PROFILES
WHERE EMPLOYEE_PROFILES.EMPLOYEE_NUMBER = WEEKLY_TIMECARD.EMPLOYEE_NUMBER
AND EMPLOYEE_PROFILES.EMPLOYEE_NUMBER = DAILY_CALCULATIONS.EMPLOYEE_NUMBER
AND WEEKLY_TIMECARD.WEEK_NUMBER = DAILY_CALCULATIONS.WEEK_NUMBER
Try this:
SELECT DISTINCT WEEKLY_TIMECARD.WEEK_NUMBER
FROM
WEEKLY_TIMECARD,
DAILY_CALCULATIONS,
EMPLOYEE_PROFILES
WHERE EMPLOYEE_PROFILES.EMPLOYEE_NUMBER = WEEKLY_TIMECARD.EMPLOYEE_NUMBER
AND EMPLOYEE_PROFILES.EMPLOYEE_NUMBER = DAILY_CALCULATIONS.EMPLOYEE_NUMBER
AND WEEKLY_TIMECARD.WEEK_NUMBER = DAILY_CALCULATIONS.WEEK_NUMBER
You should add GROUP BY WEEK_NUMBER.
Since you are showing all the fields from tables WEEKLY_TIMECARD and DAILY_CALCULATIONS, if you use SELECT DISTINCT you may end up in exactly the same situation you are encountering now (many rows with the same value). This is because DISTINCT only removes rows that are identical in every selected column.
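Here is a small illustration using Python's sqlite3 as a stand-in, with a hypothetical and heavily simplified timecard table. DISTINCT over all columns keeps every row that differs anywhere, so week_number 1 still appears more than once. DISTINCT over week_number alone collapses it to one row:

```python
import sqlite3

# Hypothetical, simplified timecard data: one week, two different days.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (week_number INTEGER, work_day TEXT, hours REAL)")
conn.executemany(
    "INSERT INTO daily VALUES (?, ?, ?)",
    [(1, "Mon", 8.0), (1, "Tue", 7.5), (1, "Tue", 7.5)],
)

# DISTINCT compares whole rows: only the exact duplicate Tue row is removed.
all_cols = conn.execute("SELECT DISTINCT * FROM daily ORDER BY work_day").fetchall()

# DISTINCT on just the column of interest collapses it to one row.
week_only = conn.execute("SELECT DISTINCT week_number FROM daily").fetchall()

print(all_cols)   # [(1, 'Mon', 8.0), (1, 'Tue', 7.5)]
print(week_only)  # [(1,)]
```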
Besides the DISTINCT and GROUP BY usage, you need to consider the following:
Do you really need all the fields? If you do, then maybe you do need the duplicate values. If you don't, just include the fields you need.
Do you need to aggregate data, or do you only need to deduplicate the values? If you need to aggregate data, you must use GROUP BY and the appropriate aggregate functions. If you don't need to aggregate data, I would advise you not to use GROUP BY, because it can make your query run very slowly (this may depend on which RDBMS you are using).
Whichever solution you choose, be sure your tables are properly indexed.
Besides that, I would use INNER JOIN to explicitly define the relations between your data (rather than implicitly defining them using WHERE conditions)... but that's my personal preference.
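Here is a sketch of the deduplicate-versus-aggregate distinction, using Python's sqlite3 as a stand-in with hypothetical data. For pure deduplication, DISTINCT and a GROUP BY with no aggregate return the same rows. GROUP BY becomes necessary once you want something like total hours per week:

```python
import sqlite3

# Hypothetical daily rows for one employee across two weeks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (employee_number INTEGER, week_number INTEGER, hours REAL)")
conn.executemany(
    "INSERT INTO daily VALUES (?, ?, ?)",
    [(7, 1, 8.0), (7, 1, 7.5), (7, 2, 6.0)],
)

# Deduplication only: DISTINCT and GROUP BY without an aggregate give the same rows.
distinct_rows = conn.execute(
    "SELECT DISTINCT employee_number, week_number FROM daily ORDER BY 1, 2"
).fetchall()
grouped_rows = conn.execute(
    "SELECT employee_number, week_number FROM daily "
    "GROUP BY employee_number, week_number ORDER BY 1, 2"
).fetchall()

# Aggregation: GROUP BY plus an aggregate function, e.g. total hours per week.
weekly_hours = conn.execute(
    "SELECT employee_number, week_number, SUM(hours) FROM daily "
    "GROUP BY employee_number, week_number ORDER BY 1, 2"
).fetchall()

print(distinct_rows == grouped_rows)  # True
print(weekly_hours)                   # [(7, 1, 15.5), (7, 2, 6.0)]
```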