Using Distinct in Aggregate Select query - sql

I am using an Oracle DB and have an aggregate script. We found that some of the rows in the table are repeated; these unwanted rows are not supposed to be added to the sum.
Now suppose I use DISTINCT just after SELECT: will DISTINCT be applied before the aggregation or after it?

If you use SELECT DISTINCT, then the result set will have no duplicate rows.
If you use SELECT COUNT(DISTINCT), then the count will only count distinct values.
If you are thinking of using SUM(DISTINCT) (or DISTINCT with any other aggregation function) be warned. I have never used it (except perhaps as a demonstration), and I have written a fair number of queries.
You really need to solve the problem at the source. For instance, if accounts are being repeated, then SUM(DISTINCT) does not distinguish between accounts, only by the values assigned to the account. You need to get the logic right.
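A minimal sketch of the difference, using Python's built-in sqlite3 module as a stand-in for Oracle. The payments table, its columns and its rows are made up for illustration. It shows that SELECT DISTINCT acts on the result rows after aggregation, that SUM(DISTINCT) collapses equal values even when they belong to different accounts, and that deduplicating whole rows first gives the intended total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: account 1 has an unwanted repeated row.
conn.execute("CREATE TABLE payments (account_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [
        (1, 100),
        (1, 100),  # unwanted duplicate of the row above
        (2, 100),  # a different account that happens to have the same amount
        (3, 50),
    ],
)

# DISTINCT right after SELECT is applied to the aggregated result rows,
# so the duplicate row is still summed.
select_distinct_sum = conn.execute(
    "SELECT DISTINCT SUM(amount) FROM payments"
).fetchone()[0]

# SUM(DISTINCT amount) only looks at the values, not at the accounts,
# so account 2's legitimate 100 is dropped as well.
sum_distinct = conn.execute(
    "SELECT SUM(DISTINCT amount) FROM payments"
).fetchone()[0]

# Deduplicate whole rows first, then aggregate.
dedup_sum = conn.execute(
    "SELECT SUM(amount) FROM (SELECT DISTINCT account_id, amount FROM payments)"
).fetchone()[0]

print(select_distinct_sum, sum_distinct, dedup_sum)
```

Only the last query counts each account's payment exactly once, and even that assumes an exact repeat is never a legitimate second payment.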

When you say that you have repeated rows, you must have a clear idea of which combination of columns is supposed to be unique.
If you expect certain column combinations to be unique within specified groups, you can detect the groups that deviate from that using queries following the pattern below.
select <your group by columns>
from <your table name>
group by <your group by columns>
having (max(A)!=min(A) or max(B)!=min(B) or max(C)!=min(C))
Then you have to decide what to do with the problem. I would suggest cleaning up and adding unique constraints to the table.
The aggregate query you mention would run successfully for the rows in your table not having duplicate values for the combination of columns that needs to be unique. Using my example you could get the aggregates for that part of your data using the inverted having predicate.
It would be something like this
select <your aggregate functions, counts, sums, averages and so on>
from <your table name>
group by <your group by columns>
having (max(A)=min(A) and max(B)=min(B) and max(C)=min(C))
If you must include the groups breaking uniqueness expectations you must somehow do a qualified selection of which of the variants in the group to use - you could for example go for the last one or the first one if one of your columns should happen to express something about when the row was created.
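The two HAVING patterns above can be tried out with a small sketch like the following. It uses Python's sqlite3 module, and the orders table with its columns and data is hypothetical. Note that a group made of exact duplicates passes the max = min check, so the sketch reads the amount with MIN() instead of SUM() to avoid double counting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: each order_id is expected to have one customer and amount.
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "ann", 10),
        (1, "ann", 10),  # exact repeat: consistent, but still duplicated
        (2, "bob", 20),
        (2, "bob", 25),  # conflicting amount for the same order
        (3, "cy", 5),
    ],
)

# Groups that break the uniqueness expectation.
conflicting = [
    row[0]
    for row in conn.execute(
        """
        SELECT order_id
        FROM orders
        GROUP BY order_id
        HAVING MAX(customer) != MIN(customer) OR MAX(amount) != MIN(amount)
        ORDER BY order_id
        """
    )
]

# Inverted predicate: aggregate only over the consistent groups.
clean = conn.execute(
    """
    SELECT order_id, COUNT(*) AS copies, MIN(amount) AS amount
    FROM orders
    GROUP BY order_id
    HAVING MAX(customer) = MIN(customer) AND MAX(amount) = MIN(amount)
    ORDER BY order_id
    """
).fetchall()

print(conflicting, clean)
```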

Related

exclude a column from group by statement

I would like to exclude a column from group by statement, because it results in some redundant records. Are there any recommendations?
I use Oracle and have a complex query that joins 6 tables together. I want to use a SQL aggregate function (COUNT) without duplicate results.
You can't.
When using aggregate functions every column/column expression which is not an aggregate must be in the GROUP BY.
This is completely logical. If you're not aggregating the column then excluding it from the GROUP BY would force Oracle to choose a random value, which is not very useful.
If you don't want this column in your GROUP BY then you must decide what aggregation to apply to this column in order to return the appropriate data for your situation. You can't hand this responsibility off to the database engine.
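As a rough illustration of deciding the aggregation yourself, here is a sketch using Python's sqlite3 module with two hypothetical tables (orders and items) standing in for the six-table join. COUNT(DISTINCT ...) avoids counting an order once per joined row, and MIN(i.sku) is one explicit choice for a column that is left out of the GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one order has many items, so the join repeats orders.
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
conn.execute("CREATE TABLE items (order_id INTEGER, sku TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "ann"), (2, "ann"), (3, "bob")])
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "A"), (1, "B"), (2, "C"), (3, "D")],
)

per_customer = conn.execute(
    """
    SELECT o.customer,
           COUNT(DISTINCT o.id) AS order_count,  -- each order counted once
           COUNT(*)             AS joined_rows,  -- inflated by the join
           MIN(i.sku)           AS first_sku     -- explicit choice for a non-grouped column
    FROM orders o
    JOIN items i ON i.order_id = o.id
    GROUP BY o.customer
    ORDER BY o.customer
    """
).fetchall()

print(per_customer)
```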

Postgres - Group by without having to aggregate?

This is hard to explain, but say I have this query:
SELECT *
FROM "late_fee_tiers"
And it returns this:
I have a validation in code set up to prevent duplicate days from being saved (notice there are 2 rows of days = 2).
I want my query to double-check there are only unique rows of day, and if there are multiple, select the first one (so it should return 3 rows with 2,3,5).
My first thought is to use GROUP BY day, while selecting a MIN("id").
The problem is, I don't understand SQL enough, because it forces me to add different aggregator functions to every single column... but what if I don't want to do that? I want THAT row to be "chosen" according to the single aggregator function I define, I don't need multiple aggregators creating some weird hybrid row. I just want the MIN() function to choose that 1 row and fill in all the rest of the values for that row.
What function do I use to do this, or how would I do it?
Thanks
You want to use DISTINCT ON:
select distinct on (day) *
from "late_fee_tiers"
order by day, id;
Why day is also required in the order by:
From the official documentation:
The DISTINCT ON expression(s) must match the leftmost ORDER BY
expression(s). The ORDER BY clause will normally contain additional
expression(s) that determine the desired precedence of rows within
each DISTINCT ON group.
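DISTINCT ON is specific to Postgres. On engines without it, a similar result can be had by joining back to a MIN("id") subquery. This is a different technique from the answer above, sketched here with Python's sqlite3 module. The fee column and the sample rows are assumptions, since the original output is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed columns: only id and day are known from the question.
conn.execute("CREATE TABLE late_fee_tiers (id INTEGER PRIMARY KEY, day INTEGER, fee INTEGER)")
conn.executemany(
    "INSERT INTO late_fee_tiers VALUES (?, ?, ?)",
    [(1, 2, 10), (2, 2, 99), (3, 3, 15), (4, 5, 20)],  # day = 2 appears twice
)

# Pick the lowest id per day, then fetch the complete row for that id.
first_rows = conn.execute(
    """
    SELECT t.id, t.day, t.fee
    FROM late_fee_tiers AS t
    JOIN (
        SELECT day, MIN(id) AS first_id
        FROM late_fee_tiers
        GROUP BY day
    ) AS f ON t.id = f.first_id
    ORDER BY t.day
    """
).fetchall()

print(first_rows)
```

Every returned row is a real row from the table, not a mix of values aggregated from several rows.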

SQL column reference is invalid

I am using Jaspersoft's iReport to create a report that will pull data from my Maintenance Assistant CMMS database. The DB is on the localhost, and I am not creating any tables or columns. MA CMMS takes care of that. I only want to pull the data to arrange in a report.
Here is my code:
SELECT *
FROM "tblworkordertask"
WHERE "dbltimespenthours" > 0
AND "dtmdatecompleted" BETWEEN $P{DATE_FROM} AND $P{DATE_TO}
GROUP BY "intworkorderid"
and my error:
Caused by: java.sql.SQLSyntaxErrorException: Column reference 'tblWorkOrderTask.id' is invalid, or is part of an invalid expression.  For a SELECT list with a GROUP BY, the columns and expressions being selected may only contain valid grouping expressions and valid aggregate expressions.
I don't know why the error is referring to 'tblWorkOrderTask.id' because I don't have such a column, nor did I ask for that column.
If I take out the group by clause, it works fine, but as you could expect, I get multiple results with the same WorkOrderID. I want to group it by this column, and then count the results. I tried using SELECT DISTINCT, but then I get errors about columns that aren't selected.
You're selecting all columns in the tblWorkOrderTask table, and the "id" column is the first column in that table. You are getting the error because the columns in the select list are not all specified in the GROUP BY clause.
This select would work, but I'm not sure what information you need out of your table.
SELECT id, intworkorderid
FROM tblWorkOrderTask
group by id, intworkorderid
http://www.w3schools.com/sql/sql_groupby.asp
Get rid of the GROUP BY clause -- if you're just trying to order the result, then use ORDER BY instead; but otherwise, you don't need either.
EDIT
As the error says, everything in your SELECT list must be one of two things: either 1) also listed in your GROUP BY list, or 2) an aggregated value. Here is a sample that will work:
SELECT intworkorderid, COUNT(*)
FROM "tblworkordertask"
WHERE "dbltimespenthours" > 0
AND "dtmdatecompleted" BETWEEN $P{DATE_FROM} AND $P{DATE_TO}
GROUP BY "intworkorderid"
Yes - in order to use group by, you need to be specific in the select line.
So first, decide which fields you want to display. If you want them all, then include them all.
As soon as you add a COUNT() function alongside other selected fields, you will need to add the GROUP BY clause. COUNT() is an AGGREGATE function, like SUM() and AVG().
It's a little counter-intuitive and a bit of a pain to specify so many fields in the GROUP BY clause, but it's necessary.
The FIRST GROUP BY field is the most important, since this is usually what you are concerned about.
This first field can be any of the SELECTed fields, it is not necessarily the first.
Include EVERY field in your GROUP BY that is not an AGGREGATE function like COUNT().
Also, if you are trying to COUNT a group of orders, you probably don't want or need all of the fields in the SELECT.
You probably want to specify just the fields that are unique to the work order ID.
Example: If you want to get a COUNT for these fields, you would list all of the SELECTed fields EXCEPT the COUNT() in the GROUP BY clause.
SELECT
intWorkOrderID,
COUNT(id),
strDescription
FROM tblworkordertask
WHERE dbltimespenthours > 0
AND dtmdatecompleted BETWEEN $P{DATE_FROM} AND $P{DATE_TO}
GROUP BY
intworkorderid,
strDescription
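To try the grouped count outside iReport, something like the following sketch may help. It uses Python's sqlite3 module. The column types and sample rows are assumptions, and the named parameters :date_from and :date_to stand in for $P{DATE_FROM} and $P{DATE_TO}.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column types; the real table is created by the CMMS.
conn.execute(
    """
    CREATE TABLE tblworkordertask (
        id INTEGER PRIMARY KEY,
        intworkorderid INTEGER,
        dbltimespenthours REAL,
        dtmdatecompleted TEXT  -- ISO dates, so BETWEEN compares them correctly as text
    )
    """
)
conn.executemany(
    "INSERT INTO tblworkordertask VALUES (?, ?, ?, ?)",
    [
        (1, 100, 1.5, "2013-01-05"),
        (2, 100, 2.0, "2013-01-06"),
        (3, 101, 0.0, "2013-01-07"),  # filtered out: no time spent
        (4, 101, 3.0, "2013-01-08"),
        (5, 102, 1.0, "2013-03-01"),  # filtered out: outside the date range
    ],
)

# Only the grouped column and an aggregate appear in the select list.
task_counts = conn.execute(
    """
    SELECT intworkorderid, COUNT(*) AS task_count
    FROM tblworkordertask
    WHERE dbltimespenthours > 0
      AND dtmdatecompleted BETWEEN :date_from AND :date_to
    GROUP BY intworkorderid
    ORDER BY intworkorderid
    """,
    {"date_from": "2013-01-01", "date_to": "2013-01-31"},
).fetchall()

print(task_counts)
```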

How to get other columns in this query

I am using a group by clause in my query. I want to get other columns not specified in the group by parameters
SELECT un.user, un.role
FROM [Unique] un
group by user, role
In the query above, [Unique] has 7 columns altogether. How do I get the other columns?
In most databases (MySQL and SQLite are the exceptions I know of), you cannot include a column in a GROUP BY SELECT unless:
The column is included in the GROUP BY clause.
The column is aggregated in one of the supported aggregate functions.
In MySQL and SQLite, the rows inside the aggregate groups from which the extra values get taken are undefined.
If you want extra columns in any other engine, you can wrap the column names in MAX():
SELECT un.user, un.role, MAX(un.city), MAX(un.bday)
FROM [Unique] un
GROUP BY user, role
In this case, the values for the extra columns are likely to come from different rows in the input record set. If this is important (sometimes it isn't since the extra columns come from the one side of a one-to-many JOIN), you can't use this technique.
Just to be clear: If you use GROUP BY in a SELECT, then each row you get back is constructed out of groups of multiple rows in the table you're SELECTing against. If you include columns that are not part of the GROUP BY clause, you're not giving the engine any instructions on which row from the table you want that value read from. Most engines, therefore, do not allow you to run this kind of SQL. MySQL does, with undefined results but I personally consider it bad practice to do this.
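The point about MAX() pulling values from different rows can be seen in a small sketch. It uses Python's sqlite3 module, which happens to accept the bracket-quoted [Unique] name. The city and bday values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented data: the same user/role pair appears with two different profiles.
conn.execute("CREATE TABLE [Unique] (user TEXT, role TEXT, city TEXT, bday TEXT)")
conn.executemany(
    "INSERT INTO [Unique] VALUES (?, ?, ?, ?)",
    [
        ("ann", "admin", "Zurich", "1980-01-01"),
        ("ann", "admin", "Austin", "1990-05-05"),
    ],
)

# Each MAX() is evaluated independently over the group.
hybrid = conn.execute(
    """
    SELECT un.user, un.role, MAX(un.city), MAX(un.bday)
    FROM [Unique] un
    GROUP BY user, role
    """
).fetchone()

originals = conn.execute("SELECT user, role, city, bday FROM [Unique]").fetchall()

# The grouped row pairs Zurich with the 1990 birthday,
# a combination that exists in neither input row.
print(hybrid, hybrid in originals)
```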
You have to choose on what basis you want the other columns. If multiple entries exist for the same user / role, do you want the first / last / random? You have to make choices on the other columns, by aggregating them or choosing to include them in the group by statement.
Some RDBMS do provide a default behaviour for performing this, but since the question is just marked SQL, we do not know if it applies.
Have you tried just specifying them?
SELECT un.user, un.role, un.col3, un.col4
FROM [Unique] un
group by user, role
You need to use an ORDER BY to get extra columns, or you end up specifying every column in your GROUP BY.
Use LEFT JOIN to self-join the Unique or use the SELECT with GROUP BY as sub-query.

Any reason for GROUP BY clause without aggregation function?

I'm (thoroughly) learning SQL at the moment and came across the GROUP BY clause.
GROUP BY aggregates or groups the resultset according to the argument(s) you give it. If you use this clause in a query you can then perform aggregate functions on the resultset to find statistical information on the resultset like finding averages (AVG()) or frequency (COUNT()).
My question is: is the GROUP BY statement in any way useful without an accompanying aggregate function?
Update
Using GROUP BY as a synonym for DISTINCT is (probably) a bad idea because I suspect it is slower.
is the GROUP BY statement in any way useful without an accompanying aggregate function?
Using DISTINCT would be a synonym in such a situation, but the reason you'd want/have to define a GROUP BY clause would be in order to be able to define HAVING clause details.
If you need to define a HAVING clause, you have to define a GROUP BY - you can't do it in conjunction with DISTINCT.
You can perform a DISTINCT select by using a GROUP BY without any AGGREGATES.
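A short sketch of both points, using Python's sqlite3 module and a hypothetical emails table. GROUP BY with nothing aggregated in the select list returns the same rows as DISTINCT, and adding HAVING filters whole groups, which DISTINCT cannot express.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with repeated values.
conn.execute("CREATE TABLE emails (address TEXT)")
conn.executemany(
    "INSERT INTO emails VALUES (?)",
    [("a@x",), ("a@x",), ("b@x",), ("c@x",), ("c@x",), ("c@x",)],
)

via_distinct = [r[0] for r in conn.execute(
    "SELECT DISTINCT address FROM emails ORDER BY address"
)]

# No aggregate in the select list: behaves like DISTINCT.
via_group_by = [r[0] for r in conn.execute(
    "SELECT address FROM emails GROUP BY address ORDER BY address"
)]

# HAVING needs the grouping: keep only addresses that occur more than once.
repeated = [r[0] for r in conn.execute(
    "SELECT address FROM emails GROUP BY address HAVING COUNT(*) > 1 ORDER BY address"
)]

print(via_distinct, via_group_by, repeated)
```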
GROUP BY can be used in two main ways:
1) in conjunction with SQL aggregation functions
2) to eliminate duplicate rows from a result set
So the answer to your question lies in the second use described above.
Note: everything below only applies to MySQL
GROUP BY is guaranteed to return results in order, DISTINCT is not. (This holds for MySQL versions before 8.0, which removed implicit GROUP BY sorting.)
GROUP BY along with ORDER BY NULL is of the same efficiency as DISTINCT (and implemented in the same way). If there is an index on the field being aggregated (or distinctified), both clauses use a loose index scan over this field.
In GROUP BY, you can return non-grouped and non-aggregated expressions. MySQL will pick any random values from the corresponding group to calculate the expression.
With GROUP BY, you can omit the GROUP BY expressions from the SELECT clause. With DISTINCT, you can't. Every row returned by a DISTINCT is guaranteed to be unique.
It is used for more than just aggregate functions.
For example, consider the following code:
SELECT product_name, MAX(last_purchased) FROM products GROUP BY product_name
This will return only 1 result per product, with the latest last_purchased value among that product's records.
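A quick way to check this behaviour is the sketch below, written with Python's sqlite3 module. The sample rows are made up. It also shows why the column name should not be wrapped in single quotes: in SQLite, as in standard SQL, that turns it into a string literal, and the maximum of a constant string is just that string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up purchase history with ISO dates stored as text.
conn.execute("CREATE TABLE products (product_name TEXT, last_purchased TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [
        ("widget", "2020-01-01"),
        ("widget", "2021-06-01"),
        ("gadget", "2019-03-03"),
    ],
)

# One row per product with its most recent purchase date.
latest = conn.execute(
    """
    SELECT product_name, MAX(last_purchased)
    FROM products
    GROUP BY product_name
    ORDER BY product_name
    """
).fetchall()

# Single quotes make 'last_purchased' a string literal, not a column reference.
quoted = conn.execute(
    """
    SELECT product_name, MAX('last_purchased')
    FROM products
    GROUP BY product_name
    ORDER BY product_name
    """
).fetchall()

print(latest, quoted)
```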