MIN() aggregate function in SQL

What logic does the MIN() aggregate function use to evaluate data types like CHAR or VARCHAR2?

MIN(varchar_column) returns the lowest string value for that column in the result set, as determined by the sort order (collation) of the column in question.

Note that MIN() does not pick the shortest string: it compares strings alphabetically, so length alone does not decide the result.
E.g., there are two fruits:
1. Apple,
2. Kiwi.
Output:
Apple (because A comes before K, even though Kiwi is shorter than Apple).
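The fruit example can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in database (an assumption; the question is about Oracle's CHAR/VARCHAR2, whose collation rules may differ), and the table name `fruits` is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (name TEXT)")  # hypothetical table
conn.executemany("INSERT INTO fruits VALUES (?)", [("Apple",), ("Kiwi",)])

# MIN() compares strings by sort order, not by length.
lowest = conn.execute("SELECT MIN(name) FROM fruits").fetchone()[0]
print(lowest)  # Apple

# The sort order matters: SQLite's default BINARY collation compares bytes,
# so an uppercase 'K' sorts before a lowercase 'a'.
conn.execute("UPDATE fruits SET name = 'apple' WHERE name = 'Apple'")
binary_min = conn.execute("SELECT MIN(name) FROM fruits").fetchone()[0]
nocase_min = conn.execute(
    "SELECT MIN(name COLLATE NOCASE) FROM fruits"
).fetchone()[0]
print(binary_min, nocase_min)  # Kiwi apple
```

The second half shows why "given the sorting order of the column" is the important part of the answer: the same data yields a different minimum under a case-insensitive collation.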

Related

Difference between count(*) and count(true) in sql?

What is the difference between select count(*) and select count(true)?
Is there any difference between count(*) and count(true), and which one should I use?
Can you give an example situation where each one is the better option?
The result of both is the same, but count(*) is slightly faster than count(true). That is because in the first case, the aggregate function has no arguments (that's what the * means in SQL), whereas in the second case the argument true is checked for NULLness, since count skips rows where the argument is NULL.
Both give the same result: the total number of rows in the table.
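A minimal sketch of the NULL-skipping behaviour described above, using Python's sqlite3 module as a stand-in database (an assumption; the `true` literal needs SQLite 3.23 or newer, and the table `t` is hypothetical). It only checks the results, not the speed difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])

# count(*) counts rows; count(expr) counts rows where expr is not NULL.
# The constant true is never NULL, so it matches count(*).
star, const_true, by_column = conn.execute(
    "SELECT count(*), count(true), count(val) FROM t"
).fetchone()
print(star, const_true, by_column)  # 3 3 2
```

Only count(val) differs, because one row holds NULL in that column.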

Impala mathematical operation containing avg fails with AnalysisException

I am attempting to subtract a value in a column (column_18) from the average of another column (avg(column_19)) and obtain this result as a third column (result) for each row of the table:
cur.execute("Select avg(column_19) - column_18 as result FROM test1")
This doesn't seem to be working well, and I get this error:
impala.error.HiveServer2Error: AnalysisException: select list expression not produced by aggregation output (missing from GROUP BY clause?): SUM(column_19) / COUNT(column_19) - column_18
I do not want the result to be grouped
avg() in this context is an aggregate function, which means that it is applied to a group of rows, which may be specified with a GROUP BY clause (or to all rows if none is specified). The output of an aggregate expression is a single value per group, so it is not applied per row as you want.
However, you can accomplish what you're trying to do in a few ways. I think the easiest is by using avg() as an analytic function. For example, you can do something like:
select column_19, column_18, (avg(column_19) over () - column_18) as result from test1
See the documentation for more details about how aggregations and analytic functions work.
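The analytic form can be checked against a toy table. This sketch runs on Python's sqlite3 module rather than Impala (an assumption; it needs SQLite 3.25 or newer for window functions), and the sample values are invented. The scalar-subquery variant is one possible alternative, not something taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test1 (column_18 REAL, column_19 REAL)")
conn.executemany("INSERT INTO test1 VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])

# avg(...) OVER () computes the average over all rows but keeps every row.
rows = conn.execute(
    "SELECT column_18, avg(column_19) OVER () - column_18 AS result "
    "FROM test1 ORDER BY column_18"
).fetchall()
print(rows)  # [(1.0, 19.0), (2.0, 18.0), (3.0, 17.0)]

# Alternative sketch: compute the average once in a scalar subquery.
alt = conn.execute(
    "SELECT column_18, (SELECT avg(column_19) FROM test1) - column_18 AS result "
    "FROM test1 ORDER BY column_18"
).fetchall()
```

Both queries return one result per row without any GROUP BY.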

What is MAX(DISTINCT x) in SQL?

I just stumbled over jOOQ's maxDistinct SQL aggregation function.
What does MAX(DISTINCT x) do differently from just MAX(x)?
maxDistinct and minDistinct were defined in order to keep consistency with the other aggregate functions where having a distinct option actually makes a difference (e.g., countDistinct, sumDistinct).
Since the maximum (or minimum) calculated over the distinct values of a dataset is mathematically equivalent to the simple maximum (or minimum) of the same set, these functions are essentially redundant.
In short, there will be no difference. In the case of MySQL, this is even stated in the manual:
Returns the maximum value of expr. MAX() may take a string argument;
in such cases, it returns the maximum string value. See Section 8.5.3,
“How MySQL Uses Indexes”. The DISTINCT keyword can be used to find the
maximum of the distinct values of expr, however, this produces the
same result as omitting DISTINCT.
DISTINCT is allowed here to keep compatibility with other platforms. Internally, there is no difference: MySQL simply ignores DISTINCT in this case and does not try to produce a distinct set of rows first. For indexed columns, EXPLAIN will show "Select tables optimized away" (one value is read from the index, not from the table); for non-indexed columns, it is a full scan.
If I'm not mistaken, there is no difference.
For the column
ID
1
2
2
3
3
4
5
5
The output of both queries is the same: 5
MAX(DISTINCT x)
// ID = 1,2,2,3,3,4,5,5
// DISTINCT = 1,2,3,4,5
// MAX = 5
// 1 row
and for
MAX(x)
// ID = 1,2,2,3,3,4,5,5
// MAX = 5
// 1 row
In theory, DISTINCT x removes duplicates so that every element of the set is unique, and the MAX operator selects the highest value from a set. In plain SQL there should be no difference between the two.
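The ID column above can be used to confirm this, and to contrast it with an aggregate where DISTINCT does change the result. The sketch below uses Python's sqlite3 module as a stand-in database (an assumption; the thread discusses jOOQ and MySQL), with a hypothetical table `t`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO t VALUES (?)", [(i,) for i in (1, 2, 2, 3, 3, 4, 5, 5)]
)

max_all, max_distinct, sum_all, sum_distinct = conn.execute(
    "SELECT MAX(id), MAX(DISTINCT id), SUM(id), SUM(DISTINCT id) FROM t"
).fetchone()
print(max_all, max_distinct)  # 5 5   -> DISTINCT makes no difference
print(sum_all, sum_distinct)  # 25 15 -> DISTINCT changes the result
```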

MDX: Do you have to use ORDER when using RANK?

Will RANK get confused if you have not ORDERed the data?
Will the resulting ranking column contain garbage or would it just return #ERROR ?
No, it will return a valid value (the ordinal position of the specified tuple in the set)
Rank (MDX)
http://msdn.microsoft.com/en-us/library/ms144726.aspx
It appears that this was only the case for 2000 (so says Wiley's "MDX Solutions")

In Oracle, find a number which is larger than 80% of a set of numbers

Assume I have a table with a column of integers in Oracle. There are a good number of rows; somewhere in the millions. I want to write a query that gives me back an integer that is larger than 80% of all of the numbers in the table. What is the best way to approach this?
If it matters, this is Oracle 10g r1.
Sounds like you want to use the PERCENTILE_DISC function if you want an actual value from the set, or PERCENTILE_CONT if you want an interpolated value for a particular percentile, say 80%:
SELECT PERCENTILE_DISC(0.8)
WITHIN GROUP(ORDER BY integer_col ASC)
FROM some_table
EDIT
If you use PERCENTILE_DISC, it will return an actual value from the dataset, so if you wanted a larger value, you'd want to increment that by 1 (for an integer column).
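To see why the increment is needed, here is a rough emulation on a toy dataset. SQLite (used here through Python's sqlite3 module, version 3.25 or newer) has no built-in PERCENTILE_DISC, so the sketch assumes Oracle's documented definition: the value with the smallest CUME_DIST that is greater than or equal to the requested percentile. The table contents (1 through 10) are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (integer_col INTEGER)")
conn.executemany(
    "INSERT INTO some_table VALUES (?)", [(i,) for i in range(1, 11)]
)

# Emulate PERCENTILE_DISC(0.8): smallest value whose cumulative
# distribution reaches 0.8.
p80 = conn.execute(
    """
    SELECT MIN(integer_col)
    FROM (SELECT integer_col,
                 CUME_DIST() OVER (ORDER BY integer_col) AS cd
          FROM some_table)
    WHERE cd >= 0.8
    """
).fetchone()[0]
print(p80, p80 + 1)  # 8 9
```

The value 8 is greater than or equal to 80% of the rows but strictly greater than only 70% of them, which is why adding 1 gives a number that is strictly larger than 80% of the set.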
I think you could use the NTILE function to divide the input into 5 buckets, then select the MIN(Column) from the top bucket.
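A minimal sketch of the NTILE idea, again on Python's sqlite3 module as a stand-in for Oracle (an assumption; it needs SQLite 3.25 or newer). The table and column names are borrowed from the query in the other answer, and the data (1 through 10) is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (integer_col INTEGER)")
conn.executemany(
    "INSERT INTO some_table VALUES (?)", [(i,) for i in range(1, 11)]
)

# Split the ordered values into 5 equal buckets; bucket 5 holds the top 20%.
top_bucket_min = conn.execute(
    """
    SELECT MIN(integer_col)
    FROM (SELECT integer_col,
                 NTILE(5) OVER (ORDER BY integer_col) AS bucket
          FROM some_table)
    WHERE bucket = 5
    """
).fetchone()[0]
print(top_bucket_min)  # 9
```

With ten values, the top bucket is (9, 10), so the query returns 9, which is larger than 8 of the 10 values.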