COUNT(*) with ORDER BY works on Oracle but not on PostgreSQL

The SQL query below works on Oracle but not on PostgreSQL:
select count(*) from users where id>1 order by username;
I know that ORDER BY has no meaning in this query, but why does it still work on Oracle? Below is the error on PostgreSQL:
ERROR: column "users.username" must appear in the GROUP BY clause or be used in an aggregate function
Position: 48
SQLState: 42803
PostgreSQL version 9.6.3

As Oracle's execution plan shows, there is no sort step after the rows are aggregated, which suggests that Oracle's SQL engine simply ignores the ORDER BY clause.
Why doesn't it work in PostgreSQL? Because the people running Postgres know what they're doing ;) Just kidding, but answering that would be highly speculative for me without seeing the Oracle vs MySQL source. The bigger question is whether Oracle and MySQL allow this by coincidence, or because Oracle owns both.
Final note:
If you're going to ask why similar software applications behave differently, I think it's also important to include which versions you're referring to. Even different versions of the same application may behave differently.

If you only need the count of all records, there is no need for the ORDER BY clause, because it has no meaning even in Oracle. In that case, remove it:
select count(*) from users where id>1;
If you are looking for a per-username count, then sorting on username is meaningful, and you can use the following query:
select count(*) from users where id>1 group by username order by username;
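To see how the two queries differ, here is a minimal sketch using SQLite through Python's built-in sqlite3 module (not Oracle or PostgreSQL). The users rows are made up, and the grouped query also selects username so each count can be matched to its group.

```python
import sqlite3

# Hypothetical users table mirroring the question's columns (id, username).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    INSERT INTO users (id, username) VALUES
        (1, 'alice'), (2, 'carol'), (3, 'bob'), (4, 'carol'), (5, 'alice');
""")

# Total count: a single aggregate row, so there is nothing for ORDER BY to sort.
total = conn.execute("SELECT count(*) FROM users WHERE id > 1").fetchone()[0]

# Per-username count: one row per group, so ORDER BY username is meaningful.
per_user = conn.execute(
    "SELECT username, count(*) FROM users WHERE id > 1 "
    "GROUP BY username ORDER BY username"
).fetchall()

print(total)     # 4
print(per_user)  # [('alice', 1), ('bob', 1), ('carol', 2)]
```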
Hope this clears up your doubt.

You can use a WITH statement in PostgreSQL to do MySQL-style things like returning every row together with the total row count:
with cnt (cnt1) AS ( select count(*) as cnt1 from sample )
select *, c.cnt1 as len from sample ,cnt as c;
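A minimal sketch of the same WITH pattern, run against SQLite via Python's sqlite3 module rather than PostgreSQL. The sample table and its rows are hypothetical, and the columns are listed explicitly instead of using `*`.

```python
import sqlite3

# Hypothetical "sample" table; only its row count matters here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO sample (id, name) VALUES (1, 'x'), (2, 'y'), (3, 'z');
""")

# The CTE yields a one-row table holding the count; cross-joining it
# attaches that total to every row of sample.
rows = conn.execute("""
    WITH cnt (cnt1) AS (SELECT count(*) FROM sample)
    SELECT s.id, s.name, c.cnt1 AS len
    FROM sample AS s, cnt AS c
    ORDER BY s.id
""").fetchall()

print(rows)  # [(1, 'x', 3), (2, 'y', 3), (3, 'z', 3)]
```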

Related

When does the aliasing take effect? [duplicate]

I have a question about aliases in SQL. If I define an alias, can I use it elsewhere in the same query? For example:
Consider a table named xyz with columns a and b:
select (a/b) as temp , temp/5 from xyz
Is this possible in some way?
You are talking about giving an identifier to an expression in a query and then reusing that identifier in other parts of the query?
That is not possible in Microsoft SQL Server, to which nearly all of my SQL experience is limited. However, you can do the following:
SELECT temp, temp / 5
FROM (
SELECT (a/b) AS temp
FROM xyz
) AS T1
Obviously that example isn't particularly useful, but if you were using the expression in several places it would be more useful. It comes in handy when the expressions are long and you also want to group on them, because the GROUP BY clause requires you to restate the expression.
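A minimal sketch of the derived-table workaround, run against SQLite via Python's sqlite3 module instead of SQL Server. The xyz rows are hypothetical, and the alias is renamed from temp to ratio (a more descriptive name, and one that avoids SQLite's TEMP keyword).

```python
import sqlite3

# Hypothetical xyz table; REAL columns avoid integer division.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE xyz (a REAL, b REAL);
    INSERT INTO xyz (a, b) VALUES (100, 2), (30, 3);
""")

# The inner query defines the alias; the outer query can then use it
# like an ordinary column.
rows = conn.execute("""
    SELECT ratio, ratio / 5 AS ratio_div_5
    FROM (SELECT a / b AS ratio FROM xyz) AS t1
    ORDER BY ratio
""").fetchall()

print(rows)  # [(10.0, 2.0), (50.0, 10.0)]
```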
In MSSQL you also have the option of creating computed columns which are specified in the table schema and not in the query.
You can use Oracle's WITH clause too; similar constructs are available in other databases as well. Here is the one we use for Oracle:
with t
as (select a/b as temp
from xyz)
select temp, temp/5
from t
/
This can have a performance advantage, particularly for complex queries involving several nested queries, because the WITH subquery can be evaluated once and reused wherever it is referenced. Whether it is actually materialized or inlined is up to each database's optimizer.
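A minimal sketch of the WITH approach, run against SQLite via Python's sqlite3 module rather than Oracle (so there is no SQL*Plus `/` terminator). It also groups on the alias without restating the expression. The xyz rows and the names ratio and ratio_div_5 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE xyz (a REAL, b REAL);
    INSERT INTO xyz (a, b) VALUES (100, 2), (30, 3), (50, 1);
""")

# t exposes the computed column "ratio", which the outer query can
# select, transform, and GROUP BY by name.
rows = conn.execute("""
    WITH t AS (SELECT a / b AS ratio FROM xyz)
    SELECT ratio, ratio / 5 AS ratio_div_5, count(*) AS n
    FROM t
    GROUP BY ratio
    ORDER BY ratio
""").fetchall()

print(rows)  # [(10.0, 2.0, 1), (50.0, 10.0, 2)]
```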
Not possible in the same SELECT clause, assuming your SQL product is compliant with entry level Standard SQL-92.
Expressions (and their correlation names) in the SELECT clause come into existence 'all at once'; there is no left-to-right evaluation that you seem to hope for.
As per @Josh Einstein's answer here, you can use a derived table as a workaround (hopefully using a more meaningful name than 'temp' and providing one for the temp/5 expression; keep in mind the person who will inherit your code).
Note that the code you posted would work on the MS Access Database Engine (which would assign a meaningless correlation name such as Expr1 to your second expression), but then again it is not a real SQL product.
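It's possible, I guess (note, though, that this FROM clause cross-joins three copies of xyz, so it returns a row for every combination of rows rather than one row per row of xyz):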
SELECT (A/B) as temp, (temp/5)
FROM xyz,
(SELECT numerator_field as A, Denominator_field as B FROM xyz),
(SELECT (numerator_field/denominator_field) as temp FROM xyz);
This is now available in Amazon Redshift (lateral column alias reference).
For example:
select clicks / impressions as probability, round(100 * probability, 1) as percentage from raw_data;
Ref:
https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-lateral-column-alias-reference/
You might find the W3Schools "SQL Alias" page helpful.
Here is an example from their tutorial:
SELECT po.OrderID, p.LastName, p.FirstName
FROM Persons AS p,
Product_Orders AS po
WHERE p.LastName='Hansen' AND p.FirstName='Ola'
As for using the alias further in the query, that might be possible depending on the database you are using.

In what order does a SQL statement execute if the SELECT has a CONCAT function on some columns?

I have been trying to understand some code that is part of the BigQuery course on Coursera. The query looks like this:
SELECT
CONCAT(fullVisitorId, CAST(visitID AS STRING)) AS unique_session_id,
sessionQualityDim,
SUM(productRevenue) AS transactions_revenue
FROM
transaction_table
WHERE sessionQualityDim > 60
GROUP BY unique_session_id, sessionQualityDim
My question is about the order in which this SQL statement executes. Specifically, when the GROUP BY is done on unique_session_id (which is the CONCAT of two columns), how does GROUP BY know about the computed CONCAT result (unique_session_id)? As far as I know, the SELECT is evaluated last. But in this case it seems that the first field is calculated using CONCAT and then GROUP BY uses it for grouping. Can someone give more insight on this?
According to the SQL standard, the GROUP BY is parsed before the SELECT.
However, this is not a hard-and-fast rule among databases. BigQuery determines the column aliases from the SELECT and then allows those aliases in the GROUP BY. Other databases do this as well, for example Postgres and the databases derived from it.
Do not confuse the parsing of the query with its execution, though. The execution is carried out by a very complicated parallel directed acyclic graph. What is happening here is simply that the BigQuery parser is (conveniently) allowing users to use column aliases in GROUP BY.
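A minimal sketch of this alias resolution, using SQLite through Python's sqlite3 module as a stand-in for BigQuery. SQLite also resolves SELECT-list aliases in GROUP BY. The table contents are hypothetical, and `||` replaces CONCAT because older SQLite versions lack a CONCAT function. The second query restates the expression, which is the portable form for databases that do not accept the alias.

```python
import sqlite3

# Hypothetical stand-in for the course's transaction_table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transaction_table (
        fullVisitorId TEXT, visitID INTEGER,
        sessionQualityDim INTEGER, productRevenue INTEGER);
    INSERT INTO transaction_table VALUES
        ('v1', 1, 70, 10), ('v1', 1, 70, 5),
        ('v1', 2, 80, 7), ('v2', 1, 50, 99);
""")

# GROUP BY refers to the SELECT-list alias unique_session_id.
with_alias = conn.execute("""
    SELECT fullVisitorId || CAST(visitID AS TEXT) AS unique_session_id,
           sessionQualityDim,
           SUM(productRevenue) AS transactions_revenue
    FROM transaction_table
    WHERE sessionQualityDim > 60
    GROUP BY unique_session_id, sessionQualityDim
    ORDER BY unique_session_id
""").fetchall()

# Portable form: restate the expression instead of using the alias.
restated = conn.execute("""
    SELECT fullVisitorId || CAST(visitID AS TEXT),
           sessionQualityDim,
           SUM(productRevenue)
    FROM transaction_table
    WHERE sessionQualityDim > 60
    GROUP BY fullVisitorId || CAST(visitID AS TEXT), sessionQualityDim
    ORDER BY 1
""").fetchall()

print(with_alias)  # [('v11', 70, 15), ('v12', 80, 7)]
```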

Duplicate columns in Oracle query using row limiting clause

Since Oracle 12c, we can finally use the SQL standard row limiting clause like this:
SELECT * FROM t FETCH FIRST 10 ROWS ONLY
In Oracle 12.1, however, there was a limitation that is quite annoying when joining tables: it's not possible to have two columns with the same name in the SELECT clause when using the row limiting clause. E.g. this raises ORA-00918 in Oracle 12.1:
SELECT t.id, u.id FROM t, u FETCH FIRST 10 ROWS ONLY
This restriction is documented in the manual for all versions 12.1, 12.2, 18.0.
The workaround is obviously to alias the columns:
SELECT t.id AS t_id, u.id AS u_id FROM t, u FETCH FIRST 10 ROWS ONLY
Or to resort to "classic" pagination using ROWNUM or window functions.
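A minimal sketch of the window-function style of pagination with aliased columns. It is run against SQLite (window functions need SQLite 3.25 or newer) via Python's sqlite3 module, so it illustrates the technique rather than Oracle itself. The tables t and u and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER);
    CREATE TABLE u (id INTEGER);
    INSERT INTO t VALUES (1), (2), (3);
    INSERT INTO u VALUES (10), (20), (30), (40);
""")

limit = 5
# Alias the clashing id columns, number the rows in a derived table,
# then keep only the first `limit` rows.
rows = conn.execute("""
    SELECT t_id, u_id
    FROM (
        SELECT t.id AS t_id, u.id AS u_id,
               ROW_NUMBER() OVER (ORDER BY t.id, u.id) AS rn
        FROM t CROSS JOIN u
    ) AS numbered
    WHERE rn <= ?
    ORDER BY rn
""", (limit,)).fetchall()

print(rows)  # [(1, 10), (1, 20), (1, 30), (1, 40), (2, 10)]
```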
Curiously, though, the original query with ambiguous ID columns runs just fine from Oracle 12.2 onwards. Is this a documentation bug, or an undocumented feature?
It seems that when you use the row limiting clause, Oracle internally uses the ROW_NUMBER() function with the column name in the OVER clause, like ROW_NUMBER() OVER (ORDER BY ID). Because of this, you get the ORA-00918 error.
I noticed that you have an implicit join. It would be interesting to see whether the problem goes away when joining explicitly. I'm wondering if, behind the scenes, Oracle is doing a join based on id=id and not using the table aliases you assigned.
That would also explain why the column aliases fix the issue. Try joining explicitly; that could force Oracle to use the table aliases and resolve the ambiguity it thinks it sees.

How is this query working on Teradata?

I ran the query below on Teradata and it returned the expected result:
select column1 as c1Alias from my_table where column2 in ( c1Alias , 10 , 20 , 30) ;
But when I ran the same query on Hive, it threw the exception below:
FAILED: SemanticException [Error 10004]: Line 1:44 Invalid table alias or column reference 'c1Alias': (possible column names are: .......)
I am not surprised that it fails on Hive, but I am surprised that it works on Teradata.
As I understand it, clauses are executed in the order WHERE, then SELECT, so an alias defined in the SELECT clause should not be available in the WHERE clause. Correct me if I am wrong here.
I would really like to know how this works in Teradata.
You're correct: logically, any SELECT is processed in the following order:
FROM
WHERE
GROUP BY
HAVING
OLAP functions
QUALIFY
create SELECT column list
SAMPLE
ORDER BY
Apart from the proprietary QUALIFY and SAMPLE, every DBMS does it exactly the same way.
When you add a filter to the WHERE condition, the column list has not been created yet, so using an alias should fail (and it will fail in almost every other DBMS; as far as I know, only Access allows it in a similar way to Teradata).
It doesn't fail because Teradata is older than Standard SQL, and this seems to be a relic of the query language Teradata implemented first.
But it's a nice extension (just never use an existing column name as an alias, to avoid confusing the optimizer and/or the end user). You get used to it very fast, and it avoids a lot of cut-and-paste or derived tables.
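If the query must run outside Teradata, the usual standard-SQL workaround is a derived table, or simply repeating column1 in the IN list. This is a minimal sketch run against SQLite via Python's sqlite3 module, with a hypothetical my_table and made-up rows.

```python
import sqlite3

# Hypothetical my_table with the question's column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (column1 INTEGER, column2 INTEGER);
    INSERT INTO my_table VALUES (5, 5), (7, 20), (8, 9), (1, 30), (2, 3);
""")

# The derived table d defines c1Alias, so the outer WHERE, which is
# processed after d's SELECT list, can reference it by name.
rows = conn.execute("""
    SELECT c1Alias
    FROM (SELECT column1 AS c1Alias, column2 FROM my_table) AS d
    WHERE column2 IN (c1Alias, 10, 20, 30)
    ORDER BY c1Alias
""").fetchall()

print(rows)  # [(1,), (5,), (7,)]
```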
The order of execution of SQL is explained very well here:
https://www.eversql.com/sql-order-of-operations-sql-query-order-of-execution/
An excerpt from the post for quick reference (credit to the author for covering all 10 parts of SQL):
1. FROM, including JOINs
2. WHERE
3. GROUP BY
4. HAVING
5. WINDOW functions
6. SELECT
7. DISTINCT
8. UNION
9. ORDER BY
10. LIMIT and OFFSET
