I tried to run below query at teradata and It resulted as expected :
select column1 as c1Alias from my_table where column2 in ( c1Alias , 10 , 20 , 30) ;
But I tried to run same query on HIVE , It throws exception as given below :
FAILED: SemanticException [Error 10004]: Line 1:44 Invalid table alias or column reference 'c1Alias': (possible column names are: .......)
I am not surprised why it is failing at HIVE , but surprised how it is working on Teradata.
As per my understanding, Clauses are executed in order as WHERE >> SELECT. Apparently alias generated at SELECT clause would not be available for use in WHERE clause. Correct me if I am wrong here.
I really wanted to know how it is working in teradata ?
You're correct, logically any SELECT is processed in following order:
FROM
WHERE
GROUP BY
HAVING
OLAP functions
QUALIFY
create SELECT column list
SAMPLE
ORDER BY
Besides the proprietary QUALIFY/SAMPLE every DBMS will do it exactly the same.
When you add a filter to the WHERE-condition the column list is not yet created, thus using an alias should fail (and will fail in almost every other DBMS, afaik only Access allows using it similar to Teradata).
It's not failing because Teradata is older than Standard SQL and this seems to be an relict of the query language Teradata implemented first.
But it's a nice extension (just never alias to an existing column name to avoid confusing the optimizer and/or end user) and you get used to it very fast, it avoids lots cut&paste or Derived Tables.
The order of execution of SQL is explained very well over here:
https://www.eversql.com/sql-order-of-operations-sql-query-order-of-execution/
An excerpt from the post for your quick reference: (Credits to the author for covering all 10 parts of SQL)
FROM, including JOINs
WHERE
GROUP BY
HAVING
WINDOW functions
SELECT
DISTINCT
UNION
ORDER BY
10.LIMIT and OFFSET
Related
I have a doubt and question regarding alias in sql. If i want to use the alias in same query can i use it. For eg:
Consider Table name xyz with column a and b
select (a/b) as temp , temp/5 from xyz
Is this possible in some way ?
You are talking about giving an identifier to an expression in a query and then reusing that identifier in other parts of the query?
That is not possible in Microsoft SQL Server which nearly all of my SQL experience is limited to. But you can however do the following.
SELECT temp, temp / 5
FROM (
SELECT (a/b) AS temp
FROM xyz
) AS T1
Obviously that example isn't particularly useful, but if you were using the expression in several places it may be more useful. It can come in handy when the expressions are long and you want to group on them too because the GROUP BY clause requires you to re-state the expression.
In MSSQL you also have the option of creating computed columns which are specified in the table schema and not in the query.
You can use Oracle with statement too. There are similar statements available in other DBs too. Here is the one we use for Oracle.
with t
as (select a/b as temp
from xyz)
select temp, temp/5
from t
/
This has a performance advantage, particularly if you have a complex queries involving several nested queries, because the WITH statement is evaluated only once and used in subsequent statements.
Not possible in the same SELECT clause, assuming your SQL product is compliant with entry level Standard SQL-92.
Expressions (and their correlation names) in the SELECT clause come into existence 'all at once'; there is no left-to-right evaluation that you seem to hope for.
As per #Josh Einstein's answer here, you can use a derived table as a workaround (hopefully using a more meaningful name than 'temp' and providing one for the temp/5 expression -- have in mind the person who will inherit your code).
Note that code you posted would work on the MS Access Database Engine (and would assign a meaningless correlation name such as Expr1 to your second expression) but then again it is not a real SQL product.
Its possible I guess:
SELECT (A/B) as temp, (temp/5)
FROM xyz,
(SELECT numerator_field as A, Denominator_field as B FROM xyz),
(SELECT (numerator_field/denominator_field) as temp FROM xyz);
This is now available in Amazon Redshift
E.g.
select clicks / impressions as probability, round(100 * probability, 1) as percentage from raw_data;
Ref:
https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-lateral-column-alias-reference/
You might find W3Schools "SQL Alias" to be of good help.
Here is an example from their tutorial:
SELECT po.OrderID, p.LastName, p.FirstName
FROM Persons AS p,
Product_Orders AS po
WHERE p.LastName='Hansen' AND p.FirstName='Ola'
Regarding using the Alias further in the query, depending on the database you are using it might be possible.
Since Oracle 12c, we can finally use the SQL standard row limiting clause like this:
SELECT * FROM t FETCH FIRST 10 ROWS ONLY
Now, in Oracle 12.1, there was a limitation that is quite annoying when joining tables. It's not possible to have two columns of the same name in the SELECT clause, when using the row limiting clause. E.g. this raises ORA-00918 in Oracle 12.1
SELECT t.id, u.id FROM t, u FETCH FIRST 10 ROWS ONLY
This is a restriction is documented in the manual for all versions 12.1, 12.2, 18.0:
The workaround is obviously to alias the columns
SELECT t.id AS t_id, u.id AS u_id FROM t, u FETCH FIRST 10 ROWS ONLY
Or to resort to "classic" pagination using ROWNUM or window functions.
Curiously, though, the original query with ambiguous ID columns runs just fine from Oracle 12.2 onwards. Is this a documentation bug, or an undocumented feature?
Seems in this case when you are using the row limiting clause, Oracle internally calling the ROW_NUMBER() function where it using the column name in OVER clause Like ROW_NUMBER OVER(ORDER BY ID). because of this you are getting the ORA-00918 error.
I noticed that you have an implicit join. It would be interesting to see if the problem goes away when joining explicitly. I'm wondering if behind the scenes Oracle is doing a join based on id=id and not using the table aliases you assigned them.
That would also explain the column aliases fixing the issue. Try explicitly joining; that could force oracle to use the table aliases and resolve the ambiguity it thinks it sees.
I have a SQL statement that pulls data I need but I can't get the syntax right in Crystal Reports.
This statement works in SQL:
SELECT
max([meter_reading])
FROM [Forefront].[dbo].[EC_METER_HISTORY_MC]
WHERE [Meter_Number] = '1' AND [Transaction_Date] < '20130101'
GROUP BY
[Company_Code], [Equipment_Code], [Meter_Number]
This is what I changed it to in crystal but I can't get the right syntax.
SELECT
Maximum({EC_METER_HISTORY_MC.meter_reading})
FROM [EC_METER_HISTORY_MC]
WHERE {EC_METER_HISTORY_MC.Meter_Number} = '1'
AND {EC_METER_HISTORY_MC.Transaction_Date} < {1?Startdate}
GROUP BY {EC_METER_HISTORY_MC.Company_Code}
,{EC_METER_HISTORY_MC.Equipment_Code}
,{EC_METER_HISTORY_MC.Meter_Number}
Your first step should be reading up on how SQL Expressions work in Crystal. Here is a good link to get you started.
A few of your problems include:
Using a parameter field. SQL Expressions are not compatible with CR
parameters and cannot be used in them.
SQL Expressions can only return scalar values per row of your report. That means that your
use of GROUP BY doesn't serve any purpose.
Your use of curly braces means that you're referencing those fields in the main report query instead of in the subquery you're trying to create with this expression.
Here's a simplified example that would find the max meter reading of a particular meter (for Oracle since that's what I know and you didn't specify which DB you're using):
case when {EC_METER_HISTORY_MC.Meter_Number} is null then null
else (select max(Meter_Reading)
from EC_METER_HISTORY_MC
where Meter_Number={EC_METER_HISTORY_MC.Meter_Number} --filter by the meter number from main query
and Transaction_Date < Current_Date) --filter by some date. CAN'T use parameter here.
end
You can't use parameter fields in a SQL Expression, sadly. Perhaps you can correlate the Transaction_Date to a table in the main query. Otherwise, I would suggest using a Command.
You have two options for the Command:
Use a single Command object as the data source for the whole report--which involves (potentially) a fair amount of rework.
Add a Command to the existing table set (in the Database 'Expert'). Link it to other tables as desired. This will perform a second SELECT and join the results in memory (WhileReadingRecords, if I'm not mistaken). The slight performance hit may we worth the benefit.
Assuming I have the following aggregate functions:
AGG1
AGG2
AGG3
AGG4
Is it possible to write valid SQL (in a db agnostic way) like this:
SELECT [COL1, COL2 ....], AGG1(param1), AGG2(param2) FROM [SOME TABLES]
WHERE [SOME CRITERIA]
HAVING AGG3(param2) >-1 and AGG4(param4) < 123
GROUP BY COL1, COL2, ... COLN
ORDER BY COL1, COLN ASC
LIMIT 10
Where COL1 ... COLN are columns in the tables being queried, and param1 ... paramX are parameters passed to the AGG funcs.
Note: AGG1 and AGG2 are returned in the results as columns (but do not appear in the HAVING CLAUSE, and AGG3 and AGG4 appear in the HAVING CLAUSE but are not returned in the result set.
Ideally, I want a DB agnostic answer to the solution, but if I have to be tied to a db, I am using PostgreSQL (v9.x).
Edit
Just a matter of clarification: I am not opposed to using GROUP BY in the query. My SQL is not very good, so the example SQL above may have been slightly misleading. I have edited the pseudo sql statement above to hopefully make my intent more clear.
The main thing I wanted to find out was whether a select query that used AGG functions could:
Have agg functions values in the returned column without them being specified in a HAVING clause.
Have agg functions specified in a HAVING clause, but are not returned in the result set.
From the answers I have received so far, it would seem the answer to both questions is YES. The only think I have to do to correct my SQL is to add a GROUP BY clause to make sure that the returned rows are unique.
PostgreSQL major version include the first digit after the dot, thus "PostgreSQL (v9.x)" is not specific enough. As #kekekela said, there is no (cheap) completely db agnostic way. Even between PostgreSQL 9.0 and 9.1 there is an important syntactical difference.
If you had only the grouped values AGG1(param1), AGG2(param2) you would get away without providing an explicit GROUP BY clause. Since you mix grouped and non-grouped columns you have to provide a GROUP BY clause with all non-grouped columns that appear in the SELECT. That's true for any version of PostgreSQL. Read about GROUP BY and HAVING it in the manual.
Starting with version 9.1, however, once you list a primary key in the GROUP BY you can skip additional columns for this table and still use them in the SELECT list. The release notes for version 9.1 tell us:
Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause (Peter Eisentraut)
Concerning parameters
Do you intend to feed a constant value to an aggregate function? What's the point? The docs tell us
An aggregate function computes a single result from multiple input rows.
Or do you want those parameters to be column names? That kind of dynamic SQL works as long as the statement is generated before committing to the database. Does not work for prepared statements or simple sql or plpgsql functions. You have to use EXECUTE in a plpgsql function for that purpose.
As safeguard against SQLi use the USING $1, $2 syntax for values and quote_ident() for your column or table names.
The only way to aggregate over columns without using GROUP BY is to use windowing functions. You left out details of your problem, but the following might be what you are looking for:
SELECT *
FROM (
SELECT [COL1, COL2 ....],
AGG1(param1) over (partition by some_grouping_column) as agg1,
AGG2(param2) over (partition by some_grouping_column) as agg2,
row_number() over () as rn
FROM [SOME TABLES]
WHERE [SOME CRITERIA]
ORDER BY COL1
) t
WHERE AGG3 >-1
AND AGG4 < 123
AND rn <= 10
ORDER BY col1
This is standard ANSI SQL and works on most database including PostgreSQL (since 8.4).
Note that you do not need to use the same grouping column for both aggregates in the partition by clause.
If you want to stick with ANSI SQL then you should use the row_number() function to limit the result. If you run this only on PostgreSQL (or other DBMS that support LIMIT in some way) move the LIMIT cause into the derived table (the inner query)
That should work from a high level perspective, except you'd need COL1, COL2 etc in a GROUP BY statement or else they won't be valid in the SELECT list. Having AGG1, etc in the SELECT list and not in the HAVING is not a problem.
As far as db agnostic, you're going to have to tweak syntax no matter what you do (the LIMIT for example is going to be different in PostgreSQL, SQL SERVER and Oracle that I know off the top of my head), but you could build logic to construct the statements properly for each provided your high-level representation is solid.
I have a doubt and question regarding alias in sql. If i want to use the alias in same query can i use it. For eg:
Consider Table name xyz with column a and b
select (a/b) as temp , temp/5 from xyz
Is this possible in some way ?
You are talking about giving an identifier to an expression in a query and then reusing that identifier in other parts of the query?
That is not possible in Microsoft SQL Server which nearly all of my SQL experience is limited to. But you can however do the following.
SELECT temp, temp / 5
FROM (
SELECT (a/b) AS temp
FROM xyz
) AS T1
Obviously that example isn't particularly useful, but if you were using the expression in several places it may be more useful. It can come in handy when the expressions are long and you want to group on them too because the GROUP BY clause requires you to re-state the expression.
In MSSQL you also have the option of creating computed columns which are specified in the table schema and not in the query.
You can use Oracle with statement too. There are similar statements available in other DBs too. Here is the one we use for Oracle.
with t
as (select a/b as temp
from xyz)
select temp, temp/5
from t
/
This has a performance advantage, particularly if you have a complex queries involving several nested queries, because the WITH statement is evaluated only once and used in subsequent statements.
Not possible in the same SELECT clause, assuming your SQL product is compliant with entry level Standard SQL-92.
Expressions (and their correlation names) in the SELECT clause come into existence 'all at once'; there is no left-to-right evaluation that you seem to hope for.
As per #Josh Einstein's answer here, you can use a derived table as a workaround (hopefully using a more meaningful name than 'temp' and providing one for the temp/5 expression -- have in mind the person who will inherit your code).
Note that code you posted would work on the MS Access Database Engine (and would assign a meaningless correlation name such as Expr1 to your second expression) but then again it is not a real SQL product.
Its possible I guess:
SELECT (A/B) as temp, (temp/5)
FROM xyz,
(SELECT numerator_field as A, Denominator_field as B FROM xyz),
(SELECT (numerator_field/denominator_field) as temp FROM xyz);
This is now available in Amazon Redshift
E.g.
select clicks / impressions as probability, round(100 * probability, 1) as percentage from raw_data;
Ref:
https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-lateral-column-alias-reference/
You might find W3Schools "SQL Alias" to be of good help.
Here is an example from their tutorial:
SELECT po.OrderID, p.LastName, p.FirstName
FROM Persons AS p,
Product_Orders AS po
WHERE p.LastName='Hansen' AND p.FirstName='Ola'
Regarding using the Alias further in the query, depending on the database you are using it might be possible.