SQL error in Athena: SYNTAX_ERROR: line 5:8: Column 'amount' cannot be resolved

In AWS Athena I have the following SQL query:
select licence, count(distinct (id)) as amount
from "database_name"
where YEAR(column_year) = 2021
group by licence
having amount > 10
order by amount desc
Then I get the error:
SYNTAX_ERROR: line 5:8: Column 'amount' cannot be resolved. This query ran against the "database_name", unless qualified by the query.
What am I doing wrong?

Two things (a subquery alternative is also sketched after the corrected query):
You cannot use an alias in the HAVING clause, so you have to repeat the exact expression there.
DISTINCT is not a function, so you should write it without parentheses.
select licence, count(distinct id) as amount
from "database_name"
where YEAR(column_year) = 2021
group by licence
having count(distinct id) > 10
order by amount desc
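If you would rather keep filtering on the amount alias, here is a minimal sketch of the subquery route, using the same table and columns as above (the t alias only names the derived table):
select licence, amount
from (
select licence, count(distinct id) as amount
from "database_name"
where YEAR(column_year) = 2021
group by licence
) t
-- the alias is defined in the inner query, so the outer query can filter on it directly
where amount > 10
order by amount desc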

Related

AWS Redshift column "view_table_B.cost" must appear in the GROUP BY clause or be used in an aggregate function

I have two queries in AWS Redshift that target different tables with similar schemas. My issue is that one of the queries works while the other fails.
First Query
SELECT view_table_A.accountId, view_table_A.date, SUM(view_table_A.cost) as Cost
FROM view_table_A
GROUP BY accountId, date
HAVING Cost >= '20'
Second Query
SELECT view_table_B.projectname, view_table_B.usagedate, sum(view_table_B.cost) as Cost
FROM view_table_B
GROUP BY projectname, usagedate
HAVING Cost >= '20'
My problem is that the first query works fine, while the second query returns the error below:
Amazon Invalid operation: column "view_table_B .cost" must appear in the GROUP BY clause or be used in an aggregate function;
Update-1
I tried removing the ' quotes from the query but still get the same result. I have also attached a screenshot of the query I tried to execute in Redshift.
Redshift identifiers are case insensitive, therefore cost and Cost collide in your query.
I was able to reproduce the problem with:
with src(cost, dat) as (
select 1, current_date
union all
select 2, current_date
)
SELECT
dat,
sum(s.cost) as Cost
FROM src s
GROUP BY dat
HAVING Cost = 3
;
it's giving me
[2020-06-04 11:22:44] [42803][500310] Amazon Invalid operation: column "s.cost" must appear in the GROUP BY clause or be used in an aggregate function;
If you renamed the column to something distinct, that would fix the query:
with src(cost, dat) as (
select 1, current_date
union all
select 2, current_date
)
SELECT
dat,
sum(s.cost) as sum_cost
FROM src s
GROUP BY dat
HAVING sum_cost = 3
;
I was also surprised to see that quoting the identifiers with " does not solve the problem, as I had initially expected it would.
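Alternatively, you can keep the Cost alias and simply repeat the aggregate expression in the HAVING clause, so the alias is never referenced there at all. A minimal sketch against the same src example:
with src(cost, dat) as (
select 1, current_date
union all
select 2, current_date
)
SELECT
dat,
-- alias kept for the output, but HAVING repeats the aggregate instead of referencing it
sum(s.cost) as Cost
FROM src s
GROUP BY dat
HAVING sum(s.cost) = 3
;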

Distinct count and group by in HIVE

I am very new to HIVE and have an issue with distinct count and GROUP BY.
I want to calculate the maximum temperature from the temperature_data table for those years which have at least 2 entries in the table.
I tried the query below but it is not working:
select
SUBSTRING(full_date,7,4) as year,
MAX(temperature) as temperature
from temperature_data
where count(distinct(SUBSTRING(full_date,7,4))) >= 2
GROUP BY SUBSTRING(full_date,7,4);
I am getting an error:
FAILED: SemanticException [Error 10128]: Line 2:0 Not yet supported place for UDAF 'count'
Below is the input:
full_date,zip,temperature
10-01-1990,123112,10
14-02-1991,283901,11
10-03-1990,381920,15
10-01-1991,302918,22
12-02-1990,384902,9
10-01-1991,123112,11
14-02-1990,283901,12
10-03-1991,381920,16
10-01-1990,302918,23
12-02-1991,384902,10
10-01-1993,123112,11
You should use the HAVING keyword instead to set a condition on the expression you are grouping by.
You can also benefit from using a subquery. See below.
SELECT
year,
MAX(t1.temperature) as temperature
FROM
(select SUBSTRING(full_date,7,4) year, temperature from temperature_data) t1
GROUP BY
year
HAVING
count(t1.year) >= 2;
@R.Gold, we can simplify the above query without using a sub-query, as below:
SELECT substring(full_date,7) as year, max(temperature)
FROM your-hive-table
GROUP BY substring(full_date,7)
HAVING COUNT(substring(full_date,7)) >= 2
And, FYI, we can't use aggregate functions in a WHERE clause.
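If you prefer not to repeat the SUBSTRING in both GROUP BY and HAVING, here is a sketch of another option, assuming your Hive version supports window functions (Hive 0.11+); the entries_per_year name is just an illustrative alias:
SELECT year, MAX(temperature) as temperature
FROM (
select
SUBSTRING(full_date,7,4) as year,
temperature,
-- number of rows that share the same year
count(*) over (partition by SUBSTRING(full_date,7,4)) as entries_per_year
from temperature_data
) t
WHERE entries_per_year >= 2
GROUP BY year;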

ORA-00923 error: FROM keyword not found where expected

When calculating retention on Oracle DB, I wrote this code:
select
sessions.sessionDate ,
count(distinct sessions.visitorIdd) as active_users,
count(distinct futureactivity.visitorIdd) as retained_users,
count(distinct futureactivity.visitorIdd) / count(distinct sessions.visitorIdd)::float as retention
FROM sessions
left join sessions futureactivity on
sessions.visitorIdd=futureactivity.visitorIdd
and sessions.sessionDate = futureactivity.sessionDate - interval '3' day
group by 3;
but I always get the error: "ORA-00923: mot-clé FROM absent à l'emplacement prévu" (ORA-00923 FROM keyword not found where expected)
Can you help me guys?
Oracle does not recognize the Postgres :: cast syntax, so it complains that the FROM keyword was not found where expected.
Use a cast instead:
count(distinct futureactivity.visitorIdd) / cast(count(distinct sessions.visitorIdd) as float) as retention
Here is a more "Oracle" way of writing the query:
select s.sessionDate ,
count(distinct s.visitorIdd) as active_users,
count(distinct fs.visitorIdd) as retained_users,
count(distinct fs.visitorIdd) / count(distinct s.visitorIdd) as retention
from sessions s left join
sessions fs
on s.visitorIdd = fs.visitorIdd and
s.sessionDate = fs.sessionDate - interval '3' day
group by s.sessionDate
order by s.sessionDate;
Notes:
Oracle does not require a conversion when dividing integers; dividing two integers already returns a decimal result (see the quick check after these notes).
The group by should contain the column name; if you count positions, sessionDate is column 1, not 3.
Shorter table aliases make the query easier to write and to read.
You'll probably want an order by, because otherwise the results will be in an indeterminate order.
There is probably a better way to write this query using window functions.
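On the first note, a quick check you can run against DUAL (nothing project-specific assumed):
-- Oracle NUMBER arithmetic keeps the fraction when dividing integers
select 1/4 as ratio from dual;
-- returns 0.25, so no explicit float cast is needed for the retention ratio itself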

group by date part of datetime and get number of records for each

I have this so far:
select created_at,
DATEDIFF(TO_DATE(current_date()), TO_DATE(sales_flat_order.created_at)) as delay,
count(*) over() as NumberOfOrders
FROM
magentodb.sales_flat_order
WHERE
status IN ( 'packed' , 'cod_confirmed' )
GROUP BY TO_DATE(created_at)
But this is not working.
syntax error:
Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 1:7 Invalid table alias or column reference 'created_at': (possible column names are: (tok_function to_date (tok_table_or_col created_at)))
count(*) over() does not give the count for each grouped date; it counts all of the rows instead.
Note: I am actually using Hive, but it is exactly like SQL when it comes to queries.
Try this:
select created_at,
DATEDIFF(TO_DATE(current_date()), TO_DATE(sales_flat_order.created_at)) as delay,
count(*) as NumberOfOrders
FROM
magentodb.sales_flat_order
WHERE
status IN ( 'packed' , 'cod_confirmed' )
GROUP BY Date(created_at)
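Note that Hive is also strict about the SELECT list using the same expression as the GROUP BY, which is what the "Invalid table alias or column reference 'created_at'" message is about. Here is a sketch that keeps everything on TO_DATE(created_at), as in the original query (created_day is just an illustrative alias, and the MIN() is only a safe way to carry the per-day delay through the aggregation):
select TO_DATE(created_at) as created_day,
-- all rows in a group share the same TO_DATE(created_at), so MIN() just picks that single delay value
MIN(DATEDIFF(TO_DATE(current_date()), TO_DATE(created_at))) as delay,
count(*) as NumberOfOrders
FROM magentodb.sales_flat_order
WHERE status IN ('packed', 'cod_confirmed')
GROUP BY TO_DATE(created_at)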
I think you want to use the date part (year, month and day) of created_at for grouping.
select
date(created_at) as created_at_day,
datediff(curdate(), sales_flat_order.created_at) as delay,
count(*) as numberOfOrders
from magentodb.sales_flat_order
WHERE status IN ('packed', 'cod_confirmed' ) GROUP BY created_at_day
This query will show the delay of only one order per day, because you are grouping by the day. You can use an average to find the average delay of orders created that day (see the sketch below).
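A sketch of that average, staying with the MySQL-style functions used in the query above (swap in TO_DATE/current_date() if you run it on Hive); avg_delay is just an illustrative alias:
select
date(created_at) as created_at_day,
-- average delay across all orders created that day
avg(datediff(curdate(), created_at)) as avg_delay,
count(*) as numberOfOrders
from magentodb.sales_flat_order
WHERE status IN ('packed', 'cod_confirmed')
GROUP BY created_at_day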
My phone won't allow me to post comments, but try this link; it might guide you in the right direction.
stackoverflow.com/questions/29704904/invalid-table-alias-or-column-reference-b

Using a timestamp function in a GROUP BY

I'm working with a large transaction data set and would like to group a count of individual customer transactions by month. I am unable to use the timestamp function in the GROUP BY; it returns the following error:
BAD_QUERY (expression STRFTIME_UTC_USEC([DATESTART], '%b') in GROUP BY is invalid)
Is there a simple workaround to achieve this or should I build a calendar table (which may be the simplest option)?
You have to use an alias:
SELECT STRFTIME_UTC_USEC(DATESTART, '%b') as month, COUNT(TRANSACTION)
FROM datasetId.tableId
GROUP BY month
@Charles is correct, but as an aside, you can also group by column number.
SELECT STRFTIME_UTC_USEC(DATESTART, '%b') as month, COUNT(TRANSACTION) as count
FROM [datasetId.tableId]
GROUP BY 1
ORDER BY 2 DESC
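One caveat worth noting: '%b' alone collapses the same month from different years into one group. If DATESTART spans multiple years, a hedged variant using a year-month key (same legacy BigQuery functions as above, assuming the usual strftime format specifiers; dataset and table names are placeholders):
SELECT STRFTIME_UTC_USEC(DATESTART, '%Y-%m') as month, COUNT(TRANSACTION) as count
FROM [datasetId.tableId]
GROUP BY 1
ORDER BY 1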