GROUP BY in subquery has the wrong scope - Hive

I'm using Hive 1.1.0 and ran into a confusing error. I want to know what the problem is and how to reason about this kind of error next time.
I got this error:
FAILED: SemanticException [Error 10025]: Line 2:8 Expression not in GROUP BY key 'item_id'
when running:
select
item_id,
buy_num_sum_I_7days/sum(buy_num_sum_I_7days) item_buy_probability
FROM
(
select
item_id,
max(buy_num_sum_I_7days) buy_num_sum_I_7days
FROM
mytable
where
dt>=20210206 and dt<=20210208
group BY
item_id
)tt;

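The outer query calls sum(), an aggregate function, without a GROUP BY, so Hive expects every non-aggregated column (here item_id) to be a GROUP BY key. To divide each row by the total over all rows, give sum an empty window so that it is evaluated as a window function: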
select
item_id,
buy_num_sum_I_7days/(sum(buy_num_sum_I_7days) over ()) item_buy_probability
FROM
(
select
item_id,
max(buy_num_sum_I_7days) buy_num_sum_I_7days
FROM
mytable
where
dt>=20210206 and dt<=20210208
group BY
item_id
)tt;
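As a rough illustration of why the empty window helps, here is a minimal sketch that runs the same query shape from Python against an in-memory SQLite database. SQLite is only a stand-in for Hive (an assumption for demonstration; window functions need SQLite 3.25 or newer), and the sample rows are made up.

```python
import sqlite3

# In-memory SQLite stands in for Hive here (assumption, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (item_id TEXT, dt INTEGER, buy_num_sum_I_7days INTEGER)"
)
# Hypothetical sample data; item 'c' falls outside the date range.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("a", 20210206, 1), ("a", 20210207, 3), ("b", 20210208, 1), ("c", 20210101, 9)],
)

# SUM(...) OVER () is a window function: it sees all rows of the derived
# table but does not collapse them, so item_id needs no GROUP BY here.
# The "* 1.0" forces floating-point division in SQLite; Hive's "/" already
# returns a double.
query = """
SELECT item_id,
       buy_num_sum_I_7days * 1.0 / (SUM(buy_num_sum_I_7days) OVER ()) AS item_buy_probability
FROM (
    SELECT item_id, MAX(buy_num_sum_I_7days) AS buy_num_sum_I_7days
    FROM mytable
    WHERE dt >= 20210206 AND dt <= 20210208
    GROUP BY item_id
) tt
ORDER BY item_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With this sample data the per-item maxima are 3 and 1, so each row is divided by the overall total of 4.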

Related

How to run a subquery in hive

I have this query that I am trying to run in HIVE:
select transaction_date, count(total_distinct) from (
SELECT transaction_date, concat(subid,'**', itemid) as total_distinct
FROM TBL_1
group by transaction_date, subid,itemid
) group by transaction_date
What I am trying to do is get the distinct combinations of subid and itemid, but I need the total count per day. When I run the query above, I get this error:
Error while compiling statement: FAILED: ParseException line 6:2 cannot recognize input near 'group' 'by' 'TRANSACTION_DATE' in subquery source
The query looks correct to me though. Has anyone encountered this error?
Hive requires subqueries to be aliased, so you need to specify a name for it:
select transaction_date, count(total_distinct) from (
SELECT transaction_date, concat(subid,'**', itemid) as total_distinct
FROM TBL_1
group by transaction_date, subid,itemid
) dummy -- << note here
group by transaction_date
True, the error message is far from helpful.
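To check that the aliased form returns the intended per-day counts, here is a small sketch run from Python against in-memory SQLite. SQLite is an assumption used only for demonstration: it does not reproduce Hive's ParseException, because it accepts unaliased subqueries, and `||` is used in place of concat. The sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite as a stand-in for Hive (assumption, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TBL_1 (transaction_date TEXT, subid TEXT, itemid TEXT)")
conn.executemany(
    "INSERT INTO TBL_1 VALUES (?, ?, ?)",
    [
        ("2021-01-01", "s1", "i1"),
        ("2021-01-01", "s1", "i1"),  # duplicate pair, collapsed by the inner GROUP BY
        ("2021-01-01", "s1", "i2"),
        ("2021-01-02", "s2", "i1"),
    ],
)

# The derived table carries the alias "dummy", as Hive requires.
query = """
SELECT transaction_date, COUNT(total_distinct) AS pairs_per_day
FROM (
    SELECT transaction_date, subid || '**' || itemid AS total_distinct
    FROM TBL_1
    GROUP BY transaction_date, subid, itemid
) dummy
GROUP BY transaction_date
ORDER BY transaction_date
"""
rows = conn.execute(query).fetchall()
print(rows)
```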

Trying to create Total count from the group

SELECT COUNT(DISTINCT user_id) Viewer_Count
, EVENT_NAME
SELECT SUM (COUNT(DISTINCT user_id)) AS total_view
FROM dsv1069.EVENTS
GROUP BY EVENT_NAME
Error
org.postgresql.util.PSQLException: ERROR: syntax error at or near "SELECT"
Position: 66
,EVENT_NAME
SELECT SUM (COUNT(DISTINCT user_id)) AS total_view
Try using the derived table technique to get a SUM of an aggregated column. You cannot apply an aggregate directly to another aggregate or to a subquery result in the same SELECT. Your query also has syntax errors: the subquery is defined incorrectly and a comma is missing. The PL/SQL, MySQL and SQLite syntaxes are somewhat similar; what matters is how the technique is applied. If you can provide your table definition and data, I can provide a better solution.
SELECT
viewer_Count
,EVENT_NAME
,SUM(total_view) AS total_view_sum
FROM
(
SELECT
COUNT(user_id) AS viewer_Count
, EVENT_NAME
, COUNT(distinct user_id) AS total_view
FROM dsv1069.EVENTS
GROUP BY EVENT_NAME
) AS A
GROUP BY viewer_Count
,EVENT_NAME
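The derived table idea can be sketched as follows: aggregate once per event in an inner query, then aggregate those results again in an outer query. This is a minimal sketch run from Python against in-memory SQLite rather than PostgreSQL; the table name `events` (without the dsv1069 schema) and the sample rows are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite as a stand-in for PostgreSQL (assumption, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_name TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "view"), (1, "view"), (2, "view"), (1, "click"), (3, "click"), (4, "click")],
)

# Step 1: distinct viewers per event (a single level of aggregation).
per_event_sql = """
SELECT event_name, COUNT(DISTINCT user_id) AS viewer_count
FROM events
GROUP BY event_name
"""
per_event = conn.execute(per_event_sql + " ORDER BY event_name").fetchall()

# Step 2: wrap step 1 as a derived table and sum its aggregated column.
total_view = conn.execute(
    "SELECT SUM(viewer_count) FROM (" + per_event_sql + ") AS a"
).fetchone()[0]

print(per_event, total_view)
```

The outer SUM never sees COUNT(DISTINCT ...) directly; it only sees the already computed viewer_count column, which is why the nesting is accepted.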

Query does not find column, suggests same column in Hive SQL

I have the following query in SQL:
select midquery.account, midquery.name, midquery.label, midquery.labelfrequency
from(
-- Count the appearance of each label.
select count(*) as labelfrequency, account, name, label
from(
select account, name, label from myTable
) innerquery
group by account, name, label
) midquery
-- Select most frequent values only.
where rank() over
(partition by midquery.account, midquery.name
order by midquery.labelfrequency desc) = 1
The idea is to find the most frequent label per name-account set. When I run this query, I get the following error:
Error while compiling statement: FAILED: SemanticException [Error 10002]: Line 12:74 Invalid column reference 'labelfrequency': (possible column names are: labelfrequency, account, name, label)
I don't quite understand why the interpreter does not find the column labelfrequency but can suggest it. Have you got any suggestions on how to tackle this issue?
Edit: if I move the rank() to the select part, I get results.
select midquery.account, midquery.name, midquery.label, midquery.labelfrequency,
rank() over (partition by midquery.account, midquery.name
order by midquery.labelfrequency desc)
from(
-- Count the appearance of each label.
select count(*) as labelfrequency, account, name, label
from(
select account, name, label from myTable
) innerquery
group by account, name, label
) midquery
Window functions are simply not allowed in the WHERE clause. There are good reasons for this, but you can think of it as just another rule of SQL -- similar to column aliases not being recognized.
(The real reason is specifying how the window function would operate when there are multiple filtering conditions. It is (almost ?) impossible to come up with a coherent set of rules.)
Having said that, you can simplify your query:
select t.account, t.name, t.label, t.labelfrequency
from (select count(*) as labelfrequency, account, name, label,
rank() over (partition by account, name
order by count(*) desc
) as seqnum
from myTable t
group by account, name, label
) t
where seqnum = 1;
That is, window functions and aggregation functions can be combined. And you don't need a subquery just to select a handful of columns.
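The rule that a window function must be computed in a SELECT list and filtered one level up can be tried out with the sketch below, run from Python against in-memory SQLite (an assumption for demonstration; window functions need SQLite 3.25 or newer). To stay conservative it ranks over an already aggregated subquery instead of combining count(*) and rank() in one SELECT as the answer does, and the sample rows are made up.

```python
import sqlite3

# In-memory SQLite as a stand-in for Hive (assumption, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (account TEXT, name TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [
        ("acc1", "n1", "x"),
        ("acc1", "n1", "x"),
        ("acc1", "n1", "y"),
        ("acc2", "n2", "z"),
    ],
)

# RANK() is computed in the SELECT list of the "ranked" subquery; the outer
# WHERE then filters on its alias, which a WHERE at the same level cannot do.
query = """
SELECT account, name, label, labelfrequency
FROM (
    SELECT account, name, label, labelfrequency,
           RANK() OVER (PARTITION BY account, name
                        ORDER BY labelfrequency DESC) AS seqnum
    FROM (
        SELECT account, name, label, COUNT(*) AS labelfrequency
        FROM myTable
        GROUP BY account, name, label
    ) counted
) ranked
WHERE seqnum = 1
ORDER BY account, name
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because RANK() assigns the same rank to ties, a name-account pair with two equally frequent labels would return both rows.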

Can a HIVE SELECT combine GROUP BY and ORDER BY?

I'm doing some relatively simple queries in Hive and cannot seem to combine GROUP BY and ORDER BY in a single statement. I have no problem doing a select into a temporary table of the GROUP BY query and then doing a select on that table with an ORDER BY, but I can't combine them together.
For example, I have a table a and can execute this query:
SELECT place,count(*),sum(weight) from a group by place;
And I can execute this query:
create temporary table result (place string,count int,sumweight int);
insert overwrite table result
select place,count(*),sum(weight) from a group by place;
select * from result order by place;
But if I try this query:
SELECT place,count(*),sum(weight) from a group by place order by place;
I get this error:
Error: Error while compiling statement: FAILED: ParseException line 1:45 mismatched input '' expecting \' near '_c0' in character string literal (state=42000,code=40000)
Try using the GROUP BY as a subquery and the ORDER BY in the outer query, as shown below:
SELECT
place,
cnt,
sum_
FROM (
SELECT
place,
count(*) as cnt,
sum(weight) as sum_
FROM a
GROUP BY place
) a
ORDER BY place;
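Alternatively, use SORT BY like this (note that in Hive, SORT BY only orders rows within each reducer, so a total order is guaranteed only when a single reducer is used):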
SELECT place,count(*),sum(weight) from a group by place sort by place;

materialized view using WITH statement

I created a materialized view, but I get an error that I do not know how to resolve:
ORA-00937: not a single-group group function
00937. 00000 - "not a single-group group function"
on the line
SELECT x.*,SUM(x.quantities) as Tquantities
Can you help me resolve it?
CREATE MATERIALIZED VIEW TestView AS
With x AS(
SELECT Numclient as CLIENT,
Numcommand as COMMAND,
count(gender) as quantities
FROM customer,
Command
WHERE Numclient = Numcommand
AND gender =2
GROUP BY Numclient,
Numcommand
),
x1 AS (
SELECT x.*,SUM(x.quantities) as Tquantities
FROM x
)
SELECT x.*,ROUND(x.quantities*100/x1.Tquantities) as Percent
FROM x1, x;
To eliminate the error, remove x.*, from your original subquery x1.
Your select statement can be simplified, like here:
select Numclient CLIENT, Numcommand COMMAND, count(gender) quantities,
round(100*count(gender)/sum(count(gender)) over()) percent
from customer
join Command on Numclient = Numcommand and gender = 2
group by Numclient, Numcommand
It's a little unclear why you are displaying the column COMMAND when it's equal to CLIENT.
I suspect that either there is a mistake in the where condition or this column is superfluous.
Since when is this valid in Oracle? This is not MySQL.
SELECT x.*,SUM(x.quantities) as Tquantities FROM x
For this to work, you have to GROUP BY the columns in x.