How to run a subquery in Hive - sql

I have this query that I am trying to run in Hive:
select transaction_date, count(total_distinct) from (
SELECT transaction_date, concat(subid,'**', itemid) as total_distinct
FROM TBL_1
group by transaction_date, subid,itemid
) group by transaction_date
What I am trying to do is get the distinct combinations of subid and itemid, but I need the total count per day. When I run the query above, I get this error:
Error while compiling statement: FAILED: ParseException line 6:2 cannot recognize input near 'group' 'by' 'TRANSACTION_DATE' in subquery source
The query looks correct to me though. Has anyone encountered this error?

Hive requires subqueries to be aliased, so you need to specify a name for it:
select transaction_date, count(total_distinct) from (
SELECT transaction_date, concat(subid,'**', itemid) as total_distinct
FROM TBL_1
group by transaction_date, subid,itemid
) dummy -- << note here
group by transaction_date
True, the error message is far from helpful.
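As a quick sanity check of what the aliased query returns, here is a minimal sketch that runs the same statement against an in-memory SQLite database from Python. SQLite is only a stand-in for Hive (it does not enforce the alias rule itself), the sample rows in TBL_1 are made up for illustration, and `||` replaces concat() because older SQLite builds lack that function.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TBL_1 (transaction_date TEXT, subid TEXT, itemid TEXT)")
conn.executemany(
    "INSERT INTO TBL_1 VALUES (?, ?, ?)",
    [
        ("2021-01-01", "s1", "i1"),
        ("2021-01-01", "s1", "i1"),  # duplicate combination, counted once
        ("2021-01-01", "s1", "i2"),
        ("2021-01-02", "s2", "i1"),
    ],
)

# Same shape as the answer's query; the derived table carries the alias "dummy".
query = """
SELECT transaction_date, COUNT(total_distinct)
FROM (
    SELECT transaction_date, subid || '**' || itemid AS total_distinct
    FROM TBL_1
    GROUP BY transaction_date, subid, itemid
) dummy
GROUP BY transaction_date
ORDER BY transaction_date
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The inner GROUP BY collapses duplicate (subid, itemid) pairs per day, so the outer COUNT gives the number of distinct combinations for each date.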

Related

Trying to create Total count from the group

SELECT COUNT(DISTINCT user_id) Viewer_Count
, EVENT_NAME
SELECT SUM (COUNT(DISTINCT user_id)) AS total_view
FROM dsv1069.EVENTS
GROUP BY EVENT_NAME
Error
org.postgresql.util.PSQLException: ERROR: syntax error at or near "SELECT"
Position: 66
,EVENT_NAME
SELECT SUM (COUNT(DISTINCT user_id)) AS total_view
Try the derived-table technique to get a SUM over an aggregated column. You cannot nest one aggregate directly inside another, such as SUM(COUNT(...)). Your query also has syntax errors: the second SELECT is not written as a subquery, and a comma is missing. PL/SQL, MySQL, and SQLite syntaxes are broadly similar here; what matters is the technique. If you can provide your table definition and sample data, I can provide a better solution.
SELECT
viewer_Count
,EVENT_NAME
,SUM(total_view) AS total_view_sum
FROM
(
SELECT
COUNT(user_id) AS viewer_Count
, EVENT_NAME
, COUNT(distinct user_id) AS total_view
FROM dsv1069.EVENTS
GROUP BY EVENT_NAME
) AS A
GROUP BY viewer_Count
,EVENT_NAME
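To make the derived-table idea concrete, the following is a minimal sketch using Python's sqlite3 module rather than PostgreSQL. The table is named events (without the dsv1069 schema) and its rows are invented for illustration. The per-event distinct counts are computed in an inner query, and the outer query sums them, which is what SUM(COUNT(DISTINCT user_id)) was trying to express.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_name TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "view"), (1, "view"), (2, "view"),
        (1, "click"), (2, "click"), (3, "click"),
    ],
)

# Step 1: one aggregate level, distinct viewers per event.
per_event_sql = """
SELECT event_name, COUNT(DISTINCT user_id) AS viewer_count
FROM events
GROUP BY event_name
"""
per_event = conn.execute(per_event_sql + " ORDER BY event_name").fetchall()

# Step 2: wrap step 1 as a derived table and aggregate its result again.
total_view = conn.execute(
    "SELECT SUM(viewer_count) FROM (" + per_event_sql + ") AS a"
).fetchone()[0]

print(per_event, total_view)
```

Each aggregate gets its own query level, so no aggregate is ever nested directly inside another.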

GROUP BY in a subquery has the wrong scope

I'm using Hive 1.1.0 and ran into a confusing error. I want to know what the problem is and how to diagnose this kind of error next time.
I got this error,
FAILED: SemanticException [Error 10025]: Line 2:8 Expression not in GROUP BY key 'item_id'
when using
select
item_id,
buy_num_sum_I_7days/sum(buy_num_sum_I_7days) item_buy_probability
FROM
(
select
item_id,
max(buy_num_sum_I_7days) buy_num_sum_I_7days
FROM
mytable
where
dt>=20210206 and dt<=20210208
group BY
item_id
)tt;
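Without an OVER clause, sum() is a plain aggregate, so every non-aggregated column in the outer select list (here item_id) would have to appear in a GROUP BY. Give sum an empty window, OVER (), so that it is evaluated as a window function over all rows of the subquery: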
select
item_id,
buy_num_sum_I_7days/(sum(buy_num_sum_I_7days) over ()) item_buy_probability
FROM
(
select
item_id,
max(buy_num_sum_I_7days) buy_num_sum_I_7days
FROM
mytable
where
dt>=20210206 and dt<=20210208
group BY
item_id
)tt;
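The effect of the empty window can be checked with a small sketch in Python's sqlite3 module, which stands in for Hive here (window functions need SQLite 3.25 or newer). The rows in mytable are made up, and the `1.0 *` factor is an addition for SQLite only, since SQLite performs integer division where Hive's `/` returns a double.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (item_id TEXT, dt INTEGER, buy_num_sum_I_7days INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("a", 20210206, 1),
        ("a", 20210207, 3),
        ("b", 20210208, 1),
        ("c", 20210101, 100),  # outside the date range, filtered out
    ],
)

# sum(...) OVER () totals the per-item maxima across all subquery rows,
# so no GROUP BY is needed in the outer query.
rows = conn.execute("""
SELECT item_id,
       1.0 * buy_num_sum_I_7days / (SUM(buy_num_sum_I_7days) OVER ()) AS item_buy_probability
FROM (
    SELECT item_id, MAX(buy_num_sum_I_7days) AS buy_num_sum_I_7days
    FROM mytable
    WHERE dt >= 20210206 AND dt <= 20210208
    GROUP BY item_id
) tt
ORDER BY item_id
""").fetchall()
print(rows)
```

The per-item maxima are 3 and 1, the window total is 4, and each row keeps its own item_id alongside that total.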

Can a HIVE SELECT combine GROUP BY and ORDER BY?

I'm doing some relatively simple queries in Hive and cannot seem to combine GROUP BY and ORDER BY in a single statement. I have no problem doing a select into a temporary table of the GROUP BY query and then doing a select on that table with an ORDER BY, but I can't combine them together.
For example, I have a table a and can execute this query:
SELECT place,count(*),sum(weight) from a group by place;
And I can execute this query:
create temporary table result (place string,count int,sumweight int);
insert overwrite table result
select place,count(*),sum(weight) from a group by place;
select * from result order by place;
But if I try this query:
SELECT place,count(*),sum(weight) from a group by place order by place;
I get this error:
Error: Error while compiling statement: FAILED: ParseException line 1:45 mismatched input '' expecting \' near '_c0' in character string literal (state=42000,code=40000)
Try using GROUP BY in a subquery and ORDER BY in the outer query, as shown below:
SELECT
place,
cnt,
sum_
FROM (
SELECT
place,
count(*) as cnt,
sum(weight) as sum_
FROM a
GROUP BY place
) a
ORDER BY place;
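Use SORT BY like this (note that SORT BY orders rows within each reducer only, so the output is totally ordered only if a single reducer is used):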
SELECT place,count(*),sum(weight) from a group by place sort by place;

materialized view using WITH statement

I created a materialized view, but I get an error that I do not know how to resolve:
ORA-00937: not a single-group group function
00937. 00000 - "not a single-group group function"
on the line
SELECT x.*,SUM(x.quantities) as Tquantities
Can you help me resolve it?
CREATE MATERIALIZED VIEW TestView AS
With x AS(
SELECT Numclient as CLIENT,
Numcommand as COMMAND,
count(gender) as quantities
FROM customer,
Command
WHERE Numclient = Numcommand
AND gender =2
GROUP BY Numclient,
Numcommand
),
x1 AS (
SELECT x.*,SUM(x.quantities) as Tquantities
FROM x
)
SELECT x.*,ROUND(x.quantities*100/x1.Tquantities) as Percent
FROM x1, x;
To eliminate the error, remove x.*, from the subquery x1 in your original statement.
Your select statement can also be simplified, like this:
select Numclient CLIENT, Numcommand COMMAND, count(gender) quantities,
round(100*count(gender)/sum(count(gender)) over()) percent
from customer
join Command on Numclient = Numcommand and gender = 2
group by Numclient, Numcommand
It's a little unclear why you are displaying the column COMMAND when it is equal to CLIENT.
I suspect that either the where condition is mistaken or this column is superfluous.
Since when is this valid in Oracle? This is not MySQL.
SELECT x.*,SUM(x.quantities) as Tquantities FROM x
For this to work, you have to GROUP BY the columns in x.
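The simplified form, where a window sum is applied on top of the grouped counts, can be tried out with the sketch below. It uses Python's sqlite3 module as a stand-in for Oracle (SQLite 3.25 or newer is assumed for window functions), with invented rows and a minimal guess at the two table layouts. The alias pct and the `100.0` literal are SQLite-specific adjustments, the latter to avoid integer division.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (Numclient INTEGER, gender INTEGER)")
conn.execute("CREATE TABLE Command (Numcommand INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, 2), (1, 2), (2, 2), (3, 1)],  # client 3 has gender 1 and is filtered out
)
conn.executemany("INSERT INTO Command VALUES (?)", [(1,), (2,), (3,)])

# COUNT(gender) is computed per group; SUM(COUNT(gender)) OVER () then adds
# those group counts across all result rows to get the grand total.
rows = conn.execute("""
SELECT Numclient AS client, Numcommand AS command, COUNT(gender) AS quantities,
       ROUND(100.0 * COUNT(gender) / SUM(COUNT(gender)) OVER ()) AS pct
FROM customer
JOIN Command ON Numclient = Numcommand AND gender = 2
GROUP BY Numclient, Numcommand
ORDER BY Numclient
""").fetchall()
print(rows)
```

Because the window function is evaluated after grouping, no separate x1 subquery and no cross join are needed to obtain the total.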

SELECT statement in WHERE clause on BigQuery not working

I'm trying to run the following query on Google BigQuery:
SELECT SUM(var1) AS Revenue
FROM [table1]
WHERE timeStamp = (SELECT MAX(timeStamp) FROM [table1])
I'm getting the following error:
Error: Encountered "" at line 3, column 19. Was expecting one of:
Is this not supported in BigQuery? If so, would there be an elegant alternative?
Subselect in a comparison predicate is not supported, but you can use IN.
SELECT SUM(var1) AS Revenue
FROM [table1]
WHERE timeStamp IN (SELECT MAX(timeStamp) FROM [table1])
I would use Rank() to get the max timestamp, and filter the #1s in the where clause.
select SUM(var1) AS Revenue
From
(SELECT var1
,RANK() OVER (ORDER BY timestamp DESC) as RNK
FROM [table1]
)
where RNK=1
I don't know how it works with BQ, but in other DB technologies this would be more efficient, as it involves only a single table scan rather than two.
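To see that the IN form and the RANK() form select the same rows, here is a minimal sketch in Python's sqlite3 module. SQLite is only a stand-in for BigQuery (it accepts the scalar-subquery comparison that BigQuery rejected, so it cannot reproduce the error), the rows in table1 are made up, and SQLite 3.25 or newer is assumed for RANK().

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (var1 INTEGER, timeStamp INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(10, 1), (20, 2), (30, 2)],  # two rows share the latest timestamp
)

# Variant 1: filter with IN against the maximum timestamp.
revenue_in = conn.execute("""
SELECT SUM(var1) AS Revenue
FROM table1
WHERE timeStamp IN (SELECT MAX(timeStamp) FROM table1)
""").fetchone()[0]

# Variant 2: rank rows by timestamp and keep rank 1; ties all receive rank 1.
revenue_rank = conn.execute("""
SELECT SUM(var1) AS Revenue
FROM (
    SELECT var1, RANK() OVER (ORDER BY timeStamp DESC) AS rnk
    FROM table1
)
WHERE rnk = 1
""").fetchone()[0]

print(revenue_in, revenue_rank)
```

RANK() is the right choice over ROW_NUMBER() here because rows tied on the latest timestamp must all be kept.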