Error in GROUP BY using Hive

I am using the following code and getting the error below
select d.searchpack,d.context, d.day,d,txnid,d.config, c.sgtype from ds3resultstats d join
context_header c on (d.context=c.contextid) where (d.day>='2012-11-15' and d.day<='2012-11-25' and c.sgtype='Tickler' and d.config like
'%people%') GROUP BY d.context limit 10;
FAILED: Error in semantic analysis: line 1:7 Expression Not In Group By Key d
I am guessing I am using the group by incorrectly

When you use GROUP BY, you cannot select arbitrary additional fields. You can only select the grouping keys, plus aggregate functions over the other columns.
See hive group by for more information.
Related questions.
Code example:
select d.context, count(*)
from ds3resultstats d
...
group by d.context
Or group by multiple fields:
select d.context, d.field2, count(*)
from ds3resultstats d
...
group by d.context, d.field2
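To try the rule out without a Hive cluster, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. Only the table and column names come from the question; the rows are made up for illustration. Note that SQLite itself is more lenient than Hive about bare columns, so this only demonstrates the well-formed queries, not the error.

```python
import sqlite3

# Hypothetical rows: only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ds3resultstats (context TEXT, config TEXT, day TEXT)")
conn.executemany(
    "INSERT INTO ds3resultstats VALUES (?, ?, ?)",
    [
        ("c1", "people-a", "2012-11-15"),
        ("c1", "people-a", "2012-11-16"),
        ("c1", "people-b", "2012-11-17"),
        ("c2", "people-a", "2012-11-18"),
    ],
)

# Every selected column is either a grouping key or inside an aggregate.
per_context = conn.execute(
    "SELECT d.context, COUNT(*) FROM ds3resultstats d "
    "GROUP BY d.context ORDER BY d.context"
).fetchall()

# Grouping by two keys: both keys may appear in the select list.
per_context_config = conn.execute(
    "SELECT d.context, d.config, COUNT(*) FROM ds3resultstats d "
    "GROUP BY d.context, d.config ORDER BY d.context, d.config"
).fetchall()

print(per_context)         # [('c1', 3), ('c2', 1)]
print(per_context_config)  # [('c1', 'people-a', 2), ('c1', 'people-b', 1), ('c2', 'people-a', 1)]
```

The same two statements should carry over to Hive unchanged, since they only use grouping keys and COUNT(*).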

Hive expects every non-aggregated column in the select list to also appear in the GROUP BY.
I ran into the same issue and found a workaround for this kind of problem:
you can wrap the column in collect_set to get the output. For example:
select d.searchpack, collect_set(d.context) from sampletable d group by d.searchpack;
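For anyone who wants to see what the collect_set workaround produces, here is a rough sketch in Python with sqlite3. SQLite has no collect_set, so group_concat(DISTINCT ...) is swapped in as the closest analog and the result is turned into a Python set afterwards. The rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sampletable (searchpack TEXT, context TEXT)")
# Hypothetical data: sp1 has a duplicate context on purpose.
conn.executemany(
    "INSERT INTO sampletable VALUES (?, ?)",
    [("sp1", "c1"), ("sp1", "c2"), ("sp1", "c1"), ("sp2", "c3")],
)

# group_concat(DISTINCT x) stands in for Hive's collect_set(x):
# one row per searchpack, with the distinct contexts gathered together.
rows = conn.execute(
    "SELECT d.searchpack, group_concat(DISTINCT d.context) "
    "FROM sampletable d GROUP BY d.searchpack ORDER BY d.searchpack"
).fetchall()

# Split the comma-joined string into a set, which is unordered like collect_set.
collected = {searchpack: set(contexts.split(",")) for searchpack, contexts in rows}
print(collected)
```

The point is that the non-key column is now inside an aggregate, so the GROUP BY rule is satisfied while all distinct values are still returned.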

Related

Why can't I get GROUP BY to work in my LEFT JOIN in Access

I'm trying to populate a combobox from two tables in Access (2007-2016 file format).
I have two tables:
tblSurveyStatus
SurveyID | SurveyStatus
1        | Y
2        | N
3        | N/A
tblWorkOrder
WONumber | SurveyedID
WO2101   | 1
WO2102   | 1
WO2103   | 2
WO2104   | 3
WO2105   | 2
WO2106   | {Empty}
WO2107   | {Empty}
Desired Result:
WONumber (this col will get hidden) | SurveyStatus
WO2101                              | Y
WO2103                              | N
WO2104                              | N/A
This query works in the datasource for the combobox without using GROUP BY:
SELECT SurveyedID, SurveyStatus
FROM [tblWorkOrders] a
LEFT JOIN (
SELECT SurveyID, SurveyStatus
FROM [tblSurveyStatus]
) b
ON a.SurveyedID = b.SurveyID
ORDER BY b.SurveyID
The problem with this query is that it returns duplicates (Y,Y,N,N,N/A).
So I introduced the GROUP BY like this:
SELECT SurveyedID, SurveyStatus
FROM [tblWorkOrders] a
LEFT JOIN (
SELECT SurveyID, SurveyStatus
FROM [tblSurveyStatus]
) b
ON a.SurveyedID = b.SurveyID
GROUP BY a.SurveyStatus ORDER BY b.SurveyID
This causes the error message "Your query does not include the specified expression 'SurveyedID' as part of an aggregate function." So I wrapped it in MIN(SurveyedID), and the error message moved to the next field. I kept adding MIN() to the SQL until it finally ran, but now I get an input box asking for "SurveyStatus" and then another one asking for "SurveyID".
I have spent three solid days researching, reading threads on this website and many others without success. I am not a programmer but I kind of understand the basics. My programming basically comes from finding snippets of code and altering them for my use. Please Help!
Your SELECT list uses two columns, SurveyedID and SurveyStatus, while your GROUP BY uses only SurveyStatus, which is not valid. Every non-aggregated column in the SELECT list must also appear in the GROUP BY. That is why wrapping those columns in MIN() removed the error: it turned the non-aggregated columns into aggregated ones.
Adding SurveyedID to your GROUP BY clause as well should resolve the issue.
Also, if your sole aim is to avoid duplicates, just put DISTINCT before the column list in the SELECT.
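Here is a small sketch of both suggestions against the sample data from the question, using Python's sqlite3 as a stand-in for Access (so no square brackets, and the Access SQL may need minor adjustments). The table name tblWorkOrders follows the queries in the question, and using MIN(WONumber) to pick one work order per status is an assumption based on the desired result shown there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblSurveyStatus (SurveyID INTEGER, SurveyStatus TEXT)")
conn.execute("CREATE TABLE tblWorkOrders (WONumber TEXT, SurveyedID INTEGER)")
conn.executemany(
    "INSERT INTO tblSurveyStatus VALUES (?, ?)",
    [(1, "Y"), (2, "N"), (3, "N/A")],
)
conn.executemany(
    "INSERT INTO tblWorkOrders VALUES (?, ?)",
    [("WO2101", 1), ("WO2102", 1), ("WO2103", 2), ("WO2104", 3),
     ("WO2105", 2), ("WO2106", None), ("WO2107", None)],  # None = {Empty}
)

# Option 1: DISTINCT removes the duplicate status rows.
distinct_rows = conn.execute(
    "SELECT DISTINCT b.SurveyID, b.SurveyStatus "
    "FROM tblWorkOrders a INNER JOIN tblSurveyStatus b "
    "ON a.SurveyedID = b.SurveyID ORDER BY b.SurveyID"
).fetchall()

# Option 2: GROUP BY every non-aggregated column and aggregate the rest.
# MIN(WONumber) keeps one work order per status (assumed from the desired result).
combo_rows = conn.execute(
    "SELECT MIN(a.WONumber), b.SurveyStatus "
    "FROM tblWorkOrders a INNER JOIN tblSurveyStatus b "
    "ON a.SurveyedID = b.SurveyID "
    "GROUP BY b.SurveyID, b.SurveyStatus ORDER BY b.SurveyID"
).fetchall()

print(distinct_rows)  # [(1, 'Y'), (2, 'N'), (3, 'N/A')]
print(combo_rows)     # [('WO2101', 'Y'), ('WO2103', 'N'), ('WO2104', 'N/A')]
```

The inner join drops the work orders with an empty SurveyedID, which matches the desired result in the question.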

Spark throws : "expression is neither present in the group by, nor is it an aggregate function..."

I'm trying to execute this with pyspark:
query = "SELECT *\
 FROM transaction\
 INNER JOIN factures\
 ON transaction.t_num = factures.f_trx\
 WHERE transaction.t_num != ''\
 GROUP BY transaction.t_num"
result = sqlContext.sql(query)
Spark gives an error :
u"expression transaction.t_aut is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;
You are selecting all columns with SELECT * but did not list them in the GROUP BY clause.
The error says that transaction.t_aut, one of the columns projected by SELECT *, is not used in your GROUP BY.
The solution is either to replace SELECT * with the columns that are in your GROUP BY (in your case transaction.t_num), or to add transaction.t_aut to your GROUP BY.

Spark SQL query: org.apache.spark.sql.AnalysisException

I am trying to write a query for a twitter json file to extract the most influential person by looking at retweetCount. I need to group my output by the user, their time zone and the number of retweets in descending order.
When I run the query below I keep getting the exception:
org.apache.spark.sql.AnalysisException:
cannot resolve 'total_retweets' given input columns
t.retweeted_screen_name, t.tz, total_retweets, tweet_count;
sqlContext.sql("""
SELECT
t.retweeted_screen_name,
t.tz,
sum(retweets) AS total_retweets,
count(*) AS tweet_count
FROM (SELECT
actor.displayName as retweeted_screen_name,
body,
actor.twitterTimeZone as tz,
max(retweetCount) as retweets
FROM tweetTable WHERE body <> ''
GROUP BY actor.displayName, actor.twitterTimeZone,
body) t
GROUP BY t.retweeted_screen_name, t.tz
ORDER BY total_retweets DESC
LIMIT 10 """).collect.foreach(println)
When I try to simplify this query I run into errors like:
Column total_retweets is invalid in the select list because it is not
contained in either an aggregate function or the GROUP BY clause.
I would much appreciate any help.
When a SQL query runs, the engine may not resolve select-list aliases until after the WHERE, JOIN and GROUP BY clauses have been processed, and in this Spark version the alias is evidently not available to ORDER BY either. You therefore can't ORDER BY total_retweets here; you will need to ORDER BY sum(retweets) instead.
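A rough sketch of the reworked query, run through Python's sqlite3 so it can be tried without Spark. The flat tweets table and its rows are hypothetical stand-ins for tweetTable and its nested actor fields; the shape of the query (inner GROUP BY, outer GROUP BY, ordering by the aggregate expression rather than the alias) follows the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flat table: screen_name and tz stand in for
# actor.displayName and actor.twitterTimeZone.
conn.execute(
    "CREATE TABLE tweets (screen_name TEXT, tz TEXT, body TEXT, retweetCount INTEGER)"
)
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?, ?)",
    [
        ("alice", "UTC", "hello", 5),
        ("alice", "UTC", "hello", 9),   # same body: the inner MAX keeps 9
        ("alice", "UTC", "again", 2),
        ("bob", "PST", "hi", 20),
        ("bob", "PST", "", 100),        # empty body: filtered out by WHERE
    ],
)

top = conn.execute(
    """
    SELECT t.retweeted_screen_name,
           t.tz,
           SUM(t.retweets) AS total_retweets,
           COUNT(*)        AS tweet_count
    FROM (SELECT screen_name AS retweeted_screen_name,
                 tz,
                 body,
                 MAX(retweetCount) AS retweets
          FROM tweets
          WHERE body <> ''
          GROUP BY screen_name, tz, body) t
    GROUP BY t.retweeted_screen_name, t.tz
    ORDER BY SUM(t.retweets) DESC   -- the aggregate itself, not the alias
    LIMIT 10
    """
).fetchall()

print(top)  # [('bob', 'PST', 20, 1), ('alice', 'UTC', 11, 2)]
```

Ordering by the aggregate expression sidesteps the alias resolution problem entirely, at the cost of repeating the expression.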

Nth(n,split()) in bigquery

I am running the following query and keep getting the error message:
SELECT NTH(2,split(Web_Address_,'.')) +'.'+NTH(3,split(Web_Address_,'.')) as D , Web_Address_
FROM [Domains.domain
limit 10
Error message: Error: (L1:110): (L1:119): SELECT clause has mix of
aggregations 'D' and fields 'Web_Address_' without GROUP BY
clause Job ID:
symmetric-aura-572:job_axsxEyfYpXbe2gpmlYzH6bKGdtI
I tried to use group by clause on field D and/or Web_address_, but still getting errors about group by.
Does anyone know why this is the case? I have had success with similar query before.
You probably want to use WITHIN RECORD aggregation here, not GROUP BY:
select concat(p1, '.', p2), Web_Address_ FROM
(SELECT
NTH(2,split(Web_Address_,'.')) WITHIN RECORD p1,
NTH(3,split(Web_Address_,'.')) WITHIN RECORD p2, Web_Address_
FROM (SELECT 'a.b.c' as Web_Address_))
P.S. If you are just trying to cut off the first part of the web address, it will be easier to do with the RIGHT and INSTR functions.
You can also consider using URL functions: HOST, DOMAIN and TLD
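To make the intent of the NTH/split query concrete, here is a minimal Python sketch of the same string manipulation: take the second and third dot-separated parts and join them with a dot. The function name is made up for illustration. NTH counts from 1 while Python lists count from 0, so NTH(2, ...) and NTH(3, ...) correspond to parts[1] and parts[2]. Returning None for short addresses is an assumption about the desired behaviour.

```python
from typing import Optional


def second_and_third_part(web_address: str) -> Optional[str]:
    """Return the 2nd and 3rd dot-separated parts joined by '.', or None."""
    parts = web_address.split(".")
    # Fewer than three parts: there is no third element to pick.
    if len(parts) < 3:
        return None
    return parts[1] + "." + parts[2]


print(second_and_third_part("a.b.c"))            # b.c
print(second_and_third_part("www.example.com"))  # example.com
print(second_and_third_part("example.com"))      # None
```

This is only meant to check the expected output for a few sample addresses; inside BigQuery itself the query above or the URL functions are the way to go.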

...oracle group by syntax for beginners

What is the problem in this please?
select inst.id
, inst.type as "TypeOfInstall"
, count(inst.id) as "NoOfInstall"
from dm_bsl_ho.installment inst
group by inst.type
You're not allowed to mix a plain (single-row) column with a group function such as count unless that column is listed in the GROUP BY.
You should select only the grouped column and the aggregate:
select inst.type as "TypeOfInstall"
, count(inst.id) as "NoOfInstall"
from dm_bsl_ho.installment inst
GROUP BY inst.type;
When you do a GROUP BY in most RDBMSs, your selection is limited to the following two things:
Columns mentioned in the GROUP BY - in your case, that's inst.type
Aggregate functions - for example, count(inst.id)
However, the inst.id at the top is neither one of these. You need to remove it for the statement to work:
SELECT
type as "TypeOfInstall"
, COUNT(id) as "NoOfInstall"
FROM dm_bsl_ho.installment
GROUP BY type