Spark throws: "expression is neither present in the group by, nor is it an aggregate function..."

I'm trying to execute this with pyspark:
query = """SELECT *
FROM transaction
INNER JOIN factures
ON transaction.t_num = factures.f_trx
WHERE transaction.t_num != ''
GROUP BY transaction.t_num"""
result = sqlContext.sql(query)
Spark gives an error:
u"expression transaction.t_aut is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;

You forgot to list the columns in your GROUP BY clause: you are selecting all columns with SELECT *, but grouping by only one of them.

The error says that transaction.t_aut, which you projected into your SELECT list by using SELECT *, is not used in your GROUP BY.
The solution is either to replace SELECT * with the columns in your GROUP BY (in your case transaction.t_num), plus aggregates such as first() for anything else you need, or to add transaction.t_aut to your GROUP BY. In the second case, every other column that SELECT * brings in must also be grouped or aggregated, or Spark will report the next one.
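A minimal sketch of the first option, using Python's built-in sqlite3 as a stand-in for Spark SQL. This is an assumption: SQLite is more permissive than Spark and accepts bare columns in grouped queries, so it will not reproduce the error; it only shows the corrected query shape. The f_amount column and all row values are hypothetical; only t_num, t_aut and f_trx come from the question. The table name is double-quoted because TRANSACTION is an SQLite keyword.

```python
import sqlite3

# Toy schema: t_num, t_aut and f_trx come from the question;
# f_amount and every row value are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "transaction" (t_num TEXT, t_aut TEXT);
CREATE TABLE factures (f_trx TEXT, f_amount REAL);
INSERT INTO "transaction" VALUES ('T1', 'A1'), ('T2', 'A2'), ('', 'A3');
INSERT INTO factures VALUES ('T1', 10.0), ('T1', 5.0), ('T2', 7.5);
""")

# Every selected expression is either the grouping key or an aggregate.
# MIN() stands in for Spark's first(): any single value per group is fine here.
query = """
SELECT t.t_num,
       MIN(t.t_aut)    AS t_aut,
       COUNT(*)        AS n_factures,
       SUM(f.f_amount) AS total
FROM "transaction" t
INNER JOIN factures f ON t.t_num = f.f_trx
WHERE t.t_num != ''
GROUP BY t.t_num
ORDER BY t.t_num
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('T1', 'A1', 2, 15.0), ('T2', 'A2', 1, 7.5)]
```

In Spark SQL itself, first(t.t_aut), as the error message suggests, would be the closer equivalent of MIN here.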

Related

column "pedroad.id" must appear in the GROUP BY clause or be used in an aggregate function

select di.seq, di.node , di.edge , di.cost, a.geom
from pgr_dijkstra(
'select id, target, source, sum(cost) from pedroad',
array(select get_source2('location1'))
,array(select get_target2('test4'))
,false) as di,
pedroad a
where di.node = a.source;
error: column "pedroad.id" must appear in the GROUP BY clause or be used in an aggregate function
How should I use group by?
There is at least one error in the SQL string you are passing as the first parameter: sum(cost) is an aggregate function, so all other columns in the SELECT list must appear in the GROUP BY clause or be used in an aggregate function, as the error message says.
This would fix that error:
SELECT di.seq, di.node, di.edge, di.cost, a.geom
FROM pgr_dijkstra('select id, target, source, sum(cost) AS cost from pedroad
group by 1,2,3'
, array(select get_source2('location1'))
, array(select get_target2('test4'))
, false) di
JOIN pedroad a ON di.node = a.source;
The alias matters too: pgr_dijkstra expects the edges query to return a column named cost. But it's unclear how you actually wanted to sum ...
If id happens to be the PK of pedroad, you can simplify to just group by 1.
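A small sketch of an edges query of that shape, run with Python's sqlite3 as a stand-in for PostgreSQL (an assumption; the pedroad rows are invented). Every non-aggregated column is grouped and the aggregate is aliased, so the result exposes columns named id, target, source and cost. It groups by column names rather than by position. Because id is the primary key here, each group holds exactly one row, which is why the intended sum is unclear.

```python
import sqlite3

# Hypothetical pedroad rows; only the column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pedroad (id INTEGER PRIMARY KEY, source INTEGER,
                      target INTEGER, cost REAL);
INSERT INTO pedroad VALUES (1, 10, 20, 1.5), (2, 20, 30, 2.0), (3, 10, 30, 4.0);
""")

# Group by every non-aggregated column and alias the aggregate so the
# result has a column literally named "cost".
cur = conn.execute("""
SELECT id, target, source, sum(cost) AS cost
FROM pedroad
GROUP BY id, target, source
ORDER BY id
""")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)  # ['id', 'target', 'source', 'cost']
print(rows)     # [(1, 20, 10, 1.5), (2, 30, 20, 2.0), (3, 30, 10, 4.0)]
```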
Explanation:
Return a grouped list with occurrences using Rails and PostgreSQL
Concatenate multiple result rows of one column into one, group by another column

SqlSyntaxErrorException in DB2 query

I am trying to execute the following db2 query, but I'm getting this error:
SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-119, SQLSTATE=42803, SQLERRMC=ENTITLEMENT
The query is:
SELECT *
FROM reclaimbalance rb
,user_benefit_accrued_period ubap
WHERE rb.user_id = ubap.user_id
AND rb.component_id = ubap.PAY_HEAD_ID
AND ubap.CUSTOMER_ID = 281
AND rb.year = '2016-2017'
AND ubap.STATUS = 1
GROUP BY ubap.user_id
,ubap.PAY_HEAD_ID
HAVING sum(ubap.STD_BALANCE_ADDED_IN_PERIOD) != rb.ENTITLEMENT
One problem is SELECT *. I would expect the error to be a bit different from just a generic syntax error, though.
Also, you should learn to use explicit JOIN syntax. And, your HAVING clause has an unaggregated column.
I assume you want something like this:
SELECT ubap.user_id, ubap.PAY_HEAD_ID, rb.ENTITLEMENT,
sum(ubap.STD_BALANCE_ADDED_IN_PERIOD)
FROM reclaimbalance rb JOIN
user_benefit_accrued_period ubap
ON rb.user_id = ubap.user_id AND rb.component_id = ubap.PAY_HEAD_ID
WHERE ubap.CUSTOMER_ID = 281 AND rb.year = '2016-2017' AND ubap.STATUS = 1
GROUP BY ubap.user_id, ubap.PAY_HEAD_ID, rb.ENTITLEMENT
HAVING sum(ubap.STD_BALANCE_ADDED_IN_PERIOD) <> rb.ENTITLEMENT;
The problem diagnosed by SQLCODE=-119 is that the column ENTITLEMENT, referenced in the HAVING clause, is neither used inside an aggregate function in that HAVING clause nor listed in the GROUP BY clause.
To recover, either remove the non-aggregate reference to ENTITLEMENT from the HAVING clause, move that reference inside an aggregate function, or add ENTITLEMENT to the GROUP BY clause.
Note, however, that although each of these is a valid recovery, the result may not be what you need. And even once the SQLCODE=-119 problem is resolved by any of these actions, the next error will almost surely be SQLCODE=-122, indicating that some columns or non-aggregate, non-constant expressions in the SELECT list are not also named in the GROUP BY clause. FWIW: despite suggestions otherwise, SELECT * can be compatible with GROUP BY, but only as a special case: as the rules state, every column implicitly selected by the asterisk must also be specified in the GROUP BY clause.
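A minimal sketch of the third recovery action (adding ENTITLEMENT to GROUP BY, as the rewritten query earlier in this thread does), run with Python's sqlite3 as a stand-in for DB2. Assumptions: the column types and all row values are invented; only the table and column names come from the question. The SELECT list holds only grouped columns and an aggregate, which also avoids the follow-up SQLCODE=-122.

```python
import sqlite3

# Table and column names follow the question; types and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reclaimbalance (user_id INTEGER, component_id INTEGER,
                             year TEXT, ENTITLEMENT REAL);
CREATE TABLE user_benefit_accrued_period (user_id INTEGER, PAY_HEAD_ID INTEGER,
                                          CUSTOMER_ID INTEGER, STATUS INTEGER,
                                          STD_BALANCE_ADDED_IN_PERIOD REAL);
INSERT INTO reclaimbalance VALUES (1, 100, '2016-2017', 10.0),
                                  (2, 100, '2016-2017', 8.0);
INSERT INTO user_benefit_accrued_period VALUES
  (1, 100, 281, 1, 4.0), (1, 100, 281, 1, 6.0),  -- sums to 10.0, equal
  (2, 100, 281, 1, 3.0), (2, 100, 281, 1, 2.0);  -- sums to 5.0, not 8.0
""")

# ENTITLEMENT is in GROUP BY, so HAVING may compare it to the aggregate.
rows = conn.execute("""
SELECT ubap.user_id, ubap.PAY_HEAD_ID, rb.ENTITLEMENT,
       SUM(ubap.STD_BALANCE_ADDED_IN_PERIOD) AS added
FROM reclaimbalance rb
JOIN user_benefit_accrued_period ubap
  ON rb.user_id = ubap.user_id AND rb.component_id = ubap.PAY_HEAD_ID
WHERE ubap.CUSTOMER_ID = 281 AND rb.year = '2016-2017' AND ubap.STATUS = 1
GROUP BY ubap.user_id, ubap.PAY_HEAD_ID, rb.ENTITLEMENT
HAVING SUM(ubap.STD_BALANCE_ADDED_IN_PERIOD) <> rb.ENTITLEMENT
""").fetchall()
print(rows)  # [(2, 100, 8.0, 5.0)]
```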

MS Access query aggregation

I am trying to run a query like this:
SELECT sales.action_date, sales.item_id, items.item_name,
sales.item_quantity, sales.item_price, sales.net
FROM sales INNER JOIN items ON sales.item_id = items.ID
GROUP BY sales.item_id
HAVING (((sales.action_date)=[Forms]![rep_frm]![Text13].[value]));
Every time I try to show the data, this message appears:
your query does not include the specified expression ' action date '
as part of aggregate function.
It happens for every field in the query, but I just want the aggregation to be by item_id.
What should I do?
You don't have any aggregations like SUM in your SELECT statement. I also don't understand why sales.action_date is in the HAVING clause: HAVING is for filtering on aggregates, like SUM(sales.item_price) <> 0. You should be able to put this condition in the WHERE clause, before the GROUP BY, instead of in the HAVING clause.
This example should work:
SELECT sales.item_id, items.item_name, SUM(sales.item_quantity),
SUM(sales.item_price), SUM(sales.net)
FROM sales INNER JOIN items ON sales.item_id = items.ID
WHERE sales.action_date=[Forms]![rep_frm]![Text13].[value]
GROUP BY sales.item_id, items.item_name;
When you group your data, every field in the SELECT list must either be included in the GROUP BY clause or have an aggregate function applied to it; otherwise the query doesn't make sense.
By the way, as far as I can see, you should use WHERE (((sales.action_date)=[Forms]![rep_frm]![Text13].[value])) before GROUP BY, not HAVING after it.
If you want to aggregate by date, you have to put the date in the GROUP BY clause:
SELECT sales.action_date,
SUM(sales.item_quantity),
SUM(sales.item_quantity * sales.item_price) as Total,
SUM(sales.net)
FROM sales
INNER JOIN items ON sales.item_id = items.ID
WHERE (((sales.action_date)=[Forms]![rep_frm]![Text13].[value]))
GROUP BY sales.action_date;
Only the columns you want to group by should appear in the GROUP BY clause, and only those columns can appear in the SELECT clause outside of aggregate functions.
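A sketch of the first answer's approach (filter in WHERE, then group by item), run with Python's sqlite3 as a stand-in for Access. Assumptions: the rows are invented, and the form reference [Forms]![rep_frm]![Text13].[value] is replaced by a ? query parameter, since SQLite has no forms.

```python
import sqlite3

# Hypothetical rows; the Access form reference is replaced by a ? parameter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (ID INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales (action_date TEXT, item_id INTEGER, item_quantity INTEGER,
                    item_price REAL, net REAL);
INSERT INTO items VALUES (1, 'pen'), (2, 'ink');
INSERT INTO sales VALUES
  ('2024-01-02', 1, 3, 1.5, 4.5),
  ('2024-01-02', 1, 2, 1.5, 3.0),
  ('2024-01-02', 2, 1, 6.0, 6.0),
  ('2024-01-03', 1, 9, 1.5, 13.5);
""")

# WHERE filters rows before grouping; each non-aggregated column is grouped.
query = """
SELECT sales.item_id, items.item_name,
       SUM(sales.item_quantity) AS qty,
       SUM(sales.net)           AS net
FROM sales INNER JOIN items ON sales.item_id = items.ID
WHERE sales.action_date = ?
GROUP BY sales.item_id, items.item_name
ORDER BY sales.item_id
"""
rows = conn.execute(query, ("2024-01-02",)).fetchall()
print(rows)  # [(1, 'pen', 5, 7.5), (2, 'ink', 1, 6.0)]
```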

Error in group by using hive

I am using the following code and getting the error below:
select d.searchpack,d.context, d.day,d,txnid,d.config, c.sgtype from ds3resultstats d join
context_header c on (d.context=c.contextid) where (d.day>='2012-11-15' and d.day<='2012-11-25' and c.sgtype='Tickler' and d.config like
'%people%') GROUP BY d.context limit 10;
FAILED: Error in semantic analysis: line 1:7 Expression Not In Group By Key d
I am guessing I am using GROUP BY incorrectly.
When you use GROUP BY, you cannot select additional fields that are not grouped. You can only select the group key(s) together with aggregate functions.
See the Hive GROUP BY documentation for more information.
Code example:
select d.context, count(*)
from ds3resultstats d
...
group by d.context
or group by multiple fields:
select d.context, d.field2, count(*)
from ds3resultstats d
...
group by d.context, d.field2
It expects every non-aggregated column to be listed in the GROUP BY.
I was facing the same issue, but I managed to find a workaround for this kind of issue.
You can wrap the column in collect_set to get the output. For example:
select d.searchpack,collect_set(d.context) from sampletable group by d.searchpack;
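SQLite has no collect_set, so this hedged sketch uses group_concat(DISTINCT ...) in Python's sqlite3 as a rough analogue: it likewise collapses a non-grouped column into one value per group, but returns a comma-separated string rather than an array. The sampletable rows are hypothetical.

```python
import sqlite3

# Hypothetical sampletable rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sampletable (searchpack TEXT, context TEXT);
INSERT INTO sampletable VALUES
  ('sp1', 'c1'), ('sp1', 'c2'), ('sp1', 'c1'), ('sp2', 'c3');
""")

# context is not grouped, so it must be aggregated; DISTINCT mimics a set.
rows = conn.execute("""
SELECT searchpack, group_concat(DISTINCT context)
FROM sampletable
GROUP BY searchpack
ORDER BY searchpack
""").fetchall()

# group_concat order is unspecified, so sort each group's values.
result = {sp: sorted(ctx.split(",")) for sp, ctx in rows}
print(result)  # {'sp1': ['c1', 'c2'], 'sp2': ['c3']}
```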

SQL: How to return any Part that occurs more than Once

I have the following query which returns the following error:
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.
SELECT Part from Parts Where count(Part) > 1
How could I rewrite it to return the parts that appear more than once?
You need to use a GROUP BY and HAVING clause like this:
SELECT part
FROM Parts
GROUP BY part
HAVING COUNT(*) > 1
A perfect opportunity for the rarely used HAVING clause:
SELECT Part, Count(Part) as PartCount
FROM Parts
GROUP BY Part
HAVING Count(Part) > 1
Try this:
select part from parts group by part having count(part) > 1
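A minimal check of the GROUP BY / HAVING answers with Python's sqlite3 (the Parts rows are hypothetical): WHERE filters individual rows before grouping, so the count has to be tested in HAVING, after the groups exist.

```python
import sqlite3

# Hypothetical Parts rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Parts (Part TEXT);
INSERT INTO Parts VALUES ('bolt'), ('nut'), ('bolt'), ('washer'), ('nut'), ('bolt');
""")

# Aggregates cannot be filtered in WHERE; HAVING filters whole groups instead.
rows = conn.execute("""
SELECT Part, COUNT(*) AS PartCount
FROM Parts
GROUP BY Part
HAVING COUNT(*) > 1
ORDER BY Part
""").fetchall()
print(rows)  # [('bolt', 3), ('nut', 2)]
```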