Teradata error 3504 (non-aggregate values must be part of group) when using windowing function

Teradata error 3504 (non-aggregate values must be part of group) when using windowing function - sql

So I wrote a query that uses a window function and I keep getting an error 3504 in Teradata, eventhough I'm sure I have the correct columns in the group by clause (all non-aggregate columns). It has something to do with the windowing function I'm using, because when I comment it out I don't get the error, but I have no idea how to resolve it.
This is the query:
select
n.acct_id as bd_acct_id
,n.tran_nr as tran_order
,t.trade_dt - n.tran_dt as days_until_trade
,n.n_total
,sum(t.trade_ct) as trades_ct
,sum(t.trade_gross_am) as tot_trades
,sum(t.trade_gross_am) over (partition by bd_acct_id, tran_order order by tran_order) as running_total
from nnae n
left join trades t
on n.acct_id = t.acct_id
having days_until_trade > 0
group by 1,2,3,4
order by 1,2,3
Would appreciate any help. Thanks!

Presumably, you intend something like this:
sum(sum(t.trade_gross_am)) over (partition by n.acct_id, n.tran_nr
order by min(n.tran_dt)
rows between unbounded preceding and current row
) as running_total
It seems odd to have a running total, without the date column explicitly in the result set.
Also, I replaced the aliases with the original column names. Not all databases support aliases in window functions, so this is just a habit I'm used to.

Related

Get max(timestamp) for each group with repetition (IBM DB2)

ANSWER: Need to use LEAD and PARTITION BY functions. Please refer to Gordon's answer.
I have the following dataset :
I want to get rows 1,3,5,7 in the result set.
RESULT SET SHOULD LOOK LIKE :
11/10/2020 19:36:11.548955 IN_REVIEW
11/8/2020 19:36:11.548955 EXPIRED
11/6/2020 19:36:11.548955 IN_REVIEW
11/4/2020 19:36:11.548955 ACTIVE

Use window functions. LEAD() gets the value from the "next" row, so filter only when the value changes:
SELECT t.*
FROM (SELECT t.*,
LEAD(interac_Reg_stat) OVER (PARTITION BY Acct_No ORDER BY xcn_tmstmp) as next_interac_Reg_stat
FROM TABLE
) t
WHERE interac_Reg_stat <> next_interac_Reg_stat OR
next_interac_Reg_stat IS NULL;

Well, since you are grouping by interac_Reg_stat, you will never get 2 separate rows for the IN_REVIEW status. To get the result you want, you will need to add a sub-query to find all items with a certain interac_Reg_stat that are before another specific interac_Reg_stat.

Same return with and without the SUM operator PostgreSQL

I'm using PostgreSQL 10 and trying to run this query. I started with a CTE which I am referencing as 'query.'
SELECT
ROW_NUMBER()OVER() AS my_new_id,
query.geom AS geom,
query.pop AS pop,
query.name,
query.distance AS dist,
query.amenity_size,
((amenity_size)/(distance)^2) AS attract_score,
SUM((amenity_size)/(distance)^2) AS tot_attract_score,
((amenity_size)/(distance)^2) / SUM((amenity_size)/(distance)^2) as marketshare
INTO table_mktshare
FROM query
WHERE
distance > 0
GROUP BY
query.name,
query.amenity_size,
query.geom,
query.pop,
query.distance
The query runs but the problem lies in the 'markeshare' column. It returns the same answer with or without the SUM operator and returns one, which appears to make both the attract_score and the tot_attract_score the same. Why is the SUM operator read the same as the expression above it?

This is occurring specifically because each combination of columns in the group by clause uniquely identifies one row in the table. I don't know if this is intentional, but more normally, one would expect something like this:
SELECT ROW_NUMBER() OVER() AS my_new_id,
query.geom AS geom, query.pop AS pop, query.name,
SUM((amenity_size)/(distance)^2) AS tot_attract_score,
INTO table_mktshare
FROM query
WHERE distance > 0
GROUP BY query.name, query.geom, query.pop;
This is not your intention, but it does give a flavor of what's expected.

SQL ORDER BY clause causing GROUP BY/aggregate error

I get this error:
PG::GroupingError: ERROR: column "relationships.created_at" must appear in the GROUP BY clause or be used in an aggregate function
from this query:
last_check = #user.last_check.to_i
#new_relationships = User.select('*')
.from("(#{#rels_unordered.to_sql}) AS rels_unordered")
.joins("
INNER JOIN relationships
ON rels_unordered.id = relationships.character_id
WHERE EXTRACT(EPOCH FROM relationships.created_at) > #{last_check}
ORDER BY relationships.created_at DESC
")
Without the ORDER BY line, it works fine. I don't understand what the GROUP BY clause is. How do I get it working and still order by relationships.created_at?
EDIT
I understand you can GROUP BY relationships.created_at. But isn't grouping unnecessary? Is the problem that relationship.created_at is not included in the SELECT? How do you include it? If you've already done an INNER JOIN with relationships, why the hell isn't relationships.created_at included in the result??
I've just realised this is all happening because the logs show the query begins with SELECT COUNT(*) FROM..... So the COUNT is the aggregate function. But I never requested a COUNT! Why does the query start with that?
EDIT 2
Ok, this seems to be happening because of lazy querying. The first thing that happens to #new_relationships is #new_relationships.any? This affects the query and turns it into a count. So I suppose the question is, how do I force the query to run as originally intended? And also to check if #new_relationships is empty without affecting the sql query?

You just need to add group by along with your order by clause
last_check = #user.last_check.to_i
#new_relationships =
User.select('"rels_unordered".*')
.from("(#{#rels_unordered.to_sql}) AS rels_unordered")
.joins("INNER JOIN relationships
ON rels_unordered.id = relationships.character_id
WHERE EXTRACT(EPOCH FROM relationships.created_at) > #{last_check}
GROUP BY relationships.created_at
ORDER BY relationships.created_at DESC ")

It's asking for a GROUP BY after your FROM clause in your EXTRACT (essentially a subselect). There's ways around it, but I've found it's often easier to make a GROUP BY work. Try: ...FROM relationships.created_at GROUP BY id or whatever indexing column you are using from that table. It seems like your ORDER BY is conflicting with itself. By grouping the subselect data it should lose its conflict.

Oracle SQL - Comparing AVG functions in WHERE

I'm trying to write a few Oracle SQL scripts for an assignment. I've managed to get all of it to work, except for one part. To summarize, I have to display data from 2 tables if the average of 1 column in table A is greater than the average of another column in table B. I realize you cannot include AVG functions in a WHERE clause or HAVING clause since it seems unable to properly access the data (from what I've read). When I exclude this clause, the script executes properly, so I'm confident there are no other errors.
I've tried writing it as follows but the error I get is ORA-00936: missing expression and it is just before the > sign. I thought this may be due to improper bracket placing but none of my attempts resolved this. Here is my attempt:
SELECT l.l_category, SUM(r.r_sold), AVG(l.l_cost)
FROM promos l
INNER JOIN sales r
ON r.promo_id = l.promo_id
GROUP BY l.l_category
HAVING (SELECT AVG(l.l_cost) OVER (PARTITION BY l.l_cost)) >
(SELECT AVG(r.r_sold) OVER (PARTITION BY r.r_sold));
I tried doing this without the OVER (PARTITION BY ...) as well as putting it into a WHERE clause but it didn't resolve the error. I'm pretty sure I need to put it into a SELECT statement somehow but I'm at a loss.

You do not need to use the OVER clause when applying the aggregate functions in the HAVING clause. Just use the aggregate functions on their own.
SELECT l.l_category, SUM(r.r_sold), AVG(l.l_cost)
FROM promos l
INNER JOIN sales r
ON r.promo_id = l.promo_id
GROUP BY l.l_category
HAVING HAVING AVG(l.l_cost) > AVG(r.r_sold)

Group by SQL statement

So I got this statement, which works fine:
SELECT MAX(patient_history_date_bio) AS med_date, medication_name
FROM biological
WHERE patient_id = 12)
GROUP BY medication_name
But, I would like to have the corresponding medication_dose also. So I type this up
SELECT MAX(patient_history_date_bio) AS med_date, medication_name, medication_dose
FROM biological
WHERE (patient_id = 12)
GROUP BY medication_name
But, it gives me an error saying:
"coumn 'biological.medication_dose' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.".
So I try adding medication_dose to the GROUP BY clause, but then it gives me extra rows that I don't want.
I would like to get the latest row for each medication in my table. (The latest row is determined by the max function, getting the latest date).
How do I fix this problem?

Use:
SELECT b.medication_name,
b.patient_history_date_bio AS med_date,
b.medication_dose
FROM BIOLOGICAL b
JOIN (SELECT y.medication_name,
MAX(y.patient_history_date_bio) AS max_date
FROM BIOLOGICAL y
GROUP BY y.medication_name) x ON x.medication_name = b.medication_name
AND x.max_date = b.patient_history_date_bio
WHERE b.patient_id = ?

If you really have to, as one quick workaround, you can apply an aggregate function to your medication_dose such as MAX(medication_dose).
However note that this is normally an indication that you are either building the query incorrectly, or that you need to refactor/normalize your database schema. In your case, it looks like you are tackling the query incorrectly. The correct approach should the one suggested by OMG Poinies in another answer.
You may be interested in checking out the following interesting article which describes the reasons behind this error:
But WHY Must That Column Be Contained in an Aggregate Function or the GROUP BY clause?

You need to put max(medication_dose) in your select. Group by returns a result set that contains distinct values for fields in your group by clause, so apparently you have multiple records that have the same medication_name, but different doses, so you are getting two results.
By putting in max(medication_dose) it will return the maximum dose value for each medication_name. You can use any aggregate function on dose (max, min, avg, sum, etc.)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas