In my query I have to aggregate the SUM of the AMUNT field according to WERKS, DATUM and UZEIT.
I tried to do a GROUP BY, without any success.
I get an error like this:
What is the problem in my code?
This is my ABAP code:
DATA: gt_compr TYPE TABLE OF yrt_h_sales.
SELECT werks, extnb, datum, uzeit, sumvt, deprt, dpext, SUM( amunt ) AS amunt
INTO CORRESPONDING FIELDS OF TABLE @gt_compr
FROM yrt_h_sales
WHERE werks IN @so_werks
AND datum IN @so_datum
GROUP BY werks, datum, uzeit.
After correcting it, the code looks as follows:
SELECT werks, datum, uzeit, extnb, deprt, dpext, SUM( amunt ) AS amunt
INTO CORRESPONDING FIELDS OF TABLE @gt_compr
FROM yrt_h_sales
WHERE werks IN @so_werks
AND datum IN @so_datum
GROUP BY werks, datum, uzeit, extnb, deprt, dpext.
So I don't have the compilation error anymore, but the aggregation is still not working! I get a 43-line result with no summation on the AMUNT column.
P.S. this is the structure of my table:
Your observation is consistent with the documentation (and what I have so far seen in any other RDBMS I've worked with):
If aggregate expressions are used, any column identifiers that are not
included as arguments of an aggregate function must be included after
the addition GROUP BY.
Take for example the time field UZEIT: you can tell the system to aggregate (in your case, sum up) all amounts for the same point in time by adding it to the GROUP BY clause, or you can apply an aggregate function to it as well (SUM would not make any sense here, but MIN might), or you can omit the field altogether. You cannot leave it dangling around without further specification: the field either needs to be part of the new key set created by GROUP BY, or it has to have an aggregate function applied to it, so that the system knows what to do with the multiple values that might occur in a group.
(This is basic SQL btw and not ABAP-specific knowledge.)
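To make that concrete with the fields from the question, here is a sketch in plain SQL (not exact ABAP Open SQL syntax, and with the WHERE conditions left out) showing the two valid ways to handle UZEIT:
-- Option 1: make UZEIT part of the grouping key
SELECT werks, datum, uzeit, SUM( amunt ) AS amunt
FROM yrt_h_sales
GROUP BY werks, datum, uzeit;
-- Option 2: aggregate it as well (MIN returns the earliest time per group)
SELECT werks, datum, MIN( uzeit ) AS uzeit, SUM( amunt ) AS amunt
FROM yrt_h_sales
GROUP BY werks, datum;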
Remove the CORRESPONDING FIELDS OF and just place the results INTO TABLE.
I'm pivoting in PostgreSQL but when I run the query the output says:
ERROR: return and sql tuple descriptions are incompatible
SQL state: 42601
Summarizing, I want the distribution channel as rows, the years as columns and the operative margin as the values.
dist_chann_id --> integer
year --> year
operative_margin --> integer
Without the pivot the output is:
dist_chann_name | year | operative_margin
----------------+------+-----------------
1               | 2020 | 20783
1               | 2021 | 5791
2               | 2020 | 30362
3               | 2021 | 14501
3               | 2020 | 2765
3               | 2021 | 4535
This is my query:
SELECT *
FROM crosstab(
'SELECT dist_chann_id, year, operative_margin
FROM marginality_by_channel
ORDER BY dist_chann_id, year'
) AS ct ("DC" int, "2020" int, "2021" int);
Source of the error msg
One of the columns does not have the data type you think it has. It must be operative_margin, probably type text?
The 1-parameter form of crosstab() uses the "category" column (year in your example) only for sorting. And the "row_name" column (dist_chann_name - or dist_chann_id?) would produce a different error message.
Solution
Either way, unless you can guarantee that every "row_name" has exactly two values to it, it's safer to use the 2-parameter form of crosstab():
SELECT *
FROM crosstab(
$$
SELECT dist_chann_name, year, operative_margin
FROM marginality_by_channel
ORDER BY 1, 2
$$
, 'VALUES (2020), (2021)'
) AS ct ("DC" int, "2020" int, "2021" int);
db<>fiddle here
This variant also happens to be more tolerant with type mismatches (as everything is passed as text anyway). See:
PostgreSQL Crosstab Query
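As a side note, crosstab() is not part of core Postgres; it is provided by the additional module tablefunc, which has to be installed once per database:
CREATE EXTENSION IF NOT EXISTS tablefunc;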
crosstab() shines for many resulting value columns (faster, shorter). For just two "value" columns, aggregate FILTER might be the better (simpler) choice. Not much performance to gain (if any, after adding some overhead). See:
a_horse's answer under this question
Conditional SQL count
Broken setup
That aside, your setup is ambiguous to begin with. It includes two rows for the same (dist_chann_name, year) = (3, 2021).
a_horse uses sum() in his aggregate FILTER solution. You might also use min() or max(), or whatever ...
My solution with the 2-parameter form outputs the last value according to sort order. (Think of it as each next value overwriting its dedicated spot.)
The 1-parameter form outputs the first value according to sort order. (Think of it as "first come, first serve". Superfluous rows are discarded.)
A clean solution would use an explicit sort order and document the effect, or work with a query producing distinct values, or use the appropriate aggregate function with the FILTER solution.
Using filtered aggregation is typically much easier than the somewhat convoluted crosstab() function (at least in my opinion).
select dist_chann_name as dc,
sum(operative_margin) filter (where year = 2020) as "2020",
sum(operative_margin) filter (where year = 2021) as "2021"
from marginality_by_channel
group by dist_chann_name
order by dist_chann_name;
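For reference, applied to the sample data from the question (where the two rows for (3, 2021) are summed up, and channel 2 has no 2021 row, hence NULL), this should return something like:
 dc | 2020  | 2021
----+-------+-------
  1 | 20783 |  5791
  2 | 30362 |
  3 |  2765 | 19036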
I'm stuck on an (apparently) extremely trivial task that I can't make work, and I really have no choice but to ask for advice.
I used to deal with PHP/MySQL more than 10 years ago and I might be quite rusty now that I'm dealing with an SQLite DB using Qt5.
Basically I'm selecting some records while wanting to make some math operations on the fetched columns. I recall (and re-read some documentation and examples) that the keyword "AS" is going to conveniently rename (alias) a value.
So for example I have this query, where "X" is an integer number that I render into this big Qt string before executing it with a QSqlQuery. This query lets me select all the electronic components used in a Project and calculate how many of them to order (rounding to the nearest multiple of 5) and the total price per component.
SELECT Inventory.id, UsedItems.pid, UsedItems.RefDes, Inventory.name, Inventory.category,
Inventory.type, Inventory.package, Inventory.value, Inventory.manufacturer,
Inventory.price, UsedItems.qty_used as used_qty,
UsedItems.qty_used*X AS To_Order,
ROUND((UsedItems.qty_used*X/5)+0.5)*5*CAST((X > 0) AS INT) AS Nearest5,
Inventory.price*Nearest5 AS TotPrice
FROM Inventory
LEFT JOIN UsedItems ON Inventory.id=UsedItems.cid
WHERE UsedItems.pid='1'
ORDER BY RefDes, value ASC
So, for example, I aliased UsedItems.qty_used as used_qty. At first I tried to use it in the next field, multiplying it by X, writing "used_qty*X AS To_Order" ... The query failed. Well, no worries, I just put back the original table.field name and it worked.
Going further, I have a complex calculation and I want to use its result in the next field, but the same issue popped up: if I alias the "ROUND(...)" expression AS Nearest5 and then try to use this value by multiplying it in the next field, the query fails.
Please note: the query WORKS, but ONLY if I don't use aliases in the following fields, namely if I don't use the alias Nearest5 in the TotPrice field. I just want to avoid re-writing the whole ROUND(...) thing for the TotPrice field.
What am I missing/doing wrong? Either SQLite does not support aliases within the same query, or I am using the wrong syntax and I am just too stuck/confused to see the mistake (which I'm sure has to be something really stupid).
Column aliases defined in a SELECT cannot be used:
For other expressions in the same SELECT.
For filtering in the WHERE.
For conditions in the FROM clause.
Many databases also restrict their use in GROUP BY and HAVING.
All databases support them in ORDER BY.
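As a minimal sketch of the first restriction and the ORDER BY exception (using a hypothetical table t with columns price and qty):
SELECT price * qty AS total,
       total * 1.2 AS total_with_tax   -- fails: the alias is not visible to sibling expressions
FROM t
ORDER BY total;                        -- OK: the alias is allowed in ORDER BY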
This is how SQL works. The issue is two things:
The logical order of processing clauses in the query (i.e. how they are compiled). This affects the scoping of identifiers.
The order of processing expressions in the SELECT. This is indeterminate; there is no guaranteed evaluation order among the expressions.
For a simple example, what should x refer to in this example?
select x as a, y as x
from t
where x = 2;
By not allowing such references, SQL engines do not have to make a choice: x always refers to t.x.
You can try with nested queries.
A SELECT query can be nested in another SELECT query within the FROM clause;
multiple queries can be nested, for example following this pattern:
SELECT *, [your last Expression] AS LastExp
FROM (SELECT *, [your middle Expression] AS MidExp
      FROM (SELECT *, [your first Expression] AS FirstExp
            FROM yourTables));
Obviously the ordering must be respected: expressions defined in the innermost SELECT can be used by all the outer queries, while intermediate expressions can only be used by the queries that wrap them further out.
For your case, your query may be:
SELECT *, PRC*Nearest5 AS TotPrice
FROM (SELECT *, -- Nearest5 is built on To_Order, which is defined one level further in
             ROUND((To_Order/5)+0.5)*5*CAST((X > 0) AS INT) AS Nearest5
      FROM (SELECT Inventory.id, UsedItems.pid, UsedItems.RefDes, Inventory.name,
                   Inventory.category, Inventory.type, Inventory.package, Inventory.value,
                   Inventory.manufacturer, Inventory.price AS PRC,
                   UsedItems.qty_used*X AS To_Order
            FROM Inventory
            LEFT JOIN UsedItems ON Inventory.id=UsedItems.cid
            WHERE UsedItems.pid='1'
            ORDER BY RefDes, value ASC))
I'm trying to figure out from a column type, which aggregates the data type supports. There's a lot of variety amongst types, just a sample below (some of these support more aggregates, of course):
uuid     count()
text     count(), min(), max()
integer  count(), min(), max(), avg(), sum()
I've been thrashing around in the system catalogs and views, but haven't found what I'm after. (See "thrashing around.") I've poked at pg_type, pg_aggregate, pg_operator, and a few more.
Is there a straightforward way to start from a column type and gather all supported aggregates?
For background, I'm writing a client-side cross-tab code generator, and the UX is better when the tool automatically prevents you from selecting an aggregation that's not supported. I've hacked in some hard-coded rules for now, but would like to improve the system.
We're on Postgres 11.4.
A plain list of available aggregate functions can be based on pg_proc like this:
SELECT oid::regprocedure::text AS agg_func_plus_args
FROM pg_proc
WHERE prokind = 'a'
ORDER BY 1;
Or with separate function name and arguments:
SELECT proname AS agg_func, pg_get_function_identity_arguments(oid) AS args
FROM pg_proc
WHERE prokind = 'a'
ORDER BY 1, 2;
pg_proc.prokind replaces proisagg in Postgres 11. In Postgres 10 or older use:
...
WHERE proisagg
...
Related:
How to drop all of my functions in PostgreSQL?
How to get function parameter lists (so I can drop a function)
To get a list of available functions for every data type (your question), start with:
SELECT type_id::regtype::text, array_agg(proname) AS agg_functions
FROM (
SELECT proname, unnest(proargtypes::regtype[])::text AS type_id
FROM pg_proc
WHERE proisagg
ORDER BY 2, 1
) sub
GROUP BY type_id;
db<>fiddle here
Just a start. Some of the arguments are "direct" (non-aggregated) arguments. (That's also why some functions are listed multiple times - due to those additional non-aggregate columns; example: string_agg.) And there are special cases for "ordered-set" and "hypothetical-set" aggregates. See the columns aggkind and aggnumdirectargs of the additional system catalog pg_aggregate. (You may want to exclude the exotic special cases for starters ...)
And many types have an implicit cast to one of the types listed by the query. A prominent example: string_agg() works with varchar, too, but it's only listed for text above. You can extend the query with information from pg_cast to get the full picture.
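A small sketch of where that cast information lives (implicit casts only; it would still need to be joined against the aggregate argument types from the query above):
SELECT castsource::regtype AS from_type, casttarget::regtype AS to_type
FROM   pg_cast
WHERE  castcontext = 'i'   -- 'i' = implicit casts
ORDER  BY 1, 2;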
Plus, some aggregates work for pseudo types "any", anyarray etc. You'll want to factor those in for every applicable data type.
The complication of multiple aliases for the same data type names can be eliminated easily, though: cast to regtype to get canonical names. Or use pg_typeof() which returns standard names. Related:
Type conversion. What do I do with a PostgreSQL OID value in libpq in C?
PostgreSQL syntax error in parameterized query on "date $1"
How do I translate PostgreSQL OID using python
Man, that is just stunning. Thank you. The heat death of the universe will arrive before I could have figured that out. I had to tweak one line for PG 11 compatibility... says the guy who did not say what version he was on. I've reworked the query to get close to what I'm after and included a bit of output for the archives.
with aggregates as (
SELECT pro.proname aggregate_name,
CASE
WHEN array_agg(typ.typname ORDER BY proarg.position) = '{NULL}'::name[] THEN
'{}'::name[]
ELSE
array_agg(typ.typname ORDER BY proarg.position)
END aggregate_types
FROM pg_proc pro
CROSS JOIN LATERAL unnest(pro.proargtypes) WITH ORDINALITY proarg (oid, position)
LEFT JOIN pg_type typ
ON typ.oid = proarg.oid
WHERE pro.prokind = 'a' -- I needed this for PG 11; I didn't say what version I was using.
GROUP BY pro.oid,
pro.proname
ORDER BY pro.proname),
-- The *super helpful* code above is _way_ past my skill level with Postgres. So, thrashing around a bit to get close to what I'm after.
-- First up, a CTE to sort everything by aggregation and then combine the types.
aggregate_summary as (
select aggregate_name,
array_agg(aggregate_types) as types_array
from aggregates
group by 1
order by 1)
-- Finally, the previous CTE is used to get the details and a count of the types.
select aggregate_name,
cardinality(types_array) as types_count, -- Couldn't get array_length to work here. ¯\_(ツ)_/¯
types_array
from aggregate_summary
limit 5;
And a bit of output:
aggregate_name | types_count | types_array
---------------+-------------+---------------------------------------------------------------
array_agg      |           2 | {{anynonarray},{anyarray}}
avg            |           7 | {{int8},{int4},{int2},{numeric},{float4},{float8},{interval}}
bit_and        |           4 | {{int2},{int4},{int8},{bit}}
bit_or         |           4 | {{int2},{int4},{int8},{bit}}
bool_and       |           1 | {{bool}}
Still on my wish list are
Figuring out how to handle arrays (we aren't using array fields now, and only have a few places where we ever might; at that point, I don't expect we'll try to support pivots on arrays in the cross-tab tool).
Getting all of the aliases for the various types. It seems like (?) int8, etc. can come through from pg_attribute in multiple ways. For example, timestamptz can come back as "timestamp with time zone".
These results are going to be consumed by client-side code and processed, so I don't need to get Postgres to figure everything out in one query, just enough for me to get the job done.
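Regarding the second wish-list item above: casting a type name through regtype yields the canonical spelling, regardless of which alias was used, for example:
SELECT 'timestamptz'::regtype::text;   -- timestamp with time zone
SELECT 'int8'::regtype::text;          -- bigint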
In any case, thanks very, very much.
There's the pg_proc catalog table, which lists all functions. The column proisagg marks aggregation functions and the column proargtypes holds an array of the OIDs of the argument types.
So, for example, to get a list of all aggregation functions with the names of their argument types you could use:
SELECT pro.proname aggregationfunctionname,
CASE
WHEN array_agg(typ.typname ORDER BY proarg.position) = '{NULL}'::name[] THEN
'{}'::name[]
ELSE
array_agg(typ.typname ORDER BY proarg.position)
END aggregationfunctionargumenttypes
FROM pg_proc pro
CROSS JOIN LATERAL unnest(pro.proargtypes) WITH ORDINALITY proarg (oid, position)
LEFT JOIN pg_type typ
ON typ.oid = proarg.oid
WHERE pro.proisagg
GROUP BY pro.oid,
pro.proname
ORDER BY pro.proname;
Of course you may need to extend that, e.g. joining and respecting the schemas (pg_namespace) and checking for compatible types in pg_type (have a look at the typcategory column for that), etc.
Edit:
I overlooked that proisagg was removed in version 11 (I'm still mostly on 9.6), as the other answers mentioned. So, for the sake of completeness: as of version 11, replace WHERE pro.proisagg with WHERE pro.prokind = 'a'.
I've been playing around with the suggestions a bit, and want to post one adaptation based on one of Erwin's scripts:
select type_id::regtype::text as type_name,
array_agg(proname) as aggregate_names
from (
select proname,
unnest(proargtypes::regtype[])::text AS type_id
from pg_proc
where prokind = 'a'
order by 2, 1
) subquery
where type_id in ('"any"', 'bigint', 'boolean','citext','date','double precision','integer','interval','numeric','smallint',
'text','time with time zone','time without time zone','timestamp with time zone','timestamp without time zone')
group by type_id;
That brings back details on the types specified in the WHERE clause. Not only is this useful for my current work, it's useful for my understanding generally. I've run into cases where I've had to recast something, like an integer to a double, to get it to work with an aggregate. So far, this has been pretty much trial and error. If you run the query above (or one like it), it's easier to see from the output where you need recasting between similar-seeming types.
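One concrete illustration of that kind of recasting (a sketch against a hypothetical table readings with a double precision column measurement): round() with a precision argument only exists for numeric, so the result of avg() over a double precision column has to be cast first:
SELECT round(avg(measurement), 2) FROM readings;            -- fails: round(double precision, integer) does not exist
SELECT round(avg(measurement)::numeric, 2) FROM readings;   -- works after casting to numeric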
I have a query in which I need to do some filtering. I can do it in a subcube, but I am wondering if I could do this in a WHERE clause without a subcube. I think this solution would be faster/cleaner. I need to filter on product models with IB > 0 in the last month; this is my solution so far (only part of the query):
SELECT {[Measures].[AFR],[Measures].[IB]} ON COLUMNS,
([dim_ProductModel].[ODM].children)*[Dim_Date].[Date Full].children ON ROWS
FROM
(
SELECT
FILTER([dim_ProductModel].[Product Model].children,
([Measures].[IB]*[Dim_Date].[Date Full].&[2014-04-01]>0)) ON COLUMNS FROM
[cub_dashboard_spares]
)
However, I would prefer to have it in one query without a subquery, something like this (it's not working though):
SELECT {[Measures].[AFR],[Measures].[IB]} ON COLUMNS,
([dim_ProductModel].[ODM].children)*[Dim_Date].[Date Full].children ON ROWS
FROM
[cub_dashboard_spares]
WHERE FILTER([dim_ProductModel].[Product Model].children,
([Measures].[IB]*[Dim_Date].[Date Full].&[2014-04-01]>0))
I get an error message along the lines of:
The MDX function CURRENTMEMBER failed because the coordinate for the ... contains a set.
I basically understand why it is not accepted, as in a WHERE clause I should be more specific, but I wonder if there is some possibility to rewrite it so that it works.
I don't want ProductModel to appear in the result set.
SELECT {[Measures].[AFR],[Measures].[IB]} ON COLUMNS,
([dim_ProductModel].[ODM].children)*[Dim_Date].[Date Full].children ON ROWS
FROM
[cub_dashboard_spares]
WHERE
({[dim_ProductModel].[Product Model].children},
[Measures].[IB],
PERIODSTODATE(
[Dim_Date].[Date Full], //<< needs to be a level from your Dim_Date dimension
[Dim_Date].[Date Full].&[2014-04-01]) //<< needs to be a member from the level you have used in the above argument
)
So I got this statement, which works fine:
SELECT MAX(patient_history_date_bio) AS med_date, medication_name
FROM biological
WHERE patient_id = 12
GROUP BY medication_name
But, I would like to have the corresponding medication_dose also. So I type this up
SELECT MAX(patient_history_date_bio) AS med_date, medication_name, medication_dose
FROM biological
WHERE (patient_id = 12)
GROUP BY medication_name
But it gives me an error saying:
"Column 'biological.medication_dose' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."
So I try adding medication_dose to the GROUP BY clause, but then it gives me extra rows that I don't want.
I would like to get the latest row for each medication in my table. (The latest row is determined by the max function, getting the latest date).
How do I fix this problem?
Use:
SELECT b.medication_name,
b.patient_history_date_bio AS med_date,
b.medication_dose
FROM BIOLOGICAL b
JOIN (SELECT y.medication_name,
MAX(y.patient_history_date_bio) AS max_date
FROM BIOLOGICAL y
GROUP BY y.medication_name) x ON x.medication_name = b.medication_name
AND x.max_date = b.patient_history_date_bio
WHERE b.patient_id = ?
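If window functions are available (the error message in the question suggests SQL Server, which has had them since 2005), a ROW_NUMBER() variant is a common alternative sketch; unlike the join above, it returns exactly one row per medication even if the maximum date occurs more than once:
SELECT medication_name, med_date, medication_dose
FROM (SELECT medication_name,
             patient_history_date_bio AS med_date,
             medication_dose,
             ROW_NUMBER() OVER (PARTITION BY medication_name
                                ORDER BY patient_history_date_bio DESC) AS rn
      FROM biological
      WHERE patient_id = 12) AS latest
WHERE rn = 1;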
If you really have to, as one quick workaround, you can apply an aggregate function to your medication_dose such as MAX(medication_dose).
However, note that this is normally an indication that you are either building the query incorrectly, or that you need to refactor/normalize your database schema. In your case, it looks like you are tackling the query incorrectly. The correct approach should be the one suggested by OMG Ponies in another answer.
You may be interested in checking out the following interesting article which describes the reasons behind this error:
But WHY Must That Column Be Contained in an Aggregate Function or the GROUP BY clause?
You need to put max(medication_dose) in your SELECT. GROUP BY returns a result set that contains distinct values for the fields in your GROUP BY clause, so apparently you have multiple records with the same medication_name but different doses, which is why you are getting extra rows.
By putting in max(medication_dose) it will return the maximum dose value for each medication_name. You can use any aggregate function on dose (max, min, avg, sum, etc.)
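For completeness, that quick workaround spelled out (with the caveat from the answer above: the aggregated dose is not necessarily the dose from the latest-dated row):
SELECT medication_name,
       MAX(patient_history_date_bio) AS med_date,
       MAX(medication_dose) AS medication_dose   -- highest dose per medication, not necessarily from the latest row
FROM biological
WHERE patient_id = 12
GROUP BY medication_name;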