I'm building a pivot in PostgreSQL, but when I run the query the output says:
ERROR: return and sql tuple descriptions are incompatible
SQL state: 42601
In short, I want the distribution channel as rows, the years as columns, and the operative margin as values.
dist_chann_id --> integer
year --> year
operative_margin --> integer
Without the pivot the output is:
dist_chann_name | year | operative_margin
----------------+------+-----------------
              1 | 2020 |            20783
              1 | 2021 |             5791
              2 | 2020 |            30362
              3 | 2021 |            14501
              3 | 2020 |             2765
              3 | 2021 |             4535
This is my query:
SELECT *
FROM crosstab(
'SELECT dist_chann_id, year, operative_margin
FROM marginality_by_channel
ORDER BY dist_chann_id, year'
) AS ct ("DC" int, "2020" int, "2021" int);
Source of the error message
One of the columns does not have the data type you think it has. It must be operative_margin, probably text?
The 1-parameter form of crosstab() uses the "category" column (year in your example) only for sorting. And the "row_name" column (dist_chann_name - or dist_chann_id?) would produce a different error message.
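To verify what type the column actually has, a quick check with pg_typeof() helps:

SELECT pg_typeof(operative_margin) FROM marginality_by_channel LIMIT 1;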
Solution
Either way, unless you can guarantee that every "row_name" has exactly two values, it's safer to use the 2-parameter form of crosstab():
SELECT *
FROM crosstab(
$$
SELECT dist_chann_name, year, operative_margin
FROM marginality_by_channel
ORDER BY 1, 2
$$
, 'VALUES (2020), (2021)'
) AS ct ("DC" int, "2020" int, "2021" int);
db<>fiddle here
This variant also happens to be more tolerant with type mismatches (as everything is passed as text anyway). See:
PostgreSQL Crosstab Query
crosstab() shines for many resulting value columns (faster, shorter). For just two "value" columns, aggregate FILTER might be the better (simpler) choice. Not much performance to gain (if any, after adding some overhead). See:
a_horse's answer under this question
Conditional SQL count
Broken setup
That aside, your setup is ambiguous to begin with. It includes two rows for the same (dist_chann_name, year) = (3, 2021).
a_horse uses sum() in his aggregate FILTER solution. You might also use min() or max(), or whatever ...
My solution with the 2-parameter form outputs the last value according to sort order. (Think of it as each next value overwriting its dedicated spot.)
The 1-parameter form outputs the first value according to sort order. (Think of it as "first come, first serve". Superfluous rows are discarded.)
A clean solution would use an explicit sort order and document the effect, or work with a query producing distinct values, or use the appropriate aggregate function with the FILTER solution.
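A sketch of that last option (assuming max() is the appropriate aggregate for your data; swap in whichever applies):

SELECT *
FROM crosstab(
   $$
   SELECT dist_chann_name, year, max(operative_margin)  -- or min(), sum(), ...
   FROM marginality_by_channel
   GROUP BY 1, 2
   ORDER BY 1, 2
   $$
 , 'VALUES (2020), (2021)'
) AS ct ("DC" int, "2020" int, "2021" int);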
Using filtered aggregation is typically much easier than the somewhat convoluted crosstab() function (at least in my opinion).
select dist_chann_name as dc,
sum(operative_margin) filter (where year = 2020) as "2020",
sum(operative_margin) filter (where year = 2021) as "2021"
from marginality_by_channel
group by dist_chann_name
order by dist_chann_name;
Here I have a query that finds the drop percentage for a bunch of clients based on the orders they have received (i.e. it finds the percentage difference in orders by comparing the current month with the previous month). What I want to achieve is a field where I can see the clients who had a 4-month continuous drop, a 3-month drop, a 2-month drop, and a 1-month drop.
I know it can only be achieved by comparing the last 4 months using the LAG function or subqueries. Can you please help me out with this one? I would appreciate it very much.
select
fd.customers2, fd.Month1, fd.year1, fd.variance, case when
(fd.variance < -0.00001 and fd.year1 = '2022.0' and fd.Month1 = '1')
then '1month drop' else fd.customers2 end as 1_most_host_drop
from
(SELECT
c.*,
sa.customers as customers2,
sum(sa.order) as orders,
date_part(mon, sa.date) as Month1,
date_part(year, sa.date) as year1,
(cast(orders - LAG(orders) OVER(Partition by customers2 ORDER BY
year1, Month1) as NUMERIC(10,2))/NULLIF(LAG(orders)
OVER(partition by customers2 ORDER BY year1, Month1) * 1, 0)) AS variance
FROM stats sa join (select distinct
d.id, d.customers
from configer d
) c on sa.customers=c.customers
WHERE sa.date >= '2021-04-1'
GROUP BY Month1, sa.customers, c.id, year1,
c.customers)fd
In a spirit of friendliness: I think you are a little premature in posting this here as there are several issues with the syntax before even reaching the point where you can solve the problem:
You have at least two places with a comma immediately preceding the word FROM:
...AS variance, FROM stats_archive sa ...
...d.customers, FROM config d...
Recommend you don't use VARIANCE as an alias (it is a system function in PostgreSQL and so is likely also a system function name in Redshift)
Not super important, but there's no need for c.* - just select the columns you will use
DATE_PART requires a string as the first parameter, e.g. DATE_PART('mon', current_date)
I might be wrong about this, but I suspect you cannot use column aliases in the partition by or order by of a window function. Put the originating expressions there instead:
... OVER (PARTITION BY customers2 ORDER BY DATE_PART('year',sa.date),DATE_PART('mon',sa.date))
LAG has three parameters: (1) the column you want to retrieve the value from, (2) the row offset, where a positive integer indicates how many rows prior to the current row you should retrieve a value from according to the partition and order context, and (3) the value the function should return as a default (in case of the first row in the partition). As such, you don't need NULLIF. So, to get the value from the row immediately prior to the current row, or return 0 in case the current row is the first row in the partition:
LAG(orders,1,0) OVER (PARTITION BY customers2 ORDER BY DATE_PART('year',sa.date),DATE_PART('mon',sa.date))
If you use 0 as a default in the calculation of what is currently aliased variance, you will almost certainly run into a div/0 error either now or, worse, when you least expect it in the future. You should protect against that with some CASE logic; or better, provide a more appropriate default value; or even better, calculate the LAG with the default 0, then filter out the 0 rows before doing the calculation.
You can't use column aliases in the GROUP BY. You must reference each field that is not participating in an aggregate in the group by, whether through direct mention (sa.date) or indirectly in an expression (DATE_PART('mon',sa.date))
Your date should be '2021-04-01'
All in all, without sample data and expected results based on that sample data, and without first removing the syntax errors, it is a tall order to offer advice on the problem that is any more specific than:
Build the source of the calculation as a completely separate query first. Calculate the LAG in that source query. Only when you've run that source query and verified that the LAG is producing the correct result should you then wrap it as a sub-query or CTE (not sure if Redshift supports these, but presumably) at which point you can filter out the rows with a zero as the denominator (the first month of orders for each customer).
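To make that structure concrete, here is a rough, untested sketch reusing the table and column names from the question (I renamed sa.order to order_count, since ORDER is a reserved word and I have to guess at the real name):

WITH monthly AS (
    -- Source query first: one row per customer and month
    SELECT c.customers,
           DATE_PART('year', sa.date) AS order_year,
           DATE_PART('mon', sa.date)  AS order_month,
           SUM(sa.order_count)        AS orders
    FROM stats sa
    JOIN (SELECT DISTINCT id, customers FROM configer) c
      ON sa.customers = c.customers
    WHERE sa.date >= '2021-04-01'
    GROUP BY c.customers, DATE_PART('year', sa.date), DATE_PART('mon', sa.date)
), with_lag AS (
    -- Only then layer the LAG on top of the verified source query
    SELECT monthly.*,
           LAG(orders, 1, 0) OVER (PARTITION BY customers
                                   ORDER BY order_year, order_month) AS prev_orders
    FROM monthly
)
SELECT customers, order_year, order_month,
       (orders - prev_orders)::NUMERIC(10,2) / prev_orders AS pct_change
FROM with_lag
WHERE prev_orders <> 0;  -- drop each customer's first month before dividing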
Good luck!
This is my column in Redshift
SHIPMENT_ID
-----------------------------------------
FBA15KS66741, FBA15KS6673D
FBA15NHV7PXX (Oct 20th)
FBA15XNW0SWY 27 balance 2 of 2
FBA15M575MDL & FBA15M59W1Y5
FBA15NHV7PXX (Oct 20th)
FBA15D7WPZVR /FBA15D7WWTPK/FBA15D7WW1GL
I would like to make it
SHIPMENT_ID
-----------------------------------------
FBA15KS66741, FBA15KS6673D
FBA15NHV7PXX
FBA15XNW0SWY
FBA15M575MDL, FBA15M59W1Y5
FBA15NHV7PXX
FBA15D7WPZVR, FBA15D7WWTPK, FBA15D7WW1GL
In SQL only, what is the best way to handle this?
This works in PostgreSQL, so it may work in Redshift, depending on feature availability in PG 8.
WITH items AS
(
    SELECT shipment_id,
           ARRAY_TO_STRING(REGEXP_MATCHES(shipment_id, 'FBA15[0-9a-zA-Z]{7}', 'g'), '') AS unique_shipment_ids
    FROM dat
)
SELECT shipment_id,
       STRING_AGG(unique_shipment_ids, ',') AS shipment_id_csv
FROM items
GROUP BY shipment_id;
I've assumed:
Each item begins with the characters 'FBA15'
There are exactly 7 characters after the first 5
You can edit the regexp pattern if my assumptions are incorrect.
The approach is:
Use REGEXP_MATCHES to capture each item within each row. This creates multiple rows per unique value in shipment_id
Use ARRAY_TO_STRING to convert those values to text, rather than text[]
Use STRING_AGG to join them back together with a comma separator
I found that I could not use STRING_AGG directly around REGEXP_MATCHES as I get the error aggregate function calls cannot contain set-returning function calls, so opted for a CTE. I assume a subquery would work as well.
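For completeness, the same thing as a subquery instead of a CTE (an untested sketch; it should behave identically):

SELECT shipment_id,
       STRING_AGG(unique_shipment_ids, ',') AS shipment_id_csv
FROM (
    SELECT shipment_id,
           ARRAY_TO_STRING(REGEXP_MATCHES(shipment_id, 'FBA15[0-9a-zA-Z]{7}', 'g'), '') AS unique_shipment_ids
    FROM dat
) items
GROUP BY shipment_id;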
I'm trying to figure out, starting from a column's type, which aggregates that data type supports. There's a lot of variety amongst types; just a sample below (some of these support more aggregates, of course):
uuid      count()
text      count(), min(), max()
integer   count(), min(), max(), avg(), sum()
I've been thrashing around in the system catalogs and views, but haven't found what I'm after. (See "thrashing around.") I've poked at pg_type, pg_aggregate, pg_operator, and a few more.
Is there a straightforward way to start from a column type and gather all supported aggregates?
For background, I'm writing a client-side cross-tab code generator, and the UX is better when the tool automatically prevents you from selecting an aggregation that's not supported. I've hacked in some hard-coded rules for now, but would like to improve the system.
We're on Postgres 11.4.
A plain list of available aggregate functions can be based on pg_proc like this:
SELECT oid::regprocedure::text AS agg_func_plus_args
FROM pg_proc
WHERE prokind = 'a'
ORDER BY 1;
Or with separate function name and arguments:
SELECT proname AS agg_func, pg_get_function_identity_arguments(oid) AS args
FROM pg_proc
WHERE prokind = 'a'
ORDER BY 1, 2;
pg_proc.prokind replaces proisagg in Postgres 11. In Postgres 10 or older use:
...
WHERE proisagg
...
Related:
How to drop all of my functions in PostgreSQL?
How to get function parameter lists (so I can drop a function)
To get a list of available functions for every data type (your question), start with:
SELECT type_id::regtype::text, array_agg(proname) AS agg_functions
FROM (
SELECT proname, unnest(proargtypes::regtype[])::text AS type_id
FROM pg_proc
WHERE proisagg
ORDER BY 2, 1
) sub
GROUP BY type_id;
db<>fiddle here
Just a start. Some of the arguments are just "direct" (non-aggregated) arguments. (That's also why some functions are listed multiple times: due to those additional non-aggregate columns; example: string_agg.) And there are special cases for "ordered-set" and "hypothetical-set" aggregates. See the columns aggkind and aggnumdirectargs of the additional system catalog pg_aggregate. (You may want to exclude the exotic special cases for starters ...)
And many types have an implicit cast to one of the types listed by the query. A prominent example: string_agg() works with varchar, too, but it's only listed for text above. You can extend the query with information from pg_cast to get the full picture.
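A starting point for that extension might look like this (a sketch covering only implicit casts, i.e. castcontext = 'i'):

SELECT c.castsource::regtype::text AS input_type,
       array_agg(DISTINCT a.proname) AS agg_functions
FROM pg_cast c
JOIN (
   SELECT proname, unnest(proargtypes::regtype[]) AS arg_type
   FROM pg_proc
   WHERE prokind = 'a'  -- WHERE proisagg in Postgres 10 or older
) a ON a.arg_type = c.casttarget::regtype
WHERE c.castcontext = 'i'
GROUP BY 1;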
Plus, some aggregates work for pseudo types "any", anyarray etc. You'll want to factor those in for every applicable data type.
The complication of multiple aliases for the same data type names can be eliminated easily, though: cast to regtype to get canonical names. Or use pg_typeof() which returns standard names. Related:
Type conversion. What do I do with a PostgreSQL OID value in libpq in C?
PostgreSQL syntax error in parameterized query on "date $1"
How do I translate PostgreSQL OID using python
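For instance, casting through regtype maps any alias to the canonical type name:

SELECT 'int8'::regtype::text        AS t1,  -- bigint
       'timestamptz'::regtype::text AS t2;  -- timestamp with time zone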
Man, that is just stunning. Thank you. The heat death of the universe would have arrived before I could have figured that out. I had to tweak one line for PG 11 compatibility ... says the guy who did not say what version he was on. I've reworked the query to get close to what I'm after and included a bit of output for the archives.
with aggregates as (
SELECT pro.proname aggregate_name,
CASE
WHEN array_agg(typ.typname ORDER BY proarg.position) = '{NULL}'::name[] THEN
'{}'::name[]
ELSE
array_agg(typ.typname ORDER BY proarg.position)
END aggregate_types
FROM pg_proc pro
CROSS JOIN LATERAL unnest(pro.proargtypes) WITH ORDINALITY proarg (oid, position)
LEFT JOIN pg_type typ
ON typ.oid = proarg.oid
WHERE pro.prokind = 'a' -- I needed this for PG 11; I didn't say what version I was using.
GROUP BY pro.oid,
pro.proname
ORDER BY pro.proname),
-- The *super helpful* code above is _way_ past my skill level with Postgres. So, thrashing around a bit to get close to what I'm after.
-- First up, a CTE to sort everything by aggregation and then combine the types.
aggregate_summary as (
select aggregate_name,
array_agg(aggregate_types) as types_array
from aggregates
group by 1
order by 1)
-- Finally, the previous CTE is used to get the details and a count of the types.
select aggregate_name,
cardinality(types_array) as types_count, -- Couldn't get array_length to work here. ¯\_(ツ)_/¯
types_array
from aggregate_summary
limit 5;
And a bit of output:
aggregate_name | types_count | types_array
---------------+-------------+---------------------------------------------------------------
array_agg      |           2 | {{anynonarray},{anyarray}}
avg            |           7 | {{int8},{int4},{int2},{numeric},{float4},{float8},{interval}}
bit_and        |           4 | {{int2},{int4},{int8},{bit}}
bit_or         |           4 | {{int2},{int4},{int8},{bit}}
bool_and       |           1 | {{bool}}
Still on my wish list are:
Figuring out how to handle arrays (we aren't using array fields now, and only have a few places where we ever might; at that point, I don't expect we'll try to support pivots on arrays).
Getting all of the aliases for the various types. It seems like (?) int8, etc. can come through from pg_attribute in multiple ways. For example, timestamptz can come back as "timestamp with time zone".
These results are going to be consumed by client-side code and processed, so I don't need to get Postgres to figure everything out in one query, just enough for me to get the job done.
In any case, thanks very, very much.
There's the pg_proc catalog table, which lists all functions. The column proisagg marks aggregation functions and the column proargtypes holds an array of the OIDs of the argument types.
So for example to get a list of all aggregation functions with the names of their arguments' type you could use:
SELECT pro.proname aggregationfunctionname,
CASE
WHEN array_agg(typ.typname ORDER BY proarg.position) = '{NULL}'::name[] THEN
'{}'::name[]
ELSE
array_agg(typ.typname ORDER BY proarg.position)
END aggregationfunctionargumenttypes
FROM pg_proc pro
CROSS JOIN LATERAL unnest(pro.proargtypes) WITH ORDINALITY proarg (oid, position)
LEFT JOIN pg_type typ
ON typ.oid = proarg.oid
WHERE pro.proisagg
GROUP BY pro.oid,
pro.proname
ORDER BY pro.proname;
Of course you may need to extend that, e.g. joining and respecting the schemas (pg_namespace) and checking for compatible types in pg_type (have a look at the typcategory column for that), etc.
Edit:
I overlooked, that proisagg was removed in version 11 (I'm still mostly on a 9.6) as the other answers mentioned. So for the sake of completeness: As of version 11 replace WHERE pro.proisagg with WHERE pro.prokind = 'a'.
I've been playing around with the suggestions a bit, and want to post one adaptation based on one of Erwin's scripts:
select type_id::regtype::text as type_name,
array_agg(proname) as aggregate_names
from (
select proname,
unnest(proargtypes::regtype[])::text AS type_id
from pg_proc
where prokind = 'a'
order by 2, 1
) subquery
where type_id in ('"any"', 'bigint', 'boolean','citext','date','double precision','integer','interval','numeric','smallint',
'text','time with time zone','time without time zone','timestamp with time zone','timestamp without time zone')
group by type_id;
That brings back details on the types specified in the where clause. Not only is this useful for my current work, it's useful to my understanding generally. I've run into cases where I've had to recast something, like an integer to a double, to get it to work with an aggregate. So far, this has been pretty much trial and error. If you run the query above (or one like it), it's easier to see from the output where you need recasting between similar seeming types.
I have to aggregate in my query the SUM of the AMUNT field according to WERKS, DATUM and UZEIT.
I tried to use a GROUP BY without any success.
I get an error like this:
What is the problem in my code?
This is my ABAP code:
DATA: gt_compr TYPE TABLE OF yrt_h_sales.

SELECT werks, extnb, datum, uzeit, sumvt, deprt, dpext, SUM( amunt ) AS amunt
  INTO CORRESPONDING FIELDS OF TABLE @gt_compr
  FROM yrt_h_sales
  WHERE werks IN @so_werks
    AND datum IN @so_datum
  GROUP BY werks, datum, uzeit.
After I corrected it, the code looks as follows:
SELECT werks, datum, uzeit, extnb, deprt, dpext, SUM( amunt ) AS amunt
  INTO CORRESPONDING FIELDS OF TABLE @gt_compr
  FROM yrt_h_sales
  WHERE werks IN @so_werks
    AND datum IN @so_datum
  GROUP BY werks, datum, uzeit, extnb, deprt, dpext.
So I don't have the compilation error anymore, but the aggregation is still not working! I get a 43-line result without any sum on the AMUNT column.
P.S. this is the structure of my table:
Your observation is consistent with the documentation (and what I have so far seen in any other RDBMS I've worked with):
If aggregate expressions are used, any column identifiers that are not
included as arguments of an aggregate function must be included after
the addition GROUP BY.
Take for example the time field UZEIT: you can tell the system to aggregate (in your case, sum up) all amounts for the same point in time by adding it to the GROUP BY clause, or you can apply an aggregate function to it as well (SUM would not make any sense here, but MIN might), or you can omit the field altogether. You cannot leave it dangling around without further specification: the field either needs to be part of the new key set created by GROUP BY or has to have an aggregate function applied to it, so that the system knows what to do with the multiple data sets that might occur in the group.
(This is basic SQL btw and not ABAP-specific knowledge.)
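For illustration, a sketch of the second option (aggregating UZEIT instead of grouping by it), based on the corrected query from the question:

SELECT werks, datum, MIN( uzeit ) AS uzeit, SUM( amunt ) AS amunt
  FROM yrt_h_sales
  WHERE werks IN @so_werks
    AND datum IN @so_datum
  GROUP BY werks, datum
  INTO CORRESPONDING FIELDS OF TABLE @gt_compr.

This sums all amounts per plant and date and keeps only the earliest time for each group.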
Remove the CORRESPONDING FIELDS OF and just place the results INTO TABLE.
How do I write this expression in Power BI?
select distinct([date]),Temperature from Device47A8F where Temperature>25
Totally new to Power BI. Is there any tool that can convert a query from SQL to a Power BI expression?
I have tried many different types of expressions but keep getting errors. Most of the time I get this:
The expression refers to multiple columns. Multiple columns cannot be converted to a scalar value.
Need help, Thanks.
After I posted my answer, I wondered if your expected result is to get only one date per temperature; in other words, without repeated dates in your result set.
A side note: select distinct([date]),Temperature from Device47A8F where Temperature>25 returns repeated dates, since the DISTINCT keyword evaluates the distinct combination of all column values specified in the SELECT statement; it doesn't return distinct values in a specific column even if you surround it with parentheses.
Now to what brings us here. What I can see in your error is that you are trying to use a table-valued expression (one that produces a table with multiple columns) in a measure, which only accepts scalar values (a single calculated value).
Supposing you have a table like this:
Running your SQL query, you would get the rows highlighted in yellow:
You can see the date 01/09/2016 is repeated. If you want to create a measure, you have to define what calculation you want to show for temperature, e.g. average, max or min.
The expression below calculates the maximum temperature greater than 25 per date:
MaxTempGreaterThan25 =
CALCULATE ( MAX ( Device47A8F[Temperature] ), Device47A8F[Temperature] > 25 )
In this case the measure MaxTempGreaterThan25 is calculated per date.
If you don't want to produce a measure but a table, select the Modeling tab in the Power BI toolbar and click the New Table icon.
Use this expression:
MyTemperatureTable =
FILTER ( Device47A8F, Device47A8F[Temperature] > 25 )
It should produce a new table named MyTemperatureTable like this:
I recommend you learn some basics about DAX, it is pretty different from SQL / T-SQL and there are things you can't do depending on your model and data.
Let me know if this helps.
You probably don't need to write any code if your objective is to show the result in a Power BI visual e.g. a table. Power BI naturally aggregates data if the datatype is numeric (e.g. Temperature).
I would just add a Table visual on a Report page and add the Date and Temperature columns to it. Then in Visualizations / Fields / Values I would click the little down-arrow on the Temperature field and set the Aggregation e.g. Maximum. Then in Visualizations / Fields / Filters I would click the little down-arrow on the Temperature field and set the Filter e.g. is greater than: 25
Hard-coded solutions are unlikely to survive the next question from your users e.g. "but what if I want to see Temperature > 24? Or 20? Or 30?"