How to pivot rows into a single comma-separated string

Important update:
When I try to use the suggested string_agg method I get this error: "Specified types or functions (one per INFO message) not supported on Redshift tables."
Original question
I have a query but I'm struggling to "pivot" multiple rows into a single column of strings.
I have a member and a category table and each member can have multiple categories (this is a simplification of the scenario).
So I need to write a query that displays the categories each member has, with one row per member. When I was working in the Microsoft world I was able to use pivot, but now in Postgres I'm not able to find an equivalent method.
I've seen references to crosstab and a few other methods, but when I try them I get errors saying the function isn't recognised.
My attempt!
select
m.member_id,
array.join(c.category, ",") -- this is more like a programming approach but I need something similar to this
from member m
join category c ON c.member_id = m.id
group by 1
Example with dataset
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=8ea4998f75f7db83d2360ff01bf02c82
I'm using Navicat Premium as my "editor"
A second attempt
select b.member_id, string_agg(distinct c.name, ',')
from bookings b
join category c on c.member_id = b.member_id
group by 1

Redshift doesn't support the string_agg() function, but it has the listagg() function, which I believe is equivalent. See: https://docs.aws.amazon.com/redshift/latest/dg/r_LISTAGG.html
listagg() supports DISTINCT and even has a window function form. Does this not produce your desired results?
select b.member_id, listagg(distinct c.name, ',')
from bookings b
join category c on c.member_id = b.member_id
group by 1;
As for the error message in the update, that is Redshift's cryptic way of saying that you attempted to perform a leader-node-only operation on a compute node (or something of that ilk). I don't see why you would get that unless string_agg() is supported only as a leader-node operation (generate_series() is an example of a function supported only on the leader node).
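A minimal sketch of the listagg() approach with a fixed ordering inside each list, assuming Redshift. The temporary tables and sample rows are hypothetical, made up only so the query can run on its own; WITHIN GROUP (ORDER BY ...) is the optional ordering clause described in the LISTAGG documentation linked above.

```sql
-- Hypothetical sample data (not taken from the question's fiddle).
create temp table bookings (member_id int);
create temp table category (member_id int, name varchar(50));

insert into bookings values (1), (2);
insert into category values (1, 'music'), (1, 'art'), (1, 'art'), (2, 'sport');

-- One row per member; DISTINCT drops the duplicate 'art',
-- and WITHIN GROUP orders the names inside each list.
select b.member_id,
       listagg(distinct c.name, ',') within group (order by c.name) as categories
from bookings b
join category c on c.member_id = b.member_id
group by b.member_id
order by b.member_id;
```

As far as I know, listagg() runs only on the compute nodes, so the query has to read from an actual table rather than from literals alone, which is why the sketch creates tables first.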

Related

Columns selected neither in GROUP BY clause nor aggregate function?

I have a database with cats, toys and their relationship cat_toys.
To find the names of the cats with more than 7 toys, I have the following query:
select
cats.name
from
cats
join
cat_toys on cats.id = cat_toys.cat_id
group by
cats.id
having
count(cat_toys.toy_id) > 7
order by
cats.name
Column cats.name does not appear in the group by clause and is not used in an aggregate function, but this query works. In contrast, I cannot select anything from the cat_toys table.
Is this something special with psql?
The error message you get when selecting from cat_toys is trying to tell you this: it is a general requirement in SQL that you list in the group by clause all non-aggregated columns that appear in the select clause.
Postgres, unlike most other databases, is a bit more clever about that, and understands the notion of a functionally dependent column: since you are grouping by the primary key of the cats table, you are free to add any other column from that table (since they are functionally dependent on the primary key). This is why your existing query works.
Now if you want to bring in values from the cat_toys table, it is different. There are potentially multiple rows in this table for each row in cats, which, as a consequence, are not functionally dependent on cats.id. If you still want one row per cat, you need to make use of an aggregate function.
As an example, this generates a comma-separated list of all toy_ids that relate to each cat:
select c.name, string_agg(ct.toy_id::text, ', ') toy_ids
from cats c
inner join cat_toys ct on c.id = ct.cat_id
group by c.id
having count(*) > 7
order by c.name
Side notes:
table aliases make the query easier to write and read
for this query, I recommend count(*) instead of count(cat_toys.toy_id); this produces the same result (unless you have null values in cat_toys.toy_id, which seems unlikely here), and incurs less work for the database (since it does not need to check each value in the column against null)
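The difference between the two counts only shows up when the column contains nulls. A small sketch, assuming Postgres and using made-up inline rows instead of the real cat_toys table:

```sql
-- Three hypothetical rows for one cat; the second has no toy_id.
select count(*)      as all_rows,         -- counts every row: 3
       count(toy_id) as non_null_toy_ids  -- skips the null: 2
from (values (1, 10), (1, null), (1, 12)) as ct (cat_id, toy_id);
```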
This is your query:
select c.name
from cats c join
cat_toys ct
on c.id = ct.cat_id
group by c.id
having count(ct.toy_id) > 7
order by c.name;
You are asking why it works: You are rightly observing that c.id is in the group by but not in the select -- and another column is in the select. Seems wrong. But it isn't. Postgres supports a little known part of the standard, related to functional dependency in aggregation queries.
Let me avoid the technical jargon. cats.id is the primary key of cats. That means the id is unique, so knowing the id specifies all other columns from cats. The database knows this -- that is, it knows that the value of name is always the same for a given id. So, by aggregating on the primary key, you can access the other columns without using aggregation functions -- and it is consistent with the standard.
This is explained in the documentation:
When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.
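To see the rule in action, here is a small sketch, assuming Postgres. The table definitions and sample rows are hypothetical; only the table and column names follow the question.

```sql
-- Hypothetical schema matching the names used in the question.
create table cats (id int primary key, name text not null);
create table cat_toys (cat_id int references cats (id), toy_id int);

insert into cats values (1, 'Tom'), (2, 'Felix');
insert into cat_toys values (1, 10), (1, 11), (2, 12);

-- Accepted: cats.name is functionally dependent on the grouped
-- primary key cats.id.
select c.name, count(*) as toy_count
from cats c
join cat_toys ct on ct.cat_id = c.id
group by c.id
order by c.name;

-- Rejected: cat_toys.toy_id is not determined by cats.id.
-- select c.name, ct.toy_id
-- from cats c
-- join cat_toys ct on ct.cat_id = c.id
-- group by c.id;
-- Postgres reports that column "ct.toy_id" must appear in the GROUP BY
-- clause or be used in an aggregate function.
```

If cats.id were not declared as the primary key, the first query would be rejected as well, since Postgres would no longer know that name is determined by id.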

Unnesting 3rd level dependency in Google BigQuery

I'm trying to replace the schema of an existing table using BQ. Certain fields in BQ are nested 3 to 5 levels deep.
For example, comsalesorders.comSalesOrdersInfo.storetransactionid is nested under two fields.
Since I'm using this to replace the existing table, I cannot change the field names in the query.
The query looks similar to this
SELECT * REPLACE(comsalesorders.comSalesOrdersInfo.storetransactionid AS STRING) FROM CentralizedOrders_streaming.orderStatusUpdated, UNNEST(comsalesorders) AS comsalesorders, UNNEST(comsalesorders.comSalesOrdersInfo) AS comsalesorders.comSalesOrdersInfo
BQ allows unnesting the first-level field but presents a problem at the second level of nesting.
What changes do I need to make to this query to use UNNEST() for such dependent schemas?
Given that you don't have a schema, I will try to provide a generalized answer. Please try to understand the difference between the 2 queries.
-- Provide an alias for each unnest (as if each is a separate table)
select c.stuff
from table
left join unnest(table.first_level_nested) a
left join unnest(a.second_level_nested) b
left join unnest(b.third_level_nested) c
-- b and c won't work here because you are 'double unnesting'
select c.stuff
from table
left join unnest(table.first_level_nested) a
left join unnest(first_level_nested.second_level_nested) b
left join unnest(first_level_nested.second_level_nested.third_level_nested) c
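The first pattern can be tried without a real table. This is only a sketch in BigQuery standard SQL: the inline row is made up, and it assumes that both comsalesorders and comSalesOrdersInfo are arrays of structs, which the question does not confirm. It uses comma (cross) joins rather than left joins to keep it short.

```sql
-- Hypothetical single-row stand-in for the real table.
WITH orders AS (
  SELECT [
    STRUCT([STRUCT('X1056-943462' AS storetransactionid)] AS comSalesOrdersInfo)
  ] AS comsalesorders
)
SELECT CAST(info.storetransactionid AS STRING) AS storetransactionid
FROM orders AS o,
  UNNEST(o.comsalesorders) AS cso,         -- first level gets its own alias
  UNNEST(cso.comSalesOrdersInfo) AS info;  -- second level goes through that alias
```

Each UNNEST refers to the alias of the level above it, never to the full dotted path from the table.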
I'm not sure I understand your question, but as I could guess, you want to change one column type to another type, such as STRING.
The UNNEST function is only used with columns that are array types, for example:
"comsalesorders":[{"comSalesOrdersInfo":{}}, {"comSalesOrdersInfo":{}}, {"comSalesOrdersInfo":{}}]
But not with this kind of columns:
"comSalesOrdersInfo":{"storeTransactionID":"X1056-943462","ItemsWarrenty":0,"currencyCountry":"USD"}
Therefore, if I didn't misunderstand your question, I would write a query like this:
SELECT *, CAST(A.comSalesOrdersInfo.storeTransactionID as STRING)
FROM `TABLE`, UNNEST(comsalesorders) as A

Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.[ntext] [duplicate]

This question already has an answer here:
GROUP BY for ntext data
Closed 5 years ago.
I am new to SQL, so sorry if the answer is obvious, but I couldn't find it anywhere.
I want to select the CategoryName, Description and the average price of the products that are in the same category. Below is the picture of the 2 tables involved. The problem is the description: I can't find a way to show it.
(There are 8 categories and every category has 1 description.)
This is the code I have made so far but it has the error:
SELECT c.CategoryName,c.Description,avg(p.UnitPrice)
FROM Categories AS c
INNER JOIN Products AS p ON c.CategoryID=p.CategoryID
GROUP BY c.CategoryName
The error:
Column 'Categories.Description' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Sorry for my bad english :/
(n)text can't be used in aggregate or window functions. It's also been deprecated since SQL Server 2008 (if I recall correctly, possibly 2005). You should really be using (n)varchar(MAX).
If you really "have" to use (n)text then you'll need to do your aggregate first, and then retrieve the value of your (n)text column:
WITH Averages AS (
SELECT p.CategoryID, avg(p.UnitPrice) AS AveragePrice
FROM Products AS p
GROUP BY p.CategoryID)
SELECT C.CategoryName, C.Description, A.AveragePrice
FROM Averages A
JOIN Categories C ON A.CategoryID = C.CategoryID;
(Note, this is untested due to lack of DDL and Sample Data)
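If changing the column type is not an option, another possibility is to cast the column to nvarchar(MAX) inside the query, so that it can appear in the GROUP BY. A sketch, assuming SQL Server 2008 or later; the temp tables and rows below are hypothetical stand-ins for Categories and Products.

```sql
-- Hypothetical stand-ins for the real tables.
CREATE TABLE #Categories (CategoryID int PRIMARY KEY, CategoryName nvarchar(15), Description ntext);
CREATE TABLE #Products (ProductID int PRIMARY KEY, CategoryID int, UnitPrice money);

INSERT INTO #Categories VALUES
    (1, N'Beverages', N'Soft drinks, coffees, teas'),
    (2, N'Condiments', N'Sauces and spreads');
INSERT INTO #Products VALUES (1, 1, 18.00), (2, 1, 19.00), (3, 2, 10.00);

-- The same CAST expression appears in the select list and in the GROUP BY,
-- so the ntext column itself is never grouped or aggregated.
SELECT c.CategoryName,
       CAST(c.Description AS nvarchar(MAX)) AS Description,
       AVG(p.UnitPrice) AS AveragePrice
FROM #Categories AS c
INNER JOIN #Products AS p ON c.CategoryID = p.CategoryID
GROUP BY c.CategoryName, CAST(c.Description AS nvarchar(MAX));
```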

INNER JOIN Syntax Error

I would like to JOIN 2 databases.
1 database is keyword_data (keyword mapping)
1 database is filled with Google rankings and other metrics
Somehow I cannot JOIN these two databases.
Some context:
DATA SET NAME: visibility
TABLE 1
keyword_data
VALUES
keyword
universe
category
search_volume
cpc
DATA SET NAME: visibility
TABLE 2
results
VALUES
Date
Keyword
Website
Position
In order to receive ranking data by date I wrote the following SQL line.
SELECT Date, Position, Website FROM `visibility.results` Keyword INNER
JOIN `visibility.keyword_data` keyword ON `visibility.results` Keyword
= `visibility.keyword_data` keyword GROUP BY Date;
(besides that, 100 other lines with no success ;-) )
I am using Google BigQuery for this with standard SQL (unchecked Legacy SQL).
How can I JOIN those 2 data tables?
How familiar are you with SQL? I think you're using aliases wrong, something like this should work
SELECT r.Date, r.Position, r.Website
FROM `visibility.results` AS r
INNER JOIN `visibility.keyword_data` AS k
ON r.Keyword = k.keyword
GROUP BY r.Date, r.Position, r.Website
First of all, I have never worked with Google BigQuery, but in my opinion there are a couple of things wrong with this query.
To start with, you join tables by naming each table and then providing the key that the tables are joined on. Also, if you don't use aggregate functions (MIN/MAX etc.) in your select statement, you must include all selected columns in the group by clause as well. For reference, here is a solution that would work if you had used Microsoft SQL Server, in case that is of any help, because the syntax is quite similar.
SELECT results.Date AS DATE
,results.Position AS POSITION
,results.Website AS WEBSITE
FROM visibility.dbo.keyword_data AS keyword_data
INNER JOIN visibility.dbo.results AS results
ON results.keyword = keyword_data.keyword
GROUP BY results.Date
,results.Position
,results.Website
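The other way to satisfy the group by rule is to aggregate whatever is not grouped. A sketch in BigQuery standard SQL, using made-up inline rows in place of visibility.results and visibility.keyword_data; the best_position alias and the choice of MIN are assumptions for illustration only.

```sql
-- Hypothetical inline rows standing in for the two tables.
WITH results AS (
  SELECT DATE '2020-01-01' AS Date, 'shoes' AS Keyword, 'example.com' AS Website, 3 AS Position
  UNION ALL
  SELECT DATE '2020-01-01', 'boots', 'example.com', 7
),
keyword_data AS (
  SELECT 'shoes' AS keyword, 'fashion' AS category
  UNION ALL
  SELECT 'boots', 'fashion'
)
-- Date and Website are grouped; Position is aggregated with MIN.
SELECT r.Date, r.Website, MIN(r.Position) AS best_position
FROM results AS r
INNER JOIN keyword_data AS k
  ON r.Keyword = k.keyword
GROUP BY r.Date, r.Website;
```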

Equivalent function to STUFF in SQL (GROUP_CONCAT in MySQL / LISTAGG in Oracle)

Does anyone know if Firebird 2.5 has a function similar to the "STUFF" function in SQL?
I have a table which contains parent user records, and another table which contains child user records related to the parent. I'd like to be able to pull a comma delimited string of the "ROLES" the user has without having to use a second query, loop over the values returned for the given ID and create the string myself.
I've searched for any other related questions, but have not found any.
The linked question, "string equivalent of Sum to concatenate", is basically what I want to do too, but with the Firebird 2.5 database.
It looks like you are in luck - Firebird 2.1 introduced a LIST() aggregate function which works like GROUP_CONCAT in MySQL, which allows a query like so:
SELECT p.Name, LIST(c.Name, ', ')
FROM parent p INNER JOIN child c on c.parentid = p.parentid
GROUP by p.Name;
Edit, re Ordering
You may be able to influence ordering by pre-ordering the data in a derived table, prior to applying the LIST aggregation function, like so:
SELECT x.ParentName, LIST(x.ChildName, ', ')
FROM
(
SELECT p.Name as ParentName, c.Name as ChildName
FROM parent p INNER JOIN child c on c.parentid = p.parentid
ORDER BY c.Name DESC
) x
GROUP by x.ParentName;