How to use GROUP BY to concatenate strings in MySQL?

Basically the question is how to get from this:
foo_id  foo_name
1       A
1       B
2       C
to this:
foo_id  foo_name
1       A B
2       C

SELECT id, GROUP_CONCAT(name SEPARATOR ' ') FROM table GROUP BY id;
https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_group-concat
From the link above, GROUP_CONCAT: This function returns a string result with the concatenated non-NULL values from a group. It returns NULL if there are no non-NULL values.
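To make the NULL handling described above concrete, here is a minimal Python sketch that imitates the behaviour outside the database. The helper name group_concat and the sample rows are made up for illustration; this is not how MySQL implements the function.

```python
from collections import defaultdict

def group_concat(rows, separator=","):
    """Imitate GROUP_CONCAT per group: skip None, give None for all-None groups."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key]  # touch the key so a group exists even if every value is None
        if value is not None:
            groups[key].append(str(value))
    return {key: (separator.join(vals) if vals else None)
            for key, vals in groups.items()}

# Hypothetical rows: (foo_id, foo_name); group 3 has only a NULL name
rows = [(1, "A"), (1, "B"), (2, "C"), (3, None)]
result = group_concat(rows, separator=" ")
print(result)  # {1: 'A B', 2: 'C', 3: None}
```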

SELECT id, GROUP_CONCAT( string SEPARATOR ' ') FROM table GROUP BY id

SELECT id, GROUP_CONCAT(name SEPARATOR ' ') FROM table GROUP BY id;
In MySQL, you can get the concatenated values of expression combinations. To eliminate duplicate values, use the DISTINCT clause. To sort values in the result, use the ORDER BY clause. To sort in reverse order, add the DESC (descending) keyword to the name of the column you are sorting by in the ORDER BY clause. The default is ascending order; this may be specified explicitly using the ASC keyword. The default separator between values in a group is comma (“,”). To specify a separator explicitly, use SEPARATOR followed by the string literal value that should be inserted between group values. To eliminate the separator altogether, specify SEPARATOR ''.
GROUP_CONCAT([DISTINCT] expr [,expr ...]
[ORDER BY {unsigned_integer | col_name | expr}
[ASC | DESC] [,col_name ...]]
[SEPARATOR str_val])
For example:
mysql> SELECT student_name,
-> GROUP_CONCAT(DISTINCT test_score
-> ORDER BY test_score DESC SEPARATOR ' ')
-> FROM student
-> GROUP BY student_name;
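What the DISTINCT, ORDER BY ... DESC and SEPARATOR options do to one group can be sketched in plain Python. The function name and the scores are hypothetical, and the sketch only mirrors the behaviour described above for a single student.

```python
def group_concat_distinct_desc(values, separator=","):
    """DISTINCT + ORDER BY value DESC + SEPARATOR for a single group."""
    distinct = {v for v in values if v is not None}  # DISTINCT, NULLs skipped
    if not distinct:
        return None  # no non-NULL values in the group
    ordered = sorted(distinct, reverse=True)  # ORDER BY ... DESC
    return separator.join(str(v) for v in ordered)

# Hypothetical test scores for one student, with a duplicate and a NULL
scores = [91, 73, 91, 85, None]
line = group_concat_distinct_desc(scores, separator=" ")
print(line)  # 91 85 73
```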

The result is truncated to the maximum length given by the group_concat_max_len system variable, which has a default value of 1024 bytes, so we first raise it:
SET group_concat_max_len=100000000;
and then, for example:
SELECT pub_id,GROUP_CONCAT(cate_id SEPARATOR ' ') FROM book_mast GROUP BY pub_id
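To see why the limit matters, here is a small Python sketch of the truncation. It assumes plain ASCII values, so one character is one byte, and the ids are invented for the example.

```python
DEFAULT_MAX_LEN = 1024  # default group_concat_max_len mentioned above

def truncated_group_concat(values, separator=",", max_len=DEFAULT_MAX_LEN):
    """Join the values, then cut the result at max_len like the server would."""
    return separator.join(values)[:max_len]

ids = [str(n) for n in range(1000, 1400)]  # 400 hypothetical four-digit ids
full = " ".join(ids)
cut = truncated_group_concat(ids, separator=" ")
print(len(full), len(cut))  # 1999 1024
```

With the default limit almost half of the list is lost without any error, which is why the variable is raised before running the query.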

SELECT id, GROUP_CONCAT(CAST(name as CHAR)) FROM table GROUP BY id
This will give you a comma-delimited string.

Great answers.
I also had a problem with NULLS and managed to solve it by including a COALESCE inside of the GROUP_CONCAT. Example as follows:
SELECT id, GROUP_CONCAT(COALESCE(name,'') SEPARATOR ' ')
FROM table
GROUP BY id;
Hope this helps someone else

Related

Getting all the occurrences of a substring within a string

How can I get all the occurrences of a substring within a string in PostgreSQL?
I have this string for an ID:
BS Score xxxxxxx075SCxxxBS Score xxxxxxx062SCxxxBS Score xxxxxxx115SCxxx
And I would like to get the numbers in the string for the ID, so the result can look like this:
You can use regexp_matches:
select id intl, regexp_replace(v[1], '^0+', '') values from tbl
cross join regexp_matches(id, '(\d+)', 'g') v
See fiddle.
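If you want to check the logic outside the database, the same two steps (find every run of digits, then strip leading zeros) can be sketched in Python. This is only an illustration with the string from the question, not a replacement for the query.

```python
import re

text = "BS Score xxxxxxx075SCxxxBS Score xxxxxxx062SCxxxBS Score xxxxxxx115SCxxx"

# findall plays the role of regexp_matches(..., 'g'); re.sub strips leading zeros
numbers = [re.sub(r"^0+", "", match) for match in re.findall(r"\d+", text)]
print(numbers)  # ['75', '62', '115']
```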

tricky SQL with substrings

I have a table (postgres) with a varchar field that has content structured like:
".. John;Smith;uuid=7c32e9e1-e29e-4211-b11e-e20b2cb78da9 .."
The uuid can occur in more than one record. But it must not occur for more than one combination of [givenname];[surname], according to a business rule.
That is, if the John Smith example above is present in the table, then if uuid 7c32e9e1.. occurs in any other record, the field in that record must also contain ".. John;Smith; .."
The problem is, this business rule has been violated due to some bug. And I would like to know how many rows in the table contain a uuid such that it occurs in more than one place with different combinations of [givenname];[surname].
I'd appreciate it if someone could help me out with the SQL to accomplish this.
Use regular expressions to extract the UUID and the name from the string. Then aggregate per UUID and either count distinct names or compare minimum and maximum name:
select
  substring(col, 'uuid=([[:alnum:]]+)') as uuid,
  string_agg(distinct substring(col, '([[:alnum:]]+;[[:alnum:]]+);uuid'), ' | ') as names
from mytable
group by substring(col, 'uuid=([[:alnum:]]+)')
having count(distinct substring(col, '([[:alnum:]]+;[[:alnum:]]+);uuid')) > 1;
Demo: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=907a283a754eb7427d4ffbf50c6f0028
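The extract-then-aggregate idea can also be sketched in Python for a quick check on exported data. The sample rows are invented, and the helper name is hypothetical. Note one deliberate difference: the [[:alnum:]]+ pattern above stops at the first hyphen of the UUID, while this sketch also allows hyphens so the whole UUID is used as the key.

```python
import re
from collections import defaultdict

# name;surname followed by ;uuid=<uuid>, hyphens allowed inside the uuid
PATTERN = re.compile(r"([A-Za-z0-9]+;[A-Za-z0-9]+);uuid=([A-Za-z0-9-]+)")

def conflicting_uuids(values):
    """Return {uuid: sorted names} for UUIDs seen with more than one name."""
    names_by_uuid = defaultdict(set)
    for value in values:
        match = PATTERN.search(value)
        if match:
            name, uuid = match.groups()
            names_by_uuid[uuid].add(name)
    return {u: sorted(n) for u, n in names_by_uuid.items() if len(n) > 1}

# Hypothetical column values
rows = [
    ".. John;Smith;uuid=7c32e9e1-e29e-4211-b11e-e20b2cb78da9 ..",
    ".. John;Smith;uuid=7c32e9e1-e29e-4211-b11e-e20b2cb78da9 ..",
    ".. Jane;Doe;uuid=7c32e9e1-e29e-4211-b11e-e20b2cb78da9 ..",
    ".. Ann;Lee;uuid=11111111-2222-3333-4444-555555555555 ..",
]
conflicts = conflicting_uuids(rows)
print(conflicts)
```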
If you only want to count:
select
  count(*) as cnt_uuids,
  sum(num_names) as cnt_names,
  sum(num_rows) as cnt_rows
from
(
  select
    count(*) as num_rows,
    count(distinct substring(col, '([[:alnum:]]+;[[:alnum:]]+);uuid')) as num_names
  from mytable
  group by substring(col, 'uuid=([[:alnum:]]+)')
  having count(distinct substring(col, '([[:alnum:]]+;[[:alnum:]]+);uuid')) > 1
) flaws;
But as has been mentioned already: This is not how a database should be used.
I assume you know all the reasons why this is a bad data format, but you are stuck with it. Here is my approach:
select v.user_id, array_agg(distinct names)
from (select v.id,
             max(el) filter (where n = un) as user_id,
             array_agg(el order by el) filter (where n in (un - 2, un - 1)) as names
      from (select v.id, u.*,
                   max(u.n) filter (where el like 'uuid=%') over (partition by v.id) as un
            from (values (1 , 'junkgoeshere;John;Smith;uuid=7c32e9e1-e29e-4211-b11e-e20b2cb78da9; ..'),
                         (2 , 'junkgoeshere;John;Smith;uuid=7c32e9e1-e29e-4211-b11e-e20b2cb78da9; ..'),
                         (3 , 'junkgoeshere;John;Smith;uuid=new_7c32e9e1-e29e-4211-b11e-e20b2cb78da9; ..'),
                         (4 , 'junkgoeshere;John;Jay;uuid=new_7c32e9e1-e29e-4211-b11e-e20b2cb78da9; ..')
                 ) v(id, str) cross join lateral
                 unnest(regexp_split_to_array(v.str, ';')) with ordinality u(el, n)
           ) v
      where n between un - 2 and un
      group by v.id
     ) v
group by user_id
having min(names) <> max(names);
Here is a db<>fiddle.
This assumes that the fields are separated by semicolons. Your data format is just awful, not just as a string but because the names are not identified. So, I am assuming they are the two fields before the user_id field.
So, this implements the following logic:
Breaks up the string by semicolons, with an identifying number.
Finds the number for the user_id.
Extracts the previous two fields together and the user_id column.
Then uses aggregation to find cases where there are multiple matches.
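The four steps above can be sketched in Python to make the logic easier to follow. It uses the same sample strings as the query; the function names are made up, and like the query it assumes the two fields before the uuid= field are the names.

```python
from collections import defaultdict

def split_record(value):
    """Split on ';', locate the uuid= field, return it with the two fields before it."""
    fields = value.split(";")
    positions = [i for i, field in enumerate(fields) if field.startswith("uuid=")]
    if not positions or positions[-1] < 2:
        return None  # no uuid field, or no room for two name fields
    idx = positions[-1]
    return fields[idx], tuple(sorted(fields[idx - 2:idx]))

def find_violations(values):
    """Aggregate per uuid and keep those that appear with more than one name pair."""
    names_by_user = defaultdict(set)
    for value in values:
        parsed = split_record(value)
        if parsed:
            user_id, names = parsed
            names_by_user[user_id].add(names)
    return {u: sorted(n) for u, n in names_by_user.items() if len(n) > 1}

rows = [
    "junkgoeshere;John;Smith;uuid=7c32e9e1-e29e-4211-b11e-e20b2cb78da9; ..",
    "junkgoeshere;John;Smith;uuid=7c32e9e1-e29e-4211-b11e-e20b2cb78da9; ..",
    "junkgoeshere;John;Smith;uuid=new_7c32e9e1-e29e-4211-b11e-e20b2cb78da9; ..",
    "junkgoeshere;John;Jay;uuid=new_7c32e9e1-e29e-4211-b11e-e20b2cb78da9; ..",
]
violations = find_violations(rows)
print(violations)
```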

Loop Through a Table to concatenate Rows

I have a table of similar structure:
Name Movies_Watched
A Terminator
B Alien
A Batman
B Rambo
B Die Hard
....
I am trying to get this:
Name Movies_Watched
A Terminator;Batman
B Alien, Die Hard, Rambo
My initial guess was:
SELECT Name, Movies_Watched || Movies_Watched from TABLE
But obviously that's wrong. Can someone tell me how can I loop through the 2nd column and concatenate them? What's the logic like?
I learned that group_concat is the right approach, but I haven't been able to figure it out yet. When I tried:
select name, group_concat(movies_watched) from table group by 1
it threw an error saying: User-defined transform function group_concat must have an over clause
You are looking for string_agg():
select name, string_agg(movies_watched, ';') as movies_watched
from t
group by name;
That said, you are using Postgres, so you should learn how to use arrays instead of strings for such things. For instance, there is no confusion with arrays when the movie name has a semicolon. That would be:
select name, array_agg(movies_watched) as movies_watched
from t
group by name;
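The point about semicolons is easy to demonstrate with a short Python sketch. The third title is invented for the example; a list stands in for the array and a joined string stands in for the string_agg result.

```python
# Hypothetical titles; the last one contains the separator character
movies = ["Alien", "Die Hard", "Face/Off; Director's Cut"]

joined = ";".join(movies)         # what a delimited string stores
round_trip = joined.split(";")    # what a reader gets back when splitting it

print(len(movies), len(round_trip))  # 3 4
```

The list keeps three elements, while the delimited string comes back as four pieces, so the original values can no longer be recovered reliably.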
Use array_agg:
SELECT Name, array_agg(Movies_Watched)
FROM data_table
GROUP BY Name
I think you need listagg or group_concat, since you are using Vertica; the query above is the Postgres solution.
SELECT Name, listagg(Movies_Watched)
FROM data_table
GROUP BY Name
or
select Name,
group_concat(Movies_Watched) over (partition by Name order by name) ag
from mytable
As already mentioned, in Vertica it's LISTAGG():
WITH
input(nm,movies_watched) AS (
SELECT 'A','Terminator'
UNION ALL SELECT 'B','Alien'
UNION ALL SELECT 'A','Batman'
UNION ALL SELECT 'B','Rambo'
UNION ALL SELECT 'B','Die Hard'
)
SELECT
nm AS "Name"
, LISTAGG(movies_watched) AS movies_watched
FROM input
GROUP BY nm;
-- out Name | movies_watched
-- out ------+----------------------
-- out A | Terminator,Batman
-- out B | Alien,Rambo,Die Hard
-- out (2 rows)
-- out
-- out Time: First fetch (2 rows): 12.735 ms. All rows formatted: 12.776 ms

PostgreSQL ORDER BY DESC not working correctly, or am I doing something wrong?

I have data stored in a column as text:
806,1250,1225,1080,1891,1878,1243,391,218,1505,1425,586,1801,860,323,1108,1130,1150,1060,1059
I want to select it in descending order using this query:
SELECT unnest(string_to_array(q.mycolumn, ',')) id FROM mytable q ORDER BY id DESC;
But the order is not shown correctly, as below:
"860"
"806"
"586"
"391"
"323"
"218"
"1891"
"1878"
"1801"
"1505"
"1425"
"1250"
"1243"
"1225"
"1150"
"1130"
"1108"
"1080"
"1060"
"1059"
Use cast to int to achieve the desired ordering. As you have it, the numbers are treated as characters.
SELECT unnest(string_to_array(q.mycolumn, ',')) id
FROM mytable q
ORDER BY cast(unnest(string_to_array(q.mycolumn, ',')) as int) DESC;
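The difference between the two orderings can be reproduced with a few lines of Python, using the values from the question. Sorting the pieces as strings compares them character by character, while sorting with an integer key compares them as numbers.

```python
raw = ("806,1250,1225,1080,1891,1878,1243,391,218,1505,"
       "1425,586,1801,860,323,1108,1130,1150,1060,1059")
ids = raw.split(",")

as_text = sorted(ids, reverse=True)          # character-by-character comparison
as_int = sorted(ids, key=int, reverse=True)  # numeric comparison, like the cast

print(as_text[:3])  # ['860', '806', '586']
print(as_int[:3])   # ['1891', '1878', '1801']
```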

LIMIT ignored in query with GROUP_CONCAT

I need to select some rows from a second table and concatenate them into a comma-separated string. The query works well except for one problem: it always selects all rows and ignores LIMIT.
This is part of my query which gets that string and ignores LIMIT:
select
group_concat(value order by `order` asc SEPARATOR ', ')
from slud_data
left join slud_types on slud_types.type_id=slud_data.type_id
where slud_data.product_id=18 and value!='' and display=0 limit 3;
// Result:
+---------------------------------------------------------+
| group_concat(value order by `order` asc SEPARATOR ', ') |
+---------------------------------------------------------+
| GA-XXXX, Bentley, CONTINENTAL FLYING SPUR, 2006 |
+---------------------------------------------------------+
// Expected result: (only 3 comma-separated records, not 4)
Full query:
SELECT *,product_id id,
(select group_concat(value order by `order` asc SEPARATOR ', ') from slud_data left join slud_types on slud_types.type_id=slud_data.type_id where slud_data.product_id=t1.product_id and value!='' and display=0 limit 3) text
FROM slud_products t1
WHERE
now() < DATE_ADD(date,INTERVAL +ttl DAY) and activated=1
ORDER BY t1.date desc
The LIMIT clause limits the number of rows in the final result set, not the number of rows used to construct the string in the GROUP_CONCAT. Since your query returns only one row in the final result the LIMIT has no effect.
You can solve your issue by constructing a subquery with LIMIT 3, then in an outer query apply GROUP_CONCAT to the result of that subquery.
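Here is a small runnable sketch of that behaviour. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the syntax is SQLite's group_concat(value, ', ') rather than SEPARATOR; the table and column names are made up, and the values are taken from the result in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (pos INTEGER, value TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "GA-XXXX"), (2, "Bentley"), (3, "CONTINENTAL FLYING SPUR"), (4, "2006")],
)

# LIMIT applies to the single aggregated row, so all four values are still joined
rows = conn.execute(
    "SELECT group_concat(value, ', ') FROM items LIMIT 3"
).fetchall()

# Limiting inside a subquery restricts the rows before they are aggregated
limited = conn.execute(
    "SELECT group_concat(value, ', ') "
    "FROM (SELECT value FROM items ORDER BY pos LIMIT 3)"
).fetchone()[0]

print(len(rows), rows[0][0])
print(limited)
conn.close()
```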
Your query is not working as you intended for the reasons @Mark Byers outlined in the other answer. You may want to try the following instead:
SELECT GROUP_CONCAT(`value` ORDER BY `order` ASC SEPARATOR ', ')
FROM (
  SELECT `value`, `order`
  FROM slud_data
  LEFT JOIN slud_types ON slud_types.type_id = slud_data.type_id
  WHERE slud_data.product_id = 18 AND value != '' AND display = 0
  ORDER BY `order` ASC  -- without this, LIMIT picks three arbitrary rows
  LIMIT 3
) a;
An example of Mark Byers' idea (the subquery alias must not be a reserved word such as inner):
SELECT GROUP_CONCAT(id, '|', name)
FROM (
  SELECT id, name
  FROM users
  LIMIT 3) sub