Concatenate rows in function PostgreSQL - sql

Assume there's a table projects containing project name, location, team id, start and end years. How can I concatenate rows so that the same names would combine the other information into one string?
name    | location   | team_id | start | end
--------+------------+---------+-------+------
Library | Atlanta    | 2389    | 2015  | 2017
Library | Georgetown | 9920    | 2003  | 2007
Museum  | Auckland   | 3092    | 2005  | 2007
Expected output would look like this:
name    | Records
--------+----------------------------
Library | Atlanta, 2389, 2015-2017
        | Georgetown, 9920, 2003-2007
Museum  | Auckland, 3092, 2005-2007
Each line should contain end-of-line / new line character.
I have a function for this, but I don't think it would work with just using CONCAT. What are other ways this can be done? What I tried:
CREATE OR REPLACE TYPE projects (name TEXT, records TEXT);
CREATE OR REPLACE FUNCTION records (INT)
RETURNS SETOF projects AS
$$
RETURN QUERY
SELECT p.name
CONCAT(p.location, ', ', p.team_id, ', ', p.start, '-', p.end, CHAR(10))
FROM projects($1) p;
$$
LANGUAGE PLpgSQL;
I tried using CHAR(10) for the new line, but it's giving a syntax error (not sure why?).
The sample above concatenates the string but, as expected, leaves out duplicated names.

You do not need PL/pgSQL for that.
Eliminate duplicate names with DISTINCT, and in a correlated subquery concatenate the columns of each matching row into a single string. array_agg then collects those strings into one array per name, so a name with several rows still yields a single value. Finally, array_to_string turns the array into text, which gets rid of the commas and curly braces. CHAR(10) raises a syntax error because char(n) is a type name in PostgreSQL; the function that returns a character from its code is chr(10). For the newline you can simply use E'\n' instead (E stands for escape):
WITH j (name, location, team_id, start, end_) AS (
  VALUES ('Library','Atlanta',2389,2015,2017),
         ('Library','Georgetown',9920,2003,2007),
         ('Museum','Auckland',3092,2005,2007)
)
SELECT
  DISTINCT q1.name,
  array_to_string(
    (SELECT array_agg(concat(location,', ',team_id,', ',start,'-', end_, E'\n'))
     FROM j WHERE name = q1.name),'') AS records
FROM j q1;
name    | records
--------+----------------------------
Library | Atlanta, 2389, 2015-2017
        | Georgetown, 9920, 2003-2007
        |
Museum  | Auckland, 3092, 2005-2007
Note: try not to use SQL keywords (e.g. end, name, start) as column names. PostgreSQL accepts name and start unquoted, but end is a reserved word and has to be double-quoted; either way it is considered bad practice.

A slightly simpler query:
select
name,
string_agg( concat(location, ', ', team_id, ', ', start, '-', "end"), E'\n') AS records
FROM t
group by name;
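For readers who want to see what string_agg together with GROUP BY is doing, here is a rough Python model of the same grouping, using the sample rows from the question. It only illustrates the logic and is not PostgreSQL code; the name build_records is made up for this sketch.

```python
from collections import defaultdict

# Sample rows from the question: (name, location, team_id, start, end)
ROWS = [
    ("Library", "Atlanta", 2389, 2015, 2017),
    ("Library", "Georgetown", 9920, 2003, 2007),
    ("Museum", "Auckland", 3092, 2005, 2007),
]


def build_records(rows):
    """Group rows by name and join each group's lines with a newline.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for name, location, team_id, start, end in rows:
        # Same shape as concat(location, ', ', team_id, ', ', start, '-', "end")
        groups[name].append(f"{location}, {team_id}, {start}-{end}")
    # The separator is placed between lines only, so nothing trails the last line
    return {name: "\n".join(lines) for name, lines in groups.items()}


records = build_records(ROWS)
```

Because the newline is a separator rather than part of every element, the last line of each group has no trailing newline, unlike the array_agg version above. Note that this model keeps the input order, whereas string_agg does not guarantee any order unless an ORDER BY is given inside the aggregate.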

Related

PostgreSQL SQL query to find number of occurrences of substring in string

I’m trying to wrap my head around a problem but I’m hitting a blank. I know SQL quite well, but I’m not sure how to approach this.
My problem:
Given a string and a table of possible substrings, I need to find the number of occurrences.
The search table consists of a single column:
searchtable
| pattern TEXT PRIMARY KEY|
|-------------------------|
| my |
| quick |
| Earth |
Given the string "Earth is my home planet and where my friends live", the expected outcome is 3 (2x "my" and 1x "Earth").
In my function, I have variable bodytext which is the string to examine.
I know I can do IN (SELECT pattern FROM searchtable) to get the list of substrings, and I could possibly use a LIKE ANY clause to get matches, but how can I count occurrences of the substrings in the table within the search string?
This is easily done without a custom function:
select count(*)
from (values ('Earth is my home planet and where my friends live')) v(str)
cross join lateral regexp_split_to_table(v.str, ' ') word
join searchtable p on word = p.pattern;
Just break the original string into "words". Then match on the words.
Another method uses regular expression matching:
select (select count(*) from regexp_matches(v.str, p.rpattern, 'g'))
from (values ('Earth is my home planet and where my friends live')) v(str)
cross join (select string_agg(pattern, '|') as rpattern
            from searchtable
           ) p;
This stuffs all the patterns into one regular expression. Note that this version does not take word breaks into account.
I solved the problem with the following code:
CREATE OR REPLACE FUNCTION count_matches(body TEXT, OUT matches INTEGER) AS $$
DECLARE
results INTEGER := 0;
matchlist RECORD;
BEGIN
FOR matchlist IN (SELECT pattern FROM searchtable)
LOOP
results := results + (SELECT LENGTH(body) -
LENGTH(REPLACE(body, matchlist.pattern, ''))) /
LENGTH(matchlist.pattern);
END LOOP;
matches := results;
END;
$$ LANGUAGE plpgsql;
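The length arithmetic in that loop can be checked outside the database. Below is a small Python sketch of the same idea (remove the pattern, measure how much shorter the string became, divide by the pattern length), next to a word-based count like the split-and-join query above. Both function names are made up for this illustration.

```python
def count_by_replace(body, patterns):
    """Count substring occurrences with the LENGTH/REPLACE trick."""
    total = 0
    for p in patterns:
        if not p:
            continue  # an empty pattern would cause a division by zero
        removed_chars = len(body) - len(body.replace(p, ""))
        total += removed_chars // len(p)
    return total


def count_whole_words(body, patterns):
    """Count only space-separated words that equal one of the patterns."""
    wanted = set(patterns)
    return sum(1 for word in body.split(" ") if word in wanted)


BODY = "Earth is my home planet and where my friends live"
PATTERNS = ["my", "quick", "Earth"]
```

For the sample sentence both functions return 3. They differ once a pattern appears inside a longer word: for the text 'my mystery' and the pattern 'my', the replace-based count is 2 while the word-based count is 1, so which one is right depends on whether partial-word hits should count.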

Extract unmatched content or values

I want to extract the unmatched values from data like this (table1):
name id subject
maria 01 Math computer english
faro 02 Computer stat english
hina 03 Chemistry physics bio
The query below
Select *
from table1
where subject like '%english%' or
subject like '%stat%'
returns the first two rows, which match the criteria.
But I need to extract only the unmatched values from the subject column, like the output below:
unmatched
math computer
computer
chemistry physics bio
(In the first row only the values math and computer do not match, in the second row there are two matches, and in the third row there are no matches.)
Can I get that output?
With REPLACE you eliminate all occurrences of the values 'english' and/or 'stat':
SELECT
  trim(
    replace(replace(replace(subject, 'english', ''), 'stat', ''), '  ', ' ')
  ) unmatched
FROM tablename;
The outer replace collapses the double spaces left behind in the result, and trim removes spaces from the start and the end.
You have a poor table design. You should be storing lists as separate rows in another table -- a so-called "junction" or "association" table. SQL has a great data type for storing lists. It is called a "table" not a "string".
That said, sometimes we are stuck with other people's really, really bad choices of data model.
If so, you can use replace() and trim() to get the list you want. I would do:
SELECT trim(replace(replace(' ' || subject || ' ', ' english ', ' '
                           ), ' stat ', ' '
                   )
           ) as unmatched
FROM tablename;
This easily generalizes to more values, without worrying about introducing adjacent spaces.
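To see why padding with spaces helps, here is a short Python sketch of the same replace-and-trim logic. It is an illustration of the idea, not SQL, and the helper name remove_words is an assumption of this sketch.

```python
def remove_words(subject, words):
    """Remove whole space-delimited words from a space-separated list."""
    # Pad both ends so the first and last word are also surrounded by spaces
    padded = " " + subject + " "
    for w in words:
        # Replace ' word ' with a single space, so no double spaces appear
        padded = padded.replace(" " + w + " ", " ")
    # Drop the padding again
    return padded.strip()


STOP = ["english", "stat"]
```

With the sample rows, remove_words('Math computer english', STOP) gives 'Math computer' and remove_words('Computer stat english', STOP) gives 'Computer'. A value such as 'statistics english' keeps 'statistics' intact, whereas a plain replace of 'stat' would cut it down to 'istics'. The match is case-sensitive, as with replace() in SQL, and one pass can miss a word that is repeated back to back, since the two occurrences share a single space.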

Merged multiple values in one record value using SQL [duplicate]

I have a table and I'd like to pull one row per id with field values concatenated.
In my table, for example, I have this:
TM67 | 4 | 32556
TM67 | 9 | 98200
TM67 | 72 | 22300
TM99 | 2 | 23009
TM99 | 3 | 11200
And I'd like to output:
TM67 | 4,9,72 | 32556,98200,22300
TM99 | 2,3 | 23009,11200
In MySQL I was able to use the aggregate function GROUP_CONCAT, but that doesn't seem to work here... Is there an equivalent for PostgreSQL, or another way to accomplish this?
Since 9.0 this is even easier:
SELECT id,
string_agg(some_column, ',')
FROM the_table
GROUP BY id
This is probably a good starting point (version 8.4+ only):
SELECT id_field, array_agg(value_field1), array_agg(value_field2)
FROM data_table
GROUP BY id_field
array_agg returns an array, but you can CAST that to text and edit as needed (see clarifications, below).
Prior to version 8.4, you have to define it yourself prior to use:
CREATE AGGREGATE array_agg (anyelement)
(
sfunc = array_append,
stype = anyarray,
initcond = '{}'
);
(paraphrased from the PostgreSQL documentation)
Clarifications:
The result of casting an array to text is a string that starts and ends with curly braces. Those braces need to be removed by some method if they are not desired.
Casting ANYARRAY to TEXT best simulates CSV output, as elements that contain embedded commas are double-quoted in the output in standard CSV style. Neither array_to_string() nor string_agg() (the "group_concat" function added in 9.0) quotes strings with embedded commas, resulting in an incorrect number of elements in the resulting list.
The string_agg() function added in 9.0 does NOT cast the inner results to TEXT first. So "string_agg(value_field, ',')" would generate an error if value_field is an integer; "string_agg(value_field::text, ',')" would be required. The array_agg() method requires only one cast after the aggregation (rather than a cast per value).
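The embedded-comma problem and the need for a text cast are easy to reproduce outside the database. The following Python sketch is only an analogy: a plain join stands in for string_agg / array_to_string, and the csv module stands in for CSV-style quoting. The sample values are invented.

```python
import csv
import io

values = ["a,b", "c"]  # two elements, the first contains a comma

# Plain joining does not quote anything, so the element boundary is lost
plain = ",".join(values)

# CSV-style output quotes the element that contains the delimiter
buf = io.StringIO()
csv.writer(buf).writerow(values)
quoted = buf.getvalue().rstrip("\r\n")

# Joining also needs text: integers must be converted first,
# much like value_field::text in the string_agg call
ids = [4, 9, 72]
joined_ids = ",".join(str(i) for i in ids)
```

Splitting plain on commas yields three pieces instead of two, while the quoted form can be parsed back into the original two elements.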
SELECT array_to_string(array(SELECT a FROM b),', ');
Will do as well.
Try like this:
select field1, array_to_string(array_agg(field2), ',')
from table1
group by field1;
Assuming that the table your_table has three columns (name, id, value), the query is this one:
select name,
array_to_string(array_agg(id), ','),
array_to_string(array_agg(value), ',')
from your_table
group by name
order by name
;
"TM67" "4,9,72" "32556,98200,22300"
"TM99" "2,3" "23009,11200"
And a version that works on an array column:
select
array_to_string(
array(select distinct unnest(zip_codes) from your_table),
', '
);
My suggestion in PostgreSQL:
SELECT cpf || ';' || nome || ';' || telefone
FROM (
SELECT cpf
,nome
,STRING_AGG(CONCAT_WS( ';' , DDD_1, TELEFONE_1),';') AS telefone
FROM (
SELECT DISTINCT *
FROM temp_bd
ORDER BY cpf DESC ) AS y
GROUP BY 1,2 ) AS x
In my case the column type was bigint, so a cast to text (::text) was needed. The code below worked for me on PostgreSQL 12:
string_agg(some_column::text, ',')
In Oracle, the following query should work:
Select First_column,LISTAGG(second_column,',')
WITHIN GROUP (ORDER BY second_column) as Sec_column,
LISTAGG(third_column,',')
WITHIN GROUP (ORDER BY second_column) as thrd_column
FROM tablename
GROUP BY first_column

How to delete all non-numerical letters in db2

I have some data in DATA column (varchar) that looks like this:
Nowshak 7,485 m
Maja e Korabit (Golem Korab) 2,764 m
Tahat 3,003 m
Morro de Moco 2,620 m
Cerro Aconcagua 6,960 m (located in the northwestern corner of the province of Mendoza)
Mount Kosciuszko 2,229 m
Grossglockner 3,798 m
What I want is this:
7485
2764
3003
2620
6960
2229
3798
Is there a way in IBM DB2 version 9.5 to remove/delete all those non-numeric letters by doing something like this:
SELECT replace(DATA, --somekind of regular expression--, '') FROM TABLE_A
or any other ways?
This question follows from this question.
As suggested in the other question, the TRANSLATE function might help. For example, try this:
select translate('Nowshak 7,485 m','','Nowshakm,') from sysibm.sysdummy1;
Returns:
7 485
With a little tweaking you can get the result you want: in the third argument of the function you just need to list the entire alphabet. It is a bit ugly, but it will work.
One easy way to accomplish that is to use the TRANSLATE(value, replacewith, replacelist) function. It replaces all of the characters in a list (third parameter) with the value in the second parameter.
You can leverage that to essentially erase all of the non-numeric characters out of the character string, including the spaces.
Just make the list in the third parameter contain all of the possible characters you might see that you don't want. Translate those to an empty space, and you end up with just the characters you want, essentially erasing the undesired characters.
Note: I included all of the common symbols (non-alpha numeric) for the benefit of others who may have character values of a larger variety than your example.
Select
TRANSLATE(UCASE(CHAR_COLUMN),'',
'ABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&*()-=+/\{}[];:.,<>? ')
FROM TABLE_A
More simply: For your particular set of values, since there is a much smaller set of possible characters you could trim the replace list down to this:
Select
TRANSLATE(UCASE(CHAR_COLUMN),'','ABCDEFGHIJKLMNOPQRSTUVWXYZ(), ')
FROM TABLE_A
NOTE: The "UCASE" on the CHAR_COLUMN is not necessary, but it was a nice enhancement to simplify this expression by eliminating the need to include all of the lower case alpha characters. Without it, the expression would be:
TRANSLATE(CHAR_COLUMN,'',
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!@#$%^&*()-=+/\{}[];:.,<>? ')
As in many of the answers above, the best approach is to use the TRANSLATE function. This approach is different, however, because you whitelist the characters you want to keep instead of blacklisting the characters you don't want. We do this by using TRANSLATE twice: the inner call generates the list of characters to remove, which is then passed as a parameter to the outer call.
select TRANSLATE(dirty,'',TRANSLATE(dirty,'','1234567890',''),'') as clean
from (Values 'Nowshak 7,485 m'
,'Maja e Korabit (Golem Korab) 2,764 m'
,'Tahat 3,003 m','Morro de Moco 2,620 m'
,'Cerro Aconcagua 6,960 m (located in the northwestern corner of the province of Mendoza)'
,'Mount Kosciuszko 2,229 m','Grossglockner 3,798 m'
) as temp(dirty)
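The two-step idea may be easier to follow in a small Python model. This is a sketch of the whitelist logic only, not DB2 code; keep_only is a made-up name, and str.translate here plays the role of the outer TRANSLATE call.

```python
def keep_only(dirty, keep):
    """Return `dirty` with every character not listed in `keep` removed."""
    # Inner step: work out which characters of this particular string
    # are unwanted (everything that is not in the keep list)
    to_remove = "".join(ch for ch in dirty if ch not in keep)
    # Outer step: delete exactly those characters from the original string
    return dirty.translate({ord(ch): None for ch in to_remove})


DIGITS = "1234567890"
```

For example, keep_only('Nowshak 7,485 m', DIGITS) returns '7485'. The removal list is computed from each input value, so it never has to spell out the full alphabet or every possible symbol.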
Just taking @Darryls99's answer and turning it into a UDF:
CREATE OR REPLACE FUNCTION REMOVE_ALLBUT(in_string VARCHAR(32000), characters_to_keep VARCHAR(32000))
RETURNS VARCHAR(32000)
LANGUAGE SQL CONTAINS SQL DETERMINISTIC NO EXTERNAL ACTION
RETURN
TRANSLATE(in_string,'',TRANSLATE(in_string,'',characters_to_keep,''),'')
;
Use it like this:
select REMOVE_ALLBUT(s,'1234567890')
from (values 'Nowshak 7,485 m'
,'Maja e Korabit (Golem Korab) 2,764 m'
,'Tahat 3,003 m','Morro de Moco 2,620 m'
,'Cerro Aconcagua 6,960 m (located in the northwestern corner of the province of Mendoza)'
,'Mount Kosciuszko 2,229 m'
,'Grossglockner 3,798 m'
) t(s);
which returns
1
----
7485
2764
3003
2620
6960
2229
3798
A dirty string can look like this: 'qwerty12453lala<<>777*9'
We need to clean the string and keep only the digits.
We could remove the excess characters with the TRANSLATE function,
but there is one problem: the value of the third parameter is long and ugly.
Something like this:
VALUES
(
TRANSLATE( UPPER('qwerty12453lala<<>777*9'), '', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&*()-=+/\{}[];:.,<>? ')
)
So this is not very convenient.
My idea is to use the TRANSLATE function twice (one call inside the other):
1. Calculate the third parameter as the list of symbols to be replaced
2. Call TRANSLATE a second time to replace the excess symbols, using this calculated parameter
Let me show it in code:
VALUES
(
REPLACE --Remove spaces from result
(
TRANSLATE
(
UPPER( 'qwerty12453lala<<>777*9')
, ' '
, TRANSLATE( UPPER( 'qwerty12453lala<<>777*9') , ' ' , '0123456789') -- Computes the third parameter: the digits are blanked out, leaving the non-digit characters (QWERTYLALA<<>*) plus blanks
)
, ' '
, ''
)
)
The result should be: 124537779
In a SELECT statement it would look like this:
SELECT REPLACE
(
TRANSLATE( UPPER(T.DIRTY_FIELD), ' ', TRANSLATE(UPPER(T.DIRTY_FIELD), '', '1234567890' ) )
, ' '
, ''
)
FROM SOMETABLE T
The proper combination of all of the above:
select replace(translate(dirty,' ','ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!@#$%^&*()-=+/{}[];:.,<>?' ), ' ','') as clean
Is there a way in IBM DB2 version 9.5 to remove/delete all those non-numeric letters by doing something like this:
SELECT replace(DATA, --somekind of regular expression--, '') FROM TABLE_A
or any other ways?
No. You will have to create a User Defined Function or implement it in your host application's language.
The statement below will remove non-alphanumeric characters from any 'string-value' and prevents the SQLSTATE message 42815 when a zero length string-value is passed.
SELECT REPLACE(TRANSLATE(string-value || '|',
'||||||||||||||||||||||||||||||||',
'`¬!"£$%^&*()_-+={[}]:;@~#,<>.?/'''),'|','')
FROM SYSIBM.SYSDUMMY1;