Pattern matching in Postgres SELECT query - sql

I have a table in my database that has a name field with a non-null and unique constraint.
On the frontend, a user can clone certain entities. In this scenario, the name field gets suffixed with a version number.
For instance, if record A exists with the name TEST_NAME, cloning this record would result in record B being created with the name TEST_NAME [2]. Cloning record A again would result in a record C with the name TEST_NAME [3].
To determine the version number, I run a count(*) against the table, returning the number of records that match the root name, in this case: 'TEST_NAME'.
Here is the query:
SELECT COUNT(*)
FROM my_table
WHERE name LIKE 'TEST_NAME%'
The issue here is that if a user changes the name of record C to TEST_NAME [3]abc, the above query still picks it up, so cloning record A would create record D with the name TEST_NAME [4] even though no TEST_NAME [3] exists any more. Likewise, cloning the renamed record should produce TEST_NAME [3]abc [2].
How can I avoid this? I'd like to only match name values that follow the format ^TEST_NAME [x]$, where x is any integer.

PostgreSQL supports regular expressions through the ~ operator.
I think what you need is:
SELECT COUNT(*)
FROM my_table
WHERE name ~ '^TEST_NAME \[[0-9]+\]$'
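To see which names that pattern accepts, here is a minimal sketch that uses Python's re module as a stand-in for PostgreSQL's regex engine (an assumption for illustration; the sample names are made up):

```python
import re

# Same pattern as in the SQL above. Python's re module is only a stand-in
# for PostgreSQL's regex engine; for this simple pattern the syntax is the same.
VERSIONED = re.compile(r"^TEST_NAME \[[0-9]+\]$")

# Hypothetical sample names, not taken from a real table.
names = ["TEST_NAME", "TEST_NAME [2]", "TEST_NAME [3]abc", "TEST_NAME [10]"]

# Keep only the names that are exactly "TEST_NAME [<integer>]".
matching = [n for n in names if VERSIONED.search(n)]
print(matching)
```

Note that the bare root name TEST_NAME does not match this pattern, so the count covers only the suffixed clones and not the original record.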
For computing the next version, I propose the following :
SELECT
version,
COALESCE(
matches[1] || ' [' || matches[2]::int + 1 || ']',
matches[3] || ' [2]'
) AS nextVersion
FROM versions
CROSS JOIN LATERAL (
SELECT regexp_matches(version, '^(.*) \[([0-9]+)\]$|^(.*)$') matches
) t
Here is what's going on:
For each version, we match the regexp ^(.*) \[([0-9]+)\]$|^(.*)$. Groups 1 and 2 are populated if the name ends with a version number. Group 3 captures the whole name and serves as the fallback when there is no version suffix. We put this result in the lateral table t(matches).
If groups 1 and 2 have values, then matches[1] || ' [' || matches[2]::int + 1 || ']' is the next version; otherwise we take matches[3] and append [2] to it.
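The same decision can be sketched outside the database. The helper next_version below is a hypothetical name, and Python's re module again stands in for the PostgreSQL regex functions:

```python
import re

# Groups 1 and 2 capture "<root> [<n>]", mirroring the first branch
# of the SQL pattern. The fallback branch is handled in plain Python.
SUFFIX = re.compile(r"^(.*) \[([0-9]+)\]$")


def next_version(name: str) -> str:
    """Return the name a clone of `name` would get (illustrative helper)."""
    m = SUFFIX.match(name)
    if m:
        root, number = m.group(1), int(m.group(2))
        return f"{root} [{number + 1}]"
    # No version suffix yet: the first clone becomes version 2.
    return f"{name} [2]"


print(next_version("TEST_NAME [3]"))
print(next_version("TEST_NAME [3]abc"))
```

A renamed record such as TEST_NAME [3]abc has no trailing version number, so it is treated as a new root.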
As a bonus, the following query will give the last version for every rootname, and the next version available.
SELECT rootname, MAX(t2.version) AS lastVersion, MAX(t2.version) + 1 AS nextVersion
FROM versions
CROSS JOIN LATERAL (
SELECT regexp_matches(version, '^(.*) \[([0-9]+)\]$|^(.*)$') matches
) t1
CROSS JOIN LATERAL (
SELECT
COALESCE(matches[1], matches[3]) AS rootname,
COALESCE(matches[2]::int, 1) AS version
) t2
GROUP BY rootname;
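The grouping step of that query can be illustrated with a small sketch. It assumes the names have already been fetched into a list; split_name and last_versions are hypothetical helpers:

```python
import re

SUFFIX = re.compile(r"^(.*) \[([0-9]+)\]$")


def split_name(name):
    """Split a name into (rootname, version); unsuffixed names are version 1."""
    m = SUFFIX.match(name)
    return (m.group(1), int(m.group(2))) if m else (name, 1)


def last_versions(names):
    """Map each rootname to its highest version, like MAX(...) GROUP BY rootname."""
    latest = {}
    for name in names:
        root, version = split_name(name)
        latest[root] = max(version, latest.get(root, 0))
    return latest


# Hypothetical sample data.
names = ["TEST_NAME", "TEST_NAME [2]", "TEST_NAME [3]", "OTHER"]
latest = last_versions(names)
next_free = {root: version + 1 for root, version in latest.items()}
print(latest, next_free)
```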
If you only have a rootname (say TEST_NAME), and assuming your table is called versions and has a single column named version, you can clone the record using:
INSERT INTO versions
SELECT rootname || ' [' || MAX(t2.version) + 1 || ']'
FROM versions
CROSS JOIN LATERAL (
SELECT regexp_matches(version, '^(.*) \[([0-9]+)\]$|^(.*)$') matches
) t1
CROSS JOIN LATERAL (
SELECT
COALESCE(matches[1], matches[3]) AS rootname,
COALESCE(matches[2]::int, 1) AS version
) t2
WHERE rootname = 'TEST_NAME'
GROUP BY rootname;

Related

In BigQuery, identify when columns do not match on UNION ALL

with
table1 as (
select 'joe' as name, 17 as age, 25 as speed
),
table2 as (
select 'nick' as name, 21 as speed, 23 as strength
)
select * from table1
union all
select * from table2
In Google BigQuery, this union all does not throw an error because both tables have the same number of columns (3 each). However I receive bad data output because the columns do not match. Rather than outputting a new table with 4 columns name, age, speed, strength with correct values + nulls for missing values (which would probably be preferred), the union all keeps the 3 columns from the top row.
Is there a good way to catch that the columns do not match, rather than the query silently returning bad data? Is there any way for this to return an error perhaps, as opposed to a successful table? I'm not sure how to check in SQL that the columns in the 2 tables match.
Edit: in this example it is clear to see that the columns do not match, however in our data we have 100+ columns and we want to avoid a situation where we make an error in a UNION ALL
The following is for BigQuery Standard SQL and uses BigQuery's scripting feature:
DECLARE statement STRING;
SET statement = (
WITH table1_columns AS (
SELECT column FROM (SELECT * FROM `project.dataset.table1` LIMIT 1) t,
UNNEST(REGEXP_EXTRACT_ALL(TRIM(TO_JSON_STRING(t), '{}'), r'"([^"]*)":')) column
), table2_columns AS (
SELECT column FROM (SELECT * FROM `project.dataset.table2` LIMIT 1) t,
UNNEST(REGEXP_EXTRACT_ALL(TRIM(TO_JSON_STRING(t), '{}'), r'"([^"]*)":')) column
), all_columns AS (
SELECT column FROM table1_columns UNION DISTINCT SELECT column FROM table2_columns
)
SELECT (
SELECT 'SELECT ' || STRING_AGG(IF(t.column IS NULL, 'NULL as ', '') || a.column, ', ') || ' FROM `project.dataset.table1` UNION ALL '
FROM all_columns a LEFT JOIN table1_columns t USING(column)
) || (
SELECT 'SELECT ' || STRING_AGG(IF(t.column IS NULL, 'NULL as ', '') || a.column, ', ') || ' FROM `project.dataset.table2`'
FROM all_columns a LEFT JOIN table2_columns t USING(column)
)
);
EXECUTE IMMEDIATE statement;
When applied to the sample data from your question, the output is:
Row name age speed strength
1 joe 17 25 null
2 nick null 21 23
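The script above builds one SELECT per table and pads the missing columns with NULL. Here is a minimal sketch of that string-building step, assuming the column lists are already known (padded_select and build_union are hypothetical helpers, and nothing is sent to BigQuery):

```python
def padded_select(table, table_cols, all_cols):
    """Build a SELECT that emits NULL for every column the table lacks."""
    parts = [c if c in table_cols else f"NULL AS {c}" for c in all_cols]
    return f"SELECT {', '.join(parts)} FROM `{table}`"


def build_union(table1, cols1, table2, cols2):
    """Build a UNION ALL over the ordered union of both column lists."""
    all_cols = list(dict.fromkeys(cols1 + cols2))  # de-duplicate, keep order
    return (
        padded_select(table1, cols1, all_cols)
        + " UNION ALL "
        + padded_select(table2, cols2, all_cols)
    )


statement = build_union(
    "project.dataset.table1", ["name", "age", "speed"],
    "project.dataset.table2", ["name", "speed", "strength"],
)
print(statement)
```

Because both SELECT lists name the same columns in the same order, the UNION ALL lines up by name rather than by accident of position.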
After saving table1 and table2 as 2 tables in a dataset in BigQuery, I then used the metadata using INFORMATION_SCHEMA to check that the columns matched.
SELECT *
FROM models.INFORMATION_SCHEMA.COLUMNS
where table_name = 'table1'
SELECT *
FROM models.INFORMATION_SCHEMA.COLUMNS
where table_name = 'table2'
INFORMATION_SCHEMA.COLUMNS returns information including the column names and their positioning. I can join these 2 tables together then to check that the names match...
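Once the two column lists are in hand, the positional check could look like the sketch below. It assumes the names were already read from INFORMATION_SCHEMA.COLUMNS in ordinal order; column_mismatches is a hypothetical helper:

```python
from itertools import zip_longest


def column_mismatches(cols_a, cols_b):
    """Return (position, name_a, name_b) wherever the two lists disagree."""
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip_longest(cols_a, cols_b), start=1)
        if a != b
    ]


# Column names from the question's two tables, in positional order.
table1_cols = ["name", "age", "speed"]
table2_cols = ["name", "speed", "strength"]

mismatches = column_mismatches(table1_cols, table2_cols)
if mismatches:
    print("UNION ALL would misalign these positions:", mismatches)
```

An empty result means the names match position by position. zip_longest also flags a difference in column count, which shows up as a None on one side.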

How to avoid duplicates in the STRING_AGG function SQL Server

I was testing a SQL query in which I need to concatenate values into a comma-separated list. It works, but I have a problem with duplicate values.
This is the query:
SELECT t0.id_marcas AS CodMarca,
t0.nombremarcas AS NombreMarca,
t0.imagenmarcas,
(SELECT String_agg((t2.name), ', ')
FROM exlcartu_devcit.store_to_cuisine t1
INNER JOIN exlcartu_devcit.cuisine t2
ON t1.cuisine_id = t2.cuisine_id
WHERE store_id = (SELECT TOP 1 store_id
FROM exlcartu_devcit.store
WHERE id_marcas = t0.id_marcas
AND status = 1)) AS Descripcion,
t0.logo,
t0.imagen,
(SELECT TOP 1 preparing_time
FROM exlcartu_devcit.store
WHERE id_marcas = t0.id_marcas
AND status = 1) AS Tiempo,
t0.orden,
(SELECT TOP 1 Avg(minimum_amount)
FROM exlcartu_devcit.store_delivery_zone
WHERE id_marcas = t0.id_marcas) AS MontoMinimo
FROM exlcartu_devcit.[marcas] t0
I thought the solution could be to add a DISTINCT to the query to avoid repeated values, like this:
(SELECT STRING_AGG(DISTINCT (t2.name), ', ') AS Descripcion
But apparently the STRING_AGG() function does not support it, any idea how to avoid repeated values?
The simplest way is to aggregate over a subquery that selects the distinct values first, like this:
with dups as (select 1 as one union all select 1 as one)
select string_agg(one, ', ') from (select distinct one from dups) q;
vs original
with dups as (select 1 as one union all select 1 as one)
select string_agg(one, ', ') from dups;
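The effect of the inner SELECT DISTINCT can be tried locally. This sketch runs against an in-memory SQLite database from Python, with SQLite's group_concat standing in for SQL Server's STRING_AGG (an assumption for illustration; the two functions are not identical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dups (one TEXT)")
conn.executemany("INSERT INTO dups VALUES (?)", [("a",), ("a",), ("b",)])

# Aggregating the raw rows keeps the duplicate value.
with_dups = conn.execute(
    "SELECT group_concat(one, ', ') FROM dups"
).fetchone()[0]

# Aggregating over a DISTINCT subquery removes it first.
deduped = conn.execute(
    "SELECT group_concat(one, ', ') FROM (SELECT DISTINCT one FROM dups) q"
).fetchone()[0]

print(with_dups)
print(deduped)
```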

Find "LIKE" duplicates in MSSQL

I got a MSSQL database and there's this column with IDs.
Some are {}-wrapped around the ID and some are not.
I need to find out if there are duplicate entries like:
'{abcd}' and 'abcd' in one column.
Obviously I don't know 'abcd' in advance...
Is there a simple way of joining the same column and searching for "LIKE" duplicates? Inner Join is not working for me...
You could do something like this:
SELECT Id
FROM TableName AS T0
WHERE EXISTS
(
SELECT 1
FROM TableName AS T1
WHERE T0.Id = '{' + T1.Id + '}'
-- Uncomment the next row if you want all duplicates (with or without brackets):
-- OR '{' + T0.Id + '}' = T1.Id
)
This returns every record whose id is wrapped in curly brackets and that has a duplicate id without the brackets.
You can also do it like this:
CREATE TABLE T(
ID VARCHAR(25)
);
INSERT INTO T VALUES
('abc'),
('{abc}'),
('def'),
('ghi'),
('{ghi}');
SELECT *
FROM
(
SELECT TRIM(TRANSLATE(ID, '{}', '  ')) ID -- both TRANSLATE arguments must have the same length
FROM T
) TT
GROUP BY ID
HAVING COUNT(ID) > 1;
-- Or you can also do
SELECT *
FROM
(
SELECT REPLACE(REPLACE(ID, '{', ''), '}', '') ID
FROM T
) TT
GROUP BY ID
HAVING COUNT(ID) > 1;
Note that the TRANSLATE() and TRIM() functions are available only in SQL Server 2017 and later.
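The normalize-then-count idea behind the REPLACE() version can be sketched as follows, assuming the ids have been loaded into a list (bracket_duplicates is a hypothetical helper):

```python
from collections import Counter


def bracket_duplicates(ids):
    """Return ids that occur more than once after stripping '{' and '}'."""
    normalized = [i.replace("{", "").replace("}", "") for i in ids]
    counts = Counter(normalized)
    return sorted(value for value, n in counts.items() if n > 1)


# Same sample rows as in the INSERT above.
ids = ["abc", "{abc}", "def", "ghi", "{ghi}"]
duplicates = bracket_duplicates(ids)
print(duplicates)
```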

Recursive CTE in H2: No data returned

I am trying to convert a proprietary Oracle CONNECT BY query into a standard SQL query that will run on H2, and generate the same data in the same order.
This is the Oracle query, which works:
SELECT id, name, parent
FROM myschema.mytable
START WITH id = 1
CONNECT BY PRIOR id = parent
This is what I've come up with. However, it returns no rows in the ResultSet.
WITH RECURSIVE T(id, name, parent, path) AS (
SELECT id, name, '' AS parent, id AS path
FROM myschema.mytable WHERE id = 1
UNION ALL
SELECT ou.id, ou.name, ou.parent,
(T.path + '.' + CAST (ou.id AS VARCHAR)) AS path
FROM T INNER JOIN myschema.mytable AS ou ON T.id = ou.parent
) SELECT id, name, parent FROM T ORDER BY path
The initial row, and the related rows, both exist in the table.
I am not using H2's Oracle compatibility mode (which doesn't support CONNECT BY, by the way).
The following works for me, for both H2 as well as PostgreSQL (this you can test online using the SQL Fiddle). I had to make a few changes and assumptions (see below):
create table mytable(id int, name varchar(255), parent int);
insert into mytable values(1, 'root', null), (2, 'first', 1),
(3, 'second', 1), (4, '2b', 3);
WITH RECURSIVE T(id, name, parent, path) AS (
SELECT id, name, 0 AS parent,
cast(id as varchar) AS path
FROM mytable WHERE id = 1
UNION ALL
SELECT ou.id, ou.name, ou.parent,
(T.path || '.' || CAST (ou.id AS VARCHAR)) AS path
FROM T INNER JOIN mytable AS ou ON T.id = ou.parent
) SELECT id, name, parent, path FROM T ORDER BY path
Changes:
I assumed id and parent are integers. Because of that, I had to use cast(id as varchar) in the first select.
I replaced + with || when concatenating strings.
I used 0 AS parent.
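For a quick local check of the query shape, here is a sketch that runs an equivalent recursive CTE against an in-memory SQLite database from Python. SQLite is a stand-in here, not H2 or PostgreSQL, and CAST(... AS TEXT) replaces the VARCHAR cast:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INT, name VARCHAR(255), parent INT);
    INSERT INTO mytable VALUES (1, 'root', NULL), (2, 'first', 1),
                               (3, 'second', 1), (4, '2b', 3);
""")

# The anchor member selects the root row; the recursive member joins each
# child to its already-visited parent and extends the dotted path.
rows = conn.execute("""
    WITH RECURSIVE t(id, name, parent, path) AS (
        SELECT id, name, 0, CAST(id AS TEXT)
        FROM mytable WHERE id = 1
        UNION ALL
        SELECT ou.id, ou.name, ou.parent,
               t.path || '.' || CAST(ou.id AS TEXT)
        FROM t INNER JOIN mytable AS ou ON t.id = ou.parent
    )
    SELECT id, name, parent, path FROM t ORDER BY path
""").fetchall()

for row in rows:
    print(row)
```

Ordering by the text path sorts lexicographically, which works for this sample but would place 1.10 before 1.2 on larger trees.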
This seems to have been a problem with either the Anorm database access library or the JDBC driver not substituting a query parameter correctly (the query substitution was not shown in the question, because I assumed it wasn't relevant).

PostgreSQL convert columns to rows? Transpose?

I have a PostgreSQL function (or table) which gives me the following output:
Sl.no username Designation salary etc..
1 A XYZ 10000 ...
2 B RTS 50000 ...
3 C QWE 20000 ...
4 D HGD 34343 ...
Now I want the Output as below:
Sl.no 1 2 3 4 ...
Username A B C D ...
Designation XYZ RTS QWE HGD ...
Salary 10000 50000 20000 34343 ...
How to do this?
SELECT
unnest(array['Sl.no', 'username', 'Designation','salary']) AS "Columns",
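unnest(array["Sl.no"::text, username::text, Designation::text, salary::text]) AS "Values" -- cast to text so all array elements share one type; adjust the column names to your view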
FROM view_name
ORDER BY "Columns"
Reference : convertingColumnsToRows
Basing my answer on a table of the form:
CREATE TABLE tbl (
sl_no int
, username text
, designation text
, salary int
);
Each row results in a new column to return. With a dynamic return type like this, it's hardly possible to make this completely dynamic with a single call to the database. Demonstrating solutions with two steps:
Generate query
Execute generated query
Generally, this is limited by the maximum number of columns a table can hold. So not an option for tables with more than 1600 rows (or fewer). Details:
What is the maximum number of columns in a PostgreSQL select query
Postgres 9.4+
Dynamic solution with crosstab()
Use the first one you can. Beats the rest.
SELECT 'SELECT *
FROM crosstab(
$ct$SELECT u.attnum, t.rn, u.val
FROM (SELECT row_number() OVER () AS rn, * FROM '
|| attrelid::regclass || ') t
, unnest(ARRAY[' || string_agg(quote_ident(attname)
|| '::text', ',') || '])
WITH ORDINALITY u(val, attnum)
ORDER BY 1, 2$ct$
) t (attnum bigint, '
|| (SELECT string_agg('r'|| rn ||' text', ', ')
FROM (SELECT row_number() OVER () AS rn FROM tbl) t)
|| ')' AS sql
FROM pg_attribute
WHERE attrelid = 'tbl'::regclass
AND attnum > 0
AND NOT attisdropped
GROUP BY attrelid;
Operating with attnum instead of actual column names. Simpler and faster. Join the result to pg_attribute once more or integrate column names like in the pg 9.3 example.
Generates a query of the form:
SELECT *
FROM crosstab(
$ct$
SELECT u.attnum, t.rn, u.val
FROM (SELECT row_number() OVER () AS rn, * FROM tbl) t
, unnest(ARRAY[sl_no::text,username::text,designation::text,salary::text]) WITH ORDINALITY u(val, attnum)
ORDER BY 1, 2$ct$
) t (attnum bigint, r1 text, r2 text, r3 text, r4 text);
This uses a whole range of advanced features. Just too much to explain.
Simple solution with unnest()
One unnest() can now take multiple arrays to unnest in parallel.
SELECT 'SELECT * FROM unnest(
''{sl_no, username, designation, salary}''::text[]
, ' || string_agg(quote_literal(ARRAY[sl_no::text, username::text, designation::text, salary::text])
|| '::text[]', E'\n, ')
|| E') \n AS t(col,' || string_agg('row' || sl_no, ',') || ')' AS sql
FROM tbl;
Result:
SELECT * FROM unnest(
'{sl_no, username, designation, salary}'::text[]
,'{10,Joe,Music,1234}'::text[]
,'{11,Bob,Movie,2345}'::text[]
,'{12,Dave,Theatre,2356}'::text[])
AS t(col,row1,row2,row3,row4);
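Unnesting several arrays in parallel is essentially a transposition. As a rough analogy outside the database, here is a sketch with Python's zip, using the sample rows from the question:

```python
columns = ["sl_no", "username", "designation", "salary"]

# Sample rows from the question, one tuple per table row.
rows = [
    (1, "A", "XYZ", 10000),
    (2, "B", "RTS", 50000),
    (3, "C", "QWE", 20000),
    (4, "D", "HGD", 34343),
]

# zip(*rows) turns rows into columns; prefix each with its column name.
transposed = [(col, *values) for col, values in zip(columns, zip(*rows))]

for line in transposed:
    print(line)
```

This is the "do it in the application" variant of the same idea: each original row becomes a column of the output.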
Postgres 9.3 or older
Dynamic solution with crosstab()
Completely dynamic, works for any table. Provide the table name in two places:
SELECT 'SELECT *
FROM crosstab(
''SELECT unnest(''' || quote_literal(array_agg(attname))
|| '''::text[]) AS col
, row_number() OVER ()
, unnest(ARRAY[' || string_agg(quote_ident(attname)
|| '::text', ',') || ']) AS val
FROM ' || attrelid::regclass || '
ORDER BY generate_series(1,' || count(*) || '), 2''
) t (col text, '
|| (SELECT string_agg('r'|| rn ||' text', ',')
FROM (SELECT row_number() OVER () AS rn FROM tbl) t)
|| ')' AS sql
FROM pg_attribute
WHERE attrelid = 'tbl'::regclass
AND attnum > 0
AND NOT attisdropped
GROUP BY attrelid;
Could be wrapped into a function with a single parameter ...
Generates a query of the form:
SELECT *
FROM crosstab(
'SELECT unnest(''{sl_no,username,designation,salary}''::text[]) AS col
, row_number() OVER ()
, unnest(ARRAY[sl_no::text,username::text,designation::text,salary::text]) AS val
FROM tbl
ORDER BY generate_series(1,4), 2'
) t (col text, r1 text,r2 text,r3 text,r4 text);
Produces the desired result:
col r1 r2 r3 r4
-----------------------------------
sl_no 1 2 3 4
username A B C D
designation XYZ RTS QWE HGD
salary 10000 50000 20000 34343
Simple solution with unnest()
SELECT 'SELECT unnest(''{sl_no, username, designation, salary}''::text[]) AS col
, ' || string_agg('unnest('
|| quote_literal(ARRAY[sl_no::text, username::text, designation::text, salary::text])
|| '::text[]) AS row' || sl_no, E'\n , ') AS sql
FROM tbl;
Slow for tables with more than a couple of columns.
Generates a query of the form:
SELECT unnest('{sl_no, username, designation, salary}'::text[]) AS col
, unnest('{10,Joe,Music,1234}'::text[]) AS row1
, unnest('{11,Bob,Movie,2345}'::text[]) AS row2
, unnest('{12,Dave,Theatre,2356}'::text[]) AS row3
, unnest('{4,D,HGD,34343}'::text[]) AS row4
Same result.
If (like me) you were needing this information from a bash script, note there is a simple command-line switch for psql to tell it to output table columns as rows:
psql mydbname -x -A -F= -c "SELECT * FROM foo WHERE id=123"
The -x option is the key to getting psql to output columns as rows.
I have a simpler approach than the one Erwin describes above. It worked for me with Postgres, and I think it should work with all major relational databases that support the SQL standard.
You can use simply UNION instead of crosstab:
SELECT text 'a' AS "text" UNION SELECT 'b';
text
------
a
b
(2 rows)
Of course that depends on the case in which you are going to apply this. Considering that you know beforehand what fields you need, you can take this approach even for querying different tables. I.e.:
SELECT 'My first metric' as name, count(*) as total from first_table UNION
SELECT 'My second metric' as name, count(*) as total from second_table
name | Total
------------------|--------
My first metric | 10
My second metric | 20
(2 rows)
It's a more maintainable approach, IMHO. Look at this page for more information: https://www.postgresql.org/docs/current/typeconv-union-case.html
There is no proper way to do this in plain SQL or PL/pgSQL.
It is much better to do this in the application that gets the data from the DB.