Simple regex to filter out pre and suffix characters - sql

I have a field in my database which holds a long list of strings separated by commas. Here are a few example rows:
HAB
DHAB,RAB,DAB
HAB,RAB,DAB
RAB,HAB,
RAB,HAB,DAB
My query has the following condition:
WHERE description LIKE '%HAB%'
But it returns the second row which has 'DHAB'.
Can it be done using regex with the WHERE statement so that I only get entries which have 'HAB' in the list (one string) and not the entries with 'DHAB'?

You may use
WHERE description ~ '(^|,)HAB($|,)'
The regex matches
(^|,) - start of string or a ,
HAB - literal substring
($|,) - end of string or ,
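The difference from the LIKE filter is easy to check outside the database. Below is a minimal Python sketch, assuming the sample rows from the question. Python's re module stands in for the PostgreSQL ~ operator here, which should behave the same way for a pattern this simple:

```python
import re

# Sample rows from the question.
rows = ["HAB", "DHAB,RAB,DAB", "HAB,RAB,DAB", "RAB,HAB,", "RAB,HAB,DAB"]

# Same pattern as in the WHERE clause: HAB must be delimited on both sides
# by a comma or by the start/end of the string.
pattern = re.compile(r"(^|,)HAB($|,)")

matching = [row for row in rows if pattern.search(row)]
print(matching)
```

The row 'DHAB,RAB,DAB' is left out because the only HAB in it is preceded by D rather than by a comma or the start of the string.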

Regular expressions are powerful and versatile, but also expensive. Consider a different approach: transform the list to an actual array with string_to_array() and then:
WHERE 'HAB' = ANY (string_to_array(description, ','))
Or:
WHERE string_to_array(description, ',') @> '{HAB}'
The latter can be supported with a GIN index, which makes it faster by orders of magnitude for big tables.
CREATE INDEX ON tbl USING gin (string_to_array(description, ','));
Related:
Can PostgreSQL index array columns?
Or consider a normalized DB design replacing the comma-separated values with a 1:n relationship. Related:
How to implement a many-to-many relationship in PostgreSQL?
Can PostgreSQL have a uniqueness constraint on array elements?

Related

Query JSONB column for any value where =?

I have a jsonb column which has the unfortunate case of being very unpredictable, in some cases its value may be an array with nested values:
["UserMailer", "applicant_setup_3", ["5cbffeb7-8d5e-4b52-a475-3cf320b2cee9"]]
Sometimes it will be something with key/values like this:
[{"reference_id": "5cbffeb7-8d5e-4b52-a475-3cf320b2cee9", "job_dictionary": ["StatusUpdater", "FollowTwitterUsersJob"]}]
Is there a way to write a query which just treats the whole column like text and does a like to see if I can find the uuid in the big text blob? I want to find all the records where a particular uuid string is present in the jsonb column.
The query doesn't need to be fast or efficient.
Postgres has the search operator ? for jsonb, but that would require you to search the JSON content recursively.
A possible, although not very efficient, method would be to stringify the object and use LIKE to search it:
myjsonb::text LIKE '%"5cbffeb7-8d5e-4b52-a475-3cf320b2cee9"%'
myjsonb::text LIKE '%"' || myuuid || '"%'
The problem with the jsonb operator ? is that it only considers top-level keys (including array elements), not values, and no nested objects.
You seem to be looking for values and array elements (not keys) on any level. You can get that with a full text search on top of your json(b) column:
SELECT * FROM tbl
WHERE to_tsvector('simple', jsonb_column)
@@ tsquery '5cbffeb7-8d5e-4b52-a475-3cf320b2cee9';
to_tsvector() extracts values and array elements on all levels - just what you need.
Requires Postgres 10 or later. json(b)_to_tsvector() in Postgres 11 offers more flexibility.
That's attractive for tables of non-trivial size as it can be supported with a full text index very efficiently:
CREATE INDEX tbl_jsonb_column_fts_gin_idx ON tbl USING GIN (to_tsvector('simple', jsonb_column));
I use the 'simple' text search configuration in the example. You might want a language-specific one, like 'english'. Doesn't matter much while you only look for UUID strings, but stemming for a particular language might make the index a bit smaller ...
Related:
LIKE query on elements of flat jsonb array
Does the phrase search operator <-> work with JSONB documents or only relational tables?
While you are only looking for UUIDs, you might optimize further with a custom (IMMUTABLE) function to extract UUIDs from the JSON document as array (uuid[]) and build a functional GIN index on top of it. (Considerably smaller index, yet.) Then:
SELECT * FROM tbl
WHERE my_uuid_extractor(jsonb_column) @> '{5cbffeb7-8d5e-4b52-a475-3cf320b2cee9}';
Such a function can be expensive, but does not matter much with a functional index that stores and operates on pre-computed values.
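What such an extractor has to compute can be sketched outside the database. The Python function below is purely illustrative: the name extract_uuids and the UUID regex are assumptions, not part of the answer. It walks a parsed JSON document and collects every string value that looks like a UUID, on any nesting level. A real my_uuid_extractor() would have to do the equivalent inside Postgres and be declared IMMUTABLE to be usable in an index.

```python
import json
import re

# Assumed UUID shape: 8-4-4-4-12 hex digits.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)

def extract_uuids(node):
    """Return all UUID-shaped string values found anywhere in a parsed JSON document."""
    found = []
    if isinstance(node, str):
        if UUID_RE.match(node):
            found.append(node)
    elif isinstance(node, list):
        for item in node:
            found.extend(extract_uuids(item))
    elif isinstance(node, dict):
        # Only values are inspected, not keys.
        for value in node.values():
            found.extend(extract_uuids(value))
    return found

# The two document shapes from the question.
doc_a = json.loads('["UserMailer", "applicant_setup_3", ["5cbffeb7-8d5e-4b52-a475-3cf320b2cee9"]]')
doc_b = json.loads(
    '[{"reference_id": "5cbffeb7-8d5e-4b52-a475-3cf320b2cee9", '
    '"job_dictionary": ["StatusUpdater", "FollowTwitterUsersJob"]}]'
)
print(extract_uuids(doc_a))
print(extract_uuids(doc_b))
```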
You can split the array into its elements first with jsonb_array_elements(), then cast each element to a string and filter it with the LIKE operator:
select q.elm
from
(
select jsonb_array_elements(js) as elm
from tab
) q
where elm::varchar like '%User%'
elm
----------------------------------------------------------------------------------------------------------------------
"UserMailer"
{"reference_id": "5cbffeb7-8d5e-4b52-a475-3cf320b2cee9", "job_dictionary": ["StatusUpdater", "FollowTwitterUsersJob"]}

Search a JSON array for an object containing a value matching a pattern

I have a DB with a jsonb column where each row essentially holds an array of name value pairs. Example for a single jsonb value:
[
{"name":"foo", "value":"bar"},
{"name":"biz", "value":"baz"},
{"name":"beep", "value":"boop"}
]
How would I query for rows that contain a partial value? I.e., find rows with the JSON object key value ilike '%ba%'?
I know that I can use SELECT * FROM tbl WHERE jsoncol @> '[{"value":"bar"}]' to find rows where the JSON is that specific value, but how would I query for rows containing a pattern?
There are no built in jsonb operators nor any indexes supporting this kind of filter directly (yet).
I suggest an EXISTS semi-join:
SELECT t.*
FROM tbl t
WHERE EXISTS (
SELECT FROM jsonb_array_elements(t.jsoncol) elem
WHERE elem->>'value' LIKE '%ba%'
);
It avoids redundant evaluations and the final DISTINCT step you would need to get distinct rows with a plain CROSS JOIN.
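The difference between the two join styles can be illustrated with a small Python sketch. The two-row table and the helper name matches are made up for the example. The cross-join variant emits one result per matching array element, while the EXISTS variant stops at the first match and emits each row at most once:

```python
import json

# Hypothetical table contents: (id, jsoncol) pairs.
tbl = [
    (1, '[{"name":"foo", "value":"bar"}, {"name":"biz", "value":"baz"}, {"name":"beep", "value":"boop"}]'),
    (2, '[{"name":"x", "value":"nope"}]'),
]

def matches(elem):
    # Counterpart of: elem->>'value' LIKE '%ba%'
    return "ba" in elem.get("value", "")

# CROSS JOIN style: one output row per matching array element.
cross_join_ids = [row_id for row_id, js in tbl for elem in json.loads(js) if matches(elem)]

# EXISTS style: any() short-circuits, so each row appears at most once.
exists_ids = [row_id for row_id, js in tbl if any(matches(e) for e in json.loads(js))]

print(cross_join_ids)  # row 1 is listed once per matching element
print(exists_ids)
```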
If this still isn't fast enough, a way more sophisticated specialized solution for the given type of query would be to extract a concatenated string of unique values (with a delimiter that won't interfere with your search patterns) per row in an IMMUTABLE function, build a trigram GIN index on the functional expression and use the same expression in your queries.
Related:
Search for nested values in jsonb array with greater operator
Find rows containing a key in a JSONB array of records
Create Postgres JSONB Index on Array Sub-Object
Aside, if your jsonb values really look like the example, you could trim a lot of noise and just store:
[
{"foo":"bar"},
{"biz":"baz"},
{"beep":"boop"}
]
You can use the function jsonb_array_elements() in a lateral join and use its result value in the WHERE clause:
select distinct t.*
from my_table t
cross join jsonb_array_elements(jsoncol)
where value->>'value' like '%ba%'
Please, read How to query jsonb arrays with IN operator for notes about distinct and performance.

Select if comma separated string contains a value

I have table
raw TABLE
=========
id class_ids
------------------------
1 1234,12334,12341,1228
2 12281,12341,12283
3 1234,34221,31233,43434,1123
How do I define a regex that selects rows whose class_ids contains a specific id?
If we select rows with '1234' in class_ids, the result list should not contain rows with '12341' in class_ids.
The IDs in column class_ids are separated by commas.
SELECT FROM raw re WHERE re.class_ids LIKE (regex)
You shouldn't be storing comma separated values in a single column.
However, this is better done using string_to_array() in Postgres instead of a regex:
SELECT *
FROM raw
WHERE '1234'= any(string_to_array(class_ids, ','));
If you really want to de-normalize your data, it's better to store those numbers in a proper integer array instead of a comma-separated list of strings.
A simple way uses like:
where ',' || re.class_ids || ',' like '%,1234,%'
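A quick way to see why the wrapping commas matter is to run both patterns side by side. The sketch below uses Python's built-in sqlite3 module as a stand-in for Postgres (an assumption made only so the example is self-contained). The expression relies on nothing but || and LIKE, which should behave the same way for this data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw (id INTEGER PRIMARY KEY, class_ids TEXT)")
conn.executemany(
    "INSERT INTO raw (id, class_ids) VALUES (?, ?)",
    [
        (1, "1234,12334,12341,1228"),
        (2, "12281,12341,12283"),
        (3, "1234,34221,31233,43434,1123"),
    ],
)

# Wrapping the list in commas delimits every id on both sides,
# including the first and the last one.
delimited = conn.execute(
    "SELECT id FROM raw re WHERE ',' || re.class_ids || ',' LIKE '%,1234,%' ORDER BY id"
).fetchall()

# Naive pattern for comparison: it also hits 12341 in row 2.
naive = conn.execute(
    "SELECT id FROM raw re WHERE re.class_ids LIKE '%1234%' ORDER BY id"
).fetchall()

print(delimited)
print(naive)
```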
However, this is not the real issue. You should not be storing lists of ids in a string. The SQLish way of storing them is a separate table with one row per combination of id and class_id. This is called a junction table.
Even if you don't use a separate table, you should at least use Postgres's built-in mechanisms, such as an array. However, a separate table is much the preferred method, because you can explicitly declare foreign key relationships.
If you really want to do this with regular expressions, you can use the ~ operator:
SELECT * FROM raw re WHERE re.class_ids ~ '(^|,)1234(,|$)';
But I prefer a_horse_with_no_name's answer that uses arrays.

REGEXP_LIKE Oracle function

I have a list of 100 words that I need to do a pattern match on 55 Million rows of data. Is there a way to create a list of words and pass the list through the REGEXP_LIKE function, instead of using the | (or) statement multiple times, can a list be input instead?
Select *
From table
Where REGEXP_LIKE(C1, 'word1|word2|etc...', 'i');
You cannot pass a list of words as pattern in REGEXP_LIKE.
The pattern argument is the regular expression; it is usually a text literal and cannot be longer than 512 bytes.
What you can possibly do is, store the words you're trying to search in separate table/column and then use LIKE condition in your query as you're just trying to search for the occurrence of the words and not expecting regular expression search support.
So, if there is a table/column (new_table.col) which stores your input items to search for, your query might look like (using UPPER function to ensure case insensitive search as you were trying) -
SELECT a.* FROM table a, new_table b WHERE UPPER(a.col1) LIKE '%' || UPPER(b.col) || '%';
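The word-table approach can be tried out on a small scale. The sketch below uses Python's sqlite3 module instead of Oracle, and the table names data_rows and search_words as well as the sample data are invented for the example. The % wildcards around the word are what turn LIKE into a "contains" test. DISTINCT keeps a row from being returned once per matching word:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_rows (col1 TEXT)")
conn.execute("CREATE TABLE search_words (col TEXT)")
conn.executemany(
    "INSERT INTO data_rows VALUES (?)",
    [("The quick brown Fox",), ("lazy dog",), ("nothing here",)],
)
conn.executemany("INSERT INTO search_words VALUES (?)", [("fox",), ("dog",)])

# Join every data row against every search word; UPPER() on both sides
# makes the comparison case-insensitive.
hits = conn.execute(
    """
    SELECT DISTINCT a.col1
    FROM data_rows a
    JOIN search_words b
      ON UPPER(a.col1) LIKE '%' || UPPER(b.col) || '%'
    """
).fetchall()
print(hits)
```

Note that this compares every row with every word, so on 55 million rows and 100 words it is still a large amount of work.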

Using UPCASE or Regexp in Array Column on Postgres

I am trying to query a Postgres Array Column disregarding case and perhaps even disregarding spaces as well.
SELECT "cats".* FROM "cats" WHERE ('CATS - PERSA' = ANY(UPCASE(cat_types))) ORDER BY "cats"."id" ASC LIMIT 1;
But I get this error:
You might need to add explicit type casts.
As a bonus, I would also like to be able to do a regexp search that ignores spaces in the values of the cat_types column.
I am using Ruby on Rails to do this.
cat_type.upcase.delete(' ')
Cats.where("'#{cat_type}' = ANY(cat_types)").first
The query works just using ANY but I want to be able to disregard spaces and upcase the values in cat_types so that it has more chances of matching. Ilike could also be a possibility.
Thanks.
SELECT DISTINCT c.*
FROM cats c, unnest(c.cat_types) AS cat_type
WHERE upper(translate(cat_type, ' ', '')) = 'CATS-PERSA'
ORDER BY id
LIMIT 1;
The Postgres function is upper(), not upcase().
cat_types seems to be an array, assuming type text[] (info missing). I unnest() to treat array elements individually. This cannot be done with ANY, which is only good for simple comparison.
I use an implicit LATERAL JOIN here, requires Postgres 9.3+ (info missing).
If multiple array elements match, you get the row multiple times here. Hence the DISTINCT.
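The comparison logic of the query can be mirrored in a few lines of Python. This is only a sketch: it assumes cat_types holds plain text elements, and the helper names normalize and row_matches are made up for illustration. Both the search term and each array element are normalized the same way before they are compared:

```python
def normalize(value: str) -> str:
    # Counterpart of upper(translate(value, ' ', '')): drop spaces, then upper-case.
    return value.replace(" ", "").upper()

def row_matches(cat_types: list[str], search: str) -> bool:
    # Counterpart of unnest() plus the comparison:
    # true if any element equals the search term after normalization.
    wanted = normalize(search)
    return any(normalize(t) == wanted for t in cat_types)

print(row_matches(["Cats - Persa", "Dogs"], "CATS - PERSA"))
print(row_matches(["Cats - Siamese"], "CATS - PERSA"))
```

On the Rails side, the same idea means normalizing the search term before it goes into the query, ideally passed as a bind parameter rather than interpolated into the SQL string.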
More about pattern-matching in Postgres:
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL