Using UPCASE or Regexp in Array Column on Postgres - sql

I am trying to query a Postgres Array Column disregarding case and perhaps even disregarding spaces as well.
SELECT "cats".* FROM "cats" WHERE ('CATS - PERSA' = ANY(UPCASE(cat_types))) ORDER BY "cats"."id" ASC LIMIT 1;
But I get this error:
You might need to add explicit type casts.
As a bonus, I would also like to be able to do a regexp search that ignores spaces in the values of the cat_types column.
I am using Ruby on Rails to do this.
cat_type.upcase.delete(' ')
Cats.where("'#{cat_type}' = ANY(cat_types)").first
The query works using just ANY, but I want to be able to disregard spaces and upcase the values in cat_types so that it has more chances of matching. ILIKE could also be a possibility.
Thanks.

SELECT DISTINCT c.*
FROM cats c, unnest(c.cat_types) AS cat_type
WHERE upper(translate(cat_type, ' ', '')) = 'CATS-PERSA'
ORDER BY id
LIMIT 1;
The Postgres function is upper(), not upcase().
cat_types seems to be an array, assumed to be of type text[] (info missing from the question). I unnest() it to treat array elements individually. This cannot be done with ANY, which is only good for simple comparisons.
I use an implicit LATERAL join here, which requires Postgres 9.3 or later (version info missing from the question).
If multiple array elements match, you get the row multiple times here. Hence the DISTINCT.
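For the bonus ask (also ignoring spaces and case via a pattern), a variant of the same query could apply a case-insensitive regular expression, or ILIKE, to each unnested element. A sketch, with the pattern as a placeholder:
SELECT DISTINCT c.*
FROM cats c, unnest(c.cat_types) AS cat_type
WHERE translate(cat_type, ' ', '') ~* 'cats-persa'
ORDER BY id
LIMIT 1;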
More about pattern-matching in Postgres:
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL

Query JSONB column for any value where =?

I have a jsonb column which has the unfortunate property of being very unpredictable; in some cases its value may be an array with nested values:
["UserMailer", "applicant_setup_3", ["5cbffeb7-8d5e-4b52-a475-3cf320b2cee9"]]
Sometimes it will be something with key/values like this:
[{"reference_id": "5cbffeb7-8d5e-4b52-a475-3cf320b2cee9", "job_dictionary": ["StatusUpdater", "FollowTwitterUsersJob"]}]
Is there a way to write a query which just treats the whole column like text and does a like to see if I can find the uuid in the big text blob? I want to find all the records where a particular uuid string is present in the jsonb column.
The query doesn't need to be fast or efficient.
Postgres has search operator ? for jsonb, but that would require you to search the json content recursively.
A possible, although not very efficient, method would be to stringify the object and use LIKE to search it:
myjsonb::text LIKE '%"5cbffeb7-8d5e-4b52-a475-3cf320b2cee9"%'
myjsonb::text LIKE '%"' || myuuid || '"%'
Demo on DB Fiddle:
The problem with the jsonb operator ? is that it only considers top-level keys (including array elements), not values, and no nested objects.
You seem to be looking for values and array elements (not keys) on any level. You can get that with a full text search on top of your json(b) column:
SELECT * FROM tbl
WHERE to_tsvector('simple', jsonb_column)
@@ tsquery '5cbffeb7-8d5e-4b52-a475-3cf320b2cee9';
db<>fiddle here
to_tsvector() extracts values and array elements on all levels - just what you need.
Requires Postgres 10 or later. json(b)_to_tsvector() in Postgres 11 offers more flexibility.
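A sketch of the Postgres 11 variant, limiting the search to string values (the '["string"]' filter is one of the documented filter options; adjust as needed):
SELECT * FROM tbl
WHERE jsonb_to_tsvector('simple', jsonb_column, '["string"]')
@@ tsquery '5cbffeb7-8d5e-4b52-a475-3cf320b2cee9';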
That's attractive for tables of non-trivial size as it can be supported with a full text index very efficiently:
CREATE INDEX tbl_jsonb_column_fts_gin_idx ON tbl USING GIN (to_tsvector('simple', jsonb_column));
I use the 'simple' text search configuration in the example. You might want a language-specific one, like 'english'. Doesn't matter much while you only look for UUID strings, but stemming for a particular language might make the index a bit smaller ...
Related:
LIKE query on elements of flat jsonb array
Does the phrase search operator <-> work with JSONB documents or only relational tables?
Since you are only looking for UUIDs, you might optimize further with a custom (IMMUTABLE) function to extract UUIDs from the JSON document as an array (uuid[]) and build a functional GIN index on top of it. (A considerably smaller index, too.) Then:
SELECT * FROM tbl
WHERE my_uuid_extractor(jsonb_column) @> '{5cbffeb7-8d5e-4b52-a475-3cf320b2cee9}';
Such a function can be expensive, but does not matter much with a functional index that stores and operates on pre-computed values.
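A sketch of what such a function and index might look like; the function body, which collects anything UUID-shaped from the document text into a uuid[], is an assumption and should be adapted to your data:
-- hypothetical helper: extract all UUID-shaped strings as uuid[]
CREATE OR REPLACE FUNCTION my_uuid_extractor(j jsonb)
  RETURNS uuid[] LANGUAGE sql IMMUTABLE AS
$func$
SELECT COALESCE(array_agg(DISTINCT m[1]::uuid), '{}')
FROM regexp_matches(j::text, '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}', 'g') AS m
$func$;

CREATE INDEX tbl_jsonb_column_uuids_idx ON tbl USING gin (my_uuid_extractor(jsonb_column));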
You can expand the array elements first by using jsonb_array_elements(), and then filter the elements cast to text with the LIKE operator:
select q.elm
from
(
select jsonb_array_elements(js) as elm
from tab
) q
where elm::varchar like '%User%'
elm
----------------------------------------------------------------------------------------------------------------------
"UserMailer"
{"reference_id": "5cbffeb7-8d5e-4b52-a475-3cf320b2cee9", "job_dictionary": ["StatusUpdater", "FollowTwitterUsersJob"]}
Demo

Simple regex to filter out pre and suffix characters

I have a field in my database which has a long list of strings separated by commas. Here are a few example rows:
HAB
DHAB,RAB,DAB
HAB,RAB,DAB
RAB,HAB,
RAB,HAB,DAB
My query has the following condition:
WHERE description LIKE '%HAB%'
But it returns the second row which has 'DHAB'.
Can it be done using regex with the WHERE statement so that I only get entries which have 'HAB' in the list (one string) and not the entries with 'DHAB'?
You may use
WHERE description ~ '(^|,)HAB($|,)'
The regex matches
(^|,) - start of string or a ,
HAB - literal substring
($|,) - end of string or ,
See the online regex demo.
Regular expressions are powerful and versatile, but also expensive. Consider a different approach: transform the list to an actual array with string_to_array() and then:
WHERE 'HAB' = ANY (string_to_array(description, ','))
Or:
WHERE string_to_array(description, ',') @> '{HAB}'
db<>fiddle here
The latter can be supported with a GIN index, which makes it faster by orders of magnitude for big tables.
CREATE INDEX ON tbl USING gin (string_to_array(description, ','));
Related:
Can PostgreSQL index array columns?
Or consider a normalized DB design replacing the comma-separated values with a 1:n relationship. Related:
How to implement a many-to-many relationship in PostgreSQL?
Can PostgreSQL have a uniqueness constraint on array elements?

Select if comma separated string contains a value

I have a table:
raw TABLE
=========
id class_ids
------------------------
1 1234,12334,12341,1228
2 12281,12341,12283
3 1234,34221,31233,43434,1123
How do I define a regex to select rows if class_ids contains a specific id?
If we select rows with '1234' in class_ids, the result list should not contain rows with '12341' in class_ids.
IDs in the class_ids column are separated with ','.
SELECT * FROM raw re WHERE re.class_ids LIKE (regex)
You shouldn't be storing comma separated values in a single column.
However, this is better done using string_to_array() in Postgres instead of a regex:
SELECT *
FROM raw
WHERE '1234'= any(string_to_array(class_ids, ','));
If you really want to de-normalize your data, it's better to store those numbers in a proper integer array instead of a comma-separated list of strings.
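A rough sketch of that conversion and the resulting query (assuming every element of class_ids is a valid integer):
ALTER TABLE raw ALTER COLUMN class_ids TYPE int[]
USING string_to_array(class_ids, ',')::int[];

SELECT * FROM raw WHERE class_ids @> '{1234}';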
A simple way uses like:
where ',' || re.class_ids || ',' like '%,1234,%'
However, this is not the real issue. You should not be storing lists of ids in a string. The SQLish way of storing them would be a separate table with one row per id/class_id pair. This is called a junction table.
Even if you don't use a separate table, you should at least use Postgres's built-in mechanisms, such as an array. However, a separate table is much the preferred method, because you can explicitly declare foreign key relationships.
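A minimal sketch of such a junction table (names are hypothetical, and it assumes raw.id is unique):
CREATE TABLE raw_class (
    raw_id   int REFERENCES raw(id),
    class_id int,
    PRIMARY KEY (raw_id, class_id)  -- one row per (raw_id, class_id) pair
);

SELECT r.*
FROM raw r
JOIN raw_class rc ON rc.raw_id = r.id
WHERE rc.class_id = 1234;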
If you really want to do this with regular expressions, you can use the ~ operator:
SELECT * FROM raw re WHERE re.class_ids ~ '(^|,)1234(,|$)';
But I prefer a_horse_with_no_name's answer that uses arrays.

Replacement for 'OR' in SphinxQL

I'm currently trying to integrate the Sphinx search engine into a Python application. The problem is that SphinxQL doesn't support the OR clause the way common SQL does. There are some hacks to use, like writing expressions in SELECT like this:
SELECT id,(field1 = val1 OR field2 = val2) as expr FROM foo_bar WHERE expr = 1;
However, it doesn't work with strings, because they should be handled using the MATCH function. So I decided to divide the query into separate subqueries and combine the results obtained. Yet there's still the problem of getting proper META information, especially the total_found field. Sphinx counts it for the separate queries, but rows obtained from these queries may intersect, and I have no way to check it (the database is large).
I believe there must be a solution. I'm using Sphinxit (SphinxAlchemy has a version conflict with SQLAlchemy I'm using).
Repost from SphinxSearch forum:
I have a table I need to search in, with both text and numerical columns. I need to
write a query with an OR condition; I found out that there's a way to do it using SELECT
expressions like:
SELECT *, quantity>=50 OR quantity=0 AS mycond FROM table1 WHERE mycond = 1;
Unfortunately, it doesn't work with string attributes. This query isn't parsed:
SELECT *, category='foo' OR category='bar' AS mycond FROM table1 WHERE mycond = 1;
Yet this is working in Beta 2.2.3:
SELECT * FROM table1 WHERE category='foo';
What should I do to find the count of rows that fit at least one of the conditions, not all of them?
I can make a few queries and merge the obtained items into one list, but I need to know how
many such rows are in the database right now.
For attribute / facet OR'ing, I think you're correct that the only way is to put an expression in the SELECT clause.
For strings, though, check out the documentation on the fulltext query syntax. You can't exactly use the OR keyword, but something like this should work:
SELECT id, name
FROM recipes
WHERE MATCH('(#ingredients chocolate) | (#name cake)')
LIMIT 10;

How to md5 all columns regardless of type

I would like to create a SQL query (or plpgsql) that will md5() all given rows regardless of type. However, in the query below, if one column is null then the whole hash is null:
UPDATE thetable
SET hash = md5(accountid || accounttype || createdby || editedby);
I am later using the hash to compare uniqueness, so a null hash does not work for this use case.
The problem was the way it handles concatenating nulls. For example:
thedatabase=# SELECT accountid || accounttype || createdby || editedby
FROM thetable LIMIT 5;
1Type113225
<NULL>
2Type11751222
3Type10651010
4Type10651
I could use coalesce or CASE statements if I knew the type; however, I have many tables and I will not know the type of every column ahead of time.
There is a much more elegant solution for this.
In Postgres, using the table name in a SELECT list is permitted, and it has type ROW. If you cast this to type TEXT, it gives all columns concatenated together in a single string (the row literal representation).
Having this, you can get md5 of all columns as follows:
SELECT md5(mytable::TEXT)
FROM mytable
If you want to only use some columns, use ROW constructor and cast it to TEXT:
SELECT md5(ROW(col1, col2, col3)::TEXT)
FROM mytable
Another nice property of this solution is that md5 will be different for NULL vs. an empty string.
Obligatory SQLFiddle.
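Applied to the UPDATE from the question, this could look like the following sketch; NULL columns no longer null out the whole hash because they are preserved inside the row text:
UPDATE thetable
SET hash = md5(ROW(accountid, accounttype, createdby, editedby)::TEXT);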
You can also use something similar to mvp's solution, but without the ROW() function, which is not supported by Amazon Redshift:
Invalid operation: ROW expression, implicit or explicit, is not supported in target list;
My suggestion is to use the NVL2 and CAST functions to cast columns of different types to CHAR, since this type is compatible with all Redshift data types according to the documentation. Below is an example of how to achieve a null-proof MD5 in Redshift.
SELECT md5(NVL2(col1, col1::char, '') ||
           NVL2(col2, col2::char, '') ||
           NVL2(col3, col3::char, ''))
FROM mytable
This might work without casting the second NVL2 function argument to char, but it would definitely fail if you tried to get the md5 of a date column with a null value.
I hope this would be helpful for someone.
Have you tried using CONCAT()? I just tried in my PG 9.1 install:
SELECT CONCAT('aaaa',1111,'bbbb'); => aaaa1111bbbb
SELECT CONCAT('aaaa',null,'bbbb'); => aaaabbbb
Therefore, you can try:
SELECT MD5(CONCAT(column1, column2, column3, column_n)) => md5_hash string here
select MD5(cast(p as text)) from fiscal_cfop as p