Find a string within an array column in PostgreSQL

I have built a series of views in a PostgreSQL database that include a couple of array columns. The view definition is as follows:
create view articles_view as
(select articles.*,
array(select row(people.*)::people
from people
where articles.spubid=people.spubid and
people.stype='Author' and
bactive='t'
order by people.iorder) as authors,
array(select row(people.*)::people
from people
where articles.spubid=people.spubid and
people.stype='Editor' and
bactive='t'
order by people.iorder) as editors,
array(select row(people.*)::people
from people
where articles.spubid=people.spubid and
people.stype='Reviewer' and
bactive='t'
order by people.iorder) as reviewers,
array(select row(status.*)::status
from status
where articles.spubid=status.spubid and
bactive='t') as status
from articles
where articles.bactive='t');
Essentially, what I want to do is an ILIKE on the authors column to determine whether a specific user ID exists in that array. Obviously I can't use ILIKE on that data type, so I need to find another approach.
Here is an example of data in the 'authors' array:
{"(2373,t,f,f,\"2011-08-01 11:57:40.696496\",/Pubs/pubs_edit_article.php,\"2011-08-09 15:36:29.281833\",000128343,A00592,Author,1,Nicholas,K.,Kreidberg,\"\",123456789,t,Admin,A,A,A,0,\"\")",
"(2374,t,f,f,\"2011-08-01 11:57:40.706617\",/Pubs/pubs_edit_article.php,\"2011-08-09 15:36:29.285428\",000128343,A00592,Author,2,John,D.,Doe,\"\",234567890,t,IT,A,A,A,0,\"\")",
"(2381,t,f,f,\"2011-08-09 14:45:14.870418\",000128343,\"2011-08-09 15:36:29.28854\",000128343,A00592,Author,3,Jane,E,Doe,\"\",345678901,t,Admin,A,A,A,,\"\")",
"(2383,t,f,f,\"2011-08-09 15:35:11.845283\",567890123,\"2011-08-09 15:36:29.291388\",000128343,A00592,Author,4,Test,T,Testerton,\"\",TestTesterton,f,N/A,A,A,A,,\"\")"}
What I want to be able to do is query the view and find out whether the string '123456789' (the user ID assigned to Nicholas Kreidberg in the array) exists in the array. I don't care which user it is assigned to or where it appears in the array; all I need to know is whether '123456789' shows up anywhere in it.
Once I know how to write a query that determines whether that condition is true, my application will simply execute it; if rows are returned, it will know that the user ID passed to the query is an author for that publication and proceed accordingly.
Thanks in advance for any insight that can be provided on this topic.

Might this:
select ...
from ...
where ...
and array_to_string(authors, ', ') like '%123456789%';
do the trick?
Otherwise, there is the unnest function, which expands an array into a set of rows.
The "Array Functions and Operators" chapter of the PostgreSQL documentation has more details.
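One caveat worth seeing concretely: the LIKE approach does a substring search over the joined text, so it can match an ID that is merely contained in a longer value. With the composite rows in the question, the joined text also includes every other field of each row, so a hit could come from the wrong column. Below is a plain-Ruby model (not SQL) contrasting that with an exact per-element comparison; the IDs are hypothetical sample values.

```ruby
# Plain-Ruby model (not SQL) of two ways to look for a user id in an array.
# The ids used below are hypothetical sample values.

# Roughly what array_to_string(authors, ', ') LIKE '%123456789%' does:
# join the elements into one string, then search for a substring.
def like_on_joined(authors, needle)
  authors.join(", ").include?(needle)
end

# Exact comparison of each element, as unnest() plus "=" would do.
def exact_element_match(authors, needle)
  authors.any? { |a| a == needle }
end

authors = ["1234567890", "234567890"]
puts like_on_joined(authors, "123456789")      # true: "123456789" is a prefix of "1234567890"
puts exact_element_match(authors, "123456789") # false: no element equals it exactly
```

If IDs can be prefixes or substrings of other values, comparing whole elements (or the specific field of each row) avoids those false positives.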

The ANY construct can do the job for you:
SELECT * FROM articles_view WHERE '123456789' = ANY(authors);
This assumes authors is of type text[].
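PostgreSQL evaluates `value = ANY(array)` with SQL's three-valued logic: true if some element is equal, false if none is and the array contains no NULLs (or is empty), and NULL otherwise, including when the left side is NULL. The plain-Ruby model below (not PostgreSQL code, with nil standing in for NULL) spells those cases out.

```ruby
# Plain-Ruby model (not PostgreSQL code) of `value = ANY(array)`.
# nil stands in for SQL NULL, i.e. "unknown".
def sql_any_equal(value, array)
  return nil if array.nil?        # NULL array -> NULL
  return false if array.empty?    # zero elements -> false
  return nil if value.nil?        # NULL on the left -> NULL
  return true if array.any? { |e| !e.nil? && e == value }
  array.include?(nil) ? nil : false # no match: NULL if any element is NULL
end

p sql_any_equal("123456789", ["123456789", "234567890"]) # true
p sql_any_equal("123456789", ["234567890"])              # false
p sql_any_equal("123456789", ["234567890", nil])         # nil
```

A WHERE clause keeps a row only when the condition is true, so both nil (NULL) and false filter the row out.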


Unexpected Behavior with XML Lateral Flatten In Snowflake (Rows with 1 value being filtered)

I am working with some XML data in Snowflake where I am trying to access some data in XML subnodes.
The data I am after looks something like this:
<Employee>
  <EmploymentStatuses>
    <EmployeeEmploymentStatus>
      <OtherInformation>
      </OtherInformation>
    </EmployeeEmploymentStatus>
  </EmploymentStatuses>
</Employee>
A given employee may have multiple EmployeeEmploymentStatus elements.
To account for this, I created a view like the one below:
SELECT
    XMLGET(XML_CONTENT, 'EMPLOYEE_CODE'):"$"::STRING AS EMPLOYEE_CODE
  , XMLGET(EMPLOYMENT_STATUS.value, 'EffectiveStart'):"$"::DATE AS EffectiveStart
  ...........
FROM
    XML_FILE,
    LATERAL FLATTEN(XML_CONTENT:"$") STATUSES,
    LATERAL FLATTEN(STATUSES.VALUE:"$") EMPLOYMENT_STATUS
WHERE
    GET(STATUSES.VALUE, '#') = 'EmploymentStatuses'
    AND GET(EMPLOYMENT_STATUS.VALUE, '#') = 'EmployeeEmploymentStatus'
The problem I am running into is that while this looks perfect when an employee has multiple employment statuses, an employee with only one is filtered out. That is, anyone with two or more employment statuses shows ALL of their statuses, as you would expect, but someone with only one employment status does not appear at all.
If I remove the second lateral flatten and just use nested XMLGET in the select portion, I am able to return the value, but for employees with multiple 'EmploymentStatus' it only returns the first value.
When looking at the output of STATUSES.VALUE, format-wise they appear identical and have all the same tags.
The only thing I can think to do is basically a union of those two tables, or modifying the WHERE clause to something like this, which would then require COALESCE on all the fields:
(GET(EMPLOYMENT_STATUS.value, '#') = 'EmployeeEmploymentStatus' OR GET(STATUSES.value:"$", '#')::STRING = 'EmployeeEmploymentStatus')
I tested this method and it seems to work, but it also seems kind of clunky and unintuitive.
Any advice on this would be appreciated.
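This is a common problem with XML libraries (so it's not just a Snowflake issue): when a node has only one child, its content is a single object rather than an array, so flattening it iterates over that object's fields instead of over the children. The generally accepted solution is to take the node that you know can be an array and cast it to an array before doing the split, so that it always works.
For example: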
lateral flatten(to_array(xmlget(xmlget(src2.cdc_xml, 'portfolio'))))
or
IFF(IS_ARRAY(master_raw), master_raw, TO_ARRAY(master_raw)) as master
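A plain-Ruby model of why the lone status disappears and how the IFF(IS_ARRAY(x), x, TO_ARRAY(x)) wrapping fixes it. This is an illustration, not Snowflake code; the assumption made for the model is that each parsed element is a Hash with its tag name under "@" and its content under "$", where "$" is a single Hash when there is one child and an Array when there are several.

```ruby
# Plain-Ruby model (not Snowflake SQL) of the single-child pitfall.
# Model assumption: an element is a Hash with its tag under "@" and content under "$";
# "$" is one Hash for a single child and an Array for several children.

# Naive iteration: iterating a Hash yields [key, value] pairs, not child elements.
def naive_children(node, tag)
  content = node["$"]
  items = content.is_a?(Hash) ? content.to_a : content
  items.select { |child| child.is_a?(Hash) && child["@"] == tag }
end

# Mirrors IFF(IS_ARRAY(x), x, TO_ARRAY(x)): wrap a lone value in a one-element array.
def as_array(value)
  value.is_a?(Array) ? value : [value]
end

def normalized_children(node, tag)
  as_array(node["$"]).select { |child| child["@"] == tag }
end

one = { "@" => "EmploymentStatuses",
        "$" => { "@" => "EmployeeEmploymentStatus", "$" => "A" } }
two = { "@" => "EmploymentStatuses",
        "$" => [{ "@" => "EmployeeEmploymentStatus", "$" => "A" },
                { "@" => "EmployeeEmploymentStatus", "$" => "B" }] }

p naive_children(one, "EmployeeEmploymentStatus").size      # 0: the lone status is lost
p naive_children(two, "EmployeeEmploymentStatus").size      # 2
p normalized_children(one, "EmployeeEmploymentStatus").size # 1
```

Normalizing to an array first means the same flatten logic handles one child and many children identically, which is why no UNION or COALESCE workaround is needed.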

How can I assign pre-determined codes (1,2,3, etc,) to a JSON-type column in PostgreSQL?

I'm extracting a table of 2000+ rows of park details. One of the columns is of JSON type (see the image of the table).
We have about 15 attributes like this, and we also have documentation of the predetermined codes assigned to each attribute.
Each row in the extracted table has a different set of attributes, as you can see in the image. Right now, I use cast(parks.services AS text) AS "details" to get all the attributes for a particular park, or extract just one of them using the code below:
CASE
WHEN cast(parks.services AS text) LIKE '%uncovered%' THEN '2'
WHEN cast(parks.services AS text) LIKE '%{covered%' THEN '1' END AS "details"
This time, I need to extract these attributes as their assigned codes. For example:
Park 1 - {covered, handicap_access, elevator} to be {1,3,7}
Park 2 - {uncovered, always_open, handicap_access} to be {2,5,3}
I have thought of using a subquery to pre-assign the codes, but I cannot wrap my head around JSON operators; in fact, I don't know how to extract them across 2000+ rows.
It would be helpful if someone could guide me in this topic. Thanks a lot!
You should really think about normalizing your tables. Don't store arrays. You should add a mapping table to map the parks and the attribute codes. This makes everything much easier and more performant.
SELECT
t.name,
array_agg(c.code ORDER BY elems.index) as codes -- 3
FROM mytable t,
unnest(attributes) WITH ORDINALITY as elems(value, index) -- 1
JOIN codes c ON c.name = elems.value -- 2
GROUP BY t.name
1. Extract the array elements into one record per element. Adding WITH ORDINALITY saves the original order.
2. Join your codes on the elements.
3. Create the code arrays. To ensure the correct order, use the index values created by the WITH ORDINALITY clause.
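To make the three steps concrete, here is a plain-Ruby model (not SQL) of unnest WITH ORDINALITY, the join against a code table, and the ordered aggregation. The code values are the examples given in the question (covered=1, uncovered=2, handicap_access=3, always_open=5, elevator=7); the hash-based "tables" are assumptions for illustration. It uses filter_map, so it needs Ruby 2.7 or later.

```ruby
# Plain-Ruby model (not SQL) of unnest ... WITH ORDINALITY + JOIN codes + array_agg(... ORDER BY index).
# Code values are the examples from the question; other attributes are left out.
CODES = {
  "covered" => 1, "uncovered" => 2, "handicap_access" => 3,
  "always_open" => 5, "elevator" => 7
}.freeze

def codes_for(parks, codes)
  # 1. One row per element, with a 1-based ordinality (like WITH ORDINALITY).
  rows = parks.flat_map do |name, attributes|
    attributes.each_with_index.map { |value, i| [name, value, i + 1] }
  end
  # 2. Inner join on the code table: elements without a code are dropped.
  joined = rows.filter_map do |name, value, index|
    [name, codes[value], index] if codes.key?(value)
  end
  # 3. Group by park and collect codes in the original element order.
  joined.group_by(&:first).transform_values do |group|
    group.sort_by(&:last).map { |_, code, _| code }
  end
end

parks = {
  "Park 1" => %w[covered handicap_access elevator],
  "Park 2" => %w[uncovered always_open handicap_access]
}
p codes_for(parks, CODES) # Park 1 -> [1, 3, 7], Park 2 -> [2, 5, 3]
```

As with the SQL inner JOIN, an attribute that has no entry in the code table simply disappears from the result. If parks.services holds a JSON array rather than a PostgreSQL array, json_array_elements_text(parks.services) WITH ORDINALITY would take the place of unnest.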

Using Rails, what's wrong with this query? It does not return a valid id

store_id=Store.select(:id).where(user_id:current_user.id).to_a.first
It returns something like #<Store:0x00007f8717546c30> instead of an id.
select does not return an array of strings or integers for the given column(s), but rather an ActiveRecord relation containing model objects with only the given fields loaded:
https://apidock.com/rails/ActiveRecord/QueryMethods/select
Your code is then converting that relation to an array, and taking the first object in that array, which is an instance of the Store class. If you want the ID, then try:
Store.select(:id).where(user_id:current_user.id).to_a.first.id
However, I think you're misunderstanding how to structure the queries. Put the where part first, and then find the ID of the first result:
Store.where(user_id: current_user.id).first.id
And if there is only 1 store, then:
Store.find_by(user_id: current_user.id).id
Or...
Store.find_by(user: current_user).id
or.....
current_user.store.id
(or current_user.stores.first.id if there are many)
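To see why `.first` hands back a Store rather than a number, here is a plain-Ruby stand-in with no Rails involved; `FakeStore` and its rows are hypothetical. `select` narrows the loaded columns but still yields model objects, whereas ActiveRecord's `pluck(:id)` returns the raw values.

```ruby
# Plain-Ruby stand-in (no Rails) for the difference between select and pluck.
# FakeStore and STORES are hypothetical; real ActiveRecord objects behave analogously.
FakeStore = Struct.new(:id, :user_id)

STORES = [FakeStore.new(10, 1), FakeStore.new(11, 1), FakeStore.new(12, 2)].freeze

# Like Store.select(:id).where(user_id: uid).to_a: model objects with only id loaded.
def select_id_where(user_id)
  STORES.select { |s| s.user_id == user_id }.map { |s| FakeStore.new(s.id, nil) }
end

# Like Store.where(user_id: uid).pluck(:id): plain integers.
def pluck_id_where(user_id)
  STORES.select { |s| s.user_id == user_id }.map(&:id)
end

p select_id_where(1).first.class # FakeStore, the analogue of #<Store ...>
p select_id_where(1).first.id    # 10
p pluck_id_where(1).first        # 10
```

In Rails itself, `Store.where(user_id: current_user.id).pluck(:id).first` (or `pick(:id)` on Rails 6.0+) returns the integer id directly instead of a model object.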

Rails: Need to scope by max version

I have a database table that looks like this:
"63";"CLINICAL...";"Please...";Blah...;"2014-09-23 13:15:59";37;8
"64";"CLINICAL...";"Please...";Blah...;"2014-09-23 13:22:51";37;9
The values that matter are the last two.
As you can see, the second-to-last value (abstract_category_id) is the same in both rows, but the last one (version_number) differs.
Here is the problem:
When I write a scope, it returns all of the records, but I need only the one with the maximum version number for each abstract_category_id.
In SQL I would do something like this:
SELECT * FROM Category c
WHERE NOT EXISTS (
  SELECT * FROM Category c1
  WHERE c.version_number < c1.version_number
  AND c.abstract_category_id = c1.abstract_category_id
)
But I'm totally lost in Ruby, specifically how to do this kind of select in a scope (I understand it should return a relation).
Thanks
We can create a scope that selects, for each abstract_category_id, the category with the max version_number, like this:
scope :with_max_version_number, -> {
joins("JOIN ( SELECT abstract_category_id, max(version_number) AS max_version
FROM categories
GROUP BY abstract_category_id
) AS temp
ON temp.abstract_category_id = categories.abstract_category_id
AND temp.max_version = categories.version_number"
)
}
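Basically, the subquery (the temp table) computes the maximum version_number for each abstract_category_id, and the join keeps only the categories whose version_number equals that maximum.
By the way, I assume the table name is categories; adjust it if yours differs. Then the final query will be: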
Category.with_max_version_number
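Both this join-on-subquery scope and the NOT EXISTS query from the question keep, per abstract_category_id, only the row(s) with the highest version_number. A plain-Ruby model of the two (not ActiveRecord code), using the two sample rows from the question (ids 63 and 64, abstract_category_id 37, versions 8 and 9):

```ruby
# Plain-Ruby model (not SQL/ActiveRecord) of "keep only the max version per group".
CategoryRow = Struct.new(:id, :abstract_category_id, :version_number)

CATEGORIES = [
  CategoryRow.new(63, 37, 8),
  CategoryRow.new(64, 37, 9)
].freeze

# Join-on-subquery version: compute the max per group, keep rows that equal it.
def latest_by_join(rows)
  max_version = rows.group_by(&:abstract_category_id)
                    .transform_values { |group| group.map(&:version_number).max }
  rows.select { |c| c.version_number == max_version[c.abstract_category_id] }
end

# NOT EXISTS version: drop a row if some row in its group has a higher version.
def latest_by_not_exists(rows)
  rows.reject do |c|
    rows.any? do |c1|
      c1.abstract_category_id == c.abstract_category_id && c.version_number < c1.version_number
    end
  end
end

p latest_by_join(CATEGORIES).map(&:id)       # [64]
p latest_by_not_exists(CATEGORIES).map(&:id) # [64]
```

Both formulations keep every row that ties for the maximum within a group, so if two rows share the top version_number, both are returned.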
Scopes are supposed to return a relation (a collection of records), even if it contains only one record.
If you ALWAYS want to return a single record, use a class method instead:
def self.max_abstract_category
<your_scope>.max_by{ |obj| obj.version_number }
end
If I understand your question: you have a database table with a version_number column, which rails represents using an Active Record model--that I'll call Category because I don't know what you've called it--and you want to find the single Category record with the largest version_number?
Category.order(version_number: :desc).limit(1).first
This query asks for Category records ordered by version_number from highest to lowest and limits the request to one record (the first, i.e. the highest). Because the result of this request is a relation containing that one record, we call .first on it to return the record itself.
As far as I'm aware, a scope is simply a named query (I don't actually use scopes). I think you can save this query as a scope by adding the following to your Category model; the Rails guide on Active Record querying explains more about scopes.
scope :highest_version, -> { order(version_number: :desc).limit(1) }
Then call Category.highest_version.first to get the record.
I tried a join implementation with baby_squeel, but for some reason it was very slow on MySQL. So I ended up with something like:
scope :only_latest, -> do
  where(%{
    NOT EXISTS (SELECT * FROM categories c
                WHERE categories.version_number < c.version_number
                  AND categories.abstract_category_id = c.abstract_category_id)
  })
end
I filed a BabySqueel bug, as I spent a long time trying to do this the proper way in code, to no avail.

Group by an array's element in PostgreSQL

In Postgres 8.3, I need to group by certain elements within an array field. When I group by the array itself, all possible combinations are shown. I only want to group by the element.
Something like this works for finding the count of a single element:
SELECT count(*)
FROM table
WHERE foo=any(bar)
This returns the correct count for a single element in an array. How do I return separate counts for every element? If I group by the array, it groups by the whole array, with its elements in the order they are stored (not what I need).
Edit for clarity:
bar is an array with values like {foo, some, thing} or {foo, thing} or {some, thing, else}
I want to know how many records have an element value of foo, some, thing, and else in the array "bar".
Something like
GROUP BY bar[4]
Hm, you can't group by a value... but, of course, the value is the column that contains it.
So you might want this:
SELECT AVG(blah), foo
FROM whatever
WHERE foo=any(bar)
GROUP BY foo
This should work.
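For the per-element counts the question asks for, the usual PostgreSQL approach is to unnest the array and group by the element; note that unnest() only exists from PostgreSQL 8.4 onward, not in 8.3. Below is a plain-Ruby model (not SQL) of that grouping, using the sample arrays from the question. It assumes each record should be counted at most once per element, which matches a plain count(*) only when arrays contain no duplicate values.

```ruby
# Plain-Ruby model (not SQL) of counting records per array element.
# Each record's array is deduplicated first, so a record counts at most once per element.
# Array#tally needs Ruby 2.7 or later.
def rows_per_element(bars)
  bars.flat_map(&:uniq).tally
end

bars = [%w[foo some thing], %w[foo thing], %w[some thing else]]
p rows_per_element(bars) # foo: 2, some: 2, thing: 3, else: 1
```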